Living on the Edge
Studies in Generative Grammar 62
Editors
Harry van der Hulst, Jan Koster, Henk van Riemsdijk
Mouton de Gruyter Berlin • New York
Living on the Edge
28 Papers in Honour of Jonathan Kaye
edited by
Stefan Ploch
Mouton de Gruyter Berlin · New York
2003
Mouton de Gruyter (formerly Mouton, The Hague) is a Division of Walter de Gruyter GmbH & Co. KG, Berlin.
The series Studies in Generative Grammar was formerly published by Foris Publications Holland.
Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
ISBN 3-11-017619-X

Bibliographic information published by Die Deutsche Bibliothek

Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the Internet at <http://dnb.ddb.de>.
© Copyright 2003 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Printed in Germany.
Jonathan Kaye (Picture courtesy Bernard Howard)
Preface
This project took almost five years: I wrote about 250 pages of editorial comments, at least 20 letters and 400 emails, typeset about 750 pages, and collected 50 pages of terms and names for the index. Along the way, there were several people who helped me and whom I would therefore like to mention here. This project was originally conceived in the Postgraduate Common Room of the School of Oriental and African Studies, some time in 1998. The people present were Monik Charette, Margaret Cobb, Sean Jensen, and myself. To all three, I would like to express my gratitude. After a little while, it became clear that the editors of this Festschrift would be Monik Charette and myself; Monik was involved in writing the original call for papers and in many dealings with potential publishers. For personal reasons, Monik decided in 1999 to withdraw from her editorial role (while remaining one of the contributors) but continued to be an important source of information about Jonathan Kaye and about people who could be of help to me. For all of these reasons, Monik deserves special mention and an extra portion of gratitude. From 1999 to 2000, Geoff Williams assisted me in proof-reading some of the manuscripts and in composing a number of emails and letters to potential publishers, for which I would like to thank him. In addition to contributing to this Festschrift, Elan Dresher, Edmund Gussmann, Michael Kenstowicz, Yves-Charles Morin and Glyne Piggott provided information about Jonathan and useful comments on various aspects of typesetting and orthography. Thank you. Many thanks also to Elan Dresher, Sean Jensen, and Yves-Charles Morin for sharing their personal experiences with Jonathan Kaye (cf. "Testimonials"). Even though it was not necessary in the end, I would like to thank Friedrich Neubarth for his offer to write a Perl script for the index.
Another person I wish to mention is Ursula Kleinhenz from Mouton de Gruyter, who showed a great deal of patience and care for this project. Harry van der Hulst and Nancy Ritter deserve my gratitude for helping me with the arrangement of the articles; Harry was also very supportive as one of the editors of this series (Studies in Generative Grammar). Of course, I would like to say thank you to all contributors. Last but not least, I want to express my deep gratitude to John Rennison: he showed me the ropes of typesetting and editing, explained some of the finer details of Word to me, and helped with the preparation of the summaries of each paper and of the index. He generously assisted me in assembling the entire volume.

Stefan Ploch
Johannesburg, 25 July 2003
Contents

Preface v
Jonathan D. Kaye: Curriculum vitae xi
Jonathan D. Kaye: Testimonials xiii
Jonathan D. Kaye: Publications xxi
Instead of an introduction 1

1 General issues

1.1. Acquisition

B. Elan Dresher
Meno's paradox and the acquisition of grammar 7

Nancy A. Ritter
On the logical order of development in acquiring prosodic structure 29

1.2. Computation

Geoff Williams
On the computability of certain derivations in Government Phonology 55

1.3. The organisation of grammar

Harry van der Hulst
Structure paradoxes in phonology 75

John R. Rennison and Friedrich Neubarth
An x-bar theory of Government Phonology 95

1.4. Philosophy of science and metatheory

Sean Jensen
Meta-phonological speculations 131

Stefan Ploch
Metatheoretical problems in phonology with Occam's Razor and non-ad-hoc-ness 149

2 Elements: segmental structure and processes

Farida Cassimjee and Charles W. Kisseberth
Eerati tone: towards a tonal dialectology of Emakhuwa 203

Margaret Cobb
Government Phonology and the vowel harmonies of Natal Portuguese and Yoruba 223

Thais Cristofaro-Silva
Palatalisation in Brazilian Portuguese 243

Michael Kenstowicz, Mahasen Abu-Mansour and Miklós Törkenczy
Two notes on laryngeal licensing 259

Tobias Scheer
On spirantisation and affricates 283

3 Structure

3.1. Branching onsets

Eugeniusz Cyran
Branching onsets in Polish 303

Edmund Gussmann
Are there branching onsets in Modern Icelandic? 321

Jean Lowenstamm
Remarks on mutae cum liquidā and branching onsets 339

Emmanuel Nikiema
Defective syllables: the other story of Italian sC(C)-sequences 365

3.2. "Codas"

Yves-Charles Morin
Remarks on prenominal liaison consonants in French 385

Glyne L. Piggott
The phonotactics of a "Prince" language: a case study 401

Keren Rice
On the syllabification of right-edge consonants — evidence from Ahtna (Athapaskan) 427

Yuko Yoshida
Licensing constraint to let 449

3.3. Empty categories

Monik Charette
Empty and pseudo-empty categories 465

Yong Heo
Unlicensed domain-final empty nuclei in Korean 481

Seon-Jung Kim
Unreleasing: the case of neutralisation in Korean 497

3.4. "Syllabic consonants"

Grażyna Rowicka
/r/ syllabicity: Polish versus Bulgarian and Serbo-Croatian 511

Shohei Yoshida
The syllabic nasal in Japanese 527

3.5. Templates and morphology

Ann Denwood
Template and morphology in Khalkha Mongolian — and beyond? 543

Yeng-Seng Goh
A non-derivational analysis of the so-called "diminutive retroflex suffixation" 563

M. Masten Guerssel
Why Arabic guttural assimilation is not a phonological process 581

3.6. Metrical structure

Jean-Roger Vergnaud
On a certain notion of "occurrence": the source of metrical structure, and of much more 599

References 633
Subject index 685
Language index 717
Names index 719
Contributors 725
Jonathan D. Kaye: Curriculum vitae
Jonathan received his PhD from Columbia University under the supervision of Professor Weinreich. His thesis was on the South American language Desano ("The Desano verb: problems in semantics, syntax and phonology") and was completed after a year of fieldwork in Amazonia. In 1967 he was offered his first position at the University of Toronto. In 1974 he spent a sabbatical year at McGill University and then joined the Université du Québec à Montréal (UQAM) in 1975. At the University of Toronto he directed the Odawa Language Project, a research project on Eastern Ojibwa. When he joined UQAM he continued to work on Ojibwa, and from 1977 until 1981 he co-directed, with Glyne Piggott, his first PhD student, a research project on Algonquin, the variety of Ojibwa spoken in the province of Quebec. In 1978 he was also invited to give a class at the Linguistics Institute of the University of Illinois. His interest then shifted to West African languages. From 1981 until he left Canada in 1988 he held a number of major Canadian research grants to work on West African languages. He founded and directed the Projet de recherche en linguistique africaniste. In the mid-1980s Jean Lowenstamm joined him at UQAM and co-directed the project with him. Among the research assistants who worked on the project are Dominic Sportiche, Hilda Koopman, Isabelle Haïk and Laurie Tuller. He left Montreal and went to the School of Oriental and African Studies (SOAS, University of London) in 1988 to take up Firth's Chair. He left SOAS for China in 1999 to take up a position at the Guangdong University of Foreign Studies. He left China in 2001 to live and work in Spain.
Jonathan D. Kaye: Testimonials
It is customary for the editor of a Festschrift to provide some background on the person being honoured. I have decided, however, to let a number of contributors recount their own experiences with Jonathan Kaye. For me personally, he was the most significant influence on my thinking as a linguist.
B. Elan Dresher

Jonathan Kaye as a researcher

I have known Jonathan Kaye since the early 1970s, when I was studying at McGill University and he was teaching at the University of Toronto. Since then, I have attended many lectures of his in seminars and at conferences, I have read and studied his work on a variety of topics, and I have collaborated with him on a project to develop a learning algorithm for the acquisition of stress systems. In my opinion, Jonathan Kaye is one of the most talented phonologists in the world. His own work is a rare blend of original fieldwork and original theorising. He is thus an uncommonly valuable resource, someone who can give guidance in both the theoretical and practical aspects of research. He has published prolifically on a variety of topics in phonological theory, often bringing original fieldwork to bear on current theoretical issues. He has worked on a wide range of topics in phonological theory, including segmental phonology, syllable theory, stress, and more general considerations such as learnability, abstractness and recoverability. He is also one of the founders of the theory of Government Phonology, a theory which has become widely known and which has been quite influential. He has worked closely on an unusually long list of languages, including native languages of South and North America (Desano and Ojibwa, to name just two), of the Ivory Coast (Vata and others), on Slavic (notably Polish), Semitic (various dialects of Arabic and Ethiopian
languages), Berber, French (Quebec and European), and others. He has also done pioneering and innovative work in computational phonology. Beyond his own publications and research, Jonathan Kaye has had an important and unique impact on Canadian linguistics through the research projects he founded and headed, and the many students he trained in this country. When he first came to Canada, generative grammar was not yet well established there. Jonathan was a passionate and articulate advocate for this approach to linguistics, and did much to further understanding of the theory in Canada. Virtually every senior Canadian-trained phonologist now active studied with him and had him as an advisor. At the University of Toronto, Dr. Kaye headed the Odawa Language Project, which produced linguists such as Glyne Piggott (McGill), Pat Shaw (University of British Columbia), and Keren Rice (Toronto), all of whom remain at the forefront of research on native languages of Canada. Later, at the Université du Québec à Montréal (UQAM), Kaye founded the African Language Project, which included large-scale studies of the Kru languages of the Ivory Coast, and then the languages of Ethiopia and North Africa. This project, which is still running, attracted many good students and research fellows, including Jean Lowenstamm (Paris), Carole Paradis (Laval), Jean-François Prunet (formerly UQAM), Monik Charette (SOAS), and Emmanuel Nikiema (Toronto), to name but a few. At SOAS, he again created an important program, this time with a focus on Government Phonology, and branched into work on phonetics and speech recognition. As is evident from the above, Jonathan Kaye is unusually good at developing interesting research projects and inspiring and guiding students. He is also one of the best speakers and lecturers I have ever seen. He would lecture without notes, even at conferences where talks had to fit into a certain amount of time — 20 minutes, 30 minutes — and he always fit the talk exactly to the time. I, on the other hand, always read talks — I had to have it all written out. When we gave joint talks on our work, I had to have a text, but Jonathan couldn't work from a written text. Even if the talk was written, he had to deliver it spontaneously. So we worked out a system — I would start and do
the first half my way, and Jonathan would take over in his style. It worked very well — I certainly would not have wanted to follow him. Collaborating with Jonathan was a great experience, and the work we did then still forms the basis of my current research. I learned from him not to be satisfied with conventional solutions, but to look deeper for principled explanations.

Jonathan Kaye, the person

Jonathan was always a very gregarious person — he really liked and needed to be around people — not just any people, but people he could have fun with. I think this trait, and his general philosophy about what people are like, contributed to making him such a great fieldworker. I once asked him how he was able to live for a year among the Desano in the Amazon. Their whole culture is so different from anything he knew before; how could he relate to people like that? I thought it was hard enough relating to people in our own culture, let alone a completely alien one. Jonathan replied that I was thinking about it wrong. He said that in his view, the real differences are not between cultures, but between types of people, and that you find pretty much the same types everywhere. For example, he said, just as in North America, there are certain types of people he doesn't get along with — administrators, various officious types. The same was true with the Desano — the religious leaders didn't like him. He said every time one of them passed by him, he (the Desano) shook his head sadly. "According to him," Jonathan told me, "I was totally useless — I didn't know how to plant prayer sticks, I didn't believe in the right things, I was a total failure as a person in his eyes. It's just the same here with priests and rabbis and other such people. But," he went on, "I got along fine with regular people, just like I do here. These people didn't take the religious stuff too seriously — they liked to have a beer and talk about stuff and have a good time." In every society you find the people you get along with and stay away from the ones you don't. He genuinely believed in the oneness of humanity in this respect, and lived this philosophy. I think that this
is an important insight in these times in particular, when people are talking about "wars between civilisations." The people Jonathan liked were the ones who were not overly impressed by the constructs of culture, religion, or politics that placed barriers — artificial, in his view — between people. For him, the real clash was not between cultures, but between the custodians of official dogmas and those who, like himself, were guided by their own curiosity and creativity. Jonathan was very good at meeting people. One summer in the early 1980s, he and Jean Lowenstamm kindly agreed to drive down to Providence, Rhode Island, where I had been living, to help me move my stuff back to Montreal. On the way out of town, we stopped at a deli that I used to eat at. I went there frequently, but never met anyone there. After the meal we all went to the washroom. A man was there washing his hands. For some reason, he mumbled a few words to us, not much — excuse me, or he thanked us for holding the door, or maybe we bumped into him because the room was crowded — I don't remember, I hardly heard him. But from this one brief utterance Jonathan detected an accent and asked him, in Portuguese: "Portuguese?" The man was, and suddenly they were great friends, chatting away in Portuguese. It turned out he was the husband of the Portuguese consul in Providence (a city with a large Portuguese population), and by the time we left the washroom Jonathan had a standing invitation to visit the consulate. This sort of thing happened all the time. On a flight to Spain we sat next to a guy who ended up inviting us (really Jonathan, who did all the talking) to his villa on the Costa del Sol. Jonathan's approach was always centered on people. We were together at GLOW in Girona in 1986. Jonathan loved Girona — the people, the language, the restaurants. After a few days he seemed to know everyone, and already had a favorite restaurant, bar, etc., as if he'd lived there for years. After the workshop was over, I wanted to go to Barcelona to see the sights there. Jonathan wanted to stay in Girona. He wondered why I wanted to go to Barcelona. I thought it would be self-evident: Barcelona is a famous city with lots of things to see. I liked Girona too, but didn't think it could compete with Barcelona, if we had one or two days to be somewhere.
Jonathan didn't see it like that at all. "What will you do in Barcelona?" he asked, "wander around and look at buildings? There's buildings everywhere. But you don't know anybody in Barcelona. Here in Girona we know people, we can go to places we know, be with people we know. That's a much better way to spend two days." Jonathan had a great sense of humour, and one could go on about it at length, but I'll finish with one story. We had a stop-over at Heathrow airport once, about four hours between planes. We were in a restricted waiting area, and you had to go through security to get into it. We each had a satchel, and had to put them on the belt to pass through the X-ray machine. There wasn't much to do there, so we decided to go in to London. We left the area, but after walking a few minutes we calculated that we wouldn't really have time, so we returned to the waiting area. We had to go through security again, so we again put our satchels through the machine. After a few minutes we again got bored and decided to go out to look for food. We walked around but couldn't find anything, so we again returned to the waiting area, and lined up again to go through security. I went first and put my bag through the machine. Jonathan was behind me, and as he put his satchel down on the belt, he said to the security guard, "Those little hamsters must be getting really fried by now." On the other side of security I collected my bag, but Jonathan was not behind me. I looked back and saw him taking everything out of his satchel onto the table for the unamused security guard. "Aw, come on," Jonathan said, as the guard asked him to empty his pockets. After that we decided to stay in the area until it was time to board our flight.
Sean Jensen

There are two aspects of Jonathan Kaye's philosophy of science which we should like to highlight here. They do not come across as forcefully in his published work as they do in workshop, classroom and conference hall. These two facets are perhaps what set Jonathan Kaye's ideas about linguistics apart from most others.
The first modus operandi is his insistence, rightly, that everything that a theory allows to exist must exist in "the real world". If it seems that such predicted things do not exist, that is a serious problem for the theory. A series of lectures for advanced students at the School of Oriental and African Studies (SOAS) in the mid-1990s, and a number of workshops, also at SOAS, spawned a flurry of working papers and research threads investigating the predictive content of Government Phonology, and comparing and contrasting these results with an equally exacting analysis of the predictive content of leading theories from elsewhere (principally Lexical Phonology and Optimality Theory, as well as the University College London dialect of Government Phonology).

The second modus operandi Jonathan Kaye refers to in lectures as the "Thunderdome", citing the eponymous film as the most important in linguistics. In the Thunderdome, two combatants enter, but only one can come out. For Jonathan Kaye, as for many of us, the only meaningful way to destroy a bad theory is to put it in the Thunderdome with a better theory. One theory predicts X, the other theory predicts not-X; in the Thunderdome, empirical observation provides the "facts" (exactly one of X or not-X). The theory which gets it wrong is obliterated, and the other theory emerges from the Thunderdome, temporarily triumphant, until another, better theory comes along and defeats it there. The Thunderdome, thus transported to the world of science, and the insistence on the inescapability of the logical consequences of a theory, will be immediately familiar to anyone acquainted with the ideas of Karl Popper (which should, of course, be everyone). Jonathan always professed a polite distance between himself and Popper — perhaps due to the staggering amount of misinformation and almost willful misunderstanding of Popper's work in the vast secondary literature — although his working practice is actually the epitome of Popperian methodology.
Yves-Charles Morin

It might be interesting to mention that Jonathan was always very keen on using the local vernacular, wherever he was. When in Toronto, he practised "Canadian raising", with the result that he spoke English with a Manhattan accent plus Canadian raising. In Quebec, he practised Quebec diphthongisation, with the result that he spoke French with a Manhattan accent plus Canadian raising plus Quebec diphthongisation. He recently sent me an e-mail from Barcelona, written in Catalan (claiming he no longer knew French — I suppose he no longer knew English either, or else he wouldn't have used a language I can only read with a dictionary). I presume that in Barcelona he speaks Catalan with a Manhattan accent plus Canadian raising plus Quebec diphthongisation!! I really wonder how it went with his Chinese.
Jonathan D. Kaye: Publications
Books

Kaye, J.D. and E.-D. Cook, Linguistic Studies of Native Canada, UBC Press: Vancouver, 279 pages (1978).
Kaye, J.D., Koopman, H., Sportiche, D., and A. Dugas, Current Approaches to African Linguistics (Vol. 2), Foris: Dordrecht, 400 pages (1983).
Kaye, J.D., Angoujard, J.-P., et J. Lowenstamm, Phonologie des langues sémitiques, Revue québécoise de linguistique (collaboration spéciale), 337 pages (1986).
Kaye, J.D., Phonology: A Cognitive View, Lawrence Erlbaum Associates: Hillsdale, N.J., 172 pages (1989).
Kaye, J.D., Phonology, vol. 7.2, "Government Phonology", Guest Editor (1990).
Research reports

Kaye, J.D., Tokaitchi, K. and G.L. Piggott, Odawa Language Project: First Report, University of Toronto, 200 pages (1971).
Kaye, J.D., and G.L. Piggott, Odawa Language Project: Second Report, University of Toronto, 319 pages (1973).
Kaye, J.D., and P. Roosen-Runge, A User's Guide to the Phonological Calculator, University of Toronto, 90 pages (1973).
Kaye, J.D., Koopman, H., et D. Sportiche, Projet sur les langues kru: Premier Rapport, UQAM, 408 pages (1982).
Kaye, J.D., Lowenstamm, J., Haïk, I. et L. Tuller, Projet de linguistique africaine: Études phonologiques et syntaxiques, UQAM, 516 pages (1985-1986).
Articles in journals or collected works

Kaye, J.D., "Nominalized relative clauses in Desano", Canadian Journal of Linguistics 14: 40-57, (1968).
Kaye, J.D., "A note on a note on Ambiguity and Vagueness by George Lakoff", Phonetics Laboratory Notes 5: 2-4, University of Michigan, (1970).
Kaye, J.D., "Nasal Harmony in Desano", Linguistic Inquiry 2: 37-56, (1971).
Kaye, J.D. and G.L. Piggott, "On the cyclical nature of Ojibwa T-palatalization", Linguistic Inquiry 4: 345-362, (1973).
Kaye, J.D., "Opacity and recoverability in phonology", Canadian Journal of Linguistics 19: 134-149, (1974).
Kaye, J.D., "Morpheme structure constraints live!", Recherches linguistiques à Montréal 3: 55-62, (1974).
Kaye, J.D., "Contraintes profondes en phonologie: les emprunts", Cahier de linguistique UQAM 5: 87-101, (1975).
Kaye, J.D., "A functional explanation for rule ordering in phonology", Papers from the Parasession on Functionalism, CLS XI, pp. 244-252, (1975).
Kaye, J.D., "Compte-rendu de H.C. Wolfart, Plains Cree: A grammatical study", American Anthropologist 77: 445-446, (1975).
Kaye, J.D., "Klamath and the strict cycle", Linguistic Inquiry 6: 599-602, (1975).
Kaye, J.D., "L'algonquin du nord", dans Papers of the Ninth Algonquian Conference, en collaboration, D. Daviault, M. Dufresne, S. Girouard, P. Legault, pp. 55-60, (1978).
Kaye, J.D., "Rule Mitosis", in Cook, E.-D. and J.D. Kaye (edd.), Linguistic Studies of Native Canada, UBC Press: Vancouver, pp. 143-156, (1978).
Kaye, J.D., "On the alleged correlation of markedness and rule function", dans D. Dinnsen (red.), Current Phonological Theories, Indiana University Press: Bloomington, pp. 272-280, (1978).
Kaye, J.D. et Y.-C. Morin, "Il n'y a pas de règles de troncation, voyons!", dans Dressler, W. et W. Meid (red.), Proceedings of the Twelfth International Congress of Linguists, Innsbrucker Beiträge zur Sprachwissenschaft: Innsbruck, pp. 788-792, (1978).
Kaye, J.D., "Functionalism and functional explanation in phonology", Studies in the Linguistic Sciences 8.2, (1979).
Kaye, J.D. and B. Nykiel, "Loan words and abstract phonotactic constraints", Revue canadienne de linguistique 24: 71-92, (1979).
Kaye, J.D., "The mystery of the tenth vowel", Recherches linguistiques à Montréal 13: 22-33, (1979).
Kaye, J.D., "Il était une fois deux voyelles", Cahier de linguistique de l'UQAM 10: 119-131, (1980).
Kaye, J.D., "The Indian languages of Canada", in Chambers, J.K. (ed.), The Languages of Canada, Didier: Montreal, pp. 15-19, (1980).
Kaye, J.D., "The Algonquian Languages", in Chambers, J.K. (ed.), The Languages of Canada, Didier: Montreal, pp. 20-53, (1980).
Kaye, J.D., "The mystery of the tenth vowel", Journal of Linguistic Research 1: 1-14, (1980).
Kaye, J.D. and J. Lowenstamm, "Syllable structure and markedness theory", in A. Belletti, L. Brandi, and L. Rizzi (edd.), Theory of Markedness in Generative Grammar, Pisa: Scuola Normale Superiore, pp. 287-316, (1981).
Kaye, J.D., "Recoverability, abstractness, and phonotactic constraints",in D. Goyvaerts (ed.) Phonology in the 1980's, Gent: Story-Scientia, pp. 469-481, (1981). Kaye, J.D., "La selection des formes pronominales en vata et en d'autres langues kru", Revue Quebecoise de Linguistique 11:117-134, (1981). Kaye, J.D., "La selection des formes pronominales en vata et en d'autres langues kru", Cahiers Ivoiriens de Recherche Linguistique 9: 5-24, (1981). Kaye, J.D., "Comments on the role of the evaluation metric in the acquisition of phonology", in Baker, C.L. and J.J. McCarthy (edd.), The Logical Problem of Language Acquisition, MIT Press: Cambridge, pp. 249-256 (1981). Kaye, J.D., "Implosives as liquids", Studies in African Linguistics, Suppl. #8, 7881,(1981). Kaye, J.D. and M. Charette, "Tone sensitive rules in Dida", Studies in African Linguistics, Suppl. #8, 82-85, (1981). Kaye, J.D., and M. Charette, "Tone sensitive rules in Dida", Cahiers Ivoiriens de Recherche Lingistique 9: 95-103, (1981). Kaye, J.D., "Les diphtongues cachees du vata", Studies in African Linguistics 12: 225-244,(1981). Kaye, J.D., "Les diphtongues cachees du vata", Cahiers Ivoiriens de Recherche Linguistique 9: 69-94, (1981). Kaye, J.D., Doua, B.S., Dugas, A. et Η. Koopman, "Petit lexique de la langue vata", Cahiers Ivoiriens de Recherche Linguistique, 9: 103-125, (1981). Kaye, J.D. and Y.-C. Morin, "The syntactic bases for French liaison", Journal of Linguistics 18: 291-330, (1982). Kaye, J.D., "Les dialectes dida", dans Kaye, Koopman et Sportiche, pp. 233-239, (1982). Kaye, J.D., "Les traits phonologiques", dans Kaye, Koopman et Sportiche, pp. 240-246, (1982). Kaye, J.D., "Harmony processes in Vata", in van der Hulst, H. and N. Smith (edd.) The Structure of Phonological Representations (Part 2), Foris: Dordrecht, pp. 385-452, (1982). Kaye, J.D. et J. Lowenstamm, "De la syllabicate", dans Dell, F., Hirst, D. et J.-R. Vergnaud (red) Forme sonore du langage, Hermann: Paris, pp. 123-160, (1984). Kaye, J.D. and J. Lowenstamm, "A metrical treatment of Grassmann's law", in Berman, S., Choe, J.-W. and J. McDonough (edd.) Proceedings of the 15,h NELS Conference. GLSA, UMASS, pp. 220-233, (1985). Kaye, J.D., Lowenstamm, J. and J.-R. Vergnaud, "The internal of structure of phonological elements: a theory of charm and government," Phonology Yearbook2: 305-328,(1985).
Kaye, J.D., "On the syllable structure of certain West African languages", in D. Goyvaerts (ed.), African Linguistics: Essays in Memory of M.W.K. Semikenke, SSLS 6, John Benjamins: Amsterdam, pp. 285-308, (1985).
Kaye, J.D. and J. Lowenstamm, "Compensatory Lengthening in Tiberian Hebrew", in Wetzels, L. and E. Sezer (eds.), Studies in Compensatory Lengthening, Foris: Dordrecht, pp. 97-132, (1985).
Kaye, J.D., Ech-Chadli, M. et S. El Ayachi, "Les formes verbales de l'arabe marocain", La phonologie des langues sémitiques, Revue québécoise de linguistique 16: 61-99, (1986).
Kaye, J.D., Lowenstamm, J. and J.-R. Vergnaud, "La structure interne des éléments phonologiques: une théorie du charme et du gouvernement" (French translation of Kaye, Lowenstamm and Vergnaud, 1985), Recherches linguistiques de Vincennes 17: 109-134, (1988).
Kaye, J.D., "Government in Phonology: The case of Moroccan Arabic", The Linguistic Review 6: 131-160, "1986/1987" (1990).
Kaye, J.D., "Coda licensing", Phonology 7.2: 301-330, (1990).
Kaye, J.D., Lowenstamm, J. and J.-R. Vergnaud, "Konstituentenstruktur und Rektion in der Phonologie" (German translation of Kaye, Lowenstamm and Vergnaud, 1990, modified), in M. Prinzhorn (ed.), Phonologie, Wiesbaden: Westdeutscher Verlag, pp. 31-75, (1990).
Kaye, J.D., Lowenstamm, J. and J.-R. Vergnaud, "Constituent structure and government in phonology", Phonology 7.2: 193-231, (1990).
Kaye, J.D., "What ever happened to Dialect B?", in Mascaró, J. and M. Nespor (eds.), Grammar in Progress: GLOW Essays for Henk van Riemsdijk, Foris: Dordrecht, pp. 259-263, (1990).
Kaye, J.D. and E. Dresher, "A computer learning model for metrical phonology", Cognition 34: 137-195, (1990).
Kaye, J.D. and J. Harris, "A tale of two cities: London glottaling and New York City tapping", The Linguistic Review 7: 251-274, (1990).
Kaye, J.D., "The strange vowel sets of Charm theory: the question from top to bottom", Journal of Linguistics 26: 175-181, (1990).
Kaye, J.D., "On the interaction of theories of Lexical Phonology and theories of phonological phenomena", in Dressler, W.U., Luschützky, H.C., Pfeiffer, O.E. and J.R. Rennison (eds.), Phonologica 1988, Cambridge University Press, pp. 141-155, (1992).
Kaye, J.D., "Do you believe in magic? The story of S + C sequences", SOAS Working Papers in Phonetics and Linguistics 2: 293-314, (1992).
Kaye, J.D., "Derivations and interfaces", SOAS Working Papers in Phonetics and Linguistics 3: 90-126, (1993).
Kaye, J.D. and E. Gussmann, "Polish notes from a Dubrovnik café", SOAS Working Papers in Phonetics and Linguistics 3: 427-462, (1993).
Kaye, J.D., "Commentary on Daelemans, Gillis, and Durieux ["The acquisition of stress: a data-oriented approach"]", Computational Linguistics 20: 453. (1994). Kaye, J.D., "Derivations and interfaces", in Durand, Jacques and Francis Katamba (eds.), Frontiers of Phonology, Longman Linguistics Library, London, pp. 289-332,(1995). Kaye, J.D., "Do you believe in magic? The story of s+C sequences", in Kardela, Henryk and Bogdan Szymanek (eds.), A festschrift for Edmund Gussmann, Lublin, Catholic University of Lublin, pp. 155-176, (1996). Kaye, J.D., "Why this article is not about the acquisition of phonology", SOAS Working Papers in Linguistics & Phonetics 7: 209-220, (1997). Kaye, J.D., "Working with licensing constraints" to appear in DziubalskaKolaczyk, Katarzyna. (ed). Constraints and Preferences. A volume in the series Trends in Linguistics, series editor Walter Bisang. Berlin: Mouton de Gruyter. Kaye, J.D., (Translated into Chinese by Ma Qiuwu) "A fresh look at Putonghua onset-rime pairs", Manuscript. Kaye, J.D., "A users' guide to Government Phonology". Manuscript (incomplete). Kaye, J.D., "A short theory about tones". Manuscript.
Instead of an introduction

Cassimjee and Kisseberth (p. 203) present a basic description and analysis of the tonal system of the Eerati dialect of Emakhuwa (northern Mozambique), positing a number of constraints to account for both the Eerati and other Emakhuwa tone systems.

Charette (p. 465) reanalyses the p-licensing of empty nuclei in French in contexts where they precede a p-licensed pseudo-empty nucleus. She concludes that a proper governor deploys its full governing potential in governing an empty position, but not in governing a pseudo-empty position.

Cobb (p. 223) accounts for the relationship of "height" to "ATR" harmony in the vowel harmony processes of Yoruba and Natal Portuguese with a modified version of head licensing, combined with the Complexity Constraint.

Cristofaro-Silva (p. 243) analyses the palatalisation of the alveolar stops /t, d/ in various dialects of Brazilian Portuguese as the spreading of the element I, regulated by constraints on spreading within licensed domains.

Cyran (p. 303) reviews the evidence for the "branching onset" and the "two onset" analyses of Polish clusters like [kl, br, tr] and concludes that there is no compelling evidence for the former structure, even though no detailed analysis using the second structure is yet available.

Denwood (p. 543) explores the potential of a four-position template, as previously proposed for Chinese, for the description of Khalkha Mongolian, and concludes that it accounts for the attested consonant clusters.

Dresher (p. 7) argues for a "treasure hunt" model of language acquisition with innate cues for the setting of individual parameters within
Universal Grammar and contrasts this with other current acquisition models.

Goh (p. 563) traces the history of the diminutive retroflex suffix [ɚ] in Beijing Mandarin from an independent word [ɚ] 'child' or 'smallness' and argues that this suffix is not analytic, but rather still a minimal phonological word in the present-day language.

Guerssel (p. 581) proposes that the failure of ablaut in Classical Arabic verbs containing a guttural as their second or third consonant is not the result of a phonological process, but rather is dictated by the principles of non-concatenative morphology.

Gussmann (p. 321) considers the evidence supporting the existence of branching onsets in Modern Icelandic and draws the conclusion that only obstruents followed by /r/ and perhaps Μ might constitute branching onsets.

Heo (p. 481) discusses two cases in which a domain-final empty nucleus that is expected to be p-licensed in Korean resists being licensed and accounts for them with the necessity or optionality of licensing the preceding consonant.

van der Hulst (p. 75) argues for a distinction between two independent areas of phonology: lexical and postlexical. The former applies to units that are stored or processed in the lexicon (up to the maximal domain that the lexicon provides) and the latter to units constructed in the syntax.

Kenstowicz, Abu-Mansour and Törkenczy (p. 259) review the analysis of voicing in a large number of languages and reanalyse Hungarian voicing with a phonetically motivated revised Laryngeal Licensing Constraint: "The feature [voice] is licensed in contexts of salient release."
Kim (p. 497) considers the distributional constraints on consonants before a licensed domain-final empty nucleus in Korean and accounts for them with the limited licensing potential of licensed empty nuclei, which results in "unreleasing".

Lowenstamm (p. 339) analyses muta cum liquidā consonant sequences as contour segments and thereby accounts for syllable structure paradoxes in a wide range of languages. He argues that such sequences therefore do not constitute an argument for branching onsets.

Morin (p. 385) summarises the literature on the various types of French liaison and goes on to propose a new solution for one subclass, whereby the prenominal liaison consonants [t, z, n, r] and [g] are "status prefixes" on the following (vowel-initial) noun.

Nikiema (p. 365) argues for word-initial empty-headed syllables, which he terms "defective syllables", to account for both the distribution of the /il/ vs. /lo/ variants of the masculine definite article and raddoppiamento in Italian.

Piggott (p. 401) examines the syllable structure of Selayarese (and other "Prince" languages) and proposes that the coda position (and the phonological features found in that position) is licensed by general principles of phonology rather than by a following non-coda position.

Ploch (p. 149) charges a range of current phonological theories with "unscientificness" as a result of the over-application of Occam's razor and non-arbitrariness. He argues that Popper's criterion of testability will both restore the scientific status of phonology and make the two former principles derivable and therefore superfluous.

Rennison and Neubarth (p. 95) present a comprehensive overview of a new CV-type theory of Government Phonology, in which the CV syllable (termed "syll") is the only constituent type, and all governing and licensing relations are established at the syll level.
Rice (p. 427) adduces evidence from Ahtna (Athapaskan) for claiming that there are two distinct types of word-final consonant (codas vs. onsets) and, in one variety, two types of empty nucleus (which respectively license or fail to license marked laryngeal features).

Ritter (p. 29) combines innate principles of Universal Grammar with general cognitive functions in a head/dependent model of language acquisition. This she demonstrates with the order in which syllable structure and prosodic structure are acquired by children.

Rowicka (p. 511) analyses the "trapped" (i.e. interconsonantal) [r] of Polish, claiming that it can function as syllabic in phonotactic patterns, while at the same time being metrically "weightless". In this way she accounts for previously problematic consonant clusters.

Scheer (p. 283) correlates the absence of stops at certain places of articulation with classic spirantisation processes like Grimm's law, and relates both phenomena to the presence or absence of a glottal friction feature in the phonological representations of the consonants concerned.

Vergnaud (p. 599) demonstrates that the correspondence between metrical grids and beats is inherent in the formal definition of the metrical grid. His formal account involves a model of metrical structure which relies upon the notion of "occurrences" of units.

Williams (p. 55) takes up the issue of the class of languages generated by the theory of Government Phonology and shows that the Projection Principle and the Uniformity Principle largely achieve the goals of formal restrictiveness and explanatory adequacy at the same time.

Shohei Yoshida (p. 527) describes the range of phonetic variants of the "syllabic nasal" of Japanese and interprets it within Government Phonology as a sequence of an onset and a nucleus with a floating nasal element, thus accounting for both its vocalic and consonantal realisations.
Yuko Yoshida (p. 449) shows how the filling of final nuclei in Japanese borrowings from Chinese with an epenthetic U-element (producing /N/, the so-called "moraic nasal") in fact echoes a licensing constraint of Ancient Chinese.
Meno's paradox and the acquisition of grammar*

B. Elan Dresher

From our point of view as English speakers, a language such as Chinese might seem totally different from our own. In fact, these two languages as well as all other human languages are nearly identical. The differences that seem all important to us are relatively minor. (Jonathan Kaye 1989: 54)
1 Plato's problem and Meno's paradox
In Plato's dialogue The Meno, Meno doubts that one can investigate what one does not know. Which of the things you do not know, he asks, will you propose as the object of your search? Even if you stumble across it, how will you know it is the thing you did not know? Socrates replies that there is a way out of this paradox: we can investigate what we do not know because, at some level, we already know everything; what we call learning is but recollection. He goes on to demonstrate the truth of this astonishing claim by showing that an ignorant slave boy actually knows the Pythagorean theorem, even though the boy does not know that he knows it, and in fact does not seem to know it until Socrates leads him through a series of questions about it. The implication is that Meno's paradox would indeed make learning impossible, unless we assume that we have knowledge from some source other than experience in this life. More recently, Noam Chomsky has observed that the problem of how we come to know things remains with us. He has named this Plato's problem (Chomsky 1988: 4). In the words of Bertrand Russell (Russell 1948: 5), the problem is this: "How comes it that human beings, whose contacts with the world are brief and personal and limited, are able to know as much as they do know?" As Chomsky has shown over the years, this problem arises sharply in the case of language. For it can be shown that everybody is like Meno's slave boy, in that they know many things about their native language that they
don't know they know; moreover, these are things they were never taught; nor does it appear that they could have had any other experience, in this life, that could suffice to account for their knowledge.

2 Some examples of Plato's problem in language
To give a brief example (Chomsky 1975: 30-35): every native speaker of English knows how to form yes/no questions. Given a declarative sentence like (1a), the corresponding question is (1b); similarly, the corresponding question to (2a) is (2b):

(1) a. The boy is tall.
    b. Is the boy tall?

(2) a. Mary has been swimming.
    b. Has Mary been swimming?
It is clear that every speaker of English can do this for any declarative sentence, so this ability is not just a matter of memorising a long list of questions. Rather, every speaker of English must have at some point acquired a general rule for creating such questions. Some aspects of this rule must be based on experience, since not every language forms questions in the same way. We might imagine a child, on the basis of being exposed to simple questions of the sort mentioned earlier, (unconsciously) formulating a rule such as the following: to form a yes/no question, move the first auxiliary verb of the corresponding declarative to the front of the sentence. (This formulation presupposes that the learner has figured out what auxiliary verbs are.) The rule we have given, though, is not exactly correct. When we consider a more complex example, we see that we do not always move the first auxiliary verb to the front. Consider (3a). Applying our rule, we would move the first auxiliary verb was to the front, deriving the incorrect question (3b). This is of course totally ill-formed. What every speaker of English knows is that the correct question in this case is (3c). That is, one has to skip the first auxiliary verb was,
which is in the subordinate clause who was in the room, and pick out the auxiliary verb of the main clause, which is is, and move that to the front. So the rule is that we must move the first auxiliary verb of the main clause to the front, not the first auxiliary verb in the sentence.
a. The man who was in the room is tall. b. *Was the man who in the room is tall? c. Is the man who was in the room tall?
If children learn this rule by observation, through hypothesis formation or trial and error, we might expect to find a learning sequence such as the following: on the basis of simple sentences, children arrive at the simple rule, "Move the first auxiliary verb to the front." Later on, when they start producing more complex sentences with subordinate clauses, we expect them to make mistakes, where they move the wrong verb to the front. Observations of children, and experiments designed to test how they in fact deal with such cases, reveal, to the contrary, that children never make such mistakes. They make all sorts of other mistakes, but they never try to move a verb from a subordinate clause rather than a main clause (Crain and Nakayama 1987). It follows, then, that children never entertain what appears to be the simplest hypothesis. Rather, they appear to know from the outset that rules of grammar that move elements around are sensitive to clause structure. Here, then, is an example of Plato's problem, and Chomsky proposes a version of Plato's solution: the knowledge that rules of grammar are sensitive to structure is not gained in this lifetime, but sometime before, and is a part of our genetic inheritance. Supporting this idea is the further observation that the principle of sensitivity to structure appears to be universal, true of every human language. Such cases are not limited to syntax, but arise in all aspects of language acquisition. Consider how one learns the meaning of words. In Word and Object (1960: 29-54), W.V.O. Quine made up a story about field linguists studying a completely unfamiliar language. They observe that a native says gavagai when a rabbit runs past, and guess that gavagai means 'rabbit'. But Quine observes that there are many
other possibilities. Gavagai could be a whole sentence, such as 'Lo, there goes a rabbit!'. And even if the linguists are able to learn enough of the language to determine that gavagai is a word, not a sentence, Quine points out that it will still not be certain that gavagai means 'rabbit'. For it might mean 'temporal rabbit stages'; or 'all and sundry undetached rabbit parts'; or it might refer to 'that single though discontinuous portion of the spatio-temporal world that consists of rabbits'; or 'rabbithood'; or 'a momentary leporiform image'; and so on. Quine observes that all these possible meanings, and infinitely many more, are very hard to distinguish from each other and from plain 'rabbit', in all kinds of tests and situations — i.e., most things which are true of 'rabbit' are true of 'undetached rabbit parts'. So if we ask a native speaker, "Is this [pointing to something] an example of gavagai?" the answer will be "yes" or "no" (or as Quine suggests, evet or yok) whether gavagai means 'rabbit' or some other of the above candidate interpretations. Now, it is interesting to notice that linguists or travellers do not in actual practice encounter the gavagai problem when learning an unknown language; nor do children, since they arrive at the same meanings of words as their parents, siblings, and other members of the same speech community. How can this be? It must be the case that everybody draws similar conclusions about what a word means, and what a likely meaning is. That is, the odds are that gavagai means 'rabbit' and not 'a momentary leporiform image'. As with syntax, children do make many mistakes in the course of learning what words mean. For example, they may overgeneralise or undergeneralise the meanings of words. They might, for example, suppose that the word dog applies also to horses and other such animals. But again, the misgeneralisations children actually make are very limited in comparison with the ones it is possible to make. For example, when a dog enters the room and someone says "Dog," no child supposes that dog means 'animal viewed from the front', or 'animal that has just entered a room'. Again, it appears that learners are constrained to entertain only a small subset of the conceivable hypotheses. The gavagai problem does arise, however, when we try to decode the meanings of the vocalisations of other species. For example,
Cheney and Seyfarth (1990), after much observation of vervet monkeys, are unable to decide if their leopard call means 'Behold! a leopard!', or 'Run into the trees!', or a range of other possibilities. It is unlikely that the language of vervet monkeys is so much more complex than our own; vervet monkeys do not do well at learning English yes/no questions, for example. But lacking the appropriate built-in constraints, we are at a loss to arrive at the correct answer, which must be obvious to any vervet monkey.

Acquisition of phonology, the sound system of a language, is also not just a question of memorising sounds, but involves learning patterns. Consider how words are stressed in English, for example. English stress, though complex and subject to numerous exceptions and special cases, follows certain patterns that we have come to know at some unconscious level. The reality of these patterns is made manifest when we borrow into the language a word that does not observe them. For example, the Russian word bábushka is more usually pronounced by English speakers as babúshka. Similarly, the capital of Finland, Hélsinki, is frequently pronounced Helsínki in English. The shift in stress brings these words into conformity with English stress patterns.

If stress patterns of languages all involved simple generalisations, we could think that it would suffice to learn them from experience. But some extremely complex generalisations are quite common cross-linguistically, whereas myriad others of lesser or comparable complexity that one could imagine do not occur. For example, here is the rule for assigning main stress in Passamaquoddy, an Algonquian language spoken in Maine and New Brunswick (Stowell 1979, based on data from Philip LeSourd):
(4) a. If the penultimate syllable of a word has a full vowel, stress it.
    b. Otherwise, if the antepenultimate syllable has a full vowel, stress it.
    c. Otherwise (if neither the penult nor antepenult has a full vowel), stress whichever one of these two syllables is separated by an even number of syllables from the last preceding syllable that has a full vowel, or — if there is no full vowel — from the beginning of the word.

(A procedural sketch of this rule is given at the end of this section.) This type of stress system is not at all rare; many languages have rules that are similar, with minor variations. As before, we seem led to the conclusion that the learner is constrained to look for particular types of patterns.

The idea that language learners must be benefiting from innate direction of some kind is supported further when we consider under what conditions language learners (here we are talking about small children) must master these complex and subtle linguistic generalisations. Unlike linguists, who can bring to bear on these questions all sorts of evidence, children must learn the rules of their language from exposure to whatever examples come their way, without any explicit discussion of what the rules are, or any systematic ordering of examples.

Imagine trying to learn how to play chess by observing other people play, where nobody tells you explicitly what the rules are for moving a knight or for taking a pawn en passant; suppose moreover that some small percentage of the moves you observed are actually illegal, but unnoticed, hence unremarked, and that many games are broken off or left incomplete for a variety of reasons, usually unstated; suppose also that grown-ups let you make many illegal moves without correcting you, because you are little and don't know better; and now suppose that the game you are learning is not chess but one that is many times more difficult. How much more difficult? Recall that a few years ago a computer beat the highest-ranked chess player in history; but no computer can speak a language in a way that can even be compared to the most inarticulate human speaker, nor are there any prospects of creating such a program in the foreseeable future.
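As a concrete illustration, rule (4) can be rendered as a small procedure (in Python). The encoding of a word as a list of booleans (one per syllable, marking full vowels), and the convention of counting the syllables strictly between two positions for "separated by an even number of syllables", are simplifications adopted just for this sketch, not part of the original analysis.

    # Sketch of the Passamaquoddy main-stress rule in (4).
    # A word is a list of booleans, one per syllable:
    # True = full vowel, False = reduced vowel (a simplifying assumption).

    def main_stress(full):
        """Return the index of the syllable receiving main stress."""
        n = len(full)
        penult, antepenult = n - 2, n - 3
        # (4a) stress a full-vowel penult
        if penult >= 0 and full[penult]:
            return penult
        # (4b) otherwise stress a full-vowel antepenult
        if antepenult >= 0 and full[antepenult]:
            return antepenult
        # (4c) otherwise find the last full vowel before the antepenult
        # (anchor = -1 stands for the beginning of the word) ...
        anchor = next((i for i in range(antepenult - 1, -1, -1) if full[i]), -1)
        # ... and stress whichever of penult/antepenult is separated
        # from it by an even number of intervening syllables.
        for candidate in (penult, antepenult):
            if candidate >= 0 and (candidate - anchor - 1) % 2 == 0:
                return candidate
        return 0  # monosyllables

    # Four reduced syllables: (4c) counts from the beginning of the word.
    print(main_stress([False, False, False, False]))  # -> 2 (the penult)

Even in this toy form, the rule is easy to state once notions like "penult" and "full vowel" are available; it is far harder to see how a learner could stumble on such a generalisation by unconstrained induction.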
3 Parameters in Universal Grammar
The above remarks, then, are intended to persuade you that Plato's problem arises with great force in the acquisition of language: people know things about their language that seem to go beyond their experience. It follows, then, that their knowledge of these things does not come from experience, but from their own minds. Following Chomsky, let us call these innate cognitive principles, whatever they are, Universal Grammar. What these principles are is an open question that is still far from solved; but the hypothesis that there is a Universal Grammar is, so far, the only hypothesis that offers any hope of solving Plato's problem in the domain of language.

Assuming that grammar is universal, though, presents us with a new problem, and that is to explain why languages are not all the same. The most striking way in which languages differ, of course, is in how sounds are paired with meaning to form words. A tree is denoted by the word tree in English, but by arbre in French and by Baum in German. This is the phenomenon called the arbitrariness of the sign by de Saussure; that is, there is no inherent connection between the meaning of a word and its sound. It follows that the words of each language have to be learned by experience. Fortunately, Plato's problem does not arise here: English children have lots of opportunities to observe that the word for tree is tree.

Apart from vocabulary, there are other sources of cross-language variation that must be accounted for. For example, in some languages a verb precedes its object, while in others the verb follows its object. In some languages, like Italian, a subject is optional, whereas in others, like English, it is obligatory, even if it has no semantic role, as in the sentence It's raining. In Passamaquoddy, the rules of stress assignment distinguish between full vowels and reduced vowels; in other languages, similar distinctions are drawn between long vowels and short vowels, or between open syllables and closed syllables. These cross-language differences have the flavour of being variations on a theme. To account for such variation, Chomsky (1981b) has proposed that the principles of Universal Grammar are not rigidly fixed, but allow for parametric variation. These parameters are like open variables in a formula, whose value is to be set on the basis of
experience. The possible values of a parameter are limited and given in advance, like choices on a menu that allows very limited substitutions. On this view, then, language acquisition is reduced to setting parameters to their appropriate values.

4 Parameter setting
Parameter setting is a much more manageable learning problem than open-ended induction, or hypothesis formation and testing. Most research into Universal Grammar has quite appropriately concentrated on trying to figure out what the principles and parameters are. There is not much we can say about the learning problem until we know what it is that has to be learned. Nevertheless, a small subfield devoted to parameter setting has arisen in the last fifteen years which has begun to consider different models of how learners might go about setting parameters. The problem has turned out to be deceptively difficult, much more than one might have thought. On the face of it, parameter setting would appear to be a simple matter. Suppose there is a principle of Universal Grammar that says that in each language the subject of a sentence may either precede or follow the verb — the learner must decide which it is. In English, the subject precedes the verb, as in the sentence John kicked the ball. So one might think that a learner just has to hear a few simple sentences like this to know how to set this parameter. But it's not that simple. The parameter fixes what we could call the canonical, or basic, word order of a sentence; however, in many languages the basic order can be disturbed by other rules that move elements around. English has constructions in which the subject follows the verb, as in Hello, said Peter to his friend. Further, in order to set the parameter correctly, we have to know what the subject and object in any given sentence are, but it is not obvious that a learner always knows this. In an imperative sentence like Watch yourself! there is no overt subject; from the meaning, a language learner might mistakenly conclude that the subject is yourself, which follows the verb. Cases of this kind occur in other languages. In French, for example, the object follows the verb, as in Paul voit la table. But when the
object is a pronoun, it appears before the verb: Paul la voit. The explanation for this, as we understand it, lies in the fact that French pronouns are clitics, and clitics have special positions that are not the canonical ones. However, a learner might not know that, and be misled by this example. In Dutch and German, it can be shown that the verb follows its object in the basic word order. However, this basic order can be observed mainly in subordinate clauses. In main clauses, the basic order is disturbed by a principle that requires the verb to occur in second position in the sentence.

Jonathan Kaye and I worked on a computational model for learning how to assign stress to words. We found that the relation between a parameter and what it does is rather indirect, due to the fact that there are many parameters, and they interact in complex ways. For example, in English main stress is tied to the right edge of the word. But that doesn't mean that stress is always on the last syllable, as in chandelier. It could be on the penultimate syllable, as in Manitoba, or even on the first syllable, as in Cánada. How can Cánada be an example of stress on the right? In English, stress is assigned not to individual syllables, but to groupings of syllables called feet. An English foot is a trochee, consisting of a stressed syllable followed by an optional unstressed syllable. Other languages have other kinds of feet, depending on how the foot parameter is set. But then why are words like Cánada and álgebra stressed on the third to last syllable and not on the second to last? That is due to another parameter to the effect that the last syllable may be ignored in assigning stress. Why, then, are some words stressed on the second to last syllable, like aroma and agenda? That is due to their different syllable structures. In English, syllables with long vowels or closed by a consonant are considered to be heavy, like full vowels in Passamaquoddy, and heavy syllables tend to attract stress. A heavy syllable can be a foot by itself. Thus, words that have a heavy penultimate syllable have stress on that syllable, as in aroma, Helsinki, babúshka; words with a light penult have stress on the antepenult, as in Cánada, álgebra.

The point of all this is that the stress patterns of any language are the result of a number of interacting parameters. This interaction
makes the relationship between a parameter and its effects nontransparent. Some surprising consequences follow from this fact. The first one is that learners who have some incorrectly set parameters might know that something is wrong, but might not know which parameter is the source of the problem. Suppose, for example, that a learner of English mistakenly thinks that objects must precede the verb. The learner observes the sentence John kicked the ball. According to the learner's developing grammar, that should have been John the ball kicked. So the learner realises that something is wrong with the grammar: one or more parameters are set to the wrong values. But which ones? It could be the parameter that says whether the object should precede or follow the verb. But it could be something else entirely. Maybe English is like Dutch in requiring the verb to move to second position. Maybe that's the parameter that has to be adjusted, and not the word order parameter. This is known as the credit problem: a learner cannot reliably assign credit or blame to individual parameters when something is wrong.

There is a second way in which parameters can pose problems to a learner, somewhat reminiscent of Meno's paradox. When we talk about specific parameters, we presumably know exactly what they do, and we assume that learners ought to know what they do, also. But it is not obvious that this is the case, and indeed, it looks quite certain that this cannot always be the case. For some parameters are stated in terms of abstract entities and theory-internal concepts which the learner may not initially be able to identify. For example, the theory of stress is couched in terms of concepts such as heavy syllables, heads, feet, and so on. In syntax, various parameters have been posited that refer specifically to anaphors, or to functional projections of various types. These entities do not come labelled as such in the input, but must themselves be constructed by the learner. So, to echo Meno, how can learners determine if main stress falls on the first or last foot if they don't know what a foot is, or how to identify one? How can learners set parameters that control where anaphors can appear when they don't know which parts of the data represent anaphors? And if they happen to stumble across an anaphor, how will they know that that's what it is? This can be called the epistemological problem.
5. Some parameter setting learning models
To summarise, to get out of Meno's paradox and to solve Plato's problem, we posited a theory of Universal Grammar with a set of open parameters. By doing so, we limit the role of experience to parameter setting. But now we have found that the same problems arise even in this limited domain. How do we solve them? Different learning algorithms have taken radically different tacks in trying to deal with the credit problem and the epistemological problem.
5.1. A cue-based learner (Dresher and Kaye 1990)
Our proposal is to put even more into the mind: not only are the principles and parameters of Universal Grammar innate, but learners must be born with some kind of a road map that guides them in setting the parameters. Some ingredients of this road map are as follows. First, Universal Grammar associates every parameter with a cue, something in the data that signals the learner how that parameter is to be set. The cue might be a pattern that the learner must look for, or simply the presence of some element in a particular context. Second, parameter setting proceeds in a (partial) order set by Universal Grammar: this ordering specifies a learning path (Lightfoot 1989). The setting of a parameter later on the learning path depends on the results of earlier ones. Hence, cues can become increasingly abstract and grammar internal the further along the learning path they are.

For example, in learning stress patterns, we suppose that children are able to recognise relative stress levels, and can tell if words with different types of syllables have the same stress patterns or different patterns. Such a developmental stage may have representations as in Figure 1. In these simple representations of English stress contours, each syllable is represented by S, and the height of the column of x's indicates the relative perceived stress level of each syllable.
a. América               b. Manitoba                c. agénda

       x                                x                     x        Line 2
       x                     x          x                     x        Line 1
   x   x   x   x             x    x     x    x            x   x    x   Line 0
   S   S   S   S             S    S     S    S            S   S    S   Syllables
   A   me  ri  ca            Ma   ni    to:  ba           a   gen  da

Figure 1. Representations of English stress contours before setting metrical parameters
As learners acquire more of the system, their representations become more sophisticated, and they are able to build on what they have already learned to set more parameters, eventually acquiring the representations in Figure 2.

a. América               b. Manitoba                c. agénda

       x                                x                     x        Line 2
      (x)                    (x         x)                   (x)       Line 1
   x  (x   x)  ()            (x   x)   (x)   ()           x  (x)  ()   Line 0
   L   L   L   L             L    L     H    L            L   H    L   Syllables
   A   me  ri  ca            Ma   ni    to:  ba           a   gen  da

Figure 2. Acquired representations
In these representations, the undifferentiated S has been replaced by L, which indicates a light syllable, and H, which stands for a heavy syllable. Scanning from right to left, syllables have been grouped into binary left-headed (trochaic) feet, and final syllables are marked as extrametrical (). As an example, consider (a), the word America. The learner has found that the word consists of four light syllables. On line 0, metrical structure is constructed as follows: start at the right end of the word, and skip the last syllable. Then group the preceding two syllables into a foot. Since there is only one light syllable left over, it does not get put into a foot. In (b), Manitoba, we skip the last syllable. Since the second to last syllable is heavy, we do not group it with the preceding syllable but it makes a foot by itself. The two light syllables of Mani are grouped into a second foot.
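For concreteness, the line-0 construction just walked through can be stated as a small procedure. The sketch below is an editorial illustration in Python, assuming the English settings (quantity sensitivity, an extrametrical final syllable, binary trochees built from right to left); the function name and the L/H encoding are mine, not the model's, and edge cases are simplified.

    def line0_feet(syllables):
        # syllables: list of 'L' (light) or 'H' (heavy), e.g.
        # ['L', 'L', 'H', 'L'] for Manitoba.
        feet = []
        i = len(syllables) - 2            # skip the extrametrical final syllable
        while i >= 0:
            if syllables[i] == 'H':       # a heavy syllable is a foot by itself
                feet.append((i, i))
                i -= 1
            elif i > 0 and syllables[i - 1] == 'L':
                feet.append((i - 1, i))   # two light syllables form a trochee
                i -= 2
            else:
                i -= 1                    # a lone leftover light stays unfooted
        return sorted(feet)

    # America: A (me ri), final 'ca' extrametrical; Manitoba: (Ma ni)(to:), final 'ba' extrametrical
    print(line0_feet(['L', 'L', 'L', 'L']))   # [(1, 2)]
    print(line0_feet(['L', 'L', 'H', 'L']))   # [(0, 1), (2, 2)]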
An x on line 1 indicates a stress, which is the head of a foot. Since English feet have their heads on the left, in (a) the x goes on the second syllable. In (b) there are two feet, (Ma ni) and (to:), and they are grouped together on line 1. Which foot is stronger? In English, main stress goes on the rightmost foot, so on line 2 we put an x over the third syllable in Manitoba — this is the main stress in the word.

In the learning model proposed by Dresher and Kaye (1990), the representations of Figure 1 are transformed gradually into those of Figure 2, as learners set the metrical parameters that generate these representations. In this learning algorithm, called YOUPIE and implemented on a computer in PROLOG, the learner puts these representations together in the order given in (5). In (6) I list what each parameter is, and what cues the learner looks for to set these parameters.

(5) Order in which parameters must be set

1. Syllable Quantity: Establishes whether feet are quantity insensitive (default, henceforth "QI") or quantity sensitive ("QS") (and type of QS).
2. Extrametricality: Establishes edge of domain; can only exclude it at this point.
3. Foot size: If QI, only bounded feet available; if QS, unbounded is default.
4. Main stress: Depends on correct settings of all the above.
5a. Headedness: Sometimes depends on having set main stress.
5b. Directionality: Cannot be determined apart from headedness.
6. Destressing: Determined by comparing stresses predicted by above parameter settings with actual stresses.
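The ordering in (5) lends itself to a very simple control structure: a fixed loop over parameters, each consulting its own cue, with later cues free to inspect earlier settings. The fragment below is a schematic illustration in Python (YOUPIE itself was written in PROLOG); every name in it is invented for the example.

    LEARNING_PATH = [
        "quantity",          # 1. QI (default) vs. QS
        "extrametricality",  # 2. is an edge syllable ignored?
        "foot_size",         # 3. bounded vs. unbounded
        "main_stress",       # 4. left vs. right edge
        "headedness",        # 5a/5b. set together with directionality
        "destressing",       # 6. compare predicted and observed stresses
    ]

    def learn(data, cues, defaults):
        # data: observed words with their stress contours.
        # cues[param]: function (data, settings) -> value, or None if silent.
        # defaults[param]: the unmarked value kept when the cue stays silent.
        settings = {}
        for param in LEARNING_PATH:
            value = cues[param](data, settings)  # a cue may use earlier settings
            settings[param] = value if value is not None else defaults[param]
        return settings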
(6) Parameters and cues

1. Syllable Quantity
   a. Parameter: The language {does not/does} distinguish between light and heavy syllables (a heavy syllable may not be a dependent in a foot).
   b. Cue: Words of n syllables with conflicting stress contours indicate QS.
2. Extrametricality
   a. Parameters: A syllable on the {right/left} {is not/is} extrametrical.
   b. Cue: Stress on a peripheral syllable rules out extrametricality on that side.
3. Bounded constituent construction
   a. Parameter: Line 0 constituents are bounded.
   b. Cue: The presence of a stressed non-edge L indicates bounded constituents.
4. Main stress
   a. Parameter: Project the {left/right}-most element of the line 1 constituent.
   b. Cue: Scan a constituent-sized window at the edge of a word. Main stress should consistently appear in either the left or right window.
5. Headedness and directionality of feet
   a. Parameters: {Left/right}-headed feet are constructed from the {left/right}.
   b. Cue: Scanning from the {left/right}, a light syllable {following/preceding} any other syllable must be unstressed.
   c. Example: Scanning from the left, if for all (X L), L is unstressed, then direction = Left, headedness = Left. If for all (L X), L is unstressed, then headedness = Right.
6. Destressing (conflates a number of separate parameters)
   a. Parameters: {Various types of} feet are destressed in {various situations}.
   b. Main Cue: The absence of stress on a foot.
   c. Example: The lack of stress on the first syllable of agenda, with intermediate acquired foot structure (a)(gen), shows that this foot is destressed (further cues reveal the conditions under which this occurs).

Space does not allow us to go through all of these parameters and cues in detail, but we can look at the first one to get some sense of how this works. The first parameter the learner tries to set is syllable quantity: does the language treat all syllables the same with respect to stress, or is there a distinction between heavy and light syllables? If all syllables are equal, then we say that the stress system is quantity insensitive, or QI. If the quantity or weight of a syllable is important, then the stress system is quantity sensitive, or QS. How can a learner set this parameter? Even if you don't know anything about the stress system, you can still keep track of how many syllables words have and where stresses fall. In a language in which syllables are all treated equally, every word of n syllables should be stressed in the same way. But if stress is quantity sensitive, then heavy syllables will not be treated the same as light syllables, and words of the same length can have different stress patterns. So we propose that this is the cue for quantity sensitivity: if you find conflicting stress patterns in words of the same length, you have QS; if you do not find this, you stick with QI, which we assume is the default setting. English is QS, because words of the same length are not all stressed the same way, as we have seen. Once this parameter has been determined, the learner can use information about syllable quantity to set further parameters, and proceeding in this way, can arrive at the final representations in Figure 2. See Dresher and Kaye (1990) and Dresher (1999) for a detailed description of the cues and parameters assumed here, as well as the order in which they are set.
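The quantity-sensitivity cue, in particular, reduces to a few lines of bookkeeping: record one stress contour per word length, and report QS as soon as two words of the same length disagree. The following sketch is my own simplification, with words encoded as lists of perceived stress levels.

    def is_quantity_sensitive(words):
        # words: stress contours such as [0, 2, 0, 0] for America, one
        # number per syllable. Conflicting contours for words of the
        # same length are the cue for QS; otherwise the QI default holds.
        seen = {}                       # word length -> first contour observed
        for contour in words:
            n = len(contour)
            if n in seen and seen[n] != contour:
                return True             # conflict found: set QS
            seen.setdefault(n, contour)
        return False                    # no conflict: stick with QI

    # aroma and Canada are both three syllables but stressed differently,
    # so an English-like sample triggers QS.
    print(is_quantity_sensitive([[0, 2, 0], [2, 0, 0]]))   # True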
This approach has something of a Piagetian flavour, with later stages depending on and building on what was acquired in earlier stages; but whereas Piaget supposed that later stages are literally invented out of earlier ones, without being innately specified — a position that creates a seemingly intractable mystery (see Piattelli-Palmarini 1980) — in the view sketched here the whole sequence is innately specified. If this approach is correct, there is no parameter-independent learning algorithm.
5.2. The Triggering Learning Algorithm (Gibson and Wexler 1994)
A different approach is taken by Gibson and Wexler (1994: 409-410), who characterise what they call the Triggering Learning Algorithm (TLA) as follows:

(7) Triggering Learning Algorithm (Gibson and Wexler 1994)

"Given an initial set of values for n binary-valued parameters, the learner attempts to syntactically analyze an incoming sentence S. If S can be successfully analyzed, then the learner's hypothesis regarding the target grammar is left unchanged. If, however, the learner cannot analyze S, then the learner uniformly selects a parameter P (with probability 1/n for each parameter), changes the value associated with P, and tries to reprocess S using the new parameter value. If analysis is now possible, then the parameter value change is adopted. Otherwise, the original parameter value is retained."
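Stated procedurally, (7) amounts to a one-step update rule. The sketch below is an illustrative Python rendering; the parser `parses` is assumed to be supplied for whatever toy grammars are under study, and is not part of Gibson and Wexler's text.

    import random

    def tla_step(grammar, sentence, parses):
        # grammar: a tuple of binary parameter values, e.g. (0, 0).
        # parses(grammar, sentence) -> bool is the assumed parser.
        if parses(grammar, sentence):
            return grammar                      # success: change nothing
        i = random.randrange(len(grammar))      # pick one parameter uniformly
        flipped = tuple(v ^ 1 if j == i else v for j, v in enumerate(grammar))
        # keep the flip only if it makes the sentence parsable
        return flipped if parses(flipped, sentence) else grammar

Run over the word-order grammars of Figure 3 below, this rule reproduces the behaviour discussed in the text: a learner at (0,0) hearing V O S stays put, since neither one-flip neighbour parses it, whereas V S can move the learner to (1,0).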
We can illustrate how this learning algorithm is supposed to work by looking at the diagram in Figure 3, where each square represents a setting of two syntactic parameters. The first parameter determines whether the head of Spec X' is initial (value 1) or final (0). In this case, the head is the verb (V) and its specifier is the subject (S). The second parameter encodes whether the head of a complement is initial or final, here exemplified by the relation between a verb and its object (O). These two parameters define a space with four states.
0,0 (Source):  S V, S O V        0,1:  S V, S V O
1,0:  V S, O V S                 1,1 (Target):  V S, V O S

Figure 3. Parameter space (Spec-Head f/i, Comp-Head f/i): final = 0, initial = 1
Assume now that the target language is VOS (1,1), and that the learner's current hypothesis is SOV (0,0). Suppose the learner hears a sentence of the form VOS. This sentence is not parsable by the learner, who now determines that the current state is not correct. Even though there is only one setting of parameters that corresponds to VOS, it would take a change of both parameters for the learner to reach it. This is not allowed by the Triggering Learning Algorithm, which makes available only the two neighbouring spaces. Neither space yields the target VOS. Therefore, the learner cannot move. Thus, the sentence V O S is not a trigger to a learner at (0,0). Fortunately, in this case there is another type of sentence from the target that the learner will eventually hear, namely VS. VS is a trigger to a learner at (0,0), since there is a neighbouring space which parses it, namely (1,0). So the learner moves there. From there, a further presentation of VOS, which is a trigger to a learner at (1,0), will take the learner to the target. A learner following the Triggering Learning Algorithm does not know what any parameter does; it simply tries to match input forms by moving to a parameter space that can parse a given input form.

The Triggering Learning Algorithm runs into a number of serious problems:

1. The Triggering Learning Algorithm cannot handle parameters in subset relations.
2. The learner can fall into incorrect grammars from which it cannot escape.
3. The learner can thrash around indefinitely, repeatedly revisiting the same incorrect grammars.
4. Learning follows a sequence dictated by accidents of input data; there is no notion of development toward greater complexity.
5. The learner will be unable to match the input perfectly at early stages of acquisition, so the learning process cannot get off the ground.

To see how the Triggering Learning Algorithm would apply to more than three parameters, I generated a set of schematic languages using six metrical parameters. There were thus 2⁶ (64) languages. Each language was assigned four two-syllable words, eight three-syllable words, sixteen four-syllable words, and ten five-syllable words. Thus, each language has 38 words. Four pairs of languages are extensionally equivalent: their surface stress patterns are identical, though their grammars assign different structures. Since a learner would have no evidence to decide which grammar is correct, these languages are excluded as target grammars from the following discussion. An analysis of how the Triggering Learning Algorithm would apply to the remaining 56 languages yields the results in Table 1.

Table 1. 64-state Triggering Learning Algorithm, 6 metrical parameters (56 target states tested)

                                          Problem states
Number of targets   Safe states   Local maxima   Cul-de-sacs   Dangerous states
 2                  16            4              8             36
 2                  40            13             9              2
 2                  42            3              1             18
 2                  48            1              11             4
 2                  48            1              1             14
 2                  48            0              6             10
 2                  52            12             0              0
 6                  55            9              0              0
 8                  55            6              0              3
 2                  56            2              0              6
26                  64            0              0              0
In this table, local maxima are states (excluding the target itself) from which the learner cannot exit, and cul-de-sacs are states that do not connect to the target, though exit is possible to one or more dead-end states. A learner who arrives at any of these states is guaranteed to fail to reach the target. A dangerous state is a state that connects to a local maximum or cul-de-sac, as well as to the target. Although a learner in a dangerous state has a chance of reaching the target, success is not guaranteed. In terms of the goal of a learning theory for language, all of these states are problem states. Safe states are states that do not connect to any problem states; assuming that each triggered transition from a safe state has some probability greater than zero, arrival at the target is guaranteed in the limit. We find that for 26 target languages there are no problem states, whereas for 30 languages there are between 8 and 48 problem states. In other words, even though there are no subset relations in the data set, and all languages have the same number of words, nearly one half of the languages cannot be guaranteed to be learnable by the Triggering Learning Algorithm.
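The bookkeeping behind Table 1 can be made explicit with a reachability computation over the trigger graph. The sketch below is my own reconstruction of the definitions just given, not the code actually used; `step[g]` is assumed to hold the grammars reachable from g by one successful TLA move on some sentence of the target language.

    def classify_states(grammars, step, target):
        # grammars: the set of all parameter vectors; target: the grammar
        # to be learned; step[g]: one-move successors of g (see above).
        def reachable_from(g):
            seen, stack = {g}, [g]
            while stack:
                for n in step[stack.pop()]:
                    if n not in seen:
                        seen.add(n)
                        stack.append(n)
            return seen

        # local maxima: no exit at all (the target itself excluded)
        local_maxima = {g for g in grammars if g != target and not step[g]}
        # doomed states cannot reach the target along trigger links
        doomed = {g for g in grammars if target not in reachable_from(g)}
        cul_de_sacs = doomed - local_maxima
        # dangerous: the target is reachable, but so is some doomed state
        dangerous = {g for g in grammars - doomed
                     if reachable_from(g) & doomed} - {target}
        safe = grammars - local_maxima - cul_de_sacs - dangerous
        return safe, local_maxima, cul_de_sacs, dangerous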
5.3. A genetic algorithm (Clark and Roberts 1993)
A different type of learning algorithm was proposed by Robin Clark, and applied in Clark and Roberts (1993) in connection with the loss of V2 in French. On this model, parameter setting proceeds by way of a genetic algorithm that enacts a Darwinian competition of survival of the fittest. A learner simultaneously considers a number of competing hypotheses. Each candidate hypothesis is exposed to input which it attempts to parse. At the end of a round of parsing, the learner assesses how well each candidate did. The candidates are ranked according to their relative fitness. The fittest go on to reproduce candidates in the next generation, the least fit die out. Through successive iterations of this procedure, the candidate set presumably becomes increasingly fit, and converges toward the correct grammar. There are three main problems with this model:
1. It requires an accurate fitness measure, but none has been proposed.
2. Any such measure requires that the learning space be smooth, i.e., that closeness in surface resemblance reflects closeness in parameter space; but this assumption is incorrect.
3. As with the Triggering Learning Algorithm, the developmental sequence is dictated by the input forms encountered.

To get some idea of the problems facing this model, consider Table 2.

Table 2. Effects of parameter settings: Selkup (number and percent correct)

     Parameters         Words           Syllables        Main stress
     of 10    in %      of 8    in %    of 20    in %    of 8    in %
a.   4        40        2       25      7        35      3       37.5
b.   6        60        1       12.5    7        35      5       62.5
c.   7        70        4       50      12       60      4       50
d.   8        80        5       62.5    14       70      5       62.5
e.   9        90        5       62.5    14       70      5       62.5
f.   9        90        3       37.5    10       50      3       37.5
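For reference, one generation of the procedure described above can be sketched as follows; this is an illustrative Python fragment in which `fitness`, `mutate`, and `crossover` are placeholders for components the model would have to supply, the fitness measure being precisely the ingredient that Table 2 calls into question.

    import random

    def ga_generation(candidates, data, fitness, mutate, crossover, k=10):
        # Rank candidate grammars by how well they parse the input data,
        # let the k fittest reproduce, and return the next generation.
        ranked = sorted(candidates, key=lambda g: fitness(g, data),
                        reverse=True)
        parents = ranked[:k]                     # the fittest survive
        return [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(len(candidates))]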
I simulated a genetic algorithm attempting to learn ten metrical parameters for Selkup (Halle and Clements 1983: 189). The chart shows a sample of the results. In the first column are listed the number of parameters correct. This gives an indication of how close the grammar is to the target. In the other columns I listed a number of possible surface indicators that one might try to use to show goodness-of-fit: number of words correctly stressed (there were eight words in the test), number of syllables correctly stressed, and number of main stresses on the correct syllable. Though the grammars get progressively closer to the target as we proceed down the columns, no single surface measure reflects a monotonic improvement. This is illustrated most dramatically by grammar (f), which has only one parameter wrong and yet does worse than grammar (c), with three parameters wrong, and not much better than grammars (a) or (b). A more systematic simulation of 64 grammars using six metrical parameters confirms the general unreliability of surface measures as keys to the goodness of the grammar. In other words, closeness in extensional
space (i.e., the surface data) is unreliably correlated with closeness in intensional space (i.e., the grammar).

Conclusion

Steven Crain (1991) has compared language acquisition to a scavenger hunt: learners are given a list of things to get — a green sock, an old muffler, a banjo — and they run around looking for these things, and collect them as they find them. But if the scenario I have sketched is correct, language acquisition is not a scavenger hunt but a treasure hunt. In a treasure hunt, you have to find a sequence of numbered clues in order, where one clue leads you to the next. Clue #15, for example, might tell you to look for #16 using information you collected at #11 and #14. If you accidentally stumble on this clue before you have reached the earlier ones, it may well be meaningless or misleading to you. Like Meno's slave boy, we gradually construct the solution to the puzzle that is our native language by proceeding systematically, answering a series of questions put by our own inner Socrates.
Note

* I am very happy to dedicate this article to Jonathan Kaye: I learned so much from him, and had so much fun working with him on the problems discussed here. This research was partially supported by grants from the Social Sciences and Humanities Research Council of Canada. An expanded version of the material in section 5 can be found in Dresher (1999).
On the logical order of development in acquiring prosodic structure

Nancy A. Ritter
Introduction

Following the spirit of Kaye, Lowenstamm, and Vergnaud's (1990) seminal work, in which they utilise established syntactic principles of Universal Grammar to address issues in phonology, the purpose of this article is to support this approach and to advocate the idea of a strong parallelism between the syntactic and phonological components by claiming that a single set of innate, universal principles exists which applies to both these components (and perhaps others) of the language faculty.1 Going further, however, the point of this article is also to demonstrate that there are more general cognitive functions of the mind that play a role in the language faculty. Together, these innate principles and general cognitive functions comprise a cognitive-based, computational system of language. In the latter part of the article, emphasis is put on the construction of this cognitive system and how its construction is realised or naturally embodied in the order of acquisition of syllabic shapes.

In the approach proposed in this article, it is assumed that there is an innate predisposition to human language. With respect to phonology, one aspect of such predisposition could be understood to be the ability to attend to sounds found in natural language and to disregard other bodily noises, such as burps, sneezes, snaps et alia, as irrelevant. However, the approach advocated here also assumes that cognitive processes govern the language faculty by imposing a design on the elements (such as sounds) found in each component of language, in terms of an organisational system. This system is based on formal relationships, such as head/dependent, that are found to occur between component elements, and these relationships, in turn, depend
upon cognitive notions, such as salience and dominance. Such systematic organisation is crucial in the design and form of language. Other components of the brain also rely upon having an organisational system. For example, organisation is a crucial factor in the visual component in that fragmented visual input requires assembly into some coherent whole, which is then categorised. Such cognitive notions of organisation, precedence, dominance, etc., can be claimed to take on specific instantiations in different components of the brain and within the language faculty itself. For example, the notion of locality, in relating and grouping objects in an organised system, is formulated differently in syntax (in terms of adjacency/subjacency) than in phonology, most likely due to the nature of the objects being related. However, the core notion of there being a minimal distance between related objects (whether the relation is movement, spreading, etc.) underlies the innate principle of locality found in both components of the language faculty. If cognitive concepts are claimed to underlie the functions of the different components of the mind, then it is not far-fetched to claim that these concepts reign over Universal Grammar as well, leading to a unified set of innate principles, which, however, may take on different forms or be formulated slightly differently in keeping with the function of the specific domains of each sub-component of Universal Grammar. Therefore, it is proposed in this article that a synthesis of the two notions of predisposition and cognitive processing is deemed necessary for language.

Following this line of thinking, along with the claim that these cognitive processes are understood in terms of a computational system, this article will explore the clear relationship between a formal system and its various structural layers on the one hand, and the phases usually distinguished in the acquisition of prosodic structure on the other hand. As acquisition essentially involves a change in behaviour from one system to another, I claim that such changes are motivated by the expansion of the dynamics of the computational system. The progress that a child makes is understood in terms of discrete developments in the way in which the formal system expands. Every structural layer is claimed to correspond to a noticeable development in the production of prosodic syllabic structure. In other words, the child is actively involved in constructing phonological
grammars. The evidence used is documented data taken from children's utterances observed at different stages of their development. The purpose here is to try to understand the developmental stages of syllable structure in terms of a formal model. In evaluating any type of model, one should satisfactorily be able to relate structural layers or structural developments in the model to the observed stages of acquisitional growth. Lastly, the model presented here relies on, and incorporates, findings from acquisitional research which has been done from three distinct perspectives, namely, cognition, perception, and production. This approach is in contrast to previous models of acquisition which have tended to ascribe acquisition to only one of these areas, e.g., production-based models of acquisition (for a brief overview of the relation of perception to production in theoretical models, cf. Vihman 1996: 46-47).

The article is organised as follows. In section 1, the interaction between the two approaches of predetermination and cognition, as they relate to child language acquisition, is discussed. In section 2, a model in which cognitive principles are claimed to drive fundamental licensing relationships between objects is set forth. This model, Head-Driven Phonology (van der Hulst and Ritter 1998, 1999a, 2000a,b, in prep.), claims that head/dependent relations, in the form of licensing relations, are the mechanisms that underlie the phonological computational component. In section 3, the logic of the order of acquisition of prosodic structure is addressed in terms of the Head-Driven Phonology framework. Section 4 discusses the effect that the child's recognition of the onset as an independent organisational unit has on the growth of phonological complexity.
1. Two approaches to the child language acquisition process
This section focuses on two approaches that try to explain the language acquisition process in children. The first claims that there is an innate component of the brain known as Universal Grammar, which is wired in such a way as to predispose the child to natural human language. The second approach aims more at discovering the learning
processes of children from a more general cognitive capacity. Researchers in this area have taken into account information-processing skills, organisational principles of categorising, and varied learning strategies used by the child, such as paying attention to beginnings or ends of items, boundary markers, stress, etc. In this article, I propose that both approaches are necessary in explaining the acquisition of phonological structure. In fact, this model attempts to incorporate, in a cohesive manner, findings and conclusions of studies focusing on the acquisition process, which have been contributed from the areas of cognition, perception, and production. The cognitive disposition toward categorising and organising chaotic information is specifically instantiated in the component of the brain that is predisposed to language. Such cognitive methods target and recognise salient information and organise other information, in turn, in relation to the predominantly targeted essential category (similar to Lakoff's 1987 proposal of center-periphery schemas). The claim here then is that the child's mind is actively involved in constructing a phonological grammar (contra Jakobson 1968 et alia, and Stampe 1973 et alia, who basically maintain that phonological acquisition is predetermined).

Moreover, given that the proposed approach appeals to the notion of the child organising chaotic input into some principled orderly schema, I disagree with the claims, advocated by Stampe (1973) and many current-day proponents of Optimality Theory (cf. C. Levelt, Schiller, and W. Levelt 1999/2000), that the basis for production of a word is a stored representation of the accurate adult form of that word. Smith's (1973) findings and conclusions that child language is a result of realisation rules operating upon adult representations assume that the child's perception is complete and accurate from birth. As later researchers have pointed out, this approach, however, is unable to readily account for the individual variations in the types of "errors" that children make. Ingram (1974 et sequentia) has argued that the concept of having an adult form plus certain phonological processes to yield a child's form is inadequate given real-life evidence (cf. Menn 1978 and Waterson 1981 for different proposals regarding the same issue). He, instead, proposes that "children actively operate on adult forms to establish their own phonological representations
of these words" (Ingram 1986: 233). According to Ingram (1989), the child's phonological representation of a word occurs at a level (the organisational level) which intervenes between the perceived adult representation and the child's phonetic output. This intervening level does not operate on its own as an autonomous system, however. Rather, Ingram, in the spirit of Jakobson, attributes a great deal of importance to the adult form and to the pressure and influence that the adult form asserts in order for the child's representation to conform more closely to the adult form. Thus for Ingram, adult forms seem to be the basis for children establishing and storing their own form or representation of a word in their lexicon. Such approaches seem to suggest that the child is fully cognisant of the adult form in the child's initial state and that the child attempts various ways in which to achieve such forms to yield the final state.

While Ingram's proposal may be a correct assessment for the acquisition of segments and phonological contrasts that take place during the linguistic stage of acquisition, between the ages of 1;6 and 4;0, it is claimed in this article that for the acquisition of prosodic structure, which can be seen to arise at earlier stages such as the prelinguistic (0 to 1;0) or transition stages of acquisition, a different process prevails, which I set forth in the following sections. The acquisition of prosodic categories is to be considered a separate process from the acquisition of segmental contrasts and, in fact, is claimed to be a pre-requisite for the acquisition of phonological contrasts, thus logically preceding the latter. Despite the fact that much of the literature on child language acquisition has focused on the acquisition of segments (for instance, Jakobson's (1968) work has been to present a model which shows how children build up their phonological inventory through a process of acquiring contrasts in phonemes), others, such as Moskowitz (1970, 1973) and Peters (1983), have considered higher prosodic units (such as phrases, words, syllables) to be the primary objects of recognition in children and the more essential building blocks in the acquisition process. Only recently, though, has a thorough investigation been conducted which focuses solely on the acquisition of syllable structure (cf. Fikkert 1994). I agree with these
latter researchers in their claims about the primacy of prosodic structure in language acquisition. However, I wish to go a step further in considering prosodic development as the rudimentary construction of the computational system of language. I suggest that in the acquisition of prosodic structure, rather than claiming that the adult forms are the input to a child's establishing a certain structural representation, it is the interaction between Universal Grammar principles and cognitive perceptual and organisational principles that results in the child creating and producing representations of certain structural shapes and types. The child is guided by the cognitive disposition toward categorising and organising chaotic input into some principled orderly schema. Therefore, it is not the adult form itself that directly causes or has impact on the child's representational form but, rather, the building and creation of the child's linguistic computational system that has a direct effect on the output produced by the child. The claim, furthermore, is that specific phonological knowledge linked to the spoken modality is not innate, but, rather, constructed (cf. sign languages). In spoken language, then, the stages of production of prosodic development in some way mirror the construction of the child's computational system of natural language. In other words, the development of CV combinations and complications on this schema are tangible evidence of the way in which the child's phonological computational system is being built and designed. Example (1) illustrates a proposal for the child's lexical representation, which is the basis for his/her production. As emphasised above, the phonological representation of the child is not a stored representation of the accurate adult form.

(1) Lexical representation

    Perceived acoustic image/display of adult form
        /
       /
      /
    Phonological representation of child
       |
       v
    Production/output of child
In the representation above, the acoustic image of the perceived adult form is claimed to be initially stored as a holistic unit, i.e., a gestalt. (At this point it is unclear to me how much, if any, phonetic detail is stored initially.) This stored information is associated with the child's own phonological representation of the adult form; however, this holistic stored unit does not influence the child's phonological representation. It exists separately in the child's short-term memory. Evidence for this claim comes from the observation that a child may, initially, accurately produce the adult form of a word which would generally be acquired later in the development of the syllabic structure (i.e., when a child first produces a form such as bus with the presence of the final [s] consonant), but after a few occurrences, the child reverts to producing the word with a simpler CV structure (such as [bʌ]). This can be explained by the claim that before the child has constructed his/her own phonological representation for the word in question, the child is able to resort to its memory bank for production. There is no phonological representation of this piece of memory; thus, it is not part of the cognitive understanding of the child (the child's competence) and is predicted to be unstable in its occurrence. However, after the child's own cognitive form or mental representation of the word has been established, the child relies on that input for its production system. In this model, then, the child's phonetic form is not attempting to realise a phonological representation of the adult form, since the adult form is not the direct input to the child's production system but is only stored information which becomes loosely associated with the child's phonological representation. As the child's mind engages in identifying, categorising, organising, and constructing a computational system for language, a child's phonological representation of a word will reflect the acquisition of the discrete development of the way in which the computational system expands itself. A phonological representation is made by the child based on his/her competence of the system at the moment, and this representation leads to the child's output. The child's production and competence are thus very close in this model, as opposed to models where competence reflects the adult form. As the child's computational system develops, his/her phonological representation will reflect more and more structural complexities, such as the presence
of codas, the absence of onsets, and the complexity of codas and onsets. As the child's computational system grows further and matures (and thus the child's competence matures), the child is also then further able to analyse the stored adult form (perhaps in terms of finer phonetic detail). Only at this later stage does the child continue to readjust his/her phonological representation of a word to include more segmental information that closely resembles the adult form. This is the stage at which the adult form carries more impact and influence on the child's phonological representation. The development of the structure of words, then, is claimed here to mirror the piecing-together process of the development of the phonological computational system. This development stems from reliance on innate principles of Universal Grammar and cognitive organisational tasks of the mind, rather than on direct access to the adult form and processes which try to realise that form. The particular sounds of the child's language, however, are acquired (i.e., filled in) later, in a manner which relies more heavily on direct access to the adult form of a word. Lastly, it should be noted that this model does not give prominence to the notion of the "syllable" as a phonological reality in the prosodic hierarchy in the same way as other phonologists have. Rather, the CV unit is regarded as a syncretised whole in which the components are of unequal status, and consonant initiality is regarded as a phonetic dependency on the more salient and prominent vocalic unit that is the target of production. Differing cognitive modes of deliberate conscious processes and automatic, routine processes (phenic versus cryptic, cf. Campbell 1986) are also claimed to play a crucial role in the development of the child's prosodic system, leading to consonantal complexities and marked structures.
1.1. Cognition reigns over Universal Grammar
The cognitive disposition toward categorising and organising chaotic information is manifested in specific ways in the component (Universal Grammar) of the brain that is predisposed to language:
(2)
                    Cognition
              /         |         \
    Component X         UG         Component Y
                      /    \
              Phonology      Syntax
Cognitive abilities target and recognise salient information and organise other information in relation to the predominantly essential category of each component of the mind. What is innate in the child is the ability to recognise the mode or cues in each component that subsume the predominantly essential category of each specific component. With respect to the language component, the concept of "Universal Grammar" has been used to serve two functions. One function of Universal Grammar is to account for the infant's innate ability or predisposition to perceive certain cues or categories from birth (e.g., thematic roles such as Agent); in other words, to recognise the salient aspects which comprise language per se. The other function of Universal Grammar entails principles which allow the child competence in a certain grammatical form, for example, that (s)he has never experienced through input. These principles are formulated in terms of language specifics. However, if one abstracts away from the specifics involved in a number of these principles, certain overall, general types of cognitive principles may be seen to underlie many, if not all, of the principles of Universal Grammar. X-bar theory, which could be claimed to be an innate structure in Universal Grammar, can be understood as a basic cognitive method of organising information in terms of head/dependent relations and hierarchical structure, in accordance with the part-whole schematic functioning of the mind as proposed by Lakoff (1987). This notion of there being universal cognitive principles that pertain to language plays a crucial role in the acquisition of phonological structure.

In the area of syntax, for example, many linguists claim that the predicate-argument division is a fundamental relationship that children utilise as an analytic necessity (Braine 1976) or a useful tool (Maratsos and Chalkley 1980) in the acquisition of syntactic structures.
The concept of bootstrapping semantic/cognitive categories onto their syntactic realisations implies that fundamental, or innate, cognitive notions are mapped onto certain corresponding structures (albeit by means of affixation, structural position, etc.). In the predicate-argument case, essentially, the predicate acts as a head of a syntactic structure and the argument as its dependent. This hypothesis that children can identify and make use of the notion of a predicate in syntactic acquisition has a correlate in phonological acquisition. In phonology, this notion of there being a head/dependent relationship exists as well, with the vocalic unit or nucleus being the head or core and the consonantal onset its dependent.2 When one abstracts away from the specific domain-related objects and instead focuses on the roles that these objects play, it appears evident that the cognitive notion of an asymmetrical head/dependent relation exists at the core of Universal Grammar. This concept can be formulated as a unique principle of Universal Grammar, namely the principle of head/dependency. Secondly, the structure of this relation is such that the components of the relation combine in a binary fashion. Thus there is a principle of binarity that is also claimed to be an innate principle of Universal Grammar. In a model developed by van der Hulst and Ritter (1998) called Head-Driven Phonology, which expands and elaborates upon insights of Government Phonology and Dependency Phonology, it is these two principles which have been isolated as the core principles of Universal Grammar. The combination of these two principles operating in conjunction with one another is claimed in this theory to underlie the mechanisms used in providing an explanatory computational system of phonological occurrences.
2. Theoretical assumptions of the model
In the Head-Driven Phonology model, these two principles, set forth in (3) below, are realised in terms of licensing relations which ensure that each and every object that is phonologically relevant is licensed. These licensing relations then are to be considered the constructs for the phonological computational system.
(3) Core principles of Universal Grammar

1. Head/Dependency Principle — the notion of saliency is translated into the notion of a head (the dominant object or focal point).
2. Binarity Principle — organises these heads and the material around them in a binary grouping fashion.
The licensing relations, which are in some sense the mechanisms behind the computational system, come in three varieties as outlined in (4) below:

(4) Licensing relations

1. Structural licensing — creates hierarchical prosodic structure.
2. Paradigmatic licensing — determines the segmental selection as a result of the possibility of contrasts allowed in a certain position (based on the position's status as a head or dependent within the structure); this leads to the notions that:
   a. heads are the site of maximal contrast, and
   b. dependents are the site of neutralisation and absence of contrast (with the greatest lack of contrast being "emptiness" or a phonetically null phonological segment).
3. Syntagmatic-content licensing — the content of a position as head has some bearing on the content of its adjacent dependent position (e.g., in assimilation/dissimilation cases and head-to-head relationships such as harmony).
These last two relational mechanisms, i.e., (4-2) and (4-3), pertain to licensing the content within structural positions rather than the structural positions themselves. As each of these types of relations is built and subsequently becomes an automatic process, the phonological system is slowly created. A discussion of the logic of development of the acquisition of prosodic structure, couched in terms of the Head-Driven Phonology framework mentioned above, is given below in section 3.
3. The logic behind the acquisition of prosodic structure

3.1. Vocalic saliency
As mentioned in section 2, two crucial cognitive concepts, namely "recognition of saliency" and categorisation, are manifested in Universal Grammar in the form of two principles: Head/Dependency and Binarity. In the first, the notion of saliency is translated into the notion of a head. The second principle subscribes to organising these heads and the material around them in a binary grouping fashion. With respect to spoken language, given the input of spoken utterances surrounding the child, the child (due to the innate ability to recognise human language sounds versus non-language sounds) selectively focuses on the parts of the utterance which are highly salient. This attention to saliency runs along acoustic parameters where infants pay attention to stress, fundamental frequency and pitch.3 Thus, these more salient features in the signal point to the vowel being the head or core unit. I term this "vocalic saliency". In a word of more than one syllable, the focus would more likely be on the most salient vowel, i.e., the stressed vowel (cf. Pye 1980 regarding the importance of stress as a perceptual determinant). Cues such as intensity and duration of stressed vowels can account for this (cf. Waterson 1987, who posits rhythm as a possible acoustic cue of saliency). In some instances, though, an infant's attention to beginnings or ends of words, as well as memory limitations, may be factors in the infant's conscious selection of the head or core vocalic unit. Peters (1983) likens the salience of ends and beginnings of words to remembering initial and final items in a series. Whichever notion of saliency the infant attends to, the infant recognises and thus consciously selects the head of an utterance in this way. In terms of the production correlate at this point, sounds made in the first two pre-linguistic stages (0 to 20 weeks), the second of which (cooing and laughter) is considered a major landmark, are predominantly vocalic (cf. Campbell 1986). This conscious selection of the head and awareness that this head is vocalic (in terms of aperture and sonority) falls under the paradigmatic licensing relation (4-2) (in terms of the Head-Driven Phonology
model cited above). Based on the child's recognition of a head, the child associates the vocalic category with this position. Thus, the child becomes aware that vocalic-type elements are contained within the head. It is this licensing relation of a position (i.e., the head) with its content (i.e., vocalic) that is claimed to operate first.
3.2. CV stage
The next stage of acquisition demonstrates the effect of consonant/vowel sequencing. This stage arises from the infant's perception and conscious awareness that a vowel is something preceded by closure and from the infant then adopting a general strategy for production. Evidence seems to point to the fact that infants quickly move to producing consonant/vowel sequences in stage three of vocal play (16 to 30 weeks), in landmark stage four of reduplicated babbling (25 to 50 weeks), and subsequently in the expressive jargon of stage five.4 Irwin and Wong's (1983) investigation of twenty children aged 16 to 18 months showed a vowel/consonant ratio in the children's speech of 49/51 as opposed to a ratio of 39/61 in adults. In fact, from an acoustic perspective, according to Bertoncini and Mehler (1981), the continuous speech signal is analysed by the infant into segments that target peaks in the speech wave. The segmentation generally contains both consonantal and vocalic information; however, the contrastive parts are not claimed by these researchers to be specifically analysed as distinct units by the infant. Rather, the sound chunk that is segmentised is regarded by the infant as a holistic unit in which the temporal organisation of phonetic events is not seen as the exponent of two underlying or discrete units. In other words, this CV unit is a constituent, but not in the sense of a unit that has any phonological realisation in the prosodic hierarchy, as some have claimed that the syllable might. In a similar spirit, Peters (1983) considers that the syllable is initially unanalysed and equatable with the word. Since infants attend to acoustic peaks in the sound wave (indicating attention to vowels and vocalic prominence), there seems to be no awareness that the consonant has a prominent role at this point.
Rather, it seems that the consonant serves as a type of boundary marker where the infant becomes aware of the internal phonetic dynamic of "closure" as signalling the onset of the more prominent vocalic aperture. This awareness paves the way for the later cognitive/phonological notions of "onset" and "nucleus". Such a distinction in boundary/prominence also leads to the assumption that the ordering of these two units (C/V) is phonetically predictable and noncontrastive. Since infants recognise that stricture always precedes a vowel, there is no need for them to encode a sequential order to these two objects. The fact that it is phonetically predictable that consonantal stricture will precede vocalic aperture implies the notion of plane segregation. This logical implication was first mentioned by McCarthy (1989) with respect to the concatenation of consonants and vowels in Semitic languages. The implication here then is that these two objects (C — the manifestation of stricture, and V — the manifestation of aperture) reside on different planes or tiers. This notion is illustrated in (5):

(5)   C

      V

In terms of the Head-Driven Phonology framework and of constructing a computational system, the observation that a head/vocalic unit takes a dependent object that is maximally contrastive (in terms of stricture) with it reflects a stage in the development of the computational system. This form of a head/dependent relation is a manifestation of the establishment of a second type of licensing relation, a syntagmatic-content licensing relation, in which a head may determine the content of its dependent (4-3).5 In this syntagmatic licensing relation, the content of the head, which is vocalic, requires that the content of the object preceding it contain stricture. The strict enforcement of this particular licensing relation explains the possibility of having strict CV language types. More relaxed versions of this relationship allow for onsetless syllables.
3.3. Reduplication phase — CVCV form
The next stage in the development of prosodic structure is seen in the reduplicative CVCV form. In effect, this phase in acquisition reflects the binary grouping of vocalic units into feet:

(6)
           F
          / \
         N   N
        /|  /|
       C V C V
This CVCV stage of acquisition is evidence of the infant subsuming the third type of head/dependent licensing relation discussed above in (4-1), namely structural licensing. As noted in section 2, structural licensing is responsible for creating hierarchical prosodic structure. According to acquisition data, the dependent member of the foot (F) in (6) is a copy of the head. At this point, it is assumed that the head of the foot is on the left and that the dependent is on the right, although there is no evidence to bear out the position of the head nucleus from the data alone, except that since saliency correlates with word edges or stress, one of these may be the deciding factor. Reduplication will result in forms like [baba] 'rock-rock' (Jonah, 12 months), [mama] 'mommy' (Leslie, 11 months; Sarah, 11 months), [kaka] 'quack-quack' (Timmy, 10 months), [bebe] 'bébé' French (Carole, 11 months), [wawa] 'doggie' Japanese (Emi, 14 months).6 This copying phenomenon of reduplication is seen to follow, in the model proposed here, from the claim that dependents are the site for neutralisations and lack of contrasts.
3.4. Emergence of subsequent canonical forms — CVC, CVV, V, VC
The following phase manifests the emergence of subsequent canonical forms such as:
(7) The emergence of subsequent canonical forms — CVC, CVV, VC

CVC: [pux] 'poes(je)' 'cat' Dutch (Thomas, 15 months), [tsis] 'shoes' (Will, 12 months), [bap] 'Birne' 'pear' German (Hans, 14 months), [hap] 'schaap' 'sheep' Dutch (Jarmo, 1;7,15), [pap] 'aap' 'monkey' Dutch (Leonie, 1;9,15)

CVV: [ʔai] 'hi' (Jonathan, 15 months), [hai] 'hi' (Jessie, 15 months), [baibai] 'bye-bye' (Sarah, 11 months), [dau] 'down' (Alice, 16 months)

VC: [o:t(o)] 'auto' Dutch (Thomas, 15 months, and Jarmo, 1;6,13), [ap] 'aap' 'monkey' Dutch (Jarmo, 1;7,15)
These word-types do not necessarily appear with equal frequency in the speech of individual children (CVC occurring more frequently than VC, for example). This observation of children's individual differences in the frequency of production of one prosodic shape over another reflects individual differences in choosing which type of licensing relation is expanding in his/her computational system.7 For instance, given the cognitive development of an internal binary foot structure, the next thing that can occur is expansion of the paradigmatic licensing relation to the dependent. Recall that the dependent is the site of neutralisation, which, when strictly enforced, results in the dependent housing the most neutral or weak possibility lacking all contrasts, i.e., emptiness. When this occurs, the child acquires the concept that a phonological unit may be present, yet be phonetically empty. The form CVC is just such an instance of this, with branching at the foot level, as in (6) above, but now with the final (dependent) vowel not phonetically produced. Thus the representation may be CV1CV2, where V2 = 0, which surfaces as CVC.
(8) [tree diagram: foot F branching into two nuclei; C V = b a, followed by C = p and a final empty nucleus V = 0; Birne 'pear', German]
With respect to the CVV syllable form, this syllable type emerges as the result of the expansion of the structural licensing relation to the nucleus. Expansion of this type of licensing relation is recognised as branching of the nucleus into two vocalic components. Since branching at the foot level has already been exhausted by yielding two sister nuclei, the next lower level (i.e., the head nucleus) is the only possible site remaining that can be recognised as a site for branching.

(9) [diagram: branching nucleus; N dominates two vocalic positions in C V V]8
Interestingly, though, initially there seems to be a constraint against allowing both branching at the foot level and branching at the nucleic head level to occur simultaneously, thus disallowing structures such as CVVCV from being produced. This suggests that expansion of the structural licensing relation, and perhaps any licensing relation in general, focuses on the more recent expansion (a kind of narrow scope phenomenon), without incorporating previous expansions as well to yield a more global, wide scope process. As each separate function of the computational system develops, there is a realistic counterpart that emerges in the guise of distinct syllabic forms. Once the entire system is in place, and "saturated" in the Fregean sense, then these separate functions can be conjoined to yield multi-syllabic forms. The next syllabic forms to emerge are the V and VC forms. These forms do not occur with the initial closure of a consonantal object. This suggests that the development of such forms results from the child's cognitive realisation that 1. stricture may not be the only boundary marker before the prominent vocalic head, and 2. silence can also serve as an initial boundary marker. This realisation leads to the relaxing of the strict obligatory syntagmatic-content licensing relation, which requires stricture to precede aperture. In order to perceive that stricture can be optional, the child must, by implication, conceive of stricture as an independent entity at this point, an entity which can be present or not, thus yielding an alternation between CV and V type forms. In viewing the emergence of stricture as a separate phonological concept at this point, one can more readily understand why V is not the first syllable to emerge in a child's speech, and, moreover, why V emerges after what seem to be more complex syllable shapes, such as CVC and CVV. Also, in line with this reasoning, it is more readily comprehensible why, given an adult input which is vowel-initial (such as Dutch aap 'monkey'), a child would be more likely to first produce the word by adding an extraneous onset consonant preceding the vowel ([pap]) and only later produce the correct VC form. Up until this point in the child's competence, consonantal material appearing before the vowel is perhaps merely regarded as some beginning point or transitional epiphenomenon. However, once there is awareness that there can be a separation between this initial consonantal material and the vowel, the vowel is then free to emerge on its own. The question arises as to whether the child cognitively retains the concept of there being an initial (boundary type) object, let's call it "onset", directly preceding the vowel. If so, then the child will understand that this object can either contain a set of member elements or be empty of any elements (as discussed above with respect to the vowel/nucleus). The phenomenon of being empty of any elements translates into phonetic silence or phonetic uninterpretability. Given this line of reasoning, the canonical VC shape could then be said to be a variant of the reduplicative binary foot form but with the addition of newly acquired information, namely the non-obligatoriness of consonantal stricture in the onset and the paradigmatic licensing of an "empty segment" in both the onset and final nucleus (the dependent nucleus of the branching foot). In other words, the weaker dependent positions, such as onset of the nucleus, and dependent nucleus of the foot, will be licensed to remain inaudible. Thus the representations of surface forms such as V and VC would be as in (10), respectively:9

(10) [diagrams: V and VC represented with an empty onset, e.g. C V = 0 a, surfacing as [a]; for VC, additionally an empty final nucleus]
Example (11) below summarises by illustration the development of prosodic structure as outlined above:

(11) [diagram of the developmental path: V; binarity of the nucleus (C V); structural binarity into the foot F (CVCV); inaudibility of dependents, yielding CVC(V), (C)V and (C)VC(V)]
Onsets coming of age
Correlative with the observation that syllable shapes tend to become differentiated is the fact that the ratio of consonants to vowels tends to increase with age during the acquisition period: from a ratio of 51 consonants : 49 vowels in children 1.5 years old to a ratio of 60 consonants : 40 vowels in 3.0-year-old speech (Irwin and Wong 1983). I suggest that the emergence of this higher ratio of consonants points to the claim that the child becomes consciously aware of the onset plane as an autonomous dimension within which the three types of head/dependent licensing relations can operate. In this way, the notion "onset" (formerly a signaller of word boundaries) is recognised by the child as an independent organisational unit. As stated in section 3.4 above, there comes a point in the child's acquisition where the child is able to make a distinction between the consonantal material preceding a vowel and the vowel itself. Yet the question then arises as to what triggers this deliberate consciousness in the child in recognising that consonants are no longer predictable appendages to vocalic material but rather are distinctive in themselves. An explanation in terms of this model may be that since the paradigmatic licensing relation becomes exhausted with respect to licensing nucleic heads to vocalic material (by first recognising the salience of vocalic aperture, then incrementally recognising the importance of distinctive cues in terms of vocalic contrasts and later segmental emptiness), this licensing mechanism expands to the next dimension, i.e., the consonantal plane. In so doing, consonantal differences in terms of varying degrees of stricture are now recognised and licensed as a part of the child's phonological representation of a word.
(12) [diagram: onset constituent O dominating a C position with segmental content α]

A cognitive link emerges between the category onset, as a phonological constituent, and the prosodic/acoustic cues associated with that category. This recognition (as with vocalic recognition) entails a child's perception of salient acoustic cues, such as voice onset time and bursts. The claim here is that what has been referred to as enhancement, and sharpening or fine tuning (Aslin and Pisoni 1980), in the process of perceiving discriminable speech sounds correlates with the expansion of paradigmatic licensing with respect to the phonological concept of "onset". Onset is a category, an organisational unit, which serves initially as a boundary marker preceding the prominent vocalic object. As a child's perception of the role of the onset category as a signaller of word boundaries changes, the child begins to focus and attend more to prosodic cues, such as VOT and F1 and F2 frequencies, that define contrasts in his/her native language. Though this capacity to discriminate contrasts has been shown to be present early on in the auditory perception of young infants during the first six months of life (Jusczyk 1997), a recent study by Brown and Matthews (1997) shows through experimentation that the capacity to accurately and consistently discriminate contrasts is not truly present from the beginning stages of the infant's phonological development. The study's findings thus reflect an impoverished grammatical system at the child's initial state. While it may be that there is a predisposition in the infant to be able to perceptually discriminate speech sounds, such an ability at this stage could be said to be more "psycho-physical" (an aural ability to attend to certain prosodic cues) than cognitively phonological. The term "phonological" here implies knowledge (albeit unconscious) about 1. the discriminating factors among sounds that are used in one's native language to realise meaning differences in words, and 2. the cognitive representation of these contrastive sounds in one's native language, among other things. The realisation that stricture, the closure before vocalic aperture, is a discriminating phonological factor independent from the adjacent vowel, leads to the ability to formulate a cognitive link between the category onset, as a recognised organising unit (a phonological constituent), and the prosodic/acoustic cues associated with that category. This cognitive link gives life to the concept "onset" as an independent entity and thus to the ability of the paradigmatic head/dependent relation between a constituent and the segmental content contained within it to apply to this newly recognised independent object. Evidence of this cognitive link is seen in reduplicated words of CVCV form, as well as in CVC word forms which progress (at about 1;8 or 2;0 years) from consonantal harmony between the first and second consonants (e.g., [gAk], [beip], [mu:m], taken from Waterson 1987) to these consonants being differentiated: [trAk], [greip], [mu:n] (a further example, taken from Waterson 1987, is [beibe:] 'biscuit' later changed to [biki?]). The transition from the phenomenon of "stopping" (using stops to substitute for fricatives and affricates) to the ability to produce other manners of articulation also shows the development in gaining further cognitive awareness of contrasts and thus producing such contrasts. With this recognition of the onset as an autonomous organisational category, which can sustain phonological sound contrasts, the computational system can then expand its operations to apply the other two types of licensing relations (structural and syntagmatic content-licensing) to the consonantal plane.10 When structural licensing is applied to onsets, branching of the onset emerges in the form of surface consonant clusters:

(13) [diagram: branching onset; O dominates two C positions with contents α and β]
Stoel-Gammon (1985) posits that consonant clusters in initial and final positions can be produced in some words by the average two-year-old. When syntagmatic content-licensing applies to the onset dimension, word-internal consonantal sequences can be realised. This type of relation licenses the possibility of coda-onset sequences and paves the way for multi-syllabic complex words through the linking of information that occurs when one object has bearing on an adjacent object. In the example below, the position of C1 is licensed by the presence of C2. The linked relationship of C1 to C2 results in the ability to produce a word with more than one syllable.

(14) [diagram: two O N sequences spelling 'party', C V C1 C2 V = p a r t y, with C1 licensed by C2]
It should be noted that acquiring complexity (both in terms of phonological sound contrasts and in terms of structural branching) in the onset position, which is a dependent position in relation to the nucleus, follows the acquisition of complexity with respect to the more prominent nucleic head position. This development of complexity underscores the notion inherent in the Head-Driven Phonology model that heads are the primary site of contrasts in relation to dependents. Thus the prediction here is that if contrasts are to arise, they should first be seen in prominent nuclear positions before dependent consonantal/onset ones. The facts bear out this hypothesis.

Conclusions

This article has attempted to provide a model in which the logical, steady order of the development of prosodic structure can be understood. The claim here is that the order of development of syllabic forms is dictated by the order of acquisition (first on the head nucleus, then on the dependent onset) of three different head/dependent relations, which are manifested in terms of licensing mechanisms. With respect to the prominent nucleic head, the child first acquires paradigmatic licensing between the head nucleic constituent (i.e., organisational category) and segmental content:

(15) [diagram: N immediately dominating V]
The child next acquires a syntagmatic-content licensing relation obligatorily requiring stricture to precede vocalic content. The final head/dependent relation that is acquired is structural licensing, which targets reduplication of the nucleic head in terms of binarity, i.e., organising two nuclei into a higher prosodic unit, the foot (6). The order of the awareness and acquisition of these head/dependent relations is a bottom-up strategy, beginning with the more phonetic or surface-like realisations (such as vocalic units, stricture preceding vocalic units) and then moving to a deeper understanding of organisational categories and their representations (i.e., the CV unit and the foot).
Once this ordering prevails in this bottom-to-top direction, the head/dependent relations are then expanded and reapplied in a top-down fashion. At this point, individuality and preference may play a role in determining which head/dependent relation the child chooses to further expand first. Such preferences account for why some children may have more of an abundance of VC syllables than CVV syllables, or more CVV forms than CVC ones; cf. Kent and Bauer 1985, whose study of 13-month-olds found that VC and CVC word forms each accounted for only about 2% of the syllabic structures produced, while a form consisting of just a single vowel (V) comprised 60% of the structures produced. The fact that V forms were produced vastly more frequently than VC forms could come from the fact that with V forms, the child learns merely to relax the obligatoriness of syntagmatic-content licensing, but with the VC forms, the child must combine this process with the paradigmatic licensing relation, which additionally recognises an empty segment in the final nucleic position (0VC0). Once awareness of the onset plane emerges, these three types of head/dependent relations are again ordered in a bottom-up fashion: paradigmatic licensing (contrasts and differentiation of consonantal sounds), structural licensing in the form of branching of the onset constituent, syntagmatic-content licensing in the form of coda-onset sequences in word-medial position. The conclusion here, then, is that a child does not acquire prosodic structures by internalising the perceived adult forms. Rather, such prosodic structures directly result from the different kinds of head/dependent relations which are being internalised by the child. In other words, the child is acquiring and adopting different ways of sorting and grouping salient information, which is received from acoustic cues perceived by the child. And it is these different ways of organising information, which the child is developing, that cause the child to produce certain prosodic structures at certain stages in the child's language development. Thus, the above discussion has attempted to demonstrate that there is a gradual building up of the computational system and that the stages of development of prosodic structure are the manifestations of the computational system itself being constructed.
Notes

1. Anderson and Ewen (1987) have a similar position and call this parallelism "Structural Analogy".

2. This article focuses on the acquisition of prosodic structure in spoken language only. The undertaking of also analysing the acquisition of sign language is, unfortunately, too large a topic for the scope of this article. Admittedly, it may be the case that perceptual saliency in the visual language mode may be attributed to movement, specifically with respect to the size of the movement, amount of space covered along a path, and location on the body. However, this is merely speculation and further investigation is needed into this topic to determine how the development of prosodic structure in sign language relates to a similar developmental process in spoken language, especially if it is claimed that this process is a physical realisation of cognitive growth.

3. The ability to be attuned to specific acoustic parameters may be part of the domain-specific Universal Grammar component.

4. Both visual and auditory perception play a role at this point. Cf. McGurk's comments on Waterson's (1981) paper, in which McGurk says that "reaction time data reveal that CV syllables are identified on the basis of lip movement information prior to their being perceived auditorily" (Myers, Laver, and Anderson 1981). Locke (1995) also discusses the visual component in the infant's perception of speech.

5. In syntactic terms, this type of head/dependent relation can be seen in the predicate-head determining the thematic role(s) of its dependent argument(s).

6. Data taken from Appendix B of Vihman (1996) and from C. Levelt, Schiller, and W. Levelt (1999/2000).

7. According to a study done by C. Levelt, Schiller, and W. Levelt (1999/2000), in which the data of 12 Dutch children were aligned along a Guttman scale for syllable type, there seems to be an order of development among the syllable types that I have chosen to group together in this section. The findings of Levelt et al. suggest that after the quintessential CV syllable is produced, syllable development progresses as follows: CVC, V, VC, CVCC, VCC, CCV, CCVC and CCVCC. This ranking does not include the CVV syllable type because of the researchers' assumption that initially vowel length is nondistinctive and there is thus no contrast between CV and CVV. However, according to other data of children in a similar age range, while length may not be so clearly distinctive, there does seem to be a distinction between single vowels and diphthongs, thus leading to a distinction between CV and CVV. Therefore CVV has been included in my analysis along with the other canonical syllable forms. If Levelt et al. are correct in their assumption of an order of development of syllable shape, then in my model this would mean that there is a strict order to the expansion of each licensing relation.

8. It may also be posited that CVV is a manifestation of relaxing the strict obligatory need for stricture before a vocalic element, as seen with the syllable types V and VC. Thus, it may be the case that the foot is still perceived as branching but that the second phonological consonant is phonetically empty (CV0V).

9. Given the expansion of the computational system at this point, VCV would also be a viable form. However, because two vowels are realised, the form would have two salient vocalic peaks and would thus be considered by the child to be bisyllabic, composed of a syllable containing only V followed by CV. This differs from the CVC syllable, although the two ultimately have the same underlying cognitive (phonological) structure, in that in the CVC case, only one of the nuclei is phonetically realised as a vowel. Since there is only one vocalic peak, the child produces only one syllable.

10. In the marked instances of strict CV languages, while paradigmatic licensing operates in the consonantal dimension, syntagmatic-content licensing between C and V is strictly observed and structural licensing is precluded from applying. Thus the growth and expansion of the different head/dependent relations is held in check in the computational systems of speakers of strict CV languages.
On the computability of certain derivations in Government Phonology*

Geoff Williams
Introduction

A characteristic of work in Government Phonology is the avoidance of arbitrariness in the analysis of phonological phenomena, such that as far as possible, all and only the attested behaviour is predicted. This restrictiveness and non-arbitrariness is built into the theory in two different ways: firstly by means of substantive constraints, such as the small set of representational primitives and restricted size of constituents; and secondly by restrictions on the formal machinery allowed. These take the form of, for example, the limitation of phonological events to element composition and decomposition, together with constraints on derivations and representations such as the Projection Principle (Kaye, Lowenstamm, and Vergnaud 1990), which bans resyllabification, and the Uniformity Condition (Kaye [1993] 1995), which ensures that segmental representations are fully specified at all levels. Much of the constraining power of these devices, however, is expressed in relatively informal terms, in keeping with the general tenor of work in theoretical phonology, and the effects of the relevant principles in terms of the class of languages generated by the theory have not been formally demonstrated. Hence the theory has been challenged in just this regard by proponents of declarative approaches in particular: e.g., Coleman (1995: 348).1 Although much of that criticism is based on an inaccurate interpretation of an outdated version of Government Phonology, including such aspects as charm theory, which has now been abandoned, in my view some interesting points arise from the "clash of cultures" represented by the two approaches.
On the one hand, research in Government Phonology has led to strong, and it turns out empirically accurate, predictions about the nature of phonological systems, which argues that it captures some principles of grammar in a non-trivial way. This is achieved in a model which allows a limited notion of derivation, that is, there are mappings between lexical forms, surface forms and ultimately the signal. Such a model is justified in terms of the economy and elegance of the analyses it makes use of (cf. the selective targeting of element depletion). However, there remain certain aspects of phonological systems that resist non-arbitrary solutions, such as the interpretation of unlicensed empty nuclei in certain languages (which has previously been analysed by invoking ambient elements), and analyses that involve delinking and reassociation of material. Although these processes are subject to substantive constraints, such as their being limited by the number of elements in the theory, Coleman is correct in pointing out that this type of constraint does not limit the class of languages permitted by the theory. Analyses of this type then invite justifiable criticisms of formal inadequacy, since they involve arbitrary insertion of material, recapitulating the sort of pathology that led to the demise of Chomsky and Halle's (1968) Sound Pattern of English formalism. The declarative approach is claimed to be a superior alternative to any derivational model since it attempts to characterise grammatical regularities purely in terms of well-formedness conditions on surface forms. Rule ordering is therefore disallowed by definition. Constraints are combined with mathematical operations that are proved to be formally limited in the appropriate ways. While this does effectively eliminate derivations and allow computationally tractable recognition and parsing of structures, serious problems arise. For instance, since an unlimited range of constraints is expressible to the satisfaction of the mathematics, there appears to be no way of limiting the nature of these constraints so that only natural ones are expressible. The problematic phenomena mentioned in the previous paragraph are amenable to "analysis" by declarative constraints, for example. Another example concerns syllable structure, discussed by Coleman (1995: 361-362). A context-free grammar, he argues, is adequate to capture the properties of syllable structure in English.
There is no doubt that this analysis is formally restricted; however, the simple context-free analysis that he gives makes unsubstantiated claims about the constituent structure of words, which could be either true or false. To take one specific example, it assigns word-final consonants to the coda position,2 a claim which, although widely accepted in mainstream phonology, is denied within Government Phonology based on the analysis of Kaye (1990a). Hence, since empirical data tend to be overlooked in favour of computational efficiency, the empirical content and insights of this approach are severely compromised in practice, although surely not in principle. Placing the highest priority on formal restrictiveness then comes at an extremely high price in this approach. The question I address here then is whether the two desiderata are satisfiable in a single theory: can there be a model of phonology that both makes strong empirical claims and is adequately formally constrained at the same time?3 In what follows I will try to give an answer to this question, using a characterisation of formal power in terms of computational complexity. The paper is organised as follows: after a brief introduction to complexity theory and its application to grammatical analysis, we compare a typical standard autosegmental account of vowel harmony with a Government-phonological account. In the final section we apply complexity theory to establish a formal basis for the tractability of Government Phonology using a case study of Turkish vowel harmony.
1. Formal properties of phonology

1.1. Why formal properties matter
There are a number of possible responses to the charge that Government Phonology is formally not sufficiently constrained. One is that it does not matter: an intuitive understanding of the restrictiveness of the relevant Government-phonological principles, plus the quality of the results obtained, is sufficient to justify the approach. The original motivation for me to take up this challenge, however, was the purpose of establishing Government Phonology as a tractable platform for speech recognition (in Williams 1998, especially chapter 7). If we take seriously Kaye's claim that the role of phonology is to facilitate speech recognition by human beings, then the issue of formal restrictiveness of the theory becomes no less than that of its psychological plausibility as a model of phonological competence. Consider the task of the phonological component in a speech recognition system. It has to convert speech signals into phonological representations which can be used by the lexical access component to recover the sequence of words, and ultimately the structure and meaning of the utterance. In Government Phonology, this involves a pattern recognition stage targeting elements, mediated by a broad class analysis that provides an initial segmentation, which is parsed into a partial constituent structure representation.4 Our main concern here, however, is the derivational component. Since the signal contains "surface" or derived forms, namely those on which phonology has been performed, and the recognition system must recover these forms, it is important to establish that the mapping between pairs of representations is computable. The computational power of the Government-phonological formalism itself, in terms of the class of languages to which it belongs, has an obvious bearing on the efficiency of recognition algorithms based on the framework. However, this is only part of the story, since it has been convincingly argued (almost exclusively with respect to syntax) that the formal restrictiveness of grammars does not tell the whole story of the efficiency of language processing (see Barton, Berwick, and Ristad 1987). Grammar-external factors such as the properties of the processing algorithm or the exigencies of learnability also play a role.5 It is worth considering then to what extent these other factors influence processing efficiency in phonology, although obviously we can hardly expect to do justice to both aspects here. In this paper we will restrict ourselves to the contribution of a fundamental principle of Government Phonology, namely the Uniformity Condition (Kaye 1995), to the computational properties of derivations involving a spreading analysis. This illustrates how a principle established for reasons other than purely computational turns out to be crucial in rendering certain derivations computationally tractable.
An idea central to the declarative approach is the use of weak generative capacity analysis as the sole technique for analysing the formal properties of grammars. The limitations of this technique however were pointed out early on in Chomsky (1965), who concluded that weak generative capacity was of "rather marginal linguistic interest", only revealing anything about a proposed grammar if it failed in this regard, i.e., if it cannot even generate the required strings. While this conclusion has been widely acknowledged in syntactic theorising, it seems to have been largely ignored in phonology. Coleman's (1995) criticisms of Government Phonology, for example, are all expressed in terms of weak generative capacity. The weaknesses of this technique are detailed in Barton, Berwick, and Ristad (1987), who show that: 1. it does not capture true insights into a grammar, and 2. contrary to expectations there is no guarantee that the grammars at the bottom of the Chomsky hierarchy are the easiest to process. A major flaw is that it addresses only the properties of string sets: it is concerned only with the strings generated by a language rather than the structure assigned to them (or strong generative capacity). That is, two alternative grammars may be able to generate all and only the strings of a given language and yet assign different structural descriptions to them. Thus the implicit and usually unjustified assumption in weak generative capacity analysis is that linguistic systems are concerned only with linear strings rather than the structural properties of the complete representation, some of which may not be directly observable and are often theory-dependent. This is precisely what underlies the claim mentioned earlier that syllable structure grammars are necessarily context-free, since what appear to be the correct surface structures can indeed be generated by a context-free grammar. The bulk of phonological evidence however argues strongly that it is not simple strings of segments, but more enriched structures, that are required to characterise phonological constraints. Witness for example the many processes and constraints that are sensitive to constituent structure.
1.2. Complexity analysis and natural language
In view of these failings we will be using another technique, computational complexity theory, the application of which in linguistics has been pioneered by Barton, Berwick, and Ristad (1987) and Ristad (1990), and continued by van de Koot (1995). Complexity theory deals with the computational resources needed to solve a problem. Problems can be divided into various classes according to the time (or memory space) needed to solve them. A clear and rigorous division exists between classes of tractable and intractable problems, and the class to which a new problem belongs can be proved by the method of reduction, the demonstration of an effectively computable6 mapping from the new problem on to a known problem in the appropriate class. For example, a task such as retrieving a name from an alphabetically sorted list can be solved by simply comparing the name against each one in the list until we hit the correct one. If there are n items in the list, then in the worst case n comparisons will be made, and we say the problem is soluble in time proportional to the size of the problem, n. Clearly, better algorithms exist than this one to perform the same task, but what we are interested in is the structure of the underlying problem, not a particular algorithm to solve it. Thus there is an important distinction between problem and algorithm complexity. The quality of an algorithm cannot affect the class of the problem to which it is a solution. In general, any problem which is soluble in time proportional to some polynomial function of the length of the input n (i.e., n^x, where x is a positive integer) is said to be in the class of tractable problems known as P. These easy problems include alphabetical sorting, finite-state parsing and context-free language recognition. Many problems are outside this class and are said to be in NP (soluble on a Non-deterministic machine in Polynomial time). A problem which is at least as hard as any in this class is said to be "NP-hard". If the upper bound of the problem is no harder than NP, it is said to be "NP-complete". Such problems are characterised by the need for brute-force search: essentially guessing each possible solution and testing whether it is true or not, the latter being normally easily performed in polynomial time. One such problem is the so-called satisfiability or "3-SAT" problem, which deals with arbitrary Boolean formulas like the following:7
(1) General 3-SAT formula
    (x ∨ ¬y ∨ ¬z)
Given such a formula, the problem is to find an assignment of true and false values to the variables consistently such that the whole expression is true. If y is true then ¬y is false, and vice versa. If there is a solution then the whole formula is said to be satisfiable. It follows that, for the whole expression to be true, each clause must contain a true literal. This is a classic case of an intractable problem since the only effective solution is to try every combination of assignments and calculate the resulting value for the whole formula. Since there are n binary-valued variables in an arbitrary formula, there are 2^n possible truth-value assignments to test, and the solution therefore takes a time exponential in the length of the formula. The 3-SAT problem is therefore in the class NP of intractable problems, and any problem that can be reduced to this form in polynomial time is therefore also intractable. This problem will be used in the reduction of autosegmental phonology described below. How is the computational problem formulated with respect to linguistic formalisms? The problem which Barton, Berwick, and Ristad (1987) use as a measure of the complexity of a grammatical framework is the so-called Universal Recognition Problem given in (2):8
(2) Universal Recognition Problem
    Given a grammar G (in some grammatical framework) and a string a, is a in the language generated by G?
In computing the solution for a given G and a, the structure assigned to a is automatically taken into account under this formulation. The complexity of computing an underlying form must be at least as high as that of the Universal Recognition Problem for the grammar G. Hence the complexity of the Universal Recognition Problem for a given grammatical theory is a direct measure of the difficulty of parsing the languages generated by the class of grammars specified by the theory.9 Ristad (1990) presents a comprehensive complexity study of modern linguistic theories from which he concludes that all linguistic frameworks are roughly in the same boat computationally, in that they are NP-complete: they are all intractable. This also applies to the two-level morphology framework (Koskenniemi 1983), which is often cited as a model for finite-state implementations. If true, this presents a paradox since one would not expect human beings to be able to process language with the ease and rapidity that they do, or else we would need to posit the unlikely scenario that they have unlimited computational resources available in the form of nondeterministic or parallel processing machines. To avoid this problem, other solutions have been proposed, including modifications to grammatical theories, and substantive restrictions on languages such that the hard cases do not arise.
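To make the brute-force character of such problems concrete, the following is a minimal Python sketch of an exhaustive 3-SAT test. It is my own illustration rather than part of the original discussion, and the clause encoding, with "~" marking negation, is an expository assumption; its running time is exponential in the number of variables, exactly as described above.

from itertools import product

def satisfiable(clauses, variables):
    # Each clause is a set of literals: "x" asserts x, "~x" asserts not-x.
    # Brute force: try all 2^n truth-value assignments.
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[lit.lstrip("~")] != lit.startswith("~")
                   for lit in clause)
               for clause in clauses):
            return True    # every clause contains at least one true literal
    return False           # no assignment satisfies the formula

# A two-clause formula in the style of (1):
print(satisfiable([{"x", "~y", "~z"}, {"~x", "y", "z"}], ["x", "y", "z"]))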
2. Complexity of derivations in Government Phonology

2.1. Interpretation of empty nuclei
One area where Government Phonology-type grammars predict problems is in the licensing of sequences of empty nuclei. An empty nucleus is licensed by an unlicensed nucleus immediately to its right on the nuclear projection. If that nucleus is also empty, its status must be determined by the next nucleus to its right. Thus, as there are a potentially unbounded number of adjacent empty nuclei in a word, this process could require exhaustive search and unlimited storage. (Obviously words do not have infinite length, but this is irrelevant from the point of view of complexity, since any fixed-sized problem can be solved by searching through a pre-compiled look-up table.) In practice this situation never arises. In French, for example, word-internal empty nuclei commonly occur, but it is hard to find a word with more than three in sequence (e.g., semler "to sole", realised as [smel] in the third person singular, with the representation /s0m0l0/). The largest number of consecutive empty nuclei found in English seems to be three, and this only in a morphologically complex form such as sixths, parsed as [[[siks0]00]s0] (where the square brackets here indicate phonological domains). In these forms, moreover, the empty nuclei are domain-final and do not require a following licensor. In summary, this seems to parallel the case of multiple centre-embedded structures in syntax, which although licensed by the grammar do not turn up in natural speech, presumably due to their processing difficulty. Consider now the general properties of derivations in Government Phonology. Since Government Phonology recognises two linguistic levels, known as L(exical)-structure and P(honological)-structure, it could be classed as a two-level system. And since it uses the association principles of standard autosegmental phonology, its derivational mechanism is likely to be non-regular (as shown in Wiebe 1992; Bird and Ellison 1994). We require, then, some form of complexity argument that demonstrates the tractability of a Government-phonological derivation. In general, phonological events are characterised in Government Phonology as involving the application of a function Φ to lexical forms (Kaye 1995: 302). The function takes a phonological string X as argument and returns another phonological string, which is the result of the application of phonology to X. Clearly, there are many interacting principles and mechanisms invoked in Government-phonological analyses and it would be impossible to give a complete demonstration of the properties of Φ here. However a complexity analysis of some derivation based on autosegmental phonology makes a good starting point in the analysis of Government-phonological derivations, and accordingly we provide this along with a case study of vowel harmony.
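By way of illustration, the right-to-left determination of empty-nucleus status can be computed in a single linear scan. The following Python sketch is my own gloss on the licensing statement above; the string encoding, with 0 marking an empty nucleus, and the handling of the domain-final parameter are expository assumptions rather than the paper's own formulation.

def license_empty_nuclei(nuclei, final_licensed=True):
    # nuclei: nuclear contents left to right; "0" marks an empty nucleus.
    # An empty nucleus is licensed by an unlicensed nucleus
    # immediately to its right on the nuclear projection.
    status = [None] * len(nuclei)          # True = licensed (silent)
    licensor_follows = final_licensed      # domain-final parameter
    for i in reversed(range(len(nuclei))):
        if nuclei[i] != "0":
            status[i] = False              # filled nuclei are unlicensed
            licensor_follows = True        # ...and can license leftwards
        else:
            status[i] = licensor_follows
            # Only an unlicensed (hence audible) empty nucleus can license:
            licensor_follows = not status[i]
    return status

# Three empty nuclei in a row alternate in status, so one pass suffices:
print(license_empty_nuclei(["0", "0", "0"]))   # [True, False, True]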
2.2. Complexity of autosegmental phonology
A complexity analysis of the autosegmental framework is provided in Ristad (1990) and in Berwick (1991), using the example of a hypothetical vowel harmony system. Despite the apparent modularity of the formalism of autosegmental phonology, unexpectedly it turns out that autosegmental recognition is computationally intractable since it can be reduced to the 3-SAT problem. The reduction is based on a generalised case of a language involving three separate vowel-harmony processes, in keeping with the spirit of complexity analysis which studies general cases rather than artificially bounded ones. Crucially though, segments are lexically underspecified for feature values, and constituent structure is not encoded in lexical entries. The reduction from 3-SAT proceeds by identifying each instance of a feature in the representation of a string with a literal in a 3-SAT formula (see Berwick 1991; or Ristad 1990 for full details). Each variable is therefore forced to be assigned true or false consistently throughout the sequence, corresponding to agreement (or spreading) of a particular feature. The 3-SAT structure is duplicated by building a syllable structure on the representation based on a CV tier in which each syllable contains three positions including at least one V slot (see Figure 1). Each 3-SAT clause then corresponds to a syllable. By making V equal to true (using the feature value [-neg] in the diagram), we enforce the condition that each clause have at least one true value. Since only the surface form is visible and segments are underspecified, we cannot determine the underlying value of a segment's features by inspection: i.e., whether it is lexically specified as + or - for a given feature. Thus the problem displays the same type of indeterminacy as a sentence like police police police, in which each word is in principle ambiguous between a noun and a verb (ignoring word-order cues). With this mapping in place the segmental sequence has a valid autosegmental feature assignment if and only if the corresponding 3-SAT formula is satisfiable. Since the mapping between the representation and the 3-SAT problem can be done in polynomial time, the general autosegmental recognition problem based on feature underspecification is NP-hard.
Figure 1. Illustration of the reduction of standard autosegmental representations to the 3-SAT problem (after Berwick 1991): a CV tier above feature tiers f-x, f-y and f-z, whose [+neg]/[-neg] values encode the literals of 3-SAT clauses.
The key property of autosegmental representations as understood in this proof is that the same features encode segmental properties on the feature/variable tiers and syllable structure on the CV tier.10 In this way the value of a given feature in a syllable can affect its neighbours, since it has an effect on the overall truth value of the vowel and hence on the other segments in the same syllable. The only way to find a valid set of values for all features in the sequence of segments is to try all combinations of features and consonant/vowel settings. Hence the problem is exponential in the length of the input string (number of different features). It might be objected that this proof is not based on a specific analysis. However, given that the framework used in the proof exploits existing and commonly used mechanisms of autosegmental phonology, the specific rules and representations used are in principle available to an autosegmental analysis, unless ruled out by some substantive constraint. The question is whether this assumption applies to all phonological theories that employ the autosegmental notation. As Berwick states (1991: 148), "the complexity reduction can be blocked if distinct tiers cannot share the same features." If we could show then that this feature-sharing property did not apply to Government Phonology we could demonstrate that the framework is computationally tractable in the general case (at least with respect to spreading processes). Two assumptions lie at the core of the complexity problem: 1. syllable structure is determined in terms of the relative sonority of adjacent segments (measured in terms of their feature specifications), and 2. segments are lexically underspecified. Assume that a segment X has to be specified as [+high] in order to be less sonorous than its neighbour and that the feature value [-high] is spreading from left to right affecting each vowel. If X is lexically unspecified for the feature [high] then it cannot be syllabified until the spreading process has applied, but likewise the spreading process cannot apply until the segment has been syllabified as a vowel. It is this indeterminacy which causes the intractability. Now consider the version of autosegmental phonology instantiated by Government Phonology. Constituent structure is established lexically on the basis of governing properties of pairs of adjacent segments11 determined by their complexity, expressed in terms of an element count (see Harris 1994: 167ff). A segment is assigned lexically to an appropriate constituent and remains so regardless of what properties spread. Also, since representations are fully specified at all levels (by the Uniformity Condition, cf. Kaye 1995: 292), the makeup of each segment in a derived (surface) form is evident by inspection. Spreading in a vowel harmony process is also restricted to the nuclear projection and can never substantially affect an onset, except perhaps by adding a resonance feature as a secondary articulation (e.g., palatalisation). Indeed spreading between nuclear positions cannot take place unless a nuclear projection is established, i.e., the constituent structure must be largely determined in order for spreading to occur. Put another way, since elements occupy distinct tiers in an autosegmental chart, and governing relations that determine constituent structure do not refer12 to resonance elements (A, U and I) but only to source elements, this is in fact how Government Phonology instantiates the constraint that distinct tiers cannot share the same features, which blocks the reduction to 3-SAT. These fundamentals of representations in Government Phonology ensure that there can be no indeterminacy in representations and therefore the recognition problem in Government Phonology (for this area of the phonology at least) is tractable. Here is a clear example then of how a principle of Government Phonology works to restrict the formal power of representations and derivations.

2.3. A case study: Turkish vowel harmony
As an example of a recognition computation in Government Phonology, we will consider the recognition problem for a Turkish form involving vowel harmony. This is a useful example because the observed facts are well known and uncontroversial, and have also been discussed in the computational phonology literature (e.g., in Bird and Ellison's 1994 "one-level phonology"). We will also compare the Government Phonology approach with the one-level approach. The vocalic system and vowel harmony processes of Turkish are well documented, briefly stated as follows: the language contrasts the eight vowels /i, y, u, e, œ, o, a, ɯ/. Any vowel can appear in the first nucleus of the stem. Suffix vowels are subject to two independent types of harmony. The first demands that consecutive vowels agree in fronting, such that if the first is from the set {i, y, e, œ} then all following vowels in the domain have to be front too (except that [œ] can occur only in the initial nucleus); the second requires that consecutive vowels agree in rounding unless the target vowel is already low, such that if the first is a member of the set {u, y, o, œ}, then all following vowels must be round as well (except that [o] occurs only in the initial nucleus). Bird and Ellison give an analysis of the rounding harmony process in terms of a finite-state automaton which accepts all and only valid forms, which they claim to be a purely declarative treatment of this aspect of Turkish phonology. The automaton (see Figure 2) is derived from a procedure that converts autosegmental rule charts into regular expressions expressed in a fairly complex notation. They state: "[i]nformally, this automaton will accept any sequence of vowels except those where a round vowel is followed by an unrounded high vowel." (Bird and Ellison 1994: 76). We omit the details of the analysis, which is essentially a notational variant of a standard autosegmental treatment. The problems associated with their interpretation of phonological processes as descriptions have been discussed above, and we are concerned now with the complexity of the recognition procedure. It seems uncontroversial that the automaton can be used to recognise Turkish forms in linear time with no excessive storage requirements, and its form follows directly from the analysis.

Figure 2. Bird and Ellison's automaton for recognition of Turkish forms subject to rounding harmony. (The arrowhead symbol indicates a start state.)
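For illustration, the acceptance condition just quoted can be coded directly as a two-state scan. The Python sketch below is based only on Bird and Ellison's informal statement (their actual automaton may differ in its states and labels), and the vowel classifications are assumptions of mine:

ROUND = set("yuoœ")        # round vowels (assumed classification)
HIGH = set("iyuɯ")         # high vowels (assumed classification)

def accepts(vowels):
    # Reject any sequence in which a round vowel is immediately
    # followed by an unrounded high vowel; accept everything else.
    after_round = False    # state: was the previous vowel round?
    for v in vowels:
        if after_round and v in HIGH and v not in ROUND:
            return False
        after_round = v in ROUND
    return True

print(accepts("uu"))   # True: rounding carried through
print(accepts("uɯ"))   # False: round vowel before an unrounded high vowel
print(accepts("ua"))   # True: a low vowel escapes rounding harmony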
I now sketch an analysis of the same phenomenon in Government-phonological terms.

(3) Turkish Vowel Harmony:13
    1. Element I spreads from leftmost (head) nucleus throughout the domain.
    2. U spreads rightwards into all positions except those containing A.
    Additional parameters:
    3. No branching nuclei.14
    4. Licensed domain-final empty nucleus parameter: YES.

This is the grammar to be used in the language recognition task. The appropriate Universal Recognition Problem is as follows:

(4) Given the above (fragment of) grammar G in (3) and a string a, is a in the language generated by G?
To design a recognition algorithm we take note of the fact that successive nuclei must: 1. agree in the specification for the I element, and 2. agree in the specification for the U element provided the target nucleus does not contain A. This boils down to two conditions: 1. If I is a member of the first nucleus, it must be a member of every nucleus in the domain. 2. If U is a member of the first nucleus, it must be a member of every nucleus in the domain except those lexically containing A. These are just statements of the relevant processes expressed in declarative terms, and with these in place we can construct the recognition algorithm given in (5), below, for Turkish forms (with respect to vowel harmony). Since it is only necessary to perform at most three simple comparisons on each nucleus (once in the case of I, plus either once or twice in the case of U), the algorithm clearly runs in time proportional to the length of the string. Also its storage requirements are trivial, since the only bits of information stored are the values of the variables "I_domain" and "U_domain". Hence the algorithm is adequate to make our intended point that the recognition problem (as defined above) is tractable in Government Phonology, a derivational theory.
(5) Government-phonological recognition algorithm for Turkish forms involving both types of Vowel Harmony

Begin {
    Set length of string = jmax    # (number of nuclei)
    I_domain = FALSE; U_domain = FALSE;
    set j = 1;
    Read contents of head nucleus N1;
    if I ∈ N1 then I_domain is TRUE;
    if U ∈ N1 then U_domain is TRUE;
    while j < jmax do {
        set j = j + 1;
        If (j = jmax AND Nj = _)    # accept final empty nucleus
        Then break;
        if I_domain then {
            if I ∉ Nj then break;
        }
        else {
            if I ∈ Nj then break;
        }
        if U_domain then {
            if (U ∉ Nj AND A ∉ Nj) OR (U ∈ Nj AND A ∈ Nj) then break;
        }
        else {
            if U ∈ Nj then break;
        }
    }
    If j = jmax then accept; else reject;
} end.
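For concreteness, (5) can be rendered as a short, runnable Python function. The sketch below is mine, not the author's: nuclei are modelled as sets of the elements I, U and A, the empty set stands for an empty nucleus, the vowel-to-element table is an assumed GP-style decomposition, and rejection is returned directly rather than via the loop-exit test.

# Assumed element decomposition of the eight Turkish vowels:
VOWELS = {"i": {"I"}, "y": {"I", "U"}, "u": {"U"}, "e": {"I", "A"},
          "œ": {"I", "U", "A"}, "o": {"U", "A"}, "a": {"A"}, "ɯ": set()}

def recognise(nuclei):
    # nuclei: the nuclear projection as a list of element sets.
    i_domain = "I" in nuclei[0]            # fronting set by head nucleus
    u_domain = "U" in nuclei[0]            # rounding set by head nucleus
    for j, n in enumerate(nuclei[1:], start=2):
        if j == len(nuclei) and not n:
            break                          # licensed domain-final empty nucleus
        if i_domain != ("I" in n):
            return False                   # fronting harmony violated
        if u_domain:
            if ("U" in n) == ("A" in n):
                return False               # rounding harmony violated
        elif "U" in n:
            return False                   # U outside a rounding domain
    return True

print(recognise([VOWELS[v] for v in "ua"]))   # True: low vowel exempt
print(recognise([VOWELS[v] for v in "uɯ"]))   # False: should have rounded

Each nucleus is inspected a constant number of times, so the scan is linear in the length of the form, matching the tractability claim made above.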
3. Discussion
What the above example shows is that a computable recognition (as opposed to a parsing) procedure is available equally well in a derivational model as in a declarative one. As long as the set of constraints encoded in the algorithm follows relatively directly from the procedural statement of the process, there are no real problems, except that each type of analysis needs to be considered separately. Indeed this seems like the correct division of labour: the goal of phonology is to define the form of the grammar represented in the minds of language users, not the algorithms they use to process utterances. But there is arguably still something wrong with this solution, as it is not clear that the algorithm directly implements the underlying grammar in (3). It is concerned only with the "surface-true" generalisations of Turkish forms and is therefore performing something more akin to a weak generative capacity test. In fact it is no more than a procedural coding of Bird and Ellison's finite-state automaton. Surely what we really need to do to remain true to the spirit of complexity analysis is to apply the grammar as stated to a given string. This is really a question about the status of phonological generalisations in Government Phonology: are they to be interpreted procedurally as processes which convert representations into new ones, or in purely declarative terms as well-formedness conditions? Do events such as I-spreading and the like actually happen? Little has been written on this issue by Government phonologists, but Kaye (1995: 301) has addressed the question as follows with reference to derivations in Yawelmani involving the interpretation of empty nuclei: "It does not really matter if one considers these devices as processes or conditions of well-formedness of phonological structure. Events take place where they must and the Yawelmani results follow from the principles and parameters approach [...]."
On the same issue, Harris (1994: 271) states that "within an authentically generative model of grammar, phonological derivation does no more than define the distributional and alternation regularities that hold over phonological representations, [...]". Derivations then, as construed in Government Phonology, do not convert "abstract" representations into ever more concrete ones suitable for input to articulation and perception, as they were conceived of in the model provided in Chomsky and Halle's Sound Pattern of English.5 This same duality is found generally in formal systems, for example in the understanding of functions either as active processes or in a static sense as a special type of relation between two sets (Partee, ter Meulen, and Wall 1990: 31). I will therefore construe the function Φ as purely relational, without any implication of an active process converting one representation into another. Statements like those in (3) are merely procedural formulations of the generalisations which are licensed by the axioms and formal mathematical operations allowed in the theory. The latter constitute the real propositional content of the theory, and this is what should be used to test its computational (and indeed explanatory) adequacy. This characterisation, in which the structures produced by the theory are separate from the algorithms used to recognise and parse them, then has a significant advantage over purely declarative models: it can not only provide a constrained phonological analysis, but also delimit the class of phonological processes and objects. Since there is no formal difference between the procedural formulation of the analysis and the generalisations it captures, we can conclude that there is no material difference between the statements of the Turkish vowel harmony processes in (3) and their declarative formulation given in conditions 1 and 2 above, below (4). Thus freed from the need to "undo" the effects of the processes expressed in (3), we are entitled to conclude that the algorithm given in (5) does in fact constitute a proof of the computational adequacy of the recognition of forms involving a spreading analysis in Government Phonology.

Returning to the question posed in the introduction then, I believe I have shown that certain substantive constraints and principles of Government Phonology do indeed have formal constraining power. In particular, I have demonstrated how the fact that constituent structure is assigned lexically and remains fixed throughout a derivation (the phonological Projection Principle) combines with the Uniformity Condition to ensure that representations generated in a common type of harmony process are computationally tractable. Although we have not dealt with the "killer" cases mentioned in the introduction, namely those involving ambient elements or the apparent replacement of elements, the demonstration given covers a large chunk of the derivational machinery used in Government Phonology. Therefore I suggest that if a phonological theory is organised along the lines of current Government Phonology, it is possible for it to satisfy both the goals of formal restrictiveness and explanatory adequacy at the same time.
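The claimed extensional equivalence of the procedural and declarative formulations can also be illustrated directly. In the sketch below (Python again, and again purely illustrative: the notation with "E" for a suffix vowel unspecified for backness is invented, and only backness harmony is modelled), a procedural "spreading" function and a declarative well-formedness condition pick out the same surface forms:

    FRONT, BACK = set("ie"), set("ıa")

    def spread(form):
        # procedural reading: fill in each unspecified vowel "E" with the
        # backness of the most recent specified vowel
        out, back = [], False
        for c in form:
            if c in FRONT or c in BACK:
                back = c in BACK
            elif c == "E":
                c = "a" if back else "e"
            out.append(c)
        return "".join(out)

    def well_formed(form):
        # declarative reading: all surface vowels agree in backness
        vowels = [c for c in form if c in FRONT or c in BACK]
        return all((v in BACK) == (vowels[0] in BACK) for v in vowels)

    # spread("evlEr") -> "evler", spread("kızlEr") -> "kızlar";
    # every output of spread satisfies well_formed.

Whether one computes spread and compares, or simply checks well_formed, the same set of surface forms is accepted; nothing hinges on treating the generalisation as an event rather than a condition.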
Notes

*   I am grateful to Stefan Ploch for his helpful and detailed comments on the first draft of this paper, which have gone far beyond what one normally expects from an editor and have helped to clarify some of the arguments and conclusions. Naturally all remaining errors, omissions and lack of clarity are entirely my own responsibility.
1.  Ploch (1999c) also challenges some of the advantages that declarative approaches are claimed to have over derivational frameworks according to Coleman (1995, 1998).
2.  It is not a necessary feature of the analysis, however. A context-free grammar could allow for word-final consonants in the onset by treating the word-final syllable as a special constituent which is permitted to consist of a well-formed onset followed by an empty nuclear position.
3.  Since any theory which makes empirical claims is both testable and formalisable, there is a trivial sense in which the answer to the question is Yes. I am specifically addressing a stronger form of the question, though, in which we take a theory, expressed as a set of principles, substantive constraints, formal objects and operations allowed on them, and evaluate the extent to which it meets (my own) criteria of formal adequacy. Notice that this is the exact opposite of the strategy taken in the declarative approach, where one starts out with a particular formalism and evaluates its ability to capture empirical facts.
4.  I discuss the properties of an algorithm which can achieve this, at least in the case of isolated words, elsewhere (Williams 1994; 1998: chapter 7).
5.  Shieber (1985) makes the same point in his demonstration that the syntax of Swiss German requires greater than context-free power.
6.  Specifically, the mapping must be able to be performed in polynomial time.
7.  The '3' in '3-SAT' refers to the fact that there are exactly three literals in each clause.
8.  Berwick (1991: 130) gives the technique the more fitting description of intensional grammatical framework analysis, as opposed to the extensional or language analysis given by the study of weak generative capacity.
9.  This succinct characterisation of how the Universal Recognition Problem relates to parsing difficulty is taken from van de Koot (1995).
10. In Ristad (1990) the argument is somewhat different. It relies on the fact that the "stricture" feature, which essentially determines whether a segment is a C or V, is ellipsed and hence not available by inspection.
11. According to an early proposal in Harris (1994: 170-176), the direction of government was determined by the complexity gradient between adjacent segments, expressed in terms of an element count. In more recent versions of Government Phonology, most researchers argue for a reduced set of elements, rendering the complexity argument no longer workable. I do not take a stand on that issue here, since whatever the eventual resolution of the problem, the claim that constituent structure is fixed at the lexical level is taken to be axiomatic.
12. A possible counter-example to this claim comes from the absence of t in the coda in English (*-tp, *-tk). Since all stops would be expected to display the same governing properties, there must be some additional constraint operating to produce this distributional asymmetry. A speculative account is given in Harris (1994: 289n46), in which he proposes to explain the data by appeal to the absence of geminates in English.
13. A thorough Government Phonology analysis is given in Charette and Göksel [1994] (1996), [1996] (1998); Ploch (1998, 2002b).
14. Although Turkish does have long vowels (mainly in Persian and Arabic loanwords), these are never vowel-harmonic. It is likely that these will turn out to be analysed as sequences of non-branching nuclei, as has been proposed for surface long vowels in Semitic languages.
15. One can object to these arguments on the grounds that they do not address the main implication of a derivational model, namely that more than one phonological level is recognised. If the same generalisations can be captured with a one-level model, then we should prefer that model on grounds of economy. My purpose here, however, is to defend the standard GP model against charges of computational inadequacy, rather than on its linguistic merits, which I am taking as given. This is not to say that GP cannot be improved upon, or that the same results could not in principle be achieved in a one-level system. Many phonological phenomena can be described in terms of surface-true generalisations, implying a one-level model. However, I know of no phenomena that can be explained more successfully within a single level than in a two-level system. If derivations of other types of process within the GP framework turn out to require unfeasible amounts of computing resources, unlike the harmony process I have discussed here, I would then be convinced of the need to attempt to reduce phonology to a single level.
Structure paradoxes in phonology*

Harry van der Hulst
Introduction

Referring to arguments that suggest that phonological hierarchical structure is present both lexically and postlexically, I argue in this article that the latter is not built "on top of" the former but rather independently from it. This allows lexical structure (up to the maximal domain that the lexicon provides) to differ from the postlexical structure relevant to the corresponding domain. Allowing co-existing, different structures solves many of what I will call (phonology-internal) structure paradoxes. One of the most difficult and most debated issues in linguistics is whether a principled division can be drawn between "words" and "sentences". Discussions of the dichotomy involve a great number of term pairs for both dimensions of language (or grammar). Linguists working on the structure of utterances from the viewpoint of a correspondence in the compositionality of form and meaning try to argue in favour of (or against) a distinction between "morphology" (words) and "syntax" (sentences). Psycholinguists may talk about "storage" (words, but also idioms) and "computation" (sentences, perhaps including inflection). Phonologists, when addressing this issue, will refer to "lexical" (word) phonology and "postlexical" (sentence) phonology. Taking a historical perspective, I refer to at least one earlier, structuralist writer who made a principled distinction between the notions word and sentence, e.g. E. Sapir (1921, chapter 2). In this article I will focus on a specific consequence of making a distinction between "lexical phonology" and "postlexical phonology". Roughly, we can equate "lexical phonology" with the phonology of units that are stored or processed in the lexicon, whereas postlexical phonology applies to units that are constructed in the syntax. Needless to say, there is no way of knowing beforehand (i.e., on
theoretical grounds alone) which units are stored or processed in the lexicon and which units are constructed in the syntax. All the notions and distinctions involved are heavily theory-dependent, and different linguists have proposed very different demarcations using many different types of evidence. Turning to phonology, the distinction between two levels of phonology has been revived by Kiparsky (1978, 1985) in a model called Lexical Phonology. Within lexical phonology, various criteria have been mentioned to differentiate between so-called lexical rules and postlexical rules; cf. the overview in Kaisse and Shaw (1985). Assuming that a distinction between lexical and postlexical phonology is warranted, I wish to focus on some specific issues regarding the division of labour between the two levels. The treatment of phonological phenomena that are limited to the word is, in principle at least, open to both lexical and postlexical analysis. This is because the word is available lexically (as the domain of maximal size, ignoring idioms), as well as postlexically (as the domain of minimal size). If hierarchical structure is exclusively assigned postlexically, processes or alternations that are conditioned in terms of hierarchical structure must be postlexical as well. We then expect these processes to always have the properties that have been argued to identify postlexical processes, such as exceptionlessness, variability, gradualness and blindness to grammatical information (cf. Kiparsky 1978, 1985; Kaisse and Shaw 1985). Arguments have been provided, however, for assigning or (to state it non-derivationally) having "prosodic" structure in the lexicon. In many phonological analyses and models it has been argued that syllable structure and even foot structure is present in, or assigned to, lexical representations in order to feed phonological constraints or rules that themselves are assumed to be lexical. Furthermore, it has also been proposed that lexical representations may involve a notion of "phonological" or "prosodic" word (sometimes called the "word level" and usually the output of level I morphology) that does not coincide with the notion of morpho-syntactic (or grammatical) word, i.e., the output of the last level of morphology and the smallest unit of syntax.
The presence of hierarchical structure in the lexicon raises the question as to whether postlexical structure is built on top of the lexical structure (possibly with some adjustments) or from scratch. The latter view is often taken to involve 'deforestation' or 'erasure' of the lexical structure. Assuming, however, that postlexical structure is built from scratch does not necessarily involve such destructive measures. We could also hold the view that lexical and postlexical structure exist simultaneously in different planes (the x-line represents the phonological skeleton): (1)
a.  postlexical structure
    lexical structure
    xxxxxxxxxxxxxxx

b.  lexical structure
    xxxxxxxxxxxxxxx
    postlexical structure
In a model like (1b), we will encounter the lower levels of organisation (syllables, feet, words) twice, both at the lexical level and at the postlexical level. The hypothesis in (1b) is not novel as such. J. Anderson and Ewen (1987: 122) defend it quite explicitly and so does Helsloot (1997). Some of the phenomena to be discussed here are also discussed in these works. At the level of syllabic organisation, Baumann (1996) has argued for two levels of representation. J. Anderson and Ewen (1987) make the important point that a specific lexical structure may correspond to various postlexical structures, because postlexical structure is in part determined by factors like rate of speech and speech style/register. In this article I wish to advance further argumentation in favour of model (1b) by showing that it provides a solution to many structure paradoxes in phonology. Before I proceed, I will explain the notion of "structure paradox". A structure paradox arises when a linguistic unit appears to have two (or more) incompatible structural descriptions, each being based on different types of arguments. The term was first used in morphology, where a structure paradox refers to the fact that in words like model theoretic, standard "level theory" demands that the structure is as in (2a), which is also motivated phonologically, while semantic considerations argue for (2b):
a.  [ model [ theory ic ] ]

b.  [ [ model theory ] ic ]
Referring to this ambiguity as a paradox seems to reflect the belief that the availability of two structures raises a problem; it is apparently assumed that it could not be the case that both structures (as morphological structures) have a right to exist. To solve the paradox one could argue against one of the structures. Another route is to say that both structures, being correct, belong to different "articulations": one is morpho-syntactic (2b), implying that standard level theory is just false, while the other, (2a), is phonological. In that case, even though two structures are argued for, their conflicting organisation is no longer suspicious because the structures exist in different modules (i.e., morpho-syntax and phonology). Structure paradoxes have also been identified within phonology. Keyser and O'Neill (1985) suggest that Old English words have two simultaneous metrical organisations. The problem is the following. Primary accent is always initial, disregarding syllable weight, whereas secondary accent appears to be weight-sensitive. How can a metrical structure be weight-insensitive and weight-sensitive at the same time? Keyser and O'Neill argue that this can only be the case if two planes of metrical organisation are recognised: (3)
    Left-headed QI unbounded word tree
    x x x x x x x x x x
    F          F          F
    Left-headed QS bounded foot structure
Dresher and Lahiri (1991) disagree strongly with postulating two metrical organisations and plead for what they call "metrical coherence", i.e., one level of metrical structure that accounts for all the
facts. They propose a new type of foot structure ("The Germanic Foot") for that purpose. In a case like this it seems unlikely that one could accept both structures by arguing that they belong to different articulations, since both structures quite clearly are "phonological" in nature. However, if one adopts the position in (1b), it can be argued that the two structures are both valid, one in the lexical plane and the other in the postlexical plane. The stress (or accentual) facts of Old English would, in that case, be accounted for as a mix of lexical and postlexical constraints. The consequences of this separation of a word's accentual structure over two planes are far-reaching. In this particular example, we would, in fact, be claiming that the treatments of primary accent and secondary accent are separated, and in particular that the former is not built on top of the latter. This contradicts the standard metrical view which (in derivational terms) builds feet first and the so-called word tree on top of that: (4)
    Word tree
    Foot structure
    x x x x x x x x x x x x
In my own work on word accentuation (van der Hulst 1984, 1996, 1997), I have argued in favour of the separation of primary and secondary accent, as in (3). More specifically, I have proposed a model in which primary accent is assigned first or independently of secondary accent. Often, indeed, assignment of primary accent is clearly lexical (making use of foot structure as a computational device to locate the syllable with primary accent), whereas rhythmic structure (represented in terms of exhaustive foot structure) seems completely insensitive to lexical information. I have therefore argued that rhythmic structure is assigned "later", the precise level remaining to be identified as either "late" lexical or postlexical (in the domain of the prosodic word, clitic group or phrase). Rhythmic foot structure must pay
respect to the primary accented syllable, but not to the entire structure that was built to account for its location, if this indeed required more than one peripheral foot.1 With respect to this particular example, I have not investigated whether foot structure in Old English is late lexical or postlexical. A decision depends on the nature of the kinds of processes that have been claimed to be foot-sensitive, and, in addition, on the necessity and nature of such a distinction in the first place. The general point I wish to make in this article is that many debates regarding the exact form of phonological structure involve structure paradoxes, and thus that these debates can be resolved by recognising that the opposing proponents are talking about structures at different levels. The hypothesis of having parallel lexical and postlexical structures will be referred to here as the Duality Hypothesis. A sceptical note is perhaps in order before we proceed. The Duality Hypothesis is of course much less restrictive than the "coherence" view advocated in Dresher and Lahiri (1991). We are, after all, solving problems by adding additional structure. It is therefore important to evaluate the Duality Hypothesis critically and to see whether the proposed enrichment of the theory is counterbalanced by a healthy increase in insight into the nature of phonological phenomena.
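Before turning to the examples, the Duality Hypothesis can be made concrete as a representation. The sketch below (in Python; the class names and the span-based encoding are illustrative choices, not part of any of the proposals discussed) stores the lexical and the postlexical organisation as two independent parses over one shared skeleton, so that neither is built on top of the other and nothing needs to be erased:

    from dataclasses import dataclass, field

    @dataclass
    class Parse:
        # each constituent is a (label, start, end) span over skeletal positions
        constituents: list = field(default_factory=list)

    @dataclass
    class Word:
        skeleton: str
        lexical: Parse = field(default_factory=Parse)
        postlexical: Parse = field(default_factory=Parse)

    # the Old English case of (3), read as proposed here: the weight-insensitive
    # word tree on one plane, the weight-sensitive feet on the other
    w = Word(skeleton="x" * 10)
    w.lexical.constituents = [("word tree (QI, left-headed)", 0, 10)]
    w.postlexical.constituents = [("F", 0, 3), ("F", 3, 7), ("F", 7, 10)]

On this encoding, a "structure paradox" is simply two parses that disagree; the disagreement is only problematic if one insists that a single parse must serve both planes.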
1. Examples of structure paradoxes in phonology

1.1. Syllabic structure paradoxes
Phonologists dealing with syllable structure often disagree on the question as to whether syllable onsets can contain more than two consonants. Proponents of Government Phonology, for example, claim that onsets can contain no more than two consonants (cf. Kaye, Lowenstamm, and Vergnaud 1990), and some (like Lowenstamm 1996a) even hold the apparently untenable claim that onsets are universally monosegmental. Other researchers, pointing to languages containing words that "audibly" begin with sequences of more than two consonants, argue that the Government Phonology claim is
clearly false. They propose instead models that allow a fairly liberal adjunction of consonants to the left-hand side of the syllabic nucleus. The relative markedness of complex clusters remains unaccounted for in such models; cf. van der Hulst and Ritter (1999a) or Ewen and van der Hulst (2001) for an overview of various approaches. Sometimes it is acknowledged, however, that the researchers in question may be talking about two different kinds of objects, viz. the phonological syllable and the phonetic syllable, respectively. Depending on what one means by terms like "phonological" and "phonetic", this distinction might come close to what I will argue for here (i.e., the lexical/postlexical distinction), except that I will use the terms "phonotactic syllable" and "prosodic syllable", both of which I reckon to be phonological objects. It seems obvious that the prosodic syllable is much closer to the phonetic substance of utterances than the phonotactic syllable. Besides the controversy concerning the amount of complexity that syllabic constituents like onsets and rhymes can take, a more fundamental disagreement exists between proponents of onset/rhyme theories and proponents of moraic models of the syllable. For a simple syllable like /pan/ the two approaches argue for the following two structures: (5)
    OR structure:    [σ [Onset p ] [Rhyme a n ] ]
    Mora structure:  [σ [μ p a ] [μ n ] ]
(Some moraic models adjoin prevocalic segments directly to the syllable node, rather than to the first mora.) Proponents of both OR-models and mora-models typically base their case on quite different arguments. OR-phonologists point to phonotactic facts, i.e., generalisations regarding the segmental structure of lexical items. Here it seems that independent statements can be made for onsets and rhymes. Well-formed words can then be seen
as alternating sequences of such units, possibly with the proviso that special types of onsets and rhymes are allowed at the periphery of words. A consequence of the latter possibility is that word edges often allow for a greater complexity than what is found word-medially. Moraic phonologists do not talk about phonotactics so much. They focus on the interaction between syllabic structure and higher prosodic structure, specifically foot structure. The relevant property of syllables is their relative prominence with respect to rhythmic foot structure, i.e., their "weight". As is well known, determining weight does not depend on the presence or complexity of onsets. The "invisibility" of onsets forms an embarrassment for OR-theories. A possible resolution to the OR-mora debate would be to argue that the level at which weight-sensitive foot structure is computed does not have access to onsets at all, because at this level syllables have a moraic structure. This moraic structure provides the prominence peaks relevant for rhythmic foot assignment. Compatible with this, given the moraic structure in (5), is that even though specific reference to (branching) onsets is impossible postlexically, prevocalic consonants may play a role in determining prominence. Goedemans (1998) indeed argues that this is the way to look at cases that have been put forward as involving onset relevance. In line with the separation of lexical and postlexical structure proposed above, it is now tempting to say that a moraic structuring of the syllable and the formation of rhythmic foot structure belong to the same plane, possibly the postlexical plane. It has, in fact, been argued that the moraic type of organisation better fits the prosodic hierarchy, in which each layer contains only one type of constituent (van der Hulst 1984; Nespor and Vogel 1986). The moraic structuring of syllables fits this hierarchical architecture much better than syllable nodes that dominate onsets and rhymes. In the latter case an additional problem is that weight must be computed non-locally, i.e., by-passing the syllable nodes, since the structure of rhymes, rather than the syllables themselves, is relevant for foot formation. The moraic model does not have this problem. A heavy syllable is a branching (bimoraic) syllable, and a light syllable is non-branching.
A potential problem arises in connection with my claim that primary accent is lexical. If lexical syllable structure is OR-based, one might infer that weight differences are predicted to play no role in the computation of primary accent location. This, however, is not necessarily so. Weight distinctions can be computed lexically by making lexical structure sensitive to the structure of rhymes only. This more abstract manner of weight computation seems more appropriate at the more abstract, computational lexical level. This proposal implies that there may be differences between weight distinctions at the lexical and the postlexical level. Below, I will show that this is the case in Dutch. Elsewhere I have argued that differences in what counts as heavy for primary and rhythmic stress indeed support the separation between them (cf. van der Hulst 1984, 1996, 1997). From the present viewpoint, lexical factors would involve structural complexity of the nucleus/rhyme, whereas postlexical factors would involve the more general notion of prominence, a point also made in Dresher and van der Hulst (1998). The suggestion that lexical and postlexical syllabification differ as radically as discussed above needs further investigation. I am not committed to the idea that the postlexical syllable is moraic. Perhaps it is a "flat" structure à la Kahn (1976). There are, however, several other syllabic structure paradoxes that could be mentioned, and solved along the same lines. One example involves the question whether or not intervocalic consonants can be "ambisyllabic". Ambisyllabicity does not go well with an OR-theory. Standard views on constituent structure disallow nodes to be dominated by two mothers. Arguments in favour of ambisyllabicity are usually based on rather low-level processes, such as flapping in English or various other forms of lenition or weakening, which are arguably postlexical rather than lexical. Phonologists adhering to an OR-approach at the lexical level usually assume rules of resyllabification to account for ambisyllabicity at later levels. Selkirk (1982) claims that this kind of resyllabification is precisely what characterises so-called ambisyllabic consonants. Again, we could approach this issue in terms of assuming different structures at different levels, rather than in terms of restructuring. We could say that postlexical syllabification allows ambisyllabicity, while lexical syllabification does not. There is no need
for resyllabification. This entails that phonological generalisations that demonstrably concern lexical structure should never need to refer to ambisyllabic consonants. Apparent lexical reference to "ambisyllabicity" can be avoided anyway since, as argued in van der Hulst (1985), surface ambisyllabic consonants can be represented lexically as geminates. No language seems to have a phonemic contrast between geminates and ambisyllabic consonants, a fact that is correctly predicted by this proposal. A further interesting possibility arises if we recognise that a language may lack evidence for lexical syllabification altogether. This, for example, is what we could claim for the much debated Tashlyt Berber situation. This language has received a lot of attention due to its amazing syllabification algorithm. Most discussion is based on the work by Dell and Elmedlaoui (e.g., 1985). Since in this language apparently any string of consonants can be "syllabified", I would like to argue that the only type of syllabification that we have is postlexical. There is simply no evidence for lexical syllable structure if there are no phonotactic restrictions on sequences of segments that form well-formed lexical words. Perhaps it is not surprising that there are no reported lexical (ir)regularities in terms of word accent either for this language. That languages with these kinds of consonantal strings appear to lack word accent may be because they just do not have any kind of headed structure at the lexical level, which may ultimately be caused by the fact that lexical items are mostly consonantal, and thus not syllabified word-like entities. Postlexically, there are undoubtedly rhythmic structures and accents that mark prosodic domain boundaries. The location of these accents may be "variable" and dependent on phrasal context, and factors such as rate and style of speech. As Bolognesi (1998) argues, the driving force behind postlexical syllabification is rhythm, i.e., a regular alternation of prominence peaks and troughs. At this level, any kind of segment can apparently be a peak, even obstruents. In the perspective taken in this article, this potential of obstruents does not have a direct bearing on what is possible at the lexical level. Languages that show evidence for phonotactic regularities at the word level seem to make no use of obstruents, or any type of consonant, in the syllable head position. In apparent cases, "syllabic consonants" can be represented as empty-headed syllables, as proposed in Government Phonology approaches.2 Government Phonology has proposed very strong constraints on the maximal size of syllabic constituents, and it would seem that these constraints are massively violated in many languages. However, if we simply abandon these constraints in favour of unlimited adjunction approaches, we essentially end up without a theory of syllabic organisation, and many phonotactic regularities, even with respect to seemingly unlimited consonant clusters, must be regarded as synchronically arbitrary. The Duality Hypothesis provides a way out by allowing the string of segments to be organised differently at different levels. The lexical organisation is primarily responsible for accounting for phonotactic regularities. This organisation contains "empty nuclei". The postlexical organisation is responsible for the rhythmic structure and many low-level regularities. It does not seem to make any reference to empty nuclei, and it also does not seem to care so much about crisp edges.
1.2. Foot-level structure paradoxes
Going one level up, to foot structure, we encounter again various cases of structure paradoxes. Let us consider the case of Dutch. In order to make sense of the regularities that we find in the location of word accent, we can formulate a metrical algorithm (cf. van der Hulst 1984; Kager 1989; Trommelen and Zonneveld 1999). This algorithm must appeal to various properties of syllabic and foot structure (on the left-hand side in (6)) that eventually do not end up in the postlexical prosodic structure (which shows the right-hand side structures): (6)
        Lexical                           Postlexical

a.      Geminate                          Ambisyllabic consonant
        (c v c)(c v)
        l e m  m a                        (le(m)a)

b.      Empty syllable                    Superheavy syllable
        (c v)(c 0)
        r ä m  0                          (ram)

c.      Monosyllabic foot                 One foot
        [{(C V C)}{(C V C)}]
        h a r  n a s                      [{(har)(nas)}]

d.      Extrametrical syllable            Ternary foot
        [{(c v)(c v)}]
        d o  m i  n e                     [{(do)(mi)(ne)}]
(The square brackets indicate word boundaries, braces foot boundaries and parentheses syllable boundaries. Capital vowels represent lax vowels.) The stress evidence shows that the so-called ambisyllabic consonants make the preceding syllable heavy. Thus we must either represent them as geminates or assume that they are codas only (cf. van der Hulst 1984; Trommelen 1983; Kager 1989). In both cases we set up a syllabic structure that is not supported by the surface syllabification in which a single consonant seems to belong to both syllables. With respect to (6b), there is evidence that final superheavy syllables behave lexically like "two syllables", like a branching foot. In the postlexical phonology we seem to be dealing with a single closed "superheavy" syllable. (6c) represents a quite interesting case. In order to arrive at a consistent metrical analysis, we must assume that final closed syllables form feet by themselves which, because they are monosyllabic, cannot be word heads. Gussenhoven (1993), however, points to intonational evidence which suggests a different kind of foot structure in such cases. On the basis of the "chanted call" he points out that a word like harnas behaves like one foot. In my view, (6) reflects several clear examples of structure paradoxes involving foot structure. Regularities at the lexical level (involving the location of primary accent) point to one structure, while "later" regularities (in this case involving tune-to-text association) point to a different structure. In the kinds of surface feet that Gussenhoven needs, closed syllables are no longer prevented from occurring
in the weak position in a foot. This shows that the weight-criteria for lexical and postlexical feet are different. Helsloot (1997) reaches the same conclusion on rhythmic grounds. It was suggested in van der Hulst and Moortgat (1981), in fact, that postlexically in Dutch weight primarily involves the distinction between full vowels and schwallables (a point also suggested in Bolinger 1981 for English) rather than a distinction between open syllables (with tense or long vowels) and closed syllables (with lax or short vowels). There is further evidence for distinguishing between lexical and postlexical footing. The location of secondary accents does not really follow from the lexical algorithm which operates from right-to-left forming a right-headed structure. Secondary accent is governed by a separate algorithm that places a strong secondary accent on the initial syllable, which suggests that this is a left-to-right, left-headed organisation; cf. van der Hulst (1984). I claim here that all these things follow from one fact: there are two metrical algorithms, one lexical (or phonotactic) and one postlexical (or prosodic). The lexical algorithm is right-edge oriented and sensitive to the difference between closed syllables (heavy) and open syllables (light; the latter may include schwallables, which are arguably always open; cf. Oostendorp 1995). The prosodic algorithm pays respect to the location of primary accent which has been lexically determined. This algorithm is left-edge oriented, placing a strong secondary accent on the first syllable. In terms of its further rhythmic effects it is weight-sensitive in the sense of distinguishing between full vowels and schwa. Thus, we reconcile the findings of van der Hulst and Moortgat (1981) (arguing for the relevance of the full vowel - schwa distinction), of van der Hulst (1984) (arguing for a dependency of rhythmic structure on primary accent), and of Gussenhoven (1993) (arguing for the lightness of closed syllables). As in the case of syllabification, languages do not necessarily have lexical feet. Seemingly endless debates regarding the question whether Indonesian has word stress or not, might very well relate to this. Perhaps the situation is as follows. Lexically, there is no word accent, but postlexically there is. One might say: postlexically there
88 Harry van der Hülst
will always be word accent since no language can escape from having a postlexical prosodic organisation which serves as the interface to the production and perception system. That postlexical accent is less easy to pin down is undoubtedly caused by two facts. Firstly, words may occur in various postlexical structures which may result in differences in word accentuation. Secondly, as mentioned above, there will be various postlexical structures depending on factors like speech rate and style. French is probably another case in point. French words do not have lexical accent. Accentuation in French is a phrasal, postlexical phenomenon. This fact has the consequence that the treatment of schwa-zero alternation (which is, in part, dependent on accentuation) must also be postlexical. Van der Hulst and Rowicka (1997), Rowicka (1999a,b) propose that the kinds of proper government structures that Government Phonology proposes for phonotactic reasons can be regarded as the lexical hierarchical organisation. In such cases the organisation is not manifested in terms of accentual cues, but is still needed for phonotactic reasons. Thus these authors argue that the distribution of silent empty nuclei is very similar to the distribution of unstressed syllables, while the two cannot be identified with each other. This point brings to the surface that even though the point of this article is to argue in favour of separating lexical and postlexical prosodic structure, thus allowing the two of them to be different, it would be quite unexpected to find that both kinds of organisations are of a completely different nature. The argument often raised by those who argue against Government Phonology, i.e., that the empty nuclei that this approach postulates seem to play no role in accent assignment, is no longer valid. The proposal made here simply does not claim that such empty nuclei are present or visible postlexically. Thus, if in a language accent is postlexical, we expect empty nuclei to have no influence at all. If, on the other hand, primary accent is lexical, we do expect empty nuclei to play a crucial role. We have just seen that the latter is indeed the case in Dutch, which explains the behaviour of final superheavy syllables (cf. 6b).
Structure paradoxes in phonology
89
A question that must be addressed is whether in specific cases, the lexical foot level could account for both phonotactic facts and aspects of the accentual structure. S. Yoshida (1999) addresses this issue, arguing that in the case of Cairene Arabic one and the same structure accounts for vowel-zero alternations (in terms of proper government) and for accentual structure. For further elaboration of this point I refer to Rowicka (1999a) and van der Hulst and Rowicka (1997).
1.3. Word-level structure paradoxes
Let us now raise the question as to whether it makes sense to recognise a notion of phonological word at the lexical level. Inkelas (1989) gives an affirmative answer. She "translates" the morphological level I-level II distinction into lexical "prosodic" structuring. The output of level I is said to form the "alpha-domain", and the output of level II the "beta-domain". The output of level I has also been referred to as the "word-level" in, for example, Borowsky (1994). It seems to me that this word-level is also the domain that in Government Phonology corresponds to the so-called "analytic domain" (Kaye [1993] 1995). If I am correct in viewing government relations as lexical foot structure, we expect that the analytic domain corresponds to the lexical phonological word, and this is what I wish to claim; cf. also van der Hulst and Ritter (1999a). In line with the terminology introduced here, I will refer to this notion of word as the phonotactic word. It is interesting that the phonotactic word quite clearly does not correspond to the postlexical prosodic word. Consider the following example. Suffixes like -ing and -er in English and Dutch are considered to belong to level II, which implies that they fall outside the phonotactic word. Yet, postlexically, the final consonant of sing forms an onset to the vowel of the level II -er suffix, suggesting that both form one prosodic word. To solve this paradox several phonologists (e.g., Booij 1995) have proposed a recursive prosodic word level ("PrW" stands for "prosodic word"):
(7)  Recursive prosodic word level (after Booij 1995)

     [PrW' [PrW sing ] er ]
McCarthy (1993) also proposes recursivity of PrW, but he discusses postlexical phenomena (the linking [r] in the idea[r] is) and it would seem that PrW' in his case replaces the postlexical clitic group; cf. Peperkamp (1997) for discussion. I would like to suggest that recursivity is the wrong way to go. I will assume that the structure of singer is as in (8a) at the lexical level and as in (8b) at the postlexical level: (8)
a.  Lexical level:      [ClitGr [ sing ] er ]

b.  Postlexical level:  [PrW [F sing er ] ]
In van der Hulst (ms.) I argue that we need the Clitic Group ("ClitGr") at the lexical level anyway, to make sense of the so-called (syllabic) appendix, which can be added to structures that have already reached the limits of what a phonotactic word can take, e.g.: (9)
     The clitic group (after van der Hulst, ms.)

     [ClitGr war m0 st0 ]    (superlative of warm 'warm')
     [ClitGr her f0 st0 ]    (underived 'autumn')
We can conclude from this that the lexical phonotactic organisation goes beyond the word level: in (9), we see a phonotactic clitic group.
Lexically, compounds also form hierarchical structures that consist of two phonotactic words, which we can call phonotactic phrases. It seems clear that we do not wish to project these prosodic units to the postlexical phonology as postlexical prosodic phrases. Postlexically, compound word accent is not more prominent than the word accent of non-compound words. The word accent of the right-hand compound member has the status of a secondary accent. Therefore, postlexically a word like carnaval may not be different from muizeval ('mouse trap'). Again cf. Helsloot (1997) for similar claims and examples. Finally, let me make it clear that the lexical level is not identical to the "old" structuralist, phonemic level. Borowsky (1994) has clearly shown that the "word-level" (which is here called the phonotactic word) is not strictly phonemic. She gives examples of several processes in a variety of languages that must apply at this level, while being allophonic. Interestingly, both diminutive formation and plural formation in Dutch, both level II operations, suggest sensitivity to secondary accents; cf. van der Hulst (1984). Since both morphological processes belong to what is usually called level II morphology, this suggests that the phonotactic word can be the domain for rhythmic foot structure, and thus that the lexical structure does not only comprise the head foot (for primary accent), but is a more complete hierarchical structure that covers the whole word.3
Summary and conclusions

Summarising, we have seen that there is ample evidence for assuming hierarchical phonological structure in the lexicon. This structure is necessary to capture phonological generalisations of lexical units (lexemes, excluding idioms). These generalisations involve phonotactic patterns, accent location and also alternations in the segmental shape of morphemes. I have demonstrated that the required structures resemble postlexical constructs but in almost all cases are different (sometimes in subtle ways). Maintaining that postlexical structure is built on top of lexical structure would require numerous adjustments and destructive operations. The view that I advocate here is that postlexical structure is built independently, leaving the lexical structure untouched and intact. That lexical prosodic structure resembles postlexical structure is not surprising. Regularities in the sound structure of languages start life at the postlexical level. Over time such regularities percolate into the lexicon, leading to restructuring of lexical entries. Meanwhile, or as a result of that, these regularities are overruled by new postlexical processes, making them opaque to a certain extent. This does not mean that these regularities are no longer worth capturing, but in order to do that one must be prepared to postulate more abstract structures that deviate from the structures that now predominate in the postlexical hierarchy. In fact, it strikes me as evident that generative phonology has been founded on the idea that generalisations can hold with respect to different levels, thus causing the phenomenon of opacity. However, opacity is usually thought of as applying to the segmental phonology only, assuming that hierarchical structure is predictable and assigned at the end of the phonological derivation. Once, however, we allow that hierarchical organisation crucially conditions phonotactics, accent placement and morphology,4 we should also recognise that the lexical and the postlexical phonology can have their own set of constraints on hierarchical structure.5
Notes

*   An earlier version of this work was presented at the "OT on the HIL" workshop, held on December 9, 1996, in Leiden, and at the MOT Phonology workshop held on February 7-9, 1997, in Toronto. Similar points are incorporated in van der Hulst and Rowicka (1997). For this publication, I have left the original ideas intact. A further development of these ideas can be found in van der Hulst and Ritter (2000a, b) and van der Hulst (to appear).
1.  I refer to my earlier articles for references to other work where similar ideas have been expressed.
2.  If one were to propose a lexical syllabification for this language, Government Phonology would force every consonant to be an onset. The resulting structure, then, would contain a lot of so-called empty nuclei. An analysis along these lines is offered in van der Hulst and Ritter (in prep.).
3.  In van der Hulst (to appear) I argue that both the lexical and the postlexical structure are always present and exhaustive (i.e., covering the whole word). Thus, no language would lack lexical syllabification or footing. Rather, these lexical structures may not be manifested in the kinds of phonetic cues that we often associate with accent.
4.  Indeed, further examples of structure paradoxes can be found in the area of Prosodic Morphology, where it seems evident that the kind of foot structure that is required is not the one that accounts for surface rhythmic structure; cf. Prince and McCarthy (1990); McCarthy and Prince (1993).
5.  An issue that I will take up in a further elaboration of this hypothesis concerns the question as to whether the lexical and postlexical phonology make use of the same formalism. In the light of the recent emergence of Optimality Theory, I suspect that an optimality-theoretical approach (involving language-specific ranking of constraints) is perhaps more likely to suit postlexical phenomena than lexical phenomena. Thus, it would seem that having two levels also allows us to solve paradoxes at a higher theoretical level (i.e., parameters versus ranked constraints).
An x-bar theory of Government Phonology*

John R. Rennison and Friedrich Neubarth
Introduction

This article originated as an outline of a few revisions to standard Government Phonology, but it has now grown to define a quite comprehensive theory and model of phonology, based on previous work in Government Phonology and on insights gained through failures of the standard theory to naturally explain some of the data. When the model of Government Phonology was set out during the 1980s, the main focus was on how to represent vowels and the structural relationships between them (proper government) and between and within other constituents like "onsets" and "codas". Certain phenomena were acknowledged as rather mysterious (e.g., the "magic licensing" of Kaye [1991/1992] 1996a), while others were rarely recognised as a problem (e.g., domain-final codas with more than two consonants). The representation of consonants always lagged behind that of vowels, even more so the representation of inter-consonantal relations.1 Indirectly related to this, in the past few years a vivid discussion has arisen about higher structuring, i.e., syllable structure, which one recent school of thought considers to be quite primitive (onset-nucleus/consonant-vowel pairs — cf. Lowenstamm 1996a; Szigetvári 1999), rather than mirroring the traditional view of syllabification, with its structure of onsets, nuclei, and codas (cf. Kiparsky 1981; Kaye, Lowenstamm and Vergnaud 1989; and for critical discussion see Ploch, this volume). What we want to do in this paper is the following: agreeing with the view that syllable structure consists only of CV pairs, we establish this minimal structure as the basic constituent of phonological theory. In particular, all licensing relations are defined solely on this level. For consonants, we will present a radically minimal system of elements, which
overcomes some redundancies and stipulative constraints of previous accounts and comprises a fundamentally more restrictive inventory. Finally, we will reconsider the mechanisms of licensing (uninterpreted vowels rendering consonant clusters), arguing that one class of consonant clusters can easily be accounted for if proper government (henceforth V-government) is complemented with a second notion, 'C-government' (which in part comprises interonset licensing and coda licensing, but differs in core technical details). The other class of consonant clusters, which coincides largely with the traditional branching onsets of standard Government Phonology, is analysed as complex single consonants. Before outlining our model, we will address a question which has often been posed in recent years: how phonetic is phonology? Here, on a par with most exponents of Government Phonology, we take a very strict point of view. The framework we present here is a formal theory designed to account for the patterning of phonological data, and it is neither motivated by, nor always directly relatable to, phonetics. This is a conceptual rather than an empirical matter. Phonetic output has no resemblance to or impact on phonology. We do not want to preclude that our view on how phonological structure is represented might have interesting consequences for a different conception of phonetics as a direct interpretation (articulatory or acoustic) of these structures. But we definitely want to exclude any structural relation between phonetics and phonology, or any structural interface whatsoever. Our present venture, perhaps better regarded as a research program, reflects what we believe to be a challenging recapitulation of the hard-gained insights from previous work by many people, combined with our own ideas about a unified model of phonology. As Jonathan Kaye instructed us, it is more fruitful to have a very restrictive theory which gets the basics right, and to stand with our backs to the wall and await objections.
1. Structure

1.1. Sylls
We assume the standard level of skeletal points (x) as the melodic side of the interface between (hierarchical) phonological structure and melody. Throughout the whole of linguistic (and therefore, a fortiori, phonological) structure, heads project, non-heads do not (Jackendoff 1977). We claim that phonological structure is universally right-headed at the base level (although this does not hold for higher levels of metrical structure).2 In terms of the skeleton, nuclei project, onsets do not. Thus, phonological structure is minimal: only right-headed binary branching constituents are permitted. Each skeletal point (x) is either the head or non-head of such a constituent, which we will call a syll (noted here with the symbol x̄).3 A syll thus consists of a pair of skeletal points, the second of which is the head. The non-head position is what is traditionally labelled a consonant (or onset), the head position a vowel (or nucleus).4 Thus, the labels C (or O) and V (or N), as well as the structure CV (ON), as the core of phonological structure, are derived from the familiar structural principles of binary branching, headedness and projection, as can be seen by comparing (1a) with its equivalent in "CVCV"-Government Phonology, (1b). (1)
a.  A syll constituent          b.  A CV (ON) pair

         x̄                           C   V
        / \                          |   |
       x   x                         x ← x

(The arrow indicates interconstituent government.)
In the present theory there is no onset constituent on any level higher than the skeleton. This means that onsets never "see each other" directly; i.e., interonset relations of every kind are mediated by the syll constituent in which the onsets are located, and are therefore often subject to (even multiple) parametric constraints. The x̄ node of the syll constituent is the structural side of the interface between structure and melody, and the two x nodes are the melodic side. The role of the skeleton is fundamentally diminished in this account, since the skeleton is entirely structurally determined. Elements (representing melody) can be attached to either of the two skeletal positions of a syll, but are never directly attached to the x̄ node. Interconstituent relations and higher-level (i.e., prosodic) phonological relations, on the other hand, can only relate to the x̄ node of a syll, and never directly to the skeleton. This has several benefits:

•   Onsets can never relate directly to prosodic structure. So, for example, no language can require that a particular nucleus must have a filled onset, an empty onset, or an onset linked to a specific melody element, except via constraints on syll constituents.
•   Nuclei, on the other hand, being heads, can be required by higher-level mechanisms to be filled, empty, or to have a particular melodic content.
•   There is no longer any direct relationship between a nucleus and its onset of the kind expressed by interconstituent government.
•   The syll, which is the minimal phonetically pronounceable phonological expression and (we claim) the minimal cognitive unit, is now the central structural unit of phonology.5
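These benefits can be summarised in a small data-structure sketch (in Python; the class and attribute names are purely illustrative, not part of the theory's formalism): melody attaches only to the two skeletal points, and the syll node itself is the only thing that higher structure can see, so the invisibility of onsets to prosody holds by construction.

    from dataclasses import dataclass, field

    @dataclass
    class SkeletalPoint:
        elements: set = field(default_factory=set)    # its melodic expression

    @dataclass
    class Syll:
        onset: SkeletalPoint = field(default_factory=SkeletalPoint)    # non-head x
        nucleus: SkeletalPoint = field(default_factory=SkeletalPoint)  # head x

        def nucleus_is_empty(self):
            # the kind of information the syll node may expose upward;
            # nothing about the onset is visible here
            return not self.nucleus.elements

    # a rough /pa/ syll: F plus U (labial) in the onset, an F-headed
    # ("A"-like) vowel in the nucleus; head/operator status is not
    # encoded in this simplified sketch
    pa = Syll(onset=SkeletalPoint({"F", "U"}), nucleus=SkeletalPoint({"F"}))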
1.2. Phonological domains
At the lowest level of structure, a phonological domain consists of a sequence of one or more sylls. It is typically, but not necessarily, a phonological word. Depending on parameter settings, a phonological domain on the lowest level may or may not contain certain morphological boundaries. All licensing relations are bounded by a phonological domain, and at the edges of a domain there may be special licensing mechanisms (e.g., final empty nucleus licensing). In principle, we agree with Kaye [1993] (1995) as far as the theoretical status of domains is concerned; however, as will become clear in the following, our definitions of phonological domains deviate from his and can be seen as more restrictive. At higher structural levels (metrical, prosodic), a domain consists of a sequence of one or more lower-level domains. It is more convenient to graphically represent higher-level domains such as feet or stress-groups as a tree structure. Unless otherwise stated, in this article the term "domain" refers to a domain of the lowest level. Licensing conditions are checked cyclically within phonological domains. Prosodic properties like stress assignment are also computed relative to phonological domains — which means that every phonological domain must contain at least one realised nucleus. A phonological domain may itself contain phonological subdomains.
2. Melody

2.1. Primitives
The set of melodic primitives must be finite, universal and minimal. In this version of Government Phonology it consists of the elements I, U, R, H, L, F (where F is the functional element, previously represented as or 'e'). The interpretation of elements depends in part on whether they are attached to a consonant or vowel position. This appears to be a quite straightforward move; however, it is very important to note that the labels "C" and "V" do not have any (melodic) content on their own. They are just structural properties derived by the principles of X-bar structure. Notice further that this is the only way to implement a difference between vowels and consonants without either reserving different elements or (maybe covertly) giving "C" and "V" element status. Their most common phonetic correlates are given in the table in (2), with the usual caveat that elements are hard-wired cognitive entities, and therefore neither articulatory nor acoustic in nature. Positions in (2) with a star (*) are impossible. ("Operator" is synonymous with "non-head". "ME" stands for "melodic expression".)
(2)
Element   ME   As head of ME       As operator in ME
F         C    stop                fricative/spirant
          V    "A" (non-high)      ATR
I         C    i-glide             palatal
          V    "I" (front)         front
U         C    u-glide             labial, "dark"
          V    "U" (rounded)       rounded
R         C    liquid              coronal
          V    * (see §2.5)        * (see §2.5)
H         C    fricative           aspiration
          V    * (see §2.5)        high tone
L         C    nasal               voiced
          V    * (see §2.5)        nasal/low tone
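Read as a function, the table maps a triple (element, position, role) to a phonetic correlate, with the starred cells undefined. The following sketch (in Python) is a direct transcription of (2) and nothing more:

    # (element, C-or-V position, head-or-operator role) -> phonetic correlate
    CORRELATES = {
        ("F", "C", "head"): "stop",           ("F", "C", "op"): "fricative/spirant",
        ("F", "V", "head"): "'A' (non-high)", ("F", "V", "op"): "ATR",
        ("I", "C", "head"): "i-glide",        ("I", "C", "op"): "palatal",
        ("I", "V", "head"): "'I' (front)",    ("I", "V", "op"): "front",
        ("U", "C", "head"): "u-glide",        ("U", "C", "op"): "labial, 'dark'",
        ("U", "V", "head"): "'U' (rounded)",  ("U", "V", "op"): "rounded",
        ("R", "C", "head"): "liquid",         ("R", "C", "op"): "coronal",
        ("R", "V", "head"): None,             ("R", "V", "op"): None,
        ("H", "C", "head"): "fricative",      ("H", "C", "op"): "aspiration",
        ("H", "V", "head"): None,             ("H", "V", "op"): "high tone",
        ("L", "C", "head"): "nasal",          ("L", "C", "op"): "voiced",
        ("L", "V", "head"): None,             ("L", "V", "op"): "nasal/low tone",
    }

    def interpret(element, position, role):
        value = CORRELATES[(element, position, role)]
        if value is None:
            raise ValueError(f"{element} cannot be a {role} in a {position} position")
        return value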
The main difference vis-à-vis previous proposals about the set of elements is the functional element F, which has a different interpretation in each of the given contexts. It replaces the former elements A, ʔ, I (ATR) and h (cf. Kaye, Lowenstamm, and Vergnaud 1985; Harris 1990, 1994). However, this small move removes two redundancies: the A- and the I (ATR)-element generally occurred only in vowels (although there are different conceptions about the A-element, e.g., the view that the A- and R-elements can be fused into one new element; cf. Kaye 1991; Ploch 1993; Williams 1998). The stop and the noise element are exclusively consonantal.7 These restrictions are now combined in the definition of a single element. The retention of R as an element and the interpretation of L-head in consonants and L-operator in vowels as 'nasality' are an integral part of the new setup of elements. Notice that an F-head in consonants is interpreted as "stop" (mainly correlated with silence) and not as "occlusive", which is a misleading term from articulatory phonetics and quite misplaced in a phonological theory. The occlusion found in nasals therefore requires no extra specification beyond an L-head: nasals no longer have any equivalent of the ʔ-element.
2.2. Attachment/association
In general, any element may attach to any skeletal position (be it a head or non-head x), though with a few restrictions outlined below. An attachment may be present in the lexicon, or it may result from licensing conditions (either universal or language-specific). We reject the Obligatory Contour Principle as a part of linguistic theory; even as a heuristic it is problematic. So there is no prohibition in principle against sequences of identical elements in consecutive positions. Elements which are not attached to a skeletal point in the lexicon (termed floating elements) may have some degree of leeway as to where they attach,8 and may also (parametrically) be realised phonetically as the second part of a contour segment9 (cf. Rennison 1998). Multiple lexical association of melodic expressions to skeletal positions is still an open question. If permitted, multiple associations could hardly be restricted in any principled way. We will therefore continue on the working assumption that multiple associations do not exist in the lexicon — neither of entire melodic expressions nor of individual elements to more than one skeletal position. This means that long vowels and geminates do not "share" melodic expressions or individual elements; instead, either the melodies in question are present twice, or one of the positions is lexically empty and is phonetically identified by mechanisms operating at the x̄ level, as outlined in §4 below.

2.3. Melodic expressions
The set of elements which are attached to a particular skeletal point is termed a melodic expression.10 In general, only a single occurrence of a non-functional element is possible within a melodic expression. Given the standard interpretation of phonological elements as unique acoustic signatures, the double or multiple occurrence of an element (with the exception of the functional element, F) would not change the phonetic interpretation of the expression in which it occurs. The functional element, F, on the other hand, can occur both
102 John R. Rennison and Friedrich Neubarth
as the head and as an operator of the same melodic expression. We know of no evidence which would point to multiple occurrences of the other elements (apart from the lazy elements discussed in §2.7 below), but there is ample evidence from neutral vowels in vowel harmony processes that a putative "additional" I- or [/-element has no phonetic effect if that element is already present in the vowel. Graphically, melodic expressions are represented as follows: The head element(s) come first (and are redundantly underlined). Elements following the comma are operators. The linear order of operators is in principle irrelevant. When a melodic expression has more than one element in its head, the operators that modify only the first head element have a suffix, and those which do not modify the first head element ("lazy operators") a '+' suffix (see §2.7). Where it is optically convenient, the entire melodic expression is surrounded by parentheses. 2.4.
Headedness
Every melodic expression that is not empty must have at least one melodic head. This runs counter to approaches that employ headedness as a parametric option to represent phonological distinctions, e.g., Cobb (1993, 1995, 1997); Charette (1994); Kaye (1994b). As can be seen from the table in (2) above, the phonetic realisation of a particular element as the head or an operator may differ. The functional element is the only element which is allowed to occur both as a head and an operator in the same melodic expression. This is due to its special status: the functional element has no unique acoustic signature. Instead, being functional, F maximises or distracts from the most discernible acoustic pattern of the respective syll position to which it is attached. In the case of consonants, maximisation means silence (stops) and distraction means noise (spirants/fricatives). With vowels maximisation is the concentration of energy in the middle of the relevant frequency band (which spans from ca. 350Hz to 2500Hz), i.e., old A, as opposed to I (significantly more energy in the upper part) and U (more energy in the lower part). However, as an operator, F distracts from the typical formant configurations
An x-bar theory of Government Phonology
103
(tenseness/ATR). On the acoustic signatures of the vocalic elements A, I and U cf. Harris and Lindsey (2000). 2.5.
Distributional asymmetries of elements
Contrary to the past tenet of Government Phonology, elements are neither equal in strength (see §2.8 below) nor balanced in distribution. Some of the most robust asymmetries have been build into the present theory by the introduction of the F-element and the reanalysis of nasals. We hope that eventually all asymmetries will be derivable from more general principles (possibly in conjunction with the affinities of element pairs); for the moment, we can only observe that they seem to exist. We also do not claim that the restrictions outlined here are the sole co-occurrence restrictions on elements within a single melodic expression, but they seem to be the most frequently observed ones. Future research may reveal others. a. Vowels can only be headed by F, I or U: Whilst melodic expressions in non-head positions (consonants) can include any selection of the six elements, either as melodic head or operator, skeletal head positions (vowels) must have either F, I or U as their melodic head.11 In other words, R, Η and L are excluded from the melodic head position of vowels. This is reflected in the segment inventories of languages: the number of consonants is usually larger than the number of vowels. There exist Circassian languages with only a single vowel (Job 1981), and a very large number of languages has precisely the three-vowel set (Ε), (I), (U) (e.g., most Australian languages — see Dixon 1980). Yet there is no language without at least a handful of consonants e.g., Pirahä, with three vowels but at least seven consonants — see D. Everett (1986). b. R can only be attached to a skeletal non-head (onset): The distribution of the element R is even more restricted: it simply cannot be lexically attached to a nucleus. Rhotacised vowels and "syllabic" coronals in our view always involve the spreading of R from an onset to a nuclear position.12
104 John R. Rennison and Friedrich Neubarth
c. R backgrounds U: If R is present in the representation of a melodic expression (as in /JV, /Θ/ or IV), an i/-element in the same melodic expression will never encode place (labial) but rather "darkness" or laterality. d. Operator F requires an F melodic head in a consonant: The functional element, F, can only occur as an operator in a consonant if the same element, F, is the head of the melodic expression. 2.6.
Complexity restrictions
A language may have parametric restrictions on the number or combinability of elements associated with a skeletal position. Although these restrictions are ultimately idiosyncratic, the phonological system must be able to express them in a direct way. At a particular type of skeletal position (head or non-head), there can be restrictions on the total number of elements, on the relative strength of elements, and on melodic headedness relationships. Let us first consider the most restrictive cases. In Circassian languages such as Kabardian (Job 1981 quoting Kuipers 1960), perhaps less obviously, also in Mandarin Chinese (Kaye 2000; Neubarth and Rennison 2000), a nucleus may contain either (F) or nothing; in most Australian languages (e.g., Nyangumarda — see Hoard and O'Grady 1976) a nucleus may have only a single element, or nothing (resulting in a harmonised vowel — see Rennison 1987). Given the ban on Η, L, R as melodic heads in head-x positions, only the elements F, I and U are available, giving the vowel set /a, i, u/. Once a language permits more than one element in a head-x position, it seems that restrictions of other kinds come into play. In view of the well-known, affinities and non-affinities (assimilations and blocking effects) of elements sharing a tier,13 such as I-U, H-L and A-ATR (now F-F), this is hardly surprising. Subsets of the whole set of possible elements in vowels occur in many languages, the precise choice being determined by the affinity of elements (which, in turn, enhances parsability): lexical melodic expressions with two elements
An x-bar theory of Government Phonology
105
from the same tier are more marked. The above mentioned affinities of elements seem to correlate with their strength values in consonants (see §2.8 below). We think that this is no coincidence. Languages with a 5-vowel system /a, e, i, o, u/ allow a maximum of two elements from the three most easily available (i.e., F, I, U), but with the additional restriction that a two-element melody may not contain elements of equal strength (/ and U). 2.7.
Contour segments as lazy elements or complex heads
Lazy elements as operators, and heads containing more than one element, are responsible for contour segments, both in onsets and in nuclei. Lazy elements and second (or later) elements of a head are realised phonetically later than all the other elements in the melodic expression. Space restrictions do not allow us to go into detail here, but for a first approximation, see Rennison (1998). Many of the traditional "branching onsets" are reanalysed in the present theory as contour segments — in particular those involving obstruent + glide or obstruent + liquid sequences. Short diphthongs and affricates are also contour segments (i.e., involve lazy elements). We represent lazy operators with the '+'-suffix and complex heads simply as an intrinsically ordered set of elements. Thus for example Austrian German [I?1] as in ftV] blau 'blue' is notationally represented as (FR,UI+), which can be resolved into a combination of two melodic expressions (F,U / R,UI). Labialised and palatalised consonants have an additional U or I-element in the head. Affricates either involve a lazy F-operator or a complex head with an H-element.
2.8.
Some representations of segments
At this point it seems appropriate to make explicit our assumptions about the representations of some of the more common segments. We definitely do not exclude the possibility of other phonological representations of segments; on the contrary, we expect that all logically possible licit combinations of elements should occur in some
106 John R. Rennison and Friedrich Neubarth
language or other, although some of them will be difficult to find, since each element in a melodic expression adds to its markedness (and therefore rarity). In these examples, we will restrict ourselves to segments of English, French and Austrian German. Let us consider consonants first. • Stops have an F-element as head (with exception of the glottal stop, which we assume to be the realisation of the empty melodic expression - cf. also Ploch 1999b). The distinction of place (labial, coronal, palatal, velar) is encoded by the operators U, R, I and "nothing", respectively. A fortis/lenis contrast can be achieved either by adding an //-operator (indicating aspiration) to the fortis stop or by adding an Z-operator (indicating voicing) to the lenis, as is commonly assumed within Government Phonology.14 • Fricatives either involve an F-operator combined with an F-head or an Η-head. Which of the two option is appropriate for a given phonological entity is not easy to determine. When fricatives follow voicing contrasts involving an Η-operator (as is generally assumed for English), one is inclined to analyse those fricatives with an F-head on a par with stops. • Nasals are generated with an L-head, liquids and glides with an R-head or an I/U-head respectively. Notice that melodic expressions containing only one element as the head and no operators ([F]=/g/, [H]=/h/, [L]=/q/, [R]=/r/ and the glides [I]=/j/, [U]=/w/) are always the weakest of their family, so we expect them to undergo changes most easily or to be subject to special licensing restrictions. In fact this prediction seems to be borne out. Velar stops and fricatives are the most eager to disappear, palatalise or assimilate; velar nasals are prohibited from initial onsets in the majority of languages, /r/ in traditional coda position vocalises in English and Southern German and, finally, glides sometimes lead a double life between consonant and vowel position (e.g., give rise to diphthongs, etc.). • Vowels in this new representation do not differ too much from traditional representations of Government Phonology, except
An x-bar theory of Government Phonology
107
that the ^-element is encoded as an F-head (and must then always be head) and that differences in height or ATR among the mid-vowels (e.g., hi versus ΙοΓ) can no longer be expressed by switching or demoting the head but only by an F-operator expressing ATR. • Complex melodic expressions, i.e., contour segments, involve either complex heads or lazy operators, as outlined above. (3)
Contour segments: Contour type Complex onset Palatalised Labialised Affricate Diphthong
C(obstr.)F C(obstr.) J
C c
w
C(stop —* fricative)
/ai/ /au/ hi/
2.9.
Melody (relevant part only) (ER,·) (FR,..U+I+) (••I) C.u) (..,F+) or (FH,..) (E,i+) (E,u+) (F,U-I+)
The melodic strength of elements
The six phonological elements are not of equal melodic strength and type. The preliminary strength metric of Rennison (1998) is here redefined in order to achieve greater systematicity. It should be noted that the given values are derived heuristically from empirical considerations rather than grounded on theoretical principles. However, we believe that this move away from calculating complexity only by the number of elements (cf. Harris 1994) to a more complex balance is a positive and necessary one, if we are to recapture and exploit what was lost when charming theory was abandoned (cf. Kaye, Lowenstamm and Vergnaud 1985). The simple rationale behind the system is that heads decrease the strength of a consonant, whereas operators add to it. The first guide-
108 John R. Rennison and Friedrich Neubarth
line encodes what has been intuitively captured by some versions of sonority hierarchy, the second formalises complexity. (4)
The strength of elements (for governing relations) Element Value as head Value as operator
F 0 -2
I -5 +1
U -5 +1
Η -2 +3
L -4 +3
R -6 +4
Some examples: /g/ /t/ /s/ /b/ /m/ Ν /j/ Μ
= = = = = = = =
ME (E) (F,RH) (H,R) (F,U) (L,U) (R,UI) (I) (Ε)
=
= = =
= = = =
heads 0 0 -2 0 -4 -6 -5 -6
operators +4+3 +4 +1 +1 +1+1
total = 0 = 7 = 2 = 1 --3 = -5 = -6
Since a consonant is a non-head, the strength calculated for its melodic expression percolates up to its dominating χ node and there represents the consonantal strength of the syll. We will indicate this as a superscript numerical value on x. A syll with an unlicensed, hence phonetically interpreted nucleus will have a "V" subscript, so that all information relevant to government-related licensing (see the next section) is represented at the χ level. 3
Licensing
In general, melody is always licensed (and therefore phonetically folly realised), unless it can be (parametrically) suppressed by government.15 Thus, in the normal case, the melody which is lexically associated with a skeletal position will be realised phonetically at that position. On the other hand, a position which has no lexical melody16 will be identified (realised) phonetically unless it is licensed to
An x-bar theory of Government Phonology
109
remain uninterpreted. In other words, every skeletal position has inherent phonetic content, even if it is melodically "empty". There are several ways of licensing phonological structures, all of which are subject to parametric variation as to whether or not they apply. One licensing mechanism is classic proper government, henceforth V-government. What we will also define here is a second notion of government, namely C-government, which replaces the interonset licensing mechanisms outlined by various people in the past years. All government mechanisms operate exclusively from right to left and at the syll level. (6)
Government A syll Oi can govern its left neighbour σ,_ι iff at least one of the two following mechanisms is parametrically enabled and obtains: 1. the V of θ\ is unlicensed (=V-government), or; 2. the consonantal melodic strength of θ\ is greater than that of o"i_i (=C-government).
(7)
Licensing The head of a syll a is licensed to remain phonetically uninterpreted iff at least one of the three following conditions is parametrically enabled and obtains: 1. a is governed (by C or V-government); 2. a has a melodically empty nucleus and is at the right edge of the phonological domain; 3. a forms a geminate structure with its right neighbour.
3.1.
Licensing by government
The empty head of an unlicensed syll will always be phonetically interpreted, but licensed positions are not so straightforward. The degree of variation is wide, covering the optional non-suppression of "empty" nuclei in Koromfe, the total suppression of lexical melody in Odawa, the partial suppression of lexical melody in unstressed syl-
110 John R. Rennison and Friedrich Neubarth
lables in Russian and the non-suppression of lexical melody in German. In English, governed lexical melodies are realised with the phonetic content of an ungoverned "empty" nucleus (i.e., usually [a]), as in photograph [fautagr.aif] - photographer [fat'ogrsfa] - photographic [f,3üt3gr'aefik]. The nucleus of a lexically empty syll which is not governed must be phonetically identified. These are the classical "filled empty nuclei" which are realised with a variety of vowels in various languages: [i] in Arabic, [a] in English, French and German. Such sylls are also susceptible to phonetic identification by other means (if such are parametrically available in the language), such as vowel harmony or the formation of "syllabic" consonants. If the head of a licensed syll has no lexical melody, it is permitted to remain phonetically unidentified (i.e., silent). Language-specifically (parametrically), such positions (the classic "properly governed empty nuclei" and licensed final empty nuclei) can receive the weakest form of phonetic filling (e.g., schwa in many Gur languages such as Koromfe and Moore). However, in cases of parametrical phonetic expression of licensed empty nuclei, the phonetic expression of unlicensed nuclei must be different (i.e., melodically stronger). 3.2.
Licensing by the final empty nucleus parameter
According to our definition of "final empty nuclei", given in (7b) above, there are two possibilities for an empty nucleus of a domainfinal syll. If the final empty nucleus parameter is not activated, this nucleus must be phonetically interpreted (either by assimilation/harmony or with a default, usually schwa-like vowel). However, in languages where the final empty nucleus parameter is operative, a domain-final nucleus is phonetically silent. In such cases, the final empty nucleus behaves as if it were governed, even though no governor is available. To our knowledge, no previous theory has provided a principled account of the many languages which allow more sonorous wordfinal consonants, but ban final obstruents. We propose that this is due I [s]/_[i]. I fully agree with the following statement from Kaye (1995: 319-320): "I only wish to suggest that B&H's [Bromberger and Halle's 1989] assumptions ... are not a priori true. Since they are unaccompanied by any form of argumentation I feel justified in dismissing them." Cf. S. Jensen 2000, especially chapter 1, for a more detailed discussion. The last statement is also important in relation to the discussion in section 3.4, of Optimality "Theory", a framework that takes great pains to appear nonderivational but imitates it very well: with its combination of inputs and out-
200 Stefan Ploch puts, its subscription to grammaticality, its violable constraints, its untestable approach to non-monotonicity in human reasoning and, for some, its sympathy (cf. McCarthy 1997, 1999). The important point to remember here is of course that it is completely irrelevant in what way the definitions/concepts "derivation" versus "non-derivation" differ, only propositions matter (cf. note 13). For more details on the untestable way in which Optimality Theory deals with nonmonotonicity, cf. Ploch (2001a). For a similar view on the imitative character of Optimality Theory as regards derivation in opposition to its claim that it is a non-derivational approach, cf. Mohanan (2000). 27. Glot International, who had initially accepted Ploch's review after Ploch had rewritten said review twice for them, refused months later to publish it because it was in their opinion too negative and too harsh. The censored manuscript is available at Ploch's university web page (languages.wits.ac.za/~stefan). 28. I am not saying that all Optimality "Theory" analyses are completely untestable. Given that the number of constraints proposed as part of Universal Grammar must be limited (because the human brain as a physical entity must be limited too), there are, even under factorial ranking, also only a limited number of constraint rankings (and, in this way, language types). This makes it possible, in theory, to come up with language types that are not covered by any specific constraint ranking. In other words, there can be language types that would prove a proposed set of constraints wrong. Since this set is always identical to the full universal set (because according to Optimality Theory all constraints, no matter how low ranking and unobservable, are operative in all languages), and since within that set, none of the constraints is itself testable, any improvement to the whole universal set of constraints can only be achieved by regarding as wrong the whole set as it stands. Unfortunately, even if someone were to find a set that, even though falsifiable, cannot be shown to be false, this set would still exclusively contain untestable constraints, i.e., constraints that no evidence can be provided for. Only whole sets of constraints are testable in Optimality Theory; no optimality-"theoretical" explanation can ever show why it is wrong. Even if we have two sets Si and S2 such that S2 is identical to Si apart from it containing one constraint C2, which Si does not contain, and we find that Si is less wrong than S2, then we have not shown that C2 is wrong, i.e., we cannot "narrow" the problem down to its source, we can only say that S 2 as it stands is wrong; C2 in combination with other constraints may still do even better. So optimality-"theoretical" explanations are like a mechanic that can only tell you that one motor "as is" does not work while another one does or works better, without ever being able to tell you why (because no violable constraint is testable). Even though such a mechanic cannot only exchange whole motors but single parts, he can never gain insight as to why what works or does not. Thus, Optimality Theory is an engineering device, not an explanatory theory. Importantly, the only people who can explain
Problems with Occam's Razor and now-ad-hoc-new
29.
30.
31.
32.
33.
201
phonological phenomena are the ones who have a hard time being academically successful because they refuse to do Optimality "Theory' and pretend that there is such a thing as an optimality-"theoretical" explanation. The tableau in (9) is from Krämer (ms.: 22). The page numbers on which the constraints in (10) can be found are: the Ident(F) constraint family: 16 (from McCarthy and Prince 1995a: 264); *LoRo: 22, S-Ident afback] and S-Ident a[round]: 21. In (10), I have deviated from Krämer's capitalisation, paragraphing, italicisation and setting-of-quotes schemes at my pleasure. For more detailed information on Turkish vowel harmony, cf. Lewis (1967); Charette and Göksel [1994] (1996), [1996] (1998); Ploch (1998). An alternative account within the Optimality formalism can be found in Kirchner (1993). Let me also point to the unfalsifiable Grammaticality Hypothesis which is supported by Optimality "Theory", like by most linguistic theories, including Government Phonology (cf. S. Jensen, this volume; Ploch 2001a). Popper has written frequently about the scientific importance of differentiating mental states (world 2) from objective knowledge (word 3) and about the malaise of not doing so, cf., e.g., (1973). This distinction is not to be confused with Popper's view on the distinction between the factuality of inductive argumentation (quid facti?) and the problem of the justification of inductive arguments: "Thus the view I have in mind is characterized by the contention that the distinction between the factual problem of describing how we argue inductively (quid facti?), and the problem of justification of our inductive arguments (quid juris?) is a misplaced distinction" (Popper 1972: 64). The reason for this is that Popper has shown that logically any justification or verification of explanations (assumed universal statements) is impossible. The point relevant here is that Popper does indeed adhere to a strict distinction between quidjuris? and quid facti? in relation to objective knowledge. In chapter 3 of his Objective Knowledge (Popper [1972] 1973: 106-152), he shows that there can be and indeed exists objective knowledge without a knowing subject, which excludes pragmatist-subjectivist views like "usefulness".
Eerati tone: towards a tonal dialectology of Emakhuwa Farida Cassimjee and Charles W. Kisseberth
Introduction This paper presents a basic description and analysis of the tonal system of the Eerati dialect of Emakhuwa. We also locate this dialect within the general pattern of Emakhuwa tonal structure. This paper is intended to honour Jonathan Kaye and to make a small downpayment on our debt to the various people who have assisted us in studying a range of Emakhuwa dialects over more than two decades. Emakhuwa (some of whose varieties are commonly characterised by the term "Elomwe") is spoken by more than six million speakers in northern Mozambique and adjoining regions in Tanzania and Malawi. It is one of the most understudied major Bantu languages. While there have been some significant lexical contributions by missionary-linguists (particularly useful and largely reliable on segmental matters is Prata 1990), the twentieth century has seen practically no linguistically well-informed descriptions of the sound system, morphology, or syntax of this language, nor any systematic research on its dialectology. The most prominent missionary linguistic description, Praia's (1960) Gramätica da Lingua Macua e Seus Dialectos, offers less information on dialectology than its title seems to promise. The excellent M.Phil, thesis done by J.M.M. Katupha at the School of Oriental and African Studies in 1983 (an analysis of his native Esaaka dialect) is unfortunately unpublished. The role of tone in Emakhuwa was not recognised until the series of papers by Cheng and Kisseberth (1979, 1980, 1981) focusing on the Ikorovere dialect. Cheng and Kisseberth (1982) made a small further contribution by examining difference in the moraic structure of nasal consonants between the Ikorovere and Imitthupi dialects as re-
204 Farida Cassimjee and Charles W. Kisseberth
fleeted in the tonology. More recently, Cassimjee and Kisseberth (1999a,b) have argued that Emakhuwa tonal dialectology provides a "conspiracy" argument in favour of an Optimality Theory approach to tonology. Beyond this very limited published work on Emakhuwa tone, the present writers have amassed a large amount of descriptive material on a substantial number of varieties of Emakhuwa, with special focus on tone and morphology. This material was gathered during two periods: 1977-1984 (research conducted in the United States with two Tanzanian Ph.D. students in anthropology) and the 1990's (involving research with a much wider variety of dialects spoken in Mozambique and Malawi). We expect to publish much of this material in the next few years. Here we examine the tonal pattern of one dialect, Eerati, based on data collected during the summer of 1996 from a single speaker. We not only describe the fundamental aspects of the Eerati tone pattern, but we also seek to point out its relationship to some other dialects. While our account is largely descriptive in nature, we do cast our discussion in roughly Optimality-theoretic terms, and the reader who is interested in our theoretical views can consult Cassimjee and Kisseberth (1998) and Cassimjee (1998) for very detailed presentations of our approach to Bantu tone.
1
Some general remarks on Emakhuwa tone
Across the broad spectrum of Bantu languages, most employ tone specifications both lexically on stems and also on affixes. They also exhibit significant and complex interrelationships between lexical tone specifications and tonal patterning imposed by virtue of morphological structure (so-called "grammatical" tone). Lexical specification in the verb stems usually involves just a two-way contrast (High versus toneless) regardless of the number of moras in the stem. Nominal stems in some languages may show a free distribution of High and toneless moras, whereas in other languages some restrictions on the possible distribution of High and toneless moras have developed.
Eerati tone: a tonal dialectology of Emakhuwa
205
Emakhuwa, however, falls within a Bantu language type found mainly in southern Tanzania, northern Malawi, and Mozambique that has what can be referred to as a predictable tone system (cf. Odden 1989). By this we mean that while certain grammatical morphemes may show tonal contrasts, and while there may be grammatical tones, stems are not distinguished from one another by virtue of a High versus toneless specification. The tone shape of a stem is entirely a function of the moraic structure of the stem and the morphological structure in which it is embedded. In other words, the tone may be viewed as being assigned to the stem on these bases rather than being a lexical property of the stem (although the use of the term "assign" here is not intended to indicate a specifically derivational model of tonology). In Emakhuwa, verb stems are entirely lacking in tonal contrastiveness. Noun stems have largely predictable tonal shapes (of the dialects we have studied, only the Imitthupi dialect shows any major disruption in the predictability of nominal tone), but there are various sorts of limited tonal contrasts. We would argue that Emakhuwa dialects fall into two major tonal types: Non-doubling and Doubling dialects. Let us introduce a bit of terminology. If a mora in the input to the phonology bears a High tone due to an underlying specification (as with some affixes) or due to tone assignment (as in the case of stems), we shall say that this mora sponsors a High tone. In Non-doubling dialects, a High tone surfaces on the sponsor and no other High tone is present in the output. There may be considerable complications with respect to the assignment of High tones, but the specifically phonological matters that will concern us in our description of Eerati are foreign to a Nondoubling dialect. A Doubling dialect is one where (in principle though not always in fact) a mora that sponsors a High tone may (or may not) surface with a High tone, but in addition the following mora is also Hightoned. We refer to this as a doubled High tone. The Ikorovere dialect described in Cheng and Kisseberth (1979, 1980, 1981) represents the clearest example known to us of a Doubling dialect. In this paper, we shall 1. underline moras that we claim sponsor a High specification, 2. group this mora together in parentheses with the following mora when that mora has received a doubled High tone and 3. indicate
206
Farida Cassimjee and Charles W. Kisseberth
above the vowels the overt tone. Thus a verb like u-Hmelal 'to cultivate for' in Ikorovere will be transcribed: u-(lime)la. In this way, the reader will be able to immediately see which mora is a sponsor, whether doubling has occurred, and what the actual pronunciation of the two moras is. We shall refer to the material inside parentheses as the High (tone) Domain. What we consider to be Doubling dialects are widely distributed across the Emakhuwa-speaking region (although there are differences with respect to the generality of Doubling). In this paper we examine the tonal pattern of Eerati and establish that it is a Doubling dialect, although superficially there are many examples that would seem to call this into question.
2
The theoretical approach
In this paper, we wish to describe the essentials of the Eerati tone pattern and at the same time to locate this description inside an explicit theory of tone. We will assume a particular version of Optimality Theory that we have developed in detail in Cassimjee and Kisseberth (1998) and Cassimjee (1998), called Optimal Domains Theory. The only difference between this instantiation of Optimality Theory and that employed in the more general literature on Optimality Theory resides in the analysis of features. Most work in Optimality Theory assumes that somehow autosegmental representations remain appropriate when one shifts from a derivational, rule-based model to Optimality Theory. We believe that autosegmental phonology is intimately tied up to the notions of both underspecification and derivation, and actually offers little of interest to non-derivational, constraint-based models of phonology. Optimal Domains Theory assumes that a feature specification F in the output has no phonetic realisation unless it is located on a segment that is inside an F-domain. An F-domain is a unit of structure (represented in this paper by parentheses) entirely parallel with other units of structure, e.g., a syllable or a foot. Furthermore, even if a segment is inside an F-domain, it is not pronounced with the feature in question unless it bears an F-specification. Thus position inside an
Eerati tone: a tonal dialectology of Emakhuwa
207
F-domain does not guarantee that a segment bears the feature. This is crucial to understanding the tone pattern of Eerati. In Cassimjee and Kisseberth (1998) and Cassimjee (1998), the following universal tone constraints were introduced and used to explicate a range of tonal data from a variety of Bantu languages, but with especial focus on Xhosa and Shingazidja: •
•
• • • • •
•
• •
Faithfulness: this constraint says that every mora that is specified High (what we call here a High-sponsor) must be organised inside its own High Domain. (This version of the constraint collapses two different faithfulness constraints, but the need to differentiate the two is not critical in the present context.) Basic Alignment Left requires the Left edge of a sponsor to be aligned with the Left edge of a High Domain, and Basic Alignment Right requires the Right edge of a sponsor to be aligned with the Right edge of a High Domain. These constraints together allow only a sponsor to be inside a High Domain. Express Head: the head of a High Domain must be High. [High Domains in many Bantu languages are Right-headed.] Express: every mora in a High Domain should be pronounced with a High tone. *(High, non-head): moras that are not domain-heads should not be pronounced on a High. No Monomoraic High Domain: a High Domain should not consist of a single mora. Align Right (High Domain, Edge), where Edge refers to a prosodic category: align the right edge of a High Domain with the right edge of a Prosodic Word, or a Prosodic Phrase, or an Intonational Phrase. Non-finality (Edge): the right edge of a High Domain should not be located at the right edge of some prosodic category: Prosodic Word, Prosodic Phrase, Intonational Phrase. No Rise: a syllable should not be pronounced with a Rising tone. Plateau: the moraic sequence High-Toneless-High is ill-formed.
We shall show that while most of these independently motivated constraints can be seen to be at work in Eerati, one additional con-
208
Farida Cassimjee and Charles W. Kisseberth
straint must be recognised. The precise formulation of the constraint will require much more cross-linguistic exploration, but there is certainly a significant body of data available in the literature to suggest the need for a constraint that will have the desired consequences. Setting aside the matter of feature domains, we assume the main outlines of the Optimality Theory architecture and also assume the reader's familiarity with it.
3
A point of reference: the Ikorovere tone pattern
Before examining Eerati, it will be useful to provide a brief sketch of the tonology of a dialect that is perhaps the most transparent of all Doubling dialects that we have studied: Ikorovere. For convenience, we shall use the infinitive form of the verb as the main source of data. The infinitive consists of a prefix (generally ο-, but u- in the more northern dialects like Ikorovere) followed by a stem that necessarily ends in the vowel -a. In most Emakhuwa dialects, sponsors are distributed as follows in the infinitive: (1)
One mora stem: Two/three mora stem: Four/five/six+ mora stem:
prefix vowel is a sponsor (due to dispreference for final High) first stem mora is a sponsor first and third moras are sponsors
Examples from Ikorovere: (u)-lya 'to eat', (u)-khwa 'to die', (u)-wa 'to come' u-(thu)ma 'to buy', u-(mo)ra 'to fall', u-(wu)rya 'to drink' u-{lgyvo)la 'to carry', u-(huku)la 'to sieve beer', u-{ttere)kha 'to cook' u-(l0ko)(tthe)la 'to pick up for', u-(pgpha)(ru)la 'to separate', u-(ttiki)(tthe)la 'to rub', u-(maa)(li)ha 'to make quiet' u-(lökö){ttqni)ha 'to pick up', u-(kutti)(hera)a 'to heat for one another' There are a multitude of forms where Doubling is immediately apparent in Ikorovere. It is clearly visible in the infinitive forms with
Eerati tone: a tonal dialectology of Emakhuwa 209
three moras (u(law0)la, etc., above), four moras (u-{loko){tthe)la, etc., above), and it is visible twice when the stem has five moras (;u-{l0kö){tt0ni)ha, etc.). Furthermore, it is immediately visible in many other forms. In (2a), for instance, we quote examples from a verb tense where there is a prefixal High-sponsor and the second mora of the stem is also a High-sponsor; in (2b) we cite examples where there is no stem High tone, and the only High-sponsor is the negative prefix hi-, in (2c) we cite examples from nominals. (2)
a. a-(k-ää)-lo{k0tthä)le Ί didn't pick up' a-(k-0a)-tho{k0la)le Ί didn't sharpen' a-(k-ää)-ttu{pülä)le Ί didn't cut' kha-(y-0a)-ttho(k0le)lacale '[class 2] did not use it for
sharpening' b.
kha-(y-ää)-lo{k0ttä)nihacale '[class 2] did not pick up pi.' u-(hi-vd)ha 'to not give', u-(hi-thu)ma 'to not buy' u-(hi-lu)pattha 'to not hunt', u-(hi-lo)wola 'to not trans-
port' u-(hi-lo)kotthela
'to not pick up', u-(hi-pa)pharula
'to not
separate' c.
na{mdnu)ku 'porcupine', (ä-nä)(mänü)ku 'porcupines' na{mälö)ve 'echo', (ά-ηά)(τηάΙό)νβ 'echoes' kha{rgmya)a 'messenger', (0-khä)(rgmya)a 'messengers' i-{m0ö)ttika 'motor car', i-tta(amba)xi 'cowpea(s)' i-pi(liki)ca 'piece of cooked meat' i-ku(khuvi)ri
'half-burnt branches after burning field'
The analysis of Doubling in Ikorovere is straightforward, given the proposed set of universal constraints in section 2. Given that a Highsponsor does surface with a High tone, we know that the constraint Faithfulness is respected: there is a High Domain and the sponsor is inside that domain. Since Faithfulness is undominated in both Ikorovere and Eerati (at least in connection with the material examined in this paper), we will not discuss it further. When a High tone appears on a mora other than a High-sponsor, there is a violation of one or both members of Basic Alignment. There must be a constraint, then, that 1. outranks Basic Alignment and 2. demands a vio-
210
Farida Cassimjee and Charles W. Kisseberth
lation of Basic Alignment in order to be satisfied. The relevant constraint is clearly No Monomoraic High Domain (which dislikes High Domains consisting just of a single mora). The optimal form u{lgwö)la violates Basic Alignment Right in order to satisfy No Monomoraic High Domain, while the non-optimal *(u-lo)wola violates Basic Alignment Left. We conclude, then, that Basic Alignment Left is undominated, but No Monomoraic High Domain dominates Basic Alignment Right. Although the existence of Doubling in Ikorovere is obvious, there are two major cases where it does not occur. A sponsor typically does not double onto a mora at the end of an intonational phrase. The data in (3) illustrate. (3)
'to drink', u-{wurya) ma(a)xi 'to drink water' 'to cultivate', u-(lima) i-(ma)tta 'to cultivate the field' pronounced: [u(lim' e)(ma)tta] u-(l0kö)(tthe)la 'to pick up for', u-(lgkö)(tthelä) ma-(lu)ku 'to pick up stones for' ni-(ko)xo 'clan', ni-{kgxö) ni-(kf)na 'another clan' i-(nu)pa 'house(s)', i-(nupa) ci-(kf)na 'other houses' (n)-tthu 'person', (ή-tthü) n-(ku)mi 'healthy, live person' u-(wu)rya u-(li)ma
The analysis of these data is entirely straightforward. The constraint Non-finality (Intonational Phrase) dominates No Monomoraic High Domain. It is more important not to have a High Domain aligned with the right edge of an Intonational Phrase than it is to avoid a monomoraic High Domain. We thus arrive at the constraint hierarchy: Basic Alignment Left, Non-finality (Intonational Phrase) > No Monomoraic High Domain > Basic Alignment Right. There is a second context where a sponsor does not double: namely, onto the second mora of a bimoraic phrase-penult syllable. Some examples. (We indicate medial position in the Intonational Phrase by three dots after a form.) (4)
u-(ma)ala u-(le)eha u-(hi)iha
'to be quiet' 'to say farewell' 'to cause to leave'
but:
u-(mga)la... u-(lee)ha... u-(hii)ha...
Eerati tone: a tonal dialectology of Emakhuwa
211
The constraint set listed in section 2 does not include any proposed universal constraint that would seem to apply to these examples. In a number of Bantu languages, there are examples which suggest a preference for falling tones in Intonational Phrase-penult position. For example, in Ruciga (spoken in Uganda) there is an alternation entirely parallel with the Ikorovere case. Pre-penult bimoraic syllables are either level Low or level High. If ones that are level High occur in Intonational Phrase-penult position, they are realised as falling. In the Sotho language group, where Intonational Phrasepenult vowels are lengthened automatically, a High-toned Intonational Phrase-penult vowel has a falling realisation. Obviously, considerable work needs to be done to define the precise nature of the constraint that results in falling tones on Intonational Phrase-penult syllables rather than other syllables. There are two broad lines of attack. One type of analysis would be to claim that the end of the Intonational Phrase has a Low tone to which the Intonational Phrase must be faithful, and that while in some languages it is sufficient to locate the final mora into a Low Tone Domain, in other languages this Low Tone Domain will extend back into a penult syllable under appropriate circumstances. A second type of analysis would forego Low tones in the input and simply assume that not only do High Domains not like to align with Intonational Phrase-final syllables, they also do not like to align with Intonational Phrase-penult syllables. Given the descriptive focus of the present paper, we shall — without argument and merely for convenience — assume the following constraint. (5)
Penults are Bad High Domain Edges Do not align the Right edge of an Intonational Phrase penult syllable with the Right edge of a High Domain.
Actually, the constraint in (5) is too general to actually permit an account of the Ikorovere facts. In Ikorovere, it is possible to double onto a monomoraic Intonational Phrase-penult syllable: u-(lgwo)la. The only thing not possible is to double onto the second mora of an Intonational Phrase-penult syllable: u-(ma)ala, not *u-{maa)la. In order to properly delimit (5), we appeal to the notion of constraint
212
Farida Cassimjee and Charles W. Kisseberth
conjunction (cf. Smolensky 1995). The idea of constraint conjunction is that two (lowly ranked) constraints may be combined into an independently ranked conjoined constraint (where the conjoined constraint is violated only if both individual constraints are violated). In the present case, (5) would be conjoined with the markedness constraint banning bimoraic syllables into the conjoined constraint (6). (6)
Bimoraic Penults are Bad High Domain Edges (=*Bimoraic Syllable conjoined with (5)).
A representation will violate (6) just in the event it has a syllable that is bimoraic and that syllable is also Intonational Phrase-penult and aligned with the Right edge of a High Domain. In Ikorovere, (5) is ranked below No Monomoraic High Domain and thus has no ability to prevent doubling. On the other hand, the conjoined constraint (6) — Bimoraic Penults are Bad Edges — is ranked above No Monomoraic High Domain. As a consequence, it will be better to have a monomoraic High Domain (i.e., no doubling) than to have a bimoraic Intonational Phrase-penult syllable aligned with the right edge of a High Domain. The constraint ranking for Ikorovere is thus: Basic Alignment Left, Non-finality (Intonational Phrase), Bimoraic Penults are Bad High Domain Edges > No Monomoraic High Domain > (5), Basic Alignment Left. Having provided an account of the main facts about the phonology of tone in Ikorovere, let us now turn our attention to the Eerati dialect, which offers some interesting complications.
4
Eerati
When we examine the infinitive in Eerati, we find some immediate evidence that it may have exactly the same distribution of sponsors of High tone as Ikorovere. For instance, stems with one mora or two moras have the same tone pattern as in Ikorovere when they are Intonational Phrase-final. We identify Eerati forms by [E], Ikorovere forms by [K].
Eerati tone: a tonal dialectology of Emakhuwa (7)
213
{o)-lya [E], (u)-lya [K] ' t o e a t ' (o)-khwa [E], (u)-khwa [K] ' t o d i e ' o-(li)ma [E], u-(li)ma [K] 'to cultivate' o-(thu)ma [E], u-(thu)ma ' t o b u y '
Additional Eerati examples: (ό)-ννα 'to come', (q)-nya 'to defecate', (o)-sa o-(ha)tta 'to cut', o-(ku)sa 'to take', o-(pwe)sa o-(va)ha 'to give'
'to dawn', 'to break',
Furthermore, stems that have two syllables where the first one is bimoraic are also identical in the two dialects: (8)
o-(mä)ala o-(ho)ola
[E], u-(ma)ala [K] 'to be quiet' [E], u-{hq)ola [K] 'to precede'
Additional Eerati examples: o-(w[)iha 'to bring', o-(te)esa beans for porridge'
'to carry',
o-(pha)ala
'to grind
And trisyllabic stems with a bimoraic first syllable are also identical: (9)
o-(mqa)(li)ha
[E], u-(mqq)(li)ha
Additional Eerati examples: o-(wii)(hi)ya 'to be brought', front', o-(h0o)(le)la 'to lead'
[K] 'to make quiet' o-(h00)(lf)ha
'to make go in
However, stems with other moraic structures differ (at least superficially) from Ikorovere. For example, (10)
o-(thume)la [E], u-{thume)la [K] ' t o b u y f o r ' o-(hukü)la [E], u-(hüku)la [K] 'to brew traditional beer' o-(lupä)ttha [E], u-(lupä)ttha [K] 'to hunt' o-(rukü)(nu)sa [E], u-(ruku)(nu)xa [K] 'to turn something
around, over'
214
Farida Cassimjee and Charles W. Kisseberth
Additional Eerati examples: o-(tumi)ha 'to sell', o-(lovo)la 'to transport', o-(hokho)la 'to hunt birds', o-{hrya)na 'to divorce', o-(huru)rya 'to pick off rice', o-(thiki)la 'to cut' o-(hokö)(lo)sa 'to return something', o-(khanye)(re)ra 'to insist' o-(khutu)(pu)la 'to fold back (e.g., the sleeves of a shirt), turn inside out' o-{hoko)(le)ya 'to go and return', o-(there)(k£)la 'to cut (e.g., a board) o-(thiki)(lg)ca 'to sharpen', o-(ttiki)(tthe)la 'to rub' Furthermore, even the one and two mora stems cited in (7) diverge from Ikorovere when these stems are placed in Intonational Phrasemedial position. In (11), we give medial pronunciations in Eerati in comparison with Ikorovere. (11)
(o-Iyä)...[E],(ü-tyd)...[K] (o-khwd)... [E],(u-khwd)... [K] o-(limä)... [E], u-(lima)... [K] o-(thumä)... [E], u-(thumä)... [K]
Additional Eerati examples: (o-wfl)..., o-Qiqtta)...,
(o-nya)... o-(kusa)...,
o-(pwesa)...,
o-(yaha)...
Our analysis of these data is that the pattern of High-sponsors in the infinitive is the same in Eerati as in Ikorovere and that the essential difference between the two dialects has to do with the phonology of the tone. In a number of cases above (but not all), we find the following basic difference: whereas Ikorovere pronounces both the sponsor and the following mora on a High tone, in Eerati the sponsor is pronounced without a High tone and only the next mora is actually realised on a High tone. Examples reflecting this generalisation include: (o-lyd)... in Eerati but (ύ-lyä)... in Ikorovere; o-(lima)... in Eerati but u-(lima)... in Ikorovere; o-{thume)la in Eerati but u-(thume)la in
Eerati tone: a tonal dialectology of Emakhuwa
215
Ikorovere; and o-(ruku)(nu)sa in Eerati but u-(rukit)(nu)xa in Ikorovere. On the basis of this observation, in a derivational model of phonology one might be tempted to simply categorise Eerati as a "tone shifting" rather than a "tone doubling" language. The term "tone shift" refers to languages where an underlying High appears not on the mora that bears it, but rather on some mora to the right, often the immediately following mora. From the analytical point of view, even in derivational terms, it has never been obvious that tone shifting and tone doubling are ultimately different phenomena; shifting could be assumed to involve first doubling and then the removal of the High tone from the mora that underlyingly bears the High tone. In a nonderivational model such as Optimality Theory, where inputs are mapped onto outputs without intermediate steps in a derivation, the only issue is the following: what constraint prevents the sponsor from realising High tone in Eerati but does not prevent it from realising High tone in Ikorovere? The constraint set listed in section 2 approaches the problem of the difference between tone doubling and tone shift as follows. Both phenomena involve a violation of Basic Alignment Right in the interest of satisfying No Monomoraic High Domain. This is perhaps not immediately apparent in a shifting language, since there is only one High-toned mora on the surface. However, in Optimal Domainstheoretical terms, No Monomoraic High Domain is satisfied if the domain includes more than one mora; its satisfaction does not depend on how the moras in the High Domain are actually pronounced. It is immaterial whether one or even both the moras fail, for some reason, to be realised as High-toned. It is of course this very point that distinguishes the domain-based approach from the autosegmental approach. In autosegmental terms, Eerati has in the input just one mora associated to High tone, and also in the output there is just one mora linked to a High tone. There is no representation where two moras are linked to a High tone, thus satisfying the autosegmental equivalent of No Monomoraic High Domain, which would be No Singly Linked High-tone. To summarise, both Ikorovere and Eerati obey No Monomoraic High Domain (setting aside the effects of Non-finality and Bimoraic
216
Farida Cassimjee and Charles W. Kisseberth
Penults are Bad High Domain Edges); where they differ is in the realisation of tone in the domain. Is High tone realised throughout the domain, or is it only realised at the right edge of the domain? The universal constraint Express (High) is satisfied when all the (tonebearing) elements of the domain realise the feature High tone. A second universal constraint, *(High, non-head) appeals to the idea that tonal domains are headed and that in (most) Bantu languages they are right-headed. The constraint says that non-heads should not be High-toned. In a "spreading" language like Ikorovere, Express (High) dominates *(High, non-head). In a "shifting" language, *(High, non-head) dominates Express (High). On the basis of the examples (o-lya)..., o-(lima)..., o-(thume)la, and o-(ruku)(nu)sa, it is clear that in Eerati, *(High, non-head) dominates Express (High). It is better to not pronounce a non-head on a High tone than it is to satisfy Express (High). Having established the critical difference between Eerati and Ikorovere, let us turn to examples where surface forms are similar. Recall the Intonational Phrasefinal form of monomoraic and bimoraic stems: (12)
(o)-khwa o-(va)ha
'to die', (o)-wa 'to come', (o)-nya 'to defecate' 'to give', o-{ku)sa 'to carry', o-(thu)ma 'to buy'
These examples show that in Eerati, Non-finality (Intonational Phrase) dominates No Monomoraic High Domain, just as in Ikorovere. Also recall the Intonational Phrase-final form of a trimoraic stem with a bimoraic penult syllable. (13)
o-(ma)ala
'to be quiet',
o-(ho)ola
'to precede',
o-(wf)iha
'to
bring' These examples indicate that Bimoraic Penults are Bad High Domain Edges dominates No Monomoraic High Domain in Eerati just as in Ikorovere. We thus have the following constraint hierarchy for Eerati:
Eerati tone: a tonal dialectology of Emakhuwa
(14)
217
Expression-related constraints: *(High, non-head) > Express (High) Domain-related constraints: Basic Alignment Left, Non-finality (Intonational Phrase), Bimoraic Penults are Bad High Domain Edges > No Monomoraic High Domain > Basic Alignment Right
In (13) we gave only the Intonational Phrase-final pronunciation of stems of the shape /CWCa/. When we examine the Intonational Phrase-medial form of these verbs we notice a new problem. (15)
o-(maä)la... o-(h0o)la...
[E], u-(mga)la... [E], u-(h0o)la...
[K] [K]
Additional Eerati examples: o-(wii)ha...,
o-(tee)sa...,
o-(phga)la...
Both moras of the penult syllable are High-toned in Eerati just as in Ikorovere. This appears to contradict the analysis whereby the constraint *(High, non-head) dominates Express (High). We do not get the pronunciation *o-{maa)la... as expected. Our explanation for (15) goes as follows. In section 2, the principle No Rise is included in the universal constraint set. (We assume that there is also a corresponding No Fall principle, but it is a dominated principle in Emakhuwa. No Rise and No Fall are members of the No Contour family of constraints.) (16) No Rise A rising tone on a syllable is not permitted. There are various examples in the literature which indicate that contour tones in general and rising tones in particular are marked configurations that tend to be avoided. Assuming then the existence of No Rise in the constraint set, the data in (14) can be explained by ranking No Rise above *(High, nonhead). As a result of this ranking, a Rising tone will be avoided even
218
Farida Cassimjee and Charles W. Kisseberth
if to do so violates *(High, non-head). We should note, of course, that there are other possible ways of avoiding a No Rise violation than letting both mora in the High Domain be pronounced High (the optimal outcome in Eerati). For instance, neither mora could be pronounced on a High tone: *u-(maa)la This violates neither No Rise nor *(High, non-head); it only violates Express (High). And given that in our analysis Express (High) is the lowest ranked constraint, *u-(maa)la... seems to be the predicted form rather than u-(mga)la Clearly, our analysis must be modified somewhat so that the correct output can be achieved. The constraint set in section 2 actually regards Express as a family of constraints. One member of the family, Express Head, requires the head of the domain to be High. The other member requires every element in the domain to be High. The incorrect form *u-(maa)la shows that Express Head is an undominated constraint in Eerati. As such, it guarantees that a Rising tone may not be avoided by simply not expressing High on any mora in the domain. The hierarchy of expression-related constraints in Eerati is that in (17): (17) Express Head, No Rise > *(High, non-head) > Express (High). We have now explained one environment in which *(High, nonhead) is counteracted in Eerati. There is another environment as well. Look at stems with four moras or more in both Intonational Phrasefinal and Intonational Phrase-medial position. We again cite Ikorovere forms for comparison. (18)
o-(rukii)(nu)sa [E], u-(rukü){nu)xa [K] 'to turn sth. around' o-{rukü){nüsä)... [E], u-(ruku)(nuxa)... [K] o-(there)(ke)la [E], u-(there)(ke)la [K] 'to cut (e.g., a board)' o-(there)(kelä)... [E], u-{there)(kelä)... [K] o-(khoma)(dli)ha. [E], u-{kh0mä)(äli)ha [K] 'to strengthen' o-(khoma)(äli)ha... [E], u-(kh0ma)(gli)ha... [K] o-{hokö){lgse)ra [E], u-{hökö)(lgxe)ra [Κ] 'to return sth. to' o-(hokö)(l0se)ra... [Ε], u-(h0ko)(l0xe)ra... [Κ]
Eerati tone: a tonal dialectology of Emakhnwa o-{hokö)(l0se)rana
[E],
u-{hökö){l0xe)rana
219
[Κ] 'to return sth.
to each other' o-(hoko)(l0se)rana...
[E],u-(h0ko)(l0xe)rana...
[K]
In all of these examples, there is a difference between Eerati and Ikorovere in the first High Domain since Eerati pronounces the sponsor on a low tone, while Ikorovere pronounces it on a High tone. This difference is of course just the difference in the ranking of Express (High) and *(High, non-head). Now in the Intonational Phrase-final forms like o-(ruku){nu)sa and o-(there)(ke)la, we do not expect there to be any difference in the second High Domain in the word. Nonfinality (Intonational Phrase) will restrict the High Domain to a single mora. Since the High Domain consists of a single mora, that mora is the head of the High Domain and will be realised on a High tone due to Express Head. However, when these items are in Intonational Phrase-medial position, we would expect the High Domain to expand to include the final mora. Consequently, we would expect *(High, non-head) in Eerati to yield a pronunciation like *o(there)(kela) But this does not happen. There is no difference in pronunciation between Eerati and Ikorovere in terms of the second High Domain! (This is not entirely true, as the example o(ihokö){l0se)ra [Ε], u-(h0ko){l0xe)ra [Κ] shows. We shall take up the falling tone on short vowels in Eerati immediately below.) The explanation for the inappropriateness of *o-{there)(kela)... is provided by another constraint cited in section 2, Plateau. This constraint says that (setting aside domain structure) the phonetic sequence HOH is to be avoided. In derivational approaches to phonology, this sort of phenomenon has sometimes been referred to as bridging or plateauing. It plays a crucial role in accounting for a number of very complex facts in Cassimjee's (1998) analysis of the Nguni language, Xhosa. If we accept the existence of Plateau, then the data under consideration can be accounted for by ranking Plateau above *(High, non-head). In other words, if it is a choice between ending up with a HOH sequence or with a non-head pronounced High, then it is better to make the latter choice. The hierarchy for the realisation constraints in Eerati is given in (19):
220 Farida Cassimjee and Charles W. Kisseberth
(19) Express Head, No Rise, Plateau > *(High, non-head) > Express (High). There are, of course, violations in Eerati of Plateau. In particular, if a High-sponsor is separated from a preceding doubled High by just a single mora, Plateau does not come into play. For example, consider the verb tense illustrated in (20). In this tense there is a prefix aawhose initial mora sponsors a High tone, and the verb stem's second mora sponsors a High tone. (20) n(k-aa)-li(ma)le
Ί had not cultivated'
n(k-ga)-lu(pattha)le
(*n(k-ga-li)(ma)le)
Ί had not hunted' (*n{k-aä-lu){pgtthä)le
n{k-ää)-ru(kunü)sale Ί had not turned it around' (*n(k-aä-rü)(künü)sale)
Since Plateau would predict the correct pronunciation to be not optimal, we must exclude the pronunciations where Plateau is in effect. In order to avoid a Plateau violation in examples like those in (20), it would be necessary to 1. extend the first domain and make it trimoraic, or 2. misalign the second domain to the left of the sponsor, or 3. create an entirely new domain that encloses the mora that would need to be High in order to avoid a Plateau violation. We assume that the third option is unavailable because the * Structure constraint banning domain structure is more highly ranked than Plateau (i.e., one cannot introduce a domain in order to avoid a Plateau violation). We assume that the second option is unavailable because Basic Alignment Left dominates Plateau (i.e., a domain must be aligned with the left edge of a sponsor even if this means violating Plateau). We are left only with the need to explain why the first option cannot be utilised. We suggest that there may be a universal Binarity constraint that allows only binary High Domains. Binarity in Eerati would be dominated by Non-finality (Intonational Phrase) and Bimoraic Penults are Bad High Domain Edges (the two constraints that lead to monomoraic domains), but it in turn dominates Plateau. This represents an example where the hierarchy of realisation constraints in (19) interacts with the domain structure constraints. Proposing a Binarity constraint, of course, raises questions whether the No
Eerati tone: a tonal dialectology of Emakhtiwa
221
Monomoraic High Domain constraint is still needed or whether it can be replaced by Binarity. We forego any discussion of this issue here. We have now almost finished our account of the purely phonological aspects of Eerati tone. The last point requiring discussion is the falling tone on a short penult vowel that we have occasionally seen in Intonational Phrase-final forms. Some relevant data: ( 2 1 ) o-(thume)la o-(lupa)ttha o-{huku)la o-{tumi)ha o-(lovo)la o-(hoko)(l0se)ra
b u t o-(thume)la... o-(lupa)ttha... o-(huku)la... o-(tumi)ha... o-(lovo)la... o-(hoko)(l0se)ra...
'to buy for'
'to hunt' 'to brew traditional beer' 'to sell' 'to transport' 'to return something to'
We should note that this fall is quite perceivable, even though the penult vowel is not a long vowel. These data establish that contour tones are not necessarily linked to phonological vowel length. The distribution of this falling tone is clear. It occurs on the bearer of a doubled High tone (never on the sponsor of High) that is Intonational Phrase-penult. We suggest that this falling tone has to do with the constraint in (5), Penults are Bad High Domain Edges. Specifically, we suggest that in Eerati violation of (5) is avoided by failing to align a High Domain crisply with the syllable edge. In this solution, we assume the validity of an alignment that falls internal to the syllable. We do not develop here a formal account of this notion. However, if falling tones are to be understood as a sequence of High and toneless (or Low) on a single tone-bearing unit, and if monomoraic vowels can display falling tone, then there is considerable reason for Optimal Domains Theory to recognise that there may be misalignment between a domain and a mora. In any case, we believe we are justified in seeing the short falling tone of Eerati as being a response to the constraint Penults are Bad High Domain Edges. The essential ingredient of the analysis is simply this: No Monomoraic High Domain outranks a constraint Crisp High Domain Edges (requiring that a High Domain be crisply aligned with a syllable). This ranking guarantees that a High Domain will expand as much beyond a single mora as possible. The presence of Penults are
222
Farida Cassimjee and Charles W. Kisseberth
Bad High Domain Edges in the constraint set prevents the expansion of the High Domain from taking in an entire mora. (We assume that No Monomoraic High Domain is not violated as long as the domain is larger than a mora; it is not necessary that the domain achieve a two mora size in order to satisfy No Monomoriac High Domain.) In contrast to Eerati, Ikorovere has an undominated Crisp High Domain Edges constraint; in Ikorovere, No Monomoraic High Domain dominates Penults are Bad High Domain Edges and thus doubling occurs, yielding a simple High tone on an Intonational Phrase-penult mora. One issue remains in Eerati. Why does a High Domain consisting just of a sponsor never violate Crisp High Domain Edges? We assume that the absence of *u-(li)ma is to be attributed to Faithfulness. One aspect of Faithfulness is that the sponsoring mora must be inside a High Domain. If only part of the mora is inside a High Domain, then Faithfulness is not achieved. Thus it is the undominated nature of Faithfulness that prevents *u-(lf)ma. Conclusion In this paper we have sketched the fundamental phonology of Eerati tone and demonstrated how a small set of universal constraints permits a characterisation of what Eerati has in common with and how it differs from a transparent Doubling dialect like Ikorovere. The points of most interest about Eerati are: 1. the way in which Plateau and No Rise interact with the constraint *(High, non-head), and 2. the evidence that Eerati provides for a constraint Penults are Bad High Domain Edges. Understanding the precise nature of this constraint requires considerably expanding the data base beyond Emakhuwa.
Note 1.
The transcriptions are in the orthography promoted by the Mozambican research group NELIMO at Eduardo Mondlane University, "c" stands for the alveopalatal affricate, "x" stands for the alveopalatal fricative (a carry-over from the Portuguese colonial role), "t" stands for a dental stop and "tt" stands for an alveolar stop, "h" after a consonant stands for aspiration.
Government Phonology and the vowel harmonies of Natal Portuguese and Yoruba* Margaret Cobb
Introduction Brazilian Portuguese and Yoruba belong to a group of languages which exhibit a vowel harmony, but about which there is little consensus in the literature as to how the harmonic process may be described.1 Yoruba, for example, has been variously described as involving the features of [ - A T R ] (Archangeli and D. Pulleyblank 1989), [ATR] (Chumbow 1982; Calabrese 1988), [low] (Goad 1993), the element t (Qla 1992), or the element A (van der Hülst 1988).2 Natal (Brazilian) Portuguese and Yoruba have received Government Phonology3 analyses in terms of the element/ + , spreading across an "A+ element-bridge" (Segundo 1993; Qla 1992 respectively). An ^-bridge may be described as a relationship contracted between adjacent nuclei which dominate ^-elements. As a result of this relationship, other elements, i.e., may spread. Although both analyses are able to account for the data, they fail to explain why it is A+ that is building bridges in phonological processes, and / + which likes to spread across them.4 Since these analyses were performed, Government Phonology has further evolved. The element / + has been eliminated from the toolkit, with ATR-type phenomena being explained instead by head licensing (Kaye 1994b) — a principle proposed to explain restrictions on vocalic distribution in ATR-type vowel harmony languages such as Vata (see C. Walker 1995). In this paper, I propose that head licensing may be modified to explain the harmony processes of Yoruba and Natal Portuguese. At the heart of head licensing is the head-government relation which exhibits characteristics consistent with all governing relations
224 Margaret Cobb
in Government Phonology (such as conditions on the identification of governors and governees, locality, and so on. Head-government may conform to an additional condition of government: complexity. Harris (1990) proposes that all sites of government should be subject to the Complexity Condition. He claims that government at the level of nuclear projection is parametrically subject to complexity effects: some languages have it, others do not. The Complexity Condition as a condition on governing relations is then predicted to take effect in the head-governing relation. As head-government is contracted at the level of nuclear projection, it is predicted that head licensing in some languages will manifest complexity effects, whilst others will not.5 In this paper I propose that the prediction that some languages will manifest complexity effects is indeed borne out. When analysed in terms of the newer tools of Government Phonology,6 these ^-bridge cases of /+-spreading are simply instances of head licensing with strict conditions on the identification of governors and governees. I evaluate these conditions in terms of complexity. This provides a principled, non-arbitrary explanation of why languages such as Natal Portuguese and Yoruba manifest the harmony processes that they do. 1
Natal Portuguese
The literature on Standard Brazilian Portuguese has focused on the debate over the harmonic feature. [High], [raised], [ATR] and [open] have all been proposed.7 Segundo (1993) approaches the problem in the framework of Standard Government Phonology including Charm Theory, in an analysis of a non-standard dialect, Natal.8 The lexical vowels of Natal Portuguese are transcribed by Segundo (1993) as the following (la), with examples in (lb). 9 (1)
a. Natal vowels: /a, i, u, e, ε, ο, d/ b. [tira] 'removes' [käla] 'shuts up' [püla] 'jumps' [fesa] 'closes' [zära] 'generates' [k01a] 'glues' [flo]10 'flower'
The vowel harmonies of Natal Portuguese and Yoruba
225
In the vowel harmony process, the following alternations are observable: (2)
[ε] ~ [e]/[i] [ο] ~ [o]/[u] a. [kibn] 'break' b. [k51u] 'Iglue' [kebrava] Ί used to break' [kolava] Ί used to glue' [kebrej] Ί broke' [kol6j] Ί glued' c. [fi§ri] 'hurts' d. [t5si] 'coughs' [firia] Ί used to hurt' [tusia] Ί used to cough'
The process may be informally described as follows: in a pair of nuclei, the second of which is stressed, nuclei agree in "height" and/or "tenseness". The following distributional generalisations contribute to this hypothesis: (3)
Harmonic restrictions on nuclear distribution in Natal Nucleus 1 [*ε, *o] [ε, ο, i, u, e, o, a] [e, o] [*e, *o, ε, ο] [i, u, a]
Nucleus 2 (stressed) [i, u, e, o] [ε, ο, a] [e, o] [i, u] [i, u, a, e, ο, ε, ο]
Basically, "lax" mid vowels cannot be found preceding any "tense" vowel, and in some cases, "tense" mid vowels cannot be found preceding the "tense" "high" vowels. The distribution of [i], [u], and [a] is unrestricted. The harmony process appears to be bounded. "Lax" mid vowels can indeed precede stressed "tense" high vowels, but only when separated by an intervening nucleus, as the examples below show: (4)
[deglutia] [kalidia]
Ί used to swallow' Ί used to collide'
Segundo's analysis for the explanation of the alternations in (2) and constraints on distribution in (3) and (4) is based on a right-headed
226 Margaret Cobb
governing relation contracted between nuclei containing A+ in specific head-operator roles (an y4+-bridge): (5)
Vowel harmony in verbs in the Natal dialect (Segundo 1993) Where Ni is the pretonic nucleus and N2 the primary stressed nucleus (head of the domain), the realisation of the governed nucleus (Ni) is directly related to the presence or absence of ^-operators in both the head and in the pretonic (governed) position: a. /4+-operators in governed positions (Ni) can only be licensed by yl+-elements in the governing position. b. i + spreads from N2 onto Ni across the ^4+-bridge (a single element A+ is attached to two adjacent nuclei).
The alternations manifested in (2) are then instances of the derivations below. (6)
a.
[kebrej] ι ^ Ο R Ο Ν
[ksbräva] 1 R Ο R Ν
Ν
Ο R Ο Ν
R Ο R Ν
Ν
χ χ χ χ χ χ χ
χ χ χ χ χ χ χ
k 1° b r 1° j
k Γ b r Α + ν Α+
ϊ
ϊ
t
1
Α+ 1
Α+ ϊ+
The vowel harmonies of Natal Portuguese and Yoruba
227
c. [firia] r^-1 Ο R Ο R Ο R Ν
Ν
Ν
χ χ χ χ χ χ f Γ r 1° Τ αΛ γ
Α+ ~
In (6a) the first and second nucleus in the string contract a rightheaded governing relation, as the conditions described in (5) are met. The ^-operator of the first nucleus (the governee) is licensed by the /i+-element in the second nucleus (the governor). / + spreads across the ^-bridge formed by the ^-licensing relation, and is linked to both nuclei. In (6b), the first two nuclei are in a governing relation, as the operator of the first nucleus must be licensed by the A+ in the second nucleus. However, in this case, although there is the requisite bridge, there is no / + to spread. In (6c) the A+ in the first nucleus needs a licensor (an A+) from the following nucleus. However, no licensor is available, and A+ delinks. The remaining element in the expression is 1°. However, as "lax" [i] is not a lexical expression in Natal (*/if), Segundo calls on the notion of structure preservation to motivate the linking of / + as an ambient element, to yield a "tense" N.
Although the proposal is adequate to explain the data, a number of limitations are apparent. First, the governing relationship between the nuclei contracted by A+ is element-specific: it is only A+ as an operator in pretonic positions which requires special licensing. Secondly, the derivation in (6c), like the derivation in (6a), results in / + linking. However, on the account here, although the linking is triggered by the same governing relationship, they are essentially un-
228 Margaret Cobb
related: in(6a) / + has a local source in the governing nucleus. In (6c) I + is linked ambiently, as it has no local source. Another problem is apparent in (6c). Segundo claims that in the governing relation between the two nuclei, the governee needs to be "weaker" than the governor. In Standard Government Phonology terms, "strength" is equated with charm properties: positive charm is strong, neutral charm is weak. Segundo claims that A+ is lost from the governee because it is positively charmed. However, as the positively charmed element / + is linked for the purposes of structure preservation, it is difficult to understand why this element in the governee should be considered by Natal as less of a burden than A+. Furthermore, at odds with this analysis is the observation that the positively charmed A+ may appear in the governee as a head in the expression (A*)+ (when it contributes its positive charm to the expression), no matter what expression is linked to the governing position. Segundo's analysis focuses on the three-way alternations illustrated in (2). "Lax" mid vowels are the only objects targeted in the vowel harmony process. Lexically "tense" mid vowels do not alternate. She does not attempt to explain why only "lax" mid vowels are targeted by the process. On her analysis, sequences like [e/o ... i/u] should also be subject to ^-licensing conditions, as both [e] and [o] have ^-operators," so these sequences should be ruled out. Finally, Segundo's analysis cannot explain data of the type below: (7)
Forms of beber 'drink' Future 1st singular [beber-ajs] 3rd plural [beber-aw]
Conditional [beber-ias] [beber-iäw]
In the examples above, the conditional [i] in the second column conditions the "tense" realisation of preceding mid-vowels. However, on Segundo's story, / + is linked either via the governing relation manifested by the y4+-bridge, or by ambient / + linking in the interests of structure preservation. The examples above manifest no /4+-bridge between the trigger and the target. Nor can structure preservation be called on: /ε/ can be found lexically in Natal, the mid vowel does not have to depend on / + linking for interpretation. In addition, the first
The vowel harmonies ofNatal Portuguese and Yoruba 229
nuclei in the strings above are not expected to undergo harmony, as they are not left adjacent to the stressed governing nucleus. 2
A Revised Government Phonology analysis
"Headedness" is the characteristic employed by Revised Government Phonology (summaries in Cobb 1997; Ploch 1999b) in the explanation of the distribution of the characteristic "Advanced Tongue Root" (Kaye 1994b). Charm Theory is no longer used and is therefore not called on in this analysis. In harmony systems of "headedness", headed phonological expressions are distributed according to the internuclear governing relationships contracted by the nuclei to which they are associated. These governing relationships are an aspect of head licensing (Kaye 1994b), in which headless expressions are mapped to headed expressions under specifically defined conditions. I propose the lexical expressions for Natal are as follows: (8)
Lexical expressions for Natal12 /i/ © /u/ (U) /e/ (A-I) /o/ (A-U) /ε/ (Α·Ι) hl (A-U) /a/ (A)
These expressions are generated with the licensing constraint in (9):13 (9)
Licensing constraint for Natal: A cannot be a head
The triggers of the harmony are the nuclei containing headed expressions which can identify head-governors in the head-government mechanism. The targets in the process are the nuclei with headless expressions, as shown below: (10) Harmony triggers = headed expressions:
Ν (I); /u/ (U); /e/ (AI); /ο/ (A-U);
Harmony targets = headless expressions: /ε/ (A I); /ο/ (A-U); /a/ (A), in principle.
230 Margaret Cobb
Note that in the set of targets above, a (A) is in principle a target of the vowel harmony process. However, the derivation is constrained by the licensing constraint A cannot be a head, explaining why a does not alternate. The phonological expressions interact with the head licensing principle below. (11) Head licensing (Kaye 1994b; modified by Cobb 1997) a. A nuclear position is head-licensed if it is head-governed; b. a head-governs β if they are adjacent on the relevant projection, and α is a head-governor; β is a governee; c. a nuclear position is a head-governor if it is identified by a headed expression; a nuclear position is a governee if it is identified by a headless expression; d. a is not itself head-governed; e. the status of head-governor is immutable; f. head-government obeys the strict cyclicity of Kean (1974: 179): "On any cycle A no cyclic rule may apply to material within a previous cycle Β without making crucial use of material uniquely in A." The examples below illustrate head-licensing: (12)
a.
[kebrej] I Ο R Ο N,ß
b. 1 R Ο R N„
Ν
χ χ χ χ χ χ χ I I I I I I k I b r I j
[kebrava] ι 'never'. This ranking entails both the insertion of voicing on an underlying voiceless obstruent (6b) as well as the devoicing of an underlying voiced obstruent (6a). The result is thus a two-way departure from faithfulness. Hence, the Uniformity Constraint must dominate Id-[voice]. But Positional Faithfulness in onsets remains topranked since an onset consonant does not change its voicing value.
Two notes on laryngeal licensing
265
The upshot is regressive rather than progressive assimilation. As shown in (6c), an underlying cluster of voiced obstruents remains unchanged in the output in contrast to German Run[tg\ang (4c). (6)
Polish voicing and devoicing a. za/b+k/a Id-[voice]onset Uniformity * [voice] Id-[voice] [bk] *! * [Pk] [bg] *! ** • b. pro/s'+b/a Id-[voice]onset Uniformity *[voice] Id-[voice] [s'b] *! * [z'b] ** * [s'p] *! * c. ni/g+d/y
Id-[voice]onset Uniformity *[voice] Id-[voice]
[gd]
[kd] [kt]
**
*! *!
*
* **
Finally, since these languages voice an underlying voiceless consonant in an obstruent cluster whose final term is voiced (6b), Uniformity must dominate the * [voice] Constraint that militates against voiced obstruents. It is this constraint ranking that differentiates Polish from German. In her original (1991, 1995) typology, Lombardi stipulated the independence of final devoicing from regressive assimilation by treating the final voiced obstruents of Yiddish and Serbo-Croatian as "extrametrical". Optimality Theory provides a more satisfactory explanation by calling on its basic analytic tool: constraint ranking. Since the Uniformity Constraint drives assimilation in clusters, the treatment of word-final obstruents can be divorced from the assimilation in clusters. Languages like Yiddish and Serbo-Croatian that preserve underlying voicing on a final obstruent have faithfulness dominating * [voice] (7a) while final devoicing languages like German and Polish have the opposite ranking (7b).
266 Kenstowicz, Abu-Mansour and Törkenczy
(7)
Final devoicing in Yiddish and Polish a. klulbl Id-[voice] * [voice] [b] * [p] ·! b. klulbl *[voice] Id-[voice] [b] *! ^ [p]
Constraint ranking explains another feature of Lombardi's original typology. There are languages such as Yiddish and Serbo-Croatian that neutralise voicing distinctions in obstruent clusters but maintain a voicing contrast word-finally. But there do not seem to be languages that neutralise voicing distinctions word-finally but maintain them in obstruent clusters. For Lombardi (1999) final devoicing implies *[voice] »Id-[voice]. This effectively devoices everywhere. By ranking Positional Faithfulness above * [voice], a change in the onset consonant is blocked. Given that the constraint repertoire of Universal Grammar lacks any faithfulness constraint that singles out the coda, there is no way to specifically prevent the devoicing of a coda consonant. Consequently, other things being equal, final devoicing implies neutralisation in obstruent clusters but not vice versa. Lombardi also discusses Swedish where Uniformity is satisfied in obstruent clusters by devoicing a voiced obstruent next to a voiceless one regardless of order: i.e., both progressive and regressive assimilation occurs. (8)
Bidirectional devoicing in Swedish obstruent clusters, part 1 hög 'high' hög-tid 'festival' [kt] dag 'day' tis-dag 'Tuesday' [st] syl-de 'covered' läs-te 'read' [st] äg-a 'to own' äg-de 'owned' [gd]
Rather than seeing this as the spread of [-voice] (inconsistent with the thesis of privative voicing), it is now treated as the context-free deletion of underlying [voice] specifications under the pressure of Uniformity and * [voice]. Outside obstruent clusters, voicing is faithfully retained. Lombardi (1999) derives this voicing pattern by de-
Two notes on laryngeal licensing
267
moting Positional Faithfulness below Context-free Faithfulness so that onset obstruents can be devoiced (9c). Uniformity dominates Id[voice] forcing clusters to agree and * [voice] enforces devoicing (9b, c). But Id-[voice] ranks above * [voice] to block devoicing outside a cluster (9a) and in clusters satisfying Uniformity at the outset (9d). (9)
Bidirectional devoicing in Swedish obstruent clusters, part 2 a. Aö/g/ Uniformity Id-[voice] * [voice] Id-[voice]0nset or [gj * [k] *! b. hölg+ilid Uniformity Id-[voice] * [voice] Id-[voice]0nset *! [gt] fkfl * [kt] * *t* * [gd] c. lä/s+d/e [sd] [zd] [st]
Uniformity Id-[voice] * [voice] Id-[voice]0nset
d. ä/g+d/e [gd] [kd] [ktl
Uniformity Id-[voice] * [voice] Id-[voice]onSet **
*
*t*
*
*
*|
*
#
*|*
The Hasse diagrams in (10) show the constraint rankings that generate the German, Polish, Yiddish, and Swedish voicing patterns. Taking them in order, Polish differs from German by promoting Uniformity above * [voice]. Yiddish differs from Polish by inverting the ranking between * [voice] and Id-[voice]. Finally, Swedish differs from Yiddish by demoting faithfulness to onset voicing to the bottom of the hierarchy.
268 Kenstowicz, Abu-Mansour and Törkenczy
(10) Hasse diagrams for German, Polish, Yiddish, and Swedish a. German b. Polish Id-[voice]onset
Id-[voice]onset
* [voice] Id-[voice] c.
Uniformity Yiddish
Id-[voice]onset
Uniformity
Uniformity
* [voice] Id-[voice] d.
Swedish
Uniformity I Id-[voice] I * [voice] I Id-[voice]onset
Finally, Lombardi (1995, 1999) mentions Ukrainian (Bethin 1987) where Uniformity in obstruent clusters is satisfied by (regressive) voicing but not by devoicing (11). Furthermore, this language has no final devoicing: rot 'mouth' versus rod 'kind'; vas 'you (accusative plural)' versus vaz 'vase (genitive plural)'. (11) Voicing in Ukrainian, part 1 rildkJo n'[dk]o 'seldom' ve/z+t!y ve[zt]y 'to drive' pro/s'+b/a /?ro[z'b]a 'request' cf.pros-y-ty 'to request' boro/V+b/a boro[d'b]a 'fight' ne/s+t/y ne[si\y 'to carry' xo/d'+b/α xo[d'b]a 'walking' Given the limited number of constraints at play, the analytic options are quite restricted — an obviously desirable state of affairs. Since there is no final devoicing, Id-[voice] must dominate * [voice]. This
Two notes on laryngeal licensing 269
ranking also preserves a cluster of two voiced obstruents. But in mixed clusters we must introduce voicing in /s'+b/ -> [z'b] yet block devoicing in /z+t/ -» [zt]. It looks like the faithfulness constraint for voicing (i.e., Id-[voice]) must be in two places at the same time (an obvious contradiction). For /s'+b/ —> [z'b] Uniformity dominates Id[voice] while for /z+t/ —> [zt] Id-[voice] dominates Uniformity. A possible solution to this dilemma is to capitalise on the privative status of [voice]. The mapping we must allow (/s'+b/ —> [z'b]) adds [voice] while the one we must block (/z+t/ - » [st]) deletes [voice]. If McCarthy and Prince's (1995a) correspondence constraints are extended from segments to features, then these two departures from faithfulness can be distinguished in terms of Max (don't remove an element from the input) and Dep (don't insert an element into the output). The relevant constraint ranking for Ukrainian then is the same as Yiddish except that Id-[voice] is decomposed into Max- and Dep-variants with Uniformity ranked between them: Max-[voice] » Uniformity » Dep-[voice]. In other words, Uniformity can be satisfied by insertion of [voice] (Uniformity » Dep-[voice]) but not by the deletion of [voice] (Max-[voice] » Uniformity). In essence, this is also the analysis proposed by Gnanadesikan (1997). (12) Voicing in Ukrainian, part 2 a. pro/s'+b/a Max-[voice] [s'b] [z'b] b. ve/z+t/y [zt] [st]
Max-[voice]
Uniformity *!
Dep-[voice] *
Uniformity *
Dep-[voice]
*!
This completes our survey of Lombardi (1999). It is a simple and elegant theory with considerable descriptive coverage. It derives the cross-linguistic predominance of regressive (as opposed to progressive) voicing assimilation from the positional licensing of [voice] in the onset of the syllable. We now turn to some problems we have encountered in extending the theory.
270 Kenstowicz, Abu-Mansour and Törkenczy
2
Hungarian
As the paradigms in (13) demonstrate, Hungarian is a language which preserves the contrast between word-final voiced and voiceless obstruents. It also has regressive voicing assimilation in clusters with the direction of assimilation determined by the final member of the cluster. This process is obligatory.1 (13) Regressive voicing assimilation in Hungarian, part 1 kap 'catches' ka\b]-dos 'catches repeatedly' dob 'throws' do[p]-tam Ί threw' jeg 'ice (nominative)' ye[k]-/d 7 'ice (ablative)' csok 'kiss' cs0[g\-bol 'kiss (elative)' Hungarian thus occupies the same slot in the typology as Yiddish. The ranking of (10c) derives these alternations, as shown in (14). (14) Regressive voicing assimilation in Hungarian, part 2 a. do!b/ Id-[voice] 0 nset Uniform- Id-[voice] * [voice] ity * νά[1] but sokk 'shock' —» so[kk]. As Moren notes, this discrepancy runs contrary to the cross-linguistic preference for length to coincide with increased sonority. If final stops have salient release, then their otherwise aberrant behaviour begins to make some sense. We close this section by observing that we have dropped reference to syllabic affiliation in our formulation of the Laryngeal Li-
276 Kenstowicz, Abu-Mansour and Törkenczy
censing Constraint (17) and have restated it in exclusively segmental terms. The motivation for this move can be seen in various dialects of Arabic, to which we now turn. 3
Arabic
It is well known that Arabic prosody depends on the contrast between light and heavy syllables (Mitchell 1993). Syllable weight is relevant for stress, minimality, and templating processes. Word-final single consonants are non-moraic while the VC-substring in VCC#and VCCV-sequences is uniformly bimoraic and hence, under standard assumptions, tautosyllabic. We can bring this aspect of the prosody to bear on the onset/coda status of consonants with respect to the positional licensing of [voice]. The general upshot is that the two phenomena are largely independent. We consider here two of the Arabic dialects that figure prominently in Abu-Mansour's (1996) discussion of voicing. 3.1.
Daragözü
Daragözü is an Arabic dialect spoken in Turkey (Jastrow 1973). Its stress rule distinguishes closed from open syllables in the expected way (cf. below). Similar to Turkish, Daragözü devoices word-final obstruents. Daragözü also regressively assimilates voicing in obstruent clusters and thus has the constraint ranking of Polish seen in (10). According to Jastrow (1973: 31), "[vjiewed from the end of the word, stress is on the word-internal first long vowel or the first VCCsequence, otherwise on the first syllable of the word". At the end of the word before pause all voiced consonants are realised as voiceless (Jastrow 1973: 19). Final devoicing is not reflected in Jastrow's transcriptions, except for the phaiyngeal Λ7, which takes the voiceless variant /h/ word-finally and before a voiceless consonant. For voicing assimilation Jastrow (1973: 24) states: "If a voiced and voiceless consonant come in contact, then the group is uniquely voiced or voiceless such that the first consonant assimilates to the second".
Two notes on laryngeal licensing
277
These principles of Daragözü phonology are reflected in the following paradigm for the verb /qata?/ 'cut'. (21) Daragözü regressive voicing assimilation 1st person 2nd person a Singular masculine [q tah-tu] [qatäh-t] Singular feminine [qatäh-tu] [qatah-te] a Plural [q tä?-na] [qatah-to]
3rd person [qätah] [qäti-et] [qätf-o]
The form of particular interest here is the first plural [qatä?-na]. It must have a closed penultimate syllable in order to attract the stress and so the /?/ must occupy the coda. Nevertheless, /?/ is not devoiced. This makes perfect sense according to the revised licensing principle in (17) which preserves [voice] on presonorant segments regardless of syllabic affiliation. See Rubach (1996) and Steriade (1999b) for similar critiques, based on Polish and Lithuanian. Daragözü thus has the same ranking as Polish in (19a) with faithfulness for voicing slipped between neutralisation in word-final and presonorant position. 3.2.
Makkan
Abu-Mansour (1996) also discusses voicing assimilation in the Saudi-Arabian dialect of Makkah. It differs from Daragözü in retaining the contrast between voiced versus voiceless obstruents wordfinally as well as before a voiced obstruent. Before voiceless obstruents there is devoicing of underlying voiced obstruents. (22) Devoicing /ji+ktub/ /ji+dbah/ /ji+tba?/ /ji+dfin/
in Makkan Arabic, part 1 [jiktub] 'writes' [jidbah] 'slaughters' [jitbai] 'follows' [jitfin] 'buries'
[katab] [dabah] [tabaf] [dafan]
'wrote' 'slaughtered' 'followed' 'buried'
Viewed formally, Makkan Arabic is the Dep-[voice] » Uniformity » Max-[voice] counterpart to Ukrainian (Gnanadesikan 1997).
278 Kenstowicz, Abu-Mansour and Törkenczy
(23) Devoicing in Makkan Arabic, part 2 (to be revised) a. /ji+tbai/ Dep-[voice] Uniformity Max-[voice] [tb] * [db] *! b. /ji+dfin/ Dep-[voice] Uniformity Max-[voice] [df] *! [tf] * While this analysis works, we suggest an alternative more in keeping with the notion of phonological and phonetic salience. Presonorant position is an optimal context in which to preserve a voicing distinction in obstruents because sonorants typically do not themselves contrast in voicing and so allow voice onset time to serve as an effective cue to the voicing distinction. What is special about Makkan Arabic, we suggest, is that the context for licensing the voicing contrast is extended from sonorants to voiced obstruents. On this view, just as a sonorant such as the nasal in [jikniz] 'accumulates' versus [jigni] 'owns' licenses the voicing contrast in the preceding stops so does the /b/ in [jitbaY] 'follows' versus [jidbah] 'slaughters'. From the perspective of phonological salience, the Makkan pattern can be captured by dividing the *[±voice] / _ [-sonorant] constraint into voiceless and voiced contextual variants, with faithfulness for [voice] ranked between them. (24) Devoicing in Makkan Arabic, part 3 a. /ji+tbai/ *[±voi]/_ Id-[voice] [-son, -voi] [tb] [db] *! b. /ji+dbah/ *[±voi]/_ Id-[voice] [-son, -voi] [db] [tb] *!
*[±voi]/_ [-son,* +voi] *[±voi]/_ [-son,* +voi] *
Two notes on laryngeal licensing 279
c. /ji+dfin/ [df] «- [tf]
*[±voi]/_ [-son, -voi] *!
Id-[voice]
*[±voi]/_ [-son, +voi]
*
One final observation: Makkan masdars (nominalisations) in the CaCC-template break up rising-sonority clusters with epenthesis: /?akl/ —> [?akil] 'food'. But obstruent clusters generally surface without any vocalic support. The range of consonants composing the final cluster in the input freely combines all four possible combinations of voiced and voiceless obstruents. The remarkable fact is that these clusters are resolved in essentially the same way as medial ones: there is devoicing but no voicing. (25) Devoicing in Makkan Arabic, part 4 /fatk/ [fatk] 'destruction' /Tabd/ [Tabd] 'slave' /rabk/ [rapk] 'confusion' /rakb/ [rakb] 'caravan' In a preliminary phonetic study of such clusters with two speakers, the following generalisations emerged. When the second consonant in the cluster is voiceless then the first obstruent neutralises to voiceless. When the second consonant in the cluster is a voiced obstruent then a voicing contrast in the first obstruent is maintained: a /g/ is fully voiced in Ci position while voicing ceases shortly after the onset of /d/ and /b/ in this position. When the second member of the cluster is a stop, then closure voicing consistently disappears in this consonant. Nevertheless, a voicing contrast is still maintained: phonologically in the effect on the preceding obstruent and phonetically in the release: in voiceless stops energy is diffused through the spectrum while in voiced stops it is weaker and more confined. This suggests that even though closure voicing is absent in C2 because the constriction in the oral cavity blocks airflow through the glottis, the voicing contrast is still maintained and becomes audible at release. Makkan thus appears to counter-exemplify a generalisation Lombardi (1999), following Mester and Ito (1989), attributes to
280 Kenstowicz, Abu-Mansour and Törkenczy
Harms (1973) concerning the devoicing that obtains in the analysis of the English plural that posits an underlying voiced consonant: cats /kaet+z/ -» [kaets]. Harm's "Generalisation" states: "voiced obstruents must be closer than voiceless [ones] to the syllable nucleus" (Lombardi 1999: 288). Another point worth mentioning is that, as in Hungarian, Makkan contrasts prepausal single versus geminate stops, which have salient release. With salient release, the final consonant in a CaCC masdar-cluster will have the same status as a prevocalic one with respect to the Laryngeal Licensing Constraint (17), and hence the identical behaviour with regard to voicing assimilation is to be expected. We suspect that the preservation of voicing and place contrasts in such clusters is connected with a more measured intersegmental timing pattern in comparison to that found in English or French. Our data often indicate brief (one or two pulses) moments of periodic vibration between consonants. See Gafos (to appear) for discussion of the importance of such timing factors in Moroccan Arabic. 4
Ukrainian
We close by returning to the Ukrainian data mentioned earlier (section 1). We recall that Ukrainian maintains a voicing contrast wordfinally as well as before a voiceless consonant. Before voiced obstruents there is regressive assimilation and hence neutralisation. While these data could be described by inverting the *[±voice] / _ [ sonorant, -voice] » Id-[voice] » *[±voice] / _ [-sonorant, +voice] Makkan ranking to *[±voice] / _ [-sonorant, +voice] » Id-[voice] » *[±voice] / _ [-sonorant, -voice], the former ranking is grounded in phonetic perception and hence should be irreversible. Bethin (1987) reports that Ukrainian speakers syllabify obstruent clusters by maximising onsets. This applies even if the resulting clusters are not found word initially: xlo.pcyk 'little boy', ko.bzar 'singer'. The one systematic exception is a cluster of a voiced obstruent followed by a voiceless one; it is perceived as heterosyllabic: rid.ko 'seldom'. However, for this to happen, a preceding vowel is required: lzr=p/ekty 'to bake' is realised as [sp]e.kty with voicing assimilation. But if the preceding word ends in a vowel then underlying voicing is re-
Two notes on laryngeal licensing 281
tained: moloko z=silosja 'the milk has curdled' (Andersen 1969: 165). The regressive assimilation in /z=plekty —> [sp]e.kty as well as in pro/s'+b/a —> pro[z'b]a indicates that Ukrainian demotes Id[voice] below the constraints neutralising voicing distinctions before voiced and voiceless obstruents. Hence some other mechanism blocks devoicing in ridko. Here we follow Bethin's (1987) suggestion that in Ukrainian the lack of assimilation in ridko reflects a process of coda laxing that weakens (sonorises?) postvocalic voiced obstruents. This weakening will override the devoicing that is otherwise expected from the perceptually motivated ranking. Following Steriade (1999b), we speculate that the syllabification judgements reflect at least in part the availability of a matching word-initial cluster. In other words, a word-medial V.CCV-parse will be rejected if the cluster is systematically excluded word-initially (based on the isolation pronunciation). This explains why a voiced-voiceless cluster is judged heterosyllabic. And since there is no final devoicing in Ukrainian, the heterosyllabic parse also matches the right edge of the word. The laxing process may be the way the language avoids final devoicing as well. If so, then Ukrainian has the same ranking as Polish and Russian with Id-[voice] below *[±voice] / _ #. This interpretation is supported by dialect variation. The South-western dialects have final devoicing; but they also have regressive assimilation in clusters so that voiced obstruents devoice before a voiceless obstruent: cf. w'[y]o/' 'fingernail' and South-western ni[x]ty versus Standard Ukrainian ηϊ[γ]ΐγ 'finger (plural)'. Clearly, these suggestions are highly speculative; a thorough study of the phonology and phonetics of Ukrainian obstruents is required to substantiate the hypothetical laxing process. Summary and Conclusion In this paper we have reviewed the positional licensing of laryngeal features as proposed in Lombardi (1991, 1995, 1999). In order to extend the model to Hungarian we have recast the constraint in more phonetic terms that refer to the contexts which favour the realisation of cues to voicing contrasts: in particular voice onset time and release
282 Kenstowicz, Abu-Mcmsour and Törkenczy
of stop closure. Dispensing with reference to the syllable also allowed us to come to terms with conflicting evidence in Arabic dialects. Features like release are often regarded as insignificant details added in the phonetic component. The evidence reviewed here suggests that such factors can have an impact on phonological structure. Obviously, systematic study and experimentation is needed to substantiate the notion of saliency that underlies this approach.
Notes *
This paper began in 1995 through e-mail correspondence between the first and third author. We acknowledge the support of the Fulbright Foundation for an award to Miklös Törkenczy to visit MIT in the academic year 1999-2000. We are pleased to offer this paper in celebration of Jonathan Kaye's path-breaking research in phonological theory and his special interest in licensing constraints (cf. Kaye 1997; for summaries, cf. Cobb 1997; Ploch 1998, 1999b). We thank Stefan Ploch in his capacity as editor for many helpful comments. 1. Based on remarks in Vago (1980), Lombardi (1991, 1995) interpreted the process as optional. As pointed out in Lombardi (1999: 284), this disagrees with the judgement of most other Hungarian linguists, for whom the process is obligatory. See Szigetviri (1997: 223) and Siptar and Törkenczy (2000: 201). 2. As Stefan Ploch notes, departing from the "standard" view, Kaye (1990a) proposes that a single domain-final consonant is never linked to a "coda" or postnuclear rhymal position but, universally, always to an onset, which in turn is followed by a p-licensed (i.e., silent) empty nucleus ("coda" licensing). For Kaye, there is no coda qua constituent; he only uses this term as a circumlocution for 'postnuclear rhymal position'. Piggott (this volume) employs a parameterised version of Kaye's Coda Licensing Principle, while Rice (this volume) argues for both final consonants in coda position and final consonants in onset position within one and the same language (Ahtna).
On spirantisation and affricates Tobias Scheer
Introduction This paper investigates two phonological phenomena that do not seem to be related at first sight. It aims at showing that spirantisation and the absence of simplex stops for certain places of articulation share the same cause. As to the former, there are two kinds of spirantisation: the well-known Spanish kind that is conditioned by a vocalic context, and a second kind, illustrated by Grimm's Law and less well described, where aspiration triggers spirantisation without implication of any context. The crucial difference between both is that the fricatives resulting from the latter but not from the former also undergo a change in their place of articulation. The second phenomenon discussed can be appreciated when taking a simple look at the Phonetic Alphabet: some places of articulation like the bilabial one offer both simplex stops (as opposed to affricates) and fricatives, while others such as the labio-dental place exhibit only fricatives. Although phonetically striking, significant attempts to account for this lack of stops (as opposed to the existence of fricatives for all places) do not seem to have been undertaken so far. I show that the distribution of the places exhibiting versus lacking stops is not random, but correlated with the results of the Grimm's-Law type spirantisation. Both phenomena— aspiration triggering spirantisation and the lack of stops for certain places of articulation — are argued to be a consequence of the incompatibility of two phonological primes carrying antipodal properties: A representing maximal aperture and ? contributing maximal closure. As a consequence, affricates appear as the articulations eluding the presence of both A and ? within the same segmental expression while sticking to the place of their fricative part and being a stop. They thereby qualify for assuming the stop-function for places that
284
Tobias Scheer
do not normally tolerate stops. It will also be shown that the real correspondence between fricatives and their related stops, foremost in the dental area, is different from the correspondence that is suggested by the consideration of mere phonetic properties. Section 1 reviews the evidence concerning spirantisation, while section 2 investigates the correspondence between stops and their phonologically related fricatives. The possibility of accounting for the discussed data by a single generalisation involving A (aperture) and ? (stopness) is evaluated in section 3. 1
Two kinds of spirantisation
Consider first relevant data illustrating Grimm's Law as shown below (e.g., Paul, Wiehl, and Grosse [1881] 1989: 113; Braune and Ebbinghaus [1886] 1981: 47): (1)
Grimm's Law. Latin and Greek forms witness the IndoEuropean state of affairs' a.
Spirantisation2 IE CG Got. °P, y 7 / °v b °bh b °v %V θ d °d °dh °d d h °X 0 V 8 V y g
Lat. (Gr.) Got. pater fadar septem sibun fero bairan tres °preis pater fadar dyra (Gr.) daur cornu °haurn dakry (Gr.) °tagr hostis gasts
'father' 'seven' 'cany' 'three' 'father' 'gate' 'horn' 3 'tear' 'stranger'
Devoicing °b °P °d °t °k °8
(s)lubricus edo ego
'sneak' 'eat' Τ
Ρ t k
°sliupan itan ik
On spirantisation and affricates
285
Three regular correspondences for the oldest record of Germanic, i.e., Gothic, can be established (see, e.g., Collinge 1985: 63). 1. IndoEuropean voiced stops become voiceless but remain stops: IE °b, °d, °g > Got p, t, k. 2. Indo-European voiced aspirated stops are represented by Gothic voiced unaspirated stops: IE °bh, °dh, °gh > Got b, d, g. Finally, Indo-European voiceless stops, both aspirated and unaspirated, correspond to either voiceless fricatives or voiced unaspirated stops: IE {°p,°ph}, {% °th), {% > Got { f , b } , {Qd}, {h, g}. The selection of the fricative or the stop for the latter correspondence is not governed by the aspiration value of the input (e.g., both IE {°p, °ph} can appear as both Got {/, Z>}), but by Verner's Law.4 In regard of various secondary processes such as the Second Consonant Shift and for the sake of comparative studies across the Germanic language family, the following correspondences are commonly reconstructed for (unrecorded) Common Germanic:5 1. IE °b, °d, °g > CG °p, % % as before; 2. IE °bh, °dh, γ > CG {% °v}, {°θ,°ό}, just as 3. IE {°p, °ph}, {% °th}, {% > CG {% °v}, {°0,° [f, θ, χ]. As in Germanic, the Greek spirantisation always produces loci that are different from those of the original stops. Contrasting with Grimm's Law and Greek, there is another kind of spirantisation where the place of articulation of stops and the resulting fricatives remains stable. Consider the Spanish case below, where only voiced stops undergo spirantisation, which is dependent on the sonority value of the preceding segment: fricatives stand after vowels and consonants (except [ld]-sequences), while stops occur wordinitially and after nasals (plus [d] after [1]) (see e.g., Hooper 1976: 208; L. Hyman 1975: 62; Kenstowicz 1994: 487):
On spirantisation and affricates
(3)
289
Spanish spirantisation a. Fricatives occur 1. after vowels 'the bank' la banca [la ßagka] la demora [la demora] 'the delay' la gana [la yana] 'the desire' 2. after non-nasal consonants (except [Id]-sequences) curva [kurßa] curve [kalßo] 'bald' calvo 'broth' cardo [karöo] [aßöika] abdica 'he abdicates' [purya] purga 'purge' algo [alyo] 'something' [disyusto] 'trouble' disgusto 'adverse' adverso [aößerso] b. Stops occur 1. word-initially banca [bagka] demora [demora] gana [gana] 2. after nasal consonants ambos [ambos] onda [onda] tengo [teggo] 3. [d] occurs after [1] aldea [aldea]
'bank' 'delay' 'desire' 'both' 'wave' Ί have' 'village'
Whatever the puzzling distribution of triggering and non-triggering preceding environments, there can be no doubt that the sonority of the preceding segment, in conjunction with its place, determines whether a stop or a fricative is met. Alternations involved are [b ~ ß], [d ~ ö] and [g ~ γ]. Apart from [d ~ Ö] (on this point, see Scheer 1996: 229), the place of articulation is invariable: bilabial-bilabial, velar-velar. The Spanish spirantisation also contrasts with Grimm's Law in that aspiration is not involved at all, nor is the alternation "spontaneous". Instead, it makes
290
Tobias Scheer
reference to a triggering contextual condition defined in terms of the sonority and the place of the preceding segment. A similar situation is known from Tiberian Hebrew where any stop, voiced or voiceless, undergoes spirantisation postvocalically (data from Elmedlaoui 1993: 124). (4) Tiberian Hebrew spirantisation Perfective Imperfective Vzkr [zaaxar] [jizkor] Vkpr [kaa$ar] [jixpor] Vbdl [baaöal] [jißdal] Vpth [paaGah] [ji$tah] [ji$gof] V^pgf [paayaf]
Alternation(s) χ~k k~x, φ~ρ b ~ β, ö ~ d ρ-φ,θ-t ρ-φ,γ-g
'remember' 'cover' 'separate' 'open' 'meet'
Like in Spanish, the place of articulation remains stable (except for the dentals), and the triggering context is defined in terms of sonority (fricatives appear postvocalically). As a result, the different properties shown by spirantisations, that is "triggered by aspiration and shift in the place of articulation" versus "triggered by a sonorous context and no shift in the place of articulation" appear to meet a complementary distribution in the way indicated by the double quotes: triggering aspiration implies a change in the place of articulation, while triggering sonorous contexts go along with stable places of articulation. The causal relation of the Spanish/Hebrew phenomenon is usually viewed as an assimilation whereby a more sonorous context assimilates minimally sonorant stops to more sonorant fricatives. There is no reason for the place of articulation to be affected. By contrast, the causal relation between aspiration and the instability of the place of articulation is less obvious. Like other secondary articulations (glottalisation in ejectives, prenasalisation, pharyngealisation in so-called emphatics, affrication), aspiration may be reasonably represented as a contour segment whose second branch hosts the phonological definition of the corresponding glottal friction (see e.g. Iverson and Salmons 1995: 372). Furthermore, the phonological primitive commonly related to guttural activity is the one responsible for aperture, that is, depending on
On spirantisation and affricates 291
the framework used, A (= Retracted Tongue Root), [low], etc., (e.g., Harris 1994: 119; McCarthy 1991; Clements 1993; Angoujard 1995). Accordingly, A is likely to contribute to the secondary articulation at hand (for other primes involved, see Harris 1994: 135). (5)
A contour segment: aspirated stops χ C
A
a Since A is a place definer, it is natural to assume its implication in related processes. That is, A is to be regarded as the melodic prime responsible for the changes in the place of articulation observed in Grimm's Law. The following correspondences obtain for input stops and output fricatives as far as their place is concerned: (6)
Implication of A in place definition a. bilabial + A= labio-dental b. dental + A = interdental c. velar + A = uvular
In other words, the ^-element which is present in the secondary articulation of aspirated stops enters the mother branch of the contour segment and thereby provokes a change in place, the structure collapsing into a non-contour segment.9 Note that as a by-product of this mechanism, a prediction is made to the effect that A does not contribute to the articulation of either bilabial nor dental nor velar stops: if it did, the incorporation of A coming from aspiration would cause no change. This view explains the fact that 1. only aspirated stops undergo spirantisation in Grimm's Law, 2. a change in place occurs, 3. the process is context-free, and 4. in spirantisations of the Spanish/Hebrew kind that involve no aspirated segments, the place remains stable. However, it does not provide any clue to the question why spirantisation occurs at all in Grimm's Law.
292
Tobias Scheer
As mentioned earlier, I aim to show that spirantisation in this case is triggered for the same reason for which certain places lack simplex stops. The correlation of both questions is immediately obvious when exploring the putative result of the process described in (6) if there were no spirantisation. At least for labio-dentals and interdentale, an occlusive output is simply not available because there are no such simplex stops. The next section addresses the question of non-existent stops for certain places of articulation in some more detail. 2
Absence of stops for certain places and the role of affricates
The table given below shows the distribution of obstruents for the different places of articulation.10 (7)
Places of articulation lacking simplex stops Stops Fricatives p, b Bilabials Palatals Φ,β Labio-dentals Velars f, ν Interdentale Θ, ö Uvulars Alveolare t,d s, ζ Pharyngeals Glottals Postalveolars f»3 Alveolo9, ? palatals — —
—
Stops Fricatives c ,i 5,i Κ g Y q, g Χ, V h,? —
?
h, fi
—
The distribution of the gaps in (7) is not likely to be random and thus calls for interpretation. For this purpose, let us see what happens when languages request an occlusive articulation for places that lack simplex stops. An eloquent example is Kabyle Berber, where imperfective forms, as opposed to perfectives, have the characteristic of geminating the second root consonant. As Berber does not tolerate fricative geminates, the stop-correspondent of fricatives may be observed in imperfective forms (data from Elmedlaoui 1993: 124, 133).
On spirantisation and affricates 293
(8)
Kabyle Berber imperfective germination Root Perfective Imperfective Alternation a. Vkjm [ikfim] [kittfim] S ^myc [im3ir] [ m i d a i r ] 3 -das Vnzr [inzir] [niddzir] ζ -das Vmz [imzi] [middzi] ζ ~d3z Vxs [Xittsi] s ~ tts [ixsi] s ~ tts Vfs [ifsi] [fittsi] [sibbiir] b. v^sßK [isßiir] ρ ~bb Θ~tt vm [iföil] [fittil] V/öm [ΐχδΐιη] [/iddim] δ ~dd
'enter' 'harvest' 'beat (sb.)' 'polish' 'go off (fire)' 'unknot' 'paint' 'roll couscous' 'work'
As can be seen, affricates appear when an occlusive articulation is required for a place that does not possess simplex stops. And interestingly, the fricative present in the root reappears in imperfectives as the second part of the affricate. In Berber, it is thus possible to recover the initial fricative when it appears in its affricate stop-version by simply subtracting the first part of the affricate. Let us assume for a moment that this holds true beyond Berber. If so, the most common affricate candidates, i.e., [p?], [ts, dz], [t