OUP CORRECTED AUTOPAGE PROOFS – FINAL, 24/8/2020, SPi
The Complexities of Morphology
The Complexities of Morphology Edited by PETER ARKADIEV and FRANCESCO GARDANI
Great Clarendon Street, Oxford, OX2 6DP, United Kingdom Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries © editorial matter and organization Peter Arkadiev and Francesco Gardani 2020 © the chapters their several authors 2020 The moral rights of the authors have been asserted First Edition published in 2020 Impression: 1 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above You must not circulate this work in any other form and you must impose this same condition on any acquirer Published in the United States of America by Oxford University Press 198 Madison Avenue, New York, NY 10016, United States of America British Library Cataloguing in Publication Data Data available Library of Congress Control Number: 2020932944 ISBN 978–0–19–886128–7 Printed and bound in Great Britain by Clays Ltd, Elcograf S.p.A. Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.
Contents
List of Figures and Tables
List of Abbreviations
The Contributors
1. Introduction: Complexities in morphology
   Peter Arkadiev and Francesco Gardani
I. THE LANGUAGE-SPECIFIC PERSPECTIVE
2. Irregularity, paradigmatic layers, and the complexity of inflection class systems: A study of Russian nouns
   Jeff Parker and Andrea D. Sims
3. Demorphologization and deepening complexity in Murrinhpatha
   John Mansfield and Rachel Nordlinger
4. Overabundance resulting from language contact: Complex cell-mates in Gurindji Kriol
   Felicity Meakins and Sasha Wilmoth
5. Derivation and the morphological complexity of three French-based creoles
   Fabiola Henri, Gregory Stump, and Delphine Tribout
6. Simplification and complexification in Wolof noun morphology and morphosyntax
   Michele Loporcaro
II. THE CROSSLINGUISTIC PERSPECTIVE
7. Canonical complexity
   Johanna Nichols
8. The complexity of grammatical gender and language ecology
   Francesca Di Garbo
9. Morphological complexity, autonomy, and areality in western Amazonia
   Adam J. R. Tallman and Patience Epps
III. THE ACQUISITIONAL PERSPECTIVE
10. Radical analyticity as a diagnostic of adult acquisition
   John H. McWhorter
11. Different trajectories of morphological overspecification and irregularity under imperfect language learning
   Aleksandrs Berdicevskis and Arturs Semenuks
12. Where is morphological complexity?
   Marianne Mithun
IV. DISCUSSION
13. Morphological complexity and the minimum description length approach
   Östen Dahl
References
Language Index
Subject Index
List of Figures and Tables
Figures
2.1. Word types per inflection class across different granularities
2.2. Complexity measures across granularities of Russian nouns
2.3. Conditional entropy of real and a hundred Monte Carlo simulations of Russian nouns across granularities
2.4. Effect of the irregularity of each layer on system complexity (entropy difference)
3.1. Ackerman & Malouf (2015) mechanism for predicting unknown inflectional forms
4.1. Traditional languages and Aboriginal communities of the Victoria River District
4.2. Fixed and random effects used to measure the use vs. non-use of subject marking in Gurindji Kriol
5.1. Degrees of complexity in the predictability of a base lexeme’s base stem in a particular derivational relation R
5.2. Degrees of complexity in the restrictedness of stem X in the morphology of lexeme L, where X serves as L’s base stem in a particular derivational relation
7.1. Mean CC 1 standard deviation for three areal breakdowns and selected families
7.2. Complexity x longitude
7.3. Complexity and altitude in Daghestan (eastern Caucasus) for the three complexity counts
8.1. The language sample
8.2. Patterns of change in the language sample
9.1. Western Amazonian languages sampled
9.2. Kernel distribution of densities across the languages of this study
11.1. The meaning space of the experimental languages with the corresponding sentences from an example generation 0 language
11.2. A schematic representation of the chains in the normal (a), temporarily interrupted (b), and permanently interrupted (c) conditions
11.3. Change of the overspecification of agreement, as measured by expressibility, over time
11.4. Relative frequency of the agreement marker which denoted the round animal in the initial language of the chain
11.5. Change of irregularity, as measured by Shannon entropy, over generations
11.6. Change of overspecification and irregularity in verbal agreement over generations in individual chains
11.7. Learnability as a function of irregularity
12.1. Mohawk verb template
Tables
1.1. Case paradigm of Turkish ev ‘house’ and Lithuanian miestas ‘city’
1.2. Sample paradigms of Lithuanian nouns
2.1. An example of morphosyntactically conditioned stress alternation in Russian nouns
2.2. Illustration of the four-class system, based on inflectional suffixes
2.3. Illustration of stress classes of Russian nouns
2.4. Number of nominal inflection classes of Russian nouns as a function of which paradigmatic layers are included
3.1. Warlpiri verb inflection classes
3.2. Examples of inflected classifier forms
3.3. Examples of classifier forms and their formative analyses
3.4. Inflectional exponence of na ‘(27)’
3.5. Variably inflected classifier stem forms
3.6. Allomorphs selected by Ackerman & Malouf (2015) simplification mechanism
3.7. Exponence probabilities of older and newer forms
3.8. Classifier stem paradigm for ma ‘(34)’
3.9. Classifier stem paradigm for ɾa ‘(28)’
4.1. Allomorphic reduction in subject marking in Gurindji Kriol
4.2. Comparison of case systems and allomorphy across three generations
4.3. Occurrence of subject marking in adult Gurindji Kriol speakers according to predictors
4.4. Output of generalized linear mixed model analysis on 3,575 tokens
4.5. Relative effect of the significant predictors according to dominance analysis
4.6. Occurrence of subject marking in child Gurindji Kriol speakers according to predictors
4.7. Output of generalized linear mixed model analysis on 2,975 tokens
4.8. Relative effect of the significant predictors according to dominance analysis
5.1. Patterns of syncretism in the French paradigm (Bonami et al. 2013)
5.2. Comparison of and .3 forms in French with long and short forms in Mauritian
5.3. Sample comparison of long and short forms in four French-based creoles
5.4. Stem space of ‘to form’, ‘to finish’, and ´ ‘to defend’
5.5. Verb alternations in Mauritian
5.6. Reduplication in Mauritian
5.7. Deverbal nominalizations in Mauritian
5.8. Verb alternations in Guadeloupean
5.9. Deverbal nominalizations in Guadeloupean
5.10. Verb alternations in Haitian
5.11. Deverbal nominalizations in Haitian
5.12. Complexity of derivational relations in French, Mauritian, Guadeloupean, and Haitian
7.1. Gender unpredictability for some example languages
7.2. Areal and family breakdown
7.3. Complexity values for four historical groups of languages
8.1. Third person pronouns in standard Swedish
8.2. Clustering of patterns of change at language-family edges within Eurasia
8.3. Direction of change and asymmetries in the structure of the population and/or prestige dynamics
9.1. Anderson’s (2015a) schematization of morphological complexity
9.2. Similar classifier forms in Guaporé-Mamoré languages (van der Voort 2005: 397)
9.3. Evidentiality and tense in Matses (Panoan; Fleck 2007: 593)
9.4. Number of morphemes coded in this study by language and functional domain
9.5. Number of allomorphs per morpheme attested across the sample
9.6. Percentage of morphemes for each EC value across the languages sampled
9.7. Rank correlations between EC level and bound status values across languages
9.8. Rank correlations between EC level and contiguity value across languages
9.9. Rank correlations between EC level and prosodic dependence across languages
10.1. Wolof noun class markers
11.1. An example of a final language with a fully preserved agreement system
11.2. An example of a language with a fully lost agreement system
11.3. A language with a fully lost agreement system
11.4. A language with an irregular distribution of the agreement markers
13.1. Hypothetical noun inflection templates
List of Abbreviations
1 2 3 A ACLA . . BGW 8 CAY CC
first person second person third person most agent-like or experiencer-like argument of transitive; A-class verb ablative abilitative absolutive accusative Aboriginal Child Language (project) grammatical agent animate anaphoric pronoun antipassive appositional mood applicative ‘article of noun’ aspect associative augmentative auxiliary Bininj Gun-Wok; Gunwingguan, northern Australia noun class 8 plural causative Central Alaskan Yup’ik canonical complexity cislocative classifier; class marker completive contrast comitative connector conditional continuative contrastive copula direct case marker declarative definite; default (in Mansfield and Nordlinger, Chapter 3)
. APL EC E-complexity ELAP Fr. G GLMM GYN IALL I-complexity IC IE
demonstrative desiderative determiner indexical marker different event diminutive direct experience evidential discourse marker discontinuitive dual duplicative dynamic E-class verb applicative enumerative complexity; exponence complexity enumerative complexity Endangered Languages Project ergative evidential eyewitness feminine factual focus French vowel frontness frustrative future more goal-like argument of ditransitive geminate genitive Generalized Linear Mixed Models Gbe languages, Yoruba, and Nupe habitual (aktionsart) high vowel height hearsay iterated artificial language learning Integrative complexity inflectional class; inventory complexity Indo-European intransitive inanimate verb immediate imperative imperfective inanimate inchoative
L MDL N NC NP 2 PCFP P.N. POS
indicative indefinite infinitive intentional intransitive (subject orientation) interactional irrealis joint agency lexeme long form linking particle linker locative low vowel height masculine Minimum Description Length middle middle marker neuter noun noun class negation non-feminine non-future nominative non-eyewitness evidential noun phrase non-past non-singular nonvisual object; object of monotransitive object oblique optative second position passive grammatical patient paucal Paradigm Cell Filling Problem perfective peripheral plural proper name potential parts of speech
Poss S SD SV T TAM
possessive possessor process verbalization present pronoun progressive proprietive prothetic vowel presentational partitive proximate past past irrealis realis recent reciprocal reduplication referential focus relative remote respect reflexive reportative ɾ-alternation subject; sole argument of intransitive same event standard deviation sequential short form singular simultaneous semelfactive same subject stative (aktionsart) strong form suppletive subject-verb more theme-like argument of ditransitive transitive animate verb tense/aspect/mood topic advancing voice temporal thematic suffix topic transitive
UG V VN VS Y/N
translocative Universal Grammar verb venitive locative verbalization verb-noun verb-subject weak form yes/no
The Contributors
Peter Arkadiev holds a PhD in theoretical, typological, and comparative linguistics from the Russian State University for the Humanities and a habilitation degree from the Russian Academy of Sciences. Currently he is Senior Researcher at the Institute of Slavic Studies of the Russian Academy of Sciences and Assistant Professor at the Russian State University for the Humanities. His fields of interest include language typology and areal linguistics, morphology, case and alignment systems, tense-aspect, Baltic and Northwest Caucasian languages. He has co-edited Contemporary Approaches to Baltic Linguistics (with Axel Holvoet and Björn Wiemer) and Borrowed Morphology (with Francesco Gardani and Nino Amiridze, both published by De Gruyter Mouton in 2015).
Aleksandrs Berdicevskis is a researcher in computational linguistics at the University of Gothenburg, Sweden. At the time of writing he was Assistant Professor at Uppsala University. He has worked on experimental and quantitative approaches to language change and evolution with a focus on Slavonic languages. He has also participated in the development of TOROT (Tromsø Old Russian and Old Church Slavonic Treebank) and related resources. In his PhD dissertation (University of Bergen) he investigated linguistic innovations in Russian computer-mediated communication.
Östen Dahl is Professor Emeritus of General Linguistics at Stockholm University, Sweden. He got his academic training at the universities of Gothenburg, Uppsala, and Leningrad (St. Petersburg) and was active at the University of Gothenburg for ten years before moving to Stockholm in 1980. In recent years, his research has mainly been typologically oriented with a strong interest in diachronic approaches to grammar. He has published the monographs Tense and Aspect Systems (1985), The Growth and Maintenance of Linguistic Complexity (2004), and Grammaticalization in the North: Noun phrase morphosyntax in Scandinavian vernaculars (2015).
Francesca Di Garbo is currently affiliated to the University of Helsinki as Postdoctoral Research Fellow and member of the GramAdapt team, an ERC-funded project (ID: 805371) investigating mechanisms of adaptation of language structures to social structures. Her research interests include diachronic and synchronic typology, nominal classification, number systems, evaluative morphology, linguistic complexity, sociolinguistic typology, and African languages.
Patience Epps is Professor of Linguistics at the University of Texas at Austin. Her research focuses on indigenous Amazonian languages, particularly the Naduhupan language family of the northwest Amazon. Her work engages with language description and documentation, linguistic typology, language contact and language change, and Amazonian prehistory. Major publications include the monograph A Grammar of Hup (De Gruyter Mouton, 2008).
Francesco Gardani is Professor of Romance Linguistics at the University of Zurich, Switzerland. His research cuts across the fields of Romance and theoretical linguistics and focuses on morphology, language contact, and linguistic typology. He is the author of Borrowing of Inflectional Morphemes in Language Contact (2008) and Dynamics of Morphological Productivity: The evolution of noun classes from Latin to Italian (2013) and the co-Editor-in-Chief of the Oxford Encyclopedia of Romance Linguistics.
Fabiola Henri is Assistant Professor at the University of Kentucky and an affiliate of the CNRS research centre, Laboratoire de Linguistique Formelle. Her recent research focuses on the structure and complexity of morphology in creole languages. Other strands of her research relate to creole genesis, morphology, and its interfaces, and creole syntax, among other topics. She is the co-editor of a recent monograph Negation and Negative Concord: The view from Creoles.
Michele Loporcaro is Full Professor of Romance Linguistics at the University of Zurich, a Fellow of Academia Europaea and the Austrian Academy of Sciences. His research focuses on the phonology, morphology, syntax, and lexicon of the Romance languages in synchrony and diachrony; dialectology; linguistic historiography. He is the author of over 200 articles and seven monographs, two of which with OUP: Vowel Length from Latin to Romance 2015; Gender from Latin to Romance 2018 (shortlisted for the Prose Awards of the Association of American Publishers). In 2012 he received the Feltrinelli prize of the Accademia dei Lincei.
John Mansfield is Lecturer in Linguistics at the University of Melbourne. His research explores the typology of morphological complexity, with a particular focus on processes of variation and change. Other strands of his research address aspects of morphological theory, prosodic phonology, and sociolinguistics, especially with respect to the Aboriginal languages of northern Australia.
John H. McWhorter is Associate Professor of English and Comparative Literature at Columbia University, New York City. He specializes in language change and language contact, in particular the development of creoles, pidgins, koines, ‘vehicular’ languages, and non-standard dialects. Professor McWhorter is author of more than a dozen books including Defining Creole (2005), Language Interrupted (2007), Linguistic Simplicity and Complexity (2011), The Language Hoax (2014), Talking Back, Talking Black (2017), and The Creole Debate (2018). A contributing editor at The New Republic and The Atlantic, he has also hosted Slate’s linguistics podcast Lexicon Valley.
Felicity Meakins is ARC Future Fellow in Linguistics at the University of Queensland and Chief Investigator in the ARC Centre of Excellence for the Dynamics of Language. She is a field linguist who specializes in the documentation of Australian Indigenous languages in the Victoria River District of the Northern Territory and the effect of English on Indigenous languages. She has worked as a community linguist as well as an academic over the past twenty years, facilitating language revitalization programmes, consulting on Native Title claims, and conducting research into Indigenous languages. She has compiled a number of dictionaries and grammars of traditional Indigenous languages, and has written numerous papers on language change in Australia.
Marianne Mithun is Professor of Linguistics at the University of California, Santa Barbara. Her interests range over morphology, syntax, discourse, prosody, and their interrelations; language contact and language change; typology; language documentation and revitalization; and the languages indigenous to North America and Austronesia.
Johanna Nichols is Professor Emeritus in the Department of Slavic Languages at the University of California, Berkeley. She works on Slavic languages, languages of the Caucasus, linguistic typology, and historical linguistics. She is AAAS Fellow and LSA Fellow, and presently holds visiting positions as Helsinki University Humanities Visiting Professor and Research Supervisor in the Linguistic Convergence Laboratory, Higher School of Economics, Moscow. She has done extensive fieldwork on the Ingush language of the central Caucasus.
Rachel Nordlinger is Professor of Linguistics at the University of Melbourne and Chief Investigator in the ARC Centre of Excellence for the Dynamics of Language. Her research centres around the description and documentation of Australia’s Indigenous languages and their implications for linguistic typology. She has also published on topics in syntactic and morphological theory, and in particular the challenges posed by the complex grammatical structures of Australian languages.
Jeff Parker is Assistant Professor of Linguistics at Brigham Young University. His research centres around better understanding inflectional structure from different methodological perspectives, including investigations into how language specific traits contribute to the complexity of inflection class systems, how inflectional structure affects lexical access of inflected forms, and how computational models of learning help explain typological tendencies in inflection class systems. He has published in journals such as Morphology, Word Structure, and The Mental Lexicon, as well as the Slavic-focused Slavic and East European Journal. He is also co-editor of a forthcoming volume, Morphological Typology and Linguistic Cognition (forthcoming, with Andrea D. Sims, Adam Ussishkin, and Samantha Wray).
Arturs Semenuks is a PhD student in the Department of Cognitive Science at the University of California, San Diego. He uses experimental and computational methods to investigate what sociocognitive pressures affect the structure of language, especially its morphological complexity, as well as what constraints exist on how language can be structured in principle, and how language affects human thought. His previous work at the University of Essex focused on the relationship between sentence processing costs and acceptability judgements.
Andrea D. Sims is Associate Professor at The Ohio State University, jointly appointed in the Department of Linguistics and Department of Slavic and East European Languages and Cultures. Much of her research focuses on the internal organization of inflection class systems (defectiveness and irregularity, syncretism, inflection class complexity) and factors influencing its emergence, reinforcement, and generalization. She is author of a research monograph, Inflectional Defectiveness (2015), co-author of a morphology textbook, Understanding Morphology (2nd edn, 2010, with Martin Haspelmath), and co-editor of Morphological Typology and Linguistic Cognition (forthcoming, with Adam Ussishkin, Jeff Parker, and Samantha Wray).
Gregory Stump is Professor Emeritus of linguistics at the University of Kentucky. His research includes work on the structure of complex inflectional systems, the nature of inflectional complexity, and the algebra of morphotactics. His research monographs include Inflectional Morphology: A Theory of Paradigm Structure (Cambridge University Press, 2001), Morphological Typology: From Word to Paradigm (Cambridge University Press, 2013, co-authored with Raphael A. Finkel), and Inflectional Paradigms: Content and Form at the Syntax-Morphology Interface (Cambridge University Press, 2016). He is a coeditor of the journal Word Structure. He now resides in Olathe, Kansas.
Adam J. R. Tallman is Postdoctoral Researcher at Laboratoire Dynamique du Langage (Université de Lyon II). His research focuses on the documentation and description of the languages of the Amazon. His PhD thesis (University of Texas at Austin, 2018) was a grammar of Chácobo (Pano) based on extensive (ELDP and NSF funded) documentation. Currently he is undertaking the documentation of Araona (Takanan). Apart from his primary interest in documentation and description, Tallman focuses on morphophonology, constituency, and the application of quantitative methods to linguistic typology.
Delphine Tribout is Assistant Professor at the University of Lille, France, and member of the CNRS research centre, Savoirs, Textes, Langage. Her main research interests are derivational morphology, especially conversion, and lexical semantics.
Sasha Wilmoth is a PhD candidate at the Centre of Excellence for the Dynamics of Language at the University of Melbourne, Australia, working on intergenerational variation and change in Pitjantjatjara. She completed her BA (Hons) degree at the University of Melbourne. She was previously a Research Assistant at the University of Queensland, and Linguistic Project Manager at Appen, a Sydney-based company which provides specialized linguistic data and services for speech and language technologies. Her research interests include morphology, syntax, and digital methods for language documentation.
1 Introduction: Complexities in morphology
Peter Arkadiev and Francesco Gardani
Peter Arkadiev and Francesco Gardani, Introduction: Complexities in morphology. In: The Complexities of Morphology. Edited by Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Peter Arkadiev and Francesco Gardani. DOI: 10.1093/oso/9780198861287.003.0001
1.1 Setting the scene
Morphological and, more broadly, linguistic complexity has become a popular topic in linguistic typology and theorizing, as several recent publications testify, including McWhorter (2001, 2005, 2018); Kusters (2003); Dahl (2004); Hawkins (2004, 2014); Trudgill (2004a, 2011); Shosted (2006); Miestamo et al. (2008); Sampson et al. (2009); Dressler (2011); Kortmann & Szmrecsanyi (2012); Newmeyer & Preston (2014); Baerman et al. (2015b, 2017); Reintges (2015); Baechler & Seiler (2016); Mufwene et al. (2017); among many others. While this large body of work has significantly improved our understanding of morphological complexity, a number of key issues remain unsettled. They are of both theoretical and empirical nature and pertain to the domain of morphology and morphosyntax as well as to the ways language use and its socioecological conditions influence linguistic structure. Undoubtedly, the most pressing question is what morphological complexity actually is. There is no straightforward answer to this question, as we will see. The issue of how to define ‘morphological complexity’ is of central importance to us and will be treated in detail in the course of this Introduction and of the volume. To properly frame this central issue, however, we can anticipate that the notion of ‘complexity’ in morphological systems is often revealed and investigated through a set of relative measures that attempt to quantify the extent of morphology in a language, the predictability of the morphological system, and the pressures this places on processing and acquisition.
The goal of the present volume is to build upon previous work on morphological complexity and to provide a crosslinguistic view on the key problems of its investigation seen from the perspective of a variety of current approaches.
At the heart of all discussions of linguistic complexity, and especially of morphological complexity, lies the idea that complexity itself is a parameter of crosslinguistic variation. The history of this line of thought (see Joseph & Newmeyer 2012 for an excellent overview) shows some non-trivial swings of the pendulum, ranging from the pre-theoretical assumptions of the linguists and philosophers of the early nineteenth century about the ‘complex’ classic Indo-European languages as opposed to the ‘primitive’ languages of ‘uncivilized people’ to explicit statements that all languages are equally complex. The latter view, which is known under the label of ‘equicomplexity hypothesis’, takes into account obvious differences between languages in the mere degree of elaboration of different structural subdomains (such as, e.g., vowels vs. consonants or nominal vs. verbal morphology); it states that ‘these isolable properties may hang together in such a way that the total complexity of a language is approximately the same for all languages’ (Wells 1954: 104; see also Hockett 1958: 180). Such a position, which is still commonly held by linguists of different backgrounds and theoretical persuasions (see, again, Joseph & Newmeyer 2012: 348–9; and Miestamo 2017), has been challenged by others, who have shown that ‘complexity in one area of grammar [correlates] positively with complexity in another area’ (Sinnemäki 2014: 190).
With the development of contact linguistics and especially of pidgin and creole studies in the second half of the twentieth century, claims started being made that pidgins and creoles are structurally overall simpler than languages with a ‘regular’ sociolinguistic history (see, e.g., such work as Bickerton 1984; McWhorter 2001, 2005; Parkvall 2008; Bakker et al. 2011; Good 2012b, 2015), and, more generally, it has been claimed that linguistic complexity is subject to diachronic change and the effects of language contact (see Dahl 2004 and Trudgill 2011). As a matter of fact, statements to the effect that sociolinguistic parameters such as the number of speakers and degree of contact with other languages affect the complexity of linguistic (sub)systems date back as far as Jakobson (1929) and Trudgill (1983).
Once it had been recognized that morphological complexity is a parameter of crosslinguistic variation, the urge arose to develop non-impressionistic and crosslinguistically applicable ways of measuring and quantifying the degree of morphological complexity of individual languages. The most important proponent of this line of thought is certainly Greenberg (1954), who developed a methodology of quantitative measurement of different types of morphological structure, the most famous of which is the ‘synthetic index’ (p. 185), that is, morpheme-toword¹ ratio in a sample of texts, which arranges languages into a continuum spanning from radically isolating to polysynthetic. This simple metric, however, is clearly insufficient for the assessment of morphological complexity, since morphology is much more than mere arrangement of morphemes into words. As a simple illustration, consider the case-number paradigms of Turkish (Lewis 2001: 28) and Lithuanian (P.A.’s own knowledge) nouns in Table 1.1. Both Turkish and Lithuanian have two number and six case values, yielding twelve word forms. However, while in Turkish case and number are expressed
¹ ‘Word’ is intended as ‘word form’.
OUP CORRECTED AUTOPAGE PROOFS – FINAL, 23/8/2020, SPi
3
Table 1.1. Case paradigm of Turkish ev ‘house’ and Lithuanian miestas ‘city’
ev ev-i ev-in ev-e ev-de ev-den
ev-ler ev-ler-i ev-ler-in ev-ler-e ev-ler-de ev-ler-den
miest-as miest-ą miest-o miest-ui miest-e miest-u
miest-ai miest-us miest-ų miest-ams miest-uose miest-ais
separately by dedicated suffixes in a compositional way, Lithuanian has cumulative (fused) exponence of both features. Under Greenberg’s morpheme-per-word ratio, Turkish nominal word forms are more complex than Lithuanian ones just because Turkish may have three (and in fact much more) morphemes per nominal word form (e.g., ev-ler-de house--), while Lithuanian has only two (miest-uose city-.). However, if we consider the total number of different affixes occurring in the given paradigms, we find that Turkish with its six overt affixes is actually simpler than Lithuanian with its twelve affixes (see, e.g., Plank 1986 for an early attempt to assess the complexity of morphological systems in such terms). Things become even more complicated if we go beyond Table 1.1 and consider the existence of at least five arbitrary inflectional classes of nouns in Lithuanian intersected by four partly arbitrary accentual classes, also called ‘accentual paradigms’ (a.p.), in Table 1.2 (from Arkadiev et al. 2015: 16; ‘hard’ and ‘soft’ refers to subdeclensions with non-palatalized and palatalized stem-final consonant, respectively; for more details on Lithuanian declension classes, see Ambrazas et al. 2006: 107–33). This example suggests that along with morphological complexity on the syntagmatic axis (something that can be measured by the morpheme-to-word ratio) there exists morphological complexity on the paradigmatic axis, the two being logically and empirically independent of one another. Thus understood, morphological complexity becomes a composite notion and does not admit of such simple measurement as syntagmatic complexity (see more on this issue below), therefore an unbiased and non-reductionist crosslinguistic empirical investigation of morphological complexity itself becomes a fairly complex problem.² All in all, it seems to us that the most urgent still unsolved issues in morphological complexity can be captured in terms of the following questions:
² In this connection, Haspelmath (2009) has shown that parameters traditionally attributed to ‘flexion’, as opposed to ‘agglutination’, such as cumulation, stem allomorphy, and affix allomorphy, are logically and empirically independent of each other.
Class        I hard       I soft       II hard      II soft      III hard     IV (soft)
Gloss        ‘man’ (M)    ‘horse’ (M)  ‘day’ (F)    ‘bee’ (F)    ‘son’ (M)    ‘night’ (F)
Accent       I a.p.       III a.p.     IV a.p.      II a.p.      III a.p.     IV a.p.
NOM.SG       výras        arklỹs       dienà        bìtė         sūnùs        naktìs
GEN.SG       výro         árklio       dienõs       bìtės        sūnaũs       naktiẽs
DAT.SG       výrui        árkliui      diẽnai       bìtei        sū́nui        nãkčiai
ACC.SG       výrą         árklį        diẽną        bìtę         sū́nų         nãktį
INS.SG       výru         árkliu       dienà        bitè         sūnumì       naktimì
LOC.SG       výre         arklyjè      dienojè      bìtėje       sūnujè       naktyjè
VOC.SG       výre         arklỹ        diẽna        bìte         sūnaũ        naktiẽ
NOM.PL       výrai        arkliaĩ      diẽnos       bìtės        sū́nūs        nãktys
GEN.PL       výrų         arklių̃       dienų̃        bìčių        sūnų̃         naktų̃
DAT.PL       výrams       arkliáms     dienóms      bìtėms       sūnùms       naktìms
ACC.PL       výrus        árklius      dienàs       bitès        sū́nus        naktìs
INS.PL       výrais       arkliaĩs     dienomìs     bìtėmis      sūnumìs      naktimìs
LOC.PL       výruose      arkliuosè    dienosè      bìtėse       sūnuosè      naktysè
Table 1.2. Sample paradigms of Lithuanian nouns
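The contrast drawn above between the syntagmatic count (morphemes per word) and the paradigmatic count (size of the affix inventory) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the Turkish forms and the Lithuanian singular forms are filled in here from standard reference paradigms of ev ‘house’ and miestas ‘city’, hyphens are taken as the only morpheme boundaries, and the function names are hypothetical.

```python
# Segmented case-number paradigms; a hyphen marks a morpheme boundary.
# Assumption: the Turkish forms and the Lithuanian singular forms are supplied
# for illustration from standard paradigms; only the Lithuanian plural forms
# and ev-ler-de are quoted directly in the running text.
TURKISH = [
    "ev", "ev-i", "ev-in", "ev-e", "ev-de", "ev-den",
    "ev-ler", "ev-ler-i", "ev-ler-in", "ev-ler-e", "ev-ler-de", "ev-ler-den",
]
LITHUANIAN = [
    "miest-as", "miest-ą", "miest-o", "miest-ui", "miest-e", "miest-u",
    "miest-ai", "miest-us", "miest-ų", "miest-ams", "miest-uose", "miest-ais",
]


def morphemes_per_word(forms):
    """Syntagmatic measure: mean number of morphemes per word form."""
    return sum(len(form.split("-")) for form in forms) / len(forms)


def distinct_affixes(forms):
    """Paradigmatic measure: set of overt affixes (everything after the stem)."""
    return {affix for form in forms for affix in form.split("-")[1:]}


if __name__ == "__main__":
    for name, forms in (("Turkish", TURKISH), ("Lithuanian", LITHUANIAN)):
        ratio = morphemes_per_word(forms)
        inventory = distinct_affixes(forms)
        print(f"{name}: {ratio:.2f} morphemes per word, {len(inventory)} distinct affixes")
```

On these twelve-form paradigms the ratio is higher for Turkish than for Lithuanian, while the affix inventory is twice as large for Lithuanian (twelve against six), which is the dissociation between the two axes that the text describes.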
1. The hypothesis that morphology and syntax represent distinctly different, but interdependent types of grammatical organization has been challenged by scholars such as Haspelmath (2011), claiming that the divide between morphology and syntax is not clear-cut and hence irrelevant for typology. Given this, are there theoretical and methodological tools suitable to define morphological complexity and if yes, which ones?
2. If we, however, accept the hypothesis that the morphology vs. syntax divide is crosslinguistically and theoretically valid (see Arkadiev & Klamer 2019; Arkadiev 2020), a view which we espouse, can we arrive at a uniform notion of morphological complexity given the diversity of morphological phenomena?
3. In direct connection to the former question, can we arrive at a single and straightforward measure of complexity that applies to languages that display radically different morphological encoding strategies?
4. What is the role of sociolinguistic, psycholinguistic, and diachronic factors in affecting morphological complexity?
These problems constitute the main research questions of this volume, which aims to tackle them in a principled way, by presenting a collection of original research papers on different aspects of morphological complexity. This introductory chapter is meant to outline the field and take the reader through the volume, and it is organized as follows: section 1.2 pursues the question of the scope of ‘morphological complexity’; section 1.3 surveys several conceptions and methodological approaches to morphological complexity distinguishing between two main types: formal approaches (section 1.3.1) and psycholinguistic approaches (section 1.3.2). Section 1.4 presents the structure of the volume and summarizes the contributions to it.
1.2 What is complex?
In all discussions of morphological complexity, a question hangs in the air. Is morphology complex in its own right? This question is partly rhetorical, maybe trivial, but still central, as it concerns the theoretical demarcation of the object of investigation. The widespread expression ‘morphological complexity’ has at least two readings. It can refer to the overall contribution of morphology to complexity in grammar or it can mean complexity inside morphology. The first reading, viz. morphology as a source of complexity for the overall language system, would be justified by the fact that languages can do (almost) entirely without morphology and that ‘a language can persist for a long time with little or no morphology’ (Aronoff 2015: 282). In this vein, Carstairs-McCarthy (2010: ch. 2) and Anderson (2015a: 12–13) conceive of morphology as a
redundant architectural quirk added to the logically necessary systems of syntax and phonology, and Aronoff goes so far as to declare: ‘morphology is inherently unnatural. It’s a disease, a pathology of language’ (Aronoff 1998: 413). Such a view apparently entails that languages without morphology (e.g., Yoruba) are less complex than languages with at least a little morphology (e.g., Tok Pisin). This type of morphological complexity could then be paraphrased as ‘complexity induced by morphology’. The assumption that morphology per se is a complication resonates with the terminological use of ‘morphological complexity’ to denote the property of words having an internal morphological structure, being, so to say, morphologically complex, as we find in some authors concerned with word recognition (e.g., Fiorentino & Poeppel 2007; Bozic & Marslen-Wilson 2010), sign linguistics (Zwitserlood 2003), and, more rarely, word formation (Hay 2003). Clearly, in this usage, complexity means the presence of internal structure, and claiming that a formally complex (i.e., composite) word is in itself complex, as opposed to a simplex word, amounts to saying that morphology as such is complexity. That would imply that morphology makes the language system more complex, an observation that is relative to other components of a language’s grammar. Adopting the concept of ‘effective complexity’ by Gell-Mann (1995), Moscoso del Prado Martín (2011) carries out a corpus-based measurement of the inflectional complexity of six European languages and claims that there is a ‘strong degree of mutual dependence between morphological and syntactic information.’ As he shows, when information on word order is explicitly factored in, the apparent gradation in complexity across languages, as calculated on the basis of the number of inflected forms per word, disappears. 
He arrives at the conclusion that ‘inflectional morphology serves a role in reduction of uncertainty, simplifying the description of the whole grammar’ (p. 3528). Whether or not this be the case, this question—although of great importance also for cognitive approaches to complexity—is not within the scope of the present book. Rather, we are concerned with the second reading of morphological complexity, that is, complexity inside morphology. Taking an inner-morphological perspective, we focus on which morphological phenomena can be considered complex or more complex than others and look at different degrees of complexity within morphology. Some authors have swiftly found an answer to this question, by identifying the core of morphological complexity in phenomena currently running under the heading of autonomous (or ‘pure’) morphology—including morphological entities and processes that are not extramorphologically motivated in a straightforward way, such as, for example, inflectional classes, allomorphy, patterns of syncretism, suppletion, etc. (Aronoff 1994; Maiden et al. 2011; Cruschina et al. 2013). For example, Baerman et al. (2015b: 4) consider morphological complexity as ‘the additional structure that cannot readily be reduced to syntax or phonology’. This extra layer of purely morphological structure, such as inflection classes in the Lithuanian example in
section 1.1, may attain an astonishing degree of gratuitous complexity, whereas the mere presence of (possibly elaborate) transparent and regular affixal expression of grammatical meaning, as exemplified by Turkish, is of lesser relevance for the study of morphological complexity (see also a discussion of different aspects of complexity in the polysynthetic languages, traditionally assumed to be the hallmark of morphological complexity, by Dahl 2017 and Sadock 2017). Of course, the decision to only focus on autonomous morphology has a great methodological advantage, as it provides a clear answer to the question we formulated in section 1.1, concerning the problematic demarcation of morphology and syntax. However, while we acknowledge that phenomena of pure morphology (‘morphology by itself’) do increase the complexity of morphology as a whole because they have no external motivation, morphology by itself, as it has been theorized, only includes inflection. This would imply that only inflection counts as the locus of complexity and it is a matter of fact that most of the literature published on this topic is exclusively devoted to inflection (see Baerman et al. 2015a, 2017; Baechler 2017). Definitions of morphological complexity (in quantitative terms) such as the number of morphosyntactic features that a language has and the morphological means that are used to realize these features (see below) conform to this view, for morphosyntactic features are typically realized by inflection. As a matter of fact, work on the complexity of word formation processes is virtually missing in the literature, the only two exceptions known to us being a one-paragraph section each in Nichols et al. (2006: 101–3) and Stump (2017: 70). Therefore, there is no study investigating whether inflection and word formation differ in their degree of complexity along one or another parameter. 
As Franz Rainer (personal communication, 2017) observes, ‘a great number of asymmetries emerge between word formation and inflection with respect to different dimensions of complexity’, such as the number of elements in the system, number of affixes in a word, or the complexity of allomorphy, among others. However, he notices, ‘in the literature on the inflection-derivation divide (cf. Štekauer 2015), complexity has not been identified up to now as a possible dimension along which these two subcomponents of morphology might differ’. Lack of work on this specific topic might be due to multiple reasons: first, the boundaries between inflection and word formation are often fuzzy; second, word formation, with lexical enrichment as its central function and all its corollaries (e.g., importance of encyclopedia, semantic drift), is less neat and less automatic than inflection and more difficult to grasp (see Kusters 2003: 14–16); third—and crucially—the generally adopted metrics of morphological complexity (see section 1.3) mostly focus on formal criteria, thus lumping together categories of inflection and those of word formation under the general heading of morphological complexity. As we will see in more detail below, research in particular by Dahl (2004, 2009) and Trudgill (2009, 2011) has identified three major ingredients of synchronic
morphological complexity, which seem to apply to both inflection and word formation: (a) irregularity (e.g., allomorphy); (b) morphosemantic and morphotactic opacity (such as fusion of formatives, cumulative or portmanteau formatives, suppletion, and non-linear suprasegmental feature realizations); and (c) syntagmatic redundancy (e.g., pleonastic affixation, see Gardani 2015).
1.3 How many complexities?
As we have seen in section 1.1, the linguistic literature on complexity is abundant, not least because ‘[h]ow to measure morphological complexity is itself an issue of some complexity’ (Nichols 1992: 64). As Miestamo (2017: 229) has aptly noted, complexity refers either to ‘something that is rich in internal composition (i.e. contains many parts as well as multiple and intricate connections between them), or to something that is difficult to do or to understand.’ In the first case, complexity is an objective property of a linguistic system and is therefore labeled ‘objective complexity’ (Dahl 2004: 2), ‘absolute complexity’ (Miestamo 2008), or ‘formal complexity’ (Stump 2017); in the second case, complexity is conceived as the cost or difficulty that a given linguistic system or structure causes to language users and is labeled ‘relative complexity’ (Miestamo 2008, 2017) or ‘psycholinguistic complexity’ (Stump 2017). In the following, we will adopt Stump’s terminology.
1.3.1 Formal morphological complexity
Formal complexity can be subsumed under the following general definition of complexity provided by the philosopher Nicholas Rescher: ‘Complexity is first and foremost a matter of the number and variety of an item’s constituent elements and of the elaborateness of their interrelational structure, be it organizational or operational’ (Rescher 1998: 1). In linguistics, we identify three principal directions in research on formal complexity, in terms of how it is conceptualized and measured: (1) quantitative approaches; (2) qualitative approaches; and (3) information-theoretic approaches. Quantitative approaches conceive complexity in terms of the number of elements of which a given morphological entity consists, mainly inventory size and string length, or alternatively, the length of the rules necessary to describe a form. This quantitatively construed type of complexity, dubbed ‘enumerative complexity’ by Ackerman & Malouf (2013), is detectable both syntagmatically and paradigmatically. On the syntagmatic axis, it can be the aforementioned average number of morphemes per word form (Greenberg 1954, 1960) or the maximal number of inflectionally expressed categories per verb (Bickel & Nichols 2005); this type corresponds to Rescher’s constitutional complexity, viz. the ‘[n]umber of
constituent elements or components’ (Rescher 1998: 9). On the paradigmatic axis, enumerative complexity relates to the number of distinct inflectional classes for a given part-of-speech (i.e., allomorphy) or the number of cells in a paradigm corresponding to the realizations of different values of a given morphological feature (e.g., case); this type of complexity corresponds to Rescher’s taxonomical complexity, the ‘[v]ariety of constituent elements, i.e., number of different kinds of components in their physical configuration’ (Rescher 1998: 9). Up to fairly recent times, only enumerative complexity had featured prominently in the literature, especially in typologically oriented research; for example, it is only this kind of complexity that is represented in WALS (Haspelmath et al. 2005; Dryer & Haspelmath 2013), certainly due to practical reasons. In this respect, it is worth mentioning several works specifically addressing the issue of enumerative paradigmatic complexity, such as Rhodes (1987) on the different morphological makeup of large and small paradigms and a whole series of works by Carstairs-McCarthy, whose aim was to find constraints on enumerative complexity of inflectional classes in terms of the number of affixal allomorphs and their properties (see Carstairs 1983; Carstairs-McCarthy 1994, 1998, 2010). Another type of quantitative measure concerns not the number of the elements composing a morphologically complex form but rather the (minimum) size (or length) of the rules required to describe and generate such a form. This type of quantitative approach, often referred to as Kolmogorov complexity, resonates with Rescher’s concepts of both descriptive complexity (the ‘[l]ength of the account that must be given to provide an adequate description of the system at issue’) and generative complexity (the ‘[l]ength of the set of instructions that must be given to provide a recipe for producing the system at issue’, Rescher 1998: 9) (cf. 
Dahl’s ‘minimum description length’, Chapter 13, this volume). Qualitative approaches conceive complexity in terms of identifying those morphological patterns/elements that are complex or more complex than others. Proponents of qualitative approaches need to stipulate an unmarked, complexity-neutral ideal—a canon, often conceived as an isomorphic relation of content to form—upon which to construe hierarchies of complexity in terms of degrees of deviation from it. Most notably, work by Corbett (e.g., 2007, 2015) has propagated the notion of non-canonicity (both in inflection and derivation), which can be defined as any deviation from properties such as transparency, regularity, and form-function biuniqueness, as is manifested, for example, in non-phonological allomorphy of affixes and stems (Baerman et al. 2017: 100–7), overabundance (Thornton 2019), multiple (extended) exponence (Harris 2017), syncretism (Baerman et al. 2005), defectiveness (Baerman et al. 2010), and polyfunctionality (Stump 2016: 228–51), let alone more dramatic deviations such as suppletion (Stump 2006a; Corbett 2007) or deponency (Baerman et al. 2007). Early discussions of non-canonicity and its possible interactions with enumerative complexity can be found in Plank (1986) and Carstairs (1987) in addition to
works already mentioned, while recently, Johanna Nichols (2009) has hinted at a possible metric of morphological complexity related to non-canonicity (a proposal she fully develops in Chapter 7, this volume). Most studies of non-canonical phenomena in morphology have focused on the paradigmatic axis; however, nothing per se precludes the application of this notion to syntagmatic phenomena, such as combinatorics and mutual order of affixes (the distinction between semantically driven layered organization of morphology and opaque templatic morphology comes to mind here; see Stump 2006b, Good 2016), concatenative vs. non-concatenative exponence, morphophonological transparency vs. opacity, and other issues belonging to the domain of morphotactics. It remains an empirical as well as a conceptual question, though, which kind of morphotactic organization should be considered ‘canonical’ and ‘less complex’. For instance, in languages where affix order directly reflects semantics, it is usually possible to permute certain affixes depending on their mutual scope (Rice 2011; Mithun 2016); whether such deviations from fixed ordering constitute additional complexity is not at all obvious. While teleologically different, Natural Morphology (Dressler et al. 1987; Dressler & Kilani-Schoch 2016; Dressler 2019) is also centered on the idea of deviation from a core.³ Aiming at accounting for morphological preferences based on extralinguistic motivations, it theorizes a semiotically derived notion of naturalness, defined as the immediate, most unmarked, cognitively easiest, and thus universally preferred option. Conversely, naturalness-defining criteria determine deviation from the (most) natural option. This framework makes clear that other factors come to play a role in the conception and interpretation of morphological complexity, such as, for example, transparency vs. opacity of forms or morphotactic rules. 
As Hengeveld & Leufkens (2018: 141) observe, ‘languages may be complex, yet transparent, or simple, yet opaque’. To take a concrete case, the Turkish vs. Lithuanian data in Table 1.1 show that Turkish morphology is more complex in the sense that a single word form may potentially contain a high number of morphemes. At the same time, however, it is transparent in that every morpheme corresponds to one fixed meaning, while Lithuanian morphology is more opaque. In the framework of Natural Morphology, Dressler (2011) views unnaturalness as a source of complexity and morphological complexity as the sum of all morphological categories, rules, and inflectional classes of a language, including both productive and unproductive patterns. Distinguishing between productive and unproductive patterns, he considers morphological complexity a hyperonym of morphological richness, which is conceived only in terms of productive patterns (Dressler 2003: 47; see also Dressler, Kononenko, et al.
³ Note that, while qualitatively oriented, both Natural Morphology and Canonical Typology are implicitly able to quantify degrees of complexity, computing the degree of deviation from the natural core or canon, respectively.
2019). This distinction between active and static parts of morphology is, in our view, not only of crucial importance with respect to psycholinguistic approaches to complexity but also foundational to approaches focused on predictability, as we will see below. Finally, information-theoretic approaches play down the role of combinatorics and construe morphological complexity in terms of predictability and entropy. Their development is intimately related to word-and-paradigm models of morphology, which consider inflectional systems as networks of implicative relations holding between fully inflected word forms. Consequently, they aim to understand to what extent the choice of exponence for a given cell is predictable from any other information available to the speaker, with complexity being in an obvious inverse relation to predictability (cf. Finkel & Stump 2007, 2009; Stump & Finkel 2013). Ackerman & Malouf (2013) propose the term ‘integrative complexity’, based on the notion of entropy as ‘a measure of the reliability of guessing unknown forms on the basis of known ones’, that is, a measure of predictability. They start from the intuition that ‘speakers must generalize beyond their direct and limited experience of particular words’ (p. 436) and posit a ‘Low Entropy Conjecture’: morphological systems, such as paradigms, in which conditional entropy among related word forms is low, are more efficient, as they ‘permit these crucial inferences to be made easily’ (p. 436) (cf. ‘Paradigm Structure Conditions’ of Wurzel 1989).⁴ In other words, complexity derives from opaque intraparadigmatic relations, for opacity hampers the predictability and predictiveness among word forms in a lexeme’s paradigm. 
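The notion of conditional entropy among paradigm cells can be made concrete with a small Python sketch over the six Lithuanian classes of Table 1.2. It is a toy illustration, not Ackerman & Malouf’s actual procedure: all classes are assumed to be equiprobable, accent marks are ignored, and the segmentation of the exponents (in particular, treating stem-final palatalization as part of the stem, so that all genitive plurals end in -ų) is an assumption made here. The names `EXPONENTS` and `conditional_entropy` are hypothetical.

```python
from collections import Counter
from math import log2

# Exponents of three cells for the six classes of Table 1.2.
# Assumptions: accents stripped; palatalization counted as part of the stem.
EXPONENTS = {
    "I hard":    {"NOM.SG": "as", "DAT.PL": "ams", "GEN.PL": "ų"},
    "I soft":    {"NOM.SG": "ys", "DAT.PL": "ams", "GEN.PL": "ų"},
    "II hard":   {"NOM.SG": "a",  "DAT.PL": "oms", "GEN.PL": "ų"},
    "II soft":   {"NOM.SG": "ė",  "DAT.PL": "ėms", "GEN.PL": "ų"},
    "III hard":  {"NOM.SG": "us", "DAT.PL": "ums", "GEN.PL": "ų"},
    "IV (soft)": {"NOM.SG": "is", "DAT.PL": "ims", "GEN.PL": "ų"},
}


def conditional_entropy(classes, given, target):
    """H(target | given) in bits, with every inflection class weighted equally."""
    n = len(classes)
    joint = Counter((cells[given], cells[target]) for cells in classes.values())
    marginal = Counter(cells[given] for cells in classes.values())
    entropy = 0.0
    for (given_exponent, _), count in joint.items():
        p_joint = count / n                        # P(given, target)
        p_cond = count / marginal[given_exponent]  # P(target | given)
        entropy -= p_joint * log2(p_cond)
    return entropy


if __name__ == "__main__":
    for given, target in [("NOM.SG", "DAT.PL"), ("DAT.PL", "NOM.SG"), ("GEN.PL", "NOM.SG")]:
        h = conditional_entropy(EXPONENTS, given, target)
        print(f"H({target} | {given}) = {h:.3f} bits")
```

Under these assumptions the nominative singular fully predicts the dative plural (zero entropy), the dative plural leaves a residual uncertainty of one third of a bit about the nominative singular because -ams is shared by two classes, and the genitive plural predicts nothing (log2 6 bits). Low values of this kind across cell pairs are what the Low Entropy Conjecture expects of learnable paradigms.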
The ‘Low Entropy Conjecture’ is supported by recent studies on inflection class systems clearly violating the enumerative complexity-based constraints of the kind proposed by Carstairs-McCarthy (see Baerman 2012, 2016; Sims 2015).⁵ The approaches to formal morphological complexity surveyed thus far share the potential to gauge the degree of complexity. However, some typological studies have pursued the topic without a focus on metrics. One line of investigation, for example, has concerned the relation of (certain aspects of) morphological complexity to other typological parameters such as phonological systems (Shosted 2006; Fenk-Oczlon & Fenk 2008, 2014) and word order (e.g., Sinnemäki 2008; Bentz & Christiansen 2013), among others. Other studies have focused on the differential elaboration of nominal and verbal morphology (e.g., Nichols 1986, 1992; Mithun 1988; Kibrik 2012). In this domain, there are still more open questions than established answers, partly because of the lack of consensus as regards the
⁴ Morphomic stem distributions have also been interpreted in terms of predictive relations by Blevins (2016b: 123), a view partly criticized by Maiden (2018: 23–4).
⁵ It is likely that a conception of complexity based on entropy applies better to inflection than word formation because inter-word relations are generally much more complex in inflectional than in derivational paradigms.
definition of the relevant aspects of complexity and the adequate ways of its measurement. Still another line of research is concerned with the relation between morphological complexity and sociolinguistic typology. In section 1.1, we already mentioned the idea that pidgins and creoles are in general less complex than languages with a long history and uninterrupted transmission. More generally, in recent work (e.g., Trudgill 1997, 2009, 2011, 2017; Kusters 2003, 2008; McWhorter 2007, 2008; Lupyan & Dale 2010; Bentz & Winter 2013; Bentz et al. 2015; Bentz 2016), claims have been advanced that the overall degree of complexity as well as certain particular types of grammatical complexity correlate with such socioecological conditions of language use as high vs. low degree of contact, number of adult learners, size and geographic expansion of the speaker population, and some others (see also Tinits 2014 for a behavioural experiment with a miniature artificial language). Significantly, most of such studies have focused on simplification caused by language contact (see Dorian 1978; McWhorter 2001; among many others), emphasizing that morphological complexity requires long-term periods of socioecological stability to develop (Dahl 2004). Nevertheless, studies exist showing that certain types of language contact (e.g., those involving stable childhood multilingualism) can contribute to preserving complex patterns (Trudgill 2011; Mithun 2015) and even result in an increase rather than a loss of morphological complexity due to borrowing and contact-induced grammaticalization (see Vanhove 2001; Aikhenvald 2002, 2003a; de Groot 2008; Loporcaro 2018; Loporcaro et al. forthcoming). Processes of language genesis brought about by language contact likewise do not necessarily entail morphological simplification. In a study on the rapid birth of a new mixed language in Australia, Gurindji Kriol, from the admixture of Gurindji and Kriol, Meakins et al. 
(2019) demonstrate that there was no preferential adoption into Gurindji Kriol of less complex variants and that, in fact, complex Kriol variants were more likely to be adopted than simpler Gurindji equivalents. Given that Gurindji Kriol is the primary language of the younger generation in the Gurindji community, Meakins et al. interpret these results in light of the fact that the acquisition of morphology in morphologically complex languages is less challenging for children than for adults (cf. also Miestamo 2008). The issue of ease vs. difficulty of processing in language acquisition leads us on to the second main type of morphological complexity introduced in section 1.3, viz. psycholinguistic morphological complexity.
1.3.2 Psycholinguistic morphological complexity
As we have seen in the previous section, Natural Morphology and Ackerman & Malouf’s (2013) integrative complexity also appeal to ease in processing and
production as a key to the interpretation of what is complex in morphology. These models build a bridge to the second type of approach to morphological complexity, psycholinguistic morphological complexity, which focuses on the cost/difficulty that a given linguistic system or structure causes to language users, that is, computational effort. Psycholinguistic approaches to morphological complexity assume that the degree of ease vs. cost of a morphological pattern in processing and production correlates with its degree of complexity. This line of research draws evidence from three areas of study: adult processing, L1 and L2 acquisition, and the performance of artificial automatic learning. One line of investigation within this field has developed around the equation of complexity with low parsability (Stump 2017). In this respect, the debate on the balance between memory retrieval and online computation in language production is particularly relevant. In the context of the debate on lexical access and specifically of the so-called English past-tense debate (for references, cf. Ambridge & Lieven 2011: 169–87), Pinker & Prince (1988) argued for a ‘dual-route’ model that could account for both irregular forms (feel/felt), which are memorized as wholes in the mental lexicon, and an online rule of default responsible for morphemic concatenation (walk/walked) (see also Gardani et al. 2019: 24–7). At the same time, it was observed that regular forms with high frequency can also be stored in the mental lexicon (Alegre & Gordon 1999a: 56). However, the fact that both morphologically less complex (i.e., highly parsable) and morphologically complex (i.e., poorly parsable) word forms can be lexically stored leads to the conclusion that complexity qua parsability does not correlate with processing cost. No one has stressed the role of frequency in lexical access as vigorously as Joan Bybee (1985, 1995, 2007). 
Consequently, the conception of complexity focusing on system complexity, in which irregularity is viewed as an ingredient of complexity, is incompatible with the results of studies on processing complexity, which have shown that irregularity does not per se constitute an obstacle for the language user, as it can be defeated by frequency. Studies in language acquisition, too, do not necessarily support the hypothesis that psycholinguistic complexity and formal complexity coincide. For example, in a crosslinguistic study on the relationship between the morphological complexity of child-directed speech and the speed of morphological acquisition in children, Xanthos et al. (2011) found a strong positive correlation between inflectional complexity of the input and the speed of acquisition. This result seems to suggest that the more morphology in the input, the easier the morphology is to acquire. According to Kelly et al. (2014), formal complexity such as heavy synthesis in polysynthetic languages is not a challenge for L1 acquisition if the templatic sequence in which formatives are used is regular, and Allen (2017) also reports longitudinal studies showing that Inuit children acquire elaborate derivational and inflectional morphology early and with ease. (See also Stoll et al. 2017, on the acquisition of verb morphology in polysynthetic Chintang.) Other acquisitional
studies construe formal complexity not as constitutional complexity but as descriptive complexity. For example, in a crosslinguistic study on the emergence and early development of synthetic compounds, Dressler, Sommer-Lolei, et al. (2019) provide evidence that synthetic compounds (i.e., compounds in which the head is derived from a verb and the non-head is an argument of this verb) such as German Nussknacker ‘nutcracker’ are acquired later than comparable three-constituent compounds. They interpret this later acquisition as a sign of higher complexity: equating the degree of complexity with the number of rules involved, synthetic compounds, which are derived by both a rule of compounding and a rule of derivation, are more complex than words derived either only by compounding or only by derivation rules. Besides that, numerous studies, both typological and experimental (e.g., Wray & Grace 2007; Lindström 2008; Trudgill 2011; Bentz et al. 2015; Bentz & Berdicevskis 2016; Atkinson et al. 2018), show that morphological complexity, while being an obstacle to L2 acquisition in adults and hence subject to erosion, regularization, and loss in those situations of language contact that involve massive adult acquisition, does not, in fact, constitute a severe challenge for L1 acquisition in children. Moreover, Lupyan & Dale (2010) have hypothesized that infants, in fact, benefit from the increased redundancy brought about by morphological complexity in languages used in small groups. Psycholinguistic approaches to morphological complexity have attracted criticisms mainly of two sorts. One problem is that the perception of ease or, conversely, difficulty, might vary among language users, and therefore might not be an objective metric; the other problem is that ‘psycholinguistic background research on the processing cost and learning difficulty of a given grammatical phenomenon’ might not be enough (Miestamo 2017: 232). 
As a matter of fact, the correlation between ‘our intuitive notion of morphological complexity and actual evidence of the pace of acquisition of more or less complex inflectional systems in child language’ (Marzi et al. 2018) seems to be poor. In order to solve at least the objectivity issue, recent research in morphological complexity has expanded into the field of neurobiologically inspired computational models of processing and learning. In one such study, Marzi et al. (2018) have focused on the performance of recurrent self-organizing neural networks trained to learn languages, in order to understand how degrees of inflectional complexity affect word processing strategies. They found a significant systematic correlation between regularity and predictability of verb forms and interpret the evidence ‘as the result of a balancing act between two potentially competing communicative requirements’, viz. recognition (leading to a maximally contrastive system) and production (leading to maximally predictable forms).
1.4 About this volume
In section 1.1, we identified four issues we deem among the most urgent to solve in research on morphological complexity. In order to tackle these issues in a principled way, we convened a dedicated workshop ‘Morphological Complexity: Empirical and Cross-Linguistic Approaches’ at the 48th Societas Linguistica Europaea (SLE) meeting in Leiden in 2015. The present volume is a collection of original research papers consisting in equal measure of papers delivered at the workshop and of invited contributions. (Each chapter was subject to a threefold reviewing process consisting of an anonymous external review, a non-anonymous internal review performed by a fellow contributor, and comments by the editors.) The volume features: (a) various theoretical, methodological, and typological perspectives on morphological complexity (from ‘classic’ morphological description to experimental and information-theoretic approaches); (b) both detailed investigations of individual languages and wider crosslinguistic studies; (c) synchronic and diachronic analyses; (d) a broad coverage of topics including structural and sociolinguistic issues, such as the development of morphological complexity under different sociohistorical conditions (prominently, language contact); (e) empirical evidence drawn from languages from all continents and belonging to a number of typologically diverse language families. Unfortunately, the volume does not cover the complexity of word formation and the complexity of sign language morphology. We hope that future research will take care of these issues. The volume, introduced by the present chapter, consists of three parts organized according to the chapters’ main focus and scope, and is closed by a discussion in Chapter 13 by Östen Dahl on the volume’s contributions and on the minimum description length approach. Part I includes five chapters dealing with issues of morphological complexity from a language-specific perspective. 
Jeff Parker and Andrea Sims’s Chapter 2, ‘Irregularity, paradigmatic layers, and the complexity of inflection class systems: A study of Russian nouns’, follows Stump & Finkel’s (2013: 55) definition of complexity of an inflection class system as ‘the extent to which the system inhibits motivated inferences about a lexeme’s full paradigm of realized cells [ . . . ]’. Using data from Russian, the authors explore the implications of gradient (ir)regularity for measuring and comparing the complexity of inflection class systems. They find that some, but not all, less regular inflectional patterns significantly increase the complexity of the system, but that the increased complexity is mitigated by structural and distributional properties of the inflectional system. In Chapter 3, ‘Demorphologization and deepening complexity in Murrinhpatha’, John Mansfield and Rachel Nordlinger investigate diachronic changes in the complexity of verb inflection in Murrinhpatha, a polysynthetic non-Pama-Nyungan language of northern Australia, which displays a high level of
complexity in terms of unpredictable analogical relations in inflectional exponence. The authors demonstrate that recent changes in inflection allomorphy blur the boundaries of stem and affix, resulting in gradual demorphologization and increasingly unpredictable exponence. Felicity Meakins and Sasha Wilmoth’s Chapter 4, ‘Overabundance resulting from language contact: Complex cell-mates in Gurindji Kriol’ examines the development of overabundance (see above) in the subject-marking system of Gurindji Kriol, an Australian mixed language. By means of generalized linear mixed models, which probabilistically measure the use vs. non-use of a feature, the authors interpret the insurgence of overabundance as an instance of complexification, providing a counterexample to the commonly held view that contact always results in reduction of morphological complexity. In Chapter 5, ‘Derivation and the morphological complexity of three French-based creoles’, Fabiola Henri, Gregory Stump, and Delphine Tribout take a fresh look at a controversial assumption in creole research, namely the widespread claim of poverty of creole morphology (see references in section 1.1). Analysing deverbal nominalizations via conversion in Mauritian, Guadeloupean, and Haitian, and assessing the integrative complexity of the respective morphological systems’ derivational relations, the authors demonstrate that the complexity of the derivational relations in these creoles attains the same degree as those of the lexifier, French. Finally, in Chapter 6, ‘Simplification and complexification in Wolof noun morphology and morphosyntax’, Michele Loporcaro explores the diachronic dynamics of morphological complexity in the nominal morphology and morphosyntax of Wolof, an Atlantic language of Senegal. 
Loporcaro shows that, while changes such as the emergence of inflectional irregularities produced a local increase in complexity in noun and determiner morphology, overall the morphology of Wolof is less complex than that of closely related Atlantic languages. Loporcaro provides an explanation of the simplifying tendencies in sociolinguistic terms, referring to the correlation between simplification and prestige in the Wolof speech community. Here, speaking correctly is associated with low-caste in rural settings, while linguistic prestige is achieved through language mixing, extensive borrowing, and, crucially, the simplification, via paradigmatic leveling, of inherited alternations impacting on both the morphology and the morphosyntax of the language. Part II consists of three chapters approaching morphological complexity from a crosslinguistic perspective. Johanna Nichols’s Chapter 7, ‘Canonical complexity’ considers not size but non-transparency the locus of morphological complexity and adopts the notion of (non-)canonicity to define crosslinguistically comparable variables, capture non-transparency, and restrict the comparanda to a manageable sample. Francesca Di Garbo’s Chapter 8, ‘The complexity of grammatical gender and language ecology’ is a crosslinguistic investigation of the evolution of gender agreement patterns, which are viewed as an instance of morphological complexity, and its ties to sociohistorical factors. Analysing a sample of thirty-six languages in
a qualitative fashion, the author is able to establish association between multiple patterns of change, such as loss, reduction, emergence, and expansion of gender, on the one hand, and various sociohistorical situations, ranging from demographic structure (population size) to language policies and language attitudes, on the other. In Chapter 9, ‘Morphological complexity, autonomy, and areality in western Amazonia’, Adam Tallman and Pattie Epps investigate the relationship between morphological complexity and areality-building processes across Amazonia. The authors observe (a) morphological proliferation in four domains (nominal classification, tense, evidentiality, and valency-adjusting mechanisms) across unrelated western Amazonian languages; (b) high system complexity across these domains; and (c) a link between complexity and language contact. They conclude that factors often associated with morphological complexity are in fact not necessarily morphological, as a large percentage of bound morphemes in these languages display ambiguity between morphology and syntax. The three chapters in Part III address the problem of morphological complexity from an acquisitional perspective. In Chapter 10, ‘Radical analyticity as a diagnostic of adult acquisition’, John McWhorter proposes that languages can become radically analytic, that is, completely or near-completely void of inflectional morphology, only via incomplete acquisition. He draws evidence from West Africa and Southeast Asia and shows that the relevant languages score more like creoles than like older languages. In McWhorter’s view, second-language acquisition decisively reduces grammatical complexity (in terms of bound inflection) to a degree that ordinary language change cannot. 
The author suggests that radical analyticity can be treated as evidence that such second-language acquisition occurred in the history of the language, and thus, synchronic morphological complexity can serve as a clue to the past of a language, in the absence of historical documentation. Also Chapter 11, ‘Different trajectories of morphological overspecification and irregularity under imperfect language learning’ by Aleksandrs Berdicevskis and Arturs Semenuks deals with imperfect language learning, partly supporting McWhorter’s conclusion. By reference to the editors’ fourth question (see section 1.1), the authors investigate how morphological complexity is related to socioecological parameters. They run an iterated artificial language learning experiment, tracing the change of two facets of complexity: overspecification and irregularity. They find that the presence of imperfect learners in a transmission chain leads to a much stronger decrease in morphological overspecification. Overspecification, however, is not usually fully eliminated, and its partial decrease often leads to increased irregularity, thus making languages simpler in one respect, but more complex in another. Additionally, higher irregularity decreases learnability, and this effect is stronger for imperfect learners compared to normal learners. Thus, the relationships between these two facets of morphological complexity and language learnability have their own complexities. Finally, Marianne Mithun’s Chapter 12, ‘Where is morphological complexity?’ is firmly anchored in the debate on the
psycholinguistic reality of complexity. Examining the speech of native speakers of two North American languages influenced to varying degrees by contact with English, Mithun observes that even native speakers with limited proficiency produce morphological structures that are highly complex for the analyst, with large numbers of morphemes per word, fusion, and irregularity. She argues that the distinction between what linguists consider complex and what speakers find difficult (or easy) to acquire or preserve is not surprising if one takes the view that morphology in these languages is not processed and learned online, but rather in chunks. As we said, Östen Dahl closes the volume by critically reviewing the volume’s chapters and seeing how the concepts of morphological complexity applied therein relate to the ‘minimum description length approach’. Turning now to the four research questions (section 1.1) the contributors to this volume focused on, we observe that (question 1) it is possible to define morphological complexity, even though the demarcation between morphology and syntax is in many cases fuzzy (see Tallman & Epps, Chapter 9, this volume). At the same time, however, we observe that different authors provide and apply different definitions, also within this volume. Seemingly, the very existence of multiple definitions of morphological (and morphosyntactic) complexity is related not only to the collocation of a specific linguistic feature along the grammar continuum (from pure morphology to morphosyntax), but also to the diversity of phenomena and types of complexity. This observation leads us to answer question 2, namely whether it is possible to arrive at a uniform notion of morphological complexity. We concur with Dahl (Chapter 13, this volume) that a set of shared notions and standard works that everybody refers to has not yet been reached. 
Thus our answer to question 2 is no, and the motivation for it is that the linguistic facts are so multifarious and diverse that not one, but many different complexities can be detected (whence the plural in this chapter’s title). Then we asked (question 3) whether it is possible to arrive at a crosslinguistically applicable and theoretically founded measure of morphological complexity. Berdicevskis et al. (2018) have recently pointed to the absence of a gold standard. We, too, have observed that there exists neither a commonly accepted definition of morphological complexity nor a uniform measure thereof. Admittedly, the growing understanding of the multifaceted nature of morphological complexity is much in line with the multivariate nature of typological comparison. So, perhaps we asked the wrong question. Probably, the quest for a unique measure is an epistemological fallacy. Once we have acknowledged that there is not one morphological complexity, but many morphological complexities, we should identify a set of complementary specific measures to apply crosslinguistically. Then, the only reasonable typological approach to morphological complexity is to break it down into individual variables (if necessary, each with its quantitative measure) and then look for mutual correlations between such variables or for their connections with other parameters of crosslinguistic variation. Of course, cumulative
measures such as the one developed by Nichols (Chapter 7, this volume) are also possible, but they are not holistic, either, and in many cases are based on a significant reduction of empirical data. In conclusion (question 4), we wanted to investigate the role of such extramorphological factors as diachronic development and (in)stability, susceptibility to loss vs. spread in situations of language contact, and, generally, of sociolinguistic and socioecological parameters, in affecting morphological complexity. As several chapters in this volume have demonstrated, in spite of at times diverging results, the study of the correlation between morphological complexity and extralinguistic factors such as the role of language contact or speakers’ sociolinguistic attitudes is fruitful and promising. Of course, the answers we have provided here are perforce partial and far from definitive, as many more case studies and much more comparative evidence are necessary to get to a reliable picture of such complex phenomena as morphological complexities. We hope that future research will pursue these pathways.
Acknowledgements
The volume’s editors wish to thank the authors, the external reviewers, and our editors at OUP. The support of the Swiss National Science Foundation (SNF CRSII1_160739) is gratefully acknowledged. Besides that, we thank Aleksandrs Berdicevskis, Wolfgang Dressler, Michele Loporcaro, and Franz Rainer for their insightful comments on a preliminary version of this introductory chapter.
I
THE LANGUAGE-SPECIFIC PERSPECTIVE
2 Irregularity, paradigmatic layers, and the complexity of inflection class systems: A study of Russian nouns
Jeff Parker and Andrea D. Sims
2.1 Introduction
The extent to which morphological patterns are included in analyses of inflection class systems tends to be strongly influenced by what is considered to be a ‘regular’ or ‘irregular’ pattern in a language. The number of classes and their definitional properties reflect the assumptions and analytical choices of the investigator. Two such choices are particularly notable. First, patterns that are reflected in few lexemes or are unproductive tend to be labeled as ‘irregular’ and considered to be outside of the system. Second, where inflectional properties are correlated with both affixal and non-affixal exponence (e.g., stress, stem alternations), the affix tends to be treated descriptively and theoretically as the exponent of the properties, with non-affixal marking often treated as a kind of irregularity, or simply ignored. Some approaches explicitly choose to focus only on regular affixal patterns (e.g., Cameron-Faulkner & Carstairs-McCarthy (2000)). Others handle stem alternations as phonological readjustments, denying them status as exponents of morphosyntactic properties; see Halle (1994) for this idea as applied to Russian nouns. Even within the Word and Paradigm framework, which explicitly rejects the classical notion of the morpheme as a bundling of (affixal) form and meaning (see Stump 2001: ch. 1 for an overview of arguments), linguists sometimes ignore non-affixal dimensions in their analyses as a practical matter, showing how deeply ingrained the privileged status of affixal patterns is in linguistics. For example, in their study of inflection class system complexity, Ackerman and Malouf (2013: 434f) acknowledge that the description of Greek nominal inflection they adopt abstracts away from ‘many relevant complexities,’ including inflectional stress.¹ (So does their description of Russian nominal inflection.) 
¹ As another example, even PARSLI (PARadigm Shape and Lexicon Interface), which is designed to explicitly represent non-canonical inflectional properties like stem change, defectiveness, overabundance, etc., does not include non-segmental information like stress as a possible deviation from canonicity (Walther 2017).
Jeff Parker and Andrea D. Sims, Irregularity, paradigmatic layers, and the complexity of inflection class systems: A study of Russian nouns. In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Jeff Parker and Andrea D. Sims. DOI: 10.1093/oso/9780198861287.003.0002
In this chapter, we explore the role that irregularity and non-affixal exponence play in the complexity of inflection class systems.² Recent typological studies of inflection class complexity have focused on the implicative structuring of inflection classes and the extent to which this structure is informative about the exponence of inflected forms (Ackerman et al. 2009; Ackerman & Malouf 2013; Blevins et al. 2017; Bonami & Beniamine 2015; Sims 2015; Sims & Parker 2016; Stump & Finkel 2013). This is reflected in the way that Stump & Finkel define the complexity of an inflection class system as ‘the extent to which the system inhibits motivated inferences about a lexeme’s full paradigm of realized cells from subsets of its cells’ (Stump & Finkel 2013: 55; emphasis ours). Throughout this chapter we will assume a similar definition; see (1).
(1) Complexity of an inflection class system: the average extent to which the system inhibits motivated inferences about the realized form of a lexeme, given one or more other realized forms of the same lexeme.
We make this notion more precise and operationalize it as average conditional entropy in section 2.5 below. Implicative definitions of complexity as in (1) represent a step in the direction of crosslinguistic comparison based on the internal structuring of inflectional systems, rather than measures like the number of inflection classes or the size of paradigms.³ The former is what Ackerman & Malouf (2013) call ‘Integrative’ complexity; the latter they call ‘Enumerative’ complexity. Integrative complexity measures represent a productive development to the extent that they better reflect the ways in which inflectional systems pose challenges for speakers.⁴ While it is not clear to us that any particular notion of complexity within morphology will be adequate for the variety of questions that morphology poses, the implicative-based notion of complexity adopted here also has the potential to emerge as an
² Since inflection classes are an example of a purely morphological phenomenon, that is, not syntactically relevant, this type of complexity seems to avoid the problematic questions about the division between morphology and syntax (see discussion in Arkadiev & Gardani, Chapter 1, this volume).
³ For a distinct but somewhat related notion, see the discussion of ‘relative’ and ‘absolute’ measures of complexity in Miestamo (2008) inter alia. Miestamo’s discussion of relative approaches focuses on psycholinguistic and acquisition-oriented approaches/evidence. While our information-theoretic measures are not psycholinguistic in nature, they (and their use in previous work, for example, Ackerman et al. 2009) could be classified as relative in terms of their focus on the potential ‘cost and difficulty to language users’ (Miestamo 2008: 24). (See also discussion in Arkadiev & Gardani, Chapter 1, this volume; Dahl, Chapter 13, this volume.)
⁴ See section 2.5 for some justification of this claim, and for defining inflection class complexity in terms of the predictability of individual forms, rather than the lexeme’s class membership (i.e., its entire paradigm of forms).
important way to uncover crosslinguistic tendencies in the complexity of inflection class systems (see questions 2 and 3 in Arkadiev & Gardani, Chapter 1, this volume). At the same time, the fact that much previous work within such notions of complexity has been based on descriptions of inflectional systems that include only affixes, and sometimes only the most regular patterns, leaves it unclear whether claims about limits on inflection class complexity (e.g., the Low Conditional Entropy Conjecture, Ackerman & Malouf 2013) apply to all inflectional patterns in a language or only those that are most regular. More generally, it raises questions about how patterns that are typically excluded from consideration interact with other elements in the system, and the role they play in determining the complexity of inflection class systems. Brown & Hippisley (2012) are a notable exception to this tendency to focus just on affixal exponence. We follow them in using the term ‘paradigmatic layers’ (2012: 71) of exponence (or just ‘layers’ for short) for dimensions of inflectional form (e.g., stress, suffixes, stem alternations) that have their own, independent distributions but which jointly realize the inflectional information of a word. We use Russian nouns to investigate these issues. We consider how patterns that are often excluded from consideration affect the complexity of the system and how they are integrated into the implicative structure of the system. The core questions that we ask are: How do interactions between component parts of the Russian nominal inflection class system shape the complexity of that system as a whole? In particular, are less-regular and non-affixal layers of exponence disruptive to an inflectional system, disproportionately increasing its complexity? Or, alternatively, is their disruptive potential mitigated by the way elements in the system interact? 
Little work has compared implicative structuring within subcomponents of the lexicon—an issue that is potentially important for understanding the internal structuring of inflectional systems. By looking at the inflectional structure of Russian nouns in this way, we aim to promote a fuller understanding of how inflectional organization determines the complexity of inflection class systems. We do not assume that every language is alike, or that Russian is representative. But we use Russian as a way to explore and illustrate the issues involved.
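The implicative notion of complexity in (1) can be made concrete with a small computation. What follows is only a minimal sketch under stated assumptions, not the calculation reported later in this chapter: the three inflection classes, the cell labels (c1 to c3), the exponents, and the type frequencies are all invented for illustration, and the function names are hypothetical. It computes the conditional entropy H(target | given) for every ordered pair of distinct cells and averages the results.

```python
from collections import Counter
from itertools import permutations
from math import log2

# Invented toy system (an assumption, not Russian data):
# each entry is (exponent of each cell, number of lexemes in the class).
CLASSES = [
    ({"c1": "a", "c2": "x", "c3": "p"}, 4),
    ({"c1": "a", "c2": "y", "c3": "p"}, 2),
    ({"c1": "b", "c2": "y", "c3": "q"}, 2),
]


def entropy(counts):
    """Shannon entropy (in bits) of a frequency table."""
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


def conditional_entropy(classes, target, given):
    """H(target | given) = H(target, given) - H(given), weighted by type frequency."""
    joint, marginal = Counter(), Counter()
    for cells, n in classes:
        joint[(cells[given], cells[target])] += n
        marginal[cells[given]] += n
    return entropy(joint) - entropy(marginal)


def average_conditional_entropy(classes):
    """Mean of H(target | given) over all ordered pairs of distinct cells."""
    cell_names = sorted(classes[0][0])
    pairs = list(permutations(cell_names, 2))
    return sum(conditional_entropy(classes, t, g) for g, t in pairs) / len(pairs)


if __name__ == "__main__":
    for given, target in permutations(sorted(CLASSES[0][0]), 2):
        h = conditional_entropy(CLASSES, target, given)
        print(f"H({target} | {given}) = {h:.3f} bits")
    print(f"average = {average_conditional_entropy(CLASSES):.3f} bits")
```

In this toy system, c1 and c3 predict each other perfectly, whereas knowing c1 leaves residual uncertainty about c2; the average over all ordered pairs summarizes how far the system as a whole inhibits such inferences.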
2.2 Regularity, paradigmatic layers, and inflection classes
We focus on irregularity and non-affixal layers of exponence because the representation of a system can affect the assessment of its complexity. For example, Sagot & Walther (2011) compare four descriptions of French verbs. The descriptions range from a system with many classes and no lexically specified stem allomorphy (139 classes) to lexically specifying all stem allomorphy (one
inflection class), with two other descriptions that split the burden of explanation between the inflection class system and lexical specification. As they observe (p. 42), it makes little sense to evaluate an inflectional system based only on the morphological description and not what is lexically specified, since a morphological description can always be made simpler by positing more lexical specification. They thus evaluate the analyses in terms of description length, including both the morphological description and lexically specified information. Equating the degree of complexity of the system with the length of its description, they show that the complexity of the different analyses differs significantly; a description with twenty classes and up to twelve lexically specified suppletive stems for some lexemes results in the shortest length.⁵ The point here is that degree of complexity is a property of a particular description of French verbs.⁶ This makes it particularly important to examine and justify the description itself. Stump & Finkel (2015) make a similar point along a different dimension of description. They contrast two potential representations of the same set of English verbs, one based on acoustics alone (what they call ‘hearer-oriented’) and one based on structure known to a speaker that does not surface in the production of forms (‘speaker-oriented’). For example, the exponence of the past participle(s) of and are identical in a hearer-oriented representation, that is, /εnt/, but a speaker knows that they contain different structure, that is, /εn-t/ vs. / εnd-t/. Stump & Finkel show that the two representations exhibit differences in their complexity based on various information-theoretic and set-theoretic measures. (See also Bonami 2013 for similar issues with French verbs.) Mansfield & Nordlinger (Chapter 3, this volume) also draw attention to how systems are represented. 
Investigating Murrinhpatha (non-Pama-Nyungan, Northern Australia), they show that speakers have made analogical changes to the verbal system which, surprisingly, do not lead to greater predictability among allomorphs. They suggest that using existing measures of conditional entropy to calculate the complexity of the system would be misrepresentative because verbs in the language are a closed class with largely idiosyncratic exponence. The exponence for the verbs is made up of intersecting formatives that are partially
⁵ See also Goldsmith (2001, 2011) for arguments for description length-based evaluation metrics in morphological analysis.
⁶ In employing an evaluation metric based on description length, Sagot & Walther (2011) argue that descriptions of shorter length (i.e., of less complexity in their sense) are more adequate. However, it is not obvious to us that for a given inflectional system, the description with the shortest description length should be taken to be the most adequate one. This is a question of the evaluation metric. For instance, see Derwing (1990) for arguments against evaluation metrics based on economy of storage (incl. minimum description length) and for metrics based on economy of processing speed and Dahl (Chapter 13, this volume) for discussion on the relationship between Minimum Description Length and other notions/metrics of complexity. It is not a foregone conclusion that a description that is most cognitively realistic will be the description with the lowest estimated complexity in terms of either description length or the implicative notion outlined in (1) above. This is a question for investigation, but beyond the scope of the present work.
predictive of each other. If the exponents are represented as unanalysable wholes (as they sometimes are in the literature), the subregularities among intersecting formatives, which may help explain the analogical changes, are obscured. Finally, Cotterell et al. (2019) note that the information-theoretic measure used in, for example, Ackerman & Malouf (2013), is highly sensitive to the particular descriptive analysis that is made of an inflectional system. They propose an alternative measure of Integrative complexity in terms of joint entropy—a calculation based on the joint distribution over all cells, with complexity defined as the entropy of the distribution.⁷ However, even if joint entropy is less sensitive to the representation of the system, this does not eliminate the need to investigate how analytic assumptions about that representation affect calculations of the complexity of inflection class systems. These studies highlight how the description of a system can affect calculations of its complexity. Given that inclusion or exclusion of irregularity and non-affixal exponence can substantially change the description of an inflection class system, we should ask in what ways they affect the complexity of that system. It is beyond the scope of this chapter to argue for one particular representation of Russian nouns as being more adequate than another. But roughly similarly to the approach of Sagot & Walther (2011), we explore the effect of different descriptions of the Russian nominal inflection class system for estimates of its complexity.⁸
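The contrast between descriptions can be illustrated with a joint-entropy calculation of the kind just mentioned. The sketch below is a hedged illustration only: the lexicon is invented, the function names are hypothetical, and the entropy is a simple plug-in estimate over an enumerated toy distribution, which should not be taken as the estimation procedure of Cotterell et al. (2019). Each paradigm lists, for two cells, an affix and a stress position.

```python
from collections import Counter
from math import log2

# Invented toy lexicon (an assumption, not Russian data):
# (paradigm, number of lexemes); each cell is an (affix, stress) pair.
LEXICON = [
    ((("a", "stem"), ("x", "stem")), 4),
    ((("a", "stem"), ("x", "end")), 2),
    ((("b", "end"), ("y", "end")), 2),
]


def joint_entropy(lexicon):
    """Entropy (in bits) of the distribution over whole paradigms."""
    counts = Counter()
    for paradigm, n in lexicon:
        counts[paradigm] += n
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


def split_layers(lexicon):
    """Re-describe each (affix, stress) pair as two separate columns."""
    return [(tuple(v for cell in paradigm for v in cell), n)
            for paradigm, n in lexicon]


def affixes_only(lexicon):
    """Describe the system by its affixes alone, ignoring the stress layer."""
    return [(tuple(affix for affix, _stress in paradigm), n)
            for paradigm, n in lexicon]


if __name__ == "__main__":
    print(f"full description:   {joint_entropy(LEXICON):.3f} bits")
    print(f"layers as columns:  {joint_entropy(split_layers(LEXICON)):.3f} bits")
    print(f"affixes only:       {joint_entropy(affixes_only(LEXICON)):.3f} bits")
```

Here, re-describing each exponent as two separate layers leaves the joint entropy unchanged, because the same paradigms remain distinct, whereas omitting the stress layer merges two classes and lowers the estimate. In this toy case, then, what the description includes matters even for a measure that is indifferent to how the included information is arranged.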
2.2.1 Regularity and inflection classes
It has long been known that high type frequency inflection classes create analogical pressure on irregular patterns. When irregular patterns resist regularization, the most common argument for their persistence despite analogical pressure is that they are lexically stored, leaving them relatively impervious to regularization. The typically high token frequency of such lexemes also makes lexical specification psycholinguistically plausible. This and other evidence of lexical storage is sometimes taken as a basis for treating irregulars as falling outside of the grammatical system—in this case, the inflectional system.
⁷ Cotterell et al.’s work was presented at the Society for Computation in Linguistics just as we were completing final revisions to this chapter, so we did not have the opportunity to apply their joint entropy metric to our data, nor to explore whether it produces estimates of system complexity that are less dependent on the particular descriptive analysis that is made of an inflectional system. However, we see this as a promising avenue for investigation.
⁸ Unlike Sagot & Walther (2011), we do not offer a formal analysis of Russian nouns, and make no particular assumptions about what inflectional information is part of the grammatical system, and what is lexically specified. However, like them, we include both regular, productive forms, and also ones that analyses might treat as lexically specified. And of course, their paper and our chapter are similar in investigating how different analytic assumptions affect assessments of the complexity of the inflection class systems.
However, a categorical division into regular and irregular types has long been recognized as problematic. First, the scope of a form’s irregularity can range from having an exponent associated with a different class to having a fully suppletive form. The extent to which a lexeme is irregular can also range from a single cell to the majority of the paradigm. (See Corbett et al. 2001 for examples from Russian.) Aside from the most extreme cases of suppletion, irregular lexemes exhibit irregularity in only a subset of their paradigms’ cells. And even in suppletion, stem distributions are often shared with regular patterns (Aski 1995; Bonami & Boyé 2002; Hippisley et al. 2004; Boyé & Cabredo Hofherr 2006). Thus, even the most irregular lexemes frequently overlap with regular ones and tend to exhibit at least some degree of systematicity (Brown & Hippisley 2012). In fact, Brown & Hippisley argue that ‘there is no hard-and-fast contrast between rules and lexical specification. Rather, we must make a distinction between the rule on the one hand and how the lexeme accesses that rule’ (p. 80). In their theory, Network Morphology, rules are information held at nodes in an inheritance hierarchy. This information is inherited ultimately by individual lexemes, defining their patterns of inflectional exponence. However, lexemes may inherit information by default or by direct specification of the node from which the lexeme should inherit. This means that within their theory, regularity is defined in terms of how a lexeme accesses a rule, and a single rule may represent regularity in some lexemes and irregularity in others. Second, speakers draw on their knowledge of irregular patterns when generalizing to new lexemes (Bybee & Slobin 1982; Albright & Hayes 2002, 2003). Words that are traditionally categorized as irregular play a crucial role in predicting how speakers generalize morphological patterns to new words. 
Irregular inflectional patterns can be more reliable in certain contexts (e.g., phonological neighborhoods) than more regular patterns. Correspondingly, inflectional patterns that are highly irregular can be extended. The athematic 1SG marker -m in Common Slavic spread from just a handful of verbs to become the dominant 1SG marker in some West and South Slavic languages (Janda 1994). Thus, even highly irregular patterns can exhibit a degree of productivity.
Third, it is now generally accepted that both irregularly and regularly inflected words are stored in the mental lexicon and leave traces in memory (Alegre & Gordon 1999a; Baayen 2007 inter alia). Baayen et al. (2007), among many others, find a surface frequency effect for regularly inflected words in a lexical decision task even with low frequency lexemes. Starting with Taft (1979), such a frequency effect has been widely interpreted as reflecting direct lexical storage of the forms, rather than storage via component morphemes.⁹ Thus, showing that irregulars are subject to lexical storage is not a sufficient basis on which to argue that irregular items are not part of the system of inflectional patterns. Evidence of this sort blurs the binary classification of inflectional patterns into ‘regular’ and ‘irregular’ types and undermines any concomitant claim that there is a categorical distinction between patterns generated by the inflectional rule system (and thus appropriately described in terms of inflection classes) and those that are lexically-stored exceptions. Yet in the context of knowing that the description of an inflection class system makes a big difference for calculations of its complexity, analytic assumptions that place irregulars outside of the inflectional system are pernicious because they preclude even asking important questions about how irregulars interact with regulars and the consequences of this for the complexity of the system.
⁹ See Taft (2004) and Taft & Ardasinski (2006) for more recent, sceptical interpretations of surface and base frequency effects. Models with different primitive assumptions about representational structure also interpret surface frequency effects somewhat differently, for example connectionist models (Daugherty & Seidenberg 1994) and discriminative learning models (Baayen et al. 2011). However, the important thing in the present context is that none of these models posit that irregular and regular inflected forms are processed and stored in the mental lexicon in categorically different ways (an idea put forward most famously by Prasada & Pinker (1993) and advocated for from a neurolinguistic perspective by Ullman (2001, 2004), but now widely rejected).
2.2.2 Paradigmatic layers and inflection classes
Similar observations can be made about paradigmatic layers of inflection. Linguistics has a deep-rooted tradition of thinking of words as combinations of linearly (and perhaps hierarchically) ordered morphemes. As noted at the beginning of the chapter, there is a philosophical preference for concatenative patterns that manifests in a privileged status for affixes both descriptively and theoretically. Nonetheless, different layers of exponence can exhibit distinct structural organization. For example, a subset of Russian nouns exhibits fixed stress on the ending and has a stress retraction in the nominative plural, and also in accusative plural when syncretic with nominative (GVOZD’ ‘nail’ and GUBA ‘lip’ in Table 2.1). This is one of several morphosyntactically conditioned stress alternations in Russian nouns (see Zaliznjak 1967 for a description of stress patterns; Brown et al. 1996 offers an overview in English). The alternations define a set of structured stress classes that partly crosscut the suffix-based classes and form an inheritance hierarchy that is distinct from the one defined by inflectional suffixes (Brown et al. 1996). The point here is that the stress and suffix patterns both are informative about and conditioned by morphosyntactic values. For some classes, represented here by GUBA ‘lip’ and OKNO ‘window’, stress placement is the only thing that distinguishes nominative/accusative plural from genitive singular. In practice, however, virtually all analyses of Russian nominal inflection focus on classes as defined by (regular) suffixal groups, even though inflectional stress exhibits its own, independent organization into classes. And this choice is rooted, ultimately, in analytic assumptions of the linguist that give a privileged status to affixes in the description of inflectional systems.

Table 2.1. An example of morphosyntactically conditioned stress alternation in Russian nouns

          GVOZD’ ‘nail’   GUBA ‘lip’   OKNO ‘window’
NOM.SG    gvozd’          gubá         oknó
ACC.SG    gvozd’          gubú         oknó
GEN.SG    gvozdjá         gubý         okná
DAT.SG    gvozdjú         gubé         oknú
LOC.SG    gvozdé          gubé         okné
INS.SG    gvozdjóm        gubój        oknóm
NOM.PL    gvózdi          gúby         ókna
ACC.PL    gvózdi          gúby         ókna
GEN.PL    gvozdéj         gúb          ókon
DAT.PL    gvozdjám        gubám        óknam
LOC.PL    gvozdjáx        gubáx        óknax
INS.PL    gvozdjámi       gubámi       óknami

Another argument comes from the fact that layers of exponence may offer a full picture of the organization and complexity of a system only when considered jointly. Chiquihuitlán Mazatec (Oto-Manguean, Mexico) verbs are marked for person and aspect by a combination of tones, final vowel, and stem formative (Jamieson 1982). The uncertainty associated with predicting the tone, final vowel, and stem formative for a paradigm cell in isolation is high. Moreover, knowing the full paradigm for one of the layers of exponence (tone, final vowel, or stem formative) does little to help predict the pattern for other layers of the same lexeme (Ackerman & Malouf 2013: 448). However, the uncertainty associated with predicting the exponence of any given cell knowing one other cell in the paradigm is surprisingly low because each word form carries some information about the possible tone, final vowel, and stem formative of other cells; there is strong implicative structure between individual cells, which crosscuts the three layers of inflectional exponence (see average conditional entropy in Ackerman & Malouf 2013: 443). Similarly, Sims (2015: ch. 5) shows that the distribution of genitive plural defectiveness in Greek nouns is predictable from the relationship between affixal patterns and inflectional stress. When these layers of inflection are taken together, the picture that emerges is that the genitive plural in some classes is implicatively stranded in the paradigm, causing defectiveness. This kind of evidence undercuts any attempt to exclude non-affixal paradigmatic layers. In the Greek example, the paradigmatic layers reveal aspects of inflectional organization that cannot be discerned from affixal structure alone.
The Mazatec example is similar, with the addition that including all of the layers of inflection actually leads to less complexity than would be expected given each layer independently. The inclusion of stress information in Russian nouns necessitates a second, distinctly structured inheritance hierarchy. Ultimately, paradigmatic layers can reveal organizational properties of inflectional systems that are otherwise hidden. Thus, as with irregularity, analytic assumptions that exclude non-affixal paradigmatic layers from consideration preclude important questions about how elements in an inflectional system interact to determine its overall complexity.
2.2.3 Interim Summary
In summary, estimates of the complexity of inflectional systems depend on the representations of the systems under investigation. While there has been a tendency to exclude irregular inflectional patterns and non-affixal layers of exponence from these representations, doing so is not well justified on empirical or theoretical grounds. Both irregulars and non-affixal layers have the potential to reveal structural properties of the system that are otherwise obscured. The question becomes whether a broader understanding of what belongs to ‘the system’ makes a difference for calculations of its complexity, and how.
2.3 Inflection class complexity
Inflection classes are a layer of structure that mediates between form and meaning, without bearing meaning directly (they are morphomic in Aronoff’s 1994 terms), and some languages do not have inflection classes, showing that classes are not ‘needed’. These observations have led to the idea that inflection classes create unnecessary complexity in morphological systems and have raised the question of whether there are limits on that complexity. As noted in the introduction, the focus of this question has shifted away from a notion of complexity defined in terms of absolute number of inflection classes/exponents/cells and towards one that is rooted in implicative paradigmatic structure. Stump & Finkel (2013) define the complexity of an inflection class system as ‘the extent to which the system inhibits motivated inferences about a lexeme’s full paradigm of realized cells from subsets of its cells’ (2013: 55; emphasis ours). When defined in this way, the complexity of an inflection class system may, but need not, be related to the absolute size of the system. Systems with a large number of inflection classes and/or in which lexemes have a large number of paradigm cells can exhibit low complexity if there is strong implicative structure within the paradigm. Likewise, small inflectional systems can be highly complex if inflected forms are not held together by strong implicative relations (Sims 2015: ch. 5).
Stump & Finkel operationalize their definition primarily in terms of set-theoretic principal part sets, where a principal part set is a set of realized cells from which a lexeme’s full inflection class membership can be determined. The concept of a principal part set is, by its very nature, concerned with implicative paradigmatic structure, giving a way to compare the complexity of different inflection class systems. Somewhat similarly, Ackerman et al. (2009) use information-theoretic tools to ask how much surprisal is associated with the inflected form realizing one paradigm cell, given the form associated with another cell, and define the complexity of an inflection class system in terms of its average conditional entropy. Ackerman & Malouf (2013) use the same information-theoretic tools to compare the complexity of a set of typologically diverse languages. Stump & Finkel (2013) and Ackerman & Malouf (2013) both find that when complexity is defined in terms of implicative structure, individual forms tend to be predictable on average. In a survey of ten languages, Ackerman & Malouf calculate the average conditional entropy associated with the realization of a set of morphosyntactic values given knowledge of one other form of the same lexeme and show that it is uniformly relatively low, despite diversity in the size of the languages’ inflectional systems.¹⁰ They focus on the idea that implicative structure allows even large systems to exhibit low average conditional entropy and present their results as a typological tendency, the Low Conditional Entropy Conjecture: ‘enumerative morphological complexity is effectively unrestricted, as long as the average conditional entropy, a measure of integrative complexity, is low’ (2013: 436).
Stump & Finkel (2013: 215) offer a similar generalization in the form of the Depth-of-Inference Contrast: ‘languages show a high degree of uniformity in allowing a given form in a lexeme’s paradigm to be deduced from a low number of dynamic principal parts (the average number being not much more than one)’.¹¹ Thus, both find evidence that even inflectional systems that vary widely in size tend to allow for well-motivated inferences when it comes to the task of inferring one inflected form from another. The idea that inflectional systems must maintain low complexity in this way is intuitive given that speakers must learn inflection classes for them to persist. Also, speakers must be able to generalize morphological patterns because not all inflected forms are attested even in large corpora (Baayen 2001; Blevins et al. 2017), and the need to predict unknown forms remains crucial throughout the lifespan (Bonami & Beniamine 2015).
¹⁰ However, at least for the languages that we are most familiar with (Russian, Greek), they base their analyses on grammatical descriptions that exclude irregularities and non-affixal layers of exponence. See Sims (2015: ch. 5) for a comparison between their analysis of Greek nouns and one based on a more robust representation of the nominal system.
¹¹ In a dynamic principal parts analysis, the principal parts need not reflect the same morphosyntactic properties from one inflection class to another. Stump & Finkel primarily differentiate this from a static principal parts analysis, in which the set of principal parts is required to correspond to the same morphosyntactic properties for all lexemes in a given syntactic category, and thus all inflection classes within that category.
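The difference between static and dynamic principal parts can be made concrete with a small brute-force search. The Python sketch below is an illustration only and is not Stump & Finkel's implementation: the names `PARADIGMS`, `CELLS`, `static_principal_parts`, and `dynamic_principal_parts` are hypothetical, the data are restricted to four cells of the four suffix classes that the chapter illustrates in Table 2.2, and surface suffixes are compared as plain strings.

```python
from itertools import combinations

# Toy data (an assumption of this sketch): suffixes for four cells of the
# four-class system, read off the transliterated forms in Table 2.2.
# "" stands for a zero ending.
PARADIGMS = {
    "I":   {"NOM.SG": "",  "GEN.SG": "a", "INS.SG": "om", "GEN.PL": "ov"},
    "II":  {"NOM.SG": "a", "GEN.SG": "y", "INS.SG": "oj", "GEN.PL": ""},
    "III": {"NOM.SG": "",  "GEN.SG": "i", "INS.SG": "ju", "GEN.PL": "ej"},
    "IV":  {"NOM.SG": "o", "GEN.SG": "a", "INS.SG": "om", "GEN.PL": ""},
}
CELLS = ["NOM.SG", "GEN.SG", "INS.SG", "GEN.PL"]


def static_principal_parts(paradigms, cells):
    """Smallest single set of cells that tells every class apart."""
    for size in range(1, len(cells) + 1):
        for subset in combinations(cells, size):
            # One tuple of exponents per class, restricted to the chosen cells.
            signatures = {tuple(p[c] for c in subset) for p in paradigms.values()}
            if len(signatures) == len(paradigms):
                return subset
    return None  # some classes are identical in every cell


def dynamic_principal_parts(target, paradigms, cells):
    """Smallest set of cells that tells `target` apart from all other classes."""
    own = paradigms[target]
    for size in range(1, len(cells) + 1):
        for subset in combinations(cells, size):
            # Every other class must differ from the target in at least one cell.
            if all(
                any(other[c] != own[c] for c in subset)
                for name, other in paradigms.items()
                if name != target
            ):
                return subset
    return None


static = static_principal_parts(PARADIGMS, CELLS)
dynamic = {name: dynamic_principal_parts(name, PARADIGMS, CELLS) for name in PARADIGMS}
print("static:", static)
print("dynamic:", dynamic)
```

On this toy data no single cell separates all four classes, so the static search needs two cells, whereas each class taken on its own can be identified from a single cell (a different one for different classes).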
At the same time, Stump & Finkel observe a difference in complexity between predicting one inflected form and predicting class membership (i.e., all forms). In contrast with the relatively uniform ease with which a single inflected form can be deduced, ‘Languages vary widely in the number of dynamic principal parts they require to distinguish a given I[nflection] C[lass]’ (Stump & Finkel 2013: 215). Similarly, Ackerman & Malouf find greater crosslinguistic differences in average declensional entropy (an unconditioned entropy measure of inflection class predictability) than in average conditional entropy (a conditional entropy measure of inflected form predictability). This suggests that the complexity of an inflection class system as a whole is not necessarily a direct product of the complexity of the individual exponents. It is therefore important to investigate how the complexity of the system as a whole relates to the complexity of the component elements of the system. A few steps have been taken in this direction. Sims & Parker (2016) find that nine investigated inflection class systems show roughly similar degrees of overall complexity, when calculated over pairs of forms using conditional entropy, consistent with the Low Conditional Entropy Conjecture. Crucially, however, they also show that implicative structure does very different amounts of ‘work’ in the languages to produce this result. In some languages, knowledge of one inflected form is crucial to predicting another. In other languages, inflected forms are independently fairly predictable, and knowledge of another form does little or nothing to improve that predictability. Thus, paradigmatic implication is not always an important determinant of the complexity of inflectional systems. 
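The contrast between how predictable a cell is on its own and how predictable it is given one other form can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any of the studies cited: the names `PARADIGMS`, `entropy`, and `conditional_entropy` are hypothetical, the data cover only four cells of the four suffix classes illustrated in Table 2.2, and every class is treated as equally probable rather than weighted by type frequency.

```python
from collections import Counter
from itertools import permutations
from math import log2

# Toy data (an assumption of this sketch): suffixes for four cells of the
# four-class system in Table 2.2; "" stands for a zero ending.
PARADIGMS = {
    "I":   {"NOM.SG": "",  "GEN.SG": "a", "INS.SG": "om", "GEN.PL": "ov"},
    "II":  {"NOM.SG": "a", "GEN.SG": "y", "INS.SG": "oj", "GEN.PL": ""},
    "III": {"NOM.SG": "",  "GEN.SG": "i", "INS.SG": "ju", "GEN.PL": "ej"},
    "IV":  {"NOM.SG": "o", "GEN.SG": "a", "INS.SG": "om", "GEN.PL": ""},
}
CELLS = ["NOM.SG", "GEN.SG", "INS.SG", "GEN.PL"]


def _entropy_of_counts(counts):
    """Shannon entropy (in bits) of the distribution given by raw counts."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def entropy(cell, paradigms):
    """H(cell): uncertainty about a cell's exponent, classes equiprobable."""
    return _entropy_of_counts(Counter(p[cell] for p in paradigms.values()))


def conditional_entropy(target, given, paradigms):
    """H(target | given), computed as H(given, target) - H(given)."""
    joint = Counter((p[given], p[target]) for p in paradigms.values())
    return _entropy_of_counts(joint) - entropy(given, paradigms)


# Average uncertainty per cell with no knowledge of other forms.
avg_entropy = sum(entropy(c, PARADIGMS) for c in CELLS) / len(CELLS)

# Average uncertainty per cell given one other cell, over all ordered pairs.
pairs = list(permutations(CELLS, 2))
avg_cond = sum(conditional_entropy(t, g, PARADIGMS) for g, t in pairs) / len(pairs)

print(f"average entropy:             {avg_entropy:.3f} bits")
print(f"average conditional entropy: {avg_cond:.3f} bits")
```

Under these assumptions the average unconditioned entropy is 1.5 bits and the average conditional entropy is 5/12 (about 0.42) bits. The gap between the two figures is the share of the uncertainty that is removed by knowing one other form of the lexeme.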
Additionally, based on data from Icelandic and French, Stump & Finkel (2013) propose the Marginal Detraction Hypothesis: ‘[m]arginal I[nflection] C[lasse]s tend to detract most strongly from the IC predictability of other ICs’ (p. 225). Marginal classes here are defined as ones with few lexemes. The Marginal Detraction Hypothesis thus asks whether the internal structure of inflection class systems is homogeneous. The hypothesis is that the implicative structure of low type frequency classes may differ from that of the most frequent classes. (See also Sims & Parker 2016 for a similar idea.) Related to this, Blevins et al. (2017) argue that the Zipfian distribution of morphological patterns helps balance two opposing pressures: the importance of predicting forms and the importance of discriminating forms. Frequently occurring patterns facilitate prediction. Suppletive patterns, which are likely to belong to low type frequency classes, may detract from predictability but at the same time have benefits like being highly discriminative. Both types of patterns contribute, in different ways, to ensuring the patterns in the language are usable by speakers. Together these studies explore the idea that competing pressures may lead different components of inflectional systems to exhibit different properties. They also suggest that if there is a strong crosslinguistic tendency for languages to exhibit low inflection class complexity, this both results from and occurs despite structural aspects of inflectional systems. But so far there is little understanding of how the elements of inflectional systems interact to determine inflection class complexity, so further work is needed in this area, especially work comparing implicative structuring within subcomponents of the lexicon.
2.4 Russian nouns
We now turn to the example of Russian nouns. Our approach to investigating Russian is to divide inflectional exponence into its subcomponents and to investigate the effect of each on the complexity of the inflection class system. We do this in two ways. First, starting with a baseline description of the Russian nominal system that consists only of classes as defined by inflectional suffixes, we add in further information about exponence (additional paradigmatic layers) and look at the effect of this on the complexity of the inflection class system (section 2.6). Second, to look more directly at irregularity, we classify the individual exponents within each paradigmatic layer as regular or irregular. We then investigate the extent to which this (ir)regularity contributes to the complexity of the inflection class system (section 2.7). This idea is conceptually close to the Marginal Detraction Hypothesis, given the close connection between the irregularity and type frequency of inflection classes. However, quantifying the regularity of inflection classes’ layers directly allows us to take a closer look at whether layers are making distinct contributions to the complexity of the system as a whole. But first, in this section we describe the data sets that we work with.
Various proposals have been made regarding the number of Russian noun classes. The four-class system of Corbett (1982), shown in Table 2.2, is a typical representation of the Russian nominal system, but it is also coarse-grained. It may be an appropriate basis for some kinds of linguistic investigation but questions of inflection class complexity benefit from a more granular representation. We therefore consider a fuller set of suffixal patterns and three additional layers of inflectional exponence.
Here we are interested in how different aspects of inflectional exponence affect the complexity of a system without making any claims about which granularity is the ‘right’ or ‘best’ representation (cf. the earlier discussion of Sagot & Walther 2011).
Suffixes constitute one layer of exponence. In addition to the four suffix sets illustrated in Table 2.2, we consider ten other patterns of suffixes:
1. Indeclinable nouns, for example, KINO ‘(movie) theater’;
2. Neuter nouns like VREMJA ‘time’. In the plural these behave like Class IV nouns. In the singular they have an -a in the nominative (like Class II) and accusative, -i in the genitive, locative and dative (like Class III nouns), and -om in the instrumental (like Class I nouns);
3. Nouns that belong to Class I except that they have a null genitive plural, for example, RAZ: raz ‘time.GEN.PL’;
4. Nouns that belong to Class I except that they have -a in the nominative plural, for example, GOROD: goroda ‘city.NOM.PL’;
5. Nouns that belong to Class IV, except for -ov in the genitive plural, for example, OBLAKO: oblakov ‘cloud.GEN.PL’;
6. Nouns that belong to Class IV, but with nominative plural -i, for example, JABLOKO: jabloki ‘apple.NOM.PL’;
7. Nouns that belong to Class II except they have an overt genitive plural, for example, RASPRJA: rasprej ‘strife.GEN.PL’;
8. Nouns that belong to Class IV, but have a nominative plural -i and genitive plural -ov, for example, OČKO: očki ‘point.NOM.PL’ and očkov ‘point.GEN.PL’;
9. Nouns that belong to Class I, but have a nominative plural -e and a null genitive plural, for example, KREST’JANIN: krest’jane ‘peasant.NOM.PL’ and krest’jan ‘peasant.GEN.PL’;
10. Nouns that belong to Class I but have a nominative plural in -a and a null genitive plural, for example, TELËNOK: teljata ‘calf.NOM.PL’ and teljat ‘calf.GEN.PL’.¹²

Table 2.2. Illustration of the four-class system, based on inflectional suffixes

          I               II             III              IV
          ZAKON ‘law’     KARTA ‘map’    KOST’* ‘bone’    MESTO ‘place’
NOM.SG    zakon           karta          kost’            mesto
ACC.SG    zakon           kartu          kost’            mesto
GEN.SG    zakona          karty          kosti            mesta
LOC.SG    zakone          karte          kosti            meste
DAT.SG    zakonu          karte          kosti            mestu
INS.SG    zakonom         kartoj         kost’ju          mestom
NOM.PL    zakony          karty          kosti            mesta
ACC.PL    zakony          karty          kosti            mesta
GEN.PL    zakonov         kart           kostej           mest
LOC.PL    zakonax         kartax         kostjax          mestax
DAT.PL    zakonam         kartam         kostjam          mestam
INS.PL    zakonami        kartami        kostjami         mestami

Note: * Here and throughout the chapter we use scientific transliteration, rather than transcription. This is a convenience that accommodates Russian speakers and makes it easier to check the examples in a dictionary (because the spelling is maintained). However, the transliteration is sometimes misleading with regard to the phonological (or morphological) shape of words. Although it is not clear in the transliteration of this example, the stem-final consonant cluster in KOST’ is [sʲtʲ] throughout the paradigm (e.g., nominative singular [kosʲtʲ], genitive singular [kosʲtʲ-i], instrumental singular [kosʲtʲ-ju]).

¹² Nouns like KREST’JANIN and TELËNOK also exhibit changes in their stems. See discussion below.
A second layer of exponence consists of stem distributions. Here we define a stem as the segmental material left when the suffix sets discussed above are removed from inflected forms (as in, e.g., Aronoff 1994: 31). In total, 80.8% of lexemes have a consistent stem throughout the paradigm (data from Zaliznjak 1977). In addition to this, we consider five types of stem change that are morphologically patterned:
1. Vowel-zero alternation in the nominative singular (and accusative singular when syncretic) but not elsewhere, for example, DEN’: den’ ‘day.NOM.SG’ ~ dnja ‘day.GEN.SG’;
2. Vowel-zero alternation in the genitive plural (and accusative plural when syncretic) but not elsewhere, for example, PIS’MO: pis’mo ‘letter.NOM.SG’ ~ pisem ‘letter.GEN.PL’;
3. A stem extension -in in the singular, for example, KREST’JANIN: krest’janin ‘peasant.NOM.SG’ ~ krest’jane ‘peasant.NOM.PL’;
4. A stem extension -en in all forms but the nominative and accusative singular, for example, VREMJA: vremja ‘time.NOM.SG’ ~ vremeni ‘time.GEN.SG’;
5. Extensions -ënok in singular forms and -jat in plural forms, for example, TELËNOK: telënok ‘calf.NOM.SG’ ~ teljata ‘calf.NOM.PL’.
A third layer of exponence is stress. In total, 91.6% of nouns have consistent stem stress throughout the paradigm, and an additional 6.1% have consistent stress on the inflectional suffix throughout the paradigm (data from Zaliznjak 1977, reported in Brown et al. 1996).¹³ The remaining nouns have some type of stress shift. While they represent only a small percentage of total types, they tend to be among the words with the highest token frequency. Stress alternations fall into six patterns, shown in Table 2.3. With one exception, the shift is between the first syllable of the stem and the inflectional ending:
1. Two patterns involving a shift according to number, for example, MESTO ‘place’ and ČISLO ‘number’;
2. Fixed stress on the inflectional ending, but with stem-initial stress in nominative plural (and accusative plural when syncretic), for example, GUBA ‘lip’;
3. Fixed stress on the inflectional ending, but with stem-initial stress in both nominative plural (and accusative plural when syncretic) and accusative singular, for example, BORODA ‘beard’;
4. Two patterns that combine a shift according to number with retraction in the nominative plural (and accusative plural when syncretic) and accusative singular, for example, DOLJA ‘portion’ and DUŠA ‘soul’.¹⁴

Table 2.3. Illustration of stress classes of Russian nouns

          MESTO ‘place’   ČISLO ‘number’   GUBA ‘lip’   BORODA ‘beard’   DOLJA ‘portion’   DUŠA ‘soul’
NOM.SG    mésto           čisló            gubá         borodá           dólja             dušá
ACC.SG    mésto           čisló            gubú         bórodu           dólju             dúšu
GEN.SG    mésta           čislá            gubý         borodý           dóli              duší
DAT.SG    méstu           čislú            gubé         borodé           dóle              dušé
LOC.SG    méste           čislé            gubé         borodé           dóle              dušé
INS.SG    méstom          čislóm           gubój        borodój          dólej             dušój
NOM.PL    mestá           čísla            gúby         bórody           dóli              dúši
ACC.PL    mestá           čísla            gúby         bórody           dóli              dúši
GEN.PL    mést            čísel            gúb          boród            doléj             dúš
DAT.PL    mestám          číslam           gubám        borodám          doljám            dúšam
LOC.PL    mestáx          číslax           gubáx        borodáx          doljáx            dúšax
INS.PL    mestámi         číslami          gubámi       borodámi         doljámi           dúšami

¹³ Russian nouns usually have zero exponence in either the nominative singular or genitive plural, depending on class; see Table 2.2. When a form has no overt inflectional suffix in a given paradigm cell, lexemes that otherwise would have stress on the suffix have stress on the last syllable of the stem instead (see BORODA in Table 2.3).
The fourth and final layer of exponence reflects patterns of defectiveness.¹⁵ In total, 97.7% of nouns have a form for each cell in the paradigm (data from Zaliznjak 1977). However, some lack forms for a subset of the paradigm. Most of these are singularia or pluralia tantum nouns, for example, the lexeme glossed ‘pants/trousers’ has no singular forms. Russian also has a well-known pattern of genitive plural defectiveness that affects a few dozen nouns, for example, the lexeme glossed ‘reward’ has no genitive plural, and a handful of (diminutive) nouns occur in only the nominative and accusative singular, for example, razok ‘time.DIM’.
Within each layer of exponence we do not include patterns that are represented in only one lexeme, nor do we include alternate patterns of stress. However, many lexemes in our data are nonetheless unique in their morphological exponence because they exhibit a unique combination of layers. For example, the lexeme glossed ‘lord/sir’ has a stem extension in the singular like KREST’JANIN ‘peasant’ but it has the same set of suffixes and stress pattern as GOROD ‘city’. It is the only lexeme to exhibit this particular combination of patterns. We also abstract away from properties that are not related to inflection class membership. Some lexemes exhibit the same exponence but are not identical in other morphosyntactically-relevant traits like gender and animacy. For example, the lexemes glossed ‘drunkard’ and ‘girl’ have the same pattern of exponence but the former is masculine while the latter is feminine. They are treated in our analyses as belonging to the same class since gender is not expressed inflectionally in nouns.
We also abstract away from predictable phonologically-conditioned variation and predictable semantically-conditioned variation. For example, vowels reduce when not stressed, but given information about stress, vowel quality is fully predictable and purely phonological. Thus, we abstract away from vowel reduction in our class representations. Some genitive plural forms have no overt exponent, for example, kart ‘map.GEN.PL’, and others have an overt suffix, for example, zakon-ov ‘law-GEN.PL’ and učitel-ej ‘teacher-GEN.PL’. Whether a lexeme has a zero genitive plural form or an overt ending is morpholexically conditioned and thus depends on its inflection class, so we include this distribution in our description. However, which of the two overt exponents will occur is fully predictable from the phonology of the stem: -ej occurs with morphologically soft stems and -ov occurs elsewhere (Timberlake 2004: 84–5).¹⁶ Thus, we represent -ov and -ej as a single exponent. Similarly, we do not include differences in accusative marking that are predictable based on animacy (see Corbett & Fraser 1993: 129–30 for justification).¹⁷ Thus, our analysis reflects only information about exponence that is directly a property of inflection class membership. See Parker (2016) for a more complete description of the patterns and paradigmatic layers of Russian nouns.
¹⁴ Due to the stress shift between singular and plural, the distribution of the retraction of stress onto the stem is ambiguous. Nouns like DOLJA are consistent with stress shift in both nominative plural and accusative singular, but since there is stem stress throughout the singular, the accusative singular is ambiguous. Conversely, nouns like DUŠA are also consistent with both stress shifts, but since there is stem stress throughout the plural, the nominative plural is ambiguous. Except for ambiguous instances of this sort, accusative singular stress retraction never occurs unless nominative plural stress retraction also does, so it seems safe to analyse DUŠA as having both stress retractions, with the nominative plural one being opaque. The proper analysis of DOLJA is less clear. Stress retraction in the accusative singular happens (unambiguously) only in nouns with the Class II suffix pattern. While DOLJA belongs to this class, other nouns with the same stress pattern do not (e.g., ZUB ‘tooth’ (Class I), PLOŠČAD’ ‘city square’ (Class III)). An alternative possibility is therefore to analyse these nouns as having only the nominative (and accusative) plural stress retraction, since it occurs in combination with a wider range of stem classes. We do not have a firm opinion about which analysis is ultimately the right one, or even whether speakers themselves make only one or the other analysis. But it also makes no difference in the present context. Since our analysis of implicative relations in the following section is based on surface patterns, all six patterns in Table 2.3 are treated as distinct in the analysis.
¹⁵ Walther (2017) distinguishes between ‘deficient’ and ‘defective’ lexemes where the former are lexemes for which a speaker could determine what forms would fill the cells but does not use those forms, and the latter are lexemes for which there is uncertainty about which form would fill missing cells. We include both types of lexemes in our category of defectiveness.
¹⁶ In Russian it is necessary to distinguish phonological softness (secondary palatalization) and morphological softness. The phonological softness of consonants is relevant to phonological processes, for example, conditioning of unstressed vowel reduction. Morphological softness is relevant to allomorph selection in genitive plural. In Russian, consonants that are pronounced with secondary palatalization are soft both phonologically and morphologically. Most of the consonants that are pronounced without secondary palatalization are hard both phonologically and morphologically. However, there are six consonants (traditionally called the ‘unpaired’ consonants) that fall outside of this system in various ways. Three of them differ in softness depending on the level of structure. The consonant /j/ is phonologically soft (it conditions unstressed vowel reduction in the same way as other soft consonants) but morphologically hard (stem-finally it conditions genitive plural -ov, like other hard consonants). Conversely, the consonants /ʃ/ and /ʒ/ are phonologically hard but morphologically soft (stem-finally they condition genitive plural -ej, like other soft consonants). However, the behaviour of these three phonemes is the same across all inflection classes, so we still consider this to be predictable phonological conditioning.
¹⁷ The analysis/number of classes in this chapter differs from that in Parker (2016) and Sims & Parker (2016). This primarily reflects the fact that the earlier work did not abstract away from animacy-conditioned exponence in accusative.
2.5 Quantifying complexity
We adopt a definition of complexity rooted in the predictability of individual forms, rather than entire classes, because it reflects a type of unpredictability speakers must overcome to use an inflectional system (Ackerman et al. 2009). When speakers need to express a combination of lexeme and grammatical properties, the predictability of the corresponding individual form is more relevant than the predictability of that lexeme’s class membership (its entire paradigm of forms) for the simple reason that speakers only ever need to produce one inflected form at a time. Moreover, as noted above, recent work suggests that individual form predictability is a relevant level of generalization for statements about the complexity of inflection class organization crosslinguistically (Ackerman & Malouf 2013). Our definition of inflection class complexity is repeated as (2).
(2) Complexity of an inflection class system: the average extent to which the system inhibits motivated inferences about the realized form of a lexeme, given one or more other realized forms of the same lexeme.
We operationalize this definition using information-theoretic tools. We use conditional entropy to estimate the complexity of the system and use the (nonconditioned) entropy of the system to estimate the potential complexity of the system.¹⁸ The potential complexity of an inflection class system is the amount of complexity it would exhibit if the exponents of the various paradigm cells of a lexeme were logically independent of each other, since this would maximally inhibit motivated inferences. A key question is the extent to which the actual complexity of an inflection class system is lower than its potential complexity, since the difference between these reflects the ‘work’ done by inflectional structure to minimize the complexity of the system. Entropy represents the average surprisal associated with the outcome of a random variable A. In the context of inflectional systems, A is a paradigm cell (or more accurately, a set of morphosyntactic properties) and the possible outcomes are the different exponents that realize that cell in each class. Thus, entropy represents the average surprisal associated with the exponents of a given morphosyntactic property set. (3)
Entropy: $H(A) = -\sum_{a \in A} p(a) \log_2 p(a)$
¹⁸ We recognize that these measures do not capture all aspects of a system’s complexity, especially because they are limited to comparisons between individual cells (as opposed to larger subsets of the paradigm). See, for example, Stump & Finkel (2013) and Bonami & Beniamine (2015) for investigations that consider complexity based on predictiveness/predictability of multiple paradigm cells. Expanding the current work to take account of paradigm structuring would be valuable. However, our focus here is on comparing across different descriptions of the Russian nominal system, and the importance of the description for estimates of inflection class complexity. A simple measure gives us the best perspective on this issue.
Conditional entropy H(A|B) represents the average surprisal associated with the outcome of a random variable A, given knowledge of the outcome of another random variable B. In the present context, A and B are paradigm cells in which A ≠ B. Implicitly conditioned on the lexeme, the outcomes of A and B are two inflected forms of the same lexeme. Conditional entropy thus represents the average surprisal associated with the exponent that realizes a given morphosyntactic property set, knowing the exponence of another inflected form of the same lexeme. (4)
Conditional entropy: $H(A \mid B) = \sum_{a \in A,\, b \in B} p(b, a) \log_2 \frac{p(b)}{p(b, a)}$
Averaging across the entropy values H(A) for all licensed morphosyntactic property sets produces an estimate of the potential complexity of the system as a whole. This mean entropy value represents the average uncertainty associated with predicting the exponent of a paradigm cell knowing only the possible exponents that realize that cell in different classes. Exponents of different morphosyntactic property sets are thus treated as independent of each other. By comparison, averaging across the conditional entropy values H(A|B) of all licensed combinations of morphosyntactic property sets A and B produces an estimate of the complexity of the inflectional system as a whole, taking into account implicative relations holding between pairs of cells. This represents the uncertainty associated with a given cell of a lexeme knowing the exponence of one other cell of the same lexeme. The conditional entropy H(A|B) will never be higher than the entropy H(A) and will be lower whenever the exponent that realizes B is informative about the exponent that realizes A. Knowing one form of a lexeme cannot increase the surprisal associated with another form, but it can lower it. The extent to which knowing one cell reduces the uncertainty associated with another cell (the difference between entropy and conditional entropy) represents how much ‘work’ is being done by the implicative structure of the system.
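The two system-level estimates just described can be illustrated with a minimal sketch. It is not the procedure used for the Russian data: the four-class system, its cell labels, and its exponents are hypothetical, and every class is assumed to be equally probable (the unweighted case). All function names are introduced here for illustration only.

```python
from collections import Counter
from itertools import permutations
from math import log2

# Hypothetical toy system (not the Russian data): four classes, three cells.
TOY_CLASSES = {
    "class_1": {"NOM.SG": "-a", "GEN.SG": "-i", "GEN.PL": "-0"},
    "class_2": {"NOM.SG": "-0", "GEN.SG": "-a", "GEN.PL": "-ov"},
    "class_3": {"NOM.SG": "-0", "GEN.SG": "-i", "GEN.PL": "-ej"},
    "class_4": {"NOM.SG": "-o", "GEN.SG": "-a", "GEN.PL": "-0"},
}


def distribution(classes, cells):
    """Probability of each exponent combination for `cells` (classes equiprobable)."""
    counts = Counter(tuple(p[c] for c in cells) for p in classes.values())
    return {outcome: n / len(classes) for outcome, n in counts.items()}


def entropy(classes, cell):
    """H(A) = -sum_a p(a) log2 p(a)."""
    return -sum(p * log2(p) for p in distribution(classes, [cell]).values())


def conditional_entropy(classes, target, given):
    """H(A|B) = sum_{a,b} p(b,a) log2(p(b) / p(b,a)); A = target, B = given."""
    marginal = distribution(classes, [given])
    joint = distribution(classes, [given, target])
    return sum(p_ba * log2(marginal[(b,)] / p_ba) for (b, _), p_ba in joint.items())


def mean_entropy(classes):
    """Estimated potential complexity: mean of H(A) over all cells."""
    cells = list(next(iter(classes.values())))
    return sum(entropy(classes, c) for c in cells) / len(cells)


def mean_conditional_entropy(classes):
    """Estimated actual complexity: mean of H(A|B) over all ordered pairs A != B."""
    cells = list(next(iter(classes.values())))
    pairs = list(permutations(cells, 2))
    return sum(conditional_entropy(classes, a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print("mean entropy:", mean_entropy(TOY_CLASSES))
    print("mean conditional entropy:", mean_conditional_entropy(TOY_CLASSES))
```

In this toy system the gap between the two means is the 'work' attributed to implicative structure; with real data the class table would be replaced by the observed classes.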
2.6 Granularity and system complexity
We now turn to the primary questions of this chapter, starting with: To what extent does including more paradigmatic layers into the system affect its complexity? Our approach is to develop multiple parallel descriptions of Russian nominal inflectional structure based on the paradigmatic layers. Each description
is based on the same set of lexemes but the lexemes are distributed across classes differently depending on which layers are included in the analysis. This allows us to investigate how paradigmatic layers interact, and specifically, how those interactions influence the complexity of the system as a whole.
2.6.1 Granularity of inflection class information
We determined the number of distinct patterns that result from combinations of paradigmatic layers. We took each morphological noun in an exhaustive grammatical dictionary of Russian, Zaliznjak (1977), and created multiple parallel representations of the system by including increasingly more paradigmatic layers. Each representation of the system includes the same 43,486 lexemes distributed among the number of distinct patterns/classes that arise based on the layers considered. In general, as more layers are combined, more classes are needed to describe Russian nominal inflection. In Table 2.4 we provide the number of classes that result when suffix sets are considered independently and in combination with one, two or three additional paradigmatic layers. Note that even the least granular representation here exhibits more classes than the traditional four classes argued for in Corbett (1982) and used in other complexity studies where Russian nouns were considered (e.g., Ackerman & Malouf 2013). We will refer to the different parallel descriptions as ‘granularities’. In Figure 2.1 we show the distribution of word types per inflection class in each of the granularities presented in Table 2.4. The distribution of lexemes across classes is roughly exponential in every granularity, resulting in a more or less linear trend when displayed in log space (Figure 2.1). In other words, there are many lexemes in a small number of classes and few lexemes in many classes. This is not surprising; distributions of this sort are ubiquitous among frequency counts in natural languages, including word frequencies (see Baayen 2001 for detailed discussion).

Table 2.4. Number of nominal inflection classes of Russian nouns as a function of which paradigmatic layers are included
Number of classes: 14, 21, 22, 33, 42, 57, 64, 82
Suffixes: + in all eight descriptions
Stem changes, Stress, Defectiveness: each + in four of the eight descriptions (none of the three in the 14-class description, all three in the 82-class description)

Figure 2.1. Word types per inflection class across different granularities
[Eight panels, one per description: 14, 21, 22, 33, 42, 57, 64, and 82 inflection classes; x-axis: inflection classes; y-axis: log type frequency, 0 to 8]
2.6.2 Paradigmatic layers and inflection class complexity
To assess how the complexity of the system changes with granularity, we calculated the mean entropy (= estimated potential complexity) and mean conditional entropy (= estimated actual complexity) of each representation of the system presented in Table 2.4. In light of the type frequency distribution of classes shown in Figure 2.1, we calculated mean conditional entropy both with and without type frequency weighting. In the weighted condition, the probabilities of each exponent were weighted by the type frequency of the exponent. This measure represents the complexity of the system when both implicative structure and the uneven distribution of lexemes across classes are taken into account. Figure 2.2 shows that as granularity increases, and more paradigmatic layers are included in the system, the entropy and unweighted conditional entropy of the system tend to increase. This is unsurprising from the perspective of information theory: as more elements are present in the system, there will be greater surprisal associated with those elements on average. More interestingly, the weighted conditional entropy values remain low regardless of inflection class granularity; the weighted conditional entropy only increases 0.12 bits from a representation of the system that includes only suffixes (fourteen classes) to one with all paradigmatic layers together (eighty-two classes). This means that the uncertainty associated with a large number of classes is mitigated by a combination of the implicative structure of the system and the unequal distribution of lexemes across classes. Implicative structure and the distribution of lexemes across classes conspire to maintain low systemic complexity.

Figure 2.2. Complexity measures across granularities of Russian nouns
[Three lines: entropy (unweighted), conditional entropy (unweighted), weighted conditional entropy; x-axis: number of classes (14, 21, 22, 33, 42, 57, 64, 82); y-axis: complexity measures in bits, 0.0 to 2.5]

However, even a random distribution of exponents will tend to produce a system with lower mean conditional entropy than mean entropy, because some of the exponents will be accidentally informative about other exponents. Thus, we should ask whether the implicative structure of the system minimizes the complexity of the inflection class system in each granularity more than is expected by chance. Employing Monte Carlo simulation, we created a hundred simulated data sets for each granularity. In each granularity the simulated data sets contained the same exponents and the same number of classes as in the real granularity, but the exponents were randomly distributed across the classes.¹⁹ The mean conditional entropy of the simulated data sets represents the amount of complexity we expect in systems of this size based on a random distribution of exponents. If the actual complexity falls outside of the simulated values, we can conclude that the ‘work’ done by the implicative structure in that granularity is significant at a level of p […]

[…] .05). Figure 2.4 shows the effect size of each independent variable when others are kept constant. Irregular patterns of defectiveness and irregular patterns of stress thus increase the complexity of the system, but irregular patterns of suffixes and stems do not. Irregularity does not inherently make the system more complex; only some types of irregularity do.
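The type-frequency weighting and the Monte Carlo baseline of section 2.6.2 can be sketched roughly as follows. Several points are assumptions rather than details given in the text: weighting is implemented by counting each class as many times as it has lexemes, the random baseline shuffles each cell's exponents independently across classes, and both toy systems (cell names `c1` to `c3`, `a`, `b`, and their exponents) are hypothetical.

```python
import random
from collections import Counter
from itertools import permutations
from math import log2


def mean_conditional_entropy(rows, weights=None):
    """Mean H(A|B) in bits over all ordered pairs of distinct cells.

    rows: one dict per inflection class, mapping cell -> exponent.
    weights: optional positive type frequency per class (assumed scheme).
    """
    weights = weights or [1] * len(rows)
    cells = list(rows[0])
    values = []
    for target, given in permutations(cells, 2):
        joint, marginal = Counter(), Counter()
        for row, w in zip(rows, weights):
            joint[(row[given], row[target])] += w
            marginal[row[given]] += w
        total = sum(marginal.values())
        values.append(sum(n / total * log2(marginal[b] / n)
                          for (b, _), n in joint.items()))
    return sum(values) / len(values)


def shuffled_system(rows, rng):
    """Same exponents and number of classes, randomly redistributed per cell."""
    cells = list(rows[0])
    columns = {}
    for cell in cells:
        column = [row[cell] for row in rows]
        rng.shuffle(column)
        columns[cell] = column
    return [{cell: columns[cell][i] for cell in cells} for i in range(len(rows))]


def simulate(rows, n_sims=100, seed=0):
    """Mean conditional entropy of n_sims randomly shuffled systems."""
    rng = random.Random(seed)
    return [mean_conditional_entropy(shuffled_system(rows, rng))
            for _ in range(n_sims)]


# Hypothetical six-class system in which cells c1 and c2 predict each other.
TOY_ROWS = [
    {"c1": "x", "c2": "p", "c3": "u1"},
    {"c1": "x", "c2": "p", "c3": "u2"},
    {"c1": "x", "c2": "p", "c3": "u3"},
    {"c1": "y", "c2": "q", "c3": "u4"},
    {"c1": "y", "c2": "q", "c3": "u5"},
    {"c1": "y", "c2": "q", "c3": "u6"},
]

# Hypothetical two-class system used to show the effect of weighting.
TWO_CLASSES = [{"a": "x", "b": "p"}, {"a": "x", "b": "q"}]

if __name__ == "__main__":
    actual = mean_conditional_entropy(TOY_ROWS)
    simulated = simulate(TOY_ROWS)
    below = sum(s < actual for s in simulated)
    print("actual:", actual, "simulated values below actual:", below)
    print("unweighted:", mean_conditional_entropy(TWO_CLASSES))
    print("weighted 99:1:", mean_conditional_entropy(TWO_CLASSES, [99, 1]))
```

If no simulated value falls below the actual one, the implicative structure is doing more 'work' than a chance arrangement of the same exponents; the weighted call shows how a skewed distribution of lexemes lowers the measure even when the class inventory is unchanged.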
Figure 2.4. Effect of the irregularity of each layer on system complexity (entropy difference); the vertical bars show 95% confidence intervals
[Four panels: Suffixes, Stems, Stress, Defectiveness; x-axis: Reg vs. Irreg; y-axis: entropy difference]
2.8 Discussion and conclusions
This study highlights the need for caution in interpreting results from data whose representations include only affixal and regular inflectional patterns, since they may misrepresent the complexity of inflectional systems and/or obscure important aspects of inflectional structure. For example, the four most granular representations of Russian nouns in our study (forty-two, fifty-seven, sixty-four, and eighty-two classes) have an unweighted average conditional entropy that exceeds the largest unweighted average conditional entropy value among the ten languages investigated by Ackerman & Malouf (2013),²³ even though the conditional entropy of a four-class system of Russian falls in the middle of the range for languages they investigate. The mean conditional entropy of our most granular representation (eighty-two classes) is twice as high as the value for the four-class Russian system in Ackerman & Malouf’s paper. This raises questions about the extent to which typologically low systemic complexity is a reflection of assumptions adopted when creating representations of those systems. At the same time, it is equally important to point out that for every representation of the Russian nominal inflectional system that we investigated (that is, every granularity), the estimated complexity of the Russian noun class system was substantially lower than the potential complexity of the system, as shown in Figure 2.2 in section 2.6.2. The estimated complexity of the system was also significantly lower than would be expected by chance (Figure 2.3 in section 2.6.2). This indicates that a significant amount of ‘work’ is done by implicative structure, regardless of the particular representation that is assumed. The latter result contradicts Ackerman & Malouf’s (2013: 451) speculation that Russian has no need to rely on implicative organization.
However, arguably the more important conclusion is that in the end, our results are consistent with their Low Conditional Entropy Conjecture, if it is interpreted as a claim that inflection class systems self-organize to minimize the amount of complexity embodied in the system (rather than as a claim about a particular maximum possible conditional entropy value). No matter what particular representation we assume, Russian nouns show a pattern that is consistent with low systemic complexity, suggesting that a typological tendency towards low systemic complexity may extend beyond affixal and highly regular patterns.
²³ Amele, with a conditional entropy of 1.105 bits; Ackerman & Malouf (2013: 443, table 3).
While the Low Conditional Entropy Conjecture focuses on a global measure of the complexity of inflection class systems, an equally interesting question has to do with how the component parts of the system shape this global complexity. From this perspective, an important result in this chapter is that the estimated actual complexity of the system changes very little, despite the fact that the potential complexity of the system tends to increase as information about inflectional exponence (paradigmatic layers) is added (Figure 2.2 in section 2.6.2). This means that the importance of implicative structure to the organization of the Russian nominal system emerges most clearly when irregular and non-affixal patterns are considered. The data presented here thus suggest that inflection class systems self-organize to minimize the potentially disruptive effects of irregularity and to maintain low complexity overall. This is an important aspect of the organization of the nominal system that would be hidden in a more coarse-grained representation. In a similar vein, we also showed that irregularity in some paradigmatic layers (stress, defectiveness) increases the complexity of the system, but in others it does not (Figure 2.4 in section 2.7.2). This suggests that the system as a whole is not simply a function of the complexity of its parts. It is instead a product of the way the parts are distributed, that is, how the component elements are related. This should hardly be a surprise, but the data in this chapter highlight that these sorts of local relations, and how they lead to complexity in an inflection class system (or don’t!), are at least as important to focus on as the complexity of the system overall. To the extent that languages universally or predominantly exhibit low systemic complexity, the question becomes why. At a broad level, the answer likely has to do with learnability (Ackerman et al. 2009), but to get beyond general formulations of this idea, it will be necessary to dive into the learnability of specific inflection class configurations, and to carefully examine local relations among the component parts of individual inflection class systems.²⁴ In this chapter, we have contributed towards this goal. Finally, we consider our results in the more general context of linguistic complexity.
Studies on the overall complexity of languages suggest that there may not be any typological limits on linguistic complexity (see Miestamo 2008 for discussion of global vs. local complexity). Trudgill (2011) argues that small communities with dense social networks and little linguistic contact with other communities promote the development and preservation of complexity. Similarly, McWhorter (2007) suggests that diminished linguistic complexity in a language is often the result of an influx of large groups of adults that learn the language. These studies undermine the intuitive idea that complexity in one area of a language leads to diminished complexity elsewhere in the language (see Hockett 1958: 180–1 for an early vocalization of this idea) and challenge any type of typological limit on linguistic complexity. The search for typological similarities in linguistic complexity is elusive enough to have been called a ‘wild goose chase’ (Deutscher 2009). It is thus somewhat surprising that inflection class systems, as particular local domains of complexity, seem to exhibit systemically low complexity. We think that investigation of interactions between elements in the system is a promising avenue for understanding and testing this issue. Whether similar patterns to what we find in Russian exist in other languages is an empirical question that we feel merits further investigation.
²⁴ See Parker et al. (to appear) for computational modelling of inflection class learning that moves in this direction.
Acknowledgements
We thank Peter Arkadiev, Gregory Stump, and an anonymous reviewer for their helpful comments. All errors remain entirely our own. This work was supported in part by The Ohio State University, through a Presidential Fellowship awarded to Jeff Parker and a sabbatical granted to Andrea Sims.
3 Demorphologization and deepening complexity in Murrinhpatha
John Mansfield and Rachel Nordlinger
3.1 Introduction
Linguistic complexity is often associated with morphology, but it may also be associated with the unravelling of morphology. Hopper (1990) observes that elements of form that were once morphological exponents may over time lose their morphological status and become unanalysable subparts of lexical stems. For example, the final rime of seldom was once an Old English dative suffix *-um (Hopper 1990: 154). Hopper labels the outcome of this process ‘demorphologisation’, and we here adapt his usage to conceptualize demorphologization as a gradient phenomenon, in which morphological structure becomes gradually blurred over time by the accretion of lexically specific modifications.¹ Our focus is not on the end-point of this process but the mid-point, where there are morphological ‘semi-regularities’ that help speakers and learners predict unknown word forms, but which also leave a residue of unpredictability. This type of analogical unpredictability has become a major focus in research on morphological complexity (e.g., Ackerman et al. 2009; Ackerman & Malouf 2013; Parker & Sims, Chapter 2, this volume). Other studies have focused on the problem of predicting inflectional exponence for unencountered forms in an open lexical class, though as we argue below, there are some unexamined conceptual issues with the open-/closed-class distinction. In the current study, we focus on predictability in a closed class of finite verb stems, albeit one in which there are large inflectional paradigms, and demorphologization has advanced to the point where analogical predictability from one stem to another is highly attenuated.
¹ ‘Demorphologization’ is used rather differently by Joseph & Janda (1988), who use it in reference to regularization of phonological processes such that they become independent of an erstwhile morphological context.
John Mansfield and Rachel Nordlinger, Demorphologization and deepening complexity in Murrinhpatha. In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © John Mansfield and Rachel Nordlinger. DOI: 10.1093/oso/9780198861287.003.0003
Murrinhpatha finite verb stems, known in the literature as ‘classifier stems’, exhibit semi-regular patterns associated with demorphologization (Walsh 1976; Street 1987; Nordlinger 2015; Forshaw 2016; Mansfield 2016). We present data on analogical changes observed by comparing recent fieldwork documentation with forms documented some forty years earlier, showing that the process of demorphologization is still underway. Analogical changes show that classifier stem forms are not learnt and memorized as isolated units, but rather that speakers draw on paradigmatic semi-regularities to predict unknown forms. Though the system does not exhibit regular, productive inflection, neither can it be characterized as a set of ‘frozen forms’. Rather it is a relational system, and one that is in flux. We treat analogical predictability as a form of linguistic complexity, and show that through ongoing demorphologization, the complexity of Murrinhpatha classifier stems is increasing. We quantify this unpredictability by adapting probabilistic tools developed by Ackerman et al. (2009) and Ackerman & Malouf (2015). However, while the latter hypothesize limits of complexity for systems of productive inflection, the Murrinhpatha classifier stems are a closed-class system of 1,638 inflectional forms, where semi-regularities aid acquisition and processing, but whole-form memorization may mitigate the requirement for analogical predictability.
Murrinhpatha is a non-Pama-Nyungan polysynthetic Australian language of the Daly River region of the Northern Territory. It has maintained a vibrant speech community some eighty years after its speakers shifted to settled life under the influence of Catholic missionaries (Pye 1972). Murrinhpatha has some of the characteristics, both linguistic and social, that might associate it with the ‘isolated, complex’ language type proposed in sociolinguistic typology (Kusters 2003; Lupyan & Dale 2010; Trudgill 2011: 136; Bentz et al. 2015).
However it is doubtful that notions of sociolinguistic ‘isolation’ or ‘low-contact’ apply in this instance, since evidence points to a tradition of regional multilingualism (Falkenberg 1962: 13; Dixon 2002: 674). A crucial distinction for sociolinguistic typology is that between child-acquired versus adult L2-acquired multilingualism: child multilingualism has been argued to maintain or increase complexity, and adult acquisition to reduce complexity (Thomason & Kaufman 1988: 65ff; McWhorter 2007 and Chapter 10, this volume; Trudgill 2011: 34). In the case of Murrinhpatha, we know too little of traditional multilingualism to know which is more applicable. However in the post-settlement era (1930s–present) a large number of people from Marri Ngarr, Marri Tjevin, and other language groups have shifted to Murrinhpatha, in some cases learning both languages as children but switching to Murrinhpatha during adolescent years spent in a multi-ethnic school dormitory established by the missionaries (Mansfield 2014: 98). This influx of new speakers has not brought about any drastic simplifications or other language contact effects in the contemporary grammar of Murrinhpatha, although it has led to the demise of the other languages of the region.² In this chapter, we demonstrate more specifically that inflectional changes observed in the post-settlement period do not constitute simplifications, given a definition of complexity as allomorphic unpredictability, and a model of allomorph prediction based on analogical comparison with other lexemes. Given the ambiguity of Murrinhpatha with respect to sociolinguistic typological hypotheses, we do not here pursue the question of whether inflectional complexity depends on social characteristics of the speech community.
² Note however that the influx of speakers from other language groups may have had some influence on the distribution of sociolinguistic variables (Mansfield 2015a, 2015b: 183).
The structure of the chapter is as follows. In section 3.2 we outline the phenomenon of lexically specified inflectional allomorphy, which is the specific type of morphological complexity discussed in this chapter. In section 3.3 we discuss hypothesized limits to this type of complexity when applied to large lexical classes. In section 3.4 we provide an overview of the Murrinhpatha verb and introduce the relevant aspects of Murrinhpatha verb inflection, which involves exponence by multiple phonological increments which we label ‘intersecting formatives’ (cf. ‘paradigmatic layers’ in Parker & Sims, Chapter 2, this volume). Intersecting formatives are independent of one another in their paradigmatic patterns, and most of these patterns are not consistently applied to all verb stems, making exponence highly unpredictable. This also means that the formatives are generally not in biunique relations with inflectional categories. Section 3.4 describes the paradigms as documented in the 1970s (Walsh 1976; Street 1987), as well as changes to the paradigms observed in our work with a new generation of speakers since 2010. In section 3.5 we compare the observed changes with the types of changes predicted by a model of complexity limitation in large lexical classes (Ackerman & Malouf 2015), showing that none of the observed changes match the model. In section 3.6 we focus on two of the observed changes in particular, arguing that they diverge from the complexity-limitation mechanism because of incremental demorphologization, a process that is both analogical and destructive of existing analogies. In section 3.7 we summarize our findings.
3.2 Complexity in lexically specified allomorphy
There are several distinct dimensions of morphology that can be treated as forms of linguistic complexity (Kusters 2003, 2008; Anderson 2015a), but in this chapter we focus solely on (lexically specified) inflectional allomorphy. For example, in the Australian language Warlpiri verbs are suffixed with one of four lexically specified past tense allomorphs, -ca, -ŋu, -ɳu, -nu (Hale 1969; Nash 1980: 40). Where lexemes share the same allomorph selection in all their forms, the shared paradigms are usually referred to as ‘inflection classes’. Inflectional allomorphy of this type can be seen as prototypical morphological complexity, since it directly reduces form:meaning transparency (Aronoff 1998).
The type of complexity instantiated by inflectional allomorphy can be conceptualized in terms of degrees of predictability in allomorph selection. For example, in a language where almost all verbs take -ak 1., and just a handful instead take -iq 1., the exponence is mostly predictable; only a small degree of complexity is involved. But in a language with several lexically-conditioned allomorphs, all more or less likely, there is low predictability, or high complexity. The larger the inflectional paradigm involved, the more that this problem of prediction becomes a real one for speakers of the language (or indeed linguists attempting to accurately document the lexicon and morphology), because where large paradigms are involved there is a more frequent and persistent requirement to produce previously unencountered forms (Bonami & Beniamine 2016; Blevins et al. 2017). Degrees of inflectional predictability can be formalized and quantified using entropy, the weighted average of the log probabilities of all possible outcomes (Shannon 1948). Entropy can be taken as a measure of the unpredictability of a set of possible outcomes. The application of entropy as a measure of paradigmatic implicational structure was proposed by Ackerman et al. (2009). Work on predictability of allomorphy has proceeded from the insight that the inflection of a lexeme is not predicted in an informational vacuum, but rather is a problem of predicting unknown inflectional forms, given one or more forms of the lexeme that have already been encountered. This has been labelled the ‘Paradigm Cell Filling Problem’ (Ackerman et al. 2009; Stump & Finkel 2013; Bonami & Beniamine 2016; Sims & Parker 2016). The paradigmatic structure of inflection is thus crucial: typically, we expect that paradigmatic patterns are shared by lexemes in a language, with those lexemes that share a paradigm belonging to a common inflectional class. 
The known inflectional forms of a lexeme narrow the possibilities of which class the lexeme might belong to, thus reducing unpredictability of other forms. For example, the past tense suffix allomorphs mentioned above for Warlpiri can usually be predicted based on other inflectional forms. All verbs with imperative in -nta take the past allomorph -nu, licensing an inference from known form jinta ‘scold.IMP’ to the predicted form jinu ‘scold.PST’ (Nash 1980: 40). However there are other instances where allomorphy for a particular tense/aspect/mood (TAM) category does not uniquely identify an inflection class, leaving some unpredictability in the allomorphy of other forms. Table 3.1 shows the TAM allomorphs for all Warlpiri inflection classes. The syncretism between some classes for some tense categories makes these inflectional forms less than fully predictive of other inflectional forms of the same lexeme. For example, knowing that the presentational form is in -ɳiɲa narrows the range of possible imperative allomorphs, but does not help us to decide between the two possibilities -ka, -ɲa.

Table 3.1. Warlpiri verb inflection classes (Hale 1969; Nash 1980: 40)
Class | Non-past | Past | Imperative | Immediate future | Presentational
I | -mi | -ca | -ja ~ -ka | -ju | -ɲa
II | -ɳi ~ -ni | -ɳu | -ka | -ku | -ɳiɲa
III | -ɲi | -ŋu | -ŋka | -ŋku | -ŋaɲa
IV | -ɳi ~ -ni | -ɳu | -ɲa | -lku | -ɳiɲa
V | -ni | -nu | -nta | -nku | -naɲa

Residual uncertainty in predicting an inflectional form, given knowledge of other forms of the same lexeme, has been labelled integrative complexity (Ackerman & Malouf 2013). Integrative complexity meets several of the desiderata enumerated in Arkadiev & Gardani’s Introduction to this volume (Chapter 1). First, it is quantifiable and can be used to compare typologically diverse languages. Second, its conceptualization in terms of speaker inferences from known to unknown forms gives it a clear basis in psycholinguistic processing. Finally, whereas enumerative complexities lean heavily on the distinction between morphology and syntax, integrative complexity is relatively independent of this issue. Lexical selection of allomorphs generally occurs within units that are identified as words, but if a similar phenomenon occurred in phrase-like structures (e.g., periphrastic inflections with allomorphy on the auxiliary), this would have no real effect on the modelling of integrative complexity in the paradigm.
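The Warlpiri case can be turned into a small worked calculation. It rests on two simplifying assumptions that are not part of the description above: the five classes of Table 3.1 are taken to be equally likely (probability 1/5 each), and each table entry, including an alternation such as -ja ~ -ka, counts as one distinct exponent. Under those assumptions, the uncertainty about the imperative drops from about 2.32 bits to 0.4 bits once the presentational form is known, and the remaining 0.4 bits come entirely from the two classes that share -ɳiɲa.

```latex
% A = imperative (IMP), B = presentational (PRESTL); classes I-V assumed equiprobable
\begin{align*}
H(\textsc{imp}) &= -\sum_{a} p(a)\log_2 p(a)
  = -5 \cdot \tfrac{1}{5}\log_2\tfrac{1}{5}
  = \log_2 5 \approx 2.32 \text{ bits} \\
H(\textsc{imp} \mid \textsc{prestl}) &= \sum_{a,b} p(b,a)\log_2\frac{p(b)}{p(b,a)} \\
  &= \underbrace{2 \cdot \tfrac{1}{5}\log_2\frac{2/5}{1/5}}_{\text{classes II, IV (presentational -ɳiɲa)}}
   + \underbrace{3 \cdot \tfrac{1}{5}\log_2\frac{1/5}{1/5}}_{\text{classes I, III, V}}
   = 0.4 + 0 = 0.4 \text{ bits}
\end{align*}
```

By the same reasoning, knowing an imperative in -nta leaves no uncertainty about the past form, since only class V has that imperative.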
3.3 Complexity, predictability, and language change
In this chapter, we focus on the effects that language change may have on inflectional predictability. It has been shown that inflection class structure may persist in a language over long time periods (e.g., Maiden 2005; Gardani 2013), but even if it may in some instances be relatively stable, it is of course not completely static. The inflectional allomorphs selected by lexemes exhibit synchronic variation, with fluctuating variation rates over time leading to language change (Weinreich et al. 1968). The long-term patterns of changing allomorph selection have been studied in historically documented languages such as Latin (Gardani 2013: 201–28) and English (Jespersen 1949; Bybee & Moder 1983). An interesting question is whether the direction of such change reflects limits on overall complexity and, conversely, what mechanisms lead to an increase in complexity. There must be some upper limit of unpredictability at which inflectional systems remain learnable. If allomorphic distributions were too unpredictable, their prospects of being stably transmitted from one generation to the next would become rather slim. The obvious way to reduce unpredictability is to replace improbable allomorphs with more probable ones. We have little idea of how much unpredictability is too much, though crosslinguistic studies by Ackerman & Malouf (2013, 2015) and Stump & Finkel (2013) have documented the range of unpredictability found in genetically and typologically diverse samples.
Ackerman & Malouf (2013) compare synchronic inflectional systems in ten languages, showing that in all cases the average conditional entropy of one inflectional form of a lexeme, given knowledge of one other form, is between zero and 1.1 bits, the latter being approximately equivalent to a choice between two equally likely outcomes. Moreover, the languages in the sample that have the most allomorphs, and therefore risk the greatest unpredictability, are also the languages that make the most use of paradigmatic structure to mitigate unpredictability (Ackerman & Malouf 2013: 443). In other words, paradigmatic structure of the type illustrated for Warlpiri above exhibits a strong crosslinguistic tendency to maintain a reasonable level of predictability for unknown inflectional forms. While Ackerman & Malouf (2013) do not propose a specific numeric limit for how much integrative complexity learners can deal with, their study provides a principled method of quantification, and an initial sample of measurements, against which apparently complex languages such as Murrinhpatha can be compared. A simulation of how language change might reduce unpredictability (Ackerman & Malouf 2015) provides a useful model for considering the mechanism of analogical extension. Ackerman & Malouf (2015) model diachronic change in an inflectional system based on the principle that, given a known inflectional form of a lexeme, and the requirement to predict an unknown form of the same lexeme, a speaker identifies lexemes that share allomorphs with the known form. Change proceeds by revising paradigm-internal relations to match the same morphosyntactic relations in other paradigms. 
We will henceforth use the terms ‘source form’ for the known form, ‘target form’ for the unknown form, ‘comparable lexemes’ for other lexemes that share allomorphy with the source form, and ‘comparable source, comparable target’ for the comparable forms that correspond in morphosyntactic category with the source and target forms respectively. Given the array of comparable lexemes, the speaker establishes which allomorph occurs most frequently among the comparable targets, and predicts this to be the allomorph for the target form. Predictions of this type are taken as a model for language change, because in the next iteration of the simulation, it is the predicted form that is now taken as the allomorph for the target cell, rather than the previous incumbent form. This is a hyperactive model of change, where overgeneralization errors go uncorrected. The model is not specific to either child acquisition or adult usage, which in any case may not be a sharp distinction in large inflectional systems, where some inflected forms must be guessed by speakers even after many millions of words of input (Blevins et al. 2017). In Figure 3.1 we show the process of analogical induction, and replacement of the target form with a predicted form. A, B, etc., represent lexemes, with inflectional categories Ai, Aii, Bi, Bii, etc., while x, y represent exponence candidates. Ai is the source form and Aii is the target form. B, C, D are comparable lexemes (sharing exponence with source form), while E, F are disregarded since they do
Source form: Ai = x₁; target form: Aii = unknown
compare: Bi = x₁, Ci = x₁, Di = x₁, Ei = x₂, Fi = x₂
relate: Bii = y₁, Cii = y₂, Dii = y₂
induce: Aii = y₂
Figure 3.1. Ackerman & Malouf (2015) mechanism for predicting unknown inflectional forms
not share exponence with the source form. The comparable lexemes analogically present both y₁ and y₂ as exponence candidates for the target form, but y₂ wins out because it occurs more frequently in this distribution. If Aii is used as a source form or comparable lexeme form in a subsequent iteration, it will have the exponence y₂. Ackerman & Malouf (2015) computationally simulate this model of inflectional change based on a ‘highly unrealistic language’ in which allomorphy is almost completely unpredictable in the initial state. The simulation language has a hundred lexemes, each of which inflects for eight morphosyntactic categories, giving a total of 800 forms in the system. Each morphosyntactic category has three allomorphs, which are randomly assigned to each lexeme. Thus there are 3⁸ = 6,561 possible inflectional paradigms, so that most of the hundred lexemes have an idiosyncratic paradigm, that is, not shared with any other lexeme. In this initial state, there are no inflectional classes. As the simulation iterates, replacement of unknown allomorphs with the most predictable allomorph leads to massive convergence of lexemes towards shared inflectional paradigms. The simulation ends when allomorphy stabilizes (i.e., the unknown form already is the most predictable form) for twenty-five consecutive iterations. Given hundreds of trials of the simulation, in a large proportion of simulations (no exact figure is given), all lexemes converge on a single set of allomorphs (i.e., no allomorphy), creating a single inflectional paradigm. In the remaining simulations, lexemes converge on between two and eighty-eight inflectional classes, the median number being twelve (Ackerman & Malouf 2015: 8). 
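The prediction step in Figure 3.1 and the iterated simulation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Ackerman & Malouf's (2015) code: the function names, the random choice of lexeme and cells at each step, the tie-breaking behaviour, the `max_steps` cap, and the target allomorphs assigned to E and F (which Figure 3.1 does not show) are all hypothetical.

```python
import random
from collections import Counter


def predict_target(paradigms, lexeme, source_cell, target_cell):
    """Predict a target allomorph by analogy with comparable lexemes."""
    source_allomorph = paradigms[lexeme][source_cell]
    # Comparable lexemes share the source allomorph; collect their target forms.
    comparable_targets = [
        forms[target_cell]
        for other, forms in paradigms.items()
        if other != lexeme and forms[source_cell] == source_allomorph
    ]
    if not comparable_targets:
        return paradigms[lexeme][target_cell]  # no analogy: keep the incumbent
    # Most frequent allomorph among comparable targets (ties: first seen).
    return Counter(comparable_targets).most_common(1)[0][0]


def simulate(n_lexemes=100, n_cells=8, n_allomorphs=3,
             stable_run=25, max_steps=50_000, seed=0):
    """Iterate analogical replacement from a random initial state."""
    rng = random.Random(seed)
    paradigms = {
        lex: [rng.randrange(n_allomorphs) for _ in range(n_cells)]
        for lex in range(n_lexemes)
    }
    unchanged = 0
    for _ in range(max_steps):  # cap on iterations is an added safeguard
        lex = rng.randrange(n_lexemes)
        source, target = rng.sample(range(n_cells), 2)
        predicted = predict_target(paradigms, lex, source, target)
        if predicted == paradigms[lex][target]:
            unchanged += 1
            if unchanged >= stable_run:  # stable for 25 consecutive iterations
                break
        else:
            paradigms[lex][target] = predicted  # overgeneralization goes uncorrected
            unchanged = 0
    return paradigms


def count_classes(paradigms):
    """Number of distinct paradigms (inflection classes) in the system."""
    return len({tuple(forms) for forms in paradigms.values()})


# Figure 3.1: E and F's target forms are not given there; "y1" is a placeholder.
figure = {
    "A": {"i": "x1", "ii": None},
    "B": {"i": "x1", "ii": "y1"},
    "C": {"i": "x1", "ii": "y2"},
    "D": {"i": "x1", "ii": "y2"},
    "E": {"i": "x2", "ii": "y1"},
    "F": {"i": "x2", "ii": "y1"},
}
predicted = predict_target(figure, "A", "i", "ii")  # B, C, D vote: y1, y2, y2
```

Calling `count_classes(simulate())` gives the number of inflection classes remaining in one run of this sketch; because the selection scheme here is only an approximation, the counts need not match the figures reported for the original simulation.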
In terms of inflectional predictability, the initial random distribution of allomorphs [x₁, x₂, x₃] for each inflected form means that knowledge of other inflected forms does not offer any reduction in uncertainty (except by occasional accident of the distribution), and conditional entropy is therefore only marginally less than unconditional entropy, that is, H(x₁, x₂, x₃) = log₂ 3 ≈ 1.58 bits. But the replacement-by-most-predictable-allomorph mechanism in the simulated language change reduces this entropy to 0 bits in the instances where all lexemes converge on a single paradigm, and an average of 0.64 bits in the instances where the simulation converges on a set of inflectional classes (Ackerman & Malouf 2015: 9). The average conditional entropy found in these simulated inflectional systems sits neatly within the range of
0–1.1 bits found in the study of natural languages (Ackerman & Malouf 2013). This provides support for the notion that the model’s simplification mechanism may have something in common with mechanisms deployed in natural language. One issue that has been insufficiently addressed in work on integrative complexity is the question of open versus closed lexical classes. The Ackerman & Malouf (2015) simulation works with a set of a hundred lexemes, that is to say a finite set, and therefore a closed class. The basic formulation of the Paradigm Cell Filling Problem (PCFP; Ackerman et al. 2009) presumes that unknown inflectional forms must be predicted by a speaker, but also that the correct inflectional exponence is in some way defined—perhaps by a dictionary, or a more erudite speaker. Now, if we take ‘open class’ to mean a lexical class to which entirely new words can be added, then there must be a point at which inflectional forms of these words are not pre-defined, and there is no correct or incorrect selection of exponence. In other words, for truly open-class lexemes, the PCFP is undefined. In the next section, we will see that Murrinhpatha classifier stems are a closed class, with rather fewer members than may be intended in the original PCFP formulation. However, we argue that the model is still relevant, as Murrinhpatha speakers are not born with complete knowledge of the classifier stem paradigms, and must therefore use predictive mechanisms to extrapolate from known to unknown forms.
3.4 Unpredictable exponence in Murrinhpatha classifier stems
Murrinhpatha is a polysynthetic language with complex verbal structures including agreement morphology, nominal incorporation, adverbial modifiers, and complex predicates (Nordlinger 2017). Verbs are built on a finite stem element known as a ‘classifier stem’,³ which may either form a complete verb on its own or form the basis for a complex predicate. Classifier stems encode predicate semantics, subject person and number, and tense/aspect/mood marking. All Murrinhpatha verbs require a classifier stem in first position (bolded in the examples below). There are thirty-nine classifiers, each of which appears in forty-two inflected forms, thus giving a total of 1,638 inflected forms. Eleven of the thirty-nine classifiers can form a verb on their own (1); the remaining twenty-eight are only ever found in combination with a second, uninflecting stem element later in the verbal word (underlined in the examples below) with which they jointly determine the predicate semantics (2)–(5). The only allomorphy in the verb is in the classifier stem element; all other elements have a single exponence, subject only to phonologically motivated alternations. For more discussion of the details of the system the reader is referred to Blythe (2009), Nordlinger (2011, 2015), and Mansfield (2016, 2019) among others.⁴
³ In other work these have been called ‘auxiliaries’ (Walsh 1976), ‘classifier-subject pronominals’ (Nordlinger 2011), and ‘finite verbs’ (Mansfield 2016).
(1)
wuɾan
3S.(6).
‘She goes.’
(2)
muŋam-paɭ
3S.(11).-break
‘She broke it off.’
(3)
pam-ŋin̪t̪a-nu-ma-ɻaʈal
3S.:(24).-.---tear
‘They (two female non-siblings) tore the (cloth) from each other.’ (RN-20070531-002:011)
(4)
piɾim-nin̪t̪a-nu-bu-wuj-waɖa-ya
3S.(3).-.--thigh-put.into--
‘They put them in their pockets.’ (JB 43JBc743652_747130)
(5)
puddan-wunku-ɭaɭ-dejida-ŋime=pumpan-ka
3S.(29).-3O-drop-in.turn-.=3S.(6).-
‘They (dual, sibling) are dropping them (paucal, female, non-sibling) off, one after the other, as they go along.’ (Blythe 2009: 134)
For most classifier stems the exponence pattern making up the paradigm of forty-two inflected forms is unique to that stem. Thus the concept of ‘inflectional classes’ (a set of exponence paradigms shared by many lexemes) is not directly applicable to Murrinhpatha. Examples (1)–(5) show classifier stems as unsegmented wholes, and this has been the representation used in most work on Murrinhpatha. However, there are semi-regular subcomponents evident in these stems, and it is these that we treat as exponents of inflectional categories. These are not productive morphs that are applicable to new lexemes in an open class; however, they do constitute morphology in the sense of form:meaning associations between systematically related forms (Anderson 2015b).
⁴ In the Appendix we have provided paradigms for five classifier stems, to exemplify the complexity amongst them. Previous descriptions of the Murrinhpatha verbal system (e.g., Blythe et al. 2007; Nordlinger 2011, 2015) have tended to treat these classifier stem paradigms as consisting of synchronically unanalysable portmanteau forms, due to the substantial amounts of unpredictability and suppletion within the paradigms. The full set of thirty-nine paradigms as analysed in this chapter is available at http://langwidj.org/Murrinhpatha-inflection.
We assume that Murrinhpatha speakers to some extent store classifier stems as whole forms, rather than composing them online from the elements of exponence (cf. Mithun, Chapter 12, this volume). However this does not mean that inflectional exponence has no role in acquisition or processing. We do not know how much input is required for a Murrinhpatha speaker to encounter all 1,638 forms enough times that they can all be memorized, but the available evidence on corpus distribution of inflected forms suggests that many years of input are required to offer complete coverage of large paradigms (Blevins et al. 2017). As with other inflectional systems, when Murrinhpatha speakers parse or produce forms that they have not yet encountered, the recurrent patterns of exponence offer predictive clues. Indeed, the evidence of analogical change presented in this chapter shows that classifier stem forms are not acquired and stored as isolated forms: speakers draw on the exponence of one classifier to produce the exponence of another. Research on child acquisition of Murrinhpatha verb inflection also shows that children make occasional errors in allomorphy selection, revealing morphological structure in the acquisition of classifier stems (Forshaw 2016). Therefore, both the PCFP (Ackerman et al. 2009) and the Ackerman & Malouf (2015) simplification mechanism are relevant to Murrinhpatha. The fact that Murrinhpatha classifier stems constitute a closed class does not disqualify them from applicability of these models, since, as we observed above, the PCFP is only strictly defined for a closed class.
3.4.1 Intersecting formatives and unpredictable allomorphy
Inflectional allomorphy in Murrinhpatha classifier stem paradigms is both highly complex and typologically unusual, meaning that a detailed exposition is beyond the scope of this chapter.⁵ Murrinhpatha’s thirty-nine classifiers each appear in forty-two inflected forms (and never in non-finite form). The ‘inner stems’ upon which these forms are built are highly mutable, creating much of the complexity in the system (cf. Parker & Sims, Chapter 2, this volume). Table 3.2 illustrates some sample classifier forms,⁶ representing two classifiers (la ‘(26)’, ma ‘(34)’) that have fairly clear phonological stems, one classifier (ɾu ‘(6)’) that has a highly mutable stem, and one classifier (i ‘(1)’) that has a vowel-only stem,
⁵ Fuller description is available in Mansfield (2016, 2019), drawing on earlier partial analyses (Walsh 1976: 224; Green 2003; Forshaw 2016: 37). As shown in the examples above, there is also further inflectional morphology in the verb that is not part of the classifier stem paradigms, and can be applied equally to verbs based on any classifier stem (Nordlinger 2015, 2017). This morphology has no bearing on the issues discussed in this chapter and will therefore not feature in our remaining discussion. ⁶ The full paradigms are provided in the Appendix.
Table 3.2. Examples of inflected classifier forms

        la ‘(26)’    ma ‘(34)’    ɾu ‘(6)’    i ‘(1)’
3.      kila         ma           kuɾu        ki
3.      dilam        mam          wuɾan       dim
3.      pilla        pume         puɳi        piɾini
which only surfaces phonologically when it syllabifies with a consonantal stem alternation, and otherwise results in a phonologically empty stem.⁷ The challenge for segmentation and analysis of classifier forms lies in the fact that each form combines several independent dimensions of allomorphy. Each inflected classifier form selects a prefix consonant allomorph, an orthogonally distributed prefix vowel allomorph, and an orthogonally distributed suffix allomorph (any of which can be zero). We label this combination of orthogonal allomorphs inflection by intersecting formatives (Mansfield 2016). Intersecting formatives appear to be a recurrent feature of highly complex verbal inflection systems, such as Mazatec (Ackerman & Malouf 2013), Greek (Sims 2015: 143ff), Saami (Feist 2015: 140ff), and Seri (Baerman 2016). Intersecting inflectional formatives are given an explicit formulation in Network Morphology, where they are represented as multiple inheritance of inflection class nodes (Brown & Hippisley 2012: 71ff), and there is further discussion of the phenomenon with respect to complexity in Parker & Sims (Chapter 2, this volume), where intersectional inflection is labelled ‘paradigmatic layers’. Intersectional inflection often combines concatenative and supra-segmental morphology, and this is also the case in the Murrinhpatha verb forms. The Murrinhpatha classifier stem is built on a phonologically minimal ‘inner stem’ of the shape (C)(C)V, which alternates in three orthogonal dimensions: stem consonant mutation, vowel height, and vowel frontness. Each inflected form of a classifier stem is therefore determined by six dimensions of intersecting allomorphy: PrefC, PrefV, StemC, StemVH, StemVF, and Suffix. Table 3.3 illustrates the intersecting formative analysis of the forms shown above in Table 3.2. Formatives exhibit ‘semi-regularities’ that appear in some but not all exponents of a morphosyntactic cell, for example, PrefC k- in 3., Suffix -m in 3.. 
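To illustrate how intersecting formatives combine, the sketch below assembles surface forms from the segmentations given in Table 3.3. It is an illustrative simplification, not the analysis itself: the `Formatives` class and the `surface` function are hypothetical names, the three stem dimensions (StemC, StemVH, StemVF) are collapsed into a single already-realized inner stem string, and the only phonological rule modelled is the one stated in the notes to Table 3.3, namely that PrefV does not surface without an onset consonant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Formatives:
    pref_c: str   # prefix consonant allomorph ("" for zero)
    pref_v: str   # prefix vowel allomorph
    stem: str     # realized inner stem ("" for a phonologically empty stem)
    suffix: str   # suffix allomorph ("" for zero)


def surface(f: Formatives) -> str:
    """Concatenate formatives; PrefV is dropped when there is no onset consonant."""
    pref_v = f.pref_v if f.pref_c else ""
    return f.pref_c + pref_v + f.stem + f.suffix


# Surface form -> segmentation, following Table 3.3 (stem grades omitted).
examples = {
    "kila": Formatives("k", "i", "la", ""),
    "dilam": Formatives("d", "i", "la", "m"),
    "mam": Formatives("", "u", "ma", "m"),
    "wuɾan": Formatives("w", "u", "ɾa", "n"),
    "dim": Formatives("d", "i", "", "m"),
    "piɾini": Formatives("p", "i", "ɾi", "ni"),
}
mismatches = [form for form, f in examples.items() if surface(f) != form]
```

Keeping the prefix consonant, prefix vowel, and suffix as separate fields mirrors the point that each is selected independently; a fuller model would also need separate fields for the three stem dimensions.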
Other (semi-)regularities attach to particular classifier stems, for example PrefV i- in
⁷ The description of ‘vowel-only stems’ is somewhat different from Mansfield (2016), where they are simply labelled ‘phonologically empty stems’. The analysis there nonetheless depends on underlying ‘theme vowels’ in such stems, though this is not explicitly discussed. An alternative analysis would propose a zero theme vowel, to avoid the use of unrealized underlying vowels. We have experimented with calculation of Murrinhpatha integrative complexity using both analyses, and found that the difference is very small (< 1%). The unrealized vowel alternative produces slightly lower complexity measurements, and we therefore select this option to keep our complexity measurements conservative.
Table 3.3. Examples of classifier forms and their formative analyses

             3.              3.              3.
             . - . - [, , ] -
la ‘(26)’    kila            dilam           pilla
             k-i-la[]-∅      d-i-la[]-m      p-i-lla[:]-∅
ma ‘(34)’    ma              mam             pume
             ∅-u-ma[]-∅*     ∅-u-ma[]-m      p-u-me[:]-∅
ɾu ‘(6)’     kuɾu            wuɾan           puɳi
             k-u-ɾu[]-∅      w-u-ɾa[:]-n     p-u-ø[:]-ɳi
i ‘(1)’      ki              dim             piɾini
             k-i-∅[]-∅       d-i-∅[]-m       p-i-ɾi[:]-ni

Notes: * PrefV, like the stem vowel, does not surface unless it can syllabify with an onset consonant. Thus we can analyse a PrefV u- formative in ø-u-ma-m 3.(34)., in keeping with this classifier’s overall paradigmatic pattern, though the surface form is mam. = default; = geminate; = ɾ-alternation.
(26) and (1), but PrefV u- in (34) and (6). Importantly, these patterns are often orthogonal—for example, the PrefV selection is independent of the PrefC selection in 3.. As shown in the full paradigm examples in the Appendix, the complete morphosyntactic paradigm of a classifier stem consists of forty-two inflectional forms. Subjects are distinguished for 1/2/3 person, cross-cutting a three-way // number distinction (although / is consistently collapsed in tense, and in all tenses for some paradigms).⁸ There is also a 1+2 ‘we inclusive’ person category, which has no number distinctions. These are the core number/person categories of Murrinhpatha, but more specific subcategories can be encoded using various predictable suffixes not discussed here (Nordlinger 2015). There are four basic tense/ modality categories (henceforth ‘tenses’): non-future (), irrealis (), past (), and past irrealis (), as well as ‘subtense’ distinctions between vs presentational (), and vs future indicative (), which apply only to third-person forms. Again, these core categories can be further specified by predictable suffixes encoding tense, modality, and aspect (Nordlinger & Caudal 2012). Table 3.4 illustrates a complete paradigm of inflected forms for one of the more regular classifiers, na ‘(27)’, with both surface forms and intersecting formative analysis. Some formatives in some cells have a consistent form (i.e., no allomorphy), such as PrefC p- in 3.. More typical is a selection between a handful of formative allomorphs, for example Suffix -m, -n, -ŋam, -ŋan in , or PrefV a-, e-, i, u- for all cells. A particularly wide selection of allomorphs is PrefC p-, w-, d-, n-, j-, k-,
⁸ The category here labelled is used for both dual and paucal referents; it is labelled PAUCAL (PC) in Mansfield (2016) and DAUCAL in Blythe (2009).
Table 3.4. Inflectional exponence of na ‘(27)’
INNER STEMS: na nna : ∅ :

            NFUT (/PRSL)
SG 1        ŋinaŋam                 ŋ-i-[]-ŋam
SG 2        t̪inaŋam                 t̪-i-[]-ŋam
SG 3        ninaŋam / kinaŋam       n-i-[]-ŋam / k-i-[]-ŋam
INCL 1+2    t̪inaŋam                 t̪-i-[]-ŋam
PL/DU 1     ŋinnaŋam                ŋ-i-[]-ŋam
PL/DU 2     ninnaŋam                n-i-[]-ŋam
PL/DU 3     pinnaŋam / kinnaŋam     p-i-[]-ŋam / k-i-[]-ŋam

            IRR (/FUT)                 PST             PSTIRR
SG 1        ŋina                       ŋinaŋa          ŋinaŋi
            ŋ-i-[]-∅                   ŋ-i-[]-ŋa       ŋ-i-[]-ŋi
SG 2        t̪ina                       t̪inaŋa          t̪inaŋi
            t̪-i-[]-∅                   t̪-i-[]-ŋa       t̪-i-[]-ŋi
SG 3        nina / kina                niŋa            niŋa
            k-i-[]-∅ / p-i-[]-∅        n-i-[]-ŋa       n-i-[]-ŋi
INCL 1+2    pina                       t̪inaŋa          t̪inaŋi
            p-i-[]-∅                   t̪-i-[]-ŋa       t̪-i-[]-ŋi
PL 1        ŋinna                      ŋinnaŋa         ŋinnaŋi
            ŋ-i-[]-∅                   ŋ-i-[]-ŋa       ŋ-i-[]-ŋi
PL 2        ninna                      ninnaŋa         ninnaŋi
            n-i-[]-∅                   n-i-[]-ŋa       n-i-[]-ŋi
PL 3        kinna / pinna              pinnaŋa         pinnaŋi
            k-i-[]-∅ / p-i-[]-∅        p-i-[]-ŋa       p-i-[]-ŋi
DU 1        ŋinna                      ŋinnaŋa         ŋinnaŋi
            ŋ-i-[]-∅                   ŋ-i-[]-ŋa       ŋ-i-[]-ŋi
DU 2        ninna                      ninnaŋa         ninnaŋi
            n-i-[]-∅                   n-i-[]-ŋa       n-i-[]-ŋi
DU 3        kinna / pinna              pinnaŋa         pinnaŋi
            k-i-[]-∅ / p-i-[]-∅        p-i-[]-ŋa       p-i-[]-ŋi
ø- 3., and StemC allomorphy also has a large selection of allomorphs, once we take into account various suppletive (i.e., altogether unpatterned) consonant alternations. From the point of view of integrative complexity, that is, the predictability of an inflected form given knowledge of some other form, the formatives individually have an intermediate degree of predictability. In certain dimensions there is very high predictability: for example, if one form takes Suffix -ŋam, there is a very high likelihood (though not quite categorical) that any other form of the same verb will take Suffix -ŋam. This is illustrated in the consistent tense patterning of Suffix allomorphs in Table 3.4. Among cells that have the same tense and number categories but differ for 1/2/3 person, the only difference of exponence is usually PrefC; these triplets of cells are therefore tightly integrated in terms of implicational structure. However, when we consider the implicative relationship between cells from different tenses, we find that, say, knowing -ŋam provides little information about the Suffix allomorph for cells. Allomorph selection across tenses is strongly orthogonal. Other formatives have generally high degrees of integrative complexity, that is to say, inconsistent paradigmatic patterning. This is especially true of the stem formatives StemC, StemVH, and StemVF, and also to some extent of PrefV. The problem of predicting an unknown inflected form of a Murrinhpatha classifier stem therefore involves predicting allomorph selection for six intersecting formatives, based on knowledge of such an intersection for some other form of the classifier stem. Some formatives provide good chances of correct prediction, while others are rather less helpful. 
This situation is not as extreme as the completely random paradigmatic distribution of allomorphs in Ackerman & Malouf (2015)’s ‘unrealistic language’, though the presence of six different dimensions of allomorphy in Murrinhpatha nonetheless leads to a high degree of complexity, since the unpredictability of the allomorphs is compounded. Because Murrinhpatha classifier stems often have idiosyncratic exponents, that is, allomorphs not shared by any other classifier stem, the entropy calculations used in Ackerman & Malouf (2013) are not directly applicable. The latter’s allomorphic entropy method assumes that all possible exponents have been encountered in other lexemes, so that allomorphy prediction involves a distribution of possible outcomes. But in a system with idiosyncratic exponents, the unknown target exponent may be one that has not previously been encountered (cf. Dahl, Chapter 13, this volume). The speaker’s challenge is not one of entropy in the distribution of previous observations, but of attempting to predict an outcome that may or may not match any previous observation. Thus the mathematical analysis calculates chance of correct prediction (including zero chance for a previously unencountered paradigmatic relation), rather than degrees of entropy. Nonetheless, we can make a notional comparison of Murrinhpatha with the crosslinguistic findings on entropy in Ackerman & Malouf (2013). The latter
finds average conditional entropy between 0 and 1.1 bits, and 1 bit of entropy equates to a randomized prediction having 50% chance of matching the outcome. Mansfield (2016) calculates that the average chance of correct prediction from one Murrinhpatha classifier stem form to another is 43%, comparable to 1.22 bits of entropy.⁹ This is slightly outside the range of the Ackerman & Malouf sample, suggesting that Murrinhpatha’s closed-class classifier stems have an integrative complexity at the upper end of the scale found for open-class systems in other languages. As far as we know, the only language that has been analysed as having clearly higher integrative complexity is Seri (isolate, Mexico), which has almost 2 bits average conditional entropy (Baerman 2016).
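The conversion between chance of correct prediction and bits used in this comparison is simple to reproduce. The sketch below is illustrative only: `chance_of_correct_prediction` is a hypothetical helper showing one way to score a single source-to-target analogy (the proportion of comparable stems whose target allomorph matches the attested one, with zero for an allomorph not found among them), and it is not Mansfield's (2016) implementation. The suffix strings in the last line are toy data.

```python
import math


def chance_of_correct_prediction(comparable_targets, attested):
    """Share of comparable target allomorphs that match the attested allomorph."""
    if not comparable_targets:
        return 0.0
    return comparable_targets.count(attested) / len(comparable_targets)


def chance_to_bits(p):
    """Express a chance of correct prediction p as log2(1/p) bits."""
    return math.log2(1 / p)


bits_murrinhpatha = chance_to_bits(0.43)  # footnote 9: about 1.22 bits
bits_coin_flip = chance_to_bits(0.5)      # a 50% chance corresponds to 1 bit
bits_three_way = chance_to_bits(1 / 3)    # uniform three-way choice: about 1.58 bits

# An idiosyncratic allomorph never seen among comparable stems scores zero.
idiosyncratic = chance_of_correct_prediction(["-m", "-n", "-m"], "-ŋam")
```

Scoring by chance of correct prediction rather than by entropy over observed outcomes is what allows an unencountered allomorph to be counted as a failed prediction instead of being left out of the distribution.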
3.4.2 Variation and change
With 39 × 42 = 1,638 inflectional cells to be learnt, and implicational relations proving only moderately helpful in deducing unknown forms, it would be surprising if all Murrinhpatha speakers selected the same allomorphs all the time. The presence of allomorphic variation in Murrinhpatha classifier stem forms has previously been explored only to the extent that some paradigm cells are documented with two or more variants, for example nuɻa ~ na 3S.(7). (Street 1987: 84). The 1,638 cells of the full classifier stem paradigms have been documented based on a limited set of spontaneous speech data, with gaps filled by systematic elicitation of paradigms by multiple researchers over a number of years of descriptive work. These collective findings are collated as Blythe et al. (2007), and since then have been further revised and reanalysed in Mansfield (2019) although many questions still remain. Understanding the extent of allomorphic variation, the proportion in which variants are used, and any conditioning factors on the variation requires much more data. Investigation of such variation in Murrinhpatha is still a work in progress, but after forty years of intermittent research on this language, there are now some inflectional variables for which we have enough corpus tokens to begin proposing patterns of variation and implicit diachronic change. For this study we have identified seven inflected forms with attested variation. These are the complete set of forms that fulfil the following criteria:
(a) Variation attested in the corpora of adult speech recorded by Blythe, Mansfield, Nordlinger, Street, & Walsh;¹⁰
(b) Allomorphic variants are attested with multiple corpus tokens for each variant;
⁹ That is, log₂(1/0.43) = 1.22.
¹⁰ Much of this corpus material is stored in public archives at the Australian Institute of Aboriginal and Torres Strait Islander Studies (Walsh), the Max Planck Institute Language Archive (Blythe), and PARADISEC (Mansfield, Nordlinger).
(c) The variation is morphological, rather than purely phonological. For example, pujemam ~ pijemam 3S.(34). is purely phonological variation based on assimilation of the vowel to the following glide, and is therefore not an instance of lexically specified allomorphy.
None of the seven variables thus identified have enough corpus tokens to support a rigorous variationist analysis. Nor is there sufficient data to permit differentiation between contextual factors such as phrasal context, speech style, speaker gender, etc. Rather, in this study we focus purely on the distribution of variants among speakers born in the first half of the twentieth century (‘older speakers’) versus those born in the second half (‘younger speakers’). This method allows us to detect proportions suggestive of change in progress in inflectional variants, and thereby to search for signs of the Ackerman & Malouf (2015) simplification mechanism in effect. In fact, for all seven of the variables, there is a striking difference between variant distributions among older and younger groups, with the younger moving strongly towards the variant not attested in earlier documentation.¹¹ This is likely not an accident: the fact that these seven inflected forms were noted as variable is primarily because they stood out in Mansfield’s fieldwork as conflicting with earlier grammatical descriptions of the language. On the other hand, though speakers showed clear awareness of social indexicality in phonological and lexical variation among the generations, they were unaware of the intergenerational variations in inflectional morphology (Mansfield 2014: 469ff). It has often been observed that less frequent inflectional forms are more susceptible to analogical change in morphology, though frequent forms may also undergo such changes (e.g., Fertig 2000: 125).
Since our method for identifying changes in Murrinhpatha depends on the salience of these changes in fieldwork, these can all be said to occur in fairly frequent forms. We presume that further analogical changes occur in less frequent forms, though we have not had the opportunity to observe these, and the corpus data drawn upon for this study does not permit robust estimates of inflectional form frequency. Table 3.5 lists the seven observed variables, with variants preferred by older and younger speakers respectively according to the corpus evidence. Note that where regular triplets of 1/2/3 person inflections are all involved, these are treated as a single variable in view of their tight mutual implications. Token numbers in parentheses indicate the number of tokens found for the older:newer variants among that speaker group. For example, for 1S.(34)., older speakers were found to have five tokens of me and one token of ŋeme,
¹¹ Some of the sources for older speakers are written (e.g., Bible; Street 1987) and do not have accompanying audio sources. It is possible that these sources underreport use of innovative variants, by correcting them to what may have been seen as the ‘correct’ form. This may account for some of the strength of the swing in proportions from older to younger speaker groups.
Table 3.5. Variably inflected classifier stem forms

Classifier, inflection    Older speakers (tokens)             Younger speakers (tokens)
ma, 1.(34).               me (5:1)                            ŋeme (0:9)
ma, 2S.(34).              nam (7:1)                           t̪amam (10:13)
ma, .(34).                ŋamam, namam, pamam (17:5)          ŋujemam etc (3:12)
ɾu, .(6).                 ŋa, na, ka (3:0)                    ŋu etc (0:3)
nu, .(7).                 ŋunna, nunna, punna (10:1)          ŋunne etc (0:10)
ɾa, 3.(28).               paŋan (4:0)                         piɾim (1:5)
ɾi, 3.: (36).             pim (2:0)                           piɾim (0:5)
while younger speakers were found to have zero tokens of me and nine of ŋeme. Interestingly, one of the few forms earlier documented as being variable, nuɻa ~ na 3.. (Street 1987: 84), showed only marginal variability in the corpus data. There are dozens of attestations for na, and only one for nuɻa, suggesting that the latter variant was already on its way out when Street recorded it.
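The generational shift reported in Table 3.5 can be summarized by pooling the token counts. The following sketch simply re-tabulates the (older:newer) counts from the table; the row labels and the pooling across variables are illustrative choices rather than part of the study, and the result is not a variationist analysis, which the data are said above not to support.

```python
# (older-variant tokens, newer-variant tokens) per speaker group, from Table 3.5.
# Row labels are shorthand for the table rows, in order; they are not glosses.
counts = {
    "ma (34), row 1": {"older": (5, 1), "younger": (0, 9)},
    "ma (34), row 2": {"older": (7, 1), "younger": (10, 13)},
    "ma (34), row 3": {"older": (17, 5), "younger": (3, 12)},
    "ɾu (6)": {"older": (3, 0), "younger": (0, 3)},
    "nu (7)": {"older": (10, 1), "younger": (0, 10)},
    "ɾa (28)": {"older": (4, 0), "younger": (1, 5)},
    "ɾi (36)": {"older": (2, 0), "younger": (0, 5)},
}


def newer_share(group):
    """Pooled proportion of newer-variant tokens for one speaker group."""
    old = sum(row[group][0] for row in counts.values())
    new = sum(row[group][1] for row in counts.values())
    return new / (old + new)


older_share = newer_share("older")      # 8 of 56 tokens
younger_share = newer_share("younger")  # 57 of 71 tokens
```

Pooled in this way, newer variants account for roughly 14% of the older speakers' tokens and roughly 80% of the younger speakers' tokens, which is the contrast the table is intended to show.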
3.5 Predictability of changes observed in Murrinhpatha
In the last section we saw that Murrinhpatha classifier stems are a closed class in which the inflectional paradigms are large, and implicational relations are highly unpredictable. We also saw that allomorphy of exponence in this system is not static, but rather encompasses some variable forms, which show signs of change over the last couple of generations. Thus we are now in a position to investigate whether the changes observed in Murrinhpatha decrease or increase the complexity of the system. To test this, we ran the Ackerman & Malouf (2015) simplification method (with adaptions as described above) on the relevant classifier forms, identifying the most predicted allomorphs. We show that the observed change does not replace an incumbent allomorph with the most predictable allomorph in any of the seven inflected forms. We then go on to consider a weaker form of the Ackerman & Malouf (2015) simplification mechanism: when speakers replace an old allomorph with a new one, do they at least select one that is more predictable than the previous? We find that, on the contrary, most of the changes observed in Murrinhpatha select less predictable allomorphs, thus increasing the complexity of the system. The Ackerman & Malouf (2015) simplification mechanism was implemented for Murrinhpatha classifier inflections using intersecting formatives to draw independent analogies, since this method has been shown to provide the
Table 3.6. Allomorphs selected by Ackerman & Malouf (2015) simplification mechanism

Classifier, inflection | Older speakers | Younger speakers | Ackerman & Malouf (2015) simplification
ma, 1.(34). | me | ŋeme | me
ma, 2.(34). | nam | t̪amam | nam
ma, .(34). | ŋamam etc | ŋujemam etc | ŋumam etc
ɾu, .(6). | ŋa etc | ŋu etc | ŋuɻu etc
nu, .(7). | ŋunna etc | ŋunne etc | ŋunni etc
ɾa, 3.(28). | paŋan | piɾim | piɻam
ɾi, 3.:(36). | pim | piɾim | piɻim
greatest probability of correctly predicting allomorphy (Mansfield 2016).¹² The implementation iterates through every inflected form of every Murrinhpatha classifier stem, treating each in turn as a target form requiring analogical prediction. The predictive mechanism takes each other inflected form of the classifier stem in turn as a source form, and for each identifies comparable classifier stems, from which candidate allomorphs for the target form are deduced. The probability of each candidate allomorph is the proportion of comparable classifier stems that imply that allomorph. The probability of candidates is aggregated across all source forms, revealing the overall most probable candidate.

The most probable candidate allomorphs selected by the implementation for our variable inflected forms are illustrated in Table 3.6, along with the older and younger speakers’ attested forms (see full paradigms in Appendix). The results of the implementation do not in any instance match the innovative forms observed among younger speakers. However, in some instances the observed innovation, in comparison with the older form, does exhibit some of the formative allomorphs selected by the Ackerman & Malouf (2015) simplification. For example in 1.(6)., the older form is ŋa and the simplification form is ŋuɻu. The observed innovation ŋu does exhibit the switch to PrefV u-, but maintains the weak stem grade of the older form, rather than the StemC [] ɻu of the Ackerman & Malouf (2015) simplification.¹³ Similarly, in 3.(28). the observed innovation takes on both the PrefV i- allomorph, and the Suffix -m of the Ackerman & Malouf (2015) simplification, but does not take up the StemC [] ɻa selected by the simplification, and also diverges from the simplification by selecting StemVF [] (vowel frontness) and StemVH [] (vowel height) formatives. Finally, .(7). takes up the StemVF [] alternation selected by the simplification, but maintains the StemVH [] (vowel height) alternation of the older form, instead of selecting the StemVH [] of the simplification.

Since some of the observed innovations take up subsets of the formative intersection selected by the adapted Ackerman & Malouf (2015) simplification, which is the overall most probable exponence, we might wonder whether the observed innovations represent partial or incomplete moves towards Ackerman & Malouf (2015) simplification. Do the observed innovations have greater probability of being predicted by analogy than the older forms they appear to be replacing? To this question, the answer is again negative, as illustrated in Table 3.7.

Table 3.7. Exponence probabilities of older and newer forms

Classifier, inflection | Older form (prob.) | Newer form (prob.)
ma, 1.(34). | me (.37) | ŋeme (.00)
ma, 2.(34). | nam (.14) | t̪amam (.09)
ma, .(34). | ŋamam, namam, pamam (.14) | ŋujemam etc (.00)
ɾu, .(6). | ŋa, na, ka (.06) | ŋu etc (.06)
nu, .(7). | ŋunna, nunna, punna (.06) | ŋunne etc (.06)
ɾa, 3.(28). | paŋan (.06) | piɾim (.06)
ɾi, 3.:(36). | pim (.06) | piɾim (.12)

Table 3.7 illustrates that in six out of the seven instances, the innovative form has either lower probability of being predicted than the older form, or equal probability. Only one instance, the innovation in 3.:(36)., creates a more predictable exponent. Even though some of the innovated formatives match the Ackerman & Malouf (2015) simplification, the selection of nonsimplified formatives undermines the predictability of the entire form.

We must therefore conclude that the changes to inflectional allomorphy observed in Murrinhpatha data collected over forty years (or at least, apparent changes, suggested by different distribution of variants among older and younger speakers) increase the complexity of the system. Most of the changes replace more predictable allomorphs with less predictable ones.

¹² The implementation code is written in Python (Python Software Foundation n.d.), and takes as input the inflectional paradigm data format established for the Principal Parts Analyzer (Finkel & Stump 2013). Both code and data are available online at http://langwidj.org/Murrinhpatha-inflection.

¹³ ɾu [] → ɻu [] may not seem like an obvious case of gemination, but it follows from a ɾɾ → ɻ process observed in Murrinhpatha’s sister language Ngan’gityemerri (Reid 1990) and their shared proto-language (Green 2003). In Murrinhpatha it is observable only in the classifier stem paradigms, where it fits with a broader gemination pattern.
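The analogical prediction procedure described in this section can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' published implementation: it treats each exponent as an unanalysed string rather than as a set of intersecting formatives, it aggregates by taking the unweighted mean across source forms (the text does not specify the aggregation), and the function name `predict_cell`, the cell labels, and the four-lexeme paradigm set are all invented for the example.

```python
from collections import Counter, defaultdict


def predict_cell(paradigms, lexeme, target):
    """Return aggregate candidate probabilities for one target cell.

    paradigms: dict mapping lexeme name -> dict of cell label -> exponent.
    For every source cell of `lexeme` other than `target`, the comparable
    lexemes are those sharing the same exponent in that source cell. Each
    candidate's probability is the proportion of comparable lexemes that
    imply it; probabilities are then averaged over source cells
    (assumption: unweighted mean).
    """
    own = paradigms[lexeme]
    totals = defaultdict(float)
    n_sources = 0
    for source, exponent in own.items():
        if source == target:
            continue
        comparable = [
            cells for name, cells in paradigms.items()
            if name != lexeme and cells.get(source) == exponent and target in cells
        ]
        if not comparable:
            continue  # this source form offers no analogy
        n_sources += 1
        counts = Counter(cells[target] for cells in comparable)
        for candidate, count in counts.items():
            totals[candidate] += count / len(comparable)
    if n_sources == 0:
        return {}
    return {candidate: total / n_sources for candidate, total in totals.items()}


# Hypothetical toy data: four lexemes, three cells, abstract exponents.
toy_paradigms = {
    "A": {"c1": "a", "c2": "m", "c3": "x"},
    "B": {"c1": "a", "c2": "m", "c3": "y"},
    "C": {"c1": "a", "c2": "n", "c3": "y"},
    "D": {"c1": "u", "c2": "m", "c3": "x"},
}

distribution = predict_cell(toy_paradigms, "A", "c3")
best = max(distribution, key=distribution.get)
print(distribution, best)
```

In this toy set the attested exponent of lexeme A in cell c3 receives a lower aggregate probability than its competitor, which is the kind of mismatch between attested and most-predicted forms that Tables 3.6 and 3.7 report for the Murrinhpatha data.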
3.6 Demorphologization and deepening complexity

Observed changes in Murrinhpatha increase the unpredictability of inflectional allomorphs because of breakdown in the structure of intersecting formatives. In this section we argue that this is a form of incremental demorphologization, where allomorphic proliferation is associated with the breakdown of segmentability.
Demorphologization in this sense is a complexifying force running counter to the simplifying force of analogical levelling. Indeed, this demorphologization process appears to be the same phenomenon that has been underway for a much longer time period, leading to the unpredictability of implicational relations in the classifier paradigms. Blurring of constituent boundaries between the inner stems and affixal exponents in the classifier paradigms has produced the semiregularities of the inflectional system.

Ackerman & Malouf (2015) propose that the requirement for inflectional allomorphs to be reasonably predictable, given knowledge of other forms of the same lexeme, is a ‘strong evolutionary pressure in language’ (Ackerman & Malouf 2015: 7). They present their model for iteratively simplifying predictions as a demonstration of how predictability might be achieved, though they do not claim that this is the actual mechanism at work in the evolution of natural languages.¹⁴ The implementation of their mechanism for Murrinhpatha, compared to observed changes in the language, suggests that a simplification mechanism of this type is not in operation in the closed-class system of Murrinhpatha. But the broader point remains valid: inflectional changes do appear to reflect analogies drawn by speakers based on the paradigms of other lexemes. This view is supported by the fact that the innovated forms in Murrinhpatha copy phonological elements found in other classifier forms with which they share morphosyntactic characteristics, rather than being purely phonological changes. But rather than following a direct aggregation of probable allomorphs, there appear to be other predictive influences at work: interference in the system, which leads to an increase in integrative complexity.

Each of the innovations observed in Murrinhpatha has its own story, with potential sources of analogy detectable upon investigation of paradigmatically related forms.
We here describe two of the innovations in particular, selected because they illustrate a means by which allomorphic complexity may be perpetuated, rather than reduced.¹⁵ As with all the observed changes, these are not the forms selected by the Ackerman & Malouf (2015) simplification mechanism.
¹⁴ In fact, their main argument focuses on the greater generality of their Low Conditional Entropy Conjecture (Ackerman & Malouf 2013) as compared to the No Blur Principle (Carstairs-McCarthy 1994), which does not directly concern us here.

¹⁵ The other changes observed are potentially explicable by more subtle departures from the Ackerman & Malouf (2015) simplification mechanism, for example by weighting of comparable classifier stems according to their respective entropies of prediction, with near-categorical predictors given extra weight (2.(34).), or by allowing prediction to be based on phonological relationships, including identity, rather than inflectional exponents (.(34).) (Bonami & Beniamine 2016). .(6). and .(7). seem to involve greater independence of formatives than has been previously proposed for the system (Mansfield 2016). Satisfactory analysis of any of these instances would require a separate study.
(6) 1.(34).
    Older form                             ∅-a-me[]-∅    me
    Ackerman & Malouf (2015) simplified    ∅-a-me[]-∅    me
    Observed                               ŋ-e-me[]-∅    ŋeme
In the case of (6), there are two observed deviations from Ackerman & Malouf (2015), the first of which is the selection of PrefC ŋ- instead of ∅-. Both ŋ- and ∅- are in fact candidates implied by comparable classifier stems for various source forms, with ∅- selected because it has an aggregate 0.73 probability among all source forms, versus 0.27 for ŋ-. It is easy to imagine that this outcome might be different, as in the observed innovation ŋeme, if there were some weighting in the influence of source forms and comparable classifier stems.

However, the second deviation from Ackerman & Malouf (2015) involves the introduction of PrefV e-, and this is not even a candidate by analogy with comparable classifiers. Classifier stems that do have PrefV e- are never selected as comparable, because none of the ma ‘(34)’ source forms use this allomorph, as illustrated in Table 3.8. Rather, the competing candidates are a- ~ u-. Notice, however, that 1.(34)., like all (34). forms, has a StemVF [] alternation. It seems that rather than arising from analogical prediction of PrefV allomorphy, the form ŋeme applies vowel fronting beyond the morphological inner stem structure ma ~ me in which the pattern is more generally established. On this view, the predicted form is derived analogically from other forms, but the prediction of vowel fronting has been inherited upwards into a morphological unit larger than the inner stem. Such abrogation of the structural distinction between inner stem and prefix is perhaps not surprising, given the widespread lack of phonological transparency in Murrinhpatha classifier stems.

(7) 3.(28).
    Older form                             p-a-∅[]-ŋan           paŋan
    Ackerman & Malouf (2015) simplified    p-i-ɻa[:, :, :]-m     piɻam
    Observed                               p-i-ɾi[:, :, :]-m     piɾim
The case of (7) suggests more extensive breakdown of inner stem/affix structure in the predictive mechanism. Here the observed deviations from the Ackerman & Malouf (2015) simplification again include a consonant formative that is an analogical candidate though not the aggregate strongest candidate, StemC [] instead of StemC [], which again could be accounted for in a system that includes some weighting of candidates. The other deviation is in the vowel
Table 3.8. Classifier stem paradigm for ma ‘(34)’

ma ‘(34)’
INNER STEM: ma   me :   mi :, :   na :   ne :, :   ni :, :, :

NFUT (/PRSL)
SG 1 | ŋamam | ŋ-a-[]-m
SG 2 | nam | ∅-a-[]m
SG 3 | mam / kamam | ∅-a-[]m / k- . . .
INCL 1+2 | t̪amam | t̪-a-[]-m
PL/DU 1 | ŋamam | ŋ-a-[]-m
PL/DU 2 | namam | n- . . .
PL/DU 3 | pamam / kamam | p- . . . / k- . . .

Number, person | IRR (/FUT) | PST | PSTIRR
SG 1 | ŋama ŋ-a-[]-∅ | me ∅-u-[]-∅ | mi ∅-u-[, ]-∅
SG 2 | t̪ama t̪- . . . | ne ∅-u-[, ]-∅ | ni ∅-u-[, , ]-∅
SG 3 | kama / pama k- . . . / p . . . | me ∅-u-[]-∅ | mi ∅-u-[, ]-∅
INCL 1+2 | pama p-a-[]-∅ | t̪ume t̪-u-[]-∅ | t̪umi t̪-u-[, ]-∅
PL 1 | ŋujema ŋ-uje-[]-∅ | ŋume ŋ-u-[]-∅ | ŋumi ŋ-u-[, ]-∅
PL 2 | nujema n- . . . | nume n- . . . | numi n- . . .
PL 3 | kujema / pujema k- . . . / p- . . . | pume p- . . . | pumi p- . . .
DU 1 | ŋujema ŋ-uje-[]-∅ | ŋume ŋ-u-[]-∅ | ŋumi ŋ-u-[, ]-∅
DU 2 | nujema n- . . . | nume n- . . . | numi n- . . .
DU 3 | kujema / pujema k- . . . / p- . . . | pume p- . . . | pumi p- . . .
formatives StemVF [] and StemVH [], neither of which is predicted by formative analogies. None of the source forms use such stem vowel alternations (Table 3.9). Rather, the default inner stem vowel a is overwhelmingly predicted, rather than the observed [, ] alternation i. The most obvious explanation in this case is the existence of a 3. form piɾim in other classifiers, in particular i ‘(1)’ and i ‘(2)’. This is another case of analogical relations being drawn without respect to classifier-internal morphological structure; the comparable classifiers have the i vowel, though it is not determined by [, ]
Table 3.9. Classifier stem paradigm for ɾa ‘(28)’

ɾa ‘(28)’
INNER STEM: ɾa   a :   ∅ :

NFUT (/PRSL)
SG 1 | ŋiɾaŋan | ŋ-i-[]-ŋan
SG 2 | t̪iɾaŋan | t̪- . . .
SG 3 | diɾaŋan / kiɾaŋan | d- . . . / k- . . .
INCL 1+2 | t̪iɾaŋan | t̪-i-[]-ŋan
PL/DU 1 | ŋaŋan | ŋ-a-[]-ŋan
PL/DU 2 | naŋam | n- . . .
PL/DU 3 | paŋam / kaŋam | p- . . . / k- . . .

Number, person | IRR (/FUT) | PST | PSTIRR
SG 1 | ŋiɾa ŋ-i-[]-∅ | ŋiɾa ŋ-i-[]-∅ | ŋiɾaŋi ŋ-i-[]-ŋi
SG 2 | t̪iɾa t̪- . . . | t̪iɾa t̪- . . . | t̪iɾaŋi t̪- . . .
SG 3 | kiɾa / piɾa k- . . . / p- . . . | diɾa d- . . . | diɾaŋi d- . . .
INCL 1+2 | piɾa p-i-[]-∅ | t̪iɾa t̪-i-[]-∅ | t̪iɾaŋi t̪-i-[]-ŋi
PL 1 | ŋiɻa ŋ-i-[]-∅ | ŋiɻa ŋ-i-[]-∅ | ŋiɻaŋi ŋ-i-[]-ŋi
PL 2 | niɻa n- . . . | niɻa n- . . . | niɻaŋi n- . . .
PL 3 | kiɻa / piɻa k- . . . / p- . . . | piɻa p- . . . | piɻaŋi p- . . .
DU 1 | ŋiɻa ŋ-i-[]-∅ | ŋiɻa ŋ-i-[]-∅ | ŋiɻaŋe ŋ-i-[]-ŋe
DU 2 | niɻa n- . . . | niɻa n- . . . | niɻaŋe n- . . .
DU 3 | kiɻa / piɻa k- . . . / p- . . . | piɻa p- . . . | piɻaŋe p- . . .
alternations on an inner stem, but rather by an underlying inner stem vowel (visible not in the default stem form, but only in forms with suppletive StemC). Therefore the analogical mechanism depends on a shared morphosyntactic category 3., and on some shared formatives, but ignores the patterns of inner stem vowel defaults and alternations existent in other parts of the paradigm. Again it draws a phonological analogy that abrogates inner stem/affix structure. In historical reconstruction, ‘demorphologization’ has been used to describe phonological material that at one point constitutes a regular, predictable morpheme, and at some later point loses its connection to morphological patterns from which it derived. For example, the final rime of seldom derives from Old English dative *-um, while the m in French rompre ‘break’ derives from a nasal infix associated with present tense in Latin (Klausenburger 1976; Hopper 1990). Each of these was once an inflectional exponent, because it was part of a form:meaning pattern shared by an inflectional class of lexemes, but the dissolution of these patterns has left them absorbed into lexical stems. The recent innovations observed in Murrinhpatha 1.(34). and 3.(28). do not begin from a clear ‘morphemic’ unit in this way, as predictable form:meaning
relations in the classifier stem morphology have already long given way to lexically specific, unpredictable allomorphy. But the changes nonetheless reflect incremental steps on the path of demorphologization, undermining the morphological structure of the classifier stem. Every time a paradigmatic cell in the system shifts from a more predictable allomorph to a less predictable one, the formative structure of the system is incrementally undermined. Processes of this type are probably responsible for much of the integrative complexity in Murrinhpatha verbs—though pursuit of this hypothesis would depend on more extensive historical reconstruction than is presently available (Green 2003).
3.7 Conclusions

In this chapter, we have investigated changes in Murrinhpatha classifier stem paradigms, a closed-class system with high integrative complexity. The system of intersecting formatives underlying the exponence of person, number, and tense on Murrinhpatha verb classifier stems is unusually complex, in terms of both wealth of allomorphy and unpredictability of paradigmatic relations. We have studied changes unfolding in this system with the goal of determining whether observed changes reduce or increase the complexity of the system.

Seven likely changes in progress were identified, based on variable exponents where younger speakers showed a strong preference for an innovative variant, as opposed to the conservative variant favoured by older speakers. Calculation of the most predictable allomorphs for these exponents was performed by adapting the model of Ackerman & Malouf (2015), but none of the seven observed changes were selected as expected by this model. Nor were the changed forms more predictable than the incumbent forms they replaced; in fact, in six of the seven instances, the innovated form was no more predictable than the form it replaced, and in three of these it was less predictable (Table 3.7). Analysis of the analogical sources for two of the forms suggests that less predictable forms have been selected by speakers because of analogies that abrogate the inner stem/affix structure evident in the system. The extensive phonological mutation already undergone by the inner stem elements has no doubt led to this further obfuscation of inner stem elements, deepening the overall complexity of the system.

Incremental demorphologization produces integrative complexity, but also adds to opacity in structure. We have observed this in a closed-class system of thirty-nine members, but also argued that the problem of integrative complexity presupposes a closed class of some size. The size of the Murrinhpatha paradigms, with 1,638 forms in total, presumably allows for some degree of whole-form memorization. But evidence observed in analogical changes also shows that implicational relations are active in acquisition or processing, and not all forms are learnt and stored in isolation. We hope that further research on integrative complexity will provide more insight into how analogy and memorization interact in complex inflectional systems.
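The comparison of older and newer forms summarized here can be checked directly against the probabilities printed in Table 3.7. The short Python tally below is only an arithmetic illustration: the probability pairs are transcribed from Table 3.7 in row order, while the variable and function names are invented for the example.

```python
# (older form probability, newer form probability), transcribed from
# Table 3.7 in row order; names below are illustrative only.
table_3_7 = [
    (0.37, 0.00),  # ma, 1.(34).
    (0.14, 0.09),  # ma, 2.(34).
    (0.14, 0.00),  # ma, .(34).
    (0.06, 0.06),  # ɾu, .(6).
    (0.06, 0.06),  # nu, .(7).
    (0.06, 0.06),  # ɾa, 3.(28).
    (0.06, 0.12),  # ɾi, 3.:(36).
]


def tally(pairs):
    """Count newer forms that are less, equally, or more predictable."""
    return {
        "lower": sum(new < old for old, new in pairs),
        "equal": sum(new == old for old, new in pairs),
        "higher": sum(new > old for old, new in pairs),
    }


summary = tally(table_3_7)
print(summary)
```

On these figures, six of the seven innovations are no more predictable than the forms they replace, and only the innovation in 3.:(36). is more predictable.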
Appendix

Illustrated below are the inflectional paradigms for classifiers discussed in this chapter. The paradigms for (34) and (28) are illustrated in the body of the text.
/ø/ ‘(1)’
INNER STEM: /∅/   /ɾi/ :   /ju/ : (), :

NFUT (/PRSL)
SG 1 | ŋem | ŋ-e-[]-m
SG 2 | t̪im | t̪-i-[]-m
SG 3 | dim / kem | d-i-[]-m / k-e-[]-m
INCL 1+2 | t̪im | t̪-i.[].m
PL/DU 1 | ŋaɾim | ŋ-a-[]-m
PL/DU 2 | niɾim | n-i-[]-m
PL/DU 3 | piɾim / kaɾim | p-i-[]-m / k-a-[]-m

Number, person | IRR (/FUT) | PST | PSTIRR
SG 1 | ŋi ŋ-i-[]-∅ | ŋini ŋ-i-[]-ni | ŋini ŋ-i-[]-ni
SG 2 | t̪i t̪- . . . | t̪ini t̪- . . . | t̪ini t̪- . . .
SG 3 | ki / pi k- . . . / p- . . . | dini d- . . . | dini d- . . .
INCL 1+2 | pi p-i.[].∅ | t̪ini t̪-i-[]-ni | t̪ini t̪-i-[]-ni
PL 1 | ŋuju ŋ-u-[. ]-∅ | ŋaɾini ŋ-a-[]-ni | ŋaɾini ŋ-a-[]-ni
PL 2 | nuju n- . . . | niɾini n-i-[]-ni | niɾini n-i-[]-ni
PL 3 | kuju / puju k- . . . / p- . . . | piɾini p-i-[]-ni | piɾini p-i-[]-ni
DU 1 | ŋe ŋe.[].∅ | ŋaɾine ŋ-a-[]ne | ŋaɾine ŋ-a-[]ne
DU 2 | ne n- . . . | niɾine n-i-[]-ne | niɾine n-i-[]-ne
DU 3 | ke / pe k- . . . / p- . . . | piɾine p-i-[]-ne | piɾine p-i-[]-ne
/ɾu/ ‘go(6)’
INNER STEM: /ɾu/   /ɻu/   /∅/   /ji/ (),   /mpa/ (),   /ɾa/   /ɾi/   /je/ (), ,

NFUT (/PRSL)
SG 1 | ŋuɾan | ŋ-u-[]-n
SG 2 | t̪uɾan | t̪- . . .
SG 3 | wuɾan / kuɾan | w- . . . / k- . . .
INCL 1+2 | t̪uɾan | t̪-u-[]-n
PL/DU 1 | ŋumpan | ŋ-u-[, ]-n
PL/DU 2 | numpan | n- . . .
PL/DU 3 | pumpan / kumpan | p- . . . / k- . . .

Number, person | IRR (/FUT) | PST | PSTIRR
SG 1 | ŋuɾu ŋ-u-[]-∅ | ŋuɾini ŋ-u[]ni | ŋuɾi ŋ-u[]-∅
SG 2 | t̪uɾu t̪- . . . | t̪uɾini t̪- . . . | t̪uɾi t̪- . . .
SG 3 | kuɾu / puɾu k- . . . / p- . . . | wuɾini w- . . . | wuɾi w- . . .
INCL 1+2 | puɾu p-u-[]-∅ | t̪uɾini t̪-u[]-ni | t̪uɾi t̪u[]-∅
PL 1 | ŋuɻu ŋ-u-[]-∅ | ŋuɳi ŋ-u-[, ]-ɳi | ŋuji ŋ-u-[, ]-∅
PL 2 | nuɻu n- . . . | nuɳi n- . . . | nuji n- . . .
PL 3 | kuɻu / puɻu k- . . . / p- . . . | puɳi p- . . . | puji p- . . .
DU 1 | ŋa ŋ-a-[]-∅ | ŋuɳe ŋ-u-[, ]-ɳe | ŋuje ŋ-u-[, , ]-∅
DU 2 | na n- . . . | nuɳe n- . . . | nuje n- . . .
DU 3 | ka / pa k- . . . / p- . . . | puɳe p- . . . | puje p- . . .
/nu/ ‘(7)’
INNER STEM: /nu/   /ni/ :   /nuj/ :   /na/ :   /nnu/ :   /nni/ :, :   /nna/ :, :   /nne/ :, : , :

NFUT (/PRSL)
SG 1 | ŋunuŋam | ŋ-u-[]-ŋam
SG 2 | t̪unuŋam | t̪- . . .
SG 3 | nuŋam / kunuŋam | ∅- . . . / k- . . .
INCL 1+2 | t̪unuŋam | t̪-u-[]-ŋam
PL/DU 1 | ŋunnuŋam | ŋ-u-[]-ŋam
PL/DU 2 | nunnuŋam | n- . . .
PL/DU 3 | punnuŋam / kunnuŋam | p- . . . / k- . . .

Number, person | IRR (/FUT) | PST | PSTIRR
SG 1 | ŋunu ŋ-u-[]-∅ | ŋuna ŋ-u-[]-∅ | ŋuni ŋ-u-[]-∅
SG 2 | t̪unu t̪- . . . | t̪una t̪- . . . | t̪uni t̪- . . .
SG 3 | kunu / punu k- . . . / p- . . . | na* ∅- . . . | nuj ∅-u-[]-∅
INCL 1+2 | punu p-u-[]-∅ | t̪una t̪-u-[]-∅ | t̪uni t̪-u-[]-∅
PL 1 | ŋunnu ŋ-u-[]-∅ | ŋunni ŋ-u-[, ]-∅ | ŋunni ŋ-u-[,]-∅
PL 2 | nunnu n- . . . | nunni n- . . . | nunni n- . . .
PL 3 | kunnu / punnu k- . . . / p- . . . | punni p- . . . | punni p- . . .
DU 1 | ŋunna ŋ-u-[, ]-∅ | ŋunna ŋ-u-[, ]-∅ | ŋunne ŋ-u-[,, ]-∅
DU 2 | nunna n- . . . | nunna n- . . . | nunne n- . . .
DU 3 | kunna / punna k- . . . / p- . . . | punna p- . . . | punne p- . . .

Note: Street (1987) in addition lists a variant /nuɻa/ use.feet.3.. This variant does not appear in our corpus data.
/la/ ‘(26)’
INNER STEM: /la/   /lla/ :

NFUT (/PRSL)
SG 1 | ŋilam | ŋ-i-[]-m
SG 2 | t̪ilam | t̪- . . .
SG 3 | dilam / kilam | d- . . . / k- . . .
INCL 1+2 | t̪ilam | t̪-i-[]-m
PL/DU 1 | ŋillaŋam | ŋ-i-[]-ŋam
PL/DU 2 | nillaŋam | n- . . .
PL/DU 3 | pillaŋam / killaŋam | p- . . . / k- . . .

Number, person | IRR (/FUT) | PST | PSTIRR
SG 1 | ŋila ŋ-i-[]-∅ | ŋila ŋ-i-[]-∅ | ŋilaŋi ŋ-i-[]-ŋi
SG 2 | t̪ila t̪- . . . | t̪ila t̪- . . . | t̪ilaŋi t̪-i-[]-ŋi
SG 3 | kila / pila k- . . . / p- . . . | dila d- . . . | dilaŋi d-i-[]-ŋi
INCL 1+2 | pila p-i-[]-∅ | t̪ila t̪-i-[]-∅ | t̪ilaŋi t̪-i-[]-ŋi
PL 1 | ŋilla ŋ-i-[]-∅ | ŋilla ŋ-i-[]-∅ | ŋillaŋi ŋ-i-[]-ŋi
PL 2 | nilla n- . . . | nilla n- . . . | nillaŋi n- . . .
PL 3 | killa / pilla k- . . . / p- . . . | pilla p- . . . | pillaŋi p- . . .
DU 1 | ŋilla ŋ-i-[]-∅ | ŋilla ŋ-i-[]-∅ | ŋillaŋi ŋ-i-[]-ŋi
DU 2 | nilla n- . . . | nilla n- . . . | nillaŋi n- . . .
DU 3 | killa / pilla k- . . . / p- . . . | pilla p- . . . | pillaŋi p- . . .
/ɾa/ ‘. (36)’
INNER STEM: /ɾi/   /ɻi/ :   /∅/ :

NFUT (/PRSL)
SG 1 | ŋiɾim | ŋ-i-[]-m
SG 2 | t̪iɾim | t̪- . . .
SG 3 | diɾim / kiɾim | d- . . . / k- . . .
INCL 1+2 | t̪iɾim | t̪-i-[]-m
PL/DU 1 | ŋim | ŋ-i-[]-m
PL/DU 2 | nim | n- . . .
PL/DU 3 | pim / kim | p- . . . / k- . . .

Number, person | IRR (/FUT) | PST | PSTIRR
SG 1 | ŋiɾi ŋ-i-[]-∅ | ŋiɾi ŋ-i-[]-∅ | ŋiɾini ŋ-i-[]-ni
SG 2 | t̪iɾi t̪- . . . | t̪iɾi t̪- . . . | t̪iɾini t̪- . . .
SG 3 | kiɾi / piɾi k- . . . / p- . . . | diɾi d- . . . | diɾini d- . . .
INCL 1+2 | piɾi p-i-[]-∅ | t̪iɾi t̪-i-[]-∅ | t̪iɾini t̪-i-[]-ni
PL 1 | ŋiɻi ŋ-i-[]-∅ | ŋi ŋ-i-[]-∅ | ŋiɻi ŋ-i-[]-∅
PL 2 | niɻi n- . . . | ni n- . . . | niɻi n- . . .
PL 3 | kiɻi / piɻi k- . . . / p- . . . | pi p- . . . | piɻi p- . . .
DU 1 | ŋiɻi ŋ-i-[]-∅ | ŋi ŋ-i-[]-∅ | ŋiɻi ŋ-i-[]-∅
DU 2 | niɻi n- . . . | ni n- . . . | niɻi n- . . .
DU 3 | kiɻi / piɻi k- . . . / p- . . . | pi p- . . . | piɻi p- . . .
Acknowledgements

This research is funded by the Australian Research Council Centre of Excellence for the Dynamics of Language (Project ID: CE140100041). We are greatly indebted to the people of Wadeye, Australia, who have generously shared their knowledge of Murrinhpatha with us. We also thank Peter Arkadiev and Francesco Gardani for inviting us to present at the workshop which led to this volume, and for their comments on our original submission. Bill Forshaw, Jeff Parker, and an anonymous reviewer also provided insightful comments, as did audience members of the ‘Morphological Complexity’ workshop at Societas Linguistica Europaea (SLE), 2015. We dedicate this chapter to the late Chester Street, whose detailed documentation work revealed the extraordinary complexity of Murrinhpatha verbs.
4 Overabundance resulting from language contact: Complex cell-mates in Gurindji Kriol

Felicity Meakins and Sasha Wilmoth
4.1 Introduction

One of the oft-claimed results of language contact is the reduction of morphological complexity. For example, syncretism, allomorphic simplification, the difficulty of transferring morphemes, and increased paradigmatic regularity are all observed outcomes of contact-induced change (e.g., McWhorter 1998; Myers-Scotton 2002; Janse & Tol 2003; Gardani 2008). These processes reduce the expression of morphological features, for example case, tense/aspect/mood (TAM), gender, and number; and the complexity of relationships between cells in paradigms expressing these features. In this sense, these changes represent an absolute decrease in the number of morphosyntactic distinctions that a language makes both in terms of the internal structure of words and their arrangement into inflectional classes. This type of morphological complexity has been termed ‘complexity of exponence’ (Anderson 2015a: 20) or ‘E(numerative) complexity’ (Ackerman & Malouf 2013: 433; see also section 1.3.1 in the Introduction to this volume). Such changes can be quantified as a measure of average paradigm entropy, that is, the degree of uncertainty in predicting the content of a particular cell in a paradigm (Ackerman et al. 2009; Ackerman & Malouf 2013; Parker & Sims, Chapter 2, this volume).

One area of complexity, which Anderson (2015a: 22) notes as having received less attention in the morphological literature, is variation within the cells of a paradigm, for example ‘dived’ and ‘dove’, which are different word forms of the past tense of {DIVE} in English. Thornton (2011) calls this type of complexity ‘overabundance’. Overabundance refers to multiple forms being realized within the same cell in a paradigm, or lexemes with ‘cell-mates’, as Loporcaro quips (see Loporcaro & Paciaroni 2011: 420 and Loporcaro, Chapter 6, this volume). Thornton observes that variation between cell-mates may be subject to sociolinguistic and syntactic-semantic conditions.
Felicity Meakins and Sasha Wilmoth, Overabundance resulting from language contact: Complex cell-mates in Gurindji Kriol In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Felicity Meakins and Sasha Wilmoth. DOI: 10.1093/oso/9780198861287.003.0004
In this chapter, we demonstrate that overabundance can increase in situations of language contact and, therefore, represent an increase in E-complexity due to the proliferation of exponents, in this case, cell-mates. Perhaps more interestingly, we also suggest that overabundance represents an increase in I(ntegrative) complexity, that is, increased within-cell variation makes it harder for speakers ‘to make accurate guesses about unknown forms of words based on exposure to known forms’ (Ackerman & Malouf 2013: 436; see also section 1.3.2 in the Introduction to this volume). Usually I-complexity refers to how speakers are able to surmise a word form in one cell in a paradigm based on other forms in the same paradigm. In this chapter, we show how overabundance requires speakers to make calculated choices about forms based on features beyond the paradigm. We also show that the I-complexity of overabundance can be measured using generalized linear mixed models (GLMM) which probabilistically measure the use versus non-use of a feature (dependent variable) against semantic, grammatical, and information structure features in a clause (independent variables or predictors) and their interactions, within a cluster of idiolects (random variable) (Pinheiro & Bates 2000; Baayen 2008; Marschner 2011). The relative importance of the predictors can then be determined using dominance analysis (Azen & Traxel 2009).

We present a case study of the development of overabundance in the subject-marking system of an Australian mixed language, Gurindji Kriol, and claim that this dimension of complexity is the result of language contact. Furthermore, we assess whether this complexity has stabilized in second-generation child speakers of Gurindji Kriol. This complexification and subsequent stabilization due to contact is reflected experimentally in Berdicevskis & Semenuks (Chapter 11, this volume).
Overabundance in Gurindji Kriol manifests itself as optional case marking and involves variation within a cell, that is, the use or non-use of a case suffix where the grammatical role of the nominal is unaffected by non-use (cf. McGregor & Verstraete 2010). This pattern is shown in sequential clauses in (1) where the subject is marked in the first clause and unmarked in the second clause.¹

(1) Warlaku na bi-ngku bin jeij-im im
    dog bee- chase- 3.
    dat mukmuk-Ø bin jeij-im dat karu na
    the owl- chase- the child
    ‘The bees chased the dog and the owl chased the child.’
    (BP: 9yrs: FM13_35_3e: Frog story: 2:10min)
¹ In all examples, Gurindji elements are given in italics, Kriol in plain font and subjects are bolded.
Meakins (2009, 2015) shows that optional subject marking developed as a result of contact between Gurindji and Kriol whereby the Gurindji ergative marker was retained in the process of the formation of the mixed language, Gurindji Kriol, but became optional and was later reanalysed as nominative marking when it also came to mark intransitive subjects. In this respect, overabundance developed in the nominative cell of the case paradigm where an alternation now exists between the forms -ngku/-tu and a zero morph (or nothing, depending on one’s theoretical approach). Variation is driven by a number of semantic, syntactic, and information structure features including transitivity and word order (Meakins 2009; Meakins & O’Shannessy 2010). This optional case marking system requires speakers of Gurindji Kriol to constantly monitor the clause and its place in the discourse to make decisions about whether to overtly express subject marking or not. Thus in this chapter, we make the case that overabundance in Gurindji Kriol is an example of a contact-induced change, which involves the complexification of an inflectional paradigm rather than its simplification. In particular, we examine the further development of overabundance in subject marking using new data from Gurindji children to determine whether the complexity in the case paradigm has stabilized or whether complexification is ongoing. Changes in overabundance are quantified along two dimensions using different quantitative methods: (i) the change between generations of Gurindji speakers in the contribution of different predictors to the use of subject marking is shown through GLMM (Marschner 2011); and (ii) generational differences in the relative contribution of the different factors is demonstrated using dominance analysis (Azen & Traxel 2009).
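To make the second method concrete, the following Python sketch shows how general dominance weights can be computed once a fit statistic is available for every subset of predictors. It is a hedged illustration only: the three predictor names echo factors mentioned in the chapter, but the R² values are invented for the example and are not the values reported in this study, the function name `general_dominance` is hypothetical, and the step of fitting the subset models (GLMMs in the chapter) is assumed to have been done elsewhere.

```python
from itertools import combinations


def general_dominance(r2, predictors):
    """Average additional contribution of each predictor.

    r2: dict mapping frozenset of predictor names -> fit statistic of the
    model containing exactly those predictors (the empty set included).
    For each predictor, the gain from adding it is averaged within each
    subset size k, and those level means are then averaged across k.
    """
    weights = {}
    for x in predictors:
        others = [p for p in predictors if p != x]
        level_means = []
        for k in range(len(others) + 1):
            gains = [
                r2[frozenset(subset) | {x}] - r2[frozenset(subset)]
                for subset in combinations(others, k)
            ]
            level_means.append(sum(gains) / len(gains))
        weights[x] = sum(level_means) / len(level_means)
    return weights


# Hypothetical fit statistics for all subsets of three predictors.
predictors = ["transitive", "sv_order", "priming"]
r2 = {
    frozenset(): 0.00,
    frozenset({"transitive"}): 0.20,
    frozenset({"sv_order"}): 0.03,
    frozenset({"priming"}): 0.09,
    frozenset({"transitive", "sv_order"}): 0.22,
    frozenset({"transitive", "priming"}): 0.27,
    frozenset({"sv_order", "priming"}): 0.11,
    frozenset({"transitive", "sv_order", "priming"}): 0.29,
}

weights = general_dominance(r2, predictors)
ranking = sorted(weights, key=weights.get, reverse=True)
print(weights, ranking)
```

A useful property of this averaging scheme is that the weights sum to the fit statistic of the full model, so each weight can be read as that predictor's share of the explained variation; comparing the resulting rankings across generations is the kind of comparison the chapter describes.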
4.2 Dimensions and measures of morphological complexity in language contact

Numerous studies have shown instances of the reduction of morphological complexity, particularly in inflectional paradigms, in situations of language contact (see Miestamo et al. 2008 for a recent collection of papers). There are a number of dimensions which can be affected by simplification processes. Fundamentally, languages that have morphology are considered to be more complex than languages which do not, that is, isolating languages (Sapir 1921; Anderson 1992) (see section 1.2 in the Introduction to this volume). Extreme cases of language contact such as creolization have also been shown to have a radically reductive effect on inflectional morphology (see Miestamo et al. 2008 for a recent collection of papers, and Henri, Stump, & Tribout, Chapter 6, this volume, and McWhorter, Chapter 10, this volume, for further discussions).² Similarly, inflectional morphology is rarely borrowed or switched into the grammatical frame of another language (Myers-Scotton 2002; Aikhenvald & Dixon 2006; Matras & Sakel 2007; Gardani 2008).

Where inflectional morphology remains in situations of language contact, different dimensions of complexity are affected. In particular what Anderson (2015a: 20) terms the ‘complexity of exponence’ or Ackerman & Malouf (2013: 433) call ‘E(numerative) complexity’ often undergoes reduction. For example syncretism, allomorphic simplification, and increased paradigmatic regularity are all observed outcomes of contact-induced change and language obsolescence (Dorian 1978; Gal 1989; Janse & Tol 2003). All of these processes reduce the exponence of morphological features such as case, TAM, gender, and number, and the complexity of relationships between cells within paradigms expressing these features. At the extreme end, these features gather up their morphological skirts and step out of paradigms and into periphrastic constructions, thereby transforming from synthetic forms into analytic forms (see de Groot’s 2008 study of Hungarian in contact for a recent example).

Paradigmatic complexity can be measured as ‘entropy’ which captures the degree of predictability of forms in a paradigm (Ackerman et al. 2009; Ackerman & Malouf 2013). Entropy has been used to measure the relative complexity of different languages (see also Stump & Finkel 2016 for related work), however it can also be used to measure changes in complexity across time within the same language (see Mansfield and Nordlinger, Chapter 3, this volume, for a case study of Murrinhpatha).
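The entropy measure mentioned here can be illustrated with a small Python sketch of conditional entropy between paradigm cells, in the spirit of Ackerman & Malouf (2013). The code is a minimal illustration under assumptions: lexemes are treated as equiprobable, the four-lexeme data set and the cell labels sg and pl are invented, and the function names are hypothetical rather than taken from any published tool.

```python
from collections import Counter
from math import log2


def conditional_entropy(paradigms, source, target):
    """H(target | source) in bits, with lexemes treated as equiprobable.

    paradigms: dict mapping lexeme name -> dict of cell label -> exponent.
    """
    n = len(paradigms)
    joint = Counter((cells[source], cells[target]) for cells in paradigms.values())
    marginal = Counter(cells[source] for cells in paradigms.values())
    return -sum(
        (count / n) * log2(count / marginal[src])
        for (src, _), count in joint.items()
    )


def average_conditional_entropy(paradigms, cells):
    """Mean of H(target | source) over all ordered pairs of distinct cells."""
    pairs = [(s, t) for s in cells for t in cells if s != t]
    return sum(conditional_entropy(paradigms, s, t) for s, t in pairs) / len(pairs)


# Hypothetical toy paradigms: four lexemes, two cells.
toy = {
    "L1": {"sg": "-a", "pl": "-i"},
    "L2": {"sg": "-a", "pl": "-i"},
    "L3": {"sg": "-a", "pl": "-e"},
    "L4": {"sg": "-o", "pl": "-e"},
}

h_pl_given_sg = conditional_entropy(toy, "sg", "pl")
h_sg_given_pl = conditional_entropy(toy, "pl", "sg")
print(h_pl_given_sg, h_sg_given_pl, average_conditional_entropy(toy, ["sg", "pl"]))
```

Lower values mean that knowing one cell leaves less uncertainty about another; tracking how such values move between generations of speakers is one way to quantify whether a paradigm is becoming more or less predictable over time.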
As Anderson (2015a: 22) has noted, a dimension of complexity which has received less attention in the morphological literature is variation within the cells of a paradigm, for example the ‘dived’ and ‘dove’ examples given in section 4.1, along with many more examples of co-existing regular and irregular past tense and plural forms in English. Thornton (2011) calls the exponence of multiple forms in the same cell in a paradigm ‘overabundance’. Overabundance (which can be thought of as morphological ‘cell-mates’) is defined as ‘a cell in a paradigm . . . filled by two or more synonymous forms which realize the same set of morpho-syntactic properties’ (Thornton 2011: 2). She uses the Italian verb paradigm to demonstrate how variation between forms is motivated by different phonological and syntactic-semantic conditions. Thornton’s examples of overabundance mostly involve cases of language change and the regularization of inflectional paradigms. In this scenario, an irregular form co-exists with a newer regularized form. Processes of regularization are one source of variants. We argue that contact with another language provides another source of variants. It is common for multiple forms from different languages to co-exist with their use determined by other features in the clause. To give another example from English, possession is expressed by the s-genitive ( Priming > Co-referential pronoun.

² Although, see a number of surveys (Plag 2003a, 2003b; Roberts & Bresnan 2008) and countersurveys (DeGraff 2005; Parkvall 2008; Bakker et al. 2011; Henri & Kihm 2015) in response to this claim.
Table 4.5. Relative effect of the significant predictors according to dominance analysis

Additional contribution of fixed effects: X1 = Transitive, X2 = SV order, X3 = Coreferential, X4 = Priming, X5 = Actualized

Subset model      R²M    X1     X2     X3     X4     X5
k = 0 average     -      .199   .028   .054   .092   .000
X1                .199   -      .201   .194   .195   .208
X2                .028   .030   -      .011   .036   .028
X3                .054   .048   .037a  -      .054   .055
X4                .092   .088   .100   .092   -      .092
X5                .000   .009   .000   .001   .000   -
k = 1 average     -      .044   .085b  .075   .071   .096
X1X2              .229   -      -      .033   .093   .008
X1X3              .248   -      .014   -      .084   .011
X1X4              .287   -      .035   .045   -      .009
X1X5              .208   -      .029   .051   .088   -
X2X3              .065   .197   -      -      .098   .001
X2X4              .128   .194   -      .035   -      .000
X2X5              .028   .209   -      .038   .100   -
X3X4              .146   .186   .017   -      -      .001
X3X5              .055   .204   .011   -      .055   -
X4X5              .092   .204   .036   .055   -      -
k = 2 average     -      .199   .024   .043   .086   .005
X1X2X3            .262   -      -      -      .080   .010
X1X2X4            .322   -      -      .020   -      .010
X1X2X5            .237   -      -      .035   .095   -
X1X3X4            .332   -      .010   -      -      .010
X1X3X5            .259   -      .013   -      .083   -
X1X4X5            .296   -      .036   .046   -      -
X2X3X4            .163   .179   -      -      -      .000
X2X3X5            .066   .206   -      -      .097   -
X2X4X5            .128   .204   -      .035   -      -
X3X4X5            .147   .185   .016   -      -      -
k = 3 average     -      .194   .019   .034   .089   .008
X1X2X3X4          .342   -      -      -      -      .016
X1X2X3X5          .272   -      -      -      .086   -
X1X2X4X5          .332   -      -      .026   -      -
X1X3X4X5          .342   -      .016   -      -      -
X2X3X4X5          .163   .195   -      -      -      -
k = 4 average     -      .195   .016   .026   .086   .016
X1X2X3X4X5        .358   -      -      -      -      -
Overall average   -      .166   .034   .046   .085   .025

a (X2X3) - X3
b (X1 + X3 + X4 + X5) / 4
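The arithmetic behind a dominance table of this kind can be sketched in a few lines of Python. The additional contribution of a predictor to a subset model is the gain in R² when that predictor is added, and the level average is the mean of those gains over all subsets of a given size. The R² values below are copied from Table 4.5; the function names are hypothetical, and assigning .296 to the model X1X4X5 is an assumption about the intended row label.

```python
# R2 of adult subset models, copied from Table 4.5. Keys are predictor sets:
# 1 = transitive, 2 = SV order, 3 = co-referential, 4 = priming, 5 = actualized.
# Assumption: the value .296 belongs to the model {1, 4, 5}.
R2 = {
    frozenset({2, 3}): 0.065, frozenset({2, 4}): 0.128, frozenset({2, 5}): 0.028,
    frozenset({3, 4}): 0.146, frozenset({3, 5}): 0.055, frozenset({4, 5}): 0.092,
    frozenset({1, 2, 3}): 0.262, frozenset({1, 2, 4}): 0.322,
    frozenset({1, 2, 5}): 0.237, frozenset({1, 3, 4}): 0.332,
    frozenset({1, 3, 5}): 0.259, frozenset({1, 4, 5}): 0.296,
}

def additional_contribution(r2, subset, predictor):
    """R2 gained by adding `predictor` to the model that contains `subset`."""
    base = frozenset(subset)
    return r2[base | {predictor}] - r2[base]

def level_average(r2, predictor, k):
    """Mean additional contribution of `predictor` over the k-predictor subsets lacking it."""
    subsets = [s for s in r2 if len(s) == k and predictor not in s]
    return sum(additional_contribution(r2, s, predictor) for s in subsets) / len(subsets)

x1_to_x2x3 = additional_contribution(R2, {2, 3}, 1)
x1_k2_average = level_average(R2, 1, 2)
print(round(x1_to_x2x3, 3), round(x1_k2_average, 3))
```

Under these assumptions the sketch reproduces the transitivity (X1) entries for the two-predictor subsets in the table: the contribution of X1 to the model X2X3 and the k = 2 average.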
Table 4.6. Occurrence of subject marking in child Gurindji Kriol speakers according to predictors

Predictor     Level   NOM no   NOM yes   %
Transitive    no      1194     653       35
              yes     498      630       56
SV Order      VS      101      71        41
              SV      1591     1212      43
Animate       A       1615     1236      43
              I       77       47        38
Priming       no      1288     576       31
              yes     404      707       64
Actualized    no      1489     1120      43
              yes     203      163       45
Corefer       no      1054     658       38
              yes     638      625       49
TOTAL                 1692     1283      43
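The percentage column is simply the share of marked subjects among all subjects at each predictor level. As a check on the arithmetic, the short Python sketch below recomputes it for three of the predictors from the token counts in Table 4.6; the dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# (unmarked, marked) token counts per predictor level, copied from Table 4.6.
COUNTS = {
    ("transitive", "no"): (1194, 653),
    ("transitive", "yes"): (498, 630),
    ("priming", "no"): (1288, 576),
    ("priming", "yes"): (404, 707),
    ("co-referential", "no"): (1054, 658),
    ("co-referential", "yes"): (638, 625),
}

def marking_rate(unmarked, marked):
    """Percentage of subjects that carry nominative marking."""
    return 100 * marked / (unmarked + marked)

# Round to whole percentages, as in the table.
rates = {key: round(marking_rate(*pair)) for key, pair in COUNTS.items()}
for (predictor, level), rate in rates.items():
    print(f"{predictor:15s} {level:4s} {rate}%")
```

Each pair of levels also sums to the 2,975 tokens of the data set, which gives a quick consistency check on the counts.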
Table 4.7. Output of generalized linear mixed model analysis on 2,975 tokens

Random effects   Name          Variance   Std. Dev.
Speaker          (Intercept)   0.7514     0.8668

Analysis conducted on 2,975 grammatical subjects, fifty-three speakers

Fixed effects    Estimate   Std. Error   z value   p value
(Intercept)      1.45894    0.24122      6.048     < 0.001
Transitive       1.01318    0.08926      11.351    < 0.001
SV order         0.23995    0.19073      1.258     0.20838
Animate          0.31779    0.21602      1.471     0.14125
Co-referential   0.31724    0.09360      3.389     < 0.001
Primed           0.98983    0.09040      10.949    < 0.001
Actualized       0.18714    0.13185      1.419     0.15581
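For readers less familiar with this kind of output, the sketch below shows how two of the columns relate: the z value is the estimate divided by its standard error, and, on the assumption that the model is a logistic one with estimates on the log-odds scale, exponentiating an estimate gives an odds ratio. The figures are taken at face value from Table 4.7 (including their sign as printed), and the function names are hypothetical.

```python
import math

# (estimate, standard error) for the significant fixed effects in Table 4.7.
FIXED = {
    "Transitive": (1.01318, 0.08926),
    "Co-referential": (0.31724, 0.09360),
    "Primed": (0.98983, 0.09040),
}

def wald_z(estimate, std_error):
    """Wald z statistic: the estimate in units of its standard error."""
    return estimate / std_error

def odds_ratio(estimate):
    """Odds ratio for a log-odds estimate (assumes a logit link)."""
    return math.exp(estimate)

for name, (est, se) in FIXED.items():
    print(f"{name:15s} z = {wald_z(est, se):.3f}  odds ratio = {odds_ratio(est):.2f}")
```

Read this way, and only under the logit assumption, the transitivity estimate corresponds to odds of marking roughly 2.75 times those of the reference level.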
4.4.4 Discussion

The overall question posed by this chapter is whether a change in the complexity in the expression of subject marking has occurred across two generations of Gurindji Kriol speakers. This question is set against the backdrop of broader theoretical questions about how to measure complexity in cases of overabundance, and whether all language contact leads to simplification. The combination of these broader questions allows us to determine whether changes have taken place in subject marking in Gurindji Kriol, and why these changes might have occurred. The question of whether there has been a change in complexity of subject marking was modelled using GLMM analysis. The results show three predictors in common for adults and children. First, transitive subjects such as that in (8) are significantly more likely to be marked than intransitive subjects such as that in (9). Second, whether a nominal subject is marked primes the appearance of the nominative in the next occurrence of a nominal subject. An example is given in (10) of sequential clauses containing nominal subjects with overt nominative marking. Third, subject marking is more likely when a co-referential pronoun is present, as shown in (11) in comparison with (12), which does not have a co-referential pronoun.

(8) Warlaku-ngku bait-im marluka leg-ta
    dog- bite- old.man leg-
    ‘The dog bites the old man on the leg.’ (SS: FHM051: 1:37min)

(9) Dat warlaku bin kutij nyantu-ranyj
    the dog stand 3-
    ‘The dog stood on its own.’ (CE: FHM014: 2:24min)
Table 4.8. Relative effect of the significant predictors according to dominance analysis

Additional contribution of fixed effects: X1 = Transitive, X2 = SV order, X3 = Coreferential, X4 = Priming, X5 = Actualized

Subset model      R²M    X1     X2     X3     X4     X5
k = 0 average     -      .047   .000   .006   .048   .000
X1                .047   -      .047   .041   .056   .047
X2                .000   .047   -      .001   .001   .000
X3                .006   .041   .007   -      .008   .006
X4                .048   .056   .046   .050   -      .049
X5                .000   .047   .000   .000   .001   -
k = 1 average     -      .048   .025   .023   .017   .026
X1X2              .047   -      -      .006   .057   .000
X1X3              .047   -      .006   -      .064   .005
X1X4              .104   -      .000   .007   -      .001
X1X5              .047   -      .000   .005   .058   -
X2X3              .007   .046   -      -      .050   .051
X2X4              .049   .055   -      .008   -      .000
X2X5              .000   .047   -      .058   .049   -
X3X4              .056   .055   .002   -      -      .000
X3X5              .006   .046   .052   -      .050   -
X4X5              .049   .056   .000   .007   -      -
k = 2 average     -      .051   .010   .015   .055   .010
X1X2X3            .053   -      -      -      .059   .017
X1X2X4            .104   -      -      .008   -      .001
X1X2X5            .047   -      -      .023   .058   -
X1X3X4            .111   -      .001   -      -      .001
X1X3X5            .052   -      .018   -      .060   -
X1X4X5            .105   -      .000   .007   -      -
X2X3X4            .057   .055   -      -      -      .001
X2X3X5            .058   .012   -      -      .000   -
X2X4X5            .049   .056   -      .009   -      -
X3X4X5            .056   .056   .002   -      -      -
k = 3 average     -      .045   .005   .012   .044   .005
X1X2X3X4          .112   -      -      -      -      .001
X1X2X3X5          .070   -      -      -      .043   -
X1X2X4X5          .105   -      -      .008   -      -
X1X3X4X5          .112   -      .001   -      -      -
X2X3X4X5          .058   .055   -      -      -      -
k = 4 average     -      .055   .001   .008   .043   .001
X1X2X3X4X5        .113   -      -      -      -      -
Overall average   -      .049   .008   .013   .041   .008
(10) Najan kujarra-ngku dei bin gon jeij-im im ankaj yapakayi najan kujarra-ngku na rarraj
     another two- 3. go chase- 3. poor.thing small another two- run
     ‘Another two went chasing the poor little thing. Two more run then.’ (RR: FM009.A: 6:11min)

(11) Jintaku warlaku-ngku i bin bait-im im marluka la leg-ta
     one dog- 3. bite- 3. man leg-
     ‘One dog bit a man on the leg.’ (AC: FHM052: 1:58min)

(12) Dat warlaku bin bait-im im leg-ta dat marluka
     the dog bite- 3. leg- the man
     ‘The dog bit the man on the leg.’ (SS: FHM065: 4:53min)

Adults had two more significant variables than children which predicted subject marking. The first is word order, that is, Gurindji Kriol-speaking adults are more likely to mark subjects when they occur after the verb, as shown in (13) as opposed to (14). The second is event actualization, that is, events that were not actualized were less likely to be marked, as demonstrated in (15), which uses the potential auxiliary, and (16), which has a verb marked continuative.

(13) I=m put-im jumok tebul-ta igin dat kajirri-ngku
     3.= put- smoke table- too the woman-
     ‘The woman puts the smokes on the table.’ (LS: FHM066: 0:19min)

(14) Dat kajirri i=m put-im jumok jiya-ngka
     the woman 3.= put- smoke chair-
     ‘The woman puts the smokes on the chair.’ (CA: FHM127: 2:24min)

(15) Dat karu-ma mirlarrang-jawung i garra jarrwaj im jamut
     the child- spear- 3. spear 3. turkey
     ‘The child will shoot the turkey with a spear.’ (RR: FHM061: 3:10min)

(16) Dat warlaku i bin hard-im-bat-karra nyanuny wartan
     the dog 3. hurt--- 3. paw
     ‘The dog hurt his paw.’ (DO: FM15_55_1b: 1:42min)
In terms of E-complexity, both the adult system and the child system display overabundance, while traditional Gurindji uses subject marking obligatorily. Nonetheless, the adult Gurindji Kriol system requires attention to a greater number of variables to make decisions about the application of subject marking. Thus the subject marking system seems to have complexified, in the sense of I-complexity, at the point of contact with the genesis of the mixed language (represented in the adult speech), then simplified in the next generation. The child system seems to be a refined version of the adult system. Of the three variables in common, the relative predictive power of variables is the same: transitivity > priming > use of co-referential pronoun. For two of those predictors, priming and co-referential pronoun, subject-marking usage seems stable across the generations. For adults, 60% of primed subjects are marked compared with 28% of unprimed subjects; for children, 64% of primed subjects are marked compared with 31% of unprimed subjects. Similarly, for adults, 48% of subjects with co-referential pronouns are marked compared with 28% of subjects without co-referential pronouns; for children, 49% of subjects with co-referential pronouns are marked compared with 38% of subjects without co-referential pronouns (Table 4.6). Thus the influence of priming and the use of co-referential pronoun seem quite stable diachronically. On the other hand, transitivity, which is the strongest predictor of subject marking for both adults and children, shows larger differences across the generations. For adults, 59% of transitive subjects are marked compared with 16% of intransitive subjects; for children, 56% of transitive subjects are marked compared with 35% of intransitive subjects. We argue that differences in the importance of transitivity, coupled with the loss of SV order as a predictor of subject marking in the children’s speech, are the results of decreasing contact with Gurindji.
First, the subject marking in Gurindji Kriol finds its origins in the Gurindji ergative marker, which marked only transitive subjects. Many members of the first generation of Gurindji Kriol speakers only used subject marking for transitive subjects, although it was clearly beginning to spread to intransitive subjects. For child speakers of Gurindji Kriol, this spread is much more entrenched, suggesting that the original influence of the Gurindji ergative pattern is waning. Second, the loss of SV order as a significant variable reinforces the argument that there is decreasing contact with the Gurindji system. In general, SV order is more dominant for child speakers (only 5% of transitive clauses show VS order, compared with 12% for adult speakers), reflecting the Kriol system of argument disambiguation. For adult speakers, ergative marking is more likely in VS clauses, which reflects the continuing interplay of the Gurindji and Kriol systems of argument disambiguation. This influence has been lost in child speakers.
4.5 Concluding remarks

This study has shown that complexification occurred in the area of subject marking in Gurindji Kriol in the intense contact period which saw its genesis. Subject marking was borrowed from Gurindji where it transformed from obligatory to variable marking, leading to a situation of overabundance, that is, a proliferation of cell-mates (E-complexity). Overabundance required speakers to monitor other linguistic features in the clause and discourse more broadly (transitivity, SV order, the marking of the previous nominal subject, the presence of a co-referential pronoun, and event actualization), rather than just the phonological composition of the stem, as is the case in Gurindji (I-complexity). Another generation on, only three of these variables are now relevant: transitivity, the presence of a co-referential pronoun, and priming. We argue that changes in the relative importance of transitivity and SV order in the children’s speech, and therefore simplification in the exponence of overabundance, are the result of decreasing contact with Gurindji. This chapter demonstrates that language contact does not always lead to the simplification of morphology, and in the case of overabundance, complexity, that is, the degree of variation in the expression of a form within the cell of a paradigm, can be a result of language contact. In the situation outlined by this chapter, the intense contact between Gurindji and Kriol argument marking systems which led to the formation of Gurindji Kriol also saw the development of a system of subject marking which was derived from Gurindji but was more complex than the obligatory marking system of Gurindji. The new generation of Gurindji Kriol speakers has less access to Gurindji, that is, there are fewer speakers of Gurindji in their linguistic environment and they have had fewer years of exposure to Gurindji than the adult speakers.
The result has been a simplification of overabundance where the system is no longer an interplay between the Gurindji and Kriol systems of argument disambiguation (i.e., SV order no longer predicts subject marking), and there is an increase in the marking of intransitive subjects, which is far removed from the function of the original Gurindji ergative marker.
Acknowledgements

The data collection (see section 4.4.1) was funded by the Aboriginal Child Language (ACLA) project from 2004 to 2007, the Jaminjungan and Eastern Ngumpin DoBeS project from 2007 to 2008 (available in the DoBeS archives: http://dobes.mpi.nl/projects/jaminjung/), a Hans Rausing Endangered Languages Project from 2008 to 2010 (IPF0134; available in the ELAP archive: http://elar.soas.ac.uk/deposit/0273), an Australian Research Council APD project from 2009 to 2012 (DP0985024); and an
Australian Research Council DECRA project from 2014 to 2017 (DE140100854). As well as Cassandra Algy, a number of language consultants were instrumental in the collection of data: Samantha, Lisa, Rosie & Leanne Smiler, Cecelia Edwards, and Ronaleen & Anne-Marie Reynolds. We are also grateful for the support of Appen, in particular to Simon Hammond for technical support.
5 Derivation and the morphological complexity of three French-based creoles

Fabiola Henri, Gregory Stump, and Delphine Tribout
5.1 Introduction

The claim of creole simplicity is pervasive in linguistics. This claim harks back to the nineteenth-century view that linguistic complexity correlates with the properties of a language’s inflectional morphology and with its age (DeGraff 2001). According to this view, isolating languages are ‘primitive’ in comparison with synthetic languages, whose morphology is taken as evidence of heightened complexity. Modern creolistic literature abounds with such assumptions. Creoles are seen as newborn languages that emerge from rudimentary pidgins embodying a break in the transmission of the lexifier. As such, they constitute a kind of transition between primitive pidgin ‘protolanguages’ and mature languages (Bickerton 1981). Complementing this view of creoles as ‘young’ languages are comparisons with ‘complex’ languages that purportedly reveal creoles to be ‘the world’s simplest grammars’ on the grounds that they exhibit no, or at most, insignificant vestiges of the lexifier’s system of inflectional marking (Seuren & Wekker 1986; Bickerton 1988; McWhorter 2001; Parkvall 2008; Bakker 2014; among others). As has been argued elsewhere (DeGraff 2001; Mufwene 2008; Blasi et al. 2017), these assertions rest upon several controversial assumptions that may be questioned on empirical, theoretical, and sociohistorical grounds. In the domain of morphology, for example, the received view that creoles are maximally isolating has been decisively disconfirmed by unequivocal evidence of inflectional morphology in many creoles (Kihm 1994; DeGraff 2001; Bakker 2003; Baptista 2003a, 2003b; Roberts & Bresnan 2008; among others). It is true that a creole may exhibit less morphology than its lexifier,¹ but does this entail that it is less complex?

¹ Studies relating to the morphological complexity of creoles usually rely on comparisons with the lexifiers rather than with the contributing substrates. A combination of factors has given rise to this preference. First, the formation of a creole usually involves one contributing lexifier, but may involve several substrates whose contributions to the creole’s formation are hard to evaluate in terms of proportion. In the absence of adequate historical documentation, we cannot always attribute particular contributions to particular substrate languages. Even so, we can definitely affirm that the substrates of Caribbean creoles differ from those of Indian Ocean creoles. Moreover, creolistics has a history of Eurocentrism, which has favoured the comparison of creole grammars with the more familiar grammars of their Indo-European lexifiers.

Fabiola Henri, Gregory Stump, and Delphine Tribout, Derivation and the morphological complexity of three French-based creoles. In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Fabiola Henri, Gregory Stump, and Delphine Tribout. DOI: 10.1093/oso/9780198861287.003.0005

Morphological complexity is often equated with numerousness (of morphs, categories, processes, or paradigm cells), but this is not the only way of measuring complexity, nor is it in general the most enlightening way (Ackerman & Malouf 2013; Stump 2017). In this chapter, we draw upon an alternative conception of morphological complexity which we apply to a language’s system of derivational morphology. Drawing on a precise analysis of their deverbal derivation, we argue that three French-based creoles (Mauritian, Guadeloupean, and Haitian) display an unexpected degree of morphological complexity. We detail our conception of morphological complexity in section 5.2, and in section 5.3, we discuss the issue of creole simplicity. In section 5.4, we examine the morphology of French (the lexifier language of these creoles) and that of the creoles themselves. In section 5.5, we define our theoretical framework. Finally, section 5.6 presents our new analysis of deverbal nominalizations in Mauritian, Guadeloupean, and Haitian.

5.2 Morphological complexity

Various perspectives have informed recent discussions of the notion of linguistic complexity (Dahl 2004; Hawkins 2004; Miestamo et al. 2008; Sampson et al. 2009; Newmeyer & Preston 2014; and Baerman et al. 2015a). On the one hand, the complexity of a linguistic phenomenon may be seen in psycholinguistic terms as the extent of the difficulties that it poses for a language’s learners and users. On the other hand, complexity may be seen in more absolute terms as an independently measurable property of the language system itself, separable, in principle, from issues of acquisition, production, and processing (though no doubt correlated with them in discoverable ways). Moreover, linguistic complexity is logically of at least two types (Ackerman & Malouf 2013): a linguistic phenomenon’s enumerative complexity depends on how many categories (of whatever type) it employs; its integrative complexity, by contrast, depends on the idiosyncrasy of the interactions among those categories. A language’s morphology can exhibit complexity in a variety of ways. The most intensively studied kinds of complexity involve either the morphotactics of individual word forms (whose enumerative complexity is a function of degree of synthesis and degree of fusion; Schlegel 1808; Humboldt 1836; Sapir 1921; Greenberg 1960; Bickel & Nichols 2013) or the structure of whole inflectional paradigms (whose integrative complexity is a function of the predictability of a paradigm’s word forms; Moscoso del Prado Martín et al. 2004; Ackerman et al. 2009; Milin et al. 2009; Ackerman & Malouf
2013; Stump & Finkel 2013). But a language’s morphology may exhibit other kinds of complexity as well.² Here, we are concerned with the integrative complexity of a language’s morphology as reflected by the interaction of a lexeme’s inventory of forms with its participation in deverbal derivation. In general, a language’s derivational morphology may exhibit complexity in two different dimensions. In order to distinguish these, it is useful to distinguish not only between a derivational relation’s base lexeme and derived lexeme, but also between the relation’s base stem and derived stem, that is, the specific stems of the base and derived lexemes whose morphology participates in the formal expression of their derivational relation. Thus, the derivational relation of the base lexeme THIEF to the derived lexeme THIEVISH is formally expressed by means of the relation of the base stem thiev- to the derived stem thievish. Given these distinctions, the first dimension of a derivational relation’s complexity is that of the predictability of the base lexeme’s base stem; the second dimension is that of a base stem’s restrictedness in the morphology of the base lexeme.

Consider first the dimension of base-stem predictability. In discussing this dimension, we make the uncontroversial assumption (Aronoff 1994; Stump 2001) that a lexeme L has a stem set whose members serve in the definition of both (i) the inflected word forms constituting L’s inflectional paradigm; and (ii) the stem sets of lexemes derived from L. In general, we assume that a lexeme’s stem set may include both free and bound stems. On this assumption, the complexity of a particular derivational relation depends on which member of the base lexeme’s stem set is its base stem in that relation. In the simplest cases (those whose complexity is of degree 0), the base stem for a base lexeme L in a particular derivational relation is the only member of L’s stem set. From this endpoint of maximal simplicity, successively greater degrees of complexity can be calibrated. In cases of derivation exhibiting complexity of degree 1 or 2, the base lexeme in a particular derivational relation possesses more than one stem, only one of which serves as its base stem in that relation. In cases exhibiting complexity of degree 0 or 1, the base lexeme’s base stem is predictable; in cases exhibiting complexity of degree 2, the base lexeme’s base stem is unpredictable. Thus, instances of derivation may evince three degrees of increasing complexity, as in Figure 5.1.

² See Stump (2017) for a discussion of the wide range of possible measures of morphological complexity.

This first notion of complexity calls to mind those approaches to complexity based on information theory (Arkadiev & Gardani, Chapter 1, this volume); in such approaches, complexity arises from a lack of predictability among a system’s parts. In assessing complexity of this sort in a system of inflection classes, the parts at issue are an inflectional paradigm’s cells (cf. Parker & Sims, Chapter 2, this volume); here, by contrast, the parts at issue are those members of a base
Complexity   Degree   Cardinality of base    Is base lexeme’s base stem   Example
                      lexeme’s stem set      in R predictable?
low          0        1                      yes                          boy → boyish
↕            1        >1                     yes                          man (~ men) → mannish; goose (~ geese) → goosish
high         2        >1                     no                           self ~ selve(s) → selfish BUT thief ~ thieve(s) → thievish

Figure 5.1. Degrees of complexity in the predictability of a base lexeme’s base stem in a particular derivational relation R
lexeme’s stem inventory available for the definition of a derived lexeme’s stem inventory. By this criterion, the derivational relation between boy and boyish is least complex, since the stem on which boy-ish is based is the only available choice, the sole stem of boy; the derivational relation between man and mannish (or between goose and goosish) is more complex, since the stem on which mann-ish (or goos-ish) is based is not the only available choice, though it does conform to a general pattern favouring the use of the singular form’s stem; and the relation between thief and thievish is most complex, since the stem on which thiev-ish is based is not the only available choice and actually fails to conform to the general pattern favouring the use of the singular form’s stem.

The second dimension of a derivational relation’s integrative complexity is that of base-stem restrictedness. Where X is the particular member of a lexeme L’s stem set that serves as L’s base stem in a particular derivational relation, how restricted a role does X play in the morphology of L? In the simplest cases (e.g., that of English grass → grassy), X is L’s only stem and therefore has an unrestricted role in the morphology of L. In more complex cases (e.g., that of English leaf [~ leave(s)] → leafy), a base lexeme L’s base stem in a particular derivational relation is only used in the realization of certain cells in L’s inflectional paradigm, so that its role in L’s inflectional morphology is restricted according to the morphosyntactic property set to be realized. In the most complex cases (e.g., that of English louse /laʊs/ → lousy /laʊzi/), a base lexeme L’s base stem is ‘hidden’ to the extent that it has no role at all in the inflection of L but is reserved for defining the stems of some or all lexemes deriving from L. This second dimension of complexity is schematized in Figure 5.2, where we again distinguish three degrees of complexity.
This second notion of complexity is qualitative in the sense that it equates complexity with deviation from a canonical ideal (cf. Nichols, Chapter 7, this volume)—specifically, it equates complexity with deviation from a canonical pattern in which the stem that defines a derived lexeme’s form also defines the
Complexity   Degree   Role of X in L’s morphology                                                                         Example
low          0        Unrestricted because X is L’s sole stem                                                             grass → grassy
↕            1        In the inflection of L, X is restricted to the realization of certain morphosyntactic property sets   leaf [~ leave(s)] → leafy
high         2        X is not used in the inflection of L, but is restricted to the definition of stems of derivatives of L   louse /laʊs/ → lousy /laʊzi/

Figure 5.2. Degrees of complexity in the restrictedness of stem X in the morphology of lexeme L, where X serves as L’s base stem in a particular derivational relation
base lexeme’s inflected forms. By this criterion, the derivational relation between grass and grassy is least complex, since the stem on which grass-y is based is employed in both inflected forms of grass; the derivational relation between leaf and leafy is more complex, since the stem on which leaf-y is based is only employed in one of the inflected forms of leaf; and the relation between louse and lousy is most complex, since the stem on which lous-y is based isn’t employed in either of the inflected forms of louse.
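The two three-way scales can be restated as simple decision procedures. The Python sketch below is one possible encoding, offered only as an illustration: the function names, the representation of stems as strings, and the use of the singular form’s stem as the expected (‘default’) base stem are assumptions made for the example, the last of these following the English pattern described above.

```python
def predictability_degree(stem_set, base_stem, default_stem):
    """Degree of complexity in base-stem predictability (first dimension).

    0: the base lexeme has a single stem
    1: several stems, and the base stem is the expected (default) one
    2: several stems, and the base stem is not the expected one
    """
    if len(set(stem_set)) == 1:
        return 0
    return 1 if base_stem == default_stem else 2

def restrictedness_degree(inflectional_stems, base_stem):
    """Degree of complexity in base-stem restrictedness (second dimension).

    0: the base stem is the lexeme's sole stem
    1: the base stem realizes only some cells of the inflectional paradigm
    2: the base stem plays no role in inflection at all
    """
    stems = set(inflectional_stems)
    if base_stem not in stems:
        return 2
    return 0 if len(stems) == 1 else 1

# English examples from the text; the default stem is that of the singular form.
print(predictability_degree({"boy"}, "boy", "boy"))               # boyish
print(predictability_degree({"man", "men"}, "man", "man"))        # mannish
print(predictability_degree({"thief", "thiev"}, "thiev", "thief"))  # thievish
print(restrictedness_degree({"grass"}, "grass"))                  # grassy
print(restrictedness_degree({"leaf", "leav"}, "leaf"))            # leafy
print(restrictedness_degree({"laʊs", "laɪs"}, "laʊz"))            # lousy
```

Keeping the two functions separate reflects the point that the dimensions are independent: a derivational relation can score low on one scale and high on the other.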
5.3 Creole simplicity

According to Seuren (1998: 292–3), ‘if a language has a Creole origin it is SVO, has TMA particles, [and] has virtually no morphology’. Claims of this kind reflect an ideology about creoles that finds its origin in the eighteenth century, when creoles were described as ‘corrupt’ and ‘deficient’ compared to exemplary grammars such as that of Latin. These deficiencies were presumed to result from the inability of Africans to acquire the grammatical intricacies of European languages (Bertrand-Bocandé 1849; Baissac 1880; see also Meijer & Muysken 1977 for discussion). With the advent of generative grammar, Bickerton (1981) formulated the Language Bioprogram Hypothesis, a theory that sees the process of creolization as the complexification of a pidgin that creole children are exposed to. A pidgin, according to Bickerton, is an unstable form of communication that results from a simplification of the lexifier language by adults during the process of second-language acquisition. The contact languages emerging from this sort of process come closest to revealing Universal Grammar in its naked form, embodying ‘the world’s simplest grammars’ (McWhorter 2001).³

³ Although McWhorter’s (2001) claim is about creoles, both pidgins and creoles are generally characterized as simple languages (Romaine 1988). Bickerton’s (1988) hypothesis, however, ranks pidgins as the simpler of the two, since pidgins are not systematic. On his view, it is as an effect of UG that a pidgin is creolized. Research has cast doubt on this generalization. Rich inflection can be found in pidgins, even more so than in some creoles (Bakker 2003). If a creole develops through the nativization of a pidgin, as the Language Bioprogram Hypothesis holds, we would expect the creole to be more complex than the pidgin from which it develops.

Simplification, as evoked in creole studies, is often associated with morphology, particularly inflectional morphology: a creole is identified as a type of language that exhibits semantically regular derivational affixation but no inflectional affixation (McWhorter 1998). More generally, McWhorter (1998, 2001, 2011, among others) claims that simplification of inflectional morphology is an effect of a ‘break in transmission’. In Chapter 10, this volume, McWhorter elaborates on the hypothesis that ‘radical analyticity’ in creoles, Sinitic, Niger-Congo, and some Austronesian languages stems from the drastic elimination of inflection, in particular, contextual inflection during extensive adult acquisition. This peculiar kind of ‘unnatural’ change is in no way comparable to the processes of grammaticalization witnessed in languages like English or French. While these are more analytic than their ancestors, both of these languages retain agreement and complex expression of inherent inflection via root allomorphy. A similar claim is made by Grant (2009), who posits that simplicity is a reduction in the allomorphy found in the lexifier’s system to a sufficient extent that the emerging pidgin/creole shows no inflectional marking. However, the evidence does not support either of these conceptions of linguistic simplification. Contra McWhorter, Palenquero does show agreement in adnominal adjectives (Schwegler 2013), and even if many creoles have lost gender and number agreement, they have innovated new contextual morphology most certainly influenced by their substrate languages: all varieties of Melanesian Pidgin feature a transitivity marker which is suffixed to an English inherited lexicon (1).

(1) a. bild > bild-im haos
       build > build- house
    b. pei > pe-im skul yuniform
       buy > buy- school uniform
    c. let > let-em yu go
       let > let- you go
    (Arika 2012)

French-based creoles spoken in the Indian Ocean all exhibit contextual inflection (see section 5.6.1.1 on Mauritian). As for the question of allomorphy, the approach we adopt in the next sections is that languages do not merely eliminate allomorphy. What appears in a new system in terms of forms is heavily dictated by frequency and the identification of paradigmatic patterns that will subsequently serve to make new forms. Such a perspective doesn’t warrant the existence of a prior pidgin. As Mufwene (2008) points out, a closer examination of the facts shows that creoles do not evolve from pidgins but rather from the approximation
of a non-standard variety of the lexifier. Indeed, a recent study on the emergence of creole languages questions whether the existence of a pidgin is a necessary precursor to creolization, and suggests that contrary to common belief, emerging creoles are not typologically distinct from other languages (Blasi et al. 2017). In addition, the input for language learners in purely spoken settings differs radically from that of guided settings, since an inflectional paradigm’s perceptible distinctions are very different in speech and writing. Syncretism in a lexeme’s paradigm is much more pervasive in speech than in writing. In spoken French, only three forms are distinguished in the present indicative of first-conjugation verbs (e.g., /mɑ̃ʒ/ eat../3 ~ /mɑ̃ʒɔ̃/ eat..1 ~ /mɑ̃ʒe/ eat..2),⁴ making the form-function relationships quite opaque in purely spoken settings (cf. section 5.4.1). And while some forms, like the simple past (passé simple), are rare altogether in colloquial French, others, like the periphrastic future, are preferred over synthetic forms (Abouda & Skrovec 2015, 2017). This is also true of gender and number agreement, which is less perceptible in spoken French than in written French. The stark differences between spoken French and the French of more guided settings are clearly revealed by Cajun French, which derives from varieties of spoken French dating from the period of colonialism both in the Americas and in the Indian Ocean. Cajun French features extensive use of periphrastic expressions comparable to those observed in the creoles. Such periphrasis allows differences of tense, aspect, and mood (TAM) to be expressed without differences in synthetic morphology; the form of the main verb manger ‘to eat’ remains unchanged in periphrastic expressions such as vous-autres est après manger ‘you (PL) are eating’ and vous-autres va manger ‘you (PL) will eat’.
Thus, verb paradigms in Cajun French distinguish fewer synthetic forms than their counterparts in standard French. French-based creoles are likewise outgrowths of spoken French; as such, they have not drastically simplified the French inflectional system, but have instead developed a native verb alternation that resembles one salient in spoken forms of the lexifier (Bonami et al. 2013). This is in line with recent empiricist approaches that reject the language innateness hypothesis and favour an integrative view of second-language acquisition according to which language learning relies on multiple factors, including innate learning abilities, prior knowledge of first language, social setting, and perceptual and statistical mechanisms (see also Saffran et al. 1996 and Tomasello 2000).⁵ Finally, there is also the logical problem of language ⁴ In French, the first conjugation constitutes the largest conjugation as well as the most regular and productive. ⁵ Other research on the emergence of language also suggests that aside from the human genetic endowment for language acquisition, human beings possess a mathematical or computational component for language creation and complexification (Hauser et al. 2002; Fitch & Hauser 2004; Gervain & Mehler 2010).
simplification with regard to what has been identified as foreigner talk (Ferguson 1971). Foreigner talk refers to a simplified version of a language used by native speakers when addressing non-natives; the omission of inflections is widespread in these varieties (Hock & Joseph 1996). In any case, future creole speakers clearly have no prior knowledge of the lexifier language before acquisition, raising the question of how they could have simplified it. These observations crucially support the view that the input was already simplified.
The morphological complexity of creoles has generally been evaluated based on comparisons with their lexifier languages using traditional views of morphology. It must be said at the outset that the extent of a creole’s morphological complexity cannot simply be equated with the extent to which it mirrors complex patterns in the lexifier language; otherwise, as will be argued below, dimensions of complexity in the creole that have no counterpart in the lexifier language may simply be overlooked. This point is all the more crucial given that complexity can be measured in more than one way. Under a morpheme-based approach, a creole’s lexifier can be argued to be morphologically complex because it distinguishes a large number of inflected words, a large number of affixes, and, perhaps also, a large number of morphological processes. By these measures, the morphology of the creole under comparison appears much less complex. These measures, however, imply a particular conception of what constitutes morphology. In the generative-transformational tradition, it has been customary to see periphrasis as a syntactic construct; but periphrasis has recently been argued to function as a kind of inflectional exponence on a par with synthetic varieties of exponence (see Bonami 2015 and the references cited therein).
Under the assumption that not all morphology is synthetic morphology, creole morphology takes on a higher degree of complexity, with larger arrays of morphosyntactic properties, larger paradigms, and larger inventories of inflectional exponents (Henri 2010; Kihm 2014; Henri & Kihm 2015). Nevertheless, as we noted in section 5.2, the complexity of a system is not simply enumerative; morphological complexity does not simply reduce to the cardinality of its morphosyntactic properties, the size of its paradigms, or the variety of its inflectional resources (Bonami et al. 2015). Even if creole inflectional systems are smaller on average⁶ than those of their lexifiers, they exhibit a comparable degree of integrative complexity.
⁶ Verbs in both Mauritian and French exhibit alternating forms, but a Mauritian verb’s synthetic paradigm is limited to two cells, neither of whose forms exhibits true affixation or any coherent morphosyntactic content (Henri 2010); in French, by contrast, a verb’s synthetic paradigm exhibits fifty-one cells, combinations of up to three inflectional affixes (e.g., i-r-i-ons ‘(we) would go’) and arguably six morphosyntactic features (Bonami & Boyé 2003, 2007).
For example, Henri (2010) shows that in Mauritian, the complementary environments in which a verb’s long and short alternants appear cannot be characterized in morphological, syntactic, or information-structural terms by complementary natural classes of properties
(cf. section 5.6.1). Mismatches of this kind have been argued to be an indicator of integrative complexity (see Stump 2017: 70–1 and the references cited there). Mauritian is likewise more complex when it comes to interpredictability, that is, the difficulty of predicting one form based on knowledge of another (Henri 2010; Bonami & Henri 2010; Bonami et al. 2011). Luís (2014) also shows that Indo-Portuguese creoles exhibit different types of form-meaning mismatches in their inflectional system. Korlai, for example, presents both class-specific syncretism and paradigmatic opacity that affect morphosyntactic transparency (Bonami et al. 2013; Luís 2014). Comparable mismatches are found in other Portuguese-based creoles spoken in Africa (Kihm 2014).
5.4 Verb inflection: from French to French-based creoles
Creoles are usually claimed to retain few if any of their lexifier’s inflectional distinctions. In French-based creoles, this reduction has led to systems in which each verb has at least a short form (SF) and a long form (LF); systems of this kind are said to be characteristic of French-based creoles spoken in the Indian Ocean and, in the Americas, of Louisiana Creole and Haitian. The formal distinction between a verb’s SF and LF is claimed to be a syntactically conditioned shape alternation in Isle de France creoles (Seychellois, Rodriguais, Chagossian, and Mauritian⁷) but not in Reunionese (Corne 1982; Seuren 1990; Syea 1992). Corne (1982) argues for a typological difference between Reunionese and Isle de France creoles on the basis of their verbal systems. Isle de France creoles’ verb alternations are said to have been influenced by Bantu alternations while those of Reunionese are reconciled with the assumption that it is merely a variety of French.⁸, ⁹
⁷ These languages are said to form varieties of the same creole, namely Mauritian, for reasons linked to colonization. Indeed, the Seychelles used to be part of British Mauritius together with Rodrigues and the Chagos. Rodrigues remains a Mauritian dependency while the sovereignty of the Chagos is still under dispute.
⁸ Depending on the verb, mesolectal varieties of Reunionese exhibit up to five inflected forms, expressing distinctions of tense and aspect. For example, the verb ‘eat’ has the three inflected forms mâz, mâze, and mâzra, with the third one being restricted to negative future-tense contexts. Irregular verbs like ‘come’ exhibit five inflected forms, for example viê, vne, viê(n)ra, vni, vnir, where the future tense form viê(n)ra is again restricted to negative contexts and where there is a distinction between a past participle form vne and an infinitive vnir (Corne 1982). Corne (1982) further notes that those forms are unstable to the extent that the past tense, the past participle, and the infinitive are interchangeable. Wittmann & Fournier (1987) present a severe critique of Corne’s data and analysis, drawing attention to a range of problems. They argue that his analysis is observationally inaccurate and theoretically questionable (given, e.g., the disparate range of factors that must be assumed to condition the proposed phonological rules; see also Henri 2010); that the analysis is not obviously informed by current thought on the usual motivations for regular sound changes; that the analysis is not compatible with reasonable assumptions about the uniformity of diachronic processes effecting language change; and that his assumption that Mauritian and Reunionese have fundamentally different histories is highly questionable.
⁹ Klingler (2003) and Rottet (1992) also assume that verb alternation in Louisiana Creole is reminiscent of French, making Louisiana Creole a plausible variety of French.
Following Baker (1972), Corne argues that the LF/SF alternation affects 70% of the Mauritian verb lexicon and that SFs are derived by truncation of the LF’s final vowel under conditions that are syntactically and semantically determined. Chaudenson (2003), Veenstra & Becker (2003), Veenstra (2009), and others defend an alternative analysis according to which Mauritian inherits its long and short forms from a French verb’s infinitive and third-person singular present indicative forms (respectively) but without inheriting their corresponding functions. This development, they argue, is based on universals at play during second-language learning. Veenstra (2009: 110) further hypothesizes that the LF/SF alternation is at first phonologically conditioned but that it gradually becomes grammaticalized so that the appearance of a verb’s SF is conditioned by a following complement. As discussed in section 5.6.1.1, the distribution of the Mauritian alternation is much more complex than what Veenstra assumes (see also Henri 2010). The function of the alternation seen in Mauritian, he says, might reflect Bantu influence, since the conjoint and disjoint verb forms found in Makhuwa and other Eastern Bantu languages exhibit similar functions. While the hypothesis is plausible, it raises the question of the Bantu contribution in Haitian, which shows an alternation associated with a more or less parallel function. According to DeGraff (2001: 75), the distinction in Haitian is subject to prosodic or morphosyntactic constraints. Verb alternations are, according to DeGraff (2001), manifestations of inflectional morphology, with a verb’s SF arising from its LF by subtractive morphology in the context of a following complement. The evidence that we present below suggests that verb-stem alternations are characteristic of all French-based creoles to a greater or lesser degree.
While the form of such alternations and the functions that they serve are innovated in each individual creole, they are nevertheless relatable to the existence of comparable though distinct alternations in the verb morphology of the lexifier. We advocate a theory of creole genesis that includes unguided second-language acquisition as one of the key components of creolization. We also believe that a number of additional factors may influence the emergence of a creole; these include frequency, salience, ease of perception, transparency, invariance, and congruence (see also Corne 1982; Mufwene 2008).
5.4.1 Properties of the French verbal paradigm
As mentioned in section 5.2, the French verbal system is highly unpredictable and therefore unlikely to remain unchanged in French-based creoles (Bonami et al. 2013). Standard written French distinguishes three conjugation classes of synthetic paradigms consisting of a total of fifty-one cells expressing TAM, person, number, and gender. The first conjugation is the productive class, into which
Table 5.1. Patterns of syncretism in the French paradigm (Bonami et al. 2013)
lave ~ lav
finise ~ fini ~ finis
rɑ̃de ~ rɑ̃dʁ ~ rɑ̃dy ~ rɑ̃ ~ rɑ̃d
kɥize ~ kɥiʁ ~ kɥi ~ kɥiz
puve ~ puvwaʁ ~ py ~ pø ~ pœv ~ pɥis
dit ~ dize ~ diʁ ~ di ~ diz
loans and neologisms are integrated, as opposed to the non-productive second and irregular third conjugation. As Table 5.1 shows, French verb paradigms exhibit extensive syncretism: in the first conjugation, many of a verb’s forms have one of two shapes, distinguished only by the presence or absence of a final /e/, for example /mɑ̃ʒe/ ~ /mɑ̃ʒ/ (Chaudenson 2003; Veenstra & Becker 2003; Henri 2010). The French Xe ~ X alternation decidedly resembles the long-short alternation seen in French-based creoles, although, as we argue, the creole alternation cannot be seen as purely inherited (see section 5.4.2). In eighteenth-century French, final ‘r’ became unpronounced in second-conjugation infinitives and in third-conjugation infinitives ending in /iʁ/ (though not those ending in /iʁə/, such as écrire ‘to write’); this means that in the expression of the paradigm cells listed in the left-hand column of Table 5.1, only three forms were distinguished in the second conjugation, as Bonami et al. (2013) observe. Various factors tend to maximize the use of the syncretic forms in Table 5.1. In both spoken and written corpora, instances of the Xe ~ X pattern of /mɑ̃ʒe/ ~ /mɑ̃ʒ/ constitute more than 89% of forms (Bonami et al. 2013). In spoken French, the periphrastic future formation, involving the combination of the ancillary lexeme ‘go’ with an infinitive form (as in (2a), with syncretic /mɑ̃ʒe/), is overwhelmingly preferred to the synthetic formation in (2b). Similarly, the use of PRS.1PL forms with subject nous (nous mangeons ‘we’re eating’) tends, in colloquial French, to be supplanted by that of indefinite PRS.3SG forms with subject on (on mange ‘one is eating’, with syncretic /mɑ̃ʒ/).
(2) a. Il va manger.
       3SG go.3SG eat.INF
       ‘He will eat.’
    b. Il mangera.
       3SG eat.FUT.3SG
       ‘He will eat.’
Bonami et al. (2013) also note that French verb forms are often ambiguous with respect to their inflection-class membership. For instance, /pɛɲe/ serves as the PRS.2PL form for both the first-conjugation verb peigner ‘comb’ and the third-conjugation verb peindre ‘paint’. Thus, certain differences in form may be widely recurrent even if they don’t stem from a single inflection-class difference. If creolization is at all sensitive to factors such as frequency, saliency, and perception, we expect to find an LF/SF distinction in creole verbs as a reflection of the wide recurrence of a comparable distinction in the lexifier (Bonami et al. 2013; see also Corne 1999; DeGraff 2001).
5.4.2 French-based creoles
Verb alternations are observable across the French-based creoles, though the number of verbs exhibiting such alternations varies from one creole to another. Verbs in Guadeloupean are customarily described as being invariable. For example, Hazaël-Massieux (2002: 71) claims that Guadeloupean doesn’t show any real inflection, and that distinctions between two forms of the same lexeme, like the distinction between fè /fɛ/ and fèt /fɛt/ ‘to do’, are French borrowings and are purely exceptional. A similar type of description is provided by Ehrhart (1993: 158), who maintains that Tayo, a French-based creole spoken in New Caledonia, behaves like American creoles (with the exception of Louisiana Creole) in having only a few verbs with more than one form, such as mete /mete/ ~ met /met/ ‘to put’, balaj /balaj/ ~ balaje /balaje/ ‘to sweep’, kouver /kuvɝ/ ~ kouvri /kuvʁi/ ‘to cover’. Granting the limited nature of verb alternations in these two creoles, we nevertheless believe that even here, the role of such alternations in a creole’s grammar cannot be ignored. When forms of a verb alternate, they exhibit systematic distributional differences. Moreover, the incidence of such alternations is important as a feature shared by the French-based creoles; it constitutes a common aspect of their development from French, but also a significant dimension of innovative divergence among the creoles themselves. We claim that the verb alternations found in the French-based creoles were in all cases shaped by but not necessarily inherited from their lexifier, pace Chaudenson (2003), Veenstra & Becker (2003), and Veenstra (2009). Consider the Mauritian verb forms shown in Table 5.2. The examples suggest that the alternation stems from a single French form from which a second form is independently innovated. The source form in French is very often the infinitive but may instead be some other form.
For example, Mauritian /kone/ ‘to know’, though imported as a long form, stems not from the infinitive connaître but from the PRS.SG connai(t/s) (itself a ‘short form’ in French). For syncretic forms like dwa ‘to owe’, there are two possibilities: either they are integrated as LFs (as in the
Table 5.2. Comparison of INF and PRS.3SG forms in French with long and short forms in Mauritian
French                   Mauritian
INF         PRS.3SG      LF       SF       Gloss
ale         va           ale      al       ‘go’
vəni(ʁ)     vjɛ̃          vini     vin      ‘come’
sɔʁti(ʁ)    sɔʁ          soɚti    soɚt     ‘exit/go out’
dəvwa(ʁ)    dwa          dwa      dwa      ‘owe’
konɛtʁ      kone         kone     konn     ‘know’
aswa(ʁ)     asjɛ         asize    asiz     ‘sit’
Table 5.3. Sample comparison of long and short forms in four French-based creoles
Gloss            Reunionese          Louisiana Creole    Guadeloupean        Haitian
                 LF       SF         LF       SF         LF       SF         LF       SF
‘go’             ale      al         ale      alᵃ        ale      ay         ale      al/ay
‘come’           vɛne     vɛn        vini     vinᵇ       vini     vin        vini     vin
‘exit/go out’    soɚti    soɚrt      sɔɾti    sɔɾ        sɔti     sɔt        sɔti     sɔt
‘know’                                                   save     sav
‘know’           konɛt    kone       kɔnɛ̃     kɔnɛ̃       kɔnɛt    kɔnɛt      kɔnɛ̃     kɔn
Notes:
ᵃ Louisiana Creole has a short form /al/ alternating with a longer form /ale/ meaning ‘to haul/pull’. The suppletive French form /va/ (3SG.PRS) also appears in some French-based creoles as an irrealis marker: va in Mauritian and Louisiana Creole. In Reunionese Creole a form /sava/, possibly lexicalized from the agglutination of the demonstrative with the 3SG.PRS form of the verb ALLER, is used in a number of impersonal constructions. Armand (2014) describes it as an auxiliary.
ᵇ In addition to /vin/, both Mauritian and Louisiana Creole have the form /vjɛ̃/. But in both languages, this is a late borrowing and the two forms are used interchangeably.
case of kone) and the syncretic SFs are derived from them, or they enter the paradigm as SFs from which the corresponding LFs are derived. Notice also the case of Mauritian asiz ‘to sit’, which, having as its evident French source the feminine past participle assise, is imported as a Mauritian SF from which the corresponding LF asize is then derived. Together with Louisiana Creole, French-based creoles spoken in the Indian Ocean show a more extensive pattern of alternation than the New Caledonian creole Tayo and the creoles of the French West Indies. Table 5.3 illustrates alternations from Reunionese, another French-based creole spoken in the Indian Ocean, and from Louisiana Creole, Guadeloupean, and Haitian, all spoken in the Americas. In our view, it is likely that verb alternations in these varieties started out as a sandhi alternation that was subsequently exapted to serve one or another function in each individual creole. While we focus here only on three French-based creoles, Mauritian, Guadeloupean, and Haitian, we hypothesize that verb-form alternations in all French-based creoles are unequivocally more complex than has previously been acknowledged (see, e.g., Ehrhart 1993; Hazaël-Massieux 2002; Bernini-Montbrand et al. 2013). As we show in the following section, this complexity is revealed by the creoles’ processes of deverbal derivation. In discussing deverbal derivation in these creoles, we will draw upon the following useful distinctions:
(i) In most cases, an LF may be seen as consisting of a stem plus a particular vowel; we refer to the stem in this combination as an LF-stem.
  • In many instances, a verb’s LF-stem is simply the verb’s SF, as in the case of Haitian or Mauritian ‘come’: LF vini, LF-stem/SF vin.
  • Occasionally, a verb’s LF ends in a consonant that is absent from the verb’s SF. Here, too, the LF-stem may be equated with the SF, as in the case of Haitian ‘do/make’: LF fèt, LF-stem/SF fè.
(ii) In some cases, there is a relation of syncretism between a verb’s LF and its SF; that is, there is a single form that the grammar of the language treats as both an LF and an SF.
  • In such cases, the syncretized forms may have the vowel-final morphology of a typical LF, in which case the LF-stem is distinct from the SF. In cases of this kind, the LF-stem may have the status of a hidden stem of the sort discussed in section 5.2 above; we call this a hidden LF-stem. As we will see (section 5.6.3.2), the Haitian verb ‘chat’ has koze as both its LF and its SF, with koz as a hidden LF-stem.
  • But there are also cases in which a verb’s syncretized LF and SF have the shape of a typical SF; in such cases, one can assume that the LF, the SF, and the LF-stem are all alike, as in the case of Mauritian ‘drink’, whose LF, SF, and LF-stem are all bwar.
(iii) Finally, a verb may have a hidden stem that is distinct from its LF, its SF, and its LF-stem; we call this a special hidden stem. In Mauritian, for example, the verb ‘drink’ has bwar as its LF, SF, and LF-stem, but also has the special hidden stem biv- appearing in nominalizations such as biver ‘drinker’.
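The distinctions in (i)-(iii) can be made concrete with a small sketch. It is illustrative only: the class name CreoleVerb, its fields, and the purely orthographic test for a final vowel are assumptions introduced here, not part of the analysis above, and the LF-stem rule covers only the four verbs cited in (i)-(iii).

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: a final orthographic vowel is enough to spot the
# "stem plus vowel" shape of a typical LF in the cited examples.
FINAL_VOWELS = "aeiou"


@dataclass(frozen=True)
class CreoleVerb:
    """Hypothetical lexical entry for a French-based creole verb."""
    gloss: str
    lf: str                                    # long form
    sf: str                                    # short form
    special_hidden_stem: Optional[str] = None  # case (iii), lexically listed

    @property
    def syncretic(self) -> bool:
        # Case (ii): one form serves as both LF and SF.
        return self.lf == self.sf

    @property
    def lf_stem(self) -> str:
        # Case (i): a vowel-final LF is stem + vowel.
        if self.lf[-1] in FINAL_VOWELS:
            return self.lf[:-1]
        # Consonant-final LF: the LF-stem is equated with the SF
        # (fèt/fè), which also covers SF-shaped syncretism (bwar).
        return self.sf

    @property
    def has_hidden_lf_stem(self) -> bool:
        # Case (ii), first bullet: syncretic, yet the LF-stem differs from the SF.
        return self.syncretic and self.lf_stem != self.sf


come = CreoleVerb("come", lf="vini", sf="vin")
do = CreoleVerb("do/make", lf="fèt", sf="fè")
chat = CreoleVerb("chat", lf="koze", sf="koze")
drink = CreoleVerb("drink", lf="bwar", sf="bwar", special_hidden_stem="biv")

for verb in (come, do, chat, drink):
    print(verb.gloss, verb.lf_stem, verb.has_hidden_lf_stem, verb.special_hidden_stem)
```

Special hidden stems are stored rather than computed, since nothing in (iii) suggests that a form like biv- is derivable from bwar.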
5.5 Approaches to derivation
Our analysis is based on the theoretical framework of lexeme-based morphology (Matthews 1972; Aronoff 1994) where the lexeme is defined as a lexical entity abstracted away from the syntactic contexts in which it may appear; a lexeme belongs to a lexical category, has semantic content, and is realized by one or more word forms through which it participates in syntax. In inflectional languages, a lexeme is usually associated with a collection of stems used to form the inflected forms that can be inserted into sentences. For instance, the French verbal lexeme BOIRE ‘to drink’ has a stem /byv/ upon which are built the inflected forms /byvɔ̃/ (buvons ‘we drink’), /byve/ (buvez ‘you (PL) drink’), /byvɛ/ (buvais ‘I was drinking’), etc., and a stem /bwa/ from which are formed the homophonous word forms /bwa/ (bois ‘you (SG) drink’, boit ‘s/he drinks’). Stems such as /byv/ and /bwa/ are morphomic in the sense of Aronoff (1994): they participate in formal alternations whose conditioning cannot be coherently characterized in semantic, morphosyntactic, or phonological terms but must be seen as purely morphological in its motivation. For French verbs, Bonami & Boyé (2002, 2003) propose a stem space with twelve slots; this is a kind of matrix within which each verb’s full inventory of stems is uniformly specifiable. The stem slots are linked to one another by default implicative rules, so that for a regular verb, there is a slot whose stem suffices to determine the stems in all of the other slots in that verb’s stem space. An irregular verb is a lexeme whose stem space includes at least one stem that overrides a default implicative rule. Extending this idea, Bonami et al. (2009) show that a thirteenth stem is needed to account for deverbal lexemes suffixed with the action nominalizer -ion, the adjectivalizer -if, or the agent nominalizers -eur/-rice. Thus, both rules of inflection and rules of derivation draw upon a lexeme’s stem space; an individual stem may, however, be accessible to rules of only one type; for instance, the thirteenth stem proposed by Bonami et al. (2009) is hidden to inflection, being accessible only to rules of derivation, as in Table 5.4.
Table 5.4. Stem space of FORMER ‘to form’, FINIR ‘to finish’, and DÉFENDRE ‘to defend’
#    Stem use                      FORMER    FINIR    DÉFENDRE
1    imperfect, pres. 1/2PL        fɔʁm      finis    defɑ̃d
2    present 3PL                   fɔʁm      finis    defɑ̃d
3    present SG                    fɔʁm      fini     defɑ̃
4    present participle            fɔʁm      finis    defɑ̃d
5    imperative 2SG                fɔʁm      fini     defɑ̃
6    imperative 1/2PL              fɔʁm      finis    defɑ̃d
7    pres. subjv. SG & 3PL         fɔʁm      finis    defɑ̃d
8    pres. subjv. 1/2PL            fɔʁm      finis    defɑ̃d
9    infinitive                    fɔʁme     fini     defɑ̃d
10   future, conditional           fɔʁm      fini     defɑ̃d
11   simple past, past subjv.      fɔʁma     fini     defɑ̃di
12   past participle               fɔʁme     fini     defɑ̃dy
13   hidden stem                   fɔʁmat    finit    defɑ̃s
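The idea of a stem space whose slots are filled by default and overridden lexically can be sketched as follows. This is a deliberately simplified illustration, not the implicative rule system of Bonami & Boyé: the function names are hypothetical, every slot is derived directly from stem 1, and the default suffixes are simply read off the FORMER column of Table 5.4.

```python
# Assumption: for a regular first-conjugation verb, each slot equals
# stem 1 plus a fixed suffix (empty for most slots), as in the FORMER column.
DEFAULT_SUFFIXES = {9: "e", 11: "a", 12: "e", 13: "at"}


def fill_stem_space(stem1, overrides=None):
    """Return a dict mapping slots 1-13 to stems.

    Slots listed in `overrides` are lexically specified; all other slots
    are filled by the default derived from stem 1.
    """
    overrides = overrides or {}
    space = {}
    for slot in range(1, 14):
        default = stem1 + DEFAULT_SUFFIXES.get(slot, "")
        space[slot] = overrides.get(slot, default)
    return space


def is_irregular(stem1, overrides=None):
    """A verb counts as irregular if any stored stem overrides a default."""
    return fill_stem_space(stem1, overrides) != fill_stem_space(stem1)


former = fill_stem_space("fɔʁm")
defendre = fill_stem_space(
    "defɑ̃d",
    overrides={3: "defɑ̃", 5: "defɑ̃", 9: "defɑ̃d",
               11: "defɑ̃di", 12: "defɑ̃dy", 13: "defɑ̃s"},
)

print(former[9], former[13])      # infinitive stem and hidden stem of FORMER
print(defendre[3], defendre[13])  # overridden stems of DÉFENDRE
print(is_irregular("fɔʁm"))
```

On this sketch FORMER needs only one stored stem, whereas DÉFENDRE must list six, which is one way of picturing the claim that an irregular verb is one whose stem space overrides a default.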
At least five members of a French verb’s stem set are available as base stems in instances of deverbal nominalization (Bonami et al. 2009; Tribout 2012). Deverbal nouns in -age have stem 1 as their base stem (e.g., /netwaj/: NETTOYAGE ‘cleaning’); deverbal nouns in -ment generally have stem 2 as their base stem (/netwɑ/: NETTOIEMENT ‘cleaning’, /ʒonis/: JAUNISSEMENT ‘yellowing’); and the base stem of a deverbal noun arising by conversion may be stem 3 (/dɑ̃s/: DANSER ‘to dance’ → DANSE ‘dance’), stem 12 (/ɑʁive/: ARRIVER ‘to arrive’ → ARRIVÉE ‘arrival’), or the hidden stem 13 (/defɑ̃s/: DÉFENDRE ‘to defend’ → DÉFENSE ‘a defense’). The selection of a deverbal derivative’s base stem is not uniquely determined by phonological or grammatical criteria. For example, there are instances in which more than one of a verb’s stems serves as a base for conversion, as in the case of PLONGER ‘to dive’, whose derivatives include PLONGE ‘dishwashing’ (whose stem /plɔ̃ʒ/ is stem 3 of PLONGER) and PLONGÉE ‘diving’ (whose stem /plɔ̃ʒe/ is stem 12 of PLONGER). More importantly, base-stem selection has no correlation with the semantics of the derived nominal: nominalizations expressing action, result, agent, instrument, or location vary unpredictably with respect to which of the base lexeme’s five possible stems serves as their base stem. Given the dimensions of complexity discussed in section 5.2, we claim that French derivational relations contribute substantially to the morphological complexity of French. In particular: (i) base-stem predictability in the definition of deverbal nominalizations in French exhibits the highest degree of complexity (degree 2 in Figure 5.1); and (ii) where X is a verbal lexeme L’s base stem in a particular derivational relation, the restrictedness of X in L’s morphology may evince the highest degree of complexity (degree 2 in Figure 5.2).
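The one-to-many relation between nominalization processes and base stems can be pictured with a small lookup. This is a rough sketch under stated assumptions: the names BASE_SLOTS and candidate_bases are hypothetical, the slot numbers follow the paragraph above, and the entry for -ment ignores the hedge that stem 2 is only the general case.

```python
# Slots available to each process, per the paragraph above.
# Assumption: "-ment" is treated as always selecting stem 2.
BASE_SLOTS = {
    "-age": (1,),
    "-ment": (2,),
    "conversion": (3, 12, 13),
}


def candidate_bases(stem_space, process):
    """List the stems a process could select, given a (possibly partial) stem space."""
    return [stem_space[slot] for slot in BASE_SLOTS[process] if slot in stem_space]


# Only the two stems of PLONGER cited in the text are recorded here.
plonger = {3: "plɔ̃ʒ", 12: "plɔ̃ʒe"}

print(candidate_bases(plonger, "conversion"))
```

Because conversion returns more than one candidate, the choice between them has to be recorded for each derived noun; the lookup alone cannot predict it, which is the sense in which base-stem selection is unpredictable.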
5.6 Derivational relations in French-based creoles
We now turn to the description and analysis of derivation in Mauritian, Guadeloupean, and Haitian; in each case, we preface this discussion with a brief overview of the function of long and short verb forms in the creole under scrutiny.
5.6.1 Mauritian
5.6.1.1 Function of verb forms in Mauritian
In Mauritian, verbs alternate between a short and a long form. Most verbs (70%) have morphologically distinct forms but some (30%) have syncretic long and short forms (Henri 2010); the verbs in Table 5.5 are representative of the different observed cases. Contrary to previous assumptions (e.g., those of Corne 1982),
Table 5.5. Verb alternations in Mauritian
Verb           SF        LF
‘to think’     pans      panse
‘to stay’      res       reste
‘to buy’       aste      aste
‘to ask’       demann    demande
‘to amend’     amand     amande
‘to snore’     ronf      ronfle
‘to drink’     bwar      bwar
the alternation is not phonologically predictable and shows an intricate distribution that encodes morphological, syntactic, and information-structure oppositions (Henri 2010). In syntax, a verb’s SF is used in the presence of a non-clausal complement (3) as opposed to the LF, which appears in the absence of any complement (4a). LFs also appear with verbs that select clausal complements (4b), have an extracted complement (4c), or are followed by an adjunct (4d).
(3) Toulezour, mo pans mo fami.
    everyday 1SG think.SF 1SG family
    ‘Everyday, I think about my family.’
(4) a. Zan ronfle.
       John snore.LF
       ‘John snores.’
    b. Mo panse ki tou dimoun intelizan.
       1SG think.LF that every person intelligent
       ‘I think that everybody is intelligent.’
    c. Se mo fami ki mo panse.
       It 1SG family that 1SG think.LF
       ‘It’s my family that I think about.’
    d. Zan ronfle gramatin.
       John snore.LF morning
       ‘John snores in the morning.’
However, a verb’s LF may appear where its SF would otherwise be expected under certain discourse conditions. In counter-assertions, the LF is interpreted as an exponent of Verum Focus: using the LF evokes and denies the converse of the proposition making up the content of the clause (Henri et al. 2008; Henri 2010).
(5) a. A: To pa pans to fami zame!
          2SG NEG think.SF 2SG family never
          ‘You never think about your family!’
       B: Mo panse mo fami!
          1SG think.LF 1SG family
          ‘I do think about my family.’
    b. A: To pa fer seki to anvi isi!
          2SG NEG do what 2SG want here
          ‘You don’t do what you want here!’
       B: Mo panse kouma mo le kan mem.
          1SG think.LF how 1SG want still
          ‘I still think like I want to.’
Similarly, post-verbal constituents that are usually construed as adjuncts, while ordinarily inducing the use of the LF, can appear with SFs if and only if those postverbal constituents are focused; this is true of locatives, instrumentals, temporal adjuncts, and adjuncts of degree, frequency, and manner.
(6) a. A: Kot to manze dan zedi?
          where 2SG eat.LF on Thursday
          ‘Where do you eat on Thursdays?’
       B: Mo manz rozil dan zedi!
          1SG eat.SF Rose-Hill on Thursday
          ‘I eat in Rose-Hill on Thursdays.’
    b. A: Ar ki to manze?
          with what 2SG eat.LF
          ‘What do you eat with?’
       B: Mo manz ar lame.
          1SG eat.SF with hand
          ‘I eat with my hands.’
Finally, both the short and the long form are used in lexeme-formation processes such as reduplication (Henri 2010, 2012). A derived verb formed by reduplication itself has both an SF and an LF; as the examples in Table 5.6 show, the derived verb’s SF is a doubling of the base verb’s SF while its LF is the base verb’s SF combined with its LF. Heterogeneous distributional patterns such as those of a Mauritian verb’s short and long forms can be characterized as morphomic (Henri forthcoming), a property that has been argued to contribute to a system’s integrative complexity (Aronoff 1994). As we now show, Mauritian derivations are as integratively complex as those of French.
Table 5.6. Reduplication in Mauritian
Base lexeme                       Reduplicated derivative lexeme
SF        LF         Gloss        SF               LF                Gloss
pans      panse      ‘think’      pans-pans        pans-panse        ‘think episodically’
manz      manze      ‘eat’        manz-manz        manz-manze        ‘nibble’
res       reste      ‘stay’       res-res          res-reste         ‘stay occasionally’
demann    demande    ‘ask’        demann-demann    demann-demande    ‘ask occasionally’
bwar      bwar       ‘drink’      bwar-bwar        bwar-bwar         ‘sip’
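The distributional conditions described in section 5.6.1.1 and the reduplication pattern of Table 5.6 can be summarized in a short sketch. It is a simplification for illustration: the Context fields and the function names are hypothetical, the conditions are reduced to Boolean flags, and the order in which they are checked is an assumption rather than a claim about Mauritian grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical summary of a verb's syntactic and discourse environment."""
    nonclausal_complement: bool = False  # e.g. 'mo fami' in (3)
    complement_extracted: bool = False   # e.g. the cleft in (4c)
    adjunct_follows: bool = False        # e.g. 'gramatin' in (4d)
    adjunct_focused: bool = False        # e.g. 'rozil' in (6a)
    verum_focus: bool = False            # counter-assertions as in (5)


def select_form(sf: str, lf: str, ctx: Context) -> str:
    """Pick the SF or LF of a verb in a given context."""
    if ctx.verum_focus:
        return lf  # the LF appears even where the SF would be expected
    if ctx.nonclausal_complement and not ctx.complement_extracted:
        return sf  # an in-situ non-clausal complement follows
    if ctx.adjunct_follows and ctx.adjunct_focused:
        return sf  # a focused post-verbal adjunct licenses the SF
    return lf      # no complement, clausal or extracted complement, plain adjunct


def reduplicate(sf: str, lf: str) -> tuple:
    """Derived SF doubles the base SF; derived LF is base SF plus base LF."""
    return f"{sf}-{sf}", f"{sf}-{lf}"


print(select_form("pans", "panse", Context(nonclausal_complement=True)))
print(select_form("pans", "panse", Context(nonclausal_complement=True, verum_focus=True)))
print(select_form("manz", "manze", Context(adjunct_follows=True, adjunct_focused=True)))
print(reduplicate("res", "reste"))
```

That a single function has to consult complementation, extraction, adjunct focus, and verum focus at once gives a rough picture of why the alternation resists description in terms of one natural class of environments.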
5.6.1.2 Derivational relations in Mauritian
As we have seen, verbs in Mauritian have two basic forms: an SF and an LF. In instances of deverbal nominalization, verbs vary according to whether their base stem is their SF or their LF, as the examples in Table 5.7 show. A deverbal nominalization’s base stem may also be a special hidden stem, as in the case of biv in Table 5.7. In some instances, it is not immediately clear whether a deverbal nominalization’s base stem is an LF or an SF: in cases in which a nominalizing suffix begins with a vowel, the base stem lacks a final vowel, either because it is an SF (or possibly even a hidden LF-stem) or because it is an LF that has undergone a (morpho)phonological process of elision serving to avoid vowel hiatus. Other cases, however, are not ambiguous in this way. In the morphology of the verb ‘to remain’, for example, the LF reste has a t but the SF res does not; in view of this fact, the nominalization restan ‘leftovers’ likely involves elision of the LF reste. Conversions in general are unambiguous with respect to their choice of base stem. Moreover, they show that derived nominal lexemes have the same kinds of meanings (action, result, location) whether their stem arises from a verb’s LF or its SF; thus, a base lexeme’s base stem is not, in itself, predictable in Mauritian. A verb’s derived nominal stem is not always inherited from the lexifier language. Derived nominals like danse (stem /dɑ̃se/) ‘dancing’ or louke (stem /luke/) ‘peep’ do not exist in French and thus cannot be inherited. As Mauritian innovations, these nouns demonstrate that derivation is a productive process from a qualitative perspective (i.e., the process is still available to form new nouns).
Deverbal nominalizations in Mauritian involve base stems that are both variable and unpredictable (Table 5.7): base stems may be LFs, SFs, special hidden stems, and perhaps also hidden LF-stems; in some instances they are comparable in complexity to deverbal nominalizations in French. In particular, base-stem predictability in the definition of deverbal nominalizations in Mauritian exhibits complexity of degree 2 (see again Figure 5.1). Because the grammar of Mauritian defines complementary syntactic distributions for a verbal lexeme’s LF and SF, both of these function as inflected forms and neither, therefore, is hidden. But we also identified instances where a special hidden stem is used in the formation of derived nouns. As a consequence, Mauritian derivations exhibit a degree of base-stem restrictedness similar to that of French (see again Figure 5.2).

Table 5.7. Deverbal nominalizations in Mauritian

| Verb | Verb form | Noun by conversion | Noun by suffixation |
|---|---|---|---|
| ‘to dance’ | LF danse | danse ‘dancing; ball’ | |
| | SF dans | (la)dans ‘dance’ | dans-er/ez* ‘dancer’ |
| ‘to peep’ | LF louke | louke ‘peep’ | |
| | SF louk | | louk-er* ‘peeping Tom’ |
| ‘to stroll’ | LF chake | | |
| | SF chak | chak ‘stroll’ | chak-er* ‘stroller’ |
| ‘to remain’ | LF reste | | rest-an ‘leftovers’ |
| | SF res | (le)res ‘rest’ | |
| ‘to drink’ | bwar | bwar ‘drink’ | |
| | special hidden stem biv | | biv-er ‘drinker’, labiv-et ‘bar’ |
| ‘to insult’ [‘to cover with insults’] | kamoufle | kamoufle ‘insults’ | |
| | hidden LF-stem kamoufl | | kamouflaz* ‘episode of insulting’ |

In a given row, each nominalization has that row’s verb form as its base stem. *An asterisk marks a derived stem that is morphologically ambiguous, involving either (a) a base stem that is an SF or hidden LF-stem or (b) a base stem that is an LF whose final vowel undergoes prevocalic elision.
5.6.2 Guadeloupean

5.6.2.1 Function of verb forms in Guadeloupean

Guadeloupean shows significantly fewer verbs having distinct long and short forms compared to Mauritian. We propose that the grammar of Guadeloupean, like that of Mauritian, makes essential reference to a grammatical distinction between long and short forms, but that the Guadeloupean lexicon differs from that of Mauritian insofar as most verbs exhibit syncretism between their long and short forms. In a sample of 1,824 verbs extracted from two dictionaries (Tourneux & Barbotin 2008; Bernini-Montbrand et al. 2013), we have identified thirty-four verbs having morphologically distinct short and long forms; Table 5.8 provides a sample of these. As is the case in Mauritian, LFs alternating with a morphologically distinct SF usually end in a vowel in Guadeloupean, specifically e and i, but with more members in the i class (four members in Mauritian vs. ten in Guadeloupean).

Table 5.8. Verb alternations in Guadeloupean

| Verb | SF | LF |
|---|---|---|
| ‘to look’ | gay/gad | gadé |
| ‘to put’ | mèt | mété |
| ‘to know’ | sav | savé |
| ‘to hold’ | ken | kenbé |
| ‘to come’ | vin | vini |
| ‘to do’ | fè | fèt |
| ‘must’ | fo | falé |
| ‘to give’ | ba(n) | bay |

Guadeloupean also presents two cases in which a verb’s LF ends in a consonant that is absent from its SF: fèt /fɛt/ ~ fè /fɛ/ ‘to do’ and bay /baj/ ~ ba(n) /ba/ or /bɑ̃/ ‘to give’; neither is found in Mauritian. Given the restrictedness of the phenomenon in Guadeloupean, one might think of Guadeloupean verb alternations as irregularities in a system in which verbs usually exhibit only a single form and in which alternations that do arise can be argued to be phonologically systematic, conforming to a small number of patterns ranging from the truncation of a final segment or syllable (mété /mete/ ~ mèt /mɛt/ ‘to put’; fouté /fute/ ~ fou /fu/ ‘give’) to a combination of final truncation with nasal spread (défandi /defɑ̃di/ ~ défann /defɑ̃n/ ‘to defend’) or nasal shift (kenbé /kɛ̃be/ ~ ken /kɛn/ ‘to hold’). There are also instances of partial suppletion, as in alternations such as gadé /gade/ ~ gay /gɛ/ ‘to look’ or falé /fale/ ~ fo /fo/ ‘must’.

Our view is that the difference between the Guadeloupean verb system and that of Mauritian is a difference of degree, not of kind. In particular, we assume that in the grammars of both languages, long and short verb forms are systematically distinguished but that the two forms are syncretic in some cases; this syncretism is more widespread in Guadeloupean than in Mauritian, but that is a lexical fact rather than a fact of grammar. This perspective entails that in both languages, LFs possess a systematic cluster of properties distinct from that possessed by SFs: a verb exhibiting distinct long and short forms is not an irregular verb whose forms possess their own peculiar distributional idiosyncrasies, but fits into a larger pattern.
The simplest assumption is that this larger pattern is common to all verbs, but that a verb’s conformity to the pattern is often obscured by the same kind of poverty of forms as characterizes English verbs such as hit, spread, and cost (which exhibit a single form for the infinitive, the non-3sg present, the past, and the past participle).

Guadeloupean verb alternation codes an aspectual distinction, where SFs are usually interpreted as referring to single events (as in (7a)–(14a)) and LFs as referring to multiple events (as in (7b)–(14b)). In the absence of other TAM markers, the long and short alternants may also express tense contrasts: in (7a) the SF expresses present tense, while in (7b), the LF expresses past tense (or passé composé). (Guadeloupean resembles Louisiana Creole in this respect.)
(7)
a. An ken ni ba-’w.
   1. hold. 3 -’2.
   ‘I hold it for you.’ (single event)
b. An kenbé-’y ba-’w.
   1. hold.-’3. -’2.
   ‘I held it for you.’ (multiple events)
When SFs are combined with the progressive marker ka, the interpretation is that of what might be called a ‘progressive completive’, as in (8a); but the combination of an LF with ka (as in (8b)) instead has a prospective reading, in which a multiplicity of future events, potentially but not necessarily completed, is understood.

(8)
a. A(n) ka vin.
   1. come.
   ‘I’m coming all the way.’ (‘progressive completive’)
b. A(n) ka vini.
   1. come.
   ‘I’m planning to come.’ (prospective)
Similarly, SFs with the irrealis marker ké or the past tense marker té may have a single event interpretation; the SF sav ‘know’ in (9a) has a single event interpretation, and the SF mèt ‘put’ in (10a) may receive either a single event or multiple events interpretation. By contrast, LFs combine with ké and té to express multiple events, as in (9b) and (10b).

(9)
a. An pé ké sav konté.
   1. know. count.
   ‘I won’t know how to count (on that occasion).’
b. An pé ké savé konté.
   1. know. count.
   ‘I won’t know how to count (in general).’

(10)
a. I té mèt pima adan.
   3. put. pepper inside
   ‘He/She put pepper in it (on that occasion / in general).’
b. An té mété pima adan.
   1. put. pepper inside
   ‘I put pepper in it (in general).’
This contrast is of course not obvious in cases in which the long and short forms are syncretized. The data in (11) exemplify syncretic verbs exhibiting meanings that are ambiguous between the single-event and the multiple-event interpretations. However, no prospective reading is available in (11a). Speakers typically use kay¹⁰ instead of ka to express the prospective in these contexts.

(11)
a. An mangé kribich.
   1. eat./ crawfish
   ‘I eat/ate crawfish.’ (present single-event/past multiple-event)
b. A(n) ka(y) dòmi.
   1. sleep
   ‘I am sleeping.’ (‘progressive completive’ or prospective)
c. Timoun-la té chanté on bel chanson.
   child- sing./ beautiful song
   ‘The child sang a beautiful song.’ (past single- or multiple-event)
d. Pon moun pé ké bougé.
   no person move./
   ‘No one will move.’ (irrealis single- or multiple-event)
A subclass of verbs shows different constraints: the SFs of the verbs ‘to peep’, ‘to look’, and ‘to put/give/leave’ are only used as imperatives, as in (12); these reflect a more direct borrowing from French, with the exception of the form gay /gɛ/ (12b), apparently a creole neologism. A comparable behaviour is seen with the verb ‘to do’, whose short and long forms (fè and fèt) discriminate between the active and the passive/causative, as in (13).

(12)
a. Fou sa la!
   put. this here
   ‘Put this here!’ (rude)
b. Gay bonda-la-sa!
   look. ass--
   ‘Look at this ass!’
(13)
a. Manman a-’w ka fè mangé.
   Mother -’2. make. food
   ‘Your mother is making food.’
b. Mangé ka fèt.
   food make.
   ‘Food is cooking.’

Finally, the verb ‘to give’ features semantic contrasts but also sandhi effects. With non-pronominal objects, we find both the form bay and the form ba combined with the irrealis marker ké, with the former encoding an irrealis single-event meaning (as in (14a)) and the latter encoding an irrealis multiple-event meaning (as in (14b)). With pronominal noun phrases, the form ba precedes a vowel-initial pronoun (14c) and ban, a nasal-initial pronoun (14d).

(14)
a. An ké bay on tap.
   1. give. slap
   ‘I’ll slap you (on that occasion).’
b. An ké ba on tap.
   1. give. slap
   ‘I’ll slap you (in general).’
c. An ba-’w li.
   1. give.-’2. 3
   ‘I give/gave it to you.’
d. Jan ban mwen tout lajan a-’y.
   1. give. 1. all money -’3.
   ‘John gives/gave me all his money.’

¹⁰ The form kay probably derives from the contraction of the TAM marker ka with ay (from the short form of the verb ‘to go’).
5.6.2.2 Derivational relations in Guadeloupean

Guadeloupean shows less verb alternation than Mauritian. When Guadeloupean verbs do have both an LF and an SF, deverbal nominalization seems to favour the LF as the verb’s base stem. Verbs having syncretic forms also give rise to deverbal nominalization. Both cases are illustrated in Table 5.9. Like Mauritian, Guadeloupean exhibits derived nominals that do not exist in French (see the examples in Table 5.9); such innovations reveal that deverbal nominalization is qualitatively productive in Guadeloupean.

Guadeloupean grammar defines distinct syntactic distributions for a verbal lexeme’s long and short word forms; for some verbs, these are distinct forms (e.g., vini / vin ‘to come’) though for most, the two forms are syncretized. But even for verbs that do not exhibit a distinct SF, there is sometimes evidence for a distinct LF-stem with its own special distribution. A large number of verbs that lack distinct long and short forms have a present participle formed by means of a suffix -an; the examples in (15) illustrate. Examples of this sort exhibit an ambiguity similar to that observed for Mauritian in section 5.6.1.2: either -an attaches to the verb’s LF-stem or it attaches to the verb’s LF with prevocalic elision of the LF’s final vowel.

(15)
‘to lie’ → ‘lying’
‘to fight’ → ‘fighting’
‘to mix’ → ‘mixing’
‘to drink alcohol’ → ‘drinking’
Table 5.9. Deverbal nominalizations in Guadeloupean

| Verb | Verb form | Noun by conversion | Noun by suffixation |
|---|---|---|---|
| ‘to come’ | LF vini | vini ‘arrival’ | |
| | SF vin | | |
| ‘to go out’ | LF sòti | sòti ‘outing’ | |
| | SF sòt | | |
| ‘to look’ | LF gadé | gadé ‘look’ | |
| | SF gad | | |
| ‘to constrain’ | LF babouké | | |
| | LF-stem babouk | babouk ‘constraint’ | babouk-aj* ‘halt’ |
| ‘to fight’ | LF goumé | goumé ‘fight’ | |
| ‘to joke around’ | LF badiné | | |
| | LF-stem badin | | badin-è* ‘joker’, badin-aj* ‘joke’ |
| ‘to have fun’ | LF chomé | chomé ‘party’ | |
| | LF-stem chom | | chom-aj* ‘party’ |
| ‘to take advantage’ | LF pwofité | | |
| | LF-stem pwofi(t) | pwofi ‘benefit’ | pwofit-asyon* ‘benefit’ |
| ‘to tease’ | LF poupoulé | | |
| | LF-stem poupoul | | poupoul-man ‘teasing’ |

In a given row, each nominalization has that row’s verb form as its base stem. *An asterisk marks a derived stem that is morphologically ambiguous, involving either (a) a base stem that is an LF-stem or (b) a base stem that is an LF whose final vowel undergoes prevocalic elision.

Several operations of deverbal nominalization exhibit a similar pattern in Guadeloupean; these include the operation of -è /ɛ/ suffixation, which forms agent nouns, and the operations of -aj and -asyon suffixation, which form action nouns; these operations are exemplified in Table 5.9, with additional examples in (16)–(18).¹¹ Here, too, the derivational suffix joins with either a verb’s LF-stem or, with elision, its LF.

(16)
‘to cuddle’ → ‘cuddler’
‘to stroll’ → ‘stroller’
(17)
‘to exchange’ → ‘exchange’
‘to unite’ → ‘union’

(18)
‘to annoy’ → ‘annoyance’
‘to follow’ → ‘pursuit/chase’

¹¹ The suffixal derivatives in Table 5.9, in (15)–(17), and in (20) are cited from Villoing & Deglas (2016).
Villoing & Deglas (2016) pursue the assumption that such derivations involve a sandhi operation by which vowel hiatus is avoided through the prevocalic elision of an LF’s final é. Additional evidence, however, reveals that at least some cases cannot be attributed to prevocalic elision but must be seen as involving direct suffixation to a verb’s LF-stem. Consider, for example, the operation of -man suffixation, by which action nouns such as those in (19) are derived.

(19)
poupoulé ‘to tease’ → poupoulman ‘teasing’
‘to hurry up/start moving’ → ‘moving/activating’
‘to separate’ → ‘separation’
As these examples show, deverbal nouns suffixed with -man also lack the final é of the verb’s LF. Here, however, the absence of the final é cannot be attributed to hiatus avoidance, since the suffix begins with a consonant. Moreover, nouns such as those in (19) have no counterparts in French and so cannot simply be inheritances from the lexifier. The only explanation is that they are productively formed in Guadeloupean through the direct suffixation of -man to a verb’s LF-stem. Moreover, Occam’s Razor favours the assumption that all of the operations in (15)–(19) involve direct suffixation to a verb’s LF-stem.

By maintaining a distinction between a verb’s SF and its LF-stem, we can arrive at a straightforward account of deverbal nominalizations such as those in (20) as well as denominal verb derivations such as those in (21). On the one hand, the deverbal nominalizations in (20) are conversions of a verb’s LF-stem to a noun; on the other, the derivations in (21) are conversions of a noun to a verb’s LF-stem, to which the suffixal formative for a verb’s LF then attaches. This account contrasts with that of Villoing & Deglas (2016), who regard the derivations in (20) and (21) as involving processes of suffixation that induce elision rather than processes of conversion.

(20)
‘to flirt’ → bas ‘a flirt’
‘to offend’ → ‘an insult’
‘to stroll’ → ‘a stroll’

(21)
‘zouk’ → ‘to dance zouk’
‘Christmas’ → ‘to celebrate Christmas’
‘drizzle’ → ‘to drizzle’
‘refuge’ → ‘to take refuge’
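The argument from -man suffixation can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names, the vowel inventory, and the choice of three test items are hypothetical, and the forms are the Table 5.9 derivatives written without morpheme hyphens. The sketch compares the two competing analyses, suffixation to the LF with prevocalic elision versus direct suffixation to the LF-stem, against the attested nouns.

```python
import unicodedata

def nfc(text):
    """Normalize to NFC so that accented vowels are single code points."""
    return unicodedata.normalize("NFC", text)

# Assumed vowel letters of the orthography used here (an illustrative list).
VOWELS = set(nfc("aeiouéèòà"))

def suffix_with_elision(lf, suffix):
    """Analysis A: attach the suffix to the LF; the LF's final vowel
    is elided only when the suffix itself begins with a vowel."""
    lf, suffix = nfc(lf), nfc(suffix)
    if lf[-1] in VOWELS and suffix[0] in VOWELS:
        return lf[:-1] + suffix
    return lf + suffix

def suffix_to_lf_stem(lf_stem, suffix):
    """Analysis B: attach the suffix directly to the LF-stem."""
    return nfc(lf_stem) + nfc(suffix)

# (LF, LF-stem, suffix, attested noun), taken from Table 5.9.
CASES = [
    ("babouké", "babouk", "aj", "baboukaj"),
    ("badiné", "badin", "è", "badinè"),
    ("poupoulé", "poupoul", "man", "poupoulman"),
]

def compare(cases):
    """Return (attested noun, A succeeds?, B succeeds?) for each case."""
    rows = []
    for lf, stem, suffix, attested in cases:
        target = nfc(attested)
        rows.append((
            attested,
            suffix_with_elision(lf, suffix) == target,
            suffix_to_lf_stem(stem, suffix) == target,
        ))
    return rows

if __name__ == "__main__":
    for noun, by_elision, by_stem in compare(CASES):
        print(noun, "elision:", by_elision, "LF-stem:", by_stem)
```

With vowel-initial suffixes both analyses yield the attested form, which is the ambiguity marked by the asterisks in Table 5.9; with consonant-initial -man only direct suffixation to the LF-stem does.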
Our analysis assumes the coexistence of deverbal nominalizations whose base stem is a verb’s LF (e.g., goumé ‘to fight’ → goumé ‘fight’) with those whose base stem is a verb’s LF-stem (e.g., LF-stem bas ‘to flirt’ → bas ‘flirt’). This analysis predicts that a particular verb may give rise to two derived nominal stems, one based on the verb’s LF, the other on its LF-stem. This prediction is indeed borne out: the verb ‘to win’ has two derived nominals, one meaning ‘victory’ (whose stem is the verb’s LF) and one meaning ‘win’ (whose stem is the verb’s LF-stem).

In summary, we assume that every verb has an LF-stem, even if it doesn’t exhibit distinct long and short word forms; for those that do, the SF shares the form of the LF-stem. Postulating an LF-stem for every verb offers a unified analysis of both denominal verb derivation and deverbal nominalization (whether by conversion or by the addition of a derivational suffix). On this account, Guadeloupean derivation shows a degree of complexity equivalent to those of French and Mauritian with respect to base-stem predictability. In Guadeloupean, a verbal lexeme’s base stem is its LF in some cases and its LF-stem in others; thus, base-stem predictability in the definition of deverbal nominalizations exhibits complexity of degree 2. By contrast, it is not clear that Guadeloupean deverbal nominalizations ever have a hidden form as their base stem; not even a verb’s LF-stem can be claimed to be hidden in view of its use in the formation of a present participle, an inflected form. Guadeloupean deverbal nominalizations therefore exhibit a base-stem restrictedness whose complexity is no higher than degree 1.
5.6.3 Haitian

5.6.3.1 Function of verb forms in Haitian

Only twelve out of 2,657 verbs excerpted from Valdman et al. (2007) alternate between a long and a short form (Table 5.10). The alternation is, according to Alleyne (1996), the result of a phonological reduction, or more precisely that of a syllabic reduction (Cadely 1994). The function of the alternation shows some similarities with both Mauritian and Guadeloupean. DeGraff (2001) argues that truncation occurs when verbs are followed by non-pronominal objects (22a) but fails when the verb is in sentence-final position (22b), has an extracted object (22c), or is followed by an adjunct (22d).

(22)
a. Mari gen kouraj.
   Marie have. courage
   ‘Marie has courage.’ (DeGraff 2007)
b. Tonton Bouki ap ale.
   uncle Bouki go.
   ‘Uncle Bouki is leaving.’
c. Konbyen dan Tonton Bouki genyen?
   how_much tooth uncle Bouki have.
   ‘How many teeth does uncle Bouki have?’
d. Le klosh ape sone aster.
   the bell ring. now.
   ‘The bells are ringing now.’ (Roberts 1999)
Table 5.10. Verb alternations in Haitian

| Verb | SF | LF |
|---|---|---|
| ‘to go’ | al | alé |
| ‘to look’ | gad | gadé |
| ‘to go out’ | sòt | sòti |
| ‘to come’ | vin | vini |
| ‘to have’ | gen | genyen |
| ‘to do/make’ | fè | fèt |
| ‘to give’ | ba(n) | bay |
Notice that the behaviour in (22b) is also attested in Guadeloupean with the verb ale. The opposition fèt ~ fè ‘to do/make’ also occurs in both creoles. In addition, DeGraff (2001) claims that LFs are used for emphasis. He concludes that verb alternations in Haitian are an instance of inflectional morphology whose realization is determined by phonological phrasing and argumenthood.
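The distribution reported for Haitian can be summarized as a lookup from syntactic context to verb form. The following is a minimal sketch, not DeGraff’s formalization: the context labels and function name are hypothetical, only the four contexts illustrated in (22) are covered, and the LF/SF pairs are those of Table 5.10 (omitting ‘to give’, whose SF varies between ba and ban). Emphatic uses of the LF are not modelled.

```python
import unicodedata

def nfc(text):
    """Normalize to NFC so that accented letters compare reliably."""
    return unicodedata.normalize("NFC", text)

# LF -> SF for the alternating verbs of Table 5.10 (bay ~ ba(n) left out).
SHORT_FORM = {
    nfc(lf): nfc(sf)
    for lf, sf in [
        ("alé", "al"),
        ("gadé", "gad"),
        ("sòti", "sòt"),
        ("vini", "vin"),
        ("genyen", "gen"),
        ("fèt", "fè"),
    ]
}

# Hypothetical labels for the four contexts illustrated in (22).
# True means that truncation (the SF) is reported in that context.
TRUNCATES = {
    "nonpronominal_object": True,   # (22a)
    "sentence_final": False,        # (22b)
    "extracted_object": False,      # (22c)
    "adjunct": False,               # (22d)
}

def surface_form(lf, context):
    """Return the form expected in the given context.

    Non-alternating verbs are returned unchanged; contexts on which the
    text is silent raise an error rather than being guessed at.
    """
    if context not in TRUNCATES:
        raise ValueError(f"no generalization stated for context: {context}")
    lf = nfc(lf)
    if TRUNCATES[context] and lf in SHORT_FORM:
        return SHORT_FORM[lf]
    return lf

if __name__ == "__main__":
    print(surface_form("genyen", "nonpronominal_object"))
    print(surface_form("genyen", "extracted_object"))
```

Raising an error for unlisted contexts (for instance pronominal objects) is a deliberate choice: the passage states no generalization for them, so the sketch does not supply one.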
5.6.3.2 Derivational relations in Haitian

Deverbal nominalization is evidently productive from a qualitative perspective in Haitian, since a number of derived nominal stems have no counterpart in French, for example those in (23).

(23)
a. ‘to run’ → ‘the action/result of running’
b. ‘to lie’ → ‘the action/result of lying’
(Lefebvre 1998)
Because very few verbs in Haitian exhibit an overt inflectional alternation between long and short forms, there are few cases of derivation where one can readily identify the choice of one alternant over the other. When cases of this sort do occur (typically in conversions), they involve the LF in some instances and the SF in others, as in Table 5.11.

Suffixal derivation of nouns from verbs often involves a vowel-initial suffix, as in (24); the existence of a sandhi rule eliminating vowel hiatus by means of stem-final vowel truncation might (as in Guadeloupean) be claimed to allow such derivatives to be based on a verb’s LF. But as in Guadeloupean, the noun-forming suffix -man does not create vowel hiatus; its appearance in post-consonantal positions therefore cannot be attributed to elision, but must be seen as the effect of direct suffixation to a verb’s LF-stem. In some cases (e.g., (25)), the resulting nominalization has no counterpart in French, and so cannot be seen as a direct inheritance from the lexifier. We must therefore assume that as in Guadeloupean, a Haitian verb’s LF-stem sometimes participates directly in the workings of its derivational morphology.
Table 5.11. Deverbal nominalizations in Haitian

| Verb | Verb form | Noun by conversion | Noun by suffixation |
|---|---|---|---|
| ‘to come’ | LF vini | vini ‘arrival’ | |
| | SF vin | | |
| ‘to go’ | LF alé | alé ‘departure’ | |
| | SF al | | |
| ‘to go out’ | LF sòti | sòti ‘going out’ | |
| | SF sòt | | |
| ‘to see’ | LF gadé | | |
| | SF gad | gad ‘look’ | |
| ‘to win, to gain’ | LF genyen | | |
| | SF gen | gen ‘gain’ | |
| ‘to chat’ | LF djòle | | |
| | hidden LF-stem djòl | | djòl-è* ‘talker’ |
| ‘to cut up, to slice’ | LF tranché | tranché ‘labor pain, shoemaker’s knife’ | |
| | hidden LF-stem tranch | | tranch-man ‘pain’ |
| ‘to build’ | LF bati | | |
| | special hidden stem batis | | batis-man ‘construction (action)’ |

In a given row, each nominalization has that row’s verb form as its base stem. *An asterisk marks a derived stem that is morphologically ambiguous, involving either (a) a base stem that is a hidden LF-stem or (b) a base stem that is an LF whose final vowel undergoes prevocalic elision.
(24)
a. ‘to bet’ → ‘a bet’
b. djòle ‘to chat’ → djòlè ‘talker’
(Lefebvre 1998)

(25)
‘to chat’ → kozman ‘a chat’¹² (DeGraff 2003)

VN compounds might seem to afford a parallel argument, since the verb in such compounds often appears to be an LF-stem; for example, the verbs ‘break’, ‘break’, and ‘walk’ all seem to be represented by their LF-stems in the compounds meaning ‘a destructive individual’, ‘hard question’, and ‘stoop, steps to a house’, each of which has a French counterpart; but these compounds all apparently originate in French, and evidence of the productivity of exocentric VN compounds is in general lacking in Haitian (Lefebvre 1998: 345).

A final parallel between Haitian and Guadeloupean pertains to denominal verbs. Verbs are apparently derived from nouns by means of a suffix -e, which sometimes produces verb forms having no counterpart in French. (The examples in (26) illustrate.) But as in Guadeloupean, these can instead be seen as instances of N→V conversion whose output is a verb’s LF-stem (in which case -e has the role of an LF-forming verb suffix); here again, distinguishing a verb’s LF-stem from its SF affords a more streamlined account of derivation.

(26)
a. (stem pansyon) ‘thought, anxiety’ → (LF pansyon-e) ‘to think, to ponder’
b. (stem makak) ‘stick’ → (LF makak-e) ‘to hit with a stick’
c. (stem bourik) ‘donkey, work horse’ → (LF bourik-e) ‘to work like a dog’
d. (stem tèk) ‘a hit (in marbles)’ → (LF tèk-e) ‘to hit a marble’
(Lefebvre 1998; DeGraff 2003)

¹² Nominalizations similar to kozman include for instance ajoutman ‘addition’, frapman ‘knocking’ and pledman ‘discussion, quarrel’, which are absent in contemporary French but found in Medieval French. DeGraff (2003: 69) rightly argues that these might have been inherited from regional varieties spoken in the colonies in the seventeenth century.
It is clear that at least some Haitian verbs possess special hidden stems. Each of the verbs in (27) has a special hidden stem used in derivation (e.g., with the nominalizing suffix -man: vomis-man) but not in inflection. The productivity of this pattern of alternation is attested to by the fact that it gives rise to derivatives having no counterpart in French, as in (28).

(27)
‘to vomit’ → vomis-man ‘vomiting’
‘to refresh’ → ‘refreshment’
‘to cool’ → ‘cooling’

(28)
‘to build’ → batis-man ‘construction (action)’¹³
‘to finish’ → ‘end’
‘to thank’ → ‘thanking’

Thus, relations of deverbal nominalization in Haitian are comparable in complexity to those of Mauritian and French. The base stem in deverbal nominalization is the LF for some verbs, the SF for others, the LF-stem for others, and a special hidden stem for still others. Base-stem predictability in the definition of deverbal nominalizations therefore attains complexity of degree 2. And given that the base stem in some deverbal nominalizations is a special hidden stem, base-stem restrictedness in the definition of these nominalizations likewise exhibits complexity of degree 2.

Table 5.12. Complexity of derivational relations in French, Mauritian, Guadeloupean, and Haitian

| | French | Mauritian | Guadeloupean | Haitian |
|---|---|---|---|---|
| Degree of complexity in base-stem predictability | 2 | 2 | 2 | 2 |
| Degree of complexity in base-stem restrictedness | 2 | 2 | 1 | 2 |

¹³ Finissement and bâtissement can be found in Medieval French, but not *remercissement.
5.7 Conclusion

In this chapter, we have presented criteria for assessing the integrative complexity of a morphological system’s derivational relations, and we have applied these criteria in an analysis of derivational relations in Mauritian, Guadeloupean, and Haitian. We have demonstrated that each of these languages possesses deverbal nominalizations that are not a mere inheritance from the lexifier language but must be seen as the effect of a productive process within the creole itself. Moreover, we have shown that the derivational relations in these creoles attain the same degree of complexity as those of the lexifier; our results are summarized in Table 5.12. When a verb L is the base lexeme in a derivational relation, the identity of L’s base stem in L’s stem set is not, in general, predictable either in French or in Mauritian, Guadeloupean, or Haitian; moreover, the status of L’s base stem in the definition of L’s morphology may be as peripheral in Mauritian and Haitian as in French. These results challenge the extreme simplicity that has so often been attributed to creole morphology. We hypothesize that as further work is done on the morphology of creole languages, other sorts of derivational processes will be found to exhibit a comparable level of integrative complexity.

Acknowledgements

We would like to thank Jean-Michel Benjamin for his input on the Guadeloupean data.
OUP CORRECTED AUTOPAGE PROOFS – FINAL, 23/8/2020, SPi
6 Simplification and complexification in Wolof noun morphology and morphosyntax

Michele Loporcaro
6.1 Introduction

In this chapter, I will describe how Wolof noun morphology has become simplified, compared with the system that can be reconstructed for a previous stage through comparison with other Atlantic languages (the subdivision of the Niger-Congo family to which Wolof belongs). On the other hand, I will also show that, in some respects, Wolof noun morphology and especially morphosyntax has become more complex (more complex than in previous stages of the language and also more complex than usually assumed in the literature), acquiring new irregularities. The Wolof (and Atlantic) facts will be scrutinized against the background of recent research on linguistic complexity. Since the study is about the grammatical system and does not adduce any psycholinguistic evidence (from language usage and/or processing), I will be addressing what the relevant literature (e.g., Dahl 2004: 39; Miestamo 2008: 27; Sinnemäki 2008: 72; Lindström 2008: 217) labels ‘absolute complexity’, not what is sometimes called ‘relative complexity’ (Kusters 2008: 4–8), that is, memory cost/difficulty (Hawkins 2007).

The chapter is organized as follows: in section 6.2, I introduce the language and its classification; in section 6.3, I present the basics of the Wolof noun class system, which is then placed in its Atlantic context in section 6.4.¹ In section 6.5, I will briefly introduce the distinction between complexity and morphological richness, as defined in the literature on morphological complexity I take as a point of reference (in particular Baerman et al. 2010; 2015b; 2017; Dressler 2011), and how complexity and richness relate to morphological type. I then move on to consider complexifying changes which have affected Wolof noun morphology, changing many aspects of what must have earlier been a coherently agglutinative system into a system which, in addition to many properties going towards the isolating type, also developed some inflectional irregularities normally found in inflecting-fusional languages. The section also compares similar developments in other Atlantic languages, while section 6.6 addresses complexification in the paradigm of agreement targets. Finally, in section 6.7, I discuss whether the diachronic dynamics of change observed in the language may be explained in external terms, considering the sociolinguistic setting of the language and the nature of the speech community in which it is spoken.

¹ While the data from other Atlantic languages are drawn from the available literature, for Wolof available sources are complemented with first-hand data from the variety of Mbakke (Mbacke), lying about 150 kilometres east of Ndakaaru/Dakar, in the territory of the traditional kingdom of Bawol which is part of the Wolof heartland, the area on whose dialects the standard variety of Wolof is based. These were collected in cooperation with Cheikh Anta Babou, to whom I am indebted, and are presented in more detail in Babou & Loporcaro (2016). Glossing obeys the Leipzig glossing rules; in addition, a dedicated gloss label indicates the class marker (without numbering for Wolof, since contrary to other Niger-Congo languages mentioned in the chapter, there is no agreed-on numbering of noun classes in studies on Wolof).

Michele Loporcaro, Simplification and complexification in Wolof noun morphology and morphosyntax In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Michele Loporcaro. DOI: 10.1093/oso/9780198861287.003.0006
6.2 Wolof and Atlantic languages

Wolof is the native language of four million (Lewis et al. 2015) to 4.5 million (Leclerc 2015) people, and the main inter-ethnic lingua franca among the thirteen million inhabitants of Senegal. It is also spoken in Gambia (about 226,000 speakers), where it is the second most spoken language after Mandinka, Mali (62,000 speakers), Mauritania (around 16,400 speakers), and Guinea Bissau, as well as in migrant communities in Europe (France, Italy, and Spain) and the USA (mainly New York City).²

The evidence to establish change in Wolof is twofold. On the one hand, the language has been described thoroughly since the early nineteenth century (cf. Dard 1825, 1826, Boilat 1858, Kobès 1869, etc., with some news on relevant aspects of its structure available since as early as the late sixteenth century: cf. Doneux 1978: 45), so that changes leading to the present situation can be followed through the extant documents and descriptions. On the other hand, transcending this limited time-depth requires reconstruction, and this poses problems since the classification of Wolof within the Northern Atlantic branch of Niger-Congo is debated: the traditional view considers Wolof most narrowly related to Fula, and places Wolof/Fula, together with Seereer, in a Senegambian subdivision of Atlantic (cf. Sapir 1971: 47f; followed by Wilson 1989: 87f; Childs 2004, 2010: 36, etc.), while Doneux (1978: 43–5) and Segerer (2010: 4f) propose alternatively that the closest relative to Wolof is the Ñuun (also: Bagnoun, Bainuk, Baïnounk) language/dialect cluster (straddling Casamance, in Southern Senegal, the north of Guinea-Bissau, and Gambia), and Pozdniakov (2015: 58) lists Fula/Seereer, Buy/Nyun, and Wolof as three different branches of Northern Atlantic. Be that as it may, all the languages mentioned display a better-preserved noun class system of the Niger-Congo type than Wolof, a fact that must be kept in mind when reconstructing past changes leading to the grammatical system observed today.

² Occasionally, one comes across much lower figures in the literature: see, for example, Njie (1982: 16), reporting slightly more than one million speakers (‘le wolof se parle en Gambie et au Sénégal par un peu plus d’un million de personnes’). Higher figures (e.g., the 7.5 million reported by Perrin 2012: 11) are given by authors not drawing the distinction between native/L1 and vehicular/L2 usage of Wolof.
6.3 Wolof noun classes: the basics and the received view

In the rich literature on Wolof, the language is invariably described as featuring ten noun classes (henceforth abbreviated NCs), eight singular and two plural, marked on determiners and other noun modifiers occurring adnominally as well as pronominally.³ A complete list of the usually assumed classes is given on the horizontal dimension in (1), while (1a)–(1d) exemplify the larger list of class-marked function words:

(1)
| NC marker | b- | g- | k- | j- | l- | m- | s- | w- | y- | ñ- |
|---|---|---|---|---|---|---|---|---|---|---|
| a. proximal definite article | bi | gi | ki | ji | li | mi | si | wi | yi | ñi |
| b. distal definite article | ba | ga | ka | ja | la | ma | sa | wa | ya | ña |
| c. proximal demonstrative | bii | gii | kii | jii | lii | mii | sii | wii | yii | ñii |
| d. distal demonstrative | bee | gee | kee | jee | lee | mee | see | wee | yee | ñee |
| etc. | | | | | | | | | | |
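The regularity of the paradigm in (1), where each form is a class consonant followed by an invariant vowel sequence, can be shown with a short sketch. This is an illustration only: the function and variable names are hypothetical, and the decomposition into consonant plus suffix is read off the forms listed in (1) rather than taken from a stated analysis.

```python
# Class consonants as listed in (1): eight singular and two plural classes.
SINGULAR_CLASSES = ["b", "g", "k", "j", "l", "m", "s", "w"]
PLURAL_CLASSES = ["y", "ñ"]

# Vowel sequences read off rows (1a)-(1d).
SUFFIXES = {
    "proximal definite article": "i",
    "distal definite article": "a",
    "proximal demonstrative": "ii",
    "distal demonstrative": "ee",
}

def paradigm():
    """Build every class-marked form in (1) as class consonant + suffix."""
    return {
        marker: {label: marker + vowels for label, vowels in SUFFIXES.items()}
        for marker in SINGULAR_CLASSES + PLURAL_CLASSES
    }

def definite_np(noun, marker, distal=False):
    """Combine a noun with the definite article of its class.

    The class of each noun is lexical information and must be supplied.
    """
    label = "distal definite article" if distal else "proximal definite article"
    return f"{noun} {paradigm()[marker][label]}"

if __name__ == "__main__":
    for marker, forms in paradigm().items():
        print(marker, *forms.values())
    print(definite_np("xarit", "b"))
```

Because class membership is a lexical property of each noun, the sketch takes the class marker as an argument instead of trying to predict it.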
Taking the proximal definite article, the following examples illustrate NC contrasts: (2)
a. xarit b-i friend -. ‘the friend’ c. nit k-i person -. ‘the person’ e. ndongo l-i disciple -. ‘the disciple’ g. soxna s-i honourable lady -. ‘the honourable lady’
b. góor g-i man -. ‘the man’ d. jëkkër j-i husband -. ‘the husband’ f. njëngtéef m-i sorcerer -. ‘the sorcerer’ h. far w-i lover/fiancé -. ‘the lover/fiancé’
³ Cf., for example, Boilat (1858: 11ff); Rambaud (1898: 11); Delafosse (1927: 30f); Labouret (1935: 46); Gamble (1957: 134); Sauvageot (1965: 72–4); Stewart & Gage (1970: 392); Sapir (1971: 75); Irvine (1978: 43); Thiam (1987: 9); Fal et al. (1990: 17); Mc Laughlin (1997: 2); Munro & Gaye (1997: ix); Becher (2001: 42); Ndiaye (2004: 26); Camara (2006: 11); Diouf (2009: 153); Guérin (2011: 84); Tamba et al. (2012: 895); Torrence (2013: 16); Pozdniakov & Robert (2015: 548). The notion ‘noun class’ is used in different ways by different authors, within and beyond African language studies (see the discussion in Babou & Loporcaro 2016: 4–6).
i. xarit/jëkkër/ndongo/njëngtéef/soxna/far y-i friends/husbands/disciples/sorcerers/ladies/lovers -. ‘the friends/husbands/disciples/sorcerers/ladies/lovers’
j. góor/nit ñ-i man/person -. ‘the men/persons’
As usual in Atlantic languages, there is a disproportion between classes, in several respects: (a) a disproportion with respect to number, as there are eight singular classes as opposed to only two classes traditionally recognized for the plural: yi plurals ((2i)), and ñi plurals ((2j)); and (b) an imbalance in numerosity. The exhaustive list of ñi plurals (eleven lexemes in all, all denoting humans) is the following: (3)
gaa/gan/géer/gor/góor/jaam/jigéen/mag/maggat/ndaw/nit ñi
people/guest/non-casted/free man/man/slave/woman/adult/old person/youngster/person .-.
‘the people/guests/non-casted/free men/men/slaves/women/adults/old people/youngsters/persons’
All the rest of the nouns take yi in the plural ((2i)). Likewise, in the singular the bi class in (2a) accounts for the vast majority of nouns, and has been constantly attracting new members, as schematized in (4) (based on Becher 2001: 42–52): (4)
incidence of the bi class among singular nouns:
   Variety                          Incidence                 Source
a. Nineteenth-century rural         44%                    >  Dard (1825), Kobès (1875)
b. Twentieth-century rural          64%                    >  Irvine (1978: 51)
c. Today, urban/Dakar               ‘for the most part’    >  Tamba et al. (2012: 894, n. 5)
d. Today, urban/Banjul              90%                       Becher (2001: 47f)
Its incidence has grown from less than 50% in nineteenth-century rural Wolof to near generalization in the contemporary urban language. As a result, the agreement pattern selected by most nouns in all varieties of Wolof is the one in (5) (singular bi/plural yi):⁴
⁴ This is the default agreement class (consisting of the two default NCs for singular and plural), both in lexical and in syntactic terms: lexically, loanwords are assigned bi/yi class membership (cf. Rambaud 1898: 22; Stewart & Gage 1970: 392; Guérin 2011: 83); syntactically, there are rules substituting yi for other plural markers under certain conditions (cf. Babou & Loporcaro 2016: 16, 31f).
(5)
a. buur b-i noppi na/*na-ñu . . . moom . . .
   /king .-. ready .3/-3 3
   ‘the king is ready; he . . . ’
b. wuur y-i noppi na-ñu/*na . . . ñoom . . .
   /king .-. ready -3/.3 3
   ‘the kings are ready; they . . . ’
Note that class-agreement is marked exclusively on determiners (boldfaced in (5)), while adjectives (which really are stative verbs in Wolof) do not mark class contrasts. Verb auxiliaries and pronouns mark person and number, not class.
6.4 Wolof within the Atlantic context
Thus, Wolof has moved far away from the pervasiveness of agreement typically observed in Niger-Congo, including Atlantic languages. Compare the Fula examples in (6), where the word for ‘king’ is class-marked itself and controls class-agreement on adjectives and function words; or the Baïnounk examples in (7), with class-agreeing demonstratives, adjectives, and numerals; or those from Diola-Fogny in (8), with class-agreement also on the verb (again, class markers are boldfaced for clarity): (6)
Pular, Fuuta Jaloo (Guinea; Diallo 2010: 80f):
a. lan-ɗo maw-ɗo mo yiiɗ-en on ko janan-o
   king-. old-. . see-.1 be foreigner-.
   ‘the old king we saw is a foreigner’
b. lan-ɓe maw-ɓe ɓe yiiɗ-en ɓen ko janan-ɓe
   king-. old-. . see-.1 be foreigner-.
   ‘the old kings we saw are foreigners’
(7)
Baïnounk, Gubaher; Ñuun (Casamance, Senegal; Cobbinah 2010: 186)
a. bә-kәr ba-m-ba / bә-kәr-әŋ ba-naːk-aŋ
   -chicken -.- / -chicken- -two-
   ‘this chicken’ / ‘two chickens’
b. feːbi fa-dikaːm / feːbi-ɛŋ fa-naːk-aŋ
   goat -female / goat- -two-
   ‘female goat’ / ‘two goats’
(8)
Diola-Fogny (Casamance, Senegal; Sapir 1965: 24, 90)
a. bu-bәːr-ә-b bә-mәk-ә-b bu-lɔlɔ
   9-tree--9 9.-big--9 9-fall
   ‘the big tree fell’
b. u-bәːr-ә-w wә-mәk-ә-w u-lɔlɔ
   8-tree--8 8.-big--8 8-fall
   ‘the big trees fell’
Since this pervasiveness of class marking on nouns and different agreement targets is a property reconstructed for Niger-Congo, and for Atlantic, Wolof must have lost it. This amounts to a loss of complexity, under the view that redundancy adds to complexity, maintained by Dahl (2004: 10) among others:
the spread-out of information from a segment of the signal to its neighbours means that the mapping from input to output—and thus the system as such— becomes more complex. (Dahl 2004: 10)
Indeed, most of the changes in noun morphology and morphosyntax from Atlantic to Wolof produced simplification, in one way or another: there has been loss of redundancy in agreement (as readily apparent from comparison of (5) with (6)–(8)), and reduction in the number of NCs (Proto-Atlantic had about fifteen NCs; Doneux 1975: 114), which amounts to loss in constitutional complexity, in Rescher’s (1998: 9) terms. We have also seen (in (4)–(5)) that there is a trend towards the generalization of the default NCs. These are the kinds of change the literature on Wolof tends to focus on. However, there were also changes which made the system more complex, leading to the rise of (previously absent) morphological irregularity (in static morphology, in Dressler’s 2011: 161 terms), both on nouns (with the rise of inflectional classes (ICs), untypical for agglutinating languages), and on agreement targets (rise of defective and otherwise irregular paradigms). These are the changes on which I am going to focus in what follows.
6.5 Complexification in Wolof noun inflection, against the background of Atlantic noun class systems
6.5.1 Morphological complexity vs. morphological richness
Niger-Congo languages on the whole have agglutinative morphology. In an ideally agglutinating language, as pointed out, for example, by Dressler (2011: 160), we expect to find less complexity than in languages of the inflecting-fusional type:
Strongly inflecting-fusional languages have a sizeable amount of morphological richness, but also many unproductive patterns, i.e. additional morphological complexity. Strongly agglutinating languages have much more morphological richness, but ideally no unproductive morphological patterns, a situation nearly completely obtained by Turkish. (Dressler 2011: 160)⁵
⁵ As is well known, in Turkish ‘there are no inflectional classes’ (Wurzel 1989: 74).
To recognize this, though, one has to distinguish complexity from richness of inflection: I agree with Baerman et al. (2010: §1) that the size of a paradigm is not a primary criterion of complexity; it is [ . . . ] a criterion of morphological richness dependent on the importance of inflectional morphology in the morphology– syntax interface. (Dressler 2011: 160)
Under this view, the morphology of an ideally agglutinating language is rich, not complex. To mention just one crucial aspect, relevant for the present discussion, such a language, lacking inflectional classes, lacks ‘the additional structure imposed by inflectional morphology, above and beyond its dedicated task of expressing syntactic and semantic distinctions’ (Baerman et al. 2010: 1).
As a final remark to this section, note that the use of notions such as ‘agglutinating’ and ‘inflecting-fusional’ in morphological typology has been criticized, most influentially by Haspelmath (2009), who analyses what he calls the ‘Agglutination Hypothesis’ into three distinct indexes (the Cumulation, the Alternation, and the Suppletion Index) and takes it to be falsified by the fact that, on the whole, the languages in his sample score differently on the three. A language displaying one-to-one correspondence between form and meaning in inflectional morphology scores lower on the Cumulation Index than languages allowing for one-to-many correspondences. The ‘Alternation Index’, on the other hand, assigns 0 to languages ‘which exhibit complete stem invariance’, and higher values to languages showing more ‘stem alternations, that is, the (co-)expression of morphological categories by changing, rather than adding to, the stem’ (Haspelmath 2009: 17). The ‘Suppletion Index’, finally, is ‘defined as the average percentage of subcategories (per category-system) that exhibit affix suppletion’ (Haspelmath 2009: 22). Note that the only Niger-Congo language in the sample (Swahili) scores 0.1 on the Cumulation Index, while a paramount instance of an agglutinating language such as Turkish (Haspelmath 2009: 23) scores 0. Both Swahili and Turkish also score 0 on the Alternation Index. 
On the Suppletion Index, on the other hand, Turkish scores 23/100 and Swahili 28/100, which is far from 0 (Nivkh) but much closer to it than to the score reached by a typically ‘inflecting-fusional’ language like Latin (84/100). Thus, despite the scepticism Haspelmath airs about the usefulness of the ‘agglutinating’ vs. ‘inflecting-fusional’ distinction, his own data show that it is far from odd to qualify languages such as Turkish or Swahili as consistently agglutinating, for the purposes of the present study. More broadly, Haspelmath’s line of argument seems to be at odds with the notion itself of a ‘type’, whose legitimacy cannot be called into question by pointing to empirical objects which poorly fit the ideal instantiation of it, however defined, given that ‘linguistic types’ are ‘ideal constructs which natural languages approach to various degrees’ (Dressler 2005: 7).
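The quoted definition of the Suppletion Index can be read as a simple two-step average: a percentage per category-system, then the mean of those percentages. The Python sketch below illustrates only that reading. It is not Haspelmath's own procedure, the function name `suppletion_index` is hypothetical, and the toy data are invented for the arithmetic and describe no real language.

```python
def suppletion_index(category_systems):
    """Mean, over category-systems, of the percentage of subcategories
    flagged as showing affix suppletion.

    `category_systems` maps a category-system name to a list of booleans,
    one per subcategory (True = affix suppletion observed).
    """
    percentages = [
        100.0 * sum(flags) / len(flags)
        for flags in category_systems.values()
    ]
    return sum(percentages) / len(percentages)


# Invented toy data, for the arithmetic only.
toy_language = {
    "number": [True, False],              # 1 of 2 subcategories -> 50%
    "case": [False, False, False, True],  # 1 of 4 subcategories -> 25%
}

if __name__ == "__main__":
    print(suppletion_index(toy_language))  # (50 + 25) / 2 = 37.5
```

On this reading a score such as 23/100 means that, averaged across category-systems, roughly a quarter of the subcategories show affix suppletion.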
6.5.2 The emergence of inflectional classes in Wolof
Like Niger-Congo in general, Wolof too has agglutinating morphology, but this is today the case only in the verb, since the noun has become almost completely invariable, as reflected in the white dot (meaning ‘no distinct plural form’) on the WALS map 33 on noun plurality (Dryer 2013), a fact remarked ever since the earliest descriptions of Wolof.⁶ However, while such remarks and the WALS white dot are accurate for the overwhelming majority of Wolof nouns, uninflectedness has not yet triumphed completely. In fact, one of the rare lexemes still preserving two distinct forms, that is, buur ‘king’, has already been displayed in (5). The same is the case for about twenty nouns (listed in (9)), whose singular and plural differ because of an alternation in the initial consonant:⁷ (9)
   Singular       Plural         Gloss
a. mbaam mi       baam yi        ‘donkey’
   mbootaay mi    bootaay yi     ‘piggyback’
   ndono li       dono yi        ‘heritage’
   ndab li        dab yi         ‘utensil’
   ndënd mi       dënd yi        ‘drum’
   ngàttaan mi    gàttaan yi     ‘short one’
b. mbagg mi       wagg yi        ‘shoulder’
c. baaraam bi     waaraam yi     ‘finger’
   boroom bi      woroom yi      ‘owner’
   buur bi        wuur yi        ‘king’
   buy bi         wuy yi         ‘baobab fruit’
d. pepp mi        fepp yi        ‘grain’⁸
e. këf ki         yëf yi         ‘thing’
⁶ On noun invariability in Wolof, see the early remarks by Dard (1826: 14): ‘Mais si le nom n’est pas suivi de la préposition ou, on ajoute après ce nom les articles ya, yi, you, sans jamais rien changer dans son orthographe’ [‘But if the noun is not followed by the preposition ou, one adds after this noun the articles ya, yi, you, without ever changing anything in its orthography’]. Similarly, Boilat (1858: 7) points out: ‘En Wolof, les noms ne changent pas de terminaison dans les différentes combinaisons que leur fait éprouver le discours, pas même en passant du singulier au pluriel’ [‘In Wolof, nouns do not change ending in the different combinations in which discourse places them, not even when they change from singular to plural’]. Thus, ‘le substantif est invariable’ [‘the noun is invariable’] (Boilat 1858: 11).
⁷ The alternations—as described in Sauvageot (1965: 74); Diagne (1971: 79); Diouf (2009: 155); Camara (2006: 7–8), etc.—may take different forms, illustrated in (9). The proximal form of the definite article—already seen in (1)–(2)—is added after each word form, to indicate that the two occur in distinct environments (thus glosses expand to ‘the x/the x’s right here’).
⁸ Camara (2006: 8) also reports pan/fan ‘day/days’, showing the same p-/f- consonant alternation as in (9d). However, this paradigm is no longer attested in Mbakke Wolof, where the formerly plural form fan has generalized and is used for the singular as well: for example, benn fan jàll na ‘one day has passed’. The lexeme fan is reported as invariable also in Fal et al.’s (1990: 70) dictionary: fan wi ‘the day’/ ñaari fan ‘two days’. The older singular form pan still occurs only in the fixed expression weer-u benn pan ‘the first day of the month’ (literally ‘crescent-. one day’).
f. bët bi         gët yi         ‘eye’
   bëñ bi         gëñ yi         ‘tooth’
g. loxo bi        yoxo yi        ‘hand, arm’
h. waa ji         gaa ñi         ‘guy’
For most lexemes, this difference today is only optional since—with the sole exception of këf ‘thing’—the singular form may, and indeed tends to, be used in plural contexts, while the reverse is not the case (see Guérin 2011: 85; Babou & Loporcaro 2016: 10). Once uninflectedness is generalized, noun morphology will have become simplified again, but as long as paradigms such as those in (9) survive, they represent an increase in morphological complexity, determined by changes which introduced morphological irregularity of the sort familiar from inflecting-fusional languages: in other words, that in (9) is evidence for the occurrence of (residual) inflectional classes in Wolof. Note also that free variation in the plural cell of those noun lexemes determines overabundance (Thornton 2011; Meakins & Wilmoth, Chapter 4, this volume), that is, variation between two cell-mates (Loporcaro & Paciaroni 2011: 420), thus contributing to a local increase in complexity, if only ephemeral, on the way towards simplification.
6.5.3 Agglutinative noun-class morphology and inflectional classes in other Atlantic languages
The initial consonant alternations defining these inflectional classes are the last remnants of two distinct but intertwined processes which are observed—with varying degrees of regularity—in the neighbouring Atlantic languages, and specifically, in those to be considered as representative comparator languages from the North Atlantic branch under either classification hypothesis for Wolof (see section 6.2), that is, either Fula and Seereer or Ñuun. The two processes are one morphological (NC-prefixation), the other morphonological (initial consonant mutation). Integration of initial consonant mutation into the NC system is an innovation that is currently reconstructed for Proto-Northern Atlantic (see Pozdniakov 2015: 60), even if not preserved in all daughter languages: in Ñuun languages, ‘the system is barely operative now, but can be partly reconstructed’ (Wilson 2007: 86), and the same is true of Wolof, as discussed in (18)–(19) below. In Fula and Seereer, by contrast, the consonant mutation system itself and its interaction with NCs are well-preserved. As an illustration consider the word koor ‘man’ in Seereer-Siin (or Siin-Gandum, the most conservative variety of Seereer in this respect, spoken in the Sine region of Senegal; see Faye 2013: 3, 9). This nominal root may occur, with distinctive morphology, in several of the sixteen NCs of the language (see Mc Laughlin 2000: 336)—eleven of them displaying overt class prefixes, five lacking
them, all selecting class-marked enclitic determiners—thus generating word forms such as the following (see also Mc Laughlin 1997: 6): (10)
Seereer-Siin koor ‘man’
o-koor-oxe        Class 1     singular
goor-we           Class 2     plural
o-ŋgoor-oɴɢe      Class 12    diminutive singular
fo-ŋgoor-ne       Class 13    diminutive plural
a-ŋgoor-ale       Class 3b    augmentative singular
(-)man-
In (10), one observes consonant mutation on the stem-initial consonant, exemplified here with koor ‘man’, which appears as koor, goor, or ngoor, depending on the class: ‘Stem-initial consonant mutation in Seereer-Siin is morphologically conditioned by noun class in nouns and dependent adjectives’ (Mc Laughlin 2000: 335). Fula, on the other hand, has twenty-one to twenty-five NCs, according to dialects,⁹ and having lost all NC prefixes, contrasts NCs by means of suffixes,¹⁰ on nouns as well as on agreement targets, resulting in very elaborate paradigms. The initial consonant of both stems and suffixes is subject to mutation, whose effects are exemplified in (11)–(12) with data from the dialect of Gombe (Northern Nigeria), excerpted from the detailed account offered by Arnott (1970: 79–109): (11)
Fula, Gombe, N. Nigeria (Arnott 1970: 87). Suffix grades, lexically selected (invariable stems):
Grade A          Grade B      Grade C       Grade D          Class   Gloss (grammatical)
ɓoy-re           leemuu-re    tummu-de      loo-nde          9       ‘x’
ɓoy-e            leemuu-je    tummu-ɗe      loo-ɗe           24      ‘x’s’
ɓoy-el           leemu-yel    tummu-gel     loo-ŋgel         3       ‘small x’
ɓoy-um           leemu-yum    tummu-gum     loo-ŋgum         5       ‘worthless little x’
ɓoy-on           leemu-hon    tummu-kon     loo-kon          6       ‘small x’s’
ɓoy-a            leemu-wa     tummu-ga      loo-ŋga          7       ‘big x’
ɓoy-o            leemu-ho     tummu-ko      loo-ko           8       ‘big x’s’
‘baobab fruit’   ‘orange’     ‘calabash’    ‘storage pot’            Gloss (lexical)
ɓoy-             leemu(u)-    tummu-        loo-                     Stem
The horizontal dimension shows grade alternation in suffixes, while on the vertical dimension an arbitrary selection of NCs is offered for illustration. For nominal stems, the grade depends on the class, which in turn correlates largely
⁹ For the Senegalese variety of Pulaar Mc Laughlin (1997: 7) describes twenty-one NCs, while twenty-two are reported for the one described by Sylla (1982: 31) and twenty-five for the Gombe dialect (Northern Nigeria) described by Arnott (1970: 75).
¹⁰ This ‘affix renewal’ occurs not only in North Atlantic, as also in ‘at least one language of South Atlantic, Kisi, the normally prefixed NCMs [= noun class markers] are suffixed’ (Childs 2009: 117; see Childs 1983 and the recent discussion by Di Garbo 2014: 80).
(though not perfectly, see Arnott 1970: 73) with the semantics, as shown in the gloss column on the right-hand side: thus, for instance, class 24 hosts word forms which are plural to class 9; class 3 is the corresponding diminutive singular, which pluralizes in turn as class 6; class 5 is diminutive/pejorative; and so on. For suffixes, by contrast, the grade is lexically selected by the (lexical specification of the) stem. The data in (11) exemplify invariable stems, where only class suffixes vary according to the class-dependent consonant grade, while the noun stem stays the same because its initial consonant is an invariable one, not involved in consonant mutations, observed here only on suffixes. Thus, for instance, in class 9 the forms -re, -de, -nde, marking different grades, are related morphonologically via mutation with each other, and are selected by the individual noun lexemes so that, for example, ‘baobab fruits’ cannot be *ɓoy-je/-ɗe (i.e., cannot take plural class 24 suffixes of grades B–D) because of lexical specification. The nouns in (12), by contrast, exemplify what Arnott (1970: 93) calls ‘variform’ stems (only some consonant alternations are displayed here, as selected by grades A, C, and D; in other words, (12) displays an arbitrary selection, not only of noun classes, but also of grades and consonant alternations; the reader is referred to Arnott’s description for a full account of the intricacies of this fascinating system): (12)
Fula, Gombe, N. Nigeria (Arnott 1970: 98). Consonant alternation in noun stems of different grades:
Grade A       Grade A     Grade C     Grade D              Suffix grade (selected)
r/d/nd        w/b/mb      w/g/ŋg      y/g/ŋg               C- alternation on stem
                                                   Class   Gloss (grammatical)
dim-o         beer-o      gor-ko      gim-ɗo       1       ‘x’
rim-ɓe        weer-ɓe     wor-ɓe      yim-ɓe       2       ‘x’s’
dim-el        beer-el     gor-gel     gim-ŋgel     3       ‘small x’
dim-um        beer-um     gor-gum     gim-ŋgum     5       ‘worthless little x’
ndim-on       mbeer-on    ŋgor-kon    ŋgim-kon     6       ‘small x’s’
ndim-a        mbeer-a     ŋgor-ga     ŋgim-ŋga     7       ‘big x’
ndim-o        mbeer-o     ŋgor-ko     ŋgim-ko      8       ‘big x’s’
‘free man’    ‘host’      ‘man’       ‘person’             Gloss (lexical)
rim-          weer-       wor-        yim-                 Stem
For instance, the first two stems rim- ‘free man’ and weer- ‘host’ select the same class suffixes (both grade A) but differ in the initial consonant, while the other two, wor- ‘man’ and yim- ‘person’, select allomorphs of the class suffixes which differ from each other, apart from some syncretisms (seen in classes 6 and 8). Thus, for instance dim-o, gor-ko and gim-ɗo all display what is morphologically the same class 1 suffix, but in different allomorphs.
In other words, what we have here is different inflectional classes, in spite of the overall agglutinating character of Fula morphology. The Fula situation, as for the selection of the forms of each NC suffix, is closer to that of an inflecting-fusional language like Italian, with ICs, than to that of a strongly agglutinating language like Turkish, without inflectional classes, as schematized in (13): (13)
inflectional classes in Fula?
                                                                           a. Turkish   b. Fula   c. Italian
i.  alternative forms of the inflections are related phonologically        +                      –
ii. alternative forms of the inflections are selected phonologically       +            –         –
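The contrast between (13a) and (13c) can be made concrete with a small Python sketch, offered as an illustration under simplifying assumptions rather than as an analysis of either language. For Turkish, the plural allomorph is computed from the last vowel of the noun alone. For Italian, the endings have to be looked up through a lexically stored inflectional class. The function names, the class labels `"e/i"` and `"o/i"`, and the two-item toy lexicon are introduced here for illustration only.

```python
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")


def turkish_plural(noun):
    """Select -ler/-lar phonologically, from the last vowel of the noun."""
    for ch in reversed(noun):
        if ch in FRONT_VOWELS:
            return noun + "ler"
        if ch in BACK_VOWELS:
            return noun + "lar"
    raise ValueError(f"no vowel found in {noun!r}")


# Toy Italian lexicon: the inflectional class is stored with each stem,
# because it cannot be computed from the stem's sound shape.
ITALIAN_CLASS = {"can": "e/i", "lup": "o/i"}
ITALIAN_ENDINGS = {"e/i": ("e", "i"), "o/i": ("o", "i")}


def italian_forms(stem):
    """Return (singular, plural) via the lexically specified class."""
    singular, plural = ITALIAN_ENDINGS[ITALIAN_CLASS[stem]]
    return stem + singular, stem + plural


if __name__ == "__main__":
    print(turkish_plural("ev"), turkish_plural("yol"))
    print(italian_forms("can"), italian_forms("lup"))
```

The Turkish function needs no lexicon at all, whereas the Italian one fails for any stem whose class has not been listed. That extra listing is the additional structure that defines inflectional classes.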
Turkish has no inflectional classes, since the alternants of each affix are selected phonologically (e.g., ev/ev-ler ‘house/-s’ vs. yol/yol-lar ‘trip/-s’, with plural -ler/-lar depending on the front/backness of the root vowel), while Italian has them, because nouns such as can-e/can-i ‘dog()-/-’ vs. lup-o/lup-i ‘wolf()-/-’ take different singular endings, not derivable from each other phonologically ((13i)), due to lexical specification ((13ii)).¹¹ In Fula too, ‘there seems no advantage in treating all suffixes of each class as morphophonemic variants of a single class suffix’ (Arnott 1970: 68). In fact, while in some cases one observes, between different suffix grades, alternations that could be accounted for through independently valid morphonological rules of the language (e.g., the alternation between voiced and voiced prenasalized stops between Grades C–D in Classes 3, 5, or 7), this cannot be generalized, since, for example, in Class 1 -ko (Grade C) and -ɗo (Grade D) are not related morphonologically. Thus, Fula differs in this respect from an ideally agglutinative language such as Turkish and rather resembles Italian, where inflections are selected depending on inflectional class (a lexeme-inherent purely morphological property) and are not derived by morphonological rule from one another. In sum, there is no alternative but to recognize the occurrence of inflectional classes in Fula too, though this—as highlighted in Babou & Loporcaro (2016: 44)—is a descriptive notion which is hardly used in the grammars of Atlantic languages. More generally, Atlantic languages offer interesting evidence for the rise of inflectional classes within an agglutinating system.¹² This applies also to the Ñuun
¹¹ Here, an editorial comment asked: ‘why not analyse -o/-e as part of the stem truncated before plural -i?’. This corresponds to Scalise’s (1983: 293–4) vowel deletion rule, and the alternative between the two is indeed a handbook topic in Italian morphology: the reader is referred to Thornton (2005: 160), who shows that this readjustment rule becomes superfluous under a word and paradigm approach to morphology.
¹² An anonymous reviewer comments that, with the present discussion, ‘The author seems to suggest that inflectional classes of nouns are an innovation in the history of individual languages’. Actually, one must recognize ICs for previous stages of Atlantic languages: as observed in n. 16, the same mechanisms of consonant gradation responsible for IC-contrasts in Fula are currently assumed
language/dialect cluster, the alternative closest relevant comparator languages for Wolof under the second classification hypothesis in section 6.2. For Baïnounk, as shown for different dialects by Sauvageot (1967), Bao Diop (2015; on Baïnounk Gunyamolo) and Cobbinah (2010; on Baïnounk Gubaher), one has to assume inflectional classes, since not all nouns are subject to singular and plural formation via NC prefixes. Rather, in Baïnounk Gubaher (spoken in the village of Djibonker, south of Ziguinchor, in the Casamance), analysed by Cobbinah (2010: 182–7), only one subset of the noun lexemes forms singular and plural prefixally ((14a)), while another substantial subset displays suffixal plurals formed with a default suffix -Vŋ, and divides into a group with plural suffix only ((14b)) and a mixed group combining a plural class-marked prefix and the plural class-neutral suffix ((14c)):¹³ (14)
Baïnounk Gubaher (Cobbinah 2010: 182–7)
a. prefixal class marking, paired for singular and plural: for example, ra-maːsix ‘crab’/ ɟa-maːsix
b. no prefix in the singular; plural suffix (class-neutral -Vŋ): for example, bәːb ‘father’/ bәːb-әŋ ‘fathers, old men’
c. prefixal class marking in the singular; plural with prefix and class-neutral suffix: bә-kәr ‘chicken’/ bә-kәr-әŋ
While (14a) mirrors the inherited Niger-Congo noun inflection, the rest is the product of a series of innovations (e.g., the prefixes occurring in type (14c) nouns ‘do not occur as singular prefixes in the paired prefixed groups or if so then only very rarely’; Cobbinah 2010: 186), which makes the recognition of different inflectional classes, as schematized in (14), necessary, even if the combination of morphs in noun word forms largely stayed agglutinative, rather than fusional, in nature. This evidence could be multiplied, another case in point being, for example, Diallo’s (2010; 2014: 151–81) study of the adaptation of borrowed Mande nouns leading to the creation of inflectional classes (not present in the native lexicon) in Fuuta-Jaloo Pular, the Fula variety spoken in the Fuuta-Jaloo area in Guinea. This shows that a trend towards the creation of allomorphy in nominal paradigms (and of new inflectional class distinctions) is observed all over the area.
for earlier stages of Wolof as well. However, this is orthogonal to the fact that new morphological irregularities, defining (new types of) ICs, can be shown to have arisen, as is the case with the stem alternations in (9), which define (residual) ICs (a) of a kind different from that reconstructed for earlier stages of Atlantic, and (b) that are not usually recognized in the literature, before Babou & Loporcaro (2016). ¹³ Pozdniakov (2015: 79–82) reviews pluralizing suffixes (-Vn/ŋ) from different Atlantic languages suggesting that they may be etymologically related with the plural class marker for humans reflected in Wolof as ñ-.
6.5.4 The complexification of Wolof noun inflection
As seen in section 6.5.3, Wolof is thus not the only Atlantic language to have developed morphological irregularities of the kind found in fusional languages. Since such irregularities add to morphological complexity (section 6.5.1), one must recognize that even the morphological system of Wolof, much less rich than those seen in section 6.5.3, has developed new forms of complexity. Recapitulating so far, the marking of NC contrasts in the North Atlantic languages considered above can be summarized as follows (after Mc Laughlin 1997: 7, with one small modification):¹⁴
(15) Class markers in some North Atlantic languages (Mc Laughlin 1997: 7, revised)
a. Seereer-Siin    √           √           √
b. Fula            √           √           √
c. Wolof           (traces)    (traces)    √
As seen for Fula in (11)–(12), in this language consonant mutations and suffixation (which replaced prefixation in the affix renewal process: see n. 10) are involved in lexically conditioned allomorphy defining inflectional classes. Some remnants of this situation persist in Wolof ((15c)), though this has neither class prefixes nor class-marked clitics nor suffixes but, in its present state, marks NC only on determiners. These remnants are the singular/plural alternations in (9), which concerned many more lexemes in the nineteenth century, as shown in (16), listing lexemes which now have lost consonant alternation but still had it according to nineteenth-century sources:
(16) Becher (2001: 50f): nouns with allomorphy in Boilat (1858) and Kobès (1875)
Singular      Plural        Gloss        Singular/plural today (Fal et al. 1990)
banta bi      wanta yi      ‘stock’      bant bi/yi ‘bit of wood’
badoolo mi    wadoolo yi    ‘peasant’    baadolo bi/yi
bakan bi      wakan yi      ‘nose’       bakkan bi/yi
bopa bi       gopa yi       ‘head’       bopp bi/yi
garab gi      yarab yi      ‘tree’       garab gi/yi
Further language-internal evidence comes from the indefinite article, which is the only noun determiner to occur categorically in pre-nominal position (while
¹⁴ The modification consists in indicating the occurrence of traces of earlier prefixes for Wolof: see (9) as well as the diachronic data in (16)–(17). In particular, I am non-committal about Mc Laughlin’s distinction between ‘clitic determiners’ and ‘independent determiners’, a distinction one anonymous reviewer finds fault with: ‘I have serious doubts about the validity of the distinction between “clitic determiners” and “independent determiners”.’
definite article and demonstratives normally follow the noun, though demonstratives can also be preposed), and the only one to display the class marker after, rather than before, its class-invariable part. According to Doneux (1975: 49), this doubly exceptional distribution arose via reanalysis of earlier prefixes, reconstructed as seen in (17a): (17)
Doneux (1975: 49): Wolof prenominal article < former class prefix on noun
a. a-b sëriñ ‘a healer’ < *a-b-sëriñ
b. sëriñ b-i ‘the healer’ < *bi-sëriñ b-i (bixirim, AD 1594; Ferronha 1994: 24f)
Converging documentary evidence for earlier prefixes, seen in (17b), comes from a Portuguese voyager, who—writing in 1594—calls bixirim what is today sëriñ b-i ‘the healer’, which is evidence, as Doneux comments, ‘qu’un préfixe (probablement figé) était encore utilisé à cette époque’ [‘that a (probably frozen) prefix was still in use at that time’] (Doneux 1975: 45). While in this lexeme, as in most Wolof nouns, the prefix has been simply dropped, one may argue that some of today’s irregular singular/plural alternations in Wolof (seen above in (9)) show the traces of former class prefixes, which have become fused with the stem, as observed also in other Atlantic languages.¹⁵ Among those irregular alternations, some others come instead from consonant mutations, which are regularly involved in NC inflection in other Atlantic languages (cf. (15a–b) and the examples above in (10)–(12)). In Wolof, consonant mutation is still regular in some derivational processes, such as diminutive or deverbal noun formation: (18)
a. diminutive formation:
   garab gi ‘the tree’            →  ngarab si ‘the little tree’
   janq bi ‘the little girl’      →  njanq si ‘the very little girl’
b. deverbal noun formation:
   digël ‘advise’                 →  ndigël li ‘the advice’
   jang ‘study’                   →  njang mi ‘the education/knowledge’
The overall mutation pattern, as observed in today’s derivational morphology, is as follows: (19)
Wolof consonant mutations (Mc Laughlin 1997: 4):
a. base/non-diminutive         b     d     j     g     s    x    ʔ
b. derivative/diminutive       mb    nd    nj    ng    c    q    k
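The pairs in (19) amount to a lookup on the stem-initial consonant, which the following Python sketch makes explicit. It is a minimal illustration that implements only the seven pairs listed in (19) and assumes that the glottal stop is written as `ʔ` in the input. The names `MUTATION` and `mutate_initial` are hypothetical, and the determiner that accompanies the derived noun in (18) is not modelled.

```python
# Base initial consonant -> mutated initial consonant, as listed in (19).
MUTATION = {
    "b": "mb",
    "d": "nd",
    "j": "nj",
    "g": "ng",
    "s": "c",
    "x": "q",
    "ʔ": "k",
}


def mutate_initial(stem):
    """Apply the mutation in (19) to the first segment of `stem`.

    Stems whose initial consonant is not listed in (19) are returned unchanged.
    """
    first = stem[:1]
    if first in MUTATION:
        return MUTATION[first] + stem[1:]
    return stem


if __name__ == "__main__":
    for base in ["garab", "janq", "digël", "jang"]:
        print(base, "->", mutate_initial(base))
```

Applied to the bases in (18), the function returns ngarab, njanq, ndigël, and njang, matching the derived stems given there.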
In noun inflection, however, there is no regular mechanism of consonant mutation contrary to Seereer-Siin and Fula ((15a–b)), but inflectional alternations—nowadays ¹⁵ This has been remarked by many scholars: cf. Pozdniakov & Robert (2015: 551) for a recent recapitulation. As for other Atlantic languages, see, for example, Cobbinah (2010: 189) on the so-called ‘literal alliterative concord’ in Baïnunk: ‘the disputed elements [ . . . ] are archaic noun class morphemes in different stages of fusion with the stem’.
irregular—such as (9a–d) and maybe (9h) can be interpreted as remnants thereof.¹⁶ Conversely, alternations such as këf/yëf ((9e)) must go back to original prefixes, as suggested by alliteration with the class-marked determiners, while they cannot possibly come from consonant mutation because—as seen also in (18)–(19)—this only involves homorganic consonants in all Atlantic languages: ‘the range of variation, called a , is always restricted to homo-organic consonants, e.g. f/p/mp’ (Sapir 1971: 65). By the same token, one can argue that, e.g., pepp mi/fepp yi (9d) ‘the grain/-s’ may have arisen as an instance of a nowadays lost type of consonant mutation. Summing up, the regular mechanisms occurring elsewhere in the noun inflection of other Atlantic languages—consonant mutation and class prefixation—have been conflated into a synchronic system for which one has no other choice but to assume (residual) inflectional classes, that is, that kind of morphological complexity usually occurring in inflecting-fusional languages.
6.6 Complexification in Wolof: paradigmatic irregularity in some agreement targets
Concluding section 6.4, I mentioned changes which led to the rise of morphological irregularity also in the paradigm of agreement targets: in fact, in the indefinite article, some defective and otherwise irregular paradigms have been created in Wolof, which are not inherited from Proto-Atlantic. This boils down to an increase in formulaic complexity (descriptive and generative), in Rescher’s (1998: 9) terms. To see this, however, we have to abandon morphology proper and consider morphosyntax, since agreement is a crucial criterion to establish the irregular paradigms I will be concerned with. The agreement facts at stake crucially involve the recognition (as in Babou & Loporcaro 2016) of two additional NCs in the plural (the last two markers in (20b)) with respect to the current view ((1), repeated here in (20a)):
(20)
a. Wolof: eight singular and two plural classes (traditional analysis):
   NC marker   singular: b- g- k- j- l- s- m- w-   plural: y- ñ-
b. Wolof: eight singular and four plural classes (Babou & Loporcaro 2016):
   NC marker   singular: b- g- k- j- l- m- s- w-   plural: y- ñ- j- s-
The singular/plural pairings of NCs traditionally recognized, even in the most accurate treatments available before Babou & Loporcaro (2016), are schematized in (21a–b) (from Guérin 2011: 84, who highlights that most singular classes combine with both the traditionally recognized plurals, thus resulting in (21b) rather than (21a)), while (21c) schematizes Babou & Loporcaro’s (2016) account:¹⁷
(21) [Diagram of singular/plural NC pairings in three panels. (a) Expected pairings and (b) Observed pairings: singular k-, g-, j-, m-, s-, l-, b-, w-; plural ñ-, y-. (c) Observed pairings: singular k-, g-, j-, l-, m-, s-, w-, b-; plural ñ-, y-, j-, s-.]
¹⁶ See Pozdniakov (1993: 85) and Pozdniakov & Robert (2015: 552f) for a reconstruction of the set of initial consonant mutations (richer than the one still observed today in (19)) involved in NC-related alternations in an earlier stage of Wolof.
¹⁷ Singular/plural pairings of NCs define distinct genders: cf. Corbett’s (1991: 190f) analysis of Wolof and Fula.
Note preliminarily that several of the pairings in (21), as well as several of the NCs themselves, are established based on small amounts of lexemes. This ‘inquorate’ character (in Corbett’s 1991: 170–5 terms) is however a normal situation in Atlantic languages, as remarked by Ferry & Pozdniakov (2001: 166): Il est faux de penser que chaque appariement de classe nominale, faiblement représenté, refléterait un figement ou la disparition de prefixes ayant existé. Les langues atlantiques se caractérisent par un trait particulier: on y rencontre souvent une classe spéciale ne comportant que deux ou trois noms ou même un seul. [ . . . ] Chaque langue atlantique présente au moins un mot ayant un accord statistiquement rare, irrégulier, qui traduit une notion sélectionné et marquée dans cette culture précise. [It is wrong to think that each weakly represented NC pairing reflects the fixation or the disappearing of prefixes that once existed. Atlantic languages are characterized by a particular feature: in these languages, one often comes across a special class featuring no more than two or three nouns, or even just one. [ . . . ] Each Atlantic language displays at least one word that has a statistically rare, irregular agreement pattern, which translates a selected and specific notion in that very culture.]
Thus, if a consistent syntactic behaviour, distinct from that of other NCs, can be identified for a set of nouns, however small, this must count as evidence to establish a separate NC. This is what Babou & Loporcaro (2016) did for two additional NCs, the plural classes ji and si. These are homophonous with two singular classes, but must be kept distinct from them because they differ in the agreements they trigger. This is a principle of method that holds in general and is standardly applied also in studies of the Atlantic languages. For example, consider Arnott’s (1970: 72) account of the two homophonous ko classes of Gombe Fula (classes 20 and 8), one singular, one plural, distinguished by agreement: There are two ko classes (8 and 20), with agreement marked by -o, -ho, -ko, ko-, ko elements, etc.; but they are distinguished (i) by the different category of initial consonant in full nominals (F-category in class 20, N-category in class 8 [ . . . ]), and (ii) by the different pattern of agreement with verbal radicals [ . . . ], class 20 being a singular class requiring F- or P-category initial in the verbal radical, while class 8 is a plural class requiring N-category initial in the radical, e.g.:
20   huɗo         ko’o   wonnake    this grass has got spoiled
but
8    mbinndirko   ko’o   mbonnake   these big pens have got spoiled
Exactly the same happens in Wolof, where what are in fact two pairs of distinct classes have previously been conflated, disregarding the evidence from verb agreement. This is in fact the only morphosyntactic diagnostic, independent from class-marker assignment, allowing one to assess the difference between singular and plural NCs in Wolof. Applying the agreement test, it is easy to see that what has been previously lumped together into one class, the si NC, indeed consists of two distinct NCs. On the one hand, si is selected by singular nouns such as soble ‘onion’, in (22a), whose plural is soble yi ((22b)):
(22)
a. soble   s-i   baax   na   / *na-ñu
   onion   .-.   good   .3   / -3
   ‘the onion is good’
b. soble   y-i   baax   na-ñu   / *na
   onion   .-.   good   -3      / .3
   ‘onions are good’
On the other hand, other nouns that select si, viz. those in (23a), take plural verb agreement (while, of course, when used in the singular the same nouns take another class marker, as in (23b)): (23)
a. Séeréer   s-i   jekk       na-ñu   / *na
   Seereer   .-.   handsome   -3      / .3
   ‘the Seereers are handsome’
   sëriñ     s-i   ñów        na-ñu   / *na
   healer    .-.   arrive     -3      / .3
   ‘the healers have arrived’
b. Séeréer   b-i   jekk       na   / *na-ñu
   Seereer   .-.   handsome   .3   / -3
   ‘the Seereer is handsome’
   sëriñ     b-i   ñów        na   / *na-ñu
   healer    .-.   arrive     .3   / -3
   ‘the healer has arrived’
The same can be repeated for plural ji (jeeg/janq ji ‘the women/little girls’, (24b)), which is distinct from singular ji, seen in (2d) and exemplified again in (24c): (24)
a. jeeg/janq          b-i   sonn    na   / *na-ñu
   lady/little girl   .-.   tired   .3   / -3
   ‘the lady/little girl is tired’
b. jeeg/janq          j-i   sonn    na-ñu   / *na
   lady/little girl   .-.   tired   -3      / .3
   ‘the ladies/little girls are tired’
c. jigéen   j-i   sonn    na   / *na-ñu
   woman    .-.   tired   .3   / -3
   ‘the woman is tired’
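The agreement test applied in (22)–(24) amounts to grouping nouns by the combination of class marker and verb agreement rather than by class marker alone. A minimal sketch of that reasoning follows; the data structure and function names are hypothetical, and the observations are limited to the forms cited in the examples above.

```python
from collections import defaultdict

# Observations from (22)-(24): (noun, class marker, verb agreement).
OBSERVATIONS = [
    ("soble", "s", "sg"),
    ("soble", "y", "pl"),
    ("Séeréer", "s", "pl"),
    ("Séeréer", "b", "sg"),
    ("sëriñ", "s", "pl"),
    ("sëriñ", "b", "sg"),
    ("jeeg", "b", "sg"),
    ("jeeg", "j", "pl"),
    ("jigéen", "j", "sg"),
]


def noun_classes(observations):
    """Group nouns by (marker, agreement); each distinct key is one NC."""
    classes = defaultdict(set)
    for noun, marker, agreement in observations:
        classes[(marker, agreement)].add(noun)
    return dict(classes)


def homophonous_markers(classes):
    """Return markers attested with both singular and plural agreement."""
    by_marker = defaultdict(set)
    for marker, agreement in classes:
        by_marker[marker].add(agreement)
    return sorted(m for m, agr in by_marker.items() if len(agr) > 1)


if __name__ == "__main__":
    classes = noun_classes(OBSERVATIONS)
    print(homophonous_markers(classes))
```

On this toy data set the markers j and s each fall into two groups, which is the pattern the text takes as evidence for separate singular and plural classes.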
The fact that these are plurals has been overlooked in the literature on Wolof up to now because traditionally plurals such as Séeréer si and jeeg ji have been called ‘collective’, in the wake of Sauvageot’s (1965: 73) influential statement: A l’opposition de nombre singulier/pluriel, s’ajoute celle du collectif. Ce dernier a pour particularités a) de ne pas posséder d’expression propre le distinguant du singulier; b) de ne pas avoir de correspondant pluriel. [To the singular/plural number contrast, one has to add that of collective. The peculiarities of the latter are: a) it does not possess a dedicated expression distinguishing it from the singular, b) it has no corresponding plural.]
There are indeed other African languages—also within the Atlantic family—for which it is justified to assume a separate value of the category ‘number’, which is called traditionally ‘collective’ (cf., e.g., Sapir 1965: 61, 64, on Diola-Fogny), or ‘collective plural’: In addition to the first plural, used with countable nouns, many nouns can combine with a second plural, which is a collective plural for non-countable quantities, or non-specified numbers of entities (Cobbinah 2010: 184)
The author, describing Baïnounk Gubaher, refers to triplets such as the following: (25)
a. ra-maːsix   ran-de
   -crab       .-big
   ‘big crab’
b. ɲa-maːsix   ɲa-naːk
   -crab       .-two
   ‘two crabs’ (count plural)
c. ɟa-maːsix   ɟa-ŋaːn
   -crab       .-.
   ‘those crabs’ (collective plural)
Alternative terminologies include ‘pluriel limité ≠ illimité’ (Sauvageot 1967: 227 on Baïnounk Gunyamolo) or ‘greater plural’ vs. unmarked plural (Corbett 2000: 31):
A potentially interesting case of a language with a greater plural is Banyun [ . . . ]. Nouns typically have singular and plural, distinguished by prefixes of the type shared by many Niger-Kordofanian languages [ . . . ]. In addition there is a greater plural (which Sauvageot calls ‘unlimited’) [ . . . ] which Sauvageot suggests is used when the number cannot be counted or the speaker feels it unnecessary.¹⁸
¹⁸ To illustrate, Corbett (2000: 31) cites the paradigm bu-sumɔl ‘snake’ singular ≠ i-sumɔl ‘snakes’ plural ≠ ba-sumɔl ‘snakes’ greater plural.
Unlike in these languages, however, in Wolof there is never a three-way contrast of the kind observed, for example, in Baïnounk Gubaher ((25)), and verb agreement guarantees that the contrast is binary, singular vs. plural. The same two pairs of NCs newly recognized in (21c)—singular vs. plural ji and si—are crucial to illustrate the rise of morphological irregularities observed in the paradigm of the indefinite article. Its regular formation is schematized with three nouns from different classes in (26b), compared with that of the definite article ((26a)): (26) definite vs. indefinite article formation in Wolof
            ‘dog’                ‘cat’                  ‘jackal’
            sg        pl         sg         pl          sg         pl
a. def      xaj bi    xaj yi     muus mi    muus yi     till gi    till yi
b. indf     ab xaj    ay xaj     am muus    ay muus     ag till    ay till
The indefinite article, as shown above in (17a), is the only determiner in which the class marker follows the class-invariable part, thus becoming the final consonant. As exemplified in (26), and schematized in (27a), in the regular case there is a correspondence between this final consonant and the initial one occurring as a class marker in other determiners. In addition, however, as illustrated in (27b–c), there are two irregular patterns: (27)
a. regular determiner paradigm
          sg      pl
   def    C1-i    C2-i
   indf   a-C1    a-C2
   agreement classes (= sg./pl. pairings of NCs): bi/yi, ki/yi, gi/yi, mi/yi, si/yi, wi/yi
b. irregular determiner paradigm
          sg      pl
   def    C1-i    C2-i
   indf   a-C1    a-y
   agreement classes (= sg./pl. pairings of NCs): ki/ñi, gi/ñi, mi/ñi, si/ñi, bi/ñi, bi/ji, bi/si
c. defective determiner paradigm
          sg      pl
   def    C1-i    C2-i
   indf   *       a-y
   agreement classes (= sg./pl. pairings of NCs): ji/yi, ji/ñi, li/yi, li/ñi
Paradigm (27b) shows a deviation from the regular formation by which the class-marking consonant yields to y- in the indefinite plural, while in (27c) the indefinite article paradigm is defective, lacking the singular form. In the available literature, the occurrence of ay instead of expected a-C₂ is usually recognized for ñi plurals, seen in (3) above:
(28) a-y/*a-ñ   nit/góor/jigéen/mag/ndaw/gan
     -.         person/man/woman/adult/youngster/guest
     ‘(some) persons/men/women/adults/youngsters/guests’
In addition to the pairings involving ñi plurals, however, the list in (27b) also includes the two ‘new’ plural NCs in (20b). In fact, as illustrated in (29b)–(30b), plural ji and si both select the default class marker -y in the indefinite article, on a par with ñi, while indefinite plural *aj and *as do not occur: (29)
a. a-b        jeeg/janq          ñów      na/*na-ñu
   -.         lady/little girl   arrive   .3/-3
   ‘a lady/little girl has arrived’
b. a-y/*a-j   jeeg/janq          ñów      na-ñu/*na
   -.         lady/little girl   arrive   -3/.3
   ‘some ladies/little girls have arrived’
(30)
a. a-b        sàmm/Séeréer/sëriñ        ñów      na/*na-ñu
   -.         shepherd/Seereer/healer   arrive   .3/-3
   ‘a shepherd/Seereer/healer has arrived’
b. a-y/*a-s   sàmm/Séeréer/sëriñ        ñów      na-ñu/*na
   -.         shepherd/Seereer/healer   arrive   -3/.3
   ‘some shepherds/Seereers/healers have arrived’
This provides a further argument against the traditional analysis of Wolof NCs in (20a), because singular si and singular ji, the classes with which our two ‘new’ plural classes were earlier confused, do not behave in the same way. Rather, the singular si class forms the indefinite article regularly, as seen in (31a), while singular ji, as shown in (31b), exemplifies the other type of irregularity observed in the paradigms of the indefinite article, that is, defectiveness ((27c)):
(31)
a. a-s         soxna/gor                   ñów      na/*na-ñu
   -.          honourable lady/free man    arrive   .3/-3
   ‘an honourable lady/a free man has arrived’
b. *a-j/*a-y   jigéen/yaay/jabar           ñów      na/*na-ñu
   -.          woman/mother/wife           arrive   .3/-3
   intended: ‘a woman/mother/wife has arrived’
In fact, it is not possible at all to form the indefinite article from this class. In order to convey the same meaning, one has to have recourse to suppletion and use instead the (regularly class-marked) form of the numeral C-enn ‘one’, as shown in (32a). This defectiveness also concerns the li class, or the li/yi and li/ñi pairings listed in (27c), as exemplified in (32b) by ndab and ndaw, respectively:
(32)
a. j-enn/*a-j/*a-b   jigéen/yaay/jabar   ñów      na/*na-ñu
   .-one/-.          woman/mother/wife   arrive   .3/-3
   ‘a/one woman/mother/wife has arrived’
b. l-enn/*a-l/*a-b   ndab/ndaw
   .-one/-.          dish/youngster
   ‘one/a dish/youngster’
The scheme in (33) recapitulates the different kinds of irregularity found in the paradigm of the indefinite article ((33d)), compared with two regular function words, and brings out the differences between singular and plural ji and si:¹⁹
(33) Irregularity in the indefinite article in Wolof
                     singular                                          plural
a. class marker      b-    g-    k-    j-    l-    m-    s-    w-     y-    ñ-    j-    s-
b. def article       bi    gi    ki    ji    li    mi    si    wi     yi    ñi    ji    si
c. numeral ‘one’     benn  genn  kenn  jenn  lenn  menn  senn  wenn   yenn  ñenn  jenn  senn
d. indf article      ab    ag    ak    *     *     am    as    aw     ay    ay    ay    ay
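The three rows of forms in (33) can be restated as three small rules: the definite article and the numeral ‘one’ are built regularly from the class marker, while the indefinite article is neutralized to ay in every plural class and is missing for singular j- and l-. The sketch below encodes only what the table shows; all names are hypothetical.

```python
# Class markers as listed in (20b) and (33a).
SINGULAR = ["b", "g", "k", "j", "l", "m", "s", "w"]
PLURAL = ["y", "ñ", "j", "s"]

# Singular classes with no indefinite article form (the starred cells in (33d)).
DEFECTIVE_SINGULAR = {"j", "l"}


def definite(marker: str) -> str:
    """Regular definite article: class marker plus -i."""
    return marker + "i"


def numeral_one(marker: str) -> str:
    """Regular class-marked numeral 'one': class marker plus -enn."""
    return marker + "enn"


def indefinite(marker: str, number: str):
    """Indefinite article, or None where the paradigm is defective."""
    if number not in ("sg", "pl"):
        raise ValueError("number must be 'sg' or 'pl'")
    if number == "pl":
        return "ay"  # every plural class takes the default -y
    if marker in DEFECTIVE_SINGULAR:
        return None  # suppletion by the numeral 'one' is used instead
    return "a" + marker


if __name__ == "__main__":
    for m in SINGULAR:
        print(m, definite(m), numeral_one(m), indefinite(m, "sg"))
    for m in PLURAL:
        print(m, definite(m), numeral_one(m), indefinite(m, "pl"))
```

Passing the number explicitly is what keeps singular and plural j- and s- apart, mirroring the argument that the class marker alone does not identify the class.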
To conclude, not only change in noun inflection but also change in agreement target morphology has created new irregularities in Wolof, which add to complexity in a way that had largely gone unnoticed under the traditional (but, arguably, incorrect) view of Wolof NCs in (20a). This ‘local complexification’, which yields a more realistic view of Wolof morphology and morphosyntax, can be viewed as an ‘accident’ along a path in which the overall tendency is, for noun morphology, from agglutinating towards isolating: not only are the inherited prefixed NC markers long gone, but also the inflectional irregularities (stem alternations) seen in (9), partly arisen from them, are on their way to disappearing.²⁰ In other areas of inflectional morphology, while the verb maintains its agglutinating structure, pronominal and adnominal agreement targets either stay agglutinative (cf., e.g., (33b–c)) or develop paradigmatic irregularities, as seen for the indefinite article in (27b–c), of the kind linguists usually associate with inflecting-fusional type morphology.²¹ Contrary to those in noun morphology, which are in the process of vanishing, the irregularities in the indefinite article are stable as long as the NC system is stable. This, however, is no longer the case in contemporary urban varieties, which leads us to the last section.
¹⁹ Pozdniakov & Robert (2015: 565) provide a similar scheme, without the two plural classes ji and si, and marking a blank for both neutralization (occurrence of ay for ñ- plurals as well as for y- plurals) and defectiveness (non-existence of forms for singular j- and l-).
²⁰ In this transitional stage, however, as argued while concluding section 6.5.2, variation between two cell-mates in the plural adds to overall paradigm complexity.
²¹ That verb and noun inflection can differ, in this respect, within one and the same language, ‘and develop diachronically in typologically different directions’ (Dressler 2005: 7) has been shown by much work on morphological typology (see, e.g., Haspelmath 2009: 25).
6.7 External explanatory factors for structural simplification
Even once the local increase in complexity in noun and determiner morphology addressed above has been recognized, it remains true that, on the whole, Wolof morphology is both less rich and less complex than that of the closely related Atlantic languages mentioned above (and, hence, of the reconstructible common ancestor, under either of the alternative classifications in section 6.2). This impoverishment/simplification, resulting in a ‘restricted system’ (Pozdniakov & Robert 2015), may be traced back to external factors. In fact, Wolof, a vehicular non-native language for a substantial share of its users, is a typical case of a language spoken in an ‘exoteric niche’ (in Lupyan & Dale’s 2010 terms) or in a ‘Type 2 community’, or ‘an extreme “generalized outsider community” ’ (in Kusters’ 2008: 14 terms). The literature on linguistic complexity has addressed the consequences on morphology that are often observed when the percentage of non-native speakers becomes substantial, concluding that languages spoken in such communities are expected to simplify their morphology:
we may conjecture that when a language splits, and one variety becomes more like a Type 1, and the other like a Type 2 community, we expect that the latter becomes simpler in its inflectional morphology. (Kusters 2008: 15)
As McWhorter (2007: 2) puts it, ‘that heavy second-language acquisition decreases structural complexity is thoroughly intuitive to most linguists’ (see also McWhorter, Chapter 10, this volume). On the contrary, a language spoken in a tightly-knit local community by small numbers of speakers may be a favourable setting (as argued by Trudgill 2004b, 2009) for better maintenance of linguistic complexity. If one compares Wolof with Seereer, this seems to provide an explanatory framework, as the latter has slightly more than one million speakers in Senegal and Gambia, and its inflectional morphology remains substantially richer and more complex than Wolof’s (see (10)). However, this is far from yielding a deterministic explanation, as one easily realizes considering that Fula’s inflectional morphology, as seen in (11)–(12), remains both richer and more complex than Wolof’s in spite of the language being spoken by over twenty-two million people spread over eighteen countries. Nonetheless, there is a crucial sociolinguistic fact about Wolof, concerning language attitude and prestige hierarchies, that may be invoked as a precondition of the observed simplification. For this language, in fact, the (conservative) linguistic norm as reflected in school grammars and dictionaries, which is often associated elsewhere with the maintenance of complexity, does not go hand in hand with linguistic and social prestige. Rather, in the Wolof speech community speaking correctly is not prestigious, and this holds true both in rural, socially
traditional areas and in urban ones. In traditional society—as shown by the seminal study by Irvine (1978) and much subsequent work in sociolinguistics (or the ethnography of speaking)—linguistic elaboration and correctness, in keeping with the conservative norm, is associated with griots, low-caste language specialists, regarded as socially inferior in comparison with the ‘géer (“nobles”— farmers, administrators, religious leaders)’ (Irvine 1978: 39). Irvine’s noble informants in general speak in what is considered a less accurate way, involving differences at all levels—as listed by Irvine (2011: 43–5)—from prosody (e.g., flustering style, as opposed to clear voice) to syntax (e.g., incomplete phrase structure, false starts). Simplifying the noun-class system fits into this picture, through what Irvine (1978: 41) labels an ‘appropriate-error strategy’, which crucially involves the generalization of the default class markers bi/yi. The same tendency is observed in urban Wolof as well, as seen in (4c-d) (cf., e.g., Mc Laughlin 2001: 158). Here, the overall strategy to achieve linguistic prestige differs from what is observed in traditional rural social contexts: it is particularly language mixing and extensive borrowing, especially from French in Dakar, which serves the purpose. But all in all, rural and urban society converge, as Irvine (2011: 63f) remarks, in determining higher prestige for ‘bad’, incorrect language: le ‘mauvais’ wolof urbain a quelque chose en commun avec le ‘mauvais wolof ’ des hautes castes rurales. Dans les deux endroits, la ‘plus belle langue wolof ’ n’est pas attribuée aux gens les plus hauts placés. [‘bad’ urban Wolof has something in common with the ‘bad Wolof ’ of rural high castes. In the two settings, the ‘most beautiful Wolof language’ is not attributed to the highest-placed persons.]
Thus, the Wolophone community is not only a Type 2 community, with many non-native speakers, but also one in which native speakers, in both traditional and urban contexts, themselves tend to adopt, as prestigious, modes of linguistic behaviour favouring simplification. This fact can plausibly be invoked as an explanatory factor for the overall structural simplification of morphology and morphosyntax that Wolof has undergone, compared with its ancestor within the Atlantic language family.
Acknowledgements
Thanks to the editors and two anonymous reviewers for comments and constructive criticism on a previous draft, as well as to Cheikh Anta Babou for joint fieldwork on Wolof. Usual disclaimers apply.
II
THE CROSSLINGUISTIC PERSPECTIVE
7 Canonical complexity
Johanna Nichols
7.1 Introduction
Of the various ways of measuring linguistic complexity (see the Introduction to this volume; and Sinnemäki 2011), this chapter focuses on what I will call enumerative complexity (EC) and canonical complexity (CC). EC is also known as taxonomic complexity (Miestamo et al. 2008), resources (Dahl 2004), economy (Kusters 2008), the principle of fewer distinctions (Di Garbo & Miestamo in press; defining non-complexity), inventory complexity (my previous work), and other terms. It is based on assessing the number of elements in an inventory or values in a system, for some domain or domains such as the number of phonemes, genders, tenses, derivation types, alignments, word orders, etc. It has been widely used in typological surveys, chiefly of phonological complexity (Shosted 2006, Hay & Bauer 2007, Nichols 2009, Donohue & Nichols 2011; Bickel & Nichols 2013 for inflectional complexity of verbs), but it has disadvantages. It is straightforward to survey for well-defined and consistently described subsystems such as the phoneme inventory, but guaranteeing comparability of categories elsewhere can raise problems. For example, is it meaningful to compare the sizes of case inventories when a language with few or no cases probably uses adpositions to the same end? Are the number of contrasting members of a (vertically arranged) paradigm and the number of potentially co-occurring morphemes in a templatic structure both inventories and to be compared in the same way? Importantly, EC is not the kind of complexity that figures most interestingly in studies investigating correlations between linguistic complexity and sociolinguistic history, notably Trudgill (2011) and Dahl (2004); there it is non-transparency, not inventory sizes, that is relevant. The other type used here is close to what is known as descriptive complexity or Kolmogorov complexity: the amount of information required to describe a system.
This is a better measure and captures well the non-transparency relevant to learnability and sociolinguistic effects, but it is problematic to measure and compare. Canonicity¹ theory (Corbett 2007, 2013a, 2015, and others), though not a complexity measure in itself, can be used as a good approximation to descriptive complexity and is straightforwardly measurable and comparable (Nichols 2019; see Audring 2017 for a similar approach). The theory aims at improving definitions and technical understanding of linguistic notions. It defines a logical space (for a linguistic concept or structure or system) by determining the central, or ideal, position in that space for each dimension and the whole set of dimensions, and kinds of departures from that ideal. An element is non-canonical to the extent that it departs from the ideal. Essential to defining the ideal position is the structuralist notion of biuniqueness, or ‘one form, one function’: any departure from that ideal is non-canonical. Such departures decrease transparency between function and form or underlying and surface, so the extent or number of non-canonical patterns in a system can also be used as a measure of its non-transparency. The literature of canonicity theory offers a good deal of work on morphological paradigms, which makes it a straightforward matter to identify the non-canonical elements in a paradigm. The approach has the further advantage of being well-grounded in morphological theory yet applicable on its own without requiring adoption of an entire formal framework.
¹ Henceforth I use that term to refer to the theory and its body of exemplar studies, since it is used in the foundational literature, but canonicality when I need to nominalize the adjective canonical (since only canonicality is possible in my English).
Johanna Nichols, Canonical complexity. In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Johanna Nichols. DOI: 10.1093/oso/9780198861287.003.0007
To avoid cumbersome terms like non-canonicality-based complexity or non-canonicity-based complexity, I will use the simpler if less logical phrase CC.² Measuring CC is straightforward in principle: define types of systems and subsystems so as to maximize crosslinguistic comparability, and count the number of non-canonical patterns or elements found in each, for each language. Both EC and CC are what I will call structural measures of complexity: ones that are based on structural analysis and comparison. (Calculations using the measures can of course vary from classic typological method to computational method.)
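The counting procedure just described can be illustrated with a very small tally. This is a sketch only: the language names, subsystem labels, and counts below are invented for illustration and are not survey data, and the function names are hypothetical.

```python
# Invented codings: for each language, the number of non-canonical patterns
# recorded in each (hypothetical) subsystem.
NONCANONICAL = {
    "Language A": {"noun case": 2, "verb person": 1},
    "Language B": {"noun gender": 3, "pronoun case": 1, "verb number": 2},
}


def canonical_complexity(subsystems: dict) -> int:
    """CC score for one language: total non-canonical patterns counted."""
    return sum(subsystems.values())


def cc_by_language(codings: dict) -> dict:
    """Map every language to its CC score."""
    return {lang: canonical_complexity(cells) for lang, cells in codings.items()}


if __name__ == "__main__":
    for language, score in cc_by_language(NONCANONICAL).items():
        print(language, score)
```

Because the score is a plain sum, its comparability across languages depends entirely on the subsystems being defined in the same way for each language, which is the point the text makes about maximizing crosslinguistic comparability.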
There are non-structural methods as well: for example, various kinds of complexity can be recovered computationally from text and lexical corpora (e.g. Bentz et al. 2017, using entropy in parallel corpora), or by measuring the difference in size between compressed and uncompressed copies of a corpus (Juola 1998; Ehret & Szmrecsanyi 2016). However, adequate corpora do not always exist, and the computational know-how or resources required may not be within reach of, say, a fieldworker or historical linguist who wants to attribute a complexity level to one language or describe relative complexity among a few languages. Furthermore, automatically extracted measures and variables are not constrained to reflect best practices in linguistic analysis and comparison, a fact that reduces their validity and could eventually cut linguistic analysis entirely out of defining linguistic complexity, thereby cutting linguistics out of an important segment of Big Data work. Independently of those considerations, typology needs more than one kind of complexity measure.
² Or perhaps it is logical. Canonicity theory is concerned with whether linguistic elements are canonical or not, while the goal in this fragment of complexity theory is to describe types of complexity. In that theory, presumably the ideal in a space of complexity is maximal complexity, so in that sense ‘CC’ is logical.
To address these various needs and possibilities, this chapter proposes a method for measuring CC (section 7.2) and presents results of a survey showing that CC yields results that are revealing and do not duplicate those from EC but complement them to make a stronger combined measure (section 7.3).
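The compression-based idea mentioned above, comparing the size of compressed and uncompressed copies of a text, can be sketched in a few lines. This is an illustration of the general idea only, not the procedure of the cited studies: the use of zlib, the function name, and the toy strings are all assumptions made here.

```python
import random
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size (lower means more redundancy)."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must not be empty")
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    # A highly repetitive string versus a pseudo-random one of equal length.
    repetitive = "ab" * 1000
    rng = random.Random(0)  # fixed seed so the toy example is reproducible
    varied = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(2000))
    print(compression_ratio(repetitive) < compression_ratio(varied))
```

A ratio of this kind is computed without any linguistic analysis of the text, which is exactly the property the paragraph above identifies as both the attraction and the limitation of non-structural measures.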
7.2 Method
7.2.1 Samples
For the CC measure, I used a partly convenience and partly diversity-based sample of 113 languages, seeking coverage of some families and areas, and fairly good coverage of northern Eurasia and North America, plus thinner coverage of the rest of the world. The southern lands (Africa, Australia-New Guinea-Oceania, South America) are thinly covered, South Asia not at all, and Southeast Asia by only two languages.³ In addition to coverage, sample languages were chosen for comprehensiveness and quality of descriptions. The sample languages are listed in Appendix 7.3. For the EC survey I drew on the mostly diversity-based set of 226 languages that has grown from Nichols (2009), using the 105 of those languages that are also found in the CC sample. The combined complexity measure is the sum of the other two, available for only the 105 languages of the sample intersection. Where comparisons of the two kinds of complexity are at issue, I used only the 105-language sample intersection. Those involving only CC use the full 113 languages. There are also some comparisons of families and areas, using subsets of the sample.
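The relation between the two samples and the combined measure can be sketched as a set intersection followed by a sum. The language labels and scores below are invented placeholders, not data from the survey, and the function name is hypothetical.

```python
# Invented scores: CC is available for one set of languages, EC for another.
cc_scores = {"Lang1": 4, "Lang2": 7, "Lang3": 2}
ec_scores = {"Lang2": 10, "Lang3": 5, "Lang4": 8}


def combined_scores(cc: dict, ec: dict) -> dict:
    """Sum CC and EC for the languages present in both samples only."""
    shared = cc.keys() & ec.keys()  # the sample intersection
    return {lang: cc[lang] + ec[lang] for lang in sorted(shared)}


if __name__ == "__main__":
    print(combined_scores(cc_scores, ec_scores))
```

Languages found in only one of the two samples receive no combined score, which corresponds to the restriction of the combined measure to the 105-language intersection.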
7.2.2 Survey objects
This study addresses only morphological complexity and specifically inflectional morphology. I surveyed a set of morphological typological variables across seven inflectional categories and three lexical classes (or parts of speech, henceforth POS: nouns, independent pronouns, verbs) and counted the number of non-canonical patterns in inflectional paradigms for each category and each POS. The set of categories is a sample chosen because they are generally well-understood and well-described (in particular, grammars make it relatively straightforward to determine whether the category is present or absent, and if present what its values are). They are present in enough languages to make frequency comparisons meaningful.
³ The denser coverage of the northern hemisphere is intentional, as I planned to test some of the geographical distributions hypothesized in section 7.3. The coverage of the southern hemisphere is thinner than planned because the survey proved more labour-intensive than anticipated and could not be fully completed as projected.
This section describes, first, the inflectional categories surveyed, then the variables. Survey data consists of:
(1) A text report on each language that includes any definitions of categories and variables required and discussion of any coding decisions, plus sources used. These reports discuss but do not fully replicate the information available in grammars. Sometimes they include scans of published paradigms.
(2) A database page for each language showing the number of non-canonical patterns in each intersection of POS and category.
Appendix 7.1 lists the categories and variables, and Appendix 7.2 gives the sum of entries in each intersection of categories and variables, across the whole sample. Appendix 7.3 lists the sample languages. The entire database will be included in some future release of the Autotyp database (Bickel et al. 2017 is the current release).
The inflectional categories surveyed are:
• Case. Dependent marking of argument roles. Only the core roles of A, S, O, G, and T, as well as Poss (possessor) were surveyed.
• Gender. Lexically specified agreement categories of nouns, usually covert on the noun itself and necessarily made overt in agreement. Only noun gender is surveyed, and not pronoun gender as in English he, she, it.
• Number. Only singular and plural were surveyed. For nouns, presence vs. absence of number marking was entered, but the plural paradigms for any inflectional categories of nouns (typically case, gender, possessive marking) were not surveyed.
• Person.
Only 1-2-3 singular inflectional paradigms were surveyed; for independent pronouns, only first and second persons (singular and plural). Inclusive and exclusive, where they exist, are both included. Person inflection on nouns is possessive inflection; on verbs it is argument indexation.⁴ Where independent personal pronouns have a generic pronominal base and mark person only in the form of the regular inflectional person markers, person is counted as an inflectional category. Examples from Ainu are in (1); the same person prefixes are also verb indexes and possessive markers. In languages like those of Europe, person in pronouns is a lexical category and does not enter into this survey at all. ⁴ Indexation is defined as in Nichols (1992: 48–9): marking on dependent or head of a category of the other, involving copying of relevant grammatical features from one member to the other. It is opposed to registration, which notes the presence of the other member and its type but does not copy features. (Nichols 1992 described only indexation and registration of dependents on heads, but in fact both can go either way: see Nichols & Lander in press.)
(1) Ainu (isolate; Japan and formerly Sakhalin) independent pronouns. (Shibatani 1990: 30–1; see Bugaeva 2012: 471 for slightly different forms from Southern Hokkaido dialects.)

     Singular   Plural
1    ku-ani     a-oka
2    e-ani      eci-oka
3    Ø-ani      Ø-oka

• Person-number. Person and number are so often co-exponential (portmanteau or otherwise opaquely fused) in inflectional paradigms that person-number was treated as a separate single category (see Appendix 7.2). Most languages with possessive inflection of nouns signal both the number of the possessor and the number of the possessed noun, using a dedicated plural affix for the number of the noun and co-exponential person-number marking for possessor indexation. (Sometimes the dedicated plural affix is promiscuous in the sense of Leer (1991), indicating plurality of either noun or possessor or both.) If there is a separate, dedicated marker of possessor number, however, that is entered separately as number.
• Classifier. Following Fedden & Corbett (2017), I use this term to comprise numeral classifiers as well as what they argue are second gender categories in languages like Mian (Ok family, New Guinea) and several Amazonian languages (e.g., Yagua, Yaguan family) but are called classifiers by tradition or for convenience, since it is useful to distinguish classifiers from the other, more canonical, gender category. For the present survey, the decision whether an inflectional category is gender or classifier is less important than ensuring that it is included somewhere; what figures at this early stage is the total non-canonical points per language, not their distribution across categories, POS, and variables. Classifiers were counted if a classifier is (more or less) obligatory for many or all nouns in contexts of quantification, and possible for most numerals.
More precisely, I consider occurrence with numeral classifiers to be an inflectional property of nouns, while the number and predictability of classifiers are properties of classifiers (and not surveyed here since they are not among the three lexical classes targeted here).⁵ For most classifier systems, the contexts of usage extend beyond phrases containing numerals, and while some are primarily numeral classifier systems, for others (particularly languages of Amazonia, e.g. Kwaza: Van der Voort 2006) the contexts considerably exceed those of prototypical numeral classifiers. Only for Mian (Fedden 2011) have I treated what are called
⁵ Numeral classifier systems often recruit regular nouns to the system, and in their capacity as regular lexical nouns they are of course covered in this survey.
classifiers as a gender category and added entries for their number and unpredictability. Thus the six classifiers of Mian are entered as a noun category with six inherent values, all unpredictable, while the 150 or more classifiers of Kwaza, like, for example, the ~50 of Mandarin, do not appear in this database and do not contribute to the EC of noun inflection or to non-canonicality in the form of unpredictability.
• Tense/aspect/mood (TAM). The survey seeks the most basic synthetic present-like and aorist-like tense categories. (In terms of aspect these tend to be imperfective and perfective respectively.) If one or both is absent, as it is, for example, in Mawng (Iwaidjan, northern Australia), which has only a future/non-future tense opposition, the closest basic tense opposition is used (future and non-future in Mawng). If the language has no inflectional tense (as Mandarin does not), basic imperfective and perfective are used if the language has inflectional aspect; otherwise there is no entry for the TAM category.
• General. Some of the variables are inherently difficult to ascribe to some particular category. Examples are the numbers of stems per lexeme and stem classes per language. They are entered as general rather than as pertaining to paradigms of particular categories (usually with a comment in the data report). Again, for the present survey the exact placement of an entry is less important than ensuring that it is included somewhere and contributes to the total.

For each language the database records for each category whether it is present or absent (a yes/no, or 1/0, classification).

The variables surveyed are the following.⁶ For all of them the number, or the presence vs. absence, of non-canonical patterns was entered for each of the survey categories just listed. For what was counted as non-canonical see below.
For every variable and every category and value, irregular words, lexically specifiable exceptions, and small closed classes are disregarded. Sizable minority classes, and classes that are open or specifiable as a class, are counted. For example, if possessive inflection applies only to kin terms, or even only to consanguineal kin terms, this is counted as a class. [1] Inflectional classes. In the terms of Bickel & Nichols (2007) these are instances of formative flexivity: classes distinguished by different sets of inflectional morphemes (e.g., suffixes). Not all grammars explicitly account for the number of declension or conjugation classes, and those that do often mix together, or at least fail to distinguish, formative flexivity and stem flexivity (variable [5] below), so
⁶ Variables are numbered in square brackets and examples in ordinary parentheses.
deciding whether a class involves formative flexivity or stem flexivity often requires analysis and justification (laid out in data reports). [2] Unpredictability of any inflectional classes. Sometimes the inflectional class of a noun is predictable from semantics, gender, phonology, or some other property, but often it is not. For example, the declension classes of conservative Indo-European languages like Latin or Russian are not predictable overall (though in each language there are some clusters of semantically similar words in each class). For Russian, it is possible to predict gender from declension class with fair accuracy, but not vice versa (Corbett 1982); close analyses like Corbett’s are not usually available, so my practice was to regard classes as unpredictable unless the grammar claimed otherwise and gave good grounds for the claim. The number of inflectional classes and the number of unpredictable ones are matters of EC, not CC. They are removed from some of the calculations here as indicated below. [3] Inherent categories. This applies primarily to gender classes of nouns, which are marked by agreement on other words and are usually covert on the noun. (Overt indication of gender on the noun itself does occur in a number of languages, e.g. Bantu, or to some extent Nakh-Daghestanian. In such languages gender was recorded as an inflectional category of nouns and its number of inflectional classes and their unpredictability were recorded.) Where classifiers are lexically specified for the noun (as is usually said to be the case for Mandarin, e.g. Chao 1968: 589–93), they are also coded as inherent. The alternative is relatively flexible choice of classifiers per noun depending on semantic properties. [4] Unpredictability of inherent categories. Gender classes can be predictable for some or all genders. Here the question asked is how many of the gender classes are predictable (largely or entirely, i.e. for most or all of their nouns). 
Predictability is sometimes described as phonological, but usually as semantic. Every language with gender in the sample, and nearly every language on earth with gender, has predictable gender for nouns referring to humans, which are usually masculine or feminine depending on the sex of the referent but sometimes belong to a general human category.⁷ What is counted here is not predictability but unpredictability, since that is non-canonical. Counting the number of unpredictable classes amounts to EC, and it also contributes to a rapidly inflating scale.⁸ Instead of counting classes I have used the following values for applicability:

⁷ I know of only one language where human nouns have arbitrary gender: Uduk (Koman, Africa; Killian 2015), where the cutoff point for gender predictability is set even higher on the animacy hierarchy: it is predictable for first and second person pronouns but not for human nouns.
⁸ Cole (1967) describes most of the non-human gender classes of the Bantu language Luganda (which number fourteen singular-plural concord pairs by his count) as ‘miscellaneous’ (these number ten), a large number for one cell of this survey. Most Bantu grammars describe the classes as having a semantic basis with some unpredictable members, but in languages with only one description the decision on predictability has to be taken at face value.
(2) Applicability thresholds
0  Applies to none or very few of the words in the class (here, nouns in the gender class).
1  Applies to an appreciable minority of the words in the class, and/or the set of words is open or definable rather than requiring enumeration.
2  Applies to all or most of the words in the class.
and the following semantic criteria: Human nouns. Unpredictability of their gender by the above values. Non-human nouns. Unpredictability of their gender by the above values. Human gender cross. Non-human nouns are found in human gender classes or vice versa. As a further note, most languages with a sex-based gender opposition for human nouns also apply it to a few non-human animate nouns, typically large and important domesticates. This kind of individual lexical exception falls under value 0 of the applicability scale. Table 7.1 shows a few languages and how they are treated in this classification. Languages with a zero score have no unpredictable gender classes, either because their gender is entirely predictable (Avar) or because they have no gender, either of nouns (English) or of pronouns (Finnish). [5] Number of stems per lexeme. This is what Bickel & Nichols identify as stem flexivity: declension or conjugation classes based on changes in the stem, such as ablaut, extensions, or allomorphy conditioned by the survey categories. For example, in Nakh-Daghestanian languages, many or most nouns have distinct nominative and oblique stems in the singular, with the oblique stem formed by adding an extension suffix (Kibrik 1991, 2003). This is coded as two stems per lexeme. In English and other Germanic languages, the sizable but minority class of strong verbs has different stems, marked by ablaut, in the two survey tense categories (English sits, sat); this is also two stems per lexeme. A word or class is counted if it involves all, most, or a sizable or open subset of the relevant words, following the thresholds in (2). [6] Number of stem classes per language. The Nakh-Daghestanian languages with extensions in oblique stems mostly have two stems per lexeme, but the number of oblique extension suffixes ranges from one to over a dozen in different languages. 
This, plus the (usually minority) class of nouns with a single stem, is the total number of stem classes per language. Following the criteria in (2), the number entered in the database is the number of such classes that are sizable, productive, and/or open. [7] Unpredictability of those stem classes (per language), by the same criteria as for [2] and [4] above.
Table 7.1. Gender unpredictability for some example languages

            IE   Ingush   Avar   Bantu   Nama   BGW   Uduk*   English*   Finnish
Human:      0    0        0      0       0      0     2       0          0
Non-human:  2    2        0      1       2      1?    2       0          0
Cross:      2    1        0      0       2      0     2*      0          0
Total:      4    2        0      1       4      1     6       0          0
Notes: Languages: IE: Generic conservative Indo-European (e.g., Latin, Russian). Three genders: masculine (M), feminine (F), neuter (N). The neuter gender contains relatively few nouns, so most non-human nouns are M or F, arbitrarily classified. Ingush: Nakh-Daghestanian (Caucasus). There is a dedicated gender for human males, a gender containing human females and some inanimates (though if the survey counted singular-plural gender pairings these would be different genders as plurals have different genders for human females and non-humans), and two non-human genders with arbitrary membership. Avar: Nakh-Daghestanian (Caucasus). There are three genders with total semantic predictability: M (human males), F (human females), N (all else). Bantu: Subbranch of Benue-Congo (Africa). Generic entry applicable to most Bantu languages including Luganda in this survey. There is a dedicated human gender and a number of non-human genders (the number varies among languages) which most descriptions present as having a semantic core or prototype plus a limited number of arbitrary members. Usually there are also a few dedicated genders for such things as non-finites or particular deverbal derived nouns. Nama (Khoekhoe): There are two genders, M and F, containing all human males and all human females respectively, and other nouns are arbitrarily divided between M and F. BGW (Bininj Gun-Wok; Gunwingguan, northern Australia): M and F genders contain all human nouns plus some arbitrary members. The other genders also have a semantic core and some arbitrary members. Uduk (Koman; Africa): Two genders; all nouns arbitrarily classified; first and second person pronouns have predictable gender (all have gender 2). English: No noun gender. Finnish: No gender of either nouns or pronouns. * Not in sample. For Uduk, see footnote 7 above in text.
[8] Arguments indexed. The number of core arguments indexed on the verb, counted for the verb type with the most core arguments. The maximum number of core arguments possible is three (A, G, and T), but not all languages have ditransitives, and for those that do not the maximum is two. Arguments indexed are counted only for simple clauses without valence-related derivations such as causatives or applicatives.

[9] Co-exponence, that is, portmanteau, cumulative, or otherwise opaquely fused marking of categories. Examples are the gender-number-case suffixes of nouns and adjectives in conservative Indo-European languages. Co-exponence violates the one-form-one-function tenet of canonicality, as one form has three functions (marking gender, number, and case). A language is coded as having co-exponence if all, most, or a sizable minority of its words in the relevant categories (e.g., nouns and their case paradigms) have co-exponent markers; it is so coded for all of the categories involved (e.g., for Indo-European, gender, number, and case).

[10] Syncretisms: identical formatives in two or more categories that are non-identical elsewhere in the language. Consider the German articles in (3):
(3) Definite articles in German (syncretism patterns numbered with subscripts)

             M     F      N      Plural
Nominative   der   die₁   das₁   die₁
Accusative   den   die₁   das₁   die₁
Dative       dem   der₂   dem    den
Genitive     des   der₂   des    der
What is counted is not individual syncretic endings or words but patterns of syncretism. In the German examples, feminine, neuter, and plural paradigms display the same pattern of nominative-accusative syncretism; dative-genitive of feminines is another. German has two syncretism patterns here. In German, case and gender are marked on determiners, of which the articles are the most frequent. Where categories are marked on articles but not the nouns themselves, they are still coded as noun categories, though also as wordhood discrepancies (variable [14] below). The database lists the number of syncretism patterns per category, but the counts and totals in section 7.3 below use only presence vs. absence of syncretism per category, as explained under variable [17] below.⁹

[11] Allomorphy. Defined, as elsewhere in linguistics, as two different forms for a single morpheme or paradigmatic cell, conditioned grammatically or lexically but not phonologically; phonological conditioning is not counted here since it can be considered automatic. An example is nouns of masculine gender in most Slavic languages, which have different accusative endings for animate and inanimate nouns. For example, three cases of Russian masculine nouns:

(4)          ‘brother’   ‘table’
Nominative   brat-Ø      stol-Ø
Accusative   brat-a      stol-Ø
Genitive     brat-a      stol-a
There is one allomorphy here in noun case inflection (accusative -a vs. -Ø), and also two patterns of case syncretism.¹⁰
⁹ Syncretism is clearly non-canonical (Corbett 2013a, 2007, and other works), as it makes for non-biuniqueness, but reviewers and audience members often object that syncretism does not increase the amount of information required to describe a language. This shows that canonical and Kolmogorov complexity are not identical; it is the only respect I am aware of in which they are different. I believe the difference arises because Kolmogorov complexity is concerned only with the information required to describe the text as a string alone and not the full text including its message. For the message even at the minimal level of determining which case is intended as in (3), resolving syncretism requires bringing in additional information.
¹⁰ There are debates in the Slavistic literature as to whether animacy is an additional gender category, or for that matter a subgender or supergender. It is also sometimes called a case split or a gender split, but I have not tried to distinguish allomorphy from splitting.
I did not encounter examples where it was difficult to decide whether something was allomorphy (within one category) or a syncretism (in paradigms that do not have that allomorphy), but there may be such cases. If so, the important thing is to enter it somewhere, for this survey in which total numbers of non-canonical points are compared. [12] Position discrepancies. In some languages the forms of a single category are distributed between two different positions, e.g. Pazar Laz (Kartvelian, Turkey; Öztürk & Pöchtrager 2011: 485) subject person agreement in verbs (present tense): (5)
Pazar Laz subject agreement morphemes
1  v-/p-/p’-/b-
2  Ø-
3  -s
First person is a prefix, second person zero (presented as a prefix because the object prefixes that compete hierarchically for the same slot have an overt 2 object form), and third person a suffix. Discrepant position is analogous to different forms for one category (albeit the forms are slots rather than morphemes), hence non-canonical. [13] Category discrepancies. I used this variable to account for infrequent examples like verb inflection in many Slavic languages, which have agreement for person-number in the non-past tense and gender-number in the past tense. The survey category is TAM rather than just one tense; if there were only one survey tense there would be no discrepancy. In these languages verbs were coded as having the categories of person-number, gender, and TAM, with a category discrepancy for TAM. [14] Wordhood discrepancies. These are discrepancies between such statuses as independent word, clitic, affix, and non-linear marking such as ablaut, within a single paradigm. For example, in Slovene, singular pronouns have both tonic and clitic forms but plural ones have no clitic forms; in Bulgarian, Romanian, and Ossetic, subject indexation is suffixal while object indexation uses clitics. Languages like German or Mian (Ok, New Guinea) have noun gender marked by articles; this is a wordhood violation for gender not as an inherent category but as an agreement category (in languages without the wordhood violation it is usually marked affixally, as with the noun class prefixes of nouns in Bantu languages). [15] Partial marking: Only some of the otherwise eligible words inflect for the category. An example is gender in Nakh-Daghestanian languages, which is generally marked by prefixation or initial consonant mutation of the verb, but not for all verbs (the verb roots that do take it range in different languages from about 30% to the great majority of verbs). 
Another example is number: probably all languages that have number inflection on nouns apply it only to some nouns. Most common is drawing the line between count and mass nouns, with mass
nouns taking no number marking, but it is also fairly common to find the line drawn between animate and inanimate or human and non-human nouns. I did not code number as a partial category for any of these: for count vs. mass nouns it is clearly due to semantics, and for the distinctions higher up there is a case to be made that those are semantically akin to the count/mass distinction. Some languages have only a handful of nouns that make number distinctions, for example Yurok (Algic, California), where only nine nouns, not representing a coherent semantic group, form plurals (Robins 1958: 23); these are my only cases of non-semantic, purely lexically specified, plural marking, but the nouns involved are too few in number to count in this survey. Partial marking is not common in the survey languages; Nakh-Daghestanian gender contributes most of the examples.

[16] Multiple marking. Even rarer among the survey languages is marking of an inflectional category more than once in a wordform. For example, Bardi (Nyulnyulan, Australia) marks person-number on verbs with person enclitics, and can add an optional additional person-number enclitic to mark plurality of the object; this amounts to marking person twice. Yurok has A and O agreement in person-number, and in some verb classes and categories one-argument verbs fill both slots and thereby mark subject person-number twice (Robins 1958: 69ff).

[17] Other. This entry column handles the occasional uncertainty in classification, but primarily contains calculations of the number of categories or dimensions involved in co-exponential marking. Noun inflectional paradigms of Indo-European languages preserving the original design of co-exponential gender-number-case inflection abound in such non-canonical phenomena as syncretisms, unpredictable declension classes, unpredictable gender classification, human cross-gender, and others. (For some illustrations, see Nichols 2019.) These give them extremely high CC values if the number of syncretism patterns is counted, and this skews comparisons. Therefore I coded not the number of such patterns but the number of categories involved in them, treating those as dimensions of freedom within which syncretism might appear. Similarly, for complex systems of verb argument indexation where person-number and role (A, O) are marked by co-exponential and often opaque markers, I counted the number of categories involved (usually person-number and role, sometimes also gender).¹¹ This procedure levels out the possible complexity ranges of case-inflecting languages like Indo-European and complex head-marking languages like many in the Americas. But even with the obvious heavy contributors neutralized, section 7.3 shows that the languages of western Eurasia still reach overall higher CC levels than even the polysynthetic languages of the Americas. I judge this high level to be non-artifactual as measured, implying less opacity for polysynthetic inflection than for co-exponential case inflection. Indeed, the notable complexity of polysynthetic languages lies not so much in inflectional non-transparency as in their templatic ordering, mix of lexical and inflectional categories, and sheer number of grammatical categories, and not primarily in the transparency or non-transparency of their core argument marking. This is also what the comparison of CC and EC levels in section 7.3 below implies: polysynthetic languages have more categories and slots, not more opacity.¹²

¹¹ Recognizing role as involved in the categories is also a way of accounting for the mix of direct and hierarchical marking of person in such systems.

The variables are summarized in Appendix 7.1.¹³ CC is what I call a composite variable: one that can be stated as a single typological variable (in this case, the CC value) but that consists of a number of separately defined variables. These subvariables are not a random set of variables and not just a thematically related set but the total set of grammatical phenomena that cover the categories and POS and each of which defines some aspect of non-canonicality. They are not drawn from an existing database, and in fact only one of them (the number of arguments indexed) is a variable presently in the Autotyp database.
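The pattern-based counting described under variable [10] can be sketched as follows. This is only an illustration under stated assumptions: a paradigm is represented as a mapping from column (gender/number, or lexeme) to case forms, a pattern is identified with the set of cases that share one form within a column, and identical sets found in different columns count once. The function name `syncretism_patterns` and the data layout are hypothetical and are not taken from the survey database.

```python
from collections import defaultdict


def syncretism_patterns(paradigm):
    """Return the distinct sets of cases that share a form within any column.

    Assumption: only case syncretism inside one column is considered;
    identical forms across different columns are ignored.
    """
    patterns = set()
    for forms in paradigm.values():
        cases_by_form = defaultdict(list)
        for case, form in forms.items():
            cases_by_form[form].append(case)
        for cases in cases_by_form.values():
            if len(cases) > 1:  # two or more cases expressed by one form
                patterns.add(frozenset(cases))
    return patterns


# German definite articles, as in example (3).
german_articles = {
    "M": {"Nom": "der", "Acc": "den", "Dat": "dem", "Gen": "des"},
    "F": {"Nom": "die", "Acc": "die", "Dat": "der", "Gen": "der"},
    "N": {"Nom": "das", "Acc": "das", "Dat": "dem", "Gen": "des"},
    "Pl": {"Nom": "die", "Acc": "die", "Dat": "den", "Gen": "der"},
}

# Case endings of the two Russian masculine nouns in example (4).
russian_masculine = {
    "brother": {"Nom": "-Ø", "Acc": "-a", "Gen": "-a"},
    "table": {"Nom": "-Ø", "Acc": "-Ø", "Gen": "-a"},
}

print(len(syncretism_patterns(german_articles)))    # 2
print(len(syncretism_patterns(russian_masculine)))  # 2
```

Both results agree with the counts given in the text: two patterns for the German articles and two for the Russian nouns.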
7.3 Results

Appendix 7.4 is a graphic display of the levels of CC in the sample languages, separately for the CC total involving all datapoints and the one omitting those datapoints that enumerate categories (and are therefore a leak of EC into the CC count). They are similar except in absolute values. On either one, the sample languages can be described as spanning the complexity range from Mandarin (lowest) to Skolt Saami (highest). The rest of this section tries out CC by comparing how well CC and EC perform in tests for various kinds of correlations.
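A minimal sketch of the two totals just mentioned, assuming each database entry records a POS, a category, a variable number, and a point value. The record layout, the function `cc_total`, the example entries, and the choice of variables [1] and [2] as the enumerative ones are all assumptions made for illustration (section 7.2 says only that the number of inflectional classes and of unpredictable classes are matters of EC); the chapter does not publish the database schema.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    pos: str       # e.g. "noun" or "verb"
    category: str  # e.g. "case", "gender", "TAM"
    variable: int  # variable number as used in section 7.2
    points: int    # number (or 1/0 presence) of non-canonical patterns


# Assumption: variables [1] and [2] are the ones that enumerate categories.
ENUMERATIVE = frozenset({1, 2})


def cc_total(entries, omit_enumerative=False):
    """Sum non-canonical points, optionally skipping enumerative variables."""
    return sum(
        e.points
        for e in entries
        if not (omit_enumerative and e.variable in ENUMERATIVE)
    )


# Invented entries for a hypothetical language.
entries = [
    Entry("noun", "case", 1, 3),           # inflectional classes
    Entry("noun", "case", 2, 3),           # unpredictable classes
    Entry("noun", "case", 10, 1),          # syncretism present
    Entry("verb", "person-number", 9, 1),  # co-exponence present
]

print(cc_total(entries))                         # 8
print(cc_total(entries, omit_enumerative=True))  # 2
```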
7.3.1 CC and enumerative complexity

There is no correlation between CC and EC (linear correlation coefficient -0.023; p = 0.819, Spearman’s rank correlation test, two-tailed). This means that they can be used as independent typological variables.
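For readers who want to reproduce this kind of test, the sketch below computes Spearman's rank correlation coefficient in plain Python: values are converted to ranks (ties receive the average rank) and the Pearson coefficient of the ranks is returned. The function names and the toy data are hypothetical; the p-value is not computed, the chapter's actual CC and EC scores are not reproduced, and a constant input would cause a division by zero that this sketch does not guard against.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented scores for five hypothetical languages.
cc_scores = [12, 30, 25, 41, 18]
ec_scores = [22, 15, 30, 19, 27]
print(round(spearman_rho(cc_scores, ec_scores), 3))
```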
¹² Differential complexity of noun vs. verb inflection and head vs. dependent marking, and measuring the complexity of hierarchical patterns and polysynthetic structure, will be covered in a separate paper. At that point the dimensions of co-exponential marking will be given a term and a separate dedicated variable.
¹³ The variables used for EC are much as defined in Nichols (2009). Publication of an updated version of that list is planned for the next year or two.
7.3.2 Complexity and gender

Nichols (2019) found that there was no correlation between EC and the presence of gender in a language, concluding that gender and the well-known complexity of many gender systems are not simply byproducts of overall complex morphology. I replicated that study on the smaller and different language set used here, and using a correlation test, with the same result: there is no correlation between EC and presence of gender. For CC, there is a slight positive correlation but it is far from significant (correlation coefficient 0.089, p = 0.233).¹⁴
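The circularity adjustment described in footnote 14 can be illustrated as follows: the points contributed by gender are subtracted from each language's total before that total is compared with the presence of gender. The sketch uses a Pearson coefficient against a 1/0 gender indicator as a stand-in, since the text does not say which correlation test was applied here; all names and figures are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical rows: (language, total CC points, points from gender, has gender?)
sample = [
    ("A", 30, 12, True),
    ("B", 12, 0, False),
    ("C", 25, 10, True),
    ("D", 14, 0, False),
]

has_gender = [1 if present else 0 for _, _, _, present in sample]
raw_totals = [total for _, total, _, _ in sample]
# Remove the gender-derived points so gender is not correlated with itself.
adjusted_totals = [total - gender for _, total, gender, _ in sample]

r_raw = pearson(raw_totals, has_gender)
r_adjusted = pearson(adjusted_totals, has_gender)
print(round(r_raw, 3), round(r_adjusted, 3))
```

With these invented figures the unadjusted coefficient is the larger of the two, which is the kind of inflation the subtraction is meant to remove.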
7.3.3 Geography: continents and areas

I calculated the mean CC for a number of areas and families, and asked whether the range of mean ± 1 standard deviation for each area overlapped with others, using the breakdowns in Table 7.2. Ranges for local areas and families are in Table 7.2. Figure 7.1 gives a graphic display. Non-overlap of the ranges means significantly different populations. Macrocontinents and continents overlap each other considerably, which means that the largest groups all represent the same population. Of the local areas, the Circum-Baltic has a very large standard deviation, that is, very little areality, and overlaps most others. The Caucasus has a relatively large standard deviation (unsurprisingly, as its languages range from the fairly simple Lezgi to the very complex Ingush and Khinalug), and its status as an area is debated (con: Tuite 1999, pro: Chirikba 2008; I side with Tuite). The other three are well-known areas and have small standard deviations and little or no overlap. Mean complexity levels differ considerably among the areas, suggesting that regression to some neutral complexity level is not a consequence of areality.

The five families show relatively little overlap. Uralic, one of the older and more widely distributed families and the most thoroughly surveyed here, has a large standard deviation. The others have clearer family profiles.

Overall, then, continents and macrocontinents are not greatly different from one another or from world totals while local areas and families are more discrete from each other and for the most part internally fairly consistent in their complexity levels. These figures are very preliminary; in particular, standard deviations will probably shrink as the sample adds more members per area and family, reducing overlaps.

Table 7.2. Areal and family breakdown
Macrocontinents: Africa, Eurasia, Australasia (Australia, New Guinea, Oceania), Americas
Selected continents: Western Eurasia (to the Urals), North Asia (Siberia and northern Central Asia), North America, Central and South America
Local areas: Balkan, Caucasus, Circum-Baltic, North Inner Asia (non-Pacific Siberia and northern Central Asia), North Pacific Rim (coastal and near-coastal from Japan to northern California)
Families: Balto-Slavic, Uralic, Nakh-Daghestanian, Tungusic, Uto-Aztecan
Notes: Figure 7.1 shows the mean CC ± 1 standard deviation for all groups. Northern continents (Eurasia, North America), the Caucasus, and the Uralic and Nakh-Daghestanian families are well-sampled; other areas and families are compiled opportunistically from languages in the sample and are less well covered.

Figure 7.1. Mean CC ± 1 standard deviation for three areal breakdowns and selected families
[Four panels, each with a vertical scale from 0.0 to 60.0. CC: Macrocontinents (Africa, Eurasia, Australasia, Americas); CC: Continents (W. Eurasia, N. Asia, N. America, C-S America); CC: Areas (Balkan, Caucasus, Circum-Baltic, N. Inner Asia, N. Pacific Rim); CC: Families (Balto-Slavic, Uralic, Nakh-Daghestanian, Tungusic, Uto-Aztecan).]
Notes: Groups are defined in Table 7.2. The mean and range for the entire sample are very similar to those for Africa.

¹⁴ For these calculations, to avoid circularity the points contributed by gender were subtracted from the total complexity. (If that is not done, CC yields a highly significant but spurious correlation. EC does not, because the contribution of gender to its total is much less than for CC.) For CC I use the two-tailed value since I had no advance expectation about whether or how CC might correlate with gender.
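The overlap criterion used in this section can be sketched as an interval test: two groups overlap when their ranges of mean ± 1 standard deviation intersect. The sketch assumes the sample standard deviation (`statistics.stdev`); the chapter does not say whether the sample or the population formula was used, and the group scores below are invented.

```python
from statistics import mean, stdev


def sd_range(scores):
    """Return (mean - 1 SD, mean + 1 SD) using the sample standard deviation."""
    m, s = mean(scores), stdev(scores)
    return (m - s, m + s)


def ranges_overlap(group_a, group_b):
    """True if the mean ± 1 SD intervals of the two groups intersect."""
    lo_a, hi_a = sd_range(group_a)
    lo_b, hi_b = sd_range(group_b)
    return lo_a <= hi_b and lo_b <= hi_a


# Invented CC scores for three hypothetical areas.
area_x = [40, 42, 45, 47]  # tight cluster, high mean
area_y = [12, 15, 18, 20]  # tight cluster, low mean
area_z = [10, 25, 40, 55]  # very dispersed

print(ranges_overlap(area_x, area_y))  # False
print(ranges_overlap(area_x, area_z))  # True
```

A dispersed group such as `area_z` overlaps both of the others, which parallels the behaviour the text reports for the group with the largest standard deviation.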
7.3.4 Large-scale geography

Since a number of typological variables form worldwide east-to-west clines in the northern latitudes (low in Europe, high in eastern North America, or vice versa; Nichols 2017), I tested whether CC and EC have such a distribution. Figure 7.2 plots complexity (CC or EC) against longitude in a series of graphs. (Longitude is universal longitude, not split into east and west but continuous from 0° to 360°.) The plots all have the same design: the vertical scale is the number of CC or EC points and the horizontal scale is longitude, running from west to east as in Figure 7.1. (The plot begins 10° west of Greenwich so that westernmost Europe and Africa will be counted with those continents and not with the Americas.) A worldwide cline will show up as a pronounced overall upward or downward slope to the pattern of dots. Each graph has a trendline showing slope, which can be regarded as indicating the approximate magnitude of difference between west and east. (The trendline is calculated on the rectangular plot used here, i.e. on a flat-earth model with parallel longitude lines, so for the real earth it has no precise meaning. The visible differences between slopes in different plots do, however, make for a useful comparison that may be graphically clearer than the raw pattern of dots. Statistical significance is not calculated on the plot but on the actual ranked longitude values and does not have the flat-earth problem.)

Figure 7.2(a) shows CC values running much higher in the west (the left side) than in the east (the right side), and there is a pronounced though not steep downward slope. Figure 7.2(b) plots only the languages in the northern continents;¹⁵ the slope is similar. For both the correlation of CC with longitude is highly significant. Figure 7.2(c) plots only the southern languages; the pattern is much more dispersed and the slope noticeably less steep, and there is no significant correlation.
The interpretation is that (as with several other variables, surveyed in Nichols 2017) there is a worldwide west-to-east gradient, in this case with higher values in the west and lower values in the east, and it is stronger in the northern continents than in the south.¹⁶ Due to the sample structure and the composition of the western Eurasian linguistic population, much of the strength of the CC correlation comes from Indo-European languages. To counter their impact, I tested the sample with the four outliers at the upper left of Figure 7.2(a) removed (three are Slavic languages: Russian, Sorbian, Slovene; but highest of all is Skolt Saami, a Uralic language). Impact on the slope and significance was negligible.
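The two steps described above (placing every language on a single longitude scale that starts 10° west of Greenwich, then correlating ranked longitude with complexity) can be sketched as follows. This is a minimal illustration only: the chapter does not name the rank test it used, so Spearman's coefficient is an assumption here, the function names are hypothetical, and the five data points are invented purely to exercise the code; they are not values from the sample.

```python
def plot_longitude(lon_deg, start=-10.0):
    """Map a signed longitude (east positive, -180..180) onto one
    continuous scale that begins `start` degrees from Greenwich,
    i.e. onto the interval [-10, 350) for the default start."""
    return (lon_deg - start) % 360.0 + start


def simple_ranks(values):
    """Rank values from 1 upward. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def spearman_no_ties(x, y):
    """Spearman rank correlation via the classic d-squared formula,
    which is exact only when neither variable contains ties."""
    n = len(x)
    rx, ry = simple_ranks(x), simple_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Invented (signed longitude, CC) pairs, for illustration only.
toy_points = [(-9.0, 40), (45.0, 48), (100.0, 30), (-120.0, 25), (-75.0, 20)]

longitudes = [plot_longitude(lon) for lon, _ in toy_points]
cc_values = [cc for _, cc in toy_points]
rho = spearman_no_ties(longitudes, cc_values)
```

Because the coefficient is computed on ranks, it depends only on the west-to-east order of the languages and not on the distances between them on the flat plot, which is the property the text appeals to. A negative coefficient corresponds to the pattern reported for CC (higher in the west, lower in the east).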
¹⁵ Northern continents are Eurasia and North America. Southern ones are Africa, Australia-New Guinea, and Central and South America.
¹⁶ In Figures 7.2(a)–(b), what appear to be dense vertical stacks of dots at some places are regions that are densely sampled and/or have high linguistic diversity at a similar longitude: at left, at about 45°, the Caucasus; at right, at about 230°, the Pacific coast of North America.
[Figure 7.2: three scatter plots with trendlines. Vertical axis: CC; horizontal axis: longitude, with ticks from –10 to 290.
(a) CC x longitude: Whole sample (n = 113), p = 0.00001
(b) CC x longitude: Northern continents (n = 82), p = 0.00002
(c) CC x longitude: Southern continents (n = 31), p = 0.104 (n.s.)]
Figure 7.2. Complexity x longitude
Notes: Longitude (horizontal axis) runs from the Atlantic coast of Europe and West Africa on the left to the Atlantic coast of North and South America on the right: (a) CC x longitude, all languages; (b) northern continents; (c) southern continents.
EC shows a highly significant correlation in the opposite direction, with lower values in Europe and Africa and higher values in the Americas (p = 0.0011, confirming what was reported, using a different sample and values, in Nichols 2009).
A map of languages and their CC levels (to be continuously expanded as work on this project proceeds) is at https://lingconlab.github.io/opacity_Johanna/index.html
7.3.5 Sociolinguistics
Dahl (2004) and Trudgill (2011) show that what Trudgill calls sociolinguistic isolation tends to allow languages to grow more complex over time, while sociolinguistically expansive languages tend to simplify. Sociolinguistic isolation means that a language absorbs little or no immigrant or language shifting population, so that nothing hinders the further growth of complexity. An expansive language (this is not Trudgill’s term; I take it from Janhunen 2008) absorbs appreciable numbers of adult L2 learners, and their influence tends to simplify the language. This section describes the four language groups in this chapter’s sample for which enough is known of the history of expansion and non-expansion to permit predictions about relative complexity levels. The groups and the complexity levels are listed in Table 7.3.
Table 7.3. Complexity values for four historical groups of languages
                             CC     EC   CC + EC
(a) Avar sphere
    Andic mean               28     10     38
    Avar                     36     10     46
    Hinuq (Tsezic)           33      9     42
    Hunzib (Tsezic)          49     11     60
    Lak                      42     10     52
    Ic’ari Dargwa            45     15     60
    Tsakhur (Lezgian)        41      9     50
(b) Samur sphere
    Lezgi                    27      4     31
    Udi                      41      7     48
    Archi                    36     11     47
    Tsakhur                  41      9     50
(c) Slavic
    Russian                  57      8     65
    Lower Sorbian            56.5
    Slovene                  51     11     62
    Bulgarian                43     11     54
(d) Uto-Aztecan
    Pipil                    21      7     28
    Hopi                     36     11     47
    Cupeño                   39     12     51
    Tümpisa Shoshone         27     12     39
• Altitude in the Caucasus. In mountain ranges with a central crest, languages generally spread uphill from the economically more important lowlands to more isolated highland communities, which are dependent on the lowlands for trade, commerce, and winter pastures (Nichols 2005, 2013). Highlanders know lowland languages but rarely vice versa; this makes uphill language spread possible and downhill spread unlikely, and likewise for diffusion of individual forms, categories, etc. That is, downhill languages are expansive while uphill ones are more sociolinguistically isolated. Thus we expect higher complexity in highland languages. Nichols (2013) finds a correlation between EC and altitude in the Daghestanian branch of the Nakh-Daghestanian family, and Nichols (2016) finds a stronger correlation using non-transparency of just gender marking. Nichols & Bentz (2018) show that a correlation of altitude with complexity is a significant worldwide tendency on several different measures. The sample used here is smaller but yields similar results. Both CC and EC correlate appreciably with altitude, and combined CC+EC yields a notably strong correlation for the small sample (Figure 7.3).
(a) CC x altitude in Daghestan
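[Scatter plot: altitude in metres (vertical axis, 0–3000) against CC (horizontal axis, ticks 0–50)]
(b) EC x altitude in Daghestan
[Scatter plot: altitude in metres (vertical axis, 0–3000) against EC (horizontal axis, ticks 0–20)]
(c) Combined CC+EC x altitude in Daghestan
[Scatter plot: altitude in metres (vertical axis, 0–3000) against combined CC+EC (horizontal axis, ticks 0–50)]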
Figure 7.3. Complexity and altitude in Daghestan (eastern Caucasus) for the three complexity counts
• Spreads and isolation in the Caucasus: The Avar sphere. The eastern Caucasus is compactly settled by the 40+ descendants of the Daghestanian branch of Nakh-Daghestanian; Daghestanian may be of about Indo-European-like age. The eastern Caucasus has been inhabited by settled food producers for some 8,000 years. For at least the last few millennia the highland populations have followed an uncommon kind of transhumance: the entire working-age male population leaves the highlands for the winter half of the year, taking livestock to markets and winter pastures and usually finding seasonal work or maintaining businesses in lowland cities. There is what seems to have been a long-standing centre of language spread in the northeastern Caucasus and foothills, dominated from at least c.1000 by the Sarir Kingdom. The canyons of the Avar Koisu, Andi Koisu, and their confluence in the Sulak were the avenues of trade and transhumant migration for most of Daghestan, and large markets formed in the Sulak lowlands. The language spoken at and near the confluence—in recent historical times, Avar—had major economic importance and was the language of work and everyday life for half of the year for much of the male population of Daghestan. This has led to contact effects among the languages of western Daghestan, including a distinctive structural type marked among other things by highly transparent gender systems, lack of verbal prefixation, and of course many Avar loans. Three episodes of uphill spreading can be traced in the Avar sphere (Nichols in prep.): most recently Avar, earlier Andic, still earlier Tsezic. These three make up one branch of Daghestanian, with this structure: [ Tsezic [ [Andic] Avar ] ].¹⁷ Avars apparently became rulers in the Sarir Kingdom on its conversion to Islam (at which point it became the Avar Khanate), and the final battles for control between Andi and Avar took place only in the seventeenth to eighteenth centuries (Aglarov 1988: 24).
Avar has been an expansive language, serving as lingua franca along the Andi Koisu for about three centuries and along the Avar Koisu for probably somewhat longer; it has spread well uphill and spilled over the crest to Georgia and Azerbaijan, but patchily, with many non-Avar enclaves. Andic is probably about 1,500 years old, during most of which time it has been expansive and its daughters have spread uphill; their settlement of the Andi Koisu is compact. Tsezic may have separated some 3,000 years ago in an earlier uphill spread; Tsezic languages are now at the uppermost highlands of both the Avar Koisu system and the Andi Koisu. The Andic languages can be expected to show more pronounced effects of spreading than Avar does. The western Tsezic languages (Hinuq in this sample) have been under strong Andic and Avar influence;
¹⁷ Avar is one language, Andic a close-knit group of about ten, and Tsezic five more disparate.
the eastern Tsezic languages (Hunzib in this sample) had less Andic contact and held winter pastures not in the Avar-Andic lowlands but in Georgia to the south. Consistent with this history, Hinuq has considerable Avar influence and a very Andic-like grammar; Hunzib is markedly different, with southeastern Daghestanian-like traits. At the edge of the Avar sphere, the isolate branch Lak was not part of the Avar Khanate but used the same trade and transhumance routes and shows Avar lexical and grammatical influence; isolated in a highland plateau, it has no known history of spreading. Beyond Lak are the Dargwa languages, for which the Caspian coastal cities and trade routes were important, lessening Avar influence. To the south of Avar, languages of the Lezgian branch are spoken along the southeast-flowing Samur and its tributaries, and Tsakhur is at the high end of this line of communication and also at the high end of the Koisu-Sulak line. The sample here includes representatives of most of these stages. Thus we expect the descending order of spread effects along and near the Andi Koisu and Avar Koisu systems shown in (6): (6)
Languages of the Avar sphere and their sociolinguistic histories
Andic languages (long expansive; decomplexification expected)
Avar (recently expansive; some decomplexification expected)
Hinuq (early expansion, much subsequent Avar-Andic contact)
Hunzib (early expansion, less Avar-Andic contact)
Lak (isolated, but fairly large and unified)
Ic’ari Dargwa (isolated, fairly small)
Tsakhur (isolated, small; complexification expected)
Table 7.3(a) shows the complexity values. CC conforms very well to this scale; the only non-conformities are Hinuq, which clusters with Andic as is unsurprising, and Ic’ari (Dargwa), which belongs to the Caspian coastal sphere. EC is not very informative. The combined total is again in good conformity (unsurprisingly, as it adds the fairly uniform EC scores to the CC scores). For the Avar sphere and its periphery, then, CC reflects the sociolinguistics of spreading and isolation better than EC does, and the combined measure differs little from the CC scores.
• The Samur sphere. The delta of the Samur River, which drains the southeast Caucasus and flows into the Caspian Sea, is a highly productive agricultural region and long a nexus of trade and tax collection along the East Caspian commercial route. It is the second most important avenue (after the Sulak) for transhumant migration. The Lezgian branch, an old and diversified branch of Nakh-Daghestanian, originated in this vicinity and spread both uphill and into the Alazani valley in eastern Georgia and the lower Kura valley in northern Azerbaijan. The sample contains four Lezgian languages,
two in the highlands and two in the lowlands. Lezgi is a large, expansive, and inter-ethnic language centred on the lower Samur and nearby. Udi, which descends from a probably expansive inscriptional language of the early to mid first millennium (Caucasian Albanian [Gippert et al. 2009] is its ancestor), has since shrunk to three isolated enclaves in Azerbaijan and Georgia. Archi, noted for its morphological quirks (Corbett 2013b; Bond et al. 2016), and the complex Tsakhur are isolated at high ends of river canyons and have no known history of spread (apart from reaching the highlands in the first place, however that happened). The complexity figures in Table 7.3(b) reflect this history well. Tsakhur has much higher CC than the rest; Archi has higher EC; lowland Lezgi, with its known history of expansion, is low on both counts. Udi is mixed, high on CC and lower on EC, suggesting that CC complexifies faster than EC after the end of expansion. For both Caucasus surveys, EC picks out as most complex one language that is isolated at a high end with connections in more than one direction (Ic’ari Dargwa, Archi), CC appears to reflect spreading more than isolation, and the combined total gives a workable unified complexity scale that correlates reasonably well with altitude and isolation.
• Slavic. Of the four Slavic languages in the sample, Russian has a long history of expansion and absorption of Baltic and Finnic populations; Sorbian reflects the leading edge of the Proto-Slavic expansion (c. sixth to ninth centuries) but has been sociolinguistically isolated and receding since then (largely absorbed by the German expansion); Slovene remains close to the homeland and has no known history of expansion other than uphill spread into the Austrian and Slovene Alps; Bulgarian belongs to the Balkan Sprachbund and has undergone drastic structural changes as a result, including loss of cases and thereby of the case-number-gender co-exponence that makes Slavic noun declension so complex. Complexity levels (Table 7.3(c)) are not greatly different for the languages preserving case inflection, while Balkanized Bulgarian is much less complex.
• Uto-Aztecan. The Uto-Aztecan family is probably 5,000 years old and has undergone a gradual spread from a probably northern Mexican homeland followed by two large recent spreads: in the south, ancestral Nahuatl spread with the Aztec expansion and empire beginning in the thirteenth century, and in the north the Numic branch spread rapidly from the Sierra Nevada foothills across the Great Basin beginning in approximately the same time frame (Fowler 1972; Miller 1983; Madsen & Rhode 1994; Hill 2001, 2010; Merrill 2012). The languages in the sample, south to north, are Pipil (Nicaragua), Hopi (Arizona), Cupeño (southeastern California), and Tümpisa Shoshone (east central California). Pipil is the southernmost
descendant of Aztec and a surviving language probably of a military garrison. Cupeño is an isolated small language spoken in an eastern Sierra Nevada oasis with no history of expansion. Hopi is a pueblo language that gives some evidence of early admixture with a more southern Uto-Aztecan language (Merrill 2012) but has a long history of isolation. Tümpisa Shoshone is from the Numic branch. Table 7.3(d) shows that the complexity levels of expansive Pipil and Tümpisa Shoshone are lower than those of Hopi and Cupeño, as predicted. The difference is mostly due to CC, consistent with what is suggested by the Avar sphere.
Though these samples are small, the results are generally consistent with predictions of higher complexity for sociolinguistically isolated communities. CC appears to be the better mirror of sociolinguistic history, and EC points in the same direction but unevenly. Nonetheless, combined CC + EC tends to yield very good correlations with present and prehistoric sociolinguistics: sociolinguistically isolated languages are more complex and expansive languages less complex.
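One way to put a number on how well the Table 7.3(a) values follow the expected order in (6) is a rank correlation between each language's position on that scale and its score. The sketch below uses the figures from Table 7.3(a) as printed; the choice of Spearman's coefficient and all function names are assumptions made for illustration, not the procedure used in the chapter.

```python
import math

# Table 7.3(a), listed in the order of scale (6):
# from longest-expansive to most isolated. Columns: name, CC, EC.
AVAR_SPHERE = [
    ("Andic mean", 28, 10),
    ("Avar", 36, 10),
    ("Hinuq", 33, 9),
    ("Hunzib", 49, 11),
    ("Lak", 42, 10),
    ("Ic'ari Dargwa", 45, 15),
    ("Tsakhur", 41, 9),
]


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


expected_position = list(range(1, len(AVAR_SPHERE) + 1))
cc = [row[1] for row in AVAR_SPHERE]
ec = [row[2] for row in AVAR_SPHERE]
combined = [row[1] + row[2] for row in AVAR_SPHERE]

rho_cc = spearman(expected_position, cc)
rho_ec = spearman(expected_position, ec)
rho_combined = spearman(expected_position, combined)
```

On these seven data points the coefficients come out at roughly 0.64 for CC, 0.04 for EC, and 0.68 for CC + EC, which matches the qualitative reading in the text: CC and the combined total track the scale, while EC does not. With only seven languages such coefficients are descriptive at best and should not be read as significance tests.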
7.4 Discussion and conclusions
To summarize, CC makes something very similar to informational (or Kolmogorov) complexity straightforwardly measurable using standard structural analysis and well-worked-out theoretical principles. I hope it will make it possible for any linguist to measure and compare the complexity of other languages. The initial hope for CC as first attempted (Nichols 2015) was that it would be a replacement and improvement on EC and more cost-effective. It actually turned out to be not a replacement and no less labour-intensive but a useful complement; combining the two can give a very serviceable complexity measure which, as intended, is capable of reflecting sociolinguistic history and shows interesting geographical distributions.
This chapter has laid out a method for describing and measuring CC in inflectional morphology, as a set of seventeen separate variables which for this first attempt were simply added together without weighting. These represent a well-defined and crosslinguistically well-represented subset of inflectional morphology; for both CC and EC, in order to make surveys manageable in time cost, inflectional morphology must be sampled rather than covered fully. In a survey of just over a hundred languages, CC and EC proved to be independent of each other and, independently or combined, give quite revealing results.
In terms of geography, CC and EC values both follow worldwide east-west clines in the upper northern latitudes (as do all other composite variables I have surveyed). The continents surveyed all have similar means and ranges of diversity in their complexity values; local areas can vary more, and families can differ still more. For an area with a large range of values, one can question whether it is
genuinely an area (though a good answer will require surveying more than complexity). Both EC and CC correlate positively with altitude, a geographical factor that is not the cause of complexity levels but reflects the sociolinguistics of isolation. The four families adequately represented in the sample all display some positive correlation of complexity with sociolinguistic isolation, supporting principles advanced in historical linguistics and sociolinguistics. The definitions and coding used here were arrived at using the autotypologizing principle (Bickel & Nichols 2002) of no fixed ontology and constant redefining and recoding as the categories emerge from analysis of more and more languages. Arriving at the current typology has been very labour-intensive, making this pilot survey inordinately time-consuming. By now, though, the typology has stabilized to the point that language surveys themselves are not unduly labour-intensive. This line of inquiry can be improved by expanding the sample to give all continents and areas comparably dense coverage to what has been done here for northern Eurasia and North America, and covering thoroughly a larger number of families and local areas. Methods of weighting the variables, and different calculations using different combinations of variables, need to be proposed and tested; among other things this will give firm grounding to comparisons of the relative complexity of Indo-European noun inflection and polysynthetic verb inflection. For stem classes and inflectional classes, which as mentioned are rarely distinguished in grammars, we need improved and consistent descriptive coverage. We also need consensus definitions and criteria for characterizing the numbers of conforming and non-conforming members of classes that have some semantic or other basis, such as gender classes; descriptions like ‘miscellaneous’ (composition of a class), ‘predictable’ (class membership), ‘arbitrary’, etc., are not consistently used. 
The applicability thresholds used here (section 7.2.2)—few or no members predictable, a sizable minority predictable, most or all predictable— seem workable but require some quantification, however approximate, of the class membership and openness. Inflectional paradigms are ideally suited to an approach like this one. The same approach works well for some domains of derivational morphology but not all. For phonology and syntax and probably some derivational morphology, non-transparency will probably need to be described with a measure of the distance between underlying and surface. I see this kind of study as moving linguistics in the direction of the data sciences. Variables that form geographically very large patterns, or that correlate with such things as sociolinguistics, expansions, and other human population developments raise the prospects of multifactorial interdisciplinary collaboration. A single variable surveyed in a 113-language sample is not what one would call Big Data, but behind the convenient single number representing the CC value lie seventeen variables surveyed across three POS and eight categories—a total of over 200 datapoints per language or over 20,000 for the hundred-language sample. Massive scope, making possible close comparison with the differently distributed
data of other fields, might require some 400–500 languages plus similarly massive data for a few other composite variables or many simple ones. Creating such a resource is an ambitious but entirely feasible project.
Appendix 7.1 Categories and variables used here
For definitions and discussion, see section 7.2.2.
Variables
* = entries are number of categories in the paradigm; others are presence vs. absence (calculated as 1 and 0 in total complexity figures).
1. Inflection classes*
2. Unpredictability of inflection classes*
3. Inherent categories*
4. Unpredictability of inherent classes*
5. Stems per lexeme*
6. Stem classes per language*
7. Unpredictability of stem classes*
8. Arguments indexed
9. Co-exponence
10. Syncretisms
11. Allomorphy
12. Position discrepancies
13. Category discrepancies
14. Wordhood discrepancies
15. Partial marking
16. Multiple marking
17. Other
Grammatical categories surveyed here
Case. Case marking of A S O G T and Poss only.
Gender. Noun gender only.
Number. Singular and plural only.
Person. 1-2-3 singular inflectional paradigms; 1-2 singular and plural for independent personal pronouns.
Person-number, where these two are co-exponential.
Classifier. Chiefly numeral classification; but used for a second set of gender categories in languages with two gender systems (here, only Mian).
TAM. The most basic synthetic present-like and aorist-like tense categories, where distinguished; where lacking, two other basic tense categories; where there is no tense, no entry.
General. Where a variable cannot easily be attributed to any one category.
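The scoring convention stated in this appendix (starred variables entered as counts, all others as presence or absence coded 1 and 0, and everything added without weighting) can be sketched as below. The data layout, the function name `cc_total`, and the example cell values are hypothetical and serve only to illustrate the arithmetic; they do not describe any language in the sample.

```python
# Starred variables in Appendix 7.1: entries are counts.
COUNT_VARIABLES = {
    "Inflection classes",
    "Unpredictability of inflection classes",
    "Inherent categories",
    "Unpredictability of inherent classes",
    "Stems per lexeme",
    "Stem classes per language",
    "Unpredictability of stem classes",
}

# Unstarred variables: presence vs. absence, coded 1 and 0.
BINARY_VARIABLES = {
    "Arguments indexed",
    "Co-exponence",
    "Syncretisms",
    "Allomorphy",
    "Position discrepancies",
    "Category discrepancies",
    "Wordhood discrepancies",
    "Partial marking",
    "Multiple marking",
    "Other",
}


def cc_total(cells):
    """Unweighted sum over cells keyed by (variable, category).

    Raises ValueError for an unknown variable, a negative count,
    or a presence/absence variable coded as anything but 0 or 1.
    """
    total = 0
    for (variable, _category), value in cells.items():
        if variable in BINARY_VARIABLES:
            if value not in (0, 1):
                raise ValueError(f"{variable} must be coded 0 or 1")
        elif variable in COUNT_VARIABLES:
            if value < 0:
                raise ValueError(f"{variable} cannot be negative")
        else:
            raise ValueError(f"unknown variable: {variable}")
        total += value
    return total


# Invented cells for an imaginary language, for illustration only.
example_cells = {
    ("Inflection classes", "Case"): 3,
    ("Unpredictability of inflection classes", "Case"): 1,
    ("Inherent categories", "Gender"): 2,
    ("Syncretisms", "Case"): 1,
    ("Co-exponence", "Person-number"): 1,
    ("Allomorphy", "TAM"): 0,
}
example_total = cc_total(example_cells)
```

Keeping the cells keyed by variable and category, rather than storing only the final number, is what would later allow weighted or partial totals of the kind the discussion section calls for.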
Appendix 7.2 Cell totals per category and variable
                          Case  Gender  Number  Person  Pers-No  Classifier   TAM  General  Total
Inflection categories      166      53     199      36      131           7   108        0    700
Inflection classes         206      44     164      36      185           5   122       62    824
  Unpredictability           8      26      25       8       41           0     4        9    121
Inherent categories          0     152      11       0        0          15     0        0    178
  Unpredictability           0     110       4       0       37           6     4        9    170
Stems per lexeme            56       3      10       4        7           1    27      320    428
Stem classes per lg.        49       2      11       7       13           1    47      405    535
  Unpredictability           5       2       2       0        9           0    26      108    152
Arguments indexed            0       0       0       0        0           0     0      178    178
Fusions                      4      28       2       4      106           0     5       21    170
Syncretisms                 54      25       7       5       43           0     1        4    139
Overlaps                    13       0       0       2        1           0     0        0     16
Allomorphy                  57       6      11       9       21           0     8        6    118
Position discrepancies       7       7       5       6       15           1     3        5     49
Category discrepancies       0       4       2       0        2           0     0        2     10
Wordhood discrepancies      13       4       1       1       10           0     0        7     36
Partial marking              1      27       2       0        3           1     0        0     34
Multiple marking             0       1       1       2       11           0     0        0     15
Other                        0       0       2      12       43           0     0        4     61
TOTAL                      638     494     459     132      678          37   355     1140   3934
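The cell totals of Appendix 7.2 lend themselves to a simple consistency check: each row should add up to its printed total, each column to the printed TOTAL row, and the row totals to the grand total. The sketch below enters the values as read from the table; the attachment of each "Unpredictability" row to the variable listed just before it is an assumption based on row order, and all names in the code are illustrative.

```python
COLUMNS = ["Case", "Gender", "Number", "Person",
           "Pers-No", "Classifier", "TAM", "General"]

# (row label, eight cell values in COLUMNS order, printed row total)
ROWS = [
    ("Inflection categories", [166, 53, 199, 36, 131, 7, 108, 0], 700),
    ("Inflection classes", [206, 44, 164, 36, 185, 5, 122, 62], 824),
    ("Unpredictability (inflection classes)", [8, 26, 25, 8, 41, 0, 4, 9], 121),
    ("Inherent categories", [0, 152, 11, 0, 0, 15, 0, 0], 178),
    ("Unpredictability (inherent)", [0, 110, 4, 0, 37, 6, 4, 9], 170),
    ("Stems per lexeme", [56, 3, 10, 4, 7, 1, 27, 320], 428),
    ("Stem classes per lg.", [49, 2, 11, 7, 13, 1, 47, 405], 535),
    ("Unpredictability (stem classes)", [5, 2, 2, 0, 9, 0, 26, 108], 152),
    ("Arguments indexed", [0, 0, 0, 0, 0, 0, 0, 178], 178),
    ("Fusions", [4, 28, 2, 4, 106, 0, 5, 21], 170),
    ("Syncretisms", [54, 25, 7, 5, 43, 0, 1, 4], 139),
    ("Overlaps", [13, 0, 0, 2, 1, 0, 0, 0], 16),
    ("Allomorphy", [57, 6, 11, 9, 21, 0, 8, 6], 118),
    ("Position discrepancies", [7, 7, 5, 6, 15, 1, 3, 5], 49),
    ("Category discrepancies", [0, 4, 2, 0, 2, 0, 0, 2], 10),
    ("Wordhood discrepancies", [13, 4, 1, 1, 10, 0, 0, 7], 36),
    ("Partial marking", [1, 27, 2, 0, 3, 1, 0, 0], 34),
    ("Multiple marking", [0, 1, 1, 2, 11, 0, 0, 0], 15),
    ("Other", [0, 0, 2, 12, 43, 0, 0, 4], 61),
]

PRINTED_COLUMN_TOTALS = [638, 494, 459, 132, 678, 37, 355, 1140]
PRINTED_GRAND_TOTAL = 3934

# Rows whose cells do not add up to the printed row total.
bad_rows = [label for label, cells, total in ROWS if sum(cells) != total]

# Recomputed column sums and the columns that differ from the printed ones.
column_sums = [sum(cells[i] for _, cells, _ in ROWS)
               for i in range(len(COLUMNS))]
bad_columns = [COLUMNS[i] for i, s in enumerate(column_sums)
               if s != PRINTED_COLUMN_TOTALS[i]]

# Grand total recomputed from the printed row totals.
grand_total = sum(total for _, _, total in ROWS)
```

With the values entered this way, every row matches its printed total and the row totals add up to the printed 3934. The only difference the check reports is in the Case column, where the cells sum to 639 against a printed 638.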
Appendix 7.3 Sample Classification and geography of the 113 sample languages. * = languages with only CC data and no EC data. Languages where the stock name is identical to the language name are isolates. Language Fula Lango Luganda Jamsay Fur Haro Somali Dahalo Nama Basque
Stock N. Atlantic Nilotic Benue-Congo Dogon Fur Ta-Ne Omotic Cushitic Cushitic Juu Basque (isolate)
Continent Africa Africa Africa Africa Africa Africa Africa Africa Africa W Eurasia
Area
German Russian Lithuanian Sorbian * Slovene Bulgarian Romanian Albanian Greek Ossetic Kabardian Ingush Avar Karata Tindi * Godoberi Hinuq Hunzib Lak Icari Udi Tsakhur Lezgi Archi Khinalug Svan Pazar Laz Saami (Kildin) Finnish Mordvin Mari Hungarian Khanty (E.) Khanty (N.) * Nganasan Tundra Nenets Ket Evenki Even *
Germanic Indo-European Indo-European Indo-European Indo-European Indo-European Indo-European Indo-European Indo-European Indo-European West Caucasian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Kartvelian Kartvelian Uralic Uralic Uralic Uralic Uralic Uralic Uralic Uralic Uralic Yeniseian Tungusic Tungusic
W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia N Asia N Asia N Asia N Asia N Asia N Asia N Asia N Asia
Circum-Baltic Circum-Baltic Circum-Baltic Circum-Baltic Balkan Balkan Balkan Balkan Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus
Udehe Nanai Manchu Yakut Chuvash * Mongolian Yukagir (Tundra) Ainu Nivkh Itelmen Chukchi Aleut Mandarin Paiwan Bininj Gun-Wok Mawng Bardi Diyari Kuniyanti Djingulu Mian Usan Tawala Yimas Koiari Central Alaskan Yup’ik Zuni Acoma Lakhota Kiowa Hupa Cree E. Pomo Seneca Thompson Yurok Karok Nuuchahnulth * Tümpisa Shoshone
Tungusic Tungusic Tungusic Turkic Turkic Mongolic Yukagir Ainu (isolate) Nivkh (isolate) Chukchi-Kamchatkan Chukchi-Kamchatkan Eskimo-Aleut Sino-Tibetan Austronesian Gunwingguan Iwaidjan Nyulnyulan Pama-Nyungan Bunuban Mindi Ok Madang Austronesian Lower Sepik Koiarian Eskimo-Aleut Zuni (isolate) Keresan Siouan Kiowa-Tanoan Athabaskan Algic Pomoan Iroquoian Salish Algic Karok (isolate) Wakashan Uto-Aztecan
N Asia N Asia N Asia N Asia N Asia N Asia N Asia N Asia N Asia N Asia N Asia N Asia S&SE Asia S&SE Asia Australia Australia Australia Australia Australia Australia New Guinea New Guinea New Guinea North America North America North America North America North America North America North America North America North America North America North America North America North America North America North America North America
N Pacific Rim
N Pacific Rim N Pacific Rim N Pacific Rim N Pacific Rim N Pacific Rim N Pacific Rim
Yokuts Maidu Southern Sierra Miwok Wappo Wishram Nez Perce Klamath Chimariko Cupeño Koasati Hopi Jamul Tiipay Pipil Tzutujil Cayuvava Movima Kashibo-Kakataibo Jaqaru Aymara Huallaga Quechua Mapudungun Kwaza Paez
Utian Maiduan Miwokan Yuki-Wappo Chinookan Klamath-Sahaptian Klamath-Sahaptian Chimariko (isolate) Uto-Aztecan Muskogean Uto-Aztecan Yuman Uto-Aztecan Mayan Cayuvava Movima Panoan Aymaran Aymaran Quechua Mapudungun Kwaza (isolate) Paesan
North America North America North America North America North America N Pacific Rim North America North America North America N Pacific Rim North America North America North America North America Central America Central America South America South America South America South America South America South America South America South America South America
Appendix 7.4 CC levels in the survey languages (a) Including count of categories (though this approximates EC). Lowest, in increasing order: Mandarin, Diyari, Manchu, Lango. Highest, in increasing order: Ket, Lower Sorbian, Russian, Skolt Saami. The scale is 9–68; median=mean (arrow) is 32. (b) Excluding count of categories to give a more strictly CC total. Lowest, in order: Mandarin, Manchu=Diyari, Lango=Klamath=Kashibo-Kakataibo. Highest: Russian, Lower Sorbian, Slovene, Skolt Saami. The scale is 7–60; mean 26.4, median (arrow) 25.
[Two charts of CC totals for the survey languages, in increasing order.
(a) Including count of categories (vertical axis 0–80)
(b) Not including count of categories (vertical axis 0–70)]
8 The complexity of grammatical gender and language ecology
Francesca Di Garbo
Francesca Di Garbo, The complexity of grammatical gender and language ecology. In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Francesca Di Garbo. DOI: 10.1093/oso/9780198861287.003.0008
8.1 Introduction
This chapter is a qualitative investigation of the sociohistorical correlates of diachronic change in the domain of grammatical gender agreement. I define grammatical gender systems as systems of nominal classification that presuppose agreement marking and thus highly grammaticalized patterns of inflection, often involving shared exponence with other nominal categories (e.g., number), syncretism, and other types of coding asymmetries. In languages with grammatical gender, nouns are assigned to different classes. These categorizations are not necessarily, or not only, encoded on nouns. On the contrary, gender marking is displaced on words that are engaged in a morphosyntactic relationship with nouns (e.g., adnominal modifiers, verbs, pronouns) and whose inflections point at the gender of the noun.
During the last couple of decades, a number of studies have brought qualitative and quantitative evidence in support of the idea that the evolution of morphological complexity (both at the syntagmatic and paradigmatic level) is sensitive to sociohistorical dynamics concerning language population (see, among others, Lupyan & Dale 2010; Trudgill 2011; Bentz & Winter 2013; Bentz et al. 2015). Complexities in certain domains of morphology represent a challenge for the adult learner and tend to be eroded with the increase of the number of adult learners at a given point in the history of a speech community. This adaptive response of language structures to social factors has been claimed to be also crucial to understand how gender systems change through time and how they are distributed worldwide (Trudgill 1999; Nichols 2003; McWhorter 2007). For a number of language families around the world (e.g., Indo-European and Niger-Congo) grammatical gender can be reconstructed as a feature of the protolanguage, and as one of the most long-lived. Yet, even though stable at the family-level, the gender systems of individual languages within a gendered family may undergo reduction and loss due to language-internal processes of morphophonological erosion and/or reanalysis that, at least in some cases, pair up with a situation of prolonged contact and bilingualism with languages lacking gender
(on the role of language contact in the loss of grammatical gender, see the recent study by Igartua 2019; for a broader discussion of loss of morphology and imperfect language learning, see the contributions by McWhorter, Chapter 10, and Berdicevskis & Semenuks, Chapter 11, both in this volume). It has also been observed that gender systems tend to cluster geographically and to be best preserved in languages surrounded by other languages with gender (Nichols 1992, 2003). Thus languages that undergo complete gender loss are expected to be neighbours with each other or to have languages without gender as their closest neighbours (Nichols 2003: 299–304). While instances of gender reduction and loss under contact situations are relatively well documented in the literature, the role of language contact in the rise of gender systems has, so far, been poorly explored, and scholars generally agree that gender systems very seldom arise within language families that normally lack gender (Nichols 2003: 308). This is directly connected with the fact that full-fledged gender marking systems are commonly associated with rather pervasive patterns of agreement, which are notoriously unlikely to be borrowed (for a similar argument, see Igartua 2019: 209). However, recent research (Stolz 2012, 2015; Di Garbo & Miestamo 2019) shows that elementary patterns of gender agreement may emerge as a result of borrowing of noun phrases from contact languages with gender, and that, albeit rare, these types of systems are spread across unrelated languages and in different areas of the world. Existing research on the stability and evolution of gender systems under contact situations focuses either on the decline or on the rise of gender systems, and the two processes are rarely discussed together.
Here I argue that, in order to fully understand to what extent morphological complexity in the domain of grammatical gender ties up with factors pertaining to the social history of a speech community, a comprehensive survey of the evolutionary dynamics of gender systems, focusing not only on loss and emergence but also on reduction and expansion, is in order. In addition, given that, by definition, gender systems are bound to the existence of productive agreement patterns (Corbett 1991), I contend that complexification and simplification in the morphological encoding of gender distinctions must be primarily studied through the analysis of agreement patterns.¹ Within contact linguistics, it is generally assumed that contact-induced loss or emergence of agreement presupposes long-term contact, heavy borrowing and/or extensive bilingualism between speech communities (Thomason 2001: 71). However, to date, and to the best of my knowledge, there have been no studies that systematically tackle the issue of which factors may account for the occurrence of these opposite patterns of change, agreement loss and emergence, under
¹ Focusing on patterns of gender agreement does not, of course, mean underestimating the importance that nominal gender marking has in languages that display it (for a more thorough discussion, see section 8.2).
allegedly similar sociohistorical scenarios. The present study attempts to fill this gap by investigating loss of gender agreement in language families characterized by the presence of this feature and, conversely, the emergence of gender agreement in languages with no inherited gender systems. Besides loss and emergence, I also study the reduction and expansion of gender agreement patterns within gendered language families. With respect to sociohistorical variables, the study focuses especially on language contact dynamics, with particular attention to asymmetries between the populations in contact, in terms of both demographic structure (population size) and prestige differences. The chapter is structured as follows. In section 8.2, I discuss in what respects gender systems, as a grammatical and functional domain, can be relevant to the study of morphological complexity. The sampling methodology and data collection procedure are outlined in section 8.3. In section 8.4, I provide an overview of the patterns of language change attested in the data set, and illustrate their geographic distribution in section 8.5. Section 8.6 discusses the sociohistorical factors that are associated with the patterns of change attested in the languages of the sample. A summary of the results and some concluding remarks are given in section 8.7.
8.2 Grammatical gender and morphological complexity
Recent research on linguistic complexity and the typology of gender systems (Audring 2014; Di Garbo 2016) suggests that three dimensions of variation can be relevant to a typologically informed, descriptive² account of the complexity of gender systems:
• The number of gender distinctions, under the assumption that the higher the number of distinctions, the more complex the gender system.
• The number and nature of assignment rules, under the assumptions that: (a) a gender system where gender assignment is both semantic and formal is more complex than a system where gender assignment is only semantic or only formal, and (b) a gender system with flexible assignment is more complex than a system with rigid assignment.
• The pervasiveness of gender marking, under the assumption that the higher the number of word classes and syntactic domains that are subject to gender marking, the more complex the gender system.
² In this chapter, the notion of descriptive, absolute complexity is kept distinct from the notion of difficulty. Under the former approach, complexity is operationalized in terms of description length (Dahl 2004; Miestamo 2008). Under the latter approach, complexity is a measure of difficulty and costs in language learning and use (Kusters 2003). For a discussion of these and related topics, see Arkadiev & Gardani, Chapter 1 in this volume.
The suggested dimensions are based on established typological parameters for the classification of gender systems, but do not exhaust all possible ways in which gender systems may vary, and they can in turn be broken down into a number of subdimensions. For a detailed analysis of how the complexity of gender systems can be further differentiated, see Audring (2017, 2019). While the first and second dimensions of the proposed complexity metrics are not directly linked to morphological complexity, the third dimension (pervasiveness of gender marking) directly hinges on morphology. Gender marking presupposes the existence of morphology that is dedicated to the expression of gender. This applies both to nominal gender marking (also known as overt gender) and to non-nominal gender marking (also known as gender agreement). If we consider non-nominal gender marking first, grammatical gender systems can be associated with morphological complexity both syntagmatically and paradigmatically. At the syntagmatic level, patterns of gender agreement are sets of inflections that may occur on various entities within an utterance (e.g., articles, adjectives, demonstratives, verbs, personal pronouns) and that point to one of multiple classes to which nouns can be assigned (e.g., in Italian, the masculine and feminine class). At the paradigmatic level, each of the items that carry gender inflection in a language typically possesses as many forms as there are gender values to be distinguished, and the number of available forms is even higher if, for instance, a language expresses gender distinctions both in the singular and in the plural.
In Italian (Indo-European, Romance),³ the form of the definite article varies between il/lo, la, i/gli, le, depending on whether the noun marked as definite is masculine singular, feminine singular, masculine plural, or feminine plural.⁴ Moving on to overt gender marking, in several languages, gender marking is not restricted to agreement but also affects nominal morphology, with gender distinctions being overtly marked on nouns. Overt gender marking features higher syntagmatic complexity, inasmuch as it increases the number of word classes where gender is flagged within an utterance. It also increases paradigmatic complexity, in that it leads to higher lexical diversity, given that each noun may in principle have as many forms as there are gender values to be distinguished. Nominal gender marking is, for instance, very pervasive in Atlantic-Congo gender systems, as illustrated in (1) with an example from the Bantu language Chichewa.
³ In this chapter, language classification is based on Glottolog (Hammarström et al. 2019).
⁴ This type of morphological paradigmatic complexity is defined by Bentz et al. (2015: 2) as an instance of lexical diversity, which they describe as the ‘distribution of word forms or word types’ that languages ‘use to encode essentially the same information’. In the domain of definiteness marking, Italian exhibits higher lexical diversity than, say, English, because different forms of the definite articles are used depending on the gender and number values of nouns, whereas definite articles in English are gender and number invariant.
(1) Gender marking in Chichewa (Atlantic-Congo, Bantu; Kiso 2012: 18)
chi-nkhanira cha-chi-kazi chi-ku-dzi-kanda
7-scorpion -7-female 7.---scratch
‘The female scorpion is scratching itself.’
In (1), the markers of class 7 (the singular form of gender 7/8 in Chichewa) occur on the adnominal modifier, the verb, and the noun itself. The relationship between nominal and non-nominal (agreement-based) gender marking is not trivial. In some languages, as is the case in Bantu, nominal and non-nominal marking can have similar means of expression from the point of view of the phonological appearance of the morphemes used to encode gender distinctions. However, this formal correspondence may only apply to parts of the system rather than to all nouns and all agreement targets. In addition, nominal marking and agreement marking may have different sources and undergo different types of diachronic developments. For instance, as is also the case in Bantu languages, animacy-based marking may develop in the domain of agreement without affecting nominal marking. Thus, in languages that have both nominal and agreement-based marking of gender distinctions, it is important to consider these as two separate dimensions that may, but need not, interact with each other. In this chapter, I restrict my focus to patterns of change in the domain of agreement marking and their effect on the complexity of gender systems. The reason behind this choice is twofold. On the one hand, while agreement marking is definitional to gender (there is grammatical gender only if there is displaced marking of classificatory distinctions through agreement), nominal marking is not (many languages mark gender distinctions only via agreement).
On the other hand, while agreement marking directly hinges on inflectional morphology, in that gender agreement targets obligatorily inflect for gender, nominal gender marking resides more in the domain of lexicalized distinctions and/or word formation rules, which can be argued to be less central to morphological complexity. The patterns of change in the domain of agreement marking that the study focuses on are presented and discussed in section 8.4.
8.3 Method and data
8.3.1 Sampling methodology and variables in focus
The study is based on a sample of 36 languages distributed among 15 sets of closely related languages. Each language set contains two to three languages with the exception of Chamorro, a language isolate within the Austronesian family, and the mixed language Michif. The geographical distribution and genealogical affiliation of the sample languages are shown in Figure 8.1. Even though language sets
Figure 8.1. The language sample. Legend (language sets): Balto-Slavic, Bantu, Basque, Chamorro, Central Gunwinyguan, Germanic, Ghana-Togo-Mountain, Greek, Insular Celtic, Iranian, Khasian, Lezgic, Mek, Michif, Thebor. Note: See also Di Garbo & Miestamo (2019).
from at least five of the world’s six macro-areas are represented in the sample, the data set is largely skewed towards Eurasia. The reason behind this bias is twofold. First, along with Africa, Eurasia is one of the areas of the world where gender systems are most frequent. Second, for many of the Eurasian genealogical units included in the sample, diachronic developments in the domain of nominal morphology have been studied with the support of historical-comparative data, and the social history of many of these speech communities is also relatively well documented. The languages of Eurasia thus qualify as an appropriate starting point to explore the evolutionary dynamics of morphological complexity in the domain of gender marking and their sociohistorical correlates. At least one genealogical unit for all other macro-areas (except for South America) has been added. A complete list of the languages sampled for each of the genealogical units is given in Appendix 8.1. Each language set consists of one conservative language and at least one innovative language with respect to gender agreement marking, with the exception of the Thebor (Bodic) languages Shumcho and Janshung, both of which represent instances of emerging gender agreement patterns within the family. Languages within one and the same set can be mutually intelligible (as in the case of Kelasi and Kafteji within the Northwestern Iranian set), or more distantly related (as in the case of Nalca and Eipo within the Mek set). The patterns of language change accounted for are: loss, reduction, emergence and expansion in the domain of gender agreement. These are compared with either the retention of gender agreement (in the case of reduction, loss and expansion) or with its absence (in the case of emerging gender agreement).
These diachronic processes are investigated by examining the morphosyntactic domains of gender marking in a language (e.g., attributive modifiers, predicates, pronouns), and the way in which
these vary across genealogically related languages: what are the word classes that inflect for gender in language X as opposed to the closest relatives Y and Z? Do all targets of gender agreement mark the same kind of gender distinctions or is there a split between, say, adjectives, articles and demonstratives distinguishing between masculine, feminine and neuter gender, and personal pronouns distinguishing between animate and inanimate gender? The relevance of these questions for the understanding of the complexity of gender systems is discussed in section 8.4. In addition to representing more or less conservative languages in the domain of gender agreement, the sampled language sets and the individual languages within each set were selected so as to attempt to capture diversity at the sociohistorical level. In this respect, variables such as demography, domains of use, and history of contact were considered. This sampling methodology, which aims to capture both structural and sociohistorical diversity within sets of closely related languages, has already been applied to studies of the relationship between language structures and social structures. An example of this approach is the study of morphosyntactic complexity and language contact by Maitz & Németh (2014), where morphosyntactic complexity in three varieties of German is investigated to the effect that these varieties represent three different sociohistorical profiles: one standard, and relatively high contact language (Standard German), two contact languages (the pidgin Kiche Duits and the creole Unserdeutsch), and one low contact variety typically learned as L1 only (Cimbrian).
8.3.2 Data collection
Data were collected by using a questionnaire, which was sent out to experts on individual languages, as well as by means of descriptive resources. For those languages for which questionnaire responses could not be obtained, I used the questionnaire as a guideline to conduct more informal consultations with language experts and to gather information from descriptive resources. The questionnaire consists of two parts. Part 1 focuses on language ecology and language contact and aims at capturing information on the present and past geographical and sociohistorical environment in which a given language is/was used, with a set of fine-grained questions ranging from demography to domains of language use, issues of language identity and prestige, code switching practices and language contact in the past.⁵ Part 2 focuses on grammatical gender and aims at capturing information on number and type of gender distinctions, gender assignment rules, the morphology and syntax of gender marking and the diachrony of a given gender system. The questionnaire is based on two different pre-existing typological questionnaires. Part 1 is based on John Bowden’s questionnaire on language contact in East Nusantara (Eastern Indonesia). Part 2 is based on Greville Corbett’s questionnaire on gender and number.⁶
⁵ Not all of these questions could be answered for all languages in the sample.
8.4 Patterns of change under study: an overview
Here I provide an overview of the patterns of language change that appear to foster the reduction, loss, expansion and emergence of gender agreement in the languages of the sample. I first discuss patterns of reduction and loss, moving to emergence and expansion thereafter. I also discuss how each of the patterns in focus may contribute to the increase and/or decrease of aspects of morphosyntactic complexity in the domain of gender marking. A description of patterns and contexts of change in each of the sampled languages is given in Appendix 8.1. For a detailed discussion of the patterns of change attested in the languages of the sample and summarized herein, see Di Garbo & Miestamo (2019).
8.4.1 Reduction and loss of gender marking
The reduction and loss of gender agreement in the languages of the sample may result from two distinct processes of language change: (1) morphophonological erosion and (2) redistribution of agreement patterns. Under morphophonological erosion, gender marking is eroded or disappears as a result of sound changes that lead to the loss of segmental morphology. Under redistribution of agreement patterns, one gender agreement pattern spreads at the expense of others, leading to the partial or complete neutralization of gender distinctions. Both processes exhibit properties of directionality, but the preferred directionalities differ under one or the other process: morphophonological erosion is often found to spread from the domain of attributive modifiers, whereas the redistribution of gender agreement patterns often has its onset in the domain of anaphoric pronouns. An example of partial loss of gender marking as a result of morphophonological erosion is Standard Swedish (Indo-European, North Germanic). In Standard Swedish, two different systems of gender distinctions are attested. Within the noun phrase, the language distinguishes between two genders: the Common Gender and the Neuter Gender, en person ‘a person’ and ett hus ‘a house’. This distinction is marked on definite and indefinite articles, demonstrative modifiers, and adjectives. In the domain of third person pronouns, a Masculine/Feminine
⁶ Both questionnaires can be freely accessed through the repository for ‘Typological tools for field linguistics’ from the website of the former Department of Linguistics at the Max Planck Institute for Evolutionary Anthropology in Leipzig (http://www.eva.mpg.de/lingua/tools-at-lingboard/questionnaires.php).
Table 8.1. Third person pronouns in standard Swedish
Hum. and Higher Anim.    Inanim.
M   han ‘he’             C   den ‘it’
F   hon ‘she’            N   det ‘it’
P⁷  de ‘they’            P   de ‘they’
type of gender distinction is marked if the pronoun antecedent is a human or a higher animate. If the pronoun⁷ antecedent is an inanimate noun, the Common/Neuter gender distinction, which is active elsewhere, applies. This split is illustrated in Table 8.1. This split in the domain of gender marking is the result of the merger between masculine and feminine inflections on adnominal modifiers, which occurred through a combination of various morphophonological processes, such as the erosion and loss of the masculine suffix -er from the inflectional paradigm of strong adjectives, the loss of the masculine suffix -r before the definite suffix in the nominative form of the noun, and the loss of final consonant length in the inflectional paradigm of the definite suffixes (Duke 2010: 652–4). Many nonstandard varieties of Swedish, such as Elfdalian Swedish, still retain the tripartite distinction between Masculine, Feminine, and Neuter Gender throughout the gender marking system. Complete loss of gender inflections as a result of morphophonological erosion is attested in the Northwestern Iranian language Kelasi. Kelasi’s closest genealogical and geographic neighbour, Kafteji, still retains productive masculine and feminine gender agreement patterns. Lack of gender marking in Kelasi and presence of gender marking in Kafteji are exemplified in (2) and (3), respectively.
(2) No gender agreement in Kelasi (Northwestern Iranian; Stilo 2019: 45)
a. m œmd-e ziœ-Ø ní-œ.
this P.N-. son-. .-3
‘This (or ‘he’) is not Ahmahd’s son.’
b. m œmd-e dét-Ø ní-œ.
this P.N-. daughter-. .-3
‘This (or ‘she’) is not Ahmahd’s daughter.’
(3) Masculine and feminine gender agreement in Kafteji (Northwestern Iranian; Stilo 2019: 45)
a. m-Ø œmd-ə zeœ-Ø ní-œ.
this-. P.N-. son-. .-3.
‘This (or ‘he’) is not Ahmahd’s son.’
b. m-œ œmd-ə dét-œ ne-áya.
this-. P.N-. daughter-. .-3.
‘This (or ‘she’) is not Ahmahd’s daughter.’
⁷ The Masculine/Feminine distinction is also marked in the accusative and genitive forms of the pronoun. Cf. honom (3..) vs. henne (3..), and hans (3..) vs. hennes (3..).
As examples (2) and (3) show, utterances in Kelasi and Kafteji look (and sound) practically the same and the two languages are highly mutually intelligible. One of the few striking structural differences between the two languages is, in fact, the presence of gender inflections in Kafteji (in the form of zero-marked Masculine and marked Feminine) and their complete absence in Kelasi. Stilo (2019) describes loss of gender in Kelasi as the result of morphophonological erosion in the domain of nominal inflection, whereby the possibility to omit overt gender marking on nouns in certain morphosyntactic contexts triggers the systematic erosion of gender marking elsewhere. However, no information is given about the order in which gender inflection was lost on the various agreement targets. Loss of gender by the redistribution of agreement patterns is attested, among other languages, in Cappadocian Greek (Indo-European, Greek), where it results from the generalization of neuter agreement to all instances of masculine and feminine gender agreement (Karatsareas 2009, 2014). Comparative evidence from closely related dialects, such as Pontic Greek, allows us to infer how the process of redistribution took place. In Pontic Greek, grammatically masculine and feminine nouns denoting inanimate entities trigger neuter agreement on all agreement targets but the prenominal articles. This is shown in (4), with the example of the inanimate feminine noun pórta ‘door’, which triggers neuter agreement on the past participle anixtón ‘open’, but feminine agreement on the prenominal definite article i.
(4) Argyroúpolis Pontic (Indo-European, Greek; Karatsareas 2014: 79)
i pórta (...) móno ímoson óran estéknen anixtón
.. door.. (...) only half.. hour.. stay..3 open..
‘The door would stay open for only half an hour.’
Conversely, in Standard Modern Greek, the same controller noun selects feminine agreement on all targets.
(5) Standard Modern Greek (Indo-European, Greek; Karatsareas 2014: 80)
i pórta móno misí óra émene anixtí
.. door.. only half.. hour. stay..3 open..
‘The door stayed open for only half an hour.’
In Pontic Greek, the redistribution of the neuter gender agreement pattern is semantically motivated. Neuter agreement is associated with inanimate referents, and inanimate nouns select neuter agreement irrespective of their grammatical
gender. In Cappadocian Greek, where the generalization of the neuter agreement pattern has run its course, no trace is left of this semantically based redistribution. Morphophonological erosion and agreement redistribution are not mutually exclusive processes. An example of a heavily reduced gender agreement system where both morphophonological erosion and redistribution of agreement patterns are at play is Karleby Swedish (Indo-European, North Germanic), a variety of Swedish spoken in the town of Karleby, which is located in the Finnish region of Ostrobothnia. In Karleby Swedish, gender inflections have been lost everywhere except for the unbound form of the definite articles and the personal and demonstrative pronouns, all of which still inflect as masculine or feminine, but only when the controller nouns denote human beings (Hultman 1894: 229; Huldén 1972: 47). Similarly, gender marking has undergone severe reduction and near-loss across different varieties of Tamian Latvian. According to the recent analysis by Wälchli (2017), the erosion of gender distinctions started out with the loss of short vowels in final syllables. This occurred first on nouns, leading to the neutralization of the masculine and feminine distinction in the accusative plural form, and later extended to agreement marking, starting from the demonstratives. This initial process of morphophonological erosion was followed by multiple processes of redistribution in other domains of gender marking, which led to the generalization of the masculine agreement pattern at the expense of the feminine. Traces of feminine marking are still found, to different extents and different degrees of productivity, in nearly all varieties of Tamian Latvian. For a sociohistorical analysis of these developments, see section 8.6.
While it can be assumed that complete loss of gender agreement marking is a straightforward process of morphosyntactic simplification which decreases the overall number of grammatical meanings that must be expressed in a given morphosyntactic context (e.g., on adnominal modifiers, anaphoric pronouns, predicates), partial losses and redistributions of gender marking are harder to classify as straightforward simplification. Here I base my assessment of morphosyntactic complexity in reducing gender systems on recent work by Audring (2017), where different aspects of gender marking are broken into a multidimensional space of variation. Partial loss of gender marking as a result of morphophonological erosion can pave the way to split gender agreement systems such as the one attested in Standard Swedish. Here, not all targets of gender marking are sensitive to the same type of gender distinctions: the personal pronouns make a sex-based distinction that is not found in the domain of adnominal modification. Furthermore, sex-based marking on personal pronouns is conditional, and only occurs if the pronoun’s antecedent is a human being or a higher animate. According to the complexity metric proposed by Audring (2017), split gender agreement systems and conditional gender marking feature higher complexity than absence thereof.
This is captured by two different dimensions of Audring’s metric, both pertaining to the domain of ‘target complexity’:
(6) a. Matching values < Mismatching values
b. Targets match controller in value < Targets do not match controller in value
(Audring 2017: 63–4)
In (6), the symbol ‘ inference in Jarawara (Arawan);
‘hear, feel’ > non-visual in Tariana (Arawakan)), nouns (e.g., ‘noise’ > reportive in Xamatauteri Yanomami (Yanomaman), possibly via noun incorporation), and other morphology (e.g., declarative–indicative marker > direct evidential in Shipibo-Konibo (Panoan); past tense markers > reportive/attested in Kamayurá (Tupi-Guaranian)). Whether or not contact is responsible for such innovations is often unclear; in some cases, such as Nanti (Michael 2008), emergent evidential systems do not appear to be directly contact-driven. However, Müller (2013: 227) observes the regional clustering of Amazonian languages exhibiting evidentiality, as for example in the Guaporé-Mamoré (Crevels & van der Voort 2008) and the Vaupés regions, and evidentiality does appear to be relatively prone to diffusion crosslinguistically (see, e.g., Aikhenvald 2004: 21). Surveys of Amazonian evidentiality (Aikhenvald & Dixon 1998; Aikhenvald 2004: 292; Müller 2013: 228) suggest multiple points of independent innovation, from which the phenomenon has likely diffused more widely. Probably the clearest examples of contact-driven elaboration of evidential systems come from the Vaupés, in which a number of unrelated languages have undergone the grammaticalization of native forms to fill a regionally defined set of categories; this is the case for Hup (see above), Tariana (Arawakan, see Aikhenvald 2002: 117–29), and Kakua (Kakua-Nukakan; Bolaños 2016), among other languages.
9.2.4 Valence-adjusting
Complex valence-adjusting systems have been noted in Amazonian languages, especially those of the western sub-Andean area (Wise 1990, 2002). Birchall (2014) found that more than 50% of the South American languages in his sample had morphological applicatives, and that these are concentrated in the west, where some languages show particularly elaborate inventories. Also relevant is Guillaume & Rose’s (2010) observation that a large number of Amazonian languages exhibit a dedicated ‘sociative causative’, which specifies that the causer participates in the action along with the causee, in addition to resources for expressing more neutral causation. They propose that the sociative causative may be an Amazonian areal feature in light of its apparent rarity elsewhere in the world, and observe a historical relationship between the sociative causative and applicative constructions. Elaborate valence-adjusting morphology is especially evident in the sub-Andean Arawakan languages, which stand out as having among ‘the most highly developed systems of morphologically distinct applicative operations on earth’ (T. Payne 1997: 190, cited in Wise 2002: 335; see also Wise 1990; Danielsen 2007; Valenzuela 2010). Such a system can be seen in Nomatsigenga:
(12) Nomatsigenga (Arawakan; Wise 1971, 2002)
a. -oko ‘with reference to’
-bi / -birí ‘because, for, why, because of’
-así ‘purposive [action done with some purpose in view], for’
-pí ‘with respect to, in relation to’
-an / -ant ‘instrumental’
-ben / -bin ‘for, benefactive’
-té ‘towards, against’
-ak / -akag ‘comitative/sociative causative’
b. i-samë-ko-k-e-ro i-gisere
3-sleep----3 3-comb
‘He went to sleep with reference to his comb.’ (e.g., he was making it and dropped it) (Wise 2002: 336)
As observed for the other grammatical domains discussed above, Amazonian valence-adjusting systems often display a highly porous boundary between morphology and syntax. In particular, valence-adjusting mechanisms in these languages are often transparently derived or difficult to distinguish from incorporation (of postpositions or nouns). In Paresi (Arawakan), for example, at least half a dozen different postpositions can be incorporated with valence-adjusting or argument-rearranging functions (Brandão 2014: 276). A particularly interesting case is the form kakoa, which Brandão (2014: 256–9) analyses as a reciprocal suffix when it occurs inside the verb word, and as a comitative postposition when it is juxtaposed to the right of a noun phrase. Both are fully productive, and both moreover can co-occur in reciprocal constructions, in which the comitative expresses one of the arguments involved in the reciprocal event:
(13) Paresi (Arawakan; Brandão 2014: 259)
wakoakare=kakoa Ø=aitsa-kakoa-ha minita hoka
Indian= 3=kill-- always
kazaihera-ty-oa-heta
be.invisible?---
‘They were always fighting with each other, with the Nambikwara, and he became invisible.’
Interestingly, the indeterminacy demonstrated by kakoa—which could be regarded as one morpheme with low selectivity or two morphemes, one syntactically and another morphotactically placed—does not appear to be due to recent grammaticalization in Paresi. Wise (1990) reconstructs the form *khakh ‘reciprocal’ to Proto-Arawakan, but notes that both reciprocal and comitative functions are widespread, and that reflexes of *khakh appear in both postpositional phrases and in verb phrases in languages representing diverse branches of the family.
While she suggests that the form originated as a postposition on noun phrases and entered the verb word via incorporation, she observes that a shift from reciprocal to comitative function also appears to have occurred in some languages. It seems likely that the indeterminacy exhibited by Paresi kakoa can be reconstructed to Proto-Arawakan itself. In verb-final languages, the subtlety of the distinction between incorporation and pre-verbal object placement can blur the syntax-morphology divide even further. In Hup, for example, the ‘interactional’ (reciprocal) verbal prefix ʔũh-, which originates in the incorporation of the noun ‘sibling’, can occur as a phonologically free element with an intervening object argument (see Epps 2008, 2010):
(14) Hup (Naduhupan; Epps 2008: 488)
hɨd ʔũ̌h nam nɔ́ʔ-ɔ́y
3 poison give-
‘They give poison to each other.’
The diachrony of valence-adjusting systems has in general not been widely explored, either within Amazonia or beyond (see Haspelmath & Müller-Bardey 2004). However, as with the other domains considered here, the elaborate inventories in sub-Andean Amazonia suggest an areal component. It is tempting to speculate that the complex systems of applicatives in these languages, many of which appear to originate in the incorporation of postpositions and other elements, might represent the intersection of the complex verb morphology and incorporating tendencies of western Amazonian languages with the prolific case-marking tendencies of Andean languages. Wise (2002: 341) also points out a number of similar applicative and causative forms in unrelated sub-Andean languages (e.g., Chayahuita (Cahuapanan) -të/-ta, Arabela (Zaparoan) -ta/-tia, and Yagua (Peba-Yaguan) -ta/-tya), and van der Voort (2005: 400) observes similar widespread forms in Guaporé-Mamoré languages (e.g., Kanoe (isolate) ta-/-to-, Kwaza (isolate) -ta-/-tia-, and Karo (Tupian) -ta-; see also Crevels & van der Voort 2008: 167). Although these forms are very short, at least some of these similarities may be due to direct borrowing. Otherwise, clear evidence for diffusion in the grammaticalization of valence-adjusting morphology comes once again from the detailed studies of contact in the Vaupés languages Hup (Epps 2007a, 2010) and Tariana (Aikhenvald 2002: 113–16).
9.2.5 Summary
The studies we have reviewed thus far suggest that Amazonian languages tend to display a high degree of morphological elaboration in particular grammatical domains, and that many of these prolific domains show evidence of restructuring
and diffusion across unrelated languages in particular geographic regions. Moreover, in case after case, there is analytic indeterminacy between a morphological and a syntactic treatment of such elements. Often this indeterminacy can be linked to grammaticalization, either as an outcome of a relatively recent change from syntax to morphology or as a facilitator of developments in which innovative morphological forms are delinked from the constructions in which they originated, and extended to new morphosyntactic contexts. Notably, these processes appear to involve movements toward both tighter and looser bonding of morphological forms, rather than a more consistently one-way trajectory toward affixation, and in some cases freer and more bound instantiations of the same morpheme appear to co-exist in a relatively stable fashion. The ubiquity of such cases in the Amazonian context means that they cannot be treated as categorically different from ‘normal’ cases. While comparable phenomena are mentioned in theories that advocate or presuppose morphological autonomy, they are considered in such discussions to be unusual and of marginal importance (e.g., Blevins 2006: 555). However, western Amazonian languages suggest that at least in some regions of the world they may be the norm rather than the exception, and that an index of the degree of autonomy from syntax should be incorporated into the study of the complexity of morphological systems. So far, though, the cases reviewed provide only anecdotal evidence for this perspective, since they focus on individual elements within particular languages. In what follows, we engage with the issue of morphological autonomy on a more global level, addressing the broader morphological profiles within a sample of languages.
9.3 Exponence complexity and morphological autonomy
This section takes up the relationship between EC and the morphology-syntax divide empirically in western Amazonian languages. We develop and demonstrate a methodology that provides more globally oriented metrics of morphological autonomy, focusing primarily on Anderson’s second category of morphological complexity, ‘exponence complexity’ (see section 9.1). EC is a key element of the distinction between morphology and syntax: Advocates of morphological autonomy maintain that while complex deviations from biuniqueness (allomorphy, multiple exponence, morphomic structure, etc.) apply in the form-meaning mappings in morphology, these are rare or even absent at the syntactic level (Booij 1997, Anderson 2015a, Blevins 2016b; cf. Haspelmath 2011). Although the investigation is necessarily preliminary at this stage, we argue that the results lend support to the view that low morphological autonomy is a robust feature of languages in the western Amazon region.
Our approach can be summarized as follows. If EC is associated with morphology, and morphology is concerned with the structure of words as at least
partially autonomous systems, then we should expect EC to correlate with other criterial properties of words or parts of words, such as bound status, prosodic dependence, and contiguity. We take the position that the strength and significance of these correlations can be used to assess the degree of discreteness between morphology and syntax in a given language. Below, we show that the correlations between EC and criterial wordhood properties are low and usually non-significant in the western Amazonian languages we sampled. This observation further supports our argument, presented anecdotally in section 9.2 above, that the languages of this region tend to display a low degree of morphological autonomy. The following sections provide a description of the languages considered in this study (section 9.3.1), an overview of the properties of EC considered and a statistical summary of its realization across these languages (section 9.3.2), and a discussion of the correlations between EC and other wordhood criterial properties (section 9.3.3).
9.3.1 Languages considered
Our sample consists of eleven western Amazonian languages from nine language families (see Figure 9.1): Cavineña (Tacanan; Guillaume 2008), Chácobo (Panoan; Tallman 2018), Hup (Naduhupan; Epps 2008), Jarawara (Arawan; Dixon 2004), Kokama-Kokamilla (Tupi-Guaranian; Vallejos 2010), Kotiria (Tukanoan; Stenzel 2013b), Movima (isolate; Haude 2006), Paresi (Arawakan; Brandão 2014), Ashéninka Perené (Arawakan; Mihas 2015), Tariana (Arawakan; Aikhenvald 2003b), and Urarina (isolate; Olawsky 2006). The three Arawakan languages represent distinct branches of this family. The eleven languages are distributed widely across western Amazonia, although some (in particular Hup, Kotiria, and Tariana) are not geographically independent. We have focused on languages with descriptions that are detailed enough for us to code wordhood properties and properties of EC for a range of morphemes.
The concept of morphological autonomy developed in this chapter is a relative one, which we quantify as an index that can vary from language to language. Accordingly, we need a baseline for assessing how this index ranks in comparative perspective. While this is a large-scale typological problem, we take a preliminary step by comparing the Amazonian languages in our sample to Central Alaskan Yup’ik (CAY; Eskimo-Aleut family). There are three reasons for choosing CAY as a point of comparison: (i) it is a well-described language with a relatively comprehensive grammar and an extensive literature on its morphological and syntactic structure; (ii) it is comparable to Amazonian languages in displaying a high degree of system complexity in its morphology (i.e., it is a polysynthetic language); and (iii) it diverges from Amazonian languages in that its morphological and syntactic structures have been described as easily distinguishable from one
Figure 9.1. Western Amazonian languages sampled
Table 9.4. Number of morphemes coded in this study by language and functional domain
Language            Valence   Tense   Evidentiality   Nominal classification   Total
Ashéninka Perené    16        14      14              68                       98
Tariana             34        29      14              81                       119
Jarawara            5         13      1               0                        19
Kotiria             3         0       10              21                       34
Urarina             6         15      3               0                        24
Movima              32        8       2               111                      153
Hup                 9         5       5               46                       65
Cavineña            17        6       3               0                        26
Chácobo             38        20      4               0                        61
Paresi              10        11      0               11                       32
Kokama-Kokamilla    6         19      4               0                        29
CAY                 13        10      2               0                        25
another on both syntagmatic and morphophonological grounds (Miyaoka 2012; Woodbury 2017). While morphology and syntax are of course interwoven in CAY (e.g., in incorporation), clear cases of indeterminacy in word segmentation do not appear to be as ubiquitous as they are in many Amazonian languages (see Miyaoka 2012: 18). Our hypothesis is that, in general, CAY will rank higher than the western Amazonian languages on metrics of morphological autonomy, reflecting the Amazonian areal tendency to make a fuzzier distinction between words and phrases.
For the eleven western Amazonian languages and CAY, we coded a total of 685 morphemes for morphological and wordhood properties in the four domains of grammar that have been discussed in this chapter: nominal classification, evidentiality, tense, and valence-adjusting (Table 9.4).⁵ Morphemes were identified on the basis of their function as grammatical elements associated with these domains (i.e., elements that do not function exclusively as members of a major word class). The variation in the number of elements per functional domain across the sample certainly reflects typological differences among the languages, and may also reflect differences in coverage across grammars regarding particular grammatical domains. The differences between the languages sampled with respect to the total number of morphemes coded make the interpretation of the statistical significance of the correlations somewhat more tentative than it would be if they were more equal. Further details concerning the coding methodology and the metrics of morphological autonomy are provided in the following sections.
⁵ In general, non-linear and syncretic morphology was not evident in the data. Given that the relationship between morphology and syntax is treated in global fashion in the literature, we did not address possible variation in this regard among domains; however, this could be an interesting question to consider in future work.
9.3.2 Exponence complexity
The types of EC considered in this study are listed in (15). Each type of EC was coded as a binary or ordinal value for each morpheme in the study.
(15)
a. Number of allomorphs (ordinal: 1 to 5)
b. Suppletive allomorphy (binary: yes = 2, no = 0)
c. Multiple expression (binary: yes = 1, no = 0)
The EC score we develop in this study is simply the sum of these three scores. For instance, a morpheme that is realized by two allomorphs that are non-suppletive (i.e., related by productive morphophonological rules) and do not involve multiple expression will receive a score of 2; a morpheme that has two allomorphs that are related through suppletion and do not involve multiple expression will receive a score of 4. Below, we describe the process of measuring these variables, and provide a justification for the scoring techniques used in this study. We then present an overview of EC scores across the languages considered in this study.
• Number of allomorphs. This variable refers to a count of the segmental allomorphs associated with a given morpheme. The number of allomorphs and the presence/absence of suppletion (our second variable) together relate to Anderson’s complexity measure of allomorphy (see section 9.1 above). It should be noted that for this metric, we are simply concerned with counting the allomorphs, whether they are morphophonologically conditioned or suppletive (i.e., these are not distinguished here). We assume that a higher number of allomorphs translates into higher EC, all other factors remaining equal. The maximum number of allomorphs found in our data was five, but the vast majority of morphemes only have one allomorph. Table 9.5 presents the number of morphemes coded at each level for this variable. An example of a morpheme with at least four allomorphs is the CAY applicative ut~ul~us~uc (example (16)).
Table 9.5. Number of allomorphs per morpheme attested across the sample
Number of allomorphs    1     2     3   4   5
Number of morphemes     563   110   6   4   2
(16) CAY (Miyaoka 2012: 1132–3)
a. kis’-ut-aanga sink-APL-.3.1 ‘It [e.g. anchor] sank with me.’
b. AngunP(=E)=llu kis’-ul-luku kica-mA=S man..sg.=and sink-APL-.3. anchor-. ‘The man sank along with the anchor’, i.e. the anchor sank along with the man (entangled).
c. An-us-gu mikelnguq! go.out-APL-.2.3 child.. ‘You [] take the child out!’
d. unuaqu-uc-iiq-aaten be.tomorrow-APL--.3.2 ‘It will be tomorrow before you (sg.) are done.’ (lit. It [the dawn] will come on you)
The opposite extreme can be seen in the Urarina causative, which displays no variation in phonological form—it is always realized as -a: (17)
Urarina (Isolate; Olawsky 2006: 459–60) a. kanʉ komasaj ʉ-a-anʉ 1 wife come-1-1/ ‘I have brought my wife.’ b. tɕãe kanaanaj-ʉrʉ eno-a-e=lʉ also child- enter-1-3/= ‘He also made the children enter.’
• Suppletive allomorphy. Suppletive allomorphy is considered one of the most important defining properties for morphological status (cf. Haspelmath & Sims 2010, inter alia). An example of suppletive allomorphy can be seen in the tense-modal suffixes of Jarawara, the forms of which vary depending on the gender of the subject; for the immediate past non-eyewitness tense-modal suffix, the masculine form in (18a) is distinct from the feminine form in (18b). Because there is no identifiable phonological rule that accounts for the difference between the masculine and feminine forms and generalizes beyond this particular pair of tense-modal suffixes, cases such as these are coded as suppletive. (18)
Jarawara (Arawan; Dixon 2004: 206–7) a. bahiS to-ke-hino sun() -in.motion-..: ‘The sun is (surprisingly to me) going away [i.e., setting]’
b. baniS mee wina-tee-hani animal() 3 live--..: ‘There were surprisingly many animals.’
The difficulty with suppletion in a study such as this one is that many (perhaps most) linguists have an intuition that suppletive allomorphy is qualitatively distinct from allomorphy based on productive morphophonological rules. However, in order to calculate a global EC score we need to turn this qualitative intuition into a quantitative metric. To capture our view that suppletive allomorphy contributes much more heavily to EC, suppletion is coded as a binary variable, but one that is weighted relatively heavily (2 for morphemes that display suppletive allomorphy, and 0 for those that do not). Thus if a morpheme displays suppletion its EC score will automatically be at least 4 (number of allomorphs: at least 2 + presence of suppletion: 2).
• Multiple expression. Multiple exponence, or deviations from biuniqueness, is another measure of morphological complexity as defined by Anderson (2015a; see section 9.1). Here we focus on discontinuous realizations of form that correspond to a single unit of content, that is, infixes and circumfixes. We found no infixes in the languages considered in this study, and there were only a few other cases of multiple expression, such as the reflexive/reciprocal k(a)- . . . -ti in Cavineña:⁶ (19)
Cavineña (Tacanan; Guillaume 2008: 271) tudya=yatse ka-peta-ti-kware e=kwe e-jakwi=tsewe then=1 -look.at-=. 1- 1-brother.in.law= ‘Then my brother-in-law and I looked at each other [wondering who of us would know how to milk a cow].’
Multiple exponence was coded as a binary variable: morphemes like the Cavineña reflexive/reciprocal would receive a score of 1 for expression in a discontinuous fashion, whereas a one-form-one-meaning correspondence would receive a 0.
⁶ Other examples involve the obligatory double-marking of a particular operation; for example, the Tariana passive requires the co-occurrence of the prefix ka- (which elsewhere functions independently as a ‘relative’ prefix) and the suffix -kana (Aikhenvald 2003b: 259). We did not consider other types of deviation from biuniqueness (besides allomorphy and multiple expression) because they were found to be very marginal in our data.
• A metric for gauging EC. We calculated a global measure of EC for each morpheme by summing up the scores for the three EC criteria described above. Accordingly, a morpheme that is realized as one contiguous form with no allomorphy will receive an EC score of 1; typically morphemes that receive such a score are described as syntactic elements (e.g., function words) or as agglutinative morphemes. Higher EC scores are associated with forms that deviate from biuniqueness in some way. Our coding was carried out independently of the grammarian’s structural classification of the morpheme in question; for example, we include auxiliaries used as analytic causatives as well as morphological causatives. For this reason, it is unsurprising that a relatively high percentage of morphemes in even a highly polysynthetic language like CAY have the lowest EC score of 1 (56%); this outcome simply reflects the fact that elements Miyaoka (2012) regards as ostensibly syntactic (temporal frame adverbs, evidential clitics, etc.) were coded alongside those he treats as morphological elements. This strategy gets at precisely what we are aiming for: we are interested in how morphology and syntax may (or may not) be distinct in the languages in question, not just how morphemes that grammarians have categorized as morphological correlate with indices of morphological complexity.
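The scoring scheme in (15) can be illustrated with a minimal sketch. The function name `ec_score` and its parameter names are hypothetical and are not taken from the study's own coding materials; the check that a suppletive morpheme must have at least two allomorphs is likewise an assumption that follows from the description above, not a rule stated by the authors.

```python
def ec_score(n_allomorphs, suppletive, multiple_expression):
    """Sum the three exponence-complexity components listed in (15).

    n_allomorphs: count of segmental allomorphs (1 or more)
    suppletive: True if the allomorphs are related by suppletion (adds 2)
    multiple_expression: True for discontinuous realization (adds 1)
    """
    if n_allomorphs < 1:
        raise ValueError("a morpheme has at least one allomorph")
    # Assumption: suppletion is a relation between allomorphs, so it
    # presupposes at least two of them.
    if suppletive and n_allomorphs < 2:
        raise ValueError("suppletion presupposes at least two allomorphs")
    return n_allomorphs + (2 if suppletive else 0) + (1 if multiple_expression else 0)


# The three cases described in the text:
print(ec_score(1, False, False))  # one contiguous form, no allomorphy -> 1
print(ec_score(2, False, False))  # two non-suppletive allomorphs -> 2
print(ec_score(2, True, False))   # two suppletive allomorphs -> 4
```

Under this scheme the theoretical maximum would be 8 (five allomorphs, suppletion, and multiple expression), although the scores attested in the data are reported to stay at or below 5.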
Individual morphemes score from 1 to 5 in EC across the languages in the sample.⁷ The percentages of morphemes associated with each EC value in the twelve languages considered are provided in Table 9.6. Figure 9.2 gives a visual representation of these distributions as kernel density estimates of EC values for each language in the study.
Table 9.6. Percentage of morphemes for each EC value across the languages sampled (with average scores across all the morphemes for each language)
Language       Family         EC=1   EC=2    EC=3   EC=4   EC=5   Average score
CAY            Eskimo-Aleut   56%    20%     4%     16%    4%     1.92
Cavineña       Tacanan        85%    11.5%   4.5%   0%     0%     1.27
Chácobo        Panoan         92%    8%      0%     0%     0%     1.08
Hup            Naduhupan      94%    6%      0%     0%     0%     1.06
Jarawara       Arawan         26%    5%      16%    53%    0%     2.95
Kotiria        Tukanoan       97%    3%      0%     0%     0%     1.06
Kokama         Tupian         58%    42%     0%     0%     0%     1.41
Movima         isolate        57%    5%      11%    27%    0%     2.08
Paresi         Arawakan       91%    6%      0%     3%     0%     1.16
Ash. Perené    Arawakan       91%    5%      2%     2%     0%     1.15
Tariana        Arawakan       91%    6%      1.5%   0%     1.5%   1.16
Urarina        isolate        83%    17%     0%     0%     0%     1.17
⁷ As seen in (15) above, the EC score according to our metric could be higher for any given morpheme, but in our data set none go above 5.
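As a rough check on how the average scores in Table 9.6 relate to the percentage columns, the following sketch recomputes a weighted mean from one row. The function name `mean_ec` is hypothetical. Because the published percentages are rounded, values recomputed this way may differ from the printed averages in the second decimal place for some languages.

```python
def mean_ec(percent_by_score):
    """Weighted mean EC score from a {EC value: percentage} mapping.

    Dividing by the sum of the percentages (rather than by 100) keeps the
    result sensible when rounded percentages do not add up to exactly 100.
    """
    total = sum(percent_by_score.values())
    if total == 0:
        raise ValueError("percentages sum to zero")
    return sum(score * pct for score, pct in percent_by_score.items()) / total


# CAY row of Table 9.6
cay = {1: 56, 2: 20, 3: 4, 4: 16, 5: 4}
print(round(mean_ec(cay), 2))  # 1.92
```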
[Figure: one panel per language (Ashéninka Perené, Cavineña, Central Alaskan Yupik, Chácobo, Hup, Jarawara, Kokama-Kokamilla, Kotiria, Movima, Paresi, Tariana, Urarina); x-axis: Exponence Complexity; y-axis: Kernel Density Distribution]
Figure 9.2. Kernel distribution of densities across the languages of this study
We note two points about the EC values across the languages of this study. First, it is generally true that CAY morphemes are more evenly distributed across the range of EC scores in comparison to the other languages—in other words, they are less likely to cluster at any particular EC value, most notably 1 (the lowest). This is to be expected based on current descriptions of CAY as highly morphophonologically complex, such that affixal elements display a high degree of word internal adjustments (i.e., fusion; see, e.g., Fortescue 1992); a higher degree of allomorphy will produce higher EC values. Second, and in contrast to CAY, the western Amazonian languages sampled cluster predominantly around the lowest EC value (1)—in keeping with the observation that languages of this region tend to exhibit a highly agglutinative profile. On the other hand, Movima, Jarawara, and to a certain extent Kokama-Kokamilla display higher EC levels—a point we return to below. Despite the generalizations made here, we emphasize that a higher EC score does not necessarily translate to a higher degree of morphological autonomy. Higher morphological autonomy is only corroborated if EC correlates with other criterial wordhood properties. In other words, morphological autonomy may be manifested by high EC scores, but high EC scores may not be limited to autonomous morphology.
9.3.3 Criterial wordhood properties and morphological autonomy
Criterial wordhood properties refer to features that identify constituents or formatives as independent words versus parts of words. Each morpheme coded in the database is coded with a value (either binary (0,1) or ordinal (0,1,2)) for each of the criterial wordhood properties. Since each morpheme also has an EC value associated with it, we can assess the correlation between EC level and wordhood values. We investigate the following three criterial wordhood properties:
(20)
a. Bound status (yes = 1, no = 0)
b. Prosodic dependence (0 = never, 1 = sometimes, 2 = always)
c. Contiguity (yes = 1, no = 0)
In what follows we provide a brief discussion of each of these criterial wordhood properties and how they were coded. We then turn to measurements of association between the wordhood properties and EC across the languages in our sample.
According to our conception of morphological autonomy as a typological index along which languages may vary, we propose that the morphological system of a language can be more or less autonomous. However, we do not feel that we are in a position to directly measure morphological autonomy, since it involves many interacting criteria that need to be weighed against one another in a principled way (although future research on this topic may make an overall global measure more appropriate, as suggested by Haspelmath 2011). For this reason, we simply provide statistical summaries of the correlations between EC levels and wordhood criteria in the languages considered here.
Because the variables are binary and/or ordinal and not normally distributed, we use rank statistics to assess the relationship between EC level and criterial wordhood value across the languages. We use Kendall’s tau adjusted for ties in the statistical analysis programme R (McLeod 2011).⁸ In contrast to other rank correlation statistics like Spearman’s rho, Kendall’s tau is well suited to comparisons that involve many ties and small sample sizes. The data we gathered naturally contain many ties because we are comparing variables that, at most, are quantified from zero to five across a large sample of morphemes. Furthermore, as can be seen from Table 9.1, we gathered fairly small samples of data, according to the morphemes and constructions described in the grammars. We are concerned here with effect size (i.e., correlation strength) as much as we are concerned with statistical significance. The tau statistic can be read as a measure of the degree of morphological autonomy that a relationship between EC and a criterial wordhood property affords. For a given association, a strong positive correlation (a tau coefficient that approaches 1) suggests a more robust distinction between morphology and syntax; a weak or negative correlation (a tau coefficient close to or below 0) suggests a more porous boundary between the two. In this study we judge a correlation to be significant if the p-value is lower than 0.05.⁹
⁸ For an explanation of this methodology, including the concept of ties in rank statistics, see Kendall & Gibbons (1992) and Gibbons (1993). For an introduction to using Kendall’s tau in R, see Field et al. (2012: 225–6).
• Bound status. Bound status is a classic criterion for wordhood (Bloomfield 1933; Hockett 1958).¹⁰ Here we consider a morpheme bound if and only if it fails the minimum free form test (and is not a primary content item, i.e., a member of a major word class, such as a verb); otherwise it is considered free. A morpheme or construction passes the minimum free form test if it can stand alone as a single grammatical utterance. Crosslinguistically, bound status tends to be associated with morphological elements, while free forms are more syntactically relevant (Bloomfield 1933: 207). Despite a tendency toward lower EC, western Amazonian languages typically have a large repertoire of bound forms. Example (21) illustrates a verb complex from Chácobo: The only morpheme which can stand on its own is the verb root oʂa ‘sleep’; all other morphemes are bound. (21)
Chácobo (Panoan; Tallman 2018) a. oʂa-mis=tɨkɨn=kas=ʔitá=kɨ=rɨ́ sleep-===.=:= ‘What a shame that he only wanted to sleep yesterday.’ b. oʂa ‘asleep’ c. *-mis d. *=tɨkɨn e. *=ria f. *=ʔitá g. *=kɨ h. *=rɨ
⁹ Of course, high p-values do not necessarily imply that there is no relationship between the EC score and a wordhood property (the sample sizes are too small to afford such an interpretation). We include the information regarding statistical significance for the reader who is interested in gauging how reliable our results are on this point.
¹⁰ A number of authors have pointed out problems with the minimum free form test (Haspelmath 2011; Bickel & Zuñiga 2017), in particular that it identifies compounds as phrasal elements and certain function words (determiners) as morphological elements. However, this test is not uniquely problematic among wordhood tests, as Haspelmath’s (2011) systematic review demonstrates. Furthermore, the test still provides useful information regarding morphological vs. syntactic status; for instance, Haspelmath (2011: 40) points out that if an element passes the minimum free form test this provides strong evidence that this element is not an affix. We see non-affixicality as an important criterion in calculating overall morphological autonomy.
Table 9.7. Rank correlations between EC level and bound status values across languages
Language       tau correlation   p-value
CAY            0.543             0.004
Kokama         0.449             0.017
Jarawara       0.329             0.139
Paresi         0.246             0.166
Tariana        0.189             0.038
Hup            0.183             0.143
Ash. Perené    0.119             0.231
Chácobo        0.099             0.445
Cavineña       0.085             0.671
Urarina        0.037             0.858
Kotiria        0.342             0.050
Movima         −0.716            <0.005
We encode bound status as a binary variable. The morpheme in (21b) would receive a score of 0, and all of the other morphemes (21c–h) receive a score of 1. Table 9.7 provides the rank correlations across the languages of this study. The tau correlation can be interpreted as an indicator of effect size, that is, of how strongly EC level is associated with bound status in the language. In CAY and Kokama-Kokamilla there are significant positive correlations, with CAY coming out on top. In Movima, however, there is a significant and negative correlation, a point we return to in section 9.3.4 below. Such measures of association are here considered to be metrics of morphological autonomy.
• Contiguity. This criterion refers to whether a given formative is required to occur directly adjacent to the morpheme it semantically combines with, or can be separated from it by a free element. A lower degree of contiguity is associated with a more syntactic status, while a higher degree of contiguity is associated with a more morphological status (e.g., Mugdan 1994; Dixon and Aikhenvald 2002). To illustrate the criterion of contiguity, we can make reference to the Chácobo verb complex in example (21) above. According to the minimum free form test the verb complex in this example is a single word-unit, but according to rules of contiguity it consists of at least five different units, each of which can be separated from its neighbours by a full noun phrase such as honi siri ‘old man’. Example (22) illustrates the possibility of inserting this noun phrase at any of the points (a–e). Only the antipassive -mis and the combination of the recent past and past tense declarative =ʔitá=kɨ require contiguity.
(22)
Chácobo (Panoan; Tallman 2018) a. (honi siri) oʂa-mis b. =tɨkɨn c. =kas d. =ʔitá=kɨ (man old) sleep- = = =.=: e. =rɨ́ = ‘What a shame that the old man only wanted to sleep again yesterday.’
We coded the criterion of contiguity as a binary variable for a given morpheme. Morphemes that can be separated by a free phrasal construct from the element they associate with semantically receive 0 for contiguity (as in 21d–h). If the morphemes require contiguity they receive a 1, as with antipassive -mis (21c).¹¹ A language that displays a high degree of morphological autonomy is expected to show a strong and positive correlation between EC level and morphemic contiguity. Table 9.8 shows the rank correlations between EC level and contiguity across the languages of this study. CAY, Jarawara, and Ashéninka Perené show positive and significant correlations between EC level and contiguity, with CAY coming out on top. While Ashéninka Perené’s correlation is statistically significant, the effect size is substantially lower than for CAY. Thus on this EC-contiguity metric only CAY and Jarawara provide evidence for morphological autonomy.
Table 9.8. Rank correlations between EC level and contiguity value across languages
Language       tau correlation   p-value
CAY            0.594             0.002
Jarawara       0.532             0.016
Cavineña       0.305             0.121
Ash. Perené    0.236             0.018
Urarina        0.205             0.325
Chácobo        0.178             0.168
Tariana        0.131             0.150
Movima         0.101             0.190
Kotiria        0.030             0.862
Paresi         0.053             0.739
Kokama         0.139             0.462
Hup            0.166             0.183
¹¹ A reviewer suggests that contiguity might be better treated as a three-way variable, with intermediate status given to elements that require adjacency in some constructions but not in others. We concur that this could be a productive approach to explore, but for the purposes of this study it was found to be too difficult to apply in a consistent way.
• Prosodic dependence. For a given formative or construction, prosodic word projection is prototypically associated with wordhood status. Incorporation into an adjacent prosodic word is prototypically associated with affix status (Spencer & Luís 2012).
In some cases, the prosodic dependence of a given morpheme may vary depending on its syntagmatic context. For example, the Jarawara auxiliary na prosodically incorporates into the main verb, with which it forms a single phonological word (23a), but projects its own independent prosodic word when it combines with other affixes (23b).¹² (23)
Jarawara (Arawan; Dixon 2004: 30) a. amó+na sleep+. ‘She sleeps.’ b. amo o-ná-habóne sleep --. ‘I’m going to sleep.’
A similar situation occurs with tense morphemes in Chácobo, but the syntagmatic contexts that license prosodic word projection or incorporation are different: In this language, a tense morpheme prosodically incorporates into an adjacent verb root (24a), but projects its own prosodic word when a subject NP intervenes ((24b), repeated from (7a–b) above). (24)
Chácobo (Panoan; Tallman 2018) a. kako sani=ʔi (ka=ʔitá=kɨ)Pwd Caco fish= go=.=: ‘Caco went fishing [yesterday or two days prior]’ b. sani=ʔi (kaa)Pwd kako (=ʔitá=kɨ)Pwd fish= go Caco =.=: ‘Caco went fishing [yesterday or two days prior].’
Finally, some grammatical formatives may always project their own prosodic words, as exemplified by the Jarawara ‘aspect/time lexeme’ hibati ‘completed’ (example (25); Dixon 2004: 223); see also the Hup recent past marker páh in (6) above: (25)
Jarawara (Arawan; Dixon 2004: 223) Barako owa heta na-re-ka name() 1. lease.from -..:-: hibati jaa ‘Branco did lease [the fishing waters] from me, but this arrangement is now finished.’
¹² Dixon uses the symbol ‘+’ to indicate what he refers to as ‘a grammatical word boundary within a phonological word’ (2004: 30).
Table 9.9. Rank correlations between EC level and prosodic dependence across languages
Language       tau correlation   p-value
CAY            0.543             *0.045
Kokama         0.491             0.083
Jarawara       0.426             *0.049
Ash. Perené    0.187             0.053
Tariana        0.155             0.076
Cavineña       0.151             0.443
Chácobo        0.129             0.307
Paresi         0.122             0.485
Urarina        0.107             0.595
Hup            0.018             0.884
Movima         0.089             0.236
Kotiria        0.274             0.111
Our scoring captures these three possible degrees of prosodic dependence. If a formative always projects a phonological word, it receives a score of 2; if it never projects a phonological word (i.e., it always phonologically incorporates), it receives a score of 0. Formatives that do both receive a score of 1, as in the Jarawara and Chácobo cases above.¹³ Table 9.9 provides the rank correlations for the languages considered in this study. CAY displays the strongest correlation for the relationship between prosodic dependence and EC. Only two languages, CAY and Jarawara, display a significant and positive correlation.
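The three-way scoring and the rank correlation just described can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the toy EC levels, and the choice of Kendall's tau-b (a tie-aware variant) are all assumptions made for illustration, and the text does not specify which tau variant or significance test was used. No p-value is computed here.

```python
from itertools import combinations
from math import sqrt


def prosodic_score(projects_own_word: bool, incorporates: bool) -> int:
    """Three-way score described in the text.

    2 = the formative always projects its own phonological word,
    0 = it never does (it always incorporates),
    1 = it does both, depending on context.
    """
    if projects_own_word and incorporates:
        return 1
    if projects_own_word:
        return 2
    if incorporates:
        return 0
    raise ValueError("a formative must project a word, incorporate, or both")


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation in pure Python (tie-aware).

    Assumption: tau-b is used here only because the scores contain many
    ties; the chapter itself just reports a 'tau correlation'.
    """
    if len(xs) != len(ys):
        raise ValueError("both variables need the same number of observations")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: contributes to neither count
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        raise ValueError("correlation undefined: one variable is constant")
    return (concordant - discordant) / denominator


# Hypothetical formatives for one language: (EC level, projects, incorporates).
# These values are invented for illustration and are not data from the study.
toy_formatives = [
    (2, True, False),   # always projects its own phonological word
    (0, True, True),    # context-dependent, like Jarawara na
    (1, True, True),    # context-dependent, like Chácobo tense morphemes
    (0, False, True),   # always incorporates
]

ec_levels = [ec for ec, _, _ in toy_formatives]
scores = [prosodic_score(p, i) for _, p, i in toy_formatives]
tau = kendall_tau_b(ec_levels, scores)
print(f"tau-b = {tau:.3f}")
```

With real data, each language would contribute one such list of formatives, which gives one tau value per language as in Table 9.9.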
9.3.4 Summary By comparing measures of EC and wordhood status, we obtained a metric by which to gauge the relative degree of morphological autonomy across our sample of languages. The western Amazonian languages in our set show a relatively low degree of morphological autonomy, in contrast to our geographic and typological outlier, CAY, which scored much higher on all measures considered. Despite the fact that the types of EC considered here (allomorphy, culminativity) have been described as unproblematic measures of morphological complexity
¹³ One might argue that prosodic independence is more a fact about the phonological or prosodic component of grammar, rather than having anything to do with the morphology-syntax distinction. However, the relevance of this criterion is evident in the problem of clitics. As Spencer & Luís (2012) argue, the clitic can be understood as a ‘boundary category’—which calls into question the discreteness of the components that it straddles (Croft 1991, 2001). From this perspective, a language with a greater degree of isomorphism between phonological words and grammatical words would be understood as having a higher degree of morphological autonomy.
(e.g., Anderson 2015a), our study illustrates that their status as necessarily morphological cannot be assumed. This point is best exemplified by Movima, which demonstrates a relatively high level of EC complexity in comparison to the other languages in the sample, but coupled with a lower overall tendency for morphemes to be dependent elements with respect to the wordhood criteria considered here, particularly bound status. Similarly, while Jarawara comes closest to CAY in displaying morphological autonomy via its relatively high correlations between EC and the wordhood measures of contiguity and prosodic dependence, its association between EC and bound status is weak and non-significant. The Movima and Jarawara cases demonstrate that deviations from biuniqueness are in principle orthogonal to the structural classification of form-meaning mappings as either morphological or syntactic.
9.4 Conclusion Our findings suggest that a relatively loose distinction between syntax and morphology is an areal feature of western Amazonian languages (perhaps extending into neighbouring regions). In this chapter, we have presented evidence for this view of Amazonian morphological profiles from two major angles. From the perspective of system complexity, we addressed morphological behaviour across four domains that show a tendency toward elaboration in western Amazonian languages—nominal classification, tense, evidentiality, and valence-adjustment— and for each explored the relationship between complexity and language contact and change. Turning our focus to EC, we systematically evaluated aspects of this domain against criteria associated with wordhood for a sample of eleven western Amazonian languages, plus CAY as a point of contrast. In addition to showing that the Amazonian languages all exhibit relatively low degrees of morphological autonomy, our findings highlight the important point that factors associated with morphological complexity are in fact not necessarily morphological: for two Amazonian languages in our sample, high EC does not correlate strongly with wordhood status. In future work, we hope to expand the typological scope of this survey, in order to establish the degree to which Amazonian languages might deviate from a more widely defined baseline relating to morphological autonomy, and to determine a more precise understanding of the geographic distribution of these patterns within and beyond South America. The low degree of morphological autonomy in western Amazonia has important implications not only for our understanding of synchronic relationships among linguistic subsystems, but also for our conception of diachronic processes of contact and grammaticalization. 
As we have argued here, the porous nature of the morphology-syntax distinction in Amazonian languages is associated with other areal tendencies, such as productivity of compounding and incorporation,
that facilitate grammaticalization by creating a context in which lexical elements are easily reanalysed as bound morphology. These processes in turn can feed the elaboration of grammatical domains, particularly under the pressure of areal diffusion. A fuzzy morphology-syntax distinction also allows for low selectivity on the part of grammaticalizing morphological elements, through which they may readily detach from the contexts in which they emerge and be extended to new ones. These processes result in outcomes that are typologically unusual in broader perspective; in particular, that morphologization might frequently involve a decrease in bound status, and that more and less bound instantiations of particular morphemes might be maintained over time, rather than representing only fleeting stages of a transition in progress. In sum, a closer look at the morphological profiles of western Amazonian languages invites a revision of current views of morphological complexity and its relationship to processes of language contact and change. The Amazonian case underscores the recognition that large-scale regional patterns may play an important role in shaping our vision of what is canonical or ‘normal’ in language, and that a robust understanding of human language must take a range of diversity into account.
Acknowledgements Epps gratefully acknowledges funding from the University of Texas at Austin, as well as earlier support from the National Science Foundation, Fulbright-Hays, and the Max Planck Institute for Evolutionary Anthropology for work on Hup; Tallman thanks the National Science Foundation and the Endangered Languages Documentation Programme for supporting his work on Chácobo. We are grateful to the editors of this volume for inviting us to contribute, and to Peter Arkadiev, Francesca di Garbo, Tony Woodbury, and an anonymous reviewer for their suggestions.
III
THE ACQUISITIONAL PERSPECTIVE
10 Radical analyticity as a diagnostic of adult acquisition John H. McWhorter
10.1 Introduction I propose a hypothesis (cf. McWhorter 2016, 2019): that when a language is radically analytic in comparison to its close relatives, this can be treated as an indication that in the past, the language was acquired by a critical mass of adults, rather than having always been passed down the generations intact. In previous work (McWhorter 2007) I have argued that when a language within a family is markedly more analytic than its sisters, this can be traced to extensive second-language acquisition (e.g., English, Persian, Mandarin, Malay). Here, however, my argument is more specific, extending this framework to whole families or even Sprachbunds of languages that are not just relatively analytic, but extremely so.
10.1.1 Definition of radical analyticity By radical analyticity, I refer to absence (or all but absence) of inflectional marking indicated by affixation, tone, or vowel changes in quality or length. This must be clearly distinguished from relative analyticity, which linguists often refer to as ‘analyticity’ in a kind of shorthand, as when Nurse (2007) refers to the amply inflected Supyire (Gur, Niger-Congo) as ‘analytic’ in comparison to especially inflected languages like those of Narrow Bantu. My hypothesis distinguishes two kinds of language contact effects: transfer and structural simplification (although the two are hardly mutually exclusive). The role of transfer in language contact would seem self-evident and is richly studied. However, the role of simplification in language contact has been studied more in regard to pidgins and creoles than to less extremely simplified languages. Kusters (2003) and McWhorter (2007) were pioneering explorations of this intermediate range in a crosslinguistic sense, continued by the now seminal Trudgill (2011).
John H. McWhorter, Radical analyticity as a diagnostic of adult acquisition In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © John H. McWhorter. DOI: 10.1093/oso/9780198861287.003.0010
10.1.2 Radical analyticity worldwide This presentation proposes that there are three main geographical clusters of radically analytic languages with extensive adult acquisition in their histories. The first is the few Niger-Congo languages that are radically analytic, such as the Gbe languages, Yoruba, and Nupe (henceforth GYN), which my hypothesis suggests would have arisen from an earlier Niger-Congo variety with ample inflection. Yoruba’s near lack of inflectional morphology of any kind is indicated here: (1)
Yoruba
Mo mú ìwé wá fún ẹ.
I take book come give you
‘I brought you a book.’ (Stahlke 1970: 63)
The second cluster is a few languages of Eastern Indonesia—Austronesian ones on the island of Flores and a few on Timor—and some non-Austronesian ones on the northern coast of the island of New Guinea (as documented by Paauw 2007). Within Austronesian, adult acquisition is considered relatively uncontroversial for various colloquial dialects of Malay/Indonesian (Grijns 1991; McWhorter 2007: 223–9), and for Tetun (Hull 1999: ix; Thomaz 2002). However, my proposal will explain why we can infer a history of adult acquisition even for languages of this region with no documented history, such as ones in central Flores like Rongga, whose characteristic analytic structure is shown here: (2)
Ema ja’o weli kebaya toro.
father I buy dress red
‘My father bought a red dress.’ (Arka 2011: xviii)
or one of western Papua such as Abun: (3)
Men ben suk no nggwe yo, men ben suk sino.
we do thing garden then we do thing together
‘If we do things at the garden, then we do them together.’ (Berry & Berry 1999: 23)
Finally, the Sinitic languages can be seen as revealing, in their radical analyticity, adult acquisition in their past (cf. McWhorter 2016). The radical analyticity in language families neighbouring Sinitic, such as Hmong-Mien, Tai-Kadai, and Mon-Khmer, is often treated as an areal ‘Sinosphere’ feature. I suggest that within this language area, the radical analyticity, at least, traces to Sinitic. This reconstruction is especially compelling given that Mon-Khmer languages are most analytic where Chinese has had influence, and much less so where it has not,
among the Munda languages to the west and Aslian languages to the south. Under this analysis, the question becomes how Sinitic itself reached a radically analytic state in the first place, to which I answer that adult acquisition is the most plausible cause. Under the analysis I will present, the GYN languages likely reached their state as the result of waves of second-language acquisition as an earlier Niger-Congo variety travelled southward towards the coast of the Bight of Benin. Various isolates, as well as the Mande and Ijo groups, which Dimmendaal (2011) has argued not to be members of Niger-Congo, are likely remnants of the original language distribution in upper west Africa. The Flores languages were likely affected by invasions from Sulawesi (or possibly the aboriginal population of Homo floresiensis). Hull (1998) makes a strong case that the Timor languages were deeply impacted by an invasion from the island of Ambon, while Paauw (2007) suggested that contact with Austronesian as its speakers migrated eastward affected the languages in Papua. The reason for the analyticity (and in general the radically isolating structure) of Old Chinese remains unknown, although DeLancey (2011) and McWhorter (2016: 81–2) offer suggestions, under an analysis which, we must recall, posits the nature of Old Chinese as an indication of adult acquisition yet to be identified.
10.1.3 Application to this volume Many linguists today are sceptical of the idea that the development of even radical analyticity necessarily entails a loss of overall morphological complexity. A guiding caveat is that what was once marked by an affix (or clitic) can later be marked by a free morpheme, or even a process on some other level of the grammar such as syntax (e.g., via word order). While this is true, any assumption that this kind of replacement is somehow regular or even obligatory in diachronic development is (i) logically unmotivated (i.e., for what reason or purpose would grammars ‘compensate’ in this way towards an unspecified sine qua non degree of structural complexity?); and (ii) empirically disproven (Shosted 2006 shows that languages do not compensate for loss of complexity in one module by gaining it in another). Thus the development of radical analyticity is not a mere matter of a language transforming its typology in a fashion independent of complexity. Rather, the languages addressed in this chapter have lost, or all but lost, overt indication of case marking and concord in any module. They do not mark these with free morphemes. Moreover, while of course they have syntactic processes sensitive to the distinction between, for example, subject and object, these are not as obligatorified (in the terminology of Lehmann 1985) as affixal markers of these categories tend to be, often qualifying more as pragmaticized structures than as grammaticalized ones.
Similarly, noun class markers can indeed be replaced with free morphemes as in Wolof (Loporcaro, Chapter 6, this volume), and numeral classifiers in languages such as many in East and Southeast Asia can be seen as functionally equivalent to noun class marking (Grinevald & Seifart 2004). However, the free morphemes in question neither vary for case (much less according to declensional classes indicating this case variantly) nor vary in form between modifiers and heads as affixal noun class marking often does (Russian iz krasivyx ženščin ‘of the beautiful women’). Similarly, while radically analytic languages indicate inherent inflectional categories such as tense and number with free morphemes, these free morphemes do not occur in paradigmatic variants independent of semantics, in the vein of verb conjugational affix paradigms. Furthermore, it would appear that affixation, complete with the morphophonemic processes it encourages as well as distortions into outright irregularity beyond, conditions much more irregularity (another facet of complexity) than free morphemes do. The ‘irregular verb’ is quite rare in, for example, Yoruba, Mandarin, and Rongga, where there are no affixal markers of inherent inflection likely to drift into morphophonemic subrules, then thorough irregularity subject to no rule, and then utter suppletion. Radical analyticity, that is, is less a change of type than an unravelling. Radically analytic languages remain vastly complex in countless ways, as all languages are. However, their radical analyticity does entail a significant degree of relative simplification.
10.2 Adult acquisition versus ‘drift’ That is, I propose that we should no more question whether Yoruba, Rongga, or Mandarin have extensive adult acquisition in their histories than we would question whether the difference between Haitian Creole French and French (loss of grammatical gender, verbal inflection, and much else) is due to extensive adult acquisition: (4)
a. French
   Ils n’ont pas de ressources qui puissent
   3. -have resource. can.3
   leur permettre de résister à la famine.
   3. allow of resist to . famine
b. Haitian Creole
   Yo pa gen resous ki pou pèmèt yo reziste anba
   3 have resource can allow 3 resist under
   grangou.
   famine
   ‘They didn’t have the resources that would allow them to hold off famine.’ (Ludwig et al. 2001: 164)
Indeed, specialists in second language acquisition and in language contact concur that for adult acquirers, the target language’s inflectional morphology is especially subject to elimination (cf. Pienemann 1998; Plag 2008), as the result of factors of phonological and semantic transparency, of the same kind that condition hierarchies of borrowability (cf. Thomason & Kaufman 1988; Matras 2009: 153–7). Current orthodoxy assumes, however, that adult acquisition is but one of two pathways via which a language might become radically analytic (cf. Thomason 2003: 242; Hyman 2004). That is, works such as Thomason (2003) and Hyman (2004) are typical in their assumption that radical analyticity can also occur grammar-internally as the result of the ‘drift’ process described by Sapir (1921), in which a language’s grammar-internal changes (or even those of a number of contiguous languages, such as those of much of Europe) coalesce upon a certain general tendency, such as inflectional loss. The assumption is natural, given that the loss of significant (if not radical) amounts of inflectional affixation is well known from the difference between modern and Old English and between the modern Mainland Scandinavian languages and Old Norse, and given the general ‘drift’ towards analyticity identified by Sapir (1921). Various treatments, however, have demonstrated that the above cases and similar ones were, themselves, products of second language acquisition (cf. Kusters 2003, McWhorter 2007, Trudgill 2011 for general treatments; McWhorter 2002 on English; Trudgill 2011 on Scandinavian). There is currently such a volume of studies of this kind that it becomes appropriate to explore a certain theoretical economy in our theory of language diachrony and its relationship to language contact.
To wit: it is worthwhile to explore whether radical analyticity can emerge only via adult acquisition, and therefore could be useful as a window on the past of languages whose previous stages are otherwise lost to history. In sections 10.3, 10.4, and 10.5, I will present three aspects of radically analytic languages that suggest that they owe their state to second language acquisition rather than grammar-internal development. I will then address two prominent proposals suggesting that radical analyticity could emerge without second-language acquisition: (in section 10.6) Mufwene’s (2001) proposal that creole languages’ analyticity is due simply to the analyticity of their source languages; and (in section 10.7) Hyman’s (2004) proposal that Gbe, Yoruboid, and Nupe reached their state via the evolution of a monosyllabic phonological template. I must specify: my claim is not that any degree of adult acquisition of a language must denude it of a radical amount of its inflectional affixation. Adult acquisition has occurred in various degrees to, probably, most languages, and has varying degrees of effect. My argument is that radical analyticity can be analysed as tracing to an extreme degree of adult acquisition: Trudgill (2011: 57), for example, suggests that the tipping point for stark inflectional loss begins when non-native learners constitute 50% or more of the speech community.
10.3 Argument No. 1: contextual versus inherent inflection One indication of radical analyticity’s roots in adult acquisition rather than ‘drift’ is that in languages that have reached such a state, one type of inflection is eliminated entirely or virtually so, while another type is retained in the form of free morphemes. This is typical of adult acquisition, but not of grammar-internal change. Booij (1993) distinguishes inherent inflection from contextual inflection. Inherent inflection contributes meaning, driven by the speaker’s choice of what they wish to communicate. It thus includes nominal number, tense, and aspect, and is not required for syntactic grammaticality. This contrasts with contextual inflection which, indicating features such as case and concord necessary to the syntactic composition of the sentence, has function. Crucially, in creoles, the lexifier language’s inherent inflection is typically preserved to a considerable extent in the form of free morphemes, such as preverbal tense and aspect particles (even when the substrate languages were synthetic, as was the case with many creoles; cf. section 10.6 below). However, contextual inflection is typically not replaced in this fashion (Plag 2008; Luís 2009). In this Haitian sentence, French’s past tense inflection is replaced by the free form te, but the nouns baay ‘thing’ and moun ‘people’ are not marked for grammatical gender as their French equivalents are, nor is grammatical gender marked on Haitian’s definite articles; also, pronouns such as li (here, ‘it’) are not marked for case: (5)
Yo te suvèye baay sa-a pu anpèche moun vole li.
they watch thing this- for prevent people steal it
‘They watched this thing in order to prevent people from stealing it.’ (Koopman & Lefebvre 1981: 203)
The facts are similar in pidgins, in which even as free morphemes, contextual inflection is rare while inherent inflection is frequent (Roberts & Bresnan 2008). As Plag (2008) notes, creoles’ retention of inherent rather than contextual inflection is predictable from the hierarchical pathway of second-language acquisition identified by Pienemann (1998), under which inherent morphology is more easily accessible to the learner than contextual, and thus always acquired first. In contrast, under ordinary grammar-internal change, contextual morphology is much less fragile. For example,
1. French has lost Latin’s case inflections on nouns (first collapsing the oblique cases into one and then losing even this distinction), but retains case distinctions in pronouns, and concord within NP.
2. Pashto has lost much of the inflection in early Iranian languages, but nevertheless retains ample case marking and concord.
3. Within Niger-Congo, while Wolof lacks the noun class prefix paradigm typical of Bantu and even many of its own relatives within the Atlantic subfamily, it has replaced them with postposed free morphemes (Torrence 2013: 16; cf. also Babou & Loporcaro 2016, Loporcaro, Chapter 6, this volume), as shown in Table 10.1.
4. Modern Armenian dialects retain Indo-European case marking as well as inflections distinguishing declensional classes; Albanian also retains case marking as well as grammatical gender. Adult acquisition is not assumed to have been significant in the timelines of either of these branches of Indo-European, as opposed to in Romance and Germanic.
5. Georgian has retained the contextual inflection of Proto-Kartvelian over several millennia.
These cases serve to illustrate, as Nichols (1992: 169) indicates, that ordinary grammar-internal change poses no threat to contextual inflection. The contrast is clear with the extent to which adult acquisition indeed does so. As such, the fact that radically analytic languages like the GYN ones and those of central Flores like Rongga retain free morphemes in the function of inherent inflection, but eschew contextual morphology completely, suggests that they have roots in non-native acquisition, under which learners had access to inherent morphology rather than contextual because inherent inflection is more like derivational morphology, that is, more ‘lexical’, and thus more salient to the non-native learner. This distinction is the one reflected in borrowing as described by Gardani (2008, 2012, 2018). Thus, a Fongbe sentence like the one in example (6) contrasts with a Swahili one not only in encoding aspect with a free morpheme, but in lacking either bound or free noun class morphology: (6)
Fongbe
Àvún ɔ́ nɔ hàn àɖú mὲ.
dog bite tooth person
‘The dog bites people.’ (Lefebvre & Brousseau 2002: 266)

Table 10.1. Wolof noun class markers
xaj bi     the dog
gaal gi    the boat
ndap li    the pot
wax ji     the talk
jën wi     the fish
ndaw si    the young woman
saw mi     the urine
nit ki     the person
(7)
Swahili
U-levi hu-ondoa akili.
-drunkenness -remove .sense
‘Drunkenness takes away sense.’ (Perrott 1950: 56)
In the same way, Central Malayo-Polynesian languages typically have subject marking concordial prefixes, as in Leti: (8)
Müani-ne püate ra-mtïètne.
man-and. woman. 3-sit.
‘The man and the woman sit.’ (Van Engelenhoven 2004: 243) ( = indexical marker)
The languages in central Flores such as Rongga lack these prefixes, and case marking, but mark tense and aspect with free morphemes: (9)
Ata gagi ngai ngaja.
person old talk
‘The elders are talking.’ (Arka 2011: 56)
Because contextual morphology is usually discussed in reference to affixal languages, it may seem unremarkable that the Chinese languages have very little marking of case and grammatical relations. However, even a largely monosyllabic language like Akha (Sino-Tibetan) marks ergativity with free morphemes: (10)
ŋà nɛ àjɔq áŋ áshì thì shì biq ma.
I he fruit one give
‘I gave him one fruit.’ (Hansson 2003: 243)
Therefore, the radically analytic languages I discuss resemble creoles not simply in being analytic, but in also retaining a particular kind of morphology as free morphemes while eschewing the other kind. In this, these languages can be seen as harbouring evidence of adult acquisition.
10.4 Argument No. 2: analytic language as an unnatural state Especially given how familiar it is to linguists that Modern English is so much more analytic than Old English, it may seem unexceptionable that, by chance, some languages might shed all of their inflectional affixation.
10.4.1 Grammaticalization is unceasing I will present three observations which, together, suggest that a language can only lose all of its bound inflection via external intervention. First, the emergence of new grammatical items via grammaticalization processes, as well as reanalysis, is a constant in the life cycle of a language. More to the point, there is no indication in the grammaticalization literature that the process only operates in a subset of languages, or that the process is given to halting for long periods. Grammaticalization can be taken as equivalent to the movement of bodies in the theory of physics: just as stasis under this formulation is irregular, we can assume that in language change, the cessation of grammaticalization indicates the death of the language. To wit, grammaticalization is unceasing. Second, it follows from this point that there is no reason why, while a language was losing bound inflection, the development of new inflection via grammaticalization would not be occurring simultaneously. Put differently, diachronic theory knows no reason that there would be such a cessation. Moreover, empirical evidence demonstrates the opposite. In Romance, the erosion of Latin’s future marking suffixes was paralleled by the emergence of new ones from the grammaticalization of habere ‘to have’ (as well as a new conditional marking paradigm). Also, Italian developed new noun inflectional classes as original ones were lost (Gardani 2013). In the Kartvelian language Svan, declension marking suffixes proliferated amidst its loss of some of Common Kartvelian’s original concord machinery (Harris 2004: 152–5). In Swahili, the past marking prefix li- grammaticalized from a locative verb as the Common Bantu equivalent a- (Nurse 2008: 257) wore away (McWhorter 1994: 62–3). Affixes and paradigms change function as often as they disappear (cf. Mukarovsky 1977: 32–5; Harris & Campbell 1995; Good 2012a).
Third, following in turn from the above point, languages do not ‘cycle’ through stages of radical analyticity followed by the development of new inflections which eventually wear away such that the cycle begins again. That linguists sometimes suppose so would seem to be due to a ‘folk’ interpretation of Hodge (1970) on Egyptian, which actually showed a phase of relative analyticity, nothing approaching radical. Meanwhile, no cycle through radical analyticity has been demonstrated elsewhere. As Dahl (2004: 261–88) notes, the absence of such a cycle has been explicitly noted in Afroasiatic, Uralic, and Altaic, and meanwhile specialists in language groups worldwide report no such cycles. In sum, grammaticalization is analogous to crocodiles’ and fishes’ teeth, which are continually replaced throughout life. These animals do not ever reach a toothless stage. If one were encountered toothless, we would know that this was the result of an external disruption. We would neither venture that it was a normal development nor expect it to develop a mouthful of new teeth overnight.
Given the three observations above, grammars such as those of Yoruba, Mandarin, and Rongga become puzzles. Adult acquisition is the only mechanism which has been empirically documented to shave away all or almost all of a language’s bound inflection. There is no documentation of radical analyticity’s emergence in East and Southeast Asia, Indonesia, or West Africa: in all cases, the languages were already radically analytic by the time they were committed to writing. I suggest that a solution to the puzzle that these other languages pose is that they, too, were born of adult acquisition.
10.4.2 Unstressed final syllables do not lead to the typology of Chinese Two common conceptions must be addressed. First, is withdrawal of stress from final (or initial) syllables a possible reason for a language becoming radically analytic? Two answers beckon:
1. This account would neglect that bound affixation often includes vowel changes within the root. A great deal of English’s inflectional morphology, for example, is indicated with the root vowel changes in the past forms of verbs. Even if destressing the final syllable had denuded English of all inflectional suffixes, the vowel changes in the strong verb roots would have remained.
2. Lack of stress on the final syllable is not as regularly destructive of inflectional morphology as often supposed. Withdrawal of stress from the final syllable is common in Indo-European, and usually the result has been languages that have remained richly suffixed. Baltic and Slavic preserve a great deal of Proto-Indo-European nominal morphology, and yet, for example, West Slavic fixed its accent on the first syllable several centuries ago. Armenian has fixed the accent on the penult, and yet retains a rich declensional system and robust verbal inflection. A considerable degree of unaccented word-final inflection has survived in Icelandic. In Celtic, when the accent was retracted from endings, Goidelic (such as Irish and Scots Gaelic) retained much verbal inflection and a degree of nominal. We must also consider the Romance languages other than French, such as the Iberian languages and Italian, in which unstressed inflectional suffixes are prolific and robust.
10.4.3 Inflection is more quickly lost than gained
The second conception we must address is a possible misinterpretation. My claim that radical analyticity is an unnatural state for a language must not be taken to mean that it is incompatible with human cognition. In fact, many would reconstruct that language emerged uninflected (e.g., Comrie 1992). However, the development of grammatical affixes is a slow process. As Dahl (2018) notes, a mere few inflectional affixes are documented to have emerged in Europe over the past 2,000 years. Thus while a single instance of disruption, such as the in-migration of a large population of adult learners, can eliminate a language’s bound inflection (many creoles) or vastly reduce it (English) in one stroke, the nature of grammaticalization gives no reason to suppose that new affixes would emerge immediately. In fact, theoretically, this is what we would not expect. Yet the radically analytic languages I have referred to do show signs of grammaticalization, albeit the forms are not yet bound ones. This, too, is what we would expect, and would find puzzling if absent. In Fongbe, an imperfective marker wὲ has emerged, likely from a postposition, which in the modern language could be treated as an inflection: (11)
Kɔkú ɖò àsɔ́n ɔ́ ɖù wὲ.
Koku be.at crab eat
‘Koku is eating the crab.’ (Lefebvre & Brousseau 2002: 96)
In Palu’e in Central Flores, a new first-person singular subject-marking clitic has developed (Donohue 2009). In Mandarin, since the seventh century (Li & Thompson 1976), the marker bǎ has emerged from a verb meaning ‘take’: (12)
Nǐ bǎ jiǔ màn-màn-de hē.
you wine slowly drink
‘You drink the wine slowly.’ (Li & Thompson 1981: 464)
In a future stage of Mandarin this, as well as other items that cleave closely to roots such as nominalizer zi, could become bound morphemes. Also, in Mandarin, the modern usage of numeral classifiers began developing in the second century (Norman 1988: 115–17), and diachrony has rendered them quite often semantically unpredictable. Zhī is used with animals (although only some of them) and birds, but is also used with eyes, hands, suitcases, and boats. Tiáo is most immediately identified with long, thin things; less likely to come to mind is that it is also used with proposal, voice, scheme, and ‘piece of news’. Bǎ is used with things that one holds such as knives and teapots, but also with chairs, and even with the experience of aging (niánjì). As such, Gao (1998) notes that Mandarin speakers’ mental representation of classifiers is subdivided among three classes of association: one transparent, one prototypical (metaphorically extended in a synchronically processible fashion), and one arbitrary. This can be analysed as the emergence of grammatical gender, that is, contextual morphology. (Cf. Grinevald & Seifart 2004 on the likeness of noun class marking and grammatical gender.)
10.5 Argument No. 3: radical analyticity is rare
Because linguists tend to be familiar with analyticity from the textbook example of Chinese, as well as from any acquaintance with creole languages, it can seem that analyticity is as likely a state in a language as any other. However, this is not true, especially when it comes to radically analytic languages. Outside of creole languages, where we take it as uncontroversial that adult learning was the cause of the analyticity, radically analytic languages are actually rare. Donohue & Denham (forthcoming), in their survey of the World Atlas of Language Structures, find none outside of the areas I have cited. If we treat Sinitic as about ten languages, Hmong-Mien as about twenty (a high estimate according to most accounts), Tai-Kadai as about a hundred according to Ethnologue, and count about 130 of the 168 Austroasiatic languages tabulated by Ethnologue, subtracting Munda and Aslian (again, yielding a likely high tally), then in East and Southeast Asia there are about 260 radically analytic languages. Furthermore, the analyticity of these can be treated as tracing to the analyticity of Chinese alone (McWhorter 2016). In the meantime, outside of these languages, the tally of radically analytic languages in Africa, Flores, Timor, and the island of New Guinea is about three dozen at most. How often the linguist encounters sentences of Mandarin, plus how familiar creole languages have become within the field, can distort our sense of the bigger picture. There would appear to have never been reported a radically analytic indigenous language in:
1. North America, South America, or Australia
2. The four families indigenous to all of Africa other than a tiny pocket of languages in one of those families
3. Dravidian, Uralic, Altaic, the Caucasian families, Yeniseian, or any ‘Paleosiberian’ group
4. Indo-European.
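The tallies above can be checked with a few lines of arithmetic. The following is only a sketch: the counts are the rough (and deliberately generous) estimates quoted in the text, "about three dozen" is taken as 36, and the world total of roughly 7,000 older languages is the figure this chapter works with; the variable names are illustrative.

```python
# Rough tallies quoted in the text; all are approximations, several deliberately high.
east_southeast_asia = {
    "Sinitic": 10,
    "Hmong-Mien": 20,
    "Tai-Kadai": 100,
    "Austroasiatic (minus Munda and Aslian)": 130,
}
elsewhere = 36          # "about three dozen at most" (assumed here to be 36)
world_languages = 7000  # approximate number of older (non-creole) languages

asia_total = sum(east_southeast_asia.values())
all_radically_analytic = asia_total + elsewhere

# Share of the world's languages under the two ways of counting discussed in the text.
share_all = all_radically_analytic / world_languages
share_without_sinosphere = elsewhere / world_languages

print(asia_total)                         # 260
print(all_radically_analytic)             # 296
print(f"{share_all:.1%}")                 # 4.2%
print(f"{share_without_sinosphere:.1%}")  # 0.5%
```

On either count, radically analytic languages make up only a small fraction of the total, which is the point the argument relies on.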
A feature manifested in a mere few hundred of the world’s 7,000 older (as opposed to creole) languages qualifies not as an ordinary result (‘Language X simply lost its inflections’) but as an unusual circumstance. This is even more the case if the feature manifests itself in solely a few dozen of 7,000, the result if we count the analyticity of the Sinosphere as an areal feature spread from Chinese. It is clear that radical analyticity is not a state that a language reaches easily and, in fact, everything we know about how languages transform over time makes it difficult to see how such a state could occur in stepwise fashion. However, the origin of radical analyticity in acquisition by adults is richly observed and thoroughly predictable. The scientific benefit of cordoning off creoles into this origin scenario while assuming an unspecified different one for other radically analytic languages is unclear. This bifurcated approach would be appropriate if there were evidence that large-scale acquisition of a language by adults was impossible before the emergence of the transatlantic slave trade in the fifteenth century. Obviously, however, there is not. Rather, we could treat creoles as revealing to us how other languages reached a state which, according to observable processes of stepwise grammar-internal evolution, is a mystery. In short, the common idea that a given language simply ‘lost its inflection’ is less coherent than it seems. Lack of stress on final syllables vastly undershoots what would be necessary for a language to reach a radically analytic state, and languages are not empirically recorded to undergo such a process short of extensive acquisition by adults. I will finally discuss two counterproposals to my reasoning.
10.6 On claims dissociating creolization from ossified acquisitional capacity
Some creole specialists have attempted a dissociation between even creolization and the effects of adult acquisition. Mufwene (2001), Aboh & Ansaldo (2007), and Aboh (2015) propose a theoretical economy of a different kind: that creole genesis is simply a matter of language mixture, with simplification playing no more significant a part in creoles’ birth than in how languages change elsewhere worldwide. Mufwene, for example, proposes (2001: 80–105) that there is no qualitative distinction between the emergences of standard English, African-American Vernacular English, and Gullah Creole English: all were the result of the mixture of features within the ‘ecology’ of the linguistic contexts in which they emerged, analogously to the mechanisms of population genetics. The idea that the association of creoles with pidginization has been a mistake has become familiar among linguists, to the point that I must spell out that my assumptions will not incorporate this proposal, often termed the ‘Feature Pool’ hypothesis. This hypothesis is motivated partly by a claim that while creoles’ analyticity, such as that of Sranan Creole English or Haitian Creole French, may seem to contrast with European languages’ morphology, in actuality English is only moderately inflected, spoken French is much less inflected than its written version suggests, and meanwhile the substrate languages of many creoles are the radically analytic ones mentioned above, such as Gbe and Yoruba.
The implication is that these creoles are analytic simply because their source languages were: as Mufwene specifies (2009: 386), ‘The extent of morphological complexity (in terms of range of distinctions) retained by a “contact language” largely reflects the morphological structures of the target language and the particular languages that it came in contact with’. However, an equal number of creoles are based on robustly inflected Iberian languages, and/or have robustly inflected substrate languages such as Bantu, West Atlantic, Nilo-Saharan, and even Austronesian languages, and yet are as analytic as Sranan and Haitian. Linguists supporting the Feature Pool hypothesis have yet to respond to such observations, such as that while Palenquero Creole Spanish was created by Kikongo speakers, such that both of the languages in the ‘pool’ were heavily inflected: (13)
Kikongo (Bentley 1887: 526) (8 = noun class 8 plural)
O ma-tadi ma-ma ma-mpembe ma-mpwena
8-stone 8- 8-white 8-big
i ma-u ma-ma tw-a-mw-ene.
8-that 8- we-them-see-
(14)
Spanish
Est-a-s piedr-a-s grande-s y blanc-a-s
-- stone-- big- and white--
son las que hemos visto.
.3 .. have.1 see..
‘These great white stones are those which we have seen.’
Palenquero is yet a highly analytic language. The facts are similar with all of the Portuguese-based creoles, as well as Nubi Creole Arabic and the Aboriginal English-based creoles of Australia. Chinook Jargon creolized as well, and despite its source languages all being richly inflected, the creole version was as analytic as Sranan and Haitian (Grant 1996). Adherents of the Feature Pool hypothesis have not responded to such observations, and it is difficult to see how their framework could accommodate them. In this presentation, therefore, I maintain on the basis of the argumentation I have presented that adult acquisition does play a decisive and diagnostic role in creole genesis. My aim is to extend this analysis to languages other than creoles.
10.7 On a phonological pathway to radical analyticity
Hyman (2004) proposes a grammar-internal diachronic pathway to radical analyticity. He reconstructs that what caused the difference between verbs in the GYN languages, usually monosyllabic or at most disyllabic, and the heavily affixed ones in Narrow Bantu languages was the development of a phonological template disallowing verbs of more than two syllables. I suggest that extensive adult acquisition is a preferable explanation for both the GYN verb’s lack of inflection and its phonotactics. For one, the process Hyman describes has been proposed, to my knowledge, nowhere else. Hyman’s account, in that light, is more descriptive than explanatory. That is: the literature on language change does not record it as a crosslinguistic commonplace that languages permitting richly multisyllabic words gradually take on a phonological ‘template’ limiting words to one or two syllables, with this treated as an ordinary phonological development alongside processes such as nasalization or resyllabification. I submit that an adult acquisition account has more explanatory power. Second, the templatic account contravenes the tendency for languages to resist letting phonological processes eliminate grammatical morphemes. Hyman’s account requires that speakers of a language ‘drifted’ into a disyllabic or monosyllabic restriction even on pain of eliminating grammatically crucial affixes and replacing them with free morphemes, despite linguists’ well-known findings that speakers resist phonological erosion when it threatens grammatical morphemes (cf. Guy 1991; Carstairs-McCarthy 2010). Counterproposals to some reported cases of this morphologically conditioned sound change (Hill 2014) have not disproven the tendency itself. Third, pidginization, specifically, explains the GYN situation as well as a templatic explanation does, and even better, in that it proceeds from an empirically observed phenomenon.
To wit, the reason words might become radically, as opposed to modestly, shorter in a language, to such a degree as to force a vast restructuring of the grammatical system, is the language’s transformation by nonnative acquirers who are less likely to master lengthier words (as well as grammatical features). To the extent that the GYN languages restrict their verbs to a maximum of two syllables, it is relevant that, as pidgin specialist Mühlhäusler (1997: 140) puts it, ‘There appears to be a tendency in most stable Pidgins, whatever their sub- and superstrata languages and whatever their jargon predecessors, to favour open syllables and words of the canonical shape CVCV.’
10.8 Conclusion
My goal has been to demonstrate the arguments for, and advantages of, assuming that radical analyticity traces solely to extensive adult acquisition. Under this analysis, radical analyticity sparks a search for sociohistorical factors that would entail such adult acquisition. The processes in question occurred before written history (otherwise, they would long have been readily apparent) and therefore the investigation of the relevant sociohistorical factors for the various clusters of radically analytic languages is still in progress (McWhorter 2016, in preparation). The advantage to my hypothesis is theoretical economy: rather than positing two pathways to radical analyticity (one of them mechanically incommensurate with what is known of how languages change), we could posit a single one. As a result, radical analyticity could be treated as a clue to social history otherwise difficult to reconstruct or even unrecoverable. We assume that the featherless bird has been plucked, not that it has lost its feathers by chance. We might approach the language devoid of bound inflection similarly, to the benefit of our models of diachronic change and language contact.
11 Different trajectories of morphological overspecification and irregularity under imperfect language learning
Aleksandrs Berdicevskis and Arturs Semenuks
11.1 Introduction
11.1.1 Why study complexity?
In the Introduction, Arkadiev & Gardani (Chapter 1, this volume: 5) list the four most important open questions in the study of morphological complexity. In our view, the first three questions become important and interesting only as a means to answer the fourth question, which could be reworded as ‘How is morphological complexity related to socioecological factors?’. The true value of this question is not even that it relates morphology and extralinguistic characteristics of the environment in which the language is spoken, but that it makes complexity more than a mere parameter of crosslinguistic variation. Complexity becomes a parameter involved in explanatory theories, giving us the possibility to use it in order to understand how language is structured. As was discussed in the Introduction, in these theories complexity is a dependent variable, while socioecological parameters are predictors. This means that if the theories are correct, we can better understand why linguistic structures are distributed across languages the way they are, how the processes of language change and social interaction are structured and work together, and how language is organized and functions in the brain. If not for this explanatory attempt, the first three questions from Arkadiev and Gardani’s list (Can we define morphological complexity? Can we find an understanding of morphological complexity which would be applicable to all languages and quantify this understanding? Can we compare and typologize languages in terms of morphological complexity?) would, in our view, be better described as brain teasers rather than research avenues. Brain teasers are not at all useless, but given how notoriously difficult it is to address these particular questions, it would hardly be possible to expect that the potential benefit of finding answers would outweigh the required effort.
Arkadiev and Gardani provide examples which suggest that Lithuanian nominal inflection is more morphologically complex than that of Turkish. Does this claim per se yield any new information compared to what can be learned from descriptive grammars of these two languages? The fourth question, however, changes everything. If we can explain which factors are likely to have contributed to Lithuanian being more complex than Turkish, then the game is worth the candle. That gives us the incentive to ponder what complexity is and to search for the means of operationalizing it. Our contribution to this volume should be read with this value system in mind.
Aleksandrs Berdicevskis and Arturs Semenuks, Different trajectories of morphological overspecification and irregularity under imperfect language learning. In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Aleksandrs Berdicevskis and Arturs Semenuks. DOI: 10.1093/oso/9780198861287.003.0011
11.1.2 What is complexity?
Having this incentive to deal with all the questions from Arkadiev and Gardani’s list, let us briefly outline what we mean by complexity in this chapter. As most would agree, complexity is a multi-faceted phenomenon, and a language can be complex in several different ways. This volume contains a variety of perspectives on and approaches to complexity; see Dahl (Chapter 13, this volume) for an overview. Trying to tackle all aspects of it simultaneously, however, is likely to hinder progress rather than aid it. In order to usefully limit the scope of this particular investigation, we will concentrate on two of the facets of complexity that are, in our view, most crucial: overspecification and irregularity. We define overspecification as overt and obligatory marking of a semantic distinction that is not necessary for communication, following McWhorter’s (2007: 21–8) understanding. The problem with this definition is that it is not at all obvious what is necessary for communication. McWhorter makes inferences about what is necessary by comparing the grammars of different languages. If many of the world’s languages have neither subject-verb agreement nor any apparent means to compensate for the lack of it, it seems reasonable to hypothesize that this feature is redundant and that languages that do possess it have overspecified grammars. A more direct way to find out what is necessary would be to run psycholinguistic experiments. MacWhinney et al. (1984), for instance, find that Italian speakers do use the subject-verb agreement markers when establishing semantic roles in a sentence. Note that this finding does not necessarily contradict the claim that agreement is an instance of overspecification. That a feature is useful does not mean it is necessary. Fortunately, in this chapter we will be dealing with an artificial language where it is obvious what is overspecification and what is not (see section 11.2).
Another facet of complexity we will discuss is irregularity (McWhorter 2007: 33–5). A linguistic system is irregular to the degree that it cannot be described by exceptionless deterministic rules. Conversely, a regular system can be described as predictable and consistent. Intuitively, it is usually quite obvious whether a linguistic system is regular or not. Systems, however, can be irregular to very different degrees. While in theory it is clear that the fewer rules are required to describe a system, the simpler these rules are and the fewer exceptions they have, the more regular the system is, in practice it is usually difficult to rank several irregular systems in order of their (ir)regularity, and even more difficult to quantify it. Again, for the artificial language in this chapter, this task is simpler than it would be for real languages. Other facets of complexity exist, but some of them are reducible either to overspecification or irregularity, while others are, in our opinion, less ubiquitous and salient. Importantly, overspecification and irregularity are not reducible to each other. It is easy to imagine a system which has little or no overspecification but is irregular, and it is equally easy to imagine a highly overspecified but fully regular system (though these are not that frequent in real languages). This understanding of complexity, however limited and simplified it is, enables us to test specific hypotheses about the typology and diachrony of morphological complexity.
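One way to make the notion of ‘exceptions to a deterministic rule’ concrete is sketched below. This is an illustrative assumption, not the measure used in this chapter: the function name `exception_rate` is hypothetical, and the toy forms (stems seg and fuv, markers o and i) are borrowed from the example language introduced in section 11.2. The idea is simply to count how many items deviate from the majority form of their context.

```python
from collections import Counter, defaultdict

def exception_rate(pairs):
    """Share of (context, form) pairs that deviate from the majority form of their context.

    0.0 means that one exceptionless rule per context describes the data;
    higher values mean more exceptions, i.e. more irregularity.
    """
    by_context = defaultdict(Counter)
    for context, form in pairs:
        by_context[context][form] += 1
    # For each context, everything that is not the most frequent form is an exception.
    exceptions = sum(sum(c.values()) - max(c.values()) for c in by_context.values())
    return exceptions / len(pairs)

# Toy data: which agreement marker appears with which noun stem.
regular = [("seg", "o"), ("seg", "o"), ("fuv", "i"), ("fuv", "i")]
irregular = [("seg", "o"), ("seg", "i"), ("fuv", "i"), ("fuv", "i")]

print(exception_rate(regular))    # 0.0
print(exception_rate(irregular))  # 0.25
```

A count of this kind says nothing about how simple the rules themselves are, so it captures only one of the ingredients of regularity mentioned above.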
11.1.3 How to study complexity?
Various hypotheses have been proposed to explain the distribution of morphological complexity among the languages of the world. The ones that arguably have the strongest empirical support and are most actively discussed in the literature are those that suggest the existence of a causal link between a large proportion of non-native speakers in the population and morphological simplification (Dahl 2004; Wray & Grace 2007; McWhorter 2007 and Chapter 10, this volume; Trudgill 2011; Dale & Lupyan 2012). The evidence in favour of this hypothesis comes mostly from typological surveys, though rigorous quantitative studies (e.g., Parkvall 2008; Szmrecsanyi & Kortmann 2009; Bentz & Winter 2013; Bentz et al. 2015) are a minority among them. Correlational studies of this kind are necessary, but not sufficient (Tily & Jaeger 2011; Nettle 2012), as other types of evidence are required to demonstrate and explain the causality (Ladd et al. 2015; Roberts 2018). Experimental approaches, in particular iterated artificial language learning (IALL) (Kirby et al. 2008), can be an efficient means to model the simplification and complexification processes. In a typical IALL setting, a constructed mini-language is learned by a participant within a limited amount of time, then this participant’s linguistic output is used as linguistic input (i.e., training data) for the next participant, and then the iteration is repeated. If the output of the participants in generation n differs from their input, then the participants in generation n+1 will learn a changed version of the language. This design enables us to observe language evolution in miniature, as the language changes while being transmitted over ‘generations’.
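The transmission loop just described can be sketched in a few lines. This is a minimal illustration under strong assumptions, not a model of the experiment: the human participant is replaced by a hypothetical `learn` function that simply drops the last character of a sentence with some probability, and the names `run_chain` and `error_rate` are invented for the example.

```python
import random

def learn(input_language, error_rate, rng):
    """Hypothetical stand-in for a participant.

    Reproduces each sentence of the input language, but with probability
    `error_rate` drops its final character (a crude placeholder for imperfect learning).
    """
    output = {}
    for meaning, sentence in input_language.items():
        if len(sentence) > 1 and rng.random() < error_rate:
            sentence = sentence[:-1]
        output[meaning] = sentence
    return output

def run_chain(generation0, n_generations, error_rate, seed=0):
    """Iterate learning: the output of generation n is the input of generation n+1."""
    rng = random.Random(seed)
    history = [generation0]
    for _ in range(n_generations):
        history.append(learn(history[-1], error_rate, rng))
    return history

# A one-sentence toy language; the meaning label and the form are illustrative.
generation0 = {("round", "plural", "fly"): "segl bo"}
history = run_chain(generation0, n_generations=10, error_rate=0.2)
print(len(history))  # 11: generation 0 plus ten learner generations
```

The essential design feature is the line `learn(history[-1], ...)`: whatever a generation gets wrong is passed on as if it were correct, so small individual deviations can accumulate into change at the level of the language.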
The IALL approach does have its limitations. The possibility to observe language change in the laboratory and to have full control over the environment comes at the price of naturalness. The artificial languages are by necessity small and relatively simple, and the learning usually takes less than one hour. Nonetheless, while the experimental results should be treated with due caution, they can be a valuable complement to the typological surveys. Suppose a typological study shows a correlation between the proportion of non-native speakers and absence of inflectional morphology, and suppose its data and methods are completely reliable and trustworthy. Even in this best-case scenario, we still do not know whether there really exists a causal link between non-native acquisition and simplification (though we have good reasons to hypothesize that). Moreover, we do not get an insight into how exactly adult acquisition facilitates simplification (if it does). An iterated learning experiment can serve as a means both to test the presence of the causal link and to identify a potential causal mechanism.
11.1.4 Why does complexity decrease?
Bentz & Winter (2013: 3–4) list three potential mechanisms of contact-induced case loss (which can be generalized to other instances of morphological simplification): imperfect acquisition by adult learners; the tendency of native speakers to reduce morphosyntactic complexity of their speech when talking to foreigners; the tendency of loan words to combine with more productive inflections, forcing the least productive ones out (Barðdal & Kulikov 2009). The first mechanism from this list seems to be mainstream in the typological, sociolinguistic, and evolutionary literature (Nettle 2012). Indeed, in the literature on language acquisition, there is a consensus that morphology is hard for non-native learners, and that concerns both production and perception, both tutored and untutored learners (DeKeyser 2005: 6–7). The main factor causing simplification then is presumed to consist in the differences between native (child) and non-native (adult) language acquisition. However, given this, another question arises: what aspects of these differences and what conditions are necessary to cause simplification? How deep into these differences do we have to delve in order to find a proper explanation? It is possible that deep differences in cognitive biases between children and adults have to be invoked, together with nuanced properties of social network structure or other cognitive processes besides learning. However, it is also possible that the answer lies on the surface: children can (usually) master a language perfectly, while adults (usually) cannot (Bley-Vroman 1989: 43–4), and that by itself is enough to provoke simplification processes. It seems safe to claim that imperfect learning is one of the driving forces behind simplification. Can we go further and assume it is the only driving force? While this hypothesis may be too simplistic, it is reasonable to start the search for explanations and mechanisms by testing it. In this chapter, we analyse the data from Berdicevskis & Semenuks (submitted), one of the largest-scale (in terms of the number and the length of transmission chains) IALL experiments so far that directly address linguistic complexity. In Berdicevskis & Semenuks (submitted) we showed that imperfect language learning by itself reduces overspecification. Here we focus on irregularity (see section 11.1.2) and show that it behaves differently from overspecification. We also investigate how the two facets of complexity interact with learnability of the language. In section 11.2, we summarize the methodology of Berdicevskis & Semenuks (submitted). In section 11.3 we describe the trajectory of overspecification, and in section 11.4, that of irregularity. In section 11.5, we draw on the existing knowledge about language acquisition to explain the observed differences. In section 11.6, we conclude.
11.2 Materials and methods
In order to investigate whether imperfect learning could lead to higher rates of morphological overspecification loss, we designed and ran an IALL experiment. As mentioned in section 11.1.3, the approach provides the opportunity to model language change in a controlled experimental setting. Each transmission chain contained 10 generations, and each generation consisted of a single participant. After the initial instructions, in the training stage of the experiment the participants learned an artificial language, that is, learned to match 16 ‘sentences’ to 16 stimulus pictures. After that, in the testing stage the participants first matched sentences with their appropriate pictures and then produced sentences that they considered to correspond to each of the individual pictures. The set of all of the sentences that they produced in the last part of the experiment was used as the learning input language for the next generation. The initial artificial languages that we generated as input for all of the generation 1 participants contained a redundant agreement marker that was not necessary in order to identify which picture corresponded to each sentence. In order to investigate whether imperfect learning could lead to the loss of morphological overspecification (in our case, the semantically redundant agreement marker), the amount of time given to the participants to learn the language was manipulated between three different types of transmission chains. In the normal condition, all participants received an amount of time that pilot experiments suggested was sufficient to fully learn the language; in the temporarily interrupted condition, the participants in generations 2–4 received less time to learn the languages; and in the permanently interrupted condition, all participants after the first generation received less time. A more detailed description is given in section 11.2.2. Before that, however, we want to note the apparent fact that the IALL approach lacks ecological validity due to a variety of both quantitative and qualitative differences between language learning in an experimental setting and in the real world. Because of that, the claims that one makes based only on IALL experiments need to be tempered. Taken as a piece of a larger picture, however, they provide important supporting evidence and new perspectives on the questions of interest. In the context of the current study, in particular, although we ultimately are interested in differences between native and non-native acquisition, we are not contrasting adult and child learners in our experiment. However, since we are interested in whether the difference between normal and imperfect learning by itself can be a sufficient cause for morphological simplification, we consider our model to possess the necessary external validity.
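The three conditions amount to a schedule that says, for each of the ten generations, whether the learner is a normal or an imperfect one. The sketch below encodes that schedule; the function names are illustrative, and the block counts (six training blocks for normal learners, three for imperfect learners) are the ones given in section 11.2.2.

```python
N_GENERATIONS = 10
NORMAL_BLOCKS, IMPERFECT_BLOCKS = 6, 3  # training blocks, as stated in section 11.2.2

def is_imperfect(condition, generation):
    """Return True if the participant in this generation gets reduced training."""
    if condition == "normal":
        return False
    if condition == "temporarily interrupted":
        return 2 <= generation <= 4
    if condition == "permanently interrupted":
        return generation >= 2
    raise ValueError(f"unknown condition: {condition}")

def training_blocks(condition):
    """Number of training blocks for generations 1..10 of one transmission chain."""
    return [
        IMPERFECT_BLOCKS if is_imperfect(condition, g) else NORMAL_BLOCKS
        for g in range(1, N_GENERATIONS + 1)
    ]

for condition in ("normal", "temporarily interrupted", "permanently interrupted"):
    print(condition, training_blocks(condition))
# normal [6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
# temporarily interrupted [6, 3, 3, 3, 6, 6, 6, 6, 6, 6]
# permanently interrupted [6, 3, 3, 3, 3, 3, 3, 3, 3, 3]
```

In all three conditions the first generation is a normal learner, so any differences between chains arise only from generation 2 onwards.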
11.2.1 Artificial language structure
Each of the sentences in the languages learned by the participants identified a picture. We will refer to the set of all pictures as the languages’ meaning space (see Figure 11.1). The meaning space had three dimensions, that is, three characteristics that each of the sixteen pictures could be uniquely identified by: the agent performing the action (round animal or square animal), the number of agents (one or many) and the action being performed (no action, falling apart, growing antlers or flying). The structure of the initial input languages (we will refer to them as generation 0 languages) is represented in Figure 11.1.¹ The sentences in the languages transparently mapped onto the meaning space: the noun stem identified the agent, the plural marker (or its absence) identified the number of agents, and the verb stem (or its absence) identified the action. Importantly, the agreement marker is semantically redundant, in the sense that its omission would not affect the identification of the correct picture in the meaning space: the picture is uniquely specified by the other three morphemes. Thus, in the generation 0 languages the agreement system is an instance of morphological overspecification. See Di Garbo (Chapter 8, this volume) for a detailed study of the changes of gender-agreement systems in a sample of real-world languages in relation to complexity.
¹ We used fifteen different isomorphic languages, as is common in IALL experiments. When reporting results, however, we orthographically map all the languages we have onto the example language in Figure 11.1: the first letter of the word for the round animal in the chain’s generation 0 language becomes s, the second letter becomes e, and so on. This procedure makes the comparisons between languages easier while preserving all the information about the changes.
OUP CORRECTED AUTOPAGE PROOFS – FINAL, 23/8/2020, SPi
Event          Number     Agent: round animal    Agent: square animal
none           singular   segN                   fuvN
none           plural     segN-lPL               fuvN-lPL
fall apart     singular   segN mV-oAGR           fuvN mV-iAGR
fall apart     plural     segN-lPL mV-oAGR       fuvN-lPL mV-iAGR
grow antlers   singular   segN rV-oAGR           fuvN rV-iAGR
grow antlers   plural     segN-lPL rV-oAGR       fuvN-lPL rV-iAGR
fly            singular   segN bV-oAGR           fuvN bV-iAGR
fly            plural     segN-lPL bV-oAGR       fuvN-lPL bV-iAGR
Figure 11.1. The meaning space of the experimental languages with the corresponding sentences from an example generation 0 language Notes: Subscript N denotes noun stems, V = verb stems, PL = plural marker, AGR = agreement marker. Morphemes are hyphenated and subscripts are provided for clarity’s sake. Glosses for the meanings of the sentences are provided in parentheses. Source: Adapted with permission from Berdicevskis & Semenuks (submitted).
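The structure of the example generation 0 language can be sketched in a few lines of Python. This is only an illustration of the mapping described in section 11.2.1, not the code used in the experiment; the names `NOUN`, `AGR`, `VERB`, `sentence`, and `strip_agreement` are hypothetical. The sketch builds the sixteen sentences and lets one check that the agreement marker is redundant: removing it still leaves sixteen distinct sentences.

```python
from itertools import product

# Morpheme inventory of the example generation 0 language (Figure 11.1).
NOUN = {"round": "seg", "square": "fuv"}
AGR = {"round": "o", "square": "i"}
VERB = {"none": None, "fall apart": "m", "grow antlers": "r", "fly": "b"}
PLURAL = "l"


def sentence(agent, number, event):
    """Build the sentence for one picture in the meaning space."""
    noun = NOUN[agent] + (PLURAL if number == "pl" else "")
    stem = VERB[event]
    if stem is None:  # no action: the sentence consists of the noun only
        return noun
    return f"{noun} {stem}{AGR[agent]}"


# 2 agents x 2 numbers x 4 events = 16 meanings.
MEANINGS = list(product(NOUN, ("sg", "pl"), VERB))
LANGUAGE = {meaning: sentence(*meaning) for meaning in MEANINGS}


def strip_agreement(s):
    """Drop the final (agreement) symbol of the verb, if there is a verb."""
    words = s.split()
    return s if len(words) == 1 else f"{words[0]} {words[1][:-1]}"


# Even without agreement, every picture keeps a unique sentence.
distinct_without_agreement = len({strip_agreement(s) for s in LANGUAGE.values()})
```

Here `distinct_without_agreement` equals the size of the meaning space, which is what makes the agreement marker an instance of overspecification.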
11.2.2 Experimental procedure
After the introductory instructions, the participants learned the language in the training stage of the experiment. This stage consisted of a number of training blocks interspersed with interim test blocks. In the training blocks, the participants saw all of the pictures from the meaning space, presented in random order and each accompanied by the corresponding sentence in the participant’s input language. Each picture-sentence pair remained on the screen for four seconds, after which the next pair appeared. In the interim test blocks, the participants were shown, one by one, eight pictures randomly selected from the meaning space and were asked to type in the corresponding sentence for each of them. The instructions preceding the training blocks prohibited the participants from taking any notes during the experiment.
In order to model the difference between normal and imperfect learning, we manipulated the number of training and interim test blocks that the participants received. Participants in normal learner generations received six training blocks, whereas participants in imperfect learner generations received three. In order to investigate how the number of imperfect learners in a population affects the tendency to eliminate morphological overspecification from the language spoken by its members, we compared the development of generation 0 languages in transmission chains in three different conditions: normal, temporarily interrupted, and permanently interrupted. Figure 11.2 illustrates the differences in the numbers of normal and imperfect learner generations between the conditions.
(a) Normal transmission:                  → L → L → L → L → L → L → L → L → L → L
(b) Temporarily interrupted transmission: → L → S → S → S → L → L → L → L → L → L
(c) Permanently interrupted transmission: → L → S → S → S → S → S → S → S → S → S
Figure 11.2. A schematic representation of the chains in the normal (a), temporarily interrupted (b), and permanently interrupted (c) conditions Notes: L = generations with long (full) learning time, S = generations with reduced learning time (imperfect learners). Arrows denote languages transmitted between generations. The very first arrows denote pre-generated input languages for the first generation learners. Source: Reproduced with permission from Berdicevskis & Semenuks (submitted).
Since the experiment contained 15 generation 0 languages, each of which was used once in each of the three experimental conditions, and each of the transmission chains required 10 participants, we recruited and analysed the data from a total of 450 participants (140 female, 310 male, mean age = 30.5, SD = 9.2). The participants were recruited online and took part in the experiment on a web page created using the jsPsych JavaScript library (de Leeuw 2014). Unbeknownst to the participants, the web page assigned them to a new generation in a randomly chosen transmission chain before the start of the experiment. The experiment was conducted in Russian, and all of the participants self-reported speaking Russian natively and being at least 16 years old. Because Russian has a salient gender agreement system of its own, we could be sure that the participants’ native language would not by itself push them to shed agreement in the experiment.
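The design arithmetic can be checked with a short sketch. The numbers of languages, generations, and training blocks come from the text; the position of the reduced-learning generations in each chain is our reading of the Figure 11.2 schematic and should be treated as an assumption, as should all identifier names.

```python
LANGUAGES = 15        # generation 0 languages, each used once per condition
TRAINING_BLOCKS = {"L": 6, "S": 3}  # long vs reduced learning time

# Assumed reading of Figure 11.2: one letter per generation in a chain.
SCHEDULES = {
    "normal": "L" * 10,
    "temporarily interrupted": "L" + "S" * 3 + "L" * 6,
    "permanently interrupted": "L" + "S" * 9,
}


def total_participants():
    """One participant per generation, per chain, per language."""
    return LANGUAGES * sum(len(chain) for chain in SCHEDULES.values())


def training_blocks(condition):
    """Number of training blocks received by each generation of a chain."""
    return [TRAINING_BLOCKS[g] for g in SCHEDULES[condition]]
```

With three conditions of ten generations each, `total_participants()` reproduces the 450 participants reported above.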
11.3 The trajectory of overspecification
The normal transmission chains preserved morphological overspecification to a much greater extent than the chains in either the temporarily or the permanently interrupted transmission condition, thus supporting the hypothesis that a larger share of imperfect learners in a population leads to the loss of morphological overspecification in the language of that population. In this section, we present a condensed description of some of the results from Berdicevskis & Semenuks (submitted), complemented with some additional observations.
11.3.1 Qualitative analyses
The qualitative analysis of the final languages revealed a general trend for the structure of the languages to deteriorate. Several factors could have led to this, most likely the underestimated difficulty of learning the language even with six training blocks and the absence of true communicative pressures in the experiment. However, this deterioration of structure was not equally likely to affect all aspects of the language, nor equally likely to affect chains in all three conditions. The agreement system was eroded by the participants much more often than the other morphological aspects of the system, and this erosion of structure was less frequent in the chains with normal transmission. Nonetheless, it is important to keep in mind that learning was not entirely perfect in the normal condition either. Thus, when speaking about imperfect learning we will mean the degree of imperfect learning rather than its presence or absence. The agreement system was fully preserved in just three languages, two of which were generated in normal condition chains and one in a temporarily interrupted condition chain, and it was almost fully preserved in three other languages, all of which belonged to normal condition chains. An example of a final (generation 10) language without any damage to the agreement system can be seen in Table 11.1.
As one can see, the last generation language fully preserves the generation 0 agreement system: -o is consistently used to mark agreement with seg, and -i with fuv. The only deviation from the generation 0 language structure is the loss of the verb root in one of the sentences of the language (gen. 0 segl ro => gen. 10 segl o); however, this change still conserved the correct agreement suffix. The system disappeared, in turn, in fourteen languages, three of which belonged to the normal condition, five to the temporarily interrupted condition, and six to the permanently interrupted condition. An example of a generation 10 language that has fully lost the agreement system can be seen in Table 11.2. As Table 11.2 shows, the generation 10 language in this chain has fully lost the -i agreement pattern used for fuv in the generation 0 language, and now uses -o in all sentences; this -o is now more reasonably analysed as a part of the verb stems than as an agreement marker. One can also note that one of the noun stems changed from fuv to fug, likely under the influence of seg.
Table 11.1. An example of a final language with a fully preserved agreement system
Event          Number   Gen 0 round animal   Gen 0 square animal   Gen 10 round animal   Gen 10 square animal
(none)         sg       seg                  fuv                   seg                   fuv
(none)         pl       segl                 fuvl                  segl                  fuvl
fall apart     sg       seg mo               fuv mi                seg mo                fuv mi
fall apart     pl       segl mo              fuvl mi               segl mo               fuvl mi
grow antlers   sg       seg ro               fuv ri                seg ro                fuv ri
grow antlers   pl       segl ro              fuvl ri               segl o                fuvl ri
fly            sg       seg bo               fuv bi                seg bo                fuv bi
fly            pl       segl bo              fuvl bi               segl bo               fuvl bi
Table 11.2. An example of a language with a fully lost agreement system
Event          Number   Gen 0 round animal   Gen 0 square animal   Gen 10 round animal   Gen 10 square animal
(none)         sg       seg                  fuv                   seg                   fug
(none)         pl       segl                 fuvl                  segl                  fugl
fall apart     sg       seg mo               fuv mi                seg mo                fug mo
fall apart     pl       segl mo              fuvl mi               segl mo               fugl mo
grow antlers   sg       seg ro               fuv ri                seg ro                fug ro
grow antlers   pl       segl ro              fuvl ri               segl ro               fugl ro
fly            sg       seg bo               fuv bi                seg bo                fug bo
fly            pl       segl bo              fuvl bi               segl bo               fugl bo
In the other chains the initial agreement system substantially deteriorated, but did leave some remnants in generation 10 languages, which made it difficult to precisely characterize the level of system erosion in a qualitative yet objective way. Nevertheless, taking the above findings together, we can see that chains including imperfect learner generations were more likely to completely shed the agreement system and less likely to preserve it.
11.3.2 Quantitative analyses
Here we focus on a specific quantitative analysis which operationalizes morphological overspecification in our artificial languages as the expressibility of the only redundant feature, viz. verbal agreement. Expressibility is defined as the proportion of pairs of sentences whose meanings differ in (and only in) the agent and whose verbs have different surface forms. The concept can be easily understood by means of Table 11.2. For every language, we ignore the first two rows (as they have no verbal meanings) and then compare pairwise the two cells in each of the other six rows: are the verbs the same or different? In generation 0, the verbs are always different, and the expressibility of agreement equals 1. In generation 10, the verbs are always the same, and the expressibility of agreement equals 0. As Figure 11.3 shows, although the expressibility of agreement declined in all conditions, it declined to a lesser extent in the normal transmission chains. This pattern is in accord with the qualitative findings reported above. As we mentioned in section 11.3.1, learning is imperfect in all three conditions, but to a lesser degree in the normal one. Taken together, the results of the experiment provide support for the hypothesis that a large share of non-native learners in the population of speakers of a language could lead to the simplification of the morphological structure of that language. More specifically, the study shows that imperfect learning of a language could lead to the loss of morphological overspecification.
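The definition of expressibility can be made concrete with a minimal sketch. It assumes that each language is given as two parallel lists of sentences (round animal and square animal) for the six verbal meanings, in the row order of Table 11.2; the function names are hypothetical, and this is not the analysis script of the original study.

```python
def verb(sentence):
    """Return the verb of a 'noun verb' sentence, or None if there is none."""
    words = sentence.split()
    return words[1] if len(words) > 1 else None


def expressibility(round_sentences, square_sentences):
    """Proportion of agent-only pairs whose verbs differ in surface form."""
    pairs = list(zip(round_sentences, square_sentences))
    differing = sum(verb(r) != verb(s) for r, s in pairs)
    return differing / len(pairs)


# The six verbal rows of Table 11.2.
gen0_round = ["seg mo", "segl mo", "seg ro", "segl ro", "seg bo", "segl bo"]
gen0_square = ["fuv mi", "fuvl mi", "fuv ri", "fuvl ri", "fuv bi", "fuvl bi"]
gen10_round = ["seg mo", "segl mo", "seg ro", "segl ro", "seg bo", "segl bo"]
gen10_square = ["fug mo", "fugl mo", "fug ro", "fugl ro", "fug bo", "fugl bo"]

gen0_score = expressibility(gen0_round, gen0_square)
gen10_score = expressibility(gen10_round, gen10_square)
```

For these data `gen0_score` is 1.0 and `gen10_score` is 0.0, matching the values given in the text. Note that the change of the noun stem from fuv to fug does not affect the score, since only the verbs are compared.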
Figure 11.3. Change of the overspecification of agreement, as measured by expressibility, over time Note: Shaded regions denote the standard error.
[Line plot; x-axis: generation (0 to 10); y-axis: overspecification (0.00 to 1.00); one line per transmission condition: normal, temporarily interrupted, permanently interrupted.]
11.4 The trajectory of irregularity
The initial languages used in the study described above are perfectly regular. While the rule ‘change the verb form depending on the agent’ is redundant, it is still a rule, deterministic and exceptionless, as are the other properties of the initial languages. Irregularity in this setup is equal to zero and thus cannot decrease. At first glance, then, this setup cannot be used to test any hypotheses about the potential role of imperfect learning in regularization. Manual inspection of the evolving languages, however, quickly reveals noticeable changes in irregularity. Because irregularity starts at zero, these changes always begin with an increase, but some transmission chains show less trivial patterns later. In this section, we present and analyse these patterns. Irregularity emerges because participants fail to learn or to apply a certain rule. Most often, this is the agreement rule, and we will focus solely on the irregularity of agreement (as we did with overspecification in section 11.3).
11.4.1 Probability matching
While the participants often fail to learn the rule that governs the distribution of the two agreement markers in the initial languages, they seldom ignore the fact that there are two different markers. When a deterministic distribution rule is not available to learners, they often resort to probability matching, that is, they reproduce the variants with approximately the same relative frequency as in the input (Hudson Kam & Newport 2009; Smith & Wonnacott 2010: 447, figure 1), but without a clear consistent rule for when to use which variant. Figure 11.4 demonstrates that our participants do the same with the agreement markers. In all three conditions, the mean relative frequency of the round-animal marker does not deviate much from the initial 50% (and, consequently, the same is true for the second marker). The narrow standard-error regions show that the relative frequencies in the individual chains do not deviate much from 50% either (i.e., it is not the case that the mean of 50% results from half the chains using one marker in 100% of cases and the other half in 0% of cases).
Figure 11.4. Relative frequency of the agreement marker which denoted the round animal in the initial language of the chain Note: Shaded regions denote the standard error.
[Line plot; x-axis: generation (0 to 10); y-axis: proportion of the round-animal agreement marker (0.00 to 1.00); one line per transmission condition: normal, temporarily interrupted, permanently interrupted.]
Out of our forty-five chains, fourteen lose agreement completely (see section 11.3.1). Some of those completely replace one marker by another, as in the language in Table 11.2, but this happens in only three chains; in the other chains both markers get reanalysed as parts of the verb stems. The most common scenario is represented in Table 11.3. In the final language, all three verbs have only one form. Two (m- and b-) preserve the original round-animal form with the -o ending, and one (r-) preserves the square-animal form (-i), thus making the relative frequencies of the markers 2/3 and 1/3, respectively. Out of the fourteen agreement-losing chains, nine arrive at this frequency distribution at the end (counting both the cases where it is the round-animal marker that has a frequency of 2/3 and those where it is the square-animal one). Analysis of all the individual chains confirms that while a few chains do replace one marker by another completely or almost completely, most keep the proportion not too far from 50% throughout all the generations.
Table 11.3. A language with a fully lost agreement system
Event          Number   Gen 0 round animal   Gen 0 square animal   Gen 10 round animal   Gen 10 square animal
(none)         sg       seg                  fuv                   seg                   fuv
(none)         pl       segl                 fuvl                  segl                  fuvl
fall apart     sg       seg mo               fuv mi                seg mo                seg mo
fall apart     pl       segl mo              fuvl mi               segl mo               fuvl mo
grow antlers   sg       seg ro               fuv ri                seg ri                fuv ri
grow antlers   pl       segl ro              fuvl ri               segl ri               fuvl ri
fly            sg       seg bo               fuv bi                seg bo                fuv bo
fly            pl       segl bo              fuvl bi               segl bo               fuvl bo
Note: seg instead of expected fuv in the third row is not a typo.
It should be noted that in some chains, verb endings different from the original two emerge. If we calculate the denominator of the ratio as the number of all verb endings present and not just the original two, the general picture does not change.
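The marker frequencies discussed in this section can be sketched as follows. The code assumes, as the text does for most languages, that the agreement marker (or its remnant) is the last symbol of the verb; the function name is hypothetical. The denominator is the number of all verb forms, so any new endings that emerge are counted too.

```python
from collections import Counter
from fractions import Fraction


def marker_frequencies(verbs):
    """Relative frequency of each verb-final symbol across the verb forms."""
    counts = Counter(v[-1] for v in verbs)
    total = sum(counts.values())
    return {marker: Fraction(n, total) for marker, n in counts.items()}


# The twelve verb forms of the generation 10 language in Table 11.3.
table_11_3_final = [
    "mo", "mo", "ri", "ri", "bo", "bo",   # round animal
    "mo", "mo", "ri", "ri", "bo", "bo",   # square animal
]
frequencies = marker_frequencies(table_11_3_final)
```

For Table 11.3 this gives 2/3 for -o and 1/3 for -i, the distribution that nine of the fourteen agreement-losing chains arrive at.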
11.4.2 Irregularity and overspecification
While the agreement markers continue to be present as elements of form, they lose their connection to the meaning (without being replaced by another element). In order to measure this trend, we pair up the twelve verb forms in the same way as we did when measuring expressibility (see section 11.3.2) and compare the last symbols in the verbs of every pair (manual analysis shows that if agreement is expressed, it is almost always expressed by the last symbol). For every pair of symbols, we calculate how often it occurs (out of six possible cases). Pairs where the symbols are the same get lumped together, regardless of what the symbols actually are. To quantify irregularity, we calculate the Shannon entropy of the probability distribution and normalize it by the maximal entropy, see Equation (1).
(1) Irregularity = H(SC)/log₂(6),
where SC is the probability distribution of patterns of agreement expression.
This measure is similar to Cuskley et al.’s (2015: 215) Sj measure, used to measure the variability of sub-rules a participant uses in the formation of irregular past tenses. Consider some examples. In the final language in Table 11.1 there is only one pattern of agreement marking: {o, i}, and the same is true for the final language in Table 11.2 (the same-symbol type). Both languages would get an irregularity score of zero. So would the final language in Table 11.3: while there are two different pairs, {o, o} and {i, i}, they both fall under the same-symbol pattern. The language in Table 11.4, however, is less regular. The strategy here is almost the same as in Table 11.3, with two exceptions: the verb r- preserved the agent marking in the singular, the verb b- in the plural. Hence, there are two patterns: the same-symbol pattern (four cases) and {o, i} (two cases). The language gets an irregularity score of 0.36.
Table 11.4. A language with an irregular distribution of the agreement markers
Event          Number   Gen 0 round animal   Gen 0 square animal   Gen 10 round animal   Gen 10 square animal
(none)         sg       seg                  fuv                   seg                   fuv
(none)         pl       segl                 fuvl                  segl                  fuvl
fall apart     sg       seg mo               fuv mi                seg mi                fuv mi
fall apart     pl       segl mo              fuvl mi               segl mi               fuvl mi
grow antlers   sg       seg ro               fuv ri                seg ri*               fuv ro*
grow antlers   pl       segl ro              fuvl ri               segl ro               fuvl ro
fly            sg       seg bo               fuv bi                seg bo                fuv bo
fly            pl       segl bo              fuvl bi               segl bi*              fuvl bo*
Note: Cases where agreement is preserved (bold in the original) are marked with an asterisk.
Irregularity depends on the number of patterns (the more patterns, the higher the irregularity) and the distribution of their probabilities (irregularity is highest if all the patterns are equiprobable). Thus, the least irregular language (apart from the fully regular one, which scores 0) would have two patterns, one of which occurs only once, and would score 0.25. The most irregular language would have six equiprobable patterns and score 1. However, this never happens in our data: the highest observed score is 0.74 (it can be achieved, e.g., by having four patterns, two that occur twice and two that occur once).
As can be seen in Figure 11.5, unlike overspecification, irregularity in all three conditions increases rather steeply at first, then starts oscillating around what seems to be a plateau. In the permanently interrupted condition, there is a rather steep decrease during the last two generations; in the other two conditions the peak of irregularity is also closer to the middle (i.e., there is a slight decrease towards the end), but the difference is small. It is, however, interesting to take a look at the individual trajectories of irregularity and compare them with those of overspecification. We do that in Figure 11.6. In most chains, the initial changes in overspecification and irregularity go in exactly opposite directions, that is, the two measures seem to be almost perfectly negatively correlated. Sometimes this trend continues through all the generations (see, e.g., chains 2 and 13). If, however, overspecification decreases below 0.5, the measures become positively correlated and subsequently change almost in unison (see, e.g., chains 22 and 30).
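The irregularity measure of Equation (1) can be sketched as follows. The sketch takes the six round-animal verbs and the six square-animal verbs in matching order; pairs with different final symbols are treated as unordered, following the brace notation used in the text, which is an assumption on our part, as are the function names.

```python
from collections import Counter
from math import log2


def agreement_patterns(round_verbs, square_verbs):
    """Count the patterns formed by the last symbols of each verb pair."""
    patterns = []
    for r, s in zip(round_verbs, square_verbs):
        a, b = r[-1], s[-1]
        # All same-symbol pairs are lumped into a single pattern.
        patterns.append("same" if a == b else frozenset((a, b)))
    return Counter(patterns)


def irregularity(round_verbs, square_verbs):
    """Shannon entropy of the pattern distribution, normalized by log2(6)."""
    counts = agreement_patterns(round_verbs, square_verbs)
    n = sum(counts.values())  # six pairs in the languages discussed here
    entropy = sum((c / n) * log2(n / c) for c in counts.values())
    return entropy / log2(n)


# Generation 10 verbs of Table 11.4 (rows: m- sg, m- pl, r- sg, r- pl, b- sg, b- pl).
t4_round = ["mi", "mi", "ri", "ro", "bo", "bi"]
t4_square = ["mi", "mi", "ro", "ro", "bo", "bo"]

# Generation 10 verbs of Table 11.3: two different pairs, one pattern.
t3_round = ["mo", "mo", "ri", "ri", "bo", "bo"]
t3_square = ["mo", "mo", "ri", "ri", "bo", "bo"]

score_t4 = round(irregularity(t4_round, t4_square), 2)
score_t3 = irregularity(t3_round, t3_square)
```

Under these assumptions the Table 11.4 language scores 0.36 and the Table 11.3 language scores 0, as stated in the text.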
Figure 11.5. Change of irregularity, as measured by Shannon entropy, over generations Note: Shaded regions denote the standard error.
[Line plot; x-axis: generation (0 to 10); y-axis: irregularity (0.00 to 1.00); one line per transmission condition: normal, temporarily interrupted, permanently interrupted.]
This behaviour largely follows from the definition of the measures. There are two states where the system is fully regular: complete overspecification and complete absence of overspecification. If the system is closer to the first state (overspecification > 0.5), almost any mutation changes the two measures in different directions (if agreement is lost in one case out of six, overspecification decreases, but irregularity increases). If it is closer to the second state (overspecification < 0.5), the measures usually change in the same direction (e.g., if the two remnants of agreement in the language in Table 11.4 disappear, both overspecification and irregularity go down to zero).
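The dependence between the two measures can be illustrated with a simplified sketch. It assumes a language in which k of the six verb pairs still express agreement with one shared pattern while the remaining pairs fall under the same-symbol pattern; real chains can contain more patterns, so this is an idealization, and the names are hypothetical.

```python
from math import log2

PAIRS = 6


def measures(k):
    """Overspecification and irregularity when k of six pairs mark agreement."""
    overspecification = k / PAIRS
    # Two patterns at most: 'agreement expressed' and 'same symbol'.
    probs = [p for p in (k / PAIRS, 1 - k / PAIRS) if p > 0]
    irregularity = sum(p * log2(1 / p) for p in probs) / log2(PAIRS)
    return overspecification, irregularity


# Walk from full agreement (k = 6) to no agreement (k = 0).
trajectory = [(k, *measures(k)) for k in range(PAIRS, -1, -1)]
```

In this idealized trajectory irregularity rises while overspecification falls from 1 to 0.5, peaks at 0.5, and then falls together with overspecification, which is the pattern described above.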
Figure 11.6. Change of overspecification (solid line) and irregularity (dashed line) in verbal agreement over generations in individual chains: (a) normal condition; (b) temporarily interrupted condition; (c) permanently interrupted condition
[Forty-five line plots, one per chain, in three groups of fifteen (chains 1 to 15, 16 to 30, and 31 to 45); x-axis: generation (0 to 10); y-axis: value of the measure (0.0 to 1.0).]
11.4.3 Irregularity and learnability
For every generation (apart from the final ones) we estimate how learnable its language is. The measure of learnability is transmission fidelity, which is obtained by comparing the language of generation n with the language of generation n+1: we calculate the normalized pairwise Levenshtein distance between the sentences with the same meanings and subtract it from 1. We found that, unlike in most other IALL experiments, learnability clearly decreases over time. If, however, we look at learnability as a function of overspecification, we find that it follows a U-curve: it is high when overspecification is 1 or 0 (slightly higher at 0), but noticeably lower at other values. An obvious reason is that at intermediate overspecification values the system is almost always irregular. In Figure 11.7, we represent learnability as a function of irregularity (averaging across chains and conditions, but keeping normal and imperfect learners separate). As irregularity increases, learnability indeed decreases, and the decrease is steeper for imperfect learners. Thus, to go from a regular overspecified state (learnable) to a regular non-overspecified state (more learnable), the system has to pass through an irregular stage (less learnable). The irregular stage can only be avoided if the total loss of overspecification occurs within one generation, which almost never happens.
Figure 11.7. Learnability as a function of irregularity
[Two panels: normal learners and imperfect learners; x-axis: irregularity (observed values 0, 0.25, 0.36, 0.39, 0.48, 0.56, 0.61, 0.69, 0.74); y-axis: learnability (0.00 to 1.00).]
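Transmission fidelity, as used in section 11.4.3, can be sketched as follows. The text does not say how the Levenshtein distance is normalized or aggregated; the sketch assumes division by the length of the longer sentence and a plain mean over meanings, and both choices, like the function names, are our assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


def transmission_fidelity(language_n, language_next):
    """1 minus the mean normalized distance between same-meaning sentences."""
    distances = []
    for meaning, sentence in language_n.items():
        other = language_next[meaning]
        longest = max(len(sentence), len(other)) or 1  # guard against empty strings
        distances.append(levenshtein(sentence, other) / longest)
    return 1 - sum(distances) / len(distances)


# Toy example with two meanings: one sentence changes its last symbol.
parent = {"round-sg-fall": "seg mo", "square-sg-fall": "fuv mi"}
child = {"round-sg-fall": "seg mo", "square-sg-fall": "fuv mo"}
fidelity = transmission_fidelity(parent, child)
```

A language that is reproduced without any change has a fidelity of 1; every edit lowers the score in proportion to sentence length.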
11.5 Discussion
That imperfect learning eliminates morphological overspecification is not surprising and fits well with the predictions of the theories discussed in section 11.1.2. It is also in accord with the knowledge accumulated by acquisition studies. While much is still unknown about how exactly adult learners differ from child learners and why, it seems safe to claim that inflectional morphology is difficult for non-native speakers and often absent in their speech (DeKeyser 2005: 6).
The observation that irregularity increases under imperfect learning, by contrast, seems to be at variance with the theories discussed in section 11.1.2. It is, however, not unexpected from the language acquisition and language change perspective. Loporcaro (Chapter 6, this volume) compares Wolof noun morphology to that of other Atlantic languages (a subfamily of the Niger-Congo languages). In Wolof, the system of marking noun classes through initial consonant mutations typical of other Atlantic languages has largely eroded and has been restructured: now noun class is marked on function words that modify the noun, for example articles and demonstratives. However, certain nouns still show remnants of the previous system and can be optionally marked for number through initial consonant alternation. Thus, we see a pattern reminiscent of our experimental results: certain systems of noun classification disappear (thus decreasing overspecification), but leave irregular atavisms (thus increasing irregularity).
Clahsen et al. (2010) review evidence in favour of the claim that non-native speakers are less sensitive to morphological structure. They underuse morphological decomposition and rely more on memorization and lexical storage, even of regularly inflected forms. This effect has also been found in highly proficient non-native speakers who approach native-like performance (Neubauer & Clahsen 2009). While memorization of separate forms per se does not imply irregularity, it clearly creates a friendlier environment for its emergence than does rule-driven form generation. In this context, it is interesting to look at the finding that non-native speakers of English produce significantly more irregular past-tense forms in a Wug-task than native speakers (Cuskley et al. 2015). Cuskley et al., however, argue that the irregularities are still rule-driven and follow the patterns that exist in the set of real English irregular verbs. They hypothesize that the effect is explained by the peculiarities of the non-native input, namely the higher relative frequency of the irregular verbs and their higher salience in explicit instruction. The controversial conclusion of Cuskley et al. (2015) is that despite the seeming preference for irregularity, non-native speakers actually prefer rules over exceptions and simplicity over complexity.
Our data lend modest support to Clahsen’s memorization vs. generation account. The elimination of agreement implies that our learners fail to carry out a full-fledged morphological analysis of their input. Agreement is affected more than other features, probably because it is redundant and based on a long-distance relationship (verb and agent), and both these factors can inhibit learning (DeKeyser 2005). It is, however, difficult to say whether the normal learners preserve more agreement because they are more sensitive to the morphological structure or because they have more time to memorize the forms. We can only claim that imperfect learning inhibits the acquisition of rule-based distributions, but cannot say how exactly this happens.
The use of complex unproductive rules instead of simple productive ones is one source of irregularity. Another is the use of probabilistic rules instead of deterministic ones. Inconsistent probabilistic usage is typical for non-native speakers (Johnson et al. 1996). Hudson Kam & Newport (2005, 2009) show in a series of ALL (but not I(terated)ALL) experiments that if grammatical forms in the linguistic input are used probabilistically, then adult learners usually reproduce the inconsistencies, regularizing only the most infrequent ones in the most complex cases. In one experiment, when adults did impose deterministic rules, those were mostly ‘rules of omission which served to remove structure from the language’ (Hudson & Newport 1999: 276). This means that, just like our participants, those learners decreased overspecification but not irregularity. Smith & Wonnacott (2010), however, show that regularization can occur if weak individual biases of adult learners are amplified by iterated transmission. In an IALL study, they show that transmission chains, but not isolated learners, eliminate unpredictable variation (see also Smith et al. 2017 on how language use affects bias amplification; Samara et al. 2017 on how sociolinguistic conditioning affects language use by adults and children). Although our chains are twice as long as Smith & Wonnacott’s (2010) five-generation chains, we do not see any reliable overall decrease in irregularity (see Figure 11.5). An important difference between the two studies, however, is that Smith & Wonnacott’s participants received probabilistic, or truly unpredictable, input. They saw several signals for exactly the same meaning, and those signals could be different. In our study, the input is, strictly speaking, deterministic, since every meaning is represented by one sentence.
Thus, while it is possible that, for instance, ‘fall apart’ will sometimes be denoted by mo and sometimes by mi, the variation will not be fully unpredictable: it will always be possible to condition it on something (e.g., agent or number).² This conditioning is likely to protect variation from elimination. Note that the conditioned variation is still difficult to learn, and participants seldom manage to reproduce faithfully the conditioning ‘invented’ by a previous generation. Instead of eliminating it completely, they replace it with their own conditioning. It can be argued that the participants treat the input as at least partly probabilistic (failing to learn the rule behind the distribution of markers, they nonetheless match the frequencies of markers). The input, however, is not complex enough to trigger regularization as in Hudson Kam & Newport (2005, 2009). Another reason for the difference from Smith & Wonnacott’s results may be that our languages are more complex and it is more difficult for learners to converge on a regular pattern. In addition, the probability of random mutations that can make a language deviate from the regular state is higher in our case (note that Smith & Wonnacott filtered away certain random mutations that they deemed irrelevant before passing the input on to participants). It should also be noted that while there is a clear difference between the trajectory of overspecification in the normal condition and that in the interrupted conditions, no such difference is found for irregularity. It may be that overspecification is more sensitive to the degree of imperfect learning. The effect of irregularity on learnability, however, seems to be different for normal and imperfect learners: as irregularity increases, learnability decreases more steeply for the latter.
² We are grateful to Kenny Smith for bringing this difference to our attention.
11.6 Conclusion
We show that during morphological simplification the trajectories of overspecification and irregularity need not be the same and, moreover, are likely to be different. Imperfect learning prevents speakers from acquiring certain morphological rules (especially those that are redundant or particularly difficult) and thus causes a decrease in overspecification but an increase in irregularity. Interestingly, the degree of imperfect learning seems to affect how much overspecification decreases, but not how much irregularity increases. The increase in irregularity, in turn, makes languages less learnable (this effect is stronger for imperfect learners than for normal ones), unless all overspecification is eliminated and the system reaches the non-overspecified regular state. Our chains seldom reach this optimum, probably because the regularization bias is relatively weak in our participants and the experimental setting suppresses it.
Acknowledgements
The experiment was funded by the Faculty of Humanities, Social Sciences and Education at UiT, The Arctic University of Norway. AB was supported by the Norwegian Research Council grant ‘Birds and Beasts’ (222506). We are also grateful to the popular-science portal ‘Elementy’ and its editor-in-chief Elena Martynova for advertising the experiment, to Tanja Russita for designing the Epsilon fauna, to Kenny Smith and Peeter Tinits for commenting on an earlier version of the chapter, and to Peter Arkadiev and Francesco Gardani for inviting us to contribute to this volume.
12 Where is morphological complexity? Marianne Mithun
12.1 Introduction
As linguists, we love discovering order in chaos. Grammatical complexity provides us with puzzles to play with. An assumption underlying some theoretical models of language has been that the most elegant formal description naturally matches speaker knowledge. But closer attention to what speakers actually do raises the question of whether complexity is in fact the same for the analyst, the speaker, and the language learner. An examination of speech in languages displaying different kinds of morphological complexity, spoken in language contact situations, suggests that it is not.
12.2 What is complexity?
Dahl (2004, 2017) provides useful surveys of approaches to complexity, distinguishing first agent-related or relative complexity from objective or absolute complexity. Agent-related complexity refers to the effort a generalized outsider needs to become acquainted with the system (Kusters 2008: 9). Objective complexity refers to (i) the amount of information needed to specify the system (Kolmogorov complexity); (ii) the length of the description of a set of regularities or recurring patterns (the effective complexity of Gell-Mann 1994); or (iii) the number of parts of a system and/or interactions (Miestamo 2008). Dahl further distinguishes the linguistic material the measures are applied to. System complexity pertains to what a learner must master in order to become proficient in a language, presumably including such things as rules and their exceptions. Structural complexity pertains to the complexity of individual expressions, such as the depth of maximal embedding in a sentence. Corpus complexity measures complexity over samples of connected speech, such as the Greenbergian (1960) calculations of degree of synthesis, or average number of morphemes per word. But morphological complexity is itself not a straightforward matter, as pointed out by the editors of this volume in Chapter 1. To compare degrees of synthesis across languages, one could measure the average number of morphemes per word over
Marianne Mithun, Where is morphological complexity? In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Marianne Mithun. DOI: 10.1093/oso/9780198861287.003.0012
comparable stretches of speech. Alternatively, one might compare the maximal possible number of morphemes per word. In languages with templatic morphology, one could count the number of slots within the templates, or the number of morphemes per slot, or the coherence of functions of morphemes within slots. One could count all slots, or only those which are obligatory. The fact that a morphological structure is templatic could itself be viewed as adding complexity: it would mean that morpheme order does not follow naturally from scopal relations and must be stipulated. Discussions of morphological complexity usually include form/function mappings as well. Deviations from one form : one function correspondences have been cited as added complexity (Anderson 2015a). Such phenomena would include fusion, suppletion, syncretism, dependence on lexical classes, and elements with no discernible meaning. The very existence of morphological complexity might seem to be counterproductive, adding useless difficulty to the acquisition and use of language. But even where the complexity seems arbitrary, the factors which produce it are not. Perhaps the most important factor is cognitive. Frequently-recurring sequences of meaningful elements eventually tend to become routinized and stored in memory as chunks, as described by Bybee & Beckner (2015) and many others. Over time, the formal and semantic salience of their individual components fades for speakers, and their forms can erode. Another intriguing possible factor in the development of complexity, raised by Dahl (Chapter 13, this volume), Trudgill (2011, 2017), and Dale & Lupyan (2012), is the sociocultural context in which a language is used. Small communities, with dense social networks which persist over long periods of time, might foster an increase in complexity. 
If speakers interact regularly with a limited set of interlocutors, the relative frequency of particular turns of phrase might increase, setting the stage for routinization and just the kinds of grammaticalization processes that underlie complexity. Multilingualism within the community might affect complexity as well, but in several possible ways. Intensive, longstanding bilingualism might lead to an increase in complexity, as early bilinguals replicate grammatical distinctions of each language in the other, adding to the total number in each. If, on the other hand, the bilingualism has a different profile, consisting, for example, of a substantial proportion of untutored adult learners, there might be an overall decrease in complexity, as second-language speakers systematically choose simpler, analytic constructions over more complex, synthetic ones. Here the fate of morphological complexity under contact is explored in two languages with slightly different kinds of complexity. The data come from conversations among first-language speakers affected to varying degrees by contact. The implications of the findings are then considered for our larger understanding of morphology.
12.3 Central Pomo
Central Pomo is a language of the Pomoan family, indigenous to an area of Northern California approximately 100 miles north of San Francisco. It shows a certain degree of morphological complexity, but it would not be considered polysynthetic in one narrow sense: arguments are not specified within the verb. Verbs can show other kinds of morphological elaboration, however, including specification of means/manner, location/direction, various kinds of verbal number, argument structure (causatives, reciprocals, passives), inchoatives, aspect, dependency, and more. An example is in (1). Affixes are in bold. (All examples here are taken from unscripted speech.)
(1)
Central Pomo verb structure (Frances Jack, speaker p.c.)
Mu:l bašá ʔel ʔ-áʔ-č’i-n ʔe
that buckeye the fingering-gather-.-..
‘When gathering buckeyes,
kúyq’a:l ʔe mu:l m-t̯’á:-ka-w-aʔ-ya-w.
right.away that heat-sense---.--
you have to cook them as soon as you get them.’
Complexity can be affected in a variety of ways by the sociocultural context in which languages are spoken. Trudgill (2011) has proposed that small communities, with tightly-knit social networks and frequent interaction among small numbers of participants, could foster the growth of complexity. Enhanced frequencies of recurring expressions could result in routinization and morphologization. Language contact can affect complexity in quite diverse ways. Early bilingualism might increase complexity, as children, who have the least difficulty in acquiring complex systems, replicate distinctions from one of their languages in the other. Late bilingualism in a large proportion of a population might decrease morphological complexity, as adult second-language speakers opt for more analytic forms of expression. Importantly, the encroachment of one language on another might have a simplifying effect, as spheres of usage of the endangered language and frequency of its use are reduced. Northern California is a recognized linguistic area, with striking structural parallels across the languages, including morphological distinctions. Communities have always been small, and exogamy common, so a good proportion of children were raised in bilingual households. The small communities and longstanding, intense contact could well have contributed to morphological complexity. One feature that is widespread across languages of the area is the specification via verbal prefixes of means/manner/instrument (Mithun 2007). Examples of their functions can be seen in (2) with the Central Pomo verb root t̯’é:č’ ‘stick together’.
(2)
Central Pomo means/manner prefixes
t̯’é:č’ ‘stick together, be alongside each other’
da-t̯’é:č’ ‘push on something that sticks in your hand’
ʔ-t̯’é:č’ ‘stick on with fingers, as chewing gum under table’
ma-t̯’é:č’ ‘step on a nail or something that sticks in your foot’
ča-t̯’é:č’ ‘sit on a thorn, put a patch on pants’
h-t̯’é:č’ ‘stick up a pole, pitchfork, shovel, in ground’
m-t̯’é:č’ ‘catch fire’
ph-t̯’é:č’ ‘hammer a nail into the wall, nail something on’
pha-t̯’é:č’ ‘something floating downriver gets stuck on bank’
s-t̯’é:č’ ‘while one is drinking, something gets into the mouth that doesn’t belong, like dirt or a bug’
ša-t̯’é:č’ ‘stick a support, as a box, next to something long, like fence posts stored upright for use’
Two of the prefixes seen in (1) also occur here: ʔ- ‘fine finger action’ and m- ‘involving heat’. Another widespread feature is the specification of location and direction. Central Pomo examples of such suffixes are in (3) with the verb čá- ‘run’. (Perfective aspect is marked here with the suffix -w after vowels and glottal stop after obstruents. Imperfective aspect here is -an.)
(3)
Central Pomo directional suffixes
čá-w ‘run’ (one)
čá-:la-w ‘run down’
čá-:qač’ ‘run up (as up a hill)’
čá-č’ ‘run away’
čá-way ‘run against hither, as when a whirlwind came up to you’
čá-:ʔw-an ‘run around here and there’
čá-mli-w ‘run around it (tree, rock, house, pole)’
čá-mač’ ‘run northward’
čá-:q’ ‘run by, over (on the level), south’
čá-m ‘run over, on, across (as bridge)’
A third area of morphological elaboration in Central Pomo as well as in related and unrelated but neighbouring languages is a set of suffixes and enclitics that mark dependent clauses. The markers distinguish what speakers cast as elements of a single larger event or state (same) and what they cast as related but distinct events or states (different). In addition, the markers distinguish realis from irrealis situations. For realis situations, simultaneous or overlapping events and states are distinguished from those viewed as consecutive (sequential). Examples of the realis same suffix -(i)n are in (4).
(4)
Central Pomo dependent same
Mú:l ʔe mu:t̯uya, mó da-héle:č’-in
that 3. hole pulling-dig-
‘Then, they would dig a hole,
hó ʔmáhč’i-n,
hole build.fire-
build a fire,
mi: ʔ=mú:t̯uya lóq’ ts’aqʰáṭ m-čá-la-w-ač’-in,
there =3. thing greens -throw-horizontally--.-
throw green stuff in there,
mi: ʔ=mú:t̯uya mu:l šá ʔel m-ča-la-w-ač’-in,
there -3. that fish the -throw-horizontally--.-
throw those fish in there,
m-ṭ’á-:ka-w-ač’.
heat-sense---.
and cook them.’
Examples of the realis different enclitic =da are in (5). (5)
Central Pomo dependent different
Šé: ʔul ma, yém-aq-’=da
longtime already 2 old--=
‘In the future, when you are older
ʔá: čʰó-w=da,
1. not.exist-=
when I am no longer here,
ma ʔ-yá:q-an-ka-w=ʔkʰe
2. mentally-recognize-.--=
you will see.’
Speakers can vary in their packaging of events as same or different. Generally the kinds of factors that enter into their decisions include continuity versus discontinuity of topic, place, and time. The first sustained contact between Pomoan speakers and a European language was in the nineteenth century, when California was a part of Mexico. Contact with Spanish resulted in the adoption of some nouns, primarily designating introduced concrete objects, but it had little apparent effect on the morphological complexity of Central Pomo or its neighbours. During the twentieth century, schools were established in which children were required to speak English, and many children were sent away to boarding schools where they were forbidden to speak Central Pomo. One man born in 1912 recalled that when he left the community at age 5, pretty much everyone spoke the language. When he returned ten years later, almost no one used it on a daily basis.
All of the speakers represented here learned Central Pomo as their mother tongue. All subsequently learned English as well, but they had varied histories. All ultimately returned to live in Central Pomo communities.
(6)
Central Pomo speakers cited here
Speaker 1: Fluent speaker (F), spoke Central Pomo on a daily basis
Speaker 2: Fluent speaker (F), away for a few years as young woman, otherwise spoke Central Pomo on a daily basis (daughter-in-law of Speaker 1)
Speaker 3: Early Acquisition of Central Pomo
  Fairly fluent speaker (F), left at age 18, away for 30 years, married to a non-speaker, occasional use of Central Pomo
Speaker 4: Early Acquisition of Central Pomo
  Less fluent speaker (F), lived in community until age 13, returned 30 years later, widowed, occasional use
Speaker 5: Early Incomplete Acquisition of Central Pomo
  Halting speaker (F), language scorned by father, departure for boarding school age 5, rare use
Speaker 6: Some early acquisition of Central Pomo
  Son (M) of Speaker 1, older brother of Speaker 5, son of non-speaker, boarding school ages 5–15, rare use
The Central Pomo of Speakers 1 and 2 shows full fluency and articulateness. That of the others provides some insight into potential effects of contact on morphological complexity.
12.4 Obsolescence and morphological complexity
With reduction in language use, particularly in situations of contact with a less synthetic language, we might expect a reduction in morpheme-per-word ratios. One way to investigate this hypothesis is to compare the speech of individuals with differing balances in their bilingualism. As noted, all of the speakers cited here learned Central Pomo as a first language, then later learned English. For a preliminary comparison of morphological complexity, the speech of Central Pomo-dominant speakers was compared with that of now English-dominant speakers during the same conversations, so that the topics of discussion, discourse contexts, and social setting were constant. Calculations of morphemes per word revealed surprising results. In one conversation, for example, both Speaker 2 and Speaker 5 averaged precisely 1.44 morphemes per word! Other comparisons yielded similar results.
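The morphemes-per-word measure used in this comparison can be made concrete with a minimal sketch. It assumes, purely for illustration, that a transcript is available as a list of word tokens in which affix boundaries are written with hyphens and clitic boundaries with equals signs, as in the examples in this chapter. The function name `morphemes_per_word` is hypothetical, and the figures reported above were not necessarily obtained this way. The sample tokens are the four Central Pomo words of Speaker 4’s utterance in (7).

```python
import re

def morphemes_per_word(words):
    """Return the mean number of morphemes per word.

    Assumption: each word is pre-segmented, with '-' marking affix
    boundaries and '=' marking clitic boundaries.
    """
    if not words:
        raise ValueError("need at least one word")
    # Count the non-empty segments of each word
    counts = [len([m for m in re.split(r"[-=]", w) if m]) for w in words]
    return sum(counts) / len(counts)

# Speaker 4's utterance in (7), segmented as printed
sample = ["ʔa:", "yhé-:n", "ht̯ow", "hčé-hče-w"]
print(morphemes_per_word(sample))  # 7 morphemes / 4 words = 1.75
```

Whether clitics or reduplicated elements are counted as separate morphemes is an analytic decision that changes the ratio, so any comparison across speakers would need to apply the same convention throughout.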
The nature of the morphological complexity with varying contact effects differs in several ways, however. One is underspecification of certain distinctions regularly mentioned by the most Pomo-dominant speakers. Speaker 4, for example, who was away from the community for some time and did not use the language often after her return, made the comment in (7). (7)
Central Pomo direction: Speaker 4
[‘I walked out,]
ʔa: yhé-:n ht̯ow hčé-hče-w.
1. do-. from stagger-stagger-
but I was staggering.’
She used reduplication to describe her staggering, but Speaker 2 later commented as we were transcribing the recording that a more dominant Pomo speaker would have used the verb in (8), specifying direction with the suffix -:ʔw- ‘around here and there’. (8)
Central Pomo direction: Speaker 2
hihčé-:ʔw-an
stagger-around-.
‘was staggering around’
Whether or not the reduplicative strategy used by Speaker 4 is morphologically simpler than the directional suffix construction suggested by the more fluent Speaker 2 could be debated. Reduplication for iteration does occur elsewhere in the language as a derivational process creating lexical items. To Speaker 2, it was less idiomatic, and the perfective aspect less appropriate than the imperfective. Speaker 4’s comment could be interpreted as an active innovative extension of existing patterns, or the symptom of a more limited vocabulary. Central Pomo verbs contain numerous kinds of number distinctions. One is inflectional. Imperfective markers, as well as the other aspect suffixes derived from them, obligatorily indicate subject number: basically -(a)du- for singulars and -(a)č’i- for plurals. As speakers were discussing the special knowledge Pomo people have about gathering seafood, fluent Speaker 2 made the first comment in (9). The fact that she was describing multiple people was clear not just from the plural pronoun mú:t̯uya ‘they’, but also the plural imperfective suffix -č’i- on the verb ‘know’, the distributive -t̯ay on ‘knowledgeable’ (since each was knowledgeable in their own right), and the distributive -ay on ‘people’. When Speaker 4 echoed the thought, she used the singular form of the verb ‘know’.
(9)
Central Pomo number: Speakers 2, 4
2 Hínt̯il ʔ=mú:t̯uya šá:-t̯’a:ʔ-č’i-w
  Indian =3 knowledge-sense-.-
  ‘Indians know that.’
  . . .
  Ma: šá:-t̯ay ʔe mu:l hínt̯il čá:č’-ay.
  stuff knowing- that Indian people-
  ‘They know things, Indians.’
4 Mm. ʔúda:w ma: šá:-t’a:ʔ-du-w.
  lots stuff knowledge-sense-.-
  ‘[He] knows lots of stuff.’
Speaker 2 later commented that Speaker 4 made it sound like just one person is smart. Speaker 4’s imperfective verb, a frequently-occurring one, was well-formed, but inappropriately selected in this context. The speech of less fluent speakers does show morphological complexity. On another occasion, Speaker 4 offered the explanation in (10) with a complex verb. (10)
Central Pomo morphological complexity: Speaker 4
Mé:n=ʔt̯i ʔa: car čá-:ʔw-an-ka-w=ʔkʰe
so=but 1. run.-around-.--=
t̯ʰi-n ʔi-n.
not- be-
‘That’s why I don’t drive.’
The verb is certainly morphologically complex, but it is highly frequent. Speaker 4 did not assemble it online: she selected it as a fully-formed lexical item. The same speaker used the verb in (11). (11)
Central Pomo morphological complexity: Speaker 4
ba:-yú:-čʰ-ma-w=ʔkʰe
orally-know--.-=
‘they will understand’
This verb, too, shows some morphological complexity, and it is well-formed. But the context is revealing. It was part of a conversation among Speakers 2, 3, and 4. (The full conversation was in Central Pomo. Just the translation of Speaker 3’s remarks is presented here for context.)
(12)
Central Pomo morphological complexity: Speakers 3, 2, 4
3 ‘My daughter says that we don’t want the White people to understand us. That’s why we speak Indian.’
2 Mú:t̯uya ba-yú:-cʰ-ma-w=ʔkʰe ṱʰi-n.
  3. orally-know---= not-.
  ‘They won’t understand.’
4 Pretty soon ba:yú:čʰmawʔkʰe.
  ‘Pretty soon they will understand.’
Speaker 4 was echoing the verb just used by Speaker 2. Morphological complexity like this is not something speakers usually produce online as they speak. Speakers know lexical items: they know which formations exist and which do not, and for those that do, they know their specialized contexts of use. Among the verbs in (2) above with means/manner prefixes was pʰ-ṭʰé:č’, literally ‘by.swinging-stick’. When asked what this word means, Speaker 2 replied ‘hammer a nail into a wall’. Skilled speakers, who spend a major part of their day in the language, have larger lexical inventories and an acute sense of the precise contexts in which items are used. Their awareness of the components of morphologically complex words varies, but in most cases the internal structure of words is opaque to them. This is not altogether surprising. They rarely if ever saw the language written, and many morphemes are no more than a single consonant. Central Pomo contains a passive construction which functions to eliminate a grammatical agent from the clause. The agent may be generic, unknown, or unimportant, and it cannot be mentioned. The passive marker is a verbal suffix -ya. It is added to both transitive and intransitive stems. An example from fluent Speaker 2 is in (13). She was describing a conversation that had taken place at the senior citizens’ center. The identity of the eaters was not important; the passive clause simply served to locate the event. (13)
Central Pomo passive: Speaker 2
Béda maʔá: qa-wá-:ʔ-ya-w=da
here food biting-go-.--=.
‘When (people) were eating here
ʔi’=ma mu:l– Mitch=t̯o ṭʰe-l . . .
be- that Mitch= mother-
she– told Mitch’s mother . . . ’
Slightly less fluent Speaker 3 used a well-formed passive verb, but inappropriately. (14)
Central Pomo passive: Speaker 3
[‘He’s looking for a woman.’]
Má:t̯a-ya q’á:-ya-w ʔe.
woman- leave--
‘His wife he was left.’ (For ‘His wife left him.’)
As we later transcribed and translated the conversation, fluent Speaker 2 noted that Speaker 3 should have either used the basic transitive verb q’á:w ‘left’ or not mentioned the wife.
Less fluent Speaker 4 used passive verbs in (15). (15)
Central Pomo passive: Speaker 4
Lady oranges qó=de:-ya-w and needle qó=de:-ya-w.
hither=carry-- hither=carry--
‘A lady brought oranges and brought needles.’
Here, too, the passive verb forms are well-formed but incompatible with mention of the agent, the lady. Both (14) and (15) indicate that the speakers were selecting pre-formed words, rather than constructing them online as they spoke. Example (15) also reflects a smaller lexical inventory. As Speaker 2 later noted, a better choice for the first verb would have been qó=di-w, and for the second qó=be-w. Different verb roots are used for carrying a single round item (de-), multiple round items (di-), and long items carried horizontally (be-). Speaker 4 did use some passive verbs appropriately, as in (16). (16)
Central Pomo passive: Speaker 4
Qʰá:p’-ṭ’á:-ya-w.
pity-feel--
‘Pitiful!’
This is a highly lexicalized, frequent expression. The speech of less Pomo-dominant speakers differed in another way. As seen earlier in examples (1), (4), and (5), the language contains a rich set of dependency markers. Less fluent speakers tend to use less morphological clause combining, as can be seen in (17). (17)
Central Pomo clause combining: Speaker 3
ʔa: E=t̯o čá-l=yo-w
1. E= house-to=go-
‘I go to E’s house (and)
hínt̯il ʔel ča:nó-d-an=ya mú:t̯u.
Indian the talk-.-.=. 3.
talk Indian to her.’
Speaker 2 later commented that the first verb should have been čályohdu-n, ending in the realis event dependency suffix, rather than the perfective -w, yielding a sentence meaning ‘When I go to E’s house I talk Indian with her’. Speaker 3’s prosody in (17) reflected this structure: she did not end the first clause with a terminal fall in pitch or a significant pause.
This same speaker made the comment in (18). (18)
Central Pomo clause combining: Speaker 3
Mkʰé ba:ʔá čʰo-w
2. food not.exist-
‘Even when you don’t have food
ma mu:lt̯ayat̯’,
2. 3.
bú ʔel ma mu:l fry-č-in, beans ʔel fryč-in . . .
potato the 2. that fry--. beans the fry--.
you fry potatoes for them, fry beans . . . ’
Speaker 2 later commented that she herself would have used the dependent verb form čʰó-w=da in the first clause, with the realis different event dependency enclitic =da. The puzzle remains as to why the joint conversation between fully fluent Speaker 2 and struggling, English-dominant Speaker 5 should show exactly the same morpheme per word ratio: 1.44. Speaker 2 actually spoke more during the conversation, with twice as many words (tokens). Significantly, she used many more different words. Speaker 5 used just nine different verbs (types), all but four of them repetitions of verbs just used by Speaker 2. Overall, there are two main differences between the speech of fully fluent Speakers 1 and 2 on the one hand, and more English-dominant Speakers 3, 4, 5, and 6 on the other. The first is lexical knowledge. Fluent speakers who spend more time in the language know more words and lexicalized constructions. They can thus make finer semantic distinctions, as with verbs specifying means/manner, location/direction, and different kinds of carrying, all seen here. The second is that fluent speakers have more alternatives for shaping the flow of information, with passives, clause linkers, and discourse particles. A significant difference between the two groups is in fact the use of discourse particles, which convey such distinctions as source and certainty of information (hearsay, inference, etc.), contrast with expectation versus common knowledge, and much more. Fully fluent speakers use substantially more such particles. Since the particles are monomorphemic, their pervasiveness lowers the average number of morphemes per word.
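The arithmetic behind this last point can be illustrated with a small sketch. The morpheme counts below are invented toy values, not data from the conversations, and `mean_morphemes` and `add_particles` are hypothetical helper names. The sketch only shows that adding monomorphemic words to a sample pulls the average toward 1, which is one way a fluent speaker using many particles and a less fluent speaker using few could arrive at the same ratio.

```python
def mean_morphemes(counts):
    """Average morphemes per word, given one morpheme count per word token."""
    return sum(counts) / len(counts)

def add_particles(counts, n):
    """Return a new sample with n monomorphemic particles (count 1) added."""
    return counts + [1] * n

# Toy sample (hypothetical): three complex verbs and one bare noun
verbs_and_nouns = [4, 3, 4, 1]
print(mean_morphemes(verbs_and_nouns))                    # 3.0
print(mean_morphemes(add_particles(verbs_and_nouns, 4)))  # 2.0
```

A type count alongside the token-based average would capture the other difference noted above, namely that the fluent speaker used many more different words.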
12.5 Mohawk
Mohawk is a Northern Iroquoian language indigenous to the North American Northeast, currently spoken in communities in Quebec, Ontario, and New York State. It is prototypically polysynthetic. It is holophrastic in the narrow sense: one
word, a verb complete with pronominal arguments and predicate, can constitute a full sentence. The verb in (19), for example, would be a complete sentence on its own. (19)
Mohawk holophrasis: Ima Johnson, speaker
‘We were driving along and saw a sign advertising free feed with chickens.’
Kén: ne:’ ia’akiate’serehtínion’t ne:’ thí:.
ken: ne:’ i-a’-ak-i-ate-’sere-ht-inion-’t-e’ ne:’ thiken
here it.is --1.---drag-be.in-- it.is that
here it is we two caused our dragger to be in there it is that
‘So we pulled in.’
There are three lexical categories in Mohawk, defined in terms of their internal morphological structure: verbs, nouns, and particles. (Particles are monomorphemic, though they are sometimes compounded.) The morphological structures are templatic; that of verbs is the most elaborate. The basic verb template is in Figure 12.1. Within the blocks of pre-pronominal prefixes and derivational suffixes there are multiple slots. The prepronominal prefixes include a Contrastive, Coincident, Partitive, Translocative, Factual, Duplicative, Irrealis, Future, Cislocative, and Repetitive. The derivational suffixes include an Inchoative, Reversives, Causatives, Instrumental Applicatives, Benefactive Applicatives, a Directional Applicative, Distributives, Andatives, and Ambulatives. There are around sixty pronominal prefixes, three aspect suffixes, and four final tense/mood suffixes. Nearly all show phonologically and/or morphologically conditioned allomorphy. As in many templatic systems, there are discontinuous dependencies among morphemes. Certain verb roots require a Duplicative prefix (), for example. In some cases, a semantic rationale can be discerned: the Duplicative can indicate some kind of ‘two-ness’ or a change of state or position, though its occurrence is lexicalized with each verb. In other cases, any semantic contribution has faded. Some other verb roots require certain other prepronominal prefixes, in what are now lexicalized combinations. Another discontinuous dependency holds between inflectional prefixes and suffixes. The perfective aspect suffix, for example, requires the presence of a Factual, Future, or Irrealis prepronominal prefix.
[PREPRONOMINAL PREFIXES] [PRONOMINAL PREFIXES] [REFLEXIVE MIDDLE] [NOUN STEM] [VERB ROOT] [DERIVATIONAL SUFFIXES] [ASPECT SUFFIXES] [TENSE MOOD]
Figure 12.1. Mohawk verb template
12.5.1 Inflection
All verbs must contain a verb stem, an inflectional pronominal prefix identifying the core arguments of the clause, and an inflectional aspect suffix. There are three sets of pronominal prefixes: grammatical Agents, grammatical Patients, and transitives, which are Agent>Patient combinations. A transitive prefix can be seen in (20).
(20)
Mohawk transitive pronominal prefix: Ima Johnson, speaker
Taionkhí:ion’ kítkit.
ta-ionkhii-on-’ kitkit
.-.>1-give- chicken
‘They gave us chickens.’
An assumption that still sometimes appears in the literature is that speakers create inflection by rule, because no one could ever remember so many forms. For Mohawk, the matter is not so simple. Even excellent speakers have differential control over pronominal prefix–root combinations. Some combinations simply occur more often than others: first person singulars are very frequent, for example, while masculine duals are less so. Verb stems beginning with a are very frequent, in good part because the middle voice prefix, which occurs at the beginning of stems, has the shapes -at-/-ate-/-aten-/-an-/-ar-, while those beginning with the vowel i are relatively rare. Under elicitation, speakers hesitate more with rarer forms: rarer pronominal prefixes, rarer phonological contexts, rarer full words. This does not mean, of course, that they cannot create new forms by analogy.
12.5.2 Derivation
As seen above, the verb allows for morphological expression of a number of distinctions. Skilled Mohawk speakers tend to exploit these more than less Mohawk-dominant speakers. An example of this precision can be seen in (21). The fluent speaker cited above continued her account of the chicken adventure in the course of a conversation with friends. She and her husband bought some chickens and built a chicken coop. They enjoyed hearing the rooster crow in the morning. But one morning the chickens were screaming more than usual. Her husband suggested that something must be after them, and the couple went to look. When the husband peered through a hole in the wall, he saw that something had gotten ahold of one of the chickens, and it was screaming. She suggested he get his gun. It was a weasel. It had already bitten the chicken on the leg. Her husband took aim and shot. The weasel looked around, wondering what had happened. As the wife continued her story, each time she mentioned an event that
had happened before, she included the Repetitive prefix sa- ‘again’ on the verb, whether or not there was a separate particle á:re’ ‘again’. (21)
Mohawk Repetitive prefix sa-: Ima Johnson, speaker
Ó:nen á:re’ ne kwáh sahaié:na’ thi: kítkit.
onen are’ ne kwah sa-ha-iena-’ thiken kitkit
now again the just .-..-grab- that chicken
now again the just he re-grabbed that chicken
‘Still (again) he grabbed onto the chicken.
Ó:nen á:re’ nakwáh taonsaiohén:rehte’ thi: kítkit.
onen are’ nakwah t-a-onsa-io-henreht-e’ thiken kitkit
now again very.much ---.-yell- that chicken
now again very much did it re-yell that chicken
The chicken really screamed.
Ó:nen á:re’ sahate’sennón:ni’ ne rikstèn:ha.
onen are’ sa-ha-ate-’sennonni-’ ne ri-ksten=ha
now again .-..--aim- the 1>.-be.old=mdim
now again he re-aimed the I have him as old man
(Again) my husband took aim.
Thi:, . . weasel nen kwáh taonsahatkahtónnion’ ne á:re’.
thiken onen kwah t-a-onsa-ha-at-kaht-onnion-’ ne are’
that weasel then just ---..--look-- the again
just he re-looked around the again
That weasel just looked around (again)
Nok á:re’ taonsahatekhwá:ko’
ne ok are’ t-a-onsa-ha-ate-khw-ako-’
the too again ---..--meal-take-
and again he re-bite-took
And then he took (another) bite
thi: kitkit ne kahsinà:ke.
thiken kitkit ne ka-hsin-a’ke
that chicken the ..-leg-place
that chicken its leg place
out of the chicken on its leg.’
12.5.3 Noun incorporation
Noun incorporation, the compounding of a noun stem with a verb stem to form a new verb stem, is pervasive in Mohawk, but it is a word-formation device. Speakers generally know which forms are part of the lexicon of the language and which could be but are not. Awareness of neologisms depends on the productivity of individual noun and verb stems. Some noun stems are never incorporated, some are sometimes incorporated, some are often incorporated, and some occur only incorporated. Similarly, some verb stems never occur with an incorporated noun, some occur in a few combinations with nouns, and some in many. New combinations with less productive stems are more often noticed than those involving highly productive ones. Often the language provides alternatives for packaging information: a noun may occur as an independent word or incorporated into a verb. The density of incorporation for discourse purposes generally varies across speakers with the degree of language use. Examples of noun incorporation can be seen in (22), part of a conversation between a grandmother and her granddaughter as they were making meat pies. The grandmother was a highly skilled speaker, who learned English only after she went to school. The granddaughter heard Mohawk as a child, but spent most of her daily life in English. (The entire conversation was in Mohawk, but just the free translation is given for the first few lines to provide context.)
(22)
Noun incorporation: Grandmother and granddaughter
GM: ‘Go get the wooden bowl.’
GD: ‘Wooden bowl?’
GM: ‘Wooden bowl.’
GD: ‘Wha–
GM: ‘You’ll use it to put the flour in.’
GM: Othè:sera’ ostòn:ha, sok, kén:ie’.
    flour a little then fat
    ‘A little flour, and then, fat.

    Tánon’, um,
    And, um,

    né: ní: ke-rákw-as
    that the=1 1.-choose-
    the I I prefer

    n=en-ke-wist-á:wen-ht-e’.
    the=-1.-fat-liquid--
    I will fat melt
    and I myself prefer to melt the fat.’
The grandmother first introduced the fat with the independent noun kén:ie’. Once it was an established referent, she incorporated it: enke-wist-á:wenhte’ ‘I will fat melt’. (Incorporated noun stems are not always the same as their independent
counterparts.) The combination ‘fat-melt’ is of course a common one. The conversation continued. (23)
Noun incorporation: Grandmother and granddaughter
GD: To: ní:kon?
    ‘How much?’
GM: En, o: ní:se’ enhsanónhton’
    en o: ne=ise’ en-hs-anonhton-’
    ah oh the=2 -2.-think-
    ah oh you you will think

    ne: tho: ní:ioht
    ne: tho: ni-io-ht
    that there -.-be.so
    it is there so it is
    ‘Ah, you’ll decide

    tsi ní:kon – enhsena’tarón:ni’.
    tsi ni-k-on en-hse-na’tar-onni-’
    how --be.amount -2.-baked.goods-make-
    how so it amounts you will baked.goods make
    according to how many pies you’re making.’
At this point the pies were well-established referents, active in the consciousness of the speakers, so it is no surprise that the noun stem -na’tar- ‘baked goods’ was incorporated. There was little need to highlight it. The verb ‘make’ is what could be called a ‘light verb’, not adding highly complex, new information. It is one that frequently incorporates, and the combination ‘baked.goods-make’ = ‘bake’ is a common one. As the conversation continued, the granddaughter introduced referents with independent nouns, and the grandmother picked them up with incorporated nouns. (24)
Noun incorporation: Grandmother and granddaughter
GD: Tánon’, o’wà:ron’, tánon’ ohnennà:ta’?
    and meat and potato
    ‘And meat and potatoes?’
GM: En, tsi nikarì:wes ki: sarhá:re’ sok–
    ah as so it is matter long this you are waiting then
    ‘Ah, while you’re waiting,

    enhshennà:ton’, tánon’ teka’wahraríhton.
    en-hs-henna’t-on-’, tanon’ te-ka-’wahr-a-ri-ht-on
    -2.-potato-cook- and --meat--cooked--
    you will potato cook and it is meat cooking
    then you’ll cook the potatoes, and the meat is cooking.’
During this conversation, the grandmother talked more than the granddaughter, with five times as many words (tokens) overall. But as in Central Pomo, perhaps surprisingly, their average morpheme per word ratios were nearly identical: the grandmother’s speech averaged 2.4 morphemes per word, and the granddaughter’s 2.3. As in the Central Pomo conversations, the skilled speaker, the grandmother, used many more discourse particles, which are monomorphemic. The granddaughter used some highly lexicalized polymorphemic words, words that she clearly selected as familiar chunks, and fewer particles. The speech of the two was overall quite different, in many of the same ways as in Central Pomo. Skilled speakers like the grandmother here spend more of their time in the language and simply know more words and more constructions. They have more lexical items to choose from, including verb stems with incorporated nouns, and more choices among constructions for shaping the flow of information.
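A morpheme-per-word ratio of the kind reported above can be computed from any morphologically segmented transcript. The following is a minimal Python sketch, not the procedure actually used for the counts in this chapter. It assumes that affix boundaries are marked with '-' and clitic boundaries with '=', as in the segmented lines of examples (22)–(24); the function name morphemes_per_word is hypothetical. The sample is taken from example (23) and is far too small to characterize either speaker.

```python
def morphemes_per_word(segmented_words):
    """Average number of morphemes per word.

    Assumption: '-' marks affix boundaries and '=' marks clitic boundaries.
    """
    counts = []
    for word in segmented_words:
        # Treat clitic boundaries like affix boundaries for counting purposes.
        parts = [p for p in word.replace("=", "-").split("-") if p]
        counts.append(len(parts))
    return sum(counts) / len(counts)


# Segmented forms copied from example (23); particles count as one morpheme each.
sample = ["en", "o:", "ne=ise’", "en-hs-anonhton-’",
          "tsi", "ni-k-on", "en-hse-na’tar-onni-’"]

print(round(morphemes_per_word(sample), 2))
```

Because particles are monomorphemic, a speaker who uses many of them and a speaker who uses few particles but also fewer long words can end up with very similar averages, which is the pattern described above.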
12.5.4 Processing The crucial role of lexicalization in processing can be seen in interactions among speakers of different dialects. There are six Mohawk communities, distributed across Ontario, New York State, and Quebec. These are, from west to east, Ohswé:ken, Wáhta’, Tehaientané:ken, Ahkwesáhsne, Kanehsatà:ke, and Kahnawà:ke. Phonological differences among the dialects are relatively minor. Where speakers in the west pronounce the affricate written ts as an alveopalatal before a high front vowel or palatal glide, those in the east pronounce it as alveolar. Where some speakers pronounce r as a retroflex flap, others pronounce it as a lateral [l]. Where speakers in the west continue the pronunciation of original *ty and *ky, those at Ahkwesáhsne pronounce both as velar, and those at Kanehsatà:ke pronounce both as alveopalatal. Morphology is constant across the dialects. The morphological templates are the same, as are the inventories of prefixes, roots, and suffixes. Principles of syntax are also the same. Constituent order is purely pragmatically based. Quite surprisingly, when a recording of an excellent speaker from Ohswé:ken, the westernmost community, was played for skilled speakers in Kahnawà:ke, the easternmost community, they had difficulty understanding him. The barrier was not the individual morphemes, nor their patterns of combination, which are essentially the same in all of the dialects, but vocabulary, the preformed chunks. Over the past several centuries since their separation, different lexical items have developed in the different communities. These Kahnawà:ke speakers were not processing his speech morpheme by morpheme, but word by word.
12.5.5 Native acquisition An intriguing issue for the acquisition of languages with complex morphology is how children first break into the system. They never hear verb roots or stems in isolation; in fact Mohawk speakers themselves cannot isolate roots or stems (unless they become linguists). Over the past several decades, there have been relatively few children learning Mohawk as a first language (though that pattern is beginning to change), so large-scale studies of acquisition have not been possible. Some principles have emerged, however, from observation of a few children acquiring the language (Mithun 1989). The first is that the earliest stages of acquisition are phonologically based. Children first extract the stressed syllable of words. This choice is actually useful. Stress basically falls on the penultimate syllable, the second from the end (though certain epenthetic vowels are passed over). The stressed syllable often coincides with the root or part of it, so the children can often get their message across. Progress remains phonologically based for a time: the child first adds the ultimate syllable, producing two-syllable words, then the antepenult, etc. An example of adult/child interaction can be seen in (25). (25)
Child Mohawk: Adult and child, 2;2
Adult        Child
Wa’kéta’.    Kéta’.    ‘I’m putting them in.’
Some later child versions of words are in (26). (26)
Child Mohawk
Adult           Child
osahè:ta’       ahe:ta’      ‘beans’
ohiákeri        iákeri       ‘fruit juice’
tehotskà:hon    otskà:hon    ‘he’s eating’
What is at first astonishing about the Mohawk of young children is what appears to be their allomorphic skill. The masculine singular agent pronominal prefix ‘he’, for example, has the form ra- word-initially and -ha- word-internally, except that it is basically -hr- after a stressed vowel or before the vowels o, on, e, or en. When the following stem-initial vowel is i, this vowel merges with the a of the pronominal prefix to the nasalized vowel en, ([ᴧ̨]), yielding allomorphs ren-/-hen-/-hren-. (This fusion is characteristic of some other pronominal prefixes, but not a general process throughout the language.) Otherwise, the final a of the pronominal prefix is lost before another vowel. Another phonological process involves coda h in stressed syllables: the laryngeal produces a distinctive high-fall pitch contour (indicated orthographically with a grave accent) on that syllable,
then disappears, leaving vowel length. The masculine singular agent prefix thus has the forms -hra-, ra-, -ha-, -hr-, r-, -hren-, -hen-, and ren-. And yet, children never seem to make mistakes! At age 2 years and 10 months, one child easily asked Ka’ wà:re’? ‘Where’s he going?’, never tripping over those complex phonological processes. (27)
Mohawk phonology
Ka’ wà:re’?
ka’ wa-hra-e-’
where -..-go-
‘Where is he going?’
Of course this is no surprise. The child knew the full question as a chunk; he did not manufacture the word from underlying forms of morphemes, then apply multiple phonological processes to arrive at a surface form. The second person singular agent pronominal prefix ‘you’ is basically s- word-initially, -hs- word-internally, with epenthetic -e- before stems beginning in n, r, or w and certain consonant clusters. The basic form of the perfective aspect suffix is glottal stop ’, with epenthetic -e- after consonants. As noted above, stress is penultimate, with stressed vowels lengthened in open syllables, but epenthetic vowels do not enter into the determination of stress. The child cited above in (27) similarly came out with the exclamation in (28) below easily and perfectly, despite the complexity of the processes that would go into building it from underlying forms and then applying a sequence of phonological rules. (28)
Mohawk phonology
Sótsi enhserá:kewe’!
sotsi en-hs-rakew-’
too -2.-wipe-
‘You’re going to erase too much!’
Of course the child learners did not emerge instantaneously with Mohawk equivalent to that of adults. About the time they were producing three-syllable words, they began to discover morphology, usually with a few more frequent pronominal prefixes. (These immediately precede the verb stem.) From this point on, acquisition was governed more by morphology than phonology. As seen earlier, Mohawk speakers generally specify the direction of directed motion, with a Translocative prefix i-/ie-/ia-/ia’-/iaha- ‘thither’ or a Cislocative prefix t-/te-/ta-/-onta-/-onte-/-ont- ‘hither’. A Translocative prefix was seen earlier in (19) in the verb i-a’akiate’serehtínion’t ‘we pulled in there’. At 2 years and 10 months, the child cited in (27) and (28) generally omitted the directional prefixes.
(29)
Mohawk direction
Child         Adult version
enháhawe’     i-enháhawe’    ‘he will take it’
waháhawe’     i-aháhawe’     ‘he took it’
(The initial w of the factual prefix regularly disappears following the Translocative.) Negation is expressed in Mohawk, as in many languages, with a combination of markers: the particle iáh plus an initial Negative prepronominal prefix te’- or Contrastive prepronominal prefix th-/tha-/tha’-. This child used the analytic marker iáh alone at this age. (30)
Mohawk negation
Iáh thí:ken rón:kwe ì:raks.
iah thiken r-onkwe i-hr-ak-s
not that .-person -..-eat-
‘That man doesn’t eat it.’
The adult version would include a negative prepronominal prefix on the verb: te’-hr-ak-s > tè:raks. (Mohawk verbs must contain at least two syllables. If a verb would otherwise be monosyllabic, a prothetic vowel i- is added at the beginning, which bears stress.) Overall, children learning Mohawk apparently first build vocabulary within phonological length limitations, then begin to abstract morphological distinctions. The fact that they so rarely make allomorphic errors suggests that they are not in fact producing language by assembling underlying forms then applying sequences of phonological rules. This accords well with the findings of Tomasello (2006 and elsewhere) on acquisition: Children’s earliest acquisitions are concrete pieces of language—words, complex expressions, or mixed constructions—because particularly early in development they do not possess fully abstract categories and schemas. Children construct these abstractions only gradually and in piecemeal fashion. The strategies observed in children learning Mohawk as a first language differ interestingly from those seen in adult second-language learners. In several of the Mohawk communities, an extraordinary generation of young adults are developing an impressive competence in the language. They are becoming fluent, something that would have been considered an impossible dream only a short time ago. These second-language learners show brilliant mastery of the complex morphology, certainly making allomorphic mistakes along the way, but exquisitely tuned in to the complexities involved. First-language speakers are delighted to see their accomplishments, though, interestingly, they observe uniformly that these second-language speakers continually create words that do not exist in the language.
12.6 Implications for our models of morphology Work by Blevins (2006, 2013, 2016a, 2016b), Pirrelli et al. (2015), and others draws a distinction between Constructive and Abstractive models of morphology. In Constructive frameworks, surface word forms are described as built up from subword units, either in terms of substance or rules. In Abstractive frameworks, the basic units of the grammatical system are surface word forms. Roots, stems, and exponents are understood as abstractions over a lexicon of word forms. Constructive perspectives underlie efficient linguistic descriptions, the kinds of descriptions that are useful for both linguists and adult second-language learners. They also fit well with what is seen in adult acquisition of Mohawk as a second language, in particular allomorphy mistakes and the overgeneration of derived forms. Such descriptions also provide measures of objective complexity in the sense described by Dahl and others cited earlier. Abstractive perspectives are word based, though it is recognized that words can be internally structured into recognizable constituent parts. Constituent parts are analysed as emergent from independent principles of lexical organization, whereby full lexical forms are redundantly stored and mutually related through entailment relations (Matthews 1991; Corbett & Fraser 1993; Pirrelli 2000; Burzio 2004; Booij 2010; all cited in Pirrelli et al. 2015: 142). It is significant that the processing of a given form may be facilitated or inhibited by other, related forms. This makes sense only if the related forms are available as elements of a speaker’s mental lexicon (Taft 1979; Baayen et al. 1997; Schreuder & Baayen 1997; Hay 2001; de Jong 2002; Moscoso del Prado Martín 2003, cited in Blevins 2006; Blevins 2006: 535). Abstractive models accord well with differences between highly fluent first-language speakers of Central Pomo and Mohawk on the one hand, and English-dominant first-language speakers on the other.
One of the most salient differences is that while less fluent speakers do use highly synthetic words if they are very frequent or primed, they have a smaller inventory of choices. Their more limited lexical inventories can result in some inappropriate lexical selections, both inflected and derived, and fewer options for shaping information flow. Abstractive models also accord well with the strong sense among both Central Pomo and Mohawk speakers of whether a possible word exists and exactly when it is used. They are in line with the problems even skilled speakers sometimes face in attempting to process speech from other dialects. They would predict the variable ability of speakers to isolate morphemes which never occur on their own as independent words, the existence of discontinuous dependencies, and speakers’ differential facility in producing inflectional paradigms. Speakers can certainly extend patterns of inflection by analogy on occasion, but rarer forms and combinations present greater challenges. Abstractive perspectives also accord with the
phonologically-based first-language acquisition strategies of Mohawk by young children, and the rarity of allomorphy mistakes. Learners of that age face few memory hurdles that would hinder the acquisition of large numbers of new lexical items. In the end, what is complex for the analyst is not necessarily complex for the speaker or for the learner. Speakers of Central Pomo and Mohawk store most polymorphemic words, or at least stems, as chunks. Allomorphic alternations do not present serious difficulties when they are embedded in the chunks, a fact that is easily observed in the absence of mistakes in frequent forms, but also in the challenges presented by rare or novel combinations. Templatic structure may be unmotivated for the analyst and thus viewed as additional complexity, but, importantly, the routinization of structure they represent can result in fewer decisions on the part of speakers. It can also facilitate the acquisition of new lexical items, items which easily fit into an existing pattern. Do the differences matter? The various types of complexity are all useful, but for different purposes, and for that reason, it is important to recognize them. If our goal is to delineate what is a possible language, we want to think about possible for whom. Language is full of patterns, some no longer productive. As analysts we care about all of them: they allow us to understand the otherwise arbitrary. Speakers inherit the products of past patterns and happily use some without abstracting over them. And it is learners and speakers who shape the language according to their own knowledge.
IV
DISCUSSION
13 Morphological complexity and the minimum description length approach Östen Dahl
13.1 Introduction Within the study of linguistic complexity, morphological complexity has a special place due to the fact that morphology is the part of language where differences in complexity between languages are most apparent. Morphological complexity also seems to lend itself fairly easily to quantification. It is therefore natural that it should attract the attention of linguists. The chapters in this volume show a variety of approaches to morphological complexity which sometimes differ quite considerably in the conceptual apparatus applied. In this concluding chapter, rather than try to review each contribution separately, I will focus on some of the basic concepts used by the authors. Sometimes this will demand going beyond the contributions to the volume. I will start out by presenting briefly what I will call ‘the minimum description length approach’ to complexity and then try to see how other concepts of complexity applied in the chapters of the volume relate to it.
13.2 The minimum description length approach to complexity I will take as my point of departure the idea that the complexity of an entity can be understood as the amount of information needed to recreate or specify it—which in most cases can be identified with the length of the shortest possible complete description of it. This is often referred to as ‘Kolmogorov complexity’ or ‘algorithmic information content’ and has its most natural application when applied to strings (of symbols or characters): the Kolmogorov complexity of a string is the inverse of its compressibility. Kolmogorov complexity is behind the ‘minimum description length (MDL) principle’ which is said to build on the insight that ‘any regularity in the data can be used to compress the data’ (Grünwald 2007), leading to the conclusion that finding the best hypothesis for a given set of data means finding the optimal way to compress it. As in Dahl (2004), I will here use the term ‘pattern’ rather than ‘regularity’, following Goertzel (1994) and Shalizi (2001).
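The link between patterns and compressibility can be illustrated with an off-the-shelf compressor. Kolmogorov complexity itself is not computable, so the Python sketch below uses the length of zlib-compressed output only as a rough stand-in for description length; that substitution, the helper name compressed_length, and the two test strings are assumptions made for illustration and are not taken from the works cited.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a crude proxy for description length)."""
    return len(zlib.compress(data, 9))


# A highly patterned string: 'ab' repeated 500 times (1000 bytes).
patterned = b"ab" * 500

# A string with no exploitable pattern: 1000 pseudo-random bytes.
rng = random.Random(0)  # fixed seed so the example is repeatable
patternless = bytes(rng.randrange(256) for _ in range(1000))

print(compressed_length(patterned), compressed_length(patternless))
```

Both inputs have the same length, but the patterned one shrinks to a small fraction of its size while the pseudo-random one does not shrink at all. In the terms used above, only the first contains a pattern that a shorter description can exploit.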
Östen Dahl, Morphological complexity and the minimum description length approach In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Östen Dahl. DOI: 10.1093/oso/9780198861287.003.0013
The minimum description length principle is sometimes said to be a version of Occam’s Razor, but it might equally well be called ‘Pāṇini’s razor’, given that he and other Indian linguists in the first millennium honoured a principle later formulated as ‘Grammarians rejoice over the saving of half a short vowel as much as over the birth of a son’,¹ which more directly addresses the issue of description length. In modern linguistics, similar ideas have been discussed in terms of ‘descriptive economy’ or ‘parsimony’. But ‘minimum description length’ has been explicitly addressed in computational approaches to morphology, the most cited example being Goldsmith (2001). I think the notion of minimum description length can also be helpful in understanding some notoriously difficult concepts in linguistics. I will take suppletion as an example. Corbett (2009) characterizes suppletion as ‘an outer limit of inflection, the extreme of markedness and complexity’ but also approvingly quotes the following statement from Mel’čuk (1994: 358), which does not refer to complexity, as ‘a good definition of suppletion’: ‘For the signs X and Y to be suppletive their semantic correlation should be maximally regular, while their formal correlation is maximally irregular.’ But what is interesting here is rather the explication in the cited work of Mel’čuk of what he means by ‘maximally irregular’, or as he says elsewhere, ‘minimally regular’. For his ‘rigorous definition’ of suppletion, Mel’čuk introduces the auxiliary notion of co-representability. Two units are co-representable if they can be derived from each other or from a common source by rules of the language, and the condition on maximal irregularity of form means that the signifiers of the units are not co-representable. No particular conditions such as productivity or generality are put on the rules— ‘[t]he only factor that counts for there to be regularity is the presence of rules’. 
In a minimum description length approach, Mel’čuk’s account can be interpreted as implying that suppletion involves the absence of a pattern or regularity—a way of representing the data in a shorter way than by rendering it literally. Thus, a suppletive form would have to be listed in the description of the language. Notice that this excludes what is not explicitly precluded in Mel’čuk’s account—a rule which applies to one form only. The point is that introducing such a rule would normally involve an increase in description length that would offset what is gained by shortening the specification of the suppletive form.
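The trade-off between stating a rule and listing a form can be made concrete with a toy calculation. The Python sketch below is only an illustration under simplifying assumptions: description length is counted in characters, a rule is charged its written length once plus one unit per item it applies to, and the rule notations and function names are invented for the example. English past tense forms serve as data.

```python
def listed_cost(forms):
    """Cost of listing every form literally: the total number of characters."""
    return sum(len(form) for form in forms)


def rule_cost(rule_text, n_items, per_item_cost=1):
    """Cost of stating a rule once plus a small pointer for each item it covers."""
    return len(rule_text) + n_items * per_item_cost


# A pattern shared by several items: stating the rule is cheaper than listing the forms.
regular_pasts = ["walked", "talked", "jumped", "played", "called"]
shared_rule = rule_cost("past = base + ed", n_items=len(regular_pasts))

# A 'rule' that applies to one form only: it costs more than simply listing the form.
one_off_rule = rule_cost("past(go) = went", n_items=1)

print(shared_rule, listed_cost(regular_pasts))
print(one_off_rule, listed_cost(["went"]))
```

Under these assumptions the shared rule pays for itself as soon as it covers a handful of items, whereas the one-off rule never does, which is why a suppletive form ends up listed.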
¹ The maxim (Sanskrit ardhamātrā lāghavena putrotsavaṃ manyante vaiyākaraṇāḥ) is often quoted in the literature without a source. There is no known formulation of it from classical times. In the form cited here, it derives from the treatise Paribhāṣenduśekhara by the nineteenth-century Indian scholar Nagēśa or Nāgojībhaṭṭa, which was translated into English by the German Indologist Franz Kielhorn (Kielhorn 1871). Incidentally, Occam’s Razor in its commonly cited form (entia non sunt multiplicanda praeter necessitatem) is not found in the writings of William of Ockham but derives from the seventeenth-century Irish philosopher John Punch.
13.3 The organization of morphology When describing a set of objects, the most parsimonious way is often to separate the information about their general properties from the information that is specific to each member of the set. Descriptions of languages are traditionally divided into ‘grammar’ and ‘lexicon’. So let’s see what that implies for morphology. We can see the goal of the morphological component of a grammar—or ‘morphology’ for short—as a tool to generate the set of all word forms, organized in paradigms, in a language from a lexicon. Another way of putting this in the spirit of the MDL principle is to regard the morphological component as a way of compressing the set of paradigms. The morphology and the lexicon together constitute the description of the word forms. The lexicon will consist of a set of entries, which I shall call ‘lexical specifications’, containing the information needed by the morphology to generate one particular paradigm, that is, on the one hand, one or more basic forms or principal parts; on the other, membership in inflection classes, genders, etc. I shall here assume that the lexicon contains no other information. The total length of the morphology and the lexicon is thus indicative of the complexity of the paradigms. But in speaking of morphological complexity we have to sort out a few different components in this. Primarily, the morphological complexity of a language would be the complexity of the morphological component in the sense of the system that relates the lexicon with the set of paradigms. To start with, although I have been speaking of a set of word forms and a set of paradigms as if those things were equal, the difference between them is crucial. Think of the paradigm as a table. 
Since there are a number of ways any given set of word forms can be organized into a table, and the choice between them is significant, it follows that there is information hidden in the organization of the paradigm and consequently the paradigm is more complex than the set of word forms. Furthermore, the paradigms belonging to lexical items of one part of speech usually share a common structure. But this structure can be studied independently of the system that relates paradigms and lexical specifications. So paradigm organization can be seen as a component of its own. Another problem is to what extent the lexicon is relevant to the question of morphological complexity. On the one hand, to the extent that the morphological component does not treat all lexical items equally, the lexicon will have to contain information that makes that possible. On the other hand, if items are added to or removed from the lexicon, the total length of the lexicon will change, and it seems counter-intuitive that these changes should always influence the morphological complexity of a language. For this reason, it is rather the information contained in the individual lexical specifications that is of interest.
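The point that a paradigm contains more information than the bare set of its word forms can be given a rough numerical illustration. The Python sketch below rests on strong simplifying assumptions that are not part of the argument above: the forms are all distinct (no syncretism), each form fills exactly one labelled cell, and every assignment is equally likely. Under those assumptions there are n! ways of placing n forms in n cells, so choosing one of them takes log2(n!) bits; the function name arrangement_bits is hypothetical.

```python
import math


def arrangement_bits(n_forms: int) -> float:
    """Bits needed to pick one assignment of n distinct forms to n labelled cells."""
    return math.log2(math.factorial(n_forms))


# The information attributable to the organization alone grows quickly with paradigm size.
for n in (2, 6, 12):
    print(n, round(arrangement_bits(n), 1))
```

The figure is an upper bound for this idealized case only, since real paradigms share structure across lexical items and that shared structure needs to be stated just once.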
As I said, to assess the complexity of a set of paradigms, we would have to consider both the complexity of the lexicon and the complexity of the morphology—both separately and taken together. As noted in Sagot & Walther (2011; quoted by Parker & Sims, Chapter 2, this volume), it may sometimes be possible to obtain a shorter total description length by changing the division of labor of the two components. It is possible to streamline the general picture somewhat. Instead of thinking of the output of morphology as a set of paradigms, we can think of the morphology as generating a set of annotated word forms—that is, each form comes with a specification of its grammatical features. This way, the system becomes symmetric—we can speak of input and output specifications and a set of rules that relate them. The input and output specifications have specified formats and contain terms taken from specific vocabularies: labels of inflectional classes and values of inflectional features, respectively. The sizes or lengths of the specification formats and vocabularies are part of the overall complexity of the linguistic system, but they also influence the complexity of the rules of the morphological component. What I have just said illustrates that it is not always clear how to draw the boundaries of morphological complexity. In general, speaking of the complexity of a component of the description of a language in isolation easily becomes somewhat artificial, in my opinion even on a modular view of language structure. I will return to this question below.
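The symmetric picture of input specifications, rules, and annotated output forms can be sketched in a few lines of Python. This is a deliberately simplified illustration and not a proposal for how such a component should be implemented: the data are two Latin nouns restricted to two cells, vowel length is ignored, and the names RULES, LEXICON, and generate are invented for the example.

```python
# Rules of the morphological component: inflection class -> {feature values: suffix}.
RULES = {
    "I": {("nom", "sg"): "a", ("gen", "sg"): "ae"},
    "II": {("nom", "sg"): "us", ("gen", "sg"): "i"},
}

# Input specifications: each lexical entry is a stem plus an inflection-class label.
LEXICON = [("ros", "I"), ("domin", "II")]


def generate(lexicon, rules):
    """Return the annotated word forms as (form, feature values) pairs."""
    annotated = []
    for stem, inflection_class in lexicon:
        for features, suffix in rules[inflection_class].items():
            annotated.append((stem + suffix, features))
    return annotated


for form, features in generate(LEXICON, RULES):
    print(form, features)
```

The class labels in the lexicon and the feature values in the output are the two vocabularies mentioned above. Adding a class label enlarges the input vocabulary and also adds a block of rules, which is one way of seeing why the formats and the rules cannot be assessed entirely separately.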
13.4 Notions of complexity represented in the volume As noted above, the chapters in the volume differ in the notions of complexity that are invoked. But they also differ in the extent to which they place these notions within explicit frameworks. The minimum description length approach to complexity is mentioned in the chapters by Di Garbo, Chapter 9; Loporcaro, Chapter 6; Mithun, Chapter 12; and Nichols, Chapter 7. But more salient in the volume is the approach of Ackerman & Malouf (2013). Several chapters (Henri et al., Chapter 5; Parker and Sims, Chapter 2; Mansfield and Nordlinger, Chapter 3; and Meakins and Wilmoth, Chapter 4) draw on their distinction between two ‘dimensions in the analysis of morphological complexity’, viz. ‘enumerative complexity’ or ‘E-complexity’ and ‘integrative complexity’ or ‘I-complexity’. This motivates discussing these concepts in some detail, which I will do below. A superficially somewhat similar dichotomy is that made by Nichols between ‘inventory complexity (IC)’ and ‘canonical complexity (CC)’, but while ‘IC’ and ‘enumerative complexity’ are fairly closely related, the second members of the pairs bear little resemblance to each other. (There is a potential source of
confusion in that Nichols’s ‘IC’ is closer to Ackerman & Malouf’s ‘E-complexity’ than to their ‘I-complexity’.) Thus, Nichols’s ‘CC’ deserves a discussion of its own. Mithun (Chapter 12, this volume) cites both minimum descriptive length complexity and the distinction between ‘Constructive’ and ‘Abstractive’ models of morphology. Berdicevskis & Semenuks (Chapter 11, this volume) identify ‘irregularity’ and ‘overspecification’ as the two ‘facets of complexity’ they want to focus on. Tallman & Epps (Chapter 9, this volume) rely on the taxonomy of Anderson (2015a), with ‘system complexity’ and ‘exponence complexity’ as the top level categories.
13.5 Compositional complexity In the introduction to Miestamo et al. (2008), the volume editors apply the analysis of the notion of complexity in Rescher (1998) to linguistic complexity. For Rescher, description length (in his terms, ‘descriptive complexity’), is just one of several ‘modes of complexity’. Another is ‘compositional complexity’, which relates to the constituent elements of a system and is subdivided into two submodes: ‘constitutional complexity’—the number of elements, and ‘taxonomic complexity’—their variety. Miestamo et al. (2008: viii) exemplify the former with the number of ‘phonemes, inflectional morphemes, derivational morphemes, lexemes’, and the latter with the variety of ‘phoneme types, secondary articulations, parts-of-speech, tense-mood-aspect categories, phrase types’, etc. Although there are no references to Rescher’s taxonomy (but see the editors’ Introduction, Chapter 1), notions close to ‘constitutional complexity’ show up in a number of ways in the chapters of the volume, notably as one of the poles of the dichotomies of Nichols and Ackerman & Malouf. Nichols’s ‘IC’ is based on ‘assessing the number of elements in an inventory or values in a system’, exemplified by ‘the number of phonemes, genders, tenses, derivation types, alignments, word orders’. She identifies it with Miestamo et al.’s (and thus indirectly Rescher’s) notion of ‘taxonomic complexity’. It may be noted that some of the items in her list seem rather to belong to ‘constitutional complexity’ in Rescher’s schema, illustrating that the borderline is somewhat fuzzy. Nichols also quotes the term ‘resources’ from Dahl (2004) in this context, which is slightly problematic. In my book, I opposed ‘resources’ and ‘regulations’, saying that intuitively, ‘resources determine what is possible or permitted, regulations what is obligatory’, and noting that ‘the distinction is reminiscent of that between grammar and lexicon but does not coincide with it’ (Dahl 2004: 41). 
The basic idea was that resources are things that one can more or less freely choose from. The primary examples are lexical items. As the quotation suggests, I did not primarily think of the notion as applying to grammar. Many of the phenomena Nichols
enumerates are not freely chosen by speakers but rather show up as a consequence of forced choices due to what I called regulations. Later in the book (Dahl 2004: 42) I say that if one wants to characterize a language with respect to its ‘resources’, the parameter that comes first to mind is ‘richness’. Dressler (2011), who is quoted by Loporcaro (Chapter 6, this volume), also uses this term, characterizing the size of paradigms as a criterion of ‘richness’ rather than of ‘complexity’. However, Dressler defines ‘richness’ as ‘the amount of productive morphological patterns’, associating complexity with unproductive patterns, so his notion is different from mine (and apparently also from Nichols’s ‘IC’). Let me now turn to Ackerman & Malouf’s notion of E-complexity. It is not quite clear what is supposed to go into it. The abstract says that E-complexity ‘reflects the number of morphosyntactic distinctions that languages make and the strategies employed to encode them, concerning either the internal composition of words or the arrangement of classes of words into inflection classes’ (Ackerman & Malouf 2013: 429). The definition in the main text (2013: 433) is formulated in a somewhat roundabout way. The authors first note that ‘descriptive linguists often comprehensively catalogue the array of morphological markers and patterns in a given language or languages’, making possible on the one hand typological investigations of the types of information encoded in words and taxonomies of formal strategies for encoding this information, on the other, inferences by theoretical linguists about the bounds on possible word structures in natural languages.
‘We refer to patterns found via this general cataloguing of properties and their surface exponence for words in all of their variety as the enumerative complexity or E-complexity of a morphological system.’ What is unclear here is whether E-complexity is basically a count of distinctions and patterns/strategies or something more. Later formulations in the paper do not really solve this problem. On p. 434, we learn that ‘[on]e salient dimension of E-complexity is the number and nature of inflection classes in a language’, with the word ‘nature’ suggesting that it is not only a question of counting. On the other hand, on p. 437, it is said that paradigm-based models ‘reflect a measure of E-complexity’ which is specified as ‘a greater number of possible exponents, inflectional classes, and principal parts’. Likewise, on p. 451, ‘the same E-complexity’ is equated with ‘the same number of declensions, paradigm cells, and allomorphs’, and in a later work (Ackerman & Malouf 2016: 125), E-complexity is said to increase with ‘(i) larger numbers of morphosyntactic properties a language contains, (ii) greater numbers of allomorphic variants it uses to encode them, and (iii) more inflectional classes that lexemes can be distributed over’. The interpretation of enumerative complexity as being simply an inventory count is clearly the one chosen by Henri et al. (Chapter 5, this volume): ‘a linguistic phenomenon’s enumerative complexity depends on how many categories (of whatever type) it employs’ (p. 106). They seem to have the same thing in
mind when saying earlier (p. 106) that ‘[m]orphological complexity is often equated with numerousness—of morphs, categories, processes, or paradigm cells’. They also refer to Stump (2017), who is quite explicit on this point when he describes the distinction introduced by Ackerman & Malouf: ‘a linguistic phenomenon’s enumerative complexity depends on how many categories (of whatever type) it employs . . . ’. Parker & Sims (Chapter 2, this volume) also refer to enumerative complexity as ‘the number of inflection classes or the size of paradigms’.
Ackerman & Malouf (2013), like the chapters in this volume that cite it, tend to give the impression that the two notions of I-complexity and E-complexity more or less exhaust the possible approaches to morphological complexity, and that earlier work has been dominated by E-complexity. Thus, Ackerman & Malouf say in a footnote (2013: 434): ‘For examples of efforts to identify and quantify E-complexity, see, for example, Juola 1998, 2007, Sampson et al. 2010, Moscoso del Prado Martín 2011.’ But the works listed here represent a variety of approaches to linguistic complexity, including MDL-based ones. And it should be clear that E-complexity cannot be identified with description length. A list of morphosyntactic categories, inflection classes, and allomorphs is not yet a morphological description of a language.
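Read as a pure inventory count, enumerative complexity can be tallied directly from a paradigm table. The following Python sketch does this for an invented two-class noun system. The data, the function name `e_complexity_counts`, and the decision to report exactly three counts (loosely following the 2016 formulation quoted above) are illustrative assumptions, not a procedure taken from any of the works cited.

```python
def e_complexity_counts(paradigms):
    """Tally cells, allomorphs, and inflection classes in a toy system.

    `paradigms` maps an inflection-class label to a dict from
    paradigm cell (e.g. "sg") to exponent (e.g. "-a").
    """
    # All paradigm cells attested in any class.
    cells = {cell for forms in paradigms.values() for cell in forms}
    # Distinct exponents found in each cell across classes.
    exponents_per_cell = {
        cell: {forms[cell] for forms in paradigms.values() if cell in forms}
        for cell in cells
    }
    return {
        "cells": len(cells),
        "allomorphs": sum(len(v) for v in exponents_per_cell.values()),
        "classes": len(paradigms),
    }


# Hypothetical data, invented for illustration only.
toy = {
    "class 1": {"sg": "-a", "pl": "-e"},
    "class 2": {"sg": "-o", "pl": "-i"},
}
print(e_complexity_counts(toy))
```

The tally records how many items there are but nothing about the rules that distribute them, which is one way of seeing why such a list falls short of a morphological description.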
13.6 Integrative complexity
Minimum description length approaches to complexity can be said to represent ‘objective’ (Dahl 2004) or ‘absolute’ (Miestamo 2008) understandings of the notion in the sense that they concern properties of objects or systems that are independent of concepts such as ‘difficulty’ or ‘cost’, which imply an ‘agent-related’ (Dahl 2004) or ‘relative’ (Miestamo 2008) notion of complexity. Ultimately, we want to understand how objective measures of linguistic complexity are related to how difficult or costly different aspects of a language are for a learner or a user, but in order to do that, we have to keep objective and agent-related notions apart and not let them be conflated. When Ackerman & Malouf (2013) say that their notion of ‘integrative complexity’ ‘reflects the difficulty that a paradigmatic system poses for language users (rather than lexicographers) in information-theoretic terms’, it invites the interpretation that they are doing exactly that: conflating objective complexity and difficulty. A more charitable understanding, however, is that their goal is to find an objective measure of complexity that predicts the difficulty of a linguistic system, more specifically the uncertainty that faces a speaker when inferring an unknown word form from other forms in the same paradigm. The most important measure then becomes ‘the average uncertainty in guessing the realization of one randomly selected cell in the paradigm of a lexeme given the realization of one other randomly selected
cell’. The central result of the study is that high E-complexity of paradigmatic systems is possible as long as their I-complexity, measured as the average conditional entropy of paradigms, is low. The definition of average conditional entropy presupposes that the set of possible realizations of the cell to be guessed is known and finite; otherwise the entropy cannot be calculated. This condition is not fulfilled when some of the possible realizations are suppletive. That is, the notion of conditional entropy cannot be applied to cases such as English go and went. It may perhaps be argued that those are precisely the situations where you have to know the paradigm in advance so guessing is not possible anyway. But it restricts the applicability of the notion to some extent. The notions of ‘conditional entropy’ and ‘average conditional entropy’, as applied to inflection templates, have some interesting mathematical properties not discussed by Ackerman & Malouf. ‘Average conditional entropy’ involves bidirectional predictability relations between cells in a paradigm template. These turn out to be ‘entangled’ in that there is an upper bound on the sum of two symmetric entropies, which has as a consequence that the average conditional entropy of a paradigm can never exceed 50% of what Ackerman & Malouf call its ‘declension entropy’, that is, the surprisal of the inflection class membership of a lexeme under the assumption that each inflection class is equally probable. I have no formal proof of this claim,² but I have tested it for all possible value combinations for sets of classes with sizes up to eight, where I had to stop due to limitations on computer capacity. Concretely, this means that in a system with eight declension classes and declension entropy equal to 3, like the Greek one exemplified in Ackerman & Malouf (2013), the average conditional entropy could not be higher than 1.5.
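The kind of exhaustive test described here can be sketched in Python for very small systems. This is not the program used for the tests reported above. It is a minimal illustration under stated assumptions: inflection classes are equiprobable, a template is a tuple of columns (paradigm cells) each listing one exponent label per class, and the function names (`entropy`, `cond_entropy`, `avg_cond_entropy`, `max_ratio`) are hypothetical. The search also visits templates in which two classes coincide, which can only lower the entropies involved.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(labels):
    """Shannon entropy in bits of a list of labels, one per equiprobable class."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def cond_entropy(target, given):
    """H(target | given), computed as H(target, given) - H(given)."""
    return entropy(list(zip(target, given))) - entropy(given)


def avg_cond_entropy(columns):
    """Mean of H(X | Y) over all ordered pairs of distinct columns."""
    pairs = [(x, y) for i, x in enumerate(columns)
             for j, y in enumerate(columns) if i != j]
    return sum(cond_entropy(x, y) for x, y in pairs) / len(pairs)


def max_ratio(n_classes, n_columns=2):
    """Largest average conditional entropy, as a share of declension entropy,
    over every way of filling the template with exponent labels."""
    declension_entropy = log2(n_classes)
    column_space = list(product(range(n_classes), repeat=n_classes))
    best = 0.0
    for columns in product(column_space, repeat=n_columns):
        best = max(best, avg_cond_entropy(columns) / declension_entropy)
    return best


# Exhaustive search for two-column templates with 2, 3, and 4 classes.
for n in (2, 3, 4):
    print(n, round(max_ratio(n), 6))

# Declension entropy for eight equiprobable classes, and half of it.
print(log2(8), log2(8) / 2)
```

The search space grows very quickly with the number of classes, so the sketch stops at four, far below the eight classes mentioned above.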
This fact should be taken into account when assessing the actual average conditional entropies calculated by Ackerman & Malouf—as, when they (p. 442) say that the overall average conditional entropy for the eight Greek
² But consider the simplest case: a system with two inflection classes and two inflectional forms, as illustrated in Table 13.1. There are four logical possibilities in such a 2 × 2 matrix: (1) identity between the rows in both columns; (2) identity in column 1 and no identity in column 2; (3) no identity in column 1 but identity in column 2; (4) no identity in either column. Case (1) can be disregarded since it would mean there is really only one inflection class; the entropy is zero. In case (4), one form always gives full information about the other, so the entropy is zero. In case (2), the cells in column 1 do not say anything about the cells in column 2, so the entropy for each cell is equal to the choice between two items, that is 1 (= one bit). But since there is no choice in column 1, the entropy in the opposite direction is 0, which gives an average of 0.5. Case (3) is analogous, but with the columns swapped; the average will again be 0.5. Note further that adding a third column will not change anything, for the following reason. Guessing is always from one column to another, so we are always dealing with pairs of columns, in which guessing can go in either direction. While a 2 × 2 matrix involves just one such pair, a 3 × 2 matrix with columns ABC entails three pairs of columns: AB, AC, BC. But that makes such a matrix equivalent to three 2 × 2 matrices, and since, as we saw, a 2 × 2 matrix has a maximum average guessing entropy of 0.5, the value for the 3 × 2 matrix is the same. Adding further columns gives an analogous result. Things get more complicated when rows are added, but my computer simulation strongly suggests that the relation between declension entropy and maximum average conditional entropy is constant.
declensions is 0.644 bits, which is equal ‘to a choice among . . . 1.56 equally likely declensions’ or ‘slightly more than one’ declension. This is misleading in the sense that no system with two declensions could ever have an average conditional entropy higher than 0.5. Thus, if the entropy is 0.644, the system must have at least three declensions. The low values for average conditional entropy found by Ackerman & Malouf thus at least partly depend on mathematical necessity rather than on anything else.
It appears that integrative complexity, in the form of conditional entropy, primarily depends on two factors: one is the extent to which forms ‘wear their inflection class on their sleeve’, that is, are informative about their own inflectional class; the other is the extent to which the distributions of allomorphs (or, more generally, exponents) differ between forms and thus, in the words of Parker & Sims (Chapter 2, this volume), increase the ‘extent to which the system inhibits motivated inferences about the realized form of a lexeme, given one or more other realized forms of the same lexeme’. The dependence of conditional entropy on these factors means that its relationship to minimum description length complexity is not straightforward. The first factor, the informativity of a form about its inflection class membership, means that there is an inverse relation between the diversity of forms in the predicting cells and integrative complexity. Thus, lack of overt marking, which will in general decrease description length, can actually increase integrative complexity. Consider the hypothetical noun inflection templates in Table 13.1, with the rows representing two inflectional classes. The templates can be generated by the rules beneath the table.

Table 13.1. Hypothetical noun inflection templates

(a)        sg     pl
    1      -∅     -e
    2      -∅     -i

(b)        sg     pl
    1      -a     -e
    2      -o     -i

Rules: (a) if plural then (if 1 then -e else -i) else -∅; (b) if plural then (if 1 then -e else -i) else (if 1 then -a else -o).
Thus, (b) has a greater description length than (a). However, in (b), the singular and plural markers are wholly predictable from each other, so the integrative complexity is 0. In (a), on the other hand, the plural form cannot be determined from the singular, which results in an average integrative complexity of 0.5, the theoretical maximum, for the whole template.
The second factor, the degree to which allomorph distributions differ, means that a high average number of allomorphs, which would presumably lead to a higher description length, does not necessarily lead to a high integrative
complexity. Thus, paradoxically, the situation we saw in (b), where all cells of the paradigm are different from each other, will, irrespective of the size of the paradigm, always mean that the integrative complexity is zero. But this is not so strange if we realize that what integrative complexity really measures is the amount of discordance between the classifications of the lexicon entailed by the different columns in a paradigm.
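The figures given for Table 13.1 and for the Greek declensions can be retraced with a few lines of Python. The sketch below assumes equiprobable inflection classes and uses hypothetical helper names. The rule strings serve only as a crude stand-in for description length, and the last line relies on the 50% bound discussed above, which is reported as tested rather than proven.

```python
from collections import Counter
from math import ceil, log2


def entropy(labels):
    """Shannon entropy in bits of a list of labels, one per equiprobable class."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def cond_entropy(target, given):
    """H(target | given), computed as H(target, given) - H(given)."""
    return entropy(list(zip(target, given))) - entropy(given)


def avg_cond_entropy(template):
    """Mean conditional entropy over all ordered pairs of distinct cells."""
    cells = list(template)
    pairs = [(x, y) for x in cells for y in cells if x != y]
    return sum(cond_entropy(template[x], template[y]) for x, y in pairs) / len(pairs)


# Templates (a) and (b) of Table 13.1: one exponent per class in each cell.
template_a = {"sg": ["-∅", "-∅"], "pl": ["-e", "-i"]}
template_b = {"sg": ["-a", "-o"], "pl": ["-e", "-i"]}
print(avg_cond_entropy(template_a))  # 0.5
print(avg_cond_entropy(template_b))  # 0.0

# Crude description-length proxy: character length of the generating rules.
rule_a = "if plural then (if 1 then -e else -i) else -∅"
rule_b = "if plural then (if 1 then -e else -i) else (if 1 then -a else -o)"
print(len(rule_a), len(rule_b))

# A larger invented paradigm (5 classes, 4 cells) in which every cell
# distinguishes every class: the average conditional entropy is still zero.
all_distinct = {f"cell{j}": [f"x{j}{i}" for i in range(5)] for j in range(4)}
print(avg_cond_entropy(all_distinct))  # 0.0

# 0.644 bits expressed as a number of equally likely choices (about 1.56).
print(2 ** 0.644)
# Smallest class count whose bound 0.5 * log2(n) reaches 0.644 bits,
# assuming the 50% bound holds.
print(ceil(2 ** (2 * 0.644)))
```

Template (b) has the longer rule but the lower entropy, which is the inverse relation between overt marking and integrative complexity described in the text.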
13.7 Canonical complexity and transparency
Nichols’s concept of CC builds on the notion of canonicity developed above all by Greville Corbett and his associates (see, e.g., Corbett 2007, 2013a, 2015). ‘CC’ should not be interpreted as ‘complexity in the canonical sense’, but rather, as Nichols herself admits, as a ‘less logical’ alternative to the more cumbersome ‘noncanonicity-based complexity’, perhaps also paraphraseable as ‘degree of noncanonicity’. According to Nichols, canonicity theory ‘can be used as a good approximation to descriptive complexity [i.e. minimum description length Ö.D.] and is straightforwardly measurable and comparable’ even if it is not a complexity measure in itself. In Nichols’s words, ‘[i]t defines a logical space (for a linguistic concept or structure or system) by determining the central, or ideal, position in that space and kinds of departures from that ideal, and an element is non-canonical to the extent that it departs from the ideal’. According to Corbett (2015: 149), canonicity theory, or in his words, canonical typology, analyses and defines ‘phenomena that are subject to variability (across and within languages), extracting the various scales along which we characterize variability, and establishing the logical endpoint of these scales’, yielding theoretical spaces of possibilities, which once established can be populated with real instances. Canonical instances are those that match a full set of criteria and may therefore be infrequent or even nonexistent. This distinguishes canonicity from prototypicality, with which it is easily confused. As the following quotation (Corbett 2015: 172) makes clear, phenomena are not canonical or non-canonical tout court, but rather they are canonical or noncanonical instances of some concept:
Just as, for instance, we say that suppletion is a noncanonical realization of morphosyntactic specification, but can then specify canonical suppletion . . .
Similarly, inflection classes are themselves noncanonical, but we can go on to establish criteria for canonical inflection classes . . .
It would appear that this creates a problem for the notion of CC, since we would have to choose a concept to relate it to and also be rather cautious in doing so,
since for some concepts, such as suppletion, ‘more canonical’ actually means ‘more complex’. I cannot see that this issue is addressed in an explicit fashion in Nichols’s Chapter 7, but since she says that she is concerned exclusively with ‘morphological complexity and specifically inflectional morphology’, it can be assumed that the canonicity she is speaking of is ‘canonical inflection’ as understood in Corbett’s (2015) paper. There is still a catch here, though. In general, one would assume that a language with minimal inflectional complexity would be one without any inflection at all, or that a minimally complex inflectional class system would be one with no inflectional differences between lexemes. Under a consistent canonical approach, however, it would appear that isolating languages should not be seen as having zero inflectional complexity (and thus being maximally canonical); rather, the notion of inflectional complexity would not be applicable to them. So far as I can see, Nichols’s sample does not contain any purely isolating languages (Mandarin is the one that comes closest), so it is not apparent how she would treat them. But the problem may show up again at another level. Thus, with regard to unpredictability of gender, Nichols puts languages with entirely predictable gender together with languages without gender, which may make sense if one is looking at canonicity of inflection but not if what is at stake is canonicity of gender. Nichols notes one point where there is a discrepancy between Kolmogorov complexity and CC: syncretism, that is, when two or more cells in a paradigm share the same word form. She notes that syncretism does ‘not increase the amount of information required to describe a language’. This may in fact be made stronger: syncretism often makes it possible to shorten a description.
But syncretism will in general lead to violations of what Nichols refers to as ‘the structuralist notion of biuniqueness, or “one form, one function” ’,³ which Nichols sees as central to canonicity; thus syncretism increases CC. Likewise, Corbett (2015: 152) says: ‘In the canonical situation, the inflectional material is different in every cell of the lexeme. The major deviation here is syncretism; we have an expectation of a given number of inflectional forms, while with syncretism two or more of them are identical (two or more morphosyntactic specifications share a single realization).’ Sometimes it seems that the choice of criteria for canonicity relies on a demand for ‘proper behaviour’: if you have a distinction somewhere, you had better have it everywhere. Whether that makes things more complex does not really matter. What Nichols calls ‘biuniqueness’ (like Tallman & Epps (Chapter 9, this volume), who mention ‘deviations from biuniqueness’ as a criterion that relates
³ Cf. also the following statement by Mansfield & Nordlinger (Chapter 3, this volume): ‘Inflectional allomorphy is a prototypical form of morphological complexity, introducing unpredictability into the mapping of form to meaning’.
to measures of morphological complexity) is sometimes referred to as ‘transparency’, notably in the work of Kees Hengeveld and his associates. Hengeveld & Leufkens (2018) define ‘transparency’ as ‘a one-to-one relation between units of meaning and units of form’. As for the relationship between this concept and complexity, they say that ‘The difference is immediately evident from the fact that languages may be complex yet transparent or simple yet opaque’; however, they do not clarify what notion of complexity they have in mind except by giving Turkish as an example, where the verbal morphology ‘is highly complex in the sense that a single verbal word may contain a high number of different morphemes, but also highly transparent in that every morpheme corresponds to one fixed meaning’. This suggests that they are speaking of structural complexity rather than system complexity. There is another problem in identifying deviations from the one-to-one relation between meaning and form with complexity, not addressed by the authors mentioned above, that is crucial when it comes to crosslinguistic comparisons. It concerns the identifiability of units of meaning and is particularly crucial in inflectional morphology. The grammar of a language may force speakers to express information that is not essential to their intended message. Thus, in a language with gendered pronouns, it may not be possible to refer to a person without revealing their gender. The consequence is that it is sometimes impossible to translate a sentence from one language into another which conveys exactly the same information, which makes it difficult to compare the languages with respect to biuniqueness/transparency (see Dahl 2004: 80–6 for further discussion). The notion of ‘overspecification’ is also relevant here. 
Following McWhorter (2007: 21–8), Berdicevskis & Semenuks (Chapter 11, this volume) regard overspecification as one of the most crucial facets of complexity, defining it as ‘overt and obligatory marking of a semantic distinction that is not necessary for communication’. Noting that ‘it is not at all obvious what is necessary for communication’, they mention McWhorter’s proposal to use crosslinguistic comparison to determine what is necessary: if a distinction is not universally present in languages, it can be assumed not to be necessary for communication. However, as is noted in Dahl (2004: 80), it is not possible to claim that a distinction is necessary or unnecessary as such, since that has to depend on what information the speaker wants to convey—the point is rather that a grammar may force speakers to express some information whether they like it or not.
13.8 Overabundance
Meakins & Wilmoth (Chapter 4, this volume) focus on the phenomenon of ‘overabundance’, by which they mean ‘the exponence of multiple forms in the same cell in a paradigm’, arguing that it represents an increase in integrative
complexity, ‘in that it requires speakers to make calculated choices about forms based on features beyond the paradigm’. The particular problem studied is optional subject marking in the mixed language Gurindji Kriol, more specifically ‘the alternation in the nominative cell of the Gurindji Kriol case paradigm between zero and -ngku’. They identify three factors which govern the variation: (i) transitivity; (ii) priming by a preceding subject in the discourse; (iii) presence of a co-referential (cross-referential) pronoun. This obviously expands the domain within which morphological complexity is considered. I think it may be questioned whether this variation is to be treated within morphology at all; it looks similar to other cases of differential argument marking and would naturally be seen as a syntactic phenomenon. On the other hand, as I said above, seeing complexity only from a module-internal perspective can be seen as artificial and may prevent us from making relevant generalizations. In this case, we seem to be dealing with phenomena that were discussed in Dahl (2004: 128–34) under the rubrics ‘pattern competition’ and ‘pattern regulation’. I was mainly interested in what happens during grammaticalization in a single language, but it seems that what I said can be generalized to contact situations. My main point was that competition between two patterns, whether lexical or grammatical, may lead to an increase in complexity. As long as the patterns are in free variation, the increase is minimal (and does not lead to any significant difficulty for learners and users), but there appears to be a universal tendency towards regulation of the variation, which at the initial stages shows itself merely in the form of tendencies.
13.9 Conclusion
The chapters of the volume that I have looked at here are those in which there is explicit discussion of the basic notions relating to complexity employed in the chapters. Time and space considerations do not allow me to comment on the others, in spite of many of them being on topics that are of direct interest to me. One reflection is that the study of morphological complexity still has quite some way to go before there is a set of shared notions and standard works that everyone refers to. Which approaches will prevail in the long run is obviously an open question. It is notable that both the notion of minimum description length and Ackerman & Malouf’s notion of integrative complexity are ultimately based on information theory. It is not excluded that we will see other applications of this theory in the future.
References
Abel, Jennifer (2006). ‘That crazy idea of hers: The English double genitive as a focus construction’, Canadian Journal of Linguistics 51(1): 1–14. doi:10.1017/S0008413100003790 Aboh, Enoch O. (2009). ‘Competition and selection: That’s all!’, in Enoch O. Aboh and Norval Smith (eds), Complex Processes in New Languages. Amsterdam: John Benjamins, 317–44. doi:10.1075/cll.35.20abo Aboh, Enoch O. (2015). The Emergence of Hybrid Grammars. Cambridge: Cambridge University Press. doi:10.1017/CBO9781139024167 Aboh, Enoch O. and Umberto Ansaldo (2007). ‘The role of typology in language creation’, in Umberto Ansaldo, Stephen Matthews, and Lisa Lim (eds), Deconstructing Creole. Amsterdam: John Benjamins, 39–66. doi:10.1075/tsl.73.05abo Abouda, Lotfi and Marie Skrovec (2015). ‘Du rapport entre formes synthétique et analytique du futur. Étude de la variable modale dans un corpus oral micro-diachronique’, Revue de Sémantique et Pragmatique 38: 35–57. Abouda, Lotfi and Marie Skrovec (2017). ‘Du rapport micro-diachronique futur simple/futur périphrastique en français moderne. Étude des variables temporelles et aspectuelles’, Corela, HS-21. URL: http://corela.revues.org/4804 Ackerman, Farrell, James Blevins, and Robert Malouf (2009). ‘Parts and wholes: Implicative patterns in inflectional paradigms’, in James P. Blevins and Juliette Blevins (eds), Analogy in Grammar: Form and Acquisition. Oxford: Oxford University Press, 54–82. Ackerman, Farrell and Robert Malouf (2013). ‘Morphological organization: The Low Conditional Entropy Conjecture’, Language 89(3): 429–64. doi:10.1353/lan.2013.0054. Ackerman, Farrell and Robert Malouf (2015). ‘The No Blur Principle effects as an emergent property of language systems’, Proceedings of the 41st Annual Meeting of the Berkeley Linguistics Society. Berkeley, CA, 1–14. doi:10.20354/B4414110014 Ackerman, Farrell and Robert Malouf (2016). ‘Word and pattern morphology: An information-theoretic approach’, Word Structure 9: 125–31.
doi:10.3366/word.2016.0090 Agbetsoamedo, Yvonne (2014). ‘Noun classes in Sɛlɛɛ’, The Journal of West African Languages 41: 95–124. Aglarov, M. A. (1988). Sel’skaja obsčina v Nagornom Dagestane v XVII-načale XIX v. Moscow: Nauka. Aikhenvald, Alexandra Y. (2000). Classifiers: A Typology of Noun Categorization Devices. Oxford: Oxford University Press. Aikhenvald, Alexandra Y. (2002). Language Contact in Amazonia. Oxford: Oxford University Press. Aikhenvald, Alexandra Y. (2003a). ‘Mechanisms of change in areal diffusion: New morphology and language contact’, Journal of Linguistics 39(1): 1–29. doi:10.1017/S0022226702001937 Aikhenvald, Alexandra Y. (2003b). A Grammar of Tariana. Cambridge: Cambridge University Press. Aikhenvald, Alexandra Y. (2004). Evidentiality. Oxford: Oxford University Press. Aikhenvald, Alexandra Y. and Robert M. W. Dixon (1998). ‘Evidentials and areal typology: A case study from Amazonia’, Language Sciences 20: 241–57.
Aikhenvald, Alexandra Y. and R. M. W. Dixon (eds) (2006). Grammars in Contact: A Cross-Linguistic Typology. Oxford: Oxford University Press. Aikhenvald, Alexandra Y. and Diana Green (1998). ‘Palikur and the typology of classifiers’, Anthropological Linguistics 40: 429–80. Åkerberg, Bengt (2012). Älvdalsk grammatik. Älvdalen: Ulum Dalska. Albright, Adam and Bruce Hayes (2002). ‘Modeling English past tense intuitions with minimal generalization’, in M. Maxwell (ed.), Proceedings of the 6th Meeting of the ACL Special Interest Group in Computational Phonology July 2002. New Brunswick, NJ: Association for Computational Linguistics, 58–69. Albright, Adam and Bruce Hayes (2003). ‘Rules vs. analogy in English past tenses: A computational/experimental study’, Cognition 90(2): 119–61. Alegre, Maria and Peter Gordon (1999a). ‘Frequency effects and the representational status of regular inflections’, Journal of Memory and Language 40(1): 41–61. Alegre, Maria and Peter Gordon (1999b). ‘Rule-based versus associative processes in derivational morphology’, Brain and Language 68(1–2): 347–54. Allen, Shanley E. M. (2017). ‘Polysynthesis in the acquisition of the Inuit languages’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 449–72. Alleyne, Mervin (1996). Syntaxe historique créole. Paris: Editions Karthala. Ambrazas, Vytautas, Emma Geniušienė, Aleksas Girdenis, Nijolė Sližienė, Dalija Tekorienė, Adelė Valeckienė, and Elena Valiulytė (2006). Lithuanian Grammar. 2nd ed. Vilnius: Baltos Lankos. Ambridge, Ben and Elena V. M. Lieven (2011). Child Language Acquisition. Cambridge: Cambridge University Press. Anderson, Stephen R. (1992). A-Morphous Morphology. Cambridge: Cambridge University Press. Anderson, Stephen R. (2015a). ‘Dimensions of morphological complexity’, in Matthew Baerman, Dunstan Brown, and Greville G. Corbett (eds), Understanding and Measuring Morphological Complexity.
Oxford: Oxford University Press, 11–26. doi:10.1093/acprof:oso/9780198723769.003.0002 Anderson, Stephen R. (2015b). ‘The morpheme: Its nature and use’, in Matthew Baerman (ed.), The Oxford Handbook of Inflection. Oxford: Oxford University Press, 11–34. Arika, Ann Lindvall (2012). ‘Glimpses of the linguistic situation in Solomon Islands’. Paper given at the 6th international conference on ‘Languages, E-Learning and Romanian Studies’. Arka, Wayan (2011). A Rongga-English Dictionary with English-Rongga Wordlist. Jakarta: Penerbit Universitas Atma Jaya. Arkadiev, Peter (2020). ‘Morphology in typology: Historical retrospect, state of the art, and prospects’, in Mark Aronoff (ed.), Oxford Research Encyclopedia of Linguistics. New York: Oxford University Press. doi:10.1093/acrefore/9780199384655.013.626 Arkadiev, Peter, Axel Holvoet, and Björn Wiemer (2015). ‘Introduction: Baltic linguistics—State of the art’, in Peter Arkadiev, Axel Holvoet, and Björn Wiemer (eds), Contemporary Approaches to Baltic Linguistics. Berlin: De Gruyter Mouton, 1–109. Arkadiev, Peter and Marian Klamer (2019). ‘Morphological theory and typology’, in Francesca Masini and Jenny Audring (eds), The Oxford Handbook of Morphological Theory. Oxford: Oxford University Press, 435–54. Armand, Alain (2014). Dictionnaire kréol rénioné français. Saint-André (Réunion): Epica.
Arnott, David Whitehorn (1970). The Nominal and Verbal Systems of Fula. Oxford: Clarendon. Aronoff, Mark (1994). Morphology by Itself: Stems and Inflectional Classes. Cambridge, MA: The MIT Press. Aronoff, Mark (1998). ‘Isomorphism and monotonicity: Or the disease model of morphology’, in Steven Lapointe, Diane Brentari, and Patrick Farrell (eds), Morphology and Its Relation to Phonology and Syntax. Stanford, CA: CSLI Publications, 411–18. Aronoff, Mark (2015). ‘Thoughts on morphology and cultural evolution’, in Laurie Bauer, Lívia Körtvélyessy, and Pavol Štekauer (eds), Semantics of Complex Words. Cham: Springer, 277–88. doi:10.1007/978-3-319-14102-2_13 Aski, Janice M. (1995). ‘Verbal suppletion: An analysis of Italian, French and Spanish to go’, Linguistics 33(3): 403–32. doi:10.1515/ling.1995.33.3.403 Atkinson, Mark, Kenny Smith, and Simon Kirby (2018). ‘Adult learning and language simplification’, Cognitive Science 42(8): 2818–54. doi:10.1111/cogs.12686 Audring, Jenny (2014). ‘Gender as a complex feature’, Language Sciences 43: 5–17. doi:10.1016/j.langsci.2013.10.003 Audring, Jenny (2017). ‘Calibrating complexity: How complex is a gender system?’, Language Sciences 60: 53–68. doi:10.1016/j.langsci.2016.09.003 Audring, Jenny (2019). ‘Canonical, complex, complicated?’, in Francesca Di Garbo, Bruno Olsson, and Bernhard Wälchli (eds), Grammatical Gender and Linguistic Complexity, vol. I: General Issues and Specific Studies. Berlin: Language Science Press, 15–52. URL: http://langsci-press.org/catalog/book/223 Azen, Razia and Nicole Traxel (2009). ‘Using dominance analysis to determine predictor importance in logistic regression’, Journal of Educational and Behavioral Sciences 34(3): 319–47. doi:10.3102/1076998609332754 Baayen, R. Harald (2001). Word Frequency Distributions. Dordrecht: Kluwer Academic Publishers. Baayen, R. Harald (2007). ‘Storage and computation in the mental lexicon’, in Gonia Jarema and Gary Libben (eds), The Mental Lexicon: Core Perspectives. 
Amsterdam: Elsevier, 81–104. Baayen, R. Harald (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press. Baayen, R. Harald, Rochelle Lieber, and Robert Schreuder (1997). ‘The morphological complexity of simplex nouns’, Linguistics 35: 861–77. doi:10.1515/ling.1997.35.5.861 Baayen, R. Harald, Petar Milin, Dusica Filipović Đurđević, Peter Hendrix, and Marco Marelli (2011). ‘An amorphous model for morphological processing in visual comprehension based on naive discriminative learning’, Psychological Review 118(3): 438–81. doi:10.1037/a0023851 Baayen, R. Harald, Lee H. Wurm, and Joanna Aycock (2007). ‘Lexical dynamics for low-frequency complex words: A regression study across tasks and modalities’, The Mental Lexicon 2(3): 419–63. doi:10.1075/ml.2.3.06baa Babou, Cheikh Anta and Michele Loporcaro (2016). ‘Noun classes and grammatical gender in Wolof’, Journal of African Languages and Linguistics 37(1): 1–57. doi:10.1515/jall-2016-0001 Baechler, Raffaela (2017). Absolute Komplexität in der Nominalflexion. Berlin: Language Science Press. URL: http://langsci-press.org/catalog/book/134 Baechler, Raffaela and Guido Seiler (eds) (2016). Complexity, Isolation, and Variation. Berlin: De Gruyter.
Baerman, Matthew (2012). ‘Paradigmatic chaos in Nuer’, Language 88(3): 467–94. doi:10.1353/lan.2012.0065 Baerman, Matthew (2016). ‘Seri verb classes: Morphosyntactic motivation and morphological autonomy’, Language 92(4): 792–823. doi:10.1353/lan.2016.0073 Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (2005). The Syntax-Morphology Interface: A Study of Syncretism. Cambridge: Cambridge University Press. Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (2010). ‘Morphological complexity: A typological perspective’. Ms, Surrey Morphology Group, University of Surrey. URL: http://epubs.surrey.ac.uk/814702/ Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (eds) (2015a). Understanding and Measuring Morphological Complexity. Oxford: Oxford University Press. Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (2015b). ‘Understanding and measuring morphological complexity: An introduction’, in Matthew Baerman, Dunstan Brown, and Greville G. Corbett (eds), Understanding and Measuring Morphological Complexity. Oxford: Oxford University Press, 3–10. Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (2017). Morphological Complexity. Cambridge: Cambridge University Press. Baerman, Matthew, Greville G. Corbett, and Dunstan Brown (eds) (2010). Defective Paradigms: Missing Forms and What They Tell Us. Oxford: Oxford University Press and British Academy. Baerman, Matthew, Greville G. Corbett, Dunstan Brown, and Andrew Hippisley (eds) (2007). Deponency and Morphological Mismatches. Oxford: Oxford University Press and British Academy. Baissac, Charles (1880). Etudes du patois mauricien. Nancy: Imprimerie Berger-Levrault. Baker, Philip (1972). Kreol: A Description of Mauritian Creole. Ann Arbor: Karoma. Baker, Philip and Chris Corne (1982). Isle de France Creole: Affinities and Origins. Ann Arbor, MI: Karoma. Bakker, Peter (1997). A Language of Our Own: The Genesis of Michif, the Mixed Cree-French Language of the Canadian Métis. 
Oxford: Oxford University Press. Bakker, Peter (2003). ‘Mixed languages as autonomous systems’, in Yaron Matras and Peter Bakker (eds), The Mixed Language Debate: Theoretical and Empirical Advances. Berlin: Mouton de Gruyter, 107–50. Bakker, Peter (2013). ‘Michif’, in Susanne Maria Michaelis, Philippe Maurer, Martin Haspelmath, and Magnus Huber (eds), The Atlas and Survey of Pidgin and Creole Languages, vol. 3: Contact Languages Based on Languages from Africa, Australia, and the Americas. Oxford: Oxford University Press, 158–65. Bakker, Peter (2014). ‘Creolistics: Back to square one?’, Journal of Pidgin and Creole Languages 29: 177–94. doi:10.1075/jpcl.29.1.08bak Bakker, Peter, Aymeric Daval-Markussen, Mikael Parkvall, and Ingo Plag (2011). ‘Creoles are typologically distinct from non-creoles’, Journal of Pidgin and Creole Languages 26(1): 5–42. doi:10.1075/jpcl.26.1.02bak Balode, Laimute and Axel Holvoet (2001). ‘The Latvian language and its dialects’, in Östen Dahl and Maria Koptjevskaja-Tamm (eds), The Circum-Baltic Languages: Typology and Contact, vol. 1: Past and Present. Amsterdam: John Benjamins, 3–40. Bao Diop, Sokhna (2015). ‘Les classes nominales en nyun gunyamolo’, in Denis Creissels and Konstantin Pozdniakov (eds), Les classes nominales dans les langues atlantiques. Köln: Köppe, 371–405. Baptista, Marlyse (2003a). ‘Inflectional plural marking in creoles and pidgins: A comparative study’, in Ingo Plag (ed.), The Phonology and Morphology of Creole Languages. Tübingen: Niemeyer, 315–32.
Baptista, Marlyse (2003b). ‘Number inflection in creole languages’, Interface 6: 3–26. Becher, Jutta (2001). Untersuchungen zum Sprachwandel im Wolof aus diachroner und synchroner Perspektive. University of Hamburg PhD dissertation. Beier, Christine, Lev Michael, and Joel Sherzer (2002). ‘Discourse forms and processes in indigenous lowland South America: An areal-typological perspective’, Annual Review of Anthropology 31: 121–45. doi:10.1146/annurev.anthro.31.032902.105935 Bendor-Samuel, John Theodore (ed.) (1989). The Niger-Congo Languages: A Classification and Description of Africa’s Largest Language Family. Lanham, MD: University Press of America, by arrangement with the Summer Institute of Linguistics (SIL). Bentley, W. Holman (1887). Dictionary and Grammar of the Kikongo Language. London: Trübner & Co. Bentz, Christian (2016). ‘The low-complexity-belt: Evidence for large-scale language contact in human pre-history?’, in Sean G. Roberts, Christine Cuskley, Luke McCrohon, Lluís Barceló-Coblijn, Olga Feher, and Tessa Verhoef (eds), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). doi:10.17617/2.2248195 Bentz, Christian, Dimitrios Alikaniotis, Michael Cysouw, and Ramon Ferrer-i-Cancho (2017). ‘The entropy of words—Learnability and expressivity across more than 1000 languages’, Entropy 19: 275. doi:10.3390/e19060275 Bentz, Christian and Aleksandrs Berdicevskis (2016). ‘Learning pressures reduce morphological complexity: Linking corpus, computational and experimental evidence’, in Dominique Brunato, Felice Dell’Orletta, Giulia Venturi, Thomas François, and Philippe Blache (eds), Proceedings of the Workshop ‘Computational Linguistics for Linguistic Complexity (CL4LC)’. Osaka, Japan, 222–32. Bentz, Christian and Morten H. Christiansen (2013). 
‘Linguistic adaptation: The trade-off between case marking and fixed word orders in Germanic and Romance languages’, in Feng Shi and Gang Peng (eds), Eastward Flows the Great River: Festschrift in Honor of Professor William S-Y. Wang on his 80th Birthday. Hong Kong: City University of Hong Kong Press, 45–61. Bentz, Christian, Annemarie Verkerk, Douwe Kiela, Felix Hill, and Paula Buttery (2015). ‘Adaptive communication: Languages with more non-native speakers tend to have fewer word forms’, PLoS ONE 10(6): e0128254. doi:10.1371/journal.pone.0128254 Bentz, Christian and Bodo Winter (2013). ‘Languages with more second language learners tend to lose nominal case’, Language Dynamics and Change 3: 1–27. doi:10.1163/22105832-13030105 Berdicevskis, Aleksandrs, Çağrı Çöltekin, Katharina Ehret, Kilu von Prince, Daniel Ross, Bill Thompson, Chunxiao Yan, Vera Demberg, Gary Lupyan, Taraka Rama, and Christian Bentz (2018). ‘Using universal dependencies in cross-linguistic complexity research’, in Marie-Catherine de Marneffe, Teresa Lynn, and Sebastian Schuster (eds), Proceedings of the Second Workshop on Universal Dependencies (UDW 2018). Brussels: Association for Computational Linguistics, 8–17. Berdicevskis, Aleksandrs and Arturs Semenuks (submitted). ‘Imperfect language learning reduces morphological overspecification: Experimental evidence’. Bernini-Montbrand, Danièle, Ralph Ludwig, Hector Poullet, and Sylviane Telchid (2013). Dictionnaire créole-français Guadeloupe, avec un abrégé de grammaire créole, un lexique français-créole, les comparaisons courantes, les locutions et plus de 1000 proverbes. Paris: Orphie. Berry, Keith and Christine Berry (1999). A Description of Abun. Canberra: Pacific Linguistics. Bertrand-Bocandé, Emmanuel (1849). ‘Notes sur la Guinée portugaise ou Sénégambie méridionale’ [pt. 2], Bulletin de la Société de Géographie 12: 57–93.
Bickel, Balthasar, Goma Banjade, Martin Gaenszle, Elena Lieven, Netra Prasad Paudyal, Ichchha Purna Rai, Manoj Rai, Novel Kishore Rai, and Sabine Stoll (2007). ‘Free prefix ordering in Chintang’, Language 83(1): 43–73. doi:10.1353/lan.2007.0002 Bickel, Balthasar and Johanna Nichols (2002). ‘Autotypologizing databases and their use in fieldwork’, in Peter Austin, Helen Dry, and Peter Wittenburg (eds), International LREC Workshop on Resources and Tools in Field Linguistics, Las Palmas, 26–7 May 2002. Nijmegen: Max Planck Institute for Psycholinguistics. Bickel, Balthasar and Johanna Nichols (2005). ‘Inflectional synthesis of the verb’, in Martin Haspelmath, Matthew Dryer, David Gil, and Bernard Comrie (eds), The World Atlas of Language Structures. Oxford: Oxford University Press, 94–7. Bickel, Balthasar and Johanna Nichols (2007). ‘Inflectional morphology’, in Timothy Shopen (ed.), Language Typology and Syntactic Description, vol. 3: Grammatical Categories and the Lexicon. Cambridge: Cambridge University Press, 169–240. Bickel, Balthasar and Johanna Nichols (2013). ‘Inflectional synthesis of the verb’, in Matthew Dryer and Martin Haspelmath (eds), World Atlas of Language Structures Online. URL: http://wals.info/chapter/22 Bickel, Balthasar, Johanna Nichols, Taras Zakharko, Alena Witzlack-Makarevich, Kristine Hildebrandt, Michael Rießler, Lennart Bierkandt, Fernando Zúñiga, and John B. Lowe (2017). The Autotyp typological databases. Version 0.1.0. URL: https://github.com/autotyp/autotyp-data/tree/0.1.0 Bickel, Balthasar and Fernando Zúñiga (2017). ‘The “word” in polysynthetic languages: Phonological and syntactic challenges’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 158–85. Bickerton, Derek (1981). Roots of Language. Ann Arbor, MI: Karoma. Bickerton, Derek (1984). ‘The language bioprogram hypothesis’, Behavioral and Brain Sciences 7(2): 173–88. 
doi:10.1017/S0140525X00044149 Bickerton, Derek (1988). ‘Creole languages and the bioprogram’, in Frederick Newmeyer (ed.), Linguistics: The Cambridge Survey, vol. 2: Linguistic Theory. Extensions and Implications. Cambridge: Cambridge University Press, 268–84. Birchall, Joshua (2014). Argument Marking Patterns in South American Languages. Universiteit Nijmegen PhD dissertation. Blasi, Damián E., Susanne Maria Michaelis, and Martin Haspelmath (2017). ‘Grammars are robustly transmitted even during the emergence of creole languages’, Nature Human Behaviour 1: 723–9. doi:10.1038/s41562-017-0192-4 Blench, Roger (2009). ‘Do the Ghana-Togo mountain languages constitute a genetic group?’, The Journal of West African Languages 36(1–2): 19–36. Blevins, James P. (2006). ‘Word-based morphology’, Journal of Linguistics 42(3): 531–73. doi:10.1017/S0022226706004191 Blevins, James P. (2013). ‘Word-based morphology from Aristotle to modern WP (Word and Paradigm models)’, in Keith Allan (ed.), The Oxford Handbook of the History of Linguistics. Oxford: Oxford University Press, 375–95. Blevins, James P. (2016a). ‘The minimal sign’, in Gregory Stump and Andrew Hippisley (eds), The Cambridge Handbook of Morphology. Cambridge: Cambridge University Press, 50–69. Blevins, James P. (2016b). Word and Paradigm Morphology. Oxford: Oxford University Press.
Blevins, James P., Petar Milin, and Michael Ramscar (2017). ‘The Zipfian paradigm cell filling problem’, in Ferenc Kiefer, James P. Blevins, and Huba Bartos (eds), Perspectives on Morphological Structure: Data and Analyses. Leiden: Brill, 141–58. Bloomfield, Leonard (1914). ‘Sentence and word’, Transactions and Proceedings of the American Philological Association 45: 65–75. Bloomfield, Leonard (1933). Language. New York: Holt. Blythe, Joe (2009). Doing Referring in Murriny Patha Conversation. University of Sydney PhD dissertation. Blythe, Joe, Rachel Nordlinger, and Nicholas Reid (2007). ‘Murriny Patha finite verb paradigms’. Unpublished ms. Boilat, David (1858). Grammaire de la langue woloffe. Paris: Imprimerie Impériale. URL: http://babel.hathitrust.org/cgi/pt?id=wu.89012299343;view=1up;seq=11 Bokamba, Eyamba (1977). ‘The impact of multilingualism on language structures: The case of Central Africa’, Anthropological Linguistics 19: 181–202. Bolaños, Katherine (2016). A Grammar of Kakua. Utrecht: LOT. Bonami, Olivier (2013). ‘Towards a robust assessment of implicative relations in inflectional systems’. Paper given at the ‘Workshop on Computational Approaches to Morphological Complexity’, Paris. Bonami, Olivier (2015). ‘Periphrasis as collocation’, Morphology 25: 63–110. doi:10.1007/s11525-015-9254-3 Bonami, Olivier and Sarah Beniamine (2015). ‘Implicative structure and joint predictiveness’, in Vito Pirrelli, Claudia Marzi, and Marcello Ferro (eds), Word Structure and Word Usage: Proceedings of the NetWordS Final Conference, Pisa, Italy, March 30–April 1, 2015. Pisa: Institute for Computational Linguistics, National Research Council, 4–9. Bonami, Olivier and Sarah Beniamine (2016). ‘Joint predictiveness in inflectional paradigms’, Word Structure 9(2): 156–82. doi:10.3366/word.2016.0092 Bonami, Olivier and Gilles Boyé (2002). 
‘Suppletion and dependency in inflectional morphology’, in Frank van Eynde, Lars Hellan, and Dorothee Beermann (eds), Proceedings of the 8th International Conference on Head-Driven Phrase Structure Grammar. Stanford: CSLI, 51–70. Bonami, Olivier and Gilles Boyé (2003). ‘Supplétion et classes flexionnelles dans la conjugaison du français’, Langages 15: 102–26. Bonami, Olivier and Gilles Boyé (2007). ‘French pronominal clitics and the design of Paradigm Function Morphology’, in Geert E. Booij, Luca Ducceschi, Bernard Fradin, Emiliano Guevara, Angela Ralli, and Sergio Scalise (eds), On-line Proceedings of the Fifth Mediterranean Morphology Meeting (MMM5) Fréjus, 15–18 September 2005. Bologna: University of Bologna, 291–322. Bonami, Olivier, Gilles Boyé, and Fabiola Henri (2011). ‘Measuring inflectional complexity: French and Mauritian’. Paper given at the ‘Workshop on Quantitative Measures in Morphology and Morphological Development’, San Diego. Bonami, Olivier, Gilles Boyé, and Françoise Kerleroux (2009). ‘L’allomorphie radicale et la relation flexion-construction’, in Bernard Fradin, Françoise Kerleroux, and Marc Plénat (eds), Aperçus de morphologie du français. Saint-Denis: Presses Universitaires de Vincennes, 103–25. Bonami, Olivier and Fabiola Henri (2010). ‘Assessing empirically the complexity of Mauritian Creole’. Paper given at the conference ‘Formal Approaches to Creole Studies 2’, Berlin.
Bonami, Olivier, Fabiola Henri, and Ana R. Luís (2013). ‘Comparing sources of inflectional morphology in Romance-based creoles’. Paper given at the workshop ‘Portuguese-based Creoles in Perspective’, Coimbra. Bonami, Olivier, Fabiola Henri, and Ana R. Luís (2015). ‘Making sense of morphological complexity’. Paper given at the ‘SeePiCLa Meeting’, Lisbon. Bond, Oliver, Greville G. Corbett, Marina Chumakina, and Dunstan Brown (eds) (2016). Archi: Complexities of Agreement in Cross-theoretical Perspective. Oxford: Oxford University Press. Booij, Geert E. (1993). ‘Against split morphology’, in Geert E. Booij and Jaap van Marle (eds), Yearbook of Morphology 1993. Dordrecht: Kluwer, 27–49. doi:10.1007/978-94-017-3712-8_2 Booij, Geert E. (1997). ‘Allomorphy and the autonomy of morphology’, Folia Linguistica 31: 25–56. doi:10.1515/flin.1997.31.1-2.25 Booij, Geert E. (2010). Construction Morphology. Oxford: Oxford University Press. Boyé, Gilles and Patricia Cabredo Hofherr (2006). ‘The structure of allomorphy in Spanish verbal inflection’, Cuadernos de Lingüística del Instituto Universitario Ortega y Gasset 13: 9–24. Bozic, Mirjana and William Marslen-Wilson (2010). ‘Neurocognitive contexts for morphological complexity: Dissociating inflection and derivation’, Language and Linguistics Compass 4(11): 1063–73. doi:10.1111/j.1749-818X.2010.00254.x Brandão, Ana Paula B. (2014). A Reference Grammar of Paresi-Haliti (Arawak). University of Texas at Austin PhD dissertation. Bresnan, Joan (2007). ‘Is syntactic knowledge probabilistic? Experiments with the English dative alternation’, in Sam Featherston and Wolfgang Sternefeld (eds), Roots: Linguistics in Search of Its Evidential Base. Berlin: Mouton de Gruyter, 77–96. Bresnan, Joan and Marilyn Ford (2013). ‘Predicting syntax: Processing dative constructions in American and Australian varieties of English’, Language 86(1): 186–213. doi:10.1353/lan.0.0189 Brown, Dunstan, Greville G. Corbett, Norman M. 
Fraser, Andrew Hippisley, and Alan Timberlake (1996). ‘Russian noun stress and Network Morphology’, Linguistics 34(1): 53–107. doi:10.1515/ling.1996.34.1.53 Brown, Dunstan and Andrew Hippisley (2012). Network Morphology: A Defaults-Based Theory of Word Structure. Cambridge: Cambridge University Press. Burzio, Luigi (2004). ‘Paradigmatic and syntagmatic relations in Italian verbal inflection’, in Julie Auger, J. Clancy Clements, and Barbara Vance (eds), Contemporary Approaches to Romance Linguistics. Amsterdam: John Benjamins, 17–44. Bybee, Joan L. (1985). Morphology: A Study of the Relation between Meaning and Form. Amsterdam: John Benjamins. Bybee, Joan L. (1995). ‘Regular morphology and the lexicon’, Language and Cognitive Processes 10(5): 425–55. doi:10.1080/01690969508407111 Bybee, Joan L. (2007). Frequency of Use and the Organization of Language. Oxford: Oxford University Press. Bybee, Joan L. and Clay Beckner (2015). ‘Language use, cognitive processes, and linguistic change’, in Claire Bowern and Bethwyn Evans (eds), The Routledge Handbook of Historical Linguistics. London: Routledge, 503–18. Bybee, Joan L. and Carol Lynn Moder (1983). ‘Morphological classes as natural categories’, Language 59: 251–70. doi:10.2307/413574 Bybee, Joan and Dan I. Slobin (1982). ‘Rules and schemas in the development and use of the English past tense’, Language 58(2): 265–89. doi:10.2307/414099
Cadely, Jean-Robert (1994). Aspects de la phonologie du créole haïtien. Université du Québec à Montréal PhD dissertation. Camara, Sana (2006). Wolof Lexicon and Grammar. Madison, WI: NALRC Press. Cameron-Faulkner, Thea and Andrew Carstairs-McCarthy (2000). ‘Stem alternants as morphological signata: Evidence from blur avoidance in Polish nouns’, Natural Language and Linguistic Theory 18(4): 813–35. doi:10.1023/A:1006496821412 Campbell, Lyle (2012). ‘Typological characteristics of South American indigenous languages’, in Lyle Campbell and Verónica Grondona (eds), The Indigenous Languages of South America: A Comprehensive Guide. Berlin: Mouton de Gruyter, 259–330. Carlin, Eithne (2006). ‘Feeling the need: The borrowing of Cariban functional categories into Mawayana (Arawak)’, in Alexandra Y. Aikhenvald and Robert M. W. Dixon (eds), Grammars in Contact: A Cross-Linguistic Perspective. Oxford: Oxford University Press, 313–32. Carstairs, Andrew (1983). ‘Paradigm economy’, Journal of Linguistics 19(1): 115–28. doi:10.1017/S0022226700007477 Carstairs, Andrew (1987). Allomorphy in Inflexion. London: Croom Helm. Carstairs-McCarthy, Andrew (1994). ‘Inflection classes, gender, and the Principle of Contrast’, Language 70(4): 737–88. Carstairs-McCarthy, Andrew (1998). ‘How lexical semantics constrains inflectional allomorphy’, in Geert E. Booij and Jaap van Marle (eds), Yearbook of Morphology 1997. Dordrecht: Springer, 1–24. doi:10.1007/978-94-011-4998-3_1 Carstairs-McCarthy, Andrew (2010). The Evolution of Morphology. Oxford: Oxford University Press. Chao, Yuen Ren (1968). A Grammar of Spoken Chinese. Berkeley, CA: University of California Press. Chaudenson, Robert (2003). La créolisation. Théorie, applications, implications. Paris: L’Harmattan. Childs, G. Tucker (1983). ‘Noun class affix renewal in Southern West Atlantic’, in Jonathan D. Kaye, Hilda Koopman, Dominique Sportiche, and André Dugas (eds), Current Approaches to African Linguistics II. 
Dordrecht: Mouton de Gruyter and Foris Publications, 17–29. Childs, G. Tucker (2009). ‘What happens when a language dies? Language change vs. language death’, Studies in African Linguistics 38(2): 113–30. Chirikba, Viacheslav A. (2008). ‘The problem of the Caucasian Sprachbund’, in Pieter C. Muysken (ed.), From Linguistic Areas to Areal Linguistics. Amsterdam: John Benjamins, 25–94. Ciucci, Luca (2014). ‘Tracce di contatto tra la famiglia zamuco (ayoreo, chamacoco) e altre lingue del Chaco. Prime prospezioni’, Quaderni del Laboratorio di Linguistica 13: 1–52. Clahsen, Harald, Claudia Felser, Kathleen Neubauer, Mikako Sato, and Renita Silva (2010). ‘Morphological structure in native and nonnative language processing’, Language Learning 60: 21–43. doi:10.1111/j.1467-9922.2009.00550.x Cobbinah, Alexander (2010). ‘The Casamance as an area of intense language contact: The case of Baïnounk Gubaher’, in Friederike Lüpke and Mary Raymond (eds), Documenting Atlantic–Mande convergence and diversity. Special issue of the Journal of Language Contact—THEMA 3: 175–202. Cole, Desmond T. (1967). Some Features of Ganda Linguistic Structure. Johannesburg: Witwatersrand University Press. Comrie, Bernard (1989). Language Universals and Linguistic Typology. 2nd ed. Chicago: University of Chicago Press.
Comrie, Bernard (1992). ‘Before complexity’, in John A. Hawkins and Murray Gell-Mann (eds), The Evolution of Human Languages. London: Addison-Wesley, 193–211. Comrie, Bernard, Lucía A. Golluscio, Hebe Gonzáles, and Alejandra Vidal (2010). ‘El Chaco como área lingüística’, in Z. Estrada Fernández and R. Arzápalo Marín (eds), Estudios de lenguas amerindias, vol. 2: Contribuciones al estudio de las lenguas originarias de América. Hermosillo, Sonora (Mexico): Editorial Unison, 85–130. Corbett, Greville G. (1982). ‘Gender in Russian: An account of gender specification and its relationship to declension’, Russian Linguistics 6(2): 197–232. Corbett, Greville G. (1991). Gender. Cambridge: Cambridge University Press. Corbett, Greville G. (2000). Number. Cambridge: Cambridge University Press. Corbett, Greville G. (2007). ‘Canonical typology, suppletion, and possible words’, Language 83(1): 8–42. doi:10.1353/lan.2007.0006 Corbett, Greville G. (2009). ‘Suppletion: Typology, markedness, complexity’, in Patrick O. Steinkrüger and Manfred Krifka (eds), On Inflection. Berlin: Mouton de Gruyter, 25–40. Corbett, Greville G. (2013a). ‘Canonical morphosyntactic features’, in Dunstan Brown, Marina Chumakina, and Greville Corbett (eds), Canonical Morphology and Syntax. Oxford: Oxford University Press, 48–65. Corbett, Greville G. (2013b). ‘The unique challenge of the Archi paradigm’, in Chundra Cathcart, Shinae Kang, and Clare S. Sandy (eds), Proceedings of the 37th Annual Meeting, Berkeley Linguistics Society: Special Session on Languages of the Caucasus, 52–67. Corbett, Greville G. (2015). ‘Morphosyntactic complexity: A typology of lexical splits’, Language 91(1): 145–93. doi:10.1353/lan.2015.0003 Corbett, Greville G. and Sebastian Fedden (2016). ‘Canonical gender’, Journal of Linguistics 52: 495–531. doi:10.1017/S0022226715000195 Corbett, Greville G. and Norman M. Fraser (1993). ‘Network Morphology: A DATR account of Russian nominal inflection’, Journal of Linguistics 29(1): 113–42. 
doi:10.1017/S0022226700000074 Corbett, Greville G., Andrew Hippisley, Dunstan Brown, and Paul Marriott (2001). ‘Frequency, regularity and the paradigm: A perspective from Russian on a complex relation’, in Joan Bybee and Paul J. Hopper (eds), Frequency and the Emergence of Linguistic Structure. Amsterdam: John Benjamins, 201–26. Corne, Chris (1982). ‘A contrastive analysis of Reunion and Isle de France Creole French: Two typologically diverse languages’, in Philip Baker and Chris Corne (eds), Isle de France Creole: Affinities and Origins. Ann Arbor, MI: Karoma, 8–129. Corne, Chris (1999). From French to Creole. London: University of Westminster Press. Cotterell, Ryan, Christo Kirov, Mans Hulden, and Jason Eisner (2019). ‘On the complexity and typology of inflectional morphological systems’, Transactions of the Association for Computational Linguistics 7: 327–42. doi:10.1162/tacl_a_00271 Crevels, Mily and Hein van der Voort (2008). ‘The Guaporé-Mamoré Region as a Linguistic Area’, in Pieter C. Muysken (ed.), From Linguistic Areas to Areal Linguistics. Amsterdam: John Benjamins, 151–79. Croft, William (1991). Syntactic Categories and Grammatical Relations: The Cognitive Organization of Information. Chicago: University of Chicago Press. Croft, William (2001). Radical Construction Grammar: Syntactic Theory in Typological Perspective. Oxford: Oxford University Press. Cruschina, Silvio, Martin Maiden, and John C. Smith (eds) (2013). The Boundaries of Pure Morphology: Diachronic and Synchronic Perspectives. Oxford: Oxford University Press.
Cuskley, Christine, Francesca Colaiori, Claudio Castellano, Vittorio Loreto, Martina Pugliese, and Francesca Tria (2015). ‘The adoption of linguistic rules in native and non-native speakers: Evidence from a Wug task’, Journal of Memory and Language 84: 205–23. doi:10.1016/j.jml.2015.06.005 Dahl, Östen (2004). The Growth and Maintenance of Linguistic Complexity. Amsterdam: John Benjamins. Dahl, Östen (2009). ‘Increases in complexity as a result of language contact’, in Kurt Braunmüller and Juliane House (eds), Convergence and Divergence in Language Contact Situations. Amsterdam: John Benjamins, 41–52. Dahl, Östen (2017). ‘Polysynthesis and complexity’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 19–29. Dahl, Östen (2018). ‘Grammaticalization in the languages of Europe’, in Bernd Heine and Heiko Narrog (eds), Grammaticalization from a Typological Perspective. New York: Oxford University Press, 79–96. Dale, Rick and Gary Lupyan (2012). ‘Understanding the origins of morphological diversity: The Linguistic Niche Hypothesis’, Advances in Complex Systems 15(3–4): 1150017. doi:10.1142/S0219525911500172 Danielsen, Swintha (2007). Baure: An Arawak Language of Bolivia. Leiden: CNWS Publications. Dard, Jean (1825). Dictionnaire français–wolof et français–bambara, suivi du dictionnaire wolof–français. Paris: Imprimerie Royale. Dard, Jean (1826). Grammaire wolofe ou méthode pour étudier la langue des noirs qui habitent les royaumes de Bourba-Yolof, de Walo, de Damel, de Bour-Sine, de Saloume, de Baole, en Sénégambie. Paris: Imprimerie Royale. Daugherty, Kim G. and Mark S. Seidenberg (1994). ‘Beyond rules and exceptions: A connectionist approach to inflectional morphology’, in Susan D. Lima, Roberta L. Corrigan, and Gregory Iverson (eds), The Reality of Linguistic Rules. Amsterdam: John Benjamins, 353–88. de Boeck, Egide (1904). Grammaire et vocabulaire du Lingala, ou Langue du Haut-Congo. 
Brussels: Polleunis-Ceuterick. DeGraff, Michel (2001). ‘On the origin of creoles: A Cartesian critique of Neo-Darwinian linguistics’, Linguistic Typology 5(2–3): 213–310. doi:10.1515/lity.2001.002 DeGraff, Michel (2003). ‘Against creole exceptionalism’, Language 79(4): 391–410. DeGraff, Michel (2005). ‘Linguists’ most dangerous myth: The fallacy of creole exceptionalism’, Language in Society 34: 533–91. doi:10.1017/S0047404505050207 DeGraff, Michel (2007). ‘Haitian creole’, in John Holm and Peter L. Patrick (eds), Comparative Creole Syntax: Parallel Outlines of Eighteen Creole Grammars, vol. 7 of Westminster Creolistic Series. London: Battlebridge Publications, 101–26. de Groot, Casper (2008). ‘Morphological complexity as a parameter of linguistic typology: Hungarian as a contact language’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 191–214. de Haan, Ferdinand (2013). ‘Semantic distinctions of evidentiality’, in Matthew S. Dryer and Martin Haspelmath (eds), The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. URL: http://wals.info/chapter/77 de Jong, Nivja Helena (2002). Morphological Families in the Mental Lexicon. Universiteit Nijmegen PhD dissertation. DeKeyser, Robert M. (2005). ‘What makes learning second-language grammar difficult? A review of issues’, Language Learning 55: 1–25. doi:10.1111/j.0023-8333.2005.00294.x de Leeuw, Joshua R. (2014). ‘jsPsych: A JavaScript library for creating behavioral experiments in a Web browser’, Behavior Research Methods 47(1): 1–12. doi:10.3758/s13428-014-0458-y
Delafosse, Maurice (1927). ‘Les classes nominales en wolof’, in Festschrift Meinhof. Sprachwissenschaftliche und andere Studien. Glückstadt: L. Friedrichsen, 29–44. [Reprinted in Gabriel Manessy and Serge Sauvageot (eds) (1963). Wolof et Sérèr. Études de phonétique et de grammaire descriptive. Dakar: University of Dakar Press, 25–42.] DeLancey, Scott (2011). ‘On the origin of Sinitic’, in Zhuo Jing-Schmidt (ed.), Proceedings of the 23rd North American Conference on Chinese Linguistics. Eugene: University of Oregon, 51–64. Derbyshire, Desmond (1987). ‘Morphosyntactic areal characteristics of Amazonian languages’, International Journal of American Linguistics 53: 311–26. doi:10.1086/466060 Derbyshire, Desmond and Doris Payne (1990). ‘Noun classification systems of Amazonian languages’, in Doris Payne (ed.), Amazonian Linguistics: Studies in Lowland South American Languages. Austin, TX: University of Texas Press, 243–71. Derwing, Bruce L. (1990). ‘Morphology and the mental lexicon: Psycholinguistic evidence’, in Wolfgang U. Dressler, Hans C. Luschützky, Oskar E. Pfeiffer, and John R. Rennison (eds), Contemporary Morphology. Berlin: Mouton de Gruyter, 249–65. Deutscher, Guy (2009). ‘“Overall complexity”: A wild goose chase?’, in Geoffrey Sampson, David Gil, and Peter S. Trudgill (eds), Language Complexity as an Evolving Variable. Oxford: Oxford University Press, 243–51. Diagne, Anna M., Sascha Kesseler, and Christian Meyer (eds) (2011). Communication wolof et société sénégalaise. Héritage et création. Paris: L’Harmattan. Diallo, Abdourahmane (2010). ‘Morphological consequences of Mande borrowings in Fula: The case of Pular, Fuuta–Jaloo’, in Friederike Lüpke and Mary Raymond (eds), Documenting Atlantic–Mande Convergence and Diversity. Special issue of the Journal of Language Contact—THEMA 3: 71–85. Diallo, Abdourahmane (2014). Language Contact in Guinea: The Case of Pular and Mande Varieties. Köln: Köppe. Di Garbo, Francesca (2014). 
Gender and Its Interaction with Number and Evaluative Morphology: An Intra- and Intergenealogical Typological Survey of Africa. Stockholm University PhD dissertation. Di Garbo, Francesca (2016). ‘Exploring grammatical complexity crosslinguistically: The case of gender’, Linguistic Discovery 14: 46–85. doi:10.1349/PS1.1537-0852.A.468 Di Garbo, Francesca and Matti Miestamo (2019). ‘The evolving complexity of gender agreement systems’, in Francesca Di Garbo, Bruno Olsson, and Bernhard Wälchli (eds), Grammatical Gender and Linguistic Complexity, vol. II: World-Wide Comparative Studies. Berlin: Language Science Press, 15–60. doi:10.5281/zenodo.3462778 Dimmendaal, Gerrit J. (2011). Historical Linguistics and the Comparative Study of African Languages. Amsterdam: John Benjamins. Diouf, Jean Léopold (2009). Grammaire du wolof contemporain. Edition revue et complétée. Paris: L’Harmattan. Dixon, Robert M. W. (2002). Australian Languages: Their Nature and Development. Cambridge: Cambridge University Press. Dixon, Robert M. W. (2004). The Jarawara Language of Southern Amazonia. Oxford: Oxford University Press. Dixon, Robert M. W. and Alexandra Y. Aikhenvald (1999). ‘Introduction’, in Robert M. W. Dixon and Alexandra Y. Aikhenvald (eds), The Amazonian Languages. Cambridge: Cambridge University Press, 1–22.
Doneux, Jean Léonce (1975). ‘Hypothèses pour la comparative des langues atlantiques’, Africana Linguistica 6: 41–129. Doneux, Jean Léonce (1978). ‘Les liens historiques entre les langues du Sénégal’, Réalités africaines et langue française 7: 6–55. Donohue, Mark (2009). ‘Flores languages’, in Keith Brown and Sarah Ogilvie (eds), Concise Encyclopedia of Languages of the World. Oxford: Elsevier, 420–1. Donohue, Mark and Tim Denham (to appear). ‘Becoming Austronesian: Mechanisms of language dispersal across southern island Southeast Asia’, in David Gil and Antoinette Schapper (eds), Austronesian Undressed. Donohue, Mark and Johanna Nichols (2011). ‘Does phoneme inventory size correlate with population size?’, Linguistic Typology 15(2): 161–70. doi:10.1515/lity.2011.011 Dorian, Nancy (1978). ‘The fate of morphological complexity in language death: Evidence from East Sutherland Gaelic’, Language 54(3): 590–609. Dressler, Wolfgang U. (2003). ‘Degrees of grammatical productivity in inflectional morphology’, Italian Journal of Linguistics 15(1): 31–62. Dressler, Wolfgang U. (2005). ‘Morphological typology and first language acquisition: Some mutual challenges’, in Geert E. Booij, Emiliano Guevara, Angela Ralli, Salvatore Sgroi, and Sergio Scalise (eds), Morphology and Linguistic Typology: On-line Proceedings of the Fourth Mediterranean Morphology Meeting (MMM4), Catania, 21–23 September 2003, 7–20. Dressler, Wolfgang U. (2011). ‘The rise of complexity in inflectional morphology’, Poznań Studies in Contemporary Linguistics 47(2): 159–76. doi:10.2478/psicl-2011-0013 Dressler, Wolfgang U. (2019). ‘Natural morphology’, in Mark Aronoff (ed.), The Oxford Research Encyclopedia of Linguistics. New York: Oxford University Press. doi:10.1093/acrefore/9780199384655.013.576 Dressler, Wolfgang U. and Marianne Kilani-Schoch (2016). ‘Natural morphology’, in Andrew Hippisley and Gregory Stump (eds), The Cambridge Handbook of Morphology. Cambridge: Cambridge University Press, 356–89. 
Dressler, Wolfgang U., Alona Kononenko, Sabine Sommer-Lolei, Katharina Korecky-Kröll, Paulina Zydorowicz, and Laura Kamandulytė-Merfeldienė (2019). ‘Morphological richness, transparency and the evolution of morphonotactic patterns’, Folia Linguistica s40(1): 85–106. doi:10.1515/flih-2019-0005 Dressler, Wolfgang U., Willi Mayerthaler, Oswald Panagl, and Wolfgang U. Wurzel (1987). Leitmotifs in Natural Morphology. Amsterdam: John Benjamins. Dressler, Wolfgang U., Sabine Sommer-Lolei, Katharina Korecky-Kröll, Reili Argus, Ineta Dabašinskienė, Laura Kamandulytė-Merfeldienė, Johanna J. Ijäs, Victoria V. Kazakovskaya, Klaus Laalo, and Evangelia Thomadaki (2019). ‘First-language acquisition of synthetic compounds in Estonian, Finnish, German, Greek, Lithuanian, Russian and Saami’, Morphology 29(3): 409–29. doi:10.1007/s11525-019-09339-0 Dryer, Matthew S. (2013). ‘Coding of nominal plurality’, in Matthew S. Dryer and Martin Haspelmath (eds), The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. URL: https://wals.info/chapter/33 Dryer, Matthew and Martin Haspelmath (eds) (2013). The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. URL: http://wals.info Duke, Janet (2010). ‘Gender reduction and loss in Germanic: The Scandinavian, Dutch, and Afrikaans case studies’, in Antje Dammel, Sebastian Kürschner, and Damaris Nübling (eds), Kontrastive germanistische Linguistik. Hildesheim: Olms, 643–72.
Ehret, Katharina and Benedikt Szmrecsanyi (2016). ‘An information-theoretic approach to assess linguistic complexity’, in Raffaela Baechler and Guido Seiler (eds), Complexity, Isolation, and Variation. Berlin: de Gruyter Mouton, 71–94. Ehrhart, Sabine (1993). Le créole français de St-Louis (le tayo) en Nouvelle-Calédonie. Hamburg: Helmut Buske. Epps, Patience (2005). ‘Areal diffusion and the development of evidentiality: Evidence from Hup’, Studies in Language 29(3): 617–50. doi:10.1075/sl.29.3.04epp Epps, Patience (2007a). ‘The Vaupés melting pot: Tukanoan influence on Hup’, in Alexandra Y. Aikhenvald and Robert M. W. Dixon (eds), Grammars in Contact: A Cross-Linguistic Typology. Oxford: Oxford University Press, 267–89. Epps, Patience (2007b). ‘Birth of a noun classification system: The case of Hup’, in Leo Wetzels (ed.), Language Endangerment and Endangered Languages: Linguistic and Anthropological Studies with Special Emphasis on the Languages and Cultures of the Andean-Amazonian Border Area. The Netherlands: Leiden University, 107–28. Epps, Patience (2008). A Grammar of Hup. Berlin: Mouton de Gruyter. Epps, Patience (2010). ‘Linking valence change and modality: Diachronic evidence from Hup’, International Journal of American Linguistics 76(3): 335–56. doi:10.1086/652792 Epps, Patience (2013). ‘Inheritance, calquing, or independent innovation? Reconstructing morphological complexity in Amazonian numerals’, Journal of Language Contact 6: 329–57. doi:10.1163/19552629-00602007 Epps, Patience (2020). ‘Amazonian linguistic diversity and its sociocultural correlates’, in Mily Crevels and Pieter C. Muysken (eds), Language Dispersal, Diversification, and Contact: A Global Perspective. Oxford: Oxford University Press, 275–90. Epps, Patience and Lev Michael (2017). ‘The areal linguistics of Amazonia’, in Raymond Hickey (ed.), The Cambridge Handbook of Areal Linguistics. Cambridge: Cambridge University Press, 934–63. Evans, Nicholas (2003). 
Bininj Gun-Wok: A Pan-Dialectal Grammar of Mayali, Kunwinjku and Kune. Canberra: Pacific Linguistics. Facundes, Sidney da Silva (2000). The Language of the Apurinã People of Brazil. The State University of New York at Buffalo PhD dissertation. Fal, Arame, Rosine Santos, and Jean Léonce Doneux (1990). Dictionnaire wolof-français. Paris: Karthala. Falkenberg, Johannes (1962). Kin and Totem: Group Relations of Aborigines in the Port Keats District. Oslo: Oslo University Press. Faye, Souleymane (2013). Grammaire dialectale du seereer. Dakar: La maison du livre universel E.L.U. Fedden, Sebastian and Greville G. Corbett (2017). ‘Gender and classifiers as concurrent systems: Refining the typology of nominal classification’, Glossa 2(1), 34. doi:10.5334/gjgl.177 Feist, Timothy (2015). A Grammar of Skolt Saami. Helsinki: Suomalais-Ugrilainen Seura. Feldman, Laurie B. (2000). ‘Are morphological effects distinguishable from the effects of shared meaning and shared form?’, Journal of Experimental Psychology. Learning, Memory, and Cognition 26(6): 1431–44. doi:10.1037//0278-7393.26.6.1431 Fenk-Oczlon, Gertraud and August Fenk (2008). ‘Complexity trade-offs between the subsystems of language’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 43–65.
Fenk-Oczlon, Gertraud and August Fenk (2014). ‘Complexity trade-offs do not prove the equal complexity hypothesis’, Poznań Studies in Contemporary Linguistics 50(2): 145–55. doi:10.1515/psicl-2014-0010 Ferguson, Charles A. (1971). ‘Absence of copula and the notion of simplicity: A study of normal speech, baby talk, foreigner talk, and pidgins’, in Dell Hymes (ed.), Pidginization and Creolization of Languages. Cambridge: Cambridge University Press, 141–50. Ferronha, António Luís (ed.) (1994). Tratado breve dos Rios de Guiné do Cabo-Verde. Feito pelo Capitão André Álvares d’Almada. Ano de 1594. Lisboa: Grupo de Trabalho do Ministério da Educação para as Comemorações dos Descobrimentos Portugueses. Ferry, Marie-Paule and Konstantin Pozdniakov (2001). ‘Dialectique du régulier et de l’irrégulier. Le système des classes nominales dans le groupe tenda des langues atlantiques’, in Robert Nicolaï (ed.), Leçons d’Afrique. Filiations, ruptures et reconstitution de langues. Un hommage à Gabriel Manessy. Louvain: Peeters, 153–67. Fertig, David (2000). Morphological Change Up Close: Two and a Half Centuries of Verbal Inflection in Nuremberg. Berlin: De Gruyter. Field, Andy, Jeremy Miles, and Zoë Field (2012). Discovering Statistics Using R. London: Sage. Finkel, Raphael and Gregory Stump (2007). ‘Principal parts and morphological typology’, Morphology 17(1): 39–75. doi:10.1007/s11525-007-9115-9 Finkel, Raphael and Gregory Stump (2009). ‘Principal parts and degrees of paradigmatic transparency’, in James P. Blevins and Juliette Blevins (eds), Analogy in Grammar: Form and Acquisition. Oxford: Oxford University Press, 13–53. Finkel, Raphael and Gregory Stump (2013). Principal parts analyzer. URL: http://www.cs.uky.edu/raphael/linguistics/analyze.html (accessed July 2016). Fiorentino, Robert and David Poeppel (2007). ‘Compound words and structure in the lexicon’, Language and Cognitive Processes 22(7): 953–1000. doi:10.1080/01690960701190215 Fitch, W. Tecumseh and Marc D. 
Hauser (2004). ‘Computational constraints on syntactic processing in a nonhuman primate’, Science 303(5656): 377–80. doi:10.1126/science.1089401 Fleck, David (2007). ‘Evidentiality and double tense in Matses’, Language 83: 589–614. doi:10.1353/lan.2007.0113 Forshaw, William (2016). Little Kids, Big Verbs: The Acquisition of Murrinhpatha Bipartite Stem Verbs. University of Melbourne PhD dissertation. Fortescue, Michael (1992). ‘Morphophonemic complexity and typological stability in a polysynthetic language family’, International Journal of American Linguistics 58(2): 242–8. doi:10.1086/ijal.58.2.3519761 Fowler, Catherine S. (1972). ‘Some ecological clues to Proto-Numic homelands’, in Don D. Fowler (ed.), Great Basin Cultural Ecology: A Symposium. Reno Desert Research Institute Publications in the Social Sciences, 105–21. Frenda, Alessio (2011). ‘Gender in Irish between continuity and change’, Folia Linguistica 45: 283–316. doi:10.1515/flin.2011.012 Gabas Jr, Nilson (1999). A Grammar of Karo, Tupi (Brazil). University of California at Santa Barbara PhD dissertation. Gal, Susan (1989). ‘Lexical innovation and loss: Restricted Hungarian’, in Nancy Dorian (ed.), Investigating Obsolescence: Studies in Language Contraction and Death. Cambridge: Cambridge University Press, 313–31. Gamble, David (1957). Elementary Wolof Grammar. London: Research Department Colonial Office. [Reprinted in Gabriel Manessy and Serge Sauvageot (eds) (1963).
Wolof et Sérèr. Études de phonétique et de grammaire descriptive. Dakar: University of Dakar Press, 131–61.] Gao, Yongming (1998). Mental Representations of Chinese Numeral Classifiers. Lehigh University PhD dissertation. Gardani, Francesco (2008). Borrowing of Inflectional Morphemes in Language Contact. Frankfurt am Main: Peter Lang. Gardani, Francesco (2012). ‘Plural across inflection and derivation, fusion and agglutination’, in Lars Johanson and Martine I. Robbeets (eds), Copies versus Cognates in Bound Morphology. Leiden: Brill, 71–97. Gardani, Francesco (2013). Dynamics of Morphological Productivity: The Evolution of Noun Classes from Latin to Italian. Leiden: Brill. Gardani, Francesco (2015). ‘Affix pleonasm’, in Peter O. Müller, Ingeborg Ohnheiser, Susan Olsen, and Franz Rainer (eds), Word-Formation. An International Handbook of the Languages of Europe, vol. 1. Berlin: De Gruyter Mouton, 537–50. Gardani, Francesco (2018). ‘On morphological borrowing’, Language and Linguistics Compass 12(10): 1–17. doi:10.1111/lnc3.12302 Gardani, Francesco, Franz Rainer, and Hans Christian Luschützky (2019). ‘Competition in morphology: A historical outline’, in Franz Rainer, Francesco Gardani, Wolfgang U. Dressler, and Hans Christian Luschützky (eds), Competition in Inflection and Word-Formation. Cham: Springer, 3–36. doi:10.1007/978-3-030-02550-2_1 Gblem-Poidi, Massanvi Honorine (2007). ‘Nominal classes and concord in Igo (Ahlon)’, in Mary Esther Kropp Dakubu, George Akanlig-Pare, Kweku E. Osam, and Kofi K. Saah (eds), Proceedings of the Annual Colloquium of the Legon-Trondheim Linguistics Project 10–20 January 2005, vol. 4. Legon: Linguistics Department, University of Ghana, 52–60. Gell-Mann, Murray (1994). The Quark and the Jaguar: Adventures in the Simple and the Complex. London: Little Brown. Gell-Mann, Murray (1995). ‘What is complexity?’, Complexity 1(1): 16–19. Gervain, Judith and Jacques Mehler (2010). 
‘Speech perception and language acquisition in the first year of life’, Annual Review of Psychology 61: 191–218. doi:10.1146/annurev.psych.093008.100408 Gibbons, Jean Dickinson (1992). Nonparametric Measures of Association. Newbury Park, CA: Sage. Gippert, Jost, Wolfgang Schulze, Zaza Aleksidze, and Jean-Pierre Mahé (2009). The Caucasian Albanian Palimpsests of Mount Sinai. Turnhout, Belgium: Brepols. Givón, Talmy (1971). ‘Historical syntax and synchronic morphology: An archeologist’s fieldtrip’, Proceedings of the Chicago Linguistic Society 7: 394–415. Goertzel, Ben (1994). Chaotic Logic: Language, Thought, and Reality from the Perspective of Complex Systems Science. Boston: Springer. Goldsmith, John (2001). ‘Unsupervised learning of the morphology of a natural language’, Computational Linguistics 27(2): 153–98. doi:10.1162/089120101750300490 Goldsmith, John (2011). ‘The evaluation metric in Generative Grammar.’ Paper presented at the 50th anniversary celebration for the MIT Department of Linguistics. Gomez-Imbert, Elsa (1996). ‘When animals become “rounded” and “feminine”: Conceptual categories and linguistic classification in a multilingual setting’, in John J. Gumperz and Stephen C. Levinson (eds), Rethinking Linguistic Relativity. Cambridge: Cambridge University Press, 438–69. Gomez-Imbert, Elsa (2007). ‘Tukanoan nominal classification: The Tatuyo system’, in Leo Wetzels (ed.), Language Endangerment and Endangered Languages: Linguistic and
Anthropological Studies with Special Emphasis on the Languages and Cultures of the Andean-Amazonian Border Area. Leiden: Leiden University, 401–28. Good, Jeff (2012a). ‘How to become a “Kwa” noun’, Morphology 22: 293–335. doi:10.1007/s11525-011-9197-2 Good, Jeff (2012b). ‘Typologizing grammatical complexities: Or why creoles may be paradigmatically simple but syntagmatically average’, Journal of Pidgin and Creole Languages 27(1): 1–47. doi:10.1075/jpcl.27.1.01goo Good, Jeff (2015). ‘Paradigmatic complexity in pidgins and creoles’, Word Structure 8(2): 184–227. doi:10.3366/word.2015.0081 Good, Jeff (2016). The Linguistic Typology of Templates. Cambridge: Cambridge University Press. Grant, Anthony P. (1996). ‘The evolution of functional categories in Grande Ronde Chinook Jargon: Ethnolinguistic and grammatical considerations’, in Philip Baker and Anand Syea (eds), Changing Meanings, Changing Functions: Papers Relating to Grammaticalization in Creole Languages. London: University of Westminster Press, 225–42. Grant, Anthony (2009). ‘Admixture, structural transmission, simplicity and complexity’, in Nicholas Faraclas and Thomas Klein (eds), Simplicity and Complexity in Creoles and Pidgins. London: Battlebridge Publications, 125–52. Green, Ian (2003). ‘The genetic status of Murrinh-patha’, in Nicholas Evans (ed.), The Non-Pama-Nyungan Languages of Northern Australia. Canberra: Pacific Linguistics, 125–58. Greenberg, Joseph H. (1954). ‘A quantitative approach to the morphological typology of language’, in Robert F. Spencer (ed.), Method and Perspective in Anthropology: Papers in Honor of Wilson D. Wallis. Minneapolis: Minnesota University Press, 192–220. Greenberg, Joseph H. (1960). ‘A quantitative approach to the morphological typology of language’, International Journal of American Linguistics 26(3): 178–94. doi:10.1086/464575 Grijns, Cornelis D. (1991). Jakarta Malay: A Multidimensional Approach to Spatial Variation. Leiden: KITLV Press. 
Grinevald, Colette and Frank Seifart (2004). ‘Noun classes in African and Amazonian languages: Towards a comparison’, Linguistic Typology 8: 243–85. doi:10.1515/lity.2004.007 Grünwald, Peter D. (2007). The Minimum Description Length Principle. Cambridge, MA: The MIT Press. Guérin, Maximilien (2011). Le syntagme nominal en wolof. Une approche typologique. Paris: Université Sorbonne Nouvelle—Paris 3 MA thesis. Guillaume, Antoine (2008). A Grammar of Cavineña. Berlin: Mouton de Gruyter. Guillaume, Antoine (2016). ‘Associated motion in South America: Typological and areal perspectives’, Linguistic Typology 20: 81–177. doi:10.1515/lingty-2016-0003 Guillaume, Antoine and Françoise Rose (2010). ‘Sociative causative markers in South American languages: A possible areal feature’, in Franck Floricic (ed.), Essais de typologie et de linguistique générale, Mélanges offerts à Denis Creissels. Lyon: ENS Éditions, 383–402. Guy, Gregory (1991). ‘Explanation in variable phonology: An exponential model of morphological constraints’, Language Variation and Change 3: 1–22. doi:10.1017/S0954394500000429 Hale, Kenneth (1969). Walbiri Conjugations. Cambridge, MA: MIT.
Halle, Morris (1994). ‘The Russian declension: An illustration of the theory of Distributed Morphology’, in Jennifer S. Cole and Charles Kisseberth (eds), Perspectives in Phonology. Stanford: CSLI Publications, 29–60. Hammarström, Harald, Robert Forkel, and Martin Haspelmath (eds) (2019). Glottolog 3.4. Jena: Max Planck Institute for the Science of Human History. URL: https://glottolog.org Hansson, Inga-Lill (2003). ‘Akha’, in Randy LaPolla and Graham Thurgood (eds), The Sino-Tibetan Languages. London: Routledge, 236–51. Harris, Alice (2004). ‘History in support of synchrony’, in Charles Chang, Michael J. Houser, Yuni Kim, David Mortensen, and Mischa Park-Doob (eds), Proceedings of the Berkeley Linguistics Society. Berkeley Linguistics Society, 142–59. Harris, Alice (2017). Multiple Exponence. Oxford: Oxford University Press. Harris, Alice and Lyle Campbell (1995). Historical Syntax in Cross-linguistic Perspective. Cambridge: Cambridge University Press. Haspelmath, Martin (2009). ‘An empirical test of the Agglutination Hypothesis’, in Sergio Scalise, Elisabetta Magni, and Antonietta Bisetto (eds), Universals of Language Today. Dordrecht: Springer, 13–29. Haspelmath, Martin (2011). ‘The indeterminacy of word segmentation and the nature of morphology and syntax’, Folia Linguistica 45(1): 31–80. doi:10.1515/flin-2017-1005 Haspelmath, Martin, Matthew Dryer, David Gil, and Bernard Comrie (eds) (2005). The World Atlas of Language Structures. Oxford: Oxford University Press. Haspelmath, Martin and Thomas Müller-Bardey (2004). ‘Valency change’, in Geert E. Booij, Christian Lehmann, Joachim Mugdan, and Stavros Skopeteas (in collaboration with Wolfgang Kesselheim) (eds), Morphology: A Handbook on Inflection and Word Formation, vol. 2. Berlin: de Gruyter, 1130–45. Haspelmath, Martin and Andrea D. Sims (2010). Understanding Morphology. 2nd ed. London: Hodder Education. Haude, Katharina (2006). A Grammar of Movima. Universiteit Nijmegen PhD dissertation. 
Hauser, Marc D., Noam Chomsky, and W. Tecumseh Fitch (2002). ‘The faculty of language: What is it, who has it, and how did it evolve?’, Science 298(5598): 1569–79. doi:10.1126/science.298.5598.1569 Hawkins, John A. (2004). Efficiency and Complexity in Grammars. New York: Oxford University Press. Hawkins, John A. (2007). ‘Processing typology and why psychologists need to know about it’, New Ideas in Psychology 25: 87–107. doi:10.1016/j.newideapsych.2007.02.003 Hawkins, John A. (2014). Cross-Linguistic Variation and Efficiency. Oxford: Oxford University Press. Hay, Jennifer (2001). ‘Lexical frequency in morphology: Is everything relative?’, Linguistics 39(6): 1041–70. doi:10.1515/ling.2001.041 Hay, Jennifer (2003). Causes and Consequences of Word Structure. New York: Routledge. Hay, Jennifer and Laurie Bauer (2007). ‘Phoneme inventory size and population size’, Language 83(2): 388–400. doi:10.1353/lan.2007.0071 Haynie, Hannah, Claire Bowern, Patience Epps, Jane Hill, and Patrick McConvell (2014). ‘Wanderwörter in languages of the Americas and Australia’, Ampersand 1: 1–18. doi:10.1016/j.amper.2014.10.001 Hazaël-Massieux, Marie-Christine (2002). ‘Les créoles à base française: une introduction’, Travaux Interdisciplinaires du Laboratoire Parole et Langage d’Aix-en-Provence (TIPA) 21: 63–86. Hengeveld, Kees and Sterre Leufkens (2018). ‘Transparent and non-transparent languages’, Folia Linguistica 52(1): 139–75. doi:10.1515/flin-2018-0003
Henri, Fabiola (2010). A Constraint-Based Approach to Verbal Constructions in Mauritian. University of Mauritius and Université Paris Diderot PhD dissertation. Henri, Fabiola (2012). ‘Attenuative reduplication in Mauritian’, in Enoch Aboh, Norval Smith, and Anne Zribi-Hertz (eds), The Morphosyntax of Reiteration. Amsterdam: John Benjamins, 203–34. Henri, Fabiola (forthcoming). ‘Morphomic structure in Mauritian: On change, complexity and creolization’, Morphology. Henri, Fabiola and Alain Kihm (2015). ‘The morphology of TMA marking in creole languages: A comparative study’, Word Structure 8(2): 248–82. doi:10.3366/word.2015.0083 Henri, Fabiola, Jean-Marie Marandin, and Anne Abeillé (2008). ‘Information structure coding in Mauritian: Verum Focus expressed by long forms of verbs’. Paper presented at the Workshop on Predicate Focus, Verum Focus, Verb Focus, University of Potsdam. Hill, Jane H. (2001). ‘Proto-Uto-Aztecan: A community of cultivators in Central Mexico?’, American Anthropologist 103: 913–34. doi:10.1525/aa.2001.103.4.913 Hill, Jane H. (2010). ‘New evidence for a Mesoamerican homeland for Proto-Uto-Aztecan’, PNAS 107(11): E33. doi:10.1073/pnas.0914473107 Hill, Nathan (2014). ‘Grammatically conditioned sound change’, Language and Linguistics Compass 8: 211–29. doi:10.1111/lnc3.12073 Hippisley, Andrew, Marina Chumakina, Greville G. Corbett, and Dunstan Brown (2004). ‘Suppletion: Frequency, categories and distribution of stems’, Studies in Language 28(2): 387–418. doi:10.1075/sl.28.2.05hip Hock, Hans Henrich and Brian D. Joseph (1996). Language History, Language Change, and Language Relationship. Berlin: Walter de Gruyter. Hockett, Charles F. (1947). ‘Problems of morphemic analysis’, Language 23(4): 321–43. Hockett, Charles F. (1958). A Course in Modern Linguistics. New York: Macmillan. Hodge, Carleton (1970). ‘The linguistic cycle’, Language Sciences 13: 1–7. [Reprinted in Scott Noegel and Alan S. 
Kaye (eds) (2004), Afroasiatic Linguistics, Semitics, and Egyptology: Selected Writings of Carleton T. Hodge, Bethesda, MD: CDL Press, 1–17.] Hopper, Paul (1990). ‘Where do words come from?’, in William Croft, Keith Denning, and Suzanne Kemmer (eds), Studies in Typology and Diachrony: Papers Presented to Joseph H. Greenberg on his 75th Birthday. Amsterdam: John Benjamins, 151–60. Hualde, José Ignacio, Gorka Elordieta, and Arantzazu Elordieta (1994). The Basque Dialect of Lekeitio. Bilbo: Universidad del País Vasco/Euskal Herriko Unibertsitatea. Hualde, José Ignacio and Jon Ortiz de Urbina (2003). A Grammar of Basque. Berlin: Mouton de Gruyter. Huber, Christian (2011). ‘Some notes on gender and number marking in Shumcho’, in Gerda Lechleitner and Christian Liebl (eds), Jahrbuch des Phonogrammarchivs, vol. 2. Göttingen: Cuvillier Verlag, 52–90. Hudson, Carla L. and Elissa L. Newport (1999). ‘Creolization: Could adults really have done it all?’, in Annabel Greenhill, Heather Littlefield, and Cheryl Tano (eds), Proceedings of the 23rd Annual Boston University Conference on Language Development. Somerville: Cascadilla Press, 265–76. Hudson Kam, Carla L. and Elissa L. Newport (2005). ‘Regularizing unpredictable variation: The roles of adult and child learners in language formation and change’, Language Learning and Development 1(2): 151–95. doi:10.1080/15475441.2005.9684215 Hudson Kam, Carla L. and Elissa L. Newport (2009). ‘Getting it right by getting it wrong: When learners change languages’, Cognitive Psychology 59(1): 30–66. doi:10.1016/j.cogpsych.2009.01.001 Huldén, Lars (1972). ‘Genussystemet i Karleby och Nedervetil’, Folkmålsstudier 22: 47–82.
Hull, Geoffrey (1998). ‘The basic lexical affinities of Timor’s Austronesian languages: A preliminary investigation’, Studies in the Languages and Cultures of East Timor 1: 97–174. Hull, Geoffrey (1999). Standard Tetum-English Dictionary. Sydney: Allen & Unwin. Hultman, Oskar Fredrik (1894). De östsvenska dialekterna. Helsinki: Svenska landsmålsföreningen. Humboldt, Wilhelm von (1836). Über die Verschiedenheit des menschlichen Sprachbaues und ihren Einfluss auf die geistige Entwickelung des Menschengeschlechts. Berlin: F. Dümmler. Hyman, Larry M. (2004). ‘How to become a Kwa verb’, Journal of West African Languages 30: 69–88. Igartua, Iván (2019). ‘Loss of grammatical gender and language contact’, Diachronica 36: 181–221. doi:10.1075/dia.17004.iga Irvine, Judith (1978). ‘Wolof noun classification: The social setting of divergent change’, Language in Society 7: 37–64. doi:10.1017/S0047404500005327 Irvine, Judith (2011). ‘Société et communication chez les Wolof à travers le temps et l’espace’, in Anna M. Diagne, Sascha Kesseler, and Christian Meyer (eds), Communication wolof et société sénégalaise. Héritage et création. Paris: L’Harmattan, 37–70. Jakobson, Roman (1929). Remarques sur l’évolution phonologique du russe comparée à celle des autres langues slaves. Praha: Jednota československých matematiků a fysiků. Jakobson, Roman (1959). ‘On linguistic aspects of translation’, in Reuben A. Brower (ed.), On Translation. Cambridge, MA: Harvard University Press, 232–9. Jamieson, Carole Ann (1982). ‘Conflated subsystems marking person and aspect in Chiquihuitlán Mazatec verbs’, International Journal of American Linguistics 48(2): 139–67. doi:10.1086/465725 Janda, Laura A. (1994). ‘The spread of athematic 1sg -m in the major West Slavic languages’, The Slavic and East European Journal 38(1): 90–119. doi:10.2307/308549 Janhunen, Juha (2008). ‘Mongolic as an expansive language family’, in Tokusu Kurebito (ed.), Past and Present Dynamics: The Great Mongolian State. 
Tokyo: Tokyo University of Foreign Studies, Research Institute for Languages and Cultures of Asia and Africa, 127–37. Janse, Mark and Sijmen Tol (eds) (2003). Language Death and Language Maintenance: Theoretical, Practical and Descriptive Approaches. Amsterdam: John Benjamins. Jespersen, Otto (1949). A Modern English Grammar on Historical Principles. London: Allen & Unwin. Joanisse, Marc F. and Mark S. Seidenberg (2005). ‘Imaging the past: Neural activation in frontal and temporal regions during regular and irregular past-tense processing’, Cognitive, Affective & Behavioral Neuroscience 5(3): 282–96. Johnson, Jacqueline S., Kenneth D. Shenkman, Elissa L. Newport, and Douglas L. Medin (1996). ‘Indeterminacy in the grammar of adult language learners’, Journal of Memory and Language 35: 335–52. doi:10.1006/jmla.1996.0019 Joseph, Brian D. and Richard D. Janda (1988). ‘The how and why of diachronic morphologization and demorphologization’, in Michael Hammond and Michael Noonan (eds), Theoretical Morphology. New York: Academic Press, 193–210. Joseph, John E. and Frederick J. Newmeyer (2012). ‘“All languages are equally complex”: The rise and fall of a consensus’, Historiographia Linguistica 39(2–3): 341–68. doi:10.1075/hl.39.2-3.08jos
Juola, Patrick (1998). ‘Measuring linguistic complexity: The morphological tier’, Journal of Quantitative Linguistics 5: 206–13. doi:10.1080/09296179808590128 Karatsareas, Petros (2009). ‘The loss of grammatical gender in Cappadocian Greek’, Transactions of the Philological Society 107: 196–230. doi:10.1111/j.1467-968X.2009.01217.x Karatsareas, Petros (2014). ‘On the diachrony of gender in Asia Minor Greek: The development of semantic agreement in Pontic’, Language Sciences 43: 77–101. doi:10.1016/j.langsci.2013.10.005 Kelly, Barbara, Gillian Wigglesworth, Rachel Nordlinger, and Joseph Blythe (2014). ‘The acquisition of polysynthetic languages’, Language and Linguistics Compass 8(2): 51–64. doi:10.1111/lnc3.12062 Kendall, Maurice and Jean Dickinson Gibbons (1990). Rank Correlation Methods. 5th ed. Oxford: Oxford University Press. Kibrik, Aleksandr E. (1991). ‘Organizing principles for nominal paradigms in Daghestanian languages: Comparative and typological observations’, in Frans Plank (ed.), Paradigms: The Economy of Inflection. Berlin: Mouton de Gruyter, 255–74. Kibrik, Aleksandr E. (2003). ‘Nominal inflection galore: Daghestanian, with side glances at Europe and the world’, in Frans Plank (ed.), Noun Phrase Structure in the Languages of Europe. Berlin: Mouton de Gruyter, 37–112. Kibrik, Andrej A. (2012). ‘What’s in the head of head-marking languages?’, in Pirkko Suihkonen, Bernard Comrie, and Valery Solovyev (eds), Argument Structure and Grammatical Relations: A Crosslinguistic Typology. Amsterdam: John Benjamins, 211–40. Kielhorn, Franz (1871). The Paribhāṣenduśekhara of Nāgojībhaṭṭa (2 vols). Bombay: InduPrakāsh Press. Kihm, Alain (1994). Kriyol Syntax. Amsterdam: John Benjamins. Kihm, Alain (2014). ‘Theories of morphology and theories of creole emergence: The inner connection’, PAPIA, São Paulo, 24(1): 43–89. Killian, Don (2015). Topics in Uduk Phonology and Morphosyntax. University of Helsinki PhD dissertation. 
Kirby, Simon, Hannah Cornish, and Kenny Smith (2008). ‘Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language’, Proceedings of the National Academy of Sciences 105(31): 10681–6. doi:10.1073/pnas.0707835105 Kiso, Andrea (2012). Tense and Aspect in Chichewa, Citumbuka and Cisena: A Description and Comparison of the Tense-Aspect Systems in Three Southeastern Bantu Languages. Stockholm University dissertation. Klausenburger, Jurgen (1976). ‘(De)morphologization in Latin’, Lingua 40(4): 305–20. doi:10.1016/0024-3841(76)90082-6 Klingler, Thomas (2003). If I Could Turn My Tongue Like That: The Creole of Pointe Coupee Parish, Louisiana. Baton Rouge: Louisiana State University Press. Kobès, Aloys (1869). Grammaire de la langue volofe. Ouvrage nouveau. Saint-Joseph de Ngasobil: Imprimerie de la Mission. Kobès, Aloys (1875). Dictionnaire volof-francais. Saint-Joseph de Ngasobil: Mission Catholique [cited from the new edition: Kobès, Aloys and Olivier Abiven (1923), Dictionnaire volof-francais. Nouvelle édition revue et considerablement augmentée par le R. P. O. Abiven. Dakar: Mission Catholique]. Koopman, Hilda and Claire Lefebvre (1981). ‘Haitian Creole pu’, in Pieter C. Muysken (ed.), Generative Studies on Creole Languages. Dordrecht: Foris, 201–21. Koptjevskaja-Tamm, Maria and Bernhard Wälchli (2001). ‘The Circum-Baltic languages: An areal-typological approach’, in Östen Dahl and Maria Koptjevskaja-Tamm (eds),
Circum-Baltic Languages, vol. 2: Grammar and Typology. Amsterdam: John Benjamins, 615–750. Kortmann, Bernd and Benedikt Szmrecsanyi (eds) (2012). Linguistic Complexity: Second Language Acquisition, Indigenization, Contact. Berlin: De Gruyter. Krasnoukhova, Olga (2012). The Noun Phrase in the Languages of South America. Universiteit Nijmegen PhD dissertation. Kreyer, Rolf (2003). ‘Genitive and of-construction in modern written English: Processability and human involvement’, International Journal of Corpus Linguistics 8(2): 169–207. doi:10.1075/ijcl.8.2.02kre Kusters, Wouter (2003). Linguistic Complexity: The Influence of Social Change on Verbal Inflections. Utrecht: LOT. Kusters, Wouter (2008). ‘Complexity in linguistic theory, language learning and language change’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 3–22. Labouret, Henri (1935). ‘Remarques sur la langue des wolof’, in Nicolas Leca (ed.), Les pêcheurs de Guet N’dar. Paris: Larose, 16–27. [Reprinted in Gabriel Manessy and Serge Sauvageot (eds) (1963). Wolof et Sérèr. Études de phonétique et de grammaire descriptive. Dakar: University of Dakar Press, 45–56.] Labov, William (1963). ‘The social motivation of a sound change’, Word 19: 273–309. Ladd, D. Robert, Seán G. Roberts, and Dan Dediu (2015). ‘Correlational studies in typological and historical linguistics’, Annual Review of Linguistics 1: 221–41. doi:10.1146/annurev-linguist-030514-124819 Landaburu, Jon (2005). ‘Expresión gramatical de lo epistémico en algunas lenguas del norte de Suramerica’, Proceedings of the Conference on Indigenous Languages of Latin America, 1–13. URL: lanic.utexas.edu/project/etext/llilas/cilla/landaburu2.pdf Leclerc, Jacques (2015). L’aménagement linguistique dans le monde. URL: http://www.axl.cefan.ulaval.ca/afrique/senegal.htm Leer, Jeff (1991). 
‘Evidence for a Northern Northwest Coast language area: Promiscuous number marking and periphrastic possessive constructions in Haida, Eyak, and Aleut’, International Journal of American Linguistics 57(2): 158–93. doi:10.1086/ijal.57.2.3519765 Lefebvre, Claire (1998). Creole Genesis and the Acquisition of Grammar. Cambridge: Cambridge University Press. Lefebvre, Claire and Anne-Marie Brousseau (2002). Fongbe. Berlin: Mouton de Gruyter. Lehmann, Christian (1985). ‘Grammaticalization: Synchronic variation and diachronic change’, Lingua e Stile 20: 303–18. Lewis, Geoffrey L. (2001). Turkish Grammar. 2nd ed. Oxford: Oxford University Press. Lewis, M. Paul, Gary F. Simons, and Charles D. Fennig (eds) (2015). Ethnologue: Languages of the World. 18th ed. Dallas, TX: SIL International. URL: http://www.ethnologue.com Li, Charles N. and Sandra A. Thompson (1976). ‘Development of the causative in Mandarin Chinese: Interaction of diachronic processes in syntax’, in Masayoshi Shibatani (ed.), The Grammar of Causative Constructions. New York: Academic Press, 477–92. Li, Charles N. and Sandra A. Thompson (1981). Mandarin Chinese: A Functional Reference Grammar. Berkeley, CA: University of California Press. Lindström, Eva (2008). ‘Language complexity and interlinguistic difficulty’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 217–42.
Loporcaro, Michele (2018). Gender from Latin to Romance: History, Geography, Typology. Oxford: Oxford University Press. Loporcaro, Michele, Francesco Gardani, and Alberto Giudici (forthcoming). ‘Contact-induced complexification in the gender system of Istro-Romanian’. Journal of Language Contact. Loporcaro, Michele and Tania Paciaroni (2011). ‘Four gender-systems in Indo-European’, Folia Linguistica 45(2): 389–434. doi:10.1515/flin.2011.015 Lowe, Ivan (1999). ‘Nambiquara’, in Robert M. W. Dixon and Alexandra Y. Aikhenvald (eds), The Amazonian Languages. Cambridge: Cambridge University Press, 269–92. Ludwig, Ralph, Sylviane Telchid, and Florence Bruneau-Ludwig (eds) (2001). Corpus créole. Hamburg: Helmut Buske. Luís, Ana R. (2009). ‘The loss and survival of inflectional morphology: Contextual vs. inherent inflection in creoles’, in Sonia Colina, Antxon Olarrea, and Ana Carvalho (eds), Romance Linguistics 2009. Amsterdam: John Benjamins, 323–36. Luís, Ana R. (2014). ‘Inflectional structure without morphemes: Similarities between creoles and non-creoles’, PAPIA, São Paulo, 24(2): 381–406. Lüpke, Friederike and Mary Raymond (eds) (2010). Documenting Atlantic-Mande Convergence and Diversity. Special issue of the Journal of language contact—THEMA 3. Lupyan, Gary and Rick Dale (2010). ‘Language structure is partly determined by social structure’, PLoS ONE 5(1): e8559. doi:10.1371/journal.pone.0008559 MacWhinney, Brian, Elizabeth Bates, and Reinhold Kliegl (1984). ‘Cue validity and sentence interpretation in English, German, and Italian’, Journal of Verbal Learning and Verbal Behavior 23(2): 127–50. doi:10.1016/S0022-5371(84)90093-8 Madsen, David and David Rhode (1994). Across the West: Human Population Movement and the Expansion of the Numa. Salt Lake City, UT: University of Utah Press. Maiden, Martin (2005). ‘Morphological autonomy and diachrony’, in Geert E. Booij and Jaap van Marle (eds), Yearbook of Morphology 2004. Dordrecht: Springer, 137–75.
doi:10.1007/1-4020-2900-4_6 Maiden, Martin (2013). ‘ “Semi-autonomous” morphology? A problem in the history of the Italian (and Romanian) verb’, in Silvio Cruschina, Martin Maiden, and John C. Smith (eds), The Boundaries of Pure Morphology: Diachronic and Synchronic Perspectives. Oxford: Oxford University Press, 24–44. Maiden, Martin (2018). The Romance Verb: Morphomic Structure and Diachrony. Oxford: Oxford University Press. Maiden, Martin, John C. Smith, Maria Goldbach, and Marc-Olivier Hinzelin (eds) (2011). Morphological Autonomy: Perspectives from Romance Inflectional Morphology. Oxford: Oxford University Press. Maitz, Péter and Attila Németh (2014). ‘Language contact and morphosyntactic complexity: Evidence from German’, Journal of Germanic Linguistics 26(1): 1–29. doi:10.1017/S1470542713000184 Malone, Terrell A. (1988). ‘The origin and development of Tuyuca evidentials’, International Journal of American Linguistics 54: 119–40. doi:10.1086/466079 Manessy, Gabriel and Serge Sauvageot (eds) (1963). Wolof et Sérèr. Études de phonétique et de grammaire descriptive. Dakar: University of Dakar Press. Mansfield, John (2014). Polysynthetic Sociolinguistics: The Language and Culture of Murrinh Patha Youth. Australian National University PhD dissertation. Mansfield, John (2015a). ‘Consonant lenition as a sociophonetic variable in Murrinh Patha (Australia)’, Language Variation and Change 27(2): 203–25. doi:10.1017/S0954394515000046
Mansfield, John (2015b). ‘Morphotactic variation, prosodic domains and the changing structure of the Murrinhpatha verb’, Asia-Pacific Language Variation 1(2): 163–89. doi:10.1075/aplv.1.2.03man Mansfield, John (2016). ‘Intersecting formatives and inflectional predictability: How do speakers and learners predict the correct form of Murrinhpatha verbs?’, Word Structure 9(2): 183–214. doi:10.3366/word.2016.0093 Mansfield, John (2019). Murrinhpatha Morphology and Phonology. Berlin: De Gruyter Mouton. Marschner, Ian C. (2011). ‘glm2: Fitting generalized linear models with convergence problems’, The R Journal 3(2): 12–15. Marslen-Wilson, William D. (2007). ‘Morphological processes in language comprehension’, in M. Gareth Gaskell (ed.), The Oxford Handbook of Psycholinguistics. Oxford: Oxford University Press, 175–93. Marzi, Claudia, Marcello Ferro, Ouafae Nahli, Patrizia Belik, Stavros Bompolas, and Vito Pirrelli (2018). ‘Evaluating inflectional complexity crosslinguistically: A processing perspective’, in Nicoletta Calzolari (ed.), LREC 2018: Eleventh International Conference on Language Resources and Evaluation: May 7–12, 2018, Miyazaki, Japan. Paris: European Language Resources Association ELRA, article n. 745. Matras, Yaron (1998). ‘Utterance modifiers and universals of grammatical borrowing’, Linguistics 36: 281–331. doi:10.1515/ling.1998.36.2.281 Matras, Yaron (2009). Language Contact. Cambridge: Cambridge University Press. Matras, Yaron and Jeanette Sakel (eds) (2007). Grammatical Borrowing in Cross-Linguistic Perspective. Berlin: Mouton de Gruyter. Matthews, Peter H. (1972). Inflectional Morphology. Cambridge: Cambridge University Press. Matthews, Peter H. (1991). Morphology. 2nd ed. Cambridge: Cambridge University Press. McGregor, William (2010). ‘Optional ergative case marking systems in a typological-semiotic perspective’, Lingua 120: 1610–36. doi:10.1016/j.lingua.2009.05.010 McGregor, William and Jean-Christophe Verstraete (2010).
‘Optional ergative marking and its implications for linguistic theory’, Lingua 120: 1607–9. doi:10.1016/j.lingua.2009.05.009 Mc Laughlin, Fiona (1997). ‘Noun classification in Wolof: When affixes are not renewed’, Studies in African Linguistics 26(1): 1–28. Mc Laughlin, Fiona (2000). ‘Consonant mutation and reduplication in Seereer-Siin’, Phonology 17: 333–63. doi:10.1017/S0952675701003955 Mc Laughlin, Fiona (2001). ‘Dakar Wolof and the configuration of an urban identity’, Journal of African Cultural Studies 14(2): 153–72. doi:10.1080/13696810120107104 McLeod, A. Ian (2011). ‘Package “Kendall”. R package documentation’. URL: https://cran.r-project.org/web/packages/Kendall/Kendall.pdf McWhorter, John H. (1994). ‘From focus marker to copula in Swahili’, in Kevin E. Moore, David Peterson, and Comfort Wentum (eds), Proceedings of the Berkeley Linguistics Society, Special Session on Historical Issues in African Linguistics. Berkeley, CA: Berkeley Linguistics Society, 57–66. McWhorter, John H. (1998). ‘Identifying the creole prototype: Vindicating a typological claim’, Language 74: 788–818. doi:10.2307/417003 McWhorter, John H. (2001). ‘The world’s simplest grammars are creole grammars’, Linguistic Typology 5(2–3): 125–66. doi:10.1515/lity.2001.001 McWhorter, John H. (2002). ‘What happened to English?’, Diachronica 19: 217–72. doi:10.1075/dia.19.2.02wha
McWhorter, John H. (2005). Defining Creole. New York: Oxford University Press. McWhorter, John H. (2007). Language Interrupted: Signs of Non-Native Acquisition in Standard Language Grammars. New York: Oxford University Press. McWhorter, John H. (2008). ‘Why does a language undress? Strange cases in Indonesia’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 167–90. McWhorter, John H. (2011). Linguistic Simplicity and Complexity: Why Do Languages Undress? Berlin: Walter de Gruyter. McWhorter, John H. (2012). ‘Case closed? Testing the Feature Pool Hypothesis’, Journal of Pidgin and Creole Languages 27: 171–82. doi:10.1075/jpcl.27.1 McWhorter, John H. (2016). ‘Is radical analyticity normal? Implications of Niger-Congo and Sino-Tibetan for typology and diachronic theory’, in Elly van Gelderen (ed.), Cyclical Change Continued. Amsterdam: John Benjamins, 49–91. doi:10.1075/la.227.03mcw McWhorter, John H. (2018). The Creole Debate. Cambridge: Cambridge University Press. McWhorter, John H. (2019). ‘The radically isolating languages of Flores: A challenge to diachronic theory’, Journal of Historical Linguistics 9: 177–207. doi:10.1075/jhl.16021.mcw Meakins, Felicity (2009). ‘The case of the shifty ergative marker: A pragmatic shift in the ergative marker in one Australian mixed language’, in Jóhanna Barðdal and Shobhana L. Chelliah (eds), The Role of Semantic, Pragmatic, and Discourse Factors in the Development of Case. Amsterdam: John Benjamins, 59–91. Meakins, Felicity (2011). Case Marking in Contact: The Development and Function of Case Morphology in Gurindji Kriol. Amsterdam: John Benjamins. Meakins, Felicity (2013). ‘Gurindji Kriol’, in Susanne Maria Michaelis, Philippe Maurer, Martin Haspelmath, and Magnus Huber (eds), The Survey of Pidgin and Creole Languages, vol. III: Contact Languages Based on Languages from Africa, Asia, Australia and the Americas. 
Oxford: Oxford University Press, 131–9. Meakins, Felicity (2015). ‘From absolutely optional to only nominally ergative: The life cycle of the Gurindji Kriol ergative suffix’, in Francesco Gardani, Peter Arkadiev, and Nino Amiridze (eds), Borrowed Morphology. Berlin: Mouton de Gruyter, 189–218. Meakins, Felicity, Patrick McConvell, Erika Charola, Norm McNair, Helen McNair, and Lauren Campbell (2013). Gurindji to English dictionary. Batchelor, Australia: Batchelor Press. Meakins, Felicity and Rachel Nordlinger (2014). A Grammar of Bilinarra: An Australian Aboriginal Language of the Northern Territory. Berlin: Mouton de Gruyter. Meakins, Felicity and Carmel O’Shannessy (2010). ‘Ordering arguments about: Word order and discourse motivations in the development and use of the ergative marker in two Australian mixed languages’, Lingua 120(7): 1693–713. doi:10.1016/j.lingua.2009.05.013 Meakins, Felicity, Xia Hua, Cassandra Algy, and Lindell Bromham (2019). ‘Birth of a contact language did not favor simplification’, Language 95(2): 294–332. doi:10.1353/lan.2019.0032 Meeuwis, Michael (2013). ‘Lingala’, in Susanne Maria Michaelis, Philippe Maurer, Martin Haspelmath, and Magnus Huber (eds), The Survey of Pidgin and Creole Languages, vol. III: Contact Languages Based on Languages from Africa, Asia, Australia and the Americas. Oxford: Oxford University Press, 25–33. Meijer, Guus and Pieter C. Muysken (1977). ‘On the beginnings of pidgin and creole studies: Schuchardt and Hesseling’, in Albert Valdman (ed.), Pidgin and Creole Linguistics. Bloomington: Indiana University Press, 21–48. Mel’čuk, Igor (1994). ‘Suppletion: Toward a logical analysis of the concept’, Studies in Language 18: 339–410. doi:10.1075/sl.18.2.03mel
Merrill, William L. (2012). ‘The historical linguistics of Uto-Aztecan agriculture’, Anthropological Linguistics 54(3): 203–60. doi:10.1353/anl.2012.0017 Meyerhoff, Miriam (2009). ‘Animacy in Bislama: Using quantitative methods to evaluate transfer of a substrate feature’, in James Stanford and Dennis Preston (eds), Variation in Indigenous Minority Languages. Amsterdam: John Benjamins, 369–96. Michael, Lev (2008). Nanti Evidential Practice: Language, Knowledge, and Social Action in an Amazonian Society. University of Texas at Austin PhD dissertation. Michael, Lev, William Chang, and Tammy Stark (2014). ‘Exploring phonological areality in the Circum-Andean region using a naive Bayes classifier’, Language Dynamics and Change 4(1): 27–86. doi:10.1163/22105832-00401004 Miestamo, Matti (2008). ‘Grammatical complexity in a cross-linguistic perspective’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 23–41. Miestamo, Matti (2017). ‘Linguistic diversity and complexity’, Lingue e Linguaggio 16(2): 227–54. Miestamo, Matti, Kaius Sinnemäki, and Fred Karlsson (eds) (2008). Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins. Mihas, Elena (2015). A Grammar of Alto Perené (Arawak). Berlin: De Gruyter Mouton. Milin, Petar, Victor Kuperman, Aleksandar Kostić, and R. Harald Baayen (2009). ‘Words and paradigms bit by bit: An information-theoretic approach to the processing of paradigmatic structure in inflection and derivation’, in James P. Blevins and Juliette Blevins (eds), Analogy in Grammar: Form and Acquisition. Oxford: Oxford University Press, 214–52. Miller, Wick R. (1983). ‘Uto-Aztecan languages’, in Alfonso Ortiz (ed.), Handbook of North American Indians, vol. 10: Southwest. Washington, DC: Smithsonian Institution, 113–24. Mithun, Marianne (1988).
‘System-defining structural properties in polysynthetic languages’, Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 41(4): 442–52. Mithun, Marianne (1989). ‘The acquisition of polysynthesis’, Journal of Child Language 16: 285–312. doi:10.1017/S0305000900010424 Mithun, Marianne (1996). ‘General characteristics of North American Indian languages’, in Ives Goddard (ed.), Handbook of North American Indians, vol. 17: Languages. Washington, DC: Smithsonian Institution, 137–57. Mithun, Marianne (1998). ‘Yup’ik roots and affixes’, in Osahito Miyaoka and Minoru Oshima (eds), Languages of the North Pacific Rim, vol. 4. Kyoto: Kyoto University Graduate School of Letters, 63–76. Mithun, Marianne (2007). ‘Grammar, contact, and time’, Journal of Language Contact. THEMA 1: 133–55. Mithun, Marianne (2015). ‘Morphological complexity and language contact in languages indigenous to North America’, Linguistic Discovery 13(2): 37–59. Mithun, Marianne (2016). ‘Affix ordering: Motivation and interpretation’, in Andrew Hippisley and Gregory Stump (eds), The Cambridge Handbook of Morphology. Cambridge: Cambridge University Press, 149–85. Miyaoka, Osahito (2011). A Grammar of Central Alaskan Yupik (CAY). Berlin: de Gruyter Mouton. Moscoso del Prado Martín, Fermín (2003). Paradigmatic Structures in Morphological Processing: Computational and cross-linguistics studies. University of Nijmegen PhD dissertation.
Moscoso del Prado Martín, Fermín (2011). ‘The mirage of morphological complexity’, in Laura Carlson, Christoph Hoelscher, and Thomas F. Shipley (eds), Proceedings of the 33rd Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society, 3524–9. Moscoso del Prado Martín, Fermín, Aleksandar Kostić, and R. Harald Baayen (2004). ‘Putting the bits together: An information-theoretical perspective on morphological processing’, Cognition 94(1): 1–18. Mufwene, Salikoko S. (2001). The Ecology of Language Evolution. Cambridge: Cambridge University Press. Mufwene, Salikoko S. (2008). Language Evolution: Contact, Competition, and Change. London: Continuum Press. Mufwene, Salikoko S. (2009). ‘Restructuring, hybridization, and complexity in language evolution’, in Enoch O. Aboh and Norval Smith (eds), Complex Processes in New Languages. Amsterdam: John Benjamins, 367–400. Mufwene, Salikoko S., François Pellegrino, and Christophe Coupé (eds) (2017). Complexity in Language: Developmental and Evolutionary Perspectives. Cambridge: Cambridge University Press. Mugdan, Joachim (1994). ‘Morphological units’, in Ron Asher (ed.), The Encyclopedia of Language and Linguistics. Oxford: Pergamon Press, 2543–53. Mühlhäusler, Peter (1997). Pidgin and Creole Linguistics. London: University of Westminster. Mukarovsky, Hans (1977). A Study of Western Nigritic, vol. I. Wien: Institut für Ägyptologie und Afrikanistik der Universität Wien. Müller, Neele (2013). Tense, Aspect, Modality, and Evidential Marking in South American Indigenous Languages. Utrecht: LOT. Munro, Pamela and Dieynaba Gaye (1997). Ay Baati Wolof: A Wolof Dictionary. Revised ed. Los Angeles: Department of Linguistics, UCLA. Muysken, Pieter C., Harald Hammarström, Joshua Birchall, Swintha Danielsen, Love Eriksen, Ana Vilacy Galucio, Rik van Gijn, Simon van de Kerke, Vishnupraya Kolipakam, Olga Krasnoukhova, Neele Müller, and Loretta O’Connor (2014).
‘The languages of South America: Deep families, areal relationships, and language contact’, in Loretta O’Connor and Pieter C. Muysken (eds), The Native Languages of South America. Cambridge: Cambridge University Press, 299–322. Myers-Scotton, Carol (2002). Contact Linguistics: Bilingual Encounters and Grammatical Outcomes. Oxford: Oxford University Press. Nakagawa, Shinichi and Holger Schielzeth (2013). ‘A general and simple method for obtaining R² from generalized linear mixed-effects models’, Methods in Ecology and Evolution 4(2): 133–42. Nash, David (1980). Topics in Warlpiri Grammar. Massachusetts Institute of Technology PhD dissertation. Ndiaye, Moussa D. (2004). Eléments de morphologie du wolof. Méthodes d’analyse en linguistique. München: LINCOM Europa. Nettle, Daniel (2012). ‘Social scale and structural complexity in human languages’, Philosophical Transactions of the Royal Society B: Biological Sciences 367(1597): 1829–36. doi:10.1098/rstb.2011.0216 Neubauer, Kathleen and Harald Clahsen (2009). ‘Decomposition of inflected words in a second language: An experimental study of German participles’, Studies in Second Language Acquisition 31(3): 403–35. doi:10.1017/S0272263109090354
Newmeyer, Frederick J. and Laurel B. Preston (eds) (2014). Measuring Grammatical Complexity. Oxford: Oxford University Press. Nichols, Johanna (1986). ‘Head-marking and dependent-marking grammar’, Language 62(1): 56–119. Nichols, Johanna (1992). Linguistic Diversity in Space and Time. Chicago: University of Chicago Press. Nichols, Johanna (2003). ‘Diversity and stability in language’, in Brian D. Joseph and Richard Janda (eds), The Handbook of Historical Linguistics. Oxford: Blackwell, 283–310. Nichols, Johanna (2005). ‘The origin of the Chechen and Ingush: A study in alpine linguistic and ethnic geography’, Anthropological Linguistics 46: 129–55. Nichols, Johanna (2009). ‘Linguistic complexity: A comprehensive definition and survey’, in Geoffrey Sampson, David Gil, and Peter Trudgill (eds), Language Complexity as an Evolving Variable. Oxford: Oxford University Press, 110–25. Nichols, Johanna (2013). ‘The vertical archipelago: Adding the third dimension to linguistic geography’, in Peter Auer, Martin Hilpert, Anja Stukenbrock, and Benedikt Szmrecsanyi (eds), Space in Language and Linguistics. Berlin: Mouton de Gruyter, 38–60. Nichols, Johanna (2015). ‘Complexity as non-canonicality: An affordable, reliable metric for morphology’. Paper given at the 48th annual meeting of the Societas Linguistica Europaea (SLE), Leiden. Nichols, Johanna (2016). ‘Complex edges, transparent frontiers: Grammatical complexity and language spreads’, in Raffaela Baechler and Guido Seiler (eds), Complexity, Isolation, and Variation. Berlin: de Gruyter, 117–37. Nichols, Johanna (2017). ‘Person as an inflectional category’, Linguistic Typology 21(3): 387–456. doi:10.1515/lingty-2017-0010 Nichols, Johanna (2019). ‘Why is gender so complex? Some typological considerations’, in Francesca Di Garbo, Bruno Olsson, and Bernhard Wälchli (eds), Grammatical Gender and Linguistic Complexity, vol. I: General Issues and Specific Studies. Berlin: Language Science Press, 63–92.
Nichols, Johanna (in prep.). The languages of the Great Caucasus range. Nichols, Johanna, Jonathan Barnes, and David A. Peterson (2006). ‘The robust bell curve of morphological complexity’, Linguistic Typology 10(1): 96–106. Nichols, Johanna and Christian Bentz (2018). ‘Morphological complexity of languages reflects the settlement history of the Americas’, in Katerina Harvati, Gerhard Jäger, and Hugo Reyes-Centeno (eds), New Perspectives on the Peopling of the Americas. Tübingen: Kerns, 13–26. Nichols, Johanna and Yury Lander (2020). ‘Head-dependent marking’, in Mark Aronoff (ed.), Oxford Research Encyclopedia of Linguistics. New York: Oxford University Press. doi:10.1093/acrefore/9780199384655.013.523 Njie, Codu Mbassy (1982). Description syntaxique du wolof de Gambie. Dakar: Nouvelles Editions africaines. Nordlinger, Rachel (2011). ‘Transitivity in Murrinh-Patha’, Studies in Language 35(3): 702–34. doi:10.1075/sl.35.3.08nor Nordlinger, Rachel (2015). ‘Inflection in Murrinh-Patha’, in Matthew Baerman (ed.), The Oxford Handbook of Inflection. Oxford: Oxford University Press, 491–519. Nordlinger, Rachel (2017). ‘The languages of the Daly River region (Northern Australia)’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 782–807.
Nordlinger, Rachel and Patrick Caudal (2012). ‘The tense, aspect and modality system in Murrinh-Patha’, Australian Journal of Linguistics 32(1): 73–112. doi:10.1080/07268602.2012.657754 Norman, Jerry (1988). Chinese. Cambridge: Cambridge University Press. Nurse, Derek (2007). ‘Did the proto-Bantu verb have a synthetic or an analytic structure?’, SOAS Working Papers in Linguistics 15: 239–56. Nurse, Derek (2008). Tense and Aspect in Bantu. New York: Oxford University Press. O’Connor, Catherine, Joan Maling, and Barbora Skarabela (2013). ‘Nominal categories and the expression of possession: A cross-linguistic study of probabilistic tendencies and categorial constraints’, in Kersti Börjars, David Denison, and Alan Scott (eds), Morphosyntactic Categories and the Expression of Possession. Amsterdam: John Benjamins, 89–121. Olawsky, Knut (2006). A Grammar of Urarina. Berlin: Mouton de Gruyter. Ospina Bozzi, Ana María (2002). Les structures élémentaires du Yuhup Maku, langue de l’Amazonie Colombienne: Morphologie et syntaxe. Université Paris 7—Denis Diderot PhD dissertation. Öztürk, Balkız and Markus A. Pöchtrager (2011). Pazar Laz. München: LINCOM Europa. Paauw, Scott (2007). ‘A North Papua linguistic area?’. Paper given at the ‘Workshop on the Languages of Papua’, Manokwari. Parker, Jeff (2016). Inflectional Complexity and Cognitive Processing: An Experimental and Corpus-Based Investigation of Russian Nouns. The Ohio State University PhD dissertation. Parker, Jeff, Robert Reynolds, and Andrea D. Sims (to appear). ‘The role of language-specific network properties in the emergence of inflectional irregularity’, in Andrea D. Sims, Adam Ussishkin, Jeff Parker, and Samantha Wray (eds), Morphological Typology and Linguistic Cognition. Cambridge: Cambridge University Press. Parkvall, Mikael (2008). ‘The simplicity of creoles in cross-linguistic perspective’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change.
Amsterdam: John Benjamins, 265–85. Payne, Doris L. (1990). ‘Morphological characteristics of lowland South American languages’, in Doris L. Payne (ed.), Amazonian Linguistics: Studies in Lowland South American Languages. Austin, TX: University of Texas Press, 213–41. Payne, Doris L. (2007). ‘Source of the Yagua nominal classification system’, International Journal of American Linguistics 73(4): 447–74. doi:10.1086/523773 Payne, John (2013). ‘The oblique genitive in English’, in Kersti Börjars, David Denison, and Alan Scott (eds), Morphosyntactic Categories and the Expression of Possession. Amsterdam: John Benjamins, 178–92. Payne, Thomas (1997). Describing Morphosyntax. Cambridge: Cambridge University Press. Perrin, Loïc-Michel (2012). L’expression du temps en wolof—langue atlantique parlée au Sénégal. Köln: Köppe. Perrott, D. V. (1950). Teach Yourself Swahili. New York: Random House. Pienemann, Manfred (1998). Language Processing and Second Language Development: Processability Theory. Amsterdam: John Benjamins. Pinheiro, José C. and Douglas M. Bates (2000). Mixed-Effects Models in S and S-PLUS. New York: Springer. Pinker, Steven and Alan Prince (1988). ‘On language and connectionism: Analysis of a parallel distributed processing model of language acquisition’, Cognition 28: 73–193. doi:10.1016/0010-0277(88)90032-7
Pirrelli, Vito (2000). Paradigmi in morfologia. Un approccio interdisciplinare alla flessione verbale dell’italiano. Pisa: Istituti Editoriali e Poligrafici Italiani. Pirrelli, Vito, Marcello Ferro, and Claudia Marzi (2015). ‘Computational complexity of abstractive morphology’, in Matthew Baerman, Dunstan Brown, and Greville Corbett (eds), Understanding and Measuring Morphological Complexity. Oxford: Oxford University Press, 141–66. Plag, Ingo (2003a). ‘Introduction: The morphology of creole languages’, in Geert Booij and Jaap van Marle (eds), Yearbook of Morphology 2002. Alphen aan den Rijn: Kluwer, 1–2. doi:10.1007/0-306-48223-1_1 Plag, Ingo (2003b). Phonology and Morphology of Creole Languages. Tübingen: Niemeyer. Plag, Ingo (2008). ‘Creoles as interlanguages: Inflectional morphology’, Journal of Pidgin and Creole Languages 23: 114–35. doi:10.1075/jpcl.23.1.06pla Plank, Frans (1986). ‘Paradigm size, morphological typology, and universal economy’, Folia Linguistica 20(1–2): 29–48. doi:10.1515/flin.1986.20.1-2.29 Pozdniakov, Konstantin (1993). Sravnitel’naja grammatika atlantičeskich jazykov. Moscow: Nauka. Pozdniakov, Konstantin (2015). ‘Diachronie des classes nominales atlantiques. Morphonologie, morphologie, sémantique’, in Denis Creissels and Konstantin Pozdniakov (eds), Les classes nominales dans les langues atlantiques. Köln: Köppe, 57–102. Pozdniakov, Konstantin and Stéphane Robert (2015). ‘Les classes nominales en wolof. Fonctionnalités et singularités d’un système restreint’, in Denis Creissels and Konstantin Pozdniakov (eds), Les classes nominales dans les langues atlantiques. Köln: Köppe, 545–628. Prasada, Sandeep and Steven Pinker (1993). ‘Generalisation of regular and irregular morphological patterns’, Language and Cognitive Processes 8(1): 1–56. doi:10.1080/01690969308406948 Pye, Br John MSC (1972). The Port Keats Story. Darwin: Colemans. Rambaud, Jean-Baptiste (1898).
‘De la détermination en wolof’, Bulletin de la Société de Linguistique de Paris 10: 122–36. [Reprinted in Gabriel Manessy and Serge Sauvageot (eds) (1963). Wolof et Sérèr. Études de phonétique et de grammaire descriptive. Dakar: University of Dakar Press, 11–24.] Reid, Nicholas (1990). Ngan’gityemerri: A Language of the Daly River Region, Northern Territory of Australia. Australian National University PhD dissertation. Reintges, Chris (2015). ‘Increasing morphological complexity and how syntax drives morphological change’, in Theresa Biberauer and George Walkden (eds), Syntax Over Time: Lexical, Morphological, and Information-Structural Interactions. Oxford: Oxford University Press, 124–45. Rescher, Nicholas (1998). Complexity: A Philosophical Overview. New Brunswick, NJ: Transaction Publishers. Rhodes, Richard (1987). ‘Paradigms large and small’, Proceedings of the 13th Annual Meeting of the Berkeley Linguistics Society. Berkeley, CA: Berkeley Linguistics Society, 223–34. Rice, Keren (2011). ‘Principles of affix ordering: An overview’, Word Structure 4(2): 169–200. doi:10.3366/word.2011.0009 Roberts, Ian (1999). ‘Verb movement and markedness’, in Michel DeGraff (ed.), Language Change: Creolization, Diachrony, and Development. Cambridge, MA: The MIT Press, 287–328.
Roberts, Sarah J. and Joan Bresnan (2008). ‘Retained inflectional morphology in pidgins: A typological study’, Linguistic Typology 12(2): 269–302. doi:10.1515/LITY.2008.039 Roberts, Seán (2018). ‘Chield: Causal hypotheses in evolutionary linguistics database’, in Christine Cuskley, Molly Flaherty, Hannah Little, Luke McCrohon, Andrea Ravignani, and Tessa Verhoef (eds), The Evolution of Language: Proceedings of the 12th International Conference (EVOLANG12). doi:10.12775/3991-1.099 Robins, R. H. (1958). The Yurok Language: Grammar, Texts, Lexicon. Berkeley, CA: University of California Press. Romaine, Suzanne (1988). Pidgin and Creole Languages. London: Longman. Rottet, Kevin J. (1992). ‘Functional categories and verb movement in Louisiana creole’, Probus 4: 261–89. doi:10.1515/prbs.1992.4.3.261 Russell, Kevin (1999). ‘What’s with all these long words anyway?’, in Leora Bar-El, Rose-Marie Dechaine, and Charlotte Reinholtz (eds), Papers from the Workshop on Structure and Constituency in Native American Languages. Cambridge, MA: The MIT Press, 119–30. Sadock, Jerrold (2017). ‘The subjectivity of the notion of polysynthesis’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 99–114. Saffran, Jenny R., Richard N. Aslin, and Elissa L. Newport (1996). ‘Statistical learning by 8-month-old infants’, Science 274(5294): 1926–8. doi:10.1126/science.274.5294.1926 Sagot, Benoît and Géraldine Walther (2011). ‘Non-canonical inflection: Data, formalisation and complexity measures’, in Cerstin Mahlow and Michael Piotrowski (eds), Systems and Frameworks for Computational Morphology. Berlin: Springer, 23–45. doi:10.1007/978-3-642-23138-4_3 Samara, Anna, Kenny Smith, Helen Brown, and Elizabeth Wonnacott (2017). ‘Acquiring variation in an artificial language: Children and adults are sensitive to socially conditioned linguistic variation’, Cognitive Psychology 94: 85–114. doi:10.1016/j.cogpsych.2017.02.004
Sampson, Geoffrey, David Gil, and Peter Trudgill (eds) (2009). Language Complexity as an Evolving Variable. Oxford: Oxford University Press. Sapir, Edward (1921). Language: An Introduction to the Study of Speech. New York: Harcourt, Brace & Co. Sapir, J. David (1965). A Grammar of Diola–Fogny, a Language Spoken in the Basse-Casamance Region of Senegal. Cambridge: Cambridge University Press. Sapir, J. David (1971). ‘West Atlantic: An inventory of the languages, their noun class systems and consonant alternation’, in Thomas Sebeok (ed.), Current Trends in Linguistics, vol. VII: Linguistics in Sub-Saharan Africa. The Hague: Mouton, 44–112. Sauvageot, Serge (1965). Description synchronique d’un dialecte Wolof. Le parler du Dyolof. Dakar: Institut Français de l’Afrique Noire. Sauvageot, Serge (1967). ‘Note sur la classification nominale en baïnouk’, in Gabriel Manessy (ed.), La classification nominale dans les langues négro-africaines. Paris: CNRS, 225–36. Scalise, Sergio (1984). Morfologia lessicale. Padova: CLESP. Schiering, René, Balthasar Bickel, and Kristine Hildebrandt (2010). ‘The prosodic word is not universal, but emergent’, Journal of Linguistics 46: 657–710. doi:10.1017/S0022226710000216 Schlegel, Friedrich von (1808). Über die Sprache und Weisheit der Indier. Ein Beitrag zur Begründung der Alterthumskunde. Heidelberg: Mohr & Zimmer.
Schreuder, Robert and R. Harald Baayen (1997). ‘How simplex complex words can be’, Journal of Memory and Language 37: 118–39. doi:10.1006/jmla.1997.2510 Schwegler, Armin (2013). ‘Palenquero structure dataset’, in Susanne Maria Michaelis, Philippe Maurer, Martin Haspelmath, and Magnus Huber (eds), Atlas of Pidgin and Creole Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. URL: http://apics-online.info/contributions/48 Segerer, Guillaume (2010). ‘Isolates in Atlantic’. Paper given at the workshop ‘Language Isolates in Africa’, 4 December, Lyon. Seifart, Frank (2005). The Structure and Use of Shape-Based Noun Classes in Miraña (North West Amazon). Universiteit Nijmegen PhD dissertation. Seifart, Frank (2011). Bora Loans in Resígaro: Massive Morphological and Little Lexical Borrowing in a Moribund Arawakan Language. Cadernos de Etnolingüística, Série Monografias 2 [online publisher]. Seifart, Frank and Doris Payne (2007). ‘Nominal classification in the Northwest Amazon: Issues in areal diffusion and typological characterization’, International Journal of American Linguistics 73(4): 381–7. doi:10.1086/523770 Seuren, Pieter (1990). ‘Verb syncopation and predicate raising in Mauritian Creole’, Theoretical Linguistics 1(13): 804–44. doi:10.1515/ling.1990.28.4.809 Seuren, Pieter (1998). Western Linguistics: An Historical Introduction. Oxford: Blackwell. Seuren, Pieter and Herman Wekker (1986). ‘Semantic transparency as a factor in creole genesis’, in Pieter Muysken and Norval Smith (eds), Substrata versus Universals in Creole Genesis. Amsterdam: John Benjamins, 57–70. Shalizi, Cosma Rohilla (2001). ‘Causal architecture, complexity and self-organization in the time series and cellular automata’. University of Wisconsin-Madison PhD dissertation. Shannon, Claude E. (1948). ‘A mathematical theory of communication’, Bell System Technical Journal 27(3): 379–423. Shosted, Ryan (2006). 
‘Correlating complexity: A typological approach’, Linguistic Typology 10(1): 1–40. doi:10.1515/LINGTY.2006.001 Silva, Wilson de Lima (2012). A Descriptive Grammar of Desano. University of Utah PhD dissertation. Sims, Andrea D. (2015). Inflectional Defectiveness. Cambridge: Cambridge University Press. Sims, Andrea D. and Jeff Parker (2016). ‘How inflection class systems work: On the informativity of implicative structure’, Word Structure 9(2): 215–39. doi:10.3366/word.2016.0094 Sinnemäki, Kaius (2008). ‘Complexity trade-offs in core argument marking’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 67–88. Sinnemäki, Kaius (2011). Language Universals and Linguistic Complexity: Three Case Studies in Core Argument Marking. University of Helsinki PhD dissertation. Sinnemäki, Kaius (2014). ‘Global optimization and complexity trade-offs’, Poznań Studies in Contemporary Linguistics 50(2): 179–95. doi:10.1515/psicl-2014-0013 Smith, Kenny, Amy Perfors, Olga Fehér, Anna Samara, Kate Swoboda, and Elizabeth Wonnacott (2017). ‘Language learning, language use and the evolution of linguistic variation’, Philosophical Transactions of the Royal Society B 372(1711): 20160051. doi:10.1098/rstb.2016.0051 Smith, Kenny and Elizabeth Wonnacott (2010). ‘Eliminating unpredictable variation through iterated learning’, Cognition 116(3): 444–9. doi:10.1016/j.cognition.2010.06.004 Soubrier, Aude (2013). Description de l’ikposso uwi. Lyon: Université Lumière Lyon 2 dissertation.
Spencer, Andrew and Ana R. Luís (2012). Clitics: An Introduction. Cambridge: Cambridge University Press. Stahlke, Herbert (1970). ‘Serial verbs’, Studies in African Linguistics 1: 60–99. Štekauer, Pavol (2015). ‘The delimitation of derivation and inflection’, in Peter O. Müller, Ingeborg Ohnheiser, Susan Olsen, and Franz Rainer (eds), Word-Formation: An International Handbook of the Languages of Europe, vol. 1. Berlin: De Gruyter Mouton, 218–35. Stenzel, Kristine (2008). ‘Evidentials and clause modality in Wanano’, Studies in Language 32(2): 405–45. doi:10.1075/sl.32.2.06ste Stenzel, Kristine (2013a). A Reference Grammar of Kotiria (Wanano). Lincoln, NE: University of Nebraska Press. Stenzel, Kristine (2013b). ‘Contact and innovation in Vaupés possession-marking strategies’, in Patience Epps and Kristine Stenzel (eds), Cultural and Linguistic Interaction in the Upper Rio Negro Region. Rio de Janeiro: Museu do Índio-FUNAI, 353–402. Stenzel, Kristine and Elsa Gomez-Imbert (2009). ‘Contato linguístico e mudança linguística no noroeste amazônico: O caso do Kotiria (Wanano)’, Revista da ABRALIN 8: 71–100. Stewart, William Alexander and William W. Gage (1970). Notes on Wolof Grammar by William A. Stewart. Adapted by William W. Gage, in Dakar Wolof: A Basic Course prepared by Loren V. Nussbaum, William W. Gage, and Daniel Varre. Washington, DC: Center for Applied Linguistics, 355–412. Stilo, Donald (2019). ‘Loss vs. expansion of gender in Tatic languages: Kafteji (Kabatei) and Kelāsi’, in Alireza Korangy and Behrooz Mahmoodi-Bakhtiari (eds), Essays on Typology of Iranian Languages. Berlin: De Gruyter Mouton, 34–78. doi:10.1515/9783110604443-004 Stoll, Sabine, Balthasar Bickel, and Jekaterina Mažara (2017). ‘The acquisition of polysynthetic verb forms in Chintang’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 495–514. Stolz, Thomas (2012). 
‘Survival in a niche: On gender-copy in Chamorro (and sundry languages)’, in Martine Vanhove, Thomas Stolz, Aina Urdze, and Hitomi Otsuka (eds), Morphologies in Contact. Berlin: Akademie-Verlag, 93–140. Stolz, Thomas (2015). ‘Adjective-noun agreement in language contact’, in Francesco Gardani, Peter Arkadiev, and Nino Amiridze (eds), Borrowed Morphology. Berlin: Mouton de Gruyter, 269–301. Street, Chester (1987). An Introduction to the Language and Culture of the Murrinh-Patha. Darwin: Summer Institute of Linguistics. Stump, Gregory (2001). Inflectional Morphology: A Theory of Paradigm Structure. Cambridge: Cambridge University Press. Stump, Gregory (2006a). ‘Heteroclisis and paradigm linkage’, Language 82(2): 279–322. doi:10.1353/lan.2006.0110 Stump, Gregory (2006b). ‘Template morphology’, in Keith Brown (ed.), Encyclopedia of Language & Linguistics. 2nd ed. Oxford: Elsevier, 559–63. Stump, Gregory (2016). Inflectional Paradigms: Content and Form at the Syntax-Morphology Interface. Cambridge: Cambridge University Press. Stump, Gregory (2017). ‘The nature and dimensions of complexity in morphology’. Annual Review of Linguistics 3(1): 65–83. doi:10.1146/annurev-linguistics-011415-040752 Stump, Gregory and Raphael A. Finkel (2013). Morphological Typology: From Word to Paradigm. Cambridge: Cambridge University Press. Stump, Gregory and Raphael A. Finkel (2015). ‘Contrasting modes of representation for inflectional systems: Some implications for computing morphological complexity’, in
Matthew Baerman, Dunstan Brown, and Greville G. Corbett (eds), Understanding and Measuring Morphological Complexity. Oxford: Oxford University Press, 119–40. Syea, Anand (1992). ‘The short and long forms of verbs in Mauritian Creole: Functionalism versus formalism’, Theoretical Linguistics 18: 61–97. doi:10.1515/thli.1992.18.1.61 Sylla, Yero (1982). Grammaire moderne du Pulaar. Dakar: Nouvelles éditions africaines. Szmrecsanyi, Benedikt and Bernd Kortmann (2009). ‘The morphosyntax of varieties of English worldwide: A quantitative perspective’, Lingua 119(11): 1643–63. doi:10.1016/j.lingua.2007.09.016 Taft, Marcus (1979). ‘Recognition of affixed words and the word frequency effect’, Memory & Cognition 7(4): 263–72. doi:10.3758/BF03197599 Taft, Marcus (2004). ‘Morphological decomposition and the reverse base frequency effect’, The Quarterly Journal of Experimental Psychology 57(4): 745–65. doi:10.1080/02724980343000477 Taft, Marcus and Sam Ardasinski (2006). ‘Obligatory decomposition in reading prefixed words’, The Mental Lexicon 1(2): 183–99. doi:10.1075/ml.1.2.02taf Tallman, Adam (2018). A Grammar of Chácobo, a Southern Pano Language of the Northern Bolivian Amazon. University of Texas at Austin PhD dissertation. Tamba, Khady, Harold Torrence, and Malte Zimmermann (2012). ‘Wolof quantifiers’, in Edward Keenan and Denis Paperno (eds), Handbook of Quantification in Natural Language. New York: Springer, 891–939. Thiam, Ndiassé (1987). Les categories nominales en wolof. Aspects sémantiques. Dakar: Centre de linguistique appliquée de Dakar. Thomason, Sarah G. (2001). Language Contact: An Introduction. Washington, DC: Georgetown University Press. Thomason, Sarah G. (2008). ‘Pidgins/creoles and historical linguistics’, in Silvia Kouwenberg and John Victor Singler (eds), Handbook of Pidgin and Creole Languages. Malden, MA: Wiley-Blackwell, 242–62. Thomason, Sarah G. (2015).
‘When is the diffusion of inflectional morphology not dispreferred?’, in Francesco Gardani, Peter Arkadiev, and Nino Amiridze (eds), Borrowed Morphology. Berlin: Mouton de Gruyter, 27–46. Thomason, Sarah G. and Terence Kaufman (1988). Language Contact, Creolization, and Genetic Linguistics. Berkeley, CA: University of California Press. Thomaz, Luis Felípe (2002). Babel Loro Sa’e: O problema linguístico de Timor-Leste. Lisboa: Instituto Camões. Thornton, Anna M. (2005). Morfologia. Roma: Carocci. Thornton, Anna M. (2011). ‘Overabundance (multiple forms realizing the same cell): A non-canonical phenomenon in Italian verb morphology’, in Martin Maiden, John C. Smith, Maria Goldbach, and Marc-Olivier Hinzelin (eds), Morphological Autonomy: Perspectives from Romance Inflectional Morphology. Oxford: Oxford University Press, 359–82. Thornton, Anna M. (2019). ‘Overabundance: A canonical typology’, in Franz Rainer, Francesco Gardani, Wolfgang U. Dressler, and Hans Christian Luschützky (eds), Competition in Inflection and Word-Formation. Cham: Springer, 223–58. doi:10.1007/978-3-030-02550-2_9 Tily, Harry and T. Florian Jaeger (2011). ‘Complementing quantitative typology with behavioral approaches: Evidence for typological universals’, Linguistic Typology 15(2): 497–508. doi:10.1515/LITY.2011.033 Timberlake, Alan (2004). A Reference Grammar of Russian. Cambridge: Cambridge University Press.
Tinits, Peeter (2014). ‘Language stability and morphological complexity in situations of language contact: An experimental paradigm’, in 19th International Congress of Linguists Papers. Geneva: Département de Linguistique de l’Université de Genève. Tomasello, Michael (2000). ‘First steps in a usage-based theory of language acquisition’, Cognitive Linguistics 11: 61–82. doi:10.1515/cogl.2001.012 Tomasello, Michael (2006). ‘Acquiring linguistic constructions’, in Robert Siegler and Deanna Kuhn (eds), Handbook of Child Psychology. New York: Wiley, 1860–2010. Torrence, Harold (2013). The Clause Structure of Wolof: Insights into the Left Periphery. Amsterdam: John Benjamins. Tourneux, Henry and Maurice Barbotin (2009). Dictionnaire pratique du créole de Guadeloupe. Paris: Karthala. Tribout, Delphine (2012). ‘Verbal stem space and verb to noun conversion in French’, Word Structure 5: 109–28. doi:10.3366/word.2012.0022 Trudgill, Peter (1983). ‘Language contact and language change: On the rise of the creoloid’, in Peter Trudgill (ed.), On Dialect: Social and Geographical Perspectives. Oxford: Blackwell, 102–7. Trudgill, Peter (1997). ‘Typology and sociolinguistics: Linguistic structure, social structure and explanatory comparative dialectology’. Folia Linguistica 31(3–4): 349–60. doi:10.1515/flin.1997.31.3-4.349 Trudgill, Peter (1999). ‘Language contact and the function of linguistic gender’, Poznań Studies in Contemporary Linguistics 35: 133–52. Trudgill, Peter (2004a). ‘Linguistic and social typology: The Austronesian migrations and phoneme inventories’, Linguistic Typology 8(3): 305–20. doi:10.1515/lity.2004.8.3.305 Trudgill, Peter (2004b). ‘The impact of language contact and social structure on linguistic structure’, in Bernd Kortmann (ed.), Dialectology Meets Typology: Dialect Grammar from a Cross-Linguistic Perspective. Berlin: Mouton de Gruyter, 435–51. Trudgill, Peter (2009). 
‘Sociolinguistic typology and complexification’, in Geoffrey Sampson, David Gil, and Peter Trudgill (eds), Language Complexity as an Evolving Variable. Oxford: Oxford University Press, 98–109. Trudgill, Peter (2011). Sociolinguistic Typology: Social Determinants of Linguistic Complexity. Oxford: Oxford University Press. Trudgill, Peter (2017). ‘The anthropological setting of polysynthesis’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 186–202. Tuite, Kevin (1999). ‘The myth of the Caucasian Sprachbund: The case of ergativity’, Lingua 108(1): 1–29. doi:10.1016/S0024-3841(98)00037-0 Ullman, Michael T. (2001). ‘The declarative/procedural model of lexicon and grammar’, Journal of Psycholinguistic Research 30(1): 37–69. doi:10.1023/A:1005204207369 Ullman, Michael T. (2004). ‘Contributions of memory circuits to language: The declarative/procedural model’, Cognition 92(1–2): 231–70. doi:10.1016/j.cognition.2003.10.008 Valdman, Albert, Iskra Iskrova, and Benjamin Hebblethwaite (2007). Haitian Creole-English Bilingual Dictionary. Bloomington, IN: Indiana University Creole Institute. Valenzuela, Pilar (2003). Transitivity in Shipibo-Konibo Grammar: A Typologically Oriented Study. University of Oregon PhD dissertation. Valenzuela, Pilar (2010). ‘Applicative constructions in Shipibo-Konibo (Panoan)’, International Journal of American Linguistics 76: 101–44. doi:10.1086/652756 Vallejos Yopán, Rosa (2010). A Grammar of Kokama-Kokamilla. University of Oregon PhD dissertation.
van der Voort, Hein (2005). ‘Kwaza in comparative perspective’, International Journal of American Linguistics 71: 365–412. doi:10.1086/501245 van der Voort, Hein (2016). ‘Recursive inflection and grammaticalized fictive interaction in the Southwestern Amazon’, in Esther Pascual and Sergeiy Sandler (eds), The Conversation Frame: Forms and Functions of Fictive Interaction. Amsterdam: John Benjamins, 277–302. Van Engelenhoven, Aone (2004). Leti, a Language of Southwest Maluku. Leiden: KITLV Press. van Gijn, Rik and Fernando Zúñiga (2014). ‘Word and the Americanist perspective’, Morphology 24: 135–60. doi:10.5167/uzh-99717 Vanhove, Martine (2001). ‘Contacts de langues et complexification des systèmes: Le cas du maltais’, Faits de Langues 18: 65–74. Veenstra, Tonjes (2009). ‘Verb allomorphy and the syntax of phases’, in Enoch Aboh and Norval Smith (eds), Complex Processes in New Languages. Amsterdam: John Benjamins, 99–114. Veenstra, Tonjes and Angelika Becker (2003). ‘The survival of inflectional morphology in French-related creoles’, Studies in Second Language Acquisition 25: 285–306. doi:10.1017/S0272263103000123 Villoing, Florence and Maxime Deglas (2016). ‘La formation de verbes dénominaux en guadeloupéen. La part de l’héritage et de l’innovation’, 5ème Congrès Mondial de Linguistique Française 2016, Tours, France. doi:10.1051/shsconf/20162708004 Wälchli, Bernhard (2017). ‘The incomplete story of feminine gender loss in Northwestern Latvian dialects’, Baltic Linguistics 8: 143–214. Wälchli, Bernhard (2018). ‘The rise of gender in Nalca (Mek, Tanah Papua): The drift towards the canonical gender attractor’, in Sebastian Fedden, Jenny Audring, and Greville Corbett (eds), Non-Canonical Gender Systems. Oxford: Oxford University Press, 68–99. Walsh, Michael (1976). The Murinypata Language of North-West Australia. Australian National University PhD dissertation. Walther, Géraldine (2017). ‘Paradigm realisation and the lexicon’, in Ferenc Kiefer, James P. 
Blevins, and Huba Bartos (eds), Perspectives on Morphological Organization: Data and Analyses. Leiden: Brill, 159–99. Weinreich, Uriel, William Labov, and Marvin Herzog (1968). ‘Empirical foundations for a theory of language change’, in Winfred Philip Lehmann and Yakov Malkiel (eds), Directions for Historical Linguistics. Austin, TX: University of Texas Press, 95–198. Wells, Rulon (1954). ‘Archiving and language typology’, International Journal of American Linguistics 20(2): 101–7. Wichmann, Søren and Eric W. Holman (2009). Temporal Stability of Linguistic Typological Features. München: LINCOM Europa. Wilson, William André Auquier (1989). ‘Atlantic’, in John Theodore Bendor-Samuel (ed.), The Niger-Congo Languages: A Classification and Description of Africa’s Largest Language Family. Lanham, MD: University Press of America, by arrangement with the Summer Institute of Linguistics (SIL), 81–104. Wilson, William André Auquier (2007). Guinea Languages of the Atlantic Group. Frankfurt am Main: Peter Lang. Wise, Mary Ruth (1971). Identification of Participants in Discourse: A Study of Aspects of Form and Meaning in Nomatsiguenga. Norman, OK: Summer Institute of Linguistics of the University of Oklahoma.
Wise, Mary Ruth (1990). ‘Valence-changing affixes in Maipuran Arawakan languages’, in Doris Payne (ed.), Amazonian Linguistics: Studies in Lowland South American Languages. Austin, TX: University of Texas Press, 89–116. Wise, Mary Ruth (2002). ‘Applicative affixes in Peruvian Amazonian languages’, in Mily Crevels, Simon van de Kerke, Sérgio Meira, and Hein van der Voort (eds), Current Studies on South American Languages: Selected Papers from the 50th International Congress of Americanists in Warsaw and the Spinoza Workshop on Amerindian Languages in Leiden, 2000. Leiden: Research School of Asian, African, and Amerindian Studies (CNWS), 329–44. Wittmann, Henri and Robert Fournier (1987). ‘Interpretation diachronique de la morphologie verbale du créole réunionnais.’ Revue québecoise de linguistique 6(2): 137–50. Woodbury, Anthony (2017). ‘Central Alaskan Yupik (Eskimo-Aleut): A sketch of morphologically orthodox polysynthesis’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 536–60. Wray, Alison and George W. Grace (2007). ‘The consequences of talking to strangers: Evolutionary corollaries of socio-cultural influences on linguistic form’, Lingua 117(3): 543–78. doi:10.1016/j.lingua.2005.05.005 Wurzel, Wolfgang U. (1989). Inflectional Morphology and Naturalness. Dordrecht: Kluwer. Xanthos, Aris, Sabine Laaha, Steven Gillis, Ursula Stephany, Ayhan Aksu-Koç, Anastasia Christofidou, Natalia Gagarina, Gordana Hrzica, F. N. Ketrez, Marianne Kilani-Schoch, Katharina Korecky-Kröll, Melita Kovačević, Klaus Laalo, Marijan Palmović, Barbara Pfeiler, Maria D. Voeikova, and Wolfgang U. Dressler (2011). ‘On the role of morphological richness in the early development of noun and verb inflection’, First Language 31(4): 461–79. doi:10.1177/0142723711409976 Yarshater, Ehsan (1969). A Grammar of Southern Tati Dialects. The Hague: Mouton. Zaliznjak, Andrei A. (1967).
Russkoe imennoe slovoizmenenie. Moscow: Nauka. Zaliznjak, Andrei A. (1977). Grammatičeskij slovar’ russkogo jazyka. Moscow: Russkij jazyk. Zúñiga, Fernando (2017). ‘On the morphosyntax of indigenous languages of the Americas’, International Journal of American Linguistics 83(1): 111–39. doi:10.1086/689548 Zwitserlood, Inge (2003). ‘Word formation below and above little x: Evidence from sign language of the Netherlands’, Nordlyd 31(2): 488–502.
OUP CORRECTED AUTOPAGE PROOFS – FINAL, 23/8/2020, SPi
Language Index Abkhaz-Adyghean languages, see West Caucasian languages Abun 268 Acoma 190 Aghul 208, 209, 213, 226 Aikanã 239, 241 Ainu 166–7, 190 Akha 274 Albanian 189, 273 Aleut 190 Algic languages 174, 190, 191, 216 Andoke 242 Apurinã 238, 241 Arabela 246 Araucanian languages, see Mapudungun Arawakan languages 237–9, 241, 243–6, 248, 254 Archi 180, 184, 189, 226 Ashéninka Perené 248, 255, 259 Athabaskan languages 190 Atlantic languages 16, 136–60, 188, 273, 280, 303 Atlantic-Congo languages, see also Niger-Congo languages 196, 197, 214, 218, 223 Austroasiatic languages 206, 226, 278 Austronesian languages 110, 190, 197, 205, 211, 228, 268–9, 280 Avar 170, 171, 180, 182–3, 185, 189 Aymara 191 Aymaran 191 Bagnoun, Baïnounk, Bainuk, Banyun 137, 140, 148 Baïnounk Gubaher 140, 148, 155, 156 Baïnounk Gunyamolo 148, 155 Balto-Slavic languages 176, 177, 198, 208, 213, 224 Bantu languages 113, 114, 169, 171, 173, 196–7, 198, 207, 216, 217–19, 223, 267, 273, 275, 280–1 Bardi 174, 190 Basque 189, 198, 205, 208, 213, 215 Lekeitio 205, 208, 213, 215–16, 220, 224 Standard 224 Benue-Congo languages 171, 189 Berber 237
Bilinarra 86–7 Bininj Gun-Wok (BGW) 171, 190 Bislama 86 Bodic languages 198, 228 Bora 236, 239 Boran languages 238, 239 Bulgarian 173, 180, 184, 189 Bunuban languages 190 Buy/Nyun 137 Cariban languages 238 Cavineña 248, 250, 253, 254, 258, 259, 261 Cayuvava 191 Central Alaskan Yup’ik (CAY) 190, 248, 250–2, 254–5, 258, 259, 261, 262 Central Malayo-Polynesian languages 274 Central Pomo 308–16, 322, 326–7 Chácobo 239, 240–1, 248, 250, 254–5, 257–61, 263 Chamorro 197, 198, 205, 211–13, 215, 228 Chayahuita 246 Chimariko 191 Chinese, Mandarin, see Mandarin Chinese Chinook Jargon 280 Chinookan languages 191 Chintang 13 Chiquihuitlán Mazatec 30 Chukchi 190 Chukchi-Kamchatkan languages 190 Chuvash 190 Common Slavic 28 Cree 190, 216–17, 228 Cubeo 238 Cupeño 180, 184–5, 191 Cushitic languages 189 Dahalo 189 Diola-Fogny 140, 155 Diyari 190, 191 Djingulu 190 Dogon languages 189 Eastern Pomo 190 Eipo 198, 228 Elfdalian, see Swedish, Elfdalian
English 13, 18, 26, 56, 81, 84, 85, 87, 108, 110, 125, 163, 166, 170, 171, 196, 208, 213, 225, 267, 271, 274, 276, 277, 279–80, 303, 310–11, 316, 320, 326, 332, 336 African-American Vernacular 279 Middle 85 Old 52, 74, 271, 274 Eshtehardi 220, 227 Eskimo-Aleut languages 190, 248, 254 Even 190 Evenki 190
Finnish 170, 171, 189, 213, 227 Fongbe 273, 277 French 16, 25–26, 33, 74, 105–6, 110–17, 119–20, 122–4, 127–8, 130–5, 160, 216–17, 228, 270, 272, 276, 279 Cajun 111 Medieval 134 Norman 85 French-based creoles 16, 105–6, 110, 113–14, 116–18, 120 Fula 137, 140, 144, 145–50, 152, 153, 159, 188 Fuuta-Jaloo Pular 140, 148 Gombe 145–6, 153 Fur 189
Gbe languages 213, 268, 271, 279 German 14, 171–2, 173, 184, 189, 199 Germanic languages 85, 170, 189, 198, 273 North 200, 203, 208, 213, 227 West 213 Ghana-Togo-Mountain languages 198, 213–14, 223–4 Godoberi 189 Gooniyandi, see also Kuniyanti 90 Greek 23, 30, 32, 62, 189, 198, 208, 210, 213, 225, 338 Asia Minor dialects 210, 215, 225 Cappadocian 202–3, 208, 210–13, 215, 225 Pontic 202, 204, 225 Rumeic 225 Standard Modern 202, 210–11, 225 Guadeloupean Creole 16, 106, 116–18, 120, 124–32, 134–5 Gullah Creole English 279 Gunwingguan (Gunwinyguan) languages 171, 190, 198, 224 Central 224 Gurindji 12, 82–3, 87–8, 102–3 Gurindji Kriol 12, 16, 81–3, 86–103, 343
Haitian Creole 16, 106, 113, 114, 117–18, 120, 131–5, 270, 272, 279–80 Haitian Creole English 279 Haro 189 Hinuq 180, 182–3, 189 Hopi 180, 184–5, 191 Huallaga Quechua 191 Hungarian 84, 189 Hunzib 180, 183, 189 Hup 232, 238, 240, 242–4, 246, 248, 250, 254–5, 258–9, 260–1, 263 Hupa 190
Icari 189 Icelandic 33, 276 Igo 213–15, 223 Ikposo 223 Indo-European languages 2, 106, 169, 171, 174, 178, 182, 186, 189, 193, 196, 200, 202, 203, 207, 210, 216, 224–5, 227, 230, 273, 276, 278 Indo-Portuguese creoles 113 Ingush 171, 177, 189 Insular Celtic languages 198, 213, 225 Inuit 13 Iranian languages 198, 227, 272 Northwestern 201, 208, 220, 227 Southwestern 220 Irish 208, 213, 215, 225–6, 276 Ros Much 226 Iroquoian languages 190 Northern 316 Italian 84, 147, 196, 275, 276, 284 Itelmen 190 Iwaidjan languages 168, 190
Jaminjung 87 Jamsay 189 Jamul Tiipay 191 Jangshung 205, 208, 209, 228 Jaqaru 191 Jarawara 243, 248, 250, 252, 254, 255, 258, 259–62 Jaru 87 Juu languages 189 Kabardian 189 Kafteji 198, 201–2, 208, 220–1, 227 Kakua 239, 244 Kamayurá 244 Kanoe 239, 246 Karata 189 Karo 242, 246 Karok 191
Karrangpurru 87 Kartvelian languages 173, 189, 213, 273, 275 Kashibo-Kakataibo 191 Kelasi 198, 201–2, 208, 220–1, 227 Keresan languages 190 Ket 190, 191 Khanty 189, 190 Khasi 206, 226 Khasian languages 198, 226 Khinalug 177, 189 Kikongo 280 Kiowa 190 Kiowa-Tanoan languages 190 Klamath 191 Klamath-Sahaptian languages 191 Koasati 191 Koiari 190 Kokama-Kokamilla 241, 248, 254, 255, 258–9, 261 Kotiria 248, 250, 254, 255, 258–9, 261 Kriol 12, 82, 83, 87–8, 102, 103 Kundjeyhmi 224 Kune 224 Kuniyanti, see also Gooniyandi 190 Kunwinjku 224 Kuuk Thaayorre 90 Kwa languages 213–14 Kwaza 167–8, 191, 239, 246 Lak 180, 183, 189 Lakhota 190 Lango 188, 191 Latin 56, 74, 109, 142, 169, 171 Latvian 212, 224 Tamian 203, 208, 212, 213 Leti 274 Lezgi 177, 180, 184, 189, 209 Lezgic (Lezgian) languages 180, 183, 198, 208, 213, 226 Light Warlpiri 89 Lingala 218 Kinshasa 218–19, 223 Makanza 207, 216, 218–19, 222, 223 Lithuanian 2–4, 6, 10, 189, 284 Lower Sepik languages 190 Luganda 169, 171, 189 Lyngngam 206, 226 Madang languages 190 Maidu 191 Malngin 87 Manchu 190, 191 Mandarin Chinese 168, 169, 175, 190, 191, 267, 270, 276, 277–8, 341
Mande languages 148, 269 Mandinka 137 Mapudungun 191, 232 Mari 189 Marri Ngarr 53 Marri Tjevin 53 Matses 242–3 Mauritian Creole 16, 106, 110, 112–14, 116–18, 120–5, 128, 131, 134–5 Mawng 168, 190 Mayan languages 191 Mazatec 30, 62 Mek languages 198, 228 Mian 167–8, 173, 187, 190 Michif 197, 198, 207, 216–17, 228 Mindi languages 190 Miwokan languages 191 Mohawk 316–20, 322–5, 326–7 Mongolian 190 Mongolic languages 190 Mordvin 189 Movima 191, 236–7, 248, 250, 254–5, 258, 259, 261, 262 Mudburra 87 Murrinhpatha 15, 52–80, 84 Muskogean languages 191 Nakh-Daghestanian languages 169, 170, 171, 173–4, 176, 177, 181–3, 189, 209, 226 Nalca 198, 228 Nama 171, 189 Nambikwara 239, 242 Nanai 190 Nanti 243–4 Nez Perce 191 Ngaliwurru 87 Nganasan 190 Ngarinyman 86, 87 Niger-Congo languages 110, 136–8, 140–2, 143, 148, 155, 193, 267–9, 273, 303 Niger-Kordofanian languages, see Niger-Congo languages Nilotic languages 188 Nivkh 142, 190 Nomatsigenga 244–5 Nubi Creole Arabic 280 Nupe 268, 271 Nuuchahnulth 191 Ñuun, see also Bagnoun 137, 140, 144, 147 Nyulnyulan languages 174, 190 Ok languages 167, 173, 190 Omotic languages 189 Ossetic 173, 189
Paez 191 Paiwan 190 Palenquero 110, 280 Pama-Nyungan languages 15, 87, 190 Panoan languages 191, 239–41, 242–4, 248, 254, 257, 259, 260 Paresi 241, 245–6, 248, 250, 254, 255, 258, 259, 261 Pazar Laz 173, 189 Pilagá 239 Pipil 180, 184–5, 191 Pnar 206, 226 Pomoan languages 190, 308, 310 Portuguese 113, 241, 280 Pular, see Fula Quechuan languages 191 Romance languages 196, 208, 213, 216, 273, 275, 276 Romanian 173, 189 Rongga 268, 270, 273, 274, 276 Russian 15, 23, 25, 27–32, 34–51, 169, 171, 172, 178, 180, 184, 189, 191, 270, 291 Saami 62 Kildin 189 Skolt 175, 178, 191 Salish languages 190 Seereer see Seereer-Siin Seereer-Siin 137, 144–5, 149, 150, 159 Seneca 190 Seri 62, 66 Shipibo-Konibo 239–40, 242, 244 Shumcho 198, 205, 208, 209, 213, 228 Siin-Gandum, see also Seereer-Siin 144 Sinitic languages 110, 268–9, 278 Sino-Tibetan languages 190, 274 Siouan languages 190 Slovene 173, 178, 180, 184, 189, 191 Somali 189 Sorbian 178, 184, 189 Lower 180, 191 Southern Sierra Miwok 191 Spanish 211–13, 215, 220, 225, 228, 280, 310 Sranan Creole English 279–80 Svan 189, 275 Swahili 142, 273–4, 275 Swedish 203, 204 Elfdalian 201, 227 Karleby 203, 208, 213, 215, 227 Standard 200–1, 203–4, 227 Sεlεε 213, 223
Tamambo 86 Tariana 237, 238, 241, 244, 246, 248, 250, 253–5, 258–9, 261 Tatuyo 236 Tawala 190 Thompson 190 Tibeto-Burman languages 228 Tindi 189 Tok Pisin 6 Trans New Guinea languages 228 Tsakhur 180, 183–4, 189 Tukanoan languages 236, 238–9, 241, 242, 248 Tümpisa Shoshone 180, 184–5, 191 Tundra Nenets 190 Tungusic languages 176, 177, 190 Turkic languages 190, 208, 209, 213, 220 Turkish 2–3, 7, 10, 141, 142, 147, 210–13, 225, 284, 342 Tzutujil 191 Udehe 190 Udi 180, 184, 189, 208, 209, 213, 226 Uralic languages 176–8, 189–90, 275, 278 Urarina 248, 250, 252, 254–5, 258–9, 261 Usan 190 Uto-Aztecan languages 176, 177, 180, 184–5, 191 Wakashan languages 191 Wappo 191 Wari’ 241 Warlpiri 54–6, 57, 87 West Caucasian languages 189 Wichí 239 Wishram 191 Witotoan languages 238 Wolof 16, 136–41, 143–4, 148–60, 270, 273, 303 Mbakke 136, 143 Xamatauteri Yanomami 244 Yagua 167, 238–9, 246 Yakut 190 Yanesha’ 241 Yeniseian languages 190, 278 Yimas 190 Yokuts 191 Yoruba 6, 268, 270, 276, 279 Yuhup 238 Yukagir 190 Yuki-Wappo languages 191 Yuman languages 191 Yurok 174, 191 Zuni 190
Subject Index abstractive (models, frameworks, perspectives) 326–7 acquisition 12, 13–14, 17, 53, 57, 75, 288, 303, 311, 326–7 first language, L1 13, 61, 323–5 native, see acquisition, first language, L1 non-native, see acquisition, second language, L2, adult second language, L2, adult 17, 111–12, 114, 267–82, 286, 326 actualization 92, 96, 101 adult acquisition, see acquisition, second language, L2, adult agent 90 agglutinative, agglutinating morphology 3, 137, 141, 141–2, 143, 144–8, 158, 234, 255 agreement 173–4, 193–228, 236, 287, 288, 291–8, 303 default 139 redistribution of 200–4 subject-verb 284 agreement targets 151–8 algorithmic information content 331 alignment 88 allomorphy 3, 7, 8, 9, 54–6, 57, 58, 59, 61–6, 68–70, 72, 75, 89, 110, 148, 149, 170, 172–3, 188, 230, 234, 247, 251, 252–3, 255, 261, 317, 326, 327 Amazonian languages 17, 167, 230–63 analogy 16, 26, 27, 52–4, 57, 61, 67, 70, 71–4, 75, 326 analyticity 17, 110, 267–82 Andean languages 231, 246 animacy 38, 39, 85, 90, 91, 92, 95, 96, 172, 174, 197, 199, 201–4, 205, 213, 214, 217, 218, 219, 238 argument relations 88, 90, 93, 102, 103 autonomous (or pure) morphology 6–7, 18, 24, 119, 147, 230–1, 235, 247–51, 255, 256–62 auxiliary 101 average conditional entropy, see entropy bias amplification 304 bilingualism 193, 210, 211, 214, 215, 220, 222, 307, 308, 311
biuniqueness 9, 54, 164, 230, 234, 247, 253, 254, 262, 341–2 borrowing 12, 16, 127, 160, 194, 205, 209, 212, 215, 222, 233, 238–9, 246, 273 bound status 235, 248, 256, 257–8, 262, 263 canonical typology 108, 340–1 canonicality 163–92 canonicity 9, 10, 16, 24, 163–4, 236, 238, 340–1 case 2–3, 82–3, 87–90, 163, 166, 171–2, 174, 175, 184, 246, 272–3, 274, 286, 343 Caucasus 171, 176, 177, 178, 180, 181, 182, 183, 184 Chaco region 238, 239 Circum-Baltic area 176 class prefixation 151 classifier stem 53, 59–75 classifiers 61–3, 167–8, 169, 236–9 numeral 270, 277 closed classes 52, 53, 59, 61, 66, 68, 71, 75 co-exponence 71, 171, 184 complexification 16, 82, 83, 85, 88, 89, 103, 109, 111, 136–60, 183, 194, 285 complexity: absolute (absolutive) 8, 24, 31, 106, 136, 195, 306, 337 agent-related 306, 337 canonical 163–92, 334, 340–2 compositional 335–7 constitutional 8–9, 141, 335 corpus 306 descriptive 9, 14, 151, 163–4, 195, 204, 217, 332, 335, 339, 340 effective 6, 306 enumerative (E-complexity) 8–9, 11, 24, 32, 56, 82, 85, 89, 102, 103, 106, 112, 163, 175, 233, 334, 335, 336–7 exponence 233, 234, 247, 251–5, 335 formal 8, 13–14 generative 9, 151 integrative (I-complexity) 11, 12–13, 16, 24–5, 27, 32, 56, 57, 59, 62, 65–6, 71, 75, 82, 85, 89, 103, 106–7, 108, 112–13, 122, 135, 233, 334, 335, 337–40, 343 inventory (IC) 163, 334–6 Kolmogorov 9, 163, 172, 185, 306, 331, 341
modes of 335 objective 8, 306, 326, 337 paradigmatic 9, 84, 196 relative 8, 84, 136, 164, 180, 306 structural 159, 269, 306, 342 syntagmatic 3, 196 system 13, 17, 23, 27, 41–6, 46–8, 233, 234, 235–47, 248, 262, 306, 335, 342 taxonomic(al) 9, 163, 335 compounding 14, 232, 233, 235, 238, 262, 320 conditional entropy, see entropy conditioned variation 304 consonant mutation 62, 144, 145, 150–1, 173 constructive (models, frameworks, perspectives) 326, 335 contact-induced change 12, 81–103, 194, 205, 209–10, 211, 213, 244, 246, 286 contiguity 235, 248, 256, 258–9, 262 continuative aspect 101 conversion 120, 123, 124, 129, 130, 132, 133, 134 co-referential pronoun 90, 92, 95, 96, 99, 102, 103, 343 corepresentability 332 cost 8, 12, 13, 14, 24, 136, 185, 195, 337 creoles 2, 12, 16, 87, 105–6, 109–13, 113–14, 116–18, 135, 267, 271, 272, 277, 278–80 crosslinguistic tendency 33, 57 culminativity 261 declension entropy, see entropy default agreement, see agreement defectiveness 9, 30, 38, 42, 47, 48, 50, 157, 158, 234 definiteness 85 demography 199, 209, 216, 221 demorphologization 16, 52–4, 70–1, 74–5 dependent marking 166 Depth-of-Inference Contrast 32 derivation 7, 11, 13, 14, 107, 118–20, 131, 132, 134, 318–19, 335 deterministic input 304 difficulty, see cost dominance 212, 213 dominance analysis 83, 86, 90, 91, 93, 95, 96, 97, 100 drift 270–2, 281 dual-route model 13 entropy 11, 27, 40–9, 55, 56–9, 65–6, 81, 84, 296–8, 338–40 conditional 26, 32, 33, 40–1, 43–6, 47, 49, 57, 58, 66, 71, 338–40 declension(al) 33, 338
equicomplexity hypothesis 2 ergative 83, 88, 89, 90, 93, 95, 102, 103 evidentiality 231, 232, 234, 241–4, 250, 254, 262 expansion (of gender marking) 200, 205–7, 216–19, 220, 222 exponence: cumulative 3, 8, 171 multiple 174, 234, 247, 251, 253 partial 173–4 frequency 13, 28, 67, 110, 114, 116, 294–5, 303, 307 token 27, 36 type 27, 33, 34, 42–3, 44, 46, 47 gender 166, 167, 169–74, 176–7, 193–228, 237, 238, 272 gender marking: emergence of 198, 200, 205–7, 209–16, 221–2, 238, 278 erosion of 200, 203, 213, 220 loss of 200–7, 209–16, 222 reduction of 200–5, 208, 212, 215, 221, 222 generalized linear mixed models (GLMM) 16, 82, 83, 85–6, 91–6, 99 grammatical gender, see gender grammaticalization 12, 110, 206, 231–5, 236, 237–8, 241–7, 262–3, 275–6, 277, 307, 343 greater vs. unmarked plural 155 idiolect 82, 85, 216 imperfect learning 194, 283–305 implicative structure 25, 30, 31–3, 41, 43–6, 49, 50 inanimate, see animacy incorporation 59, 232, 233, 235, 238, 244, 245, 246, 250, 260, 262, 320–2 inflecting-fusional 137, 141, 142, 144, 147, 151, 158 inflection: contextual 110, 272–3 inherent 110, 270, 272–3 inflection class 23–51, 54–6, 62, 107, 147, 168–9, 186, 333, 336 inflectional categories 60, 165–6, 175, 270 information-theoretic approach 8, 11, 24, 26, 27, 32, 40, 337 information theory 43, 107, 343 intergenerational change 67, 68, 89, 99, 102, 213–14, 215, 267, 287, 290, 293, 295, 297 interrupted transmission 12, 290, 291–2, 294, 295, 297, 298, 305 intersecting formative 26–7, 54, 61–6, 68, 70, 75 intransitive subjects 83, 88–90, 102, 103
irregular/irregularity 8, 13, 23–51, 84, 119, 125, 136, 137, 141, 144, 148, 149, 150, 151, 151–8, 270, 284–5, 293–302, 303–5, 332
isolating languages 2, 83, 105, 137, 158, 269, 341
iterated learning 285–6, 304
Kolmogorov complexity, see complexity
language attitude 159, 221
language contact 2, 12, 14, 15, 19, 50, 53, 81–103, 109, 182, 183, 193–5, 205, 209–10, 211, 213, 233, 235, 238, 241, 244, 246, 262, 263, 267, 269, 271, 280, 282, 286, 306, 308–9, 310–12, 343
language ecology 209–21
language evolution 222, 285
language genesis 12, 102, 114, 279–80
learnability 50, 163, 298–302, 305
lexeme-based morphology 118
lexical storage 27–9, 303
lexicalization 204, 315, 316, 317, 322
lexicon qua mental lexicon 13, 28, 29, 326, 333–4, 335
lexifier 87, 105, 109, 111–12, 114, 116, 123, 130, 132, 135
linguistic areas 213, 222, 308–9
linguistic correctness 160
Low (Conditional) Entropy Conjecture 11, 25, 32, 33, 45, 49, 71
Marginal Detraction Hypothesis 33, 34
memorization 53, 75, 303
minimum description length 9, 26, 195, 204, 206, 306, 331–2, 334, 337, 340, 343
morpheme-to-word ratio 3
morphological decomposition 303
morphological richness 10, 136, 141–2, 336
morphome 11, 31, 119, 122, 247
morphophonological erosion 193, 200–3
multilingualism 12, 53, 213, 307
Natural Morphology 10, 12–13
naturalness, see Natural Morphology
Network Morphology 28, 62
neural networks 14
nominal classification 193, 231, 234, 235–9, 250, 262
North Pacific Rim 176–7
noun class 34, 49, 136, 138–40, 144–8, 150, 160, 173, 218, 219, 236, 270, 273, 303
noun incorporation, see incorporation
number 166, 167, 173–4
numeral classifier, see classifiers
obsolescence 84, 311–16
opacity 8, 10, 11, 75, 113, 174–5
overabundance 9, 24, 81–6, 88–90, 99, 102, 103, 144, 342–3
overspecification 284–5, 287, 288, 290, 291–3, 296–305, 335, 342
Pāṇini’s Principle 332
Paradigm Cell Filling Problem 55, 59, 61
paradigm organization 333
Paradigm Structure Conditions 11
paradigmatic layers 25, 29–31, 34, 39, 41–6, 48, 50, 54, 62
passive 314–16
pattern competition 343
pattern regulation 343
periphrastic construction 84
person 166, 167
pidgin 2, 12, 105, 109–11, 267, 272
pidginization 218, 219, 279, 281
portmanteau 8, 60, 167, 171, 242
possessive 85, 89
predictability 1, 11, 14, 26, 33, 39–40, 45, 47, 52–3, 55, 56–9, 65, 68–70, 71, 84, 85, 106, 107, 120, 123, 131, 135, 169, 171, 338
prestige 159, 160, 195, 199, 209, 212, 213, 222
priming 91, 92, 93, 96, 102, 103, 343
principal parts 32, 33, 333, 336
probabilistic input 304
Probabilistic Syntax 85
probability matching 294–6
processing 1, 12–14, 26, 53, 56, 61, 75, 106, 322, 326
processing cost, see cost
productivity 10, 23, 28, 53, 60, 111, 114–15, 128, 130, 132, 134, 135, 141, 194, 201, 203, 205–6, 213, 216, 218, 232, 235, 245, 251, 253, 262, 286, 304, 320, 327, 332, 336
prosodic dependence 235, 248, 256, 259–61, 262
psycholinguistic approach 11, 13, 14
qualitative approach 8, 9–10
quantitative approach 8, 9
redundancy 8, 14, 141, 287, 288, 293, 303, 305
reduplication 122, 312
regression analysis 47, 85, 93, 95
regular/regularity 9, 13, 14, 23, 25–8, 34, 46–8, 81, 84, 144, 235, 285, 297, 302, 304, 305, 331–2
regulations 335–6
resources 163, 335–6
routinization 307, 308, 327
set-theory 26, 32
simplification 12, 59, 61, 67, 68–70, 71, 72, 83, 84, 88, 89, 99, 103, 109, 110, 112, 141, 144, 159–60, 194, 203, 267, 270, 279, 285–7, 288, 293, 305
sociocultural context 307, 308
socioecological parameters 19, 283
sociolinguistic isolation 180, 186
sociolinguistic typology 12, 53
sociolinguistics 85, 160, 180–5, 186
stem alternation 24, 25, 46, 114, 142, 148
stem class 38, 170, 186
stem flexivity 168–9, 170
stress:
  inflectional 23, 29, 30
  syllable 36, 276, 323
suffixation 124, 128, 129, 130, 132, 133, 149
suppletion 6, 8, 9, 26, 28, 33, 60, 65, 74, 117, 125, 142, 157, 234, 251, 252–3, 270, 307, 332, 338, 340–1
syncretism 6, 9, 29, 56, 81, 84, 89, 111, 113, 115, 116, 117, 120, 124, 125, 126–7, 128, 172, 173, 174, 194, 234, 250, 307, 341
synthesis 13, 106, 306
synthesis index 2, 231
templatic morphology 10, 232, 307, 317, 322
tense 234, 235, 239–41, 243–4, 250, 252, 260, 262
topicality 85, 86
transatlantic slave trade 279
transitive subjects 89, 90, 93, 95, 96, 99, 102
transmission fidelity 298
transparency 9, 10, 113, 114, 163–4, 175, 186, 340–2
U-curve 302
unpredictability 39, 52–4, 55–8, 60, 65, 70–1, 75, 168, 169, 170, 171, 341
valence-adjusting 234, 244–6, 250, 262
Vaupés region 233, 238, 242, 244
word formation 6, 7–8, 11, 197, 215, 320
word recognition 6, 14
word-and-paradigm framework 11
wordhood 172, 173, 234–5, 248, 250, 255, 256–61, 262