The Complexities of Morphology 9780198861287


English Pages [411] Year 2020

Table of contents :
Cover
The Complexities of Morphology
Copyright
Contents
List of Figures and Tables
List of Abbreviations
The Contributors
Chapter 1: Introduction: Complexities in morphology
1.1 Setting the scene
1.2 What is complex?
1.3 How many complexities?
1.3.1 Formal morphological complexity
1.3.2 Psycholinguistic morphological complexity
1.4 About this volume
Acknowledgements
Part I: The Language-Specific Perspective
Chapter 2: Irregularity, paradigmatic layers, and the complexity of inflection class systems: A study of Russian nouns
2.1 Introduction
2.2 Regularity, paradigmatic layers, and inflection classes
2.2.1 Regularity and inflection classes
2.2.2 Paradigmatic layers and inflection classes
2.2.3 Interim Summary
2.3 Inflection class complexity
2.4 Russian nouns
2.5 Quantifying complexity
2.6 Granularity and system complexity
2.6.1 Granularity of inflection class information
2.6.2 Paradigmatic layers and inflection class complexity
2.7 Regularity and system complexity
2.7.1 Defining class (ir)regularity
2.7.2 Regularity and system complexity
2.8 Discussion and conclusions
Acknowledgements
Chapter 3: Demorphologization and deepening complexity in Murrinhpatha
3.1 Introduction
3.2 Complexity in lexically specified allomorphy
3.3 Complexity, predictability, and language change
3.4 Unpredictable exponence in Murrinhpatha classifier stems
3.4.1 Intersecting formatives and unpredictable allomorphy
3.4.2 Variation and change
3.5 Predictability of changes observed in Murrinhpatha
3.6 Demorphologization and deepening complexity
3.7 Conclusions
Appendix
Acknowledgements
Chapter 4: Overabundance resulting from language contact: Complex cell-mates in Gurindji Kriol
4.1 Introduction
4.2 Dimensions and measures of morphological complexity in language contact
4.3 Optional subject marking in Gurindji Kriol
4.4 Changes in the complexity of subject marking
4.4.1 Data
4.4.2 Procedure
4.4.3 Results
4.4.3.1 Adults
4.4.3.2 Children
4.4.4 Discussion
4.5 Concluding remarks
Acknowledgements
Chapter 5: Derivation and the morphological complexity of three French-based creoles
5.1 Introduction
5.2 Morphological complexity
5.3 Creole simplicity
5.4 Verb inflection: from French to French-based creoles
5.4.1 Properties of the French verbal paradigm
5.4.2 French-based creoles
5.5 Approaches to derivation
5.6 Derivational relations in French-based creoles
5.6.1 Mauritian
5.6.1.1 Function of verb forms in Mauritian
5.6.1.2 Derivational relations in Mauritian
5.6.2 Guadeloupean
5.6.2.1 Function of verb forms in Guadeloupean
5.6.2.2 Derivational relations in Guadeloupean
5.6.3 Haitian
5.6.3.1 Function of verb forms in Haitian
5.6.3.2 Derivational relations in Haitian
5.7 Conclusion
Acknowledgements
Chapter 6: Simplification and complexification in Wolof noun morphology and morphosyntax
6.1 Introduction
6.2 Wolof and Atlantic languages
6.3 Wolof noun classes: the basics and the received view
6.4 Wolof within the Atlantic context
6.5 Complexification in Wolof noun inflection, against the background of Atlantic noun class systems
6.5.1 Morphological complexity vs. morphological richness
6.5.2 The emergence of inflectional classes in Wolof
6.5.3 Agglutinative noun-class morphology and inflectional classes in other Atlantic languages
6.5.4 The complexification of Wolof noun inflection
6.6 Complexification in Wolof: paradigmatic irregularity in some agreement targets
6.7 External explanatory factors for structural simplification
Acknowledgements
Part II: The Crosslinguistic Perspective
Chapter 7: Canonical complexity
7.1 Introduction
7.2 Method
7.2.1 Samples
7.2.2 Survey objects
7.3 Results
7.3.1 CC and enumerative complexity
7.3.2 Complexity and gender
7.3.3 Geography: continents and areas
7.3.4 Large-scale geography
7.3.5 Sociolinguistics
7.4 Discussion and conclusions
Appendix 7.1 Categories and variables used here
Appendix 7.2 Cell totals per category and variable
Appendix 7.3 Sample
Appendix 7.4 CC levels in the survey languages
Chapter 8: The complexity of grammatical gender and language ecology
8.1 Introduction
8.2 Grammatical gender and morphological complexity
8.3 Method and data
8.3.1 Sampling methodology and variables in focus
8.3.2 Data collection
8.4 Patterns of change under study: an overview
8.4.1 Reduction and loss of gender marking
8.4.2 Emergence and expansion of gender marking
8.5 Distribution of the patterns of change and clustering effects within Eurasia
8.6 The evolution of gender agreement systems and language ecology
8.6.1 Demographic factors in gender agreement loss and emergence
8.6.2 Language contact, language policies, and the expansion of gender agreement
8.6.3 The symbolic function of gender agreement morphology
8.7 Summary and concluding remarks
Appendix 8.1 Patterns and contexts of change in the languages of the sample
Acknowledgements
Chapter 9: Morphological complexity, autonomy, and areality in western Amazonia
9.1 Introduction
9.2 System complexity in western Amazonia
9.2.1 Nominal classification
9.2.2 Tense
9.2.3 Evidentiality
9.2.4 Valence-adjusting
9.2.5 Summary
9.3 Exponence complexity and morphological autonomy
9.3.1 Languages considered
9.3.2 Exponence complexity
9.3.3 Criterial wordhood properties and morphological autonomy
9.3.4 Summary
9.4 Conclusion
Acknowledgements
Part III: The Acquisitional Perspective
Chapter 10: Radical analyticity as a diagnostic of adult acquisition
10.1 Introduction
10.1.1 Definition of radical analyticity
10.1.2 Radical analyticity worldwide
10.1.3 Application to this volume
10.2 Adult acquisition versus ‘drift’
10.3 Argument No. 1: contextual versus inherent inflection
10.4 Argument No. 2: analytic language as an unnatural state
10.4.1 Grammaticalization is unceasing
10.4.2 Unstressed final syllables do not lead to the typology of Chinese
10.4.3 Inflection is more quickly lost than gained
10.5 Argument No. 3: radical analyticity is rare
10.6 On claims dissociating creolization from ossified acquisitional capacity
10.7 On a phonological pathway to radical analyticity
10.8 Conclusion
Chapter 11: Different trajectories of morphological overspecification and irregularity under imperfect language learning
11.1 Introduction
11.1.1 Why study complexity?
11.1.2 What is complexity?
11.1.3 How to study complexity?
11.1.4 Why does complexity decrease?
11.2 Materials and methods
11.2.1 Artificial language structure
11.2.2 Experimental procedure
11.3 The trajectory of overspecification
11.3.1 Qualitative analyses
11.3.2 Quantitative analyses
11.4 The trajectory of irregularity
11.4.1 Probability matching
11.4.2 Irregularity and overspecification
11.4.3 Irregularity and learnability
11.5 Discussion
11.6 Conclusion
Acknowledgements
Chapter 12: Where is morphological complexity?
12.1 Introduction
12.2 What is complexity?
12.3 Central Pomo
12.4 Obsolescence and morphological complexity
12.5 Mohawk
12.5.1 Inflection
12.5.2 Derivation
12.5.3 Noun incorporation
12.5.4 Processing
12.5.5 Native acquisition
12.6 Implications for our models of morphology
Part IV: Discussion
Chapter 13: Morphological complexity and the minimum description length approach
13.1 Introduction
13.2 The minimum description length approach to complexity
13.3 The organization of morphology
13.4 Notions of complexity represented in the volume
13.5 Compositional complexity
13.6 Integrative complexity
13.7 Canonical complexity and transparency
13.8 Overabundance
13.9 Conclusion
References
Language Index
Subject Index



Author / Uploaded
Peter Arkadiev and Francesco Gardani


OUP CORRECTED AUTOPAGE PROOFS – FINAL, 24/8/2020, SPi

The Complexities of Morphology


The Complexities of Morphology Edited by PETER ARKADIEV and FRANCESCO GARDANI


Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

© editorial matter and organization Peter Arkadiev and Francesco Gardani 2020
© the chapters their several authors 2020

The moral rights of the authors have been asserted

First Edition published in 2020
Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this work in any other form and you must impose this same condition on any acquirer

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data
Data available

Library of Congress Control Number: 2020932944
ISBN 978–0–19–886128–7

Printed and bound in Great Britain by Clays Ltd, Elcograf S.p.A.

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.

Contents

List of Figures and Tables
List of Abbreviations
The Contributors

1. Introduction: Complexities in morphology
   Peter Arkadiev and Francesco Gardani

I. THE LANGUAGE-SPECIFIC PERSPECTIVE

2. Irregularity, paradigmatic layers, and the complexity of inflection class systems: A study of Russian nouns
   Jeff Parker and Andrea D. Sims
3. Demorphologization and deepening complexity in Murrinhpatha
   John Mansfield and Rachel Nordlinger
4. Overabundance resulting from language contact: Complex cell-mates in Gurindji Kriol
   Felicity Meakins and Sasha Wilmoth
5. Derivation and the morphological complexity of three French-based creoles
   Fabiola Henri, Gregory Stump, and Delphine Tribout
6. Simplification and complexification in Wolof noun morphology and morphosyntax
   Michele Loporcaro

II. THE CROSSLINGUISTIC PERSPECTIVE

7. Canonical complexity
   Johanna Nichols
8. The complexity of grammatical gender and language ecology
   Francesca Di Garbo
9. Morphological complexity, autonomy, and areality in western Amazonia
   Adam J. R. Tallman and Patience Epps

III. THE ACQUISITIONAL PERSPECTIVE

10. Radical analyticity as a diagnostic of adult acquisition
    John H. McWhorter
11. Different trajectories of morphological overspecification and irregularity under imperfect language learning
    Aleksandrs Berdicevskis and Arturs Semenuks
12. Where is morphological complexity?
    Marianne Mithun

IV. DISCUSSION

13. Morphological complexity and the minimum description length approach
    Östen Dahl

References
Language Index
Subject Index

List of Figures and Tables

Figures

2.1. Word types per inflection class across different granularities
2.2. Complexity measures across granularities of Russian nouns
2.3. Conditional entropy of real and a hundred Monte Carlo simulations of Russian nouns across granularities
2.4. Effect of the irregularity of each layer on system complexity (entropy difference)
3.1. Ackerman & Malouf (2015) mechanism for predicting unknown inflectional forms
4.1. Traditional languages and Aboriginal communities of the Victoria River District
4.2. Fixed and random effects used to measure the use vs. non-use of subject marking in Gurindji Kriol
5.1. Degrees of complexity in the predictability of a base lexeme’s base stem in a particular derivational relation R
5.2. Degrees of complexity in the restrictedness of stem X in the morphology of lexeme L, where X serves as L’s base stem in a particular derivational relation
7.1. Mean CC ± 1 standard deviation for three areal breakdowns and selected families
7.2. Complexity x longitude
7.3. Complexity and altitude in Daghestan (eastern Caucasus) for the three complexity counts
8.1. The language sample
8.2. Patterns of change in the language sample
9.1. Western Amazonian languages sampled
9.2. Kernel distribution of densities across the languages of this study
11.1. The meaning space of the experimental languages with the corresponding sentences from an example generation 0 language
11.2. A schematic representation of the chains in the normal (a), temporarily interrupted (b), and permanently interrupted (c) conditions
11.3. Change of the overspecification of agreement, as measured by expressibility, over time
11.4. Relative frequency of the agreement marker which denoted the round animal in the initial language of the chain
11.5. Change of irregularity, as measured by Shannon entropy, over generations
11.6. Change of overspecification and irregularity in verbal agreement over generations in individual chains
11.7. Learnability as a function of irregularity
12.1. Mohawk verb template

Tables

1.1. Case paradigm of Turkish ev ‘house’ and Lithuanian miestas ‘city’
1.2. Sample paradigms of Lithuanian nouns
2.1. An example of morphosyntactically conditioned stress alternation in Russian nouns
2.2. Illustration of the four-class system, based on inflectional suffixes
2.3. Illustration of stress classes of Russian nouns
2.4. Number of nominal inflection classes of Russian nouns as a function of which paradigmatic layers are included
3.1. Warlpiri verb inflection classes
3.2. Examples of inflected classifier forms
3.3. Examples of classifier forms and their formative analyses
3.4. Inflectional exponence of na ‘(27)’
3.5. Variably inflected classifier stem forms
3.6. Allomorphs selected by Ackerman & Malouf (2015) simplification mechanism
3.7. Exponence probabilities of older and newer forms
3.8. Classifier stem paradigm for ma ‘(34)’
3.9. Classifier stem paradigm for ɾa ‘(28)’
4.1. Allomorphic reduction in subject marking in Gurindji Kriol
4.2. Comparison of case systems and allomorphy across three generations
4.3. Occurrence of subject marking in adult Gurindji Kriol speakers according to predictors
4.4. Output of generalized linear mixed model analysis on 3,575 tokens
4.5. Relative effect of the significant predictors according to dominance analysis
4.6. Occurrence of subject marking in child Gurindji Kriol speakers according to predictors
4.7. Output of generalized linear mixed model analysis on 2,975 tokens
4.8. Relative effect of the significant predictors according to dominance analysis
5.1. Patterns of syncretism in the French paradigm (Bonami et al. 2013)
5.2. Comparison of  and .3 forms in French with long and short forms in Mauritian
5.3. Sample comparison of long and short forms in four French-based creoles
5.4. Stem space of  ‘to form’,  ‘to finish’, and ´  ‘to defend’
5.5. Verb alternations in Mauritian
5.6. Reduplication in Mauritian
5.7. Deverbal nominalizations in Mauritian
5.8. Verb alternations in Guadeloupean
5.9. Deverbal nominalizations in Guadeloupean
5.10. Verb alternations in Haitian
5.11. Deverbal nominalizations in Haitian
5.12. Complexity of derivational relations in French, Mauritian, Guadeloupean, and Haitian
7.1. Gender unpredictability for some example languages
7.2. Areal and family breakdown
7.3. Complexity values for four historical groups of languages
8.1. Third person pronouns in standard Swedish
8.2. Clustering of patterns of change at language-family edges within Eurasia
8.3. Direction of change and asymmetries in the structure of the population and/or prestige dynamics
9.1. Anderson’s (2015a) schematization of morphological complexity
9.2. Similar classifier forms in Guaporé-Mamoré languages (van der Voort 2005: 397)
9.3. Evidentiality and tense in Matses (Panoan; Fleck 2007: 593)
9.4. Number of morphemes coded in this study by language and functional domain
9.5. Number of allomorphs per morpheme attested across the sample
9.6. Percentage of morphemes for each EC value across the languages sampled
9.7. Rank correlations between EC level and bound status values across languages
9.8. Rank correlations between EC level and contiguity value across languages
9.9. Rank correlations between EC level and prosodic dependence across languages
10.1. Wolof noun class markers
11.1. An example of a final language with a fully preserved agreement system
11.2. An example of a language with a fully lost agreement system
11.3. A language with a fully lost agreement system
11.4. A language with an irregular distribution of the agreement markers
13.1. Hypothetical noun inflection templates

List of Abbreviations 1 2 3 A     ACLA   .    .     BGW 8  CAY CC             

first person second person third person most agent-like or experiencer-like argument of transitive; A-class verb ablative abilitative absolutive accusative Aboriginal Child Language (project) grammatical agent animate anaphoric pronoun antipassive appositional mood applicative ‘article of noun’ aspect associative augmentative auxiliary Bininj Gun-Wok; Gunwingguan, northern Australia noun class 8 plural causative Central Alaskan Yup’ik canonical complexity cislocative classifier; class marker completive contrast comitative connector conditional continuative contrastive copula direct case marker declarative definite; default (in Mansfield and Nordlinger, Chapter 3)


      .       APL EC E-complexity ELAP       Fr.    G   GLMM GYN    IALL I-complexity IC IE      

demonstrative desiderative determiner indexical marker different event diminutive direct experience evidential discourse marker discontinuitive dual duplicative dynamic E-class verb applicative enumerative complexity; exponence complexity enumerative complexity Endangered Languages Project ergative evidential eyewitness feminine factual focus French vowel frontness frustrative future more goal-like argument of ditransitive geminate genitive Generalized Linear Mixed Models Gbe languages, Yoruba, and Nupe habitual (aktionsart) high vowel height hearsay iterated artificial language learning Integrative complexity inflectional class; inventory complexity Indo-European intransitive inanimate verb immediate imperative imperfective inanimate inchoative


           L       MDL    N NC      NP        2    PCFP    P.N.  POS

indicative indefinite infinitive intentional intransitive (subject orientation) interactional irrealis joint agency lexeme long form linking particle linker locative low vowel height masculine Minimum Description Length middle middle marker neuter noun noun class negation non-feminine non-future nominative non-eyewitness evidential noun phrase non-past non-singular nonvisual object; object of monotransitive object oblique optative second position passive grammatical patient paucal Paradigm Cell Filling Problem perfective peripheral plural proper name potential parts of speech


 Poss                       S  SD          SV T  TAM     

possessive possessor process verbalization present pronoun progressive proprietive prothetic vowel presentational partitive proximate past past irrealis realis recent reciprocal reduplication referential focus relative remote respect reflexive reportative ɾ-alternation subject; sole argument of intransitive same event standard deviation sequential short form singular simultaneous semelfactive same subject stative (aktionsart) strong form suppletive subject-verb more theme-like argument of ditransitive transitive animate verb tense/aspect/mood topic advancing voice temporal thematic suffix topic transitive


    UG V   VN VS  Y/N

translocative Universal Grammar verb venitive locative verbalization verb-noun verb-subject weak form yes/no


The Contributors

Peter Arkadiev holds a PhD in theoretical, typological, and comparative linguistics from the Russian State University for the Humanities and a habilitation degree from the Russian Academy of Sciences. Currently he is Senior Researcher at the Institute of Slavic Studies of the Russian Academy of Sciences and Assistant Professor at the Russian State University for the Humanities. His fields of interest include language typology and areal linguistics, morphology, case and alignment systems, tense-aspect, Baltic and Northwest Caucasian languages. He has co-edited Contemporary Approaches to Baltic Linguistics (with Axel Holvoet and Björn Wiemer) and Borrowed Morphology (with Francesco Gardani and Nino Amiridze, both published by De Gruyter Mouton in 2015).

Aleksandrs Berdicevskis is a researcher in computational linguistics at the University of Gothenburg, Sweden. At the time of writing he was Assistant Professor at Uppsala University. He has worked on experimental and quantitative approaches to language change and evolution with a focus on Slavonic languages. He has also participated in the development of TOROT (Tromsø Old Russian and Old Church Slavonic Treebank) and related resources. In his PhD dissertation (University of Bergen) he investigated linguistic innovations in Russian computer-mediated communication.

Östen Dahl is Professor Emeritus of General Linguistics at Stockholm University, Sweden. He got his academic training at the universities of Gothenburg, Uppsala, and Leningrad (St. Petersburg) and was active at the University of Gothenburg for ten years before moving to Stockholm in 1980. In recent years, his research has mainly been typologically oriented with a strong interest in diachronic approaches to grammar. He has published the monographs Tense and Aspect Systems (1985), The Growth and Maintenance of Linguistic Complexity (2004), and Grammaticalization in the North: Noun phrase morphosyntax in Scandinavian vernaculars (2015).

Francesca Di Garbo is currently affiliated to the University of Helsinki as Postdoctoral Research Fellow and member of the GramAdapt team, an ERC-funded project (ID: 805371) investigating mechanisms of adaptation of language structures to social structures. Her research interests include diachronic and synchronic typology, nominal classification, number systems, evaluative morphology, linguistic complexity, sociolinguistic typology, and African languages.

Patience Epps is Professor of Linguistics at the University of Texas at Austin. Her research focuses on indigenous Amazonian languages, particularly the Naduhupan language family of the northwest Amazon. Her work engages with language description and documentation, linguistic typology, language contact and language change, and Amazonian prehistory. Major publications include the monograph A Grammar of Hup (De Gruyter Mouton, 2008).

Francesco Gardani is Professor of Romance Linguistics at the University of Zurich, Switzerland. His research cuts across the fields of Romance and theoretical linguistics and focuses on morphology, language contact, and linguistic typology. He is the author of Borrowing of Inflectional Morphemes in Language Contact (2008) and Dynamics of Morphological Productivity: The evolution of noun classes from Latin to Italian (2013) and the co-Editor-in-Chief of the Oxford Encyclopedia of Romance Linguistics.

Fabiola Henri is Assistant Professor at the University of Kentucky and an affiliate of the CNRS research centre, Laboratoire de Linguistique Formelle. Her recent research focuses on the structure and complexity of morphology in creole languages. Other strands of her research relate to creole genesis, morphology, and its interfaces, and creole syntax, among other topics. She is the co-editor of a recent monograph Negation and Negative Concord: The view from Creoles.

Michele Loporcaro is Full Professor of Romance Linguistics at the University of Zurich, a Fellow of Academia Europaea and the Austrian Academy of Sciences. His research focuses on the phonology, morphology, syntax, and lexicon of the Romance languages in synchrony and diachrony; dialectology; linguistic historiography. He is the author of over 200 articles and seven monographs, two of which with OUP: Vowel Length from Latin to Romance 2015; Gender from Latin to Romance 2018 (shortlisted for the Prose Awards of the Association of American Publishers). In 2012 he received the Feltrinelli prize of the Accademia dei Lincei.

John Mansfield is Lecturer in Linguistics at the University of Melbourne. His research explores the typology of morphological complexity, with a particular focus on processes of variation and change. Other strands of his research address aspects of morphological theory, prosodic phonology, and sociolinguistics, especially with respect to the Aboriginal languages of northern Australia.

John H. McWhorter is Associate Professor of English and Comparative Literature at Columbia University, New York City. He specializes in language change and language contact, in particular the development of creoles, pidgins, koines, ‘vehicular’ languages, and non-standard dialects. Professor McWhorter is author of more than a dozen books including Defining Creole (2005), Language Interrupted (2007), Linguistic Simplicity and Complexity (2011), The Language Hoax (2014), Talking Back, Talking Black (2017), and The Creole Debate (2018). A contributing editor at The New Republic and The Atlantic, he has also hosted Slate’s linguistics podcast Lexicon Valley.

Felicity Meakins is ARC Future Fellow in Linguistics at the University of Queensland and Chief Investigator in the ARC Centre of Excellence for the Dynamics of Language. She is a field linguist who specializes in the documentation of Australian Indigenous languages in the Victoria River District of the Northern Territory and the effect of English on Indigenous languages. She has worked as a community linguist as well as an academic over the past twenty years, facilitating language revitalization programmes, consulting on Native Title claims, and conducting research into Indigenous languages. She has compiled a number of dictionaries and grammars of traditional Indigenous languages, and has written numerous papers on language change in Australia.

Marianne Mithun is Professor of Linguistics at the University of California, Santa Barbara. Her interests range over morphology, syntax, discourse, prosody, and their interrelations; language contact and language change; typology; language documentation and revitalization; and the languages indigenous to North America and Austronesia.

Johanna Nichols is Professor Emeritus in the Department of Slavic Languages at the University of California, Berkeley. She works on Slavic languages, languages of the Caucasus, linguistic typology, and historical linguistics. She is AAAS Fellow and LSA Fellow, and presently holds visiting positions as Helsinki University Humanities Visiting Professor and Research Supervisor in the Linguistic Convergence Laboratory, Higher School of Economics, Moscow. She has done extensive fieldwork on the Ingush language of the central Caucasus.

Rachel Nordlinger is Professor of Linguistics at the University of Melbourne and Chief Investigator in the ARC Centre of Excellence for the Dynamics of Language. Her research centres around the description and documentation of Australia’s Indigenous languages and their implications for linguistic typology. She has also published on topics in syntactic and morphological theory, and in particular the challenges posed by the complex grammatical structures of Australian languages.

Jeff Parker is Assistant Professor of Linguistics at Brigham Young University. His research centres around better understanding inflectional structure from different methodological perspectives, including investigations into how language specific traits contribute to the complexity of inflection class systems, how inflectional structure affects lexical access of inflected forms, and how computational models of learning help explain typological tendencies in inflection class systems. He has published in journals such as Morphology, Word Structure, and The Mental Lexicon, as well as the Slavic-focused Slavic and East European Journal. He is also co-editor of a forthcoming volume, Morphological Typology and Linguistic Cognition (forthcoming, with Andrea D. Sims, Adam Ussishkin, and Samantha Wray).

Arturs Semenuks is a PhD student in the Department of Cognitive Science at the University of California, San Diego. He uses experimental and computational methods to investigate what sociocognitive pressures affect the structure of language, especially its morphological complexity, as well as what constraints exist on how language can be structured in principle, and how language affects human thought. His previous work at the University of Essex focused on the relationship between sentence processing costs and acceptability judgements.

Andrea D. Sims is Associate Professor at The Ohio State University, jointly appointed in the Department of Linguistics and Department of Slavic and East European Languages and Cultures. Much of her research focuses on the internal organization of inflection class systems (defectiveness and irregularity, syncretism, inflection class complexity) and factors influencing its emergence, reinforcement, and generalization. She is author of a research monograph, Inflectional Defectiveness (2015), co-author of a morphology textbook, Understanding Morphology (2nd edn, 2010, with Martin Haspelmath), and co-editor of Morphological Typology and Linguistic Cognition (forthcoming, with Adam Ussishkin, Jeff Parker, and Samantha Wray).

OUP CORRECTED AUTOPAGE PROOFS – FINAL, 24/8/2020, SPi


Gregory Stump is Professor Emeritus of linguistics at the University of Kentucky. His research includes work on the structure of complex inflectional systems, the nature of inflectional complexity, and the algebra of morphotactics. His research monographs include Inflectional Morphology: A Theory of Paradigm Structure (Cambridge University Press, 2001), Morphological Typology: From Word to Paradigm (Cambridge University Press, 2013, co-authored with Raphael A. Finkel), and Inflectional Paradigms: Content and Form at the Syntax-Morphology Interface (Cambridge University Press, 2016). He is a coeditor of the journal Word Structure. He now resides in Olathe, Kansas.

Adam J. R. Tallman is Postdoctoral Researcher at Laboratoire Dynamique du Langage (Université de Lyon II). His research focuses on the documentation and description of the languages of the Amazon. His PhD thesis (University of Texas at Austin, 2018) was a grammar of Chácobo (Pano) based on extensive (ELDP and NSF funded) documentation. Currently he is undertaking the documentation of Araona (Takanan). Apart from his primary interest in documentation and description, Tallman focuses on morphophonology, constituency, and the application of quantitative methods to linguistic typology.

Delphine Tribout is Assistant Professor at the University of Lille, France, and member of the CNRS research centre, Savoirs, Textes, Langage. Her main research interests are derivational morphology, especially conversion, and lexical semantics.

Sasha Wilmoth is a PhD candidate at the Centre of Excellence for the Dynamics of Language at the University of Melbourne, Australia, working on intergenerational variation and change in Pitjantjatjara. She completed her BA (Hons) degree at the University of Melbourne. She was previously a Research Assistant at the University of Queensland, and Linguistic Project Manager at Appen, a Sydney-based company which provides specialized linguistic data and services for speech and language technologies. Her research interests include morphology, syntax, and digital methods for language documentation.


1 Introduction: Complexities in morphology

Peter Arkadiev and Francesco Gardani

1.1 Setting the scene

Morphological and, broadly, linguistic complexity has become a popular topic in linguistic typology and theorizing, as several recent publications testify, such as McWhorter (2001, 2005, 2018); Kusters (2003); Dahl (2004); Hawkins (2004, 2014); Trudgill (2004a, 2011); Shosted (2006); Miestamo et al. (2008); Sampson et al. (2009); Dressler (2011); Kortmann & Szmrecsanyi (2012); Newmeyer & Preston (2014); Baerman et al. (2015b, 2017); Reintges (2015); Baechler & Seiler (2016); Mufwene et al. (2017); among many others. While this large body of work has contributed to significantly improving our understanding of morphological complexity, a number of key issues remain unsettled. They are of both theoretical and empirical nature and pertain to the domain of morphology and morphosyntax as well as to the ways language use and its socioecological conditions influence linguistic structure. Undoubtedly, the most pressing question is what morphological complexity actually is. There is no straightforward answer to this question, as we will see. The issue of how to define ‘morphological complexity’ is of central importance to us and will be treated in detail in the course of this Introduction and of the volume. To properly frame this central issue, however, we can anticipate that the notion of ‘complexity’ in morphological systems is often revealed and investigated through a set of relative measures that attempt to quantify the extent of morphology in a language, the predictability of the morphological system, and the pressures this places on processing and acquisition. The goal of the present volume is to build upon previous work on morphological complexity and to provide a crosslinguistic view on the key problems of its investigation seen from the perspective of a variety of current approaches.
At the heart of all discussions of linguistic complexity, and especially of morphological complexity, lies the idea that complexity itself is a parameter of crosslinguistic variation. The history of this line of thought (see Joseph & Newmeyer 2012 for an excellent overview) shows some non-trivial swings of the pendulum ranging from the pre-theoretical assumptions of the linguists and philosophers of the early nineteenth century about the ‘complex’ classic Indo-European languages as opposed to the ‘primitive’ languages of ‘uncivilized people’ to explicit statements that all languages are equally complex. The latter view, which is known as the ‘equicomplexity hypothesis’, takes into account obvious differences between languages in the mere degree of elaboration of different structural subdomains (such as, e.g., vowels vs. consonants or nominal vs. verbal morphology); it states that ‘these isolable properties may hang together in such a way that the total complexity of a language is approximately the same for all languages’ (Wells 1954: 104; see also Hockett 1958: 180). Such a position, which is still commonly held by linguists of different backgrounds and theoretical persuasions (see, again, Joseph & Newmeyer 2012: 348–9; and Miestamo 2017), has been challenged by others, who have shown that ‘complexity in one area of grammar [correlates] positively with complexity in another area’ (Sinnemäki 2014: 190).

With the development of contact linguistics and especially of pidgin and creole studies in the second half of the twentieth century, claims started being made that pidgins and creoles are structurally overall simpler than languages with a ‘regular’ sociolinguistic history (see, e.g., such work as Bickerton 1984; McWhorter 2001, 2005; Parkvall 2008; Bakker et al. 2011; Good 2012b, 2015), and, more generally, it has been claimed that linguistic complexity is subject to diachronic change and the effects of language contact (see Dahl 2004 and Trudgill 2011). As a matter of fact, statements to the effect that sociolinguistic parameters such as the number of speakers and degree of contact with other languages affect the complexity of linguistic (sub)systems go back as far as Jakobson (1929) and Trudgill (1983).

Peter Arkadiev and Francesco Gardani, Introduction: Complexities in morphology In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Peter Arkadiev and Francesco Gardani. DOI: 10.1093/oso/9780198861287.003.0001
Once it had been recognized that morphological complexity is a parameter of crosslinguistic variation, the urge arose to develop non-impressionistic and crosslinguistically applicable ways of measuring and quantifying the degree of morphological complexity of individual languages. The most important proponent of this line of thought is certainly Greenberg (1954), who developed a methodology of quantitative measurement of different types of morphological structure, the most famous of which is the ‘synthetic index’ (p. 185), that is, the morpheme-to-word¹ ratio in a sample of texts, which arranges languages into a continuum spanning from radically isolating to polysynthetic. This simple metric, however, is clearly insufficient for the assessment of morphological complexity, since morphology is much more than the mere arrangement of morphemes into words. As a simple illustration, consider the case-number paradigms of Turkish (Lewis 2001: 28) and Lithuanian (P.A.’s own knowledge) nouns in Table 1.1.

¹ ‘Word’ is intended as ‘word form’.

Table 1.1. Case paradigm of Turkish ev ‘house’ and Lithuanian miestas ‘city’

Turkish      SG          PL
NOM          ev          ev-ler
ACC          ev-i        ev-ler-i
GEN          ev-in       ev-ler-in
DAT          ev-e        ev-ler-e
LOC          ev-de       ev-ler-de
ABL          ev-den      ev-ler-den

Lithuanian   SG          PL
NOM          miest-as    miest-ai
ACC          miest-ą     miest-us
GEN          miest-o     miest-ų
DAT          miest-ui    miest-ams
LOC          miest-e     miest-uose
INS          miest-u     miest-ais

Both Turkish and Lithuanian have two number and six case values, yielding twelve word forms. However, while in Turkish case and number are expressed separately by dedicated suffixes in a compositional way, Lithuanian has cumulative (fused) exponence of both features. Under Greenberg’s morpheme-per-word ratio, Turkish nominal word forms are more complex than Lithuanian ones just because Turkish may have three (and in fact many more) morphemes per nominal word form (e.g., ev-ler-de house-PL-LOC), while Lithuanian has only two (miest-uose city-LOC.PL). However, if we consider the total number of different affixes occurring in the given paradigms, we find that Turkish with its six overt affixes is actually simpler than Lithuanian with its twelve affixes (see, e.g., Plank 1986 for an early attempt to assess the complexity of morphological systems in such terms).

Things become even more complicated if we go beyond Table 1.1 and consider the existence of at least five arbitrary inflectional classes of nouns in Lithuanian intersected by four partly arbitrary accentual classes, also called ‘accentual paradigms’ (a.p.), in Table 1.2 (from Arkadiev et al. 2015: 16; ‘hard’ and ‘soft’ refer to subdeclensions with non-palatalized and palatalized stem-final consonant, respectively; for more details on Lithuanian declension classes, see Ambrazas et al. 2006: 107–33).

Table 1.2. Sample paradigms of Lithuanian nouns

          I hard     I soft      II hard    II soft    III hard   IV (soft)
          ‘man’      ‘horse’     ‘day’      ‘bee’      ‘son’      ‘night’
          (M)        (M)         (F)        (F)        (M)        (F)
          I a.p.     III a.p.    IV a.p.    II a.p.    III a.p.   IV a.p.
NOM.SG    výras      arklỹs      dienà      bìtė       sūnùs      naktìs
GEN.SG    výro       árklio      dienõs     bìtės      sūnaũs     naktiẽs
DAT.SG    výrui      árkliui     diẽnai     bìtei      sū́nui      nãkčiai
ACC.SG    výrą       árklį       diẽną      bìtę       sū́nų       nãktį
INS.SG    výru       árkliu      dienà      bitè       sūnumì     naktimì
LOC.SG    výre       arklyjè     dienojè    bìtėje     sūnujè     naktyjè
VOC.SG    výre       arklỹ       diẽna      bìte       sūnaũ      naktiẽ
NOM.PL    výrai      arkliaĩ     diẽnos     bìtės      sū́nūs      nãktys
GEN.PL    výrų       arklių̃      dienų̃      bìčių      sūnų̃       naktų̃
DAT.PL    výrams     arkliáms    dienóms    bìtėms     sūnùms     naktìms
ACC.PL    výrus      árklius     dienàs     bitès      sū́nus      naktìs
INS.PL    výrais     arkliaĩs    dienomìs   bìtėmis    sūnumìs    naktimìs
LOC.PL    výruose    arkliuosè   dienosè    bìtėse     sūnuosè    naktysè

This example suggests that along with morphological complexity on the syntagmatic axis (something that can be measured by the morpheme-to-word ratio) there exists morphological complexity on the paradigmatic axis, the two being logically and empirically independent of one another. Thus understood, morphological complexity becomes a composite notion and does not admit of such simple measurement as syntagmatic complexity (see more on this issue below); an unbiased and non-reductionist crosslinguistic empirical investigation of morphological complexity therefore itself becomes a fairly complex problem.²

² In this connection, Haspelmath (2009) has shown that parameters traditionally attributed to ‘flexion’, as opposed to ‘agglutination’, such as cumulation, stem allomorphy, and affix allomorphy, are logically and empirically independent of each other.

All in all, it seems to us that the most urgent still unsolved issues in morphological complexity can be captured in terms of the following questions:

1. The hypothesis that morphology and syntax represent distinctly different, but interdependent types of grammatical organization has been challenged by scholars such as Haspelmath (2011), claiming that the divide between morphology and syntax is not clear-cut and hence irrelevant for typology. Given this, are there theoretical and methodological tools suitable to define morphological complexity and if yes, which ones?
2. If we, however, accept the hypothesis that the morphology vs. syntax divide is crosslinguistically and theoretically valid (see Arkadiev & Klamer 2019; Arkadiev 2020), a view which we espouse, can we arrive at a uniform notion of morphological complexity given the diversity of morphological phenomena?
3. In direct connection to the former question, can we arrive at a single and straightforward measure of complexity that applies to languages that display radically different morphological encoding strategies?
4. What is the role of sociolinguistic, psycholinguistic, and diachronic factors in affecting morphological complexity?

These problems constitute the main research questions of this volume, which aims to tackle them in a principled way, by presenting a collection of original research papers on different aspects of morphological complexity. This introductory chapter is meant to outline the field and take the reader through the volume, and it is organized as follows: section 1.2 pursues the question of the scope of ‘morphological complexity’; section 1.3 surveys several conceptions and methodological approaches to morphological complexity distinguishing between two main types: formal approaches (section 1.3.1) and psycholinguistic approaches (section 1.3.2). Section 1.4 presents the structure of the volume and summarizes the contributions to it.
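The Turkish and Lithuanian comparison drawn from Table 1.1 earlier in this section can be reproduced with a short sketch. It is an illustration under stated assumptions only: the hyphens in the table are taken as morph boundaries, the first morph of each form is treated as the stem, and the paradigm stands in for the text sample that Greenberg's index actually calls for. The names `morphs_per_form` and `affix_inventory` are ours and do not come from the literature cited.

```python
# Forms from Table 1.1, segmented with hyphens exactly as in the table.
TURKISH = [
    "ev", "ev-i", "ev-in", "ev-e", "ev-de", "ev-den",
    "ev-ler", "ev-ler-i", "ev-ler-in", "ev-ler-e", "ev-ler-de", "ev-ler-den",
]
LITHUANIAN = [
    "miest-as", "miest-ą", "miest-o", "miest-ui", "miest-e", "miest-u",
    "miest-ai", "miest-us", "miest-ų", "miest-ams", "miest-uose", "miest-ais",
]


def morphs_per_form(forms):
    """Syntagmatic measure: mean number of hyphen-separated morphs per form."""
    return sum(len(form.split("-")) for form in forms) / len(forms)


def affix_inventory(forms):
    """Paradigmatic measure: the set of distinct non-stem morphs."""
    return {morph for form in forms for morph in form.split("-")[1:]}


for name, forms in (("Turkish", TURKISH), ("Lithuanian", LITHUANIAN)):
    ratio = round(morphs_per_form(forms), 2)
    print(name, ratio, len(affix_inventory(forms)))
# Turkish 2.33 6
# Lithuanian 2.0 12
```

The two measures rank the languages in opposite orders, which is the point made in the text: the ratio favours Lithuanian as simpler, the inventory count favours Turkish.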

1.2 What is complex?

In all discussions of morphological complexity, a question hangs in the air. Is morphology complex in its own right? This question is partly rhetorical, maybe trivial, but still central, as it concerns the theoretical demarcation of the object of investigation. The widespread expression ‘morphological complexity’ has at least two readings. It can refer to the overall contribution of morphology to complexity in grammar or it can mean complexity inside morphology.

The first reading, viz. morphology as a source of complexity for the overall language system, would be justified by the fact that languages can do (almost) entirely without morphology and that ‘a language can persist for a long time with little or no morphology’ (Aronoff 2015: 282). In this vein, Carstairs-McCarthy (2010: ch. 2) and Anderson (2015a: 12–13) conceive of morphology as a redundant architectural quirk added to the logically necessary systems of syntax and phonology, and Aronoff goes so far as to declare: ‘morphology is inherently unnatural. It’s a disease, a pathology of language’ (Aronoff 1998: 413). Such a view apparently entails that languages without morphology (e.g., Yoruba) are less complex than languages with at least a little morphology (e.g., Tok Pisin). This type of morphological complexity could then be paraphrased as ‘complexity induced by morphology’. The assumption that morphology per se is a complication resonates with the terminological use of ‘morphological complexity’ to define the property of words having an internal morphological structure, being, so to say, morphologically complex, as we find in some authors concerned with word recognition (e.g., Fiorentino & Poeppel 2007; Bozic & Marslen-Wilson 2010), sign linguistics (Zwitserlood 2003), and, more rarely, word formation (Hay 2003). Clearly, in this usage, complexity means the presence of internal structure, and claiming that a formally complex (i.e., composite) word is in itself complex, as opposed to a simplex word, amounts to saying that morphology as such is complexity. That would imply that morphology makes the language system more complex, an observation that is relative to other components of a language’s grammar.

Adopting Gell-Mann’s (1995) concept of ‘effective complexity’, Moscoso del Prado Martín (2011) carries out a corpus-based measurement of the inflectional complexity of six European languages and claims that there is a ‘strong degree of mutual dependence between morphological and syntactic information.’ As he shows, when information on word order is explicitly factored in, the apparent gradation in complexity across languages, as calculated on the basis of the number of inflected forms per word, disappears. He arrives at the conclusion that ‘inflectional morphology serves a role in reduction of uncertainty, simplifying the description of the whole grammar’ (p.
3528). Whether or not this be the case, this question, although of great importance also for cognitive approaches to complexity, is not within the scope of the present book. Rather, we are concerned with the second reading of morphological complexity, that is, complexity inside morphology. Taking an inner-morphological perspective, we focus on which morphological phenomena can be considered complex or more complex than others and look at different degrees of complexity within morphology.

Some authors have swiftly found an answer to this question, by identifying the core of morphological complexity in phenomena currently running under the heading of autonomous (or ‘pure’) morphology, including morphological entities and processes that are not extramorphologically motivated in a straightforward way, such as, for example, inflectional classes, allomorphy, patterns of syncretism, suppletion, etc. (Aronoff 1994; Maiden et al. 2011; Cruschina et al. 2013). For example, Baerman et al. (2015b: 4) consider morphological complexity as ‘the additional structure that cannot readily be reduced to syntax or phonology’. This extra layer of purely morphological structure, such as inflection classes in the Lithuanian example in section 1.1, may attain an astonishing degree of gratuitous complexity, whereas the mere presence of (possibly elaborate) transparent and regular affixal expression of grammatical meaning, such as exemplified by Turkish, is of least relevance for the study of morphological complexity (see also a discussion of different aspects of complexity in the polysynthetic languages, traditionally assumed to be the hallmark of morphological complexity, by Dahl 2017 and Sadock 2017).

Of course, the decision to only focus on autonomous morphology has a great methodological advantage, as it provides a clear answer to the question we formulated in section 1.1, concerning the problematic demarcation of morphology and syntax. However, while we acknowledge that phenomena of pure morphology (‘morphology by itself’) do increase the complexity of morphology as a whole because they have no external motivation, morphology by itself, as it has been theorized, only includes inflection. This would imply that only inflection counts as the locus of complexity and it is a matter of fact that most of the literature published on this topic is exclusively devoted to inflection (see Baerman et al. 2015a, 2017; Baechler 2017). Definitions of morphological complexity (in quantitative terms) such as the number of morphosyntactic features that a language has and the morphological means that are used to realize these features (see below) conform to this view, for morphosyntactic features are typically realized by inflection. As a matter of fact, work on the complexity of word formation processes is virtually missing in the literature, the only two exceptions known to us being a one-paragraph section each in Nichols et al. (2006: 101–3) and Stump (2017: 70). Therefore, there is no study investigating whether inflection and word formation differ in their degree of complexity along one or another parameter.
As Franz Rainer (personal communication, 2017) observes, ‘a great number of asymmetries emerge between word formation and inflection with respect to different dimensions of complexity’, such as the number of elements in the system, number of affixes in a word, or the complexity of allomorphy, among others. However, he notes, ‘in the literature on the inflection-derivation divide (cf. Štekauer 2015), complexity has not been identified up to now as a possible dimension along which these two subcomponents of morphology might differ’. Lack of work on this specific topic might be due to multiple reasons: first, the boundaries between inflection and word formation are often fuzzy; second, word formation, with lexical enrichment as its central function and all its corollaries (e.g., importance of encyclopedia, semantic drift), is less neat and less automatic than inflection and more difficult to grasp (see Kusters 2003: 14–16); third, and crucially, the generally adopted metrics of morphological complexity (see section 1.3) mostly focus on formal criteria, thus lumping together categories of inflection and those of word formation under the general heading of morphological complexity. As we will see in more detail below, research in particular by Dahl (2004, 2009) and Trudgill (2009, 2011) has identified three major ingredients of synchronic morphological complexity, which seem to apply to both inflection and word formation: (a) irregularity (e.g., allomorphy); (b) morphosemantic and morphotactic opacity (such as fusion of formatives, cumulative or portmanteau formatives, suppletion, and non-linear suprasegmental feature realizations); and (c) syntagmatic redundancy (e.g., pleonastic affixation, see Gardani 2015).

1.3 How many complexities?

As we have seen in section 1.1, the linguistic literature on complexity is abundant, not least because ‘[h]ow to measure morphological complexity is itself an issue of some complexity’ (Nichols 1992: 64). As Miestamo (2017: 229) has appropriately noted, complexity refers either to ‘something that is rich in internal composition (i.e. contains many parts as well as multiple and intricate connections between them), or to something that is difficult to do or to understand.’ In the first case, complexity is an objective property of a linguistic system and is therefore labeled ‘objective complexity’ (Dahl 2004: 2) or ‘absolute complexity’ (Miestamo 2008) or ‘formal complexity’ (Stump 2017); in the second case, complexity is conceived as the cost or difficulty that a given linguistic system or structure causes to language users and is labeled ‘relative complexity’ (Miestamo 2008, 2017) or ‘psycholinguistic complexity’ (Stump 2017). In the following, we will adopt Stump’s terminology.

1.3.1 Formal morphological complexity

Formal complexity can be subsumed under the following general definition of complexity provided by the philosopher Nicholas Rescher: ‘Complexity is first and foremost a matter of the number and variety of an item’s constituent elements and of the elaborateness of their interrelational structure, be it organizational or operational’ (Rescher 1998: 1). In linguistics, we identify three principal directions in research on formal complexity, in terms of how it is conceptualized and measured: (1) quantitative approaches; (2) qualitative approaches; and (3) information-theoretic approaches.

Quantitative approaches conceive complexity in terms of the number of elements of which a given morphological entity consists, mainly inventory size and string length, or alternatively, the length of the rules necessary to describe a form. This quantitatively construed type of complexity, dubbed ‘enumerative complexity’ by Ackerman & Malouf (2013), is detectable both syntagmatically and paradigmatically. On the syntagmatic axis, it can be the aforementioned average number of morphemes per word form (Greenberg 1954, 1960) or the maximal number of inflectionally expressed categories per verb (Bickel & Nichols 2005); this type corresponds to Rescher’s constitutional complexity, viz. the ‘[n]umber of constituent elements or components’ (Rescher 1998: 9). On the paradigmatic axis, enumerative complexity relates to the number of distinct inflectional classes for a given part-of-speech (i.e., allomorphy) or the number of cells in a paradigm corresponding to the realizations of different values of a given morphological feature (e.g., case); this type of complexity corresponds to Rescher’s taxonomical complexity, the ‘[v]ariety of constituent elements, i.e., number of different kinds of components in their physical configuration’ (Rescher 1998: 9). Up to fairly recent times, only enumerative complexity had featured prominently in the literature, especially in typologically oriented research; for example, it is only this kind of complexity that is represented in WALS (Haspelmath et al. 2005; Dryer & Haspelmath 2013), certainly due to practical reasons. In this respect, it is worth mentioning several works specifically addressing the issue of enumerative paradigmatic complexity, such as Rhodes (1987) on the different morphological makeup of large and small paradigms and a whole series of works by Carstairs-McCarthy, whose aim was to find constraints on enumerative complexity of inflectional classes in terms of the number of affixal allomorphs and their properties (see Carstairs 1983; Carstairs-McCarthy 1994, 1998, 2010).

Another type of quantitative measure concerns not the number of the elements composing a morphologically complex form but rather the (minimum) size (or length) of the rules required to describe and generate such a form. This type of quantitative approach, often referred to as Kolmogorov complexity, resonates with Rescher’s concepts of both descriptive complexity (the ‘[l]ength of the account that must be given to provide an adequate description of the system at issue’) and generative complexity (the ‘[l]ength of the set of instructions that must be given to provide a recipe for producing the system at issue’, Rescher 1998: 9) (cf.
Dahl’s ‘minimum description length’, Chapter 13, this volume).

Qualitative approaches conceive complexity in terms of identifying those morphological patterns/elements that are complex or more complex than others. Proponents of qualitative approaches need to stipulate an unmarked, complexity-neutral ideal (a canon, often conceived as an isomorphic relation of content to form) upon which to construe hierarchies of complexity in terms of degrees of deviation from it. Most notably, work by Corbett (e.g., 2007, 2015) has propagated the notion of non-canonicity (both in inflection and derivation), which can be defined as any deviation from properties such as transparency, regularity, and form-function biuniqueness, as is manifested, for example, in non-phonological allomorphy of affixes and stems (Baerman et al. 2017: 100–7), overabundance (Thornton 2019), multiple (extended) exponence (Harris 2017), syncretism (Baerman et al. 2005), defectiveness (Baerman et al. 2010), and polyfunctionality (Stump 2016: 228–51), let alone more dramatic deviations such as suppletion (Stump 2006a; Corbett 2007) or deponency (Baerman et al. 2007). Early discussions of non-canonicity and its possible interactions with enumerative complexity can be found in Plank (1986) and Carstairs (1987) in addition to works already mentioned, while recently, Johanna Nichols (2009) has hinted at a possible metric of morphological complexity related to non-canonicity (a proposal she fully develops in Chapter 7, this volume). Most studies of non-canonical phenomena in morphology have focused on the paradigmatic axis; however, nothing per se precludes the application of this notion to syntagmatic phenomena, such as combinatorics and mutual order of affixes (the distinction between semantically driven layered organization of morphology vs. opaque templatic morphology comes to mind here; see Stump 2006b, Good 2016), concatenative vs. non-concatenative exponence, morphophonological transparency vs. opacity and other issues belonging to the domain of morphotactics. It remains an empirical as well as a conceptual question, though, which kind of morphotactic organization should be considered ‘canonical’ and ‘less complex’. For instance, in languages where affix order directly reflects semantics, it is usually possible to permute certain affixes depending on their mutual scope (Rice 2011; Mithun 2016); whether such deviations from fixed ordering constitute additional complexity is not at all obvious.

While teleologically different, Natural Morphology (Dressler et al. 1987; Dressler & Kilani-Schoch 2016; Dressler 2019) is also centered on the idea of deviation from a core.³ Aiming at accounting for morphological preferences based on extralinguistic motivations, it theorizes a semiotically derived notion of naturalness, defined as the immediate, most unmarked, cognitively easiest, and thus universally preferred option. Conversely, naturalness-defining criteria determine deviation from the (most) natural option. This framework makes clear that other factors come to play a role in the conception and interpretation of morphological complexity, such as, for example, transparency vs. opacity of forms or morphotactic rules.
As Hengeveld & Leufkens (2018: 141) observe, ‘languages may be complex, yet transparent, or simple, yet opaque’. To take a concrete case, the Turkish vs. Lithuanian data in Table 1.1 show that Turkish morphology is more complex in the sense that a single word form may potentially contain a high number of morphemes. At the same time, however, it is transparent in that every morpheme corresponds to one fixed meaning, while Lithuanian morphology is more opaque. In the framework of Natural Morphology, Dressler (2011) views unnaturalness as a source of complexity and morphological complexity as the sum of all morphological categories, rules, and inflectional classes of a language, including both productive and unproductive patterns. Distinguishing between productive and unproductive patterns, he considers morphological complexity a hyperonym of morphological richness, which is conceived only in terms of productive patterns (Dressler 2003: 47; see also Dressler, Kononenko, et al. 2019). This distinction between active and static parts of morphology is, in our view, not only of crucial importance with respect to psycholinguistic approaches to complexity but also foundational to approaches focused on predictability, as we will see below.

³ Note that, while qualitatively oriented, both Natural Morphology and Canonical Typology are implicitly able to quantify degrees of complexity, computing the degree of deviation from the natural core or canon, respectively.

Finally, information-theoretic approaches play down the role of combinatorics and construe morphological complexity in terms of predictability and entropy. Their development is intimately related to word-and-paradigm models of morphology, which consider inflectional systems as networks of implicative relations holding between fully-inflected word forms. Consequently, they aim to understand to what extent the choice of exponence for a given cell is predictable from any other information available to the speaker, with complexity being in an obvious inverse relation to predictability (cf. Finkel & Stump 2007, 2009; Stump & Finkel 2013). Ackerman & Malouf (2013) propose the term ‘integrative complexity’, based on the notion of entropy as ‘a measure of the reliability of guessing unknown forms on the basis of known ones’, that is, a measure of predictability. They start from the intuition that ‘speakers must generalize beyond their direct and limited experience of particular words’ (p. 436) and posit a ‘Low Entropy Conjecture’: morphological systems, such as paradigms, in which conditional entropy among related word forms is low, are more efficient, as they ‘permit these crucial inferences to be made easily’ (p. 436) (cf. ‘Paradigm Structure Conditions’ of Wurzel 1989).⁴ In other words, complexity derives from opaque intraparadigmatic relations, for opacity hampers the predictability and predictiveness among word forms in a lexeme’s paradigm.
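The difference between counting classes and measuring predictability can be illustrated with a small sketch. It is written in the spirit of the entropy-based approach described above, not as a reproduction of any published implementation. The four inflection classes, their three cells (`c1` to `c3`), and their exponents are hypothetical toy data, and every class is assumed to be equally likely; the function names are ours.

```python
from collections import Counter
from itertools import permutations
from math import log2

# Hypothetical toy system: four inflection classes, three paradigm cells.
CLASSES = {
    "A": {"c1": "-a", "c2": "-i", "c3": "-u"},
    "B": {"c1": "-a", "c2": "-e", "c3": "-u"},
    "C": {"c1": "-o", "c2": "-e", "c3": "-s"},
    "D": {"c1": "-o", "c2": "-i", "c3": "-s"},
}


def entropy(counts):
    """Shannon entropy in bits of a distribution given as a Counter."""
    total = sum(counts.values())
    return -sum(n / total * log2(n / total) for n in counts.values())


def conditional_entropy(classes, target, given):
    """H(target | given) = H(given, target) - H(given), classes equiprobable."""
    joint = Counter((c[given], c[target]) for c in classes.values())
    marginal = Counter(c[given] for c in classes.values())
    return entropy(joint) - entropy(marginal)


def average_conditional_entropy(classes):
    """Mean of H(target | given) over all ordered pairs of distinct cells."""
    cells = sorted(next(iter(classes.values())))
    pairs = list(permutations(cells, 2))
    return sum(conditional_entropy(classes, t, g) for g, t in pairs) / len(pairs)


print(len(CLASSES))                                   # enumerative: 4 classes
print(conditional_entropy(CLASSES, "c3", "c1"))       # 0.0: c1 fully predicts c3
print(conditional_entropy(CLASSES, "c2", "c1"))       # 1.0: c1 says nothing about c2
print(round(average_conditional_entropy(CLASSES), 2)) # 0.67
```

In this toy system the enumerative count stays at four classes however the exponents are distributed, whereas the average conditional entropy drops as soon as cells become mutually predictive, which is the contrast between enumerative and integrative complexity.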
The ‘Low Entropy Conjecture’ is supported by recent studies on inflection class systems that clearly violate the enumerative complexity-based constraints of the kind proposed by Carstairs-McCarthy (see Baerman 2012, 2016; Sims 2015).⁵ The approaches to formal morphological complexity surveyed thus far share the potential to measure the degree of complexity. However, some typological studies have pursued the topic without a focus on metrics. One line of investigation, for example, has concerned the relation of (certain aspects of) morphological complexity to other typological parameters such as phonological systems (Shosted 2006; Fenk-Oczlon & Fenk 2008, 2014) and word order (e.g., Sinnemäki 2008; Bentz & Christiansen 2013), among others. Other studies have focused on the differential elaboration of nominal and verbal morphology (e.g., Nichols 1986, 1992; Mithun 1988; Kibrik 2012). In this domain, there are still more open questions than established answers, partly because of the lack of consensus as regards the

⁴ Morphomic stem distributions have also been interpreted in terms of predictive relations by Blevins (2016b: 123), a view partly criticized by Maiden (2018: 23–4).
⁵ It is likely that a conception of complexity based on entropy applies better to inflection than word formation because inter-word relations are generally much more complex in inflectional than in derivational paradigms.


definition of the relevant aspects of complexity and the adequate ways of measuring it. Still another line of research is concerned with the relation between morphological complexity and sociolinguistic typology. In section 1.1, we already mentioned the idea that pidgins and creoles are in general less complex than languages with a long history and uninterrupted transmission. More generally, in recent work (e.g., Trudgill 1997, 2009, 2011, 2017; Kusters 2003, 2008; McWhorter 2007, 2008; Lupyan & Dale 2010; Bentz & Winter 2013; Bentz et al. 2015; Bentz 2016), claims have been advanced that the overall degree of complexity as well as certain particular types of grammatical complexity correlate with such socioecological conditions of language use as high vs. low degree of contact, number of adult learners, size and geographic expansion of the speaker population, and some others (see also Tinits 2014 for a behavioural experiment with a miniature artificial language). Significantly, most such studies have focused on simplification caused by language contact (see Dorian 1978; McWhorter 2001; among many others), emphasizing that morphological complexity requires long-term periods of socioecological stability to develop (Dahl 2004). Nevertheless, studies exist showing that certain types of language contact (e.g., those involving stable childhood multilingualism) can contribute to preserving complex patterns (Trudgill 2011; Mithun 2015) and even result in an increase rather than a loss of morphological complexity due to borrowing and contact-induced grammaticalization (see Vanhove 2001; Aikhenvald 2002, 2003a; de Groot 2008; Loporcaro 2018; Loporcaro et al. forthcoming). Nor do processes of language genesis brought about by language contact necessarily entail morphological simplification. In a study on the rapid birth of a new mixed language in Australia, Gurindji Kriol, from the admixture of Gurindji and Kriol, Meakins et al.
(2019) demonstrate that there was no preferential adoption into Gurindji Kriol of less complex variants and that, in fact, complex Kriol variants were more likely to be adopted than simpler Gurindji equivalents. Given that Gurindji Kriol is the primary language of the younger generation in the Gurindji community, Meakins et al. interpret these results in light of the fact that the acquisition of morphology in morphologically complex languages is less challenging for children than for adults (cf. also Miestamo 2008). The issue of ease vs. difficulty of processing in language acquisition leads us to the second main type of morphological complexity introduced in section 1.3, viz. psycholinguistic morphological complexity.

1.3.2 Psycholinguistic morphological complexity

As we have seen in the previous section, Natural Morphology and Ackerman & Malouf’s (2013) integrative complexity also appeal to ease in processing and


production as a key to the interpretation of what is complex in morphology. These models build a bridge to the second type of approach to morphological complexity, psycholinguistic morphological complexity, which focuses on the cost or difficulty that a given linguistic system or structure causes to language users, that is, computational effort. Psycholinguistic approaches to morphological complexity assume that the degree of ease vs. cost of a morphological pattern in processing and production correlates with its degree of complexity. This line of research draws evidence from three areas of study: adult processing, L1 and L2 acquisition, and the performance of artificial automatic learning. One line of investigation within this field has developed around the equation of complexity with low parsability (Stump 2017). In this respect, the debate on the balance between memory retrieval and online computation in language production is particularly relevant. In the context of the debate on lexical access and specifically of the so-called English past-tense debate (for references, cf. Ambridge & Lieven 2011: 169–87), Pinker & Prince (1988) argued for a ‘dual-route’ model that provides both for irregular forms (feel/felt), which are memorized as wholes in the mental lexicon, and for an online default rule responsible for morphemic concatenation (walk/walked) (see also Gardani et al. 2019: 24–7). At the same time, it was observed that regular forms with high frequency can also be stored in the mental lexicon (Alegre & Gordon 1999a: 56). However, the fact that both morphologically less complex (i.e., highly parsable) and morphologically complex (i.e., poorly parsable) word forms can be lexically stored leads to the conclusion that complexity qua parsability does not correlate with processing cost. No one has stressed the role of frequency in lexical access as vigorously as Joan Bybee (1985, 1995, 2007).
Consequently, the conception of complexity focusing on system complexity, in which irregularity is viewed as an ingredient of complexity, is incompatible with the results of studies on processing complexity, which have shown that irregularity does not per se constitute an obstacle for the language user, as it can be defeated by frequency. Studies in language acquisition, too, do not necessarily support the hypothesis that psycholinguistic complexity and formal complexity coincide. For example, in a crosslinguistic study on the relationship between the morphological complexity of child-directed speech and the speed of morphological acquisition in children, Xanthos et al. (2011) found a strong positive correlation between inﬂectional complexity of the input and the speed of acquisition. This result seems to suggest that the more morphology in the input, the easier the morphology is to acquire. According to Kelly et al. (2014), formal complexity such as heavy synthesis in polysynthetic languages is not a challenge for L1 acquisition if the templatic sequence in which formatives are used is regular, and Allen (2017) also reports longitudinal studies showing that Inuit children acquire elaborate derivational and inﬂectional morphology early and with ease. (See also Stoll et al. 2017, on the acquisition of verb morphology in polysynthetic Chintang.) Other acquisitional


studies construe formal complexity not as constitutional complexity but as descriptive complexity. For example, in a crosslinguistic study on the emergence and early development of synthetic compounds, Dressler, Sommer-Lolei, et al. (2019) provide evidence that synthetic compounds (i.e., compounds in which the head is derived from a verb and the non-head is an argument of this verb) such as German Nussknacker ‘nutcracker’ are acquired later than comparable three-constituent compounds. They interpret this later acquisition as a sign of higher complexity: equating the degree of complexity with the number of rules involved, synthetic compounds, which are derived by both a rule of compounding and a rule of derivation, are more complex than words derived either only by compounding or only by derivation rules. Besides that, numerous studies, both typological and experimental (e.g., Wray & Grace 2007; Lindström 2008; Trudgill 2011; Bentz et al. 2015; Bentz & Berdicevskis 2016; Atkinson et al. 2018), show that morphological complexity, while being an obstacle to L2 acquisition in adults and hence subject to erosion, regularization, and loss in those situations of language contact that involve massive adult acquisition, does not, in fact, constitute a severe challenge for L1 acquisition in children. Moreover, Lupyan & Dale (2010) have hypothesized that infants, in fact, benefit from the increased redundancy brought about by morphological complexity in languages used in small groups. Psycholinguistic approaches to morphological complexity have attracted criticism of mainly two sorts. One problem is that the perception of ease or, conversely, difficulty, might vary among language users, and therefore might not be an objective metric; the other problem is that ‘psycholinguistic background research on the processing cost and learning difficulty of a given grammatical phenomenon’ might not be enough (Miestamo 2017: 232).
As a matter of fact, the correlation between ‘our intuitive notion of morphological complexity and actual evidence of the pace of acquisition of more or less complex inﬂectional systems in child language’ (Marzi et al. 2018) seems to be poor. In order to solve at least the objectivity issue, recent research in morphological complexity has expanded into the ﬁeld of neurobiologically inspired computational models of processing and learning. In one such study, Marzi et al. (2018) have focused on the performance of recurrent self-organizing neural networks trained to learn languages, in order to understand how degrees of inﬂectional complexity affect word processing strategies. They found a signiﬁcant systematic correlation between regularity and predictability of verb forms and interpret the evidence ‘as the result of a balancing act between two potentially competing communicative requirements’, viz. recognition (leading to a maximally contrastive system) and production (leading to maximally predictable forms).


1.4 About this volume

In section 1.1, we identified four issues we deem among the most urgent to solve in research on morphological complexity. In order to tackle these issues in a principled way, we convened a dedicated workshop ‘Morphological Complexity: Empirical and Cross-Linguistic Approaches’ at the 48th Societas Linguistica Europaea (SLE) meeting in Leiden in 2015. The present volume is a collection of original research papers consisting in equal measure of papers delivered at the workshop and of invited contributions. (Each chapter was subject to a threefold reviewing process consisting of an anonymous external review, a non-anonymous internal review performed by a fellow contributor, and comments by the editors.) The volume features: (a) various theoretical, methodological, and typological perspectives on morphological complexity (from ‘classic’ morphological description to experimental and information-theoretic approaches); (b) both detailed investigations of individual languages and wider crosslinguistic studies; (c) synchronic and diachronic analyses; (d) a broad coverage of topics including structural and sociolinguistic issues, such as the development of morphological complexity under different sociohistorical conditions (prominently, language contact); (e) empirical evidence drawn from languages from all continents and belonging to a number of typologically diverse language families. Unfortunately, the volume does not cover the complexity of word formation and the complexity of sign language morphology. We hope that future research will take care of these issues. The volume, introduced by the present chapter, consists of three parts organized according to the chapters’ main focus and scope, and is closed by a discussion in Chapter 13 by Östen Dahl on the volume’s contributions and on the minimum description length approach. Part I includes five chapters dealing with issues of morphological complexity from a language-specific perspective.
Jeff Parker and Andrea Sims’s Chapter 2, ‘Irregularity, paradigmatic layers, and the complexity of inflection class systems: A study of Russian nouns’, follows Stump & Finkel’s (2013: 55) definition of complexity of an inflection class system as ‘the extent to which the system inhibits motivated inferences about a lexeme’s full paradigm of realized cells [ . . . ]’. Using data from Russian, the authors explore the implications of gradient (ir)regularity for measuring and comparing the complexity of inflection class systems. They find that some, but not all, less regular inflectional patterns significantly increase the complexity of the system, but that the increased complexity is mitigated by structural and distributional properties of the inflectional system. In Chapter 3, ‘Demorphologization and deepening complexity in Murrinhpatha’, John Mansfield and Rachel Nordlinger investigate diachronic changes in the complexity of verb inflection in Murrinhpatha, a polysynthetic non-Pama-Nyungan language of northern Australia, which displays a high level of


complexity in terms of unpredictable analogical relations in inflectional exponence. The authors demonstrate that recent changes in inflectional allomorphy blur the boundaries of stem and affix, resulting in gradual demorphologization and increasingly unpredictable exponence. Felicity Meakins and Sasha Wilmoth’s Chapter 4, ‘Overabundance resulting from language contact: Complex cell-mates in Gurindji Kriol’, examines the development of overabundance (see above) in the subject-marking system of Gurindji Kriol, an Australian mixed language. By means of generalized linear mixed models, which probabilistically measure the use vs. non-use of a feature, the authors interpret the emergence of overabundance as an instance of complexification, providing a counterexample to the commonly held view that contact always results in reduction of morphological complexity. In Chapter 5, ‘Derivation and the morphological complexity of three French-based creoles’, Fabiola Henri, Gregory Stump, and Delphine Tribout take a fresh look at a controversial assumption in creole research, namely the widespread claim of poverty of creole morphology (see references in section 1.1). Analysing deverbal nominalizations via conversion in Mauritian, Guadeloupean, and Haitian, and assessing the integrative complexity of the respective morphological systems’ derivational relations, the authors demonstrate that the complexity of the derivational relations in these creoles attains the same degree as that of the lexifier, French. Finally, in Chapter 6, ‘Simplification and complexification in Wolof noun morphology and morphosyntax’, Michele Loporcaro explores the diachronic dynamics of morphological complexity in the nominal morphology and morphosyntax of Wolof, an Atlantic language of Senegal.
Loporcaro shows that, while changes such as the emergence of inflectional irregularities produced a local increase in complexity in noun and determiner morphology, overall the morphology of Wolof is less complex than that of closely related Atlantic languages. Loporcaro provides an explanation of the simplifying tendencies in sociolinguistic terms, referring to the correlation between simplification and prestige in the Wolof speech community. Here, speaking correctly is associated with low caste in rural settings, while linguistic prestige is achieved through language mixing, extensive borrowing, and, crucially, the simplification, via paradigmatic leveling, of inherited alternations impacting on both the morphology and the morphosyntax of the language. Part II consists of three chapters approaching morphological complexity from a crosslinguistic perspective. Johanna Nichols’s Chapter 7, ‘Canonical complexity’, takes non-transparency rather than size to be the locus of morphological complexity and adopts the notion of (non-)canonicity to define crosslinguistically comparable variables, capture non-transparency, and restrict the comparanda to a manageable sample. Francesca Di Garbo’s Chapter 8, ‘The complexity of grammatical gender and language ecology’, is a crosslinguistic investigation of the evolution of gender agreement patterns, which are viewed as an instance of morphological complexity, and its ties to sociohistorical factors. Analysing a sample of thirty-six languages in


a qualitative fashion, the author is able to establish associations between multiple patterns of change, such as loss, reduction, emergence, and expansion of gender, on the one hand, and various sociohistorical situations, ranging from demographic structure (population size) to language policies and language attitudes, on the other. In Chapter 9, ‘Morphological complexity, autonomy, and areality in western Amazonia’, Adam Tallman and Pattie Epps investigate the relationship between morphological complexity and areality-building processes across Amazonia. The authors observe (a) morphological proliferation in four domains (nominal classification, tense, evidentiality, and valency-adjusting mechanisms) across unrelated western Amazonian languages; (b) high system complexity across these domains; and (c) a link between complexity and language contact. They conclude that factors often associated with morphological complexity are in fact not necessarily morphological, as a large percentage of bound morphemes in these languages display ambiguity between morphology and syntax. The three chapters in Part III address the problem of morphological complexity from an acquisitional perspective. In Chapter 10, ‘Radical analyticity as a diagnostic of adult acquisition’, John McWhorter proposes that languages can become radically analytic, that is, completely or near-completely void of inflectional morphology, only via incomplete acquisition. He draws evidence from West Africa and Southeast Asia and shows that the relevant languages score more like creoles than like older languages. In McWhorter’s view, second-language acquisition decisively reduces grammatical complexity (in terms of bound inflection) to a degree that ordinary language change cannot.
The author suggests that radical analyticity can be treated as evidence that such second-language acquisition occurred in the history of the language, and thus, synchronic morphological complexity can serve as a clue to the past of a language, in the absence of historical documentation. Chapter 11, ‘Different trajectories of morphological overspecification and irregularity under imperfect language learning’ by Aleksandrs Berdicevskis and Arturs Semenuks, also deals with imperfect language learning, partly supporting McWhorter’s conclusion. With reference to the editors’ fourth question (see section 1.1), the authors investigate how morphological complexity is related to socioecological parameters. They run an iterated artificial language learning experiment, tracing the change of two facets of complexity: overspecification and irregularity. They find that the presence of imperfect learners in a transmission chain leads to a much stronger decrease in morphological overspecification. Overspecification, however, is not usually fully eliminated, and its partial decrease often leads to increased irregularity, thus making languages simpler in one respect, but more complex in another. Additionally, higher irregularity decreases learnability, and this effect is stronger for imperfect learners compared to normal learners. Thus, the relationships between these two facets of morphological complexity and language learnability have their own complexities. Finally, Marianne Mithun’s Chapter 12, ‘Where is morphological complexity?’, is firmly anchored in the debate on the


psycholinguistic reality of complexity. Examining the speech of native speakers of two North American languages influenced to varying degrees by contact with English, Mithun observes that even native speakers with limited proficiency produce morphological structures that are highly complex for the analyst, with large numbers of morphemes per word, fusion, and irregularity. She argues that the discrepancy between what linguists consider complex and what speakers find difficult (or easy) to acquire or preserve is not surprising if one takes the view that morphology in these languages is not processed and learned online, but rather in chunks. As we said, Östen Dahl closes the volume by critically reviewing the volume’s chapters and seeing how the concepts of morphological complexity applied therein relate to the ‘minimum description length approach’. Turning now to the four research questions (section 1.1) the contributors to this volume focused on, we observe that (question 1) it is possible to define morphological complexity, even though the demarcation between morphology and syntax is in many cases fuzzy (see Tallman & Epps, Chapter 9, this volume). At the same time, however, we observe that different authors provide and apply different definitions, also within this volume. Seemingly, the very existence of multiple definitions of morphological (and morphosyntactic) complexity is related not only to the location of a specific linguistic feature along the grammar continuum (from pure morphology to morphosyntax), but also to the diversity of phenomena and types of complexity. This observation leads us to answer question 2, namely whether it is possible to arrive at a uniform notion of morphological complexity. We concur with Dahl (Chapter 13, this volume) that a set of shared notions and standard works that everybody refers to has not yet been reached.
Thus our answer to question 2 is no, and the motivation for it is that the linguistic facts are so multifarious and diverse that not one, but many different complexities can be detected (whence the plural in this chapter’s title). Then we asked (question 3) whether it is possible to arrive at a crosslinguistically applicable and theoretically founded measure of morphological complexity. Berdicevskis et al. (2018) have recently pointed to the absence of a gold standard. We, too, have observed that there exists neither a commonly accepted definition of morphological complexity nor a uniform measure thereof. Admittedly, the growing understanding of the multifaceted nature of morphological complexity is much in line with the multivariate nature of typological comparison. So, perhaps we asked the wrong question. Probably, the quest for a unique measure is an epistemological fallacy. Once we have acknowledged that there is not one morphological complexity, but many morphological complexities, we should identify a set of complementary specific measures to apply crosslinguistically. Then, the only reasonable typological approach to morphological complexity is to break it down into individual variables (if necessary, each with its quantitative measure) and then look for mutual correlations between such variables or for their connections with other parameters of crosslinguistic variation. Of course, cumulative


measures such as the one developed by Nichols (Chapter 7, this volume) are also possible, but they are not holistic, either, and in many cases are based on a significant reduction of empirical data. Finally (question 4), we wanted to investigate the role of such extramorphological factors as diachronic development and (in)stability, susceptibility to loss vs. spread in situations of language contact, and, generally, of sociolinguistic and socioecological parameters, in affecting morphological complexity. As several chapters in this volume have demonstrated, in spite of at times diverging results, the study of the correlation between morphological complexity and extralinguistic factors such as the role of language contact or speakers’ sociolinguistic attitudes is fruitful and promising. Of course, the answers we have provided here are perforce partial and far from definitive, as many more case studies and much more comparative evidence are necessary to arrive at a reliable picture of such complex phenomena as morphological complexities. We hope that future research will pursue these pathways.

Acknowledgements

The volume’s editors wish to thank the authors, the external reviewers, and our editors at OUP. The support of the Swiss National Science Foundation (SNF CRSII1_160739) is gratefully acknowledged. Besides that, we thank Aleksandrs Berdicevskis, Wolfgang Dressler, Michele Loporcaro, and Franz Rainer for their insightful comments on a preliminary version of this introductory chapter.


I

THE LANGUAGE-SPECIFIC PERSPECTIVE


2 Irregularity, paradigmatic layers, and the complexity of inflection class systems: A study of Russian nouns

Jeff Parker and Andrea D. Sims

2.1 Introduction

The extent to which morphological patterns are included in analyses of inflection class systems tends to be strongly influenced by what is considered to be a ‘regular’ or ‘irregular’ pattern in a language. The number of classes and their definitional properties reflect the assumptions and analytical choices of the investigator. Two such choices are particularly notable. First, patterns that are reflected in few lexemes or are unproductive tend to be labeled as ‘irregular’ and considered to be outside of the system. Second, where inflectional properties are correlated with both affixal and non-affixal exponence (e.g., stress, stem alternations), the affix tends to be treated descriptively and theoretically as the exponent of the properties, with non-affixal marking often treated as a kind of irregularity, or simply ignored. Some approaches explicitly choose to focus only on regular affixal patterns (e.g., Cameron-Faulkner & Carstairs-McCarthy 2000). Others handle stem alternations as phonological readjustments, denying them status as exponents of morphosyntactic properties; see Halle (1994) for this idea as applied to Russian nouns. Even within the Word and Paradigm framework, which explicitly rejects the classical notion of the morpheme as a bundling of (affixal) form and meaning (see Stump 2001: ch. 1 for an overview of arguments), linguists sometimes ignore non-affixal dimensions in their analyses as a practical matter, showing how deeply ingrained the privileged status of affixal patterns is in linguistics. For example, in their study of inflection class system complexity, Ackerman and Malouf (2013: 434f) acknowledge that the description of Greek nominal inflection they adopt abstracts away from ‘many relevant complexities,’ including inflectional stress.¹ (So does their description of Russian nominal inflection.)
¹ As another example, even PARSLI (PARadigm Shape and Lexicon Interface), which is designed to explicitly represent non-canonical inflectional properties like stem change, defectiveness, overabundance, etc., does not include non-segmental information like stress as a possible deviation from canonicity (Walther 2017).

Jeff Parker and Andrea D. Sims, Irregularity, paradigmatic layers, and the complexity of inflection class systems: A study of Russian nouns. In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Jeff Parker and Andrea D. Sims. DOI: 10.1093/oso/9780198861287.003.0002

In this chapter, we explore the role that irregularity and non-affixal exponence play in the complexity of inflection class systems.² Recent typological studies of inflection class complexity have focused on the implicative structuring of inflection classes and the extent to which this structure is informative about the exponence of inflected forms (Ackerman et al. 2009; Ackerman & Malouf 2013; Blevins et al. 2017; Bonami & Beniamine 2015; Sims 2015; Sims & Parker 2016; Stump & Finkel 2013). This is reflected in the way that Stump & Finkel define the complexity of an inflection class system as ‘the extent to which the system inhibits motivated inferences about a lexeme’s full paradigm of realized cells from subsets of its cells’ (Stump & Finkel 2013: 55; emphasis ours). Throughout this chapter we will assume a similar definition; see (1).

(1) Complexity of an inflection class system: the average extent to which the system inhibits motivated inferences about the realized form of a lexeme, given one or more other realized forms of the same lexeme.

We make this notion more precise and operationalize it as average conditional entropy in section 2.5 below. Implicative definitions of complexity as in (1) represent a step in the direction of crosslinguistic comparison based on the internal structuring of inflectional systems, rather than measures like the number of inflection classes or the size of paradigms.³ The former is what Ackerman & Malouf (2013) call ‘Integrative’ complexity; the latter they call ‘Enumerative’ complexity. Integrative complexity measures represent a productive development to the extent that they better reflect the ways in which inflectional systems pose challenges for speakers.⁴ While it is not clear to us that any particular notion of complexity within morphology will be adequate for the variety of questions that morphology poses, the implicative-based notion of complexity adopted here also has the potential to emerge as an

² Since inflection classes are an example of a purely morphological phenomenon, that is, not syntactically relevant, this type of complexity seems to avoid the problematic questions about the division between morphology and syntax (see discussion in Arkadiev & Gardani, Chapter 1, this volume).
³ For a distinct but somewhat related notion, see the discussion of ‘relative’ and ‘absolute’ measures of complexity in Miestamo (2008) inter alia. Miestamo’s discussion of relative approaches focuses on psycholinguistic and acquisition-oriented approaches/evidence. While our information-theoretic measures are not psycholinguistic in nature, they (and their use in previous work, for example, Ackerman et al. 2009) could be classified as relative in terms of their focus on the potential ‘cost and difficulty to language users’ (Miestamo 2008: 24). (See also discussion in Arkadiev & Gardani, Chapter 1, this volume; Dahl, Chapter 13, this volume.)
⁴ See section 2.5 for some justification of this claim, and for defining inflection class complexity in terms of the predictability of individual forms, rather than the lexeme’s class membership (i.e., its entire paradigm of forms).


important way to uncover crosslinguistic tendencies in the complexity of inﬂection class systems (see questions 2 and 3 in Arkadiev & Gardani, Chapter 1, this volume). At the same time, the fact that much previous work within such notions of complexity has been based on descriptions of inﬂectional systems that include only afﬁxes, and sometimes only the most regular patterns, leaves it unclear whether claims about limits on inﬂection class complexity (e.g., the Low Conditional Entropy Conjecture, Ackerman & Malouf 2013) apply to all inﬂectional patterns in a language or only those that are most regular. More generally, it raises questions about how patterns that are typically excluded from consideration interact with other elements in the system, and the role they play in determining the complexity of inﬂection class systems. Brown & Hippisley (2012) are a notable exception to this tendency to focus just on afﬁxal exponence. We follow them in using the term ‘paradigmatic layers’ (2012: 71) of exponence (or just ‘layers’ for short) for dimensions of inﬂectional form (e.g., stress, sufﬁxes, stem alternations) that have their own, independent distributions but which jointly realize the inﬂectional information of a word. We use Russian nouns to investigate these issues. We consider how patterns that are often excluded from consideration affect the complexity of the system and how they are integrated into the implicative structure of the system. The core questions that we ask are: How do interactions between component parts of the Russian nominal inﬂection class system shape the complexity of that system as a whole? In particular, are less-regular and non-afﬁxal layers of exponence disruptive to an inﬂectional system, disproportionately increasing its complexity? Or, alternatively, is their disruptive potential mitigated by the way elements in the system interact? 
Little work has compared implicative structuring within subcomponents of the lexicon—an issue that is potentially important for understanding the internal structuring of inﬂectional systems. By looking at the inﬂectional structure of Russian nouns in this way, we aim to promote a fuller understanding of how inﬂectional organization determines the complexity of inﬂection class systems. We do not assume that every language is alike, or that Russian is representative. But we use Russian as a way to explore and illustrate the issues involved.

2.2 Regularity, paradigmatic layers, and inflection classes

We focus on irregularity and non-affixal layers of exponence because the representation of a system can affect the assessment of its complexity. For example, Sagot & Walther (2011) compare four descriptions of French verbs. The descriptions range from a system with many classes and no lexically specified stem allomorphy (139 classes) to a system that lexically specifies all stem allomorphy (one


inﬂection class), with two other descriptions that split the burden of explanation between the inﬂection class system and lexical speciﬁcation. As they observe (p. 42), it makes little sense to evaluate an inﬂectional system based only on the morphological description and not what is lexically speciﬁed, since a morphological description can always be made simpler by positing more lexical speciﬁcation. They thus evaluate the analyses in terms of description length, including both the morphological description and lexically speciﬁed information. Equating the degree of complexity of the system with the length of its description, they show that the complexity of the different analyses differs signiﬁcantly; a description with twenty classes and up to twelve lexically speciﬁed suppletive stems for some lexemes results in the shortest length.⁵ The point here is that degree of complexity is a property of a particular description of French verbs.⁶ This makes it particularly important to examine and justify the description itself. Stump & Finkel (2015) make a similar point along a different dimension of description. They contrast two potential representations of the same set of English verbs, one based on acoustics alone (what they call ‘hearer-oriented’) and one based on structure known to a speaker that does not surface in the production of forms (‘speaker-oriented’). For example, the exponence of the past participle(s) of  and  are identical in a hearer-oriented representation, that is, /εnt/, but a speaker knows that they contain different structure, that is, /εn-t/ vs. / εnd-t/. Stump & Finkel show that the two representations exhibit differences in their complexity based on various information-theoretic and set-theoretic measures. (See also Bonami 2013 for similar issues with French verbs.) Mansﬁeld & Nordlinger (Chapter 3, this volume) also draw attention to how systems are represented. 
Investigating Murrinhpatha (non-Pama-Nyungan, Northern Australia), they show that speakers have made analogical changes to the verbal system which, surprisingly, do not lead to greater predictability among allomorphs. They suggest that using existing measures of conditional entropy to calculate the complexity of the system would be misrepresentative because verbs in the language are a closed class with largely idiosyncratic exponence. The exponence for the verbs is made up of intersecting formatives that are partially predictive of each other. If the exponents are represented as unanalysable wholes (as they sometimes are in the literature), the subregularities among intersecting formatives, which may help explain the analogical changes, are obscured. Finally, Cotterell et al. (2019) note that the information-theoretic measure used in, for example, Ackerman & Malouf (2013), is highly sensitive to the particular descriptive analysis that is made of an inflectional system. They propose an alternative measure of integrative complexity in terms of joint entropy, a calculation based on the joint distribution over all cells, with complexity defined as the entropy of the distribution.⁷ However, even if joint entropy is less sensitive to the representation of the system, this does not eliminate the need to investigate how analytic assumptions about that representation affect calculations of the complexity of inflection class systems. These studies highlight how the description of a system can affect calculations of its complexity. Given that inclusion or exclusion of irregularity and non-affixal exponence can substantially change the description of an inflection class system, we should ask in what ways they affect the complexity of that system. It is beyond the scope of this chapter to argue for one particular representation of Russian nouns as being more adequate than another. But roughly similarly to the approach of Sagot & Walther (2011), we explore the effect of different descriptions of the Russian nominal inflection class system on estimates of its complexity.⁸

⁵ See also Goldsmith (2001, 2011) for arguments for description length-based evaluation metrics in morphological analysis.

⁶ In employing an evaluation metric based on description length, Sagot & Walther (2011) argue that descriptions of shorter length (i.e., of less complexity in their sense) are more adequate. However, it is not obvious to us that for a given inflectional system, the description with the shortest description length should be taken to be the most adequate one. This is a question of the evaluation metric. For instance, see Derwing (1990) for arguments against evaluation metrics based on economy of storage (incl. minimum description length) and for metrics based on economy of processing speed and Dahl (Chapter 13, this volume) for discussion on the relationship between Minimum Description Length and other notions/metrics of complexity. It is not a foregone conclusion that a description that is most cognitively realistic will be the description with the lowest estimated complexity in terms of either description length or the implicative notion outlined in (1) above. This is a question for investigation, but beyond the scope of the present work.

2.2.1 Regularity and inflection classes

It has long been known that high type frequency inflection classes create analogical pressure on irregular patterns. When irregular patterns resist regularization, the most common argument for their persistence despite analogical pressure is that they are lexically stored, leaving them relatively impervious to regularization. The typically high token frequency of such lexemes also makes lexical specification psycholinguistically plausible. This and other evidence of lexical storage is sometimes taken as a basis for treating irregulars as falling outside of the grammatical system (in this case, the inflectional system).

⁷ Cotterell et al.’s work was presented at the Society for Computation in Linguistics just as we were completing final revisions to this chapter, so we did not have the opportunity to apply their joint entropy metric to our data, nor to explore whether it produces estimates of system complexity that are less dependent on the particular descriptive analysis that is made of an inflectional system. However, we see this as a promising avenue for investigation.

⁸ Unlike Sagot & Walther (2011), we do not offer a formal analysis of Russian nouns, and make no particular assumptions about what inflectional information is part of the grammatical system, and what is lexically specified. However, like them, we include both regular, productive forms, and also ones that analyses might treat as lexically specified. And of course, their paper and our chapter are similar in investigating how different analytic assumptions affect assessments of the complexity of the inflection class systems.


However, a categorical division into regular and irregular types has long been recognized as problematic. First, the scope of a form’s irregularity can range from having an exponent associated with a different class to having a fully suppletive form. The extent to which a lexeme is irregular can also range from a single cell to the majority of the paradigm. (See Corbett et al. 2001 for examples from Russian.) Aside from the most extreme cases of suppletion, irregular lexemes exhibit irregularity in only a subset of their paradigms’ cells. And even in suppletion, stem distributions are often shared with regular patterns (Aski 1995; Bonami & Boyé 2002; Hippisley et al. 2004; Boyé & Cabredo Hofherr 2006). Thus, even the most irregular lexemes frequently overlap with regular ones and tend to exhibit at least some degree of systematicity (Brown & Hippisley 2012). In fact, Brown & Hippisley argue that ‘there is no hard-and-fast contrast between rules and lexical speciﬁcation. Rather, we must make a distinction between the rule on the one hand and how the lexeme accesses that rule’ (p. 80). In their theory, Network Morphology, rules are information held at nodes in an inheritance hierarchy. This information is inherited ultimately by individual lexemes, deﬁning their patterns of inﬂectional exponence. However, lexemes may inherit information by default or by direct speciﬁcation of the node from which the lexeme should inherit. This means that within their theory, regularity is deﬁned in terms of how a lexeme accesses a rule, and a single rule may represent regularity in some lexemes and irregularity in others. Second, speakers draw on their knowledge of irregular patterns when generalizing to new lexemes (Bybee & Slobin 1982; Albright & Hayes 2002, 2003). Words that are traditionally categorized as irregular play a crucial role in predicting how speakers generalize morphological patterns to new words. 
Irregular inflectional patterns can be more reliable in certain contexts (e.g., phonological neighborhoods) than more regular patterns. Correspondingly, inflectional patterns that are highly irregular can be extended. The athematic 1SG marker -m in Common Slavic spread from just a handful of verbs to become the dominant 1SG marker in some West and South Slavic languages (Janda 1994). Thus, even highly irregular patterns can exhibit a degree of productivity. Third, it is now generally accepted that both irregularly and regularly inflected words are stored in the mental lexicon and leave traces in memory (Alegre & Gordon 1999a; Baayen 2007 inter alia). Baayen et al. (2007), among many others, find a surface frequency effect for regularly inflected words in a lexical decision task even with low frequency lexemes. Starting with Taft (1979), such a frequency effect has been widely interpreted as reflecting direct lexical storage of the forms, rather than storage via component morphemes.⁹ Thus, showing that irregulars are subject to lexical storage is not a sufficient basis on which to argue that irregular items are not part of the system of inflectional patterns. Evidence of this sort blurs the binary classification of inflectional patterns into ‘regular’ and ‘irregular’ types and undermines any concomitant claim that there is a categorical distinction between patterns generated by the inflectional rule system (and thus appropriately described in terms of inflection classes) and those that are lexically stored exceptions. Yet in the context of knowing that the description of an inflection class system makes a big difference for calculations of its complexity, analytic assumptions that place irregulars outside of the inflectional system are pernicious because they preclude even asking important questions about how irregulars interact with regulars and the consequences of this for the complexity of the system.

⁹ See Taft (2004) and Taft & Ardasinski (2006) for more recent, sceptical interpretations of surface and base frequency effects. Models with different primitive assumptions about representational structure also interpret surface frequency effects somewhat differently, for example connectionist models (Daugherty & Seidenberg 1994) and discriminative learning models (Baayen et al. 2011). However, the important thing in the present context is that none of these models posit that irregular and regular inflected forms are processed and stored in the mental lexicon in categorically different ways (an idea put forward most famously by Prasada & Pinker (1993) and advocated for from a neurolinguistic perspective by Ullman (2001, 2004), but now widely rejected).

2.2.2 Paradigmatic layers and inflection classes

Similar observations can be made about paradigmatic layers of inflection. Linguistics has a deep-rooted tradition of thinking of words as combinations of linearly (and perhaps hierarchically) ordered morphemes. As noted at the beginning of the chapter, there is a philosophical preference for concatenative patterns that manifests in a privileged status for affixes both descriptively and theoretically. Nonetheless, different layers of exponence can exhibit distinct structural organization. For example, a subset of Russian nouns exhibits fixed stress on the ending and has a stress retraction in the nominative plural, and also in accusative plural when syncretic with nominative (GVOZD’ ‘nail’ and GUBA ‘lip’ in Table 2.1). This is one of several morphosyntactically conditioned stress alternations in Russian nouns (see Zaliznjak 1967 for a description of stress patterns; Brown et al. 1996 offers an overview in English). The alternations define a set of structured stress classes that partly crosscut the suffix-based classes and form an inheritance hierarchy that is distinct from the one defined by inflectional suffixes (Brown et al. 1996). The point here is that the stress and suffix patterns both are informative about and conditioned by morphosyntactic values. For some classes, represented here by GUBA ‘lip’ and OKNO ‘window’, stress placement is the only thing that distinguishes nominative/accusative plural from genitive singular. In practice, however, virtually all analyses of Russian nominal inflection focus on classes as defined by (regular) suffixal groups, even though inflectional stress exhibits its own,


Table 2.1. An example of morphosyntactically conditioned stress alternation in Russian nouns

          GVOZD’ ‘nail’    GUBA ‘lip’    OKNO ‘window’
NOM.SG    gvozd’           gubá          oknó
ACC.SG    gvozd’           gubú          oknó
GEN.SG    gvozdjá          gubý          okná
DAT.SG    gvozdjú          gubé          oknú
LOC.SG    gvozdé           gubé          okné
INS.SG    gvozdjóm         gubój         oknóm
NOM.PL    gvózdi           gúby          ókna
ACC.PL    gvózdi           gúby          ókna
GEN.PL    gvozdéj          gúb           ókon
DAT.PL    gvozdjám         gubám         óknam
LOC.PL    gvozdjáx         gubáx         óknax
INS.PL    gvozdjámi        gubámi        óknami

independent organization into classes. And this choice is rooted, ultimately, in analytic assumptions of the linguist that give a privileged status to afﬁxes in the description of inﬂectional systems. Another argument comes from the fact that layers of exponence may offer a full picture of the organization and complexity of a system only when considered jointly. Chiquihuitlán Mazatec (Oto-Manguean, Mexico) verbs are marked for person and aspect by a combination of tones, ﬁnal vowel, and stem formative (Jamieson 1982). The uncertainty associated with predicting the tone, ﬁnal vowel, and stem formative for a paradigm cell in isolation is high. Moreover, knowing the full paradigm for one of the layers of exponence (tone, ﬁnal vowel, or stem formative) does little to help predict the pattern for other layers of the same lexeme (Ackerman & Malouf 2013: 448). However, the uncertainty associated with predicting the exponence of any given cell knowing one other cell in the paradigm is surprisingly low because each word form carries some information about the possible tone, ﬁnal vowel, and stem formative of other cells; there is strong implicative structure between individual cells, which crosscuts the three layers of inﬂectional exponence (see average conditional entropy in Ackerman & Malouf 2013: 443). Similarly, Sims (2015: ch. 5) shows that the distribution of genitive plural defectiveness in Greek nouns is predictable from the relationship between afﬁxal patterns and inﬂectional stress. When these layers of inﬂection are taken together, the picture that emerges is that the genitive plural in some classes is implicatively stranded in the paradigm, causing defectiveness. This kind of evidence undercuts any attempt to exclude non-afﬁxal paradigmatic layers. In the Greek example, the paradigmatic layers reveal aspects of inﬂectional organization that cannot be discerned from afﬁxal structure alone. 
The Mazatec example is similar with the addition that including all of the layers of


inﬂection actually leads to less complexity than would be expected given each layer independently. The inclusion of stress information in Russian nouns necessitates a second, distinctly structured inheritance hierarchy. Ultimately, paradigmatic layers can reveal organizational properties of inﬂectional systems that are otherwise hidden. Thus, as with irregularity, analytic assumptions that exclude non-afﬁxal paradigmatic layers from consideration preclude important questions about how elements in an inﬂectional system interact to determine its overall complexity.

2.2.3 Interim Summary

In summary, estimates of the complexity of inflectional systems depend on the representations of the systems under investigation. While there has been a tendency to exclude irregular inflectional patterns and non-affixal layers of exponence from these representations, doing so is not well justified on empirical or theoretical grounds. Both irregulars and non-affixal layers have the potential to reveal structural properties of the system that are otherwise obscured. The question becomes whether a broader understanding of what belongs to ‘the system’ makes a difference for calculations of its complexity, and how.

2.3 Inflection class complexity

Inflection classes are a layer of structure that mediates between form and meaning, without bearing meaning directly (they are morphomic in Aronoff’s 1994 terms), and some languages do not have inflection classes, showing that classes are not ‘needed’. These observations have led to the idea that inflection classes create unnecessary complexity in morphological systems and have raised the question of whether there are limits on that complexity. As noted in the introduction, the focus of this question has shifted away from a notion of complexity defined in terms of absolute number of inflection classes/exponents/cells and towards one that is rooted in implicative paradigmatic structure. Stump & Finkel (2013) define the complexity of an inflection class system as ‘the extent to which the system inhibits motivated inferences about a lexeme’s full paradigm of realized cells from subsets of its cells’ (2013: 55; emphasis ours). When defined in this way, the complexity of an inflection class system may, but need not, be related to the absolute size of the system. Systems with a large number of inflection classes and/or in which lexemes have a large number of paradigm cells can exhibit low complexity if there is strong implicative structure within the paradigm. Likewise, small inflectional systems can be highly complex if inflected forms are not held together by strong implicative relations (Sims 2015: ch. 5).


Stump & Finkel operationalize their deﬁnition primarily in terms of set-theoretic principal part sets—a set of realized cells from which a lexeme’s full inﬂection class membership can be determined. The concept of a principal part set is, by its very nature, concerned with implicative paradigmatic structure, giving a way to compare the complexity of different inﬂection class systems. Somewhat similarly, Ackerman et al. (2009) use information-theoretic tools to ask how much surprisal is associated with the inﬂected form realizing one paradigm cell, given the form associated with another cell, and deﬁne the complexity of an inﬂection class system in terms of its average conditional entropy. Ackerman & Malouf (2013) use the same information theoretic tools to compare the complexity of a set of typologically diverse languages. Stump & Finkel (2013) and Ackerman & Malouf (2013) both ﬁnd that when complexity is deﬁned in terms of implicative structure, individual forms tend to be predictable on average. In a survey of ten languages, Ackerman & Malouf calculate the average conditional entropy associated with the realization of a set of morphosyntactic values given knowledge of one other form of the same lexeme and show that it is uniformly relatively low, despite diversity in the size of the languages’ inﬂectional systems.¹⁰ They focus on the idea that implicative structure allows even large systems to exhibit low average conditional entropy and present their results as a typological tendency, the Low Conditional Entropy Conjecture: ‘enumerative morphological complexity is effectively unrestricted, as long as the average conditional entropy, a measure of integrative complexity, is low’ (2013: 436). 
Stump & Finkel (2013: 215) offer a similar generalization in the form of the Depth-of-Inference Contrast: ‘languages show a high degree of uniformity in allowing a given form in a lexeme’s paradigm to be deduced from a low number of dynamic principal parts (the average number being not much more than one)’.¹¹ Thus, both find evidence that even inflectional systems that vary widely in size tend to allow for well-motivated inferences when it comes to the task of inferring one inflected form from another. The idea that inflectional systems must maintain low complexity in this way is intuitive given that speakers must learn inflection classes for them to persist. Also, speakers must be able to generalize morphological patterns because not all inflected forms are attested even in large corpora (Baayen 2001; Blevins et al. 2017), and the need to predict unknown forms remains crucial throughout the lifespan (Bonami & Beniamine 2015).

¹⁰ However, at least for the languages that we are most familiar with (Russian, Greek), they base their analyses on grammatical descriptions that exclude irregularities and non-affixal layers of exponence. See Sims (2015: ch. 5) for a comparison between their analysis of Greek nouns and one based on a more robust representation of the nominal system.

¹¹ In a dynamic principal parts analysis, the principal parts need not reflect the same morphosyntactic properties from one inflection class to another. Stump & Finkel primarily differentiate this from a static principal parts analysis, in which the set of principal parts is required to correspond to the same morphosyntactic properties for all lexemes in a given syntactic category, and thus all inflection classes within that category.
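As a rough illustration of the information-theoretic measure described above, the following Python sketch computes the conditional entropy of one paradigm cell given another and averages it over all ordered pairs of cells. The three classes, their exponents, and their type frequencies are invented for illustration and are not drawn from any of the studies cited. Weighting classes by type frequency is likewise an assumption of the sketch, not a claim about how Ackerman et al. (2009) or Ackerman & Malouf (2013) set up their calculations.

```python
import math
from collections import Counter
from itertools import permutations

# Hypothetical toy system: class -> (type frequency, {cell: exponent}).
TOY_CLASSES = {
    "A": (60, {"NOM.SG": "-a", "GEN.SG": "-y", "NOM.PL": "-y"}),
    "B": (30, {"NOM.SG": "-o", "GEN.SG": "-a", "NOM.PL": "-a"}),
    "C": (10, {"NOM.SG": "-o", "GEN.SG": "-a", "NOM.PL": "-i"}),
}


def conditional_entropy(classes, given, target):
    """H(target | given) in bits, with classes weighted by type frequency."""
    total = sum(freq for freq, _ in classes.values())
    joint = Counter()     # counts of (given exponent, target exponent)
    marginal = Counter()  # counts of the given exponent alone
    for freq, cells in classes.values():
        joint[(cells[given], cells[target])] += freq
        marginal[cells[given]] += freq
    h = 0.0
    for (g, _t), n in joint.items():
        # p(g, t) * log2 p(t | g)
        h -= (n / total) * math.log2(n / marginal[g])
    return h


def average_conditional_entropy(classes):
    """Mean of H(target | given) over all ordered pairs of distinct cells."""
    cells = list(next(iter(classes.values()))[1])
    pairs = list(permutations(cells, 2))
    return sum(conditional_entropy(classes, g, t) for g, t in pairs) / len(pairs)


if __name__ == "__main__":
    print(round(average_conditional_entropy(TOY_CLASSES), 3))
```

In this toy system only the nominative plural is ever uncertain, and only when the known form does not itself separate classes B and C, so the average stays low (roughly 0.11 bits) even though three classes are distinguished.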


At the same time, Stump & Finkel observe a difference in complexity between predicting one inﬂected form and predicting class membership (i.e., all forms). In contrast with the relatively uniform ease with which a single inﬂected form can be deduced, ‘Languages vary widely in the number of dynamic principal parts they require to distinguish a given I[nﬂection] C[lass]’ (Stump & Finkel 2013: 215). Similarly, Ackerman & Malouf ﬁnd greater crosslinguistic differences in average declensional entropy (an unconditioned entropy measure of inﬂection class predictability) than in average conditional entropy (a conditional entropy measure of inﬂected form predictability). This suggests that the complexity of an inﬂection class system as a whole is not necessarily a direct product of the complexity of the individual exponents. It is therefore important to investigate how the complexity of the system as a whole relates to the complexity of the component elements of the system. A few steps have been taken in this direction. Sims & Parker (2016) ﬁnd that nine investigated inﬂection class systems show roughly similar degrees of overall complexity, when calculated over pairs of forms using conditional entropy, consistent with the Low Conditional Entropy Conjecture. Crucially, however, they also show that implicative structure does very different amounts of ‘work’ in the languages to produce this result. In some languages, knowledge of one inﬂected form is crucial to predicting another. In other languages, inﬂected forms are independently fairly predictable, and knowledge of another form does little or nothing to improve that predictability. Thus, paradigmatic implication is not always an important determinant of the complexity of inﬂectional systems. Additionally, based on data from Icelandic and French, Stump & Finkel (2013) propose the Marginal Detraction Hypothesis: ‘[m]arginal I[nﬂection] C[lasse]s tend to detract most strongly from the IC predictability of other ICs’ (p. 
225). Marginal classes here are defined as ones with few lexemes. The Marginal Detraction Hypothesis thus asks whether the internal structure of inflection class systems is homogeneous. The hypothesis is that the implicative structure of low type frequency classes may differ from that of the most frequent classes. (See also Sims & Parker 2016 for a similar idea.) Related to this, Blevins et al. (2017) argue that the Zipfian distribution of morphological patterns helps balance two opposing pressures: the importance of predicting forms and the importance of discriminating forms. Frequently occurring patterns facilitate prediction. Suppletive patterns, which are likely to belong to low type frequency classes, may detract from predictability but at the same time have benefits like being highly discriminative. Both types of patterns contribute, in different ways, to ensuring the patterns in the language are usable by speakers. Together these studies explore the idea that competing pressures may lead different components of inflectional systems to exhibit different properties. They also suggest that if there is a strong crosslinguistic tendency for languages to exhibit low inflection class complexity, this both results from and occurs despite


structural aspects of inﬂectional systems. But so far there is little understanding of how the elements of inﬂectional systems interact to determine inﬂection class complexity, so further work is needed in this area, especially work comparing implicative structuring within subcomponents of the lexicon.
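The contrast drawn by Sims & Parker (2016) can be made concrete by comparing the entropy of a cell taken on its own with its entropy once another cell is known; the difference is the amount of uncertainty that implicative structure removes. The sketch below is a minimal illustration under stated assumptions: both two-class systems, their exponents, and the function name implicative_work are hypothetical, and the calculation is a standard mutual-information identity, not the specific procedure used in the cited study.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a Counter of frequencies."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def implicative_work(rows):
    """rows: (given exponent, target exponent, type frequency) per class.

    Returns (H(target), H(target | given), reduction in uncertainty).
    """
    target, given, joint = Counter(), Counter(), Counter()
    for g, t, n in rows:
        target[t] += n
        given[g] += n
        joint[(g, t)] += n
    h_target = entropy(target)
    h_cond = entropy(joint) - entropy(given)  # H(T | G) = H(G, T) - H(G)
    return h_target, h_cond, h_target - h_cond


# Hypothetical system X: the target cell varies, and the known cell predicts it.
SYSTEM_X = [("-a", "-y", 50), ("-o", "-a", 50)]
# Hypothetical system Y: the target cell never varies, so nothing needs predicting.
SYSTEM_Y = [("-a", "-i", 50), ("-o", "-i", 50)]

if __name__ == "__main__":
    for name, system in (("X", SYSTEM_X), ("Y", SYSTEM_Y)):
        print(name, implicative_work(system))
```

Both toy systems end up with a conditional entropy of zero, but in the first the known form removes a full bit of uncertainty, whereas in the second there was no uncertainty to remove.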

2.4 Russian nouns

We now turn to the example of Russian nouns. Our approach to investigating Russian is to divide inflectional exponence into its subcomponents and to investigate the effect of each on the complexity of the inflection class system. We do this in two ways. First, starting with a baseline description of the Russian nominal system that consists only of classes as defined by inflectional suffixes, we add in further information about exponence (additional paradigmatic layers) and look at the effect of this on the complexity of the inflection class system (section 2.6). Second, to look more directly at irregularity, we classify the individual exponents within each paradigmatic layer as regular or irregular. We then investigate the extent to which this (ir)regularity contributes to the complexity of the inflection class system (section 2.7). This idea is conceptually close to the Marginal Detraction Hypothesis, given the close connection between the irregularity and type frequency of inflection classes. However, quantifying the regularity of inflection classes’ layers directly allows us to take a closer look at whether layers are making distinct contributions to the complexity of the system as a whole. But first, in this section we describe the data sets that we work with.

Various proposals have been made regarding the number of Russian noun classes. The four-class system of Corbett (1982), shown in Table 2.2, is a typical representation of the Russian nominal system, but it is also coarse-grained. It may be an appropriate basis for some kinds of linguistic investigation, but questions of inflection class complexity benefit from a more granular representation. We therefore consider a fuller set of suffixal patterns and three additional layers of inflectional exponence. Here we are interested in how different aspects of inflectional exponence affect the complexity of a system without making any claims about which granularity is the ‘right’ or ‘best’ representation (cf.
the earlier discussion of Sagot & Walther 2011). Suffixes constitute one layer of exponence. In addition to the four suffix sets illustrated in Table 2.2, we consider ten other patterns of suffixes:

1. Indeclinable nouns, for example, KINO ‘(movie) theater’;
2. Neuter nouns like VREMJA ‘time’. In the plural these behave like Class IV nouns. In the singular they have an -a in the nominative (like Class II) and accusative, -i in the genitive, locative and dative (like Class III nouns), and -om in the instrumental (like Class I nouns);
3. Nouns that belong to Class I except that they have a null genitive plural, for example, RAZ: raz ‘time.GEN.PL’;
4. Nouns that belong to Class I except that they have -a in the nominative plural, for example, GOROD: goroda ‘city.NOM.PL’;
5. Nouns that belong to Class IV, except for -ov in the genitive plural, for example, OBLAKO: oblakov ‘cloud.GEN.PL’;
6. Nouns that belong to Class IV, but with nominative plural -i, for example, JABLOKO: jabloki ‘apple.NOM.PL’;
7. Nouns that belong to Class II except they have an overt genitive plural, for example, RASPRJA: rasprej ‘strife.GEN.PL’;
8. Nouns that belong to Class IV, but have a nominative plural -i and genitive plural -ov, for example, OČKO: očki ‘point.NOM.PL’ and očkov ‘point.GEN.PL’;
9. Nouns that belong to Class I, but have a nominative plural -e and a null genitive plural, for example, KREST’JANIN: krest’jane ‘peasant.NOM.PL’ and krest’jan ‘peasant.GEN.PL’;
10. Nouns that belong to Class I but have a nominative plural in -a and a null genitive plural, for example, TELËNOK: teljata ‘calf.NOM.PL’ and teljat ‘calf.GEN.PL’.¹²

Table 2.2. Illustration of the four-class system, based on inflectional suffixes

          I              II             III              IV
          ZAKON ‘law’    KARTA ‘map’    KOST’* ‘bone’    MESTO ‘place’
NOM.SG    zakon          karta          kost’            mesto
ACC.SG    zakon          kartu          kost’            mesto
GEN.SG    zakona         karty          kosti            mesta
LOC.SG    zakone         karte          kosti            meste
DAT.SG    zakonu         karte          kosti            mestu
INS.SG    zakonom        kartoj         kost’ju          mestom
NOM.PL    zakony         karty          kosti            mesta
ACC.PL    zakony         karty          kosti            mesta
GEN.PL    zakonov        kart           kostej           mest
LOC.PL    zakonax        kartax         kostjax          mestax
DAT.PL    zakonam        kartam         kostjam          mestam
INS.PL    zakonami       kartami        kostjami         mestami

Note: * Here and throughout the chapter we use scientific transliteration, rather than transcription. This is a convenience that accommodates Russian speakers and makes it easier to check the examples in a dictionary (because the spelling is maintained). However, the transliteration is sometimes misleading with regard to the phonological (or morphological) shape of words. Although it is not clear in the transliteration of this example, the stem-final consonant cluster in KOST’ is [sʲtʲ] throughout the paradigm (e.g., nominative singular [kosʲtʲ], genitive singular [kosʲtʲ-i], instrumental singular [kosʲtʲ-ju]).
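The four-class system in Table 2.2 can also be used to illustrate the set-theoretic notion of a static principal part set discussed in section 2.3. The Python sketch below searches for the smallest sets of cells whose endings jointly tell the four classes apart. It rests on several assumptions that are not part of the chapter: endings are obtained by naively stripping the longest common prefix from the transliterated forms (an orthographic shortcut, not a morphological analysis), the cell labels follow the row order of the table as read here, the soft sign is written with an ASCII apostrophe, and the function names are hypothetical. It is not Stump & Finkel's procedure.

```python
import os
from itertools import combinations

# Cell order as read from Table 2.2 (an assumption of this sketch).
CELLS = ["NOM.SG", "ACC.SG", "GEN.SG", "LOC.SG", "DAT.SG", "INS.SG",
         "NOM.PL", "ACC.PL", "GEN.PL", "LOC.PL", "DAT.PL", "INS.PL"]

# Transliterated forms from Table 2.2, one paradigm per class.
PARADIGMS = {
    "I": "zakon zakon zakona zakone zakonu zakonom "
         "zakony zakony zakonov zakonax zakonam zakonami",
    "II": "karta kartu karty karte karte kartoj "
          "karty karty kart kartax kartam kartami",
    "III": "kost' kost' kosti kosti kosti kost'ju "
           "kosti kosti kostej kostjax kostjam kostjami",
    "IV": "mesto mesto mesta meste mestu mestom "
          "mesta mesta mest mestax mestam mestami",
}


def endings(forms):
    """Strip the longest common prefix; what remains is treated as the ending."""
    stem = os.path.commonprefix(forms)
    return [form[len(stem):] for form in forms]


SUFFIXES = {cls: dict(zip(CELLS, endings(forms.split())))
            for cls, forms in PARADIGMS.items()}


def distinguishes(cells):
    """True if the endings in these cells give every class a unique signature."""
    signatures = {tuple(SUFFIXES[cls][cell] for cell in cells) for cls in SUFFIXES}
    return len(signatures) == len(SUFFIXES)


def minimal_static_principal_parts():
    """Return all smallest cell sets that distinguish every class."""
    for size in range(1, len(CELLS) + 1):
        found = [combo for combo in combinations(CELLS, size) if distinguishes(combo)]
        if found:
            return found
    return []


if __name__ == "__main__":
    print(minimal_static_principal_parts())
```

Under this naive segmentation a single cell (the nominative or the accusative singular) is enough to identify the class, which reflects how coarse-grained the four-class description is; the additional suffix patterns and layers of exponence discussed in this section are not represented in the sketch.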

¹² Nouns like KREST’JANIN and TELËNOK also exhibit changes in their stems. See discussion below.

OUP CORRECTED AUTOPAGE PROOFS – FINAL, 23/8/2020, SPi


A second layer of exponence consists of stem distributions. Here we define a stem as the segmental material left when the suffix sets discussed above are removed from inflected forms (as in, e.g., Aronoff 1994: 31). In total, 80.8% of lexemes have a consistent stem throughout the paradigm (data from Zaliznjak 1977). In addition to this, we consider five types of stem change that are morphologically patterned:
1. Vowel-zero alternation in the nominative singular (and accusative singular when syncretic) but not elsewhere, for example, DEN’: den’ ‘day.NOM.SG’ ~ dnja ‘day.GEN.SG’;
2. Vowel-zero alternation in the genitive plural (and accusative plural when syncretic) but not elsewhere, for example, PIS’MO: pis’mo ‘letter.NOM.SG’ ~ pisem ‘letter.GEN.PL’;
3. A stem extension -in in the singular, for example, KREST’JANIN: krest’janin ‘peasant.NOM.SG’ ~ krest’jane ‘peasant.NOM.PL’;
4. A stem extension -en in all forms but the nominative and accusative singular, for example, VREMJA: vremja ‘time.NOM.SG’ ~ vremeni ‘time.GEN.SG’;
5. Extensions -ënok in singular forms and -jat in plural forms, for example, TELËNOK: telënok ‘calf.NOM.SG’ ~ teljata ‘calf.NOM.PL’.

A third layer of exponence is stress. In total, 91.6% of nouns have consistent stem stress throughout the paradigm, and an additional 6.1% have consistent stress on the inflectional suffix throughout the paradigm (data from Zaliznjak 1977, reported in Brown et al. 1996).¹³ The remaining nouns have some type of stress shift. While they represent only a small percentage of total types, they tend to be among the words with the highest token frequency. Stress alternations fall into six patterns, shown in Table 2.3. With one exception, the shift is between the first syllable of the stem and the inflectional ending:
1. Two patterns involving a shift according to number, for example, MESTO ‘place’ and ČISLO ‘number’;
2. Fixed stress on the inflectional ending, but with stem-initial stress in nominative plural (and accusative plural when syncretic), for example, GUBA ‘lip’;

¹³ Russian nouns usually have zero exponence in either the nominative singular or genitive plural, depending on class; see Table 2.2. When a form has no overt inflectional suffix in a given paradigm cell, lexemes that otherwise would have stress on the suffix have stress on the last syllable of the stem instead (see BORODA in Table 2.3).

Table 2.3. Illustration of stress classes of Russian nouns

          MESTO ‘place’    ČISLO ‘number’   GUBA ‘lip’       BORODA ‘beard’   DOLJA ‘portion’  DUŠA ‘soul’
NOM.SG    mésto            čisló            gubá             borodá           dólja            dušá
ACC.SG    mésto            čisló            gubú             bórodu           dólju            dúšu
GEN.SG    mésta            čislá            gubý             borodý           dóli             duší
DAT.SG    méstu            čislú            gubé             borodé           dóle             dušé
LOC.SG    méste            čislé            gubé             borodé           dóle             dušé
INS.SG    méstom           čislóm           gubój            borodój          dólej            dušój
NOM.PL    mestá            čísla            gúby             bórody           dóli             dúši
ACC.PL    mestá            čísla            gúby             bórody           dóli             dúši
GEN.PL    mést             čísel            gúb              boród            doléj            dúš
DAT.PL    mestám           číslam           gubám            borodám          doljám           dúšam
LOC.PL    mestáx           číslax           gubáx            borodáx          doljáx           dúšax
INS.PL    mestámi          číslami          gubámi           borodámi         doljámi          dúšami

    .  3. Fixed stress on the inﬂectional ending, but with stem-initial stress in both nominative plural (and accusative plural when syncretic) and accusative singular, for example,  ‘beard’; 4. Two patterns that combine a shift according to number with retraction in the nominative plural (and accusative plural when syncretic) and accusative singular, for example,  ‘portion’ and š ‘soul’.¹⁴

The fourth and final layer of exponence reflects patterns of defectiveness.¹⁵ In total, 97.7% of nouns have a form for each cell in the paradigm (data from Zaliznjak 1977). However, some lack forms for a subset of the paradigm. Most of these are singularia or pluralia tantum nouns, for example, ŠTANY ‘pants/trousers’ has no singular forms. Russian also has a well-known pattern of genitive plural defectiveness that affects a few dozen nouns, for example, MZDA ‘reward’ has no genitive plural, and a handful of (diminutive) nouns occur in only the nominative and accusative singular, for example, razok ‘time.DIM’.

Within each layer of exponence we do not include patterns that are represented in only one lexeme, nor do we include alternate patterns of stress. However, many lexemes in our data are nonetheless unique in their morphological exponence because they exhibit a unique combination of layers. For example, GOSPODIN ‘lord/sir’ has a stem extension in the singular like KREST’JANIN ‘peasant’ but it has the same set of suffixes and stress pattern as GOROD ‘city’. It is the only lexeme to exhibit this particular combination of patterns.

We also abstract away from properties that are not related to inflection class membership. Some lexemes exhibit the same exponence but are not identical in other morphosyntactically-relevant traits like gender and animacy. For example, P’JANICA ‘drunkard’ and DEVUŠKA ‘girl’ have the same pattern of exponence but

¹⁴ Due to the stress shift between singular and plural, the distribution of the retraction of stress onto the stem is ambiguous. Nouns like DOLJA are consistent with stress shift in both nominative plural and accusative singular, but since there is stem stress throughout the singular, the accusative singular is ambiguous. Conversely, nouns like DUŠA are also consistent with both stress shifts, but since there is stem stress throughout the plural, the nominative plural is ambiguous. Except for ambiguous instances of this sort, accusative singular stress retraction never occurs unless nominative plural stress retraction also does, so it seems safe to analyse DUŠA as having both stress retractions, with the nominative plural one being opaque. The proper analysis of DOLJA is less clear. Stress retraction in the accusative singular happens (unambiguously) only in nouns with the Class II suffix pattern. While DOLJA belongs to this class, other nouns with the same stress pattern do not (e.g., ZUB ‘tooth’ (Class I), PLOŠČAD’ ‘city square’ (Class III)). An alternative possibility is therefore to analyse these nouns as having only the nominative (and accusative) plural stress retraction, since it occurs in combination with a wider range of stem classes. We do not have a firm opinion about which analysis is ultimately the right one, or even whether speakers themselves make only one or the other analysis. But it also makes no difference in the present context. Since our analysis of implicative relations in the following section is based on surface patterns, all six patterns in Table 2.3 are treated as distinct in the analysis.

¹⁵ Walther (2017) distinguishes between ‘deficient’ and ‘defective’ lexemes where the former are lexemes for which a speaker could determine what forms would fill the cells but does not use those forms, and the latter are lexemes for which there is uncertainty about which form would fill missing cells. We include both types of lexemes in our category of defectiveness.


the former is masculine while the latter is feminine. They are treated in our analyses as belonging to the same class since gender is not expressed inflectionally in nouns.

We also abstract away from predictable phonologically-conditioned variation and predictable semantically-conditioned variation. For example, vowels reduce when not stressed, but given information about stress, vowel quality is fully predictable and purely phonological. Thus, we abstract away from vowel reduction in our class representations. Some genitive plural forms have no overt exponent, for example, kart ‘map.GEN.PL’, and others have an overt suffix, for example, zakon-ov ‘law-GEN.PL’ and učitel-ej ‘teacher-GEN.PL’. Whether a lexeme has a zero genitive plural form or an overt ending is morpholexically conditioned and thus depends on its inflection class, so we include this distribution in our description. However, which of the two overt exponents will occur is fully predictable from the phonology of the stem: -ej occurs with morphologically soft stems and -ov occurs elsewhere (Timberlake 2004: 84–5).¹⁶ Thus, we represent -ov and -ej as a single exponent. Similarly, we do not include differences in accusative marking that are predictable based on animacy (see Corbett & Fraser 1993: 129–30 for justification).¹⁷ Thus, our analysis reflects only information about exponence that is directly a property of inflection class membership. See Parker (2016) for a more complete description of the patterns and paradigmatic layers of Russian nouns.

2.5 Quantifying complexity

We adopt a definition of complexity rooted in the predictability of individual forms, rather than entire classes, because it reflects a type of unpredictability speakers must overcome to use an inflectional system (Ackerman et al. 2009). When speakers need to express a combination of lexeme and grammatical

¹⁶ In Russian it is necessary to distinguish phonological softness (secondary palatalization) and morphological softness. The phonological softness of consonants is relevant to phonological processes, for example, conditioning of unstressed vowel reduction. Morphological softness is relevant to allomorph selection in the genitive plural. In Russian, consonants that are pronounced with secondary palatalization are soft both phonologically and morphologically. Most of the consonants that are pronounced without secondary palatalization are hard both phonologically and morphologically. However, there are six consonants (traditionally called the ‘unpaired’ consonants) that fall outside of this system in various ways. Three of them differ in softness depending on the level of structure. The consonant /j/ is phonologically soft (it conditions unstressed vowel reduction in the same way as other soft consonants) but morphologically hard (stem-finally it conditions genitive plural -ov, like other hard consonants). Conversely, the consonants /ʃ/ and /ʒ/ are phonologically hard but morphologically soft (stem-finally they condition genitive plural -ej, like other soft consonants). However, the behaviour of these three phonemes is the same across all inflection classes, so we still consider this to be predictable phonological conditioning.

¹⁷ The analysis/number of classes in this chapter differs from that in Parker (2016) and Sims & Parker (2016). This primarily reflects the fact that the earlier work did not abstract away from animacy-conditioned exponence in the accusative.


properties, the predictability of the corresponding individual form is more relevant than the predictability of that lexeme’s class membership (its entire paradigm of forms), for the simple reason that speakers only ever need to produce one inflected form at a time. Moreover, as noted above, recent work suggests that individual form predictability is a relevant level of generalization for statements about the complexity of inflection class organization crosslinguistically (Ackerman & Malouf 2013). Our definition of inflection class complexity is repeated as (2).

(2) Complexity of an inflection class system: the average extent to which the system inhibits motivated inferences about the realized form of a lexeme, given one or more other realized forms of the same lexeme.

We operationalize this definition using information-theoretic tools. We use conditional entropy to estimate the complexity of the system and use the (nonconditioned) entropy of the system to estimate the potential complexity of the system.¹⁸ The potential complexity of an inflection class system is the amount of complexity it would exhibit if the exponents of the various paradigm cells of a lexeme were logically independent of each other, since this would maximally inhibit motivated inferences. A key question is the extent to which the actual complexity of an inflection class system is lower than its potential complexity, since the difference between these reflects the ‘work’ done by inflectional structure to minimize the complexity of the system.

Entropy represents the average surprisal associated with the outcome of a random variable A. In the context of inflectional systems, A is a paradigm cell (or more accurately, a set of morphosyntactic properties) and the possible outcomes are the different exponents that realize that cell in each class. Thus, entropy represents the average surprisal associated with the exponents of a given morphosyntactic property set.

(3) Entropy: $H(A) = -\sum_{a \in A} p(a) \log_2 p(a)$

¹⁸ We recognize that these measures do not capture all aspects of a system’s complexity, especially because they are limited to comparisons between individual cells (as opposed to larger subsets of the paradigm). See, for example, Stump & Finkel (2013) and Bonami & Beniamine (2015) for investigations that consider complexity based on predictiveness/predictability of multiple paradigm cells. Expanding the current work to take account of paradigm structuring would be valuable. However, our focus here is on comparing across different descriptions of the Russian nominal system, and the importance of the description for estimates of inﬂection class complexity. A simple measure gives us the best perspective on this issue.


Conditional entropy H(A|B) represents the average surprisal associated with the outcome of a random variable A, given knowledge of the outcome of another random variable B. In the present context, A and B are paradigm cells in which A ≠ B. Implicitly conditioned on the lexeme, the outcomes of A and B are two inflected forms of the same lexeme. Conditional entropy thus represents the average surprisal associated with the exponent that realizes a given morphosyntactic property set, knowing the exponence of another inflected form of the same lexeme.

(4) Conditional entropy: $H(A \mid B) = \sum_{a \in A,\, b \in B} p(b, a) \log_2 \frac{p(b)}{p(b, a)}$

Averaging across the entropy values H(A) for all licensed morphosyntactic property sets produces an estimate of the potential complexity of the system as a whole. This mean entropy value represents the average uncertainty associated with predicting the exponent of a paradigm cell knowing only the possible exponents that realize that cell in different classes. Exponents of different morphosyntactic property sets are thus treated as independent of each other. By comparison, averaging across the conditional entropy values H(A|B) of all licensed combinations of morphosyntactic property sets A and B produces an estimate of the complexity of the inflectional system as a whole, taking into account implicative relations holding between pairs of cells. This represents the uncertainty associated with a given cell of a lexeme knowing the exponence of one other cell of the same lexeme. The conditional entropy H(A|B) will never be higher than the entropy H(A) and will be lower whenever the exponent that realizes B is informative about the exponent that realizes A. Knowing one form of a lexeme cannot increase the surprisal associated with another form, but it can lower it. The extent to which knowing one cell reduces the uncertainty associated with another cell (the difference between entropy and conditional entropy) represents how much ‘work’ is being done by the implicative structure of the system.
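As a hedged illustration of definitions (3) and (4), the following Python sketch computes unweighted mean entropy and mean conditional entropy for a toy system built from three cells (nominative singular, genitive singular, nominative plural) of the four classes in Table 2.2. It is not the implementation used in this chapter: the function names, the `TOY_CLASSES` layout, and the decision to give every class equal probability are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import permutations
from math import log2

# Toy data (illustrative): suffixes of three cells for the four classes
# of Table 2.2. "zero" stands for the absence of an overt suffix.
TOY_CLASSES = {
    "I":   {"NOM.SG": "zero", "GEN.SG": "-a", "NOM.PL": "-y"},
    "II":  {"NOM.SG": "-a",   "GEN.SG": "-y", "NOM.PL": "-y"},
    "III": {"NOM.SG": "zero", "GEN.SG": "-i", "NOM.PL": "-i"},
    "IV":  {"NOM.SG": "-o",   "GEN.SG": "-a", "NOM.PL": "-a"},
}


def entropy(classes, cell):
    """H(A) = -sum_a p(a) log2 p(a), with each class equally probable."""
    counts = Counter(row[cell] for row in classes.values())
    n = sum(counts.values())
    return -sum((k / n) * log2(k / n) for k in counts.values())


def conditional_entropy(classes, a, b):
    """H(A|B) = sum_{a,b} p(b, a) log2(p(b) / p(b, a))."""
    n = len(classes)
    joint = Counter((row[b], row[a]) for row in classes.values())
    marginal = Counter(row[b] for row in classes.values())
    return sum(
        (k / n) * log2((marginal[b_val] / n) / (k / n))
        for (b_val, _a_val), k in joint.items()
    )


def mean_entropy(classes):
    """Average H(A) over all cells: the 'potential complexity' estimate."""
    cells = list(next(iter(classes.values())))
    return sum(entropy(classes, c) for c in cells) / len(cells)


def mean_conditional_entropy(classes):
    """Average H(A|B) over all ordered pairs of distinct cells."""
    cells = list(next(iter(classes.values())))
    pairs = list(permutations(cells, 2))
    return sum(conditional_entropy(classes, a, b) for a, b in pairs) / len(pairs)


print(mean_entropy(TOY_CLASSES), mean_conditional_entropy(TOY_CLASSES))
```

In this toy system each cell has an entropy of 1.5 bits, whereas knowing one other cell leaves 0.5 bits of uncertainty on average; the difference of 1 bit corresponds to the ‘work’ attributed above to implicative structure.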

2.6 Granularity and system complexity

We now turn to the primary questions of this chapter, starting with: To what extent does including more paradigmatic layers into the system affect its complexity? Our approach is to develop multiple parallel descriptions of Russian nominal inflectional structure based on the paradigmatic layers. Each description


is based on the same set of lexemes but the lexemes are distributed across classes differently depending on which layers are included in the analysis. This allows us to investigate how paradigmatic layers interact, and speciﬁcally, how those interactions inﬂuence the complexity of the system as a whole.

2.6.1 Granularity of inflection class information

We determined the number of distinct patterns that result from combinations of paradigmatic layers. We took each morphological noun in an exhaustive grammatical dictionary of Russian, Zaliznjak (1977), and created multiple parallel representations of the system by including increasingly more paradigmatic layers. Each representation of the system includes the same 43,486 lexemes distributed among the number of distinct patterns/classes that arise based on the layers considered. In general, as more layers are combined, more classes are needed to describe Russian nominal inflection. In Table 2.4 we provide the number of classes that result when suffix sets are considered independently and in combination with one, two or three additional paradigmatic layers. Note that even the least granular representation here exhibits more classes than the traditional four classes argued for in Corbett (1982) and used in other complexity studies where Russian nouns were considered (e.g., Ackerman & Malouf 2013). We will refer to the different parallel descriptions as ‘granularities’.

In Figure 2.1 we show the distribution of word types per inflection class in each of the granularities presented in Table 2.4. The distribution of lexemes across classes is roughly exponential in every granularity, resulting in a more or less linear trend when displayed in log space (Figure 2.1). In other words, there are many lexemes in a small number of classes and few lexemes in many classes. This is not surprising; distributions of this sort are ubiquitous among frequency counts in

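Table 2.4. Number of nominal inflection classes of Russian nouns as a function of which paradigmatic layers are included

Number of classes: 14, 21, 22, 33, 42, 57, 64, 82
Layers: Suffixes (included in all eight descriptions); Stem changes, Stress, and Defectiveness (each included in a subset of the descriptions). The 14-class description includes suffixes only; the 82-class description includes all four layers.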


[Figure 2.1: eight panels, one per granularity (14, 21, 22, 33, 42, 57, 64, and 82 inflection classes); x-axis: inflection classes; y-axis: log type frequency (0 to 8)]

Figure 2.1. Word types per inﬂection class across different granularities

natural languages, including word frequencies (see Baayen 2001 for detailed discussion).
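The construction of granularities described in section 2.6.1 can be sketched as grouping lexemes by the combination of patterns they show on whichever layers are included. The Python snippet below is a minimal illustration only: the six records, their layer labels, and the helper name `classes_at` are invented for the example and do not reproduce the coding of Zaliznjak (1977) or the class counts in Table 2.4.

```python
from collections import Counter

# Hypothetical lexeme records: one invented label per paradigmatic layer.
LEXEMES = [
    {"id": "noun1", "suffixes": "I",  "stem": "none",       "stress": "fixed", "defective": "none"},
    {"id": "noun2", "suffixes": "I",  "stem": "none",       "stress": "fixed", "defective": "none"},
    {"id": "noun3", "suffixes": "I",  "stem": "vowel-zero", "stress": "fixed", "defective": "none"},
    {"id": "noun4", "suffixes": "II", "stem": "none",       "stress": "fixed", "defective": "none"},
    {"id": "noun5", "suffixes": "II", "stem": "none",       "stress": "shift", "defective": "none"},
    {"id": "noun6", "suffixes": "II", "stem": "none",       "stress": "shift", "defective": "no-gen-pl"},
]


def classes_at(lexemes, layers):
    """Group lexemes by their pattern on the chosen layers.

    Returns a Counter mapping each distinct combination of layer labels
    (one inflection class at this granularity) to its type frequency.
    """
    return Counter(tuple(lex[layer] for layer in layers) for lex in lexemes)


granularities = [
    ["suffixes"],
    ["suffixes", "stem"],
    ["suffixes", "stem", "stress"],
    ["suffixes", "stem", "stress", "defective"],
]
for layers in granularities:
    sizes = classes_at(LEXEMES, layers)
    # Number of classes, then class sizes from largest to smallest.
    print(layers, len(sizes), sorted(sizes.values(), reverse=True))
```

Even in this invented data set, adding layers splits existing classes into smaller ones, so the number of classes grows while most lexemes stay concentrated in a few large classes.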

2.6.2 Paradigmatic layers and inflection class complexity

To assess how the complexity of the system changes with granularity, we calculated the mean entropy (= estimated potential complexity) and mean conditional entropy (= estimated actual complexity) of each representation of the system presented in Table 2.4. In light of the type frequency distribution of classes shown in Figure 2.1, we calculated mean conditional entropy both with and without type frequency weighting. In the weighted condition, the probabilities of each exponent were weighted by the type frequency of the exponent. This measure represents the complexity of the system when both implicative structure and the uneven distribution of lexemes across classes are taken into account.

Figure 2.2 shows that as granularity increases, and more paradigmatic layers are included in the system, the entropy and unweighted conditional entropy of the system tend to increase. This is unsurprising from the perspective of information theory: as more elements are present in the system, there will be greater surprisal associated with those elements on average. More interestingly, the weighted conditional entropy values remain low regardless of inflection class granularity; the weighted conditional entropy only increases 0.12 bits from a representation of the system that includes only suffixes (fourteen classes) to one with all


[Figure 2.2: line plot; x-axis: number of classes (14, 21, 22, 33, 42, 57, 64, 82); y-axis: complexity measures in bits (0.0 to 2.5); series: entropy (unweighted), conditional entropy (unweighted), weighted conditional entropy]

Figure 2.2. Complexity measures across granularities of Russian nouns

paradigmatic layers together (eighty-two classes). This means that the uncertainty associated with a large number of classes is mitigated by a combination of the implicative structure of the system and the unequal distribution of lexemes across classes. Implicative structure and the distribution of lexemes across classes conspire to maintain low systemic complexity.

However, even a random distribution of exponents will tend to produce a system with lower mean conditional entropy than mean entropy, because some of the exponents will be accidentally informative about other exponents. Thus, we should ask whether the implicative structure of the system minimizes the complexity of the inflection class system in each granularity more than is expected by chance. Employing Monte Carlo simulation, we created a hundred simulated data sets for each granularity. In each granularity the simulated data sets contained the same exponents and the same number of classes as in the real granularity, but the exponents were randomly distributed across the classes.¹⁹ The mean conditional entropy of the simulated data sets represents the amount of complexity we expect in systems of this size based on a random distribution of exponents. If the actual complexity falls outside of the simulated values, we can conclude that the ‘work’ done by the implicative structure in that granularity is significant at a level of p […]

[…] .05). Figure 2.4 shows the effect size of each independent variable when others are kept constant. Irregular patterns of defectiveness and irregular patterns of stress thus increase the complexity of the system, but irregular patterns of suffixes and stems do not. Irregularity does not inherently make the system more complex; only some types of irregularity do.
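Returning to the Monte Carlo comparison of section 2.6.2, the logic can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure actually used: it assumes that ‘randomly distributed across the classes’ means permuting the exponents of each paradigm cell independently across classes, it uses unweighted probabilities, and the names `TOY_TABLE`, `shuffled`, and `monte_carlo` are hypothetical.

```python
import random
from collections import Counter
from itertools import permutations
from math import log2

# Toy system (illustrative): rows are classes, columns are paradigm cells.
# Cells 0 and 1 predict each other perfectly; cell 2 is independent of both.
TOY_TABLE = [
    ("a", "x", "p"),
    ("a", "x", "q"),
    ("b", "y", "p"),
    ("b", "y", "q"),
    ("c", "z", "p"),
    ("c", "z", "q"),
]


def mean_conditional_entropy(table):
    """Average H(A|B) over all ordered pairs of distinct cells."""
    n = len(table)
    pairs = list(permutations(range(len(table[0])), 2))
    total = 0.0
    for a, b in pairs:
        joint = Counter((row[b], row[a]) for row in table)
        marginal = Counter(row[b] for row in table)
        # p(b, a) * log2(p(b) / p(b, a)), with counts k and marginal[b]
        total += sum(
            (k / n) * log2(marginal[b_val] / k)
            for (b_val, _a_val), k in joint.items()
        )
    return total / len(pairs)


def shuffled(table, rng):
    """Assumed randomization: permute each cell's exponents across classes."""
    columns = [list(col) for col in zip(*table)]
    for col in columns:
        rng.shuffle(col)
    return [tuple(row) for row in zip(*columns)]


def monte_carlo(table, n_sims=100, seed=0):
    """Compare the observed value with n_sims randomized systems."""
    rng = random.Random(seed)
    observed = mean_conditional_entropy(table)
    simulated = [mean_conditional_entropy(shuffled(table, rng)) for _ in range(n_sims)]
    # Share of simulated systems that are at least as predictable as the real one.
    proportion = sum(s <= observed for s in simulated) / n_sims
    return observed, simulated, proportion


observed, simulated, proportion = monte_carlo(TOY_TABLE)
print(observed, min(simulated), max(simulated), proportion)
```

Under this assumed randomization the per-cell entropy is unchanged, because each cell keeps the same exponents, so any difference between the observed and simulated values reflects only how the exponents are aligned across cells.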

[Figure 2.4: four panels (Suffixes, Stems, Stress, Defectiveness); x-axis: regular (Reg) vs. irregular (Irreg); y-axis: entropy difference]

Figure 2.4. Effect of the irregularity of each layer on system complexity (entropy difference); the vertical bars show 95% conﬁdence intervals


2.8 Discussion and conclusions

This study highlights the need for caution in interpreting results from data whose representations include only affixal and regular inflectional patterns, since they may misrepresent the complexity of inflectional systems and/or obscure important aspects of inflectional structure. For example, the four most granular representations of Russian nouns in our study (forty-two, fifty-seven, sixty-four, and eighty-two classes) have an unweighted average conditional entropy that exceeds the largest unweighted average conditional entropy value among the ten languages investigated by Ackerman & Malouf (2013),²³ even though the conditional entropy of a four-class system of Russian falls in the middle of the range for languages they investigate. The mean conditional entropy of our most granular representation (eighty-two classes) is twice as high as the value for the four-class Russian system in Ackerman & Malouf’s paper. This raises questions about the extent to which typologically low systemic complexity is a reflection of assumptions adopted when creating representations of those systems.

At the same time, it is equally important to point out that for every representation of the Russian nominal inflectional system that we investigated (that is, every granularity) the estimated complexity of the Russian noun class system was substantially lower than the potential complexity of the system, as shown in Figure 2.2 in section 2.6.2. The estimated complexity of the system was also significantly lower than would be expected by chance (Figure 2.3 in section 2.6.2). This indicates that a significant amount of ‘work’ is done by implicative structure, regardless of the particular representation that is assumed. The latter result contradicts Ackerman & Malouf’s (2013: 451) speculation that Russian has no need to rely on implicative organization.
However, arguably the more important conclusion is that in the end, our results are consistent with their Low Conditional Entropy Conjecture, if it is interpreted as a claim that inﬂection class systems self-organize to minimize the amount of complexity embodied in the system (rather than as a claim about a particular maximum possible conditional entropy value). No matter what particular representation we assume, Russian nouns show a pattern that is consistent with low systemic complexity, suggesting that a typological tendency towards low systemic complexity may extend beyond afﬁxal and highly regular patterns. While the Low Conditional Entropy Conjecture focuses on a global measure of the complexity of inﬂection class systems, an equally interesting question has to do with how the component parts of the system shape this global complexity. From this perspective, an important result in this chapter is that the estimated actual complexity of the system changes very little, despite the fact that the

²³ Amele, with a conditional entropy of 1.105 bits; Ackerman & Malouf (2013: 443, table 3).


potential complexity of the system tends to increase as information about inflectional exponence (paradigmatic layers) is added (Figure 2.2 in section 2.6.2). This means that the importance of implicative structure to the organization of the Russian nominal system emerges most clearly when irregular and non-affixal patterns are considered. The data presented here thus suggest that inflection class systems self-organize to minimize the potentially disruptive effects of irregularity and to maintain low complexity overall. This is an important aspect of the organization of the nominal system that would be hidden in a more coarse-grained representation.

In a similar vein, we also showed that irregularity in some paradigmatic layers (stress, defectiveness) increases the complexity of the system, but in others it does not (Figure 2.4 in section 2.7.2). This suggests that the system as a whole is not simply a function of the complexity of its parts. It is instead a product of the way the parts are distributed, that is, how the component elements are related. This should hardly be a surprise, but the data in this chapter highlight that these sorts of local relations, and how they lead to complexity in an inflection class system (or don’t!), are at least as important to focus on as the complexity of the system overall. To the extent that languages universally or predominantly exhibit low systemic complexity, the question becomes why. At a broad level, the answer likely has to do with learnability (Ackerman et al. 2009), but to get beyond general formulations of this idea, it will be necessary to dive into the learnability of specific inflection class configurations, and to carefully examine local relations among the component parts of individual inflection class systems.²⁴ In this chapter, we have contributed towards this goal.

Finally, we consider our results in the more general context of linguistic complexity.
Studies on the overall complexity of languages suggest that there may not be any typological limits on linguistic complexity (see Miestamo 2008 for discussion of global vs. local complexity). Trudgill (2011) argues that small communities with dense social networks and little linguistic contact with other communities promote the development and preservation of complexity. Similarly, McWhorter (2007) suggests that diminished linguistic complexity in a language is often the result of an influx of large groups of adults that learn the language. These studies undermine the intuitive idea that complexity in one area of a language leads to diminished complexity elsewhere in the language (see Hockett 1958: 180–1 for an early vocalization of this idea) and challenge any type of typological limit on linguistic complexity. The search for typological similarities in linguistic complexity is elusive enough to have been called a ‘wild goose chase’ (Deutscher 2009). It is thus somewhat surprising that inflection class systems, as particular local domains of complexity, seem to exhibit systemically low complexity. We

²⁴ See Parker et al. (to appear) for computational modelling of inflection class learning that moves in this direction.


think that investigation of interactions between elements in the system is a promising avenue for understanding and testing this issue. Whether similar patterns to what we ﬁnd in Russian exist in other languages is an empirical question that we feel merits further investigation.

Acknowledgements

We thank Peter Arkadiev, Gregory Stump, and an anonymous reviewer for their helpful comments. All errors remain entirely our own. This work was supported in part by The Ohio State University, through a Presidential Fellowship awarded to Jeff Parker and a sabbatical granted to Andrea Sims.


3 Demorphologization and deepening complexity in Murrinhpatha

John Mansfield and Rachel Nordlinger

3.1 Introduction

Linguistic complexity is often associated with morphology, but it may also be associated with the unravelling of morphology. Hopper (1990) observes that elements of form that were once morphological exponents may over time lose their morphological status and become unanalysable subparts of lexical stems. For example, the final rime of seldom was once an Old English dative suffix *-um (Hopper 1990: 154). Hopper labels the outcome of this process ‘demorphologisation’, and we here adapt his usage to conceptualize demorphologization as a gradient phenomenon, in which morphological structure becomes gradually blurred over time by the accretion of lexically specific modifications.¹ Our focus is not on the end-point of this process but the mid-point, where there are morphological ‘semi-regularities’ that help speakers and learners predict unknown word forms, but which also leave a residue of unpredictability.

This type of analogical unpredictability has become a major focus in research on morphological complexity (e.g., Ackerman et al. 2009; Ackerman & Malouf 2013; Parker & Sims, Chapter 2, this volume). Other studies have focused on the problem of predicting inflectional exponence for unencountered forms in an open lexical class, though as we argue below, there are some unexamined conceptual issues with the open-/closed-class distinction. In the current study, we focus on predictability in a closed class of finite verb stems, albeit one in which there are large inflectional paradigms, and demorphologization has advanced to the point where analogical predictability from one stem to another is highly attenuated.

Murrinhpatha finite verb stems, known in the literature as ‘classifier stems’, exhibit semi-regular patterns associated with demorphologization (Walsh 1976;

¹ ‘Demorphologization’ is used rather differently by Joseph & Janda (1988), who use it in reference to regularization of phonological processes such that they become independent of an erstwhile morphological context.

John Mansﬁeld and Rachel Nordlinger, Demorphologization and deepening complexity in Murrinhpatha In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © John Mansﬁeld and Rachel Nordlinger. DOI: 10.1093/oso/9780198861287.003.0003

OUP CORRECTED AUTOPAGE PROOFS – FINAL, 23/8/2020, SPi

  

53

Street 1987; Nordlinger 2015; Forshaw 2016; Mansﬁeld 2016). We present data on analogical changes observed by comparing recent ﬁeldwork documentation with forms documented some forty years earlier, showing that the process of demorphologization is still underway. Analogical changes show that classiﬁer stem forms are not learnt and memorized as isolated units, but rather that speakers draw on paradigmatic semi-regularities to predict unknown forms. Though the system does not exhibit regular, productive inﬂection, neither can it be characterized as a set of ‘frozen forms’. Rather it is a relational system, and one that is in ﬂux. We treat analogical predictability as a form of linguistic complexity, and show that through ongoing demorphologization, the complexity of Murrinhpatha classiﬁer stems is increasing. We quantify this unpredictability by adapting probabilistic tools developed by Ackerman et al. (2009) and Ackerman & Malouf (2015). However, while the latter hypothesize limits of complexity for systems of productive inﬂection, the Murrinhpatha classiﬁer stems are a closed-class system of 1,638 inﬂectional forms, where semi-regularities aid acquisition and processing, but whole-form memorization may mitigate the requirement for analogical predictability. Murrinhpatha is a non-Pama-Nyungan polysynthetic Australian language of the Daly River region of the Northern Territory. It has maintained a vibrant speech community some eighty years after its speakers shifted to settled life under the inﬂuence of Catholic missionaries (Pye 1972). Murrinhpatha has some of the characteristics, both linguistic and social, that might associate it with the ‘isolated, complex’ language type proposed in sociolinguistic typology (Kusters 2003; Lupyan & Dale 2010; Trudgill 2011: 136; Bentz et al. 2015). 
However, it is doubtful that notions of sociolinguistic ‘isolation’ or ‘low-contact’ apply in this instance, since evidence points to a tradition of regional multilingualism (Falkenberg 1962: 13; Dixon 2002: 674). A crucial distinction for sociolinguistic typology is that between child-acquired versus adult L2-acquired multilingualism: child multilingualism has been argued to maintain or increase complexity, and adult acquisition to reduce complexity (Thomason & Kaufman 1988: 65ff; McWhorter 2007 and Chapter 10, this volume; Trudgill 2011: 34). In the case of Murrinhpatha, we know too little of traditional multilingualism to know which is more applicable. However, in the post-settlement era (1930s–present) a large number of people from Marri Ngarr, Marri Tjevin, and other language groups have shifted to Murrinhpatha, in some cases learning both languages as children but switching to Murrinhpatha during adolescent years spent in a multi-ethnic school dormitory established by the missionaries (Mansfield 2014: 98). This influx of new speakers has not brought about any drastic simplifications or other language contact effects in the contemporary grammar of Murrinhpatha, although it has led to the demise of the other languages of the region.²

² Note however that the influx of speakers from other language groups may have had some influence on the distribution of sociolinguistic variables (Mansfield 2015a, 2015b: 183).

In this chapter, we demonstrate more specifically that
inﬂectional changes observed in the post-settlement period do not constitute simpliﬁcations, given a deﬁnition of complexity as allomorphic unpredictability, and a model of allomorph prediction based on analogical comparison with other lexemes. Given the ambiguity of Murrinhpatha with respect to sociolinguistic typological hypotheses, we do not here pursue the question of whether inﬂectional complexity depends on social characteristics of the speech community. The structure of the chapter is as follows. In section 3.2 we outline the phenomenon of lexically speciﬁed inﬂectional allomorphy, which is the speciﬁc type of morphological complexity discussed in this chapter. In section 3.3 we discuss hypothesized limits to this type of complexity when applied to large lexical classes. In section 3.4 we provide an overview of the Murrinhpatha verb and introduce the relevant aspects of Murrinhpatha verb inﬂection, which involves exponence by multiple phonological increments which we label ‘intersecting formatives’ (cf. ‘paradigmatic layers’ in Parker & Sims, Chapter 2, this volume). Intersecting formatives are independent of one another in their paradigmatic patterns, and most of these patterns are not consistently applied to all verb stems, making exponence highly unpredictable. This also means that the formatives are generally not in biunique relations with inﬂectional categories. Section 3.4 describes the paradigms as documented in the 1970s (Walsh 1976; Street 1987), as well as changes to the paradigms observed in our work with a new generation of speakers since 2010. In section 3.5 we compare the observed changes with the types of changes predicted by a model of complexity limitation in large lexical classes (Ackerman & Malouf 2015), showing that none of the observed changes match the model. 
In section 3.6 we focus on two of the observed changes in particular, arguing that they diverge from the complexity-limitation mechanism because of incremental demorphologization, a process that is both analogical and destructive of existing analogies. In section 3.7 we summarize our findings.

3.2 Complexity in lexically specified allomorphy

There are several distinct dimensions of morphology that can be treated as forms of linguistic complexity (Kusters 2003, 2008; Anderson 2015a), but in this chapter we focus solely on (lexically specified) inflectional allomorphy. For example, in the Australian language Warlpiri, verbs are suffixed with one of four lexically specified past tense allomorphs, -ca, -ŋu, -ɳu, -nu (Hale 1969; Nash 1980: 40). Where lexemes share the same allomorph selection in all their forms, the shared paradigms are usually referred to as ‘inflection classes’. Inflectional allomorphy of this type can be seen as prototypical morphological complexity, since it directly reduces form:meaning transparency (Aronoff 1998).

The type of complexity instantiated by inﬂectional allomorphy can be conceptualized in terms of degrees of predictability in allomorph selection. For example, in a language where almost all verbs take -ak 1., and just a handful instead take -iq 1., the exponence is mostly predictable; only a small degree of complexity is involved. But in a language with several lexically-conditioned allomorphs, all more or less likely, there is low predictability, or high complexity. The larger the inﬂectional paradigm involved, the more that this problem of prediction becomes a real one for speakers of the language (or indeed linguists attempting to accurately document the lexicon and morphology), because where large paradigms are involved there is a more frequent and persistent requirement to produce previously unencountered forms (Bonami & Beniamine 2016; Blevins et al. 2017). Degrees of inﬂectional predictability can be formalized and quantiﬁed using entropy, the weighted average of the log probabilities of all possible outcomes (Shannon 1948). Entropy can be taken as a measure of the unpredictability of a set of possible outcomes. The application of entropy as a measure of paradigmatic implicational structure was proposed by Ackerman et al. (2009). Work on predictability of allomorphy has proceeded from the insight that the inﬂection of a lexeme is not predicted in an informational vacuum, but rather is a problem of predicting unknown inﬂectional forms, given one or more forms of the lexeme that have already been encountered. This has been labelled the ‘Paradigm Cell Filling Problem’ (Ackerman et al. 2009; Stump & Finkel 2013; Bonami & Beniamine 2016; Sims & Parker 2016). The paradigmatic structure of inﬂection is thus crucial: typically, we expect that paradigmatic patterns are shared by lexemes in a language, with those lexemes that share a paradigm belonging to a common inﬂectional class. 
The known inflectional forms of a lexeme narrow the possibilities of which class the lexeme might belong to, thus reducing unpredictability of other forms. For example, the past tense suffix allomorphs mentioned above for Warlpiri can usually be predicted based on other inflectional forms. All verbs with imperative in -nta take the past allomorph -nu, licensing an inference from known form jinta ‘scold.IMP’ to the predicted form jinu ‘scold.PST’ (Nash 1980: 40). However, there are other instances where allomorphy for a particular tense/aspect/mood (TAM) category does not uniquely identify an inflection class, leaving some unpredictability in the allomorphy of other forms.

Table 3.1. Warlpiri verb inflection classes (Hale 1969; Nash 1980: 40)

Class   Nonpast      Past   Imperative   Immediate future   Presentational
I       -mi          -ca    -ja ~ -ka    -ju                -ɲa
II      -ɳi ~ -ni    -ɳu    -ka          -ku                -ɳiɲa
III     -ɲi          -ŋu    -ŋka         -ŋku               -ŋaɲa
IV      -ɳi ~ -ni    -ɳu    -ɲa          -lku               -ɳiɲa
V       -ni          -nu    -nta         -nku               -naɲa

Table 3.1 shows the TAM
allomorphs for all Warlpiri inﬂection classes. The syncretism between some classes for some tense categories makes these inﬂectional forms less than fully predictive of other inﬂectional forms of the same lexeme. For example, knowing that the presentational form is in -ɳiɲa narrows the range of possible imperative allomorphs, but does not help us to decide between the two possibilities -ka, -ɲa. Residual uncertainty in predicting an inﬂectional form, given knowledge of other forms of the same lexeme, has been labelled integrative complexity (Ackerman & Malouf 2013). Integrative complexity meets several of the desiderata enumerated in Arkadiev & Gardani’s Introduction to this volume (Chapter 1). First, it is quantiﬁable and can be used to compare typologically diverse languages. Second, its conceptualization in terms of speaker inferences from known to unknown forms gives it a clear basis in psycholinguistic processing. Finally, whereas enumerative complexities lean heavily on the distinction between morphology and syntax, integrative complexity is relatively independent of this issue. Lexical selection of allomorphs generally occurs within units that are identiﬁed as words, but if a similar phenomenon occurred in phrase-like structures (e.g., periphrastic inﬂections with allomorphy on the auxiliary), this would have no real effect on the modelling of integrative complexity in the paradigm.
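The entropy measures described in this section can be illustrated with a minimal sketch. It assumes that each of the five Warlpiri classes in Table 3.1 counts equally, because the chapter gives no type frequencies; the cell labels `PST`, `IMP`, and `PRSL`, the function names, and the 95/5 counts for the skewed -ak/-iq system are all illustrative choices, not values from the source.

```python
import math
from collections import Counter, defaultdict

# Allomorphs from Table 3.1, one entry per inflection class.
# PST, IMP and PRSL are shorthand labels chosen for this sketch.
WARLPIRI = {
    "I":   {"PST": "-ca", "IMP": "-ja ~ -ka", "PRSL": "-ɲa"},
    "II":  {"PST": "-ɳu", "IMP": "-ka",       "PRSL": "-ɳiɲa"},
    "III": {"PST": "-ŋu", "IMP": "-ŋka",      "PRSL": "-ŋaɲa"},
    "IV":  {"PST": "-ɳu", "IMP": "-ɲa",       "PRSL": "-ɳiɲa"},
    "V":   {"PST": "-nu", "IMP": "-nta",      "PRSL": "-naɲa"},
}


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return sum(c / total * math.log2(total / c) for c in counts)


def conditional_entropy(classes, source, target):
    """H(target | source), weighting every class equally (an assumption)."""
    by_source = defaultdict(Counter)
    for cells in classes.values():
        by_source[cells[source]][cells[target]] += 1
    n = len(classes)
    return sum(
        sum(targets.values()) / n * entropy(targets.values())
        for targets in by_source.values()
    )


# Hypothetical counts: a skewed two-allomorph system versus an even split.
print(entropy([95, 5]) < entropy([50, 50]))            # True
# Presentational -ɳiɲa leaves two imperative candidates (-ka, -ɲa).
print(conditional_entropy(WARLPIRI, "PRSL", "IMP"))
# Every imperative allomorph identifies its class, hence the past form.
print(conditional_entropy(WARLPIRI, "IMP", "PST"))
```

Under the equal-weight assumption, the only residual uncertainty in predicting the imperative from the presentational comes from classes II and IV, which share -ɳiɲa: two of five classes face a one-bit choice, giving 0.4 bits. With real lexical frequencies the figure would differ.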

3.3 Complexity, predictability, and language change

In this chapter, we focus on the effects that language change may have on inflectional predictability. It has been shown that inflection class structure may persist in a language over long time periods (e.g., Maiden 2005; Gardani 2013), but even if it may in some instances be relatively stable, it is of course not completely static. The inflectional allomorphs selected by lexemes exhibit synchronic variation, with fluctuating variation rates over time leading to language change (Weinreich et al. 1968). The long-term patterns of changing allomorph selection have been studied in historically documented languages such as Latin (Gardani 2013: 201–28) and English (Jespersen 1949; Bybee & Moder 1983). An interesting question is whether the direction of such change reflects limits on overall complexity and, conversely, what mechanisms lead to an increase in complexity. There must be some upper limit of unpredictability beyond which inflectional systems are no longer learnable. If allomorphic distributions were too unpredictable, their prospects of being stably transmitted from one generation to the next would become rather slim. The obvious way to reduce unpredictability is to replace improbable allomorphs with more probable ones. We have little idea of how much unpredictability is too much, though crosslinguistic studies by Ackerman & Malouf (2013, 2015) and Stump & Finkel (2013) have documented the range of unpredictability found in genetically and typologically diverse samples.

Ackerman & Malouf (2013) compare synchronic inﬂectional systems in ten languages, showing that in all cases the average conditional entropy of one inﬂectional form of a lexeme, given knowledge of one other form, is between zero and 1.1 bits, the latter being approximately equivalent to a choice between two equally likely outcomes. Moreover, the languages in the sample that have the most allomorphs, and therefore risk the greatest unpredictability, are also the languages that make the most use of paradigmatic structure to mitigate unpredictability (Ackerman & Malouf 2013: 443). In other words, paradigmatic structure of the type illustrated for Warlpiri above exhibits a strong crosslinguistic tendency to maintain a reasonable level of predictability for unknown inﬂectional forms. While Ackerman & Malouf (2013) do not propose a speciﬁc numeric limit for how much integrative complexity learners can deal with, their study provides a principled method of quantiﬁcation, and an initial sample of measurements, against which apparently complex languages such as Murrinhpatha can be compared. A simulation of how language change might reduce unpredictability (Ackerman & Malouf 2015) provides a useful model for considering the mechanism of analogical extension. Ackerman & Malouf (2015) model diachronic change in an inﬂectional system based on the principle that, given a known inﬂectional form of a lexeme, and the requirement to predict an unknown form of the same lexeme, a speaker identiﬁes lexemes that share allomorphs with the known form. Change proceeds by revising paradigm-internal relations to match the same morphosyntactic relations in other paradigms. 
We will henceforth use the terms ‘source form’ for the known form, ‘target form’ for the unknown form, ‘comparable lexemes’ for other lexemes that share allomorphy with the source form, and ‘comparable source, comparable target’ for the comparable forms that correspond in morphosyntactic category with the source and target forms respectively. Given the array of comparable lexemes, the speaker establishes which allomorph occurs most frequently among the comparable targets, and predicts this to be the allomorph for the target form. Predictions of this type are taken as a model for language change, because in the next iteration of the simulation, it is the predicted form that is now taken as the allomorph for the target cell, rather than the previous incumbent form. This is a hyperactive model of change, where overgeneralization errors go uncorrected. The model is not specific to either child acquisition or adult usage, which in any case may not be a sharp distinction in large inflectional systems, where some inflected forms must be guessed by speakers even after many millions of words of input (Blevins et al. 2017).

Figure 3.1. Ackerman & Malouf (2015) mechanism for predicting unknown inflectional forms
  Source form Ai = x₁; target form Aii = unknown
  compare: Bi = x₁, Ci = x₁, Di = x₁, Ei = x₂, Fi = x₂
  relate:  Bii = y₁, Cii = y₂, Dii = y₂
  induce:  Aii = y₂

In Figure 3.1 we show the process of analogical induction, and replacement of the target form with a predicted form. A, B, etc., represent lexemes, with inflectional categories Ai, Aii, Bi, Bii, etc., while x, y represent exponence candidates. Ai is the source form and Aii is the target form. B, C, D are comparable lexemes (sharing exponence with source form), while E, F are disregarded since they do
not share exponence with the source form. The comparable lexemes analogically present both y₁ and y₂ as exponence candidates for the target form, but y₂ wins out because it occurs more frequently in this distribution. If Aii is used as a source form or comparable lexeme form in a subsequent iteration, it will have the exponence y₂. Ackerman & Malouf (2015) computationally simulate this model of inﬂectional change based on a ‘highly unrealistic language’ in which allomorphy is almost completely unpredictable in the initial state. The simulation language has a hundred lexemes, each of which inﬂects for eight morphosyntactic categories, giving a total of 800 forms in the system. Each morphosyntactic category has three allomorphs, which are randomly assigned to each lexeme. Thus there are 3⁸ = 6,561 possible inﬂectional paradigms, so that most of the hundred lexemes have an idiosyncratic paradigm, that is, not shared with any other lexeme. In this initial state, there are no inﬂectional classes. As the simulation iterates, replacement of unknown allomorphs with the most predictable allomorph leads to massive convergence of lexemes towards shared inﬂectional paradigms. The simulation ends when allomorphy stabilizes (i.e., the unknown form already is the most predictable form) for twenty-ﬁve consecutive iterations. Given hundreds of trials of the simulation, in a large proportion of simulations (no exact ﬁgure is given), all lexemes converge on a single set of allomorphs (i.e., no allomorphy), creating a single inﬂectional paradigm. In the remaining simulations, lexemes converge on between two and eighty-eight inﬂectional classes, the median number being twelve (Ackerman & Malouf 2015: 8). 
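The prediction step of Figure 3.1 and the iterated replacement it feeds can be sketched as follows. This is an illustrative reading of the description above, not Ackerman & Malouf's own code: the function names, the random choice of lexeme, source cell, and target cell at each step, the tie-breaking rule, and the `max_steps` safety cap are all assumptions. The target-cell values given for E and F are placeholders, since Figure 3.1 does not show them.

```python
import random
from collections import Counter


def predict_target(paradigms, lexeme, source_cell, target_cell):
    """Return the target-cell allomorph most frequent among comparable lexemes."""
    source_form = paradigms[lexeme][source_cell]
    votes = Counter(
        cells[target_cell]
        for other, cells in paradigms.items()
        if other != lexeme and cells[source_cell] == source_form
    )
    if not votes:
        # No comparable lexeme: keep the incumbent form (an assumption).
        return paradigms[lexeme][target_cell]
    # Ties go to the first candidate encountered (also an assumption).
    return votes.most_common(1)[0][0]


# Figure 3.1. E and F are never consulted because they are not comparable,
# so their "ii" values here are placeholders.
FIGURE_3_1 = {
    "A": {"i": "x1", "ii": None},
    "B": {"i": "x1", "ii": "y1"},
    "C": {"i": "x1", "ii": "y2"},
    "D": {"i": "x1", "ii": "y2"},
    "E": {"i": "x2", "ii": "y1"},
    "F": {"i": "x2", "ii": "y1"},
}
print(predict_target(FIGURE_3_1, "A", "i", "ii"))  # y2


def simulate(n_lexemes=100, n_cells=8, n_allomorphs=3, stable_for=25,
             max_steps=50_000, seed=0):
    """Iterate analogical replacement from a random initial state."""
    rng = random.Random(seed)
    paradigms = {
        lex: [rng.randrange(n_allomorphs) for _ in range(n_cells)]
        for lex in range(n_lexemes)
    }
    unchanged = 0
    for _ in range(max_steps):
        lex = rng.randrange(n_lexemes)
        source, target = rng.sample(range(n_cells), 2)
        predicted = predict_target(paradigms, lex, source, target)
        if predicted == paradigms[lex][target]:
            unchanged += 1
            if unchanged >= stable_for:
                break  # stable for `stable_for` consecutive iterations
        else:
            paradigms[lex][target] = predicted  # errors go uncorrected
            unchanged = 0
    return paradigms


def count_paradigms(paradigms):
    """Number of distinct paradigms, i.e. inflection classes."""
    return len({tuple(cells) for cells in paradigms.values()})
```

Calling `count_paradigms(simulate())` gives the number of inflection classes left at the end of one run. The default parameters mirror the hundred lexemes, eight cells, three allomorphs, and twenty-five-iteration stopping rule described above; the outcome depends on the seed and on the sampling assumptions, so it should not be expected to reproduce the published figures.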
In terms of inflectional predictability, the initial random distribution of allomorphs [x₁, x₂, x₃] for each inflected form means that knowledge of other inflected forms does not offer any reduction to uncertainty (except by occasional accident of the distribution), and conditional entropy is therefore only marginally less than the unconditional entropy of three equiprobable allomorphs, that is, H(x₁, x₂, x₃) = log₂ 3 ≈ 1.58 bits. But the replacement-by-most-predictable-allomorph mechanism in the simulated language change reduces this entropy to 0 bits in the instances where all lexemes converge on a single paradigm, and an average of 0.64 bits in the instances where the simulation converges on a set of inflectional classes (Ackerman & Malouf 2015: 9). The average conditional entropy found in these simulated inflectional systems sits neatly within the range of
0–1.1 bits found in the study of natural languages (Ackerman & Malouf 2013). This provides support for the notion that the model’s simpliﬁcation mechanism may have something in common with mechanisms deployed in natural language. One issue that has been insufﬁciently addressed in work on integrative complexity is the question of open versus closed lexical classes. The Ackerman & Malouf (2015) simulation works with a set of a hundred lexemes, that is to say a ﬁnite set, and therefore a closed class. The basic formulation of the Paradigm Cell Filling Problem (PCFP; Ackerman et al. 2009) presumes that unknown inﬂectional forms must be predicted by a speaker, but also that the correct inﬂectional exponence is in some way deﬁned—perhaps by a dictionary, or a more erudite speaker. Now, if we take ‘open class’ to mean a lexical class to which entirely new words can be added, then there must be a point at which inﬂectional forms of these words are not pre-deﬁned, and there is no correct or incorrect selection of exponence. In other words, for truly open-class lexemes, the PCFP is undeﬁned. In the next section, we will see that Murrinhpatha classiﬁer stems are a closed class, with rather fewer members than may be intended in the original PCFP formulation. However, we argue that the model is still relevant, as Murrinhpatha speakers are not born with complete knowledge of the classiﬁer stem paradigms, and must therefore use predictive mechanisms to extrapolate from known to unknown forms.

3.4 Unpredictable exponence in Murrinhpatha classifier stems

Murrinhpatha is a polysynthetic language with complex verbal structures including agreement morphology, nominal incorporation, adverbial modifiers, and complex predicates (Nordlinger 2017). Verbs are built on a finite stem element known as a ‘classifier stem’,³ which may either form a complete verb on its own or form the basis for a complex predicate. Classifier stems encode predicate semantics, subject person and number, and tense/aspect/mood marking. All Murrinhpatha verbs require a classifier stem in first position (bolded in the examples below). There are thirty-nine classifiers, each of which appears in forty-two inflected forms, thus giving a total of 1,638 inflected forms. Eleven of the thirty-nine classifiers can form a verb on their own (1); the remaining twenty-eight are only ever found in combination with a second, uninflecting stem element later in the verbal word (underlined in the examples below) with which they jointly determine the predicate semantics (2)–(5). The only allomorphy in the verb is in the classifier stem element; all other elements have a single exponence, subject only to phonologically motivated alternations. For more discussion of the details of the system the reader is referred to Blythe (2009), Nordlinger (2011, 2015), and Mansfield (2016, 2019) among others.⁴

³ In other work these have been called ‘auxiliaries’ (Walsh 1976), ‘classifier-subject pronominals’ (Nordlinger 2011), and ‘finite verbs’ (Mansfield 2016).

(1) wuɾan
    3S.(6).
    ‘She goes.’

(2) muŋam-paɭ
    3S.(11).-break
    ‘She broke it off.’

(3) pam-ŋin̪t̪a-nu-ma-ɻaʈal
    3S.:(24).-.---tear
    ‘They (two female non-siblings) tore the (cloth) from each other.’ (RN-20070531-002:011)

(4) piɾim-nin̪t̪a-nu-bu-wuj-waɖa-ya
    3S.(3).-.--thigh-put.into--
    ‘They put them in their pockets.’ (JB 43JBc743652_747130)

(5) puddan-wunku-ɭaɭ-dejida-ŋime=pumpan-ka
    3S.(29).-3O-drop-in.turn-.=3S.(6).-
    ‘They (dual, sibling) are dropping them (paucal, female, non-sibling) off, one after the other, as they go along.’ (Blythe 2009: 134)

For most classifier stems the exponence pattern making up the paradigm of forty-two inflected forms is unique to that stem. Thus the concept of ‘inflectional classes’, a set of exponence paradigms shared by many lexemes, is not directly applicable to Murrinhpatha. (1)–(5) show classifier stems as unsegmented wholes, and this has been the representation used in most work on Murrinhpatha. However, there are semi-regular subcomponents evident in these stems, and it is these that we treat as exponents of inflectional categories. These are not productive morphs that are applicable to new lexemes in an open class; however, they do constitute morphology in the sense of form:meaning associations between systematically related forms (Anderson 2015b).

⁴ In the Appendix we have provided paradigms for ﬁve classiﬁer stems, to exemplify the complexity amongst them. Previous descriptions of the Murrinhpatha verbal system (e.g., Blythe et al. 2007; Nordlinger 2011, 2015) have tended to treat these classiﬁer stem paradigms as consisting of synchronically unanalysable portmanteau forms, due to the substantial amounts of unpredictability and suppletion within the paradigms. The full set of thirty-nine paradigms as analysed in this chapter is available at http://langwidj.org/Murrinhpatha-inﬂection.

We assume that Murrinhpatha speakers to some extent store classiﬁer stems as whole forms, rather than composing them online from the elements of exponence (cf. Mithun, Chapter 12, this volume). However this does not mean that inﬂectional exponence has no role in acquisition or processing. We do not know how much input is required for a Murrinhpatha speaker to encounter all 1,638 forms enough times that they can all be memorized, but the available evidence on corpus distribution of inﬂected forms suggests that many years of input are required to offer complete coverage of large paradigms (Blevins et al. 2017). As with other inﬂectional systems, when Murrinhpatha speakers parse or produce forms that they have not yet encountered, the recurrent patterns of exponence offer predictive clues. Indeed, the evidence of analogical change presented in this chapter shows that classiﬁer stem forms are not acquired and stored as isolated forms: speakers draw on the exponence of one classiﬁer to produce the exponence of another. Research on child acquisition of Murrinhpatha verb inﬂection also shows that children make occasional errors in allomorphy selection, revealing morphological structure in the acquisition of classiﬁer stems (Forshaw 2016). Therefore, both the PCFP (Ackerman et al. 2009) and the Ackerman & Malouf (2015) simpliﬁcation mechanism are relevant to Murrinhpatha. The fact that Murrinhpatha classiﬁer stems constitute a closed class does not disqualify them from applicability of these models, since, as we observed above, the PCFP is only strictly deﬁned for a closed class.

3.4.1 Intersecting formatives and unpredictable allomorphy

Inflectional allomorphy in Murrinhpatha classifier stem paradigms is both highly complex and typologically unusual, meaning that a detailed exposition is beyond the scope of this chapter.⁵ Murrinhpatha’s thirty-nine classifiers each appear in forty-two inflected forms (and never in non-finite form). The ‘inner stems’ upon which these forms are built are highly mutable, creating much of the complexity in the system (cf. Parker & Sims, Chapter 2, this volume).

⁵ Fuller description is available in Mansfield (2016, 2019), drawing on earlier partial analyses (Walsh 1976: 224; Green 2003; Forshaw 2016: 37). As shown in the examples above, there is also further inflectional morphology in the verb that is not part of the classifier stem paradigms, and can be applied equally to verbs based on any classifier stem (Nordlinger 2015, 2017). This morphology has no bearing on the issues discussed in this chapter and will therefore not feature in our remaining discussion.
⁶ The full paradigms are provided in the Appendix.

Table 3.2. Examples of inflected classifier forms

             3.      3.       3.
la ‘(26)’    kila    dilam    pilla
ma ‘(34)’    ma      mam      pume
ɾu ‘(6)’     kuɾu    wuɾan    puɳi
i ‘(1)’      ki      dim      piɾini

Table 3.2 illustrates some sample classifier forms,⁶ representing two classifiers (la ‘(26)’, ma ‘(34)’) that have fairly clear phonological stems, one classifier (ɾu ‘(6)’) that has a highly mutable stem, and one classifier (i ‘(1)’) that has a vowel-only stem,
which only surfaces phonologically when it syllabiﬁes with a consonantal stem alternation, and otherwise results in a phonologically empty stem.⁷ The challenge for segmentation and analysis of classiﬁer forms lies in the fact that each form combines several independent dimensions of allomorphy. Each inﬂected classiﬁer form selects a preﬁx consonant allomorph, an orthogonally distributed preﬁx vowel allomorph, and an orthogonally distributed sufﬁx allomorph (any of which can be zero). We label this combination of orthogonal allomorphs inﬂection by intersecting formatives (Mansﬁeld 2016). Intersecting formatives appear to be a recurrent feature of highly complex verbal inﬂection systems, such as Mazatec (Ackerman & Malouf 2013), Greek (Sims 2015: 143ff), Saami (Feist 2015: 140ff), and Seri (Baerman 2016). Intersecting inﬂectional formatives are given an explicit formulation in Network Morphology, where they are represented as multiple inheritance of inﬂection class nodes (Brown & Hippisley 2012: 71ff), and there is further discussion of the phenomenon with respect to complexity in Parker & Sims (Chapter 2, this volume), where intersectional inﬂection is labelled ‘paradigmatic layers’. Intersectional inﬂection often combines concatenative and supra-segmental morphology, and this is also the case in the Murrinhpatha verb forms. The Murrinhpatha classiﬁer stem is built on a phonologically minimal ‘inner stem’ of the shape (C)(C)V, which alternates in three orthogonal dimensions: stem consonant mutation, vowel height, and vowel frontness. Each inﬂected form of a classiﬁer stem is therefore determined by six dimensions of intersecting allomorphy: PrefC, PrefV, StemC, StemVH, StemVF, and Sufﬁx. Table 3.3 illustrates the intersecting formative analysis of the forms shown above in Table 3.2. Formatives exhibit ‘semi-regularities’ that appear in some but not all exponents of a morphosyntactic cell, for example, PrefC k- in 3., Sufﬁx -m in 3.. 
⁷ The description of ‘vowel-only stems’ is somewhat different from Mansfield (2016), where they are simply labelled ‘phonologically empty stems’. The analysis there nonetheless depends on underlying ‘theme vowels’ in such stems, though this is not explicitly discussed. An alternative analysis would propose a zero theme vowel, to avoid the use of unrealized underlying vowels. We have experimented with calculation of Murrinhpatha integrative complexity using both analyses, and found that the difference is very small (< 1%). The unrealized vowel alternative produces slightly lower complexity measurements, and we therefore select this option to keep our complexity measurements conservative.

Table 3.3. Examples of classifier forms and their formative analyses
Each cell gives the surface form followed by its analysis as PrefC-PrefV-Stem[stem alternations]-Suffix.

             3.                       3.                       3.
la ‘(26)’    kila   k-i-la[]-∅        dilam  d-i-la[]-m        pilla   p-i-lla[:]-∅
ma ‘(34)’    ma     ∅-u-ma[]-∅*       mam    ∅-u-ma[]-m        pume    p-u-me[:]-∅
ɾu ‘(6)’     kuɾu   k-u-ɾu[]-∅        wuɾan  w-u-ɾa[:]-n       puɳi    p-u-ø[:]-ɳi
i ‘(1)’      ki     k-i-∅[]-∅         dim    d-i-∅[]-m         piɾini  p-i-ɾi[:]-ni

Notes: * PrefV, like the stem vowel, does not surface unless it can syllabify with an onset consonant. Thus we can analyse a PrefV u- formative in ø-u-ma-m 3.(34)., in keeping with this classifier’s overall paradigmatic pattern, though the surface form is mam. The labels in square brackets distinguish default, geminate, and ɾ-alternation stems.

Other (semi-)regularities attach to particular classifier stems, for example PrefV i- in
(26) and (1), but PrefV u- in (34) and (6). Importantly, these patterns are often orthogonal—for example, the PrefV selection is independent of the PrefC selection in 3.. As shown in the full paradigm examples in the Appendix, the complete morphosyntactic paradigm of a classiﬁer stem consists of forty-two inﬂectional forms. Subjects are distinguished for 1/2/3 person, cross-cutting a three-way // number distinction (although / is consistently collapsed in  tense, and in all tenses for some paradigms).⁸ There is also a 1+2 ‘we inclusive’ person category, which has no number distinctions. These are the core number/person categories of Murrinhpatha, but more speciﬁc subcategories can be encoded using various predictable sufﬁxes not discussed here (Nordlinger 2015). There are four basic tense/ modality categories (henceforth ‘tenses’): non-future (), irrealis (), past (), and past irrealis (), as well as ‘subtense’ distinctions between  vs presentational (), and  vs future indicative (), which apply only to third-person forms. Again, these core categories can be further speciﬁed by predictable sufﬁxes encoding tense, modality, and aspect (Nordlinger & Caudal 2012). Table 3.4 illustrates a complete paradigm of inﬂected forms for one of the more regular classiﬁers, na ‘(27)’, with both surface forms and intersecting formative analysis. Some formatives in some cells have a consistent form (i.e., no allomorphy), such as PrefC p- in 3.. More typical is a selection between a handful of formative allomorphs, for example Sufﬁx -m, -n, -ŋam, -ŋan in , or PrefV a-, e-, i, u- for all cells. A particularly wide selection of allomorphs is PrefC p-, w-, d-, n-, j-, k-,

⁸ The category here labelled DU is used for both dual and paucal referents; it is labelled PAUCAL (PC) in Mansfield (2016) and DAUCAL in Blythe (2009).

Table 3.4. Inflectional exponence of na ‘(27)’
INNER STEMS: na, nna, ∅

NFUT (/PRSL)
SG 1        ŋinaŋam                ŋ-i-[]-ŋam
SG 2        t̪inaŋam                t̪-i-[]-ŋam
SG 3        ninaŋam / kinaŋam      n-i-[]-ŋam / k-i-[]-ŋam
INCL 1+2    t̪inaŋam                t̪-i-[]-ŋam
PL/DU 1     ŋinnaŋam               ŋ-i-[]-ŋam
PL/DU 2     ninnaŋam               n-i-[]-ŋam
PL/DU 3     pinnaŋam / kinnaŋam    p-i-[]-ŋam / k-i-[]-ŋam

IRR (/FUT)
SG 1        ŋina                   ŋ-i-[]-∅
SG 2        t̪ina                   t̪-i-[]-∅
SG 3        nina / kina            k-i-[]-∅ / p-i-[]-∅
INCL 1+2    pina                   p-i-[]-∅
PL 1        ŋinna                  ŋ-i-[]-∅
PL 2        ninna                  n-i-[]-∅
PL 3        kinna / pinna          k-i-[]-∅ / p-i-[]-∅
DU 1        ŋinna                  ŋ-i-[]-∅
DU 2        ninna                  n-i-[]-∅
DU 3        kinna / pinna          k-i-[]-∅ / p-i-[]-∅

PST
SG 1        ŋinaŋa                 ŋ-i-[]-ŋa
SG 2        t̪inaŋa                 t̪-i-[]-ŋa
SG 3        niŋa                   n-i-[]-ŋa
INCL 1+2    t̪inaŋa                 t̪-i-[]-ŋa
PL 1        ŋinnaŋa                ŋ-i-[]-ŋa
PL 2        ninnaŋa                n-i-[]-ŋa
PL 3        pinnaŋa                p-i-[]-ŋa
DU 1        ŋinnaŋa                ŋ-i-[]-ŋa
DU 2        ninnaŋa                n-i-[]-ŋa
DU 3        pinnaŋa                p-i-[]-ŋa

PSTIRR
SG 1        ŋinaŋi                 ŋ-i-[]-ŋi
SG 2        t̪inaŋi                 t̪-i-[]-ŋi
SG 3        niŋa                   n-i-[]-ŋi
INCL 1+2    t̪inaŋi                 t̪-i-[]-ŋi
PL 1        ŋinnaŋi                ŋ-i-[]-ŋi
PL 2        ninnaŋi                n-i-[]-ŋi
PL 3        pinnaŋi                p-i-[]-ŋi
DU 1        ŋinnaŋi                ŋ-i-[]-ŋi
DU 2        ninnaŋi                n-i-[]-ŋi
DU 3        pinnaŋi                p-i-[]-ŋi

OUP CORRECTED AUTOPAGE PROOFS – FINAL, 23/8/2020, SPi

na ‘(27)’

OUP CORRECTED AUTOPAGE PROOFS – FINAL, 23/8/2020, SPi

  

65

ø- 3., and StemC allomorphy also has a large selection of allomorphs, once we take into account various suppletive (i.e., altogether unpatterned) consonant alternations. From the point of view of integrative complexity, that is, the predictability of an inﬂected form given knowledge of some other form, the formatives individually have an intermediate degree of predictability. In certain dimensions there is very high predictability: for example, if one  form takes Sufﬁx -ŋam, there is a very high likelihood (though not quite categorical) that any other  form of the same verb will take Sufﬁx -ŋam. This is illustrated in the consistent tense patterning of Sufﬁx allomorphs in Table 3.4. Among cells that have the same tense and number categories but differ for 1/2/3 person, the only difference of exponence is usually PrefC; these triplets of cells are therefore tightly integrated in terms of implicational structure. However, when we consider the implicative relationship between cells from different tenses, we ﬁnd that, say, knowing  -ŋam provides little information about the Sufﬁx allomorph for  cells. Allomorph selection across tenses is strongly orthogonal. Other formatives have generally high degrees of integrative complexity, that is to say, inconsistent paradigmatic patterning. This is especially true of the stem formatives StemC, StemVH, and StemVF, and also to some extent of PrefV. The problem of predicting an unknown inﬂected form of a Murrinhpatha classiﬁer stem therefore involves predicting allomorph selection for six intersecting formatives, based on knowledge of such an intersection for some other form of the classiﬁer stem. Some formatives provide good chances of correct prediction, while others are rather less helpful. 
This situation is not as extreme as the completely random paradigmatic distribution of allomorphs in Ackerman & Malouf (2015)’s ‘unrealistic language’, though the presence of six different dimensions of allomorphy in Murrinhpatha nonetheless leads to a high degree of complexity, since the unpredictability of the allomorphs is compounded. Because Murrinhpatha classiﬁer stems often have idiosyncratic exponents, that is, allomorphs not shared by any other classiﬁer stem, the entropy calculations used in Ackerman & Malouf (2013) are not directly applicable. The latter’s allomorphic entropy method assumes that all possible exponents have been encountered in other lexemes, so that allomorphy prediction involves a distribution of possible outcomes. But in a system with idiosyncratic exponents, the unknown target exponent may be one that has not previously been encountered (cf. Dahl, Chapter 13, this volume). The speaker’s challenge is not one of entropy in the distribution of previous observations, but of attempting to predict an outcome that may or may not match any previous observation. Thus the mathematical analysis calculates chance of correct prediction (including zero chance for a previously unencountered paradigmatic relation), rather than degrees of entropy. Nonetheless, we can make a notional comparison of Murrinhpatha with the crosslinguistic ﬁndings on entropy in Ackerman & Malouf (2013). The latter


ﬁnds average conditional entropy between 0 and 1.1 bits, and 1 bit of entropy equates to a randomized prediction having 50% chance of matching the outcome. Mansﬁeld (2016) calculates that the average chance of correct prediction from one Murrinhpatha classiﬁer stem form to another is 43%, comparable to 1.22 bits of entropy.⁹ This is slightly outside the range of the Ackerman & Malouf sample, suggesting that Murrinhpatha’s closed-class classiﬁer stems have an integrative complexity at the upper end of the scale found for open-class systems in other languages. As far as we know, the only language that has been analysed as having clearly higher integrative complexity is Seri (isolate, Mexico), which has almost 2 bits average conditional entropy (Baerman 2016).
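The notional conversion between a chance of correct prediction and bits of entropy used in this comparison can be made explicit. The following is a minimal sketch in Python, assuming only the relation log₂(1/p) stated in footnote 9; the function names are illustrative and are not taken from the published implementation.

```python
import math


def chance_to_bits(p_correct: float) -> float:
    """Notional bits equivalent to a chance of correct prediction: log2(1/p)."""
    if not 0.0 < p_correct <= 1.0:
        raise ValueError("p_correct must lie in the interval (0, 1]")
    return math.log2(1.0 / p_correct)


def bits_to_chance(bits: float) -> float:
    """Inverse conversion: the chance of a correct guess equivalent to `bits`."""
    return 2.0 ** (-bits)


# 43% average chance of correct prediction (Mansfield 2016)
print(round(chance_to_bits(0.43), 2))  # 1.22
# 1 bit corresponds to a 50% chance of matching the outcome
print(bits_to_chance(1.0))  # 0.5
```

The conversion is only a device for comparison: as noted above, the Murrinhpatha analysis calculates chance of correct prediction rather than entropy, because idiosyncratic exponents fall outside the assumptions of the entropy method.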

3.4.2 Variation and change

With 39 × 42 = 1,638 inflectional cells to be learnt, and implicational relations proving only moderately helpful in deducing unknown forms, it would be surprising if all Murrinhpatha speakers selected the same allomorphs all the time. The presence of allomorphic variation in Murrinhpatha classifier stem forms has previously been explored only to the extent that some paradigm cells are documented with two or more variants, for example nuɻa ~ na 3S.(7). (Street 1987: 84). The 1,638 cells of the full classifier stem paradigms have been documented based on a limited set of spontaneous speech data, with gaps filled by systematic elicitation of paradigms by multiple researchers over a number of years of descriptive work. These collective findings are collated as Blythe et al. (2007), and since then have been further revised and reanalysed in Mansfield (2019) although many questions still remain. Understanding the extent of allomorphic variation, the proportion in which variants are used, and any conditioning factors on the variation requires much more data. Investigation of such variation in Murrinhpatha is still a work in progress, but after forty years of intermittent research on this language, there are now some inflectional variables for which we have enough corpus tokens to begin proposing patterns of variation and implicit diachronic change.

For this study we have identified seven inflected forms with attested variation. These are the complete set of forms that fulfil the following criteria:

(a) Variation attested in the corpora of adult speech recorded by Blythe, Mansfield, Nordlinger, Street, & Walsh;¹⁰
(b) Allomorphic variants are attested with multiple corpus tokens for each variant;

⁹ That is, log₂(1/0.43) = 1.22.
¹⁰ Much of this corpus material is stored in public archives at the Australian Institute of Aboriginal and Torres Strait Islander Studies (Walsh), the Max Planck Institute Language Archive (Blythe), and PARADISEC (Mansﬁeld, Nordlinger).


(c) The variation is morphological, rather than purely phonological. For example, pujemam ~ pijemam 3S.(34). is purely phonological variation based on assimilation of the vowel to the following glide, and is therefore not an instance of lexically specified allomorphy.

None of the seven variables thus identified have enough corpus tokens to support a rigorous variationist analysis. Nor is there sufficient data to permit differentiation between contextual factors such as phrasal context, speech style, speaker gender, etc. Rather, in this study we focus purely on the distribution of variants among speakers born in the first half of the twentieth century (‘older speakers’) versus those born in the second half (‘younger speakers’). This method allows us to detect proportions suggestive of change in progress in inflectional variants, and thereby to search for signs of the Ackerman & Malouf (2015) simplification mechanism in effect. In fact, for all seven of the variables, there is a striking difference between variant distributions among older and younger groups, with the younger moving strongly towards the variant not attested in earlier documentation.¹¹ This is likely not an accident: the fact that these seven inflected forms were noted as variable is primarily because they stood out in Mansfield’s fieldwork as conflicting with earlier grammatical descriptions of the language. On the other hand, though speakers showed clear awareness of social indexicality in phonological and lexical variation among the generations, they were unaware of the intergenerational variations in inflectional morphology (Mansfield 2014: 469ff). It has often been observed that less frequent inflectional forms are more susceptible to analogical change in morphology, though frequent forms may also undergo such changes (e.g., Fertig 2000: 125).
Since our method for identifying changes in Murrinhpatha depends on the salience of these changes in ﬁeldwork, these can all be said to occur in fairly frequent forms. We presume that further analogical changes occur in less frequent forms, though we have not had the opportunity to observe these, and the corpus data drawn upon for this study does not permit robust estimates of inﬂectional form frequency. Table 3.5 lists the seven observed variables, with variants preferred by older and younger speakers respectively according to the corpus evidence. Note that where regular triplets of 1/2/3 person inﬂections are all involved, these are treated as a single variable in view of their tight mutual implications. Token numbers in parentheses indicate the number of tokens found for the older:newer variants among that speaker group. For example, for 1S.(34)., older speakers were found to have ﬁve tokens of me and one token of ŋeme,

¹¹ Some of the sources for older speakers are written (e.g., Bible; Street 1987) and do not have accompanying audio sources. It is possible that these sources underreport use of innovative variants, by correcting them to what may have been seen as the ‘correct’ form. This may account for some of the strength of the swing in proportions from older to younger speaker groups.


Table 3.5. Variably inflected classifier stem forms

Classifier, inflection    Older speakers (tokens)          Younger speakers (tokens)
ma, 1.(34).               me (5:1)                         ŋeme (0:9)
ma, 2S.(34).              nam (7:1)                        t̪amam (10:13)
ma, .(34).                ŋamam, namam, pamam (17:5)       ŋujemam etc (3:12)
ɾu, .(6).                 ŋa, na, ka (3:0)                 ŋu etc (0:3)
nu, .(7).                 ŋunna, nunna, punna (10:1)       ŋunne etc (0:10)
ɾa, 3.(28).               paŋan (4:0)                      piɾim (1:5)
ɾi, 3.: (36).             pim (2:0)                        piɾim (0:5)

while younger speakers were found to have zero tokens of me and nine of ŋeme. Interestingly, one of the few forms earlier documented as being variable, nuɻa ~ na 3.. (Street 1987: 84), showed only marginal variability in the corpus data. There are dozens of attestations for na, and only one for nuɻa, suggesting that the latter variant was already on its way out when Street recorded it.
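The generational shift in Table 3.5 can be summarized as the share of tokens that use the newer variant in each speaker group. The sketch below is illustrative only: the token counts are copied from Table 3.5, while the row labels and the helper name `newer_share` are shorthand introduced here. With so few tokens per variable, the proportions are suggestive rather than statistically robust.

```python
# Each row: (label, older speakers (old, new), younger speakers (old, new)).
# Counts follow the older:newer token notation of Table 3.5.
TABLE_3_5 = [
    ("me > ŋeme", (5, 1), (0, 9)),
    ("nam > t̪amam", (7, 1), (10, 13)),
    ("ŋamam etc > ŋujemam etc", (17, 5), (3, 12)),
    ("ŋa etc > ŋu etc", (3, 0), (0, 3)),
    ("ŋunna etc > ŋunne etc", (10, 1), (0, 10)),
    ("paŋan > piɾim", (4, 0), (1, 5)),
    ("pim > piɾim", (2, 0), (0, 5)),
]


def newer_share(counts):
    """Proportion of tokens that use the newer variant."""
    old, new = counts
    return new / (old + new)


def pooled(group_index):
    """Sum old-variant and new-variant tokens over all seven variables."""
    old = sum(row[group_index][0] for row in TABLE_3_5)
    new = sum(row[group_index][1] for row in TABLE_3_5)
    return old, new


for label, older, younger in TABLE_3_5:
    print(f"{label}: older {newer_share(older):.2f}, younger {newer_share(younger):.2f}")

older_total, younger_total = pooled(1), pooled(2)
print("pooled older:", older_total, "pooled younger:", younger_total)
```

For every variable the share of the newer variant is higher among younger speakers, which is the pattern described in the text as suggestive of change in progress.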

3.5 Predictability of changes observed in Murrinhpatha

In the last section we saw that Murrinhpatha classifier stems are a closed class in which the inflectional paradigms are large, and implicational relations are highly unpredictable. We also saw that allomorphy of exponence in this system is not static, but rather encompasses some variable forms, which show signs of change over the last couple of generations. Thus we are now in a position to investigate whether the changes observed in Murrinhpatha decrease or increase the complexity of the system. To test this, we ran the Ackerman & Malouf (2015) simplification method (with adaptions as described above) on the relevant classifier forms, identifying the most predicted allomorphs. We show that the observed change does not replace an incumbent allomorph with the most predictable allomorph in any of the seven inflected forms. We then go on to consider a weaker form of the Ackerman & Malouf (2015) simplification mechanism: when speakers replace an old allomorph with a new one, do they at least select one that is more predictable than the previous? We find that, on the contrary, most of the changes observed in Murrinhpatha select less predictable allomorphs, thus increasing the complexity of the system.

The Ackerman & Malouf (2015) simplification mechanism was implemented for Murrinhpatha classifier inflections using intersecting formatives to draw independent analogies, since this method has been shown to provide the


Table 3.6. Allomorphs selected by Ackerman & Malouf (2015) simplification mechanism

Classifier, inflection    Older speakers    Younger speakers    Ackerman & Malouf (2015) simplification
ma, 1.(34).               me                ŋeme                me
ma, 2.(34).               nam               t̪amam               nam
ma, .(34).                ŋamam etc         ŋujemam etc         ŋumam etc
ɾu, .(6).                 ŋa etc            ŋu etc              ŋuɻu etc
nu, .(7).                 ŋunna etc         ŋunne etc           ŋunni etc
ɾa, 3.(28).               paŋan             piɾim               piɻam
ɾi, 3.:(36).              pim               piɾim               piɻim

greatest probability of correctly predicting allomorphy (Mansfield 2016).¹² The implementation iterates through every inflected form of every Murrinhpatha classifier stem, treating each in turn as a target form requiring analogical prediction. The predictive mechanism takes each other inflected form of the classifier stem in turn as a source form, and for each identifies comparable classifier stems, from which candidate allomorphs for the target form are deduced. The probability of each candidate allomorph is the proportion of comparable classifier stems that imply that allomorph. The probability of candidates is aggregated across all source forms, revealing the overall most probable candidate. The most probable candidate allomorphs selected by the implementation for our variable inflected forms are illustrated in Table 3.6, along with the older and younger speakers’ attested forms (see full paradigms in Appendix). The results of the implementation do not in any instance match the innovative forms observed among younger speakers. However, in some instances the observed innovation, in comparison with the older form, does exhibit some of the formative allomorphs selected by the Ackerman & Malouf (2015) simplification. For example, in 1.(6)., the older form is ŋa and the simplification form is ŋuɻu. The observed innovation ŋu does exhibit the switch to PrefV u-, but maintains the weak stem grade of the older form, rather than the StemC [] ɻu of the Ackerman & Malouf (2015) simplification.¹³ Similarly, in 3.(28). the observed innovation takes on both the PrefV i- allomorph, and the Suffix -m of the Ackerman & Malouf (2015) simplification, but does not take up the StemC [] ɻa selected by the simplification, and also diverges from the

¹² The implementation code is written in Python (Python Software Foundation n.d.), and takes as input the inflectional paradigm data format established for the Principal Parts Analyzer (Finkel & Stump 2013). Both code and data are available online at http://langwidj.org/Murrinhpatha-inflection.
¹³ ɾu [] → ɻu [] may not seem like an obvious case of gemination, but it follows from a ɾɾ → ɻ process observed in Murrinhpatha’s sister language Ngan’gityemerri (Reid 1990) and their shared proto-language (Green 2003). In Murrinhpatha it is observable only in the classifier stem paradigms, where it fits with a broader gemination pattern.
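The predictive mechanism described in the running text can be illustrated with a toy sketch. This is not the published implementation: it handles a single formative, it assumes that a classifier stem counts as ‘comparable’ when it shows the same allomorph as the target stem in the source cell, and it assumes that aggregation across source forms is a simple mean. The function name `predict_cell`, the cell labels, and the toy paradigms are all hypothetical and do not represent Murrinhpatha data.

```python
from collections import Counter, defaultdict
from typing import Dict

Paradigm = Dict[str, str]  # cell label -> allomorph of one formative


def predict_cell(paradigms: Dict[str, Paradigm], lexeme: str, target_cell: str) -> Dict[str, float]:
    """Aggregate candidate probabilities for one target cell of one lexeme."""
    totals = defaultdict(float)
    n_sources = 0
    for source_cell, source_allomorph in paradigms[lexeme].items():
        if source_cell == target_cell:
            continue
        # Assumption: comparable stems share the source-cell allomorph.
        comparable = [
            p for name, p in paradigms.items()
            if name != lexeme
            and p.get(source_cell) == source_allomorph
            and target_cell in p
        ]
        if not comparable:
            continue  # this source form supports no analogy
        n_sources += 1
        counts = Counter(p[target_cell] for p in comparable)
        for candidate, count in counts.items():
            # Probability = proportion of comparable stems implying the candidate.
            totals[candidate] += count / len(comparable)
    if n_sources == 0:
        return {}
    # Assumption: aggregate by averaging over the contributing source forms.
    return {cand: total / n_sources for cand, total in totals.items()}


# Hypothetical toy paradigms (three cells, four lexemes).
toy = {
    "A": {"c1": "x", "c2": "p", "c3": "m"},
    "B": {"c1": "x", "c2": "p", "c3": "m"},
    "C": {"c1": "x", "c2": "q", "c3": "n"},
    "T": {"c1": "x", "c2": "q", "c3": "n"},
}

result = predict_cell(toy, "T", "c3")
best = max(result, key=result.get)
print(best)  # n
```

Under these assumptions, an allomorph that no comparable stem exhibits never enters the candidate set, so it receives zero probability. This is one way to read the zero chance assigned to previously unencountered paradigmatic relations.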


Table 3.7. Exponence probabilities of older and newer forms

Classifier, inflection    Older form (prob.)              Newer form (prob.)
ma, 1.(34).               me (.37)                        ŋeme (.00)
ma, 2.(34).               nam (.14)                       t̪amam (.09)
ma, .(34).                ŋamam, namam, pamam (.14)       ŋujemam etc (.00)
ɾu, .(6).                 ŋa, na, ka (.06)                ŋu etc (.06)
nu, .(7).                 ŋunna, nunna, punna (.06)       ŋunne etc (.06)
ɾa, 3.(28).               paŋan (.06)                     piɾim (.06)
ɾi, 3.:(36).              pim (.06)                       piɾim (.12)

simpliﬁcation by selecting StemVF [] (vowel frontness) and StemVH [] (vowel height) formatives. Finally, .(7). takes up the StemVF [] alternation selected by the simpliﬁcation, but maintains the StemVH [] (vowel height) alternation of the older form, instead of selecting the StemVH [] of the simpliﬁcation. Since some of the observed innovations take up subsets of the formative intersection selected by the adapted Ackerman & Malouf (2015) simpliﬁcation, which is the overall most probable exponence, we might wonder whether the observed innovations represent partial or incomplete moves towards Ackerman & Malouf (2015) simpliﬁcation. Do the observed innovations have greater probability of being predicted by analogy than the older forms they appear to be replacing? To this question, the answer is again negative, as illustrated in Table 3.7. Table 3.7 illustrates that in six out of the seven instances, the innovative form has either lower probability of being predicted than the older form, or equal probability. Only one instance, the innovation in 3.:(36)., creates a more predictable exponent. Even though some of the innovated formatives match the Ackerman & Malouf (2015) simpliﬁcation, the selection of nonsimpliﬁed formatives undermines the predictability of the entire form. We must therefore conclude that the changes to inﬂectional allomorphy observed in Murrinhpatha data collected over forty years (or at least, apparent changes, suggested by different distribution of variants among older and younger speakers) increase the complexity of the system. Most of the changes replace more predictable allomorphs with less predictable ones.
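The comparison drawn from Table 3.7 can be checked with a short tally. The sketch below is illustrative: the probabilities are copied from Table 3.7, and the row labels and the function name `tally` are shorthand introduced here rather than part of the published analysis.

```python
# Each row: (label, probability of older form, probability of newer form),
# with values as reported in Table 3.7.
TABLE_3_7 = [
    ("me > ŋeme", 0.37, 0.00),
    ("nam > t̪amam", 0.14, 0.09),
    ("ŋamam etc > ŋujemam etc", 0.14, 0.00),
    ("ŋa etc > ŋu etc", 0.06, 0.06),
    ("ŋunna etc > ŋunne etc", 0.06, 0.06),
    ("paŋan > piɾim", 0.06, 0.06),
    ("pim > piɾim", 0.06, 0.12),
]


def tally(rows):
    """Count innovations that are less, equally, or more predictable."""
    lower = sum(1 for _, old, new in rows if new < old)
    equal = sum(1 for _, old, new in rows if new == old)
    higher = sum(1 for _, old, new in rows if new > old)
    return lower, equal, higher


lower, equal, higher = tally(TABLE_3_7)
print(lower, equal, higher)  # 3 3 1
```

Three innovations are less predictable than the forms they replace, three are equally predictable, and one is more predictable, which gives the six out of seven instances with lower or equal probability noted above.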

3.6 Demorphologization and deepening complexity

Observed changes in Murrinhpatha increase the unpredictability of inflectional allomorphs because of breakdown in the structure of intersecting formatives. In this section we argue that this is a form of incremental demorphologization, where allomorphic proliferation is associated with the breakdown of segmentability.


Demorphologization in this sense is a complexifying force running counter to the simplifying force of analogical levelling. Indeed, this demorphologization process appears to be the same phenomenon that has been underway for a much longer time period, leading to the unpredictability of implicational relations in the classiﬁer paradigms. Blurring of constituent boundaries between the inner stems and afﬁxal exponents in the classiﬁer paradigms has produced the semiregularities of the inﬂectional system. Ackerman & Malouf (2015) propose that the requirement for inﬂectional allomorphs to be reasonably predictable, given knowledge of other forms of the same lexeme, is a ‘strong evolutionary pressure in language’ (Ackerman & Malouf 2015: 7). They present their model for iteratively simplifying predictions as a demonstration of how predictability might be achieved, though they do not claim that this is the actual mechanism at work in the evolution of natural languages.¹⁴ The implementation of their mechanism for Murrinhpatha, compared to observed changes in the language, suggests that a simpliﬁcation mechanism of this type is not in operation in the closed-class system of Murrinhpatha. But the broader point remains valid: inﬂectional changes do appear to reﬂect analogies drawn by speakers based on the paradigms of other lexemes. This point of view is supported because the innovated forms in Murrinhpatha copy phonological elements found in other classiﬁer forms with which they share morphosyntactic characteristics, rather than being purely phonological changes. But rather than following a direct aggregation of probable allomorphs, there appear to be other predictive inﬂuences at work—interference in the system, which leads to an increase in integrative complexity. Each of the innovations observed in Murrinhpatha has its own story, with potential sources of analogy detectable upon investigation of paradigmatically related forms. 
We here describe two of the innovations in particular, selected because they illustrate a means by which allomorphic complexity may be perpetuated, rather than reduced.¹⁵ As with all the observed changes, these are not the forms selected by the Ackerman & Malouf (2015) simpliﬁcation mechanism.

¹⁴ In fact, their main argument focuses on the greater generality of their Low Conditional Entropy Conjecture (Ackerman & Malouf 2013) as compared to the No Blur Principle (Carstairs-McCarthy 1994), which does not directly concern us here.
¹⁵ The other changes observed are potentially explicable by more subtle departures from the Ackerman & Malouf (2015) simplification mechanism: for example, by weighting of comparable classifier stems according to their respective entropies of prediction, with near-categorical predictors given extra weight (2.(34).), or by allowing prediction to be based on phonological relationships, including identity, rather than inflectional exponents (.(34).) (Bonami & Beniamine 2016). .(6). and .(7). seem to involve greater independence of formatives than has been previously proposed for the system (Mansfield 2016). Satisfactory analysis of any of these instances would require a separate study.

(6) 1.(34).
    Older form                               ∅-a-me[]-∅    me
    Ackerman & Malouf (2015) simplified      ∅-a-me[]-∅    me
    Observed                                 ŋ-e-me[]-∅    ŋeme

In the case of (6), there are two observed deviations from Ackerman & Malouf (2015), the first of which is the selection of PrefC ŋ- instead of ∅-. Both ŋ- and ∅- are in fact candidates implied by comparable classifier stems for various source forms, with ∅- selected because it has an aggregate 0.73 probability among all source forms, versus 0.27 for ŋ-. It is easy to imagine that this outcome might be different, as in the observed innovation ŋeme, if there were some weighting in the influence of source forms and comparable classifier stems. However, the second deviation from Ackerman & Malouf (2015) involves the introduction of PrefV e-, and this is not even a candidate by analogy with comparable classifiers. Classifier stems that do have PrefV e- are never selected as comparable, because none of the ma ‘(34)’ source forms use this allomorph, as illustrated in Table 3.8. Rather, the competing candidates are a- ~ u-. Notice, however, that 1.(34)., like all (34). forms, has a StemVF [] alternation. It seems that rather than arising from analogical prediction of PrefV allomorphy, the form ŋeme applies vowel fronting beyond the morphological inner stem structure ma ~ me in which the pattern is more generally established. On this view, the predicted form is derived analogically from other forms, but the prediction of vowel fronting has been inherited upwards into a morphological unit larger than the inner stem. Such abrogation of the structural distinction between inner stem and prefix is perhaps not surprising, given the widespread lack of phonological transparency in Murrinhpatha classifier stems.

(7) 3.(28).
    Older form                               p-a-∅[]-ŋan             paŋan
    Ackerman & Malouf (2015) simplified      p-i-ɻa[:, :, :]-m       piɻam
    Observed                                 p-i-ɾi[:, :, :]-m       piɾim

The case of (7) suggests more extensive breakdown of inner stem/afﬁx structure in the predictive mechanism. Here the observed deviations from the Ackerman & Malouf (2015) simpliﬁcation again include a consonant formative that is an analogical candidate though not the aggregate strongest candidate, StemC [] instead of StemC [], which again could be accounted for in a system that includes some weighting of candidates. The other deviation is in the vowel


Table 3.8. Classiﬁer stem paradigm for ma ‘(34)’ ma ‘(34)’

NFUT (/PRSL) SG

1

ŋamam ŋ-a-[]-m

2

nam ∅-a-[]m mam / kamam ∅-a-[]m / k- . . .

3

INCL 1+2 t a̪ mam t -̪ a-[]-m PL/ DU

1

ŋamam ŋ-a-[]-m

2

namam n- . . . pamam / kamam p- . . . / k- . . .

3

INNER ma  STEM: me : mi :, : IRR (/FUT) SG 1 ŋama ŋ-a-[]-∅ 2 t̪ama t̪- . . . 3 kama / pama k- . . . / p . . .

INCL pama 1+2 p-a-[]-∅

na : ne :, : ni :, :, : PST

PSTIRR

me mi ∅-u-[]-∅ ∅-u-[, ]-∅ ni ne ∅-u-[, ∅-u-[, ]-∅ ,]-∅ me mi ∅-u-[]- ∅ ∅-u-[, ]-∅ t̪ume t̪-u-[]-∅

t̪umi t̪-u-[, ]-∅

PL 1 ŋujema ŋ-uje-[]-∅

ŋume ŋumi ŋ-u-[]-∅ ŋ-u-[, ]-∅ 2 nujema nume numi n- . . . n- . . . n- . . . 3 kujema / pujema pume pumi k- . . . / p- . . . p- . . . p- . . .

DU 1 ŋujema ŋ-uje-[]-∅

ŋume ŋumi ŋ-u-[]-∅ ŋ-u-[, ]-∅ 2 nujema nume numi n- . . . n- . . . n- . . . 3 kujema / pujema pume pumi k- . . . / p- . . . p- . . . p- . . .

formatives StemVF [] and StemVH [], neither of which is predicted by formative analogies. None of the source forms use such stem vowel alternations (Table 3.9). Rather, the default inner stem vowel a is overwhelmingly predicted, rather than the observed [, ] alternation i. The most obvious explanation in this case is the existence of a 3. form piɾim in other classiﬁers, in particular i ‘(1)’ and i ‘(2)’. This is another case of analogical relations being drawn without respect to classiﬁer-internal morphological structure; the comparable classiﬁers have the i vowel, though it is not determined by [, ]


Table 3.9. Classifier stem paradigm for ɾa ‘(28)’
INNER STEM: ɾa, a, ∅

NFUT (/PRSL)
SG 1        ŋiɾaŋan              ŋ-i-[]-ŋan
SG 2        t̪iɾaŋan              t̪- . . .
SG 3        diɾaŋan / kiɾaŋan    d- . . . / k- . . .
INCL 1+2    t̪iɾaŋan              t̪-i-[]-ŋan
PL/DU 1     ŋaŋan                ŋ-a-[]-ŋan
PL/DU 2     naŋam                n- . . .
PL/DU 3     paŋam / kaŋam        p- . . . / k- . . .

IRR (/FUT)
SG 1        ŋiɾa                 ŋ-i-[]-∅
SG 2        t̪iɾa                 t̪- . . .
SG 3        kiɾa / piɾa          k- . . . / p- . . .
INCL 1+2    piɾa                 p-i-[]-∅
PL 1        ŋiɻa                 ŋ-i-[]-∅
PL 2        niɻa                 n- . . .
PL 3        kiɻa / piɻa          k- . . . / p- . . .
DU 1        ŋiɻa                 ŋ-i-[]-∅
DU 2        niɻa                 n- . . .
DU 3        kiɻa / piɻa          k- . . . / p- . . .

PST
SG 1        ŋiɾa                 ŋ-i-[]-∅
SG 2        t̪iɾa                 t̪- . . .
SG 3        diɾa                 d- . . .
INCL 1+2    t̪iɾa                 t̪-i-[]-∅
PL 1        ŋiɻa                 ŋ-i-[]-∅
PL 2        niɻa                 n- . . .
PL 3        piɻa                 p- . . .
DU 1        ŋiɻa                 ŋ-i-[]-∅
DU 2        niɻa                 n- . . .
DU 3        piɻa                 p- . . .

PSTIRR
SG 1        ŋiɾaŋi               ŋ-i-[]-ŋi
SG 2        t̪iɾaŋi               t̪- . . .
SG 3        diɾaŋi               d- . . .
INCL 1+2    t̪iɾaŋi               t̪-i-[]-ŋi
PL 1        ŋiɻaŋi               ŋ-i-[]-ŋi
PL 2        niɻaŋi               n- . . .
PL 3        piɻaŋi               p- . . .
DU 1        ŋiɻaŋe               ŋ-i-[]-ŋe
DU 2        niɻaŋe               n- . . .
DU 3        piɻaŋe               p- . . .

alternations on an inner stem, but rather by an underlying inner stem vowel (visible not in the default stem form, but only in forms with suppletive StemC). Therefore the analogical mechanism depends on a shared morphosyntactic category 3., and on some shared formatives, but ignores the patterns of inner stem vowel defaults and alternations existent in other parts of the paradigm. Again it draws a phonological analogy that abrogates inner stem/afﬁx structure. In historical reconstruction, ‘demorphologization’ has been used to describe phonological material that at one point constitutes a regular, predictable morpheme, and at some later point loses its connection to morphological patterns from which it derived. For example, the ﬁnal rime of seldom derives from Old English dative *-um, while the m in French rompre ‘break’ derives from a nasal inﬁx associated with present tense in Latin (Klausenburger 1976; Hopper 1990). Each of these was once an inﬂectional exponent, because it was part of a form:meaning pattern shared by an inﬂectional class of lexemes, but the dissolution of these patterns has left them absorbed into lexical stems. The recent innovations observed in Murrinhpatha 1.(34). and 3.(28). do not begin from a clear ‘morphemic’ unit in this way, as predictable form:meaning


relations in the classiﬁer stem morphology have already long given way to lexically speciﬁc, unpredictable allomorphy. But the changes nonetheless reﬂect incremental steps on the path of demorphologization, undermining the morphological structure of the classiﬁer stem. Every time a paradigmatic cell in the system shifts from a more predictable allomorph to a less predictable one, the formative structure of the system is incrementally undermined. Processes of this type are probably responsible for much of the integrative complexity in Murrinhpatha verbs—though pursuit of this hypothesis would depend on more extensive historical reconstruction than is presently available (Green 2003).

3.7 Conclusions

In this chapter, we have investigated changes in Murrinhpatha classifier stem paradigms, a closed-class system with high integrative complexity. The system of intersecting formatives underlying the exponence of person, number, and tense on Murrinhpatha verb classifier stems is unusually complex, in terms of both wealth of allomorphy and unpredictability of paradigmatic relations. We have studied changes unfolding in this system with the goal of determining whether observed changes reduce or increase the complexity of the system. Seven likely changes in progress were identified, based on variable exponents where younger speakers showed a strong preference for an innovative variant, as opposed to the conservative variant favoured by older speakers. Calculation of the most predictable allomorphs for these exponents was performed by adapting the model of Ackerman & Malouf (2015), but none of the seven observed changes were selected as expected by this model. Nor were the changed forms more predictable than the incumbent forms they replaced: in six of the seven instances, the innovated form was less predictable or at best equally predictable. Analysis of the analogical sources for two of the forms suggests that less predictable forms have been selected by speakers because of analogies that abrogate the inner stem/affix structure evident in the system. The extensive phonological mutation already undergone by the inner stem elements has no doubt led to this further obfuscation of inner stem elements, deepening the overall complexity of the system. Incremental demorphologization produces integrative complexity, but also adds to opacity in structure. We have observed this in a closed-class system of thirty-nine members, but also argued that the problem of integrative complexity presupposes a closed class of some size. The size of the Murrinhpatha paradigms, with 1,638 forms in total, presumably allows for some degree of whole-form memorization.
But evidence observed in analogical changes also shows that implicational relations are active in acquisition or processing, and not all forms are learnt and stored in isolation. We hope that further research on integrative complexity will provide more insight into how analogy and memorization interact in complex inﬂectional systems.


Appendix Illustrated below are the inﬂectional paradigms for classiﬁers discussed in this chapter. The paradigms for (34) and (28) are illustrated in the body of the text.

/ø/ ‘(1)’

NFUT (/PRSL) SG

INCL PL/ DU

1 ŋem ŋ-e-[]-m 2 t ̪im t ̪-i-[]-m 3 dim / kem d-i-[]-m / k -e-[]-m 1+2 t ̪im t ̪-i.[].m 1 ŋaɾim ŋ-a-[]-m 2 niɾim n-i-[]-m 3 pirim / kaɾim p-i-[]-m / k-a-[]-m

INNER /∅/  STEM: /ɾi/ : /ju/ : (), : IRR (/FUT)

PST

PSTIRR

SG 1 ŋi ŋ-i-[]-∅ 2 t ̪i t ̪- . . . 3 ki/ pi k- . . . / p- . . .

ŋini ŋ-i-[]-ni t ̪ini t ̪- . . . dini d- . . .

ŋini ŋ-i-[]-ni t ̪ini t ̪- . . . dini d- . . .

INCL pi 1+2 p-i.[].∅

t ̪ini t ̪-i-[]-ni

t ̪ini t ̪-i-[]-ni

ŋaɾini ŋ-a-[]-ni

ŋaɾini ŋ-a-[]-ni

niɾini n-i-[]-ni piɾini p-i-[]-ni

niɾini n-i-[]-ni piɾini p-i-[]-ni

ŋaɾine ŋ-a-[]ne niɾine n-i-[]-ne piɾine p-i-[]-ne

ŋaɾine ŋ-a-[]ne niɾine n-i-[]-ne piɾine p-i-[]-ne

PL 1 ŋuju ŋ-u-[. ]-∅ 2 nuju n- . . . 3 kuju / puju k- . . . / p- . . . DU 1 ŋe ŋe.[].∅ 2 ne n- . . . 3 ke / pe k- . . . / p- . . .


/ɾu/ ‘go(6)’

NFUT (/PRSL) SG

INCL

PL/ DU

INNER /ɾu/ / STEM: ɻu/  /∅/  /ji/  (),  /mpa/  (),  IRR (/FUT)


/ɾa/ /ɾi/  /je/  (), , 

PST

PSTIRR

ŋuɾini ŋ-u[]ni t ̪uɾini t ̪- . . . wuɾini w- . . .

ŋuɾi ŋ-u[]-∅ t ̪uɾi t ̪- . . . wuɾi w- . . .

1 ŋuɾan ŋ-u-[]-n

SG 1 ŋuɾu ŋ-u-[]-∅

2 t ̪uɾan t ̪- . . . 3 wuɾan / kuɾan w- . . . / k- . . .

2 t ̪uɾu t ̪- . . . 3 kuɾu / puɾu k- . . . / p- . . .

1+2 t ̪uɾan t ̪-u-[]-n

INCL puɾu 1+2 p-u-[]-∅

t ̪uɾini t ̪-u[]-ni

t ̪uɾi t ̪u[]-∅

1 ŋumpan ŋ-u-[, ]-n 2 numpan n- . . . 3 pumpan / kumpan p- . . . / k- . . .

PL 1 ŋuɻu ŋ-u-[]-∅

ŋuɳi ŋ-u-[, ]-ɳi nuɳi n- . . . puɳi p- . . .

ŋuji ŋ-u-[, ]-∅ nuji n- . . . puji p- . . .

ŋuɳe ŋ-u-[, ]-ɳe

ŋuje ŋ-u-[, , ]-∅ nuje n- . . . puje p- . . .

2 nuɻu n- . . . 3 kuɻu / puɻu k- . . . / p- . . . DU 1 ŋa ŋ-a-[]-∅ 2 na n- . . . 3 ka / pa k- . . . / p- . . .

nuɳe n- . . . puɳe p- . . .


     /nu/ ‘(7)’

SG

INNER /nu/  STEM: /ni/ : /nuj/ :  /na/ : 

NFUT (/PRSL)

IRR (/FUT)

1 ŋunuŋam ŋ-u-[]-ŋam 2 t ̪unuŋam t ̪- . . . 3 nuŋam / kunuŋam ∅- . . . / k- . . . t ̪unuŋam t ̪-u-[]-ŋam

INCL 1+2

PST

PSTIRR

SG 1 ŋunu ŋ-u-[]-∅ 2 t ̪unu t ̪- . . . 3 kunu / punu k- . . . / p- . . .

ŋuna ŋ-u-[]-∅ t ̪una t ̪- . . . na* ∅- . . .

ŋuni ŋ-u-[]-∅ t ̪uni t ̪- . . . nuj ∅-u-[]-∅

INCL punu 1+2 p-u-[]-∅

t ̪una t ̪-u-[]-∅

t ̪uni t ̪-u-[]-∅

PL/DU 1 ŋunnuŋam ŋ-u-[]-ŋam 2 nunnuŋam n- . . . 3 punnuŋam / kunnuŋam p- . . . / k- . . .

/nnu/ : /nni/ :, : /nna/ :, :  /nne/ :, : , :

PL 1 ŋunnu ŋ-u-[]-∅

ŋunni ŋ-u-[, ]-∅ 2 nunnu nunni n- . . . n- . . . 3 kunnu / punnu punni k- . . . / p- . . . p- . . .

DU 1 ŋunna ŋ-u-[, ]-∅ 2 nunna n- . . . 3 kunna / punna k- . . . / p- . . .

ŋunna ŋ-u-[, ]-∅ nunna n- . . . punna p- . . .

ŋunni ŋ-u-[,]-∅ nunni n- . . . punni p- . . . ŋunne ŋ-u-[,, ]-∅ nunne n- . . . punne p- . . .

Note: Street (1987) in addition lists a variant /nuɻa/ use.feet.3.. This variant does not appear in our corpus data.

/la/ ‘(26)’ NFUT (/PRSL) SG

INCL PL/DU

1 ŋilam ŋ-i-[]-m 2 t ̪ilam t ̪- . . . 3 dilam / kilam d- . . . / k- . . . 1+2 t ̪ilam t ̪-i-[]-m 1 ŋillaŋam ŋ-i-[]-ŋam 2 nillaŋam n- . . .

INNER /la/ /lla/ : STEM: IRR (/FUT)

PST

PSTIRR

SG 1 ŋila ŋ-i-[]-∅ 2 t ̪ila t ̪- . . . 3 kila / pila k- . . . / p- . . .

ŋila ŋ-i-[]-∅ t ̪ila t ̪- . . . dila d- . . .

ŋila ŋiŋ-i-[]-ŋi t ̪ilaŋi t ̪-i-[]-ŋi dilaŋi d-i-[]-ŋi

INCL pila 1+2 p-i-[]-∅

t ̪ila t ̪-i-[]-∅

t ̪ilaŋi t ̪-i-[]-ŋi

ŋilla ŋ-i-[]-∅ nilla n- . . .

ŋillaŋi ŋ-i-[]-ŋi nillaŋi n- . . .

PL 1 ŋilla ŋ-i-[]-∅ 2 nilla n- . . .


   3 pillaŋam / killaŋam p- . . . / k- . . .

3 killa / pilla k- . . . / p- . . . DU 1 ŋilla ŋ-i-[]-∅ 2 nilla n- . . . 3 killa / pilla k- . . . / p- . . .


pilla p- . . .

pillaŋi p- . . .

ŋilla ŋ-i-[]-∅ nilla n- . . . pilla p- . . .

ŋillaŋi ŋ-i-[]-ŋi nillaŋi n- . . . pillaŋi p- . . .

/ɾa/ ‘. (36)’ INNER /ɾi/  STEM: /ɻi/ : /∅/ : NFUT (/PRSL) SG

INCL PL/DU

1 ŋiɾim ŋ-i-[]-m 2 t ̪iɾim t ̪- . . . 3 diɾim / kiɾim d- . . . / k- . . . 1+2 t ̪iɾim t ̪-i-[]-m 1 ŋim ŋ-i-[]-m 2 nim n- . . . 3 pim / kim p- . . . / k- . . .

IRR (/FUT) PST SG 1 ŋiɾi ŋ-i-[]-∅ 2 t ̪iɾi t ̪- . . . 3 kiɾi / piɾi k- . . . / p- . . .

PSTIRR

ŋiɾi ŋ-i-[]-∅ t ̪iɾi t ̪- . . . diɾi d- . . .

ŋiɾini ŋ-i-[]-ni t ̪iɾini t ̪- . . . diɾini d- . . .

INCL piɾi t ̪iɾi 1+2 p-i-[]-∅ t -̪ i-[]-∅

t ̪iɾini t ̪-i-[]-ni

PL 1 ŋiɻi ŋ-i-[]-∅ 2 niɻi n- . . . 3 kiɻi / piɻi k- . . . / p- . . .

ŋi ŋ-i-[]-∅ ni n- . . . pi p- . . .

ŋiɻi ŋ-i-[]-∅ niɻi n- . . . piɻi p- . . .

DU 1 ŋiɻi ŋ-i-[]-∅ 2 niɻi n- . . . 3 kiɻi / piɻi k- . . . / p- . . .

ŋi ŋ-i-[]-∅ ni n- . . . pi p- . . .

ŋiɻi ŋ-i-[]-∅ niɻi n- . . . piɻi p- . . .


Acknowledgements This research is funded by the Australian Research Council Centre of Excellence for the Dynamics of Language (Project ID: CE140100041). We are greatly indebted to the people of Wadeye, Australia, who have generously shared their knowledge of Murrinhpatha with us. We also thank Peter Arkadiev and Francesco Gardani for inviting us to present at the workshop which led to this volume, and for their comments on our original submission. Bill Forshaw, Jeff Parker, and an anonymous reviewer also provided insightful comments, as did audience members of the ‘Morphological Complexity’ workshop at Societas Linguistica Europaea (SLE), 2015. We dedicate this chapter to the late Chester Street, whose detailed documentation work revealed the extraordinary complexity of Murrinhpatha verbs.

OUP CORRECTED AUTOPAGE PROOFS – FINAL, 24/8/2020, SPi

4 Overabundance resulting from language contact: Complex cell-mates in Gurindji Kriol
Felicity Meakins and Sasha Wilmoth

4.1 Introduction
One of the oft-claimed results of language contact is the reduction of morphological complexity. For example, syncretism, allomorphic simplification, the difficulty of transferring morphemes, and increased paradigmatic regularity are all observed outcomes of contact-induced change (e.g., McWhorter 1998; Myers-Scotton 2002; Janse & Tol 2003; Gardani 2008). These processes reduce the expression of morphological features, for example case, tense/aspect/mood (TAM), gender, and number; and the complexity of relationships between cells in paradigms expressing these features. In this sense, these changes represent an absolute decrease in the number of morphosyntactic distinctions that a language makes both in terms of the internal structure of words and their arrangement into inflectional classes. This type of morphological complexity has been termed ‘complexity of exponence’ (Anderson 2015a: 20) or ‘E(numerative) complexity’ (Ackerman & Malouf 2013: 433; see also section 1.3.1 in the Introduction to this volume). Such changes can be quantified as a measure of average paradigm entropy, that is, the degree of uncertainty in predicting the content of a particular cell in a paradigm (Ackerman et al. 2009; Ackerman & Malouf 2013; Parker & Sims, Chapter 2, this volume). One area of complexity, which Anderson (2015a: 22) notes as having received less attention in the morphological literature, is variation within the cells of a paradigm, for example ‘dived’ and ‘dove’, which are different word forms of the past tense of {DIVE} in English. Thornton (2011) calls this type of complexity ‘overabundance’. Overabundance refers to multiple forms being realized within the same cell in a paradigm, or lexemes with ‘cell-mates’, as Loporcaro quips (see Loporcaro & Paciaroni 2011: 420 and Loporcaro, Chapter 6, this volume). Thornton observes that variation between cell-mates may be subject to sociolinguistic and syntactic-semantic conditions.
Felicity Meakins and Sasha Wilmoth, Overabundance resulting from language contact: Complex cell-mates in Gurindji Kriol In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Felicity Meakins and Sasha Wilmoth. DOI: 10.1093/oso/9780198861287.003.0004


In this chapter, we demonstrate that overabundance can increase in situations of language contact and, therefore, represent an increase in E-complexity due to the proliferation of exponents, in this case, cell-mates. Perhaps more interestingly, we also suggest that overabundance represents an increase in I(ntegrative) complexity, that is, increased within-cell variation makes it harder for speakers ‘to make accurate guesses about unknown forms of words based on exposure to known forms’ (Ackerman & Malouf 2013: 436; see also section 1.3.2 in the Introduction to this volume). Usually I-complexity refers to how speakers are able to surmise a word form in one cell in a paradigm based on other forms in the same paradigm. In this chapter, we show how overabundance requires speakers to make calculated choices about forms based on features beyond the paradigm. We also show that the I-complexity of overabundance can be measured using generalized linear mixed models (GLMM) which probabilistically measure the use versus non-use of a feature (dependent variable) against semantic, grammatical, and information structure features in a clause (independent variables or predictors) and their interactions, within a cluster of idiolects (random variable) (Pinheiro & Bates 2000; Baayen 2008; Marschner 2011). The relative importance of the predictors can then be determined using dominance analysis (Azen & Traxel 2009). We present a case study of the development of overabundance in the subject-marking system of an Australian mixed language, Gurindji Kriol, and claim that this dimension of complexity is the result of language contact. Furthermore, we assess whether this complexity has stabilized in second-generation child speakers of Gurindji Kriol. This complexification and subsequent stabilization due to contact is reflected experimentally in Berdicevskis & Semenuks (Chapter 11, this volume).
Overabundance in Gurindji Kriol manifests itself as optional case marking and involves variation within a cell, that is, the use or non-use of a case suffix where the grammatical role of the nominal is unaffected by non-use (cf. McGregor & Verstraete 2010). This pattern is shown in sequential clauses in (1) where the subject is marked in the first clause and unmarked in the second clause.¹

(1) Warlaku na bi-ngku bin jeij-im im
    dog  bee-  chase- 3.
    dat mukmuk-Ø bin jeij-im dat karu na
    the owl-  chase- the child
    ‘The bees chased the dog and the owl chased the child.’ (BP: 9yrs: FM13_35_3e: Frog story: 2:10min)

¹ In all examples, Gurindji elements are given in italics, Kriol in plain font and subjects are bolded.


Meakins (2009, 2015) shows that optional subject marking developed as a result of contact between Gurindji and Kriol whereby the Gurindji ergative marker was retained in the process of the formation of the mixed language, Gurindji Kriol, but became optional and was later reanalysed as nominative marking when it also came to mark intransitive subjects. In this respect, overabundance developed in the nominative cell of the case paradigm where an alternation now exists between the forms -ngku/-tu and a zero morph (or nothing, depending on one’s theoretical approach). Variation is driven by a number of semantic, syntactic, and information structure features including transitivity and word order (Meakins 2009; Meakins & O’Shannessy 2010). This optional case marking system requires speakers of Gurindji Kriol to constantly monitor the clause and its place in the discourse to make decisions about whether to overtly express subject marking or not. Thus, in this chapter, we make the case that overabundance in Gurindji Kriol is an example of a contact-induced change, which involves the complexification of an inflectional paradigm rather than its simplification. In particular, we examine the further development of overabundance in subject marking using new data from Gurindji children to determine whether the complexity in the case paradigm has stabilized or whether complexification is ongoing. Changes in overabundance are quantified along two dimensions using different quantitative methods: (i) the change between generations of Gurindji speakers in the contribution of different predictors to the use of subject marking is shown through GLMM (Marschner 2011); and (ii) generational differences in the relative contribution of the different factors are demonstrated using dominance analysis (Azen & Traxel 2009).

4.2 Dimensions and measures of morphological complexity in language contact
Numerous studies have shown instances of the reduction of morphological complexity, particularly in inflectional paradigms, in situations of language contact (see Miestamo et al. 2008 for a recent collection of papers). There are a number of dimensions which can be affected by simplification processes. Fundamentally, languages that have morphology are considered to be more complex than languages which do not, that is, isolating languages (Sapir 1921; Anderson 1992) (see section 1.2 in the Introduction to this volume). Extreme cases of language contact such as creolization have also been shown to have a radically reductive effect on inflectional morphology (see Miestamo et al. 2008 for a recent collection of papers, and Henri, Stump, & Tribout, Chapter 5, this volume, and McWhorter, Chapter 10, this volume, for further discussions).² Similarly, inflectional morphology is rarely borrowed or switched into the grammatical frame of another language (Myers-Scotton 2002; Aikhenvald & Dixon 2006; Matras & Sakel 2007; Gardani 2008). Where inflectional morphology remains in situations of language contact, different dimensions of complexity are affected. In particular, what Anderson (2015a: 20) terms the ‘complexity of exponence’ or Ackerman & Malouf (2013: 433) call ‘E(numerative) complexity’ often undergoes reduction. For example, syncretism, allomorphic simplification, and increased paradigmatic regularity are all observed outcomes of contact-induced change and language obsolescence (Dorian 1978; Gal 1989; Janse & Tol 2003). All of these processes reduce the exponence of morphological features such as case, TAM, gender, and number, and the complexity of relationships between cells within paradigms expressing these features. At the extreme end, these features gather up their morphological skirts and step out of paradigms and into periphrastic constructions, thereby transforming from synthetic forms into analytic forms (see de Groot’s 2008 study of Hungarian in contact for a recent example). Paradigmatic complexity can be measured as ‘entropy’, which captures the degree of predictability of forms in a paradigm (Ackerman et al. 2009; Ackerman & Malouf 2013). Entropy has been used to measure the relative complexity of different languages (see also Stump & Finkel 2016 for related work); however, it can also be used to measure changes in complexity across time within the same language (see Mansfield and Nordlinger, Chapter 3, this volume, for a case study of Murrinhpatha).
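The idea of entropy as uncertainty about the content of a cell can be pictured with a small calculation. The sketch below is only an illustration and not the entropy measure of Ackerman & Malouf: it assumes that the child-speaker counts of unmarked and marked subjects reported in Table 4.6 can be treated as the two variants of a single nominative cell, and the function names are hypothetical. It compares the uncertainty of that cell with an obligatory system, and with the same cell once transitivity is known.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def conditional_entropy(groups):
    """Entropy of the variants once the conditioning group is known."""
    grand_total = sum(sum(counts) for counts in groups.values())
    return sum(sum(counts) / grand_total * entropy(counts)
               for counts in groups.values())

# (unmarked, marked) subject counts for child speakers, copied from Table 4.6.
ALL_SUBJECTS = (1692, 1283)
BY_TRANSITIVITY = {"intransitive": (1194, 653), "transitive": (498, 630)}

# Assumption for contrast: a hypothetical obligatory system marks every subject.
OBLIGATORY = (0, 2975)

h_cell = entropy(ALL_SUBJECTS)
h_given_transitivity = conditional_entropy(BY_TRANSITIVITY)
h_obligatory = entropy(OBLIGATORY)

print(f"two-variant cell:      {h_cell:.3f} bits")
print(f"given transitivity:    {h_given_transitivity:.3f} bits")
print(f"obligatory marking:    {h_obligatory:.3f} bits")
```

Under these assumptions the overabundant cell carries close to one bit of uncertainty, an obligatory cell carries none, and knowing a clause-level feature such as transitivity lowers the uncertainty only slightly.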
As Anderson (2015a: 22) has noted, a dimension of complexity which has received less attention in the morphological literature is variation within the cells of a paradigm, for example the ‘dived’ and ‘dove’ examples given in section 4.1, and many more examples of co-existing regular and irregular past tense and plural forms in English. Thornton (2011) calls the exponence of multiple forms in the same cell in a paradigm ‘overabundance’. Overabundance (which can be thought of as morphological ‘cell-mates’) is defined as ‘a cell in a paradigm . . . filled by two or more synonymous forms which realize the same set of morpho-syntactic properties’ (Thornton 2011: 2). She uses the Italian verb paradigm to demonstrate how variation between forms is motivated by different phonological and syntactic-semantic conditions. Thornton’s examples of overabundance mostly involve cases of language change and the regularization of inflectional paradigms. In this scenario, an irregular form co-exists with a newer regularized form. Processes of regularization are one source of variants. We argue that contact with another language provides another source of variants. It is common for multiple forms from different languages to co-exist with their use determined by other features in the clause. To give another example from English, possession is expressed by the s-genitive ( Priming > Co-referential pronoun.

² Although, see a number of surveys (Plag 2003a, 2003b; Roberts & Bresnan 2008) and countersurveys (DeGraff 2005; Parkvall 2008; Bakker et al. 2011; Henri & Kihm 2015) in response to this claim.


Table 4.5. Relative effect of the signiﬁcant predictors according to dominance analysis Additional contribution of fixed effects Subset model

R2

Transitive SV order (X2) (X1)

Coreferential Priming (X3) (X4)

Actualized (X5)

k = 0 average X1 .199 X2 .028 X3 .054 .092 X4 X5 .000 k = 1 average

.199 .030 .048 .088 .009 .044

.092 .195 .036 .054 .000 .071

.000 .208 .028 .055 .092 .096

X1X2 .229 X1X3 .248 .287 X1X4 X1X5 .208 .065 X2X3 .128 X2X4 .028 X2X5 .146 X3X4 .055 X3X5 .092 X4X5 k = 2 average .262 X1X2X3 .322 X1X2X4 .237 X1X2X5 .332 X1X3X4 .259 X1X3X5 .296 X1X3X5 .163 X2X4X4 .066 X2X3X5 .128 X2X4X5 .147 X3X4X5 k = 3 average X1X2X3X4 .342 X1X2X3X5 .272 X1X2X4X5 .332 X1X3X4X5 .342 X2X3X4X5 .163 k = 4 average X1X2X3X4 .358 X5 Overall average

.197 .194 .209 .186 .204 .204 .199 .179 .206 .204 .185 .194 .195 .195 -

.085b .014 .035 .029 .017 .011 .036 .024 .010 .013 .036 .016 .019 .016 .016 -

.054 .194 .011 .092 .001 .075 .033 .045 .051 .035 .038 .055 .043 .020 .035 .046 .035 .034 .026 .026 -

.093 .084 .088 .098 .100 .055 .086 .080 .095 .083 .097 .089 .086 .086 -

.008 .011 .009 .001 .000 .001 .005 .010 .010 .010 .000 .008 .016 .016 -

.166

.034

.046

.085

.025

a b

M

(X2 X3) - X3 (X1 + X3 + X4 + X5) - 4

.028 .201 .037a .100 .000
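The logic behind a dominance table of this kind can be sketched in a few lines: the additional contribution of a predictor to a subset model is the gain in R² when that predictor is added, and a general dominance weight averages those gains within each subset size and then across sizes. The sketch below is a generic illustration of that procedure and not the authors' own computation. It uses the subset R² values listed in Table 4.5, and it assumes that the second row labelled X1X3X5 (.296) is X1X4X5 and that the row labelled X2X4X4 (.163) is X2X3X4. The resulting weights are not guaranteed to match the averages printed in the table, which may rest on rounding or averaging conventions that cannot be recovered from the text.

```python
from itertools import combinations

PREDICTORS = ["X1", "X2", "X3", "X4", "X5"]  # Transitive, SV order, Coreferential, Priming, Actualized

# R² per subset model as listed in Table 4.5 (two row labels corrected by assumption).
R2_TABLE = {
    "": 0.000,
    "X1": 0.199, "X2": 0.028, "X3": 0.054, "X4": 0.092, "X5": 0.000,
    "X1X2": 0.229, "X1X3": 0.248, "X1X4": 0.287, "X1X5": 0.208, "X2X3": 0.065,
    "X2X4": 0.128, "X2X5": 0.028, "X3X4": 0.146, "X3X5": 0.055, "X4X5": 0.092,
    "X1X2X3": 0.262, "X1X2X4": 0.322, "X1X2X5": 0.237, "X1X3X4": 0.332,
    "X1X3X5": 0.259, "X1X4X5": 0.296, "X2X3X4": 0.163, "X2X3X5": 0.066,
    "X2X4X5": 0.128, "X3X4X5": 0.147,
    "X1X2X3X4": 0.342, "X1X2X3X5": 0.272, "X1X2X4X5": 0.332,
    "X1X3X4X5": 0.342, "X2X3X4X5": 0.163,
    "X1X2X3X4X5": 0.358,
}

def r2(subset):
    """Look up the R² of the model containing exactly the given predictors."""
    return R2_TABLE["".join(sorted(subset))]

def general_dominance(target):
    """Average gain in R² from adding `target`, averaged within then across subset sizes."""
    others = [p for p in PREDICTORS if p != target]
    level_means = []
    for k in range(len(others) + 1):
        gains = [r2(set(s) | {target}) - r2(s) for s in combinations(others, k)]
        level_means.append(sum(gains) / len(gains))
    return sum(level_means) / len(level_means)

weights = {p: general_dominance(p) for p in PREDICTORS}
ranking = sorted(weights, key=weights.get, reverse=True)

for name in ranking:
    print(f"{name}: {weights[name]:.3f}")
```

Computed this way, the weights add up to the R² of the full model, and the ordering of the three predictors shared by adults and children (X1, X4, X3) agrees with the hierarchy Transitive > Priming > Co-referential pronoun.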

Table 4.6. Occurrence of subject marking in child Gurindji Kriol speakers according to predictors

Predictor     Value   NOM no   NOM yes   %
Transitive    no      1194     653       35
              yes     498      630       56
SV Order      VS      101      71        41
              SV      1591     1212      43
Animate       A       1615     1236      43
              I       77       47        38
Priming       no      1288     576       31
              yes     404      707       64
Actualized    no      1489     1120      43
              yes     203      163       45
Corefer       no      1054     658       38
              yes     638      625       49
TOTAL                 1692     1283      43


Table 4.7. Output of generalized linear mixed model analysis on 2,975 tokens

Random effects   Name          Variance   Std. Dev.
Speaker          (Intercept)   0.7514     0.8668

Analysis conducted on 2,975 grammatical subjects, fifty-three speakers

Fixed effects    Estimate   Std. Error   z value   p value
(Intercept)      1.45894    0.24122      6.048     < 0.001
Transitive       1.01318    0.08926      11.351    < 0.001
SV order         0.23995    0.19073      1.258     0.20838
Animate          0.31779    0.21602      1.471     0.14125
Co-referential   0.31724    0.09360      3.389     < 0.001
Primed           0.98983    0.09040      10.949    < 0.001
Actualized       0.18714    0.13185      1.419     0.15581
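For readers less used to GLMM output, the columns of Table 4.7 can be related to one another with a short calculation. The sketch below is not the authors' analysis script. It assumes the usual Wald procedure for a logistic mixed model, in which the z value is the estimate divided by its standard error, the p value is the two-sided tail of the standard normal distribution, and exponentiating an estimate gives an odds ratio. The estimates and standard errors are copied from the table; the intercept is left out because an odds-ratio reading applies to the predictors.

```python
import math

# (estimate, standard error) per predictor, copied from Table 4.7.
FIXED_EFFECTS = {
    "Transitive": (1.01318, 0.08926),
    "SV order": (0.23995, 0.19073),
    "Animate": (0.31779, 0.21602),
    "Co-referential": (0.31724, 0.09360),
    "Primed": (0.98983, 0.09040),
    "Actualized": (0.18714, 0.13185),
}

def wald_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided p value."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p

def odds_ratio(estimate):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(estimate)

summary = {
    name: (*wald_test(est, se), odds_ratio(est))
    for name, (est, se) in FIXED_EFFECTS.items()
}

for name, (z, p, ratio) in summary.items():
    print(f"{name:15s} z = {z:6.3f}  p = {p:.5f}  odds ratio = {ratio:.2f}")
```

On this reading, a transitive subject has a little under three times the odds of carrying nominative marking compared with an intransitive one, other predictors being equal, while the coefficients for SV order, animacy, and actualization cannot be distinguished from zero in this data set.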

4.4.4 Discussion
The overall question posed by this chapter is whether a change in the complexity in the expression of subject marking has occurred across two generations of Gurindji Kriol speakers. This question is set against the backdrop of broader theoretical questions about how to measure complexity in cases of overabundance, and whether all language contact leads to simplification. The combination of these broader questions allows us to determine whether changes have taken place in subject marking in Gurindji Kriol, and why these changes might have occurred. The question of whether there has been a change in complexity of subject marking was modelled using GLMM analysis. The results show three predictors in common for adults and children. First, transitive subjects such as (8) are significantly more likely to be marked than intransitive subjects such as (9). Second, the marking of a nominal subject primes the appearance of the nominative on the next nominal subject. An example is given in (10) of sequential clauses containing nominal subjects with overt nominative marking. Third, subject marking is more likely when a co-referential pronoun is present, as shown in (11) in comparison with (12) which does not have a co-referential pronoun.

(8) Warlaku-ngku bait-im marluka leg-ta
    dog- bite- old.man leg-
    ‘The dog bites the old man on the leg.’ (SS: FHM051: 1:37min)

(9) Dat warlaku bin kutij nyantu-ranyj
    the dog  stand 3-
    ‘The dog stood on its own.’ (CE: FHM014: 2:24min)


Table 4.8. Relative effect of the signiﬁcant predictors according to dominance analysis Additional contribution of fixed effects Subset model

R2

M

k = 0 average X1 .047 X2 .000 X3 .006 X4 .048 X5 .000 k = 1 average X1X2 .047 X1X3 .047 X1X4 .104 X1X5 .047 X2X3 .007 X2X4 .049 X2X5 .000 X3X4 .056 X3X5 .006 X4X5 .049 k = 2 average X1X2X3 .053 X1X2X4 .104 X1X2X5 .047 X1X3X4 .111 X1X3X5 .052 X1X3X5 .105 X2X4X4 .057 X2X3X5 .058 X2X4X5 .049 X3X4X5 .056 k = 3 average X1X2X3X4 .112 X1X2X3X5 .070 X1X2X4X5 .105 X1X3X4X5 .112 X2X3X4X5 .058 k = 4 average X1X2X3X4 .113 X5 Overall average

Transitive SV order (X2) (X1)

Coreferential Priming (X3) (X4)

Actualized (X5)

.047 .047 .041 .056 .047 .048 .046 .055 .047 .055 .046 .056 .051 .055 .012 .056 .056 .045 .055 .055 -

.000 .047 .007 .046 .000 .025 .006 .000 .000 .002 .052 .000 .010 .001 .018 .000 .002 .005 .001 .001 -

.006 .041 .001 .050 .000 .023 .006 .007 .005 .008 .058 .007 .015 .008 .023 .007 .009 .012 .008 .008 -

.048 .056 .001 .008 .001 .017 .057 .064 .058 .050 .049 .050 .055 .059 .058 .060 .000 .044 .043 .043 -

.000 .047 .000 .006 .049 .026 .000 .005 .001 .051 .000 .000 .010 .017 .001 .001 .001 .005 .001 .001 -

.049

.008

.013

.041

.008


(10) Najan kujarra-ngku dei bin gon jeij-im im
     another two- 3.  go chase- 3.
     ankaj yapakayi najan kujarra-ngku na rarraj
     poor.thing small another two-  run
     ‘Another two went chasing the poor little thing. Two more run then.’ (RR: FM009.A: 6:11min)

(11) Jintaku warlaku-ngku i bin bait-im im
     one dog- 3.  bite- 3.
     marluka la leg-ta
     man  leg-
     ‘One dog bit a man on the leg.’ (AC: FHM052: 1:58min)

(12) Dat warlaku bin bait-im im leg-ta dat marluka
     the dog  bite- 3. leg- the man
     ‘The dog bit the man on the leg.’ (SS: FHM065: 4:53min)

Adults had two more significant variables than children which predicted subject marking: word order, that is, Gurindji Kriol-speaking adults are more likely to mark subjects when they occur after the verb, as shown in (13) as opposed to (14); and event actualization, that is, events that were not actualized were less likely to be marked, as demonstrated in (15), which uses the potential auxiliary, and (16), which has a verb marked continuative.

(13) I=m put-im jumok tebul-ta igin dat kajirri-ngku
     3.= put- smoke table- too the woman-
     ‘The woman puts the smokes on the table.’ (LS: FHM066: 0:19min)

(14) Dat kajirri i=m put-im jumok jiya-ngka
     the woman 3.= put- smoke chair-
     ‘The woman puts the smokes on the chair.’ (CA: FHM127: 2:24min)

(15) Dat karu-ma mirlarrang-jawung i garra jarrwaj
     The child- spear- 3.  spear
     im jamut
     3. turkey
     ‘The child will shoot the turkey with a spear.’ (RR: FHM061: 3:10min)

(16) Dat warlaku i bin hard-im-bat-karra nyanuny wartan
     the dog 3.  hurt--- 3. paw
     ‘The dog hurt his paw.’ (DO: FM15_55_1b: 1:42min)


In terms of E-complexity, both the adult system and the child system display overabundance, while traditional Gurindji uses subject marking obligatorily. Nonetheless, the adult Gurindji Kriol system requires attention to a greater number of variables to make decisions about the application of subject marking. Thus the subject marking system seems to have complexified, in the sense of I-complexity, at the point of contact with the genesis of the mixed language (represented in the adult speech), then simplified in the next generation. The child system seems to be a refined version of the adult system. Of the three variables in common, the relative predictive power of variables is the same: transitivity > priming > use of co-referential pronoun. For two of those predictors, priming and co-referential pronoun, subject-marking usage seems stable across the generations. For adults, 60% of primed subjects are marked compared with 28% of unprimed subjects; for children, 64% of primed subjects are marked compared with 31% of unprimed subjects. Similarly, for adults, 48% of subjects with co-referential pronouns are marked compared with 28% of subjects without co-referential pronouns; for children, 49% of subjects with co-referential pronouns are marked compared with 28% of subjects without co-referential pronouns. Thus the influence of priming and the use of co-referential pronoun seem quite stable diachronically. On the other hand, transitivity, which is the strongest predictor of subject marking for both adults and children, shows larger differences across the generations: for adults, 59% of transitive subjects are marked compared with 16% of intransitive subjects; for children, 56% of transitive subjects are marked compared with 35% of intransitive subjects. We argue that differences in the importance of transitivity, coupled with the loss of SV order as a predictor of subject marking in the children’s speech, are the results of decreasing contact with Gurindji.
First, the subject marking in Gurindji Kriol finds its origins in the Gurindji ergative marker, which marked only transitive subjects. Many members of the first generation of Gurindji Kriol speakers only used subject marking for transitive subjects, although it was clearly beginning to spread to intransitive subjects. For child speakers of Gurindji Kriol, the marking of intransitive subjects is much more entrenched, suggesting that the original influence of the Gurindji ergative pattern is waning. Second, the loss of SV order as a significant variable reinforces the argument that there is decreasing contact with the Gurindji system. In general, SV order is more dominant for child speakers (only 5% of transitive clauses show VS order compared with 12% for adult speakers), reflecting the Kriol system of argument disambiguation. For adult speakers, ergative marking is more likely in VS clauses, which reflects the continuing interplay of the Gurindji and Kriol systems of argument disambiguation. This influence has been lost in child speakers.
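The generational comparison for transitivity and priming can be restated as a simple difference in percentage points between the two levels of each predictor. The sketch below is only an arithmetic illustration: the child figures are recomputed from the counts in Table 4.6, the adult figures are the percentages quoted in the discussion, and the names used in the code are hypothetical.

```python
# (unmarked, marked) subject counts for child speakers, copied from Table 4.6.
CHILD_COUNTS = {
    "transitivity": {"absent": (1194, 653), "present": (498, 630)},
    "priming": {"absent": (1288, 576), "present": (404, 707)},
}

# Adult percentages of marked subjects, as quoted in the discussion.
ADULT_PERCENT = {
    "transitivity": {"absent": 16.0, "present": 59.0},
    "priming": {"absent": 28.0, "present": 60.0},
}

def percent_marked(unmarked, marked):
    """Share of subjects carrying nominative marking, in percent."""
    return 100.0 * marked / (unmarked + marked)

def contrast(rates):
    """Percentage-point gap between the two levels of a predictor."""
    return rates["present"] - rates["absent"]

child_percent = {
    predictor: {level: percent_marked(*counts) for level, counts in levels.items()}
    for predictor, levels in CHILD_COUNTS.items()
}

gaps = {
    predictor: (contrast(ADULT_PERCENT[predictor]), contrast(child_percent[predictor]))
    for predictor in CHILD_COUNTS
}

for predictor, (adult_gap, child_gap) in gaps.items():
    print(f"{predictor:13s} adults: {adult_gap:4.1f} points   children: {child_gap:4.1f} points")
```

Read this way, the gap associated with priming is nearly identical in the two generations, whereas the gap associated with transitivity is roughly halved for the children, which is the pattern the discussion attributes to the spread of marking to intransitive subjects.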


4.5 Concluding remarks
This study has shown that complexification occurred in the area of subject marking in Gurindji Kriol in the intense contact period which saw its genesis. Subject marking was borrowed from Gurindji where it transformed from obligatory to variable marking, leading to a situation of overabundance, that is, a proliferation of cell-mates (E-complexity). Overabundance required speakers to monitor other linguistic features in the clause and discourse more broadly (transitivity, SV order, the marking of the previous nominal subject, the presence of a co-referential pronoun, and event actualization) rather than just the phonological composition of the stem, as is the case in Gurindji (I-complexity). Another generation on, only three of these variables are now relevant: transitivity, the presence of a co-referential pronoun, and priming. We argue that changes in the relative importance of transitivity and SV order in the children’s speech, and therefore simplification in the exponence of overabundance, are the result of decreasing contact with Gurindji. This chapter demonstrates that language contact does not always lead to the simplification of morphology, and in the case of overabundance, complexity, that is, the degree of variation in the expression of a form within the cell of a paradigm, can be a result of language contact. In the situation outlined by this chapter, the intense contact between Gurindji and Kriol argument marking systems which led to the formation of Gurindji Kriol also saw the development of a system of subject marking which was derived from Gurindji but was more complex than the obligatory marking system of Gurindji. The new generation of Gurindji Kriol speakers has less access to Gurindji, that is, there are fewer speakers of Gurindji in their linguistic environment and they have had fewer years of exposure to Gurindji than the adult speakers.
The result has been a simpliﬁcation of overabundance where the system is no longer an interplay between the Gurindji and Kriol systems of argument disambiguation (i.e., SV order no longer predicts subject marking), and there is an increase in the marking of intransitive subjects, which is far removed from the function of the original Gurindji ergative marker.

Acknowledgements
The data collection (see section 4.4.1) was funded by the Aboriginal Child Language (ACLA) project from 2004 to 2007, the Jaminjungan and Eastern Ngumpin DoBeS project from 2007 to 2008 (available in the DoBeS archives: http://dobes.mpi.nl/projects/jaminjung/), a Hans Rausing Endangered Languages Project from 2008 to 2010 (IPF0134; available in the ELAP archive: http://elar.soas.ac.uk/deposit/0273), an Australian Research Council APD project from 2009 to 2012 (DP0985024); and an Australian Research Council DECRA project from 2014 to 2017 (DE140100854). As well as Cassandra Algy, a number of language consultants were instrumental in the collection of data: Samantha, Lisa, Rosie & Leanne Smiler, Cecelia Edwards, and Ronaleen & Anne-Marie Reynolds. We are also grateful for the support of Appen, in particular to Simon Hammond for technical support.


5 Derivation and the morphological complexity of three French-based creoles
Fabiola Henri, Gregory Stump, and Delphine Tribout

5.1 Introduction
The claim of creole simplicity is pervasive in linguistics. This claim harks back to the nineteenth-century view that linguistic complexity correlates with the properties of a language’s inflectional morphology and with its age (DeGraff 2001). According to this view, isolating languages are ‘primitive’ in comparison with synthetic languages, whose morphology is taken as evidence of heightened complexity. Modern creolistic literature abounds with such assumptions. Creoles are seen as newborn languages that emerge from rudimentary pidgins embodying a break in the transmission of the lexifier. As such, they constitute a kind of transition between primitive pidgin ‘protolanguages’ and mature languages (Bickerton 1981). Complementing this view of creoles as ‘young’ languages are comparisons with ‘complex’ languages that purportedly reveal creoles to be ‘the world’s simplest grammars’ on the grounds that they exhibit no, or at most, insignificant vestiges of the lexifier’s system of inflectional marking (Seuren & Wekker 1986; Bickerton 1988; McWhorter 2001; Parkvall 2008; Bakker 2014; among others). As has been argued elsewhere (DeGraff 2001; Mufwene 2008; Blasi et al. 2017), these assertions rest upon several controversial assumptions that may be questioned on empirical, theoretical, and sociohistorical grounds. In the domain of morphology, for example, the received view that creoles are maximally isolating has been decisively disconfirmed by unequivocal evidence of inflectional morphology in many creoles (Kihm 1994; DeGraff 2001; Bakker 2003; Baptista 2003a, 2003b; Roberts & Bresnan 2008; among others). It is true that a creole may exhibit less morphology than its lexifier,¹ but does this entail that it is less complex?

¹ Studies relating to the morphological complexity of creoles usually rely on comparisons with the lexifiers rather than with the contributing substrates. A combination of factors has given rise to this preference. First, the formation of a creole usually involves one contributing lexifier, but may involve several substrates whose contributions to the creole’s formation are hard to evaluate in terms of proportion. In the absence of adequate historical documentation, we cannot always attribute particular contributions to particular substrate languages. Even so, we can definitely affirm that the substrates of

Fabiola Henri, Gregory Stump, and Delphine Tribout, Derivation and the morphological complexity of three French-based creoles In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Fabiola Henri, Gregory Stump, and Delphine Tribout. DOI: 10.1093/oso/9780198861287.003.0005

OUP CORRECTED AUTOPAGE PROOFS – FINAL, 24/8/2020, SPi

106

 ,  ,   

Morphological complexity is often equated with numerousness—of morphs, categories, processes, or paradigm cells; but this is not the only way of measuring complexity, nor is it in general the most enlightening way (Ackerman & Malouf 2013; Stump 2017). In this chapter, we draw upon an alternative conception of morphological complexity which we apply to a language’s system of derivational morphology. Drawing on a precise analysis of their deverbal derivation, we argue that three French-based creoles (Mauritian, Guadeloupean, and Haitian) display an unexpected degree of morphological complexity. We detail our conception of morphological complexity in section 5.2, and in section 5.3, we discuss the issue of creole simplicity. In section 5.4, we examine the morphology of French (the lexiﬁer language of these creoles) and that of the creoles themselves. In section 5.5, we deﬁne our theoretical framework. Finally, section 5.6 presents our new analysis of deverbal nominalizations in Mauritian, Guadeloupean, and Haitian.

5.2 Morphological complexity Various perspectives have informed recent discussions of the notion of linguistic complexity (Dahl 2004; Hawkins 2004; Miestamo et al. 2008; Sampson et al. 2009; Newmeyer & Preston 2014; and Baerman et al. 2015a). On the one hand, the complexity of a linguistic phenomenon may be seen in psycholinguistic terms as the extent of the difﬁculties that it poses for a language’s learners and users. On the other hand, complexity may be seen in more absolute terms as an independently measurable property of the language system itself, separable, in principle, from issues of acquisition, production, and processing (though no doubt correlated with them in discoverable ways). Moreover, linguistic complexity is logically of at least two types (Ackerman & Malouf 2013): a linguistic phenomenon’s enumerative complexity depends on how many categories (of whatever type) it employs; its integrative complexity, by contrast, depends on the idiosyncrasy of the interactions among those categories. A language’s morphology can exhibit complexity in a variety of ways. The most intensively studied kinds of complexity involve either the morphotactics of individual word forms (whose enumerative complexity is a function of degree of synthesis and degree of fusion; Schlegel 1808; Humboldt 1836; Sapir 1921; Greenberg 1960; Bickel & Nichols 2013) or the structure of whole inﬂectional paradigms (whose integrative complexity is a function of the predictability of a paradigm’s word forms; Moscoso del Prado Martín et al. 2004; Ackerman et al. 2009; Milin et al. 2009; Ackerman & Malouf

Caribbean creoles differ from those of Indian Ocean creoles. Moreover, creolistics has a history of Eurocentrism, which has favoured the comparison of creole grammars with the more familiar grammars of their Indo-European lexiﬁers.

OUP CORRECTED AUTOPAGE PROOFS – FINAL, 24/8/2020, SPi

    

107

2013; Stump & Finkel 2013). But a language’s morphology may exhibit other kinds of complexity as well.² Here, we are concerned with the integrative complexity of a language’s morphology as reﬂected by the interaction of a lexeme’s inventory of forms with its participation in deverbal derivation. In general, a language’s derivational morphology may exhibit complexity in two different dimensions. In order to distinguish these, it is useful to distinguish not only between a derivational relation’s  and  , but also between the relation’s  and  —the speciﬁc stems of the base and derived lexemes whose morphology participates in the formal expression of their derivational relation. Thus, the derivational relation of the base lexeme  to the derived lexeme  is formally expressed by means of the relation of the base stem thiev- to the derived stem thievish. Given these distinctions, the ﬁrst dimension of a derivational relation’s complexity is that of the predictability of the base lexeme’s base stem; the second dimension is that of a base stem’s restrictedness in the morphology of the base lexeme. Consider ﬁrst the dimension of base-stem predictability. In discussing this dimension, we make the uncontroversial assumption (Aronoff 1994; Stump 2001) that a lexeme L has a   whose members serve in the deﬁnition of both (i) the inﬂected word forms constituting L’s inﬂectional paradigm; and (ii) the stem sets of lexemes derived from L. In general, we assume that a lexeme’s stem set may include both free and bound stems. On this assumption, the complexity of a particular derivational relation depends on which member of the base lexeme’s stem set is its base stem in that relation. In the simplest cases—those whose complexity is of degree 0—the base stem for a base lexeme L in a particular derivational relation is the only member of L’s stem set. 
From this endpoint of maximal simplicity, successively greater degrees of complexity can be calibrated. In cases of derivation exhibiting complexity of degree 1 or 2, the base lexeme in a particular derivational relation possesses more than one stem, only one of which serves as its base stem in that relation. In cases exhibiting complexity of degree 0 or 1, the base lexeme’s base stem is predictable; in cases exhibiting complexity of degree 2, the base lexeme’s base stem is unpredictable. Thus, instances of derivation may evince three degrees of increasing complexity, as in Figure 5.1.

Degree | Cardinality of base lexeme’s stem set | Is base lexeme’s base stem in R predictable? | Example
0 (low complexity) | 1 | yes | boy → boyish
1 | >1 | yes | man (~ men) → mannish; goose (~ geese) → goosish
2 (high complexity) | >1 | no | self ~ selve(s) → selfish BUT thief ~ thieve(s) → thievish

Figure 5.1. Degrees of complexity in the predictability of a base lexeme’s base stem in a particular derivational relation R

This first notion of complexity calls to mind those approaches to complexity based on information theory (Arkadiev & Gardani, Chapter 1, this volume); in such approaches, complexity arises from a lack of predictability among a system’s parts. In assessing complexity of this sort in a system of inflection classes, the parts at issue are an inflectional paradigm’s cells (cf. Parker & Sims, Chapter 2, this volume); here, by contrast, the parts at issue are those members of a base lexeme’s stem inventory available for the definition of a derived lexeme’s stem inventory. By this criterion, the derivational relation between boy and boyish is least complex, since the stem on which boy-ish is based is the only available choice, the sole stem of boy; the derivational relation between man and mannish (or between goose and goosish) is more complex, since the stem on which mann-ish (or goos-ish) is based is not the only available choice, though it does conform to a general pattern favouring the use of the singular form’s stem; and the relation between thief and thievish is most complex, since the stem on which thiev-ish is based is not the only available choice and actually fails to conform to the general pattern favouring the use of the singular form’s stem.

² See Stump (2017) for a discussion of the wide range of possible measures of morphological complexity.

The second dimension of a derivational relation’s integrative complexity is that of base-stem restrictedness. Where X is the particular member of a lexeme L’s stem set that serves as L’s base stem in a particular derivational relation, how restricted a role does X play in the morphology of L? In the simplest cases (e.g., that of English grass → grassy), X is L’s only stem and therefore has an unrestricted role in the morphology of L. In more complex cases (e.g., that of English leaf [~ leave(s)] → leafy), a base lexeme L’s base stem in a particular derivational relation is only used in the realization of certain cells in L’s inflectional paradigm, so that its role in L’s inflectional morphology is restricted according to the morphosyntactic property set to be realized. In the most complex cases (e.g., that of English louse /laʊs/ → lousy /laʊzi/), a base lexeme L’s base stem is ‘hidden’ to the extent that it has no role at all in the inflection of L but is reserved for defining the stems of some or all lexemes deriving from L. This second dimension of complexity is schematized in Figure 5.2, where we again distinguish three degrees of complexity.
Degree | Role of X in L’s morphology | Example
0 (low complexity) | Unrestricted because X is L’s sole stem | grass → grassy
1 | In the inflection of L, X is restricted to the realization of certain morphosyntactic property sets | leaf [~ leave(s)] → leafy
2 (high complexity) | X is not used in the inflection of L, but is restricted to the definition of stems of derivatives of L | louse /laʊs/ → lousy /laʊzi/

Figure 5.2. Degrees of complexity in the restrictedness of stem X in the morphology of lexeme L, where X serves as L’s base stem in a particular derivational relation

This second notion of complexity is qualitative in the sense that it equates complexity with deviation from a canonical ideal (cf. Nichols, Chapter 7, this volume); specifically, it equates complexity with deviation from a canonical pattern in which the stem that defines a derived lexeme’s form also defines the base lexeme’s inflected forms. By this criterion, the derivational relation between grass and grassy is least complex, since the stem on which grass-y is based is employed in both inflected forms of grass; the derivational relation between leaf and leafy is more complex, since the stem on which leaf-y is based is only employed in one of the inflected forms of leaf; and the relation between louse and lousy is most complex, since the stem on which lous-y is based isn’t employed in either of the inflected forms of louse.
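The two three-way scales described in this section can be restated as a pair of small decision procedures. The following Python sketch is only an illustration under stated assumptions: the function names, the parameter names, and the boolean `follows_default` (standing in for "the base stem conforms to the general pattern") are hypothetical and are not part of the chapter's formal framework. Cases that the two figures do not cover (for instance, a base stem used in every inflected form of a lexeme that also has a further stem) are simply assigned degree 1 here.

```python
def predictability_degree(stem_set, follows_default):
    """Degree on the base-stem predictability scale (cf. Figure 5.1).

    stem_set: all stems of the base lexeme.
    follows_default: hypothetical flag, True when the base stem is the one
    that the general pattern would lead one to expect.
    """
    if len(stem_set) == 1:
        return 0  # sole stem: there is nothing to predict
    return 1 if follows_default else 2


def restrictedness_degree(stem_set, inflectional_stems, base_stem):
    """Degree on the base-stem restrictedness scale (cf. Figure 5.2).

    inflectional_stems: the stems used in the base lexeme's inflected forms.
    base_stem: the member of stem_set used in the derivational relation.
    """
    if base_stem not in stem_set:
        raise ValueError("base_stem must belong to stem_set")
    if len(stem_set) == 1:
        return 0  # sole stem: unrestricted role
    # Used in (part of) the inflection: degree 1; otherwise 'hidden': degree 2.
    return 1 if base_stem in inflectional_stems else 2


# The chapter's English examples, encoded with informal stem spellings.
boy_boyish = predictability_degree({"boy"}, follows_default=True)
man_mannish = predictability_degree({"man", "men"}, follows_default=True)
thief_thievish = predictability_degree({"thief", "thiev"}, follows_default=False)

grass_grassy = restrictedness_degree({"grass"}, {"grass"}, "grass")
leaf_leafy = restrictedness_degree({"leaf", "leav"}, {"leaf", "leav"}, "leaf")
louse_lousy = restrictedness_degree({"laʊs", "laɪs", "laʊz"}, {"laʊs", "laɪs"}, "laʊz")

if __name__ == "__main__":
    print(boy_boyish, man_mannish, thief_thievish)
    print(grass_grassy, leaf_leafy, louse_lousy)
```

Keeping the two scales as separate functions mirrors the point that they are independent dimensions: a relation such as thief → thievish is of degree 2 on the first scale even though its base stem, being used in the plural, would only count as degree 1 on the second.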

5.3 Creole simplicity

According to Seuren (1998: 292–3), ‘if a language has a Creole origin it is SVO, has TMA particles, [and] has virtually no morphology’. Claims of this kind reflect an ideology about creoles that finds its origin in the eighteenth century, when creoles were described as ‘corrupt’ and ‘deficient’ compared to exemplary grammars such as that of Latin. These deficiencies were presumed to result from the inability of Africans to acquire the grammatical intricacies of European languages (Bertrand-Bocandé 1849; Baissac 1880; see also Meijer & Muysken 1977 for discussion). With the advent of generative grammar, Bickerton (1981) formulated the Language Bioprogram Hypothesis, a theory that sees the process of creolization as the complexification of a pidgin that creole children are exposed to. A pidgin, according to Bickerton, is an unstable form of communication that results from a simplification of the lexifier language by adults during the process of second-language acquisition. The contact languages emerging from this sort of process come closest to revealing Universal Grammar in its naked form, embodying ‘the world’s simplest grammars’ (McWhorter 2001).³

³ Although McWhorter’s (2001) claim is about creoles, both pidgins and creoles are generally characterized as simple languages (Romaine 1988). Bickerton’s (1988) hypothesis, however, ranks pidgins as the simpler of the two, since pidgins are not systematic. On his view, it is as an effect of UG that a pidgin is creolized. Research has cast doubt on this generalization. Rich inflection can be found in pidgins, even more so than in some creoles (Bakker 2003). If a creole develops through the nativization of a pidgin, as the Language Bioprogram Hypothesis holds, we would expect the creole to be more complex than the pidgin from which it develops.

Simplification, as evoked in creole studies, is often associated with morphology, particularly inflectional morphology: a creole is identified as a type of language that exhibits semantically regular derivational affixation but no inflectional affixation (McWhorter 1998). More generally, McWhorter (1998, 2001, 2011, among others) claims that simplification of inflectional morphology is an effect of a ‘break in transmission’. In Chapter 10, this volume, McWhorter elaborates on the hypothesis that ‘radical analyticity’ in creoles, Sinitic, Niger-Congo, and some Austronesian languages stems from the drastic elimination of inflection, in particular contextual inflection, during extensive adult acquisition. This peculiar kind of ‘unnatural’ change is not comparable to the processes of grammaticalization witnessed in languages like English or French. While these are more analytic than their ancestors, both of these languages retain agreement and complex expression of inherent inflection via root allomorphy. A similar claim is made by Grant (2009), who posits that simplicity is a reduction in the allomorphy found in the lexifier’s system to a sufficient extent that the emerging pidgin/creole shows no inflectional marking. However, the evidence does not support either of these conceptions of linguistic simplification. Contra McWhorter, Palenquero does show agreement in adnominal adjectives (Schwegler 2013), and even if many creoles have lost gender and number agreement, they have innovated new contextual morphology most certainly influenced by their substrate languages: all varieties of Melanesian Pidgin feature a transitivity marker which is suffixed to an English-inherited lexicon, as in (1).

(1) a. bild > bild-im haos
       build > build-TR house
    b. pei > pe-im skul yuniform
       buy > buy-TR school uniform
    c. let > let-em yu go
       let > let-TR you go
    (Arika 2012)

French-based creoles spoken in the Indian Ocean all exhibit contextual inflection (see section 5.6.1.1 on Mauritian). As for the question of allomorphy, the approach we adopt in the next sections is that languages do not merely eliminate allomorphy. What appears in a new system in terms of forms is heavily dictated by frequency and the identification of paradigmatic patterns that will subsequently serve to make new forms. Such a perspective doesn’t warrant the existence of a prior pidgin. As Mufwene (2008) points out, a closer examination of the facts shows that creoles do not evolve from pidgins but rather from the approximation of a non-standard variety of the lexifier. Indeed, a recent study on the emergence of creole languages questions whether the existence of a pidgin is a necessary precursor to creolization, and suggests that contrary to common belief, emerging creoles are not typologically distinct from other languages (Blasi et al. 2017).

In addition, the input for language learners in purely spoken settings differs radically from that of guided settings, since an inflectional paradigm’s perceptible distinctions are very different in speech and writing. Syncretism in a lexeme’s paradigm is much more pervasive in speech than in writing. In spoken French, only three forms are distinguished in the present indicative of first-conjugation verbs (e.g., /mɑ̃ʒ/ eat.PRS.SG/3PL ~ /mɑ̃ʒɔ̃/ eat.PRS.1PL ~ /mɑ̃ʒe/ eat.PRS.2PL),⁴ making the form-function relationships quite opaque in purely spoken settings (cf. section 5.4.1). And while some forms, like the simple past (passé simple), are rare altogether in colloquial French, others, like the periphrastic future, are preferred over synthetic forms (Abouda & Skrovec 2015, 2017). This is also true of gender and number agreement, which is less perceptible in spoken French than in written French. The stark differences between spoken French and the French of more guided settings are clearly revealed by Cajun French, which derives from varieties of spoken French dating from the period of colonialism both in the Americas and in the Indian Ocean. Cajun French features extensive use of periphrastic expressions comparable to those observed in the creoles. Such periphrasis allows differences of tense, aspect, and mood (TAM) to be expressed without differences in synthetic morphology; the form of the main verb manger ‘to eat’ remains unchanged in periphrastic expressions such as vous-autres est après manger ‘you (PL) are eating’ and vous-autres va manger ‘you (PL) will eat’.
Thus, verb paradigms in Cajun French distinguish fewer synthetic forms than their counterparts in standard French. French-based creoles are likewise outgrowths of spoken French; as such, they have not drastically simplified the French inflectional system, but have instead developed a native verb alternation that resembles one salient in spoken forms of the lexifier (Bonami et al. 2013). This is in line with recent empiricist approaches that reject the language innateness hypothesis and favour an integrative view of second-language acquisition according to which language learning relies on multiple factors, including innate learning abilities, prior knowledge of a first language, social setting, and perceptual and statistical mechanisms (see also Saffran et al. 1996 and Tomasello 2000).⁵ Finally, there is also the logical problem of language simplification with regard to what has been identified as foreigner talk (Ferguson 1971). Foreigner talk refers to a simplified version of a language used by native speakers when addressing non-natives; the omission of inflections is widespread in these varieties (Hock & Joseph 1996). In any case, future creole speakers clearly have no prior knowledge of the lexifier language before acquisition, which raises the question of how they could have simplified it. These observations crucially support the view that the input was already simplified.

⁴ In French, the first conjugation constitutes the largest conjugation as well as the most regular and productive.

⁵ Other research on the emergence of language also suggests that aside from the human genetic endowment for language acquisition, human beings possess a mathematical or computational component for language creation and complexification (Hauser et al. 2002; Fitch & Hauser 2004; Gervain & Mehler 2010).

The morphological complexity of creoles has generally been evaluated based on comparisons with their lexifier languages using traditional views of morphology. It must be said at the outset that the extent of a creole’s morphological complexity cannot simply be equated with the extent to which it mirrors complex patterns in the lexifier language; otherwise, as will be argued below, dimensions of complexity in the creole that have no counterpart in the lexifier language may simply be overlooked. This point is all the more crucial given that complexity can be measured in more than one way. Under a morpheme-based approach, a creole’s lexifier can be argued to be morphologically complex because it distinguishes a large number of inflected words, a large number of affixes, and, perhaps also, a large number of morphological processes. By these measures, the morphology of the creole under comparison appears much less complex. These measures, however, imply a particular conception of what constitutes morphology. In the generative-transformational tradition, it has been customary to see periphrasis as a syntactic construct; but periphrasis has recently been argued to function as a kind of inflectional exponence on a par with synthetic varieties of exponence (see Bonami 2015 and the references cited therein). Under the assumption that not all morphology is synthetic morphology, creole morphology takes on a higher degree of complexity, with larger arrays of morphosyntactic properties, larger paradigms, and larger inventories of inflectional exponents (Henri 2010; Kihm 2014; Henri & Kihm 2015).

Nevertheless, as we noted in section 5.2, the complexity of a system is not simply enumerative; morphological complexity does not simply reduce to the cardinality of its morphosyntactic properties, the size of its paradigms, or the variety of its inflectional resources (Bonami et al. 2015). Even if creole inflectional systems are smaller on average⁶ than those of their lexifiers, they exhibit a comparable degree of integrative complexity. For example, Henri (2010) shows that in Mauritian, the complementary environments in which a verb’s long and short alternants appear cannot be characterized in morphological, syntactic, or information-structural terms by complementary natural classes of properties (cf. section 5.6.1). Mismatches of this kind have been argued to be an indicator of integrative complexity (see Stump 2017: 70–1 and the references cited there). Mauritian is likewise more complex when it comes to interpredictability, that is, the difficulty of predicting one form based on knowledge of another (Henri 2010; Bonami & Henri 2010; Bonami et al. 2011). Luís (2014) also shows that Indo-Portuguese creoles exhibit different types of form-meaning mismatches in their inflectional system. Korlai, for example, presents both class-specific syncretism and paradigmatic opacity that affect morphosyntactic transparency (Bonami et al. 2013; Luís 2014). Comparable mismatches are found in other Portuguese-based creoles spoken in Africa (Kihm 2014).

⁶ Verbs in both Mauritian and French exhibit alternating forms, but a Mauritian verb’s synthetic paradigm is limited to two cells, neither of whose forms exhibits true affixation or any coherent morphosyntactic content (Henri 2010); in French, by contrast, a verb’s synthetic paradigm exhibits fifty-one cells, combinations of up to three inflectional affixes (e.g., i-r-i-ons ‘(we) would go’) and arguably six morphosyntactic features (Bonami and Boyé 2003, 2007).

5.4 Verb inflection: from French to French-based creoles

Creoles are usually claimed to retain few if any of their lexifier’s inflectional distinctions. In French-based creoles, this reduction has led to systems in which each verb has at least a short form (SF) and a long form (LF); systems of this kind are said to be characteristic of French-based creoles spoken in the Indian Ocean, and in the Americas, of Louisiana Creole and Haitian. The formal distinction between a verb’s SF and LF is claimed to be a syntactically conditioned shape alternation in Isle de France creoles (Seychellois, Rodriguais, Chagossian, and Mauritian⁷) but not in Reunionese (Corne 1982; Seuren 1990; Syea 1992). Corne (1982) argues for a typological difference between Reunionese and Isle de France creoles on the basis of their verbal systems. Isle de France creoles’ verb alternations are said to have been influenced by Bantu alternations while those of Reunionese are reconciled with the assumption that it is merely a variety of French.⁸, ⁹

⁷ These languages are said to form varieties of the same creole, namely Mauritian, for reasons linked to colonization. Indeed, the Seychelles used to be part of British Mauritius together with Rodrigues and the Chagos. Rodrigues remains a Mauritian dependency while the sovereignty of the Chagos is still under dispute.

⁸ Depending on the verb, mesolectal varieties of Reunionese exhibit up to five inflected forms, expressing distinctions of tense and aspect. For example, the verb ‘eat’ has the three inflected forms mâz, mâze, and mâzra, with the third one being restricted to negative future-tense contexts. Irregular verbs like ‘come’ exhibit five inflected forms, for example viê, vne, viê(n)ra, vni, vnir, where the future tense form viê(n)ra is again restricted to negative contexts and where there is a distinction between a past participle form vne and an infinitive vnir (Corne 1982). Corne (1982) further notes that those forms are unstable to the extent that the past tense, the past participle, and the infinitive are interchangeable. Wittmann & Fournier (1987) present a severe critique of Corne’s data and analysis, drawing attention to a range of problems. They argue that his analysis is observationally inaccurate and theoretically questionable (given, e.g., the disparate range of factors that must be assumed to condition the proposed phonological rules; see also Henri 2010); that the analysis is not obviously informed by current thought on the usual motivations for regular sound changes; that the analysis is not compatible with reasonable assumptions about the uniformity of diachronic processes effecting language change; and that his assumption that Mauritian and Reunionese have fundamentally different histories is highly questionable.

⁹ Klingler (2003) and Rottet (1992) also assume that verb alternation in Louisiana Creole is reminiscent of French, making Louisiana Creole a plausible variety of French.


Following Baker (1972), Corne argues that the LF/SF alternation affects 70% of the Mauritian verb lexicon and that SFs are derived by truncation of the LF’s final vowel under conditions that are syntactically and semantically determined. Chaudenson (2003), Veenstra & Becker (2003), Veenstra (2009), and others defend an alternative analysis according to which Mauritian inherits its long and short forms from a French verb’s infinitive and third-person singular present indicative forms (respectively) but without inheriting their corresponding functions. This development, they argue, is based on universals at play during second-language learning. Veenstra (2009: 110) further hypothesizes that the LF/SF alternation is at first phonologically conditioned but that it gradually becomes grammaticalized so that the appearance of a verb’s SF is conditioned by a following complement. As discussed in section 5.6.1.1, the distribution of the Mauritian alternation is much more complex than what Veenstra assumes (see also Henri 2010). According to Veenstra, the function of the alternation seen in Mauritian might reflect Bantu influence, since the conjoint and disjoint verb forms found in Makhuwa and other Eastern Bantu languages exhibit similar functions. While the hypothesis is plausible, it raises the question of the Bantu contribution in Haitian, which shows an alternation associated with a more or less parallel function. According to DeGraff (2001: 75), the distinction in Haitian is subject to prosodic or morphosyntactic constraints. Verb alternations are, according to DeGraff (2001), manifestations of inflectional morphology, with a verb’s SF arising from its LF by subtractive morphology in the context of a following complement.

The evidence that we present below suggests that verb-stem alternations are characteristic of all French-based creoles to a greater or lesser degree.
While the form of such alternations and the functions that they serve are innovated in each individual creole, they are nevertheless relatable to the existence of comparable though distinct alternations in the verb morphology of the lexiﬁer. We advocate a theory of creole genesis that includes unguided second-language acquisition as one of the key components of creolization. In addition, we believe that there are a number of additional factors that may inﬂuence the emergence of a creole; these include frequency, salience, ease of perception, transparency, invariance, and congruence (see also Corne 1982; Mufwene 2008).
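The truncation analysis attributed to Corne above (a short form obtained by deleting the long form's final vowel) is simple enough to state as a rule and to check against attested pairs. The sketch below is a hedged illustration, not the analysis of any of the cited authors: the function names and the five-letter vowel set are assumptions made for this example, and the form pairs are the Mauritian long and short forms listed in Table 5.2 of this chapter (the pair kone ~ konn is left out because the spelling of its short form does not correspond to a one-character truncation in this simplified string treatment).

```python
# Simplified vowel inventory for the transcriptions used here (an assumption).
VOWELS = set("aeiou")

# (long form, short form) pairs as listed for Mauritian in Table 5.2.
MAURITIAN_PAIRS = {
    "go": ("ale", "al"),
    "come": ("vini", "vin"),
    "exit/go out": ("soɚti", "soɚt"),
    "owe": ("dwa", "dwa"),
    "sit": ("asize", "asiz"),
}


def truncate_final_vowel(long_form):
    """Return the short form predicted by deleting a final vowel, if any."""
    if len(long_form) > 1 and long_form[-1] in VOWELS:
        return long_form[:-1]
    return long_form


def check_truncation(pairs):
    """Map each gloss to True if naive truncation yields the attested short form."""
    return {
        gloss: truncate_final_vowel(long_form) == short_form
        for gloss, (long_form, short_form) in pairs.items()
    }


if __name__ == "__main__":
    for gloss, matches in check_truncation(MAURITIAN_PAIRS).items():
        print(f"{gloss}: {'matches' if matches else 'does not match'}")
```

On these five pairs the rule reproduces the attested short form everywhere except for the syncretic verb dwa ‘owe’, whose two forms are identical. A purely string-based rule of this kind also says nothing about the contexts in which each form appears, which is where the analyses discussed above actually differ.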

5.4.1 Properties of the French verbal paradigm

As mentioned in section 5.2, the French verbal system is highly unpredictable and therefore unlikely to remain unchanged in French-based creoles (Bonami et al. 2013). Standard written French distinguishes three conjugation classes of synthetic paradigms consisting of a total of fifty-one cells expressing TAM, person, number, and gender. The first conjugation is the productive class, into which loans and neologisms are integrated, as opposed to the non-productive second and irregular third conjugation.

Table 5.1. Patterns of syncretism in the French paradigm (Bonami et al. 2013)

Verb | Distinct forms in the cells compared
laver ‘wash’ | lave, lav
finir ‘finish’ | finise, fini, finis
rendre ‘give back’ | rɑ̃de, rɑ̃dʁ, rɑ̃dy, rɑ̃, rɑ̃d
cuire ‘cook’ | kɥize, kɥiʁ, kɥi, kɥiz
pouvoir ‘be able’ | puve, puvwaʁ, py, pø, pœv, pɥis
dire ‘say’ | dit, dize, diʁ, di, diz

As Table 5.1 shows, French verb paradigms exhibit extensive syncretism: in the first conjugation, many of a verb’s forms have one of two shapes, distinguished only by the presence or absence of a final /e/, for example /mɑ̃ʒe/ ~ /mɑ̃ʒ/ (Chaudenson 2003; Veenstra & Becker 2003; Henri 2010). The French Xe ~ X alternation decidedly resembles the long-short alternation seen in French-based creoles, although, as we argue, the creole alternation cannot be seen as purely inherited (see section 5.4.2). In eighteenth-century French, final ‘r’ became unpronounced in second-conjugation infinitives and in third-conjugation infinitives ending in /iʁ/ (though not those ending in /iʁə/, such as écrire ‘to write’); this means that in the expression of the paradigm cells compared in Table 5.1, only three forms were distinguished in the second conjugation, as Bonami et al. (2013) observe.

Various factors tend to maximize the use of the syncretic forms in Table 5.1. In both spoken and written corpora, instances of the Xe ~ X pattern of /mɑ̃ʒe/ ~ /mɑ̃ʒ/ constitute more than 89% of forms (Bonami et al. 2013). In spoken French, the periphrastic future formation, involving the combination of the ancillary lexeme ‘go’ with an infinitive form (as in (2a), with syncretic /mɑ̃ʒe/), is overwhelmingly preferred to the synthetic formation in (2b). Similarly, the use of PRS.1PL forms with subject nous (nous mangeons ‘we’re eating’) tends, in colloquial French, to be supplanted by that of indefinite PRS.3SG forms with subject on (on mange ‘one is eating’, with syncretic /mɑ̃ʒ/).

(2) a. Il va manger.
       3SG go.3SG eat.INF
       ‘He will eat.’
    b. Il mangera.
       3SG eat.FUT.3SG
       ‘He will eat.’


Bonami et al. (2013) also note that French verb forms are often ambiguous with respect to their inflection-class membership. For instance, /pɛɲe/ serves as the PRS.2PL form for both the first-conjugation verb peigner ‘comb’ and the third-conjugation verb peindre ‘paint’. Thus, certain differences in form may be widely recurrent even if they don’t stem from a single inflection-class difference. If creolization is at all sensitive to factors such as frequency, saliency, and perception, we expect to find an LF/SF distinction in creole verbs as a reflection of the wide recurrence of a comparable distinction in the lexifier (Bonami et al. 2013; see also Corne 1999; DeGraff 2001).
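The degree of syncretism in the spoken paradigm can be made concrete by grouping paradigm cells according to the form that realizes them. The following Python sketch is a minimal illustration under stated assumptions: the cell labels and the function names are hypothetical conveniences, and the data are limited to the spoken present indicative of manger ‘eat’ as transcribed in this chapter (one form for the singular and the third-person plural, one for the first-person plural, one for the second-person plural).

```python
from collections import defaultdict

# Spoken present indicative of manger 'eat', as transcribed in the chapter.
PRESENT_INDICATIVE = {
    "1SG": "mɑ̃ʒ",
    "2SG": "mɑ̃ʒ",
    "3SG": "mɑ̃ʒ",
    "1PL": "mɑ̃ʒɔ̃",
    "2PL": "mɑ̃ʒe",
    "3PL": "mɑ̃ʒ",
}


def syncretism_classes(paradigm):
    """Group paradigm cells by the form that realizes them."""
    classes = defaultdict(list)
    for cell, form in paradigm.items():
        classes[form].append(cell)
    return dict(classes)


def is_xe_x_pair(long_form, short_form):
    """True if the two forms differ only by a final /e/ (the Xe ~ X pattern)."""
    return long_form == short_form + "e"


if __name__ == "__main__":
    classes = syncretism_classes(PRESENT_INDICATIVE)
    for form, cells in classes.items():
        print(f"/{form}/ realizes {', '.join(cells)}")
    print(is_xe_x_pair(PRESENT_INDICATIVE["2PL"], PRESENT_INDICATIVE["3SG"]))
```

Six cells collapse into three distinct shapes here, and the second-person plural and singular forms stand in exactly the Xe ~ X relation that the text compares with the creole long-short alternation.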

5.4.2 French-based creoles

Verb alternations are observable across the French-based creoles, though the number of verbs exhibiting such alternations varies from one creole to another. Verbs in Guadeloupean are customarily described as being invariable. For example, Hazaël-Massieux (2002: 71) claims that Guadeloupean doesn’t show any real inflection, and that distinctions between two forms of the same lexeme, like the distinction between fè /fɛ/ and fèt /fɛt/ ‘to do’, are French borrowings and are purely exceptional. A similar type of description is provided by Ehrhart (1993: 158), who maintains that Tayo, a French-based creole spoken in New Caledonia, behaves like American creoles (with the exception of Louisiana Creole) in having only a few verbs with more than one form, such as mete /mete/ ~ met /met/ ‘to put’, balaj /balaj/ ~ balaje /balaje/ ‘to sweep’, kouver /kuvɝ/ ~ kouvri /kuvʁi/ ‘to cover’. Granting the limited nature of verb alternations in these two creoles, we nevertheless believe that even here, the role of such alternations in a creole’s grammar cannot be ignored. When forms of a verb alternate, they exhibit systematic distributional differences. Moreover, the incidence of such alternations is important as a feature shared by the French-based creoles; it constitutes a common aspect of their development from French, but also a significant dimension of innovative divergence among the creoles themselves.

We claim that the verb alternations found in the French-based creoles were in all cases shaped by but not necessarily inherited from their lexifier, pace Chaudenson (2003), Veenstra & Becker (2003), and Veenstra (2009). Consider the Mauritian verb forms shown in Table 5.2. The examples suggest that the alternation stems from a single French form from which a second form is independently innovated. The source form in French is very often the infinitive but may instead be some other form. For example, Mauritian /kone/ ‘to know’, though imported as a long form, stems not from the infinitive connaître but from the present singular connai(t/s) (itself a ‘short form’ in French). For syncretic forms like dwa ‘to owe’, there are two possibilities: either they are integrated as LFs (as in the

OUP CORRECTED AUTOPAGE PROOFS – FINAL, 24/8/2020, SPi

    


Table 5.2. Comparison of infinitive and present 3SG forms in French with long and short forms in Mauritian

French                  Mauritian
INF         PRS.3SG     LF       SF       Gloss
ale         va          ale      al       ‘go’
vəni(ʁ)     vjɛ̃         vini     vin      ‘come’
sɔʁti(ʁ)    sɔʁ         soɚti    soɚt     ‘exit/go out’
dəvwa(ʁ)    dwa         dwa      dwa      ‘owe’
konɛtʁ      kone        kone     konn     ‘know’
aswa(ʁ)     asjɛ        asize    asiz     ‘sit’

Table 5.3. Sample comparison of long and short forms in four French-based creoles

Reunionese

Louisiana Creole

Guadeloupean

Haitian

 ale vɛne soɚti

 al vɛn soɚrt

 ale vini sɔɾti

 aleᵃ vinᵇ sɔɾ

 ale vini sɔti

 al/ay vin sɔt

‘go’ ‘come’ ‘exit/go out’

konɛt

kone

kɔnɛ̃

kɔnɛ̃

 ale vini sɔti save kɔnɛt

kɔnɛ̃

kɔn

‘know’

 ay vin sɔt sav kɔnɛt

Gloss

Notes:

a. Louisiana Creole has a short form /al/ alternating with a longer form /ale/ meaning ‘to haul/pull’. The suppletive French 3SG form /va/ also appears in some French-based creoles as an irrealis marker: va in Mauritian and Louisiana Creole. In Reunionese Creole a form /sava/, possibly lexicalized from the agglutination of the demonstrative with the 3SG form of the verb ALLER ‘to go’, is used in a number of impersonal constructions. Armand (2014) describes it as an auxiliary.

b. In addition to /vin/, both Mauritian and Louisiana Creole have the form /vjɛ̃/. But in both languages, this is a late borrowing and the two forms are used interchangeably.

case of kone) and the syncretic SFs are derived from them, or they enter the paradigm as SFs from which the corresponding LFs are derived. Notice also the case of Mauritian asiz ‘to sit’: its French source is evidently the feminine past participle assise, which is imported as a Mauritian SF from which the corresponding LF asize is then derived. Together with Louisiana Creole, French-based creoles spoken in the Indian Ocean show a more extensive pattern of alternation than the New Caledonian creole Tayo and the creoles of the French West Indies. Table 5.3 illustrates alternations from Reunionese, another French-based creole spoken in the Indian Ocean, and from Louisiana Creole, Guadeloupean, and Haitian, all spoken in the Americas. In our view, it is likely that verb alternations in these varieties started out as a sandhi


alternation that was subsequently exapted to serve one or another function in each individual creole. While we focus here only on three French-based creoles, Mauritian, Guadeloupean, and Haitian, we hypothesize that verb-form alternations in all French-based creoles are unequivocally more complex than has previously been acknowledged (see, e.g., Ehrhart 1993; Hazaël-Massieux 2002; Bernini-Montbrand et al. 2013). As we show in the following section, this complexity is revealed by the creoles’ processes of deverbal derivation. In discussing deverbal derivation in these creoles, we will draw upon the following useful distinctions:

(i) In most cases, an LF may be seen as consisting of a stem plus a particular vowel; we refer to the stem in this combination as an LF-stem.
  • In many instances, a verb’s LF-stem is simply the verb’s SF, as in the case of Haitian or Mauritian ‘come’: LF vini, LF-stem/SF vin.
  • Occasionally, a verb’s LF ends in a consonant that is absent from the verb’s SF. Here, too, the LF-stem may be equated with the SF, as in the case of Haitian ‘do/make’: LF fèt, LF-stem/SF fè.

(ii) In some cases, there is a relation of syncretism between a verb’s LF and its SF; that is, there is a single form that the grammar of the language treats as both an LF and an SF.
  • In such cases, the syncretized forms may have the vowel-final morphology of a typical LF, in which case the LF-stem is distinct from the SF. In cases of this kind, the LF-stem may have the status of a hidden stem of the sort discussed in section 5.2 above; we call this a hidden LF-stem. As we will see (section 5.6.3.2), the Haitian verb ‘chat’ has koze as both its LF and its SF, with koz as a hidden LF-stem.
  • But there are also cases in which a verb’s syncretized LF and SF have the shape of a typical SF; in such cases, one can assume that the LF, the SF, and the LF-stem are all alike, as in the case of Mauritian ‘drink’, whose LF, SF, and LF-stem are all bwar.

(iii) Finally, a verb may have a hidden stem that is distinct from its LF, its SF, and its LF-stem; we call this a special hidden stem. In Mauritian, for example, the verb ‘drink’ has bwar as its LF, SF, and LF-stem, but also has the special hidden stem biv- appearing in nominalizations such as biver ‘drinker’.
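The distinctions in (i)–(iii) can be pictured as a small record per verb. The following Python sketch is only an illustration under stated assumptions: the class name `CreoleVerb`, its field names, and its two helper methods are hypothetical labels introduced here and are not part of the analysis itself; the forms are the ones cited in (i)–(iii).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CreoleVerb:
    """Hypothetical record of the forms distinguished in (i)-(iii)."""
    gloss: str
    lf: str                                    # long form
    sf: str                                    # short form
    lf_stem: str                               # stem on which the LF is built
    special_hidden_stem: Optional[str] = None  # case (iii), if any

    def is_syncretic(self) -> bool:
        """Case (ii): one form serves as both LF and SF."""
        return self.lf == self.sf

    def has_hidden_lf_stem(self) -> bool:
        """Syncretic verb whose LF-stem differs from the shared LF/SF form."""
        return self.is_syncretic() and self.lf_stem != self.sf


VERBS = [
    CreoleVerb("come (Haitian, Mauritian)", lf="vini", sf="vin", lf_stem="vin"),
    CreoleVerb("do/make (Haitian)", lf="fèt", sf="fè", lf_stem="fè"),
    CreoleVerb("chat (Haitian)", lf="koze", sf="koze", lf_stem="koz"),
    CreoleVerb("drink (Mauritian)", lf="bwar", sf="bwar", lf_stem="bwar",
               special_hidden_stem="biv"),
]

for verb in VERBS:
    print(verb.gloss, verb.is_syncretic(), verb.has_hidden_lf_stem(),
          verb.special_hidden_stem)
```

On this encoding, only ‘chat’ counts as having a hidden LF-stem, while ‘drink’ is syncretic without one and carries a special hidden stem instead.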

5.5 Approaches to derivation

Our analysis is based on the theoretical framework of lexeme-based morphology (Matthews 1972; Aronoff 1994), where the lexeme is defined as a lexical entity


abstracted away from the syntactic contexts in which it may appear; a lexeme belongs to a lexical category, has semantic content, and is realized by one or more word forms through which it participates in syntax. In inflectional languages, a lexeme is usually associated with a collection of stems used to form the inflected forms that can be inserted into sentences. For instance, the French verbal lexeme BOIRE ‘to drink’ has a stem /byv/ upon which are built the inflected forms /byvɔ̃/ (buvons ‘we drink’), /byve/ (buvez ‘you (PL) drink’), /byvɛ/ (buvais ‘I was drinking’), etc., and a stem /bwa/ from which are formed the homophonous word forms /bwa/ (bois ‘you (SG) drink’, boit ‘s/he drinks’). Stems such as /byv/ and /bwa/ are morphomic in the sense of Aronoff (1994): they participate in formal alternations whose conditioning cannot be coherently characterized in semantic, morphosyntactic, or phonological terms but must be seen as purely morphological in its motivation.

For French verbs, Bonami & Boyé (2002, 2003) propose a stem space with twelve slots; this is a kind of matrix within which each verb’s full inventory of stems is uniformly specifiable. The stem slots are linked to one another by default implicative rules, so that for a regular verb, there is a slot whose stem suffices to determine the stems in all of the other slots in that verb’s stem space. An irregular verb is a lexeme whose stem space includes at least one stem that overrides a default implicative rule. Extending this idea, Bonami et al. (2009) show that a thirteenth stem is needed to account for deverbal lexemes suffixed with the action nominalizer -ion, the adjectivalizer -if, or the agent nominalizers -eur/-rice. Thus, both rules of inflection and rules of derivation draw upon a lexeme’s stem space; an individual stem may, however, be accessible to rules of only one type; for instance, the thirteenth stem proposed by Bonami et al. (2009) is hidden to inflection, being accessible only to rules of derivation, as in Table 5.4.

Table 5.4. Stem space of FORMER ‘to form’, FINIR ‘to finish’, and DÉFENDRE ‘to defend’

#    Stem use                       FORMER    FINIR    DÉFENDRE
1    imperfect, pres. 1PL/2PL       fɔʁm      finis    defɑ̃d
2    present 3PL                    fɔʁm      finis    defɑ̃d
3    present SG                     fɔʁm      fini     defɑ̃
4    present participle             fɔʁm      finis    defɑ̃d
5    imperative 2SG                 fɔʁm      fini     defɑ̃
6    imperative 1PL/2PL             fɔʁm      finis    defɑ̃d
7    pres. subjv. SG & 3PL          fɔʁm      finis    defɑ̃d
8    pres. subjv. 1PL/2PL           fɔʁm      finis    defɑ̃d
9    infinitive                     fɔʁme     fini     defɑ̃d
10   future, conditional            fɔʁm      fini     defɑ̃d
11   simple past, past subjv.       fɔʁma     fini     defɑ̃di
12   past participle                fɔʁme     fini     defɑ̃dy
13   hidden stem                    fɔʁmat    finit    defɑ̃s
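The idea of a stem space filled by default implicative rules, with irregular verbs overriding some defaults, can be sketched as follows. This is a minimal toy model, not Bonami & Boyé's actual rule set: the rules in `DEFAULT_RULES` are an assumption chosen only so that a single listed stem reproduces the FORMER column of Table 5.4, and the function names are hypothetical.

```python
# Toy default implicative rules: slot -> (source slot, suffix added to the source stem).
# ASSUMPTION: these rules are invented for illustration so that one listed stem
# yields the FORMER column of Table 5.4; they are not the published rule set.
DEFAULT_RULES = {
    2: (1, ""), 3: (1, ""), 4: (1, ""), 5: (1, ""), 6: (1, ""),
    7: (1, ""), 8: (1, ""), 9: (1, "e"), 10: (1, ""),
    11: (1, "a"), 12: (1, "e"), 13: (11, "t"),
}


def fill_stem_space(listed: dict) -> dict:
    """Fill slots 1-13; lexically listed stems take priority over defaults."""
    space = {1: listed[1]}
    for slot in range(2, 14):
        if slot in listed:
            space[slot] = listed[slot]
        else:
            source, suffix = DEFAULT_RULES[slot]
            space[slot] = space[source] + suffix
    return space


def overridden_slots(listed: dict) -> list:
    """Slots whose stem differs from what the toy default rule would predict."""
    space = fill_stem_space(listed)
    result = []
    for slot in range(2, 14):
        source, suffix = DEFAULT_RULES[slot]
        if space[slot] != space[source] + suffix:
            result.append(slot)
    return result


# A regular verb lists a single stem; everything else follows by default.
FORMER = {1: "fɔʁm"}

# An irregular verb lists the stems that override the toy defaults (Table 5.4).
D = "defɑ̃d"
DEFENDRE = {1: D, 3: D[:-1], 5: D[:-1], 9: D, 11: D + "i", 12: D + "y",
            13: D[:-1] + "s"}

print(fill_stem_space(FORMER))
print(overridden_slots(FORMER), overridden_slots(DEFENDRE))
```

Under these toy rules FORMER needs no overrides, whereas DÉFENDRE overrides six slots, including the hidden stem 13; a different choice of default rules would of course change which slots count as overrides.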


At least five members of a French verb’s stem set are available as base stems in instances of deverbal nominalization (Bonami et al. 2009; Tribout 2012). Deverbal nouns in -age have stem 1 as their base stem (e.g., /netwaj/: NETTOYAGE ‘cleaning’); deverbal nouns in -ment generally have stem 2 as their base stem (/netwɑ/: NETTOIEMENT ‘cleaning’, /ʒonis/: JAUNISSEMENT ‘yellowing’); and the base stem of a deverbal noun arising by conversion may be stem 3 (/dɑ̃s/: DANSER ‘to dance’ → DANSE ‘dance’), stem 12 (/ɑʁive/: ARRIVER ‘to arrive’ → ARRIVÉE ‘arrival’), or the hidden stem 13 (/defɑ̃s/: DÉFENDRE ‘to defend’ → DÉFENSE ‘a defense’). The selection of a deverbal derivative’s base stem is not uniquely determined by phonological or grammatical criteria. For example, there are instances in which more than one of a verb’s stems serves as a base for conversion, as in the case of PLONGER ‘to dive’, whose derivatives include PLONGE ‘dishwashing’ (whose stem /plɔ̃ʒ/ is stem 3 of PLONGER) and PLONGÉE ‘diving’ (whose stem /plɔ̃ʒe/ is stem 12 of PLONGER). More importantly, base-stem selection has no correlation with the semantics of the derived nominal: nominalizations expressing action, result, agent, instrument, or location vary unpredictably with respect to which of the base lexeme’s five possible stems serves as their base stem.

Given the dimensions of complexity discussed in section 5.2, we claim that French derivational relations contribute substantially to the morphological complexity of French. In particular: (i) base-stem predictability in the definition of deverbal nominalizations in French exhibits the highest degree of complexity (degree 2 in Figure 5.1); and (ii) where X is a verbal lexeme L’s base stem in a particular derivational relation, the restrictedness of X in L’s morphology may evince the highest degree of complexity (degree 2 in Figure 5.2).

5.6 Derivational relations in French-based creoles

We now turn to the description and analysis of derivation in Mauritian, Guadeloupean, and Haitian; in each case, we preface this discussion with a brief overview of the function of long and short verb forms in the creole under scrutiny.

5.6.1 Mauritian

5.6.1.1 Function of verb forms in Mauritian

In Mauritian, verbs alternate between a short and a long form. Most verbs (70%) have morphologically distinct forms but some (30%) have syncretic long and short forms (Henri 2010); the verbs in Table 5.5 are representative of the different observed cases. Contrary to previous assumptions (e.g., those of Corne 1982),


Table 5.5. Verb alternations in Mauritian

Verb            SF        LF
‘to think’      pans      panse
‘to stay’       res       reste
‘to buy’        aste      aste
‘to ask’        demann    demande
‘to amend’      amand     amande
‘to snore’      ronf      ronfle
‘to drink’      bwar      bwar

the alternation is not phonologically predictable and shows an intricate distribution that encodes morphological, syntactic, and information-structure oppositions (Henri 2010). In syntax, a verb’s SF is used in the presence of a non-clausal complement (3), as opposed to the LF, which appears in the absence of any complement (4a). LFs also appear with verbs that select clausal complements (4b), have an extracted complement (4c), or are followed by an adjunct (4d).

(3) Toulezour, mo pans mo fami.
    everyday, 1. think. 1. family
    ‘Everyday, I think about my family.’

(4) a. Zan ronfle.
       John snore.
       ‘John snores.’
    b. Mo panse ki tou dimoun intelizan.
       1. think. that every person intelligent
       ‘I think that everybody is intelligent.’
    c. Se mo fami ki mo panse.
       It 1. family that 1. think..
       ‘It’s my family that I think about.’
    d. Zan ronfle gramatin
       John snore. morning
       ‘John snores in the morning.’

However, a verb’s LF may appear where its SF would otherwise be expected under certain discourse conditions. In counter-assertions, the LF is interpreted as an exponent of Verum Focus: using the LF evokes and denies the converse of the proposition making up the content of the clause (Henri et al. 2008; Henri 2010).


(5) a. A: To pa pans to fami zame!
          1.  think. 1. family never
          ‘You never think about your family!’
       B: Mo panse mo fami!
          1. think. 1. family
          ‘I do think about my family.’
    b. A: To pa fer seki to anvi isi!
          2.  do. what 2. want. here
          ‘You don’t do what you want here!’
       B: Mo panse kouma mo le kan mem.
          1. think. how 1. want. still
          ‘I still think like I want to.’

Similarly, post-verbal constituents that are usually construed as adjuncts, while ordinarily inducing the use of the LF, can appear with SFs if and only if those post-verbal constituents are focused; this is true of locatives, instrumentals, temporal adjuncts, and adjuncts of degree, frequency, and manner.

(6) a. A: Kot to manze dan zedi?
          where 2. eat.  Thursday
          ‘Where do you eat on Thursdays?’
       B: Mo manz rozil dan zedi!
          1. eat. Rose-Hill on Thursday
          ‘I eat in Rose-Hill on Thursdays’
    b. A: Ar ki to manze?
           what 2. eat.
          ‘What do you eat with?’
       B: Mo manz ar lame.
          1. eat. with hand
          ‘I eat with my hands.’

Finally, both the short and the long form are used in lexeme-formation processes such as reduplication (Henri 2010, 2012). A derived verb formed by reduplication itself has both an SF and an LF; as the examples in Table 5.6 show, the derived verb’s SF is a doubling of the base verb’s SF while its LF is the base verb’s SF combined with its LF. Heterogeneous distributional patterns such as those of a Mauritian verb’s short and long forms can be characterized as morphomic (Henri forthcoming), a property that has been argued to contribute to a system’s integrative complexity (Aronoff 1994). As we now show, Mauritian derivations are as integratively complex as those of French.


Table 5.6. Reduplication in Mauritian

Base lexeme                        Reduplicated derivative lexeme
SF        LF         Gloss         SF               LF                Gloss
pans      panse      ‘think’       pans-pans        pans-panse        ‘think episodically’
manz      manze      ‘eat’         manz-manz        manz-manze        ‘nibble’
res       reste      ‘stay’        res-res          res-reste         ‘stay occasionally’
demann    demande    ‘ask’         demann-demann    demann-demande    ‘ask occasionally’
bwar      bwar       ‘drink’       bwar-bwar        bwar-bwar         ‘sip’
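The distribution of Mauritian short and long forms described in this section, together with the reduplication pattern of Table 5.6, can be sketched in a few lines of Python. This is a simplified illustration, not an implementation of any published grammar: the context labels, the `Verb` class, and the function names `select_form` and `reduplicate` are assumptions introduced here, and the sketch covers only the conditions stated in (3)–(6).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verb:
    sf: str  # short form
    lf: str  # long form


# Hypothetical labels for the contexts in which the LF is described as the default.
LF_CONTEXTS = {"no_complement", "clausal_complement",
               "extracted_complement", "adjunct"}


def select_form(verb, context, verum_focus=False, focused_adjunct=False):
    """Pick SF or LF for a syntactic context, following examples (3)-(6)."""
    if context == "nonclausal_complement":
        # (3): SF before a non-clausal complement; (5): LF under Verum Focus.
        return verb.lf if verum_focus else verb.sf
    if context == "adjunct" and focused_adjunct:
        # (6): a focused post-verbal adjunct licenses the SF.
        return verb.sf
    if context in LF_CONTEXTS:
        # (4a-d): no complement, clausal or extracted complement, plain adjunct.
        return verb.lf
    raise ValueError(f"unknown context: {context!r}")


def reduplicate(verb):
    """Table 5.6: derived SF = SF-SF; derived LF = SF-LF."""
    return Verb(sf=f"{verb.sf}-{verb.sf}", lf=f"{verb.sf}-{verb.lf}")


PANSE = Verb(sf="pans", lf="panse")
MANZE = Verb(sf="manz", lf="manze")

print(select_form(PANSE, "nonclausal_complement"))                    # cf. (3)
print(select_form(PANSE, "nonclausal_complement", verum_focus=True))  # cf. (5a)
print(select_form(MANZE, "adjunct", focused_adjunct=True))            # cf. (6b)
print(reduplicate(Verb(sf="res", lf="reste")))
```

Because the reduplicated lexeme is itself a `Verb` with its own SF and LF, it can be passed back to `select_form`, which mirrors the observation that derived verbs take part in the same alternation.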

5.6.1.2 Derivational relations in Mauritian

As we have seen, verbs in Mauritian have two basic forms: an SF and an LF. In instances of deverbal nominalization, verbs vary according to whether their base stem is their SF or their LF, as the examples in Table 5.7 show. A deverbal nominalization’s base stem may also be a special hidden stem, as in the case of biv in Table 5.7. In some instances, it is not immediately clear whether a deverbal nominalization’s base stem is an LF or an SF: in cases in which a nominalizing suffix begins with a vowel, the base stem lacks a final vowel, either because it is an SF (or possibly even a hidden LF-stem) or because it is an LF that has undergone a (morpho)phonological process of elision serving to avoid vowel hiatus. Other cases, however, are not ambiguous in this way. In the morphology of the lexeme RESTE ‘to remain’, for example, the LF reste has a t but the SF res does not; in view of this fact, the nominalization restan ‘leftovers’ likely involves elision of the LF reste. Conversions in general are unambiguous with respect to their choice of base stem. Moreover, they show that derived nominal lexemes have the same kinds of meanings (action, result, location) whether their stem arises from a verb’s LF or its SF; thus, a base lexeme’s base stem is not, in itself, predictable in Mauritian. A verb’s derived nominal stem is not always inherited from the lexifier language. Derived nominals like danse (stem /dɑ̃se/) ‘dancing’ or louke (stem /luke/) ‘peep’ do not exist in French and thus cannot be inherited. As Mauritian innovations, these nouns demonstrate that derivation is a productive process from a qualitative perspective (i.e., the process is still available to form new nouns).
Deverbal nominalizations in Mauritian involve base stems that are both variable and unpredictable (Table 5.7): base stems may be LFs, SFs, special hidden stems, and perhaps also hidden LF-stems; in some instances they are comparable in complexity to deverbal nominalizations in French. In particular, base-stem predictability in the deﬁnition of deverbal nominalizations in Mauritian exhibits complexity of degree 2 (see again Figure 5.1). Because the grammar of Mauritian deﬁnes complementary syntactic distributions for a verbal lexeme’s LF and SF, both of these function as inﬂected forms and neither, therefore, is hidden. But we also identiﬁed instances where a special hidden stem is used in the formation of derived nouns. As a consequence,


Table 5.7. Deverbal nominalizations in Mauritian

Verb                                                       → Noun
Gloss                          Form                        by conversion             by suffixation
‘to dance’                     LF danse                    danse ‘dancing; ball’
                               SF dans                     (la)dans ‘dance’          dans-er/ez* ‘dancer’
‘to peep’                      LF louke                    louke ‘peep’
                               SF louk                                               louk-er* ‘peeping Tom’
‘to stroll’                    LF chake
                               SF chak                     chak ‘stroll’             chak-er* ‘stroller’
‘to remain’                    LF reste                                              rest-an ‘leftovers’
                               SF res                      (le)res ‘rest’
‘to drink’                     LF/SF bwar                  bwar ‘drink’
                               special hidden stem biv                               biv-er ‘drinker’, labiv-et ‘bar’
‘to insult’                    LF/SF kamoufle              kamoufle ‘insults’
[‘to cover with insults’]      hidden LF-stem kamoufl                                kamouflaz* ‘episode of insulting’

In a given row, each nominalization has that row’s verb form as its base stem.
*An asterisk marks a derived stem that is morphologically ambiguous, involving either (a) a base stem that is an SF or hidden LF-stem or (b) a base stem that is an LF whose final vowel undergoes prevocalic elision.

Mauritian derivations exhibit a degree of base-stem restrictedness similar to that of French (see again Figure 5.2).

5.6.2 Guadeloupean

5.6.2.1 Function of verb forms in Guadeloupean

Guadeloupean shows significantly fewer verbs having distinct long and short forms than Mauritian does. We propose that the grammar of Guadeloupean, like that of Mauritian, makes essential reference to a grammatical distinction between long and short forms, but that the Guadeloupean lexicon differs from that of Mauritian insofar as most verbs exhibit syncretism between their long and short forms. We have identified thirty-four verbs having morphologically distinct short and long forms, based on a sample of 1,824 verbs extracted from two dictionaries (Tourneux & Barbotin 2008; Bernini-Montbrand et al. 2013); Table 5.8 provides a sample of verbs having distinct long and short forms. As is the case in Mauritian, LFs alternating with a morphologically distinct SF usually end in a vowel in Guadeloupean, specifically e and i, but with more


Table 5.8. Verb alternations in Guadeloupean

Verb           SF         LF
‘to look’      gay/gad    gadé
‘to put’       mèt        mété
‘to know’      sav        savé
‘to hold’      ken        kenbé
‘to come’      vin        vini
‘to do’        fè         fèt
‘must’         fo         falé
‘to give’      ba(n)      bay

members in the i class (four members in Mauritian vs. ten in Guadeloupean). Guadeloupean also presents two cases in which a verb’s LF ends in a consonant that is absent from its SF: fèt /fɛt/ ~ fè /fɛ/ ‘to do’ and bay /baj/ ~ ba(n) /ba/ or /bɑ̃/ ‘to give’; neither is found in Mauritian. Given the restrictedness of the phenomenon in Guadeloupean, one might think of Guadeloupean verb alternations as irregularities in a system in which verbs usually exhibit only a single form and in which alternations that do arise can be argued to be phonologically systematic, conforming to a small number of patterns ranging from the truncation of a final segment or syllable (mété /mete/ ~ mèt /mɛt/ ‘to put’; fouté /fute/ ~ fou /fu/ ‘give’) to a combination of final truncation with nasal spread (défandi /defɑ̃di/ ~ défann /defɑ̃n/ ‘to defend’) or nasal shift (kenbé /kɛ̃be/ ~ ken /kɛn/ ‘to hold’). There are also instances of partial suppletion, as in alternations such as gadé /gade/ ~ gay /gɛ/ ‘to look’ or falé /fale/ ~ fo /fo/ ‘must’.

Our view is that the difference between the Guadeloupean verb system and that of Mauritian is a difference of degree, not of kind. In particular, we assume that in the grammars of both languages, long and short verb forms are systematically distinguished but that the two forms are syncretic in some cases; this syncretism is more widespread in Guadeloupean than in Mauritian, but that is a lexical fact rather than a fact of grammar. This perspective entails that in both languages, LFs possess a systematic cluster of properties distinct from that possessed by SFs; that is, a verb exhibiting distinct long and short forms is not an irregular verb whose forms possess their own peculiar distributional idiosyncrasies, but fits into a larger pattern. The simplest assumption is that this larger pattern is common to all verbs, but that a verb’s conformity to the pattern is often obscured by the same kind of poverty of forms as characterizes English verbs such as hit, spread, and cost (which exhibit a single form for the infinitive, the non-3SG present, the past, and the past participle).

Guadeloupean verb alternation codes an aspectual distinction, where SFs are usually interpreted as referring to single events (as in (7a)–(14a)) and LFs as referring to multiple events (as in (7b)–(14b)). In the absence of other TAM markers, the long and short alternants may also express tense contrasts: in (7a) the SF expresses present tense, while in (7b), the LF expresses past tense (or passé composé). (Guadeloupean resembles Louisiana Creole in this respect.)


(7) a. An ken ni ba-’w.
       1. hold. 3 -’2.
       ‘I hold it for you.’ (single event)
    b. An kenbé-’y ba-’w.
       1. hold.-’3. -’2.
       ‘I held it for you.’ (multiple events)

When SFs are combined with the progressive marker ka, the interpretation is that of what might be called a ‘progressive completive’, as in (8a); but the combination of an LF with ka (as in (8b)) instead has a prospective reading, in which a multiplicity of future events, potentially but not necessarily completed, is understood.

(8) a. A(n) ka vin.
       1.  come.
       ‘I’m coming all the way.’ (‘progressive completive’)
    b. A(n) ka vini.
       1.  come.
       ‘I’m planning to come.’ (prospective)

Similarly, SFs with the irrealis marker ké or the past tense marker té may have a single event interpretation; the SF sav ‘know’ in (9a) has a single event interpretation, and the SF mèt ‘put’ in (10a) may receive either a single event or multiple events interpretation. By contrast, LFs combine with ké and té to express multiple events, as in (9b) and (10b).

(9) a. An pé ké sav konté.
       1.   know. count.
       ‘I won’t know how to count (on that occasion).’
    b. An pé ké savé konté.
       1.   know. count.
       ‘I won’t know how to count (in general).’

(10) a. I té mèt pima adan.
        3.  put. pepper inside
        ‘He/She put pepper in it (on that occasion / in general).’
     b. An té mété pima adan.
        1.  put. pepper inside
        ‘He/She put pepper in it (in general).’

This contrast is of course not obvious in cases in which the long and short forms are syncretized. The data in (11) exemplify syncretic verbs exhibiting meanings that are ambiguous between the single-event and the multiple-event interpretations. However, no prospective reading is available in (11a). Speakers typically use kay¹⁰ instead of ka to express the prospective in these contexts.

(11) a. An mangé kribich.
        1. eat./ crawfish
        ‘I eat/ate crawfish.’ (present single-event/past multiple-event)
     b. A(n) ka(y) dòmi.
        1.  sleep
        ‘I am sleeping.’ (‘progressive completive’ or prospective)
     c. Timoun-la té chanté on bel chanson
        child-  sing./  beautiful song
        ‘The child sang a beautiful song.’ (past single- or multiple-event)
     d. Pon moun pé ké bougé.
        no person   move./
        ‘No one will move.’ (irrealis single- or multiple-event)

A subclass of verbs shows different constraints: the SFs of the verbs meaning ‘to peep’, ‘to look’, and ‘to put/give/leave’ are only used as imperatives, as in (12); these reflect a more direct borrowing from French, with the exception of the form gay /gɛ/ (12b), apparently a creole neologism. A comparable behaviour is seen with fè ‘to do/make’, whose short and long forms discriminate between the active and the passive/causative, as in (13).

(12) a. Fou sa la!
        put. this here
        ‘Put this here!’ (rude)
     b. Gay bonda-la-sa!
        look. ass--
        ‘Look at this ass!’

(13) a. Manman a-’w ka fè mangé.
        Mother -’2.  make. food
        ‘Your mother is making food.’
     b. Mangé ka fèt.
        food  make.
        ‘Food is cooking.’

Finally, the verb ‘to give’ features semantic contrasts but also sandhi effects. With non-pronominal objects, we find both the form bay and the form ba combined with the irrealis marker ké, with the former encoding an irrealis single-event meaning (as in (14a)) and the latter encoding an irrealis multiple-event meaning (as in (14b)). With pronominal noun phrases, the form ba precedes a vowel-initial pronoun (14c) and ban, a nasal-initial pronoun (14d).

(14) a. An ké bay on tap.
        1.  give.  slap
        ‘I’ll slap you (on that occasion).’
     b. An ké ba on tap.
        1.  give.  slap
        ‘I’ll slap you (in general).’
     c. An ba-’w li.
        1. give.-’2. 3
        ‘I give/gave it to you.’
     d. Jan ban mwen tout lajan a-’y.
        1. give. 1. all money -’3.
        ‘John gives/gave me all his money.’

¹⁰ The form kay probably derives from the contraction of the TAM marker ka with ay (from the short form of the verb ‘to go’).

5.6.2.2 Derivational relations in Guadeloupean

Guadeloupean shows less verb alternation than Mauritian. When Guadeloupean verbs do have both an LF and an SF, deverbal nominalization seems to favour the LF as the verb’s base stem. Verbs having syncretic forms also give rise to deverbal nominalization. Both cases are illustrated in Table 5.9. Like Mauritian, Guadeloupean exhibits derived nominals that do not exist in French (see Table 5.9); such innovations reveal that deverbal nominalization is qualitatively productive in Guadeloupean. Guadeloupean grammar defines distinct syntactic distributions for a verbal lexeme’s long and short word forms; for some verbs, these are distinct forms (e.g., vini / vin ‘to come’) though for most, the two forms are syncretized. But even for verbs that do not exhibit a distinct SF, there is sometimes evidence for a distinct LF-stem with its own special distribution. A large number of verbs that lack distinct long and short forms have a present participle formed by means of a suffix -an; the examples in (15) illustrate. Examples of this sort exhibit an ambiguity similar to that observed for Mauritian in section 5.6.1.2: either -an attaches to the verb’s LF-stem or it attaches to the verb’s LF with prevocalic elision of the LF’s final vowel.

(15) ‘to lie’ → ‘lying’
     ‘to fight’ → ‘fighting’
     ‘to mix’ → ‘mixing’
     ‘to drink alcohol’ → ‘drinking’

Several operations of deverbal nominalization exhibit a similar pattern in Guadeloupean; these include the operation of -è /ɛ/ sufﬁxation, which forms agent nouns, and the operations of -aj and -asyon sufﬁxation, which form action


Table 5.9. Deverbal nominalizations in Guadeloupean

Verb                                              → Noun
Gloss                   Form                      by conversion           by suffixation
‘to come’               LF vini                   vini ‘arrival’
                        SF vin
‘to go out’             LF sòti                   sòti ‘outing’
                        SF sòt
‘to look’               LF gadé                   gadé ‘look’
                        SF gad
‘to constrain’          LF babouké
                        LF-stem babouk            babouk ‘constraint’     babouk-aj* ‘halt’
‘to fight’              LF goumé                  goumé ‘fight’
‘to joke around’        LF badiné
                        LF-stem badin                                     badin-è* ‘joker’, badin-aj* ‘joke’
‘to have fun’           LF chomé                  chomé ‘party’
                        LF-stem chom                                      chom-aj* ‘party’
‘to take advantage’     LF pwofité
                        LF-stem pwofi(t)          pwofi ‘benefit’         pwofit-asyon* ‘benefit’
‘to tease’              LF poupoulé
                        LF-stem poupoul                                   poupoul-man ‘teasing’

In a given row, each nominalization has that row’s verb form as its base stem.
*An asterisk marks a derived stem that is morphologically ambiguous, involving either (a) a base stem that is an LF-stem or (b) a base stem that is an LF whose final vowel undergoes prevocalic elision.

nouns; these operations are exemplified in Table 5.9, with additional examples in (16)–(18).¹¹ Here, too, the derivational suffix joins with either a verb’s LF-stem or, with elision, its LF.

(16) ‘to cuddle’ → ‘cuddler’
     ‘to stroll’ → ‘stroller’

(17) ‘to exchange’ → ‘exchange’
     ‘to unite’ → ‘union’

¹¹ The sufﬁxal derivatives in Table 5.9, in (15)–(17), and in (20) are cited from Villoing & Deglas (2016).


(18) ‘to annoy’ → ‘annoyance’
     ‘to follow’ → ‘pursuit/chase’

Villoing & Deglas (2016) pursue the assumption that such derivations involve a sandhi operation by which vowel hiatus is avoided through the prevocalic elision of an LF’s final é. Additional evidence, however, reveals that at least some cases cannot be attributed to prevocalic elision but must be seen as involving direct suffixation to a verb’s LF-stem. Consider, for example, the operation of -man suffixation, by which action nouns such as those in (19) are derived.

(19) ‘to tease’ → ‘teasing’
     ‘to hurry up/start moving’ → ‘moving/activating’
     ‘to separate’ → ‘separation’

As these examples show, deverbal nouns suffixed with -man also lack the final é of the verb’s LF. Here, however, the absence of the final é cannot be attributed to hiatus avoidance, since the suffix begins with a consonant. Moreover, nouns such as poupoulman ‘teasing’ have no counterparts in French and so cannot simply be inheritances from the lexifier. The only explanation is that they are productively formed in Guadeloupean through the direct suffixation of -man to a verb’s LF-stem. Moreover, Occam’s Razor favours the assumption that all of the operations in (15)–(19) involve direct suffixation to a verb’s LF-stem.

By maintaining a distinction between a verb’s SF and its LF-stem, we can arrive at a straightforward account of deverbal nominalizations such as those in (20) as well as denominal verb derivations such as those in (21). On the one hand, the deverbal nominalizations in (20) are conversions of a verb’s LF-stem to a noun; by contrast, the derivations in (21) are conversions of a noun to a verb’s LF-stem, to which the suffixal formative for a verb’s LF then attaches. This account contrasts with that of Villoing & Deglas (2016), who regard the derivations in (20) and (21) as involving processes of suffixation that induce elision rather than processes of conversion.

(20) ‘to flirt’ → ‘a flirt’
     ‘to offend’ → ‘an insult’
     ‘to stroll’ → ‘a stroll’

(21) ‘zouk’ → ‘to dance zouk’
     ‘Christmas’ → ‘to celebrate Christmas’
     ‘drizzle’ → ‘to drizzle’
     ‘refuge’ → ‘to take refuge’
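The difference between the elision analysis and the LF-stem analysis can be made concrete with a short Python sketch. It is an illustration only: the function names are hypothetical, the vowel inventory is a rough orthographic approximation assumed for the purpose, and the verbs and suffixes are taken from Table 5.9 (written here without the hyphen that the table uses to mark the suffix boundary).

```python
import unicodedata

E_ACUTE = "\u00e9"                               # é, the final vowel of the LFs used here
VOWEL_LETTERS = set("aeiou\u00e9\u00e8\u00f2")   # assumption: orthographic vowels a e i o u é è ò


def nfc(text):
    """Normalize to NFC so accented letters compare reliably."""
    return unicodedata.normalize("NFC", text)


def lf_stem(lf):
    """LF-stem of an LF in final é: the LF without that final vowel."""
    lf = nfc(lf)
    return lf[:-1] if lf.endswith(E_ACUTE) else lf


def suffix_with_elision(lf, suffix):
    """Elision analysis: final é is dropped only before a vowel-initial suffix."""
    lf, suffix = nfc(lf), nfc(suffix)
    if lf.endswith(E_ACUTE) and suffix[0] in VOWEL_LETTERS:
        return lf[:-1] + suffix
    return lf + suffix


def suffix_to_lf_stem(lf, suffix):
    """LF-stem analysis: the suffix attaches directly to the LF-stem."""
    return lf_stem(lf) + nfc(suffix)


EXAMPLES = [("badiné", "aj"), ("badiné", "è"), ("chomé", "aj"),
            ("pwofité", "asyon"), ("babouké", "aj"), ("poupoulé", "man")]

for lf, suffix in EXAMPLES:
    print(lf, suffix, suffix_with_elision(lf, suffix), suffix_to_lf_stem(lf, suffix))
```

For the vowel-initial suffixes the two functions return the same string, which is the ambiguity marked by asterisks in Table 5.9; only the consonant-initial suffix -man separates them, and only the LF-stem analysis yields the form listed in the table.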

Our analysis assumes the coexistence of deverbal nominalizations whose base stem is a verb’s LF (e.g., goumé ‘to fight’ → goumé ‘fight’) with those whose base stem is a verb’s LF-stem (e.g., LF-stem bas ‘to flirt’ → bas ‘flirt’). This analysis

OUP CORRECTED AUTOPAGE PROOFS – FINAL, 24/8/2020, SPi

    


predicts that a particular verb may give rise to two derived nominal stems, one based on the verb’s LF, the other on its LF-stem. This prediction is indeed borne out: […] ‘to win’ has two derived nominals, […] ‘victory’ (whose stem is the verb’s LF) and […] ‘win’ (whose stem is the verb’s LF-stem).

In summary, we assume that every verb has an LF-stem, even if it doesn’t exhibit distinct long and short word forms; for those that do, the SF shares the form of the LF-stem. Postulating an LF-stem for every verb offers a unified analysis of both denominal verb derivation and deverbal nominalization (whether by conversion or by the addition of a derivational suffix). On this account, Guadeloupean derivation shows a degree of complexity equivalent to that of French and Mauritian with respect to base-stem predictability. In Guadeloupean, a verbal lexeme’s base stem is its LF in some cases and its LF-stem in others; thus, base-stem predictability in the definition of deverbal nominalizations exhibits complexity of degree 2. By contrast, it is not clear that Guadeloupean deverbal nominalizations ever have a hidden form as their base stem; not even a verb’s LF-stem can be claimed to be hidden in view of its use in the formation of a present participle, an inflected form. Guadeloupean deverbal nominalizations therefore exhibit a base-stem restrictedness whose complexity is no higher than degree 1.

5.6.3 Haitian

5.6.3.1 Function of verb forms in Haitian

Only twelve out of 2,657 verbs excerpted from Valdman et al. (2007) alternate between a long and a short form (Table 5.10). The alternation is, according to Alleyne (1996), the result of a phonological reduction, or more precisely that of a syllabic reduction (Cadely 1994). The function of the alternation shows some similarities with both Mauritian and Guadeloupean. DeGraff (2001) argues that truncation occurs when verbs are followed by non-pronominal objects (22a) but fails when the verb is in sentence-final position (22b), has an extracted object (22c) or is followed by an adjunct (22d).

(22)
a. Mari gen kouraj.
   Marie have. courage
   ‘Marie has courage.’ (DeGraff 2007)
b. Tonton Bouki ap ale.
   uncle Bouki  go.
   ‘Uncle Bouki is leaving.’
c. Konbyen dan Tonton Bouki genyen?
   how_much tooth uncle Bouki have.
   ‘How many teeth does uncle Bouki have?’
d. Le klosh ape sone aster.
   the bell  ring. now.
   ‘The bells are ringing now.’ (Roberts 1999)


 ,  ,    Table 5.10. Verb alternations in Haitian Verb

SF

LF

́ ‘to go’ ́ ‘to look’ ò ‘to go out’  ‘to come  ‘to eat’ ̀ ‘to do/make’  ‘to give’

al gad sòt vin gen fè ba(n)

ale´ gade´ sòti vini genyen fèt bay

Notice that the behaviour in (22b) is also attested in Guadeloupean with the verb ale. The opposition fèt/fè ‘to do/make’ also occurs in both creoles. In addition, DeGraff (2001) claims that LFs are used for emphasis. He concludes that verb alternations in Haitian are an instance of inflectional morphology whose realization is determined by phonological phrasing and argumenthood.

5.6.3.2 Derivational relations in Haitian

Deverbal nominalization is evidently productive from a qualitative perspective in Haitian, since a number of derived nominal stems have no counterpart in French, for example those in (23).

(23)
a. […] ‘to run’ → […] ‘the action/result of running’
b. […] ‘to lie’ → […] ‘the action/result of lying’
(Lefebvre 1998)

Because very few verbs in Haitian exhibit an overt inflectional alternation between long and short forms, there are few cases of derivation where one can readily identify the choice of one alternant over the other. When cases of this sort do occur (typically in conversions), they involve the LF in some instances and the SF in others, as in Table 5.11.

Suffixal derivation of nouns from verbs often involves a vowel-initial suffix, as in (24); the existence of a sandhi rule eliminating vowel hiatus by means of stem-final vowel truncation might (as in Guadeloupean) be claimed to allow such derivatives to be based on a verb’s LF. But as in Guadeloupean, the noun-forming suffix -man does not create vowel hiatus; its appearance in post-consonantal positions therefore cannot be attributed to elision, but must be seen as the effect of direct suffixation to a verb’s LF-stem. In some cases (e.g., (25)), the resulting nominalization has no counterpart in French, and so cannot be seen as a direct inheritance from the lexifier. We must therefore assume that as in Guadeloupean, a Haitian verb’s LF-stem sometimes participates directly in the workings of its derivational morphology.


Table 5.11. Deverbal nominalizations in Haitian

Verb                     Verb form                    → Noun by conversion                        → Noun by suffixation
‘to come’                LF vini                      vini ‘arrival’                              –
                         SF vin                       –                                           –
‘to go’                  LF alé                       alé ‘departure’                             –
                         SF al                        –                                           –
‘to go out’              LF sòti                      sòti ‘going out’                            –
                         SF sòt                       –                                           –
‘to see’                 LF gadé                      –                                           –
                         SF gad                       gad ‘look’                                  –
‘to win, to gain’        LF genyen                    –                                           –
                         SF gen                       gen ‘gain’                                  –
‘to chat’                LF djòle                     –                                           –
                         hidden LF-stem djòl          –                                           djòl-è* ‘talker’
‘to cut up, to slice’    LF tranché                   tranché ‘labor pain, shoemaker’s knife’     –
                         hidden LF-stem tranch        –                                           tranch-man ‘pain’
‘to build’               LF bati                      –                                           –
                         special hidden stem batis    –                                           batis-man ‘construction (action)’

In a given row, each nominalization has that row’s verb form as its base stem.
*An asterisk marks a derived stem that is morphologically ambiguous, involving either (a) a base stem that is a hidden LF-stem or (b) a base stem that is an LF whose final vowel undergoes prevocalic elision.

(24)
a. […] ‘to bet’ → […] ‘a bet’
b. […] ‘to chat’ → […] ‘talker’
(Lefebvre 1998)

(25)
[…] ‘to chat’ → […] ‘a chat’¹² (DeGraff 2003)

VN compounds might seem to afford a parallel argument, since the verb in such compounds often appears to be an LF-stem; for example, […] ‘break’, […] ‘break’, and […] ‘walk’ all seem to be represented by their LF-stems in the compounds […] ‘a destructive individual’ (Fr. […]), […] ‘hard question’ (Fr. […]), and […] ‘stoop, steps to a house’ (Fr. […]); but these compounds all apparently originate in French, and evidence of the productivity of exocentric VN compounds is in general lacking in Haitian (Lefebvre 1998: 345).

¹² Nominalizations similar to kozman include for instance ajoutman ‘addition’, frapman ‘knocking’ and pledman ‘discussion, quarrel’, which are absent in contemporary French but found in Medieval French. DeGraff (2003: 69) rightfully argues that these might have been inherited from regional varieties spoken in the colonies in the seventeenth century.

A final parallel between Haitian and Guadeloupean pertains to denominal verbs. Verbs are apparently derived from nouns by means of a suffix -e, which sometimes produces verb forms having no counterpart in French. (The examples in (26) illustrate.) But as in Guadeloupean, these can instead be seen as instances of N→V conversion whose output is a verb’s LF-stem (in which case -e has the role of an LF-forming verb suffix); here again, distinguishing a verb’s LF-stem from its SF affords a more streamlined account of derivation.

(26)
a. […] (stem pansyon) ‘thought, anxiety’ → […] (LF pansyon-e) ‘to think, to ponder’
b. […] (stem makak) ‘stick’ → […] (LF makak-e) ‘to hit with a stick’
c. […] (stem bourik) ‘donkey, work horse’ → […] (LF bourik-e) ‘to work like a dog’
d. […] (stem tèk) ‘a hit (in marbles)’ → […] (LF tèk-e) ‘to hit a marble’
(Lefebvre 1998; DeGraff 2003)

It is clear that at least some Haitian verbs possess special hidden stems. Each of the verbs in (27) has a special hidden stem used in derivation (e.g., with the nominalizing suffix -man: vomis-man) but not in inflection. The productivity of this pattern of alternation is attested to by the fact that it gives rise to derivatives having no counterpart in French, as in (28).

(27)
[…] ‘to vomit’      […] ‘vomiting’
[…] ‘to refresh’    […] ‘refreshment’
[…] ‘to cool’       […] ‘cooling’

(28)
[…] ‘to build’      […] ‘construction (action)’¹³
[…] ‘to finish’     […] ‘end’
[…] ‘to thank’      […] ‘thanking’

Thus, relations of deverbal nominalization in Haitian are comparable in complexity to those of Mauritian and French. The base stem in deverbal nominalization is the LF for some verbs, the SF for others, the LF-stem for others, and a special hidden stem for still others. Base-stem predictability in the definition of deverbal nominalizations therefore attains complexity of degree 2. And given that the base stem in some deverbal nominalizations is a special hidden stem, base-stem restrictedness in the definition of these nominalizations likewise exhibits complexity of degree 2.

¹³ Finissement and bâtissement can be found in Medieval French, but not *remercissement.

Table 5.12. Complexity of derivational relations in French, Mauritian, Guadeloupean, and Haitian

                                                    French    Mauritian    Guadeloupean    Haitian
Degree of complexity in base-stem predictability    2         2            2               2
Degree of complexity in base-stem restrictedness    2         2            1               2

5.7 Conclusion

In this chapter, we have presented criteria for assessing the integrative complexity of a morphological system’s derivational relations, and we have applied these criteria in an analysis of derivational relations in Mauritian, Guadeloupean, and Haitian. We have demonstrated that each of these languages possesses deverbal nominalizations that are not a mere inheritance from the lexifier language but must be seen as the effect of a productive process within the creole itself. Moreover, we have shown that the derivational relations in these creoles attain the same degree of complexity as those of the lexifier; our results are summarized in Table 5.12. When a verb L is the base lexeme in a derivational relation, the identity of L’s base stem in L’s stem set is not, in general, predictable either in French or in Mauritian, Guadeloupean, or Haitian; moreover, the status of L’s base stem in the definition of L’s morphology may be as peripheral in Mauritian and Haitian as in French. These results challenge the extreme simplicity that has so often been attributed to creole morphology. We hypothesize that as further work is done on the morphology of creole languages, other sorts of derivational processes will be found to exhibit a comparable level of integrative complexity.

Acknowledgements

We would like to thank Jean-Michel Benjamin for his input on the Guadeloupean data.

OUP CORRECTED AUTOPAGE PROOFS – FINAL, 23/8/2020, SPi

6 Simplification and complexification in Wolof noun morphology and morphosyntax

Michele Loporcaro

6.1 Introduction

In this chapter, I will describe how Wolof noun morphology has become simplified, compared with the system that can be reconstructed for a previous stage through comparison with other Atlantic languages (the subdivision of the Niger-Congo family to which Wolof belongs). On the other hand, I will also show that, in some respects, Wolof noun morphology and especially morphosyntax has become more complex (more complex than in previous stages of the language and also more complex than usually assumed in the literature), acquiring new irregularities. The Wolof (and Atlantic) facts will be scrutinized against the background of recent research on linguistic complexity. Since the study is about the grammatical system and does not adduce any psycholinguistic evidence (from language usage and/or processing), I will be addressing what the relevant literature (e.g., Dahl 2004: 39; Miestamo 2008: 27; Sinnemäki 2008: 72; Lindström 2008: 217) labels ‘absolute complexity’, not what is sometimes called ‘relative complexity’ (Kusters 2008: 4–8), that is, memory cost/difficulty (Hawkins 2007).

The chapter is organized as follows: in section 6.2, I introduce the language and its classification; in section 6.3, I present the basics of the Wolof noun class system, which is then placed in its Atlantic context in section 6.4.¹ In section 6.5, I will briefly introduce the distinction between complexity and morphological richness, as defined in the literature on morphological complexity I take as a point of reference (in particular Baerman et al. 2010; 2015b; 2017; Dressler 2011), and how complexity and richness relate to morphological type. I then move on to consider complexifying changes which have affected Wolof noun morphology, changing many aspects of what must have earlier been a coherently agglutinative system into a system which, in addition to many properties going towards the isolating type, also developed some inflectional irregularities normally found in inflecting-fusional languages. The section also compares similar developments in other Atlantic languages, while section 6.6 addresses complexification in the paradigm of agreement targets. Finally, in section 6.7, I discuss whether the diachronic dynamics of change observed in the language may be explained in external terms, considering the sociolinguistic setting of the language and the nature of the speech community in which it is spoken.

¹ While the data from other Atlantic languages are drawn from the available literature, for Wolof available sources are complemented with first-hand data from the variety of Mbakke (Mbacke), lying about 150 kilometres east of Ndakaaru/Dakar, in the territory of the traditional kingdom of Bawol which is part of the Wolof heartland, the area on whose dialects the standard variety of Wolof is based. These were collected in cooperation with Cheikh Anta Babou, to whom I am indebted, and are presented in more detail in Babou & Loporcaro (2016). Glossing obeys the Leipzig glossing rules: in addition, […] indicates class marker (without numbering for Wolof, since contrary to other Niger-Congo languages mentioned in the chapter, there is no agreed-on numbering of noun classes in studies on Wolof).

Michele Loporcaro, Simplification and complexification in Wolof noun morphology and morphosyntax. In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Michele Loporcaro. DOI: 10.1093/oso/9780198861287.003.0006

6.2 Wolof and Atlantic languages

Wolof is the native language of four million (Lewis et al. 2015) to 4.5 million (Leclerc 2015) speakers, and the main inter-ethnic lingua franca among the thirteen million inhabitants of Senegal. It is also spoken in Gambia (about 226,000 speakers), where it is the second most spoken language after Mandinka; in Mali (62,000 speakers); in Mauritania (around 16,400 speakers); and in Guinea Bissau, as well as in migrant communities in Europe (France, Italy, and Spain) and the USA (mainly New York City).²

The evidence to establish change in Wolof is twofold. On the one hand, the language has been described thoroughly since the early nineteenth century (cf. Dard 1825, 1826, Boilat 1858, Kobès 1869, etc., with some news on relevant aspects of its structure available since as early as the late sixteenth century: cf. Doneux 1978: 45), so that changes leading to the present situation can be followed through the extant documents and descriptions. On the other hand, transcending this limited time-depth requires reconstruction, and this poses problems since the classification of Wolof within the Northern Atlantic branch of Niger-Congo is debated: the traditional view considers Wolof most narrowly related to Fula, and places Wolof/Fula, together with Seereer, in a Senegambian subdivision of Atlantic (cf. Sapir 1971: 47f; followed by Wilson 1989: 87f; Childs 2004, 2010: 36, etc.), while Doneux (1978: 43–5) and Segerer (2010: 4f) propose alternatively that the closest relative to Wolof is the Ñuun (also: Bagnoun, Bainuk, Baïnounk) language/dialect cluster (straddling Casamance, in Southern Senegal, the north of Guinea-Bissau, and Gambia), and Pozdniakov (2015: 58) lists Fula/Seereer, Buy/Nyun, and Wolof as three different branches of Northern Atlantic. Be that as it may, all the languages mentioned display a better-preserved noun class system of the Niger-Congo type than Wolof, a fact that must be kept in mind when reconstructing past changes leading to the grammatical system observed today.

² Occasionally, one comes across much lower figures in the literature: see, for example, Njie (1982: 16), reporting slightly more than one million speakers (‘le wolof se parle en Gambie et au Sénégal par un peu plus d’un million de personnes’). Higher figures (e.g., the 7.5 million reported by Perrin 2012: 11) are given by authors not drawing the distinction between native/L1 and vehicular/L2 usage of Wolof.

6.3 Wolof noun classes: the basics and the received view

In the rich literature on Wolof, the language is invariably described as featuring ten noun classes (henceforth abbreviated NCs), eight singular and two plural, marked on determiners and other noun modifiers occurring adnominally as well as pronominally.³ A complete list of the usually assumed classes is given on the horizontal dimension in (1), while (1a)–(1d) exemplify the larger list of class-marked function words:

(1) NC marker                       b-    g-    k-    j-    l-    m-    s-    w-    y-    ñ-
    a. proximal definite article    bi    gi    ki    ji    li    mi    si    wi    yi    ñi
    b. distal definite article      ba    ga    ka    ja    la    ma    sa    wa    ya    ña
    c. proximal demonstrative       bii   gii   kii   jii   lii   mii   sii   wii   yii   ñii
    d. distal demonstrative         bee   gee   kee   jee   lee   mee   see   wee   yee   ñee
    etc.
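The regularity visible in (1) can be spelled out in a short sketch. It assumes, purely for illustration, that every form in rows (1a)–(1d) is the class consonant followed by a fixed ending (-i, -a, -ii, -ee), which is what the listed forms show; the names `SINGULAR_MARKERS`, `PLURAL_MARKERS`, `ENDINGS`, and `paradigm` are hypothetical labels, not part of any established analysis.

```python
# Class consonants as listed in (1); the eight-singular/two-plural split
# follows the prose description of the received view.
SINGULAR_MARKERS = ["b", "g", "k", "j", "l", "m", "s", "w"]
PLURAL_MARKERS = ["y", "ñ"]

# Endings read off rows (1a)-(1d); the dictionary keys are illustrative labels.
ENDINGS = {
    "proximal definite article": "i",
    "distal definite article": "a",
    "proximal demonstrative": "ii",
    "distal demonstrative": "ee",
}


def paradigm(marker):
    """Return the four class-marked function words for one class consonant."""
    return {function: marker + ending for function, ending in ENDINGS.items()}


def full_table():
    """Return the forms for all ten classes, keyed by class consonant."""
    return {m: paradigm(m) for m in SINGULAR_MARKERS + PLURAL_MARKERS}


if __name__ == "__main__":
    for marker, forms in full_table().items():
        print(marker + "-", " ".join(forms.values()))
```

Each line printed by the sketch corresponds to one column of (1), for instance the b- class with bi, ba, bii, bee.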

Taking the proximal definite article, the following examples illustrate NC contrasts:

(2)
a. xarit b-i
   friend -.
   ‘the friend’
b. góor g-i
   man -.
   ‘the man’
c. nit k-i
   person -.
   ‘the person’
d. jëkkër j-i
   husband -.
   ‘the husband’
e. ndongo l-i
   disciple -.
   ‘the disciple’
f. njëngtéef m-i
   sorcerer -.
   ‘the sorcerer’
g. soxna s-i
   honourable lady -.
   ‘the honourable lady’
h. far w-i
   lover/fiancé -.
   ‘the lover/fiancé’

³ Cf., for example, Boilat (1858: 11ff); Rambaud (1898: 11); Delafosse (1927: 30f); Labouret (1935: 46); Gamble (1957: 134); Sauvageot (1965: 72–4); Stewart & Gage (1970: 392); Sapir (1971: 75); Irvine (1978: 43); Thiam (1987: 9); Fal et al. (1990: 17); Mc Laughlin (1997: 2); Munro & Gaye (1997: ix); Becher (2001: 42); Ndiaye (2004: 26); Camara (2006: 11); Diouf (2009: 153); Guérin (2011: 84); Tamba et al. (2012: 895); Torrence (2013: 16); Pozdniakov & Robert (2015: 548). The notion ‘noun class’ is used in different ways by different authors, within and beyond African language studies (see the discussion in Babou & Loporcaro 2016: 4–6).


i. xarit/jëkkër/ndongo/njëngtéef/soxna/far y-i
   friends/husbands/disciples/sorcerers/ladies/lovers -.
   ‘the friends/husbands/disciples/sorcerers/ladies/lovers’
j. góor/nit ñ-i
   man/person -.
   ‘the men/persons’

As usual in Atlantic languages, there is a disproportion between classes, in several respects: (a) a disproportion with respect to number, as there are eight singular classes as opposed to only two classes traditionally recognized for the plural: yi plurals ((2i)), and ñi plurals ((2j)); and (b) an imbalance in numerosity. The exhaustive list of ñi plurals (eleven lexemes in all, all denoting humans) is the following:

(3)
gaa/gan/géer/gor/góor/jaam/jigéen/mag/maggat/ndaw/nit ñi
people/guest/non-casted/free man/man/slave/woman/adult/old person/youngster/person .-.
‘the people/guests/non-casted/free men/men/slaves/women/adults/old people/youngsters/persons’

All the rest of the nouns take yi in the plural ((2i)). Likewise, in the singular the bi class in (2a) accounts for the vast majority of nouns, and has been constantly attracting new members, as schematized in (4) (based on Becher 2001: 42–52):

(4) Incidence of the bi class among singular nouns:
a. Nineteenth-century rural    44%                    Dard (1825), Kobès (1875)
b. Twentieth-century rural     64%                    Irvine (1978: 51)
c. Today, urban/Dakar          ‘for the most part’    Tamba et al. (2012: 894, n. 5)
d. Today, urban/Banjul         90%                    Becher (2001: 47f)

Its incidence has grown from less than 50% in nineteenth-century rural Wolof to near generalization in the contemporary urban language. As a result, the agreement pattern selected by most nouns in all varieties of Wolof is the one in (5) (singular bi/plural yi):⁴

⁴ This is the default agreement class (consisting of the two default NCs for singular and plural), both in lexical and in syntactic terms: lexically, loanwords are assigned bi/yi class membership (cf. Rambaud 1898: 22; Stewart & Gage 1970: 392; Guérin 2011: 83); syntactically, there are rules substituting yi for other plural markers under certain conditions (cf. Babou & Loporcaro 2016: 16, 31f).


(5)
a. buur b-i noppi na/*na-ñu . . . moom . . .
   /king .-. ready .3/-3 3
   ‘the king is ready; he . . . ’
b. wuur y-i noppi na-ñu/*na . . . ñoom . . .
   /king .-. ready -3/.3 3
   ‘the kings are ready; they . . . ’

Note that class-agreement is marked exclusively on determiners (boldfaced in (5)), while adjectives (which really are stative verbs in Wolof) do not mark class contrasts. Verb auxiliaries and pronouns mark person and number, not class.
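The default pattern described above can be restated as a small lookup sketch. It is only an illustration under simplifying assumptions: the lexical classes are those shown in (2a)–(2h), the ñi plurals are the forms listed in (3), any noun not listed falls back on the default bi/yi classes, and the optional substitution of yi for other plural markers is ignored. The names `SINGULAR_CLASS`, `NI_PLURALS`, and `definite_article` are hypothetical.

```python
# Lexical class consonants for the singular nouns in (2a)-(2h).
SINGULAR_CLASS = {
    "xarit": "b", "góor": "g", "nit": "k", "jëkkër": "j",
    "ndongo": "l", "njëngtéef": "m", "soxna": "s", "far": "w",
}

# The eleven noun forms listed in (3) as taking ñi in the plural.
NI_PLURALS = {
    "gaa", "gan", "géer", "gor", "góor", "jaam",
    "jigéen", "mag", "maggat", "ndaw", "nit",
}


def definite_article(noun, plural=False):
    """Return the proximal definite article for a noun form.

    Unlisted nouns receive the default classes: bi (singular), yi (plural).
    """
    if plural:
        return "ñi" if noun in NI_PLURALS else "yi"
    return SINGULAR_CLASS.get(noun, "b") + "i"


if __name__ == "__main__":
    print("xarit", definite_article("xarit"))           # singular, b class
    print("nit", definite_article("nit", plural=True))   # ñi plural
```

The fallback to `"b"` mirrors the statement that loanwords are assigned bi/yi class membership; a fuller model would also need the alternating stems discussed later in the chapter.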

6.4 Wolof within the Atlantic context

Thus, Wolof has moved far away from the pervasiveness of agreement typically observed in Niger-Congo, including Atlantic languages. Compare the Fula examples in (6), where the word for ‘king’ is class-marked itself and controls class-agreement on adjectives and function words; or the Baïnounk examples in (7), with class-agreeing demonstratives, adjectives, and numerals; or those from Diola-Fogny in (8), with class-agreement also on the verb (again, class markers are boldfaced for clarity):

(6) Pular, Fuuta Jaloo (Guinea; Diallo 2010: 80f):
a. lan-ɗo maw-ɗo mo yiiɗ-en on ko janan-o
   king-. old-. . see-.1  be foreigner-.
   ‘the old king we saw is a foreigner’
b. lan-ɓe maw-ɓe ɓe yiiɗ-en ɓen ko janan-ɓe
   king-. old-. . see-.1  be foreigner-.
   ‘the old kings we saw are foreigners’

(7) Baïnounk, Gubaher; Ñuun (Casamance, Senegal; Cobbinah 2010: 186)
a. bә-kәr ba-m-ba / bә-kәr-әŋ ba-naːk-aŋ
   -chicken -.- / -chicken- -two-
   ‘this chicken’ / ‘two chickens’
b. feːbi fa-dikaːm / feːbi-ɛŋ fa-naːk-aŋ
   goat -female / goat- -two-
   ‘female goat’ / ‘two goats’

(8) Diola-Fogny (Casamance, Senegal; Sapir 1965: 24, 90)
a. bu-bәːr-ә-b bә-mәk-ә-b bu-lɔlɔ
   9-tree--9 9.-big--9 9-fall
   ‘the big tree fell’
b. u-bәːr-ә-w wә-mәk-ә-w u-lɔlɔ
   8-tree--8 8.-big--8 8-fall
   ‘the big trees fell’


Since this pervasiveness of class marking on nouns and on different agreement targets is a property reconstructed for Niger-Congo, and for Atlantic, Wolof must have lost it. This amounts to a loss of complexity, under the view that redundancy adds to complexity, maintained by Dahl (2004: 10) among others:

the spread-out of information from a segment of the signal to its neighbours means that the mapping from input to output—and thus the system as such— becomes more complex. (Dahl 2004: 10)

Indeed, most of the changes in noun morphology and morphosyntax from Atlantic to Wolof produced simplification, in one way or the other: there has been loss of redundancy in agreement (as readily apparent from comparison of (5) with (6)–(8)), and reduction in the number of NCs (Proto-Atlantic had about fifteen NCs; Doneux 1975: 114), which amounts to loss in constitutional complexity, in Rescher’s (1998: 9) terms. We have also seen (in (4)–(5)) that there is a trend towards the generalization of the default NCs. These are the kinds of change the literature on Wolof tends to focus on. However, there were also changes which made the system more complex, leading to the rise of (previously absent) morphological irregularity (in static morphology, in Dressler’s 2011: 161 terms), both on nouns (with the rise of inflectional classes (ICs), untypical for agglutinating languages), and on agreement targets (rise of defective and otherwise irregular paradigms). These are the changes on which I am going to focus in what follows.

6.5 Complexification in Wolof noun inflection, against the background of Atlantic noun class systems

6.5.1 Morphological complexity vs. morphological richness

Niger-Congo languages on the whole have agglutinative morphology. In an ideally agglutinating language, as pointed out, for example, by Dressler (2011: 160), we expect to find less complexity than in languages of the inflecting-fusional type:

Strongly inflecting-fusional languages have a sizeable amount of morphological richness, but also many unproductive patterns, i.e. additional morphological complexity. Strongly agglutinating languages have much more morphological richness, but ideally no unproductive morphological patterns, a situation nearly completely obtained by Turkish. (Dressler 2011: 160)⁵

⁵ As is well known, in Turkish ‘there are no inflectional classes’ (Wurzel 1989: 74).


To recognize this, though, one has to distinguish complexity from richness of inflection:

I agree with Baerman et al. (2010: §1) that the size of a paradigm is not a primary criterion of complexity; it is [ . . . ] a criterion of morphological richness dependent on the importance of inflectional morphology in the morphology–syntax interface. (Dressler 2011: 160)

Under this view, the morphology of an ideally agglutinating language is rich, not complex. To mention just one crucial aspect, relevant for the present discussion, such a language, lacking inflectional classes, lacks ‘the additional structure imposed by inflectional morphology, above and beyond its dedicated task of expressing syntactic and semantic distinctions’ (Baerman et al. 2010: 1).

As a final remark to this section, note that the use of notions such as ‘agglutinating’ and ‘inflecting-fusional’ in morphological typology has been criticized, most influentially by Haspelmath (2009), who analyses what he calls the ‘Agglutination Hypothesis’ into three distinct indexes (the Cumulation, the Alternation, and the Suppletion Index) and takes it to be falsified by the fact that, on the whole, the languages in his sample score differently on the three. A language displaying one-to-one correspondence between form and meaning in inflectional morphology scores lower on the Cumulation Index than languages allowing for one-to-many correspondences. The ‘Alternation Index’, on the other hand, assigns 0 to languages ‘which exhibit complete stem invariance’, and higher values to languages showing more ‘stem alternations, that is, the (co-)expression of morphological categories by changing, rather than adding to, the stem’ (Haspelmath 2009: 17). The ‘Suppletion Index’, finally, is ‘defined as the average percentage of subcategories (per category-system) that exhibit affix suppletion’ (Haspelmath 2009: 22). Note that the only Niger-Congo language in the sample (Swahili) scores 0.1 on the Cumulation Index, while a paramount instance of an agglutinating language such as Turkish (Haspelmath 2009: 23) scores 0. Both Swahili and Turkish also score 0 on the Alternation Index. On the Suppletion Index, on the other hand, Turkish scores 23/100 and Swahili 28/100, which is far from 0 (Nivkh) but much closer to it than to the score reached by a typically ‘inflecting-fusional’ language like Latin (84/100). Thus, despite the scepticism Haspelmath airs about the usefulness of the ‘agglutinating’ vs. ‘inflecting-fusional’ distinction, his own data show that it is far from odd to qualify languages such as Turkish or Swahili as consistently agglutinating, for the purposes of the present study. More broadly, Haspelmath’s line of argument seems to be at odds with the notion itself of a ‘type’, whose legitimacy cannot be called into question by pointing to empirical objects which poorly fit the ideal instantiation of it, however defined, given that ‘linguistic types’ are ‘ideal constructs which natural languages approach to various degrees’ (Dressler 2005: 7).
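The comparison of Suppletion Index scores can be checked with a few lines of arithmetic. The sketch below uses only the four scores quoted in the text (out of 100); the dictionary `SUPPLETION_INDEX` and the function `closer_to` are hypothetical names introduced for illustration, and the distance measure (absolute difference) is an assumption about how ‘closer’ is to be read.

```python
# Suppletion Index scores (out of 100) as quoted from Haspelmath (2009).
SUPPLETION_INDEX = {"Nivkh": 0, "Turkish": 23, "Swahili": 28, "Latin": 84}


def closer_to(language, low="Nivkh", high="Latin"):
    """Return whichever reference language has the nearer score."""
    score = SUPPLETION_INDEX[language]
    distance_low = abs(score - SUPPLETION_INDEX[low])
    distance_high = abs(score - SUPPLETION_INDEX[high])
    return low if distance_low < distance_high else high


if __name__ == "__main__":
    for language in ("Turkish", "Swahili"):
        print(language, "is closer to", closer_to(language))
```

Turkish lies 23 points from Nivkh and 61 from Latin, Swahili 28 and 56 respectively, which is the sense in which both are ‘much closer’ to the lower end.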


6.5.2 The emergence of inﬂectional classes in Wolof Like Niger-Congo in general, Wolof too has agglutinating morphology, but this is today the case only in the verb, since the noun has become almost completely invariable, as reﬂected in the white dot (meaning ‘no distinct plural form’) on the WALS map 33 on noun plurality (Dryer 2013), a fact remarked ever since the earliest descriptions of Wolof.⁶ However, while such remarks and the WALS white dot are accurate for the overwhelming majority of Wolof nouns, uninﬂectedness has not yet triumphed completely. In fact, one of the rare lexemes still preserving two distinct forms, that is, buur ‘king’, has already been displayed in (5). The same is the case for about twenty nouns (listed in (9)), whose singular and plural differ because of an alternation in the initial consonant:⁷ (9)

 a. mbaam mi mbootaay mi ndono li ndab li ndënd mi ngàttaan mi b. mbagg mi c. baaraam bi boroom bi buur bi buy bi d. pepp mi e. këf ki

 baam yi bootaay yi dono yi dab yi dënd yi gàttaan yi wagg yi waaraam yi woroom yi wuur yi wuy yi fepp yi yëf yi

Gloss ‘donkey’ ‘piggyback’ ‘heritage’ ‘utensil’ ‘drum’ ‘short one’ ‘shoulder’ ‘ﬁnger’ ‘owner’ ‘king’ ‘baobab fruit’ ‘grain’8 ‘thing’

⁶ On noun invariability in Wolof, see the early remarks by Dard (1826: 14): ‘Mais si le nom n’est pas suivi de la préposition ou, on ajoute après ce nom les articles ya, yi, you, sans jamais rien changer dans son orthographe’ [‘But if the noun is not followed by the preposition ou, one adds after this noun the articles ya, yi, you, withouth ever changing anything in its orthography’]. Similarly, Boilat (1858: 7) points out: ‘En Wolof, les noms ne changent pas de terminaison dans les différentes combinaisons que leur fait éprouver le discours, pas même en passant du singulier au pluriel’ [‘In Wolof, nouns do not change ending in the different combinations in which discourse places them, not even when they change from singular to plural’]. Thus, ‘le substantif est invariable’ [‘the noun is invariable’] (Boilat 1858: 11). ⁷ The alternations—as described in Sauvageot (1965: 74); Diagne (1971: 79); Diouf (2009: 155); Camara (2006: 7–8), etc.—may take different forms, illustrated in (9). The proximal form of the deﬁnite article—already seen in (1)–(2)—is added after each word form, to indicate that the two occur in distinct environments (thus glosses expand to ‘the x/the x’s right here’). ⁸ Camara (2006: 8) also reports pan/fan ‘day/days’, showing the same p-/f- consonant alternation as in (9d). However, this paradigm is no longer attested in Mbakke Wolof, where the formerly plural form fan has generalized and is used for singular as well: for example, benn fan jàll na ‘one day has passed’. The lexeme fan is reported as invariable also in also Fal et al.’s (1990: 70) dictionary: fan wi ‘the day’/ ñaari fan ‘two days’. The older singular form pan still occurs only in the ﬁxed expression weer-u benn pan ‘the ﬁrst day of the month’ (literally ‘crescent-. one day’).


  f. bët bi     gët yi     ‘eye’
     bëñ bi     gëñ yi     ‘tooth’
  g. loxo bi    yoxo yi    ‘hand, arm’
  h. waa ji     gaa ñi     ‘guy’

For most lexemes, this difference is today only optional since, with the sole exception of këf ‘thing’, the singular form may, and indeed tends to, be used in plural contexts, while the reverse is not the case (see Guérin 2011: 85; Babou & Loporcaro 2016: 10). Once uninflectedness is generalized, noun morphology will have become simplified again. As long as paradigms such as those in (9) survive, however, they represent an increase in morphological complexity, brought about by changes which introduced morphological irregularity of the sort familiar from inflecting-fusional languages: in other words, the data in (9) are evidence for the occurrence of (residual) inflectional classes in Wolof. Note also that free variation in the plural cell of those noun lexemes results in overabundance (Thornton 2011; Meakins & Wilmoth, Chapter 4, this volume), that is, variation between two cell-mates (Loporcaro & Paciaroni 2011: 420), thus contributing to a local increase in complexity, if only an ephemeral one, on the way towards simplification.

6.5.3 Agglutinative noun-class morphology and inflectional classes in other Atlantic languages

The initial consonant alternations defining these inflectional classes are the last remnants of two distinct but intertwined processes which are observed, with varying degrees of regularity, in the neighbouring Atlantic languages, and specifically in those to be considered as representative comparator languages from the North Atlantic branch under either classification hypothesis for Wolof (see section 6.2), that is, either Fula and Seereer or Ñuun. Of the two processes, one is morphological (NC-prefixation), the other morphonological (initial consonant mutation). Integration of initial consonant mutation into the NC system is an innovation that is currently reconstructed for Proto-Northern Atlantic (see Pozdniakov 2015: 60), even if not preserved in all daughter languages: in Ñuun languages, ‘the system is barely operative now, but can be partly reconstructed’ (Wilson 2007: 86), and the same is true of Wolof, as discussed in (18)–(19) below. In Fula and Seereer, by contrast, the consonant mutation system itself and its interaction with NCs are well preserved. As an illustration, consider the word koor ‘man’ in Seereer-Siin (or Siin-Gandum, the most conservative variety of Seereer in this respect, spoken in the Sine region of Senegal; see Faye 2013: 3, 9). This nominal root may occur, with distinctive morphology, in several of the sixteen NCs of the language (see Mc Laughlin 2000: 336), eleven of them displaying overt class prefixes, five lacking them, all selecting class-marked enclitic determiners, thus generating word forms such as the following (see also Mc Laughlin 1997: 6):
(10)

Seereer-Siin: koor ‘man’

o-koor-oxe       Class 1     singular
goor-we          Class 2     plural
o-ŋgoor-oɴɢe     Class 12    diminutive singular
fo-ŋgoor-ne      Class 13    diminutive plural
a-ŋgoor-ale      Class 3b    augmentative singular
(-)man-

In (10), one observes mutation of the stem-initial consonant, exemplified here with koor ‘man’, which appears as koor, goor, or ngoor, depending on the class: ‘Stem-initial consonant mutation in Seereer-Siin is morphologically conditioned by noun class in nouns and dependent adjectives’ (Mc Laughlin 2000: 335). Fula, on the other hand, has twenty-one to twenty-five NCs, depending on the dialect,⁹ and, having lost all NC prefixes, contrasts NCs by means of suffixes,¹⁰ on nouns as well as on agreement targets, resulting in very elaborate paradigms. The initial consonant of both stems and suffixes is subject to mutation, whose effects are exemplified in (11)–(12) with data from the dialect of Gombe (Northern Nigeria), excerpted from the detailed account offered by Arnott (1970: 79–109):
(11)

Fula, Gombe, N. Nigeria (Arnott 1970: 87). Suffix grades, lexically selected (invariable stems):

Grade A          Grade B      Grade C      Grade D          Class   Gloss (grammatical)
ɓoy-re           leemuu-re    tummu-de     loo-nde          9       ‘x’
ɓoy-e            leemuu-je    tummu-ɗe     loo-ɗe           24      ‘x’s’
ɓoy-el           leemu-yel    tummu-gel    loo-ŋgel         3       ‘small x’
ɓoy-um           leemu-yum    tummu-gum    loo-ŋgum         5       ‘worthless little x’
ɓoy-on           leemu-hon    tummu-kon    loo-kon          6       ‘small x’s’
ɓoy-a            leemu-wa     tummu-ga     loo-ŋga          7       ‘big x’
ɓoy-o            leemu-ho     tummu-ko     loo-ko           8       ‘big x’s’
‘baobab fruit’   ‘orange’     ‘calabash’   ‘storage pot’            Gloss (lexical)
ɓoy-             leemu(u)-    tummu-       loo-                     Stem

⁹ For the Senegalese variety of Pulaar, Mc Laughlin (1997: 7) describes twenty-one NCs, while twenty-two are reported for the variety described by Sylla (1982: 31) and twenty-five for the Gombe dialect (Northern Nigeria) described by Arnott (1970: 75).

¹⁰ This ‘affix renewal’ is not restricted to North Atlantic, since also in ‘at least one language of South Atlantic, Kisi, the normally prefixed NCMs [= noun class markers] are suffixed’ (Childs 2009: 117; see Childs 1983 and the recent discussion by Di Garbo 2014: 80).

The horizontal dimension shows grade alternation in suffixes, while on the vertical dimension an arbitrary selection of NCs is offered for illustration. For nominal stems, the grade depends on the class, which in turn correlates largely (though not perfectly, see Arnott 1970: 73) with the semantics, as shown in the gloss column on the right-hand side: thus, for instance, class 24 hosts word forms which are plural to class 9; class 3 is the corresponding diminutive singular, which pluralizes in turn as class 6; class 5 is diminutive/pejorative; and so on. For suffixes, by contrast, the grade is lexically selected by the (lexical specification of the) stem. The data in (11) exemplify invariable stems, where only class suffixes vary according to the class-dependent consonant grade, while the noun stem stays the same because its initial consonant is an invariable one, not involved in consonant mutations, which are observed here only on suffixes. Thus, for instance, in class 9 the forms -re, -de, -nde, marking different grades, are related morphonologically to each other via mutation, and are selected by the individual noun lexemes so that, for example, ‘baobab fruits’ cannot be *ɓoy-je/-ɗe (i.e., cannot take plural class 24 suffixes of grades B–D) because of lexical specification. The nouns in (12), by contrast, exemplify what Arnott (1970: 93) calls ‘variform’ stems (only some consonant alternations are displayed here, as selected by grades A, C, and D; in other words, (12) displays an arbitrary selection, not only of noun classes, but also of grades and consonant alternations; the reader is referred to Arnott’s description for a full account of the intricacies of this fascinating system):
(12)

Fula, Gombe, N. Nigeria (Arnott 1970: 98). Consonant alternation in noun stems of different grades:

Grade A       Grade A    Grade C    Grade D     Suffix grade (selected)
r/d/nd        w/b/mb     w/g/ŋg     y/g/ŋg      C-alternation on stem
                                                Class   Gloss (grammatical)
dim-o         beer-o     gor-ko     gim-ɗo      1       ‘x’
rim-ɓe        weer-ɓe    wor-ɓe     yim-ɓe      2       ‘x’s’
dim-el        beer-el    gor-gel    gim-ŋgel    3       ‘small x’
dim-um        beer-um    gor-gum    gim-ŋgum    5       ‘worthless little x’
ndim-on       mbeer-on   ŋgor-kon   ŋgim-kon    6       ‘small x’s’
ndim-a        mbeer-a    ŋgor-ga    ŋgim-ŋga    7       ‘big x’
ndim-o        mbeer-o    ŋgor-ko    ŋgim-ko     8       ‘big x’s’
‘free man’    ‘host’     ‘man’      ‘person’            Gloss (lexical)
rim-          weer-      wor-       yim-                Stem

For instance, the first two stems rim- ‘free man’ and weer- ‘host’ select the same class suffixes (both grade A) but differ in the initial consonant, while the other two, wor- ‘man’ and yim- ‘person’, select allomorphs of the class suffixes which differ from each other, apart from some syncretisms (seen in classes 6 and 8). Thus, for instance, dim-o, gor-ko and gim-ɗo all display what is morphologically the same class 1 suffix, but in different allomorphs.


In other words, what we have here are different inflectional classes, in spite of the overall agglutinating character of Fula morphology. As regards the selection of the forms of each NC suffix, the Fula situation is closer to that of an inflecting-fusional language like Italian, with ICs, than to that of a strongly agglutinating language like Turkish, without inflectional classes, as schematized in (13):
(13)

Inflectional classes in Fula?

                                                                        a. Turkish   b. Fula   c. Italian
i.  alternative forms of the inflections are related phonologically         +                      –
ii. alternative forms of the inflections are selected phonologically        +           –          –

Turkish has no inflectional classes, since the alternants of each affix are selected phonologically (e.g., ev/ev-ler ‘house/-s’ vs. yol/yol-lar ‘trip/-s’, with plural -ler/-lar depending on the front/backness of the root vowel), while Italian has them, because nouns such as cane/can-i ‘dog(M)-SG/-PL’ vs. lup-o/lup-i ‘wolf(M)-SG/-PL’ take different singular endings, not derivable from each other phonologically ((13i)), due to lexical specification ((13ii)).¹¹ In Fula too, ‘there seems no advantage in treating all suffixes of each class as morphophonemic variants of a single class suffix’ (Arnott 1970: 68). In fact, while in some cases one observes, between different suffix grades, alternations that could be accounted for through independently valid morphonological rules of the language (e.g., the alternation between voiced and voiced prenasalized stops between Grades C–D in Classes 3, 5, or 7), this cannot be generalized, since, for example, in Class 1 -ko (Grade C) and -ɗo (Grade D) are not related morphonologically. Thus, Fula differs in this respect from an ideally agglutinative language such as Turkish and rather resembles Italian, where inflections are selected depending on inflectional class (a lexeme-inherent, purely morphological property) and are not derived by morphonological rule from one another. In sum, there is no alternative but to recognize the occurrence of inflectional classes in Fula too, though this, as highlighted in Babou & Loporcaro (2016: 44), is a descriptive notion which is hardly used in the grammars of Atlantic languages. More generally, Atlantic languages offer interesting evidence for the rise of inflectional classes within an agglutinating system.¹² This applies also to the Ñuun language/dialect cluster, the alternative closest relevant comparator languages for Wolof under the second classification hypothesis in section 6.2. For Baïnounk, as shown for different dialects by Sauvageot (1967), Bao Diop (2015; on Baïnounk Gunyamolo) and Cobbinah (2010; on Baïnounk Gubaher), one has to assume inflectional classes, since not all nouns are subject to singular and plural formation via NC prefixes. Rather, in Baïnounk Gubaher (spoken in the village of Djibonker, south of Ziguinchor, in the Casamance), analysed by Cobbinah (2010: 182–7), only one subset of the noun lexemes forms singular and plural prefixally ((14a)), while another substantial subset displays suffixal plurals formed with a default suffix -Vŋ, and divides into a group with plural suffix only ((14b)) and a mixed group combining a plural class-marked prefix and the plural class-neutral suffix ((14c)):¹³
(14)

Baïnounk Gubaher (Cobbinah 2010: 182–7)
a. prefixal class marking, paired for singular and plural: for example, ra-maːsix ‘crab’/plural ɟa-maːsix
b. no prefix in the singular; plural suffix (class-neutral -Vŋ): for example, bəːb ‘father’/plural bəːb-əŋ ‘fathers, old men’
c. prefixal class marking in the singular; plural with prefix and class-neutral suffix: bə-kər ‘chicken’/plural bə-kər-əŋ

While (14a) mirrors the inherited Niger-Congo noun inflection, the rest is the product of a series of innovations (e.g., the prefixes occurring in type (14c) nouns ‘do not occur as singular prefixes in the paired prefixed groups or if so then only very rarely’; Cobbinah 2010: 186), which makes it necessary to recognize different inflectional classes, as schematized in (14), even if the combination of morphs in noun word forms has largely remained agglutinative, rather than fusional, in nature. This evidence could be multiplied, another case in point being, for example, Diallo’s (2010; 2014: 151–81) study of the adaptation of borrowed Mande nouns leading to the creation of inflectional classes (not present in the native lexicon) in Fuuta-Jaloo Pular, the Fula variety spoken in the Fuuta-Jaloo area in Guinea. This shows that all over the area a trend towards the creation of allomorphy in nominal paradigms (and of new inflectional class distinctions) is observed.

¹¹ Here, an editorial comment asked: ‘why not analyse -o/-e as part of the stem truncated before plural -i?’. This corresponds to Scalise’s (1983: 293–4) vowel deletion rule, and the choice between the two analyses is indeed a handbook topic in Italian morphology: the reader is referred to Thornton (2005: 160), who shows that this readjustment rule becomes superfluous under a word and paradigm approach to morphology.

¹² An anonymous reviewer comments that, with the present discussion, ‘The author seems to suggest that inflectional classes of nouns are an innovation in the history of individual languages’. Actually, one must recognize ICs for previous stages of Atlantic languages: as observed in n. 16, the same mechanisms of consonant gradation responsible for IC-contrasts in Fula are currently assumed for earlier stages of Wolof as well. However, this is orthogonal to the fact that new morphological irregularities, defining (new types of) ICs, can be shown to have arisen, as is the case with the stem alternations in (9), which define (residual) ICs (a) of a kind different from that reconstructed for earlier stages of Atlantic, and (b) that are not usually recognized in the literature, before Babou & Loporcaro (2016).

¹³ Pozdniakov (2015: 79–82) reviews pluralizing suffixes (-Vn/ŋ) from different Atlantic languages, suggesting that they may be etymologically related to the plural class marker for humans reflected in Wolof as ñ-.
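The contrast drawn in (13) between phonologically and lexically selected allomorphs can be made concrete with a small sketch. It is only an illustration under stated assumptions: the Turkish rule is reduced to the front/back distinction mentioned above, the Fula data are limited to three stems and two classes copied from (11), and all function and variable names are hypothetical.

```python
# Illustrative sketch only: forms are those cited in the text and in (11);
# every name below is hypothetical.

FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")


def turkish_plural(noun: str) -> str:
    """Choose -ler/-lar from the last vowel of the noun (phonological selection)."""
    for ch in reversed(noun):
        if ch in FRONT_VOWELS:
            return f"{noun}-ler"
        if ch in BACK_VOWELS:
            return f"{noun}-lar"
    raise ValueError(f"no vowel found in {noun!r}")


# Fula (Gombe): suffix allomorphs per class and grade, copied from (11).
FULA_SUFFIXES = {
    9: {"A": "re", "B": "re", "C": "de", "D": "nde"},
    24: {"A": "e", "B": "je", "C": "ɗe", "D": "ɗe"},
}

# The grade cannot be computed from the shape of the stem: it is listed
# stem by stem, which is what makes it an inflectional-class property.
FULA_STEM_GRADE = {"ɓoy": "A", "tummu": "C", "loo": "D"}


def fula_form(stem: str, noun_class: int) -> str:
    """Build a Fula word form by looking up the stem's lexically listed grade."""
    grade = FULA_STEM_GRADE[stem]
    return f"{stem}-{FULA_SUFFIXES[noun_class][grade]}"


print(turkish_plural("ev"), turkish_plural("yol"))
print(fula_form("ɓoy", 9), fula_form("tummu", 24), fula_form("loo", 9))
```

The Turkish function needs no lexicon at all, whereas the Fula function fails for any stem that is missing from the grade table, which mirrors the difference between rows (13i) and (13ii).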


6.5.4 The complexification of Wolof noun inflection

As seen in section 6.5.3, Wolof is thus not the only Atlantic language to have developed morphological irregularities of the kind found in fusional languages. Since such irregularities add to morphological complexity (section 6.5.1), one must recognize that even the morphological system of Wolof, much less rich than those seen in section 6.5.3, has developed new forms of complexity. Recapitulating so far, the marking of NC contrasts in the North Atlantic languages considered above can be summarized as follows (after Mc Laughlin 1997: 7, with one small modification):¹⁴

(15) Class markers in some North Atlantic languages (Mc Laughlin 1997: 7, revised)
a. Seereer-Siin √ √ √
b. Fula √ √ √
c. Wolof (traces) (traces) √

¹⁴ The modification consists in indicating the occurrence of traces of earlier prefixes for Wolof: see (9) as well as the diachronic data in (16)–(17). In particular, I am non-committal about Mc Laughlin’s distinction between ‘clitic determiners’ and ‘independent determiners’, a distinction one anonymous reviewer finds fault with: ‘I have serious doubts about the validity of the distinction between “clitic determiners” and “independent determiners”.’

As seen for Fula in (11)–(12), in this language consonant mutations and suffixation (which replaced prefixation in the affix renewal process: see n. 10) are involved in lexically conditioned allomorphy defining inflectional classes. Some remnants of this situation persist in Wolof ((15c)), though this language has neither class prefixes nor class-marked clitics nor suffixes but, in its present state, marks NC only on determiners. These remnants are the singular/plural alternations in (9), which concerned many more lexemes in the nineteenth century, as shown in (16), listing lexemes which have now lost consonant alternation but still had it according to nineteenth-century sources:

(16) Becher (2001: 50f): nouns with allomorphy in Boilat (1858) and Kobès (1875)

Singular      Plural        Gloss       Singular/plural today (Fal et al. 1990)
banta bi      wanta yi      ‘stock’     bant bi/yi ‘bit of wood’
badoolo mi    wadoolo yi    ‘peasant’   baadolo bi/yi
bakan bi      wakan yi      ‘nose’      bakkan bi/yi
bopa bi       gopa yi       ‘head’      bopp bi/yi
garab gi      yarab yi      ‘tree’      garab gi/yi

Further language-internal evidence comes from the indefinite article, which is the only noun determiner to occur categorically in pre-nominal position (while the definite article and demonstratives normally follow the noun, though demonstratives can also be preposed), and the only one to display the class marker after, rather than before, its class-invariable part. According to Doneux (1975: 49), this doubly exceptional distribution arose via reanalysis of earlier prefixes, reconstructed as seen in (17a):
(17)

Doneux (1975: 49): Wolof prenominal indefinite article < former class prefix on noun
a. a-b sëriñ ‘a healer’ < *a-b-sëriñ
b. sëriñ b-i ‘the healer’ < *bi-sëriñ b-i (bixirim, AD 1594; Ferronha 1994: 24f)

Converging documentary evidence for earlier prefixes, seen in (17b), comes from a Portuguese voyager, who, writing in 1594, calls bixirim what is today sëriñ b-i ‘the healer’, which is evidence, as Doneux comments, ‘qu’un préfixe (probablement figé) était encore utilisé à cette époque’ [‘that a (probably frozen) prefix was still in use at that time’] (Doneux 1975: 45). While in this lexeme, as in most Wolof nouns, the prefix has simply been dropped, one may argue that some of today’s irregular singular/plural alternations in Wolof (seen above in (9)) show the traces of former class prefixes, which have become fused with the stem, as observed also in other Atlantic languages.¹⁵ Among those irregular alternations, some others come instead from consonant mutations, which are regularly involved in NC inflection in other Atlantic languages (cf. (15a–b) and the examples above in (10)–(12)). In Wolof, consonant mutation is still regular in some derivational processes, such as diminutive or deverbal noun formation:
(18)

a. diminutive formation:
   garab gi ‘the tree’           →   ngarab si ‘the little tree’
   janq bi ‘the little girl’     →   njanq si ‘the very little girl’

b. deverbal noun formation:
   digël ‘advise’                →   ndigël li ‘the advice’
   jang ‘study’                  →   njang mi ‘the education/knowledge’

The overall mutation pattern, as observed in today’s derivational morphology, is as follows: (19)

Wolof consonant mutations (Mc Laughlin 1997: 4):
a. base/non-diminutive       b    d    j    g    s   x   ʔ
b. derivative/diminutive     mb   nd   nj   ng   c   q   k
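The pairing in (19) amounts to a lookup on the initial consonant, which can be sketched as follows. This is a minimal illustration that only covers the stem-initial change: the accompanying change of class marker seen in (18) (e.g. gi to si) is not modelled, and the function name is hypothetical.

```python
# Mutation pairs copied from (19): base grade -> derivative/diminutive grade.
MUTATION = {"b": "mb", "d": "nd", "j": "nj", "g": "ng", "s": "c", "x": "q", "ʔ": "k"}


def mutate_initial(word: str) -> str:
    """Return the word with its initial consonant in the mutated grade.

    Words whose first symbol is not listed in (19) are returned unchanged.
    """
    if not word:
        return word
    first = word[0]
    return MUTATION.get(first, first) + word[1:]


# The four derivations cited in (18).
for base in ("garab", "janq", "digël", "jang"):
    print(base, "->", mutate_initial(base))
```

Because every base consonant in (19) is written with a single symbol, inspecting the first character is enough; a treatment of the mutated grades as input would need to match digraphs such as mb or ng first.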

In noun inflection, however, there is no regular mechanism of consonant mutation, contrary to Seereer-Siin and Fula ((15a–b)), but inflectional alternations that are nowadays irregular, such as (9a–d) and maybe (9h), can be interpreted as remnants thereof.¹⁶ Conversely, alternations such as këf/yëf ((9e)) must go back to original prefixes, as suggested by alliteration with the class-marked determiners, while they cannot possibly come from consonant mutation because, as seen also in (18)–(19), this only involves homorganic consonants in all Atlantic languages: ‘the range of variation, called a , is always restricted to homo-organic consonants, e.g. f/p/mp’ (Sapir 1971: 65). By the same token, one can argue that, e.g., pepp mi/fepp yi (9d) ‘the grain/-s’ may have arisen as an instance of a nowadays lost type of consonant mutation. Summing up, the regular mechanisms occurring elsewhere in the noun inflection of other Atlantic languages, consonant mutation and class prefixation, have been conflated into a synchronic system for which one has no other choice but to assume (residual) inflectional classes, that is, the kind of morphological complexity usually occurring in inflecting-fusional languages.

¹⁵ This has been remarked by many scholars: cf. Pozdniakov & Robert (2015: 551) for a recent recapitulation. As for other Atlantic languages, see, for example, Cobbinah (2010: 189) on the so-called ‘literal alliterative concord’ in Baïnunk: ‘the disputed elements [ . . . ] are archaic noun class morphemes in different stages of fusion with the stem’.

6.6 Complexification in Wolof: paradigmatic irregularity in some agreement targets

Concluding section 6.4, I mentioned changes which led to the rise of morphological irregularity also in the paradigm of agreement targets: in fact, in the indefinite article, some defective and otherwise irregular paradigms have been created in Wolof, which are not inherited from Proto-Atlantic. This boils down to an increase in formulaic complexity (descriptive and generative), in Rescher’s (1998: 9) terms. To see this, however, we have to abandon morphology proper and consider morphosyntax, since agreement is a crucial criterion to establish the irregular paradigms I will be concerned with. The agreement facts at stake crucially involve the recognition (as in Babou & Loporcaro 2016) of two additional NCs in the plural (the plural markers j- and s- in (20b)) with respect to the current view ((1), repeated here in (20a)):
(20)

a. Wolof: eight singular and two plural classes (traditional analysis):
   NC marker   singular: b-  g-  k-  j-  l-  s-  m-  w-      plural: y-  ñ-

b. Wolof: eight singular and four plural classes (Babou & Loporcaro 2016):
   NC marker   singular: b-  g-  k-  j-  l-  m-  s-  w-      plural: y-  ñ-  j-  s-

¹⁶ See Pozdniakov (1993: 85) and Pozdniakov & Robert (2015: 552f) for a reconstruction of the set of initial consonant mutations, richer than the one still observed today in (19), involved in NC-related alternations in an earlier stage of Wolof.

The singular/plural pairings of NCs traditionally recognized, even in the most accurate treatments available before Babou & Loporcaro (2016), are schematized in (21a–b) (from Guérin 2011: 84, who highlights that most singular classes combine with both the traditionally recognized plurals, thus resulting in (21b) rather than (21a)), while (21c) schematizes Babou & Loporcaro’s (2016) account:¹⁷
(21)

[Diagram of singular/plural NC pairings in three panels: (a) Expected pairings; (b) Observed pairings; (c) Observed pairings. Singular classes: k-, g-, j-, m-, s-, l-, b-, w-. Plural classes: ñ-, y- in (a) and (b); ñ-, y-, j-, s- in (c).]

¹⁷ Singular/plural pairings of NCs define distinct genders: cf. Corbett’s (1991: 190f) analysis of Wolof and Fula.

Note preliminarily that several of the pairings in (21), as well as several of the NCs themselves, are established on the basis of small numbers of lexemes. This ‘inquorate’ character (in Corbett’s 1991: 170–5 terms) is, however, a normal situation in Atlantic languages, as remarked by Ferry & Pozdniakov (2001: 166):

Il est faux de penser que chaque appariement de classe nominale, faiblement représenté, refléterait un figement ou la disparition de prefixes ayant existé. Les langues atlantiques se caractérisent par un trait particulier: on y rencontre souvent une classe spéciale ne comportant que deux ou trois noms ou même un seul. [ . . . ] Chaque langue atlantique présente au moins un mot ayant un accord statistiquement rare, irrégulier, qui traduit une notion sélectionné et marquée dans cette culture précise.

[It is wrong to think that each weakly represented NC pairing reflects the fixation or the disappearing of prefixes that once existed. Atlantic languages are characterized by a particular feature: in these languages, one often comes across a special class featuring no more than two or three nouns, or even just one. [ . . . ] Each Atlantic language displays at least one word that has a statistically rare, irregular agreement pattern, which translates a selected and specific notion in that very culture.]

Thus, if a consistent syntactic behaviour, distinct from that of other NCs, can be identified for a set of nouns, however small, this must count as evidence to establish a separate NC. This is what Babou & Loporcaro (2016) did for two additional NCs, the plural classes ji and si. These are homophonous with two singular classes, but must be kept distinct from them because they differ in the agreement they trigger. This is a methodological principle that holds in general and is standardly applied also in studies of the Atlantic languages. For example, consider Arnott’s (1970: 72) account of the two homophonous ko classes of Gombe Fula (classes 20 and 8), one singular, one plural, distinguished by agreement:

There are two ko classes (8 and 20), with agreement marked by -o, -ho, -ko, ko-, ko elements, etc.; but they are distinguished (i) by the different category of initial consonant in full nominals (F-category in class 20, N-category in class 8 [ . . . ]), and (ii) by the different pattern of agreement with verbal radicals [ . . . ], class 20 being a singular class requiring F- or P-category initial in the verbal radical, while class 8 is a plural class requiring N-category initial in the radical, e.g.:

      20   huɗo         ko’o   wonnake     this grass has got spoiled
but   8    mbinndirko   ko’o   mbonnake    these big pens have got spoiled

Exactly the same happens in Wolof, where what are in fact two pairs of distinct classes have previously been conflated, disregarding the evidence from verb agreement. This is in fact the only morphosyntactic diagnostic, independent from class-marker assignment, allowing one to assess the difference between singular and plural NCs in Wolof. Applying the agreement test, it is easy to see that what has previously been lumped together into one class, the si NC, in fact consists of two distinct NCs. On the one hand, si is selected by singular nouns such as soble ‘onion’, in (22a), whose plural is soble yi ((22b)):
(22)

a. soble   s-i   baax   na   / *na-ñu
   onion   .-.   good   .3   / -3
   ‘the onion is good’

b. soble   y-i   baax   na-ñu   / *na
   onion   .-.   good   -3      / .3
   ‘onions are good’

On the other hand, other nouns that select si, viz. those in (23b), take plural verb agreement (while, of course, when used in the singular the same nouns take another class marker):
(23)

a. Séeréer   s-i   jekk       na-ñu   / *na
   Seereer   .-.   handsome   -3      / .3
   ‘the Seereers are handsome’

   sëriñ    s-i   ñów      na-ñu   / *na
   healer   .-.   arrive   -3      / .3
   ‘the healers have arrived’

b. Séeréer   b-i   jekk       na   / *na-ñu
   Seereer   .-.   handsome   .3   / -3
   ‘the Seereer is handsome’

   sëriñ    b-i   ñów      na   / *na-ñu
   healer   .-.   arrive   .3   / -3
   ‘the healer has arrived’

The same can be repeated for plural ji (jeeg/janq ji ‘the women/little girls’, (24b)), which is distinct from singular ji, seen in (2d) and exemplified again in (24c):
(24)

a. jeeg/janq          b-i   sonn    na   / *na-ñu
   lady/little girl   .-.   tired   .3   / -3
   ‘the lady/little girl is tired’

b. jeeg/janq          j-i   sonn    na-ñu   / *na
   lady/little girl   .-.   tired   -3      / .3
   ‘the ladies/little girls are tired’

c. jigéen   j-i   sonn    na   / *na-ñu
   woman    .-.   tired   .3   / -3
   ‘the woman is tired’


The fact that these are plurals has been overlooked in the literature on Wolof up to now because traditionally plurals such as Séeréer si and jeeg ji have been called ‘collective’, in the wake of Sauvageot’s (1965: 73) influential statement:

A l’opposition de nombre singulier/pluriel, s’ajoute celle du collectif. Ce dernier a pour particularités a) de ne pas posséder d’expression propre le distinguant du singulier; b) de ne pas avoir de correspondant pluriel.

[To the singular/plural number contrast, one has to add that of collective. The peculiarities of the latter are: a) it does not possess a dedicated expression distinguishing it from the singular, b) it has no corresponding plural.]

There are indeed other African languages, also within the Atlantic family, for which it is justified to assume a separate value of the category ‘number’, which is traditionally called ‘collective’ (cf., e.g., Sapir 1965: 61, 64, on Diola-Fogny), or ‘collective plural’:

In addition to the first plural, used with countable nouns, many nouns can combine with a second plural, which is a collective plural for non-countable quantities, or non-specified numbers of entities (Cobbinah 2010: 184)

The author, describing Baïnounk Gubaher, refers to triplets such as the following: (25)

a. ra-maːsix   ran-de
   -crab       .-big
   ‘big crab’

b. ɲa-maːsix   ɲa-naːk
   -crab       .-two
   ‘two crabs’ (count plural)

c. ɟa-maːsix   ɟa-ŋaːn
   -crab       .-.
   ‘those crabs’ (collective plural)

Alternative terminologies include ‘pluriel limité ≠ illimité’ (Sauvageot 1967: 227 on Baïnounk Gunyamolo) or ‘greater plural’ vs. unmarked plural (Corbett 2000: 31):

A potentially interesting case of a language with a greater plural is Banyun [ . . . ]. Nouns typically have singular and plural, distinguished by prefixes of the type shared by many Niger-Kordofanian languages [ . . . ]. In addition there is a greater plural (which Sauvageot calls ‘unlimited’) [ . . . ] which Sauvageot suggests is used when the number cannot be counted or the speaker feels it unnecessary.¹⁸

¹⁸ To illustrate, Corbett (2000: 31) cites the paradigm bu-sumɔl ‘snake’ singular ≠ i-sumɔl ‘snakes’ plural ≠ ba-sumɔl ‘snakes’ greater plural.


Unlike in these languages, however, in Wolof there is never a three-way contrast of the kind observed, for example, in Baïnounk Gubaher ((25)), and verb agreement guarantees that the contrast is binary, singular vs. plural. The same two pairs of NCs newly recognized in (21c), singular vs. plural ji and si, are crucial to illustrate the rise of morphological irregularities observed in the paradigm of the indefinite article. Its regular formation is schematized with three nouns from different classes in (26b), compared with that of the definite article ((26a)):

(26) definite vs. indefinite article formation in Wolof

            sg        pl         sg         pl          sg         pl
a. def      xaj bi    xaj yi     muus mi    muus yi     till gi    till yi
b. indf     ab xaj    ay xaj     am muus    ay muus     ag till    ay till
            ‘dog’                ‘cat’                  ‘jackal’

The indefinite article, as shown above in (17a), is the only determiner in which the class marker follows the class-invariable part, thus becoming the final consonant. As exemplified in (26), and schematized in (27a), in the regular case there is a correspondence between this final consonant and the initial one occurring as a class marker in other determiners. In addition, however, as illustrated in (27b–c), there are two irregular patterns:
(27)

a. regular determiner paradigm
            sg      pl
   def      C₁-i    C₂-i
   indf     a-C₁    a-C₂
   agreement classes (= sg./pl. pairings of NCs): bi/yi, ki/yi, gi/yi, mi/yi, si/yi, wi/yi

b. irregular determiner paradigm
            sg      pl
   def      C₁-i    C₂-i
   indf     a-C₁    a-y
   agreement classes (= sg./pl. pairings of NCs): ki/ñi, gi/ñi, mi/ñi, si/ñi, bi/ñi, bi/ji, bi/si

c. defective determiner paradigm
            sg      pl
   def      C₁-i    C₂-i
   indf     *       a-y
   agreement classes (= sg./pl. pairings of NCs): ji/yi, ji/ñi, li/yi, li/ñi
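The three paradigm types in (27) can be restated as a small decision procedure. The sketch below is only an illustration: the two sets are read off the pairings listed under (27b) and (27c), the return value None stands for a missing (defective) cell, and the function name and the "sg"/"pl" labels are assumptions introduced here.

```python
from typing import Optional

# Read off the pairings in (27b): plural classes that take a-y instead of a-C2.
IRREGULAR_PLURALS = {"ñ", "j", "s"}
# Read off the pairings in (27c): singular classes lacking an indefinite article.
DEFECTIVE_SINGULARS = {"j", "l"}


def indefinite_article(marker: str, number: str) -> Optional[str]:
    """Return the indefinite article for a class marker, or None if defective."""
    if number not in ("sg", "pl"):
        raise ValueError("number must be 'sg' or 'pl'")
    if number == "sg" and marker in DEFECTIVE_SINGULARS:
        return None  # defective cell, as in (27c)
    if number == "pl" and marker in IRREGULAR_PLURALS:
        return "ay"  # default -y replaces the class marker, as in (27b)
    return "a" + marker  # regular a-C formation, as in (27a)


# Regular forms from (26): ab xaj, am muus, ag till, ay xaj.
for marker, number in [("b", "sg"), ("m", "sg"), ("g", "sg"), ("y", "pl")]:
    print(marker, number, indefinite_article(marker, number))
```

Keying the function on both marker and number is what keeps the homophonous classes apart: singular s- is regular, while plural s- falls under the irregular pattern.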

Paradigm (27b) shows a deviation from the regular formation by which the class-marking consonant yields to y- in the indefinite plural, while in (27c) the indefinite article paradigm is defective, lacking the singular form. In the available literature, the occurrence of ay instead of expected a-C₁ is usually recognized for ñi plurals, seen in (3) above:
(28)

a-y/*a-ñ   nit/góor/jigéen/mag/ndaw/gan
-.         person/man/woman/adult/youngster/guest
‘(some) persons/men/women/adults/youngsters/guests’

In addition to the pairings involving ñi plurals, however, the list in (27b) also includes the two ‘new’ plural NCs in (20b). In fact, as illustrated in (29b)–(30b), plural ji and si both select the default class marker -y in the indefinite article, on a par with ñi, while indefinite plural *aj and *as do not occur:

(29)
a. a-b jeeg/janq ñów na/*na-ñu
   -. lady/little girl arrive .3/-3
   ‘a lady/little girl has arrived’
b. a-y/*a-j jeeg/janq ñów na-ñu/*na
   -. lady/little girl arrive -3/.3
   ‘some ladies/little girls have arrived’

(30)
a. a-b sàmm/Séeréer/sëriñ ñów na/*na-ñu
   -. shepherd/Seereer/healer arrive .3/-3
   ‘a shepherd/Seereer/healer has arrived’
b. a-y/*a-s sàmm/Séeréer/sëriñ ñów na-ñu/*na
   -. shepherd/Seereer/healer arrive -3/.3
   ‘some shepherds/Seereers/healers have arrived’

This provides a further argument against the traditional analysis of Wolof NCs in (20a), because singular si and singular ji, the classes with which our two ‘new’ plural classes were earlier confused, do not behave in the same way. Rather, the singular si class forms the indefinite article regularly, as seen in (31a), while singular ji, as shown in (31b), exemplifies the other type of irregularity observed in the paradigms of the indefinite article, that is, defectiveness ((27c)):

(31)

a. a-s soxna/gor ñów na/*na-ñu
   -. honourable lady/free man arrive .3/-3
   ‘an honourable lady/a free man has arrived’
b. *a-j/*a-y jigéen/yaay/jabar ñów na/*na-ñu
   -. woman/mother/wife arrive .3/-3
   intended: ‘a woman/mother/wife has arrived’

In fact, it is not possible at all to form the indefinite article from this class. In order to convey the same meaning, one has to have recourse to suppletion and use instead the (regularly class-marked) form of the numeral C-enn ‘one’, as shown in (32a). This defectiveness also concerns the li class, or the li/yi and li/ñi pairings listed in (27c), as exemplified in (32b) by ndab and ndaw, respectively:

(32)
a. j-enn/*a-j/*a-b jigéen/yaay/jabar ñów na/*na-ñu
   .-one/-. woman/mother/wife arrive .3/-3
   ‘a/one woman/mother/wife has arrived’
b. l-enn/*a-l/*a-b ndab/ndaw
   .-one/-. dish/youngster
   ‘one/a dish/youngster’

The scheme in (33) recapitulates the different kinds of irregularity found in the paradigm of the indefinite article ((33d)), compared with two regular function words, highlighting the differences between singular and plural ji and si:¹⁹

(33) Irregularity in the indefinite article in Wolof

                      sg                                               pl
a. class marker       b-    g-    k-    j-    l-    m-    s-    w-     y-    ñ-    j-    s-
b. def article        bi    gi    ki    ji    li    mi    si    wi     yi    ñi    ji    si
c. numeral ‘one’      benn  genn  kenn  jenn  lenn  menn  senn  wenn   yenn  ñenn  jenn  senn
d. indf article       ab    ag    ak    *     *     am    as    aw     ay    ay    ay    ay
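The pattern summarized in (33) can also be restated procedurally. The following Python sketch is illustrative only: the function names and the singular/plural lists are assumptions introduced here, and the rule for the indefinite article simply encodes (27a–c), on the reading that every plural class surfaces with ay and that singular j- and l- have no form.

```python
# Illustrative sketch of (33); names and data layout are assumptions,
# not the author's notation.
SINGULAR_MARKERS = ["b", "g", "k", "j", "l", "m", "s", "w"]
PLURAL_MARKERS = ["y", "ñ", "j", "s"]
DEFECTIVE_SINGULARS = {"j", "l"}  # no indefinite article, cf. (27c)


def definite_article(marker):
    """Regular formation: class marker + -i, as in (33b)."""
    return marker + "i"


def numeral_one(marker):
    """Regular formation: class marker + -enn, as in (33c)."""
    return marker + "enn"


def indefinite_article(marker, number):
    """Return the indefinite article, or None where the paradigm is defective."""
    if number == "pl":
        return "ay"  # every plural class takes the default marker -y
    if marker in DEFECTIVE_SINGULARS:
        return None  # the numeral 'one' is used instead, cf. (32)
    return "a" + marker


def paradigm():
    """Rebuild the four rows of (33) as one tuple per class."""
    cells = [(m, "sg") for m in SINGULAR_MARKERS] + [(m, "pl") for m in PLURAL_MARKERS]
    return [
        (m + "-", definite_article(m), numeral_one(m), indefinite_article(m, n) or "*")
        for m, n in cells
    ]


if __name__ == "__main__":
    for row in paradigm():
        print(*row)
```

Keeping the defective cells as None inside the function, and rendering them as * only in the printed paradigm, separates the grammatical fact (no form exists) from its notation.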

To conclude, not only change in noun inflection but also change in agreement target morphology has created new irregularities in Wolof, which add to complexity in a way that had largely gone unnoticed under the traditional (but, arguably, incorrect) view of Wolof NCs in (20a). This ‘local complexification’, which yields a more realistic view of Wolof morphology and morphosyntax, can be viewed as an ‘accident’ along a path in which the overall tendency is, for noun morphology, from agglutinating towards isolating: not only are the inherited prefixed NC markers long gone, but also the inflectional irregularities (stem alternations) seen in (9), which partly arose from them, are on their way to disappearing.²⁰ In other areas of inflectional morphology, while the verb maintains its agglutinating structure, pronominal and adnominal agreement targets either stay agglutinative (cf., e.g., (33b–c)) or develop paradigmatic irregularities, as seen for the indefinite article in (27b–c), of the kind linguists usually associate with inflecting-fusional type morphology.²¹ Contrary to those in noun morphology, which are in the process of vanishing, the irregularities in the indefinite article are stable as long as the NC system is stable. This, however, is no longer the case in contemporary urban varieties, which leads us to the last section.

¹⁹ Pozdniakov & Robert (2015: 565) provide a similar scheme, without the two plural classes ji and si, and marking a blank for both neutralization (occurrence of ay for ñ- plurals as well as for y- plurals) and defectiveness (non-existence of forms for singular j- and l-).
²⁰ In this transitional stage, however, as argued at the end of section 6.5.2, variation between two cell-mates in the plural adds to overall paradigm complexity.
²¹ That verb and noun inﬂection can differ, in this respect, within one and the same language, ‘and develop diachronically in typologically different directions’ (Dressler 2005: 7) has been shown by much work on morphological typology (see, e.g., Haspelmath 2009: 25).


6.7 External explanatory factors for structural simplification

Even once the local increase in complexity in noun and determiner morphology addressed above has been recognized, it remains true that, on the whole, Wolof morphology is both less rich and less complex than that of the closely related Atlantic languages mentioned above (and, hence, of the reconstructible common ancestor, under either of the alternative classifications in section 6.2). This impoverishment/simplification, resulting in a ‘restricted system’ (Pozdniakov & Robert 2015), may be traced back to external factors. In fact, Wolof, a vehicular non-native language for a substantial share of its users, is a typical case of a language spoken in an ‘exoteric niche’ (in Lupyan & Dale’s 2010 terms) or in a ‘Type 2 community’, or ‘an extreme “generalized outsider community”’ (in Kusters’ 2008: 14 terms). The literature on linguistic complexity has addressed the consequences for morphology that are often observed when the percentage of non-native speakers becomes substantial, concluding that languages spoken in such communities are expected to simplify their morphology:

we may conjecture that when a language splits, and one variety becomes more like a Type 1, and the other like a Type 2 community, we expect that the latter becomes simpler in its inflectional morphology. (Kusters 2008: 15)

As McWhorter (2007: 2) puts it, ‘that heavy second-language acquisition decreases structural complexity is thoroughly intuitive to most linguists’ (see also McWhorter, Chapter 10, this volume). Conversely, a language spoken in a tightly-knit local community by small numbers of speakers may be a favourable setting (as argued by Trudgill 2004b, 2009) for better maintenance of linguistic complexity. If one compares Wolof with Seereer, this seems to provide an explanatory framework, as the latter has slightly more than one million speakers in Senegal and Gambia, and its inflectional morphology remains substantially richer and more complex than Wolof’s (see (10)). However, this is far from yielding a deterministic explanation, as one easily realizes considering that Fula’s inflectional morphology, as seen in (11)–(12), remains both richer and more complex than Wolof’s in spite of the language being spoken by over twenty-two million people spread over eighteen countries. Nonetheless, there is a crucial sociolinguistic fact about Wolof, concerning language attitude and prestige hierarchies, that may be invoked as a precondition of the observed simplification. For this language, in fact, the (conservative) linguistic norm as reflected in school grammars and dictionaries, which is often associated elsewhere with the maintenance of complexity, does not go hand in hand with linguistic and social prestige. Rather, in the Wolof speech community speaking correctly is not prestigious, and this holds true both in rural, socially traditional areas and in urban ones. In traditional society, as shown by the seminal study by Irvine (1978) and much subsequent work in sociolinguistics (or the ethnography of speaking), linguistic elaboration and correctness, in keeping with the conservative norm, is associated with griots, low-caste language specialists, regarded as socially inferior in comparison with the ‘géer (“nobles” – farmers, administrators, religious leaders)’ (Irvine 1978: 39). Irvine’s noble informants in general speak in what is considered a less accurate way, involving differences at all levels (as listed by Irvine 2011: 43–5), from prosody (e.g., flustering style, as opposed to clear voice) to syntax (e.g., incomplete phrase structure, false starts). Simplifying the noun-class system fits into this picture, through what Irvine (1978: 41) labels an ‘appropriate-error strategy’, which crucially involves the generalization of the default class markers bi/yi. The same tendency is observed in urban Wolof as well, as seen in (4c–d) (cf., e.g., Mc Laughlin 2001: 158). Here, the overall strategy to achieve linguistic prestige differs from what is observed in traditional rural social contexts: it is particularly language mixing and extensive borrowing, especially from French in Dakar, which serves the purpose. But all in all, rural and urban society converge, as Irvine (2011: 63f) remarks, in determining higher prestige for ‘bad’, incorrect language:

le ‘mauvais’ wolof urbain a quelque chose en commun avec le ‘mauvais wolof’ des hautes castes rurales. Dans les deux endroits, la ‘plus belle langue wolof’ n’est pas attribuée aux gens les plus hauts placés. [‘bad’ urban Wolof has something in common with the ‘bad Wolof’ of rural high castes. In the two settings, the ‘most beautiful Wolof language’ is not attributed to the highest-placed persons.]

Thus, that of Wolophones is not only a Type 2 community, with many non-native speakers, but also a community in which native speakers, in both traditional and urban contexts, tend to adopt themselves, qua prestigious, modes of linguistic behaviour favouring simpliﬁcation, a fact that can be plausibly invoked as an explanatory factor for the overall structural simpliﬁcation of morphology and morphosyntax that Wolof has undergone, compared with its antecessor within the Atlantic language family.

Acknowledgements

Thanks to the editors and two anonymous reviewers for comments and constructive criticism on a previous draft, as well as to Cheikh Anta Babou for joint fieldwork on Wolof. Usual disclaimers apply.

Part II: The Crosslinguistic Perspective

7 Canonical complexity

Johanna Nichols

7.1 Introduction

Of the various ways of measuring linguistic complexity (see the Introduction to this volume; and Sinnemäki 2011), this chapter focuses on what I will call enumerative complexity (EC) and canonical complexity (CC). EC is also known as taxonomic complexity (Miestamo et al. 2008), resources (Dahl 2004), economy (Kusters 2008), the principle of fewer distinctions (Di Garbo & Miestamo in press; defining non-complexity), inventory complexity (my previous work), and other terms. It is based on assessing the number of elements in an inventory or values in a system, for some domain or domains such as the number of phonemes, genders, tenses, derivation types, alignments, word orders, etc. It has been widely used in typological surveys, chiefly of phonological complexity (Shosted 2006, Hay & Bauer 2007, Nichols 2009, Donohue & Nichols 2011; Bickel & Nichols 2013 for inflectional complexity of verbs), but it has disadvantages. It is straightforward to survey for well-defined and consistently described subsystems such as the phoneme inventory, but guaranteeing comparability of categories elsewhere can raise problems. For example, is it meaningful to compare the sizes of case inventories when a language with few or no cases probably uses adpositions to the same end? Are the number of contrasting members of a (vertically arranged) paradigm and the number of potentially co-occurring morphemes in a templatic structure both inventories and to be compared in the same way? Importantly, EC is not the kind of complexity that figures most interestingly in studies investigating correlations between linguistic complexity and sociolinguistic history, notably Trudgill (2011) and Dahl (2004); there it is non-transparency, not inventory sizes, that is relevant.

The other type used here is close to what is known as descriptive complexity or Kolmogorov complexity: the amount of information required to describe a system.
This is a better measure and captures well the non-transparency relevant to learnability and sociolinguistic effects, but it is problematic to measure and compare. Canonicity¹ theory (Corbett 2007, 2013a, 2015, and others), though not a complexity measure in itself, can be used as a good approximation to descriptive complexity and is straightforwardly measurable and comparable (Nichols 2019; see Audring 2017 for a similar approach). The theory aims at improving definitions and technical understanding of linguistic notions. It defines a logical space (for a linguistic concept or structure or system) by determining the central, or ideal, position in that space for each dimension and the whole set of dimensions, and kinds of departures from that ideal. An element is non-canonical to the extent that it departs from the ideal. Essential to defining the ideal position is the structuralist notion of biuniqueness, or ‘one form, one function’: any departure from that ideal is non-canonical. Such departures decrease transparency between function and form or underlying and surface, so the extent or number of non-canonical patterns in a system can also be used as a measure of its non-transparency. The literature of canonicity theory offers a good deal of work on morphological paradigms, which makes it a straightforward matter to identify the non-canonical elements in a paradigm. The approach has the further advantage of being well-grounded in morphological theory yet applicable on its own without requiring adoption of an entire formal framework.

To avoid cumbersome terms like non-canonicality-based complexity or non-canonicity-based complexity, I will use the simpler if less logical phrase CC.² Measuring CC is straightforward in principle: define types of systems and subsystems so as to maximize crosslinguistic comparability, and count the number of non-canonical patterns or elements found in each, for each language. Both EC and CC are what I will call structural measures of complexity: ones that are based on structural analysis and comparison. (Calculations using the measures can of course vary from classic typological method to computational method.)

There are non-structural methods as well: for example, various kinds of complexity can be recovered computationally from text and lexical corpora (e.g. Bentz et al. 2017, using entropy in parallel corpora), or by measuring the difference in size between compressed and uncompressed copies of a corpus (Juola 1998; Ehret & Szmrecsanyi 2016). However, adequate corpora do not always exist, and the computational know-how or resources required may not be within reach of, say, a fieldworker or historical linguist who wants to attribute a complexity level to one language or describe relative complexity among a few languages. Furthermore, automatically extracted measures and variables are not constrained to reflect best practices in linguistic analysis and comparison, a fact that reduces their validity and could eventually cut linguistic analysis entirely out of defining linguistic complexity, thereby cutting linguistics out of an important segment of Big Data work. Independently of those considerations, typology needs more than one kind of complexity measure. To address these various needs and possibilities, this chapter proposes a method for measuring CC (section 7.2) and presents results of a survey showing that CC yields results that are revealing and do not duplicate those from EC but complement them to make a stronger combined measure (section 7.3).

¹ Henceforth I use that term to refer to the theory and its body of exemplar studies, since it is used in the foundational literature, but canonicality when I need to nominalize the adjective canonical (since only canonicality is possible in my English).
² Or perhaps it is logical. Canonicity theory is concerned with whether linguistic elements are canonical or not, while the goal in this fragment of complexity theory is to describe types of complexity. In that theory, presumably the ideal in a space of complexity is maximal complexity, so in that sense ‘CC’ is logical.

Johanna Nichols, Canonical complexity. In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Johanna Nichols. DOI: 10.1093/oso/9780198861287.003.0007
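The compression-based approach mentioned in this section (Juola 1998; Ehret & Szmrecsanyi 2016) can be illustrated with a minimal Python sketch. It is not a reimplementation of those studies: zlib is swapped in as a generic compressor, the ratio of compressed to uncompressed size is only one possible way of expressing the size difference, and the two toy strings are invented for the example.

```python
import random
import zlib


def compression_ratio(text):
    """Compressed size divided by raw size (UTF-8 bytes); lower means more redundancy."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)


# Invented toy data: one highly repetitive string and one pseudo-random
# string of the same length.
repetitive = "ab" * 500
rng = random.Random(0)  # fixed seed so the example is reproducible
varied = "".join(rng.choice("abcdefghij") for _ in range(1000))

if __name__ == "__main__":
    print(round(compression_ratio(repetitive), 3))
    print(round(compression_ratio(varied), 3))
```

The repetitive string compresses far better than the varied one, which is the intuition behind using compressibility as a proxy for redundancy. Applying the idea to real corpora raises exactly the issues of corpus availability and linguistic validity noted in the text.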

7.2 Method

7.2.1 Samples

For the CC measure, I used a partly convenience and partly diversity-based sample of 113 languages, seeking coverage of some families and areas, and fairly good coverage of northern Eurasia and North America, plus thinner coverage of the rest of the world. The southern lands (Africa, Australia-New Guinea-Oceania, South America) are thinly covered, South Asia not at all, and Southeast Asia by only two languages.³ In addition to coverage, sample languages were chosen for comprehensiveness and quality of descriptions. The sample languages are listed in Appendix 7.3. For the EC survey I drew on the mostly diversity-based set of 226 languages that has grown from Nichols (2009), using the 105 of those languages that are also found in the CC sample. The combined complexity measure is the sum of the other two, available for only the 105 languages of the sample intersection. Where comparisons of the two kinds of complexity are at issue, I used only the 105-language sample intersection. Those involving only CC use the full 113 languages. There are also some comparisons of families and areas, using subsets of the sample.
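The relation between the two samples and the combined measure can be sketched in Python. The language names and scores below are invented placeholders, not survey data; the only points taken from the text are that the combined measure is the sum of EC and CC and that it is defined only for languages present in both samples.

```python
# Hypothetical scores keyed by language name (placeholders, not survey data).
cc_scores = {"LangA": 12, "LangB": 7, "LangC": 20}
ec_scores = {"LangA": 30, "LangC": 18, "LangD": 25}


def combined_complexity(ec, cc):
    """Sum EC and CC for the languages found in both samples."""
    shared = ec.keys() & cc.keys()  # the sample intersection
    return {lang: ec[lang] + cc[lang] for lang in sorted(shared)}


if __name__ == "__main__":
    print(combined_complexity(ec_scores, cc_scores))
```

Languages that appear in only one of the two dictionaries (here LangB and LangD) receive no combined score, mirroring the restriction to the 105-language intersection.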

³ The denser coverage of the northern hemisphere is intentional, as I planned to test some of the geographical distributions hypothesized in section 7.3. The coverage of the southern hemisphere is thinner than planned because the survey proved more labour-intensive than anticipated and could not be fully completed as projected.

7.2.2 Survey objects

This study addresses only morphological complexity and specifically inflectional morphology. I surveyed a set of morphological typological variables across seven inflectional categories and three lexical classes (or parts of speech, henceforth POS), namely nouns, independent pronouns, and verbs, and counted the number of non-canonical patterns in inflectional paradigms for each category and each POS. The set of categories is a sample chosen because they are generally well-understood and well-described (including that grammars make it relatively straightforward to determine whether the category is present or absent, and if present what its values are). They are present in enough languages to make frequency comparisons meaningful. This section describes, first, the inflectional categories surveyed, then the variables.

Survey data consists of: (1) a text report on each language that includes any definitions of categories and variables required and discussion of any coding decisions, plus sources used. These reports discuss but do not fully replicate the information available in grammars. Sometimes they include scans of published paradigms. (2) A database page for each language showing the number of non-canonical patterns in each intersection of POS and category. Appendix 7.1 lists the categories and variables, and Appendix 7.2 gives the sum of entries in each intersection of categories and variables, across the whole sample. Appendix 7.3 lists the sample languages. The entire database will be included in some future release of the Autotyp database (Bickel et al. 2017 is the current release).

The inflectional categories surveyed are:
• Case. Dependent marking of argument roles. Only the core roles of A, S, O, G, and T, as well as Poss (possessor) were surveyed.
• Gender. Lexically specified agreement categories of nouns, usually covert on the noun itself and necessarily made overt in agreement. Only noun gender is surveyed, and not pronoun gender as in English he, she, it.
• Number. Only singular and plural were surveyed. For nouns, presence vs. absence of number marking was entered, but the plural paradigms for any inflectional categories of nouns (typically case, gender, possessive marking) were not surveyed.
• Person.
Only 1-2-3 singular inﬂectional paradigms were surveyed; for independent pronouns, only ﬁrst and second persons (singular and plural). Inclusive and exclusive, where they exist, are both included. Person inﬂection on nouns is possessive inﬂection; on verbs it is argument indexation.⁴ Where independent personal pronouns have a generic pronominal base and mark person only in the form of the regular inﬂectional person markers, person is counted as an inﬂectional category. Examples from Ainu are in (1); the same person preﬁxes are also verb indexes and possessive markers. In languages like those of Europe, person in pronouns is a lexical category and does not enter into this survey at all. ⁴ Indexation is deﬁned as in Nichols (1992: 48–9): marking on dependent or head of a category of the other, involving copying of relevant grammatical features from one member to the other. It is opposed to registration, which notes the presence of the other member and its type but does not copy features. (Nichols 1992 described only indexation and registration of dependents on heads, but in fact both can go either way: see Nichols & Lander in press.)

(1) Ainu (isolate; Japan and formerly Sakhalin) independent pronouns. (Shibatani 1990: 30–1; see Bugaeva 2012: 471 for slightly different forms from Southern Hokkaido dialects.)

         Singular    Plural
    1    ku-ani      a-oka
    2    e-ani       eci-oka
    3    Ø-ani       Ø-oka

• Person-number. Person and number are so often co-exponential (portmanteau or otherwise opaquely fused) in inflectional paradigms that person-number was treated as a separate single category (see Appendix 7.2). Most languages with possessive inflection of nouns signal both the number of the possessor and the number of the possessed noun, using a dedicated plural affix for the number of the noun and co-exponential person-number marking for possessor indexation. (Sometimes the dedicated plural affix is promiscuous in the sense of Leer (1991), indicating plurality of either noun or possessor or both.) If there is a separate, dedicated marker of possessor number, however, that is entered separately as number.
• Classifier. Following Fedden & Corbett (2017), I use this term to comprise numeral classifiers as well as what they argue are second gender categories in languages like Mian (Ok family, New Guinea) and several Amazonian languages (e.g., Yagua, Yaguan family) but are called classifiers by tradition or for convenience, since it is useful to distinguish classifiers from the other, more canonical, gender category. For the present survey, the decision whether an inflectional category is gender or classifier is less important than ensuring that it is included somewhere; what figures at this early stage is the total non-canonical points per language, not their distribution across categories, POS, and variables. Classifiers were counted if a classifier is (more or less) obligatory for many or all nouns in contexts of quantification, and possible for most numerals.
More precisely, I consider occurrence with numeral classiﬁers to be an inﬂectional property of nouns, while the number and predictability of classiﬁers are properties of classiﬁers (and not surveyed here since they are not among the three lexical classes targeted here).⁵ For most classiﬁer systems, the contexts of usage extend beyond phrases containing numerals, and while some are primarily numeral classiﬁer systems, for others (particularly languages of Amazonia, e.g. Kwaza: Van der Voort 2006) the contexts considerably exceed those of prototypical numeral classiﬁers. Only for Mian (Fedden 2011) have I treated what are called

⁵ Numeral classiﬁer systems often recruit regular nouns to the system, and in their capacity as regular lexical nouns they are of course covered in this survey.


classiﬁers as a gender category and added entries for their number and unpredictability. Thus the six classiﬁers of Mian are entered as a noun category with six inherent values, all unpredictable, while the 150 or more classiﬁers of Kwaza, like, for example, the ~50 of Mandarin, do not appear in this database and do not contribute to the EC of noun inﬂection or to non-canonicality in the form of unpredictability. • Tense/aspect/mood (TAM). The survey seeks the most basic synthetic present-like and aorist-like tense categories. (In terms of aspect these tend to be imperfective and perfective respectively.) If one or both is absent, as it is, for example, in Mawng (Iwaidjan, northern Australia), which has only a future/non-future tense opposition, the closest basic tense opposition is used (future and non-future in Mawng). If the language has no inﬂectional tense (as Mandarin does not), basic imperfective and perfective are used if the language has inﬂectional aspect; otherwise there is no entry for the TAM category. • General. Some of the variables are inherently difﬁcult to ascribe to some particular category. Examples are the numbers of stems per lexeme and stem classes per language. They are entered as general rather than as pertaining to paradigms of particular categories (usually with a comment in the data report). Again, for the present survey the exact placement of an entry is less important than ensuring that it is included somewhere and contributes to the total. For each language the database records for each category whether it is present or absent (a yes/no, or 1/0, classiﬁcation). The variables surveyed are the following.⁶ For all of them the number, or the presence vs. absence, of non-canonical patterns was entered for each of the survey categories just listed. For what was counted as non-canonical see below. For every variable and every category and value, irregular words, lexically speciﬁable exceptions, and small closed classes are disregarded. 
Sizable minority classes, and classes that are open or speciﬁable as a class, are counted. For example, if possessive inﬂection applies only to kin terms, or even only to consanguineal kin terms, this is counted as a class. [1] Inﬂectional classes. In the terms of Bickel & Nichols (2007) these are instances of formative ﬂexivity: classes distinguished by different sets of inﬂectional morphemes (e.g., sufﬁxes). Not all grammars explicitly account for the number of declension or conjugation classes, and those that do often mix together, or at least fail to distinguish, formative ﬂexivity and stem ﬂexivity (variable [5] below), so

⁶ Variables are numbered in square brackets and examples in ordinary parentheses.


deciding whether a class involves formative ﬂexivity or stem ﬂexivity often requires analysis and justiﬁcation (laid out in data reports). [2] Unpredictability of any inﬂectional classes. Sometimes the inﬂectional class of a noun is predictable from semantics, gender, phonology, or some other property, but often it is not. For example, the declension classes of conservative Indo-European languages like Latin or Russian are not predictable overall (though in each language there are some clusters of semantically similar words in each class). For Russian, it is possible to predict gender from declension class with fair accuracy, but not vice versa (Corbett 1982); close analyses like Corbett’s are not usually available, so my practice was to regard classes as unpredictable unless the grammar claimed otherwise and gave good grounds for the claim. The number of inﬂectional classes and the number of unpredictable ones are matters of EC, not CC. They are removed from some of the calculations here as indicated below. [3] Inherent categories. This applies primarily to gender classes of nouns, which are marked by agreement on other words and are usually covert on the noun. (Overt indication of gender on the noun itself does occur in a number of languages, e.g. Bantu, or to some extent Nakh-Daghestanian. In such languages gender was recorded as an inﬂectional category of nouns and its number of inﬂectional classes and their unpredictability were recorded.) Where classiﬁers are lexically speciﬁed for the noun (as is usually said to be the case for Mandarin, e.g. Chao 1968: 589–93), they are also coded as inherent. The alternative is relatively ﬂexible choice of classiﬁers per noun depending on semantic properties. [4] Unpredictability of inherent categories. Gender classes can be predictable for some or all genders. Here the question asked is how many of the gender classes are predictable (largely or entirely, i.e. for most or all of their nouns). 
Predictability is sometimes described as phonological, but usually as semantic. Every language with gender in the sample, and nearly every language on earth with gender, has predictable gender for nouns referring to humans, which are usually masculine or feminine depending on the sex of the referent but sometimes belong to a general human category.⁷ What is counted here is not predictability but unpredictability, since that is non-canonical. Counting the number of unpredictable classes amounts to EC, and it also contributes to a rapidly inﬂating scale.⁸ Instead of counting classes I have used the following values for applicability: ⁷ I know of only one language where human nouns have arbitrary gender: Uduk (Koman, Africa; Killian 2015), where the cutoff point for gender predictability is set even higher on the animacy hierarchy: it is predictable for ﬁrst and second person pronouns but not for human nouns. ⁸ Cole (1967) describes most of the non-human gender classes of the Bantu language Luganda (which number fourteen singular-plural concord pairs by his count) as ‘miscellaneous’ (these number ten), a large number for one cell of this survey. Most Bantu grammars describe the classes as having a semantic basis with some unpredictable members, but in languages with only one description the decision on predictability has to be taken at face value.


(2) Applicability thresholds
    0    Applies to none or very few of the words in the class (here, nouns in the gender class).
    1    Applies to an appreciable minority of the words in the class, and/or the set of words is open or definable rather than requiring enumeration.
    2    Applies to all or most of the words in the class.

and the following semantic criteria:
Human nouns. Unpredictability of their gender by the above values.
Non-human nouns. Unpredictability of their gender by the above values.
Human gender cross. Non-human nouns are found in human gender classes or vice versa.

As a further note, most languages with a sex-based gender opposition for human nouns also apply it to a few non-human animate nouns, typically large and important domesticates. This kind of individual lexical exception falls under value 0 of the applicability scale. Table 7.1 shows a few languages and how they are treated in this classification. Languages with a zero score have no unpredictable gender classes, either because their gender is entirely predictable (Avar) or because they have no gender, either of nouns (English) or of pronouns (Finnish).

[5] Number of stems per lexeme. This is what Bickel & Nichols identify as stem flexivity: declension or conjugation classes based on changes in the stem, such as ablaut, extensions, or allomorphy conditioned by the survey categories. For example, in Nakh-Daghestanian languages, many or most nouns have distinct nominative and oblique stems in the singular, with the oblique stem formed by adding an extension suffix (Kibrik 1991, 2003). This is coded as two stems per lexeme. In English and other Germanic languages, the sizable but minority class of strong verbs has different stems, marked by ablaut, in the two survey tense categories (English sits, sat); this is also two stems per lexeme. A word or class is counted if it involves all, most, or a sizable or open subset of the relevant words, following the thresholds in (2).

[6] Number of stem classes per language. The Nakh-Daghestanian languages with extensions in oblique stems mostly have two stems per lexeme, but the number of oblique extension suffixes ranges from one to over a dozen in different languages. This, plus the (usually minority) class of nouns with a single stem, is the total number of stem classes per language. Following the criteria in (2), the number entered in the database is the number of such classes that are sizable, productive, and/or open.

[7] Unpredictability of those stem classes (per language), by the same criteria as for [2] and [4] above.


Table 7.1. Gender unpredictability for some example languages

             IE   Ingush   Avar   Bantu   Nama   BGW   Uduk*   English*   Finnish
Human:       0    0        0      0       0      0     2       0          0
Non-human:   2    2        0      1       2      1?    2       0          0
Cross:       2    1        0      0       2      0     2*      0          0
Total:       4    2        0      1       4      1     6       0          0

Notes: Languages:
IE: Generic conservative Indo-European (e.g., Latin, Russian). Three genders: masculine (M), feminine (F), neuter (N). The neuter gender contains relatively few nouns, so most non-human nouns are M or F, arbitrarily classified.
Ingush: Nakh-Daghestanian (Caucasus). There is a dedicated gender for human males, a gender containing human females and some inanimates (though if the survey counted singular-plural gender pairings these would be different genders as plurals have different genders for human females and non-humans), and two non-human genders with arbitrary membership.
Avar: Nakh-Daghestanian (Caucasus). There are three genders with total semantic predictability: M (human males), F (human females), N (all else).
Bantu: Subbranch of Benue-Congo (Africa). Generic entry applicable to most Bantu languages including Luganda in this survey. There is a dedicated human gender and a number of non-human genders (the number varies among languages) which most descriptions present as having a semantic core or prototype plus a limited number of arbitrary members. Usually there are also a few dedicated genders for such things as non-finites or particular deverbal derived nouns.
Nama (Khoekhoe): There are two genders, M and F, containing all human males and all human females respectively, and other nouns are arbitrarily divided between M and F.
BGW (Bininj Gun-Wok; Gunwingguan, northern Australia): M and F genders contain all human nouns plus some arbitrary members. The other genders also have a semantic core and some arbitrary members.
Uduk (Koman; Africa): Two genders; all nouns arbitrarily classified; first and second person pronouns have predictable gender (all have gender 2).
English: No noun gender.
Finnish: No gender of either nouns or pronouns.
* Not in sample. For Uduk, see footnote 7 above in text.

[8] Arguments indexed. The number of core arguments indexed on the verb, counted for the verb type with the most core arguments. The maximum number of core arguments possible is three (A, G, and T), but not all languages have ditransitives, and for those that do not the maximum is two. Arguments indexed are counted only for simple clauses without valence-related derivations such as causatives or applicatives.

[9] Co-exponence, that is, portmanteau, cumulative, or otherwise opaquely fused marking of categories. Examples are the gender-number-case suffixes of nouns and adjectives in conservative Indo-European languages. Co-exponence violates the one-form-one-function tenet of canonicality, as one form has three functions (marking gender, number, and case). A language is coded as having co-exponence if all, most, or a sizable minority of its words in the relevant categories (e.g., nouns and their case paradigms) have co-exponent markers; it is so coded for all of the categories involved (e.g., for Indo-European, gender, number, and case).

[10] Syncretisms: identical formatives in two or more categories that are nonidentical elsewhere in the language. Consider the German articles in (3):


(3) Definite articles in German (syncretism patterns numbered with subscripts)

              M      F       N       Plural
Nominative    der    die₁    das₁    die₁
Accusative    den    die₁    das₁    die₁
Dative        dem    der₂    dem     den
Genitive      des    der₂    des     der

What is counted is not individual syncretic endings or words but patterns of syncretism. In the German examples, feminine, neuter, and plural paradigms display the same pattern of nominative-accusative syncretism; dative-genitive of feminines is another. German has two syncretism patterns here. In German, case and gender are marked on determiners, of which the articles are the most frequent. Where categories are marked on articles but not the nouns themselves, they are still coded as noun categories, though also as wordhood discrepancies (variable [14] below). The database lists the number of syncretism patterns per category, but the counts and totals in section 7.3 below use only presence vs. absence of syncretism per category, as explained under variable [17] below.⁹

[11] Allomorphy. Defined elsewhere in linguistics as two different forms for a single morpheme or paradigmatic cell, conditioned grammatically or lexically but not phonologically; phonological conditioning is not counted here since it can be considered automatic. An example is nouns of masculine gender in most Slavic languages, which have different accusative endings for animate and inanimate nouns. For example, three cases of Russian masculine nouns:

(4)
              ‘brother’   ‘table’
Nominative    brat-Ø      stol-Ø
Accusative    brat-a      stol-Ø
Genitive      brat-a      stol-a

There is one allomorphy here in noun case inﬂection (accusative -a vs. -Ø), and also two patterns of case syncretism.¹⁰
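The counting rule for syncretism (patterns, not individual forms) can be made concrete with a minimal sketch. It assumes that a pattern is a set of cases sharing one form within a single paradigm column, and that the same set of cases recurring in another column counts as the same pattern. The function name `syncretism_patterns` and the dictionary layout are illustrative choices, not part of the survey's database.

```python
from typing import Dict, FrozenSet, Set

# column label (gender, number, or lexeme) -> case -> form
Paradigm = Dict[str, Dict[str, str]]


def syncretism_patterns(paradigm: Paradigm) -> Set[FrozenSet[str]]:
    """Return the distinct sets of cases that share one form within a column."""
    patterns: Set[FrozenSet[str]] = set()
    for forms_by_case in paradigm.values():
        cases_by_form: Dict[str, Set[str]] = {}
        for case, form in forms_by_case.items():
            cases_by_form.setdefault(form, set()).add(case)
        for cases in cases_by_form.values():
            if len(cases) > 1:  # two or more cases share this form
                patterns.add(frozenset(cases))
    return patterns


# Forms taken from examples (3) and (4) in the text.
german_articles: Paradigm = {
    "M": {"Nom": "der", "Acc": "den", "Dat": "dem", "Gen": "des"},
    "F": {"Nom": "die", "Acc": "die", "Dat": "der", "Gen": "der"},
    "N": {"Nom": "das", "Acc": "das", "Dat": "dem", "Gen": "des"},
    "Pl": {"Nom": "die", "Acc": "die", "Dat": "den", "Gen": "der"},
}
russian_masculine: Paradigm = {
    "brother": {"Nom": "brat-Ø", "Acc": "brat-a", "Gen": "brat-a"},
    "table": {"Nom": "stol-Ø", "Acc": "stol-Ø", "Gen": "stol-a"},
}

print(len(syncretism_patterns(german_articles)))    # → 2
print(len(syncretism_patterns(russian_masculine)))  # → 2
```

Under these assumptions the sketch returns two patterns for the German articles (nominative-accusative and dative-genitive) and two for the Russian masculine nouns (accusative-genitive and nominative-accusative), matching the counts given in the text.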

⁹ Syncretism is clearly non-canonical (Corbett 2013a, 2007, and other works), as it makes for nonbiuniqueness, but reviewers and audience members often object that syncretism does not increase the amount of information required to describe a language. This shows that canonical and Kolmogorov complexity are not identical; it is the only respect I am aware of in which they are different. I believe the difference arises because Kolmogorov complexity is concerned only with the information required to describe the text as string alone and not the full text including its message. For the message even at the minimal level of determining which case is intended as in (3), resolving syncretism requires bringing in additional information. ¹⁰ There are debates in the Slavistic literature as to whether animacy is an additional gender category, or for that matter a subgender or supergender. It is also sometimes called a case split or a gender split, but I have not tried to distinguish allomorphy from splitting.


I did not encounter examples where it was difﬁcult to decide whether something was allomorphy (within one category) or a syncretism (in paradigms that do not have that allomorphy), but there may be such cases. If so, the important thing is to enter it somewhere, for this survey in which total numbers of non-canonical points are compared. [12] Position discrepancies. In some languages the forms of a single category are distributed between two different positions, e.g. Pazar Laz (Kartvelian, Turkey; Öztürk & Pöchtrager 2011: 485) subject person agreement in verbs (present tense): (5)

Pazar Laz subject agreement morphemes
1    v-/p-/p’-/b-
2    Ø-
3    -s

First person is a prefix, second person zero (presented as a prefix because the object prefixes that compete hierarchically for the same slot have an overt second person object form), and third person a suffix. Discrepant position is analogous to different forms for one category (albeit the forms are slots rather than morphemes), hence non-canonical.

[13] Category discrepancies. I used this variable to account for infrequent examples like verb inflection in many Slavic languages, which have agreement for person-number in the non-past tense and gender-number in the past tense. The survey category is TAM rather than just one tense; if there were only one survey tense there would be no discrepancy. In these languages verbs were coded as having the categories of person-number, gender, and TAM, with a category discrepancy for TAM.

[14] Wordhood discrepancies. These are discrepancies between such statuses as independent word, clitic, affix, and non-linear marking such as ablaut, within a single paradigm. For example, in Slovene, singular pronouns have both tonic and clitic forms but plural ones have no clitic forms; in Bulgarian, Romanian, and Ossetic, subject indexation is suffixal while object indexation uses clitics. Languages like German or Mian (Ok, New Guinea) have noun gender marked by articles; this is a wordhood violation for gender not as an inherent category but as an agreement category (in languages without the wordhood violation it is usually marked affixally, as with the noun class prefixes of nouns in Bantu languages).

[15] Partial marking: Only some of the otherwise eligible words inflect for the category. An example is gender in Nakh-Daghestanian languages, which is generally marked by prefixation or initial consonant mutation of the verb, but not for all verbs (the verb roots that do take it range in different languages from about 30% to the great majority of verbs). Another example is number: probably all languages that have number inflection on nouns apply it only to some nouns. Most common is drawing the line between count and mass nouns, with mass nouns taking no number marking, but it is also fairly common to find the line drawn between animate and inanimate or human and non-human nouns. I did not code number as a partial category for any of these: for count vs. mass nouns it is clearly due to semantics, and for the distinctions higher up there is a case to be made that those are semantically akin to the count/mass distinction. Some languages have only a handful of nouns that make number distinctions, for example Yurok (Algic, California), where only nine nouns, not representing a coherent semantic group, form plurals (Robins 1958: 23); these are my only cases of non-semantic, purely lexically specified, plural marking, but the nouns involved are too few in number to count in this survey. Partial marking is not common in the survey languages; Nakh-Daghestanian gender contributes most of the examples.

[16] Multiple marking. Even rarer among the survey languages is marking of an inflectional category more than once in a wordform. For example, Bardi (Nyulnyulan, Australia) marks person-number on verbs with person enclitics, and can add an optional additional person-number enclitic to mark plurality of the object; this amounts to marking person twice. Yurok has A and O agreement in person-number, and in some verb classes and categories one-argument verbs fill both slots and thereby mark subject person-number twice (Robins 1958: 69ff).

[17] Other. This entry column handles the occasional uncertainty in classification, but primarily contains calculations of the number of categories or dimensions involved in co-exponential marking. Noun inflectional paradigms of Indo-European languages preserving the original design of co-exponential gender-number-case inflection abound in such non-canonical phenomena as syncretisms, unpredictable declension classes, unpredictable gender classification, human cross-gender, and others. (For some illustrations, see Nichols 2019.)
These give them extremely high CC values if the number of syncretism patterns is counted, and this skews comparisons. Therefore I coded not the number of such patterns but the number of categories involved in them, treating those as dimensions of freedom within which syncretism might appear. Similarly, for complex systems of verb argument indexation where person-number and role (A, O) are marked by co-exponential and often opaque markers, I counted the number of categories involved (usually person-number and role, sometimes also gender).¹¹ This procedure levels out the possible complexity ranges of case-inflecting languages like Indo-European and complex head-marking languages like many in the Americas. But even with the obvious heavy contributors neutralized, section 7.3 shows that the languages of western Eurasia still reach overall higher CC levels than even the polysynthetic languages of the Americas. I judge this high level to be non-artifactual as measured, implying less opacity for polysynthetic inflection than for co-exponential case inflection. Indeed, the notable complexity of polysynthetic languages lies not so much in the non-transparency of their core argument marking as in their templatic ordering, mix of lexical and inflectional categories, and sheer number of grammatical categories. This is also what the comparison of CC and EC levels in section 7.3 below implies: polysynthetic languages have more categories and slots, not more opacity.¹²

The variables are summarized in Appendix 7.1.¹³ CC is what I call a composite variable: one that can be stated as a single typological variable (in this case, the CC value) but that consists of a number of separately defined variables. These subvariables are not a random set of variables and not just a thematically related set but the total set of grammatical phenomena that cover the categories and POS and each of which defines some aspect of non-canonicality. They are not drawn from an existing database, and in fact only one of them, the number of arguments indexed, is a variable presently in the Autotyp database.

¹¹ Recognizing role as involved in the categories is also a way of accounting for the mix of direct and hierarchical marking of person in such systems.

7.3 Results
Appendix 7.4 is a graphic display of the levels of CC in the sample languages, separately for the CC total involving all datapoints and the one omitting those datapoints that enumerate categories (and are therefore a leak of EC into the CC count). They are similar except in absolute values. On either one, the sample languages can be described as spanning the complexity range from Mandarin (lowest) to Skolt Saami (highest). The rest of this section tries out CC by comparing how well CC and EC perform in tests for various kinds of correlations.

7.3.1 CC and enumerative complexity
There is no correlation between CC and EC (correlation coefficient -0.023; p = 0.819, Spearman’s rank correlation test, two-tailed). This means that they can be used as independent typological variables.
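For readers unfamiliar with the test named above, the following is a minimal sketch of Spearman's rank correlation coefficient in plain Python, computed as the Pearson correlation of the ranks, with tied values given the mean of their ranks. The scores are hypothetical and are not the chapter's data; the sketch computes only the coefficient, not the p-value, and it assumes neither list is constant (which would make the denominator zero).

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical CC and EC scores for six languages (illustrative only).
cc_scores = [12, 30, 45, 22, 51, 38]
ec_scores = [9, 7, 12, 11, 8, 10]
print(round(spearman_rho(cc_scores, ec_scores), 3))
```

A coefficient near zero, as reported in the text for CC and EC, means that knowing a language's rank on one measure says almost nothing about its rank on the other.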

¹² Differential complexity of noun vs. verb inﬂection and head vs. dependent marking, and measuring the complexity of hierarchical patterns and polysynthetic structure, will be covered in a separate paper. At that point the dimensions of co-exponential marking will be given a term and a separate dedicated variable. ¹³ The variables used for EC are much as deﬁned in Nichols (2009). Publication of an updated version of that list is planned for the next year or two.


7.3.2 Complexity and gender
Nichols (2019) found that there was no correlation between EC and the presence of gender in a language, concluding that gender and the well-known complexity of many gender systems are not simply byproducts of overall complex morphology. I replicated that study on the smaller and different language set used here, and using a correlation test, with the same result: there is no correlation between EC and presence of gender. For CC, there is a slight positive correlation but it is far from significant (correlation coefficient 0.089, p = 0.233).¹⁴

¹⁴ For these calculations, to avoid circularity the points contributed by gender were subtracted from the total complexity. (If that is not done, CC yields a highly significant but spurious correlation. EC does not, because the contribution of gender to its total is much less than for CC.) For CC I use the two-tailed value since I had no advance expectation about whether or how CC might correlate with gender.

7.3.3 Geography: continents and areas
I calculated the mean CC for a number of areas and families, and asked whether the range of mean ± 1 standard deviation for each area overlapped with others, using the breakdowns in Table 7.2. Ranges for local areas and families are in Table 7.2. Figure 7.1 gives a graphic display. Non-overlap of the ranges means significantly different populations. Macrocontinents and continents overlap each other considerably, which means that the largest groups all represent the same population. Of the local areas, the Circum-Baltic has a very large standard deviation, that is, very little areality, and overlaps most others. The Caucasus has a relatively large standard deviation (unsurprisingly, as its languages range from the fairly simple Lezgi to the very complex Ingush and Khinalug), and its status as an area is debated (con: Tuite 1999, pro: Chirikba 2008; I side with Tuite). The other three are well-known areas and have small standard deviations and little or no overlap. Mean complexity levels differ considerably among the areas, suggesting that regression to some neutral complexity level is not a consequence of areality.

The five families show relatively little overlap. Uralic, one of the older and more widely distributed families and the most thoroughly surveyed here, has a large standard deviation. The others have clearer family profiles. Overall, then, continents and macrocontinents are not greatly different from one another or from world totals while local areas and families are more discrete from each other and for the most part internally fairly consistent in their complexity levels. These figures are very preliminary; in particular, standard deviations will probably shrink as the sample adds more members per area and family, reducing overlaps.

Table 7.2. Areal and family breakdown
Macrocontinents: Africa, Eurasia, Australasia (Australia, New Guinea, Oceania), Americas
Selected continents: Western Eurasia (to the Urals), North Asia (Siberia and northern Central Asia), North America, Central and South America
Local areas: Balkan, Caucasus, Circum-Baltic, North Inner Asia (non-Pacific Siberia and northern Central Asia), North Pacific Rim (coastal and near-coastal from Japan to northern California)
Families: Balto-Slavic, Uralic, Nakh-Daghestanian, Tungusic, Uto-Aztecan
Notes: Figure 7.1 shows the mean CC ± 1 standard deviation for all groups. Northern continents (Eurasia, North America), the Caucasus, and the Uralic and Nakh-Daghestanian families are well-sampled; other areas and families are compiled opportunistically from languages in the sample and are less well covered.

[Figure 7.1: four panels, each on a vertical scale from 0.0 to 60.0. CC: Macrocontinents (Africa, Eurasia, Australasia, Americas); CC: Continents (W. Eurasia, N. Asia, N. America, C-S America); CC: Areas (Balkan, Caucasus, Circum-Baltic, N. Inner Asia, N. Pacific Rim); CC: Families (Balto-Slavic, Uralic, Nakh-Daghestanian, Tungusic, Uto-Aztecan).]
Figure 7.1. Mean CC ± 1 standard deviation for three areal breakdowns and selected families
Notes: Groups are defined in Table 7.2. The mean and range for the entire sample are very similar to those for Africa.
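The overlap comparison used in this section can be sketched as follows. This is a minimal illustration, assuming that each group's range is its mean plus and minus one sample standard deviation and that two ranges overlap when they share at least one point. The chapter does not say whether the sample or the population standard deviation was used, so `statistics.stdev` is an assumption here, and the group names and scores are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def sd_range(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean - 1 SD, mean + 1 SD), using the sample standard deviation."""
    m, s = mean(scores), stdev(scores)
    return (m - s, m + s)


def ranges_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Two closed intervals overlap if each one starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical CC scores for three groups (illustrative only).
groups = {
    "Area A": [50, 54, 58],  # high mean, small spread
    "Area B": [20, 24, 28],  # low mean, small spread
    "Area C": [22, 40, 58],  # large spread
}
ranges = {name: sd_range(scores) for name, scores in groups.items()}
for first in groups:
    for second in groups:
        if first < second:  # visit each unordered pair once
            print(first, second, ranges_overlap(ranges[first], ranges[second]))
```

In this toy data the two tightly clustered groups do not overlap each other, while the group with the large standard deviation overlaps both, which is the situation the text describes for the Circum-Baltic area.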


7.3.4 Large-scale geography
Since a number of typological variables form worldwide east-to-west clines in the northern latitudes (low in Europe, high in eastern North America, or vice versa; Nichols 2017), I tested whether CC and EC have such a distribution. Figure 7.2 plots complexity (CC or EC) against longitude in a series of graphs. (Longitude is universal longitude, not split into east and west but continuous from 0° to 360°.) The plots all have the same design: the vertical scale is the number of CC or EC points and the horizontal scale is longitude, running from west to east as in Figure 7.1. (The plot begins 10° west of Greenwich so that westernmost Europe and Africa will be counted with those continents and not with the Americas.) A worldwide cline will show up as a pronounced overall upward or downward slope to the pattern of dots. Each graph has a trendline showing slope, which can be regarded as indicating the approximate magnitude of difference between west and east. (The trendline is calculated on the rectangular plot used here, i.e. on a flat-earth model with parallel longitude lines, so for the real earth it has no precise meaning. The visible differences between slopes in different plots do, however, make for a useful comparison that may be graphically clearer than the raw pattern of dots. Statistical significance is not calculated on the plot but on the actual ranked longitude values and does not have the flat-earth problem.) Figure 7.2(a) shows CC values running much higher in the west (the left side) than in the east (the right side), and there is a pronounced though not steep downward slope. Figure 7.2(b) plots only the languages in the northern continents;¹⁵ the slope is similar. For both, the correlation of CC with longitude is highly significant. Figure 7.2(c) plots only the southern languages; the pattern is much more dispersed and the slope noticeably less steep, and there is no significant correlation.
The interpretation is that (as with several other variables, surveyed in Nichols 2017) there is a worldwide west-to-east gradient, in this case with higher values in the west and lower values in the east, and it is stronger in the northern continents than in the south.¹⁶ Due to the sample structure and the composition of the western Eurasian linguistic population, much of the strength of the CC correlation comes from Indo-European languages. To counter their impact, I tested the sample with the four outliers at the upper left of Figure 7.2(a) removed (three are Slavic languages: Russian, Sorbian, Slovene; but highest of all is Skolt Saami, a Uralic language). Impact on the slope and signiﬁcance was negligible.

¹⁵ Northern continents are Eurasia and North America. Southern ones are Africa, Australia-New Guinea, and Central and South America.
¹⁶ In Figures 7.2(a)–(b), what appear to be dense vertical stacks of dots at some places are regions that are densely sampled and/or have high linguistic diversity at a similar longitude: at left, at about 45°, the Caucasus; at right, at about 230°, the Pacific coast of North America.
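The longitude scale described in section 7.3.4 (continuous rather than split into east and west, and beginning 10° west of Greenwich) can be obtained from ordinary signed coordinates with one line of arithmetic. The following is a minimal sketch under the assumption that input longitudes are given as east-positive and west-negative degrees; the function name `plot_longitude` is illustrative and the coordinates are round numbers chosen for the example.

```python
def plot_longitude(lon_deg: float, start_deg: float = -10.0) -> float:
    """Map a signed longitude (east positive, west negative) onto a continuous
    360-degree scale that begins at start_deg (here 10 degrees west of Greenwich)."""
    return (lon_deg - start_deg) % 360.0 + start_deg


print(plot_longitude(45.0))    # 45 E, roughly the Caucasus      → 45.0
print(plot_longitude(-130.0))  # 130 W                           → 230.0
print(plot_longitude(-9.0))    # 9 W stays at the left edge      → -9.0
print(plot_longitude(-60.0))   # 60 W lands near the right edge  → 300.0
```

With this mapping, westernmost Europe and Africa sit at the left of the axis and the Americas at the right, so a west-to-east cline appears as a single slope across the plot.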


[Figure 7.2: three scatterplots of CC (vertical axis) against longitude (horizontal axis, ticks from –10 to 290), each with a trendline. (a) CC x longitude: Whole sample (n = 113), p = 0.00001. (b) CC x longitude: Northern continents (n = 82), p = 0.00002. (c) CC x longitude: Southern continents (n = 31), p = 0.104 (n.s.).]
Figure 7.2. Complexity x longitude
Notes: Longitude (horizontal axis) runs from the Atlantic coast of Europe and West Africa on the left to the Atlantic coast of North and South America on the right: (a) CC x longitude, all languages; (b) northern continents; (c) southern continents. EC shows a highly significant correlation in the opposite direction, with lower values in Europe and Africa and higher values in the Americas (p = 0.0011, confirming what was reported, using a different sample and values, in Nichols 2009).


A map of languages and their CC levels (to be continuously expanded as work on this project proceeds) is at https://lingconlab.github.io/opacity_Johanna/index.html

7.3.5 Sociolinguistics
Dahl (2004) and Trudgill (2011) show that what Trudgill calls sociolinguistic isolation tends to allow languages to grow more complex over time, while sociolinguistically expansive languages tend to simplify. Sociolinguistic isolation means that a language absorbs little or no immigrant or language-shifting population, so that nothing hinders the further growth of complexity. An expansive language (this is not Trudgill’s term; I take it from Janhunen 2008) absorbs appreciable numbers of adult L2 learners, and their influence tends to simplify the language. This section describes the four language groups in this chapter’s sample for which enough is known of the history of expansion and non-expansion to permit predictions about relative complexity levels. The groups and the complexity levels are listed in Table 7.3.

Table 7.3. Complexity values for four historical groups of languages

Group               Language             CC     EC    CC + EC
(a) Avar sphere     Andic mean           28     10    38
                    Avar                 36     10    46
                    Hinuq (Tsezic)       33      9    42
                    Hunzib (Tsezic)      49     11    60
                    Lak                  42     10    52
                    Ic’ari Dargwa        45     15    60
                    Tsakhur (Lezgian)    41      9    50
(b) Samur sphere    Lezgi                27      4    31
                    Udi                  41      7    48
                    Archi                36     11    47
                    Tsakhur              41      9    50
(c) Slavic          Russian              57      8    65
                    Lower Sorbian        56.5
                    Slovene              51     11    62
                    Bulgarian            43     11    54
(d) Uto-Aztecan     Pipil                21      7    28
                    Hopi                 36     11    47
                    Cupeño               39     12    51
                    Tümpisa Shoshone     27     12    39

• Altitude in the Caucasus. In mountain ranges with a central crest, languages generally spread uphill from the economically more important lowlands to more isolated highland communities, which are dependent on the lowlands for trade, commerce, and winter pastures (Nichols 2005, 2013). Highlanders know lowland languages but rarely vice versa; this makes uphill language spread possible and downhill spread unlikely, and likewise for diffusion of individual forms, categories, etc. That is, downhill languages are expansive while uphill ones are more sociolinguistically isolated. Thus we expect higher complexity in highland languages. Nichols (2013) finds a correlation between EC and altitude in the Daghestanian branch of the Nakh-Daghestanian family, and Nichols (2016) finds a stronger correlation using non-transparency of just gender marking. Nichols & Bentz (2018) show that a correlation of altitude with complexity is a significant worldwide tendency on several different measures. The sample used here is smaller but yields similar results. Both CC and EC correlate appreciably with altitude, and combined CC+EC yields a notably strong correlation for the small sample (Figure 7.3).

[Figure 7.3: three scatterplots of altitude (metres, vertical axis, 0 to 3000) against complexity score (horizontal axis). (a) CC x altitude in Daghestan; (b) EC x altitude in Daghestan; (c) Combined CC+EC x altitude in Daghestan.]
Figure 7.3. Complexity and altitude in Daghestan (eastern Caucasus) for the three complexity counts
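The combined measure in Table 7.3 is the sum of the CC and EC columns, which can be checked directly. The sketch below uses the values printed in Table 7.3 for the rows that report all three figures (Lower Sorbian is left out because only its CC value is given); the variable names are illustrative.

```python
# (language, CC, EC, CC + EC) for the rows of Table 7.3 that report all three values.
table_7_3 = [
    ("Andic mean", 28, 10, 38),
    ("Avar", 36, 10, 46),
    ("Hinuq (Tsezic)", 33, 9, 42),
    ("Hunzib (Tsezic)", 49, 11, 60),
    ("Lak", 42, 10, 52),
    ("Ic'ari Dargwa", 45, 15, 60),
    ("Tsakhur (Lezgian)", 41, 9, 50),
    ("Lezgi", 27, 4, 31),
    ("Udi", 41, 7, 48),
    ("Archi", 36, 11, 47),
    ("Tsakhur", 41, 9, 50),
    ("Russian", 57, 8, 65),
    ("Slovene", 51, 11, 62),
    ("Bulgarian", 43, 11, 54),
    ("Pipil", 21, 7, 28),
    ("Hopi", 36, 11, 47),
    ("Cupeño", 39, 12, 51),
    ("Tümpisa Shoshone", 27, 12, 39),
]

# Collect any row whose printed total differs from CC + EC.
mismatches = [row for row in table_7_3 if row[1] + row[2] != row[3]]
print(mismatches)  # → []

# Spread of each measure across these rows.
cc_values = [row[1] for row in table_7_3]
ec_values = [row[2] for row in table_7_3]
print(min(cc_values), max(cc_values))  # → 21 57
print(min(ec_values), max(ec_values))  # → 4 15
```

The much wider spread of CC than of EC in these rows is one reason the combined total tends to track the CC scores.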


• Spreads and isolation in the Caucasus: The Avar sphere. The eastern Caucasus is compactly settled by the 40+ descendants of the Daghestanian branch of Nakh-Daghestanian; Daghestanian may be of about Indo-European-like age. The eastern Caucasus has been inhabited by settled food producers for some 8,000 years. For at least the last few millennia the highland populations have followed an uncommon kind of transhumance: the entire working-age male population leaves the highlands for the winter half of the year, taking livestock to markets and winter pastures and usually finding seasonal work or maintaining businesses in lowland cities. There is what seems to have been a long-standing centre of language spread in the northeastern Caucasus and foothills, dominated from at least c. 1000 CE by the Sarir Kingdom. The canyons of the Avar Koisu, Andi Koisu, and their confluence in the Sulak were the avenues of trade and transhumant migration for most of Daghestan, and large markets formed in the Sulak lowlands. The language spoken at and near the confluence (in recent historical times, Avar) had major economic importance and was the language of work and everyday life for half of the year for much of the male population of Daghestan. This has led to contact effects among the languages of western Daghestan, including a distinctive structural type marked among other things by highly transparent gender systems, lack of verbal prefixation, and of course many Avar loans. Three episodes of uphill spreading can be traced in the Avar sphere (Nichols in prep.): most recently Avar, earlier Andic, still earlier Tsezic. These three make up one branch of Daghestanian, with this structure: [ Tsezic [ [Andic] Avar ] ].¹⁷ Avars apparently became rulers in the Sarir Kingdom on its conversion to Islam (at which point it became the Avar Khanate), and the final battles for control between Andi and Avar took place only in the seventeenth to eighteenth centuries (Aglarov 1988: 24).
Avar has been an expansive language, serving as lingua franca along the Andi Koisu for about three centuries and along the Avar Koisu for probably somewhat longer; it has spread well uphill and spilled over the crest to Georgia and Azerbaijan, but patchily, with many non-Avar enclaves. Andic is probably about 1,500 years old, during most of which time it has been expansive and its daughters have spread uphill; their settlement of the Andi Koisu is compact. Tsezic may have separated some 3,000 years ago in an earlier uphill spread; Tsezic languages are now at the uppermost highlands of both the Avar Koisu system and the Andi Koisu. The Andic languages can be expected to show more pronounced effects of spreading than Avar does. The western Tsezic languages (Hinuq in this sample) have been under strong Andic and Avar influence; the eastern Tsezic languages (Hunzib in this sample) had less Andic contact and held winter pastures not in the Avar-Andic lowlands but in Georgia to the south. Consistent with this history, Hinuq has considerable Avar influence and a very Andic-like grammar; Hunzib is markedly different, with southeastern Daghestanian-like traits.

¹⁷ Avar is one language, Andic a close-knit group of about ten, and Tsezic five more disparate.

At the edge of the Avar sphere, the isolate branch Lak was not part of the Avar Khanate but used the same trade and transhumance routes and shows Avar lexical and grammatical influence; isolated in a highland plateau, it has no known history of spreading. Beyond Lak are the Dargwa languages, for which the Caspian coastal cities and trade routes were important, lessening Avar influence. To the south of Avar, languages of the Lezgian branch are spoken along the southeast-flowing Samur and its tributaries, and Tsakhur is at the high end of this line of communication and also at the high end of the Koisu-Sulak line. The sample here includes representatives of most of these stages. Thus we expect the descending order of spread effects along and near the Andi Koisu and Avar Koisu systems shown in (6): (6)

Languages of the Avar sphere and their sociolinguistic histories Andic (long expansive; decomplexiﬁcation expected) languages Avar (recently expansive; some decomplexiﬁcation expected) Hinuq (early expansion, much subsequent Avar-Andic contact) Hunzib (early expansion, less Avar-Andic contact) Lak (isolated, but fairly large and uniﬁed) Ic’ari Dargwa (isolated, fairly small) Tsakhur (isolated, small; complexiﬁcation expected) Table 7.3(a) shows the complexity values. CC conforms very well to this scale; the only non-conformities are Hinuq, which clusters with Andic as is unsurprising, and Ic’ari (Dargwa), which belongs to the Caspian coastal sphere. EC is not very informative. The combined total is again in good conformity (unsurprisingly, as it adds the fairly uniform EC scores to the CC scores). For the Avar sphere and its periphery, then, CC reﬂects the sociolinguistics of spreading and isolation better than EC does, and the combined measure differs little from the CC scores. • The Samur sphere. The delta of the Samur River, which drains the southeast Caucasus and ﬂows into the Caspian Sea, is a highly productive agricultural region and long a nexus of trade and tax collection along the East Caspian commercial route. It is the second most important avenue (after the Sulak) for transhumant migration. The Lezgian branch, an old and diversiﬁed branch of Nakh-Daghestanian, originated in this vicinity and spread both uphill and into the Alazani valley in eastern Georgia and the lower Kura valley in northern Azerbaijan. The sample contains four Lezgian languages,

two in the highlands and two in the lowlands. Lezgi is a large, expansive, and inter-ethnic language centred on the lower Samur and nearby. Udi, which descends from a probably expansive inscriptional language of the early to mid first millennium (Caucasian Albanian [Gippert et al. 2009] is its ancestor), has since shrunk to three isolated enclaves in Azerbaijan and Georgia. Archi, noted for its morphological quirks (Corbett 2013b; Bond et al. 2016), and the complex Tsakhur are isolated at high ends of river canyons and have no known history of spread (apart from reaching the highlands in the first place, however that happened). The complexity figures in Table 7.3(b) reflect this history well. Tsakhur has much higher CC than the rest; Archi has higher EC; lowland Lezgi, with its known history of expansion, is low on both counts. Udi is mixed, high on CC and lower on EC, suggesting that CC complexifies faster than EC after the end of expansion. For both Caucasus surveys, EC picks out as most complex one language that is isolated at a high end with connections in more than one direction (Ic’ari Dargwa, Archi), CC appears to reflect spreading more than isolation, and the combined total gives a workable unified complexity scale that correlates reasonably well with altitude and isolation.

• Slavic. Of the four Slavic languages in the sample, Russian has a long history of expansion and absorption of Baltic and Finnic populations; Sorbian reflects the leading edge of the Proto-Slavic expansion (c. sixth to ninth centuries) but has been sociolinguistically isolated and receding since then (largely absorbed by the German expansion); Slovene remains close to the homeland and has no known history of expansion other than uphill spread into the Austrian and Slovene Alps; Bulgarian belongs to the Balkan Sprachbund and has undergone drastic structural changes as a result, including loss of cases and thereby of the case-number-gender co-exponence that makes Slavic noun declension so complex. Complexity levels (Table 7.3(c)) are not greatly different for the languages preserving case inflection, while Balkanized Bulgarian is much less complex.

• Uto-Aztecan. The Uto-Aztecan family is probably 5,000 years old and has undergone a gradual spread from a probably northern Mexican homeland followed by two large recent spreads: in the south, ancestral Nahuatl spread with the Aztec expansion and empire beginning in the thirteenth century, and in the north the Numic branch spread rapidly from the Sierra Nevada foothills across the Great Basin beginning in approximately the same time frame (Fowler 1972; Miller 1983; Madsen & Rhode 1994; Hill 2001, 2010; Merrill 2012). The languages in the sample, south to north, are Pipil (Nicaragua), Hopi (Arizona), Cupeño (southeastern California), and Tümpisa Shoshone (east central California). Pipil is the southernmost

descendent of Aztec and a surviving language probably of a military garrison. Cupeño is an isolated small language spoken in an eastern Sierra Nevada oasis with no history of expansion. Hopi is a pueblo language that gives some evidence of early admixture with a more southern Uto-Aztecan language (Merrill 2012) but has a long history of isolation. Tümpisa Shoshone is from the Numic branch. Table 7.3(d) shows that the complexity levels of expansive Pipil and Tümpisa Shoshone are lower than those of Hopi and Cupeño, as predicted. The difference is mostly due to CC, consistent with what is suggested by the Avar sphere. Though these samples are small, the results are generally consistent with predictions of higher complexity for sociolinguistically isolated communities. CC appears to be the better mirror of sociolinguistic history, and EC points in the same direction but unevenly. Nonetheless, combined CC + EC tends to yield very good correlations with present and prehistoric sociolinguistics: sociolinguistically isolated languages are more complex and expansive languages less complex.

7.4 Discussion and conclusions
To summarize, CC makes something very similar to informational (or Kolmogorov) complexity straightforwardly measurable using standard structural analysis and well-worked-out theoretical principles. I hope it will make it possible for any linguist to measure and compare the complexity of other languages. The initial hope for CC as first attempted (Nichols 2015) was that it would be a replacement for and improvement on EC and more cost-effective. It turned out to be neither a replacement nor any less labour-intensive, but a useful complement; combining the two can give a very serviceable complexity measure which, as intended, is capable of reflecting sociolinguistic history and shows interesting geographical distributions. This chapter has laid out a method for describing and measuring CC in inflectional morphology, as a set of seventeen separate variables which for this first attempt were simply added together without weighting. These represent a well-defined and crosslinguistically well-represented subset of inflectional morphology; for both CC and EC, in order to make surveys manageable in time cost, inflectional morphology must be sampled rather than covered fully. In a survey of just over a hundred languages, CC and EC proved to be independent of each other and, independently or combined, give quite revealing results. In terms of geography, CC and EC values both follow worldwide east-west clines in the upper northern latitudes (as do all other composite variables I have surveyed). The continents surveyed all have similar means and ranges of diversity in their complexity values; local areas can vary more, and families can differ still more. For an area with a large range of values, one can question whether it is

genuinely an area (though a good answer will require surveying more than complexity). Both EC and CC correlate positively with altitude, a geographical factor that is not the cause of complexity levels but reﬂects the sociolinguistics of isolation. The four families adequately represented in the sample all display some positive correlation of complexity with sociolinguistic isolation, supporting principles advanced in historical linguistics and sociolinguistics. The deﬁnitions and coding used here were arrived at using the autotypologizing principle (Bickel & Nichols 2002) of no ﬁxed ontology and constant redeﬁning and recoding as the categories emerge from analysis of more and more languages. Arriving at the current typology has been very labour-intensive, making this pilot survey inordinately time-consuming. By now, though, the typology has stabilized to the point that language surveys themselves are not unduly labour-intensive. This line of inquiry can be improved by expanding the sample to give all continents and areas comparably dense coverage to what has been done here for northern Eurasia and North America, and covering thoroughly a larger number of families and local areas. Methods of weighting the variables, and different calculations using different combinations of variables, need to be proposed and tested; among other things this will give ﬁrm grounding to comparisons of the relative complexity of Indo-European noun inﬂection and polysynthetic verb inﬂection. For stem classes and inﬂectional classes, which as mentioned are rarely distinguished in grammars, we need improved and consistent descriptive coverage. We also need consensus deﬁnitions and criteria for characterizing the numbers of conforming and non-conforming members of classes that have some semantic or other basis, such as gender classes; descriptions like ‘miscellaneous’ (composition of a class), ‘predictable’ (class membership), ‘arbitrary’, etc., are not consistently used. 
The applicability thresholds used here (section 7.2.2)—few or no members predictable, a sizable minority predictable, most or all predictable— seem workable but require some quantiﬁcation, however approximate, of the class membership and openness. Inﬂectional paradigms are ideally suited to an approach like this one. The same approach works well for some domains of derivational morphology but not all. For phonology and syntax and probably some derivational morphology, non-transparency will probably need to be described with a measure of the distance between underlying and surface. I see this kind of study as moving linguistics in the direction of the data sciences. Variables that form geographically very large patterns, or that correlate with such things as sociolinguistics, expansions, and other human population developments raise the prospects of multifactorial interdisciplinary collaboration. A single variable surveyed in a 113-language sample is not what one would call Big Data, but behind the convenient single number representing the CC value lie seventeen variables surveyed across three POS and eight categories—a total of over 200 datapoints per language or over 20,000 for the hundred-language sample. Massive scope, making possible close comparison with the differently distributed

data of other ﬁelds, might require some 400–500 languages plus similarly massive data for a few other composite variables or many simple ones. Creating such a resource is an ambitious but entirely feasible project.

Appendix 7.1 Categories and variables used here
For definitions and discussion, see section 7.2.2.

Variables
* = entries are number of categories in the paradigm; others are presence vs. absence (calculated as 1 and 0 in total complexity figures).
1. Inflection classes*
2. Unpredictability of inflection classes*
3. Inherent categories*
4. Unpredictability of inherent classes*
5. Stems per lexeme*
6. Stem classes per language*
7. Unpredictability of stem classes*
8. Arguments indexed
9. Co-exponence
10. Syncretisms
11. Allomorphy
12. Position discrepancies
13. Category discrepancies
14. Wordhood discrepancies
15. Partial marking
16. Multiple marking
17. Other
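As a rough illustration of how the seventeen variables could be combined into a single CC figure, the following Python sketch adds them without weighting, treating the starred variables as counts and the rest as presence (1) or absence (0). The variable identifiers, the (category, variable) keying, and the sample values are assumptions made for illustration only; they are not the survey's actual coding format or data.

```python
# Variables marked * in the list above: entries are counts.
COUNT_VARIABLES = frozenset({
    "inflection_classes",
    "unpredictability_of_inflection_classes",
    "inherent_categories",
    "unpredictability_of_inherent_classes",
    "stems_per_lexeme",
    "stem_classes_per_language",
    "unpredictability_of_stem_classes",
})

# Remaining variables: presence vs. absence, counted as 1 or 0.
PRESENCE_VARIABLES = frozenset({
    "arguments_indexed", "co_exponence", "syncretisms", "allomorphy",
    "position_discrepancies", "category_discrepancies",
    "wordhood_discrepancies", "partial_marking", "multiple_marking", "other",
})


def cc_total(coding):
    """Unweighted sum over the (category, variable) cells of one language."""
    total = 0
    for (category, variable), value in coding.items():
        if variable in COUNT_VARIABLES:
            total += int(value)
        elif variable in PRESENCE_VARIABLES:
            total += 1 if value else 0
        else:
            raise ValueError(f"unknown variable: {variable!r}")
    return total


# Hypothetical coding for an invented language (not survey data).
example = {
    ("case", "inflection_classes"): 3,
    ("case", "syncretisms"): True,
    ("gender", "inherent_categories"): 2,
    ("number", "allomorphy"): False,
}
print(cc_total(example))  # 3 + 1 + 2 + 0 = 6
```

Keeping the two kinds of variable in separate sets makes it easy to experiment later with weighting schemes, which the chapter names as a direction for further work.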

Grammatical categories surveyed here
Case. Case marking of A S O G T and Poss only.
Gender. Noun gender only.
Number. Singular and plural only.
Person. 1-2-3 singular inflectional paradigms; 1-2 singular and plural for independent personal pronouns.
Person-number, where these two are co-exponential.
Classifier. Chiefly numeral classification; but used for a second set of gender categories in languages with two gender systems (here, only Mian).
TAM. The most basic synthetic present-like and aorist-like tense categories, where distinguished; where lacking, two other basic tense categories; where there is no tense, no entry.
General. Where a variable cannot easily be attributed to any one category.

Appendix 7.2 Cell totals per category and variable

                               Case     Gender     Number     Person    Pers-No Classifier        TAM    General      Total
Inflection categories           166         53        199         36        131          7        108          0        700
Inflection classes              206         44        164         36        185          5        122         62        824
Unpredictability                  8         26         25          8         41          0          4          9        121
Inherent categories               0        152         11          0          0         15          0          0        178
Unpredictability                  0        110          4          0         37          6          4          9        170
Stems per lexeme                 56          3         10          4          7          1         27        320        428
Stem classes per lg.             49          2         11          7         13          1         47        405        535
Unpredictability                  5          2          2          0          9          0         26        108        152
Arguments indexed                 0          0          0          0          0          0          0        178        178
Fusions                           4         28          2          4        106          0          5         21        170
Syncretisms                      54         25          7          5         43          0          1          4        139
Overlaps                         13          0          0          2          1          0          0          0         16
Allomorphy                       57          6         11          9         21          0          8          6        118
Position discrepancies            7          7          5          6         15          1          3          5         49
Category discrepancies            0          4          2          0          2          0          0          2         10
Wordhood discrepancies           13          4          1          1         10          0          0          7         36
Partial marking                   1         27          2          0          3          1          0          0         34
Multiple marking                  0          1          1          2         11          0          0          0         15
Other                             0          0          2         12         43          0          0          4         61
TOTAL                           638        494        459        132        678         37        355       1140       3934

Appendix 7.3 Sample Classiﬁcation and geography of the 113 sample languages. * = languages with only CC data and no EC data. Languages where the stock name is identical to the language name are isolates. Language Fula Lango Luganda Jamsay Fur Haro Somali Dahalo Nama Basque

Stock N. Atlantic Nilotic Benue-Congo Dogon Fur Ta-Ne Omotic Cushitic Cushitic Juu Basque (isolate)

Continent Africa Africa Africa Africa Africa Africa Africa Africa Africa W Eurasia

Area

  German Russian Lithuanian Sorbian * Slovene Bulgarian Romanian Albanian Greek Ossetic Kabardian Ingush Avar Karata Tindi * Godoberi Hinuq Hunzib Lak Icari Udi Tsakhur Lezgi Archi Khinalug Svan Pazar Laz Saami (Kildin) Finnish Mordvin Mari Hungarian Khanty (E.) Khanty (N.) * Nganasan Tundra Nenets Ket Evenki Even *

Germanic Indo-European Indo-European Indo-European Indo-European Indo-European Indo-European Indo-European Indo-European Indo-European West Caucasian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Nakh-Daghestanian Kartvelian Kartvelian Uralic Uralic Uralic Uralic Uralic Uralic Uralic Uralic Uralic Yeniseian Tungusic Tungusic

W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia W Eurasia N Asia N Asia N Asia N Asia N Asia N Asia N Asia N Asia

Circum-Baltic Circum-Baltic Circum-Baltic Circum-Baltic Balkan Balkan Balkan Balkan Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus Caucasus

Udehe Nanai Manchu Yakut Chuvash * Mongolian Yukagir (Tundra) Ainu Nivkh Itelmen Chukchi Aleut Mandarin Paiwan Bininj Gun-Wok Mawng Bardi Diyari Kuniyanti Djingulu Mian Usan Tawala Yimas Koiari Central Alaskan Yup’ik Zuni Acoma Lakhota Kiowa Hupa Cree E. Pomo Seneca Thompson Yurok Karok Nuuchahnulth * Tümpisa Shoshone

Tungusic Tungusic Tungusic Turkic Turkic Mongolic Yukagir Ainu (isolate) Nivkh (isolate) Chukchi-Kamchatkan Chukchi-Kamchatkan Eskimo-Aleut Sino-Tibetan Austronesian Gunwingguan Iwaidjan Nyulnyulan Pama-Nyungan Bunuban Mindi Ok Madang Austronesian Lower Sepik Koiarian Eskimo-Aleut Zuni (isolate) Keresan Siouan Kiowa-Tanoan Athabaskan Algic Pomoan Iroquoian Salish Algic Karok (isolate) Wakashan Uto-Aztecan

N Asia N Asia N Asia N Asia N Asia N Asia N Asia N Asia N Asia N Asia N Asia N Asia S&SE Asia S&SE Asia Australia Australia Australia Australia Australia Australia New Guinea New Guinea New Guinea North America North America North America North America North America North America North America North America North America North America North America North America North America North America North America North America

N Paciﬁc Rim

N Paciﬁc Rim N Paciﬁc Rim N Paciﬁc Rim N Paciﬁc Rim N Paciﬁc Rim N Paciﬁc Rim

  Yokuts Maidu Southern Sierra Miwok Wappo Wishram Nez Perce Klamath Chimariko Cupeño Koasati Hopi Jamul Tiipay Pipil Tzutujil Cayuvava Movima Kashibo-Kakataibo Jaqaru Aymara Huallaga Quechua Mapudungun Kwaza Paez

Utian Maiduan Miwokan Yuki-Wappo Chinookan Klamath-Sahaptian Klamath-Sahaptian Chimariko (isolate) Uto-Aztecan Muskogean Uto-Aztecan Yuman Uto-Aztecan Mayan Cayuvava Movima Panoan Aymaran Aymaran Quechua Mapudungun Kwaza (isolate) Paesan

North America North America North America North America North America N Paciﬁc Rim North America North America North America N Paciﬁc Rim North America North America North America North America Central America Central America South America South America South America South America South America South America South America South America South America

Appendix 7.4 CC levels in the survey languages (a) Including count of categories (though this approximates EC). Lowest, in increasing order: Mandarin, Diyari, Manchu, Lango. Highest, in increasing order: Ket, Lower Sorbian, Russian, Skolt Saami. The scale is 9–68; median=mean (arrow) is 32. (b) Excluding count of categories to give a more strictly CC total. Lowest, in order: Mandarin, Manchu=Diyari, Lango=Klamath=Kashibo-Kakataibo. Highest: Russian, Lower Sorbian, Slovene, Skolt Saami. The scale is 7–60; mean 26.4, median (arrow) 25.

[Figure: CC levels in the survey languages. Panel (a): including count of categories. Panel (b): not including count of categories.]

8 The complexity of grammatical gender and language ecology
Francesca Di Garbo

Francesca Di Garbo, The complexity of grammatical gender and language ecology. In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Francesca Di Garbo. DOI: 10.1093/oso/9780198861287.003.0008

8.1 Introduction
This chapter is a qualitative investigation of the sociohistorical correlates of diachronic change in the domain of grammatical gender agreement. I define grammatical gender systems as systems of nominal classification that presuppose agreement marking and thus highly grammaticalized patterns of inflection, often involving shared exponence with other nominal categories (e.g., number), syncretism, and other types of coding asymmetries. In languages with grammatical gender, nouns are assigned to different classes. These categorizations are not necessarily, or not only, encoded on nouns. On the contrary, gender marking is displaced on words that are engaged in a morphosyntactic relationship with nouns (e.g., adnominal modifiers, verbs, pronouns) and whose inflections point at the gender of the noun.

During the last couple of decades, a number of studies have provided qualitative and quantitative evidence in support of the idea that the evolution of morphological complexity (both at the syntagmatic and paradigmatic level) is sensitive to sociohistorical dynamics concerning language population (see, among others, Lupyan & Dale 2010; Trudgill 2011; Bentz & Winter 2013; Bentz et al. 2015). Complexities in certain domains of morphology represent a challenge for the adult learner and tend to be eroded as the number of adult learners increases at a given point in the history of a speech community. This adaptive response of language structures to social factors has also been claimed to be crucial to understanding how gender systems change through time and how they are distributed worldwide (Trudgill 1999; Nichols 2003; McWhorter 2007).

For a number of language families around the world (e.g., Indo-European and Niger-Congo) grammatical gender can be reconstructed as a feature of the protolanguage, and as one of the most long-lived. Yet, even though stable at the family level, the gender systems of individual languages within a gendered family may undergo reduction and loss due to language-internal processes of morphophonological erosion and/or reanalysis that, at least in some cases, pair up with a situation of prolonged contact and bilingualism with languages lacking gender

(on the role of language contact in the loss of grammatical gender, see the recent study by Igartua 2019; for a broader discussion of loss of morphology and imperfect language learning, see the contributions by McWhorter, Chapter 10, and Berdicevskis & Semenuks, Chapter 11, both in this volume). It has also been observed that gender systems tend to cluster geographically and to be best preserved in languages surrounded by other languages with gender (Nichols 1992, 2003). Thus languages that undergo complete gender loss are expected to be neighbours with each other or to have languages without gender as their closest neighbours (Nichols 2003: 299–304). While instances of gender reduction and loss under contact situations are relatively well documented in the literature, the role of language contact in the rise of gender systems has, so far, been poorly explored, and scholars generally agree that gender systems very seldom arise within language families that normally lack gender (Nichols 2003: 308). This is directly connected with the fact that full-fledged gender marking systems are commonly associated with rather pervasive patterns of agreement, which are notoriously unlikely to be borrowed (for a similar argument, see Igartua 2019: 209). However, recent research (Stolz 2012, 2015; Di Garbo & Miestamo 2019) shows that elementary patterns of gender agreement may emerge as a result of borrowing of noun phrases from contact languages with gender, and that, albeit rare, these types of systems are spread across unrelated languages and in different areas of the world. Existing research on the stability and evolution of gender systems under contact situations focuses either on the decline or on the rise of gender systems, and the two processes are rarely discussed together. Here I argue that, in order to fully understand to what extent morphological complexity in the domain of grammatical gender ties up with factors pertaining to the social history of a speech community, a comprehensive survey of the evolutionary dynamics of gender systems (focusing not only on loss and emergence, but also on reduction and expansion) is in order. In addition, given that, by definition, gender systems are bound to the existence of productive agreement patterns (Corbett 1991), I contend that complexification and simplification in the morphological encoding of gender distinctions must be primarily studied through the analysis of agreement patterns.¹ Within contact linguistics, it is generally assumed that contact-induced loss or emergence of agreement presupposes long-term contact, heavy borrowing and/or extensive bilingualism between speech communities (Thomason 2001: 71). However, to date, and to the best of my knowledge, there have been no studies that systematically tackle the issue of which factors may account for the occurrence of these opposite patterns of change, agreement loss and emergence, under

¹ Focusing on patterns of gender agreement does not mean, of course, to underestimate the importance that nominal gender marking has in languages that display it (for a more thorough discussion, see section 8.2).

allegedly similar sociohistorical scenarios. The present study attempts to fill this gap by investigating loss of gender agreement in language families characterized by the presence of this feature and, conversely, the emergence of gender agreement in languages with no inherited gender systems. Besides loss and emergence, I also study the reduction and expansion of gender agreement patterns within gendered language families. With respect to sociohistorical variables, the study especially focuses on language contact dynamics, with particular attention to asymmetries between the populations in contact, both in terms of the demographic structure (population size) and prestige differences. The chapter is structured as follows. In section 8.2, I discuss in what respects gender systems, as a grammatical and functional domain, can be relevant to the study of morphological complexity. The sampling methodology and data collection procedure are outlined in section 8.3. In section 8.4, I provide an overview of the patterns of language change attested in the data set, and illustrate their geographic distribution in section 8.5. Section 8.6 discusses the sociohistorical factors that are associated with the patterns of change attested in the languages of the sample. A summary of the results and some concluding remarks are given in section 8.7.

8.2 Grammatical gender and morphological complexity
Recent research on linguistic complexity and the typology of gender systems (Audring 2014; Di Garbo 2016) suggests that three dimensions of variation can be relevant to a typologically informed, descriptive² account of the complexity of gender systems:
• The number of gender distinctions, under the assumption that the higher the number of distinctions, the more complex the gender system.
• The number and nature of assignment rules, under the assumptions that: (a) a gender system where gender assignment is both semantic and formal is more complex than a system where gender assignment is only semantic or only formal, and (b) a gender system with flexible assignment is more complex than a system with rigid assignment.
• The pervasiveness of gender marking, under the assumption that the higher the number of word classes and syntactic domains that are subject to gender marking, the more complex the gender system.

² In this chapter, the notion of descriptive, absolute complexity is kept distinct from the notion of difficulty. Under the former approach, complexity is operationalized in terms of description length (Dahl 2004; Miestamo 2008). Under the latter approach, complexity is a measure of difficulty and costs in language learning and use (Kusters 2003). For a discussion of these and related topics, see Arkadiev & Gardani, Chapter 1 in this volume.
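The three dimensions listed above can be made concrete with a small data structure. The following Python sketch is one possible encoding, assuming that each dimension is recorded separately and compared one at a time; the text does not propose merging them into a single score, and neither does the sketch. The class, field, and function names are hypothetical, and the two systems compared are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenderSystem:
    name: str
    n_genders: int               # dimension 1: number of gender distinctions
    semantic_assignment: bool    # dimension 2: kinds of assignment rules
    formal_assignment: bool
    flexible_assignment: bool    # dimension 2(b): flexible vs. rigid
    marked_targets: frozenset    # dimension 3: word classes/domains marked


def profile(system):
    """Return one value per (sub)dimension; higher means more complex."""
    kinds = int(system.semantic_assignment) + int(system.formal_assignment)
    return {
        "distinctions": system.n_genders,
        "assignment_kinds": kinds,
        "flexible_assignment": int(system.flexible_assignment),
        "pervasiveness": len(system.marked_targets),
    }


def compare(a, b):
    """Per dimension: 1 if a is more complex than b, -1 if less, 0 if equal."""
    pa, pb = profile(a), profile(b)
    return {dim: (pa[dim] > pb[dim]) - (pa[dim] < pb[dim]) for dim in pa}


# Two invented systems, for illustration only.
system_a = GenderSystem("A", 2, True, True, False,
                        frozenset({"article", "adjective", "pronoun"}))
system_b = GenderSystem("B", 4, True, False, False,
                        frozenset({"pronoun"}))
print(compare(system_a, system_b))
```

Reporting a separate verdict per dimension reflects the point that a system can be more complex on one dimension (here, system B has more distinctions) while being less complex on another (system A marks gender on more targets).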

The suggested dimensions are based on established typological parameters for the classification of gender systems, but do not exhaust all possible ways in which gender systems may vary, and they can be in turn broken down into a number of subdimensions. For a detailed analysis of how the complexity of gender systems can be further differentiated, see Audring (2017, 2019). While the first and second dimensions of the proposed complexity metrics are not directly linked to morphological complexity, the third dimension (pervasiveness of gender marking) directly hinges on morphology. Gender marking presupposes the existence of morphology that is dedicated to the expression of gender. This applies both to nominal gender marking (also known as overt gender) and to non-nominal gender marking (also known as gender agreement). If we consider non-nominal gender marking first, grammatical gender systems can be associated with morphological complexity both syntagmatically and paradigmatically. At the syntagmatic level, patterns of gender agreement are sets of inflections that may occur on various entities within an utterance (e.g., articles, adjectives, demonstratives, verbs, personal pronouns) and that point at one of multiple classes to which nouns can be assigned (e.g., in Italian, the masculine and feminine class). At the paradigmatic level, each of the items that carry gender inflection in a language typically possesses as many forms as there are gender values to be distinguished, and the number of available forms is even higher if, for instance, a language expresses gender distinctions both in the singular and in the plural. In Italian (Indo-European, Romance),³ the form of the definite article varies between il/lo, la, i/gli, le, depending on whether the noun marked as definite is masculine singular, feminine singular, masculine plural, or feminine plural.⁴ Moving on to overt gender marking, in several languages, gender marking is not restricted to agreement but also affects nominal morphology, with gender distinctions being overtly marked on nouns. Overt gender marking features higher syntagmatic complexity, inasmuch as it increases the number of word classes where gender is flagged within an utterance. It also increases paradigmatic complexity, in that it leads to higher lexical diversity, given that each noun may in principle have as many forms as there are gender values to be distinguished. Nominal gender marking is, for instance, very pervasive in Atlantic-Congo gender systems, as illustrated in (1) with an example from the Bantu language Chichewa.

³ In this chapter, language classiﬁcation is based on Glottolog (Hammarström et al. 2019). ⁴ This type of morphological paradigmatic complexity is deﬁned by Bentz et al. (2015: 2) as an instance of lexical diversity, which they describe as the ‘distribution of word forms or word types’ that languages ‘use to encode essentially the same information’. In the domain of deﬁniteness marking, Italian exhibits higher lexical diversity than, say, English, because different forms of the deﬁnite articles are used depending on the gender and number values of nouns, whereas deﬁnite articles in English are gender and number invariant.
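The contrast between Italian and English definite articles described in the text and in footnote 4 can be laid out as a small paradigm table. The Python sketch below lists only the Italian forms mentioned in the text, and counts distinct forms as a crude stand-in for lexical diversity; the function names and the counting procedure are assumptions for illustration, not the measure used by Bentz et al. (2015).

```python
# Definite-article forms as listed in the text, keyed by (gender, number).
ITALIAN = {
    ("masculine", "singular"): ("il", "lo"),
    ("feminine", "singular"): ("la",),
    ("masculine", "plural"): ("i", "gli"),
    ("feminine", "plural"): ("le",),
}

# English uses one invariant form in every cell.
ENGLISH = {cell: ("the",) for cell in ITALIAN}


def distinct_forms(paradigm):
    """All different word forms found anywhere in the paradigm."""
    return {form for forms in paradigm.values() for form in forms}


def distinct_form_sets(paradigm):
    """Number of different sets of forms across the paradigm's cells."""
    return len(set(paradigm.values()))


print(len(distinct_forms(ITALIAN)), len(distinct_forms(ENGLISH)))  # 6 1
print(distinct_form_sets(ITALIAN), distinct_form_sets(ENGLISH))    # 4 1
```

The second count shows that every gender-number cell in Italian has its own set of forms, whereas all four cells collapse into one in English.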

(1) Gender marking in Chichewa (Atlantic-Congo, Bantu; Kiso 2012: 18)
chi-nkhanira cha-chi-kazi chi-ku-dzi-kanda
7-scorpion -7-female 7.---scratch
‘The female scorpion is scratching itself.’

In (1), the markers of class 7 (the singular form of gender 7/8 in Chichewa) occur on the adnominal modifier, the verb, and the noun itself. The relationship between nominal and non-nominal (agreement-based) gender marking is not trivial. In some languages, as is the case in Bantu, nominal and non-nominal marking can have similar means of expression from the point of view of the phonological appearance of the morphemes used to encode gender distinctions. However, this formal correspondence may only apply to parts of the system rather than to all nouns and all agreement targets. In addition, nominal marking and agreement marking may have different sources and undergo different types of diachronic developments. For instance, as is also the case in Bantu languages, animacy-based marking may develop in the domain of agreement without affecting nominal marking. Thus, in languages that have both nominal and agreement-based marking of gender distinctions, it is important to consider these as two separate dimensions that may, but need not, interact with each other. In this chapter, I restrict my focus to patterns of change in the domain of agreement marking and their effect on the complexity of gender systems. The reason behind this choice is twofold. On the one hand, while agreement marking is definitional to gender (there is grammatical gender only if there is displaced marking of classificatory distinctions through agreement), nominal marking is not (many languages mark gender distinctions only via agreement). On the other hand, while agreement marking directly hinges on inflectional morphology, in that gender agreement targets obligatorily inflect for gender, nominal gender marking resides more in the domain of lexicalized distinctions and/or word formation rules, which can be argued to be less central to morphological complexity. The patterns of change in the domain of agreement marking that the study focuses on are presented and discussed in section 8.4.

8.3 Method and data
8.3.1 Sampling methodology and variables in focus
The study is based on a sample of 36 languages distributed among 15 sets of closely related languages. Each language set contains two to three languages with the exception of Chamorro, a language isolate within the Austronesian family, and the mixed language Michif. The geographical distribution and genealogical affiliation of the sample languages are shown in Figure 8.1. Even though language sets

[Figure 8.1. The language sample. Legend (language sets): Balto-Slavic, Bantu, Basque, Chamorro, Central Gunwinyguan, Germanic, Ghana-Togo-Mountain, Greek, Insular Celtic, Iranian, Khasian, Lezgic, Mek, Michif, Thebor. Note: See also Di Garbo & Miestamo (2019).]

from at least five of the world’s six macro-areas are represented in the sample, the data set is largely skewed towards Eurasia. The reason behind this bias is twofold. First, along with Africa, Eurasia is one of the areas of the world where gender systems are most frequent. Second, for many of the Eurasian genealogical units included in the sample, diachronic developments in the domain of nominal morphology have been studied with the support of historical-comparative data, and the social history of many of these speech communities is also relatively well documented. The languages of Eurasia thus qualify as an appropriate starting point to explore the evolutionary dynamics of morphological complexity in the domain of gender marking and their sociohistorical correlates. At least one genealogical unit from each of the other macro-areas (except for South America) has been added. A complete list of the languages sampled for each of the genealogical units is given in Appendix 8.1. Each language set consists of one conservative language and at least one innovative language with respect to gender agreement marking, with the exception of the Thebor (Bodic) languages Shumcho and Janshung, both of which represent instances of emerging gender agreement patterns within the family. Languages within one and the same set can be mutually intelligible (as in the case of Kelasi and Kafteji within the Northwestern Iranian set), or more distantly related (as in the case of Nalca and Eipo within the Mek set). The patterns of language change accounted for are: loss, reduction, emergence, and expansion in the domain of gender agreement. These are compared with either the retention of gender agreement (in case of reduction, loss, and expansion) or with its absence (in case of emerging gender agreement).
These diachronic processes are investigated by examining the morphosyntactic domains of gender marking in a language (e.g., attributive modiﬁers, predicates, pronouns), and the way in which


these vary across genealogically related languages: what are the word classes that inflect for gender in language X as opposed to the closest relatives Y and Z? Do all targets of gender agreement mark the same kind of gender distinctions or is there a split between, say, adjectives, articles and demonstratives distinguishing between masculine, feminine and neuter gender, and personal pronouns distinguishing between animate and inanimate gender? The relevance of these questions for the understanding of the complexity of gender systems is discussed in section 8.4. In addition to representing more or less conservative languages in the domain of gender agreement, the sampled language sets, and the individual languages within each set, were selected in an attempt to capture diversity at the sociohistorical level. In this respect, variables such as demography, domains of use, and history of contact were considered. This sampling methodology, which aims to capture both structural and sociohistorical diversity within sets of closely related languages, has already been applied to studies of the relationship between language structures and social structures. An example of this approach is the study of morphosyntactic complexity and language contact by Maitz & Németh (2014), which investigates morphosyntactic complexity in varieties of German chosen to represent three different sociohistorical profiles: one standard, relatively high-contact language (Standard German), two contact languages (the pidgin Kiche Duits and the creole Unserdeutsch), and one low-contact variety typically learned as L1 only (Cimbrian).

8.3.2 Data collection

Data were collected by using a questionnaire, which was sent out to experts on individual languages, as well as by means of descriptive resources. For those languages for which questionnaire responses could not be obtained, I used the questionnaire as a guideline to conduct more informal consultations with language experts and to gather information from descriptive resources. The questionnaire consists of two parts. Part 1 focuses on language ecology and language contact and aims at capturing information on the present and past geographical and sociohistorical environment in which a given language is/was used, with a set of fine-grained questions ranging from demography to domains of language use, issues of language identity and prestige, code switching practices and language contact in the past.⁵ Part 2 focuses on grammatical gender and aims at capturing information on number and type of gender distinctions, gender assignment rules, the morphology and syntax of gender marking and the diachrony of a given gender system. The questionnaire is based on two different pre-existing typological questionnaires. Part 1 is based on John Bowden’s questionnaire on language contact in East Nusantara (Eastern Indonesia). Part 2 is based on Greville Corbett’s questionnaire on gender and number.⁶

⁵ Not all of these questions could be answered for all languages in the sample.

8.4 Patterns of change under study: an overview

Here I provide an overview of the patterns of language change that appear to foster the reduction, loss, expansion and emergence of gender agreement in the languages of the sample. I first discuss patterns of reduction and loss, moving to emergence and expansion thereafter. I also discuss how each of the patterns in focus may contribute to the increase and/or decrease of aspects of morphosyntactic complexity in the domain of gender marking. A description of patterns and contexts of change in each of the sampled languages is given in Appendix 8.1. For a detailed discussion of the patterns of change attested in the languages of the sample and summarized herein, see Di Garbo & Miestamo (2019).

⁶ Both questionnaires can be freely accessed through the repository for ‘Typological tools for field linguistics’ from the website of the former Department of Linguistics at the Max Planck Institute for Evolutionary Anthropology in Leipzig (http://www.eva.mpg.de/lingua/tools-at-lingboard/questionnaires.php).

8.4.1 Reduction and loss of gender marking

The reduction and loss of gender agreement in the languages of the sample may result from two distinct processes of language change: (1) morphophonological erosion and (2) redistribution of agreement patterns. Under morphophonological erosion, gender marking is eroded or disappears as a result of sound changes that lead to the loss of segmental morphology. Under redistribution of agreement patterns, one gender agreement pattern spreads at the expense of others, leading to the partial or complete neutralization of gender distinctions. Both processes exhibit properties of directionality, but the preferred directionalities differ under one or the other process: morphophonological erosion is found to often spread from the domain of attributive modifiers whereas the redistribution of gender agreement patterns often has its onset in the domain of anaphoric pronouns.

An example of partial loss of gender marking as a result of morphophonological erosion is Standard Swedish (Indo-European, North Germanic). In Standard Swedish, two different systems of gender distinctions are attested. Within the noun phrase, the language distinguishes between two genders: the Common Gender and the Neuter Gender, en person ‘a person’ and ett hus ‘a house’. This distinction is marked on definite and indefinite articles, demonstrative modifiers, and adjectives. In the domain of third person pronouns, a Masculine/Feminine type of gender distinction is marked if the pronoun antecedent is a human or a higher animate. If the pronoun⁷ antecedent is an inanimate noun, the Common/Neuter gender distinction, which is active elsewhere, applies. This split is illustrated in Table 8.1.

Table 8.1. Third person pronouns in standard Swedish

Hum. and Higher Anim.   M han ‘he’   F hon ‘she’   P⁷ de ‘they’
Inanim.                 C den ‘it’   N det ‘it’    P de ‘they’

⁷ The Masculine/Feminine distinction is also marked in the accusative and genitive forms of the pronoun. Cf. honom (3..) vs. henne (3..), and hans (3..) vs. hennes (3..).

This split in the domain of gender marking is the result of the merger between masculine and feminine inflections on adnominal modifiers, which occurred through a combination of various morphophonological processes, such as the erosion and loss of the masculine suffix -er from the inflectional paradigm of strong adjectives, the loss of the masculine suffix -r before the definite suffix in the nominative form of the noun, and the loss of final consonant length in the inflectional paradigm of the definite suffixes (Duke 2010: 652–4). Many nonstandard varieties of Swedish, such as Elfdalian Swedish, still retain the tripartite distinction between Masculine, Feminine, and Neuter Gender throughout the gender marking system.

Complete loss of gender inflections as a result of morphophonological erosion is attested in the Northwestern Iranian language Kelasi. Kelasi’s closest genealogical and geographic neighbour, Kafteji, still retains productive masculine and feminine gender agreement patterns. Lack of gender marking in Kelasi and presence of gender marking in Kafteji are exemplified in (2) and (3), respectively.

(2) No gender agreement in Kelasi (Northwestern Iranian; Stilo 2019: 45)
a. m œmd-e ziœ-Ø ní-œ.
   this P.N-. son-. .-3
   ‘This (or ‘he’) is not Ahmahd’s son.’
b. m œmd-e dét-Ø ní-œ.
   this P.N-. daughter-. .-3
   ‘This (or ‘she’) is not Ahmahd’s daughter.’

(3) Masculine and feminine gender agreement in Kafteji (Northwestern Iranian; Stilo 2019: 45)
a. m-Ø œmd-ə zeœ-Ø ní-œ.
   this-. P.N-. son-. .-3.
   ‘This (or ‘he’) is not Ahmahd’s son.’
b. m-œ œmd-ə dét-œ ne-áya.
   this-. P.N-. daughter-. .-3.
   ‘This (or ‘she’) is not Ahmahd’s daughter.’

As examples (2) and (3) show, utterances in Kelasi and Kafteji look (and sound) practically the same and the two languages are highly mutually intelligible. One of the few striking structural differences between the two languages is, in fact, the presence of gender inflections in Kafteji (in the form of zero-marked Masculine and marked Feminine) and its complete absence in Kelasi. Stilo (2019) describes loss of gender in Kelasi as the result of morphophonological erosion in the domain of nominal inflection, whereby the possibility to omit overt gender marking on nouns in certain morphosyntactic contexts triggers the systematic erosion of gender marking elsewhere. No information is however given about the ordering of loss of gender inflection on the various agreement targets.

Loss of gender by the redistribution of agreement patterns is attested, among other languages, in Cappadocian Greek (Indo-European, Greek), where it results from the generalization of neuter agreement to all instances of masculine and feminine gender agreement (Karatsareas 2009, 2014). Comparative evidence from closely related dialects, such as Pontic Greek, allows us to infer how the process of redistribution took place. In Pontic Greek, grammatically masculine and feminine nouns denoting inanimate entities trigger neuter agreement on all agreement targets but the prenominal articles. This is shown in (4), with the example of the inanimate feminine noun pórta ‘door’, which triggers neuter agreement on the past participle anixtón ‘open’, but feminine agreement on the prenominal definite article i.

(4) Argyroúpolis Pontic (Indo-European, Greek; Karatsareas 2014: 79)
i pórta (...) móno ímoson óran estéknen anixtón
.. door.. (...) only half.. hour.. stay..3 open..
‘The door would stay open for only half an hour.’

Conversely, in Standard Modern Greek, the same controller noun selects feminine agreement on all targets.

(5) Standard Modern Greek (Indo-European, Greek; Karatsareas 2014: 80)
i pórta móno misí óra émene anixtí
.. door.. only half.. hour. stay..3 open..
‘The door stayed open for only half an hour.’

In Pontic Greek, the redistribution of the neuter gender agreement pattern is semantically motivated. Neuter agreement is associated with inanimate referents, and inanimate nouns select neuter agreement irrespective of their grammatical gender. In Cappadocian Greek, where the generalization of the neuter agreement patterns has taken over, no trace is left of this semantically based redistribution.

Morphophonological erosion and agreement redistribution are not two mutually exclusive processes. An example of a heavily reduced gender agreement system where both morphophonological erosion and redistribution of agreement patterns are at play is Karleby Swedish (Indo-European, North Germanic), a variety of Swedish spoken in the town of Karleby, which is located in the Finnish region of Ostrobothnia. In Karleby Swedish, gender inflections have been lost everywhere except for the unbound form of the definite articles and the personal and demonstrative pronouns, all of which still inflect as masculine or feminine, but only when the controller nouns denote human beings (Hultman 1894: 229; Huldén 1972: 47). Similarly, gender marking has undergone severe reduction and near-loss across different varieties of Tamian Latvian. According to the recent analysis by Wälchli (2017), the erosion of gender distinctions started out with the loss of short vowels in final syllables. This occurred first on nouns, leading to the neutralization of the masculine and feminine distinction in the accusative plural form, and later extended to agreement marking, starting from the demonstratives. This initial process of morphophonological erosion was followed by multiple processes of redistribution in other domains of gender marking, which led to the generalization of the masculine agreement pattern at the expense of the feminine. Traces of feminine marking are still found, to different extents and different degrees of productivity, in nearly all varieties of Tamian Latvian. For a sociohistorical analysis of these developments, see section 8.6.
While it can be assumed that complete loss of gender agreement marking is a straightforward process of morphosyntactic simpliﬁcation which decreases the overall number of grammatical meanings that must be expressed in a given morphosyntactic context (e.g., on adnominal modiﬁers, anaphoric pronouns, predicates), partial losses and redistributions of gender marking are harder to classify as straightforward simpliﬁcation. Here I base my assessment of morphosyntactic complexity in reducing gender systems on recent work by Audring (2017), where different aspects of gender marking are broken into a multidimensional space of variation. Partial loss of gender marking as a result of morphophonological erosion can pave the way to split gender agreement systems such as the one attested in Standard Swedish. Here, not all targets of gender marking are sensitive to the same type of gender distinctions: the personal pronouns make a sex-based distinction that is not found in the domain of adnominal modiﬁcation. Furthermore, sex-based marking on personal pronouns is conditional, and only occurs if the pronoun’s antecedent is a human being or a higher animate. According to the complexity metric proposed by Audring (2017), split gender agreement systems and conditional gender marking feature higher complexity than absence thereof.


This is captured by two different dimensions of Audring’s metric, both pertaining to the domain of ‘target complexity’:

(6) a. Matching values < Mismatching values
    b. Targets match controller in value < Targets do not match controller in value
    (Audring 2017: 63–4)

In (6), the symbol ‘<’ [...]

[...] > inference in Jarawara (Arawan);


‘hear, feel’ > non-visual in Tariana (Arawakan)), nouns (e.g., ‘noise’ > reportive in Xamatauteri Yanomami (Yanomaman), possibly via noun incorporation), and other morphology (e.g., declarative–indicative marker > direct evidential in Shipibo-Konibo (Panoan); past tense markers > reportive/attested in Kamayurá (Tupi-Guaranian)). Whether or not contact is responsible for such innovations is often unclear; in some cases, such as Nanti (Michael 2008), emergent evidential systems do not appear to be directly contact-driven. However, Müller (2013: 227) observes the regional clustering of Amazonian languages exhibiting evidentiality, as for example in the Guaporé-Mamoré (Crevels & van der Voort 2008) and the Vaupés regions, and evidentiality does appear to be relatively prone to diffusion crosslinguistically (see, e.g., Aikhenvald 2004: 21). Surveys of Amazonian evidentiality (Aikhenvald & Dixon 1998; Aikhenvald 2004: 292; Müller 2013: 228) suggest multiple points of independent innovation, from which the phenomenon has likely diffused more widely. Probably the clearest examples of contact-driven elaboration of evidential systems come from the Vaupés, in which a number of unrelated languages have undergone the grammaticalization of native forms to ﬁll a regionally deﬁned set of categories; this is the case for Hup (see above), Tariana (Arawakan, see Aikhenvald 2002: 117–29), and Kakua (Kakua-Nukakan; Bolaños 2016), among other languages.

9.2.4 Valence-adjusting

Complex valence-adjusting systems have been noted in Amazonian languages, especially those of the western sub-Andean area (Wise 1990, 2002). Birchall (2014) found that more than 50% of the South American languages in his sample had morphological applicatives, and that these are concentrated in the west, where some languages show particularly elaborate inventories. Also relevant is Guillaume & Rose’s (2010) observation that a large number of Amazonian languages exhibit a dedicated ‘sociative causative’, which specifies that the causer participates in the action along with the causee, in addition to resources for expressing more neutral causation. They propose that the sociative causative may be an Amazonian areal feature in light of its apparent rarity elsewhere in the world, and observe a historical relationship between the sociative causative and applicative constructions. Elaborate valence-adjusting morphology is especially evident in the sub-Andean Arawakan languages, which stand out as having among ‘the most highly developed systems of morphologically distinct applicative operations on earth’ (T. Payne 1997: 190, cited in Wise 2002: 335; see also Wise 1990; Danielsen 2007; Valenzuela 2010). Such a system can be seen in Nomatsigenga:

(12) Nomatsigenga (Arawakan; Wise 1971, 2002)
a. -oko ‘with reference to’
   -bi / -birí ‘because, for, why, because of’
   -así ‘purposive [action done with some purpose in view], for’
   -pí ‘with respect to, in relation to’
   -an / -ant ‘instrumental’
   -ben / -bin ‘for, benefactive’
   -té ‘towards, against’
   -ak / -akag ‘comitative/sociative causative’
b. i-samë-ko-k-e-ro i-gisere
   3-sleep----3 3-comb
   ‘He went to sleep with reference to his comb.’ (e.g., he was making it and dropped it) (Wise 2002: 336)

As observed for the other grammatical domains discussed above, Amazonian valence-adjusting systems often display a highly porous boundary between morphology and syntax. In particular, valence-adjusting mechanisms in these languages are often transparently derived or difficult to distinguish from incorporation (of postpositions or nouns). In Paresi (Arawakan), for example, at least half a dozen different postpositions can be incorporated with valence-adjusting or argument-rearranging functions (Brandão 2014: 276). A particularly interesting case is the form kakoa, which Brandão (2014: 256–9) analyses as a reciprocal suffix when it occurs inside the verb word, and as a comitative postposition when it is juxtaposed to the right of a noun phrase. Both are fully productive, and both moreover can co-occur in reciprocal constructions, in which the comitative expresses one of the arguments involved in the reciprocal event:

(13) Paresi (Arawakan; Brandão 2014: 259)
wakoakare=kakoa Ø=aitsa-kakoa-ha minita hoka
Indian= 3=kill-- always
kazaihera-ty-oa-heta
be.invisible?---
‘They were always fighting with each other, with the Nambikwara, and he became invisible.’

Interestingly, the indeterminacy demonstrated by kakoa—which could be regarded as one morpheme with low selectivity or two morphemes, one syntactically and another morphotactically placed—does not appear to be due to recent grammaticalization in Paresi. Wise (1990) reconstructs the form *khakh ‘reciprocal’ to Proto-Arawakan, but notes that both reciprocal and comitative functions are widespread, and that reﬂexes of *khakh appear in both postpositional phrases and in verb phrases in languages representing diverse branches of the family.


While she suggests that the form originated as a postposition on noun phrases and entered the verb word via incorporation, she observes that a shift from reciprocal to comitative function also appears to have occurred in some languages. It seems likely that the indeterminacy exhibited by Paresi kakoa can be reconstructed to Proto-Arawakan itself. In verb-final languages, the subtlety of the distinction between incorporation and pre-verbal object placement can blur the syntax-morphology divide even further. In Hup, for example, the ‘interactional’ (reciprocal) verbal prefix ʔũh-, which originates in the incorporation of the noun ‘sibling’, can occur as a phonologically free element with an intervening object argument (see Epps 2008, 2010):

(14) Hup (Naduhupan; Epps 2008: 488)
hɨd ʔũ̌h nam nɔ́ʔ-ɔ́y
3  poison give-
‘They give poison to each other.’

The diachrony of valence-adjusting systems has in general not been widely explored, both within Amazonia and beyond (see Haspelmath & Müller-Bardey 2004). However, as with the other domains considered here, the elaborate inventories in sub-Andean Amazonia suggest an areal component. It is tempting to speculate that the complex systems of applicatives in these languages (many of which appear to originate in the incorporation of postpositions and other elements) might represent the intersection of the complex verb morphology and incorporating tendencies of western Amazonian languages with the prolific case-marking tendencies of Andean languages. Wise (2002: 341) also points out a number of similar applicative and causative forms in unrelated sub-Andean languages (e.g., Chayahuita (Cahuapanan) -të/-ta, Arabela (Zaparoan) -ta/-tia, and Yagua (Peba-Yaguan) -ta/-tya), and van der Voort (2005: 400) observes similar widespread forms in Guaporé-Mamoré languages (e.g., Kanoe (isolate) ta-/-to-, Kwaza (isolate) -ta-/-tia-, and Karo (Tupian) -ta-; see also Crevels & van der Voort 2008: 167). Although these forms are very short, at least some of these similarities may be due to direct borrowing. Otherwise, clear evidence for diffusion in the grammaticalization of valence-adjusting morphology comes once again from the detailed studies of contact in the Vaupés languages Hup (Epps 2007a, 2010) and Tariana (Aikhenvald 2002: 113–16).

9.2.5 Summary

The studies we have reviewed thus far suggest that Amazonian languages tend to display a high degree of morphological elaboration in particular grammatical domains, and that many of these prolific domains show evidence of restructuring and diffusion across unrelated languages in particular geographic regions. Moreover, in case after case, there is analytic indeterminacy between a morphological and a syntactic treatment of such elements. Often this indeterminacy can be linked to grammaticalization, either as an outcome of a relatively recent change from syntax to morphology or as a facilitator of developments in which innovative morphological forms are delinked from the constructions in which they originated, and extended to new morphosyntactic contexts. Notably, these processes appear to involve movements toward both tighter and looser bonding of morphological forms, rather than a more consistently one-way trajectory toward affixation, and in some cases freer and more bound instantiations of the same morpheme appear to co-exist in a relatively stable fashion.

The ubiquity of such cases in the Amazonian context means that they cannot be treated as categorically different from ‘normal’ cases. While comparable phenomena are mentioned in theories that advocate or presuppose morphological autonomy, they are considered in such discussions to be unusual and of marginal importance (e.g., Blevins 2006: 555). However, western Amazonian languages suggest that at least in some regions of the world they may be the norm rather than the exception, and that an index of the degree of autonomy from syntax should be incorporated into the study of the complexity of morphological systems. That said, the cases reviewed thus far provide only anecdotal evidence for this perspective, focusing on individual elements within particular languages. In what follows, we engage with the issue of morphological autonomy on a more global level, addressing the broader morphological profiles within a sample of languages.

9.3 Exponence complexity and morphological autonomy

This section takes up the relationship between exponence complexity (EC) and the morphology-syntax divide empirically in western Amazonian languages. We develop and demonstrate a methodology that provides more globally oriented metrics of morphological autonomy, focusing primarily on Anderson’s second category of morphological complexity, ‘exponence complexity’ (see section 9.1). EC is a key element of the distinction between morphology and syntax: advocates of morphological autonomy maintain that while complex deviations from biuniqueness (allomorphy, multiple exponence, morphomic structure, etc.) apply in the form-meaning mappings in morphology, these are rare or even absent at the syntactic level (Booij 1997, Anderson 2015a, Blevins 2016b; cf. Haspelmath 2011). Although the investigation is necessarily preliminary at this stage, we argue that the results lend support to the view that low morphological autonomy is a robust feature of languages in the western Amazon region.

Our approach can be summarized as follows. If EC is associated with morphology, and morphology is concerned with the structure of words as at least partially autonomous systems, then we should expect EC to correlate with other criterial properties of words or parts of words, such as bound status, prosodic dependence, and contiguity. We take the position that the strength and significance of these correlations can be used to assess the degree of discreteness between morphology and syntax in a given language. Below, we show that the correlations between EC and criterial wordhood properties are low and usually non-significant in the western Amazonian languages we sampled. This observation further supports our argument, presented anecdotally in section 9.2 above, that the languages of this region tend to display a low degree of morphological autonomy.

The following sections provide a description of the languages considered in this study (section 9.3.1), an overview of the properties of EC considered and a statistical summary of its realization across these languages (section 9.3.2), and a discussion of the correlations between EC and other wordhood criterial properties (section 9.3.3).
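The logic of this approach can be illustrated with a small numerical sketch. It assumes, purely for illustration, that each coded morpheme carries an EC score and a binary wordhood property (here boundness, coded 1 or 0), and that the association is measured with Pearson's r. The text above does not name the statistic used, so that choice is an assumption, and the data, variable names, and the `pearson_r` helper are all hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


# Invented toy data for six morphemes of one language (not from the study).
ec_scores = [1, 1, 2, 4, 1, 3]   # hypothetical EC scores
is_bound = [0, 1, 1, 1, 0, 1]    # hypothetical wordhood property: bound = 1

r = pearson_r(ec_scores, is_bound)
print(f"r = {r:.3f}")
```

On this reading, a language in which EC tracks wordhood properties closely would yield coefficients near 1, whereas low and non-significant coefficients would indicate a less discrete boundary between morphology and syntax. Significance testing is left out of the sketch.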

9.3.1 Languages considered

Our sample consists of eleven western Amazonian languages from nine language families (see Figure 9.1): Cavineña (Tacanan; Guillaume 2008), Chácobo (Panoan; Tallman 2018), Hup (Naduhupan; Epps 2008), Jarawara (Arawan; Dixon 2004), Kokama-Kokamilla (Tupi-Guaranian; Vallejos 2010), Kotiria (Tukanoan; Stenzel 2013b), Movima (isolate; Haude 2006), Paresi (Arawakan; Brandão 2014), Ashéninka Perené (Arawakan; Mihas 2015), Tariana (Arawakan; Aikhenvald 2003b), and Urarina (isolate; Olawsky 2006). The three Arawakan languages represent distinct branches of this family. The eleven languages are distributed widely across western Amazonia, although some (in particular Hup, Kotiria, and Tariana) are not geographically independent. We have focused on languages with descriptions that are detailed enough for us to code wordhood properties and properties of EC for a range of morphemes.

Figure 9.1. Western Amazonian languages sampled

The concept of morphological autonomy developed in this chapter is a relative one, which we quantify as an index that can vary from language to language. Accordingly, we need a baseline for assessing how this index ranks in comparative perspective. While this is a large-scale typological problem, we take a preliminary step by comparing the Amazonian languages in our sample to Central Alaskan Yup’ik (CAY; Eskimo-Aleut family). There are three reasons for choosing CAY as a point of comparison: (i) it is a well-described language with a relatively comprehensive grammar and an extensive literature on its morphological and syntactic structure; (ii) it is comparable to Amazonian languages in displaying a high degree of system complexity in its morphology (i.e., it is a polysynthetic language); and (iii) it diverges from Amazonian languages in that its morphological and syntactic structures have been described as easily distinguishable from one another on both syntagmatic and morphophonological grounds (Miyaoka 2012; Woodbury 2017). While morphology and syntax are of course interwoven in CAY (e.g., in incorporation), clear cases of indeterminacy in word segmentation do not appear to be as ubiquitous as they are in many Amazonian languages (see Miyaoka 2012: 18). Our hypothesis is that, in general, CAY will rank higher than the western Amazonian languages on metrics of morphological autonomy, reflecting the Amazonian areal tendency to make a fuzzier distinction between words and phrases.

For the eleven western Amazonian languages and CAY, we coded a total of 685 morphemes for morphological and wordhood properties in the four domains of grammar that have been discussed in this chapter: nominal classification, evidentiality, tense, and valence-adjusting (Table 9.4).⁵ Morphemes were identified on the basis of their function as grammatical elements associated with these domains (i.e., elements that do not function exclusively as members of a major word class). The variation in the number of elements per functional domain across the sample certainly reflects typological differences among the languages, and may also reflect differences in coverage across grammars regarding particular grammatical domains. The differences between languages sampled with respect to the total number of morphemes coded make the interpretation of the statistical significance of the correlations somewhat more tentative than it would be if they were more equal. Further details concerning the coding methodology and the metrics of morphological autonomy are provided in the following sections.

Table 9.4. Number of morphemes coded in this study by language and functional domain

Language          Valence  Tense  Evidentiality  Nominal Classification  Total
Perene                 16     14             14                      68     98
Tariana                34     29             14                      81    119
Jarawara                5     13              1                       0     19
Kotiria                 3      0             10                      21     34
Urarina                 6     15              3                       0     24
Movima                 32      8              2                     111    153
Hup                     9      5              5                      46     65
Cavineña               17      6              3                       0     26
Chácobo                38     20              4                       0     61
Paresi                 10     11              0                      11     32
Kokama-Kokamilla        6     19              4                       0     29
CAY                    13     10              2                       0     25

⁵ In general, non-linear and syncretic morphology was not evident in the data. Given that the relationship between morphology and syntax is treated in global fashion in the literature, we did not address possible variation in this regard among domains; however, this could be an interesting question to consider in future work.

9.3.2 Exponence complexity

The types of EC considered in this study are listed in (15). Each type of EC was coded as a binary or ordinal value for the morphemes coded in this study.

(15)
a. Number of allomorphs (ordinal: 1 to 5)
b. Suppletive allomorphy (binary: yes = 2, no = 0)
c. Multiple expression (binary: yes = 1, no = 0)

The EC score we develop in this study is simply the sum of these three scores. For instance, a morpheme that is realized by two allomorphs that are non-suppletive (i.e., related by productive morphophonological rules) and do not involve multiple expression will receive a score of 2; a morpheme that has two allomorphs that are related through suppletion and do not involve multiple expression will receive a score of 4. Below, we describe the process of measuring these variables, and provide a justification for the scoring techniques used in this study. We then present an overview of EC scores across the languages considered in this study.

• Number of allomorphs. This variable refers to a count of the segmental allomorphs associated with a given morpheme. The number of allomorphs and the presence/absence of suppletion (our second variable) together relate to Anderson’s complexity measure of allomorphy (see section 9.1 above). It should be noted that for this metric, we are simply concerned with counting the allomorphs, whether they are morphophonologically conditioned or suppletive (i.e., these are not distinguished here). We assume that a higher number of allomorphs translates into higher EC, all other factors remaining equal. The maximum number of allomorphs found in our data was five, but the vast majority of morphemes only have one allomorph. Table 9.5 presents the number of morphemes coded at each level for this variable. An example of a morpheme with at least four allomorphs is the CAY applicative ut~ul~us~uc (example (16)).

Table 9.5. Number of allomorphs per morpheme attested across the sample

Number of allomorphs    1     2     3    4    5
Number of morphemes     563   110   6    4    2

(16) CAY (Miyaoka 2012: 1132–3)
a. kis’-ut-aanga
   sink-APL-.3.1
   ‘It [e.g. anchor] sank with me.’
b. AngunP(=E)=llu kis’-ul-luku kica-mA=S
   man..sg.=and sink-APL-.3. anchor-.
   ‘The man sank along with the anchor’, i.e. the anchor sank along with the man (entangled).
c. An-us-gu mikelnguq!
   go.out-APL-.2.3 child..
   ‘You [] take the child out!’
d. unuaqu-uc-iiq-aaten
   be.tomorrow-APL--.3.2
   ‘It will be tomorrow before you (sg.) are done.’ (lit. It [the dawn] will come on you)

The opposite extreme can be seen in the Urarina causative, which displays no variation in phonological form: it is always realized as -a.

(17) Urarina (Isolate; Olawsky 2006: 459–60)
a. kanʉ komasaj ʉ-a-anʉ
   1 wife come-1-1/
   ‘I have brought my wife.’
b. tɕãe kanaanaj-ʉrʉ eno-a-e=lʉ
   also child- enter-1-3/=
   ‘He also made the children enter.’

• Suppletive allomorphy. Suppletive allomorphy is considered one of the most important defining properties for morphological status (cf. Haspelmath & Sims 2010, inter alia). An example of suppletive allomorphy can be seen in the tense-modal suffixes of Jarawara, the forms of which vary depending on the gender of the subject; for the immediate past non-eyewitness tense-modal suffix, the masculine form in (18a) is distinct from the feminine form in (18b). Because there is no identifiable phonological rule that accounts for the difference between the masculine and feminine forms and generalizes beyond this particular pair of tense-modal suffixes, cases such as these are coded as suppletive.

(18) Jarawara (Arawan; Dixon 2004: 206–7)
a. bahiS to-ke-hino
   sun() -in.motion-..:
   ‘The sun is (surprisingly to me) going away [i.e., setting]’
b. baniS mee wina-tee-hani
   animal() 3 live--..:
   ‘There were surprisingly many animals.’

The difficulty with suppletion in a study such as this one is that many (perhaps most) linguists have an intuition that suppletive allomorphy is qualitatively distinct from allomorphy based on productive morphophonological rules. However, in order to calculate a global EC score we need to change this qualitative intuition into a quantitative metric. To capture the fact that we view suppletive allomorphy as contributing much more strongly to EC, suppletion is coded as a binary variable, but one that is weighted relatively heavily (2 for morphemes that display suppletive allomorphy; and 0 for those that do not). Thus if a morpheme displays suppletion its EC score will automatically be at least 4 (number of allomorphs: 2 + presence of suppletion: 2).

• Multiple expression. Multiple exponence, or deviations from biuniqueness, is another measure of morphological complexity as defined by Anderson (2015a; see section 9.1). Here we focus on discontinuous realizations of form that correspond to a single unit of content, that is, infixes and circumfixes. We found no infixes in the languages considered in this study, and there were only a few other cases of multiple expression, such as the reflexive/reciprocal k(a)- . . . -ti in Cavineña:⁶

(19) Cavineña (Tacanan; Guillaume 2008: 271)
tudya=yatse ka-peta-ti-kware e=kwe e-jakwi=tsewe
then=1 -look.at-=. 1- 1-brother.in.law=
‘Then my brother-in-law and I looked at each other [wondering who of us would know how to milk a cow].’

Multiple exponence was coded as a binary variable: morphemes like the Cavineña reflexive/reciprocal would receive a score of 1 for expression in a discontinuous fashion, whereas a one-form-one-meaning correspondence would receive a 0.

• A metric for gauging EC. We calculated a global measure of EC for each morpheme by summing up the scores for the three EC criteria described above. Accordingly, a morpheme that is realized as one contiguous form with no allomorphy will receive an EC score of 1; typically morphemes that receive such a score are described as syntactic elements (e.g., function words) or as agglutinative morphemes. Higher EC scores are associated with forms that deviate from biuniqueness in some way. Our coding was carried out independently of the grammarian’s structural classification of the morpheme in question; for example, we include auxiliaries used as analytic causatives as well as morphological causatives. For this reason, it is unsurprising that a relatively high percentage of morphemes in even a highly polysynthetic language like CAY have a low EC score (56%); this outcome simply reflects the fact that elements Miyaoka (2012) regards as ostensibly syntactic (temporal frame adverbs, evidential clitics, etc.) were coded alongside those he treats as morphological elements. This strategy gets at precisely what we are aiming for: we are interested in how morphology and syntax may (or may not) be distinct in the languages in question, not just how morphemes that grammarians have categorized as morphological correlate with indices of morphological complexity.

⁶ Other examples involve the obligatory double-marking of a particular operation; for example, the Tariana passive requires the co-occurrence of the prefix ka- (which elsewhere functions independently as a ‘relative’ prefix) and the suffix -kana (Aikhenvald 2003b: 259). We did not consider other types of deviation from biuniqueness (besides allomorphy and multiple expression) because they were found to be very marginal in our data.

Individual morphemes score from 1 to 5 in EC across the languages in the sample.⁷ The percentages of morphemes associated with each EC value in the twelve languages considered are provided in Table 9.6. A visual representation of the distributions of EC values across the twelve languages is provided in Figure 9.2, which shows kernel distributions of EC value densities across the languages in the study.

Table 9.6. Percentage of morphemes for each EC value across the languages sampled (with average scores across all the morphemes for each language)

Language       Family         1     2       3      4     5      Average score
CAY            Eskimo-Aleut   56%   20%     4%     16%   4%     1.92
Cavineña       Takanan        85%   11.5%   4.5%   0%    0%     1.27
Chácobo        Panoan         92%   8%      0%     0%    0%     1.08
Hup            Naduhupan      94%   6%      0%     0%    0%     1.06
Jarawara       Arawán         26%   5%      16%    53%   0%     2.95
Kotiria        Tucanoan       97%   3%      0%     0%    0%     1.06
Kokama         Tupian         58%   42%     0%     0%    0%     1.41
Movima         isolate        57%   5%      11%    27%   0%     2.08
Paresi         Arawakan       91%   6%      0%     3%    0%     1.16
Ash. Perené    Arawakan       91%   5%      2%     2%    0%     1.15
Tariana        Arawakan       91%   6%      1.5%   0%    1.5%   1.16
Urarina        isolate        83%   17%     0%     0%    0%     1.17
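The average scores in Table 9.6 can be related to the percentage columns by a weighted mean. The sketch below assumes that the average is the percentage-weighted mean of the EC values; the function name `average_ec` is hypothetical, and because the published percentages are rounded, not every row can be expected to match to the second decimal.

```python
def average_ec(percent_by_score: dict[int, float]) -> float:
    """Percentage-weighted mean EC value for one language."""
    total = sum(percent_by_score.values())
    if total <= 0:
        raise ValueError("percentages must sum to a positive number")
    # Normalizing by the total tolerates rows whose rounded percentages
    # do not sum to exactly 100.
    return sum(score * pct for score, pct in percent_by_score.items()) / total


# Rows copied from Table 9.6
cay = {1: 56, 2: 20, 3: 4, 4: 16, 5: 4}
movima = {1: 57, 2: 5, 3: 11, 4: 27, 5: 0}

print(round(average_ec(cay), 2))     # compare with the tabled 1.92
print(round(average_ec(movima), 2))  # compare with the tabled 2.08
```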

⁷ As seen in (15) above, the EC score according to our metric could be higher for any given morpheme, but in our data set none go above 5.

Figure 9.2. Kernel distribution of densities across the languages of this study
(One panel per language: Asheninka Perene, Cavineña, Central Alaskan Yupik, Chácobo, Hup, Jarawara, Kokama-Kokamilla, Kotiria, Movima, Paresi, Tariana, Urarina. Horizontal axis: Exponence Complexity; vertical axis: Kernel Density Distribution.)

We note two points about the EC values across the languages of this study. First, it is generally true that CAY morphemes are more evenly distributed across the range of EC scores in comparison to the other languages—in other words, they are less likely to cluster at any particular EC value, most notably 1 (the lowest). This is to be expected based on current descriptions of CAY as highly morphophonologically complex, such that afﬁxal elements display a high degree of word internal adjustments (i.e., fusion; see, e.g., Fortescue 1992); a higher degree of allomorphy will produce higher EC values. Second, and in contrast to CAY, the western Amazonian languages sampled cluster predominantly around the lowest EC value (1)—in keeping with the observation that languages of this region tend to exhibit a highly agglutinative proﬁle. On the other hand, Movima, Jarawara, and to a certain extent Kokama-Kokamilla display higher EC levels—a point we return to below. Despite the generalizations made here, we emphasize that a higher EC score does not necessarily translate to a higher degree of morphological autonomy. Higher morphological autonomy is only corroborated if EC correlates with other criterial wordhood properties. In other words, morphological autonomy may be manifested by high EC scores, but high EC scores may not be limited to autonomous morphology.


9.3.3 Criterial wordhood properties and morphological autonomy

Criterial wordhood properties refer to features that identify constituents or formatives as independent words versus parts of words. Each morpheme coded in the database is coded with a value (either binary (0,1) or ordinal (0,1,2)) for each of the criterial wordhood properties. Since each morpheme also has an EC value associated with it, we can assess the correlation between EC level and wordhood values. We investigate the following three criterial wordhood properties:

(20)
a. Bound status (yes = 1, no = 0)
b. Prosodic dependence (0 = never, 1 = sometimes, 2 = always)
c. Contiguity (yes = 1, no = 0)

In what follows we provide a brief discussion of each of these criterial wordhood properties and how they were coded. We then turn to measurements of association between the wordhood properties and EC across the languages in our sample.

According to our conception of morphological autonomy as a typological index along which languages may vary, we propose that the morphological system of a language can be more or less autonomous. However, we do not feel that we are in a position to directly measure morphological autonomy, since it involves many interacting criteria that need to be weighed against one another in a principled way (although future research on this topic may make an overall global measure more appropriate, as suggested by Haspelmath 2011). For this reason, we simply provide statistical summaries of the correlations between EC levels and wordhood criteria in the languages considered here.

Because the variables are binary and/or ordinal and not normally distributed, we use rank statistics to assess the relationship between EC level and criterial wordhood value across the languages. We use Kendall’s tau adjusted for ties in the statistical analysis programme R (McLeod 2011).⁸ In contrast to other rank correlation statistics like Spearman’s rho, Kendall’s tau is ideal for comparisons that involve many ties and small sample sizes. The data we gathered naturally contain many ties because we are comparing variables that, at most, are quantified from zero to five across a large sample of morphemes. Furthermore, as can be seen from Table 9.1, we gathered fairly small samples of data, according to the morphemes and constructions described in the grammars. We are concerned here with effect size (i.e., correlation strength) as much as we are concerned with statistical significance. The tau statistic can be read as a measure of the degree of morphological autonomy that a relationship between EC and a criterial wordhood property affords. For a given association, a strong positive correlation (a tau coefficient that approaches 1) suggests a more robust distinction between morphology and syntax; a weak or negative correlation (a tau correlation close to and/or below 0) suggests a more porous boundary between the two. In this study we judge a correlation to be significant if the p-value is lower than 0.05.⁹

• Bound status. Bound status is a classic criterion for wordhood (Bloomfield 1933; Hockett 1958).¹⁰ Here we consider a morpheme bound if and only if it fails the minimum free form test (and is not a primary content item, i.e., a member of a major word class, such as a verb); otherwise it is considered free. A morpheme or construction passes the minimum free form test if it can stand alone as a single grammatical utterance. Crosslinguistically, bound status tends to be associated with morphological elements, while free forms are more syntactically relevant (Bloomfield 1933: 207). Despite a tendency toward lower EC, western Amazonian languages typically have a large repertoire of bound forms. Example (21) illustrates a verb complex from Chácobo: the only morpheme which can stand on its own is the verb root oʂa ‘sleep’; all other morphemes are bound.

(21) Chácobo (Panoan; Tallman 2018)
a. oʂa-mis=tɨkɨn=kas=ʔitá=kɨ=rɨ́
   sleep-===.=:=
   ‘What a shame that he only wanted to sleep yesterday.’
b. oʂa ‘asleep’
c. *-mis
d. *=tɨkɨn
e. *=ria
f. *=ʔitá
g. *=kɨ
h. *=rɨ

⁸ For an explanation of this methodology, including the concept of ties in rank statistics, see Kendall & Gibbons (1992) and Gibbons (1993). For an introduction to using Kendall’s tau in R, see Field et al. (2012: 225–6).

⁹ Of course, high p-values do not necessarily imply that there is no relationship between the EC score and a wordhood property (the sample sizes are too small to afford such an interpretation). We include the information regarding statistical significance for the reader who is interested in gauging how reliable our results are on this point.

¹⁰ A number of authors have pointed out problems with the minimum free form test (Haspelmath 2011; Bickel & Zuñiga 2017), in particular that it identifies compounds as phrasal elements and certain function words (determiners) as morphological elements. However, this test is not uniquely problematic among wordhood tests, as Haspelmath’s (2011) systematic review demonstrates. Furthermore, the test still provides useful information regarding morphological vs. syntactic status; for instance, Haspelmath (2011: 40) points out that if an element passes the minimum free form test this provides strong evidence that this element is not an affix. We see non-affixicality as an important criterion in calculating overall morphological autonomy.
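The tie-adjusted rank correlation described above can be illustrated with a small pure-Python sketch of Kendall's tau-b. This is not the R routine used in the study: the function name `kendall_tau_b` and the toy vectors are hypothetical, and the sketch returns only the coefficient, not a p-value.

```python
from math import nan, sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty))."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue              # tied on both variables: counted nowhere
            elif dx == 0:
                ties_x_only += 1      # tied on x only
            elif dy == 0:
                ties_y_only += 1      # tied on y only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x_only)
                 * (concordant + discordant + ties_y_only))
    if denom == 0:
        return nan                    # undefined when one variable is constant
    return (concordant - discordant) / denom


# Hypothetical toy coding for six morphemes of one language (not study data)
ec_levels = [1, 1, 1, 2, 2, 4]
bound_status = [0, 0, 1, 1, 1, 1]
print(kendall_tau_b(ec_levels, bound_status))
```

A coefficient near 1 on such vectors would mean that higher-EC morphemes tend to be the bound ones; significance testing would require an additional step not shown here.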

Table 9.7. Rank correlations between EC level and bound status values across languages

Language       tau correlation   p-value
CAY            0.543             0.004
Kokama         0.449             0.017
Jarawara       0.329             0.139
Paresi         0.246             0.166
Tariana        0.189             0.038
Hup            0.183             0.143
Ash. Perené    0.119             0.231
Chácobo        0.099             0.445
Cavineña       0.085             0.671
Urarina        0.037             0.858
Kotiria        0.342             0.050
Movima         −0.716            >0.005

We encode bound status as a binary variable. The morpheme in (21b) would receive a score of 0, and all of the other morphemes (21c–h) receive a score of 1. Table 9.7 provides the rank correlations across the languages of this study. The tau correlation can be interpreted as an indicator of effect size, that is, how strongly EC level is associated with bound status in the language. In CAY and Kokama-Kokamilla there are significant positive correlations, with CAY coming out on top. In Movima, however, there is a significant and negative correlation, a point we return to in section 9.3.4 below. Such measures of association are here considered to be metrics of morphological autonomy.

• Contiguity. This criterion refers to whether a given formative is required to occur directly adjacent to the morpheme it semantically combines with, or can be separated from it by a free element. A lower degree of contiguity is associated with a more syntactic status, while a higher degree of contiguity is associated with a more morphological status (e.g., Mugdan 1994; Dixon and Aikhenvald 2002). To illustrate the criterion of contiguity, we can make reference to the Chácobo verb complex in example (21) above. According to the minimal free form test the verb complex in this example is a single word-unit, but according to rules of contiguity it consists of at least five different units, each of which can be separated from its neighbours by a full noun phrase such as honi siri ‘old man’. Example (22) illustrates the possibility of inserting this noun phrase at any of the points (a–e). Only the antipassive -mis and the combination of the recent past and past tense declarative =ʔitá=kɨ require contiguity.

(22) Chácobo (Panoan; Tallman 2018)
a. (honi siri) oʂa-mis  b. =tɨkɨn  c. =kas  d. =ʔitá=kɨ  e. =rɨ́
   (man old) sleep- = = =.=: =
‘What a shame that the old man only wanted to sleep again yesterday.’

We coded the criterion of contiguity as a binary variable for a given morpheme. Morphemes that can be separated by a free phrasal construct from the element they associate with semantically receive 0 for contiguity (as in 21d–h). If the morphemes require contiguity they receive a 1, as with antipassive -mis (21c).¹¹ A language that displays a high degree of morphological autonomy is expected to show a strong and positive correlation between EC level and morphemic contiguity. Table 9.8 shows the rank correlations between EC level and contiguity across the languages of this study. CAY, Jarawara, and Ashéninka Perené show positive and significant correlations between EC level and contiguity, with CAY coming out on top. While Ashéninka Perené’s correlation is statistically significant, the effect size is substantially lower than for CAY. Thus on this EC contiguity metric only CAY and Jarawara provide evidence for morphological autonomy.

Table 9.8. Rank correlations between EC level and contiguity value across languages

Language       tau correlation   p-value
CAY            0.594             0.002
Jarawara       0.532             0.016
Cavineña       0.305             0.121
Ash. Perené    0.236             0.018
Urarina        0.205             0.325
Chácobo        0.178             0.168
Tariana        0.131             0.150
Movima         0.101             0.190
Kotiria        0.030             0.862
Paresi         0.053             0.739
Kokama         0.139             0.462
Hup            0.166             0.183

• Prosodic dependence. For a given formative or construction, prosodic word projection is prototypically associated with wordhood status. Incorporation into an adjacent prosodic word is prototypically associated with affix status (Spencer & Luís 2012).

¹¹ A reviewer suggests that contiguity might be better treated as a three-way variable, with intermediate status given to elements that require adjacency in some constructions but not in others. We concur that this could be a productive approach to explore, but for the purposes of this study it was found to be too difﬁcult to apply in a consistent way.


In some cases, the prosodic dependence of a given morpheme may vary depending on its syntagmatic context. For example, the Jarawara auxiliary na prosodically incorporates into the main verb, with which it forms a single phonological word (23a), but projects its own independent prosodic word when it combines with other affixes (23b).¹²

(23) Jarawara (Arawan; Dixon 2004: 30)
a. amó+na
   sleep+.
   ‘She sleeps.’
b. amo o-ná-habóne
   sleep --.
   ‘I’m going to sleep.’

A similar situation occurs with tense morphemes in Chácobo, but the syntagmatic contexts that license prosodic word projection or incorporation are different: in this language, a tense morpheme prosodically incorporates into an adjacent verb root (24a), but projects its own prosodic word when a subject NP intervenes ((24b), repeated from (7a–b) above).

(24) Chácobo (Panoan; Tallman 2018)
a. kako sani=ʔi (ka=ʔitá=kɨ)Pwd
   Caco fish= go=.=:
   ‘Caco went fishing [yesterday or two days prior]’
b. sani=ʔi (kaa)Pwd kako (=ʔitá=kɨ)Pwd
   fish= go Caco =.=:
   ‘Caco went fishing [yesterday or two days prior].’

Finally, some grammatical formatives may always project their own prosodic words, as exemplified by the Jarawara ‘aspect/time lexeme’ hibati ‘completed’ (example (25); Dixon 2004: 223); see also the Hup recent past marker páh in (6) above:

(25) Jarawara (Arawan; Dixon 2004: 223)
Barako owa heta na-re-ka
name() 1. lease.from -..:-:
hibati jaa
‘Branco did lease [the fishing waters] from me, but this arrangement is now finished.’

¹² Dixon uses the symbol ‘+’ to indicate what he refers to as ‘a grammatical word boundary within a phonological word’ (2004: 30).

Table 9.9. Rank correlations between EC level and prosodic dependence across languages

Language       tau correlation   p-value
CAY            0.543             *0.045
Kokama         0.491             0.083
Jarawara       0.426             *0.049
Ash. Perené    0.187             0.053
Tariana        0.155             0.076
Cavineña       0.151             0.443
Chácobo        0.129             0.307
Paresi         0.122             0.485
Urarina        0.107             0.595
Hup            0.018             0.884
Movima         0.089             0.236
Kotiria        0.274             0.111

Our scoring captures these three possible degrees of prosodic dependence. If a formative always projects a phonological word, it receives a score of 2; if it never projects a phonological word (i.e., it always phonologically incorporates), it receives a score of 0. Formatives that do both receive a score of 1, as in the Jarawara and Chácobo cases above.¹³ Table 9.9 provides the rank correlations for the languages considered in this study. CAY displays the strongest correlation for the relationship between prosodic dependence and EC. Only two languages, CAY and Jarawara, display a signiﬁcant and positive correlation.

9.3.4 Summary

By comparing measures of EC and wordhood status, we obtained a metric by which to gauge the relative degree of morphological autonomy across our sample of languages. The western Amazonian languages in our set show a relatively low degree of morphological autonomy, in contrast to our geographic and typological outlier, CAY, which scored much higher on all measures considered. Despite the fact that the types of EC considered here (allomorphy, culminativity) have been described as unproblematic measures of morphological complexity (e.g., Anderson 2015a), our study illustrates that their status as necessarily morphological cannot be assumed. This point is best exemplified by Movima, which demonstrates a relatively high level of EC in comparison to the other languages in the sample, but coupled with a lower overall tendency for morphemes to be dependent elements with respect to the wordhood criteria considered here, particularly bound status. Similarly, while Jarawara comes closest to CAY in displaying morphological autonomy via its relatively high correlations between EC and the wordhood measures of contiguity and prosodic dependence, its association between EC and bound status is weak and non-significant. The Movima and Jarawara cases demonstrate that deviations from biuniqueness are in principle orthogonal to the structural classification of form-meaning mappings as either morphological or syntactic.

¹³ One might argue that prosodic independence is more a fact about the phonological or prosodic component of grammar, rather than having anything to do with the morphology-syntax distinction. However, the relevance of this criterion is evident in the problem of clitics. As Spencer & Luís (2012) argue, the clitic can be understood as a ‘boundary category’, which calls into question the discreteness of the components that it straddles (Croft 1991, 2001). From this perspective, a language with a greater degree of isomorphism between phonological words and grammatical words would be understood as having a higher degree of morphological autonomy.

9.4 Conclusion

Our findings suggest that a relatively loose distinction between syntax and morphology is an areal feature of western Amazonian languages (perhaps extending into neighbouring regions). In this chapter, we have presented evidence for this view of Amazonian morphological profiles from two major angles. From the perspective of system complexity, we addressed morphological behaviour across four domains that show a tendency toward elaboration in western Amazonian languages (nominal classification, tense, evidentiality, and valence-adjustment), and for each explored the relationship between complexity and language contact and change. Turning our focus to EC, we systematically evaluated aspects of this domain against criteria associated with wordhood for a sample of eleven western Amazonian languages, plus CAY as a point of contrast. In addition to showing that the Amazonian languages all exhibit relatively low degrees of morphological autonomy, our findings highlight the important point that factors associated with morphological complexity are in fact not necessarily morphological: for two Amazonian languages in our sample, high EC does not correlate strongly with wordhood status. In future work, we hope to expand the typological scope of this survey, in order to establish the degree to which Amazonian languages might deviate from a more widely defined baseline relating to morphological autonomy, and to determine a more precise understanding of the geographic distribution of these patterns within and beyond South America.

The low degree of morphological autonomy in western Amazonia has important implications not only for our understanding of synchronic relationships among linguistic subsystems, but also for our conception of diachronic processes of contact and grammaticalization. As we have argued here, the porous nature of the morphology-syntax distinction in Amazonian languages is associated with other areal tendencies, such as productivity of compounding and incorporation, that facilitate grammaticalization by creating a context in which lexical elements are easily reanalysed as bound morphology. These processes in turn can feed the elaboration of grammatical domains, particularly under the pressure of areal diffusion. A fuzzy morphology-syntax distinction also allows for low selectivity on the part of grammaticalizing morphological elements, through which they may readily detach from the contexts in which they emerge and be extended to new ones. These processes result in outcomes that are typologically unusual in broader perspective; in particular, that morphologization might frequently involve a decrease in bound status, and that more and less bound instantiations of particular morphemes might be maintained over time, rather than representing only fleeting stages of a transition in progress.

In sum, a closer look at the morphological profiles of western Amazonian languages invites a revision of current views of morphological complexity and its relationship to processes of language contact and change. The Amazonian case underscores the recognition that large-scale regional patterns may play an important role in shaping our vision of what is canonical or ‘normal’ in language, and that a robust understanding of human language must take a range of diversity into account.

Acknowledgements

Epps gratefully acknowledges funding from the University of Texas at Austin, as well as earlier support from the National Science Foundation, Fulbright-Hays, and the Max Planck Institute for Evolutionary Anthropology for work on Hup; Tallman thanks the National Science Foundation and the Endangered Languages Documentation Programme for supporting his work on Chácobo. We are grateful to the editors of this volume for inviting us to contribute, and to Peter Arkadiev, Francesca di Garbo, Tony Woodbury, and an anonymous reviewer for their suggestions.


III

THE ACQUISITIONAL PERSPECTIVE


10 Radical analyticity as a diagnostic of adult acquisition

John H. McWhorter

10.1 Introduction

I propose a hypothesis (cf. McWhorter 2016, 2019): that when a language is radically analytic in comparison to its close relatives, this can be treated as an indication that in the past, the language was acquired by a critical mass of adults, rather than having always been passed down the generations intact. In previous work (McWhorter 2007) I have argued that when a language within a family is markedly more analytic than its sisters, it can be traced to extensive second-language acquisition (e.g., English, Persian, Mandarin, Malay). Here, however, my argument is more specific, extending this framework to whole families or even Sprachbunds of languages not just relatively analytic, but extremely so.

10.1.1 Definition of radical analyticity

By radical analyticity, I refer to absence (or all but absence) of inflectional marking indicated by affixation, tone, or vowel changes in quality or length. This must be clearly distinguished from relative analyticity, which linguists often refer to as ‘analyticity’ in a kind of shorthand, as when Nurse (2007) refers to the amply inflected Supyire (Gur, Niger-Congo) as ‘analytic’ in comparison to especially heavily inflected languages like those of Narrow Bantu.

My hypothesis distinguishes two kinds of language contact effects: transfer and structural simplification (although the two are hardly mutually exclusive). The role of transfer in language contact would seem self-evident and is richly studied. However, the role of simplification in language contact has been studied more in regard to pidgins and creoles than to less extremely simplified languages. Kusters (2003) and McWhorter (2007) were pioneering explorations of this intermediate range in a crosslinguistic sense, continued by the now seminal Trudgill (2011).

John H. McWhorter, Radical analyticity as a diagnostic of adult acquisition In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © John H. McWhorter. DOI: 10.1093/oso/9780198861287.003.0010

10.1.2 Radical analyticity worldwide

This presentation proposes that there are three main geographical clusters of radically analytic languages with extensive adult acquisition in their histories. The first is the few Niger-Congo languages that are radically analytic, such as the Gbe languages, Yoruba, and Nupe (henceforth GYN), which my hypothesis suggests would have arisen from an earlier Niger-Congo variety with ample inflection. Yoruba’s near lack of inflectional morphology of any kind is indicated here:

(1) Yoruba
    Mo mú ìwé wá fún ẹ.
    I take book come give you
    ‘I brought you a book.’ (Stahlke 1970: 63)

The second cluster is a few languages of Eastern Indonesia (Austronesian ones on the island of Flores and a few on Timor) and some non-Austronesian ones on the northern coast of the island of New Guinea (as documented by Paauw 2007). Within Austronesian, adult acquisition is considered relatively uncontroversial for various colloquial dialects of Malay/Indonesian (Grijns 1991; McWhorter 2007: 223–9), and for Tetun (Hull 1999: ix; Thomaz 2002). However, my proposal will explain why we can infer a history of adult acquisition even for languages of this region with no documented history, such as ones in central Flores like Rongga, whose characteristic analytic structure is shown here:

(2) Ema ja’o weli kebaya toro.
    father I buy dress red
    ‘My father bought a red dress.’ (Arka 2011: xviii)

or one of western Papua such as Abun:

(3) Men ben suk no nggwe yo, men ben suk sino.
    we do thing  garden then we do thing together
    ‘If we do things at the garden, then we do them together.’ (Berry & Berry 1999: 23)

Finally, the Sinitic languages can be seen as revealing, in their radical analyticity, adult acquisition in their past (cf. McWhorter 2016). The radical analyticity in language families neighbouring Sinitic, such as Hmong-Mien, Tai-Kadai, and Mon-Khmer, is often treated as an areal ‘Sinosphere’ feature. I suggest that within this language area, the radical analyticity, at least, traces to Sinitic. This reconstruction is especially compelling given that Mon-Khmer languages are most analytic where Chinese has had influence, and much less so where it has not, among the Munda languages to the west and Aslian languages to the south. Under this analysis, the question becomes how Sinitic itself reached a radically analytic state in the first place, and I argue that adult acquisition is the most plausible cause.

Under the analysis I will present, the GYN languages likely reached their state as the result of waves of second-language acquisition as an earlier Niger-Congo variety travelled southward towards the coast of the Bight of Benin. Various isolates, as well as the Mande and Ijo groups that Dimmendaal (2011) has argued not to be members of Niger-Congo, are likely remnants of the original language distribution in upper west Africa. The Flores languages were likely affected by invasions from Sulawesi (or possibly by contact with the aboriginal population of Homo floresiensis). Hull (1998) makes a strong case that the Timor languages were deeply impacted by an invasion from the island of Ambon, while Paauw (2007) suggested that contact with Austronesian as its speakers migrated eastward affected the languages in Papua. The reason for the analyticity (and in general the radically isolating structure) of Old Chinese remains unknown, although DeLancey (2011) and McWhorter (2016: 81–2) offer suggestions, under an analysis which, we must recall, posits the nature of Old Chinese as an indication of adult acquisition yet to be identified.

10.1.3 Application to this volume

In modern linguistics, many linguists are sceptical of the idea that the development of even radical analyticity necessarily entails a loss of overall morphological complexity. A guiding caveat is that what was once marked by an affix (or clitic) can later be marked by a free morpheme, or even a process on some other level of the grammar such as syntax (e.g., via word order). While this is true, any assumption that this kind of replacement is somehow regular or even obligatory in diachronic development is (i) logically unmotivated (i.e., for what reason or purpose would grammars ‘compensate’ in this way towards an unspecified sine qua non degree of structural complexity?); and (ii) empirically disproven (Shosted 2006 disproves the claim that languages compensate for loss of complexity in one module by gaining it in another).

Thus the development of radical analyticity is not a mere matter of a language transforming its typology in a fashion independent of complexity. Rather, the languages addressed in this chapter have lost, or all but lost, overt indication of case marking and concord in any module. They do not mark these with free morphemes. Moreover, while of course they have syntactic processes sensitive to the distinction between, for example, subject and object, these are not as obligatorified (in the terminology of Lehmann 1985) as affixal markers of these categories tend to be, often qualifying more as pragmaticized structures than as grammaticalized ones.

Similarly, noun class markers can indeed be replaced with free morphemes as in Wolof (Loporcaro, Chapter 6, this volume), and numeral classifiers in languages such as many in East and Southeast Asia can be seen as functionally equivalent to noun class marking (Grinevald & Seifart 2004). However, the free morphemes in question neither vary for case (much less according to declensional classes indicating this case variantly) nor vary in form between modifiers and heads as affixal noun class marking often does (Russian iz krasivyx ženščin ‘of the beautiful women’). Similarly, while radically analytic languages indicate inherent inflectional categories such as tense and number with free morphemes, these free morphemes do not occur in paradigmatic variants independent of semantics, in the vein of verb conjugational affix paradigms.

Furthermore, it would appear that affixation, complete with the morphophonemic processes it encourages as well as distortions into outright irregularity beyond, conditions much more irregularity (another facet of complexity) than free morphemes do. The ‘irregular verb’ is quite rare in, for example, Yoruba, Mandarin, and Rongga, where there are no affixal markers of inherent inflection likely to drift into morphophonemic subrules, thorough irregularity subject to no rule, and then utter suppletion.

Radical analyticity, that is, is less a change of type than an unravelling. Radically analytic languages remain vastly complex in countless ways, as all languages are. However, their radical analyticity does entail a significant degree of relative simplification.

10.2 Adult acquisition versus ‘drift’

That is, I propose that we should no more question whether Yoruba, Rongga, or Mandarin have extensive adult acquisition in their histories than we would question whether the difference between Haitian Creole French and French (loss of grammatical gender, verbal inflection, and much else) is due to extensive adult acquisition:

(4) a. French
       Ils n’ont pas de ressources qui puissent
       3. -have   resource.  can.3
       leur permettre de résister à la famine.
       3. allow of resist to . famine
    b. Haitian Creole
       Yo pa gen resous ki pou pèmèt yo reziste anba grangou.
       3  have resource  can allow 3 resist under famine
       ‘They didn’t have the resources that would allow them to hold off famine.’ (Ludwig et al. 2001: 164)

Indeed, specialists in second language acquisition and in language contact concur that for adult acquirers, the target language’s inflectional morphology is especially subject to elimination (cf. Pienemann 1998; Plag 2008), as the result of factors of phonological and semantic transparency, of the same kind that condition hierarchies of borrowability (cf. Thomason & Kaufman 1988; Matras 2009: 153–7). Current orthodoxy assumes, however, that adult acquisition is but one of two pathways via which a language might become radically analytic (cf. Thomason 2003: 242; Hyman 2004). That is, works such as Thomason (2003) and Hyman (2004) are typical in their assumption that radical analyticity can also occur grammar-internally as the result of the ‘drift’ process described by Sapir (1921), in which a language’s grammar-internal changes (or even those of a number of contiguous languages such as those of much of Europe) coalesce upon a certain general tendency, such as inflectional loss. The assumption is natural, given that the loss of significant (if not radical) amounts of inflectional affixation is well known from the difference between modern and Old English, between the modern Mainland Scandinavian languages and Old Norse, and the general ‘drift’ towards analyticity identified by Sapir (1921).

Various treatments, however, have demonstrated that the above cases and similar ones were, themselves, products of second language acquisition (cf. Kusters 2003, McWhorter 2007, Trudgill 2011 for general treatments; McWhorter 2002 on English; Trudgill 2011 on Scandinavian). There is currently such a volume of studies of this kind that it becomes appropriate to explore a certain theoretical economy in our theory of language diachrony and its relationship to language contact.

To wit: it is worthwhile to explore whether radical analyticity can emerge only via adult acquisition, and therefore could be useful as a window on the past of languages whose previous stages are otherwise lost to history. In sections 10.3, 10.4, and 10.5, I will present three aspects of radically analytic languages that suggest that they owe their state to second language acquisition rather than grammar-internal development. I will then address two prominent proposals suggesting that radical analyticity could emerge without second-language acquisition: (in section 10.6) Mufwene’s (2001) proposal that creole languages’ analyticity is due simply to the analyticity of their source languages; and (in section 10.7) Hyman’s (2004) proposal that Gbe, Yoruboid, and Nupe reached their state via the evolution of a monosyllabic phonological template.

I must specify: my claim is not that any degree of adult acquisition of a language must denude it of a radical amount of its inflectional affixation. Adult acquisition has occurred in various degrees to, probably, most languages, and has varying degrees of effect. My argument is that radical analyticity can be analysed as tracing to an extreme degree of adult acquisition: Trudgill (2011: 57), for example, suggests that the tipping point for stark inflectional loss begins when non-native learners constitute 50% or more of the speech community.

10.3 Argument No. 1: contextual versus inherent inflection

One indication of radical analyticity’s roots in adult acquisition rather than ‘drift’ is that in languages that have reached such a state, one type of inflection is eliminated entirely or virtually so, while another type is retained in the form of free morphemes. This is typical of adult acquisition, but not of grammar-internal change.

Booij (1993) distinguishes inherent inflection from contextual inflection. Inherent inflection contributes meaning, driven by the speaker’s choice of what they wish to communicate. It thus includes nominal number, tense, and aspect, and is not required for syntactic grammaticality. This contrasts with contextual inflection which, indicating features such as case and concord necessary to the syntactic composition of the sentence, has a syntactic function.

Crucially, in creoles, the lexifier language’s inherent inflection is typically preserved to a considerable extent in the form of free morphemes, such as preverbal tense and aspect particles (even when the substrate languages were synthetic, as was the case with many creoles; cf. section 10.6 below). However, contextual inflection is typically not replaced in this fashion (Plag 2008; Luís 2009). In this Haitian sentence, French’s past tense inflection is replaced by the free form te, but the nouns baay ‘thing’ and moun ‘people’ are not marked for grammatical gender as their French equivalents are, nor is grammatical gender marked on Haitian’s definite articles; also, pronouns such as li (here, ‘it’) are not marked for case:

(5) Yo te suvèye baay sa-a pu anpèche moun vole li.
    they  watch thing this- for prevent people steal it
    ‘They watched this thing in order to prevent people from stealing it.’ (Koopman & Lefebvre 1981: 203)

The facts are similar in pidgins, in which even as free morphemes, contextual inflection is rare while inherent inflection is frequent (Roberts & Bresnan 2008). As Plag (2008) notes, creoles’ retention of inherent rather than contextual inflection is predictable from the hierarchical pathway of second-language acquisition identified by Pienemann (1998), under which inherent morphology is more easily accessible to the learner than contextual, and thus always acquired first. In contrast, under ordinary grammar-internal change, contextual morphology is much less fragile. For example:

1. French has lost Latin’s case inflections on nouns (first collapsing the oblique cases into one and then losing even this distinction), but retains case distinctions in pronouns, and concord within NP.
2. Pashto has lost much of the inflection in early Iranian languages, but nevertheless retains ample case marking and concord.
3. Within Niger-Congo, while Wolof lacks the noun class prefix paradigm typical of Bantu and even many of its own relatives within the Atlantic subfamily, it has replaced them with postposed free morphemes (Torrence 2013: 16; cf. also Babou & Loporcaro 2016, Loporcaro, Chapter 6, this volume), as shown in Table 10.1.
4. Modern Armenian dialects retain Indo-European case marking as well as inflections distinguishing declensional classes; Albanian also retains case marking as well as grammatical gender. Adult acquisition is not assumed to have been significant in the timelines of either of these branches of Indo-European, as opposed to in Romance and Germanic.
5. Georgian has retained the contextual inflection of Proto-Kartvelian over several millennia.

These cases serve to illustrate, as Nichols (1992: 169) indicates, that ordinary grammar-internal change poses no threat to contextual inflection. The contrast is clear with the extent to which adult acquisition indeed does so. As such, the fact that radically analytic languages like the GYN ones and those of central Flores like Rongga retain free morphemes in the function of inherent inflection, but eschew contextual morphology completely, suggests that they have roots in non-native acquisition, under which learners had access to inherent morphology rather than contextual because inherent inflection is more like derivational morphology, that is, more ‘lexical’, and thus more salient to the non-native learner. This distinction is the one reflected in borrowing as described by Gardani (2008, 2012, 2018). Thus, a Fongbe sentence like the one in example (6) contrasts with a Swahili one such as (7) not only in encoding aspect with a free morpheme, but in lacking either bound or free noun class morphology:

(6) Fongbe
    Àvún ɔ́ nɔ hàn àɖú mὲ.
    dog   bite tooth person
    ‘The dog bites people.’ (Lefebvre & Brousseau 2002: 266)

Table 10.1. Wolof noun class markers

    xaj bi      the dog
    gaal gi     the boat
    ndap li     the pot
    wax ji      the talk
    jën wi      the fish
    ndaw si     the young woman
    saw mi      the urine
    nit ki      the person

(7) Swahili
    U-levi hu-ondoa akili.
    -drunkenness -remove .sense
    ‘Drunkenness takes away sense.’ (Perrott 1950: 56)

In the same way, Central Malayo-Polynesian languages typically have subject-marking concordial prefixes, as in Leti:

(8) Müani-ne püate ra-mtïètne.
    man-and. woman. 3-sit.
    ‘The man and the woman sit.’ (Van Engelenhoven 2004: 243) ( = indexical marker)

The languages in central Flores such as Rongga lack these prefixes, and case marking, but mark tense and aspect with free morphemes:

(9) Ata gagi ngai ngaja.
    person old  talk
    ‘The elders are talking.’ (Arka 2011: 56)

Because contextual morphology is usually discussed in reference to affixal languages, it may seem unremarkable that the Chinese languages have very little marking of case and grammatical relations. However, even a largely monosyllabic language like Akha (Sino-Tibetan) marks ergativity with free morphemes:

(10) ŋà nɛ àjɔq áŋ áshì thì shì biq ma.
     I  he  fruit one  give
     ‘I gave him one fruit.’ (Hansson 2003: 243)

Therefore, the radically analytic languages I discuss resemble creoles not simply in being analytic, but in also retaining a particular kind of morphology as free morphemes while eschewing the other kind. In this, these languages can be seen as harbouring evidence of adult acquisition.

10.4 Argument No. 2: analytic language as an unnatural state

Especially given how familiar it is to linguists that Modern English is so much more analytic than Old English, it may seem unexceptionable that, by chance, some languages might shed all of their inflectional affixation.

10.4.1 Grammaticalization is unceasing

I will present three observations which, together, suggest that a language can only lose all of its bound inflection via external intervention.

First, the emergence of new grammatical items via grammaticalization processes, as well as reanalysis, is a constant in the life cycle of a language. More to the point, there is no indication in the grammaticalization literature that the process only operates in a subset of languages, or that the process is given to halting for long periods. Grammaticalization can be taken as equivalent to the movement of bodies in physics: just as stasis under this formulation is irregular, we can assume that in language change, the cessation of grammaticalization indicates the death of the language. To wit, grammaticalization is unceasing.

Second, following from this point, there is no reason why, while a language was losing bound inflection, the development of new inflection via grammaticalization would not be occurring simultaneously. Put differently, diachronic theory knows no reason that there would be such a cessation. Moreover, empirical evidence demonstrates the opposite. In Romance, the erosion of Latin’s future marking suffixes was paralleled by the emergence of new ones from the grammaticalization of habere ‘to have’ (as well as a new conditional marking paradigm). Also, Italian developed new noun inflectional classes as original ones were lost (Gardani 2013). In the Kartvelian language Svan, declension marking suffixes proliferated amidst its loss of some of Common Kartvelian’s original concord machinery (Harris 2004: 152–5). In Swahili, past marking prefix li- grammaticalized from a locative verb as the Common Bantu equivalent a- (Nurse 2008: 257) wore away (McWhorter 1994: 62–3). Affixes and paradigms change function as often as they disappear (cf. Mukarovsky 1977: 32–5; Harris & Campbell 1995; Good 2012a).

Third, following in turn from the above point, languages do not ‘cycle’ through stages of radical analyticity followed by the development of new inflections which eventually wear away such that the cycle begins again. That linguists sometimes suppose so would seem to be due to a ‘folk’ interpretation of Hodge (1970) on Egyptian, which actually showed a phase of relative analyticity, nothing approaching radical. Meanwhile, no cycle through radical analyticity has been demonstrated elsewhere. As Dahl (2004: 261–88) notes, the absence of such a cycle has been explicitly noted in Afroasiatic, Uralic, and Altaic, and meanwhile specialists in language groups worldwide report no such cycles.

In sum, grammaticalization is analogous to crocodiles’ and fishes’ teeth, which are continually replaced throughout life. These animals do not ever reach a toothless stage. If one were encountered toothless, we would know that this was the result of an external disruption. We would neither venture that it was a normal development nor expect it to develop a mouthful of new teeth overnight.

With the three above observations, grammars such as Yoruba, Mandarin, and Rongga become puzzles. Adult acquisition is the only mechanism which has been empirically documented to shave away all or almost all of a language’s bound inflection. There is no documentation of radical analyticity’s emergence in East and Southeast Asia, Indonesia, or West Africa: in all cases, the languages were already radically analytic by the time they were committed to writing. I suggest that a solution to the puzzle that these other languages pose is that they, too, were born of adult acquisition.

10.4.2 Unstressed final syllables do not lead to the typology of Chinese

Two common conceptions must be addressed. First, is withdrawal of stress from final (or initial) syllables a possible reason for a language becoming radically analytic? Two answers beckon:

1. This account would neglect that bound affixation often includes vowel changes within the root. A great deal of English’s inflectional morphology, for example, is indicated with the root vowel changes in the past forms of verbs. Even if destressing the final syllable had denuded English of all inflectional suffixes, the vowel changes in the strong verb roots would have remained.
2. Lack of stress on the final syllable is not as regularly destructive of inflectional morphology as often supposed. Withdrawal of stress from the final syllable is common in Indo-European, and usually the result has been languages that have remained richly suffixed. Baltic and Slavic preserve a great deal of Proto-Indo-European nominal morphology, and yet, for example, West Slavic fixed its accent on the first syllable several centuries ago. Armenian has fixed the accent on the penult, and yet retains a rich declensional system and robust verbal inflection. A considerable degree of unaccented word-final inflection has survived in Icelandic. In Celtic, when the accent was retracted from endings, Goidelic (such as Irish and Scots Gaelic) retained much verbal inflection and a degree of nominal. We must also consider the Romance languages other than French, such as the Iberian languages and Italian, in which unstressed inflectional suffixes are prolific and robust.

10.4.3 Inflection is more quickly lost than gained

The second conception we must address is a possible misinterpretation. My claim that radical analyticity is an unnatural state for a language must not be taken to mean that it is incompatible with human cognition. In fact, many would reconstruct that language emerged uninflected (e.g., Comrie 1992). However, the development of grammatical affixes is a slow process. As Dahl (2018) notes, a mere few inflectional affixes are documented to have emerged in Europe over the past 2,000 years. Thus while a single instance of disruption, such as the in-migration of a large population of adult learners, can eliminate a language’s bound inflection (many creoles) or vastly reduce it (English) in one stroke, the nature of grammaticalization gives no reason to suppose that new affixes would emerge immediately. In fact, theoretically, this is what we would not expect.

Yet the radically analytic languages I have referred to do show signs of grammaticalization, albeit the forms are not yet bound ones. This, too, is what we would expect, and would find puzzling if absent. In Fongbe, an imperfective marker wὲ has emerged, likely from a postposition, which in the modern language could be treated as an inflection:

(11) Kɔkú ɖò àsɔ́n ɔ́ ɖù wὲ.
     Koku be.at crab  eat
     ‘Koku is eating the crab.’ (Lefebvre & Brousseau 2002: 96)

In Palu’e in Central Flores, a new first-person singular subject marking clitic has developed (Donohue 2009). In Mandarin, since the seventh century (Li & Thompson 1976), the marker bǎ has emerged from a verb meaning ‘take’:

(12) Nǐ bǎ jiǔ màn-màn-de hē.
     you  wine slowly drink
     ‘You drink the wine slowly.’ (Li & Thompson 1981: 464)

In a future stage of Mandarin this, as well as other items that cleave closely to roots such as nominalizer zi, could become bound morphemes. Also, in Mandarin, the modern usage of numeral classifiers began developing in the second century  (Norman 1988: 115–17), and diachrony has rendered them quite often semantically unpredictable. Zhī is used with animals (although only some of them) and birds, but is also used with eyes, hands, suitcases, and boats. Tiáo is most immediately identified with long, thin things; less likely to come to mind is that it is also used with proposal, voice, scheme, and ‘piece of news’. Bǎ is used with things that one holds such as knives and teapots, but also with chairs, and with the experience of aging (niánjì). As such, Gao (1998) notes that Mandarin speakers’ mental representation of classifiers is subdivided between three classes of association, one transparent, one prototypical (metaphorically extended in a synchronically processible fashion) and one arbitrary. This can be analysed as the emergence of grammatical gender, that is, contextual morphology. (Cf. Grinevald & Seifart 2004 on the likeness of noun class marking and grammatical gender.)

10.5 Argument No. 3: radical analyticity is rare

Because linguists tend to be familiar with analyticity from the textbook example of Chinese, as well as from any acquaintance with creole languages, it can seem that analyticity is as likely a state in a language as any other. However, this is not true, especially when it comes to radically analytic languages. Outside of creole languages, where we take it as uncontroversial that adult learning was the cause of the analyticity, radically analytic languages are actually rare. Donohue & Denham (forthcoming), in their survey of the World Atlas of Language Structures, find none outside of the areas I have cited.

If we treat Sinitic as about ten languages, Hmong-Mien as about twenty (a high estimate according to most accounts), Tai-Kadai as about a hundred according to Ethnologue, and count about 130 of the 168 Austroasiatic languages tabulated by Ethnologue, that is, subtracting Munda and Aslian (again, yielding a likely high tally), then in East and Southeast Asia there are about 260 radically analytic languages. Furthermore, the analyticity of these can be treated as tracing to the analyticity of Chinese alone (McWhorter 2016). In the meantime, outside of these languages, the tally of radically analytic languages in Africa, Flores, Timor, and the island of New Guinea is about three dozen at most.

How often the linguist encounters sentences of Mandarin, plus how familiar creole languages have become within the field, can distort our sense of the bigger picture. No radically analytic indigenous language appears ever to have been reported in:

1. North America, South America, or Australia
2. The four families indigenous to all of Africa other than a tiny pocket of languages in one of those families
3. Dravidian, Uralic, Altaic, the Caucasian families, Yeniseian, or any ‘Paleosiberian’ group
4. Indo-European.
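
The tallies in this section can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the original argument: it uses the rounded family counts quoted in the text, reads ‘about three dozen at most’ as 36, and takes the figure of 7,000 languages used in this section as the denominator. The variable and function names are hypothetical.

```python
# Rough family-level counts as quoted in the text (deliberately high estimates).
sinosphere_counts = {
    "Sinitic": 10,
    "Hmong-Mien": 20,
    "Tai-Kadai": 100,
    "Austroasiatic (excluding Munda and Aslian)": 130,
}

# Assumption: "about three dozen at most" is read as 36 languages
# (Africa, Flores, Timor, and the island of New Guinea).
ELSEWHERE_MAX = 36

# Assumption: the section's round figure for the world's non-creole languages.
WORLD_LANGUAGES = 7000

sinosphere_total = sum(sinosphere_counts.values())
all_radically_analytic = sinosphere_total + ELSEWHERE_MAX


def share(count, total=WORLD_LANGUAGES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(sinosphere_total)               # East and Southeast Asian tally
print(share(all_radically_analytic))  # percentage if the Sinosphere is counted language by language
print(share(ELSEWHERE_MAX))           # percentage if the Sinosphere counts as one areal spread
```

On these assumptions the East and Southeast Asian tally comes to 260, and radically analytic languages make up roughly 4 per cent of the world's languages, or about half of one per cent if Sinosphere analyticity is treated as a single areal development.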
A feature manifested in a mere few hundred of the world’s 7,000 older (as opposed to creole) languages qualifies not as an ordinary result (‘Language X simply lost its inflections’) but as an unusual circumstance. This is even more the case if the feature manifests itself in solely a few dozen of 7,000, the result if we count the analyticity of the Sinosphere as an areal feature spread from Chinese. It is clear that radical analyticity is not a state that a language reaches easily and, in fact, everything we know about how languages transform over time makes it difficult to see how such a state could occur in stepwise fashion.

However, the origin of radical analyticity in acquisition by adults is richly observed and thoroughly predictable. The scientific benefit of cordoning off creoles into this origin scenario while assuming an unspecified different one for other radically analytic languages is unclear. This bifurcated approach would be appropriate if there were evidence that large-scale acquisition of a language by adults was impossible before the emergence of the transatlantic slave trade in the fifteenth century. Obviously, however, there is not. Rather, we could treat creoles as revealing to us how other languages reached a state which, according to observable processes of stepwise grammar-internal evolution, is a mystery.

In short, the common idea that a given language simply ‘lost its inflection’ is less coherent than it seems. Lack of stress on final syllables vastly undershoots what would be necessary for a language to reach a radically analytic state, and languages are not empirically recorded to undergo such a process short of extensive acquisition by adults. I will finally discuss two counterproposals to my reasoning.

10.6 On claims dissociating creolization from ossified acquisitional capacity

Some creole specialists have attempted a dissociation between even creolization and the effects of adult acquisition. Mufwene (2001), Aboh & Ansaldo (2007), and Aboh (2015) propose a theoretical economy of a different kind: that creole genesis is simply a matter of language mixture, with simplification playing no more significant a part in creoles’ birth than in how languages change elsewhere worldwide. Mufwene, for example, proposes (2001: 80–105) that there is no qualitative distinction between the emergences of standard English, African-American Vernacular English, and Gullah Creole English: all were the result of the mixture of features within the ‘ecology’ of the linguistic contexts in which they emerged, analogously to the mechanisms of population genetics.

The idea that the association of creoles with pidginization has been a mistake has become familiar among linguists, to the point that I must spell out that my assumptions will not incorporate this proposal, often termed the ‘Feature Pool’ hypothesis. This hypothesis is motivated partly by a claim that while creoles’ analyticity (such as that of Sranan Creole English or Haitian Creole French) may seem to contrast with European languages’ morphology, in actuality English is only moderately inflected, spoken French is much less inflected than its written version suggests, and meanwhile the substrate languages of many creoles are the radically analytic ones mentioned above, such as Gbe and Yoruba.

The implication is that these creoles are analytic simply because their source languages were: as Mufwene specifies (2009: 386), ‘The extent of morphological complexity (in terms of range of distinctions) retained by a “contact language” largely reflects the morphological structures of the target language and the particular languages that it came in contact with’.

However, an equal number of creoles are based on robustly inflected Iberian languages, and/or have robustly inflected substrate languages such as Bantu, West Atlantic, Nilo-Saharan, and even Austronesian languages, and yet are as analytic as Sranan and Haitian. Linguists supporting the Feature Pool hypothesis have yet to respond to such observations. One such observation is that Palenquero Creole Spanish was created by Kikongo speakers, such that both of the languages in the ‘pool’ were heavily inflected:

(13) Kikongo (Bentley 1887: 526) (8 = noun class 8 plural)
     O ma-tadi ma-ma ma-mpembe ma-mpwena
      8-stone 8- 8-white 8-big
     i ma-u ma-ma tw-a-mw-ene.
      8-that 8- we-them-see-

(14) Spanish
     Est-a-s piedr-a-s grande-s y blanc-a-s
     -- stone-- big- and white--
     son las que hemos visto.
     .3 ..  have.1 see..
     ‘These great white stones are those which we have seen.’

Yet Palenquero is a highly analytic language. The facts are similar with all of the Portuguese-based creoles, as well as Nubi Creole Arabic and the Aboriginal English-based creoles of Australia. Chinook Jargon creolized as well, and despite its source languages all being richly inflected, the creole version was as analytic as Sranan and Haitian (Grant 1996). Adherents of the Feature Pool hypothesis have not responded to such observations, and it is difficult to see how their framework could accommodate them.

In this presentation, therefore, I maintain on the basis of the argumentation I have presented that adult acquisition does play a decisive and diagnostic role in creole genesis. My aim is to extend this analysis to languages other than creoles.

10.7 On a phonological pathway to radical analyticity Hyman (2004) proposes a grammar-internal diachronic pathway to radical analyticity. He reconstructs that what caused the difference between verbs in the GYN

languages, usually monosyllabic or at most bisyllabic, and the heavily afﬁxed ones in Narrow Bantu languages was the development of a phonological template disallowing verbs of more than two syllables. I suggest that extensive adult acquisition is a preferable explanation for both the GYN verb’s lack of inﬂection and its phonotactics. For one, the process Hyman describes has been proposed, to my knowledge, nowhere else. Hyman’s account, in that light, is more descriptive than explanatory. That is: the literature on language change does not record it as a crosslinguistic commonplace that languages permitting richly multisyllabic words gradually take on a phonological ‘template’ limiting words to one or two syllables, with this treated as an ordinary phonological development alongside processes such as nasalization or resyllabiﬁcation. I submit that an adult acquisition account has more explanatory power. Second, the templatic account contravenes the tendency for languages to resist letting phonological processes eliminate grammatical morphemes. Hyman’s account requires that speakers of a language ‘drifted’ into a disyllabic or monosyllabic restriction even on the pain of eliminating grammatically crucial afﬁxes, replacing them with free morphemes—despite linguists’ well-known ﬁndings that speakers resist phonological erosion when it threatens grammatical morphemes (cf. Guy 1991; Carstairs-McCarthy 2010). Counterproposals to some reported cases of this morphologically conditioned sound change (Hill 2014) have not disproven the tendency itself. Third, pidginization, speciﬁcally, explains the GYN situation as well as a templatic explanation, and even better, in proceeding from an empirically observed phenomenon. 
To wit, the reason words might become radically, as opposed to modestly, shorter in a language, to such a degree as to force a vast restructuring of the grammatical system, is the language’s transformation by nonnative acquirers who are less likely to master lengthier words (as well as grammatical features). To the extent that the GYN languages restrict their verbs to a maximum of two syllables, it is relevant that, as pidgin specialist Mühlhäusler (1997: 140) puts it, ‘There appears to be a tendency in most stable Pidgins, whatever their sub- and superstrata languages and whatever their jargon predecessors, to favour open syllables and words of the canonical shape CVCV.’

10.8 Conclusion My goal has been to demonstrate the arguments for, and advantages of, assuming that radical analyticity traces solely to extensive adult acquisition. Under this analysis, radical analyticity sparks a search for sociohistorical factors that would entail such adult acquisition. The processes in question occurred before written history (otherwise, they would long have been readily apparent) and therefore the

investigation of the relevant sociohistorical factors for the various clusters of radically analytic languages is still in progress (McWhorter 2016, in preparation). The advantage to my hypothesis is theoretical economy: rather than positing two pathways to radical analyticity—one of them mechanically incommensurate with what is known of how languages change—we could posit a single one. As a result, radical analyticity could be treated as a clue to social history otherwise difﬁcult to reconstruct or even unrecoverable. We assume that the featherless bird has been plucked, not that it has lost its feathers by chance. We might approach the language devoid of bound inﬂection similarly, to the beneﬁt of our models of diachronic change and language contact.

11 Different trajectories of morphological overspecification and irregularity under imperfect language learning

Aleksandrs Berdicevskis and Arturs Semenuks

11.1 Introduction

11.1.1 Why study complexity?

In the Introduction, Arkadiev & Gardani (Chapter 1, this volume: 5) list the four most important open questions in the study of morphological complexity. In our view, the first three questions become important and interesting only as a means to answer the fourth question, which could be reworded as ‘How is morphological complexity related to socioecological factors?’. The true value of this question is not even that it relates morphology and extralinguistic characteristics of the environment in which the language is spoken, but that it makes complexity more than a mere parameter of crosslinguistic variation. Complexity becomes a parameter involved in explanatory theories, giving us the possibility to use it in order to understand how language is structured. As was discussed in the Introduction, in these theories complexity is a dependent variable, while socioecological parameters are predictors. This means that if the theories are correct, we can better understand why linguistic structures are distributed across languages the way they are, how the processes of language change and social interaction are structured and work together, and how language is organized and functions in the brain. If not for this explanatory attempt, the first three questions from Arkadiev and Gardani’s list (Can we define morphological complexity? Can we find an understanding of morphological complexity which would be applicable to all languages and quantify this understanding? Can we compare and typologize languages in terms of morphological complexity?) would, in our view, be better described as brain teasers rather than research avenues. Brain teasers are not at all useless, but given how notoriously difficult it is to address these particular questions, it would hardly be possible to expect that the potential benefit of finding answers would outweigh the required effort.
Arkadiev and Gardani provide examples which suggest that Lithuanian nominal inflection is more morphologically complex than Turkish. Does this claim per se yield any new information compared to what can be learned from descriptive grammars of these two languages? The fourth question, however, changes everything. If we can explain which factors likely have contributed to Lithuanian being more complex than Turkish, then the game is worth the candle. That gives us the incentive to ponder about what complexity is and to search for the means of operationalizing it. Our contribution to this volume should be read with this value system in mind.

Aleksandrs Berdicevskis and Arturs Semenuks, Different trajectories of morphological overspecification and irregularity under imperfect language learning. In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Aleksandrs Berdicevskis and Arturs Semenuks. DOI: 10.1093/oso/9780198861287.003.0011

11.1.2 What is complexity? Having this incentive to deal with all the questions from Arkadiev and Gardani’s list, let us brieﬂy outline what we mean by complexity in this chapter. As most would agree, complexity is a multi-faceted phenomenon, and a language can be complex in several different ways. This volume contains a variety of perspectives on and approaches to complexity, see Dahl (Chapter 13, this volume) for an overview. Trying to tackle all aspects of it simultaneously, however, is likely to hinder progress rather than aid it. In order to usefully limit the scope of this particular investigation, we will concentrate on two of the facets of complexity that are, in our view, most crucial: overspeciﬁcation and irregularity. We deﬁne overspeciﬁcation as overt and obligatory marking of a semantic distinction that is not necessary for communication, following McWhorter’s (2007: 21–8) understanding. The problem with this deﬁnition is that it is not at all obvious what is necessary for communication. McWhorter makes inferences about what is necessary by comparing the grammars of different languages. If many of the world’s languages have neither subject-verb agreement nor any apparent means to compensate for the lack of it, it seems reasonable to hypothesize that this feature is redundant and that languages that do possess it have overspeciﬁed grammars. A more direct way to ﬁnd out what is necessary would be to run psycholinguistic experiments. MacWhinney et al. (1984), for instance, ﬁnd that Italian speakers do use the subject-verb agreement markers when establishing semantic roles in a sentence. Note that this ﬁnding does not necessarily contradict the claim that agreement is an instance of overspeciﬁcation. That a feature is useful does not mean it is necessary. Fortunately, in this chapter we will be dealing with an artiﬁcial language where it is obvious what is overspeciﬁcation and what is not (see section 11.2). 
Another facet of complexity we will discuss is irregularity (McWhorter 2007: 33–5). A linguistic system is irregular to the degree that it cannot be described by exceptionless deterministic rules. A regular system, by contrast, can be described as predictable and consistent. Intuitively, it is usually quite obvious whether a linguistic

system is regular or not. Systems, however, can be irregular to very different degrees. While in theory it is clear that the fewer rules are required to describe a system, the simpler these rules are and the fewer exceptions they have, the more regular the system is, in practice it is usually difﬁcult to rank several irregular systems in order of their (ir)regularity, and even more difﬁcult to quantify it. Again, for the artiﬁcial language in this chapter, this task is simpler than it would be for real languages. Other facets of complexity exist, but some of them are reducible either to overspeciﬁcation or irregularity, while others are, in our opinion, less ubiquitous and salient. Importantly, overspeciﬁcation and irregularity are not reducible to each other. It is easy to imagine a system which has little or no overspeciﬁcation but is irregular, and it is equally easy to imagine a highly overspeciﬁed but fully regular system (though these are not that frequent in real languages). This understanding of complexity, however limited and simpliﬁed it is, enables us to test speciﬁc hypotheses about the typology and diachrony of morphological complexity.

11.1.3 How to study complexity? Various hypotheses have been proposed to explain the distribution of morphological complexity among the languages of the world. The ones that arguably have the strongest empirical support and have the most lively discussions in the literature are those that suggest the existence of a causal link between a large proportion of non-native speakers in the population and morphological simpliﬁcation (Dahl 2004; Wray & Grace 2007; McWhorter 2007 and Chapter 10, this volume; Trudgill 2011; Dale & Lupyan 2012). The evidence in favour of this hypothesis comes mostly from typological surveys, though rigorous quantitative studies (e.g., Parkvall 2008; Szmrecsanyi & Kortmann 2009; Bentz & Winter 2013; Bentz et al. 2015) are a minority among them. Correlational studies of this kind are necessary, but not sufﬁcient (Tily & Jaeger 2011; Nettle 2012), as other types of evidence are required to demonstrate and explain the causality (Ladd et al. 2015; Roberts 2018). Experimental approaches, in particular iterated artiﬁcial language learning (IALL) (Kirby et al. 2008), can be an efﬁcient means to model the simpliﬁcation and complexiﬁcation processes. In a typical IALL setting, a constructed mini-language is learned by a participant within a limited amount of time, then this participant’s linguistic output is used as linguistic input (i.e., training data) for the next participant, and then the iteration is repeated. If the output of the participants in generation n differs from their input, then the participants in generation n+1 will learn a changed version of the language. This design enables us to observe language evolution in miniature, as the language changes, being transmitted over ‘generations’.
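The transmission logic of an IALL chain can be sketched in a few lines of Python. This is only an illustration of the iteration described above, not the procedure used in any cited experiment: the `noisy_learner` function is a hypothetical stand-in for a human participant (it occasionally drops the last character of a sentence), and the names `run_chain` and `error_rate` are our own assumptions.

```python
import random
from typing import Dict, List

Language = Dict[str, str]  # meaning label -> sentence


def noisy_learner(input_language: Language, error_rate: float,
                  rng: random.Random) -> Language:
    """Hypothetical learner: reproduces each sentence of the input,
    but with probability `error_rate` drops its last character."""
    output = {}
    for meaning, sentence in input_language.items():
        if sentence and rng.random() < error_rate:
            sentence = sentence[:-1].rstrip()
        output[meaning] = sentence
    return output


def run_chain(initial: Language, n_generations: int,
              error_rate: float, seed: int = 0) -> List[Language]:
    """Pass a language down a chain: the output of generation n
    becomes the training input of generation n + 1."""
    rng = random.Random(seed)
    history = [initial]  # index 0 holds the pre-generated input language
    for _ in range(n_generations):
        history.append(noisy_learner(history[-1], error_rate, rng))
    return history
```

With `error_rate=0.0` every generation reproduces its input exactly and the language never changes; any nonzero value lets small deviations accumulate over generations, which is the property that makes the design useful for observing change.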

The IALL approach does have its limitations. The possibility to observe language change in the laboratory and to have the full control over the environment comes at the price of naturalness. The artiﬁcial languages are by necessity small and relatively simple, and the learning usually takes less than one hour. Nonetheless, while the experimental results should be treated with due caution, they can be a valuable complement to the typological surveys. Suppose a typological study shows a correlation between a proportion of nonnative speakers and absence of inﬂectional morphology, and suppose its data and methods are completely reliable and trustworthy. Even in this best-case scenario, we still do not know whether there really exists a causal link between non-native acquisition and simpliﬁcation (though we have good reasons to hypothesize that). Moreover, we do not get an insight into how exactly adult acquisition facilitates simpliﬁcation (if it does). An iterated learning experiment can serve as a means both to test the presence of the causal link and to identify a potential causal mechanism.

11.1.4 Why does complexity decrease? Bentz & Winter (2013: 3–4) list three potential mechanisms of contact-induced case loss (which can be generalized to other instances of morphological simpliﬁcation): imperfect acquisition by adult learners; the tendency of native speakers to reduce morphosyntactic complexity of their speech when talking to foreigners; the tendency of loan words to combine with more productive inﬂections, forcing the least productive ones out (Barðdal & Kulikov 2009). The ﬁrst mechanism from this list seems to be mainstream in the typological, sociolinguistic, and evolutionary literature (Nettle 2012). Indeed, in the literature on language acquisition, there is a consensus that morphology is hard for non-native learners, and that concerns both production and perception, both tutored and untutored learners (DeKeyser 2005: 6–7). The main factor causing simpliﬁcation then is presumed to consist in the differences between native (child) and non-native (adult) language acquisition. However, given this, another question arises: what aspects of these differences and what conditions are necessary to cause simpliﬁcation? How deep into these differences do we have to delve in order to ﬁnd a proper explanation? It is possible that deep differences in cognitive biases between children and adults have to be invoked, together with nuanced properties of social network structure or other cognitive processes besides learning. However, it is also possible that the answer lies on the surface: children can (usually) master a language perfectly, while adults (usually) cannot (Bley-Vroman 1989: 43–4), and that by itself is enough to provoke simpliﬁcation processes. It seems safe to claim that imperfect learning is one of the driving forces behind

simplification. Can we go further and assume it is the only driving force? While this hypothesis may be too simplistic, it is reasonable to start the search for explanations and mechanisms by testing it. In this chapter, we analyse the data from Berdicevskis & Semenuks (submitted), one of the largest-scale (in terms of the number and the length of transmission chains) IALL experiments so far that directly address linguistic complexity. In Berdicevskis & Semenuks (submitted) we showed that imperfect language learning by itself reduces overspecification. Here we focus on irregularity (see section 11.1.2) and show that it behaves differently from overspecification. We also investigate how the two facets of complexity interact with learnability of the language. In section 11.2, we summarize the methodology of Berdicevskis & Semenuks (submitted). In section 11.3 we describe the trajectory of overspecification, and in section 11.4, that of irregularity. In section 11.5, we draw on the existing knowledge about language acquisition to explain the observed differences. In section 11.6, we conclude.

11.2 Materials and methods

In order to investigate whether imperfect learning could lead to higher rates of morphological overspecification loss, we designed and ran an IALL experiment. As mentioned in section 11.1.3, the approach provides the opportunity to model language change in a controlled experimental setting. Each transmission chain contained 10 generations, and each generation consisted of a single participant. After the initial instructions, in the training stage of the experiment the participants learned an artificial language, that is, learned to match 16 ‘sentences’ to 16 stimulus pictures. After that, in the testing stage the participants first matched sentences with their appropriate pictures and then produced sentences that they considered to correspond to each of the individual pictures. The set of all of the sentences that they produced in the last part of the experiment was used as the learning input language for the next generation.

The initial artificial languages that we generated as input for all of the generation 1 participants contained a redundant agreement marker that was not necessary in order to identify which picture corresponded to each sentence. In order to investigate whether imperfect learning could lead to the loss of morphological overspecification (in our case, the semantically redundant agreement marker), the amount of time given to the participants to learn the language was manipulated between three different types of transmission chains. In the normal condition, participants in all generations received an amount of time that pilot experiments suggested to be sufficient to fully learn the language; in the temporarily interrupted condition, the participants in generations 2–4 received less time to learn the languages; and in the permanently interrupted condition, all participants after the first generation received less time. A more detailed description is given in section 11.2.2.

Before that, however, we want to note the obvious fact that the IALL approach lacks ecological validity due to a variety of both quantitative and qualitative differences between language learning in an experimental setting and in the real world. Because of that, the claims that one makes based only on IALL experiments need to be tempered. Taken as a piece of a larger picture, however, they provide important supporting evidence and new perspectives on the questions of interest. In the context of the current study, in particular, although we ultimately are interested in differences between native and non-native acquisition, we are not contrasting adult and child learners in our experiment. However, since we are interested in whether the difference between normal and imperfect learning by itself can be a sufficient cause for morphological simplification, we consider our model to possess the necessary external validity.

11.2.1 Artificial language structure

Each of the sentences in the languages learned by the participants identified a picture. We will refer to the set of all pictures as the languages’ meaning space (see Figure 11.1). The meaning space had three dimensions, that is, three characteristics that each of the sixteen pictures could be uniquely identified by: the agent performing the action (round animal or square animal), the number of agents (one or many) and the action being performed (no action, falling apart, growing antlers or flying). The structure of the initial input languages (we will refer to them as generation 0 languages) is represented in Figure 11.1.¹ The sentences in the languages transparently mapped onto the meaning space: the noun stem identified the agent, the plural marker (or its absence) identified the number of agents, and the verb stem (or its absence) identified the action. Importantly, the agreement marker is semantically redundant, in the sense that its omission would not affect the identification of the correct picture in the meaning space: the picture is uniquely specified by the other three morphemes. Thus, in the generation 0 languages the agreement system is an instance of morphological overspecification. See Di Garbo (Chapter 8, this volume) for a detailed study of the changes of gender-agreement systems in a sample of real-world languages in relation to complexity.

¹ We used ﬁfteen different isomorphic languages, as is common in IALL experiments. When reporting results, however, we orthographically map all the languages we have onto the example language in Figure 11.1: the ﬁrst letter of the word for the round animal in the chain’s generation 0 language becomes s, the second letter becomes e, and so on. This procedure makes the comparisons between languages easier while preserving all the information about the changes.
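The structure of the example generation 0 language can be made concrete with a short Python sketch. The morphemes (seg, fuv, -l, m, r, b, -o, -i) are taken from Figure 11.1; the function and variable names are our own illustrative choices, and `strip_agreement` is a hypothetical helper added only to check the redundancy claim.

```python
from itertools import product

# Morphemes of the example generation 0 language (Figure 11.1).
AGENTS = {"round animal": ("seg", "o"),   # noun stem, agreement marker
          "square animal": ("fuv", "i")}
NUMBERS = {"singular": "", "plural": "l"}  # plural marker or its absence
EVENTS = {"none": "", "fall apart": "m", "grow antlers": "r", "fly": "b"}


def sentence(agent: str, number: str, event: str) -> str:
    """Build the sentence for one point of the meaning space."""
    noun_stem, agreement = AGENTS[agent]
    noun = noun_stem + NUMBERS[number]
    verb_stem = EVENTS[event]
    if not verb_stem:            # no action: the sentence is the bare noun
        return noun
    return f"{noun} {verb_stem}{agreement}"


def generation0() -> dict:
    """Map every (agent, number, event) triple to its sentence."""
    return {(a, n, e): sentence(a, n, e)
            for a, n, e in product(AGENTS, NUMBERS, EVENTS)}


def strip_agreement(language: dict) -> dict:
    """Hypothetical helper: delete the final agreement vowel from every
    sentence that contains a verb."""
    return {m: (s[:-1] if " " in s else s) for m, s in language.items()}
```

The language has 2 × 2 × 4 = 16 sentences, and `len(set(strip_agreement(generation0()).values()))` is still 16: removing the agreement marker leaves every picture uniquely identified, which is what makes the marker an instance of overspecification.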

Event         Number    Agent: round animal   Agent: square animal
none          singular  seg_N                 fuv_N
none          plural    seg_N-l_PL            fuv_N-l_PL
fall apart    singular  seg_N m_V-o_AGR       fuv_N m_V-i_AGR
fall apart    plural    seg_N-l_PL m_V-o_AGR  fuv_N-l_PL m_V-i_AGR
grow antlers  singular  seg_N r_V-o_AGR       fuv_N r_V-i_AGR
grow antlers  plural    seg_N-l_PL r_V-o_AGR  fuv_N-l_PL r_V-i_AGR
fly           singular  seg_N b_V-o_AGR       fuv_N b_V-i_AGR
fly           plural    seg_N-l_PL b_V-o_AGR  fuv_N-l_PL b_V-i_AGR

Figure 11.1. The meaning space of the experimental languages with the corresponding sentences from an example generation 0 language
Notes: Subscript N denotes noun stems, V = verb stems, PL = plural marker, AGR = agreement marker. Morphemes are hyphenated and subscripts are provided for clarity’s sake. Glosses for the meanings of the sentences are provided in parentheses.
Source: Adapted with permission from Berdicevskis & Semenuks (submitted).


11.2.2 Experimental procedure

After the initial introductory instructions, the participants learned the language in the training stage of the experiment. The stage consisted of a number of training blocks interspersed with interim test blocks. In the training blocks, the participants saw all of the pictures from the meaning space, which were presented in a random order and accompanied by the sentence corresponding to the picture in the participants’ input language. Each picture-sentence pair remained on the screen for four seconds, after which the next pair appeared. In the interim test blocks the participants were shown, one by one, eight pictures randomly selected from the meaning space and were asked to type in the corresponding sentence for each of them. The instructions preceding the training block prohibited the participants from taking any notes during the experiment.

In order to model the difference between normal and imperfect learning, we manipulated the number of training and interim test blocks that the participants received. Normal learner generation participants received six training blocks, whereas imperfect learner generation participants received three blocks. In order to investigate how the number of imperfect learners in a population would affect the tendency to eliminate morphological overspecification from the language spoken by its members, we compared the development of generation 0 languages in transmission chains in three different conditions: normal, temporarily interrupted and permanently interrupted. Figure 11.2 illustrates the differences in the numbers of normal and imperfect learner generations between the conditions.

(a) Normal transmission:                  → L → L → L → L → L → L → L → L → L → L
(b) Temporarily interrupted transmission: → L → S → S → S → L → L → L → L → L → L
(c) Permanently interrupted transmission: → L → S → S → S → S → S → S → S → S → S

Figure 11.2. A schematic representation of the chains in the normal (a), temporarily interrupted (b), and permanently interrupted (c) conditions
Notes: L = generations with long (full) learning time, S = generations with reduced learning time (imperfect learners). Arrows denote languages transmitted between generations. The very first arrows denote pre-generated input languages for the first generation learners.
Source: Reproduced with permission from Berdicevskis & Semenuks (submitted).

Since the experiment contained 15 generation 0 languages, each of which was used once in each of the three experimental conditions, and each of the transmission chains required 10 participants, we recruited and analysed the data from a total of 450 participants (140 female, 310 male, mean age = 30.5, SD = 9.2). The participants were recruited online and took part in the experiment on a webpage created using the jsPsych JavaScript library (de Leeuw 2014). Unbeknownst to the participants, the web page assigned them to a new generation in a randomly chosen transmission chain before the start of the experiment. The experiment was conducted in Russian, and all of the participants self-reported speaking Russian natively and being at least 16 years old. Because Russian has a salient gender agreement system of its own, we could be sure that the native language of our participants would not by itself push them to shed agreement in the experiment.
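The design arithmetic can be checked with a small Python sketch. The numbers (15 languages, three conditions, 10 generations, six versus three training blocks, reduced learning time in generations 2–4 or in all generations after the first) come from the description above; the function names and the string labels for the conditions are hypothetical.

```python
N_GENERATIONS = 10
N_LANGUAGES = 15
TRAINING_BLOCKS = {"L": 6, "S": 3}  # L = normal learner, S = imperfect learner
CONDITIONS = ("normal", "temporarily interrupted", "permanently interrupted")


def schedule(condition: str) -> list:
    """Return the learner type (L or S) for generations 1..10 of a chain."""
    if condition == "normal":
        return ["L"] * N_GENERATIONS
    if condition == "temporarily interrupted":
        # generations 2-4 are imperfect learners
        return ["L"] + ["S"] * 3 + ["L"] * (N_GENERATIONS - 4)
    if condition == "permanently interrupted":
        # every generation after the first is an imperfect learner
        return ["L"] + ["S"] * (N_GENERATIONS - 1)
    raise ValueError(f"unknown condition: {condition}")


def training_blocks_per_chain(condition: str) -> int:
    """Total number of training blocks received along one chain."""
    return sum(TRAINING_BLOCKS[g] for g in schedule(condition))


def total_participants() -> int:
    """One participant per generation, one chain per language and condition."""
    return N_LANGUAGES * len(CONDITIONS) * N_GENERATIONS
```

Here `total_participants()` gives 15 × 3 × 10 = 450, matching the reported sample size, and the per-chain exposure falls from 60 training blocks (normal) to 51 (temporarily interrupted) and 33 (permanently interrupted).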

11.3 The trajectory of overspeciﬁcation The normal transmission chains tended to preserve morphological overspeciﬁcation to a much greater extent compared to chains in either temporarily or permanently interrupted transmission condition, thus supporting the hypothesis that a larger share of imperfect learners in a population would lead to the loss of morphological overspeciﬁcation in the language of that population. In this section, we present a condensed description of some of the results from Berdicevskis & Semenuks (submitted), complementing it with some additional observations.

11.3.1 Qualitative analyses The qualitative analysis of the ﬁnal languages revealed a general trend for the structure of the languages to deteriorate. Several reasons could have led to this, most likely the underestimated difﬁculty of learning the language even with six training blocks and the absence of true communicative pressures in the experiment. However, it was not the case that this deterioration of structure was equally likely to affect all aspects of the language and was equally likely to affect chains of all three conditions. The agreement system was eroded by the participants much more often than the other morphological aspects of the system, and this erosion of structure was less frequent in the chains with normal transmission. Nonetheless, it is important to keep in mind that the learning was not entirely perfect in normal condition either. Thus, when speaking about imperfect learning we will mean the degree of imperfect learning rather than its presence or absence. The system was fully preserved in just three languages, two of which were generated in normal condition chains and one in a temporarily interrupted condition chain, and it was also almost fully preserved in three other languages, all of which belonged to normal condition chains. An example of a ﬁnal (generation 10) language without any damage to the agreement system can be seen in Table 11.1.

As one can see, the last generation language preserves the generation 0 agreement system fully: -o is consistently used to mark agreement with seg, and -i with fuv. The only deviation from the generation 0 language structure is the loss of the verb root in one of the sentences of the language (gen. 0 segl ro => gen. 10 segl o); however, this change still conserved the correct agreement suffix. The system disappeared, in turn, in fourteen languages, three of which belonged to the normal condition, five to the temporarily interrupted condition, and six to the permanently interrupted condition. An example of a generation 10 language that has fully lost the agreement system can be seen in Table 11.2. As Table 11.2 shows, the generation 10 language in this chain has fully lost the -i agreement pattern used for fuv in the generation 0 language, and now uses -o in all sentences, which now is more reasonably analysed as a part of the verb stems than as an agreement marker. One can also note that one of the noun stems changed from fuv to fug, likely under the influence of seg.

Table 11.1. An example of a final language with a fully preserved agreement system

                        Gen 0                          Gen 10
Event         Number    round animal   square animal   round animal   square animal
(none)        sg        seg            fuv             seg            fuv
(none)        pl        segl           fuvl            segl           fuvl
fall apart    sg        seg mo         fuv mi          seg mo         fuv mi
fall apart    pl        segl mo        fuvl mi         segl mo        fuvl mi
grow antlers  sg        seg ro         fuv ri          seg ro         fuv ri
grow antlers  pl        segl ro        fuvl ri         segl o         fuvl ri
fly           sg        seg bo         fuv bi          seg bo         fuv bi
fly           pl        segl bo        fuvl bi         segl bo        fuvl bi

Table 11.2. An example of a language with a fully lost agreement system

                        Gen 0                          Gen 10
Event         Number    round animal   square animal   round animal   square animal
(none)        sg        seg            fuv             seg            fug
(none)        pl        segl           fuvl            segl           fugl
fall apart    sg        seg mo         fuv mi          seg mo         fug mo
fall apart    pl        segl mo        fuvl mi         segl mo        fugl mo
grow antlers  sg        seg ro         fuv ri          seg ro         fug ro
grow antlers  pl        segl ro        fuvl ri         segl ro        fugl ro
fly           sg        seg bo         fuv bi          seg bo         fug bo
fly           pl        segl bo        fuvl bi         segl bo        fugl bo

In the other chains the initial agreement system substantially deteriorated, but did leave some remnants in generation 10 languages, which made it difﬁcult to precisely characterize the level of system erosion in a qualitative yet objective way. Nevertheless, taking the above ﬁndings together, we can see that chains including imperfect learner generations were more likely to completely shed the agreement system and less likely to preserve it.

11.3.2 Quantitative analyses Here we focus on a speciﬁc quantitative analysis which operationalizes morphological overspeciﬁcation in our artiﬁcial languages as the expressibility of the only redundant feature, viz. verbal agreement. Expressibility is deﬁned as the proportion of pairs of sentences where meaning differs in (and only in) the agent, and where the surface forms of the verbs are different. The concept can be easily understood by means of Table 11.2. For every language, we ignore the ﬁrst two rows (as they have no verbal meanings) and then compare pairwise the two cells in the other six rows: are the verbs the same or different? In generation 0, the verbs are always different, and expressibility of agreement would equal 1. In generation 10, the verbs are always the same, and expressibility of agreement would equal 0. As Figure 11.3 shows, although the expressibility of agreement declined in all conditions, it declined to a lesser extent in the normal transmission chains. This pattern is in accord with the qualitative ﬁndings reported above. As we mentioned in section 11.3.1, learning is imperfect in all three conditions, but to a lesser degree in the normal one. Taken together, the results of the experiment provided experimental support for the hypothesis that a large share of non-native learners in the population of speakers of a language could lead to the simpliﬁcation of the morphological structure of that language. More speciﬁcally, the study showed that imperfect learning of a language could lead to the loss of morphological overspeciﬁcation.
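The expressibility measure can be illustrated on the two languages of Table 11.2. The following Python sketch rests on two assumptions of ours that the definition above leaves open: the ‘verb’ of a sentence is taken to be whatever follows the first whitespace-delimited word, and all function names are hypothetical. It is not the authors’ analysis code.

```python
EVENTS = ["none", "fall apart", "grow antlers", "fly"]
NUMBERS = ["sg", "pl"]


def table_to_language(round_column, square_column):
    """Turn two table columns (8 cells each, ordered as in Table 11.2)
    into a mapping (agent, number, event) -> sentence."""
    cells = [(event, number) for event in EVENTS for number in NUMBERS]
    language = {}
    for (event, number), r, s in zip(cells, round_column, square_column):
        language[("round animal", number, event)] = r
        language[("square animal", number, event)] = s
    return language


def verb(sentence: str) -> str:
    """Assumption: the verb is everything after the first word."""
    return sentence.split(" ", 1)[1] if " " in sentence else ""


def expressibility(language) -> float:
    """Proportion of sentence pairs differing only in the agent whose
    verb forms differ; rows without a verbal meaning are ignored."""
    pairs = sorted({(n, e) for (_, n, e) in language if e != "none"})
    differing = sum(
        verb(language[("round animal", n, e)])
        != verb(language[("square animal", n, e)])
        for n, e in pairs
    )
    return differing / len(pairs)


ROUND = ["seg", "segl", "seg mo", "segl mo", "seg ro", "segl ro", "seg bo", "segl bo"]
GEN0 = table_to_language(
    ROUND, ["fuv", "fuvl", "fuv mi", "fuvl mi", "fuv ri", "fuvl ri", "fuv bi", "fuvl bi"])
GEN10 = table_to_language(
    ROUND, ["fug", "fugl", "fug mo", "fugl mo", "fug ro", "fugl ro", "fug bo", "fugl bo"])
```

Under these assumptions `expressibility(GEN0)` is 1.0 (all six verb pairs differ) and `expressibility(GEN10)` is 0.0 (all six are identical), in line with the values stated in the text.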

Figure 11.3. Change of the overspecification of agreement, as measured by expressibility, over time. Note: Shaded regions denote the standard error. Axes: overspecification (0 to 1) against generation (0 to 10), with one line per transmission condition (normal, temporarily interrupted, permanently interrupted).

11.4 The trajectory of irregularity The initial languages used in the study described above are perfectly regular. While the rule ‘change the verb form depending on the agent’ is redundant, it is still a rule, deterministic and exceptionless, as are the other properties of the initial languages. Irregularity in this setup is equal to zero and thus cannot decrease. At first glance, this setup cannot then be used to test any hypotheses about the potential role of imperfect learning in regularization. Manual inspection of the evolving languages, however, quickly reveals noticeable changes in irregularity. Due to the reasons outlined above they always start with an increase, but some transmission chains show less trivial patterns later. In this section, we present and analyse these patterns. Irregularity emerges because participants fail to learn or to apply a certain rule. Most often, this is the agreement rule, and we will focus solely on the irregularity of agreement (as we did with overspecification in section 11.3).

11.4.1 Probability matching While the participants often fail to learn the rule that governs the distribution of the two agreement markers in the initial languages, they seldom ignore the fact that there are two different markers. When a deterministic distribution rule is not available to learners, they often resort to probability matching, that is, reproduce the variants with approximately the same relative frequency as in the input (Hudson Kam & Newport 2009; Smith & Wonnacott 2010: 447, figure 1), but without a clear consistent rule for when to use which variant. Figure 11.4 demonstrates that our participants do the same with the agreement markers. In all three conditions, the mean relative frequency of the round-animal marker does not deviate much from the initial 50% (and, consequently, the same is true for the second marker). The narrow error bars show that relative frequencies in the individual chains do not deviate much from 50% either (i.e., it is not the case that the mean 50% is a result of half the chains using one marker in 100% of cases and the other half in 0% of cases).

Figure 11.4. Relative frequency of the agreement marker which denoted the round animal in the initial language of the chain. Note: Shaded regions denote the standard error. Axes: proportion of the round-animal agreement marker (0 to 1) against generation (0 to 10), with one line per transmission condition (normal, temporarily interrupted, permanently interrupted).

Out of our forty-five chains, fourteen lose agreement completely (see section 11.3.1). Some of those completely replace one marker by another, as in the language in Table 11.2, but this happens only in three chains; in the other chains both markers get reanalysed as parts of the verb stems. The most common scenario is represented in Table 11.3. In the final language, all three verbs have only one form. Two (m- and b-) preserve the original round-animal form with the -o ending, one (r-) preserves the square-animal form (-i), thus making the relative frequencies of the markers 2/3 and 1/3, respectively. Out of the fourteen agreement-losing chains, nine arrive at this frequency distribution at the end (counting both cases when it is the round-animal marker that has a frequency of 2/3 and when it is the square-animal one). Analysis of all the individual chains confirms that while a few chains do replace one marker by another completely or almost completely, most keep the proportion not too far from 50% throughout all the generations.

Table 11.3. A language with a fully lost agreement system

| Event | Number | Gen 0, round-animal agent | Gen 0, square-animal agent | Gen 10, round-animal agent | Gen 10, square-animal agent |
|---|---|---|---|---|---|
| | sg | seg | fuv | seg | fuv |
| | pl | segl | fuvl | segl | fuvl |
| fall apart | sg | seg mo | fuv mi | seg mo | seg mo |
| fall apart | pl | segl mo | fuvl mi | segl mo | fuvl mo |
| grow antlers | sg | seg ro | fuv ri | seg ri | fuv ri |
| grow antlers | pl | segl ro | fuvl ri | segl ri | fuvl ri |
| fly | sg | seg bo | fuv bi | seg bo | fuv bo |
| fly | pl | segl bo | fuvl bi | segl bo | fuvl bo |

Note: seg instead of expected fuv in the third row is not a typo.

It should be noted that in some chains, verb endings different from the original two emerge. If we calculate the denominator of the ratio as the number of all verb endings present, and not just the original two, the general picture does not change.
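The relative-frequency calculation, including the choice of denominator mentioned above, could look as follows. This is a sketch under assumptions: the function name `marker_frequency` and its `denominator_markers` parameter are hypothetical, the marker is taken to be the last symbol of the verb, and the second word list (with an innovated ending -u) is invented purely for illustration.

```python
from fractions import Fraction


def marker_frequency(verbs, marker, denominator_markers=None):
    """Relative frequency of `marker` among verb endings.

    If `denominator_markers` is given, only verbs ending in one of those
    markers are counted in the denominator (e.g. the original two markers);
    otherwise every verb ending present is counted.
    """
    endings = [verb[-1] for verb in verbs]
    if denominator_markers is not None:
        endings = [e for e in endings if e in denominator_markers]
    if not endings:
        raise ValueError("no verb endings to count")
    return Fraction(endings.count(marker), len(endings))


# Twelve verb forms of the Gen 10 language in Table 11.3.
TABLE_11_3_GEN10 = ["mo"] * 4 + ["ri"] * 4 + ["bo"] * 4
print(marker_frequency(TABLE_11_3_GEN10, "o"))

# Invented example with an innovated ending -u.
WITH_NEW_ENDING = ["mo", "mi", "ru", "bo"]
print(marker_frequency(WITH_NEW_ENDING, "o", denominator_markers={"o", "i"}))
print(marker_frequency(WITH_NEW_ENDING, "o"))
```

For the Table 11.3 language the -o marker comes out at 2/3, as stated in the text; the invented list shows how the two denominators can give different ratios for the same data.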

11.4.2 Irregularity and overspecification While the agreement markers continue to be present as elements of form, they lose their connection to the meaning (without being replaced by another element). In order to measure this trend, we pair up the twelve verb forms in the same way as we did when measuring expressibility (see section 11.3.2) and compare the last symbols in the verbs of every pair (manual analysis shows that if agreement is expressed, it is almost always expressed by the last symbol). For every pair of symbols, we calculate how often it occurs (out of six possible cases). Pairs where the symbols are the same get lumped together, regardless of what the symbols actually are. To quantify irregularity, we calculate the Shannon entropy of the probability distribution and normalize it by the maximal entropy, see Equation (1).

(1) Irregularity = H(SC) / log₂(6), where SC is the probability distribution of patterns of agreement expression
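Equation (1) can be sketched as follows. The function name `irregularity` and the pair representation are hypothetical. Two points are assumptions rather than statements from the text: differing symbols are treated as an ordered (round, square) pattern, and the normalizer is fixed at log₂(6) because there are six verb pairs. The example pairs are the Gen 10 verbs of Tables 11.3 and 11.4.

```python
import math
from collections import Counter


def irregularity(verb_pairs):
    """Normalized Shannon entropy of agreement patterns, following Equation (1).

    Each pair is (round-agent verb, square-agent verb). Pairs whose final
    symbols are identical are lumped into one 'same' pattern.
    """
    patterns = Counter()
    for round_verb, square_verb in verb_pairs:
        a, b = round_verb[-1], square_verb[-1]
        patterns["same" if a == b else (a, b)] += 1
    total = sum(patterns.values())
    entropy = sum((n / total) * math.log2(total / n) for n in patterns.values())
    return entropy / math.log2(6)


TABLE_11_3_GEN10_PAIRS = [("mo", "mo"), ("mo", "mo"), ("ri", "ri"),
                          ("ri", "ri"), ("bo", "bo"), ("bo", "bo")]
TABLE_11_4_GEN10_PAIRS = [("mi", "mi"), ("mi", "mi"), ("ri", "ro"),
                          ("ro", "ro"), ("bo", "bo"), ("bi", "bo")]

print(round(irregularity(TABLE_11_3_GEN10_PAIRS), 2))
print(round(irregularity(TABLE_11_4_GEN10_PAIRS), 2))
```

The Table 11.3 language has a single (same-symbol) pattern and scores 0; the Table 11.4 language has four same-symbol pairs and two differing pairs and scores about 0.36.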

This measure is similar to Cuskley et al.’s (2015: 215) Sj measure, used to measure the variability of sub-rules a participant uses in the formation of irregular past tenses. Consider some examples. In the final language in Table 11.1 there is only one pattern of agreement marking: {o, i}, and the same is true for the final language in Table 11.2 (the same-symbol type). Both languages would get an irregularity score of zero. So would the final language in Table 11.3: while there are two different pairs {o, o} and {i, i}, they both fall under the same-symbol pattern. The language in Table 11.4, however, is less regular. The strategy here is almost the same as in Table 11.3 with two exceptions: the verb r- preserved the agent marking in singular, the verb b- in plural. Hence, there are two patterns: the same-symbol pattern (four cases) and {o, i} (two cases). The language gets an irregularity score of 0.36. Irregularity depends on the number of patterns (the more patterns, the higher irregularity is) and the distribution of their probabilities (irregularity is highest if all the patterns are equiprobable). Thus, the least irregular language (apart from the fully regular one, which scores 0) would have two patterns, one of which occurs only once, and would score 0.25. The most irregular language would have six equiprobable patterns and score 1. However, this never happens in our data; the highest observed score is 0.74 (it can be achieved, e.g., by having four patterns: two that occur twice and two that occur once).

Table 11.4. A language with an irregular distribution of the agreement markers

| Event | Number | Gen 0, round-animal agent | Gen 0, square-animal agent | Gen 10, round-animal agent | Gen 10, square-animal agent |
|---|---|---|---|---|---|
| | sg | seg | fuv | seg | fuv |
| | pl | segl | fuvl | segl | fuvl |
| fall apart | sg | seg mo | fuv mi | seg mi | fuv mi |
| fall apart | pl | segl mo | fuvl mi | segl mi | fuvl mi |
| grow antlers | sg | seg ro | fuv ri | **seg ri** | **fuv ro** |
| grow antlers | pl | segl ro | fuvl ri | segl ro | fuvl ro |
| fly | sg | seg bo | fuv bi | seg bo | fuv bo |
| fly | pl | segl bo | fuvl bi | **segl bi** | **fuvl bo** |

Note: Cases where agreement is preserved are marked in bold.

As can be seen in Figure 11.5, unlike overspecification, in all three conditions irregularity increases rather steeply at first, then starts oscillating around what seems to be a plateau. In the permanently interrupted condition, there is a rather steep decrease during the last two generations; in the other two conditions the peak of irregularity is also closer to the middle (i.e., there is a slight decrease towards the end), but the difference is small. It is, however, interesting to take a look at the individual trajectories of irregularity and compare them to those of overspecification. We do that in Figure 11.6. In most chains, the initial changes in overspecification and irregularity go in exactly opposite directions, that is, the two measures seem to be almost perfectly negatively correlated. Sometimes this trend continues through all the generations (see, e.g., chains 2 and 13). If, however, the overspecification decreases beyond 0.5, the measures become positively correlated and subsequently change almost in unison (see, e.g., chains 22 and 30).

Figure 11.5. Change of irregularity, as measured by Shannon entropy, over generations. Note: Shaded regions denote the standard error. Axes: irregularity (0 to 1) against generation (0 to 10), with one line per transmission condition (normal, temporarily interrupted, permanently interrupted).

This behaviour largely follows from the deﬁnition of the measures. There are two states where the system is fully regular: complete overspeciﬁcation and complete absence of overspeciﬁcation. If the system is closer to the ﬁrst state (overspeciﬁcation > 0.5), almost any mutation would change the two measures in different directions (if agreement is lost in one case out of six, it is a decrease in overspeciﬁcation, but an increase in irregularity), but if it is closer to second state (overspeciﬁcation < 0.5), then the measures usually change in the same direction (e.g., if the two remnants of agreement in the language in Table 11.4 disappear, both overspeciﬁcation and irregularity would go down to zero).
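The sign flip around 0.5 can be illustrated with a deliberately simplified model. The assumptions are strong and not taken from the experiment: agreement is lost one verb pair at a time, and the surviving agreeing pairs all share a single pattern, so that only two patterns ever exist. The function name `agreement_measures` is hypothetical.

```python
import math


def agreement_measures(k_agreeing, n_pairs=6):
    """Return (overspecification, irregularity) when k of n verb pairs still agree.

    Simplifying assumption: all agreeing pairs show one pattern and all
    remaining pairs fall under the same-symbol pattern.
    """
    if not 0 <= k_agreeing <= n_pairs:
        raise ValueError("k_agreeing must lie between 0 and n_pairs")
    share = k_agreeing / n_pairs
    probabilities = [p for p in (share, 1 - share) if p > 0]
    entropy = sum(p * math.log2(1 / p) for p in probabilities)
    return share, entropy / math.log2(n_pairs)


# Walk from full agreement (6 pairs) to no agreement (0 pairs).
for k in range(6, -1, -1):
    overspecification, irregularity_score = agreement_measures(k)
    print(f"{k} agreeing pairs: overspecification={overspecification:.2f}, "
          f"irregularity={irregularity_score:.2f}")
```

Under these assumptions irregularity rises while overspecification falls from 1 to 0.5, and then both measures fall together until the fully regular state without agreement is reached.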

Figure 11.6. Change of overspecification (solid line) and irregularity (dashed line) in verbal agreement over generations in individual chains: (a) normal condition; (b) temporarily interrupted condition; (c) permanently interrupted condition. Panels: one per chain, numbered 1 to 45; y-axis from 0.0 to 1.0; x-axis: generation, 0 to 10.

Figure 11.7. Learnability as a function of irregularity. Panels: normal learners and imperfect learners; y-axis: learnability (0 to 1); x-axis: irregularity, at the values 0, 0.25, 0.36, 0.39, 0.48, 0.56, 0.61, 0.69, and 0.74.

11.4.3 Irregularity and learnability For every generation (apart from the final ones) we estimate how learnable its language is. The measure of learnability is transmission fidelity, which is obtained by comparing the language of generation n with the language of generation n+1, calculating the normalized pairwise Levenshtein distance between the sentences with the same meanings and subtracting it from 1. We found that, unlike in most other IALL experiments, learnability clearly decreases over time. If, however, we look at the learnability as a function of overspecification, we find that it follows a U-curve: high when overspecification is 1 and 0 (slightly higher at 0), but noticeably lower at other values. An obvious reason is that at intermediate overspecification values the system is almost always irregular. In Figure 11.7, we represent learnability as a function of irregularity (averaging across chains and conditions, but keeping normal and imperfect learners separate). As irregularity increases, the learnability indeed decreases, and the decrease is steeper for imperfect learners. Thus, to go from a regular overspecified state (learnable) to a regular non-overspecified state (more learnable), the system has to pass through an irregular stage (less learnable). The irregular stage can only be avoided if the total loss of overspecification occurs within one generation, which almost never happens.
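The transmission fidelity measure described above might be computed along the following lines. This is a sketch, not the procedure used in the study: the text does not say how the Levenshtein distance is normalized or aggregated, so dividing by the length of the longer sentence and averaging over meanings are assumptions, as are the function names and the meaning labels in the example.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                       # deletion
                               current[j - 1] + 1,                    # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def transmission_fidelity(language_n, language_next):
    """1 minus the mean normalized edit distance over shared meanings.

    Both arguments map a meaning label to the sentence expressing it.
    """
    shared = language_n.keys() & language_next.keys()
    if not shared:
        raise ValueError("the two languages share no meanings")
    total = 0.0
    for meaning in shared:
        old, new = language_n[meaning], language_next[meaning]
        longest = max(len(old), len(new))
        total += levenshtein(old, new) / longest if longest else 0.0
    return 1 - total / len(shared)


# Hypothetical meaning labels; sentences in the style of Table 11.3.
generation_n = {"round.sg.fall_apart": "seg mo", "square.sg.fall_apart": "fuv mi"}
generation_next = {"round.sg.fall_apart": "seg mo", "square.sg.fall_apart": "fuv mo"}
print(transmission_fidelity(generation_n, generation_next))
```

A value of 1 means the next generation reproduced every sentence exactly; here one symbol out of six changes in one of two sentences, so the fidelity is slightly below 1.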

11.5 Discussion That imperfect learning eliminates morphological overspecification is not surprising and fits well with the predictions of the theories discussed in section 11.1.2. It is also in accord with the knowledge accumulated by acquisition studies. While much is still unknown about how exactly adult learners are different from child learners and why it is so, it seems safe to claim that inflectional morphology is difficult for non-native speakers and often absent in their speech (DeKeyser 2005: 6).

The observation that irregularity increases under imperfect learning, contrariwise, seems to be at variance with the theories discussed in section 11.1.2. It is, however, not unexpected from the language acquisition and language change perspective. Loporcaro (Chapter 6, this volume) compares Wolof noun morphology to that of other Atlantic languages (a subfamily of the Niger-Congo languages). In Wolof, the system of marking noun classes through initial consonant mutations typical of other Atlantic languages has largely eroded and has been restructured: now noun class is marked on function words that modify the noun, for example articles and demonstratives. However, certain nouns still show remnants of the previous system and can be optionally marked for number through initial consonant alteration. Thus, we see a pattern reminiscent of our experimental results: certain systems of noun classification disappear (thus decreasing overspecification), but leave irregular atavisms (thus increasing irregularity). Clahsen et al. (2010) review evidence in favour of the claim that non-native speakers are less sensitive to morphological structure. They underuse morphological decomposition and rely more on memorization and lexical storage, even of the regularly inflected forms. This effect has also been found in highly proficient non-native speakers who approach native-like performance (Neubauer & Clahsen 2009). While memorization of separate forms per se does not imply irregularity, it clearly creates a friendlier environment for its emergence than does rule-driven form generation. In this context, it is interesting to look at the finding that non-native speakers of English produce significantly more irregular past-tense forms in a Wug-task than native speakers (Cuskley et al. 2015). Cuskley et al., however, argue that the irregularities are still rule-driven and follow the patterns that exist in the set of real English irregular verbs. 
They hypothesize that the effect is explained by the peculiarities of the non-native input, namely higher relative frequency of the irregular verbs and their higher salience in the explicit instruction. The controversial conclusion of Cuskley et al. (2015) is that despite the seeming preference for irregularity, non-native speakers actually prefer rules over exceptions and simplicity over complexity. Our data lend modest support to Clahsen’s memorization vs. generation account. The elimination of agreement implies that our learners fail to do the full-ﬂedged morphological analysis of their input. Agreement gets affected more than other features, probably because it is redundant and based on a long-distance relationship (verb and agent), and both these factors can inhibit learning (DeKeyser 2005). It is, however, difﬁcult to say whether the normal learners preserve more agreement because they are more sensitive to the morphological structure or because they have more time to memorize the forms. We can only claim that imperfect learning inhibits acquisition of rule-based distributions, but cannot say how exactly it happens.

Usage of complex unproductive rules instead of simple productive ones is one source of irregularity. Another one is the usage of probabilistic rules instead of deterministic ones. Inconsistent probabilistic usage is typical for non-native speakers (Johnson et al. 1996). Hudson Kam & Newport (2005, 2009) in a series of ALL (but not I(terated)ALL) experiments show that if grammatical forms in the linguistic input are used probabilistically, then adult learners usually reproduce the inconsistencies, regularizing only the most infrequent ones in the most complex cases. In one experiment, when adults did impose deterministic rules, those were mostly ‘rules of omission which served to remove structure from the language’ (Hudson & Newport 1999: 276). This means that just as with our participants, those learners decreased overspecification but not irregularity. Smith & Wonnacott (2010), however, show that regularization can occur if weak individual biases of adult learners are amplified by iterated transmission. In an IALL study, they show that transmission chains, but not isolate learners, eliminate unpredictable variation (see also Smith et al. 2017 on how language use affects bias amplification; Samara et al. 2017 on how sociolinguistic conditioning affects language use by adults and children). Although our chains are twice as long as Smith & Wonnacott’s (2010) five-generation chains, we do not see any reliable overall decrease in irregularity (see Figure 11.5). An important difference between the two studies, however, is that Smith & Wonnacott’s participants received probabilistic, or truly unpredictable, input. They saw several signals for exactly the same meaning, and those signals could be different. In our study, the input is, strictly speaking, deterministic, since every meaning is represented by one sentence. Thus, while it is possible that, for instance, ‘fall apart’ will sometimes be denoted by mo and sometimes by mi, the variation will not be fully unpredictable: it will always be possible to condition it on something (e.g., agent or number).² This conditioning is likely to protect variation from elimination. Note that the conditioned variation is still difficult to learn, and participants seldom manage to reproduce faithfully the conditioning ‘invented’ by a previous generation. Instead of eliminating it completely, they replace it by their own conditioning. It can be argued that the participants treat the input as at least partly probabilistic (failing to learn the rule behind the distribution of markers, they nonetheless match the frequencies of markers). The input, however, is not complex enough to trigger the regularization as in Hudson Kam & Newport (2005, 2009). Another reason for the difference from Smith & Wonnacott’s results can be that our languages are more complex and it is more difficult for learners to converge on a regular pattern. In addition, the probability of random mutations that can make a language deviate from the regular state is higher in our case (note that Smith & Wonnacott filtered away certain random mutations that they deemed irrelevant before passing the input on to participants). It should also be noted that while there is a clear difference between the trajectory of overspecification in the normal condition and the interrupted conditions, such a difference is absent for irregularity. It can be that overspecification is more sensitive to the degree of imperfect learning. The effect of irregularity on learnability, however, seems to be different for normal and imperfect learners: as irregularity increases, the learnability decreases more steeply in the latter category.

² We are grateful to Kenny Smith for bringing this difference to our attention.

11.6 Conclusion We show that during morphological simplification the trajectories of overspecification and irregularity need not be the same and, moreover, are likely to be different. Imperfect learning prevents speakers from acquiring certain morphological rules (especially those that are redundant or particularly difficult) and thus causes a decrease in overspecification but an increase in irregularity. Interestingly, the degree of imperfect learning seems to affect how much overspecification decreases, but not how much irregularity increases. The increase in irregularity, in turn, makes languages less learnable (this effect is stronger for imperfect learners than for normal ones), unless all overspecification is eliminated and the system reaches the non-overspecified regular state. Our chains seldom reach this optimum, probably because the regularization bias is relatively weak in our participants and the experimental setting suppresses it.

Acknowledgements The experiment was funded by the Faculty of Humanities, Social Sciences and Education at UiT, The Arctic University of Norway. AB was supported by the Norwegian Research Council grant ‘Birds and Beasts’ (222506). We are also grateful to the popular-science portal ‘Elementy’ and its editor-in-chief Elena Martynova for advertising the experiment, to Tanja Russita for designing the Epsilon fauna, to Kenny Smith and Peeter Tinits for commenting on an earlier version of the chapter, and to Peter Arkadiev and Francesco Gardani for inviting us to contribute to this volume.

12 Where is morphological complexity? Marianne Mithun

12.1 Introduction As linguists, we love discovering order in chaos. Grammatical complexity provides us puzzles to play with. An assumption underlying some theoretical models of language has been that the most elegant formal description naturally matches speaker knowledge. But closer attention to what speakers actually do raises the question of whether complexity is in fact the same for the analyst, the speaker, and the language learner. An examination of speech in languages displaying different kinds of morphological complexity, spoken in language contact situations, suggests that they are not.

Marianne Mithun, Where is morphological complexity? In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Marianne Mithun. DOI: 10.1093/oso/9780198861287.003.0012

12.2 What is complexity? Dahl (2004, 2017) provides useful surveys of approaches to complexity, distinguishing first agent-related or relative complexity from objective or absolute complexity. Agent-related complexity refers to the effort a generalized outsider needs to become acquainted with the system (Kusters 2008: 9). Objective complexity refers to (i) the amount of information needed to specify the system (Kolmogorov complexity); (ii) the length of the description of a set of regularities or recurring patterns (the effective complexity of Gell-Mann 1994); or (iii) the number of parts of a system and/or interactions (Miestamo 2008). Dahl further distinguishes the linguistic material the measures are applied to. System complexity pertains to what a learner must master in order to become proficient in a language, presumably including such things as rules and their exceptions. Structural complexity pertains to the complexity of individual expressions, such as the depth of maximal embedding in a sentence. Corpus complexity measures complexity over samples of connected speech, such as the Greenbergian (1960) calculations of degree of synthesis, or average number of morphemes per word.

But morphological complexity is itself not a straightforward matter, as pointed out by the editors of this volume in Chapter 1. To compare degrees of synthesis across languages, one could measure the average number of morphemes per word over comparable stretches of speech. Alternatively, one might compare the maximal possible number of morphemes per word. In languages with templatic morphology, one could count the number of slots within the templates, or the number of morphemes per slot, or the coherence of functions of morphemes within slots. One could count all slots, or only those which are obligatory. The fact that a morphological structure is templatic could itself be viewed as adding complexity: it would mean that morpheme order does not follow naturally from scopal relations and must be stipulated. Discussions of morphological complexity usually include form/function mappings as well. Deviations from one form : one function correspondences have been cited as added complexity (Anderson 2015a). Such phenomena would include fusion, suppletion, syncretism, dependence on lexical classes, and elements with no discernible meaning. The very existence of morphological complexity might seem to be counterproductive, adding useless difficulty to the acquisition and use of language. But even where the complexity seems arbitrary, the factors which produce it are not. Perhaps the most important factor is cognitive. Frequently-recurring sequences of meaningful elements eventually tend to become routinized and stored in memory as chunks, as described by Bybee & Beckner (2015) and many others. Over time, the formal and semantic salience of their individual components fades for speakers, and their forms can erode. Another intriguing possible factor in the development of complexity, raised by Dahl (Chapter 13, this volume), Trudgill (2011, 2017), and Dale & Lupyan (2012), is the sociocultural context in which a language is used. Small communities, with dense social networks which persist over long periods of time, might foster an increase in complexity. 
If speakers interact regularly with a limited set of interlocutors, the relative frequency of particular turns of phrase might increase, setting the stage for routinization and just the kinds of grammaticalization processes that underlie complexity. Multilingualism within the community might affect complexity as well, but in several possible ways. Intensive, longstanding bilingualism might lead to an increase in complexity, as early bilinguals replicate grammatical distinctions of each language in the other, adding to the total number in each. If, on the other hand, the bilingualism has a different proﬁle, consisting, for example, of a substantial proportion of untutored adult learners, there might be an overall decrease in complexity, as second-language speakers systematically choose simpler, analytic constructions over more complex, synthetic ones. Here the fate of morphological complexity under contact is explored in two languages with slightly different kinds of complexity. The data come from conversations among ﬁrst-language speakers affected to varying degrees by contact. The implications of the ﬁndings are then considered for our larger understanding of morphology.
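The corpus-based measure mentioned earlier in this section, the average number of morphemes per word, can be sketched as below. The function name `synthesis_index` is hypothetical, the input is assumed to be already segmented with hyphens at morpheme boundaries, and the English sample words are invented for illustration rather than drawn from any corpus discussed here.

```python
def synthesis_index(segmented_words, boundary="-"):
    """Average number of morphemes per word in a pre-segmented sample.

    Assumption: every morpheme boundary inside a word is marked with
    `boundary`, and unsegmented words count as one morpheme.
    """
    if not segmented_words:
        raise ValueError("need at least one word")
    morpheme_count = sum(
        len([m for m in word.split(boundary) if m]) for word in segmented_words
    )
    return morpheme_count / len(segmented_words)


# Invented sample: 3 + 2 + 1 morphemes over 3 words.
sample = ["un-break-able", "cat-s", "the"]
print(synthesis_index(sample))
```

Comparing such an index across languages only makes sense if the samples are comparable stretches of speech and the segmentation criteria are the same, which is the difficulty the text points to.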

12.3 Central Pomo Central Pomo is a language of the Pomoan family, indigenous to an area of Northern California approximately 100 miles north of San Francisco. It shows a certain degree of morphological complexity, but it would not be considered polysynthetic in one narrow sense: arguments are not specified within the verb. Verbs can show other kinds of morphological elaboration, however, including specification of means/manner, location/direction, various kinds of verbal number, argument structure (causatives, reciprocals, passives), inchoatives, aspect, dependency, and more. An example is in (1). Affixes are in bold. (All examples here are taken from unscripted speech.)

(1) Central Pomo verb structure (Frances Jack, speaker p.c.)
    Mu:l bašá ʔel ʔ-áʔ-č’i-n ʔe
    that buckeye the fingering-gather-.-..
    ‘When gathering buckeyes,
    kúyq’a:l ʔe mu:l m-t̯’á:-ka-w-aʔ-ya-w.
    right.away  that heat-sense---.--
    you have to cook them as soon as you get them.’

Complexity can be affected in a variety of ways by the sociocultural context in which languages are spoken. Trudgill (2011) has proposed that small communities, with tightly-knit social networks and frequent interaction among small numbers of participants, could foster the growth of complexity. Enhanced frequencies of recurring expressions could result in routinization and morphologization. Language contact can affect complexity in quite diverse ways. Early bilingualism might increase complexity, as children, who have the least difﬁculty in acquiring complex systems, replicate distinctions from one of their languages in the other. Late bilingualism in a large proportion of a population might decrease morphological complexity, as adult second-language speakers opt for more analytic forms of expression. Importantly, the encroachment of one language on another might have a simplifying effect, as spheres of usage of the endangered language and frequency of its use are reduced. Northern California is a recognized linguistic area, with striking structural parallels across the languages, including morphological distinctions. Communities have always been small, and exogamy common, so a good proportion of children were raised in bilingual households. The small communities and longstanding, intense contact could well have contributed to morphological complexity. One feature that is widespread across languages of the area is the speciﬁcation via verbal preﬁxes of means/manner/instrument (Mithun 2007). Examples of their functions can be seen in (2) with the Central Pomo verb root t̯’é:č’ ‘stick together’.

(2)

Central Pomo means/manner prefixes
t̯’é:č’ ‘stick together, be alongside each other’
da-t̯’é:č’ ‘push on something that sticks in your hand’
ʔ-t̯’é:č’ ‘stick on with fingers, as chewing gum under table’
ma-t̯’é:č’ ‘step on a nail or something that sticks in your foot’
ča-t̯’é:č’ ‘sit on a thorn, put a patch on pants’
h-t̯’é:č’ ‘stick up a pole, pitchfork, shovel, in ground’
m-t̯’é:č’ ‘catch fire’
pʰ-t̯’é:č’ ‘hammer a nail into the wall, nail something on’
pʰa-t̯’é:č’ ‘something floating downriver gets stuck on bank’
s-t̯’é:č’ ‘while one is drinking, something gets into the mouth that doesn’t belong, like dirt or a bug’
ša-t̯’é:č’ ‘stick a support, as a box, next to something long, like fence posts stored upright for use’

Two of the prefixes seen in (1) also occur here: ʔ- ‘fine finger action’ and m- ‘involving heat’. Another widespread feature is the specification of location and direction. Central Pomo examples of such suffixes are in (3) with the verb čá- ‘run’. (Perfective aspect is marked here with the suffix -w after vowels and glottal stop after obstruents. Imperfective aspect here is -an.) (3)

Central Pomo directional suffixes
čá-w ‘run’ (one)
čá-:la-w ‘run down’
čá-:qač’ ‘run up (as up a hill)’
čá-č’ ‘run away’
čá-way ‘run against hither, as when a whirlwind came up to you’
čá-:ʔw-an ‘run around here and there’
čá-mli-w ‘run around it (tree, rock, house, pole)’
čá-mač’ ‘run northward’
čá-:q’ ‘run by, over (on the level), south’
čá-m ‘run over, on, across (as bridge)’

A third area of morphological elaboration in Central Pomo, as well as in related and unrelated but neighbouring languages, is a set of suffixes and enclitics that mark dependent clauses. The markers distinguish what speakers cast as elements of a single larger event or state (same) and what they cast as related but distinct events or states (different). In addition, the markers distinguish realis from irrealis situations. For realis situations, simultaneous or overlapping events and states are distinguished from those viewed as consecutive (sequential). Examples of the realis same suffix -(i)n are in (4).

(4)

Central Pomo dependent same Mú:l ʔe mu:t̯uya, mó da-héle:č’-in that  3. hole pulling-dig- Then, they would dig a hole, hó ʔmáhč’i-n, hole build.ﬁre- build a ﬁre, mi: ʔ=mú:t̯uya lóq’ ts’aqʰáṭ m-čá-la-w-ač’-in, there =3. thing greens -throw-horizontally--.- throw green stuff in there, mi: ʔ=mú:t̯uya mu:l šá ʔel m-ča-la-w-ač’-in, there -3. that ﬁsh the -throw-horizontally--.- throw those ﬁsh in there, m-ṭ’á-:ka-w-ač’. heat-sense---. and cook them.’

Examples of the realis different enclitic =da are in (5). (5)

Central Pomo dependent different Šé: ʔul ma, yém-aq-’=da longtime already 2 old--= ‘In the future, when you are older ʔá: čʰó-w=da, 1. not.exist-= when I am no longer here, ma ʔ-yá:q-an-ka-w=ʔkʰe 2. mentally-recognize-.--= you will see.’

Speakers can vary in their packaging of events as same or different. Generally the kinds of factors that enter into their decisions include continuity versus discontinuity of topic, place, and time.

The first sustained contact between Pomoan speakers and a European language was in the nineteenth century, when California was a part of Mexico. Contact with Spanish resulted in the adoption of some nouns, primarily designating introduced concrete objects, but it had little apparent effect on the morphological complexity of Central Pomo or its neighbours. During the twentieth century, schools were established in which children were required to speak English, and many children were sent away to boarding schools where they were forbidden to speak Central Pomo. One man born in 1912 recalled that when he left the community at age 5, pretty much everyone spoke the language. When he returned ten years later, almost no one used it on a daily basis.

All of the speakers represented here learned Central Pomo as their mother tongue. All subsequently learned English as well, but they had varied histories. All ultimately returned to live in Central Pomo communities. (6)

Central Pomo speakers cited here
Speaker 1: Fluent speaker (F), spoke Central Pomo on a daily basis
Speaker 2: Fluent speaker (F), away for a few years as young woman, otherwise spoke Central Pomo on a daily basis (daughter-in-law of Speaker 1)
Speaker 3: Early acquisition of Central Pomo. Fairly fluent speaker (F), left at age 18, away for 30 years, married to a non-speaker, occasional use of Central Pomo
Speaker 4: Early acquisition of Central Pomo. Less fluent speaker (F), lived in community until age 13, returned 30 years later, widowed, occasional use
Speaker 5: Early incomplete acquisition of Central Pomo. Halting speaker (F), language scorned by father, departure for boarding school age 5, rare use
Speaker 6: Some early acquisition of Central Pomo. Son (M) of Speaker 1, older brother of Speaker 5, son of nonspeaker, boarding school ages 5–15, rare use

The Central Pomo of Speakers 1 and 2 shows full ﬂuency and articulateness. That of the others provides some insight into potential effects of contact on morphological complexity.

12.4 Obsolescence and morphological complexity

With reduction in language use, particularly in situations of contact with a less synthetic language, we might expect a reduction in morpheme-per-word ratios. One way to investigate this hypothesis is to compare the speech of individuals with differing balances in their bilingualism. As noted, all of the speakers cited here learned Central Pomo as a first language, then later learned English. For a preliminary comparison of morphological complexity, the speech of Central Pomo-dominant speakers was compared with that of now English-dominant speakers during the same conversations, so that the topics of discussion, discourse contexts, and social setting were constant. Calculations of morphemes per word revealed surprising results. In one conversation, for example, both Speaker 2 and Speaker 5 averaged precisely 1.44 morphemes per word! Other comparisons yielded similar results.
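The morphemes-per-word measure can be made concrete with a small sketch. It assumes that words are already segmented as in the examples of this chapter, with "-" marking affix boundaries and "=" marking clitic boundaries; the function names are illustrative only and do not come from the study itself.

```python
import re

# Assumption: "-" and "=" are the only morpheme-boundary symbols,
# as in the segmented examples; an unsegmented word is one morpheme.
BOUNDARY = re.compile(r"[-=]")


def morphemes_in_word(word: str) -> int:
    """Count the morphemes in one segmented word."""
    return len([part for part in BOUNDARY.split(word) if part])


def morphemes_per_word(words: list[str]) -> float:
    """Average number of morphemes per word over a list of segmented words."""
    if not words:
        raise ValueError("no words to average over")
    return sum(morphemes_in_word(w) for w in words) / len(words)


# First line of example (1): four unsegmented words and one four-morpheme verb.
line = ["Mu:l", "bašá", "ʔel", "ʔ-áʔ-č’i-n", "ʔe"]
print(morphemes_per_word(line))  # (1 + 1 + 1 + 4 + 1) / 5
```

A single average of this kind says nothing about which words carry the morphemes, which is why identical ratios can hide very different speech.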

The nature of the morphological complexity with varying contact effects differs in several ways, however. One is underspeciﬁcation of certain distinctions regularly mentioned by the most Pomo-dominant speakers. Speaker 4, for example, who was away from the community for some time and did not use the language often after her return, made the comment in (7). (7)

Central Pomo direction: Speaker 4 [‘I walked out,] ʔa: yhé-:n ht̯ow hčé-hče-w. 1. do-. from stagger-stagger- but I was staggering.’

She used reduplication to describe her staggering, but Speaker 2 later commented as we were transcribing the recording that a more dominant Pomo speaker would have used the verb in (8), specifying direction with the sufﬁx -:ʔw- ‘around here and there’. (8)

Central Pomo direction: Speaker 2 hihčé-:ʔw-an stagger-around-. ‘was staggering around’

Whether or not the reduplicative strategy used by Speaker 4 is morphologically simpler than the directional suffix construction suggested by the more fluent Speaker 2 could be debated. Reduplication for iteration does occur elsewhere in the language as a derivational process creating lexical items. To Speaker 2, it was less idiomatic, and the perfective aspect less appropriate than the imperfective. Speaker 4’s comment could be interpreted as an active innovative extension of existing patterns, or the symptom of a more limited vocabulary.

Central Pomo verbs contain numerous kinds of number distinctions. One is inflectional. Imperfective markers, as well as the other aspect suffixes derived from them, obligatorily indicate subject number: basically -(a)du- for singulars and -(a)č’i- for plurals. As speakers were discussing the special knowledge Pomo people have about gathering seafood, fluent Speaker 2 made the first comment in (9). The fact that she was describing multiple people was clear not just from the plural pronoun mú:t̯uya ‘they’, but also the plural imperfective suffix -č’i- on the verb ‘know’, the distributive -t̯ay on ‘knowledgeable’ (since each was knowledgeable in their own right), and the distributive -ay on ‘people’. When Speaker 4 echoed the thought, she used the singular form of the verb ‘know’.

(9)

Central Pomo number: Speakers 2, 4 2 Hínt̯il ʔ=mú:t̯uya šá:-t̯’a:ʔ-č’i-w Indian =3 knowledge-sense-.- ‘Indians know that.’ . . . Ma: šá:-t̯ay ʔe mu:l hínt̯il čá:č’-ay. stuff knowing-  that Indian people- ‘They know things, Indians.’ 4 Mm. ʔúda:w ma: šá:-t’a:ʔ-du-w. lots stuff knowledge-sense-.- ‘[He] knows lots of stuff.’

Speaker 2 later commented that Speaker 4 made it sound like just one person is smart. Speaker 4’s imperfective verb, a frequently-occurring one, was well-formed, but inappropriately selected in this context. The speech of less ﬂuent speakers does show morphological complexity. On another occasion, Speaker 4 offered the explanation in (10) with a complex verb. (10)

Central Pomo morphological complexity: Speaker 4 Mé:n=ʔt̯i ʔa: car čá-:ʔw-an-ka-w=ʔkʰe so=but 1. run.-around-.--= t̯ʰi-n ʔi-n. not- be- ‘That’s why I don’t drive.’

The verb is certainly morphologically complex, but it is highly frequent. Speaker 4 did not assemble it online: she selected it as a fully-formed lexical item. The same speaker used the verb in (11). (11)

Central Pomo morphological complexity: Speaker 4 ba:-yú:-čʰ-ma-w=ʔkʰe orally-know--.-= ‘they will understand’

This verb, too, shows some morphological complexity, and it is well-formed. But the context is revealing. It was part of a conversation among Speakers 2, 3, and 4. (The full conversation was in Central Pomo. Just the translation of Speaker 3’s remarks is presented here for context.) (12)

Central Pomo morphological complexity: Speakers 3, 2, 4 3 ‘My daughter says that we don’t want the White people to understand us. That’s why we speak Indian.’ 2 Mú:t̯uya ba-yú:-cʰ-ma-w=ʔkʰe ṱʰi-n. 3. orally-know---= not-. ‘They won’t understand.’

4 Pretty soon ba:yú:čʰmawʔkʰe. ‘Pretty soon they will understand.’

Speaker 4 was echoing the verb just used by Speaker 2. Morphological complexity like this is not something speakers usually produce online as they speak. Speakers know lexical items: they know which formations exist and which do not, and for those that do, they know their specialized contexts of use. Among the verbs in (2) above with means/manner preﬁxes was pʰ-ṭʰé:č’, literally ‘by.swinging-stick’. When asked what this word means, Speaker 2 replied ‘hammer a nail into a wall’. Skilled speakers, who spend a major part of their day in the language, have larger lexical inventories and an acute sense of the precise contexts in which items are used. Their awareness of the components of morphologically complex words varies, but in most cases the internal structure of words is opaque to them. This is not altogether surprising. They rarely if ever saw the language written, and many morphemes are no more than a single consonant. Central Pomo contains a passive construction which functions to eliminate a grammatical agent from the clause. The agent may be generic, unknown, or unimportant, and it cannot be mentioned. The passive marker is a verbal sufﬁx -ya. It is added to both transitive and intransitive stems. An example from ﬂuent Speaker 2 is in (13). She was describing a conversation that had taken place at the senior citizens’ center. The identity of the eaters was not important; the passive clause simply served to locate the event. (13)

Central Pomo passive: Speaker 2
Béda maʔá: qa-wá-:ʔ-ya-w=da
here food biting-go-.--=.
‘When (people) were eating here
ʔi’=ma mu:l– Mitch=t̯o ṭʰe-l . . .
be- that Mitch= mother-
she– told Mitch’s mother . . . ’

Slightly less ﬂuent Speaker 3 used a well-formed passive verb, but inappropriately. (14)

Central Pomo passive: Speaker 3 [‘He’s looking for a woman.’] Má:t̯a-ya q’á:-ya-w ʔe. woman- leave--  ‘His wife he was left.’ (For ‘His wife left him.’)

As we later transcribed and translated the conversation, ﬂuent Speaker 2 noted that Speaker 3 should have either used the basic transitive verb q’á:w ‘left’ or not mentioned the wife.

Less ﬂuent Speaker 4 used passive verbs in (15). (15)

Central Pomo passive: Speaker 4
Lady oranges qó=de:-ya-w and
hither=carry--
‘A lady brought oranges and
needle qó=de:-ya-w.
hither=carry--
brought needles.’

Here, too, the passive verb forms are well-formed but incompatible with mention of the agent, the lady. Both (14) and (15) indicate that the speakers were selecting pre-formed words, rather than constructing them online as they spoke. Example (15) also reflects a smaller lexical inventory. As Speaker 2 later noted, a better choice for the first verb would have been qó=di-w, and for the second qó=be-w. Different verb roots are used for carrying a single round item (de-), multiple round items (di-), and long items carried horizontally (be-). Speaker 4 did use some passive verbs appropriately, as in (16). (16)

Central Pomo passive: Speaker 4 Qʰá:p’-ṭ’á:-ya-w. pity-feel-- ‘Pitiful!’

This is a highly lexicalized, frequent expression. The speech of less Pomo-dominant speakers differed in another way. As seen earlier in examples (1), (4), and (5), the language contains a rich set of dependency markers. Less ﬂuent speakers tend to use less morphological clause combining, as can be seen in (17). (17)

Central Pomo clause combining: Speaker 3 ʔa: E=t̯o čá-l=yo-w 1. E= house-to=go- ‘I go to E’s house (and) hínt̯il ʔel ča:nó-d-an=ya mú:t̯u. Indian the talk-.-.=. 3. talk Indian to her.’

Speaker 2 later commented that the first verb should have been čályohdu-n, ending in the realis same event dependency suffix, rather than the perfective -w, yielding a sentence meaning ‘When I go to E’s house I talk Indian with her’. Speaker 3’s prosody in (17) reflected this structure: she did not end the first clause with a terminal fall in pitch or a significant pause.

This same speaker made the comment in (18). (18)

Central Pomo clause combining: Speaker 3 Mkʰé ba:ʔá čʰo-w 2. food not.exist- ‘Even when you don’t have food ma mu:lt̯ayat̯’, 2. 3. bú ʔel ma mu:l fry-č-in, beans ʔel fryč-in . . . potato the 2. that fry--. beans the fry--. you fry potatoes for them, fry beans . . . ’

Speaker 2 later commented that she herself would have used the dependent verb form čʰó-w=da in the first clause, with the realis different event dependency enclitic =da.

The puzzle remains as to why the joint conversation between fully fluent Speaker 2 and struggling, English-dominant Speaker 5 should show exactly the same morpheme-per-word ratio: 1.44. Speaker 2 actually spoke more during the conversation, with twice as many words (tokens). Significantly, she used many more different words. Speaker 5 used just nine different verbs (types), all but four of them repetitions of verbs just used by Speaker 2.

Overall, there are two main differences between the speech of fully fluent Speakers 1 and 2 on the one hand, and more English-dominant Speakers 3, 4, 5, and 6 on the other. The first is lexical knowledge. Fluent speakers who spend more time in the language know more words and lexicalized constructions. They can thus make finer semantic distinctions, as with verbs specifying means/manner, location/direction, and different kinds of carrying, all seen here. The second is that fluent speakers have more alternatives for shaping the flow of information, with passives, clause linkers, and discourse particles. A significant difference between the two groups is in fact the use of discourse particles, which convey such distinctions as source and certainty of information (hearsay, inference, etc.), contrast with expectation versus common knowledge, and much more. Fully fluent speakers use substantially more such particles. Since the particles are monomorphemic, their pervasiveness lowers the average number of morphemes per word.
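The arithmetic behind this last point can be sketched briefly. The figures below are invented for illustration and are not taken from the Central Pomo recordings; the function names are likewise hypothetical. The sketch shows how types and tokens are counted, and how adding one-morpheme particles pulls down a speaker's average even when the rest of the speech is unchanged.

```python
def type_token_counts(words: list[str]) -> tuple[int, int]:
    """Return (types, tokens): distinct word forms and total word forms."""
    return len(set(words)), len(words)


def mean_after_adding_particles(
    n_words: int, mean_morphemes: float, n_particles: int
) -> float:
    """Mean morphemes per word once monomorphemic particles are added.

    Each particle contributes exactly one word and one morpheme.
    """
    total_morphemes = n_words * mean_morphemes + n_particles
    return total_morphemes / (n_words + n_particles)


# Invented figures: 100 words averaging 2.0 morphemes each,
# then the same speech with 100 discourse particles interspersed.
print(mean_after_adding_particles(100, 2.0, 0))    # 2.0
print(mean_after_adding_particles(100, 2.0, 100))  # 1.5

# Three tokens but only two types, since one form is repeated.
print(type_token_counts(["ma", "mu:l", "ma"]))     # (2, 3)
```

On this view, a fluent speaker with many complex verbs and many particles can arrive at the same average as a less fluent speaker who uses a handful of memorized complex words and few particles.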

12.5 Mohawk

Mohawk is a Northern Iroquoian language indigenous to the North American Northeast, currently spoken in communities in Quebec, Ontario, and New York State. It is prototypically polysynthetic. It is holophrastic in the narrow sense: one word, a verb complete with pronominal arguments and predicate, can constitute a full sentence. The verb in (19), for example, would be a complete sentence on its own. (19)

Mohawk holophrasis: Ima Johnson, speaker ‘We were driving along and saw a sign advertising free feed with chickens.’ Kén: ne:’ ia’akiate’serehtínion’t ne:’ thí:. ken: ne:’ i-a’-ak-i-ate-’sere-ht-inion-’t-e’ ne:’ thiken here it.is --1.---dragit.is that -be.in-- here it is we two caused our dragger to be in there it is that ‘So we pulled in.’

There are three lexical categories in Mohawk, deﬁned in terms of their internal morphological structure: verbs, nouns, and particles. (Particles are monomorphemic, though they are sometimes compounded.) The morphological structures are templatic; that of verbs is the most elaborate. The basic verb template is in Figure 12.1. Within the blocks of pre-pronominal preﬁxes and derivational sufﬁxes there are multiple slots. The prepronominal preﬁxes include a Contrastive, Coincident, Partitive, Translocative, Factual, Duplicative, Irrealis, Future, Cislocative, and Repetitive. The derivational sufﬁxes include an Inchoative, Reversives, Causatives, Instrumental Applicatives, Benefactive Applicatives, a Directional Applicative, Distributives, Andatives, and Ambulatives. There are around sixty pronominal preﬁxes, three aspect sufﬁxes, and four ﬁnal tense/mood sufﬁxes. Nearly all show phonologically and/or morphologically conditioned allomorphy. As in many templatic systems, there are discontinuous dependencies among morphemes. Certain verb roots require a Duplicative preﬁx (), for example. In some cases, a semantic rationale can be discerned: the Duplicative can indicate some kind of ‘two-ness’ or a change of state or position, though its occurrence is lexicalized with each verb. In other cases, any semantic contribution has faded. Some other verb roots require certain other prepronominal preﬁxes, in what are now lexicalized combinations. Another discontinuous dependency holds between inﬂectional preﬁxes and sufﬁxes. The perfective aspect sufﬁx, for example, requires the presence of a Factual, Future, or Irrealis prepronominal preﬁx.

PREPRONOMINAL PREFIXES | PRONOMINAL PREFIXES | REFLEXIVE/MIDDLE | NOUN STEM | VERB ROOT | DERIVATIONAL SUFFIXES | ASPECT SUFFIXES | TENSE/MOOD

Figure 12.1. Mohawk verb template
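The template and one of the discontinuous dependencies described above can be modelled in a short sketch. The slot names follow Figure 12.1, but the category labels used as values ("future", "perfective", and so on) are illustrative placeholders rather than actual Mohawk morphemes, and the checker covers only the single dependency stated in the text: a perfective aspect suffix requires a Factual, Future, or Irrealis prepronominal prefix.

```python
# Slot order as in Figure 12.1 (names are this sketch's own).
TEMPLATE = (
    "prepronominal_prefixes",
    "pronominal_prefix",
    "reflexive_middle",
    "noun_stem",
    "verb_root",
    "derivational_suffixes",
    "aspect_suffix",
    "tense_mood",
)

# Prepronominal categories that license the perfective, per the text.
PERFECTIVE_LICENSERS = {"factual", "future", "irrealis"}


def linearize(slots: dict[str, list[str]]) -> list[str]:
    """Return the category labels in template order."""
    return [label for slot in TEMPLATE for label in slots.get(slot, [])]


def check_verb(slots: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means no violation was found."""
    problems = [f"unknown slot: {slot}" for slot in slots if slot not in TEMPLATE]
    if "perfective" in slots.get("aspect_suffix", []):
        prepronominals = set(slots.get("prepronominal_prefixes", []))
        if not prepronominals & PERFECTIVE_LICENSERS:
            problems.append(
                "perfective aspect needs a Factual, Future, or Irrealis prefix"
            )
    return problems


verb = {
    "aspect_suffix": ["perfective"],
    "verb_root": ["root"],
    "pronominal_prefix": ["agent"],
    "prepronominal_prefixes": ["future"],
}
print(linearize(verb))   # ['future', 'agent', 'root', 'perfective']
print(check_verb(verb))  # []
```

The dependency links the first and next-to-last positions of the word, which is what makes it discontinuous: the check cannot be made by looking at adjacent slots alone.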

12.5.1 Inflection

All verbs must contain a verb stem, an inflectional pronominal prefix identifying the core arguments of the clause, and an inflectional aspect suffix. There are three sets of pronominal prefixes: grammatical Agents, grammatical Patients, and transitives, which are Agent>Patient combinations. A transitive prefix can be seen in (20). (20)

Mohawk transitive pronominal preﬁx: Ima Johnson, speaker Taionkhí:ion’ kítkit. ta-ionkhii-on-’ kitkit .-.>1-give- chicken ‘They gave us chickens.’

An assumption that still sometimes appears in the literature is that speakers create inflection by rule, because no one could ever remember so many forms. For Mohawk, the matter is not so simple. Even excellent speakers have differential control over pronominal prefix–root combinations. Some combinations simply occur more often than others: first person singulars are very frequent, for example, while masculine duals are less so. Verb stems beginning with a are very frequent, in good part because the middle voice prefix, which occurs at the beginning of stems, has the shapes -at-/-ate-/-aten-/-an-/-ar-, while those beginning with the vowel i are relatively rare. Under elicitation, speakers hesitate more with rarer forms: rarer pronominal prefixes, rarer phonological contexts, rarer full words. This does not mean of course that they cannot create new forms by analogy.

12.5.2 Derivation

As seen above, the verb allows for morphological expression of a number of distinctions. Skilled Mohawk speakers tend to exploit these more than less Mohawk-dominant speakers. An example of this precision can be seen in (21). The fluent speaker cited above continued her account of the chicken adventure in the course of a conversation with friends. She and her husband bought some chickens and built a chicken coop. They enjoyed hearing the rooster crow in the morning. But one morning the chickens were screaming more than usual. Her husband suggested that something must be after them, and the couple went to look. When the husband peered through a hole in the wall, he saw that something had gotten ahold of one of the chickens, and it was screaming. She suggested he get his gun. It was a weasel. It had already bitten the chicken on the leg. Her husband took aim and shot. The weasel looked around, wondering what had happened. As the wife continued her story, each time she mentioned an event that had happened before, she included the Repetitive prefix sa- ‘again’ on the verb, whether or not there was a separate particle á:re’ ‘again’. (21)

Mohawk Repetitive prefix sa-: Ima Johnson, speaker
Ó:nen á:re’ ne kwáh sahaié:na’ thi: kítkit.
onen are’ ne kwah sa-ha-iena-’ thiken kitkit
now again the just .-..-grab- that chicken
now again the just he re-grabbed that chicken
‘Still (again) he grabbed onto the chicken.

Ó:nen á:re’ nakwáh taonsaiohén:rehte’ thi: kítkit.
onen are’ nakwah t-a-onsa-io-henreht-e’ thiken kitkit
now again very.much ---.-yell- that chicken
now again very much did it re-yell that chicken
The chicken really screamed.

Ó:nen á:re’ sahate’sennón:ni’ ne rikstèn:ha.
onen are’ sa-ha-ate-’sennonni-’ ne ri-ksten=ha
now again .-..--aim- the 1>.-be.old=mdim
now again he re-aimed the I have him as old man
(Again) my husband took aim.

Thi:, . . weasel nen kwáh taonsahatkahtónnion’ ne á:re’.
thiken onen kwah t-a-onsa-ha-at-kaht-onnion-’ ne are’
that weasel then just ---..--look-- the again
just he re-looked around the again
That weasel just looked around (again)

Nok á:re’ taonsahatekhwá:ko’
ne ok are’ t-a-onsa-ha-ate-khw-ako-’
the too again ---..--meal-take-
and again he re-bite-took
And then he took (another) bite

thi: kitkit ne kahsinà:ke.
thiken kitkit ne ka-hsin-a’ke
that chicken the ..-leg-place
that chicken its leg place
out of the chicken on its leg.’

12.5.3 Noun incorporation

Noun incorporation, the compounding of a noun stem with a verb stem to form a new verb stem, is pervasive in Mohawk, but it is a word-formation device. Speakers generally know which forms are part of the lexicon of the language and which could be but are not. Awareness of neologisms depends on the productivity of individual noun and verb stems. Some noun stems are never incorporated, some are sometimes incorporated, some are often incorporated, and some occur only incorporated. Similarly, some verb stems never occur with an incorporated noun, some occur in a few combinations with nouns, and some in many. New combinations with less productive stems are more often noticed than those involving highly productive ones.

Often the language provides alternatives for packaging information: a noun may occur as an independent word or incorporated into a verb. The density of incorporation for discourse purposes generally varies across speakers with the degree of language use. Examples of noun incorporation can be seen in (22), part of a conversation between a grandmother and her granddaughter as they were making meat pies. The grandmother was a highly skilled speaker, who learned English only after she went to school. The granddaughter heard Mohawk as a child, but spent most of her daily life in English. (The entire conversation was in Mohawk, but just the free translation is given for the first few lines to provide context.) (22)

Noun incorporation: Grandmother and granddaughter GM: ‘Go get the wooden bowl.’ GD: ‘Wooden bowl?’ GM: ‘Wooden bowl.’ GD: ‘Wha– GM: ‘You’ll use it to put the ﬂour in.’ GM: Othè:sera’ ostòn:ha, sok, kén:ie’. ﬂour a little then fat ‘A little ﬂour, and then, fat. Tánon’, um, And, um, né: ní: ke-rákw-as that the=1 1.-choose- the I I prefer n=en-ke-wist-á:wen-ht-e’. the=-1.-fat-liquid-- I will fat melt and I myself prefer to melt the fat.’

The grandmother first introduced the fat with the independent noun kén:ie’. Once it was an established referent, she incorporated it: enke-wist-á:wenhte’ ‘I will fat melt’. (Incorporated noun stems are not always the same as their independent counterparts.) The combination ‘fat-melt’ is of course a common one. The conversation continued. (23)

Noun incorporation: Grandmother and granddaughter GD: To: ní:kon? ‘How much?’ GM:

En, o: ní:se’ enhsanónhton’ en o: ne=ise’ en-hs-anonhton-’ ah oh the=2 -2.-think- ah oh you you will think ne: tho: ní:ioht ne: tho: ni-io-ht that there -.-be.so it is there so it is ‘Ah, you’ll decide tsi ní:kon – enhsena’tarón:ni’. tsi ni-k-on en-hse-na’tar-onni-’ how --be.amount -2.-baked.goods-make- how so it amounts you will baked.goods make according to how many pies you’re making.’

At this point the pies were well-established referents, active in the consciousness of the speakers, so it is no surprise that the noun stem -na’tar- ‘baked goods’ was incorporated. There was little need to highlight it. The verb ‘make’ is what could be called a ‘light verb’, not adding highly complex, new information. It is one that frequently incorporates, and the combination ‘baked.goods-make’ = ‘bake’ is a common one. As the conversation continued, the granddaughter introduced referents with independent nouns, and the grandmother picked them up with incorporated nouns. (24)

Noun incorporation: Grandmother and granddaughter GD: Tánon’, o’wà:ron’, tánon’ ohnennà:ta’? and meat and potato ‘And meat and potatoes?’ GM: En, tsi nikarì:wes ki: sarhá:re’ sok– ah as so it is matter long this you are waiting then ‘Ah, while you’re waiting, enhshennà:ton’, tánon’ teka’wahraríhton. en-hs-henna’t-on-’, tanon’ te-ka-’wahr-a-ri-ht-on -2.-potato-cook- and --meat--cooked-- you will potato cook and it is meat cooking then you’ll cook the potatoes, and the meat is cooking.’

During this conversation, the grandmother talked more than the granddaughter, with ﬁve times as many words (tokens) overall. But as in Central Pomo, perhaps surprisingly, their average morpheme per word ratios were nearly identical: the grandmother’s speech averaged 2.4 morphemes per word, and the granddaughter’s 2.3. As in the Central Pomo conversations, the skilled speaker, the grandmother, used many more discourse particles, which are monomorphemic. The granddaughter used some highly lexicalized polymorphemic words, words that she clearly selected as familiar chunks, and fewer particles. The speech of the two was overall quite different, in many of the same ways as in Central Pomo. Skilled speakers like the grandmother here spend more of their time in the language and simply know more words and more constructions. They have more lexical items to choose from, including verb stems with incorporated nouns, and more choices among constructions for shaping the ﬂow of information.

12.5.4 Processing

The crucial role of lexicalization in processing can be seen in interactions among speakers of different dialects. There are six Mohawk communities, distributed across Ontario, New York State, and Quebec. These are, from west to east, Ohswé:ken, Wáhta’, Tehaientané:ken, Ahkwesáhsne, Kanehsatà:ke, and Kahnawà:ke. Phonological differences among the dialects are relatively minor. Where speakers in the west pronounce the affricate written ts as an alveopalatal before a high front vowel or palatal glide, those in the east pronounce it as alveolar. Where some speakers pronounce r as a retroflex flap, others pronounce it as a lateral [l]. Where speakers in the west continue the pronunciation of original *ty and *ky, those at Ahkwesáhsne pronounce both as velar, and those at Kanehsatà:ke pronounce both as alveopalatal. Morphology is constant across the dialects. The morphological templates are the same, as are the inventories of prefixes, roots, and suffixes. Principles of syntax are also the same. Constituent order is purely pragmatically based.

Quite surprisingly, when a recording of an excellent speaker from Ohswé:ken’, the westernmost community, was played for skilled speakers in Kahnawà:ke, the easternmost community, they had difficulty understanding him. The barrier was not the individual morphemes, nor their patterns of combination, which are essentially the same in all of the dialects, but vocabulary, the preformed chunks. Over the past several centuries since their separation, different lexical items have developed in the different communities. These Kahnawà:ke speakers were not processing his speech morpheme by morpheme, but word by word.

12.5.5 Native acquisition

An intriguing issue for the acquisition of languages with complex morphology is how children first break into the system. They never hear verb roots or stems in isolation; in fact Mohawk speakers themselves cannot isolate roots or stems (unless they become linguists). Over the past several decades, there have been relatively few children learning Mohawk as a first language (though that pattern is beginning to change), so large-scale studies of acquisition have not been possible. Some principles have emerged, however, from observation of a few children acquiring the language (Mithun 1989). The first is that the earliest stages of acquisition are phonologically based. Children first extract the stressed syllable of words. This choice is actually useful. Stress basically falls on the penultimate syllable, the second from the end (though certain epenthetic vowels are passed over). The stressed syllable often coincides with the root or part of it, so the children can often get their message across. Progress remains phonologically based for a time: the child first adds the ultimate syllable, producing two-syllable words, then the antepenult, etc. An example of adult/child interaction can be seen in (25). (25)

Child Mohawk: Adult and child, 2;2
Adult        Child
Wa’kéta’.    Kéta’.     ‘I’m putting them in.’

Some later child versions of words are in (26). (26)

Child Mohawk Adult Child osahè:ta’ ahe:ta’ ohiákeri iákeri tehotskà:hon otskà:hon

‘beans’ ‘fruit juice’ ‘he’s eating’

What is at ﬁrst astonishing about the Mohawk of young children is what appears to be their allomorphic skill. The masculine singular agent pronominal preﬁx ‘he’, for example, has the form ra- word-initially and -ha- word-internally, except that it is basically -hr- after a stressed vowel or before the vowels o, on, e, or en. When the following stem-initial vowel is i, this vowel merges with the a of the pronominal preﬁx to the nasalized vowel en, ([ᴧ̨]), yielding allomorphs ren-/-hen-/-hren-. (This fusion is characteristic of some other pronominal preﬁxes, but not a general process throughout the language.) Otherwise, the ﬁnal a of the pronominal preﬁx is lost before another vowel. Another phonological process involves coda h in stressed syllables: the laryngeal produces a distinctive high-fall pitch contour (indicated orthographically with a grave accent) on that syllable,


then disappears, leaving vowel length. The masculine singular agent prefix thus has the forms -hra-, ra-, -ha-, -hr-, r-, -hren-, -hen-, and ren-. And yet, children never seem to make mistakes! At age 2 years and 10 months, one child easily asked Ka’ wà:re’? ‘Where’s he going?’, never tripping over those complex phonological processes.

(27) Mohawk phonology
     Ka’     wà:re’?
     ka’     wa-hra-e-’
     where   -..-go-
     ‘Where is he going?’

Of course this is no surprise. The child knew the full question as a chunk; he did not manufacture the word from underlying forms of morphemes, then apply multiple phonological processes to arrive at a surface form. The second person singular agent pronominal prefix ‘you’ is basically s- word-initially, -hs- word-internally, with epenthetic -e- before stems beginning in n, r, or w and certain consonant clusters. The basic form of the perfective aspect suffix is glottal stop ’, with epenthetic -e- after consonants. As noted above, stress is penultimate, with stressed vowels lengthened in open syllables, but epenthetic vowels do not enter into the determination of stress. The child cited above in (27) similarly came out with the exclamation in (28) below easily and perfectly, despite the complexity of the processes that would go into building it from underlying forms then applying a sequence of phonological rules.

(28) Mohawk phonology
     Sótsi   enhserá:kewe’!
     sotsi   en-hs-rakew-’
     too     -2.-wipe-
     ‘You’re going to erase too much!’

Of course the child learners did not emerge instantaneously with Mohawk equivalent to that of adults. About the time they were producing three-syllable words, they began to discover morphology, usually with a few more frequent pronominal prefixes. (These immediately precede the verb stem.) From this point on, acquisition was governed more by morphology than phonology. As seen earlier, Mohawk speakers generally specify the direction of directed motion, with a Translocative prefix i-/ie-/ia-/ia’-/iaha- ‘thither’ or a Cislocative prefix t-/te-/ta-/-onta-/-onte-/-ont- ‘hither’. A Translocative prefix was seen earlier in (19) in the verb i-a’akiate’serehtínion’t ‘we pulled in there’. At 2 years and 10 months, the child cited in (27) and (28) generally omitted the directional prefixes.

(29) Mohawk direction
     Child         Adult version
     enháhawe’     i-enháhawe’     ‘he will take it’
     waháhawe’     i-aháhawe’      ‘he took it’

(The initial w of the factual prefix regularly disappears following the Translocative.) Negation is expressed in Mohawk, as in many languages, with a combination of markers: the particle iáh plus an initial Negative prepronominal prefix te’- or Contrastive prepronominal prefix th-/tha-/tha’-. This child used the analytic marker iáh alone at this age.

(30) Mohawk negation
     Iáh   thí:ken   rón:kwe    ì:raks.
     iah   thiken    r-onkwe    i-hr-ak-s
     not   that      .-person   -..-eat-
     ‘That man doesn’t eat it.’

The adult version would include a negative prepronominal preﬁx on the verb: te’-hr-ak-s > tè:raks. (Mohawk verbs must contain at least two syllables. If a verb would otherwise be monosyllabic, a prothetic vowel i- is added at the beginning, which bears stress.) Overall, children learning Mohawk apparently ﬁrst build vocabulary within phonological length limitations, then begin to abstract morphological distinctions. The fact that they so rarely make allomorphic errors suggests that they are not in fact producing language by assembling underlying forms then applying sequences of phonological rules. This accords well with the ﬁndings of Tomasello (2006 and elsewhere) on acquisition: Children’s earliest acquisitions are concrete pieces of language—words, complex expressions, or mixed constructions—because particularly early in development they do not possess fully abstract categories and schemas. Children construct these abstractions only gradually and in piecemeal fashion. The strategies observed in children learning Mohawk as a ﬁrst language differ interestingly from those seen in adult second-language learners. In several of the Mohawk communities, an extraordinary generation of young adults are developing an impressive competence in the language. They are becoming ﬂuent, something that would have been considered an impossible dream only a short time ago. These second-language learners show brilliant mastery of the complex morphology, certainly making allomorphic mistakes along the way, but exquisitely tuned in to the complexities involved. First-language speakers are delighted to see their accomplishments, though, interestingly, they observe uniformly that these second-language speakers continually create words that do not exist in the language.


12.6 Implications for our models of morphology Work by Blevins (2006, 2013, 2016a, 2016b), Pirrelli et al. (2015), and others draws a distinction between Constructive and Abstractive models of morphology. In Constructive frameworks, surface word forms are described as built up from subword units, either in terms of substance or rules. In Abstractive frameworks, the basic units of the grammatical system are surface word forms. Roots, stems, and exponents are understood as abstractions over a lexicon of word forms. Constructive perspectives underlie efficient linguistic descriptions, the kinds of descriptions that are useful for both linguists and adult second-language learners. They also fit well with what is seen in adult acquisition of Mohawk as a second language, in particular allomorphy mistakes and the overgeneration of derived forms. Such descriptions also provide measures of objective complexity in the sense described by Dahl and others cited earlier. Abstractive perspectives are word based, though it is recognized that words can be internally structured into recognizable constituent parts. Constituent parts are analysed as emergent from independent principles of lexical organization, whereby full lexical forms are redundantly stored and mutually related through entailment relations (Matthews 1991; Corbett & Fraser 1993; Pirrelli 2000; Burzio 2004; Booij 2010; all cited in Pirrelli et al. 2015: 142). It is significant that the processing of a given form may be facilitated or inhibited by other, related forms. This makes sense only if the related forms are available as elements of a speaker’s mental lexicon (Taft 1979; Baayen et al. 1997; Schreuder & Baayen 1997; Hay 2001; de Jong 2002; Moscoso del Prado Martin 2003, cited in Blevins 2006; Blevins 2006: 535). Abstractive models accord well with differences between highly fluent first-language speakers of Central Pomo and Mohawk on the one hand, and English-dominant first-language speakers on the other.
One of the most salient differences is that while less ﬂuent speakers do use highly synthetic words if they are very frequent or primed, they have a smaller inventory of choices. Their more limited lexical inventories can result in some inappropriate lexical selections, both inﬂected and derived, and fewer options for shaping information ﬂow. Abstractive models also accord well with the strong sense among both Central Pomo and Mohawk speakers of whether a possible word exists and exactly when it is used. They are in line with the problems even skilled speakers sometimes face in attempting to process speech from other dialects. They would predict the variable ability of speakers to isolate morphemes which never occur on their own as independent words, the existence of discontinuous dependencies, and speakers’ differential facility in producing inﬂectional paradigms. Speakers can certainly extend patterns of inﬂection by analogy on occasion, but rarer forms and combinations present greater challenges. Abstractive perspectives also accord with the


phonologically-based ﬁrst-language acquisition strategies of Mohawk by young children, and the rarity of allomorphy mistakes. Learners of that age face few memory hurdles that would hinder the acquisition of large numbers of new lexical items. In the end, what is complex for the analyst is not necessarily complex for the speaker or for the learner. Speakers of Central Pomo and Mohawk store most polymorphemic words, or at least stems, as chunks. Allomorphic alternations do not present serious difﬁculties when they are embedded in the chunks, a fact that is easily observed in the absence of mistakes in frequent forms, but also in the challenges presented by rare or novel combinations. Templatic structure may be unmotivated for the analyst and thus viewed as additional complexity, but, importantly, the routinization of structure they represent can result in fewer decisions on the part of speakers. It can also facilitate the acquisition of new lexical items, items which easily ﬁt into an existing pattern. Do the differences matter? The various types of complexity are all useful, but for different purposes, and for that reason, it is important to recognize them. If our goal is to delineate what is a possible language, we want to think about possible for whom. Language is full of patterns, some no longer productive. As analysts we care about all of them: they allow us to understand the otherwise arbitrary. Speakers inherit the products of past patterns and happily use some without abstracting over them. And it is learners and speakers who shape the language according to their own knowledge.


IV

DISCUSSION


13 Morphological complexity and the minimum description length approach Östen Dahl

13.1 Introduction Within the study of linguistic complexity, morphological complexity has a special place due to the fact that morphology is the part of language where differences in complexity between languages are most apparent. Morphological complexity also seems to lend itself fairly easily to quantiﬁcation. It is therefore natural that it should attract the attention of linguists. The chapters in this volume show a variety of approaches to morphological complexity which sometimes differ quite considerably in the conceptual apparatus applied. In this concluding chapter, rather than try to review each contribution separately, I will focus on some of the basic concepts used by the authors. Sometimes this will demand going beyond the contributions to the volume. I will start out by presenting brieﬂy what I will call ‘the minimum description length approach’ to complexity and then try to see how other concepts of complexity applied in the chapters of the volume relate to it.

13.2 The minimum description length approach to complexity I will take as my point of departure the idea that the complexity of an entity can be understood as the amount of information needed to recreate or specify it—which in most cases can be identiﬁed with the length of the shortest possible complete description of it. This is often referred to as ‘Kolmogorov complexity’ or ‘algorithmic information content’ and has its most natural application when applied to strings (of symbols or characters): the Kolmogorov complexity of a string is the inverse of its compressibility. Kolmogorov complexity is behind the ‘minimum description length (MDL) principle’ which is said to build on the insight that ‘any regularity in the data can be used to compress the data’ (Grünwald 2007), leading to the conclusion that ﬁnding the best hypothesis for a given set of data means ﬁnding the optimal way to compress it. As in Dahl (2004), I will here use the term ‘pattern’ rather than ‘regularity’, following Goertzel (1994) and Shalizi (2001).
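The link between pattern and compressibility can be illustrated with a small sketch. Kolmogorov complexity itself is not computable, so the example substitutes the output size of a general-purpose compressor (zlib) as a rough upper bound on description length. The function name `description_length` and the two test strings are assumptions made for illustration only; they are not taken from the MDL literature cited above.

```python
import random
import zlib


def description_length(text: str) -> int:
    """Size in bytes of a zlib-compressed encoding of `text`.

    This is only a crude, compressor-dependent stand-in for the length of
    the shortest complete description of the string.
    """
    return len(zlib.compress(text.encode("utf-8"), 9))


# A highly patterned string: one short unit repeated many times.
patterned = "ab" * 500

# A string of the same length over the same alphabet, drawn pseudo-randomly.
rng = random.Random(0)  # fixed seed so that the example is reproducible
patternless = "".join(rng.choice("ab") for _ in range(1000))

print(len(patterned), description_length(patterned))
print(len(patternless), description_length(patternless))
```

Both strings have the same literal length, but the patterned one can be stated far more briefly, which is the sense in which a pattern is a way of compressing the data.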

Östen Dahl, Morphological complexity and the minimum description length approach In: The Complexities of Morphology. Edited by: Peter Arkadiev and Francesco Gardani, Oxford University Press (2020). © Östen Dahl. DOI: 10.1093/oso/9780198861287.003.0013


The minimum description length principle is sometimes said to be a version of Occam’s Razor, but it might equally well be called ‘Pāṇini’s razor’, given that he and other Indian linguists in the first millennium BCE honoured a principle later formulated as ‘Grammarians rejoice over the saving of half a short vowel as much as over the birth of a son’,¹ which more directly addresses the issue of description length. In modern linguistics, similar ideas have been discussed in terms of ‘descriptive economy’ or ‘parsimony’. But ‘minimum description length’ has been explicitly addressed in computational approaches to morphology, the most cited example being Goldsmith (2001). I think the notion of minimum description length can also be helpful in understanding some notoriously difficult concepts in linguistics. I will take suppletion as an example. Corbett (2009) characterizes suppletion as ‘an outer limit of inflection, the extreme of markedness and complexity’ but also approvingly quotes the following statement from Mel’čuk (1994: 358), which does not refer to complexity, as ‘a good definition of suppletion’: ‘For the signs X and Y to be suppletive their semantic correlation should be maximally regular, while their formal correlation is maximally irregular.’ But what is interesting here is rather the explication in the cited work of Mel’čuk of what he means by ‘maximally irregular’, or as he says elsewhere, ‘minimally regular’. For his ‘rigorous definition’ of suppletion, Mel’čuk introduces the auxiliary notion of co-representability. Two units are co-representable if they can be derived from each other or from a common source by rules of the language, and the condition on maximal irregularity of form means that the signifiers of the units are not co-representable. No particular conditions such as productivity or generality are put on the rules: ‘[t]he only factor that counts for there to be regularity is the presence of  rules’.
In a minimum description length approach, Mel’čuk’s account can be interpreted as implying that suppletion involves the absence of a pattern or regularity—a way of representing the data in a shorter way than by rendering it literally. Thus, a suppletive form would have to be listed in the description of the language. Notice that this excludes what is not explicitly precluded in Mel’čuk’s account—a rule which applies to one form only. The point is that introducing such a rule would normally involve an increase in description length that would offset what is gained by shortening the speciﬁcation of the suppletive form.
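A toy calculation may make the point about single-form rules concrete. It assumes, purely for illustration, that description length can be approximated by counting characters, and it uses a made-up rule notation (`PAST=STEM+ed`, `PAST(go)=went`). Neither the notation nor the cost measure comes from Mel’čuk or from the MDL literature.

```python
def length(items):
    """Crude description length: total characters over all strings (an assumption)."""
    return sum(len(s) for s in items)


# Stems are taken to be in the lexicon already, so only the past forms are at stake.
stems = ["walk", "talk", "jump", "play", "open"]
past_forms = [stem + "ed" for stem in stems]

# Regular pattern: listing every past form versus stating one general rule once.
cost_listing_regular = length(past_forms)
cost_general_rule = length(["PAST=STEM+ed"])      # hypothetical rule notation

# Suppletive form: listing 'went' versus a rule that applies to 'go' only.
cost_listing_suppletive = length(["went"])
cost_one_off_rule = length(["PAST(go)=went"])     # hypothetical rule notation

print("general rule shorter than listing:", cost_general_rule < cost_listing_regular)
print("one-off rule longer than listing:", cost_one_off_rule > cost_listing_suppletive)
```

Under these assumptions the general rule pays for itself because it is stated once and reused, whereas the one-form rule has to mention both the lexeme and the output and so cannot be shorter than the listed form.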

¹ The maxim (Sanskrit ardhamātrā lāghavena putrotsavaṃ manyante vaiyākaraṇāḥ) is often quoted in the literature without a source. There is no known formulation of it from classical times. In the form cited here, it derives from the treatise Paribhāṣenduśekhara by the nineteenth-century Indian scholar Nagēśa or Nāgojībhaṭṭa, which was translated into English by the German Indologist Franz Kielhorn (Kielhorn 1871). Incidentally, Occam’s Razor in its commonly cited form (entia non sunt multiplicanda praeter necessitatem) is not found in the writings of William of Ockham but derives from the seventeenth-century Irish philosopher John Punch.


13.3 The organization of morphology When describing a set of objects, the most parsimonious way is often to separate the information about their general properties from the information that is speciﬁc to each member of the set. Descriptions of languages are traditionally divided into ‘grammar’ and ‘lexicon’. So let’s see what that implies for morphology. We can see the goal of the morphological component of a grammar—or ‘morphology’ for short—as a tool to generate the set of all word forms, organized in paradigms, in a language from a lexicon. Another way of putting this in the spirit of the MDL principle is to regard the morphological component as a way of compressing the set of paradigms. The morphology and the lexicon together constitute the description of the word forms. The lexicon will consist of a set of entries, which I shall call ‘lexical speciﬁcations’, containing the information needed by the morphology to generate one particular paradigm, that is, on the one hand, one or more basic forms or principal parts; on the other, membership in inﬂection classes, genders, etc. I shall here assume that the lexicon contains no other information. The total length of the morphology and the lexicon is thus indicative of the complexity of the paradigms. But in speaking of morphological complexity we have to sort out a few different components in this. Primarily, the morphological complexity of a language would be the complexity of the morphological component in the sense of the system that relates the lexicon with the set of paradigms. To start with, although I have been speaking of a set of word forms and a set of paradigms as if those things were equal, the difference between them is crucial. Think of the paradigm as a table. 
Since there is a number of ways any given set of word forms can be organized into a table, and the choice between them is signiﬁcant, it follows that there is information hidden in the organization of the paradigm and consequently the paradigm is more complex than the set of word forms. Furthermore, the paradigms belonging to lexical items of one part of speech usually share a common structure. But this structure can be studied independently of the system that relates paradigms and lexical speciﬁcations. So paradigm organization can be seen as a component of its own. Another problem is to what extent the lexicon is relevant to the question of morphological complexity. On the one hand, to the extent that the morphological component does not treat all lexical items equally, the lexicon will have to contain information that makes that possible. On the other hand, if items are added to or removed from the lexicon, the total length of the lexicon will change—and it seems counter-intuitive that these changes should always inﬂuence the morphological complexity of a language. For this reason, it is rather the information contained in the individual lexical speciﬁcations that is of interest.


As I said, to assess the complexity of a set of paradigms, we would have to consider both the complexity of the lexicon and the complexity of the morphology—both separately and taken together. As noted in Sagot & Walther (2011; quoted by Parker & Sims, Chapter 2, this volume), it may sometimes be possible to obtain a shorter total description length by changing the division of labor of the two components. It is possible to streamline the general picture somewhat. Instead of thinking of the output of morphology as a set of paradigms, we can think of the morphology as generating a set of annotated word forms—that is, each form comes with a speciﬁcation of its grammatical features. This way, the system becomes symmetric—we can speak of input and output speciﬁcations and a set of rules that relate them. The input and output speciﬁcations have speciﬁed formats and contain terms taken from speciﬁc vocabularies: labels of inﬂectional classes and values of inﬂectional features, respectively. The sizes or lengths of the speciﬁcation formats and vocabularies are part of the overall complexity of the linguistic system, but they also inﬂuence the complexity of the rules of the morphological component. What I have just said illustrates that it is not always clear how to draw the boundaries of morphological complexity. In general, speaking of the complexity of a component of the description of a language in isolation easily becomes somewhat artiﬁcial, in my opinion even on a modular view of language structure. I will return to this question below.
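The architecture described in this section, with lexical specifications as input, annotated word forms as output, and a set of rules relating them, can be sketched as follows. Everything in the example is hypothetical: the stems, the class labels `I` and `II`, the feature values, the suffixes, and the character-count measure of length are all invented for illustration.

```python
# Hypothetical rules of the morphology: (inflection class, feature value) -> suffix.
RULES = {
    ("I", "sg"): "a",
    ("I", "pl"): "i",
    ("II", "sg"): "o",
    ("II", "pl"): "i",
}

# Hypothetical lexical specifications: one basic form plus an inflection class label.
LEXICON = [
    {"stem": "tab", "cls": "I"},
    {"stem": "mur", "cls": "II"},
    {"stem": "lun", "cls": "I"},
]


def generate(lexicon, rules):
    """Return annotated word forms as (form, stem, feature value) triples."""
    forms = []
    for entry in lexicon:
        for (cls, feature), suffix in rules.items():
            if cls == entry["cls"]:
                forms.append((entry["stem"] + suffix, entry["stem"], feature))
    return forms


def size(strings):
    """Character count, used here as a stand-in for description length."""
    return sum(len(s) for s in strings)


forms = generate(LEXICON, RULES)
lexicon_length = size(e["stem"] + e["cls"] for e in LEXICON)
morphology_length = size(cls + feat + suf for (cls, feat), suf in RULES.items())
listing_length = size(form + feat for form, _, feat in forms)

print(lexicon_length + morphology_length, listing_length)

# Adding a lexeme lengthens the lexicon but leaves the rules untouched.
bigger = LEXICON + [{"stem": "vent", "cls": "II"}]
print(len(generate(bigger, RULES)))
```

In this sketch the length of the rule set does not change when an entry is added, which is in line with the suggestion that it is the information in the individual lexical specifications, rather than the overall size of the lexicon, that bears on morphological complexity.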

13.4 Notions of complexity represented in the volume As noted above, the chapters in the volume differ in the notions of complexity that are invoked. But they also differ in the extent to which they place these notions within explicit frameworks. The minimum description length approach to complexity is mentioned in the chapters by Di Garbo, Chapter 9; Loporcaro, Chapter 6; Mithun, Chapter 12; and Nichols, Chapter 7. But more salient in the volume is the approach of Ackerman & Malouf (2013). Several chapters (Henri et al., Chapter 5; Parker and Sims, Chapter 2; Mansﬁeld and Nordlinger, Chapter 3; and Meakins and Wilmoth, Chapter 4) draw on their distinction between two ‘dimensions in the analysis of morphological complexity’, viz. ‘enumerative complexity’ or ‘E-complexity’ and ‘integrative complexity’ or ‘I-complexity’. This motivates discussing these concepts in some detail, which I will do below. A superﬁcially somewhat similar dichotomy is that made by Nichols between ‘inventory complexity (IC)’ and ‘canonical complexity (CC)’, but while ‘IC’ and ‘enumerative complexity’ are fairly closely related, the second members of the pairs bear little resemblance to each other. (There is a potential source of


confusion in that Nichols’s ‘IC’ is closer to Ackerman & Malouf’s ‘E-complexity’ than to their ‘I-complexity’.) Thus, Nichols’s ‘CC’ deserves a discussion of its own. Mithun (Chapter 12, this volume) cites both minimum descriptive length complexity and the distinction between ‘Constructive’ and ‘Abstractive’ models of morphology. Berdicevskis & Semenuks (Chapter 11, this volume) identify ‘irregularity’ and ‘overspeciﬁcation’ as the two ‘facets of complexity’ they want to focus on. Tallman & Epps (Chapter 9, this volume) rely on the taxonomy of Anderson (2015a), with ‘system complexity’ and ‘exponence complexity’ as the top level categories.

13.5 Compositional complexity In the introduction to Miestamo et al. (2008), the volume editors apply the analysis of the notion of complexity in Rescher (1998) to linguistic complexity. For Rescher, description length (in his terms, ‘descriptive complexity’), is just one of several ‘modes of complexity’. Another is ‘compositional complexity’, which relates to the constituent elements of a system and is subdivided into two submodes: ‘constitutional complexity’—the number of elements, and ‘taxonomic complexity’—their variety. Miestamo et al. (2008: viii) exemplify the former with the number of ‘phonemes, inﬂectional morphemes, derivational morphemes, lexemes’, and the latter with the variety of ‘phoneme types, secondary articulations, parts-of-speech, tense-mood-aspect categories, phrase types’, etc. Although there are no references to Rescher’s taxonomy (but see the editors’ Introduction, Chapter 1), notions close to ‘constitutional complexity’ show up in a number of ways in the chapters of the volume, notably as one of the poles of the dichotomies of Nichols and Ackerman & Malouf. Nichols’s ‘IC’ is based on ‘assessing the number of elements in an inventory or values in a system’, exempliﬁed by ‘the number of phonemes, genders, tenses, derivation types, alignments, word orders’. She identiﬁes it with Miestamo et al.’s (and thus indirectly Rescher’s) notion of ‘taxonomic complexity’. It may be noted that some of the items in her list seem rather to belong to ‘constitutional complexity’ in Rescher’s schema, illustrating that the borderline is somewhat fuzzy. Nichols also quotes the term ‘resources’ from Dahl (2004) in this context, which is slightly problematic. In my book, I opposed ‘resources’ and ‘regulations’, saying that intuitively, ‘resources determine what is possible or permitted, regulations what is obligatory’, and noting that ‘the distinction is reminiscent of that between grammar and lexicon but does not coincide with it’ (Dahl 2004: 41). 
The basic idea was that resources are things that one can more or less freely choose from. The primary examples are lexical items. As the quotation suggests, I did not primarily think of the notion as applying to grammar. Many of the phenomena Nichols


enumerates are not freely chosen by speakers but rather show up as a consequence of forced choices due to what I called regulations. Later in the book (Dahl 2004: 42) I say that if one wants to characterize a language with respect to its ‘resources’, the parameter that comes first to mind is ‘richness’. Dressler (2011), who is quoted by Loporcaro (Chapter 6, this volume), also uses this term, characterizing the size of paradigms as a criterion of ‘richness’ rather than of ‘complexity’. However, Dressler defines ‘richness’ as ‘the amount of productive morphological patterns’, associating complexity with unproductive patterns, so his notion is different from mine (and apparently also from Nichols’s ‘IC’). Let me now turn to Ackerman & Malouf’s notion of E-complexity. It is not quite clear what is supposed to go into it. The abstract says that E-complexity ‘reflects the number of morphosyntactic distinctions that languages make and the strategies employed to encode them, concerning either the internal composition of words or the arrangement of classes of words into inflection classes’ (Ackerman & Malouf 2013: 429). The definition in the main text (2013: 433) is formulated in a somewhat roundabout way. The authors first note that ‘descriptive linguists often comprehensively catalogue the array of morphological markers and patterns in a given language or languages’, making possible on the one hand typological investigations of the types of information encoded in words and taxonomies of formal strategies for encoding this information, on the other, inferences by theoretical linguists about the bounds on possible word structures in natural languages.
‘We refer to patterns found via this general cataloguing of properties and their surface exponence for words in all of their variety as the enumerative complexity or E-complexity of a morphological system.’ What is unclear here is whether E-complexity is basically a count of distinctions and patterns/strategies or something more. Later formulations in the paper do not really solve this problem. On p. 434, we learn that ‘[on]e salient dimension of E-complexity is the number and nature of inﬂection classes in a language’, with the word ‘nature’ suggesting that it is not only a question of counting. On the other hand, on p. 437, it is said that paradigm-based models ‘reﬂect a measure of E-complexity’ which is speciﬁed as ‘a greater number of possible exponents, inﬂectional classes, and principal parts’. Likewise, on p. 451, ‘the same E-complexity’ is equated with ‘the same number of declensions, paradigm cells, and allomorphs’, and in a later work (Ackerman & Malouf 2016: 125), E-complexity is said to increase with ‘(i) larger numbers of morphosyntactic properties a language contains, (ii) greater numbers of allomorphic variants it uses to encode them, and (iii) more inﬂectional classes that lexemes can be distributed over’. The interpretation of enumerative complexity as being simply an inventory count is clearly the one chosen by Henri et al. (Chapter 5, this volume): ‘a linguistic phenomenon’s enumerative complexity depends on how many categories (of whatever type) it employs’ (p. 106). They seem to have the same thing in


mind when saying earlier (p. 106) that ‘[m]orphological complexity is often equated with numerousness—of morphs, categories, processes, or paradigm cells’. They also refer to Stump (2017), who is quite explicit on this point when he describes the distinction introduced by Ackerman & Malouf: ‘a linguistic phenomenon’s enumerative complexity depends on how many categories (of whatever type) it employs . . . ’. Parker & Sims (Chapter 2, this volume) also refer to enumerative complexity as ‘the number of inflection classes or the size of paradigms’. Ackerman & Malouf (2013), like the chapters in this volume that quote it, tend to give the impression that the two notions of I-complexity and E-complexity more or less exhaust the possible approaches to morphological complexity, and that earlier work has been dominated by E-complexity. Thus, Ackerman & Malouf say in a footnote (2013: 434): ‘For examples of efforts to identify and quantify E-complexity, see, for example, Juola 1998, 2007, Sampson et al. 2010, Moscoso del Prado Martín 2011.’ But the works listed here represent a variety of approaches to linguistic complexity, including MDL-based ones. And it should be clear that E-complexity cannot be identified with description length. A list of morphosyntactic categories, inflection classes, and allomorphs is not yet a morphological description of a language.
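On the inventory-count reading, enumerative complexity amounts to a handful of tallies. The sketch below shows what such a count could look like for an invented three-class system; the class names, cell labels, exponents, and the function `enumerative_counts` are all hypothetical and are not a procedure proposed by Ackerman & Malouf.

```python
# Invented paradigm templates: inflection class -> {paradigm cell: exponent}.
PARADIGMS = {
    "class1": {"sg": "-a", "pl": "-i"},
    "class2": {"sg": "-o", "pl": "-i"},
    "class3": {"sg": "-a", "pl": "-e"},
}


def enumerative_counts(paradigms):
    """Count classes, cells, and distinct exponents per cell (summed over cells).

    Assumes that every class supplies a form for every cell.
    """
    cells = sorted({cell for forms in paradigms.values() for cell in forms})
    allomorphs = {
        cell: {forms[cell] for forms in paradigms.values()} for cell in cells
    }
    return {
        "classes": len(paradigms),
        "cells": len(cells),
        "allomorphs": sum(len(variants) for variants in allomorphs.values()),
    }


print(enumerative_counts(PARADIGMS))
```

The tallies record how many items there are but not which exponent goes with which class and cell, so the paradigms cannot be regenerated from them. This is one way of seeing why such a list falls short of a description.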

13.6 Integrative complexity Minimum description length approaches to complexity can be said to represent ‘objective’ (Dahl 2004) or ‘absolute’ (Miestamo 2008) understandings of the notion in the sense that they concern properties of objects or systems that are independent of concepts such as ‘difficulty’ or ‘cost’, which imply an ‘agent-related’ (Dahl 2004) or ‘relative’ (Miestamo 2008) notion of complexity. Ultimately, we want to understand how objective measures of linguistic complexity are related to how difficult or costly different aspects of a language are for a learner or a user, but in order to do that, we have to keep objective and agent-related notions apart and not let them be conflated. When Ackerman & Malouf (2013) say that their notion of ‘integrative complexity’ ‘reflects the difficulty that a paradigmatic system poses for language users (rather than lexicographers) in information-theoretic terms’, it invites the interpretation that they are doing exactly that: conflating objective complexity and difficulty. A more charitable understanding, however, is that their goal is to find an objective measure of complexity that predicts the difficulty of a linguistic system, more specifically the uncertainty that faces a speaker when inferring an unknown word form from other forms in the same paradigm. The most important measure then becomes ‘the average uncertainty in guessing the realization of one randomly selected cell in the paradigm of a lexeme given the realization of one other randomly selected


cell’. The central result of the study is that high E-complexity of paradigmatic systems is possible as long as those systems show low I-complexity, measured as the average conditional entropy of paradigms. The definition of average conditional entropy presupposes that the set of possible realizations of the cell to be guessed is known and finite, otherwise the entropy cannot be calculated. This condition is not fulfilled when some of the possible realizations are suppletive. That is, the notion of conditional entropy cannot be applied to cases such as English go and went. It may perhaps be argued that those are precisely the situations where you have to know the paradigm in advance so guessing is not possible anyway. But it restricts the applicability of the notion to some extent. The notions of ‘conditional entropy’ and ‘average conditional entropy’, as applied to inflection templates, have some interesting mathematical properties not discussed by Ackerman & Malouf. ‘Average conditional entropy’ involves bidirectional predictability relations between cells in a paradigm template. These turn out to be ‘entangled’ in that there is an upper bound on the sum of two symmetric entropies, which has as a consequence that the average conditional entropy of a paradigm can never exceed 50% of what Ackerman & Malouf call its ‘declension entropy’, that is, the surprisal of the inflection class membership of a lexeme under the assumption that each inflection class is equally probable. I have no formal proof of this claim,² but I have tested it for all possible value combinations for sets of classes with sizes up to eight, where I had to stop due to limitations on computer capacity. Concretely, this means that in a system with eight declension classes and declension entropy equal to 3, like the Greek one exemplified in Ackerman & Malouf (2013), the average conditional entropy could not be higher than 1.5.
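A check of this kind can be sketched in a few lines. The code below is not the simulation referred to in the text: it is a minimal reconstruction under the assumptions that classes are equiprobable, that a paradigm template is a table with one row per class and one column per cell, and that only two-cell templates with at most four classes are enumerated. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations, product


def entropy(values):
    """Shannon entropy in bits of a sequence of equiprobable observations."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, given):
    """H(target | given) = H(target, given) - H(given), classes equiprobable."""
    return entropy(list(zip(target, given))) - entropy(given)


def average_conditional_entropy(table):
    """Mean of H(cell_j | cell_i) over all ordered pairs of distinct cells."""
    columns = list(zip(*table))
    pairs = list(permutations(range(len(columns)), 2))
    total = sum(conditional_entropy(columns[j], columns[i]) for i, j in pairs)
    return total / len(pairs)


def max_average(n_classes, n_cells=2):
    """Largest average conditional entropy over every possible template."""
    best = 0.0
    for flat in product(range(n_classes), repeat=n_classes * n_cells):
        table = [flat[r * n_cells:(r + 1) * n_cells] for r in range(n_classes)]
        best = max(best, average_conditional_entropy(table))
    return best


for n in (2, 3, 4):
    # Observed maximum next to half the declension entropy, log2(n) / 2.
    print(n, round(max_average(n), 6), round(0.5 * math.log2(n), 6))
```

For these small cases the observed maximum coincides with half the declension entropy. The bound is also what one would expect from standard properties of entropy: for two cells X and Y, H(X|Y) + H(Y|X) = 2H(X,Y) − H(X) − H(Y), and since H(X,Y) ≤ H(X) + H(Y) and H(X,Y) cannot exceed the logarithm of the number of classes, the sum of the two conditional entropies is at most the declension entropy.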
This fact should be taken into account when assessing the actual average conditional entropies calculated by Ackerman & Malouf, for instance when they say (p. 442) that the overall average conditional entropy for the eight Greek declensions is 0.644 bits, which is equal ‘to a choice among . . . 1.56 equally likely declensions’ or ‘slightly more than one’ declension. This is misleading in the sense that no system with two declensions could ever have an average conditional entropy higher than 0.5. Thus, if the entropy is 0.644, the system must have at least three declensions. The low values for average conditional entropy found by Ackerman & Malouf thus at least partly depend on mathematical necessity rather than on anything else.

² But consider the simplest case: a system with two inflection classes and two inflectional forms, as illustrated in Table 13.1. There are four logical possibilities in such a 2×2 matrix: (1) identity between the rows in both columns; (2) identity in column 1 and no identity in column 2; (3) no identity in column 1 but identity in column 2; (4) no identity in either column. Case (1) can be disregarded since it would mean there is really only one inflection class. The entropy is zero. In case (4), one form always gives full information about the other, so the entropy is zero. In case (2), the cells in column 1 do not say anything about the cells in column 2, so the entropy for each cell is equal to the choice between two items, that is 1 (= one bit). But since there is no choice in column 1, the entropy in the opposite direction is 0, which gives an average of 0.5. Case (3) is analogous, but with the columns swapped; the average will again be 0.5. Note further that adding a third column will not change anything for the following reason. Guessing is always from one column to another, so we are always dealing with pairs of columns, in which guessing can go in either direction. While a 2×2 matrix involves just one such pair, a 3×2 matrix with columns ABC entails three pairs of columns: AB, AC, BC. But that makes such a matrix equivalent to three 2×2 matrices, and since, as we saw, a 2×2 matrix has a maximum average guessing entropy of 0.5, the value for the 3×2 matrix is the same. Adding further columns gives an analogous result. Things get more complicated when rows are added, but my computer simulation strongly suggests that the relation between declension entropy and maximum average conditional entropy is constant.

It appears that integrative complexity, in the form of conditional entropy, primarily depends on two factors: one is the extent to which forms ‘wear their inflection class on their sleeve’, that is, are informative about their own inflectional class; the other is the extent to which the distributions of allomorphs—or, more generally, exponents—differ between forms and thus, in the words of Parker & Sims (Chapter 2, this volume), increase the ‘extent to which the system inhibits motivated inferences about the realized form of a lexeme, given one or more other realized forms of the same lexeme’. The dependence of conditional entropy on these factors means that its relationship to minimum description length complexity is not straightforward.

The first factor, the informativity of a form about its inflection class membership, means that there is an inverse relation between the diversity of forms in the predicting cells and integrative complexity. Thus, lack of overt marking, which will in general decrease description length, can actually increase integrative complexity. Consider the hypothetical noun inflection templates in Table 13.1, with the rows representing two inflectional classes. The templates can be generated by the rules beneath the table.

Table 13.1. Hypothetical noun inflection templates

(a)      sg     pl          (b)      sg     pl
1        -∅     -e          1        -a     -e
2        -∅     -i          2        -o     -i

Rules: (a) if plural then (if 1 then -e else -i) else -∅; (b) if plural then (if 1 then -e else -i) else (if 1 then -a else -o).

Thus, (b) has a greater description length than (a). However, in (b), the singular and plural markers are wholly predictable from each other, so the integrative complexity is 0. In (a), on the other hand, the plural form cannot be determined from the singular (1 bit of uncertainty), while the singular is fully determined by the plural (0 bits), which results in an average integrative complexity of 0.5, the theoretical maximum, for the whole template.

The second factor, the degree to which allomorph distributions differ, means that a high average number of allomorphs, which would presumably lead to a higher description length, does not necessarily lead to a high integrative complexity. Thus, paradoxically, the situation we saw in (b), where all cells of the paradigm are different from each other, will, irrespective of the size of the paradigm, always mean that the integrative complexity is zero. But this is not so strange if we realize that what integrative complexity really measures is the amount of discordance between the classifications of the lexicon entailed by the different columns in a paradigm.
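The figures quoted for the two templates of Table 13.1 can be reproduced with a few lines of code. The following is a minimal sketch, assuming equiprobable inflection classes and averaging over all ordered pairs of cells; the function names (`entropy`, `conditional_entropy`, `average_conditional_entropy`, `max_ace`) are hypothetical and are not taken from Ackerman & Malouf or from this chapter. The brute-force helper `max_ace` only illustrates the kind of exhaustive check described in the text for very small systems.

```python
from collections import Counter
from itertools import permutations, product
from math import log2


def entropy(values):
    """Shannon entropy in bits of a sequence of equiprobable outcomes."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, given):
    """H(target | given), computed as H(target, given) - H(given)."""
    return entropy(list(zip(target, given))) - entropy(given)


def average_conditional_entropy(template):
    """Mean of H(cell_i | cell_j) over all ordered pairs of distinct cells.

    `template` is a list of rows, one per inflection class; each row lists
    the exponents of the cells in the same order (assumed equiprobable).
    """
    columns = list(zip(*template))
    pairs = list(permutations(range(len(columns)), 2))
    total = sum(conditional_entropy(columns[i], columns[j]) for i, j in pairs)
    return total / len(pairs)


# Table 13.1: rows are classes 1 and 2, columns are sg and pl.
template_a = [("-∅", "-e"), ("-∅", "-i")]
template_b = [("-a", "-e"), ("-o", "-i")]

ace_a = average_conditional_entropy(template_a)
ace_b = average_conditional_entropy(template_b)
print(ace_a, ace_b)  # prints: 0.5 0.0


def max_ace(n_classes, n_cells=2):
    """Brute-force maximum over all templates with n_classes distinct rows."""
    best = 0.0
    symbols = range(n_classes)
    for cols in product(product(symbols, repeat=n_classes), repeat=n_cells):
        rows = list(zip(*cols))
        if len(set(rows)) < n_classes:  # identical rows count as one class
            continue
        best = max(best, average_conditional_entropy(rows))
    return best
```

Calling `max_ace(3)` enumerates every two-cell template with three distinct classes; the search space grows very quickly with the number of classes, which is consistent with the remark that the exhaustive test had to stop at eight.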

13.7 Canonical complexity and transparency

Nichols’s concept of CC builds on the notion of canonicity developed above all by Greville Corbett and his associates (see, e.g., Corbett 2007, 2013a, 2015). ‘CC’ should not be interpreted as ‘complexity in the canonical sense’, but rather, as Nichols herself admits, as a ‘less logical’ alternative to the more cumbersome ‘noncanonicity-based complexity’, perhaps also paraphraseable as ‘degree of noncanonicity’. According to Nichols, canonicity theory ‘can be used as a good approximation to descriptive complexity [i.e. minimum description length, Ö.D.] and is straightforwardly measurable and comparable’ even if it is not a complexity measure in itself. In Nichols’s words, ‘[i]t defines a logical space (for a linguistic concept or structure or system) by determining the central, or ideal, position in that space and kinds of departures from that ideal, and an element is non-canonical to the extent that it departs from the ideal’. According to Corbett (2015: 149), canonicity theory, or in his words, canonical typology, analyses and defines ‘phenomena that are subject to variability (across and within languages), extracting the various scales along which we characterize variability, and establishing the logical endpoint of these scales’, yielding theoretical spaces of possibilities, which once established can be populated with real instances. Canonical instances are those that match a full set of criteria and may therefore be infrequent or even nonexistent. This distinguishes canonicity from prototypicality, with which it is easily confused. As the following quotation (Corbett 2015: 172) makes clear, phenomena are not canonical or non-canonical tout court, but rather they are canonical or noncanonical instances of some concept:

    Just as, for instance, we say that suppletion is a noncanonical realization of morphosyntactic specification, but can then specify canonical suppletion . . . Similarly, inflection classes are themselves noncanonical, but we can go on to establish criteria for canonical inflection classes . . .

It would appear that this creates a problem for the notion of CC, since we would have to choose a concept to relate it to and also be rather cautious in doing so, because for some concepts, such as suppletion, ‘more canonical’ actually means ‘more complex’. I cannot see that this issue is addressed in an explicit fashion in Nichols’s Chapter 7, but since she says that she is concerned exclusively with ‘morphological complexity and specifically inflectional morphology’, it can be assumed that the canonicity she is speaking of is ‘canonical inflection’ as understood in Corbett’s (2015) paper.

There is still a catch here, though. In general, one would assume that a language with minimal inflectional complexity would be one without any inflection at all, or that a minimally complex inflectional class system would be one with no inflectional differences between lexemes. Under a consistent canonical approach, however, it would appear that isolating languages should not be seen as having zero inflectional complexity (and thus being maximally canonical); rather, the notion of inflectional complexity would not be applicable to them. So far as I can see, Nichols’s sample does not contain any purely isolating languages (Mandarin is the one that comes closest), so it is not apparent how she would treat them. But the problem may show up again at another level. Thus, with regard to unpredictability of gender, Nichols puts languages with entirely predictable gender together with languages without gender, which may make sense if one is looking at canonicity of inflection but not if what is at stake is canonicity of gender.

Nichols notes one point where there is a discrepancy between Kolmogorov complexity and CC: syncretism, that is, when two or more cells in a paradigm share the same word form. She notes that syncretism does ‘not increase the amount of information required to describe a language’. This may in fact be made stronger: syncretism often makes it possible to shorten a description.
But syncretism will in general lead to violations of what Nichols refers to as ‘the structuralist notion of biuniqueness, or “one form, one function” ’,³ which Nichols sees as central to canonicity, and thus syncretism increases CC. Likewise, Corbett (2015: 152) says: ‘In the canonical situation, the inflectional material is different in every cell of the lexeme. The major deviation here is syncretism; we have an expectation of a given number of inflectional forms, while with syncretism two or more of them are identical (two or more morphosyntactic specifications share a single realization).’ Sometimes it seems that the choice of criteria for canonicity relies on a demand for ‘proper behaviour’: if you have a distinction somewhere, you had better have it everywhere. Whether that makes things more complex does not really matter.

³ Cf. also the following statement by Mansfield & Nordlinger (Chapter 3, this volume): ‘Inflectional allomorphy is a prototypical form of morphological complexity, introducing unpredictability into the mapping of form to meaning’.

What Nichols calls ‘biuniqueness’ (like Tallman & Epps (Chapter 9, this volume), who mention ‘deviations from biuniqueness’ as a criterion that relates to measures of morphological complexity) is sometimes referred to as ‘transparency’, notably in the work of Kees Hengeveld and his associates. Hengeveld & Leufkens (2018) define ‘transparency’ as ‘a one-to-one relation between units of meaning and units of form’. As for the relationship between this concept and complexity, they say that ‘The difference is immediately evident from the fact that languages may be complex yet transparent or simple yet opaque’; however, they do not clarify what notion of complexity they have in mind except by giving Turkish as an example, where the verbal morphology ‘is highly complex in the sense that a single verbal word may contain a high number of different morphemes, but also highly transparent in that every morpheme corresponds to one fixed meaning’. This suggests that they are speaking of structural complexity rather than system complexity.

There is another problem in identifying deviations from the one-to-one relation between meaning and form with complexity, not addressed by the authors mentioned above, that is crucial when it comes to crosslinguistic comparisons. It concerns the identifiability of units of meaning and is particularly pressing in inflectional morphology. The grammar of a language may force speakers to express information that is not essential to their intended message. Thus, in a language with gendered pronouns, it may not be possible to refer to a person without revealing their gender. The consequence is that it is sometimes impossible to translate a sentence from one language into a sentence of another that conveys exactly the same information, which makes it difficult to compare the languages with respect to biuniqueness/transparency (see Dahl 2004: 80–6 for further discussion).

The notion of ‘overspecification’ is also relevant here. Following McWhorter (2007: 21–8), Berdicevskis & Semenuks (Chapter 11, this volume) regard overspecification as one of the most crucial facets of complexity, defining it as ‘overt and obligatory marking of a semantic distinction that is not necessary for communication’. Noting that ‘it is not at all obvious what is necessary for communication’, they mention McWhorter’s proposal to use crosslinguistic comparison to determine what is necessary: if a distinction is not universally present in languages, it can be assumed not to be necessary for communication. However, as is noted in Dahl (2004: 80), it is not possible to claim that a distinction is necessary or unnecessary as such, since that has to depend on what information the speaker wants to convey; the point is rather that a grammar may force speakers to express some information whether they like it or not.

13.8 Overabundance

Meakins & Wilmoth (Chapter 4, this volume) focus on the phenomenon of ‘overabundance’, by which they mean ‘the exponence of multiple forms in the same cell in a paradigm’, arguing that it represents an increase in integrative complexity, ‘in that it requires speakers to make calculated choices about forms based on features beyond the paradigm’. The particular problem studied is optional subject marking in the mixed language Gurindji Kriol, more specifically ‘the alternation in the nominative cell of the Gurindji Kriol case paradigm between zero and -ngku’. They identify three factors which govern the variation: (i) transitivity; (ii) priming by a preceding subject in the discourse; (iii) presence of a co-referential (cross-referential) pronoun. This obviously expands the domain within which morphological complexity is considered. I think it may be questioned whether this variation is to be treated within morphology at all; it looks similar to other cases of differential argument marking and would naturally be seen as a syntactic phenomenon. On the other hand, as I said above, viewing complexity only from a module-internal perspective can be seen as artificial and may prevent us from making relevant generalizations.

In this case, we seem to be dealing with phenomena that were discussed in Dahl (2004: 128–34) under the rubrics ‘pattern competition’ and ‘pattern regulation’. I was mainly interested in what happens during grammaticalization in a single language, but it seems that what I said can be generalized to contact situations. My main point was that competition between two patterns, whether lexical or grammatical, may lead to an increase in complexity. As long as the patterns are in free variation, the increase is minimal (and does not lead to any significant difficulty for learners and users), but there appears to be a universal tendency towards regulation of the variation, which at the initial stages shows itself merely in the form of tendencies.

13.9 Conclusion

The chapters of the volume that I have looked at here are those in which there is explicit discussion of the basic notions relating to complexity employed in the chapters. Time and space considerations do not allow me to comment on the others, in spite of many of them being on topics that are of direct interest to me. One reflection is that the study of morphological complexity still has quite some way to go before there is a set of shared notions and standard works that everyone refers to. Which approaches will prevail in the long run is obviously an open question. It is notable that both the notion of minimum description length and Ackerman & Malouf’s notion of integrative complexity are ultimately based on information theory. It is quite possible that we will see other applications of this theory in the future.


References

Abel, Jennifer (2006). ‘That crazy idea of hers: The English double genitive as a focus construction’, Canadian Journal of Linguistics 51(1): 1–14. doi:10.1017/S0008413100003790
Aboh, Enoch O. (2009). ‘Competition and selection: That’s all!’, in Enoch O. Aboh and Norval Smith (eds), Complex Processes in New Languages. Amsterdam: John Benjamins, 317–44. doi:10.1075/cll.35.20abo
Aboh, Enoch O. (2015). The Emergence of Hybrid Grammars. Cambridge: Cambridge University Press. doi:10.1017/CBO9781139024167
Aboh, Enoch O. and Umberto Ansaldo (2007). ‘The role of typology in language creation’, in Umberto Ansaldo, Stephen Matthews, and Lisa Lim (eds), Deconstructing Creole. Amsterdam: John Benjamins, 39–66. doi:10.1075/tsl.73.05abo
Abouda, Lotfi and Marie Skrovec (2015). ‘Du rapport entre formes synthétique et analytique du futur. Étude de la variable modale dans un corpus oral micro-diachronique’, Revue de Sémantique et Pragmatique 38: 35–57.
Abouda, Lotfi and Marie Skrovec (2017). ‘Du rapport micro-diachronique futur simple/futur périphrastique en français moderne. Étude des variables temporelles et aspectuelles’, Corela, HS-21. URL: http://corela.revues.org/4804
Ackerman, Farrell, James Blevins, and Robert Malouf (2009). ‘Parts and wholes: Implicative patterns in inflectional paradigms’, in James P. Blevins and Juliette Blevins (eds), Analogy in Grammar: Form and Acquisition. Oxford: Oxford University Press, 54–82.
Ackerman, Farrell and Robert Malouf (2013). ‘Morphological organization: The Low Conditional Entropy Conjecture’, Language 89(3): 429–64. doi:10.1353/lan.2013.0054
Ackerman, Farrell and Robert Malouf (2015). ‘The No Blur Principle effects as an emergent property of language systems’, Proceedings of the 41st Annual Meeting of the Berkeley Linguistics Society. Berkeley, CA, 1–14. doi:10.20354/B4414110014
Ackerman, Farrell and Robert Malouf (2016). ‘Word and pattern morphology: An information-theoretic approach’, Word Structure 9: 125–31. doi:10.3366/word.2016.0090
Agbetsoamedo, Yvonne (2014). ‘Noun classes in Sɛlɛɛ’, The Journal of West African Languages 41: 95–124.
Aglarov, M. A. (1988). Sel’skaja obsčina v Nagornom Dagestane v XVII-načale XIX v. Moscow: Nauka.
Aikhenvald, Alexandra Y. (2000). Classifiers: A Typology of Noun Categorization Devices. Oxford: Oxford University Press.
Aikhenvald, Alexandra Y. (2002). Language Contact in Amazonia. Oxford: Oxford University Press.
Aikhenvald, Alexandra Y. (2003a). ‘Mechanisms of change in areal diffusion: New morphology and language contact’, Journal of Linguistics 39(1): 1–29. doi:10.1017/S0022226702001937
Aikhenvald, Alexandra Y. (2003b). A Grammar of Tariana. Cambridge: Cambridge University Press.
Aikhenvald, Alexandra Y. (2004). Evidentiality. Oxford: Oxford University Press.
Aikhenvald, Alexandra Y. and Robert M. W. Dixon (1998). ‘Evidentials and areal typology: A case study from Amazonia’, Language Sciences 20: 241–57.

Aikhenvald, Alexandra Y. and R. M. W. Dixon (eds) (2006). Grammars in Contact: A Cross-Linguistic Typology. Oxford: Oxford University Press.
Aikhenvald, Alexandra Y. and Diana Green (1998). ‘Palikur and the typology of classifiers’, Anthropological Linguistics 40: 429–80.
Åkerberg, Bengt (2012). Älvdalsk grammatik. Älvdalen: Ulum Dalska.
Albright, Adam and Bruce Hayes (2002). ‘Modeling English past tense intuitions with minimal generalization’, in M. Maxwell (ed.), Proceedings of the 6th Meeting of the ACL Special Interest Group in Computational Phonology July 2002. New Brunswick, NJ: Association for Computational Linguistics, 58–69.
Albright, Adam and Bruce Hayes (2003). ‘Rules vs. analogy in English past tenses: A computational/experimental study’, Cognition 90(2): 119–61.
Alegre, Maria and Peter Gordon (1999a). ‘Frequency effects and the representational status of regular inflections’, Journal of Memory and Language 40(1): 41–61.
Alegre, Maria and Peter Gordon (1999b). ‘Rule-based versus associative processes in derivational morphology’, Brain and Language 68(1–2): 347–54.
Allen, Shanley E. M. (2017). ‘Polysynthesis in the acquisition of the Inuit languages’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 449–72.
Alleyne, Mervin (1996). Syntaxe historique créole. Paris: Editions Karthala.
Ambrazas, Vytautas, Emma Geniušienė, Aleksas Girdenis, Nijolė Sližienė, Dalija Tekorienė, Adelė Valeckienė, and Elena Valiulytė (2006). Lithuanian Grammar. 2nd ed. Vilnius: Baltos Lankos.
Ambridge, Ben and Elena V. M. Lieven (2011). Child Language Acquisition. Cambridge: Cambridge University Press.
Anderson, Stephen R. (1992). A-Morphous Morphology. Cambridge: Cambridge University Press.
Anderson, Stephen R. (2015a). ‘Dimensions of morphological complexity’, in Matthew Baerman, Dunstan Brown, and Greville G. Corbett (eds), Understanding and Measuring Morphological Complexity. Oxford: Oxford University Press, 11–26. doi:10.1093/acprof:oso/9780198723769.003.0002
Anderson, Stephen R. (2015b). ‘The morpheme: Its nature and use’, in Matthew Baerman (ed.), The Oxford Handbook of Inflection. Oxford: Oxford University Press, 11–34.
Arika, Ann Lindvall (2012). ‘Glimpses of the linguistic situation in Solomon Islands’. Paper given at the 6th international conference on ‘Languages, E-Learning and Romanian Studies’.
Arka, Wayan (2011). A Rongga-English Dictionary with English-Rongga Wordlist. Jakarta: Penerbit Universitas Atma Jaya.
Arkadiev, Peter (2020). ‘Morphology in typology: Historical retrospect, state of the art, and prospects’, in Mark Aronoff (ed.), Oxford Research Encyclopedia of Linguistics. New York: Oxford University Press. doi:10.1093/acrefore/9780199384655.013.626
Arkadiev, Peter, Axel Holvoet, and Björn Wiemer (2015). ‘Introduction: Baltic linguistics—State of the art’, in Peter Arkadiev, Axel Holvoet, and Björn Wiemer (eds), Contemporary Approaches to Baltic Linguistics. Berlin: De Gruyter Mouton, 1–109.
Arkadiev, Peter and Marian Klamer (2019). ‘Morphological theory and typology’, in Francesca Masini and Jenny Audring (eds), The Oxford Handbook of Morphological Theory. Oxford: Oxford University Press, 435–54.
Armand, Alain (2014). Dictionnaire kréol rénioné français. Saint-André (Réunion): Epica.

Arnott, David Whitehorn (1970). The Nominal and Verbal Systems of Fula. Oxford: Clarendon.
Aronoff, Mark (1994). Morphology by Itself: Stems and Inflectional Classes. Cambridge, MA: The MIT Press.
Aronoff, Mark (1998). ‘Isomorphism and monotonicity: Or the disease model of morphology’, in Steven Lapointe, Diane Brentari, and Patrick Farrell (eds), Morphology and Its Relation to Phonology and Syntax. Stanford, CA: CSLI Publications, 411–18.
Aronoff, Mark (2015). ‘Thoughts on morphology and cultural evolution’, in Laurie Bauer, Lívia Körtvélyessy, and Pavol Štekauer (eds), Semantics of Complex Words. Cham: Springer, 277–88. doi:10.1007/978-3-319-14102-2_13
Aski, Janice M. (1995). ‘Verbal suppletion: An analysis of Italian, French and Spanish to go’, Linguistics 33(3): 403–32. doi:10.1515/ling.1995.33.3.403
Atkinson, Mark, Kenny Smith, and Simon Kirby (2018). ‘Adult learning and language simplification’, Cognitive Science 42(8): 2818–54. doi:10.1111/cogs.12686
Audring, Jenny (2014). ‘Gender as a complex feature’, Language Sciences 43: 5–17. doi:10.1016/j.langsci.2013.10.003
Audring, Jenny (2017). ‘Calibrating complexity: How complex is a gender system?’, Language Sciences 60: 53–68. doi:10.1016/j.langsci.2016.09.003
Audring, Jenny (2019). ‘Canonical, complex, complicated?’, in Francesca Di Garbo, Bruno Olsson, and Bernhard Wälchli (eds), Grammatical Gender and Linguistic Complexity, vol. I: General Issues and Specific Studies. Berlin: Language Science Press, 15–52. URL: http://langsci-press.org/catalog/book/223
Azen, Razia and Nicole Traxel (2009). ‘Using dominance analysis to determine predictor importance in logistic regression’, Journal of Educational and Behavioral Sciences 34(3): 319–47. doi:10.3102/1076998609332754
Baayen, R. Harald (2001). Word Frequency Distributions. Dordrecht: Kluwer Academic Publishers.
Baayen, R. Harald (2007). ‘Storage and computation in the mental lexicon’, in Gonia Jarema and Gary Libben (eds), The Mental Lexicon: Core Perspectives. Amsterdam: Elsevier, 81–104.
Baayen, R. Harald (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press.
Baayen, R. Harald, Rochelle Lieber, and Robert Schreuder (1997). ‘The morphological complexity of simplex nouns’, Linguistics 35: 861–77. doi:10.1515/ling.1997.35.5.861
Baayen, R. Harald, Petar Milin, Dusica Filipović Đurđević, Peter Hendrix, and Marco Marelli (2011). ‘An amorphous model for morphological processing in visual comprehension based on naive discriminative learning’, Psychological Review 118(3): 438–81. doi:10.1037/a0023851
Baayen, R. Harald, Lee H. Wurm, and Joanna Aycock (2007). ‘Lexical dynamics for low-frequency complex words: A regression study across tasks and modalities’, The Mental Lexicon 2(3): 419–63. doi:10.1075/ml.2.3.06baa
Babou, Cheikh Anta and Michele Loporcaro (2016). ‘Noun classes and grammatical gender in Wolof’, Journal of African Languages and Linguistics 37(1): 1–57. doi:10.1515/jall-2016-0001
Baechler, Raffaela (2017). Absolute Komplexität in der Nominalflexion. Berlin: Language Science Press. URL: http://langsci-press.org/catalog/book/134
Baechler, Raffaela and Guido Seiler (eds) (2016). Complexity, Isolation, and Variation. Berlin: De Gruyter.

Baerman, Matthew (2012). ‘Paradigmatic chaos in Nuer’, Language 88(3): 467–94. doi:10.1353/lan.2012.0065
Baerman, Matthew (2016). ‘Seri verb classes: Morphosyntactic motivation and morphological autonomy’, Language 92(4): 792–823. doi:10.1353/lan.2016.0073
Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (2005). The Syntax-Morphology Interface: A Study of Syncretism. Cambridge: Cambridge University Press.
Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (2010). ‘Morphological complexity: A typological perspective’. Ms, Surrey Morphology Group, University of Surrey. URL: http://epubs.surrey.ac.uk/814702/
Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (eds) (2015a). Understanding and Measuring Morphological Complexity. Oxford: Oxford University Press.
Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (2015b). ‘Understanding and measuring morphological complexity: An introduction’, in Matthew Baerman, Dunstan Brown, and Greville G. Corbett (eds), Understanding and Measuring Morphological Complexity. Oxford: Oxford University Press, 3–10.
Baerman, Matthew, Dunstan Brown, and Greville G. Corbett (2017). Morphological Complexity. Cambridge: Cambridge University Press.
Baerman, Matthew, Greville G. Corbett, and Dunstan Brown (eds) (2010). Defective Paradigms: Missing Forms and What They Tell Us. Oxford: Oxford University Press and British Academy.
Baerman, Matthew, Greville G. Corbett, Dunstan Brown, and Andrew Hippisley (eds) (2007). Deponency and Morphological Mismatches. Oxford: Oxford University Press and British Academy.
Baissac, Charles (1880). Etudes du patois mauricien. Nancy: Imprimerie Berger-Levrault.
Baker, Philip (1972). Kreol: A Description of Mauritian Creole. Ann Arbor, MI: Karoma.
Baker, Philip and Chris Corne (1982). Isle de France Creole: Affinities and Origins. Ann Arbor, MI: Karoma.
Bakker, Peter (1997). A Language of Our Own: The Genesis of Michif, the Mixed Cree-French Language of the Canadian Métis. Oxford: Oxford University Press.
Bakker, Peter (2003). ‘Mixed languages as autonomous systems’, in Yaron Matras and Peter Bakker (eds), The Mixed Language Debate: Theoretical and Empirical Advances. Berlin: Mouton de Gruyter, 107–50.
Bakker, Peter (2013). ‘Michif’, in Susanne Maria Michaelis, Philippe Maurer, Martin Haspelmath, and Magnus Huber (eds), The Atlas and Survey of Pidgin and Creole Languages, vol. 3: Contact Languages Based on Languages from Africa, Australia, and the Americas. Oxford: Oxford University Press, 158–65.
Bakker, Peter (2014). ‘Creolistics: Back to square one?’, Journal of Pidgin and Creole Languages 29: 177–94. doi:10.1075/jpcl.29.1.08bak
Bakker, Peter, Aymeric Daval-Markussen, Mikael Parkvall, and Ingo Plag (2011). ‘Creoles are typologically distinct from non-creoles’, Journal of Pidgin and Creole Languages 26(1): 5–42. doi:10.1075/jpcl.26.1.02bak
Balode, Laimute and Axel Holvoet (2001). ‘The Latvian language and its dialects’, in Östen Dahl and Maria Koptjevskaja-Tamm (eds), The Circum-Baltic Languages: Typology and Contact, vol. 1: Past and Present. Amsterdam: John Benjamins, 3–40.
Bao Diop, Sokhna (2015). ‘Les classes nominales en nyun gunyamolo’, in Denis Creissels and Konstantin Pozdniakov (eds), Les classes nominales dans les langues atlantiques. Köln: Köppe, 371–405.
Baptista, Marlyse (2003a). ‘Inflectional plural marking in creoles and pidgins: A comparative study’, in Ingo Plag (ed.), The Phonology and Morphology of Creole Languages. Tübingen: Niemeyer, 315–32.

Baptista, Marlyse (2003b). ‘Number inflection in creole languages’, Interface 6: 3–26.
Becher, Jutta (2001). Untersuchungen zum Sprachwandel im Wolof aus diachroner und synchroner Perspektive. University of Hamburg PhD dissertation.
Beier, Christine, Lev Michael, and Joel Sherzer (2002). ‘Discourse forms and processes in indigenous lowland South America: An areal-typological perspective’, Annual Review of Anthropology 31: 121–45. doi:10.1146/annurev.anthro.31.032902.105935
Bendor-Samuel, John Theodore (ed.) (1989). The Niger-Congo Languages: A Classification and Description of Africa’s Largest Language Family. Lanham, MD: University Press of America, by arrangement with the Summer Institute of Linguistics (SIL).
Bentley, W. Holman (1887). Dictionary and Grammar of the Kikongo Language. London: Trübner & Co.
Bentz, Christian (2016). ‘The low-complexity-belt: Evidence for large-scale language contact in human pre-history?’, in Sean G. Roberts, Christine Cuskley, Luke McCrohon, Lluís Barceló-Coblijn, Olga Feher, and Tessa Verhoef (eds), The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11). doi:10.17617/2.2248195
Bentz, Christian, Dimitrios Alikaniotis, Michael Cysouw, and Ramon Ferrer-i-Cancho (2017). ‘The entropy of words—Learnability and expressivity across more than 1000 languages’, Entropy 19: 275. doi:10.3390/e19060275
Bentz, Christian and Aleksandrs Berdicevskis (2016). ‘Learning pressures reduce morphological complexity: Linking corpus, computational and experimental evidence’, in Dominique Brunato, Felice Dell’Orletta, Giulia Venturi, Thomas François, and Philippe Blache (eds), Proceedings of the Workshop ‘Computational Linguistics for Linguistic Complexity (CL4LC)’. Osaka, Japan, 222–32.
Bentz, Christian and Morten H. Christiansen (2013). ‘Linguistic adaptation: The trade-off between case marking and fixed word orders in Germanic and Romance languages’, in Feng Shi and Gang Peng (eds), Eastward Flows the Great River: Festschrift in Honor of Professor William S-Y. Wang on his 80th Birthday. Hong Kong: City University of Hong Kong Press, 45–61.
Bentz, Christian, Annemarie Verkerk, Douwe Kiela, Felix Hill, and Paula Buttery (2015). ‘Adaptive communication: Languages with more non-native speakers tend to have fewer word forms’, PLoS ONE 10(6): e0128254. doi:10.1371/journal.pone.0128254
Bentz, Christian and Bodo Winter (2013). ‘Languages with more second language learners tend to lose nominal case’, Language Dynamics and Change 3: 1–27. doi:10.1163/22105832-13030105
Berdicevskis, Aleksandrs, Çağrı Çöltekin, Katharina Ehret, Kilu von Prince, Daniel Ross, Bill Thompson, Chunxiao Yan, Vera Demberg, Gary Lupyan, Taraka Rama, and Christian Bentz (2018). ‘Using universal dependencies in cross-linguistic complexity research’, in Marie-Catherine de Marneffe, Teresa Lynn, and Sebastian Schuster (eds), Proceedings of the Second Workshop on Universal Dependencies (UDW 2018). Brussels: Association for Computational Linguistics, 8–17.
Berdicevskis, Aleksandrs and Arturs Semenuks (submitted). ‘Imperfect language learning reduces morphological overspecification: Experimental evidence’.
Bernini-Montbrand, Danièle, Ralph Ludwig, Hector Poullet, and Sylviane Telchid (2013). Dictionnaire créole-français Guadeloupe, avec un abrégé de grammaire créole, un lexique français-créole, les comparaisons courantes, les locutions et plus de 1000 proverbes. Paris: Orphie.
Berry, Keith and Christine Berry (1999). A Description of Abun. Canberra: Pacific Linguistics.
Bertrand-Bocandé, Emmanuel (1849). ‘Notes sur la Guinée portugaise ou Sénégambie méridionale’ [pt. 2], Bulletin de la Société de Géographie 12: 57–93.

OUP CORRECTED AUTOPAGE PROOFS – FINAL, 24/8/2020, SPi

Bickel, Balthasar, Goma Banjade, Martin Gaenszle, Elena Lieven, Netra Prasad Paudyal, Ichchha Purna Rai, Manoj Rai, Novel Kishore Rai, and Sabine Stoll (2007). ‘Free preﬁx ordering in Chintang’, Language 83(1): 43–73. doi:10.1353/lan.2007.0002 Bickel, Balthasar and Johanna Nichols (2002). ‘Autotypologizing databases and their use in ﬁeldwork’, in Peter Austin, Helen Dry, and Peter Wittenburg (eds), International LREC Workshop on Resources and Tools in Field Linguistics, Las Palmas, 26–7 May 2002. Nijmegen: Max Planck Institute for Psycholinguistics. Bickel, Balthasar and Johanna Nichols (2005). ‘Inﬂectional synthesis of the verb’, in Martin Haspelmath, Matthew Dryer, David Gil, and Bernard Comrie (eds), The World Atlas of Language Structures. Oxford: Oxford University Press, 94–7. Bickel, Balthasar and Johanna Nichols (2007). ‘Inﬂectional morphology’, in Timothy Shopen (ed.), Language Typology and Syntactic Description, vol. 3: Grammatical Categories and the Lexicon. Cambridge: Cambridge University Press, 169–240. Bickel, Balthasar and Johanna Nichols (2013). ‘Inﬂectional synthesis of the verb’, in Matthew Dryer and Martin Haspelmath (eds), World Atlas of Language Structures Online. URL: http://wals.info/chapter/22 Bickel, Balthasar, Johanna Nichols, Taras Zakharko, Alena Witzlack-Makarevich, Kristine Hildebrandt, Michael Rießler, Lennart Bierkandt, Fernando Zúñiga, and John B. Lowe (2017). The Autotyp typological databases. Version 0.1.0. URL: https://github.com/autotyp/autotyp-data/tree/0.1.0 Bickel, Balthasar and Fernando Zúñiga (2017). ‘The “word” in polysynthetic languages: Phonological and syntactic challenges’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 158–85. Bickerton, Derek (1981). Roots of Language. Ann Arbor, MI: Karoma. Bickerton, Derek (1984). ‘The language bioprogram hypothesis’, Behavioral and Brain Sciences 7(2): 173–88.
doi:10.1017/S0140525X00044149 Bickerton, Derek (1988). ‘Creole languages and the bioprogram’, in Frederick Newmeyer (ed.), Linguistics: The Cambridge Survey, vol. 2: Linguistic Theory. Extensions and Implications. Cambridge: Cambridge University Press, 268–84. Birchall, Joshua (2014). Argument Marking Patterns in South American Languages. Universiteit Nijmegen PhD dissertation. Blasi, E. Damián, Susanne Maria Michaelis, and Martin Haspelmath (2017). ‘Grammars are robustly transmitted even during the emergence of creole languages’, Nature Human Behaviour 1: 723–9. doi:10.1038/s41562-017-0192-4 Blench, Roger (2009). ‘Do the Ghana-Togo mountain languages constitute a genetic group?’, The Journal of West African Languages 36(1–2): 19–36. Blevins, James P. (2006). ‘Word-based morphology’, Journal of Linguistics 42(3): 531–73. doi:10.1017/S0022226706004191 Blevins, James P. (2013). ‘Word-based morphology from Aristotle to modern WP (Word and Paradigm models)’, in Keith Allen (ed.), The Oxford Handbook of the History of Linguistics. Oxford: Oxford University Press, 375–95. Blevins, James P. (2016a). ‘The minimal sign’, in Gregory Stump and Andrew Hippisley (eds), The Cambridge Handbook of Morphology. Cambridge: Cambridge University Press, 50–69. Blevins, James P. (2016b). Word and Paradigm Morphology. Oxford: Oxford University Press.

Blevins, James P., Petar Milin, and Michael Ramscar (2017). ‘The Zipﬁan paradigm cell ﬁlling problem’, in Ferenc Kiefer, James P. Blevins, and Huba Bartos (eds), Perspectives on Morphological Structure: Data and Analyses. Leiden: Brill, 141–58. Bloomﬁeld, Leonard (1914). ‘Sentence and word’, Transactions and Proceedings of the American Philological Association 45: 65–75. Bloomﬁeld, Leonard (1933). Language. New York: Holt. Blythe, Joe (2009). Doing Referring in Murriny Patha Conversation. University of Sydney PhD dissertation. Blythe, Joe, Rachel Nordlinger, and Nicholas Reid (2007). ‘Murriny Patha ﬁnite verb paradigms’. Unpublished ms. Boilat, David (1858). Grammaire de la langue woloffe. Paris: Imprimerie Impériale. URL: http://babel.hathitrust.org/cgi/pt?id=wu.89012299343;view=1up;seq=11 Bokamba, Eyamba (1977). ‘The impact of multilingualism on language structures: The case of Central Africa’, Anthropological Linguistics 19: 181–202. Bolaños, Katherine (2016). A Grammar of Kakua. Utrecht: LOT. Bonami, Olivier (2013). ‘Towards a robust assessment of implicative relations in inﬂectional systems’. Paper given at the ‘Workshop on Computational Approaches to Morphological Complexity’, Paris. Bonami, Olivier (2015). ‘Periphrasis as collocation’, Morphology 25: 63–110. doi:10.1007/s11525-015-9254-3 Bonami, Olivier and Sarah Beniamine (2015). ‘Implicative structure and joint predictiveness’, in Vito Pirrelli, Claudia Marzi, and Marcello Ferro (eds), Word Structure and Word Usage: Proceedings of the NetWordS Final Conference, Pisa, Italy, March 30–April 1, 2015. Pisa: Institute for Computational Linguistics, National Research Council, 4–9. Bonami, Olivier and Sarah Beniamine (2016). ‘Joint predictiveness in inﬂectional paradigms’, Word Structure 9(2): 156–82. doi:10.3366/word.2016.0092 Bonami, Olivier and Gilles Boyé (2002).
‘Suppletion and dependency in inﬂectional morphology’, in Frank van Eynde, Lars Hellan, and Dorothee Beermann (eds), Proceedings of the 8th International Conference on Head-Driven Phrase Structure Grammar. Stanford: CSLI, 51–70. Bonami, Olivier and Gilles Boyé (2003). ‘Supplétion et classes ﬂexionnelles dans la conjugaison du français’, Langages 15: 102–26. Bonami, Olivier and Gilles Boyé (2007). ‘French pronominal clitics and the design of Paradigm Function Morphology’, in Geert E. Booij, Luca Ducceschi, Bernard Fradin, Emiliano Guevara, Angela Ralli, and Sergio Scalise (eds), On-line Proceedings of the Fifth Mediterranean Morphology Meeting (MMM5) Fréjus, 15–18 September 2005. Bologna: University of Bologna, 291–322. Bonami, Olivier, Gilles Boyé, and Fabiola Henri (2011). ‘Measuring inﬂectional complexity: French and Mauritian’. Paper given at the ‘Workshop on Quantitative Measures in Morphology and Morphological Development’, San Diego. Bonami, Olivier, Gilles Boyé, and Françoise Kerleroux (2009). ‘L’allomorphie radicale et la relation ﬂexion-construction’, in Bernard Fradin, Françoise Kerleroux, and Marc Plénat (eds), Aperçus de morphologie du français. Saint-Denis: Presses Universitaires de Vincennes, 103–25. Bonami, Olivier and Fabiola Henri (2010). ‘Assessing empirically the complexity of Mauritian Creole’. Paper given at the conference ‘Formal Approaches to Creole Studies 2’, Berlin.

Bonami, Olivier, Fabiola Henri, and Ana R. Luís (2013). ‘Comparing sources of inﬂectional morphology in Romance-based creoles’. Paper given at the workshop ‘Portuguese-based Creoles in Perspective’, Coimbra. Bonami, Olivier, Fabiola Henri, and Ana R. Luís (2015). ‘Making sense of morphological complexity’. Paper given at the ‘SeePiCLa Meeting’, Lisbon. Bond, Oliver, Greville G. Corbett, Marina Chumakina, and Dunstan Brown (eds) (2016). Archi: Complexities of Agreement in Cross-theoretical Perspective. Oxford: Oxford University Press. Booij, Geert E. (1993). ‘Against split morphology’, in Geert E. Booij and Jaap van Marle (eds), Yearbook of Morphology 1993. Dordrecht: Kluwer, 27–49. doi:10.1007/978-94-017-3712-8_2 Booij, Geert E. (1997). ‘Allomorphy and the autonomy of morphology’, Folia Linguistica 31: 25–56. doi:10.1515/ﬂin.1997.31.1-2.25 Booij, Geert E. (2010). Construction Morphology. Oxford: Oxford University Press. Boyé, Gilles and Patricia Cabredo Hofherr (2006). ‘The structure of allomorphy in Spanish verbal inﬂection’, Cuadernos de Lingüística del Instituto Universitario Ortega y Gasset 13: 9–24. Bozic, Mirjana and William Marslen-Wilson (2010). ‘Neurocognitive contexts for morphological complexity: Dissociating inﬂection and derivation’, Language and Linguistics Compass 4(11): 1063–73. doi:10.1111/j.1749-818X.2010.00254.x Brandão, Ana Paula B. (2014). A Reference Grammar of Paresi-Haliti (Arawak). University of Texas at Austin PhD dissertation. Bresnan, Joan (2007). ‘Is syntactic knowledge probabilistic? Experiments with the English dative alternation’, in Sam Featherston and Wolfgang Sternefeld (eds), Roots: Linguistics in Search of Its Evidential Base. Berlin: Mouton de Gruyter, 77–96. Bresnan, Joan and Marilyn Ford (2010). ‘Predicting syntax: Processing dative constructions in American and Australian varieties of English’, Language 86(1): 186–213. doi:10.1353/lan.0.0189 Brown, Dunstan, Greville G. Corbett, Norman M.
Fraser, Andrew Hippisley, and Alan Timberlake (1996). ‘Russian noun stress and Network Morphology’, Linguistics 34(1): 53–107. doi:10.1515/ling.1996.34.1.53 Brown, Dunstan and Andrew Hippisley (2012). Network Morphology: A Defaults-Based Theory of Word Structure. Cambridge: Cambridge University Press. Burzio, Luigi (2004). ‘Paradigmatic and syntagmatic relations in Italian verbal inﬂection’, in Julie Auger, J. Clancy Clements, and Barbara Vance (eds), Contemporary Approaches to Romance Linguistics. Amsterdam: John Benjamins, 17–44. Bybee, Joan L. (1985). Morphology: A Study of the Relation between Meaning and Form. Amsterdam: John Benjamins. Bybee, Joan L. (1995). ‘Regular morphology and the lexicon’, Language and Cognitive Processes 10(5): 425–55. doi:10.1080/01690969508407111 Bybee, Joan L. (2007). Frequency of Use and the Organization of Language. Oxford: Oxford University Press. Bybee, Joan L. and Clay Beckner (2015). ‘Language use, cognitive processes, and linguistic change’, in Claire Bowern and Bethwyn Evans (eds), The Routledge Handbook of Historical Linguistics. London: Routledge, 503–18. Bybee, Joan L. and Carol Lynn Moder (1983). ‘Morphological classes as natural categories’, Language 59: 251–70. doi:10.2307/413574 Bybee, Joan and Dan I. Slobin (1982). ‘Rules and schemas in the development and use of the English past tense’, Language 58(2): 265–89. doi:10.2307/414099

Cadely, Jean-Robert (1994). Aspects de la phonologie du créole haïtien. Université du Québec à Montréal PhD dissertation. Camara, Sana (2006). Wolof Lexicon and Grammar. Madison, WI: NALRC Press. Cameron-Faulkner, Thea and Andrew Carstairs-McCarthy (2000). ‘Stem alternants as morphological signata: Evidence from blur avoidance in Polish nouns’, Natural Language and Linguistic Theory 18(4): 813–35. doi:10.1023/A:1006496821412 Campbell, Lyle (2012). ‘Typological characteristics of South American indigenous languages’, in Lyle Campbell and Verónica Grondona (eds), The Indigenous Languages of South America: A Comprehensive Guide. Berlin: Mouton de Gruyter, 259–330. Carlin, Eithne (2006). ‘Feeling the need: The borrowing of Cariban functional categories into Mawayana (Arawak)’, in Alexandra Y. Aikhenvald and Robert M. W. Dixon (eds), Grammars in Contact: A Cross-Linguistic Perspective. Oxford: Oxford University Press, 313–32. Carstairs, Andrew (1983). ‘Paradigm economy’, Journal of Linguistics 19(1): 115–28. doi:10.1017/S0022226700007477 Carstairs, Andrew (1987). Allomorphy in Inﬂexion. London: Croom Helm. Carstairs-McCarthy, Andrew (1994). ‘Inﬂection classes, gender, and the Principle of Contrast’, Language 70(4): 737–88. Carstairs-McCarthy, Andrew (1998). ‘How lexical semantics constrains inﬂectional allomorphy’, in Geert E. Booij and Jaap van Marle (eds), Yearbook of Morphology 1997. Dordrecht: Springer, 1–24. doi:10.1007/978-94-011-4998-3_1 Carstairs-McCarthy, Andrew (2010). The Evolution of Morphology. Oxford: Oxford University Press. Chao, Yuen Ren (1968). A Grammar of Spoken Chinese. Berkeley, CA: University of California Press. Chaudenson, Robert (2003). La créolisation. Théorie, applications, implications. Paris: L’Harmattan. Childs, G. Tucker (1983). ‘Noun class afﬁx renewal in Southern West Atlantic’, in Jonathan D. Kaye, Hilda Koopman, Dominique Sportiche, and André Dugas (eds), Current Approaches to African Linguistics II. 
Dordrecht: Mouton de Gruyter and Foris Publications, 17–29. Childs, G. Tucker (2009). ‘What happens when a language dies? Language change vs. language death’, Studies in African Linguistics 38(2): 113–30. Chirikba, Viacheslav A. (2008). ‘The problem of the Caucasian Sprachbund’, in Pieter C. Muysken (ed.), From Linguistic Areas to Areal Linguistics. Amsterdam: John Benjamins, 25–94. Ciucci, Luca (2014). ‘Tracce di contatto tra la famiglia zamuco (ayoreo, chamacoco) e altre lingue del Chaco. Prime prospezioni’, Quaderni del Laboratorio di Linguistica 13: 1–52. Clahsen, Harald, Claudia Felser, Kathleen Neubauer, Mikako Sato, and Renita Silva (2010). ‘Morphological structure in native and nonnative language processing’, Language Learning 60: 21–43. doi:10.1111/j.1467-9922.2009.00550.x Cobbinah, Alexander (2010). ‘The Casamance as an area of intense language contact: The case of Baïnounk Gubaher’, in Friederike Lüpke and Mary Raymond (eds), Documenting Atlantic–Mande convergence and diversity. Special issue of the Journal of Language Contact—THEMA 3: 175–202. Cole, Desmond T. (1967). Some Features of Ganda Linguistic Structure. Johannesburg: Witwatersrand University Press. Comrie, Bernard (1989). Language Universals and Linguistic Typology. 2nd ed. Chicago: University of Chicago Press.

Comrie, Bernard (1992). ‘Before complexity’, in John A. Hawkins and Murray Gell-Mann (eds), The Evolution of Human Languages. London: Addison-Wesley, 193–211. Comrie, Bernard, Lucía A. Golluscio, Hebe Gonzáles, and Alejandra Vidal (2010). ‘El Chaco como área lingüística’, in Z. Estrada Fernández and R. Arzápalo Marín (eds), Estudios de lenguas amerindias, vol. 2: Contribuciones al estudio de las lenguas originarias de América. Hermosillo, Sonora (Mexico): Editorial Unison, 85–130. Corbett, Greville G. (1982). ‘Gender in Russian: An account of gender speciﬁcation and its relationship to declension’, Russian Linguistics 6(2): 197–232. Corbett, Greville G. (1991). Gender. Cambridge: Cambridge University Press. Corbett, Greville G. (2000). Number. Cambridge: Cambridge University Press. Corbett, Greville G. (2007). ‘Canonical typology, suppletion, and possible words’, Language 83(1): 8–42. doi:10.1353/lan.2007.0006 Corbett, Greville G. (2009). ‘Suppletion: Typology, markedness, complexity’, in Patrick O. Steinkrüger and Manfred Krifka (eds), On Inﬂection. Berlin: Mouton de Gruyter, 25–40. Corbett, Greville G. (2013a). ‘Canonical morphosyntactic features’, in Dunstan Brown, Marina Chumakina, and Greville Corbett (eds), Canonical Morphology and Syntax. Oxford: Oxford University Press, 48–65. Corbett, Greville G. (2013b). ‘The unique challenge of the Archi paradigm’, in Chundra Cathcart, Shinae Kang, and Clare S. Sandy (eds), Proceedings of the 37th Annual Meeting, Berkeley Linguistics Society: Special Session on Languages of the Caucasus, 52–67. Corbett, Greville G. (2015). ‘Morphosyntactic complexity: A typology of lexical splits’, Language 91(1): 145–93. doi:10.1353/lan.2015.0003 Corbett, Greville G. and Sebastian Fedden (2016). ‘Canonical gender’, Journal of Linguistics 52: 495–531. doi:10.1017/S0022226715000195 Corbett, Greville G. and Norman M. Fraser (1993). ‘Network Morphology: A DATR account of Russian nominal inﬂection’, Journal of Linguistics 29(1): 113–42. 
doi:10.1017/S0022226700000074 Corbett, Greville G., Andrew Hippisley, Dunstan Brown, and Paul Marriott (2001). ‘Frequency, regularity and the paradigm: A perspective from Russian on a complex relation’, in Joan Bybee and Paul J. Hopper (eds), Frequency and the Emergence of Linguistic Structure. Amsterdam: John Benjamins, 201–26. Corne, Chris (1982). ‘A contrastive analysis of Reunion and Isle de France Creole French: Two typologically diverse languages’, in Philip Baker and Chris Corne (eds), Isle de France Creole: Afﬁnities and Origins. Ann Arbor, MI: Karoma, 8–129. Corne, Chris (1999). From French to Creole. London: University of Westminster Press. Cotterell, Ryan, Christo Kirov, Mans Hulden, and Jason Eisner (2019). ‘On the complexity and typology of inﬂectional morphological systems’, Transactions of the Association for Computational Linguistics 7: 327–42. doi:10.1162/tacl_a_00271 Crevels, Mily and Hein van der Voort (2008). ‘The Guaporé-Mamoré Region as a Linguistic Area’, in Pieter C. Muysken (ed.), From Linguistic Areas to Areal Linguistics. Amsterdam: John Benjamins, 151–79. Croft, William (1991). Syntactic Categories and Grammatical Relations: The Cognitive Organization of Information. Chicago: University of Chicago Press. Croft, William (2001). Radical Construction Grammar: Syntactic Theory in Typological Perspective. Oxford: Oxford University Press. Cruschina, Silvio, Martin Maiden, and John C. Smith (eds) (2013). The Boundaries of Pure Morphology: Diachronic and Synchronic Perspectives. Oxford: Oxford University Press.

Cuskley, Christine, Francesca Colaiori, Claudio Castellano, Vittorio Loreto, Martina Pugliese, and Francesca Tria (2015). ‘The adoption of linguistic rules in native and non-native speakers: Evidence from a Wug task’, Journal of Memory and Language 84: 205–23. doi:10.1016/j.jml.2015.06.005 Dahl, Östen (2004). The Growth and Maintenance of Linguistic Complexity. Amsterdam: John Benjamins. Dahl, Östen (2009). ‘Increases in complexity as a result of language contact’, in Kurt Braunmüller and Juliane House (eds), Convergence and Divergence in Language Contact Situations. Amsterdam: John Benjamins, 41–52. Dahl, Östen (2017). ‘Polysynthesis and complexity’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 19–29. Dahl, Östen (2018). ‘Grammaticalization in the languages of Europe’, in Bernd Heine and Heiko Narrog (eds), Grammaticalization from a Typological Perspective. New York: Oxford University Press, 79–96. Dale, Rick and Gary Lupyan (2012). ‘Understanding the origins of morphological diversity: The Linguistic Niche Hypothesis’, Advances in Complex Systems 15(3–4): 1150017. doi:10.1142/S0219525911500172 Danielsen, Swintha (2007). Baure: An Arawak Language of Bolivia. Leiden: CNWS Publications. Dard, Jean (1825). Dictionnaire français–wolof et français–bambara, suivi du dictionnaire wolof–français. Paris: Imprimerie Royale. Dard, Jean (1826). Grammaire wolofe ou méthode pour étudier la langue des noirs qui habitent les royaumes de Bourba-Yolof, de Walo, de Damel, de Bour-Sine, de Saloume, de Baole, en Sénégambie. Paris: Imprimerie Royale. Daugherty, Kim G. and Mark S. Seidenberg (1994). ‘Beyond rules and exceptions: A connectionist approach to inﬂectional morphology’, in Susan D. Lima, Roberta L. Corrigan, and Gregory Iverson (eds), The Reality of Linguistic Rules. Amsterdam: John Benjamins, 353–88. de Boeck, Egide (1904). Grammaire et vocabulaire du Lingala, ou Langue du Haut-Congo. 
Brussels: Polleunis-Ceuterick. DeGraff, Michel (2001). ‘On the origin of creoles: A Cartesian critique of Neo-Darwinian linguistics’, Linguistic Typology 5(2–3): 213–310. doi:10.1515/lity.2001.002 DeGraff, Michel (2003). ‘Against creole exceptionalism’, Language 79(4): 391–410. DeGraff, Michel (2005). ‘Linguists’ most dangerous myth: The fallacy of creole exceptionalism’, Language in Society 34: 533–91. doi:10.1017/S0047404505050207 DeGraff, Michel (2007). ‘Haitian creole’, in John Holm and Peter L. Patrick (eds), Comparative Creole Syntax: Parallel Outlines of Eighteen Creole Grammars, vol. 7 of Westminster Creolistic Series. London: Battlebridge Publications, 101–26. de Groot, Casper (2008). ‘Morphological complexity as a parameter of linguistic typology: Hungarian as a contact language’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 191–214. de Haan, Ferdinand (2013). ‘Semantic distinctions of evidentiality’, in Matthew S. Dryer and Martin Haspelmath (eds), The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. URL: http://wals.info/chapter/77 de Jong, Nivja Helena (2002). Morphological Families in the Mental Lexicon. Universiteit Nijmegen PhD dissertation. DeKeyser, Robert M. (2005). ‘What makes learning second-language grammar difﬁcult? A review of issues’, Language Learning 55: 1–25. doi:10.1111/j.0023-8333.2005.00294.x de Leeuw, Joshua R. (2014). ‘jsPsych: A JavaScript library for creating behavioral experiments in a Web browser’, Behavior Research Methods 47(1): 1–12. doi:10.3758/s13428-014-0458-y

Delafosse, Maurice (1927). ‘Les classes nominales en wolof’, in Festschrift Meinhof. Sprachwissenschaftliche und andere Studien. Glückstadt: L. Friedrichsen, 29–44. [Reprinted in Gabriel Manessy and Serge Sauvageot (eds) (1963). Wolof et Sérèr. Études de phonétique et de grammaire descriptive. Dakar: University of Dakar Press, 25–42.] DeLancey, Scott (2011). ‘On the origin of Sinitic’, in Zhuo Jing-Schmidt (ed.), Proceedings of the 23rd North American Conference on Chinese Linguistics. Eugene: University of Oregon, 51–64. Derbyshire, Desmond (1987). ‘Morphosyntactic areal characteristics of Amazonian languages’, International Journal of American Linguistics 53: 311–26. doi:10.1086/466060 Derbyshire, Desmond and Doris Payne (1990). ‘Noun classiﬁcation systems of Amazonian languages’, in Doris Payne (ed.), Amazonian Linguistics: Studies in Lowland South American Languages. Austin, TX: University of Texas Press, 243–71. Derwing, Bruce L. (1990). ‘Morphology and the mental lexicon: Psycholinguistic evidence’, in Wolfgang U. Dressler, Hans C. Luschützky, Oskar E. Pfeiffer, and John R. Rennison (eds), Contemporary Morphology. Berlin: Mouton de Gruyter, 249–65. Deutscher, Guy (2009). ‘“Overall complexity”: A wild goose chase?’, in Geoffrey Sampson, David Gil, and Peter S. Trudgill (eds), Language Complexity as an Evolving Variable. Oxford: Oxford University Press, 243–51. Diagne, Anna M., Sascha Kesseler, and Christian Meyer (eds) (2011). Communication wolof et société sénégalaise. Héritage et création. Paris: L’Harmattan. Diallo, Abdourahmane (2010). ‘Morphological consequences of Mande borrowings in Fula: The case of Pular, Fuuta–Jaloo’, in Friederike Lüpke and Mary Raymond (eds), Documenting Atlantic–Mande Convergence and Diversity. Special issue of the Journal of Language Contact—THEMA 3: 71–85. Diallo, Abdourahmane (2014). Language Contact in Guinea: The Case of Pular and Mande Varieties. Köln: Köppe. Di Garbo, Francesca (2014).
Gender and Its Interaction with Number and Evaluative Morphology: An Intra- and Intergenealogical Typological Survey of Africa. Stockholm University PhD dissertation. Di Garbo, Francesca (2016). ‘Exploring grammatical complexity crosslinguistically: The case of gender’, Linguistic Discovery 14: 46–85. doi:10.1349/PS1.1537-0852.A.468 Di Garbo, Francesca and Matti Miestamo (2019). ‘The evolving complexity of gender agreement systems’, in Francesca Di Garbo, Bruno Olsson, and Bernhard Wälchli (eds), Grammatical Gender and Linguistic Complexity, vol. II: World-Wide Comparative Studies. Berlin: Language Science Press, 15–60. doi:10.5281/zenodo.3462778 Dimmendaal, Gerrit J. (2011). Historical Linguistics and the Comparative Study of African Languages. Amsterdam: John Benjamins. Diouf, Jean Léopold (2009). Grammaire du wolof contemporain. Edition revue et complétée. Paris: L’Harmattan. Dixon, Robert M. W. (2002). Australian Languages: Their Nature and Development. Cambridge: Cambridge University Press. Dixon, Robert M. W. (2004). The Jarawara Language of Southern Amazonia. Oxford: Oxford University Press. Dixon, Robert M. W. and Alexandra Y. Aikhenvald (1999). ‘Introduction’, in Robert M. W. Dixon and Alexandra Y. Aikhenvald (eds), The Amazonian Languages. Cambridge: Cambridge University Press, 1–22.

Doneux, Jean Léonce (1975). ‘Hypothèses pour la comparative des langues atlantiques’, Africana Linguistica 6: 41–129. Doneux, Jean Léonce (1978). ‘Les liens historiques entre les langues du Sénégal’, Réalités africaines et langue française 7: 6–55. Donohue, Mark (2009). ‘Flores languages’, in Keith Brown and Sarah Ogilvie (eds), Concise Encyclopedia of Languages of the World. Oxford: Elsevier, 420–1. Donohue, Mark and Tim Denham (to appear). ‘Becoming Austronesian: Mechanisms of language dispersal across southern island Southeast Asia’, in David Gil and Antoinette Schapper (eds), Austronesian Undressed. Donohue, Mark and Johanna Nichols (2011). ‘Does phoneme inventory size correlate with population size?’, Linguistic Typology 15(2): 161–70. doi:10.1515/lity.2011.011 Dorian, Nancy (1978). ‘The fate of morphological complexity in language death: Evidence from East Sutherland Gaelic’, Language 54(3): 590–609. Dressler, Wolfgang U. (2003). ‘Degrees of grammatical productivity in inﬂectional morphology’, Italian Journal of Linguistics 15(1): 31–62. Dressler, Wolfgang U. (2005). ‘Morphological typology and ﬁrst language acquisition: Some mutual challenges’, in Geert E. Booij, Emiliano Guevara, Angela Ralli, Salvatore Sgroi, and Sergio Scalise (eds), Morphology and Linguistic Typology: On-line Proceedings of the Fourth Mediterranean Morphology Meeting (MMM4), Catania, 21–23 September 2003, 7–20. Dressler, Wolfgang U. (2011). ‘The rise of complexity in inﬂectional morphology’, Poznań Studies in Contemporary Linguistics 47(2): 159–76. doi:10.2478/psicl-2011-0013 Dressler, Wolfgang U. (2019). ‘Natural morphology’, in Mark Aronoff (ed.), The Oxford Research Encyclopedia of Linguistics. New York: Oxford University Press. doi:10.1093/acrefore/9780199384655.013.576 Dressler, Wolfgang U. and Marianne Kilani-Schoch (2016). ‘Natural morphology’, in Andrew Hippisley and Gregory Stump (eds), The Cambridge Handbook of Morphology. Cambridge: Cambridge University Press, 356–89.
Dressler, Wolfgang U., Alona Kononenko, Sabine Sommer-Lolei, Katharina Korecky-Kröll, Paulina Zydorowicz, and Laura Kamandulytė-Merfeldienė (2019). ‘Morphological richness, transparency and the evolution of morphonotactic patterns’, Folia Linguistica s40(1): 85–106. doi:10.1515/ﬂih-2019-0005 Dressler, Wolfgang U., Willi Mayerthaler, Oswald Panagl, and Wolfgang U. Wurzel (1987). Leitmotifs in Natural Morphology. Amsterdam: John Benjamins. Dressler, Wolfgang U., Sabine Sommer-Lolei, Katharina Korecky-Kröll, Reili Argus, Ineta Dabašinskienė, Laura Kamandulytė-Merfeldienė, Johanna J. Ijäs, Victoria V. Kazakovskaya, Klaus Laalo, and Evangelia Thomadaki (2019). ‘First-language acquisition of synthetic compounds in Estonian, Finnish, German, Greek, Lithuanian, Russian and Saami’, Morphology 29(3): 409–29. doi:10.1007/s11525-019-09339-0 Dryer, Matthew S. (2013). ‘Coding of nominal plurality’, in Matthew S. Dryer and Martin Haspelmath (eds), The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. URL: https://wals.info/chapter/33 Dryer, Matthew and Martin Haspelmath (eds) (2013). The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. URL: http://wals.info Duke, Janet (2010). ‘Gender reduction and loss in Germanic: The Scandinavian, Dutch, and Afrikaans case studies’, in Antje Dammel, Sebastian Kürschner, and Damaris Nübling (eds), Kontrastive germanistische Linguistik. Hildesheim: Olms, 643–72.

Ehret, Katharina and Benedikt Szmrecsanyi (2016). ‘An information-theoretic approach to assess linguistic complexity’, in Raffaela Baechler and Guido Seiler (eds), Complexity, Isolation, and Variation. Berlin: de Gruyter Mouton, 71–94. Ehrhart, Sabine (1993). Le créole français de St-Louis (le tayo) en Nouvelle-Calédonie. Hamburg: Helmut Buske. Epps, Patience (2005). ‘Areal diffusion and the development of evidentiality: Evidence from Hup’, Studies in Language 29(3): 617–50. doi:10.1075/sl.29.3.04epp Epps, Patience (2007a). ‘The Vaupés melting pot: Tukanoan inﬂuence on Hup’, in Alexandra Y. Aikhenvald and Robert M. W. Dixon (eds), Grammars in Contact: A Cross-Linguistic Typology. Oxford: Oxford University Press, 267–89. Epps, Patience (2007b). ‘Birth of a noun classiﬁcation system: The case of Hup’, in Leo Wetzels (ed.), Language Endangerment and Endangered Languages: Linguistic and Anthropological Studies with Special Emphasis on the Languages and Cultures of the Andean-Amazonian Border Area. The Netherlands: Leiden University, 107–28. Epps, Patience (2008). A Grammar of Hup. Berlin: Mouton de Gruyter. Epps, Patience (2010). ‘Linking valence change and modality: Diachronic evidence from Hup’, International Journal of American Linguistics 76(3): 335–56. doi:10.1086/652792 Epps, Patience (2013). ‘Inheritance, calquing, or independent innovation? Reconstructing morphological complexity in Amazonian numerals’, Journal of Language Contact 6: 329–57. doi:10.1163/19552629-00602007 Epps, Patience (2020). ‘Amazonian linguistic diversity and its sociocultural correlates’, in Mily Crevels and Pieter C. Muysken (eds), Language Dispersal, Diversiﬁcation, and Contact: A Global Perspective. Oxford: Oxford University Press, 275–90. Epps, Patience and Lev Michael (2017). ‘The areal linguistics of Amazonia’, in Raymond Hickey (ed.), The Cambridge Handbook of Areal Linguistics. Cambridge: Cambridge University Press, 934–63. Evans, Nicholas (2003).
Bininj Gun-Wok: A Pan-Dialectal Grammar of Mayali, Kunwinjku and Kune. Canberra: Paciﬁc Linguistics. Facundes, Sidney da Silva (2000). The Language of the Apurinã People of Brazil. The State University of New York at Buffalo PhD dissertation. Fal, Arame, Rosine Santos, and Jean Léonce Doneux (1990). Dictionnaire wolof-français. Paris: Karthala. Falkenberg, Johannes (1962). Kin and Totem: Group Relations of Aborigines in the Port Keats District. Oslo: Oslo University Press. Faye, Souleymane (2013). Grammaire dialectale du seereer. Dakar: La maison du livre universel E.L.U. Fedden, Sebastian and Greville G. Corbett (2017). ‘Gender and classiﬁers as concurrent systems: Reﬁning the typology of nominal classiﬁcation’, Glossa 2(1), 34. doi:10.5334/gjgl.177 Feist, Timothy (2015). A Grammar of Skolt Saami. Helsinki: Suomalais-Ugrilainen Seura. Feldman, Laurie B. (2000). ‘Are morphological effects distinguishable from the effects of shared meaning and shared form?’, Journal of Experimental Psychology. Learning, Memory, and Cognition 26(6): 1431–44. doi:10.1037//0278-7393.26.6.1431 Fenk-Oczlon, Gertraud and August Fenk (2008). ‘Complexity trade-offs between the subsystems of language’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 43–65.

Fenk-Oczlon, Gertraud and August Fenk (2014). ‘Complexity trade-offs do not prove the equal complexity hypothesis’, Poznań Studies in Contemporary Linguistics 50(2): 145–55. doi:10.1515/psicl-2014-0010 Ferguson, Charles A. (1971). ‘Absence of copula and the notion of simplicity: A study of normal speech, baby talk, foreigner talk, and pidgins’, in Dell Hymes (ed.), Pidginization and Creolization of Languages. Cambridge: Cambridge University Press, 141–50. Ferronha, António Luís (ed.) (1994). Tratado breve dos Rios de Guiné do Cabo-Verde. Feito pelo Capitão André Álvares d’Almada. Ano de 1594. Lisboa: Grupo de Trabalho do Ministério da Educação para as Comemorações dos Descobrimentos Portugueses. Ferry, Marie-Paule and Konstantin Pozdniakov (2001). ‘Dialectique du régulier et de l’irrégulier. Le système des classes nominales dans le groupe tenda des langues atlantiques’, in Robert Nicolaï (ed.), Leçons d’Afrique. Filiations, ruptures et reconstitution de langues. Un hommage à Gabriel Manessy. Louvain: Peeters, 153–67. Fertig, David (2000). Morphological Change Up Close: Two and a Half Centuries of Verbal Inﬂection in Nuremberg. Berlin: De Gruyter. Field, Andy, Jeremy Miles, and Zoë Field (2012). Discovering Statistics Using R. London: Sage. Finkel, Raphael and Gregory Stump (2007). ‘Principal parts and morphological typology’, Morphology 17(1): 39–75. doi:10.1007/s11525-007-9115-9 Finkel, Raphael and Gregory Stump (2009). ‘Principal parts and degrees of paradigmatic transparency’, in James P. Blevins and Juliette Blevins (eds), Analogy in Grammar: Form and Acquisition. Oxford: Oxford University Press, 13–53. Finkel, Raphael and Gregory Stump (2013). Principal parts analyzer. URL: http://www.cs.uky.edu/raphael/linguistics/analyze.html (accessed July 2016). Fiorentino, Robert and David Poeppel (2007). ‘Compound words and structure in the lexicon’, Language and Cognitive Processes 22(7): 953–1000. doi:10.1080/01690960701190215 Fitch, W. Tecumseh and Marc D.
Hauser (2004). ‘Computational constraints on syntactic processing in a nonhuman primate’, Science 303(5656): 377–80. doi:10.1126/science.1089401 Fleck, David (2007). ‘Evidentiality and double tense in Matses’, Language 83: 589–614. doi:10.1353/lan.2007.0113 Forshaw, William (2016). Little Kids, Big Verbs: The Acquisition of Murrinhpatha Bipartite Stem Verbs. University of Melbourne PhD dissertation. Fortescue, Michael (1992). ‘Morphophonemic complexity and typological stability in a polysynthetic language family’, International Journal of American Linguistics 58(2): 242–8. doi:10.1086/ijal.58.2.3519761 Fowler, Catherine S. (1972). ‘Some ecological clues to Proto-Numic homelands’, in Don D. Fowler (ed.), Great Basin Cultural Ecology: A Symposium. Reno: Desert Research Institute Publications in the Social Sciences, 105–21. Frenda, Alessio (2011). ‘Gender in Irish between continuity and change’, Folia Linguistica 45: 283–316. doi:10.1515/flin.2011.012 Gabas Jr, Nilson (1999). A Grammar of Karo, Tupi (Brazil). University of California at Santa Barbara PhD dissertation. Gal, Susan (1989). ‘Lexical innovation and loss: Restricted Hungarian’, in Nancy Dorian (ed.), Investigating Obsolescence: Studies in Language Contraction and Death. Cambridge: Cambridge University Press, 313–31. Gamble, David (1957). Elementary Wolof Grammar. London: Research Department Colonial Office. [Reprinted in Gabriel Manessy and Serge Sauvageot (eds) (1963).




Wolof et Sérèr. Études de phonétique et de grammaire descriptive. Dakar: University of Dakar Press, 131–61.] Gao, Yongming (1998). Mental Representations of Chinese Numeral Classifiers. Lehigh University PhD dissertation. Gardani, Francesco (2008). Borrowing of Inflectional Morphemes in Language Contact. Frankfurt am Main: Peter Lang. Gardani, Francesco (2012). ‘Plural across inflection and derivation, fusion and agglutination’, in Lars Johanson and Martine I. Robbeets (eds), Copies versus Cognates in Bound Morphology. Leiden: Brill, 71–97. Gardani, Francesco (2013). Dynamics of Morphological Productivity: The Evolution of Noun Classes from Latin to Italian. Leiden: Brill. Gardani, Francesco (2015). ‘Affix pleonasm’, in Peter O. Müller, Ingeborg Ohnheiser, Susan Olsen, and Franz Rainer (eds), Word-Formation. An International Handbook of the Languages of Europe, vol. 1. Berlin: De Gruyter Mouton, 537–50. Gardani, Francesco (2018). ‘On morphological borrowing’, Language and Linguistics Compass 12(10): 1–17. doi:10.1111/lnc3.12302 Gardani, Francesco, Franz Rainer, and Hans Christian Luschützky (2019). ‘Competition in morphology: A historical outline’, in Franz Rainer, Francesco Gardani, Wolfgang U. Dressler, and Hans Christian Luschützky (eds), Competition in Inflection and Word-Formation. Cham: Springer, 3–36. doi:10.1007/978-3-030-02550-2_1 Gblem-Poidi, Massanvi Honorine (2007). ‘Nominal classes and concord in Igo (Ahlon)’, in Mary Esther Kropp Dakubu, George Akanlig-Pare, Kweku E. Osam, and Kofi K. Saah (eds), Proceedings of the Annual Colloquium of the Legon-Trondheim Linguistics Project 10–20 January 2005, vol. 4. Legon: Linguistics Department, University of Ghana, 52–60. Gell-Mann, Murray (1994). The Quark and the Jaguar: Adventures in the Simple and the Complex. London: Little Brown. Gell-Mann, Murray (1995). ‘What is complexity?’, Complexity 1(1): 16–19. Gervain, Judith and Jacques Mehler (2010). 
‘Speech perception and language acquisition in the first year of life’, Annual Review of Psychology 61: 191–218. doi:10.1146/annurev.psych.093008.100408 Gibbons, Jean Dickinson (1992). Nonparametric Measures of Association. Newbury Park, CA: Sage. Gippert, Jost, Wolfgang Schulze, Zaza Aleksidze, and Jean-Pierre Mahé (2009). The Caucasian Albanian Palimpsests of Mount Sinai. Turnhout, Belgium: Brepols. Givón, Talmy (1971). ‘Historical syntax and synchronic morphology: An archeologist’s fieldtrip’, Proceedings of the Chicago Linguistic Society 7: 394–415. Goertzel, Ben (1994). Chaotic Logic: Language, Thought, and Reality from the Perspective of Complex Systems Science. Boston: Springer. Goldsmith, John (2001). ‘Unsupervised learning of the morphology of a natural language’, Computational Linguistics 27(2): 153–98. doi:10.1162/089120101750300490 Goldsmith, John (2011). ‘The evaluation metric in Generative Grammar.’ Paper presented at the 50th anniversary celebration for the MIT Department of Linguistics. Gomez-Imbert, Elsa (1996). ‘When animals become “rounded” and “feminine”: Conceptual categories and linguistic classification in a multilingual setting’, in John J. Gumperz and Stephen C. Levinson (eds), Rethinking Linguistic Relativity. Cambridge: Cambridge University Press, 438–69. Gomez-Imbert, Elsa (2007). ‘Tukanoan nominal classification: The Tatuyo system’, in Leo Wetzels (ed.), Language Endangerment and Endangered Languages: Linguistic and


Anthropological Studies with Special Emphasis on the Languages and Cultures of the Andean-Amazonian Border Area. Leiden: Leiden University, 401–28. Good, Jeff (2012a). ‘How to become a “Kwa” noun’, Morphology 22: 293–335. doi:10.1007/s11525-011-9197-2 Good, Jeff (2012b). ‘Typologizing grammatical complexities: Or why creoles may be paradigmatically simple but syntagmatically average’, Journal of Pidgin and Creole Languages 27(1): 1–47. doi:10.1075/jpcl.27.1.01goo Good, Jeff (2015). ‘Paradigmatic complexity in pidgins and creoles’, Word Structure 8(2): 184–227. doi:10.3366/word.2015.0081 Good, Jeff (2016). The Linguistic Typology of Templates. Cambridge: Cambridge University Press. Grant, Anthony P. (1996). ‘The evolution of functional categories in Grande Ronde Chinook Jargon: Ethnolinguistic and grammatical considerations’, in Philip Baker and Anand Syea (eds), Changing Meanings, Changing Functions: Papers Relating to Grammaticalization in Creole Languages. London: University of Westminster Press, 225–42. Grant, Anthony (2009). ‘Admixture, structural transmission, simplicity and complexity’, in Nicholas Faraclas and Thomas Klein (eds), Simplicity and Complexity in Creoles and Pidgins. London: Battlebridge Publications, 125–52. Green, Ian (2003). ‘The genetic status of Murrinh-patha’, in Nicholas Evans (ed.), The Non-Pama-Nyungan Languages of Northern Australia. Canberra: Pacific Linguistics, 125–58. Greenberg, Joseph H. (1954). ‘A quantitative approach to the morphological typology of language’, in Robert F. Spencer (ed.), Method and Perspective in Anthropology: Papers in Honor of Wilson D. Wallis. Minneapolis: Minnesota University Press, 192–220. Greenberg, Joseph H. (1960). ‘A quantitative approach to the morphological typology of language’, International Journal of American Linguistics 26(3): 178–94. doi:10.1086/464575 Grijns, Cornelis D. (1991). Jakarta Malay: A Multidimensional Approach to Spatial Variation. Leiden: KITLV Press. 
Grinevald, Colette and Frank Seifart (2004). ‘Noun classes in African and Amazonian languages: Towards a comparison’, Linguistic Typology 8: 243–85. doi:10.1515/lity.2004.007 Grünwald, Peter D. (2007). The Minimum Description Length Principle. Cambridge, MA: The MIT Press. Guérin, Maximilien (2011). Le syntagme nominal en wolof. Une approche typologique. Paris: Université Sorbonne Nouvelle—Paris 3 MA thesis. Guillaume, Antoine (2008). A Grammar of Cavineña. Berlin: Mouton de Gruyter. Guillaume, Antoine (2016). ‘Associated motion in South America: Typological and areal perspectives’, Linguistic Typology 20: 81–177. doi:10.1515/lingty-2016-0003 Guillaume, Antoine and Françoise Rose (2010). ‘Sociative causative markers in South American languages: A possible areal feature’, in Franck Floricic (ed.), Essais de typologie et de linguistique générale, Mélanges offerts à Denis Creissels. Lyon: ENS Éditions, 383–402. Guy, Gregory (1991). ‘Explanation in variable phonology: An exponential model of morphological constraints’, Language Variation and Change 3: 1–22. doi:10.1017/S0954394500000429 Hale, Kenneth (1969). Walbiri Conjugations. Cambridge, MA: MIT.




Halle, Morris (1994). ‘The Russian declension: An illustration of the theory of Distributed Morphology’, in Jennifer S. Cole and Charles Kisseberth (eds), Perspectives in Phonology. Stanford: CSLI Publications, 29–60. Hammarström, Harald, Robert Forkel, and Martin Haspelmath (eds) (2019). Glottolog 3.4. Jena: Max Planck Institute for the Science of Human History. URL: https://glottolog.org Hansson, Inga-Lill (2003). ‘Akha’, in Randy LaPolla and Graham Thurgood (eds), The Sino-Tibetan Languages. London: Routledge, 236–51. Harris, Alice (2004). ‘History in support of synchrony’, in Charles Chang, Michael J. Houser, Yuni Kim, David Mortensen, and Mischa Park-Doob (eds), Proceedings of the Berkeley Linguistics Society. Berkeley Linguistics Society, 142–59. Harris, Alice (2017). Multiple Exponence. Oxford: Oxford University Press. Harris, Alice and Lyle Campbell (1995). Historical Syntax in Cross-linguistic Perspective. Cambridge: Cambridge University Press. Haspelmath, Martin (2009). ‘An empirical test of the Agglutination Hypothesis’, in Sergio Scalise, Elisabetta Magni, and Antonietta Bisetto (eds), Universals of Language Today. Dordrecht: Springer, 13–29. Haspelmath, Martin (2011). ‘The indeterminacy of word segmentation and the nature of morphology and syntax’, Folia Linguistica 45(1): 31–80. doi:10.1515/flin-2017-1005 Haspelmath, Martin, Matthew Dryer, David Gil, and Bernard Comrie (eds) (2005). The World Atlas of Language Structures. Oxford: Oxford University Press. Haspelmath, Martin and Thomas Müller-Bardey (2004). ‘Valency change’, in Geert E. Booij, Christian Lehmann, Joachim Mugdan, and Stavros Skopeteas (in collaboration with Wolfgang Kesselheim) (eds), Morphology: A Handbook on Inflection and Word Formation, vol. 2. Berlin: de Gruyter, 1130–45. Haspelmath, Martin and Andrea D. Sims (2010). Understanding Morphology. 2nd ed. London: Hodder Education. Haude, Katharina (2006). A Grammar of Movima. Universiteit Nijmegen PhD dissertation. 
Hauser, Marc D., Noam Chomsky, and W. Tecumseh Fitch (2002). ‘The faculty of language: What is it, who has it, and how did it evolve?’, Science 298(5598): 1569–79. doi:10.1126/science.298.5598.1569 Hawkins, John A. (2004). Efficiency and Complexity in Grammars. New York: Oxford University Press. Hawkins, John A. (2007). ‘Processing typology and why psychologists need to know about it’, New Ideas in Psychology 25: 87–107. doi:10.1016/j.newideapsych.2007.02.003 Hawkins, John A. (2014). Cross-Linguistic Variation and Efficiency. Oxford: Oxford University Press. Hay, Jennifer (2001). ‘Lexical frequency in morphology: Is everything relative?’, Linguistics 39(6): 1041–70. doi:10.1515/ling.2001.041 Hay, Jennifer (2003). Causes and Consequences of Word Structure. New York: Routledge. Hay, Jennifer and Laurie Bauer (2007). ‘Phoneme inventory size and population size’, Language 83(2): 388–400. doi:10.1353/lan.2007.0071 Haynie, Hannah, Claire Bowern, Patience Epps, Jane Hill, and Patrick McConvell (2014). ‘Wanderwörter in languages of the Americas and Australia’, Ampersand 1: 1–18. doi:10.1016/j.amper.2014.10.001 Hazaël-Massieux, Marie-Christine (2002). ‘Les créoles à base française: une introduction’, Travaux Interdisciplinaires du Laboratoire Parole et Langage d’Aix-en-Provence (TIPA) 21: 63–86. Hengeveld, Kees and Sterre Leufkens (2018). ‘Transparent and non-transparent languages’, Folia Linguistica 52(1): 139–75. doi:10.1515/flin-2018-0003


Henri, Fabiola (2010). A Constraint-Based Approach to Verbal Constructions in Mauritian. University of Mauritius and Université Paris Diderot PhD dissertation. Henri, Fabiola (2012). ‘Attenuative reduplication in Mauritian’, in Enoch Aboh, Norval Smith, and Anne Zribi-Hertz (eds), The Morphosyntax of Reiteration. Amsterdam: John Benjamins, 203–34. Henri, Fabiola (forthcoming). ‘Morphomic structure in Mauritian: On change, complexity and creolization’, Morphology. Henri, Fabiola and Alain Kihm (2015). ‘The morphology of TMA marking in creole languages: A comparative study’, Word Structure 8(2): 248–82. doi:10.3366/word.2015.0083 Henri, Fabiola, Jean-Marie Marandin, and Anne Abeillé (2008). ‘Information structure coding in Mauritian: Verum Focus expressed by long forms of verbs’. Paper presented at the Workshop on Predicate Focus, Verum Focus, Verb Focus, University of Potsdam. Hill, Jane H. (2001). ‘Proto-Uto-Aztecan: A community of cultivators in Central America?’, American Anthropologist 103: 913–34. doi:10.1525/aa.2001.103.4.913 Hill, Jane H. (2010). ‘New evidence for a Mesoamerican homeland for Proto-Uto-Aztecan’, PNAS 107(11): E33. doi:10.1073/pnas.0914473107 Hill, Nathan (2014). ‘Grammatically conditioned sound change’, Language and Linguistics Compass 8: 211–29. doi:10.1111/lnc3.12073 Hippisley, Andrew, Marina Chumakina, Greville G. Corbett, and Dunstan Brown (2004). ‘Suppletion: Frequency, categories and distribution of stems’, Studies in Language 28(2): 387–418. doi:10.1075/sl.28.2.05hip Hock, Hans Henrich and Brian D. Joseph (1996). Language History, Language Change, and Language Relationship. Berlin: Walter de Gruyter. Hockett, Charles F. (1947). ‘Problems of morphemic analysis’, Language 23(4): 321–43. Hockett, Charles F. (1958). A Course in Modern Linguistics. New York: Macmillan. Hodge, Carleton (1970). ‘The linguistic cycle’, Language Sciences 13: 1–7. [Reprinted in Scott Noegel and Alan S. 
Kaye (eds) (2004), Afroasiatic Linguistics, Semitics, and Egyptology: Selected Writings of Carleton T. Hodge, Bethesda, MD: CDL Press, 1–17.] Hopper, Paul (1990). ‘Where do words come from?’, in William Croft, Keith Denning, and Suzanne Kemmer (eds), Studies in Typology and Diachrony: Papers Presented to Joseph H. Greenberg on his 75th Birthday. Amsterdam: John Benjamins, 151–60. Hualde, José Ignacio, Gorka Elordieta, and Arantzazu Elordieta (1994). The Basque Dialect of Lekeitio. Bilbo: Universidad del País Vasco/Euskal Herriko Unibertsitatea. Hualde, José Ignacio and Jon Ortiz de Urbina (2003). A Grammar of Basque. Berlin: Mouton de Gruyter. Huber, Christian (2011). ‘Some notes on gender and number marking in Shumcho’, in Gerda Lechleitner and Christian Liebl (eds), Jahrbuch des Phonogrammarchivs, vol. 2. Göttingen: Cuvillier Verlag, 52–90. Hudson, Carla L. and Elissa L. Newport (1999). ‘Creolization: Could adults really have done it all?’, in Annabel Greenhill, Heather Littlefield, and Cheryl Tano (eds), Proceedings of the 23rd Annual Boston University Conference on Language Development. Somerville: Cascadilla Press, 265–76. Hudson Kam, Carla L. and Elissa L. Newport (2005). ‘Regularizing unpredictable variation: The roles of adult and child learners in language formation and change’, Language Learning and Development 1(2): 151–95. doi:10.1080/15475441.2005.9684215 Hudson Kam, Carla L. and Elissa L. Newport (2009). ‘Getting it right by getting it wrong: When learners change languages’, Cognitive Psychology 59(1): 30–66. doi:10.1016/j.cogpsych.2009.01.001 Huldén, Lars (1972). ‘Genussystemet i Karleby och Nedervetil’, Folkmålsstudier 22: 47–82.




Hull, Geoffrey (1998). ‘The basic lexical affinities of Timor’s Austronesian languages: A preliminary investigation’, Studies in the Languages and Cultures of East Timor 1: 97–174. Hull, Geoffrey (1999). Standard Tetum-English Dictionary. Sydney: Allen & Unwin. Hultman, Oskar Fredrik (1894). De östsvenska dialekterna. Helsinki: Svenska landsmålsföreningen. Humboldt, Wilhelm von (1836). Über die Verschiedenheit des menschlichen Sprachbaues und ihren Einfluss auf die geistige Entwickelung des Menschengeschlechts. Berlin: F. Dümmler. Hyman, Larry M. (2004). ‘How to become a Kwa verb’, Journal of West African Languages 30: 69–88. Igartua, Iván (2019). ‘Loss of grammatical gender and language contact’, Diachronica 36: 181–221. doi:10.1075/dia.17004.iga Irvine, Judith (1978). ‘Wolof noun classification: The social setting of divergent change’, Language in Society 7: 37–64. doi:10.1017/S0047404500005327 Irvine, Judith (2011). ‘Société et communication chez les Wolof à travers le temps et l’espace’, in Anna M. Diagne, Sascha Kesseler, and Christian Meyer (eds), Communication wolof et société sénégalaise. Héritage et création. Paris: L’Harmattan, 37–70. Jakobson, Roman (1929). Remarques sur l’évolution phonologique du russe comparée à celle des autres langues slaves. Praha: Jednota československých matematiků a fysiků. Jakobson, Roman (1959). ‘On linguistic aspects of translation’, in Reuben A. Brower (ed.), On Translation. Cambridge, MA: Harvard University Press, 232–9. Jamieson, Carole Ann (1982). ‘Conflated subsystems marking person and aspect in Chiquihuitlán Mazatec verbs’, International Journal of American Linguistics 48(2): 139–67. doi:10.1086/465725 Janda, Laura A. (1994). ‘The spread of athematic 1sg -m in the major West Slavic languages’, The Slavic and East European Journal 38(1): 90–119. doi:10.2307/308549 Janhunen, Juha (2008). ‘Mongolic as an expansive language family’, in Tokusu Kurebito (ed.), Past and Present Dynamics: The Great Mongolian State. 
Tokyo: Tokyo University of Foreign Studies, Research Institute for Languages and Cultures of Asia and Africa, 127–37. Janse, Mark and Sijmen Tol (eds) (2003). Language Death and Language Maintenance: Theoretical, Practical and Descriptive Approaches. Amsterdam: John Benjamins. Jespersen, Otto (1949). A Modern English Grammar on Historical Principles. London: Allen & Unwin. Joanisse, Marc F. and Mark S. Seidenberg (2005). ‘Imaging the past: Neural activation in frontal and temporal regions during regular and irregular past-tense processing’, Cognitive, Affective & Behavioral Neuroscience 5(3): 282–96. Johnson, Jacqueline S., Kenneth D. Shenkman, Elissa L. Newport, and Douglas L. Medin (1996). ‘Indeterminacy in the grammar of adult language learners’, Journal of Memory and Language 35: 335–52. doi:10.1006/jmla.1996.0019 Joseph, Brian D. and Richard D. Janda (1988). ‘The how and why of diachronic morphologization and demorphologization’, in Michael Hammond and Michael Noonan (eds), Theoretical Morphology. New York: Academic Press, 193–210. Joseph, John E. and Frederick J. Newmeyer (2012). ‘ “All languages are equally complex”: The rise and fall of a consensus’, Historiographia Linguistica 39(2–3): 341–68. doi:10.1075/hl.39.2-3.08jos


Juola, Patrick (1998). ‘Measuring linguistic complexity: The morphological tier’, Journal of Quantitative Linguistics 5: 206–13. doi:10.1080/09296179808590128 Karatsareas, Petros (2009). ‘The loss of grammatical gender in Cappadocian Greek’, Transactions of the Philological Society 107: 196–230. doi:10.1111/j.1467-968X.2009.01217.x Karatsareas, Petros (2014). ‘On the diachrony of gender in Asia Minor Greek: The development of semantic agreement in Pontic’, Language Sciences 43: 77–101. doi:10.1016/j.langsci.2013.10.005 Kelly, Barbara, Gillian Wigglesworth, Rachel Nordlinger, and Joseph Blythe (2014). ‘The acquisition of polysynthetic languages’, Language and Linguistics Compass 8(2): 51–64. doi:10.1111/lnc3.12062 Kendall, Maurice and Jean Dickinson Gibbons (1990). Rank Correlation Methods. 5th ed. Oxford: Oxford University Press. Kibrik, Aleksandr E. (1991). ‘Organizing principles for nominal paradigms in Daghestanian languages: Comparative and typological observations’, in Frans Plank (ed.), Paradigms: The Economy of Inflection. Berlin: Mouton de Gruyter, 255–74. Kibrik, Aleksandr E. (2003). ‘Nominal inflection galore: Daghestanian, with side glances at Europe and the world’, in Frans Plank (ed.), Noun Phrase Structure in the Languages of Europe. Berlin: Mouton de Gruyter, 37–112. Kibrik, Andrej A. (2012). ‘What’s in the head of head-marking languages?’, in Pirkko Suihkonen, Bernard Comrie, and Valery Solovyev (eds), Argument Structure and Grammatical Relations: A Crosslinguistic Typology. Amsterdam: John Benjamins, 211–40. Kielhorn, Franz (1871). The Paribhāṣenduśekhara of Nāgojībhaṭṭa (2 vols). Bombay: Indu-Prakāsh Press. Kihm, Alain (1994). Kriyol Syntax. Amsterdam: John Benjamins. Kihm, Alain (2014). ‘Theories of morphology and theories of creole emergence: The inner connection’, PAPIA, São Paulo, 24(1): 43–89. Killian, Don (2015). Topics in Uduk Phonology and Morphosyntax. University of Helsinki PhD dissertation. 
Kirby, Simon, Hannah Cornish, and Kenny Smith (2008). ‘Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language’, Proceedings of the National Academy of Sciences 105(31): 10681–6. doi:10.1073/pnas.0707835105 Kiso, Andrea (2012). Tense and Aspect in Chichewa, Citumbuka and Cisena: A Description and Comparison of the Tense-Aspect Systems in Three Southeastern Bantu Languages. Stockholm University dissertation. Klausenburger, Jurgen (1976). ‘(De)morphologization in Latin’, Lingua 40(4): 305–20. doi:10.1016/0024-3841(76)90082-6 Klingler, Thomas (2003). If I Could Turn My Tongue Like That: The Creole of Pointe Coupee Parish, Louisiana. Baton Rouge: Louisiana State University Press. Kobès, Aloys (1869). Grammaire de la langue volofe. Ouvrage nouveau. Saint-Joseph de Ngasobil: Imprimerie de la Mission. Kobès, Aloys (1875). Dictionnaire volof-francais. Saint-Joseph de Ngasobil: Mission Catholique [cited from the new edition: Kobès, Aloys and Olivier Abiven (1923), Dictionnaire volof-francais. Nouvelle édition revue et considerablement augmentée par le R. P. O. Abiven. Dakar: Mission Catholique]. Koopman, Hilda and Claire Lefebvre (1981). ‘Haitian Creole pu’, in Pieter C. Muysken (ed.), Generative Studies on Creole Languages. Dordrecht: Foris, 201–21. Koptjevskaja-Tamm, Maria and Bernhard Wälchli (2001). ‘The Circum-Baltic languages: An areal-typological approach’, in Östen Dahl and Maria Koptjevskaja-Tamm (eds),




Circum-Baltic Languages, vol. 2: Grammar and Typology. Amsterdam: John Benjamins, 615–750. Kortmann, Bernd and Benedikt Szmrecsanyi (eds) (2012). Linguistic Complexity: Second Language Acquisition, Indigenization, Contact. Berlin: De Gruyter. Krasnoukhova, Olga (2012). The Noun Phrase in the Languages of South America. Universiteit Nijmegen PhD dissertation. Kreyer, Rolf (2003). ‘Genitive and of-construction in modern written English: Processability and human involvement’, International Journal of Corpus Linguistics 8(2): 169–207. doi:10.1075/ijcl.8.2.02kre Kusters, Wouter (2003). Linguistic Complexity: The Influence of Social Change on Verbal Inflections. Utrecht: LOT. Kusters, Wouter (2008). ‘Complexity in linguistic theory, language learning and language change’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 3–22. Labouret, Henri (1935). ‘Remarques sur la langue des wolof’, in Nicolas Leca (ed.), Les pêcheurs de Guet N’dar. Paris: Larose, 16–27. [Reprinted in Gabriel Manessy and Serge Sauvageot (eds) (1963). Wolof et Sérèr. Études de phonétique et de grammaire descriptive. Dakar: University of Dakar Press, 45–56.] Labov, William (1963). ‘The social motivation of a sound change’, Word 19: 273–309. Ladd, D. Robert, Seán G. Roberts, and Dan Dediu (2015). ‘Correlational studies in typological and historical linguistics’, Annual Review of Linguistics 1: 221–41. doi:10.1146/annurev-linguist-030514-124819 Landaburu, Jon (2005). ‘Expresión gramatical de lo epistémico en algunas lenguas del norte de Suramerica’, Proceedings of the Conference on Indigenous Languages of Latin America, 1–13. URL: lanic.utexas.edu/project/etext/llilas/cilla/landaburu2.pdf Leclerc, Jacques (2015). L’aménagement linguistique dans le monde. URL: http://www.axl.cefan.ulaval.ca/afrique/senegal.htm Leer, Jeff (1991). 
‘Evidence for a Northern Northwest Coast language area: Promiscuous number marking and periphrastic possessive constructions in Haida, Eyak, and Aleut’, International Journal of American Linguistics 57(2): 158–93. doi:10.1086/ijal.57.2.3519765 Lefebvre, Claire (1998). Creole Genesis and the Acquisition of Grammar. Cambridge: Cambridge University Press. Lefebvre, Claire and Anne-Marie Brousseau (2002). Fongbe. Berlin: Mouton de Gruyter. Lehmann, Christian (1985). ‘Grammaticalization: Synchronic variation and diachronic change’, Lingua e Stile 20: 303–18. Lewis, Geoffrey L. (2001). Turkish Grammar. 2nd ed. Oxford: Oxford University Press. Lewis, M. Paul, Gary F. Simons, and Charles D. Fennig (eds) (2015). Ethnologue: Languages of the World. 18th ed. Dallas, TX: SIL International. URL: http://www.ethnologue.com Li, Charles N. and Sandra A. Thompson (1976). ‘Development of the causative in Mandarin Chinese: Interaction of diachronic processes in syntax’, in Masayoshi Shibatani (ed.), The Grammar of Causative Constructions. New York: Academic Press, 477–92. Li, Charles N. and Sandra A. Thompson (1981). Mandarin Chinese: A Functional Reference Grammar. Berkeley, CA: University of California Press. Lindström, Eva (2008). ‘Language complexity and interlinguistic difficulty’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 217–42.


Loporcaro, Michele (2018). Gender from Latin to Romance: History, Geography, Typology. Oxford: Oxford University Press. Loporcaro, Michele, Francesco Gardani, and Alberto Giudici (forthcoming). ‘Contact-induced complexification in the gender system of Istro-Romanian’, Journal of Language Contact. Loporcaro, Michele and Tania Paciaroni (2011). ‘Four gender-systems in Indo-European’, Folia Linguistica 45(2): 389–434. doi:10.1515/flin.2011.015 Lowe, Ivan (1999). ‘Nambiquara’, in Robert M. W. Dixon and Alexandra Y. Aikhenvald (eds), The Amazonian Languages. Cambridge: Cambridge University Press, 269–92. Ludwig, Ralph, Sylviane Telchid, and Florence Bruneau-Ludwig (eds) (2001). Corpus créole. Hamburg: Helmut Buske. Luís, Ana R. (2009). ‘The loss and survival of inflectional morphology: Contextual vs. inherent inflection in creoles’, in Sonia Colina, Antxon Olarrea, and Ana Carvalho (eds), Romance Linguistics 2009. Amsterdam: John Benjamins, 323–36. Luís, Ana R. (2014). ‘Inflectional structure without morphemes: Similarities between creoles and non-creoles’, PAPIA, São Paulo, 24(2): 381–406. Lüpke, Friederike and Mary Raymond (eds) (2010). Documenting Atlantic-Mande Convergence and Diversity. Special issue of the Journal of language contact—THEMA 3. Lupyan, Gary and Rick Dale (2010). ‘Language structure is partly determined by social structure’, PLoS ONE 5(1): e8559. doi:10.1371/journal.pone.0008559 MacWhinney, Brian, Elizabeth Bates, and Reinhold Kliegl (1984). ‘Cue validity and sentence interpretation in English, German, and Italian’, Journal of Verbal Learning and Verbal Behavior 23(2): 127–50. doi:10.1016/S0022-5371(84)90093-8 Madsen, David and David Rhode (1994). Across the West: Human Population Movement and the Expansion of the Numa. Salt Lake City, UT: University of Utah Press. Maiden, Martin (2005). ‘Morphological autonomy and diachrony’, in Geert E. Booij and Jaap van Marle (eds), Yearbook of Morphology 2004. Dordrecht: Springer, 137–75. 
doi:10.1007/1-4020-2900-4_6 Maiden, Martin (2013). ‘ “Semi-autonomous” morphology? A problem in the history of the Italian (and Romanian) verb’, in Silvio Cruschina, Martin Maiden, and John C. Smith (eds), The Boundaries of Pure Morphology: Diachronic and Synchronic Perspectives. Oxford: Oxford University Press, 24–44. Maiden, Martin (2018). The Romance Verb: Morphomic Structure and Diachrony. Oxford: Oxford University Press. Maiden, Martin, John C. Smith, Maria Goldbach, and Marc-Olivier Hinzelin (eds) (2011). Morphological Autonomy: Perspectives from Romance Inflectional Morphology. Oxford: Oxford University Press. Maitz, Péter and Attila Németh (2014). ‘Language contact and morphosyntactic complexity: Evidence from German’, Journal of Germanic Linguistics 26(1): 1–29. doi:10.1017/S1470542713000184 Malone, Terrell A. (1988). ‘The origin and development of Tuyuca evidentials’, International Journal of American Linguistics 54: 119–40. doi:10.1086/466079 Manessy, Gabriel and Serge Sauvageot (eds) (1963). Wolof et Sérèr. Études de phonétique et de grammaire descriptive. Dakar: University of Dakar Press. Mansfield, John (2014). Polysynthetic Sociolinguistics: The Language and Culture of Murrinh Patha Youth. Australian National University PhD dissertation. Mansfield, John (2015a). ‘Consonant lenition as a sociophonetic variable in Murrinh Patha (Australia)’, Language Variation and Change 27(2): 203–25. doi:10.1017/S0954394515000046




Mansfield, John (2015b). ‘Morphotactic variation, prosodic domains and the changing structure of the Murrinhpatha verb’, Asia-Pacific Language Variation 1(2): 163–89. doi:10.1075/aplv.1.2.03man Mansfield, John (2016). ‘Intersecting formatives and inflectional predictability: How do speakers and learners predict the correct form of Murrinhpatha verbs?’, Word Structure 9(2): 183–214. doi:10.3366/word.2016.0093 Mansfield, John (2019). Murrinhpatha Morphology and Phonology. Berlin: De Gruyter Mouton. Marschner, Ian C. (2011). ‘glm2: Fitting generalized linear models with convergence problems’, The R Journal 3(2): 12–15. Marslen-Wilson, William D. (2007). ‘Morphological processes in language comprehension’, in M. Gareth Gaskell (ed.), The Oxford Handbook of Psycholinguistics. Oxford: Oxford University Press, 175–93. Marzi, Claudia, Marcello Ferro, Ouafae Nahli, Patrizia Belik, Stavros Bompolas, and Vito Pirrelli (2018). ‘Evaluating inflectional complexity crosslinguistically: A processing perspective’, in Nicoletta Calzolari (ed.), LREC 2018: Eleventh International Conference on Language Resources and Evaluation: May 7–12, 2018, Miyazaki, Japan. Paris: European Language Resources Association ELRA, article n. 745. Matras, Yaron (1998). ‘Utterance modifiers and universals of grammatical borrowing’, Linguistics 36: 281–331. doi:10.1515/ling.1998.36.2.281 Matras, Yaron (2009). Language Contact. Cambridge: Cambridge University Press. Matras, Yaron and Jeanette Sakel (eds) (2007). Grammatical Borrowing in Cross-Linguistic Perspective. Berlin: Mouton de Gruyter. Matthews, Peter H. (1972). Inflectional Morphology. Cambridge: Cambridge University Press. Matthews, Peter H. (1991). Morphology. 2nd ed. Cambridge: Cambridge University Press. McGregor, William (2010). ‘Optional ergative case marking systems in a typological-semiotic perspective’, Lingua 120: 1610–36. doi:10.1016/j.lingua.2009.05.010 McGregor, William and Jean-Christophe Verstraete (2010). 
‘Optional ergative marking and its implications for linguistic theory’, Lingua 120: 1607–9. doi:10.1016/j.lingua.2009.05.009 Mc Laughlin, Fiona (1997). ‘Noun classification in Wolof: When affixes are not renewed’, Studies in African Linguistics 26(1): 1–28. Mc Laughlin, Fiona (2000). ‘Consonant mutation and reduplication in Seereer-Siin’, Phonology 17: 333–63. doi:10.1017/S0952675701003955 Mc Laughlin, Fiona (2001). ‘Dakar Wolof and the configuration of an urban identity’, Journal of African Cultural Studies 14(2): 153–72. doi:10.1080/13696810120107104 McLeod, A. Ian (2011). ‘Package “Kendall”. R package documentation’. URL: https://cran.r-project.org/web/packages/Kendall/Kendall.pdf McWhorter, John H. (1994). ‘From focus marker to copula in Swahili’, in Kevin E. Moore, David Peterson, and Comfort Wentum (eds), Proceedings of the Berkeley Linguistics Society, Special Session on Historical Issues in African Linguistics. Berkeley, CA: Berkeley Linguistics Society, 57–66. McWhorter, John H. (1998). ‘Identifying the creole prototype: Vindicating a typological claim’, Language 74: 788–818. doi:10.2307/417003 McWhorter, John H. (2001). ‘The world’s simplest grammars are creole grammars’, Linguistic Typology 5(2–3): 125–66. doi:10.1515/lity.2001.001 McWhorter, John H. (2002). ‘What happened to English?’, Diachronica 19: 217–72. doi:10.1075/dia.19.2.02wha

McWhorter, John H. (2005). Deﬁning Creole. New York: Oxford University Press. McWhorter, John H. (2007). Language Interrupted: Signs of Non-Native Acquisition in Standard Language Grammars. New York: Oxford University Press. McWhorter, John H. (2008). ‘Why does a language undress? Strange cases in Indonesia’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 167–90. McWhorter, John H. (2011). Linguistic Simplicity and Complexity: Why Do Languages Undress? Berlin: Walter de Gruyter. McWhorter, John H. (2012). ‘Case closed? Testing the Feature Pool Hypothesis’, Journal of Pidgin and Creole Languages 27: 171–82. doi:10.1075/jpcl.27.1 McWhorter, John H. (2016). ‘Is radical analyticity normal? Implications of Niger-Congo and Sino-Tibetan for typology and diachronic theory’, in Elly van Gelderen (ed.), Cyclical Change Continued. Amsterdam: John Benjamins, 49–91. doi:10.1075/la.227.03mcw McWhorter, John H. (2018). The Creole Debate. Cambridge: Cambridge University Press. McWhorter, John H. (2019). ‘The radically isolating languages of Flores: A challenge to diachronic theory’, Journal of Historical Linguistics 9: 177–207. doi:10.1075/jhl.16021.mcw Meakins, Felicity (2009). ‘The case of the shifty ergative marker: A pragmatic shift in the ergative marker in one Australian mixed language’, in Jóhanna Barðdal and Shobhana L. Chelliah (eds), The Role of Semantic, Pragmatic, and Discourse Factors in the Development of Case. Amsterdam: John Benjamins, 59–91. Meakins, Felicity (2011). Case Marking in Contact: The Development and Function of Case Morphology in Gurindji Kriol. Amsterdam: John Benjamins. Meakins, Felicity (2013). ‘Gurindji Kriol’, in Susanne Maria Michaelis, Philippe Maurer, Martin Haspelmath, and Magnus Huber (eds), The Survey of Pidgin and Creole Languages, vol. III: Contact Languages Based on Languages from Africa, Asia, Australia and the Americas. 
Oxford: Oxford University Press, 131–9. Meakins, Felicity (2015). ‘From absolutely optional to only nominally ergative: The life cycle of the Gurindji Kriol ergative suffix’, in Francesco Gardani, Peter Arkadiev, and Nino Amiridze (eds), Borrowed Morphology. Berlin: Mouton de Gruyter, 189–218. Meakins, Felicity, Patrick McConvell, Erika Charola, Norm McNair, Helen McNair, and Lauren Campbell (2013). Gurindji to English Dictionary. Batchelor, Australia: Batchelor Press. Meakins, Felicity and Rachel Nordlinger (2014). A Grammar of Bilinarra: An Australian Aboriginal Language of the Northern Territory. Berlin: Mouton de Gruyter. Meakins, Felicity and Carmel O’Shannessy (2010). ‘Ordering arguments about: Word order and discourse motivations in the development and use of the ergative marker in two Australian mixed languages’, Lingua 120(7): 1693–713. doi:10.1016/j.lingua.2009.05.013 Meakins, Felicity, Xia Hua, Cassandra Algy, and Lindell Bromham (2019). ‘Birth of a contact language did not favor simplification’, Language 95(2): 294–332. doi:10.1353/lan.2019.0032 Meeuwis, Michael (2013). ‘Lingala’, in Susanne Maria Michaelis, Philippe Maurer, Martin Haspelmath, and Magnus Huber (eds), The Survey of Pidgin and Creole Languages, vol. III: Contact Languages Based on Languages from Africa, Asia, Australia and the Americas. Oxford: Oxford University Press, 25–33. Meijer, Guus and Pieter C. Muysken (1977). ‘On the beginnings of pidgin and creole studies: Schuchardt and Hesseling’, in Albert Valdman (ed.), Pidgin and Creole Linguistics. Bloomington: Indiana University Press, 21–48. Mel’čuk, Igor (1994). ‘Suppletion: Toward a logical analysis of the concept’, Studies in Language 18: 339–410. doi:10.1075/sl.18.2.03mel

Merrill, William L. (2012). ‘The historical linguistics of Uto-Aztecan agriculture’, Anthropological Linguistics 54(3): 203–60. doi:10.1353/anl.2012.0017 Meyerhoff, Miriam (2009). ‘Animacy in Bislama: Using quantitative methods to evaluate transfer of a substrate feature’, in James Stanford and Dennis Preston (eds), Variation in Indigenous Minority Languages. Amsterdam: John Benjamins, 369–96. Michael, Lev (2008). Nanti Evidential Practice: Language, Knowledge, and Social Action in an Amazonian Society. University of Texas at Austin PhD dissertation. Michael, Lev, William Chang, and Tammy Stark (2014). ‘Exploring phonological areality in the Circum-Andean region using a naive Bayes classifier’, Language Dynamics and Change 4(1): 27–86. doi:10.1163/22105832-00401004 Miestamo, Matti (2008). ‘Grammatical complexity in a cross-linguistic perspective’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 23–41. Miestamo, Matti (2017). ‘Linguistic diversity and complexity’, Lingue e Linguaggio 16(2): 227–54. Miestamo, Matti, Kaius Sinnemäki, and Fred Karlsson (eds) (2008). Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins. Mihas, Elena (2015). A Grammar of Alto Perené (Arawak). Berlin: De Gruyter Mouton. Milin, Petar, Victor Kuperman, Aleksandar Kostić, and R. Harald Baayen (2009). ‘Words and paradigms bit by bit: An information-theoretic approach to the processing of paradigmatic structure in inflection and derivation’, in James P. Blevins and Juliette Blevins (eds), Analogy in Grammar: Form and Acquisition. Oxford: Oxford University Press, 214–52. Miller, Wick R. (1983). ‘Uto-Aztecan languages’, in Alfonso Ortiz (ed.), Handbook of North American Indians, vol. 10: Southwest. Washington, DC: Smithsonian Institution, 113–24. Mithun, Marianne (1988).
‘System-deﬁning structural properties in polysynthetic languages’, Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 41(4): 442–52. Mithun, Marianne (1989). ‘The acquisition of polysynthesis’, Journal of Child Language 16: 285–312. doi:10.1017/S0305000900010424 Mithun, Marianne (1996). ‘General characteristics of North American Indian languages’, in Ives Goddard (ed.), Handbook of North American Indians, vol. 17: Languages. Washington, DC: Smithsonian Institution, 137–57. Mithun, Marianne (1998). ‘Yup’ik roots and afﬁxes’, in Osahito Miyaoka and Minoru Oshima (eds), Languages of the North Paciﬁc Rim, vol. 4. Kyoto: Kyoto University Graduate School of Letters, 63–76. Mithun, Marianne (2007). ‘Grammar, contact, and time’, Journal of Language Contact. THEMA 1: 133–55. Mithun, Marianne (2015). ‘Morphological complexity and language contact in languages indigenous to North America’, Linguistic Discovery 13(2): 37–59. Mithun, Marianne (2016). ‘Afﬁx ordering: Motivation and interpretation’, in Andrew Hippisley and Gregory Stump (eds), The Cambridge Handbook of Morphology. Cambridge: Cambridge University Press, 149–85. Miyaoka, Osahito (2011). A Grammar of Central Alaskan Yupik (CAY). Berlin: de Gruyter Mouton. Moscoso del Prado Martín, Fermín (2003). Paradigmatic Structures in Morphological Processing: Computational and cross-linguistics studies. University of Nijmegen PhD dissertation.

Moscoso del Prado Martín, Fermín (2011). ‘The mirage of morphological complexity’, in Laura Carlson, Christoph Hoelscher, and Thomas F. Shipley (eds), Proceedings of the 33rd Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society, 3524–9. Moscoso del Prado Martín, Fermín, Aleksandar Kostic, and R. Harald Baayen (2004). ‘Putting the bits together: An information-theoretical perspective on morphological processing’, Cognition 94(1): 1–18. Mufwene, Salikoko S. (2001). The Ecology of Language Evolution. Cambridge: Cambridge University Press. Mufwene, Salikoko S. (2008). Language Evolution: Contact, Competition, and Change. London: Continuum Press. Mufwene, Salikoko S. (2009). ‘Restructuring, hybridization, and complexity in language evolution’, in Enoch O. Aboh and Norval Smith (eds), Complex Processes in New Languages. Amsterdam: John Benjamins, 367–400. Mufwene, Salikoko S., François Pellegrino, and Christophe Coupé (eds) (2017). Complexity in Language: Developmental and Evolutionary Perspectives. Cambridge: Cambridge University Press. Mugdan, Joachim (1994). ‘Morphological units’, in Ron Asher (ed.), The Encyclopedia of Language and Linguistics. Oxford: Pergamon Press, 2543–53. Mühlhäusler, Peter (1997). Pidgin and Creole Linguistics. London: University of Westminster. Mukarovsky, Hans (1977). A Study of Western Nigritic, vol. I. Wien: Institut für Ägyptologie und Afrikanistik der Universität Wien. Müller, Neele (2013). Tense, Aspect, Modality, and Evidential Marking in South American Indigenous Languages. Utrecht: LOT. Munro, Pamela and Dieynaba Gaye (1997). Ay Baati Wolof: A Wolof Dictionary. Revised ed. Los Angeles: Department of Linguistics CLA. Muysken, Pieter C., Harald Hammarström, Joshua Birchall, Swintha Danielsen, Love Eriksen, Ana Vilacy Galucio, Rik van Gijn, Simon van de Kerke, Vishnupraya Kolipakam, Olga Krasnoukhova, Neele Müller, and Loretta O’Connor (2014). 
‘The languages of South America: Deep families, areal relationships, and language contact’, in Loretta O’Connor and Pieter C. Muysken (eds), The Native Languages of South America. Cambridge: Cambridge University Press, 299–322. Myers-Scotton, Carol (2002). Contact Linguistics: Bilingual Encounters and Grammatical Outcomes. Oxford: Oxford University Press. Nakagawa, Shinichi and Holger Schielzeth (2013). ‘A general and simple method for obtaining R2 from generalized linear mixed-effects models’, Methods in Ecology and Evolution 4(2): 133–42. Nash, David (1980). Topics in Warlpiri Grammar. Massachusetts Institute of Technology PhD dissertation. Ndiaye, Moussa D. (2004). Eléments de morphologie du wolof. Méthodes d’analyse en linguistique. München: LINCOM Europa. Nettle, Daniel (2012). ‘Social scale and structural complexity in human languages’, Philosophical Transactions of the Royal Society B: Biological Sciences 367(1597): 1829–36. doi:10.1098/rstb.2011.0216 Neubauer, Kathleen and Harald Clahsen (2009). ‘Decomposition of inﬂected words in a second language: An experimental study of German participles’, Studies in Second Language Acquisition 31(3): 403–35. doi:10.1017/S0272263109090354

Newmeyer, Frederick J. and Laurel B. Preston (eds) (2014). Measuring Grammatical Complexity. Oxford: Oxford University Press. Nichols, Johanna (1986). ‘Head-marking and dependent-marking grammar’, Language 62(1): 56–119. Nichols, Johanna (1992). Linguistic Diversity in Space and Time. Chicago: University of Chicago Press. Nichols, Johanna (2003). ‘Diversity and stability in language’, in Brian D. Joseph and Richard Janda (eds), The Handbook of Historical Linguistics. Oxford: Blackwell, 283–310. Nichols, Johanna (2005). ‘The origin of the Chechen and Ingush: A study in alpine linguistic and ethnic geography’, Anthropological Linguistics 46: 129–55. Nichols, Johanna (2009). ‘Linguistic complexity: A comprehensive deﬁnition and survey’, in Geoffrey Sampson, David Gil, and Peter Trudgill (eds), Language Complexity as an Evolving Variable. Oxford: Oxford University Press, 110–25. Nichols, Johanna (2013). ‘The vertical archipelago: Adding the third dimension to linguistic geography’, in Peter Auer, Martin Hilpert, Anja Stukenbrock, and Benedikt Szmrecsanyi (eds), Space in Language and Linguistics. Berlin: Mouton de Gruyter, 38–60. Nichols, Johanna (2015). ‘Complexity as non-canonicality: An affordable, reliable metric for morphology’. Paper given at the 48th annual meeting of the Societas Linguistica Europaea (SLE), Leiden. Nichols, Johanna (2016). ‘Complex edges, transparent frontiers: Grammatical complexity and language spreads’, in Raffaela Baechler and Guido Seiler (eds), Complexity, Isolation, and Variation. Berlin: de Gruyter, 117–37. Nichols, Johanna (2017). ‘Person as an inﬂectional category’, Linguistic Typology 21(3): 387–456. doi:10.1515/lingty-2017-0010 Nichols, Johanna (2019). ‘Why is gender so complex? Some typological considerations’, in Francesca Di Garbo, Bruno Olsson, and Bernhard Wälchli (eds), Grammatical Gender and Linguistic Complexity, vol. I: General Issues and Speciﬁc Studies. Berlin: Language Sciences Press, 63–92. Nichols, Johanna (in prep.). 
The languages of the Great Caucasus range. Nichols, Johanna, Jonathan Barnes, and David A. Peterson (2006). ‘The robust bell curve of morphological complexity’, Linguistic Typology 10(1): 96–106. Nichols, Johanna and Christian Bentz (2018). ‘Morphological complexity of languages reﬂects the settlement history of the Americas’, in Katerina Harvati, Gerhard Jäger, and Hugo Reyes-Centano (eds), New Perspectives on the Peopling of the Americas. Tübingen: Kerns, 13–26. Nichols, Johanna and Yury Lander (2020). ‘Head-dependent marking’, in Mark Aronoff (ed.), Oxford Research Encyclopedia of Linguistics. New York: Oxford University Press. DOI: 10.1093/acrefore/9780199384655.013.523 Njie, Codu Mbassy (1982). Description syntaxique du wolof de Gambie. Dakar: Nouvelles Editions africaines. Nordlinger, Rachel (2011). ‘Transitivity in Murrinh-Patha’, Studies in Language 35(3): 702–34. doi:10.1075/sl.35.3.08nor Nordlinger, Rachel (2015). ‘Inﬂection in Murrinh-Patha’, in Matthew Baerman (ed.), The Oxford Handbook of Inﬂection. Oxford: Oxford University Press, 491–519. Nordlinger, Rachel (2017). ‘The languages of the Daly River region (Northern Australia)’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 782–807.

Nordlinger, Rachel and Patrick Caudal (2012). ‘The tense, aspect and modality system in Murrinh-Patha’, Australian Journal of Linguistics 32(1): 73–112. doi:10.1080/07268602.2012.657754 Norman, Jerry (1988). Chinese. Cambridge: Cambridge University Press. Nurse, Derek (2007). ‘Did the proto-Bantu verb have a synthetic or an analytic structure?’, SOAS Working Papers in Linguistics 15: 239–56. Nurse, Derek (2008). Tense and Aspect in Bantu. New York: Oxford University Press. O’Connor, Catherine, Joan Maling, and Barbora Skarabela (2013). ‘Nominal categories and the expression of possession: A cross-linguistic study of probabilistic tendencies and categorial constraints’, in Kersti Börjars, David Denison, and Alan Scott (eds), Morphosyntactic Categories and the Expression of Possession. Amsterdam: John Benjamins, 89–121. Olawsky, Knut (2006). A Grammar of Urarina. Berlin: Mouton de Gruyter. Ospina Bozzi, Ana María (2002). Les structures élémentaires du Yuhup Maku, langue de l’Amazonie Colombienne: Morphologie et syntaxe. Université Paris 7 - Denis Diderot PhD dissertation. Öztürk, Balkız and Markus A. Pöchtrager (2011). Pazar Laz. München: LINCOM Europa. Paauw, Scott (2007). ‘A North Papua linguistic area?’. Paper given at the ‘Workshop on the Languages of Papua’, Manokwari. Parker, Jeff (2016). Inflectional Complexity and Cognitive Processing: An Experimental and Corpus-Based Investigation of Russian Nouns. The Ohio State University PhD dissertation. Parker, Jeff, Robert Reynolds, and Andrea D. Sims (to appear). ‘The role of language-specific network properties in the emergence of inflectional irregularity’, in Andrea D. Sims, Adam Ussishkin, Jeff Parker, and Samantha Wray (eds), Morphological Typology and Linguistic Cognition. Cambridge: Cambridge University Press. Parkvall, Mikael (2008). ‘The simplicity of creoles in cross-linguistic perspective’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change.
Amsterdam: John Benjamins, 265–85. Payne, Doris L. (1990). ‘Morphological characteristics of lowland South American languages’, in Doris L. Payne (ed.), Amazonian Linguistics: Studies in Lowland South American Languages. Austin, TX: University of Texas Press, 213–41. Payne, Doris L. (2007). ‘Source of the Yagua nominal classiﬁcation system’, International Journal of American Linguistics 73(4): 447–74. doi:10.1086/523773 Payne, John (2013). ‘The oblique genitive in English’, in Kersti Börjars, David Denison, and Alan Scott (eds), Morphosyntactic Categories and the Expression of Possession. Amsterdam: John Benjamins, 178–92. Payne, Thomas (1997). Describing Morphosyntax. Cambridge: Cambridge University Press. Perrin, Loïc-Michel (2012). L’expression du temps en wolof—langue atlantique parlée au Sénégal. Köln: Köppe. Perrott, D. V. (1950). Teach Yourself Swahili. New York: Random House. Pienemann, Manfred (1998). Language Processing and Second Language Development: Processability Theory. Amsterdam: John Benjamins. Pinheiro, José C. and Douglas M. Bates (2000). Mixed-Effects Models in S and S-PLUS. New York: Springer. Pinker, Steven and Alan Prince (1988). ‘On language and connectionism: Analysis of a parallel distributed processing model of language acquisition’, Cognition 28: 73–193. doi:10.1016/0010-0277(88)90032-7

Pirrelli, Vito (2000). Paradigmi in morfologia. Un approccio interdisciplinare alla flessione verbale dell’italiano. Pisa: Istituti Editoriali e Poligrafici Italiani. Pirrelli, Vito, Marcello Ferro, and Claudia Marzi (2015). ‘Computational complexity of abstractive morphology’, in Matthew Baerman, Dunstan Brown, and Greville Corbett (eds), Understanding and Measuring Morphological Complexity. Oxford: Oxford University Press, 141–66. Plag, Ingo (2003a). ‘Introduction: The morphology of creole languages’, in Geert Booij and Jaap van Marle (eds), Yearbook of Morphology 2002. Alphen aan den Rijn: Kluwer, 1–2. doi:10.1007/0-306-48223-1_1 Plag, Ingo (2003b). Phonology and Morphology of Creole Languages. Tübingen: Niemeyer. Plag, Ingo (2008). ‘Creoles as interlanguages: Inflectional morphology’, Journal of Pidgin and Creole Languages 23: 114–35. doi:10.1075/jpcl.23.1.06pla Plank, Frans (1986). ‘Paradigm size, morphological typology, and universal economy’, Folia Linguistica 20(1–2): 29–48. doi:10.1515/flin.1986.20.1-2.29 Pozdniakov, Konstantin (1993). Sravnitel’naja grammatika atlantičeskich jazykov. Moscow: Nauka. Pozdniakov, Konstantin (2015). ‘Diachronie des classes nominales atlantiques. Morphonologie, morphologie, sémantique’, in Denis Creissels and Konstantin Pozdniakov (eds), Les classes nominales dans les langues atlantiques. Köln: Köppe, 57–102. Pozdniakov, Konstantin and Stéphane Robert (2015). ‘Les classes nominales en wolof. Fonctionnalités et singularités d’un système restreint’, in Denis Creissels and Konstantin Pozdniakov (eds), Les classes nominales dans les langues atlantiques. Köln: Köppe, 545–628. Prasada, Sandeep and Steven Pinker (1993). ‘Generalisation of regular and irregular morphological patterns’, Language and Cognitive Processes 8(1): 1–56. doi:10.1080/01690969308406948 Pye, Br John MSC (1972). The Port Keats Story. Darwin: Colemans. Rambaud, Jean-Baptiste (1898).
‘De la détermination en wolof ’, Bulletin de la Société de Linguistique de Paris 10: 122–36. [Reprinted in Gabriel Manessy and Serge Sauvageot (eds) (1963). Wolof et Sérèr. Études de phonétique et de grammaire descriptive. Dakar: University of Dakar Press, 11–24.] Reid, Nicholas (1990). Ngan’gityemerri: A Language of the Daly River Region, Northern Territory of Australia. Australian National University PhD dissertation. Reintges, Chris (2015). ‘Increasing morphological complexity and how syntax drives morphological change’, in Theresa Biberauer and George Walkden (eds), Syntax Over Time: Lexical, Morphological, and Information-Structural Interactions. Oxford: Oxford University Press, 124–45. Rescher, Nicholas (1998). Complexity: A Philosophical Overview. New Brunswick, NJ: Transaction Publishers. Rhodes, Richard (1987). ‘Paradigms large and small’, Proceedings of the 13th Annual Meeting of the Berkeley Linguistics Society. Berkeley, CA: Berkeley Linguistics Society, 223–34. Rice, Keren (2011). ‘Principles of afﬁx ordering: An overview’, Word Structure 4(2): 169–200. doi:10.3366/word.2011.0009 Roberts, Ian (1999). ‘Verb movement and markedness’, in Michel deGraff (ed.), Language Change: Creolization, Diachrony, and Development. Cambridge, MA: The MIT Press, 287–328.

Roberts, Sarah J. and Joan Bresnan (2008). ‘Retained inflectional morphology in pidgins: A typological study’, Linguistic Typology 12(2): 269–302. doi:10.1515/LITY.2008.039 Roberts, Seán (2018). ‘Chield: Causal hypotheses in evolutionary linguistics database’, in Christine Cuskley, Molly Flaherty, Hannah Little, Luke McCrohon, Andrea Ravignani, and Tessa Verhoef (eds), The Evolution of Language: Proceedings of the 12th International Conference (EVOLANG12). doi:10.12775/3991-1.099 Robins, R. H. (1958). The Yurok Language: Grammar, Texts, Lexicon. Berkeley, CA: University of California Press. Romaine, Suzanne (1988). Pidgin and Creole Languages. London: Longman. Rottet, Kevin J. (1992). ‘Functional categories and verb movement in Louisiana creole’, Probus 4: 261–89. doi:10.1515/prbs.1992.4.3.261 Russell, Kevin (1999). ‘What’s with all these long words anyway?’, in Leora Bar-El, Rose-Marie Dechaine, and Charlotte Reinholtz (eds), Papers from the Workshop on Structure and Constituency in Native American Languages. Cambridge, MA: The MIT Press, 119–30. Sadock, Jerrold (2017). ‘The subjectivity of the notion of polysynthesis’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 99–114. Saffran, Jenny R., Richard N. Aslin, and Elissa L. Newport (1996). ‘Statistical learning by 8-month-old infants’, Science 274(5294): 1926–8. doi:10.1126/science.274.5294.1926 Sagot, Benoît and Géraldine Walther (2011). ‘Non-canonical inflection: Data, formalisation and complexity measures’, in Cerstin Mahlow and Michael Piotrowski (eds), Systems and Frameworks for Computational Morphology. Berlin: Springer, 23–45. doi:10.1007/978-3-642-23138-4_3 Samara, Anna, Kenny Smith, Helen Brown, and Elizabeth Wonnacott (2017). ‘Acquiring variation in an artificial language: Children and adults are sensitive to socially conditioned linguistic variation’, Cognitive Psychology 94: 85–114. doi:10.1016/j.cogpsych.2017.02.004
Sampson, Geoffrey, David Gil, and Peter Trudgill (eds) (2009). Language Complexity as an Evolving Variable. Oxford: Oxford University Press. Sapir, Edward (1921). Language: An Introduction to the Study of Speech. New York: Harcourt, Brace & Co. Sapir, J. David (1965). A Grammar of Diola–Fogny, a Language Spoken in the Basse-Casamance Region of Senegal. Cambridge: Cambridge University Press. Sapir, J. David (1971). ‘West Atlantic: An inventory of the languages, their noun class systems and consonant alternation’, in Thomas Sebeok (ed.), Current Trends in Linguistics, vol. VII: Linguistics in Sub-Saharan Africa. The Hague: Mouton, 44–112. Sauvageot, Serge (1965). Description synchronique d’un dialecte Wolof. Le parler du Dyolof. Dakar: Institut Français de l’Afrique Noire. Sauvageot, Serge (1967). ‘Note sur la classification nominale en baïnouk’, in Gabriel Manessy (ed.), La classification nominale dans les langues négro-africaines. Paris: CNRS, 225–36. Scalise, Sergio (1984). Morfologia lessicale. Padova: CLESP. Schiering, René, Balthasar Bickel, and Kristine Hildebrandt (2010). ‘The prosodic word is not universal, but emergent’, Journal of Linguistics 46: 657–710. doi:10.1017/S0022226710000216 Schlegel, Friedrich von (1808). Über die Sprache und Weisheit der Indier. Ein Beitrag zur Begründung der Alterthumskunde. Heidelberg: Mohr & Zimmer.

Schreuder, Robert and R. Harald Baayen (1997). ‘How simplex complex words can be’, Journal of Memory and Language 37: 118–39. doi:10.1006/jmla.1997.2510 Schwegler, Armin (2013). ‘Palenquero structure dataset’, in Susanne Maria Michaelis, Philippe Maurer, Martin Haspelmath, and Magnus Huber (eds), Atlas of Pidgin and Creole Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. URL: http://apics-online.info/contributions/48 Segerer, Guillaume (2010). ‘Isolates in Atlantic’. Paper given at the workshop ‘Language Isolates in Africa’, 4 December, Lyon. Seifart, Frank (2005). The Structure and Use of Shape-Based Noun Classes in Miraña (North West Amazon). Universiteit Nijmegen PhD dissertation. Seifart, Frank (2011). Bora Loans in Resígaro: Massive Morphological and Little Lexical Borrowing in a Moribund Arawakan Language. Cadernos de Etnolingüística, Série Monograﬁas 2 [online publisher]. Seifart, Frank and Doris Payne (2007). ‘Nominal classiﬁcation in the Northwest Amazon: Issues in areal diffusion and typological characterization’, International Journal of American Linguistics 73(4): 381–7. doi:10.1086/523770 Seuren, Pieter (1990). ‘Verb syncopation and predicate raising in Mauritian Creole’, Theoretical Linguistics 1(13): 804–44. doi:10.1515/ling.1990.28.4.809 Seuren, Pieter (1998). Western Linguistics: An Historical Introduction. Oxford: Blackwell. Seuren, Pieter and Herman Wekker (1986). ‘Semantic transparency as a factor in creole genesis’, in Pieter Muysken and Norval Smith (eds), Substrata versus Universals in Creole Genesis. Amsterdam: John Benjamins, 57–70. Shalizi, Cosma Rohilla (2001). ‘Causal architecture, complexity and self-organization in the time series and cellular automata’. University of Wisconsin-Madison PhD dissertation. Shannon, Claude E. (1948). ‘A mathematical theory of communication’, Bell System Technical Journal 27(3): 379–423. Shosted, Ryan (2006). 
‘Correlating complexity: A typological approach’, Linguistic Typology 10(1): 1–40. doi:10.1515/LINGTY.2006.001 Silva, Wilson de Lima (2012). A Descriptive Grammar of Desano. University of Utah PhD dissertation. Sims, Andrea D. (2015). Inflectional Defectiveness. Cambridge: Cambridge University Press. Sims, Andrea D. and Jeff Parker (2016). ‘How inflection class systems work: On the informativity of implicative structure’, Word Structure 9(2): 215–39. doi:10.3366/word.2016.0094 Sinnemäki, Kaius (2008). ‘Complexity trade-offs in core argument marking’, in Matti Miestamo, Kaius Sinnemäki, and Fred Karlsson (eds), Language Complexity: Typology, Contact, Change. Amsterdam: John Benjamins, 67–88. Sinnemäki, Kaius (2011). Language Universals and Linguistic Complexity: Three Case Studies in Core Argument Marking. University of Helsinki PhD dissertation. Sinnemäki, Kaius (2014). ‘Global optimization and complexity trade-offs’, Poznań Studies in Contemporary Linguistics 50(2): 179–95. doi:10.1515/psicl-2014-0013 Smith, Kenny, Amy Perfors, Olga Fehér, Anna Samara, Kate Swoboda, and Elizabeth Wonnacott (2017). ‘Language learning, language use and the evolution of linguistic variation’, Philosophical Transactions of the Royal Society B 372(1711): 20160051. doi:10.1098/rstb.2016.0051 Smith, Kenny and Elizabeth Wonnacott (2010). ‘Eliminating unpredictable variation through iterated learning’, Cognition 116(3): 444–9. doi:10.1016/j.cognition.2010.06.004 Soubrier, Aude (2013). Description de l’ikposso uwi. Lyon: Université Lumière Lyon 2 dissertation.

Spencer, Andrew and Ana R. Luís (2012). Clitics: An Introduction. Cambridge: Cambridge University Press. Stahlke, Herbert (1970). ‘Serial verbs’, Studies in African Linguistics 1: 60–99. Štekauer, Pavol (2015). ‘The delimitation of derivation and inﬂection’, in Peter O. Müller, Ingeborg Ohnheiser, Susan Olsen, and Franz Rainer (eds), Word-Formation: An International Handbook of the Languages of Europe, vol. 1. Berlin: De Gruyter Mouton, 218–35. Stenzel, Kristine (2008). ‘Evidentials and clause modality in Wanano’, Studies in Language 32(2): 405–45. doi:10.1075/sl.32.2.06ste Stenzel, Kristine (2013a). A Reference Grammar of Kotiria (Wanano). Lincoln, NE: University of Nebraska Press. Stenzel, Kristine (2013b). ‘Contact and innovation in Vaupés possession-marking strategies’, in Patience Epps and Kristine Stenzel (eds), Cultural and Linguistic Interaction in the Upper Rio Negro Region. Rio de Janeiro: Museu do Índio-FUNAI, 353–402. Stenzel, Kristine and Elsa Gomez-Imbert (2009). ‘Contato linguístico e mudança linguística no noroeste amazônico: O caso do Kotiria (Wanano)’, Revista da ABRALIN 8: 71–100. Stewart, William Alexander and William W. Gage (1970). Notes on Wolof Grammar by William A. Stewart. Adapted by William W. Gage, in Dakar Wolof: A Basic Course prepared by Loren V. Nussbaum, William W. Gage, and Daniel Varre. Washington, DC: Center for Applied Linguistics, 355–412. Stilo, Donald (2019). ‘Loss vs. expansion of gender in Tatic languages: Kafteji (Kabatei) and Kelāsi’, in Alireza Korangy and Behrooz Mahmoodi-Bakhtiari (eds), Essays on Typology of Iranian Languages. Berlin: De Gruyter Mouton, 34–78. doi:10.1515/9783110604443-004 Stoll, Sabine, Balthasar Bickel, and Jekaterina Mažara (2017). ‘The acquisition of polysynthetic verb forms in Chintang’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 495–514. Stolz, Thomas (2012). 
‘Survival in a niche: On gender-copy in Chamorro (and sundry languages)’, in Martine Vanhove, Thomas Stolz, Aina Urdze, and Hitomi Otsuka (eds), Morphologies in Contact. Berlin: Akademie-Verlag, 93–140. Stolz, Thomas (2015). ‘Adjective-noun agreement in language contact’, in Francesco Gardani, Peter Arkadiev, and Nino Amiridze (eds), Borrowed Morphology. Berlin: Mouton de Gruyter, 269–301. Street, Chester (1987). An Introduction to the Language and Culture of the Murrinh-Patha. Darwin: Summer Institute of Linguistics. Stump, Gregory (2001). Inflectional Morphology: A Theory of Paradigm Structure. Cambridge: Cambridge University Press. Stump, Gregory (2006a). ‘Heteroclisis and paradigm linkage’, Language 82(2): 279–322. doi:10.1353/lan.2006.0110 Stump, Gregory (2006b). ‘Template morphology’, in Keith Brown (ed.), Encyclopedia of Language & Linguistics. 2nd ed. Oxford: Elsevier, 559–63. Stump, Gregory (2016). Inflectional Paradigms: Content and Form at the Syntax-Morphology Interface. Cambridge: Cambridge University Press. Stump, Gregory (2017). ‘The nature and dimensions of complexity in morphology’, Annual Review of Linguistics 3(1): 65–83. doi:10.1146/annurev-linguistics-011415-040752 Stump, Gregory and Raphael A. Finkel (2013). Morphological Typology: From Word to Paradigm. Cambridge: Cambridge University Press. Stump, Gregory and Raphael A. Finkel (2015). ‘Contrasting modes of representation for inflectional systems: Some implications for computing morphological complexity’, in

Matthew Baerman, Dunstan Brown, and Greville G. Corbett (eds), Understanding and Measuring Morphological Complexity. Oxford: Oxford University Press, 119–40. Syea, Anand (1992). ‘The short and long forms of verbs in Mauritian Creole: Functionalism versus formalism’, Theoretical Linguistics 18: 61–97. doi:10.1515/thli.1992.18.1.61 Sylla, Yero (1982). Grammaire moderne du Pulaar. Dakar: Nouvelles éditions africaines. Szmrecsanyi, Benedikt and Bernd Kortmann (2009). ‘The morphosyntax of varieties of English worldwide: A quantitative perspective’, Lingua 119(11): 1643–63. doi:10.1016/j.lingua.2007.09.016 Taft, Marcus (1979). ‘Recognition of affixed words and the word frequency effect’, Memory & Cognition 7(4): 263–72. doi:10.3758/BF03197599 Taft, Marcus (2004). ‘Morphological decomposition and the reverse base frequency effect’, The Quarterly Journal of Experimental Psychology 57(4): 745–65. doi:10.1080/02724980343000477 Taft, Marcus and Sam Ardasinski (2006). ‘Obligatory decomposition in reading prefixed words’, The Mental Lexicon 1(2): 183–99. doi:10.1075/ml.1.2.02taf Tallman, Adam (2018). A Grammar of Chácobo, a Southern Pano Language of the Northern Bolivian Amazon. University of Texas at Austin PhD dissertation. Tamba, Khady, Harold Torrence, and Malte Zimmermann (2012). ‘Wolof quantifiers’, in Edward Keenan and Denis Paperno (eds), Handbook of Quantification in Natural Language. New York: Springer, 891–939. Thiam, Ndiassé (1987). Les categories nominales en wolof. Aspects sémantiques. Dakar: Centre de linguistique appliquée de Dakar. Thomason, Sarah G. (2001). Language Contact: An Introduction. Washington, DC: Georgetown University Press. Thomason, Sarah G. (2008). ‘Pidgins/creoles and historical linguistics’, in Silvia Kouwenberg and John Victor Singler (eds), Handbook of Pidgin and Creole Languages. Malden, MA: Wiley-Blackwell, 242–62. Thomason, Sarah G. (2015).
‘When is the diffusion of inﬂectional morphology not dispreferred?’, in Francesco Gardani, Peter Arkadiev, and Nino Amiridze (eds), Borrowed Morphology. Berlin: Mouton de Gruyter, 27–46. Thomason, Sarah G. and Terence Kaufman (1988). Language Contact, Creolization, and Genetic Linguistics. Berkeley, CA: University of California Press. Thomaz, Luis Felípe (2002). Babel Loro Sa’e: O problema linguístico de Timor-Leste. Lisboa: Instituto Camões. Thornton, Anna M. (2005). Morfologia. Roma: Carocci. Thornton, Anna M. (2011). ‘Overabundance (multiple forms realizing the same cell): A non-canonical phenomenon in Italian verb morphology’, in Martin Maiden, John C. Smith, Maria Goldbach, and Marc-Olivier Hinzelin (eds), Morphological Autonomy: Perspectives from Romance Inﬂectional Morphology. Oxford: Oxford University Press, 359–82. Thornton, Anna M. (2019). ‘Overabundance: A canonical typology’, in Franz Rainer, Francesco Gardani, Wolfgang U. Dressler, and Hans Christian Luschützky (eds), Competition in Inﬂection and Word-Formation. Cham: Springer, 223–58. doi:10.1007/ 978-3-030-02550-2_9 Tily, Harry and T. Florian Jaeger (2011). ‘Complementing quantitative typology with behavioral approaches: Evidence for typological universals’, Linguistic Typology 15(2): 497–508. doi:10.1515/LITY.2011.033 Timberlake, Alan (2004). A Reference Grammar of Russian. Cambridge: Cambridge University Press.

Tinits, Peeter (2014). ‘Language stability and morphological complexity in situations of language contact: An experimental paradigm’, in 19th International Congress of Linguists Papers. Geneva: Département de Linguistique de l’Université de Genève.
Tomasello, Michael (2000). ‘First steps in a usage-based theory of language acquisition’, Cognitive Linguistics 11: 61–82. doi:10.1515/cogl.2001.012
Tomasello, Michael (2006). ‘Acquiring linguistic constructions’, in Robert Siegler and Deanna Kuhn (eds), Handbook of Child Psychology. New York: Wiley, 1860–2010.
Torrence, Harold (2013). The Clause Structure of Wolof: Insights into the Left Periphery. Amsterdam: John Benjamins.
Tourneux, Henry and Maurice Barbotin (2009). Dictionnaire pratique du créole de Guadeloupe. Paris: Karthala.
Tribout, Delphine (2012). ‘Verbal stem space and verb to noun conversion in French’, Word Structure 5: 109–28. doi:10.3366/word.2012.0022
Trudgill, Peter (1983). ‘Language contact and language change: On the rise of the creoloid’, in Peter Trudgill (ed.), On Dialect: Social and Geographical Perspectives. Oxford: Blackwell, 102–7.
Trudgill, Peter (1997). ‘Typology and sociolinguistics: Linguistic structure, social structure and explanatory comparative dialectology’, Folia Linguistica 31(3–4): 349–60. doi:10.1515/flin.1997.31.3-4.349
Trudgill, Peter (1999). ‘Language contact and the function of linguistic gender’, Poznań Studies in Contemporary Linguistics 35: 133–52.
Trudgill, Peter (2004a). ‘Linguistic and social typology: The Austronesian migrations and phoneme inventories’, Linguistic Typology 8(3): 305–20. doi:10.1515/lity.2004.8.3.305
Trudgill, Peter (2004b). ‘The impact of language contact and social structure on linguistic structure’, in Bernd Kortmann (ed.), Dialectology Meets Typology: Dialect Grammar from a Cross-Linguistic Perspective. Berlin: Mouton de Gruyter, 435–51.
Trudgill, Peter (2009). ‘Sociolinguistic typology and complexification’, in Geoffrey Sampson, David Gil, and Peter Trudgill (eds), Language Complexity as an Evolving Variable. Oxford: Oxford University Press, 98–109.
Trudgill, Peter (2011). Sociolinguistic Typology: Social Determinants of Linguistic Complexity. Oxford: Oxford University Press.
Trudgill, Peter (2017). ‘The anthropological setting of polysynthesis’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 186–202.
Tuite, Kevin (1999). ‘The myth of the Caucasian Sprachbund: The case of ergativity’, Lingua 108(1): 1–29. doi:10.1016/S0024-3841(98)00037-0
Ullman, Michael T. (2001). ‘The declarative/procedural model of lexicon and grammar’, Journal of Psycholinguistic Research 30(1): 37–69. doi:10.1023/A:1005204207369
Ullman, Michael T. (2004). ‘Contributions of memory circuits to language: The declarative/procedural model’, Cognition 92(1–2): 231–70. doi:10.1016/j.cognition.2003.10.008
Valdman, Albert, Iskra Iskrova, and Benjamin Hebblethwaite (2007). Haitian Creole-English Bilingual Dictionary. Bloomington, IN: Indiana University Creole Institute.
Valenzuela, Pilar (2003). Transitivity in Shipibo-Konibo Grammar: A Typologically Oriented Study. University of Oregon PhD dissertation.
Valenzuela, Pilar (2010). ‘Applicative constructions in Shipibo-Konibo (Panoan)’, International Journal of American Linguistics 76: 101–44. doi:10.1086/652756
Vallejos Yopán, Rosa (2010). A Grammar of Kokama-Kokamilla. University of Oregon PhD dissertation.

van der Voort, Hein (2005). ‘Kwaza in comparative perspective’, International Journal of American Linguistics 71: 365–412. doi:10.1086/501245
van der Voort, Hein (2016). ‘Recursive inflection and grammaticalized fictive interaction in the Southwestern Amazon’, in Esther Pascual and Sergeiy Sandler (eds), The Conversation Frame: Forms and Functions of Fictive Interaction. Amsterdam: John Benjamins, 277–302.
Van Engelenhoven, Aone (2004). Leti, a Language of Southwest Maluku. Leiden: KITLV Press.
van Gijn, Rik and Fernando Zúñiga (2014). ‘Word and the Americanist perspective’, Morphology 24: 135–60. doi:10.5167/uzh-99717
Vanhove, Martine (2001). ‘Contacts de langues et complexification des systèmes: Le cas du maltais’, Faits de Langues 18: 65–74.
Veenstra, Tonjes (2009). ‘Verb allomorphy and the syntax of phases’, in Enoch Aboh and Norval Smith (eds), Complex Processes in New Languages. Amsterdam: John Benjamins, 99–114.
Veenstra, Tonjes and Angelika Becker (2003). ‘The survival of inflectional morphology in French-related creoles’, Studies in Second Language Acquisition 25: 285–306. doi:10.1017/S0272263103000123
Villoing, Florence and Maxime Deglas (2016). ‘La formation de verbes dénominaux en guadeloupéen. La part de l’héritage et de l’innovation’, 5ème Congrès Mondial de Linguistique Française 2016, Tours, France. doi:10.1051/shsconf/20162708004
Wälchli, Bernhard (2017). ‘The incomplete story of feminine gender loss in Northwestern Latvian dialects’, Baltic Linguistics 8: 143–214.
Wälchli, Bernhard (2018). ‘The rise of gender in Nalca (Mek, Tanah Papua): The drift towards the canonical gender attractor’, in Sebastian Fedden, Jenny Audring, and Greville Corbett (eds), Non-Canonical Gender Systems. Oxford: Oxford University Press, 68–99.
Walsh, Michael (1976). The Murinypata Language of North-West Australia. Australian National University PhD dissertation.
Walther, Géraldine (2017). ‘Paradigm realisation and the lexicon’, in Ferenc Kiefer, James P. Blevins, and Huba Bartos (eds), Perspectives on Morphological Organization: Data and Analyses. Leiden: Brill, 159–99.
Weinreich, Uriel, William Labov, and Marvin Herzog (1968). ‘Empirical foundations for a theory of language change’, in Winfred Philip Lehmann and Yakov Malkiel (eds), Directions for Historical Linguistics. Austin, TX: University of Texas Press, 95–198.
Wells, Rulon (1954). ‘Archiving and language typology’, International Journal of American Linguistics 20(2): 101–7.
Wichmann, Søren and Eric W. Holman (2009). Temporal Stability of Linguistic Typological Features. München: LINCOM Europa.
Wilson, William André Auquier (1989). ‘Atlantic’, in John Theodore Bendor-Samuel (ed.), The Niger-Congo Languages: A Classification and Description of Africa’s Largest Language Family. Lanham, MD: University Press of America, by arrangement with the Summer Institute of Linguistics (SIL), 81–104.
Wilson, William André Auquier (2007). Guinea Languages of the Atlantic Group. Frankfurt am Main: Peter Lang.
Wise, Mary Ruth (1971). Identification of Participants in Discourse: A Study of Aspects of Form and Meaning in Nomatsiguenga. Norman, OK: Summer Institute of Linguistics of the University of Oklahoma.

Wise, Mary Ruth (1990). ‘Valence-changing affixes in Maipuran Arawakan languages’, in Doris Payne (ed.), Amazonian Linguistics: Studies in Lowland South American Languages. Austin, TX: University of Texas Press, 89–116.
Wise, Mary Ruth (2002). ‘Applicative affixes in Peruvian Amazonian languages’, in Mily Crevels, Simon van de Kerke, Sérgio Meira, and Hein van der Voort (eds), Current Studies on South American Languages: Selected Papers from the 50th International Congress of Americanists in Warsaw and the Spinoza Workshop on Amerindian Languages in Leiden, 2000. Leiden: Research School of Asian, African, and Amerindian Studies (CNWS), 329–44.
Wittmann, Henri and Robert Fournier (1987). ‘Interpretation diachronique de la morphologie verbale du créole réunionnais’, Revue québecoise de linguistique 6(2): 137–50.
Woodbury, Anthony (2017). ‘Central Alaskan Yupik (Eskimo-Aleut): A sketch of morphologically orthodox polysynthesis’, in Michael Fortescue, Marianne Mithun, and Nicholas Evans (eds), The Oxford Handbook of Polysynthesis. Oxford: Oxford University Press, 536–60.
Wray, Alison and George W. Grace (2007). ‘The consequences of talking to strangers: Evolutionary corollaries of socio-cultural influences on linguistic form’, Lingua 117(3): 543–78. doi:10.1016/j.lingua.2005.05.005
Wurzel, Wolfgang U. (1989). Inflectional Morphology and Naturalness. Dordrecht: Kluwer.
Xanthos, Aris, Sabine Laaha, Steven Gillis, Ursula Stephany, Ayhan Aksu-Koç, Anastasia Christofidou, Natalia Gagarina, Gordana Hrzica, F. N. Ketrez, Marianne Kilani-Schoch, Katharina Korecky-Kröll, Melita Kovačević, Klaus Laalo, Marijan Palmović, Barbara Pfeiler, Maria D. Voeikova, and Wolfgang U. Dressler (2011). ‘On the role of morphological richness in the early development of noun and verb inflection’, First Language 31(4): 461–79. doi:10.1177/0142723711409976
Yarshater, Ehsan (1969). A Grammar of Southern Tati Dialects. The Hague: Mouton.
Zaliznjak, Andrei A. (1967). Russkoe imennoe slovoizmenenie. Moscow: Nauka.
Zaliznjak, Andrei A. (1977). Grammatičeskij slovar’ russkogo jazyka. Moscow: Russkij jazyk.
Zúñiga, Fernando (2017). ‘On the morphosyntax of indigenous languages of the Americas’, International Journal of American Linguistics 83(1): 111–39. doi:10.1086/689548
Zwitserlood, Inge (2003). ‘Word formation below and above little x: Evidence from sign language of the Netherlands’, Nordlyd 31(2): 488–502.

Language Index

Abkhaz-Adyghean languages, see West Caucasian languages Abun 268 Acoma 190 Aghul 208, 209, 213, 226 Aikanã 239, 241 Ainu 166–7, 190 Akha 274 Albanian 189, 273 Aleut 190 Algic languages 174, 190, 191, 216 Andoke 242 Apurinã 238, 241 Arabela 246 Araucanian languages, see Mapudungun Arawakan languages 237–9, 241, 243–6, 248, 254 Archi 180, 184, 189, 226 Ashéninka Perené 248, 255, 259 Athabaskan languages 190 Atlantic languages 16, 136–60, 188, 273, 280, 303 Atlantic-Congo languages, see also Niger-Congo languages 196, 197, 214, 218, 223 Austroasiatic languages 206, 226, 278 Austronesian languages 110, 190, 197, 205, 211, 228, 268–9, 280 Avar 170, 171, 180, 182–3, 185, 189 Aymara 191 Aymaran 191 Bagnoun, Baïnounk, Bainuk, Banyun 137, 140, 148 Baïnounk Gubaher 140, 148, 155, 156 Baïnounk Gunyamolo 148, 155 Balto-Slavic languages 176, 177, 198, 208, 213, 224 Bantu languages 113, 114, 169, 171, 173, 196–7, 198, 207, 216, 217–19, 223, 267, 273, 275, 280–1 Bardi 174, 190 Basque 189, 198, 205, 208, 213, 215 Lekeitio 205, 208, 213, 215–16, 220, 224 Standard 224 Benue-Congo languages 171, 189 Berber 237

Bilinarra 86–7 Bininj Gun-Wok (BGW) 171, 190 Bislama 86 Bodic languages 198, 228 Bora 236, 239 Boran languages 238, 239 Bulgarian 173, 180, 184, 189 Bunuban languages 190 Buy/Nyun 137 Cariban languages 238 Cavineña 248, 250, 253, 254, 258, 259, 261 Cayuvava 191 Central Alaskan Yup’ik (CAY) 190, 248, 250–2, 254–5, 258, 259, 261, 262 Central Malayo-Polynesian languages 274 Central Pomo 308–16, 322, 326–7 Chácobo 239, 240–1, 248, 250, 254–5, 257–61, 263 Chamorro 197, 198, 205, 211–13, 215, 228 Chayahuita 246 Chimariko 191 Chinese, Mandarin, see Mandarin Chinese Chinook Jargon 280 Chinookan languages 191 Chintang 13 Chiquihuitlán Mazatec 30 Chukchi 190 Chukchi-Kamchatkan languages 190 Chuvash 190 Common Slavic 28 Cree 190, 216–17, 228 Cubeo 238 Cupeño 180, 184–5, 191 Cushitic languages 189 Dahalo 189 Diola-Fogny 140, 155 Diyari 190, 191 Djingulu 190 Dogon languages 189 Eastern Pomo 190 Eipo 198, 228 Elfdalian, see Swedish, Elfdalian

English 13, 18, 26, 56, 81, 84, 85, 87, 108, 110, 125, 163, 166, 170, 171, 196, 208, 213, 225, 267, 271, 274, 276, 277, 279–80, 303, 310–11, 316, 320, 326, 332, 336 African-American Vernacular 279 Middle 85 Old 52, 74, 271, 274 Eshtehardi 220, 227 Eskimo-Aleut languages 190, 248, 254 Even 190 Evenki 190

Haitian Creole 16, 106, 113, 114, 117–18, 120, 131–5, 270, 272, 279–80 Haitian Creole English 279 Haro 189 Hinuq 180, 182–3, 189 Hopi 180, 184–5, 191 Huallaga Quechua 191 Hungarian 84, 189 Hunzib 180, 183, 189 Hup 232, 238, 240, 242–4, 246, 248, 250, 254–5, 258–9, 260–1, 263 Hupa 190

Finnish 170, 171, 189, 213, 227 Fongbe 273, 277 French 16, 25–26, 33, 74, 105–6, 110–17, 119–20, 122–4, 127–8, 130–5, 160, 216–17, 228, 270, 272, 276, 279 Cajun 111 Medieval 134 Norman 85 French-based creoles 16, 105–6, 110, 113–14, 116–18, 120 Fula 137, 140, 144, 145–50, 152, 153, 159, 188 Fuuta-Jaloo Pular 140, 148 Gombe 145–6, 153 Fur 189

Icari 189 Icelandic 33, 276 Igo 213–15, 223 Ikposo 223 Indo-European languages 2, 106, 169, 171, 174, 178, 182, 186, 189, 193, 196, 200, 202, 203, 207, 210, 216, 224–5, 227, 230, 273, 276, 278 Indo-Portuguese creoles 113 Ingush 171, 177, 189 Insular Celtic languages 198, 213, 225 Inuit 13 Iranian languages 198, 227, 272 Northwestern 201, 208, 220, 227 Southwestern 220 Irish 208, 213, 215, 225–6, 276 Ros Much 226 Iroquoian languages 190 Northern 316 Italian 84, 147, 196, 275, 276, 284 Itelmen 190 Iwaidjan languages 168, 190

Gbe languages 213, 268, 271, 279 German 14, 171–2, 173, 184, 189, 199 Germanic languages 85, 170, 189, 198, 273 North 200, 203, 208, 213, 227 West 213 Ghana-Togo-Mountain languages 198, 213–14, 223–4 Godoberi 189 Gooniyandi, see also Kuniyanti 90 Greek 23, 30, 32, 62, 189, 198, 208, 210, 213, 225, 338 Asia Minor dialects 210, 215, 225 Cappadocian 202–3, 208, 210–13, 215, 225 Pontic 202, 204, 225 Rumeic 225 Standard Modern 202, 210–11, 225 Guadeloupean Creole 16, 106, 116–18, 120, 124–32, 134–5 Gullah Creole English 279 Gunwingguan (Gunwinyguan) languages 171, 190, 198, 224 Central 224 Gurindji 12, 82–3, 87–8, 102–3 Gurindji Kriol 12, 16, 81–3, 86–103, 343

Jaminjung 87 Jamsay 189 Jamul Tiipay 191 Jangshung 205, 208, 209, 228 Jaqaru 191 Jarawara 243, 248, 250, 252, 254, 255, 258, 259–62 Jaru 87 Juu languages 189 Kabardian 189 Kafteji 198, 201–2, 208, 220–1, 227 Kakua 239, 244 Kamayurá 244 Kanoe 239, 246 Karata 189 Karo 242, 246 Karok 191

  Karrangpurru 87 Kartvelian languages 173, 189, 213, 273, 275 Kashibo-Kakataibo 191 Kelasi 198, 201–2, 208, 220–1, 227 Keresan languages 190 Ket 190, 191 Khanty 189, 190 Khasi 206, 226 Khasian languages 198, 226 Khinalug 177, 189 Kikongo 280 Kiowa 190 Kiowa-Tanoan languages 190 Klamath 191 Klamath-Sahaptian languages 191 Koasati 191 Koiari 190 Kokama-Kokamilla 241, 248, 254, 255, 258–9, 261 Kotiria 248, 250, 254, 255, 258–9, 261 Kriol 12, 82, 83, 87–8, 102, 103 Kundjeyhmi 224 Kune 224 Kuniyanti, see also Gooniyandi 190 Kunwinjku 224 Kuuk Thaayorre 90 Kwa languages 213–14 Kwaza 167–8, 191, 239, 246 Lak 180, 183, 189 Lakhota 190 Lango 188, 191 Latin 56, 74, 109, 142, 169, 171 Latvian 212, 224 Tamian 203, 208, 212, 213 Leti 274 Lezgi 177, 180, 184, 189, 209 Lezgic (Lezgian) languages 180, 183, 198, 208, 213, 226 Light Warlpiri 89 Lingala 218 Kinshasa 218–19, 223 Makanza 207, 216, 218–19, 222, 223 Lithuanian 2–4, 6, 10, 189, 284 Lower Sepik languages 190 Luganda 169, 171, 189 Lyngngam 206, 226 Madang languages 190 Maidu 191 Malngin 87 Manchu 190, 191 Mandarin Chinese 168, 169, 175, 190, 191, 267, 270, 276, 277–8, 341

Mande languages 148, 269 Mandinka 137 Mapudungun 191, 232 Mari 189 Marri Ngarr 53 Marri Tjevin 53 Matses 242–3 Mauritian Creole 16, 106, 110, 112–14, 116–18, 120–5, 128, 131, 134–5 Mawng 168, 190 Mayan languages 191 Mazatec 30, 62 Mek languages 198, 228 Mian 167–8, 173, 187, 190 Michif 197, 198, 207, 216–17, 228 Mindi languages 190 Miwokan languages 191 Mohawk 316–20, 322–5, 326–7 Mongolian 190 Mongolic languages 190 Mordvin 189 Movima 191, 236–7, 248, 250, 254–5, 258, 259, 261, 262 Mudburra 87 Murrinhpatha 15, 52–80, 84 Muskogean languages 191 Nakh-Daghestanian languages 169, 170, 171, 173–4, 176, 177, 181–3, 189, 209, 226 Nalca 198, 228 Nama 171, 189 Nambikwara 239, 242 Nanai 190 Nanti 243–4 Nez Perce 191 Ngaliwurru 87 Nganasan 190 Ngarinyman 86, 87 Niger-Congo languages 110, 136–8, 140–2, 143, 148, 155, 193, 267–9, 273, 303 Niger-Kordofanian languages, see Niger-Congo languages Nilotic languages 188 Nivkh 142, 190 Nomatsigenga 244–5 Nubi Creole Arabic 280 Nupe 268, 271 Nuuchahnulth 191 Ñuun, see also Bagnoun 137, 140, 144, 147 Nyulnyulan languages 174, 190 Ok languages 167, 173, 190 Omotic languages 189 Ossetic 173, 189

Paez 191 Paiwan 190 Palenquero 110, 280 Pama-Nyungan languages 15, 87, 190 Panoan languages 191, 239–41, 242–4, 248, 254, 257, 259, 260 Paresi 241, 245–6, 248, 250, 254, 255, 258, 259, 261 Pazar Laz 173, 189 Pilagá 239 Pipil 180, 184–5, 191 Pnar 206, 226 Pomoan languages 190, 308, 310 Portuguese 113, 241, 280 Pular, see Fula Quechuan languages 191 Romance languages 196, 208, 213, 216, 273, 275, 276 Romanian 173, 189 Rongga 268, 270, 273, 274, 276 Russian 15, 23, 25, 27–32, 34–51, 169, 171, 172, 178, 180, 184, 189, 191, 270, 291 Saami 62 Kildin 189 Skolt 175, 178, 191 Salish languages 190 Seereer see Seereer-Siin Seereer-Siin 137, 144–5, 149, 150, 159 Seneca 190 Seri 62, 66 Shipibo-Konibo 239–40, 242, 244 Shumcho 198, 205, 208, 209, 213, 228 Siin-Gandum, see also Seereer-Siin 144 Sinitic languages 110, 268–9, 278 Sino-Tibetan languages 190, 274 Siouan languages 190 Slovene 173, 178, 180, 184, 189, 191 Somali 189 Sorbian 178, 184, 189 Lower 180, 191 Southern Sierra Miwok 191 Spanish 211–13, 215, 220, 225, 228, 280, 310 Sranan Creole English 279–80 Svan 189, 275 Swahili 142, 273–4, 275 Swedish 203, 204 Elfdalian 201, 227 Karleby 203, 208, 213, 215, 227 Standard 200–1, 203–4, 227 Sεlεε 213, 223

Tamambo 86 Tariana 237, 238, 241, 244, 246, 248, 250, 253–5, 258–9, 261 Tatuyo 236 Tawala 190 Thompson 190 Tibeto-Burman languages 228 Tindi 189 Tok Pisin 6 Trans New Guinea languages 228 Tsakhur 180, 183–4, 189 Tukanoan languages 236, 238–9, 241, 242, 248 Tümpisa Shoshone 180, 184–5, 191 Tundra Nenets 190 Tungusic languages 176, 177, 190 Turkic languages 190, 208, 209, 213, 220 Turkish 2–3, 7, 10, 141, 142, 147, 210–13, 225, 284, 342 Tzutujil 191 Udehe 190 Udi 180, 184, 189, 208, 209, 213, 226 Uralic languages 176–8, 189–9, 275, 278 Urarina 248, 250, 252, 254–5, 258–9, 261 Usan 190 Uto-Aztecan languages 176, 177, 180, 184–5, 191 Wakashan languages 191 Wappo 191 Wari’ 241 Warlpiri 54–6, 57, 87 West Caucasian languages 189 Wichí 239 Wishram 191 Witotoan languages 238 Wolof 16, 136–41, 143–4, 148–60, 270, 273, 303 Mbakke 136, 143 Xamatauteri Yanomami 244 Yagua 167, 238–9, 246 Yakut 190 Yanesha’ 241 Yeniseian languages 190, 278 Yimas 190 Yokuts 191 Yoruba 6, 268, 270, 276, 279 Yuhup 238 Yukagir 190 Yuki-Wappo languages 191 Yuman languages 191 Yurok 174, 191 Zuni 190

Subject Index

abstractive (models, frameworks, perspectives) 326–7 acquisition 12, 13–14, 17, 53, 57, 75, 288, 303, 311, 326–7 first language, L1 13, 61, 323–5 native, see acquisition, first language, L1 non-native, see acquisition, second language, L2, adult second language, L2, adult 17, 111–12, 114, 267–82, 286, 326 actualization 92, 96, 101 adult acquisition, see acquisition, second language, L2, adult agent 90 agglutinative, agglutinating morphology 3, 137, 141, 141–2, 143, 144–8, 158, 234, 255 agreement 173–4, 193–228, 236, 287, 288, 291–8, 303 default 139 redistribution of 200–4 subject-verb 284 agreement targets 151–8 algorithmic information content 331 alignment 88 allomorphy 3, 7, 8, 9, 54–6, 57, 58, 59, 61–6, 68–70, 72, 75, 89, 110, 148, 149, 170, 172–3, 188, 230, 234, 247, 251, 252–3, 255, 261, 317, 326, 327 Amazonian languages 17, 167, 230–63 analogy 16, 26, 27, 52–4, 57, 61, 67, 70, 71–4, 75, 326 analyticity 17, 110, 267–82 Andean languages 231, 246 animacy 38, 39, 85, 90, 91, 92, 95, 96, 172, 174, 197, 199, 201–4, 205, 213, 214, 217, 218, 219, 238 argument relations 88, 90, 93, 102, 103 autonomous (or pure) morphology 6–7, 18, 24, 119, 147, 230–1, 235, 247–51, 255, 256–62 auxiliary 101 average conditional entropy, see entropy bias amplification 304 bilingualism 193, 210, 211, 214, 215, 220, 222, 307, 308, 311

biuniqueness 9, 54, 164, 230, 234, 247, 253, 254, 262, 341–2 borrowing 12, 16, 127, 160, 194, 205, 209, 212, 215, 222, 233, 238–9, 246, 273 bound status 235, 248, 256, 257–8, 262, 263 canonical typology 108, 340–1 canonicality 163–92 canonicity 9, 10, 16, 24, 163–4, 236, 238, 340–1 case 2–3, 82–3, 87–90, 163, 166, 171–2, 174, 175, 184, 246, 272–3, 274, 286, 343 Caucasus 171, 176, 177, 178, 180, 181, 182, 183, 184 Chaco region 238, 239 Circum-Baltic area 176 class prefixation 151 classifier stem 53, 59–75 classifiers 61–3, 167–8, 169, 236–9 numeral 270, 277 closed classes 52, 53, 59, 61, 66, 68, 71, 75 co-exponence 71, 171, 184 complexification 16, 82, 83, 85, 88, 89, 103, 109, 111, 136–60, 183, 194, 285 complexity: absolute (absolutive) 8, 24, 31, 106, 136, 195, 306, 337 agent-related 306, 337 canonical 163–92, 334, 340–2 compositional 335–7 constitutional 8–9, 141, 335 corpus 306 descriptive 9, 14, 151, 163–4, 195, 204, 217, 332, 335, 339, 340 effective 6, 306 enumerative (E-complexity) 8–9, 11, 24, 32, 56, 82, 85, 89, 102, 103, 106, 112, 163, 175, 233, 334, 335, 336–7 exponence 233, 234, 247, 251–5, 335 formal 8, 13–14 generative 9, 151 integrative (I-complexity) 11, 12–13, 16, 24–5, 27, 32, 56, 57, 59, 62, 65–6, 71, 75, 82, 85, 89, 103, 106–7, 108, 112–13, 122, 135, 233, 334, 335, 337–40, 343 inventory (IC) 163, 334–6 Kolmogorov 9, 163, 172, 185, 306, 331, 341

complexity: (cont.) modes of 335 objective 8, 306, 326, 337 paradigmatic 9, 84, 196 relative 8, 84, 136, 164, 180, 306 structural 159, 269, 306, 342 syntagmatic 3, 196 system 13, 17, 23, 27, 41–6, 46–8, 233, 234, 235–47, 248, 262, 306, 335, 342 taxonomic(al) 9, 163, 335 compounding 14, 232, 233, 235, 238, 262, 320 conditional entropy, see entropy conditioned variation 304 consonant mutation 62, 144, 145, 150–1, 173 constructive (models, frameworks, perspectives) 326, 335 contact-induced change 12, 81–103, 194, 205, 209–10, 211, 213, 244, 246, 286 contiguity 235, 248, 256, 258–9, 262 continuative aspect 101 conversion 120, 123, 124, 129, 130, 132, 133, 134 co-referential pronoun 90, 92, 95, 96, 99, 102, 103, 343 corepresentability 332 cost 8, 12, 13, 14, 24, 136, 185, 195, 337 creoles 2, 12, 16, 87, 105–6, 109–13, 113–14, 116–18, 135, 267, 271, 272, 277, 278–80 crosslinguistic tendency 33, 57 culminativity 261 declension entropy, see entropy default agreement, see agreement defectiveness 9, 30, 38, 42, 47, 48, 50, 157, 158, 234 definiteness 85 demography 199, 209, 216, 221 demorphologization 16, 52–4, 70–1, 74–5 dependent marking 166 Depth-of-Inference Contrast 32 derivation 7, 11, 13, 14, 107, 118–20, 131, 132, 134, 318–19, 335 deterministic input 304 difficulty, see cost dominance 212, 213 dominance analysis 83, 86, 90, 91, 93, 95, 96, 97, 100 drift 270–2, 281 dual-route model 13 entropy 11, 27, 40–9, 55, 56–9, 65–6, 81, 84, 296–8, 338–40 conditional 26, 32, 33, 40–1, 43–6, 47, 49, 57, 58, 66, 71, 338–40 declension(al) 33, 338

equicomplexity hypothesis 2 ergative 83, 88, 89, 90, 93, 95, 102, 103 evidentiality 231, 232, 234, 241–4, 250, 254, 262 expansion (of gender marking) 200, 205–7, 216–19, 220, 222 exponence: cumulative 3, 8, 171 multiple 174, 234, 247, 251, 253 partial 173–4 frequency 13, 28, 67, 110, 114, 116, 294–5, 303, 307 token 27, 36 type 27, 33, 34, 42–3, 44, 46, 47 gender 166, 167, 169–74, 176–7, 193–228, 237, 238, 272 gender marking: emergence of 198, 200, 205–7, 209–16, 221–2, 238, 278 erosion of 200, 203, 213, 220 loss of 200–7, 209–16, 222 reduction of 200–5, 208, 212, 215, 221, 222 generalized linear mixed models (GLMM) 16, 82, 83, 85–6, 91–6, 99 grammatical gender, see gender grammaticalization 12, 110, 206, 231–5, 236, 237–8, 241–7, 262–3, 275–6, 277, 307, 343 greater vs. unmarked plural 155 idiolect 82, 85, 216 imperfect learning 194, 283–305 implicative structure 25, 30, 31–3, 41, 43–6, 49, 50 inanimate, see animacy incorporation 59, 232, 233, 235, 238, 244, 245, 246, 250, 260, 262, 320–2 inflecting-fusional 137, 141, 142, 144, 147, 151, 158 inflection: contextual 110, 272–3 inherent 110, 270, 272–3 inflection class 23–51, 54–6, 62, 107, 147, 168–9, 186, 333, 336 inflectional categories 60, 165–6, 175, 270 information-theoretic approach 8, 11, 24, 26, 27, 32, 40, 337 information theory 43, 107, 343 intergenerational change 67, 68, 89, 99, 102, 213–14, 215, 267, 287, 290, 293, 295, 297 interrupted transmission 12, 290, 291–2, 294, 295, 297, 298, 305 intersecting formative 26–7, 54, 61–6, 68, 70, 75 intransitive subjects 83, 88–90, 102, 103

  irregular/irregularity 8, 13, 23–51, 84, 119, 125, 136, 137, 141, 144, 148, 149, 150, 151, 151–8, 270, 284–5, 293–302, 303–5, 332 isolating languages 2, 83, 105, 137, 158, 269, 341 iterated learning 285–6, 304

obsolescence 84, 311–16 opacity 8, 10, 11, 75, 113, 174–5 overabundance 9, 24, 81–6, 88–90, 99, 102, 103, 144, 342–3 overspecification 284–5, 287, 288, 290, 291–3, 296–305, 335, 342

Kolmogorov complexity, see complexity language attitude 159, 221 language contact 2, 12, 14, 15, 19, 50, 53, 81–103, 109, 182, 183, 193–5, 205, 209–10, 211, 213, 233, 235, 238, 241, 244, 246, 262, 263, 267, 269, 271, 280, 282, 286, 306, 308–9, 310–12, 343 language ecology 209–21 language evolution 222, 285 language genesis 12, 102, 114, 279–80 learnability 50, 163, 298–302, 305 lexeme-based morphology 118 lexical storage 27–9, 303 lexicalization 204, 315, 316, 317, 322 lexicon qua mental lexicon 13, 28, 29, 326, 333–4, 335 lexifier 87, 105, 109, 111–12, 114, 116, 123, 130, 132, 135 linguistic areas 213, 222, 308–9 linguistic correctness 160 Low (Conditional) Entropy Conjecture 11, 25, 32, 33, 45, 49, 71 Marginal Detraction Hypothesis 33, 34 memorization 53, 75, 303 minimum description length 9, 26, 195, 204, 206, 306, 331–2, 334, 337, 340, 343 morpheme-to-word ratio 3 morphological decomposition 303 morphological richness 10, 136, 141–2, 336 morphome 11, 31, 119, 122, 247 morphophonological erosion 193, 200–3 multilingualism 12, 53, 213, 307 Natural Morphology 10, 12–13 naturalness, see Natural Morphology Network Morphology 28, 62 neural networks 14 nominal classification 193, 231, 234, 235–9, 250, 262 North Pacific Rim 176–7 noun class 34, 49, 136, 138–40, 144–8, 150, 160, 173, 218, 219, 236, 270, 273, 303 noun incorporation, see incorporation number 166, 167, 173–4 numeral classifier, see classifiers

Pāṇini’s Principle 332 Paradigm Cell Filling Problem 55, 59, 61 paradigm organization 333 Paradigm Structure Conditions 11 paradigmatic layers 25, 29–31, 34, 39, 41–6, 48, 50, 54, 62 passive 314–16 pattern competition 343 pattern regulation 343 periphrastic construction 84 person 166, 167 pidgin 2, 12, 105, 109–11, 267, 272 pidginization 218, 219, 279, 281 portmanteau 8, 60, 167, 171, 242 possessive 85, 89 predictability 1, 11, 14, 26, 33, 39–40, 45, 47, 52–3, 55, 56–9, 65, 68–70, 71, 84, 85, 106, 107, 120, 123, 131, 135, 169, 171, 338 prestige 159, 160, 195, 199, 209, 212, 213, 222 priming 91, 92, 93, 96, 102, 103, 343 principal parts 32, 33, 333, 336 probabilistic input 304 Probabilistic Syntax 85 probability matching 294–6 processing 1, 12–14, 26, 53, 56, 61, 75, 106, 322, 326 processing cost, see cost productivity 10, 23, 28, 53, 60, 111, 114–15, 128, 130, 132, 134, 135, 141, 194, 201, 203, 205–6, 213, 216, 218, 232, 235, 245, 251, 253, 262, 286, 304, 320, 327, 332, 336 prosodic dependence 235, 248, 256, 259–61, 262 psycholinguistic approach 11, 13, 14 qualitative approach 8, 9–10 quantitative approach 8, 9 redundancy 8, 14, 141, 287, 288, 293, 303, 305 reduplication 122, 312 regression analysis 47, 85, 93, 95 regular/regularity 9, 13, 14, 23, 25–8, 34, 46–8, 81, 84, 144, 235, 285, 297, 302, 304, 305, 331–2 regulations 335–6 resources 163, 335–6 routinization 307, 308, 327

set-theory 26, 32 simplification 12, 59, 61, 67, 68–70, 71, 72, 83, 84, 88, 89, 99, 103, 109, 110, 112, 141, 144, 159–60, 194, 203, 267, 270, 279, 285–7, 288, 293, 305 sociocultural context 307, 308 socioecological parameters 19, 283 sociolinguistic isolation 180, 186 sociolinguistic typology 12, 53 sociolinguistics 85, 160, 180–5, 186 stem alternation 24, 25, 46, 114, 142, 148 stem class 38, 170, 186 stem flexivity 168–9, 170 stress: inflectional 23, 29, 30 syllable 36, 276, 323 suffixation 124, 128, 129, 130, 132, 133, 149 suppletion 6, 8, 9, 26, 28, 33, 60, 65, 74, 117, 125, 142, 157, 234, 251, 252–3, 270, 307, 332, 338, 340–1 syncretism 6, 9, 29, 56, 81, 84, 89, 111, 113, 115, 116, 117, 120, 124, 125, 126–7, 128, 172, 173, 174, 194, 234, 250, 307, 341

synthesis 13, 106, 306 synthesis index 2, 231 templatic morphology 10, 232, 307, 317, 322 tense 234, 235, 239–41, 243–4, 250, 252, 260, 262 topicality 85, 86 transatlantic slave trade 279 transitive subjects 89, 90, 93, 95, 96, 99, 102 transmission fidelity 298 transparency 9, 10, 113, 114, 163–4, 175, 186, 340–2 U-curve 302 unpredictability 39, 52–4, 55–8, 60, 65, 70–1, 75, 168, 169, 170, 171, 341 valence-adjusting 234, 244–6, 250, 262 Vaupés region 233, 238, 242, 244 word formation 6, 7–8, 11, 197, 215, 320 word recognition 6, 14 word-and-paradigm framework 11 wordhood 172, 173, 234–5, 248, 250, 255, 256–61, 262