Arabic and the Case against Linearity in Historical Linguistics (Oxford Studies in Diachronic and Historical Linguistics) 9780192867513, 0192867512

This book explores the long history of the Arabic language, from pre-Islamic Arabic via the Classical era of the Arabic


English Pages 512 [513] Year 2023

Table of contents:
Cover
Title
Copyright Page
Contents
Series preface
Preface
List of figures and maps
Abbreviations and symbols
Six principles
1 Introduction
1.1 Fallacies and metonymies, both unwanted and wanted
1.1.1 Linearity
1.1.2 The written over oral fallacy
1.1.3 The part for whole metonymic fallacy
1.1.4 Historical linguistics via non-linguistic criteria: The “cultural entities are linguistic entities fallacy”
1.1.5 The script is language fallacy
1.2 Non-linearity: An empirical comparative alternative
1.3 Data sources and methodology
1.4 Notes and conventions
1.5 Overview of chapters
Part I: Old Arabic
Part II: Reconstruction
Part III: Contact
Part IV: Stability
Part V: Taxonomy
Putting it all together, Chapters 13 and 14
Part I Old Arabic
2 Arabic and Semitic
2.1 Common Semitic
Segmental phonemes
Verb
2.2 Contrastive, but general: The ancestors of Arabic in trees
2.2.1 The classic arguments
2.2.2 Hetzron's alternative
2.3 Bifurcated features in Arabic
2.3.1 -t ~ -k = 1, 2 perfect verb suffix
2.3.2 Short vowels in open syllables
2.3.3 The nominal feminine suffix -at
2.3.4 -ki ~ -iš 2FSG object
2.3.5 Stammbaum and bifurcation
2.4 Arabic: A composite West Semitic language
3 Arabs, Arabic
3.1 Arabs
3.2 *k → č: Sibawaih the modernist
3.2.1 The 2FSG object pronoun suffix in Sibawaih
3.2.2 The history of the *k > č/c split revisited: Sibawaih and historical linguistics
3.3 The early tradition
3.3.1 The traditional linear approach
3.3.2 Ibn al-Nadim: Classical Arabic as construct
4 Three types of pre- and early Islamic sources: The pre-Sibawaihian setting
4.1 Epigraphy
4.1.1 Taymanitic
4.1.2 Safaitic
4.1.3 Limits of Safaitic for historical reconstruction; the burden of underspecification
4.1.3.1 Underspecification I: Lack of formal indication of short vowels, gemination
4.1.3.2 Underspecification II: Gaps in paradigms
4.1.4 The contradictions of interpreting underspecification
Orthography and reconstruction
4.1.5 Linearity
4.1.5.1 Link to CA
4.1.5.2 Link to Proto-Semitic
4.1.6 Summary, Safaitic
4.1.7 Aramaic loanword š = Arabic s
4.2 Papyri
4.2.1 Basic overview
4.2.1.1 Phonology
4.2.1.2 Morphology and syntax
4.2.2 A case study, raw data, and deviation from CA
4.2.3 From juridical and cultural koine to Classical Arabic?
4.3 Greek orthography, bilinguals, Greek renditions of Arabic names
4.4 Language change and socio-demographic realism
4.5 An interpretive record
Part II Reconstruction
5 Punctuation and language history: I/I + D, inheritance/innovation, and diffusion
5.1 Basic concepts, basic exemplification: The I/I + D paradigm
5.2 When things get complicated: Diffusion, not parallel independent development
5.2.1 A basis for discussion: The intrusive -n
5.2.2 The intrusive –n and Lass' principle
5.3 Geographically non-contiguous features with a postulated common source
5.3.1 Phonology
5.3.1.1 *j = ž
5.3.1.2 *k → č/ç
5.3.1.3 *aa → ie imala
5.3.1.4 *ɣ → q
5.3.1.5 *θ → s
5.3.1.6 Others
5.3.2 Morphology
5.3.2.1 Invariable –ki ‘2FSG’
5.3.2.2 –í ‘my’, -ní ‘me’
5.3.2.3 -ha/hin/hum ~ -a/-in/um
5.3.2.4 Imperfect verb: 1SG, 1PL, n-, n-…-u
5.3.2.5 taltala morphemic /a/ vs. /i/
5.3.2.6 b-: future or indicative imperfect prefix
5.3.2.7 b-imperfect: Against parallel independent development
5.3.2.8 Deflected agreement: Plural, singular or plural, singular only
5.3.2.9 The linker –n: The incrementation corollary; independent but not parallel development
5.4 Lexicon
5.4.1 Reflexes in contemporary dialects
5.5 Creole Arabic: Where Arabic stops
5.6 Exogenous discontinuity
5.7 Summary
6 Four issues in Arabic historical linguistics
6.1 Reconstruction and the Semiticist/Arabicist tradition
6.2 Grammaticalization theory and historical linguistics
6.3 Historical linguistics, reconstruction
6.4 The speech community and the scope of change: How does it help?
6.5 A non-deterministic speech community
6.5.1 City as speech community
6.5.2 Neighborhood as speech community
6.5.3 The household as speech community
6.6 Change doesn't need to happen
6.7 Linguistic stages and contemporaneous speech communities
6.7.1 A diachronic trail across speech communities
6.7.2 Motivation for change
6.8 Non-Arabicists beware: The community of diglossia
Part III Contact
7 Arabic in contact I: Aramaic
7.1 The era of equilibrium: Directed dia-planar diffusion: Aramaic–Arabic contact
7.2 A sample of potential of common Aramaic–Arabic isoglosses
7.2.1 Segmental phonology
Uvular fricatives
Affect on syllable structure
Diphthongs
7.2.2 Syllable structure
7.2.3 Morphophonology
7.2.3.1 Stress protection for short vowels in open syllables
7.2.3.2 1SG stress
7.2.4 The active participle
7.2.4.1 The active participle as verbal predicate
7.2.4.2 Person-marked participle
7.2.4.3 Development of finite conjugation based on active participle in Central Asian Arabic
7.2.5 Differential object marking (DOM)
7.2.6 What didn't happen
7.3 Arabs and Aramaeans: The socio-cultural basis of diffusion
7.4 Dia-planar diffusion
8 Morphosyntax as an adaptive mechanism I: Idioms
8.1 Idioms
8.2 Idiomaticity
8.2.1 Idioms and online processing
8.2.2 Two alternative approaches
8.2.2.1 A lexical approach
8.2.2.2 A psycholinguistic alternative
8.2.3 The case for the lexical basis of idiom interpretation
8.2.3.1 The data, what are idioms?
8.2.3.2 Idiomatic usage is the normal state of affairs for many lexemes
8.2.4 Idioms contain normal words, normal morphemes, normal morphosyntax
8.2.4.1 Idioms are normal words I: Compositionality
8.2.4.2 Idioms are normal words II: Intra-clausal functions
8.3 Idioms are normal words but they produce distributed polysemy
8.3.1 Pronominal reference
8.3.2 Distributed polysemy and thematic roles
8.4 How idioms are different from ‘normal’ constructions: Characterizing idioms
8.5 The discourse semantics of idiomaticity
8.5.1 Prominent part
8.6 The origins of LCA idiomaticity
8.7 LCA, Egyptian, southern Tunisian: Three dialects, two idiom areas
8.8 Are idioms universal?
9 Morphosyntax as an adaptive mechanism II: The expansive demonstrative
9.1 Basic history and linguistic background
9.1.1 The data, the corpora
9.2 The role of contact
9.2.1 Referring expressions
9.2.2 Three types of categorical variables
9.3 Descriptive introduction
9.3.1 Definite article
9.3.1.1 Lake Chad area Arabic
9.3.1.2 Egyptian Arabic
9.3.2 Demonstrative, LCA
9.3.2.1 LCA, inherited features
9.3.2.2 Innovative functions
9.3.3 Egyptian Arabic
9.4 Quantitative overview
9.4.1 LCA
9.4.2 Egyptian Arabic demonstratives
9.5 The Lake Chad linguistic area
9.5.1 Kanuri -dǝ (with H tone)
9.5.2 Glavda, Wandala
9.5.3 Bagirmi
9.5.4 Fali
9.6 An overview: Realignment of what?
9.7 Discussion
9.7.1 Areality and contact
9.7.2 The citation of parallel distributions of determiners in LCAL
9.7.3 Diffusion, simplification, irregularity
9.8 Corpora and the comparative method
9.9 Morphosyntax as an adaptive mechanism
Part IV Stability
10 Language stability I: Three case sketches
10.1 Najdi Arabic
10.2 Lake Chad area Arabic (LCA)
10.2.1 How “unarabic” is LCA? A discussion of McWhorter 2007
10.2.2 Continuity or innovation
10.3 Damascus Arabic
10.4 Summary
11 Language stability II: Watching paint dry, or, metrics for measuring language stability
11.1 A basic observation
11.2 Stability in historical linguistics
11.2.1 Looking under the hood of transmission
11.3 Why? The basic issue
11.3.1 Verbal predicates
11.3.2 The other predicates
11.4 A multivariate insight into language stability
11.4.1 The data
11.4.2 The parameters
11.4.3 The statistical tests
11.4.4 Conclusions on the basis of the regression results
11.5 Comparative data
11.5.1 Background: Model trees, linguistic and demographic
11.5.2 Overview of main argument
The linguistic issue
Demographic split
Realization of linguistic phenomenon in speech communities
Comparative perspective: Things don't have to be as they are demonstrated to be
11.5.3 Things can be different, I: Universal language typology, some languages are O, others N/O
11.5.4 Things can be different II: For a reason
11.5.4.1 Differential parsing model
11.5.4.2 Modern Hebrew
11.5.4.3 N/O, O split in Arabic
11.5.5 Support from universal factors
11.6 Stability
11.7 Comparative contemporary corpora and historical linguistic interpretation: The limits of adaptation
11.7.1 Overt transmission
Re-arrangement of overt lexemes, lexemic adjacency, no new structure
Exploitation of clause-internal co- and disjoint reference
11.7.2 Inferential transmission
Part V Taxonomy
12 Toward a typology for historical linguistics
12.1 English
12.1.1 Old English
12.1.2 Middle and early modern English
12.2 Icelandic
12.3 Icelandic, Old English, Arabic
12.3.1 English
12.3.2 Gender maintenance in Arabic
12.3.3 Final vowels in Arabic
12.4 Short final vowels and gender
12.5 Arabic as art: Toward a taxonomy of the history of languages
12.5.1 Arabic alinearity
12.5.2 Arabic multi-linearity
13 Summing up
13.1 Four parameters for classifying changes
13.1.1 Contiguous or non-contiguous
13.1.2 Origin: Proto-Semitic, proto West Semitic, pre-Islamic, Islamic, via contact
13.1.3 Age in Arabic
13.1.4 Diffusion, extent
13.1.5 Classification of features
13.2 Parallel independent development
13.3 The speech community
13.4 Incrementation
14 Why Arabic is special and special for historical linguistics
References
Subject index
Index of language families, languages, and dialects

Author: Jonathan Owens


Arabic and the Case against Linearity in Historical Linguistics

OXFORD STUDIES IN DIACHRONIC AND HISTORICAL LINGUISTICS

General editors
Adam Ledgeway and Ian Roberts, University of Cambridge

Advisory editors
Cynthia L. Allen, Australian National University; Ricardo Bermúdez-Otero, University of Manchester; Theresa Biberauer, University of Cambridge; Charlotte Galves, University of Campinas; Geoff Horrocks, University of Cambridge; Paul Kiparsky, Stanford University; David Lightfoot, Georgetown University; Giuseppe Longobardi, University of York; George Walkden, University of Konstanz; David Willis, University of Oxford

RECENTLY PUBLISHED IN THE SERIES

45 The Diachrony of Differential Object Marking in Romanian
Virginia Hill and Alexandru Mardale

46 Noun-Based Constructions in the History of Portuguese and Spanish
Patrícia Amaral and Manuel Delicado Cantero

47 Syntactic Change in French
Sam Wolfe

48 Periphrasis and Inflexion in Diachrony: A View from Romance
Edited by Adam Ledgeway, John Charles Smith, and Nigel Vincent

49 Functional Heads Across Time: Syntactic Reanalysis and Change
Edited by Barbara Egedi and Veronika Hegedűs

50 Alignment and Alignment Change in the Indo-European Family
Edited by Eystein Dahl

51 Germanic Phylogeny
Frederik Hartmann

52 Arabic and the Case against Linearity in Historical Linguistics
Jonathan Owens

For a complete list of titles published and in preparation for the series, see pp. 464–468

Arabic and the Case against Linearity in Historical Linguistics

JONATHAN OWENS

Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

© Jonathan Owens 2023

The moral rights of the author have been asserted

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this work in any other form and you must impose this same condition on any acquirer

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data
Data available

Library of Congress Control Number: 2023930757

ISBN 978–0–19–286751–3

DOI: 10.1093/oso/9780192867513.001.0001

Printed and bound in the UK by Clays Ltd, Elcograf S.p.A.

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.


Series preface

Modern diachronic linguistics has important contacts with other subdisciplines, notably first-language acquisition, learnability theory, computational linguistics, sociolinguistics, and the traditional philological study of texts. It is now recognized in the wider field that diachronic linguistics can make a novel contribution to linguistic theory, to historical linguistics and arguably to cognitive science more widely. This series provides a forum for work in both diachronic and historical linguistics, including work on change in grammar, sound, and meaning within and across languages; synchronic studies of languages in the past; and descriptive histories of one or more languages. It is intended to reflect and encourage the links between these subjects and fields such as those mentioned above.

The goal of the series is to publish high-quality monographs and collections of papers in diachronic linguistics generally, i.e. studies focussing on change in linguistic structure, and/or change in grammars, which are also intended to make a contribution to linguistic theory, by developing and adopting a current theoretical model, by raising wider questions concerning the nature of language change or by developing theoretical connections with other areas of linguistics and cognitive science as listed above. There is no bias toward a particular language or language family, or toward a particular theoretical framework; work in all theoretical frameworks, and work based on the descriptive tradition of language typology, as well as quantitatively based work using theoretical ideas, also feature in the series.

Adam Ledgeway and Ian Roberts
University of Cambridge

Preface

If the history of a language is tied together within a loose lattice of chronological time and can often be silhouetted against historical events, the object which it describes is multifarious in its manifestations from era to era, diverse in its realizations across different domains of language, and equally predisposed to be marked by dramatic change or by remarkable stability. As attestations grow both in diachronic depth and in geographical breadth so too does the challenge to historical interpretation. Social forces are crucial to determining how a language transmits from generation to generation. In short, there is no linguistic endeavor which brings together so many diverse intellectual threads as does the history of a language. This multiplicity of parameters circumscribes Arabic as it does any language. In the case of Arabic a further factor intervenes: a metatradition, the Arabic linguistic tradition beginning in the second/eighth century, which enriches through its formal linguistic sophistication and observation of its own internal variability. The earliest textual attestation of Arabic dates to between 1,700 and 2,300 years ago. Its historical backdrop fades out and in, but at a number of places is remarkably detailed. Its contemporary geographical expanse is among the largest of any language in the world, and to a large degree reflects past migrations. Understanding Arabic language history entails in the first instance recognizing the complex challenge facing its interpretation. It is not surprising, given the diverse mix of analytical instruments which constitute the basis of Arabic historical linguistics, that there should be an impulse to reduce it to simple categories, for instance Old vs. Neo Arabic. The fundamental argument against such reductionism is not to motivate an essentialist alternative but rather to recognize the challenges posed by multiple pathways leading to an overarching historical interpretation.
My own pathway to Arabic is constituted of several stages, hardly any single one of which is in and of itself an inherently historical approach. I have had the opportunity of studying multiple Arabic dialects in their home setting. This includes an expansive sociolinguistic study of Arabic in NE Nigeria as well as a corpus-based analysis of idiomaticity, looked at both in terms internal to Arabic and as a reflex of contact-based influence. Five enjoyable years at Yarmouk University in Jordan gave me the opportunity of becoming acquainted with what is surely one of the most fascinating and sophisticated intellectual traditions ever conceived, the Arabic grammatical tradition. It was while teaching a survey course on Arabic language history at Bayreuth University that I began to think critically about 100+ years of western scholarship on the subject. This led to placing a greater emphasis on reconstruction based on contemporary sources than had prevailed in the Arabicist tradition.

This book is the product of many individuals and institutions. Beginning with the latter, over many years the Deutsche Forschungsgemeinschaft (DFG) has supported my diverse research projects even if at the periphery of the Arabic world, but nonetheless central to issues in Arabic linguistics. As the interpretive narrative has unfolded, these have proven crucial to elucidating topics of historical linguistics. Much of this research was carried out in NE Nigeria, and for this I am pleased to acknowledge the years of support which Maiduguri University has provided. I would like to thank the OUP for hosting a sequel to my 2006/2009 Linguistic history of Arabic. It is a pleasure to thank Prof. Ali Al-Hamad of Yarmouk University, whose interest and patience introduced me to Sibawaih, and to the Arabic grammatical tradition writ large. I leave unnamed the many sympathetic and critical colleagues with whom it has been my pleasure to engage in critical discussion, including two readers of the current book. Without the valued expertise of Robin Dodsworth, Chapter 11 in its entirety and parts of Chapter 8 could not have been written. I would like to thank Ms. Muhadj Adnan and Ms. Carolina Zucchi for critical comments on a number of chapters, as well as for help in preparing the maps. Most enduringly I thank my wife Jocelyne for her unstinting support and patience in accompanying me in my research (Sibawaih and all) across four continents and over as many decades. It was, finally and firstly, my parents who set me down on the pathway of intellectual curiosity.
It is clear that any single linguistic issue has ramifications not all of which can be done justice even in a fairly long book. For some topics greater detail and exemplification help contextualize an issue whose broad contours are adequately treated in a sub-section. Other issues are so large that including them in their full scope would risk sidetracking the main text away from its central expository flow. For this reason an online Appendix has been included, available at www.oup.co.uk/companion/Owens. The appendix treats matters of both kinds. An appendix entry might provide further background to a single, specific issue (see e.g. App. 3.2.2 for further exemplifications of parallel independent development looked at critically). In other cases it might expatiate in considerable detail on a central issue in Arabic historical linguistics, allowing an abbreviated, streamlined position summary to move the discussion in the main text along quickly. The important discussion of Old Arabic–Neo Arabic in Brockelmann and Bergstraesser (App. 1.1.1a) is a case in point. Appendix entries are cited as App. X.

I would like to gratefully acknowledge permission to reprint from the following journals and books: “The lexical basis of idiomaticity.” Language Sciences 57: 49–69, 2016; “Dialects (speech communities), the apparent past and grammaticalization: towards an understanding of the history of Arabic,” in Clive Holes (ed.), The historical dialectology of Arabic, 206–256. Oxford, OUP, 2018; “Equilibrium, punctuation, dia-planar diffusion: Towards understanding early Aramaic–Arabic contact.” Al-Qanṭara 39: 391–475, 2018; “Contemporaneous comparative corpora and historical linguistic reconstruction,” Anthropological Linguistics 62: 58–109, 2020; “Deflected agreement and verb singular in Arabic: A three-stage historical model,” Journal of Semitic Studies 64: 483–502, 2021. Map 8 was reprinted by permission of John Benjamins from my 1998, Neighborhood and ancestry: Variation in the spoken Arabic of Maiduguri, Nigeria.

List of figures and maps

1.1. Linearity in Semitic linguistics
1.2. The Arabic koine, Ferguson
1.3. Taltala on a tree
2.1. Arabic as South Semitic
2.2. SS innovation of faaʕal, broken plurals in South Semitic (including Arabic)
2.3. Shared maintenance of faaʕal, broken plural South Semitic
2.4. Hetzron-inspired Semitic
2.5. Parallel independent development of –t ~ –a(h) allomorphy
2.6. Innovation + diffusion of –at ~ –a(h)
2.7. 1SG perfect allomorphs, realizations in speech communities
2.8. t/k perfect + 2FSG allomorphs, realizations in speech communities
4.1. Typical split
5.1. Split, Merger
5.2. Analogical spread of –u PL in imperfect paradigm
5.3. Linker –n and tanwin indefiniteness marker
6.1. Development of WSA –t ~ Ø allomorphy
6.2a. WSA vs. rest of Arabic
6.2b. Demographics of 6.2a
8.1. Semantic mapping
9.1. EA and WSA
11.1. Lexeme frequency for null and overt subjects in each corpus
11.2. Z-scores from logistic regression models. Values further from 0 indicate stronger estimated effects
11.3a. WSA vs. rest of Arabic
11.3b. Demographics of Figure 11.3a
11.4. Demographic split, lack of linguistic split
11.5. Arabic vs. Nubi
12.1. Germanic branch of Indo-European
13.1. Intrusive -n
13.2. Syriac imperfect (Moscati et al. 1980: 142)

Map 1. Countries with Arabic as a majority language
Map 2. Arabic in the Middle East
Map 3. Arabic in Africa
Map 4. Arabic as a minority language
Map 5. Discontinuous k > č
Map 6. Western and eastern LCA dialects
Map 7. Gwange and Dikkeceri
Map 8. Early migrations into Egypt and migrations from Egypt into Africa

Abbreviations and symbols

Journals

BASOR    Bulletin of the American School of Oriental Research
BSOAS    Bulletin of the School of Oriental and African Studies
JAOS    Journal of the American Oriental Society
JSS    Journal of Semitic Studies
LVC    Language Variation and Change
ZAL    Zeitschrift der arabischen Linguistik
ZDMG    Zeitschrift der deutschen morgenländischen Gesellschaft

Symbols and abbreviations

!    inappropriate form
#    word boundary
@    underlying form
{[[x] y] = z}    idiom formula: x = actual collocation, y = literal gloss, z = idiomatic gloss
{}    orthographic form or root consonants of triliteral root
{a}    represents the ‘alif’ (long /aa/ or /ʔa/) in orthographic forms
1    first person
2    second person
3    third person
IIR    reduplicated form II derived verb
á, é, í, ó, ú    stressed vowels
ACC    accusative
Akk    Akkadian
ALT    Arabic linguistic tradition
ANA    Ancient North Arabian
AnA    Andalusian Arabic
AP    active participle
Ar    Arabic
ASSOC    associative marker
B+W    Behnstedt and Woidich
BA    Biblical Aramaic
CA    Classical Arabic
CL    cognitive linguistics
CONJ    conjunctive (tense)
CyA    Cypriot Arabic
D    dual
DAT    dative
DEF    definite
DEM    demonstrative
DET    determiner
DM    discourse marker
DO    direct object
DRP    discourse referentiality paradigm
DRI    discourse referentiality identifier
DS    Dialectal Swedish
DS    different subject
E    Emirati
EA    Egyptian Arabic
EI    early Islamic era
ELA    Eastern Libyan Arabic
EMPH    emphasis
ESA    educated spoken Arabic or Epigraphic South Arabian
F    feminine
FT    future
G    Ge’ez
G/N    Gulf/Najdi
GEN    genitive
Gut    gutturality
H    high variety (diglossia)
H.    Hijri (calendar year), Islamic calendar date
Heb    Hebrew
Her    inherited
I    Islamic era
I/I + D    inheritance or innovation + diffusion
IND    indicative
INDF    indefinite
INDF    indefinite (tense)
Inn    innovative
IO    indirect object
IPA    International phonetic alphabet
KSA    Kingdom of Saudi Arabia
L    low variety (diglossia)
L/E    Levantine/Egyptian
LCA    Lake Chad area Arabic
LCAL    Lake Chad area languages
LIN    linker
M    masculine
ME    Middle English
MI    Modern Icelandic
MSA    Modern South Arabian
N (in glosses)    intrusive –n
N    neuter
N    noun
NA    Neo-Arabic
NCS    Northern cities shift
NENA    Northeast neo-Aramaic
Ness.    Nessana (city in Negev)
NO    New Orleans
N/O    null or overt (subject)
NOM    nominative
NP    noun phrase
NW    Northwest
O    overt (subject)
OA    Old Arabic
obj    object
OE    Old English
OI    Old Icelandic
PER    perfect
PL    plural
PN    proper name
PP    prepositional phrase
Pro    empty subject of tensed clause
pro    pronoun
PS    proto Semitic
PST    past
PV    preformative vowel
PWS    proto West Semitic
QCT    Koranic consonantal text
R    reduplicated
RC    relative clause
RT    reaction time
s1    reconstructed proto-Semitic value of sibilant, customarily interpreted as *š
s2    reconstructed proto-Semitic value of sibilant, customarily interpreted as *ɬ
SA    Standard Arabic
SBJ    subject
SG    singular
SS    same subject
SS    South Semitic
SS    Standard Swedish
STA    southern Tunisian Arabic
SVS    southern vowel shift
Syr    Syriac
TAN    tanwin
TMA    tense mode aspect
UPSID    UCLA phonetic segment inventory database
UzA    Uzbekistan Arabic
V    verb or vowel (depending on context)
V́    stressed vowel
WS    West Semitic
WSA    West Sudanic Arabic

Six principles

P1 Language and speech community principle
P2 Lass’ principle
P3a Corollary to Lass’ principle
P4 Grammaticalization sufficiency principle
P4 Grammaticalization principle (revised)
P3b The incrementation corollary
P5 Underspecification and reconstruction

1 Introduction

In this book I develop a model for understanding the history of Arabic, based methodologically to a large degree on Owens (2006 [2009]), but expanded to encompass a more comprehensive treatment of Arabic language history. In Owens (2006 [2009]) the main purpose was to argue three points. First, the traditional treatments of Arabic language history are overly simplistic in defining a linear development from proto-Semitic or proto West Semitic to Old or Classical Arabic to Neo-Arabic. Secondly, previous treatments did not adequately account for historical developments of contemporary Arabic dialects and thirdly, related to this, did not adequately incorporate standard comparative historical linguistic methodology in ascertaining Arabic language history. A corollary of these positions is that the Arabic linguistic tradition (ALT) itself can be understood as contributing to a variegated, sometimes ambiguous reading of language history, rather than as a monolithic Old Arabic or Classical Arabic through which pass all stages to contemporary Arabic. Building on these premises the current work gives more attention to the position of Arabic within West Semitic, and more particularly, critically examines the contribution which pre- and early Islamic epigraphic, papyrological, and early Greek transliterational sources can make to its history. Its main goal, however, is to expand the purview of linguistic perspectives which can be brought to bear on its history. From the outset it should be stated that the book takes critical and often explicit stances toward such central historical linguistic issues as the role of independent parallel development vs. diffusion in evaluating language change or the degree to which grammaticalization theory and linguistic typology can be invoked to justify historical linguistic interpretation.
Equally, it interprets the intellectual traditions which have customarily been invoked or assumed in the explanation of the linguistic development of Arabic. Arabic is tailor-made for addressing these and other historical linguistic issues and, following on this, it is suggested that it is insightful to taxonomize fixed chronological periods—a range of about 1,500 years is used here—to define the degree to which languages in such a temporal window change. Icelandic, English, and Arabic are compared as to morphological changes. I start Chapter 1 with a critical introduction which argues that there are a number of fallacies, to be sure based ostensibly on common sense, which have stood in the way of a linguistically orientated historical linguistics of Arabic.

Arabic and the Case against Linearity in Historical Linguistics. Jonathan Owens, Oxford University Press. © Jonathan Owens (2023). DOI: 10.1093/oso/9780192867513.003.0001

I would note before beginning that most key place and language names are found on Maps 1, 2, and 3.

1.1 Fallacies and metonymies, both unwanted and wanted

By fallacy I understand a well-intentioned linguistic idea which may be supported by a great deal of positive data or by simple common sense that appears to characterize a significant domain of inquiry. A fallacy is marked by two characteristics. First, a tendency, a reasonable assumption, a possible explanation for a given phenomenon may be raised to the status of a universal truth. The fallacy may be based on a solid set of observations and on a plausible argument. Secondly, however, fallacies typically overextend themselves. The essence of the fallacy either is applied to phenomena it does not fit, or it is used to exclude phenomena which should be a part of the universe of discourse at hand. Once established, the mere citation of the fallacy may carry authoritative weight. Fallacies are related to metonymies because it is via metonymies that fallacies may be made to appear reasonable. Metonymies are part–whole relations. Metonymies are in and of themselves reasonable to one degree or another. ‘The pot is boiling’ picks out a prominent instrument for boiling water and attributes to it the property of the material the pot contains. After all, we see the pot, not the water boiling in it (barring glass boilers) or the steam coming out of it. It is in the abstract world of disciplined conceptions, such as theoretical and descriptive linguistic systems, that one needs to be especially aware of metonymies. By definition a metonymy contains a part. Linguistic systems, however, are wholes. Individual metonymies might be useful for characterizing certain parts of a linguistic system. However, it is unlikely that a metonymy will ever be able to represent an entire language.

1.1.1 Linearity

The linearity fallacy says that language history is linear. It goes from landmark to landmark, the linguist merely needing to fill in the interstitial details. In Arabic language history the progression is viewed as in Figure 1.1.

Proto Semitic → Old Arabic → Modern dialects (= neo-Arabic)

Figure 1.1 Linearity in Semitic linguistics

This succession of linguistic stages aligns with the presumed emergence of peoples in the Middle East as propagated by Carl Brockelmann (1908/1913), who saw the individual Semitic languages emerging as waves across the Middle East. Akkadian was the first, followed by Hebrew and Phoenician, Aramaic and finally Arabic.¹ Brockelmann’s waves coincide with when the languages emerge in attested history, which does not necessarily reflect their actual age. An early skeptic of the linearization of Arabic was Karl Vollers (1906 [1981]), whose ideas, however, gained little traction among Semiticists (though more recently see Edzard 1998). Further background exemplification of the Old–New dichotomy from the classic philological texts is found in the online Appendix 1.1.1a. The problem here is not the postulation of a succession of stages. The assumption of such a construct is constitutive of historical linguistics itself. It rather lies in how the stages are justified in terms of such basic concepts as splits and mergers (Chapter 2). Chronological history does not automatically determine linguistic history. A part of a language may go from stage A to stage B, while another part stays at stage A (see Chapter 5), while over given periods of chronological time, even very long periods, languages or parts of language may hardly change at all (Chapters 10–12). Intimately connected to the linearity fallacy is the idea that there always exists a direct descent from a proto-form to successor form. In principle this is not a fallacy at all but rather a tenet of historical linguistics. In many situations, however, as Edzard (1998) observes, the further back one goes in time the more difficult the issue may become and it may not be possible to decide what a reconstructed predecessor was. Indeterminate situations need to be allowed for (see e.g. discussion in Chapter 2 and 4.1.5). A rather pernicious corollary of the linearity principle is what might be termed the “one language” principle.
Arabic has only one history so it must proceed sequentially from one stage to another, as in Figure 1.1, from Old Arabic to the dialects. This idea was developed as early as 1854 by the German Arabicist Heinrich Fleischer (1854), whose “holistic Arabic” (Gesamtarabisch) was divided into three stages, Old, Middle and Neo-Arabic (Altarabisch, Mittelarabisch, Neuarabisch; see discussion in Owens 2006 [2009]: 40–43). Old Arabic is the language of the Koran, of the grammarians, pre- and early Islamic poetry, Middle Arabic the literary language that emerged in the early Islamic era² and Neo-Arabic the modern dialects. I would add that formal models of language structure may also reinforce derivation from a unitary source, a point discussed in App. 1.1.1b. Without falling into the trap of recognizing “Arabics” (as in McWhorter 2007, see 10.2.1) as opposed to “Arabic,” a different “one language” principle is to recognize fuzzy boundaries. Languages consist of dialects, they may consist of standardized varieties, of non-standardized written varieties. Each of these constructs has its own history (Chapter 3 and 6.8). With the rise of sociolinguistics it is now appreciated that languages are embedded in speech communities. These are semi-autonomous linguistic entities which may innovate or remain stable independently of other, genetically related speech communities. Chapters 5 and 6 are devoted to this topic.

¹ Brockelmann does not align Ge’ez, representing South Semitic, among the expansion waves, merely noting that its speakers probably reached Ethiopia in the pre-Christian era (1908: 30).
² Similar, it appears, to Fischer’s (1982) post-klassisches Arabisch.

1.1.2 The written over oral fallacy

In the case of the Semitic languages, closely aligned to the progression in Figure 1.1 is the precedence given to the written word over the spoken (see Borg 2022: 8, 10). In the standard work on Semitic languages, Moscati et al., written 1964 and revised 1980, there was no attempt at all to integrate contemporary spoken reflexes into reconstructions, these being relegated to the “neo” stage. The written language becomes the stand-in for the language as a whole. This priority may appear self-evidently correct when contemporary orality lies well over 1,000 years later than earlier written attestation, but the matter is not so simple either in principle or in practice. The principle involved is how in the comparative method ancestral and successor forms are defined. This is an issue running in the background throughout this book. The practical issue is that nearly all the earliest attestations of Arabic are written in a script which does not distinguish short vowels and other elements which are without exception necessary for ascertaining syllable structure, and which often are crucial because they may carry contrastive morphemic values. Ascertaining their value is as much a problem today as it was in the eighth and ninth (third and fourth H.) centuries when, for instance, scholars worked on a consensual received version or versions (the qiraaʔaat) of the Koran whose earliest manuscripts showed neither short vowels nor even many consonantal contrasts (see Chapter 4 and ch. 4 n. 9).

1.1.3 The part for whole metonymic fallacy

In the early era of the historical study of Arabic, up to Bergsträsser (1928 [1977]: 156) the linearity principle was sufficient for a conceptualization of Arabic history. Arabic axiomatically was assumed to have an “old” stage which was closely identified with Classical Arabic, a difficult-to-define concept treated in greater detail in Chapters 3 and 4 and App. 1.1.1a. Fischer (1982: 37–45) for instance identifies the sources for CA, namely the Koran, poetry, grammatical treatises, and early ʔaadaab literature as the same sources which generically define Old Arabic in the sense of Owens (see discussion in 2006 [2009]: 39–43, 2019: 30, 1.5 below, and 3.3.1).

This stage is succeeded by a “new” or “neo” stage, manifested inter alia in the modern dialects. Beginning with Bergsträsser a more sophisticated linguistic argument was introduced, namely that “die neuarabischen Dialekte gehen im grossen und ganzen auf eine einheitliche Grundform zurück” [‘the Neo-Arabic dialects go back by and large to a uniform base form’]. This may be the first time the contrast between Old and New Arabic was formulated as a linguistic postulate, essentially that the two are characterized by distinctive differences which allow one to speak of a split, splits being a classic concept for defining different entities (dialects, languages, entire language families—see Chapter 2). However, as the idea was developed in greater detail by Fück (1950), Fischer and Jastrow (1980: 39–48), Blau (1981b: 3–4, 2002: 16) as well as Ferguson (1959), the criteria defining the split turned out to be metonymic fallacies. What happened was that a general property was ascribed to Arabic as a whole, on the basis of a property which in fact pertains only to a sub-class of the language. The sub-class becomes a metonym for the whole. Using classic comparative criteria, these scholars defined a split between Old Arabic (Altarabisch) and Neo-Arabic (Neuarabisch) on the basis of individual features. Their criteria, however, do not categorically define a split between “old” and “new” Arabic at all. The list that follows, given along with at least one exception which is equally attested in Old and in Neo-Arabic, is a very brief summary (see Owens 2006 [2009]: 43–74 for longer critical discussion). The metonym is the Neo-Arabic feature in the second column. This metonym, however, does not generalize across all Neo-Arabic varieties as can be seen in column 3 in Table 1.1. The metonym itself may already be attested in OA (1b).

Table 1.1 The part for whole metonymic fallacyᵃ

   Old Arabic (OA) | Neo Arabic | contemporary “OA” attestation
a. ʔafʕal comparative | faʕiil min (simple adj + min) | ʔafʕal min (most dialects today)
b. ta-fʕal-u | ti-fʕal | ta-fʕal (Hijazi, Sudanese Arabic)
c. CV-CV | CV-CV → CCV | highland Yemen CVCV

ᵃ For a recent expression of the part for whole fallacy see al-Jallad and van Putten (2017), who appear to assume that because some short final vowels in some Arabic dialects have been lost, the parameter “loss of final short vowels” defines a contrast between the neo-dialects and Old Arabic (see discussion in 12.3.3 and ch. 12 n. 12 and Table 12.8 n. b).

Comments:

Point (a), due to Blau 1981b: 3–4. Virtually all contemporary Arabic dialects maintain the ʔafʕal comparative, including in fact some Chadian dialects which Blau lists as innovative (Owens 1993b: 130, 137),³ e.g. ʔakbar min ‘bigger than.’ A few use the basic adjective + min, kabiir min.

Point (b), due to Ferguson 1960: 621. A number of Arabic bound morphemes alternate between /a/ and /i/: ya ~ yi ‘imperfect verb preformative vowel’; -at ~ -it ‘3FSG’; al- ~ il- ‘definite article’ (termed “taltala”). Ferguson held that the /i/ variant was a defining property of modern dialects, i.e. Neo-Arabic. Aside from the fact that, as Ferguson himself noted, the /i/ variant is also attested in the ALT,⁴ there are many contemporary dialects with /a/ in various morphemes (see discussion in 5.3.2.5).

Point (c), due to Fischer and Jastrow 1980: 40. Old Arabic maintains short vowels in open syllables, Neo-Arabic deletes them. First, there are instances noted in the ALT where short vowels in open syllables are in fact deleted (discussion in Owens 2006 [2009]: 59–63). Secondly, there are contemporary dialects, e.g. Yemen, some Egyptian oases, where all short vowels are maintained in open syllables. This feature is discussed again in 2.3.2, 7.2.2, and 12.3.3 (cf. Behnstedt 1985: 53–54; B + W 1985: 53–54, 64–68; B + W 2018: 71).

³ Blau attempts to bundle a number of features together to argue that Old Arabic is characterized by a synthetic morphological structure, Neo Arabic by an analytic. The contrast is deconstructed in Owens (2006 [2009]: 111–113).

I would note that the basic idea behind the part for whole metonymy follows standard comparative linguistic practice: if a universal feature of Old Arabic should indeed be lacking in Neo-Arabic, then it is a candidate for defining a change via shift, split, loss, or merger. If valid it is not a metonymy (or it is a “correct” metonymy). It is a legitimate example of a change that can be used to define a historical linguistic stage. Ferguson would represent the “taltala” development (see 5.3.2.5) as in Figure 1.2, presumably. It should be noted here that Ferguson uses CA as a proto-variety (not OA) and that in his model the modern dialects descended not directly from CA but rather from a reconstructed common koine which itself was a descendant of CA, hence the intermediary node.

CA /a/
    *koine /i/
        modern dialect 1
        modern dialect 2
        modern dialect 3 …

Figure 1.2 The Arabic koine, Ferguson

In this idealized model there is a categorical shift of /a/ → /i/.⁵ The problem here and indeed for most Arabic features which have been proposed as representing historical linguistic change is that it is contradicted by the data, in this case, both in OA and in contemporary Arabic (see Figure 1.3).

*OA /a/ ~ /i/
    dialect 1 /a/ ~ /i/
    dialect 2 /a/ ~ /i/ …

Figure 1.3 Taltala on a tree
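
The comparative criterion at issue (a feature defines a split only if the ancestral value is categorically absent from the successor varieties) can be restated as a small check. This is an illustrative sketch only, not a procedure from the text: the function name and the data layout are assumptions made for the illustration, and the toy values do no more than transcribe the idealized tree of Figure 1.2 and the attested situation of Figure 1.3.

```python
def defines_categorical_split(ancestor_values, descendant_values):
    """Return True only if no descendant variety retains any ancestral value.

    ancestor_values: set of variants attested in the ancestor (hypothetical layout)
    descendant_values: dict mapping a variety name to its set of attested variants
    """
    ancestor = set(ancestor_values)
    return all(ancestor.isdisjoint(vals) for vals in descendant_values.values())


# Idealized koine model (Figure 1.2): CA has /a/, every modern dialect has /i/.
ferguson_model = {"dialect 1": {"i"}, "dialect 2": {"i"}, "dialect 3": {"i"}}

# Attested situation (Figure 1.3): OA and the dialects alike show /a/ ~ /i/.
attested = {"dialect 1": {"a", "i"}, "dialect 2": {"a", "i"}}

print(defines_categorical_split({"a"}, ferguson_model))  # True
print(defines_categorical_split({"a", "i"}, attested))   # False
```

On the idealized data the feature would qualify as a split; on the attested data it does not, which is the sense in which a part of the language is taken as a metonym for the whole.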

⁴ /a ~ i/ variation in CA is termed “taltala.”
⁵ Ferguson (1960: 622): “… this /i/ for /a/ is either a general phonetic change or a morphologically-conditioned change of some kind affecting all affixes.” It should be noted that Ferguson (1960) is still held by some scholars as an attractive explanation for the historical development of Arabic (Heath 2015: 12). Similarly Knauf, discussed in ch. 3 n. 12 below, follows a Fergusonian model.

1.1.4 Historical linguistics via non-linguistic criteria: The “cultural entities are linguistic entities fallacy”

Before offering an alternative linguistic model to linearity, it is relevant to look critically at one further approach to treating Arabic language history. The treatments in Table 1.1 and Figures 1.1 and 1.2 at least have the goal of defining historical stages on the basis of explicit linguistic criteria. There exists in Arabic studies another tradition which defines linguistic stages according to historical-cultural criteria or according to historical-genre criteria. One can term this the “cultural entities are linguistic entities fallacy.” I begin with a tradition associated inter alia with Nabataean Arabic. Nabataeans are a somewhat enigmatic group who dominated and for a time ruled an area stretching between northern Saudi Arabia (Madaaʔin Ṣaaleh) and the middle of Jordan. Their lasting legacy is the stone city of Petra in Jordan, also their capital. The Nabataeans ruled this area between ca. 168 BC and AD 106 (Retsö 2003: 378). The Nabataeans themselves wrote in Aramaic (see Cantineau 1930), though their own ethnic identity has been interpreted both as Aramaic and as Arab. As will be assumed in Chapter 7 when the question of early Aramaic–Arabic language contact is treated, I think they need to be regarded as having a composite ethnic identity. For present purposes, their Arab affiliation is betrayed by one significant Arabic inscription written in Aramaic script from Nemara in southern Syria (AD 328), a very few bilingual (codeswitched) Arabic–Aramaic texts containing individual Arabic words or phrases, and Arabic personal names written in the Nabataean script as part of Aramaic texts. Thus we know Arabs were intimately involved in the Nabataean culture. The Arabic language in it, however, is relatively sparsely attested.
Despite this, Knauf (2010: 229) claims that “Nabataean Arabic became some sort of standard Arabic as early as the second century BC, but it was a spoken, not a written language….” Gzella (2017: 308) views the language statuses similarly. Knauf further discovers (2010: 244) this “early Standard Arabic” in the earliest Arabic script inscriptions from the sixth century (Zebed, Harran, Jabal Usays, see Map 2), a corpus which amounts to less than fifty words. The linguistic testament, however, is analogous to much of Ancient North Arabian, discussed in 4.1.1. It is all but non-existent in its written attestations; how it could be ascertained to be a spoken standard defies linguistic methodology and logic. While we certainly can assume that Arabic of some sort was spoken in the Nabataean civilization, the largely non-existent knowledge of its actual linguistic structure and usage renders ascription of its socio-cultural status as a standard language otiose.⁶

⁶ Knauf follows in the tradition of Bellamy (1985) who equally “worked on the assumption that the language [of epigraphy] is Standard Arabic” (1985: 46); see Owens 2006 [2009]: 20–21 for criticisms analogous to those of Knauf.

If Knauf derives standard Arabic from an assumed socio-cultural status without regard to its linguistic attributes, Kaplony (2015, 2016, 2018) represents an altogether different perspective. His careful documentation of the early papyri leads him to discern a discrepancy between Classical Arabic and the language of the papyri. These will be discussed in more detail in 4.2, but to give one instance, in Umayyad documents (661–750) the 2FSG suffix is assumed to be –ti, as in katab-ti ‘you.F wrote.’ In Abbasid documents (750–969) a new variant -tii (katab-tii) occurs (2018: 14). Sibawaih (I: 323) has –ti as in Classical and many varieties of Arabic, but also notes the variant –tii.⁷ Refreshingly, Kaplony does not view –tii and other variant forms as deviations from an assumed, existent standard, but rather reserves a place for them in the appellation “documentary Arabic,” the language of official scribes as used in business letters and personal writing, as attested in the early papyri and other sources. While Knauf standardizes before there was a demonstrable standard, Kaplony goes to an opposite extreme, using genre as the criterion of linguistic variety. While Kaplony’s approach is clearly the descriptively more adequate, neither Knauf nor Kaplony are working within a historical linguistic framework. Knauf proceeds by definitional fiat. Kaplony legitimately isolates a written genre which can be followed over a number of centuries. He makes no attempt, however, to integrate this into a holistic interpretation of early Arabic, even an interpretation of the written sources only, to reconstruct antecedent forms, to resolve discrepancies into sequential linguistic developments, for instance between –ti and –tii, in a comparative framework. Kaplony has gathered and systematized important data whose import for historical linguistics will be evaluated in 4.2. It is a resource, not an explanation.

⁶ (cont.) Nehmé (2017) offers a good overview of the language of Nabataean texts between the third and fifth centuries (ca. AD 200–428) which in my view (Nehmé appears reticent on this point) clearly indicate an Aramaic dominance, albeit with significant Arabic features as well. All in all, as Retsö and Hajayneh observe (see Chapters 3 and 4), they suggest a high degree of bilinguality. It is highly unlikely that from the epigraphic record we will ever be able to answer the question of what the L1 of the population was, if indeed there was a single L1.
⁷ Though in the Kitaab, only before an object suffix, katab-tii-hi ‘you.F wrote it.M.’

1.1.5 The script is language fallacy

Though a truism, it is worthwhile recalling that the history of a script is different from the history of a language. This point bears reiterating in the case of Arabic because there is doubtlessly more scholarly attention paid to written Arabic than to spoken. This is mentioned not as a criticism but rather as a reminder, because scholars working via written texts may have their world view circumscribed by the medium they are specialists in. I think it no accident that neither Knauf nor Kaplony, to name only the two scholars treated in the previous section, makes any connection at all to the contemporary spoken language. Remarkably, Knauf claims to have found a standard spoken variety that is over 2,000 years old, where the idea of “spoken” could only have a metaphorical sense. That script is different from language is evident in the different scripts in which Arabic itself is written. From the era of Old Arabic itself these include those in Table 1.2. The first three are based on a south Arabian-type of script (there being a number of related south Arabian scripts). Nabataean Aramaic uses an Aramaic script, and it is usually assumed that the Arabic script as we know it today derived from the Nabataean. It should be noted that in many cases the inscriptions are so short—20 words or less in the case of 1, 4, and 6—that it may not be possible to determine whether the inscriptions are in fact early Arabic or a related North Arabian language⁸ (see e.g. Al-Jallad 2014).

Table 1.2 Pre-Islamic Arabic inscriptions

Script | Place
1. Sabaic | Qaryat al Faw, SW KSA (first century BC)
2. Dedanitic | Dedan (northern Hijaz, sixth century BC–first century AD)
3. Safaitic/Hismaic | Syrian-Jordanian desert/southern Jordan (first century BC–fourth century AD)
4. Nabataean | Avdat (ca. AD 100–200, see Kropp 2017), Hegra Hijaz (Raqaash, AD 267), Nemara inscription (AD 328)
5. Greek | Arabic proper names in Greek (4.3), Damascus Psalter, ca. seventh century to tenth century; see 5.2.2 and App. 5.2.2
6. Arabic | Zebed, Harran, Jabal Usays (all Syria-Jordan in sixth century AD)

Hoyland (2008) would see evidence in these for a fairly widespread literacy in the Arabic language.⁹ Be that as it may, the fact that the Arabic script itself as the medium for representing Arabic appears only toward the Islamic era is, as Retsö would have it, indicative of Arabic being but one language among many in the region. By the same token, it underscores the basic linguistic reality that a language exists whether or not it has a script to represent it. This is because languages are a spoken medium supported by speech communities. Scripts are obviously of considerable socio-historical importance, but as will be seen in detail in Chapter 4, scripts alone, particularly those which impinge directly on this study, are generally inadequate as media for providing the detailed picture of language structure needed for a basic reconstruction of language history.

⁸ This assumes that a meaningful distinction among the ANA varieties can be consistently drawn.
⁹ Though elsewhere he writes that Arabic “remained primarily a vernacular, employed by non-literate peoples …” (2004: 184).


1.2 Non-linearity: An empirical comparative alternative

Linearity has the intuitive advantage of iconicity: language history unfolds in lockstep with interval time. In some cases, as will be seen in 12.1, it may be an appropriate model for understanding language history. For a variety of reasons, some relating to the impoverished nature of very early sources (Chapter 4), some, paradoxically, to the richness of early sources (Chapter 3), some to results derived from application of the comparative method, a major premise of this book is that linearity does not work in an insightful way for Arabic. The essential argument is, rather, that Arabic linguistic history can be better elucidated not through a single, one-factor model (the succession Old Arabic to Neo-Arabic) but rather through five distinctive elements.

The first is basic to any historical linguistics and that is reconstruction via the comparative method (Chapters 5 and 6). Without it there is no historical linguistics. Beyond this I will introduce four further factors in various ways in the course of the book. While these factors in principle are relevant to the historical interpretation of any language, fate is such that they are manifested in particularly clear terms, in one case uniquely so, in Arabic.

The second element, after the results of the comparative method, is the role of speech community in interpreting language history (Chapter 6). This has two aspects, one banal, one not so. The idea of the speech community is relevant to any language history and in this sense, to mention the factor is trivial. What makes it of particular relevance to Arabic is that a considerable amount is known about Arabic, about its place among Semitic languages, and about expectations as to how individual linguistic features may change.
Given what we know about Arabic-Islamic history and the nature of individual Arabic speech communities, the idea of speech community can be invoked in certain interesting cases in order to help decide how best to interpret a linguistic change.

A third motif is the importance of language stability over long periods of time. Languages change. They also remain stable. In the case of Arabic, how stable they remain can be estimated via triangulation involving the comparison of varieties (dialects, older sources) widely separated geographically and by implication by chronological distance. Chapters 10 and 11 are devoted to this factor. Equally, given its large geographical extension and its original expansion out of the Middle East, one perturbating factor, the role of contact, borrowing, and shift in individual varieties, is clearly discernible (Chapters 7–9).

Fourthly, once the idea of language stability is given due recognition, contemporary spoken Arabic takes on greater evidential value. This is not a matter of identifying in piecemeal fashion relics, odd survivals of forms which might be assumed to have long died out in older forms of Arabic, or even in other Semitic languages. Examples of this type are discussed, are interesting and are relevant.¹⁰ However, it is far more the case that contemporary Arabic can give us insights into how languages are passed on generation to generation (in Labov’s terms, the transmission question). As it turns out, this aspect of Arabic links up with the reading of the Arabic linguistic tradition advanced here (Chapter 3) that interprets a number of developments traditionally thought of as contemporary “dialectal” as already embedded in early descriptive work of the ALT.

In their entirety the first four agenda items go beyond a narrow focus on reconstruction. A broad overview of Arabic is intended, which gives due attention to change, stability, and contact, and how these elements are embedded in a sociolinguistic reading of the material, to the extent that this is possible. Over-ambitiously perhaps, a more holistic reading of Arabic language history is intended than is traditionally the case.

This large-scale perspective is developed further beyond Arabic (or Semitic languages) to a broader taxonomy of languages’ history (Chapter 12). It is argued that certain languages (arguably most of the world’s languages are not able to offer such a perspective) are especially interesting for historical linguistics because their development can be followed via different media over a long period of time. For this comparative taxonomy three languages visible to us in documentation and reconstruction for approximately 1,500 years are compared: Icelandic, English, and Arabic. The focus remains on Arabic and it will be argued that Arabic in an interesting and perhaps unique way can be located between Icelandic, a language of remarkable stability, and English, one marked by dramatic changes.
Throughout the book general questions central to historical linguistics are examined critically: how grammaticalization theory may or may not inform historical linguistic interpretation is treated in a number of places; how, methodologically, the important factor of language stability can be integrated into historical linguistics is developed. The concept of independent parallel development, a default option for many working in Arabic historical linguistics, is given particularly detailed treatment, as is the question, already alluded to, of how socio-demographic parameters can be integrated into the history of Arabic.

¹⁰ Pat-El (2017) discusses seven possible instances of disparate Arabic dialects having features linking them with other Semitic languages but distinct from Classical Arabic. These features deserve closer scrutiny from different angles. The assumption of Semitic inheritance vs. contact-based change is not discussed for instance, while some (e.g. wawation) appear very marginal. A more far-reaching criticism is that the exercise assumes that there should be a linear movement from PS or PWS to CA, and that it is only in deviations from this path that the neo-dialects are interesting.

1.3 Data sources and methodology

Arabic historical linguistics suffers from an embarrassment of riches, because Arabic provides a wealth of linguistic data. Herein lies one of the challenges to the historian of Arabic. The data begins in pre-Islamic times with often hard-to-decipher epigraphic texts, which for the most part lack all direct indication of short vowels. This is complemented to a degree by Greek renditions of Arabic names in various papyri, which give some direct idea of the nature of vowels. The early Islamic papyri are also part of this source. I would tentatively include early Koranic manuscripts as well here, though since these are not considered in this grammar their linguistic status is not treated (see Larcher 2021: 19–38 for overview, van Putten 2022 for recent treatment). I term this tradition data source 1.

These early interpretive conundrums then explode into one of the most detailed grammars ever written, the Kitaab of Sibawaih (177/789), data source 2. Were Sibawaih to walk into a linguistics classroom today (or perhaps better, one 20–30 years ago), he would be pleased to see that the moderns had followed his lead. Whereas the earliest sources for instance offer little or no information about short vowels, pausal forms or a host of phonological observations which we take for granted in Classical Arabic, it is Sibawaih who not only fills in a remarkable amount of detail, but does so in a systematic, theoretically coherent fashion. Sibawaih, moreover, is the initiator of an Arabic linguistic tradition (ALT) which branches into a multitude of intellectual directions, some of them interesting largely for their theoretical coherence and insight into domains such as linguistic metastructure, discourse/pragmatics and semantics, others for their interpretive insights into Arabic language history. Without putting a strict definition on the term, it is effectively with Sibawaih that Classical Arabic begins. Not least importantly for the current work, Sibawaih also was an excellent dialectologist and sociolinguist. Prescriptive grammar was not in his purview.
Finally we arrive at data source 3, contemporary Arabic, the native language of approximately 300 million speakers spread from Uzbekistan to Nigeria (see Maps 1–4). While it may appear odd to mention this data source in a language whose epigraphically attested history goes back over 1,500 years (see Table 12.10 in Chapter 12), it will be seen in the course of this book that because Arabic exhibits a remarkable stability across interval time, reconstruction based on contemporary sources is remarkably efficacious.

It cannot be emphasized enough that the data base from each of these sources has burgeoned over the last fifty years. Taking Ferguson (1960) as a methodological point of comparison (see 1.1.3 above), in Ferguson’s time the main data source on Arabic was essentially Classical Arabic, data source 2. This was implicitly and sometimes explicitly assumed to represent a proto-situation, a language deriving directly from a proto-Semitic source, and from which the contemporary dialects derived (see Figure 1.2 and ch. 3 n. 12). Some contemporary observers such as McWhorter (2007) still adhere to this position (see 10.2.1). While both editorial efforts relating to the original ALT sources and critical studies describing them have made significant contributions over the intervening sixty years, this period of time has seen a far more significant improvement in the other two data sources. The epigraphic and papyrological material now enjoys large data bases, due inter alia to Michael Macdonald et al.’s Online Corpus of the Inscriptions of Ancient North Arabia (OCIANA) and the Arabic Papyrological Database overseen by Andreas Kaplony. A special resource is The Quranic Arabic Corpus conceived of by Kais Dukes, which enables quick textual reference to the Koran from a number of different angles: lexical, morphological, syntactic, and semantic. Data source 1 now has a very strong and accessible empirical basis.

The same applies to data source 3, contemporary Arabic. There are now online data bases in both transcriptional and audio format housed at the University of Heidelberg, the Semarchive (Semarch), as well as individual online corpora, for instance the extensive corpus of Lake Chad Arabic housed at Bayreuth University. Beyond this, particularly in regard to data source 3, contemporary Arabic, there has been a burgeoning of studies of individual dialects, the esteemed work on dialect atlases by Peter Behnstedt and Manfred Woidich, as well as a significant number of sociolinguistic studies of spoken Arabic. To these data sources I would add a fourth factor external to Arabic and that is the growth over the past 30–40 years of linguistics itself, particularly sociolinguistics and corpus linguistics.

One upshot of this growth in well-organized, professional material is that the information which we now have on Arabic has far outstripped its received historical interpretation. Ferguson (1960) was interpreting a vastly smaller data set than we have today. A second is that it may, strictly speaking, be impossible for any single individual to have a complete mastery of the field, both in terms of personal experience and in terms of theoretical orientation toward the material analyzed.
Herein lies the danger of mistaking interpretation in terms of one’s own field of specialization as a stand-in for the history of Arabic as a whole, or to search for metonymic short-cuts (1.1.3 above) which in the end account for only a part of the history of the language.

Each data source comes with its own intellectual history. Roughly, data source 1 is associated with a philological tradition (Edzard 2013 [2019]). The interpretation of text stands as a centerpiece in this tradition, and its culmination in the data bases cited above attests to its success in pushing back our corpus of Arabic to anywhere between 300 and 900 years before Islam (see 12.5). Data source 3 is associated with an oral dialectological tradition. In terms of chronological time it is situated in the here and now. As seen above, traditionally in Semitic and Arabic studies this very large data source had no central role to play, as it was considered to represent a “younger” stage of Arabic than Old Arabic. However, a case has been made (Owens 2006 [2009]), and will be made here, that applying standard comparative historical methodology reveals contemporary dialects to reconstruct to the earliest era of Islam, and before.

Data and methodology go hand in hand. The first data source, epigraphy and papyri, is confronted with two challenges. The first is simply how to decipher the primary data. Here I defer completely to the many excellent collections and studies of individual inscriptions such as now available on OCIANA. Interpretation is a problem because of an exclusively consonantal script. Here I will comment critically in Chapter 4 on the two main procedures with which missing linguistic detail is filled in: reference to Classical Arabic and reference to proto-Semitic. Both are problematic in different ways.

The third data source is contemporary sources. Again an initial challenge, now adequately met in a long tradition of field research, is the collection and analysis (alias grammar writing) of primary data. As an object of diachronic interest, contemporary sources are obviously useful only if traditional comparative linguistic methods are applied. This, in fact, poses neither a material nor a conceptual problem. The main question is whether the results so obtained in fact lead us into a past we are looking for. To further elucidate this question I expand the comparative method in two ways. One is to invoke comparative oral corpora and their attendant quantitative means for ascertaining questions of contact-induced change and long-term stability in paradigms. A second is to operationalize the idea of speech community, not so much to explain individual reconstructions as to provide a framework in which the oftentimes confusing multilinearity of Arabic can be embedded in a classic comparative linguistic interpretation. All in all it is demonstrated that contemporary sources can take us a remarkably long way into the past.

This leaves data source 2, Sibawaih, the Arabic linguistic tradition and Classical Arabic. This tradition is interpreted here as a balance between the older sources, which it borders on chronologically, and a reconstructed Arabic. Relations to each of these are interesting for different reasons. By consensus, Classical Arabic is not the same as documentary Arabic, first attested before Sibawaih. By the same token, Sibawaih and the documentation in the ALT provide a kind of litmus test for the plausibility of reconstructions arrived at via data source 3.
There are thus multiple pathways (Owens 2018c) leading to an interpretation of proto-Arabic, not all of which self-evidently converge on a unified object.

For the most part the study treats the bread and butter of historical linguistics, phonology, morphophonology, and morphology. A section on the lexicon is intended more to indicate what contribution this domain might make than it is a thorough-going treatment. The study does, however, branch into less charted territory in a number of respects. It interprets the semantics of idioms, hardly treated within a diachronic framework in any domain of historical linguistics; it correlates the structural stability of paradigms with properties of discourse they are embedded in and, as noted above, at a number of points uses comparative corpora to arrive at historical linguistic conclusions.

A final remark about the general thrust and style of argumentation in this book. The degree of technical explanation will vary considerably from topic to topic. On the one hand, addressing the general historical linguist rather than the Arabic specialist, basic sketches of key background topics are given. These include prominent aspects of proto-Semitic (2.1), a complete epigraphic North Arabian text (4.1.1 App. 4.1), a text-by-text discussion of early papyri (4.2), and summaries of three Arabic dialects (Chapter 10). This presentation of nearly raw data is intended to allow the non-specialist to get a feel of the data on which generalizations are based. By the same token the citation of numerous morphological paradigms from Icelandic and English in Chapter 12 is intended to allow the Arabic specialist to reflect on the degree and manner in which Arabic paradigms from a comparable era have changed, or failed to. On the other hand, rather complex comparative linguistic derivations (Chapters 5 and 6) and linguistic methodologies (corpus-based analysis in Chapters 9 and 11) are developed. For many arguments a detailed knowledge of Arabic will be relevant.

1.4 Notes and conventions

A number of conventions and terminological notes are summarized in this section.

“Old Arabic” has been used in at least three senses. Often it is an undistinguished cover term for an early stage of Arabic (Blanc 1964). This is not a reconstruction, but rather, it appears, a shorthand term for a hard-to-define intuitive concept. Macdonald and a good part of the Semiticist tradition term all the Ancient North Arabian varieties (see 4.1) “old Arabic.” This would include Safaitic, Hismaic, and perhaps the other ANA languages listed in Table 4.1 in 4.1. It does not appear that Macdonald would include Classical Arabic within Old Arabic (2000: 29). A third usage is my own, “Old Arabic” being the sum total of all written sources up to and including the ALT. This encompasses epigraphy, papyri, the Koran, and Sibawaihi’s Arabic (see Owens 2006 [2009]: chapter 2, 2019a: 30). It does not include reconstructed forms. My term Old Arabic is about material sources and says nothing about the interpreted form of Arabic recorded in these sources. “Old” in this sense means chronologically old and may range over genres which are linguistically quite different. Classical Arabic as described by Sibawaih and the early papyri are equally “Old Arabic” in this sense (see 4.2.3).

All usages are problematic in one way or another. The problem to a degree inheres in the material itself. Is it possible to draw linguistically justifiable boundaries between the various languages and varieties which are undoubtedly related to Arabic as we know it from the Islamic era, yet attested often in such fragmentary form that determining a precise link is impossible? This is an ongoing issue.
As Retsö (2013 [2019]) observes, the inherently fuzzy concept of dialect continuum is further aggravated by the fact that hardly enough data exists for most ANA varieties to develop an adequate linguistic classification beyond the observation that they are somehow closely related to Arabic as known from the ALT and thereafter. From my perspective, distinguishing between the material sources, which are Old Arabic, and Proto-Arabic allows one to concentrate on the key historical linguistic concept, namely proto-language. This by definition is a reconstructed concept. Edzard (1998) cautions that in a strict historical linguistic sense the concept of “proto” in an Arabic context is problematic. However, it has utilitarian value. Proto-language as used here is more an idealized goal than it is a complete set of reconstructed forms. In some cases proto-forms are suggested, but in many instances it is a general idea, a possible direction toward a reconstruction that is given, rather than a concrete starred form. The many loose ends reflect the state of research in Arabic language history and perhaps the nature of the object that one is trying to reconstruct.

In this vein, arguments unfold across the length of the book. A single phenomenon, such as the different stages of agreement patterns, may be thematically relevant at different places in the book and therefore may be introduced multiple times according to its relevance to the topic at hand. The conception of the book is not a linear treatment of the historical stages of Arabic, which as will be seen may be a chimera in any case, but rather a highlighting of the key linguistic elements which are crucial to a global understanding of Arabic language history.

I generally follow IPA renditions of Arabic, for instance /x/ rather than “kh” or “hh” for the letter “xaaf.” When a neutral name for a sound with a number of variants is needed, I will use the Arabic letter name. For instance the “qaaf” variable is a cover term for the variants [q ~ g ~ ʔ ~ ḳ ~ dž ~ dz]. For the stem patterns (wazn or binaaʔ in the Arabic terminology) I use the traditional Arabic designations such as faʕʕala (stem II derived verb) rather than the Semiticist terminology (d-stem). Orthographic forms are placed between curly brackets, “{abw}” (abuu). “@” represents an underlying form which may surface differently from its underlying value.
In some cases abbreviations are used with more than one meaning, “SS” for instance for “same subject,” “Standard Swedish,” and “South Semitic.” The context always disambiguates. I abbreviate the Arabic of the Lake Chad region to LCA. In previous publications I have termed this Nigerian Arabic, though in fact the large sample on which this work is based includes many speakers from western Chad and some from Cameroon. As far as territorial Nigeria goes, two main dialect areas are termed western and eastern (see Map 6). In other publications these have been referred to as Ngummaati for the western and Bagirmi for the eastern, using local geographical terms which roughly correspond to the dialectal regions. In the course of the book certain general linguistic principles are identified, relevant not only for understanding Arabic linguistic history but for historical linguistics in general. Some of these emerge from the discussion of Arabic, while others were developed elsewhere but are considered particularly cogent for understanding Arabic language history, for instance Lass’ principle (P2). Finally, in an online appendix material can be found which provides relevant facts (such as paradigms) and critical commentary on the main text. In some cases these constitute sub-sections in and of themselves. “App” + section number links the text to the appropriate place in the Appendix.


1.5 Overview of chapters

The remainder of the book can be divided into five thematic parts as summarized here.

Part I: Old Arabic

The first part encompasses Chapters 2–4, which look critically at the place of Old Arabic, in my sense of Old Arabic sources, in interpreting Arabic language history. It examines Arabic as a Semitic language, Arabic as a philological language, and Arabic as a classical language.

Chapter 2 introduces the reader to Arabic as a Semitic language. A core of basic phonological and morphological structures characterizes all the Semitic languages. In this world, Arabic is one out of many Semitic languages. The genetic classification of Arabic has therefore been the object of discussion going back to the nineteenth century or earlier, a key question centering on the interpretation of Arabic as a South Semitic language, closest to Ge’ez (inter alia), or as a Central Semitic language, closest to Northwest Semitic languages such as Aramaic and Hebrew. In this book the question of genetic affiliation is essentially sidestepped. It is suggested that Arabic is a composite language with clear South Semitic and clear NW Semitic affinities.

The early sources are necessarily written sources, and the broad analytic framework developed to interpret them is part of the large enterprise of philology. This perspective has produced an impressive and invaluable amount of carefully edited and interpreted original texts, epigraphic, papyri, and eventually paper, covering a diverse set of genres. Chapter 4 takes a critical look at the epigraphic and papyrological sources from a linguistic stance. In order to illustrate concretely the evidential nature of these texts, samples of epigraphic and papyri texts are presented and commented upon. Two perspectives are invoked to integrate them into Arabic language history. First, the papyri are seen as an inheritance of Wansbrough’s (1996) “juridical and cultural koine,” which is attested in the Middle East since Akkadian times.
Secondly, in their particular Arabic manifestation they fall into Kaplony’s (2018) “documentary Arabic,” which both preceded and postdated Classical Arabic. This chapter cautions that results derived solely by comparing written texts without integrating them into the longer and larger development of Arabic cannot be equated with the history of Arabic.

However, the central icon of Arabic is Classical Arabic. What, though, is to be understood by Classical Arabic? Different scholars offer different answers to this question. In Chapter 3 a distinction is drawn between a normed Classical Arabic such as was defined by Ibn al-Sarraj’s (d. 316/928) Al-ʔUṣuwl fiy l-Naħw (Owens 1988) and the earlier, monumental, heterogeneous grammar of Sibawaih (179/789). Sibawaihi’s Kitaab marks a watershed not only in Arabic linguistics, but in the history of linguistics in general, being a work of remarkable descriptive and theoretical coherency, covering all domains of language, from low-level phonetics to discourse. It is only with Sibawaih that we have a reliable, for the most part unambiguous idea of what Arabic was. There are various reasons for this, but one of them is simply that Sibawaih was a master phonetician and phonologist. The low-level phonological details (i.e. vowels, gemination, semivowels, even pausal conditioning), which are generally assumed for epigraphy and the papyri but which in fact physically are not in these sources, can be retrospectively filled in on the basis of Sibawaihi’s later work. One detailed illustration of Sibawaihi’s attention to theoretical and descriptive methodology is given, concerning palatalization of /*k/, in order to underscore the point of how dependent we are on his descriptive acumen. The example equally is adduced to show that Sibawaih provides a direct link to contemporary dialects. Thus the Sibawaih of the current book is not the exclusive preserve of the Classical and philological literary traditions, even if he is an integral part of these. Rather, he forms a natural bridge to the material introduced from Chapter 5 onward.

The distinction between a normed Classical Arabic and Sibawaihi’s Kitaab is not, however, a complete answer to the question of what Classical Arabic is. Following another classical linguist, Ibn al-Nadim (late fourth/tenth century), a distinction is drawn between an Arabic of diverse tribes, which antedates and postdates Classical Arabic, and the normed Arabic which by the end of the fourth/tenth century had become associated with the Koran,¹¹ and as interpreted here, with the emerging Arabic-Islamic culture in general.

¹¹ It might be useful to distinguish three linguistic “stages” of the Koran. The first is that of the original consonantal text which dates as a complete Koran from the end of the first Islamic century (ca. 94/712–713) at the latest (Déroche 2003: 256, following Grohmann). This is linguistically very interesting but associated with so many interpretive issues, many purely philological and linguistic, others cultural and religious, that this is one source which is not treated in this book. A second might be termed the stage of normalization, ca. 712–935 (see ch. 3 n. 15 below) as Arabic linguists and Koranic readers systematized the originally unvocalized text. A third stage, the classical tradition proper as it were to which Ibn Faris and Ibn al-Nadim belong, assumes the normative status of one or the other of the seven reading traditions and treats the work formally as a ne varietur text whose interpretive interest lies with questions of grammar, discourse, semantics, theology and others, but not in the linguistic form. I assume that when Ibn al-Nadim speaks of the normative force of the Koran he is referring more narrowly to a “stage 3” Koran. The issue, however, is obviously a very large one.

Part II: Reconstruction

Part II highlights the crucial role that reconstruction from contemporary sources can play in Arabic language history. Here Arabic is the object of inquiry via the comparative method.

Chapters 5 and 6 switch gears completely. Whereas Chapters 2–4 concentrate on the early written sources (Old Arabic in the sense used in this book), Chapters 5 and 6 introduce oral Arabic, i.e. contemporary Arabic as an evidential source for interpreting language history. The seeming anomaly of using contemporary Arabic for reconstructions projected back over 1,000 or 1,500 years is justified inter alia by the results of Chapter 11.

Chapter 5 introduces 17+ case studies discussed in greater or lesser detail which serve as the basis for reconstruction based on contemporary sources. With two exceptions, the individual studies illustrate a fundamental point, namely that Arabic language history is characterized by a multitude of discontinuous isoglosses which can join contemporary dialects as disparate as those of Uzbekistan and the Lake Chad area (Map 4). The common isoglosses are as often as not non-contiguous, reappearing at a remove from a given dialect, for instance, in Uzbekistan, Baħrain, Yemen, and LCA, but not in the intervening Iraqi or Egyptian.

This evidential basis is closely tied to two general theoretical issues, one the status of independent parallel development as an explanation for language change, the other the role of speech communities in understanding historical linguistics. Whereas it has been common practice in Semitic and Arabic linguistics to explain non-contiguous commonalities via parallel independent development, as a general principle a preferable solution (here termed “Lass’ principle”) is to think in terms of diffusion after a single innovation or diffusion of an inherited (i.e. PS or PWS) feature. This issue leads to the role of speech communities in understanding language change and stability. Whereas looked at in isolation the reappearance of non-contiguous features may suggest multiple implementation of a phonologically, morphophonologically, and/or morphologically defined innovation, associating each change with groups of mobile communities dissipates the need for such an explanation in most cases.
In Labov’s words (2007: 346), “When entire communities move, they carry with them agents of transmission and incrementation.” Chapter 6 illustrates the tenacity with which inherited differences may survive migration with a sample variationist case study from Maiduguri in the LCA region. An analogy suggests itself here. The written word is inherently contextualized in a stone, a piece of paper, or a papyrus, which can be interpreted socio-historically more or less, better or worse, depending on how well known the circumstances are surrounding that stone or piece of paper (Sitz im Leben). For its part, the oral word is contextualized in a speech community, whose historical provenance can be more or less understood depending on how much is known about that community and its antecedents.

Part III: Contact

Chapters 7, 8, and 9 treat Arabic and language contact from two historical eras. These three chapters examine contact-related phenomena from two very different perspectives, both chronologically and methodologically. Chapter 7
summarizes the extensive pre- and early Islamic influence of Aramaic on Arabic. This influence is understood in Dixonian terms as representing an era of equilibrium whereby there were diverse, but extensive instances of Aramaic–Arabic contact throughout the Middle East homeland. Chapters 8 and 9, on the other hand, describe the much later contact between Arabic and the languages of the Lake Chad area which began about 1400. Chapters 8 and 9 open up new methodological ground in linguistics in general, in that the entire basis of comparison consists of two large oral corpora, one from the LCA, the other from Egypt, from which ancestral LCA derives. Such comparative corpora demonstrate with particular clarity not only how, but also the degree to which contact can influence a language. In addition, they allow treating issues such as the history of idioms which otherwise typically fly under the historical linguistic radar.

Part IV: Stability

It is a major argument that by and large Arabic over the 1,500+ years treated here has remained very stable in a number of key respects. Understanding stability, however, is a challenge to historical linguistics. Chapters 10 and 11 shift gears again, this time from language change to language stability. Setting a time frame of 1,000+ years engenders the age-old question of how stable languages are. Surprisingly, historical linguistics has done little to define this issue, beyond noting that stability exists. In Chapter 10 I simply show using short sketches of three Arabic dialects that there is a case to answer as far as stability goes. In Chapter 11 I take a complex morpho-discourse problem, comparing three separate, non-contiguous Arabic dialects to argue that the reconstructed result gives an insight into why languages do not change. The object of explanation is the highly stable Arabic verb paradigm, which in its essentials has not changed at all in over 1,500 years. The stability derives from an identical set of constraints across the three dialects related to the referential status of the subject in discourse.

Part V: Taxonomy

Arabic, it is argued, paradoxically has undergone significant changes over the 1,500+ years examined, and at the same time across all its varieties is marked by considerable stability. Its historical linguistic profile is distinctive enough that it is best conceptualized in a larger taxonomy of the history of languages. Three case studies, including Arabic, establish a basic framework. Chapters 1–11 develop a holistic overview of Arabic language history, but one, as it turns out, which is quite complex. In order to explicitly orientate this picture of
Arabic within historical linguistics, Chapter 12 compares Arabic with the history of two further languages, Icelandic and English, concentrating on the morphology of these languages. As is well known, Icelandic, from the Old Icelandic period (ca. 800–1200) up to the present has changed little. English, on the other hand, underwent a drastic change from the Old English period, when it was not in fact very different from Old Icelandic morphologically, to the end of the Middle English period (ca. 1500). This is illustrated with comparative paradigms. Icelandic is said to be “alinear”: it shows negligible change (or “development”) over a nearly identical chronological period as English. English is said to be “linear.” English gradually “moves” from one state to another over the defined period of time. Using these two extremes, alinearity and linearity, as idealized orientation, Arabic turns out to be both alinear and linear. This ostensible contradiction is mitigated by the speech community construct. Once Arabic is understood as consisting of semi-autonomous multiple speech communities, its history can be understood in part as a composite of discontinuous changes. In this sense it is multilinear. By the same token, the fact that core parts of its grammar (verbal morphology and discourse practice are key domains here) are effectively stable across widely divergent speech communities marks Arabic as alinear. Given this complexity, I propose that Arabic language history is usefully thought of in terms of a metaphor from art. Metaphors do not, in a scientific sense, explain. They can, however, allow complex, internally contradictory objects to be comprehended in a single image. Think of Arabic as a work of art, a painting in the manner of Piet Mondrian, encapsulating movement and stability on one canvas.

Putting it all together, Chapters 13 and 14

Chapters 13 and 14 tease out historical linguistic “lessons” which can be derived from the study of Arabic. The last two chapters tie together, as well as can be done, themes from throughout the book. There are two foci. In Chapter 13 the comparative features which have been adduced throughout the work are classified according to their status within Semitic and West Semitic, and within a broad chronological scale hinged on Islam (nominally 622) and whether or not confirmed in Sibawaih (178/793). A relatively detailed historical picture of Arabic can be described which is, in chronological time, as old as the earliest attested Germanic languages. A second focus lies in the general linguistic issues which inform the study. What is the role of diffusion in Arabic language history, of contact, incrementation and internal change? How does grammaticalization theory help and hinder understanding its history? Why does the history of Arabic caution against the wholesale invocation of parallel independent development? What added insights does the use of large, oral corpora bring to understanding language history, particularly
when coupled with domains of linguistics rarely treated from a historical perspective, for instance metaphor theory and the analysis of discourse? How does the concept of speech community inform a taxonomy of linguistic change? Without gainsaying the obvious fact that Arabic is a constitutive element of Semitic language history, two conclusions emerge. First, there is a huge amount of historical linguistics which, at this juncture at least, can be understood only as a history of Arabic. Many aspects generalize over Arabic, but do not generalize to other Semitic languages. Secondly, once this is accepted, the contribution which Arabic can make to a broader comparative taxonomy of historical linguistics is enormous. In Owens (2013d: 1) I stated that Arabic is, linguistically, the most interesting language of the world. Its potential for elucidating historical linguistics is largely untapped. This book is about the linguistic history of Arabic in the context of methodologies, theories and different sorts of data sets which bear on understanding both its internal and Semitic-specific history, and its history in a broader framework of languages outside of the Semitic world. In the perspective of this book, linguistic argumentation requires critical examination of opposing issues, identifying these by chapter and verse, correctly paraphrasing these, weighing the different ideas, and ultimately deciding which are the better ones not by fiat but by reasoned argumentation. It is an analytic perspective. Such argumentation has been termed “polemic” (Suchard 2020: 643). This, however, trivializes historical linguistic argumentation. It is mere polemic. For historical linguistics, however, the issues and methodology are not only fundamental, but constitute what is interesting and vital in the discipline.

PART I

OLD ARABIC

Documented Arabic begins with the multiple data sources provided by Old Arabic. Part I illustrates the many inflections of this material with the Semitic background of Arabic, with the documentation and genetic interpretation of pre- and early Islamic Arabic, and the pivotal role assumed by the Arabic linguistic tradition.

2 Arabic and Semitic

Arabic and the Case against Linearity in Historical Linguistics. Jonathan Owens, Oxford University Press. © Jonathan Owens (2023). DOI: 10.1093/oso/9780192867513.003.0002

Arabic is a Semitic language, the Semitic languages one of five or six branches (depending on how they are counted) of Afro-Asiatic (Porkhomovsky 2020). The Semitic languages are characterized by a remarkable structural coherence implying a common source. Prominent attributes among these are summarized in the following. A great deal has been written about individual Semitic languages, and about the family as a whole, and readers are referred to the now numerous general publications such as Moscati et al. (1980), Hetzron (ed.) (1997), Stempel (1999), and Weninger (2011). Basic structural accounts of individual Semitic languages are found for instance in Bergsträsser (1928 [1977]) and Goldenberg (2013). The current summary concerns how, in my view, comparative Semitic is relevant to understanding Arabic language history. Only the main representatives of a given sub-family will be cited. I begin with a basic, brief introduction which highlights the common features of the languages, then move in 2.2–2.4 to the problematic issue of Arabic and Semitic language sub-classification. This advocates for a perspective less concerned with defining unique arboreal classificatory parameters than with showing how Arabic integrates at a number of levels with other Semitic languages. The relation between Arabic and Semitic divides into three categories:

1. Arabic continues common proto-Semitic features
2. Arabic continues Semitic features of a sub-grouping
3. Arabic innovates relative to other Semitic groups

This book is concerned with the third situation, to a limited degree with the second as well, though in each case there is little emphasis on drawing lines between Arabic and other languages. While in the classical Arabic tradition there were cultural reasons for identifying linguistic characteristics which set Arabic apart from other languages, to the advantage of Arabic (see e.g. Ibn Faris al-Ṣaaħibi: 16–25, 123), in a modern context the exercise, as Retsö (2013 [2019]) observes, risks erecting sometimes artificial boundaries. In 2.1 I will very briefly summarize point 1, major commonalities between Arabic and the other Semitic languages, glossing over myriad details, and at this point counting as “Arabic” any feature which is represented by at least some of its varieties. In the remainder of the chapter I discuss and illustrate issues relating to a more fine-grained classification. As general background to this discussion, the Semitic languages are conventionally divided between the eastern Semitic languages (Akkadian, Eblaitic) and
western Semitic which encompasses all of the rest. There consensus largely dissipates, so in this background chapter I will, with an introductory expository purpose in mind, present two broad genetic classificatory interpretations of Arabic. The first goes back at least to the nineteenth century whereas the second was advocated for the first time in the 1970s and continues to be interpreted in various ways till today. In both cases the position of Arabic turns out to be problematic in one way or another.

2.1 Common Semitic

Segmental phonemes

Table 2.1 (from Moscati et al. 1980: 24; Kogan 2011: 55) needs little commentary. Since “Arabic,” even “Classical Arabic,” is an abstraction whose instantiation in a given paradigm will be open to interpretation, I leave exemplification of basic Arabic phonology to Chapter 10 where three concrete dialects are introduced.

Table 2.1 Proto-Semitic (PS) segmental consonantal phonemes
p b ṭ̣̠ θ đ m w
t d ɮ̣ᵃ ṭ ɬ
k g ṣ s z
š
x ɣ
q or ḳ
ʔ
ħ ʕ
h
n y r, l
ᵃ A voiced emphatic lateral fricative, attested phonetically at least in NW Yemen and in the Najran area of Saudi Arabia.

Arabic on this reckoning is close to PS. Three major (non-exhaustive) differences which will be discussed critically at points in this book (4.1.5, 4.1.7) should be cited here.

1.1 PS = Arabic: ∗p = f, ∗ɬ = š (s2), ∗š = s (s1)¹

In some cases the CA or dialectal realization of these sounds may be variable, i.e. different dialects or even CA itself may have multiple variants (see 3.2), so the current chart simply provides an orientation which indicates that Arabic adheres closely to PS segmental phonology. This phonology is of course supported by lexical cognates across all the Semitic languages. Table 2.2 gives just a small sample. For the sake of illustration I follow Kogan’s interpretation of the PS and successor language phonetic values.

¹ It is assumed on the basis of comparative lexical sets, orthographic practice, and reflexes in contemporary languages that there are three non-emphatic alveolar fricatives in proto-Semitic, identified as s1, s2 and s3. Frequently these are reconstructed as š = s1, ɬ = s2 and s = s3 (Moscati et al. 1980: 33). In Arabic, ∗s1 (∗š) and ∗s3 (∗s) merge in /s/.

Table 2.2 Lexical cognates (examples and interpretations from Kogan 2011: 55 ff.)
PS                Arabic   Hebrew     Ge’ez          Syriac    Gloss
Ð                 ʔuđn     ʔozεεn     ʔəzn           ʔedn-aa   ‘ear’
x                 xasar    ħsr        xasra          ħsr       ‘lose, be deficient’
š (= s1)          ism      šeem       səm            šm-aa     ‘name’
ɬ (= s2 = ŝ)ᵃ     karš     kɔɔreeɬ    karš (karɬ)    kars-aa   ‘stomach’
s (= s3)          xasar    ħsr        xasra          ħsr       ‘lose, be deficient’
ᵃ ŝ is one conventional representation of s2. It is not necessarily specified phonetically. From the standpoint of Arabic this value is problematic, as will be discussed in greater detail in 4.1.5.2. When presenting the proto Semitic or proto West Semitic values in summary form, I follow the conventional literature, without necessarily endorsing each value in detail.

None of the languages even in this very small sample correspond completely to Kogan’s reconstructed PS forms, but all are derivable from them by straightforward, regular sound correspondences. For instance, NW Semitic (Hebrew, Syriac) shows the shift ∗x → ħ (Goldenberg 2013: 68), and Arabic and Ge’ez display the shift ∗š → s (see 2.4 and 4.1.5).
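The idea of a regular sound correspondence can be made concrete with a minimal sketch. It lists only the correspondences named in the text up to this point (∗p = f, ∗ɬ = š and ∗š = s for Arabic; ∗š → s for Ge’ez; ∗x → ħ for Hebrew and Syriac). The table name `REFLEXES` and the function `reflexes` are hypothetical labels introduced for illustration, not an established tool, and the tables are deliberately incomplete.

```python
# Illustrative reflex tables: only correspondences named in the text are listed.
REFLEXES = {
    "Arabic": {"p": "f", "ɬ": "š", "š": "s"},
    "Ge'ez": {"š": "s"},
    "Hebrew": {"x": "ħ"},
    "Syriac": {"x": "ħ"},
}


def reflexes(ps_segments, language):
    """Map each PS segment to its reflex; unlisted segments stay unchanged.

    Each segment is looked up exactly once, so the output of one
    correspondence (e.g. *ɬ = š) is never fed into another (*š = s).
    """
    table = REFLEXES[language]
    return [table.get(segment, segment) for segment in ps_segments]


if __name__ == "__main__":
    for language in REFLEXES:
        print(language, reflexes(["p", "x", "š", "ɬ"], language))
```

The single lookup per segment is the design point: a regular correspondence applies to the reconstructed value, not to the result of another change, which is why Arabic can show both ∗ɬ = š and ∗š = s without the two collapsing.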

Verb

What is arguably the most striking instance of pan-Semitic coherence is the inflected verb. Here again the commonalities speak for themselves. The imperfect verb listed here (from Moscati et al. 1980: 142) consists of a voweled stem, a CV- prefix, and in some persons (always the same in all languages) a suffix. These morphemes always mark the same category, e.g. MPL vs. FPL perfect is always distinguished by (at least) a suffix. The morphemes marking these categories are essentially the same: t- in the second person, t- in the 3F, y- in the 3M, and a glottal stop + V in the first person. For later expository purposes I cite the two Ge’ez imperfect forms, the first the subjunctive, the second the indicative. In Table 2.3 I cite only WS languages. Special issues not directly germane to the current book accrue when East Semitic (Akkadian) is added, some of which are touched on in 2.2 below. Here Classical Arabic represents “Arabic.”

Table 2.3 Imperfect verb, yaqburu ‘bury’
      Arabic          Hebrew       Ge’ez (subjunctive)   Ge’ez (indicative)
1     ʔa-qbur-u       ʔe-qbor      ʔe-qbər               ʔə-qabbər
2M    ta-qbur-u       ti-qbor      tə-qbər               tə-qabbər
2F    ta-qbur-iina    ti-qbər-i    tə-qbər-i             tə-qabr-i
3M    ya-qbur-u       yi-qbor      yə-qbər               yə-qabbər
3F    ta-qbur-u       ti-qbor      tə-qbər               tə-qabbər

(In App. 2.1 are found examples of the perfect verb, derived verbs, and question words.) Independent pronouns equally speak for themselves (Table 2.4).

Table 2.4 Independent pronouns (singular)
         Arabic   Hebrew            Ge’ez       Akkadian
I        ʔana     ʔanii/ʔaanooki    ʔana        anaaku
You.M    ʔanta    ʔatta             ʔanta       attaa
You.F    ʔanti    ʔatt              ʔantii      attii
He       huwa     huu               wəʔətuu     šuu
She      hiya     hii               yəʔətiiᵃ    šii
ᵃ Moscati et al. (1980: 105) suggest < ∗hu → ʔuw → wu → wə + suffixation of –tuu/-tii.

Without wanting to gloss over many issues regarding the detailed differences among the languages, it is clear that there are a significant number of fundamental phonological, morphological, and lexical identities which set the Semitic languages apart as a group, Arabic among them. It is relevant to point out an issue which will recur throughout the book, and that is that the Arabic forms listed here are CA, and among the dialects may occur forms which may be closer to another Semitic language than they are to CA. For instance, for the 1SG Baghdadi Arabic has aani ‘I,’ recalling Hebrew; all of the western Arabic dialects lack a final –n in the (2FSG, MPL) imperfect suffix, e.g. ta-qbur-i in the 2FSG, hence identical to the other WS languages. Such similarities engender a basic comparative question of common origin or independent parallel development. Barring the special case of Aramaic (Chapter 7), I do not discuss these cross-Semitic language correspondences for the most part, though it will be seen that in many cases analogous similarities among varieties of Arabic exist, whose interpretation bears in important ways on Arabic language history.

2.2 Contrastive, but general: The ancestors of Arabic in trees

Turning to what differentiates the Semitic languages, I center the discussion on the position of Arabic. As noted, Arabic is a West Semitic language whose further affiliation has been ascribed either to the South Semitic or the Central Semitic languages. As orientation I begin with the assumption that Arabic is South Semitic, which is the older of the two positions. In outlining the isoglosses which argue for this affiliation, the perspective that Arabic is Central Semitic will follow naturally.

2.2.1 The classic arguments

There may be as many tree representations of the genetic relationship among Semitic languages as there are Semiticists. It is fair to say that an overall consensus is missing at the moment, so in order to introduce the position of Arabic within Semitic in a relatively “uncluttered” landscape, I will begin by summarizing it within one straightforward traditional position. Going back at least to Nöldeke (1886: 644) Arabic has been aligned with Ethio-Semitic as a South Semitic language. This classification has been reiterated and updated by a number of scholars (Brockelmann 1908/1913: 30; Diem 1980b; Ratcliffe 1998: 152; Stempel 1999: 21). The following is based on Faber (1997: 5). Since the focus of the discussion is broadly expository, to show how Arabic fits into the overall Semitic tree, I leave out a number of languages (e.g. Phoenician, Amorite, and Sayhadic [epigraphic South Arabian]) not immediately germane to the exposition, and when I speak of SS itself, I usually ignore MSA. Faber’s tree in Figure 2.1 sees Arabic as a part of the south Semitic languages, a sister of Ethio-Semitic.

West Semitic
  South Semitic
    Southeast Semitic (Ethiopic, Modern South Arabian)
    Arabic
  NW Semitic
    Ugaritic, Hebrew, Aramaic
Figure 2.1 Arabic as South Semitic

Trees are drawn on the basis of contrastive, shared features. To say there is a SS and a NW Semitic branch is to say that there is a significant, consistent linguistic feature or features which differentiate them. These differences are interpreted as having arisen at some point in the past and define a split in the original PS values. On this basis there are traditionally three strong arguments for a South Semitic sub-family. As seen in Figure 2.1 and in (2.1), the sound usually reconstructed as ∗p appears as such in Northwest Semitic, but as /f/ in South Semitic.

(2.1) ∗p/f = f
(2.2) ‘mouth’ Heb pee, G. fam, Ar. fam (fuu)

Only the South Semitic languages and Arabic have the stem III verb form marked by a long /aa/.

(2.3) G. šaaqaya ‘he tormented,’ Ar. qaatal ‘combat’

Analogously, both languages derive form VI with prefixation of ta-.

(2.4) G. ta-maasal-uu ‘they resembled each other,’ Ar. ta-qaatal ‘fight one another’

Finally, broken plurals are attested in Arabic and in the South Semitic languages. G. hagar/ʔahguur ‘town/towns,’ Ar. jamal/ʔajmul ‘camel/camels’ (al-Kitaab II: 193). In tree terms (Figure 2.2) this identifies a class of South Semitic languages, which includes Arabic.

PS: Ø
  ES (Akkadian): Ø
  WS: Ø
    Northwest Semitic: Ø
    South Semitic: faaʕal, broken PL
Figure 2.2 SS innovation of faaʕal, broken plurals in South Semitic (including Arabic)

There are two problems with this traditional view. The first pertains to the interpretation of faaʕal and broken plurals. It has been argued that both Akkadian and NW Semitic languages have both broken plurals and the faaʕal verb, or at least vestiges of these, e.g. Hebrew melek/mǝlaak-iim ‘king/kings’ (Huehnergard and Rubin 2011: 272). In tree terms there are two ways to look at this. It can be assumed that PS had no broken plurals or faaʕal stems, and that these developed in ancestral South Semitic. This is represented in Figure 2.2. Alternatively, PS originally did have these forms, but they became attenuated or lost in Akkadian and NW Semitic, as in Figure 2.3. Assuming Figure 2.3 would take away two prominent features characterizing South Semitic.

PS: faaʕal, broken plural
  ES (Akkadian): Ø/reduced
  WS
    Northwest Semitic: Ø/reduced
    South Semitic: faaʕal/broken PL
Figure 2.3 Shared maintenance of faaʕal, broken plural South Semiticᵃ

ᵃ It is not surprising given the large “feature pool” (Mufwene 1996, 2009) to choose from, that other trees have been suggested as well. Stempel (1999: 21) for instance suggests that in West Semitic Ugaritic is a single parallel member of West Semitic, and that in this branch it stands opposed to Central Semitic, with three parallel groups, Canaanite (Hebrew), Aramaic and South Semitic. Arabic is part of South Semitic for Stempel.

2.2.2 Hetzron’s alternative

A second issue was raised most prominently by Robert Hetzron (1974, 1976) who pointed out that Arabic shares a significant feature with NW Semitic, namely in the loss of one of the three inherited tense/aspect verb stems. Whereas Akkadian (see (2.7) below) and Ethiopic have three stems, a perfect, a preterite fʕal (subjunctive in Ethiopic Ge’ez), and an imperfect -faʕʕil, the NW Semitic languages plus Arabic (and Sayhadic) have only two, perfect and imperfect.

(2.5) Merger of jussive/imperfect in Central Semitic
      Akkadian/Ethiopic    -faʕʕil (imperfect)    fʕal (preterite)
      NW + Arabic          Ø                      fʕal = imperfect (jussive)

The original preterite (a past tense in Akkadian) merges into a new imperfect in NW Semitic and Arabic (see Table 2.3 for partial paradigm), the original imperfect being lost in these languages (see Rössler 1950 for classic exposition of this process). In this view Arabic and NW Semitic constitute a class marked by the innovation in (2.5). Hetzron called this class Central Semitic, a term often used today. Arabic, with its single imperfect stem, is now among the Central Semitic languages, whereas Ethio-Semitic (and MSA) with three finite verbal stems constitute independent branches. Hetzron cited further isoglosses justifying the Central Semitic class, for instance that Central Semitic marks the first and second person masculine singular perfect with -t rather than -k, as in Ethio-Semitic. However, his further criteria have not proved effective. The status of the perfect marking, for instance, is taken up in 2.3.1 below. In any case, on the basis of Hetzron’s arguments a revised Semitic tree has been suggested, as in Figure 2.4, simplifying Huehnergard and Rubin 2011: 263 (see also Faber 1997: 6 for alternative tree).

Common Semitic
  West Semitic
    Central Semitic
      Northwest Semitic (Ugaritic, Hebrew, Aram)
      Arabic
      ESA
    MSA
    Ethio-Semitic
  East Semitic
    Akkadian
Figure 2.4 Hetzron-inspired Semitic

2.3 Bifurcated features in Arabic

I will return to a discussion of the trees in 2.4 below. First I introduce a further issue which impinges directly on the place of Arabic on the tree. Whereas Arabic is traditionally in Semitic classification considered to belong to one branch or another (South Semitic for instance, or Central Semitic), it displays a number of key features which I term bifurcated features. These are features which some varieties of Arabic share with other languages on one branch of a given tree, while other varieties share them with languages on another branch.

2.3.1 -t ~ -k = 1, 2 perfect verb suffix

The best example of this is the marking of the 1, 2MSG in the perfect, in CA -tu, -ta, -ti. I will abbreviate this feature to an alternation between -t and -k, as the person distinctions are not relevant to the immediate issue. Whereas Hetzron ascribed to Arabic the morpheme -t for this (e.g. qul-tu, qul-ti) putting Arabic with NW Semitic, in fact there are an appreciable number of Yemeni Arabic dialects which use -k, qul-ku, qul-ki. Otherwise all Ethio-Semitic (and MSA) languages mark the 1, 2MSG with -k (see App. 2.1). The feature is bifurcated because Arabic aligns here along two branches of a tree, whether as in Figure 2.1 or Figure 2.4.

(2.6) Bifurcated features: 1, 2 perfect suffixes
      -ku, -ka, -ki, qul-ku, qul-ka, qul-ki ‘I etc. said’
      = -tu, -ta, -ti, qul-tu, qul-ta, qul-ti

The background to the -t ~ -k variation is briefly as follows. The Akkadian marking of the 1, 2 perfect verb (stative) (as in [2.7]) is also the reconstructed PS personal value for these three morphemes as perfect verb suffixes in WS, with ∗–ku “1SG” and ∗–ta/ti “2” (Edzard 1998: 120; Moscati et al. 1980: 139; see Weninger 2011: 162 for an alternative interpretation). The existence of the Akkadian stative, it can be noted in passing, is one of the essential criteria often cited for distinguishing East from West Semitic.

(2.7) 1, 2 Akkadian stative
      qabaraa-ku ‘I have buried’
      qabaraa-ta ‘you.M have buried’
      qabaraa-ti ‘you.F have buried’

Among the WS languages, the South Semitic languages generalized –k to all second persons in the perfect, the NW Semitic –t to the first person. Arabic communities split on this feature, some generalizing along the lines of South Semitic, others like NW Semitic; i.e., Arabic internally replicated the generalization of either the –k or –t perfect which otherwise characterizes different languages (see 12.3.3, 12.4). The -k dialects today are found in the Yemen highlands (Behnstedt 1985: 116–117).
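The two generalizations can be sketched as a single leveling operation applied in opposite directions. The following is an illustration only: it starts from the reconstructed suffixes cited in the text (∗-ku, ∗-ta, ∗-ti), and the names `PS_SUFFIXES`, `generalize` and `inflect` are hypothetical helpers, not an analysis the source proposes in this form.

```python
# Reconstructed PS perfect suffixes as cited in the text:
# 1SG *-ku, 2M *-ta, 2F *-ti.
PS_SUFFIXES = {"1SG": "ku", "2MSG": "ta", "2FSG": "ti"}


def generalize(consonant, suffixes=PS_SUFFIXES):
    """Level the suffix-initial consonant across the paradigm.

    The vowel of each inherited suffix is kept; only the consonant
    is replaced by the generalized one.
    """
    return {person: consonant + form[1:] for person, form in suffixes.items()}


def inflect(stem, paradigm):
    """Attach each suffix of a paradigm to a stem, e.g. qul- 'say'."""
    return {person: f"{stem}-{suffix}" for person, suffix in paradigm.items()}


if __name__ == "__main__":
    print(inflect("qul", generalize("k")))  # the -k type of (2.6)
    print(inflect("qul", generalize("t")))  # the -t type of (2.6)
```

Running both directions from the same starting paradigm reproduces the two sets of forms in (2.6), which is the sense in which the feature is bifurcated within Arabic.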

2.3.2 Short vowels in open syllables

As a second bifurcated feature, some Arabic dialects, as well as by and large Classical Arabic, maintain short vowels in open syllables intact (Owens 2006/2009: 49).² The dialects include most Highland Yemen dialects (Behnstedt 2016a: 38, 40; Werbeck 2001: 43–46, 56), the Egyptian oases, many Sinai dialects and by and large, LCA (see 10.2 for elaboration). Ge’ez as well essentially maintains short vowels in open syllables (Tropper 2002: 29–30, 89), as do contemporary Ethio-Semitic languages.

(2.8) CV, katab, kátab-at, ‘he wrote, she wrote,’ firiħ, fíriħ-at ‘he was happy, she was happy,’ yáabis-ah ‘dry.F’
      Ge’ez, ḳätälä, yəḳättəl yəḳättəl-u, ‘he/they kill,’ tə-ngar-u ‘that you.PL speak’

By contrast, there are also many Arabic dialects which have rather elaborate rules of re-syllabification, initiated in most cases by constraints on CV in an open syllable. Here I briefly exemplify two types of outcomes. In Eastern Libyan Arabic a short low vowel in an open syllable is raised and a short high vowel in an open syllable is deleted. However, sequences of two short vowels in open syllables are not allowed and if they arise (via suffixation), the first syllable is deleted (see 10.1 for identical constraint in Najdi). Epenthetic vowels, which are inserted at the end of the phonological cycle, are underlined.

(2.9) Eastern Libyan Arabic, kitab ‘he wrote,’ kitab-iit ‘I wrote,’ iiktíb-o ‘they wrote’

² This criterion was already identified in Brockelmann (1908: 21), though he assumed without argumentation that the CA-type maintenance of short vowels in open syllables was the original type. The complex relations between syllable structure and morphological structure are treated as a typological issue in Embarki and Owens (2023).

In Baghdadi Arabic no short vowels whether high or low are allowed in open syllables (unless stressed). In this case, the vowel in the open syllable is deleted. The vowel targeted for deletion is in boldface.

(2.10) Baghdadi Arabic
       ktab-it ‘I wrote,’ kitb-at ‘she wrote’ (< ki.ta.bat)
       yiktib ‘he writes,’ yikitb-uun ‘they write’ (via yiktb-uun < yiktib-uun)

These will be treated in greater detail in 7.2. Similarly, in Aramaic and Hebrew short vowels in open syllables are subject to various changes, albeit rather different in the two languages. Aramaic, as will be seen in 7.2.2, in fact is similar to Baghdadi Arabic in disallowing short vowels in open syllables (unless stressed, see 7.2.2 (9) for discussion).

(2.11)
1. ne-θdkar-aak → (delete short vowel [bold] in open syllable)
2. ne-θdkr-aak → repair CCCC sequence via vowel insertion (underlined vowel)
3. ne-θdaakr-aak (→ neddakraak via assimilation)
   3-remember-you.MSG
   ‘He shall remember you.M’ (Syriac, Muraoka 1997: 15)

Hebrew is more complex (Blau 1976: 12–14; Joüon and Muraoka 2005: 97), the presence and quality of vowels depending on syllable structure and stress placement.
• Open syllables contain long vowels (2.12, .raa.)
• Unstressed closed syllables contain short vowels (2.12, ʔab)
• Stressed syllables contain long vowels (2.12, .háam.)

(2.12) Vowel lengthening in open syllables, Hebrew
       ʔab-raa-háam ‘Ibrahim’
       CVC-CVV-CVVC

Short vowels /a, e, i/ in open unstressed syllables undergo changes in quality or may be deleted.

(2.13) koo-teb-iim > koot-biim ‘writers’

The basic issue among Hebrew/Aramaic and many Arabic dialects is the constraint prohibiting short vowels in open syllables. The different languages may react to this constraint in different ways—deletion, lengthening, drawing protection via stress—but in all cases significant changes in syllable structure are implied. Again Arabic aligns in two directions, toward Ethio-Semitic and toward NW Semitic, in this case especially Aramaic.
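The constraint at issue can be stated operationally as a check for short vowels in open syllables. The sketch below is a simplification under stated assumptions, not a phonological analysis from the source: long vowels are written as doubled letters, hyphens are ignored, stress is not modelled (so the "unless stressed" proviso is left out), the vowel set is assumed, and a medial consonant cluster is split so that only its last consonant opens the next syllable. The names `VOWELS`, `syllabify` and `open_short_syllables` are hypothetical.

```python
VOWELS = set("aiueoə")  # assumed vowel inventory for this sketch


def syllabify(word):
    """Split a plain transcription into syllables (long vowels doubled)."""
    segs = word.replace("-", "").replace(".", "")
    # Locate nuclei as maximal runs of vowel letters.
    nuclei = []
    i = 0
    while i < len(segs):
        if segs[i] in VOWELS:
            j = i
            while j < len(segs) and segs[j] in VOWELS:
                j += 1
            nuclei.append((i, j))
            i = j
        else:
            i += 1
    if not nuclei:
        return [segs]
    syllables, start = [], 0
    for k, (_, nucleus_end) in enumerate(nuclei):
        if k + 1 < len(nuclei):
            # Consonants between this nucleus and the next: the last one
            # becomes the onset of the next syllable, the rest form the coda.
            cluster = nuclei[k + 1][0] - nucleus_end
            end = nucleus_end + max(cluster - 1, 0)
        else:
            end = len(segs)
        syllables.append(segs[start:end])
        start = end
    return syllables


def open_short_syllables(word):
    """Return the syllables that end in a single short vowel."""
    return [
        syl for syl in syllabify(word)
        if syl[-1] in VOWELS and sum(ch in VOWELS for ch in syl) == 1
    ]


if __name__ == "__main__":
    for form in ["katab-at", "kitb-at", "koo-teb-iim", "qaatal"]:
        print(form, syllabify(form), open_short_syllables(form))
```

On these assumptions katab-at contains two open syllables with short vowels, whereas kitb-at contains none, and in koo-teb-iim the flagged syllable is the one whose vowel is lost in (2.13).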

2.3.3 The nominal feminine suffix -at

In Huehnergard and Rubin’s (2011) revision of Hetzron’s classification the authors consider further isoglosses which are candidates for identifying Central Semitic as a coherent group. I treat one here as representative.³ This is the form of the feminine suffix. In Arabic, Hebrew, and Aramaic the feminine nominal suffix has two allomorphs, -a(h)⁴ or –aa and –at,⁵ the latter when a further affix is attached or when it occurs as a possessed noun (iḍaafa).

(2.14) Arabic: xidm-ah ‘service, work,’ xidm-at-na ‘our service, work,’ xidm-at an-naas ‘service of the people’
       Biblical Aramaic: malk-aah ‘queen,’ malk-at-na ‘our queen’ (Rosenthal 1961: 88)
       s. Ge’ez nəgəɬ-t ‘queen’ (=nəgəs2-t),⁶ nəgəɬ ‘king’

(2.15) nominal feminine suffix allomorphy:
       -at: before affix or in possessive construction
       -a(h): otherwise
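The conditioning in (2.15) amounts to a two-way choice, which the following sketch spells out using the Arabic forms of (2.14). The function name `feminine` and its parameters `affix` and `possessor` are hypothetical, and the unsuffixed allomorph is fixed as -ah for simplicity, although the text notes that its shape varies across varieties.

```python
def feminine(stem, affix=None, possessor=None):
    """Choose the feminine allomorph following rule (2.15).

    -at appears before a further affix or in a possessive construction;
    -ah (one conventional shape of -a(h)) appears otherwise.
    """
    if affix is not None:
        return f"{stem}-at-{affix}"
    if possessor is not None:
        return f"{stem}-at {possessor}"
    return f"{stem}-ah"


if __name__ == "__main__":
    print(feminine("xidm"))                       # unsuffixed form
    print(feminine("xidm", affix="na"))           # before an affix
    print(feminine("xidm", possessor="an-naas"))  # possessed noun
```

A language with invariable -(V)t would correspond to a function that ignores both conditions, which is why the conditioned alternation looks like a candidate for a shared innovation.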

Other Semitic languages (e.g. Akkadian, Ge’ez, Modern South Arabian, Ugaritic) have invariable -(V)t. This looks ostensibly to be a strong candidate for a shared Central Semitic innovation. It is worthwhile discussing this in greater detail, as it is an indication of the complications which arise when features are looked at in detail. As Huehnergard and Rubin point out, there are two problems with this shared-innovation account. First, Ugaritic, which the authors consider to be part of NW Semitic, has invariable –t, as do a number of the ANA languages, Safaitic (AD ca. 400, see 4.1) and Taymanitic (ca. 500 BC). Moreover, the fact that both Aramaic (Imperial Aramaic, not noted by the authors) and Arabic have or had varieties which maintained –t speaks for a wave change which only partly washed over some of the varieties.

(2.16a) {qdl-t}⁷ ‘(one) complaint’ (Muraoka and Porten 2003: 65)
(2.16b) ib-bagar-it DEF-cow-F (al-Naḍiir, Behnstedt 1987: 162)

³ Another interesting and important feature, and one also frequently cited in classifying ANA varieties is the form of the definite article, l-, m- or n- (see Tropper 2001). The issue is as variegated and interesting as that of the feminine –t. For introductory discussion, see Huehnergard and Rubin (2011: 269–270, also Pat-El 2017: 451–452).
⁴ Usually the unsuffixed form is –V (-a, -e or –i), though in some dialects it may be –Vh, usually -ah or –ih.
⁵ I use –at as the conventional representation.
⁶ s2 in Ethiopic is often assumed to have the voiceless lateral reflex (see Table 1.1). I follow Weninger (2011: 1127) here, though it should be emphasized that the value of s2 is reconstructed, and as will be seen in 4.1.5.2, such reconstructions at least for Arabic are problematic.

In the case of Arabic, in al-Naḍiir (former North Yemen) invariable -it occurs only on definite nouns (and nouns in possessive construct, as in other forms of Arabic). In indefinite form the allomorph is -ah, bagar-ah. Since it is an isolated case, it is not entirely clear whether it is a relic or a later analogical development. The fact that Ingham (1982) reports invariable -at in Shammari Arabic (northern KSA), though in a different context, suggests that both are relics which have survived under different conditions.⁸ Invariable -t is also reported in the grammatical tradition to be maintained by “some Arabs” in pausal position, as in ṭalħ-at ‘acacia’ (Sibawaih II: 189; Ibn Faris, al-Ṣaaħibiy: 30; Ibn Yaʕish Sharħ al-Mufaṣṣal V: 89).

The allomorphy in (2.15) is thus an innovation in Central Semitic (assuming the classification for the sake of current exposition), but it did not encompass all of the Central Semitic languages, and those it did reach it appears to have affected at different times and to different degrees. Huehnergard and Rubin leave undecided the question whether (2.15) occurred independently in Aramaic, Hebrew, and Arabic, or whether it is a diffusional spread (2011: 268). The former possibility I think can be peremptorily dismissed. On a tree this would look as in Figure 2.5, with three independent events leading to -at ~ -ah.

[Figure 2.5: tree with root -at and four branches: -t (Ugaritic); -at- / -ah (Hebrew); -at- / -ah (Aramaic); -at- / -ah (Arabic)]

Figure 2.5 Parallel independent development of -t ~ -a(h) allomorphy

⁷ Along with {qdl-h} on a variable basis.
⁸ It appears that -at in Shammari may be used when the noun is in close juncture with the following word. This includes (from Ingham’s examples) an “S-V” juncture and an “N and N” juncture, neither of which supports the -t allomorph in other varieties today. More study is needed (see van Putten 2017).


The fact, however, that invariable –t is attested in the ANA relatives of Arabic (see e.g. Taymanitic 4.1.1), and still in contemporary Arabic, and in early forms of Aramaic means that there would be multiple instances of parallel independent development with Aramaic and Arabic independently working their way through the at → at- ~ -ah shift. As matters stand, the development more accurately can be represented as a variationist development, with Arabic still maintaining relics of invariable –t. The tree in Figure 2.6 represents the development of (2.15) in three stages. In the first, in an unspecified ancestral population the allomorphy in (2.15) developed. Given the facts on the ground, this is most likely a direct ancestor of Hebrew, which assumed the allomorphy in full. In the ancestral Aramaic and Arabic population there continued to be populations with invariable –t, and others with the allomorphy in (2.15). In tree-structure terms this is represented as structural variation, [-t ~ -at / –a(h)] (read, invariable –t alternated in the overall population with conditioned -at / -ah). Relatively early in Aramaic this gave way to categorical allomorphy. The variable [-t ~ -at / –a(h)] allomorphy was inherited in ancestral Arabic populations and indeed the shift has not worked its way completely through Arabic, though the allomorphy in (2.15) is by far the dominant variant. The tree structures are intended to help visualize the process and are not a claim that the shift proceeded independently in Hebrew, Aramaic, and Arabic. To the contrary, it may be assumed that the entire process was moved on via contact and diffusion. If the full-scale shift first occurred in Hebrew, this shift could have passed on to Aramaic, recalling that there was a full-scale language shift from Hebrew to Aramaic at the latest in the second century AD. Aramaic-Arabic contact in turn was a constant throughout the history of these peoples (see Chapter 7). 
The outcome of these changes, however, leaves Arabic again bifurcated. Most varieties are like Hebrew (and later Aramaic), but a few relics align Arabic with original -t invariable languages, namely Ugaritic and Ethio-Semitic.

[Figure 2.6: tree with root *-at and three internal nodes, each labeled [-t ~ -at / -a(h)]. Terminal nodes: -t (Ugaritic); -ah / -at (Hebrew); -a(h) / -at (Aramaic); [-t ~ -at / -a(h)] (Arabic)]

Note: [-t ~ -at / -a(h)] = the variation between invariable -t and the allomorphy in (2.15) was transmitted in full to a successor population. In this figure “/” = allomorphically conditioned variation.

Figure 2.6 Innovation + diffusion of -at ~ -a(h)


2.3.4 -ki ~ -iš 2FSG object

A final bifurcated feature is an interesting one. The 2FSG object suffix is either -ki or -(i)š/-ši. -ši⁹ as 2FSG in Arabic occurs throughout the southern and eastern part of the Arabian peninsula, including most of highland and eastern Yemen, Oman, and parts of Saudi Arabia and the Gulf countries (see Holes 1990, 2018c: 125, 136 for summary; Behnstedt 2016a: 116). It is also attested in Sibawaih (see 3.2). -š for 2FSG object is widespread, though not universal, in South Semitic, being found in some, especially South Ethio-Semitic, languages, e.g. Amharic, Harari (Wagner 2011), and Gurage (as well as in all MSA languages, where -šε occurs). Significantly for the current exposition, it is also the 2FSG suffix pronoun in West Neo-Aramaic (Maʕlula, already noted in Fischer 1956: 25).

(2.17)
Omani Arabic, -iš ‘your FSG.’ beet-iš ‘your.F.SG house’ (= beet-ki)
Maʕlula tarb-iš ‘your.F way’ (Jastrow 1997: 337, 345)

This feature is hard to interpret. Its widespread distribution across the language trees suggests an inherited feature. On the other hand, as a reader points out, if -š was inherited from a proto stage in Arabic, one would expect it to have gone to -s, according to the correspondences given in Table 2.2 (*š (= s1) → s). The same applies to reflexes in Ethio-Semitic. The same reader suggests that the entire region is characterized by widespread palatalization and hence the attestations are explained via parallel independent development. However, whereas the Arabic suffix alternates with a velar /k/, in Ethio-Semitic palatalization in general affects coronals,¹⁰ aligning Ethio-Semitic with Cushitic in this respect, e.g. Oromo t + causative -s → čč, ƙot-siis → ƙoččisiis ‘have s.o. cultivate s.t.’ Thus it cannot be ruled out that 2FSG -š is morphophonologically protected from wider phonological changes, representing an exceptional relic.

2.3.5 Stammbaum and bifurcation

The bottom line from this discussion is that a simple family tree model works in the sense that it can identify a single least common denominator which divides groups of languages along a common parameter. In the case of Figure 2.4 this is the innovation described in (2.5): languages “above” Central Semitic have three finite verbal stems, but those within Central Semitic have two.

⁹ Representing these morphemes conventionally, rather than as reconstructions. Individual allomorphy is large.
¹⁰ For Gurage and Amharic see Meyer 2011: 1232; Teferra and Hudson 2007: 32. Fischer (1956: 36–38), citing work by Ullendorff, Leslau, and others, considers Amharic -š to be a borrowing from MSA, and Gurage -š to arise from *k > *x > š. He explicitly acknowledges having no explanation for Maʕlula -iš. There are many unresolved issues here, so that the current suggestion of inheritance in all three branches remains open.


It may be asked, however, what is lost by searching too assiduously for simple classificatory parameters. Ideally, as in Hetzron’s original critique of the NW– South Semitic dichotomy, one would want bundles of features aligning along a single innovative node, for instance, as Hetzron once suggested, that NW marks perfects with -t, SS with -k. There is a danger that once an effective classificatory parameter is found, counterevidence will be too easily dismissed. Thus the broken plural and faaʕal parameters in Arabic, which are now split between Central Semitic and other West Semitic languages, are said to in fact be original throughout WS. However, this is only conditionally true. The class of broken plurals is of a considerable order more complicated in Arabic and Ethio-Semitic, and functionally more complicated (cf. e.g. Owens 2021b for development of deflected agreement in Arabic) than the few traces of the phenomenon which are adduced in Hebrew. The late Robert Ratcliffe (1998: 151) cautions us to think not in terms of either/or, either broken plurals or no broken plurals, but rather in terms of “relative complexity and richness.” While acknowledging incipient elements of broken plurals in NW Semitic, Ratcliffe sees the complex system of internal broken plurals as a significant South Semitic (including for him MSA) innovation.¹¹ This leads to restating the parameter as a significant shared feature. If Ethio-Semitic and Arabic are marked by incremental complication of the class of broken plurals that is of a qualitatively different order from the forms in NW Semitic, doesn’t this itself constitute an argument for a shared “complicating” node? Similarly for the faaʕal verb. In total eight features have been discussed in some detail in this section. One of them sets Arabic and Ethio-Semitic apart from other languages, one sets Arabic and NW Semitic apart, and four are bifurcated features. (2.18)

Three types of Arabic genetic affiliation
Shared Arabic – Ethio-Semitic: *p → f
Shared Arabic – NW Semitic: loss of one verb stem
Bifurcated features: Arabic shares feature with both Ethio-Semitic and NW Semitic
  1, 2MSG = -t ~ -k
  Short vowels in open syllables = deleted, like Aramaic, or = maintained, like Ethio-Semitic
  FSG nominal form = invariable -t (Ethio-Semitic and Ugaritic) or = -ah ~ -t (Hebrew, later Aramaic)
  2FSG object suffix = -š (Maʕlula Aramaic, Ethio-Semitic), = -ki otherwise

¹¹ While Weninger (2011: 1132) sees the Arabic and Ethio-Semitic broken plural systems as sharing features at the level of plural patterns, but not at the level of individual lexemes, Ratcliffe (1998: 168) notes examples such as Ar. ħulm – aħlaam ‘dream – dreams’ = Ge’ez ħelm – aħlaam.


In addition, the existence of a robust class of broken plurals and a productive faaʕal verbal class provide a further potential link between Arabic and Ethio-Semitic. I would highlight three points in this discussion. First, finding criteria defining major splits is challenging. Hetzron’s verbal stem merger criterion is thus in formal terms important. Given this, however, a second point asks where this leads to. Even the most ardent supporters of a neat tree structure as in Figure 2.4 need to explain the shifts which led to Arabic sharing significant features with Ethio-Semitic. The shift of ∗ p → f is explained as “a very common change” (Huehnergard and Rubin 2011: 272) that is easily spread via contact. Broken plurals are seen as remnants of a once widespread feature which was severely reduced in NW Semitic and Akkadian (i.e. Figure 2.3), and faaʕal is also interpreted as representing a form once more frequent throughout Semitic. One thing which makes these explanations suspicious is that they serve to justify the unambiguous tree in Figure 2.4. The wide extension of /f/ (Ethio-Semitic, Arabic, also MSA) is not due to a single innovation + spread via primary diffusion (see 5.1), but rather due to an innovation somewhere (unspecified) followed by contact-based change. For the broken plurals and faaʕal it is not discussed (see above) that Arabic and Ethio-Semitic, if not absolutely innovative, at least took a very rudimentary structural “idea” and complexified it considerably, i.e. that Ethio-Semitic and Arabic incremented these patterns to a high degree in a common step. Thus, Hetzron’s classificatory innovation is indisputable. What can be disputed is whether this justifies explaining away commonalities between Arabic and Ethio-Semitic with what at this point look to be rather ad hoc explanations—loss of previously more widespread structures in NW Semitic, unspecified chains of contact spreading /f/ from language community to language community. 
A third point relates to the second, and that is that Arabic itself is hardly characterized by univalent structures. It bifurcates, a point I will elaborate on in the next section. Here I restrict myself to a simple observation. Whereas Huehnergard and Rubin (2011: 274) see the -k variant as arising via contact with Old South Arabian languages (Sayhadic), as will be sketched in 2.4 below, it is equally plausible to regard them as generalizing inherited structures in two different ways.¹²

2.4 Arabic: A composite West Semitic language

In this section I would like to sketch an alternative categorization of Arabic as a Semitic language which categorizes without making strong claims about the nature of the Semitic genetic tree which Arabic fits into. Rather than being viewed as contradicting Figure 2.1 or Figure 2.4, it should be seen as an alternative perspective. The inspiration for this approach was defined critically by Jan Retsö. He basically criticizes Semitic linguistics for its interpretation of the Semitic languages as invariant entities, each cleanly delineated from other Semitic languages.

the different Semitic languages are basically closed-worlds. It is tacitly assumed that these languages have once upon a time arisen as regional differentiations from a more or less unitary base. After that time they have sometimes interacted, as documented by borrowings, and, in some cases, substratum influence, but on the whole they have remained closed linguistic worlds leading lives of their own like Leibnitzian monads. Like in these, similar phenomena in different languages tend to be seen as parallel developments, the result of drift as described by Sapir (although Sapir is never referred to). (Retsö 2000: 112)

¹² The origin of bifurcated features in Arabic is itself heterogeneous. The -t ~ -k parameter is postulated to have been inherited from PWS (see Figure 2.7 below). The variation in the FSG between -t ~ -ah probably was introduced via contact, for instance, with Aramaic-speaking communities.

I would like to qualify Retsö’s criticism in one way. Essentially he is correct, and I believe the previous discussion about the status of Arabic as a Central Semitic language illustrates his point that the underlying assumption is that clean Stammbäume can or should be discerned. One only needs to find the correct parameter. At the same time, even a superficial reading of the discussion in Huehnergard and Rubin, Tropper (2001, see n. 3, this chapter; ch. 4 n. 7) and others shows that the idea of areality and diffusion is acknowledged in principle. What is not allowed for, however, is the possibility that Semitic languages, or at least parts of them, will never be amenable to a neatly delineated Stammbaum treatment. This issue was met with above. If the feminine nominal suffix is an areal feature, where did it innovate, how did it spread? That such features are a problem for the purely Stammbaum approach is implicitly recognized. This recognition, however, is acknowledged to save the Stammbaum, to find those innovations which will cement the case that Central Semitic is a tangible sub-class. From a more general linguistic perspective, however, serious but at the same time interesting issues remain. Precisely this problem was treated in some detail in Edzard (1998), who developed a model of polygenesis, convergence and entropy. A Semitic proto-language is a chimera (an idea Edzard traces back to Karl Vollers at the turn of the twentieth century). We need to assume a heterogeneous (polygenetic, see ch. 5 n. 7) input whose individual features converged in irregular ways, followed by the spread and redistribution of these features in ways whose historical “causes” may be opaque to the contemporary observer (1998: 25–26, 72).
While agreeing in principle with much of Edzard’s argumentation, including the important observation (1998: 41) that in many cases it may be impossible to choose a best historical explanation for a particular development, the approach followed here is that complex reflexes of features possibly of polygenetic origin should still be described in classic comparative-reconstructive terms. The payoff in pursuing this interpretive goal is that following the many multi-linear strands of Arabic will yield insights into the internal history of Arabic, and throw up many interesting challenges to general historical linguistics. It can also be noted in passing that a fully “entropic” model is ostensibly at odds with long-term stability as described in Part IV of this book. Moreover, this book is less pessimistic than Edzard about the efficacy of classic reconstruction, as will be seen.

At the same time as Edzard (1998), a related approach to genetic relationship among closely related languages was proposed by Dixon in his equilibrium and punctuation model (1997). Looked at over the long extent of their chronological history, language families may experience different historical and social events which impinge critically on developments in language. During a period of equilibrium small groups of speakers will interact over long periods of time with one another, these interactions leading to extensive borrowings and shifts, L2 language acquisition and attendant substrate influence. Crucially, in such an historical-social configuration discerning linear developments, e.g. a single innovation marking a new language or set of languages, is difficult precisely because the interacting speakers live in small, decentralized groups. These may lose contact with one another, but later take up contact again, take up contact with descendants of groups they have lost contact with, all the while spreading innovations which they may have developed at some point in their social history. At a certain point it may become impossible to discern neatly contoured Stammbäume, even while recognizing that a particular language is in some sense related to (i.e. shares linguistic features with) languages recognized as belonging to different language families.¹³ This issue is illustrated in some detail in Chapter 7 when I interpret pre- and early Islamic Arabic-Aramaic contact in terms of Dixon’s equilibrium model.
Here I would like to revisit the idea of Central Semitic. As seen in 2.2.2, a classic Stammbaum approach finds compelling evidence for classifying Arabic with NW Semitic, this now under the aegis of Central Semitic. Classic historical reconstruction, long recognized among scholars of Semitic, equally justifies significant Arabic–Ethio-Semitic genetic links (2.2.1). A part of the problem is simply that of reifying the idea of language. Arabic is Arabic and deviations from an idealized model of this language require explanations from outside of a classic Stammbaum treatment. In many instances, as will be seen, such is a correct response, but not always. A perspective which informs this book and which pertains to the issue at hand is the following.

P1 Language and speech community principle. A language can maintain contrasting forms in different speech communities, and still remain the same language.

¹³ For a classic critique of the Stammbaum theory and the difficulty of applying it in a region populated by two very different language families (Papuan and Austronesian), see Ross 1996.


Speech communities can maintain inherited differences; they can induce historical disjunctions as one speech community maintains an inherited feature while another innovates. This is a basic premise of much of Chapters 5–9 and 13.3, so detailed consideration will need to wait. While it may be a solution which runs counter to recent attempts to develop unique taxonomies (Al-Jallad 2019; Huehnergard and Rubin 2011), I think the basic issue is that the defining position of Arabic vis-à-vis NW or SS runs afoul of two essential aspects of language. The first is Dixon’s (1997) construct, era of equilibrium, alluded to directly above. Languages long in contact with one another may transfer properties in such opaque ways that it is impossible to derive neat tree structures representing their genetic relationship. Edzard (1998: 72) makes essentially the same point under the rubric of chaos theory. In this respect the question of the relation of Arabic to the other Semitic languages reproduces a theme which will be repeated when relations between varieties of Arabic are considered in the rest of this book. In these terms, as Retsö (2013: 444) remarks, the question whether Arabic should be classified as Central Semitic or South Semitic is not very meaningful. Still, this does not mean that it is not interesting to examine criteria within a larger linguistic context as to how attested configurations arose, what diverse elements define them and how they are maintained. Observing why a classical Stammbaum cannot be constructed may be as interesting as constructing one that works. The second is that “languages” writ large shelter individual speech communities (in the case of Arabic many of these), and it is in these that classificatorily contradictory features will survive: thus, not 1SG -tu vs. -ku but both -tu and -ku, distributed across the different Arabic speech communities.
I would like to expand upon this point by adumbrating an issue which will reappear in this book, and that is to think of an historical development as proceeding across populations of speech communities. Here I will illustrate the point rather programmatically, using material which has already been introduced, and in Chapters 6 and 11 I will elaborate. To begin I assume (following in principle Greenberg 2005: 78) the construct “Arabic” which may contain a number of variant forms for certain morphemes. These variant forms are designated as types which characterize the entire language. The intention is proof of concept, not a detailed exposition of each component part of the argument. For ease of exposition I assume a genetic tree contrasting NW vs. SS with Arabic part of SS, as in Figure 2.1. Much of the discussion above involved this tree variant. Converting the representation to Figure 2.4 with Arabic in Central Semitic, at least in a simplified version, would be a straightforward matter, however.

In Figures 2.7 and 2.8 speech communities are represented in stylized manner as defined by a particular allomorph and separated graphically from the next speech community with a comma. Allomorphs are separated by “~”: (-t ~ -k) indicates that the 1SG perfect in Arabic has two allomorphic types, one -t, the other -k. [-t, -t ~ -k, -k]¹⁴ represents four speech communities, two with the -t allomorph, two with the -k. I enclose sets of speech communities in square brackets, and at a superordinate stage recognize two sets, each representing an ancestor of the two subordinate nodes. In the bottom subordinate row sub-families are distinguished by their position at the end of a node, but whether individual languages or dialects are distinguished here is not specified. With this, the development of the 1SG perfect verb suffix in WS can be diagrammed as follows.

In Figure 2.7, WS begins with two allomorphs of the 1SG, (-t ~ -k), distributed (purely for illustrative purposes) in eight distinctive speech communities. As PWS split into NW and SS, *-t and *-k in individual speech communities did not get transmitted uniformly. In particular, in NW Semitic the *-k variant was completely lost. On the other hand, *-k and *-t continued in about equal parts in the developing SS speech communities. -k was taken over in Ethio-Semitic, whereas Arabic communities split, some with -t, others with -k.

[Figure 2.7: tree
PWS: (-t ~ -k)
[-t, -t ~ -k, -k] [-k, -k ~ -t, -t]
NW: -t, -t, -t …
SS: -k, -k (Ethio-Semitic) …; -k ~ -t (Arabic) …]

Figure 2.7 1SG perfect allomorphs, realizations in speech communities

Matters become much more complicated very quickly, but one further variable can be added to illustrate how the model proposed matches the outcomes observed. Referring to (2.18), Arabic has been characterized by four bifurcated features, one of them the -t ~ -k alternation in the perfect verb. For illustrative purposes, a further parameter from the bifurcated features can be added. For this I will assume that the inherited WS value of -ši, along with -ki, for the 2FSG object can be motivated (see 2.3.4). PWS begins with two allomorphs for the 2FSG, -ši ~ -ki, and two for first and second person subject, -t ~ -k. This is represented here as (-ši ~ -ki, -t ~ -k). In individual speech communities the combinations of the two morphemes give the logical possibility of [-ši + -t ~ -ši + -k ~ -ki + -t ~ -ki + -k]. As matters turned out, in NW Semitic the -k subject marker was lost completely, as seen in Figure 2.7, hence the combination [-ši, -k] is impossible in NW Semitic. [-ši + -t] occurs in one variety, Maʕlula Aramaic. Excluding Arabic, SS lost -t categorically, hence the impossibility of [-ši, -t] in this case. However, Arabic continues all logical possibilities: -ši + -t ~ -ki + -t ~ -ši + -k ~ -ki + -k … . This is exemplified in (2.19).

¹⁴ Representationally one could alternatively simply notate [t, k], where “t” = all speech communities with -t, “k” all communities with -k. The eight individual communities simply graphically represent the fact that the variants have an assumed wide distribution. This representation is used further in 11.5.1.

(2.19)
Arabic: 2FSG object and 2FSG perfect suffixes
Obj + Sbj
-iš + -t = Baħarna, many Yemeni dialects (Watson 1993; Holes 2016)
-ki/-ik + -t = most dialects
-iš + -k = eastern Yemen (Behnstedt 2016a: 114, 196; see also 12.3.3)
-ki/-ik + -k = south central Yemen (Diem 1973: 95–97, 111); also Ḍulmah (sample point 122a, Behnstedt 2016a: 114, 196)
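The 2×2 logic behind (2.19) can be made explicit with a short enumeration. This is a proof-of-concept sketch under stated assumptions, not a reconstruction: the names `OBJ_2FSG`, `SBJ_PERFECT`, and `combinations` are hypothetical, the suffixes are given in their conventional forms, and the losses modeled are only the two named in the text (loss of -k in NW Semitic, loss of -t in SS excluding Arabic).

```python
from itertools import product

# Conventional representations of the inherited variants assumed in the text.
OBJ_2FSG = ("-ši", "-ki")    # 2FSG object suffix
SBJ_PERFECT = ("-t", "-k")   # perfect subject suffix


def combinations(obj=OBJ_2FSG, sbj=SBJ_PERFECT):
    """Return every logically possible (object, subject) pairing."""
    return list(product(obj, sbj))


# PWS: both variants of both morphemes are available.
PWS = combinations()

# NW Semitic: the -k subject marker was lost.
NW = combinations(sbj=("-t",))

# SS excluding Arabic: the -t subject marker was lost.
SS_EX_ARABIC = combinations(sbj=("-k",))

# Arabic: all inherited variants continue across its speech communities.
ARABIC = combinations()

if __name__ == "__main__":
    for label, combos in [("PWS", PWS), ("NW", NW),
                          ("SS (ex-Arabic)", SS_EX_ARABIC), ("Arabic", ARABIC)]:
        print(label, " ~ ".join(f"{o} + {s}" for o, s in combos))
```

Treating the variant sets as properties of populations of speech communities, rather than of a single “language,” is what lets four combinations coexist in Arabic while each neighboring branch retains only two.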

Other combinations, e.g. -iš + -ši also occur. It would obviously be very premature to claim that all possible combinations in Arabic reflect proto-inheritance. But this is a proof of concept summary illustrating how it is possible for one and the same language to have such a variety of allomorphic variation in core morphological categories merely on the basis of inheriting 2×2 variants. Nothing in principle argues against (2.19) as representing retentions. In passing it can be noted that Diem (1973: 93) regards the area where the -k/-t variants are found (al-ʕUḍayn and vicinity) as belonging to the “most original Yemeni if not neo-Arabic dialects in general” (“… originellsten jemenitischen wenn nicht neuarabischen Dialekten überhaupt”). As I interpret this, these dialects are original in the sense of earliest. As already indicated, it is possible to seek alternative explanations. In regard to Maʕlula Aramaic, for instance, it might be argued that the -iš variant was a later borrowing. However, I also noted that ostensibly there is no mechanism, for instance contact with a -ši-speaking community, which would support such an explanation. Similarly with the complex combination of variants in Figure 2.8 in Arabic. Most are located in Yemen, which is one of the most complicated, but equally, one of the least-well described linguistic regions of the Arabic world. One cannot on an a priori basis rule out change via contact for some of the combinations. However, logically change via contact must defer initially to an historical development defined via a classic comparative linguistic methodology.¹⁵ This is what is offered and serves as a hypothesis for further research. The methodological point made here is that as soon as one allows allomorphy to be conceptualized as a property of speech communities, a richer set of reconstructed forms is possible than if the historical development is formulated only in terms of the concept of “language.”

¹⁵ Entities, languages, dialects, need to exist before influence on them can be ascertained.

[Figure 2.8: tree
PWS: (-ši, -ki, -t, -k)
[-ši + -t ~ -ši + -k ~ -ki + -t ~ -ki + -k] [-ki + -t ~ -ši + -t ~ -ši + -k ~ -ki + -k]
Loss of -k SBJ suffix in NW Semitic, of -t in SS
NW: -ši + -t ~ -ki + -t …
SS (ex-Arabic): -ši + -k ~ -ki + -k …]

Figure 2.8 t/k perfect + 2FSG allomorphs, realizations in speech communities

As more and more information becomes available about Arabic, both from written and oral (contemporary) sources, the more and more complicated becomes a simple genetic classification of the language within Semitic. At this point rather than looking for magical criteria defining a simple, binary classification, it is more interesting to consider in greater detail the many variables both linguistic and demographic-sociolinguistic which will bear on this enterprise. This is one goal of the current book.

Map 1 Countries with Arabic as a majority language


Map 2 Arabic in the Middle East


Map 3 Arabic in Africa

Map 4 Arabic as a minority language

3 Arabs, Arabic

“Arabs” have been known in written sources for a much longer time than Arabic. The oldest sources in Arabic script conventionalized as we know it today go back “only” to the sixth century AD, and written sources in Arabic script become plentiful only in the Islamic era (see 4.2). Depending on how one evaluates the Ancient North Arabian languages, discussed in 4.1, the oldest sources can be pushed back to 700–1100 years before the Islamic era. Nonetheless, these epigraphic sources are highly deficient for comparative linguistic purposes (see 4.1), so that it is effectively not until the grammarian Sibawaih (177/789) that Arabic, however understood, is known in all the fine detail that we know it today. In this chapter, after reviewing briefly the pre-Islamic history of the Arabs, I introduce Sibawaih. I do this not by summarizing his grammar in general terms (for this a number of works can be recommended: Baalbaki 2008; Carter 2004) but rather by treating one issue in considerable detail. This performs a triple task: first, introducing Sibawaih’s complex and sophisticated linguistic thinking to the reader; secondly, in so doing, showing that even Sibawaih needs interpreting in order to bring him in line with contemporary research; and thirdly, introducing one linguistic feature which will be adduced for its significance in interpreting linguistic history at a number of points in the book. In the last section of this chapter, 3.3, I argue against a sharp distinction between Classical Arabic and the contemporary spoken dialects, Classical Arabic sources themselves supporting my meta-conceptualization of Arabic language history.

Arabic and the Case against Linearity in Historical Linguistics. Jonathan Owens, Oxford University Press. © Jonathan Owens (2023). DOI: 10.1093/oso/9780192867513.003.0003

3.1 Arabs

The Arabs come from the Middle East. Where their exact origin lay is a complex question which comparative historical linguistics is not in a position at the moment to make more specific. Different answers to the question have been proposed. Petráček (1988) held that Arabs originated in the desert and steppe along what is today the border between Iraq, Saudi Arabia, and Jordan. While not discounting this possibility, on philological grounds Retsö (2003: 49–51) envisioned the possibility that their original homeland lay in the Hijaz, though equally he presents ample evidence that pinpointing a single homeland against the many references to “Arabs” in classical sources is very problematic. Indeed, Retsö argues that the very idea of Arabs as an ethnic group is a fluid concept, shifting from context to context. All of this is probably an issue which will never be answered definitively. It has been
noted numerous times that in pre-Islamic times groups assumed to be Arabs were found in what is today Lebanon, southern Turkey, Syria, Iraq, Jordan, the Negev, Sinai, and Egypt in the Nile Delta, and throughout the Arabian peninsula, in short everywhere in the Middle East except perhaps in the NW of the Levant and perhaps not in parts of Yemen. To date Retsö (2003) remains the most detailed treatment, though others agree in a general way with his summary (Hoyland 2001: 26, 48, 59; Lipiński 2000: 38; Wilmsen 2014: 130–140). Hoyland reconstructs movements of Arabs beginning in the first millennium BC, and then, beginning in the first century AD, migrations from the southern Arabian peninsula toward the north lasting centuries (2001: 230, 233–234, 238), movements motivated by population pressures, the collapse of state structures in Yemen, conquest, and increasingly the attraction of serving as border gatekeepers to either the Roman or the Parthian, later Sassanian, empires. Underpinning these migrations was the fact that large segments of Arab society were nomadic.¹ This fact alone militates against identifying a specific homeland. Nomadism also has an important linguistic correlate. According to von Grunebaum (1999 [1963]: 10), in the pre-Islamic Arabian peninsula “the real linguistic unit of the Bedouin Arab was the ħayy,” that is, a group of families living together. Von Grunebaum is half prescient here in remarking that this supported an “intense linguistic fragmentation.” As will be discussed in 6.5, the family does indeed have an important role in transmitting stable linguistic norms. “Linguistic fragmentation,” on the other hand, is an aprioristic essentialization of a complex topic (the topic of this book) which unfortunately can be invoked to dismiss or ignore the legitimacy of any systematic linguistic study.
As for the term “Arab” itself, it is first attested from 853 BC in an Akkadian document which records the defeat of a Syrian–Palestinian army that included “Gindibu the Arab” (jundub ‘grasshopper’ in Arabic) by the Assyrian king Shalmaneser III (Hoyland 2001: 59; Retsö 2003: 125). Byrne (2003: 12–14) assumes that this mention marks a threshold of recognition for a people whose presence in the Arabian peninsula and the Levant had long been established. After this mention references to “Arabs” throughout the region become more common in Assyrian documents, the Bible, in Aramaic documents and inscriptions, and in the works of various Greek and Roman geographers and historians. However, interpreting who were or weren’t Arabs in classical sources is often a puzzle with no unambiguous solution. The Greek geographer Strabo (d. AD 24), basing himself on Eratosthenes (d. 195 BC), writes of “tent dwelling Arabs” (skēnitai) who are different from árabes elsēnoí who lived in the Arabian peninsula (Retsö 2003: 302–303). “Medianites” (NW Jordan in a Biblical context) are in some sources described as distinct from Arabs, and in others as being Arabs (Retsö 2003: 335–336).

¹ Hoyland (2001: 169) reports that Ptolemy (ca. AD 150) listed 218 settlements in Arabia, only six of which were cities. All of these were located in south Arabia (Yemen or Hijaz).

Retsö (2003: 505) describes a period, approximately AD 300–450, when in the Greek and Syriac literature the names Saracens and ṭayyaayee ~ ṭayaaʔee ~ taenoí are assumed to refer to Arabs as a group. In the Islamic era, the ṭayyiʔ are a subtribe of Arabs (a nice metonymy). Throughout the pre-Islamic classical sources “we see these Arabs through the eyes of others” (Retsö 2003: 330). We know they were there, but often know little more than this. Indeed, Retsö cautions that the term “Arab” need not refer to an ethnic group at all. He suggests that the term could designate special social groups. The military elite in the Nabataean kingdom were “Arabs” (2003: 382). Most consistently Retsö argues that “Arabs” were religious officiants who presided over an “Arabic” that was a special, even divine, devotional language or register (2003: 591–598, 2013 [2019]: 434).² At the end of the day the many references to “Arabs” say nothing directly about whether the Arabs of these reports, however designated, spoke an Arabic language and, if they did, what form it had (Retsö 2003: 126, 591). The reality is that it is in the Islamic era that a well-profiled, if sometimes contradictory, consensus developed around the meaning of “Arabs.” Briefly, based on Ibn Manẓur (Lisaan 4: 2863–2868), Arabs (ʕarab or ʕurb) are a people opposed to non-Arabs (ʕajam). They are divided into two groups, the al-ʕarab al-ʕaariba, the ‘pure Arabs’ (al-xullaṣ minhum) and the ‘Arabicized mustaʕraba’ who in some later accounts are those who acculturated and learned Arabic (see Ibn al-Nadim Fihrist: 3 for one account of this). Ibn Manẓur also mentions a third group, al-ʔaʕraab ‘the Bedouins.’ The Quraysh are said to be the most eloquent of the Arabs (ʔafḍal luɣaat al-ʕarab) and a man ‘of an Arabic tongue’ (rajul ʕarabi al-lisaan) is a compliment to the person’s eloquence in Arabic (Lisaan 4: 2863–2868).
Ibn Manẓur’s Lisaan is a late (thirteenth-century), though extremely comprehensive, dictionary whose presentation conflates a number of opposing interpretations (see Retsö 2003: 24–48 and 82–93 on ʔaʕraab for extensive discussion). At the same time it represents in a general way the perceptions of ʕarab ‘Arabs’ in Islamic society. In his account two aspects are clear. Arabs are a discrete demographic group, however defined, and this group is closely connected in a positive way to the Arabic language. Thus for our purposes the conflation of Arabic and Arabs is significant, since an important set of reconstructions developed here leads back to the early Islamic era. However, naming is one thing and the linguistic entities behind the names are another. Concretely, it is only in the immediate pre-Islamic period, beginning very sparsely to be sure around the sixth century AD, that an interpretable corpus of the Arabic language becomes available. This date might be pushed back considerably depending on how one considers the North Arabian languages (ANA = Ancient North Arabian). These begin

² Even assuming that these usages were metonymic extensions of a group who typically performed these functions, it would indicate that simple ethnic affiliation was not necessarily a prime distinguishing characteristic in the pre-Islamic Middle East.

with Taymanitic (ca. 500 BC, see 4.1.1) and continue through Safaitic (attested 100 BC–AD 400). Knauf (2010: 207) rather boldly considers all of these to be “proto-Arabic.”³ In a somewhat earlier formulation Müller (1982: 17) considered the ANA languages to be a “Vorstufe des Altarabischen,” “a predecessor of Old Arabic.” This seems to suggest that these older ANA attestations are not “Old Arabic,” but are in some unspecified sense its predecessor. Huehnergard (2017: 27) would see only Safaitic as forming a common node with Arabic. Parker (1987: 41) and Peters (1978: 322) consider Safaitic to be “pre-Islamic Arabic.” The linguistics of these varieties is dealt with in greater detail in 4.1. This gives a time frame for a half-way realistic linguistic interpretation of pre-Islamic Arabic of up to 700 years (Safaitic, leaving the status of other ANA languages aside) and at the least 200 years (inscriptions in Arabic script). In the light of this ambiguity, the question sometimes posed by Semiticists (e.g. Al-Jallad 2019: 18; Mascitelli 2006), “what is Arabic?” or “when is it first attested in writing?”, is unanswerable as anything other than a curiosity question because, as will be seen in Chapter 4, the source material is itself too deficient for meaningful fine-grained classification. Certainly, as was seen in Chapter 2 above, there is a great deal linguistically to talk about as far as understanding the classification of Semitic languages. It is the view of this book, however, that rather than trying to find water-tight compartmentalizations of entities which are defined aprioristically and often on the basis of a dearth of interpretable linguistic information, it is more important to focus on the linguistic methodologies which can be adduced to render a broad interpretation of the history of Arabic. As will be seen, priority in this endeavor is given to data sources which yield a complex corpus of linguistically interpretable language (see App.
3.1 for further discussion). One data source which is given prominence in this book is the spoken word (Chapters 5–11). There are three reasons for giving detailed attention to the spoken word. First, as will be seen, integrating such data into historical linguistics will greatly enhance the discipline in general (see Chapters 10–12). Secondly, the available data is indeed large, thanks to the fate of history. In the early seventh century Islam came to the world and it set in motion one of the great demographic expansions, the spread of Arabic-speakers from the Middle East “homeland” to a diaspora which brought them by 710 to Uzbekistan, by 640 to Egypt, by 711 to Andalusia, by 1400 to the Lake Chad area (see Owens 2006/2009: 271 ff. Appendix I for overall summary of migrations and Maps 1, 3, 4, 6). The linguistic effects

³ He says that “Ancient North Arabian is Proto Old Arabic just as (vulgar) Latin is proto-French …” (2010: 207). Even ignoring the fact that a proto-language, no matter how close to its successors, definitionally involves reconstruction, the analogy glosses over the type of interpretive subtleties which differentiate historical linguistic methodology from Semitic historical linguistics. It is doubtful that it can be shown in a linguistically satisfying manner that Dedanitic, Taymanitic, Hasaitic … are proto-Old Arabic as opposed to, say, being parallel sisters or cousins, in the way that Old Norse is, roughly, an aunt to Old English and not a direct ancestor.

of this were profound, the area where Arabic was spoken more than doubling within a period of less than 100 years, and expanding even further after this. I call the Arabic brought to these regions diasporic Arabic, this being a purely demographic-geographical term with no implications for the form of Arabic. The “corpus” of Arabic thereby increased dramatically. Thirdly, the focus on the written word misses the crucial point that written Arabic and spoken Arabic have much in common, a fact we know thanks to the great eighth-century linguist Sibawaih. While it is true that Sibawaih essentially defined the entity which today is known as Classical Arabic, the point is often missed that a great deal of what he wrote is directly assimilable to contemporary spoken Arabic. I turn to one small yet detailed exemplification of this point now. It is chosen purposely from the domain of (morpho)phonemics, for four reasons. First, it shows that Sibawaih was a virtuoso phonologist well before anything of the like was seen in the West.⁴ Secondly, his detailed phonological information will provide an implicit contrast to the impoverished signal found in the earlier written material summarized in Chapter 4. Thirdly, it is chosen to anticipate an interpretive issue regarding Sibawaih’s motivation as a scholar and his role in the pantheon of the Arabic-Islamic sciences, and by extrapolation, the role of the Kitaab in defining what Classical Arabic is (see 4.3). In particular these sections establish Sibawaih’s credentials as a hard-core phonetician and phonologist. Fourthly and finally, the phenomenon exemplifies directly the close relationship between contemporary varieties and their historical origin dating back to at least the eighth century. Sibawaih looks forward to the Arabic of today.

3.2 *k → č: Sibawaih the modernist

Arabic has a rich literary tradition which, paradoxically, both abets and confounds its historical linguistic interpretation. The literary tradition includes a brilliant linguistic tradition which forms the basis of Classical Arabic as we know it today. What is generally not appreciated, however, is that the idea of “Classical Arabic” as a perfect fully formed construct, logically contoured, defined by rule and regulation does not become reality until the early fourth/tenth century, some centuries after both the advent of Islam and Sibawaih himself. The interpretation of this development will be taken up in the next section, 3.3. The system of Arabic grammar, however, was laid out and described over 100 years earlier. Virtually all of Classical Arabic as we know it derives from the astounding efforts of the early Arabic grammarians, foremost among them Sibawaih (178/793). A Persian by birth (see Carter 2004: chapter 1 for biographical summary), in a densely written, nearly 1,000 page treatise Sibawaih described in minute detail the phonetics, phonology, morphology, and syntax of Arabic (the core domains of any linguistic description), as well as offering incisive observations on Arabic semantics and pragmatics and variational usage. Sibawaih, however, was not alone. A generation after him the linguist al-Farraaʔ (214/820) composed a three-volume study of the grammar, semantics, and pragmatics of the Qurʔaan, Maʕaaniy al-Qurʔaan. This work is closely related, genre-wise, to that of the tenth-century grammarian Ibn Mujahid (Mujaahid 324/936) who minutely catalogued and canonized interpretive variation accruing to seven Qurʔaanic variants, al-Qiraaʔaat al-Sabʕa. Each of these works is marked by a high degree of linguistic sophistication, and a minute attention to detail which extends beyond a simple exposition of basic rules of grammar. While our contemporary understanding and interpretation of what Arabic is cannot be separated from the post-Sibawaihian grammatical tradition, a point I take up in 3.3, it is the earliest era, the time roughly between 770 and 900 (153/286), which gives us the most direct insight into a living, spoken form of Arabic from its earliest documented period. The foremost reason for this is the linguistic genius of Sibawaih. I would like to illustrate this with a single case study showing that Sibawaih, beyond his expertise in the core domains of Arabic, frequently delved into almost unimaginable ramifications of structural detail, whether linguistic variation in Arabic, fine points of Arabic phonetics, creating standards for judging between competing grammatical forms and constructions, or integrating the Arabic of written documents (poetry, the Koran) into his framework of grammar.⁵

⁴ A contrastive study between Sibawaih and the early Irish and Icelandic phonologists would be welcome.

3.2.1 The 2FSG object pronoun suffix in Sibawaih

The 2FSG suffix pronoun in Classical Arabic is generally thought of as -ki, and indeed this is the only form mentioned in Sarraj’s ʔUṣuwl (I: 149). However, in chapter 504 (Derenbourg edition, II: 322–323) Sibawaih described four variants of the 2FSG object suffix -ki. One of these, -ši, figures in the discussion of the classification of Arabic in 2.3.4.

(3.1) Variants of 2FSG object suffix
a. -ši, bayt-u-ši ‘house-NOM-your.F’
b. -kiš#, bayt-u-kiš#
c. -kis#, bayt-u-kis#
d. -ki, bayt-u-ki

⁵ 3.2.1 is based on Owens 2013c.

Before introducing broader comparative data it is relevant to note that in describing these variants Sibawaih by no means indicated that any of the forms were sub-standard or incorrect (qabiyħ ‘ugly,’ radiyʔ ‘bad’ etc. in his vocabulary), even though only (3.1d) became the normal Classical Arabic variant. To the contrary, pondering the logic behind -ši Sibawaih explains that such a form is understandable because in pausal contexts (3.2, 3.3) the contrast between M -ka and F -ki is maintained.

(3.2) contrastive -ka vs. -ši in pause
bayt-u-š# = F.
bayt-u-k# = M.

vs.

(3.3) -ka vs. -ki
bayt-u-k# (< -ki)
bayt-u-k# (< -ka)

In pausal position (see 12.4) the final /i/ and /a/ are dropped. In (3.2), based on nonpausal 2FSG -ši and 2MSG -ka, the M-F contrast remains, whereas in (3.3), based on -ka, -ki, it is neutralized. Sibawaih further notes, as an indication of the breadth of his phonological thinking, that (3.2) is understandable in that a contrast carried by a consonant is more perceptible than one shown only by vowels, citing the M-F contrast:

(3.4) ʔantum ‘you.MPL’
ʔantunna ‘you.FPL’
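
The pausal reasoning behind (3.2) and (3.3) can be illustrated with a minimal Python sketch. It assumes, as a simplification of the description above, that pause does nothing more than delete a word-final short /a/ or /i/; the names `pausal`, `system_3_2`, `system_3_3`, and `gender_contrast_in_pause` are hypothetical and introduced only for illustration.

```python
# Illustrative sketch only: pause is modelled as deletion of a final short /a/ or /i/.
FINAL_SHORT_VOWELS = ("a", "i")


def pausal(form: str) -> str:
    """Return the pausal form by dropping one word-final short vowel, if present."""
    return form[:-1] if form.endswith(FINAL_SHORT_VOWELS) else form


# Non-pausal 2SG possessed forms of bayt-u- 'house-NOM-' (hypothetical variable names).
system_3_2 = {"M": "bayt-u-ka", "F": "bayt-u-ši"}  # feminine variant (3.1a)
system_3_3 = {"M": "bayt-u-ka", "F": "bayt-u-ki"}  # feminine variant (3.1d)


def gender_contrast_in_pause(system: dict) -> bool:
    """True if masculine and feminine forms stay distinct after pausal deletion."""
    return pausal(system["M"]) != pausal(system["F"])


print(pausal("bayt-u-ši"), pausal("bayt-u-ka"))
print(gender_contrast_in_pause(system_3_2))  # consonantal contrast survives
print(gender_contrast_in_pause(system_3_3))  # vocalic contrast is lost
```

On this simplified model the -ši system keeps the gender contrast in pause because the contrast is carried by the consonant, while the -ki system collapses to a single form, which is the point Sibawaih is reported to make.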

Additionally, Sibawaih understands the choice of /š/ as being determined analogically by the orality and voicelessness of /k/, i.e. a consonant with the same phonetic features is chosen. Regarding –kiš and –kis, Sibawaih suggests that in pausal position the final /š/ or /s/ serve to “perceptualize” (yubayyinuuna ‘they (speakers) make it clear’) the (feminine) vowel /i/. It is here clear that Sibawaih was far more than a neutral outside observer. His evaluation of this, and indeed nearly any phenomenon he examined in detail, was embedded in a system of analogies, expectations, and explanations for why a given set of contrasts should exist at all. That is to say, one doesn’t pick Sibawaih off the shelf as a reference work to casually deal with a certain feature. There is always a theoretical and meta-linguistic background involved. Interpreting Sibawaih alone is therefore an issue. Integrating this into a historical linguistic interpretation is a further challenge. “Integrating into a historical linguistic interpretation” means relating, to the extent possible, Sibawaih’s Arabic to contemporary Arabic. In (3.1) there are two forms which require no discussion. –ki is identical to many contemporary dialects (see 5.3.2.1), while –ši, as seen briefly in 2.3.4, is identical with the 2FSG object suffix in many parts of the

southern Arabian peninsula. The forms requiring interpretation are (3.1 b, c). My interpretation follows that of Fischer (1956: 34) and Johnstone (1963; also probably Holes 1991: 670, n. 63, Kaye 1991) who, observing the similarity between -kiš/-kis on the one hand and the contemporary Gulf and Najdi Arabic suffixes of the same meaning, -č/-ts, on the other, assumed that Sibawaih was talking about -č [tš] and -ts, as in Najdi.

(3.5) beet-ič ‘house-your.F’
beet-its

Beyond the ostensible similarity noted by Johnstone, there are a number of arguments in favor of this interpretation, e.g. relating to orthographic representation and the social status of the form (all discussed extensively in Owens 2013c: 16–21).⁶ I will return to the contemporary interpretation in 3.2.2 below. In this section I elaborate at length on why Johnstone’s assumption is confirmed in Sibawaih. This will further illustrate that the route from Sibawaih to interpreting contemporary historical outcomes is fraught with interpretation. A crucial question is whether in fact a -č or -ts can be motivated for the Arabic of Sibawaih’s day. The answer I give is “yes,” though demonstrating this requires a further excursus into Sibawaih’s phonetic/phonological thinking. Sibawaih’s introductory chapter on al-idɣaam “assimilation” (chapter 565, II: 452–455) is essential reading regarding his phonetics and phonology. For Sibawaih, phonetics was a functional phonology, hence his most important observations on phonetics and phonemic inventory come at the beginning of the chapter in which he begins to outline the rich phonemics and morphophonology of Arabic. Sibawaih defines 29 standard phonemes or ħuruwf for Arabic. I term these sounds “basic phonemes” and conventionally represent them in phonological slashes “/…/.” In addition, there are two sets of what can be called an extended variant list. As will become clear in the subsequent discussion, these two sets are crucial to an interpretation of the sounds/morphophonemes under discussion here, because they contain no less than four variants which belong to what can be called the jiym/shiyn complex. To date, there is no systematic integration of all of these sounds into an interpretation of Sibawaih’s phonetic and phonological thinking, though a number of linguists (see below) have cited different variants in one place or another.
The first set consists of six further phonological variants, which are sanctioned for Quranic recitation and poetry. These can be listed without further comment.

⁶ I would add one more factor not treated in Owens 2013c, namely that Sibawaih was reluctant to employ the non-basic sounds (3.6, 3.7) in his allomorphic descriptions. I believe they are adduced in none of his treatments of Arabic morphophonology at the end of the Kitaab. These sounds are recognized as single sounds, but not as a part of his phonological explanatory mechanisms.

The variants for the forms immediately relevant to this chapter are listed in brackets and will be justified in the following. I will conventionally represent these in phonetic brackets, “[…].”

(3.6) Sanctioned variants
a. the medial Hamza “glottal stop” (bayna bayna)
b. imalized alif (see 5.3.1.3 below)
c. the light /n/
d. the shiyn like a jiym (= [ž])
e. the ṣaad like a zaaʔ (= [ẓ])
f. the emphatic alif

Most members of the set of sanctioned variants are mentioned elsewhere as well, either in the Kitaab (e.g. II: 279 ff. for imala, II: 168 ff. for medial hamza), and/or in other linguistic traditions (see below). Except for the shiyn like a jiym, all of these are readily interpretable as allophonic or diamorphic variants of one of the 29 core phonemes. A classic case relates to the complex conditioning of the imala (see 5.3.1.3), a vowel-harmonic palatalization of [aa], but others are equally susceptible to a conditioned allophonic reading. For instance, the ṣaad like a zaaʔ is a voiced ṣaad in the environment of another voiced consonant. Beyond these, Sibawaih also notes a second set of eight further pronunciations which he does not sanction for the Koran and poetry, which are not considered good, and which he states are not frequent among those who use good Arabic.

(3.7) Proscribed variants
g. The jiym like a kaaf (= [č] or [c] or [ts])
h. The jiym like a shiyn (= [č])
i. The weak ṣaad
j. The ṣaad like a siyn
k. The ṭaaʔ like a taaʔ
l. The ḍaaʔ like a θaaʔ
m. The baaʔ like a faaʔ (= [p])
n. The kaaf that is between the jiym and kaaf (= [ɟ])

In addition, in chapter 525 Sibawaih identifies:
p. a sound (ħarf) that is between a kaaf and a jiym (= [g]?)

It is not always a straightforward step to interpret what the variants (3.6, 3.7) mean phonetically, and a number of suggestions have been made (Bakalla 1984; Cantineau 1960; Schaade 1911; Watson 1992: 76). What all suggestions to date lack, however, is systematicity, in that Sibawaih’s sounds are interpreted on an ad hoc, case-by-case basis, not in terms of a common formula. It can be suggested here that the key to interpreting (3.6d, 3.7g, h, m, p) begins with a different chapter than that in which the sounds are described phonetically,

namely chapter 525 (II: 375). This is the chapter in which he treats the adaptation of Persian sounds, including consonants, into Arabic. Two of the sounds are (3.7m, p). Speaking of (3.7p) Sibawaih writes, “They [Arabs, j.o] convert the [Persian j.o.] sound that is between the kaaf and the jiym into a jiym …” al-jurbuz “imposter” < gurbuz, al-jawrab “sock” < gawrab. Sibawaih more briefly notes that the sound between the faaʔ and baaʔ is changed to either [f ] or [b], for instance, Sibawaihi’s designation firind/birind “decorative garment, decorated sword handle” < parand= [p] (Lane 1980 [1877], 6: 2389). These two cases are instructive for postulating a systematicity behind Sibawaih’s designation. Sibawaih the phonetician would have thought in his phonetic classificatory terms. As in contemporary articulatory phonetics, Sibawaih classified sounds according to their place of articulation (muxraj or maxraj, pl. maxaarij), whether they are voiced (majhuwr) or voiceless (mahmuws), and their manner, which roughly in Sibawaih is expressed by how the sound (ṣawt) flows through the vocal tract, e.g. whether it is stopped = šadiyd, fricative = rixwa, and so on (see Owens 2021a for extensive discussion). In addition he describes the secondary articulation of emphasis (ʔiṭbaaq), which is irrelevant for current purposes. It can be assumed that in describing the extended variants, Sibawaih was thinking in these classificatory phonetic categories, even if he did not describe the extended sounds individually. The question is what parameters he used to operationalize them. A first place to start looking for general phonetic parameters is in the two parts of the description itself. Sibawaih uses two formulations, either “sound X ka-sound Y,” “sound X like sound Y,” or “a sound between sound X and sound Y”. There is either a likeness or a betweenness. Whereas previous analyses made suggestions for individual sounds, e.g. 
(3.7g), a jiym like a kaaf is a [g], X and Y can be broken into constituent phonetic categories, generalizable to a common model. Clearly each extended sound is a single sound. A “jiym like a kaaf,” whatever is meant, is a single sound. Single sounds in Sibawaih’s phonetic theory can only consist of the discrete place, manner, and voicing parameters which he defined. It can be suggested that Sibawaih used the “like” relation and the “between” relation to define these composite sounds, and if this is the case, he would have needed a system to make compound sounds regularly. The two relatively certain examples allow a phonetic model to be extracted, which describes such a system. (3.7m) is used for exemplification, repeated here:

(3.7m) the baaʔ like a faaʔ (= [p])

Knowing that the outcome is a voiceless bilabial plosive, one need only look in the phonetic attributes of baaʔ and faaʔ to discern which phonetic parameters Sibawaih took from each sound. These are marked in boldface in the following.

(3.8) baaʔ = voiced (majhuwr), stop (šadiyd), bilabial (al-šafataan)
faaʔ = voiceless (mahmuws), fricative (rixwa), labio-dental
The Persian [p] takes the following attributes:
[p] = voiceless (mahmuws), stop (šadiyd), bilabial (al-šafataan)

What Sibawaih has done is to take the voicing parameter of the second sound, Y, and the manner and place parameter from the first, X. The boldfaced attributes are taken from each sound and reassembled in the extended phoneme. A general formula can be postulated:

(3.9) X = place/manner; ka/bayna Y = voicing
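
The formula in (3.9) can be read as a small feature-composition procedure, sketched below in Python. This is only an illustration under stated assumptions: the class `Sound`, the table `BASIC`, and the function `compose` are hypothetical names; the feature labels are simplified English glosses of the attributes given in this section; and the place assigned to jiym is a placeholder, since only its voicing is used here.

```python
from typing import NamedTuple


class Sound(NamedTuple):
    place: str
    manner: str
    voicing: str


# Simplified feature values, following the attributes cited in this section.
# The place given for jiym is an assumption; only its voicing is used below.
BASIC = {
    "baaʔ": Sound("bilabial", "stop", "voiced"),
    "faaʔ": Sound("labio-dental", "fricative", "voiceless"),
    "shiyn": Sound("alveopalatal", "fricative", "voiceless"),
    "jiym": Sound("alveopalatal", "stop", "voiced"),
    "ṣaad": Sound("alveolar", "fricative", "voiceless"),
    "zaaʔ": Sound("alveolar", "fricative", "voiced"),
}


def compose(x: str, y: str) -> Sound:
    """Formula (3.9): take place and manner from X, voicing from Y."""
    base, donor = BASIC[x], BASIC[y]
    return Sound(base.place, base.manner, donor.voicing)


# "baaʔ like a faaʔ": a voiceless bilabial stop, i.e. [p]
print(compose("baaʔ", "faaʔ"))
# "shiyn like a jiym": a voiced alveopalatal fricative, i.e. [ž]
print(compose("shiyn", "jiym"))
# "ṣaad like a zaaʔ": a voiced alveolar fricative (emphasis is not modelled)
print(compose("ṣaad", "zaaʔ"))
```

The sketch deliberately leaves out emphasis (ʔiṭbaaq) and the affricate question for jiym, so it covers only those designations where voicing is the parameter borrowed from Y.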

This description serves as a hypothesis: Sibawaih used the X ka/bayna Y as a general model, whereby X = manner and place, Y = voicing. Sibawaih’s extended sounds are, in most cases, not as it were sounds at all, but rather instructions on how to combine the phonetic features of each constituent sound to interpret a composite extended sound. The model can be tested against (3.6d, 3.6e, 3.7g, 3.7h, 3.7j).

(3.10) designation; composite features; phonetic interpretation

(3.11) Sibawaih’s designation: ħarf between kaaf and jiym
composite features: place/manner of /k/ + voicing of /j/
phonetic interpretation: [ɟ] (= 3.7n)

(3.12) Sibawaih’s designation: ṣaad like a zaaʔ
composite features: alveolar/fricative + vd
phonetic interpretation: [ẓ] (= 3.6e)

(3.13) Sibawaih’s designation: shiyn like a jiym
composite features: alveopalatal/fricative + vd
phonetic interpretation: [ž] (= 3.6d)

(3.14) Sibawaih’s designation: jiym like kaaf
composite features: place/manner of /j/ + voicing of /k/
phonetic interpretation: [č], [c] or [ts] (= 3.7g)

(3.15) Sibawaih’s designation: jiym like shiyn
composite features: place/manner of /j/, voicing of /š/
phonetic interpretation: [č] (= 3.7h)

(3.16) Sibawaih’s designation: ṣaad like a siyn
composite features: place, manner, voicing of /ṣ/, emphasis of /s/
phonetic interpretation: [s] (= 3.7j)

Regarding (3.7p) on the basis of the loanword chapter cited above, the phonetic value is [g], i.e. the voiced counterpart of [k] (Boyce 1975: 169; Salemann 1930: 13). It is therefore unlikely that the “betweenness” resides in physical distance (place of articulation) since in Persian there is no physical distance between [k] and [g]. Rather, “betweenness” should be interpreted in classificatory phonetic terms. [g] has the place and manner of articulation of a [k], but the voicing of a jiym, i.e. it lies between two articulatorily-defined coordinates.

All of the remaining sounds as interpreted in (3.6), (3.7) are attested either in variants of Old Arabic, and/or in the modern dialects. (3.6e), [ẓ], is found inter alia in the koranic Qiraaʔaat⁷ (Ibn Mujahid 106–107), ṣiraaṭ “road, path,” the reading of Hamza, as well as one interpretation of Abu ʕAmr ibn ʕAlaa’s pronunciation being a sound “between a zaaʔ and a ṣaad,” i.e. ẓiraaṭ. (3.6d) is the widespread pronunciation of jiym found today in most of North Africa, in the Levant and large parts of the Hijaz. (3.7g, h) are “palatalized k’s,” which in 3.2.2 will be seen to be widespread. There is an interpretive problem here, in that both (3.7g) and (3.7h), in the current formula, allow a [č] reading. Such a result runs against an expected precision in Sibawaih’s description. That the jiym is crucially implicated in the current issue is indicated on a prima facie basis by the basic observation that it is mentioned no less than five times in the description of the extended variants, i.e. in over a third of the phonetic descriptions. Here a closer look at the basic jiym is necessary to elucidate the matter. Uncontroversially, the basic jiym is a stop (šadiyd) and voiced (majhuwr) sound. As is well known, in the case of jiym Sibawaih did not specify a contrast between an affricated and plain stop. It is simply “šadiyd.” As far as place of articulation goes (muxraj), it is placed after the kaaf (moving back to front): “what is from the middle of the tongue between it and between the middle of the hard palate [al-ħanak al-ʔaʕlaa] is the place of articulation of the jiym, and the shiyn and the yaaʔ” (Kitaab II: 453.7). Further to the front of these three sounds comes [ɮ], an emphatic lateral fricative, then the laam /l/. A literal reading of this place of articulation would give the phonetic values to these three sounds of [y], [ç] (vl. palatal fricative) and [ɟ] (vd.
palatal stop), and in fact the latter two values have been suggested by various scholars.⁸ However, this literal interpretation runs afoul both of phonetic and Sibawaih-specific considerations. Beginning with general phonetics, assuming that /j/ is, or is only [j], the palatal region has to accommodate not one, but three different [j] sounds (the basic one plus [3.7g, h]), as well as serve as a partial analogy for two others, (3.6d, 3.7p) above. However, the palatal region is, typologically, a somewhat underused region, probably for acoustic and perceptual reasons. It is implausible that such a logjam of sounds would cluster in the palatal region, particularly if this interpretation entails

⁷ There are, depending on what school one follows, between seven and 14 received Koranic reading traditions, known as the qiraaʔaat. Whereas the original recognized readers are mainly second/eighth century scholars, it was Ibn Mujahid in the early fourth/tenth century who compiled and systematized the traditions.

⁸ For shiyn Watson 1992: 74 suggests [ç] or [ɬ]. Daniels (2010) has a good summary of the debate over the last 50 years. Determining the phonetic value of shiyn is an issue in and of itself and is relevant to the discussion in 4.1.5. While Sibawaih’s description of ḍaad as a voiced emphatic lateral is fairly unambiguous, that of shiyn allows a lateral interpretation only inferentially (e.g. it is pronounced over an extended area of the mouth, which however, is compatible with the properties of alveopalatal [š] as well, see Sibawaih II: 467). Discussion is continued in 4.1.5.2.

leaving the alveolar and alveopalatal region to the front empty of š-, č- and dž-like sounds. This general phonetic point can be underscored with a brief look at statistics readily available in the UPSID searchable database, the UCLA Phonological Segment Inventory Database, as available at the University of Frankfurt. This contains the segmental phonetic inventories of 450 languages. For the purposes of this chapter, two places of articulation were contrasted, the palatal vs. the alveopalatal. The expectation is that the alveopalatal region will show a greater degree of diversity than the palatal. This is measured by a simple statistic. Language tokens were classified as having either one segment at the palatal or alveopalatal position (e.g. only [š]) or more than one (e.g. [š] and [dž]). Segments with secondary articulations (e.g. labialized, laryngealized, nasalized, pre-nasalized), and special phonation types (breathiness, aspiration) were excluded from the count. The 2×2 Table 3.1 shows that there is a high degree of difference between the two places of articulation, with the palatal position having a far higher number of languages with only a single segment (usually [y]) than the alveopalatal.

Table 3.1 Palatal vs. alveopalatal, single vs. more than one segment, UPSID sample

                single segment    > 1 segment
palatal              237              167
alveopalatal          65              223

p = .000, df = 1, chi sq = 89
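The statistic reported with Table 3.1 can be checked with a short calculation. The following is a minimal sketch, assuming that the reported value is a plain Pearson chi-square without continuity correction (the text does not say which variant was used). The function names are illustrative only. The four counts are those printed with the table; because the statistic is unchanged when rows and columns are swapped, the check does not depend on how the counts are assigned to the row and column labels.

```python
# Observed counts as printed with Table 3.1.
# Assumed layout: rows = palatal, alveopalatal;
# columns = single segment, more than one segment.
observed = [
    [237, 167],
    [65, 223],
]


def chi_square(table):
    """Pearson chi-square statistic (no continuity correction) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Expected count under independence of place and segment count.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


chi2 = chi_square(observed)
df = degrees_of_freedom(observed)
print(f"chi sq = {chi2:.2f}, df = {df}")
```

Under these assumptions the statistic rounds to the reported value of 89 with one degree of freedom, which is far above the conventional threshold for significance with df = 1.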

In a very basic way this confirms the observation that differentiation in the palatal region is considerably restricted. Moreover, in the 450-language sample, only 105 have three or more palatal segments, and the suggested combination of Arabic palatal values, [ç, y, j] (see above), occurs in only two languages in the entire sample. No language has as palatal values only [ç, y, j], the values of a “literal” reading of Sibawaih’s phonetic description. Looking further at the UPSID sample, no languages or ancestors of languages with which Arabic can be assumed to have been in close proximity in the late second/eighth century, Sibawaih’s era, have more than two palatal sounds and most have only [y]. On the other hand, alveopalatal sounds are common (Table 3.2). Expanding the survey slightly, Eastern Neo-Aramaic in general has [š, ž, tš, dž], while Western Neo-Aramaic has [š, tš] + either [dž] or [ž] on a dialectal basis (Jastrow 1997: 334, 348). Similarly Middle Persian has [š, tš, dž], with allophonic alternation (post-vocalic) of [dž~ž] (Boyce 1975: 16). In the UPSID database, of the languages ancestrally linked to the Arabic of the eighth century, only Modern Greek lacks alveopalatals (but also palatals).

ARABS, ARABIC

Table 3.2 Alveopalatal and palatal sounds among languages in proximity with Arabic

Amharic:                            [š, ž, tš, dž, tš’]
Tigre:                              [š, ž, tš, dž, tš’]
Modern Persian:                     [š, ž, tš, dž]
Kurdish:                            [š, ž, tš, dž]
Neo-Aramaic (Iranian Azerbaijan):   [š, ž, tš, dž]
Soqotri:                            [š, ž, š’]
([tš] = [č] here)

Both from a typological perspective and on the basis of an areal survey, it would therefore be unusual for Arabic to have three distinctive sounds at the palatal region in the combination which has been suggested. On the other hand, two alveopalatal sounds (e.g. [š], [dž] ~ [ž]) are not unusual, and alveopalatal sounds with values which can be interpreted from Sibawaihi’s own description of Classical Arabic are common in languages (or their ancestors) that Arabic is either closely related to genetically and/or in contact with.

In Sibawaih-specific terms, it is noteworthy that after the hard palate (and as ever following Sibawaih’s progression of place, moving from the back toward the front of the mouth), Sibawaih hardly uses parts of the key passive articulator, the upper part of the mouth, to define further sounds. Instead, from the ḍaad onward, the basic parameter defining the upper articulator is the upper teeth. For instance, the sound which comes “after” the jiym, the ḍaad, is described as being pronounced with the edge (ħaafa, “blade, front”?) of the tongue in the area adjacent (maa yaliyhu) to the molars (ʔaḍraas). These considerations suggest that the area of the al-ħanak al-ʔaʕlaa covered a much wider articulatory range than simply the hard palate, as is usually assumed by interpreters of Sibawaih (e.g. Beeston 1962: 224; Cantineau 1960; Watson 2002: 269). That is, al-ħanak al-ʔaʕlaa was conceived by Sibawaih as extending from after the velar region (occupied by [k]) at least all the way to the alveopalatal region, if not beyond. This interpretation is compatible with the categorization of /y/, /š/ and /j/ as constituting a class. It is supported by Sibawaihi’s designation of the laam as also being in the al-ħanak al-ʔaʕlaa area, very unlikely a lateral palatal [λ], a value which, to my knowledge, no one has interpreted for it (see further in 4.1.5). Finally, Sibawaihi’s contemporary Al-Xalil in his Kitaab al-ʕAyn categorized the /š/ with /j/ and the emphatic lateral /ɮ/ (ḍaad) forming the class /j š ɮ/ (Sara 2013: 524). As seen above, /ɮ/ in Sibawaih appears to be alveopalatal, so perhaps Al-Xalil’s classification recognizes that /š/ was further forward than palatal.


Thus, against a number of previous interpretations (e.g. Cantineau 1960: 57; Schaade 1911: 19), it is plausible to assign more than one place-of-articulation interpretation to the basic jiym. It could have been palatal (in the sense of universal articulatory phonetics), but could well have been alveopalatal as well, or postpalatal. All fit within the extended interpretation of al-ħanak al-ʔaʕlaa. Indeed, it could have had more than one value. Sibawaih classifies the jiym as a stop (shadiyd), but is not more specific than this, for instance giving no intimation as to whether it should be interpreted as a simple stop ([ɟ]), or an affricate ([dž]). Whatever the basic phonetic value of jiym was, (3.6d) says that there was also a sanctioned variant that was [ž]. If the basic jiym itself had the variants [dž], [ɟ], two values of jiym widely attested among the dialects today would already have been in place in Sibawaih’s time and classified by Sibawaih as permissible pronunciations. Assuming this perspective in turn would potentially elucidate the variants (3.6) and (3.7) above. Both are correct but reflect different dialectal variants. Assuming that Sibawaih would not have detected two [č] sounds, in (3.7g) the most likely interpretation of the sound is probably [č], as this is the variant closest to /k/. In this case, both (3.7g) and (3.7h) use the “ka l-Y” to define a voiceless value. (3.7g) additionally nuances a more backed variant, via the comparison to “kaaf” rather than “shiyn.” This would be the voiceless palatal stop corresponding to the voiced variant [ɟ]. Still, it cannot be ruled out that (3.7g) represents the second variant attested today, namely [ts]. To conclude this reading of Sibawaih, it is argued that among Sibawaihi’s nonsanctioned sounds (3.7g) was a voiceless alveopalatal and/or palatal stop. Allowing that his description is vague in respect to the articulatory extent of the ‘palatal’ sounds, it is assumed that already in Sibawaihi’s day the dialectal differentiation existed which is found today in the jiym. The basic jiym sound could have been /dž/ or /ɟ/, it had a sanctioned variant [ž] (3.6d), while the proscribed variant of the first would have been [c].⁹ (3.7h) unequivocally in this analysis is a [č] (agreeing via a different pathway with Fischer 1956: 34). This answers the question whether the 2FSG object suffix (see (3.1)) could have been a phonetic [č].

I have dwelt on one issue, the *k > č/ts split, for two reasons, the second of which I take up in the next section. Here what has been exemplified is that reading Sibawaih requires multiple interpretive perspectives. Foremost is Sibawaihi’s thinking itself. In some cases, as will be seen elsewhere (e.g. 5.3.1.3), there is a direct relation between what Sibawaih said and how he can be understood in contemporary terms. In other instances, as here, one needs to look behind a literal reading of his text. Three elements are involved.

⁹ This model, moreover, resolves the ostensible incompatibility of (3.6d) and (3.7h), which appear to refer to a single value. In the formulation developed here, the relation is not symmetrical, “X like Y” does not imply “Y like X,” so there is no contradiction in Sibawaih’s formulation of the variants.


(3.17)
1. Sibawaih’s theoretical model
2. Interpretation of Sibawaih’s model in the context of contemporary linguistic constructs
3. Relevance of the conclusions of 1 and 2 to historical linguistic understanding

What is arguably the most interesting part of the exercise is reading Sibawaih himself. It is suggested here, for instance, that understanding Sibawaihi’s non-basic sounds requires a conceptual interpretation which is based on standard articulatory categories, but which are deployed in a conceptual framework unique to Sibawaih. The interpretation of (3.6) and (3.7) follows a long road through much of Sibawaihi’s phonetic and phonological thinking. To be clear here, contemporary ideas are interpretive instruments. They may work or they may not. The fact that they are modern does not give them precedence over Sibawaih.¹⁰ Still, if, as assumed here, Sibawaihi’s ideas are held to have universal linguistic validity, it follows that any linguistic model can, in theory, help us to understand Sibawaih himself.

3.2.2 The history of the *k > č/c split revisited: Sibawaih and historical linguistics

I now turn to the third step in (3.17), to show the direct link between Sibawaih and linguistic history. As seen in (3.1), Sibawaih identifies four variants of the 2FSG object suffix. All of these are found (3.18).

(3.18) Four 2FSG object suffix variants (see Map 5)
(a) -iš 3 Highland Yemen, southeastern Arabian peninsula (see 2.3.4). -ič and -iš intermingle in Oman, with -ič dominant (Holes 1989). In the Gulf region, -iš is found among the Baħarna and occurs in the Emirates and in eastern KSA.
(b) -its Najdi (Ingham 1994a: 14)
(c) -ič [itš] Gulf, NW Yemen (Behnstedt 2016a: 20), “gilit” Iraqi, Jordanian, and Syrian desert, central rural Palestinian, Soukhne (Syria), Khorasan (Iran), Sharqiyya (NE Egypt), Jijel, Tlemcan (Algeria), Jabli, Debdou (Morocco)
(d) -ik or -ki otherwise in the Arabic-speaking world

In an earlier treatment Holes (1991: 666) traces the conditioned split of *k > č to the thirteenth century. The arguments given here obviously speak for a far earlier development of -č and perhaps -ts than this, being already in place by Sibawaihi’s day. These agree with the earlier work of Fischer (1956: 32), “Man möchte annehmen, dass damit gesagt sein soll, dass in der damaligen Zeit tatsächlich bereits eine Palatisierung k : č vorhanden war” “It can be assumed, that it should be concluded that in the earlier era the palatalization of k > č was already present.” Fischer’s discussion concentrates on Sibawaih and the ALT, so there is no doubt that he understood the change to have occurred in early Islamic times at the latest.

Map 5 Discontinuous *k > č

¹⁰ Thus in Owens (2021a) I argue that Heselwood et al.’s (2014) attempt to interpret Sibawaihi’s voicing concept in part with the help of modern instrumental phonetic techniques badly distorts Sibawaihi’s articulatory phonetic model and adds little to the complex and still unresolved issue of how he understood voicing.

Stepping back from Sibawaih to take a broader look at the contemporary situation, Seeger (2002) notes that conditioned [č], including the 2FSG -ič, occurs in Khorasan Arabic in eastern Iran, and Behnstedt and Woidich document conditioned [č] “in non-/u/ contexts” (“nicht -/u/- haltiger Umgebung,” 1984: 17 and p.c. May 2011)¹¹ in the eastern part of the Sharqiyya (Egyptian Delta). /č/ is further found as an unconditioned reflex of *k in Soukhne (Behnstedt 1994: 7). Cantineau (1960: 66) writes that /č/ is found as an unconditioned reflex of /*k/ among Jewish speakers of Tlemcan, among Arabs in the Kabylie and in the area of Jijel (Djidjelli), fluuča < fluuka “boat,” and /ç/ in the mountains north of Tlemcan. Caubet (2018: 2, 3) reports /č/ < *k in Debdou in NE Morocco. Both Heath (2002: 140) and Vicente (2007: 131) note the [ç] reflex of *k in northern Morocco (Jebli dialects). Cantineau considers the basis of these palatalized variants to be old, i.e. like Johnstone he relates them to the kashkasha and kaskasa of the Arabic grammars, without giving a detailed history of each variant. In various Levantine dialects (in particular, those referred to as the bəkuul group) as well as Soukhne in Syria (Cantineau 1960: 66), *k unconditionally has changed to /č/ (Behnstedt 1994: 7–9; Behnstedt 1997: 30; Grotzfeld 1980: 174; Palva 1995; Seeger 2013a). Between the conditioned and unconditioned reflexes, the historical development was undoubtedly *k > conditioned *č, and thereafter an incremental unconditioned *k > č (see App. 3.2.2 for more detailed discussion).

¹¹ Though in Behnstedt and Woidich (2018: 74) an unconditioned shift is reported, which would link this to Central Palestinian.

Taking a brief tour of the historical demography of these dialects, the Arabs of Khorasan probably go back to the earliest Arabic incursions (late seventh/early eighth century) outside of the original Arabic homeland. Arab migrations into the Sharqiyya are attested from the fourth to the fifteenth century (Behnstedt and Woidich 2018: 72–74), so pinpointing when [č] arrived remains an open question. Certainly an early date is plausible. As far as North Africa goes, it is plausible that the earliest wave of Arabic speakers in the seventh and eighth centuries brought the forms to Algeria and Morocco, the complete merger having already occurred, as noted above, in the Levant.

Independently of Sibawaih, the evidence that both the conditioned k > č and the unconditioned change are found both within the Arabic pre-diasporic homeland, as well as outside of it, is a classic argument that the change occurred once, within the homeland, then spread outside of it. This supposition is bolstered when the change is associated with populations which were among the first to leave in the Arabic-Islamic expansion. The k ~ č split thus already existed in pre-diasporic Arabic. When the origin of the proto split is to be found is an open question. It should be clear that this comparative historical linguistic interpretation dovetails with Sibawaihi’s treatment of the 2FSG suffix and has been expanded to argue that among his non-basic sounds was a [č], and perhaps a [ts].

3.3 The early tradition

The interpretive method advocated here is to assume a wide-ranging continuity between Arabic as described in the Arabic grammatical tradition, and indeed other early sources (see 4.2), and contemporary Arabic. This is an assumption necessary for a comprehensive understanding of the history of the language and it does not contradict the descriptive precepts and detail of the founder of Arabic grammar, Sibawaih.

3.3.1 The traditional linear approach

It was briefly illustrated in 1.1 that traditionally Arabic language history was viewed as proceeding from one stage, Old Arabic, to another, Neo-Arabic. I have termed this the linear interpretation (see 1.1.1). Nöldeke (1899: 61) speaks of the Classical language “restructuring and deteriorating into dialects” (sich umbilden und in Dialecte … zerfallen). The early, influential Brockelmann (1908/1913) systematized this perspective. His brief was to provide an integrative interpretation of the Semitic languages, Arabic being but one of these. An original Arabic (1908: 25) is broadly identified with the Classical language, Altarabisch, while in the course of time this developed into the “Neuarabisch” dialects. A bifurcated Arabic representing two historical stages is an inherent part of Semitic historical linguistics. Bergsträsser (1928 [1977]: 156, repeated from 1.1.3) provides one of the clearest statements of the development from Old to Neo-Arabic, writing that “on the whole the neo-Arabic dialects derive from a unitary basis” (die neuarabischen Dialekte gehen im grossen und ganzen auf eine einheitliche Grundform zurück). This unitary basis is measured relative to Classical Arabic. Detailed discussion of this short summary can be found in App. 1.1.1a, as well as Owens (2006/9: 119–123).

In the middle of the last century two scholars gave greater content to this transition from Old to New, Alt to Neu, proposing common outcomes via opposing mechanisms. Fück (1950) sees the critical period for the language to be the early Islamic period, as new urban centers developed where Classical Arabic developed, but at the same time, massive influxes of non-native speakers led to the simplification of the language and development of the dialects. Ferguson (1959) roughly turns Fück on his head: it was the urban areas where Arabs met and developed a simplified koine, while a purer form of Classical Arabic was maintained among the Bedouins.
The koine, in turn, became the ancestor of the modern dialects, so that in Ferguson’s view the dialects developed not from Classical Arabic directly but rather via a koine which itself descended from Classical Arabic, as illustrated in Figure 1.1 in Chapter 1.¹² It should be noted that even if writers do not define how they understand the historical status of Classical Arabic, this variety is frequently cited as the normative historical state of the language pre-dating the “new” stage. Hopkins (1984) for instance contrasts the neo-Arabic of the papyri with Classical Arabic, while working in the other direction (see 4.2), Al-Jallad (2015a) orientates Safaitic relative to the later Classical Arabic. Retsö (2013) [2019], it can be noted, is a refreshing exception to this practice.

¹² Knauf (2010: 204) is a slightly simplified version of this, where after standardization (Standard Arabic in his terms) comes dialect differentiation again. The relation of the dialects to their past is mediated via a standard:
Knauf’s tree
dialect   dialect   dialect
       standardization
dialect   dialect   dialect

It is a major argument of the current book that a strictly linear account of Arabic language history cannot be justified against the linguistic data which such an approach claims to explain. This will be developed in various places, but already the difficulty of linearity was demonstrated in 3.2. Linearity would hold that there was a single form for the 2FSG, probably *-ki at a point of origin, which presumably developed into -ši, -či, and -tsi. At some level this argument might be correct, at least for [č] and [tsi] (see Owens 2013c on conditioned *k → č). Though the dates are too late in the perspective of this book, it was seen in 3.2.1 that Holes (1991), for instance, dates what he terms dialectal -č and -ts to the thirteenth century. However, deriving -ši from *-ki, as seen in 2.3.4, is more problematic given that -ši reflexes occur throughout West Semitic.

From an Arabic tradition-internal perspective, however, there is no historical development. As would be universally acknowledged among Sibawaih scholars, Sibawaih gave a strictly synchronic account of the variant forms in (3.1). He goes to some length not to understand the historical origin of -ši ~ či ~ *tsi, but rather to understand their existence in perceptual and cognitive terms. Sibawaihi’s ʕArabiyya, Classical Arabic, is a complex object intimately tied to his own theoretical interpretive framework. This object contains many things. It contains nearly all the core structures and lexica and even most of the interpretive nuances, pragmatic, semantic, which became part of the later, standardized Classical Arabic (see next section). Sibawaihi’s grammar does not deliver to contemporary Semiticists and Arabicists a ready-made model for a single stage of Arabic history, anno 179/789. There are no labels in his grammar about Old and Neo-Arabic. There is, as seen above, a great deal which can be interpreted in historical linguistic terms. However, dealing with these interpretive issues has always been a challenge. One response is appeal to hyperbole.
Rabin (1951: 2), a wonderful descriptive work which till today is perhaps the best single treatment of variation in early Arabic (ca. AD 650–900), concludes that West Arabian can be considered a “different language” from Classical Arabic.¹³ Practically in the same breath (1951: 13), however, he notes that “we cannot reconstruct the complete paradigm of any tense in any dialect; we can hardly say with certainty what a complete word may have sounded like” (see discussion in Owens 2013d/19: 458). It is hard to say here whether the problem is one of methodology: Rabin was not attuned to systematically dealing with variation. Or is it one of conceptual definitions? Assuming on some basis an invariant “Classical Arabic” which, as Brockelmann formulated the issue, can be integrated into a simple linear interpretation of Arabic language history, how can one deal with what is left? Fischer (1982: 37–45) is perhaps on the right track when he sees Sibawaih as the culmination of the development of the classical language (following in the tradition of Corriente 1971, 1973), but in the end Fischer answers none of the key questions which provide a conceptual framework for understanding the historical development. In particular Fischer’s pre-Classical stage appears to include the awkward variation such as described in 3.2, so that in the end, Sibawaihi’s grammar is the main witness to both pre-Classical and Classical Arabic. However, there were not two Sibawaihs, one who lived in pre-Classical times, one in classical. If western philologists want to turn him into multiple entities, then they have to do so using standard methods of historical linguistics, not arbitrary cultural cloning. Ultimately, Sibawaihi’s rich descriptions cannot be simply ignored, but they certainly cannot be simply labeled variants of Classical Arabic or, as in Fischer (1982: 44), a “klassisch-arabische Standardsprache.” By the same token, effectively everything which we today understand as Classical Arabic was defined by Sibawaih. Paradoxically, the problem is still with us today, and at the same time, an answer was already developed in the ALT.

¹³ Al-Jallad’s (2020b) “Old Hijazi,” which relies largely on Rabin’s argumentation and to a significant degree on his data as well, is an early twenty-first-century remake of a classic mid-twentieth-century work. The criticisms of Rabin expressed in Owens (2013d: 458) automatically carry over to Al-Jallad 2020b (see discussion in 5.2.2 and App. 5.2.2).

3.3.2 Ibn al-Nadim: Classical Arabic as construct

One is certainly justified at this point in asking on what Fleischer, Nöldeke, Brockelmann, Bergsträsser, Fück, Ferguson, and a number of contemporary Arabicists and Semiticists (e.g. n. 12, this chapter) based their linear interpretation. I believe there are three parts to this answer. One, which I am not in a position to evaluate in any detail, is the conception of language as it was developed in the nineteenth century in the West. Historical linguistics in general was born of linearity.¹⁴ Secondly, naively but powerfully, history is iconically linear. If a language is divided into periods, Old, Middle, New (see discussion of Fleischer in 1.1.1 and see 4.2.3), it viscerally makes sense. The major argument of Chapters 9–12 is that with language history one needs to be very circumspect in assuming that linguistic stages will obey some sort of temporal determinism.

The third and most important aspect for the present work derives from the Arabic tradition itself. Sibawaih (and a few other linguists I do not treat here) marks a beginning and an end at the same time. His Kitaab establishes definitive descriptive and theoretical standards for Arabic, but it also largely marks the end of what material can be included at all in the classical concept of Arabic. The issue leads into a different sort of language history from that being treated here, but two authors can be chosen as representative of the profound change in intellectual approach to the language which occurred during the 200 or so years after Sibawaih.

The first is Ibn al-Sarraj (316/928). Sibawaih’s Kitaab was part compendious grammar, part compilation of a vast number of observations from various domains of grammar, for instance the variation in palatal quality and segmental variants beyond a basic set of 29 sounds. It was, as Carter rightly emphasized (2004: 133), unusable as a convenient pedagogical work. It was Ibn al-Sarraj in his al-ʔUṣuwl fi l-Naħw who provided the first comprehensive account of Arabic grammar which was readily accessible to learners of Arabic. The organization is simple, yet effective. For instance, he orders his material functionally, first describing all nominal grammatical functions (e.g. topic mubtadaʔ, comment xabar, agent faaʕil) in the nominative case, then proceeding to all functions in the accusative and finally all in the genitive. He then moves on to verbs, and then to particles, and so on. In this he still manages to provide a wealth of variationist detail, even if not in the fine detail of Sibawaih, by adding at the end of each major chapter a section entitled “masaaʔil ʕan haađaa l-baab” “Issues arising from this chapter.” Ibn al-Sarraj’s accomplishment was to create a handy, well-organized yet intellectually sophisticated reference work which, it may be suggested, along with further contemporaneous developments such as the crystallization of the Kufan-Basran schools debate, marks the beginning of Classical Arabic as it is popularly known today. Arabic as a pedagogical entity, as a tool for creating intellectual discourse across a spectrum of sub-disciplines, and probably as a tool for a ruling bureaucracy had come of age.

¹⁴ From one perspective, with the Neogrammarians vs. the later wave theorists (Schmidt) and Schuchhardt, the same issue of linearity was debated.

A second scholar who I think is a key figure, at least for western perceptions of what Arabic is, is Ibn Faris, who lived some two centuries after Sibawaih (394/1004), in his al-Ṣaaħibiy. An ostensible motivation of Ibn Faris was to develop criteria, or simply to define himself, how proper, widespread Arabic could be identified and distinguished from Arabic which is marginal and dialectally limited.
For Ibn Faris, what he terms the kaškaša and kaskasa 2FSG forms—met with in 3.2 above, terms by which they are known still today—are introduced in a chapter on that which is mađmuwma ‘reprehensible’ (pp. 35–40). In a fairly remarkable turnabout, Ibn Faris employs a condemnatory vocabulary which Sibawaih had developed, qabiyħ ‘ugly,’ radiyʔ ‘bad’ to characterize (p. 33), a vocabulary which as seen in 3.2 Sibawaih had never applied to these forms. What we witness here is the Arabic tradition taking stock of itself. Ibn Faris’ purpose is not to define general criteria for developing new technical terms, for instance as the language institutes (majaamiʕ) do today. It is rather to sift through that which has been defined as Arabic—material from Sibawaihi’s Kitaab plays a major role here—and to judge as to its appropriateness. The West, it can be suggested, bases its conceptualization of Classical Arabic to a large degree on the entity defined in the 80-year span between Al-Sarraj and Ibn Faris. This era saw the crystallization of a standardized Arabic, as well as, crucially, the development of a meta-vocabulary which could still countenance everything which Sibawaih (and others) had reported on, but was equally able to exclude, namely those elements which for one reason or another were not part of what can now be termed mainstream Classical Arabic. It follows from this that deviation from Classical Arabic, as for instance in modern dialects, could be interpreted as a linear development away from the classical source.


It is clearly important to be able to contextualize the development of standardization. There is an unfortunate tendency to conflate the different sources and eras into a general “Classical Arabic.” Thus Holes (2018a, 2018b: 6) speaks of the grammarians as being “highly prescriptive.” Leaving aside the meta-question, whether the notion of grammar (whether defined by rules, usage, statistical tendencies) itself isn’t at some point of necessity prescriptive, such a summary generalizes across eras, style, and different ways of conceptualizing linguistic variation. The characterization, to be sure, does fit the “late” conception of grammar of Ibn Faris.¹⁵ It may even fit the phonological description of Sibawaihi’s contemporary Xalil ibn Ahmad in his Kitaab al-ʕAyn. As Sara (2013/2019: 524) notes, Xalil limits his description to the 29 phonemes of the emerging Classical Arabic. Crucially, however, the characterization does not speak to Sibawaih. Holes’ purpose is to define the basis of a historical Arabic dialectology. He appears to be on the lookout for early linguistic writings for old dialects which will provide a direct linkage to modern dialects, and proposes a dichotomy between a stylized, ritualized register, the basis of Classical Arabic,¹⁶ and a “tantalizingly undescribed dialectal variation” (2018: 7) in the eighth century. The trouble with this formulation is that it ignores the testimony of Sibawaih himself. Indeed, the very palatalization phenomenon which Holes (1991) insightfully treated as a dialectal development turns out to mimic Sibawaih’s description of the “same” phenomenon (see 3.2). The point here is that it is impossible as a methodological principle to separate Arabic dialects from the history of Arabic itself. Against Holes, a more incisive conceptualization of the relation between the “dialects” and Classical Arabic was provided by a contemporary of Ibn Faris, Ibn al-Nadim (385/995) in his Fihrist. 
Writing about the origin of Arabic he states,

Each tribe [speech community] of Arabs had its own variety which distinguished it and was passed on, and which partook of the original source. The Arabs hindered the increase in dialects after the sending of the Prophet because of the Koran. (p. 3)¹⁷

¹⁵ See n. 11, chapter 1, on changing conceptualizations of so central a cultural icon as the Koran. I should caution here that Ibn Faris treated a differentiated array of socio-culturally-tinged linguistic phenomena, not all of which created a simple standard/non-standard or sanctioned/proscribed dichotomy. At a number of places, for instance (al-Ṣaaħibiy: 58–66, 78–86) he documents lexical or semantic change between the pre-Islamic and Islamic era, and in one place (pp. 38, 41) neatly resolves the heated issue of whether the Koran has foreign words with his formulation that the words were loaned into Arabic and arabicized, prior to the koranic revelation. In an albeit limited fashion he thought historically.
¹⁶ Without mentioning him, Holes would appear to be following Retsö 2003: 591, 2013: 435–6 here.
¹⁷ ولكل قبيلة من قبائل العرب لغة تنفرد بها وتؤخذ عنها وقد اشتركوا في الأصل قال: وإن الزيادة في اللغة امتنع العرب منها بعد بعث النبي صلى الله عليه وسلم لأجل القرآن


In a nutshell this provides a working model for how Arabic language history should be conceptualized. It begins with dispersed varieties, known today as dialects, which can be conceived of as deriving from a proto-source. Substituting “standardization,” similar to that described above, for “the Koran” (see ch. 1 n. 11), one effectively has a model of diglossia, except a diglossia which was superimposed upon an inherited heterogeneity. In this formulation the key events in creating the apparent disjuncture between a dialect and a standard transpired with Islam, though with the active participation of the dialect speakers themselves.¹⁸ Whether one can, with Retsö and Holes, discern a ritualized language as a basis for Classical Arabic I believe remains to be seen. Before closing, let me emphasize that this evaluation pertains to the Arabic language, not to the Arabic linguistic tradition. If what is considered “Arabic” became circumscribed in its limits, the linguistic tradition continued to develop in brilliant ways, from discourse analysis (Jurjani) and the nature of linguistic categories (metatheory, az-Zajjaaji, Ibn Jinni) to pragmatics (al-Astarabadhi) and semantics (ʕilm al-waḍʕ).

¹⁸ Pat-El (2017: 468) mysteriously credits me with having said that CA “… was not a spoken variant at all, and was … a fiction based on no contemporary model ….” This is presumably in Owens (2006), though unfortunately the editor of the volume failed to ask for a page number, leaving me perplexed as to the basis of this interpretation.

4 Three types of pre- and early Islamic sources: The pre-Sibawaihian setting

It was argued in the previous chapter that Sibawaih marked a fundamental turning point in our understanding of Arabic and, if histories of linguistics gave due weight to all non-European sources (not only Panini), one would say in our understanding of language in general. Sibawaih, however, was far from the first source for early Arabic. In this chapter I will summarize three of the four pre-Sibawaihian sources for understanding early Arabic. These are the epigraphic record, Greek bilinguals and renditions of Arabic in Greek language texts, and the early papyri. I will not deal with a fourth early source, the Koran, which requires far more attention than a separate chapter in a linguistic history.

4.1 Epigraphy

Traditionally the oldest Arabic-only inscription is considered to be the Nemara inscription from southern Syria, dated to AD 328. It was written in the Nabataean script. There are also earlier Arabic fragments embedded in texts in other languages (see 1.4). Pre-Islamic Arabic epigraphy in the Arabic language (in any script) is thus vanishingly rare, hardly 10 inscriptions scattered through the Syrio-Jordanian desert and NE Saudi Arabia. On the other hand, there are many inscriptions found in this same region written in a South Arabian based script. These have been identified with what are termed the Ancient North Arabian (ANA) languages, which comprise a number of different varieties. Hajayneh (2011: 756, 759) provides the list in Table 4.1, with an approximate date for first attestation. These varieties are all written in a South Arabian type script and are attested from the Jordanian–Syrian desert to southern Saudi Arabia. The amount of effort that has gone into compiling and contextualizing these varieties is impressive and as a whole has greatly advanced our understanding of the pre-Islamic linguistic landscape in the Middle East. How much they can be claimed to directly elucidate the history of Arabic is a linguistic issue which I will look at critically here.

Arabic and the Case against Linearity in Historical Linguistics. Jonathan Owens, Oxford University Press. © Jonathan Owens (2023). DOI: 10.1093/oso/9780192867513.003.0004

Table 4.1 Ancient North Arabian languages/varieties
Taymanitic (400 inscriptions): NW KSA (ca. 500 BC)
Dumatic (three graffiti): NW KSA (ca. 500 BC?)
Dedanite (Lihyanic): NW KSA (beginning 6th BC)
Hasaitic: Hasa and neighboring areas in NE KSA (2nd BC)
Hismaic: southern Jordan into NW KSAᵃ (1st BC)
Thamudic B: between Madaʔin Ṣalih and Taymaaʔ, NW KSA (beginning 6th BC)
Thamudic C: Najd
Thamudic D: NW Saudi Arabia
Safaitic: eastern Jordan-Syrian desert (100 BC–AD 400)

ᵃ Also termed Thamudic E. Taymanitic is also known as Thamudic A.

Already in the 1970s and ’80s, as noted above, Parker (1987) and Peters (1978) suggest that Safaitic is in fact a part of Arabic, and a certain consensus has developed in this direction in recent years, with Hismaic also being included as a variety of Arabic (Graf and Zwettler 2004). Other ANA varieties are sometimes regarded as sister languages (see discussion of Knauf in 3.1). If the view is accepted that Safaitic is Arabic, given the large corpus of Safaitic and the fact that it is attested some 900 years before the crystallization of Classical Arabic (Sibawaih), it might appear that an invaluable key is available, which gives concrete shape to forms and structures which otherwise can be inferred only by historical linguistic comparative and internal reconstruction. As it turns out, matters are not so simple. The basic issue is that despite the progress that has been made in integrating ever more inscriptions into the data base, there remain huge gaps in what these inscriptions tell us linguistically. In order to understand the critical stance which I take in regard to the Arabicist tradition of historical linguistics, it is necessary to examine some of these ANA corpora in detail.

Before starting in on this, however, it is relevant to remember universal problems in using any historically fixed corpus. The handbook of historical sociolinguistics (2012), edited by Juan Manuel Hernández-Campoy and Juan Camilo Conde-Silvestre, dealt extensively with issues in the historical evaluation of language on the basis of texts. The problems they define are relevant to the status of any linguistic problem, whether socio-historical or historical linguistic, and serve as a cautionary note. Their methodology does not deal with reconstruction based on contemporary corpora, as practiced extensively in this book. They do, however, implicitly recognize that historical sociolinguistics (I would prefer the term philological sociolinguistics, since their work depends exclusively on written texts) is not simply sociolinguistics applied to old texts (see discussion of Bergs 2012 in 6.3).

In this work Hernández-Campoy and Schilling (2012: 67–71) summarize the inherent problems in older, written material. They identify six different ways in which historical written texts will yield material which makes them inherently different from normal sociolinguistic corpora (Table 4.2).

Table 4.2 Sociolinguistic attributes of written corpora
• representativeness of sources: some texts preserved, others lost
• limitations of corpus size, genre, class, social attributesᵃ
• invariation in written texts, conventionalization
• authenticity, written language vs. ‘vernacular of the writers’
• authorship, identity of author, social position of writers, their motivation to write all often uncertain
• social structure of the community in which texts produced

ᵃ A good case study of this type is Rosemeyer (2019), who shows that the apparent large-scale replacement of Ex-situ WH-interrogatives by In-situ interrogatives between 1800 and the present in Brazilian Portuguese is due to a greater representation of orality in later texts, a genre favoring In-situ interrogatives.

These authors also provide a useful Table (4.3) contrasting differences in working with spoken vs. written language (2012: 67, see also Maas 2009).

Table 4.3 Written vs. spoken language

Written language                        Spoken language
literate people (upper ranks, men)      all people
randomly preserved texts                authentic speech: observation, elicitation
social structure to be reconstructed    society familiar, much data available

I do not think it necessary to comment on each point individually, though a few remarks are relevant. Oral corpora are infinitely expandable, subject to time and resources. If after an initial survey it turns out that one group, area, or genre is underrepresented, or appears especially interesting in the speech community under study, one can always go back and get more texts, as many as needed. If, as is often the case in Arabic, an initial collection should yield a corpus one suspects of being overly standardized (see 6.8 on Educated Spoken Arabic), one can devise strategies for recording under more informal circumstances, and so on. In Arabic a historical sociolinguistics does not exist nor do the corpora necessary for such an endeavor, as understood in Tables 4.2 and 4.3. It is true that the remarkable work of Sibawaih goes some way to at least providing us authentic early material (observation, elicitation, all people, at least all Bedouins or all reliable informants¹). However, here we are entirely beholden to this one observer (plus a few others such as al-Farraʔ).² So-called Middle Arabic texts were once thought to provide “the missing link between Classical Arabic and modern Arabic dialects” (Blau 1981a: 115), though now these are by consensus interpreted as a mixed stylistic genre, akin to the early papyri discussed in 4.2 (Larcher 2001, Versteegh 2005, even Blau 2002: 14, see discussion in Owens 2006/2009: 46–47). Of course, only very few languages offer even the semblance of a sociolinguistically manipulable historical corpus (see 6.4), but beyond the basic issues listed in Tables 4.2 and 4.3 further problems accompany the interpretation of early Arabic texts. I will illustrate this with summaries of two of the ANA varieties, Taymanitic and Safaitic, in this section (and see App. 4.1 for a third, Hasaitic) and then move on critically to the papyrological and Greek transliterational material.

4.1.1 Taymanitic

A number of languages are hardly known linguistically, beyond being a name of an entity. Hasaitic, summarized in App. 4.1, is one such. Taymanitic from NW KSA (Map 2), dated to around the sixth century BC, is somewhat richer with around 400 inscriptions. A good summary of these is provided by Kootstra (2016). While much more usable linguistic information can be derived from these, the improvement is tangible only by comparison with the Hasaitic situation described in App. 4.1.

• Perfect verb: four attestations with {t} ‘I,’ remainder 3MSG formally (2016: 90)
• Imperfect verb: one attestation (2016: 91)
• Independent pronoun: one attestation {ʔn} (1SG, ʔana) (2016: 96)
• Object pronoun: two attestations, {-hm} 3MPL, {-h} “3MSG” (probably) (2016: 96)

On the other hand, there are important classificatory bits and pieces, which include the following.

• {h-} definite article (i.e. not /l/; see n 2.3) (2016: 86)
• FSG nominal suffix probably {–t} (not {–t ~ -ah}; see Figure 2.6 in 2.3.3 above, 2016: 84)
• Two significant sound changes, *đ and *z → /z/, *θ and *s3 (s) → s3 (s?) (2016: 105)

¹ من يوثق بعربيته.
² Larcher (2020) emphasizes that what might appear to be based on early oral sources, for instance in the qiraaʔaat, in fact are based on an oral interpretation of a written (and usually “defective”) model.

Here one is dealing with a language fragment, though there is enough of it to provide a wider picture of the diversity of varieties which are included among ANA. Even a few such features help give a feel for the diverse linguistic landscape. The invariable FSG suffix {–t} aligns Taymanitic with Safaitic (see below) and Ugaritic, indicating that the shift to {–t ~ -ah} characterizing Hebrew, Aramaic, and Arabic was circumscribed among the speech communities of the region. Whether assuming a NW Semitic affiliation is justified remains an open question (Gzella 2017: 298). Kootstra (2016: 104–107) nicely summarizes what Taymanitic might tell us about the nature of ANA in general, but what one can say based on the small, defective corpus is necessarily limited. Essential questions for the history of Arabic and WS in general (did Taymanitic have case? is Taymanitic really a VSO language?³ what was the form of the 2FSG object pronoun suffix?) will, unless dramatically new inscriptional data is found, never be answerable.

4.1.2 Safaitic

Safaitic moves to the other end of the data extreme. There are over 33,000 Safaitic inscriptions available (see Map 2), and many more still to be documented. Assuming an average of seven words per inscription, the total available corpus amounts to over 230,000 words. Assuming a rate of about 10,000 words per hour in spoken language, this would come to around 23 hours of speech. Aside from the print-based giga-corpora which can be downloaded today, a 20 hour + corpus is quite large. In today’s terms, many corpus-based sociolinguistic studies are quite comfortable with such a data base. In other words, whereas the defective corpora of Hasaitic and Taymanitic are certainly in part attributable to their small size, one can legitimately expect more from Safaitic. To a degree this is the case.

A recent excellent summary of Safaitic is provided in Al-Jallad (2015a). This deals with orthography, phonetic interpretation of the script, morphology, and syntax, as well as providing useful vocabulary lists characterizing the different conventionalized domains which the inscriptions cover, such as lineages, pasturage, relations with Nabataeans and funereal matters. I examine the data from the following methodological perspective: if one only had the corpus of Safaitic, would one arrive at a language like Classical Arabic? Formulated in another way, does Safaitic need CA⁴ more than CA needs Safaitic? This can be expanded from CA to any form of Arabic. I use this criterion in 4.2 as well.

³ Kootstra (2016: 103) notes that most sentences are SVO. This runs counter to a widespread assumption that WS (Semitic in general?) inherited a VSO word order. She explains the discrepancy as one due to a topicalization construction which moves the S to topic position. This, however, begs the question whether Arabic (and perhaps Aramaic, see App. 7.2.5) had any unmarked order of major sentence constituents, as opposed to a pragmatically determined word order (Owens and Dodsworth 2009).
⁴ I follow Al-Jallad (2015) in using Classical Arabic as the standard of comparison.
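The back-of-envelope estimate of the Safaitic corpus size given above can be reproduced in a few lines. This is only a sketch of the arithmetic: the three figures are those stated in the text, while the constant and function names are illustrative and not drawn from any source.

```python
# Figures as given in the text; names are illustrative only.
INSCRIPTIONS = 33_000            # "over 33,000 Safaitic inscriptions"
WORDS_PER_INSCRIPTION = 7        # assumed average length of an inscription
SPOKEN_WORDS_PER_HOUR = 10_000   # assumed rate of spoken language


def corpus_words(inscriptions: int, words_each: int) -> int:
    """Estimated number of words in the corpus."""
    return inscriptions * words_each


def speech_hours(words: int, words_per_hour: int) -> float:
    """Hours of speech that a corpus of this size would correspond to."""
    return words / words_per_hour


total_words = corpus_words(INSCRIPTIONS, WORDS_PER_INSCRIPTION)
hours = speech_hours(total_words, SPOKEN_WORDS_PER_HOUR)

print(total_words)       # 231000
print(round(hours, 1))   # 23.1
```

Since both the average inscription length and the speech rate are rough assumptions, the result is best read as an order of magnitude (a corpus of some 20 or more hours) rather than a precise figure.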

First of all it should be stated that there is an impressive amount in Safaitic which is reproduced in Classical Arabic and many other varieties of Arabic. Some of this is well known but is still significant and bears repeating. These include the existence of broken plurals, which was seen in 2.2–2.4 to be an important traditional isogloss separating Arabic from the NW Semitic languages, and the personal prefixes in the imperfect verb, t- ‘3FSG,’ y- ‘3MSG,’ n- ‘1PL.’⁵ Like CA and some contemporary dialects, Safaitic has gender polarity with numerals 3–10 (2015: 90); verbs beginning with w- lose it in the imperative ({hb} ‘give!’ < whb); as in CA, {s-} marks a future; and as in CA there is variation in verbs ending in –w which points to a merger with verbs ending in –aya (2015a: 50).

(4.1)
Saf: {rḍw ~ rḍy}
CA: raḍaw-tu ~ raḍay-tu ‘I accepted’ (Lisaan 14: 323–324)

CA innovations

Under the assumption that Safaitic is a form of Arabic, the most interesting cases are probably those which show CA to be innovative relative to Safaitic, though it has to be said, there are not many of these.

Form of definite article. There are three different forms marking the Safaitic definite article. One of them is, as in CA, l- (or ʔal⁶). A second is ʔ-, which al-Jallad notes (2015: 74) may represent an assimilation of l-, while the most frequent is h-. In other work (e.g. Müller 1982) h- (or hn-) is frequently cited as an isogloss separating the ANA varieties from Arabic, and as seen above in 4.1.1, Taymanitic has only h-. Tropper (2001: 10) sees the CA l- as derived from *h(n). In this perspective Safaitic would represent a variationist stage in which l- and h- coexisted, before l- won out in most varieties.⁷

Form of nominal FSG suffix. Safaitic has invariable –t in all positions, a property it shares with Taymanitic (see 2.3.3, 4.1.1).

Lastly, weak final verbs are consistently represented with a final –y or –w (al-Jallad 2015: 47).

(4.2)
{bny} ‘build’
{štw} ‘winter over’

In the current case, stems with final /y/ and /w/ in Arabic are considered by the Arabic grammarians to end in a semi-vowel, although the appearance of this consonant is determined allomorphologically. The symbol “@” here stands for an underlying form whose surface realization is determined by subsequent rules.

⁵ Safaitic lacks (attestations of?) the expected 1SG imperfect prefix ʔa-.
⁶ The grammarians in a phonologically correct manner understand the definite article to be l- alone (laam al-taʕriif), with “ʔ” in ʔal- representing a conditioned epenthetic vowel.
⁷ As Tropper (2001: 11) points out, nearly the same variation (hn- ~ hal- ~ l-) is attested in Lihyanic (Dedanitic), which indicates Safaitic was part of a wider areal variational universe.

In the synchronic rules of morphophonology as developed by the grammarians (Owens 2000), these are derived via two rules. One is a general rule which converts the sequence ay/wa to aa. The second shortens a long vowel in the context VVC# or Vy/wCC to VC#/VCC.

(4.3)
Rule 1   ay/wa → aa
Rule 2   VVC# → VC#

The effects of these are general, applying to weak final and weak medial verbs, as shown in Table 4.4.

Table 4.4 Derivations with weak consonants

                                       Weak final    Weak medial perfect
Underlying representation              @banayat      @bayaʕa
1. ay/wa → aa                          @banaat       baaʕa
2. constraint on VVC# (VVC# → VC#)     banat         –
Surface realization                    banat         baaʕa
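The two ordered rules in (4.3) and the derivations in Table 4.4 can be sketched as string rewrites. This is a minimal illustration under simplifying assumptions, not the grammarians' own formalism: forms are plain romanized strings, the vowels are taken to be a, i, u, only the VVC# half of Rule 2 is implemented, and the function names are hypothetical. The morphological conditioning which allows the glide to surface in other forms of the same root is not modeled.

```python
import re


def rule1(form: str) -> str:
    """Rule 1: the sequence a + y/w + a is realized as long aa."""
    return re.sub(r"a[yw]a", "aa", form)


def rule2(form: str) -> str:
    """Rule 2: a long vowel shortens before a word-final consonant (VVC# -> VC#)."""
    return re.sub(r"([aiu])\1([^aiu])$", r"\1\2", form)


def derive(underlying: str) -> list[str]:
    """Return the underlying form followed by the output of each rule in order."""
    stages = [underlying]
    for rule in (rule1, rule2):
        stages.append(rule(stages[-1]))
    return stages


print(derive("banayat"))  # ['banayat', 'banaat', 'banat']
print(derive("bayaʕa"))   # ['bayaʕa', 'baaʕa', 'baaʕa']
```

The weak medial form passes through Rule 2 unchanged because its long vowel is not followed by a word-final consonant, which matches the empty cell in the table.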

Beginning with the underlying form @banayat, aya converts to /aa/, which in turn in the context aat# shortens to at# at the word boundary. A weak medial verb @bayaʕa is included on the table, to show that the derivation applies generally to ay/wa strings. There are, it should be noted, also many forms in the {bny} cohort in CA and other varieties of Arabic where a palatal vowel does appear:

(4.4)
binaaya ‘building’
yabniy ‘he builds’

The Safaitic finding represents an interesting confirmation that analyses based on internal reconstruction can be confirmed in historically attested forms. It should be emphasized that (4.3, Table 4.4) are formulated by the grammarians as synchronic rules. From today’s perspective, however, they can in this case be interpreted as internal historical reconstruction.

4.1.3 Limits of Safaitic for historical reconstruction; the burden of underspecification

While acknowledging the importance of Safaitic for understanding the overall history of Arabic, from a wider comparative perspective it is relevant to point out the limitations of the data base for comparative purposes. The problem permeates nearly all aspects of interpretation and is encapsulated here under the rubric of underspecification. Two types can be distinguished. Formal underspecification references the well-known fact that many Semitic scripts, including Arabic and the South Semitic-derived scripts used for the ANA languages, do not indicate short vowels or consonant gemination, and usually do not indicate long vowels /aa, ii, uu/ either. As any beginning student of Arabic learns, a very normal orthographic word such as {kml}, كمل, might represent kamala (or kamula/kamila) ‘it is complete,’ kaml ‘finishing,’ kammala ‘he completed,’ kummila ‘it was completed.’ Students today learn to fill in the correct form through rules of grammar and contextual inference. This formal situation is no better in the case of the inscriptions and even the papyri discussed in the next section, but in these cases there is no recourse to a reference grammar or a larger set of contexts to facilitate the interpretation. Everything in the epigraphy is inherently underspecified formally.

Formal underspecification as such is not inherently a problem. To the contrary, given education, experience, contextual inferencing, practice, the underspecified Arabic script is quite efficient. However, there is a second type of underspecification which can be termed pragmatic underspecification. Information is missing because the macro-context of the inscriptions is discourse poor. Forms are simply missing, not underspecified. For instance, there are no unequivocal tokens (as yet) of the 1SG perfect verb, and probably none of the 1SG imperfect. Presumably there are no contexts requiring the agency or description of ego. Yet in normal speech, the first person singular is, as expected, highly frequent. By way of illustration, in a corpus of Emirati Arabic (the basis of Owens et al. 2013: 10) the first person singular comprises 17% of all verb forms (193 null subject verbs, 99 overt subject). Of course, pragmatic gaps might, with luck, be filled with a new discovery. 
Nonetheless, it is highly unlikely that there will ever be enough tokens found which allow one to get a feel for the discourse relevance of differentiated person marking in verbs. In the next two sections I illustrate the consequences of underspecification for historical interpretation.

4.1.3.1 Underspecification I: Lack of formal indication of short vowels, gemination

First, as those who work in this field acknowledge, the lack of short vowels makes interpretations of many important constructions speculative. For instance, the internal passive (Al-Jallad 2015a: 135, 136), the unambiguous interpretation of verbal nouns (58) and the interpretation of verb stems in CA all require the help (šakala) of added signs for vowels and gemination, as marked in (4.5) in boldface.

(4.5)
Passive: fuʕila
Verbal noun: faʕl
Stem II: faʕʕala

Each of the forms in (4.5) in Arabic script is simply {fʕl} فعل. Without the vowels and gemination marks, in many/most contexts the intended pattern (wazn) can only be guessed at. In (4.5) three possible guesses are spelled out.
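The effect of formal underspecification can be made concrete by reducing vocalized forms to what an unvocalized script would record. The sketch below rests on two simplifying assumptions: forms are given in romanized transcription, and the script writes neither vowels nor gemination. The function and variable names are hypothetical.

```python
import re

VOWELS = "aiu"


def skeleton(form: str) -> str:
    """Reduce a romanized form to its bare consonantal skeleton."""
    # Collapse doubled segments first, so that kammala is treated like kamala.
    degeminated = re.sub(r"(.)\1", r"\1", form)
    # Then drop the vowels, which the script does not record.
    return "".join(ch for ch in degeminated if ch not in VOWELS)


kml_readings = ["kamala", "kamula", "kamila", "kaml", "kammala", "kummila"]
fal_readings = ["fuʕila", "faʕl", "faʕʕala"]

print({skeleton(f) for f in kml_readings})  # {'kml'}
print({skeleton(f) for f in fal_readings})  # {'fʕl'}
```

The mapping runs in one direction only: many vocalized forms collapse into a single skeleton, and nothing in the skeleton itself indicates which of them was intended.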

Al-Jallad (2015a: 137) recognizes this problem: “it is unclear whether all forms which must be interpreted as semantically passive should be construed as such.” In contemporary spoken Arabic impersonals, including passives, can be marked in any number of ways, via 3MSG or 3FSG verb forms, via second person imperfectives, via 3MPL verbs (isamm-uuna ‘they name it’ = ‘it is called’; see Owens et al. 2013: 25 for discussion and 11.4 Table 11.4). An “impersonal” context can be rendered by any number of formal means.

4.1.3.2 Underspecification II: Gaps in paradigms

Compounding the ambiguity of underspecified orthography is the simple lack of key forms in paradigms, the pragmatic underspecification issue. As is well known, and whether or not one agrees with him, Greenberg (1987) developed an entire theory of the spread of language in the western hemisphere largely on the basis of a comparative consideration of pronouns. In Safaitic only the following independent (4.6) and object pronouns (Table 4.5) are attested (al-Jallad 2015a: 94–97).

(4.6) independent pronouns
ʔn 1SG
h 3SG

Table 4.5 Suffix object pronouns in Safaitic

      SG     PL
1     –      -n (one attestation)
2     -k     -km
3     -h     -hm

Only two formal independent pronouns are attested and one of these, h, is ambiguous between M and F. Similarly among the object pronouns, -k and –h could equally be M or F. In this case al-Jallad does in fact propose distinctive reconstructions, based on antecedents which suggest either M or F reference. I return to this point in 4.1.4 below. Ignoring the dual object pronouns which are postulated for the Safaitic data largely via linearly based assumptions, not formal evidence in the epigraphy itself, in CA and many varieties of Arabic one expects a complete pronominal paradigm to consist of 10 members. In fact, only two independent pronouns are found in the data, and only five object pronouns. Anticipating the discussion of the use of contemporary data for historical reconstruction, it is interesting in this context to contrast this basic data with a somewhat randomly selected text from contemporary Arabic to compare how many members of the independent and object pronominal paradigms occur in running text. As noted, the Safaitic data which forms the basis of Al-Jallad’s data would, if spoken, amount to approximately 22 hours of speech. In these terms, the actually attested pronouns are exiguous in the extreme. For the comparison I have used a text from the Bayreuth online collection of LCA (Nigerian Arabic, Owens and Hassan 2011–present), a text of about 3,500 words. It is available in transcriptional, audio, and translated format. The text is IM50XMS. The interview was conducted in August 1990 with the main speaker a woman who had grown up in Maiduguri, Nigeria. She was interviewed by a first cousin of hers, so there was a high degree of informality in the interview situation. Table 4.6 lists in paradigmatic format all of the independent and object pronominal forms which are found at least one time in the text.⁸

Table 4.6 Independent, object pronouns IM50XMS, 3500 words

       Independent pronouns      Object pronouns
       SG       PL               SG         PL
1      ana      aniina           -í, -ni    -na
2M     inta     intu             -ak        -ku
2F     inti     –                -ki        –
3M     hu       humma            -a         -hum
3F     hi       hinna            -ha        –

For the independent pronouns only the 2FPL intan is missing, i.e. it is 90% complete. For the object pronouns the 2FPL (-kan) and 3FPL (-hin ~ -han) are missing, an 80% coverage. Note that the paradigms understate the amount of information actually found in the text, in comparison to the Safaitic data, particularly in respect of allomorphy. For instance, the 3MSG (see 10.2, Table 10.8) is –a after a consonant, but after a vowel is marked by a shift of stress to the final vowel (as in virtually all Arabic dialects), šaaf-o ‘they saw,’ šaaf-ó ‘they saw him.’ Moreover unusual formal attributes such as the stressed 1SG possessive suffix –í vs. the unstressed –ni are readily discernible in a way they would never be in an epigraphic text (see 5.3.2.2).

Of course such a comparison in a sense is not “fair.” One expects from an interactive interview among acquaintances a free exchange of information of all kinds, and pronouns are crucial to any such discourse. Such a discourse situation does not exist in Safaitic. But the point is not to bemoan the lack of data or by explaining it away to trivialize the comparison above, but rather to point to a simple, hard truth which will be developed in the course of the book, namely that comparative reconstruction requires hard comparative data. If it is found in LCA, and if the data derived from it can be shown to be useful for historical reconstruction, then it appears, barring counterevidence, that LCA (perhaps any Arabic dialect) will tell us more about the proto-Arabic pronouns than will Safaitic. I develop my critical remarks in the following section.

⁸ That it is not far-fetched to invoke a contemporary paradigm to elucidate an ancient one can be seen intuitively by comparing Table 4.6 to Table 2.4, in particular the CA and Hebrew independent pronouns. There is a high degree of correspondence among all three paradigms. A more rigorous defense of this approach is given in Chapters 10 and 11.
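The coverage figures cited in this section can be recomputed directly from the paradigms. In the sketch below the Nigerian Arabic cells are those of Table 4.6, with `None` marking a cell for which the text has no token. The Safaitic entries are the counts of distinct attested forms given above (two independent, five object pronouns), set against the expected 10-member paradigm. All names are illustrative.

```python
# Table 4.6: forms attested at least once in text IM50XMS; None = no token.
INDEPENDENT = {
    "1SG": "ana", "2MSG": "inta", "2FSG": "inti", "3MSG": "hu", "3FSG": "hi",
    "1PL": "aniina", "2MPL": "intu", "2FPL": None, "3MPL": "humma", "3FPL": "hinna",
}
OBJECT = {
    "1SG": "-í, -ni", "2MSG": "-ak", "2FSG": "-ki", "3MSG": "-a", "3FSG": "-ha",
    "1PL": "-na", "2MPL": "-ku", "2FPL": None, "3MPL": "-hum", "3FPL": None,
}

PARADIGM_SIZE = 10  # expected members of a full paradigm, duals ignored

# Safaitic: number of distinct forms attested, as counted in the text.
SAFAITIC_FORMS = {"independent": 2, "object": 5}


def coverage(paradigm: dict) -> float:
    """Share of paradigm cells with at least one attested form."""
    attested = sum(form is not None for form in paradigm.values())
    return attested / len(paradigm)


print(coverage(INDEPENDENT))  # 0.9
print(coverage(OBJECT))       # 0.8
for name, count in SAFAITIC_FORMS.items():
    print(name, count / PARADIGM_SIZE)  # independent 0.2, object 0.5
```

The Safaitic ratios are, if anything, generous, since a single written form such as {-k} is counted once although it cannot be assigned to a specific gender cell.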

4.1.4 The contradictions of interpreting underspecification

It is clear that Safaitic and the other ANA languages have a severely underspecified orthography, lacking a clear indication of vowels and gemination, two elements which are crucial to understanding the language in fine detail. As already noted, there are many cases where the orthography allows in a general way deductions about what variety lies behind it, and in many cases these appear to be broadly identical with CA (and other varieties of Arabic). In a few cases, deviations from CA are obvious enough to speak of language change within one parameter or another. Note, however, that when I say “broadly identical,” much fine-grained differentiation may be lost. The nominal FSG ended invariably in –t, but we have no way of knowing whether it ended in –at, –it or simply -t, or perhaps varied among all three. All three variants are attested throughout the varieties of Arabic, with –t a frequent allomorphic variant when *Vt occurs in an open syllable. The problem of interpreting values beyond a basic consonantal form is significant enough, for the historian of the Arabic language at least, to give it a name. I will call it the “ascription of detailed value” and look at the issue from three perspectives: orthography and reconstruction, treated in this section below, linearity (4.1.5, 4.1.6), and the problem of interpreting *s1, *s2 in Safaitic, Arabic, and Aramaic (4.1.7).

Orthography and reconstruction

In Table 4.7 I give my interpretation of what object pronominal forms are found in the Safaitic material. As already noted, only two forms, a generic second person and a generic third person singular, can be unequivocally identified. Al-Jallad (2015: 94) sees this differently, differentiating between for instance 2M and 2F as in Table 4.7. For the sake of brevity I list only the singular forms, and will concentrate only on the second person.

Table 4.7 What one sees, what one interprets

(a) What one sees
       SG
1      –
2M     –k
2F     –k
3M     -h
3F     -h

(b) Interpreted (by al-Jallad 2015: 94)
       SG
1      –
2M     –k */ka/
2F     –k */ek/, */kii/
3M     -h
3F     -hᵃ

ᵃ Al-Jallad also interprets a vowel for each of the third person pronouns, -oh MSG, -ah FSG. As with -k, the Safaitic epigraphic record, however, gives no support for this interpolation to date.

One needs to distinguish here between what one sees in the epigraphic record and what one deduces from it.⁹ Without making this explicit, a reader might be misled into thinking that we have definitive evidence in Safaitic for a distinction between the 2FSG vs. 2MSG object pronouns.¹⁰ In fact, the starred forms are reconstructions, and Al-Jallad simply posits some possibilities without justifying them in comprehensive comparative terms. In Owens (2006/2009: 248–251) such reconstructions were made as in Table 4.8.

Table 4.8 2SG object pronouns, reconstructed
(a) *ki = F
(b) *k or *ka = M

⁹ A point appreciated by Brockelmann: “Da alle semitischen Sprachen der älteren Zeit … ihren Lautcharakter nach im einzelnen nur mangelhaft bekannt sind, so müssen wir zu ihrer Deutung noch mehr, als bei den indogermanischen Sprachen erforderlich ist, auf ihre jüngsten, von europäischen Forscher genau beobachteten Entwicklungsstufen eingehn.” (1908: 34). (Since all of the older Semitic languages are only attested in their phonetic character in degraded format, we therefore have to rely for their interpretation far more than is the case with the Indo-European languages on contemporary developmental stages such as accurately observed by European researchers.) What Brockelmann does not say here is that by appealing to successor forms to fill in predecessor forms, one is effectively invoking the comparative method, without actually subjecting the material to the comparative rigors of the method.
¹⁰ An identical criticism is found in Owens (2018c: 102) of the Semiticist practice of ascribing to bare consonants a vowel-based case system in epigraphic South Arabian when in fact there is no physical indication of their presence. The case system is simply copied over from Classical Arabic, with no regard to comparative reconstruction.

There is no need to discuss this in detail here. The reconstructions obviously stand independently of the Safaitic inscriptions, since they were made without any reference to this variety. The point of independence, however, means that if one should want to attribute to the bare {k} (Table 4.7a) a more detailed phonetic value via reconstruction (Table 4.8) one can, but only in the knowledge that there is no independent confirmation that the Safaitic {–k} should be understood as in Table 4.7b. Until such is shown, Table 4.7a is preferable to Table 4.7b as a representation of the Safaitic object pronouns. Note that the reconstruction in Table 4.8 does not contradict Table 4.7a. In passing it can be noted that the present example exemplifies the usefulness of distinguishing between formal and pragmatic underspecification. In this case the pragmatic contexts in the examples in Al-Jallad (2015: 96) perspicaciously allow the inference to be made that –k can represent a male or a female reference.¹¹ Unfortunately, formal underspecification renders speculation about what vowels should be interpreted of no interest from a Safaitic-internal (i.e. non-comparative) perspective. The phonologically underspecified Safaitic texts create a curious situation. On the one hand in them is found evidence pre-dating CA by up to 900 years that forms familiar to CA were in all likelihood present in an ancestral variety. As will be seen, this finding can be arrived at in other ways, but the epigraphic texts are valuable in adding a chronological link to our historical reconstructions. It needs to be asked, however, what can be unequivocally read off of the underspecified Safaitic texts. I continue this point in the next section.

4.1.5 Linearity

As explained in Chapter 1, the traditional Arabicist interpretation of language history is a march from Semitic to the dialects. I have termed this linearity. Linearity, however, needs to be shown, not assumed. I illustrate this here with two critical cases.

4.1.5.1 Link to CA The ascription of detailed value always implies impoverished primary data. If it needs to be filled in at all, then attribution based on reconstruction is the best route to go, as discussed in the preceding 4.1.4. What should not be taken for granted is a simple link to an assumed OA value (or CA value). Two brief case studies presented show this. In a few cases a Safaitic 3SG object pronoun is preceded by n-, glossed here as “N.” All three such cases concern pronominal objects of verbs. ¹¹ For the feminine {h ʔlt s1ʕd … ʕbd-k} ‘O ʔlt aid … (?) worshipper-your.F.’ with ʔl-t interpreted as a female goddess, therefore ‘your-F.’ One needs to be cautious even here, however. In Taymanitic (Kootstra 2016: 88) Kootstra points out that –t-final nouns appear to mark masculine personal names.

(4.7) {y-ʕwr-n-h}
3-efface-N-it
‘He effaces it’ (al-Jallad 2015: 97)

Al-Jallad says this –n- invites “… comparison with the energic endings in CA and Ug” (2015: 98).¹² There is, however, an alternative explanation to the energic affiliation and that is that the –n- represents what I have termed an intrusive –n. I discuss this in greater detail in 5.2. Here it suffices to note that there exist Arabic dialects in which an –n- is automatically inserted before pronominal objects. This is functionally distinct from the CA energic usage, even if probably historically related (see Owens 2013a).

(4.8) n-sawwi-nn-a ‘we do it’ (Oman)
we-do-N-it.M

(4.8) in particular would appear to replicate (4.7). Given the fact that in all Safaitic examples the -n only occurs preceding an object pronoun, the intrusive -n interpretation in this case looks to be the more convincing. This aligns Safaitic with those contemporary dialects with the intrusive -n, rather than with the Classical Arabic energic. A similar instance applies to the following example.

(4.9) mħlt-n l-đ y-ʕwr
dearth of pasture-n to-who 3-efface
‘dearth of pasture to he who would efface’ (2015: 69)

This –n is interpreted as “a vestige of nunation” (tanwin). Given that there is no evidence for a CA-type case system in the data,¹³ this claim is best understood as dictated by the linearity assumption, that a nominal -n must point toward a known function in CA. However, the example equally fits the description of what I have termed the linker –n. This again is cognate with, but not in a linear relationship with, the tanwin of CA. Here an element –Vn can be inserted between a noun and an adnominal adjunct, as will be discussed in greater detail in 5.3.2.9. In this case as well the primary Safaitic data allows a choice between the two alternatives. Al-Jallad’s assumption of a linear cognation with CA may be correct, but nothing inherently speaks against the alternatives suggested here.

4.1.5.2 Link to Proto-Semitic

In the previous section the linear link was projected forward, from Safaitic to CA. A case with wider ramifications concerns the degree to which one can work in the opposite direction, from Safaitic back into proto-Semitic. One case involves the Safaitic graph which is customarily equated with the proto-Semitic value ∗ɬ and is identified as s2 (also represented as /ṥ/ and {ŝ}, see Table 2.2 and ch. 2 n. 1). Ex-Safaitic, in Arabic ∗s2 is realized as /š/, and s1 and s3 as /s/.¹⁴ Al-Jallad, however, argues for a Safaitic value of /ɬ/, i.e. the direct proto-Semitic value. This then would have shifted to /š/ in other forms of Arabic. It is worthwhile enquiring whether there are any strong arguments supporting this case, other than the presumed cognation with PS s2 (∗ɬ). To begin with, as with the other cases discussed in this section, there is no evidence in the primary data, i.e. the glyph, for /ɬ/ (or indeed for any related value such as /s/ or /š/).¹⁵ The argument for /ɬ/ runs as follows. First it is pointed out (2015a: 44–45) that Aramaic names in Safaitic texts are transcribed with s1/∗š. In Aramaic the sibilants have a different phonetic value for Arabic s1. Aramaic s1 is interpreted phonetically as equivalent to PS s1 = ∗š. This is set out in Tables 4.9a and 4.9b, repeating information from Table 2.2 in 2.1 and using Aramaic reflexes as in Syriac, which with a slight displacement overlaps chronologically with Safaitic.

¹² la-yafʕal-anna ‘he should do!,’ see Kitaab II: 152 and extensive discussion in Owens 2013a.

¹³ There is perhaps evidence for one case, one in –a, but its interpretation is problematic (see Owens 2018c), and even if an accusative case, would only point to a Ge’ez type case system, which is markedly different from that of CA.

Table 4.9a ∗s1 and ∗s2

              Aramaic   Arabic
š = s1 (š)    š         s
ɬ = s2 (š)    s         š

Safaitic glyph (not which is interpreted as s2)ᵃ

ᵃ The (assumed) phonetic values of the Safaitic letters are determined comparatively. The Safaitic word for ‘sun’ (Al-Jallad 2015: 344; = Arabic شمس, Syriac …) for instance would be either {ɬms} (Al-Jallad’s interpretation) or {šms}.

Table 4.9b Examples

Aramaic   Arabic
šm        ism ‘name’
esar      ʕašar ‘ten’

This looks perhaps a bit confusing, but as the examples show, on a synchronic basis speakers of Aramaic and Arabic could easily have made etymological associations which could have limited any confusion implied in the abstract representation in Table 4.9a. Al-Jallad observes, however, that Safaitic uses s1 (the glyph ) to transcribe Aramaic /š/, i.e. the Arabic phonetic value [s]. They do not use s2 ( ). He reasons, Aramaic /š/ (e.g. šm) expects a glyph with the phonetic value /š/, i.e. a one to one mapping of Aramaic phonetic value onto the Safaitic glyph representing this value, which would be s2 ( ). From this it is concluded that “s2 [in Safaitic, jo] did not have the same value as its CA counterpart, namely [š].” This allows him to interpret Safaitic by default as identical to PS /ɬ/.

¹⁴ A reader suggests (insisting on the point multiple times) that “Raziħit” is a form of Arabic maintaining the vl. lateral fricative /ɬ/. The authors of the only study (in the West) on this interesting variety have this to say, “we leave open the question as to whether Raziħi is a dialect of Arabic or is better regarded as another language with certain features derived from a common Semitic source and others adopted from Arabic” (Watson et al. 2006: 39). Raziħi is a contact-inflected problem whose integration into the debate about proto-forms remains outstanding.

¹⁵ To underscore the point that there is no inherent phonetics in epigraphic orthography, Kootstra (2016: 82) interprets Taymanitic s2 as [š], not [ɬ]. S2 in Taymanitic is represented by three different shapes (Kootstra 2016: 74), one of which appears identical to Safaitic s2, except, as represented, it is more vertical.

As a further argument Al-Jallad would apparently invoke an interpretation of the value of Sibawaihi’s shiyn. He states, with little argumentation, that today’s pronunciation as /š/ did not arise until the third/ninth century, i.e. after the description of Sibawaih. For Sibawaih he understands the “shiyn” value [ç] (2015b: 28, 2020: 17). To address this issue further, it is therefore necessary to return to Sibawaih, already treated in 3.2.1 above under the general question of what “palatal” and “alveolar” meant in Sibawaih. I repeat relevant parts of the description here (see ch. 3 n. 8 for further). Sibawaihi’s shiyn (II: 453–454) is uncontroversially a voiceless (mahmuus) fricative (rixwa). About the place of articulation, he groups the shiyn with the jiym and yaaʔ. Repeating the quote from 3.2.1 it is pronounced (II: 453.7) in “what is from the middle of the tongue between it and between the middle of the hard palate (al-ħanak al-ʔaʕlaa) is the place of articulation of the jiym, and the shiyn and the yaaʔ.” Strictly speaking all that can be said with absolute certainty is that this set of three sounds lies between a palatal and an alveopalatal sound. Adapting the discussion from 3.2 to this specific issue, the “al-ħanak al-ʔaʕlaa” covers a broad region extending from the hard palate all the way to the place of articulation of laam ([l]). This region accommodates different interpretations of shiyn, and it would accommodate one of the sounds in the set, yaaʔ having a palatal pronunciation, and others having an alveopalatal. All lie within the al-ħanak al-ʔaʕlaa.
As far as the phonetic interpretation of shiyn goes, one argument that has been advanced is that Sibawaihi’s shiyn in fact is the lateral /ɬ/ (Beeston 1962). Beeston (1985 9; Cantineau 1960: 62; Rabin 1951: 33) takes Sibawaih’s al-ħanak al-ʔaʕlaa to be limited to the hard palate, and hence would endorse the [y], [ç], [-j] interpretation of the sounds. Ignoring the distributional and variationist issues detailed in 3.2.1, the major problem with interpreting the sound as a lateral is that Sibawaih, an expert articulatory phonetician, does not describe it as a lateral, and does not pair it with the explicit emphatic lateral . Sibawaih is very clear in his descriptions. He says, for instance, that if you de-emphasize [ṭ] you get [d]. If you de-emphasize ḍaad, however “the ḍaad would exit the language since it has no [corresponding non-emphatic j.o.] place of articulation” (II 455.10).¹⁶

¹⁶ ‫و ﻟﺨﺮﺟﺖ اﻟﻀﺎد ﻣﻦ اﻟﻜﻼم ﻻﻧّﻪ ﻟﻴﺲ ﺷﺊ ﻣﻦ ﻣﻮﺿﻌﻬﺎ‬.


Sibawaih’s phonology was essentially an applied phonetics. He introduces his meticulous articulatory description as an introduction to the extensive discussion of assimilation (idɣaam or iddiɣaam) in Arabic. In this Sibawaih occasionally mentions the shiyn, at one point observing that the shiyn has an extended place of articulation so that it reaches as far as the ṭaaʔ …

“الشين استطال مخرجها حتى اتصل بمخرج الطاء …” (II: 462)¹⁷

The ṭaaʔ [ṭ] in turn belongs to the class of three sounds, /ṭ, d, t/ which are pronounced with the tip (ṭaraf) of the tongue placed against the root of the incisors (ʔuṣuwl al-θanaaya). In conjunction with the idea that the hard palate extended at least all the way to where an /l/ is pronounced, the shiyn as described here would appear to correspond to a voiceless alveopalatal fricative [š]. Such a sound is articulated with the blade of the tongue behind the alveolar ridge and the front of the tongue raised against the hard palate, the tongue tip touching against the lower incisors. Here there is an almost perfect match between Sibawaih’s predicate istiṭaal ‘extend, be long’ and a modern phonetic description which clearly identifies the sound as pronounced over a large stretch of the mouth between the alveolum and the hard palate. The only discrepancy is perhaps Sibawaih’s identification of the front extension of shiyn with ṭaaʔ, which appears to be slightly in front of the alveolum. On the other hand, it might also be significant that Sibawaih chose /ṭ/ from among the set of three dental sounds, not the non-emphatic /t, d/. As he explains (II: 455.5), the emphatic sounds as a class are characterized by raising the tongue toward the hard palate (al-ħanak al-ʔaʕlaa), so that /ṭ/ would be perceived as filling precisely the space where the shiyn is said to be pronounced. If Sibawaihi’s basic phonetic description of /š/ allows room for more than one interpretation, his observations on its assimilatory functionality argue more strongly for the alveopalatal interpretation. It is hard to see, for instance, how /ç/ can be conceived of as having a place of articulation stretching to the alveolum. Further to Sibawaih, as seen in (3.6d, 3.7h) above, assuming alveopalatal shiyn allows a straightforward interpretation of two of the non-sanctioned sounds:

(4.10) The jiym like a shiyn (= [č])
the shiyn like a jiym (= [ž])

The bulk of the evidence therefore favors interpreting Sibawaihi’s shiyn to be /š/, as in all known varieties of Arabic today.¹⁸

¹⁷ He gives a number of examples of shiyn assimilating to anterior consonants, for instance hal šayʔ → haš šayʔ ‘did something’ (II 467.20, also II 471.10 for ṭ, d, t + š).

¹⁸ There is one exception. Seeger describes a fascinating dialect in Khorasan where all ∗š have merged into ∗θ.


Having argued that Sibawaihi’s shiyn is in fact a “contemporary” shiyn, I can return to Al-Jallad’s interpretation of Safaitic s2 as something other than this, namely /ɬ/. Al-Jallad appears to have two sequential interpretations. Safaitic maintained proto-Semitic ∗ɬ, while this shifted to /ç/ in Sibawaih and in a post-Sibawaih time to /š/. That Sibawaihi’s shiyn was most likely /š/ was argued for immediately above, and therefore this part of the development can be peremptorily dispensed with. Still, this interpretation, along with that of s2 as /ɬ/, has another conceptual and methodological problem to address, namely, if /ɬ/ (and /ç/) are predecessors to /š/, how is it that Arabic uniformly (see n. 19, this chapter) realizes shiyn as /š/? Not only do the varieties of Arabic in the Middle East heartland have /š/, but also those in the diasporic regions of Africa and Middle Asia do. This basic parameter has been introduced in principle in the discussion of the ∗k → /č/ shift above in 3.2.2. If forms are equally found in the Middle East (tantamount to saying they are pre-Islamic) and in diasporic regions, it can be assumed that they spread intact from the heartland to diasporic areas. The Arabic-Islamic diaspora began some 130 years before Sibawaih. By inference, the /š/ was already in place by then. Beyond taking as an article of faith that Safaitic had the value of PS ∗ɬ, it is at least desirable to have an account for how Safaitic ∗ɬ shifted uniformly to /š/ (via /ç/?) in the pre-Islamic era. Note that there are no /ɬ/ relics as there are relics in Arabic of nominal FSG –t as uniformly –t (see 2.3.3). The point is simple, but hardly trivial. A sound change needs to be assumed to be linked to a given population at a given time. This in fact is what underlies one of the major challenges in sociolinguistics, the actuation problem (Weinreich et al. 1968: 186). How does an innovation move from being a nonce event to a community-wide one?
It cannot of course be expected that the social embeddings surrounding the supposed shift, ∗ɬ → ∗ɕ → š, be ascertained. In this case, however, the actuation problem does act as what can be termed a “realism filter.” We observe /š/ uniformly today.¹⁹ Is there any evidence in contemporary Arabic that it had ∗ɬ or even ∗ɕ in its history? The answer here is “no,” and the answer from Sibawaih is also “no.” We reach an impasse: no actuation link between a proto-Semitic reconstruction and Arabic can be established.²⁰ The analysis here pushes the origin of Arabic /š/ = s2 into an era when no varieties of Arabic are documented in any manner. I develop this point more explicitly in 4.4 below. One final issue remains to be treated, namely the question why, if Arabic (or Safaitic) s2 is /š/, Aramaic names in /š/ were represented as s1 = /s/, not s2 = /š/?

¹⁹ Behnstedt (1987: 9) notes the value ŝ which he calls a “retroflex š” in the north Yemeni dialect of the Bani Minnabbih (im-Maθθa). To add to the mysteries of Arabic sibilants, as well as to issues of contact, Davis and Faifi (2022) argue that a reflex of ∗ṣ as [st] from the same region (Faifi in Saudi Arabia) is due to substrate contact with South Arabian.

²⁰ Kogan (2011: 74–75) seeks to summarize evidence of an original /ɬ/ value of shiyn in Arabic, but finds it only in isolated, and in some cases rather obscure, possible lexical doublets, e.g. ḍ ~ š, ʕillawḍ ~ ʕillawš ‘jackal,’ where /ḍ/ represents the original lateral fricative. Alternation between /š/ ~ /ḍ/ should suggest alternation between laterals. The data is so exiguous, and alternative explanations so many, that the search for such correspondences is hardly worth the effort.


The straightforward explanation is that whoever wrote the names thought etymologically, not phonetically, as al-Jallad’s argument assumes. Wansbrough’s (1996: 96) observation can usher in the discussion: “Orthography tends to be morphophonemic, that is, etymological rather than phonetic.” Looking at the first line in Table 4.9a, an Aramaic “šm” becomes in a script written by an Arab “ism,” since the writer knows that Aramaic /š/ = Arabic /s/. This is an old point, nicely laid out in Diem (1980a), who explicitly “answers” the problem of al-Jallad’s phonetic assumptions:

Hätte man etwa in אחרשו … gemäß der aktuellen Lautung ʔaxras ‘stumm’ den Buchstaben ש durch den phonetisch eindeutigen Buchstaben ס ersetzt, so hätte man sich damit von der aramäischen Schreibung des aramäischen Gegenstückes ħeršaa gelöst. In diesem und in anderen Fällen hinderte ein unmittelbares aramäisches Gegenstück den Schreiber, ש in Schreibungen arabischer Namen durch ס zu ersetzen … (1980a: 79)

(If one had replaced, in say אחרשו, according to the actual form ʔaxras ‘dumb,’ the letter ש with the phonetically explicit letter ס,²¹ then the connection to the corresponding Aramaic form ħeršaa would have been lost. In this and in other cases a direct Aramaic equivalent prevented the copyist from substituting the ש in Arabic names with ס.)

Diem further notes that the vast majority of Arabic siin (s) words have an Aramaic correspondence in ‫ש‬, so that a general convention developed whereby Arabic siin would be written with ‫ש‬, even in the few cases when this etymology was strictly speaking wrong. I will return to these issues when I generalize the issue of actuation realism in 4.4 below. Given the high degree of Aramaic–Arabic bilingualism (see Chapter 7 and n. 22, this chapter), it is not surprising that Aramaic proper names in Safaitic texts were automatically switched over phonetically to their equivalent in Arabic.²²
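The difference between the phonetic and the etymological transcription strategy can be made concrete in a short sketch. The following Python fragment is purely illustrative and forms no part of the sources discussed. It assumes the correspondences of Table 4.9a and, as argued above, the values s1 = [s] and s2 = [š] for the Safaitic letters; the names ETYMOLOGICAL, LETTER_FOR_VALUE, and transcribe are hypothetical.

```python
# Aramaic reflex -> cognate Arabic reflex, following Table 4.9a
# (s1: Aramaic š = Arabic s; s2: Aramaic s = Arabic š).
ETYMOLOGICAL = {"š": "s", "s": "š"}

# Arabic phonetic value -> Safaitic letter label.
# Assumption of this sketch: s1 = [s] and s2 = [š].
LETTER_FOR_VALUE = {"s": "s1", "š": "s2"}


def transcribe(aramaic_sibilant: str, strategy: str) -> str:
    """Return the Safaitic letter a writer would choose for an Aramaic sibilant."""
    if strategy == "phonetic":
        # One-to-one mapping of the Aramaic sound onto the letter with that value.
        value = aramaic_sibilant
    elif strategy == "etymological":
        # The writer first converts to the cognate Arabic sound.
        value = ETYMOLOGICAL[aramaic_sibilant]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return LETTER_FOR_VALUE[value]


for strategy in ("phonetic", "etymological"):
    print(strategy, "->", transcribe("š", strategy))
```

Under these assumptions only the etymological strategy yields s1 for Aramaic /š/, which is the letter the Safaitic texts are reported to use; the phonetic strategy would predict s2.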

4.1.6 Summary, Safaitic

Even if some of the varieties are attested in only the most fragmentary fashion, the ANA languages indicate that the language map of the pre-Islamic Middle East was populated by many more languages than the dominant Aramaic and Sabaic (Macdonald 2000). These varieties should not be conceptualized as representing hard and fast linguistic boundaries, however, but rather as located along a linguistic continuum of “small differences” (Retsö 2013/2019: 442, echoing in a manner Macdonald (2000: 56)). What has emerged from the critical discussion of three test cases is that the ANA epigraphic material only conditionally can be used as primary data in the interpretation of the history of Arabic. This is paradoxical. The inscriptions all pre-date Islam, and Sibawaih. The earliest attestations of Safaitic are some 700–800 years before Islam, or some 1,000 years pre-Sibawaih. Underspecification, however, is the operative term, and, again paradoxically, the best way to specify the texts is by comparative reconstruction invoking CA or contemporary dialects. Any results so obtained, however, while certainly suggestive as to how Safaitic was spoken, obviously cannot be re-inserted in the reconstruction of proto-Arabic once they are based on the data which reconstruction is intended to explain (see P5 in 6.1). The few really interesting cases involve instances where Safaitic diverges from CA or contemporary spoken Arabic. Three stand out. The definite article is frequently h-, invariable –t in the feminine nominal suffix, and weak final verbs maintain –y, -w. Two of these have direct analogues in contemporary Arabic. As shown in Table 4.4 above, the Arabic grammarians already offered an internal reconstruction of final –y/w, in their case in the form of a synchronic rule, though one which can be reinterpreted as a claim about historical origin. The case of the feminine –t was discussed in 2.2.2. The third instance is the interesting case of the definite article which has been left out of discussion in this book (see ch. 2 n. 3).

²¹ I.e. Diem imagines אחרסו here.

²² Nabataean bilingualism as indicated by bilingual inscriptions is attested with a number of languages – Greek, Safaitic, Arabic. Writing on Nabataean–Hismaic bilingual texts (see Map 2), Hajayneh (2009: 216) observes, “The scribes were fully aware of the style and formulaic structure of the Nabataean and Hismaic texts. In other words, the formulaic structure of both versions conformed to the separate traditions operating in the two languages.” An assumption that scribes would look for literal one to one equivalences appears misplaced here as well.

4.1.7 Aramaic loanword š = Arabic s

It is relevant to describe a case analogous to that discussed in Table 4.9a involving ∗s1 of pre- or early Islamic Arabic, even though the example does not come from the epigraphic record. This pertains to early Aramaic loanwords in Arabic. Retsö (2006: 179–180) notes that in general Aramaic loanwords in the Koran are similarly represented as Arabic s1 (/s/). The linguistic issue is analogous to the problem sketched in Table 4.9a, hence its treatment here. Arabic stage 2 is attested everywhere, without exception, both in the homeland and in the diaspora.

(4.11) Aramaic /š/ = Arabic /s/

Aramaic   Arabic
šbbiil    sabiil ‘path’
mšiiħ     masiiħ ‘Christ’
šabtt     sabt ‘sabbath, Saturday’

This change, stage 1 ∗ š → stage 2 [s], would have had to have happened before 622, around the start of Arabic diaspora.


Recalling that Arabic s1 was original ∗š, Retsö’s explanation for this regular correspondence is that the Aramaic loanwords were introduced at a time before the shift of Arabic s1 to [s].²³ In this interpretation, the Aramaic Koranic loanwords are in place before the Arabic shift of ∗š → s. The loanwords then shifted to /s/ whenever the general shift of ∗š → s occurred in Arabic.

(4.12) Shift of PS ∗š and loanword ∗š → [s] in Arabic

word               Stage 1²⁴   →   Stage 2 Arabic
                   ∗š (s1)     →   [s]
∗PS word           šm              ism ‘name’
Aramaic loanword   mšiiħ           masiiħ ‘Christ’

Retsö would date the era of loaning as far back as the second century BC: “The Aramaic š (= Semitic s1) is always s in these items, which shows that these words were borrowed from Aramaic before the Arabic sound shift š > s …” (2006: 180). By the same token, nothing linguistically speaks against a later loaning date after the change ∗š → s. The scribes, or simply bilingual speakers, routinely recognized the equivalences in (51) and calculated the appropriate variant in their heads (see 8.6 for analogous development among LCA idioms). Given that most of Retsö’s examples come from learned vocabulary, which may be susceptible to special treatment in loanwords (see 5.3.1.5 for one such example), the “etymological” solution of Diem/Wansbrough discussed above could have worked for these words. In this case, however, the effects of (4.11, 4.12) would already have been in place. In this perspective the Aramaic ∗š → Arabic s would have happened twice, under different conditions.²⁵

4.2 Papyri

A second major source of pre-Sibawaihan data comes from the papyri. These have a long research tradition in the West going back to the last third of the nineteenth century, and the basic grammatical properties are well-described. A landmark study is Hopkins (1984), and though based on a corpus later than Sibawaih, Blau (1966–7) is a significant work. The earliest of the Arabic-only papyri dates to 22/640, over one hundred years before the era of Sibawaih, and they continue to be numerous until the end of the third/ninth century (Hopkins 1984: xli), when they were displaced by paper. Most of the earliest papyri, the era which interests us here, come from Egypt, this country because of its very dry climate being an excellent store of old manuscripts. As will be seen, the papyri are not simply early Classical Arabic. Kaplony (2018) views differences from CA as numerous and systematic enough to treat the language of this entire corpus as what he terms “Documentary Arabic” (or “scribal”), which he distinguishes from Classical Arabic, or as he rather problematically from a linguistic perspective calls it, “grammatical Arabic.” As seen in 1.1.5, the term Documentary Arabic is useful in providing a label to an entity which self-evidently is of an age, indeed in its origins, older than CA. A label, however, does not address the crucial historical question of how these documents feed into what ultimately becomes Classical Arabic, and by implication, into what their status is in the history of Arabic in general. I return to this point after a summary of the language of the papyri.

²³ “These words were borrowed from Aramaic before the Arabic sound shift ∗š → s …” (Retsö 2006: 180). Recall that the PS reconstruction of s1 = ∗š is the same as the Aramaic (see Table 2.2 in 2.1). Given this reconstruction, the Arabic realization /s/ would have needed to have shifted from ∗š, hence Retsö’s suggestion.

²⁴ I.e. this assumes the period when Arabic s1 still had its original proto-Semitic value ∗š.

²⁵ Looking beyond etymological cognates, Fraenkel notes (1886: xxi) multiple reflexes of ambiguous Aramaic sounds/letters in independent loans. For instance Aramaic ħ, ח (merged from ∗ħ/x) sometimes surfaces in Arabic as /x/ in the Aramaic loan, sometimes as /ħ/: naxlan < {nħla} ( ) ‘valley,’ ħawwaariin < {ħwryn} ( ) ‘washers.’ Since there is no etymological model, it appears that any Aramaic “ħ” could trigger an etymological response, even if folk etymologic, as adumbrated by Diem. Similarly for Aramaic sh = /š/ or /s/, for “p” usually /f/, but occasionally /b/.

4.2.1 Basic overview

Using Hopkins as a basic reference, key elements of the Arabic of the papyri can be summarized briefly and in perfunctory style. Where Hopkins’ summarizing position is clear I will not cite examples (often they are, however, found in 4.2.2). Where Greek transliterations are available it is often useful to cite them for information which is lacking in the Arabic script, for instance the nature or presence of short vowels. Hopkins (1984: xlvi) suggests that in nearly all cases where the language of the papyri deviates from that of Classical Arabic, it deviates in the direction of the contemporary dialects (see below). While this statement is an over-generalization,²⁶ this situation does often obtain. In this summary I cite only characteristics which the papyri share with dialects, as against CA. I limit the discussion to phonology and morphology, though I will collapse morphosyntactic features such as the indefinite –n (tanwin) under morphology. In order to get a feel for how Hopkins’ grammatical summary plays out online in texts, in 4.2.2 I illustrate the language of early Islamic papyri from one corpus. A second example is found in App. 4.2.2.1. In order to indicate a deviance from the expected CA norm, the “correct” forms are written in brackets, indicated by an asterisk, (∗form).

²⁶ For instance, Hopkins states (1984: 105) that the final –n of the dual may be left off, e.g. {aθny} = CA iθnayni ‘2’. I know of no dialects however, where this is true of the nominal suffix in general, and not for ‘2’ (iθneen, tineen, sineen, etc.). The relation of the papyri to the modern dialects is mediated via writing conventions of the papyrological tradition.


4.2.1.1 Phonology

The glottal stop is lost.

For all practical purposes it can be stated quite plainly that in the language of the early papyri Hamza, the glottal stop, barely exists. (1984: 19)

The position of the glottal stop will be filled by /y, w/, vowel length, or it will be simply deleted. A shift of the interdental θ → t is attested in Palestinian documents in the first/seventh and in Egyptian in the third/ninth century, {n-bʕt} ‘we send’ < n-bʕθ, (p. 33). /θ/ as in CA is apparently the more common. In Arabic dialects as well /θ/ and /t/ are the most frequent reflexes of ∗ /θ/ (see 5.3.1.5). /ḍ/ and /ḍ/ merged, {w aħfḍ} ‘and keep’ < aħfḍ). There are more tokens of merger toward /ḍ/ than toward /ḍ/. With an exception reported by Behnstedt for the Bani Abaadil in North Yemen (1987: 5), there are no dialects today which contrast the two sounds phonemically in native vocabulary (i.e. excluding ESA). The phonemic status of the two in CA as well deserves closer scrutiny (see Brown 2007). Hopkins finds no evidence for final short vowels (see 12.3.3) {amṭr} ‘cause to rain’ (αμτρ). He says there is widespread deletion of short vowels in open syllables (1984: 2), as is the situation among dialects today (see 2.3.2). Of course, short vowel deletion is also attested in CA, though to a lesser degree, and the manner in which they are deleted in contemporary dialects is extremely differentiated and worthy of individual historical interpretation. Given the impossibility of ascertaining syllable structure in detail in the papyri given the lack of indication of short vowels,²⁷ no more can be said than deletion is widespread in both the papyri and the dialects. There is ample evidence of the taltala phenomenon (see 5.3.2.5) in the papyri, yeqdir = ιεκδιρ (1984: 5). In the Damascene Psalter (Hopkins cites Violet) there is clear evidence for imaala, e.g. (jeeb Gr. γεβ ‘he brought’ (p. 8). Imala is characteristic both of CA and a number of modern dialects (see 5.3.1.3).

4.2.1.2 Morphology and syntax

In the verb one finds MPL suffixes used where the referents are clearly feminine, as in {aktb-w} ‘write-MPL’ in a letter addressed to females (1984: 92, early 2nd/8th). In many dialects a distinctive FPL inflection has been lost. Similarly, reference to the dual will be replaced by a plural pronoun (p. 94, 178/312), {w tħml-hm ʕla l-ħq} ‘and you bring them (not dual {–hma} ∗humaa) to the truth.’ In CA the negative particle lam requires that a verb be in jussive form, which entails shortening a medial or final long vowel. In the papyri shortening often does not occur. {f lm n-ṣyb} ‘we did not find’ (∗n-ṣb), {fa-lm-aʕṭyh} = ʔaʕṭiihu (∗ʔaʕṭihu) ‘I did not give him’ (1984: 83, 85). In all western Arabic dialects the MPL imperfect indicative suffix ends in –u rather than -uun as in the east. The papyri often display the western form. {kma tħbw} kamaa tuħibb-uw ‘as you would like’ (∗tuħibb-uwna) (p. 70, mid 3rd/9th). What is arguably the emblematic symbol for a distinction between CA (and therefore, Old Arabic) and dialects is the presence of case in CA, the lack of it in the dialects. The issue is long, involved and has many ramifications, and is therefore not treated in this book except in passing (see e.g. Al-Jallad and van Putten 2017 vs. Owens 2018c for contrasting views). Hopkins is quite categorical on the general lack of a functional case marking system in the papyri. This follows in part from the lack of short final vowels, since most case marking is effected via short vowel suffixes. The historical status of final vowels is discussed in greater detail in 12.3.3 and 12.4. More dramatically, since in contrast to the short vowels the expression of case will always be overt, this is apparent in those nominal forms where the case marking is expressed by a long vowel or by a consonant. In these forms case, if it exists, will be represented orthographically. Hopkins writes:

In common with the general trend of non-Classical varieties of Arabic, in both the dual and sound masculine plural there are no distinctions of case, the only genuine endings of these categories being reflexes of the CA casus obliquus. (1984: 99)

With certain minor exceptions… it is quite clear that the language treated in this study was characterized by the absence of a case system … (1984: 155)

²⁷ Arabic written in Greek, matres lectionis, writing of the alif al-waṣl are all direct or indirect indications for ascertaining the presence or absence of short vowels.

Thus, CA expects the following contrasts.

(4.13)
nominative    accusative/genitive
–uwna {wn}    -iyna {yn}            sound MPL
-aani {an}    -ayni {yn}            ‘dual’

nominative    accusative    genitive
abuw          abaa          abiy        ‘father’
axuw          axaa          axiy        ‘brother’

Instead of this full system, in many instances an invariable nominative occurs, and in others more or less arbitrary case forms not corresponding to their expected (CA) function. The following exemplify these.

(4.14) {mnha mn jazyt dynaryn}
Min-haa min jazyat raas-ak diinaar-ayn (∗diynaar-aan)
“two dinars thereof are for your poll tax” (113 H)

(4.15) {nħn mlzmyn}
naħnu mulzam-iyna
‘we are obliged’ (∗mulzamuwn, p. 107, 3rd/9th)

(4.16) {ʔn axwh}
ʔinna axuw-hu
‘that his brother’ (∗ʔaxaa-hu) (p. 159, 3rd/9th)

In (4.14) the nominative {dynaraan} is expected, in (4.15) the predicate should be nominative, {mlz-wn}, in (4.16) the particle ʔinna expects an accusative, not nominative. A second explicit case marker is a long /aa/, {a} ‘accusative’ attached at the end of a noun, the so-called “tanwin alif,” which actually represents two discrete morphemes, both accusative and indefinite (-n) e.g rajul-a-n ‘man-ACC-INDF’ (see 5.3.2.9). In the papyri Hopkins (1984: 162) writes “tanwiin alif may be absent in every syntactic environment in which it would have been obligatory in CA.” (4.17a)

{f ʔʕṭw dynar} fa-ʔaʕṭ-uu diinaar (∗ diinaar-an) ‘then give one dinar’ (p. 163, 90 H)

Equally, the tanwin-alif appears in contexts where it should not occur in CA, as in the following, where CA expects a nominative.

(4.17b)

{f hl bha lbn-a}
fa hal bi-haa labn-an
and Q in-it.F milk-ACC
‘Is there any milk in it?’ (∗ labn-un = {lbn}) (p. 170, 229 AH)

It is relevant to point out that developments which apply to many (not all) dialects are well attested in the papyri. For instance, the internal passive, which is common in the Arabian peninsula but not frequent outside of it, is well attested, e.g. {yqal} ‘it is said’ (1984: 71). For Hopkins the relation between the papyri and the dialects is mediated via Middle Arabic, which is “typologically akin to most of the modern colloquials” (1984: xlvi). Middle Arabic itself, as seen by scholars such as Larcher (2001) and Versteegh (2005), describes a genre, not a historical stage.²⁸ This view rescues the anomaly that a variety termed “Middle Arabic” should chronologically ante-date

²⁸ Thus Khan (2018) summarizes Judaeo-Arabic (beginning in the tenth century, see Wagner 2010), which is a variety less beholden to the strict norms of literary classical Arabic. However, he expresses the difficulty of considering Judaeo-Arabic texts as representing a direct witness to the spoken Arabic of the period: “not all deviations from CLA should be identified as the reflection of genuine dialect features. In some cases these deviations are pseudo-literary features which arise when the writer attempts to


THREE TYPES OF PRE- AND EARLY ISLAMIC SOURCES

the variety, CA, which is held to represent “Old Arabic” and to be marked by innovations relative to the nominally older but chronologically younger CA. By the same token, because it is a mixed genre containing both classical and nonclassical elements, it is a linguistic object sui generis. To the extent that it represents a speech community, it is a community of learned scribes. For our purposes its value in interpreting Arabic language history resides in its blatant contradiction of the linearity interpretation of Arabic history. At the very least its mixed nature indicates that from its earliest attestations there existed in Arabic different varieties which initially at least must not have been perceived as dividing the Arabic language into sanctioned and, as Ibn Faris might have judged them 350 years later, “blameworthy” (mađmuuma) varieties.

4.2.2 A case study, raw data, and deviation from CA

To give an “online” impression, complementing the structural summary in 4.2.1, of how CA-like the early papyri are, I summarize the points where the Arabic in two sets of early papyri deviates from CA norms. In App. 4.2 a second online analysis of complete running texts from the Qurrah Aphrodito is made. Here I treat the earliest papyri, beginning in 20/640 and running up to the year 55/675, that are available in the Munich online collection as of 2021. The purpose here is to give an indication of how the general divergences between the papyri and CA summarized on the basis of Hopkins (1984) play out in an “online” sample of texts. All non-orthographic deviations are noted. Otherwise the texts conform to CA, though as will be noted in a number of places, these texts equally conform in most ways to contemporary spoken Arabic as well. The context of all of the papyri is administrative missives, highly conventionalized in many ways, and hence, even allowing for the basic problem of orthographic underspecification (see 4.1.3.1), the range of linguistic structures at all levels is very limited. In order to give a concrete idea of the grammatical scope of the sample involved, for the papyri I have noted how many words²⁹ are in each text and how many verbs there are in each. There are, for instance, a total of 28 verbs (tokens), nine of which are the verb {šhd} šahida ‘witness’ and 12 of which are the verb {ktb} kataba ‘write’ in 3MSG form. These verbs repeat themselves across all of the letters because they designate the name of the scribe who wrote the letter and the witness to the event or transaction depicted in the letter (Wansbrough 1996: 114, 118).

Earliest 12 papyri from Munich collection (available as of 2020). The examples are identified by name as they are in the online corpus. The criterion of a “mistake” is how the form would be judged according to CA grammar. The “offending” form is marked in boldface.
avoid a dialectal feature but produces a form that does not exist either in his spoken dialect or in the CLA literary language” (2018: 156).

²⁹ {w} ‘and’ is counted as a separate word, but {f} ‘then’ and bound prepositions like {b} are not.


(4.18) Chrest. Khoury. 20/640–641 (Egypt, 30 words, no verbs)
{mn ʔhl ʔbw mʕrwf} (∗ CA {ʔby}): no genitive
‘From the people of Abu Maʕruf’

(4.19) P. Diem. (22/642–43) (Egypt, 12 words, no verbs)
{dynra w nṣf dynr-a} (∗ CA niṣf diinaar-in): accusative where genitive, which would be unmarked orthographically, is expected
‘one dinar and a half’

(4.20) P. Grohmann. (22/643) 62 words, four verbs, 1 1PL (ʔaxađnaa), others 3SG: {ʔxđ}, {ktb}, {ʔjzr-ha}
{ʔbn ʔbw qyr} (∗ CA ʔabiy = {aby}) (twice): no genitive
‘Ibn Abu Qir’

(4.21) P. Ragib. 42/662–3, 127 words, 10 verbs, all third person, {ktb} 3, {šhd} 5, {đkr} 2
Nominative wrongly combined with genitive (l. 2, 13)
{mlʔ θnaan w ʔrbʕyn} (∗ CA iθnayn): θnaan expects the genitive, not nominative as here. By the same token, whatever the reason for its use (e.g. hypercorrection), {θnan} is the CA nominative dual form.
‘fullness/measure amounting to 42’
Genitive wrongly combined with nominative (l. 5)
{mn snt ʔθnyn w ʔrbʕwn} (∗ CA iθnayn wa ʔarbaʕiin): ʔrbʕwn ‘forty’ expects a genitive, not the nominative as here (on the nominative, same remark as above)
‘from the year 42’
Nominative + nominative rather than genitive + genitive (l. 10)
{aθnan w ʔrbʕwn}

(4.22) P. Brunning. 44/664–665. 29 words, three verbs, all third person {šhd}, {ktb}, {nfʕ}
{θlθ dynaar} (∗ CA = θalaaθ-at danaaniir, which would be represented {dnanyr}): polar value of numeral should have –t ({θlθ-t}), and noun should be plural, not singular

(4.23) P. Tillier. 48/668 (Egypt, Fusṭat, 53 words, seven verbs, all third person), {đkr}, {šhd} 3, {ktb} 3
No mistakes
P. Grohmann. 643–674/ (Egypt, 13 words, one verb {ʔʕṭy})
No mistakes, but no complete sentences either.
P. Ness. 60a. 54/674, five words, no verbs
No mistakes, but text very incomplete

(4.24) P. Ness. 60b. 54/674 Nessana (Negev), 61 words, two verbs, 1 3PL perfect {aʕṭ-w}, {ktb}
{sbʕyn mda qmħ w mθlh zyta} (l. 7, 17) (CA {qmħ-a}): accusative (tamyiiz) expected on qamħ-an. Note that the accusative does appear on {zyt-a} as it would in CA.
‘70 measures of wheat and the same in oil’
{sbʕyn}, predicate expects nominative {∗ sbʕwn}

(4.25) P. Ness. 61. 55/675, 55 words, two verbs, 3PL, {fa-aʕṭw}, {ktb}
{sbʕyn mda qmħ w mθlh zyta}: both tokens exactly as in (4.24)
{stt w tsʕyn mda qmħ w mθlh zyta} ‘96 measures of wheat and the same in oil’ ({∗ tsʕwn}, parallel to (4.24))

(4.26) P. Ness. 62. 675, 49 words, two verbs, 1 3PL, other 3SG {f-aʕṭw}, {ktb}
{mda qmħ w mθlh zyta}: as in (4.25)

(4.27) P. Ness. 63. 55/675, 27 words, one verb (3SG) {ktb}
{mda qmħ w mθlh zyta} (twice) (as in [4.26]).
{tsʕyn} predicate expects nominative {∗ tsʕwn}
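The verb-token figures reported for this sample (28 tokens in all, nine of {šhd} and 12 of {ktb}) can be restated as a proportion. The following is a minimal Python sketch for checking that arithmetic; it is not part of the original analysis. The function and variable names are illustrative assumptions, and only the three counts are taken from the text.

```python
from fractions import Fraction

# Counts as stated in the text for the earliest 12 papyri.
TOTAL_VERB_TOKENS = 28
TOKENS_BY_VERB = {"šhd": 9, "ktb": 12}  # 'witness', 'write'


def share_of_tokens(counts, total):
    """Return the exact share of `total` taken up by the summed counts."""
    return Fraction(sum(counts.values()), total)


if __name__ == "__main__":
    share = share_of_tokens(TOKENS_BY_VERB, TOTAL_VERB_TOKENS)
    # Report the share as a whole-number percentage.
    print(f"{float(share):.0%} of verb tokens are šhd or ktb")
```

On the stated counts the two formulaic verbs account for 21 of 28 tokens, that is, three quarters of all verb tokens in the sample.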

In most of the texts there is at least one difference from CA. In only three are there no deviations, though two of these have a tiny number of words. By far the most common deviation is in case form, found in eight of the texts, largely confirming Hopkins’ assessment in 4.2.1 that the papyri do not show a functioning case system such as is found in CA. Wrong number polarity and plural form are two more error types. Tokens which do have the CA-type case marking are few, and there are some odd mixtures such as (4.21) above. The deviations from CA are certainly an interesting aspect of the early papyri. How systematic they are requires a dedicated variationist study comparing what deviates with what does not. The larger point to be made here, however, is how difficult it would be to have a thorough idea of what seventh- and early-eighth-century Arabic was on the basis of these texts alone. Seventy-five percent of the verb tokens in the first 12 papyri are the 3MSG of two verbs, {ktb} ‘write’ and {šhd} ‘witness.’ Even with the best and most generous of intentions, person-number-gender inflection or transitivity properties of verbs remain obscure. The variation in case defines a problem, but gives no answer to the question whether the variation reflects the breakdown of the system, the traditional answer, or whether there were speech communities which did not have a functioning system to begin with. The vast bulk of Arabic morphology and syntax cannot be ascertained in these texts, and underspecification at best leads to discerning multiple interpretations with no way to decide which is the best (see discussion of xabbar(a) (or ʔaxbara) in App. 4.2). Up to this period (55/675) it can be unequivocally emphasized that there is no written model from which we can derive CA as defined by Sibawaih. The case system,


the shibboleth to define the transition from Old to Neo Arabic, is nowhere in sight. There are forms which replicate what became typical case markers, e.g iθn-aan ‘2NOM’ in (4.21) but as often as not they are employed incorrectly by CA standards. Ostensibly this would be a problem for advocates of the Old vs. New dichotomization, since “ungrammatical” case usage antedates the normatively defined system in Sibawaih by over 100 years. New precedes Old. Of course, there are legitimate interpretive issues. It might be, for instance, that all the mistakes were committed by non-native Arabic-speaking scribes, which I find doubtful, however. It might be that this represents a case system breaking down, scribes reflecting the free variation which had overtaken the system as a whole. In this perspective the issue of where case came from gets thrown back into comparative historical linguistics, as, for instance, debated in Al-Jallad and van Putten (2017) vs. Owens (2018c). The data equally supports the interpretation of Owens (1998c and elsewhere), that there co-existed native Arabic populations or speech communities, some with a functioning case system, others without, and that the variation reflects what happens when formally different systems in contact need to come to a consensus about how to write the language. The fact that many of the early papyri were parallel Arabic–Greek productions even allows a fourth possibility. The case-marked original Greek text served as a model of the Arabic, which tried to reproduce, unsystematically as it turns out because Sibawaih had not yet arrived on the scene, an Arabic text with case endings. The answer to the question, the degree to which Sibawaihi’s linguistic thinking can be read directly as an interpretation of the written Arabic of the papyri, is obvious. 
Moving toward Sibawaih, from 96/714 written documents do become somewhat more numerous, the official letters tend to be more complete, the lexical range increases considerably, fuller inflectional paradigms can be deduced from the documents, and they are generally longer than those of the period described here.³⁰ Some letters, for instance the one from AD 735 described in Sijpesteijn (2004), contain narrative passages which describe events not inherently connected with the stated business that the letter is concerned with. Still, on the whole what we have are mainly official documents of limited stylistic range. They still give no direct detail of the phonetics and phonology of the language, they do not yield complete verbal paradigms with their underlying grammatical categories across the spectrum of basic and derived verbs, and while variation in the language can be discerned, the provenance of the variation is obscure. The linguistic information contained in them is of an order more detailed and better specified than the epigraphic record. It is, however, still incomplete. Certainly up to 96/714, and impressionistically from the papyri up to 133/750, the answer to the question whether one could derive from them CA as we know it from Sibawaih is “no.”

³⁰ For instance, in the Munich collection between AD 730–750/H 112–133, the era immediately before Sibawaih, there are 83 documents listed on the website.


To be explicit, the documents are extremely valuable linguistically and otherwise. Indeed, they are merely another attestation to the richness of data which exists in Old Arabic. It needs to be recalled, however, that they are but one element among a number providing interpretive fodder for a holistic account of Arabic language history.

4.2.3 From juridical and cultural koine to Classical Arabic?

Traditionally it is not a problem that a variety of Arabic lacking attributes which distinguish the Classical variety is attested well before CA itself. This discrepancy is general and not limited to the papyri (see 4.3, 4.4). The explanation for the discrepancy is that the papyri are written in the genre termed Middle Arabic (see 4.2.1.2, 1.1.1), a mixed CA–dialect genre. But a label is not an explanation. There are two relevant questions in this case. The lesser of them is how to explain, if CA represents proto-Arabic or something close to it (Al-Jallad and van Putten 2017), where there was room for a rather different variety, established well before CA, and normalized enough to constitute a rather large body of written texts. Looking at this from a historical linguistic perspective, the question is how, to the extent that there are discrepancies between the early language of the papyri (and other sources) and CA, the discrepancies are resolved. It is worthwhile remembering that the chronological age of an attested variety does not necessarily imply that it is older in a comparative linguistic sense. Akkadian is the oldest attested Semitic language, yet in a number of respects it displays innovations relative to later-attested Semitic languages, for instance ∗ ʕa/ħa → ʔe, ʔe (or simply /e/, see 6.5), which is generally lacking in other Semitic languages. In this book I will only sketch the direction in which an answer might be found. While Kaplony might be faulted for not directly addressing the question of the historical linguistic status of discrepancies between his Documentary Arabic and his “Grammatical” Arabic (CA), an adequate answer to the question has ramifications which would lead to a study in and of itself. Basic elements of the question are given here.
The starting point of this summary exposition is Wansbrough’s (1996: 142) idea of a “juridical and cultural koine.” He argues that eastern Mediterranean literacy is marked by a formulaic matrix common throughout legal and business communication from Akkadian in the third millennium BC all the way to early literary Arabic, an expanse of some 3,000 years. Wansbrough’s koine (1996: 97–124) is an organizational matrix, not a discrete language variety. The matrix defines the order in which the elements of a letter appear, and the content of each category. A letter will begin with an “invocation” invoking a divine author (e.g {b-sm allh} ‘in the name of God’), mention of the originator of the letter (intitulatio) ({mn qrh bn šryk} ‘from Qurrah bin Shariik’), a devotional expression followed by the name of the addressee (in case of Qurrah letters, these reversed, {ʔla ṣaħb ašqwh}


‘to the Sahib of Ashqawh,’ {f any aħmd llh alđy la alh ala hw} and so on (the protocol), then to the message, with its sub-parts and the closing section (eschatocol). The message itself (with preamble, contextualization in previous correspondence, content of letter) was often shorter than the formulaic eschatocol and protocol. In the Qurrah letter referenced here (Abbott 1930: 43) the introductory protocol and closing eschatocol comprise (with overlap) 10 lines, the message eight. This legalistic koine is language neutral. Wansbrough discusses its manifestation in inter alia Akkadian, Ugaritic, Phoenician, Hebrew, Aramaic,³¹ and Arabic. He does identify certain expressions which are either direct translations or approximate calques found at opposite ends of the region, e.g. Ugaritic {il-m t-gr-k t-šlm-k} ‘may the gods protect and give you peace,’ Arabic abqaa-ka allaah ‘may God make you endure.’ The koine, however, is identified largely by its unitary conventionalized format. Viewed in these terms the earliest extensive genre of Arabic writing at our disposal was based on this juridical matrix. The genre is functionally highly circumscribed, tailored to trade, commerce, and legal matters. The content of all the papyri summarized in 4.2.2 above falls into these categories. As Kaplony (2018) points out, many early papyri were bi- or multilingual, for instance the Qurrah Aphrodito written in Greek and Arabic. The newcomer, Arabic, modeled itself on the Greek. The first substantive corpus of early Arabic written in Arabic script therefore started out in an eastern Mediterranean matrix. As noted in 4.2.2, it is hardly possible to understand Classical Arabic as a natural outgrowth of Documentary Arabic. Indeed, Kaplony notes that the genre of Documentary Arabic continues even after CA became well defined through the works of the grammarians, a diglossia within diglossia, as it were.
Aside from the grammatical differences, which are the main interest here, CA is vastly more expansive than is Documentary Arabic. If the papyrological corpus is understood as the continuation of a legal and commercial genre permeable across linguistic, political, and cultural regions, it is the development of CA which requires explanation. Indeed, my own answer will be brief, but it is relevant to programmatically identify major linguistic aspects of the issue. In principle the development of the Arabic grammatical tradition, which gave rise to a codified and structurally well-defined Classical Arabic, cannot be treated independently of the development of Arabic-Islamic culture in general. By the same token, attempts to draw a direct causal connection between Islamic institutions and the rise of Classical Arabic have been problematic, or have simply been left hanging for 50 years. Notable here is Carter’s (1968/2016, 1973, 2004: 50–1) insistence that Sibawaih was fundamentally a legal scholar who applied established legal terminology to develop linguistic categories and analysis. Waqf, lit. the verbal

³¹ See Gzella 2017: 184 on Aramaic–Farsi matrix.


noun ‘stopping’ may be used as a legal term (‘endowment’) or as a theological one (‘suspending judgment’) as well as the linguistic ‘pause’ discussed in 12.3.3 below (Carter 2004: 54). This multifunctionality says nothing inherently of its disciplinary origin, other than that the scholars of the period drew from a common metaphorical source (Makhzumi 1958, see Edzard 1998: 37). From the fact that Bloomfield wrote a chapter on morphology (chapter 13 in his Language) it would be incorrect, as we know, to deduce that he had once been a student of biology. Carter’s interesting idea was never developed beyond an initial hypothesis, however. In telegraphic form, major problems are the following.

• Sibawaihi’s Kitaab antedates all extant legal treatises. The Muwaṭṭaʔ of Malik ibn Anas (d. 179/795) is a contemporary of Sibawaihi’s Kitaab, but Sibawaih would have had to have worked immensely quickly to ascertain the principles of legal scholarship, and then write a dense, 1,000 page grammar based on them. Moreover, the style and content of the Muwaṭṭaʔ hardly allows direct interpretation as a work on grammar.
• No mechanism (educational institution, common teachers/colleagues) has been shown through which the legal methodology might have been transmitted to the linguistic.
• The basic alternative, raised as noted above by Makhzumi (1958), that grammatical and legal institutions developed from a common source, has never been addressed in Carter’s model. In Makhzumi’s model, while drawing on a common intellectual discourse, the disciplines would have gone their natural divergent paths.³²
• Versteegh (1977: 14–16) points out that the use of ethical vocabulary in Sibawaih to describe correct language (qabiiħ ‘ugly,’ radiiʔ ‘bad,’ mustaqiim ‘correct,’ etc.), which Carter derives from Sibawaihi’s supposed legal background, was common throughout the pre-Islamic Middle East (e.g. Byzantine–Greek traditions). Versteegh observes, for instance, that treatises about “virtues and vices in speech” were common in the Byzantine cultural period.
• As discussed at length in 3.2, it is hard to derive Sibawaihi’s phonetic and phonological theory from legalistic vocabulary or methodology. Such terms as voicing (mahmuus/majhuur), emphasis (iṭbaaq), and place of articulation (maxaarij al-ħuruuf) have a basis in articulatory phonetics (of any culture), not legal thinking.

If Sibawaihi’s Kitaab has not been shown to derive from legal thinking, an alternative perspective would be to see it as an outgrowth of a textual religious exegetical tradition. Here too, however, the evidence does not bear out the idea. Versteegh

³² Of course, later the two did converge in the meta-discussion of the ʔuṣuwl, but this was well after the grammatical tradition had become a fixed institution.


(1993) examined the earliest Koranic exegetical works, those of Muqaatil ibn Sulayman (150/767), Mujaahid ibn Jabr (104/722), Sufyaan aθ-θawri (161/778), Zayd ibn Ali (122/740), Saʔib al-Kalbi (146/763), and Maʕmar ibn Rashid (153/770). These are mainly from about a generation before Sibawaih (177/789), though some of these, such as Zayd ibn Ali, are known only through reports of their work in later writers (see e.g. Versteegh 1993: 115 on Muħammad al-Kalbi for other problems). The goal of these writers was to explain the significance of Koranic verses for everyday life, give background information regarding whom the verses were about, resolve apparently contradictory verses, mention variant readings, and explain individual words in the Koran. Linguistic analysis was not one of their explicit purposes. A number of terms were used which did become embedded in the ALT, though Versteegh attributes to the linguists the task of imbuing them with technical content. The linguistic terminology in the pre-Sibawaihian exegetes was “largely non-technical” (Versteegh 1993: 95). After his detailed summary of the six exegetes Versteegh concludes:

The grammatical terms they used were mainly non-technical devices to refer to various aspects of the text, namely those that were indispensable for its exegesis. (1993: 196)

Grammar, to the extent it was (implicitly) evoked, was employed in ad hoc functional explanations, grammatical analysis as such not being these exegetes’ purpose. Against the hundreds of technical terms in Sibawaih, Versteegh finds only 24 terms in these works, whose linguistic content, however, still needed to be fleshed out by the linguists. Thus, contrary to a popular conception (see e.g. Owens’ [2019c] criticism of Sadan and Kasher 2018 and agreeing with Carter 2004: 44), the grammatical tradition did not obviously derive directly from a religious-exegetical one. Nor is it the case that any of the fixed sources commonly quoted in the Kitaab (Koran, poetry, proverbs, and ħadiiθ, the last rarely cited in any case) explain in any way the theoretical structure and linguistic detail of the Kitaab (Larcher 2020, chapter 5). These were raw material for his linguistic thinking. Recalling that it is Sibawaihi’s Kitaab which defines Classical Arabic grammar, the mystery remains how there emerged, one generation after a linguistically unsophisticated exegetical tradition with a functional but rudimentary literary apparatus in the background, one of the greatest grammars ever written. While my answer is simple, if tautological, it is one which is strangely ignored among Arabicists. This is that Sibawaih ascertained or invented simple but fundamental linguistic principles and applied them assiduously and consistently to virtually all domains of Arabic.³³ Ironically then perhaps, Sibawaih built a formal system in

³³ I think reluctance to accept this basic point is due to two main factors, one very legitimate, the other a reflection on the history of Islamic studies in the West. It is wholly legitimate and necessary, as in


which multifarious intellectual activity could be encoded and categorized, even if the system itself was based on universal principles lying outside any single cultural era (see Owens 1988). From this perspective it would be ontologically misleading not to recognize the stand-alone status of Sibawaihi’s accomplishment. One answer then to the question why Documentary Arabic differs to a degree from Classical Arabic as defined by Sibawaih, and why it is not possible to derive Classical Arabic from the language of the papyri is because the conventions of Documentary Arabic were established, following Wansbrough, before Arabic itself became a language intimately bound up with a developing and systematizing Arabic–Islamic culture. Documentary Arabic is narrowly functional and organizationally highly formalized. Sibawaih, on the other hand, was interested in an adequate linguistic representation of Arabic. What use, for instance, would Documentary Arabic have for the complex articulatory phonetics of Sibawaih described in 3.2 above? Similarly, Sibawaihi’s contributions were descriptive, analytic, and theoretical, far removed from the applied linguistic horizon of Documentary Arabic. This only establishes that two different motives were at work in these two early sources of Arabic. An immediate historical linguistic question is in the case of discrepancies, how they are to be interpreted. As seen in 4.2, a major characteristic of the papyri is that there are, as Hopkins puts it “no distinctions of case.” Case on the other hand was one of the main preoccupations of Sibawaihi’s Kitaab. As I have said above, the question of case endings in CA is too expansive an issue to deal with in this book (see e.g Owens 1998c, 2006/2009: chapters 3, 4, 2018c). However, we should be clear that there are two aspects to the historical status of case in Arabic. For one, the papyri (as well as other early sources, see e.g. 
4.3 below) simply provide further evidence for my contention (argued for first in 1998c) that the contemporary dialects continue an older, caseless form of Arabic. This is the easiest explanation for the lack of case in contemporary Arabic and says nothing about the status of case in CA. While efforts have been made to “explain” how the contemporary dialects came to lose case marking (in particular Al-Jallad and van Putten 2017), these explanations are convoluted, contradictory, and ultimately

Versteegh’s work discussed above, to seek the ultimate origins and inspirations of a work so important as the Kitaab. Besides Versteegh and Carter, in this respect one thinks of Talmon (2003) and Baalbaki’s early work (e.g. 1981). On the other hand, I see in the reluctance to embrace Sibawaih the linguist a deep-seated mistrust of contemporary Linguistics (however conceived) among Arabicists, Islamic studies scholars, philologists, and historians, coupled with a conviction that the custodians and conceptualizers of so central an icon as Classical Arabic should be historians and interpreters of Islam, not linguists. If the study of Arabic and the Arabic tradition should serve mainly to inform the cultural history of Arabic and Islam, how can the originator of this tradition, Sibawaih himself, not have a motive ulterior to his detailed linguistic description? Al-Jallad’s remark (2020: 1 n. 4) is typical: “Owens, A Linguistic History of Arabic, p. 93, for example, has argued that there are no pure data to be found in the Kitāb of Sibawayh and that everything he writes or observes is filtered through his grammatical thinking. This extreme view, however, remains a minority position.” Ignoring the demoscopic irrelevance of the remark, Al-Jallad’s response is to ignore Sibawaihi completely in numerous key points, including, crucially, Sibawaihi’s treatment of imala, which is a weak link in Al-Jallad’s attempt to motivate an “Old Hijazi” Arabic (see discussion in 5.2.2, 5.3.1.3, App. 5.2.2).

4.3 GREEK ORTHOGRAPHY, BILINGUALS, GREEK RENDITIONS


of little comparative linguistic substance (Owens 2018c, see discussion in 12.3). Of course, this does not rule out that it may be possible to develop a plausible historical linguistic argument, but until such time the simple explanation stands. For the other, the question is where case came from in Sibawaihi’s Classical Arabic. This is a question I will simply defer here. However, it would be grave reductionism to think that Classical Arabic is defined by case alone. Leaving this one domain aside, when the entire range of Sibawaihi’s descriptions is included in “Classical Arabic,” for instance all four –ki variants discussed in 3.2, the differences between CA and the contemporary dialects diminish considerably. To end this section I should add that it is not suggested that Arabic poetry, the Koran, or other fixed sources are irrelevant to the emergence of Classical Arabic. The result of Sibawaihi’s intellectual achievement was to provide a matrix in which these could be analyzed grammatically. In this respect it is fair to assume, even if circularly, that part of Sibawaihi’s motivation was to develop a linguistic framework for analyzing these sources. Sibawaihi’s individual genius would not have been called upon if the larger context of Arabic–Islamic learning had not asked for it. However, from my contemporary linguistic perspective, the ultimate success of Sibawaih (and the ALT in general) was in anchoring his thinking in universal linguistic concepts and methodologies, creating an instrument, grammar, infinitely extensible to the analysis of texts (e.g. poetry, Al-Siiraafiy’s Sharħ ʔAbayaat Sibawaih), employment in pedagogy (al-Sarraj, ʔUṣuul, cf. Zajjaaji’s ʕilal taʕliimiyya ‘pedagogical reasons’), parsing of Koranic verses (attr. Zajjaj, ʔIʕraab al-Qurʔaan), and generally supporting the vibrant intellectual tradition which characterized Arabic–Islamic culture.

4.3 Greek orthography, bilinguals, Greek renditions of Arabic names

The last early source of Arabic which I examine here comes from Greek texts. These fall into two main types. The less common are simply a “transcription” in Greek of the Arabic/Safaitic. The more common are personal Arabic names rendered in a Greek text. Besides the fact that some of these are pre-Islamic, the potential value of these is that they might give insights into the precise pronunciation of consonants whose interpretation is for some reason problematic in the Arabic/Safaitic script (see discussion around 4.1.3) and provide information about short vowels, which is otherwise nearly always lacking. Kaplony (2015) examines the Arabic material in Greek orthography in three sets of texts.

(4.28) Arabic represented by Greek orthography
Nessana Documents    6th–8th century      southern Palestine (Negev)
Petra Documents      6th century          Petra (Jordan)
Qurrah Letters       early 8th century    Egypt (see App. 4.2)


Summarizing these, Kaplony (2015: 2) notes that each of these sources was embedded in a multilingual environment. The Petra Documents and Nessana reflect an Arabic-speaking populations writing in Greek and using Aramaic in church, with a transition to Arabic becoming a written language among the Nessana Documents. The Qurrah letters come from a Coptic-speaking population who used Greek, Coptic and Arabic as literary languages. For basic exemplification, the following bilingual name from the Nessana Documents, written in Arabic and Greek serves for discussion. (4.29a) is the Arabic in transliteration (4.29b) the Greek transliteration of this name, and (4.29c) a roughly phonetic transliteration of the Greek. In (4.29c) a Greek word is in italics. (4.29d) shows Arabic sounds altogether missing from the Greek transliterations, paired against (4.29e) that fills in the gaps in the Greek transliteration on the basis of CA. (4.29)

Interpretations of a Greek transliteration
(a) ʕbd ʔlʔʕlY bn ʔby ħkYm ‘abd alʔAʕlaa, son of Abi ħakiim’ (Kaplony 2015: 3)
(b) αβδ ελαλε νιo αβι αχιμ
(c) abd elale vio abi akim
(d) Øabd el-ØaØle Øabi ØakiØm
(e) ʕabd l-ʔaʕlaa ʔabiy ħakiym

In contrast to the Arabic, the Greek shows short vowels /a/ and /e/. The latter is interesting in suggesting an imala vowel (/aa/ → /ee/ or /ie/ in {elale}) though already this suggestion illustrates the problematic nature of even the Greek underspecification. Imala has a long mid vowel, but it might be /ee/ or /ie/ (Owens 2006/2009: chapter 7). Alternatively it has been suggested (van Putten 2022: 134) that the alif maqṣuura (‫ )ى‬is originally an -ee. This is an issue not treated in this book (though see discussion in 5.2.2 and App. 5.2.2). (4.29) is also interesting in suggesting the existence of a functioning case system in the genitive abi. However, Kaplony (2015: 3) notes that in the entire corpus of about 500 Arabic words there is no Arabic case flexion. The only trace of this is ab-i, written in filiation with a father’s name, otherwise abu. That this is conventionalized through context and not an interpreted case system is shown by the fact that abu will occur in contexts where Classical grammar requires a genitive, so long as it is not part of a filiation, e.g. “by the hand of Abu Zunayn” (written in Greek αβoυ {abou}, not as expected in CA with genitive αβι). It is equally clear, however, that the problem of underspecification met with in Safaitic is in evidence here as well. The Arabic script (4.29d) conventionally contains 14 consonants, {ʕbd, lʔla, ʔby, ħkym}. The Greek, however, contains only seven consonants. Three of the Arabic letters represent long vowels, -a on ʔaʕlaa, -y on ab-iy and the /y/ of ħakiym. Each of these is represented by a Greek


vowel letter, either ε or ι, so one can provisionally say that the 14 Arabic consonants are represented in 10 Greek letters. Of course, even this is problematic. {ι} here stands for an Arabic long vowel, whereas it might also represent a short /i/ (kasra, Al-Jallad 2015b: 32) a /y/ as well as /ii/ (/iy/) and various diphthongal offglides (Kaplony 2015: 6, 9). One sees the issue with ε, at the beginning of ʔaʕlaa representing short /a/ (probably, but perhaps /i/) and at the end long /ee/ (or /ie/). This still leaves a discrepancy of four consonants, and here the issue is even more serious. The glottal stop (ħamza) at the beginning of ʔab- is missing. This might reflect actual pronunciation, though there is no way to determine this. The two missing /ʕ/’s (ʕayns) and the missing /ħ/ suggest something more systematic, and in fact Kaplony (2015: 6) notes that the pharyngeals are never unambiguously represented in the Greek transliterations. Of course, we do know what /ʕ/ is phonetically in Arabic, so it is possible to fill in what is missing in the Greek transliteration. Ironically though, it is the Greek here that needs to be reconstructed, which underlines how easily the pre-Classical data turns in a circle. Thus, these early Greek bilingual texts and excerpts give us only very conditionally a better fully specified insight into pre-Classical Arabic than does the Safaitic epigraphy.³⁴ I will look at this issue in greater detail, concentrating on the Arabic consonants. The lack of one-to-one correspondences in (4.29) is not because of an inherent mismatch between Greek and Arabic orthography. There are 24 individual letters³⁵ in the Greek alphabet, 28 in the Arabic, excluding the alif /aa/. This is a discrepancy which favors many-to-one mappings from Arabic to Greek, but only slightly and once the pharyngeals and laryngeals are ignored, a one-to-one mapping or something close to it should have been possible. 
Had scribes been concerned with the issue, they might, for instance, have distinguished Arabic {k} and {x} as χ (chi) vs. ξ (xi). ξ was made use of, though to represent the Arabic sequence {xš} or {xs} (Kaplony 2015: 5), in this case the Greek mapping close to its phonetic value onto a relatively infrequent Arabic sound sequence.³⁶ Instead, all in all there are nine one-to-one mappings, while seven Greek letters are deployed to represent 15 Arabic sounds. In all of these cases, then, it is only possible to unequivocally discern which sound the Greek letter should represent by referring to the original Arabic sound which the Greek letter is supposed to elucidate. Table 4.10 summarizes the situation according to Kaplony.

³⁴ Alif, /aa/ is not represented in the Greek transliterations. For a modern Egyptian Greek rendition of Arabic /ħ/ as /x/ see Hassan 2020: 88.
³⁵ Digraphs are excluded here, e.g. oυ for /u/.
³⁶ In Wehr (1976) there are hardly five lexemes which would regularly yield a {xš} word initial sequence on the basis of regular morphological alternations, e.g. xašaa, ya-xšaa ‘fear.’


Table 4.10 Representation of Arabic consonants with Greek alphabet (Greek to Arabic)

Zero to one: no Greek representation of Arabic ʔ, ʕ, ħ, h (i.e. ʔ, ʕ, ħ, h are missing in Greek orthographic representation)

One to one (9/9): b, q, n, w (oυ), l, m, n, r, f (nine Greek letters to represent nine Arabic)

One to many (7/15): seven Greek letters to represent 15 Arabic letters
• {χ} = k, x = 1 to 2
• {τ} = ṭ, t = 1 to 2
• {γ} = j, ɣ = 1 to 2
• {σ} = s, ṣ, š = 1 to 3ᵃ
• {θ} = t, θ (ḍ) = 1 to 2 + three Greek letters representing ḍ
• {ζ} = z (đ, ḍ) = 1 to 1 + representing ḍ and đ
• {δ} = d (đ, ḍ) = 1 to 1, plus representing đ and ḍ

Many to one (2/1):
• (η, ει) = yᵇ
• (as well as {θ, ζ, δ} = ḍ, {ζ, δ} = đ, {τ, θ} = t)

ᵃ As a reader points out, some data sets consistently disambiguate with letter sequences, e.g. σζ in the Qurra documents (Egypt) is used to represent š.
ᵇ Kaplony is ambiguous on this point. It appears he intends here a consonantal value of /y/. {ι} also represents orthographic {y}.
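As a rough illustration of how the one-to-many rows of Table 4.10 produce ambiguity, the following Python sketch inverts the Greek-to-Arabic correspondences and counts the candidate Arabic readings of a Greek spelling. Only the seven one-to-many rows are encoded, with the parenthesized values included. The names `GREEK_TO_ARABIC`, `invert`, and `count_readings` are illustrative assumptions and not part of Kaplony's analysis.

```python
from collections import defaultdict
from math import prod

# One-to-many rows of Table 4.10: Greek letter -> Arabic values it may stand for.
# Parenthesized values in the table, e.g. (đ, ḍ), are included here.
GREEK_TO_ARABIC = {
    "χ": ["k", "x"],
    "τ": ["ṭ", "t"],
    "γ": ["j", "ɣ"],
    "σ": ["s", "ṣ", "š"],
    "θ": ["t", "θ", "ḍ"],
    "ζ": ["z", "đ", "ḍ"],
    "δ": ["d", "đ", "ḍ"],
}


def invert(mapping):
    """Return Arabic value -> set of Greek letters that may represent it."""
    inverse = defaultdict(set)
    for greek, arabic_values in mapping.items():
        for value in arabic_values:
            inverse[value].add(greek)
    return dict(inverse)


def count_readings(greek_letters):
    """Count the Arabic consonant strings compatible with a Greek spelling.

    Letters outside the table are treated as unambiguous (one reading),
    which is a simplifying assumption of this sketch.
    """
    return prod(len(GREEK_TO_ARABIC.get(g, [g])) for g in greek_letters)


ARABIC_TO_GREEK = invert(GREEK_TO_ARABIC)
print(ARABIC_TO_GREEK["t"])              # Greek letters that can stand for Arabic t
print(count_readings(["θ", "β", "θ"]))   # candidate readings of a θ-β-θ skeleton
```

The inverse mapping corresponds to the "many to one" note in the table (several Greek letters for one Arabic sound), while `count_readings` shows how quickly the number of candidate interpretations multiplies when no Arabic original is available for reference.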

Excluding the laryngeals and pharyngeals, most of the Arabic letters at least have a dedicated Greek equivalent, even if the mapping is two (e.g. gamma) or three (e.g. sigma) to one. However, the idea that uniqueness at some level of precision was intended is contradicted by the fact that three Arabic sounds lack a dedicated Greek counterpart. {ḍ} can be represented by any of three Greek letters (ζ, θ, δ) and Arabic {đ} (đaal) by two, (ζ or δ), but each of these two Greek sounds also represents another Arabic sound. Arabic {z} for instance, is uniquely represented by ζ. {τ} exclusively represents {ṭ}, but it can also represent {t}, so Arabic {t} is represented either by {τ} or {θ}. All of these one to many Greek to Arabic mappings and other discrepancies have been interpreted in various ways—how the Greek sound behind the letter might be pronounced, which Arabic variant was intended—but in the end such explanations, though interesting certainly, are speculative. Turning briefly to the vowels, the Arabic bilingual texts do not themselves represent short vowels since the Arabic signs for the short vowels had not yet been invented. Thus the only way to interpret the Greek representation of short vowels, e.g. fatħa, ◌́ = {α} or {ε}, is by reference to a later era in Arabic (ca. AD 700) when full vocalization was introduced. Long vowels and diphthongs show the same many to one mappings seen for consonants above, e.g. the Arabic diphthong {aw}, appears as {αυ}, {αoυ}, {oυ}, {ω}, or {o}, a one to five mapping, or five to one from the Greek perspective (Kaplony 2015: 9–10). However, the Greek values are in a sense meaningless unless one assumes one already knows that they were


meant to represent an Arabic sound whose explicit attestation is ensured only a century or more later. For instance, {oυ} for {ρoυζικo} rizq, “wage” is interpreted by Kaplony (2015: 37) as [ruziq], where {oυ} “means” /u/, not /aw/, whereas in {θoυαβ} it receives the interpretation /aww/ = /tawwaab/ (Kaplony 2015: 25). The interpretation of {oυ} requires context, but the context is provided by the Arabic word it is supposed to elucidate in the first place. Kaplony (2015: 4) states that the above set of correspondences covers about 90% of all cases, implying that adding the further 10% would increase the ambiguity. Indeed, among items in Kaplony’s lexicon are found, for instance, interpreted [x] written with Greek γ, muxalliṣ ‘Savior’ μωγαλλις, and interpreted Arabic [x] with corresponding Ø in the Greek rendition, xaluuṣ ‘city in Palestine’ (Kaplony 2015: 36). Looking further afield than the three sources used in Kaplony, sigma (σ) can also represent {ḍ} (Al-Jallad 2015b: 25). A further problem resides in the interpretation of the Greek phonetics. Kaplony (2015: 12) observes that in the seventh century the Greek aspirated stops χ, φ, θ might represent aspirated stops, kʰ, pʰ, tʰ, but equally might represent Hellenistic Greek, which had fricative values for these sounds [x, f, θ]. Given the alternative spellings for ‘lastingness’ {θαβαθ/θεβετ} (2015: 24), toggling through all combinations of aspirated/fricative interpretations yields [tʰabatʰ], [θabaθ], [θabat], or even [tʰabaθ]. While these are renditions based on an interpretation of Greek pronunciation of Greek, it happens that the first (see 4.2.1.1) and third approximate to known variants in Arabic.³⁷

In sum, then, there is underspecification because of:
• Lack of Greek equivalences to Arabic sounds altogether
• Ambiguity due to one to many and many to one Greek to Arabic mappings
• Ambiguity in interpretation of Greek historical phonetics
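The toggling between aspirated and fricative readings described above for {θαβαθ} can be sketched as a simple enumeration. This is a minimal illustration only: the function name `readings` and the convention of marking aspiration explicitly as tʰ on every aspirated stop are assumptions of the sketch, and the vowels and β are taken as already transliterated.

```python
from itertools import product

# Two candidate values for Greek θ named in the text:
# an aspirated stop or a fricative.
THETA_READINGS = ("tʰ", "θ")


def readings(template, slot="θ", options=THETA_READINGS):
    """Expand every occurrence of `slot` in `template` with each option."""
    parts = template.split(slot)
    n_slots = len(parts) - 1
    results = []
    for choice in product(options, repeat=n_slots):
        pieces = [parts[0]]
        for value, rest in zip(choice, parts[1:]):
            pieces.append(value + rest)
        results.append("".join(pieces))
    return results


# {θαβαθ} with θ left open at both ends of the word.
for form in readings("θabaθ"):
    print(form)
```

With two open positions and two candidate values each, the sketch returns four forms. Choosing among them still depends on knowing which Arabic variant the spelling was meant to render.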

³⁷ To add to the problem, the AP form is interpreted as [saabit] on the basis of {σαβιτ} (2015: 24), which is either a further instance of one for many, {σ} for [θ], or a very early attestation of the *θ → [s] shift found in certain registers in Levantine and Egyptian Arabic, and a further scattered range of dialects (see 5.3.1.5).

4.4 Language change and socio-demographic realism

In the final section of this chapter, I would like to add one more critical dimension to parameters for interpreting Arabic language history. This is the issue of realism in interpreting the primary data. Socio-demographic realism asks that results of reconstruction be correlated with what we know of the populations to whom the varieties are ascribed. A case in point is /k/ palatalization discussed at length in 3.2. There it was argued that *k → č innovated in the Middle East and was diffused outward into North Africa and Khorasan. Leaving aside theoretical issues of parallel independent development which are discussed in the next chapter, the suggestion is socio-demographically plausible given that we know Arabic-speaking populations left the Middle East and moved into Algeria, into Khorasan, and other areas in which *k → č is attested. An ensuing typical socio-demographic split in the great bulk of data which is examined in this work is of the type shown in Figure 4.1.

Figure 4.1 Typical split (population A with *k; branch A retains k, branch B innovates č)

Innovations occur in some segments of the population, the “B” branch, but not in others, which maintain the original feature. Socio-demographic realism does not give automatic solutions for innovations. A case in point is Aramaic š = Arabic s in certain loanwords, described in 4.1.7. Retsö argues that the history of a word such as Arabic masiiħ originating in Aramaic mšiiħ goes back to an era when Arabic s1 = PS š, was /š/ (see Table 2.2). When Arabic *š shifted to /s/, it took mšiiħ (→ masiiħ) with it. The alternative proposed here (in principle following Diem/Wansbrough) is that bilingual scribes understood Aram. mšiiħ as follows: Aram. /š/ = Ar. /s/ (šm = ism ‘name’) and made the conversion in their translations. Both solutions are socio-demographically plausible. I think what speaks for my preference is that the words are largely restricted to a learned vocabulary relevant to the emerging Islam, i.e. the type of words which learned scribes would have dealt with. A second issue is defining when the general shift PS *š → s occurred in Arabic. Were it long before 622, the preservation of just a set of learned words into Islamic times is less plausible than understanding them as introduced at the time Islam itself was emerging. Finally, socio-demographic realism may run into dead ends. A case in point is the Safaitic interpretation discussed in 4.1.5.2, Safaitic *ɬ < PS s2. Al-Jallad suggests that the phonetic/phonological value of Safaitic was a voiceless lateral fricative, [ɬ] (see discussion in 4.1.5.2). Al-Jallad takes this value to be the PS value of the sibilant sometimes represented as s2 (see Table 2.2 in 2.1), and he thinks that this was maintained into Safaitic in the case of Arabic. Even if Al-Jallad’s reasoning may be questioned (4.1.5.2), from a purely philological perspective, an assumption of PS *ɬ is possible, but so too is š. The reality of underspecification leaves us hanging.
Philological evidence supporting one interpretation or another, such as relic tokens, borrowings, odd phonological correspondences, or compelling evidence from within Safaitic texts, is totally lacking. Whereas in the previous two cases the source values (*k/*š) are certain, in this case there are no Arabic varieties and no instances of earlier borrowings where an unequivocal *ɬ is found. It was


argued that interpreting Safaitic s2 as [š] enjoys the plausibility of linking the two Safaitic values s1 and s2 directly to their later reflexes in siyn س and shiyn ش, though such cannot be considered a definitive argument. In this case Edzard’s point (1998) may have been reached where no definitive historical linguistic solution is possible.

4.5 An interpretive record

This chapter has taken a critical look at problems in interpreting the comparatively rich pre- and early Islamic resources. The epigraphic record, the early papyri, and Greek transliterations of Arabic names are valuable sources, but none are stand-alone witnesses to early forms of Arabic. At one point or another these sources always need reconstructive interpretive aid from later corpora, from later grammatical descriptions, even from the evidence of modern dialects. Although the issue was already pointed out by Brockelmann (see n. 9, this chapter), it has received less meta-interpretive attention from scholars than is its due. As was touched on in 4.1.5, this situation requires careful assessment of all explanations. What is the best categorical explanation for the -n- in {y-ʕwr-n-h} ‘he effaces it’ (see (4.7)), a reflex of the CA energic or, as argued for here, of the automorphemic intrusive -n? An answer, one way or another, will align Safaitic either with CA (energic) or with contemporary dialects (intrusive -n), with all the interpretive nuances associated therewith. The discussion in this chapter therefore leads in a more direct line to that of the next, reconstruction, than is usually recognized in the history of Arabic. Early Arabic sources mean sources underspecified in one way or another. Their citation more often than not will come with caveats attached. They constitute an invaluable insight into the basic structure of Arabic in pre-Islamic times, and in the lexicographic domain represent an important stand-alone source for comparison across different varieties and languages (e.g. recently Borg 2022). Taken as a whole, the fact that independent reconstruction and correspondences with the early grammars of Arabic allow a highly plausible ascription of detailed value to underspecified orthography coincides with one main argument of this book, namely that alinearity is one characterizing trait of Arabic.

PART II

RECONSTRUCTION

Without reconstruction there is no historical linguistics. Arabic presents a particularly propitious testing ground for issues in reconstruction because much of what can be reconstructed can be correlated with the linguistic attestations, however imperfect, stretching back across the earliest period, with population movements and socio-historical events over the past 1,500+ years. This state of affairs makes Arabic as interesting for historical linguistics as it does offer transparency to the history of the Arabic language itself. This transparency, however, comes at the “cost” of considerable complexity.

5 Punctuation and language history I/I + D, inheritance/innovation, and diffusion

Essential principles of comparative historical linguistics have not changed since the nineteenth century when the discipline developed on the basis of an explicit methodology. Languages, it is assumed, change through time and these changes can be defined as basic linguistic events happening to linguistic forms. The events are described in familiar linguistic contexts. The format of the event is an input environment, a change of some sort, resulting in an output. Each of these elements requires its own justification. I proceed directly to two examples, one very simple, the other rather complicated, so each element can be ascertained concretely.

5.1 Basic concepts, basic exemplification: The I/I + D paradigm

The boundaries between the formal representation of a historical linguistic change and a synchronic morphophonological alternation are porous. In this section I discuss two examples. In the first, the formal change postulates the historical linguistic change, and in the second the change represents both a synchronic alternation and a representation for how the change occurred. As seen in 3.2 one of the values of Arabic *jiim {j} is the affricate [dž], this in fact being the most widespread variant in the Arabic world.

(5.1) *j → y

In southern Iraq, e.g. the Arabic of Basra (Mahdi 1985), and the Upper Gulf (e.g. Kuwait), original *j is realized as /y/.

(5.2) *ja → ya ‘he came’
*waajid → waayid ‘a lot, many’
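As a hedged illustration, the unconditioned change in (5.1) can be modelled as a plain string rewrite over reconstructed forms. The function name `apply_j_to_y` is hypothetical, and the sketch deliberately ignores lexical exceptions and sociolinguistic variation. The form yaabis ‘dry’, which has original *y, is included only to show that the rule leaves such forms unchanged.

```python
def apply_j_to_y(reconstructed):
    """Unconditioned change (5.1): every *j is realized as y."""
    return reconstructed.replace("j", "y")


# Reconstructed inputs (without the asterisk) and their Upper Gulf outputs.
for form in ["ja", "waajid", "yaabis"]:
    print(f"*{form} -> {apply_j_to_y(form)}")
```

Because the rule has no conditioning environment, a single global replacement is enough. A conditioned change would need the context of each segment as well.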

The same rule is found down the Persian Gulf all the way to Oman, but on an increasingly irregular basis. In Baħrain, for instance, it characterizes the ʕArab population, but in general, not the Baħarna (Holes 1983, 1986). The input environment is *j [*dž]. The ultimate reconstruction of this sound is problematic, as seen in the discussion in (3.2) above. However, the surrounding dialects, e.g. Baghdadi Arabic, Baħarna Arabic, have the affricate value /dž/, so it may be assumed that this was the input. Equally, the input value is justified by the extremely large spread of dialects which “still” have /dž/. In this case the event in (5.1) is nearly exceptionless. All *j’s go to /y/, at least in the Upper Gulf.¹ Two sorts of caveats may be mentioned. As Holes (1983: 444, 448) points out, words of SA origin like jaamiʕa ‘university’ do not change to /y/. Furthermore, in Baħrain, where the pronunciation of /j/ or /y/ is associated with different communal groups, one may have variation in the change of *j → y, e.g. among the Baħarna (Holes 1987). The output is, in theory, a new form. A rule of the form (5.1) has the same basic structure as a rule describing a synchronic linguistic change (e.g. an allophonic change), but the asterisk at the beginning marks the change as one designating two historical stages. In (5.2) *j is the postulated input. /y/ is the output. The input must be postulated because, barring the factor of sociolinguistic variation which will be taken up in Chapter 6, ‘come’ in Kuwaiti Arabic is ya, ‘much’ is waayid and so on. The historical linguistic rule assumes that the outputs are now the underlying forms which native speakers of Kuwaiti Arabic learn as their L1.

Arabic and the Case against Linearity in Historical Linguistics. Jonathan Owens, Oxford University Press. © Jonathan Owens (2023). DOI: 10.1093/oso/9780192867513.003.0005

A rather more complicated case involves a conditioned change and it comes from Western Sudanic Arabic (Owens 2013d [2019]: 460). The input data are the person markers of the first and second person verb (see 12.3.3 below). Speakers of Western Sudanic Arabic migrated from Upper Egypt (see 8.1), and so the input form can be assumed to be –t as in Upper Egypt (Woidich 2006a: 273–282).²

(5.3) Input to WSA
katab-t ‘I wrote’
katab-t ‘you.M wrote’
katab-ti ‘you.F wrote’

In the WSA a conditioned event occurred, which can provisionally be formulated as follows, where –t = 1, 2MSG perfect suffix and # = word boundary.

(5.4) *-t → Ø / C_#

¹ It is quite possible that even the *j → y change is non-contiguous; see e.g. Qahṭani (2015: 47). *j → y is however also confirmed in SW Saudi Arabia by Behnstedt (2016: 6). As for so many features, all statements of geographical distribution are provisional.
² Given the complete uniformity of (5.4) in the WSA region, the innovation must have been early. It could have taken place in Upper Egypt among an ancestral group which then migrated into the WSA area, or among early migrants into the WSA area who then spread it further (see Chapter 8 for historical background of Arabs in the Sudanic region). One point arguing for the former explanation is the unique shared isogloss of form V and VI verbs (see 10.2 (10.6)) with the “odd” derivational prefix al- instead of tV-, e.g. al-kallam ‘he spoke.’ This is also found in the southwestern and central western oases of Kharga and Farafira in Egypt, and in a few isolated points around Gina north of Luxor (Owens 1993c: 146).


Thus, given the input

(5.5) katab-t# → katáb_#

the –t is lost, though note that it leaves a trace behind in the stress shift to the final /á/. The stress shift is dictated by the common Arabic rule: stress VCC#, and here the –t has an underlying presence in attracting stress to the final syllable. Furthermore, the presence of an underlying -t is implicit in the fact that the rule does not impinge on the morphophonology of weak verbs. As in all varieties of Arabic, the “hollow” verb saar ‘go, travel’ has the short form sər before a C-initial subject suffix in the perfect, e.g. sər-tu ‘you.M.PL went.’ Even when the –t is not present the weak verb morphophonology acts as if the –t is there.

(5.6) ana sər-Ø d’awaali
I went directly
‘I went directly’ (Xadija 37)

The rule is more complicated than this, however. It does not apply in the following contexts.

(5.7) Context of -t
(a) katab-t-a ‘I wrote it.M’
(b) mašee-t ‘I went’

It is not expected that the rule applies in these contexts. In (5.7a) –t is not word final. It is followed by the 3MSG suffix pronoun –a. In (5.7b) it is preceded by a vowel, not a consonant. In these contexts WSA is identical to all other varieties of Arabic in having an overt -t. More unexpectedly, it also does not apply when the following word is (1) marked by the definite article and (2) this definite article marks a direct object.

(5.8) (a) katáb-t al-maktuub ‘I wrote the letter’ vs.
(b) katáb maktuub-í ‘I wrote my letter’
(c) katáb ajala ‘I wrote quickly’
(d) katáb al-yoom ‘I wrote today’

In (5.8a) the –t occurs before the definite article al-, which marks the direct object. Clearly al- is integrated phonologically to the preceding word, creating a t-V context. In (5.8b), however, there is a direct object, but it is not marked by al-; in (5.8c) the following word begins with a vowel, but it is an adverb, not a direct object; and in (5.8d) the following word bears the definite article al-, but it is not a direct object.
A more comprehensive, if rather basic, representation of the conditions of the rule is as follows, filling in the condition on “#.”

(5.9) *-t → Ø / C_#    # ≠ following al-DO
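The conditions in (5.9) can be sketched as a small decision procedure and checked against the contexts in (5.7) and (5.8). This is one reading of the rule under stated assumptions: the function names `keeps_t` and `realize`, the flag names, and the vowel inventory in `VOWELS` are all hypothetical, and the accompanying stress shift is not modelled.

```python
# Hypothetical vowel inventory, sufficient for the cited stems.
VOWELS = set("aeiouáíúə")


def keeps_t(stem, object_suffix=False, next_is_definite_object=False):
    """Return True if the 1/2MSG perfect suffix -t surfaces under (5.9)."""
    if object_suffix:             # -t is not word-final, e.g. katab-t-a
        return True
    if stem[-1] in VOWELS:        # -t follows a vowel, e.g. mašee-t
        return True
    if next_is_definite_object:   # next word is an al-marked direct object
        return True
    return False                  # C_# elsewhere: -t -> Ø


def realize(stem, **context):
    """Attach -t to the stem only where the rule leaves it intact."""
    return stem + ("-t" if keeps_t(stem, **context) else "")


print(realize("katab"))                                # before maktuub-í, ajala, al-yoom
print(realize("katab", next_is_definite_object=True))  # before al-maktuub
print(realize("mašee"))                                # vowel-final stem
```

Note that the definite-object condition needs syntactic information (whether the following al- marks a direct object), which is why the sketch takes it as an explicit flag rather than reading it off the following word's shape.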

The change here is a conditioned change, which clearly is more complicated than the unconditioned *j → y in (5.1), but it follows the same basic format. The input needs to be justified by comparison to attestations elsewhere, and the rule needs to be formulated in a general linguistic manner. Before leaving this particular example it is relevant for future discussion to note that the change in (5.9) was set in motion, as can be reconstructed, by phonological factors. Briefly, syllable structure in WSA is determined largely by the sonority of consonants (see Owens 2006 [2009]: chapter 6 for extensive discussion; and exx. (10.8)–(10.10)). The difference between the following is one of sonority.

(5.10) Sonority epenthesis, LCA
(a) bu-tubus ‘he cuts off shrub’ < *bu-tbus
(b) bu-ktub ‘he writes’

The basic rule is:

(5.11) Insert an epenthetic vowel between C1 C2, where C1 is less sonorant than C2 (rising sonority).

For present purposes the important property is that the least sonorant consonant is /t/ (Angoujard 1990: 15), hence in a C-t sequence, there will never be epenthetic vowel insertion, via (5.11). On the other hand, t-C will command epenthesis via rising sonority, as in (5.10a). In (5.10a) one can postulate *bu-tbus, which meets the condition of (5.11), resulting in the insertion of epenthetic /u/. In (5.10b) /k/ is more sonorant than /t/, hence there is no epenthetic insertion (falling sonority, no epenthesis). No matter what consonant -t is suffixed to, no epenthesis will occur because the consonant before /t/ will always be more sonorant:

(5.12) *katab-t ‘I wrote’
*tabas-t ‘I cut off a shrub’
*marag-t ‘I left’
…
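The epenthesis rule in (5.11) can be illustrated with a toy sonority ranking. Only two facts in the ranking come from the text: /t/ is the least sonorant consonant, and /k/ is more sonorant than /t/. All other values in `SONORITY`, the function name `epenthesize`, and the choice to pass the epenthetic vowel as a parameter are assumptions made for the sake of a runnable sketch.

```python
# Toy sonority scale. Only "t is lowest" and "k > t" are taken from the text;
# the remaining values are illustrative placeholders.
SONORITY = {"t": 0, "k": 1, "g": 1, "b": 2, "s": 3, "m": 4, "r": 5}


def epenthesize(form, vowel="u"):
    """Insert `vowel` between adjacent consonants C1 C2 when C1 is less
    sonorant than C2 (rising sonority), as in rule (5.11)."""
    out = []
    for i, seg in enumerate(form):
        out.append(seg)
        nxt = form[i + 1] if i + 1 < len(form) else None
        if (
            nxt is not None
            and seg in SONORITY
            and nxt in SONORITY
            and SONORITY[seg] < SONORITY[nxt]
        ):
            out.append(vowel)
    return "".join(out)


# Hyphens are omitted from the cited forms for simplicity.
for form in ["butbus", "buktub", "katabt", "tabast", "maragt"]:
    print(form, "->", epenthesize(form))
```

Since /t/ sits at the bottom of the scale, any C-t cluster has falling sonority and is left alone, which matches the observation that suffixed -t never triggers epenthesis.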

The frequently occurring Ct# descending sonority contour can be reconstructed to have led to a situation where the final –t became perceptually less and less prominent (see DPM in 11.5.4.1), resulting eventually in (5.5). In 2.3.2 and 7.2.2 rather different syllabification patterns are documented. Changes can be splits or mergers (Figure 5.1). The allomorphemic rule in (5.9) is a split. In WSA the output produces two variants, -t and Ø, -t ~ Ø in the conditions given. (5.1) on the other hand is a merger. The input has ∗ j, but there also exists ∗ y, not yet mentioned, as in yaabis ‘dry.’ The rule in (5.1) merges original ∗ j with original ∗ y in y.

Figure 5.1 Split, Merger (split: *-t → -t, Ø; merger: *j, *y → y, e.g. *ja, *yaabis → ya, yaabis)

Splits and mergers are the bread and butter of historical linguistics. They were already met in 1.1.1 where it was seen that Ferguson (1960), even less successfully Blau, tried to define Old vs. Neo Arabic, or CA vs. modern dialects by splits, one branch characterizing OA/CA, the other dialects.³ As historical linguistics developed these were used to define language differentiation, but since they are basic linguistic constructs they can equally define dialect differentiation. (5.9) is perhaps the most important feature setting WSA off from all other dialects, and similarly the merger in (5.1) sets the Gulf dialects off, especially Upper Gulf, from other dialects. In this respect, splits and mergers are innovations relative to the original input. They potentially define new entities, be they languages or dialects. However, whereas all “new” entities imply an innovation distinguishing them (a split or merger), not all innovations imply new languages or dialects, and in general innovations require large-scale change in other parts of the entity. Examples such as these are easy to find in Arabic, and the tradition of Arabic dialectology gives a solid overview of major features. Behind every dialectological difference, however, there rests a historical linguistic process leading up to it, and this aspect of the distribution is not given prominence in the study of Arabic. Without such a perspective, however, it is impossible to develop a coherent macro-picture of Arabic language history, as I will illustrate in the next section. What might be disconcerting in a general way is that Arabic is often internally characterized by very different innovations. Gulf and WSA Arabic are both “Arabic” (see 10.2.1, 10.2.2 to this point), yet are distinguished inter alia by sometimes fundamentally different innovations, for instance those in (5.1) vs. (5.9). 
Hovering in the background to any linguistic change is the fact that movement and migration have been a part of Arabic culture for as long as it has been documented (see 3.1). In what is termed the pre-diasporic era, the pre-Islamic period, this movement occurred mainly in the Middle East itself, between southern Turkey and Yemen. In the aftermath of the Islamic expansion, the movement covered a much larger area, from Uzbekistan in the east to Nigeria and Andalusia in the west. These movements may produce linguistic changes, some straightforward, others more profound, the totality of them producing a somewhat confusing tapestry. One major theme in this book is that the vast majority of these changes need to be understood in part in terms of the schema:

(5.13) I/I + D
I/I = inheritance or innovation, plus diffusion

³ Under one interpretation. Ferguson at least saw the development linearly, all CA → dialect, as in Figure 1.1.


A linguistic feature is either inherited by a group, or innovative in it, and when the group migrates it diffuses the feature with it. As Labov states (2007: 346, cited above already in 1.5) “When entire communities move, they carry with them agents of transmission and incrementation.” The feature may further diffuse when the migrating group comes into contact with another population which adopts the feature. In a linguistic context, “diffusion” is ambiguous between diffusion of peoples and diffusion of linguistic features. It is argued in this chapter that in Arabic linguistics diffusion is typically understood in the second sense. It may therefore be useful to distinguish here between primary diffusion, features carried intact from one place to another by a specific group and secondary diffusion, features spread from one group to another via contact. I abbreviate both types of diffusion to I/I + D. In the case of (5.9), for instance, immigration first brought Arabs to Upper Egypt, and a subsequent migration took them into the Sudanic region (see 8.1). Unless the context demands disambiguation, I will simply use the shorthand I/I + D. To this point in the discussion primary diffusion has occurred (see 11.5), though no known striking linguistic innovation. Either in Upper Egypt, or shortly after settling into the Sudanic region (5.9) can be postulated, a major innovation. This innovation once in place was subsequently spread over the entire western Sudanic region. I/I go together. Groups spread with inherited features. Ancestral WSA speakers brought ∗ /j/ with them into Upper Egypt, and they have carried it (diffused it) further in the WSA region. ∗ /j/ is inherited. Ancestral WSA speakers innovated (5.9), and they subsequently diffused it throughout the WSA region. In this chapter I would like to dwell on the importance of the I/I + D paradigm for understanding Arabic language history by illustrating it with a number of case studies. 
Some of these are short and direct, whereas others require considerable elaboration, so the length of the following sub-sections varies significantly. In the course of presenting these case studies a number of general principles of historical linguistics will become evident and be elaborated on, in particular those pertaining to grammaticalization theory and the concept of parallel independent development.

5.2 When things get complicated: Diffusion, not parallel independent development

The two innovative features illustrated in the previous section are simple in the sense they define and are defined by geographically discrete areas. The one area is the Upper Gulf, the other the western Sudanic region starting in Kordofan in the Sudan and moving westward all the way into the Lake Chad area of NE Nigeria. On the periphery of these areas matters may become a bit fuzzy and mixed. As noted, the lower Gulf has (5.1), but on a less regular basis, while (5.9) may find its way all the way into Khartoum and the central Sudan as a result of migration from the west. However, the changes in the core regions are completely regular, i.e. in historical linguistic terms, they have gone to absolute completion. For instance, a survey of (5.9) as found in the texts used in the study described in Chapter 11 below showed no exceptions to (5.9) in the total of 33 tokens where the environment applies. As often as not, perhaps more often than not, however, such geographically coherent areas marked by a discrete innovation are the exception in the case of Arabic. As a prototypical exemplification of this situation, in this section I describe one such case in detail.⁴

5.2.1 A basis for discussion: The intrusive –n

As already encountered briefly in 4.1.5.1, there exists a morpheme which I term the “intrusive –n” that occurs before an object pronoun. It has the basic allomorphy -Vn after a consonant and –n after a vowel, as in the following examples from LCA Arabic.

(5.14)
kaatb-in-ha ‘he has written it.F’
baani + N-ha → baani-n-ha ‘he has built it.F’
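The basic allomorphy stated above can be made concrete with a minimal sketch. It assumes a simplified Latin transcription and a linking vowel supplied as a parameter (/i/ by default, as in the LCA examples); the helper name `add_intrusive_n` is hypothetical and serves illustration only. It models just the stem-final conditioning, not the dialect-specific restrictions on which participle forms take –n.

```python
# Toy model of the intrusive -n allomorphy: -Vn after a consonant-final stem,
# -n after a vowel-final stem. Transcription and function name are illustrative
# assumptions, not part of the linguistic description itself.
VOWELS = set("aeiou")


def add_intrusive_n(stem: str, obj_suffix: str, linking_vowel: str = "i") -> str:
    """Return stem + intrusive -n + object suffix, hyphenated as in the examples."""
    if not stem:
        raise ValueError("stem must be non-empty")
    # Vowel-final stems take bare -n; consonant-final stems take -Vn.
    linker = "n" if stem[-1] in VOWELS else linking_vowel + "n"
    return f"{stem}-{linker}-{obj_suffix}"


print(add_intrusive_n("kaatb", "ha"))  # kaatb-in-ha
print(add_intrusive_n("baani", "ha"))  # baani-n-ha
```

The two calls reproduce the forms in (5.14); a different linking vowel can be passed for dialects that use one.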

In some dialects the intrusive –n is obligatory in that if an object pronoun suffix occurs, it must be inserted between the stem and object pronoun. In most dialects the intrusive –n is added only after active participle predicates. A complete list of the dialects with this property is given in (5.15).

(5.15) Intrusive –n
Yemen (Dathina, South Yemen, Landberg 1909: 720 ff.): meħaalif-ínn-ak ‘He is your allied’ (= allied-N-you-M)
Oman (Reinhardt 1894 [1972: 139]): ḍaarb-aat-ínn-iš ‘They.F have hit you.F’
Baħrain (Holes 1987: 109, 2016: 203): xaaṭb-ín-ha ‘He has become engaged to her’
Syrian desert (Wetzstein 1868: 75, 192, not attested in later works to my knowledge): šaayf-ann-u ‘He has seen him’
Khorasan (Seeger 2002: 635): aaxđ-t-unn-a ‘I (F) married him’ (take-F-N-him)
Nigeria, eastern or Bagirmi dialect: kaatb-in-ha ‘he has written it.F’

⁴ This is based on Owens (2013b), which can be referred to for the details of reconstruction.


PUNCTUATION AND LANGUAGE HISTORY: I/I + D

The conditions of occurrence may vary slightly from dialect to dialect. For instance, in Baħrain and in the Emirates (Wilmsen and Al Muhairi 2020: 289) –n occurs only after a singular participle stem, either masculine or feminine; in Dathina it occurs after any inflectionally marked participle; and in LCA the –n neutralizes other suffixes completely so that –n is immediately post-stem. The allomorphy of the suffix is identical across all dialects: -nn before a V-initial suffix, -n before a C-initial one. The vowel may be either /i/ or /a/, the variation being a reflex of the widespread taltala differences across dialects (5.3.2.5), as well as /u/ in Khorasan. In all of these dialects the condition is the same: if an object suffix is attached to the participle, the –n is obligatorily inserted. One dialect requires special mention, and that is Uzbekistan Arabic and its nineteenth-century Afghanistan offshoot. Here the form is as in (5.16). However, the pronoun object suffix in this case has been refunctionalized to mark a subject, not an object, though only in the first and second persons (Ingham 2006: 32; Zimmermann 2009: 620). This construction is discussed in greater detail in 7.2.4.

(5.16)
qaʕd-in-kum
seated-N-you.MPL
‘You MPL have sat down’

kaatb-an-ni
written-N-me
‘I have written’

In a small minority of dialects –n may be inserted before a pronoun object suffix after an imperfect verb (see [4.8]). This is reported in Oman (Holes 2011), the Emirates (Wilmsen and Al Muhairi 2020: 286), southern Iraq (Holes 2016: 23), and perhaps Tihama (Al-Qahtani 2015).

(5.17)
n-sawwi-nn-a ‘we do it’ (Oman) (we-do-N-it.M)
ʔana y-aaxđ-inn-ah ‘I (M) will take it.M’ (Al-Qahṭani 2015: 60)

The insertion of –n automatically before object suffixes can only be understood as an instance of a retention in those dialects which have it, and retention from a common proto-form *-n. The linguistic logic of this conclusion is clear. –n itself is meaningless. Some dialects have active participle + object suffix pronoun as (5.18a) and some as (5.18b):

(5.18) AP + object suffix
(a) kaatib-ha
(b) kaatb-in-ha
‘have written it.F’

Semantically (5.18a) is identical to (5.18b). Speakers of Bagirmi Arabic will use (5.18b) where speakers of Cairene intend (5.18a), and speakers of Cairene will use (5.18a) where Bagirmi speakers intend (5.18b), but it all means the same. The only difference is that the –n speakers automatically insert the –n before an object suffix. The intrusive –n is also attested in other Semitic languages, particularly western Aramaic, as described in App. 5.2.1 (see Hasselbach 2006, Williams 1972).

5.2.2 The intrusive –n and Lass’ principle

A number of scholars interpret the intrusive –n from a different historical perspective, claiming that it has evolved in the different languages/dialects independently (Retsö 1988: 87;⁵ Hetzron 1969: 100). Essentially, they would allow parallel independent development. This is vanishingly unlikely for the following reasons.

• The development has no typological basis. The insertion of an –n- before an object pronoun is perhaps unique to the languages discussed here.
• The development is highly specific. Following Hetzron’s own “morphology first” principle (1976), it is a strong candidate for flagging historical linguistic continuity.
• This specificity is all the more striking in that intrusive –n is semantically meaningless. It is auto-morphological, determined in the configuration (for most Arabic dialects) AP + OBJ PRO → AP-n-PRO. Its only “meaning” is the morphological marking determined by its context.
• The development extends across independent varieties separated by great spaces of time and geography. If it is contended that it arose in one variety via parallel independent development, it needs to be argued that in each variety where it is attested it arose in this way.
• Arguing for parallel independent development would make the refunctionalization of an original –in + object paradigm into a subject marker in Uzbekistan extremely difficult to explain; Uzbekistan Arabic would have needed to have inherited the basic pattern before it refunctionalized it.
• Uzbekistan was settled early (710) and soon cut off from the rest of the Arabic-speaking world (around 800). This indicates that –n was already in place at that time. This accords with its eventual migration to the LCA: it diffused out of the southwestern Arabian peninsula to its current locations today.

I have concentrated on this point for a reason beyond its inherent interest to the history of Arabic and Semitic. A basic orientation is offered by Lass (1993: 163).

⁵ Somewhat surprisingly for Retsö, who in other places (2000, see 2.4) has argued against using the notion of drift (alias parallel independent development) in comparative Semitic.


P2 Lass’ principle … parallel innovation (convergence) is to be avoided in favour of single innovation pushed back to an earlier date.

Lass instrumentalizes this principle: The coincidence-barring requirement: convergent development is bad, not to be invoked except under severe compulsion. (1993: 164)⁶

Without this working principle it should be clear that as soon as one finds (1) strikingly similar forms, which are (2) separated geographically by large distances, to claim parallel independent development is tantamount to claiming that a language can re-invent itself at will, at the behest of the historical linguist. This principle is applied to the intrusive –n discussed here. All instances of the intrusive –n + object pro are due to a single innovation followed by the spread of this innovation (I + D) in disparate languages, dialects and speech communities, as described above. It is not difficult to see why this simple principle needs to be given precedence in historical linguistics. First, it is difficult to see how the multiple innovations could be defined independently in each variety the construction is attested in. An Arabic-speaking population reached the Lake Chad area, then decided that AP + Obj pro needed an –n-. The Arabic-speaking population which migrated to Uzbekistan decided the same thing and decided to go a step further in refunctionalization of the construction. However, there is no obvious linguistic conditioning environment which explains these developments, let alone adding those from the Gulf region, Yemen, and on into Middle and Neo-West Aramaic. Moreover, simply ignoring normal linguistic procedure and declaring the event to have occurred independently nullifies the genuinely tangible and linguistically explicable innovations which do characterize the Arabic of the Lake Chad region. In (5.9) above the split of the 1, 2MSG subject marker was described as the result of motivated morphophonological conditioning. Motivation is entirely lacking in the current case if parallel independent development is claimed. Secondly, without Lass’ principle P2 there will be no obvious place to draw a line between independent development and shared inherited change. 
Pushing the argument to the extreme in order to short-circuit further discussion, it might be claimed that the paradigms in Table 11.1 (Chapter 11) arose via parallel independent development, that there is no historical relationship among the varieties listed in the paradigms, or, if one will, that some of the paradigms arose via parallel independent development, say LCA? In fact, to bring this home more pointedly, there is no reason changes routinely assumed to be shared innovations among the Semitic languages couldn’t, by the current logic, be explained in terms of parallel independent development. Taking Figure 2.4 as a point of reference, there are three nodes sharing *p → f: Ethio-Semitic, MSA, and Arabic (5.19). No one has suggested that these arose via parallel independent development, even though, as Huehnergard and Rubin would have it, the change is very common in languages. But parallel independent development is exactly the solution proposed over and over again as soon as multiple, non-contiguous occurrences of the same feature are attested among the spoken Arabic dialects. In (5.20) the change from n- to n-…-u marking the 1PL imperfect is illustrated as a set of independent changes in various African Arabic dialects. This is discussed at length in 5.3.2.4 below under the I/I + D rubric.

(5.19)
*p → f in Ethio-Semitic
*p → f in MSA
*p → f in Arabic

(5.20)
*n- → n- … -u in Morocco
*n- → n- … -u in Upper Egypt
*n- → n- … -u in WSA

⁶ This is expressed from a slightly different perspective in Lass (1997: 176): “If a character C is shared by post-separation Lc and Lm, which in turn have a common ancestor, then C was a property of the ancestor.”

There is a conceptual disconnect here. In the case of Arabic we know from independent historical sources that the Arabic populations moved about, that they would have been agents of primary diffusion. No such historical record helps us with the proto-Semitic cases (see 4.4). Nonetheless at the PS level the default assumption is common origin, whereas in Arabic it is as often as not parallel independent development.

Thirdly, it is useful to recall assumptions made about innovation + diffusion in well-documented cases from outside Arabic and to hypothetically apply parallel independent development argumentation to them. Because the point is so basic to any historical linguistics, I will provide a number of examples well-known in some circles, but not necessarily in general in historical linguistics, nor in Arabicist and Semiticist circles.

In a well-known study, Trudgill et al. (2000, see 6.6 below; Trudgill 2004) describe a number of features which were transported from Britain to New Zealand around 1840 when a large number of Britons migrated to New Zealand. They describe a three-generation koineization process in which mixed features eventually were leveled out till a more homogeneous dialect was formed. The data is based on actual recordings. For instance, early on immigrants from Ireland used t < *θ, tiŋ ‘thing,’ whereas the majority of speakers had /θ/, the variant which eventually prevailed. Looking at this phenomenon from the perspective of traditional Arabic historical interpretation, it might be concluded that the shift of t < *θ in the speech of some first generation New Zealanders was the result of parallel independent development and had nothing to do with an inherited change which had long set in in Irish English. To my knowledge, this has never been suggested.

In similar fashion, Labov (2007) describes the westward expansion of two short vowel (/a/) systems from the eastern United States into the Midwest. One is assumed to have originated in New York City by at least 1800, and spread from there to Albany, Cincinnati, and New Orleans (2007: 364). The system of phonetic constraints governing tense and lax /a/ in particular is argued to be so similar in each variety that independent parallel development is ruled out. Carmichael and Becker (2018) follow up on this in more detail, but with the same explanation. New York City and New Orleans share the very specific “split” short /a/ system, lack of rhotic /r/ and diphthongization of /r/ in bəid ‘bird.’ The sharedness of the rhotic system is particularly compelling. Both dialects essentially “lack” an /r/ after a vowel (biə ‘beer’). What the study by Carmichael and Becker brings out, however, is that a whole series of complex conditioning environments are shared between NYC and NO: in word final position before a vowel in the following word /r/ is favored (/biər# iz/) but word final before a consonant (biə# spild) it is unlikely in both dialects. A correlation of 15 conditioning linguistic factors reveals a nearly perfect correspondence between the NYC and NO data sets for the two general factors, “word context,” subsuming seven independent variables, and “preceding vowel,” subsuming six different vowels (Carmichael and Becker 2018: 303). The same explanatory logic has been applied to a second, different system of short vowels termed the Northern Cities Shift (NCS). This is a complex yet homogeneous system stretching today from Wisconsin in the west to upper New York State in the east, around the Great Lakes.
Again Labov (2007: 374) explains this vast region of homogeneity not by parallel independent development, but rather in terms of an origin in Upper New York state around 1830, followed by westward diffusion. What is striking about these case studies as compared to Arabic are two points. First, the multiple appearance of complex systems, such as phonological conditioning factors governing short /a/ allophony between New Orleans and New York City (Carmichael and Becker 2018: 305) are immediately suspected to be instances of I/I + D. While scholars comb older records and literature to determine whether there is actually mention of such variants in the historical record, the case for a common origin + diffusion stands on the linguistic merits of the case. The hypothesized common origin of the NYC and NO features mentioned here is based on linguistic reconstruction. Secondly, there often is historical material bearing on the issue, reports in historical societies, diaries, trade ledgers between cities, and the like. There is a plausible connection between a common linguistic origin and a common social link.


The parallel to Arabic could hardly be clearer. We know, for instance, that Arabs spread out of the Middle East homeland into Africa and Central Asia. The written historical record is rarely so clear as, say, the migratory connection between New Jersey/NYC and Cincinnati (Labov 2007: 362), but it is not altogether lacking either (see 8.1). The linguistic record, as described here, I think speaks for itself. What is missing is a historical linguistic reflex in the Arabic and Semiticist traditions which looks for common origins and their developments.

Finally, if one doesn’t give precedence to the comparative method, including Lass’ principle (P2), there is nothing stopping one from allowing cultural or other stereotypes to take precedence over linguistic methodology. Blau (5.3.2.9) rejects a shared innovation (retention?) and one instrument to do so is to claim parallel independent development especially for those features which represent striking, relatively unusual isoglosses with Arabic in other regions.

One final point. In claiming that all intrusive –n constructions in Arabic go back to a reconstructed common source, it is not claimed that once “in the language” the construction is indelibly fixed. In fact, while the basic conditioning factor of a pronominal suffix (one would say “underlying grammar”) is identical wherever the construction occurs, the reality is that it is almost never identical in its various manifestations, as the partial list in (5.21) shows.

(5.21) Incrementation on a common construction
a. Dathina (Yemen), Oman: -Vn can occur after all active participles in any inflectional form
b. Baħrain: -Vn only after singular APs (M or F)
c. Nigeria: suffixation of –n neutralizes all gender and number contrasts in the AP
d. Uzbekistan: -Vn suffixed only in 1, 2 person; -Vn refunctionalized to mark subject, not object

This variation cannot, I believe, be argued to show that the phenomenon itself does not have a unitary historical source. Exploring the detailed history of the construction is an unfinished task. Here it can be quickly noted that (5.21b) loses the contrast in the marked plural category, and that (5.21c) neutralizes all contrasts marked on the AP stem, i.e. generalizes the defining context to AP stem + -Vn (vs. AP stem + suffix + n in Dathina and Oman). Both of these changes are derivable via leveling from a postulated original situation of (5.21a). (5.21d) is more complex, and probably involves contact-based influence (see discussion in 7.2.4.2/3 below). Given that the construction can be dated to some time before AD 622 and given the wide geographical expanse that the construction has covered, it is clear that the different speech communities where it occurs could witness further changes, incrementation, in the construction. This point of principle is taken up further in 5.3.2.9.


This section was devoted to illustrating in some detail a feature with essentially the following attributes:

(5.22) Elaboration of P2 for Arabic
• a single linguistic manifestation
• and therefore whose chances of independent parallel development are from the outset unlikely
• yet which is found at the most disparate ends of the Arabic-speaking world today, and at a number of points, mostly non-contiguous, in between
• with a probable affinity to a morpheme in Classical Arabic
• and which has a direct pedigree to an identical morpheme in various West Semitic languages

Nonetheless, independent parallel development⁷ has been argued for this feature as noted above, and as a general phenomenon is argued for over and over by Semiticists and Arabicists. I will quickly bundle a number of prominent examples here, examples whose linguistic description will be developed in the rest of this chapter. Readers may wish to look ahead to the more detailed case studies presented in this chapter and return to these examples later. Heath (2015: 14, see 5.3.2.4) argues that the leveling of the first person imperfect paradigm occurred in North Africa multiple times. Leddy-Cecere and Retsö propose parallel independent development of the b-indicative (see 5.3.2.6 for detailed discussion). Lucas and Lash (2010: 399), which will not be treated in this book, argue that the discontinuous negative ma … -š arose independently in Egypt and in Oman and Yemen (against this, see Wilmsen 2020: 526, 2022 for an extended I/I + D account).⁸ Palva (1995: 5, see App. 3.2.2) advocates the view that unconditioned *k → č in central Palestinian dialects is the result of a process independent of palatalization described in 3.2. Furthermore, he sees the same unconditioned change of *k → č in a few Algerian dialects as yet a third instance of independent development of this identical phonological change.⁹ In a case that will be discussed in 7.2.5 and in App. 7.2.5b, Souag (2017) proposes multiple independent development in a (proleptic or cataphoric) double-object (differential object marking) construction which occurs from Uzbekistan to Andalusia, with scattered reflexes in between in Iraq, Syria, Cyprus, and Malta (see ch. 7 n. 12). Parallel independent development in the context of early Aramaic–Arabic contact is discussed in 7.4 below and in App. 7.2.5a where Pat-El and Stokes (2022) are argued against.

⁷ Also termed “polygenesis.” I prefer the direct term for two reasons. The direct designation is linguistically straightforward. “Polygenesis” on the other hand has been used in different senses. Polygenesis for Holes (2018b: 8, 17) would appear to mean the same as independent parallel development, with an added biological tinge. For Edzard (1998: 25) on the other hand it describes an initial state of heterogeneity, beyond which historical Semitic linguistics can possibly not reconstruct.

⁸ Though arguing for parallel development, Lucas and Lash see a contact origin at play independently in both cases, once via Coptic (in Egypt) and once via South Arabian languages (in Yemen). It appears that Lucas broadly reiterates the parallel independent development argument (2020: 654). However, beyond noting typological parallels between dialects/languages with discontinuous negative marking (EA and Coptic on the one hand and Yemeni Arabic and Modern South Arabian languages on the other), no in-depth discussion examining the plausibility of the one or the other has occurred. Note, for instance, that Behnstedt and Woidich (2018: 71–74, discussed in 9.1 below) take for granted that an ancestral form of Hijazi and Yemeni Arabic migrated to Egypt. A priori, there is no reason they couldn’t have brought the discontinuous negative with them.

In some cases, assumptions of parallel independent development are tailored to fit what appear to be categories set up on an a prioristic basis. Imala (aa → ie, this anticipating 5.3.1.3) is one of the most widespread of the non-contiguous features discussed in this chapter (see Owens 2006 [2009], chapter 7). What I term classic imala is the conditioned change aa → ie /_i. A long /aa/ raises in the context of a preceding or following high front vowel, provided an inhibiting consonant is not adjacent to /aa/. Inhibitors are emphatics (ṣ, ṭ, ḍ, ḍ) and gutturals (ɣ, x, q). Besides this basic conditioning environment, what Sibawaih implicitly recognized as irregular imala occurs (1) to long /aa/ even lacking a conditioning /i/ in the context, e.g. naas → nies ‘people,’ and (2) in four words including xiefa ‘he feared,’ ṭieba ‘be good’ (Sibawaih II: 281). /x/ and /ṭ/ are normally imala inhibitors, yet in these lexemes /aa/ imalizes. It has been noted that an early attestation of imala in Arabic literature comes from the Damascene Psalm fragment (PF), dated to somewhere between 700 and 900 (Violet 1901, Al-Jallad 2020b). As discussed in App. 5.2.2, the PF is argued to be a reflex of what Al-Jallad terms “Old Hijazi” Arabic. Since the PF is written in Greek letters, the vowel structure is discernible. The Damascene Psalter shows the classic long-aa imala represented by {έ}: keen ‘it was’ (< kaan), siħeeb ‘clouds’ (< siħaab).
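The conditioning of classic imala described above (a nearby /i/ as trigger, adjacent emphatics and gutturals as inhibitors) can be sketched as a small rule. This is a deliberately crude illustration under stated assumptions: words are given in the Latin transcription used here, “a preceding or following high front vowel” is approximated as any /i/ anywhere in the word, and the function name `classic_imala` is hypothetical. The irregular cases (naas → nies, xiefa) are intentionally not covered.

```python
import re
import unicodedata

# Inhibiting consonants as listed in the text: emphatics and gutturals.
# NFC normalization keeps each dotted letter a single code point.
INHIBITORS = frozenset(unicodedata.normalize("NFC", "ṣṭḍɣxq"))


def classic_imala(word: str) -> str:
    """Raise aa to ie when the word contains /i/ and no inhibitor is adjacent to aa."""
    word = unicodedata.normalize("NFC", word)
    if "i" not in word:  # simplification: any /i/ in the word counts as a trigger
        return word

    def raise_aa(match: re.Match) -> str:
        before = word[match.start() - 1] if match.start() > 0 else ""
        after = word[match.end()] if match.end() < len(word) else ""
        if before in INHIBITORS or after in INHIBITORS:
            return "aa"  # inhibited: the long vowel stays low
        return "ie"

    return re.sub("aa", raise_aa, word)


print(classic_imala("kitaab"))  # kitieb
print(classic_imala("saaʕid"))  # sieʕid
print(classic_imala("taabal"))  # taabal (no conditioning /i/)
print(classic_imala("naas"))    # naas (irregular imala is outside the rule)
```

That naas and xaafa come out unchanged is the point of interest: these are exactly the forms that a single conditioned rule does not derive.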
Among his Middle Arabic texts in fact, Hopkins (see 4.2.1.1) notes that imala is “indicated by the transcriptions, especially those of the Psalm fragment” (1984: 8). Two issues, however, constitute a problem for Al-Jallad’s analysis. The first issue is that his Old Hijazi has reflexes of imala which the PF does not. In the PF the emphatics and gutturals regularly inhibit imala, e.g. faaḍat ‘it emptied.’¹⁰ A form such as xiefa ‘he was afraid’ is not expected. Al-Jallad offers solutions here. One is to appeal to Rabin’s (1951: 112) suggestion which attributes the raising in these forms to a process separate from imala which converts an underlying *ayi → ee, {xyf}, = *xayifa → xeefa. This, however, buys a solution at the expense of postulating two independent sources for imala or imala-like behavior. One, classic imala, is conditioned by the surrounding context. The other arises due to the lexicophonological structure of a class of weak medial verbs. Herein lies the point of the current criticism which I return to presently.¹¹ A second general issue is not mentioned by Al-Jallad, and this is Sibawaih’s observation (II: 279: 21) that all Hijazi do not have imala, wa jamiiʕ haađa laa yumiiluhu ʔahl al-ħijaaz, “all of these [i.e. imala forms, j.o.] the Hijaazi do not imalize.” On this basis, the PF would not qualify as Hijazi in the first place. From the current perspective one of two possibilities explains the special imala. One is that these verbs simply reflected the incrementality of a then-spreading imala phenomenon. In non-emphatic, non-guttural contexts long /aa/ imalizes, e.g. baab → bieb ‘door’ (II: 285). There is no doubt that Sibawaih regarded xiefa etc. as belonging to the imala family, as such examples are introduced in the middle of his detailed imala discussion. Moreover, Sibawaih recognized the anomaly of xiefa etc. himself. He cited word-final imala of weak-final verbs which are not expected to have imala: saqie ‘he watered’ (II 287.19, /q/ is inhibitor), ɣazie ‘he attacked’ (root is {ɣzw}, not a /y/ root). Sibawaih’s explanation is worth reflecting on. He says that in ɣazie it is as if the /w/ were transformed into a /y/. The overall theoretical explanation is that of analogical incrementation. Sibawaih’s extended interpretations avoid the need to postulate two independent sources for imala. A second explanation, relating to Koranic orthography (imala as an orthographic event), is discussed in App. 5.2.2.

⁹ I can note that Behnstedt (1994: 7–9) had one year earlier than Palva proposed the same drag-push chain explanation for the two shifts *q > ḳ and *k > č. His analysis for Sukhne in Syria is subject to the same criticisms as Palva’s. However, there is no indication that he advocates independent development in these two dialects.

¹⁰ Other imala inhibitors in the PF need to be postulated as well, including /w/, /ʕ/ and /r/, which will not be dealt with here.

As a final example of Arabicists appealing to independent parallel development, presumed historico-cultural differences, as adumbrated in 1.1.4, play an explicit role for Blau (1969, 1981b). Anticipating 5.3.2.9 below, Blau perspicaciously observes that Judaeo–Arabic has the same linker –n construction as do various Bedouin dialects (e.g.
Najdi) as well as Uzbekistan Arabic. He poses the question whether these have a common origin, answering the question in the negative because “It seems improbable that the dialects of Central Asia, at the beginning, shared in the linguistic development that affected the modern Bedouin dialects” (1969: 40). The issue of linguistic plausibility of three times parallel independent development isn’t considered. I return to Blau 1981b further in 5.3.2.9 where the linker –n is discussed in greater detail. The list goes on. A logical corollary of parallel independent development is to dispense with Arabic altogether in favor of “Arabics.” I will criticize McWhorter’s (2007) interpretation of this in 10.2.1 below. Among Semiticists Rubin (2017: 855) is a recent expression of this.

¹¹ Al-Jallad countenances a second approach, namely to regard the /ee/ as “conditioned by environment” (2020b: 68). This latter alternative, however, makes little sense, given that the problem began with the PF environmental inhibitors disallowing imala in the first place.


There also exist a large number of spoken Arabic dialects. Some of these dialects are so different from one another that, if we use mutual intelligibility as a distinguishing criterion, we would really speak of the modern Arabic languages, in plural.

Of course, from a historical linguistic perspective mutual intelligibility has never been used as a decisive criterion, so in a sense Rubin’s comment is harmless. However, it is not completely harmless that no supporting references are cited. In fact, hardly any such mutual intelligibility studies have been done (see ch. 13 n. 6 for comments on one such) and certainly none encompassing the entire Arabic-speaking world, so there is no perceptual basis for the statement in the first place. The issue is interesting, but given the lack of studies, irrelevant to any domain of linguistic research. It is even more harmful, however, when the methodological consequences of assuming the many Arabic languages approach are thought through. Following up on Rubin’s logic, the approach more or less ensures that multiple manifestations of the same phenomenon will count as independent development since each will now be ensconced in its own new Arabic language. Of course, I may be wrong in this. Perhaps these many Arabics will reconstruct back to common ancestors. In this case, however, the result will probably nullify the assumption of different languages, in line with the overall thrust of the current study.¹²

Against parallel independent development/many Arabics, I propose a fundamental alternative. A working corollary to Lass’ principle is P3a: this is termed “P3a,” as a second corollary P3b will be introduced in 5.3.2.9.

P3a Corollary to Lass’ principle
Complex systems identical in their main features are most likely due to inheritance/innovation + diffusion (I/I + D). Reject explanations of parallel independent development except for compelling linguistic reasons and with compelling comparative evidence.

¹² Rubin’s article is published in a handbook of linguistic typology, which only solidifies a skepticism against allowing typology to dictate historical linguistics (see discussion in 5.3.2.7).

5.3 Geographically non-contiguous features with a postulated common source

Section 5.2 served two purposes. First, it made the argument, essentially, that Arabic language history is made immensely complicated by the fact that many linguistic changes are non-contiguous. This shouldn’t be surprising given the history of the Arabic-speaking peoples. However, integrating this reality into an interpretation of Arabic language history has proved challenging. Secondly, it made the methodological argument that such commonalities as can be discerned across disparate geographical locations are to be understood in principle as instances of spread of a common innovation or common inheritance, which, as in the case of the intrusive –n, may need to be reconstructed into West Semitic.

In this section I would like to summarize a number of strong candidates for the I/I + D interpretation of linguistic features. Some of these are presented in perfunctory fashion, essentially suggestions for more detailed historical treatment, while others are elaborated upon and integrated into a detailed reconstruction of Arabic. In all cases the argument follows the same logic. A striking linguistic feature is found in disparate, non-contiguous geographical areas. The presumptive explanation for this is a single innovation or a single inherited form followed by diffusion.

5.3.1 Phonology The first three features are already attested in Sibawaih. In the following the geographical extension of the feature is representative but not necessarily exhaustive. Regions are chosen which minimally show that the feature occurs over a wide, discontinuous area.

5.3.1.1 ∗j = ž

Distribution of /ž/: southern Syria, Lebanon, Cyprus, Tihama and Hijaz, Libya, northern Tunisia, some of Algeria (particularly non-urban, Singer 1980: 252), Morocco
e.g. džaa-ni ‘he came to me’ = žaa-ni

As seen in 3.2.1 (3.6d), one of the jiim variants described by Sibawaih is the “shiyn like a jiym,” interpreted as a /ž/. The variant is found in Syria. There, because it is the variant of Damascus, it is effectively the prestige pronunciation in the country, as well as in neighboring Lebanon. Aleppo, it should be noted, has /dž/. /ž/ is also the pronunciation of much of the Hijaz and Tihama in Saudi Arabia, and beginning with eastern Libya, it is the dominant pronunciation throughout the Maghreb with the exception of Malta and much of Algeria. /dž/ is dominant in Mesopotamian, Jordanian (East Bank), the Arabian peninsula except Hijaz, and all of Sudanic Africa.

5.3.1.2 ∗k → č/ç

Distribution: Gulf Arabic, Baghdadi, Jordanian, and central rural Palestinian, isolated Syria, Sharqiyya in Egyptian delta, Jabli in Morocco, Tlemcan in Algeria
e.g. kibiir = čibiir ‘big’

This feature was already discussed in detail in 3.2 and its reconstructional status in Owens (2006 [2009]: 246–250). It is described in Sibawaih, and its wide distribution both in the heartland Middle East and in the diasporic regions attests to its pre-Islamic origins. As discussed in 3.2, I regard both the conditioned and unconditioned palatalization of ∗k as deriving from a common innovation.¹³

5.3.1.3 ∗aa → ie imala

Distribution: Syria and Lebanon, Cyprus, qǝltu dialects = Northern Iraq (Mosul) and most Anatolian dialects, Tihama,¹⁴ Negev, Andalusia, Eastern Libya, Malta
e.g. saamaħ ‘he forgave,’ i-siemiħ ‘he forgives’ (ELA)

Imala is arguably the most interesting sanctioned phonological feature discussed by Sibawaih. He spends a good 15 pages describing the form and the phonological and lexical conditioning of the phenomenon, a sample of which was already discussed in 5.2.2 above. Classic imala represents the conditioned change of a long /aa/ to [ie/ee] in the context of a front vowel (Owens 2006 [2009]: chapter 7). Equally, as Corriente (1977: 22, perhaps following Cantineau 1960: 96–97) succinctly formulated the matter, imala can occur when an inhibiting factor is not present. In general, inhibiting factors are neighboring emphatic consonants, a /u/ or /a/ following the long /aa/ and the so-called raised consonants (al-ħuruuf al-mustaʕlya /q, x, ɣ/). Hence, taabal ‘coriander’ does not induce imala, but kitieb ‘book’ does; ṣaaʕid ‘raising’ inhibits (because of emphatic [ṣ]) but sieʕid ‘help!’ does not. Importantly, allowing for specific local divergences, the conditioning factors described by Sibawaih are effective today wherever imala is attested. In his detailed discussion, however, Sibawaih discusses lexical, contextual, and other factors which influence imala, and in a statement which is stunningly sociolinguistic in its purport and remarkably neutral in evaluating the status of imala notes:

Know that not everyone who imalizes the /aa/ agrees with others of the Arabs who do so. Rather, each of the two groups might differ from the other, in that one might use /aa/ where his neighbor imalizes, while he will imalize where his neighbor uses /aa/. Similarly, someone who has /aa/ will differ from another who has /aa/, in a way similar to those who use imala. So if you should encounter an Arab with such forms, don’t assume that he is simply mixing up forms. Rather, that is how the matter stands. (II: 284.1)

¹³ In contradiction to a reader who against P2 advocates for the typical Semiticist position of parallel independent development: “Palatalization of ∗k is so widespread cross-linguistically that it is possible that we are dealing with parallel development in some cases.” Of course many things are possible in language, which is why specific factors (Arabic is a language with many sub-varieties which split and split and rejoined), general principles (Lass’ principle) and socio-demographic realities (4.4) and in the case of Arabic, the ALT, are customarily invoked to temper simple ex cathedra generalizations.
¹⁴ For Tihama, Qahṭani 2015: 49 (mee-hi ‘she is not’ < ∗maa-hi).


Sibawaih counts the imala among the phonetic variants acceptable in poetry and Koranic recitation (3.2.1). Phonetically today the dominant value is [ee], this occurring in all but Eastern Libya and Malta. Besides these two values there is a so-called level 2 imala “incrementation” where /ee/ raises in a usually lexically conditioned manner to /ii/. Hence in Levantine can be found a contrast between (5.23)

zˇeemiʕ ‘gathering’ zˇiimiʕ ‘mosque’ (both < ∗ jaamiʕ)

In the Baghdadi Arabic of 1964, Jewish Baghdadi had /ii/ where Christian had /ee/, miiziin vs. miizeen ‘scale’ (< miizaan, Blanc 1964: 42). This /ee/ vs. /ii/ contrast is found in the Levant, Iraq, and Andalusia, but not elsewhere. A second instance of imala incrementation was discussed in 5.2.2 above. It can be noted that what today is known as the imaalat taaʔ marbuuṭa, ‘the imala of the feminine –t suffix,’ discussed in 2.2.2, was not treated by Sibawaih. In much modern parlance, imala is in fact often associated with this variant, and indeed the conditioning factors are often similar.¹⁵ However, imala as described here refers only to the long /aa/.

Briefly commenting on the distribution of imala in contemporary varieties, as with the previous two features it is found scattered irregularly throughout the heartland as well as in diasporic regions. A cautionary note should be added here as well. Clearly imala and palatalization of ∗k discussed in the previous section have a common phonetic basis, namely a shift toward a palatal segmental value. So far as I know, however, there are to date no dialects which have both ∗k palatalization and /aa/ imala. The data in Sibawaih are not specific enough to shed light on this question.
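The conditioning of classic imala summarized in this section (a front vowel as trigger, with emphatics, raised consonants and a following /a/ or /u/ as inhibitors) can be illustrated with a small sketch. It is only a toy model over the transliterations used here: the function name `apply_imala`, the emphatic inventory, and the decision to treat only the first /aa/ in a word are assumptions made for illustration, and it ignores the lexical and local divergences the text stresses.

```python
import unicodedata

VOWELS = set("aiu")
# Inhibiting consonants named in the text: emphatics and the "raised" consonants /q, x, ɣ/.
# The particular emphatic inventory listed here is an illustrative assumption.
INHIBITORS = {
    unicodedata.normalize("NFC", c) for c in ("ṣ", "ṭ", "ḍ", "ẓ", "q", "x", "ɣ")
}


def apply_imala(word: str) -> str:
    """Raise the first long /aa/ to /ie/ when the toy conditions are met."""
    word = unicodedata.normalize("NFC", word)
    idx = word.find("aa")
    if idx == -1:
        return word
    before, after = word[:idx], word[idx + 2:]
    # Inhibitor: an emphatic or raised consonant adjacent to /aa/.
    if before[-1:] in INHIBITORS or after[:1] in INHIBITORS:
        return word
    next_vowel = next((ch for ch in after if ch in VOWELS), None)
    prev_vowel = next((ch for ch in reversed(before) if ch in VOWELS), None)
    # Inhibitor: /a/ or /u/ following the long /aa/.
    if next_vowel in ("a", "u"):
        return word
    # Trigger: a front vowel /i/ in a neighbouring syllable.
    if next_vowel == "i" or prev_vowel == "i":
        return before + "ie" + after
    return word


if __name__ == "__main__":
    for form in ("kitaab", "taabal", "saaʕid", "ṣaaʕid"):
        print(form, "->", apply_imala(form))
```

The four test forms correspond to the examples cited above: kitieb and sieʕid undergo the change, while taabal and ṣaaʕid are blocked.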

5.3.1.4 ∗ɣ → q

ʕArab of Baħrain, (north) Yemen, Syria, Central Algeria, Kordofan, Shukriyya/Sudan, LCA
e.g. ɣassal = qassal ‘he washed’

The shift of the voiced velar fricative /ɣ/ to /q/ is not so widespread as the previous three features, and I believe is attested neither in Sibawaih nor in the ALT generally. However, its wide distribution among Gulf dialects (Holes 2005, 2016: 53–54), in the extreme north and south Yemen (Behnstedt 2016a: 10), parts of Syria (Behnstedt 1997: 16), Central Algeria (Singer 1980: 252) and in a number of the Sudanic dialects (Reichmuth 1983: 46–47, Manfredi 2010: 225) attests to its pre-diasporic provenance (see Owens 2006 [2009]: 165 for more).

One qualification of this feature is in order. In the previous three cases there is no obvious correlation between the feature in question and other highly prominent variational features. For instance, imala as described in 5.3.1.3 occurs with /ž/ dialects (as in 5.3.1.1, e.g. Syrian) and with /dž/ dialects (Negev, Anatolia, northern Iraq). In the case of this feature, all the dialects where it is attested have the value of qaaf as /g/. There are two ways to interpret this. This could be evidence for a push chain, whereby an original /q/ or /ʔ/ shifted to /g/, leaving room for /ɣ/ to shift to /q/, assuming maintenance of lexical contrastiveness. Alternatively, it could simply have been that /g/ was an original value of qaaf (Sibawaih qaaf = voiced, majhuur) and there was always phonological space empty in such dialects for ∗ɣ → q. Additionally, one can speculate that in contact with original qaaf = /q/ dialects, ɣ became reinterpreted as /q/.

¹⁵ For instance, in Damascene the feminine suffix is either –a or –e. –a follows what Sibawaih termed the raised (mustaʕlya) consonants and emphatics, plus /r/, otherwise -e, e.g. ṣiiɣ-a ‘jewelry’ vs. ʕaad-e ‘custom’ (Cowell 1964: 138).

5.3.1.5 ∗θ → s

Uzbekistan, Anatolia (Azex, Bəħzaani), LCA (eastern, Bagirmi)
θoor = soor ‘bull’

The last segmental change encompasses the smallest number of speakers and varieties. It is the regular change of ∗θ → s, in all environments. ∗θ in fact has four reflexes in contemporary varieties, /θ/, /t/, /f/ and /s/.¹⁶ /t/ is probably the most widespread, though many dialects (Arabian peninsula, Iraq, Jordan, Eastern Libyan) maintain /θ/. /f/ is vanishingly rare, attested in the Baħarna of Baħrain (Holes 1983, 1986, 1987) and in a few Anatolian dialects, such as Siirt. The /s/ reflex is also rare. I limit the discussion here to this one reflex in order to concentrate on a methodological point. This is a feature where it is most difficult to prove common origin. Another instance of ∗θ → s is in fact attested in Arabic (see below, this section), which is interpreted as a genuine case of parallel independent development. However, as will be seen, there are clear indications that the conditions surrounding that shift are different from here, involving an element of diglossic change. Simply claiming here ∗θ → s does not provide compelling evidence for common origin. By the same token, claiming that the change occurred independently in Uzbekistan, Anatolia and Lake Chad (eastern dialect) has no compelling basis either.

What can be adduced is the following. All three cases are extreme instances of peripheral dialects, and it is not uncommon for relics to be maintained in peripheral regions. More interestingly, as pointed out in Owens (1998a: 73–75), LCA shares a number of less common isoglosses with Uzbekistan Arabic, all of which are strong candidates for shared retentions, four of these discussed elsewhere in this book.

¹⁶ And in other Semitic languages proto-Semitic ∗θ sometimes surfaces as /s/, as in Ge’ez (Moscati et al. 1980: 34).

(5.24) Uzbekistan-LC shared retentions¹⁷
a. invariable 2FSG –ki, beet-ki ‘your.F house’ (see 5.3.2.1)
b. –n, intrusive –n, as discussed at length in 5.2
c. Linker –n, as will be discussed in 5.3.2.9
d. Imperfect verb paradigm where indicative marker occurs only in persons marked by V-initial persons, as will be discussed in 5.3.2.6.2 below
e. Doubled verbs maintain final –a, lamma ‘he gathered’ (see 12.3.3)

A final argument in favor of Bagirmi Arabic ∗θ → s reflecting an innovation predating entry into the WSA area goes as follows. Throughout Egypt, the mother of Sudanic Arabic, as well as throughout the Sudanic region itself, the change ∗θ → t (∗θaani → taani ‘other’) has occurred without exception, except for Bagirmi Arabic. In order for ∗θ → s to have arisen independently in the Bagirmi region it must have been carried via ∗θ. So (1) ∗θ would have had to have survived intact all the way to the Bagirmi area and then (2) it would have had to have innovated (for some reason) to /s/. This in a region which otherwise has seen uniform ∗θ → t. Instead of these two steps, it is more plausible to postulate a single shift of ∗θ → s before the ancestral population ever got to Egypt. Since ∗θ → s was already in place in Egypt/the Sudanic region, it would have been under no pressure to innovate toward /t/ (unconditioned ∗t → s having no motivation). Of course, where and when this innovation occurred will never be definitively answered, but the current explanation is commensurate with the possibility that it occurred once in the Middle East, the populations splitting, toward Central Asia and toward Central Africa. Taken in isolation, ∗θ → s is a possible common retention. When understood as one of a bundle of unusual retentions, the case for a shared retention from an early split is much stronger.

It can be noted that there is another fairly regular shift of ∗θ → s which occurs in “learned” words in Levantine and Egyptian Arabic. Thus, instead of θawra ‘revolution,’ one may have sawra. This was probably introduced with the Ottoman dominance (see Cowell 1964: 3, Medhat 1991: 161). The scope of this shift is quite different from the regular shift, being restricted to learned words. Hence Cairene has [s], sawra vs. [t] toor ‘bull,’ both from ∗θ. It was probably motivated by the lack of /θ/ in Turkish, by the desire to keep learned words distinct from the common ∗θ → t shift which marks Cairene and Damascene Arabic, and by Osmanic Turkish itself, where Arabic ∗θ is routinely interpreted as /s/ (e.g. istisna ‘exception’ < ∗istiθnaaʔ). This can be seen as a special instance of a diglossic H feature mimicking a “natural” linguistic change (see discussion in 6.8).

¹⁷ Only the very specific isoglosses linking Uzbekistan and LCA are given here. In Owens (1998a: 75–76), further isoglosses, which might include other dialects, are also noted, for instance, the relative clause marker instead of being illi/alli, al- only in LCA and il- only in Uzbekistan.


The current feature is included in order to show that even in uncertain cases the argument for a common retention of an early innovation when contextualized with bundles of other shared isoglosses is made much more plausible.

5.3.1.6 Others

The set of five phonological examples undergirds the argument that the geographically dispersed reflexes of the linker –n discussed in 5.2, deriving from a common Arabic or even West Semitic inheritance, are typical, not exceptional, in Arabic. The first three arguments (5.3.1.1–5.3.1.3) are, I believe, irrefutable, given that they are already described, sometimes in devastating detail, by Sibawaih. However, even lacking evidence in Sibawaih, other features can be adduced, in particular if they share bundles of isoglosses with other dialects. It should be emphasized that the current list is far from exhaustive. What is undoubtedly the most iconic variant of Arabic, in both dialectology and sociolinguistics, the “qaaf” variable (/q ~ ʔ ~ k ~ g (dž)/, see 6.8), is not even treated here. Syllable structure variation, discussed in some detail in Owens (2018b, see 7.2.2), is touched on in this book only briefly (see 2.3.2, 7.2). The point, however, is not to develop a definitive historical account of these and other features but rather to prepare the reader for the reality that dealing with the Arabic language requires orientating toward a different horizon from the history of better-studied European languages.

5.3.2 Morphology

Morphology obviously plays an essential role in language reconstruction and, because Arabic and the Semitic languages in general have a relatively rich morphology, there is much to consider. Hetzron (1976) is well known for advocating the primacy of morphological criteria over all others. Key elements of Arabic morphology present in essence the same geographical dispersion as do the phonological features considered in 5.3.1.

5.3.2.1 Invariable –ki ‘2FSG’

WSA, Bduul and Sinai (most northern littoral dialects, de Jong 2000: 168, 368, 577, 675), Tihama, qəltu dialects in general (northern Iraq and Anatolia); Uzbekistan; conditioned –ki: Cairene, Central Palestinian
beet-ki ‘your.F house’

Today the geographically and demographically most widespread reflex of ‘your.FSG’ is based on –ik, beet-ik ‘your.F house.’ However, what was reconstructed as proto-Arabic ∗–ki (Owens 2006 [2009]: 246–250) is also well attested, at least in its geographical expanse, stretching from the LCA all the way to Uzbekistan. Furthermore, some dialects have a conditioned alternation, as follows.

(5.25) –ki after VV, before a suffix
Otherwise –ik

Thus Cairene has šaaf-ik ‘he saw you.FSG,’ but šaaf-oo-ki ‘they saw you.FSG’ and ma šaaf-kii-š ‘he didn’t see you.FSG.’ This is discussed in greater detail in 12.3.3.
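The conditioned alternation in (5.25) amounts to a two-way allomorph choice, which the following sketch makes explicit. The function name `fsg2_allomorph`, the hyphen-segmented input and the list of long vowels are assumptions for illustration; the sketch returns only the allomorph and does not model the lengthening seen in šaaf-kii-š.

```python
# Long vowels as written in the transliteration used here (an assumed inventory).
LONG_VOWELS = ("aa", "ii", "uu", "ee", "oo")


def fsg2_allomorph(stem: str, followed_by_suffix: bool = False) -> str:
    """Choose the 2FSG object/possessive allomorph following rule (5.25)."""
    segments = stem.replace("-", "")  # ignore morpheme boundaries
    if followed_by_suffix or segments.endswith(LONG_VOWELS):
        return "ki"  # after VV, or before a further suffix
    return "ik"      # elsewhere


if __name__ == "__main__":
    print(fsg2_allomorph("šaaf"))                           # cf. šaaf-ik
    print(fsg2_allomorph("šaaf-oo"))                        # cf. šaaf-oo-ki
    print(fsg2_allomorph("šaaf", followed_by_suffix=True))  # cf. ma šaaf-kii-š
```

An invariable –ki dialect of the kind listed at the head of this section would, on this view, simply lack the conditioning and return "ki" throughout.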

5.3.2.2 –í ‘my’, –ní ‘me’

southern Jordan (Bduul), most of the north Sinai littoral, WSA
e.g. beet-í ‘my house,’ abuu-í ‘my father,’ šaaf-ní ‘he saw me’

Displaying a considerably smaller geographical distribution than –ki, but still covering an extensive, discontinuous area, is stressed –í marking the 1SG possessive pronoun and object pronoun. The feature is confined to this one person; other object suffixes follow the phonologically determined stress assignment appropriate to their dialect, a point which distinguishes this stress pattern from Hebrew, which stresses most suffix object pronouns. In most instances both the possessive pronoun and object pronoun bear exceptional stress. In LCA, however, only the possessive pronoun does, hence šáaf-ni ‘he saw me,’ without stressed –ni. This feature is also discussed in the chapter on borrowing from Aramaic, 7.2.3.2.

5.3.2.3 -ha/hin/hum ~ -a/-in/um

Damascus, LCA, Uzbekistan, various qǝltu Mesopotamian; see Procházka 2018: 275 for list of Levantine dialects with this feature
e.g. beet-a ‘her house,’ šaaf-um ‘he saw them.M’

Object suffixes beginning with –h often have a variant lacking the /h/. This is a variant argued to have been described already by Sibawaih (see Owens 2006 [2009]: 211), where the /h/ is said to be ‘hidden,’ xafiyya (II: 464.4). It is reconstructed in Owens (2006 [2009]: 241–245). The conditions for dropping the /h/ differ slightly from dialect to dialect. In Damascus and in Iraqi the rule is formulated allomorphically, where –h is maintained after a vowel, dropped after a consonant.
(5.26)

abuu-ha ‘her father’ umm-a ‘her mother’

In LCA, as described extensively in Owens (2006 [2009]: 241–243), /h/ can be dropped in any environment, though in an extensive quantitative comparison, h-retention outnumbered deletion by about a 2:1 margin. It can finally be noted that there is a third possibility in spoken Arabic in respect of –h, and that is complete assimilation to a preceding consonant. (5.27)

beet-ha → beet-ta ‘her house’

This further adds to the geographical dispersion of the feature. This is discussed further in 7.2.1 below.
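The three treatments of suffix-initial /h/ described in this section (retention after a vowel, deletion after a consonant as in Damascus, and assimilation to a preceding consonant) can be set side by side in a short sketch. The function `attach_h_suffix` and its `strategy` parameter are hypothetical names, and the assumption that /h/ is always retained after a vowel holds only for the Damascus-type rule; LCA, where /h/ can drop in any environment, is not modelled.

```python
# Vowel symbols in the transliteration used here (an assumed inventory).
VOWELS = set("aiueoə")


def attach_h_suffix(stem: str, suffix: str = "ha", strategy: str = "delete") -> str:
    """Attach an h-initial pronoun suffix to a stem.

    strategy="delete":     /h/ dropped after a consonant, as in (5.26)
    strategy="assimilate": /h/ assimilates to the preceding consonant, as in (5.27)
    After a vowel the /h/ is kept under either strategy (an assumption).
    """
    if not suffix.startswith("h"):
        raise ValueError("expected an h-initial suffix such as 'ha' or 'hum'")
    if stem[-1] in VOWELS:
        return f"{stem}-{suffix}"
    if strategy == "delete":
        return f"{stem}-{suffix[1:]}"
    if strategy == "assimilate":
        return f"{stem}-{stem[-1]}{suffix[1:]}"
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    print(attach_h_suffix("abuu"))                         # abuu-ha
    print(attach_h_suffix("umm"))                          # umm-a
    print(attach_h_suffix("šaaf", "hum"))                  # šaaf-um
    print(attach_h_suffix("beet", strategy="assimilate"))  # beet-ta
```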


5.3.2.4 Imperfect verb: 1SG, 1PL, n-, n-…-u

n- ‘I’ imperfect verb: all of North Africa beginning in western Egypt, Malta, Andalusia (Ferrando 2000: 51, Corriente 2013: 88–89), Upper Egypt (intermittently between Asyut and south of Luxor) and western Delta, Bahariyya, and Farafira oases, Chad

There are basically two sets of paradigms.

(5.28) ‘I, we’
(a) ʔa-, n-       a-ktub, na-ktub       ‘I write, we write’
(b) n-, n-…-u     na-ktub, na-ktub-u    ‘I write, we write’

(5.28a) is the dominant one generally and is also CA. (5.28b) is the object of discussion here. In addition to (b), mixed paradigms occur, e.g. a-ktub na-ktub-u in the central Nile Delta area and in dispersed locations in the Gina-Luxor region in Upper Egypt (Behnstedt and Woidich 1985: 210–212, Behnstedt 2016b). In Shukriyya Arabic in the eastern Sudan (Atbara area, Reichmuth 1983: 277), as well as in LCA, n- may occur for 1SG, without n-…-u in the plural, under variationist (Owens 1998: 168–173) or grammatical conditions (Owens 2003: 736). There is widespread consensus (e.g. Fischer 1898, cited in Brockelmann 1908: 567) that the plural na-ktub-u arose via analogy to the 2 and 3PL, ta-ktub-u, ya-ktub-u ‘you.MPL, they.M write’ (see Figure 5.2).

The geography of this feature is not particularly discontinuous. All of North Africa has it and its attestations in western Egypt can be seen as a continuation of this North African extension. Its occurrence in the Nile Valley and the oases is discontinuous, however; it is not attested in the eastern Sudan and Nile Valley, except to the extent it is brought by migrants from the WSA area. It re-occurs in most of Chadian Arabic, which is part of the WSA area, but it occurs only variably in the Kordofanian Arabic of the Baggara (Manfredi 2010: 130) and not at all in most of the LCA dialects, even though these latter two are part of the WSA area. The Africa-specific distribution of this feature speaks to a post-diasporic development, but Blau (2002, see Dalman 1905 [1982]: 265) notes that Palestinian Aramaic has n- ‘1SG,’ hence n-, n- ‘I, we’ in the imperfect verb, like Shukriyya and the limited LCA paradigms.¹⁸ It is relevant to note that as a minimal terminus a quo (5.28b)¹⁹ is attested in Judaeo-Arabic eleventh-century corpora from both Morocco and Egypt (Wagner 2010: 78–79). (See App. 5.3.2.4 for contribution of Behnstedt to this feature.)

¹⁸ {n-yqwm} ‘I will get up.’ According to Dalman this is the first person plural (i.e. {n-}) often used instead of a-, which also occurs for the 1SG (Statt der 1 Pers. Sing. wird in Galil. Sehr oft die erste Person Pluralis gebraucht). His examples make clear, however, that it is simply {n-} used as 1SG, e.g. from the concordance with the possessive suffix –i in the same sentence. Contemporary Maʕlula Aramaic also has n-, n-ifθuħ ‘I open’ (Arnold 1990). As Blau (1981b) rather cryptically suggested, it cannot thus be ruled out that n- was part of an Aramaic substrate, in which case the feature could have been introduced into Arabic via contact. The problem with Blau’s observation is that it inevitably leads to postulating a greater degree of Aramaic–Arabic contact than Blau ever countenanced.
¹⁹ Though only the SG, ana …n-ktib ‘I write’ is certain, n-…-u only interpretively so in Wagner’s two eleventh-century corpora.

Since the features introduced in this chapter are all intended as an argument against parallel independent development, it is relevant to note that Heath (2015: 14), following Marçais and Guiga (1925), advocates interpreting this feature as one which arose independently in different places. Though his exposition is brief, it is worthwhile looking at it in some detail as Heath is one of the few to have actually recognized the issue of explanations in terms of parallel independent development.²⁰ Methodologically, it is not sufficient simply to claim independent parallel development. In the extreme case, (5.28b) is interpreted to arise independently wherever it is attested. Presumably, this has to be at least limited to discrete geographical populations, and not to individuals. On this reckoning one would probably have at least the following instances of independent parallel development, identifying the regions very broadly. (5.29)

Independent parallel development of n-, n-…-u first person imperfect
North Africa
Upper Egypt
Chad
Andalusia
Malta

If Andalusia and Malta are not included as part of North Africa, there are up to five independent developments of the event in (5.28b/Figure 5.2 below). Arguing in support of parallel independent development, Heath speaks of “a rather obvious repair” in a paradigm needing to level out analogous morphemes, -u for ‘plural,’ n- for ‘first person,’ whose generalization of n- was supported by the nasal element in the independent pronoun ana ‘I.’ This process happened independently multiple times. His line of argument is to identify a structural condition and assume that, to the extent that the condition exists, it can effect a change anywhere. For instance, the 1PL would start with nu-ktub, but under analogical influence of the plural –u in the second and third persons, would itself acquire the –u suffix.

nu-ktub → nu-ktub-u ‘we write’
(analogy: tu-ktub-u ‘you.M.PL write’, yu-ktub-u ‘they.M write’)

Figure 5.2 Analogical spread of –u PL in imperfect paradigm
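The structural description behind Figure 5.2 can be written out as a small paradigm operation: plural –u spreads from the second and third persons to the first, and n- is generalized as the first person marker. The sketch below uses the forms of (5.28); the dictionary layout and the name `level_first_person` are assumptions for illustration. It models only the structural condition and is neutral as to whether a given population reached (5.28b) by its own innovation or by diffusion.

```python
# Paradigm (5.28a), with the 2/3 plural forms cited in the text.
PARADIGM_A = {
    "1SG": "a-ktub",
    "1PL": "na-ktub",
    "2MPL": "ta-ktub-u",
    "3MPL": "ya-ktub-u",
}


def level_first_person(paradigm: dict) -> dict:
    """Apply the analogical levelling of Figure 5.2 to an imperfect paradigm."""
    result = dict(paradigm)
    plural_u_elsewhere = all(result[cell].endswith("-u") for cell in ("2MPL", "3MPL"))
    if plural_u_elsewhere and not result["1PL"].endswith("-u"):
        old_first_plural = result["1PL"]
        # Step 1: -u 'plural' is extended to the first person plural.
        result["1PL"] = old_first_plural + "-u"
        # Step 2: n- is reanalysed as 'first person', so the old 1PL
        # form without -u now serves as the singular.
        result["1SG"] = old_first_plural
    return result


if __name__ == "__main__":
    print(level_first_person(PARADIGM_A))
```

Applying the function a second time leaves the paradigm unchanged, since the condition for the repair is no longer met.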

²⁰ Heath, consistently, also would see Jabli /ç/ as a parallel independent development (2015: 7, n. 5, see 3.2). He doesn’t consider the role of diffusion.


Certainly historical linguistics requires more than a valid structural description to offer a plausible explanation as to how an event happened. If not, no constraints exist as to what can be claimed happened in any of the non-contiguous developments which are introduced in this chapter. For instance, the basic imala, /aa/ palatalization condition (5.3.1.3) is basically a shift of /aa/ to /ie/ when the adjacent syllable has /i/. (5.30)

Imala: aa → ie / _i (before/after i)

This is the basic structural imala description. It works for Eastern Libyan Arabic, for Andalusian Arabic, for qəltu dialects in Mesopotamian Arabic and for those Lebanese and Syrian dialects which have it. The rule, moreover, is a “natural” one, essentially vowel harmonic assimilation as Sibawaih himself (II: 279) describes it. Following Heath’s “it was a condition just waiting to happen” approach, imala occurred independently multiple times. The same argument, on this basis, is valid against all the phenomena treated in this chapter. This is where a broader comparative perspective is necessary. Unmentioned is the fact that the development in Figure 5.2 occurred only among dialects whose populations are known from the historical record to be related to common points of dispersal, first in Egypt then in other parts of Africa (and Malta, Andalusia). Chadian Arabic came to that region in the thirteenth or fourteenth century (see Chapter 8), from Upper Egypt. It is thus odd that this innovative feature is found only among populations for whom the I/I + D model provides a plausible explanation for the contemporary distributions. Furthermore, keeping with the current example, the variation between (5.28a) and (5.28b) is found in the WSA area (not discussed by Heath), with the central area, Chad, largely having (5.28b), the peripheral regions, Kordofan and LCA lacking it. This existence of both variants mirrors exactly the situation in Upper Egypt where some dialects have (5.28b) and others don’t. All of the ancestral Arabic-speaking populations of the WSA area migrated out of Upper Egypt. Under the I/I + D model, the replication of (5.28b) already found in Upper Egypt in the WSA region is accounted for without the unnecessary complications of positing parallel independent development. 
It is interesting to consider a fundamental issue in Heath’s claim, namely that paradigms repair themselves, that syntagmatic fusion happens all the time, and that if these processes happen once, they can be invoked independently ad infinitum. That all these repairs happened among populations known to have spread both east to west and west to east as well as north to south, but did not happen in the Middle East homeland or spread into Central Asia, is ignored. The justification here for parallel independent development thus runs into two problems. First, clearly, there is no magical process conjuring up an n- 1SG imperfect any time, anywhere in the Arabic paradigm, because in the majority of cases, it never happens, and still is not happening, even though the conditions for the “repair” are met. Proposing a structural definition of a historical linguistic change is not a license to claim it happened independently every time the effects of that change are present. Secondly, while identifying structural factors which plausibly led to a historical linguistic shift is, trivially, necessary for the postulation of the linguistic event, the structural description, as in Figure 5.2, can as well be read as an instruction for the speaker of a dialect with (5.28a) to apply analogical tools to use (5.28b), i.e. is a blueprint for diffusional spread via secondary diffusion (see 5.1). The mere citation of the conditions says nothing definitive about how the event came about in the populations where it is attested.

To be specific, the “North African” imperfect verb paradigm came about via I/I + D. It could have been diffused directly in a population in which the innovation was completely present. It could have been diffused via secondary contact: population A met B, A had (5.28a), B figured out (!) the logic of (5.28b) and applied it in their speech. One can speak of two types of diffusion, primary and secondary, but in both cases, the argument of diffusion over parallel independent development stands. It can be noted that Heath’s main interest is not in fact the inflectional expression of first person but rather the expression of nominal possession, where a part of his argument relies on independent parallel development of its exponents. Without passing judgment or commenting on this issue, there is little to recommend the idea that the n-…-u first person plural arose independently four or five times in various parts of Africa and Andalusia.

5.3.2.5 taltala morphemic /a/ vs. /i/

Taltala as interpreted in this book is the variation between /a/ vs. /i/ among various grammatical morphemes occurring in closed syllables. Indeed, surveying the entire Arabic-speaking world indicates a ~ i variation in nearly all -∗VC morphemes. The basic idea in abbreviated form goes back at least to Ibn Faris (Ṣaaħibiy: 28, 34) to represent variation in the preformative vowel of the imperfect verb, and here is generalized to account for widespread a ~ i variation in -VC morphemes. A representative but non-exhaustive²¹ list is given in Table 5.1.

Table 5.1 taltala
Morpheme                      /a/     /i/     /a/ example     /i/ example
preformative vowel of verb    Ca      Ci      ta-ktub         ti-ktub ‘she writes’
3FSG perfect suffix           -at     -it     katab-at        katab-it ‘she wrote’
Definite article              al      il      al-beet         il-beet ‘the-house’
FSG suffix                    -at     -it     kibd-at-ha      kibd-it-ha ‘her liver’
2FPL                          -tan    -tin    kitab-tan       kitab-tin ‘you.F.PL wrote’
3FPL                          -an     -in     katab-an        katab-in ‘they.F wrote’
3FPL object                   -han    -hin    šaaf-han        šaaf-hin ‘he saw them.F’
Intrusive –n                  an      in      šaayf-ann-u     šaayf-inn-u ‘has seen him’
Linker –n                     an      in                      dirt-in ṭayyiba ‘a good tribal area’

Tracing the linguistic history of each of these affixes as a whole awaits systematic treatment. Where CA has the same morpheme, it always has the value /a/, which probably tells us as much about the variety underlying CA as it does about proto-Arabic. What is intriguing in these forms is that unlike CA, an /a/ or /i/ in one morpheme often does not imply an /a/ or /i/ in others. Note that taltala is very much alive and well outside of the Arabian peninsula.²²

²¹ See e.g. Procházka’s (2014) summary of pronouns, which includes alternations such as anta/inti ‘you.F.SG,’ ittan/ittin ‘you.F.PL,’ intam/intum ‘you M.PL,’ -ham/-him or hum ‘they/them.M.’ One could probably add interrogative pronouns (man/min or miin ‘who?’), function words (e.g. bidd/badd ‘want,’ gid/gad ‘TMA particle’) and other functional classes where the a ~ i alternation is non-contrastive.

(5.31)

Examples of taltala in select functional morphemes
Iraqi (Baghdadi)  /a/: kitb-at, kibd-at-ha; /i/: ti-ktib, il-beet
Najdi             /a/: ktib-at, ktib-an, al-bayt, kibd-at-ha; /i/: šaaf-hin, kitab-tin, rajjaal-in ʕarabi ‘an Arab man’ (linker –n)
ELA               /a/: iktib-at, kitab-tan, iktib-an; /i/: kibd-it-ha, il-bayt, šaaf-hin
WSA               /a/: katab-at, katab-tan, katab-an, al-beet, abu bagarat-an waade (linker –n); /i/: kibd-it-ha, šaayf-inn-a (intrusive –n), šaaf-hin; /a/ ~ /i/: ti-ktub ~ ta-ktub

²² Ferguson’s thinking on this was clear but oversimplified: 1960: 621: “… all dialects outside Arabia seem to have the reflexes /i/ instead of those of /a/….” In Upper Egypt, for instance, Behnstedt and Woidich (2018: 85) note that in the extreme south /a/-dialects dominate (see App. 5.2.2 for LCA).

Moreover, more complicated distributions may be attested which may allow both vocalic variants of a morpheme in the dialect. The preformative vowel in ELA for instance is either Ca or Ci, ya- or yi-, depending on whether the stem vowel is low (ya-takallam ‘he speaks’) or high (yi-ktib ‘he writes’, Owens 1984). WSA nearly always, in all dialects, has –it for the FSG nominal suffix, but with CaCaC stems may have –at, bagar-at-na ‘our cow.’ Furthermore, this list does not exclude other morphophonological variants; for instance, the 3FSG perfect suffix has many allomorphs under various conditions (see Owens 2006 [2009]: 181 n. 29), e.g. as soon as the suffix falls in an open syllable as in ELA, iktib-at-ih → iktib-iet-ih ‘she-wrote it.F.’ Moreover, even the representation in Table 5.1 is too general. In LCA, for instance, there are dialects with exclusively preformative /a/, yaktub ‘he writes’ and, separately from these, those with –han (see discussion in 6.5 below).

While the situation described here is, at first glance at least, far too complicated to argue for individual parallel independent development for each dialect, clearly the interplay of inherited forms and analogical effects operative within each dialect has worked independently to produce uniquely distinctive bundles of taltala reflexes. It is assumed here that the inherited starting point was: /a/ in some morphemes, /i/ in others. As the feature spread, each dialect went its own way innovating toward /i/ or /a/ sometimes in one morpheme, sometimes in another.²³
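The ELA distribution just described, where the preformative is ya- with a low stem vowel and yi- with a high one, is one of the simpler conditioned cases of taltala and can be sketched as a single function. The name `ela_preformative` is hypothetical, and taking the last vowel of the transliterated stem as “the stem vowel” is a simplifying assumption that happens to fit the two cited examples.

```python
def ela_preformative(stem: str) -> str:
    """Return the 3MSG imperfect form with ya- or yi-, keyed to stem vowel height."""
    stem_vowels = [ch for ch in stem if ch in "aiu"]
    if not stem_vowels:
        raise ValueError("stem contains no vowel")
    # Low stem vowel /a/ selects ya-; high stem vowels /i, u/ select yi-.
    prefix = "ya" if stem_vowels[-1] == "a" else "yi"
    return f"{prefix}-{stem}"


if __name__ == "__main__":
    print(ela_preformative("takallam"))  # ya-takallam
    print(ela_preformative("ktib"))      # yi-ktib
```

A dialect that has generalized one value, such as the LCA varieties with exclusively preformative /a/, would correspond to a constant prefix rather than a conditioned choice.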

5.3.2.6 b-: future or indicative imperfect prefix

Uzbekistan, Damascus, Egypt, Nigeria, Najdi, Gulf, Yemen
Not in: CA, Iraq, ELA, Anatolia,²⁴ Tihama, North Africa
Example: b-yiktib ‘he writes’ (Cairene)

This is a complicated feature treated in detail, though still not exhaustively, in Owens (2018b), on which the current discussion is based. Probably the geographically most extensive imperfective pre-verbal marker in Arabic has the form b- (see El-Wer 2014 for more on the form of b-). It is, however, by no means the only such marker. Morocco and Jijel in eastern Algeria have a marker ka-, ka-n-ktəb ‘I write’ (Caubet 1993: 32; Aguadé 2018: 58 for a longer Maghrebi list).²⁵ Furthermore, there are varieties, including CA itself, Eastern Libya, northern Tunisian, and the region known as Dagana in Chad E and NE of Lake Chad, which have no imperfect pre-verbal marker whatsoever.

²³ This basic mechanism follows Barth (1894: 4–6), who treated only the preformative vowel, but from a comparative Semitic basis. He suggested that proto-Semitic originally had two values of the preformative vowel, a- and i-. The distribution of these was determined by the rule: a- before an /u/- or /i/-stemmed verb, i- before an /a/-stemmed verb. He illustrated this with Hebrew, ya-ħmud vs. yiħsar ‘he breaks.’ Individual languages/dialects were “free” to generalize the one vowel or the other, e.g. Ethio-Semitic in general has only the high vowel i-. Barth introduces the variation at the PS level. It is suggested here that in analogous fashion in proto-Arabic there was variation in a host of function morphemes alternating between a ~ i. The question of the degree to which the proposed proto-Arabic situation is generalizable outside of Arabic is beyond the scope of the book. Other phonological factors should also be explored. For instance, many dialects display an /a/ ~ /i/ alternation in closed/open syllables, e.g. Shukriyya, Abbeché taktub, tikítbu ‘you write, you.PL write.’ Taltala might have played off of such phonologically based shifts.

²⁴ See discussion in Jastrow (1978: 300).

²⁵ For the pragmatic prefix da- in Iraqi, see App. 5.3.2.6a, and for a class of prefixal pragmatic markers in general, Adnan and Owens to appear.

5.3 GEOGRAPHICALLY NON-CONTIGUOUS FEATURES

Still, besides the inherent interest of explaining such a wide, non-contiguous geographical extension, there are three further reasons for concentrating on b-. First, as will be seen, it has two distinctive grammatical values whose historical definition is a challenge in and of itself. Secondly, other scholars have suggested that these values arose via parallel independent development, so here is another test case against this assumption. Thirdly, it introduces a further important issue into Arabic historical linguistics, namely a critical perspective on the degree to which grammaticalization trajectories can be invoked to support historical linguistic interpretation. The following account will pare down a great deal of detail to argue the position that b-, which occurs throughout these varieties, has a single origin.

In order to treat this question I begin by summarizing five imperfect paradigms from widely separated Arabic dialects. Unless descriptive clarity dictates otherwise, as it will for LCA and Uzbekistan Arabic, after presenting one complete paradigm, for the remainder, for the sake of brevity, I illustrate with only three persons, 2MSG, 3MSG, and 1PL. As will be seen, these persons are adequate to represent the major necessary comparative linguistic parameters.

It will be argued that three sequentially linked stages can be identified in the development of b-. I will begin with the final stage three because this is the stage which is best known, it essentially being identical between the two dominant centers of Damascus and Cairo. I then work my way into the past, via stage 2 and finally stage 1, which is interpreted as being the stage where the original b- innovation entered Arabic. It is argued here that although the grammar of stage 1 differed markedly from both stages 2 and 3, linguistic mechanisms can be discerned by which b- changed from an original semantico-pragmatic value to a fairly strictly grammaticized marker of indicative.
As a brief starting point, commensurate with grammaticalization studies, it is assumed that b- originates in the verb ‘want,’ which itself has various forms, yibɣi, yibɣa, yibi, yiba ‘he wants.’ I assume these last two forms produced two future marker variants, bi- and baa- (see App. 5.3.2.6 for more discussion of etymology).

(5.32)  baa-ya-hab-uu-lla-na
        FT-3-give-MPL-to-us
        ‘They will give it to us’ (Ṣaʕdah in North Yemen, Behnstedt 1987: 207)

(5.33)  m-in-ṭarriš
        b-we-send
        ‘We’ll send’ (Baħrain, Holes 2016: 301; see 5.3.2.7 for /b → m/ assimilation)

yibi < yibɣa appears to have a wider distribution than yiba outside of the Arabian peninsula, while in the Arabian peninsula yabi appears to be dominant in the eastern Najdi area, and yiba in eastern Yemen, SE Saudi Arabia and perhaps into Oman (Diem 1973: 56, 62, 126; Brocket 1985: 21 on Oman; Behnstedt 1985: 132;
Behnstedt and Woidich 2014: 498, 507), though no sharp isogloss separates the bi- and baa- areas. While it is a comparative exercise in and of itself to substantiate whether the current distributions of yibi ~ yiba in fact plausibly mirror the two verbal prefixes bi-, baa-, and if so, to determine to what degree such overlap might reflect historical developments, this study proceeds on the assumption that this is the case. I further assume that it is the bi- < ∗yibi or ∗yabi variant which provided the ultimate source for indicative b-, one formal argument being that as a future marker baa- typically has a long vowel (see (5.30)), whereas bi- is short. This provides two arguments for considering (yi)bi the source: short vowel reduction to b- is more likely than long vowel reduction, and reduction of a high vowel is always more likely in Arabic than reduction of a low vowel.

5.3.2.6.1 Stage 3 Damascus, Cairene and (?) Sanʕaaʔ

Damascus Arabic
Damascus (Cowell 1964: 55, and see 10.3)

        SG            PL
1       b-ǝ-ktob      m-nǝ-ktob
2M      b-tǝ-ktob     b-tǝ-kǝtb-u
2F      b-tǝ-kǝtb-i
3M      b-ǝ-ktob      b-yǝ-kǝtb-u
3F      b-tǝ-ktob

At issue is the origin of the prefix b-. Cowell (1964: 324) describes b-marked imperfects as having the values of indicative, habitual and a simple imperfect.²⁶ A key contrast of the b-imperfect is with the subjunctive, which is marked only by the person marker, here represented as Ø or, as it is termed, the Ø-form. The subjunctive alone expresses a wish, as in (5.34), and contextualized by various expressions, such as ʔalla, expresses an optative. (5.34)

Indicative mǝ-n-rūḥ ˁas-sinama “We’ll go to the movies” ˀaḷḷa bi-waffˀak “God will grant you success”

Subjunctive Ø n-rūḥ ˁas-sinama “Let’s go to the movies” ˀalla Ø y-waffˀak “May God grant you success” (Cowell 1964: 344–358)

For future reference, it can be noted that in the 1PL imperfect b- has the allomorph m- (see (5.33)). There are contexts where only the subjunctive can be used. These include after modals, many of them epistemic, deontic or estimative, and after aspectual predicates, including bǝdd “want,” laazim “must,” yǝmkǝn “might,” ˀǝdǝr “be able,” bada “begin,” nšaaḷḷa “God willing, I hope,” ˀaḥsan “it is better,” as well as after certain subordinating conjunctions like ˀawwal ma “as soon as” and bass “as soon as, provided that.”

²⁶ And similarly in Mħarde in west-central Syria, b- is an actual present, historical present, habitual and indicative (Yoseph 2012: 78). In Nabk, 80 kilometres north of Damascus, the b- imperfect is “a general present” or a habitual (Gralla 2006: 124), and equally in Soukhne in central-eastern Syria, an actually occurring or habitual action (Behnstedt 1994: 60–61). Yoseph (2012: 78) designates the b- form an “indicative” opposed to the Ø subjunctive.

Cairene
Cairene Arabic, imperfect conjugation, indicative (Mitchell 1956 and Woidich 2006a: 273–282)

        SG            PL
1                     bi-ni-ktib
2M      bi-ti-ktib    bi-ti-ktib-u
3M      bi-yi-ktib

The meaning of Cairene b- overlaps considerably with that of Damascus. It expresses a habitual, a historical present, and reports an event that is transpiring at the time of speaking (enunciative). (5.35)

b-y-ruuħ ‘he’s going, he goes’

Also similar to Damascus, b- is excluded from various marked modal contexts which require the basic stem (Ø-marked) instead. Essentially, deontic (obligation) and epistemic (knowing) contexts do not allow the b- imperfect. (5.36)

b-a-ʔdar ʔa-rūḥ (never ∗a-ʔdar … ba-rūḥ or ∗baʔdar b-a-rūḥ)
b-I-can I-go
“I can go.”

yi-ddi-l-hum xabar
3-give-to-them news
‘He is to let them know.’ (Mitchell 1956: 84)

As with Levantine, an m- allomorph is attested in Egyptian Arabic as well, in the central Delta region of Egypt and in isolated locations in the eastern Delta (Behnstedt & Woidich 1985: 222). This form, moreover, may have once been present in Cairene (Spitta Bey 1880: 204). A more difficult distribution of b- in Sanʕaani Arabic is discussed in App. 5.3.2.6.1.

In both cases discussed in this section a broad contrast exists between b- vs. Ø, corresponding to the contrast indicative vs. subjunctive. This modal contrast encompasses a number of tense and aspect features. b- is a habitual, a generic, and it can describe an action taking place at the time of speaking, i.e. it is an imperfective. The subjunctive, by contrast, can be strictly defined by higher order predicates denoting various modal values such as ability, obligation and possibility. In certain contexts it may also freely contrast semantically with indicative b- (see [5.36]). This summary fits Damascus (and the Levant in general) and Cairene (and Egyptian in general) well.

5.3.2.6.2 Stage 2, LCA and Uzbekistan

LCA
        indicative (imperfect)        subjunctive (sg only)
        SG           PL
1       b-a-ktub     nu-ktub          a-ktub
2M      tu-ktub      tu-ktub-u        tu-ktub
2F      tu-ktub-i    tu-ktub-an       tu-ktub-i
3M      bu-ktub      bu-ktub-u        i-ktub
3F      tu-ktub      bu-ktub-an       tu-ktub

WSA, for which in this case LCA is a stand-in, has basically the same contrast between a b-marked indicative indicating a habitual, generic, the description of a current event, or a future (see discussion in 5.3.2.4 for 1PL). This contrasts with the Ø-marked subjunctive, which indicates a wish or an order. The contrast was already met in Cairene Arabic (5.36) (and Sanʕaani, see App. 5.3.2.6.1). (5.37a)

guul ley-a b-u-ktub-a
say to-him b-PV-write-it
“Tell him that he is writing it.”

(5.37b)

guul ley-a i-ktub-a²⁷
say to-him 3-write-it
“Tell him to write it/he should write it.” (subjunctive)

The b- vs. Ø contrast in LCA is more specific than simply indicative vs. subjunctive. It is useful to think of the contrast as describing non-control and control contexts. In a non-control context the speaker, or the subject, has no immediate control over the action or event depicted in the predicate. (5.38)

b-imši ‘He is going/will go’

By contrast, in the Ø-marked control context the speaker is in control of the event or state expressed in the predicate. (5.39)

yaa i-mši (∗ b-imši) don’t 3-go ‘He shouldn’t go’

Vs. (5.40)

bi-gdar bi-mši ‘he can go’ (he has this ability)

Note that b- in LCA is not so constrained as in Cairene and Damascene in that, in contrast to these (see (5.36)), it can occur in deontic and epistemic contexts.

²⁷ The preformative vowel is realized as i- if word initial in the subjunctive.

LCA thus has the same basic contrast b- = indicative, Ø = subjunctive as is clearly grammaticized in Damascene and Cairene, and which is discernible in at least some contexts in Sanʕaaʔ Arabic. LCA has, however, another unrelated property which will be argued to form an important link to the original marking of b- in Gulf and Najdi Arabic. This is the asymmetric distribution of b- in the paradigm in (5.37), repeated in full in order to render the explication more graphic. What is apparent is that b- occurs not before every person, but rather before every person which in the Ø-marked subjunctive begins with a vowel; or equally, b- does not occur before C-marked persons.

One reason LCA (and Uzbekistan) are singled out as stage II varieties is that they suggest that a simple phonological motivation was involved in the regularization of b- throughout the imperfect paradigm. Comparing the LCA indicative (5.37a) with its subjunctive (5.37b), it is clear that there is an asymmetry between the two. In the subjunctive some persons begin with a vowel, others with a consonant, whereas in the indicative b- fills in the initial slot so that all persons uniformly begin with CV. In other words, whereas the result of this extension was a contrast between indicative and subjunctive, an initial motivation is reconstructed to have begun under the influence of creating a phonological symmetry in the paradigm. There are further aspects of LCA relevant to the development of the grammaticalized indicative function of stage 3 which I return to in 5.3.2.6.4.

Uzbekistan

Taken alone, the suggestion that LCA represents a frozen stage between the original state described below and Damascene/Cairene finds support in a variety already introduced in relation to LCA in 5.3.1.5 above, Uzbekistan Arabic. It too has the two paradigms, structurally identical to LCA (Zimmermann 2009; Fischer 1961: 249, 252).
Uzbekistan
        SG             PL
1       m-a-qtil       nǝ-qtil
2F      tǝ-qtil-iin    tǝ-qtil-in
2M      tǝ-qtil        tǝ-qtil-uun
3F      tǝ-qtil        m-ǝqtil-in
3M      m-ǝqtil        m-ǝqtil-uun

Subjunctive
        SG             PL
1       a-qtil         nǝ-qtil
2F      tǝ-qtil-iin    tǝ-qtil-in
2M      tǝ-qtil        tǝ-qtil-uun
3F      tǝ-qtil        yǝ-qtil-in
3M      yǝ-qtil        yǝ-qtil-uun

(5.41)  fat ibʕiir a-sū rūḥ-i
        a camel I-do soul-my
        ‘I intend to change myself into a camel’
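The paradigm-filling pattern described for LCA in 5.3.2.6.2 can be made concrete with a minimal sketch. It assumes, in line with n. 27, that the 3M subjunctive i-ktub has an underlying u- preformative; the function name `fill_initial_slot`, the vowel inventory, and the hyphenated segmentation are illustrative assumptions rather than part of the analysis itself.

```python
# Illustrative sketch only: names, vowel set and segmentation are assumptions.
VOWELS = set("aiuǝ")

# LCA subjunctive singular cells from 5.3.2.6.2. The 3M cell is entered as
# u-ktub on the assumption (cf. n. 27) that surface i-ktub is underlyingly u-.
LCA_SUBJUNCTIVE_SG = {
    "1": "a-ktub",
    "2M": "tu-ktub",
    "2F": "tu-ktub-i",
    "3M": "u-ktub",
    "3F": "tu-ktub",
}


def fill_initial_slot(form: str, marker: str = "b") -> str:
    """Prefix the marker to vowel-initial forms; leave C-initial forms as they are."""
    return f"{marker}-{form}" if form[0] in VOWELS else form


# Derive the indicative singular by filling only the vowel-initial slots.
LCA_INDICATIVE_SG = {
    person: fill_initial_slot(form) for person, form in LCA_SUBJUNCTIVE_SG.items()
}

for person, form in LCA_INDICATIVE_SG.items():
    print(person, form)
```

Under these assumptions every derived form is consonant-initial, which is the symmetry the text describes. Calling `fill_initial_slot("a-qtil", marker="m")` gives the Uzbekistan 1SG m-a-qtil; the Uzbekistan third person forms (yǝ-qtil vs. m-ǝqtil) would need a further assumption about y- and are not modeled here.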

The only difference between LCA and Uzbekistan Arabic is that Uzbekistan has m- where LCA has b-. Whereas Retsö (2014a, b) and Ingham (1994b) would see this m- as a borrowing from the Dari (Farsi) indicative marker mi-, I would interpret it as the extension of the universal 1PL allomorph m- (cf. Damascus mnəktəb ‘we write’) throughout the paradigm. This may well have been supported by the presence in Uzbekistan Arabic of a co-territorial language with mi- in a similar function. However, merely copying mi- from Dari does not explain how the paradigm fills out in exactly the same way as does LCA, why it results in an indicative – subjunctive contrast, and how it changed from ∗mi- to m-. Were the assumed Dari ∗mi- → m- a purely phonetic development, one would have expected it to have filled out the V-initial slot and become a non-alternating part of the imperfect paradigm.

5.3.2.6.3 Stage 1: Gulf and Najdi, Tripolitanian, Fezzan (Libya)

The final stage to be presented in the development argued for here is the first historical stage, where b- originally entered the language. The historical homeland of this morpheme is to be found in the Persian Gulf (today Oman, Emirates, Baħrain) and Najd, where it has a future, irrealis, or volitional meaning. (5.42)

bi-ni-ntihi
FT-we-finish
‘We’ll finish’ (Holes 1990: 188, Baħrain)

b-yi-rǧiʕ-k al-ɣaarah
FT-3-bring back-you DEF-attack
‘He intends to come back and raid you.’ (Ingham 1994a: 121, Najdi)

b- in these dialects is always optional. There is no clear-cut contrast such as indicative vs. subjunctive to limit the choice to one or the other. Lack of b- is, against the values set for b-, ambiguous: Ø marks intentionality as well as non-evidential, future, imagined events and states, as does b-, but Ø also represents ‘real’ and, following Persson’s (2008) terminology, “evidential” events or states. The choice whether or not to use b- depends on the semantico-pragmatic estimation of its appropriateness by the speaker. Here as well b- has the variant m- before n- of the 1PL, via nasal assimilation (Holes 2016: 301 n. 116, see (5.33)).

Without comment, it can be noted that volitional/future b- is also reported in Tripolitanian Arabic (Perreira 2008: 451, 508), in the Fezzan (Caubet 2017: 343–348) and somewhat cryptically in Morocco (Heath 2002: 217).²⁸

²⁸ Heath’s data is based on elicited material. He reports a lexical verb bʷɣi- ‘want,’ which may be inflected for person and used as a future marker. He seems also to say that it may be shortened to a clitic or prefix status, though probably more research is needed on this point.

An interesting footnote to this stage is the suggestion by Eksell (2006: 83) that G/N b- possibly began under the influence of an Aramaic calque.²⁹ Babylonian Aramaic bʕi ‘want, volitional’ was calqued on to ∗yabɣa/yibɣi ‘want,’ which then fed into the grammaticalization described in 5.3.2.6.4.

5.3.2.6.4 From referencing the world to discourse immanence

Indicative b- is interpreted as developing in three stages: (5.43)

b-: A three-stage model
Stage 1: (Gulf/Najdi) ∗yibi > b- future-volitional marker; occurrence dependent on speaker’s evaluation of appropriateness relative to the situation of the real world
Stage 2: (LCA/Uzbekistan) b- (m-) fills in “empty” (V-initial) slots in imperfect verb; shifts condition of occurrence from marking direct relation with external world to marking sequential relation between discourse-internal propositions (as described in this section)
Stage 3: grammaticized marker of indicative

In the current analysis, b- begins its life in Arabic as a future-volitional marker whose occurrence depends on the speaker’s evaluation of the content of the proposition relative to the world. In stage 3 it is a grammaticized indicative marker opposed to Ø subjunctive. Stage 2 is the crucial transitional stage. Its role in the transformation to indicative marker requires greater discussion.

The transitional stage 2 has two aspects. The first, in evidence in both Uzbekistan and LCA, is a purely morphophonological development in which b-, interpreted as m- in Uzbekistan, fills out the implicit Ø-V-C imperfect verb structure to b-V-C. All imperfect paradigm members now begin with C-.

The second aspect is reconstructed from the behavior of b- in LCA. Here the description in 5.3.2.6.2 needs to be expanded upon, however. As represented in the LCA paradigm above, b- does not occur before C-initial paradigm positions. This is not completely correct, however. In fact, b- can occur before these persons. The construction is better attested in the eastern (Bagirmi) dialect than the western (see 6.5 for discussion). What is crucial for the current argument is what discourse-pragmatic value its use in this context has. In this context three main functions can be identified: b- may indicate a close sequential relationship between the b-marked predicate and the preceding predicate (5.44), it may represent a causal relationship between them (5.45), and it may mark a resultative action.

²⁸ (cont.) Al-Qahtani (2015: 60) notes that in the KSA Tihama a bi- future is gaining ground against a modally uninflected form among younger speakers, suggesting the Najdi bi- “future” is expanding.

²⁹ For the pre-Islamic Aramaic presence in the Gulf region and elsewhere in the Arabian peninsula, see 7.3. One can speculate: did an ancestral variety of Arabic calque an Aramaic tense value onto the verb baɣa (yabɣa, yiba) which subsequently reduced to b-, or was a b- integrated in Arabic directly from an Aramaic b-?

(5.44)  F, TV72 Rafaa
        ni-širi le-a doorí ho m-nǝ-xud’d’-a,
        we-buy for-it.M dried meat and b-we-put-it.M
        “We buy dried meat for it and put it in.”

(5.45)  ma ti-talaf, ma ti-talaf be-ni-diss le-he širgaaniyye
        not 3F-spoil not 3F-spoil b-we-insert to-it.F mat
        nu-ṣuḅḅa-ha hu nǝ-difin-e fi t-tǝraab,
        we-pour-it.F and we-bury-it.F in DEF-earth
        (AB, TV72b Rafaa)
        ‘It doesn’t spoil, it doesn’t spoil because we put it in a grass mat and we pour it in and bury it in the ground.’

Following my interpretation in Owens (2018b: 225–232), the common thread linking these classes is that b-C signals a close link between the proposition expressed by the b-marked verb, and the preceding discourse. In all instances it signals that the speaker is explicitly drawing a sequential link between the preceding proposition, and the current b-marked proposition. The sequentiality may simply reflect the flow of the events themselves or may signal a closer causal or resultative relation. Across speaker turns it may signal a specific attitude to what the interlocutor has just said. The important point is that the b-C imperfect verb in LCA is defined on a discourse-internal basis. It is cued by nothing outside of the discourse itself, but rather has the status of an adjacency pair, signaling a significant relationship between two adjacent propositions. b- marks a significant propositional adjacency pair. This function differs both from L/E and from G/N. It differs from L/E in being pragmatically, not syntactically or semantically conditioned, and it differs from G/N in that its pragmatic value is discourse internal. In G/N, b- references a relation between an external state of affairs and the speaker’s assessment of that state. In LCA, b-C links one clause to another. It is this contrast with G/N which will be argued to form a missing link in the transition of G/N to, ultimately, L/E. b-C is an element constrained by the sequential exigencies of discourse. The b-C function is transitional (in a historical linguistic sense) in that it moves the locus of choice from being one between speaker and assessment of an external state of affairs (Gulf/Najdi), to one between speaker and the sequential flow of discourse. b- is now discourse-internal. This, it is inferred, smooths the way for b- marking any proposition in discourse, unless the individual semantics dictate otherwise. 
The LCA b-C function is a transitional step that, as it were, filters out the external world, bringing b- into a discourse-immanent environment where its occurrence can be governed by syntactic factors, and where it can eventually assume a simple unmarked indicative function.³⁰

³⁰ Wagner (2010: 167) suggests that in Egyptian Judaeo-Arabic texts from the twelfth to thirteenth century the b-imperfect had a wider range of functions than it has in today’s Cairene, attested for instance in purpose clauses. Though the data base is small, this property would align this era with contemporary LCA, which here is postulated to represent a stage 2 relic.

In the same process, as b- moves toward a grammaticalized, text-internal indicative value, the Ø form, now termed the subjunctive, acquires a direct connection to the external world. (5.46a) describes a state of affairs, happening now or habitually or inferred to be happening. (5.46b) asks that a state of affairs which does not exist in the world should happen. It asks that the proposition come into existence. (5.46)

a. bi-ji ‘He is coming’
b. i-ji ‘He should come’

The current argument entails interpreting LCA (and, by reconstruction, Uzbekistan) as having Janus-like properties. Looking toward stage 3, b- is fully grammaticized in certain contexts, as set out in 5.3.2.6.2. It marks an indicative vs. the Ø-subjunctive. It does not do so in all persons, however. The generalization to b- invariably marking all members of the paradigm is a stage 3 property. b- marking C-initial persons sporadically as a propositional adjacency pair marker is the discourse property which set in motion the transition to b- as marker of indicative. In LCA itself, therefore, b- is both forward looking to the full indicative marker and at the same time maintains a relic of the transition to this status in its propositional adjacency pair function. App. 5.3.2.6.4 diagrams the development of b- as interpreted here.

5.3.2.7 b-imperfect: Against parallel independent development

I will end the discussion of the b- imperfect with two general issues, beginning with that of parallel independent development. The current feature, the historical explanation of b-, is one of the test cases for the key question of parallel independent development. It is a geographically widespread feature which is not mentioned at all in the ALT literature. In four of the five dialects examined here it marks a basic indicative–subjunctive contrast, while in one the use of b- is determined by semantico-pragmatic factors. In the current account b- started its life as yibɣi, which in turn developed into different variants, yibi and yiba ‘want,’ in contemporary Gulf Arabic. b- in the future/volitional sense would have arisen via a classic grammaticalization process as in (5.47). (5.47)

origin of b-
∗yabɣa/yibɣi > yiba/yibi > bi-
‘he wants’ > reduced to yibi > bi- ‘volitional, future’

In stage two, two factors favoured the anchoring of b- in the imperfect paradigm, one acting as a phonological filler Ø-V > b-V, creating a uniform C-V imperfect paradigm. The second was the development of b- as a propositional adjacency marker, b- propositional adjacency > marker of any proposition, unless semantics
dictate otherwise. Finally, in stage 3, “any proposition” becomes its indicative value, while exceptional semantics are the residual subjunctive meaning.

Setting out a possible trajectory of development does not in and of itself clinch the argument for innovation and spread. It is possible to pursue this issue further in this case because there have been two alternative proposals accounting for b- which invoke parallel independent development. Both of them appear to be motivated by the semantic disjunction between Gulf/Najdi ‘future/volitional’ b- and the indicative b- of the remaining dialects. It can be remarked that the attention in these studies focusses on what here is stage 1 future, volitional vs. stage 3 indicative, with no attempt to link the two in a sequential development.

I will look in detail at only one of these proposals. Many of the objections to the analysis carry over to the other. Jan Retsö (2014a, b) advocates roughly the yibɣi ‘want’ origin as described in 5.3.2.6.3 for the Gulf/Najdi b-. For the Levantine/Cairene (he does not mention more than these) Retsö advocates an origin in a locative preposition b-. Criticisms of this can be found in Owens (2018a: 226). Here I would like to take up a second recent suggestion by Leddy-Cecere (2020), which again assumes (5.47) for Gulf/Najdi, and for the Levantine the grammaticalization of the predicate badd- or bidd- ‘want,’ the common word for this concept in Levantine Arabic. (5.48)

badd-u yi-ktib
want-his he-write
‘he wants to write,’ lit. ‘want-his he-write’

From the outset, there are two closely related ‘want’ concepts, badd and wadd. Wadd is originally an inflected verb (waddeet ‘I wanted,’ CA wadid-tu ‘I loved,’ Lisaan VI: 4793) and is found in the sense of ‘want’ in a number of dialects including Yemen and LCA (Behnstedt and Woidich 2014: 499). The CA verbal noun is wudd ‘love, affection.’ In CA {bdd} is associated with the meaning ‘separate, go different ways, distribute’ (Lisaan I: 226) and is not related to the sense of ‘love.’³¹ Bidd ‘want’ is therefore usually assumed to derive from the prepositional phrase bi-widd- ‘by the wish of’ (> bidd). Bidd alternates dialectically with badd throughout the Levant. So far as I know, advocates of the bidd future do not specify whether the source would have been bidd or badd. I would note that Woidich and Behnstedt do not themselves suggest a derivation bidd > bi-.

³¹ E.g. badda ṣaaħibahaa ʕan al-šayʔ ‘he separated/restrained its possessor from s.t.’ = abʕadahu ‘he removed it from.’

Nonetheless, elaborating on Leddy-Cecere’s etymology is instructive. In fact, in Leddy-Cecere 2020 this idea is more assumed than argued for in detail, so here I anticipate possible arguments supporting this analysis. It appears that there are three arguments for postulating badd- as the origin of Levantine b-. First, grammaticalization theory allows ‘want’ as an input to a future marker, so a walk through ‘want’ predicates in Levantine puts bidd- on the shopping list. Secondly, in Cilician (Procházka 2002: 115–116) badd- has short forms bad- and ba- with the meaning of ‘future’ (Leddy-Cecere 2020: 613). A short form bi- is not mentioned. (5.49)

bad-ti-šrab-i ~ ba-ti-šrab-i ‘you.F will drink’

Cilician Arabic, however, also has the normal b- indicative paradigm ‘habitual, indefinite time.’ (5.50)

bi-ti-ftaħi ‘you.F open’

Thirdly, in an implicitly similar line of reasoning, any b- in Syrian dialects appears to be assumed to have a bidd/badd- origin, so that, for instance, Soukhne b- is said to derive from badd- (Leddy-Cecere 2020: 613). Soukhne (Behnstedt 1994: 60, 184) does indeed have a b- prefix with its main meaning ‘habitual, frequentive, imperfect’ (“lang andauernde Handlung,” see n. 26, this chapter), but Behnstedt merely notes it as such and himself makes no suggestions as to its origin.³² It is relevant to note that others as well assume badd- as the origin of Levantine b- (e.g. Yoseph 2012: 78), though without discussion.

There are multiple problems with this account. First of all, it is not stated why we should be looking for multiple origins of b- in the first place. It appears that the following principle is operative, which I will dub the “grammaticalization sufficiency principle.” This will be revised below.

P4 Grammaticalization sufficiency principle
If grammaticalization theory has determined a possible source of a construction anywhere, it may be used as a sufficient explanation for a given historical linguistic development.

Numerous grammaticalization studies have indeed shown that ‘want’ can be a source for a future marker. This is assumed in (5.47) above. It does not follow, however, that every predicate ‘want’ will be the source of a future marker. A look at Behnstedt and Woidich’s Wortatlas (2014: 498–506) shows that Arabic dialects have the following main predicates for ‘want’; (5.51) indicates for each whether or not ‘want’ in that dialect is related to a future or volitional marker. The case of yibɣa/yibɣi is a bit complicated. Three cases are distinguished: those where yibɣa/yibɣi occurs along with b- as a ‘future, volitional’ marker, those where yibɣa/yibɣi occurs without future/volitional b-, and those where the perfect baɣa occurs with the indicative marker b-.

³² In the Soukhne lexicon (Behnstedt 1994: 209) he gives {bdd} + pronominal object as ‘want.’

(5.51) ‘want’ in future meaning

baɣa/yibɣa/yibɣi ‘want,’ grammaticalization in Najdi/Gulf + spread (I/I + D)
    Extension, where b- = ‘future’: Gulf, Najdi, Fezzan, Tripoli (Libya)
    Extension where either no b- at all: SE Saudi Arabia, ELA, Tunisia, Algeria, Mauritania, Morocco, Chad (Dagana)
    or b- = evidential, indicative b-: Syria, Egypt (isolated)
raad ‘want,’ no grammaticalization to future
    Extension: Iraq, Levant, Uzbekistan, Cyprus, ELA, Egypt, LCA, Chad, Sudan, Morocco, Mauritania, Arabian peninsula in general
ʕaayiz/ʕaawiz ‘want,’ no grammaticalization to a future
    Extension: Egypt, particularly Lower Egypt, Sudan
dawwar ‘want,’ no grammaticalization to future
    Extension: LCA, Chad, Sudan (WSA)
widd ‘want,’ no grammaticalization to future
    Extension: Oman, Yemen, SW Saudi Arabia, Jordan, Sinai, Syria (Horan), LCA
ištaha ‘wish, desire,’ no grammaticalization to future
    Extension: Yemen, Oman, Gulf dialects, Libya, Tunisia, Algeria, Morocco
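As a rough illustration of why a possible grammaticalization source is not a sufficient explanation, the six main predicates listed in (5.51) can be tallied. The sketch below simply transcribes that list; the dictionary name and the boolean coding are illustrative assumptions, and the Tihama particle ša- and the more local ‘want’ verbs are left out.

```python
# Data transcribed from (5.51); the variable names and the True/False coding
# are illustrative assumptions, not part of the source.
WANT_PREDICATES = {
    "baɣa/yibɣa/yibɣi": True,   # grammaticalized to future in Najdi/Gulf
    "raad": False,
    "ʕaayiz/ʕaawiz": False,
    "dawwar": False,
    "widd": False,
    "ištaha": False,
}

# Split the predicates by whether (5.51) reports grammaticalization to a future.
grammaticalized = [verb for verb, became_future in WANT_PREDICATES.items() if became_future]
not_grammaticalized = [verb for verb, became_future in WANT_PREDICATES.items() if not became_future]

print(f"{len(grammaticalized)} of {len(WANT_PREDICATES)} main 'want' predicates "
      f"are reported as sources of a future marker: {', '.join(grammaticalized)}")
```

On this coding only one of the six main predicates is associated with a future marker, so the mere availability of a ‘want’ verb does not predict grammaticalization.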

To these could be added a host of ‘want’ verbs with a much more local distribution, none of which developed into future or subjunctive markers or the like.

One potential counterexample to these should be mentioned. In the Tihama (coastal North Yemen) the future particle ša- is cognate with the verb ‘want,’ šaa, yšaa (cf. CA šaaʔ ‘want’) (Behnstedt and Woidich 2014: 509; Behnstedt 1987: 132). This deserves much closer scrutiny. It cannot be ruled out, given that the distributions are geographically contiguous, that ša- ‘future’ < yšaa ‘want’ calques b- ‘future’ < yibɣi ‘want.’

As formulated, the grammaticalization principle is clearly wrong. Arabic shows many instances where a basic word for ‘want’ does not grammaticize into a future. “Sufficiency” needs to be deleted.

P4 Grammaticalization principle (revised)
If grammaticalization theory has determined a possible source of a construction anywhere, this may be adduced as a partial explanation for a given historical linguistic development, provided the explanation is supported by independent evidence.

Secondly, Levantine b-, as seen in 5.3.2.6.1, has a different value from Gulf/Najdi b-. I have summarized this difference as indicative vs. future/volitional. Indeed, it is this semantic difference which appears to have led Retsö to seek an independent origin for the two. Leddy-Cecere, however, represents badd- as developing into a future marker. There are two issues here, one factual, the other procedural/explanatory. Factually, in descriptions of Levantine dialects, neither b- nor bidd/badd are exclusively future markers. ‘Future’ may be one value but they have other values as well. In Nabk (Gralla 2006: 124) b- is an indicative, a habitual or universal present, in Soukhne an actual or habitual action, and in Damascus Cowell (1964: 324–327) says that b- has a range of possible values: vague future, annunciatory, habitual, indicating propensities. Bəddo on the other hand in Cowell is treated among modal determinants, meaning ‘want to, intend to, be going to’ (1964: 347). There is no intimation, however, that this further developed into the indicative marker which it is today.³³ (5.52)

badd- → b- ‘future’ ? → b- indicative

This problem is particularly acute when one considers that badd/bidd always coexists with indicative b-, i.e. all Levantine badd/bidd dialects are also b- dialects, so with absolute consistency one would have had (5.52), while always preserving badd/bidd. It might be replied here that grammaticalized elements often co-exist with their source, ‘gonna’ and ‘going to,’ bi- and yibi. The difference here is that bidd/badd and its presumed offspring, bi- co-exist as grammaticalized elements. bi- ~ yibi, on the other hand, is a straightforward conversion to grammaticized bi-, with the postulated source continuing to function as a normal finite verb in the meaning ‘want.’ In its finite form guise, it does not compete in the meaning space of ‘future’ with b-, as would be the case if b- and badd are considered to have the same source in Levantine Arabic. Moreover, in general one finds b- co-existing with a host of ‘want’ verbs, as already listed. This can be repeated here very perfunctorily for illustrative purposes. (5.53)

‘want’      b-        Region
ʕaayiz      b-        Lower Egypt, Sudan
dawwar      b-        LCA, Chad, Sudan
dawwar      no b-     Dagana Chad
ištaha      b-        Yemen, Oman
raad        b-        LCA, Najdi, Syria
raad        no b-     Iraq, ELA

³³ Lentin (2018: 189–190) summarizes early attestations of b- in literary texts beginning in the ninth century but eschews commenting on its ultimate historical origin. He does not consider an origin in badd-.


In the Levantine case the reasoning appears to be that, because there is a one-to-one overlap of badd/bidd with b-, we can assume that b- derives from badd/bidd. As (5.53) shows, those dialects which have b- in the majority of cases do not have the cognate yibi but rather a different ‘want’ verb. Assuming my own unitary analysis, it can be seen that in the course of its spread, the b- indicative did indeed stay in contact with the etymological source yibi in many instances. Even more frequently, however, it lost contact with this source, and became associated with other ‘want’ verbs, as listed in (5.53). There is no relationship in these cases between the ‘want’ verb and the expression of indicative (or future for that matter). In the current analysis, b- developed into an inflectional element which gave it the independence to collocate, as it were, with any ‘want’ verb its speakers chose. On this account, it is reasonable to assume that the relation (5.54)

badd/bidd

b-Levantine

was simply another one of these associations, that an inherited b-imperfect happened to have acquired badd/bidd as its ‘want’ congener, just as the b- in the LCA has acquired raad or dawwar. The phonetic resemblance is accidental. Thirdly, as has been seen repeatedly in this chapter, geographical compactness or distance plays no role. Najdi/Gulf encompasses a coherent region. However, volitional/future b- also occurs in western Libya and in the Fezzan, with no documented occurrences in between, as yet at least (Pereira 2008: 451, 508). Fourthly, I would mention an issue which requires individual attention elsewhere, but can be set out in principle here. This is that a ‘future’ meaning is encoded in all Levantine dialects with a marker derived from the verb (or AP) raaħ/raayiħ ‘go/going,’ raħ/raayiħ/ħ-. This is prefixed to the verb and does not co-occur with either badd or b-, (5.55)

raħ a-ruuħ ‘I am going to go’

Any suggestion that Levantine b- made its way into the language via a future meaning needs to work out how its temporal space was carved out, and in what sequence, with the explicit future marker raħ. Fifthly, as seen in 5.3.2.6, wherever prefix b- occurs, whether in future/volitional meaning or the indicative meaning, it alternates with m- in the 1PL. This applies to Gulf Arabic, and in the Levant is attested inter alia in Cilicia (Procházka 2002: 114), Damascus, Soukhne (Behnstedt 1994: 60), Mħarde in central Syria (Yoseph 2012: 78), and in Sult in Jordan (Herin 2010: 16 m-ni-ḥči ‘we say’). As noted in 5.3.2.6.1, it is attested in the Delta region in Egypt and may have been present in nineteenth-century Cairo. The allomorph is also attested in Abbeché Arabic (Roth-Laly 1979: 49) and other WSA dialects (Manfredi on Baggara Kordofan 2010: 149).


Of course, here again it might be claimed that m- developed independently n times. But this is confronted by the same issue, namely how many independent developments are to be assumed: rhetorically, once for every “dialect” where it is attested. One might limit this to twice, once in Gulf/Najdi, once in Levantine, though here one would have to accept a blatantly circular interpretation. The simplest assumption and one consistent with the known spread of Arabic is that the b- ~ m- variation innovated once and spread.

Sixthly and finally, to bring all of these points together in a more general critique, the change needs to be integrated into an overall explanation of the manifestations of b- described in 5.3.2.6. For instance, indicative b- in Cairene and Damascus, as described in 5.3.2.6.1, are very similar in form and function. One would here have to advocate one of two things: badd → b- occurred in Levantine Arabic then spread to Cairo. Note that this would need to be the direction of spread, because Cairo does not have, and so far as is known did not have in the past, badd- as a ‘want’ predicate. Or, there was yet another instance of parallel independent development, with Cairene b- deriving from an as yet unidentified source. Thus, ironically, in avoiding a single holistic explanation for the spread of b-, the explanation of badd- as the source of b- still requires an innovation and spread explanation, perhaps even multiple versions thereof.

Reinterpretation, not independent parallel development

I would like to address a possible procedural objection to the I/I + D explanation of b-, and this is that it entails a major reinterpretation of future/volitional b- to indicative b-. Two answers are relevant here. First, there are multiple instances in Arabic where a morpheme is refunctionalized, with the original value remaining in the ancestral population, alongside the new value in another. A short, non-exhaustive list is as follows. (5.56)

n- 1PL → n- 1SG
n- in most variants has the value 1PL. In a number of North African dialects, n- marks ‘first person,’ indifferently as SG or PL (see 5.3.2.4 for extensive discussion). For 1PL, the additional PL suffix –u is added.

Intrusive –in; marks object → subject, -ak = object → subject
As seen in 5.2 the intrusive –n usually signals an object suffix. In Uzbekistan Arabic, however, it has been completely refunctionalized to mark a subject complement (see 7.2.4.2/3 for more).

ʕam actuality present, b-
In Damascus Arabic the prefix ʕam alternates with or co-occurs with b-, marking an event as immediately occurring, ʕam-(b)-yiktəb ‘he is writing right now.’ Cowell terms it an actuality marker. In parts of Upper Egypt and the Dakhla oasis in Egypt, ʕam with manner allomorphs occurs with a discourse pragmatic meaning (Adnan and Owens, to appear).


All these cases are parallel to b- as interpreted here. A grammatical morpheme is observed to have different but related values in different dialects. Following Lass’ principle, the presumption is that they go back to a common source. How the morphemes are related in terms of stages needs to be worked out on a case-by-case basis. In none of these cases is it necessary to postulate parallel independent development, that the value was invented anew in the community which developed it. Indeed, there are clear cases where such an argumentation would make no sense. As pointed out in 5.2, the only way to understand the refunctionalization of the intrusive –in marking a subject in Uzbekistan Arabic, rather than marking an object elsewhere where it occurs, is to see it as originally marking an object. (5.57)

zorb-in-ak                     →    zorb-inn-ak
having hit-N-you
‘(someone) having hit you’          ‘you having hit (someone)’

-ak is an object suffix. There is no way it would have developed directly into a subject-marking function: it found its way into the paradigm regularly as an object and was refunctionalized into a subject. Nor is it necessary to argue that the different functions originally derived from different morphemic sources, that the ʕam marking an immediate action in Damascus has a different source than ʕam marking a general indicative in Dakhla for instance.³⁴ Finally, in one more general observation, as Eksell notes (2006: 84) the shift from a volitional, non-evidential future to an indicative, evidential is not an attested grammaticalization trajectory. Indeed, whereas a good deal of attention has been devoted to primary grammaticalization, relatively little has been done for so-called secondary grammaticalization. Once a grammaticalization process sets in, what happens next? The current account proposes a discourse-internal interpretation accounting for the eventual fixed attribution of an indicative value to b- in stages 2 (and hence to stage 3).

5.3.2.8 Deflected agreement: Plural, singular or plural, singular only

FSG: Baghdadi, Damascene, Egyptian
FSG or M/FPL: Emirati, Gulf in general, Jordan, S. Tunisian,³⁵ CA in general
PL only: WSA, Eastern Libyan, Bʕeeri (Upper Egypt)³⁶

This instance of I/I + D is, as with b-, a somewhat complicated matter, which will be dealt with in greater detail in 6.7.1 below. Here are the basic points. In Arabic there are three types of plural agreement rules. In one, singular agrees with singular and plural with plural. (5.58)

³⁴ Usually interpreted as deriving from ʕammaal ‘in process of.’

³⁵ SW Tunisia, well exemplified in Ritt-Benmimoun 2017 as a stage 2 variety.

³⁶ I would be inclined to add Maghrebi Arabic in general to this third category. Harrell (1962/2004: 157–158) reports that in Moroccan Arabic plural nouns concord in plural regardless of the lexical identity of the noun (deflected agreement only very exceptionally). The situation is slightly complicated by the fact that many Maghrebi dialects do not have a morphological feminine plural category, so one would need to assume that the plural non-human = FPL agreement, as in WSA, kept the PL agreement, simply discarding “F.” The issue requires separate treatment.

al-ʕajjaal maal-hin ṭawwal-an ke
DEF-calves why-they.F tarry-FPL so
‘The calves, why did they take so long?’ (Manfredi 2010: 222)

al-buyuut anħarag-an
DEF-houses burned-FPL
‘The houses burned down’

Though this essentially is a continuation of a proto-Semitic inheritance (see 6.7.1), it is the least common of the three types of rule, found in Western Sudanic Arabic, as in (5.58), probably eastern Libyan Arabic, Bduul in southern Jordan (Owens and Bani-Yasin 1984: 222–223) and in Bʕeeri Arabic in the Nile Delta (Behnstedt and Woidich 1985: 142; Woidich 2006b). Importantly, this is also attested in the Classical Arabic era, under Sibawaihi’s rubric ʔakaluuniy al-baraaɣiiθ (Levin 1989) in which, as with other Semitic languages, a plural subject agreed with the plural verb in all positions. (5.59)

ḍarab-uu-niy qawm-u-ka
hit-PL-me people-IND-your
(Al-Kitaab I: 202)³⁷

In the second there is a choice between FSG agreement and plural agreement with one and the same head noun. Evidence comes both from the Classical sources and from contemporary dialects. In Classical Arabic broken plural nouns allow both plural (feminine or masculine) and feminine singular agreement. The situation is well described by Ibn Yaʕish (Sharħ al-Mufaṣṣal 5: 103–105) in the thirteenth century, though he essentially follows Sibawaih (I: 202, II: 179). He notes that broken plural nouns, even human broken plurals (5.60a, b), allow FSG agreement.³⁸ (5.60)

qaam-at ar-rijaal-u
get up-F DEF-men-NOM
or
qaam ar-rijaal-u
get up DEF-men-NOM

(5.60b)

ar-rijaal-u qaam-at or ar-rijaal-u qaam-uw

³⁷ Beeston (1985: 30 n. 16) draws a direct association between this construction in Arabic and the invariable plural agreement in Sabaic.

³⁸ For discussion of stages 2 and 3 and the relevance of contemporary dialects in understanding the Classical Arabic deflected agreement pattern, see Wilmsen (2014: 36–39).


(5.60c)

al-ʔayyaam faʕal-at
DEF-days did-FSG
‘The days did.’
or
al-ʔayyaam faʕal-na
DEF-days did-FPL
‘The days did.’

He further notes that the sound FPL in –aat equally allows FSG agreement. Note that the subject here is human. (5.60d)

al-muslim-aat-u qaam-at
DEF-Muslim-FPL-NOM get up-FSG

al-muslim-aat-u qum-na
DEF-Muslim-FPL-NOM get up-FPL

This choice between FSG and plural agreement is equally attested in many dialects today, particularly those of the Arabian peninsula and adjoining areas, such as Jordan, from which (5.61) comes. (5.61)

(a) gaal inno əḍ-ḍbaaʕ t-iiji ṭul əl-leel y-ħuum-in
    said that DEF-hyenas F-come all DEF-night 3-wander-FPL
    ‘He said that the hyenas come and wander about the whole night’ (Herin 2010: 283)
(b) ḍall-at əl-ʕurbaan gabəl yi-ṭlaʕ-u ʕa l-baṭiin
    remain.PFV.3FSG DEF-bedouins before go.out.SBJV.3MPL on DEF-steppe
    ‘Before the bedouins used to go out on the steppes.’ (286)³⁹

(5.61a, b) in fact begin with FSG agreement, and move on to FPL within one sentence. In (5.61a) the subject is non-human, in (5.61b) human.⁴⁰ Other contemporary dialects where such either-or agreement choice is attested include but are hardly limited to Kuwaiti (Johnstone 1961: 264), Emirati (David Wilmsen p.c.), and Najdi (Ingham 1994a: 62–65). Holes (2016: 332) indicates that even sound FPL nouns allow deflected (FSG) agreement. (5.62)

t-yi ummah-aat-na t-naadii-na
3FSG-come mother-FPL-our 3FSG-call-us
‘Our mothers would come and call us.’ (Baħrain)

³⁹ For clarification, in the Jordanian examples the FSG verb initially reflects the possibility of FSG agreement with the broken plural subject noun. As with other stage 2 varieties (except CA), an S-initial verb may agree in number with the following subject: awwalma bad-u n-naas yu-sukn-u fii-ha ‘As soon as the people began (MPL) to live (MPL) in it…’ (Herin 2010: 284).

⁴⁰ According to Enam Al-Wer (p.c.), Jordanian human broken plurals also allow either FSG or PL agreement, giving the examples:

zulum kwayys-e ‘people good-FSG’
iz-zulum j-o ‘The people came-PL’

However, against the situation in Baħrain, human FPL marked by the sound plural –aat allow only plural agreement: mudarris-aat kwayyis-aat ‘good (FPL) female teachers’ (∗kwayys-e FSG); iθ-θaaniy-aat raaħ-an (or raaħ-u), not ∗raaħ-at ‘went-FSG,’ ‘The others-FPL went-PL.’ See Owens and Bani Yasin 1987 for more.

The only plural nouns which do not allow FSG agreement in both CA and those modern dialects with both types of agreement are sound masculine plural nouns, which require M(PL) agreement. The third possibility is defined by sensitivity to the natural properties of the subject, human nouns, whether broken plural or sound masculine or feminine plural, determining plural agreement, non-human feminine singular.⁴¹ The key development here is that the default agreement⁴² for non-human plural nouns is FSG. This stage is attested in many contemporary dialects such as Cairene, Damascene, and Baghdadi, and it is essentially the agreement used in Modern Standard Arabic.⁴³ (5.63)

il-buyuut inħaraʔ-it
DEF-houses burned-FSG (Cairene)

I will interpret these in terms of historical stages in 6.7.1 below and link it to mechanisms of change in 9.9. Here it is sufficient to note that there are multiple exponents of a basic number/gender agreement rule distributed non-contiguously around the Arabic world.

5.3.2.9 The linker –n: The incrementation corollary; independent but not parallel development

The intrusive –n (5.2) is not the only –n morpheme which runs parallel to, but is not obviously directly derivable from CA. Another one is what I term the linker –n (Owens 1998: 216, 2006 [2009]: 104–106). Consistently across a number of varieties a –n may be inserted between a noun and its modifier. The linker –n is attested virtually in all parts of the Arabic speaking world and may well be attested in Safaitic (see 4.1.5.1, [4.9]).

⁴¹ This state of affairs is broadly confirmed by the corpus-based study of Belnap (1993: 101) for Cairene Arabic. Inanimate plurals are overwhelmingly FSG (deflected agreement) while human plurals have overwhelmingly natural (or “strict” in Belnap’s terms) agreement. Individual nouns may, however, be statistical outliers. Naas ‘people,’ for instance, is one human noun with a fairly high (30%) degree of FSG agreement. Woidich (2006a: 248) also notes that collective nouns, which he exemplifies with ahl ‘people, family,’ naas ‘people,’ xalʔ ‘people’ allow both FSG and PL. In general, however, plural non-human nouns take FSG agreement.

⁴² In these dialects, relics of the earlier stage 2 are found even in stage 3. If a non-human noun is modified by a numeral, with a very high probability (Belnap 1993: 105) it will take plural agreement: xams buyuut inħaraʔ-u ‘five houses burned-PL’ (Cairene).

⁴³ It is an interesting question outside the scope of this book whether the default deflected agreement in modern SA is due to the outsize influence on the variety in the nineteenth century of the Levant and Egypt, or whether it continued a trend which had set in in the literary language by the fourth/tenth century. Belnap and Gee (1994) would argue for the latter.

(5.64) Linker -n

(a) Andalusia (Ferrando 2018: 101)
    ʕay-ayn-an milaaħ
    eye-D-LIN beautiful
    ‘beautiful eyes’
(b) Mesopotamian
    muṭraħ-in yiṭlaʕ yi-sraħ bi-hin
    place-LIN he-go out he-graze with-them.F
    ‘A place (thing) where he can go out and graze them.’ (Procházka 2018: 267)
(c) Uzbekistan
    ħintit-in ħamra
    wheat-LIN red.F
    ‘red grains of wheat’ (Ingham 1994b: 115)
(d) Najdi
    kalmit-in ramy-at
    word-LIN thrown-F
    ‘a word thrown out’ (Ingham 1994a: 49)
(e) Baħrain
    bint-in zeen-a
    girl-LIN nice-F
    ‘a nice girl’ (Holes 2016: 131)
(f) LCA
    nəswaan, amaan-aat ṛaas-an xala azab-aat ke
    women having-FPL head-LIN field unmarried-FPL like
    ‘(like) women without responsibilities, like unmarried ones…’ (IM50)
(g) Judaeo-Arabic
    {ʕly blad an⁴⁴ bʕyd-t}
    on country LIN far-F
    ‘to a far country’ (Blau 1981b: 175)
(h) Judaeo-Arabic
    {y-kwn l-y ʕbd-a}
    3-be to-my servant-LIN
    ‘He shall be my servant’ (Blau 1981: 181)

As can be seen, the vowel is either /a/ or /i/, and as already noted in 5.3.2.5 is thus to be understood as part of the taltala complex. In the literature this is often referred to as “tanwin.” Tanwin is a CA construction in which an –n marking indefiniteness is suffixed to a case marking vowel.

⁴⁴ Written as a separate word, as often in Andalusian Arabic (Ferrando 2018).

(5.65)

bayt-u-n
house-NOM-TAN
‘a house’

The formal similarity is only partly striking. The CA tanwin partakes in a paradigmatic case contrast: (5.66)

bayt-u-n ‘house-NOM-TAN’
bayt-a-n ‘house-ACC-TAN’
bayt-i-n ‘house-GEN-TAN’

The vowel which occurs before the tanwin is grammatically variable.⁴⁵ The vowel in the linker –n is variable only when looked at in terms of the entire taltala complex (5.3.2.5), where a ~ i varies from one dialect to another, but not within a single paradigm in a single dialect. Furthermore, the linker –n marks the presence of an adnominal modifier as in the examples above, hence its name. Marking adnominal modification, it occurs in contexts where the CA tanwin would not occur. For instance, the linker –n occurs after the sound plural masculine suffix –iin and the dual suffix –ayn, as in (5.64a). These are contexts which do not support tanwin in CA. The historical linguistics of the construction still awaits a detailed treatment, though it can be suggested that it is cognate with the indefiniteness marker of CA, without being derived directly from it (see Figure 5.3 below). While in the vast majority of dialects/usage the linker –n requires a following complement, there are dialects where –n may occur phrase finally. For Baħrain Holes (2016: 132) notes the following. (5.67)

aana θ-θaaliθ, iħna xamsat-in ‘I am the third (born), there are five of us.’

One of the few texts, if not the only one, where this usage is attested regularly is Behnstedt’s text (1987a: 220–224) from im-Maθ̲θ̲a⁴⁶ in (north) Yemen. In this text nearly all indefinite nouns are marked by –in. (5.68)

ṣabb-u ʔaħb-in, ṣabb-u m-ħabb
poured-they grain-LIN, poured-they DEF-grain
‘They poured grain, they poured the grain out.’

It also does not vary for case. (5.69)

wagaʕ la-ha wald-in
drop to-her boy-LIN
‘She gave birth to a boy.’ (CA expects wald-u-n [walad-un])

⁴⁵ Ignoring special lexemic classes, e.g. hudaa + n → huda-n ‘guidance.NOM/ACC/GEN.’

⁴⁶ In this word only, the underlined θ signals an emphatic interdental voiceless fricative (following Behnstedt’s description).


It should be clear that in no usage is the linker –in to be equated either with the indefinite tanwin or with the complex [case vowel + tanwin]. The –Vn implies an invariable vowel: in the examples in (5.64), for instance, an /i/ or /a/. In CA one would expect in (5.67) xamsat-u-n ‘5-NOM-N.’ Similarly in Behnstedt’s text there are many instances where, if evaluated according to CA norms, a “wrong” vowel would result.⁴⁷ At this point, therefore, Figure 5.3 is the best working representation of the development.⁴⁸

[Figure 5.3: tree diagram. Node labels: PS or PWS –n; Tanwiin (CA); (non-specific?) discourse-status marker; linker; -in]

Figure 5.3 Linker –n and tanwin indefiniteness markerᵃ

ᵃ What further research needs to clarify is whether a specific morpheme such as –in or –Vn can be postulated parallel to the tanwin (on right side of tree). If the answer is “yes,” then im-Maθ̲θ̲a would represent a retention of this morpheme, and the linker –n as attested in those dialects where it occurs would be an innovation, defined by a more restricted, endocentric context. At this point more research is needed.

Before leaving this section it is relevant to discuss the Judaeo-Arabic examples in greater detail, because Blau provides another test case for claims of parallel independent development (see 5.2.2 above). As noted there, Blau would see the linker –n (also known, misleadingly in my view, as “dialectal tanwin”) as having developed independently from Classical Arabic among three groups, “Bedouin” Arabic, Judaeo-Arabic and probably Uzbekistan Arabic. For Blau these represent “the surprising extent of parallel features showing to what degree linguistic development may repeat itself” (1981b: 212). What distinguishes Blau is that he does give an explanation for the parallel independent developments. This deserves greater scrutiny, since it gives insight into his historical linguistic thinking. In short, Blau isolates two types of linker –n structural conditions. One (Blau 1981b: 174–176) is the N + attribute illustrated in (5.64a–g). The other is a complex set of circumstances which, as Blau shows, distinguish Judaeo-Arabic from contemporary spoken Arabic, as illustrated in (5.64h). These include the following:

• Nominal predicate of nominal sentences
• subject of kaan ‘be’
• nominal predicates of ʔinna sentences
• nominal subject of existential sentences

⁴⁷ Holes’ suggestion (2016: 132) that –in indicates indefinite specificity would only be helpful in the context of a study showing the discourse-pragmatic conditions of use of the linker –n.

⁴⁸ This is similar to the intrusive –n discussed in 5.2, which is also considered cognate with, i.e. derived from a common source as, the CA energic, but not derived directly from the CA cognate. The grammar of indefiniteness/specificity in the dialects of the SW Arabian peninsula remains an open question. Qahṭani (2015: 48–49) reports (all too briefly) that in the dialect of Asir indefinite nouns are marked by –in in linker position, and by –u in pausal position.

ree-t raayiil-u ‘I saw men’ (pausal)
ree-t raayiil-in fi m-ħugnah ‘I saw men in the field’ (linker position)

It is on the basis of the list that he makes his case for parallel independent development. In (5.64h) we find the subject of the verb kaan marked by the linker –an, though CA would expect a nominative case (ʕabd-un) here. Quite reasonably Blau suggests that the –n-marked nominals in the above list are instances of hyper-correction. Scribes presumably would have known that some sort of nominal suffix is expected in these environments, and they filled this suffix in with the linker –n. It is hyper-correction because the presumption is that the scribes were either attempting to mimic CA, or they developed a register on the basis of an interpretation of CA + the linker –n. The larger interpretive matter is the same as met in the discussion of Heath’s supposition that the n-…-u 1PL imperfective verb marking developed independently multiple times (5.3.2.4). In this case, however, there is a different issue, namely that there is no contradiction between seeing Judaeo-Arabic as partaking of the same historical split which gives us the linker –n, and as having a set of contexts which obviously do not fit the development since, as in (5.64h), there is no following attribute. The operative concept again is that the hypercorrective linker –n is simply an example of Labovian incrementation on the basis of a construction shared among many Arabic varieties. Blau’s explanation for examples such as (5.64h) in terms of hyper-correction is plausible, and simply represents yet another chapter in the many ways Arabic historical linguistics needs interpretation via multiple pathways: The basic construction is defined by the split in Figure 5.3—against Blau none of the linker –n constructions derive directly from CA tanwin. The particular Judaeo-Arabic innovation was to incorporate an interpretation based on CA grammar into its nominal marking (5.64h). 
Clearly in this case the development is based on the diglossia between the written norm (CA) and an inherited spoken variety, and represents an interesting example of a stylistic hybrid arising from oral and written sources (see 6.8). There are methodological and interpretive points here. As with the criticism of Heath, merely defining a structural factor characterizing an innovative development does not in and of itself license the label “parallel independent development.” What Blau has demonstrated with his array of hypercorrective examples is to be sure a new development, independent of the other attested linker –n usages. There is, however, no “parallel” to justify Blau’s claims of independence for the Judaeo-Arabic since it is only Judaeo-Arabic which has structures of the type in (5.64h). What he actually demonstrates is incrementation from a common source. This brings us to a second corollary on Lass’ principle P2.

P3b The incrementation corollary
parallel innovation (convergence) is to be avoided in favour of single innovation pushed back to an earlier date. Before assuming parallel independent development, ensure that the development is not incremental to a previously defined antecedent source.

5.4 Lexicon

The basic argument that multiple occurrence of the same feature in disparate parts of the Arabic-speaking world is evidence for the I/I + D model can now be easily, compactly and richly exemplified for the lexicon, thanks to Behnstedt and Woidich’s Word Atlas (2011, 2014). I will content myself here with one simple example, which could be multiplied many times over via a casual stroll through these works. A second example is found in App. 5.4. The atlas represents common lexemes in an area with shaded colors and restricted, isolated lexeme occurrences with special symbols. In the following example, the CA meaning as given in the Lisaan al-ʕArab of Ibn Manẓur is given first and thereafter the word in contemporary dialects. In a Classical Arabic dictionary, words are arranged under a conventionalized consonantal root, indicated here between curly brackets, which itself is not glossed in the dictionaries. The gloss is given with a word (wazn) having the basic root consonants and is typically a verb or a verbal noun. As a tease, the basic or first meaning for the entry is listed here, to give an indication of how far one may need to travel metaphorically and metonymically⁴⁹ in Arabic to reach a meaning. (5.70)

‘stallion’ (Hengst) in today’s dialects (2011: 111)

Lemma heads in the Lisaan corresponding to individual lexemes for contemporary ‘stallion,’ as described in Behnstedt and Woidich 2011: 111

⁴⁹ Leaving aside the fraught question of whether the Arabic root consonants are morphemic in any sense, many meanings in the Lisaan, i.e. in CA itself, need to be analyzed as deriving via metaphorical or metonymic extension. I.e. CA vocabulary is itself emergent from earlier processes. For instance, the word jawaad is interpreted as having been applied to a ‘good mare, that is, it became splendid, it suffuses goodness’ (Lisaan I: 721). Many of these points are discussed in the Atlas.

Word      {root} lemma
ħiṣaan    {ħṣn} ħaṣuna ‘prevent,’ here: ‘stallion’ (II: 902)
jawaad    {jwd}, al-jayd ‘opposite of bad,’ here: horse (male or female, I: 721)
ʕawd      {ʕwd} ʕawd ‘he who begins and renews’ (attribute of God), here: ‘old camel, sheep, goat’ (IV: 3157)
žaamil    {žml} žamal ‘male camel’; here no entry for žaamil (I: 683)

5.4.1 Reflexes in contemporary dialects ħuṣaan: Libya, Tunisia, eastern Algeria, Egypt, Irak, Syria, Jordan, Palestine, entire Arabic peninsula, Jordan, Sudan, Cyprus, Uzbekistan jawaad/juwaad: Sudan, Chad, LCA, Saudi Tihama, isolated Oman ʕawd: northern Morocco, Algeria, isolated Saudi Arabia zˇaaməl: Mauritania, southern Morocco, Malta Isolated lexemes⁵⁰: faħl: N. Morocco, Algeria, Malta, Tunisia, Chad, xeel: Morocco, Mauritania; faras Palestine (Ramallah); xtiš/kadiiš: Cyprus, Uzbekistan; kaabaayo NE Morocco. Ħuṣaan and jawaad are both ‘horse, stallion.’ The most widespread word for ‘stallion’ is identical (allowing for regular phonological differences) to CA. Roughly, it covers the northern part of the Arabic-speaking world. Jawaad, a less common word for ‘horse, stallion’ (gender undifferentiated), covers roughly sub-Saharan Arabic. ʕawd appears to be a metonymic extension. Žaaməl is formally an active participle, which happens not to be given a separate entry under {zˇml} in the Lisaan. This is not uncommon. Few entries in the Lisaan exemplify all possible derived forms (taṣriif ) of an entry. Its meaning here also appears to be metaphoric or metonymic, though in the Lisaan (I: 683) it is reported that al-jamaala al-xayl “the camels (collective) are the horses.” As for the isolated citations, faħl is also ‘stallion’ in CA. xayl is the collective for ‘horses’ in CA and in many dialects, here refunctionalized into a singular. Faras is ‘mare’ in CA, though this is attested for ‘horse’ only in Ramallah in the West Bank. Kadiiš in the Wehr dictionary of SA is reported as ‘cart horse.’ In CA this meaning is not found. However, kadaša means ‘move, hurry’ (Lisaan V: 3836), kadiiš no doubt a metonymy built in this meaning. Kaabaayo is from Spanish caballo. All of the most common words for ‘stallion’ in the dialects correspond to forms in CA, often with identical meanings. 
In addition, a host of less-common variants occur, most of them with correspondences of some sort in CA. There can therefore be little doubt that they originated in the heartland of the Arabic world and spread from there. It would be extremely hard to find independent origins for jawaad in CA and jawaad in Sudanic Arabic. There may be, of course, innovation/incrementation after an initial spread. Žaaməl for ‘horse’ is clearly a North African innovation, though even here its origin is probably best seen as being early in the spread of Arabic in the region, given its occurrence in Malta. ʕawd for ‘horse’ might also appear to be a North African innovation, though its appearance in the Arabian peninsula would rather argue for a pre-diaspora origin. As Edzard (1998: 166) observed, the fact that the dialectal variation is reflected in the lexical variation in the Lisaan itself already speaks for dialectal variation inherited from the classical era.

⁵⁰ Listed in the Atlas as single data points rather than extensive isoglosses.

5.5 Creole Arabic: Where Arabic stops

To this point it has been illustrated that there are ample instances of individual changes in Arabic, which, whatever their linguistic nature (in most cases splits, in one or two, mergers), are readily comprehensible as having diffused from a single source. However, given its wide geographical extension it should not be a surprise that in its variegated history Arabic has, in a manner of speaking, found itself in odd situations, and in some of these the classical comparative model of change via individual innovations or spread of inherited features into non-contiguous areas breaks down. The most striking of these is in the southern Sudan (today roughly co-terminous with the state of South Sudan) of the second half of the nineteenth century. Until 1851 the southern Sudan was largely isolated from the northern Sudan. In the wake of Egypt’s expansionist Nile politics, however, Egypt began opening up routes to the south. Within a period of 20 years a large network of trading camps was reported (Schweinfurth 1918; Collins 1962; see Mahmud 1983; Owens 2014b) to have been established at distances of 15 miles. These camps, first used as slaving depots, later for ivory accumulation, quickly attracted a large multi-ethnic and multi-lingual population. There was a relatively small group of European and Arab (e.g. Syrian, Egyptian) commanders, a cadre of Nubian (Nile valley) soldiers, and many southern Sudanese coming from the multilingual population of that area: Bari, Dinka, Lotuxo, Moru-Madi, Azande. The list is long. In addition, seasonal traders (jallaaba) came particularly from the western Sudan. Unsurprisingly a lingua franca developed in this environment in the camps based on Sudanese and Egyptian Arabic. While the form of this “proto” Creole Arabic is not attested, in 1888 a significant event occurred.
The Mahdi expelled the British from the northern Sudan and threatened to capture the last Anglo-Egyptian bastion in the south, commanded by Edouard Schnitzer, better known as Emin Pasha. In the heat of the early colonial period of Africa, an Emin Pasha rescue mission was organized, commanded by Henry Stanley. Stanley found Emin having fled to northern Uganda and gave his troops the choice of returning with him to Egypt (via Zanzibar) or staying in Uganda. A large body of these troops chose to stay in Uganda, where their language, today called Nubi, became a new native tongue. The only Arabic creole was born.⁵¹ Today it is spoken in Uganda (Wellens 2005), Kenya, and in the southern Sudan where it is known as Juba Arabic (Manfredi 2017). With this basic background, the crucial linguistic questions are (1) how similar Nubi (or Juba Arabic) is to Arabic and (2) what its historical relation to Arabic is. The first question is easy to answer. Structurally, it is not at all similar. This issue has been treated at length (e.g. Owens 2014b), so here it suffices to cite a basic difference. Whereas Arabic has a complex verbal system morphologically distinguishing person, number, and gender via segmental affixes, Nubi has lost all of these. The sample Arabic verb paradigms in 10.1–10.3 can be compared to the Nubi in (5.71).

(5.71)

Nubi verb, singular
     Imperfect        Perfect
1    ana gi-katifu    ana katifu
2    ita gi-katifu    ita katifu
3    uwo gi-katifu    uwo katifu

Nubi does not distinguish gender as a morphological category, nor does it distinguish number or person in verbs as a morphological category. It distinguishes person periphrastically via independent pronouns (see discussion in 11.5.4.3) and imperfect from perfect via the prefix gi-. While connections can be reconstructed in the development from a Sudanese Arabic to Nubi (Owens 2014b), the only area where it does show fairly complete overlap is in the lexicon. Following Thomason and Kaufman (1991: 314), Nubi does not derive from Arabic via a normal historical linguistic derivation. Nubi (creole Arabic) derives from Arabic, but it is not Arabic.⁵² In contrast to the previous cases discussed in this chapter, Nubi is not a non-contiguous development of some closely related dialect or Arabic variety but rather is a sui generis innovation. For a structural definition of the history of Arabic, Nubi does provide an interesting comparative insight for the issue of stability in language, as will be taken up later in 11.5.4.3.⁵³
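The regularity of (5.71) can be made concrete with a minimal sketch in Python. It is an illustration only, not an analysis of Nubi: the function names and the dictionary layout are assumptions introduced here, and only the three singular pronouns, the stem katifu, and the prefix gi- are taken from the paradigm itself.

```python
# Illustrative sketch of the Nubi singular paradigm in (5.71).
# Names and data layout are hypothetical; the forms come from the paradigm.

# Person is expressed only by an independent pronoun, never on the verb.
PRONOUNS = {1: "ana", 2: "ita", 3: "uwo"}

# The single verbal prefix in the paradigm marks imperfect vs. perfect.
IMPERFECT_PREFIX = "gi-"


def nubi_verb(person: int, stem: str, imperfect: bool = False) -> str:
    """Return pronoun + invariant verb; only aspect changes the verb form."""
    verb = IMPERFECT_PREFIX + stem if imperfect else stem
    return f"{PRONOUNS[person]} {verb}"


def paradigm(stem: str) -> dict:
    """Map each person to its (imperfect, perfect) pair."""
    return {
        person: (nubi_verb(person, stem, True), nubi_verb(person, stem, False))
        for person in PRONOUNS
    }


if __name__ == "__main__":
    for person, (imperfect, perfect) in paradigm("katifu").items():
        print(person, imperfect, perfect, sep="\t")
```

The point of the sketch is that the verb form is computed without reference to person at all, which is what sets the Nubi paradigm apart from the Arabic paradigms in 10.1–10.3.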

⁵¹ For its offshoot Turku in Chad, see Tosco and Owens 1993.
⁵² I do not think that Nubi provides any sort of missing link, structural, sociolinguistic, or otherwise, between Classical Arabic and Arabic dialects, as Versteegh (1984) once proposed. For related discussion of McWhorter 2007 to this point, see 10.2.1.
⁵³ For a somewhat murkier instance of a variety which clearly tends toward mixed language status, see Owens 2018a on Uzbekistan Arabic.


5.6 Exogenous discontinuity

Before leaving this chapter it is relevant to recall that non-contiguous features can result from in-migration as well as migration. A good example is Holes (2006: 28–30, 2019: 67–68), who argues that at one time the entire Persian Gulf coast was populated by speakers of a dialect akin to the Baħarna of Baħrain today. He terms these sedentary dialects. In the course of time (some time after AD 1000) parts of the coast were settled by speakers of a dialect akin to the ʕArab of Baħrain today, termed “Bedouin” or “Arab” by Holes. With this influx, a previously unbroken line of Baħarna-like dialects was interspersed with ʕArab-like dialects, creating what is now a discontinuous set of Baħarna-like dialects. In this case the non-contiguousness was imported to the Persian Gulf coast by migrations from outside it. It is not impossible that in some cases such “exogenous discontinuity” was one aspect of the test features discussed in Chapter 5. In none of the individual case studies, however, was exogenous migration the sole cause of the non-contiguous features, and to the extent that it plays a role, it is on a very local geographical basis.

5.7 Summary

I have spent a considerable amount of space illustrating what I think is a fairly obvious point, namely that one of the epistemological bases of interpreting Arabic language history is the recognition that multiple occurrences of one and the same phenomenon in geographically discontinuous regions are evidence for a common origin: the feature either is an original Semitic inheritance, or is an Arabic innovation, and it spread with the populations who used it. In general these are instances of splits, with the original value being maintained in some speech communities. All of this may appear to be overkill to historical linguists who have not invested their careers in Arabic, but in fact it contradicts a long tradition in Semitic and Arabic studies which more often than not explains away such commonalities through independent parallel development.

6 Four issues in Arabic historical linguistics

While historical linguistics is based on widely agreed upon principles, one thrust of the current book is that how these principles are interpreted and applied will depend to a large degree on the specifics of the language under study. These include: the intellectual tradition which the language has been embedded in; more recent approaches to the study of the language; as well as the nature of the language itself, in particular as elucidated in Chapters 5 and 7–11. This chapter elaborates on four essential aspects of historical linguistics as they impinge on the interpretation of Arabic. Critical stances will be developed vis-à-vis the older Semiticist/Arabicist traditions with special attention given to the methodological challenge of interpreting underspecified sources (6.1), to the more recent application of grammaticalization theory to Arabic (6.2), while 6.3 lays out the basic premises of historical linguistics as a retrospective undertaking. This leads to the multifaceted relevance of speech community as a concept legitimizing and helping to elucidate the specifics of Arabic (6.4–6.8). At places in this chapter I recapitulate material which has already been introduced.

6.1 Reconstruction and the Semiticist/Arabicist tradition

The comparative reconstructive method of historical linguistics and the principle of uniformity are two powerful concepts for understanding how languages change and, as will be seen, remain stable. The reconstructive approach, however, runs against much practice in the Semiticist and Arabicist tradition. It is not surprising, given how much effort is invested in accumulating and systematizing the sources discussed in Chapters 1–4, that old written sources (epigraphic and papyrological sources were summarized here) might be accorded a special status. From a comparative perspective, however, there is a danger of according them too much independent, stand-alone status, for three reasons.

(6.1) Old sources:
1. may be underspecified, formally and pragmatically
2. may offer an incomplete picture of what “A” might have been relative to “B”
3. need to be correlated with linguistic events which happened after them

Arabic and the Case against Linearity in Historical Linguistics. Jonathan Owens, Oxford University Press. © Jonathan Owens (2023). DOI: 10.1093/oso/9780192867513.003.0006


The first point has been discussed and illustrated at length in Chapters 3–4. The major problem which often limits the effectiveness of this data source is that the linguistic detail of the earlier sources may be of limited scope as primary data and needs “help” from other sources. To claim that a */k/ in the epigraphic record as 2MSG should be interpreted as *ka immediately takes the form out of the realm of primary data (no [a] or {a} is found in it) and into the realm of reconstruction (see 4.1.3.1). A constant danger is that forms postulated within an imperfect epigraphic or papyrological record are adduced as independent support for a reconstruction, which they cannot be. This produces the following postulate.

P5 Underspecification and reconstruction. Interpretations of underspecified forms may be the target of reconstruction, but the results cannot be used as evidence for an antecedent state.

What one can do, of course, as done in 4.1.4 Table 4.7 above, is to recognize that a form such as Safaitic {–k} signals 2SG object pronoun undifferentiated for gender and integrate this into the larger reconstruction of suffix pronominal forms. Safaitic serves as confirmatory support. The epigraphic record and reconstruction are mutually supportive, but in this case the fine details of the form are provided by reconstruction (see Owens 2006 [2009]: 256 for reconstruction of ∗ -ka/∗ ki). The second caveat introduces a new critical perspective and that is “incompleteness.” Structures, morphemes, may have existed in Old Arabic for which there is no record in Old Arabic itself. The Old Arabic record is incomplete. Incompleteness is distinguished from underspecification in that it simply recognizes that even under the best of circumstances, not everything which one wishes to have will be found in Old Arabic. Sibawaih, as was made clear above, is not only not underspecified; he provides often unimaginable detail about many aspects of Arabic. However, his record is incomplete. He wrote at length about variant forms (see summary in Owens 2019a) in a way which can be understood in terms of sociolinguistics and/or dialectology if thinking in contemporary terms. However, his observations on variation had their limits. He talks about Hijazi forms and Asad forms and a whole range of socio-dialectologically relevant entities, but going through the Kitaab one would be hard-pressed to define an Old Arabic dialectology. Probably what he did was to pick out salient features associated with groups of speakers/regions/tribes who happened to be prominent in late eighth-century Basra where he lived and worked and introduce them at those places in the Kitaab where they add to our knowledge of the construction in question. 
His information is tantalizing, but incomplete, because having mentioned a special feature among one group, one needs to know, for the sake of comprehensiveness, what the reflex was elsewhere. Let me emphasize very clearly here (see Carter 1994 vs. Owens 1995) that this is not a criticism of Sibawaih. It is not difficult to understand that Sibawaihi’s brief was not to invent and systematize sociolinguistics, to invent


and systematize Arabic dialectology. From the perspective of the current task of reconstruction, however, the limits to what can be found in Sibawaih need to be recognized. This point is all the more cogent since, as emphasized in Chapter 4, scholars rely to a far greater degree on Sibawaih to interpret the epigraphic and papyrological record than is often given cognizance. To take the concrete example of the intrusive morpheme ∗ –n- argued here to be part of proto-Arabic, this morpheme in this function is mentioned nowhere in Sibawaih. This, however, does not mean that it did not exist in Sibawaihi’s day. It simply means he did not record it, at least as a direct cognate. His record is incomplete. If it is incomplete, if the Old Arabic sources in general are incomplete, they can still be complemented by other methods discussed in this book. The third criticism is that Arabic did not stop with Old Arabic. Arabic is alive and well today, and the current view is a direct descendant of most of what is found in Old Arabic. Accordingly, too high a status to Old Arabic alone risks losing sight of the fact that Arabic continues. There are two major criticisms in this section, one explicit and one implicit. Explicitly it simply states the obvious, that interpretations of underspecified and incomplete evidentiary sources need to be justified using the comparative method itself. Implicit in these criticisms is the assumption that historical linguistics should at least acknowledge the linkage between a formal rule and its instantiation in a society where it is postulated. This is a perspective developed in greater detail in 6.4–6.7 below. To end this section and anticipating the discussion in 6.3, ambiguous though the pre- and early Islamic epigraphic and papyrological record is, nothing that it says blatantly contradicts the results of the classic comparative method or what we know from the Classical Arabic grammatical tradition. 
In this sense it conforms to and confirms Labov’s uniformity principle (see 6.3 below) in saying that so far back as Arabic sources are known, they attest to a variety similar to Arabic as known up to hundreds of years later.

6.2 Grammaticalization theory and historical linguistics

Turning away from older, traditional issues, it is relevant to consider more recent developments in historical linguistics. I look at two here, beginning with the role of grammaticalization¹ in Arabic historical linguistics. Here the perspective is that the rich detail and relative diachronic depth which a language like Arabic can offer should be of central interest to testing grammaticalization theory.

¹ Grammaticalization theory is hardly new, in the western tradition going back at least to the French linguist Meillet. What is new is the attempt to work out systematic principles on a broad comparative basis which extend beyond the loss of independence between collocating morphemes.


In this context it is interesting to look at the role of grammaticalization in understanding the development of *b-, discussed at length in 5.3.2.6. While the data does not demand a treatment in terms of grammaticalization, grammaticalization processes have been implicitly adduced in a number of places, and ostensibly the data described here looks tailor-made to fit into a grammaticalization framework. More importantly in this context, if grammaticalization exists as an independent structural force of some kind, as some grammaticalization theorists at least seem to say it is (in particular Heine, Heine and Kuteva 2003, 2011), one might expect to see it repeated in the different dialectal speech communities examined here, or alternatively, one might expect predictions to be made against universal grammaticalization trajectories which tell where *b- is headed in a given dialect. Many instances of grammaticalization as described in the literature can be found in the current data. Properties of grammaticalization which can be discerned in the current data include phonological shortening, morphologization and limitation of a morpheme to a fixed position (Verhoeven 2008: 2), and an increase in obligatoriness (Heine and Reh 1984: 11).² The shift from a verb ‘want’ to future/irrealis/volitive marker is well described and relevant to Arabic (Bybee et al. 1991, see (5.47)). If (5.47) represents primary grammaticalization, the developments which come thereafter exemplify so-called “secondary grammaticalization” (van der Auwera 2002: 24; Kranich 2010). The interest here, however, is not to associate specific individual phenomena in the current data with comparable cases or generalizations over a number of cases in other languages. Rather, it can be asked to what extent grammaticalization theory steps in with a degree of predictability about which changes have occurred and why, generalizing across all the individual speech communities discussed here.
The entry point, the postulated ∗ yibi ‘want’ > ∗ b- ‘future, irrealis, non-evidential’ starts well enough (see (5.47)). Individual lexemes do become grammaticalized in ways that reflect their lexical origins. However, even here, in the Arabic context, one wants to know why this happened in some varieties, but not in others. While the unavailability of yibi in many dialects might explain some of the difference, many varieties have reflexes of ∗ yibi but no b-, as in the summaries in (5.51, 5.53). The change happened at some point, and once it stopped happening yibi and bi- either continued to co-exist, or, reflexes of yibi alone continued without developing into b-. Equally, varieties where b- developed continued on without yibi. Once the new morpheme is in the language the phase of secondary grammaticalization begins. Bybee et al. (1991) have a specific set of predictions for how reflexes of ‘want,’ yibi, should develop: first a modal force ‘desire, obligation,’ then

² By way of orientation, in Bybee et al.’s fusion, dependence, and shortness indices (1991: 34–39), b- places well toward the more grammaticalized end of the scale in each (respectively, ‘4’, ‘5’, and either ‘7’ or ‘8’).


‘intention, immediate future’ then future, and finally ‘probability, possibility.’ In Persson’s (2008) exposition, b- in G/N arguably reaches at least the third stage, though the status of the two anterior stages is difficult to gauge. To say a first step is ‘desire, obligation’ is simply to state the meaning of the basic lexical entry of baɣa ‘want, desire.’ There is no clear distinction in the data between an immediate future and a future, while “possibility” is not a value in Gulf/Najdi. A stage by stage grammaticalization process is not obvious. Thereafter in its further development outside of G/N *b- makes a complete about-face. *b- shows a dramatic shift in L/E for instance, becoming a marker of indicativeness, of here and now or habitual evidentiality. The modal values in L/E by contrast are assumed by the Ø imperfect. Once the primary grammaticalization occurred, the fate of *b- within the six relevant dialectal communities is varied. In Ṣanʕaani, b- vs. Ø develops a well-profiled contrast only in part, though evidential b- vs. subjunctive Ø is evident (App. 5.3.2.6.1). In LCA, in the b-V context and in L/E generally, and probably in Uzbekistan Arabic as well, b- vs. Ø has a broad indicative vs. subjunctive contrast. However, in LCA only deontic modality is marked by Ø, whereas in L/E Ø marks both deontic and epistemic modality. Apart therefore from providing an insight into the primary grammaticalization of *yibi > *b-, grammaticalization theory generally fails to elucidate the global developments of Arabic *b-. Most importantly, it provides no link as to how the original non-evidential b- of G/N changed dramatically to the evidential indicativeness of the other varieties. Indeed, taken at face value, Bybee et al.’s (1991) scale might be interpreted as saying that such a shift should not occur. It is for this reason that considerable detail was expended describing the LCA b-C context in 5.3.2.6.2–5.3.2.6.4.
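The face-value reading of Bybee et al.’s scale mentioned above can be sketched as a simple ordered list. This is a hedged illustration in Python, assuming (as the text itself only cautiously suggests) that the scale is read as strictly unidirectional. The four stage labels are those cited above; the function name and the label "indicative, evidential" are hypothetical conveniences introduced for the sketch.

```python
# Hypothetical encoding of the 'want'-future scale as cited from
# Bybee et al. (1991); a strictly unidirectional reading is assumed.
WANT_SCALE = [
    "desire, obligation",
    "intention, immediate future",
    "future",
    "probability, possibility",
]


def predicted_by_scale(source: str, target: str) -> bool:
    """True only if both values are on the scale and target is not earlier."""
    if source not in WANT_SCALE or target not in WANT_SCALE:
        return False
    return WANT_SCALE.index(source) <= WANT_SCALE.index(target)


if __name__ == "__main__":
    # A shift along the scale is predicted.
    print(predicted_by_scale("desire, obligation", "future"))
    # The shift to indicativeness has no place on the scale at all.
    print(predicted_by_scale("future", "indicative, evidential"))
```

On this reading the shift from the non-evidential b- of G/N to an indicative b- is simply not an available move, which is why an account of it has to be sought outside the scale.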
What is argued is that the shift to indicativeness followed from the happy conjunction of two factors. b- shifted from being a marker relating a proposition to the external world to one marking a relation between adjacent, discourse-immanent propositions. This then generalized into the unmarked value of indicative. The regularization of this shift was facilitated by the phonological ‘gap’ which existed in V-initial imperfect personal verbs. The regularization of b- had a phonological as well as a pragmatic motivation, the phonological moreover entailing morphophonological extension, not shortening. However, if grammaticalization theory does not elucidate beyond the primary grammaticalization stage in the data presented here,³ it still can be seen to have

³ The findings here complement Poplack (2011), who shows that the grammaticalization of ‘go’ futures in three related Romance languages, Portuguese, French, and Spanish, is sensitive to quite different conditioning factors in each. A recent special edition of Language Sciences devoted to secondary grammaticalization (Breban and Kranich 2015) fails to find a unified definition of the phenomenon, and some contributors (Bisang 2015) even question its relevance. Hoenigswald (1977) would a priori argue against the idea of expecting a predictable trajectory from primary to secondary grammaticalization.


two important functions. First, grammaticalization provides a convenient and graphic metaphor allowing the parametricized values to be aligned in a fashion which encompasses all varieties where *b- reflexes occur. This is shown in the different stages as represented on a tree in App. 5.3.2.6.4. While the diagram is termed a grammaticalization diagram, it is necessarily epiphenomenal. Aside from the issue of whether grammaticalization itself is epiphenomenal, App. 5.3.2.6.4 is epiphenomenal because it fictionalizes ‘Arabic’ into a single language community. The diagram is the analyst’s post hoc assessment of developments, which has no predictive value, if only because the variants are ensconced in individual speech communities (see 6.6). Aside from the possibility of dialect contact, it is difficult to see what language-internal forces should push contemporary stage 1 G/N b- to stage 3 b- indicative, for instance. Still, summarizing the developments under the blanket concept of ‘grammaticalization’ allows a complex linguistic history that stretches back at least over 1,400 years and over a large part of the earth’s surface to be concisely compressed onto one diagram. It allows one to talk about what are in some sense related phenomena within a context of ongoing variation, change, and stability. The second function of grammaticalization is as a metafilter. The reconstruction developed here begins with *yibi ‘want’ > *bi- ‘non-evidential, future,’ a change argued to be still in place in G/N. Logically and counterfactually for illustrative purposes, it could be argued (and indeed, as seen briefly in 5.3.2.6.3, it has been in different forms) that the original change was *yibi > *b- ‘future/volitional’ (rather than > evidential). In this respect Bybee et al.’s study of the trajectory of ‘want futures’ provides corroborating evidence for the argumentation followed in 5.3.2.7.
Even if it doesn’t provide a detailed roadmap, grammaticalization theory used circumspectly can contribute to understanding complex historical linguistic developments. I would add here that comparative data bases of whatever sort are important and helpful, for instance the UPSID as employed in 3.2.2 to help interpret Sibawaihi’s “palatal” terminology. Equally, however, it can obfuscate historical interpretation. The danger was illustrated in detail in the discussion of the status of Levantine badd/bidd- ‘want’ as the source for indicative b- in 5.3.2.6. Grammaticalization trajectories can be assumed in check-list like fashion, obviating the need for a normal consideration of historical linguistic developments. Badd/bidd means ‘want,’ ‘want’ has been shown in grammaticalization theory to be a source for ‘future’ as marked in affixes, badd/bidd has a phonetic similarity to b-. Therefore it is the source of b-. An analogous argument applied to Fali demonstratives in 9.5.4 is criticized in ch.9 Table 9.10 note b. Indeed, an advocate of grammaticalization theory might even say that the proposed unitary account of the development of ∗ b- ultimately from one historical linguistic event traceable to the dawn of the Islamic era is nullified by grammaticalization theory, which has no place for a shift from future → evidential,


such as advocated here (5.3.2.6.4). The “dual origin” source of *b- such as advocated in different ways by Retsö and Leddy-Cecere is preferable. It is good to consider this perspective, since it illustrates what happens when grammaticalization theory meets a comprehensive historical linguistics. What such a grammaticalization perspective still needs to deal with is, non-exhaustively, the following:

• The solution proposed by Leddy-Cecere and Retsö runs afoul of Lass’ principle, 5.2.2, against parallel independent development.
• All varieties with b- are Arabic, whose speakers are known to have spread throughout much of the Middle East and Africa. The I/I + D model undeniably works in many cases (see Chapter 5). There is no fundamental reason it shouldn’t work here.
• A crucial argument depends on the analogical spread of b- to a V-initial slot (5.3.2.6.2), where grammaticalization theory has nothing of inherent interest to say.⁴
• All varieties with b- have b ~ m allomorphy in the 1PL imperfect.

To these one would add the still unresolved issues of explaining within the “two morpheme” origin how bidd/badd ‘want’ serves as a source for Cairene b- and LCA b-, neither of which has the assumed source morpheme, as explained in 5.3.2.7. In short then, grammaticalization theory should be a part of Arabic historical linguistics, but it does not substitute for it.

⁴ I am assuming. Adding analogy to the war chest of grammaticalization theory would be tantamount to declaring normal historical linguistics and grammaticalization theory to be one and the same.

6.3 Historical linguistics, reconstruction

Having cautioned about problems relating to the many underspecified sources available for interpreting Arabic history, and about both the problems and the felicities of invoking grammaticalization theory, I turn first to two core precepts of historical linguistics. The first of these, the comparative method, is constitutive of historical linguistics itself. The second, an emergent perspective closely associated with sociolinguistic methodology, seeks to instrumentalize the idea of speech community for understanding language change. It is a truism, but one worth repeating, that historical linguistics is inherently backward looking and backward moving. If A is chronologically older than B, in principle reconstruction goes from B toward A. It is easy to see the logic in this. Historical linguistics is based on the comparative method which allows the reconstruction of postulated antecedent states on the basis of what is attested in subsequent states. Lass (1993: 157) expresses one promise of the comparative method: “To reconstruct is to reverse time and to make the products of that reversal accessible.” This obviously is an idealization. Strictly speaking, it is not actually time that is reversed. Rather our assumption that we can understand subsequent states as the result of linguistic processes which occurred in past linguistic states gives the appearance of ushering us into the past. Underlying this is the basic postulate better known as Labov’s principle of uniformity which basically states that past states replicate what we can observe in today’s language (Labov 1972b, 1994: 20–24; Lass 1993: 164–170, 1997: 28, 273). This has wide ramifications, but the basic logic is impeccable. If all languages today distinguish discourse situations in terms (minimally) of speaker, addressee, and non-speech-act participant (in the Arabic tradition, al-mutakallim, al-muxaaṭab, al-ɣaaʔib), then we do not expect to reconstruct a language with only two of these types of participants. Of course it is an intuitive judgment as to what the principle disallows, and clearly to the extent that it needs to be invoked on any given instance, this needs to be done with care. For instance, as Walkden (2019: 4) notes, Dixon’s dichotomization into eras of equilibrium and punctuation (7.1) acknowledges that languages can change rapidly (cf. 5.5) but equally are marked by long periods when they do not (Chapter 11 below). Change and no change are equally aspects of uniformity. Linguistic uniformity refers to uniformity of form, of language as an abstract system. In the light of now 60 years of sociolinguistics, it would appear desirable to extend the idea to uniformity of social structure associated with the language.
To this end, in an interesting article Bergs (2012) embeds the problem of anachronism in the uniformity principle. An anachronism is an object, institution, concept, manner of behaving which is out of synch with the period that is attributed to it. For instance, contemporary sociolinguistics was born of the concept of social class (Labov 1966). It would be anachronistic, however, to assume that the modern concept of social class is applicable to societies in the past. Bergs (2012: 88) suggests that corpus-based studies of fifteenth- and sixteenth-century English do give evidence for socially based differences, but that it would be naïve to assume that these mimic social class as we know it today. This provides a realistic orientation for how, generally, contemporary sociolinguistics can be instrumentalized for historical linguistics. The relevance of concepts from contemporary sociolinguistics (class, gender, network) can be assumed as basic heuristics, without necessarily expecting precise replication. Once the interpretation of subsequent states is embedded in a larger comparative framework, in the best case these can be aligned in a more global interpretation of antecedent events, which does include a link to actual populations. It is this aspect of change, the invocation of actual speech communities, postulated populations of speakers, that I will develop in the rest of this chapter.

6.3 HISTORICAL LINGUISTICS, RECONSTRUCTION


To get to a correlation between classical linguistic reconstruction, sociolinguistic constructs, and actual populations I begin with the linguistic example introduced in 5.1. In 5.1 the WSA development which “allomorphized” the distribution of the –t suffix ‘1/2MSG’ (see (5.13)) requires that the input to the development was (simplifying slightly) “∗C-t#.” The morpheme needed to occur after a C- and be word final. As will be discussed at some length in 12.3.3, antecedent states to WSA might include varieties with the following forms:

(6.2) -tu (katab-tu ‘I wrote’)
      -ta (katab-ta ‘you M wrote’)
      -ti (katab-ti ‘you.F wrote’)

These in all likelihood⁵ would not have been the immediate antecedent state to the WSA rule since in (6.2) the 1/2MSG forms have final vowels. This observation dovetails with what we know of the WSA development. The development requires a variety with a final –t which neutralized the 1/2MSG contrast and this is exactly what one finds in all varieties of (inter alia) Egyptian Arabic.

(6.3) katab-t ‘I/you.MSG wrote’
      katab-ti ‘you.F wrote’

We know that ancestral WSA diffused from Upper Egypt, hence providing the necessary input to (5.9), repeated here as (6.4).

(6.4) ∗-t → Ø / C_#   # ≠ following al-DO

This discussion illustrates two classic aspects of reconstruction. First, any reconstruction needs to be linguistically plausible. Plausibility includes not only the basic observation that a change has occurred, but also, circumstances allowing, that a linguistic explanation for the change be identified. In the case of (6.4), sonority was the operative principle (see (5.10) and 10.2 (10.8)–(10.10)). Secondly, it shows that the change can be aligned sequentially against other changes in the language. In this case, (6.4) needs to follow the loss of the vocalic contrast –tu/-ta. The development to (6.4) as a series of linguistic changes can be informally represented as in Figure 6.1. The basic justification for Figure 6.1 is found in 12.3.3, though it will be noted that alternative interpretations are possible.

Figure 6.1 Development of WSA –t ~ Ø allomorphy [tree diagram; node labels: ∗-tu, -ta, -ti (?); -tu, -ta, -ti; -tu, -t, -ti; -t, -ti; -t, -ti (Egypt); …; -t ~ Ø (WSA)]

⁵ The caveat needs to be added since all varieties of Arabic presumably have alternations of the type katab-tu ~ katab-t ‘I wrote’ based on pausal varieties (see 12.3.3).

For now it suffices to note that the
marking of the 1 and 2SG perfect verb proceeded non-linearly, splitting into a number of paradigmatic alternatives. One of these saw the neutralization of the 1SG and 2MSG, as in Figure 6.1, and this provided the input into (ancestral) WSA (6.4). This innovation is identified with Egypt in Figure 6.1, though its full geographical extent is much broader, an issue treated only tangentially in this book.

Having defined a basic sequence of linguistic events, one can think of correlating these developments with actual migrations, actual populations (see Owens 2005b). The historical background will be summarized in Chapter 8, but here it can be mentioned that (6.4) needed to be in place sometime between 700 and 1220–1390.⁶ 700, the earliest date, is when Arabs moved in large numbers into Egypt. 1390 is the latest date by which they are attested in the WSA area. It could be that (6.4) had already occurred in the ancestral WSA in Egyptian Arabic, and in fact the innovation discussed in ch. 5 n. 2 (and (10.20) in 10.2) provides one indexical support for this assumption. In any case, the linguistic developments correlate broadly with the slow, demographic drift of Arabs into the western Sudanic region, today politically western Sudan, Chad, Cameroon, and NE Nigeria. It may be assumed that these represented a nomadic culture much as can still be found in the region today (see Braukämper 1993), in which case it would have been based on small social units, probably family-based, as will be discussed further in 6.5. Strictly speaking, showing this empirically requires having much better comparative sociolinguistic data than we have at our disposal. Nonetheless at some point a link needs to be drawn between generalizing representations as in Figure 6.1 and actual populations. The operative concept which is axiomatically assumed not to be anachronistic in Bergs’ sense is that of speech community.
⁶ The range 1220–1390 follows from two known historical developments. We know from letters written by the Mai of Kanem to the Sultan of Mameluke Egypt that Arabs had reached the Lake Chad area by 1390. The large-scale movement into the Sudanic region began about 1220. I use 700 as a terminus a quo, rather than 640, the date of Arabic–Islamic hegemony in Egypt, because nothing like (6.4) is attested elsewhere in the Arabic world. Had (6.4) been in place at an early date, one might have expected, for instance, that it could have been carried with the invasion of North Africa and Andalusia. One suspects, rather, that the innovation occurred somewhat later.

Speech community is not anachronistic in principle, and it is probably not anachronistic in the narrower context of western Sudanic Arabic society,
as will be illustrated in 6.5 below. This idea is hardly controversial in and of itself. As seen in 5.2.2 and will be exemplified further in this section, speech community is a standard element in historical sociolinguistic thinking. What needs to be elaborated on is how the idea of speech community can be instrumentalized to help understand communities of language which, at best, can only be reconstructed, such as at times is the case with Arabic. Fortunately in the case of Arabic there are speech communities which do allow direct observation and by extrapolation allow generalization to past states of the language. In this and the next three sections in this chapter I develop this point. To say that language change occurs in speech communities is a tautology. It is nonetheless appropriate to examine the construct for three reasons. The first (6.4) is the methodological problem of instrumentalizing the role of the speech community in explaining language maintenance and change. The second concentrates on the specific historical linguistics of Arabic. As soon as one looks beyond written records, and beyond Classical Arabic as it was normalized by the fourth/tenth century, discerning a linear trend in the history of Arabic is, to say the least, a challenge. A traditional answer to this was the overly simplified dichotomization into Old and Neo-Arabic, which as shown in 1.1 is a dichotomy which does not withstand comparative linguistic scrutiny. This book explores other answers and a part of appreciating these answers entails remembering that within one language there can exist seemingly contradictory forms, structures, and sub-systems. Looked at purely structurally these contrasts may inhibit the practice of straightforward historical linguistic reconstruction. The idea of a speech community allows us to assign these contradictions to postulated populations within which the linguistic “logic” can be preserved and passed on, or changed (sections 6.5–6.7). 
The third is the special status of Arabic diglossia (6.8) representing yet another methodological twist in ascertaining Arabic language history. I use “speech community” as a helpful heuristic. There is no universal definition of speech community. It is generally applied to smaller populations marked by similar linguistic usage, shared norms of speaking and having similar attitudes. It is, however, an extensible concept. Because the idea of a speech community is strategically important for understanding Arabic language history,⁷ in a way it is not, for instance, for Icelandic (see 12.2), I will discuss the concept from the perspective of two case studies in the next two sections.

⁷ Strategically in the sense of according to all evidential sources an equal status for purposes of historical linguistics, a stance particularly important for Arabic, as has been and will further be emphasized. Of course, an historical account of Icelandic requires a careful sociolinguistic treatment.


6.4 The speech community and the scope of change: How does it help?

To begin with, the assumption of a speech community entails explaining linguistic change as a step from one concrete system to another. This accords with the uniformity principle which is a basic axiom allowing us to project language history backward (see above). Secondly, the concept of speech community is appropriate because, obviously, that is where change occurs. For Labov (2007: 347) a speech community should have “well-defined limits, a common structural base, and a unified set of sociolinguistic norms.” Effectively, however, speech communities are defined by the scope of the sociolinguistic processes which take place within them and on this basis speech communities can be very large, or very small, depending on what is being studied. In Labov (2007) the speech communities are large. Two short vowel systems in American English are contrasted, one the so-called Northern Cities Shift (NCS, see discussion at end of 5.3.2.4), the other the New York City short /a/ system. The NCS is postulated to have emerged in a koineization process engendered by a multi-dialectal influx into upstate New York set in motion by the building of the Erie Canal (Labov 2007: 372). Subsequent migration took the system west. The more constrained NYC short /a/ system, on the other hand, was the result of smaller, more coherent migrations historically tied to New Jersey and NYC (Labov 2007: 362, 367). In reconstructed terms two types of speech community⁸ resulted in different vowel systems. The speech communities here are fairly large, but in Labov’s analysis they have developed incrementally over a period of up to 200+ years. In Labov (2007) incrementation is inferred, essentially using the classic comparative method, triangulating between NYC and its successor populations toward the west and south.
At best we can synchronically observe change working its way through a speech community in no more than four generations.⁹ Arguably the best witness to incrementation as a diachronic process is found in historical documents. This is unfortunately severely limiting in the range of languages which can be examined in these terms, but one example is appropriate.

⁸ Which is not closely characterized by Labov. When they are, the spread of the community norm will recall their establishment based on a founding effect (Mufwene 1996). New Orleans, Cincinnati, Plainfield New Jersey are all interpretable as a changed variety of the NYC short /a/ system.

⁹ Unless of course one has the luxury of trend studies.

Kroch (1989) was concerned with the question whether linguistic changes set in simultaneously in all contexts where a particular form occurs, or whether they start in highly frequent contexts and spread to less-frequent ones. English has the luxury of allowing examination of this question with a fairly large written corpus, which in this case begins in 1350 in the Middle English period. Kroch examines the increasing frequency of the verb do during the period 1350–1700, using it to question the idea
that the progression of a change varies in the different linguistic contexts it occurs in, with some contexts favoring a more robust change (Bailey 1973). As a counterexplanation, Kroch proposed the “constant rate hypothesis” under which a given feature changes at the same rate in all the linguistic contexts which can be discretely identified for it. Briefly, Kroch defined five different do contexts. (6.5) illustrates an adverbial transitive question and (6.6) a negative declarative. Other contexts are negative questions, affirmative intransitive adverbials and yes-no questions, and affirmative wh-object questions.

(6.5) Where doth the grene knyght holde hym (Kroch 1989: 216)

(6.6) bycause the nobylyte ther commynly dothe not exercyse them

In addition, do appears in simple affirmative context.

(6.7) Me thinke I doe heare a good manerly Beggar (Kroch 1989: 217)

Kroch argues that the best model for understanding the expansion of do is a global increase of the frequency of do in all five contexts simultaneously. What is of interest here, however, is that the textual material graphically shows that the change eventually encompasses the entire English language (of that day) via a series of incremental changes in the frequency of do from 1400 to 1700, as in Table 6.1 (Kroch 1989: 224). There is some fluctuation particularly with negatives between periods 7–10 (1550–1650), which Kroch associates with a general re-categorization of the verbal system in which an inherited ability of all finite verbs to occur in negative and question contexts, raising to Infl in a generativist representation, is lost. This resulted in (6.8).

(6.8) Whiche he perceiueth not

Instead, a new category, auxiliaries including do, gradually monopolizes the ability to take finite form in questions and negatives (Table 6.1). Kroch’s study shows that large though it is, an entire language can be conceived of as a speech community which change eventually works its way through.

Table 6.1 Periphrastic do (Kroch 1989: 224)

Period  Date        Negative        Negative        Aff. Trans.     Aff. Intrans.   Aff. Wh-Object
                    Declaratives    Questions       Adv. & Yes/No   Adv. & Yes/No   Questions
                                                    Questions       Questions
                     % do  Total     % do  Total     % do  Total     % do  Total     % do  Total
1       1400–1425       0    177     11.7     17        0      3        0      7        0      1
2       1425–1475     1.2    903      8.0     25     10.7     56        0     86        0     27
3       1475–1500     4.8    693     11.1     27     13.5     74        0     68      2.0     51
4       1500–1525     7.8    605     59.0     78     24.2     91     21.1     90     11.3     62
5       1525–1535    13.7    651     60.7     56     69.2     26     19.7     76      9.5     63
6       1535–1550    27.9    735     75.0     84     61.5     91     31.9    116     11.0     73
7       1550–1575    38.0    313     85.4     48     73.7     57     42.3     71     36.0     75
8       1575–1600    23.8    629     64.8    128     79.2    173     44.4    205     38.6    120
9       1600–1625    36.7    278     93.7     95     77.3    277     61.9    310     29.8    171
10      1625–1650    31.7    344     84.2     38     90.9     66     75.7     74     53.0     66
11      1650–1700    46.0    274     92.3     52     94.7     76     70.2    131     54.9     51

Speech community is probably more often instrumentalized to describe changes within smaller communities. A good example of this is Thelander (1982). In an early sociolinguistic study he documented the emergence within four generations of a stable koine in the north-central Swedish town of Burträsk. He identifies 12 different variables differentiating standard Swedish from the northern dialect, features such as SS (standard Swedish) aer ‘are’ vs. DS (dialect Swedish) vara. In the first generation, whose oldest members were born before 1930, the SS and DS intermingle in the texts at similar frequencies, any given variant between 30%
and 50%. By the fourth generation, those born after 1956, the SS and DS features had broken in two, the balanced distribution of tokens splitting into seven features that had become dominantly SS and five which had become dominantly DS. At the extremes, for instance, DS negative int was used over 95% of the time by fourth generation speakers, who equally used SS var ‘were’ almost 100% of the time (Thelander 1982: 72). What is striking (Thelander 1982: 72) is that three changes which until generation three were tending toward SS took a sharp turn in the fourth generation toward dialectal usage, so that fourth generation usage is essentially bimodal, either SS or DS for a given variable, to a far greater degree than any of the previous three generations were. In a number of studies a finer-grained delimitation of speech community provides a context for explaining the direction of changes. Dodsworth and Kohn (2012) describe what they interpret as a reversal of the southern vowel shift (SVS) in Raleigh, North Carolina. The southern vowel shift is marked by the tense vowels /iy/ and /ey/ becoming laxer and more central while the lax vowels /i/ and /e/ become tense and more fronted (Fridland 1998). Looking at three generations of speakers beginning in 1925, Dodsworth and Kohn show that beginning around 1960 the SVS stopped, and by the third generation, those born after 1979, the mean F1 and F2 formant values have all but moved outside of the original “southern range” of values marking the oldest generation (2012: 231, 234). The authors explain this change as due to a massive immigration from outside the south since 1960 and resulting residential patterns in which children of immigrants tend to mix with other immigrants. The overall context of this shift is the larger Raleigh speech community.
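Returning briefly to Table 6.1, the constant rate hypothesis can be made concrete with a small computation. The following Python sketch is illustrative only. It is not Kroch's procedure, which fits logistic curves to the data; here a simpler shortcut is assumed, namely an ordinary least-squares line through the logit-transformed percentages against the midpoint year of each period. The names `MIDPOINTS`, `PERCENT_DO`, `logit_slope`, and `SLOPES` are hypothetical, periods with 0% or 100% are skipped because the logit is undefined there, and only the two negative contexts are included. Under the hypothesis one would expect the resulting slopes to be of similar size across contexts.

```python
import math

# Midpoint year of each of the eleven periods in Table 6.1 (an assumption of
# this sketch: each period is represented by the middle of its date range).
MIDPOINTS = [1412.5, 1450, 1487.5, 1512.5, 1530, 1542.5,
             1562.5, 1587.5, 1612.5, 1637.5, 1675]

# Percent periphrastic do per period, copied from Table 6.1.
PERCENT_DO = {
    "negative declaratives": [0, 1.2, 4.8, 7.8, 13.7, 27.9, 38.0, 23.8, 36.7, 31.7, 46.0],
    "negative questions": [11.7, 8.0, 11.1, 59.0, 60.7, 75.0, 85.4, 64.8, 93.7, 84.2, 92.3],
}


def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logit_slope(years, percents):
    """Least-squares slope of logit(percent / 100) against year.

    Periods with 0% or 100% are dropped, since their logit is undefined.
    """
    points = [(y, logit(pc / 100.0)) for y, pc in zip(years, percents) if 0 < pc < 100]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# One slope (change in log-odds per year) for each context.
SLOPES = {context: logit_slope(MIDPOINTS, values) for context, values in PERCENT_DO.items()}

for context, slope in SLOPES.items():
    print(f"{context}: {slope:.4f} log-odds per year")
```

A slope is expressed in log-odds per year, so two contexts that start at very different frequencies can still be compared on the same scale, which is the point of the comparison.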


In short, speech community is explicitly or implicitly the domain of language variation in sociolinguistics. A third orientating function of speech community allows hypotheses to be posed about different aspects of the actuation problem. One that has been exemplified in a number of studies is that intra-community change proceeds within a fixed number of generations. Trudgill (2004) for instance argues on the basis of the formation of New Zealand English that a typical trajectory is three generations (also Ivars 2005: 1060–1063 on Kristinestad in Finland). While the Burträsk and the Raleigh study cited above (also Kerswill and Williams 2005: 1024, 1036) document a different generational time (four generations in the one case, two in the other), the operative social unit is a generationally characterized speech community in all cases.

6.5 A non-deterministic speech community

If the concept of speech community allows linguistic changes to be contextualized in a social environment, it doesn’t follow that the concept carries an inherent predictive value to define what sort of changes occur or under what conditions they emerge. The concept is a necessary element in posing the actuation question, but it does not solve it. The overwhelming majority of studies describing variation within a speech community do describe variation as leading to change. This is the case with each of the studies referenced in 6.4. Still, there are probably many cases where the speech community is usefully viewed as a forum for hosting competing forms which for various reasons never, at least so far as can be ascertained in the time horizons available, filter out the constituent variation.¹⁰ I discuss one such case here, which will be seen relevant to understanding Arabic language history.¹¹

¹⁰ To this, Naro et al. 1999.

¹¹ This discussion is based mainly on Owens 1998: chapter 4 (dialectology) and 12 (community variation).

What will be seen is that Arabic in Maiduguri located in NE Nigeria, an urban center of between 500,000 and 1,000,000 people, is characterized by a high degree of variation. Variation in speech communities as just illustrated is expected over the long term to lead to a decrease in variation and eventually to change. What will be documented for Maiduguri is an overall high degree of variation, which decreases not through time, but rather through social unit, according to the size of the speech community focussed on. To show this I will consider three data sets of decreasing size. Before starting, however, a brief overview of Maiduguri is in order. Maiduguri is today the largest city in NE Nigeria, though it became important only in 1906 when the British colonial government made it the administrative center for NE Nigeria. NE Nigeria represents the westernmost extension of Arabs belonging broadly to the WSA and LCA dialect area (see Chapter 8). Originally Arabs came to the area as cattle
herders, though even as herders they typically are associated with brethren who live as sedentary farmers.¹² In the wake of independence and a general rapid rise of urbanization in Nigeria, by the 1990s, when the current data was collected, Maiduguri represented the largest population of Arabs in Nigeria. However, Arabs, even if by NE Nigerian standards numerous, are a distinct ethnic minority relative to the dominant Kanuri, and they are a linguistic minority relative to the dominant Hausa lingua franca (see Owens 2005a for studies on Arabic-Hausa codeswitching). An important linguistic reflex of this status is that Arabic is not a language of public spaces.¹³

Five parameters

LCA, though small in population by comparison to Arabic in many countries, displays a significant dialectal differentiation. Five prominent binary contrasts define isoglosses (Table 6.2) which taken together characterize a differentiated dialect picture of LCA. LCA as a whole is characterized by a number of sharp variational boundaries, some of these marking single features only, others congealing into important dialect bundles.

Table 6.2 Five isoglosses

(a) Preformative vowel    /a/ ya-ktub             /i/ i-ktubᵃ
(b) CVCVC stress          initial CV́CVC bágar     final CVCV́C bagár
(c) a → e ∗ħ/ʕe           ∗ħ/ʕa                   ∗ħ/ʕe
(d) AP-n                  no                      yes
(e) FPL suffix pro        -hin                    -han

ᵃ Cited in subjunctive (non-b form).

The first isogloss is an isolated one, the only one that runs east–west, on a line that runs roughly just to the north of the city of Maiduguri and all the way to the east until the eastern dialect area, i.e. the eastern LCA dialect only has the /i/ variant. This is not, however, a part of the overall dialect bundle because the northern half of the western region has /i/ as well. The other four features form a rough bundle of isoglosses which separate western LCA from eastern (also termed Bagirmi). The fit is not perfect, as –han runs a little bit more westerly and northerly than AP-n, while ∗ħe is a bit more northerly than –han. AP-n and CV́CVC/CVCV́C are pretty much co-terminous. All in all, however, they are variables with a high frequency which define two well-profiled regions, the western and eastern (Bagirmi, see Map 6).

¹² Tragically the entire rural Arab population, numbers ranging in the hundreds of thousands, have until today been completely displaced to refugee camps and other regions of Nigeria and surrounding countries in the wake of the Boko Haram insurgency.

¹³ A number of group conversations, e.g. GR158, GR167 in the Bayreuth collection have excerpts in which children switch to Hausa while speaking.


Map 6 Western and eastern LCA dialects

Briefly, to locate these isoglosses in the larger picture of Arabic, i/a is a taltala feature (5.3.2.5). The penultimate or ultimate stress on CVCVC words is a fundamental isogloss which roughly speaking separates Arabic as a whole into eastern and western zones. CV́CVC is the eastern zone, which includes Egyptian and the Sudan, though there is a small CVCV́C zone in southern Jordan, while CVCV́C encompasses the rest of North Africa. This same CVCV́C isogloss, however, is repeated in the WSA region, running through NE Nigeria, northern Cameroon and into Chad, where its eastern border has not been defined. The shift of ∗ħ/ʕa → ∗ħ/ʕe is nearly unique in Semitic. It is, however, exactly the same as what is found in Akkadian. In WSA, this isogloss runs almost identically with the previous. Note that Table 6.2c is formulated as having occurred on an ancestral form. As will be seen in 10.2, proto-Arabic ∗ħ and ∗ʕ surface as /h/ and /ʔ/ in the LCA. The change in Table 6.2c is carried forward on the basis of its having shifted at a time the pharyngeals were still in place, hence the contrast šahár ‘month’ < ∗šahar vs. šehéd ‘beg, implore’ < ∗šaħad. AP-n is the intrusive –n discussed in detail in 5.2. The 3FPL suffix pronoun has either /a/ or /i/. This is also a taltala feature,
though the value with the low vowel in this morpheme is not particularly common (Procházka 2014). In passing it can be noted that the features in Table 6.2 remind us of the complexity of Arabic language history. None of these features are LCA or WSA innovations. All integrate in one way or another into broader historical-dialectal isoglosses. Nonetheless, the uniqueness of the LCA area is confirmed by the shift described in (5.67) above. I return to this point in 11.7.

In Table 6.3 I use a simple index to illustrate how the notion of speech community helps understand the distribution of these variants in Maiduguri. One hundred percent represents the configuration on the left, which, except for a/i, is the bundle of features that defines the western dialect area; 0% represents the configuration on the right, which defines the eastern.

Table 6.3 Interpretation of percent values

                          100%        0%
a. Preformative vowel     i           a
b. Stress                 CV́CVC       CVCV́C
c. ∗ħ/∗ʕa →               ∗ħ/ʕa       ∗ħ/ʕe
d. AP-n                   Ø           -in
e. 3FPL                   hin         han

What I will illustrate now is that in Maiduguri these originally rural isoglosses translate to a large degree intact into local markers. I will do this in three steps. I begin with an overview of the entire city of Maiduguri. I then narrow the focus to a local neighborhood, and then even more narrowly describe a single household. The data and its sociolinguistic setting was described in detail in Owens (1998a). It was based on three data sets, interviews conducted in villages to ascertain the rural dialectal divisions, formal interviews conducted over the city of Maiduguri in order to get a picture of Maiduguri as a whole, and informal conversations recorded in local city wards, or in households, one of which is described here. The corpus was gathered in Maiduguri and NE Nigeria over a period of about 10 years in the course of various research perspectives on LCA, sociolinguistic, dialectological, semantic, and genre-specific. It totals just under 400,000 words in audio and transcription format, most of it online at the address given in the bibliography (Owens and Hassan 2011—present). It is divided into three data sets, a general overview of Maiduguri Arabic based on 56 interviews conducted in Maiduguri, a general overview of rural LCA, again based on interviews, and a set of informal, non-structured conversations also conducted in Maiduguri.


6.5.1 City as speech community

Beginning with the Maiduguri city survey consisting of 56 interviews (see Table 6.4a note a, this chapter), overall the western dialects are numerically dominant, strongly so in the case of the 3FPL suffix, but as a whole there is no dramatic dominance of one dialect over the other. The t-test comparing the speakers under 32 and those over 50 showed no significant differences. If there should be evidence of change in progress it is expected to show up in the different generations. This does not appear on a city-wide level. Moreover, at the city-wide level the percentages do not indicate a marked tendency toward a city-wide norm based on one of the two prominent dialectal variants. There is a numerical dominance in three of the four west-east dialect features of the western variants. However, this is due to the fact that two of the three local Maiduguri communities where data was collected are dominated by speakers of the western dialect.

Table 6.4a The city of Maiduguriᵃ

                          Token count                  %      t-test sig.
                          Western       Eastern               (age under 32 vs. over 50)
a. Preformative vowel     i 3209        a 2931         .52    .35
b. Stress                 CV́CVC 2252    CVCV́C 1434     .61    .68
c. a → e ∗ħ/ʕe            a 1766        e 898          .66    .64
d. AP-n                   Ø 185         in 189         .49    .07
e. 3FPL                   hin 248       han 96         .72    .81

ᵃ The Maiduguri city data is from 56 interviews conducted in four neighborhoods, which are included in Table 6.4a. The areas are Gwange, Gamboru (or Gamborí), Ruwan Zafi and a fourth which included speakers from other areas than these.
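The percent column of Table 6.4a follows arithmetically from the token counts: each value is the share of tokens showing the variant that Table 6.3 places at 100%. The following Python sketch simply redoes that arithmetic. It is illustrative only; the names `TOKEN_COUNTS`, `variant_index`, and `INDEX` are hypothetical and not part of the original analysis, and the counts are copied from the table above.

```python
# Token counts from Table 6.4a, given as
# (tokens of the variant counted as 100%, tokens of the variant counted as 0%).
TOKEN_COUNTS = {
    "a. preformative vowel": (3209, 2931),  # i vs. a
    "b. stress": (2252, 1434),              # initial vs. final stress
    "c. a -> e": (1766, 898),               # a vs. e
    "d. AP-n": (185, 189),                  # zero vs. -in
    "e. 3FPL": (248, 96),                   # hin vs. han
}


def variant_index(left_tokens, right_tokens):
    """Share of all tokens that show the left-hand (100%) variant."""
    total = left_tokens + right_tokens
    if total == 0:
        raise ValueError("no tokens recorded for this variable")
    return left_tokens / total


# Index per variable, rounded to two decimals as in the table.
INDEX = {
    name: round(variant_index(left, right), 2)
    for name, (left, right) in TOKEN_COUNTS.items()
}

for name, value in INDEX.items():
    print(f"{name}: {value:.2f}")
```

An index near .50, as for AP-n, means the two variants are about equally frequent in the city-wide sample; values toward 1 or 0 mean that one variant predominates.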

6.5.2 Neighborhood as speech community

Maiduguri as a whole was, at the time of the recordings in the 1990s, only three generations old (founded 1906). Its rapid growth was initially fueled largely by migration, and the migration came from diverse areas: the western dialect area, the eastern, and Chad (Ndjammena in particular). It is to be expected that a citywide heterogeneity as documented in Table 6.4a should ensue. It is interesting, therefore, to look more closely at smaller residential areas. Here I look at a single
neighborhood in one part of the city, Gwange (see Map 7), in particular the mosque area which has attracted a high concentration of Arabic speakers. The mosque area is located in the city ward of Gwange, possibly the area of Maiduguri with the densest Arabic-speaking population.¹⁴ This area is the target of migration from the eastern dialect region because it is associated with speakers of eastern origin, such as the imam of the mosque, Ibrahim Saleh, a well-known Tijani scholar, who is himself from the eastern area. In this sample, there are 24 total speakers in the mosque area. The data used here excludes the interviews which are the basis of the Maiduguri city summary. Here again the t-tests show no significant difference between the younger and older speakers. As far as the dominance of eastern or western norms, clearly the eastern norms are dominant, indicated in Table 6.4b b–e by the lower percent, where a “perfect” eastern value would be 0. This is not surprising given that this area is something of a magnet for speakers from the eastern dialect region. However, it relativizes the observation above that in Maiduguri as a whole western norms are more numerous. Maiduguri as a whole is an abstraction. Local speech communities are where the transition from rural to urban varieties occurs. This can be illustrated at an even more local level, a single household.

Table 6.4b Mosque area

                                       Mosque area    t-test sig. (mosque area,
                                                      age under 32 vs. over 50)
a. Preformative vowel  i/a             .90            .90
b. Stress  CVCvC/CvCVC                 .40            .79
c. a → e ∗ħ/ʕe  a/e                    .38            .08
d. AP-n  Ø/in                          .06            .36
e. 3FPL  hin/han                       .24            .56

6.5.3 The household as speech community

The household is found in the Dikkeceri district of Maiduguri, an area with a high concentration of Arabs (Map 7). Both parents are from the eastern dialect area, the father actually from Waza, which is an extension of the same dialect region in Cameroon, about 30 kilometers from the Nigerian border. Extensive recordings were made of 11 members of this household, and because of the informal nature of the relations with the members of this household, the recorder could be turned on at will as visitors from the outside came and went. In all, 11 household members, all relatives, were recorded. Five constituted a nuclear family, and six were brothers or sisters of the household heads or their children. All of these either were born in the eastern area, or their children were. Nine outsiders were of mixed origin, at least five from the western dialect region and two from the eastern region (Owens 1998: 262–263). The household lies in a beeline from the Gwange mosque about one kilometer (half a mile approximately) away to the west. For the household data (Table 6.5) it is not realistic to perform the age t-test as at the time of the interviews there were only two individuals older than the cut-off point of 32 years old, one of them the mother of the husband and owner of the house. It can be noted that both of these speakers have dominant eastern usage and, for information, an age-based comparison does not produce any significant differences.

In addition to the core household members, the conversations included non-household members who came as visitors and neighbors. These included but were not limited to, for instance, visitors from the neighboring compound, also Arabs, whose origin is the Mafa area,¹⁵ firmly in the western dialect region. As can be seen, these “outsiders” have, except for the AP-n and stress variables, dominant western variants. Despite these “intrusions,” and obviously despite the fact that there are daily hundreds of interactions between these two households, there is no evidence that the core household is shifting to a more western dialect.

Table 6.5 The household data

                                       Core household    Outsiders
a. Preformative vowel  i/a             .93               .79
b. Stress  CVCvC/CvCVC                 .16               .39
c. a → e ∗ħ/ʕe                         .19               .63
d. AP-n: Ø/in                          .27               .39
e. 3FPL: hin/han                       .22               .69

Map 7 Gwange and Dikkeceri

¹⁴ Ruwan Zafi, with a large Chadian population and Old Maiduguri are other areas.

¹⁵ Good examples of the dialect around Mafa can be found in the recordings TV 36, TV44a, b, c, TV 45, and TV 76b on the LCA website.


Starting from a global summary of five linguistic variables in Maiduguri, it has been argued that the operative social unit informing an interpretation of the variation is the speech community. Maiduguri-wide, there is a good deal of variation, with only one feature, the 3FPL –hin, appearing to be dominant over the variant –han. The idea of a Maiduguri-wide Arabic speech community is, however, problematic in that the languages of wider communication in it are English, Hausa, and Kanuri. A great deal of Arabic is spoken, but Arabic-only transactions take place in local neighborhoods and in households. Once one concentrates on the variationist spectrum of these very local speech communities, one finds, roughly, that the local Maiduguri variation replicates the dialectal norms of the ancestral rural area. The eastern norms are particularly distinctive and therefore greater attention has been paid to them, because they exhibit variants, some of which are either vanishingly rare at the pan-Arabic level, for instance AP-n discussed at length in 5.2, or simply unique, the raising of ∗ ħ/ʕa → ħe/ʕe. Yet at the local level, in a local area and in an extended household, there is no evidence of an attrition of these features. To the contrary, whereas in the Maiduguri-wide survey (Table 6.4a) it appears that the variant –han is a minority form, which might suggest recessiveness, in the household (Table 6.5) –han is very dominant. The local household simply replicates its rural origins here, as well as in respect of the other features (Owens 1998: chapter 12).

6.6 Change doesn’t need to happen

Intergenerational studies of western urban societies typically reveal clear directions of language change. These shifts may be evident in as little as two generations (Dodsworth and Kohn 2012; Kerswill and Williams 2005), may extend over three (Trudgill 2004; Ivars 2005) or four (Thelander 1982). What may happen is quite variable. In the case of Raleigh the entire southern vowel shift is reversed; in the case of Burträsk individual variables split into either Standard Swedish or northern dialect. In New Zealand Trudgill argues that the demographically preponderant founding variant wins out. In the case of Labov’s NYC split /a/ input, the phonological conditioning tends to remain, but the grammatical conditioning (function words) is lost. In all cases a clear trend is discernible. Were these developments applicable to the Maiduguri Arabic situation, one would expect some sort of trend, possibly toward a growing dominance of what are probably the variants of the numerically larger western speakers (as Trudgill 2004 would predict). One would expect this trend to appear in the younger generation. The basic survey conducted here, however, shows that this is not the case. The operative forces are smaller and smaller social configurations, the city, the neighborhood, and then the household. To the degree the sample allows comparison between two generations, none of the variables show any significant differences.


As one moves toward the household level homogeneity increases, of a type which is the outcome of the language shifts documented in the western nations. But this is an inherited homogeneity brought by urban migrants, not one resulting from an interaction with speakers of other dialects. The broader interpretation of the Maiduguri situation is that the smaller the speech community, down to the household level, the greater will appear the homogeneity reflecting, as in Owens (1998), an ancestral variety. The increasing heterogeneity as one defines the speech community in larger terms is merely a reflex of the fact that a larger speech community in the Maiduguri Arabic context will contain households whose ancestral variety goes back to different, often linguistically contrasting sources.¹⁶ This is not to say Maiduguri as a speech community is irrelevant to Arabic. Rather, Maiduguri can be thought of as a forum which draws dialects from throughout the Lake Chad region and redistributes them in a somewhat piecemeal fashion throughout the city. The redistributed mini-dialects are still driven by kinship relations attracting new migrants from rural areas, from other Nigerian cities, and from other areas in Maiduguri itself.¹⁷ LCA is a somewhat exotic Sprachinsel. This does not mean, however, that the communal forces behind the maintenance of Arabic and of inherited varietal differences are special to NE Nigeria. In Casablanca Hachimi (2007: 108) documents Fezi female immigrants who maintain classic Fez features, such as /q/ for Arabic ∗ qaaf.¹⁸ Behnstedt (1987: 178–179) briefly describes two varieties of Arabic in NW Yemen (near Ṣaʕdah), their own θθa used by the speakers of im-Maθθ dialect, and a wider lingua franca. There is no indication that this bi-dialectalism is recent. Two classic instances, which are reasonably and in one case very well documented, are the “confessional” dialects of Baghdad and Baħrain. In the former case (Blanc 1964, Palva 2009) three distinct confessional communities, Muslims, Jews, and Christians, maintained distinct dialects over centuries. In the case of Baħrain, Sunni and Shiʕa, ʕArab and Baħarna, the latter group itself dialectally differentiated, have maintained quite distinct dialectal variants over centuries (see Holes 2019, Al-Wer and Horesh 2019). Maiduguri is thus relevant to the larger Arabic world. The analogy to the wider Arabic-speaking world is that where local conditions support them, minority varieties will flourish.

¹⁶ Note that social segregation in a sociolinguistic context does not trivially presume maintenance of inherited differences. The idea of social segregation has been used to explain rapid linguistic change in the case of the SVS reversal in Raleigh, NC (Dodsworth and Kohn 2012: 242–243).

¹⁷ Though the author has not collected new text-based corpora over the last 20 years, he has continued to follow some of the developments of the individuals he worked with between 1990 and 2000. In one interesting development, one individual from what is here termed the Gwange mosque area, now a professor at the University of Maiduguri, started a new “zaawiya,” a Tijaniyya religious enclave on the eastern boundary of Maiduguri, drawing almost exclusively on individuals of eastern dialect origin, many from Gwange. It now has upward of 100 individuals. Eastern variants migrate from the rural eastern area, to Gwange, and on to further “new districts” (new communities) in the city.

¹⁸ The only variation between /q/ ~ /g/, /g/ being the larger Casablanca koine usage, occurs in the one lexeme qaal ~ gaal ‘say.’


Whether a minority among other languages or a minority dialect among demographically or politically dominant groups, Arabic contains many instances of maintenance which can be “explained” by the resilience of the local speech community which supports the variety. Here it is time to revisit the observation of von Grunebaum from 3.1. Paraphrasing him, it is the local group where Arabic language history begins. The differences between the eastern and western Nigerian Arabic dialects are clear and striking, yet it appears at the moment at least that there is no convergence toward an inter-dialectal norm or koine. The obverse of this observation is that there are social mechanisms which hold the norms in place, even when they are displaced to the “modern” environment of a city of nearly one million people. The family and local community settlement patterns maintain an inherited rural ancestral variety. This social factor, it can be suggested, helps account for the diversity of individual linguistic outcomes documented for many linguistic variables in this book (for more to this point see remarks on Milroy 1992 in Chapter 11).

6.7 Linguistic stages and contemporaneous speech communities

Beyond the basic reality that language occurs in speech communities, there is a strategic reason for invoking the idea of a speech community in historical linguistics and that is that it puts a human face to what otherwise, ultimately, is a formal representation of language change. At a number of points in this work changes in Arabic were classified into stages of development of a particular phenomenon, for instance the development of b- from marking future-volitional to indicative or the development of deflected number agreement (5.3.2.8). Stages, however, conjure up the idea of the language moving ineluctably from one state to another. Arabic begins along with other Semitic languages in a state of plural agreement, then moves to optional feminine singular agreement, for instance. Since Brockelmann, Arabicists are conditioned to thinking of one stage supplanting another. This may, of course, happen. As seen in Chapter 2, there have been categorical shifts in which features have split or have merged with other features. But it can equally happen that a change, a split or a merger, occurs in one speech community only, and establishes itself only in one part of the original speech community. Cases such as this are rampant in Arabic. The following will be used here and in Chapter 11. The conditioned split of the 1/2MSG morpheme –t in WSA discussed in 5.1 (5.9) for instance, can be represented as a conditioned split in which most of the global Arabic-speaking community witnessed no change at all. One part did, however, which resulted in the distinctive WSA form of the 1 and 2MSG perfect verb suffix. There are two ways to look at the split. Figure 6.2a represents the formal outcome of the split. As a classic representation of a linguistic change it is fully adequate. Figure 6.2b is intended to bring home the point that the vast majority of


Arabic speakers were and are oblivious to the rule. Figure 6.2a represents a linguistic split. What it does not say is that demographically the split affected only a small part of the entire Arabic-speaking world. The demographic split in Figure 6.2b indicates that the maintenance of –t is the rule in the Arabic world. A naïve conclusion from Figure 6.2b would be that Arabic changes, but doesn’t change. This misses a key point. By associating the many changes which can be observed in Arabic with individual speech communities the notion of linguistic stage can be made more precise. The shift to t ~ Ø represents a linguistic stage, but one which affected/affects only a single speech community. A part of Arabic has changed, not all of it. Behind each variant described in Chapter 5 lies a speech community or communities. The overall result is a giant paradox. There may be many linguistic stages, but they may all exist contemporaneously. I discuss this now in greater detail with a fairly simple development already introduced in 5.3.2.8.

(a) *t splits into -t and t ~ Ø
Figure 6.2a WSA vs. rest of Arabic

(b) *t splits into -t, -t, -t, … and -t ~ Ø
Figure 6.2b Demographics of 6.2a

6.7.1 A diachronic trail across speech communities

In 5.3.2.8 three different structures in Arabic were introduced for representing gender/number agreement. These were defined, exemplified, and correlated with the varieties each is associated with. In this section I would like to redefine the three structural types into three historical linguistic stages. Glossing over details (see 5.3.2.8) the types are as follows.

• all plural nouns agreeing in plural with their concordants
• plural nouns sensitive to their structural type: broken plurals and human sound FPL command either plural or FSG agreement
• plural nouns sensitive to their referent: human nouns with plural agreement, non-human nouns FSG

In this section I will account for each type as a separate historical stage from 1 to 3, with stage 1 representing the oldest stratum. The three stages in Table 6.6 are interesting in that they begin with a natural basis of agreement based on number, and end with a natural basis, based on the


Table 6.6 Three stages of subject–verb agreement

Stage | Noun | Verbal agreement | Speech community
Stage 1 | PL | PL | WSA, Bʕeeri
Stage 2 | broken PL and –aat (human noun) | FSG or PL | CA, Arab. peninsula
Stage 3 | human PL; non-human plural | PL (human); FSG (non-human) | Egypt, Damascus

parameter human/non-human. The complication comes in stage 2, as discussed in some detail in 5.3.2.8. In stage 2 the choice of plural agreement is to be seen on the one hand as a continuation of the “proto-Semitic” inheritance stage 1 (PL with PL). On the other, the systematic FSG option is an innovation unique to Arabic among the Semitic languages. It is clear that the only way to understand stage 3 is via stage 2. The conceptualization of plural nouns as FSG needed to precede the grammatical generalization of FSG to all non-human nouns. I walk through the three stages here. Stage 1 is at one and the same time the proto-Semitic inheritance, and the original Arabic situation. All of the following Semitic languages have plural + plural agreement: Hebrew (Blau 1976: 88; Joüon and Muraoka 2005: 552, see 6.7.2 below); Sabaic (Beeston 1985: 16); Modern South Arabian (Watson 2012: 251; on Mehri, Simeone-Senelle 1997: 413; for South Arabian in general, Beeston 1985);¹⁹ Akkadian (von Soden 1995 5: 223–231); Ugaritic (Bordreuil and Pardee 2009: 70–71); and Aramaic (e.g. Rosenthal 1961: 70–71 for Biblical Aramaic; Muraoka 1997: 72 for Syriac). It may be assumed that PL + PL agreement represented the earliest stage of Semitic. Biblical Aramaic even with V-S word order takes plural agreement with a following subject.

(6.9) al yǝ-bahl-uu-k raʕyoon-aak
not 3-bother-PL-you thoughts-your
‘Let not your thoughts bother you.’ Daniel, 5.10
אַל־יְבַהֲלוּךְ רַעְיוֹנָךְ

As noted in 5.3.2.8, this agreement is attested in Classical Arabic as well, under the rubric akaluuni al-baraaɣiiθ.

(6.10) ḍarab-uu-niy qawm-u-ka
hit-PL-me people-NOM-your
(Al-Kitaab I: 202)

¹⁹ Collective nouns are excluded, and in general often have special agreement properties in the Semitic languages (see Dror 2016, Owens 2021b for further discussion).


Stage two is marked on the one hand by a continuation of the PL-PL agreement, but is now joined by an optional choice which allows broken plurals to have feminine singular agreement.

(6.11) Three types of agreement
(a) qaam-at ar-rijaal-u
get up-F DEF-men-NOM
or (b) qaam ar-rijaal-u
get-up DEF-men-NOM
or (c) ar-rijaal-u qaam-uw²⁰

In addition, the sound feminine plural –aat now takes FSG agreement even with human nouns (see (5.62) in 5.3.2.8). How this choice came about is treated below. Finally in stage 3 human plurals categorically take plural agreement and non-human plurals take FSG.

(6.12) il-buyuut inħaraʔ-it (∗ inħaraʔ-u)
DEF-houses burned-FSG
‘The houses burned.’
ar-riggaal raaħ-u (∗ raaħ-at)
DEF-men went-PL
‘The men went.’ (Cairene)

The three stages represent at one and the same time individual diachronic developments, as will be explained presently, but also different speech communities/dialects.

Stage 1: WSA, Eastern Libya, Bʕeeri (Upper Egypt; possibly Moroccan, see ch. 5 n. 36)
Stage 2: Emirati, Arabian peninsula in general, Jordan, southern Tunisian, CA in general
Stage 3: Egyptian (with odd exception), Damascus, Baghdadi

Before coming back to the key point of this section, the correlation of speech community with historical linguistic stage, it is relevant to describe the key shift in stage 2 from plural only agreement, to an either-or plural or FSG.

6.7.2 Motivation for change

Stage 2 saw a split in the agreement pattern whereby plural nouns take either FSG or PL agreement. The motivation for this split can be interpreted as a

²⁰ These examples are from CA, which has a complicating, though related factor of an initial verb being invariably singular. I do not treat this issue here (though see Dror 2016, Owens 2021b for detailed discussion).


reconceptualization in how plural nouns were classified. The idea developed here was suggested in a recent book by Fassi Fehri entitled Constructing feminine to mean. He argues that the grammatico-semantic concept of feminine is defined along three parameters. First, it can define natural gender, as in kalb ‘dog’–kalb-a(h) ‘bitch’; secondly, it can define a singulative of a species, as in ħuut–ħuut-a(h) ‘fish–single fish’ (see below); and thirdly, it can define an entity as a collective unity, as opposed to a plural of individuals. Here I concentrate on the third aspect. Fassi Fehri (2018: 150) exemplifies this third category with the following examples.

(6.13) al-majuusiy-at-u qaal-uu ‘The Magi (as individuals) said’
al-majuusiy-at-u qaal-at ‘The Magi (as a group) said’

In the first example the majuusiyya are conceived of as individual members of a group, whereas in the second they are conceived of as an undifferentiated whole.²¹ He terms the difference as one of “plural” vs. “plurative,” where quotation marks represent the semantics manifested in the agreement pattern.²² He notes that the plural–plurative division is found inter alia in broken plurals, and that one and the same noun can be conceptualized in either of the two ways. Rijaal ‘men’ for instance (Fassi Fehri 2018: 13, 178), can conceptualize men as individuals in a group, or men as an undifferentiated collective. This leads to the crucial point for present purposes that the reconceptualization, essentially the innovative lexical re-classification, was represented in two alternative agreement patterns. “Plural” is marked by plural agreement, whereas the “plurative” is characterized by feminine singular agreement. The dual conceptualization of lexical “plurality” is reflected directly in the two different agreement patterns which in fact define the stage 2 innovation. Correlating the lexical changes with their respective agreement patterns gives the overview in Table 6.7 in a stage-by-stage breakdown, correlating the agreement pattern with a basic semantic description of the parameter defining agreement.

²¹ Analogous to the difference in English between “the committee are meeting” (plural) vs. “the committee is meeting” (plurative). Fassi Fehri, incidentally, bases his idea of plurative on a category used within Cushitic linguistics.

²² The idea is essentially found in the Arabic tradition, as will be seen, in Wright (1977: 233), and in certain respects in Brockelmann’s idea (1908: 418) of the feminine ending as a class marker. Belnap and Shebabneh (1992: 259) suggest that +/- Human was a crucial factor favoring deflected agreement, with deflected agreement developing around -Human. However, in CA, as well as till today in the “stage 2 dialects,” a human broken plural can command FSG agreement. Recently D’Anna (2020; see also Corbett 2014) invokes the typologists’ idea of collectivity vs. individuation (Hopper and Thompson 1980, 1984), which appears to be similar to Fassi Fehri’s plurative vs. plural. Holes (2016: 329) phrases this as generic (FSG) vs. specific (PL). In a suggestive study Belnap (1993) shows that the choice of agreement is multifactorial, influenced by frequency, specificity and genericness, and individuality. Unfortunately it appears the intertwining but diverse threads of this early sociolinguistic study were not followed up on.


Table 6.7 Parameters of plural agreement in three stages

Stage 1: plural = “plural”: natural agreement
Stage 2: broken plural and FPL -aat =
• Lexical classification: “plural” or “plurative”
• crucial parameter, conceptualization as individuals or set
• Agreement: plural or FSG
Stage 3: “human” = plural, “non-human plural” = FSG: crucial parameter: human vs. non-human
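The three-stage summary in Table 6.7 can be read as a small decision rule, and the following Python sketch spells it out. It is an illustration only, not an implementation taken from the sources cited: the names `PluralNoun` and `allowed_agreement` are hypothetical, the feature coding of the example nouns is an assumption made here, and stage 2 is modeled only for the two noun classes the table names (broken plurals and human sound FPL in –aat).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluralNoun:
    """Hypothetical feature bundle for a morphologically plural noun."""
    form: str
    human: bool
    broken: bool = False      # broken (internal) plural
    sound_fpl: bool = False   # sound feminine plural in -aat


def allowed_agreement(stage: int, noun: PluralNoun) -> frozenset:
    """Return the agreement values Table 6.7 allows for a noun at a stage."""
    if stage == 1:
        # Stage 1: every plural noun takes plural agreement.
        return frozenset({"PL"})
    if stage == 2:
        # Stage 2: broken plurals and human -aat plurals may be conceptualized
        # as "plural" (PL) or as "plurative" (FSG).
        if noun.broken or (noun.sound_fpl and noun.human):
            return frozenset({"PL", "FSG"})
        # Assumption: other noun classes are left out of this sketch because
        # the summary table does not state their behavior.
        raise ValueError("noun class not covered by the stage 2 summary")
    if stage == 3:
        # Stage 3: the referent decides; no free choice remains.
        return frozenset({"PL"}) if noun.human else frozenset({"FSG"})
    raise ValueError("stage must be 1, 2, or 3")


rijaal = PluralNoun("rijaal", human=True, broken=True)
buyuut = PluralNoun("buyuut", human=False, broken=True)
muslimaat = PluralNoun("muslim-aat", human=True, sound_fpl=True)

if __name__ == "__main__":
    for stage in (1, 2, 3):
        for noun in (rijaal, buyuut, muslimaat):
            print(stage, noun.form, sorted(allowed_agreement(stage, noun)))
```

Returning a set rather than a single value keeps the stage 2 optionality visible: stages 1 and 3 always yield one value, so agreement there is fully determined by the lexical identity of the noun, whereas stage 2 yields two.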

In stage 1 morphological plurals are grammatical plurals for purposes of agreement, and, it may be assumed (see Owens 2021b, section 4), they are also semantic plurals cataloguing individuals in a group. Stage 2 is, obviously, more complex.

Stage 2. The decisive innovation is a re-classification of morphological plurals, in particular broken plurals and human FPL –aat, into a binary semantic choice, plural or plurative. This choice, in turn, correlates with a syntactic agreement pattern, either plural (as in stage 1) or the innovative feminine singular. In stage 2 there are, therefore, both semantic and grammatical (agreement) factors in play. A grammatical plural represents a set of individuals whereas a grammatical plurative represents individuals as a set. Ibn Yaʕish, who will be elaborated upon shortly, expresses this succinctly, noting that the choice of FSG agreement signals a collective (V: 103). In his words, (V: 100) “the collective is feminine (al-jamaaʕa muʔannaθa)” in (Classical) Arabic.

Stage 3 simplifies the conditioning factors. With a plural noun, human vs. non-human becomes the decisive factor in determining agreement. Unlike stage 2, no free choice is in play. Human plurals, whether sound or broken plural, take plural agreement. Thus, whereas in stage 2 speakers conceptualize rijaal and muslim-aat ‘Muslims-F’ as individuals (plural) or a group (plurative), in stage 3, since both are human, only plural agreement is allowed. In stage 3 as with stage 1, agreement is fully determined, given the lexical identity of the noun.

Before leaving this section, it is relevant to remark briefly on associations made within the Arabic linguistic tradition. This is the difference between the plural of paucity and plural of abundance (using Wright’s term, 1977: I 234, jamʕ al-qilla vs. jamʕ al-kaθra).
In the Arabic grammars this is treated primarily as a formal difference, each type being typically associated with one or another form (see extensive discussion in al-Kitaab II: 126–44, 181–205). Ibn Yaʕish (V: 103) picks up on this in greater detail, noting that the Kufans characterized feminine agreement in pairs such as (8.21) above as representing a small amount (qilla), the masculine abundance (kaθra). As a true Basran he does not endorse the position. However, a little later he does return to the issue, expanding upon the position that the paucal has characteristics of singularity. Specifically


he notes that the jamʕ al-qilla “has a number of properties of the singular” (V: 106, qad jaraa alayhi kaθiir min aħkaam al-waaħid). He notes, for instance, that the paucal plural aθwaab ‘garments’ allows a diminutive (uθayb), a form which is usually based on the singular.²³ Moreover, commenting on the choice of M or F in examples such as (6.11a), with a human subject, he explicitly states,

if you consider it a plural you make it masculine,²⁴ and if you consider it a collective, you make it feminine. (V: 103) (fa-ʔin qaddartahu bi-l-jamʕ đakkartahu wa ʔin qaddartahu bi-l-jamaaʕa ʔannaθtahu)

It thus appears that there was a perception among some of the grammarians that the idea of plurativity was associated in general with FSG agreement among morphologically plural nouns, and that one subclass of these, the plural of paucity in fact had an element of “plurativity” in it in that it created a conception of entities as a singular collective. In a nutshell then the idea of three stages needs to be understood in two senses. On the one hand there is a clear diachronic development which can be seen most clearly in the transition from stage 2 to stage 3. Stage 3 formally presupposes as its input the innovation of FSG agreeing with broken plural nouns, which occurred in stage 2. It altered this slightly, adding the condition that FSG agreement required a non-human noun, while human nouns, regardless of whether broken or suffixal, take plural agreement. “Stage” in this sense is a diachronic development. Paradoxically, at the same time the stages are merely synchronic alternatives distributed among different Arabic dialects, since there is no compulsion that all speech communities be characterized by, or develop via, the same stages. Some speech communities may reflect an original stage 1 for instance. The three stages described here are therefore an abstraction over all Arabic speech communities. The current example demonstrates how it would be misleading in Arabic historical linguistics to ignore the role of speech community. Ostensibly it is contradictory to speak of three historical linguistic stages, which are not successive stages at all. Arabic changes, but it doesn’t change. The contradiction is resolved as soon as the requirement is lifted that the entire language be characterized by all three stages. It is not the language as a whole that is the operative unit here but rather individual speech communities, each of which to one degree or another is allowed to go its own way as far as each innovation is concerned. 
It can indeed be suggested that the failure to make the idea of speech community a fixed pillar in Arabic historical linguistics is a major reason for its distortions. Each of the individual generalizations characterizing change in Arabic which were introduced

²³ Note also that in Safaitic there are forms that appear to be cognate with the paucal plural (ʔɣnm in following example) which agree in FSG with the verb. ʕqd-t m-rhbt h-ʔɣnm ‘The goats were prevented from entering Rhbt’ (Al-Jallad 2015a: 141–142).

²⁴ This linkage appears to require independent consideration in the ALT.


in 1.1 are in a sense correct. What makes them entirely misleading, however, is that they were privileged in order to represent the changes as changes to the language as a whole. Thereupon rests much of the Old Arabic–Neo Arabic fallacy. The changes do not, however, characterize the language as a whole, but rather parts of it, some of its speech communities (see App. 6.7 for more).

6.8 Non-Arabicists beware: The community of diglossia

It is relevant to end this chapter with a word of caution. Whereas to this point variation has been assumed to reflect past historical processes, which may point either to ongoing change or reflect the past effects of such variation, there is one linguistic domain in Arabic where there is little or no correlation between variation and change. One of the classic linguistic articles of the twentieth century is surely Charles Ferguson’s “Diglossia” (1959). In what became known as “classic diglossia” he described four languages (German, French, Arabic, Greek) in which different varieties co-existed, one a high (H), prestigious variety, the other the low (L) popular variety, sustained by a range of functionally supported domains. In the case of Arabic, Ferguson opposed Classical Arabic, or its modern successor, Standard Arabic, to Arabic dialects. In Ferguson’s idealized H vs. L varieties there is a clear linguistic dichotomy between CA (H) and dialect (L). He does acutely observe, however (1959: 240), that in reality there is a medial variety, known inter alia as al-luɣa al-wusṭaa ‘middle language,’ in which the linguistic attributes of each intermingle. This is in a very broad sense the contemporary, oral equivalent of Middle Arabic (4.2.3). This genre, in fact, is typical of many spoken Arabic encounters. Some 20 years after Ferguson the English linguist T. F. Mitchell (1986) undertook to characterize this variety, which he appropriately called “Educated spoken Arabic” (ESA). Mitchell observes that it is particularly among the educated class, those who have a good knowledge of Standard Arabic, where this variety is used. As the Arabic name, ‘medial language,’ implies, ESA is not a variety with a fixed grammar, but it does have discernible norms. Speakers may use more or less of SA (or dialect); they may vary the mix according to topic, audience, venue, and other factors.
A number of empirical studies have described the linguistics of this variety (e.g. Holes 1993; Mazraani 1997; Al-Wer 2002; Mejdell 2006; Bassiouney 2006; Parkinson 2003; Davies, Bentahila and Owens 2013: 339–342; see summary in Owens 2019b: 83–85). It is important to appreciate that ESA is an entity with linguistic substance. It is not badly spoken Standard Arabic. An essential historical linguistic question is gauging what if any effect ESA has on linguistic change. A short summary of a case study will give an idea about the type of interpretive problems which can arise. An early study by El-Hassan (1979) is illuminating. The parameters of his research were to find out when educated Arabs of different backgrounds met—the clientele and situation typically associated with ESA— what sort of Arabic they spoke. The focus was on Levantine


and Cairene Arabic. Groups of speakers from Lebanon, Syria, Jordan, and Egypt were recorded, in some sessions speaking with their countrymen and women, in others with speakers from the other countries. El-Hassan examined demonstrative usage. The MSG proximal demonstrative has the forms shown in Table 6.8 in these dialects, according to El-Hassan.

Table 6.8 Demonstratives in an ESA encounter

Jordan: haađa, haada, haaḍa, haaḍ
Syria: haađa, haaza, haada, haad
Egypt: haađa, haaza, da

A typical finding of ESA studies is that so-called “stigmatized” forms will not be used in encounters among educated speakers of different national backgrounds. No definition of stigmatized is given, but they are often forms peculiar to one dialect, such as Jordanian haaḍa with its distinctive emphatic interdental voiced fricative. The study distinguished three different functional positions of the demonstratives, the details of which can be ignored here. In this summary I give the figures only for the pronominal usage of the demonstrative (not as a noun complement), which represents by a large margin its most frequent occurrence in the oral texts. Haađa is the SA form, and haaza is an alternative to this.²⁵ Syrian and Jordanian (see below) haada arises because of an earlier shift of ∗ đ → d, the loss of interdental fricatives (see 5.3.1.5). If one were to identify which of the forms were typical dialectal variants, these would be Egyptian da, Syrian haada or haad and Jordanian haaḍa or haada. The variants with and without a final –a are common throughout the Arabic world, and probably dependent on pragmatic, discourse and phonological factors. The number of /a/-final forms for each category is small, so they will be included in a single category, not separate as in El-Hassan’s original article. The results are interesting and deserve attention from different perspectives. For present purposes a comparison between the Egyptian and Jordanian results is relevant. Egyptians, it turns out, hardly switch to “pure” SA haađa. They maintain their native da under both conditions, speaking to educated Egyptians and to non-Egyptians, and use the medial haaza to the same degree in both situations (Table 6.9). From a sociolinguistic perspective the considerable contrasts between the Egyptian and Jordanian speakers are interesting. Effectively, Egyptians only marginally move away from their native, and highly distinctive da,²⁶ whereas Jordanians do

²⁵ See 5.3.1.5 for discussion of ∗ θ → s in learned words in Egyptian Arabic. This is the voiced reflex of the same phenomenon.

²⁶ A major distinction emerges in nominal modification, not treated here. Egyptian da is invariably post-N, il-beet da ‘this house,’ whereas in the other three dialects the demonstrative occurs pre- and post-N, under conditions still awaiting treatment, Jordanian haaḍa il-beet/il-beet haaḍa (see 10.1, 10.3).


FOUR ISSUES IN ARABIC HISTORICAL LINGUISTICS Table 6.9 Demonstrative pronoun, Egyptian (El-Hassan 1979: 41) 6.9a Egyptian to Egyptians da haaza haađa 71 17 6 6.9b Egyptians to non-Egyptians da haaza haađa 59 7 5

Table 6.10 Jordanian (El-Hassan 1979: 34–9)

                                      haaḍ(a)   haad(a)   haađ(a)
6.10a Jordanians to Jordanians         6        16        48
6.10b Jordanians to non-Jordanians     2        10        11

so en masse as it were, preferring the SA variant haađa to their own haaḍa or haada. From a variationist perspective on historical linguistics, one would be justified in forming the hypothesis that whereas the demonstrative usage in Egyptian (Cairene) Arabic is stable, Jordanian Arabic is witnessing a shift toward SA haađa. Table 6.10 can be read as a change in progress in Jordanian Arabic. To an outsider this is no doubt an appealing interpretation. However, it would be completely wrong. In a diglossic language situation, it is dangerous if not impossible to read off historical linguistic interpretations from even robust statistical observations, such as those in El-Hassan (1979). Rather, one needs to contextualize the Jordanian data sociolinguistically. I would emphasize three points here, which need to be interpreted interlinearly as it were, since the data itself allows more than one interpretation. First, there is a large contrast between the Egyptian and Jordanian demonstrative usages, which probably reflects the dominant status of Egyptian as a default dialect. Particularly as of 1979 (the situation is arguably changing today), Egyptian was a dominant spoken variety due to its widespread exposure via the Egyptian film industry and through the fact that Egyptian teachers taught widely throughout the Arab world. The statistics for Syrians (El-Hassan 1979: 44–45)

6.8 NON-ARABICISTS BEWARE: THE COMMUNITY OF DIGLOSSIA


resemble the Egyptian, and Syrian as well represents a prestige dialect in the Levant, Damascus being the most important urban center in the region. Jordanians, at least as of 1979, could have evaluated their dialect negatively. In the formal situation of a microphone interview, Table 6.10a would reflect a refuge in the neutral SA variant (see below). Secondly, Table 6.10b shows the Jordanian variant haad(a) to be almost as frequent as SA haađa. What is not evident from the table, nor explained in the article, is who the Jordanians were talking to. As can be seen in Table 6.8 above, Jordanian haada is identical to Syrian haada, so it could be that when interlocutors from these two dialects spoke, they found a common ground in their shared variants. Thirdly, the post-1948 and post-1967 eras saw a massive influx of Palestinians to Jordan, both from what is today Israel and from what is today the West Bank. The fact that there are two normative variants, haaḍa/haada, in Jordanian Arabic is due simply to the fact that haada is the Israeli/Palestinian variant (particularly urban), haaḍa the East Bank Jordanian.²⁷ Between these two variants, the SA haađa offers as it were neutral ground in which speakers do not have to overtly commit themselves to identifying the West Bank vs. East Bank demonstratives. The matter was probably all the more fraught in the late 1970s when the research was carried out. In 1970–71 there was armed conflict between the Jordanian state and the Fedayiin, an armed group associated with the PLO, essentially aligned with interests representing the West Bank population. Speaking before a microphone at this time was probably a delicate matter, and it could be that the very high use of SA haađa in Table 6.10a, Jordanians speaking with Jordanians, was a reflection of assiduously seeking out neutral forms.
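The contrast between Tables 6.9 and 6.10 can be made concrete by converting the raw token counts into shares. What follows is a minimal sketch in Python. The counts are those reported from El-Hassan (1979) in the two tables; the dictionary layout and the function names share and sa_share_percent are assumptions introduced here purely for illustration, and the column headed haađ(a) in Table 6.10b is keyed simply as haađa.

```python
# Token counts as given in Tables 6.9 and 6.10 (El-Hassan 1979).
# The data layout and function names are illustrative assumptions.
TABLES = {
    "Egyptians to Egyptians": {"da": 71, "haaza": 17, "haađa": 6},
    "Egyptians to non-Egyptians": {"da": 59, "haaza": 7, "haađa": 5},
    "Jordanians to Jordanians": {"haaḍ(a)": 6, "haad(a)": 16, "haađa": 48},
    "Jordanians to non-Jordanians": {"haaḍ(a)": 2, "haad(a)": 10, "haađa": 11},
}


def share(counts, variant):
    """Proportion of all demonstrative tokens realized as `variant`."""
    return counts[variant] / sum(counts.values())


def sa_share_percent(condition):
    """Percentage of SA haađa in one speaker/addressee condition."""
    return round(100 * share(TABLES[condition], "haađa"), 1)


if __name__ == "__main__":
    for condition in TABLES:
        print(f"{condition}: {sa_share_percent(condition)}% SA haađa")
```

On these figures SA haađa accounts for roughly 6 to 7 percent of Egyptian tokens in either setting, against about 69 percent for Jordanians addressing Jordanians and about 48 percent for Jordanians addressing non-Jordanians. The arithmetic only quantifies the contrast; it does not by itself decide between a change-in-progress reading and the sociolinguistic reading argued for here.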
Thus, whereas one might ostensibly be justified in reading a change in progress interpretation into the Jordanian data, a consideration of the broader sociolinguistic context suggests that other factors explain the differences between Jordanians and Egyptians. Underpinning the Jordanian usage in particular is diglossia. Diglossia is a structurally embedded phenomenon. It is, in those speech communities where it exists, always there, and as suggested here for the Jordanians, one instrument which can always be fallen back upon if the situation warrants. Without appreciating this factor, western accounts run the risk of painting Arabic language history with a western brush. The classic and concise demonstration of this is found in Muhammad Ibrahim’s (1986) distinction between “standard” and “prestige.” Until the article, it had been assumed SA was indifferently both the

²⁷ The reader should not lose sight of the fact that this is the basic situation, and still prevalent as of the 1970s. Many developments have occurred since then, particularly in Amman (see e.g. Al-Wer 2007, Miller et al. 2007). Only a re-study using the same linguistic variants would clarify the degree to which matters have changed.


standard and the prestige variety. This follows from Ferguson’s definition of diglossia itself, whereby the H variety (SA in this case) is the variety with prestige. What prompted Ibrahim’s article was the claim (Shuqayyir 1981: 324) that Arab women differed from women in the west in using prestigious forms to a lower degree. By prestige was understood SA, Standard Arabic forms. El-Hassan for instance observed (1979: 53): it is the men, not the women, who use “the most advanced forms” and “correct more sharply” to the acrolectal end of the continuum.

Acrolectal in this quote is understood as SA, the observation being that the SA form haađa is used more frequently by men than by women. What Ibrahim observed, however, was that there are effectively two targeted norms in most Arab societies. One is indeed the standard, SA, study after study showing that it is typically men who have a greater usage of SA forms than women do. The other he termed “prestige,” whereby he observed that a second dynamic at work is that spoken Arabic has its own local prestigious varieties which always comprise certain features that are not only different from but are often stigmatized by H norms. All available data indicate that Arab women in speaking Arabic employ the locally prestigious features of L more than men (1986: 124). Thus in a study carried out in Amman (Shuqayyir 1981), four variants of the sound represented by the letter “qaaf” are distinguished.

(6.14) ‘say’ in four Amman variants
  a. qaal
  b. ʔaal
  c. kaal
  d. gaal

(6.14a) is the SA pronunciation, (6.14b) originally urban West Bank (e.g. Jerusalem), (6.14c) kaal is rural West Bank, and (6.14d) gaal East Bank. Shuqayyir shows that men disproportionately use the SA (qaal) variant, whereas women, whatever their origin, disproportionately use /ʔ/ (ʔaal). Twenty-five years after Shuqayyir, Al-Wer (2007: 66–7, 2014) confirms that the local gender-based prestige status of /ʔ/ is fully established in Amman. By now nearly any sociolinguistic study takes into account the existence of what I will call the diglossic norm (the norm defined by Standard Arabic) and a local norm, the basis of Ibrahim’s prestige category. Examples include Holes (1983: 444–448), where local Sunni /y/ < *j and /q/ < *ɣ represent local prestige forms, even though they deviate from SA; Haeri (1996), who describes Cairene ʔ < *q as representing the local prestige norm; and implicitly Hachimi’s (2007) discussion of


Fez-origin women in Casablanca who maintain a merged first and second person paradigm (see Palva 1982 and 12.3.3 below). This distinction, however, is sometimes missed by outside interpreters. A case in point is Labov’s (1994: 346) reading of the status of /q/, arguing for the following:

(6.15) Merger of */q/, */ʔ/ in ʔ, demerger of /ʔ/ to /q/, /ʔ/

Labov is interested in the question of undoing mergers, which in general is deemed unlikely (Garde’s law). Largely following Shuqayyir (1981), who in turn follows Garbell (1958), he postulates the merger of *q and *ʔ in ʔ, hence badaʔa < *badaʔa ‘begin,’ and ʔaal ‘say’ < *qaal. The use of qaal instead of ʔaal in contemporary Amman would be interpreted as the demerger. However, cutting a long story short (Owens 1998b: 124–127 for details), it is unlikely that the postulated merger ever occurred in the first place.

• ʔ had already disappeared from many Arabic varieties by early Islamic times (30/650; see notes on Qurrah letters and papyri in general, 4.2).
• Some Quranic orthography and some of the reading traditions support an interpretation of an early loss of *ʔ.
• ʔ is lost in nearly all dialects, including all Levantine, and forms with original ʔ have merged into other paradigms, e.g. original *badaʔ-ti ‘you.F began’ → badee-ti, which falls into the same paradigm as a verb which historically never had a final glottal stop, like banee-ti ‘you.F built’ < *banay-ti (see 4.1.2).

It is a correct observation that modern Amman does have both /ʔ/ and /q/. This, however, has nothing to do with “a remarkable reversal of mergers of q” (Labov 1994: 346). Rather it confirms the endurance of the diglossic norm. /q/ is indeed introduced into Amman (and most varieties of Arabic) but as part of a diglossic norm. A word such as iqtiṣaad ‘economy’ will always appear as [iqtiṣaad], not *[iʔtiṣaad], because it is part of the diglossic system. A word such as ‘say’ can surface as both ʔaal, the local norm, and as qaal, the diglossic norm, but just as El-Hassan’s Jordanian demonstratives are not indicative of language change, neither is the use of [q] in qaal. Rather the sociolinguistic dynamics of its conditions of use are crucial, similar to what was discussed for the demonstratives above.
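The difference between a lexically fixed [q] and a stylistically variable one can be sketched as a small lookup. The following Python fragment is only an illustration under stated assumptions: the one-item set DIGLOSSIC_ONLY, the function name realizations, and the treatment of every other *q word as freely variable are hypothetical simplifications, with only the two words iqtiṣaad and qaal/ʔaal taken from the text.

```python
# Hypothetical mini-lexicon: items that belong only to the diglossic
# stratum and therefore never take the local reflex of *q.
DIGLOSSIC_ONLY = {"iqtiṣaad"}  # 'economy'


def realizations(word_with_q, local_reflex="ʔ"):
    """Return the set of surface forms available for a word containing q.

    Items confined to the diglossic stratum keep [q] (lexical
    conditioning); all others may surface either with [q], the diglossic
    norm, or with the local reflex, the local norm (stylistic variation).
    """
    if word_with_q in DIGLOSSIC_ONLY:
        return {word_with_q}
    return {word_with_q, word_with_q.replace("q", local_reflex)}


if __name__ == "__main__":
    print(realizations("iqtiṣaad"))
    print(realizations("qaal"))
```

Nothing in such a lookup involves a sound change: the same speaker holds both values of ‘say’ at once, which is the sense in which the presence of [q] in Amman is a matter of norms of use and not of demerger.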
[q], being sanctioned as a diglossic norm, will always either be complementary to the phonetic value of the local norm via the mechanism of lexical conditioning or will vary with it as a stylistic practice sanctioned by norms of usage.

To summarize, it should be clear that a historical account of Arabic, unless it is explicitly interested in a history of the standard language only, needs to give precedence to the history of local norms, the local speech community. It is here, as Al-Wer (2013: 251–254) points out, that the dynamics of permanent language change transpire. Diglossic norms are a permanent fixture in Arabic, but the


speech community it represents is effectively pan-Arab and anchored in the early history of the codification of Arabic. It is a norm fixed already by Ibn al-Sarraj in the fourth/tenth century (see 3.3.2 above) and one always present in literate Arabic societies. Structurally it does not change.²⁸ What is of enduring interest is understanding the sociolinguistic circumstances under which it is called upon.²⁹

²⁸ As so often, this needs qualification. The phonology, morphology (in general) and syntax are in principle immutable. Lexical change, calquing new meanings on to existing words and collocations, on the other hand, is common (see Blau 1981a for an enlightening study).
²⁹ I note here that the difficulty of using SA and contemporary spoken Arabic for direct historical comparison is further illustrated in Appendix 10.4, where frequency data of SA verb stems are discussed.

PART III

CONTACT

Change via contact has many names and encompasses a number of processes: diffusion, borrowing, sub-stratal influence, imposition. One common denominator among all is that contact-based change tends to be structurally “disruptive,” introducing “odd” elements into what, in a best-case scenario, can be postulated as a system which is not expected to have such elements. It is not surprising that Arabic in its great geographical and diachronic breadth should evince many interesting individual cases, three of which are discussed in some detail here. In 11.7 I will contrast the data presented in these chapters with the very different case of long-term stability, with a view toward inquiring into the reasons for rather dramatic differential diachronic outcomes.

The case studies come from very different eras and cultural backgrounds. In Chapter 7 early Aramaic–Arabic language contact is examined. This is reconstructed to extend from before 100 AD up to the early Islamic period and spans an era whose beginnings saw a dominant Aramaic give way to a dominant Arabic. The contact-based influence is diffuse, but significant. The concept of “dia-planar” diffusion is developed to account for the postulated multi-centric contact events which introduced Aramaic into Arabic among disparate populations. Chapters 8 and 9 will turn the focus in the opposite direction, to a “micro” account of significant contact-based influence on the Arabic of the LCA. Two types of contact are examined, the effects on the demonstrative system, and on idiomaticity. In both cases it will be demonstrated how LCA Arabic was drawn into a Lake Chad area Sprachbund in ways which in some cases displaced inherited structures, but in others left inherited structures intact while expanding their functionality.

7 Arabic in contact I: Aramaic

In this chapter, 7.1 gives the basic background to the Aramaic varieties which are relevant to understanding the early contact situation. 7.2 moves on to the linguistic evidence for the contact. In 7.3 the demographic plausibility of extensive contact-based influence is developed in a summary of early Aramaean–Arab social contact, and finally in 7.4 a linguistic model termed “dia-planar diffusion” is presented which models how early contact between the speech communities took place. I would note that, particularly for the linguistic account (Section 7.2), more extensive treatments can be found in Owens (2017 and 2018d).

7.1 The era of equilibrium: Directed dia-planar diffusion: Aramaic–Arabic contact

To begin, a very brief sketch of Aramaic is in order. Aramaic is first attested about 800 BC (Garr 1985: 231; Greenfield 1978: 94). Old Aramaic consists of a mainly epigraphic corpus of about 5,000 words (Degen 1969). This gave way in attested sources to Imperial Aramaic, first used during the Assyrian Empire, then as the dominant language of the Persian Achaemenids. Two of the most important corpora come from this era, Egyptian papyri dating from the fifth century BC and Biblical Aramaic, the latter found in parts of the Books of Daniel and Ezra in the Hebrew Bible. Thereafter a number of varieties develop, collectively known as Middle Aramaic (Boyarin 1981), spoken in the first millennium AD. These include Palestinian Jewish, Christian and Samaritan Aramaic, Palmyrean, Nabataean, Jewish Babylonian, Classical Mandaic and most significantly Syriac. This last variety, which flourished between AD 200–700 and was centered in Edessa (present-day Urfa in southern Anatolia, Turkey), was the language of the early Christian community and later of the Nestorian Christians. Syriac eventually split into an easterly (Nestorian) and westerly (Jacobite/Maronite) tradition. Samaritan Aramaic, a second variety used in this comparison, is best attested from the fourth century AD (Gzella 2017: 277). It appears that its speakers shifted to Arabic around the time of the Arabic-Islamic expansion (Macuch 1982: xxxiv).

Arabic and the Case against Linearity in Historical Linguistics. Jonathan Owens, Oxford University Press. © Jonathan Owens (2023). DOI: 10.1093/oso/9780192867513.003.0007


ARABIC IN CONTACT I: ARAMAIC

As the varietal designations suggest, there never developed in Aramaic a standard Aramaic comparable to Classical Arabic as it emerged in the late eighth century. Imperial Aramaic is the closest, but any uniformity in it is usually explained in terms of koinization rather than formal, planned standardization.¹ Broadly speaking, by the early Christian era there had developed two dialect areas, an eastern one, including Babylonian and Nestorian Syriac, and a western one, including the Palestinian varieties and Nabataean. This dialectal differentiation continues today, with the dialect around Ma’lula in Syria the one surviving member of the western branch, and a number of varieties spoken in Turkey, Iran, Iraq, and eastern Syria continuing older Eastern Aramaic. Turoyo is often considered to be in a class of its own, however (Jastrow 1997). Many of these contemporary varieties have unfortunately slipped into the category of endangered languages, as political instability beginning in the late Ottoman period and continuing to the present day has forced large-scale migration and emigration of Aramaic speakers out of the region (Owens 2006).

In this chapter I rely mainly on a sample of three Aramaic varieties for comparison with Arabic: Biblical Aramaic (Rosenthal 1961), Syriac (Daniels 1997; Kaye 2007; Muraoka 2007; Nöldeke 1880/1904, 1898), and Samaritan Aramaic (Macuch 1982). The choice of these three varieties is motivated by three factors. First, a reasonable number of very good, detailed descriptive studies allow for broad-based comparisons with Arabic. Secondly, the varieties allow divergent diachronic and geographical sampling. Biblical Aramaic is a chronologically older variety, while Samaritan and Syriac represent the emerging West/East geographically based varieties. Thirdly, they are all attested in scripts which allow short vowels to be interpreted (as opposed to e.g. the consonantal scripts of Egyptian Aramaic or Palestinian Jewish Aramaic).
Where appropriate, evidence from other Aramaic varieties will be adduced.

Aramaic contact with Akkadian has been treated in detail (Kaufman 1974), as has Greek and Latin contact with Syriac (Butts 2016), while contact with Hebrew has been intense at various points in Aramaic history and forms a contact backdrop to many varieties of Aramaic (e.g. Macuch 1982: 57; Rosenthal 1961: 9; Segert 1975: 35, 95–96, 103, 1997). Arabic–Aramaic contact on the other hand is either not treated at all, as in Rosenthal 1961 (57–59), restricted mainly to lexicon (Fraenkel 1886; Jeffrey 1938), treated only in an era where Aramaic was well

¹ Huehnergard’s (1995: 273) invocation of Arabic as a parallel for the structural development of Imperial Aramaic is problematic. Invoking a simplification paradigm implicitly based on Ferguson and Fück, Huehnergard suggests that spread among foreign speakers and outside of its original home area led to a uniformization of Aramaic. Among other problems with using Arabic as an analogical model is that a historically linear simplification is not an obvious attribute of Arabic linguistic history. A further issue is whether norms of writing don’t necessarily abstract away from representing existent variation. It is, after all, thanks in part to Sibawaih, a linguist who has no comparable counterpart in the Aramaic tradition, that we discern a variation in Arabic even within the normatizing writing tradition which Sibawaih stood in.

7.1 THE ERA OF EQUILIBRIUM: DIRECTED DIA-PLANAR DIFFUSION


on its way to ceding its place to Arabic as the lingua franca of the Middle East (Behnstedt and Arnold 1993), or treated in such a geo-politically limited manner as to preclude deriving a broad overview of the phenomenon (Neishadt 2015). As will become clear in the course of this chapter, the current treatment views Al-Jallad’s (2020a) summary of Arabic–Aramaic contact as far too dependent on written sources alone. Retsö (2006) speaks of contact (“interference”) “one millennium before the Islamic conquest,” and has a useful summary of evidence of Arabic contact in Koranic Arabic. However, his treatment is largely restricted to the lexicon and, as seen in 4.1.5, one wishes for independent evidence such as relic forms to confirm his proposed temporal sequencing of borrowing. Blau (1985) presents a number of parallel phenomena in Aramaic and in modern Arabic dialects, which he, refreshingly, does not attempt to explain in terms of parallel development. However, he does not develop a systematic framework for explaining the observed similarities, and he appears to limit his observations mainly to contemporary contact. Diem (1979) found evidence for Aramaic influence on Arabic, which will be noted below, though it falls short of a systematic treatment of the subject. Still, the current presentation finds agreement with a number of points in Diem’s article. The current chapter departs from most treatments to date, which are largely restricted to lexical contact, in concentrating exclusively on evidence of structural influence in the realms of phonology, morphophonology, morphology, and syntax.

Before proceeding to the linguistics, it is relevant to contextualize this chapter in two opposing interpretations of early Aramaic–Arabic contact. In one, Retsö (2000) criticizes the thinking behind the reticence to countenance large-scale contact-based influence among the Semitic languages. His position, essentially the one adopted in the current section, is quoted at the start of 2.4.
In support of what might be termed his “permeable” approach to Semitic languages he argues, inter alia, that there has been extensive Aramaic influence on Maghrebinian Arabic going back to the earliest introduction of Arabic into the region (see e.g. 7.2.2, 7.2.6, App. 7.2.5b). Retsö’s position contrasts with that of another well-known Semiticist, Macuch (1991: 969). Macuch also observes parallels between Aramaic and North African Arabic dialects in respect of syllable structure (see 7.2.2), but peremptorily dismisses them, noting that since the phenomenon is found in North African Arabic, “it would be hard to explain it exclusively through Aramaic influence.” Here one observes a widespread pre-theoretical filter that truncates exploration of potentially interesting leads in reconstructing Semitic language history: as far as we know, there were no large-scale Aramaic settlements in North Africa, so any similarities between North African Arabic and Aramaic must be due to independent parallel development. The thrust of this chapter argues against this perspective. Looking specifically at the early Islamic language situation, Retsö (2013/2019: 445) speaks of a “continuum of isoglosses,” which would have been in existence


at the time of the initial Arabic–Islamic expansion. While agreeing broadly with Retsö’s critique, I take a different tack from the continuum model, instead arguing in 7.4 that long-term Arabic–Aramaic contact shows the typical effects of what Dixon (1997) termed an era of equilibrium.

7.2 A sample of potential common Aramaic–Arabic isoglosses

In this section I exemplify instances in which certain varieties of Arabic share unpredictable isoglosses with varieties of Aramaic in order to establish that pre- and early Islamic contact leading to structural change was widespread among some ancestral Arabic varieties. A larger, more detailed exposition of this material can be found in Owens (2018d, as well as 2022), which includes an assessment of where these features stand relative to Semitic in general (2018d: 430–433) and within the large family of Arabic varieties (2018d: 434–445).² I assume these positions in the context of the overall assessment of the role of Aramaic–Arabic contact. A complete list of suggested early Aramaic–Arabic contact-based influence is found in App. 7.2.

7.2.1 Segmental phonology

Varieties of Arabic share special properties of gutturality with Aramaic.

/r/

The first example is the sound represented by Aramaic /r/ (ר). Beginning with Biblical Aramaic, /r/ is described as belonging to the category of sounds which pattern with the guttural sounds, /ʕ, ħ, h/, without actually being of the class of gutturals. The functional unity of the class of gutturals is described below. A typical formulation is “a guttural or /r/” (Nöldeke 1898: 107, “ein Guttural oder r,” also 32, 38; Muraoka 1997: 10; Rosenthal 1961: 16). From the perspective of this formulation, the class is bifurcated, gutturals and /r/, the latter presumably being a trill or flap. An obvious way to explain the functional unity of the class is to assume that /r/ in fact represented a uvular trill (/ʁ/), or the voiced velar fricative /γ/. Etymologically, the velar fricative of proto-Semitic has been lost in Aramaic, having merged with ע (/ʕ/, Moscati et al. 1980: 40; see 7.2.2 below on CyA), so there is phonological space for the present interpretation.³

² In 2.3.3 the status of the FSG nominal suffix -t ~ -ah as a contact-based feature was also mentioned, though whether Aramaic contact was criterial was left open.
³ Khan (1997: 107), writing on Jewish Palestinian Aramaic, a variety contemporary with Samaritan Aramaic, notes that /r/ may have had a uvular pronunciation.

7.2 A SAMPLE OF POTENTIAL COMMON ARAMAIC–ARABIC ISOGLOSSES


Besides the systematic, syllable structure evidence in favor of this, which will be adduced below in 7.2.2, it can be noted that in various contemporary Mesopotamian Arabic dialects, Arabic /r/ has merged with /γ/, in particular Christian and Jewish Arabic dialects. Hence Baghdadi Christian /γaγiib/ “strange,” < /γariib/ (Abu Haidar 1991: 9; also Jastrow 1978: 39). This pronunciation of Arabic /γ/ (ɣain) is attested in Baghdad as early as the ninth century (Blanc 1964: 23). It may be suggested that this reflex in Christian and Jewish Arabic dialects is due to language shift by original Aramaic speakers, imposing their uvular pronunciation of /r/ into their version of Arabic. The uvular trill, /ʁ/, is further attested in urban Moroccan areas, Fez, Tetouan and various Jewish Moroccan dialects (Aguadé 2003: 78–79; Behnstedt 2003: 165; Hachimi 2007: 107). All of these, it should be noted, are associated with an earlier, pre-Hilali migration going back to the initial expansion of Arabs into the region (see Maps 3, 6).

Uvular fricatives

Borg (2007: 539–540) explicitly relates various phenomena found in Cypriot Arabic to an Aramaic substratum. He includes the change of γ > ʕ, and a postulated *x > *ħ. Both of these are highly characteristic of Aramaic, and are otherwise hardly found in Arabic. Likewise, Brockelmann (1908: 121) refers specifically to reflexes of *γ, noting that in Yemen (Dathina, west of Aden) and in Maltese /γ/ is realized as /ʕ/.

Effect on syllable structure

The guttural sounds /ħ, h, ʕ, γ/ have two prominent properties common to BA, Syriac and Samaritan. The first is a general tendency to lower a short high vowel to /a/ in the context of one of these sounds (Nöldeke: 36, 62).

(7.1) BA
      mšabbaħ “praising” vs. mmallil “speaking” (Rosenthal 1961: 17)
      ʕabd-ett “I made” vs. kitb-et “I wrote” (Rosenthal 1961: 46)

The second is in verbs, to insert an /a/ in the context: ħ, h, ʕ, γ_C-V. As will be seen in 7.2.2 below, the normal form of a verb such as yiktbuun “they write” has a sequence of three consonants, the stem vowel being deleted in an open syllable (< yiktub-uun). Where the first or second consonant is a guttural, an /a/ will be inserted after it in these forms:

(7.2) taʕbd-uun → taʕabd-uun “you.M.PL do” (note: first /a/ due to general lowering effect of /ʕ/ noted in previous point) (Rosenthal 1961: 46, 47 BA)

In Arabic gutturals nearly always tend to favor a low /a/ rather than a high vowel. This obtains in Classical Arabic (Kitaab II: 270), as well as in dialects. Thus, in Anaiza Najdi (Saudi Arabia, Johnstone 1967: 6), whereas a low vowel is normally raised in an open syllable, before or after a guttural raising is inhibited.

(7.3) xazan “he stored,” vs. sikan “he lived”

More striking than the lowering effect of the gutturals is a phonetic phenomenon very similar to the Aramaic /a/ insertion, termed the “gahawa” syndrome (a term coined by Haim Blanc). An identical set of gutturals (plus /x/, not found in Aramaic),⁴ /ħ, x, γ, ʕ, h/, induces the insertion of /a/ in the context Cgut C, e.g. aħmar → aħamar ‘red’. The term “gahawa” comes from the varying pronunciation of the word for coffee: gahwa in dialects without the gahawa syndrome, gahawa in those with it. The difference with Aramaic resides only in the distribution of the phenomenon. As described by Rosenthal for BA and Nöldeke for Syriac, in Aramaic it is restricted to imperfect verbs or the 2MSG perfect form. In Arabic it is generalized to all Cgut C contexts, in both verbs and nouns.⁵
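The insertion rule just described is simple enough to state as a short procedure. The Python sketch below is an illustration only, under clearly simplifying assumptions: each character is treated as one segment, the guttural set is the one listed above (with both γ and ɣ accepted as spellings of the same fricative), and the rule applies to any guttural immediately followed by a consonant, which is the context as stated in the text and ignores any further conditions a full description would need. The function name gahawa is introduced here for convenience.

```python
# Gutturals as listed in the text: /ħ, x, γ, ʕ, h/ (γ and ɣ both accepted).
GUTTURALS = set("ħxγɣʕh")
VOWELS = set("aiueo")


def gahawa(word):
    """Insert /a/ after a guttural that is directly followed by a consonant."""
    out = []
    for i, seg in enumerate(word):
        out.append(seg)
        nxt = word[i + 1] if i + 1 < len(word) else None
        if seg in GUTTURALS and nxt is not None and nxt not in VOWELS:
            out.append("a")  # Cgut C -> Cgut a C
    return "".join(out)


if __name__ == "__main__":
    for form in ("gahwa", "aħmar", "xazan"):
        print(form, "->", gahawa(form))
```

Applied to the forms cited above, the procedure turns gahwa into gahawa and aħmar into aħamar, and leaves xazan unchanged because its guttural is followed by a vowel. Applied to the unhyphenated Aramaic input of (7.2), taʕbduun, the same procedure yields taʕabduun, which is the sense in which the two phenomena are formally alike and differ only in their distribution.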

Diphthongs

In West Syriac and in Samaritan Aramaic the diphthongs /*ay/ and /*aw/ show the following phonological alternation (Nöldeke 1898: 34; Macuch 1982: 118):

(7.4) /*ay/, /*aw/ → /e/, /o/ in closed syllables
                   → /ii/, /uu/ in open syllables
      bet “house,” bet-nu “our house,” vs. biit-ak “your.M house”
      yom “day,” vs. yuum-a “its day”

In most varieties of Arabic the diphthongs either maintain their original value, or shift to monophthong /ee/, /oo/.

bayt “house,” yawm “day”: Najdi etc.
beet, yoom: LCA etc.

In one geographically large variety, namely in most North African dialects from Tunisia to Morocco, the diphthongs are unconditionally raised to ii and uu.

biit ‘house’, yuum ‘day’

These forms are identical to the open syllable variant of Syriac and Samaritan Aramaic. In UzA as well /aw/ raises to /uu/ (nawm → nuum ‘sleep’) and /ay/ sometimes raises to /ii/ (Fischer 1961: 234), while AnA also shows evidence of the raising, ɣiira ‘jealousy’ < *ɣayra, fuuq-u ‘on it’ < *fawq (Corriente 1977: 30).
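The comparison between the Aramaic alternation in (7.4) and the Arabic outcomes can be laid out as a small table of reflexes. The Python sketch below is an illustrative restatement of the forms cited above and nothing more: the variety labels, the REFLEXES layout, and the helper names reflex and generalizes_aramaic_open_variant are assumptions made for the example, and the syllable type (open or closed) has to be supplied by the user, since no syllabifier is attempted.

```python
ARAMAIC = "West Syriac / Samaritan Aramaic"

# Reflexes of *ay and *aw by syllable type, as described in the text.
REFLEXES = {
    ARAMAIC: {
        "closed": {"ay": "e", "aw": "o"},
        "open": {"ay": "ii", "aw": "uu"},
    },
    "Najdi Arabic": {
        "closed": {"ay": "ay", "aw": "aw"},
        "open": {"ay": "ay", "aw": "aw"},
    },
    "LCA Arabic": {
        "closed": {"ay": "ee", "aw": "oo"},
        "open": {"ay": "ee", "aw": "oo"},
    },
    "North African Arabic": {
        "closed": {"ay": "ii", "aw": "uu"},
        "open": {"ay": "ii", "aw": "uu"},
    },
}


def reflex(variety, diphthong, syllable):
    """Look up the reflex of 'ay' or 'aw' in an 'open' or 'closed' syllable."""
    return REFLEXES[variety][syllable][diphthong]


def generalizes_aramaic_open_variant(variety):
    """True if the variety has the Aramaic open-syllable value everywhere."""
    target = REFLEXES[ARAMAIC]["open"]
    return all(REFLEXES[variety][s] == target for s in ("closed", "open"))


if __name__ == "__main__":
    for name in REFLEXES:
        print(name, generalizes_aramaic_open_variant(name))
```

Of the four varieties entered here only North African Arabic returns True: it shows the Aramaic open-syllable values /ii/, /uu/ without the syllabic conditioning, which is what a generalization of (7.4) would look like.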

⁴ The allophonic variant –Vk > Vx found in BA and Syriac apparently does not induce guttural /a/ in the context –VkC.
⁵ See Edzard 1998: 32. An interesting point about this Aramaic–Arabic isogloss is that “gahawa” dialects (i.e. those with the post-guttural insertion rule) are prototypically thought of as Bedouin (Fischer and Jastrow 1980: 109). These include Najdi Arabic, Sinai Arabic, Eastern Libyan Arabic, and WSA. I would note here that associating a bedouin feature with that of a classical language such as written Syriac will be met with skepticism on cultural grounds (see discussion in 5.2.2).


In Eastern Syriac the alternation was ay/aw in open syllables, ee/oo in closed. In the Qalamuun region in central Syria as well as in NW Syria (Behnstedt 1997: 1002; Behnstedt and Arnold 1993: 68–69) an identical distribution is found:

beet/bayt-u “house, his house”

In the first case, North African Arabic raising of diphthongs to /ii/, /uu/, it is reasonable to see a generalization of (7.4) effected by a substrate of Aramaic speakers. In the case of dialects in the Qalamuun area a contact explanation is very plausible.

Pronominal –h

In 5.3.2.3 it was pointed out that a range of dialects exhibit a basic variation between C-h ~ C-V, where h/V is the initial segment of an object pronoun that begins with h-.

(7.5a) beet-on ‘their house’ (Damascus)
       abuu-hon ‘their father’

In fact, this alternation is well attested in Aramaic as well. In Biblical Aramaic, for instance (Rosenthal 1961: 20), the 3MSG is –hiy after long vowels, otherwise –eh. In Syriac (Muraoka 1997: 137) the 3M and 3F singular forms have –eeh/aah after C-, -y/h after a vowel.

(7.5b) melk-aah ‘her king’ vs. abuu-h ‘her father’

In Samaritan Aramaic the initial –h has been lost altogether (Macuch 1982: 132, 300).

(7.5c) ʕabd-on ‘their servant’

There is a high degree of likelihood that this feature was transferred from Aramaic to Arabic in pre-Islamic times. It spread in Arabic during the era of punctuation.
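The shared alternation in (7.5a) and (7.5b) amounts to choosing a suffix shape according to whether the stem ends in a consonant or a vowel. A minimal Python sketch follows; the suffix shapes are those cited above for Damascus Arabic and Syriac, whereas the table layout, the person labels, and the function name attach are hypothetical conveniences, and only the two paradigm cells exemplified in the text are entered.

```python
VOWELS = set("aiueo")

# Suffix allomorphs after a consonant-final vs. a vowel-final stem.
SUFFIXES = {
    ("Damascus Arabic", "3PL"): {"after_C": "on", "after_V": "hon"},
    ("Syriac", "3FSG"): {"after_C": "aah", "after_V": "h"},
}


def attach(stem, variety, person):
    """Attach the pronominal suffix allomorph selected by the stem's final segment."""
    forms = SUFFIXES[(variety, person)]
    key = "after_V" if stem[-1] in VOWELS else "after_C"
    return f"{stem}-{forms[key]}"


if __name__ == "__main__":
    print(attach("beet", "Damascus Arabic", "3PL"))
    print(attach("abuu", "Damascus Arabic", "3PL"))
    print(attach("melk", "Syriac", "3FSG"))
    print(attach("abuu", "Syriac", "3FSG"))
```

The sketch reproduces beet-on, abuu-hon, melk-aah, and abuu-h. In both languages /h/ surfaces only after a vowel-final stem, although the rest of the suffix differs.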

7.2.2 Syllable structure

I have treated this topic in detail in Owens (2017) and therefore will be brief. Syllable structure was already introduced in the context of genetic classification in 2.3.2. Varieties of Arabic and Aramaic share a complex syllable structure rule based on two simple premises: unstressed short vowels in open syllables are deleted, and sequences of CCC/CCCC need to be broken up by an epenthetic vowel. Simple deletion can be exemplified in the basic perfect verb.

(7.6) Baghdadi    BA
      kítab       kttább       ‘he wrote’
      kitb-at     kittb-att    ‘she wrote’


In the Baghdadi Arabic form, stress protects deletion in an open syllable (see 7.2.3.1 below). In the Biblical Aramaic form presumably the first syllable was unstressed, leaving the vowel in an unprotected open syllable (< ∗ kittabb). In the 3FSG form, the suffix –at/att creates an open syllable (boldface), ∗ kitab-at/kittabb-at, which is disallowed and deleted, giving the structurally identical results. Deletion and repair. The output of a deletion can create a CCC sequence. This CCC sequence needs to be broken up by an epenthetic vowel, which in Baghdadi and in Syriac is placed between before the penultimate consonant. (7.7a)

Baghdadi
yiktib-uun → yiktb-uun (deletion of short unstressed /i/ in open syllable)
→ yikitb-uun (insertion of epenthetic vowel to break up CCC)

A parallel epenthesis applies in Syriac (repeating (1.11) above), although the condition on epenthesis appears to be CCCC, except in the context Cgut CC, as described in 7.2.1. (7.7b)

Syriac
1. ne-θdkar-aak → (cf. ʔettkar ‘he remembered’; delete short vowel in open syllable)
2. ne-θdkr-aak → repair CC_CC sequence
3. ne-θdakr-aak (→ neddakraak via assimilation)
‘He shall remember you.M’ (Syriac, Muraoka 1997: 15)

There are a number of intricacies to these rules that are discussed in detail in Owens (2017). For instance, as seen in (7.2) above, whether or not an epenthetic vowel is inserted in Biblical Aramaic depends in part on whether or not a guttural consonant occurs. In Syriac it appears that the rule kicks in automatically when a sequence of four consonants, CCCC, emerges after short vowel deletion. The overall argument, however, is that rules of such specificity, particularly the deletion and repair schema, are rare among the world’s languages, and, that Baghdadi and some northern Syrian Arabic dialects (Behnstedt 1997: 147, 149, 269) are among those Arabic varieties which have them. Again these are areas where Aramaic contact is seen to be highly likely.
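The deletion and repair schema can be made concrete with a small computational sketch. The Python fragment below is an illustration only, not an analysis taken from the sources cited. It assumes a simplified transliteration in which long vowels are written doubled, counts a short vowel as standing in an open syllable when it is followed by exactly one consonant plus a vowel, leaves word-final vowels alone, and takes the position of the stressed vowel as given rather than computing it. The function names (segments, delete_open_short, repair_clusters, derive) are hypothetical. The repair step follows the Baghdadi pattern in (7.7a), inserting the epenthetic vowel before the penultimate consonant of a CCC sequence; the Syriac CCCC condition and the guttural contexts are not modeled.

```python
VOWELS = set("aiu")


def segments(word):
    """Split a transliterated word into segments; doubled vowels count as one long vowel."""
    word = word.replace("-", "")  # drop morpheme boundaries
    segs, i = [], 0
    while i < len(word):
        if word[i] in VOWELS and word[i + 1:i + 2] == word[i]:
            segs.append(word[i] * 2)  # long vowel, e.g. "uu"
            i += 2
        else:
            segs.append(word[i])
            i += 1
    return segs


def is_vowel(seg):
    return seg[0] in VOWELS


def delete_open_short(segs, stress):
    """Drop unstressed short vowels that are followed by exactly one consonant plus a vowel."""
    out, nucleus = [], -1
    for k, seg in enumerate(segs):
        if is_vowel(seg):
            nucleus += 1  # index of this vowel among the word's vowels
            in_open_syllable = (
                k + 2 < len(segs)
                and not is_vowel(segs[k + 1])
                and is_vowel(segs[k + 2])
            )
            if len(seg) == 1 and in_open_syllable and nucleus != stress:
                continue  # deletion (assumption: all candidates are tested on the input form)
        out.append(seg)
    return out


def repair_clusters(segs, epenthetic="i"):
    """Insert an epenthetic vowel before the penultimate consonant of any run of 3+ consonants."""
    out, i = [], 0
    while i < len(segs):
        if is_vowel(segs[i]):
            out.append(segs[i])
            i += 1
            continue
        j = i
        while j < len(segs) and not is_vowel(segs[j]):
            j += 1
        run = segs[i:j]
        if len(run) >= 3:
            run = run[:-2] + [epenthetic] + run[-2:]
        out.extend(run)
        i = j
    return out


def derive(word, stress):
    """Apply deletion, then repair. `stress` is the 0-based index of the stressed vowel."""
    return "".join(repair_clusters(delete_open_short(segments(word), stress)))


print(derive("yiktib-uun", stress=2))  # yikitbuun
print(derive("kitab-at", stress=0))    # kitbat
print(derive("xaabar-a", stress=1))    # xaabara (stressed /a/ is protected)
print(derive("xaabar-i", stress=0))    # xaabri
```

The last two calls differ only in where stress is assumed to fall: when the short vowel in the open syllable carries stress it survives, and when it does not it is deleted, which is the stress protection effect taken up in 7.2.3.1.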

7.2.3 Morphophonology

7.2.3.1 Stress protection for short vowels in open syllables

As seen in 7.2.2, short vowels in open syllables are normally deleted in Aramaic and in some varieties of Arabic. An important exception to this is when a suffix is added to an imperfect verb, or when a possessive pronoun suffix is added to a noun. In these cases the short vowel in the open syllable is stressed, protecting it from deletion (see 7.2.3.1 above). (7.8)

BA⁶ akkúl-i “eat-FSG!,” šbúq-u ‘leave-PL!’

(Rosenthal 1961: 18)

The F. suffix –i induces stress on the preceding open syllable, preventing deletion of the vowel. The situation in Syriac is not entirely clear. In Samaritan Macuch (1982: 122) explains forms such as qaaṭáal-at as due to a reconstructed stress on a short vowel, as in BAr, ∗ qatál-at which was subsequently lengthened. While noting that the evidence is not direct, Knudsen (2015: 173) suggests that a similar condition applies to Syriac. In Arabic stress protection of short vowels in open syllables is found in the presence of an object suffix. In Baghdadi Arabic, for example, which as seen in 7.2.2 categorically does not allow short vowels in open syllables, before an object suffix stress is attracted to the syllable before the suffix, thereby protecting a short /a/. (7.9)

xaabár-a ‘he telephoned him’ (not ∗xaabr-a)
darrás-ak ‘he taught you.M’

This situation is thus parallel to Aramaic, except that in Arabic protection of a short vowel in an open syllable is limited to object suffixes.⁷ (7.10)

xaabr-i ‘you-F telephone’ (imperative)

7.2.3.2 1SG stress

In Biblical Aramaic the first person singular suffix –í is always stressed (Rosenthal 1961: 18). This is a morphological peculiarity of this form, as there is no phonological context to sanction its stress. Other –V-initial object suffixes do not bear stress. (7.11)

melk-í ‘my king’

Among Arabic dialects there exists a long string of dialects with unique stress on this suffix, and no other object suffix, stretching from southern Jordan (e.g. Bduul, Owens and Bani Yasin 1984), across the northern Sinai (de Jong 2000: 164, 282, 368, 675 [maps]) and into the eastern Delta (Jawf), then reappearing in the Baggara dialect of the western Sudan (Manfredi 2010: 67), Chadian⁸ and Nigerian Arabic (see 5.3.2.2). It is relevant to note that the Bduul traditionally lived in Petra, which in its day was the Nabataean capital. I would note here that it is important to distinguish the exceptional Aramaic 1SG stress from the Biblical Hebrew stress rule which under appropriate conditions can stress most pronoun object suffixes (Greenberg 1965: 65; Joüon and Muraoka 2005: 62, 105, 286). It is this confluence of specificity between Aramaic and some varieties of Arabic which suggests the imprint of Aramaic influence rather than a general NW Semitic inheritance.

⁶ Rosenthal explains the penultimate stress here as due to the effect of adding a pronominal suffix. However, as seen in (7.1, 7.6), the pronominal suffixes –et 1SG and –at 3FSG induce vowel deletion in the preceding open syllable, according to general rule. It rather appears that the person object suffixes on verbs are lexical exceptions in drawing stress rather than deletion.

⁷ As a morphophonological rule. There are dialects (Cairene) in which a penultimate syllable before CC- is stressed phonologically, e.g. yiktúb-u ‘they write.’ In Cilician (Procházka 2002: 97–106) both object and subject suffixes attract stress in the imperfect verb.

7.2.4 The active participle

The Arabic AP shares not only formal but also significant functional similarities with various varieties of Aramaic.

7.2.4.1 The active participle as verbal predicate

Perhaps the most distinctive feature of Aramaic morphosyntax is the development of a set of verbal predicates which are marked for person, number, and gender out of active participles which originally were marked only for number and gender (see 11.3.2). In the modern Aramaic dialects these originally participial forms have wholly (in all NENA dialects) or partially (in modern western Aramaic and in modern Mandaic) replaced the earlier prefix and suffix conjugations (imperfect and perfect verb, respectively). By Biblical Aramaic times this development was already well under way. (7.12)

la hašt-iin aniħnah
not need.AP-PL we
‘we don’t need’ Daniel 3.16

Detailed studies as to the function of the active participle as verbal predicate remain to be undertaken, though broad functions have been identified. Rosenthal (1961: 55) for instance notes “immediate present” or “continuous and habitual action.” A notable development in Samaritan Aramaic (Macuch 1982: 118, 204; 1991: 972) occurs where the second person object suffixes assume personal status as subject suffixes on active participles, as in (7.13)

qaaʔeem-ek ‘you have stood up, lit. standing up-your’

⁸ The Sinai dialects as well as Chadian and Baggara, but not Nigerian Arabic, furthermore stress the 1SG verbal object suffix.

In Arabic the status of the active participle as a member of the tense/aspect system has been underappreciated (see Eisele 1999; Owens and Yavrumyan 2007, for overview). Essentially the active participle in spoken Arabic is a third form along with the perfect and imperfect tenses, with a clearly profiled aspectual function whose unified meaning is to indicate an action relevant to a given point in a narrative, or lacking this, to the time of speaking. Often it is equivalent to the English have perfective. It is typical to find three-way contrasts of the following type (Emirati Arabic): (7.14)

hu yalas ‘he sat down’ (perfect)
hu y-iilis ‘he is sitting down right now’ (is in process of sitting down, imperfect)
hu yaalis ‘he is seated/has sat down’ (AP)

The verbal function of the active participle is so uniform across all dialects (only Maltese lacking it) that it clearly derives from a common source. It is in Arabic and Aramaic that the active participle plays a central role in the verb systems of Semitic languages.

7.2.4.2 Person-marked participle

The person-marked participle occurs only in Uzbekistan and Afghanistan Arabic. In an incisive, though incomplete, interpretation of the development of this form, Windfuhr (2005) notes that this construction, which he terms the “perfect,” goes back to the original active participle. In particular, he suggests that this construction could be a calque on a Kurdish perfective, formed, as in Sulaimani Kurdish, by a participle + agent + patient construction. He places the following forms side by side (2005: 118), for “I/you have hit them” (“N” added by me): (7.15)

                     participle   N    Agent   Patient
Sulaimani Kurdish    xward-uu     Ø    t       in
Uzbekistan Arabic    zorb         in   ak      um

Windfuhr’s contribution is to have indicated that the construction is to be understood as a calque on a system already functioning in a co-territorial language. But a more direct, and in the context of the current exposition, profound association can be found in Aramaic, not Kurdish. The key element is that Aramaic and Arabic already share a ‘verbal’ function of what is historically an active participle form, so that all bilingual speakers had to do was to transfer the Aramaic idea of marking person on the participial form to a set of forms amenable to this representation in Arabic. Moreover, as noted above, a formal “hint” to use the object pronouns as a subject marker potentially existed in the construction noted in Samaritan Aramaic, whereby the object markers on the active participle form assumed subject marking status.


Note that from a larger areal perspective Windfuhr’s observation can be expanded in the direction of a Sprachbund going back into the first part of the millennium. As has been demonstrated above, both Arabic and Aramaic share a verbal paradigm where the participle forms a part of the paradigm. Aramaic in general has expanded this function more than Arabic, though Uzbekistan is one variety of Arabic which has ‘calqued onto’ the Aramaic construction formally. Unless it can be shown that Kurdish otherwise had a specific influence on Uzbekistan Arabic (as opposed to the well-established Persian/Tajik areal influence), it makes much more sense both linguistically and historically to attribute the specific influence to Aramaic speakers. The larger issue raised by Windfuhr’s observations, however, is whether the expanded verbal function of the participle in Aramaic wasn’t due to very early contact with Iranic or other languages.

7.2.4.3 Development of finite conjugation based on active participle in Central Asian Arabic

As noted above, already by Official Aramaic times the participle was well on its way to integration into the verbal system of Aramaic. In Aramaic this integration had far-reaching morphological consequences. In all varieties an inflected personal form is attested, which goes back historically to the passive participle (C(V)CiiC). (7.16)

kttiibb-att “it.F was written”

(Rosenthal 1961: 64)

In Syriac and Samaritan Aramaic person marking expands to the active participle as well. In Samaritan the active participle may be marked by original object suffixes (see [193]), while in Syriac encliticization of original independent pronouns begins. As was illustrated above in 5.2.1 (5.16), the Uzbekistan AP paradigm has developed its own person markers. In the first and second person forms the object suffixes were refunctionalized as subject markers. Uzbekistan Arabic is one of those dialects where an object suffix is marked by the intrusive –n-, so in the first and second persons we get: (7.17)

Uzbekistan perfect
zoorb-in-ni “I have hit”          zoorb-in-na “we …”
zoorb-inn-ak “You.M have hit”     zoorb-in-kum “you.M.PL …”
zoorb-inn-ik “You.F have hit”     zoorb-in-kin “you.F.PL …”

Note that this subject marking with what is historically a direct object pronoun parallels Samaritan Aramaic above (7.13). In the third person the original participle stem stands, but as the person and number suffixes are used exclusively in the third person, they take on personal value. (7.18)

zoorib ‘he has hit’       zoorb-iin ‘they.M have hit’
zoorb-a ‘she has hit’     zoorb-aat ‘they.F have hit’⁹


In the case of the third person there is no direct analogue in Aramaic that I am aware of. However, there is a close parallel in the contemporary spoken varieties, where, as noted above, personal forms have developed out of forms which, like Uzbekistan Arabic, were originally non-personal. In the Neo-Aramaic dialects the new finite forms developed via encliticization of formerly independent pronouns. The following, for instance, is from Turoyo in Anatolia (Jastrow 1997: 367). (7.19)

qayim-no ‘I stood up’ < ∗qaayim uno ‘having stood up I’
qayim-it ‘you.M stood up’ < ∗qaayim hat ‘having stood up you’

When these forms arose in Aramaic is an unresolved question. While they traditionally are ascribed to the “neo” phase of Aramaic, in recent years there is some acceptance of the fact that contemporary diversity may reflect much older diversity (Khan 2007b: 8).¹⁰ In this instance then, the argument is one of calquing. Aramaic had developed a way of marking person on the active participle, involving pronouns, whether suffix pronouns of SamAram or the encliticization of independent pronouns. In Central Asian Arabic this basic format was applied to the Arabic active participle, matching varieties of Aramaic with UzA. As seen in the previous point, Arabic and Aramaic share the basic verbal value of the participle, so no verbal refunctionalization was necessary.

7.2.5 Differential object marking (DOM)

In Aramaic definite nominal direct objects are optionally marked by the preposition l-, which otherwise indicates a benefactive or indirect object argument (Macuch 1982; Muraoka 1997: 77; Nöldeke 1898: 168, 218; Rosenthal 1961: 56). This “differential object marking” occurs in two formats. In one (7.21a) an anticipatory (proleptic or cataphoric) pronoun is attached to the verb and the noun object is marked by the preposition, while in the other (7.21b) no anticipatory pronoun occurs. The examples are from Syriac (Contini 1999; Fischer 1907: 181; Khan 1988: 108, 130; Nöldeke 1898: 168, 218). (7.20a)

bnaa bayt-aa → bnaa-hy l-bayt-aa
built house-DEF    built-it.M to-house-DEF
‘He built the house > he built it the house’

⁹ Cf. here the active participle ‘having hit’ in a “normal” Arabic dialect like Baghdadi Arabic, ḍaarib “having hit.M” (unspecified for person), ḍaarb-a “having hit.F,” ḍaarb-iin “having hit.PL.” One can say, for instance, ana ḍaarib/inta ḍaarib, huwa ḍaarib … “I have hit/you.M/he has hit.”

¹⁰ Khan (2007: 8) writes of the origins of the contemporary eastern Aramaic dialects, “neither the Jewish nor the Christian spoken NENA dialects appear to be direct descendants of the earlier literary forms of Aramaic such as Babylonian Talmudic Aramaic and Syriac.” He goes on to note that these first millennium AD written varieties co-existed with a widely diverse spectrum of spoken varieties.


(7.20b)

bnaa l-bayt-aa
built to-house-DEF
‘He built the house.’

Contini¹¹ (1999, also Diem 1979: 47) notes that an analogous construction occurs in Lebanon, greater Syria and in Baghdadi Arabic. (7.21a)

šuf-t-a li j-jaahal
saw-I-him to DEF-child
‘I saw the child.’

(7.21b)

šuf-it li j-jaahal
saw-I to DEF-child
‘I saw the child’

Variants of this construction are found in Andalusian, Maltese, Cypriot Arabic, and Central Asian Arabic where the direct object is marked by the preposition l-, but no anticipatory pronoun object on the verb occurs, i.e. the model is (7.21b). In Andalusian Arabic it generalizes to both definite and indefinite direct objects (Corriente 2007: 110, Heine and Kuteva 2005: 152). In the Afghanistan offshoot of Uzbekistan Arabic Ingham (2006: 34) reports the object must be animate. (7.22)

ray-t lil-hom
saw-I to-them
‘It was them I saw’ (Maltese, Borg and Azzopardi 1997: 136)

(7.23)
leel li hat qatal-u
night to him killed-him
‘He killed him in the night’ (Uzbekistan, Fischer 1961: 263)

(7.24)
pi-vaddi l-exl-u
IND-send to-parents-his
‘He sends his parents’ (CyA, Borg, 1985: 138)

Rubin (2005: 107) exemplifies the danger of not casting one’s comparative net far enough when evaluating this feature. While noting the construction in Maltese he adopts Borg and Mifsud’s (2002, not seen by me) explanation in terms of a Romance substratum. The presence of the construction in Central Asian Arabic as well as in Cypriot Arabic provides strong evidence for the classic diffusionist explanation of its expansion. While so-called differential object marking is found in a number of languages throughout the world (Lucas and Manfredi 2020), in Arabic it is (1) only found in a discontinuous set of dialects. (2) It has two manifestations, one with cataphoric pronoun, one without, but both of these correspond to constructions identical to Aramaic (e.g. Syriac). (3) The areas where the differential object marking occurs are also areas where past Aramaic–Arabic contact is plausible, in particular Iraq and the Levant. (4) Some areas where the differential object marking without a cataphoric pronoun as in (7.22, 7.23) is found do not obviously have a contemporary Aramaic substrate, such as Uzbekistan and Malta. However, these regions are precisely those which were settled in early diasporic movements. Either the differential marking was brought with the original population movement, and/or it was maintained through a strong Aramaic substratal population in the immigrant community. One of the regions where the differential marking is attested, Cyprus, is an area where on independent grounds Borg (2007) discerns strong substrate Aramaic influence. Explaining the differential object marking construction in Arabic therefore is plausible on three grounds. First, given its dispersed distribution, and its general absence in most Arabic dialects, it is probably not an inherited proto-feature. Secondly, where it is attested it allows a plausible historical-demographic connection either between Aramaic speakers and Arabic speakers, or to Arabic speakers who became part of the immediate Arabic-Islamic diaspora. Thirdly, invoking Lass’ principle, P2, the likelihood of multiple independent development is vanishingly small.¹² Further exemplification of possible morphological and syntactic influence is found in App. 7.2.5b, and in App. 7.2.5a specific arguments by Pat-El and Stokes (2022) in favor of independent parallel development are argued against.

¹¹ Contini (1999: 111) appears to view the l-marked object constructions without cataphoric pronoun as being a more recent independent development in Lebanese Arabic, but still with an Aramaic substrate influence.

7.2.6 What didn’t happen

It would take up too much space to indicate for each of the features summarized here, where these proposed contact-induced changes did not occur (see Owens 2018d: 434–443 for this perspective). This chapter makes the basic argument for widespread Aramaic influence. In some cases its effects are, today, limited. Guttural /r/ (7.2.1) for instance is restricted to areas of Iraq and Morocco (Fez). Others are very widespread, for instance the deletion and repair (7.2.2) schema is found in many dialects, though to a different degree and with different manifestations in different dialects. In some (most Maghrebi dialects) it is all-pervasive, while in others it is much more limited or even non-existent. Following each feature in its historical development remains an outstanding task.

¹² Souag (2017) discusses clitic (pronoun) doubling in Arabic, and concludes that cases, some of which are identical to those considered here, are due to parallel independent development. His treatment is structural, not historical however and it ignores morphemic identity (see Owens 2018d: 424 n. 135).


7.3 Arabs and Aramaeans: The socio-cultural basis of diffusion

Before moving to a concluding linguistic assessment of these facts, it is relevant to outline in some detail the history of Aramaean–Arab contact. Without a social and cultural meeting there clearly could not have been such widespread, and in some cases, structurally deep-seated borrowing as here. From the time of the earliest attestations of Aramaic in the eighth century BC, Aramaeans and Arabs have been in close contact. Aramaeans themselves first appear in history as nomads on the northern fringes of the Assyrian empire in the Syrian desert and in Mesopotamia. Lipiński (2000: 38) infers their historical attestation by 1800 BC in the designation “Sutaeans.” These were a nomadic group that frequented the Middle Euphrates and Syria, who by the eleventh century BC had become synonymous with the Aramaeans. Beginning around 1300 BC they become a significant threat to the Assyrian empire (Lipiński 2000: 38), and by the ninth century BC some of them had taken up a sedentary life. A number of Aramaean kingdoms dominated by different tribes developed along the middle and upper Euphrates and into Anatolia, Syria, and Lebanon, as well as in southern Babylonia. Many of these kingdoms were the object of various attacks by Assyrian kings, accounts of which are a main basis of our knowledge of their existence. No Aramaean kingdom ever achieved widespread political dominance in the region. However, Aramaic itself did become the major native language of much of the Middle East for over 1,000 years. Gzella (2017: 84) suggests for instance that Aramaic was dominant in Babylonia by 612 BC. It is assumed in this book that its demographic dominance was coupled with an L2 vernacular status (see Goldenberg 2013: 12; Gzella 2017: 111) which would have facilitated its spread via contact into other languages. During Persian Achaemenid rule (ca.
550–350 BC) a variety was used for official correspondence, hence its attestation as far as Egypt. In his summary, Lipiński (2000) puts considerable emphasis on the extent, both geographical and chronological, to which Arabs and Aramaeans have lived in a close and apparently largely non-antagonistic relationship. In the Aramaean centers of power located in Syria and along the Euphrates, Aramaeans and Arabs commingled beginning 1000 BC. Lipiński for instance describes Laqee, an area around present-day Deer iz-Zoor in eastern Syria on the Euphrates River, as “a rather loose confederation of North-Arabian¹³ and Aramaean sheikhs” (2000: 101), and later as a “mixed Aramaean-Arabian confederation” (495). He similarly describes Aramaeans and Arab tribes living together in the ninth century BC south of the Diyala River, i.e. in the area of present-day Baghdad. Particularly vivid evidence for Aramaean–Arab interaction comes from the records of a campaign by the Babylonian king Tiglath-pileser III, who in 735 BC conducted a campaign against the Chaldeans, who were a dominant group in southern Babylonia, and who had a significant, if not dominant, Aramaean ethnic composition (Lipiński 2000: 416–422, also Eph’al 1974). The document mentions 35 tribes who were subjugated in the attack, and of these, Lipiński on the basis of the tribal names suggests that nearly half were either Arabs or had Arabic clans in them. The ħiḍḍaar tribe, for instance, was said to contain four groups. Lipiński (2000: 457) suggests that two of these were Aramaic, two Arabic. Under one tribal umbrella “the various groups forming the tribe may have spoken two different languages, respectively Aramaic and Arabic.” Summarizing the situation he states,

the global history of these Aramaeans in the 8th–7th centuries B.C. can hardly be separated from the history of the North-Arabian tribes living in the same regions and called “Aramaeans” in Assyrian sources that barely and only exceptionally distinguish the two groups. (485)

¹³ Lipiński alternates between the designations “North Arabians” and “Arabs,” though inasfar as etymological origins allow, via for instance tribal and place names, evidence points to Arabs, i.e. speakers of the Arabic language.

What little can be reconstructed of social life among the Aramaeans and Arabs in this era further allows us to assume a close relationship between the two groups. Originally they, like Arabs, were nomadic (Segert 1975: 35), and while Aramaeans developed an urban culture, they continued nomadism probably throughout the history of the Arab–Islamic expansion. The centralized states which they did develop were politically weak. The close relationship between the Aramaeans and Arabs continues to be attested up to the Arab-Islamic expansion. Retsö (2003) in his compendious summary of pre-Islamic Arabs documents this affinity in a number of places. Most striking in this respect is the somewhat enigmatic Nabataean culture which began emerging in the Horan (northern Jordan, southern Syria) in 312 BC. Its heyday saw its center in Petra in southern Jordan, between ca. 170 BC and AD 100 from where it dominated western Jordan and the northern Hijaz, southern Syria and the Negev, with its interests stretching to present-day Gaza. The Nabataean script is well attested in the Sinai through numerous graffiti. The Nabataeans have been problematic for cultural historians of the Middle East. They wrote in Aramaic, yet they produced in Nemara in southern Syria the earliest Arabic text, in Aramaic (Nabataean) script. They were connected in contemporary sources to Arabs, but were not considered to be Arabs (Retsö 2003: 381). Different ethnic identities have been attributed to them. For Cantineau (1930: 9) they are Arabs. Against this, Starcky (1955: 87) suggests they were originally Arabs who gave up their language in favor of the Aramaic-speaking peoples they came in contact with and among whom they settled. In fact, it is more likely that they are comprised of a supra-ethnic identity, as with the Chaldeans described briefly above, composed of Aramaic and Arabic speakers. Writing about northern Syria and Mesopotamia, the Greek geographer Strabo (d. 24 CE), quoting the geographer Posidonius who wrote and described conditions in the first half of the first century BCE (ca. 80 BCE), notes that “the Armenians, Syrians [= Aramaeans, arimaioi] and aráboii betray a close affinity, not only in their language, but in their mode of life and bodily build, and particularly wherever they live as close neighbors” (Retsö 2003: 352). Eratosthenes (d. 195 BC) wrote that Nabataeans lived in Arab lands. Diodorus writing around 50 BC notes that they were originally nomads (Hoyland 2001: 70–72, Hoyland 2004). It appears a moot point whether or not Nabataeans were Arabs. Their original lifestyle was nomadic, they continued the Aramaic–Arab symbiosis described above, and it appears they lived in a bilingual Arabic–Nabataean (Aramaic) speech community. In Retsö’s words (2003: 379) “the Arabo-Nabataean kingdom inherited a multi-ethnic and multilinguistic territory from the very beginning.” Especially interesting is the report of large numbers of Arabs in the region of Edessa in southern Turkey between 0 and AD 110 (Retsö 2003: 412, 434). It will be recalled that Edessa emerged around this time as the center of Christian Syriac culture. While Retsö does not consider the short-lived Palmyran kingdom around AD 270 to be Arab, he does note the presence of a large number of Arab names in the Aramaic inscriptions, and Abbott (1941: 13) claims Palmyra’s queen Zenobia for the Arabs. Wilmsen (2014: 130–147) as well emphasizes the pre-Islamic presence of Arabs in the Levant. Retsö’s broad documentation of a well-attested pre-Islamic Arab presence throughout the Middle East in what today are Iraq, southern Turkey, Syria, Lebanon, and Jordan, is usefully juxtaposed with Cantineau’s summary of Aramaic speakers in the Middle East in 150 CE.
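Dans la région qui s’étend entre la Méditerranée et le bord du plateau iranien, nous trouvons donc constituée vers – 150, au lieu de la mosaïque linguistique qui existait auparavant, un ensemble cohérent de parlers araméens. [In the region stretching between the Mediterranean and the edge of the Iranian plateau, we thus find established around – 150, in place of the linguistic mosaic which existed previously, a coherent set of Aramaic dialects.] (1930: 11)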

In short, the argument for a long-lived and socially intense period of Aramaic–Arabic contact comes from two directions. On the one hand, beyond the Arabian peninsula populations of Arabs are well attested in pre-Islamic times in the Levant, Mesopotamia, and into southern Turkey. On the other, the historically attested spread of Aramaic is all but co-extensive with the Middle East itself: the Levant and Mesopotamia into Turkey, the northern Najd, the Persian Gulf between the fourth and tenth centuries (al-Thani 2014: 23) including present-day Qatar, Kuwait, the UAE and the NE Saudi Arabian littoral (Holes 2016: 12; Kozah et al. 2014; Macdonald et al. 2017), the southern Arabian peninsula (Butts to appear: 10–12) and


even where there is no archaeological trace, as in Yemen, arguments from reconstruction allow it to be entertained in other places as well. The socio-geographic overlap between Aramaic and Arabic is far greater than customarily assumed. Admittedly some scholars viewed the pre-Islamic populations of the Middle East in more dichotomous terms, at least linguistically. Donner (1981: 95) notes on the one hand that “Aramaic speaking populations of Syria … culturally had more in common with the tribal society of the Arabian peninsula than they did with the settled communities of Syria.” He similarly notes that Aramaic was even more dominant in Iraq than it was in Syria (1981: 171). On the other hand, he speaks of the Nabataeans simply as being “Arab Nabataea” (1981: 95), and when he speaks of language he conceives of the population in Syria as speaking either Aramaic or Arabic (1981: 117) with Aramaic apportioned to the west, Arabic to the east. Some scholars apportion the two languages bimodally. “The Nabataeans thus seem to have used Arabic for oral communication, religious practice and cultural traditions, and Aramaic purely as a written language for representation, formal and legal agreements and contact with other Aramaic-speaking groups.” (Gzella 2017: 308; see also Al-Jallad 2020a: 43). It is, however, inherently unlikely that all individuals simply spoke either one language or another or that Aramaic was sealed in a written mode, the more so given the admittedly scant but still quite interesting specimens such as the Raqash inscription discussed below. A lingua franca, as Aramaic was in the region, by definition implies bilingualism, and given the close cultural affinity between Arabs and Aramaeans, it can be assumed that there was also a linguistic affinity marked by bilingualism, Aramaic the dominant lingua franca (see e.g. Hajayneh 2009). 
It is equally unlikely that even in the immediate pre-Islamic era the two populations simply separated into discrete language areas. Unfortunately, eyewitness accounts to this in the sixth to seventh centuries are entirely lacking. Small, but dramatic ones do exist, however, for instance in the remarkable “Raqash” (dedicatee of inscription) Nabataean inscription from the northern Hijaz at Hegra, NW of Medina, dated to AD 267 (discussed by Cantineau 1930/1935 and re-evaluated by O’Conner, 1986: 221–227). The four-line inscription consists of lexical and structural elements from both languages, a demonstrative from Aramaic, for instance, and prepositions from Arabic. O’Conner terms the text a “polyglot puzzle.” From a contemporary perspective, it appears to fall either within the typological bounds of codeswitching or of a mixed language.¹⁴ More cannot be said, beyond the key point that only a population deeply bilingual in Arabic and Aramaic could have produced it.¹⁵

¹⁴ For one more short example, see the inscription from Sakaakaa, also in NW Saudi Arabia (Nehmé 2010).

¹⁵ The text is not the earliest attestation of codeswitching/mixed language. Gibson (1982: 78 ff.) describes a Phoenician–Aramaic mixed text from Arslan Tash in northern Syria, dating to the seventh


Throughout the attested history of Aramaean–Assyrian relations, therefore, Arabs are consistently depicted as, or can be inferred to have been culturally and socially close to Aramaeans, often living with them in the same tribal affiliation. Closer to Islamic times, detailed information about the displacement of Aramaic in favor of Arabic may never be forthcoming. While Knudsen (2015: 21) suggests that by the tenth century Arabic was clearly the dominant spoken language, Hoyland (2001: 50) cautions that Aramaic was still a vibrant language into the thirteenth century. Arabic dominance certainly would have begun earlier from region to region. Thus Griffith (1997) documents the gradual shift in Palestinian monasteries between 500 and 800 CE from Greek and Aramaic to Greek and Arabic as liturgical languages, and from Aramaic to Arabic in the general populace. He notes (1997: 20) that in immediate pre-Islamic times the monks in these monasteries spoke the languages of the general populace, which were Aramaic and Arabic. By the eighth century Christian Palestinian Aramaic had nearly died out in favor of Arabic (1997: 27). That Aramaic speakers were present even in the most intimate Islamic circles is shown by Gilliot (2003) who documents references to Christians and Jews among the Prophet Muhammad’s intellectual entourage, some of whom most likely were Aramaic speakers, for instance Zayd ibn Thabit himself, the last secretary of the Prophet. For evidence of contact among the diaspora populations one would need to determine the ethnic and linguistic makeup of the “Syrians” who constituted large contingents of soldiers and immigrants to newly conquered lands. 
Importantly, Donner (1981: 249) notes that after the Islamic conquest of Syria “relatively few tribesmen [from the Arabian peninsula, jo] migrated there after the conquest.” Moreover, the early Islamic armies were not large—Donner (1981) for instance speaks of 20,000 soldiers at the battle of Yarmouk in 636—so any sizeable Aramaic-speaking contingent would have been significant. In Egypt and North Africa these early conquering Arabic armies typically established themselves in newly-founded communities—Qayrawaan in Tunisia (founded 670), and Tangier (708), Fes (789), and Sijilmassa (758) in Morocco (Aguadé 2018: 38–41)—where an Aramaic influence in the diaspora could have been maintained. For the Levant, whatever the ethno-linguistic situation was before the conquest would have been maintained in its immediate aftermath.

century BC. His summary is interesting: “It is difficult to avoid the conclusion that the writers intentionally mix Phoen. and Aramaic in order to impart a magical flavor to their texts and thus increase the potency of the incantations … both the language and the orthography are artificial creations specially concocted for this genre of writing.” Looking beyond Gibson’s speculative attribution of magical intent, the use of more than one language in an incantation could well have been directed toward the two ethno-linguistic communities (Arabic and Aramaean in the case of Raqaash, Phoenician and Aramaic in the case of Arslan Tash) which constituted the societies. Informing this mixing, however, was certainly a real sociolinguistic code, not an artificial creation. Macdonald (2000) similarly speaks of Safaeo-Arabic and Dadano-Arabic, besides Nabataeo-Arabic and one Aramaeo-Arabic text from the present-day United Arab Emirates.


There is ample evidence for the important role of Syrian “Qaysites” in diaspora populations. Writing about Egypt, Kubiak (1987: 82), for instance, reports that Qaysites were settled in the eastern Delta, and the importation in 727 of a large contingent of “Qaysi” from the Syrian desert to Upper Egypt is also reported (Lewis 1970: 176). An important jumping-off point for the conquest of North Africa developed in Fustat (Cairo), founded in 641. While the early population of this city was very mixed, it is clear that a substantial part of the population came from Syria and Iraq (Kubiak 1987: 79, 83), areas where Aramaic would still have been widespread. Kubiak (1987: 83) notes that “The immigration from Syria must have been considerable, since under the Umayyad Caliphs close contact between the two provinces was maintained.” Since Fustat residential districts tended to be defined by tribal affiliations, the urban linguistic ecology would have favored the maintenance of minority languages in these parts of the city. The district of Al-Hamra al-Wusta, for instance, was a Syro-Byzantine stronghold (Kubiak 1987: 100). This brief survey shows that the intimate contact between Aramaic and Arabic, attested as early as Aramaeans and Arabs themselves are identified in written sources, did not abruptly come to an end in 622 with the coming of Islam. Rather the transition from Aramaic to Arabic lingua francahood occurred gradually over the period 600–900, during which time Aramaic continued to be widely spoken, even by non-native speakers, not only in the Middle East, but also in the emerging centers of the Arabic–Islamic diaspora.¹⁶

7.4 Dia-planar diffusion

As illustrative material a series of features have been described which are common to the well-attested Middle Aramaic varieties and Biblical Aramaic and to some Arabic dialects, in most cases a minority or even a very tiny minority of them.¹⁷ The features moreover are phonologically or morphologically specific ones. This is evident from the fact that for every proposed case of contact cited, there also occur forms and varieties which do not match the Aramaic. When specific traits are shared with some varieties of another language, but not all varieties, and when it is known that the languages were long in contact with one another, the suspicion of contact-induced change is inevitable.

¹⁶ As one observes in the transition out of one dominant colonial language into global English dominance today. In Alexandria, Egypt, for instance, French continued to be used as a lingua franca among an old expatriate community and beyond until very recently, but has now given way to English as the language of wider communication.

¹⁷ Though it needs to be cautioned that what is “small” in an Arabic context may be rather large against other measures. CyA is definitely small both in terms of population and geographical area. Maltese is small by geographical area, but with over 300,000 speakers would stack up fairly well among the world’s languages in terms of speakers (the median number of speakers across the world’s languages being around 5–10,000).


Beyond issues of cultural and academic traditions, I think there are two reasons early systematic contacts between Aramaic and Arabic have been neglected. (1) First, for any individual case, alternative explanations are available. This was seen in the case of the Maltese il-DO marker (7.2.5), where either a universal, parallel independent development or a Romance-specific alternative was preferred to the idea of the I/I + D model developed here. (2) Secondly, the distribution of possible influences is spread over a large area of the Arabic world, much of which is outside the purview of direct Aramaic contact. However, all historical linguistic interpretation is evidential. If Contini and Macuch see parallel independent development in play (see above), then evidence needs to be adduced explaining why the features cannot be explained via single innovation plus spread (Lass’ principle). In this section I will present a systematic account of how pre- and early-Islamic contact between Aramaic and Arabic can be conceptualized, and how it helps to tie together diverse strands of influence. I term the contact “dia-planar diffusion.” It uses a wave model of diffusion, but adds a social basis, namely the observation from the previous section that groups of Aramaic and Arabic speakers, often small groups, were in close contact with one another over long periods of time. There were many local encounters. This can be represented as in Table 7.1, where each “Aram–Ar” token represents groups of speakers in contact, at different times. To underline the dispersed nature of the contact, for purposes of illustration it is assumed that among some communities there was always contact between the two groups (first row), whereas in other cases the contact was less lasting (e.g. third, fourth row).

Table 7.1 Aramaic–Arabic dia-planar diffusion

T1 T2 T3 T4

Aram–Ar Aram–Ar

Aram–Ar Aram–Ar

Aram–Ar Aram–Ar Aram–Ar

Aram–Ar

Aram–Ar

Aram–Ar

…T = time, rows = dispersed geographical points of contact

In these groups, Aramaic would have constituted the language of wider communication, as it was the dominant language in the Middle East up until the Arabic–Islamic expansion. In general it would have been the Arabic speakers who would have been bilingual in Aramaic. These served as the locus of the diffusion of Aramaic traits into Arabic. It may be assumed that throughout the (at least) 1,600 years (900 BC–AD 700) many encounters and different linguistic outcomes resulted, many of which probably eventually disappeared with no trace.¹⁸ Those which did become established enough to be transmitted into the present day were the result of local events. The geographical plane of contact was large and politically decentralized, so no standard set of diffused features resulted. At different periods different populations would have been at the locus of contact, a fact represented by the gaps at different times. We thus observe much linguistic contact evidence dispersed throughout the present-day Arabic-speaking world. In Owens (2018d: 460–464) the features treated in 7.2 (and others) are further apportioned into three “dia-planes.” Bundles of the contact-induced change are sequenced relative to one another into three temporal-spatial eras. For instance, features 7.2.1 (guttural /r/), 7.2.3.1 and App. 7.2.5a, b., are all suggested to have entered Arabic somewhere between AD 100 and AD 600, roughly the Middle Aramaic period, termed “dia-plane 2.” The deletion and repair schema, on the other hand (7.2.2), on the basis of its very wide extension, is postulated to have entered Arabic earlier, in what is termed dia-plane 3, sometime before AD 100, and similarly the AP as a full-fledged verbal paradigm member, as well as the intrusive –n discussed in 5.2.1. The person-inflected participle (7.2.4.2/3) attested only in Uzbekistan is considered to belong to dia-plane 1, which begins with the Islamic era. The diffusion, moreover, largely stopped with the Arabic ascendancy. By AD 800 Arabic had replaced Aramaic as the language of wider communication in the Middle East, and with Islam, the direction of change would have been for Aramaic speakers to switch to Arabic.¹⁹ In any case, once in the local varieties, the original Aramaic features were now part of different varieties of Arabic. Here they began to undergo further permutations and spread, as Arabic groups diffused linguistic traits amongst themselves.
Once direct object marking with l- entered the language (7.2.5), it took on a life of its own, and could be transmitted by its speakers independent of the presence of any Aramaic speakers amongst them. Two features (5.3.2.2/3) included in Chapter 5 quite possibly originated in Aramaic contact. The overall result was to produce unequivocal, but widely distributed linguistic traits which today we can ultimately trace back to Aramaic. The model is circular: it explains the current distribution of Aramaic features in Arabic by the fact that it is plausible to see them as original Aramaic features. By the same token, the major argument in support of the model is Occam’s razor.

¹⁸ Cf. e.g. the Raqash inscription referred to above which is mixed Aramaic-Arabic, probably representing the tip of what was a bilingual iceberg.

¹⁹ Linguists are far more reluctant to intrude into interpretations of history than historians are to use a truncated, as often as not overly simplified interpretation of “Classical” Arabic in their own historiography of Arabs and Islam. However, it is relevant in this instance to cite Mackintosh-Smith’s (2019: 187) speculation that one reason for the rapid success of the Arab-Islamic expansion around 622 was the presence of an implicit “fifth column” among the large Aramaic population in the lands contested during the early Arab expansion. The evidence from language contact supports the supposition that the two peoples/languages were marked more by amity than enmity.


Similar or nearly identical features do not have to be accounted for twice. Lass’ principle, this time invoked across languages, argues for the model. It offers a simple conceptualization, which intuitively squares with the situation as can be documented today. Moreover, I think it infinitely better than traditional accounts which effectively reject diffusion and argue instead for massive independent parallel development, as with Macuch (1991) cited above. Effectively this is the theoretical linguistic choice that needs to be made: either there was a common innovation which then spread to four or five or six different locations via I/I + D as can be ascertained today, or the same phenomenon originated independently in the “same” language, at multiple points in history. As the discussion in 5.2.2 makes clear, the onus of proof is on those advocating such a vast amount of independent parallel development in two populations which have been in close contact for well over 1,000 years.

8 Morphosyntax as an adaptive mechanism I: Idioms

Before beginning the linguistic exposition a brief summary is in order describing how today’s Arabic speakers reached the Lake Chad area, since this population will be the focus of both this and the next chapter (see 6.6 for a brief introduction). It is useful to situate LCA (Nigerian Arabic) among the major migratory movements which led to Arabic spreading out of the Middle East, what in Dixon’s (1997, see 2.5, 7.1) terms is known as a punctuation phase of language history. As far as Arabic goes, the era of punctuation has left in its wake contemporary populations which are historically multilayered and often multilingual. Three significant examples are the following.

1. Western North Africa (Maghreb) is defined by two main Arab migrations. The earlier, pre-Hilali migration, which first arrived around 700, was mainly urban and did not lead to large-scale Arabicization of the region. The second, known as Hilali after the eponymous Bani Hilal from Upper Egypt, began in the eleventh century and brought a different dialectal layer to the region. Pre-Hilali and Hilali continue to be defining dialectal constructs today (Aguadé 2018: 36–39; see Maps 3, 8).

2. Punctuation has left behind Sprachinseln as the Arab expansion receded or where formerly continuous populations were interrupted (see 5.6) by newer language migrations. Notable among these are Uzbekistan Arabic, Maltese, and Cypriot Arabic (Map 4).

3. Punctuation has extended chains of Arabic speakers into areas otherwise dominated by speakers of other languages. The Arabic of Anatolia is one such extension of an Iraqi/Mesopotamian variety into a region dominated at various times and places by Aramaic, Armenian, Kurdish, or Turkish. A second is the expansion of Arabs out of Upper (southern) Egypt into the Sudanic region, ending their expansion in the Lake Chad region, the westernmost point being NE Nigeria, contemporary Borno. It is the historical linguistics of this variety that is treated here. Historically this movement shares with the Bani Hilal migrations an origin in Upper Egypt, though occurring 150–200 years afterward (see Map 8).

Arabic and the Case against Linearity in Historical Linguistics. Jonathan Owens, Oxford University Press. © Jonathan Owens (2023). DOI: 10.1093/oso/9780192867513.003.0008


Map 8 Early migrations into Egypt and migrations from Egypt into Africa

As far as LCA goes, therefore, the immediate starting-off point was Egypt. Egypt was conquered by the Arab-Islamic armies between 630 and 641, 641 being a convenient terminus a quo for the initial, partial Arabicization of Egypt. In the beginning Arabs constituted a distinct minority, and as seen in 4.3, in Umayyad Egypt Arabic itself shared official functions with Greek and Coptic. The settlement of Egypt by Arabs extended over a relatively long period of time, in large part between 630 and 1100, though the bulk of the Arab migrations were probably over by 900. The original Arabs came as members of the military which conquered the country as far as Aswan by 641. Fustat (Fusṭaaṭ), at the location of present-day Cairo, was founded in 641 as the country’s administrative center and served as a staging point for the further conquest of North Africa (see 7.3 for more background).

Arabic tribes in the pre- and early Islamic era are frequently grouped under two large categories, a northern and eastern Qays and a southern and western Yemenite. These two groups in turn can be broadly associated with the Classical division, descendants of Ismael and Qaħṭaan. In all areas of the Arabic diasporic expansion which accompanied Islam these two broad groups are mentioned. This designation is geo-cultural, and to date no attempt has been made to associate them with a well-reconstructed dialectal affiliation. Upper Egypt reflects this division, with tribes arriving both directly from the Arabian peninsula and the Levant, as well as from internal migration out of northern Egypt. As conquerors, Arabs enjoyed special favor and monetary support from the central government until 831, at which point the burden of supporting a force to protect a pacified country was deemed too onerous. Loss of their privileged status led the nomadic Arabs to revolt against the Abbasid government in 831. The uprising was put down, and many of the Arabs moved into Upper Egypt where central control was more tenuous than in the north. This is the region from which ancestral Nigerian Arabic/LCA migrated. Here the Arabs formed a minority (Garçin 1976: 360), and the historical sources speak of an unruly minority up until the time they moved into the Sudanic region. Uprisings against the central government are recorded in AD 1252, 1290, 1301, 1313, 1342, 1365. An important trend was the gradual extension of Arabs south of Aswan into the Sudan. In 630 Aswan marked the boundary between an Islamic Egypt and a Christian Nubia. This boundary basically held until the thirteenth century, when Arab tribes threatened Nubia to an ever increasing degree. Without going into the relatively well-documented details, Table 8.1 gives a general overview of the earliest year individual tribes arrived in Egypt and their probable or alleged origin in the Arabian peninsula or the Levant (based particularly on MacMichael 1922 [1967]: 133–148 and other sources: Brett 1978a: 502, 1978b: 620; Fage 1978; Holt et al. 1970; Garçin 1976: 48–57; Behnstedt and Woidich 2018: 81–84). All of these are attested in Upper Egypt by the twelfth century, i.e. before the major migration toward the Lake Chad area. The table ignores significant re-migrations from North Africa, some of which tribes themselves had passed out of the Middle East via Egypt (Behnstedt and Woidich 2018: 85–88). It needs to be recalled that the tribes themselves often had a pre-diasporic history of migration within the Middle East both in pre-Islamic and early Islamic times, which may explain why the “Yemeni” Tayy, for instance, migrated into Egypt from the Levant.

Table 8.1 Arabic tribes in Egypt

Qaysites (Ismael)
Qays (727) Upper Najd (Arabian peninsula)
Kenana (818) Hijaz
Fezara (647) Hijaz
Banu Hilal, Banu Sulaym (1000) Baħrain, Najd
Rabiiʕa, Banu Kanz (854) Tihama, Hijaz, Baħrain

Yemenites (Qaħṭaan)
Juhayna (647) Hijaz
Tayy (1050) Syria
Baali (650) Syria
Juđam (650) northern Hijaz
Lakhm (800) Iraq–Arabia borderᵃ

ᵃ Behnstedt and Woidich (2018: 67–68) term the Juđam, Bali and Lakhm ‘pseudo-southern Arabs’ (pseudo-Yemenites), groups who fictionalized a Yemeni heritage in the wake of larger inter-tribal enmities. No comparative linguistic work has been done which might elucidate the import of such interpretations.

Anticipating Chapters 10 and 11, while it will probably be impossible to define a specific Emirati/Hijazi–Upper Egypt–LCA connection, it is clear that LCA is linked indirectly via Upper Egypt to an Arabic homeland in the Middle East. This includes links via tribes which nominally at least are associated with either the Hijaz or the Persian Gulf. Historical sources do specifically mention Sulaym and Banu Hilal migrations from Baħrain, i.e. the Gulf area, to Upper Egypt (Garçin 1976: 74 n. 2). Equally, the Juhayna Arabs, associated in early Islamic times with the Hijaz, are said to have formed a large portion of the Arab migrants into the Lake Chad area (Thomas 1959). The Arabic tribes in Upper Egypt were historically a group difficult for the central government to control,¹ so as early as 1200 it became official policy to encourage the tribes to migrate into the northern Sudan, then ruled by a Nubian kingdom. In 1276, during Mameluke rule (1250–1517), a major Egyptian expedition was sent against the Nubian kingdom, and thereafter Egyptian influence became ever stronger until, in 1316, Nubia itself was conquered. This left the pathway to Lake Chad open, and a famous letter written by the Mai of Kanem-Bornu of the Lake Chad region to the Mameluke Sultan Barquq in 1391, complaining that Arabs were devastating his kingdom through their slaving activities (Garçin 1976: 380), confirms the Egyptian origin of the ancestral Nigerian Arabs (see MacMichael 1967: 275; Thomas 1959: 144; Zeltner 1970; Braukämper 1993 for Arab transition from Egypt to the Sudan, Chad and Nigeria).

LCA belongs to a geographically large expanse of Arabic speakers within what is termed the western Sudanic region (WSA) stretching from Kordofan in the east, passing through Chad and Cameroon into NE Nigeria (LCA) in the west (Hassan and Owens 2008, see Maps 3, 8). This Western Sudanic Arabic has a number of uniform distinguishing traits, as has been noted at various points in this book (see e.g. 5.1), the dialectal unity mirroring a cultural unity based on cattle nomadism, the so-called “baggaara belt” (Braukämper 1993).
This dialectal uniformity is explicable if the linguistic features characterizing them (see extensive discussion in 5.1 for instance) were already in place in Upper Egypt before the migrations in the Sudanic region occurred, or alternatively occurred at the beginning of these expansions into the Sudanic area at the end of the thirteenth century. By the same token, it should be kept in mind that dialectal diversity was also present in these migrating populations, as discussed in some detail in 6.5. With this historical background to LCA, I now turn to two significant ways in which the Arabic of the Lake Chad area has been profoundly influenced by other languages in the region. In the rest of this chapter I will discuss idioms, while in Chapter 9 I consider demonstratives. In both cases, it should be emphasized, the basis of the analysis is an empirical corpus-based study. The corpus, whose quantified results were already met with in 6.8, consists of about 400,000 words of oral Arabic which were collected from a range of dialectal and genre-settings.

¹ And indeed the Bedouins in all parts of Egypt; see Petry 2001 for their unruly reputation during late Mameluke times.


Traditionally idioms are either a part of descriptive lexicography or theoretical cognitive linguistics, rarely being invoked in comparative historical linguistic studies. In this chapter, therefore, it is relevant to spend much more time than in any other chapter defining the conceptualization of the category which is the basis of the comparative research. For this, two interpretive models of idioms are defined and compared in 8.1–8.4. It is argued that a lexical model of idiomatization, which roughly aligns with a model of psycholinguistic processing developed by Glucksberg (2001) and co-workers, best characterizes idioms and is most insightful for understanding the rather dramatic historical linguistic changes documented in this chapter.

8.1 Idioms

Idioms have been largely the domain of cognitive linguistics (CL), associated in particular with the work of George Lakoff. They have rarely been treated in detail from the perspective of Arabic or African linguistics. In this chapter, however, idiomaticity in LCA is the focus, a focus which leads both to African languages and to the Arabic dialects. Moreover, idiomaticity in LCA is analyzed within a lexicalist framework deriving indirectly from a psycholinguistic rather than CL framework.² I treat idioms in LCA from three perspectives. I first discuss how I understand idioms (8.2), locating the discussion at the intersection of a cognitive linguistics (CL) and a lexically based treatment. The latter, in broad conceptual outline but not in precise operational implementation, follows the psycholinguistic work of Glucksberg (e.g. 2001) and his associates. I then turn to a description of LCA idioms, as embedded in the lexical framework (8.3–8.5). A number of key descriptive constructs will be introduced here, including a contextualized sense taxonomy. This will show that LCA idioms are entirely adventitious upon basic morphosyntactic and discourse properties of LCA. Idioms are morphosyntactically undifferentiated from non-idiomatic expressions. Where they are markedly different is their reduced referentiality. I illustrate this extensively with a discussion of idiomaticity in LCA, adducing as one important piece of evidence the behavior of idioms in a large linguistic corpus. I then turn to what distinguishes LCA idioms from idiomaticity in other varieties of Arabic (8.6, 8.7). In 8.8 I relate the discussion to universal properties of idioms. This is the historical linguistic analysis. It will be shown that LCA idiomaticity is largely indistinguishable from that of other LCALs, most prominently Kanuri. Equally, LCA differs dramatically from its “original” (see Chapter 9) lexical source, Egyptian Arabic. Even more dramatically, Egyptian Arabic idiomaticity is shown to share fundamental similarities with Southern Tunisian Arabic (STA). That is, neither geographical nor chronological separation between EA and LCA accounts for the differences between them. STA is equally distant from EA, chronologically even further removed in fact (see Map 8), yet the similarities between them are striking.

² This chapter is based on Owens 2014a, 2015, 2016, 2020; Owens and Dodsworth 2017; and Benmimoun et al. 2017.

8.2 Idiomaticity

In this sub-section I discuss the nature of idiomaticity.

8.2.1 Idioms and online processing

With one significant exception, discussed in 8.2.2 below, the analysis of idioms has largely been assimilated to a broader theory of metaphor. Thus, while Lakoff makes reference to idioms in his work (e.g. 1980: 46), he does not present a systematic analysis of them, and often what he terms “metaphoric expressions,” “linguistic expressions,” or simply “expressions” (1980: 7, 14, 54) could equally be analyzed as idioms.³ Lakoff (1987: 446–453) sets the tone for later treatments of idioms in CL with the assumption that they are licensed by conceptual metaphors. ‘Spill the beans’ is interpretable in the context of the metaphors THE MIND IS A CONTAINER and IDEAS ARE ENTITIES, for instance. Against what he terms “traditional” treatments which saw idioms as totally arbitrary, Lakoff emphasizes that the meaning of the idiom is motivated by the link provided by the metaphor. It is these metaphors which sanction the mapping of elements from source to target domain. ‘Beans’ in the literal, source domain are mapped on to ‘information’ or ‘ideas’ in the target. While it is not clear whether Lakoff would view all non-frozen idioms in this light, his perspective on the structure of idioms is followed in a number of later treatments. Clausner and Croft (1997; also Gibbs 1992, 1993), who draw out parallels between basic CL concepts such as entrenchment, productivity, and schematicity in semantic and morphological processes, situate idioms on a three-point scale of figurative language defined by metaphor, idiom and frozen idiom. The one extreme is populated by non-productive idioms such as “kick the bucket,” while the other is filled by the conceptual metaphor, which itself can have graded degrees of productivity. The middle category, the idiom, exemplified by “spill the beans,” still involves a source-target mapping embedded in a conceptual metaphor (e.g. THE MIND IS A CONTAINER), but it is semantically and collocationally fixed. In Clausner and Croft’s terms, such ‘transparent idioms’ (1997: 225) are semi-productive because the metaphor which motivates them sanctions only a limited number of idioms (‘spill the beans, let the cat out of the bag, blow the lid off a matter’).

³ For instance, the metaphorical expressions ‘You’re running out of time,’ ‘He’s living on borrowed time,’ ‘Is it worth your while’ (Lakoff and Johnson 1980: 8).

Nunberg, Sag, and Wasow’s study of idioms (1994) looks at idioms from a general linguistic, syntactic, and cognitive linguistic perspective. From a general linguistic perspective they offer a broad characterization of the properties of idioms. They are collocationally fixed and collocationally unpredictable. Other key properties of idioms are their opacity, compositionality, and conventionality (1994: 498). Opacity measures the disjunction between the idiomatic meaning and “the meaning we would predict for the collocation if we were to consult only the rules that determine the meanings of the constituents in isolation.” Compositionality refers to the individual lexemic senses of keyword + collocate accessed in order to compose a given idiomatic meaning. Compositionality is a key issue and will be elaborated on in detail in 8.2.4. It would appear that by compositionality Nunberg et al. intend compositionality relative to the literal meanings of the constituent lexemes. This indeed is the basis of their second parameter, opacity. In this sense, ‘spill the beans’ is noncompositional, since ‘spill – the – beans’ does not obviously compose to ‘reveal secret.’ Clearly, however, compositionality could imply figurative meanings of the constituent keywords. If ‘beans’ in ‘spill the beans’ accesses a non-literal extension of ‘beans,’ say ‘secrets’ or ‘small, relatively undetectable items’ or the like, the idiom comes closer to being compositional.
As will be seen in detail later, as soon as one departs from the literal meaning of a lexeme, the constructs opacity and compositionality beg the question of the extent to which idiomatic meanings themselves presuppose a structured representation of the constituting lexemes.⁴ While Nunberg et al. do not attempt to embed their study explicitly in a CL framework, they appear to take for granted that idioms involve systematic mappings of the same order as metaphors. In this they are broadly in keeping with Lakoff’s treatment.

⁴ Taken at face value, opacity and compositionality imply the need to form some sort of baseline of expectations for what idiom collocates call up for native speakers. Note, however, that Nunberg et al. do not themselves propose a baseline of expectation even for their study of American English, instead relying on their own intuitions in their analysis. The difficulties of establishing a baseline of expectations for languages with a cultural background quite different from English are all the greater.


To say that an idiom is an idiomatically combining expression is to say that the conventional mapping from literal to idiomatic interpretation is homomorphic with respect to certain properties of the interpretations of the idiom’s components. (1994: 504)

‘Spill the beans’ is interpreted as a mapping from a literal source to an abstract target, ‘beans’ mapped on to ‘important information,’ ‘spill’ on to ‘reveal.’ One possible nuance differentiating Nunberg et al. from Lakoff is that the predicate ‘homomorphic’ is stative, versus Lakoff’s processual ‘mapping,’ leaving open the possibility that the idiom is in some sense a fixed entity. I return to this general point in 8.3 below. In all three approaches⁵ idioms are sanctioned to one degree or another in that they instantiate a general conceptual metaphor. Probably because of this assumed association between metaphor and idiom, the interpretation of idioms is described in terms of the same online mapping procedures which characterize conceptual metaphors. I will term this the ‘online mapping’ approach to idiom interpretation.

8.2.2 Two alternative approaches

Rather than view idioms as constructs embedded in metaphor theory, two alternative and not necessarily mutually exclusive approaches can be described here. In 8.2.2.1 I briefly outline a lexical approach, which will be argued for and described in detail in 8.3–8.5, and in 8.2.2.2 a psycholinguistic approach, from which the current approach draws.

8.2.2.1 A lexical approach A different approach to idioms can be termed the lexical. Briefly, this sees idioms as elements stored in the lexicon in a two-tier structure which will be outlined below, with the lexical collocates comprising the idiom realizing different senses which are called up only in the context of a given idiom. This idea is based on three elements: the work of Riemer (2005) and others, the psycholinguistic tradition of metaphor treatment which was particularly active between 1985 and 2005 associated with the work of Glucksberg and colleagues discussed in 8.2.2.2 below, and a detailed consideration of idiomaticity in LCA, which will represent the most detailed treatment in this chapter.

⁵ As well as other work on the topic. Goossens (2002: 364) in his analysis of idioms such as ‘catch someone’s ear,’ for instance, speaks of a metonymy in a metaphor. In his analysis, ‘ear’ undergoes metonymic extension, to ‘attention,’ and the entire phrase is given a metaphoric interpretation ‘get someone’s ear.’ A structural account in Horn (2003) will not be treated here. Svanlund (2007) develops the idea of “lexical metaphors” (see criticisms in Owens 2016: 67–68).

8.2 IDIOMATICITY


Riemer (2005) does not treat idioms as such. He is, however, concerned with describing how lexical polysemy arises. In his model, a property of a lexical item can be singled out for its metaphoric or metonymic value and become conventionalized in a new sense in the lexeme.⁶ Riemer (2005) recognizes a core meaning of a lexeme, and develops further meanings via metonymic or metaphoric interpretation. Analyzing the Warlpiri verb pakarni, he starts from what he terms “prototypical centres” (2005: 327), also termed “core meaning” (2005: 345), here a meaning of ‘hit’ or ‘hit with an object such as a hand.’ Pakarni has a number of further meanings, including ‘kill, pierce, paint, perform dance ceremony.’ Each of these meanings is derived via a metaphoric or metonymic application. The meanings ‘kill’ and ‘perform a dance,’ for instance, are both seen as effect metonymies: killing is a causal metonymy from hitting, as is ‘performing a dance ceremony,’ since this involves hitting feet or instruments against the ground. Briefly, the crucial point in Riemer’s model is that metonymic and metaphoric extensions of meaning are conventionalized in lexemes. Thus ‘dance’ is not derived via a “metonymic processing procedure” each time it is used in discourse (e.g. in the sense of Searle 1980), but rather is stored in the lexical item ‘pakarni.’ The contrast to the “online mapping model” is obvious. If the meaning ‘dance’ is invoked, speakers of Warlpiri call up ‘pakarni’ directly, without recreating online the procedures by which pakarni came to represent ‘dance.’

8.2.2.2 A psycholinguistic alternative If one considers an idiom to be a ‘long word’ (Swinney and Cutler 1979), idioms are simply lexical items of a particular sort. Alternatively, individual lexical components of an idiom could be conceived of as lexical metaphors, calling up other elements of an idiom (Svanlund 2007). There are, however, both methodological and conceptual problems with either approach. The idea that idioms are ‘long words’ (a term coined by Glucksberg 1993, I believe) was proposed by Swinney and Cutler (1979: 528), who argue that idioms in the mental lexicon “are stored and accessed as individual lexical items.” In a number of articles, Glucksberg and colleagues (Glucksberg 2001: chapter 5; Cacciari and Tabossi 1988; Glucksberg et al. 1993) partly substantiate this conceptualization of idioms, but also raise problems on the basis of their extensive psycholinguistic experiments based on reaction time (RT) tests. In particular, they show that even if, say, ‘spill the beans’ has the meaning of ‘reveal,’ in recognition-based RT tests it does not behave like a single word in that it lacks the gating properties of individual words.⁷ Moreover, idioms have an internal syntactic structure which can be manipulated without changing the idiomatic meaning (It’s no use denying it; the beans have been spilled).⁸ From these observations Glucksberg argues that idioms have a dual structure: they have both an idiomatic or figurative aspect, hence their meaning, but they maintain their literal meaning as well, which is what is accessed when they are, for instance, manipulated syntactically. Idioms are, simultaneously, ‘long words’ and individual literal, ‘normal’ words.⁹

Glucksberg’s work serves three purposes here. First, in a discussion which has largely remained internal to psycholinguistics, it refines Swinney and Cutler’s idea of idioms. Secondly, it focuses the discussion on the lexical nature of idioms. Idioms in Glucksberg’s conception have a dual status. One is a traditional conception of individual words in an idiom as single lexemes. The other is the idea that the idiom itself has the status of a ‘long word,’ this being a metaphorical way of conceptualizing the semantics of idioms, but a metaphor with a lexical basis. The third relevant aspect of Glucksberg’s work requires invoking a discussion which goes beyond his immediate treatment of idioms. This aspect is not germane to the immediate interpretation of the lexical nature of idioms, but rather to the general question of how metaphorical language is interpreted. In his approach to metaphor, Glucksberg is critical of Lakoff’s (and CL’s) conceptualization of metaphor in general. Rather than see metaphors as defined by a systematic mapping between source and target domain instantiating a ‘conceptual metaphor,’ Glucksberg explains the interpretation of metaphors by what he terms “property attribution.” As McGlone (1996: 457) explains, the metaphoricity of ‘Our marriage was a rollercoaster ride’ is interpreted by matching ‘our marriage’ with the situations which typify a rollercoaster ride (exciting, full of ups and downs, or scary, for instance). One of these properties is attributed to ‘marriage.’ This is an online, as it were situational, approach, which, importantly for the present discussion, is not embedded in a conceptual metaphor, such as LOVE IS A JOURNEY. The relevance of this third aspect of Glucksberg’s work is that, against most CL treatments (see 8.2.1 above), idioms are potentially divorced from an embedding in conceptual metaphors, because conceptual metaphors themselves are postulated not to exist.

⁶ Similar to Evans (2009: 166) who suggests that polysemy can arise when “situated implicatures associated with a particular context can become reanalysed as distinct sense units,” without, however, developing the idea in detail. The historical development of polysemy based on metonymy and metaphor extension is described in Geeraerts (1997); see also Robert 2008 and Enfield 2002.

⁷ I.e. recognition of the idiom does not follow incrementally as with single words, but rather is dependent on the identification of a ‘key word,’ which may come anywhere in the idiomatic string.

⁸ Glucksberg (2001: 73–75), similarly to most detailed treatments of idioms, distinguishes four classes of idioms, non-compositional (‘by and large’), partially compositional (‘kick the bucket’), fully compositional (‘let the cat out of the bag,’ ‘spill the beans’), and quasi-metaphorical, which appear to be what in the CL tradition are considered figurative expressions instantiating conceptual metaphors. Glucksberg’s analysis pertains largely to the fully compositional idioms, and this seems to be the category analogous to the LCA idioms treated here.

⁹ The meaning of these long words in Glucksberg’s work is ‘stipulated’ (2001: 78). Importantly, against my own proposed treatment in 8.5 below, the stipulation is ‘non-compositional’ in that the meaning of the idiom does not follow from that of the individual constitutive lexemes.


In this book I use this third aspect of Glucksberg’s work instrumentally and selectively. That is, since my own treatment does not see a place for conceptual metaphor in the interpretation of idioms, I am invoking an intellectual tradition which sees the world in a similar way. By the same token, my approach is selective. I will not suggest that idioms are interpreted via online metaphorical processing at all, and hence my approach cannot be directly derived from a property attribution approach, though I will return to this idea in 8.4.¹⁰ To sum up this discussion of Glucksberg et al., idioms are interpreted as it were from the ground up, on a lexical basis. Their abstractness, their figurativeness is defined by a lexical stipulation as a special type of ‘long word’ meaning. Because Glucksberg does not recognize conceptual metaphors, idioms cannot be said to be embedded within them.

8.2.3 The case for the lexical basis of idiom interpretation In this section I would like to develop the argument that idioms have a lexical basis. This argument has six aspects (presented in 8.2.3). All will rely on data from Arabic, to a large degree from LCA, hence in 8.2.3.1 and 8.2.3.2 background information on this source will be given. In 8.2.4 arguments for the identity of the nouns constituent of the idioms with their non-idiomatic counterparts are given. In 8.3 discourse contextuality is defined in terms of distributed polysemy. In 8.4, on the other hand, the argument for the lexically based uniqueness of idiomatic nouns is introduced. 8.4–8.5 develop a model for representing a lexically based idiomaticity. In 8.6–8.7 a comparative, historical linguistic perspective will be further adduced to support this position.

8.2.3.1 The data, what are idioms? Nearly all of the data which I will adduce comes from a large corpus of LCA (Nigerian Arabic) idioms, which in turn is based on a 400,000 word corpus of spoken LCA texts (see Owens 2014a; Owens and Hassan 2011-present). The corpus has been supplemented with extensive work with Nigerian Arab language consultants. Most of the corpus is available publicly online (see bibliography). From the corpus 163 ostensible idiom types¹¹ have been identified from the four keywords, ṛaas ‘head,’ gaḷb ‘heart,’ iid ‘hand,’ and ʔeen ‘eye,’ which are used in the statistical study. Examples taken from the corpus are identified by a code referring to the text they are taken from. The sociolinguistic background to the corpus is found in 6.8. I use the following format for the representation of idioms.

{[[X X] literal meaning] = idiomatic meaning}

[X X] are the lexical keywords, the collocates, which have a literal meaning and an idiomatic meaning. Unless they are crucial to the meaning of the idiom, other elements, such as pronominal possessors, verb form and other factors, are left out of the summary.¹² The idiom gaḷb-í faaṛ ‘I got angry’ is as follows.

{[[gaḷb faaṛ] heart boil] = be furious}

This notation may be abbreviated according to the information which needs to be presented, e.g. inner bracketing may be dispensed with.

¹⁰ Though it should be said that it does not appear that Glucksberg et al. see idioms as derived via ‘property attribution.’ By the same token I do not believe that he and colleagues make explicit the relation between idiomaticity and property attribution. Rather, the idea of property attribution is directly an answer to Lakoff’s and CL’s interpretation of metaphorical meaning as being embedded in conceptual metaphors.

¹¹ “Ostensible idiom” allows that what are provisionally identified as separate idioms could be reducible to a single type. For instance, ligi ṛaas ‘escape, give birth’ (see (8.9c i)) was initially (ostensibly) identified as two idioms, ‘escape’ and ‘give birth,’ until it was pointed out that childbirth is a prototypical escape from danger, so they are now classified as a single idiom. The total of 163 idiom types is, nonetheless, probably a good indication of the number of idioms built around these four nominal keywords.

¹² Thus, in (1) the possessor can be freely changed, gaḷb-i/gaḷb-ak/gaḷb-aha faaṛ ‘I/you.M/she got angry’ (my heart/your heart/her heart boiled).

8.2.3.2 Idiomatic usage is the normal state of affairs for many lexemes Before looking specifically at the grammatical nature of lexemes in idioms, it is relevant to show that for many lexemes, idiomaticity is their normal state of affairs. The basis for this claim is actual usage, as can be seen in an exhaustive listing of idiomatic and non-idiomatic token occurrence of six lexemes in the corpus (Table 8.2).

Table 8.2 Sample of degree to which words are idiomatic

Keyword          literal   idiomatic   % corpus    % idiomatic of lexeme
ṛaas ‘head’      54        247         .07725%     82%
gaḷb ‘heart’     0         101         .02525%     100%
iid ‘hand’       65        28          .0235%      30%
ʔeen ‘eye’       57        21          .02025%     27%
akal ‘eat’       366       143         .12725%     28%
šaal ‘carry’     595       324         .22975%     35%

While there is considerable variation among lexemes as to their degree of idiomaticity (and many lexemes, as will be seen, are never idiomatic in the corpus), it is clear (1) that idiomaticity is the dominant, in one case exclusive, state of a number of lexemes, and (2) that idiomaticity is not at all an uncommon phenomenon on a quantitative basis, reaching a significant (in an informal sense) percentage of the entire corpus for a number of lexemes (see column 4).
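The final column of Table 8.2 can be recomputed from the literal and idiomatic token counts. The following is a minimal sketch in Python; the names `TABLE_8_2` and `idiomatic_share` are illustrative assumptions and not part of the original study, and only the counts are taken from the table.

```python
# Token counts copied from Table 8.2: keyword -> (literal, idiomatic).
# The dictionary and function names are hypothetical, chosen for illustration.
TABLE_8_2 = {
    "ṛaas": (54, 247),
    "gaḷb": (0, 101),
    "iid": (65, 28),
    "ʔeen": (57, 21),
    "akal": (366, 143),
    "šaal": (595, 324),
}


def idiomatic_share(literal: int, idiomatic: int) -> int:
    """Idiomatic tokens as a rounded percentage of all tokens of the lexeme."""
    return round(100 * idiomatic / (literal + idiomatic))


shares = {word: idiomatic_share(*counts) for word, counts in TABLE_8_2.items()}

for word, pct in shares.items():
    print(f"{word}: {pct}% idiomatic")
```

Under these assumptions the computed shares agree with the last column of the table, with gaḷb as the one lexeme that is exclusively idiomatic in the sample.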

8.2.4 Idioms contain normal words, normal morphemes, normal morphosyntax In many respects idioms are simply normal words displaying normal morphosyntax.

8.2.4.1 Idioms are normal words I: Compositionality Traditionally idioms have been classified into three or four different types. Glucksberg (2001: 73–76) has a four-point scale of idiomaticity (described above in n. 8, this chapter): noncompositional, non-analyzable, compositional, and transparent idioms and metaphorically combining expressions. It is uncontroversial that idioms display compositionality, i.e. the degree to which the lexical collocates, or keywords¹³ as they will be termed here, behave syntactically as independent words. The relation between compositionality and lexical independence is so obvious a point that it hardly needs to be made. Nonetheless, it is useful to reiterate the obvious. Definitionally, lexemes are transparent to syntax. An idiom is compositional to the degree to which its lexical collocates behave syntactically like non-idiomatic expressions. All of the idioms cited in this study fall within this class, i.e. Nunberg et al.’s “idiomatically combining expressions,” or, in Glucksberg’s typology, “compositional and transparent” idioms. This can be determined by elicitation (see (8.5)–(8.8) below), but, perhaps more strikingly, it is also evident in the corpus itself, where a number of idioms show a large range of variation across syntactic constructions while maintaining the same idiomatic meaning. This is exemplified with the following idiom.

{[[lamma ṛaas] join head] = unify}

The most frequent idiom in the corpus, with 48 tokens, is lamma ṛaas ‘unite.’ While the majority of these tokens occur in a canonical V + O construction (8.1a), in all it is attested in five types of constructions, including topicalization (8.1b), passivization, nominalizations with different variants of the verbal noun for lamma (8.1c), and occurrence in active participle form (8.1b).

¹³ Not to be confused with the ‘key’ in psycholinguistic studies. The key is the word at which point in a gating experiment an idiom comes to be recognized as an idiom (see above, n. 7, this chapter).


Lamma ṛaas idioms

(8.1a) lamma = verb, ṛaas = object
       tawwa lammee-tu ṛaas-ku
       formerly join-you.PL head-you.PL
       ‘Formerly you united’ (TV45)

(8.1b) ṛaas = topic, lamma = active participle predicate
       al-kloob al ṛaas-na laamm-inn-a fi l-koob al-waahid da
       DEF-club which head-our joined-M.PL-it in DEF-club DEF-one this
       ‘The club in which they united us’ (IM20)

(8.1c) lammiin, mǝlamm, malamma, lamamaan = possessed verbal noun, ṛaas = possessor
       ind-uhum mǝlamm ar-ṛaaṣ
       at-their joining DEF-head
       ‘they have unity’ (IM11)

       šuqul mallam-it ṛaas
       thing joining-F head
       ‘the matter of uniting’ (IM68a)

       ʔaarf-iin lamam-aan aṛ-ṛaas
       Know-PL joining-VN DEF-head
       ‘They know about unifying.’ (IM20)
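The brace notation introduced in 8.2.3.1 and used for {lamma ṛaas} can be sketched as a small record type. The following Python sketch is illustrative only: the class name `Idiom` and its methods are assumptions made for this example, not part of the author's formalism, and only the two idioms shown are taken from the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Idiom:
    """Hypothetical record for the notation {[[X X] literal meaning] = idiomatic meaning}."""
    keywords: Tuple[str, ...]  # the lexical collocates [X X]
    literal: str               # literal gloss of the collocates
    idiomatic: str             # conventional idiomatic meaning

    def full(self) -> str:
        """Render the full notation with inner bracketing."""
        collocates = " ".join(self.keywords)
        return "{[[" + collocates + "] " + self.literal + "] = " + self.idiomatic + "}"

    def short(self) -> str:
        """Render the abbreviated notation, with inner bracketing dispensed with."""
        return "{" + " ".join(self.keywords) + "}"


lamma_raas = Idiom(("lamma", "ṛaas"), "join head", "unify")
galb_faar = Idiom(("gaḷb", "faaṛ"), "heart boil", "be furious")

print(lamma_raas.full())
print(galb_faar.short())
```

The record deliberately stores nothing about possessors or verb form, mirroring the statement that such elements are left out of the summary unless they are crucial to the meaning of the idiom.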

Note also that ṛaas has three phonetic variants, [ṛaas], [ṛaaṣ] and [raas], all of which occur in both idiomatic and non-idiomatic variants.¹⁴ A high token count does not necessarily lead to distribution across syntactic constructions, as in {lamma ṛaas}. For idiomatic ‘heart,’ the most frequent idiom is {gaḷb raad ‘like, prefer’}.

{[[gaḷb-PSSR raad] heart want] = want, like, prefer}

Gaḷb here occurs in only three constructions: in its canonical position of subject (8.2), or as possessor (8.3) or object of a preposition.

(8.2) hay an naadum da gaḷba da ammal raayit-ni fišaan da bas bisawwi. . .
      ‘Boy does this person like me; that is why I do it’ (GR166)

(8.3) bas aniina šik, ayy wahid kula le bikaan gaḷb-a
      DM we different each one DM to place heart-his
      ‘We are all different; each according to what he wants.’ (GR139)

¹⁴ LCA does have minimal ‘r’ pairs based on emphasis and lack of emphasis, e.g. kaṛṛa ‘drag,’ karra ‘make hateful, reject’ (see 10.2), so in theory at least the contrast could be instrumentalized for literal vs. figurative meanings.


Variation is also morphological. Whereas (8.3) typically occurs with a possessor, as represented in the structure of the idiom itself, in (8.4) the same idiom occurs without a possessor, ‘heart’ being definite. App. 8.2.4.1 discusses an opaque but productive idiom.

(8.4) baʔaḍ al-gaḷb bu-hub al bi-sma al-arab
      some DEF-heart 3-like who 3-hear DEF-Arabic
      ‘Some prefer (to marry) one who understands Arabic.’ (IM 80)

It follows that syntactic and morphological properties of lexemes make no reference at all to a construct such as ‘idiom.’ Lexemes in idiomatic constructions have the same morphological and syntactic properties that they have in non-idiomatic ones. As a final general point, the difference between idiomaticity and collocational frequency should be noted. Idioms are idioms because they conventionally collocate with only certain lexemes to constitute their idiomaticity (see 8.3 below). {[[lamma ṛaas] join head] = unite} is a conventional idiom, is highly frequent in the corpus (48 tokens in all), and is highly sensitive to any changes in its constitutive lexemes. In fact, if, say, lamma is exchanged with the semantically very similar kambal ‘gather together,’ the idiomaticity is lost. Kambal ṛaas ‘gather together (literal) head’ has only a literal interpretation. High collocational frequency, however, does not necessarily imply idiomaticity. As far as ṛaas goes, its most frequent literal collocation is with bagaṛ ‘cattle,’ all four tokens being literal. No idioms in LCA are known to the author which use the keywords ṛaas and bagaṛ.¹⁵

8.2.4.2 Idioms are normal words: Intra-clausal functions The need to reference ‘normal’ properties of lexemes in idiomatic meaning can be further exemplified in the following. In principle, across clause boundaries parts of idioms can be accessed either via their lexical head or via a possessor pronoun. The former accesses the syntactic properties of the antecedent clause, the latter its referential properties. The following examples are all elicited. Given the question in (8.5), cross-clausally either (8.6a) or (8.6b) is a possible answer. In (8.6a) the 3MSG object pronoun on tallafó cross-references the head noun of the idiom, gaḷb. (8.6b) is equally a possible answer, cross-referencing the possessor of gaḷb.¹⁶

(8.5)  šunu tallaf galb-ak?
       what spoiled heart-your
       ‘what angered you?’

(8.6a) rufugaan-hum tallaf-ó
       friends-their spoil-M.PL.it.M
       ‘Their friends angered me’ (lit. angered it = my heart)

or

(8.6b) rufugaan-hum tallaf-oo-ni
       friends-their spoil-M.PL-me
       ‘Their friends angered me’

On the other hand, clause internally it is only the nominal idiomatic keyword which can be cross-referenced in standard syntactic processes. For instance, a common construction in Arabic is the topicalization of a noun to clause-initial position (see (8.7), (8.8)), its ‘original’ position marked by a cross-referencing pronoun. With all nouns, whether idiomatic or not, it is only the head noun which can be cross-referenced pronominally. Thus, given the same model as in (8.5), only gaḷb can be cross-referenced on tallafó.

(8.7a) galb-ak tallaf-ó
       heart-your.M spoil-they.it.M
       ‘As for you, they angered you’

(8.7b) ∗galb-ak tallaf-oo-k
       heart spoil-they-you

This is the identical constraint that applies with literal nouns.

(8.8a) watiir-ak tallaf-oo-ha
       car-your spoil-they-it.F
       ‘As for your car, they spoiled it’

(8.8b) ∗watiir-ak tallaf-oo-k

Thus, whatever lies behind idiomaticity semantically, the constituent lexemes need to have a lexical representation which allows them to be treated in the same way as literal nouns for clause-internal processes. I would note here that there is empirical evidence as to the degree to which a keyword is actually referred to in discourse, which will be discussed in 8.4 below.

¹⁵ Similarly Newman (2014: 125) offers collocational statistics for ‘hand’ and other body-part keywords, based on the BNC corpus. Hand as subject shows the highest strength of association with the verb tremble. The collocation, hand tremble, however, is not an idiom. By way of comparison, in Newman’s list of the ten most frequent collocations for hand as object, the familiar idiom ‘give s.o. a hand = help’ does not make the list.

¹⁶ In Owens and Dodsworth (2017) it was assumed as a statistical hypothesis that since the ‘true’ subject of a clause such as (8.2 and n. 21 below) is the possessor of the idiomatic keyword (-a in galba), idioms such as (8.2) would favor a heightened degree of cross-clausal reference, vis-à-vis a literal noun + possessor pronoun, such as bagart-ak ‘your cow.’ Statistical comparison, however, showed that this is not the case (and in fact, pronouns on literal nouns are actually more likely to be referred to in following discourse than are idiomatic ones). This for us was a counterintuitive result. The issue of the status of pronominal possessor on an idiomatic noun requires greater research.

8.3 IDIOMS ARE NORMAL WORDS BUT PRODUCE DISTRIBUTED POLYSEMY


Whereas the elicited responses allow both (8.6a) and (8.6b), in practice (8.6a) is highly dispreferred.
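The cross-referencing pattern in (8.5)–(8.8) can be restated as a toy rule. The Python sketch below is a deliberate simplification under stated assumptions: the names `PossessedNoun` and `cross_reference_targets` are hypothetical, the rule records only what the elicited data are reported to allow, and it ignores the strong preference against (8.6a) in actual usage. Treating literal nouns like idiomatic ones across clause boundaries is likewise an assumption based on the remarks in n. 16.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PossessedNoun:
    """A head noun plus possessor, e.g. galb-ak 'your heart' (hypothetical record)."""
    head: str
    possessor: str
    idiomatic: bool  # is the noun used as an idiomatic keyword?


def cross_reference_targets(noun: PossessedNoun, clause_internal: bool) -> List[str]:
    """Return which part of the noun phrase a cross-referencing pronoun may pick up.

    Note that `noun.idiomatic` is never consulted: the rule is the same
    for idiomatic and literal nouns, which is the point of the examples.
    """
    if clause_internal:
        # Topicalization, as in (8.7) and (8.8): only the head noun.
        return ["head"]
    # Across clause boundaries, as in (8.6a) and (8.6b): head or possessor.
    return ["head", "possessor"]


galb = PossessedNoun("galb 'heart'", "2SG", idiomatic=True)
watiir = PossessedNoun("watiir 'car'", "2SG", idiomatic=False)

for noun in (galb, watiir):
    print(noun.head, cross_reference_targets(noun, clause_internal=True))
```

Because the function returns the same result for both nouns, it makes explicit that no reference to a construct such as ‘idiom’ is needed to state the clause-internal constraint.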

8.3 Idioms are normal words but they produce distributed polysemy Idiomatic collocates utilize not only obligatory grammatical properties of their literal counterparts pertaining to all clause-internal processes. They also access these properties adventitiously to define contrasting idioms. For reasons which will become clear, I term this “distributed polysemy” (Fillmore 1982; more recently Sullivan 2013).

8.3.1 Pronominal reference The idiom {[[ligi ṛaas-PSSR] get head-PSSR = “X”]} requires the verb ligi and the object ṛaas, which is in a possessed relation. The possessor of ṛaas must be human (as a noun token, or by reference, if pronominal). This can be illustrated with the following set of examples. All involve differential interpretations based on what broadly can be thought of as differences of reference and anaphor.¹⁷

(8.9) ligi ṛaas idioms
      a i.  lig-at ṛaas-hum
            get-F head-their
            ‘She got their support’
      a ii. lig-at ṛaas an-naas
            get-F head DEF-people
            ‘She got the people’s support’
      b.    ligi ṛaas-a
            get head-his
            ‘He escaped.’
      c i.  lig-at ṛaas-ha
            get-F head-her
            ‘She gave birth.’
      c ii. ligii-na ṛaas-na
            got-we head-our
            ‘We (female speakers) gave birth’

¹⁷ This includes treating subject–verb agreement as an anaphoric relation, as will be dealt with in Chapter 11.


In each case a difference of meaning correlates with a difference in the grammatical environment of each idiom. The common structure can be broken down as follows (the numerals are referential indices; F = feminine).

(8.10) ligi ṛaas
       a. ligi_1 ṛaas-possessor_2
       b. ligi_1 ṛaas-possessor_1
       c. lig-at_1F ṛaas-possessor_1F

In (8.9a), the possessor must have disjoint reference with the subject. The possessor can be either a pronoun, as in (8.9a i), or a noun (8.9a ii). In (8.9b) the possessor must be co-referential with the subject, so it must be a pronoun cross-referencing the subject. Herein lies the fundamental difference between (8.9a) on the one hand, and (8.9b, c) on the other. In (8.9c) the possessor must equally cross-reference the subject, and in addition, the subject/co-referencing possessor must be feminine. The cross-reference can be speech-situational in the case of a first person subject. (8.9c ii) assumes the speakers are females.
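The conditions summarized in (8.10) amount to a small decision procedure over referential indices and gender. The sketch below is a hedged illustration in Python; the function name, its parameters, and the English labels for the readings are assumptions introduced here, and the treatment of a feminine co-referential subject as compatible with both ‘escape’ and ‘give birth’ is an inference from (8.10b), which states no gender restriction.

```python
from typing import List


def ligi_raas_readings(
    subject_index: int,
    possessor_index: int,
    subject_feminine: bool,
    possessor_human: bool = True,
) -> List[str]:
    """Readings of {ligi ṛaas-PSSR} licensed by a given grammatical environment.

    Hypothetical helper: indices model (dis)joint reference between the
    subject of ligi and the possessor of ṛaas, as in (8.10).
    """
    if not possessor_human:
        # The possessor of ṛaas must be human for any of the idioms.
        return []
    if subject_index != possessor_index:
        # (8.10a): disjoint reference.
        return ["get support"]
    readings = ["escape"]  # (8.10b): co-reference with the subject
    if subject_feminine:
        readings.append("give birth")  # (8.10c): co-reference plus feminine
    return readings


print(ligi_raas_readings(1, 2, subject_feminine=True))   # cf. (8.9a i)
print(ligi_raas_readings(1, 1, subject_feminine=False))  # cf. (8.9b)
print(ligi_raas_readings(1, 1, subject_feminine=True))   # cf. (8.9c i)
```

The sketch uses only properties that any noun or pronoun has (reference, gender, humanness), which is the sense in which the contrasting idioms draw on ordinary grammatical properties.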

8.3.2 Distributed polysemy and thematic roles The nature of the frames plays a role in determining the interpretation of the idiom. What is meant by frame can be described informally using the basic format of FrameNet (FrameNet: https://framenet.icsi.berkeley.edu/fndrupal). A frame describes an event or state and is composed of thematic roles of various types, Agent, Theme, Source, Goal, Frequency, Manner, and many more. Frames can be defined at different levels of generality, with lower order frames inheriting frame attributes from higher level ones. Frames are realized in individual lexical units, which are said to “evoke” the frame they instantiate. Each lexical unit which instantiates a frame has the attributes of that frame. The frame for “Bringing” describes a certain type of event and is realized by such verbs as ‘carry,’ ‘bring,’ and ‘fetch.’ The basic elements of the frame for šaal ‘carry’ might be characterized as follows, adding syntactic categorical information to the frame. Because there are no general frames defined for LCA, the lexical item šaal will include a definition of the activity which it describes (normally such definitions appear in higher non-lexical levels).

(8.11) skeletal frame, šaal
       Nature of frame: pick up and keep (for one’s benefit, see (8.13, 8.14, 8.16) below for elaboration)
       Frame elements:
         Agent = subject: NP, usually human
         Patient = object: NP
         Source: PP
         Goal: PP
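The skeletal frame in (8.11), together with the idea that lower order frames inherit attributes from higher ones, can be sketched as a simple data structure. This Python sketch is illustrative and does not reproduce FrameNet's actual data model or API; the class `Frame`, the method `all_elements`, and the child frame for the idiom in (8.14a) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Frame:
    """Hypothetical frame record: a description plus role-to-realization mappings."""
    name: str
    nature: str
    elements: Dict[str, str]
    parent: Optional["Frame"] = None

    def all_elements(self) -> Dict[str, str]:
        """Frame elements including those inherited from the parent frame.

        Elements stated on the child override the inherited ones.
        """
        inherited = self.parent.all_elements() if self.parent else {}
        return {**inherited, **self.elements}


# Skeletal frame for šaal, following (8.11).
SHAAL = Frame(
    name="šaal 'carry'",
    nature="pick up and keep (for one's benefit)",
    elements={
        "Agent": "subject: NP, usually human",
        "Patient": "object: NP",
        "Source": "PP",
        "Goal": "PP",
    },
)

# An idiom inherits the skeletal frame and adds its own restriction;
# the Patient restriction here follows the description of (8.14a).
TAKE_THE_LEAD = Frame(
    name="šaal ar-ṛaas 'take the lead'",
    nature=SHAAL.nature,
    elements={"Patient": "object: ṛaas, with definite article"},
    parent=SHAAL,
)

for role, realization in TAKE_THE_LEAD.all_elements().items():
    print(f"{role}: {realization}")
```

Modeling the idiom as a child frame keeps the shared Agent, Source and Goal information in one place, while the idiom-specific restriction is stated only where it differs.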


The agentive subject is normally human. A look at the first 200 of the 1006 šaal tokens from the corpus shows 180, 90%, to have a human subject. It can equally be any being or object with inherent motive force: a cow, for instance, carries individuals, as do cars and trucks. In a few cases abstract entities like hukuuma ‘government’ ‘appoint,’ i.e. šaal, an individual.¹⁸ The object is semantically and grammatically unconstrained. It can be either singular or plural, be a single discrete object or a collection. Like all verbs it occurs with various adjuncts (time, reason etc., i.e. non-core thematic roles). In addition it occurs with a specific source–goal complement, min … le ‘from … to.’ These will not figure in the present discussion, though there are idioms which utilize them, e.g. an idiom to indicate the extent of area which a distance covers.

(8.12) šiil-a min hine wadd-a le baama
       carry-it from here send-it to Bama
       ‘Carry it (i.e. the hypothetical line of distance) from here and send it to Bama’
       ‘It lies between here and Bama.’

An important aspect of šaal is its semantics. Basically it indicates a composite action of picking up an object, carrying it and keeping it, or disposing of it on one’s own behalf (if the subject is human).

(8.13) al-mara šaal-at al-ṛaaba le s-suug
       DEF-woman carry-F DEF-sour.milk to DEF-market
       ‘The woman carried the sour milk to the market.’

Here šaal implies that the woman prepared the milk, carried it all the way to the market, and sold it on her own behalf. Selling is implied by ‘market,’ and that it is for her own benefit is implied by the verb šaal. A contrasting meaning is produced by the verb axad ‘take’ (al-maṛa axadat …), which in the same sentence would suggest that the sour milk which the woman took to the market was not her own. Altogether six {šaal ṛaas} idioms have been identified, and their syntax and semantics can be discussed further here. The six idioms all inherit the skeletal frame given in (8.11). Importantly, all idioms also reflect the essential two parts of the meaning of ‘šaal’: it suggests that an act has begun which has a duration over time, and the agent of ‘šaal’ is in control of what is taken over. Beyond this basic point, however, each of the idioms is marked by its own idiosyncratic properties. These properties involve basic structural, paradigmatic and syntagmatic elements. Four of the idioms have ṛaas as Patient.

¹⁸ The FrameNet annotation for “carry” distinguishes between an Agent as subject, and a Carrier, which would be for instance, a guided missile (carrying a payload). Such fine distinctions are ignored for current purposes.

(8.14) ṛaas, patient
       (a) šaal ar-ṛaas
           carried DEF-head
           ‘He took the lead/headed the column’
       (b) al-kalaam šaal ṛuusse katiir-aat
           DEF-word carried heads many-PL
           ‘The issue had a number of ramifications/became complicated’
       (c) šuqul šaal ṛaas-í
           something carried head-my
           ‘Something distracted me’
       (d) kalaam-ak šaal ṛaas-í
           word-your carried head-my
           ‘What you said convinced me’

Running through the distinctive characteristics of each idiom, in (8.14a) the Patient-object ṛaas is marked by the definite article. In (8.14b) the Agent-subject is propositional. (8.15) would be a good paraphrase of (8.14d).

(8.15) propositional
       al-inta gul-t-a šaal ṛaas-i
       REL-you said-you-it carried head-my
       ‘What you said convinced me.’

In (8.14c, d) the object ṛaas must be possessed, usually by a possessive pronoun, and in (8.14b) ṛaas must be an indefinite noun, can be plural, and is typically modified by an adjective, as in the example. (8.14c) is distinguished from (8.14d) merely by the nature of the subject. If the subject is propositional, the appropriate translation in English is ‘convince.’ If it is not, if it is an arbitrary sound, an unspecified matter, it is ‘distract.’ Two {šaal ṛaas} idioms have ṛaas as subject.

(8.16) ṛaas as Agent = subject
       (a) ṛaas-a bi-šiil al-gǝṛá ajala
           head-his 3-carry DEF-study quickly
           ‘He learns quickly’
       (b) raas-a šaayil kalaam
           head-his carrying word
           ‘He has problems’

8.4 HOW IDIOMS ARE DIFFERENT FROM ‘NORMAL’ CONSTRUCTIONS

In (8.16a) the patient must be abstract. (8.16b) is normally expressed without an object: the head is carrying something, but this thing is not stated. In both, the subject ṛaas is possessed. (8.16) is doubly durative. The verb “šaal” itself is durative, as already seen, and the active participle form of the verb also has a durative function, in LCA as indeed in virtually all varieties of Arabic (see 11.3.2). In both of these, again the idiomatic meaning mirrors the durative meaning of “šaal” itself and in both the subject is agentive, though not prototypically human. The six idioms obviously are not distinctive in their basic lexical make-up. Instead differentiation is achieved via various grammatical and lexical properties. Sequence, and attendant upon this thematic role, plays a role: whether ṛaas occurs as Agent = subject or Patient = object is distinctive. The choice of subject is distinctive: whether the item which occurs as subject is propositional or not is distinctive, and whether it is human or not is distinctive. Syntactic form can be distinctive: whether ṛaas is definite, indefinite or possessed are all factors defining the idioms. While all idioms adhere to the basic meaning of the frame, of the four {šaal ṛaas} idioms with ṛaas as object, only one adheres to the prototypical Agent = human. In three of them the agent is propositional. In this regard, the idioms are idiosyncratic. What is clear is that the different meanings which {šaal ṛaas} is associated with are distributed across different parts of the basic frames and pertain to different aspects of them. The polysemy is distributed across the entire frame. Distributed polysemy also implies that idioms are necessarily sensitive to a contextualized integration of all grammatical parts of a clause, both lexical and functional morphemic.¹⁹ In general distributed polysemy adds considerable semantic expressivity to the language. There are in the corpus six different idioms based on the single collocation {šaal + ṛaas}, with the differentiation defined by elements of referentiality, word order, the semantic nature of subjects, occurrence of the definite article and other factors. All of these elements are those which are inherent properties of any lexeme, whether used literally or idiomatically.
Nouns in Arabic can take a definite article, they contract referential relations, they have inherent semantic attributes which combine with the constituents they occur in to produce clausal meaning.²⁰

8.4 How idioms are different from ‘normal’ constructions: Characterizing idioms

It was shown in the previous three sub-sections that nouns in an idiomatic relation display normal morphological and syntactic properties of lexemes. Acknowledging this property of idioms, it can be asked how then idioms are different from

¹⁹ Further discussion and exemplification of distributed polysemy as well as various issues relating frames to idiomaticity in LCA is found in Owens (2015 and 2016).
²⁰ Note that in the summaries for individual idioms, elements which are definitional to an idiom, such as the occurrence of a possessor with a noun, are included in the formula.

MORPHOSYNTAX AS AN ADAPTIVE MECHANISM I: IDIOMS

their literal congeners. As seen in 8.1, the CL answer to this is that idioms realize conceptual metaphors, participating thereby in the systematic mapping which characterizes metaphors. This explanation is obviously not available to a treatment which sees idioms as lexically defined entities. There are two ways in which idioms are different from their literal counterparts. One way, though not frequent, is by the creation of new structure. This is left for illustration in App. 8.4. A far more general characteristic distinguishing idioms pertains to discourse referentiality. In an independent study (Owens and Dodsworth 2017), the same corpus of LCA was used to examine the idiomaticity around the keywords ṛaas ‘head,’ gaḷb ‘heart,’ iid ‘hand,’ baɗun ‘stomach,’ and ʔeen ‘eye.’ These were analyzed in terms of what we termed “discourse embeddedness.” A noun that is embedded in discourse is transparent to referential properties. Discourse referentiality is measured by the degree to which a referring expression, a noun or pronoun, is picked up in a preceding or following context. The degree of referentiality was compared between three classes of lexemes. Class (1) comprises nouns which are never idiomatic (in the corpus), such as bagara ‘cow,’ qalla ‘grain,’ and nugura ‘hole.’ Nouns which can be idiomatic or not are classified according to whether they are (2) used literally or (3) used idiomatically. Classes (2) and (3) consist of the “same” nouns with both literal and idiomatic meanings. The purpose of the comparison was to determine the referentiality of these nouns in discourse. Discourse referentiality was measured using a battery of tests (Owens et al. 2010, Owens, Dodsworth, and Kohn 2013), which essentially are based on the criteria developed in the 1980s and 1990s (e.g. Prince 1981; Givón 1983; Chafe 1994). A noun is referential if it is referred to in previous or following discourse. (8.17)–(8.19) illustrate prototypical situations from the corpus.

Non-idiomatic noun, qalla ‘grain’.
(8.17)

i-jiib₁-u₂       lee-na₄  al-qalla₃  foog  at-tawwaar  ni-nš-u₄
3-bring-PL       to-us    DEF-grain  on    DEF-bulls   1-go-PL
bi-kiil-uu₂-ha₃  min      borno      bi-jiib-uu₂-ha₃ . . .
3-weigh-PL-it.F  from     Kanuri     3-bring-PL-it.F
‘They bring us grain on bulls and we go and weigh it and the Kanuri bring it.’ (TV44b)

Potentially idiomatic noun in literal guise, ʔeen ‘eye’.

(8.18)

ʔeen₁-ak  da   ma   ti-sill-aha      tu-zugg-ii-ni   ba-a₁
eye-your  DET  not  you-remove-it.F  you-throw-F-me  with-it.F
‘That eye of yours, don’t take it out and hit me with it’ (GR153, from a folktale)

Potentially idiomatic noun, in idiomatic guise.

(8.19)

[kan  gaḷḅ₁-i₂  i₁-door  šiya    bikaan]  [ba₂-guul]  [ba₂-waddí₃
if    heart-my  3M-want  little  place    I-say       I-send.him
le  aj-juduud-a       áṛaḅ]
to  RC-ancestors-his  Arabs
‘If I were to prefer a place, I would say I would send him to one whose ancestors are Arab’ (IM144)

In (8.17) and (8.18) the nouns qalla, never idiomatic, and ʔeen, non-idiomatic in (8.18), are referred to pronominally in the following clause, as marked by the coindices. The idiomatic noun gaḷb in (8.19) evokes no reference in the following clause. Nine nouns were selected. Bagara ‘cow,’ nugura ‘hole,’ qalla ‘grain,’ and ruɗaana ‘language’ are never idiomatic in our corpus. Baɗun ‘stomach,’ ṛaas ‘head,’ gaḷb ‘heart,’ iid ‘hand,’ and ʔeen ‘eye’ can be used literally or idiomatically. These nouns were tagged for their discourse properties, yielding 1403 tokens. A look at the bare statistics reveals a clear three-way contrast between the three classes of lexemes. The literal nouns, bagara, qalla, nugura, and ruɗaana, are about as likely as not to be referential to a pre- or post-context. Body parts in their non-idiomatic guise are less likely to be referential, by about a 3:1 margin. Body parts in their idiomatic guise are far less likely to be referential, by about a 5:1 margin. The preliminary conclusion is that idiomaticity reduces the referential transparency of a noun in discourse.

Table 8.3 Referentiality of body part and non-body part keywords in discourse, raw scores

                               idiomatic guise  non-idiomatic guise  literal only
Continued in following clause  68               56                   385
Not continued                  354              142                  404
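The margins quoted for Table 8.3 can be recomputed from the raw scores. The following is a minimal sketch, not part of the original study: the counts are copied from the table, while the dictionary layout and the function name `summarize` are illustrative assumptions.

```python
# Raw counts copied from Table 8.3; the keys are illustrative labels.
counts = {
    "idiomatic guise": {"continued": 68, "not_continued": 354},
    "non-idiomatic guise": {"continued": 56, "not_continued": 142},
    "literal only": {"continued": 385, "not_continued": 404},
}


def summarize(cell):
    """Return (share continued, ratio of not-continued to continued)."""
    total = cell["continued"] + cell["not_continued"]
    share = cell["continued"] / total
    ratio = cell["not_continued"] / cell["continued"]
    return share, ratio


summary = {label: summarize(cell) for label, cell in counts.items()}

for label, (share, ratio) in summary.items():
    print(f"{label:20s} continued {share:5.1%}, not continued : continued = {ratio:.1f} : 1")
```

On these figures the not-continued to continued ratio is roughly 5:1 for the idiomatic guise, between 2:1 and 3:1 for the non-idiomatic guise, and close to 1:1 for the literal-only nouns, which is the three-way contrast described in the text.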

This study was made more precise in a multivariate analysis with the factors ‘keyword type,’ ‘syntactic function of keyword,’ ‘syntactic function of keyword continued in following clause,’ and whether or not the keyword was marked by a possessor pronoun (Table 8.4).²¹ This essentially confirmed the basic statistical trend in Table 8.3, except that there was no significant difference between an idiomatic and a non-idiomatic body part. The literal non-body part did correlate positively with referentiality in the clause.²²

²¹ The idea here was that the possessor of an idiomatic keyword like gaḷb-í faar ‘my heart boiled = I got angry’ was the true subject and hence would be the referent followed in surrounding discourse. This, however, turned out not to be significant.
²² One factor perhaps accounting for the lack of significant contrast in the multivariate analysis, besides trying to determine the role of co-determining factors, is that some keywords have either very few, or no literal tokens at all. Gaḷb ‘heart’ is only non-literal and ṛaas ‘head’ is largely so.

Table 8.4 Estimates from generalized linear mixed effects model with persistence vs. non-persistence of the keyword in the following clause as the dependent variable

                                            Estimate
Keyword type
  literal body part                         .66
  literal non-body part                     1.34∗∗
  reference level=abstract body part
Syntactic function of keyword
  adverb                                    −13.45
  object                                    −.11
  prepositional object                      −.54∗∗
  possessor                                 −.85∗∗∗
  topic                                     1.00∗∗∗
  reference level=subject
Possessor pronoun                           −.61
  reference level=no possessor
Keyword type ∗ Possessor pronoun
  literal body part: possessor              .59
  literal non-body part: possessor          .81∗

∗

=p Ving > _gonna. These include New York City, Detroit, Los Angeles, Berkeley, and Springfield Texas, among others (Rickford et al. 1991: 121; Cukor-Avila 1999: 347; Labov 1969: 715 n. 2). Carmichael and Becker (2018) show that rhoticity in New Orleans shares inherited

⁷ As argued by Labov, 2007: 364.
⁸ Though Labov might see (11.1b) as incrementally simplifying as it were (e.g. loss of morpholexical conditioning factors), vs. (11.1c) which is simply irregular.

11.3 WHY? THE BASIC ISSUE

constraints with its NYC origins. Buchstaller and D’Arcy (2009: 306 (also Tagliamonte and D’Arcy, 2007)) demonstrate that quotative be like has spread and been maintained fairly uniformly from American English to British and New Zealand English. The stability demonstrated in these studies is interesting in a different way from that in Table 11.1. Despite representing a shallower, in some cases (e.g. “be like”) a much shallower time depth, because the variationist treatment defines the stability via a complex set of conditioning factors, one can attribute maintenance of the system to factors independent of the phenomenon itself. Stability is in part characterized as the maintenance of internal constraints. These points serve as a broad orientation to a comparative linguistic reconstruction which will examine three speech communities separated by a far longer period of time than the approximately 230 years covered in Labov’s study. This chapter looks at the variable expression of null/overt subject in Arabic. It takes as its starting point spoken Arabic. The question that will be addressed starts with a simple, striking observation about the uniformity of complex verb paradigms in Arabic. It seeks to explain what is behind the uniformity using a comparative research design comparable to that used in a sociolinguistic methodological tradition.

11.3 Why? The basic issue

Set against the twin factors of time and geo-social expansion rather than a simple observation about verb forms detached from the communities which support them, the reason why Table 11.1 exists can now be set in sharper focus. The stability in the paradigm is, at the risk of overusing the adjective, remarkable. Clearly, however, there is nothing in the nature of paradigms which inevitably leads to inter-generational transmission (see Chapter 12). Taking 1400, the date in which ancestral LCA came to its current location (see Chapter 8), why should children in generation after generation, between 17.5 and 25.5 generations,⁹ have faithfully reproduced the same basic paradigms in locations as disparate as Uzbekistan, the Emirates, and Nigeria? Formal education obviously is not the answer. Arabic dialects are nowhere taught in schools.¹⁰ They are simply learned as native languages in local communities. The history of Arabic in Nigeria shows that sheltering away from other languages is not a factor detrimental to maintenance (Miestamo 2009: 129). Arabs are a minority throughout the Lake Chad region, and many are multilingual. In the LCA texts considered (see Table 11.3 below)

⁹ 17.5 if a generation is calculated at 35 years; 25.5 if calculated at 25 years. 1400 is a very shallow period. Ancestral Nigerian Arabic itself has hypothetical ancestors in Upper Egypt, and on into the Middle East, which at least doubles the number of generations where the paradigms have been preserved.
¹⁰ Maltese is the exception, basically an Arabic dialect that has become a national language. Al-Wer (2013/2019: 252) makes the point that as far as the dynamics of language change in Arabic go, Standard Arabic plays a marginal role at best. Contact between native spoken varieties is the crucial variable.

LANGUAGE STABILITY II: WATCHING PAINT DRY

for instance, all speakers are bilingual in at least Hausa and/or Kanuri. Moreover, there has at times been considerable contact between Arabic and non-Arabic speakers (Braukämper 1993), and many non-Arabs have been assimilated to Arabic-speaking society (Owens 1998a: Chapter 10) without, apparently, disruption to the language. As seen in Chapters 8 and 9, massive adstratal (probably primarily Kanuri) influence in the domain of idioms in NA, as well as a significant functional extension of demonstratives, causally associated with a massive frequency increase of the demonstratives, occurred under areal language influence, without, however, impinging on core grammatical structures.

11.3.1 Verbal predicates

We suggest that one critical part of the answer pertains to the function of the paradigms in context, in discourse. The inflectional elements in Table 11.1 are subject markers.¹¹ As earlier studies of discourse showed (Prince 1981, Givón 1983, Chafe 1994, Naro and Votre 1999), the subject is a key element tracking what a discourse is about. Nearly every sentence has a subject in Arabic overtly indicated in one way or another, and every verbal sentence has one of the inflectional subject indicators in the paradigm in Table 11.1. In Arabic, any of the verb forms listed can serve as a complete sentence (in the following ‘Ø’ = no overt subject or ‘null’ subject, N/O = either null or overt, Brustad 2000).

(11.2)

Ø  yi-ktib
Ø  he-write
‘He is writing’

Besides (11.2), one can of course have overt subjects. In the case of an overt subject, there is agreement between the verb form and subject.

(11.3)

ana  b-a-siir  maʕ-ak      baačir
I    FT-I-go   with-you.M  tomorrow
‘I’ll go with you tomorrow.’ (EM1.3, see below)

Looking only at the paradigm in Table 11.1, or only at individual sentences, as in (11.2) and (11.3), there is essentially no systematicity as to when null (11.2) vs. overt (11.3) subjects are used. On an elicitation basis, for any sentence elicited with a null subject, the same sentence (same verb form) can be elicited with an overt subject and vice versa.

¹¹ Without entering into the extensive topic of the nature of null subjects discussed in detail in the generative tradition. The nature of null subjects has been the topic of ongoing debate, where there is consensus that a null subject is structurally represented by “pro.” Much of the debate pertains to how pro should be valued, with suggestions ranging from person, number, gender (phi) features on pro valuing the inflectional elements in the verb paradigm (e.g. Shlonsky 2009: 139 on Hebrew) to inflectional elements on the verb valuing pro (see Camacho 2013: 174–184 for overview).

When the question of choice of overt vs. null subject is phrased as a discourse linguistic problem, however, a very different picture emerges. Basically, other things being equal, overt subjects are avoided. Other things are not necessarily equal when, for instance, a new referent needs to be introduced, when a dramatic highpoint is reached in a narrative and a subject needs to be brought back into the story, when there is a false start or other discourse disfluency which disrupts the story line, or when generalizing but identity irrelevant subjects such as ‘most, someone, no one, a few of ’ (“pronominals”) are called upon. These are contexts which call for an overt subject, as has been shown in a number of studies on a corpus of spoken Emirati and Arabian peninsular Arabic (Owens et al. 2009, 2010, 2013: 26–27). These factors emerge only in discourse. Discourse, however, is normal speech. Paradigms by contrast are analytical abstractions. For this reason it is clear that to claim that inherent properties of the paradigm generate its stability across time, geography, and social space is circular. On the other hand, looking at the deployment of the individual paradigm forms in discourse gives a very coherent picture of the functionality of the contrastive paradigmatic elements. A brief example from a published Emirati Arabic text (AlRawi 1990: 121) illustrates this. In a story about a woman with six sisters the protagonist is introduced in the beginning of the story with an overt subject, first as a member of a collective of seven (hum sabiʕ banaat ‘they were seven girls’) then 11 clauses later individually, ya-t il-ħurma ‘the woman came (to the house).’ The rest of the story, consisting in total of 155 clauses, revolves around the woman and her relation with seven hunters, whom she does not actually meet until the end of the story. 
The main protagonist, the woman, is referred to in a 3FSG verbal predicate a total of 59 times.¹² In 57 of these cases no overt subject is mentioned. This is despite the fact that references in the 3FSG verb form to the woman are not linearly adjacent. Other protagonists frequently intervene. For example, after 62 clauses the following narrative occurs.

(11.4)

… (-4)
wu   yaa     iθ-θaani
and  came.M  DEF-other
And the other came
Ø  rigad
Ø  slept.M
And he slept
wu   Ø  ya-t
and  Ø  came-F
And she came

¹² At the end of the story, there are two references to the main protagonist’s sisters.

‘The other’ who comes is one of the seven brothers. He is introduced with an overt subject, then is continued with a Ø subject verb in the next clause, and then the woman is brought back into the narrative, after an absence of four clauses, but in the Ø form. The previous reference to the woman was with the object pronoun -ha, not with a full noun. Even relatively long absences from discourse do not force repetition of an overt subject.¹³ In the case of Arabic it is easy to see one important factor at play, namely the fact that the identity of the subject is implicit in the verb form itself. Subject identity in turn can be related to its tendency to occur either overtly or as Ø with an underlying pragmatic mechanism that can be stated as follows.

(11.5)

Avoid overt subjects

This can be thought of as a reflex of the Gricean maxim ‘avoid prolixity.’¹⁴ Thinking of the Arabic verb paradigm not in terms of its paradigmatic values, but rather in terms of the function of its individual members in discourse, allows the question of why the paradigm should be so stable to be addressed in contextual and functional terms.

(11.6)

function of the paradigm: Given the discourse injunction to avoid overt subjects, the paradigm allows subjects to be tracked minimizing loss of referential identity of the subject

As it stands (11.6) merely draws a possible link between the morphological form of Arabic verbs and a discourse-defined reason for their maintenance. Given the potentially infinite amount of data at one’s disposal, however, (11.6) can be taken as the basis of a testable hypothesis. If the verbal morphology of Table 11.1 is held in place by the discourse function of the person, gender, number markers, then the following should hold.

(H 1)

The discourse frequencies of null and overt subjects should be roughly comparable across those dialects where the paradigm in Table 11.1 exists.

H 1 is about bare frequencies. A more sophisticated hypothesis can be formulated against the discourse categories which overt and null subjects realize.

¹³ Quite in contrast to the suggestion in Gundel et al. (2010: 1782); though see Prince’s (1981: 238–247) tagged oral excerpt where there is only one lexical reference to the main protagonist (a female) interspersed with many “she” tokens referencing this individual, some of which resume after other female subjects have been introduced.
¹⁴ This is more general than the structural dictum to “avoid pronoun” (Chomsky 1981: 65), since it covers any subject expression. (11.5) claims that null subjects are the rule in Arabic, and therefore that what has been termed a stylistic “aboutness topic” which sanctions “idiosyncratic” use of pronouns (van Gelderen 2013: 281 on Old English) is not expected. The issue deserves separate treatment, but in Owens et al. (2013: 25–27) repeated co-referential overt subjects are argued to be subject to specific discourse conditions (termed “disharmonic conditions”), i.e. are not idiosyncratic.

(H 2)

Given a set of discourse categories defining subject occurrence, the constraints defining the categories will be comparable across independent data sets.

The test relating to these two hypotheses will be carried out in the next section, 11.4. Three dialects have been chosen on what is essentially an arbitrary basis. The discourse structure of Emirati Arabic and to a degree Hijazi Arabic has already been treated in detail by the authors, and one of the authors has extensive corpus material for LCA. LCA represents an interesting case for two reasons. First, as explained in 8.1 its ultimate ancestry in the Middle East can be broadly confirmed with written sources, and secondly, since the ancestral variety migrated into the Lake Chad area around 1400 where its speakers constitute a minority group, it has had no meaningful contact with other varieties of Arabic.

11.3.2 The other predicates

Before proceeding to the test it is necessary to expand the purview of the study by incorporating two further types of predicates. Arabic offers a challenging test case because it has predicates which are not only richly inflected, but also predicate types which are either only partially inflected or are completely uninflected. All in all, besides finite verbs there are two further types of predicate which we summarize here in order to incorporate them into the research question in 11.4. The active participle in spoken Arabic effectively functions as a third tense/aspect alternative to the imperfect and perfect (see [7.15] and discussion in 7.2.4.3) verbs discussed in 3.1. Just as all verbs have imperfect and perfect forms, so too do all but some stative intransitives have an active participle (AP). The semantic value of the AP varies according to the nature of the lexical stem, though roughly speaking the AP has two main profiles. It has a perfective meaning with active, teleological verbs, the AP presenting the tangible effects of the completed action expressed in the predicate. Thus, whereas rigad in (11.4) identifies the event as a completed whole thus allowing the lady to continue her main role in the story, the AP raagid meaning roughly ‘he has slept,’ would invite an inference that his having slept is immediately relevant to the further development of the action. The second main profile pertains to motion verbs where the AP generally has an intentional meaning.

(11.7)

raayiħ ‘I/you/he is (intends to) going’

The essential point for the current issue, however, is that the AP is inflected for gender and number, but not for person (i.e. in Arabic grammar, it inflects like an adjective).

(11.8)

Inflection of Arabic AP
raagid     ‘I.M, you.M, he is asleep/lying down’
raagd-a    ‘I.F, you.F, she is asleep’
raagd-iin  ‘We.M, you.M.PL, they.M are asleep’
raagd-aat  ‘We.F, you.F.PL, they.F are asleep’
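The paradigm in (11.8) can be read as a lookup table in which gender and number are recoverable from the form but person is not. The sketch below is only an illustration of that point: the four forms and their readings come from (11.8), while the dictionary layout and the function name `possible_subjects` are assumptions made for the example.

```python
# Forms and feature values copied from (11.8); the layout is illustrative.
AP_FORMS = {
    "raagid": {"gender": "M", "number": "SG"},
    "raagd-a": {"gender": "F", "number": "SG"},
    "raagd-iin": {"gender": "M", "number": "PL"},
    "raagd-aat": {"gender": "F", "number": "PL"},
}

PERSONS = (1, 2, 3)  # the AP does not mark person, so all three stay open


def possible_subjects(form):
    """List every (person, gender, number) reading compatible with an AP form."""
    features = AP_FORMS[form]
    return [(person, features["gender"], features["number"]) for person in PERSONS]


print(possible_subjects("raagid"))
```

Each form returns three candidate readings, one per person, so the predicate alone leaves the person of the subject open.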

Out of context, raagid could be any of ‘I, you, he (is) asleep.’ A third type of predicate is invariable in form. This is the existential predicate fí (with alternative form bí or bih, rarely ší, in Gulf Arabic, including Emirati, and Yemen, Holes 2016: 24–28, Behnstedt 2016a: 346).

(11.9)

fí     ʕaddaad  yaʕni  ʕaddaad  sitt  dərham
exist  meter    DM     meter    six   dirham
‘There’s a meter that says six dirhams!’ (EM2.1)

The three types of predicates add a further structural dimension to the status of null subjects.¹⁵ If there is a relation between the referential-inflectional properties of the predicate, and the occurrence of null subjects, the following can be hypothesized:

(11.10)

Expected hierarchy of increasing occurrence of overt subjects
full marking for person, number, gender  >  marking for number, gender  >  unmarked
finite verb                                 active participle              existential predicate

11.4 A multivariate insight into language stability

In this section we turn to Emirati, Hijazi, and LCA (Nigerian Arabic), invoking comparative discourse parameters which have now been used in a number of studies.

11.4.1 The data

The data come from Emirati, Hijazi (Jedda) and LCA texts. There are two Emirati data sets gathered at an interval of ten years. One set consists of the texts in Al-Rawi

¹⁵ In passing it is relevant to note that there is a fourth type of predicate not treated here, namely equative/attributive/locative predicates which formally simply juxtapose subject and predicate with no formal marking, as in hi hinaak ‘She (is) there.’ These are not included because of the difficulty of distinguishing between a null subject and a sentence fragment, which are not included in any statistical counts. Hinaak by itself could be interpreted either as a predicate with a null subject ‘(she is) there,’ or as a sentence fragment, e.g. as an answer to ween ruħ-ti ‘where did you.F go?.’ A fuller study of ellipsis in spoken Arabic is needed before such cases can be incorporated. The problem does not arise with the existential predicate, which is used only predicatively. To give a very rough idea about how many such “non-verbal” predicates are involved, there are 604 such clauses with an overt subject, which amounts to about 7.3% of all predicates included in the analysis.

11.4 A MULTIVARIATE INSIGHT INTO LANGUAGE STABILITY

(1990), while the second consists of three samples, one short telephone conversation from the LDC, collected by an Australian company called Appen, and two more texts collected by an anthropologist, Dr. Bill Young. These texts form the basis of Owens et al. (2009, 2010, 2013). Jedda is represented in one long text in which two native speakers, both about 50 at the time of the recording, engage in an often lively discussion about issues of social relevance to Saudi Arabia. The LCA texts are from a large collection of about 400,000 words which were collected between 1991 and 2001 in northeast Nigeria (see 6.5). Most of these are available online at the address in the bibliography. The speakers in these texts are native-born Nigerians, though some of the younger speakers in NA-5 are children of a family from Ndjamena. Seven variables were defined with the values set out in Table 11.4 below. All tagging had to be done by hand, so there is, unfortunately, a limit to how many texts can be tagged. For LCA, five texts were chosen which are roughly of the same total length as the Emirati. Furthermore, very roughly three genres are represented in each. Both have one text consisting of narratives and folk tales (Al-Rawi, N5), both have informal conversation (E-Appen, E-1, LCA 3, 4) and both have formal interviews conducted, in the case of LCA conducted in part, by a non-Arab Arabic speaker. Text E1 is a mixed genre in that it begins as an interview by a non-Arab, but essentially gets continued as a conversation between two Emiratis (from Al-Ain). The Hijazi text is an unstructured conversation. The basic data is as follows.

Table 11.3 Summary of texts used in comparison

                                              clausesᵃ  # participants
E1 Al-Rawi (1990), folk tales                 1101      8
E2.1 interview with unscripted conversation   1426      3 (1)
E2.2 unscripted telephone conversation        177       2
E2.3 interview                                543       2

H unscripted conversation                     2,139     2

NA-1, interview, conversation                 1074      3 (1)
NA-2, interview/conversation, village         683       4 (1)
NA-3, unscripted conversation                 282       3
NA-4, unscripted conversation                 756       4
NA-5, folk tales                              945       6

ᵃ In this case, clauses include full clauses, with dependent clauses such as relative clauses defined as a clause independent of the clause in which they are embedded, as well as fragments, false starts, and backchannel and other sub-clausal discourse markers.
Notes: unless otherwise stated, the LCA interviews were conducted in Maiduguri, Borno State. For the Al-Rawi Emirati, eight different individuals contributed stories. The number in parentheses indicates how many of the total number of participants were non-native speakers. Their linguistic contribution is not included in the statistics, and in all cases their contribution is very minimal, for instance in EM2.1 less than 30 words in total.

In these texts there are a total of 3,247 clauses in Emirati, 2,139 in Jedda and 3,740 in LCA. Clauses include both verbal and non-verbal clauses, as well as sentence fragments (false starts, one-word answers, and backchannel markers like nzeen ‘right, okay’, Hassan and Owens 2008). The main statistical tests only treat clauses with predicates as described in 11.3.
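The three totals can be checked against the per-text clause counts in Table 11.3. The following is a small sketch for that check only: the counts are copied from the table, and the variable names are illustrative.

```python
# Clause counts per text, copied from Table 11.3; the names are illustrative.
clauses = {
    "Emirati": {"E1": 1101, "E2.1": 1426, "E2.2": 177, "E2.3": 543},
    "Jedda": {"H": 2139},
    "LCA": {"NA-1": 1074, "NA-2": 683, "NA-3": 282, "NA-4": 756, "NA-5": 945},
}

# Sum the texts within each variety, then across all varieties.
totals = {variety: sum(texts.values()) for variety, texts in clauses.items()}
grand_total = sum(totals.values())

for variety, total in totals.items():
    print(f"{variety}: {total}")
print(f"all texts: {grand_total}")
```

The per-variety sums match the totals given in the text (3,247, 2,139, and 3,740).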

11.4.2 The parameters

Seven independent parameters describe the conditions under which reference to either null or overt subjects is effected. They are based on the work of Prince (1981), Givón (1983) and others, and have been operationalized in a number of recent quantitatively orientated studies (e.g. Owens et al. 2013). The essential categories have been introduced already in 9.2. The parameters have been described in a number of articles and are similar to those used in other discourse-based studies, and therefore they will be described only briefly here. It should be mentioned that the statistical analysis takes into account only those values where a free choice of subject type is possible (Labov 1972: 82). In (11.11), tə-rjaʕ-uun must agree with the main verb in terms of 2MPL and it must have a Ø subject, so it is excluded from the statistical count.

(11.11)

ti-gdər-uun  tə-rjaʕ-uun
2-able-MPL   2-return-MPL
‘can you get back (in time)?’ (Emirati)

Table 11.4 The variablesᵃ

Type           Variable                                 Categories
morphological  number of subject (marked on predicate)  singular vs. plural
morphological  person of subject (marked on predicate)  first, second, third
morphological  tense/aspect of predicate                perfect, imperfect, active participle
syntactic      complement of gaal ‘say’                 yes vs. no
lexical        lexical frequency                        (continuous)
discursive     previous mention of subject reference    in previous clause vs. environment
discursive     genre                                    conversational vs. procedural

ᵃ We note here only essential information. In actual fact it is often the case that an independent variable was tagged in a more detailed way than its ultimate definition in the statistical analysis takes account of. For instance, the variable whether the subject was, following Prince (1981), “brand new” or “inferable” was distinguished, but both types are invariably overt and hence were not treated in the statistics.

The linguistic variables are straightforward. As can be seen in Table 11.1, all verbs are inflected for singular or plural and for first, second, and third person. Gender was not included as a variable in this study. Arabic verbs are perfect, as in Table 11.1, imperfect (see paradigms from three dialects in Chapter 10) and a third alternative is the active participle, as illustrated in (11.7, 11.8) above. Lexical frequency counts how many tokens per lexical type occur in a sample. A lexical type is an abstraction¹⁶ in that it generalizes all related morphological forms, as defined under the linguistic variables above, i.e. a lexeme such as širib ‘drink s.t.’ may occur in perfect, imperfect (a-šrab ‘I drink’), AP (šaarib ‘has drunk’) form, with different person and number inflections (širb-at ‘she drank,’ širb-o ‘they drank’ etc.), each individual form counting as a token of the lexeme {širib}. Derived verbs, e.g. šarrab ‘have s.o. drink s.t.’ on the other hand are counted as separate lexemes. There is one syntactic variable. Previous studies showed that a predicate occurring as a complement of gaal ‘say’ (i.e. ‘say that Clause’) tends to favor an overt subject. This variable is also included in the current study. Previous mention describes the relation between the subject referent in the current clause to a previous mention in the discourse. Genre divides the texts into two types, conversation and procedural. Conversational texts are completely unsolicited as to topic as far as the interviewers go and the participants were free to speak whenever they wished to. Procedural texts are of two types. One is an interview. In Emirati Oral, for instance, an interviewer posed questions to the interviewee, who was bound to answer directly to the question. The others are folktales and narratives, where one speaker has the stage and expatiates, either following the script of the folktale or describing a cultural activity or artefact.
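The lexeme-based definition of lexical frequency can be contrasted with a count over individual surface forms. The following is a minimal sketch under stated assumptions: the forms and their lexeme assignments are those cited in the paragraph above, but the hand-made mapping `LEXEME_OF` stands in for real lemmatization, and the token sample is invented for the example.

```python
from collections import Counter

# Surface forms and their lexemes, taken from the examples in the text.
# The mapping is a hand-made stand-in for real lemmatization.
LEXEME_OF = {
    "širib": "širib",
    "a-šrab": "širib",
    "šaarib": "širib",
    "širb-at": "širib",
    "širb-o": "širib",
    "šarrab": "šarrab",  # derived verb: counted as a separate lexeme
}


def lexical_frequency(tokens):
    """Count tokens per lexeme, pooling all inflected forms of a lexeme."""
    return Counter(LEXEME_OF[token] for token in tokens)


# Invented sample of tagged predicate tokens, for illustration only.
sample = ["širib", "a-šrab", "širb-at", "šarrab", "šaarib", "širb-o", "šarrab"]

by_lexeme = lexical_frequency(sample)
by_surface = Counter(sample)

print(by_lexeme)
print(by_surface)
```

In this sample the lexeme {širib} has a frequency of 5 although none of its individual surface forms occurs more than once, while the derived verb šarrab is counted on its own.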

11.4.3 The statistical tests

In this section, we describe the data to be used for quantitative analysis of the constraints on subject expression in the four corpora, discuss the raw similarities and differences across the corpora, and finally compare the strength and ranking of constraints via logistic regression. The goal of the quantitative analysis is the evaluation of our H1 and H2. The major class of exclusions to the data prior to quantitative analysis has to do with the previous occurrence of the subject referent in the discourse, coded in the “previous mention” variable. Several of our initial categories of previous mention are fully or nearly categorical in all corpora and were therefore excluded: new referents, ‘available’ referents, and inferable referents all correspond to overt subjects in all four corpora, and impersonal constructions have null subjects (see Table 11.4). This leaves the three variable categories of subject in previous

¹⁶ In contrast to the treatment of Erker and Guy (2012: 530) which defines lexical frequency in terms of individual surface forms.

356

L ANGUAGE STABILIT Y II: WATCHING PAINT DRY

clause, non-subject in previous clause, and ‘environment.’ The first two of these categories were ultimately combined due to low token counts in the second. In addition, existential and imperative clauses were excluded from all four corpora. In both Emirati corpora, imperatives categorically have null subjects and existentials categorically have overt subjects. Hijazi imperatives are not categorically null, and in Nigerian, neither existentials nor imperatives are categorically null or overt; nevertheless, they were excluded from Hijazi and Nigerian to allow for true comparison across corpora. Finally, clauses with nominal predicates and wh-words as subjects were excluded. Tables 11.5–11.8 give token counts and overt subject percentages for each corpus. Note that the two Emirati corpora differ in size and in genre; the AlRawi/Emirati corpus has narrative but not conversational data.
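As a rough illustration of what a logistic regression estimates, consider a model with a single binary predictor. In that special case the maximum-likelihood fit has a closed form: the intercept is the log odds of an overt subject at the reference level, and the coefficient is the log odds ratio between the two levels. The sketch below applies this to the gaal complement counts reported for the oral Emirati corpus in Table 11.5. The function names are hypothetical, and this one-factor calculation is only an aid to reading; it is not the multivariate model used in the study.

```python
import math


def single_predictor_logit(null_ref, overt_ref, null_alt, overt_alt):
    """Closed-form logistic fit for one binary predictor.

    Returns (intercept, slope): the log odds of an overt subject at the
    reference level, and the log odds ratio of the alternative level.
    """
    odds_ref = overt_ref / null_ref
    odds_alt = overt_alt / null_alt
    return math.log(odds_ref), math.log(odds_alt / odds_ref)


def predicted_rate(intercept, slope, x):
    """Predicted probability of an overt subject for x = 0 or x = 1."""
    return 1 / (1 + math.exp(-(intercept + slope * x)))


# Counts from Table 11.5 (oral Emirati), gaal complement:
# 'no' -> 1175 null, 351 overt; 'yes' -> 15 null, 13 overt.
intercept, slope = single_predictor_logit(1175, 351, 15, 13)
rate_no = predicted_rate(intercept, slope, 0)
rate_yes = predicted_rate(intercept, slope, 1)
```

A positive slope here corresponds to a higher rate of overt subjects in gaal complements than elsewhere; with several predictors, the coefficients can no longer be read off the marginal counts in this way and must be estimated jointly.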

Table 11.5 Contingency table for the oral Emirati corpus.

                            Subject
                            Null   Overt   Total N (% overt)
Previous mention
  environment                367     170     537 (32)
  in previous clause         823     194    1017 (19)
Predicate type
  imperfect                  877     224    1101 (20)
  active participle (AP)      45      43      88 (49)
  perfect                    268      97     365 (27)
Person
  first person               390     144     534 (27)
  second person              232      39     271 (14)
  third person               568     181     749 (24)
Number
  singular                   747     263    1010 (26)
  plural                     443     101     544 (19)
Discourse type
  conversation               870     293    1163 (25)
  procedural                 320      71     391 (18)
gaal complement
  no                        1175     351    1526 (23)
  yes                         15      13      28 (46)
Total                       1190     364    1554 (23)
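The totals and percentages in Table 11.5 follow directly from the null and overt counts, and can be recomputed as a consistency check. The sketch below copies the counts from the table; the dictionary layout and the function names are assumptions made for illustration.

```python
# (null, overt) token counts per factor level, copied from Table 11.5.
TABLE_11_5 = {
    "Previous mention": {
        "environment": (367, 170),
        "in previous clause": (823, 194),
    },
    "Predicate type": {
        "imperfect": (877, 224),
        "active participle (AP)": (45, 43),
        "perfect": (268, 97),
    },
    "Person": {
        "first person": (390, 144),
        "second person": (232, 39),
        "third person": (568, 181),
    },
    "Number": {"singular": (747, 263), "plural": (443, 101)},
    "Discourse type": {"conversation": (870, 293), "procedural": (320, 71)},
    "gaal complement": {"no": (1175, 351), "yes": (15, 13)},
}


def overt_rate(null, overt):
    """Return (total tokens, percent overt rounded to a whole number)."""
    total = null + overt
    return total, round(100 * overt / total)


def summarize(table):
    """Compute total and percent overt for every level of every factor."""
    return {
        factor: {level: overt_rate(*counts) for level, counts in levels.items()}
        for factor, levels in table.items()
    }


summary = summarize(TABLE_11_5)

# Each factor partitions the same set of clauses, so the per-factor
# totals should all equal the corpus total.
factor_totals = {
    factor: sum(null + overt for null, overt in levels.values())
    for factor, levels in TABLE_11_5.items()
}
```

Because every factor partitions the same clauses, each set of levels sums to the corpus total of 1554 tokens, which gives a quick check on the transcription of the counts.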

Table 11.6 Contingency table for the Al-Rawi/Emirati corpus

                            Subject
                            Null   Overt   Total (% overt)
Previous mention
  environment                166      73     239 (31)
  in previous clause         579      74     653 (11)
Predicate type
  imperfect                  300      63     363 (17)
  active participle (AP)      11      10      21 (48)
  perfect                    434      74     508 (15)
Person
  first person               134      20     154 (13)
  second person               34       2      36 (6)
  third person               577     125     702 (18)
Number
  singular                   390     120     510 (24)
  plural                     355      27     382 (7)
Discourse type
  conversation                 0       0       0
  procedural                 745     147     892 (16)
gaal complement
  no                         706     132     838 (16)
  yes                         39      15      54 (28)
Total                        745     147     892 (16)

Table 11.7 Contingency table for Hijazi

                            Subject
                            Null   Overt   Total (% overt)
Previous mention
  environment                211     130     341 (38)
  in previous clause         460      90     550 (16)
Predicate type
  imperfect                  454     117     571 (20)
  active participle (AP)      16      22      38 (58)
  perfect                    201      81     282 (29)
Person
  first person               245      97     342 (28)
  second person              164      26     190 (14)
  third person               262      97     359 (27)
Number
  singular                   527     170     697 (24)
  plural                     144      50     194 (26)
Discourse type
  conversation               671     220     891 (25)
  procedural                   0       0       0
gaal complement
  no                         663     211     874 (24)
  yes                          8       9      17 (53)
Total                        671     220     891 (25)

The similarities in raw frequency of overt subject expression are striking and appear to support our H1. The overall rates of overt subject expression for the four varieties are 23%, 16%, 25%, and 23%. The lowest rate of 16% corresponds to Al-Rawi/Emirati, the one variety for which we have only narrative rather than conversational data. For conversational speech, which excludes Al-Rawi/Emirati, overt subjects occur at the rates of 25%, 25%, and 24%, respectively.¹⁷ There are also clear similarities with respect to the internal constraints. In all four varieties, overt subjects are less frequent when the subject referent occurs in the previous clause, relative to when it occurs in the preceding discourse environment but not in the previous clause. This is not surprising in light of previous work on subject expression. In addition, the rates of overt subject expression are substantially higher in all four varieties when the verb is an active participle (AP) relative to imperfect or perfect verbs. Finally, in all four varieties, second person subjects have lower rates of overt expression relative to first and third person subjects, and overt subjects are more frequent when occurring in a clausal complement to gaal ‘say.’

¹⁷ However, the Hijazi text was subjected to a finer analysis of subject expression in the conversation vs. narrative domain, and here it can be seen that the conversational segments contain more overt subjects (for this and other details, Owens 2019b: 87).

While the similarities among these historically and geographically distal varieties are remarkably strong, we also find differences. First, as noted, existential constructions are categorically overt in both Emirati corpora and in Hijazi, but not in LCA. The categoricity of existential clauses could, however, be an accidental gap, the result of limited data or genre rather than a truly categorical constraint. Similarly, imperative clauses categorically have null subjects in Emirati, but not in Hijazi or Nigerian, though this too could be accidental. Another apparent difference is that the higher rate of singular vs. plural overt subjects in Emirati does not appear in Hijazi and is weaker in LCA. A final difference across the corpora is the collinearity among constraints. The variables of previous mention and predicate type are not collinear, as determined by chi-squared tests, in Hijazi or LCA, but they are collinear in both the oral Emirati corpus (χ² = 8.1, df = 2, p