
Reflections on Syntax

BERKELEY INSIGHTS IN LINGUISTICS AND SEMIOTICS
Irmengard Rauch, General Editor
Vol. 101

The Berkeley Insights in Linguistics and Semiotics series is part of the Peter Lang Humanities list. Every volume is peer reviewed and meets the highest quality standards for content and production.

PETER LANG New York • Bern • Berlin Brussels • Vienna • Oxford • Warsaw

Joseph Galasso

Reflections on Syntax Lectures in General Linguistics, Syntax, and Child Language Acquisition

PETER LANG New York • Bern • Berlin Brussels • Vienna • Oxford • Warsaw

Library of Congress Cataloging-in-Publication Control Number: 2021010269

Bibliographic information published by Die Deutsche Nationalbibliothek. Die Deutsche Nationalbibliothek lists this publication in the “Deutsche Nationalbibliografie”; detailed bibliographic data are available on the Internet at http://dnb.d-nb.de/.

ISSN 0893-6935
ISBN 978-1-4331-8432-1 (hardcover)
ISBN 978-1-4331-8433-8 (ebook pdf)
ISBN 978-1-4331-8434-5 (epub)
DOI 10.3726/b18267

© 2021 Peter Lang Publishing, Inc., New York 80 Broad Street, 5th floor, New York, NY 10004 www.peterlang.com All rights reserved. Reprint or reproduction, even partially, in all forms such as microfilm, xerography, microfiche, microcard, and offset strictly prohibited.

To Nathalie, Nicolas, Zoe, and Daphne …

Contents

List of Figures and Tables ix
Preface xi
Overview xv
Introduction 1

Lectures

1 Opening Philosophical Questions: Language and Brain Analogies 11
2 Preliminary Overview 47
3 The ‘Four Sentences’ 65
4 Reflections on Syntax 93
5 Reasons for Syntactic Movement/‘Four Sentences’ Revisited 125
6 The Myth of ‘Function Defines Form’ as the Null-Biological Adaptive Process and the Counter Linguistics-Based Response. (The ‘Accumulative Lecture’) 141

Appendixes

A1: Poverty of Stimulus 175
A2: Concluding Remarks. The Dual Mechanism: Studies on Language 183
A3: A Note on ‘Proto-language’: A Merge-Based Theory of Language Acquisition—Case, Agreement and Word Order Revisited 195
A4: Concluding Remarks: Lack of Recursion Found in Protolanguage 225
A5: A Note on the Dual Mechanism Model: Language Acquisition vs. Learning and the Bell-Shape Curve 235
A6: Overview of Chomsky 247

Works Cited 251
List of Terms (informal definitions) 253
Full References and Web Links 257
Index 267

Figures and Tables

Figure 1 Input-Output Model of LF 12
Figure 2 LF Model 16
Figure 3 Model of Linguistic Expression 26
Figure 4 Fibonacci Sequence Yields Syntactic Tree Diagram 36
Figure 5 Principles & Parameters Model 41
Figure 6 Fibonacci Spiral Formation 48
Figure 7 Interface Systems 70
Figure 8 Version of an AI ‘Treelet’ Structure (Showing Hierarchy). Marcus (2001). ‘Box Inside Pot’ (NOT ‘Pot Inside Box’) 186
Figure 9 Magnet Effect and Clustering of Target Phonemes. (Google© ‘Free-to-Use’ Image) 189
Figure 10 Hidden Markov Model (Figures 10–12: Google© ‘Free-to-Use’) 191
Figure 11 Multilayer Perceptron 192
Figure 12 Müller-Lyer Illusion 192
Figure 13 Template Scaffolding Linguistic Theory 240
Figure 14 Bell-Shape Curve/Competency of Learned Skill 241
Figure 15 Bell-Shape vs. Right Wall: Biological Basis for Language 242

Table 1 Statistics of Stage-1 vs. Stage-2 Possessive {‘s}, Verbal {s} 229
Table 2 Frequency of Occurrence of First-Person Singular Possessors 230
Table 3 Verb-Subject (VS) Structures/Token Counts (Word Order) 232

Preface

Whether it be … constraints placed on phonological assimilation which stipulate that in order for the horizontal spreading of voicing to occur between two adjacent consonants, they must first be of a ‘sisterhood’ relation; whereby, for example, the /r/ in ‘cars’ provokes assimilation of plural /s/ => /z/, in contrast to the /r/ in ‘Carson’ which does not. (The former a structural ‘sisterhood’ relation, the latter a ‘mother-daughter’ relation in terms of ‘family-tree’ hierarchy):

(i)
1. ‘Cars’ [k [a [rz]]] => /karz/: assimilation of ‘voicing’ applies between sisters, so /s/ = /z/.
2. ‘Carson’, broken up by a syllabic stress /$/ … /kar $ sən/ (CVC $ CVC): /s/ = /s/ *(not /z/). *No assimilation between mother and daughter.
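Stated as a procedure, the sisterhood condition can be sketched as follows. This is a minimal illustration of my own, not the book’s formalism; the voiced-segment set and the transcriptions /kar/ and /kæt/ are simplifying assumptions:

```python
# Sketch: plural /s/ voices to /z/ only when it stands in a 'sisterhood'
# relation to the stem-final segment; a syllable-internal /s/ (a
# 'mother-daughter' relation, as in 'Carson' /kar$son/) never assimilates.

VOICED = set("aeiou") | set("bdgvzmnlrw")  # simplified set of voiced segments

def pluralize(stem: str) -> str:
    """Attach the plural affix as a sister to the stem: [[stem] s/z]."""
    affix = "z" if stem[-1] in VOICED else "s"
    return stem + affix

print(pluralize("kar"))  # 'karz' ('cars'): voiced /r/ spreads voicing to its sister
print(pluralize("kæt"))  # 'kæts' ('cats'): voiceless /t/, so the affix stays /s/

# 'Carson': its /s/ sits inside the second syllable (mother-daughter),
# so the sister-assimilation rule above never even sees it.
```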




Or whether it be … the naïve view that the two apparently adjacent final-position sounds of /_ks/ as found in the two words ‘fix’ /fIks/ versus ‘speaks’ /spiks/ surely must get processed similarly (they do NOT) calls on us to reconsider something much more insidious going on in the underlying structure of morphophonology:

(ii)
1. ‘Fix’ /fIks/: the final /k/ and /s/ are sisters within the stem.
2. ‘Speaks’ /spiks/: [[speak] s], a stem + affix (mother-daughter) structure.

The differences in processing resulting in such distinctions can account both for developmental errors (as found in child language) as well as for Second Language (L2) errors of omission: viz., a lexical stage-1 of child language often deletes the final ‘mother-daughter’ /s/ inflectional affix but never deletes the final ‘sisterhood’ /s/ stem-based element (with similar findings for L2 adults).

And then … to think that such scaffolding of ‘mother-daughter’ hierarchical structure which yields a recursive syntax comes to us for ‘free’—part-and-parcel of the design of the human brain/mind—is something to wonder at. This is what this book is about—‘the wonder and unfolding of recursive syntax’, and the manner in which it has forced the field of modern-day linguistics to reconsider old assumptions we once held dear—old assumptions which were hard to kill off, but which had to eventually die at the stroke of the generative grammar enterprise.

The chapters contained in this book derive from a series of accumulative course lectures given across a span of several semesters to my graduate students of theoretical syntax, as well as to my many undergraduate students of child language acquisition, both at California State University Northridge and at Cal State Long Beach, where I have lectured as an adjunct professor over the past twenty years. I’d like to thank all my students over the years who have helped shape these lectures. Our collective class discussions have sharpened my own understanding of these issues. If these lectures in linguistics have improved at all since their first incarnation, it is only because they have benefited from the many discussions, multifaceted argumentation, and the steadfast persistence in seeking out diverting points of departure on given topics—all respectively instigated by you, my students, over those years.

These lectures are immensely Chomskyan in spirit, recursive-syntactic in nature, and are tethered to a framework which takes as the null hypothesis the notion that language is an innate, pre-determined biological system—a system which, by definition, is multi-complex, human-specific, and analogous to a philosophy highly commensurate with Descartes’ great proverbial adage which announces the calling for a ‘Ghost-in-the-machine’. And for those today who would wish away Descartes’ Mind-body dualism as no longer tenable, Chomsky turns the table on them by suggesting that all we have achieved, thus far, is to exorcise the machine (via Newtonian mechanics); we have left the ghost intact. So, if philosophical dualism is claimed to be no longer tenable (which I believe is a false claim), then it is not for the typical reasons assigned to the break. Rather, dispensing with a duality, all we are left with is the singular haunting ghost (Chomsky 2002, p. 53).

Conceptually, the production of this book comes out of an abridged edition of my theoretical monograph entitled ‘Recursive Syntax’ (LINCOM Publications, Studies in Theoretical Linguistics, 61, 2019), where, as a conceptual pedagogical device, I address the syntactic implications of the Four Sentences. While the ‘four sentences’ came to me merely as a point of departure, as a sort of omnibus tour of Chomskyan syntax over the last half of the past century, it also occurred to me to show how recursive designs of language—i.e., reflections of syntax—might play a significant role in so many different spin-off areas. These after-thoughts formulate much of the material found in the appendixes of the text.

I’d like to thank all my colleagues on the faculty of linguistics at California State University—Northridge, where I have been a proud part of this fine theoretical department over the past twenty years. I’d very much like to thank all involved with the production of this text: Tony Mason, Jackie Pavlovic, Abdur Rawoof, as well as Naviya Palani along with all the editors on her production team at Peter Lang.

Overview

A Brief History of Psychology. Let’s begin with some interesting historical analogies related to (i) the technology interface to learning, and (ii) brain analogies. It’s interesting to question what the many psychological impacts have been on the state of our human evolution. For instance, we can start with the invention of paper and what its lightweight, efficient economy and easy transport have meant for the establishment of learning. (I am reminded of the discovery of the Archimedes Palimpsest, the 10th-century manuscript found when its original vellum, the dried animal skin used before paper, revealed what was just underneath its surface—as monks, three centuries later, recycled earlier vellums by scraping off the prior script. We can only be horrified by the sheer volume of writing lost over time). Of course, the typical inventions follow—all of which bring very different psychological impacts: the (movable type) printing press and how the eventual spread of knowledge (sciences, religion) played on our human psychology; the typewriter, the PC computer, advancing software (the ability to cut & paste and copy), the floppy-disk … through to all the trappings of the ‘internet’ (first called the ‘ethernet’, and then the ‘information superhighway’: metaphors for ‘ethereal & otherness’—the neither ‘here nor there’—and of unfathomable ‘speed’). These innovations are often reduced and treated as ‘hardware’ developments, as artifacts—but it is indeed interesting to ask, in retrospect, what such


incremental progress meant for our human psychology, what it meant for our human, biologically-based ‘software’ (i.e., the human mind). It is instructive to look for psychological impacts and to ask how human experiences regarding our interface with such innovations have helped shape our understanding of ourselves, our fellow man, as well as the world around us.

Brain Analogies. This leads to so-called (historic) ‘brain analogies’. For instance, it was once understood that the brain was analogous to mechanical ‘clocks’ (of the philosophical ‘clouds & clocks’ argument; see Karl Popper, ‘Objective Knowledge: An Evolutionary Approach’, Ch. 6). In this antiquated notion, the ‘brain as clock’ was said to be made up of levers and gears which would interact in very trivial ways with the environment. The most obvious interaction with our brain-as-clock metaphor was to count and remember things. A person was understood to be the mere product of the things we came across in our environment, the things we noticed and counted: ‘man as ultimate calculator’. (See Locke’s notion of man as a blank slate, a tabula rasa). Whether or not a person was ‘smart’ was based on how well he noticed and remembered his token counts of environmental interactions; of course, there was no notion as to why a person might notice one thing over another (that question might have more to do with the psychology of observation, ‘a cloud’). The idea of how a ‘bad’ experience might impact our brain/clock—for example, how a person’s personality might be affected and formed—was not considered. In fact, such ideas of ‘personality’ really don’t begin to be formulated, psychologically, until the 19th century, coming to bear on the work of Freud, etc. (But there were earlier antecedents for sure, found in 17th-century early modern English literature: e.g., Shakespeare’s first psychological profile of Macbeth).

The notion of ‘brain as cloud’ begins to take shape in the second half of the 20th century with the work of the linguist Noam Chomsky, who himself brings forth 17th-century arguments from the Age of Enlightenment’s own René Descartes in forming a ‘Cartesian linguistics’. This new ‘spookiness’ of the brain (a nebulous-like cloud)—an Enlightenment invention—was a large part of the underpinnings of Chomsky’s theoretical pursuits in linguistics: viz., how Chomsky questions the ‘direct-environmental’ theories of language acquisition of the day. A clock account of language acquisition would have it that the child’s sole role in language acquisition is to consciously count the linguistic items she comes across in the course of her language growth. This direct ‘input-to-output’ imitation scheme was a large attribute of the theories behind general learning procedures as advanced by the behaviorist schools of thought, e.g., B.F. Skinner. (See imitation to analogy to computational below).

This classic dichotomy of Empiricist thought (e.g., Aristotle, John Locke, John Stuart Mill, and B.F. Skinner) versus Rationalist thought (e.g., Plato, Descartes, Spinoza, and Noam Chomsky) has remained constant over time, with only the refashioning of terminology making the epic debate seem contemporary. For instance, the current developmental/psychology debate over ‘Nature vs. Nurture’, more currently coined as ‘the nature of Nature’, is nothing more than a reworking of the earlier Skinner vs. Chomsky (1959) debate (e.g., see Galasso 2003, as cited in Owens 2007).1

A central tenet of the debate revolves around morphological storage: viz., how stem + affix combinations are processed—whether or not stems + affixes are stored as a single item (the former constituting a single-mechanism model, the latter a dual model). Of course, even Skinner knew that the grammatical construct for the regular plural in English was N + s = Pl (e.g., ‘Books’). The question here is not about the spell-out of the grammar, per se, but rather about how such morphological grammars which hold across inflectional morphology get processed. (Skinner’s model of how language is learned would require that, e.g., [book] and [books] are rendered as two separate stimuli, individually coded in the environmental context as ‘singular vs. plural’ respectively: the two items make up two independent memory schemes, like how go is memorized to change to went).

Proponents of a single mechanism advocate a full-listing hypothesis which has it that all lexical stems + affixes get processed (stored & retrieved) as a single lexical chunk […]; both regular as well as irregular constructs similarly process as undecomposed chunks insofar as their ‘full listing’ is directly pulled from out of the lexicon. Such memorized schemes indeed lean towards Skinnerian behaviorism since no decomposed rules are required. Such a full listing would not only have the items ‘book’ and ‘books’ as two separate (undecomposed) lexical items, but all derivations of a word would likewise be memorized as separate, independent items—e.g., [speak], [speaks], [speaking], [spoke]*, [spoken] *(noting that a dual pathway would still credit the irregular past tense ‘spoke’ as derived via an undecomposed lexical chunk).

A hybrid model, sometimes referred to as a dual-pathway hypothesis and more commonly known in generative circles as the Dual Mechanism Model, suggests that such undecomposed memorized schemes do prevail with irregulars—where rules can’t be attributed to their construct (e.g., go>went, dream>dreamt (note

1 Owens, Robert (2007). Language Development: An Introduction, 7th edition.


the sound shift), but critically not dream>dreamed, which would be a dual model). Many proponents of the dual model even go so far as to compromise (playing nice by debate standards) by crediting a single pathway for what appears to many to be a ‘common regular present tense verb’—for example, the verb ‘do’ seems to fall into the irregular single-mechanism route when considering that the present tense inflection of the verb ‘do’ undergoes the same type of sound shifts obliged by a single route of storage & retrieval: e.g., note the verb-stem vowel shift for present tense do>does (similar to past tense dream>dreamt), and, of course, the past tense do>did has all the hallmarks of an irregular verb. Perhaps a phonological clue here to the dual-hybrid pathway is to say that whenever there is a stem sound shift within the morphological paradigm, a new word must be realized, stored and retrieved in the lexicon as a new single item. But the hybrid dual model provides strong evidence that decomposed rules are indeed obligatory for regular stem+affix formations (where no stem sound-shifts are observed). I have suggested in various writings that the dual mechanism model can be extended to capture the dual morphological stages of child language acquisition, where a stage-1 memory of undecomposed lexical schemes comes on-line prior to a stage-2 of decomposed inflectional rules. Much work of the 1960s generative grammar enterprise, including Berko’s famous Wugs test, set out to demonstrate the nature of such creative and productive morphological processes as only decomposed rules could offer. In this sense, I often use the Skinner v Chomsky debate as a kind of ‘pedagogical device’ in capturing the debate on morphological processing.

Skinner: singular [book], plural [books] → a ‘single’ processing: [memory stem], non-rule-based

Chomsky: singular [book], plural [[book] s] → a ‘dual’ processing: [stem [+ affix]], rule-based
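As a rough computational gloss on this contrast, here is a minimal sketch of my own (the toy lexicons are assumptions, not data from the book) contrasting the single full-listing route with the dual decomposed route:

```python
# Single route (Skinner-style): every form, regular or not, is its own stored item.
FULL_LISTING = {"book", "books", "go", "went", "dream", "dreamt"}

def single_route(form: str) -> str:
    return f"[{form}]" if form in FULL_LISTING else "?? (never heard, so no output)"

# Dual route (Dual Mechanism Model): irregulars are retrieved as undecomposed
# chunks; regulars are built by rule, [stem [+ affix]], so nonce stems work.
IRREGULAR_PAST = {"go": "went", "dream": "dreamt"}  # stored, note the sound shifts

def dual_route_past(stem: str) -> str:
    if stem in IRREGULAR_PAST:
        return f"[{IRREGULAR_PAST[stem]}]"   # memory-based chunk
    return f"[[{stem}] ed]"                  # decomposed, rule-based form

print(single_route("books"))     # [books]: stored separately from [book]
print(single_route("wugs"))      # ??: a pure listing model has nothing to say
print(dual_route_past("dream"))  # [dreamt]: chunk (sound shift, no rule)
print(dual_route_past("wug"))    # [[wug] ed]: the rule extends to nonce stems
```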



This [[…] edge]-feature, as we will show below and throughout the text, allows for the most crucial property of language of all, that of recursion. It is this outside property at the edge which is abstract and removed from the obligatory binding of memory-based schemata. Without such an ability for displacement, movement, affix lowering and raising, inversion, fronting, and most crucially embedding, there could not be a syntax as it manifests in human language.

The Dual Mechanism Model. This dichotomy in essence leads to how a ‘hybrid model’ (i.e., a Dual Mechanism Model (Appendix-2, 5)), which can incorporate both clocks and clouds, might be embedded in our psychological processes of language, and how these two modes of processing indeed have a real physiological presence in our brain: viz., the idea that there are two fundamentally different areas of the brain which bring about this dichotomy of processing. A very simple example of this dual processing could be how an English speaker differently processes the two verbs DO in the expression ‘How do you do?’, noting how both verbs ‘DO’ have the same spelling and the same phonology, but how they may hold different meanings—of course, also how the two may have very different psychological realities. Try to guess which DO is a clock, and which is a cloud: which has ‘calculative’ meaning and which is ethereal, ‘neither here nor there’. Regarding the Dual Mechanism Model, we can simply note how the expression ‘How do you do?’ allows the deletion of the first ‘do’ but not the second ‘do’ (in quick, spontaneous speech). But why might this be, given that, at least on the surface level, the two verbs appear identical?

Note: Another example of such surface-level deception is found in the processing distinctions of the Spanish Los vs. Las, where the former is an undecomposed memory chunk [Los] and the latter a decomposed stem+affix determiner [[La] s]—with different reaction-time processes: ‘Las’ obtains the signature of rule-based stem+affix dual processing attributed to the movement of the affix {s}, while ‘Los’ (being replaced by the masculine determiner ‘el’ within the inflectional paradigm: *lo/el niño vs. la niña) reduces to signal as a mere lexical item without movement: viz., the {s} in [los] rather incorporates into the stem. Spanish speakers don’t process the masculine/plural ‘Los’ as equivalent to the feminine/plural ‘Las’ (despite their surface-level identity). There was a time when ‘lo’ was a functioning determiner in Spanish, but it has since been supplanted by the paradigm la>las, el>los. Regarding our ‘clock analogy’, one could say that Spanish speakers no longer count ‘los’ and ‘las’ as being two separate minutes on the same clock. They are on entirely different clocks: one a clock, the other a cloud.


(Also see processing distinctions between very high-frequency regulars such as ‘walked’ and very low-frequency regulars such as ‘stalked’, with high-frequency regulars priming like undecomposed irregular-stem verbs ([go]>[went]). See Ullman, Clahsen).

Well, the two DOs are not identical! In fact, the two have very different psychological states, are located in different areas of the brain, and hold different linguistic status. The first ‘Do’ is an auxiliary verb (a cloud) which doesn’t deliver the kind of meaning usually attributed to main verbs, while the second ‘Do’ (a clock) is a main verb which delivers main-verb meaning. As we can quickly see, the two verbs hold very different linguistic and psychological status: note the acceptance (in spontaneous speech) of ‘How __ you do?’ versus *‘How do you __?’ (where __ marks deletion of ‘do’). One question that naturally follows, in addressing a brain-to-language corollary, is whether the two different verbs occupy different areas of the brain. They do! The discussion now leads to abstract/non-substantive functional-category words (clouds) such as the Determiner ‘The’ and the Auxiliaries ‘Do, Be, Have’ (this is not an exhaustive list), versus substantive/lexical-category words (clocks) such as Nouns, (Adjectives), Verbs (Adverbs), Prepositions—and how the former functional categories are attributed to Broca’s area (front left hemisphere), while the latter lexical categories are attributed to Wernicke’s area (left temporal lobe). One question is how this psychological distinction plays out in the trajectory of (i) Child Language Acquisition, as well as in the production of (ii) Second Language learning. Particularly, the matter of whether there’s an incremental maturational onset of Broca’s area for children—if clouds develop after clocks.2 By extension we can similarly ask if the human Frontal Lobe (FL) of the brain (the ultimate cloud) could have only become neuro-wired later—as a ‘cascading’ consequence—after an earlier neuro-onset of the Temporal Lobe (TL). Linguists who study ‘Proto-Language’ (Appendix-4) have this progression in mind. In any case, we know the more robust TL was neurologically connected well before the onset of a FL (both in the development of the child, as well as evolutionarily), and that earlier hominid species may only have had access to TL processes (see Footnote 2). If so, we might expect functional/Inflectional morphology to go missing in early stages of child language acquisition (2–3 years of age). This is what’s behind a maturational hypothesis of child language (e.g., Radford & Galasso 1998).

2 From ‘clock to cloud’ mirrors the evolution of ‘animation’ (primate) to ‘cogito, ergo sum’ (human): a kind of ‘ontogeny recapitulates phylogeny’ argument.

Priming-effects & Slips. Priming effects are another example of ‘brain as clock’ since the frequency of token items seems to affect how we come to notice and remember certain language structures. We can give a couple of easy examples here: (i) Word Recognition: how the word ‘Bread’ (scrambled as ‘Breda’) seems to be unaffected when supported (top-down) by the priming of another strongly ‘associated’ word: e.g., ‘butter’ primes ‘bread’. (We also include the example ‘Easter B__’ to see what a word beginning with B__ might prime: Bunny, Basket, Break, Bonnet are to be expected, but not Barbeque). The notion that such priming is tied to the number of token examples as found in the input puts the brain squarely in the ‘clocks’ camp. We can also discuss how frequency effects (as supported by Behaviorism) led to models of Artificial Intelligence (AI) as supported by the school of thought called Connectionism (see link below for a paper on AI).3 (ii) Slips of the tongue: For instance, we might suggest that if one possible slip is allowed but another slip is disallowed, this disparity might be a response to the dual mechanism model, raising questions regarding speed of processing. Consider the two slips below (one permitted, the other not):

a. Target: ‘The car was driving’
b. Slip (allowed): ‘The drive was caring’
c. Slip (disallowed): @‘The driving was car’.

3 We’ll discuss other examples, e.g., ‘Sentence No. 4’, regarding how the high frequency of the clitic form [that’s] should similarly override the much lower frequency of the full form [that is], but in fact the opposite happens—the lower-frequency item wins (a very ‘spooky’ consequence which speaks to the brain-as-cloud metaphor). Such examples show how frequency may not always win out in terms of a race/competitive model of processing, and speak to the brain as processing language in two fundamentally different ways (Wernicke’s area rote-memorization [+Freq] vs. Broca’s area rule-based [-Freq]). For my ideas on AI, see: https://www.academia.edu/39578937/Note_4_A_Note_on_Artificial_Intelligence_and_the_critical_recursive_implementation_The_lagging_problem_of_background_knowledge_1


Priming-Effects and Syntactic Constraints on Garden-Paths. (See Sentence No. 3 of the ‘Four Sentences’ for a full discussion of garden-path sentences). Examples of ‘top-down’ processing, whereby language structure is employed to facilitate interpretation of sentences, also show us how language can’t simply be read off as a ‘beads-on-a-string’ theory, and further show how our mind doesn’t work like a clock but rather is dependent on abstract structure (a ‘mind as cloud’). Consider how the sentences below become incrementally better (faster in processing) based not on the speed of 1–1 ‘items’, but rather on ‘structure’. Examples of priming effects (‘top-down’ processes helping with ambiguous structures):

a. The woman painted died. (= The woman (who was) painted died)
b. The woman painted last week died.
c. The woman painted by the artist last week died.

Questions: How does this progression from ‘less-to-more’ syntax (a–c) shape processing—i.e., does more top-down syntax yield a lesser degree of ambiguity (d–f) and fewer garden-path results (g–j)?

d. The man touched Mary with silk gloves. (Who has silk gloves: the man or Mary? This is an ambiguous sentence).
e. The man touched Mary so softly with silk gloves.
f. The man touched Mary so pretty with silk gloves.

Do you interpret (f) as ‘Mary’ wearing the silk gloves, and (e) as ‘the man’ wearing the silk gloves? Why? Well, one might explain as follows: the word ‘soft’ primes for the verb ‘touching’ (a soft touch/*a pretty touch); the word ‘pretty’ primes for the noun ‘Mary’ (Mary is pretty/*Mary is soft).

g. The horse raced past the barn __. (a fine sentence: grammatically accepted)
h. The horse raced past the barn fell#. (#garden-path: usually not accepted)

*(h). As soon as the final verb ‘fell’ is uttered, most people suffer the garden-path effect and don’t want to register the verb ‘fell’. However, (h) quickly gets resolved by (j) below, but not by (i):

i. *The horse that raced past the barn. (*not a complete sentence; compare to (g))
j. The horse raced past the barn fell to the ground yesterday.

In (j), the prepositional phrase ‘to the ground’ and the adverb ‘yesterday’ reduce the garden-path effect. In such priming, speed of processing is untethered to the frequency of the individual item/items, per se, but rather is tethered to overall syntactic structure. In other words, in (e–f) both words here, soft and pretty, as individual lexical items carry the same speed of individual processing, but when embedded within a syntax the items spread over and act upon the overall structure in very different ways.4 This is a ‘brain as cloud’ scenario. (See Appendix-5 regarding the ‘Bell-shape Curve’ in terms of this discussion on frequency, the ‘brain as clock’).

*Note in (i): by the addition of only one word, ‘that’ (a ‘non-substantive’ word to boot), we reduce the previously accepted reading of (g) and render it ungrammatical. Indeed spooky: by adding a single word to an already existing grammatically correct sentence we alter it beyond grammatical recognition. This does seem to defy mathematical logic: but then again, language doesn’t have the characteristics of a calculative math: Language is a cloud!

Slips of the Tongue Errors. Such errors show a ‘language-to-brain’ corollary: viz., how the brain processes language in ‘two’ fundamentally different areas of the brain (and not as a ‘single’ clock mechanism), and how these dual linguistic properties are theoretically pegged to their relevant functional vs. lexical categorial distinctions. Specifically, we note the morphological distinctions having to do with stem vs. inflectional morphologies—how the verb+affix must be decomposed as [driv[ing]] and not as a holistic word/stem chunk *[driving].

4 ERP experiments show that individual processing times (their ‘speed’) of lexical items and/or of processing glitches of such items show a contrast: with lexical ‘items’ being very fast, N400 (400-millisecond time), while the spreading effects related to ‘syntax’ are a bit slower, e.g., P600 (600-millisecond time spans). E.g., ‘I need to butter my socks’ would provoke an N400 (an item-based error), not a P600.


If the verb ‘driving’ (and if all stem+affix formations were mere lexical incorporations) were simply memorized as a new word, then we would expect the slip in (c) to occur often: in fact, such slips/errors are completely @unattested: they don’t occur!

Structure of slip: Note how lexical items-[] switch with their counterpart lexical items-[], while the functional (INFL)ectional affixes […[INFL]] remain untouched by the displacement:

a. The [car] was [driv [ing]].
b. The [drive] was [car [ing]]: (lexical items [car], [drive] switch and displace).
c. @The [driving] was [car]: [driving] can’t incorporate stem+affix as a lexical chunk.

Result: ‘driving’ must be processed as [driv [ing]], where only the stem [drive] is free to lexically switch with the counter lexical stem [car], but where the affix {ing} remains isolated as a decomposed affix (separate in storage & retrieval from the stem). Any morphological full-listing theory which posits that stem+affix come fully incorporated as a ‘single lexical chunk’ [driving] doesn’t capture the evidence here: if a full-listing hypothesis were correct, we would expect (c) as a potential slip. (@ But it’s unattested!)

This shows the Dual Mechanism Model behind the [stem [+affix]] decomposition—​an attribute of the ‘language-to-brain corollary’.
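A small sketch of my own illustrating the constraint: representing each word as a (stem, affix) slot pair, a slip may exchange stems, but the edge affix {ing} is never carried along:

```python
# Words as decomposed (stem, INFL-affix) pairs: 'driving' = [driv [ing]].
target = [("The", ""), ("car", ""), ("was", ""), ("driv", "ing")]

def render(words):
    return " ".join(f"[{s} [{a}]]" if a else f"[{s}]" for s, a in words)

def stem_swap(words, i, j):
    """The attested slip: lexical stems exchange; edge affixes stay in their slots."""
    out = list(words)
    (si, ai), (sj, aj) = out[i], out[j]
    out[i], out[j] = (sj, ai), (si, aj)
    return out

print(render(target))                   # [The] [car] [was] [driv [ing]]
print(render(stem_swap(target, 1, 3)))  # [The] [driv] [was] [car [ing]] ~ slip (b)

# Slip (c), @'The driving was car', would require moving stem+affix as one
# undecomposed chunk; the grammar sketched here provides no such operation.
```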

What is the Nature of this INFL-Affix/Outside Bracket [[ ]]? In theoretical linguistics (Chomsky’s Minimalist Program, 1995), the outer bracket [[ε]] of the structure [α, β [ε]] constitutes what is called the ‘Edge’ feature* {ε}, a slot-position within the formation of both words and phrases which houses only abstract/recursive properties of language. This recursive slot found on the ‘edge’ of language formation [[ ]ε] exclusively holds both Functional categories as well as INFLectional morphology. These two processes, with regard to both functional word formation and inflectional morphology, draw unique parallels to a language-to-brain corollary insofar as Edge properties behave completely unlike their frequency-sensitive lexical counterparts (e.g., nouns and verbs). Language elements labelled as ‘Edge features’ are directly attributed to Broca’s area of the brain. Hence, any maturational theory of child language would have to consider the delays and onsets of such edge properties

of language, given their alignment and mapping to the maturational nature of Broca’s area. (Broca’s area is understood as not coming ‘online’ in the child until around 3 years of age). Consider our discussion below. Note how both functional categories, like the Determiner (DP) or Auxiliary verb (AUX), as well as INFL-morphology, always occupy the outer edge feature, with lexical-stem/derivational* morphology occupying the center bracket. (See below for a recap of functional/INFL elements of language).

*Note: Instances of Derivational morphology count as new-word formations, hence their sound-shift quality—consider the rule: ‘whenever there is a sound shift, there is a new word’. And new words must get stored and retrieved as new words; they must be listed in the lexicon as completely separate words (e.g., (V) celebrate > (Adj) celebratory (note the vowel shift on the third syllable from /e/ to /ə/), (Adj) real > (N) reality (sound shift on the second syllable), secret>secretary, etc.). These sound shifts are a hallmark of new-word formations, as seen with Noun-to-Verb word change: e.g., (N) bath /æ/ (V) bathe /e/; (N) glass /æ/ (V) glaze /e/; (N) house /s/ (V) house /z/, etc.

Keep in mind our main idea about the Clouds & Clocks metaphor and begin to question how language can’t simply be a singular clock. Later we’ll call this naïve assumption that language is like a clock a ‘beads-on-a-string’ theory (a theory with all the trappings of Skinner’s behaviorism). Below, let’s recap what Lexical vs. Functional categories are (along with their INFLectional vs. DERivational morphology counterparts):

A Lexical vs. Functional Recap:

•Lexical Categories: (N)oun, (V)erb, (Adj)ective, (ADV)erb, (PREP)osition.
–Irregular formations (e.g., go>went, keep>kept, write>wrote (noting sound shifts))
–DERivational affixes: {-er} teach-er, {-ing} fascinate>fascinat-ing, {-ful} wonder>wonder-ful, {-ory} celebrat-ory (again, noting the sound shift from (V) ‘celebrate’ to (Adj) ‘celebratory’).

*Where derivational means to derive one word from another (a word-changing process): e.g., with the affix {-er}, the verb ‘teach’ becomes a noun ‘teach-er’, etc.

xxvi | Reflections

on Syntax: Lectures in General Linguistics

•Functional Categories:



–(D)eterminers (e.g., A, The, This, That, These, My, etc.), all of which merge with a potential N to form a DP (e.g., These books).
–(AUX)iliary Verbs (Do, Be, Have) along with Modals (e.g., Can, Could; Will, Would; Shall, Should; May, Might, etc.), all of which merge with a potential V to form a TP (Tense Phrase), given that the AUX/edge slot provides Tense to the Verb.
–INFLectional affixes: e.g., Plural {s}, Agreement, Case (I vs. me, He vs. Him), Possessive {‘s}, Tense [present {s}], [past {ed}], present/past participles {ing/en}.

Below we’ll consider the Determiner ‘These’ and the Tense feature {s} in ‘Speaks’. Keep in mind that ‘functional’ can take the shape of either an affix (e.g., plural {s}) or a word (e.g., ‘These’). Derivational morphology too has its word and affix options (e.g., compounding black + bird = blackbird, break + fast = breakfast (again, noting the sound shift of break to BREAKfast)), with derivational affixes such as {er} teach-er, {ful} wonder-ful, {al} critic-al, {ory} celebrat-ory. Determiner INFL-features (onto the N) include: Number {+/-PL}, Agreement (Number, Gender), Case (Nominative-subject: I, He; Accusative-object: Me, Him), etc. This is not an exhaustive list. Such INFL features:

(i) are abstract in nature, (ii) define a ‘language-​to-​brain’ corollary, (iii) shape the recursive nature of syntax, (iv) provide variation among language types, (v) are prone to biologically-​based maturation development in children (as well as being prone to critical periods in second language learning).

Let’s flesh out this ‘Edge’ feature just a bit more below, recalling that Lexical Stems constitute the inner […]-bracket, while Functional/INFL constitutes the outer [[ ]…]-bracket: [[Lexical stem] Functional/INFL]:

e.g., [[book] these] => [DP [D these] [N book] s] (where there is theoretical movement/fronting of D to form a DP headed by the determiner ‘These’—showing the correct word order of D+N, since English is a Head-Initial language).

e.g., [[speak]s] => [TP Mary {s} [VP speak] s] (where the {s} affix lowers onto the V stem).

What we have above is a bracket-​notation which aligns ‘lexical with lexical’ and ‘functional with functional edge’ mappings (before relevant movement takes place). For instance, consider such pre-​movement theoretical alignments:

[[books] these]: where the D ‘These’ serves two functions: (i) to provide a phrase-rule constituency whereby D introduces an N, and (ii) to provide the INFL plural {s} onto the Noun.

Of course, as stated in (i), English phrase-structure rules specify that D must be positioned before the N in forming a DP (Determiner Phrase), but this word order is a secondary consequence of a parameter known as +Head Initial. There are in fact language types which show N+D: e.g., Italian Mamma mia (‘Mother my’), where ‘my’ is a determiner in final position (a minus Head-Initial position): [-Head Initial]. So, what we intend to capture here in this first instance is not so much the proper word order of the phrase, but rather how the elements within the phrase map onto what would be the inner bracket (lexical/stem) and what would be found at the edge.

A Brain-to-Language Corollary

Albeit somewhat unorthodox and counterintuitive compared to what is typically shown in introductory syntax textbooks, what this mapping of such alignments shows us is how the phrasal syntactic structures map onto separate brain regions, with Lexical-[] being assigned to Wernicke’s area of the brain/Left Temporal Lobe and the Edge (functional/INFL) being assigned to Broca’s area/Left Frontal Lobe5:



[Wernicke [Broca]], and where it is the [… [Broca]]-area which involves movement—e.g.,

i. [D [N, [D]]] in forming a DP: ‘These books’: [[book]s]
ii. [T [V, [T]]] in forming a TP: ‘She speaks’: [[speak]s]

5 Though of course, most images of the human brain show a left-​side facing brain with Broca’s area at the upper left and Wernicke’s area at the mid-​right. The order is irrelevant, only that there is a mapping of Wernicke’s area as corresponding to the inner []-​bracket, and a mapping of Broca’s area as corresponding to the edge.

xxviii | Reflections

on Syntax: Lectures in General Linguistics

The notion of the edge being assigned an outer slot is tantamount to saying that its properties are on the outer margins of language, that the features contained within are formed at the periphery of language. And finally, we note that such Edge properties in fact instigate MOVEment, whether it be affix lowering (Verbs), Agreement between N & D (Spanish) (see below), or even ‘D-​fronting’ to get to the +Head Initial position (N), as found in English. Notice how INFL would map accordingly on N & V: [DP These {+Pl} s [book] s] > [[N]‌INFL {s}] (with plural {s} moving to affix onto N).

For instance, we know that D provides the Number [+/-Pl] feature onto the Noun by looking at D-feature agreements: e.g., while the D ‘A’ is exclusively singular in nature (A book/*books), the D ‘The’ allows agreement for both [+/-Pl] (The book/books, etc.). An example of a determiner which specially marks for Plural would be the Determiner ‘These’ (These books/*book). Such properties are to be assigned as edge-features, given their abstract/syntactic qualities. Also note the Gender agreement in Spanish DP systems (e.g., La Niña/*Niño)—where both the D as well as the N have to be feminine (ending in an {-a} affix). Both Gender as well as Number are among the ‘INFL-Edge’ features which must get distributed onto their relevant lexical stems. In this sense, we know that D maps onto N, and T/Aux maps onto V, with regards to relevant functional/INFL features. (*D-features include: Definiteness, Case, Person, Number, (Gender)).

Noun-Mapping: ‘These books’
[Lexical [Edge]]
[N [D]]
[N book [D These]]
[N book [D These: plural {s}]]*
[DP [D These {s} [N book]]] (= +Head initial) (before nominal affix {s} plural lowering/move)
[DP These [book]s] ‘These books’

Verb-Mapping: ‘She speaks’
[Lexical [Edge]]
[V [Aux/T]]
[V speak [Aux/T]]
[V speak [T 3P, Sg, Pres {s}]]*
[TP [T {s} [V speak]]] (= +Head initial) (before verbal affix {s} lowering/move)
[TP She [speak]s] ‘She speaks’

*The Rule: V+{s} = 3P (3rd person), Sg (singular), Pres (Present Tense)
*The Rule: N+{s} = Plural (only showing the Number feature/projection here)
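The two derivations can be stated procedurally. Here is a toy sketch of my own (the function names are assumptions, not the book’s) showing the pre- and post-movement alignments in the book’s own bracket notation:

```python
# Pre-movement alignment [Lexical [Edge]], then the two movements discussed:
# D-fronting (English is +Head Initial) and INFL affix {s} lowering onto the stem.

def noun_mapping(stem: str, det: str):
    pre  = f"[N {stem} [D {det}: plural {{s}}]]"   # [Lexical [Edge]] alignment
    post = f"[DP {det} [[{stem}]s]]"               # D fronts; plural {s} lowers
    return pre, post

def verb_mapping(stem: str, subject: str):
    pre  = f"[V {stem} [T 3P, Sg, Pres {{s}}]]"    # [Lexical [Edge]] alignment
    post = f"[TP {subject} [[{stem}]s]]"           # T affix {s} lowers onto V
    return pre, post

print(noun_mapping("book", "These"))   # -> 'These books'
print(verb_mapping("speak", "She"))    # -> 'She speaks'
```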

Or consider how the functional Tense (T) infinitive-verb affix ‘to’ aligns with the T auxiliary verb ‘do’ and the modal ‘can’—noting that aux verbs as well as modals, like tense, constitute an edge-feature:

(i) John likes [T to [V study]]
(ii) John [T does [V study]]
(iii) John [T can [V study]]

We notate such constructs as originating as follows (where strikethrough of the edge shows erasing of the edge element after movement/raising):

(i′) [to [V study [edge {to}]]]
(ii′) [does [V study [edge {does}]]]
(iii′) [can [V study [edge {can}]]]

As an extension, we consider all functional categories (such as the Determiner or Tense) to be an Edge-feature—despite the fact that both D and T can form the Head of their respective phrases (DP, TP). In this sense, it seems both Specs as well as Heads can be derived via an edge-feature. This further gets extended in the theoretical-syntax literature as the distinction of +/- semantic interpretability: [+Interp] features being lexical/substantive, while [-Interp] features are functional/abstract in nature.

Note: (Albeit the caveat above) What’s interesting about this Edge-feature is that this theoretical slot, when configured within X-bar theory, becomes the Specifier position which acts as an adjunct to a given Head of a Phrase. Curiously enough, the Spec position is often referred to as the Elsewhere category. This ‘elsewhere’ jibes nicely with our ‘brain as cloud’ metaphor for ‘ethereal & otherness’—viz., Spec is the ‘neither here nor there’ position. This seems correct given that the Specifier position seems to be an ad hoc invention (attached to XP), a position seemingly prompted to serve no substantive material as would be provided by the Head of a Phrase. Rather, the position seems to have been theoretically invented for the sole purpose of hosting ‘moved elements’ from below in the syntactic tree. Assuming that all subjects originate within the Complement of VP (the Verb-Phrase Internal Subject Hypothesis), the subject in situ must then raise up to adjoin to Spec-of-TP in forming a declarative/tensed sentence. The only position free and abstract enough to host such moved elements is the Spec. (See Lecture-3, [§§7.1–7.3]):

X-bar schema:

[XP Spec/edge [X′ [X Y]]]

(XP = X-Phrase: VP, DP, TP, CP…; X′ = X-bar; X = Head of Phrase; Y = Complement)
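As a data structure, the schema might be rendered like this. This is a minimal sketch of my own (the class and field names are assumptions), not a formalism from the text:

```python
# X-bar as a recursive data structure: XP = [Spec [X' [X Complement]]],
# with Spec as the abstract 'elsewhere' edge slot that hosts moved elements.
from dataclasses import dataclass
from typing import Optional

@dataclass
class XP:
    head: str                         # X: the head (V, D, T, C, ...)
    complement: Optional["XP"] = None # Y: complement of the head
    spec: Optional[str] = None        # Spec/edge: the 'neither here nor there' slot

    def show(self) -> str:
        spec = f"{self.spec} " if self.spec else ""
        comp = f" {self.complement.show()}" if self.complement else ""
        return f"[{self.head}P {spec}[{self.head}' {self.head}{comp}]]"

# 'She speaks': the subject raises out of VP to land in Spec-of-TP.
vp = XP(head="V")
tp = XP(head="T", complement=vp, spec="She")
print(tp.show())   # [TP She [T' T [VP [V' V]]]]
```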


Having recapped the properties of the Edge-feature*, let’s take a further look to see how such ‘slips of the tongue’, as well as Gordon’s study, show that errors are not random but rather quite systematic, as based on our edge-mappings: lexical errors slip and slot with other lexical items, with their counterpart functional-edge errors doing likewise.

In sum, what we attempt to show here by way of ‘inside vs. outside’ processing brackets is that the inner stem-formations are derived by lexical learning—a frequency-sensitive condition based on Stimulus & Response (S&R) methods attributed to the school of Behaviorism (e.g., B.F. Skinner). The outer edge is shown to have very different abstract qualities which form the basis of syntax. Perhaps the best and simplest way to see this is by examining Gordon’s famous ‘Rat-eater’ experiment, where children know not to insert an INFL affix (an edge property) embedded within a Noun+Noun lexical compound, e.g., rat-eater: namely, children never say, for a plural form, @rat-s-eater. See Gordon below for how children have tacit knowledge never to break the inherent mappings as provided by the scheme [Lexical [Functional]] … functional edge-features can never squeeze inside two lexical brackets: @[lexical [functional] lexical], as in ‘rat-s-eater’, where the compounding of N+N {rat + eater} can’t take an inserted INFL plural affix {s}, an edge-feature, between the two Nouns.

*(Edge-feature: recall our ethereal ‘neither here nor there’ metaphor. Also, this same abstract slot gets labelled as the ‘Elsewhere category’ regarding the Specifier position within X-bar theory).
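The compound constraint can be stated as a word-formation procedure. A hypothetical toy sketch of my own: compounding selects bare stems, so the INFL plural {s} can only ever attach at the outer edge:

```python
# Gordon's constraint as a rule: N+N compounding takes bare stems; the INFL
# plural {s} (an edge feature) attaches only outside the whole compound.

def pluralize(noun: str) -> str:
    return f"[[{noun}]s]"           # {s} sits at the outer edge bracket

def compound(n1_stem: str, n2: str) -> str:
    # compounding sees only the bare stem: no edge material can be trapped inside
    return f"[{n1_stem}-{n2}]"

print(compound("rat", "eater"))              # [rat-eater], never @[rat-s-eater]
print(pluralize(compound("rat", "eater")))   # [[rat-eater]s]: plural at the edge
```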

Innate Assumptions: Poverty of Stimulus

‘Rat-Eater’ Experiment (Gordon). In addition to what will be laid out in our ‘4-Sentences’ section, consider the claims made by the linguist Peter Gordon (1985) that ‘young children know not to keep plurals embedded within compounds’. In Gordon’s classic ‘Rat-eater’ experiment (see Appendix-1), children are asked: ‘What do you call a person who eats rats?’ Children respond ‘rat-eater’ (they delete the {s}); they never respond *rats-eater. Gordon suggests that children innately know that inflectional morphology {s} can’t be kept embedded within a compound, even though they have never been explicitly shown that such data is in violation of some English grammar. The mere fact that they never hear it (because it is, in fact, ungrammatical) doesn’t explain why children never entertain the prospect: children say loads of

erroneous things that they have never heard before. Hence, even though children have no empirical evidence (negative stimulus) that such constructs are wrong, they still shy away from compound-embedded plurals. This is what is referred to as the ‘poverty of stimulus’—namely, when children’s inferences go beyond the data they receive. Gordon suggests in this sense that there must be some innate built-in machinery constraining child learning of language. So, if (as strong behaviorists say) ‘input-to-output models are the square product of environmental learning’, one question that will come up is: ‘How does such learning deliver a result such as that found with the poverty-of-stimulus case?’ Perhaps symbol manipulation of rules will be required in some fashion after all. But, if so, perhaps we need to rethink the brain as a mere neuro/digital network, as a super-advanced clock. In fact, the analogy of the brain as a digital computer has been under attack for some time. Perhaps at the lower level of []-stem-formation learning, the brain-to-computer analogy holds (where local firing of neurons takes place, etc.). But with higher functions (Edge-features)—when we talk of a ‘mind over brain’ in how a brain bootstraps a mind—there may need to be a fundamentally different level of processing with an entirely different neuro-underwriting. The Dual Mechanism Model does precisely this: while it treats frequency-sensitive lexical learning based on brute memorization to be of a ‘lower-level’ []-processing (attributed to cognitive, problem-solving procedures along with Wernicke’s rote learning), it simultaneously introduces us to a ‘higher-level’ [[edge]]-processing undeterred by low-level neurological networks. This higher level seems to be epiphenomenal in the sense that there is no Darwinian biological bottom-up pressure for its existence (see the Accumulative Lecture-6 for discussion).

‘Taco Experiment’

Another informal study is to examine what I call the ‘Taco Experiment’—the nature of common language slips (i.e., mistakes of processing) spontaneously made by speakers, while showing why other slips go unattested (@ are never made). Below is a common slip (shown alongside its unattested counterparts). Target is the intended utterance:

Target: ‘What about tacos tonight?’
Slip: ‘What about taco_ tonights?’ (displacement of plural {s} to the last word, ‘tonight-s’).

a. Unattested: @ What about _acos tonight-t?
b. Unattested: @ What about tacot sonight?

xxxii | Reflections

on Syntax: Lectures in General Linguistics

The slip we see is the result of the functional/INFLectional plural {s} moving to the functional edge of the word [[tonight]_], where there is a possible, available functional slot open—e.g., [[tonight]’s] (where a possessive {‘s} or even a clitic {is} can be properly inserted at the functional edge). So, the error is a switch of the ‘same kind’—viz., between functional and functional (both Broca-based). However, consider why the unattested @ errors in (a, b) can’t be made. In (a), the {t} is lexical, part of the noun stem {taco}, and can’t switch to the functional edge (*tonight-t). Likewise, in (b), the plural/INFL {s} can’t attach to the beginning of the lexical stem ‘tonight’: the unattested word ‘sonight’ erroneously mixes the functional element {s} in as an initial part of the remaining lexical stem [_onight].

Overview of Language Structure: ‘Items vs. Categories’

Consider how items are like ‘table, chair, nightstand’ (and represent flat-[] vertical processing), while categories are like ‘furniture’ (and represent recursive-[[ ]] horizontal processing; see Appendix-5, §[4] for vertical vs. horizontal processing):

[tables, chairs, nightstands [furniture]]

(See how the category (furniture) occupies the edge: [[edge]]. All abstract properties of language/syntax occupy the edge since abstraction is [-Freq] based and free to be productive). One could easily conceptualize substantive tables, chairs and nightstands, and readily draw them, but just try to do the same for furniture—draw ‘furniture’! What do you draw? (No, don’t draw a table or chair. Draw furniture!) You quickly discover that you can’t, since furniture is categorical and abstract in nature; it is a placeholder of sorts into which tables and chairs can be placed. Using this as a metaphor, tables and chairs represent lexical items, while furniture represents an abstract quantity on a par with Edge-features of language.

This ‘item vs. category’ dichotomy of structures also shows up within a dual distinction of morpheme affixes—one of which abides by laws of ‘frequency & meaning’ (derivational)—our ‘clock’ metaphor—and one of which only abides by a ‘category principle’ (inflectional)—our ‘cloud’ metaphor. Let’s summarize below:

Affixes are either:

(i) Free to scan—to search for a relationship with an ‘itemized’ (I) stem—a search process which isn’t beholden to a specific item but is rather CATegory-based, where [+Cat] is [-Itemized] and isn’t frequency-sensitive [-Frequency]: it isn’t sensitive to specific association, but is free. This would have the features [-Item]/[-Freq]. For example, the INFL plural {s} can attach to any CATegory of Noun [+Cat] and is not beholden to specific items of nouns [-I] (except in instances of irregular Nouns: ‘man>men’, ‘mouse>mice’, etc.). Note how in this sense irregulars behave like derivational [+I], [+Freq] processes. So, for INFLections, if we posit a new nonce (made-up) noun such as ‘spigolit’ and ask a child, now that there are two, they will say ‘Two [[spigolit]s]’. (Just like Berko’s wugs test: [N [{s}]]). The same goes for the inflectional verbal {s}, past tense {ed}, etc. This freedom to attach to categories (such as N, V) and not be beholden to specific items (i.e., specific word selections) is a psychological hallmark behind the ability to abstract, and leads to the creativity of language (Chomsky). (Recall that Noun and Verb are categories; they hold no substantive quality other than serving as placeholders. They are [-Freq] based since there is no memory trace of a specific item. On the other hand, items/words like N-‘book’ or V-‘dance’ are specific lexical items, [+Freq] based. This dual mode is represented in our vertical [+Freq] mode vs. horizontal [-Freq] mode (above)).

(ii) Or not free to scan. The counter to this is what happens with derivational affixes, where [+I/+Freq] determines affixation. For instance, while the {ing} derivational affix can attach to the item-word/verb {fascinate} as in {fascinating}, deriving an adjective (e.g., This is a fascinating class), it can’t affix onto the verb wonder (e.g., *this is a wondering class), where the derivational affix {ful} gets selected instead (e.g., This is a wonderful/*wondering class).
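Here is a minimal sketch of my own (the stored lexicons are toy assumptions) of the two affix types: the category-scanning INFL plural applies even to the nonce noun ‘spigolit’, while derivational affixation must consult item-specific storage:

```python
# (i) INFL plural {s}: [-Item]/[-Freq], free to scan any noun CATegory.
IRREGULAR_PLURALS = {"man": "men", "mouse": "mice"}   # item-stored, [+I]/[+Freq]

def inflect_plural(noun: str) -> str:
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]    # irregulars behave like stored items
    return f"[[{noun}]s]"                 # categorical rule: works for any N

# (ii) Derivational affixes: [+Item]/[+Freq], beholden to the specific word.
DERIVED_ADJECTIVES = {"fascinate": "fascinating", "wonder": "wonderful"}

def derive_adjective(verb: str) -> str:
    return DERIVED_ADJECTIVES.get(verb, "?? (no stored derivation)")

print(inflect_plural("spigolit"))   # [[spigolit]s]: the nonce noun inflects freely
print(inflect_plural("mouse"))      # mice: item-based retrieval
print(derive_adjective("wonder"))   # wonderful (*'a wondering class')
```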

Sound Shifts for Derivational/Word Change

Also, sound shifts can provide ‘clock-like’ evidence for [+Freq] word-based status (or word change), while the absence of a sound shift provides ‘cloud-like’ evidence for [-Freq] categorical rules:


Clue: whenever there is a sound shift, just think: ‘there is a new word!’. We find a complete list of such phonological stem shifts in derivational formations—for instance, just sound out the following derivational Noun-to-Verb counterparts and notice the internal stem sound shifts: glass /æ/ (N) => glaze /e/ (V), grass /æ/ (N) => graze /e/ (V), bath /æ/ (N) => bathe /e/ (V), life /ay/ (N) => live /I/ (V); other examples include real > reality, secret > secretary, etc. Note: This is an interesting psychological shift that takes place in the mind of a speaker in order to demark a different word-status.

What we are generally carrying on about here is what is referred to linguistically as MOVEment, and as we have seen above, there are a number of reasons for MOVE. Syntactically, the notion of movement has been expressed in GE (Generative Enterprise) terms by the need to either acquire and/or check off formal features. For example, take Case (the Accusative Case marker {-m} as found in the wh-object whom). Case seems to be the kind of formal feature (along with AGReement) which doesn’t impact the semantics of mere communication. This can be seen today in the very little observance of the accusative marker {m}—e.g., most native speakers of English prefer to say ‘Who does she like?’ and not ‘Whom does she like?’, suggesting that the case marking {-m} is rather invisible to the semantics of communication. Hence, in GE terms, such formal features (such as Agreement or Case, among other purely syntactic features) must be stripped of their value from the lexical item before they can enter into linguistic processing.

Case as Trigger for Movement. Case represents one of those Syn-MOVE operations which seem to motivate movement.

‘Four Sentences’: (For a full discussion of the ‘Four Sentences’, see Lecture-3). When it comes time for us to consider the classic 4-sentences (in Lecture-3), we’ll closely examine how they expose the dual mechanism model of [[ ]]. For instance, one interesting way to tease this discussion out is to ask how a ‘beads-on-a-string’ theory—a strong behaviorist theory which posits that all words and affixes are undecomposed chunks-[], as in a flat encyclopedic list of items—can’t hold … just can’t be right given the nature of the linguistic evidence argued (in the form of processing, ‘slips of the tongue’ errors, staged child development, etc.). Below, for now, is just the list of the four sentences (later discussed in Lecture-3). (Also see link below for additional discussion). For now, as a precursor of things to come, try to see if you can understand why such structures are important for child language studies.

The 'Four Sentences':

(i) Can eagles that fly swim?
(ii) Him falled me down.
(iii) The horse raced past the barn fell.
(iv) I wonder what that is/*'s up there.
(Where the clitic {'s} can adjoin to [that's]. Note that all clitics theoretically become incorporated into the stem, as [that's] and not *[[that]'s]—e.g., phonologically, {that's} marks syllabically as a single word.)

Even the simple example in (v) (first illustrated by Bickerton) shows how linear adjacency (of, say, noun with verb) doesn't apply in certain syntactic constructs:

(v) ‘The Boy Bill asked to speak to Mary thinks he is smart’. Who is thinking? Mary??

It turns out the answer is No: it's not Mary who thinks—despite the two words being adjacent (and certainly these two words strung together, as in 'Mary thinks', are very prevalent in our input). It turns out rather that 'recursive structure' dictates that the matrix noun to be processed with the verb 'thinks' is in fact way up in the sentence chain: 'The boy' is doing the 'thinking', not Mary, nor the next closest noun, Bill. The upshot here is that embedded structures-[[]] guide us to jump and advance in ways not seen if mere adjacency-[] ruled language processing.

Imitation > Analogy > Computational

Finally, by extending our 'clock v cloud' metaphor, we can provide a cursory look at the evolution of linguistic theory: (i) from 19th-century naïve theories of language as imitative, to (ii) one-notch-up theories of analogy, (iii) ending with Chomsky's latest incarnation of a computational theory. While there is robust evidence that children initially move through these three incremental processes—i.e., from using both sides of their holistic brain to using the specialized left hemisphere, known as lateralization (e.g., Mills et al. 1997)—the question has been: What is the nature of such incremental steps leading to full left-brain lateralization of language, and how might such incremental phases map onto our current understanding of syntactic theory?



Coupled with this, what current developmental linguists are looking for is a viable unifying processing model which can account—incrementally in the child over time—for a triad processing of: (i) imitation (of item) > (ii) analogy (of sound pattern) > (iii) computation (categorical):

(i) Imitation: [x + x => x] (lexicalization)
e.g., [V break] + [N fast] = [N breakfast], [N wine] + [N bottle] = [AdjP wine bottle]
• Where X+X item-based memorization is employed.

(ii) Analogy: [w [xy]] => [z [xy]] (analogy)
singular: [_[ug]] > plural: [_[ugs]] (bug > bugs), which can analogize to [w[ug]] > [w[ugs]], etc. (*Note that the plural affix {s} here is incorporated inside the stem-bracket: [..[__{s}]], something Chomsky argues against).
• Where sound-pattern analogy is employed: e.g., '_ugs' sounds like 'bugs', and therefore 'wugs' can be formed on the same sound pattern, etc.

(iii) Computational: {γ {α, β}} or x + y = z (computational)
e.g., two [[book]s], two [[wug]s], John [[drive]s], has [[driv]en] (*Note that the plural {s} is now a free INFLectional affix detached from the stem [[…]s]). This syntactic description of {s} as decomposed goes against analogy.
• Where true categorial-rule manipulation, tethered to neither imitation nor analogy, is employed: computation alone. (Chomsky, via Berko's 'wug test', emphasizes this latter mode in (iii)).
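As a rough computational gloss on this triad, consider the following sketch (purely illustrative: the three strategy functions and the mini word-list are assumptions of my own, with 'wug' taken from Berko's test). Only the third strategy succeeds without any stored neighbor:

    # Three developmental strategies for producing a plural, sketched as functions.
    MEMORIZED = {"bug": "bugs", "book": "books"}   # (i) item-based store

    def imitate(noun):
        # (i) imitation: succeeds only if the exact item was heard and stored.
        return MEMORIZED.get(noun)                 # None for the nonce 'wug'

    def analogize(noun):
        # (ii) analogy: reuse the plural of a stored rhyme, e.g. '_ug' -> '_ugs'.
        for stored, plural in MEMORIZED.items():
            if noun[1:] == stored[1:]:             # 'wug' rhymes with 'bug'
                return noun[0] + plural[1:]
        return None

    def compute(noun):
        # (iii) computation: categorial rule [[N]s], no stored neighbor needed.
        return noun + "s"

    for strategy in (imitate, analogize, compute):
        print(strategy.__name__, strategy("wug"))  # None / wugs / wugs

Note that analogy only works for 'wug' because a rhyming item ('bug') happens to be stored; the categorial rule is frequency-insensitive and applies across the board.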

Introduction

Recursive Syntax

Some 40,000–60,000 years ago, a monumental clash between two hominids occurred whereby an older species, Neanderthals, confronted a newly-emergent species called Cro-magnon, whose very existence was drenched in symbols (something that had never been seen before on the planet) (Randall White 1989). Both had rather archaic body-types (the former a slightly larger brain-size, the latter much more bony in frame), but the determining factor as to who would gain the advantage can be readily reduced and expressed in modern-day computer terminology—namely, of how a battle (between the two species) might have been played out between antiquated hardware versus advanced software. The rather strange and human-species-specific ability to form hierarchical recursive syntax, as exclusively found in human language (sparked by Cro-magnon), would be that unique software advantage.

At the very conception of Greco-​Roman philosophy, we already find Plato convincing us that immaterial essences merely consist of Forms, which contain the true and ultimate reality, while the world of sensible things is only a vague transitory and untrustworthy copy. One has an infinitely better grasp of an abstract


& subjective quality of, say, the color orange than any real 3-​dimensional object—​ The former, we have a firm mental grasp of its full essence, its outer covering, its inner trappings, its corners and edges; all is in total mental-​sight due to its abstract quality. While the latter so-​called ‘concrete’ entity is elusive, an entity upon which one could never fully size-​up from mere empirical observation alone, given that human vision cannot process sight in one fell-​swoop-​panoramic 360° viewing—​there will always be the odd hidden corner or edge which is covered just behind what appears on the surface in front of us, escaping our sight (mysteriously hidden like the proverbial dark-​side of the moon). (José Ortega y Gasset). A 19th–​ 20th century cognitive perspective would come to show how cued-​representations (Icons) could only represent an individual or item, while a detached symbol could stand for an item without the unnecessarily burdening requirement of external stimulation—​the former being triggered by direct, environmental stimuli, the latter by a delayed response of its memory—​The sign that once expresses an idea will serve to recall it in the future. (George Santayana). In the latter part of 20th century linguistics, we would find Noam Chomsky, alone, dwelling on these observations, dreaming of a return to a Cartesian Linguistics—​particularly thinking, that in order for grammatical syntax to take on its full-​operational quality, something must happen whereby a smaller structure gains the ability to lodge itself within a larger structure, all the while preserving its structural integrity with no information lost. Again, the ‘item inside a category’ analogy would be a critical component to the theory. Today, on a more personal note—​taking-​on these concepts as a unifying framework, (even as a ‘pedagogical device’)—​I find that in my own lectures I often resort to extending the metaphor of ‘tables, chairs, and nightstands vs. furniture’, and show just how, in analogy, the former ‘items’ (cued-​representations) stand in direct opposition to the latter ‘furniture’ as a category: [furniture-​δ [table-​α, chair-​β, nightstand-​γ]], … where we can analyze recurrent-​flat items expressed as [α, β, γ] (alpha, beta, gamma, etc.), and a matrix recursive-​embedded structure as delta: [δ [α, β, γ]]—​ with recurrent-​flat items forming a lexicon, while hierarchical recursion forms a syntax. (See ‘Recursion’ definition below).

So, if Neanderthal were indeed stuck in a flat world whereby only cued direct items could be linearly expressed, or possibly stacked on top of each other lexically, (of the ‘table-​chair-​and-​nightstand’ variety), then there can be no doubt that Cro-​magnon, drenched in their detached symbolic categories (furniture),

would eventually come to outpace their rivals. Such a cognitive thought-process which allows its host to see the world via categorization would undoubtedly lead to the displacement of the earlier species. Modern Homo-sapiens have made the most out of this cognitive niche, up until today, and with artificial intelligence fast approaching—for better or for worse (though I have some thoughts on the topic). As a final note, perhaps the notion 'ontogeny-recapitulates-phylogeny' (ORP) is not too far off the mark, as is typically decried, at least not in these respects—where we do in fact find early states of child syntax initially taking-on a flat 'beads-on-a-string' methodology, a biased item-based processing completely reliant on the frequency of cued-based representation. However, what we do later find—via an emergent second, more formal stage—is the delivering of true categorization not sensitive to frequency. This ORP-theme seems to have played itself out accordingly, over the tens of thousands of years, within the emergence of our early-hominid human species only to be re-enacted once again in the narrow, 'few-month-closing' window of infancy by the maturing child—viz., where first-order recurrent structure quite rapidly gets surpassed by a maturation-based second-order recursive structure. This is the final act of our evolving human story: where homo-sapiens of today have found themselves at a place where they should rather be called homo-recursive, or at the very least, Homo-grammaticus. It has a nice ring to it: 'Homo-recursive'. Don't you think?

‘Recursion’: How to Define It? Our working definition of recursion and recursive structure will be demarcated along a very simple criterion, which may then become expanded in various ways depending on the context of discussion. The two-​fold definition is as follows—​informally, recursion is:

(i) The ability for either an item-α or structure-α to embed itself within a newly expanded item/structure-β without the necessary loss of information as determined by the origination of structure-α. In other words, recursion is manifold onto itself; it is the ability to extend and expand information in ways which create cyclical concatenation without surrendering the initial step-1 material found in the earlier stage of the derivation.
e.g., [structure-α [structure-α, β]]… [α [α, β]]*


(ii) The ability to create loops (feedback loops) which maintain antecedents back to the first derivation of the loop cycle, with no information lost.

Let’s consider two structures which show such recursion, a more obvious example first (English), with a more subtle example to follow (French): (1) John knows that Mary knows that Tom smokes. This embedded concatenation goes as follows, without any loss of information derived from the first derivation: [John knows X [X= Mary knows Y [Y = Tom smokes]]] [John knows that [Mary knows that [Tom Smokes]]] Whereby both John and Mary are ‘forward-​looking’ to the same fact that ‘Tom smokes’. In other words, as expressed here, the knowledge that Tom has (regarding his smoking) is not lost on the two prior expressions, and conversely, John’s and Mary’s knowledge entail the same point, viz., that ‘Tom smokes’. Of course, this could go on indefinitely, as loops and feed-​ back loops (of ‘The house that Jack built’ nursey-​rhyme variety).

*(A more formal example of a recursive Language is the set L = {abc, aabbcc, aaabbbccc, …}. See Appendix-2 for a discussion of a neuro-net implementation leading to recursion).

Now, consider a more subtle French example which utilizes the same type of embedding/recursion in the form of 'raising' upon a syntactic tree:

(2) Il a vu les articles. (He has seen the articles.)

[Il a vu les articles]

Note how the past participle-verb 'seen' (vu) is singular in accordance with subject-verb agreement, since the feedback loop to singular number is localized to the French singular pronoun Il ('he' [-Pl]). But once recursion takes place, with the raising/embedding of the [+plural] article/determiner les ('those/them'), which raises and inserts above the past participle-verb vu, the verb must now become marked as plural as well (since this is its new local domain), with the added feature that the plural {s} must attach to the past participle (vus)—as if the act of les raising—passing through and being filtered by the past participle—creates an entirely different feedback loop on itself:

(i) Il [-Pl] a vu [-Pl] les articles [+Pl]
(ii) Il les [+Pl] a vus [+Pl] les articles [+Pl]

Introduction | 5 (2’) Il les a vus. (He them has seen).

What recursion as defined herein allows is the ability to move structures (in terms of syntax, upward raising) in order to expand the expression. The French recursive/syntactic structure would look as follows:

a. [Il [-Pl] a vu [les articles]]
b. [Il [les [+Pl] a vus [a vu les articles]]]

This is exactly what we would expect if natural language is recursive. The same raising happens in English. Consider an example that we'll come to talk about regarding 'possessive recursion'. When one begins a step-1 feed-loop behind the expression 'John's friend', note what takes place at step-2 of the derivation, again with no initial information lost as the cycle gets expanded (in fact, it marks for a double possessive, {of} and {'s}, based on each local domain) (see below):

c. 'I am a friend of John's'
   i. [John's friend] (step-1 recursion)
   ii. [of [John's friend]] (intermediate step)
   iii. [a friend [of [John's friend]]] (step-2 recursion)

Recursion as a Diagnostic Test

The ability found in all human languages to perform recursive operations has recently been utilized in the literature as a diagnostic test in recognizing certain types of language impairment, as found in autism & specific language impairment and in second language (L2) speech, as well as establishing a means for demarcating the healthy grammatical dual stages of normal-developing child syntax.

Second Language Development

First, let's consider how recursion is implicated in the movement of the Tense [+T] feature. When the well-formed English utterance 'Does he do it?' is generated, the [T]-feature, projected from a Tense Phrase (TP), is situated relative to


the verb ‘does’—​hence, the structure looks something like the following (using QP here as a Question Phrase, better understood as CP): (3) [QP [+T] does [TP he [-T]

do it]]?

[TP he [+T] does do it]!

The recursive/​movement of [α [α, β]] is then applied by the raising of the [T]‌-​ feature from TP to QP (but once raising/​copy has been completed, deletion of T must take place).

But one of the more interesting characteristics of non-native second language (L2) speech is that very often the [T]-feature is not recursively cyclical and rather remains in-situ (i.e., there is no movement, only direct insertion). For example, consider the two L2 examples below, spoken by a native French speaker who learned English only as an adult (showing double Tense):

(4) Does he does it?
    [QP Does [TP He does it]] (showing no move and delete)

(5) Did he visited you?
    [QP Did [TP He visited you]]

What we can glean from such L2 utterances is that recursion doesn't seem to be fully operative here. In other words, such utterances appear rather 'flat' in terms of their structure—namely, it rather seems that added words merely get tacked onto existing structure (like a lego piece). This is a phenomenon referred to as 'shallow' processing, as opposed to 'deep' processing—a bricolage strategy often employed in the adult learning of second language syntax:

(6) (i) [QP [+T] Does he [+T] does it]? (wrong 'shallow' structure with No Movement)
    (ii) [QP [+T] Does [TP he [-T] do it]]? (correct 'deep' structure with Movement)

(Note how (6i) shows 'flat' [α … β …] versus 'embedded' [α … [β …] …] found in (6ii)). (See Lecture 2, [5]). (Note how (6i) shows an operation called Merge, whereby the lexical item 'does' simply gets drawn from out of the lexicon, fully inflected, and inserts/attaches at the top of the derivation, whereas (6ii) shows some form of Move (in this case, the T-feature percolates up to QP, instigating a recursive [[]]-structure)).

In (6) above, what is interesting is that the Tense feature [T] doesn't actually move upward, copy and then delete, but rather follows a kind of lego-bricolage 'add-insert' operation, which would be typical of so-called flat 'recurrent' operations rather than embedded 'recursive' operations. (See 'recursive vs. recurrent' arguments herein). The exact same would apply with the past tense example found in (5). Some linguists have characterized such L2 speech as a surface-level 'shallow' phenomenon which lacks deep-structure recursion, perhaps being motivated by phonological and other general problem-solving skills which place a high-value demand on surface phonology and other adjacency factors, such as where and how the string is heard superficially by the L2 speaker—a kind of bricolage 'beads-on-a-string' theory. (The French word bricolage means build-up/construct). It is as if the L2 strategy is primarily surface-driven, whereby the formulaic 'does' simply inserts in front of an already fully-formulated existing tense phrase. This appears to be a syntactically impoverished question-formation strategy that doesn't rely on [T]-movement (from TP to QP/CP), but rather relies on Do-insert, and nothing more. Hence, we get the well-cited examples of double tense found in the L2 literature.
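To visualize the two strategies, here is a small illustrative sketch (the string encoding and the function names are my own assumptions, not a claim about any particular parser): deep question-formation copies the tensed auxiliary up to QP and deletes it in TP, while the shallow L2 strategy merely tacks 'does' onto the already-inflected clause.

    # Deep strategy: raise/copy the tensed auxiliary to QP, then delete it in TP.
    def deep_question(subject, aux, verb_phrase):
        # [TP subject aux VP] -> [QP aux [TP subject VP]]
        return f"[QP {aux.capitalize()} [TP {subject} {verb_phrase}]]?"

    # Shallow L2 strategy: prepend 'does' to the existing, fully inflected clause.
    def shallow_question(inflected_clause):
        return f"[QP Does {inflected_clause}]?"

    print(deep_question("he", "does", "do it"))  # -> [QP Does [TP he do it]]?
    print(shallow_question("[TP he does it]"))   # -> [QP Does [TP he does it]]? (double tense)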

Autism and Broca’s Aphasia Data on autism also show us how recursion (or the lack thereof) can be utilized as a diagnostic in terms of identifying Broca’s aphasia (BA), where BA subjects infamously have a very difficult time handling movement operation, particularly those movements which span a distance over a parsed utterance. Yosef Grodzinsky has spent a fair amount of his medical/​linguistic career studying such cases. For example, the distance required of the recursive movement seems to matter in terms of processing, with closer-​move functioning much better than distant-​move. Consider some BA data below: (7) Which boy t _​_​_​t pushed the girl?

(Above Chance): local move/​surface S_​VO order (8) Which boy t did the girl push _​_ ​_ ​_​t? (Chance): distant move /​surface OSV_​ order

BA subjects have difficulty with examples such as (8) above: they have only 'chance' readings/interpretations when 'movement at a distance' is required of them. In example (8), the trace of 'boy' must coincide with a position which is further away, sentence-final; whereas in example (7), the trace can be quickly recovered due to its close surface-level adjacency.


Tom Roeper (see his 2007 book 'Prism of Grammar'), along with long-time colleagues at UMass, has invented such a diagnostic which focuses on movement in order to evaluate language disorders such as Autism Spectrum (AS) and Specific Language Impairment (SLI). (See Roeper, DELV diagnostic herein—DELV (Diagnostic Evaluation of Language Variation)). For example, in so-called 'Double-Question formations'—where a speaker must hold two or more items and/or matching antecedents together at one time—DELV has been shown to pin-point linguistic anomalies due to such a lack of recursion. Results are borne out in the SLI literature, as well as in aphasia studies, which show just how much problems handling recursion, or distant recursion, impact normal language processing. Along with BA, AS and SLI, studies show how similar syntactic deficits are a hallmark of language processing in front-left hemisphere stroke victims.

Child Language Syntax

The stages of child syntax can be said to follow a trajectory of (i) a non-recursive operational stage (found at stage-1, at approx. MLU of 2.5 words—e.g., what Andrew Radford (1990) called a 'lexical/thematic' stage devoid of movement and inflection), and (ii) a recursive, INFLectional stage (just after three years of age), whereby INFL as well as partial mastery of other functional categories (such as Tense, AGReement, Case, and fixed Word Order) begin to come on-line. (Radford 1990 assumes a maturational hypothesis here regarding the onset of movement. It could be argued that such a maturational onset of recursion is connected to the later development of Broca's area. I have gone on to call this stage-1 a 'merge-only' stage devoid of movement. See Galasso 2016). The classic stage-1 examples 'Him do it', 'Why daddy crying?' show the same type of flat, non-recursive Merge-projections as found in our L2 data above. Consider the structure of 'Him do it' below:

(9) [VP Him do it] (showing no Tense feature)
(10) [TP He [+T] [VP does [VP him do it]]]

In (10), we show:

(i) the pronoun 'Him' raising from VP to TP to receive nominative Case (He) (T hands over the feature to assign nominative Case);
(ii) the verb 'do' receiving the Tense INFL-morpheme {s}.

The example 'Why daddy crying?' is quite similar to the flat, non-recursive surface-level strategy we found in L2 speakers above:

(11) [QP why daddy crying]?

rather than the recursive structure of: (12) [QP Why is [TP daddy is crying why]]? (base-​structure prior to movement).

Here, the young child’s question-​formation strategy is to simply attach the Wh-​word ‘why’ above the already formed VP, forming a recurrent, non-​recursive structure (with auxiliary verb omission): (13) [VP daddy crying] => [QP why daddy crying]?

Clearly, such a simple, prosaic Merge operation whereby lexical items get attached to preexisting structure lends credence to the notion that early child English (at roughly 2 years of age) is predominantly a surface-level phenomenon which places a high degree of burden only on surface-level phonological processes (absent full syntactic Move operations). In this sense, phonology is understood as a quintessential surface-level process, whereby frequency of input and adjacency carry perhaps the highest priority in any overriding optimality theory of phonology.

1

Opening Philosophical Questions: Language and Brain Analogies

So, for one reason or another you have taken it upon yourself—or, perhaps more honestly, have somehow been led by another—to take an 'introductory linguistics course in syntax' which now asks of you to ponder the following question. Plato asked it of us via Meno (Meno's problem). It has even earlier antecedents, say, of biblical proportion (Talmud). Historical nuances of the 'very question' have been handed down to us in a variety of forms, as made available by the current understandings of the time. For instance, the now classic dichotomy of Empirical thought (e.g., Aristotle, John Locke, John Stuart Mill, and B.F. Skinner) versus Rational thought (e.g., Plato, Descartes, Spinoza, and Noam Chomsky) has remained constant over time, with only the refashioning of terminology making the epic debate seem contemporary. For instance, we'll come to learn in the following sections and chapters that the current developmental/psychology debate over 'Nature vs. Nurture', more currently coined as 'the nature of Nature' (Bates 1997), is nothing more than a reworking of the earlier Skinner vs. Chomsky (1959) debate (e.g., see Galasso 2003, as cited in Owens 2007).1 (*See Popper 1972 'Of clouds & clocks' analogy).

1. Owens, Robert (2007). Language Development: An Introduction (7th edition).


This repackaging of the same issue is best illustrated by the adoption of different Mind Analogies* throughout the centuries—such as, e.g., 'mind as a blank slate' (tabula rasa), 'mind as steam engine' (mechanical), 'mind as a tape recorder' (imitative), 'mind as pattern-seeker' (analogy), 'mind as computer' (binary, digital, computational), and most recently, 'mind as a quantum processor' (spooky-quirky, stochastic, quantum-rule-based—viz., rules non-tethered to a deterministic environment). This pendulum of analogies across the spectrum has basically pivoted back and forth between these same two poles found within the classic dichotomy of Empirical vs. Rational thought—namely, whether or not the brain-mind relation to language is one of a strict iconic mapping of 'form-to-meaning' (connectionism), or whether the brain-mind language mapping is more symbolic in nature, which seeks out algorithmic rules and performs manipulation on variables (rule-based). This current debate is so polarizing that it is often referred to in U.S. geographical terms as 'East-Coast' (MIT, Chomsky) vs. 'West-Coast' (UCSD, Elman) philosophy. (For a sense of the ongoing debates, see Gary Marcus vs. Jeff Elman, 'the Dual Mechanism Model vs. the Single Mechanism Model' (respectively), Marcus 1998; Elman 1998). Most recently, the debate has centered on defining terms of whether or not a connectionism which utilizes feedback loops (hidden units) should be regarded as 'rules embedded in the very architecture' of the single mechanism system (Elman), or whether, by definition, any inherent architecture which is rule-reliant should be regarded as a 'rule-based' system (Marcus).

What is the Question?

So, what is the question? The singular question which we abstract here, and one which we find as a reoccurring theme throughout, can readily be captured and reduced by the simple diagram shown in (1) below (Figure 1) (omission errors 'input deleted', commission errors 'output added'):

(1)  (a) Lxy input → (b) FL → (c) Lx output (omission of y)
                            → (d) Lxyz output (commission of z)

Figure 1  Input-Output Model of FL

The question (actually a puzzle) is: How is it to be understood that input (1) given to a language (L) can manifest in such a way which allows it to proceed through some intervening device (FL) such that L output may be prone to developmental errors (e.g., child language errors of omission and/or commission) as well as interference factors as seen via L2 (e.g., second language processing which leads to so-called L1-transfer)? In short: How is it that the L output doesn't necessarily match the L input? Certainly, the 'brain as tape recorder' analogy discussed above (i.e., language as imitative) would be hard-pressed to explain such bizarre 'spooky' phenomena. In the scheme in (1), Lxy refers to a given Language input which contains language characteristics as denoted by the variables x, y. FL refers to the Faculty of Language (FL) of the human computational system which allows for language to be acquired naturally as triggered by the impending natural input, earlier termed by Chomsky as the Language Acquisition Device (LAD). But the above reduced scheme is in fact drawn from what we have observed over the past half century of child language studies. What will be advanced herein, particularly dealing with Child Language Development, is the above observation that the nature of child language errors of omission and commission relates to the more abstract functional categories having to do with morphosyntactic grammatical relations, as found in the following examples:

(2) AGReement (subject-verb agreement of person, number)—e.g., Mary sleep-s

The grammar of Agreement states that the verbal features must agree with those of the subject (a kind of verb-subject agreement, the converse of what is typically referred to as 'subject-verb' agreement). In (2), the AGR features generated by the proper noun Mary include third person [3P], singular (or minus plural) [-Pl]—hence, the verb's AGR features likewise must show 3P/Sing, which in connection with the verbal tense feature present (or minus past) [-Past] then triggers the verbal bound morpheme {s}, shown as a suffix {s} which attaches to the verb stem [sleep] => [[sleep]s]. The same treatment extends to the grammar of INFLection (number, case, tense) and to functional word categories such as Determiners (the), Determiner Phrases (DP) (the book) (replacing the traditional notion of a Noun Phrase), Auxiliary-Modal verbs (do/be/have, should/may/might), or Tense Phrases (does study, is studying, has studied, should/may/might study).
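As a toy rendering of this feature-matching (a sketch under simplifying assumptions of my own; real AGR systems are of course far richer), the suffix {s} surfaces only when the subject's person/number features and the verb's tense feature line up as 3P/Sg/[-Past]:

    # Toy AGR check: verbal {s} is triggered by [3P], [-Pl], [-Past].
    def inflect(verb, person, plural, past):
        if past:
            return verb + "ed"        # ignoring irregular past forms in this sketch
        if person == 3 and not plural:
            return verb + "s"         # Mary sleep-s: [3P]/[-Pl] + [-Past] -> {s}
        return verb

    print(inflect("sleep", person=3, plural=False, past=False))  # -> sleeps
    print(inflect("sleep", person=3, plural=True,  past=False))  # -> sleep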


Such a systematic treatment of 'what gets missed out where' in the early stages of child language has successfully been used as a means by which linguists 'backwards-engineer' the highly articulated hierarchical structures of language. In fact, and perhaps few recognize this, the study of child language acquisition has become the gold standard upon which we reveal this multi-layered structure of language, by which we mean that certain sectors of language remain autonomous and are possibly pegged to distinct areas of the brain. By studying the various phases and stages of child language acquisition, we observe the natural emergence of these distinct language sectors. And, of course, theories about language and brain maturation follow. This notion calling for autonomous language areas has come to be referred to as a modular model of language. I suppose there is something still unyielding about the old saw 'ontogeny recapitulates phylogeny', whereby the multi-phased, protracted and incremental development of speech children present may actually reenact, within a very compressed period of time (say, within 30 months), that which humans have been evolving over a span of a million years. We could use as a starting point here homo-erectus and his usage of a kind of protolanguage—at the very best, a one-to-one iconic mapping of worldly objects onto primitive chats, cries and linguistic expressions. Such a protolanguage would take as its shape that which highly resembles early stages of child speech. Perhaps we can assume from the nature of this protracted language growth that there are certain areas of language which are more robust, pegged to more primitive regions of the brain, which have survived in humans for the duration—viz., a more primitive yet robust form of language processing as related to the more global & holistic right-hemisphere dominance. The temporal lobe region of the limbic system, along with the motor-strip area dealing with cognitive motor-control, would be a prime area to consider as the 'cerebral cradle' for the birth of protolanguage. (See Appendix-3, Proto-language). In any event, we have learned quite a lot about adult language through the eyepiece of child language. At least from the perspectives of neurolinguists, in one sense the young child is the proverbial canary in the coal-mine when it comes to identifying how the human brain comes to treat, organize and process the multi-modularity of language. In brief, regarding these modular distinctions and the state to which respective features emerge, a common theme emerges suggesting that certain properties of language behave in a singular fashion, with certain linguistic features and categories being tethered to specific sites of the brain. Such a putative brain-to-language correlation has created a niche of linguistic science which looks to the

study of neuro-linguistic mappings of language and its subsequent development—hence, the birth of neurolinguistics. Also, the very protracted nature of child language acquisition suggests a maturational hypothesis of brain development—hence, a biological basis of language leading to biolinguistics. Perhaps the most interesting aspect of these theories, when taken together, is that they shine a light on what has actually been in our linguistic focus for quite some time, and may have been simply overlooked—namely, a hybrid model which has antecedents going back to the seminal Skinner vs. Chomsky debates of the late 1950s (Skinner 1958 vs. Chomsky 1959), when behaviorism was being challenged by the notion that language needed to be considered as a formal computational system in its own right, autonomous and unrelated to what one might typically think of as a mere communicative, meaning-based system. The reverberations of the classic debate, which is really the nature vs. nurture debate repackaged, have rippled through main-street linguistics and have remained with us ever since, with the most recent reincarnation establishing the framework behind the current Dual Mechanism Model (DMM) (see Pinker 1999; Clahsen 1999 for reviews). The dual model promotes the idea that both Skinner and Chomsky can be brought together under one linguistic roof, since relevant aspects of each theory can now be correctly aligned within the hybrid DMM theory of language. In a nutshell, with further discussion to follow, what the hybrid approach recognizes is that Skinner indeed got it right! when it comes to rote learning & frequency effects driving the 'non-rule-based' lexical/semantic side of language (lexical categories, vocabulary, irregular word formations, derivational morphology and semantics), whereas Chomsky got it right! when it comes to computations driving the 'rule-based' functional/syntactic side of language (functional categories, regular word formation, inflectional morphology and syntax). It is in fact the merging of these two models which renders a Dual Mechanism Model—with 'flat-Skinner' accommodating the frequency component of language while reconciling 'recursive-Chomsky' to maintaining the symbolic/rule-based components of language. (See End-note on 'Evolution of Linguistic Theories').

While much of this will be properly presented as we move through subsequent sections and chapters of the text, it makes for a good jumping-​off point. So, with no further ado, let’s now return to our aforementioned discussion dealing with ‘the question’ at hand. (Did we almost forget the question? Not a chance … !).


Restating now the scheme as posed in (1), let's be more explicit with 'the question' and add a touch of concreteness by inserting some examples. Here is the Question.

Question—What kind of a model of language (b), labelled here as the Faculty of Language (FL), could deliver such anomalous structures as made apparent by contrasting the language (L) input (a) Lxy with the L outputs (c, d), with (c) showing omission of L input (mistakes of subtraction), and (d) showing errors of commission (mistakes of addition)? At the very minimum, FL must serve in some capacity as an intervening computational system, which, as the expression goes, 'has a mind of its own' (Figure 2 (ex. 3) below).

(3)  (a) Lxy input → (b) FL → (c) Lx output (omission)
                            → (d) Lxyz output (commission)

Figure 2  FL Model

(Note: Given what's been said in our little background found in the previous page, keep in mind what kind of language model would predict such a highly systematic treatment of language regarding what suffers omission or commission in child language development).

Consider (4) below showing one such example of functional omission.

(4) Lx output (c): omission of Determiner, Auxiliary

a. Input target structure:
   What (is) (the) man (is) doing? → What is the man doing? (target structure)

b. Output child utterance:
   What Ø Ø man Ø doing? → What man doing? (child utterance)

c. Compare:
   What is the man doing? (adult/target utterance)
   What man doing? (child utterance)

Note that both the Determiner 'the' and Auxiliary 'is' get deleted in the child's output. These two categories (D & AUX) are considered functional categories. Next in (5), consider the omission of more subtle functional morpho-syntactic features.

(5) Lx output (c): omission of Case, Tense, Inflection, Number

a. Input structure:
   [[3P/Sg/Nom] He] [IP [drive]s] [[+Pl] two] [Num [car]s] → He drives two cars. (target structure)

b. Output utterance:
   → Him drive-Ø two car-Ø (child utterance): ('him' as default case)

c. Compare:
   He drives two cars. (adult/target utterance)
   Him drive two car. (child utterance): (lack of Tense/Agreement)

(6) Lxyz output (d): commission, copy of Auxiliary

a. Input structure:
   What is the man (is) doing? → What is the man doing? (target structure)

b. Output utterance:
   → What is the man is doing? (child utterance): (copy of Aux 'is')

c. Compare:
   What is the man doing? (adult/target structure)
   What is the man is doing? (child utterance)

(7) Lxyz output (d): commission of possessive

a. Input structure:
   This is [[Poss] mine-Ø] → This is mine! (target structure)

b. Output utterance:
   This is [[Poss] mine's] → This is mine's (child utterance): (over-generalization of {'s})
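A deliberately toy sketch of the scheme in (3), under simplifying assumptions of my own (a stage-1 'FL' that filters out functional items, versus a copy operation that fails to delete), shows how one intervening system can yield both error types from the same input:

    # Toy 'FL' filter illustrating the two child error types from the scheme in (3).
    FUNCTIONAL = {"is", "the"}           # illustrative functional items (AUX, D)

    def omit(words):
        # Omission: a stage-1 grammar drops functional categories from the input.
        return [w for w in words if w not in FUNCTIONAL]

    def copy_no_delete(words, aux="is"):
        # Commission: the auxiliary raises (copies) but fails to delete in its base position.
        return [aux] + words if aux in words else words

    print(" ".join(omit("what is the man doing".split())))
    # -> what man doing                      (cf. (4))
    print("what " + " ".join(copy_no_delete("the man is doing".split())))
    # -> what is the man is doing            (cf. (6))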

While developmental linguists have long observed that young children systematically omit certain aspects of language over others, and that there is a maturational timetable to these omissions, we are still learning the degree to which some language elements remain more conservative over others, and why there might be cross-linguistic differences found amongst such omissions. Perhaps an even more important question is to ask whether some commonality holds between more vs. less conservative elements found in languages across the world. But one doesn't have to look at global typologies here to find interesting distinctions amongst functional categories of different languages. For instance, let's consider two common English examples below and see if we can tease out the abstract/functional vs. concrete/lexical features which underwrite the structures as either being less or more conservative in nature. Consider the following examples below:

(8a) *Today is the day you'll remember [the rest of your life].
(8b) Today is the day you'll remember [for [the rest of your life]].
(8c) John brought [a wine bottle] to the party.
(8d) John brought [a bottle of wine] to the party.

Notice that the reading in (8a) is at best unclear, given that one cannot remember into the future of one's life using 'today' as the starting point. The problem here is that the pragmatics of the sentence suffers from the lack of syntax. The logical structure of [the rest of your life] takes on a noun property and therefore results as a logical argument of the verb 'remember'; the noun phrase becomes what you are actually remembering. The structure can easily be corrected with the insertion of the preposition 'for', creating an adverbial-like structure which doesn't provide for an argument status. Hence, what one remembers is 'the day' (for the rest of your life) and not the 'rest of your life' (from that day). You might say that this is a very subtle and abstract distinction that doesn't typically show up in naturally spoken language. But then, look again. Consider how the examples in (8c) and (8d) differ, and how even adult native speakers have access to such distinctions and use them correctly. In sentence (8c), John brought a 'wine bottle', and in (8d), he brought a 'bottle of wine'. Now, upon some close inspection of the two linguistic structures, one immediately finds that a distinction can be made about the two bottles, and if John is coming to our dinner party, we surely hope that he has brought 'the bottle of wine' and not the (potentially empty) 'wine bottle'. You see, again, the little insertion of the possessive functional word 'of' changes everything about what we understand about the bottle. What can be said here is that the strategy whereby two nouns are placed together [wine, bottle] provides only for the 'stacked' combinatory readings of the two lexical items, thus allowing for derivational processes to kick in, changing the noun 'wine' to becoming adjectival—e.g., what kind of bottle is it: a plastic/glass/wine bottle. (For syntactic analysis of 'wine bottle' v. 'bottle of wine', see Lecture 4, [16]). We know that children in fact use this bricolage strategy of placing two nouns together when they try to express a possessive structure which they otherwise don't have at the early lexical stages of their language development. Consider some classic examples:

(9) mommy sock, daddy car, baby bottle, etc.

But perhaps the more interesting point to be made here is that adults likewise still have access to such base structures, though we understand that the syntactic readings of the two are different.


Question Restated (From (3) Above)

What kind of a model of language (3b), labelled here as the Faculty of Language (FL), could deliver such anomalous structures as made apparent by contrasting the language (L) input (3a) Lxy with the L outputs (3c, d), with (3c) showing omission of L input (mistakes of subtraction), and (3d) showing errors of commission (mistakes of addition)? At the very minimum, FL must serve in some capacity as an intervening computational system, which, as the expression goes, 'has a mind of its own'. So, let's recap. The question as diagrammed above is a simple illustration of why it remains largely insufficient to explain the complex processes which lead to language acquisition by any simple, naïve theory of a direct input-to-output scheme, as was earlier (and unsuccessfully) advanced in middle 20th-century behaviorism (B.F. Skinner). The problem with such simple 'input-output' X=X schemes is that such models can't make provisions for the theoretical intervening computational processes 'FL-worthy' of what would be required due to stipulations of a rule-based mode of processing. In short, only by way of some intervening additional mechanism could FL bear on the direct input in ways which indeed render different outputs. This simple fact, that what goes on in the mind of a speaker may not necessarily relate back to what comes out of the mouth of a speaker, must be considered when devising any theory of language acquisition, and its subsequent understanding may go far in ultimately providing spin-off theories describing other language-related topics such as language evolution, language impairment, and language variation & change. Therefore, regarding language development, it seems that 'what gets missed out where' in the mind of a child must ultimately be explained by some maturational development of the brain (a biological basis for language), and this biological basis must become the actual underpinning of the language faculty. By understanding this biological basis, its maturational development, and by examining what goes on in the brain whenever 'language goes wrong', we become better able to devise strong working theories of general linguistics. So, we can summarize here and suggest at least a tentative answer to the above question: 'What kind of model does FL call for?' Well, at the very minimum, FL must (i) be formed by some aspect of the human brain, (ii) be computationally driven (i.e., rule-based), and (iii) reflect certain inherent qualities of the human mind which provide for models unrelated to direct input-output schemes. (i) speaks to a phylogenetic basis (e.g., language is unique among humans), (ii)

speaks to a theoretical basis (e.g., Skinner vs. Chomsky), and (iii) speaks to language as residual of a 'theory of mind'.

What is Language?

Certainly, the best place to begin any series of introductory Language & Linguistics lectures—though perhaps not always the most obvious—is to ask the three-prong question: What is language, what are its properties, and how should we go about defining it as a formal computational system? It has been too often considered, I feel mistakenly so, that the study of language and linguistics should be narrowly treated in ways which solely emphasize the role language plays in providing communication. Sure, the role that language plays in communication is real enough, and it may do us little good as linguists to talk about language absent its functioning value as a means of communication (a kind of functionalism). But we must be very clear here on this point: while communication certainly involves, indeed requires, at least some fragmentary aspect of language, what we wish to do here in this text is extend the formal definition of language, move it beyond its mere functional value, and attribute to it features which have to do with pure computational qualities (a kind of formalism), features completely untethered to factors which contribute to communication. These same formal qualities we attribute to language are thus present even in our silent, inner-most private language, independent of whether or not we are actually 'communicating' to anyone in a public space. True, the counterpart qualities and properties of such a private language-of-thought will need to take on very different shapes when transferring over to a public language-of-communication—this is simply due to the fact that there will be differences in the demands placed upon the two systems. (For instance, while we may be able to hold several silent thoughts at once, we can't speak several phonemes/sounds at once: time & space constraints hold differently between abstract vs. physical phenomena). Let's consider this below. It's plain enough to see that a 'sound/phoneme-based' (public) language-of-communication would need to satisfy demands placed on it by constraints on the cogno-auditory system. The mere fact that a public linguistic expression has to be uttered → heard → processed presents linguists with some fairly non-trivial demands, both physical as well as mental, requirements of which must be fully met. There certainly would not be the same types of demands placed on, say, a public language-of-thought if one were to exist, other than, say, demands which might creep into processes of telepathy: I don't know what such demands


would look like. Of course, what we mean here, and what does exist, is a private language-of-thought (not public) which counters our public language-of-communication. In any event, we can get by with this definition of a public language-of-communication, in contrast to its counterpart private language-of-thought, by addressing the kinds of demands which must be satisfied in order for either linguistic system as a whole to remain cohesive and coherent. It may be surprising to most people that it is this latter private and more formal aspect of language, labeled herein as the formal value of language (formalism), that has become the touchstone and leading focus behind the Chomskyan framework of linguistic theory. (Noam Chomsky is an MIT scholar and is considered by many to be the seminal figure in modern-day linguistics). By adhering to these inherent computational features of language—all the while paying our due respect to how such features must manifest and project in communication—we keep to our well-defined notion that language is formal, abstract in nature, and structurally dependent—i.e., that language is always more than 'just the sum of its parts'. So, we must be careful and define language—and the processes which underwrite language (i.e., its computational systems)—in ways which address this higher-level quality. Holding this more stringent definition of language, we can begin to move beyond language in its mere communicative capacity and accept, in Chomskyan theoretical terms, that language doesn't (nor should it) simply reduce to a mere functional status. In fact, we may wish to de-emphasize and forego the role of communication altogether, and by doing so, strictly turn our attention to evaluating what these more abstract and formal qualitative values of language really are. This is a kind of shedding of the outer clothing (language-of-communication), a reaching down to the bare skeletal properties (language-of-thought). In so doing, questions regarding the true formal nature of what underwrites language emerge in one fell swoop, forcing us to accept the fact that there could be language without the uttered word, and thought without the spoken sentence. But does this mean we sell ourselves short of word, phrase, and sentence? No! In fact, these very constituents will be rightly called upon as the outer (objective) manifestation of our inner (subjective) language. As mentioned, these constituents serve to satisfy certain demands; they roughly serve as a cognitive bridge which carries the formal inner impression of thought to the outer expression of communication. So, what we mean is that in addition to the substantive word, phrase, sentence, there must also be something of a more formal non-substantive nature which governs and underwrites the processes of word, phrase, and sentence.

The aim of this introductory text is to draw attention to these non-substantive governing processes which allow language to emerge. Language can mean a lot of things for different people—i.e., the rolling of a teenager's eyes when embarrassed by a well-intentioned parent, the slamming of a door, the expression 'that's nice dear' to mean 'I don't really care', etc. What are we to make of these tidbits of language? The most common mistake held is that language, more or less, amounts to 'the sum of its parts': isn't it the case, you say, that words mean things, or gestures mean things, and when taken together, as in an act of behavior or act of sentence, such tiny provocations just add up to paint a larger conceptual tapestry of the selected items in the array of the expression? It could be said that such a theory of language might function like mathematics, where the product is the sum of its parts: e.g., 2 + 3 = 5, or where the negative word 'not' works in a logical manner, negating what came before it. If so, in the latter example, the (non-standard) negative expression 'I do not know nothing' might paradoxically mean 'I know something', since two negations surely must cancel each other out, positing the affirmative. Why not? This is how it would work in math. Yet, no one would follow such reasoning for language. Clearly, the idiom He kicked the bucket shouldn't add up to what it actually means, He died, if you simply add the items together in the expression. There is 'no kicking of any bucket' at all here. Being paid under the table doesn't require one to receive payment under a table, etc. These are 'idiomatic expressions' which carry their own agreed meanings, absent the actual words that constitute the expression (there is no math here, no adding-up of words to reach a sum). When it comes to Language, it seems there is much more than meets the untrained eye. And let's call it "L"anguage with an upper-case "L" so as not to confuse the term language (with a lower-case "l"), which would mean token examples of Language such as, e.g., the languages English, French, Italian, Japanese, etc. Well, as it turns out regarding Language, a simple calculus of the items added together is simply not enough. Language is not math! 'Language is much more than the sum of its parts'. So again: What is Language? Let's start off by claiming what Language is Not. As already mentioned, Language is not the mere memorization or knowledge of a serial list of words (a lexicon). It is not enough to say you 'know a language' by simply knowing its flat dictionary. The memorization of a long list of words along with their meanings still doesn't get you very far in terms of speaking/understanding a language. So, what is missing—you say: if you have the one-to-one 'sound/sign to meaning' mapping, what could possibly be missing? Well,


this is the very question that gets to the heart of the Chomskyan framework of language analysis. What's missing is syntax. Syntax is defined as the set of rules (or now what is commonly referred to as parameters) which allow sound/phonology to map onto meaning/semantics. Recursive syntax is how we string our words together to derive meaning. We take the two main components of the linguistic expression to involve the 'sound-based' Phonological Form (PF) and the 'conceptual-based' Logical Form (LF). (See (10) below). Linguistics has a long tradition of studying this mapping between sound and meaning. So, where does syntax fit in? Syntax is the hidden interface of the two, abstract in nature and seemingly not the direct result of any condition or demand. Hence, one might claim that syntax does not emerge in order to meet some demand placed on it by an external condition. In fact, most Chomskyan linguists usually accept this line of reasoning: for the Chomskyan linguist, syntax is just quirky. These somewhat strange & special features of syntax—features which are often redundant and seemingly not arising out of some niche to satisfy some condition, features which seem to defy any notion of a Darwinist biological pressure of selection—make up the touchstone of the Chomskyan framework. It is true: in order to map, there needs to be some degree of syntactic derivation involved. Indeed! But the vast degree of abstraction we gain from syntax—which reveals itself in various forms, universally, amongst all human languages—seems to go well beyond what would suffice to meet such communicative demands. This is the Darwinian puzzle of language: the question of why such vast levels of abstraction and redundancy should emerge and evolve in nature absent any biological demand which would otherwise lead to their requirement. This notion that syntax might be too superfluous in nature, given its degree of abstraction, and seemingly so without a guided purpose toward satisfying some external condition, is what has puzzled modern linguists over the last half century. This puzzle of trying to understand what guides this optimal usage of syntax is what Chomskyan linguists are most excited about—a large part of the Chomskyan framework, currently called the Minimalist Program (Chomsky 1995), is set up to evaluate how the linguistic expression might in fact require such abstract levels of syntax in order to optimally serve the sound-meaning interface. The Minimalist Program involves the close examination of this structural mapping between the two components of the interface, all the while keeping an eye on how certain external conditions & requirements have to be optimally satisfied. Within such a minimalist theory, little import is given to words per se. Carrying forward in the Saussurian tradition (F. de Saussure), Chomsky views

the spoken word as simply an arbitrary phonological shell, with no inherent 'sound-meaning' connection to what it denotes. It could be said that words arise out of a need to satisfy requirements of cogno-auditory processing—namely, the word's phonological structure arises out of its need to satisfy external 'space-time' conditions on language which result from the simple fact that words must be uttered and channeled in order to convey meaning across the airways. (I say 'space & time' here in the following manner: just try to say the three sound-phonemes /k/, /æ/, /t/ all at the 'same time' in making the word 'cat'. Clearly, a feasibility condition requires that each phoneme be separated in space and time. This is not so with the abstract thought of 'cat'). This condition works in a bi-directional manner in that both Articulatory as well as Perceptual conditions must be mutually met—the active articulatory in the sense of how the mouth can subserve the many articulatory features. I grant it that the mouth, along with its many articulatory features (place & manner of articulation such as lips, teeth, nose, hard palate, etc.), can combine in various ways to give language quite a robust store of possible human speech sounds. (Note that in ASL, the same terms 'phonemic', 'place & manner' are used, given that the same underlying processes are at work in governing language, whether it be speech or ASL). But let's not forget that any feasibility of sound production is based upon how such combinations of place and manner can easily be produced, reproduced, stored and retrieved from our mental lexicon, where we store all such information relevant to the sound-production and mental-conception of our words. This feasibility condition would be one such external condition placed on the PF interface—i.e., the need for the active articulatory system to feasibly combine in ways which allow the formation of distinguishable speech sounds. (Slips of the tongue, e.g., 'He tasted the whole worm' meant as 'He wasted the whole term', which are known as spoonerisms, seem to be the result of some active mix-up found at the articulatory system). Sometimes, such mix-ups are the result of a passive fault, as found in assimilation and/or dissimilation of perceptual boundaries and features. The passive perceptual system (though not entirely 'passive', as processing studies reveal) must also satisfy conditions in the sense of how such articulatory features can be made coherent and recognizable as part of the sign-referent model of language. (Of course, the audible word is lost in American Sign Language (ASL), where the visual sign/hand-gesture replaces the spoken utterance, but where the same feasibility conditions apply). The satisfying of the condition is the basis for what becomes the interface mode between the Phonological Form (PF) dealing with sound and the Logical Form (LF) dealing with thought. These two ends


of the interface result from the fact that sounds (PF) take shape as words which give us a reference to thoughts (LF). Consider below in (10) how the Linguistic Expression models these two interfaces (interface in the sense that both PF and LF are not isolated processes, but must exchange information with outside sources—viz., LF (and its inner thought process) must interface with the outside world via PF, while PF too must interface with the outside world in terms of its sign-reference exchange. This flow of interface allows language to become objective and public, and not just remain a subjective private intercourse held inside our heads).

(10) Model of Linguistic Expression (Figure 3)

Sound (outside world) → PF → Linguistic Expression → LF → Thought (inside)

Figure 3  Model of Linguistic Expression

Thus, these two forms, or levels of representation, arise out of a need to satisfy (cogno)audio-semantic properties of language. The articulatory system must provide 'instructions' to the speaker regarding how a given linguistic expression must be expressed. (Note, we are not referring to reading here, say, from a text. Rather, what we mean by 'instructions' refers to exactly how a speaker is to go about the phonological production of a given expression, a kind of pure phonetic reading from a mental script). The logical system, along with its external constraints, must allow ease of mapping of the world as humans observe it—e.g., the fact that our worldly experience provides us with plenty of evidence for actors (causing an action), actions (as states, motions, and events), themes (people or objects which may undergo some action), or use of an instrument (by some actor during some action). (The details of these properties, called argument structure, will be discussed later in their relevant chapters).

However, words too must be closely examined. Recall, words are arbitrary, with no inherent connection to meaning. By agreement, two people can attribute any meaning they want to their words. Words come and go, and so do their meanings. But what seems to interest Chomsky is what is behind the rules which

underwrite and govern how words can be strung together. (Syntax too is involved in the stringing together of ASL signs). Knowing its syntax allows one to claim to know a language.

Thinking is not communicative language. True, to produce language one must be able to think. But as already mentioned, surely one can think without engaging in spoken external language, a public E-language. Fine! So, there is something called internal, silent language, an uncommunicative private I-language. Right! Again, this is where Chomsky draws his line in the sand and says that the I-language is the more important of the two. There can be no naïve theory of language that relies solely on external E-language processes of reinforcement alone: viz., imitation, correction, or memorization. Any formal theory of language must address this particular I-language status (this aforementioned bridge between the inner/subjective and outer/objective). What Chomsky wants to know is the following: what exactly is going on in the brain when you hear such-and-such or say such-and-such? In other words, what is the brain-to-language corollary?

Given these arguments, language is defined herein as the syntactic mapping between sound and meaning. Perhaps thinking is a pure mapping of syntax-to-thought. But one doesn't typically think in proper words/meaning. Albert Einstein famously claimed that he thought in images, and not in words. The great American physicist Richard Feynman likewise claimed that people often remember 'ideas' only—not memories packed in words, but pure memories of ideas: 'The ideas they remember, but not the words' (Feynman 1985: 41, 'Who stole the door?'). The linguist Steven Pinker, in his book 'How the Mind Works', has extended the most articulated notion of this 'wordless thought' by suggesting that there is a formal system of thinking, a kind of mentalese, which transcends all modes of language but which is responsible for the underwriting of the very linguistic computational system upon which all languages are based. Such a claim is tantamount to saying that language is the mere outer cosmetic clothing which lies upon the naked inner thought of mentalese. I suppose if we could communicate by direct thought, as in a form of telepathy, there would still be the silent language of mentalese, but there would be no requirement for a phonology.

It may very well turn out that what we term 'language'—as understood by its constituent parts, defined as a 'sound-meaning' computational system with subsystems dealing with phonology, morphology, and syntax—is merely a bundle of phonological features which have come together out of necessity to satisfy certain constraints as imposed on any biologically-determined


communicative system. Such features would be said to have come together—over time, perhaps via processes of evolution by adaptation (Pinker & Bloom), or via a non-adaptive account termed exaptation (Chomsky, Gould)—to basically satisfy constraints which channel between the sensorimotor system and the system of thought. Both systems would come with their own portmanteau of internal and external constraints which would need to be satisfied in order for a well-structured and economically feasible language system to take shape within its inherent biological niche—i.e., the fact that language must (i) start out as a phonological, sound-based phenomenon (phoneme), (ii) allow certain phonemes to be tagged to represent derivational and inflectional properties of a given stem (morpheme), and finally (iii) determine the arrangement of such morphemes into a fairly fixed linear order (syntax). On the face of it, these three basic properties of language come out of a need to satisfy constraints as imposed from the outside onto the inner linguistic system. (We note that in ASL, the same constraints apply).

Taking this idea a bit further, we need to spell out what we mean by the terms language and language & linguistics, as discussed in this book. First off, as mentioned earlier in this chapter, we must acknowledge that 'Language' can be a lot of different things for different people. For instance, who is to say that the 'slamming of a door' just after a heated exchange doesn't contain some equivalent meaning as would be expressed in the linguistic utterance Damn it: I have had enough! Conversation over! (Door slam). Gestures, indeed, reveal quite a lot of meaning. Eye-rolling, head-nodding, and other facial expressions seem to be shorthand substitutes for their relevant linguistic counterparts. Just think how often the innocent-enough expression that's fine, dear in fact (silently) means I don't really care—do what you will. Plainly, we cannot always simply string together words to get at meaning. Language, as we have shown above, is not mathematical in this way—language is not a zero-sum game; language, as it turns out, is very rarely merely the sum of its parts. The stuff of language is largely 'emotional'. As translators of languages know all too well, it is never enough to simply mediate between the definitions of two words across any given two languages. In addition to what we claim as 'meaning' in any sense of the term, what 'words' carry is predominantly culture-bound bundles of sentiments and experiences. It is in fact this bundle of commonly shared world-views that we hear and process as language whenever we use words.


Three Theoretical Models of Linguistics

This all leads to some mention of the three different theoretical strands of linguistics, all of which could be said to be housed under the main heading 'the psychology of language'. The three models are:

(i) Generative linguistics (Chomsky), which seeks to provide a biologically based, though isolative, rule-based template for innate language,
(ii) Cognitive linguistics (Langacker), which seeks to tether any such language template to more rudimentary cognitive means, such that linguistic structures are motivated by general cognitive processes, and,
(iii) Interactionist linguistics (Bruner), which seeks to show that language is far more importantly a social phenomenon.

These three frameworks have one thing in common—they all attempt to deal with language as a real object of inquiry. Using a philosophical analogy of 'clouds & clocks', these three approaches provide the full spectrum of how one deals with language more-or-less as a mechanical 'clock' (with Chomsky being the ultimate 'clock' man). What is missing from these models, however, is any reference to how 'emotions' (or 'clouds') play into language. While we do acknowledge that language is largely an emotional behavior and experience, the notion of an 'emotional language' (Greenspan & Shanker) is a topic for another day, and for an altogether different book. The range of topics presented in this text coincides with a more mechanical view of language, not an emotional view. In no way do I say 'mechanical' in a derogatory tone; rather, I say so stressing the fact that the syntactic scaffolding as presented within upholds language, I think, in its true light: as a real object, biologically determined and rule-based, with real physiological manifestations in the speaker's mind/brain.

Hence, the framework of this text is a Chomskyan one. It is a framework which is determined to show that 'language' should be counted among the 'hard sciences': mathematical, biological, neurological … toward the 'clock' direction. Others prefer to turn their attention in the opposite direction, towards 'clouds'. There, 'language' is counted among the humanities: literary, historical, discourse-based, emotional, social. There is no battle cry between the two camps here, only differing perspectives.


Language and Linguistics

The study of Language and Linguistics has a lot going for it. Sure, in typical cocktail parlance, it might take the odd linguist some doing to get beyond the initial misconceptions that most people have about the field. But once common ground is found and a shared point of departure is reached, the kinds of intuitive questions that get asked by most people emerge within a certain scope of genius: it seems people from all walks of life are naturally drawn to topics surrounding Language & Linguistics, and the kinds of questions people come to ask are seldom naive in nature. I know of no other topic that does more to touch on the collective nerve of what it means to be uniquely human. I find such sophisticated questioning falls into two broad classes of wonderment: How is it that language has evolved with such an elaborate labyrinth of complexity? How are we to describe this complexity, and how is it that only humans have evolved such a system? How can we possibly explain the fact that children come to acquire such a complex weave of sophistication, and seemingly so effortlessly, without instruction?

The first question covers aspects of language evolution as well as the Darwinian puzzle of how there has arisen in the first place a 'species-specific', biologically determined human computational system absent the natural selective powers which otherwise would be required to necessitate such a system. The second question addresses the so-called learnability problem. What linguistic theory grapples with on a daily basis is how to refine and reshape what we in fact intuitively know about our own language in ways which render this tacit knowledge worthy of empirical study. The main challenge for linguists over the years has always been to divert what seems to be a naturally occurring language phenomenon (of an implicit nature) and to make it explicit. In other words, our charge as linguists is to turn into a formal object of study that which otherwise seems plain and comes quite naturally to us. This move from subjective (inner/implicit) language to objective (outer/explicit) language then allows what privately occurs in the heads of speakers to be open to hypothesis testing and scientific procedure. But perhaps the most important facet and articulated feature which drives language is the idea that language is biologically determined—namely, a system which naturally emerges out of our endowed birthright of being Homo sapiens-sapiens. The notion of a phylogenetic, biologically based language system will become one of the major themes of modern-day linguistics.


The Biolinguistic Perspective

The biological basis for language, the biolinguistic perspective, began to take shape some three decades ago as part-and-parcel of subsequent spin-off discussions arising from an even earlier revolutionary shift in linguistic thinking. The actual paradigm shift itself began in the middle part of the last century, headed by the young MIT linguist Noam Chomsky. Chomsky's contemporary Eric Lenneberg, with his seminal 1967 work Biological Foundations of Language, further extended this newly conceived perspective of language squarely into the biological realm. To the incidental layperson in the audience, the rationale behind the paradigm shift might be construed as amounting to little more than notational squabbles over linguistic terminologies and definitions. To be sure, heated exchanges of this type did take place between the two newly diverging linguistic camps (pro/con-Chomskyan). But as was to be quickly discovered, paradigm shifts bring with them not just new terminologies—and, as in Chomsky's case, a vastly new vocabulary with a host of metaphors, acronyms, and syntactic tree diagrams—but paradigm shifts are just that, 'shifts' in reasoning. Though few and far between, what paradigm shifts best reveal is a timely and entirely new and innovative way of thinking about an old something. Similar to what we found with Freud's new field of Psychoanalysis (sure, we had dreams before Freud got ahold of them) or Darwin's insights into the evolution of species, Chomsky's new field, now called Generative Grammar, too would be an entirely new way of thinking about an old something. The notion of 'Language' would never be the same.

In order to trace how this new way of thinking about language evolved, it's instructive to recall what comprised the earlier field of linguistics prior to the paradigm shift. For the most part, it could be said that classical linguistics held closely to roots appropriated by the disciplines of the humanities—i.e., a 19th-century view of the study of language & linguistics for the most part amounted to the study of language as a classificatory system, historical in orientation and, to a large degree, defined by external appearances as would be understood by terms such as formal vs. informal usage, colloquialism, vulgarisms, non-standard dialects, etc. There was a fair amount of work undertaken to build up dictionaries of what were then considered to be 'primitive languages' (18th–19th-century missionary work), and certainly, a valid attempt to glean what we could from such 'primitive' peoples and their languages would play center stage for well over the two-hundred-year tradition of anthropological linguistics (of the Levi-Strauss


persuasion). Although some attempts to view internal processes were undertaken in the early 20th century—as was the case with phonology and phonological sound-shift in the work of Hermann Paul—much of the direction of study regarding 'processes of change' treated the external system as a whole. Notwithstanding Paul's laudable contributions, little attempt was ever made to understand phonological change by addressing internal rule-based procedures. While 19th-century linguists were consumed by language's outer phenomena, its usage and historical record, there was as yet no attempt to get at the core of what made language tick. The move away from external observation of language (B.F. Skinner) to internal explanation of language (Chomsky) came with this Chomskyan turn of the screw—linguistics was now to be considered rightly in its place as a science, with a new emphasis on understanding "L"anguage (with a capital "L") as an external manifestation of some internal component of the mind. Hence, with a slight Chomskyan turn of the screw, what befell us linguists was a newly envisioned brain-language corollary. But how should we exactly define 'language'?

What the new linguists wanted more than anything else was to make explicit their thinking of what they felt constituted the formal features of language. Namely, what was pressing on their minds was how to define the linguistic subject of inquiry and how best to clear up any lingering and misguided notions carried forward from earlier linguistic frameworks. The most pressing item on their minds was to advance the notion of language & linguistics in such a way that it promoted the discipline as a science worthy of scientific inquiry, a science on par with fields such as biology, math and perhaps even physics. But in order to advance the scientific inquiry of this newly defined 'language', bold action needed to be taken to strip away the otherwise 'non-scientific' aspects of the study, leaving only the bare essentials related to the brain-language corollary. Such non-essentials, which would become de-emphasized within the new Chomskyan framework, would include, inter alia, historical linguistics, sociolinguistics, pragmatics, as well as notions of Second Language Learning. (It is still my understanding that Chomsky has remained largely agnostic about putative theories surrounding second language learning beyond the so-called critical period of acquisition reached at around puberty). In his view, such language learning doesn't fall under the heading of a biologically determined natural acquisition process.

Second language (often labelled L2) is therefore cast more in line with general theories of learning and/or problem solving (as indicated by 'bell-shape' curve distributions—see Appendix-5). Of course, Child Language Acquisition (L1) would become the cornerstone of the Chomskyan revolution, since it is the best and closest means of pegging language development to brain development, a true biological measure. From out of the child-language perspective came Chomsky's most articulated work, his Principles & Parameters framework (Chomsky 1981).

The newly envisioned Chomskyan language system, now stripped down to its bare biological features of necessity, consists of only two central modules, referred to as the two internal properties of the language faculty: (i) the sound/sign interface (Phonology) and (ii) the meaning interface (Semantics), with Syntax said to minimally interact between the two. We take the term 'interface' to mean that there must be some amount of interface/exchange between the representational system itself and the outside world. Since we are dealing with a biological system, we might begin to ask of the modules what kinds of constraints must be considered and which requirements must be satisfied, as imposed by the brain-to-language corollary. This need to seek out and understand both the constraints and requirements imposed by the cognolinguistic operating system, and how such requirements become satisfied, will become the central theme of investigation over the life-span of the Chomskyan framework, culminating to its highest degree in the Minimalist Program (Chomsky 1995).

Language Evolution

Language is quite possibly the most unique of all complex systems known to man, with little if any antecedence to its nature and origin traceable back to a Darwinian world. This is obviously a strong statement and so deserves some careful attention. Let's begin with notions of communication, as would be underwritten by more robust 'cognosemantic' schemes (potentially similar schemes which might likewise underwrite animal communication), then proceed to see where such schemes end, thus delimiting those more 'abstract-syntactic' schemes thought to be related to true language (i.e., 'L'anguage with a capital "L" (e.g., Bickerton 1990)).

First off, it appears that mere communicative needs as would be determined by a Darwinian model could not possibly have provided any great selective pressure to produce such an elaborate system as language, one that relies heavily


on properties of abstraction.2 As it turns out, at least in Darwinian terms, a vastly more prosaic and rudimentary cognitive-based system would certainly have sufficed—i.e., an adaptive system which plateaus at levels where basic communicative demands are sufficiently met. What we mean by 'basic' here is that the quality/quantity of the communicative system would not have needed to go beyond what is sufficient to support early hominid existence. One must keep in mind that our hominid 'family'—starting from 'genus' Homo habilis and erectus and moving through to our early 'species' Homo sapiens—didn't seem to rely on the kinds of abstraction we are considering here for hundreds of thousands of years. Here is a case in point: the fact that I find myself right now in my eighth-floor office of a beautifully designed State University campus, writing on my computer, sifting through e-files under fluorescent lighting, listening to a transistor radio about the landing of our American 'Mars Rover' and its preparation to take rock samples, all provide ridiculously little comfort for the many problems which beset our species in securing survival. In short, the vast levels of abstraction required to support such superfluous behaviors in no way assist in any biological calculus that would help maintain a basic survival niche as deemed associated with communication.

It seems that at the upper range of our early Homo communicative-complexity spectrum, any putative Darwinian demands would be readily handled by the mere combining of simple one/two-word utterances, taken together with token hand gestures. It seems that's all! In other words, the intensely abstract nature of syntax that we find throughout all human languages is 'freakish'—it just doesn't seem to be a necessary prerequisite for the minimal type of meaningful communication that would have had to be established and nourished in order to maintain

2 The Nobel Laureate Salvador Luria is best remembered for this claim, as cited in Chomsky's address to the LSA, Jan. 9, 2004. An instructive example of 'abstraction' as a leading indicator and precursor to 'language' is the shift one finds between the 'iconic' representations found with the first three digits of Roman numerology I, II, III, and the more abstract 'symbolic' representations of IV, V, VI, X, L, C, M, etc. With the former, an iconic 1-to-1 mapping can be traced from referent to referred, so that the wholeness and entity of 'one' is maintained throughout the mapping—i.e., I = one object, II (I + I) = two objects, III (I + I + I) = three objects as mapped in the referential environment. Note that this is not the case with the latter Roman numerals, where no such iconic mapping can be established. The invention of the zero has also been rightly used to illustrate this nature of abstraction.

the Homo family line. I don't mean to say that there were/are no other aspects of human language which indeed rely upon and can be explained by adaptation-based evolutionary theories. It's just that, as it so happens, looking back and speaking in strict scientific measures, there doesn't seem to be any strong evidence for the kinds of selective/adaptive biological pressures that would go into motivating such a full-blown abstract grammar, replete with all the 'bells & whistles' that have led to so many of humanity's great achievements. The abstract & symbolic qualities that language has to offer do little to address the 'external' communicative norms of a language of function; rather, the abstract nature supports the 'internal' norms of a language of thought (Jerry Fodor). Language, for all its great features and innovations, is vastly superfluous in evolutionary terms, reduced to having a non-functional status, as in 'frosting on top of frosting on top of frosting (on top of increasingly diminishing returns of our cake)'. In short, such qualities and quantities of abstraction provide little foundation which otherwise would fit into a Darwinian evolutionary model of language (e.g., Bickerton 1995; Gould 1993; Lightfoot 2000).

Universal Constraints on Language

One of the major contributions made by this biological basis of language ('biolinguistics', as termed by Chomsky) is the proposal that the human 'language of thought' seems to be highly constrained, both in its arrangement of architecture and in its modularity of processing. Much of current biolinguistics is preoccupied with the nature and scope of these constraints—which may or may not be tethered exclusively to the language system—which seem to limit and restrict the arrangement of possible language structures and functions. Of course, the very fact that language as communication must travel via some auditory channel surely means that the physical properties of sound enter into the equation of physics, and that the phonology of language as sound must be constrained by the same general laws of sound. For instance, reconsider our simple example of the word 'Cat' /kæt/ as shown earlier. Surely, language cannot get away from the fact that any (reverse) disturbance of the phonemes as arranged within the word (its linear order), say, yielding 'Tac' /tæk/ (for 'Cat'), breaks down the communicative channel. Hence, linear constraints placed upon inter-word order, as well as intra-word phoneme placement, might merely be an artifact of the physical constraints that time & space place on sound. (If we could speak all our words at the same time, perhaps word order would just dissolve, not to mention


what telepathy might do to notions of word order. In this manner, it has been suggested that our true language of thought may abide by no fixed order). Regarding linear order as expressed above involving phonology, syntax could equally be argued to abide by physical constraints, due to the manner in which merge operations evolve—merge being defined here as the syntactic process by which two separate items get pulled out of a lexicon and come to 'merge' together to form a single larger constituent, as we find with two words within a phrase. This newly formed phrase, while taking on the lexical properties of the two merged words per se, now becomes its own identity with its own set of phrasal properties. One interesting illustration of how physical constraints can shape syntactic derivations involves so-called rewrite rules. By defining such rules in terms of the famed Fibonacci series 0, 1, 1, 2, 3, 5, 8, 13, 21 … (rule: add the last two numbers to derive the next), we arrive at a set of rewrite lines of derivation which reveal exactly what one finds in terms of the Chomskyan minimalist assumption of Merge. For instance, if we take as our constraint the Fibonacci rewrite rules whereby 0 → 1 and 1 → 0, 1, we find that syntactic phrase structure naturally occurs as follows (which naturally provides 'binary branching'):

(11)

(Fig. 4)

0
1
0  1
1  0  1
0  1  1  0  1

(each line rewrites the one above via 0 → 1 and 1 → 0, 1; the line lengths follow the Fibonacci series 1, 1, 2, 3, 5 …)

Figure 4  Fibonacci Sequence Yields Syntactic Tree Diagram
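For readers who wish to see the rule verified mechanically, the following minimal sketch in Python (purely illustrative; the function name and output format here are my own, not part of the linguistic proposal) implements the rewrite rules 0 → 1 and 1 → 0, 1 given above:

    def rewrite(symbols):
        # Apply one generation of the rewrite rules: 0 -> 1 and 1 -> 0, 1.
        out = []
        for s in symbols:
            if s == 0:
                out.append(1)
            else:  # s == 1
                out.extend([0, 1])
        return out

    generation = [0]
    for n in range(7):
        print(n, generation, "length:", len(generation))
        generation = rewrite(generation)
    # Printed lengths run 1, 1, 2, 3, 5, 8, 13 -- the Fibonacci series --
    # and every expansion of 1 into [0, 1] is a binary-branching step.

Each pass of the loop corresponds to one level of the diagram in Figure 4, so the 'binary branching' of the syntactic tree falls out of the rewrite rule itself.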

This same recursive structure is what we will find when we begin to consult syntactic tree diagrams, structures which are made up of phrase-level constituents. (See Lecture 2 below for an introduction). So, while such constraints are likely to go unnoticed by an untrained eye, what linguists find when closely examining putative constraints on universal grammar are precisely those constraints which speak to the nature of how any computational system would have to be organized. Specifically speaking, with regard to what is known of the 'abstract' patterns found in grammars—patterns typically referred to as stemming from peripheral properties of UG (Universal Grammar)—such abstract patterns appear

to a large degree 'arbitrary' in nature, whereby other alternative possible patterns, though perhaps not found within UG, could have equally surfaced and, in fact, by doing so improved the language communicative channel, at least from a functionalist standpoint. That is, if one selectively looks to the general, core properties of language as a means of mere communication (functionalism), one quickly discovers that many alternative logical patterns of language which could otherwise have surfaced in the world's languages (and with added beneficial features to boot) simply don't. Such arguments have been advanced within the Chomskyan 'biolinguistics' framework to define the theoretical split found between—

(i) Functionalism and functional linguists (who emphasize the role of communication and the external 'language of function' in the evolution and acquisition of language), and
(ii) Formalism and formal linguists (who emphasize the role symbolic language plays in the internal 'language of thought').

Conversely, there are even stronger claims that existing constraints found amongst UG patterns may, in fact, hinder the best and most effective modes of delivering a message via the communication channel (e.g., Chomsky 2005; Lightfoot 2000). For example, when one examines the UG principles of binding co-reference in English reflexive pronouns, it appears the patterns are arbitrary and that many alternative patterns not found within UG could equally have served from a mere functional/communicative standpoint (where the subscript indicates co-reference, and an asterisk * marks ungrammaticality). Consider the referential properties of the English pronoun system found below.

Referential properties of English Pronouns

(12)

a. John_i hurt himself_i
b. John_i hurt *him_i
c. John_i said he_i/j feels pain.
d. *He_i said John_i feels pain.
e. Me_i hurt me_i (child syntax)
f. Him do it

(where the index (i) correlates and joins the two indexed items insofar as John_i & himself_i are one and the same constituent: John = himself, referred to as an anaphoric/antecedent relation). For example, one might easily question what it is exactly that makes (12b) so wrong (where John and him could correlate). As for myself, I simply cannot envision


what line of difficulty arises in parsing (processing) the utterance John hurt him as having the equivalent meaning of John hurt himself. What is it that possibly blocks the two from having the same interpretation? Clearly, at some basic 'functionalist' level (of communication), the two expressions easily mean the same thing. Therefore, it must be blocked by some level of 'formalism', some level of syntax. It is striking that such constraints as found with binding theory, while non-functional in nature, tend to hold as 'universals' across the world's languages. So, what sort of 'universal' do we have here (and where might it come from) which so apparently flies in the face of mere intuitive communicative assessments? One might rather ask here whether such a universal constraint on binding is in fact addressing some other conditions placed on language, which speak less to (iconic) functionality and more to the very (abstract) nature of syntactic processing itself.
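To render the formal (rather than functional) character of the constraint concrete, consider the following toy sketch in Python (purely illustrative; the flat [Subject Verb Object] representation and the function are simplifying assumptions of mine, not a serious binding formalism) of the pattern in (12): a reflexive object must be co-indexed with its local subject (Principle A), while a plain pronoun object must not be (Principle B):

    def binding_ok(subject_index, object_form, object_index):
        # Toy Binding Principles A/B over a simple [Subject Verb Object] clause,
        # using matching indices to stand in for co-reference.
        co_indexed = (subject_index == object_index)
        if object_form == "reflexive":   # himself, herself, ...
            return co_indexed            # Principle A: must be locally bound
        if object_form == "pronoun":     # him, her, ...
            return not co_indexed        # Principle B: must be locally free
        return True

    print(binding_ok("i", "reflexive", "i"))  # True : John_i hurt himself_i  (12a)
    print(binding_ok("i", "pronoun", "i"))    # False: *John_i hurt him_i     (12b)
    print(binding_ok("i", "pronoun", "j"))    # True : John_i hurt him_j

Nothing about the communicative channel forces the False case; the ban is purely structural, which is precisely the point at issue.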

Biological Basis for Language

We now know that the 'brain-to-language' correlation is physiologically real: that is, we see specific language tasks (such as the storage & retrieval of verbs and nouns, or phrase-structure constituency) as well as language breakdowns either be 'activated' (the former) or 'caused' (the latter) by specific areas of the brain (e.g., see Saffran for a review). In sum, what we shall term Lexical Categories in this book (e.g., Nouns, Verbs, Adjectives, Adverbs) will be said to activate the Temporal Lobe region of the brain (Wernicke's area), and what we shall term Functional Categories (e.g., Determiners, Auxiliaries/Modals) will be said to activate the Left Frontal Lobe region of the brain (Broca's area). When we reach that juncture in our discussion which requires the drawing of tree diagrams, we must keep in mind that we are not simply drawing trees; rather, what we are drawing is indeed a model of what we believe is going on inside the brain: a brain-to-language mapping. In fact, we will come to view trees as cryptic models of the inner trappings of our brains, so that when we process some aspect of language, we might visualize what is going on in our heads. Trees allow us to model such a mapping. Syntactic trees are physiologically real!

Brain/Mind-Language Relation

It is now largely accepted that language is served by two major regions of the brain: Broca's area (left frontal lobe) and Wernicke's area (left temporal lobe).

As stated above, the differing activation areas seem to present us with categorical distinctions between lexical substantive words and functional abstract words. Also, it has been reported that the same distinctions hold between (rule-based) Inflectional morphology—e.g., the insertion of {s} after a noun to make it plural (e.g., book-s)—and (rote-learned) Derivational morphology—e.g., the insertion of {er} after a verb to change it into a noun (e.g., teach-er). The picture is much more complicated than is made out here, with some overlap of processing that may blur clear distinctions. However, overall, the brain does seem to behave as a Swiss Army knife of sorts, with specific language tasks activating specific regions of the brain. This dual distinction is best shown in brain imaging studies using fMRI (functional Magnetic Resonance Imaging) and ERPs (event-related potentials), whereby different areas of the engaged brain undergo different blood flow (fMRI) or electrical response (ERP). Either measure shows a triggered response to specific language-based tasks.

Connectionism vs. Nativism

Connectionism. Some cognitive psychologists and developmental linguists wish to attribute a greater role in grammar and language development to the environmental interface. By stressing the 'exterior' environmental aspect, connectionists attempt to show correlations between the nature of the language input and subsequent language processing leading to output. Connectionism suggests that there is often a one-to-one mapping between input and output, based on thresholds of type/token item frequency. Their models assume that though language input is 'stochastic' in nature (i.e., random), the child has an inborn statistical calculus ability to count and uncover patterns which lead to the formation of a given grammatical state. They further suggest that the only way a child can gain access to the stochastic surface-level phenomena of language is by brute powers of analogical association. Such powers of association are assumed to be part of the general knowledge the child brings to bear on the data, a general knowledge as found in cognitive problem-solving skills.

Unlike the nativist position (on the one hand), which upholds the view that the language faculty is autonomous in nature and formal in processing (i.e., not tethered to 'lower-level' cognitive arenas of the brain), connectionists argue against formalism and do not assume (nor believe in) such 'higher-level' processing specific to language. Connectionists prefer a more functionalist stance in


claiming that language development arises in its own niche as the need to communicate increases. Due to their functionalist stance, connectionists don't theoretically need to stipulate an autonomous rule-based module in the brain. Connectionists rather believe that brute cognitive mechanisms alone are in and of themselves enough to bring about language development. Connectionists usually flaunt their models as being closest to what they believe the human brain actually does—viz., a neuro-cellular brain driven by 'Off' (0) and 'On' (1) switches. In stark contrast to the nativist position stated below, connectionism assumes language development to proceed much in the same manner as any form of learning. (See Marcus for an overview of the ongoing debate).

Nativism. Other cognitive psychologists and developmental linguists rather place the burden of language acquisition squarely on the innate interface by stressing the internal aspect generating the grammar. While innate models also support the notion that the environment is stochastic in nature, they do so by stressing that the perceived input is at such a high level of randomness, with apparently ambiguous surface-level phenomena found at every turn, that one must rather assume a preconceived template in order to guide the child into making appropriate hypotheses about her language grammar. Otherwise, without such an innate template to guide the child, the randomness is simply too pervasive to deduce any workable analogy to the data. An important rationale of nativism is its claim that language development is much too stochastic in nature for the available input to make much of an impact on the child's learning scheme. Much of the work behind nativism is to show just how the child's perceived data is much too impoverished to determine an appropriate grammar of the target language (as determined by the 'Poverty of Stimulus' argument (see Appendix-1)). In other words, since an appropriate minimum level of order is missing in the input, an innate module of the brain termed Universal Grammar (more currently called the Faculty of Language (FL)) must step in to supply whatever rules might be missing from the environmentally driven input.

The nativist model places its emphasis on the inner workings of the brain/mind relationship to language by stipulating that there are innate principles which guide the language learner into making appropriate hypotheses about the parameters of the grammar being acquired. The Principles and Parameters model, as illustrated below, shows how (i) the language input first passes through the Faculty of Language (FL), (ii) the FL determines the correct parameter settings (Principles & Parameters), and then (iii) the parameterized language gets spelled out in the output:

(13) Input-L → FL (P&P) → Output-L

Figure 5  Principles & Parameters Model

The Principles and Parameters Model: (P&P)

In summary, with more detailed discussion to follow, the Principles and Parameters Theory (PPT) (or what Chomsky sometimes labels the P&P framework) removes the (conscious) burden of 'language-learning' from the child and rather instantiates in its place an 'innate' FL as a (subconscious) 'intervening computational system'. Chomsky claims it is this specific FL, housed as an autonomous module for language, and not some general cognitive learning apparatus, which plays the greater role in the language acquisition process. In one sense, PPT interpretations suggest that the FL holds all possible human grammars in the head of a child at any one time prior to parameterization (which occurs at about two years of age). In this sense, very young children, say before the age of two, are really citizens of the world. This is one way to view the term Universal Grammar. Even potential grammars that don't get realized in terms of a language are held as potential bundles of parameter settings waiting to be set by the relevant input the child receives. This greatly reduces the role of 'active' learning and rather emphasizes the role of 'passive' triggering of the appropriate parameter settings which then form the spell-out of a specific language (say, English or French or German). In other words, PPT redefines a 'Language' as a set of specific bundles of arrangements of parameter settings (of which there could be as many as twenty or so). Some of the basic parameters have to do with the Word Order of a specific language type. For instance, languages that are SVO (Subject, Verb, Object) reduce to a parameter that specifies that the Head of a Phrase be placed in initial position—

(14) [VP [V (head) like] [N (complement) ice-cream]]

[+ Head initial]: [VP [V like] [N ice-cream]]
e.g., We like ice-cream. (Head = Verb, found in initial position)


or if a language allows verbs to invert or wh-words to move—

(15) Auxiliary inversion: ([Aux Do] [you [do] smoke?])   [+ Aux Invert]

(16) [What do [you do study what?]].   [+ Wh-movement]

Based on parameters being binary in nature [+/- setting], we can account for languages which do not allow such verb or wh-movement (as found in Chinese), or for languages which rather maintain the Head of a Phrase as Head-Final, labeled [-Head initial] (as found in Japanese or Korean)—

(17) [VP [N ice-cream] [V like]]

[- Head initial]: [VP [N ice-cream] [V like]]
e.g., We ice-cream like. (Head = Verb, found in final position)

(Note: The Japanese equivalent of the English sentence 'I love you' would be similar to what we find in Spanish, where there is also found, within certain syntactic constructs, an SOV word order with a parameter setting [-Head initial]. Japanese/Spanish 'I love you' translates to 'I you love' ('Yo te amo'), or in Japanese 'watashi-wa' (Nominative case "I"), 'anata-o' (object case "my darling you"), with the final-word verb stem 'aisuru'/'aishitei' (variants meaning "to love"): 'watashi-wa anata-o aishitei-masu' (= I you love), with '-masu' a polite present-tense suffix).
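As a toy demonstration of how a single binary parameter re-orders the spell-out of one and the same head-complement structure, consider the following minimal sketch in Python (illustrative only; the parameter name head_initial and the function are my own expository devices, not part of the theory's formal apparatus):

    def spell_out_vp(verb, complement, head_initial=True):
        # One underlying [VP verb complement] structure; the binary
        # head-direction parameter decides the surface linear order.
        if head_initial:
            return [verb, complement]   # [+ Head initial]: English-type, as in (14)
        return [complement, verb]       # [- Head initial]: Japanese-type, as in (17)

    print(spell_out_vp("like", "ice-cream", head_initial=True))   # ['like', 'ice-cream']
    print(spell_out_vp("like", "ice-cream", head_initial=False))  # ['ice-cream', 'like']

The point mirrors the text: nothing about the structure [VP V N] itself changes; only the +/- setting of the parameter does.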

The greatest contribution, I feel, is that P&P completely lessens the burden of learnability children face with regard to their language acquisition. Children simply don't 'learn' their language; rather, they (subconsciously) correctly 'set' (over time) the relevant 'parameters'—namely, child language acquisition is a maturation-based passive (not active) enterprise of parameter setting. Rather than talking about the course of 'language learning', most developmental linguists today talk about the course of 'language parameter setting'. The fact that parameters are simple binary choices reduces the myriad of possible choices that otherwise would be made available to the child. There are sound reasons to be suspect of any putative form of active learning of language outside of what parameters would provide. As noted earlier, it may very well be that language

is just that kind of closed, biologically-determined system (as is cell division, or the acquisition and fending off of a virus) which can't be learned, de-learned, or abridged (by statistical counting or otherwise). And so nativists take as their biological null hypothesis the assumption that some maturational scheduling of the innate FL must serve as a surrogate learning mechanism and, in time, deliver the language grammar.

The Critical Period Hypothesis

If language is biologically determined, might there be a closing window of opportunity for such a biological system to manifest a full-fledged grammar? Many think so. In fact, the critical period has been used to help account for the well-known fact that the learning of a second language (during adulthood) seldom seems to progress as smoothly as the acquisition of a first native language (during childhood). But to speak of a critical period is somewhat strange. One doesn't typically speak about critical periods when dealing with 'learned endeavors', i.e., cognitive problem-solving skills, etc. For instance, one doesn't necessarily assume that there is some upper age limit that would prevent a wishful adult from, say, learning how to drive a car, granted there is no disability that would otherwise hamper cognitive learning. Conversely, pre-critical-period child language doesn't seem to follow the typical bell-shape curve3 found in learned activities, which show a statistical bell-curve distribution of mastery for the given activity. It seems that if there is a critical period, it doesn't support any putative culture-bound 'learning of language' per se. Rather, it seems a critical period has more to do with an endowed human gift for 'acquiring a language'—an acquisition that (i) is our free birthright, making up part of our species-specific genetic code (the mental/internal component), that (ii) must be triggered by the natural input (the material/external component), and that (iii) then closes at around puberty, after the acquisition has been fully secured. If there is any concept of learning taking place within language acquisition, it would be with the material/external second component,


though nativists would prefer to use the term parameter setting instead of learning, since parameter setting is considered to be done on a more passive, subconscious level.

3 For discussions on the 'Bell-shape' curve, see Appendix-5, portions copied from the paper: https://www.academia.edu/42204713/Notes_1-2_Reflections_on_Syntax_Note_1_A_Note_on_the_Dual_Mechanism_Model_Language_acquisition_vs._learning_and_the_Bell-shape_curve._Note_2_A_Note_on_Chomskys_2013_Lingua_paper_Problems_of_Projection

One of the more striking distinctions made between nativism and classic behaviorism is that the former assumes parameter-setting and language knowledge thereof to be of an implicit nature (i.e., grammar is considered a form of procedural knowledge we don't normally access on-line), while the latter affirms that knowledge of language is declarative, active, and arrived at by conscious will. Having said this, there seems to be some consensus brewing from both sides of the debate that, minimally, some form of innate a priori knowledge or mechanism is indeed required in order for a child to speculate on the range of possible hypotheses generated by the input. Current arguments today, often termed the Nature of Nurture, may therefore boil down to only the second component cited here—viz., whether 'learning' is taking place or whether 'parameter-setting' more accurately describes the acquisition process. It seems now all but a very few accept the idea that some amount of an innate apparatus must already be realized by design in order to get the ball rolling. So, it is becoming more recognized that the cited first component, which speaks to the mental/internal nature of language, must somehow be given a priori if any feasible theory of language is to be offered—much to the credit of Chomsky and to the chagrin of the early behaviorists of the pre-Chomskyan era.

Future Research and Directions: Where Do We Go From Here?

In addition to core questions as to what forms the basis of our grammar, other peripheral questions regarding the uniqueness of language and the biological basis of language, along with notions of a critical period and brain imaging of language-related tasks, etc., will remain with us for a very long time to come. Ongoing, as we begin to understand the many complexities behind this brain-to-language relation—while keeping up with current pursuits utilizing brain-imaging devices—our continual aim is to sustain the shift in linguistics from being a mere typological, classificatory and historical discipline (a branch of the humanities, though fruitful as it has been in its own right) to being a hard-scientific discipline (on a par with biological studies).

The material as presented in this text squarely comes down on the nativism side of the debate. However, what is important to understand is that both

connectionism and nativism have their own unique roles to play in determining language processing and grammar development—both are to certain degrees correct, depending on what aspects of language one is talking about. For instance, it may very well be that vocabulary learning is associative-driven and sensitive to frequency. It seems, though, that the same arguments cannot be made for syntax, which relies more on a computational algorithm to detect hidden rules of grammar. As will unfold in the following pages and chapters of the text, the debate between associative vs. rule-based systems, or connectionism vs. nativism, will make itself known, so much so that the debate will actually infiltrate all aspects of our discussion of grammar. As a final note, I firmly believe the greatest impact to be made on our future understanding of language and linguistics will be in how we come to partition specific regions of the brain which are responsible for specific language tasks. Our understanding of grammar, viewed in this way, will be informed by our understanding of the brain-to-language relation.

End-Note: Much of the discussion here on Skinner vs. Chomsky draws out and retraces an 'Evolution of Linguistic Theories': viz., that early naïve theories held that language was:

(i) Imitation-based, of the [X] => [X] type, where child language was seen as a mere echo-like repeating process.

(ii) Analogy-based, of the [X [YZ]] => [W [YZ]] type, which eventually overtook imitation-based theories when we began to see that speakers are not at all always conservative to their inputs, but rather may be productive, at least in analogy. In fact, Berko's famous 'wugs test' quickly disproved imitation and, as could be argued, helped to advance the theory of analogy—viz., given the sound-pattern that singular [[ug]] leads to => plural [[ugs]] (i.e., since bug goes to bugs, therefore via analogy wug goes to wugs). Of course, Berko and Chomsky would show that while both imitation and analogy may have their place in language production, both cannot be the sole twin modes of processing, since the two modes are without category-based rules. Chomsky shows that true language/syntax is free from such frequency and rote-learning and rather relies on the ability to use algorithmic computational models of language processing. Another less-known example of analogy comes to us via the historical evolution of 'Wh' words, where the analogy of wh-prefix plus stem is formulated: [wh]-'question' to [th]-'response' (with [at], [en], [ere], [ich] stems) (e.g., What is that? When is then? Where is there? Which is each?):


[wh [at]]   [th [at]]
[wh [en]]   [th [en]]
[wh [ere]]  [th [ere]]
[wh [ich]]  [Ø [each]]
[wh [u]] (who)   [I [Ø]]   [Ø [you]]   [m [i]] (me)*

*Or what we find with accusative-case {m}-marking: [Hi [m]] (he vs. him), [the [m]] (they vs. them), [who [m]] (who vs. whom). (Both imitation and analogy are seen as Skinner-based associative means of language production).

(iii) Computational, of the {γ {α, β}} or x + y = z type, is the final completion in the series of language theories—where language is productive in nature, can be free from frequency effects and rote memorization, and is otherwise abstract/categorical in nature (e.g., N+s = plural; V+s = 3P, Sg, Pres; V+ed = past tense; I = Nominative vs. Me = Accusative Case. Instances of INFLectional morphology are computationally x+y = z based).
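To illustrate the computational (x + y = z) character of inflectional morphology just described, consider the following minimal sketch in Python (illustrative only; the rule table and function name are my own). A category-based rule applies to any stem of the right category, regardless of frequency or prior exposure, and this is precisely what separates rule-based inflection from rote association:

    def inflect(stem, category, feature):
        # Category-based rules: they apply to any stem of the named category,
        # even to novel stems never heard before (cf. Berko's 'wug').
        if category == "N" and feature == "plural":
            return stem + "s"        # N + s = plural
        if category == "V" and feature == "3sg.pres":
            return stem + "s"        # V + s = 3rd person, singular, present
        if category == "V" and feature == "past":
            return stem + "ed"       # V + ed = past tense (regular rule)
        return stem

    print(inflect("book", "N", "plural"))  # books
    print(inflect("wug", "N", "plural"))   # wugs -- a novel stem still inflects
    print(inflect("walk", "V", "past"))    # walked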

2

Preliminary Overview

I can’t think of any other sort of software (the ‘computer-​program’ analogy for the computational design of the human mind) which would require its hardware (the human brain) to establish a meaningful algorithm which makes use of movement at a distance, thus preferring abstract structural closeness over physical adjacent closeness—​whereby an essential aspect of the operating system relies on rules of structure dependency. This preference widely differs with what we find amongst formal, non-​human language designs. Hence, such a selective choice being ‘biology-​driven’ must somehow be recognized as ‘biologically optimal in design’ in order to satisfy displacement properties exclusively found in natural language. This monograph is essentially about such an operating system of language design.

The Fibonacci Code

The very idea that the way humans string words together may have ancestral links to spiral formations found in shellfish is nothing short of stunning. Yet the 'golden ratio' of Fibonacci 1, 1, 2, 3, 5, 8, 13, 21, 34 … etc. … holds for our language design. (If you prefer to read the ratio as a binary rule: [0 = 1], [1 = 0, 1]).


(Merge (add) the first two (adjacent) numbers of the sequence to get the third number … and keep going: 1 + 1 = 2, 2 + 1 = 3, 3 + 2 = 5, 5 + 3 = 8, 8 + 5 = 13 …). From physical-adjacent merge, we get abstract structure:

(0) 'Fibonacci Spiral Formation' (like shellfish, snails). (Figure 6)

Figure 6  Fibonacci Spiral Formation

Here is the rule that explains close to everything ever designed by nature (the Fibonacci code): 0 = 1, 1 = 0, 1. Let's see how it works to get design (just the first two evolutions 1, 1, 2): Tree Diagram: 'bottom-up' design: Merge [0, 1] first, then Move [1] => [1 [0, 1]] …

(1)

Step 1 (Merge): merge/set [0, 1]
Step 2 (Move): move [1] out of the set [0, 1], yielding => [1i [0 1i]]

Notice where movement of 1 has taken place from within a flat structure [ ], giving rise to an embedded structure [[ ]] (where the subscript i shows the index of the moved item). (Note that the essential take-away from this discussion is that in the first generation of the inherent design only flat and recursive 'merge' surfaces, out of which then comes a second-generation displacement of 'move').

David Lightfoot (2006, p. 52) beautifully shows how a simple movement analogy of [[ ]] is both psychologically and indeed physically captured by the following simple illustration, showing the merge/move sequence found in (1) above. Consider the 'is-what' phrase in the following sentence: 'I wonder what that is up there?' The base-generated structure first looks something like I wonder [__ [that [VP is what]]] up there, where the Wh-object 'what' begins as the object/complement of the verb 'is' and then gets displaced by moving above 'that' in the surface phonology (PF), yielding the derived structure. But if we take a closer look, we see that after such movement of 'what' out of the [VP 'is-what'] phrase, the VP survives only as a head [VP is ø] and is without its complement 'what'—thus the phrase 'partially projects'. But partial phrase projections are allowed given that their Heads still remain (in situ) within the constituent phrase; hence, we get the licit structure in (a):

a. I wonder [what_j [that [VP is __j]]] up there?
b. *I wonder [what_j [that's_k [VP __k __j]]] up there?

But movement has an effect: note how the head 'is' must remain phonologically intact as the head of the VP and can't become a clitic attached to the adjacent 'that', as in [that's]. In other words, move-based *[[that]'s] is an illicit structure, as found in (b) (the asterisk * marks ungrammaticality). It seems simultaneous movement of both the head 'is' and its complement 'what' of the [VP is-what] renders the verb phrase vacuous (i.e., phrases can't be both headless and complementless). In this sense, MOVE-based *[[that]'s] is barred and only Merge-based [that][is] is allowed to project (the former being affixal, the latter lexical).

This 'merge vs. move' treatment is similar to what we find regarding so-called 'wanna contractions' ('want to'), where the clitic {na} (a phonological reduction of the infinitive marker {to}) sits in a relation to the trace/empty category {__j} of a moved element. Consider the constraints on 'wanna' contractions below:

a. Who do you 'wanna' help?
   Licit 'wanna': adjacent [want to].
   (Who_j do you want to help __j?) (You do want to help who?) (base word order before movement).

b. *Who do you 'wanna' help you?
   Illicit 'wanna': non-adjacent [want __ to].
   (Who_j do you want __j to help you?) (You do want who to help you?) (base word order before movement).

We note here in (b) that the [want] and [to] (of a potential phonological 'wanna') are actually blocked from contracting by the base-generated 'who' syntactically inserted between the two elements—namely, it's the psychological presence of the empty category/trace [__j] of the moved element [who] which prevents an adjacent phonological reduction to 'wanna'. In current Minimalist Program terms, these are interesting examples showing just how phonology at the Phonological Form (PF)—in this case the pronunciation of the clitics {'s} and {na}—can be affected by underlying syntax at the Logical Form (LF).

So, based on our Merge vs. Move distinction above, there seem to be two fundamentally different ways in which the brain processes information via language design:

So, based on our Merge vs. Move distinction above, there seem to be two fundamentally different ways in which the brain processes information via language design:

(a) Linearly [ ]/Merge: where physical adjacency counts: [] + [] + [] etc., simply adding adjacent objects/lexical items together [x] [y] [z]—where x affects y, and y affects z (a domino effect). For example, in the sentence 'Ben is riding a unicycle', five words sit next to each other.

(b) Non-linearly [[ ]]/Move: where abstraction is formed—whereby two things don't have to sit next to each other: [x [y] z] (where x affects z but not y).

This non-linear processing is indeed very strange (and speaks to the 'brain/mind as quantum physics' analogy). Programming-language designs tend to depend on bits & strings of information that sit next to each other (like the binary code of 0's and 1's for computers). In other words, computer/formal languages of zeros and ones (0, 1) mostly depend on physical adjacency. But human language seems very different. In this sense, the formal language design for human speech is very strange—it may in fact be 'quantum-physics-like'—such that items/words can affect other items/words from a distance. For example, consider the question formation below:

(2)
(i) [Ben] [is] riding his unicycle.
(ii) Question: Is Ben __ riding his unicycle?

In order to turn a declarative statement into a question, you might think (in naïve-theory terms) that all that is involved is to invert the word 'is' which sits next/closest to the word 'Ben': so, [Ben] + [is] inverts to give [Is] [Ben]…? So fine, this is linear. But now look at this sentence:

(3) Ben who is my friend's son is riding his unicycle.

But if you invert the closest/adjacent [is] you get the wrong structure:

(4) *Is Ben who __ my friend's son is riding his unicycle?
(* = ungrammatical: here the closest 'is', the one inside the relative clause, has been wrongly fronted)

The language design based on 'closest adjacency' just doesn't work here: the closest [is] cannot invert. We must rather consider a new structural design of the 'embedded' type, where structure-α is nested within structure-β—something that takes on recursive properties and looks like this: [[ ]]. So, language just can't be based on sequential and linear features of design [ ]. What we now must consider is a new sense of 'closeness'—namely, not closeness in terms of adjacency, but rather closeness in terms of structure. Consider (5) below showing long-distance movement of the auxiliary verb is:

(5) [Ben [who is my friend's son] is riding his unicycle]
[Is [Ben [who is my friend's son] __ riding his unicycle]]? = correct question
(Is Ben who is my friend's son _ riding his unicycle?)

Even though this 'is' in (5) is distant in adjacency, the correct 'is' is closer in terms of nested structure. Here's the structure, just like our tree diagram in (1):

[α Ben [β who is my friend's son β] is riding his unicycle α].

It looks like this [[ ]]…where [α ….] denotes a single constituent structure, with possible embedded structures [α ….[β ….].…]…and not like this [ ]: [α ….β.…].
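This 'closeness in structure' lends itself to a small computational sketch (again our own toy illustration), assuming sentences are encoded as nested lists whose sub-lists are embedded clauses; the auxiliary that fronts is the first is found at the top level of the matrix clause, however distant it is linearly:

# A minimal sketch: question formation targets the auxiliary by structure,
# not by linear adjacency. Embedded clauses are nested lists and are skipped.

def front_aux(clause):
    """Front the matrix auxiliary 'is', ignoring any 'is' inside embedded clauses."""
    for i, item in enumerate(clause):
        if item == "is":                 # only top-level items are inspected
            return ["is"] + clause[:i] + clause[i + 1:]

def flatten(clause):
    for item in clause:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

# [Ben [who is my friend's son] is riding his unicycle]
sentence = ["Ben", ["who", "is", "my", "friend's", "son"],
            "is", "riding", "his", "unicycle"]

print(" ".join(flatten(front_aux(sentence))))
# -> is Ben who is my friend's son riding his unicycle
# The linearly closer 'is' (inside the relative clause) is correctly passed over.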


So, the flat structure [Ben who is my friend's son is riding his unicycle] doesn't seem to fit language design. But recursiveness is strange. In fact, there seems to be very little if any biological pressure at all for movement-based recursive algorithms found in nature. Starkly put, there is simply no antecedent to its nature and origin found anywhere inside our universe, except where we uniquely find it in the minds of humans engaged in language.1 This 'structural closeness'—as opposed to 'adjacency closeness', the latter being what we would expect of any operating system making use of adjacency principles of binary code—is perhaps one of the greatest mysteries of human language.

Recently, the distance traveled with regard to this structural closeness has been captured by neurological activity in the brain (using fMRIs), mapping out where precisely movement occurs and in which lobes. The Broca/Wernicke vs. Frontal/Temporal area/lobe cut remains a classic demarcation in processing, but even more refined analyses of brain-to-language mappings are being proposed which speak to the nature and distance of two types of movement performed by specific linguistic tasks: close movement vs. distant movement. Though the two types of movement tend to fall on the classic Broca/LIFG (Left Inferior Frontal Gyrus) vs. Wernicke/LITG (Left Inferior Temporal Gyrus) cut, today what we find are even more accurate mappings of cortical regions within Broca's area proper—with Brodmann's Area (BA) 44 showing adjacency/closeness effects (perhaps more in line with semantic retrieval processing) and BA 45 showing effects dealing with long-distance movement (in line with syntactic-search processing) (Santi & Grodzinsky 2007). (For continuing discussion, see this Lecture below.)

1 One exception may be what we find with so-called 'entanglement' properties found in the field of Quantum Physics, whereby an item may undergo the effects of another item without a direct or adjacent pathway leading to the physical disturbance.

Overview: Language Design

Based on what's been said, we can tentatively advance a theory of Language Design built around defining aspects of Movement—at least to the degree which Chomsky himself has very recently suggested—namely, that Language, a unique human computational system, is exclusively defined as that system which utilizes recursive operations relying upon movement (a feature of the human computational system which seemingly defies even Darwinian explanation). It is no exaggeration to claim that each stage of development along the continuum of generative paradigms has been somewhat shaped by how movement has been defined at that particular point in time. From the Standard Theory (1965), to the GB and P&P theory of the 1980s, on to the Minimalist Program (MP) of today, it seems any generative theory of the times has essentially been about how best to account for and constrain the intricate processes of movement.

Upon closer introspection of movement-based analogies currently being proposed in the literature, theoretical linguists find themselves asking the sorts of questions which force an essential unraveling of a singular universal principle—viz., that which underwrites the 'species-specific' property of language, namely Movement. Movement analogies seem to take on a theoretical status of their own, carried along (historically) by a portmanteau of claims, and classified accordingly with non-trivial implications to their import and design:



(6) Movement Classification:
(i) Merge
(ii) Move
   a. Local (semantic)
   b. Distant (syntactic)2

We view Merge as bricolage in nature, meaning that it is a (physical) 'lexical builder' at the very local phrase level. Merge, simply put, creates the phrase—a binary-branching operation which is the essential property of language design. The role of Move is then to extend the phrase up the syntactic tree, with movement being motivated by the need to 'check off' and erase formal (abstract) features on the lexicon which have (for some reason) entered into the human computational system (say, at the level of Phonological Form/PF) but which cannot seek semantic interpretation (at the level of Logical Form/LF). So, in a sense, Move creates upward mobility and extends the phrase into higher functional projections. Move seems to be part of a language design which seeks to create recursive embedding and nesting formations—'an exclusive property of human language and that which separates human language from all other communicative systems' (e.g., Hauser, Chomsky, Fitch 2002).

2 This semantic/syntactic cut, sometimes referred to as the 'Duality of Semantics', will be further expanded in our discussions, culminating with the view that there exist two probe-goal relations which trigger movement: one more 'local' (checking off of Case/semantic, light vP) and the other more 'distant' (checking off of AGReement/syntactic, CP).

Move is a displacement property of language which allows an item to be heard in one area of the phonology while being interpreted in another area of syntax. For example, in passive structures [Mary was kissed by [John __ __ ]], the active subject/agent 'John' is dislocated in the surface phonology (PF) from where/how it gets interpreted in the underlying semantic structure (LF) [John kissed Mary]. Other examples of Movement show up within AGReement mechanisms, say, between the verb and the matrix subject regarding number—e.g., John speak-s French, where the AGR-affix {s} is a move-based inflectional reflex of subject/verb morphology (3P/Singular/Present tense). The fact that both subject and verb must enter into an agreement and correspondingly mark for 3rd Person (whereas marking on one of the two items should suffice) shows a high level of redundancy built into the design. The same could be said of number regarding the DP—e.g., two book-s, where the plural marker on the Determiner two alone should have sufficed, without need to also mark plural {s} on the Noun book. Such redundant aspects of language seem to arise out of an optimality of language design motivated by movement. If there were no optimal need for movement, human language could have evolved as an essentially flat [ ] design.

When we look closely at Move, we find that it too has a binarity of design: it either involves a more local constraint (closeness), which we find in examples such as binding, or it is free to displace as (distant) movement in cyclic fashion (step by step) to far reaches up the syntactic tree (e.g., Wh-movement). Local vs. Distant Move may also capture what we know of the Semantic vs. Syntactic Cut (respectively). We'll come to suggest that the properties of language design, as they slowly emerge in child syntax (viz., Item>Merge>Move-local>Move-distant>…), are, to a certain degree, a creative attempt to rehash the notion that ontogeny recapitulates phylogeny (Ernst Haeckel)—the progression of language design being pegged to a maturational development of certain cortical parts of the brain. (There does seem to be a hidden flavor of truth in the notion here.) One could extend Haeckel's analogy by suggesting that any putative 'proto-language' would most certainly have followed, and had remained frozen in, one of the intermediate steps along the same progression as taken by the early child—if for no other reason but that it would be stipulated by the progression of design. (See e.g., Bickerton (1990) for thoughts on Proto-language; also see Appendix-3. But also see Punctuated Equilibrium as argued by Eldredge & Gould (1972), which calls for some aspects of Darwinian evolution to appear in sudden bursts as opposed to gradual onsets.)

In any case, the growth of child syntax, if looked at very myopically, shows a fast-closing window of incremental growth over a small period of time. But if we can slow down the progression just a bit, we might have a glance of slow growth. We believe we have done so here with our data (see Appendix-4)—the case study of a bilingual child's longitudinal English grammar has been 'slowed' just enough by the two emerging grammars.

Now, while this text is not about brain studies per se, nor is it about any attempt to tether emergent child language to proto-language, our main objective is rather that of 'language design'—to draw some light on how current linguistic assumptions within the Chomskyan Minimalist Program (MP) framework might lend an account for what we see emerge in the slow growth of early child syntax. As a case in point, let's turn to one such finding pointing to the aforementioned dichotomy of closeness vs. distance, while keeping an eye on how the underlying theoretical assumptions of the two processes might prove valuable for our discourse in the pages and chapters which follow. Consider the structures below (Santi & Grodzinsky 2007):

(7) John knows that [Maryi pinched herselfi] → Binding: Local

(8) *[Johni knows that [Mary pinched himselfi]] → Binding: *Distant



Here, it seems to be the case that the antecedent Mary must have an adjacent closeness to its reflexive herself (whereas the ungrammaticality of (8) is due to the distance of the antecedent John falling outside the clause and thus breaking adjacent closeness). We can refer to this processing of closeness as similar to what we find regarding the retrieval processing of, say, lexical items—a brain-to-language mapping traditionally assigned to Wernicke's area/Temporal lobe. We will come to call this a local move operation, which may be one step removed from a prosaic merge combination (using MP terminology). In (9) below, we see a very different underlying processing:

(9) John loves the womani [that [David pinched __i ]] → Movement: Distant




(where [__i ] indicates an index/copy of the moved item—in this case, the woman has been displaced from out of the [__ [David pinched the woman]] clause).

When we compare the 'distance travelled' between the two structures, we quickly find that the example in (7) rather mimics what we find of adjacency lexical conditions—viz., that certain words collocate together—e.g., verbs introduce noun phrases forming a Verb Phrase (VP) (eat the cake), or idioms must remain intact (John kicked/*knocked-over the bucket/*pail = died), and very few exceptions allow much space to separate the constituency (in the idiom case, no exceptions are allowed at all). It seems under this condition that when too much space separates the constituency there is often a breakdown in processing (e.g., *John ate very quietly in the middle of the room after dinner the cake). Or, when such a breach of closeness is allowed, it is to compensate for higher-order pragmatics: e.g., John ate __ very quickly the dinner that was prepared for him by Mary seems to be fine, since the dinner can be seen as moving rightward (Heavy NP Shift) in order to pragmatically coordinate with the lower phrase prepared for him (the dinner). To a large degree, and in overly simplistic terms, phrase-structure rules as well as lexical retrieval processing come down to this closeness condition—e.g., [[pushed-him] down], but not [pushed __ [down-him]].

Contrastingly (though still of a systematic nature), the structure we find in (9), as opposed to (7), pays little heed to closeness and rather shows licit long-distance movement (moving over several phases at a time, though presumably in cyclic, stepwise fashion). Let's take a closer look at the two movements: one which is deemed to remain within a single phase (binding, intra-phasal) and one which crosses phasal boundaries (distant/inter-phasal) (Chomsky 2001 defines phases as CP and vP*). The notion of pegging movement to phase will become a pivotal aspect of our developing theory of Merge over Move.

(*The vP light verb situates above the thematic VP as the first potential functional category to host a moved item. The rationale behind VP-shells arises out of a stipulation for an upper vP above VP: e.g., in the ergative sentence 'John [vP rolled [VP-ergative the ball rolled down the hill]]'.)

(10) John knows that [vP Mary pinched herself] → Binding: Local/Intra-phasal

[T' [Tense {ed}] [vP [Spec Mary] [VP [V' [V pinched] [DP herself]]]]]
(= Maryj pinched Maryj)



(11) John loves [the woman that [David pinched __ ]]. → Movement: Distant/Inter-phasal

[CP [Spec the woman] [C' [C that] [T' [T {ed}] [vP [Spec David] [VP [V' [V pinched] [DP the woman]]]]]]]
(= David pinched the woman)

Notice in (10) how the canonical SVO ordering remains intact:

Mary-Subj pinched-V herself-Obj

…but how (11) breaks canonical word order by object raising (OSV_):

The woman-Obj David-Subj pinched-V __

It seems that binding is captured by a local processing operation and that syntactic movement is captured by long-distance processing. If so, we might expect to draw some parallels with other aspects of syntax which show similar structural distinctions. For instance, we will consider that the Probe-Goal relation (using MP terminology) for the checking-off of features will follow a dual-mechanism route:

(i) Probe-Goal relations of a case/thematic/semantic nature secured by local movement (e.g., vP—handling Case & argument structure), and
(ii) Probe-Goal relations of a syntactic nature secured by distant movement (e.g., CP—handling expression structure).

In addition to these two important phrases (known as phases, since material from each phase must be independently transferred to PF and LF interpretation), we will consider how the two movements work together in forming a cohesive syntactic structure. Any putative lack of distant movement at early stages of child syntax must surely impact the speech of a child. (A toy sketch of this intra- vs. inter-phasal cut follows below.)
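As a first pass, the cut can be visualized with a toy sketch (our own encoding, not standard MP machinery), assuming phases are simply the node labels vP and CP; a dependency is then classified by how many phase nodes separate the two related positions:

# A toy sketch: classify a dependency as intra- vs. inter-phasal by counting
# phase nodes (vP, CP; Chomsky 2001) on the path between two positions.

PHASES = {"vP", "CP"}

def path_to(tree, target, path=()):
    """Sequence of node labels from the root down to a target leaf."""
    label, children = tree
    path = path + (label,)
    for child in children:
        if child == target:
            return path
        if isinstance(child, tuple):
            found = path_to(child, target, path)
            if found:
                return found
    return None

def phases_between(tree, a, b):
    pa, pb = path_to(tree, a), path_to(tree, b)
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    # phases crossed = phase nodes on the non-shared parts of the two paths
    return sum(1 for lab in pa[shared:] + pb[shared:] if lab in PHASES)

# (7) John knows that [Mary pinched herself]: antecedent and reflexive are clause-mates
binding = ("CP", [("TP", ["John", ("VP", ["knows",
          ("CP", ["that", ("vP", ["Mary", ("VP", ["pinched", "herself"])])])])])])
print(phases_between(binding, "Mary", "herself"))      # -> 0  (local/intra-phasal)

# (9) John loves [the woman that [David pinched __]]: the filler-gap relation crosses phases
movement = ("CP", [("TP", ["John", ("VP", ["loves", ("DP", ["the woman",
           ("CP", ["that", ("vP", ["David", ("VP", ["pinched", "__gap"])])])])])])])
print(phases_between(movement, "the woman", "__gap"))  # -> 2  (distant/inter-phasal)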

In sum, the nature of these twin types of movement represents the classic Duality of Semantics—namely, the separation between lexical and functional heads. We assume throughout (following e.g., Miyagawa's (2010) assessment) the following demarcation of 'heads' (or, using more recent MP terminology, 'probes'):

a. Lexical heads/probes deal with the semantic-thematic/argument structure of language (and are configured via local merge operations), while
b. Functional heads/probes deal with the syntactic expressive structure.

Case and Agreement

We further assume (following much consensus within the MP) that the quasi-lexical projection light verb vP assigns a probe-goal relation whereby Case gets assigned (e.g., as we see inherent Case being lexically assigned), while the quintessential functional projection CP assigns Agreement. Hence, as part of our developing story of 'Merge over Move', we find a similar overlap regarding the duality of semantics: with Merge catering more to the lexical/semantic side and Move catering more to the functional/syntactic side of the demarcation. The goal of this monograph is to present a hypothesis regarding the growth of syntax along the lines of this classic split found within the duality of semantics. Having said this, we'll come to consider the following sequence and family of movement (below):

Family of Movement
a. basic merge sequences (pulling two items from out of the lexicon), then
b. local move (forming a semantic/argument-structure hierarchical phrase), then finally
c. distant move (forming syntactic agreement relations).

The distinction between (b) and (c) is tantamount to what was traditionally called the lexical vs. functional categorical split (respectively), with (a) being the bricolage building blocks of phrase formation. In addition to these three points, we will also assume that it is Agreement (CP), the quintessential motor driving functional categories, which motivates inter-phasal movement up the syntactic tree (with Tense (TP) being adjunct in nature and Case being derived via vP). What we hope to gain from the general theory as laid out here is that there is indeed reason to assume a brain-to-language mapping, and that such language-and-movement mapping follows a pegged maturational development of the brain. If Movement is so critical here, a theory will have to be devised which defines the motivation for it. The simple answer will be that abstract functional features are what drive movement (up the syntactic tree), where a substantive lexical head category is selected by its more formal functional head counterpart—viz., where C selects T and where v selects V. But the question still remains regarding the nature of the movement: What is the nature of the features, and do they drive local vs. distant, or inter- vs. intra-phasal movement, as discussed above regarding binding vs. movement? So, a two-prong analysis of movement may be in the offing. Let's take a peek below at how we might understand a developing theory on movement, utilizing a proposed 'brain-to-language mapping'.

Developing Theory on Movement

Keeping to our syntactic footing incorporating recent insights made in the framework of the Minimalist Program, on the topic of 'Movement & Child Language', this text explores the notion that what very early children's grammars lack—say, at the very early onset of the two/three-word lexical-category stage of development—is that ability which is arguably the most fundamental aspect known to all human adult languages—namely, early grammars lack the ability of syntactic movement. Our central theme is thus two-prong:

(i) That the earliest phases of child language development (what we will come to call a 'merge phase') indeed show a lack of syntactic movement, and that,
(ii) This lack of movement is attributed to the neurological underdevelopment of Broca's area in the brain, a maturation-sensitive area of the front-left hemisphere which seems to be largely responsible for syntactic movement.

The 'Broca-to-Syntactic Movement' (BSM) corollary has come about over the past two decades as a result of research undertaken not only in relation to Broca's aphasia (e.g., Grodzinsky), but also in relation to what had been known (for a very long time) regarding the lack of more abstract levels of syntax found in early child speech—viz., passive movement and inflectional morphology—the results of which formed the basis of much of the classic child language research undertaken during the latter part of the last century. For example, Carol Chomsky in the 1960s–1970s showed that very young children have problems handling passive-to-active movement, to the degree that, upon hearing 'Mary was kissed by John' and being asked to determine who was doing the kissing, the children were primed to select the closest adjacent nominal—i.e., they would say 'Mary' was doing the kissing (and not John). What this showed was that young children at this maturational point in their development were yet to employ distant syntactic-movement operations such that John (the object of the passive) could in fact be syntactically moved and thus reinterpreted as the active person doing the kissing. The correct interpretation of 'John' relies heavily on distant syntactic movement for such restructuring. On the other hand, if asked upon hearing 'The ball was kicked by the boy' who was doing the kicking, they would be able to fall back on their more robust thematic/semantic knowledge to get at the right interpretation—since the pragmatics/semantics of 'balls' prevents the erroneous agent reading, viz., 'balls' can't kick people.

Binding vs. Movement: Broca's Aphasia

Much of Grodzinsky's research looking into Broca's aphasia showed a deep-seated bias towards shallow Working Memory (WM), directed to canonical order and adjacency and/or semantic readings, over deep working memory directed toward long-distance, non-canonical and/or syntactic readings. For example—citing the same data as presented in examples (7–9) above—when Broca's area is damaged, binding vs. movement operations show distinct processing characteristics, with long-distance movement placing more demands on WM and thus being more prone to processing deficits. It turns out that binding relations, such as the antecedent relation of object 'herself' to subject 'Mary' as found in (12), follow the canonical ordering of Subj-V-Obj, and that such SVO canonical-ordering interpretations become preferred over Movement relations in which canonical orders of SVO are violated (13) (where only chance comprehension is reported) (Santi & Grodzinsky 2007):

      S     V     O
(12) John knows [Mary pinched herself].  (binding)
(13) John loves [the woman [David pinched __ ]]  (movement)

We notice how binding requires the antecedent Mary to be structurally close to its reflexive herself:

(14) John knows [Maryi pinched herselfi]  (Binding requires adjacency.)
(15) *[Johni knows [Mary pinched himselfi]]  (Binding rejects distant move.)

Relative canonical word order:

(16) The mani [who [the lion chased __i]] is dead. (long-distance move)
(17) The mani [who [__i chased the lion]] is dead. (short-distance move)

In such readings (determining who is the agent doing the chasing), Broca aphasics, as well as young children, have trouble comprehending the agent/​actor of the relative clause but have even more difficulty when there is non-​canonical word order of OSV which requires long distance movement.
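This canonical-order bias can be caricatured with a shallow 'agent-first' heuristic; the toy sketch below (our own illustration, not a model from the aphasia literature) takes the first nominal in the string to be the agent of the first verb, succeeding on the subject relative (17) but failing on the object relative (16), exactly where long-distance movement disrupts canonical order:

# A toy 'agent-first' heuristic: the first nominal in linear order is guessed
# to be the agent of the first verb, mimicking the shallow canonical-SVO bias.

def shallow_agent(sentence, nominals, verbs):
    words = sentence.split()
    agent = next(w for w in words if w in nominals)
    verb = next(w for w in words if w in verbs)
    return agent, verb

nominals, verbs = {"man", "lion"}, {"chased"}

# (17) The man [who [__ chased the lion]] is dead. (subject relative)
print(shallow_agent("the man who chased the lion is dead", nominals, verbs))
# -> ('man', 'chased'): correct, the man is doing the chasing

# (16) The man [who [the lion chased __]] is dead. (object relative)
print(shallow_agent("the man who the lion chased is dead", nominals, verbs))
# -> ('man', 'chased'): wrong, here it is the lion doing the chasing;
#    the shallow strategy breaks down just where distant movement applies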

Compound Modifier Relations

A similar kind of Broca-based movement is employed with compound readings, which require a dominant word that has risen to construct phrase-level hierarchy and label the phrase. For example, Tom Roeper has looked into how early children might comprehend ambiguous compounds such as 'house-boat' vs. 'boat-house'. For instance, if boat house ↔ house boat is initially processed in the early stages of acquisition as a mere sisterhood merge [x, y] sequence (thus with no fixed word order), then we might find some evidence indicating young children's inability to establish left-head [Adj [N]] modification—such that if asked to show where the 'house-boat' is (upon being shown pictures of a 'house-boat' and a 'boat-house'), children may only be able to pick out the correct picture, say, half the time. (See Roeper 2007 for work and inquiries along these lines.) If this line of reasoning is correct, there might be a default sisterhood conjunction reading of such compounds (pers. comm. Tom Roeper). What we can say is that UG provides for the bottom-up bricolage structure of first unlabeled Merge formations—unlabeled in that there is yet to be hierarchical structure leading to a Lexical/Head-to-Word/Phrase relation which renders word order; merge in the sense that merely two lexical items attach [X+Y]. Merge is a stage prior to what we minimally find for the simplest of labeled adjectival readings, e.g., [redi [redi, car]], or with labeled compounding phrases such as we find in [boati [house, boati]], which activate minimal local movement as shown below:

'Red car'
(18) [car, red] => 'merge' two sisters: conjunction reading.
     [redi [redi, car]] => 'move' red out of merge to higher phrase.
     => a 'move-based' product: 'a kind of car' (AdjP).

'Boat house'
(19) [house, boat] => 'merge' two sisters: conjunction reading.
     [boati [house, boati]] => 'move' boat out of merge to higher phrase.
     => a 'move-based' product: 'a kind of house' (Adj-compound).
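The two steps in (18)–(19) can be mimicked computationally; in the minimal sketch below (our own encoding), bare merge returns an unordered set, which is why 'house boat' and 'boat house' are indistinguishable under a pure conjunction reading, while move copies the modifier out to the left edge, fixing both order and the 'a kind of X' interpretation:

# A minimal sketch: bare merge is unordered (conjunction reading); move raises
# the modifier out of the merged pair, fixing word order and modification.

def merge(x, y):
    """Bare merge: an unordered {x, y} set; no head, no fixed order."""
    return frozenset([x, y])

def move(modifier, merged):
    """Copy the modifier to the left edge: [mod_i [..., mod_i]],
    read as 'a kind of <remaining member>'."""
    (head,) = merged - {modifier}
    return (modifier, head)

pair = merge("house", "boat")
print(pair == merge("boat", "house"))   # True: merge alone fixes no order

print(move("boat", pair))    # ('boat', 'house') = 'boat house', a kind of house
print(move("house", pair))   # ('house', 'boat') = 'house boat', a kind of boat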



Active to Passives and Embedding

Classic data reveal that it's not only with Broca damage that such movement-based analogies show deficits, but also with very young children, where above-chance vs. chance interpretations are based on movement and distance traveled—with 'above chance' readings being pegged to the matrix-clause canonical (S)-SVO local order versus (S)-OVS distant order:

(20)
a. The boy who [__ pushed the girl] is tall.  (Above-chance interpretation)
b. The boy who [the girl pushed __ ] is tall.  (Chance interpretation)
c. The girl admired the boy.  (Above-chance interpretation)
d. The girl was admired by the boy.  (Chance interpretation)

INFLection: IP

Once the environment for movement is created—viz., a head is targeted along with some feature-specificity which motivates movement up the syntactic tree into higher functional positions (typically the IP above VP)—the relevant higher positions must project from the syntactic tree in order to host the moved element. In the cases above, structures such as the genitive cup of coffee and the genitive-derived adjectival coffee cup both require a higher clitic position (typically IP) as a result of a movement operation from the base merge [cup, coffee]. (See Lecture 4, [16] 'Steps of Move-based derivations'.) Adult (hierarchical) English uniquely allows for both a move-2 compliant structure, e.g., wine bottle (= predicative adjective), and a move-1 compliant structure, bottle of wine (= genitive), but not bottle wine. Any 'non-compliance of movement' would then account not only for our attested child word-order deviance of the type cup coffee found in our data, but would also allow us to account for the wide array of mixed word order found amongst early SV, VO 'single-argument strings' (where only merge is said to apply), with late-acquired 'double-argument strings' thus targeting a position created by move and triggering correct SVO word order. As will be shown, if there is non-compliance of distant movement, then not only can the attested stage-1 child word-order deviance of the type cup-coffee found in the data be accounted for, but also inflectional deficiencies. Other more ubiquitous examples in the data come from IP-based movement analogies whereby nominal/verbal inflection is seen as a result of movement—examples such as: Tom's book [IP Tom [I 's] book], drink-s milk [IP drink [I {s}] milk] (Kayne 1994).

Thus, a child goes from projecting flat merge operations of:

(i) Tom book. Daddy car. Mommy sock.
(ii) He (Him) drink. Boy play ball. Mommy cook pasta.

[-'s [Tom book]], He [-s [drink]] (before movement)  (Merge-only stage)

to:

Tom's book, He drink-s (after movement).  (Move stage)

Child language data—showing a bricolage stage-1 'merge-only stage' vs. a stage-2 'movement stage'—bear this progression out. Proposed theoretical models show how the delay follows from a protracted development in which 'Local/Merge' operations emerge in the child's grammar slightly ahead of 'Distant/Move'—a Merge-first over Move-later account of syntactic development. In conclusion, most developmental linguists come to view this latter form of rule-based/move operation as being generated by an emergent and innate neuro-treelet template which renders language a protracted development spread across linguistic stages, as stages are pegged to brain maturation—the so-called 'brain-to-language' corollary.

3

The ‘Four Sentences’

Ever since the initial conception of the 'generative' enterprise (GE), begun in the latter part of the last century (Chomsky 1955), two central components of the framework have exceedingly stood out, guiding in a principled way how linguistic science should go about handling any investigative study leading to descriptive and explanatory adequacies. Both of these components would ultimately have to be underwritten by a universal faculty of language (FL). The two components are: (i) how to describe the rich observable complexity of a final state of a given language (L), where L (as an object) is an external language (E-Language (E-L)) (e.g., the surface description, complexities, and behaviors of, say, English, French, Japanese, etc.), and (ii) how to reduce the final state of E-L in terms of an outgrowth and design of an initial, universal, and internally-specified internal state of FL (an FL that, by definition, must be human-species specific, biologically determined and maturational)—viz., an FL which is defined as doubly dissociated from other non-linguistic/cognitive problem-solving capacities, and rather seated as a specific module of the human brain.

It is in this latter sense that the counterpart of an 'external object' E-L arises—namely, an internal language (I-L), as an artifact of universal grammar (UG). Chomsky has increasingly emphasized the distinction—viz., that we have moved away from the 19th/early-20th-century Darwinian classification of E-language as part of an external list of language families, and have rather moved towards a new definition calling for any specific language, say French, to be an outward manifestation of an internal state of the mind. In other words, the language 'French' is a particular physiological/mental instantiation of a state of mind—viz., 'a "French" state of mind'. I-Language thus is strictly internal and can be the object of study in and of itself, independent of all other non-linguistic factors which, priorly assumed, went into the building of naïve theories about 'speaking French'. For instance, a classic 19th/early-20th-century linguistics definition might have sounded something like the following: 'Language (French) is the mere sum of words and images of a given speaking community (France)—linguistics is the study of word patterns and how a speaker of the community might use such patterns to express ideas, emotions and desires by means of symbols connected to sound' (Edward Sapir).

Counter to these naïve assumptions, what UG attempts to reveal is that all E-languages share core properties (referred to as principles of conceptual-intentional (CI)), whereby the only distinctions which surface between two given languages (say, English vs. French) amount to little more than sensorimotor (SM) artifacts: cosmetic, superficial differences, perhaps as a result of the arbitrary nature of the classic 'sound-to-meaning' corollary (de Saussure), and other peripheral operations which deal with the phonological form (PF) which derives word order. The terms UG and FL are synonymous in this way. In other words, what GE sets out to do is to establish a set of universal constraints that all languages share (as a product of language design and the brain), and to investigate how such constraints might map their way onto language.

The very first theoretical notions of GE would have to challenge prior assumptions that language was simply, at best, a product of pattern-seeking induction and/or analogy, or, at worst, a mere by-product of problem-solving skills attributed to cognitive learning—both proposals (major orthodoxies of the day) would be seen as attributing no special value to language other than the aforementioned general learning procedures, and clearly nothing in their prescribed theories would ever vaguely be suggestive of language specificity. As it turns out, both naïve views of language remain with us today, perhaps even strengthened by (broken) promises related to artificial intelligence (AI). But little has come of such enterprises. Proponents of AI are still holding out.1

1 See web-​link no. 1. See end of references section for all web-​link references.

With GE, questions were now refined in new and interesting ways which begged of explanatory adequacy; attempts at explanation could no longer be dismissed as cognitive problem-solving or pattern-seeking formations. GE placed such new rigor on descriptive and explanatory adequacy that an entirely new framework had to be devised—viz., old behaviorist frameworks which heavily relied on frequency effects and/or analogy could no longer keep up with the articulated data. The age of pegging language to some internal module of the brain—some sort of underwriting which governed and licensed language and the development of language—would become the new consensus, marking a shift from language as general problem-solving skills (based on association, frequency effects, and pattern-seeking analogy) to a language-specific and biologically endowed processing, an outgrowth of a universal grammar (UG). Below, using our 'four sentences' as a theme to this text—a kind of pedagogical device—we begin to sketch out our proposed omnibus tour through the decades of GE:

[0] The Four Sentences (see Lecture 5 for discussion of the 'Four Sentences').

1. Can eagles that fly swim? (1950s)
2. Him falled me down. (1960s)
3. The horse raced past the barn fell. (1970s)
   3.i. [The boy Bill asked to speak to *Mary thinks he is smart]. = flat structure
      a. …*[Mary thinks] (Interpretation (a) is a wrong analysis, not often assumed by native English speakers.)
   3.ii. [The boy [Bill asked to speak to Mary] thinks he is smart]. = recursive structure
      b. …[The boy […] thinks] (This is the correct analysis.)

(Note: example [1.3] seems to prime for adjacency, whereby 'the horse' primes as subject for the finite verb 'raced', creating the garden-path structure; [1.3.i] does NOT prime us to parse 'Mary' as the one doing the 'thinking', hence it is not a garden-path. This is of interest to us in terms of where in the point of sentence formation the garden-path takes place.)

4. I wonder what that [is __ ] up there. (1980s)
   i. that [is what]…
   ii. *that's [is what]… ('that's' is an illicit structure here: *I wonder what that's up there?)

5. A fifth, 1990s treatment (to extend the analogy) would most certainly present the boom in research undertaken in that decade regarding language and brain-imaging studies (chief among them the many varied fMRI studies dealing with lexical storage and retrieval, Broca's-area processing regarding Broca's aphasia subjects, Event-Related Potential (ERP) studies dealing with priming effects, etc.). (See web-link no. 2, Cathy Price.)

(We'll wait for subsequent sections and chapters to fully flesh out the proper analyses posed by these classic 'four sentences'.)

The Minimalist Program

The following discussions are largely thematic summaries & exegeses on the linguistic topic of 'Move' and the fundamental role movement operations play in defining what constitutes a core property of language. These four token sentences—along with the accompanying material which makes up the core of this monograph—represent particular instances of processing: they are 'pieces of structure of language' which can serve metaphorically as a roadmap to uncovering the core properties of language, what we'll come to regard as a 'singular property'—the kind of property which has nontrivial implications for emergent factors leading to child language syntax.

In addition, the 'Four Sentences' could be viewed as a sort of omnibus tour of Chomskyan linguistics over the decades, along with Chomskyan turn-of-a-screw spin-off sub-disciplines, which, in turn, could likewise be chronicled as they made their way onto the Chomskyan scene—starting with Sentence-1 (a first incarnation of the Generative Grammar enterprise, mid-1950s):

(i) 1950s: Sentence-1 => the first incarnation of generative grammar. (See Syntactic Structures 1957, p. 22: Chomsky uses embedded structures as an example to show that we need more than linear-order properties of the 'beads-on-a-string' sort to process language.)

Chomsky argues that natural languages are much more than 'word-to-word' models which give rise to finite-state grammars (of the type S => aS1, S => bS2, S => cS3),2 but rather natural language goes beyond a finite-state grammar by generating interdependent words, phrases, and constituencies where a dependency exists between non-local configurations. For example, Chomsky shows how the sentence [S1 The man who said that [S2 …] is arriving today] must generate a dependency that allows a parsing of the type: ['The man [who said that "linguistics should not be regarded as a science by our students"] is arriving today']. Such a Sentence (S) within another S contains embedded dependencies such that the subject of S1, 'the man', sits in a structurally-close (though not adjacency-close) configuration with the verb 'is', blocking any parsing which would otherwise erroneously map the adjacent Noun 'students' of the embedded [S2] clause onto the next adjacent verb 'is'.3

(ii) 1960s: Sentence-2 => leads to research undertaken on child language in the 1960s.
(iii) 1970s: Sentence-3 => leads through to the fields of psycholinguistics of the 1970s.
(iv) 1980s: Sentence-4 => leads towards refinement of the theory in the 1980s.
*(v) 1990s: Neuro-linguistics => begins remarks on the brain-to-language corollary (e.g., remarks on Grodzinsky). We reach current discussions of a brain-to-language corollary marked by research in the two decades 1990–2010 regarding brain-imaging devices such as fMRIs and ERPs, along with the accompanying proper bio-linguistic pursuits of language and the brain, the putative language gene.

*(Extending the Chomskyan omnibus tour into the 1990s, what we would tour would be the many neuro-linguistic discoveries made in the span of this decade.)

[1] The Minimalist Program's (MP) (Chomsky 1995)4 main objective is to seek out core properties of language, separate from other, say, general cognitive (problem-solving) systems, and to ask if these properties are 'exclusive to' and 'necessary for' language—namely, to ask if they reflect conceptual necessity.

2 See Appendix-3 on 'Lack of Recursion Found in Proto-language'.
3 Chomsky 1957, p. 22: It is clear, then, that in English we can find a sequence a + S1 + b, where there is a dependency between a and b, and we can select as S1 another sequence containing c + S2 + d, etc.
4 See web-link no. 3. Review article.

If not, the MP seeks to do away with them, linguistically—since, as it often turns out in the sciences, it is the most elegant and simplest theory that wins out over the more elaborate and convoluted (a principle of Occam's razor). Hence, as the name suggests, MP looks to reduce language's architecture (language defined as a unique, human computational system) to its bare minimum design. The question of a 'minimal design of language' specifically arises in the context of investigating how the child (ontogeny), as well as our species (phylogeny), comes/came to acquire language.

In other words, on the phylogeny side of the equation, MP is really about minimizing the properties of language to a very narrow subset, or even, perhaps, to a singular feature, so that we might account for the spawned cascading effect which can explain language evolution. (Some evolutionary linguists are now suggesting a timeline for language emergence somewhere between 40–60 thousand years ago—it was a long wait, some six million years after the 'Pongid-Hominid' split.) On the ontogeny side, the aim of the MP is to discover the nature of the maturational and developmental onsets of these unique and necessary properties which then lead to a 'staged acquisition' of language.

[2] What MP takes as its point of departure is that (spoken) language (= a given string or utterance as generated by a grammar, what we call a syntactic operation) must satisfy a twin interface {~} condition in order for the string to be conceptual. In other words, starting with the first step of syntax (as a generating device which merges strings of words together), in order for language to be productive as a full interpretation system, language must satisfy two other interfaces: (i) a Thought component, which we call LF (Logical Form), and (ii) a Phonological component, which we call PF (Phonological Form).

[3] To a large degree, this is common-sense linguistic thinking: of course, language must be mentally formulated first as some form of 'structure' (syntax), then the 'structure' must become 'articulated' in the mouth (phonology), and finally, the 'articulated structure' must make 'logical sense' (logic).

Figure 7 Interface Systems (where {~} marks interface):

Lexicon > Syntax (S) → (PF) [Phonological Form]: PF {~} 'Speech'
                     → (LF) [Logical Form]: LF {~} 'Thought'

Full interpretation/Interface systems (Figure 7)

[4] In addition to these two interface systems (what is often termed spell-out at PF/LF), when considering the subset of universal operations found amongst all languages (Merge, Move, Agreement), a question naturally arises as to whether such operations also reflect a singular core property, a property unique to human language. A single core property, if one exists, is exactly what one would be looking for in terms of the human evolution of language, since any simultaneous evolutionary emergence of a complex arrangement of otherwise quite unique features, randomly selected, would be hard to come by.

In theory, what bio/evolutionary linguists are looking for is a single feature, property, gene (or bundle of genes) which came online almost simultaneously in our human genome, creating a cascading effect of supporting neuron structure of the type which could only support what we find in human language. In terms of biology, one would expect a potential gene, or bundle of genes, responsible for creating the types of mental processing which could support a human computational system allowing for Recursive Structure (recursive: the ability to form structures within structures …[[ ]]…, which can contain more than one instance of a given category). (See web-link no. 4.) The FOXP2 gene has been looked at in this way. (See web-link no. 5.)

For example, regarding recursiveness, what we would need as a starting point (for our lexicon/syntax) would be an operation, say Merge, whereby an item (word) is selected from out of a lexicon (vocabulary/list of words) and then is merged with a second item to form an entirely new category—e.g., Determiner [the] merges with Noun [tree] to form a new category, the Determiner Phrase [DP the tree]; then, say, (bottom-up) the DP merges with a Preposition [under] to form a second entirely new category within a prior category, such as we see with the Prepositional Phrase [PP under [DP the tree]]. This type of merge, when it looks like …[[ ]]…, is said to be unbound.
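The bottom-up derivation just described can be sketched as a recursive pair-former (a minimal illustration of ours, with category labels supplied by hand for readability): each application of merge takes exactly two objects and yields one new constituent, and feeding merge its own output is what delivers the unbounded …[[ ]]… recursion:

# A minimal sketch of binary Merge: two objects in, one labeled constituent out.
# Re-applying merge to its own output yields unbounded recursive embedding.

def merge(label, x, y):
    return (label, x, y)

def bracket(node):
    """Render a merged object in the bracketed notation used in the text."""
    if isinstance(node, tuple):
        label, x, y = node
        return f"[{label} {bracket(x)} {bracket(y)}]"
    return node

dp = merge("DP", "the", "tree")      # Determiner + Noun -> new category DP
pp = merge("PP", "under", dp)        # Preposition + DP  -> new category PP

print(bracket(dp))   # [DP the tree]
print(bracket(pp))   # [PP under [DP the tree]]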

[5] Merge & Binary Branching

Consider one of the universal principles of language, which stipulates that all Merge operations must be binary-branching. Let's tease out what such a binary linguistic structure means. Imagine a possible response to the question: What are you doing? Now, there are two ways to answer this. One is to respond with the complete expression I am going to sleep, which is a declarative sentence that carries (Finite) Tense; therefore, we label it as a Tense Phrase (TP). A second way to respond has to do with a partial expression—namely, by removal of some already stated material (otherwise known as Ellipsis), as in the response going to sleep (= I am going to sleep). But notice how one cannot say, as an elliptical structure, *am going to sleep (where [*] marks ungrammaticality).

'What are you doing?'
i. I am going to sleep.
ii. *__ am going to sleep. (*ungrammatical)
iii. ______ going to sleep.

Why might this be? In other words, what are the constraints on ellipsis, and are these constraints universal? As it turns out, Phrases must be formed via Merge of a binary nature (never ternary, in threes). In this way, a Phrase (P) must have an infrastructure which projects the features of the Head (X) category of the XP—where three subparts of a given P must project: the Head of P, the Comp(lement) of the Head, and any other outside material which becomes subsumed under H, what we call its Spec(ifier). This is known in X-bar Theory as a Spec>Head>Comp configuration. (See [7, 11] below.)

[6] Now, let's take the question once again—'What are you doing?' [5]—and see how XP stipulates and constrains our responses.

'I am going to sleep'

a. [XP Spec [X' Head Comp]]
From the universal configuration in (a), we get the TP found in (b). Notice how the binary branching of (b) differs from the ternary branching of (c).

b. [TP [Subj I] [T' [T am] [VP going to sleep]]]
(i) TP: I am going to sleep. (Max-projection of TP)
*(ii) T' (T-bar): am going to sleep. (Intermediate projection of T')
(iii) VP: going to sleep. (Max-projection of VP)

c. *[TP [Subj I] [T am] [VP going to sleep]] (= flat, ternary branching)

[7] Now, why are the above structure-types important? The interesting question here is what exactly constrains elliptical expressions of the type addressed above. If we can respond both ways to the question What are you doing?—either as (i) I am going to sleep (the full TP expression) or as (iii) going to sleep (the full VP expression), but never as *(ii) __ am going to sleep (the partial T-bar expression)—then the question arises as to why (ii) should be an illicit structure. Well, X-bar Theory, along with the binary-branching requirement, may provide an account. It seems that what we have here are levels of P: a Maximal projection TP, an Intermediate projection T', and a Minimal projection T. (The VP would constitute its own max projection.)

So, now an account can be made that ellipsis follows a principle which says that only Max-projections can be expressed, never Intermediate or Minimal projections. Hence, this rules out the T-bar (intermediate projection) *am going to sleep, while Max projection TP I am going to sleep or Max projection VP going to sleep is allowed as a response to What are you doing? Note that in (6c) above, the flat ternary structure would have no way to articulate such a distinct three-level projection and would predict that all possible responses could be freely, indiscriminately given (which is not the case). One of the first analyses which comes to us here is that language structure is not flat, but rather recursive. This first-order analysis comes to us by way of consideration of movement as well as its (many) constraints. (A toy mechanical check of this 'Max-projections only' generalization is sketched below.)

[7.1] X-bar Theory & C-Command. So, in summary, Merge requires binary (recursive) branching: the inner template which scaffolds language is recursive in nature (not flat), binary-branching (never ternary):

X-bar:
a. [XP Spec [X' Head Comp]] (= recursive)
b. *[XP Spec Head Comp] (= flat)
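Returning to the ellipsis facts in [6]–[7]: the 'Max-projections only' generalization can be checked mechanically with a toy encoding (our own, for illustration), testing each labeled constituent of [TP I [T' am [VP going to sleep]]] for whether its label is a maximal (XP-level) projection:

# A toy check of the ellipsis generalization: only maximal projections (XP)
# may stand alone as elliptical answers; intermediate (X') projections may not.

tree = ("TP", "I", ("T'", "am", ("VP", "going to sleep")))

def constituents(node):
    if isinstance(node, tuple):
        yield node
        for child in node[1:]:
            yield from constituents(child)

def words(node):
    if isinstance(node, tuple):
        return " ".join(words(child) for child in node[1:])
    return node

def is_max(label):
    return label.endswith("P")    # XP = maximal; X' = intermediate

for node in constituents(tree):
    verdict = "licit ellipsis" if is_max(node[0]) else "*illicit"
    print(f"{node[0]:3} '{words(node)}' -> {verdict}")

# TP  'I am going to sleep' -> licit ellipsis
# T'  'am going to sleep'   -> *illicit
# VP  'going to sleep'      -> licit ellipsis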




[7.2] X-bar theory forces the two-fold stipulation:

(i) That lexical items insert into a larger structure, which is binary in nature:
  a. There are Max-projections (the highest constituency),
  b. There are Intermediate (X-bar) and Minimal projections (Heads).

(ii) That such structure renders a hierarchical domain, whereby one 'mother' item higher up in the syntactic tree may hold scope over its lower 'daughter' constituent (within a Max-projection).

The structure in [7.1a] above is a Max-projection, whereas the structure in (b) is flat and thus holds no hierarchical X-bar projection.

[7.3] C-command allows the above X-bar structure to hold unidirectional relationships, such as mother of x, daughter of x, sister of x, etc. Consider the c-command structural relation below (also see [12] below). In short, what we can say about c-command is that it is a hierarchical configuration which accounts for the relationship between two or more constituents. Constituent x is said to c-command constituent y if x is no lower than y in the structure (if x is above y). Within a Specifier-Head configuration, we can say that the Spec c-commands its Head in a Spec-Head relation:

[XP Spec [X' Head (Comp)]]   (Spec c-commands X', and hence the Head)

In the above structure, the Spec c-commands its Head. The Head doesn't c-command Spec (hence a left-to-rightward relation is imposed).

[A [B] [E [F] [G]]]   (A is the mother of B and E; E is the mother of F and G)

We can illustrate these unidirectional relationships, such as Mother of B, Daughter of A, Sister of F, etc., by seeing how one constituent might be contained within the domain of another. (See [12] below for a fuller description of relations.)
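Under one common formulation (x c-commands y iff y is, or is contained in, the sister of x), the relation can be computed directly over a nested encoding of the simplified A/B/E/F/G schematic above; the helpers below are our own sketch, not a piece of the formal theory:

# A minimal sketch of c-command over a binary-branching tree:
# x c-commands y iff y is (contained in) the sister of x.

tree = ("A", "B", ("E", "F", "G"))   # A is mother of B, E; E is mother of F, G

def contains(node, x):
    """Does this (sub)tree dominate or equal x?"""
    if node == x:
        return True
    return isinstance(node, tuple) and any(contains(c, x) for c in node[1:])

def mother_and_sister(node, x):
    """Locate x's mother node and the sister subtree of x."""
    if isinstance(node, tuple):
        label, left, right = node
        if left == x:
            return node, right
        if right == x:
            return node, left
        return mother_and_sister(left, x) or mother_and_sister(right, x)
    return None

def c_commands(tree, x, y):
    found = mother_and_sister(tree, x)
    return bool(found) and contains(found[1], y)

print(c_commands(tree, "F", "G"))  # True : sisters c-command symmetrically
print(c_commands(tree, "G", "F"))  # True
print(c_commands(tree, "B", "F"))  # True : B c-commands into its sister E
print(c_commands(tree, "F", "B"))  # False: F is too deeply embedded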

[8] It seems that even Phonology (at PF) is constrained by some aspects of hierarchical structure—namely, what is typically referred to as C-command. (For a detailed discussion of c-command, see Radford 2016, Analyzing English Sentences (2nd edition), p. 148ff.) For example, consider the illicit structure in (ii) (*you've) in what otherwise would constitute PF—viz., the seemingly mere surface-level (adjacent) phonology:

(i) These stories of you have been going around.
(ii) These stories of *you've been going around.
(iii) You've/have been going around.

Once we consider the surface-level observation that both you and have are adjacent strings as found at PF, the more subtle analysis now begins. If not surface phonology (via adjacency), the question then turns to what else could impose these kinds of constraints—disallowing the clitic {'ve} (= have) from attaching onto the previous adjacent word you (as found in (ii)), but allowing the clitic to attach to you in (iii)? It seems that syntactic structure, too, plays a role in what we find at the twin PF/LF interfaces (in this case, specifically, what we find at PF). Let's look at it a bit more closely (below):

[9] (i) These stories of you have been going around.

[TP [DP [D These] [NP [N stories] [PP [P of] [Prn you]]]] [T' [T have] [PROGP [PROG been] [VP going around]]]]
(The T-Head 'have' is not c-commanded by 'you'.)



Notice in (9i) above how the Auxiliary verb 'have' must be pronounced in its full lexical form (and not as the clitic {'ve} found in (10ii) below, {*you've}). This is due to the fact that 'have' is not c-commanded by 'you' (i.e., 'you' doesn't c-command 'have'), although the two words sit in an adjacent relation to one another at the (surface) phonological level …[(PF) = …'you have'…].


Rather, at the underlying (deep) syntactic level (before the PF interface), it is the DP and T' which are sisters. Thus, (D) 'These' c-commands (T) 'have' (and not 'you')—hence, the Prn 'you' embedded within the PP can't c-command 'have', it rather being a sister of the P 'of' [of you]. So, to recap: since 'These (stories)' c-commands 'have', and not 'you', 'have' can't cliticize (attach) onto 'you' as {*you've}. It is in this sense that the underlying 'deep' syntactic structure trumps what otherwise might appear at the 'surface' PF interface. This contrasts with what we find in [11a(iii)] below, where 'you' and 'have' abide by two conditions of cliticization:

a. 'have' is preceded by a word ending in a vowel (a phonological condition (PF)), and
b. 'have' is c-commanded by that word (a syntactic condition (S)). (Only (11iii) below abides by this condition.)

[10] (ii) These stories of *you've been going around.

[TP [DP [D These] [NP [N stories] [PP [P of] [Prn you]]]] [T' [T *'ve] [PROGP [PROG been] [VP going around]]]]
(The clitic *{'ve} is illicit: there is no c-command relation between 'you' and T.)

[11a] (iii) You've/have been going around.

[TP [Subj You] [T' [T 've] [ProgP [Prog been] [VP going around]]]]
(The clitic {'ve} is allowed: appropriate c-command by the Prn 'you'.)



Here the Subject Pronoun (Prn) 'you', found in the Spec of TP, c-commands T', and consequently the Head in T (since T is contained within T'). The Pronoun/Subj 'you' in Spec of TP and T' (headed by T {'ve}) are sisters.

Notice in [11b] below how the same kind of c-command serves what we find with subject-verb agreement (accord), whereby the person/number features of the subject must agree with those of the verb—here replacing the plural subject 'these stories' with the singular 'this story'. Also note how adjacency doesn't apply; rather, structural c-command serves in the function of subject-verb agreement via structure, and not via positional adjacency. If mere adjacency were to hold, the sequenced words [you have] would project, since 'you' is second person, +/-Plural, which would agree with the verb have, as in '[You have] a very nice story'.

[11b] This story of you *have/has been going around.

[TP [DP [D This] [NP [N story] [PP [P of] [Prn you]]]] [T' [T *have/has] [PROGP [PROG been] [VP going around]]]]
('have' is not c-commanded by the subject 'you')

i. This story of [*you have]…
ii. [This story] of you [has]…



Note that the verb must be spelled out as 'has' [3rd person/singular], since the c-command relation is with the subject 'This story' [3P, sing] and not with the adjacent previous word 'you' (as would be found in the superficial linear word order).

[11c] These friends of the president blame *himself/themselves.

[TP [DP [D These] [NP [N friends] [PP [P of] [Prn the president]]]] [T' [T] [VP [V blame] [Prn/reflex *himself/themselves]]]]
('themselves' is c-commanded by 'These friends'; 'himself' is not c-commanded by the Prn 'the president')

i. Friends of the [president] blame *[himself].
ii. [Friends] of the president blame [themselves].
iii. [The president] blames [himself].
iv. [These friends] blame [themselves].

In example (i), 'the president' is not in a structural relation which would allow c-command to take place (as is required of reflexive pronouns), it being a sister of the preposition 'of' and a daughter of PP.

[TP [DP a. The president / b. These friends] [T' [T] [VP [V a. blames / b. blame] [Prn-Reflex a. himself / b. themselves]]]]
(correct c-command relation: the DP c-commands the Prn-Reflexive)



The ‘Four Sentences’ | 79 Note how it’s the Prn-​reflexive ‘themselves’ of the VP which must be bound (via anaphoric binding) by an appropriate antecedent—​(the subject (DP/​NP) ‘The president/​These friends’ are the antecedents of the anaphor ‘himself/​themselves’ (respectively). In sum: The two items (the anaphor and antecedent) must fall within a structural c-​command relation). Below in [12], let’s spell out exactly what c-​command looks like incorporating terms such as sister, daughter, mother relations found in a family tree. [12]

[12] C-command

[Tree diagram: a 'family tree' of nodes A–J (A the mother of sister-daughters B and E; E the mother of sisters F and G; J buried lower inside the B-branch), set beside the X-bar schema Specifier > Head > Complement:
[XP [Spec Y] [X' [X Head] [Z Comp]]]
where Head and Comp are a sister-sister (flat) relation, and X' over X is a mother-daughter (recursive) relation.]

C-command, as a 'linguistic description', is a way of showing just how recursiveness acts upon syntactic constituents, which in turn brings about hierarchical relations. C-command in this way is described as a syntactic operation found at (Syntax), prior to the PF and LF interfaces. A is the mother of daughters B and E (which are themselves sisters). Sisters can (symmetrically) c-command each other (both ways). Mothers can (asymmetrically) c-command daughters (one way). As observed with the illicit clitic *[you've] in (10ii) above, the rationale behind the ungrammatical status is that (J) can't c-command (F): they are not sisters. Given the structure in [10], the pronoun 'you' projects from a hierarchical position similar to (J), while the clitic ['ve] projects from a position similar to (F) (as referenced to the scheme in [12] above). While (F) can be symmetrically c-commanded by sister (G) {G>F, F>G}, or asymmetrically c-commanded by mother (E) {E>F}, there can be no c-command relation either way between (J) and (F). And so, what we find in (10ii) is the simple fact that 'you' (J) cannot c-command {'ve} (F), hence the ungrammatical status of (10), while (11) [you've] is permitted since (B) 'you' asymmetrically c-commands (F) ['ve], as correlated to the scheme referenced in (12).
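As an illustrative aside (and no part of the original formalism), the c-command test just described can be made mechanical. The Python sketch below assumes a simple binary-branching tree whose node names mirror the scheme in [12]; the exact tree shape, and the inclusion of the lecture's mother-over-daughter clause, are assumptions made purely for demonstration:

```python
# A minimal sketch of c-command over a binary-branching tree.
# Node labels and tree shape are illustrative, echoing the scheme in [12].

class Node:
    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right, self.parent = label, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self

    def dominates(self, other):
        """True if self properly dominates `other` (a mother-daughter path)."""
        for child in (self.left, self.right):
            if child is not None and (child is other or child.dominates(other)):
                return True
        return False

    def sister(self):
        p = self.parent
        return None if p is None else (p.right if p.left is self else p.left)

def c_commands(x, y):
    """x c-commands y iff y is x's sister, or x's sister dominates y.
    Following the lecture, mother-over-daughter (domination) is also
    counted here as asymmetric c-command."""
    s = x.sister()
    return (s is not None and (s is y or s.dominates(y))) or x.dominates(y)

# A is mother of sisters B and E; E is mother of sisters F and G;
# J is buried inside the B-branch (so J and F are never sisters).
F, G, J, H = Node("F"), Node("G"), Node("J"), Node("H")
B, E = Node("B", H, J), Node("E", F, G)
A = Node("A", B, E)

print(c_commands(G, F))  # True:  sisters c-command each other (symmetric)
print(c_commands(B, F))  # True:  B's sister E dominates F (asymmetric)
print(c_commands(J, F))  # False: J cannot c-command F -> *[you've] in (10ii)
```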


So, in sum, what we have here is a two-​prong combination of Merge, a combination which functions in one of two ways:

(i) Merge (local/adjacent) as sisters {x, z}, when projected in a flat, non-recursive manner; or
(ii) Merge/Move (distant/non-adjacent) as mother-daughter {y {x, z}}, when recursive.

It is this latter 'distant' function of Merge (ii) which MP relabels as Move.

[13] Binding & Licensing. Related to c-command, we can say that any constituent X binds another constituent Y (and conversely, symmetrical sisterhood binding of Y to X) if X determines the properties of Y. This is most evident in reflexive anaphor expressions such as 'John dressed himself', whereby the reflexive anaphor 'himself' is bound, and thus its properties are determined by 'John' ('John' is the antecedent of '_self'). Loosely playing with the formal terms here, what we can say is that Johni binds & licenses _selfi.

However, notice how binding effects are seemingly blocked in the following examples:

Binding Constraints
a. Johnj wanted Maryi to dress *himselfj /himj/k /herselfi
b. Do they want __ to talk to each other?
c. *Do they want John to talk to each other?

Returning to the naïve view that all that's needed for language processing is a good cognitive, problem-solving procedure, one question that should arise is what exactly prevents example (c) from being grammatical, given that (b) is fine. What blocks (c)? Presuming that the anaphor 'each other' must be plural, and thus match a plural antecedent, why don't we just continue, via some problem-solving search mechanism, to parse the structure in (c) until we find the correct [+Pl] antecedent to match 'each other', in this case 'they' (which is what we found for (b))? Likewise, the same could be said about the wrong anaphor relations in (a): i.e., just keep parsing the structure until you find the logical antecedent/match. For instance, since we know that 'himself' is masculine and so is 'John', what is it exactly that blocks the potential interpretation, leading to a well-formed structure, of 'John wanted Mary to dress himself' (where 'himself' refers back to John)? Why is this so bad? What prevents the reading? How does language structure prevent this? Recall that if we believe language to be flat and non-recursive, the examples (a–c) above would be structured accordingly:

[13.1] Flat Structures & Free Binding (the wrong theory)
d. [Johnj wanted Maryi to dress *himselfj /himj/k /herselfi]
e. [Do theyj want __ to talk to each otherj]?
f. *[Do theyj want Johnk to talk to each otherj]?

Within purported flat [….] structures, there can be no account for the kind of constraints on binding found in (a–c), particularly since, in such flat structures, problem-solving skills could easily do the work (as mentioned above). However, if language is not flat but rather recursive [..[..]..], and not a result of general problem-solving but rather based on a computational design, then the constraints on (a–c) make sense—viz., binding must be local:

[13.2] Recursive Structures & Constraints on Binding (the correct theory)
g. [Johnj wanted [Maryi to dress *himselfj /himj/k /herselfi]]
h. [Do they want [ __ to talk to each other]]?
h'. [Do they want [theyi to talk to each otheri]]?
i. *[Do they want [Johni to talk to each otheri]]?

(In example (g), the pronoun 'him' (himj/k) is said to be 'free' since it doesn't have anaphoric features—hence, 'him' could refer back to 'John' in the main clause or to another 'John' not stated.)

The constraint on binding clearly relies on the nature of the recursiveness found within clause formations: binding is local (within the matrix clause), where an anaphor must refer back to its antecedent within the same clause … it must look for a clause-mate pronoun. Clearly, in (i), this is not possible, since the anaphor 'each other' has a [+Pl] feature but the closest clause-mate pronoun is 'John', which has a [-Pl] feature, resulting in a so-called feature crash, and hence in an illicit binding relation. Such constraints on binding—that it be structurally local—fly in the face of theories which take mere linear adjacency to be at a theoretical premium.
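Purely as an illustration of this clause-mate condition (and in no way the theory's actual machinery), the toy Python sketch below lets an anaphor search only its own clause for a feature-matching antecedent; the dictionary representation and the tiny feature inventory are assumptions of the sketch:

```python
# Toy sketch: 'binding is local'. An anaphor may only be bound by a
# feature-matching antecedent inside its own clause; locality is enforced
# simply by never searching beyond the clause the anaphor sits in.

def find_antecedent(clause, anaphor_features):
    for word in clause["words"]:
        if word["features"] == anaphor_features:
            return word["form"]
    return None  # no clause-mate match -> feature crash, illicit binding

# (h') [Do they want [they to talk to each other]]: local subject is 'they'
good_clause = {"words": [{"form": "they", "features": {"plural": True}}]}

# (i) *[Do they want [John to talk to each other]]: only 'John' is local
bad_clause = {"words": [{"form": "John", "features": {"plural": False}}]}

each_other = {"plural": True}  # the anaphor requires a [+Pl] clause-mate

print(find_antecedent(good_clause, each_other))  # 'they' -> binding succeeds
print(find_antecedent(bad_clause, each_other))   # None   -> [-Pl] 'John' crashes
```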

[14] Double Objects/Dative Shift. The same can be said about argument structure. For example, consider double-object structures (three-place predicates) such as 'John gave Mary flowers'. The argument structure is in fact reliant on c-command, binding, and licensing (in the loose way being treated here). For instance, the argument/logic structure surrounding the verb looks something like this: give <John, Mary, flowers>. But what we can discern from these three arguments is that, in fact, the verb 'give' can only directly license the DP argument 'flowers' and not 'Mary'. In this sense, 'Mary' is something of an imposter, known as an adjunct, which is optional, extraneous information (typically adverbial in nature, relevant to place, time, manner) and which can be deleted within the structure—e.g., 'John gave flowers (to Mary)'. But notice that the direct object 'flowers' is a true argument and cannot be deleted: *'John gave Mary (flowers)'. This type of binding can be expressed by stating that the verb 'give' within the VP must both Case-mark the subject (as NOMinative Case 'he/John') and the object (as ACCusative Case 'them/flowers'), but that the adjunct 'Mary' would be Case-marked by the Case-marking particle 'to' (to Mary), what is often referred to as Dative Case. (Note, in this usage, the particle 'to' is not solely realized in the traditional way as a lexical-category preposition, but rather as a functional-category Case-marker.)

‘Mary’ via Dative Case DP

VP V

to Mary/*Flowers Dative Case

b.

‘Flowers’ via ACC Case DP

give Flowers/*Mary < via C-command>

ACC Case



The Case-marking particle 'to' can only bind and license a Dative argument (i.e., an indirect object of a preposition), and conversely, the verb 'give' can only bind and license an ACC argument (i.e., a direct object). In languages with richer morphology (English being a morphologically weak language), Dative Case would be overtly distinct from ACC Case, as was the case in the very rich morphology of languages like Latin.

The ‘Four Sentences’ | 83

Merge

[15] But is this operation Merge unique to language? In other words, is it a unique language-specific core property, or is it found elsewhere? One problem is that Merge seems to have antecedents in non-linguistic environments found in nature. For example, the Fibonacci code, too, first merges two items (sisters) in order to create a third, hierarchical item (a mother). (See web-link no. 6). But, in so doing, if the third newly created item (or label) is recast as a new category (different from what we had with the two items), then what we can say is that the simple operation Merge generates a new set not exclusively inherent in the two separate items. In other words, the merge of two labeled items {α, β} creates a new third-labeled set, i.e., a new category {γ {α, β}}. It is this newly recursive category/label, rising out of the two lower merged items, which signals what we call Move (whereby Move is defined as a result of recursion). Hence, one definition of Move, as we see it here, is a two-prong result of (i) unbound merge, and (ii) labeling.
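To picture the set-theoretic side of this (again, only a sketch, with Python's frozensets standing in for syntactic objects), first-instance Merge can be modeled as unordered set-formation, and the recursive step as the projection of a new label over that set:

```python
# Sketch: Merge as unordered set-formation; Move as the projection of a
# new label/category over the merged pair, i.e. {gamma, {alpha, beta}}.

def merge(alpha, beta):
    return frozenset({alpha, beta})        # sisters: flat and unordered

def label(gamma, pair):
    return (gamma, pair)                   # the new category projects over the set

sisters = merge("alpha", "beta")
print(sisters == merge("beta", "alpha"))   # True: {a, b} == {b, a}, no hierarchy

category = label("gamma", sisters)         # the recursive step: {g, {a, b}}
print(category)                            # ('gamma', frozenset({'alpha', 'beta'}))
```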

It appears that this byproduct of merge, when it leads to recursive properties, is what lies behind a core and unique linguistic property, one which can be said to have no other traceable antecedents outside of language.

[16] In terms of linguistic theory (Chomsky), the notion of a very narrow range of features (a narrow language faculty) which somehow got selected for language—or perhaps not even selected, but which fell out of what Stephen Jay Gould called exaptation (see web-link no. 7)—makes for a very narrow definition of language as that which allows for a structure of recursive design (viz., Language = Recursive). All other aspects of what is typically referred to as language, e.g., the phonological system and other general cognitive systems, are said to fall under the label of the broad language faculty. (See Fitch, Hauser, Chomsky (2005), web-link-7).

Move

[17] When Merge turns into Move, two essential things happen: Merge creates labels which are unbound in limit—we can, e.g., [z nestle [y nestle [x nestle]]]—and this allows for movement of [x] to [z [ [x]]]. This first type of movement comes to us via surface movement, whereby we actually see


displacement of a moved item on the surface phonology, as in e.g., passive-​to-​active structures (found in [20, iii] below). Or consider such surface movement with topic fronting:

Topic Move:

i. Are we ever going to win with this team?
ii. This team, are we ever going to win (with this team)!

In this example, we can see the moved item get displaced from its original in-situ structure. The exact process which leads to movement is composed of three subparts: (i) Copy > (ii) Merge > (iii) Delete. For example, regarding PP-movement (slightly different from the Topic Move found above), consider in [18] below how these three components of Move work in tandem:

[18] (i) We are going to win [PP with this team]!
(ii) With this team, are we going to win [PP with this team]?

C’ TP => Declarave sentence

are Prn

T’

We T

VP

are V

TP

going T to

VP V

PP

win with this team => delete with this team => copy> merge to CP 



(i) The PP ‘with this team’ must make a copy of itself. (ii) The copy is then merged higher-​up in the tree (merge along with movement mostly follows a ‘bottom-​up’ progression, with the exception of so-​called affix-​lowering, as found in English)—​e.g., [TP [Ts] [VP [V speak]s]] (e.g, ‘He speaks’).





(iii) Once merge is complete, deletion of copied item ensures only the higher merged version gets spelled out at PF.
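The three subparts can be caricatured in a few lines of Python (the token-list representation and the index arithmetic are illustrative conveniences of this sketch, not the grammar itself):

```python
# Caricature of movement as Copy > Merge > Delete over a token list.

def move(tokens, start, end):
    copy = tokens[start:end]              # (i)   Copy the phrase
    merged = copy + tokens                # (ii)  Merge the copy higher up
    lower = len(copy) + start             # locate the original (lower) copy
    return merged[:lower] + merged[lower + len(copy):]  # (iii) Delete at PF

sentence = "are we going to win with this team".split()
fronted = move(sentence, 5, 8)            # front the PP 'with this team'
print(" ".join(fronted))                  # 'with this team are we going to win'

# Skipping step (iii) would leave both copies pronounced at the surface.
```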

The ‘Four Sentences’ | 85 Imagine what might happen if a merge/​copied item doesn’t get deleted. This in fact does happen on occasion, in certain structures. Consider Paul McCartney’s ‘But … in this ever changing world in which we live in’… [PP in this ever changing world] in which [we live [PP in this ever changing world]]].

Regarding the three-prong process, consider how [17ii] above, as an example of topic movement, illustrates 'copy > merge > delete' as a way to show focus: the 'topic/move' operation may take what in fact started as a sincere interrogative sentence (?) and turn it into a declarative sentence (!). We can readily see that something has moved from the surface.

[19] Or consider a type of focus movement dealing with Scope:

Scope Move:

a. The president didn’t pass one piece of legislation. b. One piece of legislation, the president didn’t pass one piece of legislation.

Notice that in [19a] the reading is ambiguous; it feeds into the {~}/Thought process as two possible interpretations—either:

(i) There was not one piece of legislation that the president passed (= everything failed), or
(ii) The president passed all legislation except one (= only one failed).

What movement allows us to do is resolve the ambiguity via scope. (Scope in this sense means 'a sphere of influence' by which an item or a string of constituents receives some modification.) In [19a], we say that 'not' has scope over 'one'—hence the {~}/Thought via the scope [not … one] (= all failed). In [19b], however, movement shifts the arrangement so that 'one' has scope over 'not' [one … not] (= one failed). An additional example of scope comes with the following usage of the adjective 'necessary' as it maps onto the modals 'must' vs. 'need'* accordingly:

* 'Need' is considered a Modal, and not a lexical verb, whenever its complement verb is required to project as a bare stem—e.g., 'He needn't tell her' / *'He needn't to tell her' ('need' = modal with negation, and 'tell' projects as a bare verb stem, as opposed to 'He needs to tell her', where 'need' functions as a main lexical verb with an infinitival 'to' complement).


needn’t to tell her (‘need’ = modal with negation and ‘tell’ projects as a bare verb stem, as opposed to ‘He needs to tell her’ where ‘need’ functions as main lexical verb with the infinitive ‘to’ verb complement). Regarding Scope, consider the distinct mappings of the modals ‘must’ vs. ‘need’ accordingly:

(iii) You must not help him with the exam (= It is necessary for you not to help him). ({~} => necessary not): ‘necessary that you not help’



(iv) You need not help him with the exam (= It is not necessary for you to help him). ({~} => not necessary): ‘not necessary that you help’

Here 'not' has scope over 'help' in [19iii] (what is called wide scope) regarding the negation of 'help', versus what we find in [19iv], where 'not' doesn't have wide scope over 'help' but rather narrow scope, since it could be argued that 'not' has scope over 'necessary' instead of 'help'. Specifically, it is said that 'must' has wide scope over 'not', while 'need' has narrow scope over 'not'. We can examine this processing distinction via movement.

[20] Scope processing via Movement
a. You must not help him (= it is necessary for you not to help him) => replacing 'must' with 'necessary' we get [necessary not]:
[TP you [T must] not [VP [V help] him]]
b. You need not help him (= it is not necessary for you to help him) => replacing 'need' with 'necessary' we get [not necessary]

But in order to derive the right underlying structure of [20b]—in other words, in order to situate the negation 'not' above 'need' ('no need')—we need to show a base-generated (underlying) structure, prior to movement, as 'not need'. The Modal (M) 'need' of the Modal Phrase (MP) 'need help' is a polarity item (it can only occur in question and negation structures), and so polarity items like 'need' must be c-commanded by Negation 'not' or an interrogative 'wh'-expression. As our discussion above has shown with regard to reflexives, c-command forces a specific structure. So, we must stipulate that the polarity item 'need' must begin lower down in the syntactic tree, below the licenser Neg 'not', and move from V-to-T, whereas the non-polarity item 'must' is base-generated within T (showing no such V-to-T movement).

c. [TP you [T need] not [MP [M need] [VP [V help him]]]] => 'no need' (not necessary)

d. [TP you [T must] not [VP [V help him]]] => 'must not' (necessary not) (direct lexical insert)



[21] Polarity Items/​Licensers

Note the following distribution of the polarity Modals 'need' and 'dare' as opposed to the non-polarity Modal 'must'. (Note how the Modal 'must' is not a polarity item, since it is not restricted in its usage to Question or Negation grammars, whereas polarity items are indeed restricted to these licensing conditions):

(a) [TP He [T must] [VP help him]] => 'must' is free to serve as a T-base-generated modal.
(b) [TP He [T *need/*dare] [VP help him]] => 'need/dare' are not T-base-generated but must show Modal-to-T Movement.
(c) [TP He [T need/dare] [NegP [Neg not] [MP [M need/dare] [VP help him]]]] => 'need/dare' as polarity items c-commanded and licensed by 'not'. ({~} => no need / no dare; see [20b]).
(d) [CP How [C dare] [TP he [T dare] [MP [M dare] [VP help him]]]]! => 'dare' as polarity item licensed by the wh-expression 'How'.

[22] In [20a] it is said that 'must' has wide scope with respect to negation. In [20b] 'need' has narrow scope with respect to negation. This wide vs. narrow scope distinction also helps define what we know of polarity items such as 'any'. Continuing with polarity expressions, consider the polarity item 'any/anything' (which must be licensed by 'not' or 'how') in the following strings:


a. You must do *anything/something. ({~} => necessary something):
[TP you [T must] [VP [V do *anything/something]]]
b. You needn't tell her anything. ({~} => not necessary anything):
[TP you [T need] not [MP [M need] [VP [V tell her anything/*something]]]]
c. You need to tell her *anything/something:
[TP you [T need] [InfP [Inf to] [VP [V tell her *anything/something]]]]
d. How could he tell her anything/*something?
[CP How [C could] [TP he [T] [VP [V tell her anything/*something]]]]

In [22a, c] above, the polarity item 'any' cannot project, since it is not being c-commanded by a polarity licenser (such as 'not' or a 'wh'-word); the Modal 'must' is base-generated within T. In [22b, d] the polarity item 'any' is licensed (via c-command) by 'not' and 'how' respectively.

[23] Perhaps one of the most elegant examples of licensing regarding polarity is the example below (taken from Radford), which uniquely demonstrates just how the subtle licensing of the polarity item 'any' in the constituents [*will change/won't change > anything] relies on notions of c-command:

The fact that he has resigned won't change anything. (Radford 2016, p. 153)

a. [TP [DP [D The] [NP [N fact] [CP [C that] [TP [Prn he] [T' [T has] [V resigned]]]]]] [T' [T won't] [VP [V change] [Prn anything]]]]
(c-command is established between {n't} and 'anything')

The ‘Four Sentences’ | 89 [T won’t] being a sister of the VP [VP [V change] [Prn anything]] allows asymmetrical c-​command.

Here, in (a) above, we see that the partitive polarity item 'any' of the pronoun 'anything' is properly c-commanded by a negative licenser, the 'not' of the Tense/Modal 'won't'. In other words, 'any' falls within the scope, via c-command, of a negative licenser {n't}. But conversely, see how in (b) below the same licensing is incongruent, and thus the partitive 'any' is made illicit.

*The fact that he hasn't resigned will change anything.

b. [TP [DP [D The] [NP [N fact] [CP [C that] [TP [Prn he] [T' [T hasn't] [V resigned]]]]]] [T' [T will] [VP [V change] [Prn *anything]]]]
(no c-command is established between {n't} and 'anything')

Above, while [T hasn’t] is indeed a sister of the embedded [V resign] (and is an appropriate c-​command relation), it is not a sister of the main-​clause matrix VP [VP [V change] [Prn anything]], thus c-​command between n’t and any is NOT established, rendering the sentence ungrammatical.

Rather, in (b) above, what we see is that the partitive polarity item 'any' of the pronoun 'anything' is c-commanded by [T will]; but 'will' (being an affirmative and not a negative modal) cannot license the polarity item 'any'. In other words, 'any' falls outside of the scope of c-command of a negative licenser {n't}.

Licensing Condition: Polarity expressions, e.g., 'ever', 'any', and the modal 'dare', must be c-commanded by either (i) negative ('not'), (ii) interrogative (Aux-inversion formations for questions), or (iii) conditional licensers.
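This licensing condition, too, is mechanical enough to sketch. In the toy Python rendering below, constituents are nested (left, right) pairs, and a polarity item counts as licensed only when some negative/interrogative/conditional licenser's sister constituent contains it (i.e., when the licenser c-commands it); the word lists and pairings are illustrative assumptions:

```python
# Sketch: a polarity item ('any...') must be c-commanded by a licenser.
# Trees are nested (left, right) tuples; words are strings.

def is_licenser(word):
    return word.endswith("n't") or word in {"not", "how", "if"}

def contains(tree, word):
    if isinstance(tree, str):
        return tree == word
    return any(contains(child, word) for child in tree)

def licensed(tree, item):
    """True if some licenser's SISTER constituent contains `item`."""
    if isinstance(tree, str):
        return False
    left, right = tree
    if isinstance(left, str) and is_licenser(left) and contains(right, item):
        return True
    if isinstance(right, str) and is_licenser(right) and contains(left, item):
        return True
    return licensed(left, item) or licensed(right, item)

# (a) The fact that he has resigned won't change anything.
a = (("the", ("fact", ("that", ("he", ("has", "resigned"))))),
     ("won't", ("change", "anything")))

# (b) *The fact that he hasn't resigned will change anything.
b = (("the", ("fact", ("that", ("he", ("hasn't", "resigned"))))),
     ("will", ("change", "anything")))

print(licensed(a, "anything"))  # True:  "won't" c-commands "anything"
print(licensed(b, "anything"))  # False: "hasn't" is buried inside the subject
```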


[24] Clitic Climbing / Raising-Movement: So, we have seen that Move-based operations involving Topic/focus or scope deal with the thought interface. There are other examples of MOVE which deal with the speech interface (phonology), as we see in the example of clitic climbing:

a. Je l'aime (an S-O-V order derived from the base order 'Je aime le', an S-V-O order) (= I love him)
b. Je aime le (the base order)
c. Je l'aime le (the clitic copies and climbs; the lower copy deletes)

[XP Je [VP [Cl l'] [V aime] [Obj le]]]
(the object pronoun 'le' climbs and attaches to the verb as the clitic {l'}; the lower copy deletes)

  [25] What is happening here is that movement of the clitic is forced since the pronoun le (him) doesn’t have the kind of phonological (stress) properties which would allow it to survive as an independent word chunk (as an item of merge). Thus, the instigated movement here is not a result of the {~}/​ Thought (at LF), but rather is a result of the {~}/​Speech (at PF).

Merge of lexical items: [Je] [aime] [le]. (= SVO base word order)

Move: [Je] [l'aime] ([le]). (= derived SOV order via movement; the lower copy of 'le' is deleted)

[26] One might also, in this same context, consider how the usage of the infinitive marker {to} projects in two different configurations (paying attention to the weak vs. strong stress marking of 'to'):

a. I decided [not to help him]. (strong stress on both 'not' /nát/ and 'to' /tú/ … /ná tú/ …)
b. I decided [to not help him]. (weak stress on 'to' /tÙ/)

[NegP [Neg [Cl to]-not] [InfP [Inf to] [VP help him]]]
(NegP = Negative Phrase; InfP = Infinitive Phrase: the weak {to} raises and cliticizes onto Neg 'not', and the lower copy of 'to' deletes)



The ‘Four Sentences’ | 91 Note that this example of the above affix/​clitic ‘raising-​movement’ [24, 26] is similar to what we find in INFLectional morphology ‘lowering-​movement’ [28] (as seen in an inflectional over-​generalization error made by young children (see sentence-​2 re. [[fall]ed]), whereby, in this case, the affix {ed} must lower onto the Verb). [27] Interesting, Chomsky’s notion of Movement draws parallels to what we find in phonology regarding a rule known as Assimilation [28], whereby one feature may spread via movement and bleed into an adjacent phoneme (if both phonemes are ‘sister-​relations’ within a structural phonological configuration). Note that syllable structure of (cat) equally takes on hierarchical status of ‘mother-​to-​daughter’ relations, similar to what we see in X-​bar theory. This was made evident in so-​called ‘tapping experiments’ where students in early-​school-​age years were asked to simply tap-​out the sounds of words.

Tapping Experiments: Children who were aware at the (narrower) phonemic level would tap out three times for (cat), with phonemes at onset, nucleus & coda. But children at a (broader) syllabic level of onset & rime would only tap twice, as found in the (broader) syllable structure (/k/ + /æt/), and never as (*/kæ/ + /t/), showing that the English parametrized (default) syllable template is right-binary-branching (where ($) marks the syllable template seen below):

a. [$ [onset /k/] [rime [nucleus /æ/] [coda /t/]]]

but not

b. *[$ [[onset /k/] [nucleus /æ/]] [coda /t/]]

And what we find of phonological spreading of rules (i.e., Assimilation): that mere adjacency is not enough, but rather that structure must be stipulated as having a sisterhood hierarchical relation (as we find governing syntax):

'Cars' [k [a [rz]]]: /r/ and /s/ are sisters, so assimilation of voicing applies between sisters: s = /z/, giving /karz/ (cars).
'Carson': /r/ and /s/ stand only in a mother-daughter relation, so no assimilation applies between mother and daughter: /s/ = /s/ (not /z/).
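As one final illustrative toy (the constituent encoding is an assumption of this sketch, not a phonological theory), the spell-out function below applies voicing assimilation only at sister junctures, so /s/ voices in 'cars' but not in 'Carson':

```python
# Sketch: voicing assimilation applies between structural sisters only.
# Segment trees are nested (left, right) tuples of sound strings.

VOICED = set("aeiourznmbdg")

def spell_out(tree):
    """Flatten to a surface string; a voiced segment voices a SISTER /s/."""
    if isinstance(tree, str):
        return tree
    left, right = spell_out(tree[0]), spell_out(tree[1])
    if left and left[-1] in VOICED and right == "s":
        right = "z"                      # assimilation: sisters only
    return left + right

cars = ("k", ("a", ("r", "s")))          # [k [a [r s]]]: /r/ and /s/ sisters
carson = (("k", ("a", "r")), ("s", ("o", "n")))  # /r/, /s/ not sisters

print(spell_out(cars))    # 'karz'   -> plural /s/ assimilates to /z/
print(spell_out(carson))  # 'karson' -> mother-daughter only: no assimilation
```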

4

Reflections on Syntax

The central theme of this lecture revolves around an emerging consensus on what most linguists today regard to be the core property of human language—that of recursion. But what exactly is recursion, specifically in its syntactic form, and how do these properties of recursive syntax come to be held in such 'high esteem' when it comes to questions about human language? What the central theme of this text reveals is that language is much more than the sum of its parts (words). Rather, properties of language are not only epiphenomenal in nature but, when taken as a bundle of species-specific features, come to occupy a unique place in the evolution of our species—a place which asserts an extremely high value on the ability to abstract away from the here-and-now (displacement), from the surface-level structure (movement), from what we hear as words sitting next to each other (adjacency), from how often a string of words or phrases might be heard (frequency), or from how actions and events come to be first conceptualized, then expressed (argument structure). The 'holy grail' of all of these crucial properties, when they do come together to make up language, is what we can call a recursive syntax—and the one crucial aspect that defines a recursive syntax is that of MOVEment. Syntax = Movement. Syntax, to a greater or lesser degree, is the convergence of all the aforementioned properties. But when these properties are processed in just the right way, allowing for MOVE-operations to be performed across an array of a mental lexicon


(a parser), what we arrive at is the defining of what can truly be called a human language. Perhaps the most interesting way of expressing what constitutes a MOVE-based recursive syntax leading to human language is to consider three unique observations made by Noam Chomsky (2010):

[0] Chomskyan Axioms:
(i) Words may not even get pronounced (on their surface phonological level).
(ii) Even when words do get pronounced, they may not deliver the relevant structure necessary of the pronounced string.
(iii) And when the string is actually heard, it may not be heard from its originally structured source, but rather may have been displaced from a previous, lower position and hence moved up the syntactic tree. In a sense, 'Words climb!'

While the three axioms may not exhaust what we know of all the rich complexities of a given language, when taken together their convergent power becomes the quintessential driving force behind the core properties of human language. So, as the phrase goes: Words, they climb. Having said this, the questions become: Where do words climb to? Where do they climb from? Why do they move at all? And what is it exactly that they are climbing on? More recently, this portmanteau of questions has come to consider how the brain is engaged in many of these steps (a language-to-brain corollary). These have been core questions since the early formation of the Generative Grammar Enterprise (GE), as formulated by Noam Chomsky in the late 1950s.

Why Move?

The true reason for movement is remarkably elusive, and to this day it still baffles most linguists. But some instances of movement seem more readily available for explanation than others. For example, if movement is done in order to enhance and secure a communicative value [+Value], then I suppose one could rightly claim that the linguistic parser (our mental grammars) has evolved essentially out of a (Darwinian) pressure to be 'sensitive' to Interp(retation), in order to deal with the functionalism of language as a means to form a communicative niche. All [+Value] features encode the labeling of [+Interp], since the movement is to enhance 'Interpretation'. (The [+/-Interp]-feature label is widely used in current Chomskyan syntax.)

Of course, the rather bizarre flip-side to this argument regarding MOVEment (and one emphasized by Chomsky throughout his writings) is that MOVE, as engineered by our human mental parser of grammar, may in fact be [-Interp] in nature (and may not have evolved out of any bottom-up, Darwinian communicative pressure). Hence, a 'Move-based' theory of language as found in the Minimalist Program (MP) (Chomsky 1995) does not seem to arise by a stipulation to satisfy functionalism. Rather, a Chomskyan theory of language is much more defined by its formalistic tendencies—viz., [-value], [-Interp] formal linguistic aspects which naturally fall out of the human brain/mind. In this latter sense, language is indeed 'epiphenomenal' in nature and seems to arise as a by-product of the unique abstract abilities of the human mind: the design of language exceeds the core properties which would have otherwise been minimally necessary for mere communication. Such formal and abstract features would give rise to a true human-language capacity. But what might such [-value], abstract/formal [-Interp]-features, which would motivate movement, look like? Two features come to mind, both of which have become cornerstones of earlier GE and more recent MP theories over the years: Case and AGReement. Let's sketch out below what such a template for MOVE-based [-Interp] might look like:

[1]
Move (Probe): Case/AGR = {Z}
Merge (Goal): {X, Y}
(i.e., [Z [X, Y]]: the Probe {Z} sits over the first-merged pair {X, Y})

Steps from Merge to Move (each template pairs the first-Merge utterance with its Move counterpart):

[1.1] [DP [Spec Mommy] [D' [D 's [+Poss]] [NP [N mommy] [N sock]]]] (DP = Determiner Phrase)
=> Merge: 'mommy sock'; => Move: 'mommy's sock'


[1.2] [DP [Spec My] [D' [D [+Poss]] [NP [N me] [N sock]]]]
=> Merge: 'me sock'; => Move: 'My sock'

[1.3] [DP [Spec Two] [D' [D s [+PL]] [NP [N two] [N [[book]s]]]]]
=> Merge: 'two book'; => Move: 'Two books'

[1.4] [TP [Spec (Daddy)] [T' [T s [3P, Sg, Pres]] [VP [V [[drive]s]] [N car]]]] (TP = Tense Phrase)
=> Merge: '(daddy) drive car'; => Move: '(Daddy) drives a car'

What these basic templates show is that MOVE {Z} acts as a Probe for one of the two Merge-based items {X, Y} to serve as its Goal, in order that the relevant formal features1 of the item are (i) either obtained onto the lexical item itself, and/or (ii) stripped off of the lexical item (the former as an instance of inflectional affixation, the latter as a checking-off of features). We define Merge as any process which doesn't include a copy of itself—viz., as bricolage structure-building via word-linking formations as found in, e.g., compounding and derivational affixation, as well as simpleton 'inflectionless' lexical phrases, which would

1 Tense might also be considered in such a formal MOVE configuration, although it must be said that some treatments of T(ense) within current MP assumptions consider T to be of a [+Value]/[+Interp]-feature specificity.

include the prosaic N+N/-possessive (NP) ('mommy sock'), N+N/-number ('two book'), V+N/-tense (VP) ('drive car'), and the Prepositional Phrase (PP).2 The recursive Move-based structures showing affix inflection look as follows: [mommy 's [mommy sock]], [two +pl s [two book]s], [drive +T s [drive car]], etc. Recall that all instances of inflectional affixation are abstract in nature and thus carry a [-value] feature. Such Probe-Goal processes are considered to be highly formal [-Value]. Indeed, what is most puzzling to linguists, even baffling, is why our mental linguistic parsers should even require this level of abstraction of formal material in the first place when, in the end, such formal features will have to be stripped off of the lexical item somewhere down the line in the derivation, at a point referred to as spell-out. The question as to the nature of this very superfluous type of movement is extensively written about throughout the current MP literature (e.g., Miyagawa's monograph Why Agree? Why Move?). We'll leave this more formal aspect of MOVE for the time being (to be discussed in the section below, §Syntactic Reasons for Move: To Decrease Formality [-Interp]). For now, let's discuss perhaps the easier assumption that MOVE is motivated by [+Interp] and [+value] features having to do with phonology and semantics—in other words, that MOVE facilitates communication (at a functional level).
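For concreteness only (the feature names and list representation are this sketch's assumptions, not MP's actual machinery), such a Probe-Goal relation might be caricatured as a downward search that checks off an uninterpretable feature on the first matching Goal:

```python
# Caricature of a Probe-Goal relation: a probe bearing an unvalued formal
# feature (e.g., AGR) searches its domain for the first matching goal and
# 'checks off' the feature, which must be stripped before spell-out.

def probe(unvalued, domain):
    for goal in domain:
        if unvalued in goal["features"]:
            goal["checked"].add(unvalued)   # feature checked/stripped
            return goal
    return None  # no goal found: the unvalued feature survives and crashes

domain = [
    {"item": "drive", "features": {"V"}, "checked": set()},
    {"item": "daddy", "features": {"N", "3sg"}, "checked": set()},
]

goal = probe("3sg", domain)               # T's AGR probe, cf. [1.4] 'drive-s'
print(goal["item"], goal["checked"])      # daddy {'3sg'}
```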

Phonological/Semantic Reasons for Move: To Increase Interpretability [+Interp]

Phonological account: A simple case is to consider so-called 'phonological-clitic movement' in languages such as French. Let's first recognize the fact that the original base word order in French is SVO (subject-verb-object), just as in English (in fact, there are claims that SVO is a universal, default word order, with all subsequent variations being derived via upward Movement (Kayne 1994)). But upon closer inspection, we see that object movement from V-O to O-V (O) is motivated by phonological considerations. Consider the French example below (examples taken from Radford 2016, p. 27):

2 Such prosaic simpleton phrases, termed lexical phrases (without MOVE), are the hallmark of early child speech. See Radford & Galasso (1998) for a longitudinal case study showing a child's passing through a non-MOVE stage-1 of language development.

[2]
(i)   S V O: Il a vu Paul (He has seen Paul)
(ii)  S O V (O): Il l'a vu __
(iii) S V O: *Il a vu le (* marks ungrammaticality)

What we find in [2iii] is that the pronoun *le (= him) cannot survive on its own as an O(bject) (which is placed after the verb), because it doesn't have the phonological/syllabic stress integrity to support the clitic/object-pronoun 'le' as a free-standing word. So, in order for the clitic (a weak-stressed, weak-syllable, partial word formation) to retain communicative value—as it must be supported by (lean up against) a more stable free word—the CL(itic) must MOVE up to attach to a more stable stem. Hence, movement is motivated to secure 'le' as having a more stable syllabic/stress formation. This is done via its attachment to the auxiliary conjugated (3P, Sg, present) verb {a} (= has).

[2'] [Cl-P [Cl l'] [TP [T a] [VP [V vu] [Prn-Obj le]]]] => {l'a} (= {him has} … seen)
(Cl-P = Clitic Phrase; TP = Tense Phrase; the lower copy of 'le' deletes)

Question rising. Notice how phonological intonation-rising also works to serve a [+Interp] value for interrogative (Question) expressions:

[3]
(a) He found it on the street? /hi faʊnd ɪt an ðə ↗strit/ => with rising ↗ on the word 'street' [↗street].
(b) Yes, he found it on the street! /↘jɛs hi faʊnd ɪt an ðə ↘strit/ => with no rising on the word 'street' [↘street].

Note how the syntactic movement operation of Auxiliary-inversion serves as a substitute for phonological intonation-rising:

(c) Did he find it on the street? => no rising indicated. (He did find it on the street)



One other aspect of phonological movement might be seen with so-called clipping, where a part of the word has been truncated (shortened):

(d) Sally => Sal, where {_ly} is removed; Joseph => Joe, where {_seph} is removed.

Though it is important to realize that such 'phonological-clipping movement' (via truncation) is merely a surface-level sound phenomenon—viz., the actual full word is still being mentally processed (albeit silently): when I say 'Sal', I am still mentally processing and silently hearing the full form 'Sally'. A final example could be what we find with 'logical & (and)' expressions, such as: (e)

I need to buy a, b, c, d, and e.

In this ‘logical &’ example (as notated with comma usage), we find that the initial sentence ‘I need to buy …’ doesn’t need to be repeatedly restated, given that the comma acts as a ‘repeat’ (instruction, ‘go back to the statement’, like a Da capo in music). Hence, the above example is actually processed in its full mental-​processing string as: (e’) I need to buy a, I need to buy b, I need to buy c, I need to buy d, etc. etc.

Semantic Account

In addition to phonological reasons for MOVE, there are also semantic reasons, both of which ensure the communicative valuation of the expression. Consider below an example which deals with so-called scope (an interpretable feature) (taken from Radford 2016, p. 27):


[4]
(i) He didn't fail one of the students.
(ii) One of the students, he didn't fail (one of the students).

In [4i], the interpretation is ambiguous (hence, not optimal in communicative value). The sentence in [4i] could either mean he didn't fail one student (= not one student failed, as they all passed!), where the scope of (i) is not>one; or sentence [4i] could alternatively mean he didn't fail one student (= there is one student who did not fail, but all the others failed!), where the scope would be one>not. But when we consider the result of MOVE in [4ii], where the phrase 'one of the students' has moved from its base-generated position to a higher position, actually above the subject 'he', we discover that the ambiguity has been resolved: in [4ii], the only possible interpretation is that he didn't fail one of the students (one>not). So [4i] yields both not>one and one>not, while [4ii] yields only one>not. Here's how we can see scope at work:

a. He didn't fail one of the students. => not>one
b. One of the students, he didn't fail. => one>not

Semantic/Theta-Marking

Another account of movement is that it may be thematic/semantic in scope. For example, Chomsky (2013, p. 40) considers the sentence 'Which books did John read?' The underlying base structure, prior to MOVE/raising, looks like (i α) below, with subsequent movement as shown in (ii β):

[5]
(i) α: [TP John (did) read which books]?
(ii) β: [CP Which books did [TP John (did) read which books]]?



What's interesting here is that this sentence has two ways of dealing with the thematics of the phrase 'which books': the 'first-instance' thematic reading found in (α) reads off the phrase 'which books' as the thematic/semantic object of the verb 'read', forming the Verb Phrase [VP read which books]:

(i) α [TP-Dec [VP read which books?]]

Then, the second-instance thematic reading in (β) reads off the phrase 'which books' not as the object of the verb (that has already been read off), but as a separate phrase which theta-marks an Int(errogative) theta-feature:

(ii) β [CP-Int Which books (did) [TP [VP]]]

If we look closely, the twin thematic assignment of <α, β> is as follows:

(i) α: John read X (X = which books), where X is the complement/object of 'read' and takes the proper theta-role marking as 'object of verb'.

In the MOVE structure, we read off the dual theta-role markings accordingly:

(ii) β: for which books X, α: John read books X
[CP-Int [TP-Dec ]]

Perhaps more common examples of how MOVE allocates theta-markings along the movement trajectory are found in so-called Ergative and Unaccusative structures:

[6]
a) John rolled the ball down the hill.
a') The ball rolled down the hill.
b) John broke the window.
b') The window broke.

Ergative: β[John rolled α[the ball rolled down the hill]]. (Also see [20] for similar constructs as defined by 'VP-shells'.)

In the structure α[ ], we find that the verb 'roll' theta-marks its subject as THEME. But when MOVE is applied, whereby the verb 'roll' moves up the tree, what we find is that the theta-marking of the verb 'roll' now treats 'John' as AGENT. Hence movement, in this way, has the ability to mark different theta assignments (similarly to what was shown above). Let's see how so-called unaccusatives similarly work:

Unaccusative:
[7]

c) There came a violent scream from the yard.
c') A violent scream came from the yard.

Both types of structure show MOVE of the verbs accordingly:

d) β[John rolled α[The ball rolled down the hill]]. ('roll-α' marks THEME; 'roll-β' marks AGENT)
e) β[A violent scream came α[(There) came a violent scream from the yard]].

In (e), it seems that for unaccusative constructs, the apparent subject ('scream') originates as the verb's complement/object (e.g., [VP came a scream]). (See [21] for a tree diagram of unaccusative subject movement.) NOTE: Perhaps the most crucial aspect here is that potential subjects (e.g., 'the ball', 'a scream') start out as subjects internal to the VP, what is termed the Verb-Phrase-Internal Subject Hypothesis (VPISH)—particularly if we take β[[ ]] to be TP, and α[ ] to be VP. See [18b] and [19c] for a VPISH example showing [TP [VP Mary likes John]], where the subject 'Mary' must then raise out of VP and insert as Spec of TP (per the stipulation that a Tense Phrase must have a subject).


Labeling Account

A further reason for movement may have to do with the labeling of a phrase. In terms of a 'Merge-based' theory of language acquisition, first-instance merge can only establish a primitive, unordered set of two items {a, b}—for example, an unordered {N, N}-compound of 'boat-house' would allow the ambiguous readings of either 'a kind of house' and/or, alternatively, 'a kind of boat': e.g., {boat-house} and/or {house-boat}. In this first instance of Merge, no labeling of the phrase can yet be determined (as we are unsure which of the two Ns will be realized as the Head of the compound NP). This 'unordered nature' is the result of so-called sisterhood relations, where no hierarchy is yet established. It is only with the second instance of MOVE that order is derived out of a set {a {a, b}}, which yields the proper recursive properties of hierarchical syntax—e.g., a 'house-boat' {house {house, boat}} now reads unambiguously only as a 'kind of boat'. It is this property of recursion that allows projection and labeling of a phrase to take place:

[8]
β: 'boat-house' (= a kind of house): [boat [boat, house]] (where order is fixed)
α: 'house-boat' // 'boat-house': [boat, house] (where order is irrelevant: both readings)

Theory-internal considerations are made within MP which stipulate a labeling algorithm (LA) to scan syntactic objects (SO) and provide a label for one of the two SO-items (within the binary pair). In this case, MOVE is required in order for LA to break the flat sisterhood relation found in the SO. Move thus allows one of the selected items to move up to a higher phrase, creating what is termed in MP dynamic antisymmetry. In sum, in the flat structure found in [8α], order is symmetric and irrelevant. So MOVE (a form of raising—the unique displacement property found in language) must allow for the recursive structures found in [8β] above. MOVE thus provides a mechanism for labeling the head of a phrase. Again, as an end-result of MOVE, a 'boat-house' is read as a 'kind of house' (and not as a 'kind of boat').
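The gist of this labeling story can likewise be caricatured in a few lines of Python (a sketch under the simplifying assumption that syntactic objects are just sets and pairs of strings):

```python
# Caricature of a labeling algorithm (LA): a flat symmetric pair {X, Y}
# offers LA no unique head; raising one member (Move) breaks the symmetry.

def label(syntactic_object):
    if isinstance(syntactic_object, frozenset):
        raise ValueError("symmetric {X, Y}: LA finds no head, no label")
    head, _complement = syntactic_object
    return head

flat = frozenset({"boat", "house"})   # first Merge: sisters, order irrelevant
raised = ("house", flat)              # Move: 'house' raises and projects

try:
    label(flat)
except ValueError as err:
    print(err)                        # symmetric {X, Y}: LA finds no head...

print(label(raised))                  # 'house' -> read as a kind of house
```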


Syntactic Reasons for Move: To Decrease Formality [-Interp]

But the above-cited reasons for MOVE are, for the most part, communicative [+Interp] in nature—viz., they enhance communicative value. But what are the motivating factors behind [-Interp] syntactic displacement when no apparent communicative value is at stake? Of course, the whole notion as to why there should even be [-Interp] formality in languages in the first place—formalities which do not necessarily enhance communicative value—is the $60,000 question which still quite baffles linguists. To some degree, what can be said is that such superfluous movement is seemingly an epiphenomenon and comes 'for free' as a byproduct of the design of language. In any case, languages do have formality—some language typologies have formality in abundance, much more so than others—and linguists have to grapple with what to do with it in order to:

(i) Ensure a proper spell-out of communicative operations (at both the phonological and semantic levels), while they
(ii) Find ways to strip such formality of its non-communicative valuation.

It is this latter reason for Syntactic MOVE (Syn-MOVE) which has propelled much of the syntactic investigation (of the generative grammar enterprise) over the latter part of the last century—the question being what motivates syntactic movement: 'Why MOVE?'

Affix Movement

Theoretically, in a very general sense, it seems MOVE allows an affix to freely search for a relationship with an 'itemized' (I) stem—a search process which isn't beholden to [+Itemized]/[+Frequency]-driven association. Such [+I/+Freq] one-to-one association is the classic hallmark property of word-building formations—processes which creep into the lexicon, such as lexical development, compounding, derivational morphology, irregulars, etc. Rather, what we find of this manner of

MOVE-based search is that it adheres to category-based processing over item-based processing. In more recent Minimalist Program (MP) terms (Chomsky 1995), a probe-goal MOVE relation is said to develop which allows an item (stem) or a word particle (affix) to search over an array of lexical items (found in the parsing structure) for a relevant, feature-matching counterpart. If [+I], [+Freq] processing is targeted in the formation of words (at the lexical level), then [-I]/[-Freq] allows MOVE to break away from the adjacency constraints & conditions of association which bind word-formation processing, and allows for displacement and categorization to take place (the two conditions required of human syntax). The former word-based itemization without Move is referred to as vertical processing:

[a]
[b]  = a lexicon
[c]



—a one-to-one associative means, which stacks word-chunks on top of each other (as would be found in a dictionary). Vertical processing is deemed flat, since the sister/word relations [a], [b], [c] would have no hierarchical structure. On the contrary, the latter 'horizontal' processing speaks to the notion of the spreading of rules (horizontally), as with any rule: [[x]+y] = z. At this latter stage, we get what is referred to as recursive structures [a [b [c]]], typical of horizontal spreading (a result of MOVE): (spreading: moving away from an initial item with a 'memory-embedded' trace of the previous item parsed, with no memory lost).

[a [b [c]]] = a syntax

  The same dichotomy found in morphology parallels these two modes—​with derivational morphology being of a [+I, +Freq] ‘word-​based’ nature (vertical/​ flat), and inflectional morphology being of a [-​I, -​Freq] ‘category-​based’ nature (horizontal/​hierarchical). Let’s summarize these two modes below (leaving further discussion to the core chapters of the text).


Movement Distinctions Based on Inflectional vs. Derivational Morphology: 'Fascinating'-Types (Item-Based) vs. 'Celebrating'-Types (Category-Based)

Let's consider the dual treatment by examining so-called [fascinating]-type processes alongside so-called [[celebrat]ing]-types. (Note the use of brackets [ ].) But first we must give these two examples some structure (since any assessment of language must be structure-dependent).

[9] (a) This is a [fascinating] class.

(b) Mary is [[celebrat]ing] her birthday.

While we realize that 'fascinating' in (a) is an adjective which modifies the noun 'class', we also see that 'celebrating' retains its verb status. In other words, an item-based (word) change has occurred in (a) but not in (b). In (a), the verb 'fascinate' (to fascinate) has become an adjective (AdjP) via derivational morphology. We can tree-diagram the structure found in (a) as shown below:

(a') 'This is a fascinating class'
[AdjP [Adj fascinating] [N class]] = flat
[FASCINATING] [CLASS] ('vertical', as an encyclopedic reading of items, top-down)

But notice how the _​{ing} (derivational-​a ffix) is sensitive to the [+frequency] of the item (stem): namely, the {ing}-​a ffix ‘Verb-​to-​Adjective’ derivational formation only works with a select few stems (a few select ‘items’), such as:

(i) fascinate + {ing} => Adj, (ii) amaze, (iii) interest, (iv) bore, (v) intrigue, etc.
(vi) *wondering (but wonder + {ful} => 'wonderful')

(Note the selective properties of some verb stems which enter into adjectival derivation, all of which select their unique endings: {al} for critic => critical, {ory} for deposit => depository, {ive} for collect => collective, {ful} for wonder => wonderful, etc.)

Note how, for example, the verb in (vi), 'wonder', could not be slotted into this same {ing} 'verb-to-adjectival' class (e.g., 'This is a *wondering class' is ungrammatical); rather, the affix {ful} must be applied, correcting the phrase to 'a wonderful class'. Or notice how 'collect' becomes 'collective', as in 'collective bargaining', etc. There is a whole itemized (lexical) list of matches which stems have to consider when their relevant derivational affixes have to be paired. This is part of the native knowledge we have in knowing our language's lexicon—it is the kind of declarative knowledge that rides on the back of brute-force, itemized memorization schemes. Other such affix matches would be {_ous} (a dangerous road), {_al} (a critical period), {_en} (a broken window), etc. (And this is by no means an exhaustive list … It seems we just know them.)

But if this item-based process is completely memory-based, then, from time to time, speakers should have 'look-up & retrieval' glitches—a kind of bottleneck in memory—which may pose problems when recalling the correct affix match. And this is precisely what happens. Take for example the verb 'celebrate'. How do you turn this verb into an adjective? Is it a 'celebrating gathering'? No! The correct affix to look up is {_atory}. (Note the phonological schwa reduction on the third stress of the stem √celebrate, from /e/ to /ə/, in 'celebratory'.) Sometimes native English speakers stumble on this affix, since it may take on a [-Freq] ranking ({atory} is not often heard in the input)—e.g., we should say 'a celebratory gathering'. So it seems that moving from verb to adjective (an item-based derivation) is indeed rather rote-learned and frequency-sensitive (i.e., reliant on the amount of token examples found in the input).

In sum, what we can say about derivational morphology is that the 'selection process' of affix per word is certainly 'item-based', as its internal processing is extremely sensitive to the +frequency of input—in the sense that specific stems must attach to specific affixes, as promoted by the frequency of such composition. In some sense, there really isn't any decomposition at all of the [stem and [affix]]; rather, the [stem + affix] seems to be processed as a wholly new word-item with its internal semantics in place: e.g., the derivational {er} (as in teach => teacher) reads off its lexical semantics whole: [toaster], [driver], [dancer], [teacher], etc. So then, what about 'celebrating-types'? Let's consider such 'celebrating-types' below.


(b’) Mary is [[celebrat]ing] her birthday. AspP (= Aspect Phrase: progressive ‘Be + [V +[ing]]’ VP Asp V N ing [celebrat]ing her bd.



The counter-example here shows categorical processing (a rather horizontal mode of language processing). In such category-based processing, note how the affix {_ing} is rather indifferent to what the stem is (the item), as long as it is a verb (the category). In a sense (quite an important sense), the inflectional {ing} is +Frequency-sensitive NOT to the item/stem, but to the category (Verb). The actual item (word) plays no role in determining whether or not the rule can be activated: all the {ing} looks for in 'celebrating-type' processes is to attach to a Verb category. That is all. This is very different from what we just saw above regarding derivational morphemes, which look up and select specific stems/items to attach to. We can show this rather creative productivity by placing any nonce word (a made-up word) in the V-slot and then carrying on with the rule. Try it! Say the sentence 'John is spigulswog-ing away all of his money' (with √spigulswog as the nonce word). (Note, in fact, how it's the INFL-affix {ing} itself which defines the nonce word as a verb.) What we notice is that there is no look-up of whether or not the affix {ing} matches the nonce stem. It just applies itself in all environments. This free range of productivity is a hallmark feature of [-Freq of item] but [+Freq of category] processing, uniquely characteristic of MOVE. What we find of this hallmark of MOVE is that it is rule-based (hence its productive nature): in this case, the rule would be Aspect/Progressive: Be + Verb + ing = progressive grammar. Note how the underlying processing of such a stem and affix would inherently be decomposed, as shown in the structural brackets [stem [affix]]. This form of decomposed Stem + affix, or 'celebrating-type', is what we term INFLectional morphology. Inflectional morphology is in stark contrast to derivational morphology, which caters to undecomposed [stem-affix] processing—a very lexical & word-based mode of retrieval, storage, and recognition.

While we notice how the derivational 'fascinating-type' {ing} is beholden (tethered) to specific stems/items, we equally note how the INFLectional 'celebrating-type' is free. We can summarize the two {ing} formations below (where [-productive] requires the selection of a specific stem/item, while [+productive] is free to select any stem/item, even a nonce item):

(a) ‘Fascinang types’ (derivaonal): as ‘chunking’ [ ]. [Stem + ing]

Stem is [+Frequency] sensive/ [-Producve]

(b) ‘Celebrang types’ (infleconal): as embedded/recursive [ [ ] ]. [Stem + [ing]]

Stem is [– Frequency] sensive/ [+Producve]



In morphological terms, these two types are often referred to as 'stem-based/lexical incorporation' [10a] vs. 'affix-inflectional' [10b]. For instance, the past-tense feature in irregular verbs such as 'went' would be stem-lexical incorporation, [T {+past} went] = [went], while regular, rule-based verbs such as 'walked' would be affixal, [T {+past} ed [V walk]ed] = [[walk]ed]—the former showing merge-[ ], the latter showing move-[[ ]]. Note that irregulars, as stem-based, are [+Freq]-sensitive, while rule-based affixal structures are [-Freq]-sensitive. (Also see the note on anti-locality in [20] below.) However, note that we could take the stem-verb √fascinate (to fascinate) and turn it into a 'celebrating-type'—e.g., 'She is fascinating him with her eyes' (showing Aspect/Progressive). Again, recall that all the {ing} cares about here is that it attaches to a stem (presumably a verb) in order to satisfy the progressive rule [be + V + ing]. But a very funny thing happens when we try to go the other way, turning the stem-verb √celebrate into a 'fascinating-type' (i.e., into a stem-adjective): recall how 'celebrate' must take on its unique [+Freq] affix {ory}, as found in the structure 'This was a *celebrating/celebratory party' (just as we found with the example deposit > depository cited above). Also note how the stem-internal phonology shifts—a feature very typical of new-word classification: (celebrate) /sεlIbret/ > (celebratory) /sεlIbrәtori/. We find a complete list of such phonological stem shifts in derivational formations—for instance, just sound out the following derivational Noun-to-Verb counterparts


and notice the internal stem-sound shifts: glass (N) => glaze (V), grass (N) => graze (V), bath (N) => bathe (V), life (N) => live (V); other examples include real > reality, secret > secretary, etc. What we are generally carrying on about here is what is referred to linguistically as MOVEment, and, as we have seen above, there are a number of reasons for MOVE. Syntactically, the notion of movement has been expressed in GE terms by the need to either acquire and/or check off formal features. For example, take Case (the Accusative Case marker {-m}, as found in the wh-object 'whom'). Case seems to be the kind of formal feature (along with AGReement) which doesn't impact the semantics of mere communication. This can be seen today in the very low observance of the accusative marker {m}—e.g., most native speakers of English prefer to say 'Who does she like?' and not 'Whom does she like?', suggesting that the case marking {-m} is rather invisible to the semantics of communication. Hence, in GE terms, such formal features (such as Agreement or Case, among other purely syntactic features) must be stripped of their value before the lexical item can enter into linguistic processing.
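Before turning to Case, the item-based vs. category-based contrast above lends itself to one last sketch (the mini-lexicon below is an invented stand-in, not a claim about the mental lexicon's contents): derivational affixation is a frequency-sensitive lookup that can fail, while the inflectional rule applies blindly to any verb, nonce verbs included:

```python
# Sketch of the dual route: item-based lookup vs. category-based rule.

DERIVATIONAL = {                 # item-based: each stem lists its own affix
    "fascinate": "fascinating",  # {ing}
    "celebrate": "celebratory",  # {atory}, not *celebrating (as adjective)
    "wonder":    "wonderful",    # {ful}, not *wondering (as adjective)
}

def to_adjective(stem):
    """Derivational route: retrieve the whole stored word, or fail."""
    return DERIVATIONAL.get(stem)     # None = look-up/retrieval glitch

def progressive(stem):
    """Inflectional route: the rule [be + V + ing] applies to ANY verb."""
    return stem + "ing"

print(to_adjective("celebrate"))      # 'celebratory'
print(to_adjective("spigulswog"))     # None: nothing stored to retrieve
print(progressive("spigulswog"))      # 'spigulswoging': rule applies freely
```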

Case as Trigger for Movement

Case represents one of those Syn-MOVE operations which seem to motivate movement. Assuming in the first instance a Double-Noun N+N (Object) construct to be a product of (vertical) Merge (of both Noun items), and showing how Move triggers (horizontal) Case-marking (an example of inflectional morphology), let's now return to our template in [1], restated here in [11], and consider how the formal feature of Case triggers movement.

[11] Case => Move => Merge [N, N]
(Case sits above Move, which in turn sits above the Merge of the two Nouns.)

Consider double object constructs N+N where a pronoun is employed (noting that in English, pronouns must assign overt case). Let's consider the two sentences below (IO = indirect object, DO = direct object):

[12]
             IO    DO
a. John gave Mary  money
b. John gave *money Mary

*(noting the ungrammaticality of (b)).

In this configuration showing covert Case (structurally assigned via the verb 'give'), no movement is forced and the double Nouns can sit together in IO+DO order (which may in fact constitute the default order). Case on 'Mary' is achieved via a structural relation of 'Mary' to the verb 'give' (pronoun case = 'her'). In this sense, the verb 'hands over' Case structurally and covertly (as there is no overt Case-marking assigner produced at the surface level). But notice that once MOVE is employed, Case gets triggered by an overt Case-assigning constituent {to} (whereby {to} functions as a Case-marking clitic).

[13] a. John gave money to Mary … (to her).

Let's see how the template in [11] operates to Case-mark the pronoun Mary/her in [13] above:

[14] => Move => Merge: [ __ to [N Mary] [N money]]
(the overt Case-assigner {to} projects above the merged N+N pair, with an open slot to its left)

[15] => Move => Merge: [money to [N Mary] [N money]]
('money' raises into the open slot above {to}, leaving its lower copy behind)

a. John gave money to Mary
b. John gave *money Mary

Let's follow the steps of the process:

(i) The first instance of Merge takes both Noun/items and merges them as an NP. This is equivalent to our vertical processing which shows no movement.
(ii) The second instance of Move forces one of the two nouns to move for reasons having to do with overt Case.

This stepwise process is interesting because it accounts for how we might get double Case-marking operations. Consider the two-step derivation of (i) his friend, and (ii) friend of his:


[16] Steps of MOVE-based derivations (shown here as linear bracketings of the original trees):

(i) Merge: [him, friend] (a non-recursive merge, showing [-Possessive] case, before 'him' raises)

(ii) Move: e.g., I am 'his friend' [+Poss case] => move => merge
    [his [him, friend]] ('him' raises and spells out as Poss 'his'—a recursive move)

(iii) Move-2: e.g., I am a 'friend of his' => move-2 => move-1 => merge
    [friend [of [his [him, friend]]]] (both 'him' and 'friend' raise)



The same processes are at work with MOVE-​based derivations such as the adjectival reading of ‘Wine bottle’ (Merge) versus ‘Bottle of wine’ (Move). Consider the derivations of Merge>Move below, and consider how the processes map on to the above Possessive derivations:

(iv) Merge: [N wine] + [N bottle]—the merging of two items {N+N} forming an Adjective Phrase.

(v) AdjP: [Adj wine [N bottle]] = [wine, bottle] (a non-recursive merge); likewise [coffee, cup].

(vi) 'Bottle of wine' => move => merge
    [bottle [of [wine, bottle]]] (a recursive move); likewise [cup of [coffee, cup]].

(vii) e.g., I am a 'friend of Tom's' => move-2 => move-1 => merge
    [friend [of [Tom's [Tom, friend]]]] (both 'Tom' and 'friend' raise).



'Root vs. Synthetic' Compounds

In the same vein as our two-step derivations found above, consider the 'local-Merge/semantic' vs. 'distant-Move/syntactic' distinction in the examples below:


(viii) Merge (non-Move): Root compound 'chain smoker' => not a *'smoker of chains' (so, = root compound).
Ex. John is a [N chain-smoker]
Merge: [N chain] - [N smoker] => AdjP

Note how the above Merge construct compares to a simple merge-based Adjectival Phrase 'black bird':
Ex. This is a [AdjP black bird]
Merge: [AdjP [Adj black] [N bird]]



(Noting that blackbird, as a compound in its own right, has a different interpretation from the Adjective Phrase: a black bird is a bird colored black, while a blackbird is a type of bird, presumably black).

So, we note above that there’s no necessary movement to derive the double-​ noun sequence {N+N} as a combined single N (compound), with the appropriate interpretation of the N+N sequence that follows—​e.g., John is a ‘chain-​smoker’ has the same non-​movement quality as found in the structure John is a ‘teach-​er’ where the derivational morpheme {er} is of a strict non-​move-​based procedure. Let’s note how ‘chain-​smoker’ doesn’t take on a modification-​interpretation as *‘smoker of chains’, which is actually what we do find below in synthetic/​move-​ based compounds. (ix) MOVE: Synthetic compound

Note the stepwise MOVE derivations:

cigarette smoker => is a 'smoker of cigarettes'

(a) cigarette smoker       (Merge)
(b) smoker of cigarette(s) (Move-1)
(c) a cigarette smoker     (Move-2)

Stepwise derivations of movement:

[17] => move-2 (raising of the Noun 'cigarette') => move-1 (raising of the Noun 'smoker') => merge (the simple merge of the two items)
    [cigarette [smoker [of [cigarette, smoker]]]]

While 'Root Compounds' carry a [-Move] feature, it's clear to see that the 'Synthetic Compound' carries a [+Move] feature.
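As a brief aside for the computationally inclined reader: the [-Move]/[+Move] contrast can be made concrete with a few lines of Python (a toy sketch of my own, not anything from the linguistics literature; the names merge, move and contains are simply illustrative labels). Merge builds a flat pair [ ], while Move copies an item out of an already-built structure and re-merges it on top, yielding the embedded [ [ ] ] shape:

    # Toy model: Merge builds a flat pair; Move copies an item out of an
    # already-built structure and re-merges it on top (the lower copy = the trace).

    def contains(structure, item):
        """Recursively check whether item occurs anywhere inside structure."""
        return any(contains(x, item) if isinstance(x, list) else x == item
                   for x in structure)

    def merge(a, b):
        """Non-recursive Merge: two items form one flat constituent [a, b]."""
        return [a, b]

    def move(item, structure):
        """Recursive Move: re-merge a copy of item above structure."""
        assert contains(structure, item), "can only move what was first merged"
        return [item, structure]

    # Root compound (Merge only): 'chain smoker', no movement, no trace.
    root = merge('chain', 'smoker')          # ['chain', 'smoker']

    # Synthetic compound (Merge then Move): 'cigarette smoker',
    # read as 'smoker of cigarettes'.
    base  = merge('cigarette', 'smoker')     # merge:  [cigarette, smoker]
    step1 = move('smoker', base)             # move-1: [smoker [cigarette, smoker]]
    step2 = move('cigarette', step1)         # move-2: [cigarette [smoker [...]]]

    print(root)   # flat [ ]: the [-Move] root compound
    print(step2)  # embedded [ [ ] ]: the [+Move] synthetic compound

Note how the printed nesting mirrors the bracketings in [17]: the root compound stays flat, while the synthetic compound carries its lower copies along.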

Where Move? Traces of Movement Left Behind

This question of movement requires the postulation of a base word order before we can glean any amount of movement (displacement). In more recent GE terms (Kayne 1994), a theory was developed which suggested that the base word order, for any given language, consists of the universal-default SVO order (Subject, Verb, Object). As an extension of this, the more local configuration, say of a phrase, was deemed to be the 'verb-object' (VO) relation, whereas the subject/Spec position (sometimes referred to as the 'elsewhere category') was deemed to be a positional slot outside of this local VO configuration—the Spec position came to be interpreted as a higher available slot which could host not only subjects but any moved item from below. Hence, Subjects (located in the Spec position in X-bar terms) occupied rather a peripheral position, outside of the core VO configuration. The Spec as an elsewhere-category took on crucial implications throughout GE theory. One other interesting proposal made by Kayne (1994) regarding universal SVO word order and phrase-projections is that all right-branching structures have


to end in a trace (i.e., that all Complement positions must end with a trace). (See Kayne's 'Linear Correspondence Axiom' (LCA)). Also see the note in [20] below regarding anti-locality, movement, and VP-shells.

[18] X-Bar Theory. (See Lecture 3 [§§5, 7.1] for discussion of X-bar theory).

(a) [XP Spec [X' Head Comp]] (with the S(ubject) in Spec)
    *(The VP, specifically the V-O relation, is optimal in delivering [+value] communicative information).

(b) [VP [Spec Mary] [V' [V likes] [N John]]] (= VPISH)



But in GE terms, it was very quickly spotted that the affix {s} of 'likes' (in b), which is a formal AGR marker showing subject-verb agreement, could not have been processed as a lexical chunk *[V likes], since only the base stem [like] of the VO-relation serves purposes of communicative value (cf. functionalism). Hence, similarly to the {-m} Case marker shown above, the AGR {s} marker must be realized as a [-value] formal feature and therefore must project higher up in the tree, above the base-generated VO-configuration. This leads to the expansion of the tree diagram that shows all SVO declarative sentences to be derived from out of a formal/functional TP (Tense Phrase), as shown below:

[19] Tense Phrase (TP) = Declarative structure.

(a) [TP [Spec Mary] [T' [T {s}] [VP [V like] [N John]]]] (not showing VPISH)

*In [19] above, we don't show a full VPISH-projection: but note here that all subjects are VP-internal subjects and that they must then raise to TP. (VPISH) = Verb Phrase Internal Subject Hypothesis, which stipulates that all subjects must originate within (in situ) the VP for theta-marking considerations, given that theta assignment is a [+Value, +Interp] feature which must reside exclusively within a lexical category (such as VP). The higher functional categories (vP, TP, CP) are therefore responsible for all other [-Value], [-Interp] feature specificities. Hence, movement of the affix AGR marker {s}:

(b) [T' [T {s}] [VP [V like]s]] = [[like]s]

  In the above sense, a TP is a formal functional phrase—​with the Spec being an elsewhere category to host subjects (for TP) and other moved items (for CP/​ adjuncts). In fact, theory-​internal considerations suggest that the subject indeed originates in the Spec of VP (subject VP-​internal hypothesis) and then raises to Spec of TP (Fukui & Speas 1986). (See Radford 1997 for discussion). (c) [TP Mary [T {s}] [VP Mary [V like] [N John]]]




Move on What (Scaffolding)?

What the above X-bar scaffolding (a syntactic tree diagram) allows us to do is to conceptualize where words first locate in their original base-generated structure, assuming (as a universal property) that all sentences are base-generated as SVO. Thus, if there is any order other than SVO, we can hypothesize that movement has taken place in order to acquire and/or strip off formal features. Since all languages begin with the verb-stem, and hence the verb phrase, added structure (such as TP, CP) would be needed only when stipulated by these formal syntactic features. In the case of most languages, a higher projection of TP is required simply because most languages engage in some kind of formal features beyond Tense (e.g., Case, AGR). And since a prosaic VP cannot host formal/functional features, an extended TP becomes the universal scaffolding for all declarative sentences across all language types. (In other words, the TP tree along with the VP is said to be a universal). This VP>TP-projection is also coupled with the notion that even higher-still projections can be extended via multiple Spec positions. It seems the Spec-Head of TP is the kind of landing site which can always be used either to host adjoined words which raise from below, or to host words which insert directly into the derivation (e.g., Modals and Auxiliary verbs, which don't necessarily move from lower down in the tree). But the essential reason for TP is to host the subject (a subject which begins as only peripheral to the verb phrase, recalling that many languages don't require overt subjects to insert into Spec (pro-drop), while all languages require objects of some kind to be COMPlements of the Verb). Out of these SVO assumptions, further refinements of GE called for all declarative sentences to have the base configuration of TP.

[20] VP-shell.

Finally, another theory-internal consideration, in addition to the VP-Internal Subject Hypothesis, is the hypothesis that there could be a double VP stacked one on top of the other (what are referred to as 'VP-shells', Larsonian shells). (See Larson 1988). Consider the kind of double scaffolding which would be required to represent the following kind of construct (known as an ergative sentence-type):

(a) [TP …[vP John rolled [VP the ball down the hill]]].
(b) [TP …            [VP The ball rolled down the hill]]. (Ergative structure)

(See Radford 1988, p. 374 for discussion). (Full TP-​projection not shown).


Split VP-Shells (Theta-Marking)

[vP John rolled-Ø [VP the ball rolled [PP down [DP the hill]]]]

vP => Case and θ-marking: 'John' (Spec of vP) is the AGENT; 'roll' θ-marks 'the ball' (Spec of VP) as THEME; and 'roll' θ-marks '(down) the hill' as GOAL. (The lower VP spells out 'The ball rolled down the hill'.)

*Note on anti-locality and VP-shells. One possible 'anti-locality' argument which might support VP-shells (double VP-projections)—an expansion which would seem to break with anti-locality—is to suggest that such VP>VP projections are a result of merge (external merge) and are not Move-based; hence the conditions stipulated for anti-locality are preserved. (Also, DPs in this sense could merge and stack on top of each other, e.g., 'The last two years were bad', where 'the', 'last', 'two' are triple DPs: DP>DP>DP). Such a merge operation of phrases would be similar to what we find in merge-based root-compounds as opposed to move-based synthetic-compounds (as seen above). Also, within such VP-shells, the driving force behind the twin projections is usually thematic/semantic in nature and not syntactic (as found in [20] above), and semantic-feature attachment is arguably merge-based—viz., arguments have even been spun to suggest that the Tense-feature is 'semantic' in nature (a [+Interp] feature) and may motivate a simple merge-operation onto VP—a direct merge application which may not motivate phrasal movement upward. Others suggest that the distinction between 'merge vs. move' is incorporated by two separate probe-goal relations: 'semantic vs. syntactic' (see Lecture 2, 'Duality of Semantics', e.g., Miyagawa 2010). Other parallels could be drawn between affixal vs. stem-based morphology, where


affixal is move-based, and stem is merge-based—e.g., consider the past tense affix-feature of the verb 'talked' [T {ed} [V talk]ed] = [[talk]ed] vs. the stem-feature of the adverb 'yesterday' [Adv {+past} yesterday] = [yesterday]. While adverbs are not typically thought of as assigning tense, surely 'yesterday' has a merge-based tense feature which must agree with its matrix verb: e.g., 'Yesterday, I *want/wanted a nice dinner'. While both words 'yesterday' and 'wanted' could be said to mark [+Past], the 'verb' is 'affixal' and move-based, while the 'adverb' is 'stem' and merge-based. (Likewise, it has been argued that even nouns can carry a similar embedded tense feature: see Radford 2004, 2009). Of course, such arguments are moot when speaking about vP>VP shells, e.g., as found within ergative VP-shell constructs of the 'John rolled the ball rolled down the hill' type (above), since light-verb vPs are considered functional categories, different from main VPs, which are lexical categories; hence anti-locality is preserved. One final note regarding how to distinguish external merge (true merge) from internal merge (true move) is to see if there is any trace being left behind (e.g., Kayne's axiom that all right-branching structures must end with a trace). Thematic assignment as seen in [20] doesn't leave a trace behind since theta percolation up the tree would be non-indexed, marking different theta assignments: e.g., as found in [20] above.

[21] Unaccusative Subjects

Note how similar raising/movement is found in unaccusative-subject structures:

a. [TP have [VP [V arisen] [QP several complications]]]
b. [TP There have [VP [V arisen] [DP several complications]]] (EPP)
c. [TP Several complications have [VP [V arisen] [DP several complications]]]

But notice that here, in the case of unaccusative verbs (such as arise), what becomes the TP subject via raising originated as the VP complement (where it was assigned its theta role THEME):

[21.1] 'There have arisen several complications'

[TP [PRN there] [T' [T have] [VP [V arisen] [DP several complications]]]]

[21.2] [TP [DP Several complications] [T' [T have] [VP [V arisen] [DP several complications]]]]



Let's consider below some interesting examples of MOVE as it applies to French.

[22] French:

French is famous for exploiting movement not only for clitic formations of the SOV(O) ‘Je t’aime’ sort (I you love = I love you) talked about earlier in [2]‌above


(where the base-generated Object (of SVO) raises above the verb), but also even more complex forms found below, where the complete SVO word-order is reversed:

     O         V        S
(1) Sans doute vous écrira-t-elle. (Often used in 19th-century high-literature French.)
    (no doubt (to) you will-write she) (= no doubt she will write you)

     O,      S  O  V
(2) Mon chien, je l'ai perdu
    (my dog, I it have lost) (= I have lost my dog)

           [+Fin] NEG [-Fin]
(3a) Je ne mange  pas manger
(3b) Je ne veux   pas manger
(where 'manger' in (3a) marks the lower copy of the raised finite verb)

Examples such as those found in [22.1] above perhaps suggest a portmanteau of procedures:

(i) that a form of passivization is triggered by the structure, such that 'active' SVO (= 'Elle écrira une lettre' (= She will write a letter)) becomes a 'passive' OVS;
(ii) this may also be coupled with the fact that a sort of focus is required which has the object fronted to the early part of the sentence.

*Note: A possible version of this 'verb-passivization' process can be found in Spanish with the verb gustar (= like, pleases me), such that the subject-position of the verb gustar must be marked for the accusative case Me, as in the sentence Me gusta coca (where the inflectional {a} marks for 3rd-person agreement with coca, and not with the first-person subject (Yo/me))—noting the ungrammaticality of the Nominative case *Yo gusto coca. Hence, the SVO utterance Me gusta coca-cola is actually processed as OVS (= Me pleased (by) coca-cola (Coca-cola pleases me)). Also note that the notion of fronting can be found in examples where certain elements of the utterance might want to appear closer to other related elements. For example, in English the Adverb 'quickly' typically follows the noun phrase, e.g., John ate the dinner quickly, and not *John ate quickly the dinner. However, movement seems to be involved (referred to as heavy-NP shift) whenever the

noun phrase 'the dinner' needs to be in an adjacent position to further modification (perhaps by virtue of some pragmatic constraint)—e.g., 'John ate quickly the dinner Mary prepared for him', 'John ate the dinner *quickly Mary prepared for him'. (Such expressive subtlety is reminiscent of the high-literature French example found above). Regarding [22.3], Pierce (1989) classically makes use of a fixed NegP to demonstrate that whenever French verbs move and raise across Neg (pas) they take on a finiteness [+Fin] feature, and when they stay put lower down in the structure (below Neg) they preserve their infinitive [-Fin] feature. Consider some examples:

     Verb + Neg (pas)                         Neg (pas) + Verb
(4) a. marche pas (walks not) [+Fin]          b. pas casser (not to break) [-Fin]
(5) a. ça tourne pas (this turns not) [+Fin]  b. pas rouler (not to roll) [-Fin]
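The distribution in (4) and (5) is regular enough to state as a one-line diagnostic. Here is a minimal sketch (my own illustration, using toy tokenized strings; the function name finiteness is hypothetical): a verb preceding Neg 'pas' has raised and is [+Fin], while a verb following 'pas' has stayed in situ and is [-Fin].

    # Pierce's diagnostic as a toy check: the verb's position relative to Neg
    # 'pas' signals finiteness (V before pas: raised, [+Fin]; pas before V: [-Fin]).

    def finiteness(tokens, verb):
        """Return '+Fin' if verb linearly precedes 'pas', else '-Fin'."""
        return '+Fin' if tokens.index(verb) < tokens.index('pas') else '-Fin'

    print(finiteness(['marche', 'pas'], 'marche'))        # +Fin (walks not)
    print(finiteness(['ça', 'tourne', 'pas'], 'tourne'))  # +Fin (this turns not)
    print(finiteness(['pas', 'casser'], 'casser'))        # -Fin (not to break)
    print(finiteness(['pas', 'rouler'], 'rouler'))        # -Fin (not to roll)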

*Note that in English, main verbs are not allowed to raise and cross NegP (and where an auxiliary/modal verb do/will must directly insert):

(6) a. *She speaks not French.    b. She does not speak French.
    c. *It breaks not.            d. It will not break.

In English, only a certain category of words (Auxiliary DO, BE, HAVE) as well as Modals (can, could; shall, should; will, would) can be inserted above NegP (Not) in order to form negation.

5

Reasons for Syntactic Movement/​‘Four Sentences’ Revisited

[1] Taken from the above discussions, it appears that syntactic movement (a displacement of items) originates from one of two conditions: that it be PF-based (showing displacement of items at the surface phonological level), or that it be LF-based (showing displacement as a way to focus or emphasize a string or item).

One point raised in Chomsky's Minimalist Program (MP) (1995), which had earlier antecedents in the literature (e.g., Pesetsky 1982), is the theoretical condition that the paths (the 'trace' pathways) of movement cannot overlap (can't cross one another), labeled the Path Containment Condition. In more recent terminology, this same condition is referred to as the Edge Constraint. Let's consider how these two similar conditions on movement work when considering 'wh'-subjects.

[2] Wh-subjects (Subject questions)

The syntax of wh-subject questions is quite interesting in a number of respects having to do with movement. One question has to do with the nature of the position of the wh-subject as it starts its projection—namely, does the 'wh-subject' (i) remain in situ in spec-TP (like all typical subjects), or does it (ii) advance up the tree to spec-CP (like all wh-question operators) in order to check off a question feature {Qf}?


A second question to ask is whether or not such CP involvement (cf. ii) would trigger the necessary Auxiliary inversion, typical of all wh-operations, e.g., [CP what did you like [TP you did like what?]], where the Aux 'did' moves from Head-TP to Head-CP. CP involvement (option ii) also puts into question the status of the C-head (a potentially unfilled Head which would serve as the landing site for Aux inversion), given that the Head of C would project a required tense feature (requiring the Aux verb to be drawn up to C), as in the past tense feature {Tf} of the Aux verb 'do' in the example below (noting that both the Head of C as well as the Head of T are heads which project tense):

[CP {Qf} what [C {Tf} did] [TP you [T {Tf} did] [VP [V like] what]]]

[3] a. *who did find it? (Radford 2016, p. 336)
    b. who did you find?

*a' [CP who [C did] [TP who [T did] [VP [V find] it]]] (*overlap/cross movement)
 b' [CP who [C did] [TP you [T did] [VP [V find] who]]]

  The structure in [*3a] is only possible when projecting not as an interrogative CP, but rather as an emphatic TP usage—​where do is emphasized, and where who is treated as subject in Spec of TP. Notice that such movement not only crosses paths, but also the CP projection (as compared to TP) does not affect a word order distinction at PF. In other words, perhaps what these ‘path conditions’ reduce to is a rather more robust condition which stipulates that movement must (either):

(i) Enhance interface at PF (e.g., via word order, affix hopping, etc.), or,
(ii) Enhance interface at LF (e.g., via emphasis, focus, scope).

In other words, perhaps such conditions on movement are similar to 'economy conditions' which stipulate that movement must expand a new XP—namely, that movement can't take place within the same XP. Hence, movement must expand the tree upward to create higher functional projections. If movement can't advance the tree by projecting a new higher phrase, then economy constraints (anti-locality) prevent its movement and so movement is blocked. (In other words,

nothing is 'syntactically gained' by moving an item from Head or Comp of XPα to Spec of the same XPα). (See [6] below). We clearly see that in [3], the CP word order [CP who [C did]]… parallels the word order of TP [TP who [T did]], thus making [3a] an illicit structure, while [3b] is fine. Note that the movement in [3b] shows enhancement at PF, taking the path of the string TP>VP [did who] and altering it in CP to [who did].

[4] Subject Questions

One interesting question to ask is why we don't get 'Do-support inversion' in subject questions—in other words, why should (b) below, which shows Do-support, be ungrammatical? Apart from the prior analysis taken above regarding overlapping/crossing pathways (cf. Pesetsky 1982), let's take a closer look at the contrasting ungrammaticality of Do-support in (b) versus the grammaticality of Do-support in (c) by considering what motivates the Do-support inversion found in question formations in the first place. Let's flesh out the three structures below:

(a) Who found it?      (No required Do-support)
(b) *Who did find it?  (Illicit Do-support inversion/*ungrammatical structure)
(c) Who did you find?  (Licit Do-support inversion/grammatical structure)

(Note *(b) above is ungrammatical (showing illicit T-​to-​C movement) when used as an (unstressed) ‘do-​supporting’ interrogative structure. A parallel emphatic usage (non-​interrogative) would have the ‘do/​did’ as phonologically stressed, and thus would be grammatical—​e.g., Who DID find it! John! John DID! Really?)

Note. Recall that Auxiliary inversion/Do-support works in tandem with Yes-No question and Wh-question formations:

(i) [CP [C Do] [TP you [T do] [VP [V like pizza]]]]? (Yes-No question: Do-support/inversion)
(ii) [TP you [T (do)] [VP [V like pizza]]]! (Base structure: (do) optional emphatic usage)
(iii) [CP What [C do] [TP you [T do] [VP [V like what]]]]? (Wh-question: Do-support/inversion)
(iv) [TP you [T (do)] [VP [V like what]]] (Base structure: (do) optional emphatic)

[5] (a') Who found it?
[CP [PRN who] [C' C [TP who [T' T [VP [V found] [PRN it]]]]]]

(b') *Who did find it?
[CP [PRN who] [C' [C did] [TP who [T' [T did] [VP [V find] [PRN it]]]]]]

(c') Who did you find?
[CP [PRN who] [C' [C did] [TP [PRN you] [T' [T did] [VP [V find] [PRN who]]]]]]
=> base structure: 'you did find who'
=> do-support is triggered once wh-movement passes through T
=> the pronoun 'who' starts out base-generated as object within the VP

As shown in (c') above, it appears that what triggers the necessary 'Do-support'/inversion is the lower Wh-pronoun/object 'who' passing ('percolating up') through the T-head (which houses the auxiliary Do) on its way (bottom-up) to Spec of CP. Without a lower Wh-Prn passing through T, there is no trigger for Do-support/inversion. Note that (a') projects a Wh-subject (not a Wh-object), which means that the Wh-pronoun is base-generated already above T and so has no chance to pass through T; hence, there is no trigger. In sum, movement of a VP-internal wh-object passing through the Tense Head triggers Aux inversion. Only Auxiliaries/Modals can invert by such a triggering mechanism, due to their being considered light verbs (i.e., abstract in nature). Main substantive lexical verbs in English are heavy in this way and so cannot undergo raising/inversion (e.g., *'When smoke you cigars?' corrected

as 'When do you smoke cigars?', where the light Aux verb 'do' can Move (percolate up) from T to C, as opposed to the heavy lexical verb 'smoke' which must always remain VP in-situ).

(i) *[CP [Prn When] [C smoke] [TP you [T smoke] [VP smoke cigars when]]]?
(ii) [CP [Prn When] [C do] [TP you [T do] [VP smoke cigars when]]]?

Noting that (i) disallows (heavy) lexical verb movement which would emulate what we see with (light) Do-support/inversion. Also, we must note that this ban on lexical movement found in English is rather language-specific (and thus must be treated as a language-specific parameter), since other languages, like Spanish, do allow such heavy Lexical verb movement to percolate up the tree: [CP ¿fumas [TP tu fumas]]? (Smoke you? => English: Do you smoke?)

[6] Anti-locality Condition: (Lasnik & Saito 1992).

The condition states that if Head movement from V-to-T or T-to-C doesn't achieve any new configuration, or is too short and superfluous, then the movement is barred. Hence, the superfluous projection found in [3a] is banned, since no new (word-order) configuration of [who did] has been created. In English, the same ban on local movement prevents V-to-T, as in the example *[TP John [T was] [VP [V was] tired]].

(i) Head movement from V to T, or T to C, is disallowed (Boeckx 2008, p. 104):

*[TP John [T was] [VP was tired]]
*[CP who [C did] [TP who [T did] [VP [V find] it]]]

(V [was] may not raise locally into T, nor T [did] into C, since no new configuration, e.g., of [who did], is created.)

As cited in Boeckx (2008), Bobaljik (2000) makes the claim that in order for such movement to be licit, an intervening Head must separate the two XPs, as is found when NegP intervenes between TP and VP (as shown below).




(See Lecture 4 [20] for a note regarding constraints imposed on movement by the Anti-locality Condition, as well as how the condition might license apparently banned double phrases such as VP-shells).

(ii) Intervening XP between V and T: Movement is allowed

[TP [T was] [NegP [Neg not] [VP [V was] [Adj tired]]]]

(where a new XP (NegP) intervenes between TP>VP, a new configuration is extended—showing 'was' as crossing NegP and raising into T)





[7] So, following up on our summary on movement found in Lecture 2, it seems that movement at the morphosyntactic level is confined to (at least) two conditions:

(i) that it be recursive, involving a mother-daughter relation;
(ii) that it be non-local, creating an expansion of the tree structure.

Here, the Verb must remain 'VP in-situ' since Head-to-Head movement from V to T wouldn't enhance visible word order at PF. Note that affix-lowering of {s} is required as an escape-hatch to non-movement. Such local movement would not expand the phrase into a new configuration. (Note that once a NegP projects, displacement and movement of lower items becomes visible, as the items must traverse NegP as they come forth from lower down in the syntactic structure, namely VP).

[8] Four-Sentences

1. Can eagles that fly swim?
2. Him falled me down.
3. The horse raced past the barn fell.
4. I wonder what that is ___ up there.
   i. that [is what]…
   ii. *that's [is what]…

(where strikethrough words show movement to the left of the sentence structure). (* shows ungrammaticality).

These four sentences (chronologically) take us on an omnibus tour (1950s–1980s) of what would shape the next sixty years of the generative grammar framework of Noam Chomsky's work in linguistics.

• Sentence-1 (1950s). Chomsky (2013, p. 39) refers to such structures as early puzzles, brought to attention early on in the generative grammar framework by the likes of John Ross (1967) and Tanya Reinhart (1975, 1976): 'Can eagles that fly swim?' 'One early puzzle, still alive, has to do with a simple but curious fact, never recognized to be a problem before, though it is.' Chomsky goes on to discredit notions which suggest a 'linear-structural design' of language put forward by non-language-specific theoretical frameworks of the day, enhanced by cognitive problem-solving mechanisms—by simply showing that structural distance, over proximate linear distance, is what is at the core of such processing.

Sentence-1 begins to question 'what it is we exactly know when we know a language'—namely, linguists begin to ask how native speakers come to implicitly know the inner (mental) workings of the hidden structural processing of their syntax, a knowledge which seems to speak directly against naïve adjacency/associative models of processing which were historically advocated by prior structuralists and behaviorists of the time (Franz Boas, B.F. Skinner). This token Sentence-1 is the posterchild behind the Language & Cognitive Science Revolution (and harkens us back to the Galilean Revolution). (See web-link no. 8). Chomsky's remarks go to the linguistic distinctions which are drawn between the generative grammar perspective vs. associative-based and/or general cognitive problem-solving models of language acquisition, and mark the initial period of the Chomskyan paradigm shift. Extending 1950s associative-based models of language through to what would become Connectionism and AI, Chomsky uses such simple sentences—he would often say 'take the simple sentence'—to demonstrate just how language processing based on associative 'strength-and-weakness' connectivity was doomed to fail. The theoretical background to the sentence is set up accordingly: In 1956, the computer scientist John McCarthy coined the term 'Artificial Intelligence' (AI) to describe theoretical processing of intelligence (language) which simply implements the same kind of features which could be coded for a computer operating system (with the neuro bases of Off (0) and On (1) binary weighted connectivity). Most AI cognitive scientists flaunt their 0&1 models by suggesting that their models come closest to what (they think) actually happens in the neuro-cellular brain. (Of course, see Fodor's 'The mind doesn't work that way' response). By instantiating an intelligent system


using man-made hardware—viz., an association-based theory governing the distribution of weighted inputs and outputs (what would later become known as connectionism), rather than our own 'biological hardware' of cells and tissues (biolinguistics, faculty of language science)—an attempt was made to show that AI would ultimately gain understanding, thus paving the way for practical applications in the creation of intelligent devices or even robots. (See John Searle's Chinese Room Argument). (See web-links 9–12).





• Sentence-2 (1960s) marks the beginning period of intensive, formal child language investigations (the Roger Brown children: Adam, Eve, Sarah, based at Harvard). (See web-link no. 13).
• Sentence-3 (1970s) marks the beginning of new pursuits dealing with psycholinguistics. (See web-link no. 14). Researchers such as Tom Bever, George Miller, Dan Slobin (working within the early Chomskyan framework), as well as more recent psycholinguistic work conducted by Steve Crain and his collaborators, as reported in Crain and Thornton (1998), brought the newly emerging field of psycholinguistics (of the 1970s) into the next century. (See web-link no. 15).
• Sentence-4 (1980s) demonstrates just how far a movement-based generative grammar can go to describe and account for such illicit structures found in syntax. The fact that the illicit nature of the string found in sentence-4 is part of what we know implicitly about our native syntax (a knowledge never explicitly taught to us) shows again how the inner mental processes of our language often fall just out of reach/sight of what we typically think we know when we say 'we know a language'.

This intense decade between the 1980s and 1990s marks a refined and articulated analysis of movement operations found in syntax (Merge vs. Move), with its theory coming to full fruition in the publication of the Minimalist Program (1995).1 (If I wanted to add a fifth, 1990s example to the list, I'd go with Grodzinsky's work on how Broca's area handles movement operations as found in embedded structures, etc.) (Grodzinsky & Santi)2

1 web-link no. 16.
2 web-link no. 17.

[9] Let's consider the '4-sentences' in turn, examining how the core property of recursiveness might be implicated as a core property of language. Our first play on these sentences is to consider how they might be processed

assuming a flat (non-recursive) structure. We'll show a flat structure as [ ], as opposed to a recursive structure as [ [ ] ].

[10] Sentence-1: 'Can eagles that fly swim?' (1950s Chomsky)3

'Chomsky's main point here is that we will need more than linear order properties of strings to understand sentences.' Sentence-1 is a classic sentence first used by Chomsky in a variety of his writings to illustrate what seems to be an innate, tacit and universal property which all speakers have in interpreting a given string (string here means a sequence of words in a given phrase or sentence). What is interesting about sentence-1 is that we immediately come to process the second verb (v2) 'swim' as the verb which we are seeking to ask 'what eagles can do'. 'Can eagles that fly swim': we are asking 'can eagles swim', and not 'can eagles fly'. But why should this be? In other words, what is it that internally guides us to skip over the first, closest adjacent verb 'fly' (closest to the subject 'eagles') and rather directs us to process the second verb 'swim'? Surely, any theory which posits that surface order via adjacency is what guides us to interpret a string would get this wrong. Even more importantly here, consider the fact that what we easily read off at the surface phonological level amounts to what we would also read off as interpreted at the flat-structure level. In other words, in order to properly process the string, the PF level (Phonological Form), in all its 'adjacency glory', somehow must get reshuffled (scrambled) and reinterpreted at an altogether different level from PF, say at a Logical Form level (LF). So, if we take a naïve theory which assumes 'adjacency of flatness over recursive structure', then we should expect to be guided in principle by the first verb (closest to the subject), and not the second (more distant). (But this is not what happens). Let's consider what a flat (non-recursive) structure would look like for sentence 1:

a. [Can eagles that fly(v1) swim(v2)?]

So, if we are simply scanning strings via a process which only adheres to the 'adjacency factors' of the string, then we should interpret that we are asking 'can eagles fly?' But let's consider now what sentence-1 looks like under a recursive structure:

3 web-link no. 18.


b. [x Can eagles [y that fly y] swim x]

Now, if we consider the nature of recursive structures (as found with embedded strings), then we can see that indeed the closest verb to the subject [Eagles x] (found within the x constituency, or unit of structure) is in fact [swim x] and not [fly y]. As Chomsky puts it, it rather seems that it is due to some unique design of our human brain (a brain which gives rise to language) that we are able to instantiate, immediately upon recognition (an innate recognition), the underlying recursive structure of [ [ ] ] over a flat structure [ ]. This recognition is knowledge not learned in school, nor is it taught to us by our parents at an early age, but rather comes for 'free' out of the human design of language.
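The two processing strategies can be put side by side in a short sketch (a toy illustration of my own in Python, not a claim about actual human parsers): a flat scan returns the linearly closest verb 'fly', while a scan that respects the embedded [ [ ] ] bracketing treats the relative clause as a sealed unit and returns the structurally closest verb 'swim'.

    # Flat vs. recursive processing of 'Can eagles that fly swim?'
    # Flat scan: the first verb after the subject wins (wrongly yielding 'fly').
    # Structural scan: verbs sealed inside an embedded clause are invisible
    # to the matrix question (rightly yielding 'swim').

    VERBS = {'fly', 'swim'}

    flat   = ['can', 'eagles', 'that', 'fly', 'swim']
    nested = ['can', 'eagles', ['that', 'fly'], 'swim']  # relative clause embedded

    def first_verb_linear(tokens):
        """Naive adjacency: the first verb encountered left-to-right."""
        return next(t for t in tokens if t in VERBS)

    def first_verb_structural(tokens):
        """Skip embedded (list-valued) constituents; only matrix verbs count."""
        return next(t for t in tokens if not isinstance(t, list) and t in VERBS)

    print(first_verb_linear(flat))        # 'fly'  : the wrong question
    print(first_verb_structural(nested))  # 'swim' : 'Can eagles swim?'

[11] Sentence-2: 'Him falled me down' (1960s child language studies)4,5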

In considering sentence-2, the item we are interested in here is the over-regularization of the verb 'fall' => 'falled' (fell). If we were, again, to take the naive flat assumption that all words are memorized, stored and retrieved as holistic chunks—in other words as [falled]—then the immediate problem surfaces as to where and how the child ever came across such a word, it being unsupported by the input. This very question goes to the heart of what Chomsky referred to as the creativity of language. Berko's6 work on child language quickly saw that such errors in fact proved that the child was working under a rule-based design of language, and that at roughly the point where over-regularizations take place in the child's stages of acquisition, we find that the over-regularizations align with the acquisition of the rule—viz., [[N] + s] for plural, and [[V] + ed] for past—noting that such 'errors based on rules' support recursive structure. Hence, what we have here with such errors is a decomposed item of [[stem] + affix], e.g., [[fall]ed], whereby the two parts of the word must be stored in distinct units or constituencies as found in the morphology (stem, inflectional morphology).

4 web-link no. 19.
5 web-link no. 20.
6 web-link no. 21.
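The dual-route logic behind such over-regularizations can be sketched as follows (a toy model of my own; the names are hypothetical): a stored irregular form blocks the default rule [[V] + ed], so a child who has not yet stored the irregular applies the rule across the board, yielding 'falled'.

    # Toy dual-mechanism past tense: a stored irregular (stem-based lookup,
    # merge-[ ]) blocks the default affixal rule (move-[[ ]], [[V] + ed]);
    # absent a stored irregular, the rule over-applies.

    ADULT_IRREGULARS = {'fall': 'fell', 'go': 'went'}

    def past(verb, irregulars):
        if verb in irregulars:         # lexical chunk: [went], [fell]
            return irregulars[verb]
        return verb + 'ed'             # decomposed rule: [[talk]ed]

    print(past('talk', ADULT_IRREGULARS))  # 'talked' (rule applies)
    print(past('fall', ADULT_IRREGULARS))  # 'fell'   (irregular blocks the rule)
    print(past('fall', {}))                # 'falled' (child: no irregular yet stored)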

[12] Sentence-3a: 'The horse raced past the barn fell'.

Sentence-3a is also known as a 'garden-path' sentence. (See web-link no. 22). (The classic sentence and its first use is attributed to Tom Bever; see web-link no. 23). In such designed constructs, readers are often lured into parsing (processing) the structure of a given sentence in a certain way, and by doing so are actually led

down a wrong syntactic reading of the sentence (i.e., down a 'garden path')—viz., into believing that a grammatical element should follow based on what came prior. In other words, the erroneous assumption is tied to a processing which reads the first verb parsed, 'raced', as a past tense main verb of the matrix subject 'horse', rather than how it should alternatively be processed: as an embedded passive past-participle of a covert embedded clause (The horse—that was raced past the barn—fell). This nice parsing trick shows how the brain seeks to parse and process pieces of syntactic phrase structure in systematic ways, in ways which speak to phrase-structure rules, and in more current theory, X-bar syntax. When the reader first hears and confronts the designed parsing of an initial DP, say 'the horse' (in the above garden-path sentence), the DP immediately gets assigned as subject—this is done in concord with and under syntactic X-bar theory, assuming that the syntax of the given language is SVO (Subject-Verb-Object). Fine, but what this also means is that the following verbal item usually gets assigned as a Tensed verb, which then, due to phrase-structure rules, is determined to be a matrix predicate of the subject. The phrase-structure design would read as follows: S (sentence) → DP, TP… But this reading is false. The first Tensed-Verb item, raced, does not relate to the predicate of the subject, but rather is part of an embedded structure which should be parsed accordingly: (*Note: Let's note here how 'closeness and adjacency' actually do interfere with our mental processing, dealing us a bias towards such garden paths. Perhaps the default is in fact locality over distance—it seems to be a mammalian trade-off that humans so readily associate things that bundle close together. But also note that in sentence-3b (below) we do dispense with locality after all and rather submit to distant/gapped processing).

[13] [S [DP The horse] [CP that was raced past the barn] [fell]]

[TP [DP [DP The horse] [CP [C that] [VoiceP [Voice was] [VP [V raced] [PP past the barn]]]]] [T' T [VP fell] (= predicate)]]

(where VoiceP = Voice Phrase for the passive voice 'was raced').


There is a question of binding & licensing here which closely relates to C-command (see Lecture 3, [7.1], [12]). Although binding and licensing are usually called upon to show anaphor/antecedent relations (as well as polarity expressions), what example [13] shows is that the same types of conditions and constraints which speak to binding & licensing can equally serve us here in explaining how garden-path constructs come to be analyzed. For example, let's slightly extend the garden-path sentence to read 'The horse raced past the barn fell to the ground'. Now what we discover is that the Preposition Phrase (PP 'to the ground') can only be bound & licensed by the verb 'fell', as an adjunct/argument of the verb, and not by the embedded verb 'raced' (e.g., … fell/*raced to the ground). This PP-effect is referred to as a syntactic constraint on garden-path processing, whereby the PP forces a non-garden-path reading. (See structures below).

[14]
i.  [VP [V fall] [PP [P to] [DP the ground]]]
ii. *[VP [V race] [PP [P to] [DP the ground]]]



(where * marks ungrammaticality)

Sentence-3b: 'The boy Bill asked to speak to Mary thinks he is smart' (Bickerton 2010, p. 202).

In sentence-3b, consider how we actually find the opposite effect from that of sentence-3a (the garden-path sentence). In 3a, the closest adjacent verb (raced) as pronounced in the utterance took precedence over a more distant verb (fell); hence the wrong assumption was made that 'the horse raced' rather than 'the horse fell'. Adjacency wins out in processing in such garden-path structures. On the other hand, sentence-3b, when read out loud—as opposed to simply reading it silently, which doesn't give you the wanted effect (recalling that the natural skills, 'speaking' and 'listening', provide the underlying structure, while the artificial, culture-bound skills 'reading' and 'writing' are learned and do not necessarily provide the underlying language structure)—instantly leads to the correct assumption that it is the boy who is doing the thinking rather than Mary, despite the fact that 'Mary thinks' is adjacently placed together, which might otherwise trigger a frequency effect. Consider how sentence-3b must be structured below:

[15] [The boy [Bill asked to speak to *Mary] thinks he is smart].

a. *Mary thinks he is smart.
b. The boy thinks he is smart.



When we read the sentence aloud, we go against the adjacency of 'Mary thinks' and rather, via an instinct-level of processing, we naturally understand that it is 'the boy', further down and far removed in the tree, that is the subject of the verb 'think'. Such types of examples give linguists evidence that native speakers of a language at times (actually quite often) go against surface-frequency or statistical-probability analyses as would be presented in the actual surface data. In other words, native speakers of a language go beyond surface data made available in the input, and rather rely on deep, hidden structures which may not always be evidenced in the surface-level (PF) pronunciation.

[16] [TP [DP [DP The boy] [CP [C that] [TP [DP Bill] [T' [T +past] [VP[+Fin] [V asked] [VP[-Fin] to speak to Mary]]]]]] [T' T [VP thinks he is smart] (= predicate)]]



[17] Sentence-4: 'I wonder what that is ___ up there' (pulled from Galasso 2016).7

David Lightfoot (2006, p. 52) beautifully shows how a recursive-movement analogy of [ [ ] ] is both psychologically and indeed physically captured by the following simple illustration, showing the merge/move sequence as discussed in the Overview. Consider the 'is-what' phrase in the sentence 'I wonder what that is up there'. The base-generated structure first looks something like:

[18] I wonder [ __ [that [VP is what]]] up there.

7 web-link no. 24.

… and where the Wh-object 'what' begins as the object/complement of the verb 'is' and then gets displaced by moving above 'that' in the surface phonology (PF), yielding the derived structure. But if we take a closer look, we see that after such movement of 'what' out of the [VP 'is-what'] phrase, the VP survives only as a head [VP is ø], without its complement 'what'—thus the phrase 'partially projects'. But partial phrase projections are allowed given that their Heads still remain (in situ) within the constituent phrase; hence, we get the licit structure in (a):

a. I wonder [whatj [that [VP is __j ]]] up there?
b. *I wonder [whatj [that'sk [VP __k __j ]]] up there?

But movement has an effect: note how the head 'is' must remain phonologically intact as head of the VP and can't become a (phonological) clitic attached to the adjacent 'that', as in [that's]. In other words, at least one of the two lexical items within a phrase (P) must be pronounced (be projected). Hence, when both items—[is] as well as [what]—move out of the VP ('what' moving into the Spec of a higher P, and [is] moving out of its head (H) position of the P and, forming itself as a clitic, piggy-backing onto the item [that] of the higher P), the result is that the VP becomes vacuous (completely empty) and so the structure cannot survive (it becomes ungrammatical). Move-based *[[that]'s] is the illicit structure found in (b) (the asterisk * marks ungrammaticality), while the Merge-base of the two words [that] [is] is the only licit structure. It seems simultaneous movement of both the head 'is' along with its complement 'what' of the [VP is-what] renders the verb phrase vacuous [VP ø] (i.e., phrases can't be both headless and complementless). In this sense, MOVE-based *[[that]'s] is barred and only the Merge-base of the two items [that] [is] is allowed to project—the former (move) being affixal in nature, the latter (merge) lexical. This 'merge vs. move' treatment is similar to what we find with the distinction between (merge-based) Derivational vs. (move-based) Inflectional

morphology, where the former is an affix process, and where the latter is a word-forming process.

[19] Progression of structure:

(a) 'is-what' = VP (Verb Phrase): [VP [V is] [Obj what]]
    (When the object 'what' moves up, the Head V is left intact, still allowing a licit projection of VP.)

(b) [XP [Y what] [X' [X that] [VP [V is] [Obj what]]]]
    (The VP head is filled with V 'is', so VP projects; but note how 'is' must remain as a full word and not as a clitic.)

(c) *[XP [Y what] [X' [X that's] [VP [V is] [Obj what]]]]
    (When V 'is' is reduced to the clitic ['s], the VP becomes vacuous (i.e., both V and Comp are empty) and so the VP can't project.)



In the example above, both words have now moved: (a) 'what' up to a higher position (Spec of XP), and (b) 'is' up to the adjacent word 'that' (say, as Head X of the higher phrasal projection XP).
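The 'vacuous VP' condition at work in (a) through (c) reduces to a one-line check, sketched below (an informal rendering of my own, not a formal analysis): a phrase survives only if at least one of its two positions, Head or Complement, is still pronounced.

    # Toy check of the vacuous-phrase constraint: a phrase may lose its Head
    # OR its Complement to movement, but never both.

    def projects(head_pronounced, comp_pronounced):
        """A phrase survives iff at least one of its members is still pronounced."""
        return head_pronounced or comp_pronounced

    # (b) I wonder [what_j [that [VP is __j]]] up there?
    #     'what' moves out, but the Head 'is' stays in situ.
    print(projects(head_pronounced=True, comp_pronounced=False))   # True  (licit)

    # (c) *I wonder [what_j [that's_k [VP __k __j]]] up there?
    #     'what' moves AND 'is' cliticizes onto 'that': the VP is empty.
    print(projects(head_pronounced=False, comp_pronounced=False))  # False (illicit)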

6

The Myth of 'Function Defines Form' as the Null-Biological Adaptive Process and the Counter Linguistics-Based Response. (The 'Accumulative Lecture')

This accumulative lecture serves as a springboard for discussion leading to data-collection and analyses of the types of linguistic corpora which demonstrate the fact that language, in the most 'narrow sense' of the term—viz., as a phonological/syntactic categorial representation buttressed by and resting upon recursive design—seems to defy all common-sense adaptive notions of the type championed by Darwin. Of course, Darwin got it right! There is no other theory. But his theory was not designed to handle what Stephen Jay Gould terms 'punctuated equilibrium'—a phenomenon which does not at all abide by the otherwise bottom-up, environmentally-determined pressures of the sort Darwin spoke of. Well-accepted terms of the day such as 'adaptation', 'evolution', and 'biological pressure' would soon become replaced by 'exaptation', 'skyhook' (a top-down processing as opposed to a bottom-up 'crane'),1 and 'non-evolutionary' accounts (of the sort


Noam Chomsky would refer to as 'hopeful monster').2 The insights here pointed in a new direction which showed that language, in its most 'narrow sense', simply didn't abide by the same rules and principles as found in the Darwinian world of evolution. But, on a more general footing, there may be some evolution left to language after all, as found in the more communicative 'broad' sense. It's just the case that there is increasingly very little that a Darwinian theory can approach and handle within its parameters regarding the narrow scope of language, defined as a sole instrument of 'recursion'.3 Exaptation is a trait which can evolve for one function but then become hijacked for another. Even this notion of exaptation would become challenged by 'punctuated equilibrium' (something bordering on a hopeful monster). Claims of language/speech in such a capacity began to challenge the most common of notions related to how things get acquired, learned and processed. It would certainly defy the radical behaviorists' hypotheses that all of learning takes place within a singular crucible—a common melting-pot intuition that all belongs to the mechanical world of clocks, language just being another sort of clock (with gears and levers, not unlike the 'brain-as-computer' metaphor which would later be discredited). This lecture presents the idea that the generally accepted Darwinian adaptive notion that 'function defines form' is not completely accurate and, in most cases, is simply wrong—at least for language as defined in its narrow scope. Conversely, what we show is that for speech & language, 'form defines function'. We shall use this analogy as a simple pedagogical device in order to reveal some interesting phenomena found in language. Indeed, 'speech is special'.4

1 See Daniel Dennett's Darwin's Dangerous Idea (1995) for top-down 'Skyhooks' vs. bottom-up 'Cranes' analogies. The term 'Exaptation' was first used by Gould and Vrba (1981) for the idea that traits are understood as artifacts which themselves can be hijacked and recycled with new functions, like spandrel-artwork found in churches: these corner-ceiling spaces were not intentionally designed to hold art, but rather were a byproduct of architectural design, like the aesthetic beauty behind flying buttresses, which were built for structural integrity and not specifically for expressive beauty. For the Spandrel argument, see Gould & Lewontin (1979) 'The spandrels of San Marco and the Panglossian paradigm'.
2 A term first used by Goldschmidt, a German-born American geneticist—a term which suggested that gradual, evolutionary pressures could not bridge the gap between micro- and macro-evolution; that some other phenomenon outside of gradual feature displacement and change had to be involved. Until today, a phenomenon completely unexplainable.
3 See Fitch et al. (2005) 'The evolution of the language faculty'.
4 See (Haskins Laboratory) Philip Lieberman's 'speech is special' hypothesis: https://www.the-scientist.com/features/why-human-speech-is-special--64351.


Introduction

The most striking property of human language (a property which indeed heralds language as a quite unique 'species-specific' entity) is perhaps best expressed as that 'human kernel of deepest predispositions'—that which freely comes to us as an innate structure, bewildering to the keenest observer, which lies hidden in the deepest recesses of the human mind, but without which no human activity as we know it today could be performed. This kernel is a property which defies all 'common-sense' environmental and empirical notions. For instance, take the very claim made when dealing with human speech: the question 'How could it be possible that two people might hear the same sound differently?' Clearly, one would think, the mere act of listening, which all humans can (mechanically) perform if they are aware and sensitive to their surroundings, should deliver the surroundings to everyone in the same way. But indeed, this is not so with language, and not so with speech: two people may in fact hear the same sound differently, or process grammatical constructs differently. It seems 'Speech is Special'. 'Language is special'. So, how can this be? A rather different account is needed—one that explains this disconnect between subjective illusion and objective reality. Indeed, language is something like an illusion (not a clock but a cloud, using our 'Clouds & Clocks' metaphor from Lecture 1) since it is category-based—and the ability to construct abstract categories (that quintessential human property) is the ability to generate recursion. This notion is, to the untrained ear, indeed strange and seems even absurd when pitted against all common-sense thinking. It is an idea that posits (i) internal language-specific parameters which (ii) innately emerge from out of a developmental-maturational trajectory. Both of these points (innate structure untethered to the environment, and maturation thereof) are not what one would expect from an associative & strengthening reflex of the type reliant on frequency of input/activity. While the distinction may be subtle, it certainly renders non-trivial results. This lecture serves as a research prompt-option and allows students to develop spin-off proposals leading to final exams/research projects. The prompt is wide-ranging in scope and can cover any aspect of what would typically be presented in any introductory/undergraduate linguistics course. Aspects which fall within the scope of this prompt include:

(i) Structure, (ii) Phonology, (iii) Syntax.


The 'baseball-glove' analogy

In discussing 'function & form' analogies (typically as they play out in biology), I like to use the 'baseball-glove' analogy. In a perfect Darwinian world, the baseball-glove analogy should extend to all of the natural world. But it does not. Let's play this out below (fun pun!). Let's first consider the catcher's glove: an extremely heavy, clumsily designed glove. Why? Well, the glove has adapted & evolved over the years (a Darwinian explanation) in order to keep up with increasingly powerful pitchers who, at times, can consecutively throw the pitched ball at speeds well exceeding 100 miles an hour. For that reason, added pressures forced adaptive measures in order to secure functional success—viz., to protect the catcher from hand damage. In this manner, indeed, function (consecutive fast-ball catching) helps shape and define the form (a bulky, high-padded catcher's glove). Catcher's gloves: they are a wonder to handle. I have one myself. But a catcher's glove is not my choice, say, if I need to play outfield, where I would be running with that heavy glove all over center field. So, consider now the outfielder's glove: svelte, slim, lightweight … it's very easy to run with such a glove. Of course, all that bulky padding a catcher's glove has doesn't serve a function in the outfield, where fly-balls are caught high in the air, with only a falling velocity. This is a perfect biological adaptive characteristic (something Darwin would espouse, say, as in the evolved shape of a finch's beak).5 To further the analogy, let's look at the first-baseman's glove. Notice how the glove is over-extended by a few inches, much longer in reach than the outfielder's glove. Why so? Well, consider the environmental role and function of the first-base position. Most of the time, the function of the first-baseman is to catch a ball first fielded in the infield (by the second-baseman, shortstop, or third-baseman), whereby the batter is running to outpace the thrown ball to first base. It is a race between ball and batsman! Any extra few inches of netting in such a glove, with the sole purpose of catching the ball as early as possible, would provide a much-needed benefit, an 'advantage' to its function. Once again, the function shapes the form: it defines it. (The same analogy could easily be extended to golf, and the types of clubs needed to strike the ball for different distances and circumstances—but I confess, I know absolutely nothing about golf).

5 See https://news.harvard.edu/gazette/story/2006/07/how-darwins-finches-got-their-beaks/.

This is the story one usually employs for any adaptive (Darwinian) mode of evolution. Those linguists who espouse a functional theory of language (functionalism) are likeminded in their treatment of the form of language, viz., form serves a functional niche. Yet, as we will see below regarding structure, phonology, and syntax, this 'function defines form' application doesn't subsume certain narrow properties of language.

Given our baseball-glove analogy, the reverse order of 'form defines function' (as found in language, or 'internal to external') would have it that, based on the glove one is wearing, that (internal) glove would define the nature of the (external) ball thrown to it. (Extending the analogy via fruit: if you have only a grape catcher's glove, and, say, I throw you an orange, well, the orange would change midstream into a grape once it is caught. The form shapes the function. Very strange indeed! Even spooky! (Clouds can be spooky). But this in fact seems to be what happens with certain quirky instances of language). This is the topic of the lecture, and it can be extended to cover a wide range of other linguistic data which speak to the reverse order of form defines function.

In order to see how the reverse order of 'internal/form defines external/function' gets applied in narrow realms of language, and how it holds contrary to common-sense notions of 'function defines form', we look to the three aforementioned topics.

Structure

Regarding structure, the most obvious aspect to examine is what role frequency plays in determining whether one structure is intuitively preferable or grammatically acceptable over another (recall, 'frequency' is assigned to 'clocks', and the more clocks we have, you say, the more accurately we keep time. So we think!). For instance, in any Darwinian explanation of adaptation (a 'function defines form' explanation), one would expect that the more a speaker hears or uses a specific construct, the stronger its reinforced usage shapes and maintains the acceptability of the structure: usage reigns king in such a functional, adaptive theory. However, after taking only a cursory glance at what actually takes place in language, we see that high levels of 'frequency-usage' (as found in the input) in no way guarantee the same, parallel high level of 'acceptable-usage'; viz., form doesn't seem to follow function. This is often discussed within syntax seminars, which show how high-frequency phrases such as [that's], as in [[that's] nice], [[that's] good], [[that's] John], don't facilitate the parallel phrase *'I wonder what [that's] up there', where the full form [that is] is rather required ('I wonder what that is up there') (where * marks an illicit structure).6 Strange, yes? Since [that's] is so often heard.


An even more obvious and ubiquitous example deals with child structure: the highest-frequency word found in the young English-speaking child's input is the word 'the' (in fact, it is marked as the highest-frequency word in the English language); yet, in child language acquisition, the word 'the' is one of the last words acquired (typically a stage-2 acquisition, at around 30-36 months of age).

And perhaps the nicest example comes to us via phonology (specifically, the internal syllabic template /ȿ/). With syllabic templates (something that resides in the mind of a speaker, as a parameter, innate and prone to maturation), we see overwhelming data which support 'form defines function'. Imagine a young child at a given stage where she can only generate a template that captures Consonant and Vowel sequences, a CV-stage. Well, if the external/function input provided exceeds this internal/form CV template, some sound would have to be sacrificed. Form defines function! For example, let's consider what a syllabic template might look like (very informally), and see if we can briefly show some data which would be accounted for and mapped by such a template.

Syllabic Template (see §6.3, 'Note on Syntactic Tree', for the similarity between syllabic and syntactic tree-templates: both are recursive):7

6 For 'Progression of structure', see §[3] found in the link to the paper: https://www.academia.edu/42204713/Notes_Reflections_on_Syntax_Note_1._A_Note_on_the_Dual_Mechanism_Model_Language_acquisition_vs._learning_and_the_Bell-shape_curve.
7 See footnote 13 (and link to paper) for the 'recursive nature' of syllabic/syntactic tree-templates.

(i) CV-stage: at this stage only CV utterances surface.

          ȿ
        /   \
    onset   rime
      C       V
     /k       æ/     = 'cat' /kæ/ (mama, look a kæ: a /kiki kæ/)

(ii) CVC-stage: at this stage only CVC utterances surface.

          ȿ
        /   \
    onset   rime
      C    /    \
     /k/ nucleus  coda
            V       C
           /æ/     /t/   = 'cat' full CVC with final /t/ coda.

When looking at early child speech patterns, developmental linguists find a robust stage of utterances which defies and goes against the parental input. Young children at stage-1 say /kæ/ for cat /kæt/, whereby they drop the final consonant /t/. At a slightly later stage, the child is then able to say cat as CVC /kæt/, etc. The ubiquitous examples of /ki:ki/ for 'kitty' also suggest a geminated/duplicated CV:CV-template stage (where both vowels and consonants are mere duplicate copies of one another). Such gemination is notated via indexing as follows: [CiVj: CiVj]. (For phonemic/syllabic stages, see §§6.1-6.2 and 6.7.1 below. For the syntactic template, see §6.3).

As an extension, imagine a child caught in the CVC-stage (only one initial consonant) trying to grapple with the production of CCVC 'school' /skul/ (an initial consonant cluster). Surely, one of the two initial consonants would have to be sacrificed: which one? For that matter, we have to turn to phonemic development (see §§6.1, 6.7.1 below). Since plosive /k/ is acquired before fricative /s/, the /k/ wins the musical-chairs competition and gets seated within the only C-seat available. Applying 'common-sense' values to which phoneme should lose out in such a competition, one might expect that the initial letter/phoneme should always be self-preserving, since it formulates the very beginning of the external word (the word's 'look-up' hook, so to speak). But for the example 'school', it is the first sound /s/ which deletes. In fact, the right way to think about such an 'external' race model is in terms of the number of 'internal' chairs/syllabic slots available. This is most certainly a 'form defines function' sort of game.
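To make the template idea concrete, here is a minimal sketch in Python (my own illustration, not part of the theory's formal machinery): the child's internal template filters the external input, and any segment with no open slot is simply sacrificed. The stage names and the sample word are simplifying assumptions for exposition only.

TEMPLATES = {
    "CV-stage":  ["C", "V"],        # stage-1: onset + nucleus only
    "CVC-stage": ["C", "V", "C"],   # stage-2: a coda slot becomes available
}

def produce(segments, stage):
    """Map an input word (a list of (phone, 'C'/'V') pairs) onto the
    child's internal template; segments with no open slot are dropped."""
    output = []
    slots = iter(TEMPLATES[stage])
    pending = next(slots, None)
    for phone, kind in segments:
        if pending is None:        # template exhausted: the rest is sacrificed
            break
        if kind == pending:        # the segment fits the next open slot
            output.append(phone)
            pending = next(slots, None)
        # a non-fitting segment is skipped: form defines function
    return "".join(output)

cat = [("k", "C"), ("æ", "V"), ("t", "C")]
print(produce(cat, "CV-stage"))    # -> 'kæ'  (the final /t/ is dropped)
print(produce(cat, "CVC-stage"))   # -> 'kæt' (the full CVC now surfaces)

Note that which member of a CC-cluster survives (the /k/ of 'school', not the /s/) is decided not by linear order but by phonemic stage ranking; a companion sketch accompanies the syllabic-development exercise later in this lecture.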


Hence, in sum, there does seem to be quite a disconnect, at least in Darwinian adaptive terms, between usage-'function' (external frequency of ...) and 'form' (internal representation of ...). This notion strikes at the heart of the null-biological adaptive assertion that 'function defines form'. Indeed, while such null-hypothesis assertions do normally fall out from biologically determined systems, they just don't seem to do so for language. While language is seen as being steadfastly embedded within the biological system, a Darwinian pursuit (e.g., the so-called 'biological basis for language'), there does seem to be a very narrow range of language properties which don't fall within the Darwinian scope of bottom-up adaptation. This very narrow range has everything to do with recursive syntax (recursive structure), a phenomenon of structure which rather seems to have non-Darwinian 'top-down' properties.8

Phonology

Regarding phonology, the problem with a bottom-up Darwinian assertion that speech proceeds 'from external environment to internal representation' is that it just doesn't seem to work that way around. Rather, as can be seen in data showing second-language speech errors (so-called L1-transfer, or L1 interference), speech is not simply a reinforced act or rehearsal of mapping speech output (external function) to speech representation (internal form). Let's examine two data-sets below (child language speech/phonology (L1), and second language speech (L2)):

Child Phonology (L1)

It often comes as a surprise (but seldom to parents of young children) that young children can't simply repeat certain phonological strings (say, 'spaghetti') when asked to do so. What one finds is that the input 'spaghetti' typically gets mangled and pronounced in the output as 'bazgedi'.9 This inability to match input to output, what I call 'the child's inability to be conservative', suggests, theoretically, that the internal form of the child has been somehow outpaced, mismatched by the external complexity of the input. Hence, the input-output mismatch. This is a classic example of how children's staged development of syllabic complexity matures over time: the simple CVC (consonant-vowel-consonant) sequences available in the child's form subsequently affect the function; viz., what is a CCVCVCV pattern /spəgƐti/ turns into a CVC-CVCV pattern /bəz-gƐdi/.

8 See Hauser, Chomsky, Fitch (2002). http://www.public.asu.edu/~gelderen/Hauser2002.pdf.
9 For phonemic/syllabic stages of child speech development, see https://www.csun.edu/~galasso/Ling417LectureIExamReviewChapteronPhonology.pdf (§13.3, ex. 36).

This is a form-to-function compromise, given that children, at certain stages of syllabic development, can't produce initial consonant clusters (CC___). This is not unlike our example of an orange turning into a grape, or a 100mph fast-ball pitch turning into a slow-descending high-fly, as determined by the nature of the glove being used in the catch. (Funny!).

Perhaps the most infamous example is found in the production of the phoneme /r/ (à la Elmer Fudd, e.g., 'I'm gonna kill that crazy wabbit' (rabbit)). Given too that young children famously pass through stages of phonemic development, /r/-initial words (as found in the input) get produced with /w/, since the phoneme /w/ (a bilabial) is acquired earlier than the rhotic /r/ (/w/ is stage-1, /r/ stage-2). So again, the input might not necessarily map onto the output, whereby roller-skate (noting the CC-cluster in 'skate') rather turns into 'wowo-kate'.

This is a clear example of what is sometimes referred to in philosophy as the 'Ghost in the machine'. The order of the day in Aristotelian philosophy saw the world as a gathering of external forces made precise in our nature-to-senses mapping (a pure empirical enterprise). Even Newton comes down on such function-to-form, at least until he realizes that it doesn't entirely work, i.e., that bodies can't just be extended features which behave in classical clock-like formation, but rather that forces, 'action at a distance', mysteriously are involved. This idea of hidden internal operations which would influence external objects was seen as a 'Great Absurdity'. Of course, Newton had no physical explanation for this, but he would eventually have to conclude that some internal ghost to it did exist (e.g., gravity, or 'action at a distance'). In so doing, Newton would eventually exorcise our classical notion of body and leave the (quantum-mechanic) ghost intact. (See Chomsky 2002, Chapter 2 for discussion).

Second Language Phonology (L2)

Second language (L2) speech shows similar, if not identical, errors. Of course, the speaker of an L2 already has at her disposal the whole phonemic inventory of her L1 speech. So, while syllabic developmental problems, or even phonemic developmental problems, may not show up, other types of errors related to so-called L1-transfer/interference might.

For instance, let's take syllabic considerations first. Imagine a language (such as Japanese) which has a very low to zero form allowance/tolerance for CC-clusters in initial position (while such clusters may show up in medial position, if other stress constraints are followed).


The Japanese L2 production of 'love story' often comes out as 'loba sutori'. This is an interesting case of so-called L1-transfer, or L1 interference. There are two aspects which we can address here (a small illustrative sketch follows the two points):



(i) First, the Japanese language may reduce an otherwise English CC-cluster /st/, as found in the word 'story', and rather produce it as 'sutori'. This makes sense only if we keep to a 'form defines function' scenario. Specifically, since Japanese doesn't have CC-initial internal form representations, the input has to be modified in order to match the internal mapping: hence CC becomes CVC, whereby the cluster /st/ gets torn apart by the insertion of a vowel /u/ between /s/ and /t/: /su|to|ri/.

(ii) Second, one other point of interest is the noted replacement of the phoneme /v/. Japanese doesn't represent this phoneme in its speech inventory. The closest phoneme would be /b/. In other words, the Japanese 'baseball glove' for /v/ doesn't exist. Hence, speakers can only 'catch' an incoming /v/-pitch with the /b/-glove. This is the perfect example of form defines function: the internal form/representation defies the input and changes it to match. (Similar problems arise with Arabic, where the phonemic form inventory doesn't allow for a voiceless /p/; e.g., 'police' or 'palm tree' get pronounced as 'bolice' and 'balm tree').
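Here is a rough Python sketch of the two repair rules just described (an illustration only, not a model of Japanese phonology): the L1 inventory acts as the 'glove' that reshapes the incoming L2 sound. The substitution table and the fixed epenthetic vowel /u/ are simplifying assumptions.

SUBSTITUTIONS = {"v": "b"}   # rule (ii): no /v/-glove, so catch it with /b/

def repair(phones):
    """phones: a list of (phoneme, 'C'/'V') pairs; returns the L1-shaped output."""
    out = []
    for i, (p, kind) in enumerate(phones):
        out.append(SUBSTITUTIONS.get(p, p))        # phonemic substitution
        nxt = phones[i + 1] if i + 1 < len(phones) else None
        if kind == "C" and nxt is not None and nxt[1] == "C":
            out.append("u")                        # rule (i): break CC with epenthetic /u/
    return "".join(out)

story = [("s", "C"), ("t", "C"), ("o", "V"), ("r", "C"), ("i", "V")]
print(repair(story))   # -> 'sutori' (the /st/ cluster torn apart by the inserted /u/)

(The fuller 'loba sutori' example also involves vowel epenthesis after a word-final consonant, omitted here for brevity.)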

As these examples show, both syllabic and phonemic considerations of form enter into the equation of whether or not the function of a speech sound can be copied by an L2 speaker.

Other instances of phonemic considerations can account for why Spanish L1 speakers speaking L2 English don't assimilate /s/ to /z/, as in the example 'cars', where English L1 speakers assimilate the voicing property of the adjacent /r/ across to the next sound /s/, changing */kars/ to /karz/. This is a subtle but quite interesting observation. The fact of the matter is that Spanish simply doesn't have, as an L1 form, the phoneme representation /z/. Spanish speakers might have the spelling 'z', but they don't have the sound /z/. (They simply don't have the /z/-glove). Lopez sounds like Lopes; zero like sero; zebra like sebra, etc. Furthermore, the 'sh' sound (/š/) doesn't exist in Spanish. Its closest catcher's glove would be the 'ch' /č/ sound when in initial position (/s/ when in final position, as in 'Inglis' for English). In Spanish hands, the external sound/function of the word 'shower' turns into the internal representational sound/form 'chower': e.g., 'first I'll take a chower and then we'll go chopping for choes'.

One can imagine, given these observations, many such L1-transfer problems showing up in the early school years in bilingual Spanish-English education. (L1-transfer could even undermine English L2 spelling: an L1 Spanish child spelling 'shoe' as 'choe', etc.).

In sum, phonology offers us an intriguing window into the contrary 'form defines function' analysis. One might rather think the most obvious environmental aspect of language is that of sound/phonology, and that the once-assumed (radical) behaviorist and naïve 'tape-recorder' theory of language would suggest all a speaker need do is observe, listen, really listen hard! to one's surroundings, and the correct picture would emerge. This is the erroneous Aristotelian tradition, debunked as early as Galileo and Descartes, and as recently as Chomsky. Chomsky, in this sense that indeed (innate) form defines function, is in clear accordance with Plato (vs. Aristotle), Descartes (vs. John Locke), and Galileo, and famously against B.F. Skinner.

Syntax (Child Grammar)

The examples of how form defines function I'd like to offer here regard the syntax of so-called functional categories, such as INFLectional {s} for plural, Possessive {'s}, and present AGREEment/Tense {s} (a lot of {s}'s in English morphosyntax), but likewise what happens to Case (I vs. Me) and auxiliary verbs (do, be, have), etc. Let's take each in turn: a-Plural, b-Tense, and c-Possessive {s}.10

a. The omission of plural {s} is ubiquitous in the child data: e.g., two car, two spoon, more cookie ...
b. Daddy drive car (where {s} in 'drives' is missing). Mommy cook pasta. Him do it.
c. That daddy car (daddy's car). Mommy sock. Where boy bike? Why you cry?11

10 See Radford & Galasso (1998) for some data on the development of 'form to function' having to do with possessive structures. https://www.csun.edu/~galasso/arjg.pdf
11 Notice Aux 'do' omission in 'Why _ you cry?' (Why do you cry?) but its realization as a main verb in 'Him do it'. The contrast between the two can't be phonological (since the two are identical in sound); rather, the distinction is between [lexical] vs. [functional].


INFLection

In sum: if we look to the inner-template form of inflection, i.e., elements of language such as Tense, Agreement markers, Possessives, etc., we find that not only are such elements categorical/abstract in nature, but they also occupy a particular inner-template position in the morphosyntactic template/structure; viz., they occupy the 'edge' position. This can be shown by the use of syntactic brackets [ ] (a variant notation of tree diagrams). Consider below:

(i) Tense: [[drive]s] ('Daddy drives')
(ii) Agreement (number): two [[car]s]
(iii) Possessive: [[boy]'s] ('boy's bike')

[[stem] affix]: what we find of INFL affixes is that this 'edge slot' [[ ] edge] corresponds to Broca's area of the brain (found in the front left hemisphere). See the literature on the Dual Mechanism Model for the consequences of Broca's area maturational development and the delayed onset of young child INFL.
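As a quick illustration of the notation (a toy sketch of my own, assuming only the bracketings in (i)-(iii) above), structures can be modeled as nested (stem, affix) pairs in which the functional affix always sits at the outer edge:

def bracket(node):
    """Render a nested (stem, affix) pair as [[stem]affix]; a bare
    lexical stem renders as [stem]. The affix always sits at the edge."""
    if isinstance(node, tuple):
        stem, affix = node
        return f"[{bracket(stem)}{affix}]"
    return f"[{node}]"

print(bracket(("drive", "s")))   # -> [[drive]s]   (Tense)
print(bracket(("car", "s")))     # -> [[car]s]     (Agreement/number)
print(bracket(("boy", "'s")))    # -> [[boy]'s]    (Possessive)

Because the pairs nest, further functional layers simply stack at the edge, which is the recursive property noted in the tree diagrams below.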

*Note on Syntactic Tree. Consider below how syntactic trees resemble syllabic trees (both being recursive in nature):12

         TP
        /  \
       T    VP
       s   /  \
          V    N
        speak French

She [TP s [VP speak]s] French. = [[speak]s] French



Likewise, functional words such as 'the' and the Aux verbs ('do-be-have') would similarly occupy edge-related positions, being categorical/abstract functional items.

12 See the link below, §[27] p. 10 and §[31] p. 13, for how phonology takes on recursive properties (trees): https://www.academia.edu/42273106/Reflections_on_Syntax_From_Merge_items_to_Sets_categories_._Movement-based_Theoretical_Applications_Morphology_down_to_Phonology

These examples are classic in the early child syntax literature. They represent the mismatch between input and output. Clearly, no parent speaks to their young children in this manner. Where is the evidence for such utterances coming from, if not from the input? Well, on Chomsky's theoretical position, it's coming to the child as residual approximate structure, albeit incomplete, via the only internal form of a template-structure the child has to utilize at the given time of syntactic development (what is typically referred to as a lexical stage-1, which precedes a functional stage-2). Other examples, such as Aux(iliary) omission, come in the form of the utterances below:

Aux omission.
What _ you want? (where _ indicates the place where the Aux-verb 'do' should insert)
What _ daddy doing? (_ omission of 'is')

But while early lexical stage-1 utterances delete auxiliary 'do', they have little problem with the main verb 'do': e.g., Mommy, him do it. Me do it, etc. Only the Aux 'do' gets deleted. What this interesting contrast tells us is that phonology plays no role here (since both words 'do' sound the same). Rather, it is about the syntax. While the early child may have a 'main-verb-do-glove' to catch incoming main verbs, as in 'I do it', etc., she doesn't yet have the auxiliary 'do'-glove, as needed for the first 'do' in the utterance 'What do you do?' This certainly goes against a common-sense 'function defines form' analysis since, given that the child can say 'do', one might suspect that all bets are off and the child should be able to say 'do' across the board, 'do' now being perceived in the input/environment. But this simply is not what happens: there is a clear demarcation in form having to do with main verb 'do' vs. auxiliary verb 'do'. Notice how in the expression 'How do you do?' one can easily delete, in spontaneous speech, the former Aux 'do', but never the latter main verb 'do': e.g., 'How _ you do?' vs. *'How do you _?' These are the types of data worth considering as linguists grapple over the form-function debate.

Our final example comes to us via the most frequently heard word in the English language, as mentioned earlier in this lecture: the word 'the'. Given that frequency reigns king in evolutionary circles (the flip side of the axiom 'if you don't use it, you lose it!'), it becomes quite telling that young children (at the lexical stage-1) don't produce this word (examples below):


a. Daddy look, __ car! (where __ indicates omission of 'the')
b. What __ boy doing?
c. Me want __ bike. Etc. (In ex. b, __ indicates double deletion of 'is the').

Clearly, one would think the bombardment of the word 'the' should have some effect on the child's functional use of it: 'function defines form'. It's common sense. But it doesn't happen. Rather, the child must await her internal form/FL in order to be able to handle/perceive the semantically vacuous and rather abstract nature of such functional categories before the function becomes realized. For the child, the form shapes the function; the form defines the function. Such waiting is the 'maturational' story of language development.

Instances When 'Function' Does Define 'Form': Environmental Factors, Dynamics Due to Time, Space, Informalism

Having looked at the above data, which exhibit an internal-to-external model (a 'function as defined by form' theory), there do exist plenty of occasions where language seems to give way to environmental factors from the outside. Real Darwin at work. These factors typically come in the manner of time, space, and informality (just to name a few), all second-level environmental factors: so-called peripheral pragmatics.

Time Constraints. Regarding 'time', the most obvious examples we can examine have to do with abbreviation, so-called 'wanna contractions', diary drop, and gapping. Let's take each in turn.

Abbreviations certainly take into consideration both the time and the space constraints imposed by the environment. For example, my name [Joseph], in its full form, is not only bi-syllabic but carries quite a nice spelling convention (such as the 'ph' for the phoneme /f/). The abbreviated form 'Joe' (or the endearing 'Joey'), whether due to informality, rapidity, and/or to facilitate ease of pronunciation, certainly comes to us via the bearing of environmental pressures. As an exercise, think of all the abbreviations we use in our daily speech.

Language & technology: this takes us to a topic which many first-year linguistics students might want to examine, the role of technology in language.

Examples of how external/environmental factors of tech affect language can readily be seen in texting and other abbreviated forms of language, such as the use of picture-language (a modern form of the ancient pictograph) called emoji. Consider emoji, where language (speech, syntax, structure) is completely missing and where iconic representations take on a 1-to-1 picture-to-meaning (or emotion) relation. This is an iconic representation as found in still photography. One very nice topic of inquiry might be to see how language can be, or has already been, shaped by texting. Examples of abbreviated SMS text include acronyms/initialisms (examples provided by my tech-savvy son Nicolas below):

Examples of initialisms: LOL (laugh out loud), SMH (shaking my head), GG (good game), GTG (got to go), THX (thanks), LMAO (laughing my ass off), BRB (be right back), TTYL (talk to you later), *CYA (see ya)13

These initialisms must be pronounced letter by letter and not read as whole words. Other, more common examples we find in society include acronyms, which are read as whole words, such as NASA (pronounced /næsə/) and WHO (World Health Organization): when read as a word, 'WHO' /hu/ is an acronym; when read as individual letters, W-H-O, it is an initialism. Other examples of initialisms include ETA (Estimated Time of Arrival), GMO (Genetically Modified Organism), FBI, and CIA; NATO is an initialism when read letter-by-letter, an acronym when read as a whole word. SWAK (sealed with a kiss) was popular during WWI, with soldiers sending letters back home to their families and loved ones. My favorite is XXX, OOO (hugs and kisses), which is a different notation altogether, more symbolic in nature (with iconic hints to meaning), such that X = hug (a crossing of arms) and O = kiss (a pictograph suggesting the rounded mouth that delivers a kiss). The X was common in the middle ages meaning 'kiss', to show faith and honesty. The X may in fact go back to the symbolic meaning of Jesus Christ: to cross is to anoint; Christ is the anointed one, the one who is crossed. (Also, consider 'Christ on the Cross'; there is a double significance here). On a lighter note, consider Chris(t) (X), as in 'criss-cross applesauce' (a way of sitting among children with legs crossed).

13 *(Note the reduced/​abbreviated spelling of ‘you’ as ‘ya’ (which would fall into our ‘informalism’ category)).


'Wanna Contraction' (a special note). The so-called 'wanna contraction', like all contractions, such as it's (= it is), don't (do not), ain't (am not; *notice the phonological shift in 'ain't' vs. 'am not', typically found with irregular formations), provides another instance where rapidity, facilitation, and/or ease of speech due to brevity restrict the full form of words. This is another example of how environmental factors can impact internal language. But recall that whenever we say 'wanna', as in 'I wanna go home', we tacitly know that what we are really saying, despite the (external) 'wanna' speech reduction, is the (internal) infinitive 'want to': 'I want to go home'. Wanna contractions are products of reduced speech, as found on the objective side of language. There is a hidden subjective, deep and covert silent structure which underlies the overt production. Let's consider this in the note below:

A special syntactic note on 'wanna contractions'. Consider the two question forms (A, B) below:

A. Who do you want to help?
(1) Who do you wanna help? (OK: 'wanna' contraction)
(2) You do want to help who? (base underlying structure, before movement)
(3) [Who]ii [do]i you __i want to help __ii?
 (i) showing Aux inversion of 'do' (from the post-subject position)
 (ii) showing Wh-movement of 'who' (from the sentence-final position)

B. Who do you want to help you?
(1) *Who do you wanna help you? (NOT OK: 'wanna' contraction)
(2) You do want who to help you? (base structure before movement)
(3) [Who]ii [do]i you __i want __ii to help you?

*(Such irregulars with a phonological shift must be stored as separate entities, unlike regular formations ('dream > dreamed', with no sound shift). Irregulars become different words in terms of storage & retrieval (e.g., keep > kept, dream > dreamt, bath > bathe, write > wrote, etc.). This distinction is addressed in the Dual Mechanism Model).

Notice how, in this special syntax note, the 'wanna contraction' is licit in A1 while the form in B1 is illicit. Why might this be? Well, as in white-collar crime, when the investigators say 'Follow the money!', what we must do for this syntactic crime is 'follow the movement'. So, let's follow the movement (as found in (i), (ii)) and see where and how 'wanna' gets blocked in (B).

If you examine the movement in (A), the underlying original syntactic slot of 'who' doesn't intervene between the 'want' and 'to' elements ('who' originates at the end of the sentence). In other words, 'want' and 'to' remain an adjacent string (as 'beads on a string') in the surface phonology. In this manner, due to deep-structure adjacency, the two elements can join together in surface speech and form a single 'wanna' (= want to). Note, however, how this is not the case in example (B). In (B), we see in the deep/base structure that the elements 'want' and 'to' are in fact split up by the intervening element 'who' (which has since been moved). So, a strange phenomenon takes place: even though a word has moved out of its base position, some kind of syntactic trace (or empty category) has been left behind, thus syntactically blocking the 'wanna' (want-to) formation. Look closely at what we have in (B): want [who] to. It's the moved 'who' which blocks the contraction; viz., in (B), 'want' is no longer adjacent to 'to'. Empty categories are interesting; they hold a psycholinguistic effect on us. I'd say: 'There's a ghost in the machine'!!!
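The blocking effect can be stated almost mechanically. Below is an illustrative Python sketch (not a parser, and not the author's formalism): we write out the base, pre-movement order and ask whether 'want' and 'to' are string-adjacent there; if the moved 'who' (hence its trace) intervenes, 'wanna' is blocked.

def wanna_ok(base):
    """base: tokens in base (pre-movement) order, the wh-word written in situ.
    'wanna' is licensed only if 'to' immediately follows 'want' in the base."""
    for i, tok in enumerate(base):
        if tok == "want":
            return i + 1 < len(base) and base[i + 1] == "to"
    return False

# A: base 'You do want to help who?' -> 'Who do you wanna help?'
print(wanna_ok(["you", "do", "want", "to", "help", "who"]))         # True (licit)

# B: base 'You do want who to help you?' -> *'Who do you wanna help you?'
print(wanna_ok(["you", "do", "want", "who", "to", "help", "you"]))  # False (blocked)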

Telegraphic Speech, Diary Drop, Gapping

Telegraphic Speech

Telegraphic speech (Radford 1990) is a reference to early child sentence structure whereby the young child exclusively omits functional words & categories, like the word 'the', auxiliary verb 'do', and functional categories like Tense, Agreement, and Case (while sometimes displaying unfixed word order). Let's note that the very term comes from the idea that language can adhere to certain (environmental) constraints; for example, if one were to send a telegram whereby each word carries a cost. To spend the least amount of money on words, one could easily imagine which words to dispense with and which words to keep in order to secure communication. Essentially, the very expensive words would be the functional words/categories (viz., they don't deliver much of a punch in the way of communicative meaning; in fact, the very notion that languages have such words at all seems to fly in the face of any putative Darwinian theory of language as 'functional', in the sense that language has evolved for the sole purpose of communicating with others. This theory is referred to as functionalism: the theory that language has evolved and is designed to serve a communicative/functional niche).


Recall, functional words such as the determiner 'the' and auxiliary verbs (sometimes called 'helping verbs') 'do-be-have', along with Case ('I vs. Me', 'He vs. Him') and even Tense (since many languages of the world don't mark for inflectional Tense), suggest that such words are evolutionarily expensive, since they seem somewhat redundant and out of touch with meaning. Regarding redundancy, why should a language have to mark plural twice, both on the noun and the adjective (as in Spanish), or between a plural determiner 'two' and the noun 'cars' (as in English)? Isn't the plural {s} inflection on the noun [[car]s] redundant, since we already know that the determiner 'two' is plural? Such redundancy would seem to come at a cost for any evolutionary theory of language. Likewise, what is the essential difference between the morphological/inflectional Case of 'I' and 'Me'? In fact, most native English speakers get this subtle and redundant distinction wrong all the time, whenever they say 'keep it between you and I' (where the Case on the pronoun should be accusative 'me', not nominative 'I'). No matter: we carry on, because it has no effect on communication. What are these functional words and inflections really doing? This seems to be a quirky-feature development, one which has arisen without the typical underlying (bottom-up) biological pressures (pace a Darwinian theory). Lexical words such as Nouns, Verbs, Adjectives, and Prepositions, on the other hand, would be less expensive, since they give back every pound's worth of money. Such words can't easily be deleted without communication breaking down. For an example of an exclusively lexical language (one which functionalists would claim as a universal language, devoid of byzantine ornamentation, and one compatible with a Darwinian-based theory of language evolution), one need only look at so-called Pidgin languages (see the late Derek Bickerton's book Language & Species, Chapter 5 on 'Pidgin Languages'; for further review, see the Oxford link).14

Some Examples of a Lexical-Based Pidgin Language

Him go eat.
What the man doing?
Money no can carry.
Too much children, many children, small children, house money pay. (= I have a lot of small children and need to pay the rent)

Here below is a favorite example I once heard in an open market:

14 https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199935345.001.0001/oxfordhb-9780199935345-e-13

'Hello, me sell you four pound tomato for two dollar, very good price, yeh? You buy? Me no come tomorrow. Me friend come but him no good! Him want too much money. Me better price. Me sell you four pound tomato for two dollar. Yeh, you buy?'

Diary Drop

The notion of diary drop is quite simple: imagine you are writing to yourself (as in a diary). In this environmental context, many things can be omitted, since you know who the writer is, along with the background context of the descriptive writing. In this environment, many aspects of language (functional language/words, correct grammatical constructs) are rather relaxed and can simply go missing. Diary drop is said to be very similar to lexical (stage-1) child language, given that many of the same features of language get omitted. See the example below:

Diary Drop: 'Saw John hospital today. Doesn't look good... bad shape. Might not make it thru night.'

Gapping

We'll end with perhaps the most interesting and syntactically complex examples: so-called gapping. Gapping is a form of ellipsis (a removal of certain words); such ellipses work since the omitted word or utterance can be reconstructed based on understood contexts and prior utterances (via pragmatics: a Darwinian pursuit). Such omissions of words typically come in the form of conjuncts in coordinate structures, in order to avoid repetition. Examples include: 'John bought an apple, and Mary _ an orange' (where the second verb 'bought' is deleted and forms a gap between 'Mary' and 'orange').

There are interesting constraints on such gapping. The most obvious is that the deleted element must be part of a coordinate structure. If not, gapping is illicit: e.g., 'John drove to the market and Mary walked home' cannot be reduced to *'John drove to the market and Mary _ home', since the two verbs 'drove' and 'walked' don't coordinate; one must rather spell out both ('John drove to the market and Mary walked home': two different predicates). Also note that there may be subtle constraints on coordinating with or without prepositions. Even with identical verbs, the utterance 'John drove to the market, and Mary _ home' seems not to work (at least for me); it doesn't seem as fluid as our first example with proper syntactic coordination ('bought an apple ... an orange') shown above. Here, 'home' doesn't require the preposition {to}. But if we change the preposition so as to have the same coordination, the syntactic constraint seems to be relaxed:


e.g., 'John [VP drove [PP to the market]] and Mary [VP _ [PP to the library]]'.

Word Segmentation (Word Boundaries)/Word Change

One final interesting note is to examine where the actual 'gaps' lie between words, so-called word boundaries (the IPA word-boundary symbol is #). In spontaneously spoken speech (unlike in written text), one may not know precisely where to cut words from other words; namely, where the word boundaries lie. This is a considerable problem, a so-called 'learnability problem', for developmental theories of child language acquisition, and it has forced developmental linguists to posit very complex innate structures that children must bring to the table in confronting word segmentation. Of course, historically, errors have undoubtedly arisen, giving examples of word change over time. As our final piece of data, let's have fun with such one-time errors (which are now wholly accepted as part of our English language). Note how the 'n' has been displaced, forming a historically erroneous word boundary:

(i) A # norange → An # orange. ('Orange' via Latin has the recognizable cognate 'norange'.)
(ii) A # napron → An # apron. (The Latin 'napkin' comes from 'napron' (nape): an 'apron' is a 'napkin'.)

Conclusion

The aim of this accumulative lecture on 'form & function' is to serve as a springboard to other analyses and research topics for linguistics students. Examining structure/phonology/syntax in L1 and/or L2 settings, these data, I hope, have provided some insight not only into Chomsky's claim for an innate Faculty of Language (FL), earlier described in his work as 'Universal Grammar', but also lead to other insights and implications dealing with early school-age education as well as bilingual educational settings. The philosophical history, as well as the history leading up to evolutionary theories, is rich with the scatterings of this debate, which continues even as I write.

Final notes: for specific research proposals per class, please see §6.7 below. The following ideas, data, and topics are simply proposals, to get students to think about research topics which may come in the way of a midterm or final. My classes usually are engaged in research at their appropriate levels, using at least one outside source as a reference while also utilizing their texts and lecture notes as an inside source.

Original data, samples, etc. are always encouraged but are not necessary. Usually, a 'Summary' and/or 'Literature Review' serves as a good platform for exams. In terms of style and formatting, I prefer single-spacing with a works-cited page; MLA or APA is accepted. Good luck!

A Note for Students: Topics for Research

Specifically (for the typical undergraduate course),* please consider the following as topics for research/final exam projects. For example, if Phonology is the topic of research, I'd suggest looking at the phonology material here to serve as a springboard for papers and exams.

*Linguistics courses at the 100-300 level fall under 'Introductory Courses', e.g., An Introduction to Language & Linguistics, Modern English Grammar, etc. Linguistics at the more advanced levels typically deals with formal, theoretical topics (e.g., Child Language Acquisition, Phonetics & Phonology, Historical, Discourse). (Note: while this section/lecture does not apply to graduate seminars, some material here could be reviewed and help serve as a springboard for research).

Staying on the topic of Phonology (typical of a final exam topic), specific class suggestions might look like the following (from introductory classes to advanced senior-level classes):

Introductory Courses: Phonology/History of Spelling (Research Topics)

For a final exam/research topic in Phonology, consider speech as found in the community. This could include samples of data which show child language speech development, speech errors, and/or second language (L2) bilingual speech (see the section below). Another topic might be to provide a brief summary of the history of spelling. For example, the letter 'A' was once reversed to show the shape of an ox (which triggered the sound 'A' should make, /a/ for ox). For some fun, here's a NY Post article on it: https://nypost.com/2015/02/08/the-stories-behind-the-letters-of-our-alphabet/

Likewise, there are notions that the shape of the mouth came to represent the letter (letters are called 'graphemes' in linguistics). Consider below:

(i) /θ/: the 'O' represents the lips/mouth, and the horizontal bar the teeth between the lips, yielding the grapheme for /θ/.


(ii) The letter 'g' is of historical interest: for Greek, the /g/ sound was gamma, Γ. This in turn took on a 'C'-shape with the Romans, who extended the bottom tail of Γ into C, with the added feature that it now took on a [-voice] (voiceless) quality: /k/ (as in Christ).
(iii) But now the Romans had no /g/, it having been replaced by the /k/ now represented by C.
(iv) Hence the origin of G: crossing the lower curve of C yields the voiced 'G'. The letter G was thus the result of sound and grapheme change (via error) from Greek to Roman.
(v) Don't forget how technology (SMS) can affect spelling over time.

Proper phonology topics are appropriate for such courses. For example:

(i) see child speech (§3.1). (ii) see L1-​transfer (§3.2).

For more detailed per-class suggestions, see below. For topics on grammar, see:

(i) Child Grammar (§6.3) (ii) Pidgin Grammar (§6.5)

Other topics regarding phonology are appropriate for lower-level linguistics: L1-transfer, vernaculars, and bilingualisms such as Black Vernacular English, Ebonics, and Spanglish (see below). For Black Vernacular data, see the great link below: https://www.rehabmed.ualberta.ca/spa/phonology/features.htm (there are also great webpages devoted to Spanglish). Also, L1-transfer as found from Japanese to English can provide some nice phonemic/syllabic interferences, sometimes called 'phonological repair'. The link below presents English words borrowed into Japanese, examples of L1-transfer, English to Japanese: http://www.csun.edu/~bashforth/301_PDF/301_P_P/EnglishLoanWordsJapanese.pdf Recall (§6.2) our example of how Japanese 'love story' often gets produced as 'loba sutori'.

Introductory Courses: English Grammar/Syntax (Research Topics)

For English Linguistics courses, final exam topics concern grammar/syntax (where 'tree' diagrams are involved). One possibility would be to focus on child grammars and the lack of certain grammatical features at certain stages of development, employing a lexical stage-1 vs. a functional-categories stage-2. Specific examples would be the development of Tense {ed}, present tense {s}, Possessive {'s}, auxiliary verbs (do, be, have), Determiners {the}, etc.

(i) See Radford & Galasso on Possessive.
(ii) See Galasso, 'Minimum of English Grammar' (Chapter 11), for some data and analyses.
(iii) Topics could also include theoretical debates (Skinner vs. Chomsky), brain processing (the Dual Mechanism Model; see Pinker's 'Words & Rules' book cited herein), ASL (language in the deaf community), and the language of special populations (language impairment, autism, SLI, etc.; topics found in the 'Minimum of English Grammar' text). *Tree diagrams would be required.

Introductory Courses: Language Development (Research Topics)

A focus on any of these topics can make its way into a discussion point with the special aim and scope of 'language in the early school years settings'. The topic of phonology is well placed in the research literature here. Specific topics for research as cited within this paper include:

(i) The baseball-glove analogy in terms of language evolution.
 a. Animal communication vs. human language.
 b. The pongid vs. hominid split (6 mya (million years ago)).
(ii) Has language evolved like all other Darwinian properties, or is language special?
 a. See Chomsky vs. Pinker/Bloom here, etc.
 b. Fodor, Wexler vs. Tomasello (Lenneberg's Dream).
(iii) The analysis and development of structure (see my 'Ben paper' enclosed).
(iv) The development of:
 a. Grammar (see telegraphic speech, Radford 1990)
 b. Radford and Galasso (1998) (link enclosed)
 c. Phonology (see below).


First Language (L1) Child Language Acquisition (Phonology)

For the Child Language course, I recommend readings in the Hoff text dealing with child phonology (Hoff, E. (2009). Language Development (4th ed.). Cengage Learning). Topics include:

Phonemic Development

(i) Stage-1 Phonemic Development:
 a. Plosives /b, p/, /d, t/, /g, k/
 b. Fricatives /v, f/, /h/
 c. Nasals /m/, /n/
 d. Glide /w/
(ii) Stage-2 Phonemic Development:
 a. Fricatives /z, s/
 b. Liquids /r/, /l/
 c. Glide /y/
(iii) Stage-3 Phonemic Development:
 a. Fricatives (interdental) /θ, δ/
 b. Fricatives (palatal) /ž, š/
 c. Affricates (palatal) /ǰ, č/

As an exercise, theorize on the types of utterances a young child might be able to pronounce given these phonemic stages: e.g., /wowo/ 'roller', /də/ for 'the'.

Syllabic Development

Stage-1: V, CV, CV:CV (gemination: e.g., wowo, baba, kaka, or /kiki/ for kitty (/kIti/)). Theorize as to why a young child might say /ki:ki kæ/ (for /kIti kæt/ (= kitty cat)). Note the gemination, as well as the /t/-to-/k/ substitution and the final /t/ deletion in cat /kæ/. Note which syllabic stage the child would be in given such data.

Stage-2: CVC (referred to as the CVC proto-word template). At this stage the child can now say 'cat' /kæt/ (CVC). Once CVC emerges, the child enters into the so-called 'vocabulary spurt', since most words either conform to or can be readily reduced to the CVC proto-word template. Note, though, what would happen to the word 'skate' /sket/ if the child only has access to CVC (e.g., /ket/). Question: why would the /s/ delete and not the /k/? (See the phonemic stages above, and the sketch below.)
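One way to answer the 'skate' question is to read the competition directly off the stage table above. The Python sketch below (an illustrative toy of my own, encoding only a subset of the inventory; stage-3 is omitted) seats the earliest-acquired phoneme in the single available C-slot:

PHONEME_STAGE = {
    # stage assignments taken from the phonemic development table above
    "b": 1, "p": 1, "d": 1, "t": 1, "g": 1, "k": 1,
    "v": 1, "f": 1, "h": 1, "m": 1, "n": 1, "w": 1,
    "z": 2, "s": 2, "r": 2, "l": 2, "y": 2,
}

def resolve_onset_cluster(cluster):
    """Only one onset C-slot exists at the CVC stage: the earliest-acquired
    (lowest-stage) phoneme wins the musical-chairs competition."""
    return min(cluster, key=lambda p: PHONEME_STAGE[p])

print(resolve_onset_cluster(["s", "k"]))   # -> 'k': 'skate' /sket/ -> /ket/

The stage-2 fricative /s/ loses its seat to the stage-1 plosive /k/, which is why the child says /ket/ rather than /set/.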

Stage-3: CCVC, CCVCC, etc. Now 'skate' can be produced /sket/ (a CCVC syllabic structure). See my chapter on phonology (Chapter 13) of 'Minimum of English Grammar' (link enclosed). Recall how form-template structures resemble each other and give rise to recursion: 'She [[speak]s] French' (showing INFL) alongside the CVC template, e.g., /kæt/ (cat) (see Footnote 13).

         TP*                         ȿ
        /   \                      /    \
       T     VP                onset    rime
       s    /  \                 C     /    \
           V    N                  nucleus  coda
        [[speak]s] French             V       C

*(See Footnote 13, with link, for discussion of how both structures have recursive properties. For the syllabic tree, see the linked paper, ex. [27] and ex. [31], which show 'sister-to-sister' assimilation and constraints on assimilation having to do with 'mother-daughter' relations). This is a very interesting theoretical observation. For so-called 'tapping experiments', see Hoff's chapter dealing with 'The Early School Years'. (See further readings below for a full chapter on phonology: Galasso 2013, 'English/American Sound System').

Second Language Theory

First Language (L1) transfer (see Bley-Vroman's 'Fundamental Difference Hypothesis').

· L1 interference (e.g., the Spanish substitution of /č/ for /š/: e.g., 'chower' for 'shower'). (See my link to the IPA for proper phonological notation, found in 'Minimum of English Grammar', Chapter 13).

Reading strategies which take the phoneme as the basic unit of a sound-to-print relation (the so-called decoding question): see the 'Reading Wars', which pit phonics-based (bottom-up) methods of decoding written text against (top-down) whole-language approaches.

See Krashen's theory of L2 learning: https://apps.esc1.net/ProfessionalDevelopment/uploads/WKDocs/58121/2.%20Stephen%20Krashen.pdf


Teaching Methodologies: See works by:

Vivian Cook:
https://scholar.google.com/citations?user=YPoCwdcAAAAJ&hl=en&oi=ao
https://books.google.com/books?hl=en&lr=&id=b06gIjWiCZIC&oi=fnd&pg=PT333&dq=vivian+cook&ots=Po7zwkN0iq&sig=WY1YYQsylrrRA6xlM9zdvfcAbQY#v=onepage&q=vivian%20cook&f=false
https://books.google.com/books?hl=en&lr=&id=Ma0uAgAAQBAJ&oi=fnd&pg=PP1&dq=vivian+cook&ots=uYlsrKGSo8&sig=oIpzUDADqOkOQ2XLQgmcp-GmyUM#v=onepage&q=vivian%20cook&f=false
https://books.google.com/books?hl=en&lr=&id=TZMEemWIlyEC&oi=fnd&pg=PR7&dq=vivian+cook&ots=-5NT25edTA&sig=0w-EUV5kAMDdFqrzrIpLqgYmiQk#v=onepage&q=vivian%20cook&f=false

Steven Krashen:
https://apps.esc1.net/ProfessionalDevelopment/uploads/WKDocs/58121/2.%20Stephen%20Krashen.pdf

First Language Acquisition (L1) of Syntax/​Grammar

(i) For the acquisition of Possessive, see the link to Radford & Galasso enclosed in this paper.
(ii) See the 'Minimum of English Grammar' text (chapter on Child Grammar, Chapter 11), link enclosed in this paper.
(iii) Hoff text, Chapter 5, on the development of syntax, where much data can be found: see boxes (§5.1, 5.2, 5.3, etc.).

Introductory Courses: Intro to Linguistics

For all introductory courses, some interesting topics for research include:

(i) The Skinner vs. Chomsky debate as well as its implications:
 a. Berko's 'wugs test' (for L1)
 b. The Sally Experiment (for L2) (Galasso)
(ii) Speech and accents of non-native speakers (Spanish L1 => English L2)
(iii) Spellings: evolution and change
 a. See 'Word Segmentation (Word Boundaries)/Word Change' in section (§5.3.4)
 b. The history of spelling could also be examined; see the link below: https://www.researchgate.net/publication/283664530_English_Spelling_and_its_Difficult_Nature




c. See §6.4 Abbreviations and Initialisms (as evolved via text SMS). d. Language and Technology (how technology may shape language—​ which would indeed be an environmental ‘Function defines Form’ analysis). See Minimum of English Grammar, vol 1. Chapter 13 (§13.1D) on ‘The Great Vowel Shift’. (iv) All topics of Phonology are open for research at this level: a. phonological development in children (provide based examples of child speech) b. Speech of bilinguals (bilingualism and accent, speech perception).

Endnote: Multiple-​Language States The upshot from this chapter on ‘Function defines Form’ as connected to Child Language Acquisition is quite unorthodox—​viz., the perceived commonsensical view that it is the ‘child that acquires language’ gets turned on its head with the assertion that it is rather ‘language which acquires the child’. This is not a new concept overall, as this has been suggested for the processing behind Creolization. However, such an expansion to child first language gives the flavor of suggesting that there are in reality all these multiple languages ‘out there’, each falling somewhere along a spectrum from a very basic and prosaic language-​state, to that of the adult target-​state—​and that the child developmental process involves the act of an appropriate language picking an appropriate child (similar to our discussion of the ‘Maturation of Parameters Settings’). These multilanguage-​states are all legitimate in their own rights, as they are often observable instantiations of language typologies found across the world (e.g., non-​inflectional languages, Pro-​drop, non-​agreeing languages, etc.). The wealth of such language diversity as seen in the world is in fact the kernel of a single language being realized along the trajectory of a single child (English-​speaking or otherwise). Indeed, the notion that an objective language must match a subjective inter-​mental capacity—​which in turn must await maturation in order for the speaker to absorb the language state—​speaks to a strong innateness-​hypothesis tradition. In the end of the day, from age two through to twenty, a person may have moved in and out of say ten or more different language-​state phases. It’s quite remarkable to think about language development in this way.

168 | Reflections

on Syntax: Lectures in General Linguistics

Examples of an English child passing through multiple languages: 1. Missing Subjects (Pro-​Drop (Pronoun drop)). Various languages allow subjects to go missing. These languages are called Pro-​drop language. They include language like Spanish, Italian, German, Chinese which allow subjects to be dropped where English requires them. Consider Spanish (taken from Roeper 2007. p. 216): Que pasó a Juan?—​Se fue. (What happened to Juan?—​went) (= He went) In English we wouldn’t be able to just say ‘went’, English requires the complete Subject and Verb structure ‘He went’… However, English children in fact pass through this Pro-​drop phase on their way to developing a full-​fledge English grammar. Consider a two-​year old English child who says: ‘mommy, look, _​bounce ball’ (with _​indicating where the subject/​ pronoun should have been inserted: ‘mommy, look, I bounce ball’). 2. Non-​A greeing languages: Such languages like Japanese do not seem to require agreement between grammatical number [Plural], as well as subject-​verb agreement. Phase-​1 grammar of an English-​speaking child seems to be speaking a Non-​agreeing language with an utterance such as ‘mommy, look, two car’ (where _​indicates missing plural {s}). This is exactly what one finds in adult Japanese (‘two car’). In a sense, it seems a phase-​1 English child begins her linguistic career speaking Japanese. This is a very interesting way of thinking about multiple-​ language states. These variants of grammar (what I term ‘multiple-​language states’) is fully discussed in our Principles & Parameters section of Lecture One. Part of what Chomsky is referring to when he talks about a Universal Grammar (UG) is that all languages seem to be governed by a predetermined & biologically-​based default language state (what is sometimes called a ‘bioprogram’ of language, see Bickerton 1984). For a very interesting look at how there could be multiple grammars within a single language, see Roeper 2007, pp. 211–​226. Roeper in this context suggests that all native speakers are in fact ‘multi-​linguals’—​that we are all bilinguals.

The Myth of ‘Function Defines Form’ | 169

Further Readings Christiansen, M. & Chater, N. (2008). Language as shaped by the brain. Behavior and Brain Sciences (Target article). 31: 489–​558. Fisher, S.E., & G. F. Marcus. (2006). The eloquent ape: genes, brains and the evolution of language. Nature Reviews Genetics, Vol. 7 issue no. 1 Fitch, T. (2010). Three meanings of ‘recursion’: Key distinctions for biolinguists (­chapter 4) in Larson, R., Déprez, V., Yamakido, H. (eds). The Evolution of Language Fitch, T., Hauser, M., Chomsky, N. (2005). The Evolution of the Language Faculty: Clarifications and Implications Cognition 97, 179–​210. Fodor, J. (2000). The mind doesn’t work that way: scope and limits of computational psychology. MA. MIT Press. Galasso, J. (2019). Note on Artificial Intelligence and the critical recursive implementation. https://​w ww.academia.edu/​39578937/​Note_​4 _​A _​Note_​on_​A rtificial_​Intelligence_​a nd_​ the_​critical_​recursive_​i mplementation_​The_​l agging_​problem_​of_​background_​k nowledge_​1 For a note on how recursive structures show up in phonology (via assimilation), see ‘Recursive Syntax’ monograph (2019), link below (ex. [31]): https://​w ww.academia.edu/​42204248/​ Working_​Paper_​no._​4 _​Reflections_​on_​Syntax_​From_​Merge_​items_​to_​Sets_​c ategories_​ Move_​Movement-​based_​Theoretical_​Applications_​Morphology_​down_​to_​Phonology Galasso, J. (2013). Chapter on The American/​English Sound System. https://​w ww.academia.edu/​ 42975487/​L ecture_ ​Notes_​The_ ​A merican_ ​English_ ​Sound_ ​System Gould, S.J. (2017). Punctuated Equilibrium. Harvard University Press. Hauser, M., Chomsky, N., Fitch, T. (2002) The Language Faculty: What is it, who has it, and how did it evolve? Science, 298, 1569–​1579. Lightfoot, D. (2006). How new languages emerge. Cambridge University Press. Pinker, S. (1999). Words & Rules. NY: Basic Books. Pinker, S. & P. Bloom (1990). Natural language and natural selection. Behavior and Brain Sciences, 13. (4): 707–​784. Tomasello, M. (2000). Do young children have adult syntactic competence? Cognition, 74. pp. 209–​304. Tomasello, M. & J. Call (1997). Primate Cognition. Oxford Press. Wexler, K. (2003). Lenneberg’s Dream. (Chapter 1, pp.11–​61) in Levy, Y. & J. Schaeffer (eds). Language Competence across Populations. Mahwah, Erlbaum.

See debates between:

(i) Pinker vs. Chomsky (cf. Pinker & Bloom vs. Chomsky (Fitch et al.)) on the possibilities of ‘adaptive measures’ leading to language evolution. Chomsky suggests that only top-​down, non-​Darwinian dynamics can explain recursive structures required for language. (See my ‘For Ben’

170 | Reflections

on Syntax: Lectures in General Linguistics

paper for review on recursion): https://​w ww.academia.edu/​15151583/​ Some_ ​ n otes_ ​ on_​ w hat_ ​ m akes_ ​ l anguage_ ​ i nteresting_ ​ For_ ​ B en._​ Opening_​R emarks_​for_​L ing_​417_​C hild_​L anguage_ ​A cquisition_​ Spring_​2014. (ii) Fodor vs. Pinker (cf. ‘The mind doesn’t work that way’). (iii) Works of S.J. Gould regarding ‘evolution and mind’ https://​melaniemitchell.me/​EssaysContent/​ep-​essay.pdf/​ (iv) Dawkins vs Gould https://​en.wikipedia.org/​wiki/​Dawkins_​vs._​Gould (i) Sterelny, K. (2007). Dawkins Vs Gould: Survival of the Fittest. Cambridge, U.K.: Icon Books. ISBN 1-​84046-​780-​0. Also ISBN 978-​1-​84046-​780-​2 (ii) Dawkins, Richard (2004). The Ancestor’s Tale: A Pilgrimage to the Dawn of Life. London: Weidenfeld & Nicolson. p. 503. ISBN 0-​297-​82503-​8 (iii) Gould, Stephen Jay (1996). Full House: The Spread of Excellence from Plato to Darwin. New York: Harmony Books. ISBN 0-​517-​70394-​7. Some popular readings:





(1) Dennett, D. (1995). Darwin’s Dangerous Idea. Touchstone.
(2) Diamond, J. (1993). The Third Chimpanzee. Harper.
(3) Gould, S.J. (2003). The Hedgehog, the Fox, and the Magister’s Pox. Three Rivers Press.
    _____ (1997). Life’s Grandeur. Vintage.
    _____ (1996). The Mismeasure of Man. W.W. Norton.
(4) Penrose, R. (1994). Shadows of the Mind. Oxford University Press.
(5) Pinker, S. (1997). How the Mind Works. Penguin Press. (See the counter-argument found in Fodor’s The Mind Doesn’t Work That Way; see references for citation.)
    _____ (1994). The Language Instinct.
    _____ (2007). The Stuff of Thought.
(6) Searle, J. (2004). Mind. Oxford University Press.

And there are many other books, articles, and summaries (readily found on the web) on topics related to the debate over whether language (in the narrow scope, as it relates to syntax/recursion) could have emerged via adaptive/Darwinian biological pressures. Chomsky comes out the strongest against this point, as expressed in the Pinker & Bloom vs. Chomsky debates.

For an overview of ‘Language and Artificial Intelligence’, see my paper: https://www.academia.edu/39578937/Note_4_A_Note_on_Artificial_Intelligence_and_the_critical_recursive_implementation_The_lagging_problem_of_background_knowledge_1

Appendixes

Appendix-1: Poverty of Stimulus

In addition to arguments that have been laid out in our ‘4-Sentences’ section, consider the claims made by the linguist Peter Gordon (1985) that ‘young children know not to keep plurals embedded within compounds’. In Gordon’s classic ‘Rat-eater’ experiment, children are asked: ‘What do you call a person who eats rats?’ Children respond ‘rat-eater’ (they delete the {s}) and they never respond *rats-eater. Gordon suggests that children innately know that inflectional morphology {s} can’t be kept embedded within a compound, even though they have never been explicitly shown that such data is in violation of some English grammar. The mere fact that they never hear it (because it is, in fact, ungrammatical) doesn’t explain why children never entertain the prospect: children say loads of erroneous things that they have never heard before. Hence, even though children have no empirical evidence (negative stimulus) that such constructs are wrong, they still shy away from compound-embedded plurals. This is what is referred to as the ‘poverty of stimulus’—namely, when children’s inferences go beyond the data they receive. Gordon suggests in this sense that there must be some innate, built-in machinery constraining the child’s learning of language. So, if (as strong behaviorists say) input-to-output models are squarely the product of environmental learning, one question that will come up is: ‘How does such learning deliver a result such as that found with the poverty-of-stimulus case?’ Perhaps symbol manipulation of rules will be required in some fashion after all. But, if so, perhaps we need to rethink the brain as a mere neuro/digital network. In fact, the analogy of the brain as a digital computer had been under attack for some time—as Gary Marcus (2001) claims: ‘We still know very little about how the brain works at the higher level’. Perhaps at the lower level the brain-to-computer analogy holds (where local firing of neurons takes place, etc.), but with higher functions—when we talk of a ‘mind over brain’ in how a brain bootstraps a mind—there may need to be a fundamentally different level of processing with an entirely different neuro-underwriting.
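To make the contrast concrete, here is a minimal, purely illustrative sketch (in Python; hypothetical names, not a model from this text or from Gordon) of what such a built-in symbolic constraint looks like when stated as a rule over the category N(oun), rather than as a memorized pattern:

```python
# Minimal illustrative sketch (hypothetical; not Gordon's or the author's model):
# a symbolic rule operating over the category N keeps regular inflection {s}
# out of compounds -- 'rats' -> 'rat-eater', never '*rats-eater'.

def compound(noun: str, head: str = "eater") -> str:
    stem = noun[:-1] if noun.endswith("s") else noun   # strip regular plural {s}
    return f"{stem}-{head}"

# The rule is category-based, not item-based: no stored exemplar is needed,
# so it generalizes even to nonce words the child has never heard.
assert compound("rats") == "rat-eater"
assert compound("wugs") == "wug-eater"
```

The point is not the two lines of code but the variable: the rule mentions the category N itself—something no finite table of memorized input–output pairs provides.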

[A-1]1. Michael Tomasello vs. Ken Wexler (See ‘Lenneberg’s Dream’, Wexler 2003)

Perhaps one of the more passionate debates which has arisen regarding how this distinction of ‘recurrent/association’ vs. ‘recursive/rule-based symbolism’ plays out in developmental child syntax is the (naïve) notion that the errors found in early child syntax are simply the product of a lack of memory, or of a ‘bottleneck’ of cognitive processing. Let’s take a last moment to flesh this hypothesis out. There is a case to be made that if, for example, a child says ‘He open it’ (where agreement {s} is missing (e.g., He opens it)), one could claim that such an utterance is in fact available in the input via the utterance ‘[Should [he open it]]’, etc. Likewise, ‘She eat grapes’ could come from available positive evidence of ‘Does she eat grapes?’, etc., where, if the initial finite verb of the matrix main clause is dropped (forgotten), the remaining structure would be consistent with what young children say. This theory relies on a partial-mimicking theory of an X=x type, where the initial part of the mimic has been lost. Tomasello argues for such a theory of child language, completely based on associative mimicking (and/or the loss of certain parts of the base mimic). This is all well and good! … But problems quickly emerge from such a naïve theory. For one thing, if a partial-mimicking theory is correct, young children (two years of age) would also say things like ‘Did you want?’, as derived from the partial mimic of [what [did you want]]? However, they don’t say such utterances. Rather, a more typical stage-1 expression (two years of age and younger) might be ‘What _ you want?’, where the second word (the auxiliary verb) is missing from the string. Children at stage-1 don’t produce Aux-first strings with the Wh-word missing; they rather delete the Aux and maintain the fronted Wh-word. The fact that positional errors can no longer be part of the theory suggests that the child is operating under a structure-dependent hypothesis of language, and not a positional-dependent (or structure-independent) hypothesis. Other even more interesting examples include why a child might say ‘Him do it’. Tomasello might argue that they hear [I saw [him do it]], and so on. But still, even if children process in this way, we would have to account for why the first part of the sentence is ignored, given that, in frequency terms, first parts are most marked. This would seem to strike at the heart of Tomasello’s theory. Note that such a theory of mimicking is completely reliant on frequency. (Note that while the highest-frequency word in the English language is the word ‘the’, all the same, ‘the’ is notoriously the last word acquired by a young child (e.g., ‘The car is broken’ => ‘Car broken’)). Utterances such as ‘Him do it’, as shown above, are so-called Small Clauses. While it may be true that such small clauses are abundant in the data, such a theory would also predict the following: [Mary knows [I like candy]], and so the prediction would be that the child might drop the initial ‘Mary knows’ clause and say [I like candy] with nominative case ‘I’ (the adult utterance and the small clause are homogeneous).1 However, the child does nothing of the sort. The child strictly says ‘Me like candy’, etc. Such utterances pose a problem for any naïve theory based on mimicking or on forgetfulness of the mimic. In this case, children could be said to go beyond their data—they go beyond the frequency of the input. Their stage-1 utterances are rather the result of a lack of structure, and not of a lack of positional memory. Stage-1 child grammars are systematic, rule-based (or show the principled lack of a given rule), constituency-sensitive, and based on syntactic properties (provided by UG). In sum, extending this to an AI discussion (see paper cited in [A1–3] below)—if we hope ever to achieve an AI device in which the learning of human language is even remotely possible, such a device would have to be sensitive to a mode of learning based on symbolic and rule-based processes: processes which must be able to go far beyond mere [+Frequency]/mimicking of structure.

1 There is a clear relationship (syntactic) between [+Nom] nominative case and [+Fin] finite verbs.

[A-1]2. Brain-Mind Bootstrapping

The idea that a brain can bootstrap a mind may have its origins in theoretical linguistics, particularly in the study of child language development of syntax. One very promising model, which has implications for AI and the cognitive sciences, is the notion that the low-level brain processes which map local configurations on a frequency-based threshold may be only one part of the brain’s processing of language (a more primitive part responsible for lexical look-up and retrieval mechanisms dealing with objects, items, etc.), while a second, more abstract mode of processing takes such general properties of items and spreads them over categories, whereby recursive operations may allow diacritics and indexes to work as variables. (Steven Pinker’s 1999 book Words & Rules captures this dual mechanism model distinctly). By extension, much of what I am on about in this section speaks to the notion that, within a single domain of processing, a dual operating system may be in use which allows for this individual-versus-kind distinction. Minds can nicely grapple with categories and recursive structures which can handle the tracking of individuals (where indexes and diacritic variables are in operation), while the brute-force calculating brain serves one-to-one properties of kinds.

[A-1]3. The ‘Brain-to-Computer’ Analogy: ‘Low vs. High’ Levels as a Linguistic Function

(For a full discussion and historical treatment of ‘the brain-computer analogy’ leading up to claims of Artificial Intelligence, see my paper): https://www.academia.edu/39578937/Note_4_A_Note_on_Artificial_Intelligence_and_the_critical_recursive_implementation_The_lagging_problem_of_background_knowledge_1

Low-levels. At the lowest levels of the hierarchical brain spectrum are found neuro-connectionist systems which rely on approximation to ‘local dependency’—these are so-called finite-state grammars. (Included here would be the so-called ‘multilayer perceptron’.) At these low levels, statistical regularities work in local configurations so that they bundle together. In linguistic terms, these low-level grammars create so-called Merge properties: {X, Y} = {XP {X, Y}} (where X is Head and Y is complement), as when two words/items merge in becoming a phrase. Linguistic compounding can also be said to be a result of such merge. For a slightly more sophisticated example, consider linguistic Root-compounds: e.g., ‘chain-smoker’, where only the two items merge, {chain} {smoker}, and where no movement is required. (Notice that a move-based product becomes ungrammatical: one can’t derive the compound as *A smoker of chains). It’s only at a higher level of processing where two merged items become more than the sum total of their parts. Consider what happens to the seemingly similar linguistic compound ‘cigarette-smoker’, where one can say ‘A smoker of cigarettes’. This is an example of higher-level processing (albeit linguistic) which shows how the two merged items retain their specific past memories over two or more computational time-steps (CTS). Consider the two distinct processes below, starting with low-level/Merge and moving to high-level/Move:

(1) Local: [X, Y] => {XP}: [chain] + [smoker] = [chain-smoker]
(2) Distant: [X, [X, Y]] => {XP}t (where t is the trace of a prior memory of structure): [cigarette] + [smoker] = [cigarette-[smoker of cigarettes]]

In (1) above, a local neuro function, say within one computational time-step (CTS-1), shows the memory limit in how two adjacent inputs combine to achieve an averaged, weighted product (and adjacency does seem to be a prerequisite, common to how neuro firings work in adjacent bundling). However, if we were to apply the same ‘local neuro-firing’ to the syntactic compound found in (2) above, the interpretive distinctions between root vs. synthetic compounds would be lost. Consider (1, 2) restated in (3–4) below, showing CTS numerations:

[A-1]4. Root Compounds (RC) vs. Syntactic Compounds (SC) as an Analogy to Computational Time-Steps (CTS)

(3) Local: [X, Y] => {XP}: [chain] + [smoker] = [chain-smoker]

    [X] + [Y] => CTS-1 (memory within a single CTS)

    CTS-1: [X = chain], [Y = smoker] => local firing of two units, as found in a multilayer recurrent system.


(4) [cigarette] + [smoker] = [cigarette-[smoker of cigarettes]]

    i.   [X + Y]             CTS-1 = {of cigarettes}
    ii.  [W + [X, Y]]        CTS-2 = {smoker {of cigarettes}}
    iii. [Yt [W + [X, Y]]]   CTS-3 = {cigarette {smoker {of cigarettes}}}

Showing a linguistic syntactic tree, the CTS-1 found in (4i) would be represented in (5) below:

(5) [YPβ [N cigarette] [YPα [N smoker] [XP {poss} [N (of) cigarettes]]]]
    (Tree diagram rendered here in bracket notation: YPβ dominates N ‘cigarette’ and YPα; YPα dominates N ‘smoker’ and XP; XP contains {poss} and N ‘(of) cigarettes’.)

The above notion of ‘locality vs. distance’, as bound by computational time-steps (CTS), has antecedence in levels of computational processing found in the brain—with ‘low-level’ computations being assigned to exact one-to-one neuro firing (say, having to do with the triggering of a specific node within a connectionist model), while ‘high-level’ processing establishes distant relationships, say, between nodes. The expression that the brain bootstraps itself in creating a mind can be played out in such a dualist scenario of local vs. distant neuron triggering—with the brain being pegged to locality conditions (the ‘associative brain’/temporal-lobe region) and the mind to non-local freedom (the symbolic ‘rule-based’ brain/Broca’s region). (See ‘treelet’ structure below for further discussion).

High-levels. At the higher levels, statistical regularities seem not to be dependent on local constraints, as shown in (4) above. Hence, the syntactic/semantic interpretational distinctions found between the above ‘root vs. syntactic’ compounds could be drawn as analogous to ‘local vs. distant’ neuro/unit firing—where a unit would be labeled here as a word (cigarette) and a grammatical feature (possessive). This same dual distinction is also very nicely seen within linguistic rules (so-called regular-rules/distant versus irregular-rules/local). For instance, notice how sound-pattern analogies of irregulars work on this low level, where frequency effects of ‘bundling of features’ can impact either neuro or linguistic processing: for example, consider how the pattern/analogy of [_[ing]] > [_[ang]] > [_[ung]], which generates sing>sang>sung, may over-generate (over-trigger), based on a sound-frequency effect, to bring>brang>brung (but not *bling>blang>blung). Notice how such novel words (so-called made-up ‘nonce’ words used for experiments) generate the distant rule—e.g., Today I bling it, yesterday I blinged it. Such distant, true rules never become dependent on local frequency-effects (or local neuro-firings), with true rules projecting over a variable/category, such as the instruction ‘add {s} to N when plural’: one wug > two wugs (see Berko); or, as just demonstrated above, but with an altered final consonant from /g/ to /t/: e.g., today I blink, and yesterday I blinked. True rule-formations of [N+s], [V+ed] work independently of frequency-bundling, and their productivity allows new and novel items to be freely expressed as categorical variables across data spreads. (*See End-Note overleaf on potential regular verbs being hijacked by frequency effects). Such a multilayer-perceptron model would have difficulty showing such dual-level processing, since perceptron models (SRNs, CRNs) would only code for a single mechanism model (SMM)—viz., the same mechanisms would have to be involved between RCs and SCs, thus losing the distinction. In other words, regarding CTSs, ‘cigarette-smoker’ would be forced into a local neuro-firing between two local units, as expressed in (6) below:

(6) [cigarette] + [smoker]

    [X] + [Y] => CTS-1 (memory only within one CTS)




For example, multilayer perceptrons can only average (approximate) a broad range of functions, based on local distributions of a single mechanism model (SMM). It has been found (Marcus, Brinkmann, Clahsen, Wiese & Pinker 1995; Hadley 2000) that these SMMs cannot capture the class of operations which spread over two or more CTSs and/or that follow from a recursive, embedded coding: notice how the progressive structure in (4i, ii, iii) requires embedded clusters (i.e., recursive nesting). (See Pinker 1984 for initial reports of grammar modeling, Pinker 1999 for a review of a dual mechanism model, and Galasso 2016 for a ‘First Merge, then Move’ model in early child syntax). The conclusion reached here is that multilayer perceptrons cannot generalize from limited data the same way humans do. *End-Note: There is a potential for regular verbs such as ‘walked’ to be hijacked by frequency, in the sense that if a regular word is highly prevalent in the language in terms of its frequency usage, the word along with its affix (that is, both its stem and affix) may become lexically incorporated, as seen in some brain-imaging, ERP and/or reaction-time experiments which show distinctions between the two words [+Freq] ‘walked’ vs. [-Freq] ‘stalked’—where the former is seen as being processed as an irregular, lexical, undecomposed item-based chunk [walked] (similar to what we find with irregular words, e.g., ‘went’, ‘children’, etc.), and the latter as a true regular, decomposed stem + affix [[stalk]ed]. Accounts of such processing distinctions can be made on the basis of frequency. (See Clahsen & Rothweiler 1993). This shows that even regular words may shift from one part of the brain to another based on the role of frequency, with [+Freq] words being mapped to the temporal lobe (Wernicke’s area) and [-Freq] words being mapped onto Broca’s area. If this is correct, rules can be thought of as a second-course, surrogate method of processing, compensating for the lack of frequency and providing an alternative method of production which is free from the constraints of first-course memory. I say first and second course here assuming that memory was first utilized in our evolutionary processing of language (as seen in protolanguage), with the second, more abstract course of rule-based grammar to follow. This is akin to our Ontogeny-Recapitulates-Phylogeny (ORP) argument advanced in this text.
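As a concrete, purely illustrative restatement of (3)–(6), the following Python sketch (hypothetical names; it assumes nothing beyond the structures given above) contrasts a flat, single-time-step pairing (root compound) with a nested, multi-step derivation (syntactic compound) in which earlier output is retained inside later structure:

```python
# Illustrative sketch of (3)-(6) (hypothetical; not a cited implementation):
# a root compound is a flat pair formed in one computational time-step (CTS),
# while a syntactic compound nests earlier output inside later structure.

def merge(x, y):
    """Low-level Merge: one CTS yielding a flat sister-pair {X, Y}."""
    return (x, y)

root = merge("chain", "smoker")            # CTS-1: ('chain', 'smoker')

cts1 = merge("of", "cigarettes")           # CTS-1: {of cigarettes}
cts2 = merge("smoker", cts1)               # CTS-2: {smoker {of cigarettes}}
cts3 = merge("cigarette", cts2)            # CTS-3: {cigarette {smoker {of cigarettes}}}

print(root)   # ('chain', 'smoker') -- no internal structure to recover
print(cts3)   # ('cigarette', ('smoker', ('of', 'cigarettes')))
# The nesting preserves the 'memory' of earlier time-steps -- precisely what a
# single mechanism model (SMM) confined to one local CTS loses.
```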

Appendix-2: Concluding Remarks. The Dual Mechanism: Studies on Language

The notion of a ‘Dual Processing Mechanism’ has been debated ever since the very conception of AI programming. The debates have centered around the question of whether (i) (top-down) symbolic & rule-based manipulators were considered ‘required implementation’ in the hardware (as part of any AI architecture, presumably ‘innately’ prewired), or whether (ii) mere (bottom-up) connectionism (which was said to more closely mimic what we know of the neuron-networks found in the human brain) was all that was needed to simulate human thought and learning. The two modes of the debate tend to map onto what we often describe as top-down (non-local connections) vs. bottom-up (local connections): the bottom-up ‘local connections’ are more sensitive to frequency effects—whether dealing with semantics and/or the distribution of ‘collocations’, or, in the case of SRNs/CRNs (simple/complex Recurrent Networks, the so-called ‘multilayer perceptrons’), gradient weight-scale adjustments—while the top-down ‘distant networks’ are the least dependent on frequency-sensitivity, such top-down, rule-based processing being the best candidate for generating novel productivity, as found in the creativity of language, etc. Let’s summarize below both (1) how the two modes differ in fundamental ways and (2) how they may in fact be implemented in a hybrid model for AI programming:


[A-2] 1. Bottom-Up and Local Neurons

Bottom-up nodes in a connectionist model rely on local, frequency-sensitive, connective networks, very much in the spirit of strong associations. The Hebbian expression (Donald Hebb) ‘what fires together wires together’ is a perfect way to express this mode of learning. Now, it may very well be the case that, based on what we now know, the human brain does in fact work in such a way, at least at the lower levels. Behaviorist associative learning doesn’t only work in animal studies (Pavlovian experiments), but also in many human learning tasks. For example, priming experiments work in precisely this manner: based on frequency, pattern-formation, and association.

Priming Effects

As a classic ‘priming effect’ (an effect totally based on frequency), consider the following example: If I say to someone the word ‘Easter’, then follow it up with only the initial /b/-sound of a word, ‘B___’, the brain’s frequency-reaction would quickly generate one of the utterances ‘bunny’, ‘basket’, ‘break’ (= university spring break), or ‘bonnet’ (somewhat in that order, dependent upon the frequency-cline of the particular phrasal expression). But notice how, for example, the phrase *Easter bullet, also starting with a /b/ sound, doesn’t surface high enough in the range of possible choices (at least not until all the other higher-frequency choices have been exhaustively eliminated). Such a ranking order of associative processing is exactly what we would expect if learning were built upon mere local, connective neuro-nets, sensitive to frequency of semantics and/or sound pattern. So we can at least assume that some portion of human learning is indeed associative-based and local (local in terms of how the triggered neuron interacts with its neighboring neurons to form an ‘associative bundle’ of neurons, viz., the Hebbian axiom cited above). Granting this mechanism, there must be some attempt to implement associative-style learning in any AI programming. This is what multilayer perceptrons (ML-P) and SRNs have been successfully doing over the past half century, and exclusively so. Connectionism is nothing more than souped-up associationism based on 1950s behaviorism (B.F. Skinner). (But recall the Skinner vs. Chomsky debate of 1959, where Chomsky convincingly began a revolution discrediting the naïve theory that human language & thought could ever be exclusively built upon brute association).
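As a toy illustration of such a frequency-based ‘pecking order’ (a sketch with made-up co-occurrence counts, not experimental data), associative completion can be modeled as nothing more than retrieval sorted by frequency:

```python
# Toy sketch of frequency-based associative priming (made-up counts, purely
# illustrative): given the prime 'Easter' and a phonological cue /b/,
# candidates surface strictly in order of co-occurrence frequency.

ASSOCIATIONS = {"bunny": 120, "basket": 85, "break": 40, "bonnet": 15, "bullet": 0}

def primed_completions(cue_onset: str):
    candidates = [w for w in ASSOCIATIONS if w.startswith(cue_onset)]
    # Higher frequency -> earlier retrieval: a pure associative 'pecking order'.
    return sorted(candidates, key=lambda w: ASSOCIATIONS[w], reverse=True)

print(primed_completions("b"))
# ['bunny', 'basket', 'break', 'bonnet', 'bullet'] -- 'bullet' surfaces only
# after every higher-frequency choice has been exhausted.
```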


[A-2] 2. Top-Down and Distant Neurons

However, one of the observations that came from linguistics is that symbol manipulation of rules must also be applied at times when memorization (of pattern or semantics) fails us. For instance, it was through the pioneering works of Pinker and Prince (1988), Clahsen (1999), Marcus (2000) and others (see Pinker 1999 for a general review) that we got an emerging picture of how symbolic manipulation was necessary for language. Some of the first results working with young children were that rules, say of past tense {ed}, were indeed not based on frequency at all. In other words, what we found in the above ‘Easter Bunny’ experiment (a so-called priming effect) was not found with past tense {ed} insertion—viz., there was no associative ‘pecking order’ or frequency-based cline to its usage. In fact, the stem pattern, its 1-to-1 ‘sound to meaning’ (semantics), its token-count frequency, etc. held no sway at all over the ability to insert {ed} onto a (regular) verb base (whether the stem was of high frequency value, such as ‘talk’ [[talk]ed], or low frequency value, such as ‘stalk’ [[stalk]ed]). (See Endnote in Appendix-1). ‘Regular’ is the operative word here: for irregulars such as ‘sing>sang’, like-sounding novel words could fall within that same local domain, whereby a child might say of the nonce (made-up) word ‘bling’: today I bling, yesterday I blang, based on the sound-pattern formation of {pres} to {past}, [_[ing]] to [_[ang]]—which is an extension of the frequency-based analogy of sing>sang, [_[ing]] > [_[ang]].1 (We find that young children in fact often use a Skinner-style processing in over-generalizing such patterns to *bring>brang>brung). What this meant is that, upon encountering any novel-sounding new verb, in order to make it past tense, a rule has to be applied, such that (1) if you have a verb [V], and (2) you want to make it past tense, but (3) you have no memory of a frequency effect about how to do it, then (4) you apply the ‘default’ rule … add {ed}, such that [V] => [[V]+{ed}], [wug] => [[wug]ed]. (See Berko’s ‘Wug test’, 1958). Interestingly, such studies when examining German (Clahsen) show that the German default plural {s} doesn’t even correspond with the most token-frequent German plural marking, where German plural {en} shows up as having a higher frequency distribution on the frequency cline. The fact that German uses plural {s} as the default over {en} suggests that such a default rule must NOT be sensitive to lower-level frequency, but rather is rule-based and found at the higher, more distant level of neuro-networks.

1 See web-link no. 33. See link to paper for opening remarks on a dual-mechanism account based on child language.
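A minimal dual-route sketch (illustrative only; not the cited experimental models) makes the asymmetry plain: irregulars are served by item-based, frequency-sensitive lookup, while the default rule ‘add {ed}’ applies blindly to any verb lacking a stored form, including nonce verbs:

```python
# Minimal dual-route sketch (illustrative; not Pinker's or Clahsen's model):
# Route 1 -- associative memory for stored irregulars (frequency-sensitive);
# Route 2 -- the default symbolic rule [[V]ed], applying to ANY unstored verb.

IRREGULARS = {"sing": "sang", "bring": "brought", "go": "went"}

def past_tense(verb: str) -> str:
    if verb in IRREGULARS:      # Route 1: item-based lookup
        return IRREGULARS[verb]
    return verb + "ed"          # Route 2: category-based default rule

print(past_tense("talk"))   # 'talked'  (regular, high frequency)
print(past_tense("stalk"))  # 'stalked' (regular, low frequency -- same rule)
print(past_tense("wug"))    # 'wuged'   (nonce; orthographic doubling ignored here)
```

Note that nothing in Route 2 consults frequency at all—which is the sense in which the default rule is ‘insensitive’ to the frequency cline.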


[A-2] 3. A Hybrid Model

One way to satisfy both modes of processing, as found in human language, is to find a way to implement both modes within an AI program. A natural way to do this would be to take the so-called (higher-level, more distant) tree diagrams (of a recursive nature) and map lower-level associations onto the nodes of ‘treelet’ structures. This has been suggested by cognitive scientists such as Gary Marcus, among others who wish to maintain the necessary recursive nature of human language and thought in the implementation of AI. This hybrid model would allow lower-level connectionism to work for pattern-seeking formations (of sound, meaning) while keeping the crucial recursive implementation so that productivity, novelty and creativity can flourish. (See Marcus (2001) for a discussion of the future implementation of a variety of hybrid models and why they are crucial for any future AI success).

[Treelet, rendered schematically: Fact #342 → [Subject: Box] [Predicate → [Relation: Inside] [Object: Pot]]]

Figure 8.  Version of an AI ‘Treelet’ Structure (Showing Hierarchy). Marcus (2001). ‘Box Inside Pot’ (NOT ‘Pot Inside Box’)

What we want out of an AI-generated proposition, e.g., ‘box inside pot’, is not just the associative SEMantic features {SEM-F: [+box], [+pot], [+inside]}, but also the ability to generate a hierarchical configuration that yields order—so that it is not ‘the pot that is inside the box’, but rather ‘the box that is inside (the pot)’. This can only be done via SYNtactic hierarchy, which mere semantic features do not provide. This same hierarchy applies to what we know of recursive Tense, Agreement, and Case (see Appendixes 2, 4). Hence, the treelet structure that Marcus advances is a hybrid way of being able to implement associative/semantic means at the local level in terms of nodes ([nodes] = [box] [inside] [pot]) while also capturing a recursive hierarchical/syntactic relationship which renders a specific order. This semantic vs. syntactic distinction may have antecedents in what we know of ‘low-level vs. high-level’ neuro processing—whereby locality seems to be semantically driven, involving a SEM feature [+locality], whereas proper syntax (i.e., ‘movement at a distance’) involves the SYN feature [-locality]. The treelet structure is exactly the kind of tree we find in Generative Grammar syntactic analyses.
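A small sketch (hypothetical; loosely after Marcus’s Figure 8, not his implementation) shows the point—the treelet’s hierarchy supplies the asymmetry that a flat feature bundle cannot:

```python
# Illustrative treelet sketch (hypothetical; loosely after Marcus's Figure 8):
# hierarchical roles supply the order 'box inside pot' vs. 'pot inside box'.

from dataclasses import dataclass

@dataclass
class Treelet:
    subject: str        # role assigned by position in the tree, not by feature
    relation: str
    obj: str

    def proposition(self) -> str:
        return f"{self.subject} {self.relation} {self.obj}"

fact_342 = Treelet(subject="box", relation="inside", obj="pot")
print(fact_342.proposition())              # 'box inside pot'

# A flat feature set loses the asymmetry entirely:
assert {"box", "inside", "pot"} == {"pot", "inside", "box"}
```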

[A-2] 4. Low-Level vs. High-Level Neuro Processing

At the lowest level of the hierarchical spectrum we find neuro triggering (bundling) which relies on local dependency (a proximity factor). Regarding AI algorithms, this is what we would find in so-called finite-state grammars. In finite-state grammars, statistical regularities work in local configurations—local neurons/units must bundle together. This type of low-level grammar equates to properties of Merge (linguistics): {X, Y} = {XP}, where two items/neurons/units merge together in creating a bundle or unit. This constitutes the Hebbian phrase ‘What fires together, wires together’, where a neuro-net acts upon another neuro-net when in local proximity to one another. Semantic features (SEM-F) such as those for ‘bird’—{Bird}: {SEM-F [+feathers], [+wings], [+fly]}—would be wired to local configurations within a SEM grid. In an associative/connectionist model, such local triggering shows priming effects having to do with language processing. For instance—and what turns out to be a problem for merely SEM-based AI—Gary Marcus shows how ‘penguin’, when mapped with SEM-F [+bird], [+feathers] but [-fly], triggers an output for ‘fish’ (and not ‘bird’) (Marcus 2001, p. 94). Marcus refers to this as catastrophic interference (citing McCloskey & Cohen 1989). This is the kind of problem AI faces if its Operating System (OS) simply relies on local, low-level SEM features, when the network can only pick out particular facts about an item at any one computational time-step, without the ability to generalize across multiple time-steps. What is missing is the ability to disregard certain aspects in order to save the larger derivation. It is this larger ability that is triggered by higher-level processing.
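A toy sketch of the interference problem (made-up feature grids, loosely after the penguin example; not Marcus’s simulation) shows how a purely local, overlap-counting classifier goes wrong when an exceptional item arrives:

```python
# Toy sketch of local feature-overlap interference (made-up feature grids):
# a similarity vote over SEM features, with no higher-level category rule.

PROTOTYPES = {
    "bird": {"+feathers", "+wings", "+fly"},
    "fish": {"-fly", "+swims", "+scales"},
}

def classify(features: set) -> str:
    overlap = {cat: len(features & feats) for cat, feats in PROTOTYPES.items()}
    return max(overlap, key=overlap.get)

penguin = {"+feathers", "-fly", "+swims"}
print(classify(penguin))   # 'fish' -- the [-fly]/[+swims] overlap outvotes
                           # the category-defining [+feathers]
```

What is absent is exactly what the text describes: a higher-level rule able to privilege the category-defining feature and disregard the rest.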

[A-2] 5. The Dual Mechanism: Studies on Phonology

Phonological development could also be argued to follow such a dual route. There’s plenty of evidence in the child-speech literature to suggest that the kind of phonemic development that takes place over the first year of a child’s growth mimics what we know of how sound would be perceived if it were non-linguistic and simply environmental. An environmental sound-pitch perception would not have the quality of a rule-based category and would rather follow a distribution trajectory of the kind we find in digital voice recognition—that is, the sound perception would be too close and local (i.e., too sensitive, as if denoted by very precise, digital voice-onset times for consonants and high-low, back-front signal perception for vowels). In other words, what computerized voice recognition (VR) does is match the exact frequency of a sound to a digital signal and store it that way. One of the leading problems with VR in this respect is that it is too precise. (Note: digital processing is too precise, too perfect. Humans could be said to use less precise analog processes over digital in this respect). What an analogue-based, symbolic human speech perception allows us to do is formulate a rule-based category of sound based on distinctive phonemic features (voice, place, manner, etc.) and generalize that sound to a proto-sound (one step removed from the actual digital environment). What Kuhl and Meltzoff (below) demonstrate is how the young infant moves from local/environmental perception, where ambient speech-points are spread further apart, to a more generalized stereotype sound. This progression from very precisely locative sound representations—where clusters of sound perception are spread across a broad field based on environmental disturbance—to very narrow target phonemes suggests that young children go from (i) broad perception (where every local sound signature has been perceived across a register), to (ii) a narrow proto-target phoneme which is no longer based on the actual ‘one-to-one’ environmental locative sound itself, but rather is based on a ‘many-to-one’ categorical sound/phoneme as constructed and configured out of rules and features—viz., locative individual sounds, as might be presented across the frequency of a spectrogram, become abstracted away from the actual environmental sound and get made into ‘proto-target’ phonemes; hence environmental sounds turn into a psychological representation of speech. Moving from broad phoneme representations (too sensitive) to narrow ones (less sensitive) is a hallmark of Kuhl’s native-language magnet theory. Note that this progression may be indicative of the so-called digital-to-analog shift, whereby in the first case a stimulus is environmentally associated with an exact signature (for sound, the voice-print signature comes to mind; see VR below), and then exactness becomes exchanged for a broad-sweeping generalization. Notice from Figure 9 below how the imitation of specific vowels changes over time. By 1 year of age, the infants’ spontaneous utterances reflect their strict 1-to-1 imitation of sound. Even though the fundamental capacity to imitate sound patterns is in place even earlier for children, there seems to be a dual mechanism in how they perceive sound. Kuhl and Meltzoff recorded infant utterances at 12, 16, and 20 weeks of age while the infants watched and listened to a video recording of a woman producing a vowel—either /a/, /i/, or /u/—for 5 minutes on each of 3 successive days. The results demonstrated a ‘fundamentally different processing’ change which occurred between 12 weeks and 20 weeks. At 12 weeks, there is a broad overlap of the three vowels, as would be indicative of how the child is perceiving those three sounds individually, based on exact environmental perception at that time. By 20 weeks, there is a clear separation between the three vowel categories—since now the sounds are not merely pitch sounds moving through the environmental ambient speech stream, but rather have become condensed into categories such as /a/, /i/, /u/. (Notice how deaf speakers are often unable to ‘zoom in’ on categorical speech perception of target phonemes as a result of their inability to profit from an articulatory loop). By week 20, infants clearly imitate the model, and their vowels have appropriate formant-frequency values in relation to one another, even though infants’ vowels occur in a much higher frequency range.
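Schematically (made-up formant values; not Kuhl and Meltzoff’s data), the shift can be pictured as moving from storing every exact acoustic token to snapping tokens onto a small set of category prototypes—a many-to-one mapping:

```python
# Schematic 'magnet effect' sketch (made-up (F1, F2) values in Hz; illustrative
# only): perception shifts from exact token storage (voice-print-like) to a
# many-to-one mapping onto vowel prototypes.

PROTOTYPES = {"a": (800, 1200), "i": (300, 2300), "u": (320, 800)}

def categorize(token):
    """Snap an incoming acoustic token onto the nearest vowel prototype."""
    def dist(p):
        return (token[0] - p[0]) ** 2 + (token[1] - p[1]) ** 2
    return min(PROTOTYPES, key=lambda v: dist(PROTOTYPES[v]))

tokens = [(790, 1180), (815, 1230), (310, 2250)]   # 'week-12-style' raw tokens
print([categorize(t) for t in tokens])             # ['a', 'a', 'i']
```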

Figure 9.  Magnet Effect and Clustering of Target Phonemes. (Google© ‘Free-to-Use’ Image)


Infants’ vowels recorded as they imitate an adult show developmental change between 12 and 20 weeks of age. [Reproduced with permission (Copyright 1996, Acoustical Society of America).] This shift from locative voice-print (week 12) to more abstract generalization (week 20) is analogous to the difference in processing between computer-simulated speech (voice recognition) and human speech—indicative of a ‘digital vs. analog’ processing distinction.

[A-2] 6. Background of Voice Recognition (VR)

Starting in the early 1980s, voice-recognition science took much of the research that went into neuro-net (NN) computer modeling and started to advance similar models in order to deal with speech. So-called Voice-Print (VP) spectrograms were introduced as a natural extension of the grammar models done by multilayer perceptrons. Such VPs were inherently based on the physical features that build up human speech. These, however, were physical features which could be physically parameterized as directly mapped onto the environmental ambient speech stream: features included space of sound, duration, pitch, loudness—‘real’ physical features of sound (a voice print), similar to how one would map a fingerprint. In other words, neuro-net VPs were extremely myopic in terms of what was to be mapped and processed as a sound feature. Such VP-features in essence captured the direct ‘energy bands’ of the sound. And very similar to how NN multilayer perceptrons used ‘frequency’ as the crucial aspect of their training (as was discussed in our AI section herein), so too was the VP entirely based on probability as determined by the distributional frequency of these features. These bands related both to the ‘fixed’ shape of the speaker’s vocal tract and to how the vocal tract was manipulated in space and time. A parameter was then shaped which mapped onto these precise associations. This in turn allowed the sound band to become digitized. Digital Signal Processing (DSP) now allowed an otherwise analog sound band to become a digital code (binary-based: 0, 1). This move from analog to digital is what essentially creates the fundamental-difference hypothesis between what we know of human SPeech (SP, as analog) and AI speech (VR, as digital). For VR, pattern recognition leading to a VP is based on training (repetitively so), even to the point where overtraining becomes a handicap. One of the problems VP faces is when overtraining deprives the computational model of the ability to generalize. This ‘generalizing’ is in fact what humans do—can only do—since human SP doesn’t map exact physical features onto a digital binary code (of a sound band). Other problems inherent to such a myopic model include so-called background noise, pitch shift when speaking, emotional tone, stress, excitement, softness/loudness at variable levels, etc. Note that none of these are problematic in human speech, since SP is analog and not digital. VR along with SP modeling became inherently linked to the NN multilayer-perceptron models which dealt with grammar (e.g., Hinton, Elman, regarding feed-forward and/or back-propagation algorithms), since these models were similarly capable of allowing hidden layers. For instance, with Hidden Markov Models (HMMs)—once a prewired setting (default parameters) was established (some small amount of innate architecture is required of any NN operating system)—the probability ratios between states could be determined: e.g., transition X has probability index Y. HMMs work very nicely with NN multilayer perceptrons (which also have hidden units). But still, problems surface whenever digital sequencing becomes over-trained, or when the population of speakers becomes too vast (as with large models: when new speakers/input come on the scene, the whole system has to be retrained), or when the input becomes even slightly unstable (as with background noise, or simply when a speaker has a nasal cold), etc.
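To make the HMM idea concrete, here is a minimal sketch (made-up states, transition and emission probabilities; purely illustrative, not a production VR system): once the tables are fixed (‘prewired settings’), the probability of an observation sequence along a hidden state path is just a running product:

```python
# Minimal HMM sketch (made-up probabilities; illustrative only): the
# probability of an observed sequence along a hidden state path is the product
# of transition and emission probabilities at each step.

TRANSITION = {"S1": {"S1": 0.6, "S2": 0.4},    # P(next state | current state)
              "S2": {"S1": 0.3, "S2": 0.7}}
EMISSION   = {"S1": {"a": 0.8, "i": 0.2},      # P(observed sound | state)
              "S2": {"a": 0.1, "i": 0.9}}

def path_probability(states, observations, start_p=1.0):
    p = start_p * EMISSION[states[0]][observations[0]]
    for prev, cur, obs in zip(states, states[1:], observations[1:]):
        p *= TRANSITION[prev][cur] * EMISSION[cur][obs]
    return p

# Probability that hidden path S1 -> S2 generated the observed sounds 'a', 'i':
print(path_probability(["S1", "S2"], ["a", "i"]))   # 0.8 * 0.4 * 0.9 = 0.288
```

Everything in the sketch is frequency/probability all the way down—which is precisely why slightly unstable input (noise, a nasal cold) degrades such a system in a way it does not degrade human speech perception.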

Figure 10.  Hidden Markov Model. (Figures 10–12: Google© ‘Free-to-Use’.)


Figure 11.  Multilayer Perceptron.

[A-2] 7. Human Vision

There are also theories which make the similar claim that neuro-nets underwrite the human visual system, as in all animals. The difference, however, is that human vision as based on neuro-nets may reduce to a rule-based analogy whereby foreground, background, angle of sight, and other 3D aspects enter into the visual processing, thus creating potential illusions. In this way, one could argue for a DMM visual processing insofar as neuro-nets of optics begin with optical nerves firing in a local configuration (our local-node analogy), which then get coupled with distant, rule-based interference procedures. Consider the diagram in Figure 12 below. Although both straight lines are of equal length, when figure & ground of foreground and background (Y-branches off the midline) are computationally added, an illusion is produced suggesting the top midline is longer than the bottom midline (despite the fact that both lines are actually of equal length). One could affix a rule mechanism to this effect by claiming that when the outside Y-branch breaches the mid-line frame of view, we receive rule-α, and when the inside Y-branch remains within the mid-line view, we receive rule-β (with the mid-line itself as rule-γ).

Figure 12.  Müller-​Lyer Illusion.

A2: Concluding Remarks: The Dual Mechanism | 193 There have been various experiments2 attempting to tease out human vs. non-​ human perceptual processing distinctions (if any). The same has been applied to gestalt psychology. Overall, must finding show that non-​human visual perceptions are not to be automatically assumed to be fundamentally different from humans. However, the findings may be difficult to interpret. For instance, findings reveal that dogs do apparently perceive the ‘Müller-​Lyer’ illusion. However, when appropriate controls are used, findings reveal that dogs rather use ‘global stimulus’ information rather than judging perceived line length. This might suggest that the gestalt, rule-​based application of the perception is not present in non-​ human mammals. Other studies which applied a gestalt treatment to animals show similar problems of teasing out global vs. perceived judgments. (But see the seminal works of Wilhelm Wundt and Wolfgang Köhler here).

2 See Keep, B., H. Zulch & A. Wilkinson (2018). ‘Truth is in the eye of the beholder: Perception of the Müller-Lyer illusion in dogs’. Open-access article, 05 September 2018: https://link.springer.com/article/10.3758/s13420-018-0344-z. See also Köhler, Wolfgang, The Task of Gestalt Psychology. Princeton, NJ: Princeton University Press, 1972.

Appendix-3: A Note on ‘Proto-language’: A Merge-Based Theory of Language Acquisition—Case, Agreement and Word Order Revisited

Language = Recursion, which is ‘recently evolved and unique to our species’—Hauser et al. 2002, Chomsky 2010.

∙ If there is no recursion, there can be no language. What we are left with in its stead is a (Merge-based), broad, ‘beads-on-a-string’ sound-to-meaning recurrent function: serial-sequenced, combinatory, non-conservative, and devoid of the unique properties of recursion which make human speech special. It may be ‘labeling’ (see Epstein et al.)—the breaking of the ‘combinatory serial sequencing’ found among sister-relations—that constitutes the true definition of language, since in order to label a phrase one must employ a recursive structure—JG.

∙ If Continuity is allowed to run freely, in all aspects in respect to biology, and is therefore the null hypothesis, then what we may be talking about is a ‘function’ that matures over time, and not the ‘inherent design’ (UG) which underwrites the function, since, given strong continuity claims, the design has always been there from the very beginning. It may be that the (Move-based) function ‘Recursion’ matures over time, in incremental intervals, leading to stages of child language acquisition, and in the manifestation of pidgin language. But when all is said and done, strong continuity claims do not necessarily span across other species or even intermediate phases of our own species. In fact, strong evidence suggests the contrary—that the unique recursive property found specific to our own species, early Homo sapiens (Cro-Magnon), has in fact no other antecedent that can be retraced past a date of approximately 60kya—JG.


[1] Introduction

Before advancing theories about the nature of protolanguage, it would seem that what we now know of the ‘brain-to-language’ corollary should help inform our understanding of critical issues on the topic. Along with an ‘ontogeny-to-phylogeny’ trajectory—perhaps indicative of how the stages of early child language seem to unfold and mirror what we know of language evolution in general—the best heuristic tool we have for solving the puzzle of language emergence, growth, and mastery is two-fold in nature, approximating answers to the questions: (1) what type of linguistic processing seems to be unique only to our species (species-specific)?, and (2) what levels or areas of neuro-cortical substrates seem to underwrite these unique processes? The former question is perhaps best articulated in the Hauser, Chomsky and Fitch (HCF) paper which first appeared in the journal Science in 2002. The latter has been addressed in multiple sources, the first of which to draw my attention was the Fisher & Marcus review which appeared in 2006 in the journal Nature Reviews Genetics, and others, including most recently in Larson et al. (2010) The Evolution of Human Language (see both the Lieberman and Stromswold chapters).

[2] What seems to be the locus of the question surrounding the nature of protolanguage hinges on our understanding of, first, how we should go about exactly defining ‘Language’—not, say, language with a small ‘l’ (such as French, English, Japanese), but rather Language with a capital ‘L’ (what we mean by Language in principle (question 1 above)). What has come out of the second half of the last century, in terms of our Chomskyan framework, is an attempt to demonstrate, empirically, the quite distinct notion of what had heretofore been assumed a priori by theory-internal devices—namely, that ‘No syntactic principle or processing applies directly to words, or to superficial word ordering’ (Piattelli-Palmarini 2010, p. 151).1 Rather, what the Chomskyan view grants us is a language borne of

1 See Piattelli-Palmarini’s chapter found in Larson et al. (eds., 2010). Hence, linguistic theory had to move away from traditional ‘word-based’ phrase-constituencies (VP, TP, CP) to a more abstract ‘feature-constituency’ (Distributed Morphology of an INFLectional Phrase (IP); see Halle, Marantz). Most recently, ‘phrase’ has been replaced with ‘phase’, an alignment which maps onto the so-called ‘duality of semantics’ (vP, CP)—a move in keeping with what had been articulated by prior notions of scope, c-command, Head-to-Head/Comp movement, dynamic anti-symmetry (Moro), as well as probe-goal relations: all of which have become central, abstract tenets of the theory.

a categorial nature, abstract and seemingly defiant of communicative functions, and unrelated in critical ways to any strict interface with the environment.

[3] It seems theory had to move away from traditional ‘word-based’ constituencies, such as Noun Phrase and Verb Phrase (NP, VP), and move towards more abstract constituencies dealing with the inherent features of the H(ead) of a word, along with a Head’s relationship to other Heads. (For instance, the INFLectional Phrase (or IP) came out of this tension between heads of words versus heads of features (see Footnote 1)). A cursory look at the chronological record of the generative grammar enterprise takes us from the early-1950s T-markers of transformational grammar (TG), to recursive phrase structure grammars (PSG), to the X-bar theory of the Principles & Parameters framework (P&P) of the 1980s–1990s which delivered a ‘Spec-Head-Comp’ configuration (the holy grail of the ‘Spec-Head’ relation)—all, only to be overturned most recently within the minimalist program (MP) by the prosaic ‘Head-to-Head’ relation, whereby the simple ‘Merge’ of two Heads is now the driving force behind all syntactic operations. (See ‘Note-2’ for full discussion).

[4] Much of our discussion related to the ‘four-sentences’ section of this book is deliberate in showing just how rather byzantine constraints on abstract syntactic structure defy what would otherwise be intuitively expected of a simplistic means of functional communication: e.g., why should the clitic formation of [that’s] in ‘sentence no. 4’ (found in our ‘four-sentences’ analyses)—a clitic formation that is legitimately pervasive otherwise—not be allowed?2

Sentence-4: ‘I wonder what *that’s/that is up there.’

Clearly, either form, [that’s] vs. [that] [is], should share in the ‘equal status’ of plainly being able to communicate the simple proposition; however, the clitic [that’s] in the syntactic structure found in sentence no. 4 is ungrammatical. What we could say is that while the communicative value of [that’s] is plus [+Com], its syntactic value is minus [-Syn], demonstrating that there is a dissociation between (formal) syntax and (functional) communication. (See Piattelli-Palmarini (ibid.) for other such data and analyses).

[5] For example, it is the internal categorical structure of the ‘H(ead)-features of the word’ which is now seen as projecting the outer phrase constituency: e.g., for the lexical item V (verb), it may be the categorical features related to T (tense), a

2 See ‘Four Sentences’ [Sentence #4] for a full analysis.


‘finiteness effect’, which determines its syntactic valence of how it might select for a determiner (Subject, Object, Case). The dual probe-goal phases of CP & vP, as currently understood within the Minimalist Program (MP), may similarly assign H-features which map onto the so-called duality of semantics: where phase/CP is responsible for scope and discourse-related material as well as the functional projection of AGReement (and presumably Tense), and where phase/vP maps onto argument structure (and presumably Case).

[6] Recently, it has been proposed (HCF) that it may be the sole, unique property of recursion which is behind the very underwriting of this ‘categorical nature’, and that, more specifically, these features reside as ‘edge-features’ of a phase.3 If we assume that Chomsky (in particular) and HCF (more generally) are right within the linguistics context—namely, that ‘language = recursion’, as defined in his terms by a Faculty of Language-narrow (FLn)—then our question becomes: What types of neuro-substrates serve recursion, which would be separate from other cognitive/motor-control functions? A second follow-up question, relevant to the question at hand, would be what a proto-language might look like stripped of this narrow language faculty, where a language may only show evidence of a linear sequence [A], [B], [C]…—a recurrent, but not a recursive, structure as found in [A [B [C]]]…4

Broca’s Area and Wernicke’s Area Revisited

[7] Though it still makes for a nice pedagogical device, we now realize it is quite over-simplistic to talk about a compartmentalized ‘seat of language’ in this way

3 In a [Spec [Head-Comp]] configuration, the so-called ‘edge’ would be the Spec position, away from the core inner workings of the phrase/phase. Spec is often defined as an ‘elsewhere category’ which allows for MOVE to take place, whether the Spec is serving as a host for the moved item, or is instigating the move in the first place in accordance with a Probe-Goal relation.

4 Recall, a linear recurrent model would show a potential two-word utterance as [[drink] [water]] without the necessary syntactic/recursive properties which would allow for a full expression behind the notion of someone drinking. In other words, a flat combinatory sequence would only yield two items in isolation, [x], [y]. What is lacking is the recursive syntax of [drink [drink water]], which shows MOVE allowing for a hierarchical expression. (See §[22] ‘A Summary of Labeling and how “Merge vs. Move” affects Word Order’. Also see ‘Note 2’ for a fuller discussion of ‘dynamic antisymmetry’).

that straddles the classic Broca–Wernicke divide. Although it continues to feel natural to want to map the etiology of language-specific diseases to specific cortical regions of the brain—e.g., how Parkinson’s disease (PD) presents differently from Alzheimer’s disease (AD), or how, within the Autism spectrum, Williams syndrome (WS) suffers from unique processing deficits distinct from Asperger’s syndrome (AS), etc., and, furthermore, how these distinctions might in fact show up in specific areas of the brain (Broca’s area (= PD, WS) vs. the temporal lobe (= AD, AS))—what we have rather discovered is that the surface areas which we call Broca’s area (BA) & Wernicke’s area (WA) are merely terminus levels found on the cortex. (The notion of terminus nodes which surface on the outer cortex was presented as early as 1885, when Lichtheim analyzed interconnected neuropathways between BA & WA). If BA & WA are just termini, defined post hoc merely as the gathering place where certain types of neuro-bundles gather together and fire together (under specific tasks, language tasks, etc.), then the more critical question is: What actually underwrites such specific neuro-bundles?

[8] In other words, what we must reconsider is the possibility that perhaps it is not the cortex at all that is doing the underwriting of the neuro processing (not BA, WA), but rather that the processing is being guided by more robust, underlying subcortical clusters which precisely bundle and target specific areas of cortical mapping. In other words, if BA and WA do subserve specific types of language tasks (as classically assumed), they do so owing to their mapping of subcortical-neural-circuit (SNC) triggering. The best-case scenario here for such SNC processing is what we have learned over the past 20 years regarding the functions of the basal ganglia (a group of structures found deep within the cerebral hemispheres, which includes the relay-connectivity of the putamen and thalamus, both working in tandem to form a cortical ‘feedback’ loop). Recent studies have now shown (e.g., Lieberman, p. 167, quoting Cummings, in Larson et al. 2010) that distinct regions of the frontal cortex indeed connect with their Basal Ganglia (BG) and thalamic counterparts, constituting largely segregated basal ganglia-thalamo-cortical (BTC) neuro-circuits. *(As a personal note: I have a hunch that movement operations—whether of the sort found in recursive syntax, or simply of the sort required for fluidity of body movement—are specifically pegged to the basal-ganglia area of the brain, to the extent that people suffering from MS may have difficulty finishing fluid movements, possibly as a result of lesions found in the BG).


[9] The main ‘SNC-processing’ which roughly maps onto BA is movement (MOVE), the unique ability (perhaps motor-control related) to displace an item from the surface level (phonology) to some other place in the underlying (syntactic) structure, inter alia. The basal ganglia, with its ‘looping effect’, bringing subcortical neuro-circuitry to percolate up to the surface cortex, seems to be the best-case cerebral candidate to serve the unique phenomenon of MOVE, where recursion is required to break with the flat sister-relations otherwise found in surface phenomena. (See Chapter 3 ‘Labeling Account’, as well as §[26] below, regarding recursion & dynamic antisymmetry (DA)—processes which extend otherwise flat sister-relations to having hierarchical status). Recursion has the property which allows cortical mapping of two language-specific tasks (both seemingly BA-related): viz., that of phonology, and that of inflectional morphology. While phonology as recursive is still hotly debated5 (and Chomsky doesn’t appear to be swayed by such arguments), inflectional morphology, which is defined by movement (displacement), is, to the contrary, clearly quintessentially recursive in nature.

[10] In this note, I focus only on the displacement properties of recursive syntax found in inflectional (INFL) morphology, as present in morphological Case and Agreement, both of which are INFL-related, and see what a proto-language absent such INFL/Recursion might look like (comparing data results to those of pidgin language and/or even chimp ASL, e.g., Nim Chimpsky (Terrace 1979)).

Proto-Language and Derek Bickerton

[11] I know of no more passionate advocate for a protolanguage than the late Derek Bickerton. His and his colleagues’ tireless work examining Hawaiian pidgin—as a heuristic model for what linguists should look for towards a proto-language

5 Syllable structure might be recursive due to its inherent hierarchical structure. For review, see Schreuder, Gilbers, and Quené’s paper ‘Recursion in Phonology’, Lingua 119 (2009). It also bears keeping in mind that MOVE-related diseases such as PD do seem to impact both phonology and syntax, while other studies also suggest that MOVE correlates to mouth movement, and to the planning and articulation of speech, as well as to syntax. Broca’s aphasia may impact both speech as well as syntax. What we could then say of PD is that it affects the basal ganglia along with its SNC-processing, leading to the inability to execute MOVE-based recursion, as found both in phonology and syntax.

grammar—has brought the once-taboo topic to the fore of current linguistic theory and debate. Today, the theoretical notions leading to any understanding of a putative proto-language have suddenly found their underwriter in the larger, and perhaps even more ambitious, interdisciplinary field of Biolinguistics. This brief ‘Note-1’ is in response to some thoughts on what has been laid out in Derek Bickerton’s 2014 paper ‘Some Problems for Biolinguistics’ (Biolinguistics 8). Having set up some discussion regarding the current state of the ‘biolinguistics enterprise’, and some non-trivial problems pertaining to its research framework, particular to the Minimalist Program (MP) (Chomsky 1995), Bickerton goes on to express his long-held views on the nature of a Protolanguage (§4.2)—namely, pace the given Chomskyan account, that there should be NO inherent contradiction between the coexistence of the two statements (below):

Statements:

[12] (i) ‘Statement-1’: That language is to be properly defined, very narrowly, within the terms of a Faculty of Language-narrow, an FLn which, by definition, excludes most of what is typically accepted within the linguistics community (outside MP) as defining what normally constitutes a language—viz., vocabulary, idiomatic & encyclopedic knowledge (= the lexicon), phonology (syllabic constructions), and some particular aspects of morphology (e.g., derivational processes, compounding, etc.). A layman’s classical definition of what constitutes ‘language’ is intuitively very broad in nature. But Chomsky’s definition of a language faculty (LF), to the dismay of many, is exceedingly Narrow (n): FLn is the sole property of recursion—language is exhaustively defined by the exclusive and very narrow property of recursion.

(ii) ‘Statement-2’: That a putative protolanguage theoretically exists and could serve as an intermediate step between a partial language and a full-blown FLn—viz., an intermediate language phase which would find itself tucked in between what we know of pidgin languages (an L2 attempt to formulate a rough grammar for functional communicative purposes), and perhaps chimp sign-language and other animal cognitive-scope features (of the type taught to the chimp named Nim Chimpsky (Terrace 1979)), along with other communication systems which are not on an equal par with FLn—of what Chomsky refers to as Faculty of Language-broad (FLb)—viz., ‘broad’ factors which include the aforementioned lexical-item development sensitive to frequency learning, formulaic expression, and other similar ‘frequency-sensitive’ morphological word-building processes such as compounding and derivational morphology.


[13] In other words, Bickerton's claim here is that we can accept both statements as true—they are not mutually exclusive:

(i) (FLn) Yes! 'Language-proper' is to be narrowly defined as pertaining to the sole (and, as it turns out, quite unique) property of 'recursion'; and,

(ii) (FLb) Yes! There could also be a protolanguage (by definition, an FLb) without 'recursive operations'—a language just shy of maintaining the status of a 'full-blown language' along the language spectrum.6

The two claims appear to reflect on larger dichotomy issues. Let's flesh this out below by way of the dichotomy debates: 'form vs. function', 'continuity vs. discontinuity', 'nature vs. nurture …'

The Dichotomy Debates

[14] Taking the former 'recursive property' (= syntax/FLn) as a critical aspect of a dichotomy debate (say, of continuity), one would most certainly claim the emergence and development of recursion (MOVE) to be discontinuous in nature from all other non-human primate communicative systems, perhaps accepting Gould's version of recursion as 'exaptation' at one end of the spectrum and Chomsky's single-mutation event leading to a 'pop hypothesis' at the other.7 In any case, both claims would be consistent with what Gould calls a 'punctuated equilibrium' hypothesis—i.e., that recursive language (FLn) emerged in one fell swoop, either as an exaptation from prior material (his 'spandrels') or as a completely novel structure.8 The features of the latter (FLb), 'from prior material', most certainly would maintain at least some level of continuity assumptions, as widely expressed in the language-evolution literature (e.g., Pinker & Bloom (P&B), among others).

6 Bickerton has long sought to advance an intermediate stage of language, 'a proto-language', as a grammar just shy of maintaining a fully-fledged recursive grammar. What so-called 'flat-recurrent' (non-recursive) grammars would not be able to do is creatively generate and parse constructions beyond a preconceived semantic/canonical specificity. See the endnote of this section for discussion of recurrent versus recursive grammars.

7 See Jean Aitchison (1998) for a review of the 'slow-haul' vs. 'pop hypothesis' in this context.

8 See Crow (2002) for a sudden 'genetic mutation' hypothesis (which would be akin to Gould's 'punctuated equilibrium').

P&B may be correct in assuming that there is 'somewhat' of a continuity regarding the articulation mechanism of sound/phonology (i.e., the chimp's ability for syllabic pant-hoots, and other primate syllable-vocalization capacities—though it must be said that human speech is indeed quite unique and highly specialized due to the lowering of the larynx), as well as continuity in what we would find regarding the idiomatic 'one-to-one' associative-learning mechanisms behind the mapping of 'sound/gesture to meaning' (manuofacial expression), 'cue-based' representation (in the 'here & now'), and other non-formal constructions leading to compounding and even limited syntax (lexical-root phrases such as [NP [N] + [N]] constructs which approximate possessive structures, e.g., [NP daddy book] (= daddy's book), or prosaic [VP [V] [N]] constructs which approximate Tense/Agreement, e.g., [VP daddy drink water] (= daddy drinks water), etc.). But I suppose, for Chomsky, the question is: Can we really get there from here? Can FLb turn into FLn? Really, can broad-communicative features (attributed to non-humans) as laid out in (HCF) evolve into (human) FLn? For Chomsky, the answer is an unequivocal NO!—and hence another dichotomy debate. Chomsky's now famous analogy leads us to imagine a sudden mutation (or catastrophic event) devoid of any 'bottom-up' Darwinian selective pressure for FLn: 'We know very little about what happens when 10¹⁰ neurons are crammed into something the size of a basketball …' (Chomsky 1975: 59).

These opening lines of a much longer paragraph on the topic fully commit to a top-down 'form-precedes-function' analysis regarding FLn. Bickerton has carried on with the same theme, arguing against any 'non-human to human-language continuity', when he claims that: '[T]rue language, via the emergence of syntax, was a "catastrophic event", occurring within the first few generations of Homo sapiens sapiens' (Bickerton 1995: 69).

Child-to-Adult Continuity

[15] Let's remind ourselves that Chomsky believes in 'child-to-adult' continuity (if not in 'function', then in 'form'), given that language, per se, has potential drop-off waystations on its way to a full target-grammar projection. So, for early child language utterances, the nature of their errors (functions) is rather epiphenomenal, since the underlying grammars (forms) which underwrite the syntactic templates


must be (if we assume a UG) the same 'all the way up/down' between child and adult. Though perhaps a better way to view Chomsky's remarks is to suppose that we can still tease apart 'form from function' (yet another dichotomy).9 For instance, assume that Chomsky agrees with the assertion that children first exclusively function with Merge (and not Move)—while still maintaining that the form of UG is the same, consistent with continuity. Well then, there could be space within such an argument for an emerging grammar. The hypothesis would be that young children (at the low Mean Length of Utterance (MLU) stages) would be forming the same UG as their adult counterparts while their functions would be immature, following a protracted maturational scheduling of function: UG …(stage-0)… stage-1 (Merge) … stage-2 (Move) … on their way to a full Target language (stage-T). What could be claimed then is that it's the function 'MOVE' which matures and eventually comes on-line. True, the capacity for MOVE was always there (UG); it's just that the hidden processes which map 'form to function' followed a protracted schedule. This is not unlike what we would find for the maturational development, say, of functional categories—viz., while their form is intact, as part of UG (DP, TP, CP), their mappings of 'form to function' are delayed (see Galasso 2003). Again, this form-to-function disparity could be one way to reconcile Chomsky's strong stance calling for a non-developmental UG (since the empirical observation is valid: language, at any given stage of development, never exhibits UG violations, nor do children ever exhibit 'wild grammars').

[16] Lastly, the above notion that MOVE is maturation-driven (within our Homo species) seems to nicely correlate with what Chomsky himself claims of

9 'Function-to-form': so, think baseball glove: a catcher's glove is padded due to repeated fast balls ('catch the ball softly!'), an outfielder's glove is light due to its having to be held while running for the fly-ball ('catch the ball running!'), a first-baseman's glove is extended due to the race between the ball and the batter running to tag first base ('catch the ball quickly!'). This is 'function defines form' (or function precedes form). But language seems to be the reverse (form precedes function), where the form of the (internal) mental template seems to shape the potential (external) function of language. For example, an Arabic speaker sticks out his phonological-perception glove to catch an (external) English '/p/-ball' (say, the word /P/olice) (extending the baseball 'pitch' metaphor to phonetics), but (internally) catches it as a '/b/-ball', where /polis/ (police) gets caught as /bolis/. (Arabic has no /p/ phoneme in its phonological inventory, and /p/ vs. /b/ does not make up minimal pairs.) See also discussions surrounding the dichotomy between 'functionalism vs. formalism'.

language evolution (within our Homo species): that 'Every inquiry into the evolution of language must be an inquiry into the evolution of the computational brain machinery capable of carrying out edge-feature operations' (Chomsky, MIT lecture, July 2005, cited by Piattelli-Palmarini (2010, p. 151)). Recall that what we mean by 'edge-feature' operations are those syntactic operations which can only be handled by the unique recursive property of MOVE. Also recall that there is a high level of inherent abstract symbolism involved in any MOVE-related/edge-feature operation, since such principles of MOVE (i.e., syntax) do not map onto words per se in an iconic 1-to-1 manner (as might be intuitively imagined of language), nor is there any surface word-order mapping (which might be expected of surface phonology). Rather, MOVE inherently requires the mental manipulation of categories (= symbols)—categorical concepts such as Verb, Noun, or constituency structure which breaks with surface word order. (See our 'Four-sentences' analysis for full discussion of recursive constituency.) The implication here is that in order to question the nature of language evolution, and all of its complexity, the first order of business is to address the question of determining when the first evidence of MOVE appears in the early Homo species, and whether, as a result of MOVE, other spin-off exaptations (or so-called 'hitch-hiking' free-rider adaptations) can be explained as being bundled with recursion (perhaps even neurologically bundled): I am thinking of theory of mind, shared attention, symbolism and displacement which contribute to so-called 'detached representations', altruism, dance, ceremonial practices & taboos, and other perhaps niche motor-control abilities such as tool-making capacities (which demonstrate a mental template for the design of the tool), throwing capacity, so-called 'remote threat', sheltering, cooking of food, etc.

[17] What all of the above features have in common—as a unifying thread which can lead to recursion/MOVE—is the ability to project an item, to project oneself, away from an icon, an index, and to become symbolic and categorial, both in nature, in index, and in design. What we currently know—out of all the archaic Homo species (Australopithecus africanus, Homo habilis, (early Africa) Ergaster, (late Asia) Erectus, Heidelberg, Neanderthal)—is that only Cro-Magnon10 (our

10 The brain-size trajectory most certainly would be a major contributing factor with regard to any such evolutionary-based theory for either a 'gradual development' (bottom-up) or 'sudden emergence' (top-down) of FLn. To be considered is a respective brain-size spectrum that would begin at around 450cc with Australopithecus, Erectus at 1000cc, to roughly 1500cc with Neanderthal, followed by a very slight decline in Cro-Magnon stabilizing at 1300cc.


early Homo sapiens-sapiens ancestors, say at around 40–60KYA) had emerged onto the scene (seemingly top-down) into being categorial in nature, gaining a rich symbolic system first drawn from an inner mental language (MOVE), with subsequent bootstrapping to be applied to other non-linguistic, cognitive, motor-control tasks. Once a full-blown symbolic inner-language system emerged (either via a catastrophic mutation or via exaptation), what came with it was all the 'bells and whistles' of being a member of a unique symbolic club, what today we call the 'Homo-sapiens-sapiens club'. It is now well recognized that by the time Cro-Magnon comes on the scene, having evolved in whatever which way, they came on the scene drenched in symbolism (White 1989). If they evolved at all (bottom-up), what we can say is that they evolved from an earlier time/species of not yet having recursion, to a later time/species11 when they have it—or, if you prefer, using our current linguistic terminology, from an earlier time of having FLb, to having FLn.

[18] All of these FLb linguistic factors—which, as suggested above, may have hitch-hiked from categorial symbolism in its purest form (viz., sensori-motor control, mapping of sound-to-meaning, lexical retrieval, word-building, compounding, and even derivational morphology)—are all found in the very early stages of child language acquisition (Radford & Galasso 1998; Galasso 2003, 2016), and have antecedents which can be traced back to pidgin grammars (Bickerton), and, to a large degree, even further back to what we have gleaned from non-human primate communication systems (the use of ASL by Nim, cited in Terrace). Of course, it goes without saying that notions of any putative 'somewhat'-continuity between human and non-human as regards cognitive scope, theory of mind, altruistic features, etc., must be taken with a grain of salt—viz., there really is not much continuity to speak of in these realms, and the very fact that non-human primates lack what would in humans be such simple operations surely presents us with the 'smoking gun' of discontinuity through and through. (The underlying question to ponder here is whether there might be a 'singular, unifying mode of processing' which underwrites these realms—and, neurologically, might it be related to MOVE?) In any case, for what it's worth, this dual acceptance allows for both continuity (FLb) and discontinuity (FLn) to flow out of ontogeny and phylogeny trajectories—ontogeny in terms of 'critical period' cases in which a proto-language

11 One possibility implied here is that (early-FLb) Homo erectus evolved into (late-FLn) Cro-Magnon.

(Bickerton's 'bioprogram hypothesis') may fare no better than the 'end result' of a trajectory of an individual's growth and plateau of syntax (leading to a pidgin language—in that a pidgin is in many ways discontinuous from a target L1, just as early child language shows discontinuous properties due to its lack of full recursion of syntax).

[19] In terms of phylogeny, we could assess claims which speak to how 'language-broad' evolution might be continuous in nature, with antecedents which harken back to animal cognitive capacities. In other words, Bickerton claims we can find an intermediate phase along these dichotomy spectrums—in one sense leading to a human-language (immature) capacity which would solely incorporate FLb features—including inter alia a limited lexicon with perhaps a maximum 'mean length of utterance' (MLU) count of below 3 (i.e., no more than three words per utterance), along with the complete absence of Inflectional morphology; what we would expect of pidgin-language capacity. Though it is the latter statement that Chomsky rejects, I, along with Bickerton, see no reason at all, at least conceptually, why there couldn't be a robust, albeit syntactically limited, FLb phase of child language (ontogeny) on its way to a fully-fledged FLn—and, if so, why this intermediate phase that the child passes through couldn't constitute what we would at least theoretically claim of a protolanguage (phylogeny). In one sense, this kind of argument mimics the old adage 'ontogeny recapitulates phylogeny' (first cited by Ernst Haeckel).12

[20] Chomsky's insistent belief is that there could be, conceptually, no intermediate step shy of a full language; if you have such a step, then it's merely a function of a communicative niche (as expressed above), and such a deprived system (deprived of recursion) would by its fixed nature need to remain there, as a non-evolving, non-human communicative system (= FLb). This is tantamount to saying that FLn cannot arise from FLb—viz., that there can be no continuity between FLb and FLn (not in a phylogenetic way, 'evolution', nor in an ontogenetic way, 'child maturation', as cited above). I believe (and I assume Bickerton would agree with me here) that Chomsky's assertion is too strong. Chomsky has been quite consistent ever since our reading of the 'Fitch, Hauser, and Chomsky paper' (2005)—on the topic of the nature of FLn and of language evolution—that a definition of 'Language' (a language with a capital 'L') can only be purely based on one essential property, namely the property of recursion.

12 For example, see Dan Slobin (2004).


For Chomsky et al. (2005), [language = recursion]. This very narrow definition is perhaps the only way that Chomsky can maintain his long-​held notion that language is biologically modular and human species-​specific (modular in that its function is autonomous and compartmentalized, like any other organ, e.g., the liver, stomach, lungs) and species-​specific in that its operation is uniquely situated in the human brain/​mind (presumably Broca’s area, a region which seemingly only serves recursive operations such as (inter alia) the planning of articulation leading to mouth movement, and the movement involved with syntax).

The Minimalist Program (MP) Enterprise: Resolving a Dichotomy

[21] But there does seem to be a way to reconcile both statements within the MP enterprise. Within MP, there are two types of movement (mapping with what one finds regarding the 'duality of semantics'):

(i) Local-move (= Merge, the merging of two Heads) is based on the merging of two items (two Heads (H) within a Phrase (P))—e.g., such as what we primitively find in H-H compounding (e.g., Adj+N sequences such as [black]-[bird] => [blackbird]), a simple base in-situ Verb Phrase such as [VP [V bounce] [N ball]], and non-formal sentence constructs such as 'Me go' or 'Him do it' which show a complete lack of inflectional morphology (a lack of Case and Agreement). All these constructs do show up in impoverished pidgin systems as well as in very early MLU stages of child language, and can be attributed to the kinds of features we find in non-human communicative systems of the sort famously demonstrated by the chimp named Nim (ibid). But note here that in order to know where the H of the P is, one must involve a second (later) merge operation coming on the heels of the first. In order to reach the VP derivation of the unordered set {V, N}, locate the Head {V} and label the P accordingly, the speaker must utilize what is referred to as Internal Merge (IM) (an instance of distant-Merge/MOVE), so that the unordered set {bounce, ball} becomes an ordered pair: syntactically deriving the mere twin lexical items [bounce, ball] into a fully-fledged VP [bounce [bounce, ball]]. So, we would speculate that at any impoverished, merely 'local-move' stage, we should find instances of mixed word order and a lack of inflectional morphologies leading to the absence of Case and Agreement. This is indeed what we find not only of child language (see Radford & Galasso

1998), but also what we find regarding pidgin formations, and finally what we find within the extremely curtailed limits of Nim's speech range.

(ii) Distant-merge (= MOVE) is based on the subsequent move (a second-order move) which, as a result, breaks the flat symmetry of an unordered set and allows the labeling of a Head of the phrase to be defined. In contrast to local merge, distant merge (= move) allows for a portmanteau of features and phenomena, among them the syntactic operation of movement which breaks base in-situ constructs and allows the lexical item to percolate up the syntactic tree in order to check off +Formal features (in current MP terms, as guided by the 'probe-goal' relation). Other consequences of MOVE would be the projection of Case (+/- Nominative), AGReement (Person, Number) as well as Tense, all of which are found in higher phrasal projections above the base-generated VP. Again, any lack of these higher, formal projections would have syntactic consequences. (For full discussion of 'Merge vs. Move' and 'Problems of Projection', see https://www.academia.edu/42204713/Notes_1-2_Reflections_on_Syntax_Note_1_A_Note_on_the_Dual_Mechanism_Model_Language_acquisition_vs._learning_and_the_Bell-shape_curve._Note_2_A_Note_on_Chomskys_2013_Lingua_paper_Problems_of_Projection).
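The set-theoretic difference between the two operations can be made concrete with a small sketch. What follows is a minimal expository toy in Python, not any published formalism; the function names merge and internal_merge are ours. An unordered frozenset stands in for first-order Merge, and an ordered pair stands in for the second-order (internal) Merge that labels the Head:

```python
# A minimal expository sketch (our own toy, not a published formalism).
# frozenset models the *unordered* set {a, b} of first-order Merge;
# a tuple models the *ordered pair* {head, {a, b}} derived by MOVE.

def merge(a, b):
    """First-order (external) Merge: an unordered, label-less set {a, b}."""
    return frozenset({a, b})

def internal_merge(head, merged_set):
    """Second-order (internal) Merge/MOVE: re-merge one member of the set
    above it, deriving {head, {a, b}}; the phrase is now labeled by head."""
    assert head in merged_set, "the raised item must come from within the set"
    return (head, merged_set)

base = merge("bounce", "ball")        # {bounce, ball}: no Head, no order
vp = internal_merge("bounce", base)   # {bounce, {bounce, ball}}: labeled VP

print(sorted(base))  # ['ball', 'bounce'] -- any print order is an artifact
print(vp[0])         # 'bounce' -- the label/Head is now recoverable
```

On this toy rendering, a merge-only grammar can combine items but can never answer the question 'which item is the Head?'; only the second, recursive step makes the label recoverable.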

A Summary of Labeling and How 'Merge vs. Move' Affects Word Order

[22] First-order/local merge—the simple assembly of two lexical items—creates an unordered set, say a Phrase (P) {a, b}, out of the two items. Yet there is no recursion; hence, there can be no labeling of what would constitute the Head (H) of the P. In order to derive the H of P, a second-order/distant merge must break with the set in creating an ordered pair {α, {α, β}} = P (where α = H). It is via this second-order merge (which constitutes a recursive property) that we can derive order within the P—an order which comes about as a result of the ability to label which of the two items is rendered as H.

[23] Consider, at least theoretically if not empirically,13 a young child's inability at the early mean-length-of-utterance stage (early-MLU) to derive second-order merge labeling, thus being incapable of understanding the labeling of H, rendering such otherwise adult-unambiguous structures ambiguous: e.g., [house-boat] is read and interpreted by the adult as a kind of boat (and not as a kind of house). But if we first

13 See the monograph From Merge to Move (Galasso, 2016).


examine the base-structure of the two lexical items {house, boat}, there is no way we can glean from a flat, unordered structure what the Head word of the compound [N+N] would be. This problem is in fact what we find in very early instances of child language.14 Carol Chomsky15 first found the lack of recursive operations regarding passive formations—when young children were faced with (improbable) irreversible passives (e.g., The ball was kicked by the boy/*The boy was kicked by the ball), they scored quite well. But when children were presented with reversible passives—passive interpretations which must exclusively rely on 'syntax', as opposed to irreversible passives, which were actually acquired quite early in development since 'semantics' can serve to help with the only probable interpretation—the children tested were unable to correctly demonstrate the type of movement necessary for a passive interpretation. In other words, children had a hard time with pairs such as The man was killed by the lion/The lion was killed by the man, where both readings are probable and reversible.

[24] It is interesting to note here that Grodzinsky (1986, 1990, 1995) similarly finds in Broca's aphasia subjects an inability to handle 'distance of movement' in embedded subject-relative clauses, where (i) local movement had an 'Above chance' level of acceptance/reading and where (ii) distant movement had only a 'Chance level'—e.g.,

14 And as discussed herein, such an inability for labeling would force a flat reading of the two items [drink] + [water] as two separate entities without the luxury of syntax—viz., an individual with a recursive grammar can reconstruct the two items syntactically, within a VP, such that a proposition can be generated: that 'someone is drinking/wants to drink/should drink water', etc. The same two items, recursively, get structured as [VP drink [drink water]], where the Verb 'drink' now dominates the Noun 'water' in a mother-daughter hierarchical relation [x [x, y]]. As long as the two items stand in a flat, non-recursive, recurrent manner [x, y], all one could glean from the utterance is that 'drink' and 'water' have been combined, 'stacked' in sequence, and where perhaps word order has no bearing on structure. The fact that a person (with full adult syntax) can reconstruct a meaning out of a simple two-word utterance such as 'drink water!' suggests that such bootstrapping relies on a matured mental syntax in supporting the reading. (See Note 2 herein for Dynamic Antisymmetry and Problems of Projection.)

15 Chomsky, Carol. 1969. The Acquisition of Syntax in Children from 5 to 10. Cambridge, MA: MIT Press.

a. The cat [that [__t chased the dog]] was very big. (local move = Above chance)
b. The cat [that [the dog chased __t]] was very big. (distant move = Chance)

[25] In other words, the greater difficulty in comprehending sentences in situations where syntactic form is not supported by semantic content suggests that the semantic component of grammar may play an important role in the young child's acquisition of syntactic comprehension—the latter 'semantic-content' interpretation being a product of local merge, viz., the yielding of the lexical items and how each item plays a thematic role in the sentence. In the case of 'distant-moved' items (cf. Grodzinsky), it seems that adjacency of linear order (surface phonology) takes prominence over hidden structure at a distance. (Also see Galasso 2016, chapter 8, for treatment of the Broca's aphasia data.)

[26] Distant Merge (= Move) has something to say about how we glean a Head for a given phrase. In the case of 'house-boat' ('a kind of boat', not 'a kind of house'), in order to derive the head of the Compound (C) (heads are right-branching in English Compounds) we must employ second-order distant-merge. Following Moro's work on dynamic antisymmetry, in order to label a H of a P (or C), we first must break with flat/sisterhood relations—of the kind typically associated with 'logical and' (e.g., I need to buy: 'a and b and c and d', whereby comma-insertion allows for displacement and rearrangement of 'a–d' in any order, since sister relations are symmetrical and hold no hierarchical order)—and create an antisymmetric hierarchy, such that from out of a sister, first-order/local-merge set {α, β} (showing symmetry), we derive {house {house, boat}}. In this second-order 'Move-based' structure, notice how the Noun 'house' has risen up to a higher functional node within C. It is this movement that breaks flat sister relations and creates, as Moro puts it, dynamic antisymmetry in labeling the H of C. For phrases, the same is at work. Take for example the VP [VP [{V bounce}, {N ball}]]. In order to derive the H verb 'bounce' of the P 'bounce ball' (English Hs of P are left-branching, just the reverse of what we found with C), there needs to be second-order distant move, such that the H becomes labeled as distinct from its complement: first-order local merge {bounce, ball} (showing no order) becomes second-order distant merge/move {bounce {bounce, ball}} => [VP {bounce {bounce, ball}}]. It is clear that there are instances in the child language literature where young children cannot yet discern their proper word order, e.g.,


a child may utter VP bounce ball, or ball bounce, with identical intentionality (see Galasso 2001, https://www.csun.edu/~galasso/worder.pdf). It is precisely these Distant-Merge-related structures that are missing in pidgin and which constitute the basis for a proto-language.

How 'Merge vs. Move' Affects Case Assignment

[27] Let's begin this section on Case with some basic assumptions, some of which are theory-internal (a toy sketch of the mechanisms in (5) follows the list):

(1) Case marking is a 'functional-category enterprise'—viz., a formal projection which requires movement of the case-marked item to raise out of the base-generated VP and insert into a higher functional phrase.

(2) There are three distinct (and overt) Case markings in Standard English (SE): Nominative on subjects [+Nom] (e.g., I, he/she, we), Accusative on objects [-Nom] (e.g., me, him/her, us), and Possessive/Genitive, when used as prenominal [Poss+N] [+Gen] (e.g., my, his/her, our) and when used as pronominal [N-{Poss}] (e.g., mine, his, hers, ours). Also, [+Gen] is morphemic {'s}, {of}, as in the examples [Tom's [house]] and [the house [of [Tom('s)]]] ('the house of Tom'). The morpheme {to} also serves to case-mark [-Nom/Obj] (e.g., give it to him/*give it him/give him it/*give him to it).

(3) The default, base-generated order of double arguments is [Indirect Object + Direct Object], [IO, DO]. (For theoretical discussion, see Boeckx 2008.)

(4) Case can't be doubly marked from a single verb—in this sense, Case is of a 'Probe-Goal' (PG) relation instigating an upward projection of a targeted item. Once a probe has located its goal and relevant features have been checked, the probe is no longer active in the syntactic derivation.

(5) There are at least two mechanisms for Case marking:
a. via a Structural/Configurational relation with a lexical item (in local domains)—so-called lexical Case:
i. Verb-complement, PRN => Object [-Nom]
ii. Preposition-complement, PRN => Object [-Nom]
b. via Morphemic assignment (probe-goal) with clitics {-'s}, {-to}, and {-m}
c. Otherwise, via default (or inherent) case.
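As flagged above, the decision procedure in (5) can be summarized as a toy rule ordering. The sketch below is our own expository rendering; the ordering structural > morphemic > default is an assumption for illustration, and nothing in it goes beyond the list in [27]:

```python
# Expository toy of the case-marking assumptions in [27(5)].
# Mechanisms are tried in order: (a) structural/lexical case in the
# local domain of V or P; (b) morphemic case via a clitic probe;
# (c) otherwise default (accusative) case.

CLITIC_CASE = {"'s": "+Gen", "of": "+Gen", "to": "-Nom"}  # cf. (2), (5b)

def assign_case(prn, complement_of=None, clitic=None):
    """Return the Case a pronoun checks in a given configuration."""
    if complement_of in ("V", "P"):        # (5a): structural/lexical
        return f"{prn} => -Nom (structural, complement of {complement_of})"
    if clitic in CLITIC_CASE:              # (5b): morphemic, probe-goal
        return f"{prn} => {CLITIC_CASE[clitic]} (morphemic, clitic {{{clitic}}})"
    return f"{prn} => -Nom (default)"      # (5c): default/inherent

print(assign_case("him", complement_of="V"))  # 'give him money'
print(assign_case("John", clitic="'s"))       # 'John's book'
print(assign_case("him"))                     # 'Him do it' (default)
```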

A Theory

[28] One very persistent characteristic of any putative protolanguage would be its lack of MOVE (movement operations which are motivated by, inter alia, functional features which make up a Probe-Goal (PG) relation): e.g., Case,

AGReement & Tense (and Word Order is most often the result of some movement operation whereby the surface-structure phonology is derived from an underlying hidden structure). Let's just briefly examine how MOVE might correlate to the functional feature of CASE (nominative, accusative, genitive) assignment. Theory-internal consideration: all functional features/projections must involve movement out of the base-generated VP/NP (the VP/NP being a first result and product of simple merge).

[29] Let's begin with two lexical items (they can both be Heads (H) for the time being). Consider the merging of [[John] [book]]. In this simple [N]+[N] merge operation (absent any MOVE), the two Hs are considered flat-sequenced, base-generated, and thus cannot generate any formal functional features such as Case.

(1) NP => Merge: {John, book} => [NP [N John] [N book]]

In this instance of merge, genitive/possessive case can't be assigned.

(2) Poss => Move ('John's book'), from NP => Merge ('John book'): [_'s [John, book]] => [John-'s [John, book]], with 'John' raising to the Case position projected by the clitic {-'s}.

It is rather 'Move', a recursive structure, which generates Genitive/possessives:

(3) [Poss _'s [NP [N] [N]]]
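Read procedurally, (2) is one more instance of the internal-merge toy from above, with the clitic {-'s} as the probe. A toy rendering (notation and names ours), before the theoretical moral in [30] below:

```python
# Toy rendering of (2): genitive case as a by-product of MOVE.
# The clitic {-'s} projects above the merged NP and, acting as a
# probe, attracts its goal ('John') out of the flat set {John, book}.

def derive_possessive(goal, np_set):
    """[_'s [John, book]] -> [John-'s [John, book]] (ordered, recursive)."""
    assert goal in np_set, "the probe must find its goal inside the NP"
    return (goal + "-'s", np_set)

np_set = frozenset({"John", "book"})      # merge only: no Case assignable
poss = derive_possessive("John", np_set)  # MOVE: Case position now filled
print(poss[0])                            # John-'s  (Genitive checked)
```

A merge-only (proto-language/stage-1) system, on this sketch, simply has no landing site above the set, so nothing can ever be spelled out as genitive.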

[30] What we find here, theoretically, is that MOVE is responsible for triggering possessive (Genitive) case marking. One further speculation (see the analysis below) is that such functional case markings—{-'s}, {-to}, and {-m}—function as clitics (bound morphemes) which insert directly, perhaps pulled directly from the lexicon (sometimes merely as a feature, but often as a lexical item itself, as in the case of 'to'), and which in turn motivate the raising of a lexical host, as specified


by the Head information to search out a PG relation up from a lower position within the VP/NP.

[31] From what we know of pidgin syntax, word order is often variable (e.g., Bickerton 1990; also see Galasso 2003, 2016, 2018 for accounts and analyses of early-child mixed word order). Theory-internal considerations speculate that at the exclusive merge-level—what we would find of proto-language—no word order can be fixed, since both Heads (H) {x, y}, or H and Phrase (P) {x, yp}, would serve as flat sister relations with no hierarchical dominance. In other words, in brief, fixed word order must be the result of MOVE {x {x, y}}, a recursive property, a property only seen in full-blown human language.16

[32] Given that pidgin, as well as very early child utterances, lacks a fixed word order (at the early multi-word stage) (e.g., me car, car me (= 'my car'), mommy sock, sock mommy (= 'mommy's sock'), etc.), this seems to suggest that pidgin and early child language would fall somewhere on the spectrum close to a protolanguage—if what we mean by a proto-language system is that which is devoid of any formal movement operations and is a system which only employs merge.

[33] What's also very interesting about the analysis above (and explicitly advanced in Bickerton's 1990 syntax) is that this would explain the double-possessive markings found in examples such as [The book of John's], where possessive Case for 'book' seems to be marked twice. Let's see how this might work:

(a) [John's [John book]]
(b) [of [John's [John book]]]
(c) [book [of [John's [John book]]]]

NP => Merge ('John book'): [NP [N John] [N book]]
Poss-1 => Move-α ('John's book'): [Case John-'s [NP John book]] ({-'s} shows as a clitic)
Poss-2 => Move-β: [Case book-of [John-'s [John book]]] ({-of} shows as a clitic)

16 See web-link no. 28, On Merge vs. Move in Child Language.



[34] So, the clitic-climbing is expressed as shown below, with lexical items raising up to attach to the CLitic (CL) as in a PG relation (in this sense, clitics and/or features of clitics force raising):

NP => Merge ('John book'): [NP [N John] [N book]]
Poss-1 => Move-α ('John's book'): [CL John-'s [NP [N John] [N book]]] ({-'s} shows as a clitic)
Poss-2 => Move-β: [CL book-of [CL John-'s [NP [N John] [N book]]]] ({-of} shows as a clitic)
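In procedural terms, the clitic-climbing in [34] is two successive probe-goal raisings. A toy sketch (helper name ours, reusing the tuple notation from the earlier sketches):

```python
# Toy sketch of [34]: two successive clitic probes derive the double
# possessive 'book of John's' from the flat merge [John book] (cf. [33a-c]).

def raise_to_clitic(goal, clitic, structure):
    """Probe-goal: attach `goal` to `clitic`, projected above `structure`."""
    return (f"{goal}-{clitic}", structure)

step0 = ("John", "book")                      # Merge: [John book]
step1 = raise_to_clitic("John", "'s", step0)  # Move-a: [John-'s [John book]]
step2 = raise_to_clitic("book", "of", step1)  # Move-b: [book-of [John-'s [John book]]]

print(step2)  # ("book-of", ("John-'s", ("John", "book")))
```

Each raising adds one layer of embedding, which is exactly the recursive [x [x, y]] signature that a merge-only grammar lacks.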



Let's consider further examples of how the MOVE & CLitic PG-relation triggers and projects POSSessive case. In sum, our analysis of how MOVE triggers Pronoun (PRN) Case assignment shows as follows:

[35] Case

(a) +Nom (Merge/Local): [I [me …]] (= 'I like John'), where the PRN 'I' is nominative case.
(b) +Gen (Merge): [his [him …]] (= 'His book'), where the Genitive PRN 'his' is POSS case.

So, the Case-marked syntactic tree looks like this:

(a) [Case (=> move) [N [N/V]] (=> merge)]






[36] Move-based Case (where accusative is the default case):
a. From accusative to nominative via Move: [I [me do it]], [He [him do it]]
b. From accusative to genitive via Move: [my [me dolly]], [his [him car]]

Case in Double Object Constructs

In double-object constructions, when PRNs are employed—which must present overt case marking, e.g., I, me, my/he, him, his/they, them, their (Nominative-subject, Accusative-object, Genitive-possessive)—we see how MOVE can trigger Case assignment. Consider the distinction between the two sentences:

[37] (a) John gave … [NP Mary] [NP money] (Indirect Obj, Direct Obj)
(b) *John gave … [NP money] [NP Mary] (DO, IO)



[38] a. 'Give [him money/it]!' is acceptable, whereas b. *'Give [money/it him]!' is unacceptable due to there being no 'PG-relation' to enable the Case-marking to be checked off on the pronoun 'him' (noting that pronouns in English need the checking-off of the overt Case-marking feature, unlike Nouns, which require no Case). It can be argued that only in (a) does the PRN 'him' remain in a PG configuration whereby it can receive and check off the accusative [-Nom] feature structurally via the verb 'give' (cf. [27, (5a, i)]). But nothing hinges on that treatment: otherwise, Case is acquired via default. Also note that there seems to be a preference for the structure in 'Give him money' over ?*'Give him it', suggesting that the PRN 'it' is more sensitive to the right configurational position leading to Case-marking than is its Noun counterpart 'money' (again, since Nouns in SE don't require case-marking). (Below we note that in order to save the derivation found in (b), 'him' must raise to be structurally adjacent to the verb, a position that would be forced by a PG-relation in any event.)

'Him' Raising

[39] In addition to a potential default setting (where the configuration of word order does not matter), another work-around may be to assume that, in order to save the ungrammatical derivation in *(b), movement of the Noun 'money' found in (a) can optionally be employed, allowing the case-marking morpheme {to} to

attach to an appropriate stem—now the case-marking clitic {to}, attached to the N, serves as the probe of a PG-relation, attracting the goal PRN 'him' to raise into a local/adjacent domain in order to receive Case:

'Give [money-(to) [him money]]!' => Probe-Goal relation.



[40] So, restating what was said in (5), we have three ways in which the PRN 'him' gets case-marked:

(1) Structural, via the verb {give}: a. John gave him money (to) him. (= 'him' raising)
(2) Morphemic, via the clitic {to}: b. John gave money-to him money. (= 'money' raising)
(3) Lexical, via [Verb + Preposition {to}]: c. John gave (to) him money. (similar to (a))
d. *John gave money him. (unacceptable)

Lexical Case must derive from a [Verb+Prep] configuration: John [V gave [PP to him]] money, as separate from [Noun + Clitic], as found in the structure John gave [N/CL money-to [him]].

(4) The ungrammatical sentence in d., *John gave money him, is due to there being no case-marking PG-relation for the PRN 'him', it being stranded in a non-configurational manner without a local domain in which to check off its Case feature (i.e., neither a morphemic, structural, nor lexical configuration is available), and a putative ACC default can't be employed within a structural configuration—viz., an accusative 'default status' can only emerge as a result of a non-configurational environment (i.e., 'no structure'). This is exactly what we would expect if Case were a functional projection triggered by MOVE (an operation lacking at the early stages of child language and pidgin, and certainly not found in non-human primate communication). So, with raising (MOVE) we derive DO-IO and render the sentence:

[41] a. John gave money-to Mary.

[CL(itic Phrase) money-to [NP [N Mary] [N money]]] (Case checked via the clitic {to})




With the underlying structure showing the PG relation between {to} & 'Mary', and the raising of 'money' to serve as a host to the clitic {to}:

[42] John gave money-to [Mary money].



Recall, as noted in our lexical case-marking treatment above, that another work-around involving Case via prepositional {to} is to analyze the Verb Phrase (VP) [give+N] as the right kind of projection that can allow prepositions as its complement, thus allowing case marking to be applied lexically (lexical case, given that Heads of PP can only assign Accusative [-Nom] Case in SE). (Again, where {to} now serves as a case-marking clitic, this brings the number of morphemic case-marking clitics to three: to, of, 's.) The PRN 'her' (for 'Mary') must be case-marked by the clitic {to}, just as in the other cases involving {of} and {'s}. Note again how we cannot say *'I want to give money Mary'. This sentence is unacceptable since neither the proper name 'Mary' nor its object/pronoun counterpart 'her' would be case-marked. But both I want to give money to Mary and I want to give Mary money are fine.

Note: The sentence I want to give Mary money brings our attention to the problem of how 'Mary' gets case-marked—since no necessary movement has been employed out of the base-generated [IO, DO], and lexical {to} Case checking is absent. Well, 'Mary', first of all, is not overtly marked (only Pronouns in SE get overtly marked, perhaps explaining why we can't say *'I want to give to Mary money'), and secondly, we have a mechanism easy enough to take care of Case: that of Structural case via the verb 'give'. [Give [Mary]] is in the structural local domain where the verb 'give' is allowed to case-mark its complement/object 'Mary/her'. Since lexical case-marking is present, the extra insertion of case-marking {to} would be redundant, and thus ungrammatical, as seen in the contrast between the two structures *'I want to give to Mary/her money' vs. 'I want to give money to Mary/her'.

In summary of this section, a structure such as 'give it to him' derives via the following steps:

[43]

[VP [V give] [him, it]] (Merge = base order: 'give him it')



(1) Note how *'give to him it' is unacceptable, while 'give it to him' is acceptable.
(2) The ProNoun 'him' is case-marked via the base-generated/in-situ verb 'give' (or otherwise by default). But the PRN 'it' must also be case-marked (overtly so). So the PRN 'it' must raise to the clitic 'to' (morphemic case marking). Note the grammatical contrasts of:
(a) *give it him. (where the PRN 'him' is left stranded without a case-assigning configuration)
(b) give it to him. (where the PRN 'him' now receives a proper case-assigning configuration via the CLitic {to})
(c) give him it. => base-generated order [IO, DO]

CL = Move Acc

it-to

Merge => base-generated/in-situ him

it



In the illicit example found in (a), case marking cannot be doubly assigned to the PRN 'it', because the dative verb 'give' has (earlier) already assigned case to 'him' via the base-generated/in-situ order ('Give him it!') (ex. (c)).

[44] Given that the morpheme {to} now has a dual status (preposition and case marker), let's consider the contrast regarding phonology/stress (schwa reduction) between the clitic case-marker {to} and the prepositional {to}—two very different items:

(a) John gave (to) him coins. (base order): [V+PP] => {to} is a Preposition (no schwa reduction)
(i) *John gave him them
(ii) John gave them to him
(In (a) above, it appears that when the Prep {to} is removed, and the case of 'him' can be assigned via the verb, what happens is that the PRN 'them' is then left stranded without a case-assigning configuration. The structure in (ii) corrects this.)

(b) John gave coins-to him __: [N+CL] => {to} is a CLitic case marker (schwa reduction is allowed)

In (a), 'to' is a preposition and no schwa reduction is observed; 'to', if pronounced, must either be pronounced with stress /tú/ or may be deleted (whereby 'him'


receives structural case). (Recall, the Noun 'coins' doesn't have to raise to be overtly case-marked, and so it may remain in-situ in base order with default case. Only Pronouns in SE get overtly case-marked.) In contrast to (c) below, the stress in (b) is weakened to a schwa:

(c) (i) John gave it-to him. ({to}-Clitic)
(ii) *John gave to him it. ({to}-Prep)

Recall that, for example (c) above, 'him' has already been case-marked structurally (in base-generated order) via the verb 'give' (cf. [40 (1)]) (*and so the prepositional structure in (c, ii) above is redundant for case marking). But the PRN 'it' still must be overtly case-marked, so raising to the clitic {to} (as in a PG relation) is forced. Note that {to} in this respect, as found in (c, i), is a 'case-marking' clitic, and not a preposition—noting the possible phonological/stress 'schwa reduction': e.g., 'John gave it /tə/ him' as opposed to 'John gives to /tú/ him so much money'.

Agreement

[45] Another formal feature we could consider is Agreement, which contains the dual features of Person and Number. For instance, if the subject 'he' is [3rd person, singular], then the verb must carry the matching inflection (so-called subject-verb AGReement): e.g., 'He drives'. We could speculate that the default setting here is the non-affixed infinitive verb stem 'drive', so that a child at the very early multi-word stage might say 'Daddy drive', whereby no AGR feature is present. Coupled with a default Accusative case (as suggested above), we might expect to find utterances of the type 'Him drive car'—and we do. Let's consider some token examples of such utterances dealing with the absence of Case and AGR below, as we draw our attention to what a speculative Protolanguage would sound like.
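The default-stem idea can be phrased as a simple spell-out rule: realize the agreeing affix only when the functional AGR projection is available; otherwise the bare stem surfaces. A toy sketch (feature names ours):

```python
# Toy sketch of [45]: subject-verb AGReement as feature matching.
# Without a functional AGR projection (early child stage-1, pidgin,
# a putative proto-language), the bare default stem surfaces.

def spell_out(stem, person, number, agr_projection=True):
    if agr_projection and person == 3 and number == "sg":
        return stem + "s"   # checked [3sg] feature -> 'drives'
    return stem             # default: uninflected stem -> 'drive'

print(spell_out("drive", 3, "sg"))                        # drives ('He drives')
print(spell_out("drive", 3, "sg", agr_projection=False))  # drive  ('Him drive car')
```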

Postscript Note. On Protolanguage: A Merge-Based Theory of Language Acquisition

∙ Language is recursion, which is 'recently evolved and unique to our species' (Hauser et al. 2002; Chomsky 2010).
∙ If there is no recursion, there can be no language. What we are left with in its stead is a broad 'sound-to-meaning' communicative act devoid of the unique properties which make human speech special. It may be 'labelling' (see Epstein et al.) that constitutes

the true definition of language—since, in order to label a phrase, one must employ a recursive structure. —jg

There has been no more passionate advocate for a protolanguage than Derek Bickerton. His and his colleagues' tireless work examining Hawaiian pidgin—as a model for what linguists should look for towards a proto-language grammar—has brought the once taboo topic to the fore of current linguistic theory. Today, the theoretical notions leading to any understanding of a putative proto-language have suddenly found their underwriter in the larger, and perhaps even more ambitious, interdisciplinary field of Biolinguistics. This brief note is in response to some thoughts on what has been laid out in Derek Bickerton's 2014 paper 'Some Problems for Biolinguistics' (Biolinguistics 8). Having set up some discussion regarding the current state of the 'biolinguistics enterprise', and some non-trivial problems pertaining to its research framework, particularly as they pertain to the Minimalist Program (MP) (Chomsky 1995), Bickerton goes on to express his long-held views on the nature of a Protolanguage (§4.2)—namely, pace the given Chomskyan account, that there should be NO inherent contradiction between the coexistence of the two statements:

(i) That language is to be properly defined, very narrowly, within the terms of a Language Faculty (narrow), an FLn which, by definition, excludes most of what is typically accepted within the linguistics community (outside MP) as defining what normally constitutes a language (e.g., vocabulary, phonology, and some particular aspects of morphology, such as derivational processes, etc.). That FLn is the sole property of recursion.

(ii) That a putative protolanguage theoretically exists and could serve as an intermediate step between a partial language and a full-blown FLn—viz., an intermediate language phase which would find itself tucked in between what we know of pidgin languages (an L2 attempt to formulate a rough grammar for functional communicative purposes), and perhaps chimp sign-language (of the type taught to the chimp named Nim Chimpsky (Terrace 1979)), along with other communication systems which are not on a par with FLn—what Chomsky refers to as FLb: 'broad' factors which include lexical-item development sensitive to frequency learning, and other similar 'frequency-sensitive' morphological word-building processes (compounding and derivational morphology).

In other words, Bickerton's claim here is that we can accept both statements: yes, 'language' is to be narrowly defined as pertaining to the sole (and, as it turns out,


quite unique) property of 'recursion'; and yes, there could be a protolanguage (by definition) without 'recursive operations'—a language just shy of maintaining the status of a 'full-blown language'. In other words, Bickerton claims we can find an intermediate phase along the developmental spectrum leading to a fully-fledged human language which would solely incorporate FLb, including inter alia a limited lexicon with perhaps a maximum 'mean length of utterance' (MLU) count of below 3—i.e., no more than three words per utterance—along with the complete absence of Inflectional morphology. Though it is the latter statement that Chomsky rejects, I, along with Bickerton, see no reason at all, at least conceptually, why there couldn't be a robust (albeit syntactically limited) FLb phase of child language on its way to a fully-fledged FLn, and if so, why this intermediate phase couldn't constitute what we would at least theoretically see of a protolanguage. In one sense, this kind of argument mimics the old adage 'ontogeny recapitulates phylogeny' (first cited by Ernst Haeckel).17

Chomsky's insistent belief is that there could be no intermediate step shy of a full language; if you have such a step, then it's merely a function of a communicative niche (as expressed above), and such a deprived system (deprived of recursion) would by its fixed nature need to remain there, as a non-evolving communicative system. This is tantamount to saying that FLn cannot arise from FLb (not in a phylogenetic way, 'evolution', nor in an ontogenetic way, 'child maturation', as cited above). Chomsky has been quite consistent ever since our reading of the 'Fitch, Hauser, and Chomsky paper' (2005)—on the topic of the nature of FLn and of language evolution—that a definition of 'Language' (a language with a capital 'L') can only be purely based on one essential property, namely the property of recursion. For Chomsky et al. (2005), [language = recursion]. This very narrow definition is perhaps the only way that Chomsky can maintain his long-held notion that language is biologically modular and human species-specific (modular in that it functions like any other organ, e.g., the liver, stomach, lungs) and species-specific in that its operation is uniquely situated in the human brain/mind (presumably Broca's area, a region which seemingly only serves recursive operations such as, inter alia, the planning of articulation leading to mouth movement, and the movement involved with syntax). But there does seem to be a way to reconcile both statements within the MP enterprise. Within MP, there are two types of movement (mapping with what one finds regarding the 'duality of semantics'):

17 For example, see Dan Slobin (2004).

(i) Local-move (= Merge) is based on the merging of two items—e.g., a Verb Phrase [VP [V bounce] [N ball]]. But note here that in order to know where the Head of the Phrase (P) is, one must involve a second (later) merge operation coming on the heels of the first. In order to reach the VP derivation of the unordered set {V, N}, locate the Head {V} and label the P accordingly, the speaker must utilize what is referred to as Internal Merge (IM) (an instance of distant-Merge/MOVE), so that the unordered set {bounce, ball} becomes an ordered pair: syntactically deriving the mere twin lexical items [bounce, ball] into a fully-fledged VP [bounce [bounce, ball]].

(ii) Distant-merge (= MOVE) is based on the subsequent move (a second-order move) which, as a result, breaks the flat symmetry of an unordered set and allows the labelling of a Head of the phrase to be defined.

First-order/local merge—the simple assembly of two lexical items—creates an unordered set, say a Phrase (P) {a, b}, out of the two items. Yet there is no recursion; hence, there can be no labeling of what would constitute the Head (H) of the P. In order to derive the H of P, a second-order/distant merge must break with the set in creating an ordered pair {α, {α, β}} = P (where α = H). It is via this second-order merge (which constitutes a recursive property) that we can derive order within the P—an order which comes about as a result of the ability to label which of the two items is rendered as H. Consider, at least theoretically if not empirically,18 a young child's inability (at the early MLU stage) to derive second-order merge labeling, thus being incapable of understanding the labeling of H, rendering such otherwise adult-unambiguous structures ambiguous: e.g., [house-boat] is read and interpreted as a kind of boat (and not as a kind of house). But, if we first examine the base-structure of the two lexical items {house, boat}, there is no way we can glean from a flat, unordered structure what the Head word of the compound [N+N] would be. This problem is in fact what we find in very early instances of child language. Carol Chomsky19 first found the lack of recursive operations regarding passive formations—that when young children were faced with (improbable) irreversible passives (e.g., The ball was kicked by the boy/*The boy was kicked by the ball) they scored quite well. But when children were presented with reversible passives—passive interpretations which must exclusively rely on 'syntax', as opposed to irreversible passives

18 See Galasso, From Merge to Move (LINCOM Publishing).
19 Chomsky, Carol. 1969. The Acquisition of Syntax in Children from 5 to 10. Cambridge, MA: MIT Press.


which were actually acquired quite early in development since 'semantics' can serve to help with the only probable interpretation—the children tested were unable to correctly demonstrate the type of movement necessary for a passive interpretation. In other words, children had a hard time with pairs such as The man was killed by the lion/The lion was killed by the man, where both readings are probable and reversible. In other words, the greater difficulty in comprehending sentences when syntactic form is not supported by semantic content suggests that the semantic component of grammar may play an important role in the young child's acquisition of syntactic comprehension—the latter 'semantic-content' interpretation being a product of local merge, viz., the yielding of the lexical items and how each item plays a thematic role in the sentence. Distant Merge (= Move) has something to say about how we glean a Head for a given phrase. In the case of 'house-boat' (a kind of boat, not a kind of house), in order to derive the head of the Compound (C) (heads are right-branching in English Compounds) we must employ second-order distant-merge. Following Moro's work on dynamic antisymmetry, in order to label a H of a P (or C), we first must break with flat/sisterhood relations—of the kind typically associated with 'logical and' (e.g., I need to buy: 'a and b and c and d', whereby comma-insertion allows for displacement and rearrangement of 'a–d' in any order, since sister-relations are symmetrical and hold no hierarchical order)—and create an antisymmetric hierarchy, such that from out of a sister, first-order/local-merge set {α, β} (showing symmetry), we derive {house {house, boat}}. In this second-order 'Move-based' structure, notice how the Noun 'house' has risen up to a higher functional node within C. It is this movement that breaks flat sister relations and creates, as Moro puts it, dynamic antisymmetry in labeling the H of C. For phrases, the same is at work. Take for example the VP [VP [{V bounce}, {N ball}]]. In order to derive the H verb 'bounce' of the P 'bounce ball' (English Hs of P are left-branching, just the reverse of what we found with C), there needs to be second-order distant move, such that the H becomes labeled as distinct from its complement: first-order local merge {bounce, ball} (showing no order) becomes second-order distant merge/move {bounce {bounce, ball}} => [VP {bounce {bounce, ball}}]. It is clear that there are instances in the child language literature where young children cannot yet discern their proper word order: e.g., a child may utter VP bounce ball, or ball bounce, with identical intentionality (see Galasso 2001, https://www.csun.edu/~galasso/worder.pdf).

For Works Cited, consult: https://www.academia.edu/42204283/Working_Papers_Reflections_on_Syntax_References_and_Links

Appendix-4
Concluding Remarks: Lack of Recursion Found in Protolanguage

Theoretical assumptions towards a Protolanguage suggest that we today have remnants of what a historical protolanguage might have looked like, right here in our own backyards (literally), if we closely examine what we find in early child grammar. The idea here is a kind of 'ontogeny recapitulates phylogeny', in the sense that the child (over a brief period of several months) reenacts the long-protracted developmental stages of human language acquisition in such a way that it mirrors the same periods of language evolution over what would be a span of several hundreds of thousands of years. Individual 'child months' translate into human-evolution 'millennia'. Two remnants come to mind: (i) child language syntax, and (ii) pidgin syntax.

A Pidgin language is defined as an informal attempt (not really a conscious attempt, but rather a subconscious and passive one) to acquire a foreign second language without the use of formal training. In other words, a pidgin L2 is the outcome of what the human brain/mind allows to be fashioned given the onset of the critical period, which delimits the plasticity of the brain to absorb a full language as otherwise achieved for a first/native language. The final outcome-stage of a Pidgin is what we find of the early, developmental stages of child language. In this sense, pidgin language maps onto what we know of early child syntax—viz., both showing a lack of the recursive properties


which renders an impoverished syntax. Hence, a subsequent question might be to ask whether an 'ontogeny-to-phylogeny' progression holds for a putative proto-language. In the short overview that follows, of data of early child non-recursive syntax (the same type of recursive omissions as would be seen in pidgin language), we entertain the notion that a proto-language might show very similar omissions of Tense, Agreement, Case, etc. Below is an overview of child syntax. (For a full discussion of Protolanguage, see Bickerton 2010.) Also see the papers:

https://www.academia.edu/42933589/Note_3_A_Note_on_Proto-language_A_merge-based_theory_of_language_acquisition-Case_Agreement_and_Word_Order_Revisited

https://www.academia.edu/42933615/Postscript_Note._On_Proto-language_a_merge-based_theory_of_language_acquisition

[A-4]1. Some Data Taken from Child Language Acquisition

Premising the above lack of any formal features in a protolanguage, what one would find are the following generalizations: a Protolanguage would emerge with a lack of agreement inflectional morphology—whereby a single presentation of stem-morphology would act as a default across the morphosyntactic paradigm—and a lack of Case (nominative, genitive)—whereby a default setting would spread across the entire case paradigm. Examples: (See Radford & Galasso 1998 for review).

Morphological Case

Consider the two-pronged stage regarding Case by application of the following morpho-syntactic tree diagram, showing stage-1 sequential/recurrent [x, y] versus stage-2 recursive [x [x, y]]. This same application can be used for utterances such as 'me car' vs. 'my car' (with 'me' raising to the Case-marking Clitic (CL) position whereby the Genitive/possessive Case-feature is checked) [My [Me car]], as well as with Accusative/default Case (Me) vs. Nominative Case (I) (where VP-internal 'Me' [VP Me do it] raises to Spec of IP/TP (a functional projection) and gets pronounced as 'I' [IP I [VP me do it]], in accordance with the checking off of the [+Nom] case feature in IP/TP). (See data below):


[A-4]2. Theory

(i) → 'John money' (stage-1) Non-possessive (double noun sequence)

[CL Case ø [NP [N John] [N money]]]  => sequential, recurrent [x, y]

(ii) → 'John's money' (stage-2) //Possessive Case {'s}

[CL [John ['s]] [NP [N John] [N money]]]  => recursive [x [x, y]]
(Movement of 'John' to CL-possessive position)



(iii) → 'Him money' (stage-1)

[CL Case ø [NP [N him] [N money]]]
(showing No case)



(iv) → 'His money' (stage-2) //Possessive case {His}

[CL [His] [NP [N him] [N money]]]  => recursive [x [x, y]] => Stage-2
(movement of 'him' to CL-possessive position)




(v) → 'Car go' (stage-1) vs. 'Car goes' (stage-2)

[TP T ø [VP [N car] [V go]]]  => Stage-1



(vi) → 'Car goes' (Tense Phrase at stage-2)

[TP T {s} [VP [N car] [V [[go]es]]]]  => Stage-2: recursive [x [x, y]]
(affix/Tense {s} lowering)
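One way to visualize the stage-1/stage-2 contrast running through (i)–(vi) is as a difference in embedding depth. Below is a hedged toy sketch (my own encoding, with nested tuples standing in for the labeled brackets above): stage-1 utterances are flat two-member merges, while stage-2 utterances embed that merge under a functional (CL/TP) layer.

```python
# Toy encodings of the structures above (tuples stand in for brackets).
stage1_poss = ("John", "money")                # [x, y]     'John money'
stage2_poss = ("John's", ("John", "money"))    # [x [x, y]] 'John's money'

def depth(node):
    """Embedding depth: a flat merge has depth 1; a recursive
    (move-derived) structure has depth >= 2."""
    if not isinstance(node, tuple):
        return 0
    return 1 + max(depth(child) for child in node)

print(depth(stage1_poss))   # 1 -> recurrent only: no Case/CL projection
print(depth(stage2_poss))   # 2 -> recursive: possessor raised into CL
```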



[A-4]3. Data

Lack of Possessives
Daddy car. That me car. Have me shoe. Me and Daddy (= Mine and Daddy's). Where me car? I want me car. I want me bottle. I want me woof. I want me duck. That me chair. Where me Q-car? No me, daddy (= It isn't mine, Daddy). Me pasta. Mine pasta. My pasta. In my key. It my (= It's mine). No book my (= The book isn't mine). No you train. It's him house. Daddy car.

Lack of Subject Case
Me wet. Him is hiding. Him dead. Him is my friend. What him doing? Me do it. Him is alright.

Evidence for a Discontinuity model is striking. For instance, Radford and Galasso (1998), Galasso (1999, 2003) and Radford (2000) provide English data showing that children enter into a 'No Agreement'/'No Inflection' initial stage-one of acquisition during which they completely omit functional categories and [-Interp] complex features.


[A-4]4. Stage-1: 'No AGReement-No INFLection' (Radford & Galasso 1998)

Possessives: That Mommy car. Daddy car. That me car. Me dolly. No baby bike. Me and daddy (= Mine and daddy's). Him name. Where me car? I want me car. Have me shoe. I want me bottle.* It me.
Question: Where Daddy car? This you pen? What him doing?
Declarative: Baby have bottle. Car go. Me wet. Me playing. Him dead.

*('I want' examples are analyzed as formulaic chunking, since no other supportive material providing for a functional analysis of nominative case is found at the relevant stage).

[A-4]5. Statistics for Stage-1 vs. Stage-2: Possessive {'s}/Verbal {s} (Table 1)

(1) OCCURRENCE IN OBLIGATORY CONTEXTS

Table 1  Statistics of Stage-1 vs. Stage-2 Possessive {'s}, Verbal {s}

AGE                    3sgPres {s}       Poss {'s}
Stage-1, 2;3–3;1       0/69 (0%)         0/118 (0%)
Stage-2, 3;2–3;6       72/168 (43%)      14/60 (23%)

Note that a Verbal {s} carries a two-fold marking of Tense (T) and Agreement (Agr). As a cover labeling, what we are suggesting here is that our child's stage-1 (2;3–3;1 years of age) shows a lack of Case and Agreement across the board. Hence, Stage-1 shows a lack of recursive structure. It is not until a matured stage-2 (3;2–3;6) that we begin to find evidence of an optional Case, T, AGR stage. This stage-2 is sometimes referred to as an Inflectional Phrase (IP) stage (or Optional Infinitive stage, cf. Wexler 1994).


[A-4]6. Frequency of Occurrence of First Person Singular Possessors (Table 2)

Table 2  Frequency of Occurrence of First-Person Singular Possessors

AGE          OBJECTIVE ME     GENITIVE MY/MINE     NOMINATIVE I
2;6–2;8      53/55 (96%)      2/55 (4%)            0/55 (0%)
2;9          11/25 (44%)      14/25 (56%)          0/25 (0%)
2;10         4/14 (29%)       10/14 (71%)          0/14 (0%)
2;11         5/24 (21%)       19/24 (79%)          0/24 (0%)
3;0          4/54 (7%)        50/54 (93%)          0/54 (0%)
3;1–3;6      6/231 (3%)       225/231 (97%)        0/231 (0%)

[A-4]7. Stage-2: 'OPtional AGReement-INFLection'*

Possessives: That's Mommy's car. My dolly. Baby's bike. His name.
Question: Where's Daddy's car? This is your pen? What (is) he doing?
Declarative: Baby has bottle. Car goes. I'm wet. I'm playing. He's dead.

*(The OI stage (as suggested by Wexler 1994) would simultaneously incorporate both data sets as described in his initial Optional Infinitive stage-1. Radford and Galasso make a clear demarcation between the two stages, with the complete absence of any optional functional projections in their stage-1. For complete data/analyses, see Galasso 2003).

(i) Possessive projections, which rely on an AGReement relation with a nominal INFL, must default to an objective case (e.g., my to me);
(ii) Verb projections are limited to VPs without INFLection (hence auxiliary-less questions and declarative bare verb stems) (e.g., What him doing?, Car go.);
(iii) Subjects, which rely on an AGReement with a verbal INFL, must default to an objective case (e.g., Me wet).

Consider the syntactic structures below pairing the two data sets, with stage-one showing no inflectional phrase (IP) agreement.


[A-4]8. Structure: Stage-One (-AGR) vs. Stage-Two (+AGR)

(i) Possessive:
  Stage-One: *[IP Mummy [I {-agr} ø] car], [IP Me [I {-agr}] dolly]
  Stage-Two: [IP Mummy [I {+agr} 's [car]]], [IP My [I {+agr} [dolly]]]

(ii) Case:
  Stage-One: [IP Him [I {-agr}] dead], [IP Me [I {-agr}] wet]
  Stage-Two: [IP He [I {+agr} 's [dead]]], [IP I [I {+agr} 'm [wet]]]

(iii) Verb:
  Stage-One: [IP Baby [I {-agr}] have], [IP Car [I {-agr}] go-ø]
  Stage-Two: [IP Baby [I {+agr} [has]]], [IP Car [I {+agr} [go-es]]]

What we are suggesting here is that, similar to what we find in Pidgin language, a putative Proto-language would also bear out such hallmark non-recursive structures, resulting in the complete lack of Tense, Agreement, Case, and, presumably, fixed word order. (For early child word-order variability, see just below). Regarding word order, a protolanguage would not have the movement capacity to fix where the head of a phrase is positioned, resulting in a variety of mixed word orders. Tense would need to be signaled by semantic means (perhaps by words such as today, yesterday, tomorrow), and morphological case, as well as agreement, would be non-existent. Language would be reduced, for the most part, to mere iconic representations, with little if any evidence for symbolic, rule-based processing. For the most part, these characteristics are exactly what Bickerton (1990) claims for a protolanguage. Such a proto-language could be extended to what we might find of a speculative Neanderthal communicative system—namely, pointing to both qualitative and quantitative distinctions found between a proto-language used by Neanderthal vs. a true/abstract language used by Cro-Magnon: the former being based on a recurrent algorithm, the latter being based on recursion.

[A-4]9. A Brief Look at Early Child Word Order

As a snap-shot of what early child multi-word word order might look like—whereby only a recurrent merge operation is available and where recursive move has yet to develop in the brain, viz., Broca's area is yet to be fully on-line (see Wakefield & Wilcox cited herein)—what one would expect, as with proto-language, is that there would be plenty of instances in the child language literature showing mixed


word order. In fact, with just a cursory look into my own data (see Galasso 1998), what one finds is that whenever only single argument strings (SAS) get uttered by the child (ages 18–30 months, files 8–16), there is a prevalence of mixed usages of SV & VS orders. Such SASs would be a prime environment for instances of simpleton merge-operations whereby only two items combine in the merge. In fact, if looked at carefully enough, there is a rich history of cited mixed word order in the child language literature (e.g., Bloom 1970; Bowerman 1973, 1990; Deuchar 1993; Travis 1984; and Tsimpli 1992; among others). Consider some token counts in the final Table 3 below:

Table 3  Verb-Subject (VS) Structures/Token Counts (Word Order)

VS structures with SASs ([VP V Subj] = merge):
b.'  run baby
f.'  cook daddy
j.'  work bike
n.'  go plane

Token counts of VS SASs (files 8–16):
SV: 87    VS: 78    SVO: 290    Other: 15

In sum, the results found in my own data support the conclusion that what requires word order to be fixed is at least a double argument string (DAS), and that at least a three-member set (as found in the MLU (mean length of utterance)) must prevail in the young child's speech output. This DAS allows for the type of movement consistent with the dynamic antisymmetry (DA) talked about herein (cf. Moro)—movement which, among other things, forces the break-up of sister relations and secures recursive hierarchy. Unlike simpleton merge (of two sister items) combined in a recurrent manner, move, on the other hand, secures the necessary labeling of Head and Complement. In the tree above, what we label as spec actually pertains simply to a generic argument, and nothing more (which merges alongside a verb).

In a more accurate sense, a VP tree should be replaced by a 'merge-tree', as follows:

merge {X, Y}:   'cook daddy' (= VS)   or   'daddy cook' (= SV)
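Because merge forms an unordered set, nothing in the merge-tree itself fixes linear order, and both spell-outs are expected. A minimal sketch (assuming only the unordered-set view of merge used here):

```python
from itertools import permutations

# Merge of two sister items is an unordered set: {cook, daddy}.
merge_set = frozenset(["cook", "daddy"])

# With no higher (recursive) projection to fix headedness, every
# linearization is available -- hence the mixed VS/SV orders
# attested in the child's single argument strings (SASs).
for order in permutations(sorted(merge_set)):
    print(" ".join(order))   # 'cook daddy' (= VS), 'daddy cook' (= SV)
```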

Appendix-5. A Note on the Dual Mechanism Model: Language Acquisition vs. Learning and the Bell-Shape Curve

In this first brief note (one of five), I'd like to reflect on how the Dual Mechanism Model (DMM), as compared to a Single Mechanism Model (SMM), might inform our more narrow discussion of Artificial Intelligence (AI) (discussed in Note 4), as well as inform our larger-scope discussions surrounding the 'nature of language & design' more generally. The description of our methods here will be based on the following dichotomies:

[1] DMM vs. SMM

(i) An SMM is solely reliant on brute-force associations which are inherently tethered to overt Learning—a frequency endeavor [+Freq], where frequency of item-based learning belongs on the vertical mode of processing (to be presented and discussed below). Such item-based learning could be thought of as 'structure-independent', since its focus is solely on the isolated item in question and not on the context of the overall structure surrounding the item.

(ii) A DMM, by contrast, is abstract and rule-based, inherently tethered to tacit, covert Acquisition—a [-Freq] endeavor which doesn't rely on a one-to-one association of items, but rather can be both (i) item-based and (ii) categorical in nature, where structure-dependency is observant of category over item. Hence a DMM mode—a mode which is both 'item-based' when called upon


(e.g., lexical learning, irregular formation over rule-based regulars, etc.) and 'categorical-based' when called upon to engage in the manipulation of symbols—is in a unique position to deliver the kind of 'learning curve' which is consistent with what we find of native language acquisition (to be presented and discussed below).

(T)heory (T). Perhaps the sole property of what makes us uniquely human (i.e., the ability to use language) amounts to little more than the sensitivity to remove ourselves from the myopic item, to place ourselves at a perspective just a step away from the item, and to become sensitive to structure dependency. In this sense, T (the ultimate theory of what it means to be human) is that 'taking a step removed' from the frequency of an item and seeing how the item sits in an overall structure. (This process of seeing 'item plus structure' is what makes up T of the Dual Mechanism Model (DMM) as advanced in these five notes, and what was considered the core property of language discussed in the four-sentences portion of this text).

It goes without saying that items (lexical words) are quintessentially learned entities (environmental); they are frequency-sensitive [+Freq] and carry a classic portmanteau of features which are typically associated with concrete and conceptual meaning (e.g., Nouns, Verbs, Adjectives). But structure is an altogether different entity. Structure is promoted not by frequency of learning, since it is upheld by categorical processes which may strip an item away from frequency and place it into a variable standing within the structure. To see what I mean by this stripping of the item, let's consider an example that was presented in Sentence #4. Consider the two bracketed Items (I) (say, as 'phrasal-chunks') as presented in the Structure (S):

(I) (i) [that is] (a two-word item*)

(S) (a) I wonder what [that is] up there.
    (b) I wonder what *[that's] up there.

*A word-item here is defined by phonological dominant stress—viz., a single word is represented by a single dominant stress pattern; if two dominant stresses, then two words, etc. Note how the word-item 'spaghetti' would have the stress pattern 'weak-strong-weak', with the middle stress being dominant. The item

[that is] holds two dominant stresses (hence, a two-word item), as we hear when we clap out the two words, while [that's] holds only one stress (hence, a one-word item). Let's restate the analysis we presented earlier in Sentence #4 below. The base-generated structure first looks something like:

[2] Sentence #4 (restated)

I wonder [__ [that [VP is what]]] up there.

In [2], the Wh-object 'what' begins as the object/complement of the verb 'is', forming a Verb Phrase (VP), and then gets displaced by moving above 'that' in the surface phonology (PF), yielding the derived structure. But if we take a closer look, we see that after such movement of 'what' out of the [VP 'is-what'] phrase, the VP survives only as a head [VP is ø] (i.e., the Head (H) 'is' survives without its complement object 'what'). Thus, the phrase is said to 'partially project'. But partial-phrase projections are indeed allowed in natural languages, given that the H still remains (in situ) within the constituent phrase. Hence, we get the licit structure in (a), as compared to the illicit vacuous/empty VP in (b):

a. I wonder [whatj [that [VP is __j]]] up there? (A licit structure/OK)
b. *I wonder [whatj [that'sk [VP __k __j]]] up there? (An illicit structure/Not OK)

But movement, even partial movement, does have an effect: note how the H 'is' must remain phonologically intact as an H of the VP and can't become a (phonologically attached) clitic clinging to the adjacent 'that', as in the one-word item [that's], whereby there is a reduction to only one dominant stress. In other words, at least one of the two lexical items within a phrase (P) (in this case, within the VP) must be pronounced (must be phonologically projected). Hence, as we see, when both items [is] as well as [what] move out of the VP—'what' moving into a Spec of a higher P, along with the item [is] moving out of its head (H) position of the P and forming itself as a clitic piggy-backing onto the item [that] of the higher P—we see the end result that the VP becomes vacuous (completely empty), and so the structure cannot survive (it becomes ungrammatical). Let's restate below some points on move from just prior discussions.

Move-based. Hence, *[[that]'s] is the illicit structure found in (b) (the asterisk* marks ungrammaticality), while the Merge-based structure of the two words [that] [is] is the only licit structure. It seems simultaneous movement of both the head 'is' along with its complement 'what' of the [VP is-what] renders the verb phrase vacuous (i.e.,


phrases can't be both without a head and without a complement at the same time). In this sense, MOVE-based *[[that]'s] is barred and only the Merge-based (two-item) structure [that] [is] is allowed to project—the former (move) being affixal in nature, the latter (merge) lexical. This 'Merge vs. Move' treatment is similar to what we find with the distinction between (merge-based) Derivational vs. (move-based) Inflectional morphology, where the latter is an affix process, and where the former is a word-forming process. (For a similar treatment of 'Merge vs. Move' in child language acquisition, see Galasso 2016).

[3] Progression of structure

(a) 'is-what' = VP (Verb Phrase): [VP [V is] [Obj what]]

(When the object 'what' moves up, it leaves the Complement/Obj of the Head V still intact, still allowing a licit projection of VP).

(b) [XP whatj [X' [X that] [VP [V is] [Obj __j]]]]

(XP marks a higher functional projection above VP. The VP head is filled with V 'is', so the VP projects. But note how 'is' must remain as a full word and not as a clitic).

(c) *[XP whatj [X' [X that'sk] [VP [V __k] [Obj __j]]]]

(When V 'is' is reduced to a clitic ['s], the VP becomes vacuous (i.e., both V and Comp are empty) and so the VP can't project).
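The generalization running through (a)–(c) can be stated as a single well-formedness condition on phrase projection. A hedged sketch (the function name and boolean encoding are mine, not the author's formalism):

```python
# Sketch of the projection condition behind (a)-(c): a phrase may
# 'partially project' (silent complement), but a phrase whose head
# AND complement are both phonologically empty cannot project.

def phrase_projects(head_overt: bool, comp_overt: bool) -> bool:
    """At least one lexical item within the phrase must be pronounced."""
    return head_overt or comp_overt

# (b): 'what' moves out, but head 'is' stays overt -> VP projects
print(phrase_projects(head_overt=True, comp_overt=False))    # True (licit)

# (c): 'is' cliticizes onto 'that' AND 'what' moves out -> vacuous VP
print(phrase_projects(head_overt=False, comp_overt=False))   # False (illicit)
```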



But what I want to suggest here for our theory (T) having to do with the following Five Notes—including, and perhaps most importantly, our discussions to come regarding artificial intelligence (AI)—is that this sensitivity to structure over item sits as a core property of language. Let's play this out below.

Imagine asking any native speaker of English if the two items below are properly formed (Sentence #4): (i) [that is], (ii) [that's]. Fine—all native speakers will say both are equally proper in their form. And if there were any preference between the two, the preference would certainly go to the item which is most frequent in the input (i.e., the item most usually heard in the speech environment): that would be item (ii) [that's]. My guess would be that the frequency-count between the two versions could be as high as 'a-hundred-to-one' (if not exceedingly more) when measured in spontaneous speech. That is a perfect example of how the processing of an 'item' is [+Freq]-sensitive: clearly, we hear more numerous examples of [that's] than we do of [that is]. (I am treating the two phrases as items here, fragments of constituent structure we hear in the input).

But now reconsider the structure of Sentence #4 (restated in [7] below) and how the isolated item now becomes a rather peripheral feature of the overall structure. Now consider this: when those same people who were earlier asked about the two items in isolation are presented with the same two, but now embedded within a structure, all of a sudden the aforementioned preference for [that's] not only becomes the non-preferred item but, even more egregiously, becomes altogether ungrammatical in its usage (i.e., [that's] can't be pronounced within Sentence #4). Here is a perfect example of how [+Freq] of item is trumped by [-Freq] of structure—'Item vs. Structure'. I say [-Freq] of structure because structure is not the kind of construct which carries that portmanteau of (semantic) features which can be readily processed via a brute-force memorization scheme. (Structure is rather category-dependent, syntactic in nature (not semantic), and works on and across variables). In fact, most speakers don't know, nor can they conceptualize, what it is that allows them to tacitly know that one construct is grammatically correct over another (not unless, of course, one is a linguist who works in syntax). Rather, syntactic structure is notoriously abstract and hidden, away from the


mundane processes of learning a list of items. (Syntax is not simply a list of gathered items which make up a lexicon). In linguistic theory, much is made about linguistic intuition and the question of where such grammatical intuitions come from. One very interesting way to talk about differences between intuition (which seems to arise in a natural way) versus a kind of learned methodology (which relies on declarative understanding of what makes a sentence grammatical) is to overlap a statistical methodology onto the range of competency found for such +/- grammatical judgments. The methodology I have in mind here is the classic statistical averaging found across a given demographic range (γ), which measures competency of a given skill. The test is to see whether one finds the classic bell-shape curve (a universal staple behind any measurement of a learned skill), or if one finds the so-called right-wall (which portrays a biological endeavor of acquisition over learning). This dual outcome of learned (bell-shape curve) vs. acquisition (right-wall) might suggest how a template scaffolding (overlapping linguistic theory) could serve to illustrate our DMM:

[4] Template scaffolding overlaps onto linguistic theory* (Figure 13)

X = X: associative-based, item-dependent/SMM (Learned)—Items [V, N, Adj] / [+Freq]
  {irregulars}: [dream] → [dreamt] = new word / phonological shift of stem

X + Y = Z: rule-based, category-dependent/DMM (Acquired) / [-Freq]
  {regulars}: [[dream] ed] = past tense / same phonology of stem

Figure 13  Template Scaffolding Linguistic Theory

Whereas items extend vertically [x = x], rules spread horizontally [x + y = z]; the former is recurrent [ ], the latter recursive [[ ]]. (As discussed in the Preface, this dual distinction makes up my personal metaphor of Items [x-tables, y-chairs, z-nightstands] vs. category [α-furniture [x, y, z]]).
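As a concrete (and deliberately toy) rendering of the scaffolding in [4], the sketch below hard-codes a few stored irregular items and one symbolic rule. The word lists and function names are my own assumptions, not a model from the literature: the item route is [+Freq] lookup (X = X), while the rule route concatenates a variable stem with {-ed} (X + Y = Z) and so generalizes to novel stems.

```python
# A toy Dual Mechanism sketch: stored items vs. a symbolic rule.

IRREGULAR_ITEMS = {"dream": "dreamt", "keep": "kept", "dive": "dove"}

def add_ed(stem: str) -> str:
    """The regular rule, X + {-ed} (with a trivial spelling adjustment)."""
    return stem + ("d" if stem.endswith("e") else "ed")

def past_tense(verb: str) -> str:
    """Route 1 (item, [+Freq]): retrieve a stored irregular form.
    Route 2 (rule, [-Freq]): apply the rule to ANY stem, known or
    novel -- category-dependent rather than item-dependent."""
    if verb in IRREGULAR_ITEMS:        # associative, item-based lookup
        return IRREGULAR_ITEMS[verb]
    return add_ed(verb)                # rule over a variable X

print(past_tense("keep"))    # 'kept'    (retrieved item)
print(past_tense("blick"))   # 'blicked' (rule generalizes to a novel stem)

# Doublets like 'dove'/'dived' or 'dreamt'/'dreamed' fall out naturally:
# both routes are available, so both outputs can be licensed.
print(past_tense("dive"), "or", add_ed("dive"))   # 'dove' or 'dived'
```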

*Consider such words which share semantically close stems but where the stems shift phonologically: e.g., [N glass]–[V glaze], [N grass]–[V graze] /s/ > /z/, [N bath]–[V bathe] /θ/ > /ð/, plus a vowel shift of /æ/ > /e/. Also note how irregulars such as dream-dreamt, keep-kept, knell-knelt, dive-dove must contain a similar phonological sound shift in order for the lexicon to identify the item as a new word (X = X). (Sound-shifts facilitate memorization of a new item—there is a difference between grass and graze: one is a noun-item, the other a verb-item). Also note how only a DMM could handle a certain class of words which can be both irregular and regular (both versions being accepted) at the same time: √dive (dove or dived), √knell (knelled or knelt), √dream (dreamt or dreamed), etc.

So, to recap, what our theory above shows (implicating a DMM as compared to an SMM) is that with such high-frequency [+Freq] learning (as with any skill which relies on brute-force memorization), what we get statistically is the bell-shape curve (below). On the other hand, when the competency level seems to reach mastery across 100% of its demography, what we suggest is that such a right-wall is consistent with what we find of biology. It has long been recognized that first language (L1) acquisition, as compared to (post-critical period) second-language (L2) learning, follows this same trajectory—with L1 biology pegged to right-wall distributions, and L2 learned skills pegged to bell-shape curves.

[5] Bell-shape curve: Competency of a Learned Skill (L2). (Google© 'free-to-use' image). (Figure 14)

Figure 14  Bell-Shape Curve / Competency of a Learned Skill


Whenever statistical averages of competency at a certain Skill are spread across a given demographic (γ), what one finds is a very consistent (probabilistic) average. This average, shown above as 34.1%/34.1% (≈ 68.3%) on both sloping sides of the bell, indicates the average skill set of the demographic. The largest subset of people studied shows an average level of competency for skill x, and this is consistent across all skills looked at. (Note: the stableness of this bell-shape ratio comes close to what we know of the Fibonacci 'golden ratio'). The scale here, −4 to +4, could be understood as extreme incompetence at the 'left-wall' (−4), while the right-wall (+4) shows a very rare mastery.1

What is so intriguing about the so-called 'right-wall' when it comes to a learned skill is that its extreme high level of mastery mimics what we know of any biological basis which governs learning. In fact, it's not overt 'learning' at all, but rather a state of biologically determined acquisition. It is in this sense that the terms 'learned' versus 'acquisition' make their way into the L1 vs. L2 literature—namely, L1 (pre-critical-period native first language) is biologically determined and so does not suffer the competency spread found in bell-shape learning, while L2 (post-critical-period second language learning) shows bell-shape statistics.

[6] Bell-shape for language 'learning' vs. Right-wall for 'Biological Bases' of Language Acquisition. (Google© 'free-to-use' image). (Figure 15).

Bell-shape (= L2) vs. Right-wall (= L1)

Figure 15  Bell-Shape vs. Right Wall: Biological Basis for Language

1 (See Stephen Jay Gould's Full House: The Spread of Excellence from Plato to Darwin (1996) for discussion of the 'right-wall').
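The 34.1% figures are simply the per-side probability mass within one standard deviation of a normal distribution's mean; this can be checked in a line or two (a quick numerical verification of the figure's labels, not part of the original discussion):

```python
from math import erf, sqrt

# P(-1 < Z < +1) for a standard normal = erf(1/sqrt(2)) ~ 0.6827,
# i.e., roughly 34.1% on each sloping side of the bell.
within_one_sigma = erf(1 / sqrt(2))
print(f"{within_one_sigma:.4f}")        # 0.6827 (the ~68.3% in the text)
print(f"{within_one_sigma / 2:.4f}")    # 0.3413 per side (~34.1%)
```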

What we know of biologically-based competency distributions is that they show mastery of the acquired endeavor only at the extreme right-wall—viz., at what would otherwise be considered 'the very rare extreme mastery level' of 0.1%. What right-wall mastery shows is that amongst the general population (of all biologically healthy individuals), the statistical anomaly of, say, 0.1% actually becomes the normal average. The difference here is that learned vs. biologically determined endeavors accrue very different processing costs—namely, learning a skill is a general problem-solving skill, cognitive in manner, and follows all the classical IQ-dynamic hallmarks of 'learning' (e.g., asserting oneself in such learning environments, note-taking skills, preparation, mnemonic devices for memorization, etc., and other strategies for learning such as motivation, aptitude, as well as some physiological factors which might determine the rate and success of the attempted skill). On the other hand, biologically-determined acquisition accrues no cost—it comes for free (as part of human endowment).

So, it becomes interesting to us that the right-wall of competency only shows up across a demographic when a biologically-based behavior is measured. This becomes important when we begin to measure linguistic intuition for first language (L1) as compared to second language (L2). Recall that, for our four sentences, the ability to process the recursive structure embedded in these sentence types takes on a right-wall grammatical intuition and acceptability (for L1). For instance, recall another example of the grammatical intuition that came when L1 speakers were asked 'can eagles that fly swim?' (so, what are we asking that eagles can do?). Recall, the L1 reply was 99.7% consistent across the board that what was being asked was 'if eagles can swim', and not fly. Such is a right-wall distribution on a par with any other biologically-determined processing. It rather seems the kind of knowledge native L1 speakers bring to their L1 performance has little, if any, connection to IQ problem-solving skill capacity, motivation, or the like. In fact, it has been repeatedly shown that even low-IQ children, who otherwise may suffer from general learning handicaps, seem to have their language competency completely intact and unaffected. (Even some children with severe intellectual disabilities show little impact on their L1 language acquisition). This may be precisely because L1 is indeed innate acquisition (= biology), and not learning.

Also, it becomes interesting that when the 'eagles sentence' is presented to L1 speakers, but just visually shown to them, the innate recursive processing is not immediately made apparent (many students initially stumble on which is the right answer). Perhaps this is because reading is a 'learned' processing (unlike speech) and so it doesn't necessarily map onto the internal language mode of processing. Interestingly, once the L1 speaker says the sentence out loud


and hears the construction via speech, the hidden-internal recursive mechanism becomes activated, and immediately the L1 speaker instinctively knows that we are asking if 'eagles can swim', and not 'fly' (again, despite the surface-level phonology that indicates the first, closest verb as 'fly').

*Note how, when sequential bell-shape curves spiral out and get spread over a time span (evolution)—and when something then emerges along the way as some constraint or human barrier, or upper ceiling—what we find is that the horizontal spread of the bells becomes smaller and smaller (in longitude) until such a time that a right-wall develops (latitude). In a sense, the right-wall is a natural outcome of a collapse of space and time, as some consequence of human capacity toward statistical convergence.

L1 biologically-determined (right-wall) vs. L2 learned (bell-shape)

Recall that, regarding the four language modes (speaking, listening, reading, writing), only the former two are natural and biologically determined, which bring on the right-wall distribution of competency. The latter two are artificial skills—hence their bell-shape curve of competency. These latter two modes, which are culture-bound and must be practiced, rely on a kind of 'frequency-effect' for their level of competency. Such 'frequency-effect' bases of learning are altogether reliant on memorization, among other cognitive strategies. Let's keep this dual distinction in mind when we come to discuss the intuitive grammatical judgment of Sentence 4a vs. 4b of 'Sentence #4' (restated here in [7]):

[7] 'I wonder [what [that is]]… up there?'

(i) [what [that is]] __
(ii) [what *[that's]] __

The judgment is even more fascinating given the fact that (ii) [that's] is abundantly more prevalent in the frequency-data as compared to [that is]. Still, despite the higher frequency of [that's] as found in the bell-shape data, the ruling against frequency and rather for structure (even when the structure goes against the frequency) suggests that a very different kind of operating system (OS), to use Artificial Intelligence (AI) terms, is being employed. (The theoretical linguist reminds us that language is indeed structure-dependent, not frequency-dependent). The best way to test this is by simply asking a native English speaker: which of the two utterances do you prefer, (i) that is, or (ii) that's…? The latter is overwhelmingly approved above the former, perhaps for reasons that are not at all syntactic in nature, such as simplicity, ease of speech, economy, etc. In any

event, the fact that (i) is close to 100% judged as the only possible structure (for 'Sentence #4'), a fact which flies in the face of a statistical/frequency-based analysis, suggests that a larger, and rather hidden, deep structure is active below what we find at the surface-level phonology. Couple this with the notion that close to 100% of native speakers come to the same conclusion, and this argues against a learned, bell-curve response and rather speaks to how such a deep structure, in Chomsky's terms, is indeed biologically determined—a right-wall. 'Language is biology through and through'… and may not be something that can be learned, as if learning to play the piano. This demarcation of natural acquisition (pre-critical period), as found in child first language acquisition, as compared to what we know of (post-critical period) artificial second language learning, allows us to see the bell-shape curve for what it really is—a probabilistic informatique of math and statistics which we reach when observing competency of a non-innate learned skill.

Appendix-6. Overview of Chomsky

For a student overview, see the link below:

https://www.academia.edu/43570822/Initial_Notes_for_LING_329_LING_417_Skinner_v_Chomsky_and_the_Biological_Basis_for_Language

Over many years, Chomsky has, in one mode or another, constantly referred to some form of an innate/mental 'Linguistic Apparatus'—to quote Descartes, a 'Faculty of Thinking'—as taking on a biological dimension, lodged in the human brain, which serves to function as some 'intervening process' between input & output. This linguistic apparatus is what we refer to here in this lecture as 'Form'. This appendix presents a brief chronological overview of the form this has taken over the span of several decades pertaining to the 'Generative Grammar Enterprise' (GGE).

[A-6]1. 'Grammatical Transformations'/'Phrase-Structure' Rules

Aspects of the Theory of Syntax (Chomsky 1965). In this early treatment, the notions of 'Surface vs. Deep' structures enter the GGE literature, showing that some intervening process in the way of Transformational/Phrase-structure rules


decodes what one hears in the surface phonology (input) and manipulates it to map onto deep-structure logic. The most famous examples are:

(i) I expected John (to leave)
(ii) I persuaded John (to leave)

While on the surface/phonological level the two sentences may seem to have identical structure, the deep structure bears very different meaning: in (ii), the underlying deep structure must show that John is the direct object of the verb persuade, while in (i) John doesn't take on direct-object status. Consider the difference: 'I persuaded John (of the fact…)' is grammatically fine, while *'I expected John of the fact' is illogical. The same aspects deal with Active vs. Passive constructs, where it was assumed under behaviorist theories that they were two separate sentences, while in GGE they take on the same deep-structure logical form while only having two different surface-structure phonological processes:

(i) John kissed Mary [John does the kissing].
(ii) Mary was kissed by John [John does the kissing].

[A-6]2. 'Universal Grammar' (1976)

Stated by Chomsky as the innate 'system of principles, conditions and rules that are elements or properties of all human language' (Chomsky 1976, p. 29). UG, in this sense, is a system of knowledge, not of behavior: it regards an internal mental form not exclusively pegged to external behavior. UG becomes instantiated as a species-specific language organ situated in the human brain which guides internal processing to create external linguistic behavior.

[A-6]3. 'Principles and Parameters' Framework (P&P) (1981)

Part and parcel of the Government and Binding framework of the early 1980s, we find UG becoming a platform for P&P—whereby principles are those common elements and properties that all human-natural languages share, and parameters are those variant 'peripheral features' which make languages appear somewhat different from one another (at least on the surface level).

(i) Input → [P&P] → output
(ii) With P&P innate structure as [lexical principles [functional parameters]]:

Input → UG [Principles [Parameters]] → Output
Input → UG* → PF, LF



*(Phonological Form (PF) breaks down into (i) syllabic and (ii) phonemic form representations, while Logical Form (LF) has to do with (i) semantics and (ii) theta-marking). This dual process found in UG then led to two stages of child language acquisition, based on a lexical stage-1 vs. a functional stage-2—where stage-1 principles map onto lexical categories (N, V, Adj, Prep) and where stage-2 parameters map onto functional categories (Determiner, Aux-verb, Tense, Agreement, Case). This model was then articulated as the Language Acquisition Device (LAD), which helped to account for the two classic stages of child language acquisition.

[A-6]4. The Dual Mechanism Model (DMM) (1990s)

The DMM represents the coming together of these two various aspects of language in the brain. A Brain-to-Language corollary becomes established in the literature to account not only for child language acquisition, but for aphasia as well (Broca's area as it relates to parameters/movement vs. Wernicke's area as it relates to principles). (See Pinker).

[A-6]5. 'Faculty of Language' (FL) (2000)

This term proceeds from Hauser, Chomsky, Fitch (2002), whereby the authors articulate a clear distinction between (i) broad-scope cognitive/general


problem-solving skills associated with language learning (which all higher-order primates share) and (ii) narrow-scope 'recursive properties of syntax' (which are human-species specific). Out of FL come the Phonological Form (PF) (typically what is found at the surface level) and Logical Form (LF), the latter of which has since replaced the notion of deep structure.

Input => [FL [broad [narrow]]] as it maps onto [UG [principles [parameters]]]

Works Cited

Accumulative Lecture

Bickerton, D. (1990). Language & Species. University of Chicago Press.
Chomsky, N. (2002). On Nature and Language. Cambridge University Press. (See chapter 2, 'Perspective on language and mind').
____ (1981). "Principles & parameters in syntactic theory". In N. Hornstein & D. Lightfoot (eds), Explanations in Linguistics. London: Longman.
____ (1965). Aspects of the Theory of Syntax. MIT Press.
Galasso, J. (2019). Recursive Syntax: A minimalist perspective on recursion as the core property of human language and its role in the generative grammar enterprise. LINCOM Studies in Theoretical Linguistics, 61.
____ (2019). "Reflections on Syntax". PDF Ms. paper. Academia site.
____ (2016). From Merge to Move: A minimalist perspective on the design of language and its role in early child syntax. LINCOM Studies in Theoretical Linguistics, 59.
____ (2013). Minimum of English Grammar: An introduction to feature theory, Vol. 1. Cognella Publications.
____ (2006). "The nature of the input". Ms. CSUN. https://www.csun.edu/~galasso/Thenatureoftheinputfullpaper.pdf (see Pinker below).
Hauser, M., Chomsky, N., & Fitch, W. (2002). (See fn. 4 for link and citation).
Hoff, E. Language Development (class text for Ling 417/329). (See syllabus for citation).
Pinker, S. (1999). Words & Rules. NY: Basic Books. (For the 'Dual Mechanism Model' (DMM), also see Galasso, 2006).


Radford, A. (2016, 2nd ed.). Analysing English Sentences. Cambridge Textbooks in Linguistics, CUP.
____ (1990). Syntactic Theory and the Acquisition of English Syntax. Oxford: Blackwell.
Radford, A. & J. Galasso (1998). "Children's Possessive Structures: A Case Study". Essex Research Reports in Linguistics, 19: 37–45. https://www.semanticscholar.org/paper/Children%27s-possessive-structures%3A-a-case-study-Radford-Galasso/583c986c65c04b585ed0e8705ba21f630ca11030

List of Terms (informal definitions)

Adaption (vs. Exaptation): terms which distinguish Darwinian 'adaption' via natural selection from S.J. Gould's hitchhiking 'exaptation' (as a free rider).

Agreeing/Non-Agreeing Languages (Language types): Languages like Japanese don't require any overt agreement for, e.g., Number/Plural (e.g., 'Two book', rather than what we find in +Agree languages—to a certain respect, what we find in English 'two books'). English, however, is not a full +Agree language (unlike Spanish, which is), since English doesn't require Agree between Nouns and Adjectives, etc. (Spanish requires agreement of the plural marker {s} between N and Adj—e.g., 'mis carros rojos' translates to (= my+s car+s red+s) (English: 'my red cars')).

Aristotle (see his debates with Plato over empiricism vs. rationalism).

Assimilation (phonological): a rule-based phonological process whereby 'distinctive features' move and affect neighboring phonemes (e.g., the voiceless /s/ in 'cars' becomes voiced /z/, due to the adjacent voiced /r/: /karz/).

Constraints on (mother-daughter relation) (see fn. 13 link, Galasso 2019): 'Movement-based theoretical applications.' Sister-sister relation: in hierarchical structure, two nodes which are adjacent and arise out of a singular constituency—e.g., in [z [x, y]], the nodes x and y are said to be 'sisters', while z and x are not sisters but rather form a 'mother-daughter' relation (e.g., phonological assimilation is allowed only between sister phonemes).

Case (morphology/INFLection): e.g., I vs. Me, She vs. her, etc. Nominative case vs. Accusative case.


Categorical representation (phonemic, syntactic): similar to Plato's argument of a pure/ideal form. (See Meno's problem, the Republic). That we don't rely on the environment to obtain category representation: 'one step removed from the environment'.

Child language (First Language, L1). (See Radford, Radford & Galasso cited herein).

Chomsky, Noam (see the Skinner vs. Chomsky debate, 1959: his review of Skinner's 1957 book Verbal Behavior). See Berko's 'Wugs test'. Roger Brown's children (Adam, Eve, Sarah, Christopher): first child language studies (Harvard).

Darwin (founder of the science of evolution: adaptive measures).

Descartes (17th-century Enlightenment, Rationalism).

Dual Mechanism Model (DMM) (see Pinker's 'Words & Rules' theory/book; Galasso 2006): a dual-storage capacity of explicit knowledge of language in the brain for both (i) encyclopedic/semantic information and (ii) episodic/syntactic information-processing.

Form defines Function (innate category/design which has an effect on how humans perceive and process speech and language—e.g., the idea that two people can hear the same sound differently).

Frequency (effects) (Behaviorism): where the role of repetition secures strength of processing.

Functional (stage-2) (Language acquisition). Functional words: Auxiliary verbs (Do, Be, Have), sometimes called 'helping verbs'—elements which introduce a Verb (e.g., do speak, is cooking, have seen); Determiners (a, the, this, that, my, each, every, all, etc.)—elements which introduce a Noun. (Functional words, along with INFLections, go missing at early stages of child language acquisition).

Functionalism (the claim that language evolved out of a niche to communicate; different from functional stage-2).

Galileo (see Chomsky's book On Nature and Language, Chapter 2).

Gapping (e.g., 'John bought apples and Mary __ oranges': there is a silent gap __ (= bought) between 'Mary' and 'oranges' which is recovered via the context/prior utterance). There are interesting syntactic constraints imposed on Gapping.

'Ghost in the machine' (Descartes). Newton exorcised the machine and left the ghost intact: his inability to explain action at a distance (e.g., gravity) marked the departure of the classical 'mechanical world'.

INFLectional (morphology): affixes such as {s}, {ed}, {'s}, {ing} which involve movement.

L1-Interference (L1-Transfer): the influence of L1 affecting a second language (see the Fundamental Difference Hypothesis, Bley-Vroman).

Language Faculty (Chomsky's notion calling for an innate and mental design for language).

Language & Technology (emoji, SMS, texting, initialism).

Lexical (stage-1) (Language acquisition).

Locke (see the 'Blank Slate' theory, Tabula Rasa). (17th-century British Enlightenment, vs. Descartes).

Newton (though he saw as absurd the notion of 'action at a distance', he nonetheless paved the way for the dismantling of classical mechanics). (See 'Ghost in the machine').

Phonemic (development): stages of phonemic representation (e.g., Plosives > Fricatives > Palatals > Interdentals). Three stages of phonemic development leading to child speech errors. (See 'Form defines Function' in this respect).

Phonology (L1, L2).

Pidgin Language (see D. Bickerton).

Plato (see the debate between Plato and (his student) Aristotle). Much of the dialogues come to us via Socrates.

Pro-drop: languages such as Spanish which allow pronoun subjects to be dropped.

Punctuated equilibrium (S.J. Gould): that 'Gradualism' may not explain certain aspects of evolution.

Recursive syntax (see Galasso's 2019 book Recursive Syntax): the ability to have recursive/embedded structures within language design. One notion is that recursion takes us from (i) the 'public' objective/explicit semantics of the item (albeit detached, still an aspect of so-called 'declarative, encyclopedic knowledge') to (ii) a 'private' episodic/personal connection, whereby the very movement of the item from public to private instigates a new ordering system in the brain. For instance, in certain aphasia/stroke cases, while 'semantics' may be preserved (i.e., encyclopedic knowledge), the aphasic subject may no longer have access to his/her emotional/episodic response to that public information, since the neuro-routing of 'semantic knowledge to episodic knowledge' has been disrupted. Many cognitive-neurolinguists think that this movement from 'item to category', or from 'semantic to episodic', is an essential feature of the nature of recursive 'syntax', a uniquely human ability. (See Eco's 2005 novel for a fictionalized portrayal).

Second language (L2): 'non-native' language. See L1-Interference, the 'Fundamental Difference Hypothesis' (Bley-Vroman).

Skinner, B.F. (see the Skinner vs. Chomsky debate): the belief that language learning is just another form of general problem-solving skill; that language is a 'learned behavior' based on Stimulus & Response conditioning. (See the Radical Behaviorism of the 1950s, pace Chomsky).

Speech is Special (Philip Lieberman).

Syllabic Template (CVC), CC-clusters (e.g., 'skate' CCVC, /sket/ (initial CC-cluster)).

Syllabic stages (CV /ka/, CV:CV /ka:ka/ gemination, CCVC /skul/ (school) with CC-cluster).

Telegraphic Speech: reduction of functional words and inflectional morphology (only lexical, communicative content words are generated in early child syntax—like sending a word-reduced telegraph at minimum cost). Child lexical stage-1. (See Radford's 1990 book; see Radford & Galasso 1998 for lack of INFLection/Possessives).

Tense (Tense Phrase, TP): [TP Mary [T {s} [VP speak-s French]]].

'Wanna' contraction (= 'want to'): e.g., 'Who do you wanna help?' as opposed to the illicit *'Who do you wanna help you?' (See constraints on 'wanna' contraction).

Word segmentation (word boundaries #): e.g., 'The teacher sits' has the word boundaries:
(i) /ðə # tičər # sɪts/  (The teacher sits)


(ii) /ðət # ičərs # ɪts/  (*Thet eachers its)

(Imagine the wrong placement of boundaries in (ii) above. But children don't seem to make such egregious errors: why?) So, how do very young children know where to place the word boundaries before they have acquired the language? This poses a so-called 'learnability problem'. There are indeed innate factors which guide children here—Innate Constraints help guide the child: e.g., (i) a CVC can't hold a nucleus which is an unstressed schwa /ə/, ruling out any possible */ðət/ (CVC), since schwa is unstressed and CVC proto-word templates must have a stressed vowel (their only vowel); the word 'the' is allowed since it is only a CV (not a proto-word template). (ii) Assimilation is violated with #ičərs, since /r/ is voiced and /s/ is voiceless.

Full References and Web Links

Abler, W. L. 1989. "On the Particulate Principle of Self-Diversifying Systems." J. of Social and Biological Structures 12: 1–13.
Balogh, J., and Y. Grodzinsky. 2000. "Levels of Linguistic Representation in Broca's Aphasia." In Grammatical Disorders in Aphasia: A Neurolinguistic Perspective, edited by R. Bastiaanse and Y. Grodzinsky. London: Whurr Publishers.
Bates, E. 1997. "Origins of Language Disorders: A Comparative Approach". Developmental Neuropsychology 13: 447–476.
Baumgartner, P., and Sabine Payr, eds. 1995. Speaking Minds: Interviews with Twenty Eminent Cognitive Scientists. Princeton University Press.
Berko, J. 1958. "The Child's Learning of English Morphology". Word 14 (2–3): 150–177. http://www.tandfonline.com/doi/pdf/10.1080/00437956.1958.11659661.
Bever, T. 1970. "The Cognitive Basis for Linguistic Structures." In Cognition and the Development of Language, edited by J. R. Hayes, 279–362. New York: Wiley.
Bickerton, D. 1984. "The Language Bioprogram Hypothesis". Behavior and Brain Sciences 7: 173–221.
———. 1990. Language and Species. Chicago University Press.
———. 1995. Language and Human Behavior. University of Washington Press.
———. 2010. "On Two Incompatible Theories of Language Evolution" (chapter 14). In The Evolution of Human Language: A Biolinguistic Perspective, edited by Larson, R., V. Déprez, and H. Yamakido. Cambridge University Press.
Bloom, L. 1970. Language Development. MIT Press.


Bobaljik, J. 2000. The Rich Agreement Hypothesis in Review. Ms. McGill University.
Boeckx, C. 2008. Understanding Minimalist Syntax. Blackwell.
Bowerman, M. 1973. Early Syntactic Development. Cambridge University Press.
———. 1990. "Mapping thematic roles onto syntactic functions: are children helped by innate linking rules?" Linguistics 28: 1253–1289.
———. 1995. "Don't giggle me!" Talk presented at Essex University, Nov. 1995.
Brown, R. 1973. A First Language: The Early Stages. Harvard University Press.
Chomsky, N. 1955. Logical Structure of Linguistic Theory, ms. (revised version published in 1975). Chicago: Plenum.
———. 1956. "Three Models for the Description of Language". In IRE Transactions on Information Theory. MIT.
———. 1957. Syntactic Structures. The Hague: Mouton & Co.
———. 1958. "Review of B. F. Skinner, 'Verbal Behavior'". Language 35: 26–58.
———. 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
———. 1966. Cartesian Linguistics: A Chapter in the History of Rationalist Thought. University Press of America.
———. 1976. Reflections on Language. London: Penguin.
———. 1981. Lectures in Government and Binding. Dordrecht: Foris.
———. 1995. The Minimalist Program. MIT Press.
———. 2001. "Beyond Explanatory Adequacy", ms. MIT. (A published version appeared in Belletti, A., ed. 2004. Structures and Beyond: The Cartography of Syntactic Structures, Vol. III. Oxford University Press).
———. 2002. On Nature and Language. Cambridge University Press.
———. 2008. "On Phases". In Foundational Issues in Linguistic Theory: Essays in Honor of Jean-Roger Vergnaud, edited by R. Freidin, C. Otero, and M. L. Zubizarreta. MIT Press.
———. 2010. "Some evo devo theses: how true might they be for language?" In Larson et al. (eds), The Evolution of Human Language: 45–62.
———. 2013. "Problems of Projection." Lingua 130: 33–49. ('Lingua paper').
Clahsen, H. 1999. "Lexical Entries and Rules of Language. A Multidisciplinary Study of German Inflection." Behavioral and Brain Science 22: 991–1060. (Target article).
Clahsen, H., and M. Almazen. 1998. "Syntax and Morphology in Williams Syndrome." Cognition 68: 167–198.
Clahsen, H., G. Marcus, S. Bartke, and R. Wiese. 1995. "Compounding and Inflection in German Child Language". In Yearbook of Morphology, edited by G. Booij and J. van Marle. Kluwer.
Clahsen, H., and M. Rothweiler. 1993. "Inflectional Rules in Children's Grammars. Evidence from German Participles." Yearbook of Morphology 1992: 255–288.
Crain, S. 1991. "Language Acquisition in the Absence of Experience." Behavioral and Brain Sciences 14: 597–650. (See also Ni, Weijia, Steven Crain, and Donald Shankweiler. 1996. "Side-Stepping Garden Paths." Language and Cognitive Processes 11: 283–334).
Crain, S., and R. Thorton. 1998. Investigations in Universal Grammar: A Guide to Experiments on the Acquisition of Syntax and Semantics. MIT Press.
Dennett, D. 1995. Darwin's Dangerous Ideas. Simon & Schuster.

Deuchar, M. 1993. X-Bar Syntax and Language Acquisition. (Ms. paper given at the Linguistics Association of Great Britain).
Dreyfus, H., and S. Dreyfus. 1986. Mind Over Machine. New York: The Free Press.
Eco, U. 2005. The Mysterious Flame of Queen Loana. Harcourt.
Eldredge, N., & S.J. Gould. 1972. "Punctuated equilibria". In Schopf, Thomas J.M. (ed), Models in Paleobiology, 82–115. Freeman Cooper & Co.
Elman, J. 1998. "Generalization, simple recurrent networks, and the emergence of structure." Proceedings of the 20th Annual Conference of the Cognitive Science Society. Lawrence Erlbaum Associates.
Epstein, D., H. Kitahara, and T. Daniel Seely. 2014. Linguistic Inquiry, Vol. 45.3, 463–481. MIT. https://www.mitpressjournals.org/doi/full/10.1162/LING_a_00163. (Posted online August 01, 2014, https://doi.org/10.1162/LING_a_00163)
Feynman, R. 1985. Surely You're Joking Mr. Feynman. Norton Press.
Fisher, S. E., and G. F. Marcus. 2006. "The Eloquent Ape: Genes, Brains and the Evolution of Language." Nature Reviews Genetics 7 (1): 9–20.
Fitch, W. 2010. "Three Meanings of 'recursion': Key Distinctions for Biolinguists" (Chapter 4). In The Evolution of Human Language, edited by Larson, R., V. Déprez, and H. Yamakido. Cambridge University Press.
Fitch, T., M. Hauser, and N. Chomsky. 2005. "The Evolution of the Language Faculty: Clarifications and Implications." Cognition 97: 179–210.
Fodor, J. 2000. The Mind Doesn't Work That Way: Scope and Limits of Computational Psychology. MIT Press.
Fukui, Naoki, and Margaret Speas. 1986. "Specifiers and Projection." MIT Working Papers in Linguistics 8: 128–172.
Galasso, J. 1999/2001. The Development of English Word Order. (Essex University 1998 Ph.D. Dissertation; Cal. State University Northridge 2001). https://www.csun.edu/~galasso/worder.pdf. (Text taken out of J.A. Galasso. 1999. The Acquisition of Functional Categories: A Case Study. Ph.D. Diss. Essex University (Ch. 3). For references, see Galasso, 1999/2003 Essex/IULC Press).
———. 2003. The Acquisition of Functional Categories. Bloomington, Indiana: IULC Publications.
———. 2016. From Merge to Move: A Minimalist Perspective on the Design of Language and Its Role in Early Child Syntax. LINCOM Studies in Theoretical Linguistics 59: 249.
———. 2018. "A Brief Note on a Merge-Based Theory of Child Language Acquisition." Northridge: Ms. Cal. State University. https://www.academia.edu/36787155/A_Merge-based_theory_of_child_language_acquisition.
———. 2019. Recursive Syntax. LINCOM Studies in Theoretical Linguistics, 61: 226.
Goldschmidt, R. 1940/1982. The Material Basis of Evolution. Yale University Press.
Gordon, P. 1985. "Level-Ordering in Lexical Development." Cognition 21: 73–98. (See 'Rat-eater' experiment).
Gould, S. J. 1979. (Stephen Jay Gould; Richard Lewontin. 1979) "The Spandrels of San Marco and the Panglossian Paradigm: A Critique of the Adaptationist Programme." Proc. Roy. Soc. London B 205 (1161): 581–598. doi:10.1098/rspb.1979.0086.
Gould, S. J. (ed) 1993. The Book of Life. Norton Press.


———. 1996. Full House: The Spread of Excellence from Plato to Darwin. Harmony Books.
———. 2007. Punctuated Equilibrium. Harvard University Press.
Gould, S. J., and R. C. Lewontin. 1979. "Proceedings of the Royal Society of London." Series B, Biological Sciences 205, 1161. The Evolution of Adaptation by Natural Selection, 581–598. https://faculty.washington.edu/lynnhank/GouldLewontin.pdf
Gould, S. J., and E. Vrba. 1981. "Exaptation: A Missing Term in the Science of Form." Paleobiology 8: 4–15.
Grodzinsky, Y. 1986. "Language Deficits and the Theory of Syntax." Brain and Language 27(1): 135–159.
———. 1990. Theoretical Perspectives on Language Deficits. Cambridge, MA: MIT Press.
———. 1995. "A Restrictive Theory of Agrammatic Comprehension." Brain and Language 50: 27–51.
Grodzinsky, Y., and Santi, A. 2008. "The Battle for Broca's Region." Trends Cogn Sci, Dec. 1: 474–80.
Hadley, R. F. 2000. "Cognition and the Computational Power of Connectionist Networks." Connection Science 12: 95–110.
Halle, Morris, and Alec Marantz. 1993. "Distributed Morphology and the Pieces of Inflection." In The View from Building 20, edited by Kenneth Hale and S. Jay Keyser, 111–176. Cambridge: MIT Press.
Hauser, M., N. Chomsky, and W. Fitch. 2002. "The Faculty of Language: What Is It, Who Has It, and How Did It Evolve?" Science 298: 1569–1579.
Hebb, D. 1949. Organization of Behavior. New York: Wiley.
Kayne, R. 1994. The Antisymmetry of Syntax. MIT Press.
Keep, B., H. Zulch, and A. Wilkinson. 2018. Truth is in the Eye of the Beholder: Perception of the Müller-Lyer Illusion in Dogs. Open access article, 5 Sept. 2018. https://link.springer.com/article/10.3758/s13420-018-0344-z.
Köhler, W. 1972. The Task of Gestalt Psychology. Princeton, NJ: Princeton University Press.
Kuhl, P., and Meltzoff. 1996. "Infant Vocalization in Response to Speech: Vocal Imitation and Developmental Change." The Journal of the Acoustical Society of America 100: 2425.
Larson, R. 1988. "On the Double Object Construction." Linguistic Inquiry 19: 335–392.
Larson, R., V. Déprez, and H. Yamakido, eds. 2010. The Evolution of Human Language: A Biolinguistic Perspective. CUP.
Lasnik, H., and M. Saito. 1992. Move Alpha: Conditions on Its Application and Output. MIT Press.
Levy, Y., and J. Schaeffer, eds. 2003. Language Competence across Populations. Lawrence Erlbaum.
Lightfoot, D. 2000. "The Spandrels of the Linguistic Genotype". In Knight et al., The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form. Cambridge University Press.
———. 2006. How New Languages Emerge. Cambridge University Press.
Marantz, A. 1997. No Escape from Syntax: Don't Try Morphological Analysis in the Privacy of Your Own Lexicon. University of Pennsylvania Working Papers in Linguistics, Vol. 2, issue 4. https://repository.upenn.edu/cgi/viewcontent.cgi?article=1795&context=pwpl.
Marcus, G. 2001. The Algebraic Mind: Integrating Connectionism and Cognitive Science. MIT Press.

———. 2017. "Artificial Intelligence Is Stuck. Here's How to Move It Forward." New York Times, July 29.
———. 2018. "The Deepest Problem with Deep Learning." Article reprinted on TheAtlantic.com. https://medium.com/@GaryMarcus/the-deepest-problem-with-deep-learning-91c5991f5695. For a summary of the Marcus v. Elman debates, see: http://psych.nyu.edu/marcus/TAM/author_response.html
Marcus, G. 1998. "Can Connectionism Save Constructivism?" Cognition 66 (2): 153–182.
Marcus, G., S. Pinker, M. Ullman, J. Hollander, T. Rosen, and F. Xu. 1992. "Over-Regularization in Language Acquisition." Monographs of the Society for Research in Child Development 57: 1–181.
Marcus, G., U. Brinkmann, H. Clahsen, R. Wiese, and S. Pinker. 1995. "German Inflection: The Exception to the Rule." Cognitive Psychology 29: 189–256.
McCloskey, M., and N. Cohen. 1989. "Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem." Psychology of Learning and Motivation 24: 109–165.
Miller, G. 1955. "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information." Published in 1956, Psychological Review 63: 81–97.
———. 1956. Human Memory and the Storage of Information. IRE Transactions on Information Theory. (MIT).
———. 1968. The Psychology of Communication. Allen Lane, The Penguin Press.
Mills, D., S. Coffey-Corina, and H. Neville. 1997. "Language Comprehension and Cerebral Specialization from 13 to 20 Months." Developmental Neuropsychology 13: 397–445.
Minsky, M., and S. Papert. 1969. Perceptrons: An Introduction to Computational Geometry. MIT Press.
Miyagawa, S. 2010. Why Agree? Why Move? MIT Press.
Moro, A. 2000. Dynamic Antisymmetry. MIT Press.
Münte, T., T. Say, H. Clahsen, K. Schiltz, and M. Kutas. 1999. "Decomposition of Morphologically Complex Words in English: Evidence from Event-Related Brain Potentials." Cognitive Brain Research 7: 241–253.
Newell, A., and H. Simon. 1956. The Logic Theory Machine: A Complex Information Processing System. IRE Transactions on Information Theory. (MIT).
Ott, D. 2011. "A Note on Free Relative Clauses in the Theory of Phases." Linguistic Inquiry 42: 183–192.
Owens, R. 2007. Language Development: An Introduction. Allyn & Bacon.
(PDP) 'Parallel Distributed Processing' Research Group. UC San Diego.
Pearl, Judea. 2018a. Theoretical Impediments to Machine Learning with Seven Sparks from the Causal Revolution. Technical Report R-475, July 2018.
———. 2018b. The Book of Why: The New Science of Cause and Effect. Basic Books.
Pesetsky, D. 1982. Paths and Categories. Ph.D. Diss., MIT. http://www.ai.mit.edu/projects/dm/theses/pesetsky82.pdf.
———. 1987. "Wh-in-Situ: Movement and Unselective Binding." In The Representation of (In)definiteness, edited by E. Reuland and A. ter Meulen. MIT Press.
Piattelli-Palmarini, M. 2010. Chapter in The Evolution of Human Language: A Biolinguistic Perspective, edited by R. Larson, V. Déprez, and H. Yamakido. Cambridge University Press.
Pierce, A. 1989. On the Emergence of Syntax: A Crosslinguistic Study. Ph.D. Diss., MIT.
Pinker, S. 1984. Language Learnability and Language Development. Cambridge, MA: MIT Press.
———. 1999. Words and Rules. Basic Books.
Pinker, S., and P. Bloom. 1990. "Natural Language and Natural Selection." Behavioral and Brain Sciences 13 (4): 707–784.
Pinker, S., and A. Prince. 1988. "On Language and Connectionism: Analysis of a Parallel Distributed Processing Model of Language Acquisition." Cognition 28: 59–108.
Popper, K. 1972. Objective Knowledge: An Evolutionary Approach. Oxford Press.
Radford, A. 1988. Transformational Grammar. Cambridge University Press.
———. 1990. Syntactic Theory and the Acquisition of English Syntax. Oxford: Blackwell.
———. 1997a. Syntactic Theory and the Structure of English. Cambridge University Press.
———. 1997b. Syntax: A Minimalist Introduction. Cambridge University Press.
———. 2000. "Children in Search of Perfection: Towards a Minimalist Model of Acquisition." Essex Research Reports in Linguistics 34.
———. 2009. Analyzing English Sentences. Cambridge University Press.
———. 2016. Analysing English Sentences, 2nd edn. Cambridge University Press. http://www.cambridge.org/us/academic/subjects/languages-linguistics/grammar-and-syntax/analysing-english-sentences-2nd-edition?format=PB.
Radford, A., and J. Galasso. 1998. "Children's Possessive Structures: A Case Study." Essex Research Reports in Linguistics 19. http://www.csun.edu/~galasso/arjg.pdf.
Reinhart, T. 1975. The Strong Coreference Restriction: Indefinites and Reciprocals. Ms., MIT, Cambridge, Mass.
———. 1976. The Syntactic Domain of Anaphora. Doctoral dissertation, MIT, Cambridge, Mass.
Roeper, T. 2007. The Prism of Grammar. MIT Press.
Rosenblatt, F. 1959. Two Theorems of Statistical Separability in the Perceptron. Ms., Proceedings of a Symposium on the Mechanism of Thought Processes, London.
Ross, J. 1967. Constraints on Variables in Syntax. Doctoral dissertation, MIT. (Published as Ross 1986.)
Saffran, E. 2003. "Evidence from Language Breakdown: Implications for the Neural and Functional Organization of Language." In Mind, Brain, and Language, edited by M. Banich and M. Mach. Lawrence Erlbaum.
Santi, A., and Y. Grodzinsky. 2007. "Working Memory and Syntax Interact in Broca's Area." Neuroimage 37: 8–17.
Schreuder, Gilbers, and Quené. 2009. "Recursion in Phonology." Lingua 119.
Sahin, N., S. Pinker, and E. Halgren. 2006. "Abstract Grammatical Processing of Nouns and Verbs in Broca's Area: Evidence from fMRI." Cortex 42.
Slobin, D. 1971. Psycholinguistics. Berkeley: University of California.
———. 2004. "From Ontogenesis to Phylogenesis: What Can Child Language Tell Us About Language Evolution?" In Biology and Knowledge Revisited: From Neurogenesis to Psychogenesis, edited by J. Langer, S. T. Parker, and C. Milbrath. Mahwah, NJ: Lawrence Erlbaum Associates.
Smith, N., and I. M. Tsimpli. 1995. The Mind of a Savant. Oxford: Blackwell.

Terrace, H., L. Petitto, R. Sanders, and T. Bever. 1979. "Can an Ape Create a Sentence?" Science 206: 891–902.
Tomasello, M. 2000. "Do Young Children Have Adult Syntactic Competence?" Cognition 74: 209–304.
Tomasello, M., and J. Call. 1997. Primate Cognition. Oxford Press.
Toulmin, S. 1961. Foresight and Understanding. Indiana University Press.
Travis, L. 1984. Parameters and Effects of Word Order Variation. Ph.D. Diss., MIT.
Tsimpli, I. M. 1992. "Functional Categories and Maturation." Ph.D. Diss., UCL.
Ullman, M., S. Corkin, M. Coppola, G. Hickok, J. Growdon, W. Koroshetz, and S. Pinker. 1997. "A Neural Dissociation within Language: Evidence That the Mental Dictionary Is Part of Declarative Memory, and That Grammatical Rules Are Processed by the Procedural System." Journal of Cognitive Neuroscience 9 (2): 266–276.
Wakefield, J., and M. Wilcox. 1994. "Brain Maturation and Language Acquisition: Theoretical Model and Preliminary Investigation." Proceedings of the BUCLD 19, Vol. 2: 643–654. Cascadilla Press.
Wexler, K. 1994. "Optional Infinitives, Head Movement and the Economy of Derivations." In Verb Movement, edited by D. Lightfoot and N. Hornstein, 305–362. CUP.
———. 2003. "Lenneberg's Dream" (Chapter 1, 11–61). In Language Competence across Populations, edited by Y. Levy and J. Schaeffer. Mahwah: Erlbaum.
Wexler, K., C. Schütze, and M. Rice. 1998. "Subject Case in Children with SLI and Unaffected Controls: Evidence for the Agr/Tns Omission Model." Language Acquisition 7: 317–344.
White, R. 1989. "Visual Thinking in the Ice Age." Scientific American 260 (7): 92–99. (Quoted in Ian Tattersall, p. 196, The Evolution of Human Language: A Biolinguistic Perspective, edited by R. Larson, V. Déprez, and H. Yamakido, 2010, CUP.)

Some Important Neuroimaging Studies Related to Broca's Area, FOXP2, and Linguistic Movement Operations

Demonet, J., G. Thierry, and D. Cardebat. 2005. "Renewal of the Neurophysiology of Language: Functional Neuroimaging." Physiological Reviews 85: 49–95. https://www.physiology.org/doi/pdf/10.1152/physrev.00049.2003
Gopnik, M., and M. Crago. 1991. "Familial Aggregation of a Developmental Language Disorder." Cognition 39: 1–50. https://www.sciencedirect.com/science/article/pii/001002779190058C?via%3Dihub
Grodzinsky, Y., and A. Santi. 2008. "The Battle for Broca's Region." Trends in Cognitive Sciences 12 (12): 474–480.
Koelsch, S. 2000. "Brain and Music: A Contribution to the Investigation of Central Auditory Processing with a New Electrophysiological Approach." Leipzig: Max Planck Institute of Cognitive Neuroscience, MPI Series in Cognitive Neuroscience, 11. https://pure.mpg.de/rest/items/item_720506/component/file_720505/content
Lai, C. S. L., S. E. Fisher, J. A. Hurst, F. Vargha-Khadem, and A. P. Monaco. 2001. "A Forkhead-Domain Gene Is Mutated in a Severe Speech and Language Disorder." Nature 413: 519–523.
Liégeois, F., T. Baldeweg, A. Connelly, D. Gadian, M. Mishkin, and F. Vargha-Khadem. 2003. "Language fMRI Abnormalities Associated with FOXP2 Gene Mutation." Nature Neuroscience 6: 1230–1237. https://www.researchgate.net/publication/9053686_Language_fMRI_abnormalities_associated_with_FOXP2_gene_mutation
Marcus, G., and S. Fisher. 2003. "FOXP2 in Focus: What Can Genes Tell Us About Speech and Language?" Trends in Cognitive Sciences 7: 257–262. http://www.ai.mit.edu/projects/dm/foxp2.pdf
Musso, M., A. Moro, V. Glauche, M. Rijntjes, J. Reichenbach, C. Büchel, and C. Weiller. 2003. "Broca's Area and the Language Instinct." Nature Neuroscience 6 (7). http://courses.washington.edu/ccab/Musso%20et%20al%20-%20language%20instinct%20%20Nat%20NS%202003.pdf
Poeppel, D., and G. Hickok. 2004. "Towards a New Functional Anatomy of Language." Cognition 92: 1–12.
The SLI Consortium (research update): http://www.well.ox.ac.uk/_asset/file/update-2009.pdf
Vargha-Khadem, F., D. Gadian, A. Copp, and M. Mishkin. 2005. "FOXP2 and the Neuroanatomy of Speech and Language." Nature Reviews Neuroscience 6. https://www.princeton.edu/~adele/LIN_106:_UCB_files/FoxP2-Vargha-Khadem05.pdf

References to Web-Links

Link-1. https://www.theatlantic.com/technology/archive/2012/11/noam-chomsky-on-where-artificial-intelligence-went-wrong/261637/
Link-2. See Cathy Price: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3398395/
Link-3. See review article: http://www.let.rug.nl/zwart/docs/minprogrev.pdf
Link-4. http://science.sciencemag.org/content/298/5598/1569.full
Link-5. https://search.proquest.com/openview/ddc5976e9e8173d6fe51f661c65ded1b/1?pq-origsite=gscholar&cbl=40569
Link-6. https://www.academia.edu/15151583/Some_notes_on_what_makes_language_interesting_For_Ben
Link-7. http://www.pnas.org/content/94/20/10750.full.pdf
Link-8. https://chomsky.info/20110408/ ("Language and the Cognitive Science Revolution(s)." Text of a lecture given by Noam Chomsky at Carleton University, April 8, 2011.)
Link-9. http://www.iep.utm.edu/chineser/
Link-10. https://archive.org/stream/NoamChomskySyntcaticStructures/Noam%20Chomsky%20%20Syntcatic%20structures_djvu.txt
Link-11. http://pds27.egloos.com/pds/201406/22/38/Chomsky13_Lingua.pdf (p. 39).
Link-12. https://www.theatlantic.com/technology/archive/2012/11/noam-chomsky-on-where-artificialintelligence-went-wrong/261637/
Link-13. https://psychology.fas.harvard.edu/people/roger-brown
Link-14. https://link.springer.com/content/pdf/10.1007%2FBF01067053.pdf
Link-15. https://researchers.mq.edu.au/en/persons/stephen-crain
Link-16. http://semantics.uchicago.edu/kennedy/classes/japan/intro-handout.pdf
Link-17. https://ac.els-cdn.com/S1364661308002222/1-s2.0-S1364661308002222main.pdf?_tid=fa449568da1811e7ad4300000aab0f6b&acdnat=1512518827_2c5897f2461537d75c319c2ac44f8275
Link-18. https://www.terpconnect.umd.edu/%7Epietro/research/papers/POS.pdf (p. 3). (See also the 'Poverty of Stimulus' argument found here; Chomsky et al.)
Link-19. https://papyr.com/hypertextbooks/grammar/lgdev.htm
Link-20. http://www.psych.nyu.edu/gary/marcusArticles/marcus%201996%20CDPS.pdf
Link-21. https://www.tandfonline.com/doi/pdf/10.1080/00437956.1958.11659661?needAccess=true
Link-22. http://www.haskins.yale.edu/sr/SR119/SR119_09.pdf
Link-23. https://en.wikipedia.org/wiki/Garden_path_sentence
Link-24. https://www.academia.edu/20416638/Monograph_Studies_in_Theoretical_Linguistics_From_Merge_to_Move_A_Minimalist_Perspective_on_the_Design_of_Language_and_its_Role_in_Early_Child_Syntax_2016_
Link-25. http://www.ebire.org/aphasia/dronkers/the_gratuitous.pdf; http://www.oxfordscholarship.com/view/10.1093/acprof:oso/9780195177640.001.0001/acprof9780195177640-chapter-6
Link-26. For full discussion on the topic of Merge vs. Move in early child language acquisition, see Galasso (a set of 'working papers'). For paper no. 1, see https://www.academia.edu/34403452/Working_Papers_1. (For the 2016 monograph, see http://lincom-shop.eu/LSTL-59-From-Merge-to-Move/en)
Link-27. For movement in compounds, see https://www.academia.edu/34403441/Working_Papers_4
Link-28. On Merge vs. Move in child language, see https://www.academia.edu/36787155/A_Merge-based_theory_of_child_language_acquisition
Link-29. See http://norvig.com/chomsky.html
Link-30. See Marcus: https://www.theverge.com/2018/7/3/17530232/self-driving-ai-winter-full-autonomy-waymo-tesla-uber. And while you are there, check out the polar-bear profile experiment link: https://youtu.be/M4ys8c2NtsE
Link-31. http://www.psych.nyu.edu/gary/TAM/author_response.html
Link-32. See G. Marcus for a review of image software: https://www.nytimes.com/2017/07/29/opinion/sunday/artificial-intelligence-is-stuck-heres-how-to-move-it-forward.html
Link-33. See link to paper for opening remarks on a dual-mechanism account based on child language: https://www.academia.edu/15155921/Small_Children_s_Sentences_are_Dead_on_Arrival_Remarks_on_a_Minimalist_Approach_to_Early_Child_Syntax_Journal_of_Child_Language_Acquisition_and_Development_vol_3_no._4._Dec._2015_

Index

A
Abbreviations: 154 (acronyms, emoji, initialisms, texting)
Active to passive and embedding: 54, 63
Acquisition v Learning: 33 (see also 'Bell-Shape Curve')
AGRee: 95, 110, 198
Agreement: 13, 220
  non-agreeing languages (Japanese): 168
  person, number: 8
  subject-verb: 77
American Sign Language (ASL): 25, 28
Analogy: xxxv, 45, 185
Anti-locality Condition (see Lasnik & Saito): 126, 130
Argument Structure: 26, 93
Artificial Intelligence (AI): xxi, 66, 131, 177–178, 235
  AI-Treelet structure: 186
  catastrophic interference: 187
  digital to analog shift (phonology): 188–189
  finite-state grammars: 69, 187
  hidden Markov model: 191
  human vision (Müller-Lyer illusion): 192
  hybrid model: 183, 186
  multi-layer perceptrons: 181, 190, 192
  semantic vs. syntactic features: 187
  voice recognition/voice print: 188, 190
Asperger's Syndrome: 199
Associative Learning: 67, 184
Autism: 5, 7–8, 199
Auxiliary (verb): xx, xxix
  copy of: 17
  do-insert/support: 6–7, 127
  inversion: 42
  omission: 8, 9, 16, 153

B
'Baseball-glove' analogy (form defines function): 144
'Beads-on-a-string' Theory: xxii, xxv, 3, 6–7, 68
Behaviorism (see Skinner): xvi, 151, 184
Bell-shaped curve: 43, 235
  right-wall: 241
Berko, J.: xviii, 45, 134, 181
  wugs test: xviii, 45, 185
Bickerton, D.: xxxv, 35, 54, 158, 200
  bioprogram: 168, 207 (see Proto-Language)
Binding/Licensing (Pronouns): 37, 57, 78–80, 136
  anaphoric/antecedent: 79
  constraints on: 80–81
  deficits in aphasia (see Grodzinsky): 61
  local vs distant: 55, 57, 81
Biological Basis for Language: 15, 20, 31, 35, 38, 44, 132, 201
Brain-area (Imaging):
  Alzheimer's vs Parkinson's: 199
  basal ganglia: 199
  Broca's area/front-left hemisphere: xx, xxv, xxvii, 7, 38, 52, 60, 132, 152, 180, 198
  scans (fMRI, ERP: N400, P600): xxiii, 39, 52, 68–69
  Wernicke's area/limbic temporal-lobe: xx, 14, 38, 52, 180, 198
Brain-Language Corollary: xx, xxvii, 14, 32, 38, 44–45, 59, 64, 69, 196
  brain maturation: (see Maturation)
Brain-Mind Analogies: xvi, 12
  brain-mind bootstrapping/mapping: 177
  brain to computer: 12, 131, 142, 178
Broca's area: 199
  aphasia: 7, 60, 68, 210
  maturation of: 231
  non-Inflectional stage: 8
Broca-to-Syntactic Movement: 7, 60
Brown, R.: 132
  Roger Brown children (Adam, Eve): 132

C
C-Command (see X-Bar Theory)
Case (Agreement & Tense): xxvi, 5, 8, 13, 17, 58, 77, 82, 95, 158, 195, 198, 212
  as trigger for movement: xxxiv, 54, 110
  accusative: 46, 82, 110, 158
  clitic: 219
  dative: 82
  default Case (He -> Him): 217
  double case-marking: 216
  genitive (possessive): 97, 213
  lack of in child language: 8, 214, 227
  lexical: 217
  marking: 82
    via light vP: 59
  marking particle {to}: 82
  morphemic marking: 218
  nominative: 8, 46, 158, 177
    finite-verb relation to: 177
  structural-configurational: 218
Child Language Syntax (Acquisition/Data): xx, 8, 13, 33, 55, 151, 164, 166, 168, 195, 228
  child grammars: 8, 64, 151, 226 (see also Language: errors)
  functional categories-stage-2: xviii, xxvi, 19
  lexical categories-stage-1: xviii, xxv, 8, 19, 59, 63, 151, 153
  merge-only stage: 63–64
  telegraphic speech: 157
Child-to-adult Continuity: 203
Chomsky, C.: 60, 210
Chomsky, N.: xiii, 22, 27, 29, 31, 37, 43–44, 65, 69, 94
  Chomskyan axioms: 94
Clahsen, H.: xx, 15, 182, 185 (see shallow processing)
Clitic (Climbing): 75, 90, 215
Closeness (in Structure): 47, 69
  in terms of adjacency: 47, 52, 69
  in terms of structure: 47, 52, 69
Compounds: 61, 113–115
  'root' (non-move) v 'synthetic' (move): 61, 113, 178–179
Connectionism (vs. Nativism): xxi, 12, 39, 45, 131, 183–184
  associative-driven learning: 45, 131, 184
CP: 125
Crain & Thornton: 132
Critical Period Hypothesis (see Lenneberg): 43, 225

D
Darwin: 35, 53–54, 95, 141–142, 158
  bottom-up vs top-down: 95, 141, 148
  selective pressure for language: 33, 53, 94, 141
Darwinian puzzle of language: 24, 30
DELV (see Roeper)
Dennett, D.: 141
Descartes, R. (Mind-Body Dualism): xiii
  'ghost-in-the-machine': xiii
Double-Object/Question formations: 8, 82
  case in: 111
  constructs: 82, 110
  dative shift: 82
Duality of Semantics: fn 53, 58, 119, 198, 208
Dual Mechanism Model: xvii, xxxi, 12, 15, 152, 183, 235
  vs. single mechanism model: xviii, 12, 182, 235
  studies on phonology: 188
Dynamic Antisymmetry: 62, 103, fn 196, 200

E
Ellipsis: 72
Empty category: 157
Ergative (see Structure)
Exaptation (see S. J. Gould)
Explanatory Adequacy: 67

F
Faculty of Language (FL): 12, 16, 20, 33, 40, 65, 132
  broad: 83, 201, 206
    sensitive to frequency: 201
  narrow: 83, 201, 206
    sole property of recursion: 205–207
Feynman, R.: 27
Fibonacci Code: 36, 47
First Language Acquisition (L1):
Fitch, Hauser, Chomsky (2002): 54, 83, fn 148, 196, 198 (Hauser, Chomsky, Fitch)
Fodor, J.: 35, 131
  Fodor v. Pinker: 131, 172 ('the mind doesn't work that way')

'For Ben' Sentence: 51
Form vs Function (see 'Baseball-glove' analogy): 142
'Four Sentences': xxxv, 65–68, 130
  i. Can eagles that fly swim?
  ii. Him falled me down.
  iii. The horse raced past the barn fell.
  iv. I wonder what that is up there.
FOXP2 (gene): 71
French: 4, 90, 98, 121
  negative phrase (see Pierce): 123
  raising: 4
Frequency (effects): xxi, xxxiii, 3, 15, 67, 93, 145, 181–183, 190
  sensitivity to: 105, 107–109, 177, 183, 201, 235
Full-listing Hypothesis: xvii, xxiv
  how stems + affixes get processed: xvii, xxiii, 182
Functional Categories: xxvi, 13, 38, 157, 204
  delay of: 204 (see Child Language Syntax)
Functional Heads: 58
Functionalism vs Formalism: 21, 35, 37, 39, 94–95, 157, 204
Fundamental Difference Hypothesis (Bley-Vroman): 190

G
Galasso, J.: 8, 182, 204, 206
Galilean Revolution: 131
Gapping: 159
Garden-path sentence: (see Structure)
German (default plural): 185
'Ghost-in-the-machine': (see Descartes)
Gordon, P.: 175
  ('rat-eater' experiment): xxx, 175
Gould, S. J.: 28, 35, 54, 141
  exaptation: 28, 83, 141
  punctuated equilibrium: 54, 141, fn 202
Grodzinsky, Y. (& Santi): 7, 52, 55, 60–61, 69, 132, 210
  binding v movement (Broca's aphasia): 60

H
History of Spelling: 161–162, 166

I
Imitation > Analogy > Computational: xxxv, 45
Inflection (Agreement, Case, Number, Tense): 91
  INFL-affix (nature of): xxiv
  lack thereof: 8, 13
Interface Systems: 24, 26, 33, 70–71, 75, 126
  at LF: 24, 26, 50, 53, 58, 70
  at PF: 24, 26, 50, 53, 58, 70
+/- Interpretable (feature): xxix, 53, 94, 104
Item vs Category: xxxii, 83, 105
Irregulars (N, V)
  stored as different words: xvii, 181, 185
  sound shift of: xviii, xxv, xxxiii (see also Dual Mechanism Model)

J
Japanese: 42, 150, 168

K
Kayne, R.: 63, 115
  linear correspondence axiom: 116, 120
    (all word order is SVO): 116–118
Kuhl (& Meltzoff): 188
  (see Native language magnet theory): 189
Knowledge/Learning:
  associative: 107, 203
  'declarative vs. procedural' (see Ullman): 44, 107
Krashen (Theory for L2): 165

L
Label Algorithm: 62, 83, 103, 209 (see Dynamic Antisymmetry, Word Order)
Language Acquisition Device (LAD): 13
Language Design: 47, 51–52, 54, 195
Language:
  as ability for abstraction: 34
  as associative-driven: 67, 107, 131, 184, 203
  as classificatory: 31
  as computation: 20, 22, 27, 29, 45, 52
  as 'mere' communication: 21–22, 157 (see also Functionalism vs Formalism)
  as problem-solving/cognitive: 7, 33, 39, 43, 66–67, 80, 131, 235 (see also Bell-shaped curve/statistical)
  as defined by recursion: 83, 93, 208
  as sensorimotor: 66
  creativity: 13, 134
  commission: 16, 20
  developmental (L1): 13
  E-language vs. I-language: 27, 65
  errors: 13, 151
  impairment: 20 (see Specific Language Impairment)
  learnability problem: 30, 41
  learning (vs acquisition): 43
  of function: 35 (see Functionalism vs Formalism)
  of thought: 22, 35
  omission: 16, 20
  over-regularization: 91
  transfer/interference (L2): 13, 162
  universal constraints on: 35
  variation & change: 20
  what is language?: 12, 21
Language as 'Problem-solving' Skill: 43, 66
Language Evolution: 20, 30–31, 33, 35, 45, 70, 196
Language Magnet Theory (see Kuhl): (magnet effect clustering of phonemes)
Lasnik & Saito (anti-locality condition): 129
Lenneberg, E.: 31 (see Critical Period Hypothesis)
Levi-Strauss: 31
Lexical Categories: xxv, 38, 59
  lexical v functional categories: xxv, 59 (see also Child Language Syntax)
Lexical Head: 58
Lexical Items: xvii, xxiii, 49
Lexical stage-1 (see Child Language Syntax)
Lightfoot, D.: 35, 37, 49, 137
Logical Form (LF) (see Interface Systems)

M
Marcus, G.: 12, 176, 182, 185
  vs. Elman, J. (debates): 12

Maturation (based hypothesis): xxi, xxiv, 8, 14, 20, 59–60, 64, 152, 231
MERGE (linear/local): 6, 48, 50, 80, 83, 103, 187
  Merge-based (Derivational): 49, 83, 138
  Merge-based theory of child language: 64, 97, 182, 195
    (Galasso): 97, fn 209, 182
  Merge v. Move (operations): 6, 9, 36, 48–49, 53, 56, 58, 71, 80, 83, 132, 209
    merge stage-1: 8, 9, 64, 97
    move stage-2: 8, 64
Minimalist Program (Chomsky 1995): 24, 33, 50, 68–69, 95, 105, 125, 208
Miyagawa (2010): 58, 119
  why agree? why move?: 97
Modular Model of Language: 14, 65
Moro, A. (see Dynamic Antisymmetry): fn 196
Morpheme: 28
Morphology: xvii
  affixal [that's] vs lexical [that] [is]: 49
  affixes: xxiv, xxxiii, 107
  'celebrating' vs 'fascinating' types: 106–107
  derivational: xvii, xxv, 28, 39, 105, 108, 138
    sound shifts: xxxiii
  inflectional: xvii, 28, 39, 60, 63, 105, 108, 138, 152
  full-listing hypothesis: xvii
  storage of: xvii
  undecomposed (chunks): xvii, xxxiv
MOVE (non-linear/distant): xxxiv, 48, 50, 80, 83, 93, 95, 103, 110, 204
  Move-based (Inflectional): 49, 83, 138
Movement: xxvi, 5, 49, 53
  affixal/inflectional: 49, 60, 104
  agree: 71, 95
  anti-locality: 119, 126
  at a distance: 47, 54
  as core property of language: 68, 208
  binding vs. movement:
  'Broca-to-syntax movement' corollary:
  case as trigger for movement: (see Case)
  classification of: 8, 53
  clitic: 50, 75, 90, 97
  constraints on: 49–50, 54
  copy > merge > delete: 6, 84–85
  deletion: 6, 84
  distance traveled: 56, 59
  family of: 52, 58–59
  feature check-off: 53, 59
  head-to-head: 126
  heavy NP-shift: 56, 122
  labelling account: 62, 83, 103 (see Label Algorithm)
  lack of in child grammar: 9, 59–60
  lexical: 49
  linear v. non-linear: xxxv, 50
  local v distant: 7, 47, 51–55, 59, 80, 208, 210
  passive to active: 54, 60, 63
  phonological/clitic account: 97–98
  'raising': 4, 8
  semantic (account): 85, 97, 99–100
    scope: 85, 99
  steps of (stepwise): 5, 56, 63, 95, 112, 115, 139
    from merge to move: 95, 112, 115
  syntactic (account): 57, 60, 97, 104, 125
  topic: 84
  Wh-: 42, 125
  where move?/on what (scaffolding)?: xii, 115, 118
  why move?: 53, 59, 94, 104
Multiple-language States: 169
Myth of 'function defines form': 141

N
Native language magnet theory: 189
Nature of Nurture: 11, 16, 44
Neanderthal v Cro-Magnon: 1, 205
Neurolinguistics: 14–15, 69

O
Ontogeny-Recapitulates-Phylogeny: xx, 3, 14, 54, 70, 182, 196, 206, 226
Over-regularization (-generalization): 134

P
Paradigm Shift: 31
Parameters (setting): 24, 42, 44
  +/- head initial: xxvii, 42
  +/- pro-drop: 168
Partitive ('any'): 89 (see Polarity Item)
Paul, H.: 32
Pesetsky (Edge Constraint)
Phoneme: 28
  as category: 188
  clustering (Kuhl): 189
  development: 164
  target: 188
  template (right branching): 91
  transfer of L1: 150
Phonology: 25, 33, 148
  assimilation: xi, 91–92
    between mother-daughter relation: xi
    constraints on: xi, 92, 165
    between sister relations: xi, 92
  first-language/child (L1): 148, 164
  L1-transfer (Japanese): 150
  L1-transfer (Spanish): 150, 165
  sound shift:
    magnet effect (Kuhl): 189
  phonemic development: 91, 149, 164
  perception: 188
  tapping experiments: 91, 165
  place & manner of articulation: 25
  second-language (L2): 149–151, 165
  syllable template: 91, 149 (see Syllabic: development)
Phonological Form (PF) (see Interface Systems)
Phrase: 72
  head initial: 41
  head of: xxix, 41, 72
  label algorithm (word order): 209
  structure rules: 247
Piattelli-Palmarini, M.: 196, 205
Pidgin Language: 158, 200
Pierce, A. (1989) (French NegP): 123
Pinker, S. (Words & Rules Theory): 27, 178, 182, 185
  mentalese: 27
  Pinker & Bloom v Chomsky: 28, 171, 202
Polarity Item: 86, 88–89
  licensing condition: 89 (see Movement: scope)
Pongid-Hominid Split: 70, 163, 195
Possessives: 151, 163, 166
  as diagnostic test: 5
  lack of in child language: 95, 151, 166, 228
  recursion: 5
Poverty of Stimulus/Learnability Problem: xxx, 40, 175
Priming-effects: xxi, 184
Principles & Parameters Theory (Setting): 33, 37, 40–41, 44, 53
Probe-Goal Relation: 57–58, 95, 105

Proto-Language: xx, 14, 54, 195, 200, 225
Processing: 20
  bell-shape curve/statistical: 43
  'deep'-structure: 6
  'shallow'-structure: 6
  surface-level: 6, 8, 9

R
Radford, A.: 8, 75, 88, 97, 99, 117, 120, 157
Radford & Galasso: xxi, 97, 151, 163, 206, 229
Reasons for Movement: 125 (see Dynamic Antisymmetry)
  labeling: (see Phrase: label algorithm)
  phonological: 90
  semantic: 87
  syntactic: 125
Recursion: 1, 81
  definition of: 3, 93
  diagnostic test: 5
  lack of (see also Proto-Language): 6, 8, fn 69, 225–226
Recurrent Networks: 183
Recursive Structure/Syntax: xii, 1, 3, 51–52, 81, 93, 226
Recursive v Recurrent: 2, 7, 9, 15, 51, 176, 183, 198
Roeper, T.: 8, 62, 168
  language diagnostic (DELV): 8
Rote-Learning & Frequency-effects: 15, 107
Rule-based grammar: 11, 12, 20, 29, 45
  horizontal spreading of: xi, 105

S
Salvador Luria (Nobel laureate): fn 34
Sapir: 66
de Saussure: 66
Searle, J.: 132
  Chinese-room argument: 132
Second Language Development/Learning (L2): 5, 32, 43, 55
  interference factors (L1-transfer): 13
Semantics (thematic): 33, 100
  semantic v syntactic cut: 54
'Sisterhood' relation: 74, 79 (see Structure: flat)
Skinner, B. F. v Chomsky (Debate): xvii, 11, 15, 32, 39
  as a 'pedagogical' device: xviii, 2
  behaviorism: xvi, xxx, 11, 20, 151, 184
  empirical vs rational thought: xvii, 11, 12
  hybrid model: xviii, 183
Slips of the tongue: xxi, xxxi, 25
Slobin, D.: 132, fn 207
Small Clause: 177
Spanish: xix, xxviii, 42, 168 (see also Phonology: L1-transfer)
Specific Language Impairment (SLI): 5, 8
'Speech is Special' hypothesis (Haskins Lab): fn 142, 143
Speech-category Perception: 189
Spoonerism: 25
Structure: 2, 47, 145
  ambiguous: 62
  base-generated: 49
  binary branching: 36, 71–72
    all/only right branching: 91, 115, 120
  (see also Closeness in Structure): 47, 52, 69
  dependent: 106, 177
  embedded: xxxv, 49, 51, 69
  ergative/unaccusative: 56, 101, 118, 120
  flat (sisterhood)/non-recursive: xi, 2, 9, 11, 49, 52, 62, 79–81, 105, 133, fn 202
  garden-path: xxii, 67
  items vs categories: xxxii, 2, 83, 105
  linear-adjacency (order): 36
  non-linear: 131
  non-local (v. local): 47, 131 (see also Movement)
  progression of: 139
  recursive: 2, 3, 51, 71, 133
  shallow (see Processing)
  Spec-Head-Comp: (see X-Bar Theory)
Structure Dependency: 106, 236
Syllabic: 91, 146, 148
  cc-cluster: 149–150
  considerations: 149
  development: 91, 149, 164
  recursive nature of: 91
  template (CVC): xi, 91, 146
  transfer of L1: 150
Symbolic (rule-based) vs. Iconic (item): (see also Item vs Category)
Syntax: 28, 33, 36, 70, 93, 151
Syntax (Syn-Move): 93, 104, 110

T
T-to-C (QP) Movement: 6–7
Teaching Methods (Cook):
Telegraphic Speech: 157
Tense (Phrase) (TP): 5–8, 59, 71
Theoretical (three) models of Linguistics (cognitive, generative, interactionism): 29
Theta-Role (marking): 26, 102 (see Structure: ergative/unaccusative)
Tomasello, M.: 176
  mimicking theory: 176
Topic (Movement): 84

U
Ullman, M.: xx
  Declarative/Procedural Model: 44
Universal Grammar (UG): 36–37, 40–41, 65
Universal Principles of Binding: 37

V
Vertical v Horizontal Processing: xxxii, 105, 240 (see Morphology: 'fascinating' v 'celebrating' typologies)
V-to-T (to-C) Movement: 8, 87, 129
Verb-Phrase-Internal-Subject Hypothesis: xxix, 102, 117
VP-Shell (Larson): 118–119
  light verb: fn 53, 56, 58
    case derived via light verb: 58–59
    for theta-marking: 119

W
[Walked] vs [[Stalk]ed] (and the role of frequency on regular verbs): end note: 182
'Wanna' contraction: 49, 156
Wexler, K.: 176
  Lenneberg's dream: 176
White, R.: 1, 206
Wh-Question: 8, 46
Wh-subject: 125
'Wh'-to-'Th' historical analogy (what? > that!): 46
Williams' Syndrome: 199
'Wine bottle' vs 'Bottle of wine' (argument): 19, 63, 112
Word association (see Priming-effects): xxi
Word Order: 7, 8, 41–42, 57, 61, 115, 121, 195, 231
  how merge affects word order: 63, 209, 232 (see also Label Algorithm)
Word Segmentation/Boundaries: 160
  word change (a norange -> an orange):
Word recognition: xxi
Working memory: xxi, 60

X
X-Bar Theory: xxix, 72–73, 116
  binary branching: 71–73, 91
  c-command: 73–76, 78–79, 89, 136
  max/min-projection: 73–74
  mother-daughter relation: 74, 79 (see also 'Sisterhood' relation)
  scope: 74
  spec-head-comp: 72–74, 79, 197

BERKELEY INSIGHTS IN LINGUISTICS AND SEMIOTICS

Irmengard Rauch, General Editor

Through the publication of ground-breaking scholarly research, this series deals with language and the multiple and varied paradigms through which it is studied. Language as viewed by linguists represents micrometa-approaches that intersect with macrometa-approaches of semiotists who understand language as an inlay to all experience. This databased series bridges study of the sciences with that of the humanities.

To order other books in this series, please contact our Customer Service Department:
[email protected] (within the U.S.)
[email protected] (outside the U.S.)

Or browse online by series at: www.peterlang.com