English Pages 346 [282] Year 2021
Language in Development A Crosslinguistic Perspective
Edited by Gita Martohardjono and Suzanne Flynn
The MIT Press Cambridge, Massachusetts London, England
© 2021 The Massachusetts Institute of Technology All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher. The MIT Press would like to thank the anonymous peer reviewers who provided comments on drafts of this book. The generous work of academic experts is essential for establishing the authority and quality of our publications. We acknowledge with gratitude the contributions of these otherwise uncredited readers. Library of Congress Cataloging-in-Publication Data Names: Martohardjono, Gita, 1956- editor. | Flynn, Suzanne, editor. Title: Language in development : a crosslinguistic perspective / edited by Gita Martohardjono and Suzanne Flynn. Description: Cambridge, Massachusetts : The MIT Press, [2021] | Includes bibliographical references and index. Identifiers: LCCN 2020044374 | ISBN 9780262542005 (paperback) Subjects: LCSH: Language acquisition. | Similarity (Language learning) | Multilingualism. Classification: LCC P118 .L36367 2021 | DDC 401/.93--dc23 LC record available at https://lccn.loc.gov/2020044374
Contents
Preface
Gita Martohardjono and Suzanne Flynn

I THEORETICAL AND METHODOLOGICAL CONSIDERATIONS
1 Five Questions about Language Learning
Virginia Valian
2 Coordinate Compounds in Theory and Practice
D. Terence Langendoen
3 Hard Words
Lila R. Gleitman, Kimberly Cassidy, Rebecca Nappa, Anna Papafragou, and John C. Trueswell
4 Elicited Imitation in First Language Acquisition Research: Cognitive Grounding and Crosslinguistic Application
Cristina Dye and Claire Foley

II CHILDREN
5 The Development of Person and Number Agreement in Child Heritage Speakers of Spanish Learning English as a Second Language
Jennifer Austin, Liliana Sánchez, Silvia Perez-Cortes, and David Giancaspro
6 The Role of Gestures in First and Second Language Acquisition: A Case Study of a Hebrew-English Bilingual Child
Yarden Kedar
7 Discourse-Morphosyntax Interaction in the Acquisition of Spanish Finite and Nonfinite Verbs
María Blume
8 A Hybrid Approach to Infant-Directed Speech
Reiko Mazuka
9 Discontinuous Dependent Morphemes in German and English Parental Speech: Input Differences between Two Languages
Lynn M. Santelmann

III ADULTS
10 Syntactic Ambiguity Resolution in Native and Nonnative Speakers of Chinese
Yun Yao and Jerome Packard
11 The Tense Puzzle in Second Language Acquisition
Gita Martohardjono, Virginia Valian, and Elaine C. Klein
12 Bilingual Processing of the First-Learned Language: Are Heritage Speakers and Late Bilinguals Really That Different?
Ian Phillips, Gita Martohardjono, Christen N. Madsen II, and Richard G. Schwartz
13 Identifying Early Language Changes in Alzheimer’s Disease: Extrapolating Lessons Learned from Methodologies Used in Investigating First Language Acquisition
Janet Cohen Sherman and Suzanne Flynn

Contributors
Index
List of Figures Figure 1.1 Percent subject use with verbs in nonimitative, nonimperative utterances by twelve 2-yr-olds observed thirty-four times over the course of one year. Figure 2.1 ⊇ ordering of S2. Figure 2.2 ≽ ordering of I2. Figure 2.3 Ordering of I2*. Figure 2.4 Ordering of I’2. Figure 2.5 Ordering of T1. Figure 2.6 Ordering of T1.1.2*. Figure 2.7 Ordering of I3*. Figure 2.8 Ordering of I3.12*. Figure 2.9 Ordering of I3.34*. Figure 2.10 Quantificational spine ordering of I5*. Figure 2.11 Quantificational spine subordering of Iq*. Figure 3.1 Percentage of correct identification of the mystery word in the HSP as a function of the type of information supplied to the participants. (Adapted from Snedeker & Gleitman, 2004.) Figure 3.2 Informativity across verb types. Percentage of correct identification of the mystery word in the HSP as a function of verb class and the type of information supplied to the participants. Mental verbs are better identified by syntactic frame information, whereas action verbs are better identified by scene information and by noun information. (Adapted from Snedeker & Gleitman, 2004.)
Figure 3.3 A partial subcategorization matrix illustrated for eight verbs. Verbs that can describe self-caused acts such as come, go, think, and know license one-argument intransitive constructions when they do so; verbs that can describe transfer events such as give, get, argue, and explain appear in three-argument ditransitive constructions when they do so; verbs such as think, know, argue, and explain, which can describe the mental relation between an entity and an event, appear in tensed sentence-complement constructions (S-complement) when they do so. Taken together, then, give describes the transfer of physical objects (three arguments but not sentence complements), whereas explain describes the transfer of mental objects, such as ideas—that is, communication —and so licenses both three-argument and S-complement structures (John explains the facts to Mary; John explains [to Mary] that dinosaurs are extinct). In contrast, the intransitive possibilities of come, go, think, and know reflect their independence of outside agencies; that is, thinking is “one-head cognition.” Notice also that unlicensed frames, if uttered, add their semantics to known verbs. For example, “John thinks the ball to Mary” would be interpretable as an instance of psychokinesis. Figure 3.4 Cartoon illustrations of two perspective verb pair events. Can syntax override the saliences of the scene? (From Fisher et al., 1994.) Figure 3.5 Proportion of source and goal subject interpretations by 3- and 4-yr-olds and adults as a function of introducing syntactic context. (Adapted from Fisher et al., 1994.) Figure 3.6 Attention, structure, and verb use. An attention-capture technique used in A influenced the choice of structure and verb use (B). The attentional state of a speaker (C) had a similar effect (D). Preferred subject is defined as the subject that people typically conjecture in unprimed situations. (Adapted from Nappa et al., 2004.) 
Figure 5.1 Parental reports of children’s language proficiency. Figure 5.2 Production of verbs in complex sentences in Spanish and English. Figure 5.3 Production of nontarget inflection in Spanish. Figure 5.4 Production of nontarget inflection in English. Figure 6.1 Mean rates of gesture per minute by period and L-CT. Figure 6.2 Type of gesture by social function and L-NG. Figure 6.3 Main communicative function by L-NG and period. Figure 7.1 Answers to imperfect questions. Figure 7.2 Answers to progressive questions. Figure 7.3 Answers by age group to ambiguous imperfect questions (with ostensive context). Figure 7.4 Answers by age group to habitual imperfect questions (without ostensive context). Figure 8.1 Predicted mora durations: all phrasal positions. Error bars indicate 95% confidence intervals. (Adapted from Martin et al., 2016, p. 57, fig. 4.) Figure 8.2
First and second formants of vowels uttered by individual mothers in ADS and IDS. Average F1 and F2 for each speaker for each vowel are shown as letters /i/, /a/, and /u/, and the triangles show the overall average vowel space. (Adapted from Miyazawa et al., 2017, p. 87, fig. 1.) Figure 8.3 The Mahalanobis distance among five vowel categories. Each box shows the average MD among each vowel pair for all speakers. Error bars indicate standard error. *** = p …
… 1).11
b. Recursive steps: If x, y ∈ C, then x ∧ y, x ∨ y ∈ C.
c. Closure: Nothing else is in C.
Let #C be the size of C defined as in 15. Then #C = 2^k. Since k = 2 in S2 and I2*, #S2 = #I2* = 2^2 = 4.12
A subclass of Boolean orderings can be defined over a smaller base case by including negation alongside conjunction and disjunction as a generative function. Let B ∈ ℬ, and let #B be the size of B as defined in 16.13 Then #B = 2^(2^k).
16. Recursive definition of the members of a Boolean classical ordering B from its base case using negation, conjunction, and disjunction:
a. Base case: A′ = {b1, …, bk} ⊆ B (k > 0).
b. Recursive steps: If x ∈ B, then ¬x ∈ B, and if x, y ∈ B, then x ∧ y, x ∨ y ∈ B.
c. Closure: Nothing else is in B.
For example, the CI ≽-ordering I′2 = {ε, a, b, c, d, ab, ac, ad, bc, bd, cd, abc, abd, acd, bcd, abcd} in figure 2.4 is a Boolean ordering with sixteen members and can be defined by 16 from any of the base cases {ab, cd}, {ac, bd}, and {ad, bc}.14
Figure 2.4 Ordering of I’2.
The most familiar classical ordering of type ℬ is the propositional-logic (PL) ⊒-ordering
T defined over a base case of a non-empty finite set P of independent propositions, in which the logical operators are defined by particular truth-value functions on the truth values assigned to the members of P. For example, let P1 = {p}, where p is expressed by Daddy fed Tripod, ¬p by Daddy didn’t feed Tripod, ∧ by and, and ∨ by or. Then T1 = {p, ¬p, p ∧ ¬p, p ∨ ¬p} with the ⊒ ordering in figure 2.5.
Figure 2.5 Ordering of T1.
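The size of a propositional-logic ordering generated from independent propositions can be checked by brute force. The sketch below is only an illustration, not part of the chapter's formalism: it represents each member by its truth table over all assignments to k independent base propositions and closes the base case under negation, conjunction, and disjunction, as in schema 16. The function name `boolean_closure` and the truth-table encoding are assumptions made for this example.

```python
from itertools import product

def boolean_closure(k):
    """Close k independent base propositions under not, and, or.

    Each member is encoded as its truth table: a tuple with one
    boolean per truth assignment to the k base propositions.
    """
    rows = list(product([False, True], repeat=k))  # all 2^k assignments
    # Base case: the i-th proposition is true exactly where row[i] is true.
    members = {tuple(row[i] for row in rows) for i in range(k)}
    while True:
        new = set(members)
        for x in members:
            new.add(tuple(not v for v in x))  # negation
        for x in members:
            for y in members:
                new.add(tuple(a and b for a, b in zip(x, y)))  # conjunction
                new.add(tuple(a or b for a, b in zip(x, y)))   # disjunction
        if new == members:  # closure reached: nothing else is added
            return members
        members = new

sizes = {k: len(boolean_closure(k)) for k in (1, 2, 3)}
print(sizes)  # {1: 4, 2: 16, 3: 256}
```

For k = 1 the four members correspond to p, ¬p, p ∧ ¬p, and p ∨ ¬p, matching T1. The encoding assumes that the base propositions are logically independent, as the text requires of P.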
To account for the equivalences in 1 and 2 and in 4 and 5, we extend the PL ⊒-ordering T to a kind of predicate-logic (ΠL) ⊒*-ordering T* by incorporating members of ECI orderings as the arguments of predicates within sentences.15 For example, consider the classical ⊒*-ordering T1.2.2*, with one base-case two-place predicate F(x, y), where F = fed, which incorporates the I2a* ≽*-ordering for the value of its second argument y and the I2b* ≽*-ordering for the value of its first argument x, which is generated from the atomic choice sets d = Daddy and e = Mommy. T1.2.2* is a Boolean (ΠL) ⊒*-ordering with too many members to show here. Instead we show T1.1.2* in figure 2.6, in which only the choice set d = Daddy occurs as the first argument. T1.1.2* has the members listed in table 2.3 along with at least one English expression for each variant. T1.1.2* has two base cases on level 3, for example, F(d, a) = Daddy fed Tripod and ¬F(d, b) = Daddy didn’t feed Towzer.
Figure 2.6 Ordering of T1.1.2*.
Table 2.3 Members of T1.1.2* and some of their English glosses
Member
17. Glosses
F(d, a) ∨ ¬F(d, a) = …16
a. Daddy fed Tripod or Daddy didn’t feed Tripod
F(d, a) ∨ ¬F(d, b)
b. Daddy fed Tripod or Daddy didn’t feed Towzer
F(d, a) ∨ F(d, b) = F(d, a|b)
c. Daddy fed Tripod or Daddy fed Towzer = Daddy fed Tripod or Towzer = Daddy fed a dog/one of the dogs
¬F(d, a) ∨ ¬F(d, b) = ¬F(d, ab)
d. Daddy didn’t feed Tripod or Daddy didn’t feed Towzer = Daddy didn’t feed Tripod and Towzer = Daddy didn’t feed both dogs
¬F(d, a) ∨ F(d, b)
e. Daddy didn’t feed Tripod or Daddy fed Towzer
F(d, a)
f. Daddy fed Tripod
¬F(d, b)
g. Daddy didn’t feed Towzer
(F(d, a) ∧ ¬F(d, b)) ∨ (¬F(d, a) ∧ F(d, b)) = F(d, a) ↔ ¬F(d, b) = ∀i ∈ a|b (F(d, i) ∧ ¬F(d, ¬i))
h. Daddy fed Tripod and Daddy didn’t feed Towzer, or Daddy didn’t feed Tripod and Daddy fed Towzer = Daddy fed Tripod if and only if Daddy didn’t feed Towzer = Daddy fed one of the dogs and not the other
(F(d, a) ∧ F(d, b)) ∨ (¬F(d, a) ∧ ¬F(d, b)) = F(d, a) ↔ F(d, b) = F(d, ab) ∨ ¬F(d, a|b)
i. Daddy fed Tripod and Daddy fed Towzer, or Daddy didn’t feed Tripod and Daddy didn’t feed Towzer = Daddy fed Tripod and Towzer or Daddy didn’t feed Tripod or Towzer = Daddy fed both dogs or Daddy fed neither dog
F(d, b)
j. Daddy fed Towzer
¬F(d, a)
k. Daddy didn’t feed Tripod
F(d, a) ∧ ¬F(d, b)
l. Daddy fed Tripod and Daddy didn’t feed Towzer
F(d, a) ∧ F(d, b) = F(d, ab)
m. Daddy fed Tripod and Daddy fed Towzer = Daddy fed Tripod and Towzer = Daddy fed each dog/both dogs
¬F(d, a) ∧ ¬F(d, b) = ¬F(d, a|b)
n. Daddy didn’t feed Tripod and Daddy didn’t feed Towzer = Daddy didn’t feed Tripod or Towzer = Daddy didn’t feed any dog = Daddy fed neither Tripod nor Towzer = Daddy fed no dog = Daddy didn’t feed either dog = Daddy fed neither dog
¬F(d, a) ∧ F(d, b)
o. Daddy didn’t feed Tripod and Daddy fed Towzer
F(d, a) ∧ ¬F(d, a) = … (see note 16)
p. Daddy fed Tripod and Daddy didn’t feed Tripod
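The logical forms in table 2.3 can be checked mechanically. The following sketch is only an illustration under a simplifying assumption: it treats the two base cases F(d, a) and F(d, b) as independent truth values (named `a` and `b` here, which are not the chapter's notation) and compares the members as truth tables. It confirms that the sixteen members are pairwise distinct, that negating 17c and 17m gives 17n and 17d respectively, and that 17h and 17i are the exclusive disjunction and the biconditional of the two base cases.

```python
from itertools import product

# All truth assignments to the two base cases: a = F(d, a), b = F(d, b).
ROWS = list(product([False, True], repeat=2))

def table(f):
    """Return the truth table of a two-place truth function."""
    return tuple(f(a, b) for a, b in ROWS)

# Leftmost logical form of each member of table 2.3.
members = {
    "17a": lambda a, b: a or not a,
    "17b": lambda a, b: a or not b,
    "17c": lambda a, b: a or b,
    "17d": lambda a, b: (not a) or (not b),
    "17e": lambda a, b: (not a) or b,
    "17f": lambda a, b: a,
    "17g": lambda a, b: not b,
    "17h": lambda a, b: (a and not b) or (not a and b),
    "17i": lambda a, b: (a and b) or (not a and not b),
    "17j": lambda a, b: b,
    "17k": lambda a, b: not a,
    "17l": lambda a, b: a and not b,
    "17m": lambda a, b: a and b,
    "17n": lambda a, b: (not a) and (not b),
    "17o": lambda a, b: (not a) and b,
    "17p": lambda a, b: a and not a,
}
tables = {name: table(f) for name, f in members.items()}

# The sixteen members are pairwise non-equivalent.
all_distinct = len(set(tables.values())) == 16
# Negating 17c yields 17n, and negating 17m yields 17d.
neg_c_is_n = table(lambda a, b: not (a or b)) == tables["17n"]
neg_m_is_d = table(lambda a, b: not (a and b)) == tables["17d"]
# 17h is exclusive disjunction; 17i is the biconditional.
h_is_xor = tables["17h"] == table(lambda a, b: a != b)
i_is_iff = tables["17i"] == table(lambda a, b: a == b)

print(all_distinct, neg_c_is_n, neg_m_is_d, h_is_xor, i_is_iff)
```

The check covers only the propositional skeleton of each member; the equivalences with forms such as F(d, a|b) and F(d, ab) depend on the ECI ordering of the arguments and are not modeled here.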
The incorporation of the members of ECI orderings as the arguments of predicates in ΠL orderings enables us to account for the equivalences in 17d and 17n as instances of De Morgan’s laws for classical orderings in 18. Applying 18a to 17c yields ¬(F(d, a) ∨ F(d, b)) = ¬F(d, a) ∧ ¬F(d, b) = ¬F(d, a|b) as in 17n, and applying 18b to 17m yields ¬(F(d, a) ∧ F(d, b)) = ¬F(d, a) ∨ ¬F(d, b) = ¬F(d, ab) as in 17d.
18. For all x, y ∈ C:
a. ¬(x ∨ y) = ¬x ∧ ¬y
b. ¬(x ∧ y) = ¬x ∨ ¬y
We also learn that English provides a means of referring to the negation of an individual in the scope of a quantifier using the term the other as in 17h, in which the gloss Daddy fed one of the dogs and not the other is derived by forward deletion of the second occurrence of Daddy fed without the formation of a coordinate compound phrase of the remainder (see note 3).17

Dedekind Orderings of Individuals
As note 12 points out, Ik* for k > 2 are not classical orderings as they do not satisfy 14, as one can determine for I3*, whose atoms are the members of A3 = A2 ∪ {c}, where c = Tigger (also a dog). I3* has the eighteen members in table 2.4, each given appropriate glosses in English separated by semicolons and internally punctuated if necessary to indicate the relative scope of and and or, and the ordering in figure 2.7. From the definition of negate for I* in 11, we find that none of 19b–19g nor of 19k–19n in table 2.4 are the negations of any other member of I3a*; similarly, from the definition of d-negate for I* in 11b, none of 19e–19g nor of 19k–19q are the d-negations of any other member. Nevertheless, every member of I3* has both a negation and a d-negation, as shown in table 2.5; that is, both negation and d-negation map I3* into I3* (are injective). Moreover, both conjunction and disjunction map I3* × I3* onto Ik* (are surjective) for all values of k, so that each ECI ≽-ordering I* can be recursively defined using conjunction and disjunction from its base case of atoms as in 20.
Figure 2.7 Ordering of I3*.
Table 2.4 Members of I3*
Member      19. Glosses
a|b|c       a. Tripod, Towzer, or Tigger; a dog; any dog
a|b         b. Tripod or Towzer; any dog but Tigger
a|c         c. Tripod or Tigger; any dog but Towzer
b|c         d. Towzer or Tigger; any dog but Tripod
a|bc        e. Tripod, or Towzer and Tigger; Tripod or the other dogs
b|ac        f. Towzer, or Tripod and Tigger; Towzer or the other dogs
c|ab        g. Tigger, or Tripod and Towzer; Tigger or the other dogs
a           h. Tripod
b           i. Towzer
c           j. Tigger
ab|ac|bc    k. Tripod and Towzer, Tripod and Tigger, or Towzer and Tigger; any two dogs; all the dogs but one
ab|ac       l. Tripod, and Towzer or Tigger; Tripod and one of the other dogs
ab|bc       m. Towzer, and Tripod or Tigger; Towzer and one of the other dogs
ac|bc       n. Tigger, and Tripod or Towzer; Tigger and one of the other dogs
ab          o. Tripod and Towzer; every dog/all the dogs but Tigger
ac          p. Tripod and Tigger; every dog/all the dogs but Towzer
bc          q. Towzer and Tigger; every dog/all the dogs but Tripod
abc         r. Tripod, Towzer, and Tigger; every dog; all the dogs
Table 2.5 Negation and d-negation of members of I3*
Member      Negation    D-negation
a|b|c       abc         abc
a|b         abc         c
a|c         abc         b
b|c         abc         a
a|bc        abc         b|c
b|ac        abc         a|c
c|ab        abc         a|b
a           bc          b|c
b           ac          a|c
c           ab          a|b
ab|ac|bc    abc         a|b|c
ab|ac       abc         a|b|c
ab|bc       abc         a|b|c
ac|bc       abc         a|b|c
ab          c           a|b|c
ac          b           a|b|c
bc          a           a|b|c
abc         a|b|c       a|b|c
20. Recursive definition of the members of I* from its base case of atoms using conjunction and disjunction functions only:
a. Base case: Ak = {{a1}, …, {ak}} ⊆ I* (k > 0).18
b. Recursive steps: If x, y ∈ I*, then x ∧ y, x ∨ y ∈ I*.
c. Closure: Nothing else is in I*.
An ordering of the type represented by I* was discovered by Richard Dedekind (1897; Fricke et al., 1931) as the solution to the following problem: Given a set of k prime numbers Pk as the base case, and of their 2^k − 1 multiples Mk, find the largest set of sets Mk* ⊆ ℘(Mk) − ∅ that contain no divisible pairs. For example, for P2 = {2, 3}, the solution is (in our notation for the members of I*) M2* = {2, 3, 6, 2|3}, and for P3 = {2, 3, 5}, it is M3* = {2, 3, 5, 6, 10, 15, 30, 2|3, 2|5, 3|5, 2|15, 3|10, 5|6, 6|10, 6|15, 10|15, 2|3|5, 6|10|15}. M2* is isomorphic to I2* and M3* to I3* given the ordering ≽f on Mk in which x ≽f y if and only if x is divisible by y and the ordering ≽f* on Mk* is derived from ≽f in the same way that ≽* is derived from ≽. Dedekind reported the first four values of the series that was later named for him, noting that it grows very quickly and that he had not yet tried to find a general expression for it. To date only the first eight values listed in table 2.6 have been calculated.19
Table 2.6 The first eight Dedekind numbers, the sizes #Mk* = #Ik* (1 ≤ k ≤ 8)
k    #Mk* = #Ik*
1    1
2    4
3    18
4    166
5    7579
6    7,828,352
7    2,414,682,040,996
8    56,130,437,228,687,557,907,786
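The first few values in table 2.6 can be reproduced by exhaustive search. The sketch below rests on an assumption about how to model the members of Ik*: each member is taken to be a non-empty set of choice sets (non-empty subsets of the k atoms) in which no choice set is a subset of another, which is the antichain condition mentioned in note 6. The function name and the bitmask encoding are choices made for this illustration only, and the search is feasible only for small k.

```python
from itertools import combinations

def count_members(k):
    """Count non-empty antichains of non-empty subsets of a k-element set."""
    # Encode each non-empty subset of the k atoms as a bitmask 1 .. 2^k - 1.
    subsets = list(range(1, 2 ** k))
    count = 0
    # Encode each non-empty family of subsets as a bitmask over `subsets`.
    for family_bits in range(1, 2 ** len(subsets)):
        family = [s for i, s in enumerate(subsets) if (family_bits >> i) & 1]
        # Antichain test: for no pair is one subset contained in the other.
        if all((x & y) != x and (x & y) != y
               for x, y in combinations(family, 2)):
            count += 1
    return count

print([count_members(k) for k in (1, 2, 3, 4)])  # [1, 4, 18, 166]
```

The number of candidate families is 2^(2^k − 1) − 1, so this approach already examines 32,767 families for k = 4 and becomes impractical soon afterward, which gives some sense of why the later values in the table required far more refined methods.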
#Ik* grows rapidly with k not only in size but also in complexity, so that the exact value of #I9* has not yet been determined, thirty years after the calculation of #I8*. It is startling to find that such a complex ordering should hold for something as basic to the functioning of natural languages as the conjunction and disjunction of sets of individuals. We may presume that our ability to deploy it as a result of learning a language is based on focusing on its most salient and useful properties and ignoring the rest. In the rest of this chapter, I examine a few of the properties of I3* that might give us clues as to how to explore the possibilities. To start with, I3* can be decomposed into the four overlapping classical suborderings in 21, of which I3.1* in 21a and I3.2* in 21b overlap in the set {a|b|c, a, b, c, abc}; and I3.3* in 21c and I3.4* in 21d overlap in the singleton {ab|ac|bc}. Forming the union of these two pairs, we obtain the suborderings I3.12* and I3.34* in figures 2.8 and 2.9. I3.12* contains all and only the members of I3* whose choice sets all have the same number (1, 2, or 3) of atomic parts, that is, those members of I3* that are strictly singular, dual, or plural. I3.34*, on the other hand, contains all the members of I3* except the atoms, which are just the ones for which paraphrases involving quantified expressions can more or less naturally be used. It also contains a spine of three members, which can be thought of as essentially quantificational in nature, a|b|c = ∃x (or 1x), ab|ac|bc = 2x = (∀−1)x, and abc = ∀x.
Figure 2.8 Ordering of I3.12*.
Figure 2.9 Ordering of I3.34*.
21. Decomposition of I3* into classical suborderings:
a. I3.1* = ¬I3* = {a|b|c, a, b, c, ab, ac, bc, abc}
b. I3.2* = ⌐I3* = {a|b|c, a|b, a|c, b|c, a, b, c, abc}
c. I3.3* = {x ∈ I3*: x ≽* ab|ac|bc} = {ab|ac|bc, ab|ac, ab|bc, ac|bc, ab, ac, bc, abc}
d. I3.4* = {x ∈ I3*: ab|ac|bc ≽* x} = {a|b|c, a|b, a|c, b|c, a|bc, b|ac, c|ab, ab|ac|bc}
For Ik* (k > 3), a complete decomposition along the lines of 21 cannot be carried out. For I4*, about 60 percent of the members are included, and the percentage drops with k, but it may be supposed that the members that are not included are comparatively less in need of access. The quantificational spine grows only at the same rate as k and serves as an appropriate model for the numerical quantifiers as it identifies the set of choice sets made up of the same number of atoms as the quantifier expresses. Figure 2.10 shows an example for I5* where A5 = {a, b, c, d, e}, every one of which is, let us say, a dog.
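Returning to the decomposition of I3* in 21, the overlap and coverage claims can be confirmed with a few set operations. This is a small illustrative check rather than a derivation: the four suborderings are entered as literal sets of member labels taken from 21 and table 2.4, and the variable names are hypothetical.

```python
# The eighteen members of I3*, labeled as in table 2.4.
I3 = {"a|b|c", "a|b", "a|c", "b|c", "a|bc", "b|ac", "c|ab", "a", "b", "c",
      "ab|ac|bc", "ab|ac", "ab|bc", "ac|bc", "ab", "ac", "bc", "abc"}

# The four classical suborderings listed in 21a-21d.
I3_1 = {"a|b|c", "a", "b", "c", "ab", "ac", "bc", "abc"}
I3_2 = {"a|b|c", "a|b", "a|c", "b|c", "a", "b", "c", "abc"}
I3_3 = {"ab|ac|bc", "ab|ac", "ab|bc", "ac|bc", "ab", "ac", "bc", "abc"}
I3_4 = {"a|b|c", "a|b", "a|c", "b|c", "a|bc", "b|ac", "c|ab", "ab|ac|bc"}

# Pairwise unions, corresponding to figures 2.8 and 2.9.
I3_12 = I3_1 | I3_2
I3_34 = I3_3 | I3_4

overlap_12 = I3_1 & I3_2  # expected: {a|b|c, a, b, c, abc}
overlap_34 = I3_3 & I3_4  # expected: the singleton {ab|ac|bc}
covers_all = (I3_12 | I3_34) == I3          # every member is included
only_atoms_missing = I3 - I3_34 == {"a", "b", "c"}

print(len(I3_12), len(I3_34), covers_all, only_atoms_missing)
```

Because the sets are entered by hand, the check verifies only the internal consistency of the listed memberships; it does not compute the ≽* ordering that defines 21c and 21d.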
Figure 2.10 Quantificational spine ordering of I5*.
The subordering schema can be extended to equivalence classes of choice sets based on approximate relative sizes of the number of atoms greater than one with the total number assumed to be large but unspecified, so as to include φ (paucal), a few or a little; μ (multal), many or much; and ∀−φ, all but a few, all but a little or most, resulting in the subordering of Iq* in figure 2.11.
Figure 2.11 Quantificational spine subordering of Iq*.

Notes
1. Chomsky did not use the term CR for either schema 3, which he called the “general rule for conjunction,” or its ancestor, a generalized transformation, which he called simply Conjunction. However, CR was the name that caught on for the derivation of conjunctive coordinate compound words and phrases from conjunctive coordinate compound sentences. The first occurrence in print that I have located of conjunction reduction to refer to such an operation is mentioned en passant in Postal (1966), which suggests that the term was already then in use by generative grammarians. It was subsequently used as the subject and title of a survey paper (Harries, 1973) and a squib (Hudson, 1973).
2. To coordinate n occurrences of some category, A → A and A in schema 3 should read something like A → A^n and A (n ≥ 1), analogous to the phrase-structure-rule schema proposed in Chomsky and Miller (1963, p. 298, ex. 21) and then rejected as an “ad hoc adjustment” (p. 299).
3. Evidence that this rule is part of the adult grammar of English is presented in the section “Recursive Definition of Classical Orderings by Means of Logical Operators.”
4. During this time there were significant investigations of children’s comprehension of DNP- and DS-CCs, such as Suppes & Feldman (1971) and Paris (1973), but these were directed at studying their judgments of the truth values of the sentences involved, not their grammatical derivations.
5. ∅ ∈ S2 is the empty set, which is a subset of every set in any set of sets of which it is a member, and ε ∈ I2 is the empty individual, which is part of every individual in any set of individuals of which it is a member. In CI and in this paper, the atomic individuals (henceforth atoms) in an ordering are those non-empty individuals that have no parts that are members of that ordering other than themselves, thus excluding ε as a member.
6. I2* consists of the antichains of I2a* whose members are independent members of the ≽ ordering of I2 (Weisstein, n.d.). The members of I2a* − I2* are equivalent to their own weakest subset, i.e., a|ab = a, b|ab = b, and a|b|ab = a|b.
7. In his discussion of Geach’s challenge, Jennings (1994, p. 228) observed: “ ‘Tripod or Towzer’ does not stand for something for which a name can be introduced,” noting that the disjunction of two names stands for a set, not an individual. He did not consider the possibility that names also stand for sets, in this case singleton sets of atomic individuals. In any event, Jennings reached essentially the same conclusion about the reference of a disjunction of names as the one proposed here but by a more complex route: “We may also think of ‘or’ as a function that takes an ensemble of individual constants to a general term, whose semantic representation will be a function taking the representatives of those constants to the least set containing them.” The general term in this case presumably would be dog, or more exactly, a dog.
8. In I2a*, the negation of each member is identical to its d-negation. For ab, this equivalence follows from the if-clause of the definition of negate in 11 and the else-clause of the definition of d-negate in 11b: ¬ab = ⊘*ab = A2 = a|b and ⌐ab = A2 − ab = a|b. For a|b, it follows from the else-clause of the definition of negate and the if-clause of the definition of d-negate: ¬a|b = a⊕b = ab and ⌐a|b = a⊕b = ab. For a and b, it follows from the else-clauses of the two definitions, which I leave as an exercise for the reader.
9. That class consists of orderings with two independent members such as the ≽-ordering A2.
10. If ∅ is added to the base case of ⊇ orderings and ε to the base case of ≽ orderings, then only conjunction is needed in the recursive step for generating the members of those orderings.
11. The members of the base case of C are those that appear in the next-to-top level of the ordering. For S2 and I2*, that is level 2 of figures 2.1 and 2.3.
12. Sk is classical for all k ≥ 2, whereas Ik* is classical for k = 2 only. Ik* for k > 2 is discussed in the next section.
13. Schema 16 reduces to a special case of 15 if the first conjunct of 16b is removed and A′ in 16a is replaced by A = {a1, …} consisting of all and only the atoms of ℬ.
14. The base case A′ ∈ ℬ is a subset of its members on level 2k + 1 of its ordering counting up from the bottom; the members of ¬A′ also occur on that level. For S2 and I2* considered as the Boolean orderings S′2 and I′2a*, that is level 2, which is also the level on which the atoms of S2 and I2a* occur; but for I4 considered as the Boolean ordering I′2, the base case is found on level 3 in figure 2.4, whereas the atoms of I4 are all and only the four members on level 4 in figure 2.4.
15. Goodman (1951) incorporated members of CI-orderings as arguments of predicates and developed a classification of one- and two-place predicates based on how they interact with sums and products of individuals. In particular, Goodman’s analysis accounts for the equivalence of examples like 1 and 2. See also Langendoen and Magloire (2003) for application of Goodman’s approach to the analysis of reflexive and reciprocal interpretations of sentences in English.
16. See figure 2.6 for equivalent logical forms for the topmost gloss, 17a, and the bottommost one, 17p.
17. Note that the CNP-CC neither Tripod nor Towzer in one of the glosses of 17n does not refer to an individual at all, much less the negation of an individual, just as the NP no dog in another of those glosses does not.
18. Ik* has … levels, and its atoms are in the middle level of the ordering. For example, the atoms of I3* are in level … in figure 2.7.
19. For further discussion and references, see https://oeis.org/A007153.

References
Ardery, G. (1979). The development of coordinations in child language. Journal of Verbal Learning and Verbal Behavior, 18, 745–756.
Chomsky, N. (1957). Syntactic structures. Mouton.
Chomsky, N. (1965). Aspects of the theory of syntax. MIT Press.
Chomsky, N., & Miller, G. A. (1963). Introduction to the formal analysis of natural languages. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 2, pp. 269–321). John Wiley & Sons.
Dedekind, R. (1897). Über Zerlegungen von Zahlen durch ihre größten gemeinsamen Teiler. In H. Beckurts (Ed.), Festschrift der Technischen Hochschule zu Braunschweig bei Gelegenheit der 69. Versammlung deutscher Naturforscher und Ärzte (pp. 1–40). Vieweg und Sohn. Reprinted in Fricke et al. (1931, pp. 103–147).
Dik, S. C. (1968). Coordination: Its implications for a theory of general linguistics. North-Holland.
Dougherty, R. S. (1970). A grammar of coordinate conjoined structures I. Language, 46, 850–898.
Dougherty, R. S. (1971). A grammar of coordinate conjoined structures II. Language, 47, 298–339.
Fricke, R., Noether, E., & Ore, O. (Eds.). (1931). Richard Dedekind: Gesammelte Mathematische Werke II. Vieweg und Sohn.
Geach, P. T. (1962). Reference and generality. Cornell University Press.
Goodman, N. (1951). The structure of appearance. Harvard University Press.
Harries, H. (1973). Conjunction reduction. Stanford University Working Papers on Language Universals, 11, 139–209.
Hudson, R. A. (1973). Conjunction-reduction. Journal of Linguistics, 9, 303–305.
Jennings, R. E. (1994). The genealogy of disjunction. Oxford University Press.
Koslow, A. (1992). A structuralist theory of logic. Cambridge University Press.
Langendoen, D. T., & Magloire, J. (2003). The logic of reflexivity and reciprocity. In A. Barss (Ed.), Anaphora: A reference guide (pp. 237–263). Blackwell Publishing.
Leonard, H., & Goodman, N. (1940). The calculus of individuals and its uses. Journal of Symbolic Logic, 5, 45–55.
Lust, B. (1977). Conjunction reduction in child language. Journal of Child Language, 4, 257–287.
Lust, B., & Mervis, C. A. (1980). Development of coordination in the natural speech of young children. Journal of Child Language, 7, 279–304.
Paris, S. (1973). Comprehension of language connectives and propositional logical relationships. Journal of Experimental Child Psychology, 16, 278–291.
Postal, P. M. (1966). A note on “understood transitively.” International Journal of American Linguistics, 32, 90–93.
Ross, J. R. (1970). Gapping and the order of constituents. In M. Bierwisch & K. Heidolph (Eds.), Progress in linguistics (pp. 249–259). Mouton.
Slobin, D., & Welsh, C. A. (1973). Elicited imitation as a research tool in developmental psycholinguistics. In C. Ferguson & D. Slobin (Eds.), Studies of child language development (pp. 485–497). Holt, Rinehart & Winston.
Suppes, P., & Feldman, S. (1971). Young children’s comprehension of logical connectives. Journal of Experimental Child Psychology, 12, 304–317.
Weisstein, E. W. (n.d.). Antichain. MathWorld—A Wolfram Web Resource. http://mathworld.wolfram.com/Antichain.html
3 Hard Words
Lila R. Gleitman, Kimberly Cassidy, Rebecca Nappa, Anna Papafragou, and John C. Trueswell
Much of linguistic theory in the modern era takes as its central task to provide an account of the acquisition of language: What kind of machine in its initial state, supplied with what kinds of input, could acquire a natural language in the way that infants of our species do? Chomsky (1980) cast this problem in terms of the “poverty of the stimulus,” or Plato’s Problem. What is meant here is that if input information is insufficient to account for the rapidity, relative errorlessness, and uniformity of language growth, it follows that certain properties of language—alternatively, certain ways of taking in, manipulating, and representing linguistic input—are preprogrammed in human nature. Usually in this context, linguists are talking about the acquisition of phonology (e.g., Dresher, 1998) or syntax (e.g., Hornstein & Lightfoot, 1981; Pinker, 1984), not vocabulary. For this latter aspect of language, the poverty of the stimulus argument is hardly raised, and it is easy to see why. With rare exceptions, everybody seems to subscribe to something like Yogi Berra’s theory: If it’s not something you can observe, then you’re not going to learn it by watching.
Tailored to vocabulary growth in particular, this would mean one acquires the meanings of words by observing the contingencies for their use—that is, by pairing the words to the world. For instance, we learn that “cat” means ‘cat’ because this is the word that is uttered most systematically in the presence of cats and the least systematically in their absence.1 All the learner has to do is match the real-world environment (recurrent cat situations) with the sounds of the words (recurrent phonetic sequences) in the exposure language. Here is an even more famous version of this theory, from John Locke: If we will observe how children learn languages, we shall find that … people ordinarily show them the thing whereof they would have them have the idea, and then repeat to them the name that stands for it, as ‘white’, ‘sweet’, ‘milk’, ‘sugar’, ‘cat’, ‘dog’. (1690 [1967], 3.9.9)
The British Empiricists were of course cannier than this out-of-context passage implies and evidently only meant to make a start with this word-to-world pairing procedure (afterward, reflection and imagination would take over). Notice, in this regard, that to make his story plausible, Locke has selected some rather transparent examples, items for which perception might straightforwardly offer up the appropriate representations to match to the sounds. If there is a cat out there, or whiteness, this may well trigger a salient perceptual
experience. But what of such words as fair (as in “That’s not fair!”), a notion and vocabulary item that every child with a sibling learns quickly in self-defense? Or how about know or probably? How does one watch or observe instances of probably? In the present chapter, we try to motivate a picture of what makes some words harder to acquire than others, not only for babies but also for other linguistic novices.2 Findings we report suggest that a considerable part of the bottleneck for vocabulary learners is not so much in limitations of the early conceptual repertoire but rather in solutions to the mapping problem that Locke introduces in 1: determining which phonetic formative expresses which concept. Thereafter, we describe a theory of word learning that in early incarnations was called syntactic bootstrapping (Gleitman, 1990; Landau & Gleitman, 1985). In line with most recent commentary in the literature of infant perception and conception, this approach accepts that infants by their first birthday or sooner approach the task of language learning equipped with sophisticated representations of objects and events (e.g., “core knowledge” in the sense of Spelke, 2003; Spelke et al., 1992) and quite a smart pragmatics for interpreting the gist of conversation during communicative interactions with caretakers (per Baldwin, 1991; P. Bloom, 2002; and Tomasello & Farrar, 1986). These capacities enable the learners to entertain a variety of concepts expressed by the words that their caregivers utter. However, although this sophistication with event structure and conversational relevance necessarily frames the word-learning task, we posit that it is insufficient taken by itself. The other major requirement for vocabulary growth is developing linguistic representations of incoming speech that match in sophistication, and dovetail with, their pragmatic and conceptual representations. 
By so doing, the learners come to add structure-to-world mapping procedures to the word-to-world mapping procedures with which they began. Specifically, the position we defend is that vocabulary learning presents a classic poverty of the stimulus problem that becomes obvious as soon as we move our attention past the simplest basic-level whole-object terms. For many if not most other words, the ambient world of the language learner is surprisingly impoverished as the sole basis for deriving meanings. Yet children learn these hard words, too, although crucially with some measurable delay. Two broad principles characterize our account. On the one hand, we claim that learners’ usable input, both linguistic and nonlinguistic, for word learning is much broader and more varied than is usually acknowledged. But on the other hand, this improved input perspective threatens to create a learning problem that, just as perniciously, substitutes a “richness of the stimulus” problem for the poverty of the stimulus problem as previously conceived (cf. Chomsky, 1959; Quine, 1960; see also Gleitman, 1990). The learner who can observe everything can drown in the data. Two kinds of capacity and inclination rescue the learning device. The first is a general learning procedure that can extract, combine, and coordinate multiple probabilistic cues at several levels of linguistic analysis (in the spirit of many machine-learning and constraint-satisfaction proposals; e.g., Bates & Goodman, 1997; Elman, 1993; Goldsmith, 2001; Kelly & Martin, 1994; Manning & Schütze, 1999; McClelland, 1987; Trueswell & Tanenhaus, 1994). However, for such a probabilistic
multiple-cue learning process to work at all requires unlearned principles concerning how language realizes conceptual structures and similarly unlearned principles for how these mappings can be discovered from their variable and complex encoding in speech within and across languages (e.g., Baker, 2001a; Borer, 1984; Grimshaw, 1990; Jackendoff, 1990; Lidz et al., 2003b).

Two Accounts of Hard Words
When we compare infants’ conceptual sophistication to their lexical sophistication, we find a curious mismatch. Earliest vocabularies all over the world are replete with terms that refer in the adult language to whole objects and object kinds, mainly at some middling or basic level of conceptual categorization—for example, words such as doggie and spoon (Au et al., 1994; Bates et al., 1995; Caselli et al., 1995; Fenson et al., 1994; Gentner & Boroditsky, 2001; Goldin-Meadow et al., 1976; Kako, 2005; Lenneberg, 1967; Markman, 1994). This is consistent with many demonstrations of responsiveness to objects and object types in the prelinguistic stages of infant life (Kellman & Arterberry, 1998; Kellman & Spelke, 1983; Mandler, 2000; Needham & Baillargeon, 2000). In contrast, for relational terms the facts about concept understanding do not seem to translate as straightforwardly into facts about early vocabulary. Again there are many compelling studies of prelinguistic infants’ discrimination of and attention to several kinds of relations, including containment versus support (Hespos & Baillargeon, 2001), force and causation (Leslie, 1995; Leslie & Keeble, 1987), and even accidental versus intentional acts (Carpenter et al., 1998; Woodward, 1998). Yet when the time comes to talk, there is a striking paucity of relational and property terms compared to their incidence in caretaker speech. Infants tend to talk about objects first (Gentner, 1978, 1981). Consequently, because of the universal linguistic tendency for objects to surface as nouns (Baker, 2001b; Pinker, 1984), nouns overpopulate the infant vocabulary as compared to verbs and adjectives, which characteristically express events, properties, and relations. 
The magnitude of the noun advantage from language to language is influenced by many factors, including frequency of usage in the caregiver input; even so, it is evident to a greater or lesser degree in all languages that have been studied in this regard (Gentner & Boroditsky, 2001; Snedeker & Li, 2000).3 In sum, verbs as a class are hard words, whereas nouns are comparatively easy. Why is this so? An important clue is that the facts as just presented are wildly oversimplified. Infants generally acquire the word kiss (the verb) before the word idea (the noun), and even before kiss (the noun). As for the verbs, their developmental timing of appearance is variable, too, with such items as think and know generally acquired later than go and hit (L. Bloom et al., 1975). Something akin to concreteness rather than lexical class per se appears to be the underlying predictor of early lexical acquisition (e.g., Gentner, 1978, 1982; Gentner & Boroditsky, 2001; Gleitman & Gleitman, 1997).4

The Conceptual Change Hypothesis
Plausibly enough, the early advantage of concrete terms over abstract ones has usually been taken to reflect the changing character of the child’s conceptual life, whether attained by maturation or learning. Smiley and Huttenlocher (1995) presented this view as follows:

Even a very few uses may enable the child to learn words if a particular concept is accessible. Conversely, even highly frequent and salient words may not be learned if the child is not yet capable of forming the concepts they encode. … Cases in which effects of input frequency and salience are weak suggest that conceptual development exerts strong enabling or limiting effects, respectively, on which words are acquired. (p. 20)
Indeed, the word-learning facts are often adduced as rather straightforward indexes of concept attainment (e.g., Dromi, 1987; Huttenlocher et al., 1983). In particular, the late learning of credal, or belief, terms is taken as evidence that the child does not have control of the relevant concepts. In the words of Gopnik and Meltzoff (1997),

the emergence of belief words like “know” and “think” during the fourth year of life, after “see,” is well established. In this case … changes in the children’s spontaneous extensions of these terms parallel changes in their predictions and explanations. The developing theory of mind is apparent both in semantic change and in conceptual change. (p. 121)

The Informational Change Hypothesis
A quite different explanation for the changing character of the vocabulary, the so-called syntactic bootstrapping solution (Fisher, 1996; Gleitman, 1990; Landau & Gleitman, 1985), has to do with information change rather than, or in addition to, conceptual change. Specifically, we propose the following general explanation:

1. Several sources of evidence contribute to solving the mapping problem for the lexicon.
2. These evidential sources vary in their informativeness over the lexicon as a whole.
3. Only one such evidential source is in place when word learning begins; namely, observation of the word’s situational contingencies.
4. Other systematic sources of evidence have to be built up by the learner through accumulating linguistic experience.
5. As the learner advances in knowledge of the language, these multiple sources of evidence converge on the meanings of new words.

These procedures mitigate and sometimes reverse the distinction between easy and hard words. The result of this learning procedure is a knowledge representation in which detailed syntactic and semantic information is linked at the level of the lexicon. According to this hypothesis, then, not all words are acquired in the same way. As learning begins, the infant has the conceptual and pragmatic wherewithal to interpret the reference world that accompanies caretaker speech, including the gist of caretaker-child conversations (to some unknown degree; but see P. Bloom, 2002, for an optimistic picture, which we accept). These capacities and inclinations to interpret the reference world meaningfully are implicit as well in Locke’s dictum (1). Words that can be acquired solely from such confrontations with extralinguistic context are easy in the sense that we have in mind (for a closely related position, see Gentner, 1982). The output of this observational, word-to-world learning procedure is a substantial stock of
basic-level terms, largely nouns. This foundational vocabulary, important in its own right for the novice’s early communications with others, also plays a necessary role in the computational machinery for further language learning. Crucially, it provides a first basis for constructing the rudiments of the language-specific clause-level syntax of the exposure language—that is, the structural placement of nominal arguments (a matter discussed later in this chapter). This improved linguistic representation in turn becomes available as an additional source of evidence for acquiring further words, those that cannot efficiently be acquired by observation operating as a stand-alone procedure. The primitive observation-based procedure that comprises the first stage of vocabulary growth is what preserves this model from the vicious circularity implied by the whimsical term bootstrapping (you can’t pull yourself up by your bootstraps if you’re standing in the boots). We now turn to some evidence.5

The Human Simulation Paradigm and the Learning of Easy Words
In several recent studies we investigated the mapping problem in lexical learning under varied informational conditions (Gillette et al., 1999; Kako, 2005; Snedeker & Gleitman, 2004). The first purpose of these studies was to investigate the potency, in principle, of various kinds of situational and linguistic cues for identifying the concept that individual words (common nouns and verbs) encode. Of course, findings of this kind do not necessarily imply that learners can or do recruit these cues for use in word learning. This limitation has made it seem to some critics paradoxical, if not perverse, that we chose adults as the participants—the word learners—in these simulations. After all, what we want to understand is why the learning sequence for the vocabulary in young children describes the trajectory that it does. So why study adults? The answer has to do with our second aim. An adult population provides a first method for disentangling the hypotheses of the conceptual change and the information change. If the character and trajectory of learning differ greatly between children and adults as a function of the vast conceptual gap between these populations, then we should not expect to be able to model infant learners of vocabulary with adults. But what if we can make perfectly sophisticated (well, reasonably sophisticated) undergraduates learn as babies learn simply by withholding certain kinds of information? And what if that withheld information was just the kind of language-particular information that the baby could not have? Such a result would bolster an information-change account. Thus the Human Simulation Paradigm (HSP) is designed to model the target population (infants) by investigating effects of input on the learning function in adults, much in the vein of computer simulations of this process (see Webster & Marcus, 1989, for the first computer simulation of verb learning in this line, based on the outline scheme in Landau & Gleitman, 1985). 
HSP derives its choice of materials from a realistic source: a database of approximately thirty hours of conversations collected under naturalistic circumstances between English-speaking mothers and their infants, aged about 18 to 24 mo. The test items were the twenty-four most frequent nouns and the twenty-four most frequent verbs that occurred in these conversations. To test how easy or hard it might be to identify these words from
extralinguistic context alone, adult observers were shown video clips (each about a minute in length), chosen by a random procedure, of the mothers’ child-directed speech.6 Crucially, the tapes were silenced, but an audible beep occurred whenever the mother uttered the mystery word. Participants saw six such clips for each word, presented in a row, and were told that each series of six beeps marked utterances of the same word by the mother. Thus they were being asked to identify words (perform the mapping task) by word-to-world pairing, conjecturing the meaning of each of the mystery words by observation of the real-world contingencies for its use. The six exemplars provided the participants with an opportunity for some cross-situational observation to guide the interpretation of each word in situational context. In the initial studies they were told whether the words were nouns or verbs. Nouns were overwhelmingly easier for the participants to identify (45 percent correct) than verbs (15 percent correct) in this situation, in which number of exposures was the same. In a replication by Snedeker and Gleitman (2004) the massive advantage of the nouns over the verbs remained, even when the beeps were not identified by lexical class. Thus these results reproduced the noun-dominance property observed for babies as they first begin to speak. Success rates in this task could be predicted by other participants’ judgments of the concreteness of each word in the set of forty-eight items. On average, the test nouns in the mothers’ speech were judged more concrete (or more imageable) than the verbs, and these concreteness scores were much better predictors of the success rate on the identification task than the noun-verb distinction. Within lexical class, the same results hold. 
For example, Kako (2005) found that words for basic-level categories of whole objects (elephant) are strikingly easier to identify in this paradigm, based on observation alone, than are abstract nouns (thing). Similarly, the most concrete verbs (throw) were correctly identified relatively frequently, whereas the most abstract verbs (think, know) were never guessed correctly by any participant. What is important here is that the concreteness factor that determined adult behavior in the HSP laboratory also characterizes the infant’s first vocabulary, as earlier described by Gentner (1982): an overpopulation of concrete nouns, an underrepresentation of verbs (compared to their frequency in input speech), and a total absence of credal terms. Taken at their strongest, these results suggest that the chief limiting factor on early vocabulary growth is a lack of tools and information available for solving the mapping problem rather than a lack of conceptual sophistication. This is so even though we can think of our adults in this task as doing something like second language acquisition. Already knowing the words ball and think in English and the concepts that these encode, they learned that “beep” means ‘ball’ more easily than they learned that “beep” means ‘think’ just because they were discovering the mappings by using the evidence of their senses. The suggestion is that, in a related sense, infant vocabulary acquisition is second language learning as well.7 As we next discuss, the initial stock of lexical items acquired via word-to-world pairing eventuates not only in a primary vocabulary. These items play a crucial further role in language learning. They form the scaffold on which further linguistic achievements—both lexical and phrase-structural—are built.

HSP and the Learning of Hard Words
How does the child move beyond an initially concrete, largely nominal vocabulary? The indirect relationship between verb meaning and observed events renders verb learning in particular somewhat mysterious. For one thing, verb occurrence is apparently not time locked with event occurrence to anything like the extent that noun occurrence is linked to object presence (Akhtar & Tomasello, 1997; Gleitman, 1990). Second, there is much more surface variability in how verbs get realized and encoded grammatically than nouns within and across languages (Baker, 2001a; Gentner, 1982; Gentner & Boroditsky, 2001; Goldberg, 2004; Lidz & Gleitman, 2004). Third, as we discuss later, some verbs represent concepts so divorced from everyday perception that the observed scene is almost wholly opaque for gleaning their intent. For these hard words the learner needs supplementary evidence—linguistic evidence bootstrapped from (grounded by) the early-acquired vocabulary of concrete words. To illustrate, let us return to the HSP procedures. To study the effects of changing the input database for learning, we next asked adults to identify nouns and verbs spoken to young children based on various combinations of linguistic and extralinguistic information (Gillette et al., 1999; Kako, 2005; Snedeker & Gleitman, 2004). The test items were the same ones for which we had previously shown the silenced video clips (the six randomly selected instances for each of the twenty-four most frequent nouns and the twenty-four most frequent verbs in our sample of maternal child-directed speech). Groups of adult participants were again asked to guess these words, but each group was provided with different, potentially informative sources of evidence. Table 3.1 illustrates these sources of evidence for the six instances of the mothers’ uttering the verb “call.” The first source of evidence was again the video-clip scenes. 
The second, a linguistic source, was the presence of the other content words in the mother’s utterances (in this case, the nouns); these were presented in alphabetical order (within sentence) to avoid cueing the syntax. The third source of evidence, again linguistic, was the set of syntactic frames in which the test items had occurred in the six test maternal utterances. To construct such frames, we simply replaced the content words and the mystery word itself (in caps) with nonsense forms, much in the spirit of Lewis Carroll’s Jabberwocky (see also Epstein, 1961). Three groups of participants were each presented with one of these three evidential sources, and the other participant groups received various combinations of these cues, sometimes including and sometimes excluding the video clips.

Table 3.1 Information sources in the HSP for the item “call”

Task | Information source provided to participants
What does GORP mean? | Scenes: Six video clips of mother-child interactions (no audio, single beep played at time of mystery word utterance).
What does GORP mean? | Nouns that occurred in the six maternal utterances (alphabetized): (1) Gramma, you; (2) Daddy, Daddy; (3) Daddy, you; (4) I, Markie; (5) Markie, phone, you; (6) Mark.
What does GORP mean? | Frames in which the six maternal utterances occurred: (1) Why don’t ver GORP telfa? (2) GORP wastorn, GORP wastorn. (3) Ver gonna GORP wastorn? (4) Mek gonna GORP litch. (5) Can ver GORP litch on the fulgar? (6) GORP litch.

Source: Adapted from Gillette et al. (1999).
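The construction of the noun and frame cues in table 3.1 can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the procedure actually used in the studies: the nonsense pairings are read off the rows of table 3.1, the utterances passed to the functions are hypothetical reconstructions consistent with those rows rather than quotations from the corpus, and the function names make_frame and noun_cue are invented here.

```python
import re

# Assumed pairings of content words with nonsense forms, read off table 3.1.
NONSENSE = {
    "you": "ver", "gramma": "telfa", "daddy": "wastorn",
    "i": "mek", "markie": "litch", "phone": "fulgar",
}

WORD = re.compile(r"[A-Za-z']+")


def make_frame(utterance, target, content_map, mystery="GORP"):
    """Build a Jabberwocky-style frame: the target word becomes the mystery
    label, listed content words become nonsense forms, and function words
    and punctuation are left untouched."""
    def swap(match):
        word = match.group(0)
        key = word.lower()
        if key == target.lower():
            return mystery
        if key in content_map:
            form = content_map[key]
            # Capitalize only when the word opens the utterance.
            return form.capitalize() if match.start() == 0 else form
        return word
    return WORD.sub(swap, utterance)


def noun_cue(utterance, content_map):
    """Return the listed content words of one utterance in alphabetical
    order, so that their order carries no syntactic information."""
    words = WORD.findall(utterance)
    return sorted((w for w in words if w.lower() in content_map), key=str.lower)


# Hypothetical utterance, chosen to match frame (5) of table 3.1.
print(make_frame("Can you call Markie on the phone?", "call", NONSENSE))
print(noun_cue("Can you call Markie on the phone?", NONSENSE))
```

Keeping the function words while masking the content words is what lets the frame condition isolate structural information from lexical information.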
How well did participants do when guessing the mystery word under these different information conditions? Figure 3.1 shows the accuracy scores for each information condition and their combinations. As seen in the figure, participants who got just the nouns did about as well as those who got just the silenced videotaped scenes (about 15 percent). Those who were provided with both sources of information were significantly more accurate; indeed, the effects of the two sources of evidence are roughly additive, yielding an accuracy score of about 30 percent correct, an instance of the cooperative use of cues (a subject to which we will return at length). Interestingly, participants who got explicit syntactic information about the verbs’ original contexts of use (the frames condition) did better than those who got only nouns or only scenes and even did better than those who got both of these latter cues. Adding the scenes to the frames improved performance to well over 50 percent accuracy, as did giving participants the nouns and the frames. And, of course, performance was best (nearly 80 percent correct) when the full range of information was provided (nouns, scenes, and frames). When it is realized that each participant group was exposed to only the six randomly chosen instances for each verb as the basis for learning, the accuracy rates with these improved databases are truly impressive. Learning how verbs map onto possible events in a scene seems to be a snap in the presence of noun knowledge and knowledge of the clause-level syntax.
Figure 3.1 Percentage of correct identification of the mystery word in the HSP as a function of the type of information supplied to the participants. (Adapted from Snedeker & Gleitman, 2004.)
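The sense in which the scene and noun cues are “roughly additive” can be checked with simple arithmetic. The sketch below uses only the approximate figures quoted in the text (about 15 percent for each single cue, about 30 percent for the pair). The comparison with an independent-cue prediction is an assumption added here for illustration and is not an analysis reported in the studies; the function names are likewise invented.

```python
def additive_prediction(*accuracies):
    """Predicted accuracy if single-cue accuracies simply add (capped at 1)."""
    return min(1.0, sum(accuracies))


def independent_prediction(*accuracies):
    """Predicted accuracy if each cue succeeds independently and any single
    success is enough (a noisy-OR combination)."""
    miss = 1.0
    for p in accuracies:
        miss *= 1.0 - p
    return 1.0 - miss


# Approximate proportions correct, as quoted in the text.
scenes, nouns, observed_both = 0.15, 0.15, 0.30

print(round(additive_prediction(scenes, nouns), 4))     # 0.3
print(round(independent_prediction(scenes, nouns), 4))  # 0.2775
print(observed_both)
```

With figures this coarse the two predictions are hard to tell apart, so the sketch shows what “additive” amounts to numerically rather than deciding between combination rules.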
Most relevant for our present purposes, these findings dovetail nicely with the findings from the earliest stages of vocabulary growth in children. For the first hundred words or so,
learning is slow and heavily favors concrete nouns that express basic-level whole-object concepts. At this stage, most infants give little evidence of competence with the syntax of the exposure language; they are mainly one-word-at-a-time speakers. At the next stage, the rate of vocabulary learning roughly triples, with all word classes represented. This stage is contemporaneous with the time at which rudiments of syntactic knowledge are evident in speech (for prior statements that these lexical and syntactic achievements are causally linked, see Bates & Goodman, 1997; L. Bloom, 1970; Gleitman & Wanner, 1982; and Lenneberg, 1967). Perhaps the most striking revelation from the HSP data concerns the trade-off in the weighting (informativeness) of various cues in predicting accuracy scores for different kinds of words.8 As mentioned earlier, Gillette et al. (1999) found through a set of correlational analyses of HSP performance that only highly concrete terms benefited in their learning from the presence of scenes, whereas the more abstract words benefited primarily from the language-internal information. For example, scenes were the overwhelmingly potent cues to identifying go and were completely uninformative for know (zero accuracy score); symmetrically, participants were almost perfect in identifying know from its syntactic environments but were helpless to do the same for go. Figure 3.2 shows this cue-trading effect in a different and more transparent way. Snedeker and Gleitman (2004), using new materials and new participants, compared accuracy scores for a subset of the test items: action verbs (relatively concrete) versus mental-content verbs (abstract verbs of perception, communication, and thinking). As the items become more abstract, language-internal cues become most informative for their identification. This outcome should come as no great surprise. 
To the extent that thinking takes place invisibly, inside nontransparent heads, the intent to express it could not be literally revealed by observing the objects and events that are in view (excepting, perhaps, the sight of certain Rodin statues). This generalization is the contrapositive to Yogi Berra’s dictum:
Figure 3.2 Informativity across verb types. Percentage of correct identification of the mystery word in the HSP as a function of verb class and the type of information supplied to the participants. Mental verbs are better identified by syntactic frame information, whereas action verbs are better identified by scene information and by noun information. (Adapted from Snedeker & Gleitman, 2004.)
2. If it’s not something you can observe, then you’re not going to learn it by watching.

Some sophisticated linguistic knowledge of the exposure language is required to support the learning of these hard words. Before concluding this section we want to point out that several machine-learning investigators have in recent years performed computer simulations that are relevant to and supportive of the claims made here and that these findings have in most cases behavioral evidence to back them, showing that humans can (to say the least!) do just as well as the Macs, PCs, and Sun workstations in this regard. One finding from that literature is that subcategorization frame sets (of the kinds exemplified in table 3.1) can be extracted from large corpora by automatic data-manipulation procedures and assigned to specific verbs (Brent, 1994; Manning, 1993; Mintz, 2003). The incredible facility of young babies in performing distributional analyses of the kinds that these simulations use is of course well attested—most notably for syllables (Saffran et al., 1996) but for other analytic levels as well (Goldsmith, 2001; Jusczyk et al., 1992; Morgan et al., 1987). Second, corpus analyses of speech to children (Lederer et al., 1995), including crosslinguistic studies (Geyer, 1994; Li, 1994), correlational studies with adults (Fisher et al., 1991), and several computer simulations with large corpora (Dang et al., 1996; Dorr & Jones, 1995; Li, 2003; Merlo & Stevenson, 2001), provide convergent evidence that syntactic subcategorization frame overlap is a powerful predictor of semantic relatedness.

How Increasing Language Knowledge Supports Vocabulary Growth: A Probabilistic Multiple-Cue Perspective
Demonstrably, then, language-internal cues help in the solution to the mapping problem. We now begin to describe why this should be so. How can mere structure cue the semantics of words? We attempt to answer this question in the remaining sections of this chapter by offering an informal model of word and phrasal learning in this regard. We consider first how the evidential sources that we identified earlier might support abstract word learning. We then offer a solution that is later explored and refined by experimental findings (in the section entitled “How Does the Child Acquire the Syntactic-Semantic Linkages?”).

Distributional Cues
Knowing some of the words in a sentence narrows the space of plausibilities for what other, unknown words in that sentence might mean (Harris, 1964; Resnick, 1996; see also Pinker, 1989). Figure 3.1 shows this effect for participants in HSP, whose accuracy given distributional information (nouns) equals the accuracy of participants shown extralinguistic context (scenes), about 15 percent correct. Thus there is some degree of recoverable inference from a word to its positional neighbors. It is easy to illustrate why this is so: Drink and eat are not only transitive verbs; they systematically select animate subjects and direct-object nouns that express potables and edibles. Young children are sensitive to this sort of information. For example, two-year-olds move their eyes to a picture of a glass of juice, rather than to a picture of a nonpotable object, upon hearing the familiar verb drink and before hearing the actual noun object (Chang & Fernald, 2003). And they successfully induce the referent of a new word introduced in an informative context as the object of a familiar verb: for example, “She’s feeding the ferret!” (Goodman et al., 1998). Adults also make very rapid online decisions about what the object of the verb must be, given the verb’s meaning. For instance, as soon as they hear “The boy ate …,” they direct their gaze to objects in the scene that can be eaten (Altmann & Kamide, 1999; Kako & Trueswell, 2000). In general, they rapidly constrain the domain of reference of upcoming constituents to multiple objects with appropriate semantic affordances that compete for referential consideration. Clearly, this cue will vary in its informativeness depending on the type of word. Thus, because one can find or see practically anything, distributional cues to words expressing these concepts will be far weaker than they are for such verbs as fold or break, which suggest soft and brittle things, respectively (Fillmore, 1970). 
Moreover, like all cues that we will be describing, the information will be probabilistic rather than determinative, as it must be if language is to express rare and even bizarre ideas. For instance, in our corpora of maternal speech, there are two instances of the sentence “Don’t eat the book.” Finally, whether distributionally related items are likely to be contiguous will also vary to some degree as a function of language type (see Mintz, 2003).

Syntactic Information
The results of the HSP suggest that syntactic information—in this case, subcategorization frame information—is a powerful inferential cue to verb meaning. Again, it is easy to see the basis for why. Verbs vary in their syntactic privileges (that is, the number, type, and positioning of their associated phrases). These variations are systematically related to the
verbs’ meanings (e.g., Chomsky, 1981; Dang et al., 1996; Fisher, 1996; Fisher et al., 1991; Gleitman, 1990; Goldberg, 1995; Grimshaw, 1990; Jackendoff, 1990; Joshi & Srinivas, 1994; Levin & Rappaport Hovav, 1995; Pinker, 1989; Tanenhaus & Carlson, 1988). A verb that describes the motion of an object will usually occur with a noun phrase that specifies that object; a verb that describes action on an object will typically accept two noun phrases (i.e., be transitive), one for the actor and one for the object; a verb that describes transfer of an object from one place to another will take three arguments, one each for the moving thing and for its source (start point) and goal (end point). Similarly sensible patterns appear for argument type: a verb such as see can take a noun phrase as its complement because we can see objects, but it can also take a sentence as its complement because we can perceive states of affairs (for discussion in the context of blind children’s competence with such perceptual terms in the absence of experience, see Landau & Gleitman, 1985). These syntactic-semantic correspondence patterns show strong regularities across languages (Baker, 2001a, 2001b; Croft, 1990; Dowty, 1991). These crosslinguistic regularities have long been taken to be primary data for linguistic theories to explain, leading to significant linguistic generalizations such as the theta criterion and the projection principle (Chomsky, 1981), which jointly state that the noun phrases in sentences must be licensed by the right kind of predicate (one that can assign them a thematic, or theta, role) and that clause structure must be projected from lexical entries. Similarly, unlearned constraints linking thematic roles such as agent and theme to the grammatical functions subject and object have been proposed to explain striking crosslinguistic regularities in the assignments of semantic roles to sentence positions. 
Causal agents, for example, overwhelmingly appear as grammatical subjects across languages (Baker, 2001a; Keenan, 1976). Given these systematic links between syntax and meaning, the adults in Gillette et al.’s (1999) studies were able to consult each verb’s sentence structure—implicitly, of course—to glean information about the semantic structure of the verb in that sentence. The observed sentence structure, by specifying how many and what types of arguments are being selected by the verb, provides a kind of “linguistic zoom lens” to help the learner detect what is currently being expressed about an ongoing event or a state or relation (Fisher et al., 1994). Recent evidence documents that young children, like adults, use these verb-syntactic correspondences in real time to parse sentences and resolve ambiguity (e.g., Trueswell et al., 1999).9 All the same, it has sometimes been questioned how significantly distributional and syntactic information can really aid in verb learning (e.g., Pinker, 1994; Woodward & Markman, 2003). The reasons for skepticism have to be answered if the model we propose is translatable as a true picture of the processes underlying vocabulary growth and early phrase-structural learning in infants. We briefly mention some of these problems here to clarify how the model approaches their solution. Thereafter we turn to behavioral findings that support the model and further specify the account.

A paucity of structural distinctions and the zoom lens hypothesis. There are only a few scores of
basic phrase-structure types in a language, yet there are thousands of verbs and verb
meanings. Then how much of a constraint can the structure provide? For any particular occurrence of a novel verb in a frame, only some exceedingly coarse narrowing of the semantic range of that verb is possible. But what this syntactic information lacks in specificity, it makes up for in establishing saliency. When paired with a scene, the structural properties of an utterance focus the listener on only certain aspects of the many interpretations that are always available to describe a scene in view. Consider in this regard a study from Naigles (1990). She showed infants (average age 25 mo) a videotaped scene in which there were two salient happenings: A duck and a rabbit were each wheeling one of their arms in a wide circle as the duck was pushing the rabbit down into a squatting position. Some infants heard a voice saying, “Look! The duck and the rabbit are gorping.” The others heard, “Look! The duck is gorping the rabbit!” The two introducing sentences differ in that one exemplifies a one-argument intransitive construction and the other exemplifies a two-argument transitive construction. According to the principles relating structure and meaning, only the two-argument construction can support a causal interpretation. But can babies of this age—most of whom have never uttered a verb or even a two-word construction of any kind in their short lives—make this same inference? The answer is yes. When the two scenes were later disambiguated on two separate video screens (one showing the pushing without arm wheeling, the other showing arm wheeling without pushing), babies’ dominant gaze direction was shown to be a function of the syntactic introducing circumstances. Notice that the syntactic information that was provided could not, and therefore did not, directly cue the meaning, say, arm wheeling. There is no arm-wheeling structure in English or any language. 
At best, the syntactic information could only, and therefore did only, signal the distinction between a self-caused act (intransitive) and an other-caused act (transitive). If these babies learned that the statement “The rabbit and the duck are gorping” meant ‘They are wheeling their arms’ (something that the manipulation cannot reveal), then that conjecture is based on two cues converging on the same solution: the information in the scene and the collateral, argument-specifying information in the syntax. In sum, a single syntactic cue can play only a focusing role: it causes the listener to zoom in on one salient aspect of an ambiguous scene. Because we believe, along with Chomsky and Quine and Goodman, that every scene is multiply ambiguous, even saliently so, this zoom lens function is crucial in solving the mapping problem for verbs.
Refined semantic information from the matrix of subcategorization frames
In the Naigles study just described, the learning situation was modeled for circumstances in which the learner was provided with only a single scene by syntax pair. But notice that in the HSP nonsense-frame manipulation (table 3.1), participants were provided with half a dozen exemplars of the structures licensed for the mystery word, as spoken by the mother. The semantically powerful role that these multiple frames play (both in learning and in sentence processing throughout life) derives from the fact that they successively narrow the semantic range of single verbs. Each frame that a verb naturally accepts provides an indication of one of its allowed argument structures; the set of frames associated with single verbs provides convergent evidence as to their full expressive range (Fisher & Gleitman, 2002; Gleitman & Fisher, in press; Landau & Gleitman, 1985; Levin, 1993). Very few verbs share all their syntactic privileges; for many verbs, their licensed frame set may be close to unique. Much more interestingly, overlap in verbs’ syntactic range is a powerful measure of their semantic relatedness, as has been shown in correlational studies with adults (Fisher et al., 1991) and in analyses of the input speech to young children (Lederer et al., 1995). Moreover, as we move from language to language, we see that the same frame sets are associated with the same syntactic generalizations over a substantial range (e.g., Baker, 2001b; Geyer, 1994). To give an intuitive feel for the power of syntactic overlap for semantic interpretations, we can do no better than to quote Zwicky (1971), who makes this point forcefully:
If you invent a verb, say greem, which refers to an intended act of communication by speech and describes the physical characteristics of the act (say a loud hoarse quality) then you know that … it will be possible to greem (i.e., speak loudly and hoarsely), to greem for someone to get you a glass of water, to greem to your sister about the price of doughnuts, to greem “Ecch” at your enemies, to have their greem frighten the baby, to greem to me that my examples are absurd, and to give a greem when you see the explanation. (p. 232)
Notice, then, that while there are only scores of individual subcategorization frames, there are hundreds of thousands of distinct combinations in which these can occur, vastly more than the number of verbs in any language. In other words, the verb by frame matrix is sparsely populated, with the effect that the convergence of frames can and often does yield a rather precise picture of the expressive range of any verb. In figure 3.3 we show some of this convergency for a set of verbs in frames (theoretically including greem).
Figure 3.3 A partial subcategorization matrix illustrated for eight verbs.
Verbs that can describe self-caused acts such as come, go, think, and know license one-argument intransitive constructions when they do so; verbs that can describe transfer events such as give, get, argue, and explain appear in three-argument ditransitive constructions when they do so; verbs such as think, know, argue, and explain, which can describe the mental relation between an entity and an event, appear in tensed sentence-complement constructions (S-complement) when they do so. Taken together, then, give describes the transfer of physical objects (three arguments but not sentence complements), whereas explain describes the transfer of mental objects, such as ideas (that is, communication), and so licenses both three-argument and S-complement structures (John explains the facts to Mary; John explains [to Mary] that dinosaurs are extinct). In contrast, the intransitive possibilities of come, go, think, and know reflect their independence of outside agencies; that is, thinking is “one-head cognition.” Notice also that unlicensed frames, if uttered, add their semantics to known verbs. For example, “John thinks the ball to Mary” would be interpretable as an instance of psychokinesis.
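The convergence just described can be sketched in a few lines of Python. The sketch encodes only the eight verbs and three frame types named in the discussion of figure 3.3; the frame labels, the `FRAMES` dictionary, and the use of Jaccard overlap as a similarity measure are illustrative assumptions, not the analysis used in the studies cited.

```python
# Toy verb-by-frame matrix for the eight verbs discussed for figure 3.3.
# Frame labels and the Jaccard measure are illustrative assumptions.
FRAMES = {
    "come":    {"intransitive"},
    "go":      {"intransitive"},
    "think":   {"intransitive", "s_complement"},
    "know":    {"intransitive", "s_complement"},
    "give":    {"ditransitive"},
    "get":     {"ditransitive"},
    "argue":   {"ditransitive", "s_complement"},
    "explain": {"ditransitive", "s_complement"},
}

def frame_overlap(verb_a, verb_b):
    """Jaccard overlap of two verbs' licensed frame sets (0.0 to 1.0)."""
    a, b = FRAMES[verb_a], FRAMES[verb_b]
    return len(a & b) / len(a | b)

def verbs_compatible_with(observed_frames):
    """Verbs whose licensed frames include every frame observed so far."""
    return sorted(
        verb for verb, licensed in FRAMES.items() if observed_frames <= licensed
    )

# A single observed frame narrows the candidates only coarsely ...
print(verbs_compatible_with({"ditransitive"}))
# ... while a second observed frame narrows them further.
print(verbs_compatible_with({"ditransitive", "s_complement"}))
# Verbs with identical frame sets overlap fully; give and explain only partly.
print(frame_overlap("think", "know"), frame_overlap("give", "explain"))
```

With only three frames the narrowing is modest; the point of the sketch is that each additional frame can only shrink, never enlarge, the set of compatible verbs.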
A Potential Practical Limitation
Though we cannot discuss this important issue at length in the present chapter, we do want to point out that the mapping between argument structure and surface sentences is complex and indirect, which makes the rosy picture just painted much harder to realize than it seems at first glance (this mapping is the meat and potatoes of generative grammar). Thus even though give is (or is claimed to be) a three-argument predicate, it often shows up in real utterances with only two noun phrases, as in “Give me your hand.” This argument-number–noun-phrase-number mismatch often looks materially worse for languages that allow more extensive argument dropping at the surface than English does (e.g., Korean or Mandarin Chinese). Verbs in such languages often appear with fewer than their (supposedly) required arguments represented as overt noun phrases (Rispoli, 1989). However, this variability is systematic; that is, the generalizations implicit in the data set have to do with such properties as the maximum number of noun phrases that a verb ever allows and large differences in how frequently a verb occurs in different frames. Even in English we may say, “John gives a book to Mary” or “John gives at the office” or “Something’s gotta give.” But the difference between give and, say, snore or sleep is that these latter verbs, unlike give, are vanishingly rare in ditransitive (three-argument) structures and predictably metaphorical when they so occur (“The reader may be sleeping his way through this chapter”). In languages with a significant proportion of argument-dropped utterances, this relationship remains systematic: as a statistical matter, the verb meaning ‘give’ continues to occur with a larger number of overt noun phrases than does the verb meaning ‘hit’ and, mutatis mutandis, for ‘snore.’ (For a discussion of the complexity of noun-phrase-to-argument relations in languages and their relation to the syntactic bootstrapping hypothesis, see Lidz et al., 2003b; Lidz & Gleitman, 2004.)
How Does the Child Acquire the Syntactic-Semantic Linkages?
We have just reviewed the ideas behind the syntactic bootstrapping hypothesis for lexical learning. However, we have so far left aside fundamental questions about how the infant could ever make contact with, and come to exploit, a system of language-to-thought mappings of the kinds just posited. We turn now to two such questions.
The Unlearned Character of Predicate-Argument Structure
To an interesting extent, prelinguistic infants naturally factor their representations of events into conceptual predicates and arguments. A particularly striking piece of evidence comes from habituation and eye-movement studies (Gordon, 2003) in which infants were shown videos depicting giving or hugging. In the former, two people approach each other; one hands a large stuffed bear to the other; and then they part. In the latter, two people approach each other, embrace, and then part. The clever part of this manipulation is that in both scenes one of the two participants is holding a large floppy stuffed toy. The only difference between the two depicted events in this regard is that only in the give scene is this toy transferred from one participant’s grasp to the other’s before the two separate. Once babies were habituated to such scenes, they viewed new scenes that were identical to the originals except that the toy was now absent. The habituation results and, most interesting, the infants’ eye movements demonstrated that the infants visually searched for the missing toy in these new give (or givelike) scenes but not in the new hug scenes. For the new give scenes, they gazed at the area of the two people’s hands, as if searching for the missing toy. In contrast, they did not seem to notice much difference (they did not dishabituate) when a yellow flopping bear was suddenly no longer in view in new scenes of hugging. They did not even look toward the hand of the person who previously held it, nor did they give other measurable signs that they were thinking, “Whatever happened to that yellow bear?” Apparently, the babies’ implicit supposition was that, even though stuffed bears are of great interest in everyday life, hugging events are not relevantly changed as a function of whether one of the huggers is holding a bear during the performance of this act. But an act of giving is demolished if the potential present does not change hands. 
How Arguments Map onto Syntactic Representations
We have just sketched evidence that infants factor their experience into things and their doings: for babies, just as for adults, the act of, say, a kangaroo jumping comes apart naturally into the kangaroo and its jumping. The most central question for our proposal remains to be discussed: What suppositions (if any) does the learner make regarding how these natural parts are to be mapped onto linguistic representations? In this regard we now discuss three principled ways that languages (and the babies who come to learn them) realize predicate-argument structure in the form of the linguistic clause, thus rendering the learning of hard words easy (or at least easier). These mappings relate to three essential variables in argument-structure representation in language and thought: argument number, argument position, and argument type.10 As we sketch these mappings and the evidence for how their discovery supports acquisition of the vocabulary, we will be emphasizing the role of multiple probabilistic cues. This is critical because each of these cue types is unreliable when taken by itself. Efficient learning of the abstract vocabulary is made possible by the fact that these cues trade (one does service when the next is unavailable or uninformative) and conspire (several cues converge on the same conjecture).
Argument number and noun-phrase number
We have already looked at some evidence (Naigles, 1990) that young language learners expect a simple mapping between predicate-argument structure and the design of the clause, namely,
3. Each thematic (argument) role receives a slot in the clause.
This is an informal statement of the linguistic principle known as the theta criterion (Chomsky, 1981). A child acquiring a language might expect this principle to be realized in the number of noun phrases that appear in construction with the verb.
4. Every argument is realized as a noun phrase in sentences as uttered.11
Perhaps the most revealing evidence in favor of 4 being quite literally an expectation of learners as to how languages are designed comes from observation of isolated deaf children and the languages they devise without a formal linguistic model. Most congenitally deaf children are born to hearing parents who do not sign; therefore, such children may not come into contact with gestured languages for years. Their deafness also makes it impossible for them to acquire the language spoken in the home. Children in these circumstances spontaneously invent gesture systems called home sign. Remarkably, although these children are isolated from exposure to any conventional language, their home sign systems partition their experience into the same basic elements that characterize known human languages. These communicative systems have nouns and verbs, distinguishable from each other by, among other indicants, their distinctive iconic properties. For instance, an outstretched hand, palm up, denotes ‘give.’ In an early study of six children in these circumstances, Feldman et al. (1978; for a definitive statement and evidence, see Goldin-Meadow, 2003a) were able to show that the number of noun phrases in these children’s gestured sentences was a function of the verb’s argument structure, with the number of signed arguments systematically related to the argument structure of each verb, in accordance with 3 and 4.
Intensive study of the syntactic character of these self-devised systems shows that the same principles arise again and again in different cultural environments and in contexts where the surrounding linguistic community is speaking languages as different as Chinese and English (see Goldin-Meadow, 2003b, for a full theoretical and empirical treatment of self-invented sign systems and their
crucial status for understanding language learning). Adding materially to this picture are studies of the elaboration and formalization of such systems when home signers form a stable community that maintains its social integrity across time, as in the recent emergence of Nicaraguan Sign Language (Senghas et al., 1997). Thus, the same fundamental relationships between verb meaning and clause structures surface in the speech of children who are acquiring a conventional language and in the gestures of linguistically isolated children who must invent one for themselves. Another way to study children’s respect for the alignment between argument number and event participants is by testing how they semantically extend known verbs when these are heard uttered in novel syntactic contexts. Naigles, Gleitman, and Gleitman (1992) and Naigles, Fowler, and Helm (1992) asked two- to five-year-olds to act out sentences using a toy Noah’s ark and its associated characters. The informative trials were those in which a familiar verb was presented in a novel syntactic environment, as in “Noah brings to the ark” or “Noah goes the elephant to the ark.” The children adjusted their interpretation of the verb to fit its new syntactic frame: for example, acting out go as ‘cause to go’ (or ‘bring’) when it occurred ditransitively and bring as ‘go’ when it occurred intransitively. The important generalization here is that semantic extensions of these verbs in novel linguistic contexts are precisely what is expected if the child implicitly honors 3 and 4. A further informative manipulation is from Fisher (1996). She showed two-and-a-half-, three-, and five-year-olds unfamiliar motion events, describing them with nonsense verbs. The verbs were presented either transitively or intransitively. 
The sentences contained only pronouns that did not distinguish between the all-female characters depicted—for example, “She’s pilking her over there” or “She’s pilking over there.” Thus the sentences differed only in their number of noun phrases. Children’s interpretations of the novel verbs were tested by asking them to point out in a still picture of the previously demonstrated event which character’s role the verb described—“Who’s pilking over there?” or “Who’s pilking her over there?” Adults and children at all three ages were more likely to select the causal agent in the event as the subject of a transitive verb than as the subject of an intransitive verb. Just as for the adult judges in the Gillette et al. (1999) studies, these findings provide evidence that the number of noun phrases in the sentence—here without information from noun labels regarding the grammatical subject—influences even two-year-olds’ interpretations of verbs. Compare these results with the innovations of the home signers who invented their own manual communication systems. In both cases, children seem to be biased to map participants in a conceptual representation of an event one-to-one onto noun arguments in sentences. One further crucial question should be raised concerning the status of these principles for learners. Are they acquired by induction from the statistical preponderances manifest in a particular language, as proposed by Tomasello (2000) and Goldberg (2004); by the product of unlearned expectations, as the home signer data seem to suggest; or both? To find out, it is useful to look at a language for which the alignment of noun phrases with arguments is partially masked, indeed heavily supplanted by alternative coding machinery. Lidz and
colleagues (Lidz & Gleitman, 2004; Lidz et al., 2003b) performed such a test by replicating the Noah’s ark studies in Kannada, a language spoken by some millions of individuals in southwestern India. Kannada differs markedly from English in two relevant ways. First, Kannada licenses much more extensive argument dropping than does English, thus weakening the relationship between argument number and surface noun-phrase number in input speech. Second, it only rarely employs lexical causatives, as in the English sink, burn, and open. Transparently enough with regard to principle 3, for many such items, English simply adds a noun phrase, rendering “The door opens” as its causative “John opens the door.”12 In contrast, Kannada systematically requires, in addition to adding the noun phrase expressing the new role, adding a causative suffix to verbs when and only when causativity is intended. For example, the following (Lidz et al., 2003b) is a Kannada intransitive noncausal usage meaning ‘The horse rises’:
5. Kudure eer-utt-ade
   horse rise-npst-3sn
To express the causative proposition “The alligator raises the horse” in Kannada, one cannot follow the English model and simply add a noun phrase for the causal agent (in this case, an alligator). That is, the following hypothetical form is ungrammatical:
6. moSale kudure-yannu eer-utt-ade
   alligator horse-acc rise-npst-3sn
Rather the causative suffix (-is) is also required:
7. moSale kudure-yannu eer-is-utt-ade
   alligator horse-acc rise-caus-npst-3sn
One sees this morphological machinery in English (e.g., lionize or elevate), though sporadically and unsystematically. In short, the two languages differ in their dominant means for expressing the causative version of a predicate.
Surprisingly enough, young Kannada-speaking children who were tested in a Kannada-flavored version of the Noah’s ark paradigm assigned causative interpretation to anomalous structures in Kannada as a function of noun-phrase number only (as if they were speakers of English), ignoring the statistically more reliable cue of the presence or absence of the causative morpheme. In contrast, adult Kannada speakers were sensitive to both noun-phrase number and the appearance or nonappearance of this morpheme. Lidz et al. (2003b) drew two related conclusions. First, the young children’s behavior reflects the influence of a strong unlearned bias toward the one-to-one alignment principle 3, a bias implicated in early verb learning. Second, the language-particular morphological means of Kannada became linguistic second nature to its expert speakers and so, along with the universal principles, came to play a role in their productive form-to-meaning generalizations.
Syntactic configuration and argument position
Child learners are not limited to the noun-phrase-to-argument-number principle as language-internal evidence for the meaning of verbs. The position of each noun phrase can be highly informative, too, especially in languages that, like English, are quite strictly phrase ordered:
8. The structural positions of noun phrases in the clause are related to their thematic role assignments.
We see this principle at work in the spontaneous gesturing of the home signers. In the children’s signing, the nouns occurring with each verb do not occur haphazardly to either side of the verb; rather, the children adopt systematic gesture orders, such as routinely signing undergoers immediately before verbs, transitive agents following verbs, and intransitive actors before verbs. Thus, a home signer who produced “snack–eat–Susan” might also produce “Susan–move over” and “cheese–eat” (Goldin-Meadow, 2003a). Apparently, just as no child has to learn to factor experience into predicates and arguments, no child has to learn from some imposed external model to use word order systematically to specify the semantic role played by each element. However, the ability to exploit surface positional cues to thematic role assignment varies in informativeness and in the specifics of realization across languages. Such cues are most useful, obviously, for languages that are systematically phrase ordered (such as English), less useful in the case of scrambling languages, and perhaps useless in nonconfigurational languages. Even for the many languages in which cues from serial order map most transparently onto hierarchical structure, the specifics have to be acquired. Some languages are canonically ordered subject-verb-object, but several other orders occur, including in languages in which objects canonically precede subjects. Whatever the ordering, the child must discover it to make the kinds of semantic-syntactic inferences we are now discussing.
Notice that the simulated absence of this knowledge shown in the first three bars of figure 3.1 (i.e., nouns, scenes, and scenes + nouns) limits the efficiency of the learning procedure for hard words (15, 17, and 30 percent accuracy scores, respectively). The knowledge enabling the use of syntactic cues to recover the structure of input utterances (in the subsequent four bars) sharply increases that efficiency. In essence, the big issue for the learner in reaching this point is to discover the structural position of the grammatical subject. Again, unlearned as well as learned factors contribute to this critical step. There is a universal crosslinguistic bias for agent and source semantic roles to capture the grammatical subject position, especially with motion verbs (Baker, 2001a; Dowty, 1991; Keenan, 1976). If this is the learner’s expectation—and if he or she understands the nouns boy and ball—then hearing such sentences as “The boy hit the ball” in the presence of scenes showing a boy hitting a ball will set the subject position for English (as will “The ball hit the boy” and its standard extralinguistic contingencies; and so forth). Here, we consider studies that demonstrate the special potency of argument position, once established, for disentangling perspective verb pairs, a particularly interesting class of items for understanding the vocabulary learning machinery. Perspective verb pairs include, among many others, buy/sell, chase/flee, and give/get (see figure 3.4 for sample depictions). As these cases illustrate, such pairs describe highly overlapping, if not identical, actions and states of affairs in the observed world. Consider, for example, chase and flee. Both of these predicate
the same event. One implies the other. Whenever the hounds are chasing the fox, the fox is fleeing the hounds. If some brave fox turns and makes a stand against its tormentors, it is no longer running away; and in virtue of that, the hounds are no longer chasing it. But if the contingencies for the use of both members of such a pair are the same, then the two cannot be distinguished by Locke’s method (1), which requires that such contingencies differ.
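The indeterminacy argued for here can be illustrated with a toy word-to-world learner. The Python below is a minimal sketch under stated assumptions: the function name, the scene contents, and the nonsense verb glorp as used here are hypothetical, and intersecting the meanings available across scenes is only one simple way to model learning from observed contingencies.

```python
# Toy word-to-world learner: keep the candidate meanings that are available
# in every scene in which the word is heard. Scene contents are hypothetical.

def cross_situational_candidates(scenes):
    """Return the meanings present in every scene paired with the word."""
    candidates = set(scenes[0])
    for scene in scenes[1:]:
        candidates &= set(scene)
    return candidates

# Every chasing scene is also a fleeing scene, so the two meanings
# always enter and leave the candidate set together.
scenes_for_glorp = [
    {"chase", "flee", "run", "bark"},
    {"chase", "flee", "run", "hide"},
    {"chase", "flee", "jump"},
]
print(sorted(cross_situational_candidates(scenes_for_glorp)))
```

However many scenes are added, observation alone leaves both members of the pair standing, which is why some further source of evidence is needed to tell them apart.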
Figure 3.4 Cartoon illustrations of two perspective verb pair events. Can syntax override the saliences of the scene? (From Fisher et al., 1994.)
Notice that the fox-hunting scenario is not invisible or imperceptible; chase and flee are not abstract in the senses we discussed earlier. What is different for the two verbs is the speaker’s perspective on the (same) event: whether the utterance is about the fox or about the hounds. This feature of the predication is invisible. The perspective exists only in the eye of the beholder—in this case, the speaker. To be sure, the two verb meanings encode the two alternative perspective choices, but the question at hand is how the learner discovers which is which if the scene itself is one and the same. Now the syntactic positioning can fill the informational gap. The position of the mentioned participants in the sentence sets their roles as grammatical subject and complement, thereby fixing the meaning of the verb—for example, the rabbit and elephant in the chase/flee scene in figure 3.4. Children as young as three years (and probably younger) make these syntactic inferences (Fisher et al., 1994). When watching live-action puppet-show events like those illustrated in figure 3.4, there is a natural tendency to take the source participant, rather than the goal participant, as the salient perspective (for discussion of these source-goal asymmetries, see
Lakusta & Landau, in press) and, equivalently, to select the perceived causal agent as the sentence subject. For instance, when a scene like that depicted in the second cartoon in figure 3.4 is described without syntactic cues (“Oh look, glorping!”), children and adults show a strong preference to think the novel verb means something like ‘chase’ rather than ‘flee.’ That is, the causal structure of the event preferentially flows from rabbit instigation to elephant reaction. This preference is substantially enhanced when source syntax is provided (when the scene is linguistically described as “The rabbit is glorping the elephant”). But the preference is reversed to a goal preference when goal syntax is provided (“The elephant is glorping the rabbit”); in that case, children and adults think that glorp means ‘run away.’ Figure 3.5 shows Fisher et al.’s (1994) effects quantitatively (collapsed across several perspective verb pairs including chase/flee and give/get, as in figure 3.4). As figure 3.5 shows, the salience of the source perspective is something that the syntactic configurational evidence must battle against, and it does so fairly successfully but not perfectly (especially for the child participants).13 This pattern would be expected if the structural configuration chosen by a speaker were indeed used to reflect his or her attentional stance, or perspective. Research on discourse coherence strongly suggests that subject position plays such a role in the transition from one utterance to the next (e.g., Gordon et al., 1993; Walker et al., 1997). Subject position is often used to denote the current discourse center. It often marks transitions from one center to another. This is why Fisher et al. described their effect as a “syntactic zoom lens,” in which the configuration of the utterance helps the child take the perspective necessary to achieve successful communication and to infer the meaning of unknown elements in an utterance.
Figure 3.5 Proportion of source and goal subject interpretations by 3- and 4-yr-olds and adults as a function of introducing syntactic context. (Adapted from Fisher et al., 1994.)
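The competition between scene salience and syntactic position summarized in figure 3.5 can be pictured as two cues adding up in log-odds. The Python below is only a sketch of that idea: the logistic form, the function name, and both weights are assumptions chosen for illustration and are not estimates from Fisher et al. (1994).

```python
import math

# Hypothetical log-odds weights, NOT fitted to Fisher et al. (1994).
SOURCE_SALIENCE = 1.0  # baseline pull toward the source ('chase') reading
SYNTAX_WEIGHT = 2.0    # pull contributed by which participant is the subject

def p_source_reading(subject=None):
    """Probability of taking a novel verb in its source-perspective sense.

    subject is "source", "goal", or None when no syntactic cue is given.
    """
    log_odds = SOURCE_SALIENCE
    if subject == "source":
        log_odds += SYNTAX_WEIGHT  # syntax and salience conspire
    elif subject == "goal":
        log_odds -= SYNTAX_WEIGHT  # syntax must overcome salience
    return 1.0 / (1.0 + math.exp(-log_odds))

for cue in (None, "source", "goal"):
    print(cue, round(p_source_reading(cue), 2))
```

Because the assumed syntactic weight exceeds the assumed salience bias, goal syntax reverses the preference without driving it to zero, which mirrors the qualitative pattern described in the text.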
As just discussed, because the usefulness of the argument-structure cue is heavily dependent on, and interacts with, real-world factors, we now ask whether there are systematic means of communicating or displaying attentional state (i.e., gaze, gesture, posture) such that these may play an informative role in word learning. After all, it is no more reasonable to suppose that lexical learning ignores extralinguistic context than to suppose that it is inattentive to the syntactic context. The work of Baldwin (1991) suggests that the child’s following of the maternal gaze as a clue to attentional state heavily influences the mapping procedure in the case of nouns. But what about verbs? Can the attentional state of a speaker serve as a cue to verb learning? If so, do young children track these attentional states of speakers to recover their referential intentions? We have begun to explore this question in a series of studies using perspective verb pairs (Nappa et al., 2004). For the language learner to be able to use attentional state information as a predictive cue to a speaker’s linguistic perspective, a reliable relationship would have to be established between attention-direction patterns and the ensuing linguistic choices. Thus, the first step in this line of research was to establish whether the attentional state of an adult speaker in fact contributes to choice of sentential subject (the one that the description is about) and hence verb choice. Prior work had suggested that this might be the case (e.g., Forrest, 1996; Tomlin, 1997), but manipulations in these previous studies were often overt, leaving open the possibility that these were laboratory effects (speakers just trying to please the experimenter) and might not characterize normal communicative interactions. We therefore studied this issue again, using subliminal attention-capture manipulations, and we found that we can indeed influence speaker word order and verb choices for the perspective verb pairs.
In particular, participants were asked to describe pictures designed to elicit perspective verb description (e.g., figure 3.6A). From their descriptions, we coded their choice of subject and their choice of verb (e.g., “The dog is chasing …” vs. “The man is running away …”). Crucially, we captured a speaker’s attention on a particular character by briefly flashing a square on the computer screen just before the onset of the picture: this square was aligned with the upcoming position of one character or the other; it typically caused eye movements to that character; and it was rarely if ever noticed by the speaker (i.e., a subliminal attention capture). Capturing attention on the chaser in figure 3.6A generated chase utterances, whereas capturing attention on the fleer generated increased run away / flee utterances (figure 3.6B). So, how the speaker attentionally approaches an event such as this does seem to affect its description and verb choice.
Figure 3.6 Attention, structure, and verb use. An attention-capture technique used in A influenced the choice of structure and verb use (B). The attentional state of a speaker (C) had a similar effect (D). Preferred subject is defined as the subject that people typically conjecture in unprimed situations. (Adapted from Nappa et al., 2004.)
What about the listener? Can cues to the attentional state of a speaker help in the listener’s inference of a verb meaning? Preliminary evidence suggests that this is possible, at least for adults. We modified our task to include a character describing the scene (see figure 3.6C), and we asked our participants to “Guess what John is saying.” Note that this is quite similar to the task in the HSP: we were asking participants to guess a verb, this time in the absence of syntactic cues. As can be seen in figure 3.6D, verb choice was influenced toward the direction where John was looking: looking at the fleer increased the use of run away / flee. We are now in the process of assessing whether children can make a similar inference under verb-learning situations. The question is whether, as Baldwin’s (1991) studies suggest, the child following his or her mother’s gaze in chase-flee scenes will show a bias shift, as the mother does and as our adult participants do in the laboratory. We acknowledge that it is early days to make strong claims in this regard. So we turn next to another kind of hard word, one for which we can provide stronger empirical evidence for the child’s convergent use of structural and situational evidence. Argument type and the acquisition of mental-content verbs Mental
verbs are hard too. Even though children produce verbs describing actions or physical motion early, often before their second birthday (L. Bloom et al., 1975) and appear to understand them well (Gentner, 1978; Huttenlocher et al., 1983), they do not use mental verbs as such until they are about two-anda-half years old (Bretherton & Beeghly, 1982; Shatz et al., 1983), and they do not fully distinguish them from one another in comprehension until around age four (Johnson & Maratsos, 1977; Moore et al., 1989). For the mental verbs, a new type of structural
information becomes important in the inference to meaning:

9. The lexical and phrasal composition of arguments is related to the meanings of their predicates.

And in particular:

10. Sentence complementation implies a thematic relation between an animate entity and a proposition (semantically, an event or state of affairs).

This relation holds in every known language of the world. The class of items licensing sentence complements includes verbs of communication (e.g., say, tell, announce, and Zwicky’s nonce verb greem), perception (see, hear, perceive), and mental acts or states (believe, think, know) (see figure 3.3). To the extent that children can identify it, this syntactic behavior is a useful and principled cue to a novel verb’s meaning. Argument type, in conjunction with argument number and position, provides a source of information that systematically cross-classifies the set of verbs within and across languages along lines of broad semantic similarity (Fisher et al., 1991; Geyer, 1994; Gleitman & Fisher, in press; Landau & Gleitman, 1985; Lidz et al., 2003a). As mentioned earlier, adults are sensitive to these regularities in syntax-to-semantics mappings. Furthermore, adults weigh these aspects of language design differently for different verb classes, as the HSP showed (figure 3.3). This finding makes sense once one considers the syntactic privileges of the two verb classes. Action verbs are more likely to appear in transitive or simple intransitive frames, which are themselves associated with a broad range of verb meanings. Consequently, these frames provide little constraint on the kind of verb that can appear in them. By contrast, mental verbs often take clausal complements, which are more restrictive and hence more informative about the kind of verb that can appear with them (for demonstrations of effects of differential frame informativeness, see Goldberg, 1995; Kako, 1998; Lederer et al., 1995).
The HSP studies also showed that, in the case of action verbs, scene information had some measurable efficacy in securing verb identification; however, the same scenes cue was highly indeterminate for the identification of mental predicates (think was hard to acquire by inspecting scenes containing thinkers). Can children take advantage of argument type in inferring the meanings of new verbs? And how do they coordinate such structural constraints with event representations delivered by observation of the world? Papafragou et al. (2007) set out to investigate these questions by focusing on the vexing class of mental-content predicates, particularly credal verbs, such as think and believe. The idea was to compare the contributions of syntactic cues (e.g., sentential complementation) versus potentially helpful observational cues (e.g., the presence of a salient mental state, such as a false belief held by an event participant) in the identification of credal verbs. In this study, adults and four-year-old children watched a series of videotaped stories with a prerecorded narrative. At the end of each clip, one of the story characters described what happened in the scene with a sentence in which the verb was replaced by a nonsense word.
The participants’ task, much as in the HSP, was to identify the meaning of the mystery word. The stories fully crossed type of situation (true versus false belief) with syntactic frame (transitive frame with direct object versus clausal that-complement). For instance, in one of the false-belief stories inspired by the adventures of Little Red Riding Hood, a boy named Matt brings food to his grandmother (who is actually a big bad cat in disguise). In the true-belief variant of the story, the big cat accompanies Matt as he brings food to his real grandmother. At the end of each story, the cat offers one of these two statements:

11. “Did you see that? Matt gorps that his grandmother is under the covers!” (complement clause condition)

12. “Did you see that? Matt gorps a basket of food!” (transitive condition)

The researchers hypothesized that false-belief situations would increase the salience of belief states and would make such states more probable topics for conversation, thereby promoting mentalistic conjectures for the novel verb. They also hypothesized that sentential complements would prompt mentalistic interpretations for the target verb. They predicted that when both types of cues cooperated (i.e., in the false-belief scenes with a sentential complement), the situations would be particularly supportive of mentalistic guesses. Finally, they expected syntactic cues to overwhelm observational biases when the two conflicted (e.g., in false-belief scenes with a transitive frame). These predictions were borne out. The data showed that scene type had a major effect on the verb guesses produced by both children and adults. Specifically, false-belief scenes increased the percentage of belief verbs guessed by the experimental participants, when compared to true-belief scenes (from 7 percent to 27 percent in children’s responses and from 24 percent to 46 percent in adults’ responses).
The effects of syntax were even more striking: transitive frames almost never occurred with belief verbs, whereas complement clauses strongly prompted belief verbs (27 percent and 66 percent of all responses in children and adults, respectively). When both types of supportive cues were present (i.e., in false-belief scenes with complement clause syntax), a substantial proportion (41 percent) of children’s responses and an overwhelming majority (85 percent) of adults’ responses were belief verbs. Similar effects were obtained in a further experiment with adults, which assessed pure effects of syntactic environment (minus supporting content words) in the identification of mental verbs. In that study, true- and false-belief scenes were paired with transitive or complement clause structures from which all content words had been removed and replaced with nonsense words (e.g., “He glorps the bleep” versus “He glorps that the bleep glexes”). Again, syntax proved a more reliable cue than even the most suggestive extralinguistic contexts; furthermore, the combination of clausal and false-belief scene information resulted in an overwhelming proportion of mental-verb guesses. Taken together, these experiments demonstrate that the syntactic type of a verb’s argument (e.g., whether the object of a transitive verb is a noun phrase or a tensed sentence complement) helps word learners narrow their hypotheses about the possible meaning of the verb (see note 14). Furthermore, this type of syntactic cue interacts overadditively with cues from the
extralinguistic environment (e.g., the salience of a mental state). We interpret these findings to support the presence of a learning procedure with three crucial properties: it is sensitive to different types of information in hypothesizing the meaning of novel words; it is especially responsive to the presence of multiple conspiring cues; and it especially weights the language-internal cues when faced with unreliable extralinguistic cues to the meaning of the verb (see figure 3.2 for related evidence from HSP). Remarkably, the workings of this procedure seem much alike in young and more experienced (adult) learners. Both groups show sensitivity to the same kinds of syntactic and situational information, and both groups are able to combine this information in learning novel word meanings in broadly the same ways. To be sure, child participants provide more variable data, but the character of the data set is the same across the age groups. The fact that adults and children are sensitive to the same variables in the same approximate difference magnitudes is unexpected on accounts that attribute children’s difficulties with mental and other kinds of meaning to the cognitive immaturity of the learner. It is entirely compatible, however, with proposals that explain the course of early verb learning in terms of the information conditions required to map different kinds of verbs onto their meanings (e.g., Gleitman, 1990; Snedeker & Gleitman, 2004). For mental verbs, the information relevant to identify them resides almost exclusively in their distinctive syntactic privileges. The unavailability of such information at the early stages of word learning delays the acquisition of mental verbs accordingly.

Summary and Discussion
We began this chapter by asking the questions, How can children learn the words of their native language, and what is it about natural language vocabulary that allows it to be learned by mere children? We suggested that the answers to these questions are related and point to a learning procedure in which unlearned biases about the mapping of structure onto meaning interact with a learning machinery that integrates across multiple probabilistic sources of input evidence, linguistic and extralinguistic. A key to the evolution of word types over the course of early vocabulary growth has to do with the changing availability of the linguistic cues and their differential potency in cuing different aspects of the lexical stock. As learning begins, the novice’s only available cue resides in the ability to interpret the ambient world conceptually and in pragmatically salient ways, matching the lexical formatives with their contingencies of use. This word-to-world pairing procedure sets word learning into motion but is efficient only for certain terms, the so-called concrete ones, especially basic-level object terms. Limited to this kind of evidence, even adults acquire mainly concrete terms, suggesting that the constraint on children, too, may be more informational than conceptual. Acquisition of abstract items requires the learner to examine the distribution of these known items against one another as a source (among several) of information about the language specifics of the phrase structure. Once the learner has acquired these syntactic facts, he or she can match interpretations of ongoing events and states of affairs with the semantically relevant structures underlying co-occurring utterances, a powerful structure-to-world
matching procedure that is efficient across all word types (figure 3.1). This improvement in informational resources, rather than changes in the learner’s mentality, is what most centrally accounts for the changing character of vocabulary knowledge over the first few years of life, with abstract items acquired relatively late (figure 3.2). In sum, lexical learning is intrinsically ordered in time, with certain kinds of words necessarily acquired before others, for learning-theoretic reasons rather than conceptual-growth reasons. To learn the verbs efficiently, one needs prior knowledge of a stock of nouns, and one needs to construct linguistic representations that will reveal the argument structures intended by the adults who utter them. We next focused on two kinds of issues concerning the informativeness of structure for lexical learning. The first had to do with how the structure manages to be so efficient as a semantic cue in light of the limited variation among base syntactic structures. The answer was twofold. First, we described the zoom lens hypothesis (Fisher et al., 1994; Gleitman & Wanner, 1982; Landau & Gleitman, 1985): the particular syntactic structure in which a verb appears in the current utterance reveals the argument structure it is currently encoding (figures 3.4 and 3.5). In detail, we showed that the features of argument number, argument position, and argument type are revealing in these regards. The zoom lens procedure plays the joint function of focusing the listener’s attention—rendering different aspects of observed scenes more, or less, salient—and exposing one syntactic-semantic linkage that applies to the specific verb. Thus the role of single structures, while too underspecified to establish any verb’s exact meaning, is crucial in narrowing the way that the extralinguistic world itself is to be relevantly parsed. 
Most strikingly, sentence complement constructions focus the listener’s attention on mental aspects of a situation that otherwise are rarely salient to listeners, child or adult (figure 3.2). The second role for syntax in accounting for lexical learning has to do with the information value of the range of a verb’s allowed subcategorization (and related) features, taken together. These frame matrices are very sparsely populated (most structures are disallowed for most verbs), and they partition the stock of verbs into rather tight semantic subclasses. Partial overlap of such frame ranges is therefore a powerful predictor of semantic relatedness (figure 3.3). As we showed, children and adults efficiently use a verb’s observed syntactic range to make quite precise meaning conjectures. Rather remarkably, enough of this range is reflected, even in a half dozen randomly chosen utterances of a mother to her baby, that a commendably high accuracy score in identifying them was achieved in the HSP laboratory setting (figure 3.1). As we progressed through this discussion, we emphasized that there is much unpacking to be done in phrases such as “Children … make use of …” and other remarks that have to do with a learning procedure in quite a classical sense. Expectations and biases about language structure and contents ultimately have to make contact with the stream of sounds and the stream of events that confront the novice. Unlearned biases and expectations about the nature of language do not relieve learners from the necessity to acquire the exposure language by inductive inferences of great complexity and subtlety using, among other cues, the evidence
of the senses. We tried to show that, for learners to understand how the exposure language realizes these expectations, they need to access an information-processing device that combines, weighs, and integrates across information from different sources (figure 3.6). One important implication of the learning procedure as described is that vocabulary and language-specific syntax are interlocked all along their course. The result is a knowledge representation in which detailed syntactic and semantic information is linked at the level of the lexicon. We do not believe that these lexically specific representations, created in the course of and for the purpose of learning, are dismantled or replaced at some point in life when learning is more or less complete. Rather, the learning procedure leaves its footprint in the mature language design (in related regard, see Osherson & Weinstein, 1982; Pinker, 1984; Wexler & Culicover, 1980; for related but quite different perspectives on how learning may constrain language design, see Elman, 1993; Seidenberg, 1997). Experimentation on sentence comprehension suggests the continued lexical specificity of linguistic knowledge. This body of findings tells us how detailed probabilistic knowledge about the syntactic behavior of individual verbs pervades the adult language processing system. Native speakers learn not only which sentence structures can grammatically combine with each verb, but also how often each verb occurs in each such structure. Adults retrieve this information as soon as they identify a verb, and they use it to bias online sentence interpretation (e.g., Garnsey et al., 1997; Trueswell & Kim, 1998). Snedeker and Trueswell (2004) demonstrated that children and adults resolve the ambiguity of such sentences as “Tickle the frog with the feather” and “Choose the frog with the feather” as a function of the frequency with which these verbs naturally occur with noun-phrase or verb-phrase modification. 
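The idea that listeners store, for each verb, how often it occurs in each structure and use those frequencies to bias interpretation can be pictured with a small sketch. Everything in it is an assumption made for illustration: the counts are invented (they are not data from Snedeker and Trueswell or any other study), the names `FRAME_COUNTS`, `attachment_bias`, and `preferred_reading` are hypothetical, and a simple relative-frequency rule stands in for whatever mechanism the parser actually uses.

```python
from collections import Counter

# Invented counts, for illustration only: how often a "with"-phrase after
# each verb is read as an instrument (verb-phrase attachment) or as a
# modifier of the noun (noun-phrase attachment).
FRAME_COUNTS = {
    "tickle": Counter(instrument=18, modifier=2),
    "choose": Counter(instrument=1, modifier=19),
}


def attachment_bias(verb):
    """Return the relative frequency of each reading for the given verb."""
    counts = FRAME_COUNTS[verb]
    total = sum(counts.values())
    return {reading: n / total for reading, n in counts.items()}


def preferred_reading(verb):
    """Pick the reading with the highest relative frequency for the verb."""
    bias = attachment_bias(verb)
    return max(bias, key=bias.get)


if __name__ == "__main__":
    for verb in ("tickle", "choose"):
        # E.g., "<verb> the frog with the feather"
        print(verb, preferred_reading(verb), attachment_bias(verb))
```

Under these made-up counts, the same string "the frog with the feather" receives a different preferred analysis depending only on the verb that precedes it, which is the sense in which the stored knowledge is lexically specific.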
Thus online parsing decisions by adults and by children as young as five years are influenced by detailed and frequency-sensitive knowledge about the syntactic behavior of each verb. These findings from the psycholinguistic literature mesh naturally with computational approaches to parsing that represent syntactic representations as strongly lexicalized. For example, in Lexicalized Tree Adjoining Grammar, the syntactic possibilities of a language are represented by a finite set of tree structures that are linked with individual lexical items and a small set of operations by which trees can be joined (e.g., Srinivas & Joshi, 1999). This apparatus permits the statement of syntactic dependencies (such as subcategorization) and semantic dependencies (such as selection restrictions) and yields a natural treatment of noncompositional idioms (kick the bucket). Such approaches are based on a claim similar to the one we derived from examination of the learning procedure: an adequate description of the syntactic combinatorial principles of a language is intrinsically entwined with the lexicon and the principles by which it is acquired.

Notes

1. Notationally, we use italics for mention of a phrase or word, “double quotes” for its utterance or sound, and ‘single quotes’ for a concept.

2. This research was partially supported by grants from the National Institutes of Health to John Trueswell and Lila Gleitman (#1-R01-HD37507) and Anna Papafragou (#F32MH65020). This chapter was originally published in Language Learning and Development, 1(1), 23–64, and revised for this volume.

3. Large differences in the type-token frequencies of nouns and verbs crosslinguistically result from the fact that some languages permit argument dropping (where the content is pragmatically recoverable) much more than others do (see, e.g., Gelman & Tardif, 1998; Tardif et al., 1999). But even in such verb-friendly languages as Korean and Mandarin Chinese, the noun advantage in early learning is still visible, though smaller in magnitude (Gentner & Boroditsky, 2001; Snedeker & Li, 2000).

4. The idea that the noun advantage is an artifact of the greater concreteness of concepts expressed by the common stock of nouns compared to the common stock of verbs is maintained by almost all investigators of these phenomena, notably, Gentner (1982), whose individuation hypothesis was the first in the modern era to draw attention to the facilitative role of transparent word-to-world mappings (but see Hume, 1739 [1978], on simple concepts). The explanatory victory here, as Gentner also notes, is somewhat hollow, because concreteness is itself a term in need of considerable sharpening. For instance, a large literature shows that all sorts of words describe concepts whose exemplars are perceptible, so in this sense concrete, but not all of these are equally easy to learn (L. Bloom et al., 1975; Graham et al., 1998; Hall, 1991; Kako, 2005; Markman, 1987). These include, among others, partitives (e.g., trunk or tail as opposed to elephant) and superordinates (thing and animal as opposed to dog; Shipley et al., 1983); proper names (Daddy as opposed to man; Hall, 1991; Katz et al., 1974); and terms that describe a situation-restricted entity (passenger or lawyer versus man; Hall, 1994). Overall these studies suggest that “basic-level object” (Rosch et al., 1976) may be closer to the mark than “concrete” (see Hall & Waxman, 1993). For present purposes, and because it is incontrovertible, we accept that something like this aspect of certain concepts is accounting for their transparency to the initial lexical mapping procedures, and we use the approximate term concreteness as its nickname. In any case, our main topic here is the acquisition procedures for hard words, issues that are not engaged in the literature on concreteness and the noun advantage.

5. Three strange misinterpretations of this bootstrapping hypothesis have crept into the mythology of the field, perhaps in part through a misunderstanding of Pinker (1994) or Grimshaw (1994). The first is that the child gives up use of the extralinguistic context of input speech as a cue to word meaning once he or she achieves control of the semantics of syntactic structures, substituting internal linguistic analyses. Nothing could be further from the truth or from any proposal that our group of investigators has ever made: the proposal has always been that word-to-world pairing comes to be supplemented by structure-to-world pairing (Landau & Gleitman, 1985). The world (that is, the extralinguistic concomitants of word use) never disappears from the learning equation. The second misunderstanding is that linguistic structure can directly reveal the full meaning of verbs. To believe any such thing would make a mystery of the fact that we learn many verbs whose syntactic properties are the same (e.g., those for freeze/burn and those for bounce/roll; see Fillmore, 1970). Of course, syntactic structure can reveal only the argument-taking properties of verbs, which constrains, but does not exhaust, their semantics. The third misconception is that, according to our hypothesis, you could never learn the meaning of any verb without syntactic support. That would make a mystery of the fact that Spot and Rover can understand “Roll over” and “Give me your paw” given enough dog biscuits. We are talking about the basis for efficient learning (“fast mapping” per Carey, 1978).

6. This choice of stimulus materials has another advantage in realism over most laboratory probes for lexical acquisition: the learner is presented with a complex videotaped reference world, that is, an undoctored, ongoing interaction between mother and child in a setting filled with all the objects and fleeting actions of everyday life and in a long-enough segment for the gist of the conversation to be extracted. This is in comparison with the usual laboratory tasks for child learners, in which they are offered a few structured alternatives in a stripped-down environment; for example, the learner is confronted with a limited set of test objects that differ only in, say, size, shape, or color, or in thingness and stuffness. In the real environment of learners, to the extent simulated here, the world is so richly and variously specified that the mapping problem is exposed in something like its true buzzing, blooming confusion (cf. Chomsky, 1959; for discussion of this factor in HSP, see Kako, 2005).

7. “But if the knowledge which we acquired before birth was lost by us at birth, and afterwards by the use of the senses we recovered that which we previously knew, will not that which we call learning be a process of recovering our knowledge, and may not this be rightly termed recollection?” (Jowett, 2013).

8. In evaluating these findings, please keep in mind that we are always testing accuracy scores for the forty-eight most frequent nouns and verbs from a sample of spontaneous speech of mothers to their babies. Mental-content words such as want and think do show up on these highest-frequency lists.

9. The structured lexical entries necessarily built in the course of learning are used as well in building an efficient, dynamic language-processing system, a system that in the adult state automatically retrieves detailed syntactic tendencies of individual verbs on the fly as they are encountered. This allows an accurate estimation of the sentence structure to be recovered rapidly enough by listeners so as to assign meaning and establish reference almost on a word-by-word basis (e.g., Kim et al., 2002; MacDonald et al., 1994; J. C. Trueswell, 2000; J. Trueswell & Tanenhaus, 1994).

10. Of course, many other linguistic properties, including morphology and modal structure, can also provide cues for verb discovery. But we do not have direct evidence in these cases, so we discuss them no further. Keep in mind, too, that the systematic islands of sameness in syntactic-semantic linkages within and across languages that we now discuss (following, e.g., Baker, 2001a) coexist within a sea of differences, also real. Gentner’s (1982) natural partitions hypothesis emphasized these differences in crosslinguistic conflation patterns in particular as one causal factor in the infant’s late learning of verbs as opposed to nouns. We agree. There is a variability in these regards that learners must reckon with, which renders the acquisition of a language a formidable computational problem. But the considerable differences at the surface should not blind us to the reality and theoretical centrality of the crosslinguistic communalities at the interface of syntax and semantics. What we are discussing are these universal mapping principles that undergird the learning of hard words.

11. As mentioned earlier, such a principle can be realized only probabilistically in the surface forms of utterances because, among many other reasons, certain constructions (such as the English imperative) systematically render one argument covertly. Moreover, the child learner will have to engage in some fancy footwork to discover these relations from real input as he or she is often in the position of solving for more than one unknown at one time; that is, the ditransitive frame may be truly informative for the semantics of transfer, but the child making this inference must somehow assign this structure upon hearing, say, “John put a ball on the table” and not “John saw a ball on the table.” For some discussion of the computational problems that must be faced in this regard, see Lidz and Gleitman (2004).

12. The changed word order reflects a different universal bias, namely, to align subject-argument with agent-semantic roles (Keenan, 1976). For a linguistic analysis of causative constructions in English, see Levin and Rappaport Hovav (1995). For emerging knowledge of these constructions in child learners, see Bowerman (1974).

13. The proportions here do not add up to 100 percent in any condition because of the indeterminacy of what can relevantly be said given any observed situation. Thus, in response to one of these scenes, children and even adults sometimes respond, “They’re having fun!” or “Look at his hair!” rather than “He’s chasing him.”

14. Other sources of structural and morphological information (e.g., the type of complementizer in a subordinate clause) place even tighter constraints on verb meaning within the proposition-taking verbs. For instance, you can expect that someone will come back but you cannot expect whether someone will come back, though it is fine to wonder whether someone will; you can insist that someone come but not suppose or hope that someone come (rather, you can suppose or hope that someone comes); and so forth.

References

Akhtar, N., & Tomasello, M. (1997). Young children’s productivity with word order and verb morphology. Developmental Psychology, 33, 952–965.
Altmann, G. T. M., & Kamide, Y. (1999).
Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition, 73, 247–264.
Au, T. K., Dapretto, M., & Song, Y. K. (1994). Input versus constraints: Early word acquisition in Korean and English. Journal of Memory and Language, 33, 567–582.
Baker, M. C. (2001a). The atoms of language: The mind’s hidden rules of grammar. Basic Books.
Baker, M. C. (2001b). Phrase structure as a representation of “primitive” grammatical relations. In W. D. Davies & S. Dubinsky (Eds.), Objects and other subjects: Grammatical functions, functional categories and configurationality. Kluwer.
Baldwin, D. A. (1991). Infants’ contribution to the achievement of joint reference. Child Development, 62, 875–890.
Bates, E., Dale, P. S., & Thal, D. (1995). Individual differences and their implications for theories of language development. In P. Fletcher & B. MacWhinney (Eds.), The handbook of child language. Blackwell.
Bates, E., & Goodman, J. (1997). On the inseparability of grammar and the lexicon: Evidence from acquisition, aphasia and real-time processing. Language and Cognitive Processes, 12(5–6), 507–586.
Bloom, L. (1970). Language development: Form and function in emerging grammars. MIT Press.
Bloom, L., Lightbown, P., & Hood, L. (1975). Structure and variation in child language. Monographs of the Society for Research in Child Development, 40, 1–97.
Bloom, P. (2002). Mindreading, communication, and the learning of the names for things. Mind and Language, 17, 37–54.
Borer, H. (1984). Parametric syntax. Foris.
Bowerman, M. (1974). Learning the structure of causative verbs: A study in the relationship of cognitive, semantic, and syntactic development. Papers and Reports on Child Language Development, 8, 142–178.
Brent, M. R. (1994). Surface cues and robust inference as a basis for the early acquisition of subcategorization frames. In L. R. Gleitman & B. Landau (Eds.), The acquisition of the lexicon (pp. 433–470). MIT Press.
Bretherton, I., & Beeghly, M. (1982). Talking about internal states: The acquisition of an explicit theory of mind. Developmental Psychology, 18, 906–921.
Carey, S. (1978). The child as word learner. In M. Halle, J. Bresnan, & G. A. Miller (Eds.), Linguistic theory and psychological reality (pp. 264–293). MIT Press.
Carpenter, M., Akhtar, N., & Tomasello, M. (1998). Fourteen- through 18-month-old infants differentially imitate intentional and accidental actions. Infant Behavior & Development, 21, 315–330.
Caselli, M. C., Bates, E., Casadio, P., Fenson, J., Fenson, L., Sanderl, L., & Weir, J. (1995). A cross-linguistic study of early lexical development. Cognitive Development, 10, 159–199.
Chang, E., & Fernald, A. (2003). Use of semantic knowledge in speech processing by 26-month-olds [Paper presentation]. Biennial Meeting of the Society for Research in Child Development, Tampa, FL.
Chomsky, N. (1959). Review of B. F. Skinner’s Verbal behavior. Language, 35, 26–58.
Chomsky, N. (1980). Rules and representations. Basil Blackwell.
Chomsky, N. (1981). Lectures on the theory of government and binding. Foris.
Croft, W. (1990). Typology and universals. Cambridge University Press.
Dang, H. T., Kipper, K., Palmer, M., & Rosenzweig, J. (1996). Investigating regular sense extensions based on intersective Levin classes. Proceedings of the International Conference on Computational Linguistics (pp. 293–299). Association for Computational Linguistics.
Dorr, B. J., & Jones, D. (1995). Role of word sense disambiguation in lexical acquisition: Predicting semantics from syntactic cues. Proceedings of the International Conference on Computational Linguistics (pp. 322–327). Association for Computational Linguistics.
Dowty, D. (1991). Thematic proto-roles and argument selection. Language, 67, 547–619.
Dresher, B. E. (1998). Child phonology, learnability, and phonological theory. In T. Bhatia & W. C. Ritchie (Eds.), Handbook of language acquisition (pp. 299–346). Academic.
Dromi, E. (1987). Early lexical development. Cambridge University Press.
Elman, J. L. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48, 71–99.
Epstein, W. (1961). The influence of syntactical structure on learning. American Journal of Psychology, 74, 80–85.
Feldman, H., Goldin-Meadow, S., & Gleitman, L. (1978). Beyond Herodotus: The creation of language by linguistically deprived deaf children. In A. Lock (Ed.), Action, symbol, and gesture. Academic.
Fenson, L., Dale, P. S., Reznick, J. S., Bates, E., Thal, D., & Pethick, S. (1994). Variability in early communicative development. Monographs of the Society for Research in Child Development, 59(5), 1–189.
Fillmore, C. J. (1970). The grammar of hitting and breaking. Georgetown University Press.
Fisher, C. (1996). Structural limits on verb mapping: The role of analogy in children’s interpretations of sentences. Cognitive Psychology, 31, 41–81.
Fisher, C., & Gleitman, L. R. (2002). Language acquisition. In H. Pashler & R. Gallistel (Eds.), Stevens’ handbook of experimental psychology: Vol. 3. Learning, motivation, and emotion (3rd ed.). Wiley.
Fisher, C., Gleitman, H., & Gleitman, L. R. (1991). On the semantic content of subcategorization frames. Cognitive Psychology, 23, 331–392.
Fisher, C., Hall, D. G., Rakowitz, S., & Gleitman, L. (1994). When it is better to receive than to give: Syntactic and conceptual constraints on vocabulary growth. Lingua, 92, 333–375.
Forrest, L. B. (1996). Discourse goals and attentional processes in sentence production: The dynamic construal of events. In A. E. Goldberg (Ed.), Conceptual structure, discourse and language. CSLI Publications.
Garnsey, S. M., Pearlmutter, N. J., Myers, E., & Lotocky, M. A. (1997). The contributions of verb bias and plausibility to the comprehension of temporarily ambiguous sentences. Journal of Memory and Language, 37, 58–93.
Gelman, S., & Tardif, T. (1998). A cross-linguistic comparison of generic noun phrases in English and Mandarin. Cognition, 66, 215–248.
Gentner, D. (1978). On relational meaning: The acquisition of verb meaning. Child Development, 49, 988–998.
Gentner, D. (1981). Some interesting differences between nouns and verbs. Cognition and Brain Theory, 4, 161–178.
Gentner, D. (1982). Why nouns are learned before verbs: Linguistic relativity versus natural partitioning. In K. Bean (Ed.), Language, thought, & culture (pp. 301–334). Lawrence Erlbaum Associates, Inc.
Gentner, D., & Boroditsky, L. (2001). Individuation, relativity and early word learning. In M. Bowerman & S. C. Levinson (Eds.), Language acquisition and conceptual development (pp. 215–256). Cambridge University Press.
Geyer, H. L. (1994). Subcategorization as a predictor of verb meaning: Evidence from Hebrew [Unpublished manuscript]. University of Pennsylvania, Philadelphia.
Gillette, J., Gleitman, H., Gleitman, L. R., & Lederer, A. (1999). Human simulations of vocabulary learning. Cognition, 73, 135–176.
Gleitman, L. R. (1990). The structural sources of verb meanings. Language Acquisition, 1, 3–55.
Gleitman, L. R., & Fisher, C. (in press). Universal aspects of word learning. In J. McGilvray (Ed.), Cambridge companion to Chomsky. Cambridge University Press.
Gleitman, L., & Gleitman, H. (1997). What is language made of? Lingua, 100, 29–55.
Gleitman, L. R., & Wanner, E. (1982). Language acquisition: The state of the state of the art. In E. Wanner & L. R. Gleitman (Eds.), Language acquisition: State of the art (pp. 3–48). Cambridge University Press.
Goldberg, A. E. (1995). Constructions: A construction grammar approach to argument structure. University of Chicago Press.
Goldberg, A. E. (2004). But do we need universal grammar? Comment on Lidz et al. (2003). Cognition, 94, 77–84.
Goldin-Meadow, S. (2003a). The resilience of language. In B. Beachley, A. Brown, & F.
Conlin (Eds.), Proceedings of the 27th Annual Boston University Conference on Language Development (pp. 1–25). Cascadilla. Goldin-Meadow, S. (2003b). The resilience of language: What gesture creation in deaf children can tell us about how all children learn language. Psychology Press. Goldin-Meadow, S., Seligman, M., & Gelman, R. (1976). Language in the two-year old. Cognition, 4, 189–202. Goldsmith, J. (2001). Learning of the morphology of a natural language. Computational Linguistics, 27, 153–198. Goodman, J. C., McDonough, L., & Brown, N. B. (1998). The role of semantic context and memory in the acquisition of novel nouns. Child Development, 69, 1330–1344. Gopnik, A., & Meltzoff, A. N. (1997). Words, thoughts, and theories. MIT Press. Gordon, P. (2003). The origin of argument structure in infant event representations. In A. Brugos, L. Micciulla, & C. E. Smith (Eds.), Proceedings of the 28th Annual Boston University Conference on Language Development (pp. 189–198). Cascadilla. Gordon, P., Grosz, B. & Gillom, L. (1993). Pronouns, names, and the centering of attention in discourse. Cognitive Science, 3, 311–347. Graham, S. A., Baker, R. K., & Poulin Dubois, D. (1998). Infants’ expectations about object label reference. Canadian Journal of Experimental Psychology, 52(3), 103–113. Grimshaw, J. (1990). Argument structure. MIT Press. Grimshaw, J. (1994). Lexical reconciliation. In L. R. Gleitman & B. Landau (Eds.), The acquisition of the lexicon. MIT Press. Hall, D. G. (1991). Acquiring proper nouns for familiar and unfamiliar animate objects: Two-year-olds’ word-learning biases. Child Development, 62, 1142–1154. Hall, D. G. (1994). How mothers teach basic-level and situation-restricted count nouns. Journal of Child Language, 21, 391–414. Hall, D. G., & Waxman, S. R. (1993). Assumptions about word meaning: Individuation and basic-level kinds. Child Development, 64(5), 1550–1570. Harris, Z. S. (1964). Co-occurrence and transformation in linguistic structure. 
Prentice-Hall.
Hespos, S., & Baillargeon, R. (2001). Infants’ knowledge about occlusion and containment events: A surprising discrepancy. Psychological Science, 12, 141–147. Hornstein, N., and Lightfoot, D. (1981). Introduction. In N. Hornstein and D. Lightfoot (Eds.), Explanation in linguistics. Longman. Hume, D. (1978). A treatise on human nature. Clarendon. (Original work published 1739.) Huttenlocher, J., Smiley, P., & Charney, R. (1983). The emergence of action categories in the child: Evidence from verb meanings. Psychological Review, 90, 72–93. Jackendoff, R. (1990). Semantic structures. Current studies in linguistics. (Vol. 18). MIT Press. Johnson, C. N., & Maratsos, M. P. (1977). Early comprehension of mental verbs: Think and know. Child Development, 48, 1743–1747. Joshi, A., & Srinivas, B. (1994). Disambiguation of super parts-of-speech (or supertags): Almost parsing. Proceedings of the 15th International Conference on Computational Linguistics (pp. 154–160). Association for Computational Linguistics. Jowett, B., trans. (2013). Phaedo: The last hours of Socrates, by Plato. Createspace Independent Publisher. Jusczyk, P., Hirsh-Pasek, K., Kemler Nelson, D., Kennedy, L., Woodward, A., & Piwoz, J. (1992). Perception of the acoustic correlates of major phrasal units by young infants. Cognitive Psychology, 24, 252–293. Kako, E. (1998). The event semantics of syntactic structures [Unpublished doctoral dissertation]. University of Pennsylvania, Philadelphia. Kako, E. (2005). Information sources for noun learning. Cognitive Science, 29, 223–260. Kako, E., & Trueswell, J. C. (2000). Verb meanings, object affordances, and the incremental restriction of reference. Proceedings of the Annual Conference of the Cognitive Science Society, 22, 256–261. Katz, N., Baker, E., & MacNamara, J. (1974). What’s in a name? A study of how children learn common and proper names. Child Development, 45(2), 469–473. Keenan, E. L. (1976). Towards a universal definition of subject. In C. N. 
Li (Ed.), Subject and topic (pp. 303–333). Academic. Kellman, P. J., & Arterberry, M. E. (1998). The cradle of knowledge: The development of perception in infancy. MIT Press. Kellman, P. J., & Spelke, E. S. (1983). Perception of partly occluded objects in infancy. Cognitive Psychology, 15, 483–524. Kelly, M. H., & Martin, S. (1994). Domain-general abilities applied to domain-specific tasks: Sensitivity to probabilities in perception, cognition, and language. Lingua, 92, 105–140. Kim, A., Srinivas, B., & Trueswell, J. C. (2002). The convergence of lexicalist perspectives in psycholinguistics and computational linguistics. In P. Merlo and S. Stevenson (Eds.), Sentence processing and the lexicon: Formal, computational and experimental perspectives. John Benjamins Publishing. Lakusta, L., & Landau, B. (in press). Starting at the end: The importance of goals in spatial language. Cognition. Landau, B., & Gleitman, L. R. (1985). Language and experience: Evidence from the blind child. Harvard University Press. Lederer, A., Gleitman, H., & Gleitman, L. (1995). Verbs of a feather flock together: Semantic information in the structure of maternal speech. In M. Tomasello & W. E. Merriman (Eds.), Beyond names for things: Young children’s acquisition of verbs. Lawrence Erlbaum Associates, Inc. Lenneberg, E. H. (1967). Biological foundations of language. Wiley. Leslie, A. M. (1995). A theory of agency. In D. Sperber, D. Premack, & A. J. Premack (Eds.), Causal cognition: A multidisciplinary debate (pp. 121–141). Oxford University Press. Leslie, A. M., & Keeble, S. (1987). Do six-month-old infants perceive causality? Cognition, 25, 265–288. Levin, B. (1993). English verb classes and alternations: A preliminary investigation. University of Chicago Press. Levin, B., & Rappaport Hovav, M. (1995). Unaccusativity: At the syntax-lexical semantics interface. MIT Press. Li, P. (1994). 
Subcategorization as a predictor of verb meaning: Cross-linguistic study in Mandarin [Unpublished manuscript]. University of Pennsylvania. Li, P. (2003). Language acquisition in a self-organizing neural network model. In P. Quinlan (Ed.), Connectionist models of development: Developmental processes in real and artificial neural networks. Psychology Press. Lidz, J., & Gleitman, L. R. (2004). Yes, we still need universal grammar. Cognition, 94, 85–93.
Lidz, J., Gleitman, H., & Gleitman, L. (2003a). Kidz in the ’hood: Syntactic bootstrapping and the mental lexicon. In D. G. Hall & S. R. Waxman (Eds.), Weaving a lexicon (pp. 603–636). MIT Press. Lidz, J., Gleitman, H., & Gleitman, L. (2003b). Understanding how input matters: Verb learning and the footprint of universal grammar. Cognition, 87, 151–178. Locke, J. (1964). An essay concerning human understanding (pp. 259–298). Meridian Books. (Original work published 1690) MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). The lexical nature of syntactic ambiguity resolution. Psychological Review, 101, 676–703. Mandler, J. M. (2000). Perceptual and conceptual processes in infancy. Journal of Cognition and Development, 1, 3–36 Manning, C., & Schütze, H. (1999). Foundations of statistical natural language processing. MIT Press. Manning, C. D. (1993). Automatic acquisition of a large subcategorization dictionary from corpora. Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (pp. 235–242). Association for Computational Linguistics. Markman, E. (1987). How children constrain the possible meanings of words. In Ulric Neisser (Ed.), Concepts and conceptual development: Ecological and intellectual factors in categorization (pp. 255–287). Cambridge University Press. Markman, E. M. (1994). Constraints on word meaning in early language acquisition. Lingua: International Review of General Linguistics, 92(1–4), 199–227. McClelland, J. L. (1987). The case for interactionism in language processing. In M. Coltheart (Ed.), Attention and performance. (Vol. 12). Lawrence Erlbaum Associates. Merlo, P., & Stevenson, S. (2001). Automatic verb classification based on statistical distribution of argument structure. Computational Linguistics, 27, 373–408. Mintz, T. H. (2003). Frequent frames as a cue for grammatical categories in child directed speech. Cognition, 90, 91–117. Moore, C., Bryant, D., & Furrow, D. (1989). 
Mental terms and the development of certainty. Child Development, 60, 167– 171. Morgan, J. L., Meier, R. P., & Newport, E. L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550. Naigles, L., Gleitman, H., & Gleitman, L. R. (1992). Children acquire word meaning components from syntactic evidence. In E. Dromi (Ed.), Language and cognition: A developmental perspective (pp. 104–140). Ablex. Naigles, L. G. (1990). Children use syntax to learn verb meanings. Journal of Child Language, 17, 357–374. Naigles, L. G., Fowler, A., & Helm, A. (1992). Developmental changes in the construction of verb meanings. Cognitive Development, 7, 403–427. Nappa, R. L., January, D., Gleitman, L. R., & Trueswell, J. (2004). Paying attention to attention: Perceptual priming effects on word order. In Proceedings of the Annual Meeting of the Cognitive Science Society, 26. Needham, A., & Baillargeon, R. (2000). Infants’ use of featural and experiential information in segregating and individuating objects: A reply to Xu, Carey and Welch (2000). Cognition, 74, 255–284. Osherson, D. N., & Weinstein, S. (1982). Criteria of language learning. Information and Control, 52(2), 123–138. Papafragou, A., Cassidy, K., & Gleitman, L. (2007). When we think about thinking: The acquisition of belief verbs. Cognition, 105, 125–165. Pinker, S. (1984). Language learnability and language development. Harvard University Press. Pinker, S. (1989). Learnability and cognition. MIT Press. Pinker, S. (1994). How could a child use verb syntax to learn verb semantics? Lingua: International Review of General Linguistics, 92(1–4), 377–410. Quine, W. (1960). Word and object. Wiley. Resnick, P. (1996). Selectional constraints: An information-theoretic model and its computational realization. Cognition, 61, 127–159. Rispoli, M. (1989). 
Encounters with Japanese verbs: Caregiver sentences and the categorization of transitive and intransitive sentences. First Language, 9, 57–80. Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382–439.
Saffran, J., Aslin, R., & Newport, E. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928. Seidenberg, M. S. (1997). Language acquisition and use: Learning and applying probabilistic constraints. Science, 275, 1599–1604. Senghas, A., Coppola, M., Newport, E. L., & Supalla, T. (1997). Argument structure in Nicaraguan Sign Language: The emergence of grammatical devices. In Proceedings of the Boston University Conference on Language Development, 21 (pp. 550–561). Cascadilla Press. Shatz, M., Wellman, H. M., & Silber, S. (1983). The acquisition of mental terms: A systematic investigation of the first reference to mental state. Cognition, 14, 201–321. Shipley, E. F., Kuhn, I. F., & Madden, E. C. (1983). Mothers’ use of superordinate category terms. Journal of Child Language, 10, 571–588. Smiley, P., & Huttenlocher, J. E. (1995). Conceptual development and the child’s early words for events, objects, and persons. In M. Tomasello & W. Merriman, Beyond names for things (pp. 21–62). Lawrence Erlbaum Associates, Inc. Snedeker, J., & Gleitman, L. (2004). Why it is hard to label our concepts? In D. G. Hall & S. R. Waxman (Eds.), Weaving a lexicon (pp. 603–636). MIT Press. Snedeker, J., & Li, P. (2000). Can the situations in which words occur account for cross-linguistic variation in vocabulary composition? In J. Tai and Y. Chang (Eds.), Proceedings of the Seventh International Symposium on Chinese Languages and Linguistics. National Chung Cheng University. Snedeker, J., & Trueswell, J. C. (2004). The developing constraints on parsing decisions: The role of lexical-biases and referential scenes in child and adult sentence processing. Cognitive Psychology, 49, 238–299. Spelke, E. S. (2003). Core knowledge. In N. Kanwisher & J. Duncan (Eds.), Attention and performance: Vol. 20. Functional neuroimaging of visual cognition. Oxford University Press. Spelke, E. S., Breinlinger, K., Macomber, J., & Jacobson, K. (1992). Origins of knowledge. 
Psychological Review, 99, 605– 632. Srinivas, B., & Joshi, A. K. (1999). Supertagging: An approach to almost parsing. Computational Linguistics, 252(2), 237– 265. Talmy, L. (1985). Lexicalization patterns: Semantic structure in lexical forms. In T. Shopen (Ed.), Language typology and syntactic description. Cambridge University Press. Tanenhaus, M. K., & Carlson, G. N. (1988). Thematic roles and language comprehension. Syntax and Semantics, 21, 263– 288. Tardif, T., Gelman, S. A., & Xu, F. (1999). Putting the “noun bias” in context: A comparison of English and Mandarin. Child Development, 70(3), 620–635. Tomasello, M. (2000). Do young children have adult syntactic competence? Cognition, 74, 209–253. Tomasello, M., & Farrar, M. J. (1986). Joint attention and early language. Child Development, 57, 1454–1463. Tomlin, R. S. (1997). Mapping conceptual representations into linguistic representations: The role of attention in grammar. In J. Nuyts & E. Pederson (Eds.), Language and conceptualization: Language, culture, and cognition. Cambridge University Press. Trueswell, J., & Gleitman, L. (2004). Children’s eye movements during listening: Developmental evidence for a constraintbased theory of sentence processing. In J. M. Henderson & F. Ferreira (Eds.), The interface of language, vision, and action: Eye movements and the visual world (pp. 319–346). Taylor & Francis. Trueswell, J., & Tanenhaus, M. (1994). Toward a lexicalist framework of constraint-based syntactic ambiguity resolution. In C. Clifton, K. Rayner, & L. Frazier (Eds.), Perspectives on sentence processing. Lawrence Erlbaum Associates, Inc. Trueswell, J. C. (2000). The organization and use of the lexicon for language comprehension. In B. Landau, J. Sabini, J. Jonides, & E. Newport (Eds.), Perception, cognition, and language: Essays in honor of Henry and Lila Gleitman (pp. 327– 345). MIT Press. Trueswell, J. C., & Kim, A. E. (1998). 
How to prune a garden-path by nipping it in the bud: Fast-priming of verb argument structures. Journal of Memory and Language, 39, 102–123. Trueswell, J. C., Sekerina, I., Hill, N. M., & Logrip, M. L. (1999). The kindergarten-path effect: Studying on-line sentence processing in young children. Cognition, 73, 89–134. Walker, M. A., Joshi, A. K., & Prince, E. F. (1997). Centering in naturally-occurring discourse: An overview. In M. A.
Walker & E. F. Prince (Eds.), Centering in discourse. Oxford University Press. Webster, M., & Marcus, M. (1989). Automatic acquisition of the lexical semantics of verbs from sentence frames. 27th annual meeting of the Association for Computational Linguistics (pp. 177–184). Association for Computational Linguistics. Wexler, K., & Culicover, P. (1980). Formal principles of language acquisition. MIT Press. Woodward, A. L. (1998). Infants selectively encode the goal object of an actor’s reach. Cognition, 69, 1–34. Woodward, A. L., & Markman, E. M. (2003). Early word learning. In D. Khuhn & R. S. Siegler (Eds.), Handbook of child psychology: Vol. 2. Cognition, perception, and language (5th ed., pp. 371–415). Wiley. Zwicky, A. M. In a manner of speaking. Linguistic Inquiry, 2, 223–233.
4 Elicited Imitation in First Language Acquisition Research: Cognitive Grounding and Crosslinguistic Application
Cristina Dye and Claire Foley
Elicited imitation as a method for first language acquisition research extends back to the 1960s. Its value for studying first language acquisition was explained and documented by Lust et al. (1987) and Lust et al. (1996). These authors discussed the theoretical rationale for the method and the nature of reliability and validity of research using the method and summarized discoveries enabled by its use. In the decades since, developments in cognitive science and neurolinguistics have uncovered new evidence about basic cognition underlying the capacity for elicited imitation. Further, an explosion of interest in elicited imitation has led to its extensive use in studies of second language (L2) acquisition, bilingualism, and language development in special populations, and methodological studies have detailed its use in these areas (e.g., Vinther, 2002; Yan et al., 2016). Together, these developments call for updating the grounding of the method and reviewing progress and potential in first language acquisition.1 In this chapter, we seek to unpack the theoretical underpinnings of elicited imitation as a research method in light of both early research and later developments in the study of language and cognition.2 We draw conclusions about its usefulness in understanding the nature of language and language acquisition.

Areas of Research Using Elicited Imitation
As summarized by Lust et al. (1987), early research using elicited imitation provided evidence of the reliability and validity of the method. In the decades that followed, elicited imitation was used in the study of first language acquisition in many theoretical domains, including, among many others, syntax of sentential subjects (Valian & Aubrey, 2005), interaction of phrase structure and verbal inflectional features (Cohen Sherman & Lust, 1993), and question formation (Santelmann et al., 2002). We return to a discussion of the use of elicited imitation in studies of first language acquisition in the section titled Role of Linguistic Knowledge. In recent decades, elicited imitation has been used extensively in studies of second language acquisition (Jessop et al., 2007; Vinther, 2002; West, 2014). It has also been used in studies of bilingualism in children (Komeili & Marshall, 2013) and in studies of specific language impairment (Coady & Evans, 2008; Leclerq et al., 2014).
Beyond language acquisition research, elicited imitation has been used for language assessment. Bowden (2016) argued that the method is valid for assessment, offering evidence of correlation between elicited imitation findings and findings from simulated oral proficiency interviews. In a study using several tasks with potential for identifying specific language impairment, sentence repetition was argued to be “the most useful” marker (Conti-Ramsden et al., 2001, p. 741).

Assumptions about Methodology
In drawing inferences about children’s knowledge of language based on language data, several assumptions are necessary. We number these for convenience below.
1. All windows into children’s knowledge of language involve behavior. Whether a method is primarily seeking information about comprehension or production, evidence about knowledge is available only through some form of behavior, such as speech, nonverbal signaling, or manipulation of objects.
2. Methods vary in the amount of overt behavior they require. Some methods call for only a small quantity of observable behavior, such as a yes or no response or pointing at a picture. Others involve more observable behavior, such as a sequence of manipulations of objects in an act-out task or long segments of spoken language.
3. Methods vary in the cognitive and motor elements of behavior that interact with language knowledge to yield linguistic data. Even in a method that involves limited quantities of observable speech or action, multiple cognitive capacities underlie responses. These may include visual and auditory perception, executive function (which in turn includes such components as working memory, flexibility, and planning), long-term memory, imagination, and meta-awareness of an activity as a game. Motor elements of behavior may include reaching, grasping, pointing, moving and controlling objects, and motor activities of the speech tract.
4. Nonlinguistic cognitive and motor capacities are developing alongside language in the child. All of the cognitive and motor capacities listed above develop over time.
5. Performance may not always clearly reflect the target. Behavior is susceptible to unintended error. Thus, any form of speech or movement may imperfectly instantiate what a person intended.
All inferences drawn about linguistic competence must respect these assumptions, which constitute constraints on the inferences that may be drawn on the basis of language behavior.
There is no single perfect method, because there are often trade-offs among these areas. For example, a method that involves only a yes or no response limits the potential observable variation and thus the range of possible mismatches between intended behavior and performance. However, because the overt required behavior is not complex, the same method also provides less rich information about interpretation and may tap many different nonobservable cognitive capacities, such as the capacity to keep alternatives in working memory, make a choice, and translate the choice into the appropriate word.
Like any method, elicited imitation offers advantages and also has limitations with respect to these constraints on inferences. If administered with the required pretraining, it offers the advantage of a known linguistic target. It also offers both the advantages and the limitations of a task with more complex overt behavior. At the same time, because it does not involve the motor system beyond the speech production system, it is not confounded with gross motor development. Because it does not involve pretending, it does not draw on the capacity to distinguish between reality and imagination, which also develops over time (see, e.g., Weisberg, 2013).

Grounding of the Method
Elicited imitation is grounded in several ways that underlie its availability and authenticity as a window into linguistic competence.

Imitation Is Innate
Evidence suggests that the human capacity for imitation is innate. At the age of 12 to 21 days, infants are able to imitate facial gestures (tongue protrusion, mouth opening, and lip protrusion) and manual gestures (sequential finger movement) (Meltzoff & Moore, 1977). Findings were replicated with infants between the ages of 0 and 72 hours (Meltzoff & Moore, 1983). Infants’ capacity to imitate actions is robust with respect to changes in context and to shape and size of manipulated objects. Relative to control groups, 14-month-old infants who had earlier seen an action with a toy performed by an adult (e.g., a two-part toy pulled apart) later performed more of these actions with the same toy even in a different setting or when the size and color of the object had changed (Barnat et al., 1996). There is also evidence that preverbal children between the ages of 1;6 and 2;8 imitate one another in social interactions. In videotaped interactions, children demonstrated use of an object that copied a peer’s use of the object. These imitations were not seen in children ages three and older, after they began communicating with peers using language (Nadel, 2002).

Perception and Action Are Neurologically Connected and Compositionally Analyzed in Imitation
In addition to evidence that the capacity to imitate is present beginning in infancy, studies of motor activity have uncovered evidence of a neurological connection between observing and performing a motor task. As summarized by Molenberghs et al. (2009), functional magnetic resonance imaging (fMRI) studies have revealed that the same region of the brain shows activity when a motor sequence is perceived and carried out (see also Chaminade et al., 2002). However, this connection between perception and action does not reflect representation of an action as an unanalyzed whole: there is evidence that children’s motor imitations are sensitive not simply to the mechanics of the movement but also to the goal of the action (Bekkering et al., 2000; Williamson & Markman, 2006; Wohlschläger et al., 2003). In one study, thirty-two three-year-old children imitated adult motions that varied along several dimensions, including whether the movement ended on a body part or near the body part, whether it crossed the body’s midline, and whether one or two hands were involved. There were more errors when the movement ended on a body part, crossed midline, and included only one hand, leading to the hypothesis that children encoded a goal (e.g., touching a knee) that could be met by touching the closest knee (thus explaining the higher number of errors in imitations of contralateral movements). The authors hypothesized that imitation involves a hierarchy of goals, with the possibility that acting out higher-order goals leads to errors (Gleissner et al., 2000).
Related work demonstrates that in complex motor imitations, children observe and use hierarchical organization of complex actions. In one motor imitation study, children observed adults opening a puzzle box using one of two possible hierarchical approaches to solving the puzzle (rows versus columns as the systematic organization of actions). Thirty-one children between the ages of 2;10 and 3;11 (mean 3;5) were assigned to one of two experimental groups (observing row actions or observing column actions) or to a control group (no observation before attempting to solve the puzzle box). Children more frequently approached the task using the hierarchical approach they had observed. A second trial introduced a new sub-action at the point of imitation to investigate whether a chain of small actions was being imitated rather than a hierarchical approach. Children assimilated this step into the hierarchical approach rather than ignoring it or postponing it (Whiten et al., 2006). A subsequent study varied not only overall hierarchical organization of actions but also the manner of actions in sub-steps (e.g., twisting versus tapping a part to move it). Children at ages three (mean age 3;6, N = 57) and five (mean age 5;5, N = 60) were more likely to imitate overall hierarchical organization than the manner of actions in sub-steps, though the minority who either twisted or tapped a part tended to use the action they had observed (E. Flynn & Whiten, 2008). These studies suggest that actions are not represented as unanalyzed wholes but rather that imitation of movement involves active reconstruction that can be guided by hierarchical cognitive organization.

Imitation Draws on Knowledge
Even though there is evidence that the capacity for imitation may be innate, it is not possible to imitate anything at any time. As noted in early work on the method, imitation of complex behaviors awaits the development of the capacity for those behaviors (Lust et al., 1987). For example, in the domain of language, an in-depth study of spontaneous speech by five two-year-old children acquiring English examines the consistency in structure of their spontaneous imitations of adult speech and their spontaneous utterances that were not repetitions. Children’s imitations did not include structures more complex than their other spontaneous utterances (Ervin, 1964). Similarly, another study of the acquisition of English (ages 2;10–6;1, N = 64) reports a correlation between the use of sentence types in spontaneous speech and successful imitation of those types (Menyuk, 1963). Clay (1971) summarizes a range of evidence that, at least at younger ages, imitation is possible only for structures that are present in spontaneous speech (noting, however, that older children may sometimes be able to imitate structures not present in spontaneous speech). Using a sentence repetition task with pictures, Devescovi and Caselli (2007) report that for children acquiring Italian (ages 2;0–4;0, N = 100), omission of articles in sentence repetition correlated with omission of articles in spontaneous speech. In the section titled Role of Linguistic Knowledge, we return to studies demonstrating that elicited imitation taps linguistic domains.

Imitation Involves Reconstruction, If the Stimulus Is Long Enough
Repeating a sentence involves regeneration of the stimulus. An alternative view is that sentence repetition involves generating a surface representation of a string of words (e.g., one stored only in a generalized perceptual system) and that only short-term memory is consulted in recall. A challenge for this perspective is that while recall of lists is typically limited to about seven items (Baddeley, 1984; Miller, 1956 [1994]), verbatim recall of sentences with many more words is possible. Potter and Lombardi (1990, p. 634) point out that although “chunking” (Miller, 1956 [1994]) may theoretically account for longer recall, such an account requires an explanation of how the chunks are organized by a linguistic system and reassembled.
In a series of experiments, Potter and Lombardi (1990) present evidence that sentence recall involves regeneration of the sentence from a conceptual representation. They demonstrate that sentence recall is influenced by the presence of a lure word that is semantically similar to a word in the sentence. Participants in experiments were asked to recall sentences verbatim, but only after they had completed a secondary task that involved presentation of a list of words. Lists either did or did not include a word semantically similar to a word in the sentence (e.g., one sentence included the word palace, and the secondary task list included castle). Findings showed that the lure words appeared spontaneously in some repetitions even when they had not appeared on the list, but they appeared significantly more frequently when they did appear on the list, and other words on the list seldom appeared. Results held with written and spoken stimuli and with both adults and children between the ages of 3;11 and 5;2. These findings provide evidence that for both adults and children, sentence repetition involves regenerating the stimulus from a linguistic representation.

Implications
The studies summarized above suggest that elicited imitation taps innate abilities. Further, the evidence that perception and action are neurologically connected and compositionally analyzed in motor imitation offers a foundation for their connection in the study of linguistic imitation. Building on this foundation, finally, there is evidence that elicited imitation references linguistic knowledge and involves reconstruction of language forms.

Role of Working Memory, Executive Function, and Speech Perception and Production

Working Memory
Working memory can be defined as “a brain system” that allows us to store and manipulate
information in order to perform cognitive tasks (Baddeley, 1992, p. 556). One prominent model of working memory includes executive attentional control and two systems for short-term information storage: the “phonological loop” for verbal and acoustic information, and the “visuospatial sketchpad” for visual information. The model later incorporated an “episodic buffer,” a temporary storage system that can integrate information into unified representations (Baddeley, 2003; Repovs & Baddeley, 2006). Later descriptions of the model emphasize the complexity of processes within the phonological loop and visuospatial sketchpad, which may not be open to conscious awareness, and the role of the episodic buffer as a repository and manipulator of information and as an interface to conscious awareness (Baddeley & Hitch, 2019). Each of these components is at work in processing and manipulating stimuli. For example, the role of executive attentional control in working memory can be studied independently of the other components (see, e.g., Allen et al., 2014). Working memory also involves access to long-term memory: “activating conceptual representations in the language system provides content to verbal [working memory]” (Abrahamse et al., 2017, p. 429). This access to long-term memory contributes to the capacity of the phonological loop to provide verbal information for manipulation in the episodic buffer. It appears that mechanisms of language production underlie verbal working memory. For example, phonological similarity is associated with both speech errors in production and working memory interference. When sound segments share phonological features such as place of articulation, they are recalled less well; and the likelihood of speech errors is also higher with phonological similarity, as in tongue twisters like “She sells seashells …” (Acheson & MacDonald, 2009).
Another example is lexical status: words are recalled more easily than nonwords, and speech errors more often produce words than nonwords (Acheson & MacDonald, 2009). This parallel suggests that even short-term memory storage of language is undergirded by linguistic mechanisms.

Executive Function
Working memory has been identified as one component of a much broader cognitive domain termed executive function, which is at work “when individuals engage in conscious, goal-directed thought and action under novel or unfamiliar circumstances, where previously established routines for responding are nonexistent” (Carlson et al., 2013). The components of executive function have been investigated over many decades through research using a wide variety of tests. A large body of latent variable research using confirmatory factor analysis has yielded findings that support three factors of executive function: working memory, inhibitory control, and shifting of mental sets (Carlson et al., 2013). However, this characterization is not static throughout development: for preschool-age children, one- or two-factor models are more common in published studies, with shifting of mental sets emerging for older children and adolescents (Karr et al., 2018). Further, components beyond these three are frequently cited in the literature; examples include goal-directed planning and error detection and correction. Like other methods, imitation taps executive function. For example, as noted above,
studies of imitation in motor development have distinguished between development of the mapping from perception to motor activity on one hand and development of goal-directed coding of activities on the other. There is evidence that children’s motor imitations reference both and that they establish a hierarchy of goals for complex imitation (Gleissner et al., 2000; see discussion in the section titled Grounding of the Method). Theoretical models for this mapping (e.g., Meltzoff & Moore, 1997) either explicitly or implicitly invoke executive function in their assumption that successful imitation requires storage of perceptual input, analysis, and identifying and prioritizing goals. Together, the fact that executive function plays a role in any experimental method and the evidence that components of executive function are developing suggest that methods for studying child language that include multiple sources of stimuli and multiple demands or activities are more complicated to evaluate. For example, executive function and the development of imagination are connected (Carlson & White, 2013). Methods for the study of language that call on children to imagine a scenario or what a character is thinking therefore interact with the development of imagination, which is in turn connected to the development of executive function.

Speech Perception and Production
From a psycholinguistic perspective, speech perception is the first linguistic task in elicited imitation. As a multilayered system, it involves segmentation at phonetic, lexical, and syntactic levels (Samuel, 2011). It consults prosodic information and uses it for segmentation (see, e.g., Frazier et al., 2006). It feeds an interpretive system that must not only pair syntactic forms with interpretations in a compositional way but also take contextual knowledge into account, including knowledge of the possibility for speech errors (Frazier, 2015). Elicited imitation includes formulation of a response using a speech production system. Recent research has identified over ten specific cortical and subcortical structures or regions that are implicated in speech production (for a summary of neuroimaging studies, see Guenther et al., 2015). This research attempts to isolate tasks such as motor control and the organization of sensory information from higher-order language knowledge, such as knowledge of abstract units of linguistic analysis like phonemes. In general, there is evidence that language processing mechanisms play a role in a task similar to elicited imitation: verbal serial recall, where a string of words unconnected through syntax must be repeated. For example, success in verbal serial recall is higher for words than for nonwords, for concrete words than for abstract words, and for words with many similar-sounding “neighbors” than for words with relatively fewer neighbors (see summary in Allen & Hulme, 2006). Linguistic mechanisms also underlie repetition. For example, phonological information influences sentence recall. Participants in an experiment recalled sentences presented either visually or auditorily and recalled auditory sentences with greater success, but the difference disappeared when independent word lists were presented before recall, suggesting that phonological information interfered with recall (Rummer & Engelkamp, 2001).
Further, there is evidence that phonological and semantic information is represented
and consulted separately in sentence repetition. Martin et al. (1994) report that two adults with different language impairments were affected differently in their imitation capacities: an adult with phonological impairment performed worse on sentence repetition than an adult with semantic impairment, with opposite findings on a comprehension task. Motor performance in speech production is influenced by linguistic knowledge. One study of motor performance investigated repetition of a six-syllable phrase both in isolation and also embedded in longer, syntactically more complex sentences (Maner et al., 2000). Within the embedded condition, the study varied syntactic type (e.g., coordinate versus subordinate structure). The study applied a measure of motor variation in speech termed the spatiotemporal index (STI), which uses speech waveforms to compute a composite measure of spatial variation (e.g., changes in amplitude) and temporal variation (e.g., changes in timing of peaks). High STI indicates more variability. Five-year-old children (N=8) and adults (N=8) took part in the elicited imitation study. Analysis excluded repetitions with errors such as omission of words or changes in order. STI was significantly higher for both groups for the phrase in embedded contexts than in isolation, although there was no significant difference across syntactic type within the longer sentences. Although length and syntactic complexity are not dissociated in these findings, results demonstrate that motor performance on the same phrase varies based on the syntactic setting.

Implications
Together, the evidence from research in the cognitive sciences has several implications for the use of elicited imitation as a research method. First, there is now evidence that working memory accesses long-term memory and that the same mechanisms underlie verbal working memory and language production. Second, the role of executive function in imitation is increasingly understood; for example, there is evidence that motor imitation involves setting and prioritizing goals. Third, the mechanisms of language processing characterize performance in tasks like sentence repetition (e.g., repetition of lists), with evidence that linguistic information not only is consulted but can be dissociated in its role (e.g., phonological versus semantic information).

Role of Linguistic Knowledge
As summarized above, cognitive capacities that transcend the use of language are tapped in elicited imitation. At the same time, linguistic capacities are tapped. For example, in a study seeking to understand the connection between sentence recall and the development of reading skills in children with learning challenges, Alloway and Gathercole (2005) tested seventy-two children on a range of cognitive and language-related measures. They conducted a series of analyses to examine the contribution of working memory and sentence recall to language skills (comprehension and oral expression). Their results indicated that sentence recall predicted language skills when working memory was held constant, suggesting that while working memory is tapped through sentence recall, the method draws on linguistic abilities apart from capacity for working memory.
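The logic of asking whether sentence recall predicts language skills "when working memory is held constant" can be sketched as a comparison of nested regression models. The sketch below is illustrative only: the scores are simulated, the variable names and effect sizes are assumptions, and it is not the analysis reported by Alloway and Gathercole (2005).

```python
import numpy as np

def r_squared(predictors, outcome):
    """R^2 of an ordinary least-squares fit that includes an intercept."""
    X = np.column_stack([np.ones(len(outcome)), predictors])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    residuals = outcome - X @ coef
    centered = outcome - outcome.mean()
    return 1.0 - (residuals @ residuals) / (centered @ centered)

# Simulated scores (assumption): 72 children, arbitrary effect sizes.
rng = np.random.default_rng(0)
n = 72
working_memory = rng.normal(size=n)
sentence_recall = 0.5 * working_memory + rng.normal(size=n)
language_skill = 0.4 * working_memory + 0.6 * sentence_recall + rng.normal(size=n)

# Step 1: working memory alone. Step 2: add sentence recall.
r2_wm_only = r_squared(working_memory, language_skill)
r2_wm_plus_recall = r_squared(
    np.column_stack([working_memory, sentence_recall]), language_skill
)

# Variance in language skill explained by sentence recall over and above
# working memory; a value above zero is the pattern described in the text.
unique_to_recall = r2_wm_plus_recall - r2_wm_only
print(f"R^2 (working memory only): {r2_wm_only:.3f}")
print(f"R^2 (plus sentence recall): {r2_wm_plus_recall:.3f}")
print(f"Increment for sentence recall: {unique_to_recall:.3f}")
```

Because the second model contains the first, its R^2 can never be lower; the question of interest is whether the increment is reliably greater than zero, which a full analysis would evaluate with a significance test.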
This section describes evidence that various types of linguistic knowledge are accessed through elicited imitation.

Phonology and Morphology
Evidence indicates that repetition of nonwords (i.e., novel items) is not repetition of an unanalyzed whole (Dye et al., 2016). For example, features of phonological segments affect success in repetition: such differences as phoneme type and single consonants versus clusters related to different success in imitation (see summary in Dye et al., 2016, p. 63). As noted in the previous section, in a study on tongue twisters like “She sells seashells,” the likelihood of speech errors was higher with phonological similarity (Acheson & MacDonald, 2009). Error patterns reflected not merely phonological features but syllable structure: substitutions in syllable onset position drew mostly from other onsets, and substitutions in coda position drew exclusively from other codas. Providing further evidence that elicited imitation consults phonological knowledge, one study demonstrated that teaching seven-year-old children acquiring Norwegian (N=160) explicitly about phonological segmentation of words improved performance on repetition of those words in lists (Melby-Lervag & Hulme, 2010). Elicited imitation has been used to investigate variable omission of morphemes and phonetic segments in children’s speech production. For example, through an elicited imitation study, production of inflectional morphemes in English was shown to occur more in phonologically simpler positions: inflection was produced more when it was the only segment in the coda (e.g., sees) than when it was part of a consonant cluster (e.g., needs) (J. Y. Song et al., 2009). Elicited imitation studies have also investigated the acquisition of language-specific constraints on syllable structure. For example, in English, open-class phonological words must contain two moras, a requirement met by an onset consonant followed by a long vowel or diphthong, or by an onset consonant, short vowel, and coda consonant, but not by an onset consonant and short vowel without coda. 
Children acquiring English often omit coda-final consonants, but these omissions occur more frequently after long vowels, where they are not necessary to complete the prosodic foot, than after short vowels, where language-specific syllable structure requires them (Miles et al., 2016).

Syntax
Elicited imitation played a crucial role in a body of research that uncovered and teased apart evidence for two broad types of theoretically defined competence for language in children. Researchers in the Universal Grammar framework (e.g., Chomsky, 1981, 1986, 1995) theorized that a set of principles holds for all languages and constrains the course of language acquisition. For example, one proposed principle is that languages are structure dependent, with hierarchical structure underlying sentences. In contrast, parameters account for crosslinguistic variation and can be set on the basis of linguistic input over the course of acquisition. An example of a parameter is branching direction, capturing whether the canonical direction for hierarchical embedding in syntactic structure is to the right or to the left. A comprehensive set of studies using elicited imitation together revealed knowledge of
both structure dependence and directionality in young children. These studies investigated whether children would have greater success imitating sentences in which anaphoric elements followed or preceded their antecedents under different hierarchical conditions (e.g., varying not only linear order but also whether these elements appeared in main or embedded clauses under different directionality of embedding). For example, one study compared structures like those in 1–3 (structures are from Lust, 1981; descriptive labels are from S. Flynn, 1987; see note 3).

1. Right branching/Forward anaphora
Jenna drank some juice while she was having lunch.
2. Left branching/Backward anaphora
While she was having lunch, Jenna drank some juice.
3. Left branching/Forward anaphora
While Jenna was having lunch, she drank some juice.

Lust (1981) found that children acquiring English (ages 2;6–3;5, mean 3;0, N=24) had fewer errors in imitation for forward anaphora structures like 1 than for backward anaphora structures like 2. Holding anaphora directionality constant, children imitated right-branching structures like 1 more successfully than left-branching structures like 3; however, the difference in success between forward versus backward anaphora (e.g., 1 vs. 2) was significantly larger than the difference between sentences with consistent anaphora direction and different branching direction (e.g., 1 vs. 3). Thus, this study argued that children distinguish both anaphora direction and branching direction. There is evidence that children’s application of an anaphora direction preference reflects knowledge of structure and thus of structure dependence. For example, Lust and Clifford (1986) tested ninety-four children between the ages of 3;5 and 7;11 (mean age 5;7) on structures with prepositional phrase embedding such as 4 and 5.

4. Forward anaphora
Under Oscar the Grouch, he quietly bounced the ball.
5. Backward anaphora
On him, Cookie Monster quickly poured the orange juice.
Elicited imitation findings reflected significantly greater success with forward than with backward anaphora, reflecting the same directionality finding seen above. However, possible interpretations of these structures vary: the antecedent and pronoun may corefer in the backward case (e.g., 5) but not the forward case (e.g., 4). Findings from a comprehension task in the same study revealed that children chose a coreference interpretation (e.g., selected the same doll to act out both clauses) significantly more often for the backward case, where it is grammatically possible, than for the forward case, where it is ruled out in the adult grammar. The directionality preference thus exists alongside structural knowledge. There is evidence that greater success with a particular directionality of anaphora reflects
language-specific input. The finding of greater success with forward anaphora in English reported by Lust (1981) was replicated by other studies collectively testing hundreds of children (e.g., Lust & Clifford, 1986; Lust et al., 1986). However, forward direction is not more productive than backward for languages with left-branching structure, such as Chinese (e.g., Lust, Chien, et al., 1996, who used a comprehension task). Together, these studies provide evidence that the course of language acquisition is constrained by the universal principle of structure dependence but at the same time reflects developing language-specific knowledge, such as branching direction. Evidence from studies using elicited imitation converges with evidence using other methods. Elicited imitation has also been used in studies of many other areas of syntactic knowledge. Examples include coordination in English (Lust, 1981), Chinese (Lust & Chien, 1984), and Japanese (Lust & Wakayama, 1979); the distinction between topics and subjects in Mandarin (Chien & Lust, 1985); knowledge of the syntax of control structures in English (Cohen Sherman & Lust, 1993); the interaction between prosodic structure and syntax in the acquisition of articles in English (Gerken, 1996); auxiliaries and verbal inflection in French (Dye, 2011); relative clauses in English (S. Flynn & Lust, 1981), French (Foley, 1995; Kail, 1975), and Tulu (Somashekar, 1999); question formation in English (e.g., Santelmann et al., 2002); and omission of sentential subjects in English (e.g., Gerken, 1991; Valian & Aubry, 2005; Valian et al., 1996).

Semantics
As noted above and summarized by Allen and Hulme (2006), success in verbal serial recall is influenced by semantic features. Success tends to be higher for words than for nonwords and for concrete words than for abstract words. Research in this area suggests that semantics is among the linguistic capacities that can be tapped in experimental design using elicited imitation. Elicited imitation has been used over time to probe semantic knowledge in first language acquisition in a variety of areas. For example, an early study using elicited imitation explored acquisition of terms that relate actions temporally, finding that children at ages three to five years imitated sentences with sequential temporal meaning (e.g., using “before”) with greater success than sentences with simultaneous temporal meaning (e.g., using “while”) (Keller-Cohen, 1981). Research using elicited imitation has revealed semantic and conceptual knowledge relating to verb-object combinations. Valian et al. (2006) compared children’s imitations of sentences with predictable semantic content in the predicate (e.g., “The cat is eating some food”) and those with less-predictable content (e.g., “The cat is eating a sock”). Across two experiments, a total of forty-seven children acquiring English (ages 1;9–2;9) included major sentence constituents (subject, verb, object) in their imitations more frequently with predictable semantic content. Elicited imitation has also been used in first language acquisition studies of the scalar-semantic properties of different focus particles in Mandarin (Yang, 2000), the semantics of aspect in Greek (Panitsa, 2010), and modal verbs functioning as negative polarity items in
Dutch (Lin et al., 2018).

The Nature of Elicited Imitation Evidence in First Language Acquisition
Elicited imitation is a valuable method for investigating contrasting theoretical approaches to the study of first language acquisition. For example, one theoretical perspective is that children use the language spoken around them to generalize patterns based on frequency and/or word and morpheme combinations (Ambridge & Lieven, 2011). A contrasting perspective is that children apply abstract knowledge of linguistic properties to the analysis of language data, producing utterances constrained by this knowledge (Chomsky, 1986; Lust, 1994). In the latter approach, researchers need a way to uncover knowledge of particular hypothesized dimensions of abstract knowledge. If precise enough and strong enough, evidence of such knowledge can support the claim that an innate language faculty guides acquisition. Elicited imitation offers the opportunity to control experimental design such that precise grammatical factors are present or absent in stimuli, allowing comparison of responses. Significant differences in success imitating stimuli that do and do not involve a factor being tested can offer evidence of knowledge of that factor (Lust, Flynn, & Foley, 1996). For example, the percentage of responses without changes can be compared across stimulus sentences that vary the factor being tested. The assumption that success in imitation reflects dimensions of abstract knowledge is warranted by the research summarized in the third through fifth sections in this chapter. Especially important to this claim is research demonstrating that speech production varies with syntactic complexity. For example, as previously described, in a study that investigated stability of motor production of speech across different syntactic structures, movement in three-dimensional space as well as timing of movements varied more for the same phrase in a syntactically complex environment than in isolation (Maner et al., 2000). 
Such findings would be unexpected if imitation did not reflect syntactic structure. Also previously noted, elicited imitation has played a critical role in research uncovering simultaneous awareness of universal principles and sensitivity to language-specific grammatical properties. The possibility for precise experimental design permits elicited imitation to be used together with comprehension methods (e.g., act-out, truth-value judgment) that also allow precise manipulation of experimental variables so that researchers can seek converging evidence across different areas of language performance. Elicited imitation has provided a critical way to test hypotheses about omissions in children’s spontaneous speech. For example, researchers have debated reasons for the omissions in children’s speech of sentential subjects, articles, and auxiliaries and/or verbal inflection. Experimental use of elicited imitation provides a way to probe possible reasons for these omissions that would not be evident through studies of spontaneous speech alone, because grammatical variables can be manipulated to see whether omissions occur differentially across different contexts. A growing body of findings demonstrates that such omissions seen in children’s spontaneous speech do in fact vary across linguistic (syntactic
and phonological) contexts that can be varied using elicited imitation (e.g., Dye, 2011; J. Y. Song et al., 2009; Valian & Aubry, 2005). This use of elicited imitation extends to study of other apparent errors in child speech, such as omission of auxiliary inversion in question formation, which have now been shown to vary across syntactic context when tested via elicited imitation (e.g., Santelmann et al., 2002).

Discussion and Conclusion
While the focus of this chapter has been the use of elicited imitation in studies of first language acquisition, the method has been used in studies of second language acquisition (see, e.g., Erlam, 2006; Flynn, 1987; Gaillard & Tremblay, 2016; Jessop et al., 2007; Munnich et al., 1994; Vinther, 2002; West, 2014). Yan et al.’s (2016) meta-analysis reviews eighteen theoretical papers on the role of elicited imitation in L2 research and fifty-eight empirical L2 studies using elicited imitation. The method has been used in studies of bilingualism in children (e.g., Komeili & Marshall, 2013). Studies using elicited imitation in second language acquisition and multilingualism research may even outnumber those using it in first language acquisition research. Elicited imitation has also been widely used in studies of specific language impairment (Coady & Evans, 2008; Conti-Ramsden et al., 2001; Leclerq et al., 2014) and in language assessment (Bowden, 2016; Cox et al., 2016). Despite this wide use, over its history, as traced by Vinther (2002), the method has been repeatedly challenged on various grounds. Even in the face of the robust evidence it can yield and the repeated convergence of evidence across studies using other methods, questions persist:

The investigation of the role of sentence production mechanisms in sentence repetition is particularly pertinent given the wide use of the task as a diagnostic tool in developmental disorders such as specific language impairment.… The task is also sensitive to the language acquisition history of the child … and the levels of proficiency attained by learners of an additional language when compared to native monolingual speakers.… However, the key psycholinguistic mechanisms accounting for individual differences in sentence repetition are still poorly understood. (Nag et al., 2018, p. 305)
The debate over what elicited imitation reveals is particularly important for those seeking to use sentence repetition clinically and as a diagnostic tool. This concern has prompted researchers to examine whether it may reflect a specific construct that plays a role in human use of language or whether instead, as argued by Klem et al. (2014), it reflects language skill more generally. This debate is important for the clinical and assessment-related use of sentence repetition, where absolute performance on a task or tool may be used as a basis for decisions about interventions. In contrast, for those seeking to use elicited imitation for research investigating linguistic theory and language acquisition, open questions about constructs measured by elicited imitation may be pursued as a research question about cognition, but they are not grounds for dismissing the reliability and validity of the method in uncovering knowledge of a particular aspect of grammar. When used with an experimental design that varies grammatical factors across stimuli, elicited imitation can uncover relative success in imitation across factors, without a focus on absolute success.
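The contrast between relative and absolute success can be made concrete with a small scoring sketch. The stimuli below are examples 1 and 2 from this chapter, but the condition labels, the responses, and the helper names `normalize` and `proportion_unchanged` are invented for illustration and are not data or procedures from any study cited here.

```python
from collections import defaultdict

def normalize(text):
    """Lowercase, drop commas and periods, and collapse whitespace."""
    cleaned = text.lower().replace(",", "").replace(".", "")
    return " ".join(cleaned.split())

def proportion_unchanged(trials):
    """Map each condition to the share of responses identical to the stimulus."""
    counts = defaultdict(lambda: [0, 0])  # condition -> [unchanged, total]
    for condition, stimulus, response in trials:
        counts[condition][1] += 1
        if normalize(response) == normalize(stimulus):
            counts[condition][0] += 1
    return {cond: unchanged / total for cond, (unchanged, total) in counts.items()}

FORWARD = "Jenna drank some juice while she was having lunch."
BACKWARD = "While she was having lunch, Jenna drank some juice."

# Hypothetical responses: (condition, stimulus, child's imitation).
trials = [
    ("forward", FORWARD, "Jenna drank some juice while she was having lunch."),
    ("forward", FORWARD, "jenna drank some juice while she was having lunch"),
    ("backward", BACKWARD, "While Jenna was having lunch, she drank some juice."),
    ("backward", BACKWARD, "While she was having lunch, Jenna drank some juice."),
]

scores = proportion_unchanged(trials)
# The quantity of interest is the difference between conditions,
# not how high either proportion is on its own.
difference = scores["forward"] - scores["backward"]
print(scores, difference)
```

In an actual study the difference would be evaluated statistically across children and items, and changed responses would typically also be coded for the type of change.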
At the same time, elicited imitation offers a method grounded in innate capacity—one that is seen in young infants. While the method taps cognitive abilities that develop, the foundational capacity for imitation is not in question even at the youngest ages tested. This is not the case for all methods. For example, elicited behavior methods depend to varying degrees on a child’s imaginative capacity (e.g., the capacity to distinguish reality and imagination, which develops over time; Weisberg, 2013). Picture-choice tasks call for maintaining both verbal and visual stimuli in short-term memory and comparing information from the two; even when conscious choice of a picture is replaced by more automatic looking and tracked with eye-movement methods, this additional capacity is tapped (Brandt-Kobele & Höhle, 2010). Understanding what these methods reveal requires taking into account the development of related cognitive capacities. The neurological connections between perception and action summarized in this chapter and the evidence of compositional analysis in motor imitation imply parallels in linguistic imitation. We have also summarized strong evidence that elicited imitation references linguistic knowledge and involves reconstruction of linguistic forms, helping to warrant the method’s use to study language. Studies using elicited imitation suggest several areas for future development of the method. For example, H.-J. Song and Fisher (2005) use elicited imitation as a method for assessing comprehension of pronouns in stories, suggesting that elicited imitation might be used more broadly to explore comprehension of context or discourse. Finally, the cognitive capacities required in elicited imitation and their connection to language knowledge and performance are increasingly well understood. As seen in this chapter, this is true for working memory, executive function, and language processing. 
The growing body of evidence from the cognitive sciences adds to the current understanding of this versatile method.

Notes

1. A later overview of the method, its advantages and disadvantages, and design considerations appears in Blume and Lust (2017, pp. 119–131), but the focus of this later work is not the theoretical grounding of the method.
2. Some studies use the term sentence repetition instead of elicited imitation. We use elicited imitation in this chapter in part because our review encompasses use in studies of phonology and prosody where the unit of repetition is smaller than a sentence.
3. Right (or left) branching captures the fact that in some structures, the embedded clause appears to the right (or left). Forward (or backward) anaphora captures the fact that in some sentences, reference looks forward from antecedent to pronoun appearing later in linear sequence (or backward from antecedent to pronoun appearing earlier in linear sequence).

References

Abrahamse, E. L., van Dijck, J.-P., & Fias, W. (2017). Grounding verbal working memory: The case of serial order. Current Directions in Psychological Science, 26(5), 429–433.
Acheson, D. J., & MacDonald, M. C. (2009). Verbal working memory and language production: Common approaches to the serial ordering of verbal information. Psychological Bulletin, 135(1), 50–68.
Allen, R., & Hulme, C. (2006). Speech and language processing mechanisms in verbal serial recall. Journal of Memory and Language, 55(1), 64–88.
Allen, R. J., Baddeley, A. D., & Hitch, G. J. (2014). Evidence for two attentional components in visual working memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(6), 1499–1509.
Alloway, T. P., & Gathercole, S. E. (2005). The role of sentence recall in reading and language skills of children with learning difficulties. Learning and Individual Differences, 15(4), 271–282.
Ambridge, B., & Lieven, E. V. M. (2011). Child language acquisition: Contrasting theoretical approaches. Cambridge University Press.
Baddeley, A. (1984). Working memory. Clarendon Press.
Baddeley, A. (1992). Working memory. Science, 255(5044), 556–559.
Baddeley, A. (2003). Working memory: Looking back and looking forward. Nature Reviews: Neuroscience, 4, 829–839.
Baddeley, A. D., & Hitch, G. J. (2019). The phonological loop as a buffer store: An update. Cortex, 112, 91–106.
Barnat, S. B., Klein, P. J., & Meltzoff, A. N. (1996). Deferred imitation across changes in context and object: Memory and generalization. Infant Behavior and Development, 19, 241–251.
Bekkering, H., Wohlschläger, A., & Gattis, M. (2000). Imitation of gestures in children is goal-directed. The Quarterly Journal of Experimental Psychology Section A, 53(1), 153–164.
Blume, M., & Lust, B. (2017). Research methods in language acquisition: Principles, procedures, and practices. American Psychological Association.
Bowden, H. W. (2016). Assessing second-language oral proficiency for research: The Spanish elicited imitation task. Studies in Second Language Acquisition, 38, 647–675.
Brandt-Kobele, O.-C., & Höhle, B. (2010). What asymmetries within comprehension reveal about asymmetries between comprehension and production: The case of verb inflection in language acquisition. Lingua, 120, 1910–1925.
Carlson, S. M., & White, R. E. (2013). Executive function, pretend play, and imagination. In M. Taylor (Ed.), The Oxford handbook of the development of imagination. Oxford University Press. Available at https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780195395761.001.0001/oxfordhb-9780195395761
Carlson, S. M., Zelazo, P. D., & Faja, S. (2013). Executive function. In P. D. Zelazo (Ed.), The Oxford handbook of developmental psychology: Vol. 1. Body and mind (pp. 706–743). Oxford University Press.
Chaminade, T., Meltzoff, A. N., & Decety, J. (2002). Does the end justify the means? A PET exploration of the mechanisms involved in human imitation. Neuroimage, 12, 318–328.
Chien, Y.-C., & Lust, B. (1985). The concepts of topic and subject in first language acquisition of Mandarin Chinese. Child Development, 56(6), 1359–1375.
Chomsky, N. (1981). Lectures on government and binding. Foris.
Chomsky, N. (1986). Knowledge of language: Its nature, origin, and use. Praeger.
Chomsky, N. (1995). The minimalist program. MIT Press.
Clay, M. (1971). Sentence repetition: Elicited imitation of a controlled set of syntactic structures by four language groups. The University of Chicago Press for the Society for Research in Child Development.
Coady, J., & Evans, J. L. (2008). Uses and interpretations of non-word repetition tasks in children with and without specific language impairments (SLI). International Journal of Language and Communication Disorders, 43(1), 1–40.
Cohen Sherman, J., & Lust, B. (1993). Children are in control. Cognition, 46, 1–51.
Conti-Ramsden, G., Botting, N., & Faragher, B. (2001). Psycholinguistic markers for specific language impairment (SLI). Journal of Child Psychology and Psychiatry, 42(6), 741–748.
Cox, T., Bown, J., & Burdis, J. (2016). Constructing a Russian elicited imitation exam. Russian Language Journal, 66, 51–88.
Devescovi, A., & Caselli, C. M. (2007). Sentence repetition as a measure of early grammatical development in Italian. International Journal of Language and Communication Disorders, 42, 187–208.
Dye, C. (2011). Reduced auxiliaries in early child language: Converging observational and experimental evidence from French. Journal of Linguistics, 47, 301–339.
Dye, C., Walenski, M., Mostofsky, S. H., & Ullman, M. T. (2016). A verbal strength in children with Tourette syndrome? Evidence from a non-word repetition task. Brain and Language, 160, 61–70.
Erlam, R. (2006). Elicited imitation as a measure of L2 implicit knowledge: An empirical validation study. Applied Linguistics, 27(3), 464–491.
Ervin, S. (1964). Imitation and structural change in children’s language. In E. Lenneberg (Ed.), New directions in the study of language (pp. 163–189). MIT Press.
Flynn, E., & Whiten, A. (2008). Imitation of hierarchical structure versus component details of complex actions by 3- and 5-year-olds. Journal of Experimental Child Psychology, 101(4), 228–240.
Flynn, S. (1987). Second language acquisition of pronoun anaphora: Resetting the parameter. In B. Lust (Ed.), Studies in the acquisition of anaphora: Vol. 2. Applying the constraints (pp. 227–243). Reidel.
Flynn, S., & Lust, B. (1981). Acquisition of relative clauses: Developmental changes in their heads. In W. Harbert & J. Herschensohn (Eds.), Cornell Working Papers in Linguistics, 1, 33–45.
Foley, C. (1995). Opérateurs et competence de l’enfant. In J. J. Audette, M.-A. Bélanger, A. Bourcier, I. Dion, P. Larrivée, J. Nicole, F. Pichette, & E. Rosales (Eds.), Actes des 9e journées de linguistique. International Center for Research on Language Planning.
Frazier, L. (2015). Two interpretive systems for natural language? Journal of Psycholinguistic Research, 44, 7–25.
Frazier, L., Carlson, K., & Clifton, C. (2006). Prosodic phrasing is central to language comprehension. Trends in Cognitive Sciences, 10(6), 244–249.
Gaillard, S., & Tremblay, A. (2016). Linguistic proficiency assessment in second language acquisition research: The elicited imitation task. Language Learning, 66(2), 419–447.
Gerken, L. (1991). The metrical basis of children’s subjectless sentences. Journal of Memory and Language, 30, 431–451.
Gerken, L. (1996). Prosodic structure in young children’s language production. Language, 72(4), 683–712.
Gleissner, B., Meltzoff, A. N., & Bekkering, H. (2000). Children’s coding of human action: Cognitive factors influencing imitation in 3-year-olds. Developmental Science, 3(4), 405–414.
Guenther, F. H., Tourville, J. A., & Bohland, J. W. (2015). Speech production. Brain mapping: An encyclopedic reference, 3, 435–444. https://www.sciencedirect.com/science/article/pii/B9780123970251002657?via%3Dihub
Jessop, L., Suzuki, W., & Tomita, Y. (2007). Elicited imitation in second language acquisition research. The Canadian Modern Language Review, 64(1), 215–220.
Kail, M. (1975). Étude génétique de la reproduction de phrases relatives. Année Psychologique, 75, 109–126.
Karr, J. E., Areshenkoff, C. N., Rast, P., Hofer, S. M., Iverson, G. L., & Garcia-Barrera, M. A. (2018). The unity and diversity of executive functions: A systematic review and re-analysis of latent variable studies. Psychological Bulletin, 144(11), 1147–1185.
Keller-Cohen, D. (1981). Elicited imitation in lexical development: Evidence from a study of temporal reference. Journal of Psycholinguistic Research, 10(3), 273–288.
Klem, M., Melby-Lervag, M., Hagtvet, B., Halaas Lyster, S.-A., Gustafsson, J.-E., & Hulme, C. (2014). Sentence repetition is a measure of children’s language skills rather than working memory limitations. Developmental Science, 18(1), 1–9.
Komeili, M., & Marshall, C. R. (2013). Sentence repetition as a measure of morphosyntax in monolingual and bilingual children. Clinical Linguistics and Phonetics, 27(2), 152–161.
Leclercq, A.-L., Quémart, P., Magis, D., & Maillart, C. (2014). The sentence repetition task: A powerful diagnostic tool for French children with specific language impairment. Research in Developmental Disabilities, 35, 3423–3430.
Lin, J., Weerman, F., & Zeijlstra, H. (2018). Acquisition of the Dutch NPI Hoeven ‘Need’: From lexical frames to abstract knowledge. Language Acquisition, 25(2), 150–177.
Lust, B. (1981). On coordinating studies of coordination: Problems of method and theory in first language acquisition—A reply to Ardery. Journal of Child Language, 8, 457–470.
Lust, B. (1994). Functional projection of CP and phrase structure parameterization: An argument for strong continuity. In B. Lust, M. Suñer, & J. Whitman (Eds.), Syntactic theory and first language acquisition: Cross-linguistic perspectives: Vol. 1. Heads, projections, and learnability (pp. 85–118). Lawrence Erlbaum.
Lust, B., & Chien, Y.-C. (1984). The structure of coordination in Mandarin Chinese: Evidence for a universal. Cognition, 17(1), 49–83.
Lust, B., Chien, Y.-C., Chiang, C.-P., & Eisele, J. (1996). Chinese pronominals in universal grammar: A study of linear precedence and command in Chinese and English children’s first language acquisition. Journal of East Asian Linguistics, 5(1), 1–47.
Lust, B., Chien, Y.-C., & Flynn, S. (1987). What children know: Methods for the study of first language acquisition. In B. Lust (Ed.), Studies in the acquisition of anaphora: Vol. 2. Applying the constraints (pp. 271–356). Reidel.
Lust, B., & Clifford, T. (1986). The 3-D study: Effects of depth, distance and directionality on children’s acquisition of anaphora. In B. Lust (Ed.), Studies in the acquisition of anaphora: Vol. 1. Defining the constraints (pp. 203–243). Reidel.
Lust, B., Flynn, S., & Foley, C. (1996). What children know about what they say: Elicited imitation as a research method for assessing children’s syntax. In D. McDaniel, C. McKee, & H. Smith Cairns (Eds.), Methods for assessing children’s syntax (pp. 55–76). MIT Press.
Lust, B., Solan, L., Flynn, S., Cross, C., & Schuetz, E. (1986). Distinguishing bound and free anaphora. In B. Lust (Ed.), Studies in the acquisition of anaphora: Vol. 1. Defining the constraints (pp. 245–77). Reidel.
Lust, B., & Wakayama, T. K. (1979). The structure of coordination in children’s first language acquisition of Japanese. In F. Eckman & A. Hastings (Eds.), Studies in first and second language acquisition (pp. 134–152). Newbury House.
Maner, K. J., Smith, A., & Grayson, L. (2000). Influences of utterance length and complexity on speech motor performance in children and adults. Journal of Speech, Language, and Hearing Research, 43(2), 560–573.
Martin, R. C., Shelton, J. R., & Yaffee, L. S. (1994). Language processing and working memory: Neuropsychological evidence for separate phonological and semantic capacities. Journal of Memory and Language, 33(1), 83–111.
Melby-Lervag, M., & Hulme, C. (2010). Serial and free recall in children can be improved by training: Evidence for the importance of phonological and semantic representations in immediate memory tasks. Psychological Science, 21(11), 1694–1700.
Meltzoff, A., & Moore, M. K. (1977). Imitation of facial and manual gestures by human neonates. Science, 198, 75–78.
Meltzoff, A., & Moore, M. K. (1983). Newborn infants imitate adult facial gestures. Child Development, 54, 702–709.
Meltzoff, A., & Moore, M. K. (1997). Explaining facial imitation: A theoretical model. Early Development and Parenting, 6, 179–192.
Menyuk, P. (1963). A preliminary evaluation of grammatical capacity in children. Journal of Verbal Learning and Verbal Behavior, 2(5–6), 429–439.
Miles, K., Yuen, I., Cox, F., & Demuth, K. (2016). The prosodic licensing of coda consonants in early speech: Interactions with vowel length. Journal of Child Language, 43(2), 265–283.
Miller, G. (1956 [1994]). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 101(2), 343–352.
Molenberghs, P., Cunnington, R., & Mattingley, J. B. (2009). Is the mirror neuron system involved in imitation? A short review and meta-analysis. Neuroscience and Biobehavioral Reviews, 33, 975–980.
Munnich, E., Flynn, S., & Martohardjono, G. (1994). Elicited imitation and grammaticality judgment tasks: What they measure and how they relate to each other. In E. Tarone, S. Gass, & A. Cohen (Eds.), Research methodology in second language acquisition (pp. 227–245). Erlbaum.
Nadel, J. (2002). Imitation and imitation recognition: Functional use in preverbal infants and nonverbal children with autism. In A. N. Meltzoff & W. Prinz (Eds.), The imitative mind: Development, evolution, and brain bases (pp. 42–62). Cambridge University Press.
Nag, S., Snowling, M. J., & Mirkovic, J. (2018). The role of language production mechanisms in children’s sentence repetition: Evidence from an inflectionally rich language. Applied Psycholinguistics, 39, 303–325.
Panitsa, G. (2010). Aspects of aspect: The acquisition of viewpoint and situation aspect [Unpublished doctoral dissertation]. University College London.
Potter, M. C., & Lombardi, L. (1990). Regeneration in short-term recall of sentences. Journal of Memory and Language, 29, 633–654.
Repovs, G., & Baddeley, A. (2006). The multi-component model of working memory: Explorations in experimental cognitive psychology. Neuroscience, 139, 5–21.
Rummer, R., & Engelkamp, J. (2001). Phonological information contributes to short-term recall of auditorily presented sentences. Journal of Memory and Language, 45(3), 451–467.
Samuel, A. G. (2011). Speech perception. Annual Review of Psychology, 62, 49–72.
Santelmann, L., Berk, S., Austin, J., Somashekar, S., & Lust, B. (2002). Continuity and development in the acquisition of inversion in yes/no questions: Dissociating movement and inflection. Journal of Child Language, 29, 813–843.
Somashekar, S. (1999). Developmental trends in the acquisition of relative clauses: Cross-linguistic experimental study of Tulu [Unpublished doctoral dissertation]. Cornell University.
Song, H.-J., & Fisher, C. (2005). Who’s “she”? Discourse prominence influences preschoolers’ comprehension of pronouns. Journal of Memory and Language, 52, 29–57.
Song, J. Y., Sundara, M., & Demuth, K. (2009). Phonological constraints on children’s production of English third person singular -s. Journal of Speech, Language, and Hearing Research, 52(3), 623–642.
Valian, V., & Aubrey, S. (2005). When opportunity knocks twice: Two-year-olds’ repetition of sentence subjects. Journal of Child Language, 32, 617–641.
Valian, V., Hoeffener, J., & Aubrey, S. (1996). Young children’s imitation of sentence subjects: Evidence of processing limitations. Developmental Psychology, 32(1), 153–164.
Valian, V., Prasada, S., & Scarpa, J. (2006). Direct object predictability: Effects on young children’s imitation of sentences. Journal of Child Language, 33, 247–269.
Vinther, T. (2002). Elicited imitation: A brief overview. International Journal of Applied Linguistics, 12(1), 54–73.
Weisberg, D. S. (2013). Distinguishing imagination from reality. In M. Taylor (Ed.), The Oxford handbook of the development of imagination. Oxford University Press. Available at https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780195395761.001.0001/oxfordhb-9780195395761
West, D. (2014). Assessing L2 lexical versus inflectional accuracy across skill levels. Journal of Psycholinguistic Research, 43, 535–554.
Whiten, A., Flynn, E., Brown, K., & Lee, T. (2006). Imitation of hierarchical action structure by young children. Developmental Science, 9(6), 574–582.
Williamson, R. A., & Markman, E. (2006). Precision of imitation as a function of preschoolers’ understanding of the goal of the demonstration. Developmental Psychology, 42(4), 723–731.
Wohlschläger, A., Gattis, M., & Bekkering, H. (2003). Action generation and action perception in imitation: An instantiation of the ideomotor principle. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences, 358, 501–515.
Yan, X., Maeda, Y., Lv, J., & Ginther, A. (2016). Elicited imitation as a measure of second language proficiency: A narrative review and meta-analysis. Language Testing, 33(4), 497–528.
Yang, X. (2000). Focus and scales: L1 acquisition of CAI and JIU in Mandarin Chinese [Unpublished doctoral dissertation]. Chinese University of Hong Kong.
II CHILDREN
5 The Development of Person and Number Agreement in Child Heritage Speakers of Spanish Learning English as a Second Language
Jennifer Austin, Liliana Sánchez, Silvia Perez-Cortes, and David Giancaspro
Generativist analyses of the acquisition of agreement have long debated the nature of children’s errors in producing person and number inflection and what these nontarget productions can tell us about children’s underlying grammatical representations. While there has been considerable research on the production of verbal morphology in adult heritage learners of Spanish (e.g., Montrul, 2002, 2009; Silva-Corvalán, 1994, 2003), there is far less data from child heritage bilinguals, particularly from longitudinal studies, or from research that addresses their development in both languages (Anderson, 2001; Silva-Corvalán, 2014). In addition, investigations of child heritage speakers (HS) have generally focused on the acquisition of morphosyntactic properties other than person and number inflection (tense and aspect: Cuza & Miller, 2015; Cuza et al., 2013; Silva-Corvalán, 2003; mood: e.g., Anderson, 1999a, 1999b, 2004; Merino, 1983). In this chapter, we focus on the development of Spanish and English verbal inflection by reporting the results of four experimental sessions scheduled over a period of two years.¹ The study interviewed five child HS of Spanish who were learning English as a second language (L2) through immersion in school (mean age 4;6 in session 1 and 6;2 in session 4). Through a narrative and a picture description task, we elicited speech samples in both languages to examine the development of tense and agreement morphology. Verbs in finite contexts were coded for correct tense, number, and person inflection, and the children’s linguistic proficiency and complexity were calculated by comparing the production of verbs in Spanish versus English. In both languages, the children produced more inflected verbs and fewer bare verbs over the four sessions: bare verbs declined in both Spanish (10 percent in the first session, 0 percent in the fourth) and English (37 percent in the first session, 17 percent in the fourth), as seen in tables 5.2 and 5.4.
However, the children behaved quite differently in the two languages in their production of target-like verbal inflection. In English, the children’s production of target-like verbal morphology remained roughly constant across the four sessions (92 percent in the first session, 89 percent in the fourth). In Spanish, the children’s production of (nontarget) default third-person agreement declined from 48 percent in the first session to 14 percent in the fourth session. At the same time, in Spanish the children also began producing nontarget inflection with person and number combinations that were not third-person singular (0 percent in the first session to 32 percent in the fourth session). These results differ from the children’s development in English and also from the pattern reported in the literature for young monolingual and bilingual children acquiring Spanish, in which the production of nontarget inflection declines steadily over time. Our data suggest that child HS of Spanish produce more nontarget agreement than monolingual children and previously studied simultaneous bilinguals at similar ages (Anderson, 2001; Austin, 2001; Bel, 1998, 2002; Ezeizabarrena, 1996; Grinstead, 2000). Differences between these three groups also emerge when we analyze the type and frequency of the inflectional divergences (i.e., instances of bare verbs in the elicited narratives). We argue that these results can best be accounted for by attrition and reduced lexical activation due to increased exposure to English (Putnam & Sánchez, 2013). In the sections that follow, we summarize the background literature on inflectional development in monolingual children and previous research on grammatical maturation in child learners. We then review the research on inflectional development in child bilinguals and present our research questions, followed by our methods, discussion, and conclusions.
The Development of Inflection in Monolingual Children’s Grammars
Young children between the ages of 1;8 and 3;6 acquiring a first language (L1) sometimes omit person and number inflection that is obligatory in adult grammar, producing bare verbs in languages such as English and infinitives in Spanish,² as shown in the examples in 1 and 2. This phenomenon is known as the root infinitive stage:
1. Adult: Tell me how the truck goes.
Child: go home. [AF, 2;2]³
2. El otro buscar
the other look for (INF)
‘look for the other one’ [María, 1;8] (Liceras et al., 1999)
Children produce more utterances with bare verbs in non-pro-drop languages such as Icelandic, English, and French than in languages with rich agreement and pro-drop such as Spanish (Bel, 2002; Blume, 2002; Montrul, 2004), Italian (Guasti, 1993/1994), and Catalan (Torrens, 1995). By age 2;1, children acquiring Spanish as an L1 are reported to produce inflected verbs in 98 percent of possible contexts (Bel, 1998), meaning that they produce very few bare verbs. Children acquiring English as an L1, in contrast, produce inflected verbs only 44 percent of the time (Phillips, 1996), meaning that bare verbs are a much more frequent occurrence in their early speech.
Grammatical Maturation and Inflectional Development
Although children produce bare verbs, which are not a part of adult speech, generativist accounts of L1 morphosyntactic development nonetheless assume that children are endowed with (at least) some degree of adultlike grammatical competence from birth. There are three primary variants of this theoretical position, each of which posits a different extent to which early child grammars resemble the grammars of adult native speakers. Proponents of the grammatical maturation hypothesis argue that child grammars, unlike adult grammars, lack functional categories, which causes children to produce bare verbs where adults do not. According to Guilfoyle and Noonan (1992), Tsimpli (1992), and Radford (1990), children’s inability to project functional categories until around two years of age is the result of biological constraints on their grammatical development. Radford, for example, suggested that children pass through three maturational phases in their early grammatical development. During the first phase, lasting until around 20 months, children are agrammatical. In the second phase, taking place between 20 and 24 months, children are able to project lexical but not functional categories. Finally, after 24 months, children develop a full-fledged grammar, which, like adult grammars, includes both lexical and functional categories. In recent years, the grammatical maturation hypothesis has fallen out of favor due to evidence that functional categories emerge sooner in some languages than in others, even in bilingual children (Bohnacker, 1997; Paradis & Genesee, 1997). If the emergence of functional categories is biologically timed, as proposed by Radford (1990) and others, then there is no clear reason why children who speak one native language should develop certain functional categories earlier (or later) than children who speak another language.
As a result of these findings, most generativists now reject the grammatical maturation hypothesis, instead preferring some variant of the continuity hypothesis, which reflects the “prevailing view (that) … despite apparent differences, child grammars are essentially like adult grammars” (Montrul, 2004, p. 12; see also Lust, 1999). In the next sections, we discuss the two most common variants of this broad hypothesis. Unlike the grammatical maturation hypothesis, the weak continuity hypothesis assumes continuity between child and adult grammars, especially in the domain of functional categories. Within this broader hypothesis, however, there are a number of distinct theories that seek to explain the nature of the similarities and differences between children’s and adults’ knowledge of functional categories. The lexical learning hypothesis (Clahsen et al., 1996) posits that children build functional projections on the basis of exposure to the relevant linguistic input. The optional infinitive hypothesis (Wexler, 1994) and the underspecification hypothesis (E. Gavruseva, 2003), on the other hand, propose that children have functional categories that are either underspecified or missing key elements of the adult grammar. Some proponents of weak continuity have argued that children acquire functional categories by acquiring the lexical items associated with those categories, meaning that children’s grammars are initially incomplete, only becoming adultlike when lexical elements in the input trigger the development of missing functional categories (e.g., Clahsen, 1990;
Clahsen & Penke, 1992; Gawlitzek-Maiwald et al., 1992; Meisel & Müller, 1992; Platzack, 1992). As described by Clahsen (1999, p. 1007), this proposal assumes that “grammatical development may result from increases in the child’s lexicon, that is, from the set of lexical and morphological items the child has acquired.” In support of this proposal, Platzack (1992) and Clahsen and Penke (1992) have argued that children’s acquisition of verbs is what allows them to activate the functional projection, IP, above VP. Furthermore, these authors argue that children acquiring languages with impoverished agreement morphology (e.g., English and Swedish) take longer to develop the IP projection than children acquiring languages with rich inflectional morphology such as Spanish, likely due to the higher percentage of verbal inflection in the input. The strong continuity hypothesis, as the name indicates, assumes stronger continuity between child and adult grammars, which are taken to be essentially identical (Borer & Rohrbacher, 2002; Crain & Thornton, 1998; Hyams, 1992; Lust, 1999; Phillips, 1995, 1996; Poeppel & Wexler, 1993). Consequently, proponents of the strong continuity hypothesis attribute differences between child and adult speech to differences in morphological, rather than syntactic, development. One such explanation comes from Santelmann et al. (2000), whose feature mapping hypothesis states that the development of INFL reflects the child’s attempt to integrate Universal Grammar (UG)-given areas of grammar with the morphological characteristics of the target language. These authors argue that while UG provides the child with knowledge of the grammar of inflection, including the existence of formal features (FF), it does not provide the child with knowledge of how or where such FFs are realized in inflection. Consequently, children make errors of both commission and omission with inflectional morphology as they learn to map FFs to the phrase structure of their target language.
Nonetheless, these errors are expected to always reflect possible, UG-licensed inflectional patterns. In summary, generativist theories of inflectional development in monolingual children that argue for grammatical maturation have attributed inflectional errors to an absence of functional categories or to their underspecification, whereas theories that defend continuity between child and adult grammars have attributed children’s inflectional errors to extragrammatical factors, such as the mapping of functional features onto morphology. In the following section, we turn to the literature on inflectional development in bilingual children.
The Development of Verbal Inflection in Child Bilinguals
Previous research indicates that simultaneous bilingual children who acquire one language with rich, pro-drop licensing morphology and another language with less rich inflection exhibit a discrepancy in the production of finiteness between languages, mirroring the patterns observed in monolingual children. Thus, simultaneous bilingual children learning Spanish and English appear to master person and number agreement early in Spanish, exhibiting almost uniformly target-like inflection of these features by age 3;0 (Bel, 2002; Blume, 2002; Montrul, 2004), whereas in English they do not consistently produce person and number morphology until closer to age 3;6–4;0. This lead-lag pattern has been observed
in the acquisition of different language pairings, such as English and French (Paradis & Genesee, 1997), English and German and English and Latvian (Sinka & Schelleter, 1998), Spanish and English (Deuchar & Quay, 2000), and Spanish and Basque, though, in this case, both Spanish and Basque are morphologically rich pro-drop languages (Austin, 2010). For children simultaneously learning two languages with rich inflection, their production of verbal inflection in both languages is high. In a study of ten children (seven monolingual speakers of Spanish (N=2) and Catalan (N=5) and three Spanish-Catalan bilinguals, all between the ages of 1;7–3;0), Serrat and Aparici (2001) found that 1.9 percent of the monolingual and bilingual children’s utterances contained errors in verbal agreement (121/6489). Of the utterances with nontarget person and number, 78 percent (93/119) consisted of the use of third-person-singular inflection in contexts requiring first- or second-singular and third-plural forms. The overextension of third-person-singular agreement, however, also occurs in monolingual Spanish-speaking children, as reported by Radford and Ploennig-Pacheco (1995). In an illustrative case study, these authors document that a monolingual Mexican child (age 2;2–2;8), despite correctly producing all first- and second-person inflectional forms, produced a number of third-person-singular forms (20 percent of all third-person forms produced) with verbs requiring first- or second-person agreement, suggesting that third-person-singular agreement is a kind of default or “elsewhere” form both for L1 and 2L1 learners. In a study of twenty bilingual children (ages 2;0–3;6) acquiring Spanish and Basque simultaneously, Austin (2001) found that children produced only 1.5 percent of verbs with nontarget person or number agreement, demonstrating highly accurate knowledge of these morphological forms. Similarly, Sanz-Torrent et al.
(2008) found that monolingual and bilingual children (ages 4;7–5;3) acquiring Catalan and Spanish exhibited an even smaller percentage (0.09 percent) of person/number agreement errors. The patterns obtained in Anderson’s (2001) study of two heritage Spanish speakers growing up in the US, however, differed from those reported in the aforementioned set of studies. Her longitudinal study followed two Spanish-speaking children (starting at ages 3;6 and 1;6) who were immersed in English at school or day care but continued to speak exclusively in Spanish at home with their parents. In this two-year study, Anderson found that, while verbal errors in Spanish (inflection plus verb choice) remained steady in the speech of the older child (fluctuating between 2.3 and 9 percent), they were more frequent in the younger child, whose error rates increased over time (from 3.8 to 13.9 percent). Most of the children’s verb errors involved person/number agreement (42.3 percent for the older child and 44.7 percent for the younger one) and consisted of the use of default third-person singular in contexts requiring other person/number contrasts. This nontarget overextension of third-person-singular morphology accounted for 69 percent of the person/number errors of the older child and 91 percent of the person/number errors of her younger sibling. In the final sessions that Anderson recorded, both children’s sensitivity to person and number contrasts appeared to diminish, especially in the younger sibling, who used no plural forms at all across the last two recording sessions.
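The error rates cited in this section are simple proportions of counts. As a minimal sketch (the helper name `percent` and the variable names are illustrative assumptions, not part of any cited study), the following Python snippet recomputes the two Serrat and Aparici (2001) figures from the raw counts given above:

```python
def percent(part, whole, digits=1):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, digits)

# Counts as reported in the text for Serrat and Aparici (2001).
agreement_errors, total_utterances = 121, 6489
third_sg_overextensions, nontarget_person_number = 93, 119

# Share of utterances containing a verbal agreement error.
error_rate = percent(agreement_errors, total_utterances)

# Share of nontarget person/number forms that overextend third-person singular.
default_share = percent(third_sg_overextensions, nontarget_person_number, digits=0)

print(error_rate, default_share)
```

The same helper could be applied to any of the count-based percentages in the chapter, provided the numerator and denominator are taken from the same coding category.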
The comparison between the results in Anderson (2001) and those reported in Austin (2001) and Sanz-Torrent et al. (2008) highlights the effects of sociolinguistic context on the acquisition of more than one language. In contrast with the latter studies, where the two languages being acquired (Basque and Spanish and Catalan and Spanish, respectively) enjoy roughly coequal status, Spanish in the United States is clearly a minoritized language whose usage is largely restricted to the home and (in some cases) neighboring community, unlike English, which is prevalent in all societal settings. Because Spanish in the US is both less prestigious than English and less commonly used across a variety of settings (e.g., school), child HS of Spanish in the US usually undergo a dominance shift from Spanish to English once they begin school, where English is almost always the preferred and exclusive language of instruction (Pires & Rothman, 2009; Valdés, 2001). This rapid decrease in heritage speakers’ production of (and exposure to) Spanish is argued to trigger their attrition or reanalysis of certain linguistic properties in the home language (Putnam & Sánchez, 2013), as seen in the growing percentage of person/number errors in Spanish produced by the children reported in Anderson’s (2001) study. While each of the three studies discussed in this section focused on children’s development of the home language, it is important to note that some of the participants started their exposure to the majority language at three years old, only receiving input from the L1 until that point. These bilinguals, known in the literature as sequential bilinguals (Blom & Unsworth, 2010), are not only HS of a minority language but also L2 learners of the majority one. 
This is precisely the case of the participants analyzed in this chapter, who lived in a heavily Spanish-speaking community and received most of their input in Spanish before they started school (for more details, see the subsection titled “Participant Information” later in this chapter).
The Acquisition of Inflection in Child and Adult L2 Learners
Like L1 learners, adult L2 learners sometimes produce bare verbs in finite contexts, as seen in example 3 from Lardiere’s (1998, 2003, 2006) longitudinal case study of Patty, an adult L1 speaker of Mandarin and Hokkien learning L2 English:
3. He have the inspiration to say what he want to say. (Lardiere, 1998)
Unlike L1 learners, however, adult L2 speakers often fossilize at the bare verb stage, meaning that they never reach a point in their development of the L2 in which they produce inflected verbs in finite contexts at the same rate as adult native speakers. In this case study, Lardiere found that after ten years of exposure to English, Patty produced past tense morphology in just 34 percent of obligatory contexts and third-person-singular -s in just 0 to 5 percent of obligatory contexts. Despite her relatively infrequent production of target-like inflectional morphology, however, Patty was able to raise the verb in settings where it was required, such as negation, thereby demonstrating target-like syntactic knowledge. To explain the puzzling discrepancy between Patty’s minimal production of agreement morphology and her target-like command of syntactic movement, Lardiere proposed that adult L2 learners like Patty, even after acquiring the syntactic features of tense (T), may continue to produce
tenseless verbs due to postsyntactic difficulties in mapping abstract functional features to morphophonological forms. Lardiere’s proposal has been supported by subsequent work such as the missing surface inflection hypothesis (Prévost & White, 2000), an approach couched in the Distributed Morphology framework (Halle & Marantz, 1993), which posits that uninflected forms are produced by L2 speakers as a kind of morphological default when a more specific form cannot be retrieved. In the case of adult L2 Spanish, McCarthy (2006) has argued that morphological underspecification results in the production of examples such as 4, where the speaker produces finite, third-person-singular inflection instead of the first-person-singular inflection that they intended to express. Unlike in English, where missing surface inflection results in the production of bare verb forms, missing surface inflection in Spanish results in the use of default, third-person-singular inflection, as shown in the bilingual child data discussed below. 4. Nació en Boston be.born-FUT-3SG in Boston ‘I was born in Boston’ (McCarthy, 2006) These analyses contrast with earlier research in L2 acquisition, which attributed examples such as 3 and 4 to either L2 learners’ lack of access to Universal Grammar or unspecified L2 functional features (Bley-Vroman 1989; Hawkins & Chan, 1997; Hawkins & Hattori, 2006; Tsimpli, 2003). Child L2 learners, like adult L2 learners, also produce bare verbs, a finding that Vainikka and Young-Scholten (1996) interpreted as evidence in favor of weak continuity in child L2 acquisition—that is, that functional categories are not initially available to child L2 learners. 
Other researchers, however, have argued for strong continuity in child L2 development, citing child L2 English speakers’ production of negation, inversion in questions, and pronominal case as evidence that they are capable of projecting IP and CP, regardless of their L1, and even if they continue to produce bare verbs (L. Gavruseva & Lardiere, 1996; Haznedar, 2003; Lakshmanan, 1993/1994). The production of bare verbs by child L2 learners differs from that of both child L1 acquirers and adult L2 learners in two important respects. First, child L2 learners produce fewer bare verbs than each of the other two groups. Lakshmanan (1993/1994) studied four children (ages 4;5–5;0) who were learning English as an L2; two had Spanish as an L1, one spoke French as an L1, and one was an L1 speaker of Japanese. Results of her study indicated that three of the four child L2 learners correctly produced inflected verbs in 70 percent or more of finite contexts within a few months of immersion in an English-speaking school. Second, child L2 learners differ from child monolinguals and L2 adults with respect to their ultimate attainment of inflected verbs, which is less target-like than that of L1 learners (Brouwer et al., 2008) but more target-like, and less likely to fossilize, than that of adult L2 acquirers.

The Acquisition of Inflection in Child Heritage Bilinguals

Heritage speakers are bilinguals whose home language is a minority language in the society in which they live. Research on adult heritage speakers of Spanish in the United States has provided evidence that their morphosyntactic knowledge of Spanish is not the same as a Spanish-dominant speaker’s in areas of grammar such as gender (e.g., Montrul et al., 2014), aspect (e.g., Montrul, 2002; Montrul & Perpiñán, 2011), and mood (e.g., Giancaspro, 2017; Montrul, 2008; Montrul & Perpiñán, 2011; Perez-Cortes, 2016). Within the field of heritage language acquisition research, there has been debate as to whether such divergences between heritage speakers of Spanish and Spanish-dominant speakers should be attributed to underlying representational differences (e.g., Montrul, 2002, 2008), input quality (e.g., Montrul & Sánchez-Walker, 2013; Rothman, 2009), crosslinguistic influence (e.g., Silva-Corvalán, 1994), or other factors (e.g., Putnam & Sánchez, 2013). Although morphological errors produced by monolingual children or child L2 learners are sometimes attributed to maturational constraints on grammatical development, morphological errors produced by adult heritage speakers cannot be the result of maturational constraints on grammar, given that adult heritage speakers are cognitively mature and, by definition, acquired the heritage language from birth. According to Putnam and Sánchez (2013), adult heritage speakers’ morphological errors are actually attributable to difficulties that they experience in retrieving correct morphophonological forms: as heritage speakers activate the heritage language less and less frequently, it becomes more and more difficult for them to retrieve such forms. While there has been considerable research on adult heritage learners, fewer studies are dedicated to the linguistic development of child heritage bilinguals. 
In our longitudinal study, we report on findings from children who are at the beginning stages of immersion in English and who, unlike the children in previous research (e.g., Anderson, 2001), live in a community where Spanish is widely spoken.

The Study

Our study explored the morphosyntactic effects of language attrition in the Spanish of five child heritage bilinguals, as well as the concurrent morphological development of their L2 English. In particular, we addressed the following research questions:

1. How does the rate of production of inflectional errors compare in the children’s Spanish and English?
2. What types of divergences characterize morphosyntactic production in child heritage Spanish? How do they compare to divergences produced by Spanish monolingual children, simultaneous bilinguals, and other heritage learners?
3. Do child heritage speakers of Spanish show morphological signs of attrition in their home language after immersion in English?

Methodology

The participating children (N = 5) were interviewed at school over a two-year period in four different experimental sessions. During these meetings, participants were instructed to complete two production tasks: (1) a story retelling activity based on Mayer’s (1969) Frog, Where are You? series, and (2) a description of images depicting different everyday situations (e.g., going to the dentist, attending a birthday party, or celebrating Halloween). The interviews, which lasted an average of fifteen minutes, were conducted in Spanish and English. In order to control for language mode and avoid any type of interference, researchers collected data in each language at least one week apart.

Participant Information

The study took place at a school located in an urban setting in the US northeast. According to the 2016 American Community Survey, 72.8 percent of the city’s residents (and 62.3 percent of the school’s student body) spoke Spanish at home. While this trend is common across the city’s county, where 45.4 percent of the population also speaks this language, it is significantly higher than the state’s (18.6 percent) and the country’s (16.1 percent) averages. The information obtained from the language background questionnaires completed by the participants’ families reflected the heterogeneity of the Hispanic population residing in the area of study. Three of the participants had parents from Ecuador, one from Honduras, and one from Peru. Crucially, none of these countries uses a dialect of Spanish in which verbal agreement markers are aspirated (pronounced as [h] rather than [s]), preventing confounds in the coding of number divergences. All five participants were born and raised in the United States and lived in bilingual homes. On the basis of the information provided, they were categorized as sequential bilinguals who learned Spanish as a first language and English between the ages of 1;6 and 3;0 (mean age 2;3).4 At the beginning of the study, participants’ ages ranged from 4;4 to 4;10 (mean age 4;9), reaching the ages 5;9–6;6 (mean age 6;1) by the last session. In addition to their origin, parents were also asked about their children’s perceived language proficiency in English and Spanish (see figure 5.1). As shown, three of four families responded that their children were more comfortable using Spanish than English in the home, and only one of them indicated that their child used both languages equally.5 In terms of language dominance, the parents responded that three of the four children spoke Spanish like natives and one spoke Spanish very fluently. 
For English, families rated one child at a native-like level, one as very fluent, and two as having some difficulty speaking English.
Figure 5.1 Parental reports of children’s language proficiency.
Given the information from the background questionnaire, as well as the fact that the participants live in an area of New Jersey that is almost exclusively Spanish speaking, we assume that the children in this study are Spanish-dominant speakers who have not yet undergone the transition to English dominance that is so characteristic of Spanish speakers in the United States (see, e.g., Lipski, 1993). Figure 5.2 shows the percentage of verbs that the children produced in each language in a measure of their production of complex sentences in Spanish and English.
Figure 5.2 Production of verbs in complex sentences in Spanish and English.

Coding

The recorded interviews were transcribed, and verbs that were either inflected or required inflection were coded according to the types of categories in the left-hand column of table 5.1.
Table 5.1 Children’s inflectional errors in Spanish and English

Type of divergence | Spanish | English
Number | INT: “¿Dónde estaba esta niña en esta foto?” (Where was this girl in this picture?) CHI: “Estaban [*] en el doctor” ((They) were at the doctor’s) | CHI: “The boys *is shopping”
Person | INT: “¿Dónde estaba?” (Where was (he)?) CHI: “*Estabas en el jungle gym” (He was on the jungle gym) | CHI: “Yeah. And I *gots.”
Tense | CHI: “Yo (es)taba corriendo con Maria # (es)taba corriendo y me # *coge en el agua” (I was running with Maria, was running and (she) catches me in the water) | INT: “What did you do?” CHI: “I #play soccer.”
Aspect | CHI: “Cuando yo estaba en pre-K, yo #estaba pintando dibujos” (When I was in pre-K, I was drawing pictures) |
Missing auxiliary | INT: “¿Y qué hacía?” (And what was it doing?) CHI: “*poniendo en la roca” (Putting (it) on the rock) | CHI: “The frog *following the footsteps of the boy”
No inflection (infinitive in Spanish, bare verb in English) | CHI: “el niño estaba jugando con la tierra y *meter aquí” (The boy was playing with the sand and he put it here) | CHI: “Then the frog *follow the boy to his house” (unclear whether T or P/N is missing)
Mood | CHI: “[…]balde y después él quería poner para que lo *ves” ([…] bucket and then he wanted to put it so that you see him) |

INT = interviewer; CHI = child; P/N = person and/or number; T = tense; # = pause; * = ungrammatical because of missing person, number, or tense
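
The coding scheme in table 5.1 lends itself to a simple tally. The Python sketch below is only an illustration under stated assumptions: the category labels mirror the left-hand column of the table, but the `CodedVerb` record, the `error_profile` helper, and the target-like sample token are hypothetical and do not represent the coding procedure or data actually used in the study.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Divergence types from the left-hand column of table 5.1.
DIVERGENCE_TYPES = (
    "number", "person", "tense", "aspect",
    "missing auxiliary", "no inflection", "mood",
)


@dataclass(frozen=True)
class CodedVerb:
    """One coded verb token (hypothetical record layout)."""
    language: str              # "Spanish" or "English"
    form: str                  # the verb as produced by the child
    divergence: Optional[str]  # None when the inflection is target-like

    def __post_init__(self):
        # Reject labels that are not part of the coding scheme.
        if self.divergence is not None and self.divergence not in DIVERGENCE_TYPES:
            raise ValueError(f"unknown divergence type: {self.divergence}")


def error_profile(tokens, language):
    """Return (error rate, share of each divergence type among the errors)."""
    in_language = [t for t in tokens if t.language == language]
    if not in_language:
        return 0.0, {}
    errors = [t for t in in_language if t.divergence is not None]
    counts = Counter(t.divergence for t in errors)
    shares = {kind: n / len(errors) for kind, n in counts.items()}
    return len(errors) / len(in_language), shares


# Toy input: three error forms quoted in table 5.1 plus one invented
# target-like token ("corre"), included only so that the rate is not trivial.
sample = [
    CodedVerb("Spanish", "estaban", "number"),
    CodedVerb("Spanish", "corre", None),
    CodedVerb("English", "is", "number"),
    CodedVerb("English", "follow", "no inflection"),
]
rate, shares = error_profile(sample, "English")
```

Keeping the overall error rate separate from the distribution of error types matches the way the results are reported: first as a rate over all verb tokens in a language, then as percentages of the errors only.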
We obtained a total of 1023 verb tokens in Spanish and 1453 verb tokens in English. Data transcription and coding were completed independently by four researchers and were checked to ensure a high degree of inter-rater reliability. In the following section, we report the results obtained in the four interviews conducted with each participant. We begin by focusing on the overall error rates in each language (Spanish and English) and follow with an in-depth analysis of the data based on the type of divergence.

Results

In general, when we compiled all the recordings together, the children produced a higher proportion of target-like inflection in Spanish, their heritage language (89 percent), than in their L2, English (68 percent correct). When we examined patterns of development, however, they showed similar trajectories in each of their two languages. In Spanish, the percentage of nontarget forms produced remained steady across the data-collection sessions (mean error rate in the first session: 12 percent; mean error rate in the last session: 11 percent), as seen in figure 5.3.
Figure 5.3 Production of nontarget inflection in Spanish.
In contrast, children produced fewer errors in English by the end of the two years of data collection. While in the first session 39 percent of the verbs contained nontarget inflection, by the end of the last session this number had decreased to 29 percent, as shown in figure 5.4.
Figure 5.4 Production of nontarget inflection in English.
Thus far, we have presented the broad pattern of children’s nontarget-like inflection in each language. To complement this data, we now turn to a more specific analysis of the types of divergences made by the children in both Spanish and English. As mentioned above, we coded for errors in the children’s production of person, number, tense, mood, and aspect, as well as utterances that were missing auxiliaries or contained bare verbs. In both languages, a sizable percentage of inflectional errors involved the production of nontarget tense inflection (29 percent in Spanish, 51 percent in English); however, other error patterns differed across the languages. Out of all the children’s divergent productions in Spanish, 44 percent consisted of person/number mismatches (using third person as a default form or other types of errors) while only 8 percent consisted of missing inflection (e.g., missing auxiliaries or infinitives used in finite contexts), as presented in table 5.2. Note that in this table we have categorized the person/number nontarget forms according to whether they were divergent because they involved substituting third-person-singular (3sg) (default) inflection for other parts of the verbal paradigm or nondefault mismatches, in which other person/number combinations were produced.

Table 5.2 Spanish nontarget inflection by error type

Some examples are provided in 5:

5. a. Missing auxiliary:
CHI4: uh y < perso> [= el perro?] caminando también
uh and xxx [the dog?] walk-IMP also
*‘Uh and xxx [the dog?] walking also’
Target: uh y < perso> [= el perro?] está caminando también

b. Default person/number agreement (3sg) error:
CHI5: solo toma leche (los hermanitos)
only drink:3sg milk younger siblings
*‘Little brothers/sisters only drink milk’
Target: solo toman leche (los hermanitos)

c. Nondefault person/number error:
CHI3: Mi mami los dejas ir con las bicicletas # a ir # para ir al parque
My mommy them let:2sg go with bikes to go to go to the park
*‘My mommy lets them (us) go with bikes to the park’
Target: Mi mami nos deja ir con las bicicletas # a ir # para ir al parque

d. Tense error:
CHI2: Cuando él (es)taba chiquito no come nada
When he was little NEG eat:PRES:3sg nothing
*‘When he was little he eats nothing’
Target: Cuando él (es)taba chiquito no comía nada

e. Bare verb (infinitive):
CHI2: El niño (es)taba jugando con la tierra y meter aquí
the boy was playing with the dirt and to put here
*‘The boy was playing with the dirt and to put here’
Target: El niño (es)taba jugando con la tierra y la metió aquí

f. Aspect error:
CHI3: Cuando yo estaba en pre-K yo estaba pintando un dibujo
When I was:IMP in pre-K I was painting a drawing
*‘When I was in pre-K, I was painting (painted) a drawing’
Target: Cuando yo estaba en pre-K, pinté un dibujo

g. Mood error:
CHI2: Él quería escapar para que no lo coge
He want:IMP to escape so that NEG him catches (indicative)
*‘He wanted to escape so that he/she doesn’t catch him’
Target: Él quería escapar para que no lo cogiera (subjunctive)

Overall, the children produced many more Spanish verbs with nondefault agreement errors (18/1023, or 1.8 percent of all verbs) than has been reported in the L1 and 2L1 literature (Radford & Ploennig-Pacheco, 1995; Serrat & Aparici, 2001) and by Anderson (2001) for heritage speakers. In table 5.3, these errors are presented in detail.

Table 5.3 Nondefault agreement errors in Spanish

Tokens | Type of features | Example
Person errors
1 | 2sg for 3pl | Adult: Y entonces ¿dónde estaban estos niños, qué hacían? (And so where were these kids, what were they doing?) CHI5: (Es)tás jugando soccer *‘You are playing soccer’
3 | 2sg for 3sg | Adult: ¿Dónde estaba? (Where was he/she?) CHI1: *Estabas en el jungle gym *‘You were on the jungle gym’
1 | 1sg for 3sg | CHI5: Mi mami ya no puedo baja(r) *‘My mother already I can’t get down’
Number errors
12 | 3pl for 3sg | CHI2: Después van a jugar toda la familia *‘Afterwards they are going to play the whole family’
1 | 1sg for 1pl | CHI2: Y él y yo me tome una foto *‘And he and I I took a photo’
Total
18
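
As a quick arithmetic check on table 5.3, the short Python sketch below recomputes the total number of nondefault agreement errors and their share of all Spanish verb tokens. The counts are the ones reported in this chapter; the variable names and the `percent` helper are illustrative only.

```python
# Nondefault agreement errors in Spanish, as listed in table 5.3.
person_errors = {"2sg for 3pl": 1, "2sg for 3sg": 3, "1sg for 3sg": 1}
number_errors = {"3pl for 3sg": 12, "1sg for 1pl": 1}

# Total number of Spanish verb tokens reported in the chapter.
SPANISH_VERB_TOKENS = 1023


def percent(part, whole):
    """Express part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_nondefault = sum(person_errors.values()) + sum(number_errors.values())
nondefault_rate = percent(total_nondefault, SPANISH_VERB_TOKENS)

print(total_nondefault)  # 18
print(nondefault_rate)   # 1.8
```

The result agrees with the figure quoted in the text (18/1023, or 1.8 percent of all Spanish verbs).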
The children’s inflectional errors documented in English took different forms. In contrast to the patterns seen in the children’s Spanish, in which 44 percent of inflectional errors involved mismatched person or number and 8 percent of errors were nonfinite verbs, 13 percent of the children’s inflectional errors in English consisted of person/number mismatches, whereas 43 percent of nontarget verbal forms were missing inflection or missing an auxiliary (see table 5.4).

Table 5.4 Nontarget inflection in English

Examples of these divergences can be seen in 6:

6. a. Missing auxiliary:
Adult: Okay. Can you remember what happened next?
CHI2: *The frog uh he no wanna to sleep.
Adult: He he didn’t want to sleep?
Adult: what was he doing?
CHI1: doing the day of Halloween

b. Tense error:
Adult: And then what happened when the boy and the dog woke up?
CHI2: He [/] he go to school for [/] for [/] for b(oys) & g girls and boys.
Adult: The frog went to a school for girls and boys?

c. No inflection on verb:
CHI2: He go this school for boys

d. Person/number mismatches:
Adult: And what do you think they were doing in the pool?
CHI1: They was washing.
Adult: they were washing or swimming maybe?

In the following section, we will discuss the implications of these results for theories of developing inflection in heritage speakers.

Discussion

Several interesting patterns emerged from our data. First, although the children produced a similar percentage of inflectional errors in Spanish and in English, the errors had a distinct profile in each language. In English, the agreement errors consisted primarily of the omission of auxiliaries or the production of bare verbs without inflection (43 percent), whereas the Spanish agreement errors consisted of verbs with nontarget inflection for person and/or number (44 percent). In both languages, a sizable percentage of inflectional errors involved the nontarget production of tense (29 percent in Spanish and 44 percent in English). The rate of inflectional errors in the children’s English across all four sessions was 32 percent (463/1453), declining from 39 percent in the first session to 29 percent in the final session. These percentages are comparable to the error rates produced by the children who participated in Lakshmanan’s (1993/1994) study of young L2 learners who were immersed in an English-speaking environment. The inflectional errors produced by our participants differ from the errors produced by Spanish-speaking children who are not heritage speakers both in terms of the higher percentage of nontarget forms they produced and the nature of these nontarget forms. First, our participants (who had a mean age of 4;6 in the first session and 6;2 in the last) produced a consistently high percentage of inflectional errors, around 11 percent on average. In contrast, monolingual and bilingual children learning Spanish and Catalan who are the same age as our participants (4;7–5;3) produced nontarget inflection in 0.25 percent of their utterances (Sanz-Torrent et al., 2008), perhaps because Catalan has rich inflection, unlike English. Second, our heritage Spanish participants’ person-agreement errors had a different pattern from the one observed in previous L1 and 2L1 research. Unlike the L1 and 2L1 participants in Serrat & Aparici (2001), whose errors primarily (78 percent) consisted of substituting default (3sg) forms for other parts of the inflectional paradigm, our participants produced a relatively lower percentage (64 percent) of these nontarget forms and a relatively higher percentage (36 percent) of other person/number agreement errors. 
An example of an error using nondefault agreement can be seen in 7, where the child is using second-person-singular agreement with a third-person-singular subject:

7. Adult: Por ejemplo, aquí que hacía este niño? ¿Dónde estaba?
For example, here what was doing this boy? Where was he?
‘For example, what was this boy doing here? Where was he?’
CHI1: *Estabas en el jungle gym
be.PAST.2nd.SG on the jungle gym
‘You were on the jungle gym’

Such (nondefault) agreement errors, which were not produced by the children at all in session 1, emerged in session 2 and gradually increased in frequency over time, ultimately comprising 32 percent of all inflectional errors in session 4. In contrast, the use of default third-person-singular inflection declined over the four sessions from 48 percent of all inflection errors in session 1 to 14 percent in session 4. We suggest that the increasing percentage of inflectional errors is evidence of morphological attrition in progress in the children’s Spanish. In particular, we speculate that the use of inflectional mismatches may reflect a weakening of the connections between previously acquired functional features and morphological spell-out as attrition progresses. In addition, we propose that the children’s morphological attrition is a result of increasing lexical competition from their L2 English at school, leading to decreased lexical activation in Spanish (Putnam & Sánchez, 2013). We assume that before beginning preschool in English at age three, our participants had acquired Spanish person/number agreement distinctions (Bel, 2002; Blume, 2002; Montrul, 2004). However, the high production of nondefault person/number errors in sessions 1–4 seems to reflect increasing difficulty in retrieving target morphological forms as the children’s L2 proficiency grows. Assuming that different inflected verb forms compete for activation and that this competition occurs across as well as within languages (Kroll et al., 2006; Lagrou et al., 2011; Spivey & Marian, 1999), we hypothesize that children are increasingly less able to activate and retrieve target forms in the L1 as competition from their L2 grows stronger (Putnam & Sánchez, 2013; Seton & Schmid, 2016). The children’s overgeneralizations in producing irregular verbal stems in examples (such as in 8) also support the hypothesis that the children are having difficulty accessing correct morphological forms in Spanish. The children use the overregularized form ponió instead of the irregular targets puso (‘s/he put’) in 8a and puse (‘I put’) in 8b.

8. a. CHI2: y después él se ponió (target = puso) esto para ver si él es un monstruo
and afterwards he put on this to see if he is a monster
*‘And afterwards he put this on to see if he is a monster’

b. CHI5: Sí and then yo ponió mi seat belt and sit down y yo vió en la afuera de la window
yes and then I put:3sg my seat belt and sit down and I see:3sg in the outside of the window
*‘Yes and then I put my seat belt and sits down and I saw outside of the window’

Errors such as these are characteristic of younger monolingual children acquiring Spanish and are thought to occur when a child is unable to retrieve an irregular form and uses a rule-based form instead (Clahsen et al., 2002). We further suggest that these two different types of person/number errors reflect distinct challenges faced by learners. 
The overuse of default 3sg inflection may reflect that a learner is having difficulty retrieving a correct inflectional form or that s/he has not fully acquired the inflectional morphology associated with a set of features; in these cases, an elsewhere form is used as a substitute. In contrast, nondefault agreement errors may emerge when a heritage speaker cannot activate a target form in his/her L1 that was already acquired and a nontarget form is retrieved instead (Putnam & Sánchez, 2013). Lipski (1993) cites examples such as 9 from adult heritage speakers of Spanish that contain nontarget verbal agreement:

9. a. Yo bailo y come
I dance:1sg and eat:3sg
*‘I dance and (s/he) eats’

b. Un lugar tan grande donde nadie conozco a nadie
a place so big where no one know:1sg DOM no one
*‘Such a big place where nobody (I) know anybody’

These utterances resemble the nontarget inflectional forms produced by the children in our study. Lipski calls these adults transitional bilinguals, heritage speakers of Spanish living in the US who share the following characteristics:

• Had little or no schooling in Spanish
• Spoke Spanish in earliest childhood, either monolingually or bilingually with English
• Experienced a rapid shift in dominance from Spanish to English before adolescence
• Limit use of Spanish to conversations with a few relatives
• When addressed in Spanish by individuals known to be bilingual, will often respond in English

While Lipski does not provide the percentage of agreement errors out of the total for these heritage speakers, he claims that instability in use of person/number agreement is highly characteristic of adult transitional bilinguals but is not found in the speech of heritage speakers with high proficiency in Spanish. Many of our heritage participants share these biographical characteristics with the adult speakers that Lipski identified as transitional bilinguals, and it is possible that a high percentage of nontarget agreement forms will become a permanent feature of their Spanish. It is also possible that some of our children have heard varieties of Spanish with divergent inflection, which could account for some of their nontarget productions (p.c., Gita Martohardjono). However, we assume that this is not the case for most children, who have heard input from Spanish-dominant bilinguals or speakers who had almost no knowledge of English, and that these divergent forms therefore result from learning strategies on the part of our young heritage learners.

Conclusions

In English, our participants’ production of nontarget bare verbs decreased over the four data-collection sessions, and their error rates were in line with results reported for other child L2 learners of English (Lakshmanan, 1993/1994). However, their rates of nontarget inflectional forms in Spanish were as high as those produced by the heritage children in Anderson’s (2001) longitudinal study. This was surprising, given that the children studied by Anderson had been immersed in English for three years at the beginning of the study and were living in a monolingual English-speaking community in which they only received input in Spanish at home. In contrast, our participants lived in a community that is mostly Spanish speaking and had only begun English language preschool a year earlier. Nevertheless, our participants already showed possible signs of morphological attrition in the form of high rates of nontarget person/number forms, which differ from the types of errors found in L1 and 2L1 Spanish speakers. The higher rate of nondefault agreement errors that we found differed also from Anderson’s results and may reflect difficulties in the mapping of features onto morphological forms in production, a stage that had already occurred in Anderson’s participants, leaving them with fewer person/number contrasts than the children who participated in our study. Additional research is needed to shed light on the ongoing effects of lack of activation due to switches in dominance that lead to attrition in child heritage speakers as well as to establish developmental benchmarks for heritage learners to avoid confusing L1 attrition with language impairment or other delays.

Notes

1. We would like to thank the children, parents, and teachers of the Eugenio de Hostos School in Union City, NJ; Rutgers University Aresty Research Center undergraduate assistants Katy-Anne Blacker, Marlene Garzona, Jessica Kustra, Taylor Lampton, Glenn Ramirez, Kaitty Reyes, and Francesca Venezia; Professor Crystal Marull (University of Florida); Professor Gretchen Van de Walle; and Professor Gita Martohardjono for her helpful comments and suggestions on this chapter.
2. Davidiak and Grinstead (2004) propose that in child Spanish, third-person-singular forms are root infinitives.
3. This speech sample is from the Cornell Language Acquisition Lab.
4. In this chapter, we use “sequential bilinguals” to refer to these children because they did not begin acquiring English at birth. Nonetheless, some authors would consider them to be simultaneous bilinguals because they began acquiring English before the age of three (e.g., Blom & Unsworth, 2010). This terminological difference will have little relevance for the argumentation presented in this chapter.
5. The family of one of the participants (CHI4) did not complete the questionnaire.

References

Anderson, R. (1999a). Impact of language loss on grammar in a bilingual child. Communication Disorders Quarterly, 21, 4–16. Anderson, R. (1999b). Loss of gender agreement in L1 attrition: Preliminary results. Bilingual Research Journal, 23, 319–338. Anderson, R. (2001). Lexical morphology and verb use in child first language loss: A preliminary case study investigation. 
6 The Role of Gestures in First and Second Language Acquisition: A Case Study of a Hebrew-English Bilingual Child
Yarden Kedar
Gestures are movements of the head, hands, and arms that speakers typically produce during verbal discourse. Studies have shown that apes employ a broad repertoire of gestures (which for the most part are also present in human gestural repertoires) and use those gestures systematically and intentionally for an array of communicative purposes in order to achieve their everyday goals. Thus, some scholars view gestures as one of the principal triggers for the emergence of language in humans; that is, they argue that gestures and speech share a common evolutionary origin (e.g., Kendon, 2004; Kersken et al., 2019; Kita & Özyürek, 2003; McNeill, 1985, 1992; Tomasello, 2006). In the following sections, I discuss some critical issues regarding the determinants of gesture development and use in the human species in relation to the acquisition and use of language.1
Gestures: An Integral Part of Language?
What are the key functions of gestures, and to what extent are these functions tied to the acquisition, representation, and use of language? Some researchers have emphasized the intrinsic nature of gesture use, namely, self-directed gestures that are not communicative in essence and that do not necessarily co-occur with speech. Such gestures are supposedly used for assimilating thought processes or for regulating one’s emotions during intense situations. For example, in a study on the communicative relevance of gestures for interlocutors, Iverson and Goldin-Meadow (1998) reported that the production of gestures did not depend on the presence of either a model or an observer. According to speech-auxiliary theories, gestures serve as an independent facilitation device for language-related processes; for example, gestures may be used for lexical retrieval (Krauss et al., 2000) or to facilitate the representation of contents just before these contents are verbalized (Alibali et al., 2000). In contrast, gesture-speech theories depict gestures as an integral part of language. In particular, gestures have been claimed to mark phrasal and sentential boundaries; to overcome grammatical issues (e.g., tense, temporality); to offer a visual illustration of semantic concepts, actions, or objects; to realize and clarify pragmatic and communicative purposes (e.g., maintaining fluency, signaling transitions and turn-taking in discourse); to clarify children’s spoken messages; and to provide an “emotional soundtrack” for speech by using beats2 (Capone & McGregor, 2004; Gullberg, 2006).
Gestures: Universal or Culture Specific?
Gestures are used in all human cultures. However, the particular ways in which gestures are produced by people in various social contexts and in different parts of the world vary considerably. Some researchers have argued that such systematic cross-cultural differences in gesture use should be classified as different gestural repertoires, analogous to the different lexicons and variations in syntactic structure that are found among spoken languages globally. In addition, there is a range of individual variation among speakers from the same group in the specific types of gestures being used as well as in their rate of use. Thus, gestural repertoires seem to be formalized and driven by a mixture of universal tendencies, culture-specific norms, and individual differences among speakers from the same culture (see Gullberg, 2006; Wang et al., 2015).
The Emergence of Gestures
Tracking the development of gesture use in children has proven to be an effective paradigm for exploring both questions discussed above. Thus, researchers have been studying the relation between gestures and language, as well as the universal versus culture-specific aspects of gesture use, as they unfold during the child’s first years. Both positive and negative evidence exists concerning the interdependence between gestures and language in early development. For example, infants communicate with their caregivers by using their hands and arms prior to the emergence of speech. In fact, the first intentional symbolic use of social communication is typically achieved via a sequence of deictic gestures such as showing, giving, and pointing, rather than by language (Petitto, 1988, 1992). However, such prelinguistic gestures, which begin to emerge at approximately 10 months (Bates et al., 1975; Folven & Bonvillian, 1991), have been shown to correlate with and to predict the child’s first words (Capone & McGregor, 2004; Goldin-Meadow & Alibali, 2013; Özçalışkan & Goldin-Meadow, 2005). In addition, there is evidence that gestures fulfill a compensatory communicative function during the early stages of language acquisition, when language proficiency is still weak (e.g., Capirci et al., 1996).
As for the universality of gestures, the main issue under debate is the extent to which a child’s gestural repertoire depends on her unique experience in a particular cultural environment. Alternatively, certain types of gestures may always appear before others, universally, regardless of the specific cultural setting. Once again, there seems to be mixed evidence supporting both of these scenarios. On the one hand, children have indeed been found to follow a universal developmental trajectory with respect to the types of gestures they use at different developmental phases (see, e.g., Blake et al., 2005; Gullberg & de Bot, 2010). Following the emergence of deictic and beat gestures, a new kind of gesture appears around two years of age: iconic gestures. Iconics depict certain aspects of a referent such as people, objects, and actions. An example is moving the index and middle fingers back and forth to indicate the act of walking (Mayberry & Nicoladis, 2000; Nicoladis, 2002). This new kind of gesture has been shown to have a significant role in the facilitation of early word learning (e.g., Goodwyn et al., 2000).
As the child grows, she begins to use iconic gestures in reference to abstract descriptions that are not bounded by her immediate surroundings (in addition to gestures referencing concrete entities in her environment). At the same time, however, it is clear that the specific gestural model available in the child’s environment will also affect her choice of using certain gestures in particular social contexts. In fact, most of our gestural repertoire is eventually achieved through learning and imitation, as in the case of conventional gestures, a set of gestures that convey a culture-specific, known meaning within a particular social context (e.g., waving bye-bye; holding up the index finger to indicate the number one). Conventional gestures become very frequent in children’s gestural repertoire from three years of age onward (Hodges et al., 2018; Perrault et al., 2019).
Gesture Development in Bilingual Settings
Gestures have been shown to play a vital role for adult L2 learners in sustaining social interaction with native speakers. For example, Gullberg (1998) found that adult L2 learners used gestures as an effective communication strategy. Gestures were produced during conversational narratives to elicit words from interlocutors in order to manage problems of coreference and to metalinguistically signal the presence of a problem such as an ongoing lexical search or management of disfluency. McCafferty (2002) inspected the use of gestures by an adult L2 learner, finding that gestures played a key role in promoting language use by facilitating positive interaction between this nonnative speaker and a native participant.
Are there similar relations between gestures and language in children who acquire two languages (or more), and if so, how do they develop over time? In fact, early bilingualism (and early multilingualism), in either its simultaneous or its sequential (successive) form, has proven to be yet another fruitful scenario for understanding the various roles of gestures in human cognition and behavior. For example, Nicoladis et al. (1999) tested bilingual children between the ages of 2;0 and 3;6 who acquired French and English simultaneously. Different patterns were found in each language in the emergence and use of iconic and beat gestures. In another study, this time with nonbalanced French-English bilingual children between the ages of three and five years, Nicoladis et al. (2005) found that the children used more iconic gestures in their dominant language than in their nondominant language. Bilingual children have also been reported to use more conventional and deictic gestures while communicating in their nondominant, weaker language. This is also the case in instances in which the child produces a gesture without any accompanying speech but the social environment and the language directed to the child are based on the nondominant language (Sherman & Nicoladis, 2004; Nicoladis et al., 2005).
The Current Study
This longitudinal case study mapped the developmental trajectory of gesture production by a three-year-old child in parallel with her acquisition of language. The child was recorded over eight months in both her first language (Hebrew) and her second language (English). The leading questions and hypotheses were as follows:
1. Does the child integrate and synchronize her production of gestures and speech?
Hypothesis 1: If gestures and language are intertwined during early phases of development (rather than being two isolated systems), we would expect the child to synchronize her use of gestures with speech and only minimally use gestures when no linguistic content accompanies them.
2. Does the child show different gestural patterns in first language acquisition (FLA) contexts versus second language acquisition (SLA) contexts? Specifically, when, with whom, to what extent, and for what purposes does this young bilingual child gesture in each of her evolving languages (Hebrew, English)? Does she rely on gestures in her nondominant language (English) more than in her native tongue (Hebrew)?
Hypothesis 2: If gestures serve as a compensatory device for missing lexical and grammatical knowledge in the child’s nondominant language (L2), then a greater rate of gesture use (i.e., number of gestures per minute) is predicted in L2 English discourse contexts in comparison to L1 Hebrew discourse contexts.
3. Does the child exhibit a similar repertoire of gestures, and for similar purposes, in both languages? Alternatively, is language-specific influence evident in her choice of particular types of gestures that accompany her productive language in her L1 (Hebrew) versus her L2 (English)?
Hypothesis 3: The child’s production of particular types of gestures will change as a function of the specific language she uses and the social context of the communication (i.e., different addressees).
Hypothesis 4: The child’s use of gestures for achieving three main communicative goals (compensation, demonstration, or segmentation; see details below) will change as a function of the specific language she speaks while gesturing and her level of mastery in each language at a given point in time.
Method
In this longitudinal case study, NG,3 a Hebrew-speaking child, was followed from the age of 3;0 to 3;8, a period during which she was acquiring English as a second language. NG was born in August 2003 and raised in Israel. At the age of 3;0, NG and her family moved to a northeastern town in the United States for a period of one year (August 2006 to August 2007). NG is the third child in her family, with a significant gap in age between her and two older siblings (a brother and a sister, who were approximately fifteen and thirteen, respectively, at the time of the family’s move). Both of NG’s parents have doctoral degrees. Prior to coming to the United States, NG was not exposed to English on a regular basis and could thus be considered a monolingual Hebrew-speaking child at that stage. By the age of three years, NG’s development in terms of her cognitive, emotional, and language abilities was assessed by her parents and teachers to be appropriate for her age. Shortly after her arrival in the United States, NG began attending a monolingual, English-speaking, on-campus university preschool.
Procedure
NG’s production of natural speech and her manual gestures were recorded on video and audio from her first encounters with the unfamiliar language (English) until its (partial) mastery as a second language. Data were collected during interactive sessions in which NG was playing and communicating with a team of researchers, both at her preschool and at her home, with sessions being held in Hebrew as well as in English in each of these locations. The researchers met regularly with NG about once or twice a week. However, due to the academic year timeline, there were some periods in which no data were collected. During home sessions, at least one of NG’s family members was present. During school sessions, one or more of her teachers were present. A native Hebrew speaker was in charge of administering the Hebrew sessions, while four native English speakers administered the English sessions (usually, only two of these researchers would participate in a given session).
Coding
The current findings are based on a subset of fourteen sessions from a total of forty-three recorded sessions of NG between August 2006 and July 2007. Seven of these sessions were recorded at NG’s preschool, and seven at her home. Twelve sessions were completely or primarily English oriented, while two sessions featured Hebrew as the dominant language. The total length of the videotaped data that were coded for this study was seven hours and seven minutes.
Speech
To evaluate NG’s progress in speech production in Hebrew and in English, mean length of utterance (MLU; Brown, 1973) was calculated for each session as the overall number of words produced by NG divided by the total number of utterances in that session. For Hebrew utterances, an equivalent mean morphemes per utterance (MPU) measure was used. This measure is based on the total number of morphemes produced by the child divided by her total number of utterances in a given session, and it has therefore been shown to capture more accurately the nature of Hebrew as a highly synthetic language with a rich bound morphology (Dromi & Berman, 1982).
Gestures
The following gestural coding system is largely based on conventional schemes and variables previously developed in similar studies with adults and children (e.g., Iverson & Goldin-Meadow, 1998; McNeill, 1992), as well as some novel measures developed for this study. A team of three researchers coded a total of 767 gestures that were produced by NG in the fourteen sessions that were included in this study. In cases of discrepancy, two coders would watch the disputed gestures again until agreement was reached. The main gestural features that were coded are the following. First, each gesture produced by NG was classified on the basis of its main characteristic (type): deictic, iconic, beat, conventional, or emotional. Second, to acknowledge possible changes in gesture use as a function of NG’s continuous presence in an English-speaking environment, the relevant period of data collection for each documented gesture was indicated: period 1: September–November 2006; period 2: January–April 2007. Third, each gesture was marked for the location in which it was produced (i.e., at NG’s home or in her school). Fourth, each gesture was coded for NG’s own language (L-NG), that is, whether it accompanied one of NG’s spoken utterances in English or Hebrew or, alternatively, whether it was produced in the absence of speech. A similar variable, linguistic context (L-CT), indicated whether the language spoken to NG by her environment was English or Hebrew (i.e., the specific linguistic context surrounding NG as she was producing a single gesture or a set of gestures). Fifth, each gesture produced by NG was marked for its social function. Specifically, it was noted whether a given gesture was directed to adults (e.g., the team of researchers and NG’s teachers and parents); to her peers (the children in the preschool; friends visiting at her home); or to herself (self-directed gestures). Lastly, the main communicative function of each gesture was noted: a gesture that was used to compensate for lexical difficulties (such as missing words) or grammatical difficulties (e.g., lack of access to a specific functional head) was coded under compensation; a gesture providing a visual illustration was coded under demonstration; and a gesture that marked grammatical structures, such as phrasal or sentential boundaries, was coded as segmentation (which is relevant only to English and Hebrew utterances, not to no-speech situations).
Results
Speech Production
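As a reminder of how the speech figures in this subsection are derived, the MLU and MPU measures defined under Coding are simple per-session ratios: total words (English) or total morphemes (Hebrew) divided by the total number of utterances. The following Python sketch illustrates that arithmetic only. The `Utterance` record, the function names, and the toy session are hypothetical and are not taken from the study's data or tooling; the sketch also assumes that each utterance has already been segmented into words and morphemes by a human transcriber.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Utterance:
    """One transcribed child utterance (hypothetical record format)."""
    words: List[str]      # word tokens, as segmented by the transcriber
    morphemes: List[str]  # morpheme tokens, as segmented by the transcriber


def mean_per_utterance(unit_counts: Sequence[int]) -> float:
    """Total number of units divided by the number of utterances."""
    if not unit_counts:
        raise ValueError("a session needs at least one utterance")
    return sum(unit_counts) / len(unit_counts)


def mlu_in_words(session: Sequence[Utterance]) -> float:
    """MLU counted in words, as described for the English sessions."""
    return mean_per_utterance([len(u.words) for u in session])


def mpu(session: Sequence[Utterance]) -> float:
    """Morphemes per utterance, as described for the Hebrew sessions."""
    return mean_per_utterance([len(u.morphemes) for u in session])


# Toy session with placeholder tokens (invented for illustration only).
toy_session = [
    Utterance(words=["w1", "w2"], morphemes=["m1", "m2", "m3"]),
    Utterance(words=["w1", "w2", "w3"],
              morphemes=["m1", "m2", "m3", "m4", "m5"]),
    Utterance(words=["w1"], morphemes=["m1"]),
]

print(mlu_in_words(toy_session))  # 6 words / 3 utterances = 2.0
print(mpu(toy_session))           # 9 morphemes / 3 utterances = 3.0
```

Because both measures share the same denominator (utterances per session), they differ only in the unit being counted, which is why a morpheme-based count can register growth in a morphologically rich language that a word-based count would understate.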
As expected, NG’s MPU in Hebrew was consistently high and relatively stable in both data-collection periods (period 1 = 4.69; period 2 = 4.93), whereas her MLU in English clearly increased with time (period 1 = 2.62; period 2 = 4.44). A mixed growth model procedure, which was applied separately for each language using the IBM SPSS Statistics 21.0 package, with period serving as the independent variable, confirmed this impression. The effect of period on MLU (English) was significant at the .05 level for both the linear term (F(1,19) = 8.4, p = 0.009) and the quadratic term (F(1,19) = 4.4, p = 0.05), indicating an overall increase in MLU (English) over time, from period 1 to period 2. In Hebrew, the effect of period on MPU was not significant (linear term: p = 0.36; quadratic term: p = 0.56).
Hypothesis 1
A multinomial logistic regression analysis testing for the effects of L-NG (English; Hebrew; no speech) and period (period 1; period 2) on social function yielded a significant effect of L-NG, χ²(2) = 6.03, p