Bilingual Acquisition of Intonation: A Study of Children Speaking German and English [Reprint 2014 ed.] 9783110929881, 9783484304246

This book investigates the acquisition of intonation by German/English bilingual children. Intonation is analysed both a


German Pages 188 [192] Year 2000


Table of contents :
Acknowledgements
Notational conventions
1. Introduction
2. Bilingual acquisition of intonation
2.1 Description and transcription of the phonological systems of English and German intonation
2.1.1 The British tradition
2.1.2 The autosegmental-metrical (AM) approach
2.1.3 Compatibility of the two transcription systems
2.2 The linguistic functions of intonation
2.3 The phonetic correlates of intonation
2.3.1 The phonetic correlates of nuclei, pitch accents and intonational phrasing
2.3.2 Pitch
2.3.3 Loudness
2.3.4 Length
2.3.5 Pause
2.4 Bilingual acquisition of intonation
2.4.1 Bilingual first language acquisition
2.4.2 Bilingual language representation and processing
2.4.3 Language representation and processing in bilingual first language acquisition
2.4.4 Bilingual acquisition of the phonological system of intonation
2.4.5 Bilingual acquisition of the phonetic parameters of intonation
2.5 A model of the bilingual acquisition of intonation
3. Bilingual acquisition of nucleus placement
3.1 The phonological systems of nucleus placement in German and English
3.2 The phonetic production of nuclei
3.2.1 Pitch and pitch movement during stressed syllables
3.2.2 Intensity
3.2.3 Length
3.3 The acquisition of nucleus placement
3.3.1 The acquisition of word stress
3.3.2 Transition to sentence-level stress
3.3.3 Acquisition of the phonological rules of nucleus placement
3.4 Mastery of the phonetic production of nuclear stress and emphasis
3.5 Research questions
4. Bilingual acquisition of the system of pitch
4.1 The phonological systems of pitch in English and German
4.2 The phonetic production of pitch accents in English and German
4.3 The acquisition of the phonological system of pitch
4.4 Mastery of the phonetic production of pitch accents
4.5 Research questions
5. Bilingual acquisition of intonational phrasing
5.1 The phonological systems of intonational phrasing in English and German
5.2 The phonetic correlates of intonational phrases
5.3 The acquisition of intonational phrasing
5.4 Mastery of the phonetic correlates of intonational phrasing
5.5 Research questions
6. The study - research questions, method and analysis
6.1 Research questions
6.1.1 Nucleus placement
6.1.2 Pitch
6.1.3 Intonational phrasing
6.2 Method
6.2.1 Data
6.2.2 The subjects of the study
6.2.3 Data collection
6.3 Analysis
6.3.1 Data
6.3.2 Auditory analysis and layout of the transcription
6.3.3 Reliability of the auditory analysis
6.3.4 Acoustic analysis
6.3.5 Agreement between the two kinds of analysis
7. Results
7.1 Hannah’s general acquisition path: 2;1 to 2;6
7.2 Laura’s general acquisition path: 2;5 to 4;3
7.3 Adam’s general acquisition path: 3;6 to 5;5
8. The acquisition of nucleus placement
8.1 Hannah: Acquisition from 2;1 to 2;6
8.1.1 Phonological use of nucleus placement at 2;1
8.1.2 Phonetic production of stress
8.1.3 Production of nuclei at 2;6
8.2 Laura: Acquisition from 2;5 to 4;3
8.2.1 Phonological use of nucleus placement
8.2.2 The phonetic production of nuclei
8.3 Adam: Acquisition from 3;6 to 5;5
8.3.1 Phonological use of nucleus placement
8.3.2 The phonetic production of nuclei
9. The acquisition of the system of pitch
9.1 Hannah: Acquisition from 2;1 to 2;6
9.1.1 Phonological use of pitch
9.1.2 Marking of the communicative situation by pitch
9.1.3 Hannah’s acquisition of the systematic use of pitch in questions
9.1.4 The phonetic production of pitch accents
9.2 Laura: Acquisition from 2;5 to 4;3
9.2.1 Phonological use of pitch
9.2.2 Laura’s acquisition of the systematic use of pitch in questions
9.2.3 The phonetic production of pitch accents
9.2.4 Laura’s phonetic pattern
9.3 Adam: Acquisition from 3;6 to 5;5
9.3.1 Phonological use of pitch
9.3.2 Adam’s acquisition of the systematic use of pitch in questions
9.3.3 The phonetic production of pitch accents
9.3.4 Adam’s phonetic pattern
10. The acquisition of intonational phrasing
10.1 Hannah: Acquisition from 2;1 to 2;6
10.1.1 Production of phonetic correlates of intonational phrases
10.1.2 Phonological use of intonational phrasing
10.2 Laura: Acquisition from 2;5 to 4;3
10.2.1 Production of the phonetic correlates of intonational phrases
10.2.2 Phonological use of intonational phrasing
10.3 Adam: Acquisition from 3;6 to 5;5
11. Summary and discussion
11.1 Summary and discussion
11.1.1 Nucleus placement
11.1.2 Pitch
11.1.3 Intonational phrasing
11.1.4 Bilingual acquisition of intonation
11.1.5 Bilingual vs. monolingual acquisition
11.2 The model revised
11.3 The acquisition of intonation in the general language acquisition process
11.4 Outlook and future research
12. References

Linguistische Arbeiten

424

Herausgegeben von Hans Altmann, Peter Blumenthal, Hans Jürgen Heringer, Ingo Plag, Heinz Vater und Richard Wiese

Ulrike Gut

Bilingual Acquisition of Intonation A Study of Children Speaking German and English

Max Niemeyer Verlag Tübingen 2000

Die Deutsche Bibliothek - CIP-Einheitsaufnahme Gut, Ulrike: Bilingual acquisition of intonation: a study of children speaking German and English / Ulrike Gut. - Tübingen : Niemeyer, 2000 (Linguistische Arbeiten; 424) Zugl.: Mannheim, Univ., Diss. 1999 ISBN 3-484-30424-3

ISSN 0344-6727

© Max Niemeyer Verlag GmbH, Tübingen 2000 Das Werk einschließlich aller seiner Teile ist urheberrechtlich geschützt. Jede Verwertung außerhalb der engen Grenzen des Urheberrechtsgesetzes ist ohne Zustimmung des Verlages unzulässig und strafbar. Das gilt insbesondere für Vervielfältigungen, Übersetzungen, Mikroverfilmungen und die Einspeicherung und Verarbeitung in elektronischen Systemen. Printed in Germany. Gedruckt auf alterungsbeständigem Papier. Druck: Weihert-Druck GmbH, Darmstadt Einband: Industriebuchbinderei Nädele, Nehren


Acknowledgements

This work would not have been possible without the generous help of the following people: I am very grateful to Francis Nolan and Sarah Hawkins at Cambridge University for letting me use the Phonetics Laboratory of the Linguistics Department for the instrumental analysis of my data. My thanks go to everyone who made my stays there so productive and enjoyable: To Geoff Potter for his invaluable and untiring technical support, to Tomasina, Steve, Ali, Daniel, Jonny, Eric, Andrew, Juha and all my other friends for their company, help and encouragement and the loan of beds and floor space... In Mannheim, I am very much indebted to Wilfried Schütte from the Institut für Deutsche Sprache for letting me occupy the only computer running xwaves for my various last-minute analyses. Furthermore, I owe thanks to Katrin Lindner for the IPA script, and to Richard Wiese, Erika Kaltenbacher, Jadranka Gvozdanovic, Rosemarie Tracy and Ira Gawlitzek-Maiwald for helpful comments on earlier versions. Thank you also to all my friends and everyone in Mannheim who offered me support and encouragement and who lent me their ears and time over the past 2½ years. Last, but certainly not least, I would like to express my gratitude and indebtedness to my supervisor, Rosemarie Tracy: for supplying me with the data for this study, for her wonderful ability to create an atmosphere which is at the same time friendly, relaxed and conducive to hard work, and for always providing me with so many opportunities.

Notational Conventions

Transcription of Intonation in the British Tradition:

Types of nuclei: / \ A A , 1
rise, fall, rise-fall, rise-fall, low level, mid level, high level

Types of heads: / rising, \ falling, ' high level, , low level

Prehead: T high

Tone groups: | minor intonational phrase boundary, || major intonational phrase boundary

Transcription of Intonation in the AM framework:

Pitch accents
L*     a low pitch accent
H*     a high pitch accent
H*+L   a high plus low pitch accent
L*+H   a low plus high pitch accent
!H*    a high pitch accent following another high pitch accent and slightly lower than this (downstepped pitch accent)

Phrase accents
L-     a low phrase accent
H-     a high phrase accent
!H-    a downstepped phrase accent

Boundary tones
L%     a low boundary tone
H%     a high boundary tone

1. Introduction

This thesis investigates some selected aspects of the acquisition of intonation of German/English bilingual children. So far, most of the research in both monolingual and bilingual language acquisition has focused on the "classic" parts of grammar: Syntax, morphology, segmental phonology, and semantics. Thus, were one to build a model child based on the current knowledge of language acquisition, she or he might acquire the ability to produce novel and syntactically correct utterances using the phoneme inventory and lexicon of the target language(s), but she would not, in terms of intonation, sound like a child learning to speak. For an illustration consider the following utterances produced by Laura, a German/English bilingual child:

[Laura, 2;06]¹
E: Is she? What is she doing?
(1) L: Jifkinda/tsima
she | Kinderzimmer
(she | children's room)

[Laura, 3;05] (playing a board game)
(2) L: ven du ni? .kano² ,k h iraufgamax|dafadu ni$ ,vyaveln bai \mi:a
wenn Du nicht ganz draufgemacht | darfst Du nicht würfeln bei mir
(if you haven't put it on there entirely | you may not roll the dice at my place)

[Laura, 3;10]
(3) L: /luk|ai \dropt it in di 'gwa:s an da da den its \braukan
(look | I dropped it in the grass and then it's broken)

First, given sufficient knowledge of the child's background and her present situation, current theories in bilingualism research may explain why Laura mixes her two languages in (1) but does not mix them in (2) or (3). Second, theories of syntactic acquisition will describe her path from (1), an utterance that lacks a verb and function words, to (2) and (3), which are essentially adult-like structures containing inflectional and complementizer phrases. Finally, theories of phonological acquisition could be employed to explain Laura's substitutions on a segmental level in utterances (2) and (3), such as [gwa:s] for [gra:s] and [kano] for [gans].

¹ The conventions for the transcription of child speech used here are as follows: Age is given in square brackets in [years;months]. Descriptions of the situational context are put in round brackets. Child utterances are transcribed in IPA and adult productions are given in orthography. The index "E:" stands for the English-speaking investigator. An overview of the transcription conventions can be found on the previous page and a detailed description is given in section 6.3.2.

² This symbol refers to a particular lisp which is described in section 6.3.2.1.

In contrast, so far there are only rudimentary hypotheses and ideas which could capture Laura's progress in the area of intonation and which could explain the differences in sentence-level intonation between utterances (1) and (2) and (3). In (1), her intonational phrases (demarcated by the symbol "|") contain no more than a single lexical item. Conversely, in examples (2) and (3) an intonational phrase comprises many more. Furthermore, the intonational phrase boundary is placed at a syntactically relevant position only in (2) and (3) but not in (1). In addition, Laura produces a variety of pitch movements in utterances (1) to (3) (here transcribed with the symbols ","³ for low level; "\" for falling; "/" for rising and "'" for high level). A theory of the acquisition of intonation would have to describe and explain their order of appearance and their linguistic functions. In summary, the change from short and intonationally limited utterances to fluent, native-sounding discourse can be observed in every normally developing child but has so far not attracted enough research to yield a detailed description and reliable predictions. In their recent "Handbook of phonological development", Bernhardt and Stemberger (1998) write that they "do not address sentence intonation, which is notoriously difficult to transcribe reliably and which phonological theories have little to say about" (p. 367). This is especially lamentable because there is a growing demand for research in the area of sentence-level intonation. Firstly, much recent research has been carried out on the acquisition of prosody below the word level: A prosodic hierarchy has been established with mora, syllables, feet and words at the lower levels and compound words at the peak (e.g. Archibald 1995a). An analysis of the acquisition of sentence-level intonation thus supplements and extends this line of research.
Secondly, it has been pointed out that intonational acquisition may provide mechanisms for bootstrapping other linguistic systems such as phonology, morphology and syntax (Jusczyk 1997a; Jusczyk & Kemler Nelson 1996; Peters & Strömqvist 1996). The study of sentence-level intonation may contribute to strengthening these assumptions. Bernhardt & Stemberger (1998) justly mentioned the lack of theory in the area of the acquisition of intonation. In this study, therefore, it will be tested whether the theories and transcription systems developed for the system of adult intonation can be meaningfully applied to child speech, and a first descriptive model of the acquisition of intonation will be presented. The study of German/English bilingual children, moreover, offers the opportunity to address additional theoretical issues. It will provide evidence for some specific questions of bilingual acquisition such as questions concerning the nature of the mental representations of the two languages. Since the differences between the intonation systems of German and English are relatively well established on both the phonological and the phonetic level, it will be possible to decide whether or not bilingual children acquire two separate phonological representations and phonetic production strategies, and the synchronicity of the acquisition paths in both languages can be compared.

The structure of this study is as follows: Chapter 2 presents a definition of intonation and describes the two major transcription systems of sentence-level intonation currently in use: The system of the British tradition and that of the autosegmental-metrical (AM) approach.

³ The conventions of the transcription of intonation used in this study will be described in detail in chapter 2.

Nucleus placement, nuclear tone⁴ and intonational phrasing will be discussed as three of the main linguistic functions of intonation, and the phonetic parameters that underlie their production will be described. It will be argued that the intonation of a language consists of both a system of phonological representations and their phonetic realisations and that, consequently, acquisition must be investigated on two levels: The phonetic one, which involves the physical control of certain phonetic parameters, and the phonological one, where these parameters are applied systematically in order to achieve various linguistic purposes. Subsequently, older and current approaches to the study of bilingual language acquisition will be presented, and aspects of bilingual language acquisition on both the phonological and the phonetic level will be discussed. The chapter closes with a provisional model of the bilingual acquisition of intonation. Chapter 3 focuses on the bilingual acquisition of nucleus placement. It describes the phonological systems of German and English and their phonetic correlates. Data from the acquisition of nucleus placement will be discussed before an outline of the research questions of this study is given. Chapter 4 is concerned with the bilingual acquisition of the phonological system of pitch. After a description of the phonological systems and phonetic production in both German and English, current knowledge about the acquisition process will be discussed, and questions for research will be raised. Chapter 5 provides a description of the phonological systems and phonetic production of intonational phrasing in German and English. It will be shown that, so far, very little is known about their acquisition, and the research questions of this study will be presented.
Chapter 6 gives a summary of the questions and hypotheses of this study that have been developed in the preceding chapters and describes the methodology and the procedure of the data analysis. In chapter 7, a brief overview of the language acquisition of the three children studied here will be given in order to provide a general background for the more detailed results presented in chapters 8 to 10. There, the results concerning the phonological and phonetic acquisition of nucleus placement, pitch and intonational phrasing will be described. Chapter 11 summarises the results of this study and discusses them in the light of other studies. Finally, the provisional model of the bilingual acquisition of intonation presented in chapter 2 will be revised.

⁴ In the following, the term "pitch" will be used instead of "nuclear tone". As the latter implies a phonological system, it might not apply to all pitch movements in early child speech. The emergence of a system of nuclear tone will be described as "phonological use of pitch".

2. Bilingual acquisition of intonation

This chapter explores general aspects of the acquisition of intonation by German/English bilingual children. Intonation is defined by Ladd (1996) as "the use of suprasegmental phonetic features to convey "postlexical" or sentence-level pragmatic meanings in a linguistically structured way" (p. 6). This definition comprises three aspects of intonation: It is described as "linguistically structured" - as a phonological system; this system is used for the linguistic function of conveying meaning; and intonational phenomena are correlated with certain phonetic parameters. These three aspects of intonation will be discussed here. Section 2.1 presents the description of the linguistic systems of English and German intonation in both the British tradition and the autosegmental-metrical approach and introduces their transcription systems. In section 2.2, three of the main linguistic functions of intonation will be discussed: The use of nucleus placement, pitch and intonational phrasing for conveying meaning. The phonetic parameters that underlie these intonational phenomena will be described in section 2.3. Section 2.4 focuses on the bilingual acquisition of intonation. After a description of general aspects of bilingual language acquisition, the acquisition of the phonological system of intonation and the mastery of the phonetic parameters of intonation will be discussed with reference to the specific bilingual task. Section 2.5 summarises these points and presents a first provisional model of the bilingual acquisition of intonation.

2.1 Description and transcription of the phonological systems of English and German intonation

2.1.1 The British tradition

It has been unanimously accepted by now that intonation constitutes a linguistic system. O'Connor & Arnold (1973), who work within the framework of the so-called British tradition of intonation analysis, founded by Palmer (1922), described the pitch patterns of colloquial English as significant and systematic. "Significant" implies that pitch patterns have phonological status: Just as two lexical items differing in one speech sound can have different meanings, two utterances differing only in their intonation can convey different meanings. "Systematic" refers to the fact that there is only a limited number of distinctive pitch patterns with discrete meanings and that the relationship between form and function is stable. The basic unit of the analysis of English intonation in the British tradition is the "tone": A specific pitch movement on a stressed syllable. The last stressed syllable in an utterance, which usually carries the main stress and a distinctive pitch movement, is called the nucleus. Nuclear types that are usually described for English and their transcription symbols can be taken from figure 1. Simple nuclei include falls, rises and a level terminal pitch contour; complex nuclei include fall-rises, rise-falls and rise-fall-rises.

Simple Nuclei      Complex Nuclei
\  fall            V  fall-rise
/  rise            A  rise-fall
-  level           N  rise-fall-rise

Figure 1: Inventory of English nuclei according to Nolan (1994).

Some authors additionally differentiate between high and low falls, high and low rises and high and low levels (Kingdon 1958; O'Connor & Arnold 1973; Cruttenden 1986; Tench 1996). The nucleus in example (4), which falls on do, is simple and a fall.

(4) She'll \know what to \do about it

The stretch from any stressed syllables preceding the nucleus up to the nucleus constitutes the head.¹ Simple, multiple and compound heads are possible in English. Simple heads can have a falling, rising or level form. Multiple heads consist of a sequence of identical stressed syllables, e.g. three falling ones in a row. Compound heads contain a sequence of different tones on stressed syllables as shown in figure 2.

Simple Heads       Compound Heads
\  falling         \+/  falling+rising
/  rising          /+'  rising+level
'  level           /+\  rising+falling
                   '+/  level+rising

Figure 2: Inventory of simple and compound heads in English according to Nolan (1994).

The head in example (4), which stretches from know to to, is simple and falling. Any unstressed syllables preceding the head - or the nucleus if there is no head - are called "prehead". They can be either low or high, with low being the neutral and high the marked form. Only a high prehead receives a symbol, the T. The prehead in example (4), which comprises she'll, is low and therefore unmarked. Any stressed syllables following the nucleus are called the "tail". However, due to its lack of contrastive function, Nolan (1984) argues against the intonational concept of a tail. There is no tail in example (4). The following intonational structure is proposed in the British tradition: The basic units of the intonation structure are the tone units (TU). Minor and major tone units (MU) can be distinguished (Trim 1959). A major tone unit usually coincides with a sentence and is marked by a ||. Utterance (4) is an example of a single major tone unit and is therefore transcribed as in (4a).

(4a) She'll \know what to \do about it ||

(5) \She doesn't | but \I know what to \do about it ||

¹ This use of the term "head" is not to be confused with the head of a phrase in the X-bar theory of syntactic analysis.

Utterance (5) is an example of a major tone unit that comprises two minor tone units: She doesn't and but I know what to do about it. The transcription symbol for a minor tone unit is the |. A minor tone unit must contain a nucleus and has heads, prehead and tail as optional elements. Figure 3 shows the hierarchical structure of intonation as proposed in the British tradition. Optional elements are put in parentheses.

MU
  (TU)  (TU)  TU
    (PH)  (H)  N

Figure 3: The internal structure of a major tone unit as proposed in the British tradition. N stands for nucleus, H for head, PH for prehead, TU for minor tone unit and MU for major tone unit.

Trim (1964) argues that this transcription system of English in the British tradition can be meaningfully applied to the description of German intonation as well. He proposes the following inventory of nuclear tones in German (figure 4):

Nuclear tones: low fall, high fall, low rise, high rise, fall-rise, rise-fall, high level

Figure 4: The inventory of nuclear tones in German according to Trim (1964).

In contrast, some researchers claim that German does not make use of some nuclear forms that appear in English. According to Fox (1981) there is no low rise; other authors claim the absence of a low fall-rise (Raith 1986) or rise-fall-rise (Fox 1984). Typically, for the determination of tone units and the types of heads and nuclei, an auditory analysis is carried out. Auditory analysis within the descriptive system of the British tradition requires training and yields transcriptions such as those illustrated in examples (4) and (5).
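The tone-unit structure described in this section can be illustrated with a small Python sketch. It is not part of the original study: the class and function names are hypothetical, the transcription is simplified to plain ASCII ("||" closes a major tone unit, "|" separates minor tone units), and only falls ("\") and rises ("/") are recognised as tone marks. Under these assumptions, the nucleus of a minor tone unit is simply its last tone-marked word.

```python
from dataclasses import dataclass


@dataclass
class MajorToneUnit:
    minor_units: list  # each minor tone unit is a list of (possibly tone-marked) words


# Assumption: only falls and rises are treated as tone marks in this sketch.
TONE_MARKS = ("\\", "/")


def parse_transcription(text: str) -> list:
    """Split a transcription into major tone units (closed by '||'),
    each holding its minor tone units (separated by '|')."""
    majors = []
    for major in text.split("||"):
        if not major.strip():
            continue
        minors = [m.split() for m in major.split("|") if m.strip()]
        majors.append(MajorToneUnit(minors))
    return majors


def nucleus(minor_unit: list) -> str:
    """Return the last tone-marked word of a minor tone unit, i.e. its nucleus."""
    marked = [word for word in minor_unit if word.startswith(TONE_MARKS)]
    if not marked:
        raise ValueError("a minor tone unit must contain a nucleus")
    return marked[-1]


# Example (5) from the text, in the simplified ASCII notation assumed here.
example_5 = r"\She doesn't | but \I know what to \do about it ||"
units = parse_transcription(example_5)
print(len(units), len(units[0].minor_units))
print([nucleus(m) for m in units[0].minor_units])
```

For example (5) the sketch finds one major tone unit with two minor tone units, whose nuclei fall on She and do. A fuller treatment would also separate prehead, head and tail, which this sketch leaves out.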

2.1.2 The autosegmental-metrical (AM) approach

Intonational analysis within the autosegmental-metrical (AM) framework (Pierrehumbert 1980; Liberman & Pierrehumbert 1984; Beckman & Pierrehumbert 1986; Pierrehumbert & Beckman 1988) postulates three kinds of discrete events that make up the pitch contour of an English utterance: Pitch accents, phrase accents and boundary tones. Pitch accents are characteristic pitch movements, which are associated with prominent syllables. Phrase accents occur at the end of phrases, and boundary tones at the end of utterances. Two levels of tones are proposed, high (H) and low (L), which are the basic constituents of all accents and boundary tones. Figure 5 lists all the different types of accents and boundary tones of English with their notation:

Pitch accents
L*           a low pitch accent
H*           a high pitch accent
H+L*, H*+L   a high plus low (falling) pitch accent
L+H*, L*+H   a low plus high (rising) pitch accent
!H*          a high pitch accent following another high pitch accent and slightly lower than this (downstepped pitch accent)

Phrase accents
L-           a low phrase accent
H-           a high phrase accent
!H-          a downstepped phrase accent

Boundary tones
L%           a low boundary tone
H%           a high boundary tone

Figure 5: The pitch accents, phrase accents and boundary tones as proposed by Beckman & Pierrehumbert (1986).

In addition, there are labels for some rare phenomena:

%H           for a high tone at the beginning of an utterance
%r           for the resetting of the intonation after a phrase boundary
Highest_F0   for the highest value of F0
*?           for a not clearly identifiable pitch accent

Pitch accents are associated with prominent syllables and are marked by a star (*). There are two simple pitch accents (H* and L*) and four compound ones, which are marked by a "+" linking the two tones. In the pitch accents H*+L and L*+H, there is a perceptible pitch movement on the stressed syllable; in the pitch accents H+L* and L+H*, it precedes the stressed syllable. The symbol !H* refers to a downstepped accent, a high pitch accent which, following another H*, is at a lower absolute pitch height due to influences of the general slope in the fundamental frequency contour throughout an utterance (declination). Phrase accents are marked by the diacritic "-" and boundary tones by the "%". Both of them can only consist of a single tone: Either a H or a L. Phrase accents stretch from the last pitch accent of an intermediate phrase to the beginning of the following intermediate phrase or to the end of the utterance. The boundary tone falls exactly on the phrase boundary. Example (4) from above would thus look as follows (4b) in the descriptive system of the AM approach:

(4b) She'll know what to do about it
            H*+L         H*          L-L%

Beckman and Pierrehumbert (1986) propose the following hierarchical structure of intonation (figure 6). The smallest elements are tones (either high, H, or low, L), which, either on their own or in combination with each other, form pitch accents that are always associated with a prominent syllable. One or more pitch accents (PA) together with a phrase accent - which is also either an H or an L tone - constitute an Intermediate Phrase. The Intonational Phrase comprises one or more Intermediate Phrases plus a phrase accent and a boundary tone (again either H or L).

[Tree diagram: an Intonational Phrase branching into optional pitch accents (PA), an obligatory pitch accent PA, a Phrase Accent and a Boundary Tone]

Figure 6: The structure of intonation according to Beckman and Pierrehumbert (1986).
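The hierarchy just described can be sketched as a small data structure. The following Python fragment is only an illustration and not part of Beckman and Pierrehumbert's proposal: the class names, the `labels` method and the decision to store the phrase accent on each intermediate phrase are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IntermediatePhrase:
    """One or more pitch accents followed by a single phrase accent."""
    pitch_accents: List[str]   # e.g. ["H*+L", "H*"]
    phrase_accent: str         # "H-" or "L-"

    def __post_init__(self) -> None:
        if not self.pitch_accents:
            raise ValueError("an intermediate phrase needs at least one pitch accent")
        if self.phrase_accent not in {"H-", "L-"}:
            raise ValueError("a phrase accent is a single H or L tone")


@dataclass
class IntonationalPhrase:
    """One or more intermediate phrases closed by a boundary tone."""
    intermediate_phrases: List[IntermediatePhrase]
    boundary_tone: str         # "H%" or "L%"

    def __post_init__(self) -> None:
        if not self.intermediate_phrases:
            raise ValueError("an intonational phrase needs at least one intermediate phrase")
        if self.boundary_tone not in {"H%", "L%"}:
            raise ValueError("a boundary tone is a single H or L tone")

    def labels(self) -> str:
        """Return the tonal labels in linear order, separated by spaces."""
        parts: List[str] = []
        for phrase in self.intermediate_phrases:
            parts.extend(phrase.pitch_accents)
            parts.append(phrase.phrase_accent)
        parts.append(self.boundary_tone)
        return " ".join(parts)


# The tonal string of example (4b), built from its components.
example_4b = IntonationalPhrase(
    [IntermediatePhrase(["H*+L", "H*"], "L-")],
    "L%",
)
print(example_4b.labels())  # H*+L H* L- L%
```

The sketch prints the phrase accent and boundary tone with a space between them, whereas (4b) writes them together as L-L%; this is purely a formatting difference.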

Within this theoretical framework, the transcription system ToBI (Tone and Break Indices) was developed by Silverman et al. (1992) and Beckman & Ayers (1993) for American English. This transcription system is based on an instrumental analysis of pitch, i.e. an automatic tracking of the fundamental frequency movement within an utterance with the help of a speech analysis programme (see figure 7). An utterance is recorded on a computer as a speech file. The F0 tracker, an inbuilt facility of many speech analysis programmes, calculates the fundamental frequency (F0) contour (the pitch movement throughout an utterance) and plots it in a window, as can be seen in the lower half of figure 7. This representation of the phonetic component of intonation forms the basis of the ToBI analysis. The various pitch accent types, which are marked in the top line of figure 7, are determined by a combination of the interpretation of the F0 line and an auditory verification. Line 2 gives an orthographic representation of the utterance.[2] In line 3, the types of word boundaries are noted, which yield information about the intonational phrasing of the utterance.

Figure 7: Example for the ToBI analysis of an utterance. The top line carries information about the pitch accents, the second line gives an orthographic rendition of the utterance; in line three the types of word boundaries are marked. The F0 line of the utterance is plotted in the window below.

The German ToBI system of prosodic labelling was developed simultaneously at the universities of Saarbrücken, Stuttgart and Braunschweig (Grice & Benzmüller 1995; Mayer 1995; Batliner & Reyelt 1994). The inventory of pitch accents proposed for German can be taken from figure 8:

Inventory of pitch accents
  H*   H+!H*   L*   L+H*   L*+H   H+L*

Figure 8: Inventory of pitch accents in German as proposed in the German ToBI system (Grice & Benzmüller 1995).

[2] See section 6.3.4.2 for a full description of ToBI transcription.

Phrase accents and boundary tones are the same as in English: L- and H- and L% and H%. Uhmann (1988) recognises only 4 pitch accents [3] and two boundary tones for German (figure 9) and altogether dispenses with phrase accents.

Pitch accents:    H*   L*   H*+L   L*+H
Boundary tones:   H%   L%

Figure 9: Pitch accents and boundary tones of German as proposed by Uhmann (1988).

Grabe (1998), in her contrastive analysis of the intonational phonologies of German and English, reduces the categories of pitch accents even further and describes only L*+H and H*+L for both languages. She then, however, assumes a second phonological level before the phonetic one, in which various phonological adjustments can apply.

[3] Her notation of T (for German 'Tief') for a low tone has not been adopted and is here changed into the L symbol.

2.1.3 Compatibility of the two transcription systems

The descriptive system of intonation in the British tradition shows many similarities with the systems based on the autosegmental-metrical approach. Ladd (1996) illustrates the correspondences between nuclear types in the British tradition and pitch accent, phrase accent and boundary tone combinations as proposed by Pierrehumbert. Some examples of these are given in figure 10 below. Roach (1994) reports a partially successful attempt to automatically convert British intonation marks to the ToBI notation on a computer.

The two description systems will be used alongside each other in this study. Auditory analysis in the British tradition will be complemented by an instrumental analysis of selected utterances. For this, the original ToBI notation will be employed for the description of both German and English intonation as this has the advantage of allowing direct comparison of utterances of one child in both languages. The term "intonational phrase" (IP) will be used for both tone units in the British tradition and IPs in the AM approach. The last pitch accent of an IP will be called the nucleus, and nuclear types, assuming near-perfect correspondence, will be given in both the ToBI and the British-style transcription, as exemplified in figure 10.

ToBI notation    British notation
H* L- L%         \  (fall)
H* L- H%         ˅  (fall-rise)
L* H- H%         /  (rise)
L+H* L- L%       ˄  (rise-fall)
L* L- L%         ,  (low level)

Figure 10: Some nuclear tones in ToBI and British-style transcription.

This twofold approach has several advantages: Both notational systems were developed for adult speech and their applicability to child speech has not been tested yet. Thus, the appropriateness of the phonological categories for the description of child speech can be validated by using two different but comparable notation systems. Furthermore, it can be decided which of the two systems suits the description of child speech better. Intonational analysis in this combined approach also allows the minimisation of errors and disadvantages connected with the particular systems. On the one hand, technical requirements for a ToBI analysis are so high that it is not feasible as the only means of analysis for large corpora of data typical for longitudinal acquisition studies. Equally, instrumental analysis alone is fraught with difficulties as a direct mapping of F0 movements to phonological categories of pitch movement patterns is not possible (see section 2.3.2 below). Therefore, instrumental analysis of the fundamental frequency must be complemented by auditory analysis. On the other hand, the reliability of an auditory analysis in the British tradition can be tested by a comparison with an instrumental analysis. Thus, the auditory impression can be validated with the help of the physical measurements carried out by the computer.
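The correspondences listed in figure 10 amount to a simple lookup from a ToBI label sequence to a British-style nuclear tone. The Python fragment below is a minimal sketch of such a lookup, restricted to the five combinations given in the figure. It is not a reconstruction of Roach's (1994) conversion procedure, and the names `TOBI_TO_BRITISH` and `british_nuclear_tone` are hypothetical.

```python
from typing import Optional

# Correspondences taken from figure 10; the dictionary name and the use of
# plain tone names as values are illustrative choices for this sketch.
TOBI_TO_BRITISH = {
    ("H*", "L-", "L%"): "fall",
    ("H*", "L-", "H%"): "fall-rise",
    ("L*", "H-", "H%"): "rise",
    ("L+H*", "L-", "L%"): "rise-fall",
    ("L*", "L-", "L%"): "low level",
}


def british_nuclear_tone(pitch_accent: str,
                         phrase_accent: str,
                         boundary_tone: str) -> Optional[str]:
    """Return the British-style nuclear tone, or None if figure 10 lists no match."""
    return TOBI_TO_BRITISH.get((pitch_accent, phrase_accent, boundary_tone))


print(british_nuclear_tone("H*", "L-", "L%"))  # fall
```

Combinations that do not appear in figure 10 return `None`, which keeps the sketch from claiming a correspondence the figure does not state.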

2.2 The linguistic functions of intonation

O'Connor & Arnold (1973) defined the pitch patterns of English as significant insofar as two utterances which are identical on a segmental level and only differ in their nuclear pitch pattern can have different meanings. These kinds of intonational minimal pairs can be found for the three intonational phenomena illustrated in examples (6) to (11). Other intonational phenomena such as voice quality, speech rate and register do not have phonological status.

(6) \He is going home.
(7) He is going \home.

(8) It's \green.
(9) It's /green.

(10) My brother who lives in Rome is coming.
(11) My brother | who lives in Rome | is coming.

Examples (6) and (7) exemplify the function of intonation to mark focus. The interaction between focus and intonation has been the subject of many studies (e.g. Uhmann 1991; Altmann et al. 1989a, b; Lambrecht 1994) and has turned out to be notoriously difficult to describe. Typically, it seems, focus is marked by intonation (stress in combination with pitch movements), but this does not imply that intonation cannot run counter to the focus of a sentence in some cases. A more detailed discussion of this will follow in chapter 3. If intonation is employed to mark focus, it can have two functions (Dik 1997): Either it marks an informational gap on the part of the speaker (e.g. new vs. old information), or it is used for the purpose of contrast. The marking of both types of focus can be achieved by nucleus placement - the placement of the main stress in a sentence. Example (6), in which the nucleus falls on "he", would only be appropriate in contexts where it is necessary to assign focus to information already referred to, e.g. for reasons of contrast. For example, this could be the case after "They are all having a good time at the party. Only Andrew and Jill are bored stiff." Conversely, example (7) is only appropriate in a discourse where "he" unambiguously refers to the only possible subject, and the focus lies on the new information following sentences such as "Andy has had a hard day at work."

Example (8) is an utterance with a falling pitch movement on the word "green" (transcribed with the preceding "\"), which is the typical intonational form of a neutral statement in both German and English. The same sequence of words, however, uttered with a rising pitch movement (transcribed with the "/" in (9)) on "green" leads to the perception of the utterance as a question. Here, the type of nuclear pitch movement produced has the linguistic function of determining the sentence type and/or of characterising the speech act.

Example (10) shows an utterance spoken in one intonational phrase, i.e. without any inserted pauses (pitch movement is not transcribed here). Compare example (11), where two pauses, transcribed with the symbol "|", are inserted after "brother" and "Rome". Here, the intonational phrasing has altered the meaning of the utterance: Whereas (10) implies that the speaker has more than one brother, (11) suggests that there is only one.
It will be assumed that the linguistic functions of pitch, nucleus placement and intonational phrasing as described here for German and English are among the earliest to be acquired by language learners: The choice and systematic use of the various types of pitch movements is theoretically possible from the one-word stage on. With the production of two-word utterances, nucleus placement becomes linguistically important, and the expansion of utterances from the multiword stage onwards leading to the production of subordinate structures and larger stretches of speech requires intonational phrasing.

2.3 The phonetic correlates of intonation

The acquisition of the linguistic systems of the three intonational phenomena nucleus placement, pitch accents and intonational phrasing requires the acquisition of the physical control of the phonetic parameters underlying their production. Pitch, loudness, length and pause have been identified as the phonetic parameters of intonation (Cruttenden 1986). Section 2.3.1 briefly describes how these parameters are assumed to combine for the production of nuclei, pitch accents and intonational phrasing. The following sections present short descriptions of the physiological, acoustic and auditory correlates of the phonetic parameters pitch, loudness, length and pause.

2.3.1 The phonetic correlates of nuclei, pitch accents and intonational phrasing

Figure 11 illustrates how the phonetic parameters pitch, loudness, length and pause combine for the production of nuclei, pitch accents and intonational phrasing. The nucleus was defined as the last stressed syllable in an utterance. The phonological use of nucleus placement thus requires the phonetic production of stress. Perception of stressedness is usually correlated with an increase in loudness and length of the stressed syllable and a perceptible jump in pitch or pitch movement. A detailed description of the phonetic mechanisms underlying the production of stress in both German and English will be given in section 3.2.

[Diagram with the parameters Pitch, Loudness, Length and Pause]

Figure 11: The combination of the phonetic parameters pitch, loudness, length, and pause for the production of nuclei, pitch accents, and intonational phrases (IPs).

The production of pitch accents involves distinct pitch movement on the syllable carrying the pitch accent and an increase in both length and loudness. The specific combinations of the phonetic parameters used in German and English will be described in section 4.2. Intonational phrasing finally involves the placement of pauses. Furthermore, pre-pausal syllables show increased length, and a resetting of pitch after pauses may occur. The interplay of the phonetic parameters used for intonational phrasing in German and English will be described in detail in section 5.2.

2.3.2 Pitch

For the production of pitch or voicing, the activities of the internal and external intercostal muscles and the diaphragm combine to force air from the lungs to the mouth. This airstream passes through the larynx. The shape of the larynx is determined by the thyroid cartilage resting on the cricoid cartilage. The vocal folds, fine sheets of muscle, run from the inner sides of the thyroid cartilage to the arytenoids. Various muscles control the opening, closing and tensioning of the vocal folds. When they are drawn close together, air pushed upwards from the lungs causes them to vibrate. This vibration consists of regular cycles of opening and closing of the vocal folds, which is an effect of aerodynamic forces and the structure of the vocal folds themselves. Periodic movement of the vocal folds underlies all vowels and voiced consonants. In addition to this creation of sound in the larynx, the position of the articulators, including tongue, jaw and various facial muscles, and the nasal cavities in the supralaryngeal vocal tract lend each sound its specific quality. Complex muscular interactions are necessary for maintaining a steady pulmonary air pressure for speech, and both breathing and the movement of the articulators for each speech sound must - as many experiments show (e.g. Barton & Macken 1980; Bond & Wilson 1980; Sereno et al. 1987; Smith 1994, 1995; Stathopoulos 1995) - be learned and automatized, i.e. neural representations in the motor cortex must be acquired (Lieberman & Blumstein 1988; Ackermann 1998; Ziegler 1998). Thus, the acquisition of control over pitch production necessitates automatized physical activities, which co-ordinate the interplay of the laryngeal muscles and the organs involved in the production of the air stream.

In acoustic terms, pitch can be described as complex periodic waves with specific frequencies and amplitudes. The vibrations of the vocal folds correspond to the fundamental frequency (F0), given in Hertz (Hz), which indicates the number of opening-closing cycles of the vocal folds per second. The acoustic measurement of the fundamental frequency and the perception of pitch height are not proportionally related. For one tone to be perceived as twice as high as another, a significantly greater difference in fundamental frequency is required in a high region than in a low region (e.g. Lindsay & Norman 1981). The just noticeable difference in pitch height is - depending on the pitch region - between 0.3 and 4% of its frequency in Hz ('t Hart, Collier & Cohen 1990) or a semitone (Allen 1983). This, however, also depends on the length of the tone (a minimum of 30 ms). Some of the phenomena which can be described in acoustic terms are not picked up by human perception.
Speech consists in part of voiceless sounds (up to a quarter of the speech signal, depending on the language) without measurable fundamental frequency. In speech perception, however, pitch is perceived as continuous. Additionally, vowels have different inherent fundamental frequencies (Lehiste 1970; Ohala 1978), partly depending on the adjacent consonants, which cannot be perceived by a listener. Another acoustically measurable phenomenon is declination, the constant falling of F0 throughout an utterance (see Kutik et al. 1983; Ladd 1984; Nolan 1995), which is also not picked up by a listener. Neither is the resetting of the F0 line after a pause perceptible. In summary, a hearer employs a number of biologically endowed mechanisms to regulate many aspects of the acoustic signal, which facilitates the task of speech perception.

Pitch is independent of the other linguistically relevant phonetic parameters, whereas perception of loudness depends on pitch and intensity. The interaction between pitch and the other phonetic parameters has been described by many authors. Vaissière (1983) notes that in German and English a high F0 is connected with high intensity (loudness). In order to increase pitch height, the tension in various laryngeal muscles and the subglottal air pressure must be increased. As these mechanisms also regulate loudness, an increase in pitch is associated with an increase in loudness (Fant 1968). Equally, pitch movements often correlate with a lengthening of the syllable, and an inserted pause is usually followed by a subsequent resetting of the F0 line of an utterance.
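The non-proportional relation between F0 in Hz and perceived pitch height can be illustrated with the semitone scale, a standard logarithmic measure of pitch intervals. The sketch below is an illustration only and not a procedure used in this study; the function names are hypothetical, and the logarithmic scale is merely an approximation of pitch perception.

```python
import math


def semitone_interval(f_low_hz: float, f_high_hz: float) -> float:
    """Size of the interval between two frequencies in semitones (12 per octave)."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def percent_change_for_semitones(semitones: float) -> float:
    """Relative frequency increase (in %) that corresponds to a given interval."""
    return (2.0 ** (semitones / 12.0) - 1.0) * 100.0


# A difference of 100 Hz in a low region and of 200 Hz in a higher region
# both span one octave, i.e. 12 semitones.
low_region = semitone_interval(100.0, 200.0)
high_region = semitone_interval(200.0, 400.0)
print(round(low_region, 2), round(high_region, 2))

# One semitone corresponds to a frequency change of a little under 6%.
print(round(percent_change_for_semitones(1.0), 2))
```

On this scale the same perceived interval requires a larger difference in Hz in a high region than in a low one, which is the pattern described above.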

2.3.3 Loudness

Loudness is produced physiologically by stronger activity in the diaphragm and in the intercostal and laryngeal muscles. This causes an increase in subglottal air pressure and may result in a greater tension of the vocal folds. Acoustically, this results in greater movement of the air particles and an increase of air pressure (measured in dynes/cm²). Increased air pressure is reflected in a greater amplitude of the wave form. The term intensity refers to the square value of amplitude and is given in decibels (dB). Perception of loudness varies with the fundamental frequency of a tone (Schmidt 1985) and is given in phon. Vowels have different inherent intensities, which are not picked up by a listener (Lehiste 1970). Despite its role in the production of stress and emphasis, loudness usually is not explicitly transcribed in intonational analysis. In this study, transcription of loudness will be given for all analysed utterances.

Intensity interacts with the fundamental frequency and the length of a speech sound. Variation of intensity is closely connected with variation in pitch height. An increase in pulmonary effort and subglottal air pressure may cause greater tension of the vocal folds, which leads to an increase in pitch height as well as an increase in loudness. High intensity in German and English is also correlated with an increased length of a syllable (Vaissière 1983).
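The relation between amplitude, intensity and the decibel scale mentioned above can be made concrete with a short calculation. The following sketch assumes the usual definition of a level in decibels as ten times the base-10 logarithm of an intensity ratio, with intensity taken as proportional to squared amplitude; the function name and the choice of reference amplitude are illustrative and not taken from this study.

```python
import math


def amplitude_ratio_to_db(amplitude: float, reference_amplitude: float) -> float:
    """Level difference in dB between an amplitude and a reference amplitude.

    Intensity is treated as proportional to the square of the amplitude,
    so 10 * log10 of the intensity ratio equals 20 * log10 of the amplitude ratio.
    """
    intensity_ratio = (amplitude / reference_amplitude) ** 2
    return 10.0 * math.log10(intensity_ratio)


# Doubling the amplitude raises the level by about 6 dB;
# a tenfold amplitude corresponds to 20 dB.
print(round(amplitude_ratio_to_db(2.0, 1.0), 2))
print(round(amplitude_ratio_to_db(10.0, 1.0), 2))
```

Because the scale is logarithmic, equal steps in dB correspond to equal ratios of amplitude rather than equal differences.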

2.3.4 Length

The duration of a segment, syllable or other linguistic unit is identical in terms of production and acoustic measurement. Perception is not necessarily linear, however. In both German and English, lengthening of syllables is used to signal stress (Dogil 1995; Jessen et al. 1995; Sluijter 1995), emphasis or the end of a phrase. As is the case for loudness, length is not transcribed in the intonation systems of either the British tradition or the AM framework. Length interacts with both pitch and loudness. It usually co-occurs with a more pronounced pitch movement, and an increase in intensity generally results in greater length of speech segments.

2.3.5 Pause

Physiologically, a pause is correlated with an absence of both movement of the articulators and vibration of the vocal folds. This happens at the closure period of stops, at phrase boundaries, and at what we call hesitations. A pause interacts with pitch and length in specific ways. For pauses occurring within utterances, pre-pausal syllables show a noticeable pitch movement and often increased length as well. After the pause the fundamental frequency, which dropped during the preceding stretch of utterance, is reset to a higher level. Pauses between utterances are distinctly longer than those within utterances and usually correlate with a phrase boundary.


2.4 Bilingual acquisition of intonation

2.4.1 Bilingual first language acquisition

"Bilingualism" is a term that, over the past decades, has been used with a wide range of meanings. Haugen (1953) refers to bilingualism when a "speaker of one language can produce meaningful utterances in another language" (p. 7), whereas Bloomfield's (1933) definition of bilingualism is "native-like control of two languages" (p. 56). Apart from linguistic ability in a broad sense, the acquisition process of two languages itself is taken as a criterion for (the degree of) bilingualism. Romaine (1995) argues that the term bilingualism should be employed only in the case of more or less simultaneous acquisition of two languages and that cases of children acquiring one language after the other belong to the field of second language acquisition. This study deals with the acquisition of intonation in bilingual first language acquisition (BFLA), a term coined by Meisel (1990) and de Houwer (1990). It refers to the child's exposure to two languages from birth on and to a regular (ideally daily) contact with these languages. However, even within this narrow definition several acquisition scenarios are possible, varying in terms of both the persons and the environment associated with each language. This aspect of bilingual language acquisition is closely connected with the question whether competence in both languages has been acquired fully and to the same extent. Romaine (1995) differentiates various types of bilingual language acquisition depending on sociolinguistic factors, the status of the languages in the community, the parents' native language/s and the parents' language policy when speaking to the child. On the basis of these factors, it is claimed, predictions can be made concerning the acquisition process and its end result - the nature of the mental representations of the two languages in the bilingual brain.
It seems that quality and amount of language input as well as the social context such as the acceptance and prestige of a language in the speech community directly affect the rate and quality of acquisition. Only very few bilinguals seem to acquire fully equal representations of their two languages insofar as they use both languages equally well in all social contexts. Most bilinguals seem to prefer one language over the other, depending on the context. In these cases, the preferred language is called "dominant". However, language dominance is not static but may change with a bilingual's language environment.

2.4.2 Bilingual language representation and processing

One of the principal areas of interest in research in bilingualism is the nature of language representation and language processing. Nearly half a century ago, Weinreich (1953) proposed the now famous three types of bilingualism: The co-ordinate bilingual speaker was thought to have two separate conceptual representations of the two languages; the compound bilingual was assumed to have one shared conceptual system, to which terms from both languages are linked. Weinreich believed that these differences resulted from the way in which the languages had been learned. In the case of a co-ordinate bilingual there was virtually no overlap in the contexts in which the languages were acquired, whereas the compound bilingual learned the languages in high contextual confusion. The third type of bilingualism, the subordinative one, was hypothesised to be a result of second language acquisition at school. Experimental confirmation of Weinreich's distinctions was never achieved, partly because of difficulties with the interpretation of results and partly because of methodological shortcomings such as the choice of subjects and their assignment to the types of bilinguals (see Hakuta 1986; Romaine 1995).

Similarly, neurological evidence for bilingual language representation is inconclusive. A proposed right hemisphere involvement of bilinguals in contrast to a left hemisphere lateralization in (right-handed) monolinguals seems to apply mainly to bilinguals who learned their second language later in life (Genesee et al. 1978; Wuillemin et al. 1994). This claim is furthermore difficult to maintain in view of the conflicting findings and the lack of comparability between the studies (see Paradis 1990, 1992). Equally, the debate between the "extended system" hypothesis, which claims that the same neural mechanisms underlie the two languages of a bilingual, and the "dual system" hypothesis, which assumes separate neural networks for each linguistic level (i.e. phonology, syntax, morphology, semantics) is still unresolved (Paradis 1981).

Psycholinguistic experiments show that the two lexicons of a bilingual are closely connected. Stimulus material in one language results in a parallel low level activation of the other language. In the Stroop test, where subjects are asked to name the colour of the letters of a word, which in itself constitutes a colour term, cross-language interference occurs (Segalowitz 1977; Ehri & Ryan 1980). Similarly, a double activation of both languages for visual word recognition was proposed by Grainger (1993). These experiments provide empirical support for the now prevalent view, derived from sociolinguistic research, that Bloomfield's (1933) "native-like control of two languages" constitutes an unrealistic goal.
Rather, it is assumed that no bilingual will perform in both languages with equal ease and motivation (Tracy 1995). Similarly, claims that speech events belong to a definite language (Weinreich 1953) have been found untenable, and the term "bilingual mode" (Grosjean 1982) was introduced to refer to language processing in bilinguals. This term encompasses the simultaneous activation and accessibility of two languages. In speech to monolinguals, it is usually suppressed. In conversation with other bilinguals, in contrast, output control is relaxed and speakers are in the bilingual mode, where code-switching can occur (Grosjean 1995).

2.4.3 Language representation and processing in bilingual first language acquisition

Language representation and processing have been central issues in research on bilingualism in adults but have been investigated less intensively for bilingual language acquisition. Most evidence comes from studies with monolingual children. Karmiloff-Smith (1992) developed a model describing how (monolingual) children build up and store linguistically relevant representations. Based on the assumption that language acquisition involves more than successful storage and behavioural mastery of linguistic tasks, she argues that the linguistic representations, in the course of language acquisition, undergo redescription "such that they become linguistic objects of attention outside their on-line use in comprehension and production" (p. 48). According to her, initially stored linguistic representations, acquired through an innate linguistic predisposition and attention biases, enable the child to function as a fluent speaker but are not available for metalinguistic reflection. Learning in this first phase is data driven and results in what Karmiloff-Smith calls "representational adjunctions", which do not have any relations to other representations. The representational redescription (RR) model postulates that those linguistic representations, embedded implicitly in linguistic procedures (level-I (= implicit) representations), undergo a process of re-representation in order to become flexible and manipulable (E1-level (= explicit) representations) and to become eventually accessible to metalinguistic reflection and cross-domain relationships with other areas of cognition (E2/3-level representations). Indicators for redescription processes in language acquisition are assumed to be self-repairs and late-occurring errors. In phase 2, learning is assumed to be externally driven with the focus on represented knowledge so that external data can be temporarily disregarded. Balance between internal and external focus is achieved in phase 3. Despite the fact that this model was not explicitly developed for bilingual first language acquisition, it might serve as a descriptive framework for the acquisition of intonation in bilingual first language acquisition.

Evidence for a linguistic predisposition and attention biases of (monolingual) infants towards intonational cues is abundant. Hirsh-Pasek et al. (1987) showed that 7-month-old babies are sensitive to the intonational structuring of their mother tongue. Babies oriented longer to speech with pauses at linguistic boundaries than to speech with randomly inserted pauses. Mehler et al. (1988) report that already four-day-olds are able to discriminate any two languages on the basis of their intonational structure. Using the sucking habituation technique, they could show that the infants were sensitive to the overall prosodic shape of languages. Interestingly, two-month-olds fail to differentiate languages if one of them is not their native one.
In other words, at two months, infants have set some first parameter values concerning the structure of their mother tongue and treat all unfamiliar languages as not fitting this model. In the TIGRE framework, Mehler et al. (1996) propose that babies are specifically endowed to cope with bi- and multilingualism. TIGRE stands for Time and Intensity Grid Representation and is assumed to be the infant's gridlike representation of the vocalic nuclei in the speech signal as they carry important information about stress, rhythm and metrical properties of a language. On the basis of this, infants are assumed to differentiate languages into three classes: Syllable-based, stress-based and mora-based rhythm. Bilinguals growing up with two languages from different classes thus have a powerful tool to keep them apart. It is claimed that bilinguals growing up with two languages of the same class start with the assumption that they are surrounded by only one language and learn to differentiate them only later. However, experimental results still have to substantiate these claims.

Phase 2 of Karmiloff-Smith's (1992) model, applied to the bilingual acquisition of intonation, would then predict the child's emerging awareness of intonational structures. This could coincide with errors in production and the occurrence of self-repairs. In the course of this redescription process, children would become aware of the intonational features of their native languages and be able to comment on them. However, it seems likely that no E2/3-level representations of intonational phonology are possible - research on the teaching of intonation or the acquisition of intonation in a second language suggests that intonation is an area inaccessible to metalinguistic reflection even in adults (e.g. Raith 1986; Trim 1988).

Research in language processing and representation during bilingual language acquisition has been largely centred around the debate on the initial fusion vs. early separation of the two languages. The first view was proposed by Volterra & Taeschner (1978) and many others (Park & Redlinger 1980; Vihman & McLaughlin 1982; Grosjean 1982; Saunders 1982; Taeschner 1983; Vihman 1985; Arnberg 1987), who describe bilingual language acquisition as a progression from a single undifferentiated system to two separate systems. Supporters of the Independent Development Hypothesis (Bergman 1976; Padilla & Lindholm 1984) assume a completely separate acquisition of the two languages from the beginning. Tracy (1996) argues that the early fusion or separateness of the two systems very much depends on the type of languages involved: Whereas in the case of two languages with very similar phonologies she would predict an initial undifferentiated phonological system, this would be less likely in the case of two languages which have little similarity in this respect.

Research in the acquisition of segmental phonology by bilingual children clearly exemplifies the difficulties associated with the study of phonological acquisition in general. The main problem lies in the assessment of whether child productions reflect two separate phonological systems or whether they are a product of an undifferentiated "mixed" mental representation. On the one hand, Burling (1959) and Schnitzer & Krasinski (1994) as well as Leopold (1947) adduce evidence for an initially undifferentiated consonant system in the bilingual children they studied. On the other hand, no such data is reported by Deuchar (1989), Ingram (1981), Oksaar (1970), and Raffler-Engel (1965), which led these authors to assume two separate phonological systems from the beginning. One reason for this apparent contradiction could be that many of the proposed "undifferentiated systems", unfortunately, turn out to be misinterpretations of regular features of phonological acquisition.
This is most probably the case in the above cited studies which found evidence for an initial cross-linguistic consonant system: Schnitzer & Krasinski (1994) reported that their Spanish-English subject showed a phase in which the appropriate vowels were produced in each language, but only one consonant system seemed to exist. This was interpreted as evidence for a "mixed" stage. Deuchar & Clark (1996), however, offer a phonetic explanation for this pattern of acquisition. The voice onset time (VOT) for the Spanish stops /p/, /t/, and /k/ nearly equals that of the English stops /b/, /d/, and /g/ - they are produced with a short lag after the release. This means that the contrast between "voiced" and "voiceless" stops in Spanish is realised by voicing lead vs. short lag, whereas in English it is realised by short vs. long lag. At 1;11, Deuchar & Clark's subjects did not achieve a perceptible voicing contrast in either language. At 2;3, the English contrast was acquired. In Spanish, however, it was hardly perceptible and produced within the short lag. These findings are in line with evidence from monolingual studies which report that Spanish children acquire voicing lead much later than English children acquire their contrast. Thus, whilst English/Spanish bilingual learners may temporarily sound as if they were using the same consonants in their Spanish and English utterances, Deuchar & Clark's study presents clear evidence of a parallel and asynchronous phonological acquisition. Many conclusions of a mixed system could therefore be due to the lack of sophisticated acoustic measurements.

Similar examples for the difficulty of identifying cross-linguistic structures in the acquisition of segmental phonology can be adduced for German/English bilingual children. Consider the following examples from Adam, one of the subjects of this study, a German/English bilingual child growing up in Germany.

(12) [3;6] [we:t] for red
(13) [4;4] [hæp] for have

Example (12) shows final devoicing of the /d/ and could be considered an instance of a cross-linguistic production where the German final devoicing rule is applied to an English lexical item. However, data from monolingual children acquiring English also show final devoicing (Wode 1988; cf. Ingram 1989). Example (13) again might be assumed to be a cross-linguistic mix of the German [ha:b] and the English [hæv]. Yet, other productions by Adam show that he systematically replaces labiodental fricatives with bilabial stops in English (examples 14 and 15):

(14) [4;4] [auba] for over
(15) [4;4] [mu:b] for move

Moreover, this substitution has also been recorded for monolingual children. Assessment of the nature of bilingual phonological representation must therefore be made with great care. This applies even more to the area of intonational phonology, where not much is known about the adult systems yet and acquisition data is scarce.
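The acoustic argument made above with reference to Deuchar & Clark (1996) can be illustrated with a minimal sketch. It sorts voice onset times into the three categories named in the text (voicing lead, short lag, long lag) and checks whether two stops fall into different categories. The category boundaries (0 ms and 30 ms), the sample values and the function names are assumptions made for illustration only; they are not taken from any of the studies cited.

```python
# Illustrative only: thresholds and sample VOT values are assumptions,
# not measurements from Deuchar & Clark (1996) or any other cited study.
LEAD_MAX_MS = 0.0        # VOT below this: voicing starts before the release
SHORT_LAG_MAX_MS = 30.0  # assumed upper bound of the short-lag range


def vot_category(vot_ms):
    """Classify a voice onset time (in ms) as lead, short lag or long lag."""
    if vot_ms < LEAD_MAX_MS:
        return "lead"
    if vot_ms <= SHORT_LAG_MAX_MS:
        return "short lag"
    return "long lag"


def has_categorical_contrast(voiced_vot_ms, voiceless_vot_ms):
    """Return True if the two stops fall into different VOT categories."""
    return vot_category(voiced_vot_ms) != vot_category(voiceless_vot_ms)


# Hypothetical child productions (ms): an English pair realised as short vs.
# long lag, and a Spanish pair produced entirely within the short-lag range.
english_b, english_p = 10.0, 60.0
spanish_b, spanish_p = 5.0, 20.0

print(has_categorical_contrast(english_b, english_p))  # True
print(has_categorical_contrast(spanish_b, spanish_p))  # False
```

In this sketch the Spanish pair counts as "no contrast" even though the two values differ, which mirrors the point that a difference produced within the short lag may exist acoustically without being perceived as a voicing contrast.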

2.4.4 Bilingual acquisition of the phonological system of intonation

In the overviews in de Houwer (1990) and Hoffmann (1991), not a single study in bilingual language acquisition is listed that deals with intonation. Consequently, hardly anything is known about the simultaneous acquisition of two intonation systems. As is implied by the definition of intonation given by Ladd (1996) at the beginning of this chapter, the acquisition of the intonational phonology of a language comprises at least three areas: The acquisition of control over the physiological mechanisms underlying the suprasegmental phonetic features used for intonation; the perception and mental representation of the intonation system and its functions; and the matching of the production to the perceived system. To my knowledge, so far no studies have been carried out which explore any of these areas and which could present evidence for either a separate or a mixed representation of the phonological system of intonation in bilingual language learners.

Some evidence for a bilingual representation of two separate phonological systems on a suprasegmental level comes from a study by Paradis (1998), who investigated the truncation patterns in multisyllabic words by English/French bilingual two-year-olds. Asked to repeat nonsense words of four-syllable length with varying stress patterns, the bilinguals showed the same truncation strategies as their age-matched monolingual peers. These truncation strategies were specific to either English or French and thus reflected the acquisition of the respective phonological systems. However, the bilinguals also differed from the monolinguals in certain ways: In words with a WSWS (weak-strong-weak-strong) syllable structure or words with a WSWW structure, bilinguals did not preserve the syllables in the second position more often than those in third position - a pattern observed in the English monolingual group.
Neither did the bilingual two-year-olds prefer heavy weak syllables over light weak syllables as had their monolingual counterparts. These limited but systematic differences between the two groups led Paradis (1998) to conclude that the phonological systems of the bilingual two-year-olds cannot be assumed to be autonomous but rather interact with each other. The direction of the interaction seems to be determined by language dominance.

2.4.5 Bilingual acquisition of the phonetic parameters of intonation

Interaction between the two systems of bilingual language learners is even more evident on the phonetic level. In terms of language processing, bilinguals seem to have to make subtle adjustments in both perception and production in order to accommodate the two languages. Cutler (1994) describes how the perception of the rhythm of a language may help the infant to segment speech. A strategy for English would be to assume that a stressed syllable signals the beginning of a word. In French, segmentation would be based on the syllable, and in Japanese on the mora. Children growing up as bilinguals, she claims, can only use one strategy. Should the two languages they acquire require different strategies, their one strategy would be misapplied for one language.

This hypothesis was tested with adult French/English bilinguals (Cutler et al. 1992), who had been exposed to both languages from one year on or earlier and who used both languages regularly. They were presented with the same speech perception task as two groups of French and English monolinguals in earlier studies. Results were heterogeneous: The bilinguals behaved neither like monolingual English speakers with the English material nor like French monolinguals in the French part of the experiment. When the subjects were divided into English- or French-dominant speakers, however, Cutler et al. (1992) could show that the French-dominant bilinguals used syllable-based segmentation strategies in French and the English-dominant bilinguals used stress-based segmentation with English. In short, in the dominant language each group resembled monolingual speakers, whereas in the non-dominant language they did not. However, the authors stress that segmentation in the non-dominant language was not imperfect or inferior in any way, but that the subjects simply fell back on general, less language-specific segmentation strategies. This suggests that bilingual language learners may only develop highly specialised segmentation strategies for one of their languages. However, it also seems to corroborate the view that the mental representation of the intonational (rhythmic) structure of the two languages is completely separate.

Furthermore, there is some evidence that bilinguals employ a specific production strategy for speech sounds. Watson (1991) found that the articulation of certain phonemes by bilinguals may differ from that of monolinguals, yet without being perceptually noticeable. The French/English bilinguals he studied showed a systematic difference in onset frequency of the first formant in vowels following voiced and voiceless stops in both languages, a feature which is produced only by English but not by French monolinguals. He also found greater aspiration in French voiceless velars in his bilingual data compared to monolingual average values. Each production, however, was well within the acceptable monolingual range. Bilinguals thus may have different production routines for some phonemes to reduce the processing load; perceptually, however, they will stay well within monolingually acceptable limits. It is quite plausible that a similar strategy would be employed in the phonetic production of intonational phenomena.

2.5 A model of the bilingual acquisition of intonation

Child language acquisition has traditionally been divided into several stages. The first stage, in which the infant produces speech sounds without using them meaningfully (babbling), is commonly referred to as the "prelinguistic stage". At around one year of age, this is followed by the "one-word stage", in which individual words of the target language/s are produced in a meaningful way. This early child speech is variously called "holophrastic" (Oksaar 1977) and "asyntactic". The next stage is that of the concatenation of two words, which is then replaced by the stage of multiword combinations (see e.g. Radford 1990).

Crystal (1981, 1986) proposes a 5-stage model of the development of intonation: He assumes that the intonation of the first vocalisations by infants expresses their emotional state. As a next step, linguistic awareness of the differences in the use of intonation by adults is acquired. The first productions of pitch movement patterns by the infant are assumed to be accidental and meaningless and to correspond to babbling in the phonological development. From the age of six months on, he claims the existence of stable intonation contours without any clearly associated meanings. Intonational phrasing is postulated to begin with the combination of words. Crystal and other authors (Halliday 1975, 1979; Leopold 1947) describe a detailed order of the acquisition of pitch movements and their associated meanings. These claims, however, are based on impressionistic accounts or auditory analyses alone and still await experimental confirmation. The greatest drawback of these models, moreover, is that they describe intonation on a single holistic level of representation and that they make no distinction between the phonological and phonetic aspects of pitch patterns. As discussed at the beginning of this chapter, a model of the acquisition of intonation must incorporate this distinction.
Furthermore, current research has begun to challenge the concept of clearly separable stages and stresses their interrelatedness and the continuity of development. In segmental phonology, a great similarity has been discovered between the frequency of certain speech sounds in babbling and early words, which has led authors to propose continuity between those two phases (Boysson-Bardies et al. 1992; Lleo et al. 1994; Menn & Stoel-Gammon 1995). In suprasegmental phonology, Penner (personal communication) claims continuity in the prosodic development from babbling to object+verb constructions (see also Tracy 1991 for syntax). The model presented below will thus use the term "phase" instead of "stage". No clear division between phases is proposed and overlap is assumed to be possible. For example, it is conceivable that a child progresses to phase 2 in a certain aspect of intonation while it still remains in the first phase in another. In summary, current knowledge of the acquisition of phonological and phonetic processes in (segmental) phonology as discussed above suggests the following provisional model of the acquisition of intonation by German/English bilingual learners (table 1): In the first phase, infants are exposed to the intonation of German and English in their parents' speech. They show attentional biases towards the prosodic structure of both English and German and begin to build up representations of the two systems. At this stage, infants have very little physical control over the production mechanisms of the phonetic parameters of intonation, and although accidental productions of either German or English intonational phenomena may occur, they are not used in a linguistic or purposeful way.

In the second phase, which is assumed to coincide roughly with the onset of speech, control over the physical mechanisms of the phonetic parameters of intonation grows and first rules of the phonological use of intonation are acquired. This can be observed in the systematic application of nucleus placement, pitch accents or intonational phrasing in early words and sentences. The representation of phonological rules is separate for the two languages; however, interaction between the systems is possible. Intonational rules already acquired in the dominant language may be applied in the weaker one so that mixing structures can occur.

Phase I
Phonological level: beginnings of phonological representation
Phonetic level: little control over the phonetic parameters; production of intonational phenomena without linguistic function (intonational babbling)

Phase II
Phonological level: increase of phonological representation: two separate systems with possible interaction and mixing structures; interaction in the direction of the weaker language
Phonetic level: growing control over the phonetic parameters

Phase III
Phonological level: fully acquired phonologies; two separate systems with possible interaction on a non-perceptual level
Phonetic level: complete control over phonetic parameters; possibly simplified production strategy without perceptual consequences

Table 1: A provisional model of the phases of bilingual acquisition of intonation on a phonological and phonetic level.

The last phase of the bilingual acquisition of intonation entails the full representation of the two phonological systems, which may interact in certain situations. At this stage, redescription processes as postulated by Karmiloff-Smith (1992) are beginning to occur. They become observable in the child's metalinguistic awareness of intonational phenomena as well as the production of speech errors and the occurrence of self-repair. Control over the phonetic parameters is complete, but simplification strategies may be employed; however, it is assumed that they will stay within the monolingual range and thus not be perceptible as "deviant".

In the following, this provisional model will be tested. Chapters 3 to 5 deal with our current knowledge on the bilingual acquisition of nucleus placement, pitch and intonational phrasing. Subsequently, the results of this study (chapters 7 to 10), which investigates these aspects in the language acquisition of three German/English bilingual children, will be used as a basis to modify this provisional model (chapter 11).

3. Bilingual acquisition of nucleus placement

This chapter focuses on the bilingual acquisition of nucleus placement. After a discussion of what is known about the German and English phonological systems of nucleus placement (3.1), the specific combination of phonetic parameters employed for the production of nuclei will be investigated (3.2). Section 3.3 presents our current knowledge about the different tasks of a language learner acquiring the phonology of nucleus placement, and section 3.4 gives an outline of research on the mastery of the phonetic production of nuclei. In section 3.5, the research questions of this study will be listed.

3.1 The phonological systems of nucleus placement in German and English

In both German and English, the phonological system of nucleus placement is intricately interwoven with the concept of focus. "Focus" has been defined as a grammatical feature (e.g. Selkirk 1984; Winkler 1994) or a semantic term (e.g. Altmann et al. 1989a) and refers to that part of a sentence which is prominent in terms of content and which forms a contrast to the background information of the sentence (Uhmann 1991).

Firstly, focus can be differentiated in terms of scope (e.g. Dik 1997): Broad focus falls on entire constituents or even whole sentences, whereas narrow focus applies to individual lexical items (Féry 1988; Ladd 1980). Examples (16) and (17) illustrate broad focus (which can be described by the term "normal stress" as Ladd (1996) suggests).

(16) He took the \ball.
(17) Er nahm den \Ball.

In both cases, broad focus falls on the constituent "the ball" or "den Ball" respectively, or it might even be argued that the entire sentence is in focus. In examples (18) and (19) only the words ball and Ball, respectively, have narrow focus.

(18) There were a doll and a ball. He took the \ball.
(19) Da waren eine Puppe und ein Ball. Er nahm den \Ball.

Secondly, focus can be differentiated in terms of pragmatic function. On the one hand, it can be employed to indicate an informational gap on the part of the speaker or an assumed one on the part of the listener. In these cases, as illustrated in (20) and (21), focus is used to mark new vs. old information.

(20) Where is the dog? The dog is in the \house.
(21) Wo ist der Hund? Der Hund ist im \Haus.

On the other hand, focus can have the pragmatic function of indicating contrast, as exemplified in (22) or sentences (6) and (7) in section 2.2.

(22) All three of them had a /go || only \mother won

Here, the nucleus falls on mother although she has previously been referred to in all three of them. By this, the contrast between mother and the rest of the group is achieved. Thus, in instances of contrast, focus can fall on old information. It is generally believed that sentence-stress (and nucleus placement in particular) reflects the intended focus of an utterance. However, Ladd (1996) adduces some examples in which constituents or words are focussed without receiving stress. Conversely, to my knowledge, it has never been argued that a lexical item which carries the nucleus can at the same time not be in focus.

Closely related to the concept of focus is the concept of emphasis. A lexical item is perceived to be emphasised when it receives exceptional stress. Emphasis can be used for pragmatic reasons of contrast (e.g. Bannert 1985), for example to mark a particular item in enumeration. This is illustrated in (23) and (24).

(23) Apples, pears, potatoes and peas.
(24) Äpfel, Birnen, Kar\toffeln und Erbsen.

The distinction between emphasis and stress or focus is difficult to make, which is mainly due to the fact that emphasis is defined phonetically rather than grammatically or semantically (see e.g. Gut 1995). The phonological rules of nucleus placement seem to be very similar in German and English. In both languages, the nucleus is associated with a distinctive pitch movement and, in an unmarked utterance, falls on the last word. However, not any word level category can receive nuclear stress. It is far more common for words of a lexical than a non-lexical category (Markus 1992). Crystal (1969) found that the head of English NPs and VPs receives the nucleus as a default and that any other nucleus placements are perceived as marked. Thus, in the majority of cases, unmarked utterances will have the nucleus on the last noun in both German and English (see also Gibbon 1998; Ladd 1980). This clearly applies only to very simple sentences such as simple main clauses. The description of most of adult speech would thus require elaborate additions to this nucleus placement rule. It might, however, serve well for the description of a great proportion of child utterances. From this follows one area of contrast: In German, because it is a V2-language, sentences with verb-end position are possible, for example sentences containing modals or participial constructions, where a noun then appears in the penultimate position. In these cases, at least in unmarked conditions, the nucleus is still on the noun, albeit in penultimate position (example 25). In English, in contrast, word order prevents such cases (example 26). (25) Er kann den \Ball sehen. (26) He can see the \ball.
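The simplified default rule described above (nucleus on the last noun of an unmarked utterance) can be written down as a short sketch and applied to examples (25) and (26). The part-of-speech labels, the fallback to the last lexical word when no noun is present, and the function names are assumptions of this sketch; like the rule itself, it is only meant to cover very simple main clauses.

```python
# Hypothetical part-of-speech labels; the tagging itself is taken as given.
LEXICAL = {"NOUN", "VERB", "ADJ", "ADV"}


def default_nucleus(tagged_words):
    """Return the index of the word that carries the nucleus.

    Simplified rule from the text: the nucleus falls on the last noun.
    Fallbacks (assumptions of this sketch): the last lexical word if there
    is no noun, otherwise the last word. Expects a non-empty utterance.
    """
    for wanted in ({"NOUN"}, LEXICAL):
        for i in range(len(tagged_words) - 1, -1, -1):
            if tagged_words[i][1] in wanted:
                return i
    return len(tagged_words) - 1


def mark(tagged_words):
    """Render the utterance with a backslash before the nuclear word."""
    nucleus = default_nucleus(tagged_words)
    return " ".join(
        "\\" + word if i == nucleus else word
        for i, (word, _) in enumerate(tagged_words)
    )


german = [("Er", "PRON"), ("kann", "AUX"), ("den", "DET"),
          ("Ball", "NOUN"), ("sehen", "VERB")]
english = [("He", "PRON"), ("can", "AUX"), ("see", "VERB"),
           ("the", "DET"), ("ball", "NOUN")]

print(mark(german))   # Er kann den \Ball sehen
print(mark(english))  # He can see the \ball
```

Because the rule looks for the last noun rather than the last word, the German verb-final clause and its English counterpart both receive the nucleus on the noun, in penultimate and final position respectively.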

Other differences between nucleus placement in German and English are not well attested. However, Gut (1995) found that English speakers put the nucleus on phone in the utterance

(27) The \phone rang.

whereas German speakers in the equivalent German utterance (28) placed it on the verb. (28) Das Telefon \klingelte.

When reading the English text, the German speakers also placed the nucleus on the verb, i.e. produced (29): (29) The phone \rang.

Equally, the utterance (30) received nuclear stress on Hi by all English speakers: (30) Hi Nick.

German speakers placed the nucleus on Nick in both the German equivalent utterance and the English one. These data provide some evidence for a systematic difference between the phonological use of nucleus placement in the two languages, and it can be concluded that bilingual learners will have to acquire two systems with different phonological rules.

3.2 The phonetic production of nuclei

A nucleus is defined as the last stressed syllable of an utterance which carries a distinctive pitch movement. Stress can occur on various levels. Firstly, it is an abstract property of a word, which will be realised when the word is spoken in isolation. There, it can have the function of lexical disambiguation as in 'object (the noun) and ob'ject (the verb) in English and the word pair ueber'setzen (translate) and 'uebersetzen (take across) in German, which are distinguished by the place of stress. Secondly, in connected speech, word stress becomes a potential place for sentence stress. The distribution of stress on the sentence level has been proposed to follow generative rules (Chomsky & Halle 1968 for English and Bierwisch 1966 for German). Esser (1978) suggests that its function is communicative differentiation of the individual lexical items and a marking of theme and rheme.

Emphasis was cited as one of the linguistic functions of nucleus placement. Emphasis is the term for prominence assigned to a lexical item and is used for contrastive focus, irony and other functions. Although emphasis has its significance attached to the whole word, it is strictly speaking just a feature of the emphasised syllable, which normally carries the word stress. It will be assumed that the difference between stress and emphasis is of a quantitative rather than a qualitative nature (cf. Liberman & Pierrehumbert 1984).

Most authors agree that stress and emphasis are produced by a combination of the parameters pitch, loudness and length (e.g. Cruttenden 1986; Batliner 1989b; Lehiste 1970), to which de Jong (1995: 491) adds "a formant structure in vowels which is more different from those of a uniform tube than their unstressed counterparts", but only rudimentary attempts have so far been made to describe the complex interplay of these features in detail. This is probably largely due to the fact that these parameters are closely connected in physiological terms (see also section 2.3).

Two mechanisms have been discovered which are involved in the production of stress. One is an increased effort of muscular activity in the intercostals and the diaphragm which leads to an increase in subglottal air pressure (Lehiste 1970). The other is a tensioning of the vocal folds (Fant 1968). Monsen, Engebretson & Vemula (1978) investigated the effects of both these mechanisms on the fundamental frequency by using a phonation model based on a two-mass model of vocal-fold vibration, in which the parameters could be controlled independently. It was found that an increase in air pressure resulted in an increase of intensity without showing a large effect on the fundamental frequency. An increase in vocal-fold tension, conversely, produced a large change in F0, but resulted in only a small decrease of the intensity. No evidence, however, has yet been brought forward which would clarify the extent to which speakers can control these mechanisms independently. It is intuitively clear that pitch and intensity can be controlled separately if attention is drawn to them during a speech act. The question of how they act together in unmonitored spontaneous speech, however, is of greater interest. Lehiste (1970) proposes that subglottal pressure peaks are closely related to the production of emphasis as opposed to that of word stress, which is more associated with pitch change.
Sundberg, Elliot, Gramming & Nord (1993) discovered that during a recital of a poem the subglottal pressure pulses are synchronised with the voiced consonants of the emphasised syllable. Monsen et al. (1978) conclude from their experiments with speakers phonating into a tube, which acted as a continuation of the vocal tract and yielded glottal waves, that subglottal air pressure regulates fundamental frequency changes within syllables whereas stress, which is reflected in pitch changes between syllables, is associated with both high vocal-fold tension and high subglottal pressure. Vaissière (1983) furthermore suggests that high intensity of a syllable is correlated with a longer duration of the syllable as the general increase in articulatory effort requires additional time. De Jong (1995) describes the production of prominence in English as a localised hyperarticulation. Using an x-ray microbeam facility to record the movements of jaw, tongue, lips and the shape of the vocal tract, he showed that the motions of the articulators, usually characterised by motor economy in unstressed conditions, were greatly increased in order to produce stress, which then resulted in an increased distinctness. Longer duration thus seems to be the physical requirement for a clearer enunciation (cf. Erickson 1998; Erickson et al. 1998).

There is some evidence that different relative importance is attached to the parameters pitch, intensity and duration for the production of stress in different languages (e.g. Gut 1995). Jones (described in Lehiste 1970) showed that the identification of word stress in a foreign language poses problems for subjects. Scuffil (1982) demonstrated the same for emphasis. He presented English and German subjects with sentences such as "What /are they \doing?" where / signifies a rising head and \ a falling nucleus. Asked to mark the most prominent word in the sentence, a significant number of German native speakers chose the last word, whereas the majority of English speakers agreed on the second word. In addition, in a sentence with the structure 1 2 '3 /4, many Germans heard a prominence on the third word, which led Scuffil to conclude that, for German speakers, low pitch inhibits perceptual prominence. The relative importance of pitch, intensity and duration for the production of stress and emphasis in English and German will be discussed in the following.

3.2.1 Pitch and pitch movement during stressed syllables

A perceptible rise in pitch is commonly described as the most important cue for stress in English. For German, Kuhlmann (1952) characterises nuclear stress as either a particularly high or low pitch in comparison to the adjacent syllables. Antonsen (1966) and von Essen (1964) consider extra pitch height as most important for the production of emphasis, and Batliner (1989a) and Bannert (1985) report experimental evidence for a higher relative fundamental frequency on stressed syllables. Gut (1995) found that pitch was the most important cue for stress for English speakers. There is some evidence that different heights of pitch accents signal different kinds of focus (e.g. single vs. double focus), at least when associated with rise-falling pitch movements (Liberman & Pierrehumbert 1984; Pierrehumbert & Hirschberg 1990; Ladd & Terken 1995; Rump & Collier 1996).

3.2.2 Intensity

Whereas in English intensity is considered to play a subordinate role to changes in fundamental frequency in matters of stress, more importance seems to be attached to it in German. Trim (1988) claims that in German the relative importance of emphasised syllables is determined by loudness and predicts the highest volume on the most emphasised one and less loudness on all other syllables including the nucleus. Batliner (1989a) found that the use of intensity is a characteristic of only some speakers. In his experiment, only three of six speakers used loudness for purposes of emphasis and then only in declarative sentences but not in questions. Gut (1995) found that some German speakers use only intensity to mark emphasis, whereas others produce a combination of intensity and pitch. Claßen et al. (1998) show that the measurement of the overall intensity of vowels might be too crude to pick up characteristics of stressed vowels. They suggest that stress is reflected in the spectral tilt, the relationship between amplitudes in the low frequency and the mid- to high-frequency domain of a vowel's spectrum.

3.2.3 Length

There are currently no languages known which use length as the most important phonetic cue for stress. Lehiste (1970) postulated this for Serbo-Croatian, but this claim was rejected by Gvozdanovic (1980). In English, length merely seems to play a subordinate role and may just constitute a by-product of the clearer articulation of the emphasised syllable. In German, the role of length for the production of stress is probably more significant: Bannert (1985) found that two of his four subjects used length as a signal of stress, and Batliner (1989a) reports from his investigation that prominent syllables are slightly longer than unstressed ones. Dogil (1999; see also Kohler 1995) argues that length is the only reliable correlate of word stress in German. In English, emphasised syllables are always characterised by a full vowel. The word of occurring in an unstressed position in a sentence is pronounced with a schwa (ə). An emphasised of, conversely, is pronounced with a full vowel (ɒ), which requires a greater and more precise displacement of the tongue than a schwa and therefore more time for its production. This is also true for German; however, unstressed vowels show less reduction than in English.

In summary, the phonetic realisation of nuclear stress or emphasis differs between German and English to some extent. Whereas pitch plays the most important role in English, with loudness and length having only minor functions, it seems that stress in German can be produced by an increase in loudness alone. These phonetic differences are associated with perceptually distinguishable acoustic events (Trim 1988), which means for bilingual learners that a "mixed" production strategy would be perceived as deviant from the monolingual norm.

3.3 The acquisition of nucleus placement

The acquisition of the phonology of nucleus placement requires the mastery of several phenomena: An utterance may consist of a string of words. Each word has a clearly defined syllable, on which word stress is produced when the word is spoken in isolation. This stress pattern is the property of the word and is likely to be stored in the lexicon with it. The first task of a language learner is therefore to acquire the language-specific rules of word stress. In an utterance however, not every word stress is realised. Some word stresses are suppressed, especially in casual and fast speech. Even in slow speech, function words typically do not receive stress, which leads to the production of a language-specific rhythm, an alternation between stressed and unstressed syllables. The second task of a language learner is thus to realise only certain of the word stresses when stringing words together. By definition, the nucleus is the last stressed syllable of an utterance or sentence. In order to use nucleus placement phonologically with the linguistic purposes of emphasis and the various forms of focus such as contrast described above, children have to master the third task of recognising the specific status of the nucleus and of acquiring the placement rules. Prior and parallel to all this, the phonetic parameters used for the production of stress must be controlled physically. A language-specific combination of pitch, length and loudness must be acquired for the production of stress and emphasis, and the nucleus must be marked by an additional distinct pitch movement. For German/English bilingual learners, two systems of word stress rules and nucleus placement rules as well as two production strategies of nuclear stress must be acquired. The following sections describe what is known about these acquisition tasks so far. Unfortunately, data from bilingual

children have not been collected yet so that all the evidence cited comes from monolingual acquisition.
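The definition of the nucleus as the last stressed syllable of an utterance can be illustrated with a minimal sketch. The representation of an utterance as a list of (syllable, stressed) pairs and the function name `find_nucleus` are assumptions made for illustration only; the stress marks in the example utterance are likewise hypothetical and not taken from any of the studies cited.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: one (syllable, is_stressed) pair per syllable.
Syllable = Tuple[str, bool]


def find_nucleus(utterance: Sequence[Syllable]) -> Optional[int]:
    """Return the index of the last stressed syllable, or None if none is stressed."""
    for index in range(len(utterance) - 1, -1, -1):
        if utterance[index][1]:
            return index
    return None


# Illustrative utterance "the monkey eats the banana"; function words are unstressed.
example = [
    ("the", False), ("mon", True), ("key", False), ("eats", True),
    ("the", False), ("ba", False), ("na", True), ("na", False),
]
nucleus_index = find_nucleus(example)
print(nucleus_index, example[nucleus_index][0])
```

The sketch covers only the positional definition; it says nothing about the placement rules for focus and contrast, which shift the nucleus away from this default.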

3.3.1 The acquisition of word stress

Fikkert (1994, 1995) and Fikkert, Penner & Wymann (1998) propose a model of the acquisition of word stress in Dutch and German with the following phases (figure 12): In the first phase, all disyllabic words are reduced to the stressed syllable and only this is produced. A German example would be the production of ['ta] for Katze ['katsə] (cat). In phase 2, disyllabic words with the stress pattern strong-weak (SW) are produced correctly; the production of words with the stress pattern WS, in contrast, shows a truncated first syllable and the production of only the stressed syllable. This also occurs with trisyllabic words of the SWS stress pattern, where only the last strong syllable is realised. Examples would be the production of [wa:f] for the English word giraffe and ['te:n] for the German word Kapitän [,kapi'te:n] (captain). In this phase, weak syllables may be added to the strong syllable of iambic (WS) words, while the first syllable is truncated, thus turning the word stress into a trochaic (SW) pattern (['wa:fa] for giraffe).

[Figure 12 table: child forms in Phases 1 to 6 and adult forms (Formen der Erwachsenen) for the Dutch words baby 'Baby', konijn 'Kaninchen', olifant 'Elephant' and kapitein 'Kapitän'.]

Figure 12: The six phases of the acquisition of word stress as proposed by Fikkert, Penner & Wymann (1998) with examples from Dutch.

The reduction of words with more than two syllables to this pattern, further supported by the listening preference for trochees found by Cutler et al. (1993) in 9-month-olds, has led many authors to postulate a bias towards trochaic patterns (Allen & Hawkins 1978, 1980; Echols & Newport 1992; Leopold 1947; however see Vihman et al. 1998), and an initial template with a trochaic form and a simplified structure has been proposed (Archibald 1995b; Fee 1995; Gerken 1994; Wijnen et al. 1994). A model of the prosodic structure of early child word forms with the initial minimal word form (Wd min) as proposed by Fee (1995) is shown in figure 13: The word consists of a single foot (F), which comprises two syllables (σ), both of which are monomoraic (μ).

Figure 13: The minimal word form as proposed by Fee (1995).

This basic structure becomes apparent in child word forms that constitute mismatches to adult forms such as ['naenas] for [ba'naena], where the leftmost mora captures the leftmost stressed syllable and the unstressed [ba] is truncated, as can be seen in figure 14.


Figure 14: The prosodic structure of the child production [naenae] for "banana" as proposed by Fee (1995).

Phase 3 of the acquisition of word stress sees an over-generalisation of this trochaic pattern so that a stress shift in iambic words may occur (e.g. ['ku:nein] instead of [ko:'nein], Dutch for rabbit). In phase 4, trisyllabic words of the SWS kind are not truncated any more, but the two strong syllables receive level stress. In the last phases, children produce iambic word stress correctly as well as words with an SWS pattern that show main stress on either the first or the last strong syllable. The single quantity-insensitive trochaic foot, which constituted the proposed minimal word form (figure 13), has been expanded to include more than one foot in view of evidence of disyllabic target language words stressed on the second syllable, as Fikkert (1995) describes. This model was developed for Dutch and has been validated for Swiss German and Standard German (Dahmer 1997). It still needs confirmation with English data and has yet to be tested for German/English bilingual language acquisition. Phase 4 is estimated to be reached around age 2;2, which means that children already produce many word combinations before they have acquired the rules of word stress in their language.
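The treatment of disyllabic words in the first three phases described above can be summarised in a small sketch. The encoding of stress patterns as strings of S and W and the function name `child_stress_pattern` are assumptions for illustration; the sketch also treats each outcome as obligatory, whereas the model describes the phase 3 stress shift only as something that may occur, and it leaves out the optional addition of a weak syllable in phase 2.

```python
def child_stress_pattern(adult_pattern: str, phase: int) -> str:
    """Predicted child stress pattern for a disyllabic adult word (SW or WS)."""
    if adult_pattern not in ("SW", "WS"):
        raise ValueError("only disyllabic SW and WS patterns are covered")
    if phase == 1:
        # Phase 1: only the stressed syllable is produced.
        return "S"
    if phase == 2:
        # Phase 2: trochees are correct, iambs lose their weak first syllable.
        return "SW" if adult_pattern == "SW" else "S"
    if phase == 3:
        # Phase 3: the trochaic pattern is over-generalised to iambic words.
        return "SW"
    raise ValueError("only phases 1 to 3 are covered by this sketch")


for phase in (1, 2, 3):
    print(phase, child_stress_pattern("SW", phase), child_stress_pattern("WS", phase))
```

Trisyllabic SWS words and phases 4 to 6 are deliberately excluded, since the description above does not specify every pattern for every phase.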

3.3.2 Transition to sentence-level stress¹

As stated above, word combinations are produced when children are still in the phase of level stress production of adult iambic and trisyllabic words. It seems that early two-word utterances are produced within this prosodic frame as well: Penner (personal communication) observed this for early object+verb constructions in German. The first productions show level stress, i.e. equal stress on both words, and only later renditions with a single nucleus become possible. He claims that the acquisition of these early word combinations can be compared to the acquisition of the prosodic structure of compounds and that a representation of syntactic structure is not required yet. Similarly, Oksaar (1977) and Bloom (1971) report that the process of stringing two words together begins with two equally stressed lexical items, which are initially even separated by an intonational pause. Crystal (1986) describes the transition from single- to two-word utterances as a gradual process divisible into two phases. The first constitutes a simple concatenation of two words still separated by a short pause. The second is characterised by an integration of the two words into a single tone unit or intonational phrase. Usually, one word has a greater intensity than the other, and in 90% of the cases this is the second one. Thus, stress in these early word combinations is not yet produced at a rhythmic or sentence level but still within the prosodic frame of word stress. Probably, the final phase of word stress acquisition has to be reached before sentence-level stress can be produced. Evidence for a complete reorganisation of the speech planning process in monolingual acquisition at the transition to longer utterances comes from several authors: The beginning of word combinations seems to coincide with a restructuring process in word stress.
Snow (1994) describes a regression to isochronicity of syllables in the production of single words, and Archibald (1995b) and Fikkert (1994) also claim that adult-like word stress is preceded by a phase of equal stress distribution on the syllables of a word. Elbers & Wijnen (1992), working on the basis of Levelt's (1989) speech production model, describe this phase as the transition from semantic to syntactic speech, which requires the development of a "formulator", a mental device encoding grammar and phonology in speech production. The transitional phase is characterised by many hesitations, restarts and stuttering and reflects the increasing control of speech planning processes. Especially in English, the transition to sentence-level stress is associated with syllable reduction. Allen & Hawkins (1978) differentiate between three levels of stress in syllables: Accented syllables receive both stress and pitch movement, heavy syllables are stressed without pitch movement, and light syllables are neither stressed nor show pitch movement. Child speech at around two years of age consists only of accented and heavy syllables. Even at age 3;9, only two thirds of syllables that would be light in adult speech were

produced appropriately. To my knowledge, no equivalent data have been collected for the acquisition of German or for bilingual learners.

¹ In the following, the term "sentence-level stress" will refer to any type of stress above the word level and thus include phrase-level stress.

3.3.3 Acquisition of the phonological rules of nucleus placement

Very few studies have been carried out with the object of investigating the acquisition of the phonological rules of nucleus placement. Cruttenden (1986) adduces examples for systematic variation of nucleus placement with the semantic content of a two-word utterance. If the child wishes to express a locative relationship with the utterance "Daddy garden", the nucleus falls on the second word. If, however, possession is expressed, the nucleus falls on the first word. In longer stretches of speech consisting of a few utterances it can furthermore be observed, he claims, that new information is more likely to be marked by stress than old information. Unfortunately, his description constitutes an impressionistic account rather than a systematic study. Purnell (1997) reports empirical evidence that the function of focusing by nucleus placement is acquired in the stage of early word combinations. She studied nucleus placement in two-word utterances where her subjects provided answers to questions such as "Who is eating the banana?" and "What is the monkey doing?". Responses were "\Monkey banana" in the first and "eat \banana" in the second case, showing an N+N (noun plus noun) and a V+N (verb plus noun) structure respectively, with the nucleus always on the new information. However, experimental data by Schoeler (1987) show that the acquisition of the phonology of nucleus placement continues well into puberty. He tested the metalinguistic awareness of children concerning the nucleus placement in questions, which determined the marking of new and old information in the context. One situation was given to the subjects, in which two friends saw someone swimming in a lake. The subjects were asked whether question (31) /Schwimmt Peter dort (/Is Peter swimming there) or question (32) Schwimmt /Peter dort (Is /Peter swimming there) was appropriate.
At age 7, subjects were only slightly better than chance (56%), and at age 15 only 82% of the replies were correct. This suggests that this development continues beyond 15 years of age. Decisions on the appropriateness of nucleus placement in statements proved to be even more difficult for Schoeler's subjects. In production, nucleus placement in questions was correct in 95% of the cases for 9-year-olds and 90% for 14-year-olds. Nucleus placement in statements was mastered by 68% at age 9 and 74% at age 14.

3.4 Mastery of the phonetic production of nuclear stress and emphasis

Allen & Hawkins (1978) measured the phonetic production of nuclear and non-nuclear accented and heavy syllables of three children acquiring English between 2;8 and 3;4. Despite difficulties in classifying syllables as nuclear or non-nuclear, they concluded that nuclear accented syllables show greater length than non-nuclear accented and non-nuclear heavy ones. Considerable pitch movement was found in all stressed syllables irrespective of their status as nucleus or not. Intensity was not measured. Pitch movement thus seems to be the most important attribute of stress in English child speech at around three years with length differentiating between nuclear and non-nuclear stress. Pollock et al. (1993) cite evidence for the hypothesis that major developments in the acquisition of the phonetic production of word stress occur between the ages two and three. They compared the combination of the phonetic parameters pitch, loudness and length in the production of stress in disyllabic nonsense words produced by children aged two, three and four years. Acoustic measurements showed that the three- and four-year-olds consistently used higher pitch and increased loudness for the marking of stress. Stress production by the two-year-olds was far more variable. To begin with, only 67% of their productions could be reliably classified as containing a stressed syllable. In other words, in a third of their words adults could not detect or agree on stress placement. The acoustic analysis revealed that, although there were differences in fundamental frequency and intensity between the two syllables of the word, the magnitude of the change was significantly smaller than for those words where stress was clearly perceptible. Yet, even in those productions with perceptible stress placement, two-year-olds did not use pitch and loudness systematically. 
Although stressed and unstressed syllables differed in pitch height and intensity, this was in the direction predicted by the adult target in only 55% of the cases. In summary, the phonetic production of stress by two-year-olds is still very unstable. Words produced without perceptible stress or with stress patterns other than that suggested by the target indicate that physical control over the production of pitch and intensity has not been mastered yet. In terms of the phonetic parameter of length, Pollock et al. (1993) could replicate the findings by Allen & Hawkins (1978). Unstressed syllables produced by three- and four-year-olds were significantly shorter than those produced by two-year-olds. The mean length of unstressed vowels in final position decreased from 25 cs at two years to 21 cs at four years of age. In non-final position, the length of the unstressed vowel decreased from 18 cs for the two-year-olds to 12 cs for the four-year-olds. To my knowledge, no data are available for the phonetic correlates of stress production by German children. Neither have bilingual children been tested yet.
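The size of the shortening reported by Pollock et al. (1993) is easier to compare across positions when it is expressed as a relative decrease. The following sketch merely recomputes the four mean durations quoted above; the function name `relative_decrease` is an assumption for illustration, and no further data from the study are used.

```python
def relative_decrease(before_cs: float, after_cs: float) -> float:
    """Percentage by which a duration in centiseconds has decreased."""
    return (before_cs - after_cs) / before_cs * 100.0


# Mean unstressed vowel durations at two and at four years of age, as quoted above.
final_position = relative_decrease(25, 21)
non_final_position = relative_decrease(18, 12)

print(f"final: {final_position:.1f}%, non-final: {non_final_position:.1f}%")
# prints "final: 16.0%, non-final: 33.3%"
```

On these figures, the reduction of unstressed vowels between two and four years is roughly twice as large in non-final as in final position.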


3.5 Research questions

From the above it can be seen that the acquisition process up to the word level is very well attested, at least for monolingual acquisition. The acquisition of sentence-level stress and with it nuclear stress has not been the subject of many studies, and data concerning the acquisition of the phonological rules of nucleus placement or the phonetic production of nuclei are very scarce in the case of monolingual children and non-existent in the case of bilingual ones. This study aims to fill some of these gaps: The acquisition of the phonological rules of nucleus placement by German/English bilingual children will be investigated, and their phonetic production during acquisition will be explored. In detail, it will be tested how the transition from word-level stress to sentence-level stress is achieved by language learners, when and in which order the phonological rules of nucleus placement are acquired, and which phonetic parameters of intonation are used for the production of nuclei. With reference to the specific bilingual acquisition process, it will be investigated whether the phonological systems are acquired at the same rate and in the same order, and whether interactions between the systems occur. On the phonetic level, cross-linguistic differences and similarities in the production of stress will be observed. Chapter 6 summarises all of the research questions of this study.

4. Bilingual acquisition of the system of pitch

This chapter is concerned with the bilingual acquisition of the phonological system of pitch. Section 4.1 describes the inherent meanings the various types of pitch movements are assumed to have and explores the different ways of phonological use of pitch in the adult systems of German and English, and section 4.2 compares the phonetic productions of pitch accents in both languages. Section 4.3 focuses on the acquisition of phonological rules for the linguistic use of pitch, and data from both monolingual and bilingual studies will be discussed with reference to the theoretical framework and methodology employed. Finally, section 4.4 lists what is known about the acquisition of the phonetic production of pitch accents, and section 4.5 develops the research questions posed in this study in the area of the bilingual acquisition of pitch.

4.1 The phonological systems of pitch in English and German

Many authors assume inherent meanings of the various types of nuclear pitch movements identified in the descriptive systems of both the British tradition and the AM approach. Cruttenden (1986) writes that a fall signals finality and completeness and a rise-fall finality and being impressed. A fall-rise is claimed to express reservation, emphatic contrast and contradiction, and a rise uncertainty, menace or a non-committal attitude. Equally, Gussenhoven (1983) suggests that nuclear tones have an inherent and independent meaning. His rationale is that the intonation system must be made up of distinctive elements (which he calls morphemes) and rules that operate on them and that these elements, i.e. the nuclear tones, have distinct meanings as the system would otherwise not be learnable. He distinguishes three basic nuclear tones with morphemic status: The fall (\), the rise (/), and the fall-rise (V), which can all be used both literally and metaphorically. The basic meanings are:

\  speaker adds something to the background (the hearer's knowledge)
/  speaker selects something from the background
V  speaker does not know whether it belongs to the hearer's background

According to the autosegmental approach, each intonational phrase has a tune or melody that is determined by its particular sequence of pitch accents, phrase accents and boundary tone. Pierrehumbert & Hirschberg (1990) claim that their meaning is purely compositional, i.e. made up of the meaning of their elements, which all control their specific domain. Thus, boundary tones convey information about relationships among intonational phrases, phrase accents signal the relatedness of intermediate phrases to the preceding or succeeding ones, and pitch accents convey information about the specific lexical items they are associated

with. Together, they are assumed to convey information about the way a speaker intends the hearer to interpret his communication regarding a) their shared knowledge and b) the speaker's intentions with the subsequent utterances. In detail, each of the pitch accents, phrase accents and boundary tones is proposed to be associated with a distinct meaning: In their analysis of American English intonation, Pierrehumbert & Hirschberg postulate that the H* accent conveys that the lexical items made salient by it are to be treated as new information. In the combinations H* L-L% and H* L-H% it forms the neutral declarative intonation. In combination with an H-H% it signals a high-rise question, in which some information is still given to the hearer. An example is the reply (33) "I thought it was good" (H* H* H-H%) to the question "How did you like the film?" if the speaker wants to convey "I liked it, do you agree", as exemplified in Pierrehumbert & Hirschberg (1990, p. 290). They claim that also in combination with H-L% the H* pitch accent expresses that information is given to the hearer. The L* accent, in contrast, is assumed to convey salience of a lexical item, which however is not to form part of what the speaker is predicating in the utterance. The L*+H is taken to signal lack of commitment or uncertainty on the speaker's side, and for the L+H* accent they report that it is used to mark a correction or contrast. Evidence for distinct meanings of H+L* and H*+L, they claim, is rather scarce. Phrase accents in English intonation are interpreted as follows: An H- is taken to signal the cohesion between this phrase and the subsequent phrase, whereas an L- indicates the separation of the current phrase from the following one. Similarly, an H% boundary tone is considered to indicate that the phrase is linked with the subsequent ones, whereas an L% signals completion of this part of the discourse.
Most of these assumptions about the meanings of certain categories of pitch movements or pitch accents are based on English data. However, Vaissière (1995) proposes universal archetypes of the following:

/ and high pitch signal beginning
\  signals end
/  signals incompleteness
V  signals disjuncture
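The compositional view of Pierrehumbert & Hirschberg (1990) summarised above lends itself to a simple lookup sketch: a tune is split into its pitch accents, phrase accent and boundary tone, and each element contributes its own gloss. The dictionaries below only paraphrase the meanings reported in this section; the function name `gloss_tune`, the string format of a tune and the fallback text for accents without a reported meaning are assumptions for illustration.

```python
import re
from typing import List

# Glosses paraphrased from the summary of Pierrehumbert & Hirschberg (1990) above.
PITCH_ACCENT_GLOSS = {
    "H*": "item is to be treated as new information",
    "L*": "item is salient but not part of what is being predicated",
    "L*+H": "lack of commitment or uncertainty",
    "L+H*": "correction or contrast",
}
PHRASE_ACCENT_GLOSS = {
    "H-": "cohesion with the subsequent phrase",
    "L-": "separation from the following phrase",
}
BOUNDARY_TONE_GLOSS = {
    "H%": "phrase is linked with the subsequent ones",
    "L%": "completion of this part of the discourse",
}


def gloss_tune(tune: str) -> List[str]:
    """Split a tune such as 'H* L-L%' into its elements and gloss each one."""
    *pitch_accents, edge = tune.split()
    match = re.fullmatch(r"([HL]-)([HL]%)", edge)
    if match is None:
        raise ValueError(f"expected phrase accent plus boundary tone, got {edge!r}")
    phrase_accent, boundary_tone = match.groups()
    glosses = [
        f"{accent}: {PITCH_ACCENT_GLOSS.get(accent, 'no gloss listed')}"
        for accent in pitch_accents
    ]
    glosses.append(f"{phrase_accent}: {PHRASE_ACCENT_GLOSS[phrase_accent]}")
    glosses.append(f"{boundary_tone}: {BOUNDARY_TONE_GLOSS[boundary_tone]}")
    return glosses


# The tune of example (33).
for line in gloss_tune("H* H* H-H%"):
    print(line)
```

A lookup of this kind captures only the claim that meaning is built up element by element; it does not model how hearers combine the glosses with context.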

Differences in the use of pitch between English and German on a very general level were found by Grover, Jamieson & Dobrovolsky (1987), who observed that English speakers use falls to signal continuation whereas German speakers produce rises for this purpose. Furthermore, although falls are used in both languages to signal termination, the falls in German were lower than those in English. This was also noted by Kuhlmann (1952). As illustrated in examples (8) and (9) in section 2.2, the phonological system of pitch can be used to characterise different sentence types. Batliner (1988) and Oppenrieder (1988a, b) postulate that, in German, pitch is used systematically for the marking of different sentence types. Oppenrieder (1988b) attempted a description of intonational prototypes for various German sentence types, for which he employed read examples of so-called intonational minimal pairs, sentences containing the same string of words and differing only in their intonation. An example would be the utterance (34) Schlafen Sie (sleep you), which can be realised as either a question with the meaning "Are you asleep?" (by a rising nucleus on schlafen) or an imperative "Sleep!" (by a falling nucleus). The read test sentences were judged by naive listeners according to their "naturalness" for several sentence types such as statement, imperative and question. Those sentences that were rated to be typical instances of one of the categories (so-called prototypes) were analysed acoustically. Analyses showed that every prototype sentence is characterised by a particular interplay between initial pitch height, pitch movement on the accented syllable and final pitch height. Figure 15 shows the intonational prototypes of an imperative and a question for sentence (34).

[Figure 15 panels: Sie-Imperativsatz "Schlafen Sie!" (12; 50%; tief 100%; konvex 50%) and Entscheidungsfragesatz "Schlafen Sie?" (14; 64%; hoch 100%; konkav 75%).]

Figure 15: Intonational prototypes for sentence (34) as an imperative (top line) or a question (bottom line) as postulated by Oppenrieder (1988b).

It can be seen that the typical yes-no question "Schlafen Sie?" in German has a high onset and a fall-rising pitch movement whereas a typical imperative "Schlafen Sie!" is characterised by a low onset and a falling nucleus. Sentences which listeners judged to have an intonational form typical for statements showed falls or rise-falls on the nucleus (Batliner 1989c). These experiments demonstrate that hearers have specific expectations concerning the nuclear pitch patterns of particular sentence types. Unfortunately, this does not permit the conclusion of corresponding phonological rules for the production of various types of sentences. This very complex issue comprises the interplay of sentence type, intonational form and speech act. In other words, the grammatical form, the use of certain pitch patterns and the pragmatic function all contribute to the "meaning" of an utterance. An utterance with the grammatical form of an interrogative question may nevertheless have the pragmatic function of a statement and vice versa. A rising pitch movement, for example, turns

(35) He's going out

a sentence with the grammatical form of a statement, into a question at the pragmatic level, just as the sentence

(36) Nehmen Sie Platz (sit down)

with the grammatical form of an imperative has this function only with a falling pitch. A low rise in this utterance, in contrast, would signal a friendly request; a high rise insistence. Choice of both grammatical form and pitch movement pattern seems to depend on situational context. This was demonstrated in an experiment by Dorn-Mahler & Grabowski (1991). Their male subjects were asked to play the role of a boss asking his secretary to make some coffee. In the first situation, the request was both legitimate and appropriate; in the second, resistance from the secretary could be expected (it was well past her leaving time). Results show that in the first case 17 of the 20 requests were grammatical questions with 10 showing a rising nucleus. The three grammatical imperatives all had a rising nucleus. In the second case, only 12 of 21 requests were formulated as grammatical questions with only five rises. The nine imperatives showed rises in eight cases. The analysis of spontaneous speech underlines the difficulty of establishing phonological rules for the use of pitch to mark speech acts. These difficulties are especially apparent in the attempt to characterise English and German questions by specific pitch patterns. Although there seems to be an intuitive consensus that they are associated with rising pitch, empirical research has yielded contrary evidence. Fries (1964) analysed English yes-no questions by TV quiz panellists and found that 62% had a falling rather than a rising pitch. Equally, Crystal (1969), in his corpus of 3 hours of speech, could not find a clear association of questions with any nuclear form. Ladd (1996) proposes that the tune of a polite yes/no question is H* L-H% in British English and H* H-H% in American English. Luukko-Vinchenzo (1988) found that 70% of the German questions with a question word she analysed showed a high onset, a fall-rising pitch movement on the nucleus and either a high or low boundary tone.
Yes-no questions, in contrast, showed a high onset and fall-rising pitch movement, which, however, need not begin on the nucleus but may start earlier and stretch over the rest of the utterance. Batliner (1989c) postulates that the intonational marking of questions is only necessary if grammatical cues are absent. In this case, the speaker can choose between a high boundary tone or a fall-rising nucleus to mark questions. Typically, these two options are realised together. The above discussion suggests that pitch is used in a similar way in English and German when it is employed for the linguistic function of marking the type of speech act. Statements in both languages seem to be marked rather consistently by falls (Oppenrieder 1988a, b; Crystal 1969). The characterisation of other types of speech acts by pitch is influenced by pragmatic criteria. Tendencies of systematic differences only seem to exist in the area of questions, where falling nuclei occur in English in certain types of questions, whereas in German, rises seem most common in all types of questions. Another of the well-researched linguistic functions of pitch in the English adult intonation system is the expression of attitudes. However, it is difficult to keep attitude and marking of sentence type apart: Although Armstrong & Ward (1926) claim that a "final falling contour" is used in ordinary, definite and decided statements in English, they concede that a statement may end in a final rising contour, in which case it is "not so

definite". It seems that the expression of attitude by pitch overrides the marking of the grammatical sentence type. As described above, O'Connor and Arnold (1973) distinguish between the last stressed syllable in an intonation group (the nucleus) and preceding stressed syllables (the heads) as well as any unstressed syllables preceding the head/s (the pre-head). Each of these elements has contrastive pitch movements. Certain combinations of types of head and types of nucleus form "tunes", a notion introduced by Armstrong and Ward (1926) and Jones (1957). These systematic combinations, of which O'Connor and Arnold establish ten (table 2), are assumed to convey a specific meaning:

[Table 2 layout: PITCH FEATURES OF TONE GROUP (UNEMPHATIC). Columns: Pre-head, Head, Nuclear Tone, Tail. Rows (tone groups): The Low Drop, The High Drop, The Take-Off, The Low Bounce, The Switchback, The Long Jump, The High Bounce, The Jackknife, The High Dive, The Terrace. Each cell marks either an essential pitch feature of a tone group or a pitch feature that may occur in a tone group.]

Table 2: The pitch features of the ten tone groups proposed by O'Connor & Arnold (1973).

The 'Take-Off' in table 2, for example, is assumed to convey the following attitudes: In statements, amongst others, encouraging further conversation, guarded and reserving

judgement; in wh-questions, with the nuclear tone on the interrogative word, wondering and mildly puzzled; in yes-no questions disapproving and sceptical; in commands beginning with don't appealing to the listener to change his mind; and in interjections, sometimes reserving judgement and sometimes calm and casual acknowledgement. It can take on the forms of a) low rise (+tail), b) low pre-head + low rise (+tail) or c) (low pre-head +) low head + low rise (+tail). Example (37) shows the third possibility (c) in a reply to the question "How much did you win?": (37) About a .thousand ,pounds, with "." symbolising a low head and "," symbolising a low rise. In German, pitch is not used for the expression of attitudes to the same extent as it is in English. For this linguistic purpose, particles such as wohl, ja etc., which English does not have, are much more likely to be employed (Trim 1964). A bilingual learner of German and English is faced with a threefold task of acquisition of the systematic use of pitch. On the one hand, the general meanings of the pitch movement or pitch accent categories are very similar in both languages, although there is some evidence that continuation may be signalled by falls in English and rises in German. On the other hand, pitch accent categories are used in a slightly different way to mark different speech acts in German and English. Whereas statements seem to be associated with falls in both languages, the association of questions with certain pitch movement patterns varies. Furthermore, language learners must unravel the complicated interplay between grammatical form, intonational form and pragmatic function of the various speech acts. As the third task, German/English bilingual children must differentiate between the phonological use of pitch for the expression of attitudes in both languages. The English rules must not be applied in German, where other linguistic forms take over this function.
In general, however, it is unlikely that very young children use pitch for the expression of a similar range of attitudes as assumed in the adult system. Many of the attitudes which were described to be associated with various pitch movements above such as "guarded judgement" or "conveying a sense of involvement" cannot be assumed to form part of a child's world or communicative intent yet.

4.2 The phonetic production of pitch accents in English and German

The phonetic production of various pitch accents in German and English differs a great deal. Grabe (1998) found that the H*+L pitch accent in both languages exhibits onglides, which can be realised as a rise or level pitch. Cross-linguistic differences become apparent in the alignment of F0 within the vowel of the stressed syllable: In German, the peak of F0 occurs at the right edge of the vowel, whereas it is reached in its middle in English (figure 16, top line). Conversely, for the L*+H pitch accent, the dip in F0 is lowest in the middle of the stressed vowel in German and aligned with its right edge in English, as the bottom line of figure 16 illustrates. In English, F0 is lowest at the right edge of the vowel (indicated by the grey column), whereas it is lowest in its middle in German.

[Figure 16 panels: Southern Standard British English and Northern Standard German; top line: peak alignment in H*+L; bottom line: dip alignment in L*+H.]

Figure 16: The typical F0 movement of the nuclear H*+L and L*+H pitch accents in English (a) and German (b) (from Grabe 1998).

In the case of very little sonorant material in the stressed syllable of an H*+L pitch accent, English falls are compressed, i.e. the onglide and fall of the F0 occur in a shorter stretch of time (figure 17). The rise is less pronounced, and the fall of F0 happens in a shorter time. In German, in contrast, H*+L accents are truncated, i.e. the fall in F0 is not realised any more.

[Figure 17 labels: Acoustic level (F0): Compression; Phonetic level; Surface phonology: H*+L; Adjustments: NONE; Underlying phonology: H*+L.]

Figure 17: Compression of an English H*+L pitch accent (from Grabe 1998).

Not all of these differences in the phonetic production of the pitch accents, however, are perceptually noticeable. H*+L pitch accents in both languages are always heard as falling pitch by native speakers irrespective of whether truncation has applied or not. Differences in peak alignment between the two languages are sometimes perceptible: The German H*+L accent with pronounced onglide is often heard as a rise-fall, whereas the English H*+L with peak alignment within the syllable is only heard as a fall.

For a bilingual learner of both languages, two options seem possible. Either both phonetic production strategies are acquired and applied in the respective languages, or the learner produces the pitch accents in both languages with a mixed strategy that is not perceptually noticeable. The data cited by Grabe (1998), however, suggest that, at least in the case of H*+L pitch accents, a mixed strategy would be perceptible.
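The difference between compression and truncation described earlier in this section can be illustrated with a simple comparison of two tokens of the same accent, one with ample and one with very little sonorant material. This is only a sketch under stated assumptions: the `Fall` record, the function `classify_adjustment` and the threshold `keep_ratio` are hypothetical and are not taken from Grabe (1998).

```python
from dataclasses import dataclass


@dataclass
class Fall:
    """One realisation of an H*+L fall (illustrative record)."""
    peak_hz: float     # F0 at the accent peak
    end_hz: float      # F0 at the end of the voiced stretch
    duration_s: float  # time from the peak to the end of voicing

    @property
    def excursion_hz(self) -> float:
        # Size of the fall in Hz.
        return self.peak_hz - self.end_hz

    @property
    def rate_hz_per_s(self) -> float:
        # Speed of the fall; rises when the same fall is squeezed
        # into less time.
        return self.excursion_hz / self.duration_s


def classify_adjustment(long_token: Fall, short_token: Fall,
                        keep_ratio: float = 0.8) -> str:
    """Label the short token as compressed or truncated.

    Assumption: if the short token keeps at least `keep_ratio` of the
    excursion of the long token, the fall counts as still realised
    (compression); otherwise it counts as cut off (truncation).
    The threshold is arbitrary and chosen for illustration.
    """
    if long_token.excursion_hz <= 0:
        raise ValueError("the reference token must contain a fall")
    ratio = short_token.excursion_hz / long_token.excursion_hz
    return "compression" if ratio >= keep_ratio else "truncation"


if __name__ == "__main__":
    # Invented values, for demonstration only.
    reference = Fall(peak_hz=220.0, end_hz=120.0, duration_s=0.30)
    short = Fall(peak_hz=215.0, end_hz=125.0, duration_s=0.12)
    print(classify_adjustment(reference, short), short.rate_hz_per_s)
```

In a compressed token the rate of F0 change is higher than in the reference token, because a fall of similar size is completed in less time; in a truncated token the excursion itself shrinks.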

4.3 The acquisition of the phonological system of pitch

It has been claimed that the systematic use of pitch is the first aspect of language to be acquired (Bierwisch 1966; Fry 1966; Lenneberg 1967), and indeed the perception of the phonetic features of intonation is well developed from birth. Various older experiments tested infants' sensitivity to variations in isolated phonetic parameters of intonation. Morse (1972) found that the perception of differences between pitch movements is developed early: six- to eight-week-old infants can discriminate changes in pitch contour. The results of Kaplan (1970), however, do not agree with these findings. She exposed four- and eight-month-olds to the utterance "See the cat." with either falling or rising pitch on the last lexical item. By measuring their heart rate and their orientation movements, she could show that the four-month-old infants did not discriminate between the two versions, and that the eight-month-olds discriminated between them only if the pitch movement was associated with stress. These early studies used highly unnatural material: Synthesised speech with the acoustically manipulated variables pitch, loudness and length. As discussed in section 2.3, these phonetic features co-vary in real speech so that abilities were tested that are not required for the perception of real speech. However, recent studies using real speech material (Jusczyk 1992; Decasper 1994) could also demonstrate that infants and even the foetus are sensitive to variations in pitch and other aspects of the intonation of their ambient language/s. Moreover, infants seem to be able to spontaneously imitate the pitch height and pitch movements of their parents' utterances (Lieberman 1967; Papousek & Papousek 1981; Rabson, Lieberman & Ryalls 1982). Several authors have suggested that the pitch movement categories of the target language are already observable in infants' babbling (Halliday 1975; Crystal 1981).
The production of pitch is thus assumed to be phonologically represented at an earlier age than the production of phonemes. D'Odorico (1984) cites evidence that pitch movements on vocalisations differ with the context in which they are uttered. She studied the cry and non-cry vocalisations in Italian infants aged 0;4 to 0;9, categorising them as expressions of discomfort, calls and requests. Three of her four subjects showed a higher proportion of rising and high level pitch on requests than on discomfort cries and thus showed a tendency to associate pitch movements with specific communicative situations.

Furthermore, young children seem to be able to mark their style of conversation with pitch. This was concluded in a study by Furrow (1984), who recorded 12 children aged 1;11 to 2;1 and analysed their utterances auditorily. Each utterance received a rating (from low to high) on the parameters pitch, loudness and pitch range, and the ratings were then added up to form a "prosodic score". Social context was categorised into three broadly defined classes: Eye contact, other social contact, and private speech. Utterances of each category differed significantly in their prosodic score, with "eye contact" having the highest and "private speech" the lowest. It can thus be concluded that by two years children can use intonation for broadly defined linguistic purposes such as communicating with someone and engaging their attention vs. using speech without a social context, i.e. talking to oneself. This replicates the observations of Halliday (1975), who broadly classified his son's utterances into pragmatic and mathetic. The former refers to speech with instrumental or regulatory function, the latter to utterances with a personal or heuristic function. From 1;7 on, Nigel systematically marked the former with a rise and the latter with a fall. In other words, utterances directed at a conversation partner and demanding a response were intonationally marked. This system broke down, Halliday reports, with Nigel's increasing semantic potential, and its role was taken over by the grammatical mood system.

Crystal (1981) claims a detailed order of acquisition of the various pitch movement categories of English. According to him, falls are the first pitch movements to be produced. Next, the contrast between falls and level pitch is acquired, then the contrast between falls and rises. Later, a low fall is distinguished from a high fall and a low rise from a high rise. The last pitch movements to be acquired are rise-falls and fall-rises. Unfortunately, this claim is only based on his personal observations and no studies have yet offered support for this notion.
Similarly, no experimental data on the bilingual acquisition of the phonological use of pitch can be found in the literature, and extant descriptions are of a highly subjective kind. Leopold (1947) devotes only two pages of his four-volume diary study on the language acquisition of his German/English bilingual daughter to intonation.¹ In general, he writes, Hildegard imitates the use of pitch faithfully. Some examples are adduced of lexical items uttered with distinctive pitch patterns. He notes a high or rising pitch on [ʔa], a sound combination used to express wishes between 1;3 and 1;6. Early differentiation of pitch movement is documented in Hildegard's systematic use of rises for requests and high pitch for expressions of interest from 1;3 on as well as in the lexical item bye-bye, which had different meanings according to its intonational form: Level pitch and a trochaic stress meant "going for a walk", whereas falling pitch and level stress were used for saying goodbye. Unfortunately, Leopold does not comment at all on any differences in the use of the pitch systems in Hildegard's two languages. Anecdotal evidence for the use of pitch by bilingual learners comes from Hoffmann (1985, 1991), who observed that the closer the bilingual's two languages are in terms of phonology, the greater is the likelihood of interference, with pitch movements being the first affected linguistic area. Furthermore, she reports that with decreasing exposure to one language, fluency in intonation is lost first. Similarly, stress, rhythm and pitch features benefit immediately from a renewed exposure to the language in question.

¹ He uses the term only for the "melodic aspects", i.e. the pitch movement, of Hildegard's early utterances.

Experimental studies concerning the acquisition of the phonological use of pitch have only been carried out with monolingual children. They produced contradictory results as to when the systematic use of pitch begins. The first prerequisite of constructing a linguistic system of pitch is the realisation that different pitch movements may contribute to the function of marking different speech acts: The same segmental string of sounds can express quite distinct intentions and/or meanings according to the pitch movement it carries. As children's utterances are very short at the beginning, this function is usually restricted to the nucleus. Contrastive use of nuclear types obviously requires adequate cognitive capacities, probably the same that enable a child to use speech sounds in their linguistic function (Vihman et al. 1982). As a general problem, however, the function of nuclear tones produced by children using single words and longer utterances is difficult to establish as their attitudinal meaning can only be inferred by the listener. Discrimination of different nuclear pitch movements by an adult listener, however, is not sufficient proof for the assumption that infants use them with an adult-like intent. There is a danger of circularity when the interpretation of an utterance as, for example, a question is based on the final rising pitch movement alone. Questions without rises will be ignored and non-interrogative utterances with final rises will be classified erroneously. Only meticulous attention to the general communicative situation can help to determine the linguistic function of pitch. Thus, conclusions about the actual linguistic use of pitch at this early stage are almost impossible to draw. Experimenting with various pitch movements by a child aged 1;10 is reported in Carlson & Anisfeld (1969).
They observed an "intonation substitution practice" in their subject, who repeated the same word over and over again with varying pitch movements, in a loud and soft voice and in a staccato or legato (long and slow) way. This playful experimenting with the phonetic parameters shows clear parallels to babbling in the acquisition of segmental phonology and might be an important precursor of the acquisition of the linguistic system of pitch.

There is some evidence that children use pitch in order to mark types of utterances from an early age. Galligan (1987) describes the differentiation of statements and questions by nuclear tones in 17-month-olds acquiring English. She investigated two children in their acquisition of the grammatical use of pitch from the age of 10 to 21 months. One of them used rising pitch on utterances that were classified as expressing "general interest". By 12 months, he was found to repeat labelling utterances by his mother with a rising pitch movement, which accompanied the act of turning to her. In the following months, these repeated utterances were produced with a wider pitch range and increasing loudness and duration. At 18 months, he began to produce fall-rises for eliciting labelling responses. The other child also began by associating rises with general interest utterances. Between 14 and 18 months, however, her use of rising pitch movements in utterances in this context decreased in frequency. At 18 months, she again produced rises to elicit naming responses from her mother in the context of book-reading. For both children, instances of words differing only in the use of pitch were analysed, and it could be demonstrated that these utterances were associated with falls in a description context and with rises in an interrogative context. Similarly, Bassano and Mendes-Maillochon (1994) describe the early use of the pitch system by a French-learning child.
At 1;2 she systematically used falls for declarative and exclamative utterances, and at 1;9 rises for interrogative utterances.

Counter-evidence for an early systematic use of pitch for the differentiation of questions from other types of utterances comes from a study by Flax et al. (1991), who report the individual development of their three subjects acquiring American English. They were recorded at the prelinguistic stage, when they had a vocabulary of 10 words, and when they knew 50 words. The utterances were analysed instrumentally and classified into rise and non-rise terminal contour. Unfortunately, it is not specified whether "terminal contour" corresponds to the concept of nucleus or last pitch accent and boundary tone. All subjects showed individually varying and developmentally unstable associations between terminal pitch contours and communicative intent. The first child produced rises in 54% of all utterances. These frequently accompanied requests, responses and the act of giving at 1;2 and requests and giving at 1;8. The second child produced rises in only 14% of all utterances: In the prelinguistic stage, predominantly with requests and, from the linguistic stage on, increasingly with protest. The third subject produced 13% rises in the first recording and 22% in the last. They were initially often associated with requests, responses and protests. Later they marked only the majority of requests and protest utterances. These results underline the importance of studying children individually. Furthermore, they demonstrate that although there is a tendency for requests to be accompanied by a rise, the correlation by no means justifies the assumption that the intonation system has been mastered at an early age. Neither could Marcos (1987) establish a clear connection between pitch movement and communicative intent. She studied ten French-learning children between the ages of 1;2 and 1;9 and compared their utterances, which she classified as requests or labelling.
Although she observed a tendency for requests to be associated with rises and labelling with falls, the differences in pitch marking were not significant. Equally, for English children, a study carried out by Robb & Saxman (1989) did not produce any evidence for the systematic phonological use of pitch with the function of marking communicative intent.

These results suggest that, although individual children may show an early systematic use of pitch for the differentiation of such basic types of utterances as questions and statements, the claim that pitch is one of the earliest linguistic systems to be acquired cannot be confirmed on the basis of these studies. Not only are the data of a contradictory character, the studies also show shortcomings on a theoretical and methodological level. None of the descriptions of pitch comply with the notational systems of either the British tradition or the AM approach but are of a highly idiosyncratic character. This is reflected in measurements of initial pitch height (Marcos 1987) or F0 movements across vocalisations (Robb & Saxman 1989), which are of little relevance in the current theoretical framework of intonation. Furthermore, auditory analysis is not complemented by instrumental analysis or vice versa, and often the influence of other phonetic parameters such as loudness on pitch is not taken into account (Marcos 1987; Flax et al. 1991). For a description of the acquisition of the phonological use of pitch it is therefore necessary to operate within the theoretical framework of intonational description and to differentiate between phonological category and phonetic realisation of intonational phenomena. Moreover, the mental representation of the phonology of pitch ought to be assessed as well. As already discussed above, the testing of the representation of the intonation system poses severe methodological problems.
Yet, Schoeler's (1987) study of the developing metalinguistic awareness of pitch movements in German children allows some inferences. He showed that children under 6 years of age could not judge whether the pitch contour of a given utterance was a) anomalous or not and b) context-adequate or not. This ability begins to be acquired with school entry and is fully developed at age 13. He presented his subjects with sentences such as "Ist zuviel Geld" (Is too much money) and varied the pitch contour: An overall falling contour or rise-fall contour for statements vs. a rising or fall-rise contour for questions. Judgements whether an utterance constitutes a question or a statement were correct in 70 percent of the replies by the seven-year-olds and in 100 percent of the replies by the 15-year-olds. This suggests that not all of the seven-year-olds had yet reached E1-level representations in Karmiloff-Smith's (1992) sense. One of the factors contributing to this competence might be short-term memory, which correlates (r = .44) with success in the evaluation of the pitch contours. An interesting claim is that the ability to assess the appropriateness of intonation requires the ability to produce it correctly first. Most of the pre-school children were able to produce correct intonation contours. Around age 10, they learned to assess them correctly, and from age 15 on the ability to give reasons for the assessment is acquired.

Similarly, Cruttenden (1974) could show that English children of school age have not yet acquired the full representation of the phonological functions of pitch. English TV readers show regular pitch patterns in their announcement of football results. As illustrated in example (38a), a falling nucleus on the second team indicates a draw:

(38) a. Halifax /one | \Liverpool one
     b. Halifax \one | ˇLiverpool ...

An away win by Liverpool, in contrast, would be read with a falling nucleus on the score of Halifax and a fall-rise on Liverpool (38b). Cruttenden (1974) presented children aged 7;10 to 10;11 with some anticipatory patterns (i.e. announcements interrupted before the reading of the second score) and asked them to predict the outcome.
It emerged that, at age nine, 50% of the children showed no competence for this linguistic task. Cruttenden concluded that children between seven and ten are still in the process of acquiring some functions of the English pitch system. An alternative interpretation of the results might of course be that the children simply lacked experience with this highly specialised linguistic area.

4.4 Mastery of the phonetic production of pitch accents

The productive abilities of infants in the area of pitch lag far behind their perceptual skills: Already in the very first cry of a new-born child the phonetic parameters pitch, loudness and length can be measured. However, control over the underlying physiological mechanisms, a prerequisite for their linguistic use in intonation, has yet to be acquired. Due to anatomical differences in the position of the ribs in relation to the spine, new-borns cannot regulate the subglottal air pressure by using their intercostal muscles in an adult-like fashion. The length of their vocalisations is thus physiologically limited (Lieberman 1985). Growing control over these muscles is reflected by the increase of vocalisation length from an average of 600 ms at four weeks of age to an average of 1500 ms at 20 weeks (Laufer & Horii 1977).

Equally, control over pitch is at first anatomically limited. The larynx of infants is in a high position (Kent & Murray 1982) where it can be moved upwards into the nasopharynx to allow simultaneous drinking and breathing (Laitman, Crelin & Conlogue 1977). At three months of age it moves down into the pharynx (George 1978) into the adult position. Laryngeal control is still very unstable (Kent & Murray 1982), which results in many irregularities during vocalisation. Phenomena such as abrupt F0 shifts, irregular vocal fold vibrations and breathiness could be measured in three- to nine-month-olds. D'Odorico (1984) classified 26% of all vocalisations by the infants aged 0;4 to 0;9 in her study as dysphonated or hyperphonated (upward shifts of F0 in the narrow-band spectrogram).

Growing control over the physiological mechanisms involved in the production of pitch is reflected in the emergence of specific pitch movements such as falls and rises. Of 100 vocalisations recorded by Kent & Murray (1982) at three, six and nine months respectively, acoustic measurements showed that about a third had a flat F0 shape, a little less than a third had a falling pitch and an equal number had a rise-fall shape. Rises, rise-fall-rise-falls and fall-rises were very rare. In data collected by Marcos (1987), half of the utterances produced by children aged 1;2 to 1;9 had a flat shape, and hardly any rise-falls or fall-rises occurred. Cross-linguistic differences in the frequency of rising and falling pitch movements produced by infants in accordance with their distribution in the various target languages were reported by Hallé, Boysson-Bardies & Vihman (1991). It is of course impossible to decide which and how many of these pitch patterns are produced accidentally or with linguistic intent. The phonetic parameter loudness does not seem to be controlled at all at this early stage.
Kent & Murray (1982) report that it is completely dependent on pitch movements: They found a frequent occurrence of a "vocal tremor" in their three- to nine-month-old subjects, which can be described acoustically as a slow parallel modulation of intensity and frequency. For Allen's (1983) French-learning subjects this is still the case at 2;0.

There are no experimental data describing the phonetic production of pitch accents across language acquisition. In a study concerned with the acquisition of rhythm, Allen & Hawkins (1978) investigated the F0 movements within stressed syllables in English utterances by children aged 2;8 to 3;4. Nuclear non-final syllables showed an equal amount of rising, falling and complex F0 movements with a slightly less frequent production of steady pitch. Nuclear final syllables also showed rising, falling and complex pitch movements. Unfortunately, pitch alignment within the vowels of the stressed syllables was not measured. To my knowledge, there is a complete lack of similar data from German or bilingual children.

4.5 Research questions

From the above it can be seen that the studies concerned with the monolingual acquisition of the system of pitch have yielded contradictory evidence. This might in part be due to the lack of a uniform theoretical or methodological background. The phonetic realisation of pitch accents in language learners has not yet been investigated. In this study, therefore, the bilingual acquisition of the phonological use and the phonetic production of pitch will be investigated in the following way. Using both auditory and instrumental analysis and the descriptive systems of both the British tradition and the AM approach, child utterances will be analysed in search of evidence of a systematic use of pitch for the marking of speech acts or other linguistic purposes. Looking at the marking of questions by pitch, it will be investigated whether this changes during acquisition, and developments in both languages will be compared. On a phonetic level, the production of pitch accents will be investigated with a view to the question whether one combined production strategy or two separate systems will be acquired.

5. Bilingual acquisition of intonational phrasing

This chapter is concerned with the bilingual acquisition of intonational phrasing. After a description of the phonological systems of intonational phrasing in both English and German (section 5.1) and the phonetic correlates of intonational phrases (section 5.2), the acquisition of intonational phrasing will be discussed (section 5.3). Here, only evidence from monolingual acquisition can be presented. Section 5.4 focuses on the acquisition of the individual phonetic correlates of intonational phrasing, and section 5.5 presents the research questions developed for this study.

5.1 The phonological systems of intonational phrasing in English and German

The linguistic use and function of intonational phrasing in both English and German is far from well understood although there is no lack of research. Couper-Kuhlen (1986) discusses, amongst others, two main linguistic functions of intonational phrasing in English: A grammatical and an informational one. As a grammatical function, a correlation of phrase and clause boundaries with intonational boundaries has been claimed (Halliday 1967; Schubiger 1958). Crystal (1969) found that, in spontaneous speech, only 46% of the English clauses he analysed coincided with tone units and only 28% of tone units coincided with clauses. However, 80% of all tone units coincided with one element of the clause structure such as the NP, the AP or the VP. Gut (1995) claims that the distribution of intonational phrases in read speech is fairly restricted in English. She analysed read texts and found that speakers agreed on the position of major tone unit boundaries in 100% of the cases and minor tone unit boundaries in 80%. Major tone unit boundaries always coincide with clause boundaries and minor tone unit boundaries sometimes mark subordinate clause structures, although their application is variable across speakers.

In a similar experiment, Gut (1995) investigated the distribution of major and minor tone units in German texts read aloud. There was nearly perfect agreement among the readers regarding the distribution of major tone unit boundaries, which all coincided with clause boundaries. The distribution of minor tone units, however, seems relatively free in German: Only about half of them appeared to be obligatory, i.e. they were realised in the majority of the 12 readings. The other half were classified as optional since they were realised in only about a third of the readings. Both "obligatory" and "optional" minor tone unit boundaries occurred at punctuation marks or syntactic boundaries such as between main and subordinate clauses.
Thus, the main difference between minor tone unit boundaries in German and English seems to be their general frequency in a text and the position in the sentence where they can occur.
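The distinction between "obligatory" and "optional" boundaries rests on how many readers realise a boundary at a given position. A minimal sketch of such a tally follows. It is not the procedure used by Gut (1995): the function name `classify_boundaries`, the representation of a reading as a set of word positions, and the simple majority threshold are assumptions made for illustration.

```python
from collections import Counter


def classify_boundaries(readings, threshold=0.5):
    """Classify boundary positions by reader agreement.

    readings: one set per reader, containing the word positions after
              which that reader produced a tone unit boundary
              (hypothetical input format).
    threshold: a boundary counts as "obligatory" if more than this
               share of readers realised it (assumed criterion).
    Returns a dict mapping position -> (label, share of readers).
    """
    if not readings:
        raise ValueError("at least one reading is required")

    # Count in how many readings each boundary position occurs.
    counts = Counter(pos for reading in readings for pos in set(reading))
    n_readers = len(readings)

    result = {}
    for pos, count in counts.items():
        share = count / n_readers
        label = "obligatory" if share > threshold else "optional"
        result[pos] = (label, share)
    return result


if __name__ == "__main__":
    # Four invented readings of the same sentence.
    readings = [{3, 7}, {3}, {3, 7}, {3, 9}]
    for pos, (label, share) in sorted(classify_boundaries(readings).items()):
        print(pos, label, share)
```

With real data, the share of readers per position would also show how close the "optional" boundaries come to the one-third level mentioned above.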

An experiment by Pürschel (1975) provides some evidence for this assumption. Five English and 149 German native speakers read out an English text. In addition, the German subjects were recorded reading a close translation into German of the same text. In general, it could be seen that the English speakers made fewer pauses than the German speakers reading both the English and the German text. Pauses inserted by the English subjects coincided highly with the punctuation marks of the texts. This was in principle also true for the German speakers reading the German text. There, however, some subjects paused in places where no English reader would do so. The following example demonstrates this (the superscripts indicate the frequency of long and short pauses following the word):

Grundsaetzlich bedeutet ein solches Programm⁵⁴ / den Ruf nach Qualität¹⁴⁵ / verzahnt[illegible] / mit den Notwendigkeiten²⁹ / einer liberalen,⁴⁴ / demokratischen Gesellschaft.¹⁴⁹ /

Trim (1964) argues that, in German, minor tone unit boundaries coincide with phrase boundaries, whereas they comprise larger units such as clauses in English. Utterance (39) would therefore show the following minor tone units for German and the following in English (example 40):

(39) Der alte Mann | ist | um sechs Uhr | morgens | nach Hause gekommen
(40) The old man came home | at six o'clock in the morning

This grammatical function of intonational phrasing, however, seems far more important in reading or prepared speech than in spontaneous conversational speech. Here, the principal function of intonational phrasing is informational - the structuring of utterances and discourse into information units (Halliday 1967; Pheby 1975, 1983; Wunderlich 1988). Thus, the distribution of intonational phrases in spontaneous speech depends to a great extent on situational factors and is far more difficult to predict than in read or prepared speech. However, even in spontaneous speech intonational phrasing is not arbitrary but follows certain phonological rules: Intonational boundaries usually coincide with syntactic boundaries, either at a clause or phrase level. In both German and English, appositions and parenthetical structures are marked by intonational boundaries (examples 41 and 42).

(41) I saw the dog | a beautiful retriever | running across the field.
(42) Dann kam der Peter | ein ziemlicher Dummkopf | dazu und sagte...
     (Then Peter | a real wally | came and said...)

Enumeration or other structures with a parallel coordination of several elements are also separated into individual intonational phrases in both languages, as illustrated below:

(43) I bought some veg | meat | biscuits | and towels.
(44) Da waren Karin | Martin | Frieder | und Elke.
     (There were Karin | Martin | Frieder | and Elke)

In contrast, topicalised structures constitute an intonational phrase in English but not in German (examples 45 and 46).

(45) The flower | I meant.
(46) Die Blume meinte ich.

There seems to be a general tendency for English speakers to separate anything preceding the main clause by an intonational boundary. This applies to all adverbial modifiers (example 47)

(47) Surprisingly | he didn't win.

as well as "moved structures" in a transformational sense (example 48):

(48) Always managed to scrape through | she did.

In general, speakers use intonational phrasing in order to structure their discourse into units of information, which also facilitates comprehension by the listener. Thus, the intonational units probably correspond to units of planning in speech production. Fery (1988) cites the disambiguation of utterances as another function of intonational phrasing in German. In examples (49) and (50), it is the intonational phrase boundary that determines the meaning:

(49) Peter liebt Gerda aber nicht
(50) Peter liebt | Gerda aber nicht

In (49) the meaning is "Peter, however, does not love Gerda", and in (50) "Peter loves, however, Gerda does not". Of course, nucleus placement on liebt in (50), but not in (49), co-determines meaning here. Intonational phrasing is also used for the disambiguation of utterances in English. In examples 51 and 52, it is the intonational boundary that determines whether she washed herself or the baby.

(51) She washed and fed the baby.
(52) She washed | and fed the baby.

In summary, a German/English bilingual learner is faced with two distinct intonational systems, which nevertheless show great similarity. The linguistic use of intonational phrasing between the two languages differs in its grammatical function for reading and prepared speech. The distribution of intonational boundaries is much more limited in English where major tone units always coincide with clause boundaries and minor tone units correlate with subordinate structures and punctuation marks. In contrast, intonational phrasing in terms of minor tone units shows great variation among German speakers.
In spontaneous speech, intonational phrases correlate with units of sense and have the function of structuring utterances and discourse in both languages. Because of the variability of situational and conversational contexts, phonological rules of intonational phrasing are difficult to establish, and differences between the two languages are not well attested. One example of language-specific rules for intonational phrasing can be found in topicalised structures, which constitute IPs in English but not in German.

5.2 The phonetic correlates of intonational phrases

At the centre of the descriptive systems of intonation described in chapter 2 is the intonational phrase, which provides the framework within which intonational features are described. It is, however, apparent that its characterisation is far easier than its delimitation. Whereas most authors agree on the internal constituents of intonational phrases, little consensus can be found on the definition of the phonetic criteria which could serve to demarcate them. Cruttenden (1986) suggests the following phonetic criteria for the identification of intonational phrase boundaries: Pauses, final syllable lengthening and change in pitch level and direction of unaccented syllables (F0 resetting).

Intonational boundaries are always correlated with the phonetic parameter pause; however, not every pause in speech is produced in order to mark an intonational phrase boundary. Pauses also occur in hesitations and performance errors, before restarts and repairs, and because of the biological necessity of breathing. Attempts to establish regularities between the type of pause and its length have been only partially successful (Goldman-Eisler 1958). The phonetic parameter pause is thus a necessary but not a sufficient indicator of intonational boundaries, and it has to be supplemented by other phonetic parameters. One such parameter is length: the final syllable before an intonational phrase boundary is lengthened. Often, this final syllable also carries a distinct pitch movement. In examples 53 and 54, the words home and heim constitute the last syllable of the intonational phrase and would thus be significantly longer than in examples 55 and 56, where they occur in the middle of the IP.

(53) I am going home | said Peter.
(54) Gehst Du heim | fragte sie mich.
(55) My home is my castle.
(56) Er ging heim mit seinen Hunden.

Listeners expect final syllable lengthening as a cue for intonational phrasing (Gussenhoven & Rietfeld 1992), and the length of the syllable is expected to increase with the rank of the boundary. The final syllable before a major tone unit boundary is thus expected to be longer than one before a minor tone unit boundary. Delattre (1966) compared final syllable lengthening in German and English and found that it followed the same rules: in both languages, stressed final syllables were longer than unstressed ones and, in both conditions, closed syllables were longer than open ones. Stressed open syllables, furthermore, were systematically longer than closed unstressed ones. In absolute terms, final syllable lengthening in English is more pronounced than in German, and the differences between the various types of final syllables are greater.

A third phonetic correlate of intonational phrasing is a change in pitch level after an intonational boundary: the resetting of the fundamental frequency. In both German and English utterances, there is a tendency for F0 to become lower throughout the utterance, which is called declination (e.g. Ladd 1984). Declination is most apparent in the height of successive high pitch accents, which will follow a downward sequence. At the end of an intonational phrase ending in an L% boundary tone, pitch is at its lowest level and will be picked up to a higher starting position for the following intonational phrase (Ladd 1988). This means that, in utterances produced with more than one intonational phrase, the unstressed syllable following an L- boundary tone of the preceding intermediate phrase will begin at a higher level than the L-. Listeners are not aware of this resetting of F0, but it is clearly visible in instrumental pitch tracking as described in section 2.1.2 (figure 7). The pitch of intonational phrases ending in H% boundary tones does not show this resetting of F0.

In summary, the phonetic production of intonational phrasing is not easily determinable because a variety of phonetic parameters interact in many ways. From what is known so far, however, it appears that there are no major differences between German and English, so that a bilingual learner would not have to acquire two separate perception and production strategies.
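The size of an F0 reset across a phrase boundary can be expressed as a pitch interval. The following is a minimal sketch, not a procedure taken from the studies cited above: the function names `semitones` and `f0_reset`, the 1-semitone threshold and the F0 values are all illustrative assumptions. It compares the last F0 value of one phrase with the first F0 value of the next.

```python
import math


def semitones(f_from_hz: float, f_to_hz: float) -> float:
    """Interval from f_from_hz to f_to_hz in semitones (positive = upward)."""
    return 12.0 * math.log2(f_to_hz / f_from_hz)


def f0_reset(prev_phrase_f0, next_phrase_f0, threshold_st=1.0):
    """Compare the last F0 value of one phrase with the first of the next.

    Returns (interval_in_semitones, is_reset). The default threshold is an
    illustrative assumption, not a value from the literature cited here.
    """
    interval = semitones(prev_phrase_f0[-1], next_phrase_f0[0])
    return interval, interval >= threshold_st


# Hypothetical F0 tracks in Hz: declination within each phrase,
# and a higher starting point for the second phrase.
phrase_1 = [220.0, 205.0, 190.0, 170.0]  # ends low, as before an L% boundary tone
phrase_2 = [210.0, 200.0, 185.0, 165.0]  # starts higher again

interval, is_reset = f0_reset(phrase_1, phrase_2)
print(f"{interval:.2f} semitones, reset detected: {is_reset}")
```

Expressing the reset in semitones rather than in Hz makes values comparable across speakers with different pitch ranges, which matters when child and adult speech are compared.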

5.3

The acquisition of intonational phrasing

Even very young infants are sensitive to intonational phrases and prefer to listen to those with "normal length", which has been postulated to help them to process speech (Jusczyk 1997b). At the age of nine months, American infants listen longer to speech with pauses inserted at clause boundaries than to speech with pauses in the middle of clauses (Hirsh-Pasek et al. 1987). Jusczyk (1991) used read speech to show that the phonetic parameter pause can be discriminated in very early childhood: by the age of 4½ months, infants have developed a preference for the prosodic patterns that are used by mothers to segment their speech when addressing them. Whereas this preference is still independent of the actual language used at this age, at six months it is restricted to their mother tongue/s. On a subclausal level, nine-month-olds also prefer speech with pauses between subject and verb phrase rather than within them (Jusczyk et al. 1992), a preference six-month-olds do not show. Fernald (1985) and Fernald & Kuhl (1987) report that four-month-olds prefer to listen to infant-directed speech rather than adult-directed speech and that it is the pitch pattern which attracts them. Furthermore, they prefer to listen to it with pauses at the clause boundaries (Kemler-Nelson et al. 1989). Differences in the phonetic parameter length also seem to be perceived from very early on. Spring & Dale (1977) report that four- to 17-week-old infants discriminate syllables differing in the phonetic parameter length alone.

The production of intonational phrasing in the course of language development is closely connected with both the acquisition of syntax and the development of the mechanisms of speech processing. The former is a prerequisite of the acquisition of the grammatical function of intonational phrasing; the latter provides the conditions for the use of IPs as information groups.
There is of course much interaction between these two functions of intonational phrasing since intonational boundaries for sense units show high coincidence with grammatical boundaries. Very early child speech that consists of single-word utterances cannot reflect the acquisition of the phonological rules of intonational phrasing. On the contrary, it provides clear evidence for an absence of phonological rules which group syntactic or information units together. Scollon (1979) describes some sequences of one-word utterances by a child of age 1;7 that are semantically related but produced in two IPs:

(57) finger || touch || (Brenda reaches out to touch the microphone with her finger)
(58) tape || step || (she lifts her foot and holds it over the recorder)

Instances of single-word utterances related in sense but separated by pauses have also been found by other authors (Bloom 1971; Crystal 1986; Tracy 1991). This phenomenon is also observable in the so-called two-word stage. Despite the child's ability to group two words together into one intonational phrase, there is ample evidence that intonational phrasing is not yet used linguistically with either a grammatical or an informational function. Scollon (1979) describes sequences of conversation that show how strings of words that belong to one information unit are broken up by pauses:

(59) Ron || make || tape corder ||
(60) Rotten || food || dog some ||
(61) This way || hold it || hold it || holding || holding ||

It seems that, at this stage, the processes in speech production which would enable the child to process information groups as single units before articulation have not yet been acquired. All models of speech production assume various stages of speech planning from a preverbal message to its articulation (Levelt 1989; Herrmann & Grabowski 1994). Usually, some syntactic and phonological encoding of the preverbal message is proposed before the articulation. Levelt postulates that both the grammatical and the phonological encoding take place in the Formulator, whose output goes directly to the Articulator. The generation of intonation is conceptualised as part of the phonological encoding and is thus also situated in the Formulator. Elbers & Wijnen (1992) claim that this Formulator only develops with the onset of syntax and that it continues to be adapted over the course of syntactic and phonological acquisition. This raises the question whether grammatical and phonological encoding develop independently or, if not, whether their development is assumed to proceed at the same rate. Unfortunately, there are no studies concerned with the acquisition of the phonological rules of intonational phrasing in either German or English.
A first testing ground for their linguistic use is the production of topicalised structures. In English, they constitute an IP, whereas in German the insertion of an intonational boundary would be inappropriate. Thus, language-specific strategies must be adopted by German/English bilingual children. Furthermore, with the onset of the production of subordinate structures, the acquisition of the grammatical and informational function of intonational phrasing can be demonstrated according to how children choose to structure their utterances in terms of IPs.

5.4

Mastery of the phonetic correlates of intonational phrasing

Final syllable lengthening, the insertion of pauses and the resetting of pitch after pauses were described above as the phonetic criteria for an intonational phrase or tone unit. Various authors report final syllable lengthening already in the babbling phase (Laufer 1980; Robb & Saxman 1989; Snow 1994), where it is, however, devoid of linguistic purpose. The systematic use of final syllable lengthening in falling pitch movements was studied by Snow & Stoel-Gammon (1994). Three children tested at 1;6 and 2;0 showed consistently greater length of final syllables as compared to non-final syllables in their one- or two-word utterances with falling nuclear pitch patterns. Snow (1994) reports that six of his nine subjects showed greater lengthening of final than of non-final syllables three months after the onset of combinatory speech. Kubaska & Keating (1981) measured the length of words produced by three children aged 1;3 to 3;0 in order to investigate whether familiar words were articulated faster than relatively new ones. They report that words produced utterance-finally are longer than when produced at the beginning or in the middle of an utterance, that stressed words are longer than unstressed ones, and that a word produced in isolation has a longer duration than when produced in a phrase. Pollock et al. (1993) also reported final syllable lengthening in the production of disyllabic nonsense words by two-, three- and four-year-olds. Both stressed and unstressed vowels in final position were significantly longer than when produced in non-final position. A further finding was that the extent of final syllable lengthening decreases between two and three years of age. Whereas the two-year-olds produced final vowels with a mean length of 25 cs, the four-year-olds produced them with a mean length of 21 cs. Thus, length seems to be exaggerated up to age three.
Two reasons for this are possible: a) speech by two-year-olds is overall slower and thus exhibits greater length in vowels than in the case of four-year-olds and b) physical control of the phonetic parameter length is not acquired yet at age two. To my knowledge, there are no studies concerning the systematic use of the phonetic parameter pause or concerning the resetting of pitch after pauses.
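One common way to quantify final syllable lengthening is the ratio of mean final to mean non-final duration. The sketch below is illustrative only: the function names `lengthening_ratio` and `cs_to_ms` are hypothetical, and the individual duration values are invented for the example. Only the two mean final-vowel lengths of 25 cs and 21 cs come from the text above.

```python
from statistics import mean


def lengthening_ratio(final_cs, nonfinal_cs):
    """Mean final duration divided by mean non-final duration.

    A value above 1.0 indicates that final syllables are longer.
    """
    return mean(final_cs) / mean(nonfinal_cs)


def cs_to_ms(duration_cs: float) -> float:
    """Convert centiseconds to milliseconds (1 cs = 10 ms)."""
    return duration_cs * 10.0


# Hypothetical vowel durations in centiseconds for one child.
final_vowels = [26.0, 24.0, 25.0]
nonfinal_vowels = [14.0, 16.0, 15.0]

ratio = lengthening_ratio(final_vowels, nonfinal_vowels)
print(f"final/non-final ratio: {ratio:.2f}")

# Mean final-vowel lengths reported in the text, converted to milliseconds.
print(cs_to_ms(25), cs_to_ms(21))
```

A ratio normalises for overall speech rate to some extent, which is relevant to the first of the two explanations given above: slower speech lengthens all vowels, whereas exaggerated final lengthening raises the ratio itself.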

5.5

Research questions

This chapter showed that the phonological rules of intonational phrasing are more difficult to determine than those of nucleus placement and pitch. IPs mainly seem to have a grammatical function in reading and prepared speech and a more informational function in spontaneous speech. The phonetic correlates of intonational phrasing are pauses, final syllable lengthening, and F0 resetting. Hardly anything is known about the acquisition of the phonology or phonetic realisation of intonational phrasing. This study will thus investigate the following aspects of the acquisition of intonational phrasing: The phonological use of intonational phrasing for grammatical functions will be studied in topicalised structures, subordinate constructions and larger stretches of speech.

Evidence for the acquisition of IPs as a means of marking sense units will be looked for. On a phonetic level, the child utterances will be analysed for the production of final syllable lengthening, the systematic production of pauses as intonational phrase boundaries, and the occurrence of F0 resetting after intonational boundaries.

6. The study - research questions, method and analysis

6.1

Research questions

The present study is concerned with the acquisition of intonation by German/English bilingual children. In chapter 2 it was argued that the intonation of a language consists of a system of phonological representations and their phonetic realisations. Consequently, the acquisition of intonation in this study will be investigated on two levels: The phonetic one, which involves the physical control of the phonetic parameters pitch, loudness, and length; and the phonological level, where these phonetic parameters are applied systematically in order to achieve various linguistic purposes. On both levels, separate systems should be acquired for each language of a bilingual. However, as discussed in chapter 2, a specific bilingual production strategy might be acquired on the phonetic level, and the phonological representations may interact.

6.1.1 Nucleus placement

The three intonational phenomena with basic linguistic functions that can be assumed to be among the first to be acquired by children are nucleus placement, pitch and intonational phrasing. In chapter 3 it was described that a prerequisite of the phonological use of nucleus placement is the transition from word-level stress to sentence-level stress. On the phonological level, nucleus placement is used in order to mark focus for either contrast or the differentiation of new vs. old information, and it can be used to mark emphasis. Various differences between the phonological systems of nucleus placement in German and English were pointed out. Subsequently, the phonetic production of nuclear stress was described and compared for German and English. The following research questions were developed from this:

• when do bilingual children first produce sentence-level stress, i.e. nuclei in their utterances
• when is nucleus placement used phonologically
• for which linguistic function is nucleus placement used first and in which order are the various functions acquired
• how do bilingual children treat the phonological differences between the two language systems
• how are nuclei produced phonetically
• do bilingual children use a specific bilingual production strategy
• does this change across time

6.1.2 Pitch

The intonational system of pitch was described as very complex in terms of linguistic functions and application in adult speech (chapter 4), and the debate about an early acquisition of the phonological system of pitch was presented. One of its functions prevalent in early child speech is the marking of different types of speech acts with different types of pitch accents. It was pointed out that in the case of questions there might be systematic differences between German and English in the use of pitch accent types. Equally, the phonetic realisation of certain pitch accent categories was proposed to differ between the two languages. Thus, the following research questions emerge for this study:

• when do bilingual children begin to use pitch accent types systematically
• when is pitch used phonologically
• for which linguistic function is pitch used first, and in which order are the various functions acquired
• are different systems for marking questions with pitch acquired in the two languages
• how are pitch accents realised phonetically
• is there a bilingual production strategy
• does the production strategy change over time

6.1.3 Intonational phrasing

Intonational phrasing was described as having a grammatical and an informational function (chapter 5). It was discussed that IPs were closely associated with syntactic and semantic units. The phonetic correlates of intonational phrases are pauses, preboundary syllable lengthening and the resetting of the fundamental frequency after a pause. Because of the amount of synchronous phonetic activity and the extent of speech planning required for the production of IPs, it was hypothesised that their acquisition is mastered only late in childhood. Thus, the following research questions will be addressed in this study:

• when do IPs begin to coincide with syntactic structures such as major constituent boundaries
• when do IPs begin to be associated with semantic groups
• when do bilingual children begin to produce final syllable lengthening
• when are pauses used for the demarcation of IPs
• when does F0 resetting occur
• are there any cross-linguistic differences in the phonetic production of IP markers
• are there any cross-linguistic differences in the phonological use of IPs

In summary, the acquisition of the phonological use of nucleus placement, pitch and intonational phrasing as well as the phonetic production of nuclei, pitch accents, final syllable lengthening, pauses and F0 resetting in German/English bilingual children will be explored in this study. Cross-linguistic differences will be analysed, and a model of the acquisition of intonation will be proposed for these children (chapter 11).

6.2

Method

6.2.1 Data

The longitudinal data for this study were collected between 1989 and 1994 as part of two larger DFG projects conducted by Rosemarie Tracy (TR 238/1 and TR 238/2) at the University of Tübingen, involving five monolingual German-speaking and five bilingual German/English children of middle-class background. The focus of interest in the projects was early bilingualism and the emergence of complex sentences in young children. The children were visited at their homes at regular intervals and recorded during play sessions with either a German-speaking or an English-speaking investigator. This was supplemented by some additional structured sessions involving translation tasks for the children. The exact procedures of data collection are described separately for each child below.

6.2.2 The subjects of the study

6.2.2.1 Hannah

Hannah is the first-born child of an English mother and a German father, living in the South of Germany. Her mother speaks Southern British English and has an excellent command of German, which she acquired as a second language. Her father speaks Standard German and is equally fluent in English, which he acquired as a second language. Both parents spoke predominantly German with each other before Hannah was born. They then decided to address her in their respective mother tongues and to use English when speaking to one another. The parents estimate that, during her first year of life, Hannah was exposed to an equal amount of English and German. When she started talking, however, she showed little discrimination between the languages, using either of them with her parents and speaking predominantly German. When Hannah started going to a crèche at 1;3, the parents changed their language policy and decided on English as the family language. Some mixing occurs in Hannah's speech, some of which was due to the language situation in the crèche: two of the caretakers, one from Korea and one from Greece, spoke German as a second language and produced utterances such as "*Gehst Du nach heim", which could also be found in Hannah's speech (see also Tracy 1995).

Available data

There are 14 tape and video recordings made with Hannah between the ages of 2;0 and 2;9. During these recordings, an English-speaking person (her mother) and a German-speaking project member were present. Recordings lasted between 30 minutes and an hour. Additionally, ten audio recordings were provided by her mother, which cover the ages 2;1 to 3;0. During these, Hannah was addressed in English by both her mother and her father, and these recordings contain between 100 and 200 child utterances. For this study, the first five of these recordings were selected, which will be described in detail in section 6.3.

6.2.2.2 Laura

Laura is the second-born child of a German mother and an English father and has lived in Southern Germany all her life. Her mother speaks Standard German with some Swabian influence and acquired English as a second language. In a questionnaire she filled in at the beginning of the study, she rated her proficiency as good for understanding and reading and sufficient for speaking and writing (Tracy 1995). Laura's father speaks an Urban Northern English dialect and has an excellent command of German. Before the children were born, the language spoken between the parents was German. They then decided to address the children in their respective mother tongues. However, Laura's input in English was quite limited. Her father only spent a few hours per week with Laura, during which he spoke English to her. Before the beginning of this study, Laura's only other contacts with English were occasional visits to Great Britain. Laura addressed both of her parents and her elder brother predominantly in German. She joined kindergarten after her third birthday, where the input was exclusively German. At the beginning of the study, Laura spoke very little English, even when addressed exclusively in this language. The number of her English utterances increased after 3;10.

Available data

Between the ages of 2;7 and 4;9, 83 recordings with Laura were made. In the first three recordings, an English-speaking investigator and her older brother Adam were present. Recordings 4 to 83 comprise different play sessions, either with an English-speaking or with a German-speaking investigator. In several of the later recordings both investigators and her brother Adam were also present for some length of time. Depending on Laura's mood, the length of the recordings ranges from less than 100 to over 500 utterances, a fair portion of them often monosyllabic. For this study, 26 recordings were selected, which will be described in detail in the next section (6.3).

6.2.2.3 Adam

Adam is the first-born child of the family described above. After Adam's birth, his parents decided to address him exclusively in their respective mother tongues, although they continued to speak German to each other. Adam began to use the two different languages with the appropriate parent from the beginning and, as the parents report, showed little inclination to mix languages. As for Laura, Adam's input of English at the time of the study was considerably smaller than his input of German. His father, who, on a weekday, spent about two to four hours playing with him, constituted Adam's only source of English until the onset of the study. The family spent about three to four weeks per year on holiday in England, where Adam spoke and was exposed to much English. The rest of the time, Adam spoke German with his mother and his younger sisters, and at 3;9 he joined kindergarten, where the input was exclusively German.

Available data

The data for Adam consist of 88 recordings made between the ages of 3;6 and 5;8. During the first three sessions, an English-speaking investigator and Laura were present. Recordings 4 to 86 comprise play sessions with either an exclusively English-speaking investigator or an exclusively German-speaking one. In two later recordings, both investigators were present and asked him to translate between them. In recording 88, he was asked to translate some specially prepared sentences. Recordings range from approximately 250 to over 600 child utterances. The 20 recordings selected for this study are described in detail in 6.3.

6.2.3 Data collection

During the study, the children were visited at home. One member of the project acted as a German-speaking investigator, the other as an English-speaking investigator. They never addressed the children in the other language, but they did not pretend not to understand it when the children chose to reply in the other language. Parents were addressed in their respective mother tongues by the investigators, i.e. English in the case of Adam's and Laura's father and German in the case of their mother. The recordings were made in relatively unstructured play situations although some toys or books were introduced to elicit specific structures or translating. The children were free to choose toys or activities, and some of the recordings were made outside in the garden or while going for a walk.

6.3

Analysis

In this section, the data on which the present study is based will be described (6.3.1). Subsequently, the auditory analysis (6.3.2) and the acoustic analysis (6.3.4) carried out will be presented, together with measurements of their reliability (6.3.3 and 6.3.5).

6.3.1 Data

Table 3 presents an overview of all the transcribed recordings with Hannah. The first column lists the name of the recording by which it will be referred to in the following. The second column gives the child's age in years;months.days. The third column gives the total number of transcribed child utterances. In the fourth column, the participants of the recording are listed. Participants in the recordings were Hannah's mother (M) and sometimes additionally her father (F). No other investigator was present. All recordings are about 60 minutes long, with transcripts considerably shorter, as can be seen from the fifth column. A total of 796 utterances were transcribed for Hannah.
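The age notation and the utterance total reported for Hannah can be checked mechanically. The sketch below is illustrative only: the names `Age`, `parse_age` and `age_in_months` are hypothetical, the 30-day month is a simplifying assumption, and the ages and utterance counts are those listed in Table 3.

```python
from typing import NamedTuple


class Age(NamedTuple):
    years: int
    months: int
    days: int


def parse_age(text: str) -> Age:
    """Parse an age written as years;months.days, e.g. '2;01.13'."""
    years, rest = text.replace(" ", "").split(";")
    months, days = rest.split(".")
    return Age(int(years), int(months), int(days))


def age_in_months(age: Age) -> float:
    """Approximate age in months (assumes a 30-day month for simplicity)."""
    return age.years * 12 + age.months + age.days / 30.0


# Recording name -> (age, number of transcribed child utterances), from Table 3.
hannah = {
    "H1": ("2;01.13", 150),
    "H2": ("2;02.27", 153),
    "H3": ("2;03.27", 212),
    "H4": ("2;04.17", 156),
    "H5": ("2;06.15", 125),
}

total = sum(count for _, count in hannah.values())
print(total)  # 796, the total stated in the text

for name, (age_text, count) in hannah.items():
    print(name, parse_age(age_text), round(age_in_months(parse_age(age_text)), 1))
```

Converting ages to a single numeric scale is convenient when recordings from different children are plotted on a common developmental axis.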

| Recording | Age | Number of utterances | Participants | Length of transcription |
|---|---|---|---|---|
| H1 | 2;01.13 | 150 | M | 20 min |
| H2 | 2;02.27 | 153 | M | 20 min |
| H3 | 2;03.27 | 212 | M, F | 40 min |
| H4 | 2;04.17 | 156 | M, F | 15 min |
| H5 | 2;06.15 | 125 | M, F | 15 min |

Table 3: Recordings with Hannah transcribed phonetically and with intonation, Hannah's age, the number of child utterances, the other participants of the recording and the length of transcription.

For Laura, 26 recordings were transcribed. They included the first and the last recording of the total of 83 made in the Tübingen project described above and were spaced equally over the period of investigation. Because of Laura's initial reluctance to speak English, transcribed German utterances outbalance the English ones. A total of 6121 utterances was transcribed for Laura, comprising German, English and mixed utterances. Table 4 presents an overview of all the transcribed recordings with Laura.

| Recording | Age | Number of utterances | Participants | Length of recording / transcription |
|---|---|---|---|---|
| L1 | 2;05.10 | 264 | E, M, Ad | 90 min / 90 min |
| L2 | 2;05.24 | 301 | E, M, Ad | 105 min / 105 min |
| L3 | 2;06.06 | 44 | E, M, Ad (grandparents) | 60 min / 60 min * |
| L4 | 2;06.15 | 224 | E, M | 60 min / 30 min |
| L5 | 2;06.26 | 392 | E, M, (G2) | 60 min / 45 min |
| L6 | 2;07.12 | 502 | G, M | 45 min / 45 min |
| L7 | 2;08.24 | 211 | E, (M, Ad, G) | 40 min / 40 min |
| L8 | 2;09.07 | 473 | G, (M) | 60 min / 60 min |
| L9 | 2;09.21 | 332 | E, G, Ad, M | 40 min / 40 min |
| L10 | 3;00.20 | 97 | E, M | 45 min / 45 min |
| L11 | 3;00.20 | 336 | G | 40 min / 40 min |
| L12 | 3;01.03 | 263 | E, (M, G) | 80 min / 60 min |
| L13 | 3;01.17 | 159 | G, (E) | 30 min / 30 min |
| L14 | 3;02.02 | 290 | E, (M, Ad, G) | 60 min / 60 min |
| L15 | 3;02.28 | 486 | G | 50 min / 50 min |
| L16 | 3;03.19 | 228 | G | 30 min / 30 min |
| L17 | 3;03.19 | | E, (M, G, Ad) | 60 min / 40 min * |
| L18 | 3;04.03 | 234 | E (M, G, Ad) | 90 min / 50 min |
| L19 | 3;05.07 | 344 | G, M | 60 min / 50 min |
| L20 | 3;06.29 | 406 | E (G, Ad) | 70 min / 70 min |
| L21 | 3;07.10 | 76 | E | 15 min / 15 min |
| L22 | 3;08.24 | 274 | E (G, Ad, M) | 60 min / 60 min |
| L23 | 3;10.02 | 387 | E, (Ad, G) | 150 min / 150 min |
| L24 | 3;10.28 | 210 | E (G, Ad, M) | 120 min / 80 min |
| L25 | 4;02.22 | 242 | E, G, Ad | 180 min / 100 min |
| L26 | 4;03.12 | 337 | E, G, Ad, E2, F | 150 min / 150 min |

Table 4: Recordings with Laura transcribed phonetically and with intonation, Laura's age, the number of child utterances, the other participants of the recording, and the length of both recording and transcription (the * denotes that only questions were transcribed).

Out of the total of 88 recordings with Adam, 20 were selected, spanning the entire period of investigation. With a view to covering all of the development, recordings were chosen at regular intervals, and recordings in German and English were balanced equally. A total of 7116 utterances were transcribed for Adam, including German, English and mixed utterances (see table 5).

| Recording | Age | Number of utterances | Participants | Length of recording / transcription |
|---|---|---|---|---|
| A1 | 3;06.28 | 308 | E, L, M | 90 min / 90 min |
| A2 | 3;07.14 | 257 | E, L, M | 105 min / 105 min |
| A3 | 3;07.23 | 60 | E, L, M | 60 min / 60 min |
| A4 | 3;10.12 | 459 | G | 60 min / 60 min |
| A5 | 3;10.27 | 393 | E, (L) | 60 min / 60 min |
| A6 | 3;10.27 | 442 | G | 60 min / 60 min |
| A7 | 3;11.09 | 512 | G | 50 min / 50 min |
| A8 | 4;00.13 | 179 | G, (L, E) | 60 min / 50 min |
| A9 | 4;02.08 | 421 | G | 70 min / 60 min |
| A10 | 4;02.08 | 458 | E | 60 min / 60 min |
| A11 | 4;03.19 | 278 | E, (L) | 60 min / 60 min |
| A12 | 4;04.17 | 493 | G | 75 min / 75 min |
| A13 | 4;04.17 | 275 | E | 60 min / 60 min |
| A14 | 4;07.23 | 457 | G, E, L | 120 min / 80 min |
| A15 | 4;08.28 | 664 | E (L, M) | 90 min / 90 min |
| A16 | 4;10.11 | 53 | E, Sarah | 30 min / 20 min |
| A17 | 4;11.20 | 211 | E, L (G) | 30 min / 30 min |
| A18 | 5;00.17 | 298 | E (L, G) | 90 min / 50 min |
| A19 | 5;06.00 | 802 | E, G, L, E2, F | 150 min / 150 min |
| A20 | 5;07.30 | 186 | E, G2 | 120 min |

Table 5: Recordings with Adam transcribed phonetically and with intonation, his age, the number of child utterances, the other participants of the recording, and the length of both recording and transcription.

In both tables 4 and 5, other participants of the recordings were coded as follows:

E stands for the investigator speaking only English to the children
G stands for the investigator speaking only German to them
E2 stands for a second English-speaking investigator during recordings A19 and L25
G2 stands for a second German-speaking investigator during recording A20
M stands for their mother, who was usually simply present with very little participation in the conversation
F stands for their father
Ad stands for Adam
L stands for Laura
Sarah is their little sister, who was born after the onset of the study

In recording L3, Laura's grandparents were present for a short time but did not participate in the conversation. In general, participants given in brackets appear only briefly and do not speak much. An example would be Adam coming into the room where Laura is playing with an investigator and asking to borrow a pair of scissors.

The language environment for Adam and Laura is very complex. A typical situation for Laura would look as follows: both Adam and their mother speak German to her, although Adam might address some remarks to her in English, as in recordings L22 and L25. Her father speaks both German and English to her, and the investigators address her in either German or English. A typical situation for Adam would look as follows: Laura always speaks German to him, his mother also addresses him exclusively in German, whereas his father tries to speak only English to him but sometimes switches to German. Each investigator speaks only German or only English. Both Adam and Laura choose their language according to the language the conversation partner uses; however, Laura avoids speaking English in the earlier recordings.

The fifth column of tables 4 and 5 lists the length of the recording and the duration of the transcribed period. Transcription usually starts at the beginning of the recording but might end before its end for various reasons: Laura sometimes became very monosyllabic after a certain time, or recordings ended with her turning very wild, mainly screaming and talking in some nonsense language. In recordings marked with an asterisk (*), only the questions uttered by the child were transcribed.
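The participant coding lends itself to simple machine processing. The following sketch is a hypothetical illustration, not part of the original project's tooling: the function name `parse_participants` is invented here, and the only assumption carried over from the text is that bracketed participants are those who appear only briefly.

```python
import re


def parse_participants(cell: str):
    """Split a participants cell into (main, brief) lists of codes.

    Codes inside brackets are treated as brief participants; all other
    comma-separated codes are treated as main participants.
    """
    bracketed = re.findall(r"\(([^)]*)\)", cell)
    brief = [
        code.strip()
        for group in bracketed
        for code in group.split(",")
        if code.strip()
    ]
    outside = re.sub(r"\([^)]*\)", "", cell)
    main = [code.strip() for code in outside.split(",") if code.strip()]
    return main, brief


# Cells as they appear in tables 4 and 5.
for cell in ["E, (M, Ad, G)", "E (G, Ad)", "E, G, Ad, E2, F"]:
    main, brief = parse_participants(cell)
    print(f"{cell!r}: main={main}, brief={brief}")
```

Separating main from brief participants in this way would make it straightforward to select, for instance, all sessions in which only the English-speaking investigator took a substantial part in the conversation.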

6.3.2 Auditory analysis and layout of the transcription All 51 recordings of this study were analysed auditorily according to the system of the British tradition, and transcripts were made according to the following form (figure 18). In the Tubingen projects, transcripts had been made for most of the recordings. The original layout was kept, but each child utterance was reanalysed and transcribed using IPA and the

68 intonation transcription system described below. In addition, many of the transcripts were extended considerably. L24

[3 ; 10 .28]

Situational descriptions (left-hand column):
L und E mit Puppe im Wohnzimmer
L schüttelt Kopf
Figuren kleben durcheinander

Child utterances (middle column):
006 'fb:,wi/an
007 \je
008 dnz/fo:lii)
009 ai s \put da to|on /hia di | \klaimig Ap da steas|an des /tau:
010 hi:z'fDilirj \daun/nau f
011 si:|\putir) hi|\ba? daun ant s st| /smeariq his|hihis \hes daun f
012 /Jau|lak /OBU||/OBU mp
013 /jes at|val /h3:t mf
014 hi ,fo:lir) \daun

Investigator's utterances (right-hand column):
Who's this, Laura?
Hello Florian! What a big baby. Florian can't really see. What does Florian want to play now? Does Florian want to play I think Florian wants to play with the farm pictures.
No? But I do. I wanted to play with your farm pictures. Can you show them to me?
Oh oh! What's going on that farm? Has there been a storm?
Yeah. Look! What's the cow doing?
Mhm. And the farmer, look, he's lying across - right in the air. And what's the little boy doing on the roof? Up there? Hm? What's he doing there?
Oh Oh! Don't you think that will hurt?
When he first falls down, head
And what's the horse doing in the little - in the little pond?

Figure 18: Layout of the transcripts. The first column gives the situational descriptions, the second contains phonetically transcribed child utterances and the third the investigator's utterances.

The transcripts consist of three columns: One for the description of the situation, one for the child's utterances, and one for the investigator's utterances. The left-hand column contains descriptions of the child's and the other participants' activities such as "L points at a book" or "A is building a duplo house". (These situational descriptions are given in German, see figure 18). In the middle column, the child utterances are presented in phonetic transcription and with intonational transcription as described below. The utterances are numbered. In recordings A1 to A3 and L1 to L3, where both children were present all the time, their utterances are simply indexed by A: and L: respectively. The right-hand column lists the investigator's utterances in orthographic transcription including punctuation marks. Utterances by participants other than the project member are labelled accordingly, i.e. a remark by Adam's and Laura's mother would be transcribed beginning with "M:". Any utterances by Adam and Laura in a recording made with the sibling are transcribed phonetically in the child column with a preceding "A:" or "L:", but are not numbered. In Hannah's case, her mother's and father's utterances were indicated by "M:" and "F:". The transcript can be read from left to right according to the turns of the conversation. Interruptions are marked with II. An example of the transcription layout can be seen in figure 18.
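The three-column layout and the numbering convention described above can be sketched as a small data structure. This is only an illustrative sketch, not part of the original transcription procedure: the class name `Row`, its field names and the helper `number_utterances` are hypothetical, and the placeholder strings stand in for real phonetic transcriptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    """One line of a three-column transcript (all names are hypothetical)."""
    situation: str = ""            # left-hand column: situational description
    child: str = ""                # middle column: phonetic transcription
    investigator: str = ""         # right-hand column: orthographic transcription
    speaker: Optional[str] = None  # label such as "A", "L", "M" or "F", if any
    number: Optional[int] = None   # running number, unlabelled child utterances only


def number_utterances(rows: List[Row], start: int = 1) -> List[Row]:
    """Number unlabelled child utterances; labelled ones stay unnumbered."""
    n = start
    for row in rows:
        if row.child and row.speaker is None:
            row.number = n
            n += 1
    return rows


rows = number_utterances([
    Row(investigator="Who's this, Laura?"),
    Row(child="<utterance>"),
    Row(situation="L schüttelt Kopf"),
    Row(child="<sibling utterance>", speaker="A"),
    Row(child="<utterance>"),
], start=6)
print([r.number for r in rows])  # [None, 6, None, None, 7]
```

Keeping the speaker label separate from the utterance text is a design choice made here for simplicity; it lets sibling utterances sit in the child column without receiving a number.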

6.3.2.1 Phonetic Transcription

As illustrated in figure 18, only child utterances were transcribed phonetically. Transcription was not limited to speech sounds: an attempt was made to transcribe any sounds produced by the child in narrow phonetic transcription, using the IPA symbols. Of the diacritics, the following symbols were used:

: for a long vowel
◌̃ for nasalisation
ʰ for aspiration
◌̩ for a syllabic consonant

One symbol was invented for the description of a lisp prevalent in Laura's speech from L2 to L23 and occasionally also produced by Adam and Hannah:

Figure 43: The F0 movement of Adam's utterance [das da is sir)k | f o b spot | siijk] at 3;6. There is no evidence for an F0 resetting.

Figure 44 shows the pitch movement of Adam's utterance "Aber wenn des | aber wenn nicht tot | ist" ("but when this | but when not dead | is"). Pitch height on des before the pause is about 250 Hz, and pitch height after the pause is about 240 Hz on is.

Figure 44: The F0 movement of Adam's utterance [aba ven das | aba ven ni? to:t | is]. There is no evidence for F0 resetting after the pause.
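The comparison of pitch height before and after the pause can be made explicit with a short calculation. The sketch below is an illustration only: it takes the two approximate values reported for this utterance (250 Hz before the pause, 240 Hz after it) and expresses the change in semitones. The function name `semitone_difference` is hypothetical, and reading a non-positive value as "no resetting" is an assumption made here, not a criterion stated in the text.

```python
import math


def semitone_difference(f_before_hz: float, f_after_hz: float) -> float:
    """Interval from the pre-pause F0 to the post-pause F0 in semitones."""
    return 12 * math.log2(f_after_hz / f_before_hz)


# Approximate values reported for the utterance in figure 44.
diff = semitone_difference(250.0, 240.0)
print(round(diff, 2))  # -0.71

# Assumption: resetting would show up as a rise in pitch after the pause.
print("resetting" if diff > 0 else "no resetting")
```

The pitch after the pause is about 0.7 semitones lower than before it, which is consistent with the absence of an upward reset.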

There is one example of pitch resetting after a pause associated with a restart in an English utterance by Adam aged 4;2. Figure 45 shows that the pitch height before the
