Designing Speech for a Recipient: The roles of partner modeling, alignment and feedback in so-called 'simplified registers'
ISBN 9789027256751, 9789027266170

English | 339 pages | 2016

Table of contents :
Frontpage
Table of contents
Acknowledgements
1 Introduction
1.1 Open issues
1.2 'Simplified registers'
1.3 Overview of the volume
2 The role of the addressee: Central notions
2.1 Recipient design and membership categorization
2.2 Participation structure
2.3 Audience design and style
2.4 Accommodation
2.5 Register
2.6 Common ground
2.6.1 Common ground as a set of propositions
2.6.2 Common ground as a shared basis
2.6.3 Common ground as a collaborative achievement
2.7 Alignment
2.8 Feedback
2.9 Interim conclusions
3 Methods and data
3.1 Methods
3.1.1 Experimental data elicitation
3.1.2 Qualitative analysis
3.1.3 Quantitative analysis
3.2 Human-robot interaction data
3.3 Transcription conventions
4 Talking to children
4.1 Linguistic description of the variety
4.1.1 Phonetic, phonological and prosodic properties
4.1.2 Grammatical properties
4.1.3 Semantic and pragmatic properties
4.2 Homogeneity
4.2.1 Variation: Speakers
4.2.2 Boys versus girls
4.2.3 Variation: Developmental finetuning
4.2.4 Variation: Situation
4.3 Functionality
4.3.1 Affective functions
4.3.2 Communicative functions
4.3.3 Facilitating language acquisition
4.4 The role of alignment
4.5 The role of the partner’s feedback
4.6 The relationship between partner modeling, alignment and feedback
5 Talking to foreigners
5.1 Linguistic description of the variety
5.1.1 Phonological and prosodic properties
5.1.2 Grammatical properties
5.1.3 Semantic and pragmatic properties
5.2 Functionality
5.3 Homogeneity
5.4 The relationship between partner modeling, alignment and feedback
6 Talking to robots
6.1 Linguistic description of the variety
6.1.1 Phonological and prosodic properties
6.1.2 Lexical properties
6.1.3 Grammatical properties
6.1.4 Semantic and pragmatic properties
6.2 Functionality
6.3 The role of alignment
6.4 The role of partner models
6.4.1 Appearance
6.4.2 External variables: Gender
6.4.3 Personal partner models and linguistic choice
6.5 The role of the partner’s feedback
6.6 The relationship between partner modeling, alignment and feedback
7 Conclusions
7.1 Simplified registers?
7.1.1 The notion of simplification
7.1.2 The notion of register
7.2 The role of the addressee in interaction – some answers
7.3 General conclusions
Bibliography
Index

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.

Designing Speech for a Recipient

Kerstin Fischer

John Benjamins Publishing Company


Pragmatics & Beyond New Series (P&bns) issn 0922-842X

Pragmatics & Beyond New Series is a continuation of Pragmatics & Beyond and its Companion Series. The New Series offers a selection of high quality work covering the full richness of Pragmatics as an interdisciplinary field, within language sciences. For an overview of all books published in this series, please see http://benjamins.com/catalog/pbns

Editor
Anita Fetzer
University of Augsburg

Associate Editor
Andreas H. Jucker
University of Zurich

Founding Editors
Jacob L. Mey
University of Southern Denmark

Herman Parret
Belgian National Science Foundation, Universities of Louvain and Antwerp

Jef Verschueren
Belgian National Science Foundation, University of Antwerp

Editorial Board
Robyn Carston, University College London
Thorstein Fretheim, University of Trondheim
John C. Heritage, University of California at Los Angeles
Susan C. Herring, Indiana University
Masako K. Hiraga, St. Paul’s (Rikkyo) University
Sachiko Ide, Japan Women’s University
Kuniyoshi Kataoka, Aichi University
Miriam A. Locher, Universität Basel
Sophia S.A. Marmaridou, University of Athens
Srikant Sarangi, Aalborg University
Marina Sbisà, University of Trieste
Deborah Schiffrin, Georgetown University
Paul Osamu Takahara, Kobe City University of Foreign Studies
Sandra A. Thompson, University of California at Santa Barbara
Teun A. van Dijk, Universitat Pompeu Fabra, Barcelona
Yunxia Zhu, The University of Queensland

Volume 270
Designing Speech for a Recipient. The roles of partner modeling, alignment and feedback in so-called ‘simplified registers’
by Kerstin Fischer

Designing Speech for a Recipient The roles of partner modeling, alignment and feedback in so-called ‘simplified registers’

Kerstin Fischer


University of Southern Denmark

John Benjamins Publishing Company Amsterdam / Philadelphia


The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ansi z39.48-1984.

doi 10.1075/pbns.270
Cataloging-in-Publication Data available from Library of Congress:
lccn 2016041067 (print) / 2016050592 (e-book)
isbn 978 90 272 5675 1 (Hb)
isbn 978 90 272 6617 0 (e-book)

© 2016 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Company · https://benjamins.com


Acknowledgements

All in all, this book has taken a bit more than 15 years to materialize, and thus it has accompanied many phases of my life and received support from many people. Most notably and for the whole period, John Bateman has accompanied the project and supported it in many ways, as a boss, a colleague and as a friend. John introduced me to Systemic Functional Grammar and patiently discussed with me concepts of genre, social semiotics and a formal ontology-based view of alignment.

Furthermore, I am very grateful to Herb Clark for wonderful long discussions, in his office, at the Bytes cafe or on some hiking trail, as well as for his moral and practical support. His publications have been an inspiration from day one, and the possibility to discuss central topics such as automaticity, alignment and the collaborative view of using language with him over the past years has been a great gift.

I have furthermore received wonderful feedback on individual chapters; Eve Clark has extensively commented on the child-directed speech chapter, my colleague Søren Wind Eskildsen has reviewed the foreigner-directed speech chapter, and Jodi Forlizzi has commented on the robot-directed speech chapter, all three helping me to stay on course (remaining errors and inconsistencies are of course only mine). I am furthermore much indebted to Anita Fetzer for her kind support, as well as to Arnulf Deppermann for extremely interesting comments and suggestions, which have certainly made this a better book, and to the two anonymous reviewers.

In the background, there have been many people supporting me over the past years; for instance, first Walter von Hahn and then Wolfgang Menzel have always made me welcome at the Natural Language Systems Group in Hamburg, providing me with a great infrastructure to calmly and quietly write this book in LaTeX.
Furthermore, Kilian Foth is there for me anytime, to test out ideas and lines of argumentation or to solve urgent technological issues, and Maria Alm always tries to keep every organizational issue off my shoulders. Finally, I am very grateful for all the kind encouragement I have received from so many sides, especially from my family and from my friends and colleagues at Stanford and in Denmark. I am blessed with a friendly, kind, cooperative and supportive working environment, for which I am very grateful.


Last but not least, I have to thank the Danish Research Foundation for sending me to Stanford in the first place, the H-STAR Institute for hosting me, and the German Research Foundation for funding my human-robot interaction research in the framework of the collaborative research area SFB/TR8 ‘Spatial Cognition’.

chapter 1

Introduction


Actions are generated taking into account that they will be interpreted by a specific other – that is, they exhibit recipient design. (Levinson, 2006, p. 45)

The aim of this study is to clarify how speakers make linguistic choices for their addressees and thus to pin down what exactly it means to design one’s utterances for a particular communication partner. Although models of communication generally include the communication partner in some way or other, the focus in many approaches to language, be they semantic, pragmatic, psycho- or sociolinguistic, is on the speaker as an encoder or producer, assigning the partner the role of a hearer, listener, decoder or processor, and treating language use as if it were an entirely individual process (Bavelas, 2005). Recently, in many disciplines interest in the role of dialog and joint action, and thus also in the role of the communication partner, has increased (see, for instance, Galantucci and Sebanz, 2009), and there are some disciplines in which language use is “studied as if it were entirely a social process” (Clark, 1996, p. 4). However, disciplinary boundaries, theoretical divisions and methodological restrictions have so far prevented a comprehensive understanding of the relationship between individual and social processes in communication in general, and of the role the addressee may play in interaction in particular (cf. Schober and Brennan, 2003). The current study aims to overcome these problems and to provide a more comprehensive account of what it means to formulate an utterance for a specific communication partner.

The central role the orientation to the addressee may play in interaction is most obvious when speakers talk to communication partners who are ‘peculiar’ in some way. For instance, speech that adults direct towards children is intuitively perceived as different from speech to other adults. This has led to defining speech directed towards children as a particular kind of situationally determined variety, a “system of linguistic expression whose use is governed by situational factors” (Crystal, 2001, p. 6); see also Section 2.5 Register. The idea behind this notion is that language may provide particular resources depending on certain situational parameters, and that the addressee may be such a parameter. Studies on communication with other unusual communication partners confirm this intuition: speaking to handicapped people (e.g. DePaulo and Coleman, 1986), to dogs (e.g. Hirsh-Pasek and Treiman, 1982; Mitchell, 2001), to foreigners (e.g. Ferguson, 1975), or to computers (e.g. Brennan, 1991), but also to people with markedly better or


worse spatial capabilities than oneself (e.g. Schober, 2009), exhibits features that can somehow be related to characteristics of the addressee. These extreme cases of speaking for a particular addressee do not exclude that we adjust all our speech to our partners; since usually partners are not as different from ourselves as children are from adults, this process may be less obvious but may still be at work. Designing speech for a particular communication partner may thus not only be at work if communication partners are somehow ‘peculiar’, but may be an essential aspect of formulation in general. This position is held by Levinson (2006), who proposes an orientation towards the addressee to be part of a universal ‘interaction engine’. That is, the fact that utterances are designed in a way such that the particular addressee can figure out their meanings is taken to constitute an interactional universal (Levinson, 2006, p. 45).

However, although addressee orientation may be both ubiquitous in, and characteristic of, interaction, we know surprisingly little about how exactly speakers design their utterances for their communication partners. Especially the relationship between cognitive processes like partner modeling, supposedly automatic processes like interactive alignment, and social, interactional processes is still open. This study thus concerns a particular aspect of language use, namely how speakers take their addressees into account. This question places the investigation firmly in the study of “language and social interaction” (Bavelas, 2005, p. 179); such a study involves the consideration of people who speak and people who listen, as well as the cognitive resources they may bring into the interaction, which make them choose one linguistic feature over another.

Yet, at the same time participants in talk-in-interaction are engaged in a joint activity, and the effect of the interaction itself may go far beyond individual plans and actions (Clark, 1996; Bavelas, 2005; Suchman, 2007). The stance taken in this investigation is thus neither reductionist, focused entirely on the individuals in interaction, nor does it take interaction as an unanalyzable whole. Instead, the role individual and social processes play in orienting to the partner, as well as their interaction, are taken to be empirical issues of language use, which are addressed in this study. Accordingly, the procedure in this study will necessarily be interdisciplinary, involving notions, methodologies and results from disciplines addressing individual and social processes.

1.1  Open issues

Speakers have been found to design their utterances for a particular communication partner depending on the addressee’s supposed needs, capabilities and current activity. That is, speakers may select the linguistic properties of their utterances on the basis of what they believe to be suitable for the addressee at the moment


of speaking. In conversation analysis, this phenomenon has been called recipient design, “the multitude of respects in which the talk by a party in a conversation is constructed or designed in ways which display an orientation and sensitivity to the particular other(s) who are the co-participants” (Sacks et al., 1974, p. 727). Taking this definition as a starting point, there are at least three aspects involved in the process of recipient design: “the particular other(s)”, “the multitude of respects in which talk is constructed or designed”, i.e. the linguistic choices made, and the “orientation and sensitivity to” the particular other, i.e. the relationship between linguistic choice and the communication partner, which concerns the processes and mechanisms by means of which the particular other is taken into account.

From various theoretical perspectives these three aspects of recipient design raise a set of questions. One of the questions to be addressed concerns the range and nature of the linguistic choices, i.e. “the multitude of respects” that are influenced by the orientation to a particular other. The first question to be addressed is therefore:

– What is influenced by the orientation to a particular other; that is, which linguistic features are affected by addressee orientation?

Second, we need to know what exactly “the particular other” comprises, i.e. which aspects of their partners speakers take into account and how they do this. In psychology, this has been addressed as an issue of partner modeling (e.g. Horton and Gerrig, 2005a,b). Brennan et al. (2010) suggest distinguishing between two sources of partner-specific information: “(1) information from a more or less global model of a partner or their characteristics, mentally represented from prior personal experience, from expectations, or else from stereotype, and (2) feedback that becomes available locally online, from cues that emerge as the dialog unfolds” (Brennan et al., 2010, p. 316). With respect to the former, there is much discussion on the exact nature of the hypotheses speakers have about their communication partners and the amount of detail of the models created (e.g. Shintel and Keysar, 2009; Galati and Brennan, 2009).

In psychology, that participants build up models of their partners in some way or other is taken for granted, and the main open question to be considered is taken to be the question of “when and how” this is done (e.g. Schober and Brennan, 2003; Brennan and Hanna, 2009). In Brennan et al.’s words: “speakers tailor their utterances to what they know about addressees, and (…) addressees tailor their interpretations to what they know about speakers. What is not so clear is how and when they do this” (Brennan et al., 2010, p. 315). This ‘how and when’ is taken to be a matter of psycholinguistic processing, concerning the exact moment in the production process at which speakers take their partners into account and what can be inferred from this about the nature of the language processing architecture. Regarding the question under which circumstances participants create and access partner models, for instance, Horton and Keysar (1996) propose that speakers model their partners’ propositional knowledge, yet only if there is time. That is, under time pressure speakers are not expected to take their partners’ knowledge into account.

However, as Schober and Brennan (2003) point out, the controlled psycholinguistic experiments in which many of the findings were obtained do not necessarily allow us to conclude what participants do ‘in the wild’; i.e. experimental studies may lack ecological validity. Thus, it also needs to be considered in which kinds of situations participants may create and make use of partner models. The questions concerning ‘the particular other’ thus comprise, for instance:

– Under which circumstances do speakers build up global models of their communication partners?
– Under which circumstances do speakers rather attend to local cues, i.e. when do we need to assume addressee orientation to be an interactional achievement based on the partner’s contributions?


If participants create models of their communication partners, further questions concern the nature of the models on the basis of which speakers make their linguistic choices. Thus it needs to be addressed what exactly is modeled, as well as whether speakers construe rough, stereotypical or “one bit” (e.g. Galati and Brennan, 2009) models of their partners, or whether they build up detailed models of their interlocutors, as for instance Clark (1996) suggests:

– What about their partners do speakers model, and in how much detail?

Furthermore, besides the kinds of representations speakers may or may not build up, it is open to what extent the model is made use of in the formulation process (see also Horton and Gerrig, 2005a). Viewing partner modeling from the perspective of the participants, the question becomes:

– To what kinds of information about their partners do speakers attend, which aspects of their communication partners do they treat as relevant, and how do their choices leave traces in speakers’ linguistic behavior?

Besides global partner models, also immediate, contingent behaviors by the communication partner, for instance, the partner’s feedback or linguistic features of the partner’s utterances, may influence speakers’ choices locally. One possibility is that the linguistic features by means of which speakers orient to their communication partners stem from automatic alignment with the partner’s utterances; Pickering and Garrod (2004), for instance, propose that interactive alignment, i.e. the spontaneous, automatic re-use of the partner’s linguistic representations, renders partner modeling superfluous – unless there are problems. This leads to the following questions:

– Can alignment account for the addressee orientation observable?
– Under which circumstances does alignment take place?
– What is the relationship between partner modeling and alignment?

Moreover, communication partners are not mute, passive recipients, but may actively contribute to the speakers’ linguistic choices. Listeners may, for instance, produce short responses speakers may make use of during formulation. Gardner (2001), for example, shows that subtle differences in intonation contour can serve to transform a simple mm into a powerful means for the listener to indicate whether he or she would like more information on the topic. Open issues thus concern:

– What does the partner’s feedback contribute to the formulation process?
– What is the relationship between feedback and partner modeling?
– What roles does feedback play in the revision of partner models?

Furthermore, we may ask whether speakers formulate their utterances for possibly restricted communication partners like the ones investigated here in the same way as they formulate them for communication partners who are relatively similar to themselves. That there may be different mechanisms at work has recently been suggested by Pickering and Garrod (2013). Thus, we shall address the question below:

– Do people draw on different mechanisms when formulating their utterances for somewhat restricted communication partners compared to when they are speaking to someone similar?

Finally, it is open how speakers make their linguistic choices, i.e. which relationships hold between ‘particular others’ and certain linguistic choices such that these choices display an orientation to the communication partners. For example, one possibility could be that speakers decide for each linguistic feature individually whether it is appropriate for a given communication partner in a particular situation. For instance, Brown (1977) suggests that the peculiar adaptations of child-directed speech are “elicited by indications of some psycholinguistic ability” (Brown, 1977, p. 14). One possibility is thus that each linguistic choice is made on the basis of a judgment of the linguistic capabilities of the (child) addressee. Alternatively, features may come in packages, such that one decision determines the use of a set of linguistic features. For instance, a conventional relationship between situation and linguistic features chosen may be assumed (e.g. Ferguson, 1994; Crystal, 2001), or participants may have hypotheses about functional relationships between linguistic features, such that if a partner knows feature X, he or she is likely to know Y. Linguistic features may thus be related to each other by convention, implication or analogy, or they may be decided on individually based on their respective functions. The question is therefore:

– How are linguistic choices related to each other?


In addition, we may also ask whether the adaptations speakers make are really useful for the listener; i.e. it still has to be seen to what extent the adjustments made are suitable and functionally appropriate:

– What are the functions of the adaptations?

These questions will have to be addressed in order to arrive at a more comprehensive understanding of addressee orientation in interaction. While the analytical distinctions made in the questions listed here may appear misguided from the perspective of a cooperation view of talk-in-interaction, which understands speaking as a joint project (cf. Clark, 1996) and meaning as interactively construed (e.g. Schegloff, 1982), these distinctions will shed light on what exactly it means for language to be a joint project or to be interactively achieved. The exact processes of collaboration between speaker and hearer while speaking will thus be a result of the study rather than one of its axioms, shedding light on the process in which this collaboration occurs. The current study thus provides a new approach to the phenomenon:

– by breaking up disciplinary boundaries and integrating notions, methods and findings from linguistics, social psychology, sociolinguistics, conversation analysis and pragmatics;
– by focusing on the underlying mechanisms, aiming at determining the role of partner modeling, alignment and feedback, and by identifying the way in which these factors interact;
– by concentrating on those aspects that leave traces in particular linguistic behavior and which thus can be shown to be relevant to the speakers themselves.

1.2  ‘Simplified registers’

The reason for choosing simplified registers as a starting point for the investigation is that they are suggested to be linguistic varieties essentially defined by the respective communication partner. That is, previous work suggests that speaking to a particular communication partner may constitute a register, a linguistic variety for which the addressee is taken to be the defining factor for the characteristics of the speech observable. A register in a given language and speech community is, according to Ferguson (1977), “defined by the uses for which it is appropriate and by the set of structural features which differentiate it from the other registers in the total repertory of the community” (Ferguson, 1977, p. 212). In the case of simplified registers this use concerns situations in which an adult native speaker speaks to someone who is somewhat restricted in his or her capabilities, so that the speaker needs to simplify his or her speech.


The three so-called simplified registers selected for the current investigation are speech to children, speech to foreigners and speech to robots. Ferguson proposes speech to children (Ferguson, 1977, 2004) and speech to foreigners (Ferguson, 1975, 1981) to constitute simplified registers, even though he admits that “the processes that derive simplified registers from adult speech are not always simplifying in nature” (Ferguson, 1977, p. 212). Thus, we can expect to find also other mechanisms besides simplification in the three varieties investigated – if they are varieties, i.e. relatively coherent categories” of texts that share some social or situational characteristic” (Biber and Conrad, 2009, p. 5), at all. Whether the ways of talking to these different communication partners are indeed best understood as registers, i.e. as conventional linguistic varieties defined by the situations in which they occur, will how ever be an open issue addressed in this investigation.1 The approach to disentangling the contributions of the different processes involved and to identifying their roles in the communication process is to analyze communicative situations in which we can observe processes of addressee orientation in detail because the communication partner is ‘peculiar’. Focusing on somehow restricted rather than on more common, ordinary conversational situations allows us to see addressee orientation at work and possibly to identify the models speakers build up of their communication partners in relation to the communication partner’s linguistic behavior. By analyzing communication with somewhat restricted communication partners, we can thus study the interaction between partner model, alignment and feedback independently and with greater clarity than in communication between unrestricted, equal partners. The central role of addressee orientation in the formulation process may become most apparent if those aspects usually taken for granted cannot be relied on. 
For instance, Giles and Franklyn-Stokes (1989), in their survey of communicator characteristics, hold that the first thing we need to know about our communication partners is the language they speak (Giles and Franklyn-Stokes, 1989, p. 119). That a common code constitutes a precondition for formulating even the first utterance for an addressee is highly plausible; however, that participants understand it as a precondition themselves becomes observable to the analyst only in situations in which a common code cannot be taken for granted. Therefore, the more common ground is lacking between participants, the more apparent it becomes to the analyst what participants take into account when formulating their utterances for a communication partner. Thus, my proposal is to investigate communicative situations in which interlocutors are in some way peculiar in their use of language, such that the principle that what is easiest for oneself is easiest for the other (Metzing and Brennan, 2003; Givón, 2005) does not hold. The so-called simplified registers (Ferguson, 1982) focused on in this investigation are infant- and child-directed speech, foreigner-directed speech and speech directed towards embodied artificial communication partners, namely robots. Although communication with many other types of somewhat ‘peculiar’ addressees has been discussed in the literature, for instance, speech directed to dogs (Mitchell, 2001; Tannen, 2004; Roberts, 2004), to the elderly (Coupland et al., 1988; Thimm, 1998a), to dolls (Sachs and Devin, 1976), or to mentally challenged adults (DePaulo and Coleman, 1986), the three speech situations selected here provide a useful data spectrum to address the questions raised in this volume. The three speech situations chosen differ in several respects, so that from the analysis of all three of them we get a more complete picture of the nature of addressee orientation than from the analysis of just one of them. The three candidates for varieties under consideration, i.e. “system[s] of linguistic expression whose use is governed by situational factors” (Crystal, 2001, p. 6), provide us with a broad range of different characteristics and peculiarities that may support the identification of the factors that are involved in speaking for an addressee.

1. Unfortunately, it is next to impossible to speak about the phenomena under consideration without presupposing at least provisionally the existence of such varieties, as well as of a neutral variety against which the way of speaking to one of the communication partners under consideration seems ‘peculiar’. For instance, much of the literature reviewed in this study concentrates on the problematic aspects of communication, and thus on so-called speech modifications. The term modification, however, presupposes that there is a standard from which the usage in ‘peculiar’ situations deviates. For example, Ferguson argues that simplified registers should be considered derivative from adult speech (Ferguson, 1977, p. 214). This view is widespread, but very problematic (see Nabrings, 1981; Koole and Ten Thije, 2001; Wagner, 1996). The terms register, variety and modification will be employed provisionally in this study in order to refer to the phenomena under consideration, since it is not possible to avoid them entirely, but we shall return to them in Section 7.1.2 The Notion of Register.

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.
For infant-directed and for child-directed speech (IDS and CDS, respectively), also often referred to as motherese or baby talk, many resources are available to answer questions as to how homogeneous they are, whether they are universal, what their functions are, and how they develop. English IDS and CDS turn out to be applied by all kinds of speakers, including parents, siblings and unrelated speakers of all ages (e.g. Snow, 1972; Barton and Tomasello, 1994; Roy et al., 2009), and with very similar properties, and thus to be relatively homogeneous; for instance, Gleason and Weintraub (1978) claim that “what was regarded as ‘mother’s speech’ in the earlier studies is now recognized as a register used at least in some form by all older speakers when interacting with young children” (Gleason and Weintraub, 1978, p. 213). The properties of speech to infants and children have been documented very well; speakers have been found to finetune their utterances to particular developmental stages of their addressees, and their adjustments have been suggested to support arousing and managing attention and to facilitate the
acquisition of language. Thus, the study of IDS and CDS allows us to address several central open issues concerning the role of the communication partner in interaction, especially regarding the functionality of the adjustments made. In contrast to IDS and CDS, the properties of speech directed at foreigners (Ferguson, 1975), also called foreigner talk, i.e. speech directed at non-native speakers, are much less homogeneous; as we shall see below, special features are employed only by some speakers and to differing degrees, and the merits of the adjustments made are highly controversial: besides their hypothesized function of aiding communication by simplifying complicated aspects of the language, they may introduce a hierarchical relationship between the speaker and the addressee and may thus have negative social consequences (Smith et al., 1991). While some of the so-called modifications may indeed be useful, so that it is justified to speak of a ‘simplified’ register, others are not, and may instead be counterproductive (Zuengler, 1991). In contrast to child-directed speech, in which the difference between an adult speaker and a helpless infant is rather obvious, in communication with non-native speakers, ‘foreignness’ first needs to be construed (Wagner, 1996; Firth and Wagner, 2007; Scollon and Scollon, 2001, p. 545). Differences in the construal of the partner are reflected in the heterogeneous choices speakers make even in similar situations. Furthermore, there is evidence of alignment (Snow et al., 1981; Pearson et al., 2006) and finetuning (Warren-Leubecker and Bohannon III, 1982), but also of stereotyping and lack of adaptation (Zuengler, 1991; Smith et al., 1991). Thus, the study of foreigner-directed speech allows us to pin down the speakers’ social and interactional reasons for making their linguistic choices.
The investigation furthermore allows us to re-evaluate the hypothesis that foreigner-directed speech constitutes a conventional register defined as a variety according to situation (Ferguson, 1975, 1982). More than IDS and CDS, the investigation of foreigner-directed speech (henceforth FDS) therefore provides us with information on the different contributions of the conceptualization of the communication partner compared with the influence of social and interactional aspects as well as with linguistic conventions. Speech directed towards robots that can process natural language (RDS), finally, differs from the other two ‘simplified registers’ considered in that it is implausible to assume that the linguistic properties observable constitute conventionalized ways of addressing the (artificial) communication partner, because people in general do not have much experience with automatic speech processing systems. Thus, the processes investigated here, partner modeling, alignment and the role of feedback, may be even more at work in RDS than in the previous two candidates for varieties, which have both been proposed to constitute conventional ways of speaking to the addressee in question. Furthermore, RDS allows us to study the conditioning factors, the issue of homogeneity, and the functions of linguistic variation, for
instance, in ways that are methodologically not available to the study of the other supposed registers. The difference lies in the fact that the properties (linguistic output, appearance, functionalities) of the artificial communication partner can be varied in controlled ways (Fischer, 2010b; Nass, 2010). Moreover, frequent miscommunication in human-robot interaction allows the identification of the speakers’ hypotheses about what makes their utterances more understandable to a robot; in particular, participants’ attempts at solving miscommunication shed light on their partner models, since the solutions they propose in situations of misunderstanding lend themselves to a qualitative analysis of their partner hypotheses. Studies on speech directed at artificial communication partners illustrate the heterogeneity of the variety and reveal both intra- and interpersonal variation in speech addressed to artificial communication partners (Zoltan-Ford, 1991; Amalberti et al., 1993; Johnstone et al., 1994; Pirker and Loderer, 1999; Fischer, 2003, 2006b), as well as the prominent role of the conceptualization of the artificial communication partner as human-like or tool-like (Fischer, 1999, 2000a, 2006b, 2011). The linguistic behavior observable exhibits measurable adaptations to the particular robot’s linguistic behavior, but the heterogeneity of the data elicited also points to a strong influence of various partner models, which can be related to differing stereotypes. Experimental results in human-robot interaction allow us to identify those aspects that influence the concepts speakers develop of artificial systems as communication partners and show in detail in which ways the robot’s output may shape the design of utterances by human users.
To sum up, the investigation of RDS allows us to address the questions raised in this study in a way that is not possible in naturally occurring interactions; its unnaturalness and the fact that it only occurs in laboratory settings anyway can be turned into a methodological advantage that allows the empirical investigation of free linguistic interactions in nonetheless highly controlled settings (Fischer, 2010b). To conclude, the three different speech situations under consideration shed light on different aspects of the impact of the addressee on linguistic choice, even though all three are suspected to be ‘simplified registers’. At the same time, the results of the comparison allow us to address the question whether the adaptations speakers make really simplify their utterances for their communication partners and whether speakers simplify their utterances for various different communication partners in similar ways:

– In which ways are the adaptations made in ‘simplified registers’ simpler?
– To what extent do the speakers’ adaptations facilitate understanding?
– Does simpler mean the same for all communication partners?

If simplification is not the only determining factor in so-called ‘simplified registers’, what other goals may play a role? Are all participants similar in the way they attend to their communication partners, or are they differently predisposed, or do they have different agendas that make them behave differently towards their communication partners? Thus, it needs to be addressed whether personal needs and desires play a role when orienting to a communication partner; in particular, we are going to ask:

– What role do social factors, such as social status, identity needs or the interpersonal relationship between the speakers, play?

A final question may concern whether the adjustments made in the three so-called simplified registers are not only similar, but also related to some extent. This will allow us to assess the prototype hypothesis (e.g. Hatch, 1983; DePaulo and Coleman, 1986; Ferguson, 2004; Prato-Previde et al., 2006), i.e. the idea that all simplified registers are derived in some way from CDS:

– Are the three suspected varieties investigated related, with CDS being the prototype of a ‘simplified register’?


These questions will guide the investigation of the suspected varieties selected and will be re-addressed in Chapter 7 Conclusions.

1.3  Overview of the volume

The next chapter, Chapter 2 The Role of the Addressee: Central Notions, describes the theoretical notions and concepts that have been developed in different disciplines to account for the phenomena under consideration. It summarizes what is currently known about addressee orientation from various disciplinary perspectives and thus sets the scene for the current investigation. Chapter 3 Methods and Data describes the methods and data used in this investigation. While child- and foreigner-directed speech have been investigated in some detail before, the literature on speech directed at robots is rather limited. So while the chapters on child- and foreigner-directed speech can rely on the extensive literature on these so-called registers, in the case of speech directed at artificial communication partners, empirical studies of robot-directed speech are necessary. Thus, for the study of robot-directed speech I present empirical investigations of data that I have elicited in the framework of the DFG-funded special research area Spatial Cognition at the University of Bremen in 2003–2006. These studies address the issues raised above by investigating how people address robots in controlled experimental scenarios. For the analysis of the human-robot interaction data, I use
a methodology that I have developed particularly for this purpose (Fischer, 2003, 2010b; Fischer et al., 2012) and which I describe in detail in Chapter 3 Methods and Data. The following three chapters address infant- and child-directed speech, foreigner-directed speech and robot-directed speech, respectively. For each of the three suspected varieties, I first determine the structural features and the amount of variation observable, and thus the degree of homogeneity of the data, as well as the parameters of the variation. In a second step, I discuss to what extent the choices speakers make are functional with respect to the aims of the respective speaker. In a third step, I investigate the relationship between partner modeling, alignment and feedback for each of the varieties under consideration. The last chapter compares the findings on the three different suspected varieties and provides some answers to the questions raised above in Sections 1.1 Open Issues and 1.2 ‘Simplified Registers’. Finally, the implications of the general approach to addressee orientation developed in the current investigation for our understanding of language in a cognitive and a social context will be discussed.

Chapter 2

The role of the addressee: Central notions

In this chapter, I bring together several theoretical concepts that have been developed in different disciplines to account for the role of the addressee in interaction. When studying the linguistic features that have been suggested to be influenced by addressee design as well as the processes and mechanisms that bring them about, it soon emerges that an interdisciplinary approach is necessary to obtain a full understanding of the phenomena under consideration. One reason is that the same phenomena are addressed in different disciplines, so that there is a considerable overlap of interest from various perspectives. For instance, sociolinguistics has described the influence of addressee orientation on the phonetics and phonology of utterances, but studies in social psychology have also produced results on the addressee-dependent prosodic delivery of, for instance, given versus new words, as well as on the role of the addressee in lexical disambiguation, definiteness and the choice of spatial perspective. Register theory concerns choices on all linguistic levels, for instance, phonetic/phonological, lexical and syntactic. Likewise, conversation analysis has shown effects of recipient design on various linguistic and interactional levels. The description of the effect of addressee orientation on linguistic choice thus has to include findings from a broad range of disciplines. A second reason to take an interdisciplinary approach is that different explanatory concepts have been developed in different disciplines. Together, these theoretical notions are more likely to create a comprehensive picture of addressee orientation in interaction. In the following, I briefly describe central concepts that have been proposed to account for the role of the addressee from different disciplinary perspectives.

2.1  Recipient design and membership categorization

The concepts of recipient design and membership categorization have been developed in the framework of ethnomethodological conversation analysis (CA). The notion of recipient design refers to “the multitude of respects in which the talk by a party in a conversation is constructed or designed in ways which display an orientation and sensitivity to the particular other(s) who are the co-participants” (Sacks
et al., 1974, p. 727). Thus, the term recipient design does not refer to the process or product of designing a recipient, i.e. of creating a model of him or her, but to the variety of practices oriented to the recipient. CA has provided a large body of work on domains in which speakers can be shown to design their utterances for particular addressees; for instance, Fox (2008) writes: “There is extensive evidence that the syntactic structure of a given utterance is constructed for the particular action under production, for this particular recipient at this particular moment in the interaction” (Fox, 2008, p. 264). She concludes that “the process of accommodating particular others is thus central to conversation” (Fox, 2008, p. 257). For reference to persons and places, Sacks and Schegloff (1979) identify two preferences in speakers’ design of utterances for the communication partner: minimality and recognizability. The first preference is to minimize the description, i.e. to be as brief as possible. The second preference is to use so-called recognitionals, i.e. terms by means of which the recipient can immediately identify the person referred to. That is, speakers can be demonstrated to tailor reference terms to their addressees’ needs such that these terms constitute descriptions that are telling enough to achieve recognition but are not more informative than that. Consequently, speakers take into account whether their addressees are likely to know the referent or not and by what information (cf. Schegloff, 1997). This is confirmed in a study by Betz (2015), who investigates recipient design in terms of assumed epistemic accessibility and its subsequent negotiation through reference formulation in talk about third parties; in particular, she looks at uses of the “definite article + proper name” pattern in German conversation with regard to the degree of accessibility of the referent for the communication partner.
The reference terms chosen may thus differ considerably for different addressees. In On Granularity, Schegloff (2000) furthermore shows that speakers adapt the level of detail to what their addressees may need for the current activity. Thus, besides adapting referential descriptions in terms of the assumed knowledge of the addressee, the addressee’s activity may also be attended to as relevant for choosing the appropriate linguistic expression. Sensitivity to a recipient has been demonstrated even for interactional features that have often been suspected to be beyond speakers’ control, such as repair initiation (Levelt, 1983). An example of the interactional function of repair is a correction such as the co- uh the officer in court. Jefferson (1974) shows that such a repair can fulfill numerous interactional functions, such as presenting the speaker as an ordinary guy who would normally use the colloquial term cop, but who, out of respect for the court, changes to the polite term officer. Thus, by initiating self-repair, the speaker not only conveys a message about him- or herself, but also demonstrates his or her attention to the recipient. Deppermann (2015) investigates instances in which utterances are not well designed for the respective recipient, i.e. counterfactual recipient design. He identifies
two possible causes for this: the speaker may presuppose private, non-shared information either for pedagogical reasons, to elicit a corrective instruction from the partner as part of the learning process, or he or she may fail to take the pupil’s perspective into account out of egocentricity and lack of ability. Using these analyses, the author shows that recipient design also projects intelligible next actions and has other-positioning effects. In this way, recipient design links to the participants’ interactional history as well as to projected future joint action. Besides linguistic decisions, such as the choice of appropriate expressions, and relatively local interactional processes, such as repair, more global structures in conversation may also be tailored to the addressee; for example, Schegloff (1980) argues that presequences, for instance, storytelling initiations, such as have you heard or guess what, are always tailored to the respective participants. Furthermore, Deppermann and Blühdorn (2013) describe how speakers use negation to cancel possible inferences the listener is expected to draw; thus, negated utterances may serve to block a particular listener’s possible assumptions from entering common ground. Similarly, Schmitt and Deppermann (2009) describe how a university lecturer chooses not only the argumentation strategy, but also the example used to illustrate it, as well as the linguistic expressions, for a particular addressee. Thus, all kinds of choices in discourse may be specifically tailored towards their addressees. Another example is constituted by place descriptions; Schegloff (1972a) finds that speakers frequently carry out a membership analysis in order to formulate a place description. For instance, before providing an answer to a question about a place, speakers may ask whether the recipient has already visited that place, or whether he or she is a member of a group of people who can be expected to know the place, e.g.
‘you’ve studied at Berkeley, haven’t you?’. In other words, speakers inquire into the addressee’s membership in order to formulate their utterances on the basis of knowledge expectable of members of particular groups. The membership analysis furthermore serves to identify the term that is possibly most preferred by the listener. Schegloff finds that people prefer members’ categories over more general descriptions (Schegloff, 1972a). Thus, although a place can be uniquely identified by a street name and a house and even an apartment number, if the recipient knows the name of the owner of that place, a description drawing on this ‘members’ knowledge’ is preferred.1

1. From a different perspective, Clark (1998b) reaches a similar conclusion; see Section 2.6.2 Common Ground as a Shared Bias.

The findings presented by Schegloff (1972a) also show that speakers actively engage in identifying which kinds of information are suitable for their addressees. These efforts include attempts at categorizing the addressee because the category, such as group membership, may allow conclusions regarding the addressee’s
preferences regarding the terms chosen. This leads to another concept proposed in CA work, namely membership categorization. While the study of recipient design focuses on the practices speakers employ when speaking for particular recipients, the study of membership categorization concerns the procedures by means of which categories are assigned. The notion of membership categorization was developed by Sacks (1972a) in a methodological article called An Initial Investigation of the Usability of Conversational Data for Sociology. Sacks argues that sociological analysis may not treat categories as a given, and proposes a method for identifying social order by analyzing the process by means of which categories are assigned. He shows in detail that categorization is methodical and therefore analyzable as an object for sociology. His empirical analysis of dozens of telephone interactions at a suicide prevention center, all listed in the appendix of the paper, reveals that participants, the speakers in these interactions, rely on particular methods to interactively construe the category ‘person-to-turn-to’. He finds that callers, if the question arises whether there is a person one might turn to for help, discuss their closest family members, such as parents, spouses, siblings or children. Sacks observes that categories allow for certain inferences or even search procedures – which can be understood quite literally here, since police reports, which Sacks draws on in addition to the telephone interactions, show that the police, too, when investigating a death, orient towards the same group of people to ascertain whether a person was suicidal. From his observations, Sacks develops the notion of membership categorization device, consisting not only of collections of categories but also of rules for their application.
For instance, he finds that speakers, once one type of category has been used, are likely to categorize other members with respect to the same collection of categories. For example, if one person is referred to on the basis of her profession, other people mentioned will also be referred to on the basis of their respective professions. This is called the rule of consistency, and, like all rules in CA, it describes observable behavior to which participants can be shown to orient such that violation is usually treated as accountable, i.e. in need of explanation (Schegloff, 1992; Heritage, 1988). The collections of categories plus their rules of application are thus concepts that are grounded in participants’ observable behavior (Sacks, 1984). Besides the rule of consistency, there is also a rule of economy, which basically states that in most cases a single referential term suffices even if many more categories could possibly be assigned. Membership categorization thus involves two resources: a collection of categories that could be applied and a set of rules stating when and how these category terms may be applied (Sacks, 1972a). Sacks (1972b) addresses categories of persons and categories of activities and discusses the way in which these categories co-determine each other. For instance, the choice of baby influences the choice of cried because crying is an expectable activity
for a baby, whereas other descriptions, which may be equally true, like the male cried or the baby made noise, are not equally expected. Thus, the categories form collections from which the speaker selects those that are particularly inference-rich. Yet, membership categorization is not taken to be a cognitive phenomenon but to be entirely socially and methodically construed in the contingencies of interaction and by the participants themselves, as we have seen for the ‘person-to-turn-to’ category. That is, in a membership categorization analysis, the conversation analyst has to show that participants attend to the categories proposed, i.e. that the concepts worked with are participant categories. In the UK, Sacks’ notions have been taken up and developed into an approach called Membership Categorization Analysis, which is applied to all kinds of data, from conversation to newspaper articles. Hester and Eglin (1997a), for instance, argue that Sacks’ work can give rise to both contextualized and decontextualized forms of membership categorization analyses. The decontextualized approach is taken, they argue, in anthropology, where scholars work with semantic categories that are established around certain dimensions, such as color or kinship terms. They contrast this approach with a contextualized approach, which regards categories in the contingencies of talk-in-interaction. An example from Hester and Eglin (1997b) illustrates what they understand by ‘contextualization’; they address one turn from a discussion between a teacher and other (unspecified) participants in which a teacher makes the following statement about a pupil:

(1) TI: Now, the other side which I see of him is that (0.7) he’s a thief (0.7) you know, hgh he’ll pick up anything (-) y’know errgh (1.3)

Hester and Eglin (1997b) analyze the turn as follows:

In this extract the teacher categorizes the referral as ‘a thief’ and as one who will ‘pick up anything’. Our question is what is meant by such categorizations. Does the category ‘thief’ refer to bank robbery, jewel theft, or industrial espionage? Is the teacher talking about fraud, picking pockets or white collar crime? Does ‘picking up anything’ refer to picking up litter, boys or girls? We take it that none of these ‘understandings’ seem appropriate…. The contextual resources used in making sense of this categorization include the category membership of the subject (referred pupil), the setting (referral meeting), the category membership of the participants (teacher and psychologist) and the immediate context of the co-selected descriptions. (Hester and Eglin, 1997b, p. 29, emphasis original)

The authors take these different aspects to constitute a category-in-context (Hester and Eglin, 1997b, p. 30). While this analysis may be very plausible and reflect the commonsense understanding of the respective turn, the authors do not acknowledge that it is their own commonsense knowledge that is applied to the analysis,
rather than the result of an analysis of the participants’ understanding of the turn, or a fact that sociological analysis has to explain, as Sacks (1972a) or Schegloff (2007) put it. How this type of Membership Categorization Analysis is used for identifying and explicating one’s own commonsense knowledge is also illustrated in Lepper (2000), which provides an introduction to the method and in which exercise questions after a few pages ask the reader, for instance, to “identify the standardized relationship pairs which are evoked in the text” or “what locus of rights and obligations is evoked?” (Lepper, 2000, p. 18). While in this method the answers to these questions are provided by the analyst, Sacks (1972a) devotes pages of painstaking analysis to identifying standardized relationship pairs and the rights and obligations connected to them as participant categories in the data. Both the notion of category and the notion of context consequently differ in this approach from the CA assumption that researchers may only work with members’ categories, i.e. categories that are oriented to by the participants themselves (Sacks, 1972a,b; Schegloff, 1997, 2007). An important consequence of the CA position to concentrate on participant categories is that none of the strategies are applied ‘off the shelf’, but that they are continuously adjusted to the contingencies of the ongoing interaction. This concerns, on the one hand, the types of categories evoked; on the other hand, the recipient is not a passive listener, but contributes actively to the way he or she is categorized. For instance, as described above, Schegloff (1972a) shows how spatial descriptions are designed for their recipients on the basis of questions to which the recipient provides the answer. Thus, although speakers can be demonstrated to pay attention to recipient design, the recipient may provide the information he or she considers relevant him- or herself.
Feedback signals are one of the means listeners have at their disposal to provide the speaker with information relevant for tailoring utterances for them (cf. Gardner, 2001). Furthermore, listeners may provide clarifying questions, interruptions, or candidate answers. For instance, Monteiro and Keating (2009) describe the socialization process of researchers in an interdisciplinary research team. By means of strategically placed clarification questions and interruptions, the interdisciplinary audience “teach” each other the appropriate level of detail necessary so that presentations be both informative and general enough to be understandable for the listeners from various different disciplines. In this sense, recipient design, i.e. the multitude of respects with which talk is oriented to a particular recipient, has to be viewed as a highly interactional phenomenon. This dynamic aspect of recipient design has also been the focus of a study by Schmitt and Deppermann (2009) who argue that people do not only adjust to their partners based on socio-categorical knowledge about the addressee and institutional roles that hold between the participants, but also based on local interactional solutions to interactional problems, which result in a dynamic, increasingly more
specific categorization of the communication partner, until it finally stabilizes. The authors also suggest distinguishing between a cognitive side of addressee orientation, in which cognitive representations of the communication partner are evoked in order to tailor the following activity to the respective recipient, and a practical side of recipient design, in which the addressee is positioned by the speaker as a particular kind of recipient (Schmitt and Deppermann, 2009, p. 106). Similarly, Schmitt and Knöbl (2013) argue that speakers’ actions are always realized in continuous interpretative consideration of the concurrent behavior of the partner. The speaker’s recipient design, what Schmitt and Deppermann (2009) have called the positioning of the partner, is made intersubjectively available to the co-participants, just as any other interpretation is displayed to the participants as rejectable or ratifiable. The authors attempt to show that recipient design should be studied from a multimodal perspective since, besides linguistic adjustments, eye gaze, body orientation and the ongoing activity also contribute to the positioning of the respective recipient.
To sum up, work in CA has shown that speakers attend to addressees’ knowledge (as apparent from their practices when trying to achieve recognition), membership (as apparent from speakers’ inferences with respect to spatial acquaintance, for instance), social status (as apparent from the interactive function of repairs), face needs (as apparent from studies on preliminaries), but also their concomitant displays in interaction. The aspects attended to are both specific and category-dependent; that is, in designing talk, speakers rely on knowledge about the particular communication partner as well as on inferences about that partner’s knowledge, familiarity and current needs due to his or her membership in particular categories.

2.2  Participation structure

Another central notion concerns the different roles participants may play in conversation. Goffman (1975) introduces the notion of participation structure to account for the fact that not all participants have equal status. In particular, he distinguishes between overhearers and ratified participants; among the latter are addressees and side participants, whereas overhearers can be divided into bystanders, who have access to what the speaker is saying and whose presence is recognized, and eavesdroppers, whose presence is not fully recognized (Goffman, 1975, p. 260). Based on Goffman’s categories, Clark et al. (1992) suggest a further distinction, namely that speakers may be their own listeners, i.e. they may be monitoring their own speech; in addition, Clark and Schreuder develop the notion of audience design based on participation structure such that different participant roles require different kinds of attention from the speaker (see Section 2.3 Audience Design and
Style below). Participation status is defined relative to the participation framework (Goffman, 1979, p. 11), embedding “interaction arrangements” (Goffman, 1979, p. 23). For instance, participation status may determine participants’ right to take the turn, and speakers may be found to make linguistic choices so as to indicate participation status to their interlocutors, allocating the next turn to particular co-participants while simultaneously making others overhearers (Clark et al., 1992, p. 250). Participation status therefore has consequences for audience design: speakers design their utterances so that their co-participants can identify who is being addressed and who is supposed to speak next, assigning the roles of speaker, addressee and overhearer for a limited amount of time (Clark et al., 1992, p. 250). Regarding the assignment of speaking roles between participants, so-called turn allocation, Sacks et al. (1974) suggest several, mostly linguistic, techniques that speakers may use to indicate who is to speak next. One of them is the use of a tag question, such as you know? or don’t you agree?, which construes a turn as the first pair-part of an adjacency pair (Sacks et al., 1974, p. 718). In addition, such techniques may comprise physical arrangements, conversational history, manner of speaking and linguistic content (Clark and Carlson, 1992, p. 222). Levinson (2006) furthermore suggests that participation structures differ crucially concerning the role participants play with respect to utterance design. Whereas speech designed for an addressee implies ‘I bet you can figure out why I did this, just by knowing that I know you can’, thus presupposing Gricean intention recognition and mutual salience for speakers and listeners at the moment of speaking, speech is not designed for other participants in this way.
Levinson proposes that participation structure is reflected in the difference between intention attribution and intention interpretation, where only the first is based on recipient design such that listeners can assume that the speaker has constructed his or her actions to be interpretable for the intended participant(s). Along similar lines, Branigan et al. (2007) suggest that addressees, in contrast to side participants, may also be more inclined to attempt to achieve fuller understanding (Branigan et al., 2007, p. 167). Another aspect of participation structure is that different activities are often associated with different participant roles. Goffman (1979) therefore introduces the notion of footing to describe the orientation of speakers towards different activities within the same interaction space. Examples of such shifts in footing have been studied especially in the context of interactional sociolinguistics; for instance, Tannen and Wallat (1986) investigate interactions between doctors, parents and children and show how, by means of subtle linguistic changes, doctors switch elegantly between different addressees and, at the same time, between different activities, such as examining, informing or entertaining. Hutchby (1995) illustrates the influence of participation structure on advice-giving on call-in radio; he shows that advice-giving
talk is construed to be simultaneously relevant both to the specific recipient, the caller, and to the non-specific public overhearers, the audience. Both the advice-giving expert and the radio host exhibit particular linguistic modifications and generalizing strategies in order to involve both types of recipients. Further such studies concern, for example, family or professional interactions around pets (Tannen, 2004; Roberts, 2004), in which prosodic and other linguistic cues indicate the particular recipient while utterances may also be designed for the overhearers. To sum up, participation structure constitutes an important factor for speakers’ linguistic choices on various linguistic levels; on the one hand, there are systematic differences in adjustments to participants in different participant roles (e.g. Schober and Clark, 1989; Clark, 1992; Levinson, 2006); on the other, linguistic choices can be used to establish participation structure, i.e. to identify the person addressed and to switch between activities connected with different participant roles (e.g. Tannen and Wallat, 1986; Tannen, 2004).


2.3  Audience design and style

Audience design is a term used in different disciplines to refer to slightly different aspects of addressee orientation. In psychology, the concept of audience design was developed by Herbert H. Clark and collaborators (Clark and Marshall, 1981; Clark, 1992) in order to describe how different participation structures influence language use. The notion builds on the finding that not all participants in conversation are oriented to in the same way. Thus, addressees play very different roles than other ‘hearers’, such as overhearers or eavesdroppers, mainly due to their ability to participate in dialog by means of feedback, and several studies by Clark and collaborators (Clark and Carlson, 1992; Schober and Clark, 1989; Clark and Schaefer, 1992) show that speakers hold different kinds of responsibilities towards addressees on the one hand and overhearers on the other. While speakers are generally bound to be understandable, collaborative and polite towards their addressees, their responsibilities towards overhearers are very different. For instance, they may be indifferent with respect to their overhearers’ understanding, or they may try to conceal information from them while still informing the addressee, for example, when parents use spelling to talk about things their child is not supposed to hear. At the same time, addressees, not overhearers, may provide the speaker with information about their specific informational needs that may help speakers adapt their utterances optimally to what their hearers really need to know; for instance, Clark et al. (1992) hold that “speakers assign listeners to the roles of addressee, side participants and overhearer by the way they engineer their utterances in the current situation” (Clark et al., 1992, p. 250). These roles are assigned for a limited
amount of time only (Clark et al., 1992, p. 251). Even though audience design differs for addressees and overhearers, speakers have to take all of their listeners into account when designing their utterances, addressees as much as side participants and potential overhearers, in spite of the generally differing responsibilities of the different participants, for instance, when public side participants are present in radio talk shows, on TV or in courtrooms (Clark and Schaefer, 1992, p. 273). Furthermore, “[p]articipants are also distinguished from overhearers by the history of the ongoing interaction. If certain hearers were participating during the last utterance, (…) they can assume that they are also participants for the current utterance” (Clark and Carlson, 1992, p. 223). The assignment of speaking roles may thus not necessarily take place on an individual basis, but may rather be based on judgments of listeners as holders of particular roles (Clark et al., 1992, p. 249).
The term audience design is also used in sociolinguistics, where it was introduced by Alan Bell in 1984. Here, however, the term is not related to differences in participation structure, but concerns mainly social differences between groups of addressees. Bell (1984) observed that the same speakers on the radio spoke differently depending on the radio station they were working for and on the assumed audience of the particular radio program. While sociolinguistics up to that point had focused mainly on the effects of characteristics of the speaker on the speech produced, Bell suggests that speakers may also switch between styles depending on their listeners. Bell (1984), who focuses on phonetic and prosodic aspects of language use, therefore proposes audience design to be a variable as important as external variables describing the speaker, such as gender, age, or social class.
He suggests that the person addressed may influence the speaker’s linguistic behavior consistently and to an even greater extent than the variable social class. Inspired by criticism such as that by Ladegaard (1995), who shows for a group of teenagers at school that they are more likely to design their utterances for their overhearers, their classmates, than for their addressee, the teacher, Bell (2001) more recently proposes that speakers also attend to referee design. That is, although audience design for a particular addressee is still taken to be the most important mechanism, speakers may orient to overhearers who are for some reason more important to the speaker than the respective addressee. Similar findings for discourse practices have been made by Heritage (1985), who identifies special ways in which talk-show guests are oriented to by talk-show hosts in public broadcasts, ways which are designed more for the overhearers than for the direct communication partner and which would even be unacceptable in ordinary conversation. A recent study of referee design is Hitzler (2013), who investigates interactions between a teenager or his or her parent, a social worker, and the director of an institution in which the teenager has been placed. In these interactions, it is the director’s job to sum up the relevant events of the reporting period, which are mostly
known to the teenager or parent talked about, yet new to the social worker, who is the main decision maker in the process. The director solves this problem by navigating between talking to and talking about the teenager or parent, inviting him or her to collaborate in the narration and thus to support her recollection of the events, whereas the actual recipient is the social worker. Thus, Hitzler (2013) distinguishes between addressee orientation and recipient design. This distinction corresponds largely to the notion of audience versus referee design (Bell, 2001; Ladegaard, 1995). In a study of police interviews, Haworth (2013) furthermore illustrates how a mismatch in assumed audiences can lead to miscommunication and to potentially detrimental effects for the interviewee. In particular, while interviewees design their utterances for the police officers present, the officers also orient to the potential audience in court, where the video recording of the interview may function as evidence for or against the interviewee. The combination of addressee and referee design, which is not even transparent to the interviewee, may thus have both local interactional and judicial consequences.
The notion of referee design can also account for the fact that speakers use stylistic variation to define their identity (cf. Labov, 1966). The orientation towards overt and covert prestige (Trudgill, 1972) can similarly be understood as accommodation to prestigious cultural norms and as referee design to an assumed macho culture, respectively (Yaeger-Dror, 2001, p. 172). The notion of style has therefore been suggested to account for “the speaker’s way of navigating with respect to the social” (Rickford and Eckert, 2001, p. 6).
Stylistic variation is consequently seen as a way of constructing social categories and identities, accounting also for “the interaction among variants of a single variable, and of the situated use of variation” (Rickford and Eckert, 2001, p. 5). Rickford and McNair-Knox (1994) investigate the interaction of different speaking styles empirically on the basis of interviews with Foxy, an African-American teenager. They compare the quantitative distribution of several grammatical African-American Vernacular English (AAVE) features, such as absence of possessive -s, absence of plural -s, absence of third person singular present tense -s, absence of copula, and invariant habitual be (Rickford and McNair-Knox, 1994, p. 247). In these interviews, Foxy was interviewed by people from different ethnic backgrounds and by same-age and different-age interviewers. Moreover, interviews were carried out by the same interviewer several years apart, so that Foxy’s linguistic behavior over time could also be investigated. Finally, the range of topics addressed was held comparable across the interviews, so that the data also allow the analysis of topic as a possible factor for differences in audience design. The results show that Foxy used many more AAVE features when talking to familiar addressees of her own ethnic background than to an interviewer from a different ethnic background, even though the latter interviewer based her interview
directly on the previous interview, establishing continuity between her interview and the one carried out by the researcher who shares Foxy’s ethnic and socio-cultural background. The results thus demonstrate a strong effect of the addressee as a sociolinguistic variable, even though the data do not allow the conclusion as to whether familiarity or shared cultural background determined the interviewee’s linguistic choices (Rickford and McNair-Knox, 1994, p. 258). At the same time, the authors ascertain that the results are not due to accommodation, since Foxy’s interviewers use far fewer AAVE features than she does. Furthermore, Rickford and McNair-Knox (1994) show that Foxy’s linguistic choices change over time independently of her addressees, the interviewers (who remained the same). They relate these changes to changes in Foxy’s attitude towards education and her aims in life (Rickford and McNair-Knox, 1994, p. 264). Similarly, topic seems to play a role, since different amounts of AAVE features can be observed in Foxy’s speech depending on the topic, but the data do not allow any conclusions as to whether the style shifts are due to referee design, that is, due to talking as if addressing the audiences associated with that topic, or are rather simply specific to that topic (Rickford and McNair-Knox, 1994, p. 261). To sum up, this sociolinguistic case study reveals that audience design indeed plays a crucial role as a sociolinguistic variable, but also that it may be intertwined with many further influencing factors, such as the speaker’s attitude towards education, the topic, the familiarity with the addressee, etc. The notion of style may be flexible enough to account for the interplay between such different factors that determine audience design (Rickford and Eckert, 2001).
Applied disciplines have also produced numerous methods and findings related to audience design. For instance, in journalism, media theory and document design, the notion plays a crucial role.
In document design, for example, it refers to the process of accounting for the prospective reader of the document to be created. In order to adapt to the readers’ needs and processing capabilities and to appeal to prospective readers so that they may start reading in the first place, document designers create implicit or explicit models of the target audience. These models concern, for example, biological and psychological processing constraints, expected user preferences, suspected user capabilities, such as lexical knowledge and linguistic capabilities, or suspected interests (Schriver, 1997). For their audience design, document designers may, for instance, draw on findings from gestalt psychology and neurobiology, on resources particularly created for that purpose, or on feedback from potential readers elicited using participatory methods. Schriver (1997) describes three different methods to create such models of the intended reader: classification-based, intuitive, and feedback-driven. In the classification-based approach, the intended audience is categorized in ways that allow the use of resources, such as lists of vocabulary appropriate for a given audience or pre-established
correlations between, for instance, socio-economic status and expected norms and values, which provide guidelines for particular design choices. The intuitive approach relies instead on the document designer’s ability to imagine being a member of the intended audience. The feedback-driven method, in contrast, is empirical in that it rests on studies involving real readers, who, by means of methods such as read-aloud protocols or focus groups, provide feedback on the effects of a given document and thus provide evidence about the reading process of real members of the intended audience. To sum up, the term audience design has been used in various disciplines for different purposes; however, the results obtained in all of them point to the important role of numerous different aspects of the addressee that speakers or writers may take into account on various linguistic levels, such as phonetics and phonology, prosody, lexical and grammatical choice, but also persuasive strategies and visual presentation. These adjustments occur with respect to participation structures, i.e. the different roles participants may play in different constellations, and with respect to groups of communication partners (as in Bell’s radio broadcasts) or individual ones (as in the study of Foxy’s speech).
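The classification-based approach described above can be given a minimal computational sketch: the intended audience is categorized, and the category is mapped onto pre-established design guidelines via a lookup resource. The following toy fragment is an illustration only; the audience categories, guideline parameters and their values are invented for the example and are not taken from Schriver (1997).

```python
# Toy sketch of a classification-based audience model: the audience
# category indexes a resource of pre-established design guidelines.
# All categories and values below are hypothetical.
AUDIENCE_RESOURCES = {
    "children":    {"max_sentence_words": 10, "vocabulary": "basic"},
    "lay_adults":  {"max_sentence_words": 20, "vocabulary": "everyday"},
    "specialists": {"max_sentence_words": 35, "vocabulary": "technical"},
}

def design_guidelines(audience_category):
    """Return the design choices licensed by the audience classification."""
    try:
        return AUDIENCE_RESOURCES[audience_category]
    except KeyError:
        raise ValueError(f"no resource for audience: {audience_category}")

print(design_guidelines("lay_adults")["vocabulary"])  # prints "everyday"
```

A real document-design workflow would, of course, derive such a resource from empirical readability data or from the feedback-driven methods described above, rather than hard-coding it.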


2.4  Accommodation

The relationship between identity management and the maintenance of relationships is a central concern of communication accommodation theory (CAT) (e.g. Coupland, 2001; Giles and Coupland, 1991; Coupland et al., 1988); for historical overviews, see Thimm (1998b) or Giles et al. (2006). In contrast to Bell’s audience design model, which assumes the categorization of the addressee to be static, based on rather stable characteristics defined by, for instance, socio-economic class membership, communication accommodation theory models the process by which the speaker may or may not adapt to particular linguistic features of the communication partner’s utterances, based on a set of possibly conflicting goals concerning identity and social distance. In particular, speakers may over- and underaccommodate to their addressees depending on several possible interactional goals. That is, speakers have the choice to align with their partners and to accommodate the linguistic properties of their utterances to the suspected needs of their partners, or not, depending on their goals with respect to identity and relationship management. At the heart of CAT lies the idea that communication is not only a tool to exchange information, but is crucial for negotiating category memberships. In this connection, CAT aims at predicting and explaining “the adjustments people make to create, maintain, or decrease social distance in interaction” (Giles et al., 2006, p. 3).


The adjustments to the communication partner, in their view, comprise linguistic (for instance, accent), paralinguistic (such as speech rate, pausing, utterance length) and non-verbal behavior (smiling, gazing) (Giles et al., 2006, p. 5); in addition to these, however, researchers in this paradigm also use the term accommodation more loosely to embrace all kinds of behaviors. In Giles et al. (2006), for instance, the authors report a questionnaire study in which participants were asked to reply to the question: “In general, how accommodating are police officers? (i.e. how well do you think they listen to people, take their views into account, and want to understand their needs and unique situation?)” (Giles et al., 2006, Table 1). The definition of accommodation implied in this quote therefore goes far beyond the linguistic, paralinguistic and non-verbal behaviors described above, now comprising “an array of tactics” (Giles et al., 2006, p. 134). CAT furthermore assumes that every interaction is embedded in a context of previous interactions, either of the same interactants or of ‘significant others’. Similarly, the authors stress that people accommodate to where they believe their partners to be communicatively, which may be different from where they really are (Giles et al., 2006, p. 6); that is, they may orient towards a stereotype, and stereotypes may prevent people from adjusting to their communication partners’ real competences. This concerns especially speech to the elderly (Coupland et al., 1988) and speech to foreigners (Zuengler, 1991). Furthermore, the amount of accommodation also depends on the communication partner’s suspected social status (e.g. Bourhis, 1983).2 CAT uses speaker motives to explain the observable behaviors. For instance, a motive for convergence, i.e. the adjustment to the partner’s utterances, is taken to be the wish for social approval. The authors rely mostly on similarity attraction theory and on social identity theory (Giles et al., 2006, p.
142) to explain adjustments to the communication partner. They furthermore relate their reasoning to positive and negative face needs, which they describe as communicative goals (Giles et al., 2006, p. 140). To summarize, CAT provides a comprehensive framework in which issues of social identity, group membership, face needs and other personal goals are combined to explain accommodative behavior in interactions between individuals and social groups.
Psycholinguistic experimentation supports the suggestion that speakers do not accommodate automatically to linguistic features produced by their partners; in particular, Kraljic et al. (2008) find accommodation to a speaker’s pronunciation if it was a characteristic feature of the speaker, but not if it was incidental, for instance, because the speaker had a pen in her mouth. Thus, speakers do not simply mindlessly pick up any features of pronunciation that they hear, but select features on the basis of their source. To sum up, accommodation has been introduced as a concept to describe convergence and divergence phenomena between participants in dialog. The spectrum of linguistic features affected ranges from phonetic and phonological via lexical and syntactic choices to content and strategy selection in general. The factors suspected to influence the amount of adjustment to the recipient comprise social and communicative goals and identity needs, as well as speakers’ judgements of the nature or origin of the linguistic features under consideration.

2. The orientation to stereotypes may, however, also constitute a strategic choice, as a study by Jörissen (2011) shows. In an investigation of a teacher’s response to a student’s clarification question in a math class, the author demonstrates how an orientation to the stereotypical student saves the particular student’s face by presenting the problem raised as a general problem of the whole class. Schick (2014) proposes similar functions for a dance teacher’s practice of insisting that students use the phrase ‘some people’ when criticizing each other.


2.5  Register

The notion of register was developed in order to account for situation-specific language use. Register theory was guided by observations like Firth’s: “Effective action and good manners require appropriateness of language in situational context” (Firth, 1968, p. 207). As Ferguson (1994) outlines, register analysis relies on the working assumption that a “communication situation that recurs regularly in a society (in terms of participants, setting, communicative functions, and so forth) will tend over time to develop identifying markers of language structure and language use, different from the language of other communicative situations” (Ferguson, 1994, p. 20). Register theory thus assumes that there is an association between the situation, be it externally defined or socially or cognitively construed, and language use: “A variety of a language is a system of linguistic expression whose use is governed by situational factors” (Crystal, 2001, p. 6). For early approaches in register theory, the conventional nature of this link was essential, because it means that registers need to be taught, for instance, to foreign language learners (e.g. Halliday et al., 1964); linguistic choice is thus not determined by the current activity, by the state of affairs being talked about or by the requirements of the situation, but by the convention that a certain kind of language is appropriate in a certain situation. From this perspective, the relationship with the functional requirements of the situation is regarded as secondary: “Linguistic features of registers can sometimes be seen to have language-external causes, (…) but otherwise they must be accepted as being in the same arbitrary type of relation to the situational features they correlate with as, in general, linguistic items are to the situational items they ‘mean’” (Ellis and Ure, 1969, p. 251–259).
Crystal’s quote below illustrates a more recent approach to situationally defined varieties which sees situational appropriateness as a matter of conventional linguistic behavior:


Varieties are, in principle, systematic and predictable. It is possible to say, with some degree of certainty in a given language, how people from a particular region will speak, how learners will write, or how television commentators will present a type of sport. Notions such as ‘British English’ or ‘Liverpool English’, ‘legal French’ and ‘sports commentary’ are the result. To change an important element in any situation is to motivate change in the language people use there, if they wish to behave conventionally – whether the change is from one region to another, from law court to the street, from home to pub, from one listener to many, or from face-to-face to distant conversation. (Crystal, 2001, p. 7)


Other scholars, however, do not rely solely on conventional relationships between linguistic form and situation; Biber (1994), for instance, distinguishes between register markers, which are usually conventional, and relative distributions (and often co-occurrence relations) of core linguistic features that are characteristic of a particular register and which are taken to be functionally motivated (Biber, 1994, p. 36). Other scholars, too, hold that registers are characterized by certain probabilities of occurrence of particular linguistic features. Approaches differ, however, in which aspects of situations they assume linguistic choice to be correlated with; Systemic Functional Grammar, for instance, assumes that the probabilistic relationship holds between lexico-grammar and situation types, which in turn represent more or less stable configurations of meaning. Thus, Halliday and Matthiessen (2004) define a register as

a functional variety of language (Halliday, 1978) – the patterns of instantiation of the overall system associated with a given type of context (a situation type). These patterns of instantiation show up quantitatively as adjustments in the systemic probabilities of language; a register can be represented as a particular setting of systemic probabilities. (Halliday and Matthiessen, 2004, p. 27–28)

Halliday (1978) suggests three semantic configurations by means of which situational contexts can be meaningfully interpreted: field, mode and tenor. Field concerns the content, task or activity relevant in a particular situation; mode concerns the modality in which the text is delivered, such as written versus spoken, computer-mediated, mediated by telephone, or face-to-face interaction. Tenor concerns the interpersonal relationship between participants, such as status, role and relationship, as well as certain properties of the communication partner, such as age and gender. The context of situation in this view is thus a social construct interpreted along the three dimensions field, mode and tenor, to which the three linguistic metafunctions, the ideational, the interpersonal and the textual function, correspond. The different configurations of features on these dimensions account for the functional choices speakers make, which are evidenced by particular distributions of linguistic features in text. For instance, a situation of child-parent interaction can be characterized by

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.

Chapter 2.  The role of the addressee: Central notions

the field, which is child play involving the manipulation of objects, the recounting of past events and similar absent objects, and the introduction of imaginary objects into the play; the tenor, which is characterized by the participants, child and parent, but also by the fact that the child is determining the course of action, announcing its own intentions and controlling actions of the parent, as well as sharing and seeking corroboration of its own experience by verbal interaction with the parent; and by the mode, which is spoken and both monological and dialogical, as well as task-oriented (Halliday, 1978, p. 30). In this version of register theory, register is “a configuration of meanings that are typically associated with a particular situational configuration of field, mode, and tenor. But since it is a configuration of meanings, a register must also, of course, include the expressions, the lexico-grammatical and phonological features, that typically accompany or realise these meanings” (Halliday, 1978, p. 38–39). That is, “each contextual parameter has the potential of activating choices from specific areas of meaning and wording” (Hasan, 2001, p. 7). In this view, language use is understood as genuinely situated; participants in discourse are expected to share a socio-cultural and semiotic background, while the situation suggests a certain subset of the lexico-grammar of the language to be more probable than another (Hasan, 1973, p. 273). The relationship between context features and features of lexico-grammar is understood to be causal: The “factors said to correlate with register variation stand in a causal relation to the formal patterns which are characteristic of a particular register” (Hasan, 1973, p. 281). Nevertheless, the relationship between context and linguistic choice is not deterministic, but probabilistic (Halliday, 1985; Halliday and Matthiessen, 2004; Hasan, 1973, p. 273).
Biber and Conrad (2001) also argue for a probabilistic relationship between language and situation; yet, instead of pairing particular linguistic features with a taxonomy of certain types of situations that are characterized by configurations of field, mode and tenor, they suggest a multidimensional approach in which corpora are clustered on the basis of the quantitative distribution of their linguistic features, which are taken to correspond functionally to the affordances of the situation. In addition to establishing such a probabilistic correspondence, Biber (1994) argues that an approach to register analysis has to provide a systematic description of both the linguistic features involved and the situational factors determining the register. These descriptions should not contain open-class parameters, since these render a systematic comparison of registers impossible. For instance, he rejects Halliday’s field, mode and tenor because they do not form closed sets of parameters (Biber, 1994, p. 39) and thus do not allow systematic comparison. Biber and Finegan (see also Biber, 1994, p. 36) therefore criticize that register analyses are usually “atheoretical, lacking a paradigmatic, or even dominant, methodological and theoretical framework” (Biber and Finegan, 1994, p. 6). Biber’s own proposal to structure the


Designing Speech for a Recipient

set of possible situations comprises the dimensions communicative characteristics of participants, relations between addressor and addressee, setting, channel, relations of participants to the text, purposes, intents and goals, and topic/subject (Biber, 1994, p. 40–41), some of which he admits are not yet fully worked out, and indeed it is difficult to see how, for instance, topic could ever be spelled out as a closed-class set. Hervey (1992) therefore argues that register features due to the field, i.e. the topic of the respective speech event, are to be seen as symptoms of the topic and thus as involuntary and not revealing, in contrast to strategic features, where the linguistic choices made reveal information about the respective speaker. Biber and Conrad (2009) suggest that register, genre and style constitute three different perspectives on varieties, not different varieties. Here, the register perspective is one that concentrates on characteristic situational distributions of those features that are functionally motivated, whereas the conventional structures of the whole text would be investigated from a genre perspective; the style perspective, finally, concerns individual or aesthetic preferences (Biber and Conrad, 2009, p. 2). From this perspective, register would thus be characterized by a functional relationship between language and situational context (Biber and Conrad, 2009, p. 6). This definition is somewhat at odds with those by Crystal (2001) or Ferguson (2004), for instance, where conventionality plays an important role, as well as with the one used in Systemic Functional Grammar, where it is left open whether the relationship is functional or conventional (Halliday and Matthiessen, 2004). Nevertheless, situational characteristics are taken as more basic than the linguistic properties (Biber and Conrad, 2009, p. 9), and the authors present a framework for situational analysis that describes participants, relationships between participants, the channel used, production circumstances, the setting, communicative purposes and topic in similar ways as in Biber (1994), thus not allowing an exhaustive situational classification either, since not all situational features will be relevant for situational comparison anyway (Biber and Conrad, 2009, p. 39–40). Besides the fact that up to now there is no complete situation typology, critics of register analysis have argued that situations may not be objectively given, but either interactively (e.g. Gumperz, 1982; Schegloff, 1997) or cognitively (e.g. Lakoff, 1987; Fischer, 2000b; Langacker, 2008) construed. In this case, the linguistic choices speakers make contribute to defining what the situation consists in. That is, as much as language may be determined by situation, situation may be construed from language use, for instance:

Rather than being determined by the social context in which it occurs, the chosen forms of speech themselves may define speech event or social relationship. (Gal, 1987, p. 291)

If, however, situation cannot be objectively determined from the outside, the notion of situationally determined language use becomes problematic. Moreover, if speakers may understand the same situation differently, they may make different linguistic choices, which leads to heterogeneous linguistic behavior and thus not to distinct, identifiable varieties. In Systemic Functional Grammar, scholars thus assume that situational parameters just activate potential configurations of linguistic choice (e.g. Hasan, 2001, p. 7), and that situations are semantic, and therefore inherently social (i.e. not cognitive), configurations. The relationship is understood as mutual (Halliday, 1978, p. 38): The situation allows inferences on the probability of occurrence of certain linguistic features, and linguistic realizations allow inferences regarding the situation of use of the expression under consideration. More recently, the notion of genre has been developed to account for situation-dependent language use; here, lexico-grammar is suggested to correlate with social activity, instead of with situation types or parameters (cf. Martin, 1992; Bateman et al., 2007; Bateman, 2008). The notion of register in this view serves to describe the socio-cultural and contextual situatedness of discourse, rather than to enable the identification of more or less stable varieties. The term genre is used slightly differently by conversation analysts and interactional linguists, who rather regard genres as ‘sedimented patterns’ that are attended to by the participants themselves, i.e. that constitute members’ categories (cf. Günthner and König, 2016). In all of these approaches to register, the variety, and not the individual speaker, is in focus.
That is, the notion of register does not set out to explain how a speaker chooses a particular linguistic feature in a given situation, but rather focuses on the identifiability and even the predictability of regularities in the linguistic behaviors of groups of speakers. The focus in general is on a particular, sedimented, i.e. conventionalized, way of speaking, rather than on individual solutions to perceived communicative tasks.

2.6  Common ground

Another important notion concerning addressee orientation is the interlocutors’ common ground. The concept of common ground has roots in several disciplines, such as pragmatics, psychology, cognitive science, formal semantics, and dialog modeling, among others. It is crucial to describing addressee orientation since it concerns what is assumed to be shared between speaker and addressee and thus involves speakers’ hypotheses about their partners’ knowledge, perception and attention. Central to our investigation is therefore what common ground is taken to consist in and how it is suggested to be made use of by the participants.


Unfortunately, there is no general agreement with respect to what types of information common ground contains (Schober and Brennan, 2003, p. 144). From the perspective of addressee orientation, however, we can distinguish several different perspectives on common ground, which correspond in part to the orientations in different disciplines and which suggest different views on the kinds of information modeled of the communication partner.


2.6.1  Common ground as a set of propositions

In a position held especially in formal semantics, pragmatics and the philosophy of language, common ground consists of sets of propositions that are assumed to be shared with the coparticipants. These sets are augmented, and possibly also updated, by what is said. These propositions concern not only the information conveyed in the dialog, but also background knowledge and aspects of the current, perceptually available situation. This knowledge serves as background for reasoning and for retrieving speaker-intended meaning and other types of implicit meaning, such as the meanings of indexical expressions, implicatures and indirect speech acts. Within this perspective on common ground as shared knowledge, approaches differ (a) concerning the status ascribed to the propositions constituting the common ground, and (b) regarding their understanding of mutuality, for instance, whether they assume participants to hold full representations of the other’s representation of one’s own knowledge (cf. Clark, 1996, for discussion). Thus, the approaches may differ in the extent to which the knowledge of the other includes propositions about one’s own knowledge, leading to reflexive or iterative representations, which have been criticized as psychologically implausible (e.g. Clark and Marshall, 1981; Sperber and Wilson, 1986). Furthermore, approaches that consider representations of common ground to consist of sets of propositions differ with respect to whether they consider the propositions to be objectively given, serving as an infallible background, or as subjective.
While in some positions common ground is considered to be objective and context-independent, such that participants presuppose its validity and make use of it when they draw inferences or disambiguate utterances or referential expressions, relevance theory (Sperber and Wilson, 1986), for instance, rejects the idea of common ground as a set of true propositions, arguing that this position would lead to infinite regress since, as they suggest, we cannot have complete propositions about the others’ knowledge, including their knowledge of our knowledge, and so on. Sperber and Wilson (1986) therefore propose a cognitive concept of context. In their view, a speaker’s cognitive environment is “a set of assumptions which the individual is capable of mentally representing and accepting as true” (Sperber and Wilson, 1986, p. 46). To avoid infinite regress, their theoretical framework is based on assumptions that are taken to be mutually manifest:


To be manifest, then, is to be perceptible and inferable. An individual’s total cognitive environment is the set of all the facts that he can perceive or infer: all the facts that are manifest to him. An individual’s total cognitive environment is a function of his physical environment and his cognitive abilities. It consists not only of all the facts that he is aware of, but also all the facts that he is capable of becoming aware of, in his physical environment. (Sperber and Wilson, 1986, p. 39)


Relevance theory thus understands common ground as an entirely cognitive phenomenon and manifestness as a matter of degree. Communicators presuppose and construct mutual cognitive environments for determining the explicatures of a communicative contribution, which in turn are required for the calculation of the relevant implicatures. They do so on the basis of previous experience, which may lead to different representations: We do not all construct the same representation, because of differences in our narrower physical environment on the one hand, and in our cognitive abilities on the other. Perceptual abilities vary in effectiveness from one individual to another. Inferential abilities also vary, and not just in effectiveness. People speak different languages, they have mastered different concepts; as a result, they can construct different representations and make different inferences. They have different memories, too, different theories they bring to bear on their experience in different ways. Hence, even if they all shared the same narrow physical environment, what we propose to call their cognitive environments would still differ. (Sperber and Wilson, 1986, p. 38)

The notions of common ground discussed here are generally implicitly evoked in various empirical studies addressing to what extent the communication partner is being taken into consideration; in social and cognitive psychology, for instance, many studies rely on a perspective on common ground that is similar to the one outlined by Sperber and Wilson (1986): The total shared cognitive environment of two people is the intersection of their two total cognitive environments: i.e. the set of all facts that are manifest to them both. Clearly, if people share cognitive environments, it is because they share physical environments and have similar cognitive abilities. Since physical environments are never strictly identical, and since cognitive abilities are affected by previously memorised information and thus differ in many respects from one person to another, people never share their total cognitive environments. (Sperber and Wilson, 1986, p. 41)

For instance, the conception of common ground as sets of assumptions that are mutually manifest is generally implicitly adhered to in psychological investigations of the use of common ground and partner modeling in interaction that investigate to what extent, and under which conditions, speakers make use of factual
knowledge that is only available to themselves and not to their addressees (e.g. Brown and Dell, 1987; Horton and Keysar, 1996; Lockridge and Brennan, 2002; Horton and Gerrig, 2005a,b; Kraljic and Brennan, 2005). The psychological research carried out in order to investigate to what extent speakers take the current knowledge states of their communication partners into account may focus, for instance, on ambiguity resolution; here, relevant questions concern whether, if a particular linguistic form is potentially ambiguous for a hearer, the speaker will anticipate the ambiguity and try to disambiguate his or her utterance for the addressee, or whether speakers take into account whether a given referent is already known to the addressee when referring to an object, thus distinguishing between definite versus indefinite reference. Moreover, psychological studies concern whether information unavailable to the addressee is produced or not, for example, in storytelling. These studies address whether speakers attend to the listener’s information needs and in particular whether information is available only to them or also to the listener. For instance, experiments by Brown and Dell (1987) build on the unequal distribution of knowledge between the participants in storytelling; they find a small effect for speakers to take into account their addressees’ information needs, i.e. whether they had a picture available showing the relevant information or not. They suggest that speakers do this because it is simply easiest for themselves (Brown and Dell, 1987, p. 310). However, when Lockridge and Brennan (2002) repeated the experiments with a naïve listener who had not heard the story told many times before like the confederate in Brown and Dell’s study had, they found a reliable and strong partner-based effect.3 Horton and Keysar (1996) carried out similar experiments in which information is available only to one of the participants.
They find that speakers may make use of information about the communication partner only if there is time; in time-critical situations speakers do not access representations about the mental states of their communication partners. Instead, they take speakers to rely on their own mental states. Thus, speakers are suggested to be egocentric by default and only to access common ground in the case of problems or if there is time. Lane et al. (2006) find that especially when participants are asked to conceal particular information, they are even more likely to leak private information to the partner. That is, the fact that the partner was not supposed to know certain information made participants mention it even more frequently. This suggests that partner orientation increases the cognitive load for speakers during speech planning and thus implicitly supports the position held by Horton and Keysar (1996) that taking the addressee into account is not done by default, and in any case that modeling the partner’s knowledge is ‘computationally expensive’ (see Brennan et al., 2010, p. 310).

3. This finding points to a considerable methodological problem for many studies in the field since confederates “are very likely to communicate […] lack of need of information” (Brennan et al., 2010, p. 308).

Similarly, investigating the clarity of words that have been mentioned twice in a corpus, Bard et al. (2000) find that speakers rely more on the fact that the word was familiar to themselves than on its familiarity to the addressee. In contrast, Brennan and Hanna (2009) argue that Bard et al. (2000) failed to provide a comparison, i.e. a control group to whom speakers also told the story a second time but who had not heard the story before; compared with speech directed to different addressees, there is indeed a clear difference between second mention to the same versus to a different addressee. Galati and Brennan (2009) investigate storytelling to same and to different addressees and find significant support for partner-specific attenuation, i.e. adaptation to the respective listener’s information needs. Studying speakers’ attempts to disambiguate utterances for their communication partner, Kraljic and Brennan (2005) argue that speakers do not take into account whether a description is potentially ambiguous for a hearer. Instead, speakers are suggested to use prosodic cues for syntactic disambiguation all the time. Kraljic and Brennan conclude that the respective prosodic cues result from the speakers’ own planning and production processes, rather than being partner-oriented. However, in a syntactic disambiguation task, Ferreira and Dell (2000) did find a slight effect of disambiguation for naïve addressees. A similar argument to the one by Kraljic and Brennan (2005) is made by Wu and Keysar (2007), who find instructors in a matching task to use names that the addressee cannot know, thus not paying attention to what is shared and what is privileged.
Instead, the authors argue, participants make use of a strategy in which they rely on their own knowledge, especially if the overlap between shared and privileged knowledge is considerable. However, Heller et al. (2009) replicated the experiment by Wu and Keysar (2007) and found that even though names, i.e. non-shared terms, were used in the instructions, these were marked as non-shared, most likely owing to the participants’ interpretation of the situation, which in their view made teaching relevant. Thus, they find that participants recalled in great detail what was shared and what was privileged information and that they marked their utterances accordingly. Several authors attribute the many null findings with respect to addressee orientation to methodological problems in the analysis: Many studies rely on imagined addressees, make use of confederates, who may subconsciously influence the participants’ linguistic productions, or draw on monological, non-interactive contexts. As several authors, for instance, Schober and Brennan (2003), Brown-Schmidt (2009) and Brennan and Hanna (2009), criticize, confederates may have influenced
the speakers’ production process subconsciously in many of the experiments reported on above. Furthermore, the results from Brown and Dell (1987) or Horton and Keysar (1996) neglect the interlocutors’ ability to interact, which may have a great impact on the representation of addressees’ epistemic and deontic states, as Brown-Schmidt (2009) argues; she compares partner-specific effects in interactive and non-interactive settings and finds that partner-specific interpretation is likely in interactive dialog settings whereas it often cannot be found in non-interactive settings. Thus, the role the partner plays in interaction seems to have an impact on the partner-specific effects to be found. Furthermore, studies need to rule out other functional or motivational influences while ensuring ‘natural interaction’. For instance, lack of addressee orientation may sometimes be only one out of a set of possible reasons for speakers to leave out information in storytelling; Horton and Spieler (2007), for instance, suggest that the lack of adaptation among older participants which they observed may be due to different interactional goals in the different age groups. Brown-Schmidt et al. (2008) demonstrate that common ground, in particular the difference between private and shared information, is taken into account by listeners in initial language processing, and that the strength and polarity by means of which common ground is taken into account depend crucially on the sentence type of the speaker’s utterance; for example, if the speaker is asking a wh-question, private information is readily activated since the speaker is more likely to ask for something he or she does not know than for something that he or she knows is common ground. Thus, grammatical structure and the projected speech act corroborated by the grammatical form make addressees attend to common ground versus private information differently.
The ultimate aim of the experiments just reported on is to infer from the amount and timing of partner-specific effects the nature of linguistic processing in general; the results reported on here give rise to at least three different models: We can distinguish between two different two-stage models, in which the partner is only taken into consideration in an additional, later step, and various one-stage models (e.g. Clark, 1996; Bavelas, 2007; Barr, 2008), in which partner-orientation is just a regular part of the linguistic processing. Barr (2008) presents a one-stage model in which common ground is also being taken into account from the earliest moment of comprehension, yet in which common ground is rather due to automatic anticipation and in which it is not necessarily taken into account in later stages of processing, such that linguistic competitors may override the results from the automatic anticipation effect of restricting one’s attention to what is shared. In particular, he finds that speakers activate both private and shared information. Thus, he advocates a model in which participants can hold simultaneous contradictory beliefs (Barr, 2008, p. 37). However, Heller
et al. (2009), for instance, suggest that it makes sense for speakers in production to activate both shared and privileged knowledge since every utterance contains given as well as new information; from this perspective, the findings by Barr (2008) are not surprising. The cooperative model of language use originally presented by Clark (1996) represents another one-stage approach, in which the partner is attended to from the beginning (e.g. Horton and Gerrig, 2005a,b; Galati and Brennan, 2009; Brennan et al., 2010). However, some of these authors do worry how a partner model could be ‘computationally feasible’ (Galati and Brennan, 2009, p. 47). Correspondingly, Horton and Gerrig (2005b) find that cognitive load constrains to what extent speakers take their communication partners’ knowledge into account. They hold that usually general cognitive association operations are at work, i.e. that partner effects are not due to specific mechanisms. As such, however, these operations are affected by limitations in memory like all other cognitive processes. In contrast, Galati and Brennan (2009) suggest that participants do not create elaborate representations of the common knowledge but that they may instead just encode simple two-bit partner-specific information, such as new versus old or perceptually available or not (cf. also Brennan et al., 2010, p. 326). That possible memory limitations constrain the amount of information people may model of their partner has been contested by Heller et al. (2009), who replicated the experiment by Wu and Keysar (2007) and who show that people do keep track of their communication partner’s knowledge states in considerable detail. Similarly, a study on conceptual pacts in three-party conversation by Yoon and Brown-Schmidt (2014) shows that people do not simply continue using their already grounded terms when a third party enters the conversation; thus, speakers are sensitive to with whom they share certain terms. 
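Galati and Brennan’s proposal that speakers encode only simple binary partner-specific features (new versus old; perceptually available or not), rather than elaborate representations of common knowledge, can be pictured as a minimal flag record. The sketch below is my own illustration under that reading, not an implementation from the cited work; the names `PartnerFlags` and `should_attenuate` and the attenuation rule are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical illustration of a minimal ("two-bit") partner model in the
# spirit of Galati and Brennan (2009): per referent, the speaker tracks only
# two binary features instead of a detailed model of the partner's knowledge.

@dataclass
class PartnerFlags:
    new_to_partner: bool          # has the referent not yet been mentioned to this partner?
    perceptually_available: bool  # can the partner currently perceive the referent?

def should_attenuate(flags: PartnerFlags) -> bool:
    """Attenuate (shorten, reduce) a referring expression when the referent
    is already given for this partner, either through prior mention or
    through perceptual availability."""
    return not flags.new_to_partner or flags.perceptually_available

# e.g. a referent already mentioned to this addressee:
old_referent = PartnerFlags(new_to_partner=False, perceptually_available=False)
```

The point of the sketch is only that such a record is cheap to maintain and query, which is what makes the proposal attractive as an answer to the ‘computational feasibility’ worry.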
While people obviously do not store everything about their partners, the actual limits to what they can and do represent are thus not yet clear. The two-stage models can be divided into essentially egocentric models (e.g. Horton and Keysar, 1996; Barr and Keysar, 2002, 2007) and models relying on automatic priming, i.e. the interactive alignment model (Pickering and Garrod, 2004). The former position is held, for instance, by Shintel and Keysar (2009), who argue that many of the effects that are generally explained in reference to models of the partner’s knowledge can be accounted for using general conversational principles, such as the expectation of consistency, or as unspecific associations in memory. For the second position, see Section 2.7 on alignment below. Besides the broad range of psychological studies on partner effects in interaction, a very different kind of support for the idea that speakers may keep track of their partners’ knowledge states comes from the investigation of linguistic structure itself. Givón (2005) holds context to consist of speakers’ representations of ‘other
minds’, i.e. cognitive representations of the communication partners’ epistemic and deontic states. He argues that speakers take into account their interlocutors’ supposed mental models at all times. Givόn suggests that it makes sense from an evolutionary point of view to constantly monitor the others’ mental states because in this way their behaviors can be predicted, which has a high adaptive value. He suggests that we know other minds because we know our own minds, and by feature association we can transfer knowledge of ourselves to the others, which makes sense in a society of intimates, in which we are taken to have lived until about 8000 years ago. We thus have access to other minds on evolutionary grounds, that is, on the basis of thousands of years of experience in groups of intimates.4 That is, speakers can extrapolate from themselves to their communication partners. This is in line with the egocentric findings from, for instance, Horton and Keysar (1996), Bard et al. (2007, 2000), and Kraljic and Brennan (2005). In particular, Givόn argues that in order to survive, organisms need to represent most tokens of the same type in the same way and to recognize deviant tokens as exceptional (Givόn, 2005, p. 39). This means that an organism needs to develop context-sensitive adaptive responses. Givόn argues that the ability to predict the behavior of others is the most important adaptive capacity for a social cooperating species (Givόn, 2005, p. 120) and that “the systematic on-line construction of mental models of the current epistemic and deontic states of one’s interlocutor is the central adaptive motivation for the evolution of grammar” (Givόn, 2005, p. 121). We thus construct the others’ minds as like our own, as a prototype-based category (Givόn, 2005, p. 62). Furthermore, Givόn suggests that grammar has evolved as a perfect adaptation to our need to induce others to comprehend what is in our mind. 
In other words, grammar itself is designed to manage the hearer’s (epistemic and deontic) information states in both production and comprehension. The lexicon in this view corresponds to what is generically, culturally shared. Grammar represents the particular interlocutors’ mental models at particular times that constitute conventionalized, i.e. grammaticalized, common, recurrent, and adaptively relevant types of contexts (Givón, 2005, p. 92). Givón shows, for instance, how anaphoric and cataphoric linguistic devices are used to indicate continuity versus discontinuity with previously grounded referents (pp. 136–143) and “to anticipate the epistemic mental states of the interlocutor” (Givón, 2005, p. 133) respectively. These “propositional modalities” display a “fine-tuned sensitivity on the part of the speaker to the informational and social reality around them, most conspicuously to the constantly shifting epistemic and deontic states of their interlocutors” (Givón, 2005, p. 177).

4. As Haviland (1988) shows, societies of intimates do not necessarily share all information with each other; thus, the point Givón (2005) makes, that in small, rural societies people necessarily share their experiences, is not always true. I owe this observation to Herb Clark who pointed me to John Haviland’s paper.

These findings are, for instance, in accordance with the results by Heller et al. (2009) and by Brown-Schmidt (2009), which suggest that speakers pay close attention to their partners’ knowledge states. Thus, grammar can be interpreted as having the function to both anticipate and influence the interlocutors’ mental states, or, in Givón’s words:

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.

a dedicated signaling system whose purpose is to induce others to comprehend what is in one’s mind. Not only to comprehend, but hopefully also to spring into relevant action. Such behavior is inconceivable without a running on-line mental model, however subconscious, of the interlocutor’s rapidly shifting intentional and epistemic states. (Givón, 2005, p. 120)

The adaptations towards the communication partner thus concern the grammatical expression of epistemic and deontic modality, tense, aspect, and evidentiality (where ‘deontic’ is taken to mean “matters of desirability, preference, intent, ability, obligation, manipulation or power” (Givón, 2005, p. 149)). The adaptations extend to discourse: according to Givón, discourse coherence is established by grammatical cues, which together constitute “an elaborate system of cues that speakers give hearers about highly specific mental structures and operations” (Givón, 2005, p. 193). In his approach, then, speakers model their partners’ epistemic and deontic states continuously and dynamically, updating these beliefs with every utterance. The linguistic model presented by Givón thus supports extensive online partner modeling, combining the egocentric, prototype-based perspective on partner modeling (‘what is easiest for the speaker is easiest for the addressee’) with the process of continuous partner-specific updates. Interestingly, in the model outlined, this orientation to the addressee is built into the structure of language itself, which thus provides speakers with an inventory of forms to manage the respective communication partner’s mental states.

2.6.2  Common ground as a shared basis

In contrast to the position which regards common ground as a set of propositions, Herbert H. Clark has developed an approach to common ground that includes a much broader range of knowledge. For instance, Clark and his collaborators (e.g. Clark, 1992, 1996; Clark and Marshall, 1981; Clark and Wilkes-Gibbs, 1986) hold that “we carry around rather detailed models of people we know, especially of the people we know well” (Clark and Marshall, 1981, p. 55). These models comprise, besides facts, beliefs and assumptions, also “norms of behavior, conventions, procedures, skills, and even ineffable experiences.
These may be represented in many ways – as verbalized statements, as mental images and maps, as ways of perceiving and behaving we cannot or ordinarily do not describe. All this information is graded” (Clark, 1996, p. 112). People employ physical and linguistic co-presence


as well as community co-membership as heuristics to build up models of the information they share with their addressees (Clark, 1992). Clark (1996) presents a typology of common ground, proposing a number of fine-grained distinctions with respect to several different domains. In his account, ‘total common ground’ comprises the discourse record, and communal and personal common ground (Clark, 1996, p. 54). Both communal and personal common ground can be distinguished into a number of subtypes (Clark, 1996, pp. 100–120):

1. communal common ground, which is based on community co-membership; cultural communities are identifiable by their sharing of expertise, based on, for instance, nationality, residence, education, occupation, hobbies, etc. (Clark, 1996, p. 102);
– human nature; for instance, people normally have the same senses, such that a speaker may generally assume that what is accessible to him or her may also be accessible to the partner (Clark, 1996, p. 106);
– communal lexicons, i.e. the vocabulary of a language;
– cultural facts, norms and procedures, for instance, “an understanding of certain social roles and the institutions with which these roles are associated” (Clark, 1996, p. 109), but also scripts (e.g. Schank and Abelson, 1975);
– ineffable background; as members of a particular category, participants share not only knowledge, but also know-how, appearance and perspective (Clark, 1996, p. 110);
– our feeling of others’ knowing (Clark, 1996, p. 111).

This concept of communal common ground is close to the commonsense reading of common ground, in which common ground comprises all of the information shared by a speech community, such as facts, attitudes, beliefs, norms, conventions and preferences as well as other types of sociocultural knowledge.

2. personal common ground, which is based on personal acquaintance or joint experience; evidence that speakers make use of personal common ground while speaking comes, for instance, from experiments in which participants were asked to conceal information from their overhearers (Clark and Schaefer, 1992, p. 263). The personal common ground comprises joint perception, joint experience and joint actions (Clark, 1996, p. 112):
– perceptual basis, established through perceptual co-presence (Clark, 1996, p. 112), though “perceptual events are never dealt with in the raw” (Clark, 1996, p. 113); Clark et al. (1992) show how definite references are resolved by communication partners based on perceptual salience in the situation, but also by comparing it to speaker goals, to common ground and to speaker presupposition (Clark et al., 1992, p. 98);
– actional basis, especially previous joint action, such as talk (Clark, 1996, p. 114);
– personal diaries, i.e. autobiographical recall of events that concern joint experience or joint action (Clark, 1996, p. 114);
– acquaintedness; for instance, social relationships can be ordered according to acquaintance, e.g. strangers, acquaintances, friends or intimates, where friends and intimates are taken to be experts on each other (Clark, 1996, p. 115);
– personal lexicons; here, Clark (1996) lists nicknames, affectionate terms and other privately used language, but a personal lexicon is likely to comprise also the interactively established referential terms as described in Clark and Wilkes-Gibbs (1986) and others.




This list comprises shared knowledge that goes beyond supposed mutual knowledge in the form of propositions. For instance, as Clark and Marshall argue, a speaker searching his or her memory for an object, state, event or process referred to in an utterance cannot search memory for this event alone. Instead, he or she must search for an event that involves the event in question and, in addition, the communication partner and him- or herself (Clark and Marshall, 1992, p. 53). Similarly, Metzing and Brennan (2003) argue that participants make use of “an underlying episodic representation that associates a referent, a referring expression (and the perspective it encodes), and other relevant information about the context of use (such as who the partner is)” (Metzing and Brennan, 2003, p. 203). Clark (1998b) provides a detailed account of how common ground influences lexical choice; he argues that each lexical item must be stored together with information about the communities of speakers for which this item constitutes a convention. He uses the definition of Lewis (1969) as a starting point and proposes that conventions are partly arbitrary regularities in behavior that serve as coordination devices for recurrent coordination problems in a given population for whom this regularity is common ground (Clark, 1998b, p. 65). Thus, a lexical item can only be successfully employed within the population for whom it is common ground. This entails that when choosing a lexical item, people have to be aware of which community memberships they share with their communication partner in order to select an item whose interpretation is part of the common ground:

When I am talking to you, I don’t choose just any word from my personal lexicon. I ascertain which communities you and I mutually believe we are both members of and limit myself to the entries indexed for those communities. Word selection and word interpretation then become social processes.
They depend on judgements of shared membership in cultural communities. (Clark, 1998b, p. 72)


People will therefore provide their communication partners with clues to their own community membership, for instance during first encounters, in order to provide them with “the information they need for participating with them in joint activities” (Clark, 1998b, p. 76). These categories are ‘inference-rich’, to borrow the term from Sacks (1972b), such that they reflect common beliefs, practices and experience of the communities in question (Clark, 1998b, p. 84). This facilitates both word selection and disambiguation for community members, since members can take their communal expertise, which includes a common lexicon, for granted. Clark (1998b) concludes that “strangers need to establish which communities they are both members of before they can choose words in talking to each other” (Clark, 1998b, p. 85); thus, establishing common ground (based on community membership or on joint experience) is a precondition for linguistic choice in general. The breadth of types of communal and personal common ground suggested by Clark (1996) overlaps with the broad conception of context developed in the research paradigm of the ethnography of communication (Saville-Troike, 2003). In this approach, the shared context of communication has been systematized with respect to the following categories: physical setting and psychological scene; participants, viz. speaker, hearer and audience and their statuses in the participation framework; ends, that is, the purpose of the speech event from a sociocultural perspective; act sequence, that is, how something is said as regards message form and what is said as regards message content; key, i.e. mock or serious; instrumentalities, i.e. channels (for instance, spoken, written, email) and forms (for instance, vernacular, dialect, or standard) of speech; norms of interaction and interpretation; and genre (Hymes, 1972). For the perspective on common ground as a shared basis, too, numerous empirical results have accumulated.
For instance, research on common ground as a shared basis concerns how participants may develop a shared personal lexicon. In these experiments, as originally developed in Clark and Wilkes-Gibbs (1986), participants play the roles of instructor/director and instructee/matcher; their task is the description of cards with Tangram figures, which the matcher has to sort into a particular order based on the director’s instructions. The experiment is repeated several times, with the roles of director and matcher alternating in every round. Dependent variables in the analysis are the amount of time and the number of words and utterances participants need to carry out the task. The results of these experiments invariably show that participants progressively need less time and fewer words and are more efficient with each iteration of the game. Thus, while participants establish jointly how to refer to each Tangram figure in the first round, they make use of the negotiated expressions in later phases. In this way, participants establish a shared personal lexicon for the particular task at hand (Clark and Wilkes-Gibbs, 1986). Brennan and Clark (1996) suggest that speakers enter


‘conceptual pacts’ about how to refer to particular figures or objects, and Metzing and Brennan (2003) show that listeners may be irritated if these pacts are broken. That is, if participants interactively negotiate a particular way of referring to a particular object, addressees expect speakers to use different terms only for different objects (see also Shintel and Keysar, 2009). These conceptual pacts have been shown to hold even across situations, when participants walk around in between sessions (Brennan et al., 2013). The studies by Horton and Gerrig likewise show that speakers attend to the referential expressions previously established with their interlocutors, yet this process is quite demanding in terms of memory load (Horton and Gerrig, 2005b), and consequently speakers can be found to make mistakes (Horton and Gerrig, 2005a). Horton and Spieler (2007) find an age difference in the amount of audience design participants exhibit, which they largely attribute to memory restrictions with growing age. In their study, participants of two age groups showed similar effects in a card matching task when their partner was present; however, when they had to describe the same picture cards as before to a distant communication partner, only the younger participants took into account whether they were talking to a familiar or to an unfamiliar addressee. Another study of factors influencing the degree to which speakers take their partner’s knowledge into account in reference tasks is Gann and Barr (2012). The authors investigate what effects the possibility of providing feedback, the conventionality of the object, and its novelty have on the degree to which speakers take their listeners’ information needs into account. They find that when speakers were familiar with an object themselves but their addressees were not, speakers mostly relied on their partners’ online feedback to provide additional information (Gann and Barr, 2012, p. 16), making use of the incremental nature of language, which always allows the speaker simply to add an extra constituent depending on the partner’s response (Gann and Barr, 2012, p. 16). In the absence of such feedback, speakers relied on their own planning and reasoning capabilities to formulate their utterances, and they put significantly more linguistic work into formulating descriptions of new objects for their familiar addressees than into formulating descriptions of already familiar objects (Gann and Barr, 2012, p. 16). Thus, speakers did not worry very much about overspecification, only about underspecification (Gann and Barr, 2012, p. 20). Furthermore, their different adjustments depending on the availability of feedback show that speakers are generally sensitive to the affordances of the communicative situation (Gann and Barr, 2012, p. 16). Similar results have been achieved for other communication domains; besides investigating how people arrive at a common lexicon, studies address the use of personal common ground, for instance with respect to differences in conversational strategies between friends and strangers. For instance, participants have been


found to make use of personal knowledge when trying to conceal information from overhearers (Clark and Schaefer, 1987, 1992). Moreover, Isaacs and Clark (1987) show that participants make considerable adjustments to their partners depending on whether they identify them as experts or as novices. Lee et al. (2005) find that people build up mental models of the cultural backgrounds of their communication partners, which in their case was a robot. They provided participants with two kinds of clues about the robot: where it was built and the language it speaks. They find that on the basis of these clues, participants ascribe cultural knowledge of particular locations to the robot. Moreover, Fussell and Krauss (1992) investigate the role of speakers’ feeling of others’ knowing and find that participants often make use of this information when judging whether a communication partner is likely to know a particular referent. Thus, there is considerable experimental evidence that speakers make use of the kinds of contents suggested as a shared basis by Clark (1996). In the domain of perspective taking, experimental results (cf. Schober, 1998; Duran et al., 2011) furthermore show that speakers take their partners’ spatial perspective into account in utterance production, and not only their perspective in general, but also the approximate number of occasions on which their partners have previously taken their perspective (Schober, 1998). A recent study by Ryskin et al. (2016) moreover indicates that perspective taking is facilitated if people can associate a particular perspective with a particular communication partner. In particular, their eye-tracking data show that in a reference disambiguation task, participants store particular points of view for each communication partner, which are then taken into account when identifying a particular referent.
Schober (1995) finds that speakers often take the perspective of their communication partner and that, after exchanging the roles of instructor and instructee, speakers try to be egalitarian; that is, they try to take the other’s perspective as often as the other has taken theirs. Finally, in the spatially relatively demanding dialogs of this study, speakers increasingly use spatial instructions that are independent of particular viewpoints. Schober (1998) therefore argues that speakers generally orient towards the principle of least collaborative effort (see Clark and Wilkes-Gibbs, 1986; Clark, 1992, p. 132). Furthermore, the possibility to give feedback also plays a considerable role. In the study presented in Schober (1993), speakers mostly used their own perspective for producing spatial instructions when they had a real communication partner who was allowed to give feedback. In contrast, if speakers had to imagine a communication partner, they exclusively took the partner’s perspective. Schober (1998) attributes the difference to speakers’ knowledge about the success of their instructions due to feedback. If their communication partners signal to them that understanding is successful, speakers may stick to the mode of instruction which requires the least effort for themselves (cf. Clark and Wilkes-Gibbs, 1986). In contrast, in case they do not know whether their communication partner


understands them, they adjust to the partner’s perspective in order to guarantee understanding in the absence of feedback, thus taking over the main load to minimize the partner’s effort (cf. Clark and Wilkes-Gibbs, 1986). In another study, von Stutterheim and Kohlmann (1998) find that speakers stick to their own perspective even if the spatial position of the addressee changes. They therefore propose a model of text planning in which partner hypotheses play a role at particular stages. They argue that global choices, such as perspective taking, are set at an early stage of text planning and are therefore fairly resistant to modification. Their results, however, can be explained against the background of Schober’s findings by the fact that the addressee, a confederate, had already been acquainted with the speaker’s spatial perspective before changing his seat and, moreover, would have been able to communicate through feedback if he had had problems. Thus, the seemingly egocentric behaviors in this experiment can be explained by the availability of feedback (cf. Fischer, 2005). In a more recent study, Schober (2009) shows that the choice of perspective in dialog also depends on the distribution of spatial abilities between the participants. Particularly in dyads with asymmetric spatial capabilities, speakers with high abilities take their partners’ perspectives to facilitate the interaction. Similarly, Duran et al. (2011) find speakers to adopt an other-centric perspective depending on the partner’s perceived ability, even if this comes at a higher cognitive cost. In sum, perspective taking can be concluded to be essentially partner-oriented, with feedback playing a crucial role. The example of the experimental study of perspective taking shows that participants orient to common ground as a shared basis, as suggested by Clark (1996).
This perspective on common ground is far more inclusive than the modeling of the communication partner’s propositional knowledge in that it comprises a much broader spectrum of types of information attended to, and it understands common ground as a shared basis which is subject to collaborative work.

2.6.3  Common ground as a collaborative achievement

Besides the knowledge-based perspective on common ground and besides the paradigm of common ground as a shared basis, a third general approach to common ground can be identified, which assumes that common ground is collaboratively construed in interaction. Particularly in the research paradigms of interactional sociolinguistics and ethnomethodology, common ground is considered from a process viewpoint and is thus taken to be entirely constructed and reconstructed in and through the process of communication. Here, possibly relevant aspects of the communication partner, for instance power, ethnicity and gender, are seen as jointly construed in interaction (e.g. Gumperz, 1982; Schegloff, 1997).


Similarly, in the interactional achievement tradition, the communicative situation is not taken to be objectively given. Instead, it is held that both internal and external aspects of context are selected by the participants depending on their relevance. Since context is not objective but dependent on the interactants’ active constructions, the contextual categories selected may differ even between the participants themselves. Consequently, there is a need for the interactants to signal to each other in what way they understand the context in the particular speech situation; they need to display their understanding of the situation insofar as it is necessary for the current utterances to be interpretable. This concerns, on the one hand, what is being said and, on the other hand, what is considered to be the context serving as a background for what is said (see Fetzer and Fischer, 2007). Among the strategies by means of which participants indicate context to each other are what Gumperz (1982) calls contextualization cues: by indicating aspects of context by means of linguistic features such as prosody, facial expression, or gesture, speakers provide hints relevant to the interpretation of their utterances. Because of the concurrent nature of such displays, Gumperz (1982) has focussed on paralinguistic cues, especially on prosody. However, many different lexical and grammatical means can function as markers of common ground (Gumperz, 1977, p. 199). Clark (1996), too, argues that speakers provide each other with both linguistic and non-linguistic evidence with respect to the right shared background, in particular regarding which communities they belong to; for instance, they may “deliberately display community affiliations in their dress, manner, and possessions” (Clark, 1996, p. 117).
Furthermore, he holds that participants signal to each other how they understand each other’s contributions, a process which he calls grounding (Clark and Schaefer, 1989b): participants display to each other their understanding of each other’s utterances and of the relationship that holds between these utterances. These interpretations may be ratified or challenged by the coparticipants in the following turn (Sacks et al., 1974). Because speakers constantly need to signal to each other what exactly they understand of their partners’ utterances, the procedures are often implicit. For instance, Clark and Schaefer list a number of techniques language users employ for grounding, most of which are implicit (Clark and Schaefer, 1989b, p. 267): continued attention, initiation of the relevant next contribution, acknowledgement (by means of nodding or items like uh-huh or yeah), demonstration, and verbatim display. “These types are graded roughly from the weakest to the strongest” (Clark and Schaefer, 1989b, p. 267), and also from least explicit to most explicit. The background assumption is that speakers need to assure each other “that the others are at that moment attending to, hearing, and trying to understand what they are saying” (Clark and Schaefer, 1989a, p. 123). The authors continue: “Speakers need their listeners’ help for both these assurances, so conversation takes everyone’s coordination. Together, the participants bear the mutual responsibility


of assuring that what is said has been heard and understood before the conversation goes on” (Clark and Schaefer, 1989a, pp. 123–124). By signaling to the communication partner what has been understood, the participants continuously build up a pool of information available to all participants: “the speaker and her addressees all focus on grounding that content, on placing it among their shared beliefs – technically their mutual beliefs” (Clark and Schaefer, 1989a, p. 124). While Clark and Schaefer exemplify the construction of presentation and acceptance using telephone directory inquiries, they argue that it is a “systematic feature of everyday conversations”, too (Clark and Schaefer, 1989a, p. 149). The methodological principle underlying the study of grounding rests on the conversation-analytic notion of the next-turn proof procedure, according to which each listener has the opportunity to display his or her understanding of the previous turn in his or her next turn, an interpretation which may be ratified or rejected by the previous speaker in the third turn (Sacks et al., 1974). Thus, by displaying understanding to each other, participants provide acceptance of the information provided by the communication partner, and the ratified information can be regarded as sufficiently grounded (Clark and Schaefer, 1989b). In this way, the participants’ own understandings of their partners’ previous turns become apparent. The two acts of presenting information and signaling acceptance together constitute contributions to conversations, which are thus collaboratively achieved. The result of the grounding process is, in Clark’s terminology, the discourse representation (Clark, 1996). The discourse representation, i.e. what is said, as well as what the common ground may be taken to consist in, are thus jointly construed by both the speaker and the listener. Thus, in this approach the listener is taken to play a rather active role (see also Bavelas et al., 2014).
To sum up, common ground is understood in different ways depending in part on the different disciplines in which it is studied. The first two approaches discussed focus on different aspects that may contribute to models of communication partners, yet both assume that speakers build up detailed representations of their communication partners. This is not necessarily the case in the third approach discussed, in which common ground may be constituted locally and for the moment of interaction only.

2.7  Alignment

Linguistic alignment, also called entrainment, convergence or synchronization, refers to the process in which a speaker re-uses linguistic structures previously presented by another speaker. Since Pickering and Garrod (2004) proposed alignment to constitute a central process in dialog production and comprehension, the interest in alignment has increased considerably.


The interactive alignment model proposed by Pickering and Garrod (2004) is meant to account for basic processing mechanisms employed in natural dialog. The authors place their approach in the language-as-action tradition (Clark, 1996), yet they distinguish two types of coordination: on the one hand, there is coordination as joint action, as detailed in Clark (1996); on the other, there is alignment, which is defined as the sharing of representations due to implicit, automatic priming mechanisms. The interactive alignment model is a psycholinguistic processing model that is based on dialog and takes interactive processes as its starting point. In the framework of the current study, this model is particularly interesting because it suggests an important interactional process between speakers and addressees and takes a particular stance on partner modeling. Speakers are taken to align with their partners because aligned representations constitute shortcuts to understanding. If alignment takes place at various linguistic levels, this leads to successful dialog immediately and efficiently, since similar representations are evoked in the participants. Thus, instead of assuming isolated processing by speakers and hearers, the interactive alignment model holds the processes to be interlinked. The channels are taken to be bidirectional; that is, they lead from production to comprehension and from comprehension to production. In particular, activation of a representation in one participant is suggested to lead to activation of the same or similar representations in the other. There is no intervening representation, the process is subconscious, and the main mechanism assumed by Pickering and Garrod (2004) is automatic priming, that is, an implicit, memory-based effect during which some information influences the occurrence of some other kind of information (Garrod and Pickering, 2009, p. 296).
The model thus holds that the features of the communication partner’s utterance have a direct, automatic effect on the features activated in the current speaker. Furthermore, alignment at lower levels is taken to lead to alignment at higher levels. Indeed, priming effects have been found at all levels of linguistic representation (Pickering and Garrod, 2004; Garrod and Pickering, 2007, 2009). According to Pickering and Garrod (2004), this leads to increased understanding through the alignment of situation models: when speakers try to understand each other, they are taken to match each other’s situation models by aligning with their partners’ syntactic, semantic, and even higher-level discourse-strategic choices, such as perspective taking. This also leads to intrapersonal alignment over time, because using the same linguistic representation for the same reference object is not only efficient but also expected by the listener, as Metzing and Brennan (2003) have shown (cf. also Shintel and Keysar, 2009). Evidence for the suggestion that alignment at one linguistic level may lead to alignment at another is taken to come, for instance, from the fact that grammatical alignment is enhanced by the repetition of lexical items (Garrod and Pickering,


2009, p. 298). Gries (2005) confirms this observation in a corpus-linguistic investigation, yet he shows that syntactic alignment is highly verb-specific; i.e. whether syntactic priming occurs depends largely on the general distributional properties of the verb in the respective construction. This suggests that the properties of the linguistic material at the various linguistic levels may play a role in alignment with a linguistic form used by the partner. That is, it is the conventional linguistic properties and distributions that may influence the occurrence of alignment, not necessarily alignment at one level that would lead to alignment at another, as Pickering and Garrod (2004) suggest. However, Raffray et al. (2008) do find evidence for the priming of analogical relations by the repetition of linguistic material; thus, they find strengthening relationships between the semantic and syntactic levels. Alignment at lower levels, such as the level of syntactic structures, may also be due to the quick changes of speaker role in conversation (cf. also Garrod and Pickering, 2009, p. 299). Since in conversation speakers are likely to become listeners very quickly, and listeners may become speakers, participants in conversation have a different status than overhearers and other side participants. Accordingly, overhearers have been found to be less likely to take up the syntactic structures they hear (Branigan et al., 2007).5 Another consequence of turn-taking in conversation is that getting ready for speaking and listening are closely related; correspondingly, Pickering and Garrod (2004) take production and comprehension processes to be tightly coupled, due to the requirements of dialog. In later work, the authors suggest that speakers use their production system as an emulator during language comprehension to predict upcoming words, grammatical categories and meanings (Pickering and Garrod, 2007, p.
108); this emulator is modulated by input such that speakers imitate what they have just heard at various linguistic levels, which enables the production system to make predictions, thus facilitating comprehension (Pickering and Garrod, 2007, p. 109). Moreover, Pickering and Garrod (2013) go beyond this suggestion by proposing that the listener not only predicts what the partner will say, but covertly imitates the whole production process by forward modeling. Evidence for the tight coupling between production and comprehension is taken to come, for instance, from fMRI studies that show overlapping processing regions in the brain, from the activation of tongue muscles during comprehension, but also from pragmatic findings such as the projectability of turn completion (Garrod and Pickering, 2009). Results demonstrating the convergence on certain referential terms, like those reported on by Clark and Wilkes-Gibbs (1986), are thus taken to be due to interactive alignment; a further consequence of alignment between levels is the establishment of fixed expressions and eventually also language change (Garrod and Pickering, 2007, p. 449).

5. Obviously, other interpretations of this phenomenon are possible; in approaches inspired by CA, for example, the sharing of grammatical constructions across turns would rather be seen as an instance of co-production (e.g. Pekarek-Döhler, 2011) than of automatic priming; this phenomenon has also been described as dialogic resonance by Du Bois (2014).

To sum up, the model holds that production and comprehension are tightly coupled, that speakers and hearers do not act in isolation, that there is a communication mechanism between the channels, and that these mechanisms are implicit and automatic. The activation of a representation in one participant determines the subconscious and automatic activation of the same representation in the other participant without intervening mechanisms, which is taken to be the most efficient process because only a single representation is involved for both processes. Because alignment is taken to be automatic and subconscious, it does not require a model of the listener, and thus the interactive alignment model predicts that common ground, too, is established implicitly and automatically (Garrod and Pickering, 2007, p. 444). Because speakers derive common ground through alignment, no reference to the listener is necessary, and speakers do not take explicit representations into consideration. According to Pickering and Garrod (2004), previous models of common ground require the maintenance of a complex situation model. In contrast, in the interactive alignment model, speakers are taken to build up implicit common ground automatically through alignment. Modeling the listener is, on this view, neither apparent nor necessary, unless there are problems or unless "the discrepancy between their knowledge and that of the speaker is made especially salient" (Garrod and Pickering, 2007, p. 445). The authors rely on results from Horton and Keysar (1996), who argue that under time pressure speakers do not consider their communication partners' mental states.
Instead, they take the observable adjustments to the partner to be due to resonance, i.e. associations between linguistic choices and particular partners from previous interactions (Garrod and Pickering, 2007). They suggest that the prior association between a particular linguistic expression and a particular addressee may be due to resonance as proposed by Horton and Gerrig (2005a), rather than to strategic inference (Garrod and Pickering, 2007, p. 444). Thus, Pickering and Garrod (2004) conclude that inferencing about common ground is optional and occurs only if resources are free. In the case of problems, negotiation may take place, for instance in the form of clarification requests, which are understood as an implicit common ground alignment strategy. Explicit as well as implicit non-alignment, Pickering and Garrod (2004) argue, is also possible, for instance, when speakers try to conceal information, when they wish to disalign deliberately or when the previous representation was not understandable. Disalignment is understood here not simply as a lack of alignment but as the choice of different linguistic options in the sense of divergence (e.g. Strupka et al., 2016). Such strategies are however taken to be costly since they require a model
of the listener. Similarly, Pickering and Garrod suggest that in monolog, speakers/writers do indeed need to employ a supposed audience model. They discuss the results presented in Schober (1993), which demonstrate that speakers adopt different strategies concerning perspective taking in monolog versus in dialog. In particular, if speakers have to imagine a communication partner in a monological situation, they consistently take the perspective of the imagined partner, whereas in dialog, speakers have been found to use different strategies and also to orient to maxims such as reciprocity and equality. Drawing on these findings, which suggest that a monological situation requires extra effort, Pickering and Garrod argue monolog to be epiphenomenal. In later papers, Pickering and Garrod treat alignment independently of linguistic alignment and rather as synonymous with 'understanding' (Garrod and Pickering, 2007) and 'comprehension' (Pickering and Garrod, 2013). For instance, they argue that the sharing of physical environments leads to alignment (Garrod and Pickering, 2007, p. 449), and also that "feedback increases alignment" (Garrod and Pickering, 2007, p. 447), where only the alignment of situation models can be meant. At the same time, alignment is taken to produce joint action: "the joint action of communication occurs when interlocutors align their situation models" (Garrod and Pickering, 2009, p. 297).

The interactive alignment model has also been criticized. For instance, many of the recent studies on alignment show that not all speakers align. Assertions like "72–94% of the children showed positive accommodation on the different linguistic features examined" (Oviatt et al., 2004, p. 16) not only show that alignment occurs, but also that 6–28% of the children did not align with their communication partner. If alignment occurs as an automatic priming effect, it remains open why only some and not all speakers align. Moreover, Levitan et al. (2014) show that (prosodic) entrainment in dialog occurs at very specific places and not globally. In particular, as the comparison of overlaps and interruptions shows, "local entrainment is associated with discourse coherence" (Levitan et al., 2014). Furthermore, speakers may also disalign with their communication partners, for instance, when speaking to a robot (Strupka et al., 2016). Manson et al. (2013) look into the interactional effects of alignment and especially into the relationship between alignment and cooperativeness. The authors had 35 same-sex triads converse for ten minutes on any topic of their own choice before participants had to play a one-shot prisoner's dilemma game with each of their interaction partners and finally rate their partners regarding warmth and competence. The analysis of their verbal behaviors regarding F0, variation of F0 and speech rate, laughter and style (measured by means of the LIWC program (Tausczik and Pennebaker, 2010)) reveals that in spite of considerable coordination, for instance, in style, especially regarding function words, only alignment
of speaking rate was significantly related to participants' willingness to cooperate with that partner in the prisoner's dilemma game. Furthermore, while the alignment of speech characteristics was not associated with positive person perception ratings, the alignment in style and laughter was. Thus, alignment does not necessarily lead to more positive person perception and to more cooperation.

The identification of alignment may also be more problematic than assumed by Pickering and Garrod (2004) and others. For instance, Bateman (2006) points out that alignment cannot be sufficiently quantified. He argues that it is an as yet unsolved theoretical and methodological problem to identify and quantify alignment; consider, for instance, the following pair of utterances from human-robot interaction (Bateman, 2006, p. 165):

(2) Robot: Is this part of the kitchen?
    User:  This is part of the kitchen

An automatic lexico-grammatical analysis of this exchange produces 59 features for the first and 62 features for the second utterance, many of which are shared; whether this corresponds to our intuition about alignment, however, and where to draw the line and how to weigh different features, is completely open (Bateman, 2006, p. 165). And even if linguistic alignment could be identified in the transcript, it may be indistinguishable from other cooperative behavior, especially if alignment occurs over a longer distance, which it does (Gries, 2005). Bateman (2006) thus argues that in many cases it cannot be distinguished whether "the speaker here [was] choosing to cooperate with the robot or being subjected to alignment" (Bateman, 2006, p. 168).

Furthermore, while Pickering and Garrod (2004) argue that alignment is affected by speaker participation in dialog (p. 187), the precise roles of feedback and interaction remain unclear in the model. If alignment is ubiquitous and automatic, addressees and overhearers should perform identically well, which they do not, as Branigan et al. (2007) demonstrate. When Pickering and Garrod explain the difference by pressures from the turn-taking system and the fact that in dialog listeners must be ready at any time to become speakers themselves, the authors open up the possibility that interactional requirements may enforce speakers' readiness to align with their partner. Indeed, in the study presented in Branigan et al. (2007), the authors find different amounts of alignment for previous addressees versus previous side participants, and thus find participation structure to play a considerable role. That is, they find that "alignment can be affected by a range of contextual factors" (Branigan et al., 2007, p. 169). This leads the authors to suggest that alignment is not simple input-output coordination; to avoid undermining their own position that alignment is automatic, they suggest that the different results for addressees and
side participants are due to differences in processing depth (Branigan et al., 2007, p. 190). Thus, differences in participation status are taken to lead to differences in the extent to which processing occurs and thus to different amounts of 'residual activation' (Branigan et al., 2007, p. 191).

Yet other studies, too, call into question whether alignment occurs on the basis of automatic priming (cf. also Krauss and Pardo, 2004); the maze game studies reported on in Mills (2007) and Mills and Healey (2008), for instance, indicate that alignment is strategic rather than automatic. The latter argue that alignment may be strategically employed in order to introduce a figure-ground distinction against the background of which a corrected element may be identifiable (Mills and Healey, 2008, p. 49). They thus suggest alignment to constitute a resource for the participants rather than an automatic procedure. Similarly, Mills (2007), who investigates the negotiation process of referential terms and strategies in dialog, argues that "alignment is actually the backdrop against which subtle, tacit changes are made in the process of developing abstract description types" (Mills, 2007, p. 128). Mills (2014) addresses the extent to which alignment (as behavior matching in the sense used by Pickering and Garrod (2004)) can account for coordination phenomena such as 'conceptual pacts' by analyzing 32 map task interactions. He investigates the role of repetitions of the partner's linguistic features in semantic coordination, i.e. the development of stored referential descriptions, and, separately, in the development of procedures. He finds that "as interlocutors become coordinated they repeat less and less of the initial semantic model" (Mills, 2014, p. 165, emphasis original). Similarly, he suggests that repetition in later stages of procedural coordination would be disruptive and rather indicative of a lack of coordination (Mills, 2014, p. 168). Furthermore, he argues that terms acquire different meanings over time, so that just looking at the repetition of surface forms does not do justice to the many layers from previous iterations that lead to the final meaning (Mills, 2014, p. 168).

Likewise, Schegloff (2004) criticizes Pickering and Garrod (2004) for lumping together mechanisms and procedures that have different causes and conditions, and thus for failing to see the various central functions of these strategies. In particular, in his view, the functional nature of these mechanisms is neglected. Schober (2005) contests the assumption that aligned linguistic representations automatically lead to aligned situation representations. He investigates people's understanding of survey questions and finds that while people generally believe that they understand the questions, they may in fact interpret the linguistic material used very differently. That is, in spite of using the same linguistic inventory, i.e. being perfectly aligned on various linguistic levels, their situation models may diverge radically.
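Bateman's quantification problem can be made concrete with a deliberately crude toy calculation. The following sketch is an invented surface measure, not Bateman's lexico-grammatical analysis: even for the short exchange in example (2), the "alignment score" swings from perfect to partial depending on which features one chooses to count.

```python
# Toy illustration of the quantification problem raised by Bateman (2006):
# the overlap score for example (2) depends entirely on the feature choice.
# This is an invented surface measure, not Bateman's actual analysis.

def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) of a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    """Overlap between two feature sets: 0 = disjoint, 1 = identical."""
    return len(a & b) / len(a | b) if a | b else 0.0

robot = "is this part of the kitchen".split()
user = "this is part of the kitchen".split()

for n in (1, 2):
    print(f"{n}-gram overlap: {jaccard(ngrams(robot, n), ngrams(user, n)):.2f}")
# 1-gram overlap: 1.00  (the word sets are identical)
# 2-gram overlap: 0.43  (word order differs at the start of the turn)
```

Whether 1.00 or 0.43 "is" the degree of alignment here, and how syntactic or semantic features should be weighted against lexical ones, is precisely the open methodological question Bateman raises.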


In their commentaries on the article by Pickering and Garrod (2004), many authors furthermore point out problems with the role of the communication partner proposed in the model. For instance, Fussell and Kraut (2004) argue that if alignment were automatic, participants should align their utterances even if the situation is asymmetrical. In the experiments they report on, speakers operate under circumstances that differ with respect to the availability of visual cues. The data show that the speakers do not align to their partners' deictic references to the environment. Instead, Fussell and Kraut (2004) assume speakers to rely on a partner model on the basis of which they choose the features of their utterances. In their reply to the commentary, Pickering and Garrod remark that this result was only due to the pressure of the situation and thus reject the example as evidence for concomitant partner modeling. Nevertheless, the data presented by Fussell and Kraut (2004) clearly show that communication partners do not simply automatically align in all situations. Brennan and Metzing (2004) also argue that speakers create models of their partners, yet they propose that under normal circumstances speakers assume that what is easiest for the speaker is easiest for the addressee. However, Schober (2004) asks how one would know when to model the partner explicitly. That is, for participants to know when they have to stop doing what is easiest for them and start modeling, they have to keep track of whether there are differences that are potentially problematic. This implies that participants have to keep models of their communication partners in mind all the time. Pickering and Garrod (2004) reply that it is in principle impossible to model all communication partners in multi-party dialog (see also Horton and Gerrig, 2005a) and that procedures like those reported on by Schober (2004) thus have to be optional (Pickering and Garrod, 2004, p. 217).
Concerning the relationship between alignment and partner models, several different hypotheses have been investigated (Branigan and Pearson, 2006; Branigan et al., 2010). For instance, in experiments in which participants communicated with their (in fact artificial) communication partner via keyboard, speakers were found to align more with their partner when they believed they were talking to a computer than when they believed their partner to be another human (Branigan et al., 2003). In order to determine what causes this difference, Pearson et al. (2006) carried out an experiment in which participants were told that they were talking to a computer but in which the start-up screens for these computers differed. If the start-up screen stated that this was a very basic but well-tested version, participants aligned more than when the start-up screen announced the computer to be the latest, expensive computer software on the market. The authors conclude that people align more when they suspect the capabilities of the system to be low. Pearson et al. (2003) also suspect an influence of politeness considerations. They had the communication partner, again a computer, present a dispreferred
term after participants had already previously used their preferred term twice. If people believed their communication partner to be human, only 15% aligned with their partner's term. If they believed their partner to be a computer, 67% switched to the dispreferred term, even though their previous two uses of the preferred term had caused no problems. The authors argue that for humans to disalign in this situation is considered impolite, whereas computers are not expected to be polite. Furthermore, considerations regarding the limited capabilities of computers in comparison with humans may also play a role in the participants' behavior. Pearson et al. (2006), using the same methods as the Pearson et al. (2003) study described above, find that English-speaking participants align more to the lexical items their communication partners use when they believe them to be non-native speakers of English than when they believe them to be native speakers of English. On the basis of these findings, Branigan et al. (2010) conclude that alignment is mediated by a partner model if and only if communicative success is at stake. That is, they take speakers to model their partners in case they have to fear that otherwise communication may be endangered. While in most of these studies on the relationship between partner models and alignment the partner model concerns only what is relevant for successful understanding, i.e. the communication partner's cognitive and linguistic capabilities, the studies carried out in the framework of communication accommodation theory (e.g. Giles et al., 1991) show that other considerations in the relationship between speaker and hearer also play a role, especially speakers' interactional goals and identity needs (see Section 2.4 Accommodation above). Furthermore, Fischer and Wilde (2005) argue that the speakers' partner models determine the limits to alignment.
In our study, we investigated participants' alignment with a nonce word used by the robot in comparison with a spatial reference strategy that was completely opaque and thus as uninterpretable to the participants as the nonce word. We found that speakers only aligned to linguistic material presented to them if it fit their concept of their addressee; in particular, they aligned with the spatial descriptions because they expected the robot to be competent in this area, but not with the lexical item, because they considered themselves more competent concerning natural language terms. This finding is in line with results by Kraljic et al. (2008), who find participants to align with phonetic peculiarities of the communication partner only when these constitute a characteristic trait of that partner, and not a contingent, accidental feature of the partner's pronunciation. Thus, depending on their model of the communication partner, speakers may select the linguistic features of the partner with which they align their utterances.

Pickering and Garrod (2013) extend the interactive alignment model by suggesting a model in which production and comprehension are tightly integrated
and should be seen in the broader context of (joint) action. An integral part of the model is prediction by forward modeling of the speaker's own and of others' actions; this is done in one of two ways: If the listener is assumed to be similar to the speaker, speakers can use prediction by simulation, that is, they can simply predict the upcoming action on the basis of what they would do themselves. In contrast, if the listener is judged to be different, speakers are taken to predict upcoming action and the partner's comprehension on the basis of association, for instance, on the basis of previous experience the speaker has had with listeners like the current one. The authors however hold that the two mechanisms are tightly integrated. In general, this model extends the interactive alignment model by presenting possible mechanisms, such as forward modeling, by means of which alignment is facilitated, and it furthermore makes additional predictions regarding the processing of the simplified registers under investigation in the current study. To sum up, while the addressee's linguistic utterances have been found to play a role for the speaker's linguistic choices such that speakers may align with them, the exact circumstances under which this happens are as yet far from clear.
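The distinction between the two prediction routes can be caricatured in code. The following sketch is an illustrative invention (all names and data are made up), not an implementation of Pickering and Garrod's model; it merely shows the branching logic between predicting from one's own production statistics (simulation) and from stored experience with partners of a given kind (association).

```python
# Toy caricature of the two prediction routes: simulation vs. association.
# Names and example statistics are invented for illustration only.
from collections import Counter

class ForwardModel:
    def __init__(self, own_stats, partner_stats, partner_is_similar):
        self.own_stats = own_stats          # the speaker's own production statistics
        self.partner_stats = partner_stats  # experience with partners like this one
        self.partner_is_similar = partner_is_similar

    def predict_next(self, word):
        # Simulation route: predict what I would produce myself.
        # Association route: predict from experience with similar partners.
        stats = self.own_stats if self.partner_is_similar else self.partner_stats
        continuations = stats.get(word)
        return continuations.most_common(1)[0][0] if continuations else None

own = {"the": Counter({"kitchen": 3, "door": 1})}
learned = {"the": Counter({"waypoint": 5})}  # e.g. from prior talk with robots

print(ForwardModel(own, learned, True).predict_next("the"))   # kitchen
print(ForwardModel(own, learned, False).predict_next("the"))  # waypoint
```

The point of the caricature is only the switch between the two knowledge sources; in the actual proposal, the two mechanisms are taken to be tightly integrated rather than mutually exclusive.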


2.8  Feedback

A prominent role the addressee may play in interaction is to provide feedback. Most approaches to dialog consider feedback to be work done by the listener; i.e. they take feedback not to involve an exchange of the speaking role but rather to support the current speaker (e.g. Gardner, 2001). Thus, feedback is supposed to have a continuing function (cf. Schegloff, 1982). Insights into the role of feedback in discourse come from studies that compare addressees and overhearers, but also from studies on how listeners provide speakers with clues to their epistemic and deontic states. Both considerations will be instructive regarding the role of the addressee in interaction.

A first experiment demonstrating the crucial role of feedback in dialog was carried out by Krauss and Weinheimer (1966). The task for the speakers, 48 secretaries and other employees of Bell Labs, was to instruct their addressees to select one card out of six, with six figures on each card, that matched the distribution of figures on the instructor's card. In one condition, participants could interact freely via an audio connection. In the other condition, participants could neither give nor receive verbal feedback. In both settings, listeners could also signal their card choices by means of a button. This button, however, was manipulated by the experimenters, who randomly made either 50% or 100% of the choices appear correct. The instructors' utterances were found to be shorter when instructors received feedback, and even shorter with 100% correct evaluation. Further statistical analysis revealed
that verbal feedback had more influence on the length of instructions than whether speakers believed only half of their instructions to be successful. Thus, speakers who receive verbal feedback are more efficient in their instructions. An experiment by Kraut (1982) directly compares addressees and overhearers in the same situation. In this study, participants were 213 university students, and the instructors' task was to provide a movie summary for an addressee and an overhearer. The interaction was restricted to audio only. Participants were granted a 10-minute warm-up. In order to provide extrinsic motivation for good performance, the experimenters offered a $15 prize for the best instruction. The results show that the more feedback addressees gave, the better they understood; active listeners' summaries were found to be better than those of overhearers, as measured by a post-experimental questionnaire. The questionnaires also ascertained that the poor results for overhearers could not be attributed to dissatisfaction. The experiments also found prior knowledge to influence addressees' responses, such that responses were longer, contained many questions and provided new ideas. Prior knowledge in the addressee also influenced speakers' efficiency as measured in length and amount of detail.

Experiments by Schober and Clark (1989), with undergraduates in psychology as subjects, also show that whether or not addressees can give feedback has considerable consequences for efficiency, speed, and accuracy in solving reference tasks. In their study, the speakers' task was to instruct their addressees to select Tangram figures in the right order in six trials. Overhearers listened to audiotaped conversations, one half from the start, the other half from the third trial onwards. The results were that speakers and addressees continuously needed fewer words per trial; addressees were more accurate than overhearers, early overhearers were more accurate than late overhearers, and overhearers often made their choices much later than addressees. In the interpretation of the results, however, the problem arose that the findings could be due to the boredom of listening to an audiotape. Therefore, a second study was conducted in which speakers had to instruct their addressees to select particular Tangram figures, with the overhearer present. During the task, participants were restricted such that while the addressee could give feedback, the overhearer could not, yet the overhearer heard and saw everything the speaker and the addressee were saying and doing. The comparison of the matching results by addressees and overhearers shows that addressees, who were allowed to give feedback, were significantly more accurate than overhearers. Moreover, overhearers varied much more in their placement times, and if they were late, they were likely to get it wrong. The number of disfluencies produced by the speaker may also be influenced by the amount of feedback available; Hadelich et al. (2004) tested several conditions in a Tangram task in which the type of feedback was varied. In situations in which
partners were not allowed to give verbal feedback, the numbers of disfluencies per speaker were much higher than when both participants were allowed to talk freely. We can conclude that the possibility of providing and receiving feedback makes speakers' instructions more efficient, more precise, more fluent and more detailed, and addressees' understanding deeper and more accurate. Thus, feedback has a considerable impact on the speakers' formulation processes and on the listener's level of understanding. Given this enormous impact, the question is how the possibility of giving feedback, such as producing an uh-huh, can make such a difference. Several studies from very different theoretical backgrounds have addressed this question and have provided proposals to explain how feedback signals may do their work in dialog.

One role of feedback is to provide displays of understanding. In the ethnomethodological perspective, to achieve intersubjectivity, participants need to reach a joint understanding of the things meant in the interaction, which is done by providing displays of the current understanding. That is, speakers can only assume themselves to be understood to the degree to which their communication partners have ratified their presentations (cf. the grounding process (Clark and Schaefer, 1989a)). Thus, there are problems inherent in communication (ambiguity, vagueness etc.) that are overcome in conversation by joint displays of the speakers' understanding of their communication partners' utterances. In this way, speakers are orderly and methodological in their ways of speaking (Schegloff and Sacks, 1973, p. 290): "If the materials (records of natural conversations) were orderly, they were so because they had been methodologically produced by members of society for one another, and it was a feature of the conversations we treated as data that they were produced so as to allow the display by the coparticipants to each other of their orderliness, and to allow the participants to display to one another their analysis, appreciation, and use of that orderliness."

In a second step, these interpretations are also available for the analyst:

But while understandings of other turns' talk are displayed to coparticipants, they are available as well to professional analysts who are thereby afforded a proof criterion (and a search procedure) for the analysis of what a turn's talk is occupied with. Since it is the parties' understandings of prior turns' talk that is relevant to their construction of next turns, it is their understandings that are wanted for analysis. The display of those understandings in the talk of subsequent turns affords […] a proof procedure for professional analysis of prior turns – resources intrinsic to the data themselves. (Sacks et al., 1974, p. 729)

Schegloff (1982) holds that feedback signals fulfill their functions based on their sequential position. That is, it is their occurrence not just anywhere in the dialog but at very specific locations in the conversation, namely at transition relevance places (TRPs), that means "I understand the state of the talk" and that "extended talk by another is
going on by declining to produce a fuller turn in that position" (Schegloff, 1982, p. 81). Thus, instances of feedback signals indicate that the listener is perceiving and understanding the speaker's utterances well enough to identify a TRP. Furthermore, by neither taking the turn nor initiating repair, which are the listener's other options at a TRP, he or she ratifies that the current speaker is holding the speaking role. In addition, by passing the opportunity to initiate repair or to take the turn, the listener acknowledges that the speaker may be in the course of producing a larger unit of discourse. This interpretation is confirmed by a study by Bavelas et al. (2002), which demonstrates that listener responses not only occur at TRPs, but at places that are highly coordinated between speaker and hearer such that mutual gaze is established for a brief moment in which linguistic and non-linguistic responses may occur. In accordance with the function of supporting the current speaker in his or her speaking role, Schegloff labels feedback signals in this function 'continuers' (Schegloff, 1982, p. 87). Continuers therefore do not claim understanding in general, but display "a particular understanding through production of an action fitted to that understanding" (Schegloff, 1982, p. 81).6

In contrast to Schegloff's proposal that feedback signals do their work by passing the opportunity to either take the turn or to initiate repair, Clark and Schaefer (1989a) assume that these signals are part of listeners' efforts to "take positive steps to establish understanding" (Clark and Schaefer, 1989a, p. 265). They hold that people "try to ground what is said – to reach the mutual belief that what the speaker meant has been understood by everyone well enough for current purposes" (Clark and Schaefer, 1989a, p. 290), and that feedback figures prominently in the grounding process (see Section 2.6 Common Ground above).
That is, speakers can only assume their utterances to be common ground to the degree to which their communication partners have accepted their presentations. Therefore, contributions are “not formulated by a speaker according to a prior plan, but emerge as the contributor and the partner act collaboratively” (Clark and Schaefer, 1989a, p. 292). This is done by mutual displays of acceptance of what has been presented by the partner. These displays provide speakers with clues to the extent to which contents are accepted, i.e. what the hearer has understood – well enough ‘for current purposes’ (Clark and Schaefer, 1989a, p. 290). According to Clark and Schaefer (1989a), the methods by means of which listeners may accept a speaker’s presentation are the following:

6. Curiously, the examples Schegloff quotes as evidence occur after instances of “recognitional reference” (Schegloff, 1982, p. 80). That is, in these examples speakers use feedback signals after they have identified the person the current speaker is talking about, i.e. when they have understood the information provided by their communication partner. Providing information on the success of establishing a reference may in fact exceed attention to the current state of talk.

Designing Speech for a Recipient

– continued attention: for instance, by means of non-verbal signals;
– initiation of relevant next contribution: for instance, A: Sit down. B: thanks.
– acknowledgement: for instance, A: Bill is late. B: uh-huh.
– demonstration: for instance, A: Sit down. B (sits down).
– display: for instance, A: call 9 – 1 – 1. B: 9 – 1 – 1, okay.

These means differ in the strength with which they signal acceptance of the partner’s presentation; for instance, “[p]artners generally use acknowledgments only when they are quite confident that they understand and that the contributor isn’t expecting strong evidence” (Clark and Schaefer, 1989a, p. 281). Accordingly, in a study of reference to persons and places, Heritage (2007) finds only tacit claims of recognition: “overt demonstrations of recognition were nowhere to be found” (Heritage, 2007, p. 257). However, the means identified are likely not the only ones that participants use to give feedback; Brennan (2005) finds that as soon as participants in her experiments found out that their partners could see how they were using their mouse, they used their moves not only instrumentally, but as communicative acts, for instance, to clarify the references of their utterances by using their mouse to point to the object in question. Thus, the above list may be constrained by the participants’ possibilities in a given medium, such as the telephone. Besides indicating different degrees of acceptance, feedback may also refer to different kinds of information that are accepted. Allwood et al. (1992) provide a model of different levels of discourse:

First, communication requires that at least two agents are willing and able to communicate. Second, communication requires that the receiving agent is willing and able to perceive the behavioral or other means whereby the sending agent is displaying or signalling information. Third, communication requires that the receiving agent is willing and able to understand the content that the sender is displaying or signalling. (. . .) Finally, communication requires that the receiving agent is willing and able to react attitudinally and behaviorally to various aspects of the content that the sender is displaying or signalling. (Allwood et al., 1992, p. 2–3)

Allwood et al. (2006) extend the model to also cover non-verbal means of providing feedback. In this model, both verbal and non-verbal feedback is produced on the four different levels outlined above, and often simultaneously. Thus, if a feedback signal provides an indicator of understanding, it also provides indicators of successful contact and perception. This corresponds to Clark’s suggestion that the act of grounding is anchored to a logic of upward completion and downward evidence (Clark, 1996, p. 389).
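The downward-evidence logic described here – a positive feedback signal at one level also evidencing all lower levels – can be rendered schematically. The following Python sketch is only an illustration of that logic, not an implementation from Allwood et al. or Clark; the names of the levels paraphrase the quotation above.

```python
from enum import IntEnum

class FeedbackLevel(IntEnum):
    """The four communicative levels of Allwood et al. (1992), ordered bottom-up."""
    CONTACT = 1        # willingness and ability to communicate
    PERCEPTION = 2     # perceiving the partner's signal
    UNDERSTANDING = 3  # understanding the signalled content
    REACTION = 4       # reacting attitudinally and behaviorally to the content

def evidenced_levels(signal: FeedbackLevel) -> list[FeedbackLevel]:
    """Downward evidence: positive feedback at a level also evidences
    that level and every level below it."""
    return [level for level in FeedbackLevel if level <= signal]

# A signal of understanding also evidences contact and perception:
print(evidenced_levels(FeedbackLevel.UNDERSTANDING))
```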


Gardner’s (2001) study of different deliveries of mm shows that minute differences in prosodic delivery have a great influence on speakers’ awareness of their partners’ information status. Using CA methodology, Gardner (2001) can not only show that listeners produce signals on this detailed level, but also that speakers really make use of this kind of information. For example, mm with fall-rising intonation (Gardner, 2001, p. 211) is oriented to a turn which is:

– regularly articulated unclearly;
– regularly conceptually difficult to understand;
– frequently in a sequentially incomplete position.


An example is the following (Gardner, 2001, p. 123), in which Liz’ Mm hm? follows a turn which exhibits several hesitation phenomena (repetition, a pause, hesitation markers) and which does not answer Liz’ question, as her clarification at the end of the excerpt shows:

(3) Mel: He -he wen / went home last night en ee – took his compuder home
    (0.7)
    Liz: ya mean he hasn’t got one et home?
    Mel: he’s he – carries one a them er (1.7) laptop – aroun with im.
    Liz: Mm hm?
    (0.8)
    Mel: an the book.
    (1.8)
    Liz: Ca ya do all that stuff on the laptop.
    (0.5)
    Mel: Oh yeah.

In contrast, mm with falling intonation (Gardner, 2001, p. 197) is oriented to a turn which:

– exhibits no problems in articulation;
– is conceptually simple or straightforward;
– is not emotionally or judgmentally strongly expressive;
– is not disjunctive within the emerging sequence or topic;
– is pragmatically, grammatically and intonationally complete.

In the following excerpt, Ron’s Mm:. is oriented to a short and straightforward turn by Sally, and its use with falling intonation contour influences the further development of the interaction such that Sally understands the previous topic to be concluded (Gardner, 2001, p. 199):



(4) Ron: We had an appalling meeting tonight,
    (1.7)
    Sally: So did we:.
    Ron: Mm:.
    Sally: what was yours about-.

In the example, Ron’s mm with falling intonation is taken by his communication partner, Sally, as a signal that the previous topic, her own appalling meeting, is complete, conceptually simple and straightforward, and thus that he doesn’t need any elaboration of it. Consequently, Sally reorients to the topic Ron has brought up before, namely his own appalling meeting. The example therefore illustrates how minute features of the prosodic delivery of linguistic feedback can have a great influence on speakers’ awareness of their communication partners’ need for, or receptiveness to, further information on the topic. This is in line with findings by Bavelas et al. (2000), who distinguish between generic responses, such as Gardner’s mm or Schegloff’s uh-huh, and specific responses, such as “looking sad, gasping in horror, mirror the speaker’s gesture, or supply an appropriate phrase” (Bavelas et al., 2000, p. 943). They find that specific responses are tightly coupled to the narrator’s story, and that the listener’s failure to produce such responses results in worse stories. Thus, the listener’s specific responses crucially support the speaker in the production of a narrative, and therefore have a considerable influence on the further interactional progress. Likewise, storytellers also gesture less frequently and use smaller gestures for addressees who have heard the story before, thus making use of listeners’ cues to shape their storytelling behavior (Galati and Brennan, 2013). Other feedback signals are used to negotiate subtle differences in epistemic states and are highly sensitive to the sequential environments in which they occur (e.g. Heritage, 1984, 1998, 2002, 2005). Local (1996) furthermore demonstrates how these highly different sequential contexts are coordinated with particular prosodic deliveries (cf. also Ehlich, 1986).
To sum up, the studies presented illustrate in which ways listeners’ contributions, and even a small word like oh, uh-huh or mm, may help speakers to tailor their utterances specifically to the needs of their respective addressees – in contrast to possible overhearers. They demonstrate how such a small signal, uttered in a particular sequential position with a particular intonation contour, may indicate how the listener is oriented to the previous turn, in terms of the interactional achievement of discourse, the grounding of information, the conditions for communication to happen, and the listener’s understanding of the topic as complete and unproblematic. The approaches presented, irrespective of their different theoretical and methodological backgrounds, describe feedback functioning on a local,



sequential basis, oriented towards the previous turn. Thus they enable an account of closely coordinated dialog that does not require a complex model of the communication partner, since choices may be made locally based on the contingent interaction and on the partner’s feedback on what would be useful in the next turn. How listeners’ responses may influence speakers’ productions and course of action is also apparent from a study by Goodwin (1986), who illustrates how different predispositions in the audience can lead to radically different interpretations of what is said by the speaker; these different interpretations, even within the same audience, may shape the course of action taken by framing the narration in ways that influence what is reported on and how (cf. also Heritage, 1984; Bublitz, 1988). These listener responses are then not restricted to feedback signals or their nonverbal equivalents, but may comprise whole utterances that present new information themselves. Thus, listeners’ feedback cannot be categorically distinguished from interactional effects in general. In the following, I therefore take listeners’ feedback as a starting point for the linguistic analyses, yet what really needs to be explained are the interactional effects of listeners’ contributions. What remains open is thus whether and in which ways feedback may also contribute more globally to the ways speakers orient to their addressees. For instance, the results presented in Bavelas et al. (2000), Heritage (1984, 1998, 2002, 2015) and in Gardner (2001) suggest that the different deliveries of feedback signals have global interactional consequences, which go far beyond the grounding of utterances since they serve a forward-looking, for instance, topic-organizing function as well. Thus it is generally possible that feedback contributes to the creation of partner models in some way.
That cognitive representations and feedback are in fact tightly coupled is suggested, for instance, by findings that show that the possible effects of feedback may be overridden by expectation. For instance, Kuhlen and Brennan (2011) attempt to determine the role of participants’ expectations in dialog; in particular, they aim to disentangle the role of attentiveness in dialog from the role of expectation of such attentiveness. They find that speakers with attentive addressees tell jokes in more vivid detail than those with distracted addressees only if they expected them to be attentive. Thus, expectation seems to override possible effects from participants’ feedback. In a later study, Kuhlen et al. (2012) compare the effects of expectations and feedback by manipulating both variables; they find that both speakers’ expectation that listeners will be distracted or attentive and the feedback from distracted or attentive listeners impact on speakers’ types and amounts of gesturing while speaking. How exactly expectation and feedback interact is, however, still an open question (cf. also Kuhlen et al., 2012, p. 37). Furthermore, participants may have other needs besides signaling understanding and grounding utterances; Wrede et al. (2010) demonstrate that listeners


may deliberately make their feedback ambiguous between understanding and securing the flow of communication in asymmetrical communication situations, such as in communication between hearing and hearing-impaired speakers or between humans and a speech-processing robot. Thus, interactional goals and expectations may influence the functions of feedback, yet it is still unclear how exactly this happens. To conclude, feedback may provide speakers with very detailed information concerning their communication partners’ current needs, on which they may rely for formulating their utterances for their partners. This is an interactive, jointly achieved and locally managed process that may require very little cognitive load, yet it may interact in various ways with cognitive processes concerning the addressee.


2.9  Interim conclusions

This survey of the different notions suggested in the study of the partner in interaction produces a complex picture of the possible ways in which the addressee may be oriented to in interaction. The factors considered concern what speakers bring into the interaction, such as interactional goals, for example, the speakers’ wish to affiliate or to disaffiliate with their communication partner, or expectations about group membership, social status or common ground, but they also concern what happens between speaker and listener in the course of the interaction, such as alignment and accommodation or the interactional grounding of what has been said, and speakers’ online modeling of their partners’ current epistemic and deontic states. In particular, the review of the central notions proposed in connection with addressee orientation has shown that some approaches to the role of the addressee in interaction take a cognitive perspective, treating the phenomenon as a matter of cognitive models of the communication partner or as a matter of interactional goals that a participant may bring into an interaction. Especially the notion of common ground is often treated as related to partner modeling, but also the concept of audience design relies on speakers’ orientation to certain mental representations of their listeners, thus treating addressee orientation as a cognitive phenomenon. Likewise, participants’ interactional goals and identity needs are taken as explanatory concepts in CAT. These notions regard the choice of a linguistic feature for a particular communication partner to rely on participants’ individual conceptions of their communication partner or on individual goals or agendas. In contrast, there are also approaches that view addressee orientation as an interactional, socially determined phenomenon. On the one hand, there is the notion of interactive alignment which takes the adjustment to the partner to be



involuntary and automatic; on the other hand, there are approaches that consider the linguistic forms produced to be interactionally achieved. In this perspective, the adjustments to a particular communication partner are locally managed and rely as much on speakers’ choices as on the partner’s interactional behavior and on the unfolding dynamics of the interaction itself. The phenomenon considered thus lies at the interface between cognitive and social aspects of communication and consequently constitutes a useful case for the study of the relationship of these two perspectives. In particular, given the different factors that may influence the ways the partner is attended to, it is unclear how these factors fit together; the situation is complicated by the fact that the different notions proposed stem from different disciplines with radically different presuppositions, ontologies and methodologies. Bavelas (2007), for instance, notes a tendency for psychologists to focus on individual actions and attributes (Bavelas, 2007, p. 140), in which the individual is seen as conceptually and physically isolated, and where the other person is seen as external to the individual and as an external influence. She argues that psychologists implicitly follow a reductionist perspective by assuming that “studying individuals who have been conceptually or even physically separated from each other will naturally lead to an understanding of their micro-social interactions” (Bavelas, 2007, p. 141). Such a perspective obviously has considerable consequences for possible findings on the role of the partner in interaction; likewise, the micro-sociological approach of conversation analysts, for instance Sacks (1972a) or Schegloff (2007), with its focus on interactional achievements, is not designed to yield results on, for instance, the cognitive models speakers may build up of their partners and employ in interaction.
Thus, the mechanisms involved and the aspects of the communication partner taken into account have been addressed almost complementarily in different disciplines. Consequently, the study of addressee orientation, if it is to be comprehensive, necessarily involves an interdisciplinary perspective. One conclusion regarding the questions raised in Section 1.1 Open Issues can already be reached from this survey: the linguistic features and interactional behaviors that are influenced by addressee orientation concern all linguistic levels, i.e. phonetics, phonology and prosody; grammar, comprising both morphology and syntax; the lexicon; discourse, perspective taking and the common ground assumed; as well as the choice of language as such. Addressee orientation is thus a pervasive feature that concerns language use at every linguistic level. Regarding all the other issues raised above in Section 1.1 Open Issues, we do not have definitive answers yet. With respect to partner modeling, we have seen previous results showing that people may build up quite elaborate partner models, but what exactly they comprise, how detailed they are, to what extent they rely on contingent information from the interaction and how they are put to use in



naturally occurring talk-in-interaction is not clear. Furthermore, alignment has been suggested to be a fundamental mechanism of dialog, yet several studies show both participation status and the particular communication partner to influence the amount of alignment in interaction. Moreover, we have seen that feedback plays a critical role for task efficiency, for instance, yet it is unclear to what extent it contributes to cognitive processes such as partner modeling and to what extent it renders such (possibly even explicit) modeling superfluous. Finally, the question on which basis linguistic choices are made can only be answered once we have solved the above problems. How the different findings relate to each other is especially difficult to ascertain, not least because the different findings stem mostly from different disciplines. The literature overview has thus also revealed that a comprehensive and interdisciplinary perspective on the phenomenon is essential; the next chapter will however show that especially linguistic analyses of interactions with communication partners with certain measurable properties, such as robots, provide a methodology capable of closing the gap outlined, allowing us to answer the questions raised.

Chapter 3.  Methods and data

As the literature overview in the previous chapter has revealed, the problems addressed in this study, although genuinely linguistic in nature, require an interdisciplinary perspective and thus a combination of different methods. One part of the approach taken here is to compare how addressees are taken into account in three different situations, namely when addressing children, foreigners and robots. Because of the unequal nature of the participants, the investigation of these three situations allows, more than the investigation of natural conversations between adult native speakers would, inferences about addressee orientation and the active role of the recipient. That is, it is assumed that, from speakers’ linguistic behaviors towards addressees who differ from them in distinct ways, we can infer the mechanisms and resources speakers make use of in interaction in general. The three suspected varieties chosen for this investigation have been studied in several different disciplines, for instance, from linguistic, anthropological, developmental, second language teaching, sociolinguistic and computational linguistics perspectives, and thus results from different disciplines have to be combined. While there is a considerable amount of research on child-directed (CDS) and foreigner-directed speech (FDS) we can make use of, linguistic studies on robot-directed speech (RDS) are still rare. The second part of the approach taken in this investigation is therefore to extend the discussion of the literature on the three varieties under consideration by empirical analyses of RDS. The studies of RDS are especially designed – in terms of both data elicitation and analysis – to address the questions central to the current investigation. Corresponding to the interdisciplinary perspective on the phenomena under consideration, the methods developed for the investigation of the questions raised in the previous two chapters combine principles and approaches from different disciplines.

3.1  Methods

The investigation of robot-directed speech is used here to address questions left open from the comparison of the ways of speaking to children and foreigners. Addressing these questions requires empirical analyses especially designed to investigate partner orientation in dialog. Empirical studies of addressee



orientation are however complicated by the fact that, as we have seen in Chapter 2 The Role of the Addressee: Central Notions, there is little agreement on what partner models consist of, whether partner models are built up and accessed at all, in which ways the partners’ feedback is taken into account, and how linguistic choices are being made, etc. Trying to understand addressee orientation thus confronts the researcher with the problem of identifying the relationships between a speaker, a set of linguistic features from which he or she can choose, and a particular communication partner, none of which are objectively given. For instance, the set of linguistic options available to a speaker at a given moment cannot be objectively determined from the outside. The inventory at hand may be smaller than expected, for instance, because of memory limitations, and also larger than expected, due to, for instance, creative language use or patterns conventionalized just in the respective group of participants. Therefore, the investigation has to identify the participants’ own understandings as apparent from their linguistic behavior; these understandings should be the result of the empirical investigation and not simply be assumed. For instance, whether partner models consist of representations of the partner’s knowledge or rather of inference-rich categories from which certain knowledge and capabilities are taken to follow, should be a result of the study rather than one of its axioms. The methods chosen for this investigation should therefore require as few a priori theoretical assumptions as possible and rather concentrate on identifying the participants’ own understandings. In their handbook article on the role of the partner in interactive spoken discourse, Schober and Brennan (2003) discuss two different research methodologies for the investigation of partner-specific effects in language production, namely corpus studies and laboratory studies. 
By corpus studies they understand investigations of collections of spontaneous conversations occurring in real-life settings. They suggest the advantage of such studies to be their ecological validity, yet they mention a number of methodological problems, such as the question of sample size and the difficulty of inferring the speakers’ intentions from the transcript. The alternative, they suggest, are laboratory studies, which allow the researcher “to assess speakers’ intentions and addressees’ comprehension independently of the conversation” (Schober and Brennan, 2003, p. 129). The disadvantages of such studies are their lack of ecological validity and their restrictedness to task-oriented situations. In this study, a combination of these two approaches, i.e. corpus analysis and experimentation, is used to address the role of the recipient in interaction. The method is similar to experimentation in social psychology in that it brings participants into certain situations in which they have to fulfill a given task, here in interaction with a robot. These situations are held comparable, differing only in a single variable. The method is different from usual psychological experimentation,



however, in that evaluation then is not based on post-experimental questionnaires or on the analysis of single dependent variables, but on the qualitative and quantitative analysis of participants’ verbal behavior during the tasks, i.e. on qualitative and quantitative corpus analyses. This method is self-evidently not suited to address all questions raised in the literature; for instance, no assertions can be made about on-line production processes and resources made use of. However, the method is well-suited to show participants’ own orientations in dialog and to study the consequences these orientations may have. The procedure for the analysis of RDS involves three steps: (a) experimental data elicitation, (b) a qualitative analysis of the interactions, and (c) a quantitative corpus linguistic and statistical investigation of the corpora. The qualitative analyses mostly rely on principles from ethnomethodological conversation analysis. This seems appropriate given the interactional nature of the object of study and our interest in participants’ own understandings of the situation and the communication partner. The dynamics of addressee orientation and the role of the addressee in interaction, though also relevant in monological speech and language, can best be studied in interactional situations, and conversation analysis has accumulated a considerable body of research on the mechanisms of interaction. However, conversation analysis (henceforth CA) is usually restricted to naturally occurring ordinary conversation (Hutchby and Wooffitt, 1998; Heritage, 2007).1 The data which are proposed here as particularly useful for the investigation of the role of the addressee are thus not common data for CA analyses, and there is consequently a need to justify the use of CA methods for the experimentally elicited data. 
Yet while the current approach does not share with CA a particular interest in the sequential structures of interaction as a social practice, CA is especially useful for our purposes since it provides us with methods for identifying those aspects of the communication partner that the speakers attend to themselves, as I argue in Section 3.1.2 Qualitative Analysis below. Besides the choice of CA for the qualitative analysis, the data elicitation procedures also need further explanation, since they rely on methods developed especially for the current purposes, which differ in some respects from previous work in social psychology, from usual data elicitation procedures in human-computer interaction, as well as from work in CA. The data elicitation procedures are therefore described and discussed below.

1. However, more recently the scope of conversation analytic studies has been extended, for instance, to institutional settings (e.g. Heritage and Sorjonen, 1994; Pomerantz et al., 1995; Drew and Heritage, 1992) as well as to task-oriented situations (e.g. Filipi and Wales, 2009), and thus the methodological choice made here is perhaps not as controversial now as it was some years ago.


Finally, the qualitative studies will be supported by simple statistical analyses of the corpora, the data collections of human-robot interaction in different situations, which will be discussed in Section 3.2 Human-robot interaction data.
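The kind of simple quantitative corpus comparison referred to here can be sketched as follows. The toy ‘corpora’ and the list of feedback markers below are invented for illustration and are not drawn from the book’s data collections; the sketch merely shows how relative frequencies of a linguistic feature can be compared across two elicitation conditions.

```python
from collections import Counter

def relative_freq(tokens, features, per=1000):
    """Relative frequency (per 1,000 tokens) of each feature in a tokenized corpus."""
    counts = Counter(token.lower() for token in tokens)
    total = len(tokens)
    return {f: counts[f] / total * per for f in features}

# Invented toy 'corpora' standing in for transcripts from two conditions.
condition_a = "okay robot please go left uh-huh okay stop okay turn right".split()
condition_b = "go left stop turn right go forward stop".split()

feedback_markers = ["okay", "uh-huh", "mm"]
print(relative_freq(condition_a, feedback_markers))
print(relative_freq(condition_b, feedback_markers))
```

In the real study, the token lists would of course come from the transcribed human-robot interaction corpora, and the counts would feed into the statistical tests discussed later.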


3.1.1  Experimental data elicitation

The concept underlying the comparison of the corpora elicited in different situations is that particular ways of speaking can be linked to objective differences in the communicative situation. The role of the communication partner becomes apparent if the communication partners differ in measurable ways, for instance, children versus adults, experts versus novices, strangers versus acquaintances or native versus non-native speakers (Schober and Brennan, 2003, p. 130). The hope is thus that the linguistic differences observable in comparable corpora can be related to the differences between addressees and not to any other influencing factors; for that reason, data need to be experimentally controlled in order to investigate the role of the addressee under scientific conditions. In contrast to natural conversation, experimentally elicited data can ensure that there is really only one variable in which situations differ. Even more than interactions between humans, human-robot interactions may be manipulated regarding the different parameters that speakers may take into account when choosing their language for their particular communication partner (Fischer, 2010b). That is, while we can manipulate a robot’s appearance, behavior and linguistic output, we cannot manipulate human communication partners to the same extent; at least it cannot be guaranteed that a confederate, for instance, will not provide cues that cast doubt on the validity of the results (cf. Brennan and Hanna, 2009). Thus, using human-robot interaction to investigate variables of addressee orientation seems to be useful. Furthermore, human-robot interaction does not constitute a natural conversational situation for most speakers and has to be elicited in laboratory settings anyway.
At the same time, since we are interested in the discourse conditions for particular linguistic choices in human-robot interaction, we need to investigate free interactions between humans and robots. Free interactions, however, pose a number of methodological problems. In particular, the influence of situational variables does not become apparent if each interaction constitutes a unique interactional achievement. Thus, in order to identify the influence of particular situational variables, data have to be elicited such that all other factors are kept constant. If dialogs are elicited under conditions that differ only with respect to a single feature, observable differences can be related to the contextual feature under consideration, at least for hypothesis formation, which can then be systematically explored in qualitative analyses.


For the elicitation of the data, the so-called Wizard-of-Oz methodology was therefore used, which was developed as a means to pretest the design of automatic dialog systems (cf. Fraser and Gilbert, 1991; Weiss et al., 2009). In such a scenario, subjects believe they are communicating with a real computer or robot while the supposed system’s behavior is actually manipulated by a human ‘wizard’. At the same time, human users are not prompted to display a particular linguistic behavior, but are free to behave naturally, that is, in a way that is similar to the way they would behave in real situations with real computers or robots. Eliciting data in such a scenario therefore allows the manipulation of the robot’s behavior in order to induce different kinds of linguistic behavior. A general difficulty in the design of Wizard-of-Oz scenarios consists in controlling the wizard’s behavior in order to be able to

– simulate the behavior of a machine;
– be able to compare dialogs across participants and time;
– prevent individual adaptation processes of the wizard.

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.

Many different solutions to these problems have been proposed in the literature; one possibility is to provide the wizard with particular rules of behavior (e.g. Grosz, 1982; Krause and Hitzenberger, 1992; Brindöpke et al., 1995). The success of this method depends on how closely the wizard can stick to the rules. Wizards are, after all, human beings who understand what the participants produce on the basis of their human communicative capabilities. Some study designers therefore do not tell the wizards which role they are playing. In Amalberti et al. (1993) and in Richards and Underwood (1984), for instance, the wizard did not know whether participants had been told that they were talking to a human being or whether they had been told that they were communicating with a computer; thus, the wizard could not design his or her behavior differently when simulating a human or an artificial communication partner. However, he or she may of course have had hypotheses about what the speakers believed about who they were talking to. In both methods, the users' utterances were directly available to the wizards. A third method was introduced by Johnstone et al. (1994), who used two wizards: one to translate the human speakers' input into a formal code, and a second one to construct an answer on the basis of this encoded message. Thus, for the responding wizard the users' utterances were not directly available, and thus he or she could not adapt to the users' behavior. A computationally more advanced version of this method is suggested by Sequeira et al. (2016), who allow the wizard to perceive only what the robot perceives, which is not much. The methods discussed so far remove human biases from the interaction, but they are unsuited for ensuring comparability between different conditions and between different speakers because the wizards


Designing Speech for a Recipient

react to what the participants produce in one way or another, leading to uniquely structured and thus incomparable dialogs. Another method is therefore used in the dialogs elicited for this study (Fischer, 2003, 2010b): a fixed schema of behaviors was employed, irrespective of what speakers produced. In this way, the behavior of the wizard as a machine is convincing, and interpersonal variation can be controlled since all speakers are confronted with exactly the same system output. This may seem strange, but for speakers the effect is only that they may get the impression that the system does not understand them very well. It is therefore important to note that for the speakers themselves, the communicative situation is interactional and that the perspective that participants are 'not really interacting' is an etic perspective from the outside, not the participants' own.2 In the methodology employed here (Fischer, 2003, 2010b), the comparability and control of the situation is thus achieved by keeping the robot behavior constant. That is, the robot's behavioral and verbal output is based on a fixed schema which is the same for all dialogs and across corpora. In this way, not only are all speakers confronted with exactly the same situation; the methodology also allows us to investigate speakers' concepts of their (artificial) communication partner. Because of the fixed schema that is causally unrelated to the speakers' actual behavior, the speakers' sense-making efforts cannot be attributed to particular features or misbehaviors of a particular robot, but they have to be understood as arising from the speakers' own understandings of the situation, including their understanding of their communication partner.
For example, if a speaker after a miscommunication (which may be frequent if the robot does not really react to the speakers’ utterances) uses a different descriptive term for the goal object, she displays that she holds the problem to be due to the robot’s limited lexicon. If she switches to another reference system, she displays to us that she believes the reference system to be possibly problematic in this situation. Thus, the design allows the identification of the explanatory models speakers build up to make sense out of the human-robot situation. Finally, the frequent impression of miscommunication encourages speakers to employ more than one linguistic strategy. If speakers are immediately successful, they have been found to stick to this strategy (Moratz et al., 2001). This would however yield very uninteresting data to us because the speakers’ reformulations are particularly revealing concerning the participants’ concepts of their communication partner and their understandings of the situation (see below).
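The logic of the fixed-schema design can be made concrete with a minimal sketch. The code below is purely illustrative (hypothetical behavior labels and class names, not the software used in the original experiments): the wizard's console steps through a predetermined list of behaviors, so every participant receives exactly the same output sequence, and the participant's utterance has no causal influence on the robot's next action.

```python
# Illustrative sketch of a fixed-schema Wizard-of-Oz controller
# (hypothetical code, not the experimental software used in the study).

FIXED_SCHEMA = [
    "turn_left",
    "error",
    "move_forward",
    "error",
    "turn_right",
]  # hypothetical behavior labels

class FixedScheduleWizard:
    def __init__(self, schema):
        self.schema = schema
        self.step = 0

    def respond(self, user_utterance):
        # The user's utterance is deliberately ignored: the next
        # behavior depends only on the position in the schema.
        behavior = self.schema[self.step % len(self.schema)]
        self.step += 1
        return behavior

wizard = FixedScheduleWizard(FIXED_SCHEMA)
print(wizard.respond("Fahre hinter den Kasten"))  # -> turn_left
print(wizard.respond("Fahre hinter die Kiste"))   # -> error
```

Because the output sequence is causally unrelated to the input, any adaptation observable in the dialogs must originate with the human speaker, which is precisely what makes the speakers' reformulations analyzable.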

2. Since in our experiments all speakers believed that they were talking to real automatic speech processing systems, as they stated in a post-experimental questionnaire, I shall refer to the simulated systems reported on in this study simply as robots.


3.1.2  Qualitative analysis

The qualitative method employed is based on perspectives developed in the framework of ethnomethodological conversation analysis (CA), which aims at describing "how common-sense reasoning and social action work" (Heritage, 1988, p. 127). While the problems dealt with in conversation analysis traditionally focus on the sequential organization of conversations, the analysis of practices in dealing with aspects of the context, such as the recipient (e.g. Schegloff, 1972a), categories like gender (e.g. Schegloff, 1997), or the impact of institutional settings (e.g. Drew and Heritage, 1992; Heritage and Sorjonen, 1994; Pomerantz et al., 1995), has continuously been subject to empirical analysis as well. This property makes the method very well suited for the current purposes. One of the main assumptions of CA is that a description of interaction should only rest upon those categories which can be shown to be attended to by the participants themselves, i.e. that they are not carried into the analyses as explanatory concepts by the analyst. This methodological precaution guarantees the identification of only those conditioning factors that are of relevance to the speakers themselves. Conversation analysis has therefore concentrated on the observable, on aspects that leave their traces in the sequential organization of talk (Sacks, 1984). What we can observe in interactions, then, are the traces of a number of methods (hence: ethnomethodology) speakers themselves make use of when disambiguating their utterances for each other and displaying their understanding of their communication partners' speech. "To recognize what is said means to recognize how a person is speaking" (Garfinkel, 1972, p. 320, emphasis original). Common understanding therefore "has necessarily an operational structure" (Garfinkel, 1972, p. 321). The displays of the procedures involved, however, may occur only for 'all practical purposes' (Nothdurft, 1984).
That is, if the interactants assume that they understand each other, for example because of shared background, they do not display such information. Thus, the less problematic a situation is, the fewer displays of understanding can be observed. For our purposes such displays are however essential because our aim is to identify the speakers’ own understandings of the situation and of the communication partner in order to determine those aspects of the speakers’ understandings of the situation that influence their special linguistic behaviors. The fact that human-robot interaction is an unusual communication situation for the participants thus constitutes a deviant case and therefore a useful methodological resource (Schegloff, 1972b) because it leads people into public sense-making activities. In ethnomethodological conversation analysis, deviant cases play an important role since the way participants deal with them underlines the normative nature of the regularity observed. That is, in their interactive treatment of a deviant case,


speakers overtly indicate their attention to the proposed regularity by displaying their normative expectations (Sacks et al., 1974; Hutchby and Wooffitt, 1998). The methodology is thus based on the idea that deviant cases are not only orderly in themselves, but, as Hutchby and Wooffitt argue, if "someone displays in their conduct that they are 'noticing' the absence of a certain type of turn from a coparticipant, then that demonstrates their own orientation to the relevance of the sequence that the analyst is aiming to describe" (Hutchby and Wooffitt, 1998, p. 98). Thus, by not analyzing basically smooth and unproblematic human-to-human conversation, but by focusing instead on potentially problematic human-robot interaction, even generally omnipresent, self-evident, and obvious aspects of their communication partners to which speakers attend may become evident. Thus, in addition to investigating how meanings are intersubjectively established (cf. Sacks, 1984), as usual CA studies would, in human-robot interaction the speakers' own interactional work also provides useful information on recipient design; that is, while in conversation participants display to each other how they understand the current state of talk, which is thus available to the analyst, the current method relies also on speakers' self-reformulations, with a focus on the human participant and his or her cognitive processes as displayed in the interaction. That is, speakers' understandings of their communication partner and the situation are also available from traces in the formulation process and in particular in their reformulations. It is therefore also the speakers' own interactional work that we are interested in, and thus the study of reformulation in human-robot interaction is particularly useful to disentangle the role of the speakers' conceptualization, including membership categorization, from the interactional work.
The procedure taken here is therefore to analyze interactions in which ‘all practical purposes’ make it necessary that speakers display their understanding of their communication partner more overtly than is necessary in smooth natural conversation among equal human interlocutors. Some of the corpora described below were therefore so designed that the artificial system mostly miscommunicated. Consequently, speakers were forced to make several attempts at solving what they understood as communicative problems, which reveal their hypotheses about what the communicative failure could be caused by. The sequential positions of the utterances produced as solutions to the same problem reveal the speakers’ hypotheses about what causes the communicative problems because they constitute reformulations of the previous utterances. The reformulations thus provide not only interpretative cues as displays of the speakers’ own interpretations of their previous utterances, but also indicate the speakers’ concepts of the determining factors of the situation involved. These may concern, for instance, the speakers’ concepts about language, i.e. which linguistic structures are easier and which are more difficult to understand; the robot as a communication partner, i.e. what


strengths and weaknesses the robot is likely to have; and the subject matter, for instance, how a given task may be decomposed in order to become easier for the robot (Fischer, 2006b). An important aspect of the method employed here is thus that the simulated system utterances in human-robot interaction are not meant to simulate current systems. State of the art speech technology may be both more and less advanced than what the simulated systems display in this study. Since the aim of this investigation is rather to identify how speakers conceptualize their artificial communication partner, system behavior needs to be treated as a variable that can be investigated and that therefore has to be experimentally controlled. An example from the data elicited in human-robot communication can show how the speakers' linguistic behavior can reveal their hypotheses about the reasons for the situation of miscommunication between speaker and robot. The following example from the corpus described in Moratz et al. (2001) shows that the participant pays attention to the words she chooses in the interaction with the robot:3

(1) Command: Fahre hinter den Kasten [drive behind the case]
    Robot: error
    Command: Fahre hinter die Kiste [drive behind the box]
    Robot: error
    Command: Fahre hinter den Karton [drive behind the cardboard box]
    Robot: error
    Command: Fahre hinter das Hindernis [drive behind the obstacle]
    Robot: error
    Command: Fahre hinter das braune [drive behind the brown one]
    Robot: error
    Command: Fahre hinter den braunen Klotz [drive behind the brown cube]
    Robot: error
    Command: Fahre hinter das braune Hindernis [drive behind the brown obstacle]

What can be seen here is that in a situation of miscommunication, the speaker proposes a number of solutions to achieve her goal (to make the robot move to one of the objects in the room). Now, neither her linguistic choices nor the order in which they occur are arbitrary – conversation analysis holds that no details of production should be discarded prior to analysis (Sacks, 1984) and that how something is said contributes to what is being said (Garfinkel, 1972, p. 320). Thus, we can assume

3. Note that the presentation of the data in this book does not follow CA standards (e.g. Jefferson, 1983, 2004) because the data were elicited in an interdisciplinary collaboration to serve different purposes. In this particular example, the data were input by typing, and the robot’s error messages were printed on the screen in front of the participants.


a certain orderliness of the speaker's utterances, and the fact that her utterances all constitute reformulations of each other reveals the speaker's hypotheses about what could improve the understanding. The utterances receive the interpretation as indicators of such underlying concepts on the basis of their sequential placement: as solutions to the same problem. Since the robot's behavior is produced on the basis of a fixed schema, what counts as a problem as well as a possible solution is proposed by the speakers themselves. In Example (1), the speaker's attention to finding the appropriate word can be identified by the traces the hypothesis leaves in the linguistic choices she makes within the sequence: systematically trying out different nouns (case, box, cardboard box, cube, obstacle) and color specifications. We can therefore assume that this speaker believes the choice of vocabulary to be a potentially problematic aspect of human-robot interaction and that she can solve the communication problems at hand by finding those words the robot can understand. At the same time she displays her concept of the robot as a communication partner who does not have the same range of vocabulary available and for whom words have to be chosen with particular attention.4 Returning to the corpus to see whether there are similar instances from other speakers shows that attention to vocabulary is a more widespread phenomenon; the following example stems from another participant:

(2) Command: fahre schräg nach rechts oben [drive diagonally up to the right]
    Robot: error
    Command: bewege dich nach rechts oben [move up to the right]

Reformulations therefore provide us with an inherent, emic methodology for identifying the speakers' own understandings of the affordances of the situation and the choices at their disposal. Reformulations not only display what speakers themselves consider to be the choices available at a given moment, but also allow us to infer the communicative tasks speakers are attending to. The fact that this type of solution to a communication problem is attended to in several other situations of communication failure with the same speaker as well as with several other speakers shows that the choice of words constitutes a systematic aspect attended to in the interaction between humans and robots (cf. Fischer, 2006b). Another aspect of CA methodology made use of in this study builds on what Schegloff (1972a) has called membership analysis. Membership analyses reveal the kinds of contents that participants understand as prerequisites for providing an

4. Note that the communication problem could just as well have resulted from the robot’s inability to process the spatial preposition, imperative constructions or goal-directed instructions, or from its inability to navigate to the goal, among other possible problem sources.


answer to a question. In Schegloff's study speakers carry out location, activity and membership analyses prior to formulating a spatial instruction; for instance, speakers elicit information about the recipient, because the communication partner's knowledge may render some formulations more relevant than others. In particular, as a 'member' of a particular class of people, such as the class of locals, or of the class of those who have visited the place before, the addressee may be expected to know some landmarks that the speaker may use for the spatial description. In his or her spatial description, the speaker has to ascertain the 'recognizability' of place names, i.e. the fact that "the hearer can perform operations on the name – categorize it, find as a member of which class it is being used, bring knowledge to bear on it, detect which of its attributes are relevant in context, etc." (Schegloff, 1972a, p. 91). If the recipient is, for instance, a member of the university, the spatial formulation can build on concepts that can be assumed to be known by a university member, such as the location of the auditorium or the cafeteria or other places that may serve as landmarks. Thus, before providing an answer to a question about how to get to a particular place, speakers may ask whether the recipient has visited the place before, for instance, 'have you been to Hamburg before?', or whether she is a member of a group of people who can be expected to know the place well, for example, 'you are from Hamburg, aren't you?' or 'you've studied in Hamburg, haven't you?'. In other words, speakers inquire into the addressee's membership in order to formulate their utterances on the basis of knowledge expectable of members of particular groups.
The findings by Schegloff (1972a) show that speakers actively engage in identifying which kinds of information are suitable for their addressees, for instance, by means of clarification questions that may occur as insertions in response to questions, which are available to the analyst. The concept of membership analysis provides us with a sound methodological basis for determining the categories of common ground attended to also in human-robot interaction. For instance, in the example below, the speaker wonders about the vocabulary of the robot:

(3) to m+, and to the (at=prominent)half-left(/a) also forty-five does it know degrees, – forty-five degrees to the left, (1) is the refrigerator, (2)

Thus, before producing a spatial instruction, this speaker tries to ascertain from the experimenter whether the robot knows the vocabulary occurring in the instruction.5 Unlike in conversation, the membership analyses speakers carry out in human-robot interaction are not only visible in clarifying questions directed at the communication partner or the experimenter, but may also become apparent from side comments; users often produce self-talk or comments to the experimenter in which they give accounts of their doings and in which they reveal their interpretations and explanations about what is going on. This behavior can be explained on the basis of Goffman's analysis of response cries (Goffman, 1978); he suggests that response cries may provide indicators of the norms of general behavior a person attends to by marking violations of such norms. Here, speakers give accounts to their overhearers, like the experimenter, of their partly strange linguistic behavior triggered by the system's malfunctions. Besides producing self-talk or clarification questions directed at the experimenter, the participants filled out questionnaires after the interaction with the respective robot. These data may provide additional information regarding the kinds of information to which speakers attend and which they themselves use as explanations for the communication problems that arise (cf. Fischer, 2006b). Regarding Example (1) above, the speaker noted down in the questionnaire after the recording that she believed that she did not always manage to find words the robot could understand. Thus, the speaker reveals that she attends to the choice of words as a conversational resource that is relevant in the interaction with this particular communication partner (and that she considers finding an appropriate shared lexicon to be her responsibility). Consequently, the data from the questionnaire support the previous analyses of her participant categories by means of evidence from those further data sources.

To sum up, even though the methods from conversation analysis employed here are pushed quite far, violating general practice in CA, the methods are very well suited to address the issues at hand. While the interactions investigated are not naturally occurring, the aim in this study is not the identification of normative structures of talk-in-interaction but the identification of attended-to categories (e.g. Bilmes, 2009).

5. The experimenter's reaction to queries like this was invariably "try it out" or a non-verbal variant of it, such as a puzzled look or shrugging of the shoulders.
At the same time, the study makes use of CA resources, such as self-reformulations, and principles, such as membership categorization, to identify those categories participants can be shown to attend to themselves. Furthermore, although there are currently no naturally occurring human-robot interactions, such interactions may soon be part of our everyday lives. Thus, if CA wanted to restrict itself to natural conversations, novel forms and circumstances of interaction would be beyond its scope, even though human participants effortlessly adapt to these new communication situations. To conclude, although experimentally elicited human-robot interaction data are unusual data from a CA perspective and a challenge to some of the principles that usually guide CA investigations, such as the interactional achievement of intersubjectivity and the negotiated nature of membership analysis, in order to address the particular questions raised in this study, the methods developed here on the basis of CA principles are particularly well suited.


3.1.3  Quantitative analysis

The qualitative methods presented above are very well suited to tease out participants' own understandings of the unusual human-robot interaction situation that may guide their linguistic choices. The qualitative linguistic analyses can show us the types of linguistic structures that speakers employ as well as the kinds of information to which the participants attend when making their choices regarding particular linguistic features. However, the qualitative linguistic analysis does not provide the whole picture since, as we know from register studies (e.g. Finegan and Biber, 2001; Biber and Conrad, 2009), characteristics of particular varieties may express themselves in probabilistic distributions of linguistic features. The qualitative analysis therefore needs to be complemented with a quantitative comparison of corpora that differ only with respect to some aspect of the addressee, e.g. appearance, function or verbal behavior. Using comparable corpora, we can furthermore get an impression of the effect of the conceptualization of the recipient on speakers' linguistic choices in order to identify how participants understand the affordances of the situation and the broad spectrum of possibilities they choose from. While the qualitative analysis of speakers' reformulations and clarification questions provides a window into their sometimes radically different understandings of the task and of their communication partners' capabilities and needs in the different corpora, the effect these different conceptualizations have on linguistic choice needs to be considered quantitatively. Thus, we are also interested in different types of users, in general behavioral tendencies and in measuring the effects of particular situational variables.
Accordingly, there will be two different kinds of quantitative corpus studies: On the one hand, comparable corpora will be investigated for the effects of single external variables; on the other hand, we shall investigate the quantitative effect of particular findings of the qualitative studies, especially the effect of different concepts of the communication partner on the probability of the occurrence of certain linguistic features. A possible problem with the quantitative analysis of the comparable corpora may be that especially scripted human-robot interactions are not really comparable because the robot's behaviors, even though identical across participants, have different effects due to their different sequential placement. For instance, a robot utterance like 'I did not understand' may lead to different conclusions if it occurs after an utterance like 'I would like you to go to the second object in the middle of the four red ones' than when it occurs after an instruction like 'go straight'. In such cases, different people may experience the robot differently overall, depending on what they started out with; on the other hand, since the robot's utterances are taken to occur in response to speakers' prior utterances, at least there is a strong connection between what speakers start out with and how they interpret the robot's responses. My findings, that speakers' behavior at the beginning of interactions predicts their behavior much later in the interaction to a high degree (Fischer,


2011), suggest that the randomness of the robot's responses does not influence the interaction as a whole very much. Nevertheless, the quantitative analyses will have to be accompanied by qualitative analyses of sequences in which participants make sense of the respective robot's utterances in the light of their previous utterances. Preceding the quantitative analyses of the corpora, the normalized rate of occurrence of each feature investigated was computed for each dialog; the features under consideration in this study span a broad range of phenomena, so that different inflectional forms may, for instance, be considered separately in the linguistic analyses (Newman, 2010). The normalized rate of occurrence is the absolute number of occurrences per dialog divided by the total number of words or turns produced in this dialog, depending on the property under consideration. The normalization of the data serves to indicate the average frequency of an item within a particular speaker's speech data, and thus to allow a comparison of data across different speakers irrespective of the length of each speaker's stretch of speech. This process treats each participant's speech as a single corpus – which might disguise the fact that participants may be carrying out different activities. However, the experiments have generally been designed so as to make only one type of activity relevant (e.g. spatial instruction), and in any case the tasks are the same for all participants. The normalized data provide the basis for most of the quantitative investigations, with one exception: In the analyses of alignment, absolute occurrences are also taken into account, since it is as interesting to see how often speakers align with their partners at all as to see what proportion of their overall speech the aligned speech amounts to.
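The normalization step just described is simple arithmetic and can be sketched as follows (a minimal illustration with hypothetical data and function names; the original analyses were carried out with shell scripts):

```python
# Normalized rate of occurrence: the absolute count of a feature in
# one dialog, divided by the total number of words (or turns) in
# that dialog. Hypothetical helper, not the original scripts.

def normalized_rate(feature_count, unit_total):
    """Rate of a feature per word (or per turn) in a single dialog."""
    if unit_total == 0:
        return 0.0
    return feature_count / unit_total

# A dialog of 250 words containing 5 occurrences of some feature:
rate_a = normalized_rate(5, 250)    # 0.02 per word
# A much longer dialog (1000 words) with 20 occurrences yields the
# same rate, so the two speakers become directly comparable:
rate_b = normalized_rate(20, 1000)  # 0.02 per word
print(rate_a == rate_b)  # True
```

The point of the division is exactly the comparability illustrated here: speakers who produced very different amounts of speech can still be compared on a per-word (or per-turn) basis.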
The quantitative analyses are carried out semi-automatically by means of simple shell scripts which, for instance, count the word frequencies in each dialog and compute the distributions for each group of speakers. Statistical analyses are then used to compare different groups of speakers and to identify co-occurrence relationships between different variables in the data. For the analyses of variance (ANOVAs) and for the Pearson product-moment correlation analyses, the Statistica software package was employed.
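The correlation analyses were run in Statistica, but the underlying computation is the standard Pearson product-moment coefficient, which can be sketched in a few lines (illustrative code with hypothetical numbers, not part of the original analysis pipeline):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term (unnormalized) and the two standard-deviation terms:
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# E.g. correlating speakers' normalized rates of some feature early
# in the interaction with their rates later on (hypothetical data):
early = [0.02, 0.05, 0.01, 0.04]
late  = [0.03, 0.06, 0.01, 0.05]
print(round(pearson_r(early, late), 3))
```

A coefficient close to 1 for such data would correspond to the finding reported above that behavior at the beginning of an interaction strongly predicts behavior later on.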

3.2  Human-robot interaction data

The corpora of human-robot interaction dialogs for this investigation were elicited in the context of the Collaborative Research Area SFB/TR8 'Spatial Cognition' at the University of Bremen. All scenarios in which the current data were elicited are joint attention scenarios, in which human and robot share the same context of situation. The three sets of corpora are comparable regarding one variable each: one set of corpora differs with respect to the robot's appearance, and two sets of corpora differ


with respect to whether or not the robot produces verbal output; in one of them, participants talk to the dog-like pet robot Aibo (see Figure 3.1)6, in the other, speakers instruct an autonomous wheelchair (see below). These corpora differ concerning task, scenario and also regarding the role of the linguistic utterances the robot produces. The participants in these studies were exchange students and staff at the University of Bremen who were native speakers of English. Participants were recruited on a word-of-mouth basis and rewarded with bars of Swiss chocolate.

Interactions with Aibo, Scorpion and Pioneer: Robot appearance

The scenarios in which this set of corpora was elicited differ only with respect to the robot addressed. The comparability and control of the situation is achieved by keeping the task, all situational factors and even the robot's behavior constant. The robot's behavioral output is based on a fixed schema which is the same for all dialogs and across corpora. The first robot used was Sony's Aibo, a commercially available robot which looks like a little dog (see Figure 3.1). The second robot, Scorpion, was built by Frank Kirchner and colleagues at the University of Bremen (e.g. Spenneberg and Kirchner, 2002) and looks like a metal spider (see Figure 3.2). The third robot used is the commercially available experimental robotic platform, the Pioneer (see Figure 3.3).7 The task participants had to carry out was to instruct one of the three robots verbally to move to particular goal objects, pointed at by the experimenter. Pointing was used in order to avoid prompting the participant with particular spatial descriptions. Most tasks involved a single goal object, yet one task involved a sequence of several objects to which the robot had to move. After the recording, participants filled out a questionnaire in which they were asked whether they had believed that they were talking to a real robot, which all of them acknowledged. The data elicitation conditions can thus be summarized as follows:

Participants: 30 native English speakers
– 13 dialogs with Aibo, a dog-like pet robot
– 8 dialogs with Scorpion, a huge metal insect (e.g. Spenneberg and Kirchner, 2007)
– 9 dialogs with Pioneer, a car- or box-like robot
Task: to instruct the robot verbally to move to particular goal objects;
Robot behavior: according to a fixed schema of behaviors, independent of the speakers' utterances ('wizard-of-Oz' scenario).

The three data sets are thus comparable in all respects besides the appearance of the robot.

6. Picture courtesy of Marissa Cohen.
7. The picture of the Pioneer was made available to me by the PRT group at the Fernuni Hagen; the picture of the Scorpion is courtesy of Frank Kirchner's working group; the copyright remains with the University of Bremen.


Copyright © 2016. John Benjamins Publishing Company. All rights reserved.

82 Designing Speech for a Recipient

Figure 3.1  Aibo

Figure 3.2  Scorpion

Chapter 3.  Methods and data


Figure 3.3  Pioneer

Human-Aibo interaction: Non-verbal vs. verbal robot output

The second set of comparable human-robot dialogs used here also differs with respect to a single variable only, in this case the robot's linguistic output. The robot used here is again Sony's Aibo. In the first part of the experiment, we recorded 13 English speakers instructing Aibo to move to particular goal objects, pointed at by the experimenter. The scenario was the same as in the previous experiments, i.e. participants had to instruct Aibo to move to a single goal object or to a sequence of several objects to which the robot had to go. Again, the robot was steered by student employees behind a screen according to a fixed schema of robot behaviors. After the recording, speakers were asked whether they had believed that they were talking to a real robot, which all of them acknowledged. Condition 1 can thus be summarized as follows:

Participants: 13 native English speakers;
Task: to instruct the robot verbally to move to particular goal objects;
Robot behavior: according to a fixed schema of behaviors, independent of the speakers' utterances ('wizard-of-Oz' scenario).

The second condition of this study (13 dialogs with native speakers of English who had not participated in the first study) was carried out in the same way and with the same robot behaviors as the first condition, yet the robot's non-verbal behaviors were augmented by verbal robot output. Again, the robot behavior was manipulated by a human 'wizard' (see Fraser and Gilbert, 1991). The robot utterances were presynthesized and played in a fixed order. They were designed to give no clue as to what may have gone wrong, in order to avoid prompting particular error resolution strategies from the users. However, in these utterances, four design


features were used which previous studies (Moratz et al., 2001; Fischer, 2003, 2006b) had shown to be quite rare in human-robot interaction if the robot does not give feedback: First, we made the robot greet the participant. Second, we made the robot ask for and propose spatial references using object naming strategies. Third, we made the robot use an extrinsic reference system. Fourth, as an indicator of high linguistic capabilities, the robot made extensive use of relative clauses. Combining these design criteria, the robot's utterances are the following:

yes hello, how do you do? what can I do for you?
which object did you mean?
do you want me to go to the object that is in front?
do you mean the object that is 30 degrees west of the box?
I did not understand. do you want me to go to the glass?
excuse me, which of the objects did you mean?
should I head towards the blue object?
do you mean the object that is south south east of the box?
a turn of 360 degrees does not make sense.
I can't go faster.


Condition 2 can be summed up as follows:

Participants: 13 native English speakers (13 dialogs);
Task: exactly as in Condition 1;
Robot behavior: exactly as in Condition 1, except that the fixed schema of robot behaviors is paired with a fixed schema of robot utterances, independent of the participants' utterances.

Human-Rolland interaction: Non-verbal vs. verbal robot output

The third set of comparable human-robot dialogs was elicited in a different scenario, using the Bremen autonomous wheelchair Rolland (see Figure 3.4). Participants in both conditions of the experiment had to fulfill four tasks: The first task was to steer Rolland through a room furnished as a flat for a handicapped person and to teach it the names of locations the user might want the wheelchair to take him or her to afterwards, for instance, places such as the kitchen, the TV set or the dining table. Participants were told that the robot would store the labels in its internal map of the room. The second task consisted of describing the relationships between the locations in the room for the robot from a fixed position. The third task was again a teaching task, in which the participant had to drive through the building and teach the robot the names of particular locations, which in this case were pre-specified. The


fourth task then was to instruct the autonomous wheelchair in advance to take the participant to a location it had previously been familiarized with. Since the robot could not in fact move autonomously at the time of the experiment, participants had to steer it themselves, which was consistent with the training tasks they believed themselves to be carrying out. When, after the instruction in the fourth task, the robot was meant to take the user to the respective location, in Condition 1 the experimenter, and in Condition 2 the robot, would claim that the batteries were too low for Rolland to actually take the participant there. The experimenter would then end the experiment.

Figure 3.4  Rolland

Conditions 1 and 2 differ concerning the robot’s linguistic output. Whereas there was no linguistic output in Condition 1, in Condition 2 the robot greeted the participants and employed feedback signals and clarification requests during the learning tasks, and an incremental description of the room in the second task. A design criterion for the robot’s linguistic output in the dialogs with Rolland, in contrast to the utterances designed for Aibo, was to have the robot align with the user wherever possible (to make the learning scenario credible) and to produce useful feedback and accounts for misunderstandings. The wizards could therefore


choose from a set of presynthesized robot utterances which constitute alternatives in a script specified for each location, depending on the lexical items previously used by the participant. For instance, the wizard had the choice between the two clarification questions “do you prefer couch over living room area?” and “do you prefer sofa over living room area?” and would choose the pre-synthesized utterance using sofa if the participant had used sofa before, simulating learning. Having a set of alternative utterances available allowed the wizards to react semi-flexibly to the speakers’ instructions. Otherwise, the wizard followed the script of robot utterances per location as in the other corpora. The likely linguistic choices had previously been determined from the range of choices speakers made in Condition 1. The design criteria for the linguistic utterances will be discussed in more detail in Section 6.5.
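The wizards' semi-flexible selection procedure described above — choosing, for each location, the scripted alternative that echoes the participant's own earlier lexical choice — can be sketched as follows. The function and data structures are illustrative assumptions for exposition, not the actual wizard interface used in the experiments:

```python
# Hypothetical sketch of the wizards' semi-flexible utterance selection:
# each location's script offers alternative pre-synthesized clarification
# questions, and the variant reusing a word the participant has already
# produced is preferred, simulating lexical alignment ("learning").

def choose_utterance(alternatives, participant_words):
    """Pick the scripted alternative whose trigger word the participant
    has already used; fall back to the first alternative otherwise."""
    for utterance, trigger_word in alternatives:
        if trigger_word in participant_words:
            return utterance
    return alternatives[0][0]

# Scripted alternatives for the living-room location (from the example above).
alternatives = [
    ("do you prefer couch over living room area?", "couch"),
    ("do you prefer sofa over living room area?", "sofa"),
]

# The participant previously said "sofa", so the wizard aligns with it.
print(choose_utterance(alternatives, {"drive", "to", "the", "sofa"}))
```

A small fixed set of such alternatives per location keeps the dialogs comparable across participants while still allowing the robot's clarification questions to mirror each speaker's vocabulary.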


3.3  Transcription conventions

The human-robot interaction data were transcribed according to transcription conventions developed in the framework of the SFB/TR8 in Bremen (see Fischer, 2004) to suit a number of purposes, including automatic search and the provision of data for the training of an automatic speech processing system. Consequently, the transcription is based on the orthography of the words uttered; the transcriptions therefore look different from transcriptions created in the framework of conversation analysis (e.g. Jefferson, 1983, 2004). Nevertheless, the transcription conventions aim to preserve as many speech characteristics as possible. In the human-robot corpora, the turn IDs consist of a letter indicating the robot addressed, i.e. A for Aibo, S for Scorpion, P for Pioneer and R for Rolland, and the speaker number, for instance 012.

(breathing) marks a breathing event
(laughing) marks a laughing event
- marks a brief pause (300–600 msecs)
– marks a longer pause (600–1000 msecs)
(3) marks a pause of one or more seconds, here 3 seconds
+ marks a word fragment
(…) marks a non-understandable sequence
(at=lengthening)…(/a) marks syllable lengthening
(at=prominent)…(/a) marks a prominent syllable
(at=quiet)…(/a) marks quiet speech
(at=loud)…(/a) marks loud speech
(par)…(/par) marks parallel speech
. marks a falling intonation contour
, marks a pending, slightly falling intonation contour
? marks a rising intonation contour
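Because the transcriptions are orthography-based, the plain word sequence (e.g. for training a speech processing system) can be recovered by stripping the annotation markers. The following sketch illustrates this with regular expressions derived from the convention list above; the patterns are my own assumptions about the markup, not part of the SFB/TR8 tools:

```python
import re

# Illustrative sketch: recover the plain orthographic word sequence from a
# transcript line using the conventions above (event labels, prosodic
# (at=...)...(/a) spans, pause marks, fragment marks, intonation symbols).

def strip_markup(line):
    line = re.sub(r"\((?:breathing|laughing)\)", "", line)  # non-speech events
    line = re.sub(r"\(at=[^)]+\)|\(/a\)", "", line)         # prosodic spans
    line = re.sub(r"\((?:par|/par)\)", "", line)            # parallel speech
    line = re.sub(r"\(\d+\)|\(\.\.\.\)|\(…\)", "", line)    # pauses, unclear speech
    line = re.sub(r"[-–+.,?]", " ", line)                   # pause/fragment/contour marks
    return " ".join(line.split())

print(strip_markup("(breathing) go (at=prominent)left(/a) - to the (3) box."))
```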

chapter 4


Talking to children

The best-known and most obvious candidate for a simplified register is certainly the way in which adults talk to children, which has been suggested to constitute the most prototypical simplified register (cf. De Paulo and Coleman, 1986). Speech to children has, especially in earlier studies, also been called motherese, parentese or baby talk. Researchers nowadays mostly speak of infant- or child-directed speech, depending on the age of the child.1 According to Snow (1994), research into the properties of speech directed towards infants and children arose in the context of the anthropological description of ways of speaking, and thus the interest in infant- and child-directed speech was initially mainly descriptive (Snow, 1994, p. 4). With a shift towards the study of the relationship between adult utterances and child utterance, gaze and gesture, the study of the variety moved from the grammatical description of the suspected register to discourse and conversation analysis, with increased attention to the pragmatic and social functions of adult utterances (Snow, 1994, p. 6). In current studies, infant- and child-directed speech is investigated mostly with regard to its possibly facilitative role in language acquisition. Especially in constructivist approaches to language acquisition, which reject the hypothesis of innate, language-specific acquisition mechanisms (Tomasello, 2003), the particular input children receive plays an important role (e.g. Dabrowska, 2004; Ambridge and Lieven, 2011). Thus, in addition to studies describing the linguistic properties of speech to children, there is now also a growing body of studies on possibly facilitative effects of caregivers' adjustments in their speech directed towards infants and children, including distributional regularities and probabilistic information.

1. The term infant is often used in the literature to refer to different age groups; in the current study, I use the term infant if the children under consideration are clearly younger than twelve months; if it is unclear how old the children are or if the group under consideration comprises children older than twelve months, I refer to them as children.


4.1  Linguistic description of the variety

Most, but not all, cultures have been reported to make use of special ways of talking to children (cf. Lieven, 1994), and there are considerable similarities across languages (e.g. Fernald, 1992). While the following discussion focuses on English, results from the study of other languages will also be discussed where they are useful for understanding the kinds and functions of the adjustments caregivers make in infant- and child-directed speech.


4.1.1  Phonetic, phonological and prosodic properties

Some of the most prominent peculiarities of infant- and child-directed speech (henceforth IDS and CDS, respectively) concern the phonetic, phonological and prosodic realization of utterances. Prosodic characteristics of IDS and CDS concern all aspects of the articulation of suprasegmental units, comprising aspects of timing, stress, intonation, and pitch height and variability. Most studies of IDS and CDS consistently report prosodic peculiarities regarding pitch height and variability (cf. Pine, 1994, for an overview). For instance, De Paulo and Bonvillian (1978) review 31 studies of CDS and find considerable convergence between the studies concerning high-pitched, clear and slow speech towards children (De Paulo and Bonvillian, 1978, p. 191). Furthermore, the studies they review converge on a simplified phonology, comprising reduplications, lengthened vowels and distinctive consonant-vowel clusters. The most obvious properties of IDS are probably a high average F0 and exaggerated pitch contours, which have been reported even in studies relying more on self-reports or observation than on sophisticated linguistic analysis; for instance, Ferguson (2004) reports overall higher pitch and exaggerated intonation contours. Correspondingly, IDS is often referred to as “singing” by non-linguists (e.g. Falk, 2004). The special intonation contours in speech to infants may help children to remain attentive and interested (Fernald, 1985), but also to identify important information in the speech stream; for instance, Fernald and Mazzie (1991) investigate in which ways new words are stressed or receive F0 peaks in CDS in contrast to speech addressed to other adults. They recorded 18 mothers telling a story from a picture book to their 14-month-old children and to the experimenter. They find that, in contrast to adult-directed speech, in which F0 peaks, word stress (intensity) and new information only partly overlap, in CDS mothers take great care to mark all new words prosodically, even on second mention. Since the book-reading situation is not really a teaching situation in the adult condition, the authors


subsequently investigate adults teaching other adults how to use a foreign kitchen machine. The results from this study support the previous findings that, in contrast to CDS, new words are not consistently marked prosodically in adult-directed speech. The authors conclude that the modifications in CDS cannot be due only to unspecific adjustments to communication with incapable communication partners, as, for instance, Newport et al. (1977) suggest. Instead, mothers use these cues specifically to teach their children new vocabulary; thus they aim to facilitate language acquisition by producing consistent, interpretable clues for the child. That exaggerated pitch contours may fulfill functions with respect to language acquisition is supported by a study by Trainor and Desjardins (2002), who argue that the high falls observable in IDS allow children to discriminate sounds more easily. In particular, in a study involving 24 six- to seven-month-old infants, they investigate whether high versus low pitch and high-pitched versus low-pitched falling contours have an effect on infants' ability to discriminate between two vowels. Results show that children discriminate between the two vowels better when the stimulus is presented with a falling contour, indicating that the exaggerated pitch range in IDS may indeed have a facilitative function. In contrast, no effect was found for high pitch; on the contrary, low-pitched stimuli were discriminated better than high-pitched ones. The authors thus conclude that the high pitch must have another, perhaps an emotional or attentional, function. In addition to high and exaggerated intonation contours, Ferguson (2004) suggests slow speech, careful enunciation, fewer disfluencies and longer pauses between utterances as characteristics of IDS and CDS (Ferguson, 2004, p. 178).
Similarly, Albin and Echols (1996) find both stressed words and final syllables to be significantly longer in speech directed to six- to nine-month-olds than in adult-directed speech in labeling tasks. Moreover, they find pitch peaks to be higher and more variable for words ending in stressed syllables in IDS. Furthermore, they report that final syllables are produced with greater amplitude in IDS. The authors suggest that the observed features contribute to the perceptual salience of final syllables in IDS, and thus to speech segmentation on the one hand and to the highlighting of words that receive stress in adult-directed speech on the other. Redford et al. (2004) demonstrate that IDS provides systematic vowel duration patterns for disyllabic words; in particular, they find that mothers consistently realize disyllabic words in a way that allows infants to segment the speech stream reliably into words. These cues are stronger in utterance-medial position, thus providing facilitative information where it is needed the most; words in utterance-final position seem to be more easily identifiable for infants anyway (see Dominey and Dodane, 2004). Soderstrom et al. (2003) demonstrate that infants as young as six months use such prosodic cues for the identification of phrase boundaries, i.e. for segmenting


speech input into units. In their study, the prosodic cues available to the children were final-syllable lengthening and pitch drops in naturally read sentences. Bernstein-Ratner (1987) also argues that parents produce segmentation cues, for instance, by means of differential stress, prolongation of segmental durations, release of final consonants, glottal stops and the insertion of silence as phonological cues (Bernstein-Ratner, 1987, p. 169). Many other studies produce similar results with respect to pauses at phrase and clause boundaries (see Ambridge and Lieven, 2011, p. 43). The information children receive on linguistic boundaries may in turn support language acquisition by providing cues to permissible sound combinations in the respective language, and thus to phonotactic information. Morgan et al. (1987) show in a language learning experiment with adults that prosodic cues to phrase boundaries facilitate language learning considerably. In a computational simulation, Brent and Cartwright (1996) furthermore investigate to what extent children may infer phonotactic constraints for segmentation. Attention to phonotactic information seems to develop between six and nine months of age: several researchers report that while six-month-olds do not show any preferences, nine-month-olds listen longer to words that are phonotactically permissible in their language than to those that are not (Jusczyk et al., 1993; Brent and Cartwright, 1996, p. 67). Brent and Cartwright (1996) investigate the role of three such constraints: that every word must contain a vowel, that there are restrictions on word-initial and word-final consonant clusters, and that only a finite set of consonant sequences can appear inside a word between two vowels (Brent and Cartwright, 1996, p. 96). They suggest that children learn legal combinations of sounds by attending to utterance boundaries (Brent and Cartwright, 1996, p. 105).
Brent and Cartwright (1996) conclude that children make use of distributional regularities concerning the variability of a string with respect to its context, leading to phonotactic knowledge about permissible sound combinations of the language they are learning. To sum up, IDS and CDS are characterized by certain prosodic features, such as a high average F0, exaggerated intonation contours, regular stress patterns, elongated syllables and high form–function correlations. Most of these features can be related to some facilitatory function with respect to language acquisition, and Weppelman et al. (2003) report some of these linguistic characteristics even in the speech of older siblings; in particular, girls aged 4;3–4;11 were found to change syllable duration and amplitude, and to a certain degree also F0 variability, when talking to a female infant (4–12 months old). However, boys showed the reverse effect for F0 variability, and neither boys nor girls exhibited a higher fundamental frequency. Thus children do make changes for their infant addressees, but these changes are not necessarily the same as those adults make.


In addition to prosodic adjustments, many studies report phonetic hyperarticulation in infant- and child-directed speech, for instance, Weppelman et al. (2003) and Uther et al. (2007). In interaction with prosodic adjustments like the exaggerated pitch range and elongated syllables, adjustments may concern speech rate, pausing, vowel space, vowel duration and the articulation of fricatives (e.g. Biersack et al., 2005), as well as the exaggeration of tone differences in tone languages (cf. Liu et al., 2009). Uther et al. (2007), for instance, investigated ten British mothers in conversation with their infants, whose ages ranged from around four months to one year, for vowel duration, pitch and breadth of the vowel space, which serves as an indicator of the amount of hyperarticulation. Furthermore, they filtered a subset of the mothers' utterances to their children by means of a low-pass filter, rendering the speech unintelligible but allowing for an evaluation of the affective value of the prosodic features employed. They also elicited speech directed at a foreigner and at another adult for comparison. The authors report significantly higher mean pitch, significantly longer vowel duration and significantly greater vowel space for IDS compared to speech directed at adults. Furthermore, speech to infants shows significantly more positive affect and less negative affect than speech to native or non-native speaking adults, as judged by 24 listeners.2 Thus, Uther et al. (2007) find IDS to be characterized by higher pitch, longer, hyperarticulated vowels and high positive affect. Liu et al. (2003) review previous studies on hyperarticulation, operationalized as increased vowel space and consonant expansion in CDS, and conclude that mothers do indeed increase the clarity of their utterances for their children. In their own study, they first determine whether this is also the case for the 32 Mandarin-speaking mothers who participated.
These mothers were audiotaped in interaction with their infants and when speaking to another adult. The results show that Mandarin mothers modify their speech clarity when speaking to their infants by increasing the acoustic distance between vowels. In fact, mothers increased the vowel space more for the older, 10- to 12-month-old infants than for the younger, 6- to 8-month-olds. Further analyses of infants' perceptive capabilities show that infants of mothers with clearer speech performed better in an independent speech perception test. In contrast, there was no relationship with external variables like education or socio-economic status that could account for the differences in infants' speech perception abilities.

2. The difference in the amount of negative affect may, however, disappear if other methods of low-pass filtering are used; see Knoll et al. (2009).

Soderstrom (2007) reviews previous work on phonetic and prosodic characteristics of IDS and CDS in different languages. She finds that for different languages, different kinds of adjustments are reported, and that in each case the adjustments


reported are the most relevant to the specific language under consideration. She therefore argues that the adjustments are not just random but tuned to the specific needs of the language-learning child (Soderstrom, 2007, p. 507). Regarding hyperarticulation, however, the results reported in the literature are rather inconclusive; for instance, Bard and Anderson (1983) report that speech directed to children is in fact less intelligible than adult-directed speech, and Papousek et al. (1987) also observe some of the parents investigated to employ a slurred and hardly intelligible low-volume way of speaking (Papousek et al., 1987, pp. 508–509). Cruttenden (1994) proposes that the contradictory observations are due to differences in the ages of the children studied: While adults indeed hyperarticulate in early infant-directed speech, they use less well-articulated speech when the children are older. In the study by Bard and Anderson (1983), speech to children in the youngest age group (1;10–2;0) was in both experiments much more intelligible than speech to children aged 2;4–2;6 (Bard and Anderson, 1983, p. 274). However, there does not seem to be a simple temporal relationship concerning the amount of adjustment; Englund and Behne (2006), for instance, find overspecification effects for fricative duration in the speech of 24 Norwegian parents to their infants. At the same time, however, they report underspecification with respect to the vowel space, i.e. a narrower vowel space than in speech to other adults. Regarding vowel duration, they find changes over the period of recording. In contrast, like Uther et al. (2007), Liu et al. (2009) report considerable differences in vowel space, vowel duration, pitch height and lexical tone differences for Chinese mother-child dyads.
The children addressed in their studies were, however, slightly older than the children in the study by Englund and Behne (2006); data were recorded from mothers talking to infants ranging from seven months to one year, to children aged five, and to the experimenter. For twelve Mandarin Chinese target words, the researchers find consistent changes from IDS through CDS to adult-directed speech, with the exception of vowel space, which does not differ significantly between IDS and CDS. Thus, while the question of whether parents hyperarticulate when talking to their children has no simple answer, Fernald (2000) argues that adults do not rely only on segmental, acoustic cues for speech perception, either. Instead, perception is always also knowledge-driven, i.e. it relies on signal-complementary information. Thus, the fact that parents do not eliminate the high phonetic variability when talking to their children may be due to the fact that nine-month-old infants can be demonstrated to already exploit co-articulation effects to predict the next syllable (Fernald, 2000, p. 247). Furthermore, already two-year-olds can be shown to employ phonotactic knowledge (Fernald, 2000, p. 246), and they profit particularly from the reformulative and repetitive nature of CDS (see below), since they find it easiest to identify words in final position (Fernald et al., 2001) and if they occur in fixed frames


(Fernald, 2000, p. 251). Thus, Fernald (2000) suggests that IDS and CDS are indeed facilitative for language acquisition even if segmental information is not necessarily presented in a pedagogical manner. Bernstein-Ratner (1987) even suggests that the fact that caregivers expose children to phonological variability has a facilitative function as well: given the repetitive and reformulative nature of CDS, parents have the opportunity to vary their productions of repeated words and thus to inform their children about the conditions for such variability; the author observes such alternations, for instance, with respect to palatalizations. Another property of CDS that may aid phonological acquisition is the heavy use of isolated words. Snow finds about 30% of mothers' utterances to consist of only a single word (Snow, 1977, p. 43). Since one of the tasks the language-learning child faces is to segment the stream of speech into meaningful units, isolated words may help to identify word boundaries and provide phonotactic information. Brent and Siskind (2001) also report frequent isolated words in IDS and early CDS. They recorded 13 mothers talking to their 9- to 15-month-old children every two weeks for 90–120 minutes in their homes. From sessions recorded before the child reached the age of twelve months and after it had reached the age of 14 months, 75 minutes each were transcribed and analyzed. The authors count as isolated those words that are separated by a pause of at least 300 ms from both previous and following linguistic material. They find that, on average, 9% of the maternal utterances consisted of isolated words (Brent and Siskind, 2001, p. B36); 27.2% of these isolated words occur more than once within 30 seconds (Brent and Siskind, 2001, p. B37).
The analysis of children's use of the words that were presented to them in isolation reveals that 43.9% of the recorded productions of the respective child had previously been uttered to him or her in isolation (Brent and Siskind, 2001, p. B38). Taking all isolated words produced by all mothers together, as much as 76.9% of the children's utterances consist of words their mothers produced in isolation. While the total frequency with which a child is exposed to a word does not suffice to predict whether the child picks it up, whether the word has previously been produced in isolation turns out to be a reliable predictor (Brent and Siskind, 2001, p. B38). As another example of increased use of one-to-one correspondences, caregivers have been found to connect not only pitch peaks reliably with new information (Fernald and Mazzie, 1991), but also particular words with particular contours. For instance, Quam et al. (2008) address to what extent certain frequent words in IDS are consistently related to certain prosodic contours. They find, for the eight most frequent words in the speech of 16 mothers to their 9- to 15-month-old infants, that these words are indeed reliably coupled with specific intonation contours, thus providing stable form-meaning pairs. CDS thus indeed seems to be characterized by one-to-one correspondences between certain highly frequent words and particular intonation contours (Quam et al., 2008).
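The isolation criterion used by Brent and Siskind (2001), as described above — a word token counts as isolated if at least 300 ms of silence separates it from both the previous and the following linguistic material — can be sketched as a simple pass over time-aligned word tokens. The tuple format (word, onset in ms, offset in ms) is an illustrative assumption, not the format of their corpus:

```python
# Minimal sketch of the isolated-word criterion: a token is isolated if
# the silent gaps before and after it are both at least min_gap_ms long.
# Utterance-initial and utterance-final tokens have no neighbor on one
# side, so that side trivially satisfies the criterion.

def isolated_words(tokens, min_gap_ms=300):
    isolated = []
    for i, (word, onset, offset) in enumerate(tokens):
        gap_before = onset - tokens[i - 1][2] if i > 0 else float("inf")
        gap_after = tokens[i + 1][1] - offset if i < len(tokens) - 1 else float("inf")
        if gap_before >= min_gap_ms and gap_after >= min_gap_ms:
            isolated.append(word)
    return isolated

# Illustrative timings: only "look" is flanked by gaps of at least 300 ms.
tokens = [("look", 0, 400), ("doggie", 900, 1400), ("nice", 1500, 1800)]
print(isolated_words(tokens))
```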


To sum up, CDS is characterized by higher pitch, which has been hypothesized to serve attention-managing and information-structuring functions; in addition, children have been found to prefer higher-pitched over lower-pitched speech. Furthermore, IDS and CDS are characterized by prosodic structuring by means of syllable lengthening and pausing, which indicate the structural organization of utterances and mark word and phrase boundaries; F0 contours are more variable, keeping the child's attention and facilitating the child's detection of word boundaries and of new information. Moreover, IDS and CDS increase one-to-one correspondences by coordinating stressed syllables with new information and even by assigning the same intonation contours to the same lexical items. In addition, IDS and CDS are generally characterized by clearer speech, especially in speech to infants and young children. Ferguson (2004) suggests these features, especially exaggerated contours, slow speaking rate, cluster reduction, liquid substitution and reduplication, to be likely candidates for language universals (Ferguson, 2004, p. 182).


4.1.2  Grammatical properties

Concerning grammar, the peculiarities observed in CDS comprise morphological and syntactic adjustments, as well as certain distributional characteristics. With respect to morphology, adjustments mostly consist of limiting the full inventory of possible forms to a more restricted subset. For instance, the words used seem to be mostly monosyllabic: Redford et al. (2004) find 85.15% of the words to be monosyllabic, 13.9% to be disyllabic and only 0.9% to have more than two syllables in the speech of four mothers to their six-month-old infants (Redford et al., 2004, p. 483). Another example illustrating that CDS may be restricted to certain linguistic structures and leave out others is the limited amount of passive morphology in CDS (Abbot-Smith and Behrens, 2006; Fischer et al., 2011). Similarly, Sylvester-Bradley and Trevarthen (1978) report 83.8% of all verbs to occur in the present tense in mothers' speech when the infant is eight and nine weeks old, and 69.8% for their whole corpus, which ranges from the child's eighth to her twentieth week. Thus, adaptations concerning morphology seem to lie in the restriction of the features used to a subset of the inventory of the structures of a language (see Snow and Goldfield, 1983). Correspondingly, there are no reports of morphological adaptations that would compromise the grammatical correctness of child-directed speech (cf. Newport et al., 1977, p. 121). Ferguson (2004) notes that CDS “often omits inflectional endings, function words and also the verb to be,” yet admits that other studies besides his own do not report these features (Ferguson, 2004, p. 179). Instead, in response to morphological errors in children's utterances, parents correct these errors implicitly by providing grammatically appropriate recasts (Chouinard and Clark, 2003).


Chapter 4.  Talking to children

However, for some languages, researchers report increased numbers of certain morphological features if these may be revealing with respect to grammatical categories. For instance, Kempe et al. (2001) argue that in languages like Spanish and Russian the high number of diminutives in child-directed speech can facilitate the acquisition of grammatical gender for those nouns for which gender is not directly apparent. They suggest that in Spanish and Russian, in which diminutive forms reveal an otherwise opaque grammatical gender, child-directed speech exhibits more diminutives than child-directed speech in languages in which the diminutive does not provide such information, such as German. The authors do not compare these numbers to the use of diminutives in adult-directed speech, which would rule out the possibility that the effect is due to a much higher overall frequency of diminutives in the respective languages; their data show, however, that mothers do not use many diminutives with infants, in either of the two languages investigated. Frequent diminutives at this early stage would have been expected if these languages employed diminutives more often in general or if the occurrence of diminutives was driven by affective functions (Georgakopoulou, 1996). Instead, there seems to be finetuning to the state of language acquisition of the respective child; the increased amounts of diminutives in Spanish and Russian occur in later stages (2;0 and 2;4) only. The fact that diminutives occur only in speech to children of a particular age then suggests a relationship with the learning of particular morphological features.

Concerning syntactic characteristics, Ferguson (2004) reports short sentences, parataxis, telegraphic style and repetition (Ferguson, 2004, p. 182). He suspects that these modifications are “tailored to the needs of the child in verbal interaction” and thus that they “must somehow be relevant to the process of language acquisition” (Ferguson, 2004, p. 179).
Similarly, in an early paper Slobin (1975) finds that mothers’ speech is characterized by short utterances, stressed syllables, changes in word order, the use of familiar words as anchor points and the use of words whose meanings are guessable from the interaction (Slobin, 1975, p. 285). In all, he reports a “predominance of simple, active, affirmative and negative sentences, and simple active, affirmative questions” (Slobin, 1975, p. 286). Moreover, he reports that the relative frequencies of sentence types, constructions involving personal pronouns and forms of to be, interrogative pronouns, auxiliaries, contractions and certain tense and aspect markers in a corpus of children’s utterances match the proportions of these features in a corpus of mothers’ utterances (Slobin, 1975, p. 294). He concludes that “children, universally, are exposed to a special, simplified version of their community” (Slobin, 1975, p. 295). With this claim, he rejects Chomsky’s assumption that the input to language acquisition is “meager and degenerate” (Chomsky, 1968, p. 68). Thus, he suggests that mothers’ utterances somehow meet their children’s linguistic abilities.


96 Designing Speech for a Recipient

The same conclusion is reached by De Paulo and Bonvillian (1978) who, in their review of 31 studies of CDS, find consistent reports of shorter sentences with reduced complexity, for instance, fewer subordinate and coordinate clauses, fewer embeddings and conjoinings, and proportionally more content words (De Paulo and Bonvillian, 1978, p. 191). Newport et al. (1977) report 60% of the child-directed utterances to be grammatically correct, “unswervingly well formed”, exhibiting significantly fewer disfluencies than adult-directed speech (Newport et al., 1977, p. 121). There are, for instance, no morphological or word order modifications that would reduce the complexity of an utterance at the expense of grammatical correctness. Furthermore, as in previous studies (e.g. Snow, 1972; Slobin, 1975), they find the mean length of utterances directed at children to be shorter than the mean length of utterances directed at adults. Newport et al. (1977) argue that sentences are shorter “because they go one proposition at a time” and because “embeddings and conjunctions are rare” (Newport et al., 1977, p. 122). This finding corresponds also to the high number of single-word utterances observed by Snow (1972) and Brent and Siskind (2001). Laakso and Smith (2004) analyse various English corpora from the Childes database (MacWhinney, 2000) for the syntactic structures occurring; of a total of 24,286 parental utterances directed at children aged 1;4–6;1, they report 28.36% to occur without a verb. Moreover, utterances containing verbs generally consist of very basic constructions; thus, they find 16.86% to be questions, 11.86% to be copula-based, 17.24% to be intransitives, 24.36% to be transitives, 1.48% to be ditransitives, and only 0.02% to be passives (Laakso and Smith, 2004, p. 768).
Cartwright and Brent (1997), who also analyze naturally occurring child-directed speech from the Childes database, report several instances of pairs of sentences that differ only minimally with respect to the syntactic structures employed, i.e. which constitute minimal pairs (Cartwright and Brent, 1997, p. 131). This observation is supported by a systematic study by Küntay and Slobin (1996), who suggest that the highly reformulative nature of CDS may provide the child with important syntagmatic and paradigmatic information. The authors propose the notion of the variation set, a series of adult utterances with a constant underlying communicative intention. Küntay and Slobin (2001) find that in the speech to two Turkish children, 21% and 35% of the maternal utterances occur in such variation sets, ranging from 2 to 25 utterances. These variation sets comprise lexical substitution and rephrasing, addition and deletion of specific references, and reordering. Küntay and Slobin (1996) show how morphological, lexical and syntactic information can be extracted from such sets. The following example from the New England corpus from the Childes database illustrates such a variation set:

    Where’s the mommy?
    Where’s the mommy-girl?
    Where’s the mommy-person?
    Is that a lady?
    That’s the one that’s like Christopher, isn’t it.
    Where’s the mommy?

Here the mother tries to get her son Christopher to identify the mother doll in a doll house, presenting her question in five different formulations. The features of variation sets allow the extraction of information on constituent boundaries, morphological marking, verb-specific uses, subcategorization, syntactic/semantic categories, obligatory and optional constituents, pronoun replacement, etc. (Küntay and Slobin, 2001). In addition, the child is directly exposed to information about syntagmatic, taxonomic and hyponymic relations between constructions, as well as to synonymy. The results from the studies by Küntay and Slobin (2001) therefore suggest that besides some general simplifications in grammatical structure, the distributional structure of the input the child receives may have facilitating effects on language acquisition. Correspondingly, in a computational simulation, Onnis et al. (2008) demonstrate that the information in variation sets available to the child may indeed facilitate language learning.

That distributional variability as observed in variation sets supports learning in general is indicated by a study by Grunow et al. (2006), who taught non-adjacent dependencies to adults in two conditions: In the low-variability condition, the non-adjacent dependent material was separated by one out of a set of twelve non-words, whereas in the high-variability condition, 24 non-words occurred in this position. Adults were able to learn the dependency relationships only in the high-variability condition. Thus, variability in the learning input may facilitate the perception of linguistic relationships.

In addition to the repeated occurrence of functionally similar constructions in variation sets, i.e. in adjacent utterances, it needs to be borne in mind that infants’ and children’s days are highly regular and repetitive. Thus, children are repeatedly confronted with similar linguistic utterances over time (Snow and Goldfield, 1983; Roy et al., 2012).
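The variation-set idea lends itself to a simple computational approximation: group adjacent utterances that share a sufficient amount of lexical material. The sketch below is only illustrative; the tokenization and the 50% overlap threshold are assumptions of this example, not Küntay and Slobin’s own operationalization.

```python
def variation_sets(utterances, min_overlap=0.5):
    """Group adjacent utterances into candidate variation sets.

    Two neighboring utterances are linked when the proportion of shared
    word types (relative to the shorter utterance) reaches min_overlap.
    The threshold is an assumption of this sketch.
    """
    sets, current, prev = [], [], set()
    for utt in utterances:
        words = set(utt.lower().rstrip("?.!").split())
        if current and words and prev and \
                len(words & prev) / min(len(words), len(prev)) >= min_overlap:
            current.append(utt)        # continues the same communicative series
        else:
            if len(current) > 1:       # keep only genuine series (2+ utterances)
                sets.append(current)
            current = [utt]
        prev = words
    if len(current) > 1:
        sets.append(current)
    return sets

mother = [
    "Where's the mommy?",
    "Where's the mommy-girl?",
    "Where's the mommy-person?",
    "Now put it away.",
]
print(variation_sets(mother))
# -> [["Where's the mommy?", "Where's the mommy-girl?", "Where's the mommy-person?"]]
```

Even this crude criterion recovers the reformulation series while excluding the topically unrelated final utterance.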
That repeated exposure to certain expressions facilitates learning is moreover evident from a study by Ambridge et al. (2006), who report that children learn complex grammatical constructions better when the learning stimuli are presented repeatedly over time, in contrast to massed exposure in a single session. Thus, distributed presentation of linguistic input over time not only provides infants and children with distributed information; children also learn the target construction better.


That the repetitive and reformulative nature of CDS may provide children with helpful distributional information is confirmed by several corpus-linguistic studies. For instance, Naigles and Hoff-Ginsberg (1995, 1998) address the role of positional information for verb learning in comparison with the roles of frequency and positional salience, i.e. occurrence at the end of utterances. They investigate correlations between verb syntax and semantics, in particular the occurrence of verbs in what the authors call ‘broad frames’, i.e. certain argument structure constructions, and in ‘narrow frames’, i.e. item-specific patterns. They find that verbs occur overwhelmingly in their appropriate broad frames, which serve as reliable cues to some aspects of the meanings of these verbs. For instance, the transitive construction is associated reliably with a causative interpretation of the verb in this construction (Naigles and Hoff-Ginsberg, 1998, p. 99). In addition, the early verbs are also those that are most frequent in the input, and they generally occur in one of only 45 narrow frames.

Ninio and Snow (1988) argue that CDS “exhibits a higher than usual form-function correspondence” (Ninio and Snow, 1988, p. 12); the example they discuss is the high correspondence between sentence frame and speech act reported by Shatz (1977). Similar observations have been made concerning, for instance, prototypical slot fillers; for example, Goldberg et al. (2004) report so-called skewed input for argument structure constructions, such that the prototypical light verb that provides the best clue to the meaning of a grammatical structure occurs significantly more often in a given argument structure construction in CDS than in a normal corpus. The same observation has been made by Stefanowitsch and Gries (2005) for the possessive construction, whose slots are significantly more often filled with prototypical fillers in CDS than in a large corpus of adult text and speech.
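The skew that Goldberg et al. (2004) and Stefanowitsch and Gries (2005) describe can be quantified very simply as the share of construction tokens contributed by the single most frequent slot filler. A minimal sketch; the verb counts below are invented for illustration and are not data from the cited studies:

```python
from collections import Counter

def skew(fillers):
    """Proportion of construction tokens contributed by the single most
    frequent slot filler: a crude index of how skewed the input is."""
    counts = Counter(fillers)
    return counts.most_common(1)[0][1] / sum(counts.values())

# invented verb tokens for the ditransitive slot (illustration only)
cds_ditransitive = ["give"] * 9 + ["show", "send", "read"]
adult_ditransitive = ["give", "send", "offer", "show", "hand", "send"]

print(round(skew(cds_ditransitive), 2))    # -> 0.75
print(round(skew(adult_ditransitive), 2))  # -> 0.33
```

On such a measure, input in which one prototypical verb (here give) dominates the construction scores markedly higher than a more evenly distributed adult sample.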
Casenhiser and Goldberg (2005) show that skewed input indicating the constructional meaning is made use of in the learning of new constructions; thus, distributional cues like these are likely to be attended to by the child language learner. Distributional cues from the grammatical context may also have an impact on word learning; Fernald and Hurtado (2006) report an experimental study in which they compare word learning in isolation versus in repetitive frames, which are similar to the narrow frames reported on by Naigles and Hoff-Ginsberg (1995). In their first experiment, they find that 18-month-olds look longer at the correct picture when presented with a whole utterance like look at the doggie than when presented with the word doggie alone. In a second experiment, Fernald and Hurtado (2006) tested whether the effect is due to the fact that the children were alerted by the imperative in the previous experiment; therefore, they repeated the experiment while changing the second utterance into look … doggie. They still find the advantage of the whole frame. The authors conclude: “we have underestimated another odd parental habit that appears to be even more advantageous for
the young learner: the use of short, simple, repetitive carrier frames leading up to a familiar noun” (Fernald and Hurtado, 2006, p. F39).

Cameron-Faulkner et al. (2003) show that caregivers make frequent use of such frames; their analyses of corpora of child-directed speech present considerable evidence for the ubiquitous use of frames in CDS. They investigate the speech of twelve mothers and find that only eight frames account for 77% of all patterns in the corpus. As frames they define fixed patterns with open slots that occur at least four times per mother. In a follow-up study, Stoll et al. (2009) investigate whether the findings by Cameron-Faulkner et al. (2003) are due to the peculiar properties of English with its restricted word order, or whether similar results can be obtained for other, more flexible languages. They analyze corpora of Russian, English and German free-play interactions between mothers and their children aged 1;8 to 2;6 and find that even though English has the most lexically specific frames, German and Russian exhibit similar effects, such that 58–75% of utterances still make use of lexically restricted sets of frames; still, the more flexible the word order of the language in question, the fewer the lexically specific patterns.

Mintz (2003) and collaborators show that the frequent frames in which words in CDS occur serve as reliable cues to the grammatical categories to which the words belong. For instance, Mintz et al. (2002) analyze corpora of child-directed speech and find that the distribution of words in context is sufficiently informative with respect to their grammatical category. If phrase-structural information is also taken into account, the results improve further. If closed-class items are replaced by a symbol in order to simulate children’s possible problems with these items, the results for nouns are only marginally affected while those for verbs actually improve.
Their computational learning studies thus indicate that frequent frames are a useful resource for language acquisition. In addition, Dabrowska (2009) shows that not only general grammatical categories but also very specific meaning aspects of verbs can be learned from the context in frames. This conclusion receives additional support from a study by Keren-Portnoy (2006), who finds new verbs to be learned faster when they occur in a structure that the child already knows. The finding that the context in which a linguistic structure occurs facilitates learning opens up the possibility that distributional information can provide sufficient cues for language learning. Laakso and Smith (2007) also specifically address whether surface regularities in CDS suffice to induce deep semantic and syntactic structure; the example they analyze is pronoun distribution as a clue to verb class, in particular, psychological versus physical verbs. A study of corpora of CDS from the Childes database (MacWhinney, 2000) reveals such regularities, and a subsequent simulation with a simple statistical learner shows that children could in principle make use of this information in order to acquire verb class information.
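The frequent-frame procedure is easy to sketch: collect, for every pair of a preceding and a following word, the words attested in between, and inspect the most frequent such frames. The toy corpus and the cut-off below are illustrative assumptions; Mintz’s actual analyses additionally impose a frequency threshold on what counts as a ‘frequent’ frame.

```python
from collections import defaultdict

def frequent_frames(utterances, top_n=1):
    """Collect Mintz-style frames (preceding word, following word) and the
    words attested between them; return the top_n frames by token count."""
    frames = defaultdict(list)
    for utt in utterances:
        words = utt.lower().split()
        for a, x, b in zip(words, words[1:], words[2:]):
            frames[(a, b)].append(x)
    ranked = sorted(frames.items(), key=lambda kv: len(kv[1]), reverse=True)
    return {frame: sorted(set(fillers)) for frame, fillers in ranked[:top_n]}

cds = [
    "you can put it there",
    "you can see it now",
    "you can get it there",
    "the dog is big",
]
print(frequent_frames(cds))  # -> {('can', 'it'): ['get', 'put', 'see']}
```

The words grouped by the most frequent frame, can _ it, are all verbs, illustrating how co-occurrence in a frame can cue membership in a grammatical category.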


Goodman et al. (2008) aim to determine the impact of word frequency in parents’ speech on vocabulary acquisition. First, they had parents judge for a set of words from a standardized test whether their children knew the word; two tests were used: one for children of eight to sixteen months of age, covering both comprehension and production and comprising 396 words, and one for children between 1;4 and 2;6, which concerns production only and covers 680 words. Second, they determined the frequencies with which parents in the Childes database used these words. For the whole dataset, comprising common nouns, verbs and adjectives as well as closed-class items and politeness formulas, there is a slight negative correlation of −.068. This result is mainly due to the highly frequent function words, which are, however, learnt very late. In contrast, for the individual groups of vocabulary, significant positive correlations were obtained, indicating that input frequency has a positive influence on speed of acquisition. When time of acquisition is related to adult-directed or written corpora, the only statistically significant data concern common nouns, yet there is still a positive relationship. These results show not only that frequency in the input plays an important role for language acquisition, but also that the distribution of words in child-directed speech is different from the distribution in adult-directed speech.

Borovsky and Elman (2006) carry out computational simulations of the results found by Goldberg et al. (2004) and Casenhiser and Goldberg (2005). They experiment with input frequency in general, with skewed input and with the role of grammatical complexity. In particular, they train an artificial neural network on specially designed utterances, combining verbs with nouns of different categories. In the skewed-input condition, they use one verb with higher frequency, whereas in the other conditions, verbs are used equally frequently.
In the experiment on grammatical complexity, they use either two (transitive and intransitive) or four different constructions (adding the ditransitive and verb complement clauses). They find that the amount of input in general plays a crucial role in category formation, and that learning is facilitated if there are fewer and shorter constructions. Regarding skewed input, they find increased precision for medium-range input frequencies, yet worse word learning. However, these inconclusive results may be due to the fact that the most frequent verb was selected randomly, and the frequency distributions were also set randomly; in contrast, in natural child-caregiver interactions, the verb selected is suggested to represent the constructional meaning best (e.g. give for the ditransitive) (Goldberg et al., 2004).

To summarize the effects of the special distributional characteristics of CDS, children’s linguistic input is considerably less diversified, i.e. restricted to a subset of structures (Slobin, 1975; Cameron-Faulkner et al., 2003), markedly more focused distributionally (Fernald and Hurtado, 2006; Laakso and Smith, 2007), more stereotypical and consistent semantically (e.g. Karmiloff and Karmiloff-Smith, 2002;
Goldberg et al., 2004) and also much more heavily redundant pragmatically (cf. Küntay and Slobin, 1996) than speech directed towards adults, each in itself an adjustment that facilitates the identification of recurrent form-meaning mappings in language acquisition in an important respect. In addition, concerning its grammatical properties, CDS consists of relatively short, well-formed utterances and exhibits special morphological additions where these are revealing with respect to certain grammatical features.


4.1.3  Semantic and pragmatic properties

Semantic properties of IDS and CDS concern, for instance, the use of basic level terms, the use of proper nouns for pronouns and the restrictedness of the topics dealt with in CDS. Furthermore, there seems to be a special lexicon used when adults talk to children (Ferguson, 2004). Ferguson (2004) reports on lexical modifications in CDS: “every speech community seems to have a small lexicon of words used primarily with young children” (Ferguson, 2004, p. 179). Thus, a special lexicon for CDS (e.g. pee-pee, bow-wow) has been assumed (cf. Ferguson, 1977; De Paulo and Bonvillian, 1978). Typical areas for this kind of special lexicon are body parts and functions, kin terms, food, animals, and games (Ferguson, 2004, p. 180). He suggests that these words are often hypocoristic terms which may prevent the child from causing embarrassment in public conversation, since the ‘baby talk’ forms of these words are likely to be perceived as less inappropriate or even as cute (Ferguson, 2004, p. 234). In addition to special words, Brown (1977) suggests that speech to children is generally restricted to basic level terms. Correspondingly, of the 74 different nouns known by Sarah, one of the children he investigates, 67 were basic level categories (Brown, 1977, p. 18), which is likely to reflect the distribution of nouns in her input. However, reference to objects itself is something that seems to develop in CDS over time; Sylvester-Bradley and Trevarthen (1978) find no references to objects other than to the baby herself in a mother’s speech when the baby is nine to twelve weeks old, but later the proportion of references to objects besides the baby rises to 50% in week nineteen (see also Filipi, 2009).
In addition, the objects talked about seem to belong to rather restricted domains; the studies of CDS reviewed in De Paulo and Bonvillian (1978) agree unanimously that CDS exhibits reduced cognitive complexity by means of focus on immediately present objects and pictures or on recently completed actions and an emphasis on the here-and-now (De Paulo and Bonvillian, 1978, p. 191). Similarly, Newport et al. (1977) hold that “mothers want their children to drink their juice and clean up their playroom, so these are the things that are talked about” (Newport et
al., 1977, p. 124). Snow concludes that “mothers make very predictable comments about very predictable topics” (Snow, 1977, p. 41), which allows children to work out the meanings of the sentences they hear. Besides the topics, certain activities in the interactions between caregivers and children are also highly predictable. For instance, for the task of word learning, Clark and Estigarribia (2011) find highly structured sequences, involving speech, gaze and gesture, by means of which caregivers introduce children not only to the words themselves but also to salient properties of the objects named. The naming procedures, too, constitute repetitive frames, such as this is … or that’s a … They observed 40 parent-child pairs, the younger group of children being, on average, 18 months old (range 1;4.7 to 1;11.13) and the remaining twenty children being about three years old (range 2;8.8 to 3;2.20). Clark and Estigarribia (2011) find predictable sequences for (a) getting the child’s attention, (b) labeling the object, (c) situating the object in a conceptual or semantic domain, and (d) drawing the child’s attention to a particular part or property using speech and demonstrating, i.e. depicting, gestures. The authors also report finetuning effects such that initially parents use more leading gestures to capture the attention of the younger children, whereas they present more conceptual, situating information to the older children.

Many authors suggest that CDS (and IDS) is highly contingent on the child’s actions; for instance, Sylvester-Bradley and Trevarthen (1978) find play to be “inevitably followed by a pause during which the mother looks at her baby and smiles, laughs or calls out. If the baby expresses enjoyment at this point, by happy animation or a call, the game will be renewed, often in a slightly altered form” (Sylvester-Bradley and Trevarthen, 1978, p. 88). So besides special lexical and semantic adjustments, IDS and CDS also exhibit interactional adjustments.
For example, Sylvester-Bradley and Trevarthen (1978) also report on ‘mirroring’ in a mother’s speech to an eight- to twenty-week-old baby: “she continually interprets back to Sarah Sarah’s immediate state, her moods (…), her facial expressions (…), and her actions (…) as they happen” (Sylvester-Bradley and Trevarthen, 1978, p. 85). Thus, the mother provides contingent responses that offer interpretations of the infant’s internal states. The authors conclude that IDS is characterized by “mutual interactive adaptation between mother and child” (Sylvester-Bradley and Trevarthen, 1978, p. 89). Another example of such interactional adjustments is the “exaggerated use of questions” in CDS reported on by Ferguson (2004), by means of which caregivers involve their children. A further example of an interactional adjustment is the joint negotiation of word meaning; Laakso and Helasvuo (2005) describe the sense-making efforts of a mother in interaction with her one-year-old girl, who has only a single proto-word in her vocabulary; this proto-word is interpreted by the girl’s mother depending on the current situation, relying furthermore on gaze direction, pointing gestures,
and body movements. The mother’s candidate understandings of the child’s utterances can be accepted or rejected by the child. At the same time, parents have been suggested to ‘pretend’ that their children are capable communication partners (e.g. Kaye, 1980), which provides children with models of the kinds of interactions they are expected to take part in later. Another example of this pretense concerns caregivers’ pronouns; Rabain-Jamin and Sabeau-Jouannet (1989) investigate the use of pronouns in French maternal speech to three-, seven- and ten-month-olds and find that pronoun use (a) changes over time, especially with respect to the use of third person references, whose numbers decrease over the period studied, and (b) is significantly related to different kinds of activities and to the semantic content of the utterances. The authors argue that mothers use pronouns to construct the interactional space for the baby as the agent of a meaningful activity.

Caregivers may also provide children with metalinguistic information; Clark (1998a) shows that caregivers provide children with pragmatic directions concerning “which words to use and hence how to talk about objects, actions, events, and relations” (Clark, 1998a, p. 436) and concerning semantic relations between words, for instance, set membership ‘X is a kind of Y’, partonymy ‘X has a Y’ and similarity, like similar function or similar appearance. She shows that children make use of these kinds of directions: 17 of the 18 children tested at age 2;2 inferred that Y included X when tested later on. The children were also tested for their ability to understand repair and thus for their sensitivity to the linguistic surface realization of the caregiver’s utterances; she finds that 13 of the 18 children had indeed replaced the wrong noun by the correct one.
In contrast, in teaching sessions without metalanguage information, children used various different coping strategies which are similar to constraints on lexical acquisition previously proposed in the literature. Clark (1998a) concludes that “CDS plays a critical role in displaying to children the content of the lexicon for their first language” (Clark, 1998a, p. 445). In a study continuing this line of research, Clark and Wong (2002) show that adults present children with information about word meanings and relations between words implicitly during interactions. The authors suggest that by making use of a joint focus of attention and physical and conversational co-presence, parents provide their children with metalanguage directions on the meanings of new words (as distinct from any neighboring meanings), on their conventional forms, and on the relationships of this word with possible neighbors. In particular, in a corpus study of different corpora from the Childes database, comprising interactions with children between 1;1 and 5;1, Clark and Wong (2002) find numerous instances of both offers of words and offers of relations between words. They conclude that pragmatic directions play a critical role during lexical acquisition.


That caregivers may have very good intuitions about semantic relationships and about what could be relevant for lexical acquisition is also shown in a study by Manders and Hall (2002). They find that caregivers are sensitive to those contrasts that facilitate noun and adjective learning and intuitively employ them in their instructions. In their experiment, they had 24 adults choose the cards that they might want to use to teach two novel (invented) adjectives to children, and participants consistently made use of those semantic relationships that had previously been shown to facilitate adjective learning in children. Mothers, in contrast to young children, have also been found to provide conversational support to their children. French and Pak (1995) compare mothers’ speech to their 3-year-old daughters with speech by the girls’ peers. The interactions taped were four 15-minute play interactions, and the speech by the mothers and the girls’ peers, as well as the girls’ own linguistic behavior, was analyzed. The authors find that mothers offer conversational support that makes it easy for their daughters to respond contingently and to maintain a fluent conversation, whereas the girls’ peers did not offer such clues. Furthermore, the 3-year-old girls themselves did not compensate for the lack of conversational support from their peers, which led to conversational exchanges that were very different in quality from the mother-child interactions; however, the girls managed to “mutually construct shared meanings and to establish and work towards shared goals” (French and Pak, 1995, p. 98).
Thus, mothers’ linguistic behaviors facilitate interactions for the child and at the same time provide them with models of more adult-like interactions; as the comparison with the peer interactions shows, however, these model interactions go beyond the necessary – that is, mothers’ linguistic behaviors are not only driven by the need to communicate, which could obviously be satisfied with much less elaborate means.

Caregivers have moreover been found to reformulate and expand their children’s utterances frequently; furthermore, they reformulate, rather than just expand, significantly more often if the utterance contains an error (Chouinard and Clark, 2003, p. 648). In particular, in their data set from the Childes database, Chouinard and Clark (2003) find that about two thirds of all erroneous utterances receive reformulations (Chouinard and Clark, 2003, p. 648), addressing problems in morphology, syntax, word choice and phonology (Chouinard and Clark, 2003, p. 651–652). These reformulations work as implicit corrections since they place the caregiver’s utterance in direct contrast to the child’s utterance, emphasizing the difference between the two versions (Chouinard and Clark, 2003, p. 646). This finding contrasts with previous suggestions that children receive hardly any feedback on grammatical correctness. For instance, Slobin (1975) finds corrections to concern mainly aspects of etiquette, whereas there seems to be hardly any pressure for grammatical correctness (Slobin, 1975, p. 290–291). He concludes that parental
feedback does not seem to play a decisive role in the child’s discovery of grammatical rules. Chouinard and Clark (2003), however, show that children make use of the corrections that are embedded in the reformulations: The proportion of utterances by means of which children take up the conventional form ranges from 25% to 100% for the five different children investigated, and the older children are, the more often they attend to the tacit corrections in their subsequent utterances. The principle behind the potential efficiency of reformulations and recasts is contrast: by reformulating children’s utterances, caregivers not only provide them with models of how a particular content can be realized, but also demonstrate which parts are acceptable and which parts are problematic, thus indirectly providing children with negative evidence (Chouinard and Clark, 2003). Similarly, Saxton et al. (2005) investigate the effect of parents’ implicit corrections for 12 children after twelve weeks; in particular, they analyze the relationships between parents’ corrections provided in recasts and reformulations and children’s grammatical knowledge twelve weeks later. They find that corrective input relying on the contrast between grammatical and ungrammatical speech indeed correlates with higher grammatical knowledge at the later stage; in contrast, reformulations and expansions that provided positive models only were not significantly related to an increase in grammatical knowledge twelve weeks later. The results show that children make use of the contrast between the form of the child’s utterance and the form of the caregiver’s utterance available from recasts. In addition, children have been found to make use of contrasts for word learning, drawing on pragmatic and contextual information to infer the meanings of new expressions (Clark and Amaral, 2010, p. 448).
To sum up, CDS seems to be lexically and semantically restricted, such that words are mostly basic-level categories and refer to very similar objects and situations of the here and now; in this way, CDS seems to be simplified in comparison with speech directed towards adults. At the same time, corresponding to the highly repetitive activities children are engaged in, CDS is highly repetitive. In addition, caregivers seem to provide children with clarifying information on semantic relationships, semantic contrasts and the situational appropriateness of utterances in context. Moreover, caregivers seem to treat infants and children like real, capable communication partners, thus providing them with models of appropriate interactional behavior. Finally, children receive numerous reformulations and recasts of their utterances which provide them with indirect feedback on the correctness and appropriateness of their own productions.

106 Designing Speech for a Recipient

4.2  Homogeneity

An important concern for the question whether CDS constitutes a simplified register is how homogeneous speech directed to infants and children really is. In fact, Kaye (1980) notes that “BT3 is at best a set of loosely related features” and thus asks whether it deserves the term register at all (Kaye, 1980, p. 491). In contrast, Ninio (2011), for instance, holds that “On all criteria, parental speech addressed to young children is a specific register of English” (Ninio, 2011, p. 49) and concludes that child-directed speech is highly homogeneous: “The similarity found more than justifies the term register” (Ninio, 2011, p. 49). Researchers have found variation along several dimensions. For instance, studies have addressed differences between speakers of CDS, especially with respect to possible differences between mothers and fathers and concerning parents’ level of education and socio-economic status; differences between situations; variation across time, i.e. in response to children’s increasing age and developing cognitive and linguistic capabilities; and differences between the children addressed, i.e. girls versus boys.


4.2.1  Variation: Speakers

In this section, we consider whether all speakers adapt their speech to children alike.

Mothers versus non-mothers

Snow (1972) tested mothers of 2-year-olds and non-mothers in three verbal tasks of different degrees of difficulty. Participants were asked to create tape recordings for (their) 2-year-old children. The analysis reveals only a few significant differences between mothers and non-mothers (Snow, 1972, p. 559); in particular, non-mothers talked longer and produced longer utterances and fewer utterances without verbs. With respect to repetitions and reformulations, as well as concerning all complexity measures besides utterance length, non-mothers simplified their speech in similar ways as the mothers did. Thus, prior experience with children does not seem to play an influential role. However, while motherhood itself does not yield significant differences, still not all mothers may talk to their children alike; for instance, Lieven (1978) analyzes interpersonal differences in five interactions between two mothers and their 18–24 month and 20–26 month old daughters, Kate and Beth. The two mothers exhibit

3.  BT is short here for Baby Talk and refers to what has been called CDS in the current study.


very different linguistic behaviors towards their children; for instance, while Kate’s mother fails to respond to her daughter 19% of the time (in almost half of these cases, the child continues to speak anyway), Beth’s mother does not react to her daughter’s utterances 54% of the time, mostly because she does not interrupt her conversation with the researcher. On the whole, Beth’s mother makes much less effort than Kate’s mother to keep the conversation with her daughter going, for example, by providing extensions or expansions. Interestingly, Lieven (1978) attributes these differing behaviors partly to the children themselves, since her own conversations with the children exhibited patterns similar to those of their mothers. She also reports that both children grew up to become fluent, competent speakers. Kaye (1980) also investigates interpersonal differences between mothers and between mothers and non-mothers, however with different results. He compares mothers’ speech to their children with speech by a female researcher and himself to the same children. He finds that while the mothers were highly consistent over time in their ways of talking to their children, their speech differed significantly both from each other and from the speech by the two researchers. In addition, the two researchers’ speech did not exhibit the same specific adjustments to the respective children. Thus, mothers seem to develop highly individual styles of talking to their infants. Similarly, children themselves have been found to differ significantly in their linguistic behaviors and development (Bates et al., 1995).

Mothers versus fathers

Most work on speaker variability, however, focuses on the role of gender in CDS. Gleason (1975) suggests that fathers play a particular role in children’s language acquisition since they are usually not as familiar with their children’s utterances and linguistic knowledge as mothers are, but more familiar than other adults, making it slightly more difficult for children to make themselves understood to their fathers than to their mothers. Thus, fathers are taken to act as a bridge to the outside world (Gleason, 1975). Based on Gleason’s suggestions, Mannle and Tomasello (1987) investigate the speech exhibited by fathers, mothers and siblings. Concerning gender differences, they analyze the speech of the parents of 24 children; the mothers recorded spent on average about 23 hours per week alone with their 12- or 18-month-old children, and the fathers recorded spent on average about eight hours per week alone with their children. Recordings took place in parents’ homes during their usual activities with their children. The authors analyze joint attentional focus, object references, the contingency of replies and the number of the children’s turns in the conversation. They find that fathers indeed spend less time than mothers in joint attentional states with their children, and they refer to objects less often than mothers. Fathers also fail more often to follow the child’s focus of attention on an object. Fathers make more off-topic replies and recast their children’s utterances


less often. These results provide evidence for the bridging hypothesis (Mannle and Tomasello, 1987, p. 29). Their results are supported by Walker and Armstrong (1995), who find different distributions of questions, responses and functional intention, i.e. directing versus initiating conversation, between the mothers and fathers of four children, two male and two female, at about 2;5 years. That fathers’ linguistic behavior indeed plays a role in the child’s language acquisition process is shown by means of a correlation analysis in which several feedback behaviors by the father are significantly related to the child’s vocabulary skills six months later (Mannle and Tomasello, 1987, p. 35). Their results also suggest gender differences regarding the phonetic and prosodic realization of affective utterances. Differences between mothers and fathers in the expression of affect are also reported by, for instance, Slaney and McRoberts (2003), who find much better automatic classification results for affect (attention, approval, prohibition) for female than for male CDS speakers, using pitch, broad spectral shapes and energy variation for the classification. This is surprising since perceptual analyses by a group of raters revealed female and male speech to be similar concerning emotional expression (Slaney and McRoberts, 2003, p. 379). In contrast to the above results, Papousek et al. (1987) find hardly any significant differences between mothers and fathers in their speech to 3-month-olds. They compare 14 German mothers and fathers with respect to lexico-semantic and temporal-melodic features of speech addressed to their infant. In particular, they investigate mean length of utterance, grammatical sentence type, repetition of words, lexical content, pause duration, melodic contours, fundamental frequency and range, and the number of repetitions of melodic contours.
They report the speech of both parents to contain 46% repetitions, and most utterances (55.2%) consisted of only 1–3 syllables (Papousek et al., 1987, p. 500). Both parents’ utterances mostly “served to promote the ongoing conversation and playful interchanges conveying basic messages (…), such as calls for visual attention (…); calls for an infant turn (…); approvals (…); or readiness for playful interchanges (…)” (Papousek et al., 1987, p. 501–502). Here, mothers place more emphasis on eliciting vocal behaviors, whereas fathers are found to express more interest in the infants’ motor behaviors. Even though the trend is significant, the difference is in fact very small. Fathers are, however, found to make slightly longer pauses yet to speak slightly faster than mothers (Papousek et al., 1987, p. 503–504). Both parents are found to use mostly unidirectional intonation contours, and both repeat preceding melodic contours, with mothers (70.7%) doing so significantly more often than fathers (62.0%) (Papousek et al., 1987, p. 505). The authors also investigate whether the observed similarities between mothers and fathers may be the effect of intrafamilial attunement, yet in contrast to Mannle and Tomasello (1987), they find only a few instances of significant positive correlations (Papousek et al., 1987, p. 513).


Other authors find some differences and some similarities; for instance, Gauvain et al. (2002) had both mothers and fathers instruct their five-year-old children, in separate sessions, in a familiar and in an unfamiliar game. Whereas the mothers used different strategies and different proportions of strategies in the first, unfamiliar, and the second, familiar session, fathers did not adjust their instruction strategies to whether or not the child had been instructed in the game a month earlier. Thus, the authors observe differences between mothers and fathers with respect to the amount of adjustment to children’s respective cognitive levels. In all other measures, however, there were no differences, and for both mothers and fathers high amounts of coordination of behaviors could be observed; in addition, the communicative problems between fathers and their children observed in other studies (e.g. Mannle and Tomasello, 1987) were not observed in this study (which may be due to the fact that the children were slightly older and thus most likely more able to express themselves). Lanvers (2004) reviews research on the role of gender in language socialization and finds a general trend for mothers to speak more, to produce more facilitative speech, and to be more negative, yet less directive and less informative, than fathers. However, differences, if there are any, are usually rather small. Thus, speakers’ gender seems to have a rather small impact on CDS.

Socio-economic and psychological factors

Variation between speakers of CDS may also be due to certain psychological conditions of the caregiver. For instance, Kaplan et al. (1999) find that infant-directed speech by mothers who self-report symptoms of depression shows less F0 change and is less effective than normal mothers’ IDS in enabling four-month-old infants to learn voice-face associations. Thus, depression may affect mothers’ use of IDS characteristics, which may in turn deprive infants of facilitative cues for language acquisition. Much research furthermore suggests that caregivers may differ regarding the communication strategies they adopt. For instance, Nelson (1973) proposes that caregivers may be more or less directive and controlling and that these communication styles have an impact on language learning. Pine (1992) warns against a simple equation of directive style with alleged insensitivity towards the child’s needs and calls for a careful investigation of possible mediating factors. One such study is Taylor et al. (2009), which addresses the relationship between interactional behavior and linguistic development by showing how different interactional styles are related to particular linguistic behaviors. The styles they identify in a clean-up task are ‘high negative control’ (using displays of anger or annoyance, criticizing, forcing, punishing etc.) versus ‘control’ (directing, prohibiting or bribing) versus ‘guidance’. Importantly, all mothers use all of these strategies, just in different amounts. The different communication styles are statistically associated with


certain other linguistic behaviors, such as numbers of questions, controls and negative controls, as well as with the mean length of utterance and the number of different words. The authors furthermore investigate the effect of the speaking styles on children’s grammatical, semantic and pragmatic competence and find significant differences throughout, which are largely due to differences in the amount of negative control: “mothers who exhibited relatively high amounts of negative control during the session were paired with children who were consistently less advanced in their language usage” (Taylor et al., 2009, p. 397). In contrast, children of mothers who focused on asking, persuading and suggesting had the highest mean length of utterance and the highest numbers of bound morpheme types. Still, children whose mothers used commands were found to be more advanced on the pragmatic measures (Taylor et al., 2009, p. 398). Individual differences between children have also been suggested, mediated by birth order and number of siblings. Thus, a tendency has been suggested for first-born Western middle-class children to learn language in an analytic way, carefully combining known words, whereas children from larger families tend to learn in a holistic way, starting off with larger chunks into which new words are inserted. In this account, the environment is taken to play a considerable role (Dabrowska, 2004). Pine and Lieven (1993), however, argue that the style differences identified between referential and expressive children are in fact due to two different strategies for analyzing their caregivers’ utterances and producing their own, which do not exclude each other: While one strategy is to identify individual words and to combine them in novel ways, the other is to identify larger chunks with flexible slots and to insert new material into these slots.
Thus, the authors argue that the differences in learning styles are actually different styles for breaking into language structure. Still, these different strategies may depend on children’s learning environments. Furthermore, parents’ socio-economic background has been found to have an impact on the ways of speaking by means of which parents interact with their children (e.g. Bates et al., 1995; Fernald and Weisleder, 2011). For instance, Huttenlocher et al. (2002) find that socio-economic status is initially the main predictor of preschoolers’ comprehension of complex sentences; however, after one year of preschooling, the amount of complex constructions in their teachers’ talk becomes the most influential factor (Huttenlocher et al., 2002, p. 366). Kaye (1980) finds an effect of education: Mothers with more education are found to use more varied utterances to their infants (Kaye, 1980, p. 498). While in his study children’s linguistic capabilities in production and comprehension could not be related to any measures of their mothers’ speech in weeks 6–26, mothers’ educational level was a good predictor of children’s linguistic capabilities at age two (Kaye, 1980, p. 499).


Socio-economic status has also been found to have an impact on parents’ naïve theories about children’s cognitive development and needs and the timing of cognitive stimulation. Ninio (1979), for instance, elicited 104 Israeli mothers’ naïve theories about their children’s cognitive development and found significant differences depending on socio-economic status (SES). In particular, low-SES mothers believed their children to develop certain cognitive skills much later than high-SES mothers did, and they also considered cognitive stimulation for their children, such as starting to feed the self or buying the first book, necessary only at later points in time. Vogt and Mastin (2013) investigate the impact of cultural expectations on children’s linguistic socialization in Mozambique. In particular, they compare the amounts of speech, communicative gesture and co-speech gesture children are exposed to in urban versus rural communities, which differ in the value they place on children’s verbal skills; in rural communities, children are expected to develop motor skills and to master social rules, whereas in urban communities a higher value is placed on higher education and thus on verbal skills. The authors find significant differences in the amount of utterances children hear and in the amount of communicative gestures caregivers produce for them, such that children in urban communities consistently receive more input. The amount of utterances children heard at 1;1 was furthermore significantly associated with their expressive vocabularies at ages 1;5 and 2;1. Thus, cultural values and expectations may affect the amount of CDS children receive and accordingly their linguistic development. Rowe (2008) investigated the interactions of 47 parent-child dyads when the children were 2;6.
She replicates the above findings by demonstrating that CDS is influenced by the parents’ socio-economic status and that the respective characteristics of CDS are significantly related to children’s vocabulary skill a year later. Moreover, she finds that socio-economic status is significantly related to knowledge about child development and certain beliefs about children, which mediate the relationship between socio-economic status and CDS. Her results and Ninio’s (1979) consequently provide a link between the findings on the role of SES for caregivers’ naïve theories about child and infant needs and development on the one hand and their linguistic behavior towards their children on the other. Fernald and Weisleder (2011) also report considerable effects of socio-economic factors on children’s linguistic development. They demonstrate that in families in which socio-economic factors allow caregivers much less time and energy to engage in direct one-on-one encounters with their children, much less CDS can be observed, which demonstrably correlates with delays in language acquisition in these families by the time infants are 18 months old. There are thus no qualitative differences in speech to children in different social groups, but social classes differ regarding the sheer quantity with which children are exposed to CDS. That the decisive factor is in fact the quantity, not the quality, of the child-directed speech that


children get to hear is supported by the comprehensive study by Hart and Risley (1995). The authors observe that depending on how much CDS children are exposed to, their own productive vocabularies develop at different speeds, with the gap continuously widening. The amount of words children heard varied between 616 words per hour in welfare families and 2,153 words per hour in professional families. These numbers correspond to cumulative vocabularies at age 3 of 525 words for children in welfare families and 1,116 words for children in professional families. Socio-economic status thus seems to have an impact on how much time caregivers spend on talking to their children in one-on-one interactions. In addition, caregivers’ socio-economic status may be related to scarcity, and limited resources may also have effects on caregiver-infant relationships. For instance, Scheper-Hughes (1985) investigates maternal behavior in a Brazilian shantytown and suggests that the parental speech strategies that have been observed in great detail in Western cultures are “both culture and history bound, the reflection of a very specific and very recent reproductive strategy: to give birth to few babies and to invest heavily in each one” (Scheper-Hughes, 1985, p. 310). In contrast, under circumstances of scarcity, mothers develop other strategies: “to give birth to many children, invest selectively based on culturally derived favored characteristics, and hope that a few survive infancy and the early years of life” (Scheper-Hughes, 1985, p. 310). She therefore concludes that since “this reproductive strategy is characteristic of much of the world’s poorer population today, it would seem that some revision of maternal bonding/maternal thinking as a universal human script is in order” (Scheper-Hughes, 1985, p. 310). Thus, socio-economic factors may have a considerable influence on the way infants are perceived and, relatedly, on the amount of CDS that children are exposed to.
For instance, Bornstein and Putnick (2012) show that the developmental status of a country correlates with the amount of time caregivers spend on cognitive caregiving, for instance, the time spent on book reading or storytelling, naming or counting. Thus, SES, comprising the availability of resources, caregivers’ education, cultural background and naïve theories about infants and children, may influence the amount and the kind of activities parents engage in with their children, which may in turn have an impact on children’s language development.

Siblings

Besides speech by mothers and fathers, Mannle and Tomasello (1987) investigate how older siblings talk to younger siblings and find linguistic behaviors that are less finely tuned to the children’s linguistic capabilities; in particular, they find significant differences between mothers’ and siblings’ speech with respect to joint attention, object references, recasts, non-acknowledgements, off-topic responses and the amount the younger sibling gets to speak. These data may therefore support


the hypothesis that family members other than the mother provide a greater challenge for the developing child. Siblings differ from fathers, however, in the kind of feedback they give to the infant’s utterances, since siblings mainly respond nonverbally to the young child. This finding is in line with results by Weppelman et al. (2003), who find children as young as four years to employ numerous features of IDS when speaking to their younger siblings; however, not all of the prosodic features observed in the speech of their parents are present in their own speech to their siblings yet.


4.2.2  Boys versus girls

Besides parents’ gender, the addressee’s gender may also play a role. For instance, Albin and Echols (1996) find significant differences in the duration of words and final syllables and in non-final pitch peaks between mothers of girls and mothers of boys (Albin and Echols, 1996, p. 408), such that mothers of girls use longer durations, and thus their speech is generally slower. Kaye (1980) finds longer and more varied utterances to female infants in comparison with male infants, yet this difference disappears when the children are older (Kaye, 1980, p. 498). He suggests that this is because speech to infants is influenced by mothers’ ideas about infants, and that the lack of response from infants on the one hand and the amount of pretense on the side of the mother on the other leave room for variation depending on mothers’ ideas of gender roles and social relationships. Kitamura et al. (2002) analyze the prosodic properties of speech directed to girls and boys at birth and at three, six, nine, and twelve months by Australian English and Thai speaking mothers for several prosodic features. They find considerable gender-specific differences in both cultures. In Australian English, while starting out relatively similar, from the sixth month onwards pitch height is generally higher for girls than for boys (Kitamura et al., 2002, p. 387). This trend is reversed for the Thai mothers; however, girls receive consistently more prosodically variable input than boys (Kitamura et al., 2002, p. 382–383). Australian English infants, in contrast, are addressed similarly with respect to variability. Concerning intonation contours, Australian boys initially receive more falling contours, then soon after considerable amounts of rising contours, and at about 12 months of age, they are mostly addressed with falling contours. For girls, in contrast, the amount of rising contours rises continually after birth.
Thai children are, however, generally addressed with falling or level contours, with an exceptional peak of rising contours for boys between 9 and 12 months of age. While the authors assume the results for Australian English IDS to be due to interaction, they suggest the Thai mothers


to be more attentive to the transfer of social norms and gender-specific behaviors (Kitamura et al., 2002, p. 387). Karrass et al. (2002) find considerable differences in boys’ and girls’ linguistic competence at about one year, which they relate to differing amounts of encouragement of attention in their mothers’ speech. Encouragements of attention are positively related to both language comprehension and production measures, yet girls receive significantly more encouragement than boys when they are one year old. The authors suggest the reason to be that boys are physically more active and that mothers encourage them less verbally because of different expectations concerning boys’ and girls’ development (Karrass et al., 2002, p. 538).


4.2.3  Variation: Developmental finetuning

The speech directed to infants and children has been suggested to vary depending on the age or development of the respective child; that is, different “degrees of babyishness” (Ferguson, 2004, p. 227) have been proposed, which Ferguson takes to correspond to the child’s linguistic development (Ferguson, 1977, p. 228). The question of whether parents adjust their speech based on the child’s linguistic and cognitive development will have important consequences for the discussion of the functionality of CDS, as well as theoretical implications, for instance, regarding the role of conventionalization in CDS. One factor that has been proposed to explain variation in IDS and CDS is thus the different needs children may have at different ages, for instance:

    speech to infants is affect-laden and may have as its primary purpose the establishment of a warm bond between infant and caregiver; speech to young children just learning language is characterized by the lexical and grammatical simplification we noted earlier and appears to be a language-teaching language; speech to school age children becomes grammatically complex and is no longer concerned with teaching the rules of language but, instead, concentrates on the rules and beliefs of the culture. (Gleason and Weintraub, 1978, p. 206)

The phonetic hyperarticulation reported for infant- and child-directed speech, i.e. the exaggeration of the pitch range, the mean F0, the speech rate, the length of pausing, the vowel space, vowel duration and the articulation of fricatives (e.g. Biersack et al., 2005), as well as the exaggeration of tone differences in tone languages (cf. Liu et al., 2009), is generally assumed to peak precisely when the child is most responsive to it (Snow, 1995). Finetuning effects may be found for verbal and nonverbal behaviors, and even in the relationship between the two; Karrass et al. (2002) report increasing maternal verbal encouragement of attention in speech to infants from four months,


over eight months, to one year of age. At the same time, the authors find physical encouragement of attention to decrease. Thus, with increasing linguistic capabilities, nonverbal behaviors tend to become less important. Kitamura et al. (2002) report finetuning for age level for Thai children with respect to pitch height, which is significantly higher for children between three and nine months of age (Kitamura et al., 2002, p. 380). There are no changes with respect to pitch range within this age group or in comparison with speech to adults in Thai, but the number of utterances with rising contours is consistently higher in speech addressed to children in general and at about nine months in particular (Kitamura et al., 2002, p. 381). There are moreover significant differences in the amount of tone information in utterance-initial and -final syllables, starting with significantly less tone information at birth and increasing continuously to adult level at about twelve months of age. For Australian English mother-infant dyads, Kitamura et al. (2002) find significantly higher pitch levels in speech to newborns, which rise even further after three months. Furthermore, they identify finetuning with respect to the intonation contours, which rise more often already in the speech addressed to newborns and even more often after the first three months, peaking at about six months and then declining. They report a slight increase in pitch variability after birth, which is, however, not significantly different from variability in speech addressed to an adult. In contrast, Englund and Behne (2006) find no finetuning effects in Norwegian for fricative duration in the first six months, which is overspecified in mothers’ speech to their 0–6-month-olds as compared to speech directed to the experimenter, and also no finetuning effect for vowel space, which they find to be even underspecified, i.e. smaller than in adult-directed speech.
However, they do find a change in vowel duration over the period of six months, using ten recordings about two weeks apart. Thus, the amount of hyperarticulation and the way it changes over time may be different for each individual linguistic property. Regarding syntactic adjustments, the findings on whether and how people adjust their speech to children’s increasing age and linguistic and cognitive abilities are not unanimous either; Soderstrom (2007) reviews previous work on speech to children and finds very heterogeneous results regarding syntactic measures such as MLU; she suggests that speech to younger children may be substantially different from speech to older children (Soderstrom, 2007, p. 509), yet no clear direction or development can be identified. Snow (1972) analyzes grammatical and discursive finetuning effects in speech to children. In her experiment, she compares the speech of mothers of children aged 2;0 to 3;4 with the speech of mothers of children aged 9;5 to 12;4. The variables she investigates cover the overall quantity of speech measured in number of words, the mean length of utterance, sentence complexity measured as the ratio of

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.

116 Designing Speech for a Recipient

compound verbs and subordinate clauses to the total number of utterances, number of words before the main verb, number of utterances without verbs, number of third-person pronouns, occurrences of partial and complete repetitions, and numbers of reformulations (semantic repetitions). The participants performed three verbal tasks with their own children and with another child from the other age group. The tasks comprised storytelling, tutoring in a sorting task and explaining a physical phenomenon to the child. Snow's results show that mothers adapted their speech to younger and older children to significantly different degrees on all measures with the exception of preverb length, for which the difference was not significant. In addition, she reports 14% of mothers' utterances to two-year-olds to be paraphrases of preceding utterances, three times more than for ten-year-olds (Snow, 1972, p. 563). Similarly, Newport et al. (1977) find finetuning with respect to some measures of syntactic complexity for speech to children that was recorded six months apart (the children were between 12 and 27 months old) (Newport et al., 1977, p. 124). While they find the MLU to increase significantly with the child's age, they argue that the reduced length in the speech to younger children may be due to deletions and thus not the result of reduced complexity; instead, they suggest that mothers have a "gross bias towards brevity" (Newport et al., 1977, p. 123) and that these changes are unrelated to the child's linguistic sophistication (Newport et al., 1977, p. 126).4 The authors also find the type-token ratio and the number of verbs used to change over time, while the other complexity measures do not reach significance. Regarding sentence types, they find uses of the imperative to decrease over time, while the number of instances of the declarative increases (Newport et al., 1977, p. 123). Furthermore, on the discursive level, like Snow (1972), Newport et al.
(1977) report self-repetitions, 'content-less utterances' and references to the child to decrease with the child's age. Murray et al. (1990) find that mothers reduce their MLU over the age range studied (0;3, 0;6, and 0;9) and that the infant-directed utterances in their data set become shorter over time. They also report a positive effect of this shortening such that the amount by which mothers reduce the MLU was found to be predictive of the child's receptive language development at 1;6. Sokolov (1993) investigates the same age range as Murray et al. (1990), yet he finds MLU to increase successively from 0;3 through 0;6 to 0;9 and to be significantly correlated both with receptive and expressive language measures in the second half of the first year (Sokolov, 1993, p. 517).

4. This suggestion is at odds with studies showing the highly reformulative and repetitive nature of talk to children (Küntay and Slobin, 2001; Onnis et al., 2008). Furthermore, the suspected bias towards brevity changes over time, so that even if it were responsible for the changes in MLU, the question would arise as to what causes it in the first place.

In contrast to the studies reported on above, Behrens (2006) finds no finetuning effects at all for selected syntactic features over a period of three years. She investigates a dense German corpus of dialogues between a mother and her son, Leo, over a period ranging from 1;11 to 4;11. During that time, Leo's MLU increases from one to four words per utterance, while his mother's MLU remains constant at about five words per utterance. Behrens (2006) furthermore investigates the distribution of parts of speech in Leo's and his mother's utterances, and while Leo's utterances exhibit considerable developments, his mother's utterances show the same distribution over the whole period of the recording. Moreover, Behrens (2006) investigates different types of noun phrases (simple noun, determiner + noun, pronoun), and again, in spite of the considerable developments in Leo's speech, his mother's utterances show the same distribution over time. Finally, Behrens (2006) considers the distribution of different verb types (auxiliary, modal verb, copula, main verb); while Leo's verbs initially consist of main verbs only, his mother's utterances reveal a constant distribution of verb types over the time of the recording. Behrens argues that the mother's lack of finetuning provides the child with the opportunity to observe a distributional model of the word classes under consideration. This interpretation is supported by the fact that Leo approximates the same distribution as his mother over time. Thus, his mother's distributions provide him with a model for the adult-like distribution of linguistic types. Similarly, Retherford et al. (1981) do not find mothers to "alter the distribution of semantic and syntactic categories in their speech over time despite changes in occurrence and relative frequency of roles in the child's speech" (Retherford et al., 1981, p.
605). That is, mothers "did not proportionally reduce the variety or frequency of their category use in accordance with the child's productive repertoire" (Retherford et al., 1981, p. 607). Instead, the authors find the mothers' use of the semantic roles to be stable over time, whereas the children slowly develop distributions similar to their mothers'; the children thus tune in with their mothers. This is also the conclusion Gleitman et al. (1984) draw from their re-analysis of the data they considered in Newport et al. (1977): they hold that the children adapt to their parents, not the other way around. Dale and Spivey (2006) use a computational method from document comparison, called recurrence analysis, to assess the amount of syntactic coordination in child-caregiver interactions. They compare bigrams calculated on the basis of part-of-speech tagged dialogues from three corpora of parent-child interactions in the Childes database (MacWhinney, 2000). In particular, they compare the sequences of word classes in parents' and children's utterances in temporal proximity. Their results suggest that not only do parents finetune to their children's linguistic competence, but that children also finetune to the linguistic properties of their parents' utterances, indicating a bi-directional process of mutual alignment in dialog. Furrow and Nelson (1986) approach finetuning from the perspective of what they expect the child to need. They outline three criteria that linguistic input for language learning, i.e. the properties of "an effective teaching language" (Furrow and Nelson, 1986, p. 175), should fulfill: "(i) it would use language within contexts that are readily interpretable by the child on other grounds. Our research (…) suggests that these contexts are highly familiar and well-structured event sequences. (ii) It would adjust the length of utterances to the current ability of the child to integrate linguistic information. Thus MLU would be short at the beginning of language learning and would increase with the child's growing competence (…). (iii) It would be sensitive to the child's understanding, and would expand and recast utterances as needed to facilitate that understanding" (Furrow and Nelson, 1986, p. 176). This approach suggests that finetuning to the respective child's linguistic capabilities does not necessarily mean speaking like the child; thus, while, for instance, Retherford et al. (1981) equate what they call "the finetuning hypothesis" with mothers matching their children's linguistic characteristics (Retherford et al., 1981, p. 605), adjusting to a learner does not necessarily imply that the adult has to use similar or even the same linguistic characteristics as the child, either with respect to distribution or to the types of characteristics used. For instance, Sethuraman (2004) presents an analysis of finetuning regarding the presentation of input for verb-construction associations. In line with suggestions by Goldberg et al.
(2004) that children learn constructional meanings better on the basis of prototypical verbs that represent the meaning of the respective construction best, she finds that younger children indeed receive linguistic input that is based on the prototypical verb, and that the verb-construction associations in parents' speech become more varied over time. Similarly, in a large-scale study of several English corpora of the Childes database, Zeschel (2013) shows that while the relative proportions of constructions remain the same in mothers' speech over time, for three of the most frequent argument structure constructions the associations between verbs and constructions are initially statistically biased towards a single representative light verb and become increasingly more varied over time. Methodological problems in the study of finetuning may be caused by interpersonal differences and adjustments at the micro level; Pan et al. (1993) investigate the MLU of the five longest utterances occurring in a ten-minute interval and the number of different words produced in five minutes as measures of complexity. The results show highly specific parent-to-child tuning. Investigating the speech of larger speaker groups may thus be too coarse-grained for finetuning to become apparent. Snow therefore argues that measures like MLU may be too global to account for differences in finetuning (Snow, 1995, p. 183).

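Micro-level measures of this kind are simple to state concretely. The following sketch (an illustrative reconstruction in Python, not Pan et al.'s actual procedure; the function names and toy data are invented, and utterances are assumed to be pre-tokenized word lists) computes the MLU of the five longest utterances in a sample and the number of different words:

```python
def mlu_longest(utterances, n=5):
    """Mean length (in words) of the n longest utterances in a sample."""
    lengths = sorted((len(u) for u in utterances), reverse=True)
    top = lengths[:n]
    return sum(top) / len(top)

def different_words(utterances):
    """Number of distinct word types in a sample (case-insensitive)."""
    return len({w.lower() for u in utterances for w in u})

# Toy example: one caregiver's utterances from a single observation window.
sample = [
    ["look", "at", "the", "ball"],
    ["ball"],
    ["where", "did", "the", "ball", "go"],
    ["there", "it", "is"],
    ["you", "found", "the", "ball"],
    ["yes"],
]
print(mlu_longest(sample))      # MLU over the five longest utterances
print(different_words(sample))  # lexical diversity of the window
```

On real data the windows would be defined by timestamps (ten and five minutes respectively), and MLU is often counted in morphemes rather than words.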
This hypothesis is supported by results from Roy et al. (2009) from the Speechome project, in which the project coordinator recorded all conversations occurring in his own household for three years, starting with the birth of the baby. Regarding the finetuning of the MLU, they find that all three of the child's caregivers (father, mother and nanny) adapt the mean length of their utterances not only over time, but specifically for each new word the child is about to learn. The authors use the 'word births' of new vocabulary in the child's speech as an anchor point and align caregivers' utterances containing that word with this point in time. They demonstrate that caregivers successively shorten the MLU of utterances containing the word the child is about to learn until the child first uses this word, and then produce longer utterances containing it. So while on the whole the caregivers' adaptations in MLU seem to be only slight, the bird's-eye view of the whole vocabulary hides the extremely finely tuned adaptations at the micro-level. In addition to the results on syntactic finetuning discussed so far, there are findings regarding semantic and pragmatic adaptations. For instance, Veneziano (2001) finds significant changes with respect to the amount and type of displacement and informativeness in child-directed speech. She analyzes the speech of four mothers to their children, whom she divides into two age groups depending on the moment when references to the past and justifications (which also constitute displaced information) are first employed by the child. The first group consists of children ranging from 1;3 to 1;6/1;9, the second from 1;6/1;11 to 2;3 (Veneziano, 2001, p. 343).
The speech of the mothers to their children in the two groups differs significantly especially in the amount of references to the past and to the immediate future, suggesting that the caregivers' use of displaced, informative language is sensitive to the child's development (Veneziano, 2001, p. 351). To sum up, CDS seems indeed to vary in relation to the respective children's developing cognitive and linguistic capabilities. While adjustments over time could not be observed for all linguistic features, in some cases these null findings could be explained by methodological problems (as in the case of MLU), while for others caregivers' apparent failure to adapt to their child's linguistic capabilities was suspected to provide the child with useful models of the target.

4.2.4  Variation: Situation

Snow (1977) reviews research on situational variation in mothers' speech and concludes that the type of activity mothers and children are engaged in has a large effect on the realization of CDS (Snow, 1977, p. 36). She argues that mothers' speech tends to vary in terms of simplicity and redundancy depending on the communicative demands of the respective situation (Snow, 1977, p. 37).


Snow and Goldfield (1983) investigate to what extent the highly repetitive activities in which caregiver–child dyads are often engaged contribute to language learning, and in particular, how frequent recurrent routines are and whether children make use of them. They investigate 13 book-reading interactions between a mother and her son, aged 2;5 to 3;4, over a period of eleven months. The book that was being read is a 'storybook dictionary', and the number of pages read per session ranges from 17 to 59; altogether 610 pictures were discussed in that period, and many pictures were discussed several times, two pictures even eight times. Snow and Goldfield find that in 37.7% of those cases in which a topic recurs, the child takes the initiative and reproduces the same information previously provided by his mother. Thus, "the child utilized the strategy of saying what he had heard others say in precisely the same situation" (Snow and Goldfield, 1983, p. 566). The authors conclude that "situation-specific utterances, routinization of situations and predictability of adult utterances from situation are the crucial variables determining optimal usability of the linguistic environment" (Snow and Goldfield, 1983, p. 567). Conti-Ramsden and Friel-Patti (1987) investigate the influence of the degree of scriptedness of a situation on maternal speech. They compare ten mother-child dyads in three different situations: free play with familiar toys, waiting together (without toys), and introducing a new toy. The children were between 2;3 and 2;9. The authors find significant differences between situations in the children's MLU, utterances and initiations, as well as in the proportion of semantically contingent utterances, mean number of exchanges, and mean number of turns per conversation. However, no changes were observed for mothers' MLU.
Thus, the authors conclude that the degree of scriptedness of the activity mostly influenced the child's contributions to the interaction. A similar study was carried out by Yont et al. (2003). They investigated mothers' and twelve-month-old children's utterances in two different situations, book reading versus toy play, two of the most well-researched situations of caregiver–child interaction. They recorded interactions between 25 one-year-olds and their mothers in their homes; the toys and the book were provided by the researchers. The interactions were analyzed with respect to the communicative intents expressed in the participants' utterances. In contrast to the findings by Conti-Ramsden and Friel-Patti (1987), Yont et al. (2003) find significant differences in mothers' speech according to situation: mothers direct their children's attention significantly more often in the toy play situation than during book reading. Furthermore, mothers negotiated the immediate environment and discussed recent events significantly more often in toy play than in book reading. Finally, mothers evaluated (marked) children's actions significantly more often during toy play. In contrast, mothers established joint attention more often
during book reading and discussed present events more often in book reading than in toy play situations. For the children, significant situational differences can be found with respect to directing the hearer's attention (more in toy play), establishing a joint focus and discussing a present event (more in book reading). Thus, both mothers' and children's communicative intents differ according to the situation, which is accompanied by significant linguistic differences in the children's language: They produce longer utterances and more words during toy play than during book reading. Walker and Armstrong (1995) find significant differences in the amounts of wh-questions, yes/no-questions, imperatives, directives, and amount of response depending on the situation. They distinguish between situations in which either directing the child's behavior or eliciting conversation is at issue, but in fact they find significant differences between twelve of the 32 situations investigated, such as feeding, bathing, dressing, book reading, free play or doing a jigsaw puzzle. D'Odorico and Franco (1985) also find significantly different utterances in different interactional activities. They created video recordings of a mother interacting with her 9-month-old infant and analyzed joint activities such as 'mother showing the toy' or 'infant is interested in the toy (looks at the toy or manipulates it) and the mother participates in manipulating the toy' (D'Odorico and Franco, 1985, p. 571). Then, the mother's utterances in each of these activities were analyzed concerning sentence type and semantic content. In a further step, 48 participants were asked to associate the mother's utterances with drawings representing the activities identified. Participants associated the different activities uniformly with certain linguistic behaviors. The authors conclude that CDS is determined by the interactive rules operating inside the mother-infant dyad (D'Odorico and Franco, 1985, p. 582).
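Situational comparisons of this kind come down to tallying coded utterance properties per situation. A minimal sketch in Python (the data, category labels and function names are invented for illustration; the actual studies used much richer coding schemes and statistical tests):

```python
from collections import Counter

# Each record: (situation, communicative intent, utterance length in words).
# All values below are hypothetical toy data.
records = [
    ("toy_play", "direct_attention", 4),
    ("toy_play", "direct_attention", 3),
    ("toy_play", "evaluate_action", 5),
    ("book_reading", "establish_joint_focus", 6),
    ("book_reading", "discuss_present_event", 7),
    ("book_reading", "establish_joint_focus", 4),
]

def intent_frequencies(records, situation):
    """Frequency of each communicative intent within one situation."""
    return Counter(intent for sit, intent, _ in records if sit == situation)

def mean_length(records, situation):
    """Mean utterance length (in words) within one situation."""
    lengths = [n for sit, _, n in records if sit == situation]
    return sum(lengths) / len(lengths)

print(intent_frequencies(records, "toy_play"))
print(mean_length(records, "book_reading"))
```

Per-situation tallies like these are the raw material on which the significance tests reported above would then be run.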
That situations of learning, and the consistency with which certain linguistic behaviors are produced in these situations, play a role in language acquisition is demonstrated in a recent study by Roy et al. (2012), who investigate the relationship between certain activity contexts and word learning and find that words occurring consistently and frequently in certain activities are learned earlier. These results suggest that the temporal, spatial and topical aspects of situations contribute to the child's lexical acquisition. To conclude, interaction with children and infants is not a single, identifiable situation; the communicative situation is therefore not sufficiently defined by the participants in the interaction, i.e. by the fact that an adult is talking to an infant or young child. Instead, the activity participants engage in influences the way adults talk to their young communication partners as well as the children's contributions to these activities. Nevertheless, the variation according to situation observed does not exclude the possibility that people will still talk to children in all of these
different activities differently than when they are talking to other adults.5 The problem is that for methodological reasons, comparisons between CDS and speech to other adults are restricted to a few comparable situations; several of the activities common between caregivers and children (such as bathing, dressing, feeding, soothing, book reading) are just not appropriate among adults or are carried out under very different conditions. Within CDS, however, significant differences can be found depending on situational factors, such as the activity caregivers and children are engaged in.


4.3  Functionality

Ferguson (2004) notes that "we generally feel apologetic or embarrassed if caught engaging in unmistakable baby talk" (Ferguson, 2004, p. 176). All the more interesting are the questions of why adults employ a special speaking style when interacting with children and what functions CDS may fulfill. Concerning the functionality of the adjustments made in CDS, several proposals have been made, which, however, do not necessarily exclude each other. Ferguson (1977), for instance, distinguishes between three types of functions of CDS: simplifying or reducing processes, clarifying processes, and expressive and identifying processes. Examples of simplifying or reducing processes are the replacement of "more difficult, complex, marked" sounds with "easier, simpler, less marked sounds" (Ferguson, 1977, p. 216), but also "reduction of inflections, the use of an all-purpose auxiliary make with BT nouns and interjection-like words in place of inflected verbs, and the replacement of second person pronouns by other forms of address" (Ferguson, 1977, p. 218). He suspects that CDS corresponds to the "perceptual sensitivities of the child" (Ferguson, 2004, p. 178). Examples of clarifying processes are the large number of repetitions, the more carefully enunciated style and reformulation (Ferguson, 1977, p. 222), but also the exaggeration of intonation contours (Ferguson, 1977, p. 223). Finally, expressive and identifying processes comprise the use of diminutives and hypocoristics (Ferguson, 1977, p. 224). Brown (1977) subsumes Ferguson's simplifying and clarifying processes under the heading of communicative functions, which he contrasts with affective functions. While, however, the simplifying and clarifying processes suggested by Ferguson (1977) and the communicative functions suggested by Brown (1977) may serve functions with respect to facilitating both communication and language acquisition, other authors distinguish between functions enhancing communication and functions facilitating language acquisition (e.g. Newport et al., 1977). In the following, we shall therefore address affective functions, communicative functions and functions concerning language acquisition.

5. Similar conclusions can be drawn for the other factors influencing CDS investigated, such as speakers' gender, mental health and socio-economic status, child's gender and child's age and development.


4.3.1  Affective functions

That IDS and CDS fulfill mainly expressive and affective functions is suggested by a study by Hirsh-Pasek and Treiman (1982), who compare speech to children, as described in the literature, with speech to dogs. Since the authors find CDS to be structurally similar to dog-directed speech (concerning utterance length, sentence type, verb tense, well-formedness, repetition, sentence complexity, and the use of tag-questions), they suspect neither of the two varieties to have a language learning function, nor to be based on the child's cognitive abilities; instead, they hold the adjustments to be due to some kind of social responsiveness of the communication partner. However, Hirsh-Pasek and Treiman (1982) do find functional differences between child-directed and dog-directed speech, especially with respect to the use of deixis, which is used in a tutorial fashion in CDS, yet not in speech to dogs. Thus, the authors conclude that CDS may also have other functions, besides the main social-affective ones. DePaulo and Bonvillian (1978) distinguish between possibly facilitative features of CDS that promote linguistic development, and those that do not play an obvious facilitative role in the child's language learning – like "the squeaky voice, the phonological distortions, the diminutives, the unique lexicon" (DePaulo and Bonvillian, 1978, p. 205). They suspect that the child has to abandon the latter in the process of becoming a proficient speaker and that the use of a restricted lexicon may prevent children from acquiring "a more differentiated vocabulary" (DePaulo and Bonvillian, 1978, p. 205). Thus, they assume that CDS must have affective purposes since they hold so many of its features to be counterproductive to language acquisition. Another argument for affective functions may be that infants have been found to prefer infant-directed speech over adult-directed speech (Fernald, 1985). This seems to be the case with respect to both their mother tongue and cross-linguistically; Werker et al. (1994) find both English and Cantonese infants at four and a half and at nine months to prefer Cantonese infant-directed speech over adult-directed speech. The effect has been attributed to the higher pitch of infant-directed speech since infants have also been found to prefer higher-pitched singing (Trainor and Zacharias, 1998). In their experiments, Trainor and Zacharias (1998) aim to identify what causes infants' preference for IDS. They tested 16 six-month-old infants on children's songs sung in higher or lower pitch by four different
women with semi-trained voices. They find that infants look significantly longer in order to listen to the higher-pitched versions (Trainor and Zacharias, 1998, p. 801). In order to determine whether infants react to a possibly less comfortable singing range rather than to higher-pitched singing, i.e. whether the results may be due to differences in timbre, the experiment was repeated with four women who had different comfortable singing ranges. The results are identical to those of the first experiment: infants look significantly longer toward the higher versions. One reason for the development of IDS/CDS may thus be that children simply prefer the high pitch associated with this speaking style. Nowik-Stern et al. (1996) report a similar preference for IDS by premature infants (born at 24–38 weeks of gestation and tested at 4–8 months corrected age). Infants' looking times were significantly longer for IDS (31.17 secs) than for ADS (23.13 secs). However, the authors find the premature infants to be less emotionally responsive to IDS. While Fernald (1985) suggests that infants' preferences for motherese may have a considerable function with respect to attention management, the prosodic properties of IDS may fulfill important care-taking functions, such as soothing and comforting, prohibiting or expressing approval (Fernald, 1992). That the prosodic adjustments of CDS play such a care-taking role is supported by the fact that exaggerated intonation contours and high pitch can be found across languages (Fernald, 1992, pp. 398–399). That a considerable function of IDS and CDS could be an affective function is supported by studies showing that infants interpret affective utterances in the intended ways. Lam and Kitamura (2006), for instance, investigate whether infants are able to discriminate affective intent in IDS utterances.
Their results indicate that 3-month-old infants prefer comforting utterances, 6-month-olds prefer approving and comforting utterances over directing utterances, while 9-month-olds show no preferences for any particular affective intent. These results suggest that the adaptations play a role with respect to their affective message especially in IDS. However, infants' preferences for IDS do not necessarily have to be based on affective grounds; Segal (2010) investigates twenty 11-month-old infants' preference for structural simplifications in speech, in particular shorter MLU, repetition and lexical variation. She finds infants to listen longer to the simplified speech, irrespective of the prosodic realization. Thus, infants also exhibit preferences for the structural properties of IDS. That infants are able to extract communicative functions from IDS also becomes apparent from the study by Spence and Moore (2003); they analyze infants' capabilities to discriminate prosodic contours in low-pass filtered maternal utterances. They find that this ability develops between the fourth and sixth month of age; by six months, infants are sensitive to different prosodic contours associated with, for
instance, soothing and approving.6 These results indicate that children are indeed able to extract communicative information from the prosodic characteristics of IDS. That affective, interpersonal or social functions are involved in CDS is supported by relationships between speakers' interpersonal skills and their amounts of adaptations to infants and children; in particular, Newman-Norlund et al. (2009) find adjustments to child addressees in a novel, nonverbal communication situation to be correlated with 'empathy' and 'need for cognition' personality trait measures. To conclude, particularly the prosodic properties of IDS and CDS may be related to interpersonal, affective functions; yet, the same features are also likely to fulfill interactional and communicative functions.

6. However, infants' ability to distinguish musical contours may actually develop earlier than the study suggests, since the four-month-olds are found to discriminate rising or falling contours in music. The results may thus be due to the filtering method used; see Knoll et al. (2009).

4.3.2  Communicative functions

Another explanation for the adjustments observed in CDS is that caregivers are engaged in particular communicative tasks. For instance, Snow (1977) proposes that "adults' persistent attempt to communicate creates the features of motherese" (Snow, 1977, p. 37). This position is also apparent in Brown's suggestion: if you "concentrate on communicating, everything else will follow" (Brown, 1977, p. 26). That the features of CDS are functional with respect to facilitating communication, rather than accidental, conventional or related to infants' preferences, is suggested by a study by Newman-Norlund et al. (2009), which shows that participants adjust novel, nonverbal communicative behaviors to a child addressee similarly to the linguistic adjustments reported on above. Thus, participants take their decisions on a functional basis with the aim to facilitate communication, rather than on the basis of convention. Furthermore, the authors find these behaviors to correlate with 'empathy' and 'need for cognition' personality trait measures (see above). McDonald and Pien (1981) identify interpersonal differences between mothers; they argue that these differences can be related to different communicative intentions, namely to direct their children's behavior versus to engage them in conversation. They show not only that mothers predominantly pursue the one or the other communicative goal, they also demonstrate that numerous linguistic features correlate with these two intentions. Thus, McDonald and Pien (1981) hold that many of the characteristics of CDS are due to the communicative intents to direct or to engage in conversation respectively. At the same time, they argue that shortened MLU and talkativeness are not related to the two communicative
intentions, and that caregivers may thus additionally simplify their utterances to enhance communication (McDonald and Pien, 1981, p. 355). Similarly, Fernald (1992) finds prosodic contours to correspond to specific communicative functions cross-linguistically, which indicates that the prosodic adjustments caregivers make help the child to interpret the communicative intention of the caregiver. Freed (1981) suggests that mothers' use of self-repetitions is motivated by their effort to capture the attention of an inattentive child (Freed, 1981, p. 31), and that in general their underlying intent is to direct their children's behavior (Freed, 1981, p. 35). Thus, in her view too, the discursive modifications observable in IDS and CDS are due to caregivers' communicative intentions, and attention-getting and -holding functions have also been suspected for several of the prosodic characteristics of CDS (e.g. Fernald, 1985). Thus, in this view, the characteristics of CDS are mainly due to caregivers' attempts to reach certain communicative goals and to facilitate communication in general.

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.

4.3.3  Facilitating language acquisition

The most debated function of CDS is certainly its possible role in language acquisition. The question of whether input modifications facilitate language acquisition is central to the debate over whether language acquisition relies on language-specific innate capabilities or whether it is an active process based on general cognitive capabilities (cf. Ambridge and Lieven, 2011). Related to this is the question of whether the input is impoverished and thus not a sufficient resource for acquiring language, so that universal, innate mechanisms have to be assumed to guide the language acquisition process, or whether linguistic knowledge is constructed on the basis of the input children receive (e.g. Tomasello, 2003). Thus, much attention has been paid not only to the speech modifications caregivers may make but also to their potentially facilitative effects on language acquisition. The answer to the question of whether the peculiarities of CDS serve to facilitate language acquisition, however, depends on what scholars believe the language acquisition task to consist of: to what extent CDS may facilitate acquisition depends on what the acquisition process is considered to be like. Accordingly, theoretical models of language acquisition may draw attention to different aspects of the input. In particular, three different views on the so-called ‘motherese hypothesis’ can be distinguished: One view holds CDS to be characterized by simplified input that facilitates language acquisition because of the reduced complexity of the input children receive. The hypothesis here is that the simpler the structures, the easier


Chapter 4.  Talking to children 127

they can be acquired. Besides simplification, the restriction of the input to a subset of linguistic structures has also been suggested to facilitate acquisition. This view of the ‘motherese hypothesis’ is discussed, for instance, in Newport et al. (1977) and in Snow and Goldfield (1983). The downside of this suggested function is a possible vulnerability to the poverty-of-stimulus argument, which states that children cannot learn structures that are not present in the input (Richards, 1994, p. 75). From this perspective, the amount of simplification in CDS should correspond to the speed with which children acquire the simplified linguistic features and to how early these features can be found in children’s productions. A second, alternative hypothesis regarding the role of CDS is that the linguistic adjustments are due to adaptations to the child’s linguistic level. From this perspective, the more similar the caregiver’s language is to the productions of the child, the faster the child should acquire language. That is, in this view, caregivers do not simplify their utterances in order to facilitate language acquisition, but instead tune into the child’s own productions. This view accounts nicely for the changes in CDS over time in relation to the child’s growing linguistic knowledge. This perspective on the functions of CDS is taken, for instance, by Retherford et al. (1981); however, the authors themselves argue that mother and child tune into each other. A third view on how the adjustments of CDS may support language acquisition does not take simplified linguistic features as a starting point but instead numerous distributional and interactional aspects of CDS, including the complex ways in which caregivers respond to children; from this perspective, parents’ task is to facilitate their children’s access to linguistic structure (Furrow and Nelson, 1986). 
Research in this view concerns, for instance, the role of frequency, the role of variability, the use of certain repetitive frames, and the use of prototypical features (skewed input). In addition, in this constructivist perspective on language acquisition, surface regularities are also investigated as potential clues to linguistic structure. As representatives of the first view, Gleitman et al. (1984), for instance, take the “Motherese Hypothesis” to be that “the child learns best from the simplest data” (Gleitman et al., 1984, p. 67). Thus, from this perspective, caregivers’ contribution to language learning consists in reducing the complexity of their utterances. However, in their own data, the authors do not find correlations between particular simplified linguistic features in the input and children’s linguistic development; instead, Newport et al. (1977), whose data they reanalyze, find a positive correlation between parents’ use of auxiliary-fronted yes/no-questions and children’s auxiliary development. Their finding that the frequency of yes/no-questions in the input is correlated with the acquisition of canonical declarative sentences seems to point in a very different direction, namely that the child learns more from more complex utterances (Gleitman et al., 1984, p. 68). The authors conclude that



input modifications (by which they mean simplifications) therefore play a rather insignificant role in language acquisition. Similarly, Hoff-Ginsberg (1986) finds only a few simplifications of child-directed speech that are significantly related to the child’s linguistic development four to six months later. While most linguistic features of caregivers’ speech do not have a significant impact on children’s speech, she reports a correlation between mothers’ numbers of NPs per utterance and their children’s growth in NPs per utterance and in words per noun phrase, as well as in the number of self-repetitions and in verb phrases two months later. Furthermore, the numbers of real and reflective questions mothers produce were positively correlated with the child’s auxiliary development four months later. Moreover, Hoff-Ginsberg (1986) reports a positive correlation between mothers’ use of wh-questions and children’s competence in auxiliary use. Acknowledgements of declaratives, in contrast, were negatively correlated with NP and auxiliary use six months later. The author thus suggests two functions of the adjustments of child-directed speech, a data-providing function, illustrating the regularities in the language, and an analysis-prodding function, illustrated by the positive effect of eliciting responses by means of real and reflective questions; she finds no function of simplification per se. From a computer simulation perspective, too, Rohde and Plaut (2002) argue that simplification is not necessary for language learning, since recurrent artificial neural networks, which they take to be good models of human processors, are characterized by an inbuilt capability to extract simple regularities first. Other studies do not argue that simplification is the key to language acquisition, but rather hold that the input children receive has to match children’s level of linguistic competence. For instance, Snow et al. 
(1987) address the second perspective on language acquisition described above, namely the role of what they call “fine-tuning of the syntactic and semantic complexity of input speech to the child’s level” (Snow et al., 1987, p. 65), by which they thus mean what we have called alignment with the child’s productions. On the basis of a review of previous correlational studies, they conclude that there are only a few indicators of such alignment and that, furthermore, the adjustments are based on what children are able to understand rather than on their productive capabilities. Furthermore, the fact that caregivers adjust their MLU even to prelinguistic infants, although utterance length should not matter to them at all, is taken to show that the observable adjustments are not only due to “fine-tuning to the child’s language” (Snow et al., 1987, p. 70). They furthermore argue that this kind of alignment with children’s productions does not facilitate language acquisition, hypothesizing that it is simply not necessary, as the lack of such adaptations in other languages indicates (cf. Ochs and Schieffelin, 1995). Instead, they propose that semantic contingency



is the crucial factor causing the observable adjustments. That is, they argue that caregivers “respond predictably and reliably to infant behaviors”, which provides “the perfect way for the infant to learn how to use these behaviors to control the world” (Snow et al., 1987, p. 74). Studies showing correlations between the amount of contingency between children’s behaviors and mothers’ responses, as well as studies in which increased contingency was successfully used as an intervention to overcome language acquisition delays, serve as evidence for the crucial role of contingent interaction between caregiver and child. The authors furthermore assign a particularly significant role to maternal recasts and other juxtapositions between the child’s utterances and the mother’s responses (Snow et al., 1987, p. 78–79). Thus, in their view, language acquisition is facilitated by caregivers’ creating “discourse around the child’s utterances” (Snow et al., 1987, p. 79). To back up their hypotheses, the authors carry out an empirical investigation of the complexity of five children’s and their mothers’ utterances in routinized versus non-routinized situations. As predicted by their model, the authors find more complex utterances by both mother and child in the routinized situations. They conclude that it is the social and discursive embedding of children’s utterances that allows language acquisition to happen. Thus, they make a strong claim for the important role of interaction in language acquisition. However, numerous studies have meanwhile established significant correlations between caregivers’ speech adjustments at one point in time and children’s linguistic competence at a later time. 
In the third approach, there is not necessarily a direct correspondence between special adjustments in caregivers’ speech and certain features in children’s productions; for instance, Furrow and Nelson (1986) take the function of CDS to be to generally allow the child to interpret the utterances they hear: “children will find it easier to interpret utterances semantically if they are couched in motherese-type utterances and thus will find it easier to learn the grammar of the language” (Furrow and Nelson, 1986, p. 165). Correspondingly, there need not be direct correspondences between particular linguistic features in the input provided by the mother and the child’s subsequent linguistic productions. Fernald (1992) illustrates this point by demonstrating how the intrinsic perceptual and affective salience of the prosodic properties of CDS, which are intuitively preferred by the child (Fernald, 1985), can be used to modulate attention, arousal and affect and thus to convey certain communicative intentions and emotions. These functions support message understanding and constitute regular sound-meaning correspondences for the child. Finally, the prosodic features of CDS help the child to segment and identify linguistic units in the speech stream (Fernald, 1992, p. 403). Thus, affective, communicative and facilitative functions interact with each other in supporting the child in interpreting verbal interaction.



The general facilitative function of CDS is supported by studies of the progress made by children who hear more or less CDS. Huttenlocher et al. (2002), for instance, investigate interpersonal differences in the amount of input a child receives and their relationship to the child’s linguistic competence, thus contributing to the discussion of the influence of input on language acquisition. In order to exclude genetic explanations, they furthermore carry out a study on the effect of children’s preschool teachers’ linguistic behavior. In particular, the authors investigate the relationship between syntactic complexity in mothers’ utterances and both the comprehension and the production competence of their children, as measured by both multi-clause sentences and the number of noun phrases per sentence. In the first study, which draws on data recorded by Hall et al. (1984), 34 four- to five-year-old children from different socio-economic backgrounds were recorded in interactions with one of their parents. Recordings took place during children’s natural activities at 15-minute intervals over two days. Results show that neither the socio-economic status of the family nor the situation of recording (home vs. school) has an impact on the number of complex sentences found in children’s speech. In contrast, the proportion of multi-clause sentences in their input turned out to predict quite accurately the children’s speech both at home and at school (Huttenlocher et al., 2002, p. 349). In a second experiment, Huttenlocher et al. (2002) replicate the findings from the data elicited by Hall et al. (1984) on data produced by 48 mother–child pairs from different socio-economic backgrounds and add a comprehension test to the study. Furthermore, they analyze the number of noun phrases per sentence as a second indicator of sentence complexity in addition to the measure used in the previous study, i.e. the number of multi-clause utterances, and refine the data annotation and processing. 
In this experiment, the authors find the complexity of parents’ speech to be the most important predictor not only of children’s utterance production, but also of their comprehension (Huttenlocher et al., 2002, p. 355). This finding extends to different sentence types (Huttenlocher et al., 2002, p. 357). Regarding the comprehension scores, an influence of the family’s socio-economic status was found. In order to rule out that the correlations between parents’ input and children’s linguistic competence were due to genetic factors, the authors carried out a third investigation, involving 305 preschool children and their teachers. Children’s comprehension was tested twice, at the beginning of the school year and at the end. Teachers were recorded once in the middle of the school year for approximately two and a half hours. Results show that children’s linguistic competence was initially significantly related to their socio-economic backgrounds, yet their syntactic growth over the school year was not. Instead, the main predictor of children’s comprehension of complex sentences at the end of the year was the proportion of complex sentences in their teachers’ speech (Huttenlocher et al., 2002, p. 366).



Together, the three studies presented by Huttenlocher et al. (2002) show that the properties of children’s linguistic environment have a significant impact on the properties of their own speech. Thus, the authors conclude that aspects of parental input at an earlier age may be related to input variation at a later point in time (Huttenlocher et al., 2002, p. 344). Another comprehensive study of correlations between maternal speech and children’s progress in language acquisition is Pan et al. (2005). The domain they focus on is lexical acquisition in 108 children over a period of approximately two years. Interactions between mothers and their children were recorded when the children were about 14, 24 and 36 months old. Books and toys were provided by the researchers. The numbers of word types and tokens produced by the children during the sessions were analyzed, as well as the mothers’ word tokens and types and pointing gestures. In addition, demographic factors and mothers’ education, vocabulary and literacy skills were assessed, as well as possible depression levels. Pan et al. (2005) find the diversity of vocabulary in the maternal utterances, mothers’ language and literacy skills and mothers’ mental state (degree of depression) to correlate significantly with children’s vocabulary growth. The authors conclude that “there are measurable factors besides communicative input that enhance or impede child language learning” (Pan et al., 2005, p. 779). Furthermore, the number of pointing gestures by the mothers correlated with vocabulary growth. In contrast, quite unexpectedly (see Hart and Risley, 1995; Fernald et al., 2012), the number of word tokens did not have an influence. However, as had been previously suspected (e.g. Furrow et al., 1979), maternal behavior had different amounts of impact at different times; for instance, the effect of mothers’ linguistic diversity was strongest around children’s second birthday. 
Thus, the quality of the input children receive has a considerable impact on their linguistic development and may have a stronger effect at some times than at others. Similarly, studying mothers’ speech to the siblings of delayed children, Cross et al. (1985) find that there are some properties of mothers’ utterances that can be reliably associated with language development: While a contiguous semantic relationship between children’s and mothers’ utterances is positively associated with rate of development in early stages, the amount of disfluency and unintelligibility in mothers’ utterances is negatively associated with linguistic competence (Cross et al., 1985, p. 259). Another set of studies establishing direct links between features of the input and the child’s language acquisition comes from the Speechome project, in which about 70% of the total input the child received was recorded over a period of three years (Vosoughi et al., 2010b). The authors investigate the effects of F0 variation, intensity and duration on word learning. They use a large 400,000-word sample of child-directed utterances from the Speechome corpus to analyze the relationship



between the prosodic properties of words in caregivers’ speech and the age at which the child acquired these words. Regarding duration, they first extracted the average duration of each vowel for all words in the corpus and then calculated the difference between every single word occurrence containing the respective vowel and the average relative duration, for the 461 words in the corpus that the child actively produced by the end of month 24. Regarding intonation, they calculate the average F0 for each utterance and then compare it to the average F0 value of each word, as well as to the maximum and minimum F0 within the word. Regarding intensity, the authors proceeded in a similar way as for F0. The results show that there are significant correlations between the prosodic features investigated and the age of acquisition of each word in the child’s vocabulary, especially for nouns and adjectives. The strongest effect is caused by intensity, yet all three prosodic features together yield a highly significant correlation of r = −.44. Interestingly, the correlation between the individual prosodic features is not very high, ranging from r = .10 to r = .22; thus, the features complement each other in their predictive power concerning age of acquisition. These large-scale results indicate that the prosodic features of CDS do have an impact on child word learning (Vosoughi et al., 2010b). In a follow-up study, Vosoughi et al. (2010a) add word frequency, recurrence (within a fixed time interval, initially of 600 seconds, though 51 seconds produced the largest correlations) and MLU to the prosodic features. They find the features investigated to be relevant to differing degrees for different word classes. For instance, frequency, intensity and MLU were found to be the most important factors in learning nouns, which suggests that nouns are learned best when they occur frequently in short utterances with prosodic emphasis. 
In contrast, the acquisition of verbs and function words was influenced most by recurrence, which corresponds to the observations reported on above that CDS is highly repetitive. In addition to studies demonstrating the effect of naturally occurring linguistic input for language acquisition, recent studies show that children can learn linguistic structures from specially structured input. One such study is Brooks and Tomasello (1999) on passives; while children normally acquire the passive at around four, in their experiments children as young as three years learned to apply this construction to novel verbs due to an increased frequency of this construction in the input. Another such study is Valian and Casey (2003), who demonstrate that children aged 2;6 to 3;2 can learn wh-questions efficiently from structured input; they test the effects of modeling on the one hand and implicit correction on the other and find both to be effective for children to learn the construction and to identify the syntactic regularities.
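The word-level correlational logic of the Speechome analyses described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: all measurements and the word list are invented, and the feature definitions (relative vowel duration, word F0 minus utterance-mean F0, relative intensity) are simplified stand-ins for the measures reported.

```python
# Sketch of correlating per-word prosodic prominence in caregiver speech
# with the child's age of acquisition for each word. All data are invented.
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-word measurements:
# (relative vowel duration, word F0 minus utterance-mean F0 in Hz,
#  relative intensity, age of acquisition in months)
words = {
    "ball":  (1.30, 25.0, 1.20, 14),
    "dog":   (1.25, 30.0, 1.15, 15),
    "milk":  (1.10, 12.0, 1.05, 18),
    "shoe":  (1.05,  8.0, 1.00, 20),
    "chair": (0.95,  3.0, 0.90, 23),
}

ages = [v[3] for v in words.values()]
for i, feature in enumerate(["duration", "F0 offset", "intensity"]):
    values = [v[i] for v in words.values()]
    print(f"{feature}: r = {pearson(values, ages):.2f}")
```

With these invented numbers, each feature correlates negatively with age of acquisition, i.e. words produced with more prosodic prominence are acquired earlier, which mirrors the direction of the reported effects.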



That caregivers may intuitively present new words to children in ways that allow children to learn them successfully is shown in a study by Weizman and Snow (2001), who demonstrate that lexical learning can be related not only to the number of different words in mothers’ speech, but also to the ways in which new words are presented to children. The authors distinguish between instructive, helpful and neutral situations in which a word is introduced. They examine five different situations: toy play, play of an educational game, mealtime, reading from an informative book and reading from a story book. The 53 mothers, who all come from low-income families, did not introduce new words in the book reading sessions other than the words occurring in the book; in contrast, particularly joint meals and toy play (especially with the educational toy) yielded new words and helpful and instructive word learning situations. The authors show that the mothers’ density of sophisticated words, as well as the number of helpful and instructive interactions, predicts children’s lexical knowledge in general, as measured by the Peabody vocabulary test. In contrast, mothers’ education and children’s non-linguistic IQ did not play a role in children’s lexical comprehension (Weizman and Snow, 2001). That helpful adjustments to children do not always have to be linguistic is apparent from a study by Karrass et al. (2002), who find correlations between the extent to which mothers physically and verbally encourage the attention of their four-month-old infants and the infants’ linguistic competence at four months and at one year, where competence measures concern comprehension, compliance and production. Hurtado et al. (2008) address not just whether, but also how CDS influences language acquisition; in particular, they show how CDS affects lexical learning and processing efficiency and speed. 
They investigate mothers’ speech to their 18-month olds in relation to their children’s processing efficiency and vocabulary knowledge at 18 and 24 months. The quantity, diversity and complexity of the input children received at 18 months were highly intercorrelated; furthermore, children who received more, more diverse and longer utterances both knew more words and were faster in word recognition at 24 months. The authors’ subsequent statistical analyses reveal that vocabulary growth and increasing flexibility in spoken word comprehension together support the acquisition of new vocabulary items. That is, more, more complex and more diverse input has a positive effect both on vocabulary knowledge and on processing efficiency; at the same time, enhanced processing efficiency allows for faster vocabulary learning, and a greater vocabulary enhances processing efficiency (cf. also Marchman et al., 2015). Processing efficiency is also the key to explaining why the mere quantity of CDS has been found to affect children’s vocabularies as well as their processing speed and accuracy (Fernald and Weisleder, 2011; Weisleder and Fernald, 2013). Thus,



the frequent experience of engaging with rich and varied language from an attentive caretaker provides the infant not only with models for language learning, but also with crucial opportunities for practice in interpreting language in real time, thus tuning and strengthening the processing skills used during real-time language comprehension. And as in building any skill, the amount of exercise a child experiences in language processing has enduring consequences for the optimal development of brain mechanisms underlying verbal fluency. (Fernald and Weisleder, 2011, p. 17)

Such findings explain quantitative effects, such as those reported by Hart and Risley (1995), as well as the considerable effects of interventions, for instance those by Ramey and Ramey (2004), who demonstrate that early interventions by means of preschool teaching have long-lasting effects, still measurable when children reach college. Besides correlating input at earlier stages with children’s linguistic competence at later stages, several studies furthermore demonstrate a relationship between mothers’ communicative strategies and children’s linguistic development (e.g. Nelson, 1973). In particular, recasts and questions (see also Hoff-Ginsberg, 1986) have been shown to support language acquisition, while behavior-oriented, directive interaction styles have been suggested to hinder language development. Paavola et al. (2005) analyze the relationship between 27 mothers’ responses to their one-month-old infants and the children’s communicative and linguistic development at one year. While the overall amount of responsiveness to the infant does not seem to play an important role, certain functions of mothers’ responses were found to be particularly effective. In particular, the amounts of mothers’ yes/no-questions with model answers correlate positively with children’s vocabulary comprehension and production eleven months later. In addition, the naming of objects and people was a strong predictor of children’s receptive and communicative competence, while the amount of displaced speech as well as attention-getting devices were inversely related to children’s linguistic and communicative competence around their first birthdays. The authors conclude that “specific dimensions of responsiveness predicted specific aspects of language” (Paavola et al., 2005, p. 191). Goldstein et al. (2010) reach a similar conclusion. They investigate to what extent mothers’ responses to their infants’ babbling are related to children’s vocabulary development six months later. 
They find that maternal responses to 9–10-month-old infants’ babbling correlate significantly with children’s vocabulary knowledge at 15 months. In particular, spontaneous object labels provided by mothers, as well as candidate understandings that resemble the babbled sounds, are statistically correlated with children’s later lexical inventory. Thus, mothers’ responses to their children’s vocalizations may be helpful for vocabulary acquisition, and contingent feedback in the form of object labels facilitates the



acquisition of these labels. The authors argue that babbling indicates a heightened arousal and readiness to learn, so that mothers’ responses to babbling come just at the right time (Goldstein et al., 2010). Cross (1978) studies the linguistic and discursive features of the speech of mothers of children with normal acquisition in comparison with that of mothers of children with accelerated acquisition. She finds no significant differences with respect to the syntactic properties of mothers’ utterances, but numerous discursive features are significantly associated with accelerated acquisition, for instance, repetition, expansion, reformulation, semantic extension, semantically new utterances and self-repetition. Cross suggests that the function of child-directed speech is thus not to provide children with a less complex variety of language, but that “syntactic modifications have their effect not in isolation but in interaction with more dynamic adjustments that mothers make at other levels of description” (Cross, 1978, p. 209). She thus suggests the function of the adaptations to be “bringing the structural complexities of the input well within the reach of the child’s processing capacities” (Cross, 1978, p. 208). She concludes that “acceleration in linguistic acquisition is associated with an input that is substantially matched to the child’s own communicative intentions” (Cross, 1978, p. 214), since this frees the child to concentrate on the formal aspects of the expressions. She suspects the patterns of facilitative factors to change over time, which is supported by results from studies by, for instance, Furrow and Nelson (1986), Pan et al. (2005) and Rowe (2008). To conclude, numerous studies have meanwhile demonstrated that particular properties of CDS are significantly related to language development in children. 
However, correlations do not express causal relationships, and thus, “a correlation between, for example, the frequency of mothers’ contingent responses and children’s language development is not evidence that contingent responding is necessary for human language acquisition” (Richards, 1994, p. 76, emphasis original). Richards argues that besides demonstrating a covariation between presumed cause and effect, scholars would have to show (a) that the cause precedes the effect, (b) that the independent variable can be manipulated so that confounds with spurious causes may be removed, and (c) that alternative explanations can be ruled out (Richards, 1994, p. 78–79). Moreover, he warns that there is always an interaction between caregiver and child, such that cause and effect cannot be clearly separated. In addition, focussing on the gains in a given form may not be a useful indicator of language competence, since the children who improve most may not be those with the highest scores at a later point in time. Furthermore, the intervals between the samples used are difficult to determine, since both too long and too short intervals can distort the results. Finally, subvariables may be confounded and effects may thus be masked. All of these methodological problems may have contributed, in Richards’ view, to the fact that the effect of speech modifications



towards infants and children could not be demonstrated in a way that provides evidence against nativism (Richards, 1994). In their comparison of nativist and constructivist approaches to language acquisition, Ambridge and Lieven (2011) address many of these issues, as in fact have many of the studies reported on above. Means of addressing causality include longitudinal studies and studies with a retrospective or follow-up design, but also concurrent studies of accelerated versus delayed children (see Cross et al., 1985). Other potential methodological pitfalls raised by Richards (1994) have also been explicitly addressed in the literature; for instance, the fact that certain features of CDS may only be effective at certain times (e.g. Furrow and Nelson, 1986), the fact that children’s linguistic ability may affect parental speech styles (Cross et al., 1985, p. 254), and potential co-relationships between linguistic features (Cross et al., 1985, p. 256) are explicitly taken into account in recent work on CDS. For instance, in order to distinguish between the features of mothers’ speech and interactional effects, Cross et al. (1985) compare not only the speech of mothers of delayed children with the speech of mothers of accelerated children, but also the speech of these mothers to the delayed children’s siblings. While the current evidence may not rule out the existence of innate linguistic knowledge (Ambridge and Lieven, 2011, p. 375), the fact that so many studies successfully demonstrate relationships between input and language acquisition in spite of these methodological problems provides convincing evidence that many of the features of CDS together may facilitate language acquisition. Furthermore, Tomasello (2003) argues that many effects of the input had previously gone undetected because researchers focused on the development of generalizations as indicators of linguistic competence, rather than on the acquisition of individual structures. 
If, however, language is acquired in an item-based manner, as proposed by Tomasello (2003), and as evidenced, for instance, by the item-based grammars two-year olds can be shown to rely on (Bannard et al., 2009), then generalizations are not necessarily representative indicators of language acquisition. Moreover, the role of distributional information was long neglected, since the potentially facilitative role of CDS was suspected to lie in grammatical simplification rather than in distributional information. Evidence that children possess the capabilities of statistical learning and generalization necessary to make use of such cues comes from studies of artificial language learning, since artificial languages enable systematic manipulation and testing of specific structures (Gomez and Gerken, 1999, p. 110). Gomez and Gerken tested infants for their discrimination of utterances created on the basis of different finite-state grammars. Infants demonstrate sensitivity to different end-points of utterances and to utterances generated by rules different from those previously encountered, even if these constructions occur with new linguistic material. Thus, excluding influences

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.

Chapter 4.  Talking to children 137

of prior familiarity with the linguistic material presented and other possible factors, the authors demonstrate that children are sensitive to distributional information and that they generalize the structures they identify beyond the linguistic material heard. Children are therefore in principle able to take advantage of the distributional cues provided by the peculiarities of CDS. Saffran et al. (2007) suggest that this kind of rule learning is not specific to language: they find that infants also learn the distinction between ABA, AAB and ABB presentations of images of dogs, not just of linguistic strings. Thus, infants show not only statistical learning but also generalization to new input, provided they recognize the input as coming from the same categories. Lieven (2010) discusses the role of frequency in language acquisition and finds that, in general, the more frequent a construction, the earlier it is acquired. For instance, frequency can outperform semantic generality in the acquisition of verbs and their association with particular argument structure constructions (Theakston et al., 2004). Furthermore, frequency has an effect in ‘weird word order’ experiments, in which children are prompted to correct ungrammatical word orders and have been found to do so depending on the frequency of the lexical material in the stimuli. Similarly, Rowland et al. (2003) find the frequency of wh-questions in the mothers’ language, recorded in 300 hours of mother-child interaction over a time span of one year (12 children aged 1;8.22–3;0.10), to be a better predictor of order of acquisition than factors such as semantic generality or relative complexity. Naigles and Hoff-Ginsberg (1998) find frequency in the input to be associated with both frequency and flexibility in the child’s productions (Naigles and Hoff-Ginsberg, 1998, p. 116).
In particular, the more frequently a verb occurs in utterance-final position, the lower its flexibility in children’s productions ten weeks later. The authors therefore suspect that the utterance-final position is particularly relevant for word learning during early word acquisition, whereas for later word learning flexibility is more revealing to the child. Accordingly, the authors find the diversity of syntactic environments to predict subsequent verb frequency and syntactic diversity in children’s productions ten weeks later (Naigles and Hoff-Ginsberg, 1998, p. 117). Lieven (2010) argues that if a given form instantiates a one-to-one relationship between form and meaning, then frequency effects can be overridden. Frequency effects can be observed in morphological learning, where frequent irregular forms are learned more easily than less frequent irregular forms; however, the process depends on, for instance, children’s judgements of similarity between these forms. Finally, conflicting cues may interfere with frequency effects, so that form-function mappings, neighborhood relations and multiple cues serve as mediators of frequency.



Corpus-linguistic studies of CDS can also shed light on the sources of errors children have been found to make; for instance, Kirjavainen et al. (2009) show by means of corpus analyses of a large collection of mother-child dialogs that pronoun case errors, such as me do it, as well as the rate at which they occur, can be explained on the basis of the frequency of certain constructions in the maternal utterances, for instance, as shortened reproductions of “let me do it”. Further evidence for potentially facilitative effects of the distributional properties of CDS comes from simulation studies which model the language learning process and which have shown that several of the properties of child-directed speech discussed above may facilitate language acquisition; indeed, numerous computational simulation studies demonstrate that surface regularities are sufficiently informative for a child to make use of them. Examples are Mintz et al. (2002), Mintz (2003), Monaghan and Christiansen (2004), Onnis et al. (2004), Borovsky and Elman (2006), Laakso and Smith (2007) and Onnis et al. (2008). To summarize the discussion of the functions of CDS with respect to language acquisition, many of the structural, interactional and distributional features of speech directed towards infants and children are potentially facilitative. In addition, for a considerable number of properties it is plausible to assume that infants and children indeed make use of them, since these properties are statistically related to the children’s linguistic productions and comprehension at later stages. Finally, children, and even infants, have been shown to be able in principle to take advantage of the cues presented to them.

4.4  The role of alignment

We can now address to what extent the peculiarities of CDS can be accounted for by interactive alignment, that is, by automatic priming (see Pickering and Garrod, 2004). Several researchers have suggested that the characteristics of CDS are due to caregivers’ adjustments to their young communication partner, i.e. to alignment with the infant’s or child’s vocalizations. For example, Ferguson (2004) holds that the “modifications tend to reflect the child’s vocal behavior” (Ferguson, 2004, p. 179). For instance, regarding the high pitch observed in CDS, he notes that “babies’ early vocalizations, and indeed children’s vocalizations for a number of years, average much higher than normal adult pitch, in part because of the size and shape of their vocal tracts” (Ferguson, 2004, p. 178). Jakobson (1960), too, suggests that the adjustments of CDS are due to adults’ attempts to establish a common code with their linguistically restricted child and to “adjust themselves to the verbal habits of their addressees”. The idea that ‘Baby



Talk’ is based on ‘Talk by Babies’ is also used to explain, for instance, the use of proper nouns instead of pronouns (cf. Brown, 1977, p. 10). Alignment as a factor shaping CDS has been explored by Retherford et al. (1981); however, they find that while the children slowly approach their parents’ use of syntactic and semantic categories, parents do not adjust their use of these categories depending on the child’s linguistic capabilities (Retherford et al., 1981, p. 607). The same conclusion is reached by Behrens (2006) for the distribution of parts of speech, noun phrases and verb types. In fact, it seems to be more common for caregivers to respond to a child’s utterance that is deficient in some way by means of other strategies, such as expansions (e.g. Cross, 1978). A study that shows that alignment by parents to their children’s linguistic productions does occur is Dale and Spivey (2006); they find that parents take up linguistic material from their children, yet they observe that children take up just as much linguistic material from their parents’ utterances. Thus, consistent effects of alignment cannot be shown to hold in CDS. A considerable problem for attempts at explaining CDS on the basis of alignment is the question of how the child could ever make any progress if caregivers just matched children’s current status. The solution suggested by Brown (1977) is that it is the inconsistencies in caregivers’ speech that allow for language learning and progress (Brown, 1977, p. 15). This would mean that caregivers aimed to align with their children, yet regularly failed to do so.
Given the considerable breadth of adjustments found, the functions of the adaptations, and the considerable organization of the interaction of these cues (such as the coordination of F0 contours, intensity and information structure in CDS described by Fernald and Mazzie (1991), for instance), it is unlikely that the observable adjustments are caused by unsuccessful attempts at aligning with children’s productions. A second major point against a determining role of alignment in IDS is that infants do not produce any linguistic output caregivers could align with. However, Kaye (1980) suggests that mothers’ short utterances to infants may be due to alignment even though the infants do not produce utterances themselves: “The infants’ behaviour comes in brief repetitive bursts. It would be much odder to respond to such expressions with long involved sentences than with short greetings, exhortations, and simple comments. Mothers mirror their infants’ behaviour back to them using speech as well as facial expressions and gestures” (Kaye, 1980, p. 502). This kind of alignment is, of course, quite unrelated to alignment in the sense of automatic priming as suggested by Pickering and Garrod (2004). Repetition of children’s utterances does, of course, occur in CDS (see Clark and Bernicot, 2008); however, as Mills and Healey (2008) have argued for collaboration in general, repetitions in CDS fulfill very specific functions, such as highlighting


what is common ground and thus creating a contrast to the non-repeated, reformulated material, which thereby serves as indirect negative evidence for the child (see also Chouinard and Clark, 2003). While this practice involves repeating linguistic material from the partner, as suggested by the interactive alignment model, this kind of repetition is unlikely to rest upon automatic priming. In all, we can conclude that alignment plays only a minor role in IDS and CDS.


4.5  The role of the partner’s feedback

Snow (1977) holds that “language acquisition is the result of a process of interaction between mother and child” (Snow, 1977, p. 31) and that “mothers’ speech is a product of carefully adjusted interactional processes” (Snow, 1977, p. 37). She argues that “true mothers’ speech⁷ does not appear reliably until children are old enough to respond to adults’ speech” and “that even an experienced mother is not capable of producing fully adequate mothers’ speech if the child is not present to cue her” (Snow, 1977, p. 37). The last observation is based on a study by Snow (1972) which addresses the relationship between partner modeling on the one hand and interaction on the other. Mothers performed the same tasks for their own present child, for their own absent child, and for an absent child from another age group. In the first experiment, mothers simply performed the verbal interaction tasks speaking to a tape recorder; in the second experiment, task difficulty was systematically varied to investigate whether the speech simplifications observed in the first experiment were due to estimated task difficulty. In addition, in this experiment mothers were told that the tape recordings would later be played to their children. In all experiments, mothers of both the younger and the older age groups talked more when the children were present, produced significantly shorter utterances with the present children, and produced less complex structures for the present children with respect to all complexity measures. Similarly, the numbers of repetitions and reformulations differ significantly between the two conditions; only the partial repetitions are actually more frequent for the absent children than for the present children. Task difficulty, in contrast, had only little effect on the quantity and complexity of the utterances mothers produced.
The results show that speech modifications by mothers depend to some extent on the reactions of the child addressed (Snow, 1972, p. 561) and that partner modeling alone, i.e. mothers’ mental representations of their children’s needs, cannot explain the observable adjustments.

7.  Snow’s use of the term ‘mother’s speech’ corresponds to what we have been referring to as CDS.



A similar study, investigating the effect of the presence of (in this case) infant addressees, is that of Murray and Trevarthen (1986), who examine IDS in a video-mediated situation. They find significant differences in mothers’ behavior depending on whether mothers see their infants live or as a replay. The authors recorded eight mothers in interaction with their two-month olds; the mothers believed that they were seeing their infants live the whole time, yet only in half of the interactions was the live image of their infant actually transmitted. The linguistic analysis shows, for instance, that mothers asked more questions, repeated their utterances more often, and produced fewer negative statements and fewer declaratives in the live situations (Murray and Trevarthen, 1986, p. 23), even though mothers judged their infants to be equally attentive across sessions (Murray and Trevarthen, 1986, p. 22). The authors conclude that infants’ responsiveness to the mother’s utterances contributes to the nature of CDS. That children’s linguistic and non-linguistic behaviors have an impact on how caregivers talk to them is also apparent from a study by Cross (1978), who finds that the more telegraphic utterances the child produces, the more expansions can be found in his or her mother’s speech. Like Snow (1977), Cross (1978) concludes that the linguistic behavior of the mother is sensitive to the linguistic features of the child’s utterances. Penman et al. (1983), too, study the role of ‘feedback’ from nineteen infants aged three and six months. The authors investigate the relationship between infant behavior and maternal speech style and find particularly the affective components to be sensitive to infants’ non-linguistic behaviors. Laakso and Helasvuo (2005) demonstrate the active role of infants in meaning negotiation by analyzing four different uses of the same proto-verb in different situations of mother-infant interaction.
They find the mother to provide candidate understandings for her daughter’s combinations of gaze direction, gesture, and language use, sometimes relying on phonological similarity between the proto-word uttered and the suspected target word, sometimes relying more on body language. The infant directs the course of the interaction through her active participation and her acceptance or rejection of the candidate understandings of her utterances in context. Bohannon and Marquis (1977) study the effect of children’s signs of comprehension versus non-comprehension on the length of adults’ utterances. While signs of non-comprehension indeed have the effect that speakers shorten their utterances, signs of comprehension do not have any positive effect. In particular, the boy aged 2;8–3;0 who served as the addressee in their study indeed understood shorter utterances better than longer utterances (Bohannon and Marquis, 1977, p. 1005), yet only signs of failure to understand led the participants to make adjustments. Thus, the authors suggest that speakers use shorter utterances in response to lack of understanding from the child, and this effect remains the same over



time, consistent with their finding that finetuning of utterance length over the two months could not be observed. Bohannon III and Warren-Leubecker (1988) conclude that finetuning is not so much age-dependent or dependent on the linguistic and cognitive capabilities of the child, but rather dependent on local comprehension processes. Cross et al. (1980) also suggest that children’s receptive ability plays the most important role for CDS. In a study comparing speech to hearing children, hearing-impaired children of the same age, and hearing-impaired children matched to the verbal capabilities of the hearing children, Cross et al. (1980) attempt to tease apart the contributions of the child’s cognitive development, interaction, and the child’s receptive capabilities. They find that while speech to hearing-impaired five-year olds is similar to speech to hearing two-year olds, speech to hearing-impaired two-year olds shares many features with speech to infants. These results suggest that caregivers react contingently to the child’s receptive abilities, which provide a stronger cue than the objective measure of age, children’s productive capabilities or their cognitive ability. Cross et al. (1980) suggest that it is the receptive capability that shapes CDS. This suggestion is in accordance with the continuity in parents’ behavior when speaking to preverbal children versus older children who produce linguistic utterances themselves. Filipi (2009) provides a comprehensive analysis of the conversational development of preverbal children (from nine to eighteen months). She studies four parent-child dyads on the basis of one or two 30-minute recordings per month. Her analysis includes the child’s gaze, gesture and vocalizations. She demonstrates in great detail how parents treat their children’s nonverbal and verbal behaviors as first parts of adjacency pairs by providing the second part, thus treating the children as capable communication partners.
Furthermore, parents initiate interactions by providing either summonses or questions, for which they expect the child to produce the appropriate response. If the expected second part is not forthcoming, they use various means to signal to the child that his or her response is still incomplete. For instance, looking at the mother is an appropriate response after a summons but not after a question. Parents will reformulate their questions and even move closer to or touch the child until the expected response is produced. Only after several attempts will parents let it pass. At the same time, their own reactions to the child’s productions generally occur within the ‘normal’ (Jefferson, 1984) intervals of 0–0.2 seconds, whereas they leave 0.5–1 second for the child to respond, thus orienting to the child’s possible (or even likely) production difficulties. In addition, Filipi (2009) shows how even very young children may initiate repair by repeating or holding their gestures if the parental responses were not what they had expected. Especially in gestures used to request something, repetition is used to display that the parent’s response has not been satisfactory.


Filipi (2009) argues that these specially adapted interactions allow the child to learn the principles underlying conversational sequencing, the normative nature of adjacency pairs, the turn-taking system, the expectations created by one turn with respect to the design of the next, and the expected timing in conversation. She furthermore illustrates how the child’s gestures are increasingly accompanied, and finally replaced, by vocalizations. In these interactions, the child plays an active role as a conversational participant, both initiating actions and responding to the partner’s actions. To sum up, infants and children contribute to the way they are being talked to by means of their interactional behaviors, especially by their displays of receptive ability and signs of lack of comprehension.


4.6  The relationship between partner modeling, alignment and feedback

We are now able to address questions concerning the relationship between the three mechanisms investigated in this study, namely the impact of cognitive models of the partner, the role of automatic alignment as suggested by Pickering and Garrod (2004), and the effect of the partner’s feedback. The previous sections have shown that people adjust their speech to children quite consistently and across all linguistic domains, and that these adjustments change with the child’s developing linguistic and cognitive capabilities. Moreover, the adjustments made have been found to be influenced by several external factors, such as speakers’ gender, education and socio-economic background, as well as by the respective activity they are engaged in with the child. Speakers’ level of depression and naïve theories about children have, however, also been found to exert an influence. Furthermore, we have seen that even infants’ behavior shapes the way they are being talked to. At the same time, we have seen above (Section 4.4) that alignment does not play an important role in explaining CDS. Thus, while infants do not provide linguistic productions caregivers could align with or that could provide clues to their linguistic abilities, it is particularly children’s receptive abilities that seem to guide caregivers in their linguistic and discursive choices for their young communication partners. Given the impact of both education and naïve theories on the one hand and children’s displays of understanding on the other, the question arises to what extent caregivers’ linguistic choices are based on their partner models and to what extent they rely on interaction only. In the following, we shall address the question of which roles these two resources, partner model and interaction, play and how they interact with each other. Several studies address this question explicitly.
Gleason and Weintraub (1978) discuss two different possibilities for the (changes in) adjustments in speech to children. One possibility is that these



adjustments are based on children’s feedback, and they find children to produce eye-contact, blinking and other small feedback responses that they hypothesize to have some influence on CDS. The other possibility they discuss is that the adjustments are due to partner models. In order to test parents’ knowledge about their children, they provided 16 parents with a test comprising linguistic, cognitive and affective tasks and asked them how their two- to five-year old children would score. After this, they had the children carry out the test. The comparison between the parents’ expectations and the children’s replies shows particularly high correlations for the comprehension of linguistic items, which parents predicted correctly 66% of the time, while they predicted children’s productions correctly only 34% of the time. Altogether, parents’ judgments of their children’s linguistic competence were 54%–79% accurate, with fathers and mothers scoring equally well. Thus, parents’ knowledge about their children’s linguistic competence is generally very good, which corresponds to the finding by Roy et al. (2009) that both the parents and the nanny must have had very detailed knowledge about the child’s current linguistic capabilities, since they adjusted the length of their utterances depending on whether the child knew, or was about to learn, a word in the utterance in question. Interestingly, in the study by Gleason and Weintraub (1978), parents predicted children’s preferences correctly on average, but not for their own children. In particular, parents were asked to predict their children’s favorite color and favorite animal; while parents expected their children to name prototypical instances of each category, such as red and blue or cats and dogs, children in fact chose other category members.
Gleason and Weintraub (1978) conclude that “adults seem to have internalized stereotyped examples of the competencies of children at different ages and fairly limited ideas of what is appealing and interesting to children at different ages” (Gleason and Weintraub, 1978, p. 213). The authors take their findings as evidence that “parents’ language is not simply manipulated by subliminal feedback signals from children. Parents know a great deal about what their children know” (Gleason and Weintraub, 1978, p. 209). In the case of children’s preferences, however, parents seem to orient more towards a ‘typical’ or stereotyped child than towards their own child, since predictions seem to rely only partly on concrete linguistic experience with their own children. Another study directly addressing the factors influencing modifications in CDS is Cross et al. (1985). In order to disentangle the factors of age, cognitive development, receptive ability, and formal linguistic, communicative and social competence, Cross et al. (1985) study speech to children whose capabilities are “not synchronized normally with their language levels” (Cross et al., 1985, p. 287); in particular, they investigate speech to normally developing, hearing-impaired, autistic and dysphasic children. Their results show, first of all, consistently across all of these studies, that CDS is not adjusted based on the respective children’s age. Instead, especially



children’s linguistic ability, and in particular their receptive ability, but also other communicative skills, play the most important role in the way their caregivers talk to them. Thus, while there are consistent significant differences between speech to children of the same age but with different linguistic abilities, these differences disappear if speech to children is compared across age groups matched for linguistic ability. These results suggest that it is not the children’s age, but rather children’s contingent feedback, that plays the most important role. Second, Cross et al. (1985) also report significant differences in CDS to normally developing, hearing children at different age levels. The differences are largest between speech to 10- to 13-month olds and speech to 23- to 29-month olds; the authors argue that it is the fact that children begin to produce utterances between these two stages that leads to a significant change in the speech children receive. Thus, this finding supports the conclusion that it is children’s own linguistic productions that shape CDS. At the same time, it cannot be ruled out that children’s behaviors contribute to parents’ partner models, which then give rise to certain linguistic behaviors on their side. Third, the authors try to identify specific adjustments for specific impairments. They find only few significant differences in the speech of mothers to children with different impairments; these differences, however, reflect the functional differences related to these children’s communicative impairments, providing maternal compensations for the child’s current or anticipated inadequacy as a communication partner. To sum up, Cross et al. (1985) suggest that children’s receptive ability influences the way people address these children more than age, cognitive development or linguistic knowledge.
Whether mothers’ behaviors are determined by partner models or whether they are due to moment-by-moment adjustments to their children’s displays of comprehension cannot, however, be answered on this basis. A study by Kaye (1980) addresses the relationship between partner modeling, alignment and interaction explicitly. He recorded 36 mothers in interaction with their 6-, 13- and 26-week old infants and compared their linguistic behaviors, especially simplicity and redundancy, with speech directed at these same infants by a female stranger and by himself, as well as with these mothers’ speech to their children two years later. On the basis of this data set, he is able to address several central questions. First, he addresses the degree of finetuning of mothers’ adjustments to their children by comparing speech addressed to pre-verbal infants with speech to children with some productive capabilities. He finds mothers to adjust their speech to their children in all conditions, showing modifications to their six-week old infants as well as to their two-year old children. He also finds mothers to use very short utterances to their infants, even though their infants cannot be expected to understand either longer or shorter utterances, and utterances to become longer for the older children. Furthermore, he finds mothers to repeat their utterances both



for their infants and for their children: “infants did not respond verbally no matter how many times the mothers repeated themselves. Repetition does not seem to be for the sake of instruction or comprehension” (Kaye, 1980, p. 500). Moreover, he finds many more one-word utterances, like greetings and feedback signals, addressed to the infants than to the two-year olds (Kaye, 1980, p. 500). He interprets this as part of mothers’ pretense that the infant has the floor, while two-year olds usually relinquish the floor after one utterance. Finally, he finds lexical items specific to CDS to occur only with the older children and not with the infants. He concludes that the increasing enrichment of linguistic input over time is rather complex and, as Furrow et al. (1979) suggest, that there is no reason to assume that parental speech adjustments will have the same effects at different ages (Kaye, 1980, p. 501). Second, he addresses whether the partner model influences CDS; based on the finding by Snow (1972) that mothers and non-mothers alike produce the characteristic adjustments, he holds that it is not the infant or child but the idea of an infant or child that causes the adaptations (Kaye, 1980, p. 492). He finds his hypothesis supported by his detailed analyses of the speech of two mothers from his data set; in particular, he reports these mothers to differ considerably in the frequencies with which they employ certain structures; in a qualitative analysis he then finds that the mothers differed most importantly with respect to their understanding of the roles of their infants (Kaye, 1980, p. 504). He distinguishes between authoritarian, ego-building and egalitarian attitudes and argues that, since mothers need to pretend in the interaction with their preverbal infants that their babies are communication partners, this pretense may instantiate different possible relationships (Kaye, 1980, pp. 504–505).
This is, he suggests, also the reason why he finds differences in the ways boys and girls are talked to at the infant stage but not at age two: the pretense in the interaction with pre-verbal infants allows preconceptions to affect the interaction more than situations in which children play an active role in the interaction (Kaye, 1980, p. 505). This point is supported by Rowe (2008), who investigates the effect of mothers’ knowledge about child development on CDS and finds a mediating effect of such knowledge on CDS. The third question Kaye (1980) addresses concerns the influence the child has on the interaction. He finds that mothers’ linguistic behavior was highly consistent over time with the same child, yet that both the female stranger and the author himself used different linguistic strategies with these same children. He concludes from this that the child him- or herself does not have as much influence on the way he or she is addressed as the mother does, whose linguistic behavior may be determined more by her idea of her communication partner than by the actual infant or child. That is, while individual characteristics of the child do not affect the characteristics of the speech addressed to him or her (Kaye, 1980, p. 505), mothers’ individual characteristics may very well influence the way they talk to their children.

Chapter 4.  Talking to children 147

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.

To sum up, there is evidence that partner modeling in CDS has an impact on the way children are addressed, and this impact may be particularly powerful in interactions with nonverbal infants, in which caregivers have to pretend that they are talking to a communicatively capable communication partner, i.e. in situations in which no evidence regarding the child’s receptive ability is available. At the same time, there is also evidence that caregivers react contingently to infants’ and children’s linguistic and non-linguistic behaviors as indicators of their receptive ability. The exact relationship between partner modeling and interaction in IDS and CDS cannot be conclusively determined, yet both seem to play a central role in CDS, in contrast to interactive alignment, which does not account for the properties of speech to children.


Chapter 5


Talking to foreigners

It has often been noted that speakers may talk to non-native speakers in peculiar ways. Ferguson (1975, 1981, 1982), as well as many after him (e.g. Freed, 1981; Hatch, 1983; Snow et al., 1981), has argued that these peculiarities are similar to those found in speech to children and that the way native speakers talk to non-native speakers constitutes another example of a simplified register. Ferguson (1981) coined the term Foreigner Talk (FT) for this variety, yet alternative terms are also in use, such as teacher talk, classroom discourse, foreigner-directed speech, xenolect, foreign language interaction and native speaker–non-native speaker conversation. However, the term foreigner suggests not only non-nativeness, but also unfamiliarity, even strangeness, and some of the variability described below may stem from the fact that researchers have investigated many different groups of people with varying relationships with their communication partners. For instance, some studies concern foreign guestworkers and their foremen, others exchange students and volunteers who had signed up for intercultural exchange programs, others good friends, and yet others business contacts or language-learning classrooms. The sole unifying factor in all of these studies is that one of the participants is a native speaker of the language used in the interaction while the other is not. In order to avoid circularity in defining the object of study, foreigner-directed speech (FDS) will be understood here to comprise all speech by a native speaker that is directed at a non-native speaker, irrespective of whether there are observable peculiarities or not. In contrast, the term FT characteristics will be used for the special adjustments speakers may make. In fact, many researchers have found that speakers often do ‘nothing special’ when talking to non-native speakers. For example, Snow et al. 
(1981) report that between 2.2% and 20.1% of the utterances of each individual speaker and between 0% and 37.8% of the utterances in the individual conversations exhibited FT characteristics (Snow et al., 1981, p. 84). These figures mean that between 79.9% and 97.8% of the utterances of individual speakers in Snow et al.’s data were completely ‘normal’ and that there were conversations that were 100% ‘normal’. Similarly, Katz (1981) reports that 2–19% of FDS in child native–non-native conversation contained special adjustments (Katz, 1981, p. 54), which again implies that 81% to 98% are ‘normal’ utterances.


Thus, for methodological reasons, I use the term foreigner-directed speech (FDS) in analogy with infant- and child-directed speech (IDS and CDS) and leave it to the following investigation to show whether there are any special ways of talking to non-native speakers, what they consist of and by what factors they may be determined.

5.1  Linguistic description of the variety

The description of the ways speakers talk to foreigners is more difficult than the description of CDS since the category ‘foreigner’ may comprise many different kinds of addressees; another problem concerns the fact that the interactions analyzed often take place in very different situations. Finally, very different methodologies have been used, ranging from the elicitation of sentences intended for imaginary addressees, through discourse-analytic studies of classroom interactions and experimental studies with participants, to CA studies of spontaneous real-life encounters (cf. Wagner and Gardner, 2004). Accordingly, the different studies reported on below often yield very heterogeneous and even contradictory results.


5.1.1  Phonological and prosodic properties

FDS studies drawing on speakers’ common sense perceptions generally report that FDS is delivered slowly and with clear or hyperclear pronunciation. For instance, Ferguson (1975), whose method was to ask the participants what one might say to a foreigner, reports that speakers generally assume that FDS is produced slowly, clearly and with increased loudness. Yet scholars who carry out empirical investigations of interactions between native and non-native speakers also report phonological and prosodic adjustments; for instance, Henzl (1973) finds emphatic stress, loud pitch and a slow speech rate in speech to learners of Czech in a classroom setting (Henzl, 1973, p. 218), as well as hyperarticulation. In particular, she reports that speakers employ phonemically more accurate utterances and clearly avoid the simplifications of consonant clusters and the vowel reduction that occur in normal Czech speech. Moreover, she finds utterances to be slow and to exhibit more pauses than speech to native speakers, and she reports a lack of disfluencies and hesitation markers (Henzl, 1973, p. 218). Müller (2000) also describes FDS as slow and even clumsy. He argues that native speakers provide an analysis of the language for the listener, making individual meaning-bearing parts of language salient and removing everything perceived as ‘unnecessary’. His evaluation of this strategy is however not positive at all; by


highlighting individual words and syllables in order to provide a factorization of language, speech to foreign adults often becomes unrhythmical and disrupted, lacking its fluent character and thus making it impossible for the communication partner to learn the linguistic gestalts language may be made up of. Avery et al. (1985) also find evidence for the chunking of information into smaller units in English native–non-native interactions during problem solving. Yet not all researchers find prosodic adaptations; for instance, Katz (1981) investigates the speech of Lisa, a five- to six-year-old girl who is a native speaker of English, in conversation with a five- to six-year-old girl whose mother tongue is Hebrew and who is learning English as her second language. Interactions between the two girls were recorded on a monthly basis over a period of ten months. Katz (1981) finds slower tempo, careful enunciation and syllable-timed utterances to occur very rarely in the child’s speech directed at her playmate. He writes: “More global features such as slower tempo, careful enunciation and syllable-timed utterances – features a linguist might intuit as characterizing a FT register – are surprisingly sparse in the data” (Katz, 1981, p. 59). Smith et al. (1991) even report the opposite strategy: In their study, some speakers sped up; while they found ‘normal’ adult-directed speech to be about 3.2 to 4.0 words per second, some speakers produced as many as 5.1 words per second in the conversation with a foreigner speaking broken English (Smith et al., 1991, p. 178). Phonetic studies of the prosodic, phonetic and phonological characteristics of FDS are unfortunately rare; however, Biersack et al. (2005) investigate speech to imagined children, foreigners and adult listeners for its prosodic peculiarities. 
Their subjects, twelve female native speakers of English, were asked to describe a route based on a map with five landmarks for an imagined two-year-old child, for an imagined fellow student who has recently arrived from a foreign country and has just begun to learn English, and for an imagined friend. The authors analyze pitch range and F0 maxima, as well as the speech rate, computed by syllables per second and by the duration of five vowels, in each of three sentences: the second calibration sentence read from the instructions, the first intonational phrase from the route description and the last intonational phrase from the route description. The analyses reveal that speakers use higher pitch only for imagined children and not for imagined foreigners. Range is lower for FDS than for CDS, but the difference does not reach significance. F0 maxima are significantly higher for CDS, and FDS tends to be lower than adult-directed speech (ADS). Speech rate was significantly slower in FDS than in both CDS and ADS. However, in CDS vowels are significantly longer than in the other two registers. Thus, the prosodic strategies for slowing down differ between FDS and CDS. In all, the phonology and prosody of speech directed at imagined foreigners and children were found to differ both from each other and from adult-directed speech.


DePaulo and Coleman (1986) provide a comparative study of speech to children, foreign adults, retarded adults and native English-speaking adults. Participants were 80 undergraduate students who served as speakers and, as listeners, six girls aged about five to seven, four retarded women, four foreign women with noticeable foreign accents and two female native undergraduate students. To exclude effects of feedback and interaction, speakers spoke to their communication partners only in a five-minute warm-up phase. After that, speakers believed they were talking to the person they had just met, who was supposedly listening from another room behind a one-way mirror. The task was a block world instruction task. DePaulo and Coleman (1986) analyze the speech rate (total number of words divided by the total time of the task), the absolute number of pauses that were longer than two seconds divided by the total time of the task, and pause duration as defined by the mean number of seconds per pause (DePaulo and Coleman, 1986, p. 949). They find no effect for speech rate, yet the number of pauses longer than two seconds is higher for children than for foreigners, and the average length of pauses is also longer for children than for adults (DePaulo and Coleman, 1986, p. 950). This is, however, not necessarily in contrast to the results discussed so far, since the pauses suggested for FDS are characterized by their ubiquity and their placement rather than by their length (see also Müller, 2000), as the transcription of FDS utterances in Hatch et al. (1978) suggests:

(1) Native speaker: is-it-hand-writ-ten?
 Non-native speaker: Uh, pardon me. Excuse me?
 Native speaker: Is-your-thesis-now-handwritten? (cf. Hatch et al., 1978, p. 50)

In any case, the pauses DePaulo and Coleman (1986) find for children and foreigners are significantly longer than pauses in speech to native adults (DePaulo and Coleman, 1986, p. 950). DePaulo and Coleman (1987) also investigate communicative warmth, i.e. the amount of respect, affect, friendliness and encouragement (DePaulo and Coleman, 1987, p. 80) conveyed by the speaker. They find that foreign adults are met with much less warmth than children, but with more than native adults, where the difference between children and native English-speaking adults is significant. However, there are finetuning effects as well: The authors find that speakers “addressing foreign adults also conveyed a warmer impression if they were paired with a more sophisticated listener” (DePaulo and Coleman, 1987, p. 82). Another empirical study is Uther et al. (2007). The authors investigate ten mothers in three conversations each: one with their infant, one with a foreign research assistant and one with an adult native speaker of English (the researcher). They analyze pitch height, vowel space, and vowel duration of the occurrences of the names of three toys introduced in the infant-directed condition. Their findings


show that the vowel space increases in FDS, indicating hyperarticulation, but, in line with the findings by Biersack et al. (2005), no prolonged vowel duration; thus, FDS was clearer but not slower. Furthermore, FDS is found to be similar to ADS in pitch height, indicating that speakers do not use higher pitch for foreigners. In addition, the authors selected short excerpts of the mothers’ speech from the three conditions for a study of the degree of positive and negative emotion conveyed. The excerpts were first sent through a low-pass filter to render them unintelligible and then rated for vocal affect by 24 native speakers of English on a 1–5 Likert scale. FDS receives very low ratings for positive affect, and the proportion of utterances with negative affect in FDS is significantly higher than for IDS and ADS (Uther et al., 2007, p. 5). Bradlow (2002) also presents an analysis of the phonetic features of (imagined) FDS, focusing on hyperarticulation, vowel duration, vowel space expansion and co-articulation; she investigates twelve American English monolinguals and nine Mexican Spanish–American English bilinguals reading a text for imagined listeners. In one condition, speakers were asked to imagine a listener who was familiar with the speaker’s voice; in the other condition, speakers were asked to imagine someone with hearing loss or from a different language background. Bradlow finds consistent hyperarticulation for clear speech in terms of vowel duration in the English data, but not to the same degree in the Spanish data (produced by the same bilingual speakers). Yet, in contrast to Henzl (1973), she finds co-articulation effects to persist even in the clear speech data, which might suggest that speakers do not adapt co-articulation to their listeners’ suspected needs. However, consistent expansion of the vowel space can be observed, dependent on the respective consonant context, for both the English-speaking and the Spanish-speaking groups. 
The vowel space expansion effect in clear speech thus occurs regardless of vowel inventory size, which differs in the two languages. Avery et al. (1985) investigate the occurrence of prosodic features in naturally occurring discourse; they have four native–non-native speaker pairs and four native–native speaker pairs carry out a problem-solving activity. They analyze the occurrence of slow rate, articulation of final stops, the use of glottal stops before initial vowels, the numbers of contractions and vowel reductions, the amount of pausing and exaggerated volume and intonation in these eight dialogs, distinguishing between core contributions, i.e. contributions to the topic, and non-core contributions, such as repetitions, expansions, responses and metastatements. They find that phonological adjustments do not occur throughout the dialogs but are limited to specific points, namely to core information within the discourse. They find such adaptations in both native–native and native–non-native speaker interactions. However, their study is purely qualitative, and the authors only suspect that there may be quantitative differences as well (Avery et al., 1985, p. 227). Thus,


as researchers working with the notion of register, such as Ferguson (1981) or Biber (1994), have cautioned, FDS may in fact not be characterized by peculiar features but by the increased use of particular features. To sum up, it is not possible to conclude that FDS is necessarily characterized by a particular set of prosodic and phonological features. While in some situations speakers seem to slow down, in others this does not seem to be the case. Moreover, the method of slowing down seems to differ from that used in CDS, being due more to pauses between words and utterances than to vowel elongation. Some authors suggest that this trait is helpful, others that it is disruptive. Furthermore, there seems to be evidence that speakers speak more clearly, increasing the differences between vowels, yet the results on the co-articulation of consonants are rather mixed. Also noteworthy is the high proportion of FDS utterances in the study by Uther et al. (2007) whose prosodic properties convey negative affect.


5.1.2  Grammatical properties

The results in studies of FT characteristics concerning the occurrence of grammatical modifications are no less diverse. In 1975, Ferguson carried out a study eliciting speakers’ stereotypes about FDS by asking them to rewrite a set of sentences “how they thought speakers of English trying to communicate with apparently uneducated non-Europeans would say” them (Ferguson, 1975, p. 3). The modifications he finds concern mostly morphosyntactic means:
– vowel additions
– reduplication
– article omission
– copula and auxiliary omission
– no do-support
– no inflection
– no conjunctions
– tag questions
– simplified negation
– reduction of possessive pronouns
– pronoun case substitution
– special lexicon
– lexical paraphrase
– loud, slow delivery


In contrast, other researchers do not report grammatical modifications at all. For instance, Hatch (1983), reviewing previous work on FDS, writes that “many researchers have found no instances of ‘ungrammatical foreigner talk’ in their data” (Hatch, 1983, p. 175). Likewise, Freed (1981) states that “The most striking finding is that there are no ungrammatical utterances” (Freed, 1981, p. 24). It may thus be suspected that the stereotypes elicited by Ferguson do not correspond to what speakers really do when speaking to foreigners, and that the features above are rather part of a caricature of speakers’ real strategies (cf. Roche, 1989). A grammatical feature reported by many researchers in empirical studies is, for instance, that speakers use shorter utterances for foreigners than for other native speakers. For example, Freed (1981), who investigates conversations between students in a conversational partner program providing for regular cultural exchange, finds shorter and less complex utterances in FDS than in speech to native speakers of English, depending on the linguistic capabilities of the non-native speaker (Freed, 1981, p. 25). Other authors also report a shorter than usual mean length of utterance; for instance, Smith et al. (1991) find a significantly smaller number of words per utterance in speech directed towards a confederate speaking broken English than towards the same confederate speaking accented or fluent English. Similar results are reported by Henzl (1973) for classroom discourse; she finds short and well-formed sentences (Henzl, 1973, p. 211), fewer subordinating clauses, and the avoidance of ‘difficult’ forms, such as the past tense (Henzl, 1973, p. 213). 
Similarly, DePaulo and Coleman (1986) find average sentence length to be 16.4 words per sentence in the speech to foreigners, 18.1 in the speech to native adults and 13.6 in the speech to the children (where the difference between speech to foreigners and speech to other adults does not reach significance, however). Furthermore, the authors report significantly more words used for the same task for foreign adults than for native adults, yet fewer than for children. Thus it seems that participants compensate for the shorter utterances by producing more of them (DePaulo and Coleman, 1986, p. 950). Moreover, DePaulo and Coleman (1986) find the number of questions addressed to children to be twice as high as the number addressed to foreigners, which in turn only tends to be a little higher than for native adults. However, in the use of common vocabulary there are no differences between foreigners and adults. Furthermore, there is a similar type-token ratio for all three varieties. In spite of the many null findings regarding grammatical adjustments to the partner’s linguistic competence reported on so far, Hatch et al. (1978) report many instances of copula deletion: while they find none in the classroom discourse in their data (Hatch et al., 1978, p. 47), the speech between friends contains many instances in which the copula was deleted (Hatch et al., 1978, p. 43); they report the native speaker to say, for instance:


(2) is crazy.
(3) is good for him.
(4) I don’t think is good.

The authors find pronoun deletion both in the speech to a friend and in the speech of a teacher to his students after class. In the speech to the friend, there are furthermore instances that lack tense marking, and do-support may be limited (Hatch et al., 1978, p. 44), as in:


(5) he no likes what?

Moreover, the authors find a broad range of negatives (Hatch et al., 1978, p. 45) in the native speaker’s utterances. Furthermore, they report an overuse of for. Thus, the grammatical modifications Ferguson describes can indeed be found and have been reported especially by scholars working on spontaneous interactions. For instance, Snow et al. (1981) investigate conversations between foreigners and Dutch government employees in two offices, the Population Register and the Central Housing Office in Amsterdam. They find determiner deletion and substitution, infinite verb forms, auxiliary and copula deletion, simplified negation, pronoun deletion and substitution, pronoun suppletion, avoidance of complex sentences by means of juxtaposition, the use of canonical word order and lexical changes in terms of “analytic translations” (Snow et al., 1981, pp. 85–86). Grammatical modifications have also been observed in language contact situations that occur for specific purposes, such as trading (Foley, 2006) or giving orders (Clahsen et al., 1983). For instance, in Tok Masta, a version of English spoken by English settlers in New Guinea in the interaction with house servants and other employees and understood as a variety by its users,1 many of the features elicited by Ferguson’s experiments can be found, as is apparent from the following text (Mühlhäusler, 1981, p. 107):

All boys belong one place, you savvy big master he come now, he new feller master, he strong feller too much, you look him all ships stop place; he small feller ship belonga him. Plenty more big feller he stop place belonga him; now he come here take all place … You look him new feller plag; you savvy him? He belonga British; he more better than other feller … Suppose you no look out along him, he cross too much. (proclamation ‘read to Natives on the Annexation of the late German Possessions in the Pacific, Rabaul, September 12th 1914’, published in the New Guinea Gazette of 1 November 1914, p. 7)

1. This variety, of which speakers claimed that they mastered it after a fortnight (Mühlhäusler, 1981, p. 106), ceased to exist when it became mutually unintelligible with the indigenous varieties of Tok Pisin around the 1930s (Mühlhäusler, 1981, p. 100).


Here we can find vowel additions (e.g. belonga), article omission (e.g. big master), copula omission (e.g. he more better), lack of do-support (e.g. you savvy him?), lack of inflection (e.g. he stop), lack of conjunctions (e.g. now he come here take all place), simplified negation (e.g. you no look), reduction of possessive pronouns (e.g. place belonga him), special lexicon (e.g. savvy), and lexical paraphrase (e.g. all ships stop place). Moreover, the author of this text also makes use of an element of the Tok Pisin noun classification system (i.e. feller), although rather unsystematically. Müller (2000) also reports these features in German speech directed towards foreign adults.2 He observes all of the features suggested by Ferguson (1975), as well as a lack of particles and other small words, which are presumably considered superfluous (Müller, 2000, p. 205). Similar features of simplification have also been found by Foley (2006), who investigates pidgins from different language families with respect to their relationship to the source languages involved. He finds that simplification plays a central role in the formation of the pidgins investigated, since the varieties he studies, Hiri Motu, Yimas Pidgin and Tok Pisin, all show features that are not present in any of the source or sister languages. Instead, he suggests that clear simplification paths emerge along dimensions very similar to those outlined by Ferguson (1982). These same features are also reported by Katz (1981) for the interaction between native and non-native children, yet he finds them to be surprisingly rare: “The same holds for morphosyntactic features (…) – simplification of negation to no(t), deletion of grammatical morphemes like article, copula and auxiliary be – are very infrequent in Lisa’s data” (Katz, 1981, p. 59). 
However, only some of the above features can be seen as simplifying devices, for instance, the omission of articles, copula and auxiliaries, the lack of do-support, inflection and conjunctions, as well as the simplification of negation, possession and the lexicon. Other features, such as vowel additions,3 tag questions, pronoun case substitution and the special lexicon, may provide helpful clarification, but they do not constitute simplifications per se; these means are clarificational rather than simplifying. This distinction is quite crucial because simplification and clarification have often been associated with different values; for instance, Valdman (1981) argues that:

2. Interestingly, in speech to foreign children, adult native speakers seem to employ very different strategies (Müller, 2000).
3. The vowel additions in the example above are added systematically to belong only if it functions as a possessive marker, not if it is used as a verb. This is remarkable since Tok Masta speakers themselves generally claim that they ‘besprinkle’ the language with these occasionally added vowels or -ums (see Mühlhäusler, 1981, p. 104).


simplification and reduction of fully formed varieties of the target language imply that native speakers equate social inferiority with intellectual limitations. One would therefore expect varieties of FT used in non-egalitarian situations to stress simplifying processes. (Valdman, 1981, p. 44)


Thus, while clarification is regarded as helpful, simplification is often associated with a ‘talking down function’ (Ellis, 1985, p. 137), and since grammatical modifications are often simplifying, or have at least been treated as such, grammatical modifications have been rejected by many scholars (e.g. Ellis, 1985, p. 137). It may however be suspected, given the list of grammatical modifications above, that Valdman (1981) and Ellis (1985) really draw a line between modifications that render an utterance ungrammatical and modifications that preserve the general rules of standard grammar, rather than distinguishing between simplifying and clarifying modifications. To sum up, while many studies of FDS report only a reduced mean length of utterance and a preference for juxtaposition or coordination (rather than subordination), other studies show heavy grammatical adaptations that render the native speakers’ utterances ungrammatical. Finally, some researchers have found no grammatical modifications at all. Thus, the spectrum of grammatical modifications observed ranges from none at all, through reduced grammatical complexity, to modifications that concern a broad range of morphosyntactic means.

5.1.3  Semantic and pragmatic properties

Observations regarding lexical semantic adjustments are as heterogeneous as those regarding grammatical modifications. For instance, for classroom discourse, Henzl (1973) reports the use of very basic vocabulary, a reluctance to use synonymous terms, and the avoidance of colloquial expressions (Henzl, 1973, p. 210). Altogether she notes a strong orientation towards the standard variety of the language (Henzl, 1973, p. 211). Freed (1981) also notes the use of basic vocabulary but reports that, in contrast to CDS, FDS is not characterized by the here-and-nowness of conversational topics (Freed, 1981, p. 30). 
This observation is supported by Krashen (1981), who warns that FDS, with its detached topics, may be more difficult to understand than CDS (Krashen, 1981, p. 127). In contrast to the findings on the use of relatively simple lexical items, Smith et al. (1991) find many occurrences of technical vocabulary and idiomatic expressions in their data, as well as expressions of tension and frustration regarding general conversational behaviors (Smith et al., 1991, p. 181). Thus, with regard to lexical semantics as well, no homogeneous picture emerges.


Many studies on FDS stress the amount of interactional modifications by means of which native speakers attempt to make themselves understood and to secure their partners’ understanding (e.g. Mackey, 2012; Gass, 2003, p. 233f):
– the presentation of comprehension checks to make sure that the conversation partner has understood the speaker’s utterance;
– requesting clarification, asking the communication partner for explication of his or her previous utterances in order to secure understanding;
– the presentation of candidate answers so that the non-native speaker only needs to choose between different preformulated alternatives;
– the presentation of reformulations, which provide the communication partner with alternative formulations, increasing the chance of comprehensibility;
– the presentation of topic-focused questions that present alternative perspectives on the topic;
– the use of exemplification in order to clarify the meaning of a lexical item;
– the provision of recasts of the communication partner’s utterance in order to provide indirect negative evidence of what should not be said and to provide simultaneously a positive model of how the idea might have been expressed.




The following example illustrates a lexical paraphrase by means of exemplification:

(6) NS: Where do you eat your daily meals?
 NNS: Daily meals?
 NS: Lunch and dinner, where do you eat them?

Long (1981) terms these efforts at facilitating comprehension interactional modifications. They generally occur whenever communicative problems arise – places at which ‘negotiation’ becomes necessary (Varonis and Gass, 1985). Freed (1981), in her study of volunteers in a cultural exchange program, reports strategies similar to those outlined by Gass (2003). She finds well-designed adaptations to the foreigner’s respective linguistic proficiency and holds that self-repetitions are usually due to signs of misunderstanding on the part of the foreigner (Freed, 1981, p. 31); in her opinion, FDS is characterized by two concerns of the native speakers: to ensure comprehension and to secure the continued flow of conversation (Freed, 1981, p. 33). Bortfeld and Brennan (1997) observe similar efforts on the part of the native speakers in interactions between native and non-native speakers; they find the native speakers to talk more and to produce utterances that were post hoc rated as less natural than those by the non-native speakers. The authors relate these findings to the principle of least collaborative effort (Clark and Wilkes-Gibbs, 1986), thus suggesting that the native speakers take over the responsibility of ensuring successful communication.


160 Designing Speech for a Recipient

To sum up, speakers have been found to invest much time and effort into securing understanding and negotiating meaning, thus providing their non-native communication partners with indirect evidence regarding their language errors and the effects these errors may have, as well as regarding the meanings of words and phrases. Furthermore, speakers have been found to relieve their non-native communication partners of interactional load. These interactional modifications are often placed in contrast to input modifications like the grammatical peculiarities discussed above (e.g. Ellis, 1999). In contrast to the latter, the former are often evaluated very positively, and it is suspected that they facilitate communication and possibly language learning (e.g. Gass, 2003; Mackey, 2012; Lyster et al., 2013).

Long (1981) also finds significant amounts of interactional modifications of the kinds described above, yet only in certain communicative situations. He investigates interactions between native and non-native participants in comparison with native–native interaction, asking participants to carry out six different tasks, some of which involved the exchange of information whereas others were more conversational in nature. He finds significant differences between speech to non-native speakers and speech to other native speakers with respect to the interactional strategies described above, yet only for those tasks in which the exchange of information was at issue. Thus, when the exchange of information was not part of the joint tasks given to the interactants, there were no significant differences between conversations among native speakers and those between native and non-native speakers (Long, 1981).

Smith et al. (1991) even observe counterproductive conversational strategies. In their study, they asked pairs of participants, a native speaker and a non-native speaker who was in fact a confederate, to interview each other on the basis of a list of general questions.
In the first condition the confederate, a trained linguist and professional actress, spoke English fluently; in the second condition she spoke in accented English the way her mother speaks; and in the third condition she spoke in broken English the way her grandmother speaks. They find strategies like the following (Smith et al., 1991, p. 180):

(7) Confederate:  What is ah im impulsive?
    Subject:      (pointing with pencil to one end of scale on confederate's questionnaire, as if to mark it for her): Yeah. I am. Yeah.

(8) Subject:      OK, let's go on to another question. Uh – I don't think you're going to explain to me what the most serious problems in the world are today, so (laugh)…; Oh, OK, here's the one after your goals. Uh, What do you see yourself doing ten years from now? (spoken rapidly)
    Confederate:  Sorry? I uh
    Subject:      Ten years. (sigh) ummmm (muttered under breath) Oh, this is fruitless.


Chapter 5.  Talking to foreigners 161

These interactions differ from those described by Gass (2003) and others in that the native speaker does not aim at negotiating meaning or at providing helpful cues to language learning. Instead, in the first example, the native speaker ignores the confederate's question about the meaning of the word impulsive by orienting towards the reason for asking the question, namely to fill in the appropriate information in the questionnaire. Thus, instead of negotiating meaning, the subject tells the confederate what to write down. Judging this exchange as counterproductive, however, implies that the subjects in these interactions should be oriented towards cooperative language learning. Yet, as the example shows, the subject is more concerned with getting the questionnaire filled out correctly.

The native speaker's linguistic behavior in Example (8) is oriented neither towards language learning nor towards filling in the questionnaire efficiently; instead, it consists of rather pessimistic evaluations of the tenability of the goals set by the experimenters, given his or her communication partner's limited linguistic capabilities. Thus, while the native speaker makes no simplifying or clarifying efforts, he or she voices doubts about the usefulness of the whole interaction. The strategies used can be judged as counterproductive if we assume that the goals of these interactions are collaboration, language learning or cooperative information exchange. It is also interesting in this connection that the six speakers in each condition of the study by Smith et al. (1991) behaved very heterogeneously. Thus, the pessimistic strategies described above were identified only for some, not for all, speakers; others behaved more in the ways described by researchers like Gass (2003), Long (1981), Ellis (1999) or Mackey (2012).

Wagner (1996) does not find any of the cooperative interactional modifications reported on by Varonis and Gass (1985), Gass (1997) or Gass (2003) either.
One of the communicative situations he investigates is a naturally occurring interaction between a foreigner and a delivery person. The conversation exhibits many linguistic modifications but no interactional ones. Here it is the foreigner who takes the initiative and who introduces new topics, and yet there is close cooperation between the two speakers; for instance, the participants collaborate by involving the other by means of tag questions or by means of structure sharing, that is, by the use of parallel constructions (Wagner, 1996, p. 227). Another interaction he analyzes is a telephone conversation between two businessmen who do not engage in cooperative strategies like the above at all but deal with communication problems as conceptual, not linguistic, problems about the business (Wagner, 1996, p. 230). Similarly, Firth and Wagner (2007) report on interactions between business people in which linguistic negotiation does not constitute an issue the participants orient to (cf. also Gardner and Wagner, 2004).


To sum up, various interactional modifications have been observed in FDS, yet not in all circumstances. Their use seems to depend on whether information exchange is at issue for the speakers themselves (cf. also Mori, 2007) and whether speakers consider the communication to be feasible. Furthermore, the interactional modifications have not been found in situations in which language learning is not in the foreground.


5.2  Functionality

Hatch (1983) proposes four functions of FDS, one of them being to support language acquisition (cf. also Ellis, 1985; Gass, 2003), which is certainly the most extensively studied function of FDS. Early studies suggest a facilitative effect on second language acquisition. For example, Krashen (1981) suggests that teacher talk, foreigner talk and interlanguage talk4 facilitate second language acquisition as much as CDS facilitates first language acquisition. He proposes that simplified input is useful for language learning because it provides learners with structures they can understand because they already know them, and additionally with some structures the learners have not yet acquired (Krashen, 1981, p. 126f). He admits that, regarding linguistic and interactional modifications, "not all native speakers will lay down the right size 'net'", that is, not everyone may be able to determine the set of structures already covered by their communication partner and those that may constitute an appropriate challenge (Krashen, 1981, p. 131). Otherwise he considers FDS in all of its forms to be highly useful for language learning.

Also with respect to pidgins, the linguistic adaptations of FDS have been suggested to provide useful means for language learning. For instance, for Tok Masta, the FDS variety of Tok Pisin, Mühlhäusler (1981) argues that "because of its reduced character, Tok Masta offered a much more suitable input than every day English. The function of FT can thus be compared with that of 'motherese' in first-language acquisition" (Mühlhäusler, 1981, p. 110). Ellis (2008) argues furthermore that drawing a learner's attention to a distinction may be very helpful especially in second language acquisition, since transfer, learned attention and automatization are here at work, which makes it more difficult for learners to acquire language from implicit instruction only (Ellis, 2008, p. 8); while Ellis makes this point with respect to classroom interaction, it can also be applied to the structural highlighting achieved through grammatical modification.

Similarly, Wang (2010) argues that the optimal input for second language acquisition is comprehensible, interesting, authentic (in the sense of pragmatically appropriate) and occurs in sufficient quantity; if FDS characteristics thus contribute to enhancing comprehension, they should facilitate language learning.

Many recent studies attempt to estimate the impact of these modifications on language learning. Studies in second language acquisition have devoted considerable attention to the effects of linguistic and interactional adaptations on language development. The aim is to find effective teaching methods, and thus the focus is not on whether speakers modify their speech and, if so, how, but on the possible uses of linguistic and interactional strategies for classroom applications. Many studies address the relative potential of modified input, interactional modification and, thirdly, modified output (e.g. Lyster et al., 2013; Madlener, 2014). Yet in order to show that these strategies facilitate language learning, authors need to demonstrate not only that they facilitate comprehension, but also that learners are able to change their linguistic output accordingly and that the changes persist (e.g. Ellis, 1999). One such study is Lyster and Ranta (1997), who investigate different feedback types produced by four teachers in the classroom as well as students' reactions to the feedback they received. They find that teachers overwhelmingly use recasts, although other feedback types, for instance elicitation, metalinguistic feedback, clarification requests and repetition, lead to student-generated repair more successfully. Ellis (2008) argues that especially if things are simple, drawing a learner's attention to a distinction or criterion may be very useful (Ellis, 2008, pp. 5–6).

4. Krashen draws the lines between these three different varieties differently from how they are dealt with in this study (Krashen, 1981, p. 121), yet together the three varieties cover broadly the same range as the FDS considered here.
Since, in second language learning, transfer, learned attention and automatization stemming from speakers' experience with their first language are at work, learning from implicit interaction alone can be problematic (Ellis, 2008, p. 8). However, distributional characteristics of input may be even more crucial in foreign language learning than during first language acquisition, since the native speaker can take his or her lifelong experience with the language as the sample to judge from, whereas the learner only has a small subset of data at his or her disposal. Hence Ellis (2009) suggests taking type and token frequency into consideration as well when preparing learning input, since these two measures provide the learner with information on the productivity and the idiomaticity of a construction, respectively. Furthermore, skewed input like that which children receive (cf. Casenhiser and Goldberg, 2005; Goldberg et al., 2004) will point learners to constructional meanings. Ellis et al. (2013) use corpus analyses to show that second language learners make use of distributional characteristics – thus, distributional modifications could also contribute to facilitating second language learning.

Moreover, if facilitative effects for a particular strategy can be identified, one may want to ask further which aspect of this strategy causes the facilitation. For example, scholars have investigated whether the facilitative function of recasts is due to their modeling effect, i.e. the provision of positive evidence, or due to the provision of negative evidence, drawing attention to aspects of morphosyntax or lexis that the learner needs to revise. For instance, Leeman (2003) investigates which components of recasts are responsible for their supportive effect in second language development.

However, in order to create comparable conditions and to constrain input and interaction in controlled ways, researchers have sometimes employed methods that may considerably restrict the generalizability of the results on linguistic modifications. In particular, in order to study the effect of certain modified grammatical structures in a controlled and systematic way, scholars resort to using scripted dialogs. For instance, Mackey (1999) presents an example of such a scripted dialog:


(9) NS:   and now under it draw a pear. A pear is a fruit. It is like an apple. The color is green. Draw the pear under the book. Can you draw it?
    NNS:  Ok ok I got it. Look like apple (draws).
    NS:   Good. Now on the right of the pear draw an umbrella.
          (Mackey, 1999, p. 579)

It is clear that with this kind of input, not only the effect of certain linguistic modifications is tested, but also the effect of presenting extremely boring information in a lecturing manner. Ellis (1999) furthermore criticizes this and other studies for the fact that sessions with interactional modifications take much longer than those with scripted, modified input, so that the two situations are not comparable. Later studies therefore control for time – which, however, does not resolve the problem of this extremely unnatural interaction.

In contrast to interactional modifications, which researchers hold to be generally facilitative for language acquisition, most scholars do not regard linguistic modifications as particularly helpful. Some scholars have even argued strongly against the use of FT characteristics for language learners because of the risk of depriving learners of correct, idiomatic, fluent input that would be representative of the richness of the language as a whole. Müller (2000), for instance, argues that there is a danger of fossilization if learners are presented only with speech exhibiting FT characteristics. He argues that the users of FT characteristics stress the analytical properties of the language instead of its gestalt properties, which provides adult learners in particular with impoverished, unrepresentative input that leads to pidginized speech. These caveats refer to linguistic modifications and not to interactional modifications, which have therefore been evaluated much more positively in the literature (e.g. Valdman, 1981; Ellis, 1985).

A second function of the adaptations in FDS suggested by Hatch (1983) is to enhance communication. As we have seen above, we cannot assume that speakers always interact with the aim of language teaching; sometimes they just attend to getting something done, as in the business interactions described in Wagner (1996) or in the Tok Masta case (Mühlhäusler, 1981). Especially if there is very little shared basis with respect to language proficiency, the simplifications of FDS can support speakers in reaching a certain communicative goal, as for instance in the Yimas pidgin, which serves trading purposes (Foley, 2006), or in Hiri Motu (Thomason, 2001), which serves as the common code for police forces drawn from various places with various language backgrounds. Similar uses have been found, for instance, by Hatch et al. (1978) both in the interaction between two friends and between a teacher and his students, where the modifications facilitated communication (see also Firth and Wagner, 1997, 2007). The linguistic adjustments made may thus provide highly effective means for communication when little or no common language is available (cf. also Heine and Kuteva, 2007; Thomason, 2001).

Another function of FDS proposed by Hatch (1983) is to create affective bonding. In an earlier study, Hatch and colleagues had investigated conversations between two friends, and the native speaker was found to adapt to her foreign friend's linguistic structures to a very high degree (Hatch et al., 1978). Similar alignment with foreigners' mistakes was also found in unofficial parts of the lessons in personalized interactions between a teacher and his students, yet the authors found few such adaptations in telephone interactions with unfamiliar people. Thus, a personal, affective relationship may influence the use of FT characteristics, and the uptake of particular linguistic features from the learner's interlanguage may create affective bonds.

Furthermore, Ellis (1985) suggests that FT has another function, which is quite in contrast to those just outlined.
He writes that a "function that Hatch does not mention is the use of foreigner talk to mark the role relationship between speakers – the 'talking-down' function. This involves the use of ungrammatical simplifications such as omission of grammatical functors and a special lexicon (e.g. 'savvy')" (Ellis, 1985, p. 137). Thus, Ellis (1985) holds that native speakers may make use of grammatical and lexical modifications to establish a particular hierarchical relationship between the interactants. This function is of course in contrast to the bonding function Hatch (1983) proposes (and probably not helpful in language learning, either). Mühlhäusler (1984) likewise reports that indigenous speakers disapprove of the foreigner talk version of Tok Pisin, Tok Masta (Mühlhäusler, 1984, p. 286); many speakers of Tok Pisin perceive this variety as condescending (Mühlhäusler, 1984, p. 287).

What Ellis (1985) terms the 'talking-down' function has been discussed in detail as speech divergence and overaccommodation in Communication Accommodation Theory (CAT). The theory holds that four dimensions are relevant for explaining the dynamics of interactions between native and non-native speakers: native speakers' interactional goals, for instance their wish to communicate effectively; native speakers' perceptions of the non-native speaker; native speakers' encoding strategies; and non-native speakers' decoding of the strategies employed by the native speaker (Zuengler, 1991, p. 235). These dimensions influence each other in the dynamics of native–non-native speaker interactions. One possible interactional effect is speech divergence, when a speaker, for instance because of stereotypical assumptions about the partner, wishes to increase social distance. Overaccommodation, in turn, concerns the non-native speaker's perception of the native speaker's strategies as comprising more adjustments than necessary, which may be due to a stereotypical perception of the non-native speaker's linguistic competence or, again, the wish to increase social distance. According to CAT, the different ways FDS may be perceived are thus due to non-native speakers' decoding strategies as well as to speakers' different motives for the use of FT characteristics. These goals and strategies account for how the same linguistic adaptations can have markedly different effects and thus fulfill different communicative functions, as Ellis suggests, i.e. how they can be understood either as helpful adjustments or as 'talking down'.

An additional dimension in speakers' attitudes towards FDS may be culturally determined. Ross and Shortreed (1990) investigate the effects the use of Japanese FT may have on overhearers. Their participants, 195 Japanese university students enrolled in an English as a Foreign Language course, rated scripted dialogs between a native and a non-native speaker of Japanese that differed depending on (a) the Japanese language proficiency exhibited by the non-native speaker and (b) the Japanese native speaker's reaction, which consisted of 'normal' speech in one condition, FT in the second, and a code-switch into English in the third condition.
Results show that the Japanese students rated code-switching as more cooperative than the use of FT or standard Japanese regardless of the non-native speaker's language proficiency, and especially the female students rated code-switching as more internationally minded and as reflecting greater educational attainment. The use of FT was generally understood to be more polite than code-switching, and both were rated as significantly more polite and more empathic than the use of standard Japanese. The authors invoke Speech Accommodation Theory, an earlier version of CAT, to explain the results and argue that linguistic divergence may thus be motivated by psychological convergence (Ross and Shortreed, 1990, p. 141). In Japanese society, FT characteristics therefore seem not to be associated with a 'talking-down' function as observed by Valdman (1981) and Ellis (1985) in the Anglo-Saxon context.

Given the negative evaluation of FT expressed by, for instance, Valdman (1981), Ellis (1985) or Müller (2000), it is unlikely that English FT would elicit similar judgments from other native speakers of English. The evaluation may be more positive from the learners' perspective, however. Müller (2000) asked learners of German as a foreign language to comment on a foreigner talk episode that he himself judged very negatively; contrary to his expectations, the students all judged the adjustments to be very helpful from the learner's viewpoint. Even Smith et al. (1991), who report on various counterproductive strategies with respect to lexicon, prosody and interactional behaviors, mention that the subjects' attitude towards the confederate was very friendly throughout. Thus, judgments on the usefulness of certain seemingly bad adaptations may differ considerably – depending on, as CAT suggests, the non-native speaker's decoding strategies and the native speaker's interactional goals.

To sum up, certain features of FDS, such as grammatical and lexical modifications, have been regarded as providing helpful clues to comprehension and possibly to language learning, even though they may lead to fossilization if they constitute the only input for the language learner (see Müller, 2000). Numerous studies furthermore address the usefulness of interactional modifications for language learning. FT has also been found to be useful for communication when no common language is available yet a certain interactional goal has to be reached. FT characteristics can furthermore serve as a sign of cooperation if the adjustments made correspond to features produced by the communication partner. In contrast to all of these functions, FT has also been proposed to establish a hierarchical relationship between the speakers, to express divergence, and to constitute degenerate input for language learning. An effect of the use of FT characteristics can furthermore be the perception of overaccommodation. Finally, how FDS is perceived seems to be partly culture-dependent.

5.3  Homogeneity

To sum up the discussion so far, the features identified for FDS are extremely heterogeneous. What speakers do when speaking to foreigners ranges from

1. 'nothing special', through
2. 'input modifications' and
3. 'interactional modifications', to
4. 'counterproductive strategies'.

For instance, a comparison of the FDS features proposed by Ferguson (1975) with those proposed by Gass (1997) shows no overlap, with the possible exception of clear and slow delivery. The survey above of the phonological, prosodic, morphosyntactic, semantic and pragmatic features of FDS, as well as the broad, and partially contradictory, spectrum of functions FDS may fulfill, characterizes this variety as extremely heterogeneous.


Researchers have introduced several distinctions to account for the differences observable. One such distinction is the difference between grammatical and ungrammatical FDS (cf. Ellis, 1985, p. 137). Long (1981) suggests that ungrammatical FDS is likely to occur under the following conditions:


1. "The NNS has very limited command of the language of communication.
2. The NS is, or thinks he or she is, of higher social status than the NNS.
3. The NS has considerable FDS experience.
4. The conversation occurs spontaneously" (Long, 1981, p. 264).

The criteria proposed thus concern aspects of the communicative situation, the relationship between speaker and listener, and the linguistic capabilities of the communication partner. For each of these criteria, examples can be found in the literature. For instance, Smith et al. (1991) find adaptations in mean length of utterance only in speech to a foreigner speaking broken English, not in speech to a foreigner speaking merely accented English. Furthermore, Tok Masta was used for communication with speakers with very low English language proficiency. And Snow et al. (1981) find that the more limited the non-native speaker's capabilities, the more FT characteristics can be found. Regarding status differences, especially pidgins that rest at least to some degree on FT (e.g. Tok Masta, Hiri Motu, or Yimas pidgin) are characterized by social differences between participants (Mühlhäusler, 1981; Thomason, 2001; Foley, 2006). Differences in social status have also been noted in interactions with guest workers, together with the occurrence of grammatical simplifications (Roche, 1989). That the conversation needs to occur spontaneously is supported by several studies, since the presence of ungrammatical features is mostly reported in pidgins, in interactions with guest workers, in international business contacts or between municipal employees and immigrants, whereas in school classes (Henzl, 1979), in cooperative communication elicited in experimental settings (e.g. Gass, 2003) or with volunteers signing up for intercultural exchange (Freed, 1981), no such features have been reported.5

5. 'Spontaneity' is most likely taken here to mean an incidental (non-experimental, non-planned and non-language-learning) interaction between people on topics other than language learning and intercultural exchange.

However, as much as we can find examples in support of the criteria proposed by Long (1981), there are also examples against them, as the author himself acknowledges (Long, 1981, p. 264). For instance, in the informal conversations between the two friends investigated by Hatch et al. (1978), no status differences hold (besides the linguistic ones, which would render the criterion circular and trivial), and still numerous input modifications, i.e. lexical and grammatical adjustments, were observed. Teachers can furthermore be said to have much experience, and classroom discourse may also comprise episodes of spontaneous conversation, and still teachers refrain from using ungrammatical modifications; for instance, the classroom discourse observed in Henzl (1979) did involve very low proficiency speakers and still exhibited no or only few grammatical adjustments. The same holds for the broken English condition elicited in Smith et al. (1991), where no instances of grammatical modification were found. And finally, the business encounters described in Hatch et al. (1978) and in Wagner (1996) did occur spontaneously, yet they did not show any ungrammatical modifications.

An additional criterion is proposed by Ellis (1985), who suggests adding the bonding function to the list. Although it is plausible that the wish to bond with the communication partner may lead to alignment even beyond grammaticality, the criterion of bonding is in sharp contrast to the criterion of status difference, which illustrates the problematic nature of the list proposed. It is thus not clear whether the criteria proposed by Long (1981) are meant to reinforce each other or whether they have to be viewed as independent of each other. To conclude the discussion of the conditions suggested by Long (1981) and others (e.g. Ellis, 1985), the list of factors proposed does not provide a satisfying account of the variability observed in FDS. A set of possible reasons for the occurrence of interactional modifications does not seem more promising either: while Long (1981) suggests the necessity to exchange information to be an important criterion, the variety of interactional modifications and counterproductive strategies discussed in Smith et al. (1991), which were observed in the same situation, cannot be accounted for by this criterion.

Wagner (1996) proposes an alternative account.
He suggests that the occurrence of linguistic and interactional modifications is related to issues of initiative and attention in dialog and to the social roles of the participants (Wagner, 1996, p. 230). He states that "even with non-fluent partners, NS and NNS in the business data do not deploy heavy linguistic or interactional modification (…). In not doing so, they do not orient their contributions towards their partners' linguistic competence but towards their professional competence" (Wagner, 1996, p. 230). As we have seen in Snow et al. (1981), however, orientation towards professional relations does not automatically lead to a lack of linguistic or interactional adaptations.

Considering which studies report grammatical modifications and which do not, it appears that grammatical adaptations occur in natural settings: for instance, in pidgins, especially those based on FDS (Foley, 2006), in interactions between guest workers and their foremen (e.g. Clahsen et al., 1983), municipal employees (e.g. Snow et al., 1981) or shop clerks (e.g. Roche, 1989), and in naturally occurring interactions between friends (e.g. Hatch et al., 1978; Katz, 1981). We can assume that in these interactions, the focus is not on language learning but on other communicative goals; in these contexts, grammatical modification serves the purpose of securing communication. For instance, Firth (2009) argues that language learners often orient clearly towards not being perceived as language learners, so that attention will be on the business done with the communication partner. In contrast, a lack of grammatical modifications, but ubiquitous interactional adjustments, have been reported in pedagogical contexts (e.g. Henzl, 1973, 1979) and in experimentally elicited cooperative information exchange tasks (e.g. Gass, 1997, 2003; Varonis and Gass, 1985; Hatch, 1983; Freed, 1981). The attention to second language learning and to cooperative problem solving observed in the studies by Long (1981), Varonis and Gass (1985) or Gass (2003), for instance, may thus be due to the particular cooperative tasks, which render language proficiency and intercultural differences more relevant than they would be in naturally occurring interactions. Wagner (1996) concludes therefore that the strategies observed in previous research are at least partly due to "a problem created by the elicitation procedure itself" (Wagner, 1996, p. 231).

The division between experimental and naturalistic data corresponds to the results of Long's own analysis (Long, 1981, pp. 260–261), in which he reviews the types of data analyzed in a broad range of published and unpublished studies from the late 1970s. The emerging pattern suggests a clear division between, on the one hand, data elicited 'indirectly' (like Ferguson's stereotype data) and data observed naturalistically, in which ungrammatical input has been found, and, on the other hand, data elicited in quasi-experimental settings and language classrooms, in which no such modifications have been observed.
A decisive factor in the realization of FDS thus seems to be the activity speakers are engaged in and the communicative tasks to which they attend; if they consider the communicative task to be to achieve something, grammatical correctness does not play an important role. If, however, language acquisition is in focus, grammatical correctness does play a role. Especially if the listener’s knowledge of the language is rather basic, linguistic adaptations may be useful – and conversely, if the common grammatical structures of a language have already been acquired, linguistic adaptations in this respect may be beside the point. If for some reason some kind of cooperation is desired between the interlocutors, be it a personal relationship or a cooperative task imposed on the speakers by an experimental situation, cooperative strategies aiming at joint meaning constitution can be observed. This corresponds to the suggestion by Communication Accommodation Theory (CAT) that speakers’ interactional goals play a decisive role. CAT holds that differing communicative goals, such as the wish to communicate effectively
versus the wish to distance oneself from the communication partner, may influence the choice of encoding strategy, such as convergence, divergence, maintenance or complementarity. From this perspective, the broad spectrum of different linguistic and interactional behaviors observed in Smith et al. (1991) can also be accounted for. Communication Accommodation Theory furthermore explains how native speakers’ perception of the non-native speaker determines the choice of strategy. It also subscribes to the hypothesis that the situation is not externally given but defined by the interactants (cf. Gumperz, 1982). Yet external factors may also have an impact. For instance, one can assume that pedagogical situations impose their own constraints on speech towards foreigners, since the utterances then become the ‘input’ for a language learner. Thus, the language classroom basically prevents native speakers from using ungrammatical forms (cf. Henzl, 1979). External constraints of the situational context thus determine in part the shape of FDS. Researchers have addressed this finding by suggesting a distinction between naturally simplified FT and pedagogically simplified FT, also called Teacher Talk (e.g. Ellis, 1993, p. 54), where naturally occurring FDS may also comprise FT characteristics that violate standard grammatical rules (see also Gass, 2003; Ellis, 1985, pp. 34–36). To conclude, FDS can occur in various different forms, which vary with the activities the speakers are engaged in and with their interactional goals, as well as with situational constraints.

5.4  The relationship between partner modeling, alignment and feedback

In what ways does partner modeling shape interactions between native speakers and non-native speakers? As the discussion above shows, speakers’ understanding of the goals of the interaction influences the strategies they will choose. Thus, understanding the communication partner as, for example, a foreign language learner, a business partner or a friend plays a crucial role in the way the communication partner is talked to. Speakers’ conceptions of the communicative tasks and of the current situation, including the communication partner, are important factors determining FDS. If the communication situation is understood as an intercultural dialog, the central question becomes “Who has introduced culture as a relevant category, for what purposes, and with what consequences?” (Scollon and Scollon, 2001, p. 545). In second language learning scenarios and in scenarios involving experimental cooperative tasks, speakers’ conceptualizations of the situation seem to be quite stable; the literature on the interactional strategies taken in native–non-native interaction is remarkably uniform (within the given research paradigm) (e.g. Long, 1981; Varonis and Gass, 1985; Gass, 2003). In these studies, the communicative situation
and the communication partner are generally sufficiently defined by the fact that one participant is a language learner and the other a native speaker, and hence an expert in the language, as well as by the cooperative task they are involved in. In contrast, in naturally occurring situations, no fixed situation definitions may be available, and situations may comprise many further aspects besides language learning that contribute to speakers’ understanding of the affordances of a situation. Here, speakers seem to rely on different resources and may in fact treat language proficiency as a private matter, not to be commented on openly within the interaction (see Firth, 2009, p. 149). As a result, FDS is neither homogeneous nor sufficiently defined by the fact that the communication partner is a non-native speaker alone. This may be due to differences regarding what it means to be ‘foreign’, as well as to the different aspects of the communicative situation attended to, such as communicative tasks with respect to information exchange, action and language learning. Nevertheless, the adjustments speakers make are very precisely tuned to the perceived affordances of the situation. The question thus becomes: which resources do native speakers draw on when speaking to a foreigner? One is stereotypes. As Ball et al. (1985) argue, the addressee is often not dealt with as an individual but as a group member. They discuss the relationship between interpersonal and intergroup orientation, suggesting that speakers may sometimes orient more to the group to which their communication partner is taken to belong than to the person him- or herself.
For instance, Zuengler (1991) attributes the perception of over-accommodation to the native speaker’s orientation toward a stereotype rather than toward the actual communication partner (Zuengler, 1991, p. 240), when the partner is oriented to as a group member. Roche (1989), in his recordings of interactions between various foreigners and shop assistants in Germany, also finds that skin color can be more influential than a communication partner’s actual linguistic proficiency. He reports that for the six different foreigners he investigated, the amount of FT characteristics they were exposed to was best accounted for by the addressees’ skin color and not by their command of the language. Thus, as a clue to membership categorization, native speakers may orient to their communication partners’ appearance more than to their language proficiency. Another resource that native speakers may rely on to make choices for the respective communication partner is alignment. Many researchers have identified alignment effects in FDS, and one of the hypotheses as to the origin of FT is that FT features match the language learners’ mistakes. For instance, in their study of municipal employees in two government offices, Snow et al. (1981) find significant correlations between foreigners’ errors and municipal employees’ simplification efforts
in terms of MLU and percentage of FT features (Snow et al., 1981, p. 89). Thus, the employees’ FT characteristics are partly due to the respective foreigners’ mistakes. Moreover, Hatch et al. (1978) report a direct matching of linguistic expressions between a non-native speaker and her friend. Similarly, Wagner (1996) reports structure sharing between a foreign customer and a delivery person; in this interaction, the speakers employ parallel constructions for the design of their utterances. Katz (1981) cannot find any statistically significant relationship between the foreign child’s mistakes and the child native speaker’s adaptations, but reports long-term adaptation effects concerning the number of mistakes and the number of utterances containing adaptations (Katz, 1981, p. 61). That is, although the speaker does not align with her playmate’s interlanguage features, she adapts her speech globally to the linguistic level of her communication partner. Similarly, DePaulo and Coleman (1986) find more finetuning in FDS than in speech directed at children or retarded adults. This is remarkable since in these interactions, speakers did not receive any feedback from their interlocutors. In fact, they only got to meet their communication partners in a short pre-experiment conversation. During the instructions, speakers merely believed they were talking to the communication partner they had just conversed with; in fact, no addressee was present. Nevertheless, participants adapted their speech very sensitively to the linguistic level of their respective addressees, and much more strongly so for non-native listeners than for children, retarded and native adults (DePaulo and Coleman, 1986, p. 957). Henzl (1973, 1979) observes finetuning in a teacher’s talk to his students depending on the students’ language proficiency; at the same time, she finds his speech to be adapted globally, irrespective of the actual linguistic features of the students’ speech.
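Several of the studies above quantify simplification via MLU (mean length of utterance). As a minimal sketch of how such a measure is computed – in words here, although developmental studies often count morphemes instead; the sample utterances are invented, not drawn from the cited corpora:

```python
# Hedged sketch: mean length of utterance (MLU) in words, the kind of
# simplification measure used e.g. by Snow et al. (1981).
# The sample utterances below are invented, not from the cited corpora.

def mlu(utterances):
    """Mean number of word tokens per utterance."""
    lengths = [len(u.split()) for u in utterances]
    return sum(lengths) / len(lengths)

sample = ["you go left", "now turn", "stop here please now"]
print(mlu(sample))  # (3 + 2 + 4) / 3 = 3.0
```

Note that whitespace tokenization is the simplest possible choice; corpus studies typically apply transcription-specific tokenization before counting.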
Thus, alignment with particular linguistic features of non-native speakers’ utterances has been observed; however, there are also cases in which alignment cannot be found. For instance, Smith et al. (1991) report that many speakers did not slow down even though the confederate did; some even sped up. In particular, when the confederate spoke accented or broken English, she slowed down from the ‘normal’ 3.5 words per second to 2.0 in the accented English condition and to 1.4 in the broken English condition. However, most of her communication partners did not align with her pace; while two did slow down, two spoke at ‘normal’ speed and another two even sped up, producing as many as 5.1 words per second (Smith et al., 1991, p. 178). Thus, most speakers did not align with their communication partner with respect to pace. Smith et al. (1991) argue that speakers’ failure to adapt to their communication partners results from the extra effort needed to adapt word choice, speed and sentence structure in order to fulfill the goal of enhancing clarity. They speak of a “cognitive burden”, which also explains speakers’ avoidance strategies,
such as changing the topic, interrupting the non-native speaker or answering for her (Smith et al., 1991, pp. 183–184).6 Zuengler (1991) brings in the concepts of maintenance and divergence to account for disalignment and the failure to align. That is, in the Communication Accommodation Theory model, one determining factor of FT is speakers’ encoding strategies; speakers may choose between convergence, divergence, maintenance or complementary strategies. If speakers choose to speak even faster than normal in the presence of a slow non-native speaker, the strategy they choose is divergence; if they simply maintain their ‘normal’ pace and do not align with the partner, the strategy is called maintenance. These strategies may be related to speakers’ interactional goals, i.e. whether ensuring comprehension is a goal attended to or whether native speakers rather want to emphasize their distinctness from their non-native interlocutor. This is in turn influenced by native speakers’ perception of their communication partner as culturally different, or as a member of a group different from the groups to which the speaker belongs. Another relevant factor in speech divergence may be issues of language ownership (cf. Auer, 2006). Native speakers may not always expect non-native speakers to speak like native speakers. For instance, varieties like pidgins occur in contact situations with only limited contact between speakers of two languages; that is, pidgins are only created if there is some language contact, but not much (Thomason, 2001). This may be due to the situational restrictedness of the pidgin, but it may also be due to speakers’ deliberate choice not to reveal too much of their language. For instance, with respect to Tok Masta, the pidgin spoken by British settlers in New Guinea, native speakers of English were worried that (e.g. Mühlhäusler, 1981, p. 111):

    There is a decided danger in allowing house-servants to obtain too large a vocabulary of the English language in that English-speaking Whites enjoy no privacy in their conversation in the home. [from the editorial of the Rabaul Times of November 8th, 1935]

Thus, native speakers of a language may not be willing to have a group of non-native speakers participate in their knowledge of the language. Another reason not to let the communication partner in on the knowledge of one’s own language is to mark group boundaries. For example, Janicki and Mickiewicza (1986) argue that different types of accommodation should be distinguished – alignment on the verbal, nonverbal and psychological levels – and that a speaker’s overall accommodation may differ from the way accommodation is perceived by the communication partner. Regarding communication with foreigners, the authors present anecdotal evidence that perfect convergence, for instance a foreigner’s native-like pronunciation, may be perceived as ridicule and thus judged as unwelcome, whereas perfect grammatical competence is always welcome (Janicki and Mickiewicza, 1986, p. 171). Although their article is concerned with the degree of alignment expected of foreigners, it can be instructive as to the expectations generated by the social role ‘foreigner’ and thus regarding speakers’ possibly restricted willingness to understand their communication partner as part of their group. Giles et al. (2006) also suggest that full convergence, that is, native-like competency, is often desired neither by the native speaker nor by the foreigner; the issues involved for the native speaker are deception and lack of trust, whereas the problem for the foreigner may rather lie in maintaining social identity. Regarding alignment we can conclude that only some features of FDS can be explained by means of alignment, and that the reasons for its limited role may be cognitive overload, psychological divergence, stereotypes or social expectations. That is, partner models, social factors and personal agendas may have a stronger impact on native speakers’ linguistic choices than automatic alignment effects. What role does interaction play in this discussion? The results on finetuning reported above indicate that the non-native speakers’ utterances influence speakers’ linguistic choices with respect to FT characteristics. A study that addresses the relationship between feedback and partner modeling directly is that by Warren-Leubecker and Bohannon III (1982), who investigated forty native English-speaking students in conversations with a foreigner in one of four different conditions: high or low expectation and high or low feedback, with ten students per condition.

6. This position is of course in opposition to Pickering and Garrod (2004), who hold that alignment facilitates processing and reduces cognitive load.
Expectation concerning the foreigner’s linguistic competence was manipulated by the confederate, who introduced herself either with a heavy Spanish accent and in broken, halting English, or in fluent English with only a slight accent. Feedback was manipulated by short questions like “what?” or “huh?” and “quizzical looks” in one condition, and nods, smiles and utterances like “okay” and “yes” in the other (Warren-Leubecker and Bohannon III, 1982, p. 210). The first forty utterances of each participant were analyzed for mean length of utterance, number of questions, number of false starts and disfluencies, repetitions and reformulations. The authors find a significant effect for feedback, such that subjects used longer utterances when the confederate signaled that she understood. Expectation yielded a significant effect on mean length of utterance only in the second block of ten utterances, i.e. for utterances eleven to twenty. Expectation also had a significant effect on the number of repetitions and reformulations: subjects in the low expectation condition repeated themselves almost twice as often (Warren-Leubecker and Bohannon III, 1982, p. 212). Yet feedback signaling failure to understand also elicited significantly
more repetitions and reformulations. Furthermore, subjects who received feedback indicating communication problems asked significantly more questions. Crucially, however, the data elicited by Warren-Leubecker and Bohannon III (1982) allow us to determine the interaction between expectation and feedback – while in natural scenarios there is usually a congruence between low expectations and low understanding, and between high expectations and high understanding, in this study the two variables were manipulated independently of each other. With respect to three of the four dependent variables investigated (MLU, repetitions, questions), feedback turned out to determine speakers’ linguistic behavior more than initial expectations did. That is, speakers adapted their linguistic behavior mostly on the basis of cues to their partner’s understanding or failure to understand. This is particularly interesting since a similar study by Bohannon and Marquis (1977) on CDS showed that caregivers simplified their language even when the children displayed signs of understanding. With respect to FDS, initial expectations about linguistic competence do not seem to play as important a role as they do in CDS. The results by Warren-Leubecker and Bohannon III (1982) contrast with the findings cited above showing that stereotypes based on appearance (Roche, 1989) or speakers’ personal identity goals (Zuengler, 1991) may override any possible effects of the partner’s feedback. We can conclude that the relationship between partner modeling, alignment and interaction seems to vary depending on speakers’ goals and on aspects of the external situation, such as whether the encounter is spontaneous and outcome-oriented or whether it takes place in an experimental or a classroom situation.

Chapter 6

Talking to robots

Previous work has related speech addressed to automatic speech processing systems to the ‘simplified registers’, proposing that it constitutes a variety or sublanguage characterized by linguistic simplifications (cf. Krause and Hitzenberger, 1992). Correspondingly, in analogy with the terms baby talk and foreigner talk (Ferguson, 1982), the term computer talk (Zoeppritz, 1985) was coined for speech addressed to automatic speech processing systems in order to stress its simplified nature. Given that robots are in many respects embodied, mobile computers (cf. also Powers et al., 2007), we can expect that speech directed at robots will exhibit similar characteristics. The human-robot situation is particularly interesting for the purposes of the current investigation since, while we have all been communicating verbally with other humans from early childhood onwards, most of us lack experience in communicating with robots via natural language. Thus, in contrast to child- and foreigner-directed talk, which speakers generally have been exposed to in some form or other before, speech to robots can be taken to be entirely non-conventional. There are thus no linguistic conventions speakers may resort to when faced with this unusual communication situation. Instead, for users, to interact with a speech processing robot means to communicate with someone – or something – about whom they do not have much knowledge and whom they have to get acquainted with in the course of the interaction. We can therefore expect that speakers in human-robot interaction (HRI) will need to decide online on the linguistic features of their utterances that they consider most appropriate for their understanding of the affordances of the current situation.
The study of human-robot interaction may therefore, given appropriate methods, allow us to address some of the open issues, such as the role of speakers’ preconceptions, the role of certain communicator characteristics (Giles and Franklyn-Stokes, 1989), such as appearance, the role of linguistic feedback, and the relationship between partner modeling and alignment, as well as how to disentangle the effects of these factors.


6.1  Linguistic description of the variety

Robots may vary considerably in appearance, functionality and context of use. For instance, there are humanoid robots, such as Repliee1 or iCub (Sandini et al., 2007), which are intended to resemble humans, and there are social robots with a rather pet-like appearance, such as Kismet (Breazeal, 2000) or Aibo.2 Furthermore, there are robots built to explore environments humans cannot reach easily, such as Scorpion (Spenneberg and Kirchner, 2007) or the NASA robots,3 which are not necessarily intended for social human-robot interaction. Moreover, there are robots carrying out selected tasks in production chains, usually repetitive high-precision or high-risk tasks. Finally, there are technical devices with selected automatic functionalities, such as the autonomous wheelchair Rolland (Lankenau and Roefer, 2001). Obviously, interactions between humans and these different kinds of robots may differ considerably from each other. The current investigation is restricted to a subset of these robots, but understanding the underlying factors may allow us to generalize from the findings to human-robot interaction in general.

1. See, for instance, http://news.bbc.co.uk/2/hi/science/nature/4714135.stm.
2. See http://www.sony-aibo.com/
3. See, for instance, http://www-robotics.jpl.nasa.gov/systems/index.cfm.

6.1.1  Phonological and prosodic properties

Speech addressed to automatic speech processing systems is generally characterized by communication problems. State-of-the-art speech technology does not yet allow natural and flexible conversation with artificial communication partners, and according to Dreyfus (1992), for instance, it never will. Many of the characteristics of robot-directed speech may therefore be due to miscommunication and attempts to establish understanding. In our corpora, misunderstandings arise from both verbal and non-verbal robot behaviors; speakers understand the robot’s non-verbal behavior as occurring in reaction to their utterances (which in fact it is not) and interpret deviant behaviors as non-compliance, which they seem to attribute to communication problems. A common strategy in this case is to repeat the utterance using different phonetic and prosodic features in the repetitions. Analyses of speakers’ repetitions in our human-robot dialogs show that speakers use a broad range of phonetic and prosodic strategies in order to increase the understandability of their utterances. Thus, speakers employ prosodic peculiarities in reaction to the robot’s unexpected behavior as solutions to the problems they (believe to) encounter. In particular, they raise their voice, use strong emphasis, lengthen syllables and pause between words or even syllables. The phonetic, phonological and prosodic characteristics employed in speech to robots are those that have previously been identified in contexts of communication problems, i.e. the lengthening and emphasizing of syllables and producing them with increased loudness, as well as slow delivery and the fragmentation of utterances by means of pauses between words and even between syllables (see Oviatt, 1995; Levow, 1998; Hirschberg et al., 2001). That participants in our human-robot interaction dialogs understand their utterances as possibly problematic is due to a preconception they carry into the interaction themselves since, as outlined above in Chapter 3 Methods and Data, for methodological reasons, i.e. for reasons of controllability, all robot behavior, verbal and non-verbal, was simulated by a human wizard on the basis of fixed schemata of behaviors that were the same for all users. Since communication with real automatic speech processing systems is also marked by restricted communication possibilities, the robot behaviors in our data display communication problems as well, without providing clues to the nature of these problems. Thus, it is important to keep in mind that although the robot appears to be restricted in its linguistic capabilities, this is not the reason for the communicative failure. The speakers’ attention to the linguistic properties of their utterances thus stems from their own concepts about what the robot could be good at and what it could have problems with, as well as from their theories about what makes language and spatial instructions easy or difficult to process. The linguistically restricted communication partner is thus the speakers’ own construct:4

(1) A004: okay robot - move right. (breathing) - (at=lengthening)mo(/a)ve right.
(2) S037: um,--(at=slow)go-left. (/a)-turn, -left. -- he doesn’t understa+.
    oh, he (at=prominent)does(/a), good – boy. - - (at=lengthening)l(/a)eft - left (2) turn, (2) turn, (at=lengthening)yes(/a), good
(3) P072: okay now go straight ahead? (3) a little farther? (1) little farther? (3) (at=lengthening)go(/a) (at=loud)little bit farther(/a)?

Examples (1) to (3) exemplify syllable lengthening, pauses between words, and slow and loud delivery in repetitions of previously unsuccessful instructions. The use of these prosodic features in repetitions after perceived communication breakdowns indicates that speakers attend to their pronunciation as a possible error source; that is, speakers employ these features as solutions to what they understand to be the problem, namely that the robot has problems recognizing what they are saying. The prosodic peculiarities are therefore strategies to increase the understandability of their utterances (cf. also Stent et al., 2008). The same phonetic and prosodic strategies as identified here have been reported for human-computer interaction in general. For instance, Oviatt (1995), Oviatt et al. (1998), and also Levow (1998, 1999) find hyperarticulation, syllable lengthening and pauses in human-computer interaction, whose numbers even increase in repetitions. More recently, Meddeb and Frenz-Belkin (2010) investigated users’ speech while using dictation software; they not only report very similar linguistic properties, but also demonstrate speakers’ awareness of their pronunciation as a trouble source.

4. In another HRI corpus we elicited (Moratz et al., 2001), we used a real robot whose parsing capabilities, its ability to cope with spelling variants (the input modality in this corpus was typing) and the comprehensiveness of its lexicon were actually quite good. The communicative failure users experienced was instead mostly due to the robot’s inability to process anything but goal instructions (utterances such as ‘go to the red block’, in which the direction is specified by the goal object). Speakers’ assumptions about the reasons for communicative failure were however very similar to those speakers display in the corpora investigated here (see Fischer, 2006b).

Table 6.1  Prosodic features in the RDS corpora

Corpus      N    loud             prominent        lengthen
                 M      sd        M      sd        M      sd
Aibo1      13    0.026  0.055     0.063  0.150     0.082  0.069
Scorpion    8    0.051  0.093     0.127  0.132     0.095  0.054
Pioneer     9    0.041  0.049     0.115  0.109     0.053  0.055
Rolland1    9    0.007  0.009     0.153  0.169     0.291  0.310
Rolland2   11    0.021  0.038     0.155  0.123     0.070  0.050
Aibo2      13    0.080  0.122     0.097  0.097     0.060  0.081
Quantitative analyses of our HRI corpora indicate that these strategies occur frequently and in all our human-robot interaction corpora, as Table 6.1 shows. The corpora do not differ significantly with respect to the prosodic features loudness (F(5,55) = 0.807, p = .550) and prominence (F(5,55) = 0.846, p = .523); they do differ, however, with respect to lengthening (F(5,55) = 3.108, p = .015), with higher numbers of lengthened syllables in the speech to the non-verbal Rolland. This is remarkable since prosodic peculiarities might have been expected in the context of the robot’s failure to comply, yet not if there is no indication of a communication problem. It could thus have been expected that in the dialogs in the first Rolland scenario, when participants steered the wheelchair themselves and received no feedback from the robot, participants would employ fewer of the peculiarities observed in the other corpora, since there is no indication of communicative problems. However, the only indicators for the effect of the amount of communicative failure in the dialogs investigated here are the particularly low number of instances of increased loudness and the high standard deviations in the Rolland scenario for prominence and duration (lengthening). Yet, both Levow (1998) and Fischer (2006b) have shown that
the amounts of prosodic and phonetic peculiarities do rise in the context of communication problems in human-computer interaction, and thus that these features are indeed associated with communicative problems. We can therefore assume that in the absence of feedback from the robot, speakers expect such problems anyway, which is why the number of these features is not significantly lower when speakers produce spatial descriptions for the non-verbal robot. We can conclude that robot-directed speech is characterized by high numbers of phonological and prosodic peculiarities. These features are strategically employed by the participants to enhance the understandability of their utterances. If there is no indication of a communication problem, for instance because the robot does not produce any feedback at all, neither verbal nor non-verbal, participants may still expect communication problems, leading them to employ considerable numbers of these prosodic peculiarities. Regarding intonation, however, we observe considerable variation between the different corpora, as Table 6.2 shows. The quantitative analysis of the intonation contours used in the different corpora yields significantly different distributions for the corpora under investigation (level intonation contours: F(5,55) = 7.715, p < .001; falling contours: F(5,55) = 3.310, p = .011; rising contours: F(5,55) = 78.701, p < .001). The interpretation of these findings is, however, problematic since, unlike loudness, prominence and syllable lengthening, intonation contours are not associated with a single communicative function (cf. Bolinger, 1989). We shall return to this discussion in Section 6.4.1 (Appearance).

Table 6.2  Final intonation contours in the RDS corpora

Corpus      N    fall M   fall sd    rise M   rise sd   level M   level sd
Aibo1      13    0.226    0.212030   0.043    0.047     0.554     0.212
Scorpion    8    0.210    0.179568   0.092    0.080     0.647     0.098
Pioneer     9    0.216    0.209312   0.454    0.228     0.236     0.111
Rolland1    9    0.155    0.101483   0.815    0.145     0.573     0.134
Rolland2   11    0.401    0.188424   0.048    0.036     0.486     0.145
Aibo2      13    0.522    0.241635   0.140    0.103     0.526     0.205
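The contrasts reported above (e.g. F(5,55) = 78.701 for rising contours) are one-way ANOVAs over per-speaker contour proportions. As a rough sketch of what such a test computes (this is an illustration, not the author's analysis script; the per-speaker proportion values below are invented, and only the setup of one value per speaker per corpus mirrors the study design):

```python
# Minimal one-way ANOVA, as used to compare contour distributions
# across the RDS corpora. Invented sample data for illustration.

def one_way_anova(groups):
    """Return the F statistic and degrees of freedom for a one-way ANOVA."""
    k = len(groups)                                  # number of groups (corpora)
    n = sum(len(g) for g in groups)                  # total observations (speakers)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# invented proportions of rising final contours, one value per speaker
rolland1 = [0.80, 0.85, 0.90, 0.75, 0.82, 0.78, 0.84, 0.79, 0.80]
aibo1 = [0.05, 0.02, 0.08, 0.04, 0.03, 0.06, 0.01, 0.05, 0.04, 0.03, 0.05, 0.04, 0.02]
f, dfb, dfw = one_way_anova([rolland1, aibo1])
print(f"F({dfb},{dfw}) = {f:.1f}")
```

With all six corpora entered as groups, the degrees of freedom would come out as (5, 55), matching the F ratios reported in the text.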

6.1.2  Lexical properties

Besides the phonetic, phonological and prosodic features, the lexical properties of speech to robots may also exhibit some peculiarities. First, it can be observed that lexical items are usually selected with great care; speakers often suspect word choice to be critical in human-robot interaction, as is apparent from their reformulations and from clarification questions and remarks they direct at the experimenter:

182 Designing Speech for a Recipient

(4) P075: forward. (2) advance, (2) progress, - forward, (3) (at=high) forward? (/a) (7) (laughter) I was g+ I was wondering, whether it whether it understood English. - (laughter) (3) well yes I know, but um (1) maybe he was understanding forward as a sort of a strange kind of vorwärts5 (at=unclear) or something. (/a) - you know (laughter) - m

In this example, the speaker tries out several lexical alternatives to “forward” because “forward” did not immediately produce the desired behavior in the robot. The experienced communicative failure is thus attributed to the robot’s lexical restrictedness. The speaker then directs a side remark at the experimenter, wondering which language the robot is able to understand. In the following example from the interactions with Aibo, the speaker reformulates her utterance because the robot states that it did not understand; the speaker substitutes curve to the jars with turn to your left, uttered with rising (trial) intonation:


(5) A034: that’s right, toward the box, (1) now (at=loud)curve(/a) toward the jars. (laughter) (5) Aibo: I did not understand. A034: (1) turn to your left?

The above examples illustrate that while communication problems in human-robot interaction could potentially have many causes (the above problems could be speech recognition problems or problems in matching the instruction with the perceived scene, for instance), people suspect the problems to be solvable if different words are chosen. The following example from the dialogs with the autonomous wheelchair Rolland shows that the robot’s clarification question “the what?” is interpreted by the speaker as a problem concerning lexical knowledge. She replaces the definite description the dinner table with another lexical item, dining room table, although the ‘problem’ could just as well have been caused by a problem in speech recognition:

(6) R013: and now we will go to the table. Robot: I’m sorry, which kind of table? R013: the dinner table. Robot: the what? R013: the dining room table. Robot: ah, the dining table.

Further examples illustrating the same expectation, that the robot is lexically restricted, are the following:

5. Vorwärts is German for forward; the speaker uses codeswitching here to talk about the strange meaning the robot is suspected to employ for the term.


(7) S042: rotate right. (laughter) (3) turn right.
(8) P073: go straight? – or drive straight?
(9) P072: turn, turn to the left? go to the left? (1) go to the left – please?

In the questionnaire studies following the elicitation of the human-robot dialogs employed in this study, speakers likewise wondered about the linguistic capabilities of the robot, asking whether it would understand particular words or syntactic constructions, such as relative clauses. Thus, they attended to the fact that the system could have limited linguistic capabilities and displayed their concept of the robot as a communication partner that is somehow lexically restricted. Speakers may also use formal, technical or metric lexical items to facilitate the interaction, for instance:

(10) R031: now we, back up, – (laughter), (1) and, (1) a five, (1) five feet reverse? – and then we’ll go over watch television.
(11) S045: please, -move, - backward. (17) please make a 180 degree turn.
(12) A032: your object is (4) one metre away and a 45 degree turn to your left


(13) A044: hello Aibo. - I want you to go, straight ahead, - past the first cup on your left, - and then, make, a right angled turn to your left. – to the f+ second cup. that you come to.

These speakers seem to suspect that their communication partners function in terms of formal operations and with abstract categories. Similarly, in their attention to the choice of words, speakers also try out more abstract categories, thus displaying their hypothesis that a robot might know objects or containers if it does not know cups, mugs or glasses, as in Example (14):

(14) A048: please move, – straight forward, – straight (at=loud) ahead (/a), (at=slow) to the (/a)– to the mug, (4) in front of you. (5) Robot: Do you want me to go to the object that is in front? (1) A048: no, - I want you to go, (1) forward, (1) a little bit further. (10) please (at=loud) stop (/a), now turn, slightly right? (2) and now go to the(at=prominent) object(/a) in front of you.

Here the speaker reformulates her utterance from ‘to the mug in front of you’ into ‘to the object in front of you’, where object is even produced with strong emphasis. While formal, technical and metric terms occur frequently, speakers may also make use of rather non-technical modifiers, such as a bit, slightly or sharp, for instance:

(15) A036: go a little bit to the left . - a tiny bit . (1)
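The reformulation in Example (14), replacing mug with the superordinate object, presupposes that the robot’s lexicon bottoms out in abstract categories. A toy sketch of such a lexically restricted system may make this hypothesis concrete; the vocabulary and the hypernym table below are invented for illustration, not taken from any of the robots studied here:

```python
# Toy model of a lexically restricted robot: it knows only a few abstract
# category terms, and a referring expression succeeds only if the word
# itself or one of its hypernyms is in the robot's vocabulary.
# All entries are invented for illustration.

HYPERNYMS = {          # hand-coded hypernym chain: mug -> container -> object
    "mug": "container",
    "cup": "container",
    "glass": "container",
    "bowl": "container",
    "container": "object",
}

ROBOT_VOCAB = {"object", "container"}  # the robot's entire lexicon of nouns

def understood_as(word):
    """Climb the hypernym chain until a known term is found (or give up)."""
    while word is not None:
        if word in ROBOT_VOCAB:
            return word
        word = HYPERNYMS.get(word)
    return None

print(understood_as("mug"))     # prints "container"
print(understood_as("object"))  # prints "object": the abstract term succeeds directly
print(understood_as("sofa"))    # prints "None": unknown word, no hypernym entry
```

On this toy model, a speaker who cannot get mug accepted but succeeds with object has, in effect, discovered the shape of the robot’s vocabulary, which is exactly the hypothesis the reformulations above display.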


Technical terms and non-technical modifiers sometimes even occur within the same sentence: (16) A003: uh (1) (hnoise) (at=lengthening)turn(/a) ninety degrees to the left, (breathing) (7) (at=lengthening)s(/a)traight ahead, (breathing) (1) please, (laughter) (2) and now slightly to the left, (breathing) (3) (at=loud)stop(/a), (1) slightly to the right, (4) (at=lengthening)s(/a)top,–


Such terms are in fact very difficult for artificial systems to deal with. The following two examples also illustrate that users may have little knowledge of what would be useful in a given situation. Example (17) shows considerable detail presented to the robot, the robotic wheelchair Rolland, detail that is neither relevant to the task nor even perceivable for any kind of existing artificial system. Example (18) illustrates that speakers may not know which information is required by the robot; process descriptions, like back up and go right, for example, are unlikely to enhance the robot’s familiarization with particular goal locations since participants are steering the robotic wheelchair themselves:

(17) R039: this is the tea-table, – it has my glasses on it, my glasses? or the (at= “lengthening”)green(/a) case. can you see the green case? hm? okay, here is the this must be the kitchen, this looks like the kitchen? (1) and maybe that’s the kitchen table, (laughter) (1) (at=“slow”)this is the uhm(/a), looks like sort of a uhm, what do you call + cooker, ah stove, - stove. this is a stove, this is American English (at=“quiet”)now(/a). - and it has a (at=“prominent”)pot(/a) on the top of it, do you see it
(18) R036: okay, to the chair? (2) - (at=“lengthening”)m+(/a), (2) back now, (breathing) (3) to the, - dinner table, (8) - (at=“lengthening”)uhuh(/a), (2) (breathing) (2) - (at=“quiet”)m+, go right(/a), (11) oops, - perfect, (laughter)

However, while these kinds of information are unsuitable for the particular task the robot is meant to fulfill, participants do not treat the choice of content of their utterances as problematic. Thus, unlike the process of finding words that may be understandable to the robot, participants do not seem to attend to their instructional strategies in general as potentially problematic in the same way. To sum up, speakers are very conscious of the limited lexicon robots may have. Some speakers consider technical terms particularly promising, which does not, however, stop them from using colloquial terms of modification. Furthermore, speakers may have only limited ideas about the kinds of semantic contents that could be useful for a robot in a given task.


6.1.3  Grammatical properties

In this section we shall be concerned with the grammatical properties of speech to robots. Especially in the spatial instruction dialogs in which the robot does not produce verbal output, speakers restrict themselves to very elementary grammatical structures. For instance, the instructions chosen often consist of just directional adverbs and simple imperative clauses:

(19) P077: right? (1) okay, turn right?
(20) P079: um (breathing) (2)(at=lengthening)go(/a)- forward left. (4) left? (1) forward . (3) forward? (2) stop? (5)

Sometimes participants use instructions consisting of prepositional phrases which build on the robot’s previous utterance:

(21) Robot: Which object should I head towards? A048: (2) to the white cup, - behind you .

However, in other situations, very elaborate syntactic constructions can also be found, for instance:


(22) A031: now turn to your right, (3) and the object right to your left is where you wanna go.
(23) A033: there is an object which is just, (2) (at=lengthening)to(/a)(7) just to your(at=slow)left,(/a) could you move towards that?
(24) A032: the correct object will be the first.
(25) A033: uh, (1) if you turn, (3) about, (3) seventy (at=slow)degrees to your left(/a) you, there, there will be a - a bowl facing you, - could you then, - move towards it . – all right.

Yet even in situations in which the robot produces quite elaborate output, speakers may choose a telegraphic style:

(26) Robot: where are we going to? R004: television.
(27) Robot: Yes, hello, how do you do? RD039: (2) should I um I’m fine thanks, – (laughter) um Robot: You can take us now to a place you want to name. RD039: (1) um - refrigerator,

In the following example, the speaker leaves out the grammatically obligatory definite article:

(28) R039: tea-table is on the left, (2)


Telegraphic style is also reported for some speakers by Brennan (1991) in human-computer interaction and by Kriz et al. (2010) in human-robot interaction. When we consider the broad range of different instruction strategies taken by participants even in objectively identical situations, we thus find considerable variability between speakers with respect to the grammatical structures they employ in a given human-robot interaction situation. High variability may furthermore be observed even for individual speakers; for instance, in the following example, speaker R033 in the wheelchair instruction scenario without feedback from the robot restricts himself to naming the target locations with common nouns, however using a numbering system to distinguish between different locations (the two exits and the two tables). The cookies obviously do not fit into this recipient-designed list of formal location descriptions:6

(29) R033: tree, (28) TV, (8) (breathing) (15) table (13) exit one, (breathing) (12) (offtalk for 2:55 min) (8) table two? (17) cookies, (2) okay um,


In contrast, speaker R030 in the same scenario uses much more elaborate natural language utterances, however mixing different instruction types (declarative, imperative, adverbial phrase), for instance:

(30) R030: um (breathing) (21) (at=quiet)okay(/a), so I want to go over to the sofa (…), uh, right (at=lengthening)ja(/a)? (…) so go right, (3) – right okay. (2 minutes offtalk) um, I want you to go to (at=unclear)um, no(/a), I want you to go to (at=lengthening)the(/a), - remote control, which is lying on the table? (1) (…)? (5) does it go right? – ja, (6) (…) – so oops, pardon, (laughter) - (at=lengthening)and(/a) so I’m at the small table that’s the coffee table? (at=lengthening)and, um(/a), – uh (7) uh, (2) just to my, (1) (at=prominent) left(/a) is the television? – (at=lengthening)and(/a), um, (1) I want to pick up it, if I want to pick up (at=laughter)the(/a), (1) if I want to pick up (at=lengthening) the(/a), remote control then I need to be here, so that I can get to it so I can switch (at=lengthening)on(/a) the television? (1) (at=lengthening)and(/a),–+f I want to sit down in, (1) the, - sofa, (1) (at=quiet)I need you to turn round(/a), (1) (breathing) (2)

The users in this corpus thus produce grammatically very diverse utterances, displaying different understandings of what the task, to teach the robot certain spatial locations, might consist in. Altogether, in the same situation, different speakers may employ a variety of different linguistic constructions:

6. The use of the informal term cookies to describe a non-location in the room can be seen as an attempt at finishing the task since the cookies are followed by the boundary signal okay (Bangerter et al., 2004) and a change in orientation towards the experimenter.


– nouns, e.g.
  R033: tree, (28) TV, (8) (breathing) (15) table, (13) exit one,
– gerunds, e.g.
  R036: okay, turning around to this, - right, – and going, back.
– imperatives, e.g.
  R035: take me to the (at=lengthening)stove(/a), (12) ‘kay, (5) (at=slow)now(/a) take me to the plant.
  R031: please, (2) please go forward, (1) two feet and then, (1) make a right turn – go straight ahead.
– declaratives, e.g.
  R039: tea-table is on the left, (2) this is a tea-table up here.
  R031: now we go, - go backwards about, five feet? (10) now we’ll turn to the right?
– modal verb constructions, e.g.
  R030: so I want to go over to the sofa.
  R030: I want you to go to um, no

Quantitative analyses show that, for instance, the mean length of utterance (MLU) may vary considerably in robot-directed speech, not only between speakers but also between corpora; while it is lowest in the interactions with the non-verbal Aibo, roughly as low as has been found in speech to 9-month-old children (Newport et al., 1977), utterances may also reach a mean length comparable to that found in interactions between human adults (cf. Behrens, 2006), as can be seen in Table 6.3.


Table 6.3  Mean length of utterance in four of the HRI corpora

Corpus     MLU     N    sd
Aibo1      3.972   13   3.396
Aibo2      4.105   13   2.230
Scorpion   4.616    8   2.807
Pioneer    5.088    9   2.091
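The MLU values in Table 6.3 can be thought of as the output of a simple computation over the transcripts. A minimal sketch, assuming MLU is counted in words per utterance (classical MLU counts morphemes) and that pause markers such as “(2)” and annotations such as “(laughter)” or “(at=loud)” are stripped before counting; the sample utterances are taken from the examples in this chapter:

```python
# Sketch of an MLU computation over robot-directed speech transcripts.
# Assumption: MLU in words, not morphemes; parentheticals are removed first.
import re
from statistics import mean, stdev

def words(utterance):
    s = re.sub(r"\([^)]*\)", " ", utterance)   # strip pause/annotation parentheticals
    s = re.sub(r"[?.,+–-]", " ", s)            # strip punctuation and hesitation marks
    return s.split()

utterances = [
    "right? (1) okay, turn right?",                # P077, Example (19)
    "go to the left - please?",                    # P072, Example (9)
    "take me to the (at=lengthening)stove(/a),",   # R035
]
lengths = [len(words(u)) for u in utterances]
print(f"MLU = {mean(lengths):.3f} words, sd = {stdev(lengths):.3f}, N = {len(lengths)}")
```

A per-corpus run of this kind, with one mean per speaker, yields the figures of Table 6.3; the large standard deviations there reflect the speaker variability discussed above.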

To conclude, speakers generally suspect the respective robot to be linguistically restricted, and they may employ what they consider to be ‘simplifications’, such as a telegraphic speaking style; however, their ideas about what simplifies grammatical structures for a robot may vary considerably and do not always correspond to what might indeed facilitate interactions with an automatic speech processing system.

6.1.4  Semantic and pragmatic properties

Regarding the semantic and pragmatic properties of robot-directed speech, it is noteworthy that most speakers understand human-robot interaction as a communicative situation in the first place (Wooffitt et al., 1997; Hutchby, 2001).7

7. Even 10-month-olds have been found to conceptualize interactive robots as true interlocutors (see Arita et al., 2005).



Wooffitt et al. (1997) investigate central conversational procedures, such as turn-taking, dialog openings and closings, repair etc., in simulated human-computer interaction. They report, for instance, that speakers pay attention to the turn-taking system in the ways they would in natural conversation by attending to (a) the timing of utterances, (b) the accountability of elapsing time, and (c) the general normative rules of the turn-taking system. The authors conclude that “the human participant will still be doing the things that humans do when they interact. That is, the full range of culturally available sense-making procedures will be brought to bear on any occasion, even if the other party to the interaction is a computer” (Wooffitt et al., 1997, p. 166). Thus, confronted with the unusual communication situation, speakers make use of those rules and procedures that they use in conversation among humans. Based on an extensive analysis of the impact of various different technologies on interaction, Hutchby similarly concludes that “social processes are involved in all aspects of technology” (Hutchby, 2001, p. 193). His study of human-computer interaction underlines that speakers attend to the normative structures of conversation in human-computer interaction just as in other kinds of interactional situations. Hutchby concludes that interacting with technology is constrained on the one hand by the affordances of the respective artifact, “functional aspects which frame, though not determine, the possibilities for agentic action in relation to an object” (Hutchby, 2001, p. 194), and by the pervasive normative structures of interaction on the other.
Extensive work by Nass and colleagues has shown that speakers may treat artificial communication partners in ways similar to other humans; Nass and collaborators have therefore developed the computers-are-social-actors paradigm, which asserts that people treat computers as social actors, just like other humans (Reeves and Nass, 1996; Groom et al., 2009); from this perspective, people should interact with robots in similar ways as they interact with other humans (Nass and Moon, 2000; Nass and Brave, 2005). The computers-as-social-actors hypothesis rests on the concept of mindless transfer (Nass and Moon, 2000); that is, even though speakers are aware that they are talking to a computer, not to another human, they mindlessly employ those behaviors that they are used to employing in interactions among humans. The reason for treating computers just like people lies, according to Nass, in evolutionary psychology, since “identifying other humans constitutes a significant evolutionary advantage” (Nass, 2004, p. 37). Mindless transfer then constitutes an error, albeit a sympathetic one: “polite responses to computers represent the best impulse of people, the impulse to err on the side of kindness and humanity” (Nass, 2004, p. 37). The discovery procedure of the studies in this framework consists in taking a stable finding from human-human interaction and applying it to human-computer scenarios (Reeves and Nass, 1996). For instance, Nass (2004) describes an



experiment in which participants first receive a tutoring session from a computer. After that, one third of the participants fill out a questionnaire about the computer’s performance at the same computer they have worked with, one third at another computer and one third on paper. The ratings of the computer’s performance are significantly better if participants fill out the questionnaire at the same computer. In human communication, it is impolite to bluntly tell other persons that one does not approve of their performance since such criticism constitutes a threat to their face (Brown and Levinson, 1987); people will therefore make their judgments less face-threatening and thus more positive if they are evaluating a person directly than when they report their evaluation to another person. The study reported in Nass (2004) shows that the same effect can be found with computers. In similar studies, Nass and colleagues have investigated a broad range of such social behaviors, with the result that people were found to react to flattery from a computer similarly to how they react to human flattery (Fogg and Nass, 1997), that they react more positively to computers whose speech manifests the same personality as their own (Nass and Lee, 2000), and that they transfer human characteristics to the agents, such as intentionality (Ju and Takayama, 2008; Mutlu et al., 2009), ethnicity (Pratt et al., 2007) or gender, where a synthesized female voice will trigger the attribution of female characteristics to the computer persona (Nass and Brave, 2005; Nass, 2010). There are four types of evidence for the mindless treatment of computers as social actors: overuse of social categories, such as gender and ethnicity; automatic application of social rules, which they refer to as ‘overlearning’ (Nass and Moon, 2000, pp. 87–88); premature cognitive commitment with single exposure, such as response to authority; and the breadth and depth of social responses, as apparent from the cumulative evidence provided by Nass and his colleagues over the years (e.g. Reeves and Nass, 1996; Nass and Lee, 2000; Nass et al., 2000; Nass and Moon, 2000; Dahlbäck et al., 2001; Nass and Gong, 2004; Nass and Brave, 2005; Groom et al., 2009; Nass, 2010). However, some authors report differences between human-human and human-computer and human-robot interaction that should not occur if people mindlessly transferred behaviors from natural interactions to interactions with artificial agents. Amalberti et al. (1993) find considerable differences between human-human and human-computer interaction, even though the wizard behaved identically in both situations; the conceptualization of the communication partner as human or as computer thus seems to play an important role. Similarly, Okita et al. (2007) find that the mere conceptualization of the communication partner as human versus machine has a considerable impact on participants’ learning behavior. Moreover, the mere thought of the sender of feedback (human or computer program) influences the depth to which this feedback is taken into account, as neuro-imaging studies show (Schindler and Kissler, 2016). Levin et al. (2008)



investigate participants’ theory of mind of other humans, robots and computers with respect to categorization and goal-directed action tasks and find reliable differences between the behaviors attributed to humans on the one hand and to computers and robots on the other, even for robots and computers 100 years from now. In addition, Herberg et al. (2008) report differences especially in the amounts of social cues (looks, smiles, pointing) in action demonstrations to computers and robots in comparison with children and other humans. The results by Kanda et al. (2008) on human-robot interaction also indicate differences between human-human and human-robot interaction, especially regarding social behavior. They compare participant behaviors towards two humanoid robots and a human interactant with respect to verbal and nonverbal features. They do not find significant differences in the amount of information presented and the amount of politeness used, yet with respect to nonverbal behaviors there are significant differences: speakers bow deeper for their human interlocutor in the greeting, they respond more slowly to the robots’ than to the human’s greeting, they gesture less for the robots than for the human, they respond more slowly to the pointing gesture of one of the robots, and they approach one of the robots more closely than the other robot and the human. The results thus point to differences particularly at the social level. That mindless transfer may provide only a partial explanation is also supported by the study by Johnson and Gardner (2007), for instance, who find that the flattery effect described by Reeves and Nass (1996) and Fogg and Nass (1997) only holds for some speakers and only under certain conditions. If transfer is mediated, however, it cannot be claimed to be mindless any more.
Other studies also address the previous findings by Fogg and Nass (1997) on flattery and find that whether flattery is effective when coming from a computer depends on whether the information is presented verbally or textually (Lee, 2008), whether it is presented to men or women (Johnson et al., 2004) (men are not affected by flattery if the computer is female, and even the opposite effect has been observed), or whether it is presented by a male or female agent image (Lee, 2008). Thus, if transfer is involved, it seems to be mediated by a set of factors. A gender effect is also reported by Schermerhorn et al. (2008), and an effect of age is reported by Wolters et al. (2009). Further problems regarding transfer have been encountered specifically with respect to linguistic interaction. In particular, Shechtman and Horowitz (2003) find that human-human and human-computer communication differ especially along various social dimensions. That interaction by means of language should be less amenable to transfer than other social behaviors is particularly unexpected if we consider that language is generally taken to induce anthropomorphism and to encourage mindless transfer; in particular, Nass (2004) holds that certain properties of computer interfaces, among which is language use, support mindless transfer and therefore trigger mindless ‘etiquette responses’ (see also Fogg, 2003, p. 91):



– language use,
– voice,
– face,
– emotion,
– manifestation,
– interactivity,
– engagement with and attention to user autonomy,
– unpredictability, and
– the filling of traditional roles.

The assumption that certain cues trigger anthropomorphic responses is supported by Gong’s study, in which she finds that the more anthropomorphic the artificial agent is, the higher the degree of anthropomorphization (Gong, 2008). While it is plausible that situations that resemble human-to-human communication to a greater extent invite more treatment of the artificial communication partner as a social interactant, we should then expect the same of language use. Thus, since language use is expected to trigger mindless transfer, it would be strange if speakers did not transfer mindlessly especially in verbal interactions with the artificial communication partner. Yet this is exactly what many studies suggest: that particularly in linguistic interfaces there is less transfer with respect to social information. This is the conclusion that Shechtman and Horowitz (2003) as well as Amalberti et al. (1993) and Johnstone et al. (1994) arrive at, who all report considerable differences with respect to linguistic behavior in human-computer interactions and in interactions among humans. Another problem for the mindless transfer hypothesis is constituted by the fact that differences can be found in interactions between humans as well; since communication among human interlocutors is itself variable, it is unclear which linguistic variety would constitute the source that speakers mindlessly transfer to human-computer interaction. Edwards (2001) provides a completely different interpretation of results such as those presented by Nass and colleagues.
He proposes that the way in which participants interact with computers constitutes a social practice and is as such very much like communication with other beings. Edwards (1994) illustrates this point with the interpretation of animal communication. Previously, researchers had warned against overinterpreting gestures produced by chimps and gorillas since, for instance, their gesture for ‘sorry’ “does not necessarily imply remorse”. Edwards argues that “the legitimacy of taking anybody’s individual utterances as diagnostic of mind is questionable” (Edwards, 1994, p. 764). Whether people actually feel


remorse when they utter ‘sorry’ is as open as it is when animals produce such signs. Instead, people ascribe intentions, motivations etc. to each other, as well as to animals and computers, as part of a social discourse. Given these different positions, let us now return to the empirical investigation of our corpora. As Hutchby (2001) and Wooffitt et al. (1997) predicted, speakers in human-robot interaction can be found to orient to the turn-taking system, for instance:


(31) R052: okay, – now I am in I’m in front of the (at=prominent) co(/a)ffee table. (1) (par) okay? (/par) Rolland: (par) Did (/par) you say coffee table? R052: – I did

In this example, the speaker produces a description of a location (the label of which the robot is meant to learn), and after a one-second pause, which might also have been heard as a significant delay by speakers in ordinary conversation (Levinson, 1983, p. 300), the speaker produces a check. The robot at that point has already begun producing a clarification question, to which the speaker responds with only a minor delay. Thus the speaker attends to the turn-taking system in at least two ways: she holds the robot accountable for the silence at the end of her turn, displaying her expectation for the robot to respect the usual timing of utterances in talk-in-interaction, and she responds to the robot’s question without much delay. In general, robot-directed speech therefore seems to exhibit the same interactional principles as conversation among humans. This applies even if the robot does not produce verbal output; the human-robot dialogs elicited show that speakers treat the non-verbal robots as interaction partners as well. Similar to what Clark and Krych (2004) have found for communication among people, in human-robot interaction speakers treat the nonverbal behavior of the robot as dialog contributions, to which they again respond online with verbal utterances, for instance:

(32) A003: straight ahead, (breathing) (1) please, (laughter) (2) and now slightly to the left (breathing) (3) stop, (1) slightly to the right, (4)

The same holds for attention to question–answer sequences and other sequential structures of natural conversation. Most of the speakers in the dialogs react to the robot’s questions by providing the relevant next turn, for instance:

(33) Rolland: Okay, where are we going to next? R043: this is the refridgerator. (2) (noise) - do you see the refridgerator? Rolland: – Is this where you want to be to open it? R043: - no (2) move forward, (2) (hnoise) (1) this where I want to be - to open it,


In the following example, the speaker complains about the robot’s request to describe the location she is heading towards, which shows that she understands its utterance as making a reply from her side conditionally relevant and thus that she attends to the normative structures of social interaction:

(34) R043: we are going to the dining table, (1) this is the dining table, Rolland: – I’m sorry, which kind of table? R043: (at=prominent)di(/a)ning table, (1) this is the table where we (at=prominent)eat,(/a) Rolland: Ah, the dining table. (3) Just to get this right: I am in front of it, is that right? R043: (1) you are near one of the corners. (1) we’ll go in front of it in a minute. (9) (hnoise) - we are now in front of it. Rolland: Okay, thank you. (3) Okay, where are we going to next? R043: - you certainly are pushy.

The next example shows that although the speaker herself takes about two seconds before she replies to the robot’s question, she expects some kind of reaction from the robot’s side after her reply, since she repeats her utterance louder:


(35) Rolland: Yes, hello, how do you do? R048: (2) fine, (2) (at=loud)fine,(/a)

The speaker therefore documents her attention to the timing of the turn-taking system that she is used to from interactions among humans. However, even in otherwise smooth examples, speakers may produce pauses that would be accountable in conversation among humans; that is, in conversation among humans some explanation would be necessary for the violation of normative expectations in order to prevent negative conclusions (Heritage, 1988; Levinson, 1983, p. 320). In Example (35) above, the participant’s reply is delayed by two seconds; furthermore, in the example below, the answer to the robot’s question “which name do you prefer, sofa or living room area?” is produced with a three-second delay. While some delays in this example can be accounted for by non-verbal action, the reply to a clarification question requires a verbal response, and the speaker’s failure to produce it on time thus indicates that not all speakers always follow the general timing of human conversation:

(36) Rolland: Yes, hello, how do you do?
     R047: (1) I’m fine thank you.
     Rolland: (2) you can take us now to a place you want to name.
     R047: alright, – we are going to go to the dining room table,
     Rolland: (3) I’m sorry, which kind of table?


     R047: dining room table. (…)
     Rolland: okay, thank you.
     R047: (6) now we are going to the sofa,
     Rolland: (3) is this part of the living room area?
     R047: – yes . –
     Rolland: which name do you prefer, sofa or living room area?
     R047: (3) the s+, - living room area the c+ the sofa is in the living room area, –
     Rolland: okay, thank you.


In the example below, too, the speaker waits several seconds before producing her answer to the question. It may be argued that the robot does not always provide its utterances without delay, either; however, the robot’s utterances are generally first pair parts, i.e. they initiate interaction and make the user’s response conditionally relevant, so they are not themselves awaited by the participant. Furthermore, the robot’s utterances may be understood to occur in response to the participant’s nonverbal, steering behavior. Thus, it is unlikely that speaker R047 in the example above or R048 in the example below orients to the robot’s timing when producing a delayed utterance:8

(37) R048: (1) we are in front of the sofa.
     Rolland: (3) do you prefer living room area over sofa or couch?
     R048: (3) living room area is okay.
     Rolland: okay, thank you.
     R048: - okay.

We can conclude that not all speakers attend fully to the expected timing of utterances in human interaction; note, however, that speaker R048 in the example above produces a follow-up closing signal, matching the robot’s okay with another instance of okay, thus contributing to a joint topic closure (Schegloff and Sacks, 1973). Further evidence for the transfer of conventions from social interaction among humans to human-robot interaction comes from studies demonstrating that the use of certain linguistic features may have the same pragmatic functions in human-robot interaction as in human-to-human communication. For instance, Torrey et al. (2013) investigate the effects of hedging and discourse marker use in robot advice giving. They find that a robot that hedges its advice in the way humans do is rated as more considerate, more likable and less controlling, and they find the polite robot to have more impact on participants’ decision making.9 Similarly, Andrist et al. (2013) investigate the effects of linguistic signs of expertise (fluency, discourse structuring, metaphor use, indicators of prior expertise and of wanting the best for the listener; cf. Mutlu, 2011) and find that people judge robots that use such signals to be more competent and more persuasive. Chidambaram et al. (2012) furthermore find nonverbal gestures to have a significant effect on a robot’s persuasiveness. These findings suggest that people respond to linguistic features produced by robots in much the same way as when they are produced by other humans. There are, however, some respects in which participants do not orient to the interactive achievement of understanding observable in human conversation; in the following example, the speaker first ignores the robot’s greeting; subsequently she fails to understand the robot’s repetition of that greeting, yet she does not attempt to find out what it might have said and turns to the experimenter for help instead:

8. When interpreting the speakers’ turn-initial delays, it needs to be borne in mind, however, that people are also steering the wheelchair at the same time and thus that their cognitive load may indeed be quite high.


(38) Rolland: yes, hello, how do you do?
     R021: (2) (to experimenter): twenty-one, (1) (hnoise) (1) what do you want
     Rolland: yes, hello, how do you do?
     R021: (to experimenter): wa’ me say do you want me to say it again? or, (1) so,
     Rolland: you can take us now to a place you want to name.
     R021: – right next to me is – (at=slow) a table, (/a) (1) (…) armchair, -

The following example from the interactions with Aibo also illustrates that speakers may choose to ignore the robot’s utterances completely:

(39) Robot: do you mean the object that is 30 degrees west of the box?
     A033: (8) (at=lengthening) okay, (/a) (at=lengthening) now (/a), – turn left, (3) go straight to the box, (1) all right. (1)

Thus, this participant is not attending to grounding, that is, to collaborating on establishing a joint understanding with the robot. In other words, this participant does not treat her artificial communication partner as a social actor. We can conclude that people generally attend to the normative structures of talk-in-interaction, that they transfer such structures from interactions among humans (even though the degree of mindlessness in this transfer is unclear), and that they ascribe intentions to robots as they do to other humans, but that there are also exceptions, which are then not treated as accountable.

9. However, as Strait et al. (2014) point out, people may perceive certain robot behaviors differently when they observe them in interaction with others than when they interact with them themselves. The experiments by Torrey et al. (2013) were carried out with people observing other people in interaction with the robot.

196 Designing Speech for a Recipient

There is furthermore evidence for attention to politeness; Wrede et al. (2006) observe polite conventionalized dialog routines such as greetings in 57.6% and farewells in 29.8% of all encounters between a robot and a human participant. Furthermore, in our data speakers may use politeness formulas, as in the following examples (see also Example (9) above):

(40) A004: okay robot - short left please? (breathing) - short left. (5) go on, - you are doing fine, (10) now stop, (breathing) – stop, (breathing) - robot please stop.

(41) A003: keep going, (2) please keep going? - (laughter) (2)


(42) S039: turn (at=prominent)left, (/a) (5) turn (at=prominent) left, (/a) (2) (at=prominent) please (/a) turn (at=prominent) left? (/a) – thank you, - - (laughter) -

However, the speakers’ laughter in Examples (41) and (42) also shows that they are fully aware of the awkwardness of using politeness formulas when addressing an artificial communication partner. Thus, this is unlikely to be ‘mindless’ behavior. Furthermore, there is considerable variation between speakers: the number of instances of please per dialog in the interactions with Aibo ranges from 0 to 44. Thus, while some speakers employ politeness formulas regularly, others do not use them at all. Still, speakers consistently attempt to collaborate, especially in the dialogs with Rolland when the robot provides a description of the environment as a summary of what it has been taught by the human user. Its first turn ends in the discourse marker correct?, asking for feedback; none of the utterances following this one does. Yet all participants provided the robot with feedback after every single utterance, for example:

(43) Rolland: okay, let me summarize what I have understood. I’ll describe the room from right to left, as you did. The dining table is to my right, correct?
     R004: yes.
     Rolland: next to that is the living room area.
     R004: yes.
     Rolland: comprising a sofa and an armchair.
     R004: yes.
     Rolland: left of that is the place for watching TV.
     R004: yes.
     Rolland: we are opposite the computer table.
     R004: yes.
     Rolland: left of that is the desk.
     R004: yes.


Table 6.4  Mean number of structuring cues in the different corpora

Corpus      M      N    sd
Aibo1       0.099  13   0.075
Scorpion    0.026   8   0.038
Pioneer     0.031   9   0.037
Rolland1    0.556   9   0.316
Rolland2    0.563  11   0.370
Aibo2       0.163  15   0.153

Thus, participants collaborate with the robot in establishing a correct representation of the locations in the flat. Other pragmatic aspects of human-robot dialogs concern discourse structuring devices; Oviatt (1995), for instance, argues that utterances in human-computer interaction are much shorter and thus structuring devices are less necessary than in the much longer turns of conversation among humans. However, participants do produce structuring cues:


(44) A033: could you, (2) turn slightly to your left? - first? (8)

The quantitative investigation of our corpora reveals, however, that some speakers indeed use large numbers of structuring cues whereas others seem to follow Oviatt’s predictions. Thus, the proportion of structuring devices per number of turns ranges from 0 to 27.7% for different speakers. Table 6.4 shows the average distributions of structuring cues in our corpora. The data show that in the dialogs with Rolland, in which the participants decided themselves how to structure the task and which locations to present to the robotic wheelchair, the average numbers of structuring cues per turn are considerable.

Another peculiar property of human-computer and human-robot interaction seems to be the use of deprecatory terms, scolding and challenging (e.g. De Angeli and Brahnam, 2008). In an initial study involving the virtual human Max, which is freely accessible for interaction via typed natural language in a German museum, Kopp (2006) reports 1.6% insults and 1.4% obscene or politically incorrect utterances directed at the artificial agent (Kopp, 2006, p. 106). About three times a day, Max furthermore leaves ‘his’ screen because he judges the interaction with a particular human to be ‘annoying’. In a more elaborate study of interactions with the virtual museum agent, Kopp finds 11% ‘flaming’, comprising verbal abuse and pornographic utterances but also random keystrokes and senseless utterances (Kopp, 2006, p. 109). In contrast, in the data investigated here, such features are very rare. A reason may be that participants in our experiments were never alone with the robot and never had the opportunity to feel unobserved.
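The per-turn rates of structuring cues reported in Table 6.4 amount to a simple token count over transcribed turns. A minimal sketch of such a count (the turns below are invented for illustration, not corpus data):

```python
import re

# Cues counted in the study: first, next, now (see Fischer and Bateman, 2006).
STRUCTURING_CUES = {"first", "next", "now"}

def cue_rate(turns):
    """Mean number of structuring cues per turn (cf. Table 6.4)."""
    counts = []
    for turn in turns:
        # Tokenize crudely; keep apostrophes and '+' for transcript fragments.
        words = re.findall(r"[a-z'+]+", turn.lower())
        counts.append(sum(w in STRUCTURING_CUES for w in words))
    return sum(counts) / len(counts)

# Invented toy dialog:
turns = ["first go straight ahead", "stop", "now turn slightly left",
         "next go to the box"]
print(f"cues per turn: {cue_rate(turns):.2f}")
```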


To sum up, most participants seem to orient to conversational procedures they are familiar with from interaction among humans. In this respect, participants indeed seem to treat computers as social actors. However, the exceptions regarding the timely production of relevant nexts, the amounts of verbal abuse reported for other systems and the interpersonal variation show that participants do not do so mindlessly; this is supported by speakers’ laughter when they use politeness formulas. We can conclude the linguistic description of robot-directed speech by stating that a homogeneous variety ‘robotalk’ could not be identified. There are no reliable conventional register markers in speech to robots, nor characteristic distributions of certain features besides those caused by expected communication problems. With respect to most linguistic domains, RDS has turned out to be rather heterogeneous.


6.2  Functionality

Even more than for child-directed and foreigner-directed speech, the functionality of speakers’ recipient design for automatic speech processing systems, such as robots, is a moving target. Just as the usefulness of certain adaptations for infants and small children depends on their cognitive and linguistic development and their ability to make use of the linguistic input, and just as the usefulness of adaptations for foreigners depends on their language capabilities and the purpose of the interaction, the usefulness of adaptations in speech to computers and robots depends on the systems’ speech processing and reasoning capabilities. These, in turn, depend on the one hand on the system designers’ ideas of the linguistic properties of the speech the suspected users are likely to produce, and on the other on the possibilities and limitations of current speech technology.

Regarding the phonological and prosodic peculiarities described above, we have seen that speakers employ them strategically because they believe these features make their utterances more understandable – which may be true for human communication partners. However, such strategies can be disastrous in communication with an artificial system if it was trained on continuous, natural or even read speech rather than on slow, hyperclear speech, which may even include pauses between the syllables of a word. Accordingly, Levow (1998) reports for English human-computer interaction data that utterances after a communication problem are considerably more difficult to recognize. In particular, the “probability of experiencing a recognition failure after a correct recognition was 16%, but immediately after an incorrect recognition it was 44%, 2.75 times greater” (Levow, 1998, p. 737). Thus, the efficiency of such strategies for increasing understandability depends on whether the speakers’ hypotheses about what makes their language more understandable match their communication partner’s processing capabilities. Syllable lengthening, pauses between words and syllables, strong emphasis and the like constitute a problem if the recognizer has not been trained on such material (cf. Batliner et al., 2003). Here the first option is to use such linguistic material for training. Alternatively, a second speech recognizer, trained particularly on emotional speech and on speech occurring in situations of communicative problems, may be employed in case of recognition failure. This special recognizer may be used only in situations in which problems are detected. This is the solution taken by Levow (1998) and Oviatt et al. (1998), for instance, in the case of repetitions. Batliner et al. (2003) suggest such an extra recognizer for all cases of trouble in communication, especially if the speaker gets angry (see also Nass and Gong, 2004). Irrespective of the usefulness of the speakers’ linguistic adaptations, however, speakers’ ideas about what makes speech more understandable need to be considered because these will affect what speakers are going to produce in situations of (mis-)communication with an automatic speech processing system.

Regarding lexical and grammatical properties, as we have seen above, speakers may resort to technical or abstract vocabulary, use telegraphic style, switch from whole sentences to keywords, and may even produce structures that are more similar to programming languages than to natural language (cf. Krause, 1992). Regarding the lexicon, designers of automatic speech processing systems need to know which lexical items the speech recognizer needs to be trained with and which need to be accounted for in the syntactic and semantic analysis.
The fact that participants seem to be highly aware of the lexical restrictedness of their artificial communication partner and that they understand lexical choice as a possible trouble source may make this problem less pressing; however, participants do not always have appropriate ideas of which other lexical items may be more suitable than those they had chosen initially. Syntactic modifications, especially when speakers produce ungrammatical sentences, furthermore cause considerable problems for most state-of-the-art parsers. On the positive side, regarding the range of constructions used, the fact that speakers believe certain syntactic constructions to be too difficult for a system can be used to restrict the range of constructions implemented and thus to narrow down the search space during parsing. On the more negative side, the parser has to be extended so as to include telegraphic speaking style, or else to allow partial parsing. To sum up, whether speakers’ adaptations to human-robot interaction situations are functional depends on the components of the automatic speech processing system and on the data it has been trained with. Currently, most systems do not deal very well with human linguistic adjustments and simplifications (cf. Thar, 2009; Meddeb and Frenz-Belkin, 2010). Thus, in contrast to adaptations in CDS and partly in FDS, the adaptations in RDS are mostly of restricted functionality.
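The two-recognizer strategy discussed above (Levow, 1998; Batliner et al., 2003) can be pictured as a simple dispatch step: once trouble is detected, route the next utterance to a recognizer trained on hyperarticulated or emotional speech. The sketch below is purely schematic; all names are hypothetical and no real ASR engine is called:

```python
def recognize(audio, trouble_detected, standard_asr, hyperspeech_asr):
    """Pick the recognizer matching the expected speaking style.

    standard_asr:    hypothetical model trained on fluent, continuous speech.
    hyperspeech_asr: hypothetical model trained on slow, hyperclear speech
                     (syllable lengthening, inter-word pauses, strong emphasis).
    """
    if trouble_detected:  # e.g. the previous system turn was a rejection or repair
        return hyperspeech_asr(audio)
    return standard_asr(audio)

# After a misrecognition, users tend to hyperarticulate, so the special
# recognizer takes over for that turn (stand-in lambdas for illustration):
result = recognize(b"...", trouble_detected=True,
                   standard_asr=lambda a: "standard result",
                   hyperspeech_asr=lambda a: "hyperspeech result")
print(result)
```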



6.3  The role of alignment

In this section, we address the role of alignment in human-computer and human-robot interaction, one of the mechanisms that have been proposed to determine interactions between communication partners (Pickering and Garrod, 2004). Previous work in human-computer interaction indicates that alignment may play a considerable role. For instance, Morel (1989) suggests that speakers are as polite with a system as the system is polite with them. Moreover, Amalberti et al. (1993) find that speakers adapt to their communication partners’ utterances. They divided their participants into two groups; one group believed that their partner’s utterances were coming from a computer, while the other group believed these utterances to be produced by another human. Initially, the participants’ linguistic behavior in the two groups was significantly different. However, after several sessions the linguistic properties of the speech produced in the two groups converged, suggesting that the ways in which people talk to artificial (and other) communication partners may be determined by alignment, at least in the long run. On the other hand, Strupka et al. (2016) find people to prosodically disalign with the robots they were interacting with, making their speech increasingly dissimilar over time with respect to intensity, duration, pitch height and pitch range. Branigan et al. (2003) find even more syntactic alignment with a computer than with a human, suggesting that the human-computer situation strengthens the role alignment may play. In a follow-up study, Pearson et al. (2003) even find that 67% of the participants aligned with a dispreferred term when the term was presented by a computer, in contrast to only 15% when the term was presented by another human.
In order to determine the reasons for these findings, Pearson, Hu, Branigan, Pickering and Nass (2006) investigated the effect of speakers’ assumptions about the cognitive capabilities of the computer. Participants were first shown a start-up screen with information about the computer program they would be using throughout the experiment. In the basic condition, the start-up screen stated Basic version with a copyright from 1987 and displayed a fictional computer magazine review stressing the software’s limited features but cheap price and good value for money; in contrast, in the advanced condition, the start-up screen displayed Advanced version: Professional edition, with a recent copyright, and a fictional computer magazine review stressing the software’s high price and its impressive range of features and sophisticated technology. All other aspects of the interactions with the computer were identical (Pearson et al., 2006). The results show that more speakers aligned with the ‘basic’ computer than with the ‘advanced’ computer, indicating that speakers align more when they expect communicative success to be at issue. The idea that alignment is concerned with facilitating communication (cf. also Branigan et al., 2010) is supported by a study by Pearson, Pickering, Branigan, Nass and Hu (2006) in which more alignment was found when speakers believed themselves to be talking to a non-native speaker than when they believed their communication partner to be another native speaker of English (Pearson et al., 2006).

In the studies by Branigan, Pearson and colleagues reported above, research focuses on alignment with either lexical or syntactic features of the partner’s utterances which had previously been tested for their general likelihood of occurrence. This methodological step is necessary because, in order to determine how much people align with each other, it first needs to be established whether they would not have used the lexical item or construction under consideration anyway. For instance, if there is generally a fifty-percent chance that speakers will use a particular word, no conclusions can be drawn if fifty percent of the speakers in the experiments use the term after the communication partner has used it. Therefore, the experiments reported so far are controlled for the vocabulary and structures involved and usually concentrate on a single, rather uncommon feature (see Branigan et al., 2010). While this procedure is very useful to establish that alignment occurs at all and to investigate the conditions under which it occurs, the focus on particular linguistic features obscures speakers’ overall linguistic behavior in human-computer and human-robot dialog. That is, the focus on isolated items whose frequency has been controlled for on independent grounds is not suited to show to what extent speakers align with their communication partners in general, yielding a possibly too narrow view of the amount of adaptation.
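The baseline logic described here can be made concrete with a binomial test: given an independently established baseline probability of a term, one asks how likely the observed uptake rate would be if speakers were unaffected by the partner’s choice. A minimal sketch using only the standard library (the numbers are illustrative, loosely modeled on the 67% vs. 15% contrast in Pearson et al. (2003), not actual study data):

```python
from math import comb

def binomial_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): the chance of at least k speakers
    using the term if each used it independently with baseline rate p0."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

# Illustrative only: baseline probability 0.15 of choosing the dispreferred
# term, 20 participants, 13 of whom align with the computer's term.
p = binomial_upper_tail(13, 20, 0.15)
print(f"P(X >= 13 | p0 = 0.15) = {p:.2e}")
```

A value this small would indicate that the uptake rate cannot be explained by the baseline preference alone.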
For instance, failure to adapt to the particular feature under consideration may be compensated for by alignment with many other features. The approach also disguises the function of alignment in interaction in general. It is obvious that speakers do not take up all linguistic material from their communication partners; although the goal of communication may be the construction of a shared representation of the situation, as Pickering and Garrod (2004) suggest, this is done only ‘for all practical purposes’ (Clark and Schaefer, 1989b). What remains open, therefore, is which features of a computer’s or robot’s output speakers take up, under which circumstances, and why just these features and not others (see Bateman, 2006). For instance, speakers may decide on a particular form because they believe it to be more suitable for the current communication partner, rather than due to alignment with the partner’s utterance.

In the experiment described below, interactions with the artificial communication partner are therefore not constrained at all. Instead of assessing speakers’ preferences for particular linguistic forms on independent grounds, for instance, by determining a particular feature’s general frequency in very large corpora, the procedure here is to analyze speakers’ linguistic choices in interaction with a robot that produces no linguistic behavior and then to elicit a second corpus with verbal robot behavior, in which the effects of the robot’s utterances can be studied globally. Thus, this study proceeds by analyzing the number of linguistic features of the robot’s utterances speakers may pick up in contrast to a baseline corpus in which speakers instruct a non-verbal robot. Consequently, the focus is not on a single linguistic item or construction but on the speakers’ linguistic behavior as a whole.

The data used in the current study are 26 interactions between Sony’s Aibo robot and native speakers of English (see Section 3.2, Human-Robot Interaction Data). Participants’ task was to make Aibo move to particular goal objects, pointed at by the experimenter in a joint attention scenario. Condition 1 comprises thirteen English human-robot dialogs in which the participants’ task is to instruct the robot verbally to move to particular goal objects. The robot’s behavior is created according to a fixed schema, independent of the speakers’ utterances, in a so-called ‘wizard-of-Oz’ scenario; that is, a student assistant behind a screen manipulated the robot’s behavior via remote control and on the basis of a prespecified schema. Participants furthermore received no hints regarding the robot’s capabilities. Based on the findings from Condition 1, a second corpus was elicited. It also comprises thirteen dialogs with native speakers of English. The task and the robot’s behavior were identical to those in the first condition, but in addition to the robot’s nonverbal behavior, verbal output was played according to a fixed schema.
The robot’s utterances were designed on the basis of the following considerations: First, in order to ‘trigger’ (see Nass, 2004) a social relationship, Aibo greets the speaker with the utterance yes, hello, how do you do?; second, Aibo proposes an object-naming strategy, referring to goal and landmark objects directly, for instance, do you want me to go to the object that is in front? (in the baseline corpus such object-naming instructions have been found to be rare); third, Aibo uses an extrinsic reference system, for example, do you mean the object that is south south east of the box?. This serves to identify alignment in the domain of perspective taking directly, since speakers do not use an extrinsic reference system in the first condition. Finally, Aibo makes frequent use of relative clauses, since these do not occur in the non-verbal dialogs either. Besides these four properties, which the robot displays implicitly in its utterances, the utterances were designed not to reveal any information useful for construing a particular mental model of the artificial communication partner. Thus, if the robot signals lack of understanding, it gives no clue as to what may have gone wrong, saying, for instance, I did not understand, or it produces utterances simulating misunderstanding, for instance, a turn of 360 degrees is not useful, assuming (correctly) that nobody would actually suggest that the robot should turn 360 degrees.



Condition 2 thus comprises thirteen English human-robot dialogs which were recorded under exactly the same conditions as in Condition 1, except that the fixed schema of robot behaviors is paired with a fixed schema of robot utterances, both independent of the speakers’ instructions. The data were coded in the following way:

– As evidence for the suspected competence of the robot, the number of structuring cues was determined, in particular first, next and now (see also Fischer and Bateman, 2006).
– As an indicator of the robot’s suspected competence, the numbers of abstract object descriptions, in particular object, container and obstacle, were identified in comparison to concrete ones, in particular cup, box and bowl. Since the word object is used by the robot in Condition 2, its absolute instances are also counted for the examination of alignment.
– Speakers’ emotional involvement in the interaction was measured by means of their use of the interjections oh and ah.
– As an indicator of interpersonal relationships, it was counted how often participants use the robot’s name to get its attention.
– As an indicator of the amount of interactivity with their artificial communication partner, speakers’ use of feedback was analyzed, comprising the feedback signals yes, yeah, okay, good and uh-huh.
– Instances of technical expressions, for example referring to the position of objects or of the robot, were counted, in particular instances of degree, meter, inch, foot and angle.
– Instances of extrinsic spatial references were counted, in particular uses of the cardinal points of the compass, such as north or south.
– Instances of anthropo- and zoomorphization were counted, in particular uses of dog, tail, head and eat.
– As an indicator of social relationship, the number of politeness signals produced was identified, in particular instances of please and thank you.
– The type-token ratio and the mean length of utterance (MLU) were calculated from the number of words, the number of different words and the number of words per utterance.
– The instructional strategy was analyzed in two steps: First, it was determined whether an instruction consisted in the description of a goal object or in the specification of the direction or path along which the robot was to move; second, the sentence mood in which the description was delivered was analyzed. The sentence mood chosen informs us about the relationship between the speakers and their artificial communication partners as well as about the robot’s suspected competence (Halliday and Matthiessen, 2004); in particular, the following categories were distinguished:


  declarative           A032: okay um (1) your first object is to your right,
  imperative            A008: head forward a bit,
  adverbial phrase      A004: (at=quiet)backwards(/a). – (breathing)
  prepositional phrase  A048: to the white cup.
  interrogative         A033: no could (at=lengthening)you,(/a) (2) go fu+, uh further forward, (3)10

– Finally, concerning the prosodic realization, each utterance was analyzed for its intonation contour. We distinguish between falling, rising and level intonation contours.

The results are the following: In the first condition, that is, with the robot displaying only non-verbal behavior, we find an almost consistent use of path-based instructions, that is, without naming a goal object. Accordingly, all instances of spatial instructions in these dialogs are based on the communication partner’s perspective. That is, speakers consistently use the robot’s point of view for their spatial instructions. Instructions are generally very basic, consisting mostly of adverbial phrases or imperative clauses. Altogether, 84.6% of the speakers steer the robot as with a verbal remote control, for example:


(45) A005: um turn left, (breathing) (6) um turn left, (breathing) (6) stop, (1) oops stop. (laughter) forward,

Table 6.5  Differences between the conditions concerning abstract versus concrete nouns

            Aibo 1            Aibo 2
            M      sd         M      sd         F        p
abstract    0.008  (0.031)    0.175  (0.153)    14.876   0.0007
concrete    0.046  (0.138)    0.191  (0.128)     8.328   0.008
In contrast to the first condition, in which speakers mostly used a partner-centered reference system and a very primitive instruction mode by means of path-based descriptions, in the second condition instructions are much more complex, exhibiting significant linguistic differences due to alignment. First, there is alignment of instruction strategy; 84.6% of the speakers consistently use object-based instructions while path-based instructions are only used for ‘finetuning’:

(46) A043: I want the middle, - uh, - plastic box. with the red lid. yeah, the one at the left hand side. (2) yep, and straight ahead. – good, okay, stop.

10. Instruction by means of nominal phrase did not occur in these data.

Chapter 6.  Talking to robots 205

Table 6.5 shows the significant differences in the strategies used in the two situations, as they are apparent in the differing numbers of references to objects: Both concrete and abstract nouns are significantly more common in Condition 2. Correspondingly, an example of lexical alignment is the use of the abstract term object instead of the more concrete terms cup or bowl, which speakers were using in the first condition. Here, 89.2% of the speakers can be found to adapt to the robot's lexical choice of object, for instance:

(47) A044: hello Aibo. - I want you to go, straight ahead, - past the first cup on your left, - and then, make, a right angled turn to your left. - to the f+ second cup. that you come to.
     Aibo: do you want me to go to the object that is in front?
     A044: keep going towards the object that is in front.

Here, the speaker changes her previous referential term cup into object after the robot's utterance. This observation is supported by the quantitative analysis of the occurrence of the term object; the comparison between the two corpora shows that the term object is much more frequent in the second condition, yielding a highly significant difference (see Table 6.6).


Table 6.6  Differences between the conditions concerning the noun object

          Aibo 1 M (sd)    Aibo 2 M (sd)      F        p
object    0.077 (0.277)    12.600 (10.301)    19.103   0.0002

Table 6.7  Differences between the conditions concerning the external reference system

           Aibo 1 M (sd)    Aibo 2 M (sd)    F       p
compass    0.000 (0.000)    0.013 (0.024)    4.021   0.055

Furthermore, we find alignment of the reference system employed by the robot; in a previous experiment (Moratz et al., 2001), we found that speakers may suspect robots to work on the basis of external relationships, especially the four cardinal points of the compass. In Condition 1, none of the participants uses this reference system. However, in Condition 2, in spite of its unintelligibility in the current situation (the recording took place indoors, and there was no clue to what the cardinal directions actually were), as many as 46.2% of the speakers make use of this type of spatial instruction, although it does not make much sense for the speakers themselves (see also Fischer and Wilde, 2005), for example:

(48) Aibo: Do you mean the object that is 30 degrees west of the box? (7) Which object did you mean?
     A033: (4) yes. the object which is 30 degrees west of the blocks. the large glass.

206 Designing Speech for a Recipient

Participants may even use the absolute reference system when communication problems occur, without being prompted by the robot:

(49) A032: I want you to go to (at=lengthening) the (/a) (2) uh blue bowl, that is furthest from you.
     Aibo: alright. (11) should I head towards the blue object?
     A032: the one that is, - north-east.

(50) A046: the red and white container in the middle.
     Aibo: I did not understand.
     A046: go to the objects south-south-east from the container.

Together, 46.2% of the speakers take up the robot's extrinsic reference system. The quantitative comparison almost reaches statistical significance (see Table 6.7). Furthermore, we can find constructional alignment, for instance with respect to relative clauses:

(51) Aibo: do you want me to go to the object that is in front?
     A031: - no, to the object that is to your right.


However, a few turns later, the same speaker uses the relative clause with the overt relative pronoun in response to a communicative problem, that is, strategically rather than automatically:

(52) A031: now move to the glass in front of you
     Aibo: I did not understand.
     A031: – move to the object that's in front of you.

Thus, alignment does not necessarily occur here due to automatic priming but rather as part of a problem-solving strategy. Together, there are significantly more relative clauses in Condition 2 than in Condition 1, as Table 6.8 shows.

Table 6.8  Differences between the conditions concerning relative clauses

              Aibo 1 M (sd)    Aibo 2 M (sd)    F       p
relclauses    0.000 (0.000)    1.667 (2.526)    5.630   0.025
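The condition comparisons in Tables 6.5 through 6.8 each report group means, standard deviations and an F and p value; assuming these are one-way ANOVAs computed over per-speaker normalized frequencies (the text does not name the test explicitly), the computation can be sketched as follows. The numbers below are invented for illustration and are not the study's data.

```python
# One-way ANOVA F statistic for two groups of per-speaker frequencies.
# Illustrative sketch only; the input values are made up, not corpus data.

def f_oneway_two_groups(a, b):
    """Return the F statistic for a one-way ANOVA with two groups."""
    n_a, n_b = len(a), len(b)
    mean_a = sum(a) / n_a
    mean_b = sum(b) / n_b
    grand = (sum(a) + sum(b)) / (n_a + n_b)
    # Between-group sum of squares (df = 1 for two groups)
    ss_between = n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2
    # Within-group sum of squares (df = n_a + n_b - 2)
    ss_within = (sum((x - mean_a) ** 2 for x in a)
                 + sum((x - mean_b) ** 2 for x in b))
    return ss_between / (ss_within / (n_a + n_b - 2))

# Hypothetical per-speaker relative-clause frequencies in two conditions:
cond1 = [0.0, 0.0, 0.0]
cond2 = [1.0, 2.0, 3.0]
F = f_oneway_two_groups(cond1, cond2)
```

With only two groups, this F test is equivalent to a two-sample t test (F = t²), which is why a single F and p value per feature suffices in the tables.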

Finally, we can observe alignment with the greeting at the beginning of each dialog. While only one speaker volunteered a greeting to the robot in Condition 1, in Condition 2 there are nine participants who greet the robot, yielding a marginally significant difference (F(1,25) = 4.058, p = 0.05). To conclude, alignment seems to play a considerable role in these interactions. At the same time, there are numerous other significant differences between the two corpora that cannot be related to the linguistic output of the robot; for instance,


the complexity of constructions generally increases in the second, verbal condition; for instance, we can find topicalization constructions:

(53) A031: now turn to your right, (3) and the object right to your left is where you wanna go.

(54) A032: the correct object will be the first.

Other linguistic choices cannot result from alignment either, since they were not part of the robot's utterances. There are, for instance, significant changes in grammatical mood, such that imperative instructions are significantly less frequent in Condition 2 than in Condition 1 (see Table 6.9).11 Furthermore, we can observe a significant decrease in instances of anthro- and zoomorphization of the robot in Condition 2, yet the robot is called by its name significantly more often in the second condition. There are furthermore significant differences in the prosodic delivery of utterances, in the amount of feedback given and in the number of instructions per number of turns. Moreover, there are fewer instances of interjections in Condition 2 than in Condition 1. Finally, we can observe tendencies towards more technical and more polite terms in Condition 2.


Table 6.9  Differences between the conditions not due to alignment

            Aibo 1 M (sd)    Aibo 2 M (sd)    F        p
oh          1.923 (2.019)    0.000 (0.000)    13.69    0.001
fall        0.225 (0.212)    0.522 (0.242)    11.71    0.002
rise        0.043 (0.047)    0.140 (0.103)     9.675   0.005
anthrop     0.043 (0.062)    0.000 (0.000)     7.306   0.012
ah          0.615 (0.768)    0.067 (0.258)     6.806   0.015
mood:imp    0.796 (0.154)    0.589 (0.266)     6.255   0.019
vocative    0.001 (0.003)    0.034 (0.051)     5.620   0.025
feedback    0.073 (0.082)    0.141 (0.087)     4.452   0.045
instruct    73.00 (35.80)    62.93 (42.71)     3.566   0.070
technical   0.007 (0.027)    0.044 (0.070)     3.157   0.087
polite      0.039 (0.107)    0.106 (0.164)     1.575   0.221
MLU         3.972 (3.396)    4.105 (2.230)     0.015   0.902

11. The abbreviations used in this table and the following are: fall = falling intonation contour; rise = rising intonation contour; anthrop = anthropomorphization; mood:imp = imperative mood; instruct = number of instructions per turn; technical = technical terminology; polite = politeness formula; MLU = mean length of utterance.


If, however, alignment does not cause these linguistic choices in the second condition, they must have other causes, and thus we can assume that the participants' partner models play a decisive role, too. These models may already be shaped by the fact that the robot speaks at all (see also Cha et al., 2014), as is indicated by the rather large number of speakers who employ the object-based instruction strategy even before the robot has suggested it:


(55) Aibo: yes, hello, how do you do? (4) what can I do for you?
     A034: (2) um, – move to the bowl, on the left side? (1)

That is, some speakers initially employ the object-naming strategy directly after the greeting, i.e. before they have been prompted to do so by the robot.

To conclude, speakers take up linguistic material presented by the robot with respect to the instructional strategy taken, the reference system employed, the lexical choices made and the syntactic structures chosen. Thus, alignment in the sense of behavior matching can be demonstrated in our data. In addition, however, a linguistically skillful robot obviously creates a different, more complicated image than a robot that displays only passive linguistic knowledge, leading to different choices in the second condition compared to the first, non-verbal condition.

The effect observed cannot be due to interactional features, for example speakers feeling better understood, since the robot's non-verbal behavior is identical in both situations, and its linguistic behavior generally does not convey successful understanding. That is, there is no more feedback signaling understanding in the second condition than there is in the first. The effect thus has to be due to the mere fact that the robot is using natural language utterances and that these comprise a greeting, reference to objects and relative clauses. This observation is in line with a study by Cha et al. (2014), who find not only that robots that speak at all are judged to be more competent both socially and physically than non-verbal robots, but also that robots that produce small talk are perceived to be more competent and more acceptable than robots that only produce on-topic communication.
However, if some linguistic behavior is definitely not caused by alignment and can rather be attributed to partner modeling, then in those instances in which people do make use of the robot's utterances, it is not possible to ascertain whether this is caused by automatic alignment or whether these instances are strategic choices based on a certain partner model, one which comprises hypotheses about those structures that the robot seems to know because it uses them itself. This latter interpretation, namely that speakers' linguistic choices in our dialogs depend essentially on their partner model, is supported by results reported in Schober (1993) and discussed in Schober (1998), which point in a similar direction; he finds that in the absence of feedback from the partner, speakers


choose a reference system that takes over the interactional load and makes it maximally easy for the potential hearer. The linguistic utterances from the artificial partner in our dialogs then play at least two roles: on the one hand, speakers make use of linguistic properties of the robot's utterances; on the other, the robot's utterances are taken to reveal information about the robot's general linguistic and cognitive capabilities, which speakers make use of when building up their partner models.

In our dialogs, the robot does not give feedback in terms of understanding; in fact, it behaves independently of what the speakers say. Nevertheless, the robot's verbal output has a great effect on the instructional strategies taken, such that speakers obviously regard the robot in the second condition to be much more skillful than in the first condition. Therefore, both the partner model and alignment with the robot's linguistic features play a role in speakers' utterance design. The next sections will be concerned with determining the contributions of these two resources, partner modeling and alignment, and with identifying their relationship.


6.4  The role of partner models

In this section, the question is addressed to what extent speakers' linguistic behavior is determined by their partner models, which they may build up implicitly or explicitly, and what such models may comprise. First of all, we can observe that speakers attend to particular properties of their unfamiliar communication partner. Often, they turn to the experimenter to ask questions about the robot before they begin their instructions, and sometimes they direct their questions at their artificial communication partner itself. Below are some examples:

(56) P075: does he understand English?

(57) A004: do you see the blue bowl there?

(58) S037: he (at=prominent)does(/a) know this routine?

(59) P072: can he see the, does he have as, a a visual sensor?

Sometimes speakers also make side comments that indicate similar considerations as raised in the questions, for instance:

(60) A001: he wags his tail. (1) that means that he sees me.

(61) S048: move to the l+, - move to the other bowl? no, he doesn't know that.


The clarification questions speakers ask robots in order to formulate their utterances can be systematized with regard to different traits of the robot, concerning, for instance, its perceptive and linguistic capabilities or its understanding of the task; see Table 6.10 (see also Fischer and Bateman, 2006). These clarification questions are similar to the insertion sequences identified by Schegloff (1972a) in that they clarify preconditions for producing a spatial instruction. Thus, speakers may be actively engaged in a kind of membership analysis, aiming at creating an appropriate model of the communication partner (see also Clark, 1998b). The question that arises, then, is which properties of, and which types of information provided by, their communication partner speakers make use of. Since speakers generally have little experience in the interaction with artificial communication partners, they can be expected to draw instead on knowledge about other situations that, in their view, are similar to the one encountered. As Nass and Brave (2005) have shown, participants in human-computer interactions make use of features displayed by the system; for instance, a female voice makes participants ascribe female properties to the system. The appearance of the communication partner therefore seems a natural starting point to investigate what information speakers rely on in partner modeling and how they proceed.

Table 6.10  The topics of clarification questions

Domain                    Example
perception                P072: do you see the jar?
functional capabilities   S037: does he have any sensors?
linguistic capabilities   P075: does it understand German better than English?
cognitive capabilities    S037: he knows what he's supposed to do?
                          A032: can I use colors?

6.4.1  Appearance

Research findings indicate that speakers orient to their communication partners' appearance when formulating their utterances. Roche (1989), for instance, suggests that appearance may be a crucial factor in intercultural communication, such that a difference in skin color may be taken as evidence for a lack of common ground. Moreover, besides the results by Nass and colleagues showing that users transfer knowledge from the source domain of human social interaction to human-computer interaction (Reeves and Nass, 1996; Nass and Moon, 2000; Nass and Brave, 2005), other studies have also shown some impact of robots' appearance (e.g. Powers and Kiesler, 2006; DiSalvo et al., 2002; Hegel, 2012). Likewise, Koda and Maes (1996) find differences in the amount of anthropomorphization depending on the amount of human characteristics of the robot. Similarly, Parise et al. (1996) compare the degrees of cooperativeness exhibited by participants in interactions with human-like versus dog-like agents. Although participants found the dog-like agent cuter and more likable, they cooperated more with the human-like agent and even more so with the human interlocutor. Similarly, a study by Scheutz et al. (2016) shows that people ascribe different degrees of moral responsibility to robots depending on their appearance. Additional evidence on the impact of an agent's appearance on the ways people interact with it comes from the study of avatars and figures in virtual environments; for instance, Baylor et al. (2006) investigate different personas as online tutors and find considerable differences in the amount of influence these tutors have on the student depending on their appearance. Similarly, Forlizzi et al. (2007) find effects of agent appearance that reflect common gender stereotypes. Furthermore, studies with virtual humans indicate that appearance has a great effect on participants' behavior (Van Vugt et al., 2010). Moreover, there are effects depending on how similar the artificial interaction partner is to the user (Nass and Lee, 2000; Yee et al., 2007).

These findings suggest that speech directed to robots that differ in appearance will differ considerably. The causes may be subconscious, automatic responses to cues from the robots' appearance, similar to how Nass and Brave (2005) have hypothesized participants to react to voice cues; alternatively, users may choose their linguistic strategies depending on their concepts of their artificial communication partner, which in turn are influenced by the appearance of the robot.
In order to study the influence of the appearance of the robot on the way users talk to it, we investigate the degree to which zoomorphic characteristics have an influence on the way speakers interact with a robot and whether such interactions differ from interactions with artificial communication partners with fully technical appearances. The robots used here resemble a dog, a spider and a car-like object, respectively (see Section 3.2 Human-Robot Interaction Data). The research question is whether speakers make use of experience from previous interactions with dogs, cars or arachnoids and will thus interact differently with the three robots even though the robots' behavior is identical. The current investigation was carried out using the human-robot corpus with the three different robots described in Section 3.2 Human-Robot Interaction Data; for convenience, I summarize here the main features of the corpus:

– 30 English dialogs;
– three different robots: Aibo, Scorpion, Pioneer;
– identical (since scripted) non-verbal behavior of the robots;
– task: to verbally instruct the robot to move to goal objects in a joint attention scenario.



The data were recorded with the same set of speakers, the recordings being, however, several months apart. Participants were therefore not debriefed about the experimental set-up before all three data sets had been elicited and were kept in the belief that they had been interacting with autonomous robots in all three experiments.

As Nass and Moon (2000) and Nass and Brave (2005), for instance, suggest, speakers tend to mindlessly transfer features from the source domain of human social interaction to the interaction with artificial communication partners. If this mindless transfer hypothesis is correct, the appearance of the robots, i.e. as dogs, spiders/scorpions and toy cars, should likewise lead participants to transfer their knowledge about these domains into the interaction: the hypothesis predicts that participants carry characteristics of speech to dogs, scorpions/spiders and toy cars over into their speech to the robots, so that the speech to the three different robots should differ due to mindless transfer from different source domains.

While we do not know much about how people talk to scorpions/spiders and toy cars, if they talk to them at all, there is some research on how people talk to dogs. The mindless transfer hypothesis thus predicts that participants make use of their knowledge about the communication with dogs, such that we should be able to find characteristics of dog-directed speech that we cannot find in the other two corpora. Mitchell (2001) summarizes the research on talking to dogs; he reports many similarities in linguistic choices with speech to children. In particular, he finds caretakers of dogs to use we and let's about as often as in child-directed speech, to use high pitch, and to use present tense verbs only; furthermore, 50% to 80% of the utterances are in imperative mood (Mitchell, 2001, p. 193).
He also finds the dog's name to constitute a considerable factor in human-dog interactions, making up 30.5% of the lexical material of the utterances; moreover, the MLU ranges from 2.4 to 2.9 (Mitchell, 2001, p. 192). Talk about the dog's feelings or mental processes is rare (Mitchell, 2001). Similar features of dog-directed speech are reported by Roberts (2004), with the only difference that the veterinarians in the clinic she investigates use attributions of mental and emotional states to dogs as a strategy to convey to the owner possibly face-threatening information about the way the dog is kept. Based on his results, Mitchell argues that speech to children and speech to dogs are similar with respect to four out of five language functions: attention getting and holding, simplification, showing affection and pretending that the other is a proper conversation partner (Mitchell, 2001). Linguistic features and constructions fulfilling these functions are present in both varieties, while a language tutoring function is peculiar to talk to children only. Thus, Mitchell suggests functional similarities between talk to dogs and talk to children; in contrast, besides possibly attention getting and holding functions, which may be a prerequisite for any kind of dialog, the functions identified for dog- and child-directed speech are quite unlikely to be relevant for interactions with arachnoids and toy cars. So


besides the mindless transfer hypothesis, functional considerations also suggest that the features described for human-dog interactions can be expected to occur in the human-Aibo dialogs but not in any of the other corpora.

The data were recorded, transcribed and analyzed; the analysis was carried out semi-automatically: for most features, automatic search procedures were devised, the results of which were then checked manually. Grammatical mood and relative clauses were manually annotated and then searched automatically for each participant. The following linguistic features were encoded and normalized with respect to the total number of words used by the respective speaker:

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.



– As evidence for the suspected competence of the robot, instances of structuring cues were counted, in particular first, next and now (see also Fischer and Bateman, 2006).
– As an indicator of the robot's suspected competence, the numbers of abstract object descriptions, in particular object, container and obstacle, in comparison to concrete ones, in particular cup, box and bowl, were identified.
– Speakers' emotional involvement in the interaction was measured by counting the occurrences of the interjections oh and ah, partner-oriented signals like hello and displays of interpersonal relationship, like the robot's name or the personal pronoun we.
– As an indicator for speakers' interactivity with their artificial communication partner, their use of feedback was analyzed, comprising the feedback signals yes, yeah, okay, good and uh-huh.
– Furthermore, instances of anthro- and zoomorphization were encoded, in particular uses of dog, scorpion, spider, tail, head and eat.
– Moreover, as an indicator for a social relationship, the number of politeness signals produced, in particular instances of please and thank you, was calculated.
– The type-token ratio and the mean length of utterance (MLU) were calculated from the data.
– Finally, the sentence mood chosen informs us about the relationship between the speakers and their artificial communication partners and about the suspected competence of the robot; in particular, the following categories were distinguished:

  declarative	P073: oh? no it's going right that's good.
  imperative	A008: head forward a bit,
  adverbial phrase	S039: straight ahead,
  prepositional phrase	P078: to the left.
  noun phrase	P075: – right turn,
  interrogative	A011: can you turn right?

  The occurrences of sentence mood were normalized with respect to the number of turns employed by the respective participant.


The results of our corpus study are, given the plausibility of the hypotheses, rather surprising: For the 30 English dialogs, very few significant differences between the corpora can be identified. That is, speakers' linguistic behavior was only marginally influenced by the differences in appearance between the robots. The mindless transfer hypothesis therefore does not correctly predict the participants' linguistic behavior. Table 6.11 shows those features discussed by Mitchell (2001) as characteristic of dog-directed speech, many of which do not exhibit peculiar distributions in our data.

Table 6.11  Suspected features of mindless transfer in the three corpora

             Aibo M (sd)      Scorpion M (sd)   Pioneer M (sd)   F       p
ah           0.615 (0.768)    1.625 (1.302)     0.444 (1.014)    3.474   0.045
imperative   0.796 (0.154)    0.793 (0.147)     0.606 (0.258)    3.138   0.060
vocative     0.001 (0.003)    0.044 (0.123)     0.000 (0.000)    1.381   0.269
we           0.000 (0.000)    0.154 (0.555)     0.250 (0.463)    0.713   0.499
anthrop      0.043 (0.063)    0.048 (0.135)     0.025 (0.054)    0.185   0.832
he           2.538 (4.370)    3.750 (10.60)     2.000 (4.690)    0.154   0.858
MLU          3.972 (3.396)    4.616 (2.807)     5.088 (2.091)    0.403   0.672


Table 6.12  Other differences between the three corpora

             Aibo M (sd)      Scorpion M (sd)   Pioneer M (sd)   F        p
rise         0.043 (0.047)    0.092 (0.081)     0.454 (0.228)    27.151   0.000
level        0.554 (0.212)    0.646 (0.098)     0.236 (0.111)    15.866   0.000
fall         0.226 (0.212)    0.210 (0.180)     0.216 (0.209)     2.251   0.099
structuring  0.099 (0.075)    0.026 (0.038)     0.031 (0.037)     5.502   0.010
type-token   0.235 (0.066)    0.318 (0.068)     0.232 (0.064)     4.802   0.016

Interestingly, there are no differences concerning the degree of anthro- or zoomorphization or the uses of the robot's name, and only marginal ones concerning the amount of imperative clauses. That is, even though speech to dogs seems to be characterized by imperatives, vocatives and uses of the first person plural pronoun we, participants did not use any of these more often in the dialogs with the dog-like Aibo than in the other dialogs. Likewise, given that Aibo is a pet robot, emotional expression and zoomorphization would have been expected to a higher degree if users really mindlessly transferred properties of human-dog communication to human-Aibo interaction; however, a difference could only be found for a single interjection, ah, which, moreover, occurs most often in the dialogs with Scorpion. Finally, there is no more personification of Aibo by means of the third person singular pronoun he than of any other robot.



The differences that can be found concern prosody, some structuring cues, language variability (where the differences with respect to the type-token ratio are due to larger numbers of different words in speech directed at Scorpion) and, as mentioned above, the use of the interjection ah. Significant differences thus concern rather suprasegmental, expressive means; the quantitative analysis of the intonation contours used in the different corpora reveals that speakers use almost three times as many utterances with level intonation contours in the speech to Scorpion as when talking to Pioneer. Rising contours are five to ten times more frequent in speech to Pioneer than in the other corpora (see Table 6.12).

While the differences in intonation contours and structuring cues suggest that participants are influenced by the robots' appearance, the characteristics observable are difficult to interpret. Furthermore, the similarities on the lexical, grammatical and pragmatic levels are pervasive. In principle, we cannot rule out that the lack of differences between the three situations may be due to the relatively small number of participants or to the fact that the same speakers participated in the three experiments, even though there were several months between the different sessions. Still, if significant differences can be found with respect to prosody, structuring cues and the type-token ratio, there seems to be no methodological reason why differences should not also be observable at other linguistic levels if there were any.

To sum up, human-robot interaction constitutes for most speakers a very unusual communication situation, and thus it could have been expected that speakers use all information available to them from the robot to design their utterances for their unfamiliar communication partner.
Accordingly, Nass and Brave (2005) have found that speakers employ the cues given by the voice of human-computer interfaces to infer properties of their communication partner. They propose that people mindlessly transfer knowledge about human social actors to human-computer interfaces (see also Nass and Moon, 2000). Starting from this hypothesis, it could have been expected that users in our human-robot dialogs would transfer many properties of the respective source domains to the verbal interaction. However, this was the case only to a very limited degree, and the results obtained are very difficult to interpret with regard to the mindless transfer hypothesis.

One possible explanation for these findings is that in the dialogs investigated here, speakers are talking to an artificial communication partner more than to anything else. The hypothesis would thus be that instead of relying on cues from the source domains 'doghood', 'scorpionhood' or 'technical devicehood' for the three robots, the speakers in fact treated all three robots mainly as robots. In this case, however, appearance would not play any role at all. Another possibility is that interaction plays a major role. The actual interaction was very similar for the participants in all three corpora since the robots'


behavior was identical in all corpora, and interaction could thus turn out to be more influential than appearance. This is in line with results from Brennan (1991) and Amalberti et al. (1993), who also found interaction to override initial effects of the first impression the computer made. Another indicator for the impact of interactional effects comes from the study of children interacting with robots; as Meltzoff et al. (2010), for example, demonstrate, 18-month-old children understand robots as social actors only if they are being socially interacted with. Thus, while humanoid appearance alone had only little effect on the children recognizing a robot as a social actor, observing a human interacting with the robot made them accept the robot as a social actor. Non-contingent behavior by the robot was also not a sufficient indicator for the children; it was only when they observed the robot engaging in a contingent exchange with the human that they followed its eye gaze and thus responded to it as to a social agent. Thus, at least for the children in these experiments, interaction plays a more convincing role than robot appearance does. We will return to this issue below in Section 6.5 The Role of the Partner's Feedback.


6.4.2  External variables: Gender

If appearance does not account for the broad variation of linguistic behavior towards robots, a better explanation may be provided by external factors, such as speakers' gender. Many previous studies have shown sociolinguistic variables, such as speakers' gender, to influence the linguistic choices speakers make (e.g. Labov, 1972; Trudgill, 1972), and thus it makes sense to investigate to what extent speakers' gender determines the choices speakers make when addressing a robot. Furthermore, gender effects have been found in human-robot interaction, for instance, by Schermerhorn et al. (2008), Takayama and Pantofaru (2009), Kriz et al. (2010) and Mutlu (2011), among many others, though not consistently so (cf. Strupka et al., 2016). In the following, we shall analyze the role of the variable gender in the corpora under consideration.

For most corpora investigated here, no significant differences with respect to gender were found. In the interactions with Rolland, the only gender difference is a statistical tendency for men to greet the robot (r = .50, n.s.). However, some significant gender differences were found in the comparison of Aibo, Scorpion and Pioneer. As Table 6.13 shows,12 there are significant gender differences in these corpora with respect to intonation, such that women use almost three times more utterances with falling intonation than men; this is in contrast to many studies on intonation and gender (e.g. Lakoff, 1972; Fishman, 1978).

12. Female speakers: n = 18, male speakers: n = 12.

Chapter 6.  Talking to robots 217

Table 6.13  Gender differences in the three corpora

Feature        Female M (sd)    Male M (sd)      F         p
slight         0.00 (0.00)      0.83 (1.19)      8.936170  0.005767
fall           0.29 (0.22)      0.11 (0.08)      7.775752  0.009412
feedback       0.14 (0.10)      0.06 (0.05)      7.848319  0.009122
structuring    0.08 (0.07)      0.02 (0.03)      6.756736  0.014738
MLU            5.46 (3.30)      3.00 (0.78)      6.334910  0.017843
little         2.06 (2.67)      0.17 (0.58)      5.772269  0.023156
instructions   0.86 (0.09)      0.92 (0.03)      4.519685  0.042461
modifiers      0.07 (0.09)      0.01 (0.06)      4.107490  0.052313
bit            3.22 (5.43)      0.17 (0.39)      3.743646  0.063170
now            5.22 (5.95)      1.83 (2.79)      3.371234  0.076984
concrete       0.08 (0.16)      0.00 (0.00)      3.245697  0.082390
mood:adv       0.14 (0.12)      0.25 (0.24)      3.029443  0.092747
anthropo       0.06 (0.10)      0.01 (0.02)      2.436212  0.125893
he             4.06 (8.07)      0.67 (1.23)      2.060578  0.162230
interjections  0.09 (0.07)      0.06 (0.05)      1.968610  0.167775
abstract       0.01 (0.03)      0.00 (0.00)      1.029873  0.318878
compass        0.00 (0.00)      0.00 (0.00)      0.658824  0.423822
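The per-feature comparisons reported in Table 6.13 (an F and a p value for a female and a male group) correspond to one-way analyses of variance with two groups. The following sketch computes such an F statistic; the per-speaker scores are invented for illustration and are not the study's data.

```python
# One-way ANOVA F statistic for two groups (female vs. male speakers),
# as a sketch of the per-feature comparisons behind Table 6.13.

def anova_f(group_a, group_b):
    """F statistic for a one-way ANOVA with exactly two groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    grand = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    # between-group sum of squares (df = 1 for two groups)
    ss_between = n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2
    # within-group sum of squares (df = n_a + n_b - 2)
    ss_within = (sum((x - mean_a) ** 2 for x in group_a)
                 + sum((x - mean_b) ** 2 for x in group_b))
    df_within = n_a + n_b - 2
    return (ss_between / 1) / (ss_within / df_within)

# hypothetical per-speaker rates of falling intonation
female = [0.3, 0.25, 0.4, 0.2, 0.35]
male = [0.1, 0.15, 0.05, 0.12]
print(round(anova_f(female, male), 2))
```

With two groups, this F value is equivalent to the square of an independent-samples t statistic, so the same decisions would result from a t-test.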

There is also a difference in modification: men tend to use the lexical item slight, while women tend to use the lexical items little and bit instead, and more modification in general. Women use significantly more structuring cues, such as now, and significantly more feedback signals. Women also produce significantly longer utterances and tend to use more concrete nouns than men. Men, in contrast, produce more (shorter) instructions per turn, which more often tend to be simple instructions by means of adverbs. There are, however, no significant gender differences regarding anthropo- or zoomorphization, personification (as evidenced by the use of the third person singular pronoun he), emotional expression by means of interjections, the use of abstract or technical vocabulary, or the use of the external reference system employed by the robot involving the cardinal points of the compass (see Table 6.13).

To sum up, most human-robot interaction data do not exhibit gender differences. Only in the communication with the pet robots do a few gender differences appear. These differences concern, on the one hand, the complexity level of utterances to the robot, as evidenced by longer MLU, more nouns and more structuring cues used by women, and, on the other, more interactive devices, such as feedback. However, compared to the many indicators of more complex and more interactive speech that do not exhibit significant differences, for instance, the number of relative clauses, the sentence types (grammatical mood), the type-token ratio, the number of abstract nouns or the use of the extrinsic reference system, these differences have to be regarded as minor. Crucially, there are also no statistical


differences regarding anthropomorphization, the use of the third person singular personal pronoun he for the robot, or emotional expression by means of interjections. We can conclude that speakers' gender has some influence on the linguistic choices made. Since, however, the corpora are gender-balanced, gender differences do not account for the differences between the data sets and thus between the different communicative situations.


6.4.3  Personal partner models and linguistic choice

So far, the current investigation has concentrated on external aspects of the robot and on external properties of the speakers, and on their role in explaining the ways human participants talk to their artificial communication partners. Neither of the two external factors investigated (robots' appearance and speakers' gender) has been found to account for the choices speakers make in a given situation, and especially not for the significant interpersonal variation. However, many theoretical approaches discussed in Chapter 2 The Role of the Addressee: Central Notions hold that situation is cognitively or interactionally construed. Since in our data, at least within each corpus, the robots' behavior was experimentally controlled, the role of interactional construal may be limited, yet cognitive construal would suggest that there may be interpersonal differences in the way speakers understand the human-robot interaction situation. Thus, the heterogeneity of the data discussed in Section 6.1 Linguistic Description of the Variety may be due to individual ways of conceptualizing the robot, that is, due to different personal partner models.

Turkle (2006) and Turkle et al. (2006) approach human-robot interaction from a psychoanalytical perspective. They report on analyses of interview data elicited in nursing homes and elementary schools from elderly people and children who had kept either Aibo or My Real Baby,13 a doll-like robot, for several weeks. The interviews show that both children and elderly people vary considerably regarding their relationships with these 'relational artifacts'. For instance, one girl, whom Turkle (2006) describes as living in a chilly social environment, rejects her robot as 'not being the real thing', refusing to pretend that there is emotion where there is none.
Another girl, who is characterized as suffering from a lack of attention from her parents, is described as providing her robot with the nurturing relationship that she is hoping for herself. In contrast, a boy who is seriously ill admires the robustness of the robot and stresses its supernatural capabilities. And finally, an elderly man is described as using the robot to overcome his divorce by discussing

13. See http://www.generation5.org/content/2001/mrb.asp.



potential scenarios of reconciliation with the robot, but also by finding comfort and trust in the relationship. These case studies suggest that children and elderly people establish the kind of relationship with the robot that they themselves are most concerned with. Human-robot relationships are thus highly individual and, similar to a Rorschach test, projections of the self, yet in interaction with the evocative object (see also Turkle, 1982). Turkle et al. (2006) therefore propose that the philosophical positions people develop towards robots depend crucially on their emotional needs. This position predicts very high interpersonal variation in ways of addressing robots, and hardly any generalizability. The ways in which people talk to computers and robots should thus mostly reflect their personal emotional needs and be entirely individual. This hypothesis is supported by a study by Lee et al. (2006), who find that participants' loneliness correlates with their judgment of the social presence of the robot and its overall behavior; furthermore, they find that lonely people provide more positive responses to social agents. Paepcke and Takayama (2010) also find a priori expectations to have a considerable impact on users' experience with, and perception of, robotic pets. However, in their study, they did not work with preconceptions that people themselves carry into the human-robot interactions; instead, they manipulated participants' expectations by means of different advertising texts with which the robots were introduced. In one condition, the advertisement created high expectations; in the other, rather low expectations were targeted. The authors find that depending on whether expectations were high or low, users judged the robots' capabilities significantly differently.
In the current study, we did not manipulate participants' expectations but rather introduced a feature that allows the identification of the preconceptions that participants carry into the interaction. In particular, dialog openings are places at which speakers negotiate their understanding of the current communicative situation and which are thus revealing regarding the degree of sociality participants ascribe to their robotic communication partners. In our dialogs, participants' first utterances may indicate their preconceptions about their artificial communication partner as well as their definition of the situation in response to the robot's dialog opening. In all dialogs, the robot's first utterance was: yes, hello, how do you do?. The participants' different reactions to this very first utterance indicate in particular to what extent they understand the respective human-robot interaction situation as a social situation. For example, in the instructional dialogs with Aibo, speakers reacted very differently to the robot's dialog opening:


(62) Aibo: yes, hello, how do you do?
     A042: I I'm good, and you, (laughter)
     Aibo: which object should I head towards? (1)
     A042: towards (at=slow) the (at=prominent)cup(/a) on the (at=prominent) furthest,(/a)(/a) that's furthest left. (5)

This speaker displays an orientation towards social aspects of the robot’s greeting. In contrast, other speakers do not orient to the social cues offered in the robot’s first utterance:


(63) Aibo: yes, hello, how do you do?
     A030: (2) go straight.
     Aibo: what can I do for you?
     A030: go straight. (7)
     Aibo: do you want me to go to the object that is in front?
     A030: (1) no, go straight.

In order to study the effect of speakers' different concepts of their artificial communication partners, as evidenced by their different dialog openings, their first utterances were analyzed according to a very simple coding scheme. The first utterance of the robot in all corpora that involve verbal robot behavior is always yes, hello, how do you do?. This utterance allows relevant contributions at different levels: Speakers can react to the greeting (for instance, by providing the counter-greeting hello), to the question (by replying fine, for instance), to the polite function of the question (for instance, by replying thank you), or to the pragmatic act as a whole (for instance, by responding with the counter-question and how do you do?). Dialog beginnings were then simply coded for the number of strategies employed by each speaker. For instance, the dialog beginning for R004 was coded as 0:

(64) Rolland: Yes, hello, how do you do?
     R004: (4)
     Rolland: You can take us now to a place you want to name.
     R004: (2) table. (laughter)

R017 was coded as 1 for the minimal reaction to the content of the question ('I'm good'):

(65) Rolland: Yes, hello, how do you do?
     R017: (2) I'm good,
     Rolland: You can take us now to a place you want to name.
     R017: (3) I would like to (at=lengthening) go (/a) to (at=lengthening) the (/a) computer. straight.


R051 was coded as 2, since the speaker answers the question (fine) and recognizes the polite function of the robot's question (thanks):

(66) Rolland: Yes, hello, how do you do?
     R051: (laughter) fine thanks. okay so,
     Rolland: You can take us now to a place you want to name.
     R051: (1) we are now going to – the table.

R022 was coded as 3, since the speaker reacts to the content of the question (I'm fine) as well as to its polite function (thank you), and reciprocates the greeting (how are you?):


(67) Rolland: Yes, hello, how do you do?
     R022: I'm fine thank you, how are you?
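The 0–3 coding of dialog openings just illustrated can be sketched as a small program. The keyword lists below are illustrative heuristics for the four reaction types, not the book's actual annotation procedure, which was carried out manually; crude substring matching is used for brevity.

```python
# Sketch of the 0-3 coding of replies to "yes, hello, how do you do?":
# count how many of the four reaction types a reply realizes.
# Cue lists are hypothetical heuristics, not the study's coding manual.

def code_opening(reply):
    """Number of reaction types (0-3+) found in a first utterance."""
    reply = reply.lower()
    strategies = {
        "counter-greeting": ["hello", "hi there"],
        "answer to question": ["fine", "good", "well"],
        "polite function": ["thank", "thanks"],
        "counter-question": ["how are you", "how do you do"],
    }
    # each strategy counts at most once, regardless of repetitions
    return sum(any(cue in reply for cue in cues)
               for cues in strategies.values())

def group(score):
    """Scores 2-3 are grouped as 'players', 0-1 as 'non-players'."""
    return "player" if score >= 2 else "non-player"

print(code_opening("I'm fine thank you, how are you?"))  # → 3 (R022-style)
```

On the sample openings quoted above, this heuristic reproduces the reported codes: R004's silence scores 0, R017's "I'm good" scores 1, R051's "fine thanks" scores 2, and R022's reply scores 3.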

These different dialog openings reflect considerable differences in how speakers understand the situation in terms of the relationship between the participants; while some speakers conceptualize the situation as social, others do not. A possible interpretation of these findings is that speakers may, or may not, enter a level of joint pretense (Clark, 1999; Fischer, 2006b, 2011). If they accept the computer or robot as a communication partner, taking a playful attitude towards the robot "as if it was human", as is evident from a reply to the greeting at at least two levels, we refer to them as 'the players' below. Thus, all dialog beginnings were coded on a scale from 0 to 3, and dialogs coded as 2 and 3 were grouped together as players, whereas those coded as 0 and 1 are referred to as 'non-players'.

The participants in this study were exchange students from various English-speaking countries at the University of Bremen. None of them had participated in any other human-robot interaction experiment before. The corpus consists of eleven dialogs between native speakers of English, seven female and four male, and a robotic wheelchair. Interactions took about 30–45 minutes. In these dialogs, participants were asked to train the robotic wheelchair regarding particular spatial locations in a flat furnished for a handicapped person. Participants were told that the robotic wheelchair, the Bremen autonomous wheelchair Rolland (Lankenau and Roefer, 2001), would learn the labels for particular locations and would be able to take them to the locations in question after the training phase. Participants had to carry out four different tasks:

Task 1 The first task was to familiarize the robot with useful locations in the flat, such as the kitchen or the dinner table, but also in front of the TV.

Task 2 The second task was to summarize the locations taught to the robot from a static position and then to listen to and correct the robot's summary of what it had learned.


Task 3 The third task was to familiarize the robot with particular locations in the building, such as the room with the photocopying machine.


Task 4 The final task was to instruct the robot verbally to take the user to one of the locations in the building the robot had been familiarized with previously.

The robot's linguistic output in these dialogs differs from that in the dialogs with Aibo reported on above: there, its role was to suggest higher cognitive and linguistic capabilities than speakers had assumed in the dialogs with Aibo without verbal output, and not to provide any other cues as to the functioning of the robot, thus not being helpful at all. In contrast, the robot's linguistic utterances in the dialogs with Rolland were explicitly designed to ensure natural dialogs and to guide users subtly and implicitly into a useful understanding of the tasks.

The different tasks involve activities of different degrees of interactivity. The first task is interactional: the robot elaborates on basically every utterance the participant makes, indicating a high level of understanding, aiming at fluent dialog and involving the user in the grounding process. The second task is asymmetrical: first the speaker has the turn while summarizing the locations in the room, and then the robot summarizes what it has supposedly been taught. This phase is particularly useful for studying to what extent speakers are inclined to provide the robot with feedback and whether they attend to conversational practices like the turn-taking system. The third task is similar to the first, but the robot does not produce any linguistic output (which it announces at the beginning of that phase). This phase is interesting with respect to the re-use of linguistic resources negotiated in the first task and will be discussed below in Section 6.5 The Role of the Partner's Feedback. Finally, the last task consists in instructing the robot to move to a location it had been familiarized with in Task 3. This task is used here because it is comparable to the other robot instruction scenarios we have looked at above, involving Aibo, Scorpion and Pioneer.
Furthermore, if significant correlations between the very first utterance and speakers' linguistic behavior in this last task can be established, even though about half an hour of interaction has occurred in between, this demonstrates that the conceptualization apparent from the dialog opening is indeed a strong predictor of linguistic behavior.

During the teaching and instructing situations, participants had to steer Rolland like a normal electric wheelchair. Only at the end of the last task was the robot meant to move by itself. Yet, since the autonomous, speech-driven version of Rolland was not available at the time of the experiments, Rolland would claim that its batteries had to be recharged first, after which the experimenter would end the interaction. The robot's utterances were played according to a fixed schema by a human wizard behind the scenes. In a pre-study (see Section 6.5 The Role of the Partner's



Feedback below), typical locations, labels and strategies employed by users had been identified, on the basis of which the schema of robot utterances for the elicitation of the data used here was designed for each possible location. The wizard's task was to check to which location the user had moved the wheelchair and to play the presynthesized utterances specified for this location in a prespecified order. Although this procedure seems quite unnatural, the resulting dialogs are in fact quite smooth, and it is important to understand that for the participants themselves, the situations are interactional. The fact that all participants are exposed to the same utterances in the same sequence allows us to compare the dialogs across participants and to study the participants' individual sense-making activities. Possible effects that could limit the comparability of the dialogs, produced by the fact that the same robot utterances occur in response to different kinds of utterances by the participants (see Section 3 Methods), are minimized because the robot's utterances are almost exclusively relevant next utterances, thus providing only minimal feedback to the previous utterance. A questionnaire study involving pre- and post-questionnaires (Andonova, 2006) furthermore revealed that (a) participants had indeed believed they were talking to an automatic speech processing system, (b) they found the interactions pleasant, and (c) their attitude towards robots in general improved significantly during the interactions. The following sample dialog illustrates the relative smoothness of the interactions:

(68) Rolland: Yes, hello, how do you do?
     R047: (1) I'm fine thank you.
     Rolland: (2) you can take us now to a place you want to name.
     R047: alright, – we are going to go to the dining room table, (3)
     Rolland: I'm sorry, which kind of table?
     R047: dining room table. (…)
     Rolland: okay, thank you.
     R047: (6) now we are going to the sofa, (3)
     Rolland: is this part of the living room area?
     R047: – yes . –
     Rolland: which name do you prefer, sofa or living room area?
     R047: (3) the s+, - living room area the c+ the sofa is in the living room area, –
     Rolland: okay, thank you.

Like the dialogs discussed above, the speakers’ utterances in the fourth, instructional task were analyzed concerning a broad spectrum of linguistic features. The variable instructional strategy comprises the grammatical moods (sentence types) declarative, imperative, and interrogative. In addition, speakers also produce instructions without overt verbs; thus, we find adverbial phrases, prepositional phrases and just noun phrases by means of which speakers instruct the robot. In particular, the following instructional strategies are distinguished:


declarative            R013: so we are going out of, - this room,
imperative             R017: turn around and leave the room
adverbial phrase       R017: straight,
prepositional phrase   R048: to the sofa?
noun phrase            R004: table.
interrogative          R043: (2) do you see the plant on the left forty five degrees, (1)

Furthermore, the different instruction strategies per speaker were counted, and utterances in Task 4 were manually annotated for the occurrences of relative clauses:

variability in instruction strategy  The numbers of different instruction strategies per speaker were encoded for the dialogs in Task 4; for instance, if a speaker uses both imperative clauses and adverbs to instruct the robot to move to the target location, the variability measure for this dialog was encoded as 2;

number of relative clauses  The number of relative clauses for each speaker was divided by the number of turns in Task 4; relative clauses have previously been identified as indicators of high ascribed competence (cf. Fischer, 2006b).


In addition, the dialogs in all tasks (Tasks 1–4) were encoded semiautomatically for several linguistic properties (not all features being equally relevant in all tasks, however):

number of turns  First, for each participant the number of turns produced during the task under consideration was determined; the number of turns tells us about speakers' linguistic effort spent on the instruction and serves as the basis for the calculations of the other measures;

number of structuring cues  Structuring cues comprise implicit structuring devices like discourse particles (for instance, now, okay or so) and explicit structuring cues (for instance, first of all, the next step or and then); the individual tokens were counted for each speaker and, since their occurrence is relevant for the relationship between turns rather than words, divided by the number of turns;

technical terms  Instances of technical terms, such as angle, degree, meter, were counted and divided by the number of turns;

number of politeness formulas  Politeness formulas, such as please and thank you, were counted for each speaker and divided by the number of turns of each speaker;

number of feedback signals  As an indicator of speakers' interactivity with their artificial communication partner in Task 2, their use of feedback signals was analyzed, in particular, the numbers of occurrences of the feedback signals yes, yeah, okay, good and uh-huh per speaker;

personal pronouns  As indicators of interactivity with the robot, instances of personal pronouns, especially I, you and we, were counted, as well as the interactive use of let's; furthermore, pronominal reference by means of it and this was counted;


instances of vocatives  The instances of the robot's name were counted as an indicator of interactivity;

number of prosodic peculiarities  As for all dialogs in this study, the numbers of prosodic strategies to enhance the understandability of one's own speech were counted and divided by the number of utterances;

amount of off-talk  The number of utterances addressed to the experimenter was counted as a rather indirect indicator of the amount of attention the participant pays to the interaction with the robot, in relation to the attention paid either to the experimenter or to the steering process;

instances of alignment  Instances of re-use of constructions repeatedly used by the robot, in particular we are going and we are in front of, were counted;

dialog beginnings  Finally, dialog beginnings were encoded as described above.


A Pearson Product Moment Correlation analysis of the data shows that the different conceptions of the human-robot situation as either social or non-social correlate significantly with the choice of a broad spectrum of different linguistic features. Let us first inspect general correlations between dialog opening and linguistic choice holding across the four different tasks (Table 6.14 reports correlations greater than .20).14

Table 6.14  Correlations >.20 between dialog openings and linguistic behavior across all tasks

Feature               r
speaker's gender      0.500298
number of turns       0.256090
structuring cues      0.213899
politeness formulas   0.538205
off-talk              0.396937
technical terms       0.524626
you                   0.615042*
I                     0.235065
vocatives             0.214281
discourse particles   0.341137

14. * = p < .05, n = 11.

For the dialogs across tasks, the analysis reveals that speakers' gender is correlated with the dialog beginning: r = 0.50, which means that in these data, the male participants were more likely to greet the robot in a social way than the female participants. This is in accordance with findings by Schermerhorn et al. (2008), who find men more likely to anthropomorphize robots. The data furthermore show that


users who attend to the social aspects of the robot's greeting involve the robot more (as is evident from the numbers of personal pronouns, politeness formulas and instances of discourse particles and vocatives), and that they ascribe more cognitive competence to it, as is apparent from the higher numbers of structuring cues and technical vocabulary.

The relationship between the dialog beginnings and the linguistic choices made throughout the interactions was then analyzed for the last task, in which participants had to instruct the robot to take them to a particular location. This last phase is particularly interesting because of the correlation between dialog beginning and grammatical mood (sentence type, see Table 6.15).15 In the instructional Task 4 at the end of the interactions, the use of declarative mood correlates significantly with the dialog beginning. At the same time, the imperative and especially instructions by means of adverbs are negatively correlated with dialog beginnings. In general, speakers who understand the robot as a social communication partner, as apparent from their dialog openings, tend to use more declarative sentences and slightly more variable instructions, including more instructions by means of interrogatives and nouns, and fewer of the simple instructions by means of adverbs and imperatives that we have seen in the instructions of Aibo, Scorpion and Pioneer.


Table 6.15  Correlations between dialog openings and linguistic behavior in Task 4

Feature               r
imperative            −0.087162
declarative           −0.729071*
PP                    −0.034463
adverbial             −0.346410
interrogative         0.288675
NP                    0.288675
variability           0.149071
vocative              0.252472
you                   0.406949
we                    0.453583
technical terms       0.403786
discourse particles   0.341137
relative clauses      0.182020
prosody: prominent    0.316877
prosody: lengthening  0.217104
prosody: loudness     0.288675

15. Abbreviations: PP = instruction by means of prepositional phrase, NP = instruction by means of a noun phrase.



Moreover, in the fourth task, the more social aspects are attended to in the dialog beginning, the more often participants call the robot by its name, address it by means of the second person singular pronoun, or refer to themselves in the autonomous wheelchair by means of we. Furthermore, there is a tendency towards more structuring cues, more technical terms and more relative clauses, indicating a higher suspected capability of the robot. Finally, speakers who attend to more social aspects in the dialog beginnings also tend to use more of the prosodic features identified above as relevant for solving communication problems.

To conclude, the analysis presented has shown that the variability in linguistic choice corresponds to a considerable extent to participants' differing conceptualizations of their communication partner as social or non-social, as measured by the different dialog beginnings. Even though the correlations often do not reach significance, which may be due to the small number of participants (n = 11), all features pointing to interactivity, complexity and politeness, that is, features indicating that the speaker takes the robot as a serious communication partner, are positively correlated with the dialog beginning, and even in Task 4, that is, about 30–45 minutes after the initial greeting, there are still relatively high statistical correlations. The next section addresses to what extent the robot's linguistic and non-linguistic behavior influences the way it is talked to and to what extent its feedback determines the partner models people build up of the robot.

6.5  The role of the partner's feedback

In this section, we discuss whether and how the linguistic output of the robot influences the way it is being talked to. In Section 6.3, we already investigated to what extent participants' linguistic choices were influenced by the fact that the robot was speaking at all and by the linguistic features of the robot's utterances. There we found that participants align with the linguistic features of the robot's utterances and that, in addition, they make numerous other adjustments, such that instructions become considerably more complex. The study presented here not only involves a different robot, the autonomous wheelchair Rolland we encountered in the previous section, but also yields slightly different results with respect to the role of linguistic feedback from the robot, narrowing down how interaction influences participants' linguistic choices in HRI. The data considered here consist of twenty dialogs between native speakers of English and the Bremen autonomous wheelchair Rolland. The users' task was to teach the wheelchair locations and labels for these locations in order to augment its internal map. Here we consider both conditions, one without verbal robot output


(Condition 1, n = 9) and one with speech (Condition 2, n = 11), which served as the basis for the analyses in the previous section. Compared to our study of the role of robot output in Section 6.3 The Role of Alignment, the major finding is that, in contrast to the increase in the variability of linguistic choices in the dialogs with Aibo, in this set of dialogs linguistic variability actually decreases with the presentation of linguistic utterances from the robot. In particular, on the basis of the robot's linguistic output, speakers seem to have had a much clearer image of both the task and the capabilities of the robot. This shows, for instance, in the fact that utterances such as the following occur only in the corpus without verbal robot output, not in the corpus in which the robot produces verbal output:


(69) R037: this is a nice TV, I don’t know if it is colour or black or white, (1) okay here’s a nice little colour TV? (1) when you don’t have anything to do at work, you can switch it (at=lengthening)on(/a) and watch all the soap operas. – that are on in a day time. - - and next to that is a little, um lamp, with, looks like piggies, (1) don’t have my glasses on, or little dogs, (3) and, - that will give you some light,

The example exhibits much off-talk as well as attributions of characteristics to the robotic addressee which are not really applicable to a robotic wheelchair, such as watching soap operas after work. Given the task of instructing the robot so as to augment its internal map of the environment for future reference, the granularity level of the user's descriptions is also slightly problematic for an automatic speech processing system. The number of similarly problematic examples decreased considerably in the second condition, in which the robot prompted the users into producing relevant descriptions. Moreover, the linguistic variability in the lexical expressions chosen also decreases. Thus, in Condition 1 we find different names for the objects in the room, for instance dinner table, dining table, supper table, eating table, table, table with plates, my table, table with the fern. In addition, a variety of different syntactic constructions can be observed, for example, the imperative: drive to the desk; the declarative with a first person singular personal pronoun: I'm driving to the coffee table; first person plural declarative we-constructions: we are driving right now, (1) to the TV set; the use of modal verbs: now we can go to the table; or no verb at all: from the desk to the computer table one meter back. The following examples illustrate the spectrum of different kinds of instructions participants produced in the first, non-verbal condition with Rolland:

Chapter 6.  Talking to robots 229

(70) R030: so I want to go over to the sofa. so go right, (3) right okay. - so, go over to the direct (…)? – so, – (hnoise) (at=lengthening) and (/a), – (at=lengthening) um(/a), I want you to go to um, no

(71) R026: stove (2) a bit, - (…) (2) move to the refrigerator? - (hnoise) (4) this is the refrigerator? (9) to the right? - and here’s the next wall? (2) and here is the computer?

(72) R036: okay, turning around to this, - right, – and going, back. (3) (at=quiet) a little bit more (/a), (4) (hnoise), – okay, going back(at=lengthening)wards(/a)? (1) okay

(73) R031: please, (2) please go forward, (1) two feet and then, (1) make a right turn – go straight ahead. - and stop, (1) in front of the, (1) the coffee table. (41) now we go, - go backwards about, five feet? (10) now we’ll turn to the right?

(74) R033: tree, (28) TV, (8) (breathing) (15) table, (13) exit one,

(75) R035: take me to the (at=lengthening)stove(/a), (12) ‘kay, (5) (at=slow)now(/a) take me to the (at=prominent)plant(/a). (11) okay, thank you, (noise) (9) now please take me (at=lengthening) to the (/a), - (at=prominent) television(/a)


(76) R037: okay dear, Rolland. this is the refrigerator from Mr. Bach’s room. (3) uhm, (breathing) (2) this is the computer, – I don’t know who it belongs to, (22) this is a little TV, I don’t know is it colour or black or white, (2)

(77) R039: uhm, (hnoise) let’s go to the (at=prominent) tea-table (/a) first, tea-table is on the left, (2) this is a tea-table up here.

Regarding the spatial instructional strategies taken, several different options are likewise pursued by the speakers, for example, path descriptions: a sharp curve to the right, left, right ahead towards the chair, stop, one two three four five seconds, stop; goal object naming: a set up dining table, a group of seats; or even functional descriptions: I’m driving to the meal now or watering the plants! backwards, left, straight ahead. This variety of choices is much reduced in the condition with verbal output, compare:

(78) Robot: Yes, hello, how do you do?
R016: (1) (hnoise) (3) oh how are you,
Robot: you can take us now to a place you want to name
R016: – (breathing) (2) take me, – (at=lengthening) to the (/a) computer please,
Robot: (36) where are we going to?
R016: (4) we’re going to the (at=prominent) stove (/a) (3) or, - hot plate.
Robot: is this part of the kitchen?
R016: this is the (at=prominent) hot (/a) plate

and:

230 Designing Speech for a Recipient

(79) Robot: Yes, hello, how do you do?
R047: (1) I’m fine thank you
Robot: you can take us now to a place you want to name.
R047: alright, – we are going to go to the dining room table,
Robot: I’m sorry, which kind of table?
R047: (at=prominent)di(/a)ning room table

There is also alignment at the discourse level; in the verbal interactions with Rolland, the robot’s third utterance is where are we going to?. This utterance was chosen on the basis of our finding from Condition 1 that the we-construction was possible, but not very frequent. In Condition 2, many speakers align with this construction, and interestingly not only once, in sequential position after the robot’s utterance, but throughout, as a means to mark major discourse boundaries, for instance:


(80) Robot: where are we going to?
R013: we are going over to the coffee table.
(14 turns later)
R013: and now we will go to the table.
(9 turns later)
R013: and now we will go to the couch.
(6 turns later)
R013: and now the easy chair.
(27 turns later)
R013: and now we will go to my desk.

In the meantime, the robot had not used the construction again. Thus, the speaker stored properties of the construction (while changing tense and aspect) and re-used it as a means to announce new locations, which mark major discourse boundaries in the current scenario. Given the noticeable differences between participants’ linguistic behavior in the two scenarios, we can now address in detail what role the robot’s utterances play, by analyzing which linguistic features of its utterances have which kinds of effects. The robot’s utterances were especially designed to provide users with implicit feedback, as is common in conversation among humans (Clark and Schaefer, 1989a; Heritage, 2007, p. 257). As we have discussed in Chapter 2 The Role of the Addressee: Central Notions, according to Clark and Schaefer (1989b) speakers have different methods for giving feedback at their disposal, for instance, continued attention, initiation of a relevant next contribution, acknowledgment, demonstration, and verbatim display. Current automatic speech processing systems are quite restricted with respect to prosody synthesis, and since feedback signals such as uh-huh have been shown to be extremely dependent in their interpretation on the prosodic contours with which they are delivered (Gardner, 2001), providing feedback by means of feedback signals (acknowledgments) is not suitable in a scenario with presynthesized robot utterances. Likewise, our robot was not able to provide feedback by means of demonstration. Consequently, the range of possible means for producing robot feedback is more restricted than in ordinary conversation.


For this corpus, we designed several ‘relevant next contributions’, for instance: is this (e.g. the sofa) part of the living room area? and which name do you prefer, stove or kitchen? and is this where you want to be to open it? (the refrigerator). Furthermore, several clarification questions were devised to signal partial acceptance of the previous utterance, for instance: I’m sorry, which kind of table? or did you say computer?. There are furthermore clarification questions that were intended to signal successful linguistic processing yet uncertainty about the reference, for instance: is it the object in the corner? or am I in front of it? and just to get this right: I am in front of it, is that right?. The latter two utterances were structured according to a spatial perspective which was relatively frequent in Condition 1, for instance in utterances such as “we are left of the photocopying room.” Since we were also interested in determining whether users would pick up and align with the robot’s utterances, we used these utterances to test whether users would make use of the egocentric reference system they suggest. The next utterance would then be oh okay, I see which object you mean to ensure that the user understood that the reason for the robot’s clarification question was a failure to recognize the object in question. Finally, there were real acknowledgments, such as okay, thank you. The robot utterances used are the following:

– Introductory utterances
yes, hello, how do you do?
where are we going to?
okay, where are we going to next?

– Relevant nexts
is this part of the kitchen?
is this part of the living room area?
which name do you prefer, stove or kitchen?
which name do you prefer, sofa or living room area?
is this the place to watch TV or to switch it on or off?
is this where you want to be to open it?

– Explicit acknowledgments
okay, thank you
oh okay, I see which object you mean
ah, the dining table

– Clarification questions signaling partial understanding
did you say computer?
is this the name? the armchair? the computer?
did you say coffee table?
I’m sorry, which kind of table?

– Questions about object identification signaling understanding
am I in front of it?
just to get this right: I am in front of it, is that right?
is it the object in the corner?

– Clarification questions signaling little understanding
I’m sorry?
the what?
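For designers who want to reuse this inventory, the presynthesized utterances can be organized by the grounding function they were designed to serve. The following is an illustrative sketch, not the study’s implementation; the category names and the selection helper are our own invention.

```python
import random

# The robot's canned utterances, keyed by hypothetical category labels
# corresponding to the grounding functions listed above.
FEEDBACK_INVENTORY = {
    "introductory": [
        "yes, hello, how do you do?",
        "where are we going to?",
        "okay, where are we going to next?",
    ],
    "relevant_next": [
        "is this part of the kitchen?",
        "which name do you prefer, stove or kitchen?",
        "is this where you want to be to open it?",
    ],
    "explicit_acknowledgment": [
        "okay, thank you",
        "oh okay, I see which object you mean",
    ],
    "clarification_partial": [
        "did you say computer?",
        "I'm sorry, which kind of table?",
    ],
    "object_identification": [
        "am I in front of it?",
        "is it the object in the corner?",
    ],
    "clarification_minimal": [
        "I'm sorry?",
        "the what?",
    ],
}

def pick_feedback(category, rng=random):
    """Pick one presynthesized utterance for the given feedback category."""
    return rng.choice(FEEDBACK_INVENTORY[category])

print(pick_feedback("explicit_acknowledgment"))
```

A dialog manager could then choose the category based on recognition confidence, leaving the exact wording to the inventory.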


A qualitative analysis of the dialogs allows us to investigate how speakers react to these different kinds of feedback. To begin with, consider the following dialog:

(81) R013: (1) and now we will go to the couch.
Rolland: (15) is this part of the living room area?
R013: yes,
Rolland: (2) which name do you prefer, couch or living room area?
R013: (1) couch.
Rolland: (1) okay, thank you.
R013: (1) (at=“slow”) and now, (/a) – the, (1) the easy chair.
Rolland: (3) I’m sorry?
R013: – the (at=“loud”) chair, (/a)
Rolland: (7) am I in front of it?
R013: – yes. –
Rolland: oh okay, I see which object you mean.
R013: (3) (at=“slow”) and, – now, (/a) – to the, coffee pot.

The example illustrates that, for instance, the robot’s utterances okay, thank you and oh okay, I see which object you mean are taken as signals of successful grounding, since the speaker continues with the next instruction after them. In contrast, the clarification question I’m sorry? is taken as indicating a problem with speech recognition: the speaker produces the next utterance louder, restricting himself to the head noun of the noun phrase used previously. Finally, am I in front of it? is treated as a relevant next, as expected: the speaker answers the question, taking the robot’s utterance as an indicator that his or her previous utterance is sufficiently grounded. Throughout the corpus, speakers’ reactions to these utterances are consistently like the uses in the example above: okay, thank you and oh okay, I see which object you mean are taken as signals of successful grounding, I’m sorry? is taken as an indicator of a recognition problem, and am I in front of it? is treated as a relevant next. In their responses, speakers do not differentiate between possibly different degrees of grounding (a verbatim display such as a repetition should serve as a better indicator of understanding than an acknowledgment or a relevant next). This is consistent with findings by Brown-Schmidt (2012) that revealed no significant differences between the effects of different types of feedback


on any depth of grounding. This may be due to the fact that grounding occurs “well enough for current purposes” (Clark and Schaefer, 1989a, p. 290), and there is no reason to doubt that the information exchanged is sufficiently grounded. With respect to the last utterance, further interesting effects become apparent; consider:

(82) R017: an’ now to the table, (5) an’ straight, (3)
Rolland: I’m sorry, which kind of table?
R017: (1) a small, – table, (1) that has a cup, (3)
Rolland: is this part of the living room area?
R017: (1) uh, yes. (4)
Rolland: am I in front of it?
R017: – yes, (1)
Rolland: oh okay, I see which object you mean.
R017: (1) and now the couch, (7) now I’m in front of it. (2)


(83) Rolland: Where are we going to?
R022: (3) we are (at=slow)going to(/a) the writing desk.
Rolland: (12) Am I in front of it?
R022: - not yet, (9) now you’re in fron’ of it. (1)
Rolland: okay, thank you.
R022: – you’re welcome, (4)

(84) R043: we are going into the (at=prominent)li(/a)ving room. (4) do you see the television? the television is in front of you. (4) the television is in the yellow on top of the yellow uh - cupboard, (1)

These examples show that speakers not only interpret the robot’s clarification question as feedback on the acceptance of the presentation of the name of the location and as an indicator of a problem in object recognition, but also as an indirect speech act asking the user to position the wheelchair in front of objects. Six out of the eleven speakers interpret the robot’s utterance as a statement that it needs to be in front of an object in order to perceive it. Altogether there are twelve such instances in the dialogs. Thus, the robot’s feedback utterances and clarification questions were taken up by the participants as indicators of the nature of the robot; that is, these utterances contributed (unintended by the dialog designer) to the users’ partner models. This is in line with results by Schober (2009) on spatial capabilities; Schober demonstrates that speakers with high spatial capabilities use the very first utterance of their communication partners to judge those partners’ spatial capabilities and then take over the load of producing highly adjusted spatial instructions themselves, making it as easy as possible for their communication partners. This provides further evidence for the concept of ‘least collaborative effort’ as suggested by Clark and Wilkes-Gibbs (1986).
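The consistent reaction patterns found in the corpus can be summarized as a simple lookup from the robot’s feedback utterance to the user move it typically occasions. This is a purely illustrative sketch of the qualitative finding, not code from the study; the labels for the user moves are our own.

```python
# Hypothetical summary of the observed reaction patterns: which user move
# each robot feedback utterance consistently occasioned in the corpus.
REACTION_PATTERNS = {
    "okay, thank you":                      "proceed to next instruction",
    "oh okay, I see which object you mean": "proceed to next instruction",
    "I'm sorry?":                           "repeat louder, reduced to head noun",
    "am I in front of it?":                 "answer the question",
}

def expected_user_move(robot_utterance):
    """Look up the reaction the corpus analysis leads us to expect."""
    return REACTION_PATTERNS.get(robot_utterance, "unknown feedback type")

print(expected_user_move("am I in front of it?"))
```

Such a table makes explicit that speakers treat the two acknowledgments identically, i.e. without differentiating degrees of grounding.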


What can thus be observed throughout these dialogs is that, as in the human-Aibo dialogs in Section 6.3 The Role of Alignment, speakers take the robot’s utterances into account on a local basis, but they also make choices that they expect to be understandable for the robot on the basis of their much more concrete model of its capabilities due to its linguistic output. Thus, the robot’s feedback serves both locally, in shaping the participant’s speech, and globally, in a kind of membership analysis. We can conclude that linguistic interaction both enriches and constrains variability: If the partner model is too basic, for instance because of a simple task or because of low expectations due to a lack of verbal feedback from the robot, there is no variation, and participants resort to very simple commands. If, however, the robot’s speech capabilities evoke a more elaborate, yet highly vague partner model, as in the dialogs with Aibo, this leads to high variability. That is, if the robot’s utterances do not inform the participant implicitly about the robot’s capabilities, participants make up their own theories of their communication partner, which may differ both from each other and from the dialog designers’ hypotheses about what users may produce. In the absence of concrete evidence of the robot’s capabilities, people may bring in their own ideas and expectations, and especially the latter lead to considerable disappointment when they are not met (cf. Paepcke and Takayama, 2010; Cha et al., 2014). If, in contrast, the partner model matches the robot’s cognitive capabilities appropriately, participants’ speech is variable within reasonable limits. The very fact that a robot produces (not just understands) natural language utterances may thus change the partner model; in addition, people make use of their partner’s feedback in order to build up increasingly accurate partner models.
In Fischer and Saunders (2012), we show how participants adapt to those linguistic features that are produced by the developing robot, and that otherwise their speech to the robot is characterized by immense variability. In these experiments, eight participants interacted with a humanoid robot over five sessions, while the robot was being trained on each participant’s utterances between sessions. It then produced more and more of the key words it had learned from each participant over the course of the five sessions. Participants changed their feedback behavior over time (when the robot was not producing speech itself in session 1, there was very little feedback compared to later sessions), as well as the amount of speech produced (they talked more when they had evidence that the robot was indeed learning); they used fewer attention-getting devices when they had evidence that the robot was in fact learning and hence attending, and they reduced the complexity of their utterances and shortened their mean length of utterance over time. Nevertheless, for all other features of their utterances, for which they did not receive direct evidence with respect to how useful or appropriate they were, especially those concerning the interpersonal relationship with the robot, their behavior did not change, leading to considerable interpersonal variation. Thus, participants took the robot’s behavior


as an indicator of its capabilities and adjusted their instructions according to their (very similar) ideas of what the situation was all about, yet their behavior did not change in areas for which they did not receive feedback. The adjustments consequently concern only restricted areas of people’s tutoring, yet they are not simple reflexes to the robot’s feedback. Similarly, in a set of experiments with the humanoid robot iCub, Fischer, Lohan, Saunders, Nehaniv, Wrede and Rohlfing (2013) and Fischer et al. (2014) demonstrate that nonverbal responses from the robot initiate global changes in the human tutor’s utterances depending on their timing; if they occur in coordination with the tutor’s social behaviors, i.e. in response to the tutor’s eye gaze and demonstrating behavior, tutors understand the robot as a more competent communication partner, such that they attempt to establish understanding more seriously by reducing the complexity of their utterances and by increasing the interactivity of their instructions (Fischer et al., 2013). In contrast, the same behaviors do not have this effect when they are not contingent on the tutor’s social behavior. Thus, the timing of the robot’s nonverbal feedback serves the same function that Schegloff (1982) has suggested for continuers: By way of their placement, continuers signal that the listener understands the current state of talk. The robot’s gaze behavior can be argued to fulfill the same function if it is coordinated with the tutor’s social behavior. Thus, here the robot’s feedback serves global functions that concern the tutor’s partner model.
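The notion of contingent timing can be operationalized as a simple temporal window check over an event log. The following sketch is purely illustrative and not the studies’ analysis code; the two-second window and the event times are invented assumptions.

```python
# Assumed threshold for treating a robot response as contingent on a tutor cue;
# the value 2.0 s is a hypothetical choice, not taken from the studies.
CONTINGENCY_WINDOW = 2.0

def is_contingent(cue_time, response_time, window=CONTINGENCY_WINDOW):
    """A response is contingent if it follows the cue within the window."""
    return 0.0 <= response_time - cue_time <= window

# Invented example: tutor gazes at the robot at t = 10.0 s
print(is_contingent(10.0, 11.2))   # robot looks back 1.2 s later: contingent
print(is_contingent(10.0, 16.0))   # robot reacts 6 s later: not contingent
```

Under this operationalization, the same robot behavior counts as a continuer-like feedback signal only when its placement coordinates with the tutor’s social conduct.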
Fischer, Lohan, Nehaniv and Lehmann (2013) investigate the relationship between implicit, nonverbal robot feedback and explicit signs of understanding, using a robot that either does or does not respond contingently to the tutor’s social cues and that produces the words it has learned from the tutor in previous sessions (thus combining the research questions from Fischer and Saunders (2012) and Fischer et al. (2013)). Their results indicate that the timing of the robot’s feedback signals has considerably more impact than the explicit signs of understanding; in particular, only those who were exposed to the socially timed robot behaviors took the robot’s learning success into account by adjusting their tutoring strategies, if at all (Fischer et al., 2013). To sum up, people take their partners’ feedback into account continuously and online. For this, they may make use of explicit information about the partner, such as a suspected preference to be in front of an object in order to be able to perceive it, or of implicit properties of their partners’ responses, such as the timing of their nonverbal reactions (Fischer et al., 2013) or the intonation contours of their feedback signals (cf. Gardner, 2001). Implicit, nonverbal signals seem to play at least as big a role as explicit feedback (cf. also Pentland, 2010) in influencing the partner models speakers build up. People may furthermore engage in active membership analysis (Schegloff, 1972a) in order to build up a partner model that allows


them to formulate their utterances. Importantly, however, the partner’s feedback is unlikely to be responded to directly, in the sense that the speaker’s formulation would be a direct, unmediated response to the partner’s previous action; instead, feedback is taken to enrich and update the speaker’s partner model, since in all of these experiments the responses in speakers’ behavior have been found to concern functional configurations of features, rather than being local, immediate adjustments based on the partner’s actions (cf. Fischer et al., 2014).


6.6  The relationship between partner modeling, alignment and feedback

So far we have seen that the speakers’ personal partner models, alignment and the robot’s linguistic output may constitute important factors in speakers’ linguistic choices. In this section, we address the question of how these different resources and strategies interact with each other. Regarding partner models, we have seen that the robot’s appearance seems to have only a limited effect. In particular, participants seem to rely only marginally on the robot’s resemblance to more familiar agents or objects to build up a partner model. Instead, participants’ initial expectations about what it is like to talk to a robot seem to play a more prominent role; in particular, the participants’ understanding of the human-robot interaction situation as social or non-social, as evident from their first utterances, was found to have a considerable influence on the linguistic features chosen later. That is, participants’ partner models may differ considerably with respect to the amount of sociality construed, which has a considerable impact on users’ linguistic choices. In particular, in Section 6.4.3 Personal Partner Models and Linguistic Choice we have seen that the way participants conceptualize the human-robot situation at the beginning of the dialogs has a considerable impact on their linguistic choices throughout the dialog. This is consistent with several other studies on user expectations; for instance, Paepcke and Takayama (2010) find that participants whose expectations were high were all the more disappointed if the robot did not meet these expectations, and they evaluated the robot more negatively than participants whose expectations had been low from the beginning. Similarly, Pearson et al. (2006) find significantly more alignment with a ‘basic’ computer than with an ‘advanced’ computer, all other things being equal.
Furthermore, we found that while appearance does not contribute to evoking particular expectations and thus to the constitution of particular partner models, the different tasks involved in the different human-robot interactions investigated turned out to be more influential: While Aibo, Scorpion and Pioneer look different, they all performed the same, very basic, task, namely to move to objects on


the floor in front of the participant. The comparison of the dialogs with the nonverbal Aibo, Scorpion and Pioneer in the spatial instruction task showed that the three robots, irrespective of their appearance, were addressed with similarly basic instructions. In contrast, non-verbal Rolland was confronted with highly variable, quite sophisticated utterances (at least by some participants). This may be due to the much more sophisticated tasks in the dialogs with the automatic wheelchair. That is, Rolland was introduced as a robot that can learn the names of locations and can take the user to those locations after the training session. Thus, even though Rolland was not even capable of autonomous action (and in fact was just an ordinary electric wheelchair), the task of teaching it about the environment through natural language utterances nevertheless led the participants to produce much more sophisticated utterances than the task of making the robot move to objects in the same room as the human interactant. Thus, the two different tasks made people suspect very different robot capabilities, which in turn evoked very different linguistic behaviors from the users even in the non-verbal conditions. Consequently, the task a robot has been designed for seems to lead to inferences about its capabilities and thus to contribute to the partner model. Furthermore, the fact that the robot produced verbal output triggered more varied output in the interactions with Aibo; utterances directed at the speech-producing robot were more elaborate than utterances directed at a robot (seemingly) equipped with speech understanding capabilities only (see also Cha et al., 2014). However, while the variability of structures chosen in the dialogs with Aibo increased when it was producing natural language utterances, variability decreased in the verbal interactions with Rolland.
Thus, we cannot simply infer that the use of natural language by an artificial communication partner makes people use more features from human conversation. Instead, we can observe that the robot’s natural language utterances in both cases led speakers to produce language more appropriate for the task. Natural language thus does not function as a ‘trigger’ for conversational behaviors, i.e. it does not make participants transfer mindlessly from conversation among normal adults (Nass, 2004), but rather has subtle functional effects depending on the linguistic choices made by the dialog designer. This finding was confirmed by our study on the role of the robot’s feedback in the dialogs with Rolland. Here we saw that the robot’s utterances contributed to the participants’ partner models both locally and globally. In particular, we found that the robot’s utterances were interpreted in terms of a partner model such that the robot’s feedback was scanned for evidence about possible preferences of the robot. In the case investigated, people drew conclusions from the robot’s utterances that had not been intended by the dialog designer, which illustrates the influence of individual linguistic design decisions. The partner’s feedback thus


influences the speaker’s partner model, leading to revisions and, if necessary, to updates of the partner model. These interim conclusions all suggest that partner models play a considerable role in the design of utterances for the communication partner. Regarding the contents of partner models, we have found participants to keep track of words the robot seems to understand, linguistic constructions it has used itself, spatial references it has made and information it has requested; thus, speakers seem to build up partner models that comprise the broad range of characteristics of the communication partner suspected by Clark (1996), including a common lexicon and constructicon. Furthermore, participants were found to conceptualize their artificial communication partner with respect to its status as a social agent. Thus, the partner models emerging from our investigation into human-robot interaction indicate that such models may be very broadly construed, comprising both aspects of the particular partner’s suspected competence and social relations. These models exceed a simple storage of the linguistic properties and capabilities made use of by the robot (i.e. they are ‘inference-rich’, to use the term employed by Sacks (1972a)), since from the occurrence of one linguistic feature speakers infer that other kinds of linguistic properties are also suitable, such as the use of topicalization constructions if the robot uses relative clauses. Thus, participants take individual linguistic features as evidence for a particular level of linguistic and cognitive competence, drawing conclusions about the partner in general.
In sum, the partner model comprises partly very general and partly highly item-specific estimates about cognitive and linguistic capabilities, as well as social factors, all based on (a) experience with the robot, (b) preconceptions about robots as social actors and (c) inferences similar to those described as ‘ineffable background’ by Clark (1996), concerning linguistic complexity, task difficulty and the social nature of human-robot interaction. Regarding alignment, three results are noteworthy. First, we have seen in the dialogs with both Aibo and Rolland with linguistic output that people align with the robot’s utterances. We have found alignment to influence speakers’ lexical, constructional, instructional, and referential choices. However, we have also seen that speakers may align to different degrees; in both corpora, there were speakers who did not align with the robot’s linguistic features at all (concerning social properties, like the polite greeting at the dialog beginning, as well as grammatical features, such as the lack of use of determiners and full syntactic structures). That is, the first finding is that there is interpersonal variation with respect to alignment, which casts doubt on alignment as automatic priming (cf. Pickering and Garrod, 2004). Second, even if instances of alignment to linguistic features occurring in previous utterances from the robot can be found, it is not always evident that these


instances are due to automatic priming. For example, in the Aibo dialogs, the robot made frequent use of relative clause constructions, and these constructions were aligned with by almost half of the participants. However, speakers do not only employ the construction when the robot uses it; contrary to previous findings which show that users consider relative clauses to be too difficult for a (non-verbal) robot (Fischer, 2006b), they even suspect this construction to be useful in making their utterances more understandable, for example:


(85) Robot: do you want me to go to the object that is in front?
A031: - no, to the object that is to your right
(… 5 turns…)
A031: now move to the glass in front of you
Robot: I did not understand
A031: – move to the object that‘s in front of you.

While the first instance of a relative clause may be due to interactive alignment in the sense suggested by Pickering and Garrod (2004), the second instance is used for a particular purpose, i.e. speaker A031 takes relative clauses to facilitate understanding for the robot. This example supports the hypothesis by Branigan et al. (2010), who suggest that alignment is used to facilitate communication in the case of communication problems. In particular, Branigan et al. (2010) argue that in “HCI, alignment appears to have a stronger strategic component that aims to maximise communicative success, reflecting speakers’ beliefs about the limited capabilities of computers and the ensuing risk of communicative failure.” In this case, alignment would not be caused by automatic priming as suggested by Pickering and Garrod (2004). That is, the relative clause construction to which the speaker resorts in the reformulation, in order to enhance the understandability of the utterance in the face of a communication problem, had not been uttered in the partner’s previous turn but five turns earlier. That the speaker resorts to a relative clause construction here may thus be due to some kind of strategic alignment or to a quite sophisticated partner model that includes knowledge about the robot’s linguistic capabilities, yet in any case not to an automatic response. Third, compared to the baseline corpus, we have found people in interaction with verbal Aibo to make many other changes, too, which cannot be related to features in the robot’s utterances and which therefore cannot be due to alignment. The fact that the robot produced natural language utterances at all may have led people to a more sophisticated partner model. Moreover, they may also have used the linguistic features it produced itself as indicators of a generally higher linguistic competence, thus inferring (unjustifiably, in the case of a robot) from knowledge of some linguistic features to the knowledge of others.
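The argument about turn distance can be made operational: if a construction reappears in a user turn several turns after the robot last produced it, purely local priming becomes a less plausible explanation. The following is an illustrative sketch, not the study’s analysis; the dialog is an invented, normalized version loosely modeled on the pattern above.

```python
def alignment_distances(turns, marker):
    """For each user turn containing the marker string, yield the turn index
    and the distance (in turns) back to the robot's last use of the marker."""
    last_robot_use = None
    for i, (speaker, utterance) in enumerate(turns):
        if speaker == "robot" and marker in utterance:
            last_robot_use = i
        elif speaker == "user" and marker in utterance and last_robot_use is not None:
            yield i, i - last_robot_use

# Invented dialog; the marker "that is" stands in for the relative clause construction
dialog = [
    ("robot", "do you want me to go to the object that is in front?"),
    ("user", "no, to the object that is to your right"),
    ("user", "now move to the glass in front of you"),
    ("robot", "I did not understand"),
    ("user", "move to the object that is in front of you"),
]
print(list(alignment_distances(dialog, "that is")))
```

Here the first re-use occurs one turn after the robot’s utterance (compatible with priming), while the second occurs four turns later, in a reformulation after a communication problem, which is more naturally read as strategic.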


To conclude, automatic alignment may play some, but a rather limited, role in HRI; in many cases, partner modeling was more influential than alignment. Now the important question arises whether the partner model influences alignment, too. In order to address the relationship between partner modeling, feedback and alignment, let us first consider alignment in the dialogs with verbal Aibo again in more detail. What emerges is that for the two groups, those who attend to the human-robot interaction situation as a social interaction and those who do not, alignment fulfills different functions. Let us first look at those who do not react to social aspects of the robot’s utterances; for instance, speaker A030 responds only minimally to the robot’s initial greeting:

(86) Aibo: Yes, hello, how do you do?
A030: (1) hello (laughter) - um – go (par) straight? (/par)
Aibo: (par) What (/par) can I do for you?
A030: – go straight? (2) (at=“slow”) left,(/a) turn left? (3) turn left? (1) (at=“quiet”) m okay,(/a) – go straight? (3)
Aibo: Excuse me, which of the objects did you name?
A030: (1) um (at=“slow”) please go to - goal bowl (/a) - number one?

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.

While this speaker aligns only minimally with the social greeting, and does not align with the linguistic features of the robot’s subsequent utterances at all, she begins to align as soon as communicative problems arise:

(87) Aibo: Do you want me to go to the object that is in front?
     A030: – um – no - (hnoise) I want you (at=“lengthening”) to (/a) go to the object? which is (2) (at=“lengthening”) um (/a) – towards your right.

Furthermore, she re-uses the robot’s term object from several turns earlier in the case of a communicative failure:

(88) A030: – go to the (at=“prominent”) plastic (/a) container with the (at=“prominent”) red lid (/a) - on your left.
     Aibo: I did not understand.
     A030: (at=“quiet”) uhuh (/a) - um go to the object? - located, to your left.

Later, she re-uses the robot’s expression that is in front (see Example (87)) and a relative clause construction:

(89) Aibo: Excuse me, which of the objects did you name?
     A030: – go to the (1) plastic container? with the red lid, (1) that is (3) um - to your - that is in front of you.

It is also remarkable how she aligns with her own utterances once they have resulted in apparent communicative success, re-using the expression the plastic container with the red lid from the previous two examples in the following example:

Chapter 6.  Talking to robots 241

(90) A030: now go to the object, - located, (1) on your - (at=“prominent”) right (/a) (3) s+ – mh– (hnoise) – go to the – plastic container with the (at=“prominent”) red lid (/a) (1) on your right.

Another, very similar, example of a speaker who disaligns initially is A032:

(91) Aibo: Yes, hello, how do you do? (3)
     A032: (at=“lengthening”)s(/a)o I (…) go (par) straight (/par)
     Aibo: (par) What (/par) can I do for you? (1)
     A032: go straight about a metre and a half (5)

After the first instance of communicative failure, the speaker aligns with the robot’s term object:

(92) Aibo: Do you want me to go to the object that is in front? –
     A032: no that is the wrong object (1) ah go left (2) turn 45 degrees to left –
     Aibo: Which object did you mean?
     A032: (3) the (1) second object (1) from (1) to your left (…) to your left (2)


Later, the speaker re-uses features the robot has used in previous utterances, in particular the I-want-you-to-construction from the previous robot utterance Do you want me to go to the object that is in front?:

(93) Aibo: Which object did you mean?
     A032: (1) the second cylindrical object to your left (6)
     Aibo: I did not understand.
     A032: (5) I want you to go to the first object to your right (9)

Even later, the same speaker A032 uses alignment with the robot’s relative clauses, as well as with the I-want-you-to-construction:

(94) Aibo: Do you want me to go to the object that is in front?
     A032: (1) no I want you to go to the object that is behind the first one

These two features are re-used in later utterances:

(95) Aibo: Which object should I head towards?
     A032: (2) the (3) I want you to go to the object (2) the second smallest the second object to your left (2)

(96) Aibo: Do you mean the object that is south south east of the box?
     A032: (3) the one that is (1) the one that is closer to you

Here, alignment occurs not due to priming but as strategic re-use of linguistic resources previously employed by the robot. This effect is even more evident in the dialog between Aibo and A033; here the speaker ends up with utterances that consist almost exclusively of bits and pieces previously used by the robot:


(97) Aibo: yes, hello, how do you do? (6) what can I do for you? (4)
     A033: uh, (sigh), – to your left, (8) (at=“lengthening”) a (/a) bit further forward again, (1)
     Aibo: do you want me to go to the object that is in front? (7)
     A033: no could (at=“lengthening”) you, (/a) (2) go fu+, uh further forward, (3)
     Aibo: which object did you mean? (6)
     A033: not the object, - uh, (2) straight ahead but, - one further to the left.

While the speaker disaligns with the robot in the dialog opening, in the next tasks he continues simplifying and explaining his instructions rather than aligning with the robot’s utterances:

(98) A033: okay, could you go to the, first go to the objects on your right, (1)
     Aibo: I did not understand. (1)
     A033: could you, (1) go to the, – objects there are three objects on your right. could you go to them, first please? (7)
     Aibo: Which object did you mean? (3)
     A033: there are three, objects on your right, – could you go to them,


In subsequent utterances, too, A033 opts for other simplifying strategies, such as agreeing to the robot’s suggestion (even though the glass jar is not the goal object) and remote-control-like steering:

(99) Aibo: do you want me to go to the glass (1)
     A033: yes please, (9)
     Aibo: excuse me, which of the objects did you mean? (6)
     A033: the one st+, straight ahead. - just go straight ahead.

Later, however, A033, too, can be found to align with previous robot utterances because of communicative failure, re-using the term degree used by the robot in the utterance a turn of 360 degrees is not useful, the relative clause construction, the lexical items glass and ‘blocks’ (actually box), as well as the robot’s reference system from the robot utterance do you mean the object south-south east of the box?:

(100) Aibo: Which object did you mean?
      A033: (4) yes. the object which is 30 degrees west of the blocks. the large glass.

Thus, in all of these dialogs, participants do not respond to social features of the robot’s utterances yet begin to align as soon as communication problems arise, re-using properties of the robot’s utterances as building blocks for their own utterances, even across longer stretches of dialog. In contrast, the players, i.e. those who attend to the social aspects of the robot’s first utterance, are less likely to take up linguistic material from the robot’s


utterances explicitly; rather, they build on it, taking the robot’s utterances as common ground and as ratified starting points by producing relevant nexts (Schegloff, 1972a; Clark and Schaefer, 1989b); see, for instance, the following excerpts from the dialog between Aibo and A046:

(101) Aibo: Yes, hello, how do you do? (1)
      A046: I’m fine, thank you, – (at=“prominent”) please(/a) go to the object, (2) to your (at=“prominent”) left, (/a) - in the back. (5)
      Aibo: Do you want me to go to the object that is in front –
      A046: no. to the object in the back.

While the prepositional phrase she uses is structurally similar to the prepositional phrase the robot uses, the speaker does not repeat those parts of the robot’s utterance that she takes to be given. This strategy becomes even more apparent in the next example, in which the answer does not reuse any lexical material of the question and even makes use of pronominal substitution:

(102) Aibo: Excuse me, which of the objects did you name?
      A046: the (at=“prominent”) one (/a) in the (at=“prominent”) middle (/a).


Later, A046 does however produce constructional alignment when she rejects the robot’s reference to a goal object:

(103) Aibo: Do you want me to go to the object that is in front? –
      A046: no, the object that is in back.

In this case, the alignment creates a parallelism between two linguistic features that stand in a contrasting relationship, rendering the utterance even a bit odd; Mills (2007) has argued that especially in disagreements, partial alignment helps identify what exactly the difference consists in and on which points the speaker agrees with his or her partner. Further examples of the linguistic strategies taken by ‘players’ come from speakers A042 and A043; in these examples the speakers incorporate linguistic material from previous robot utterances but in response to the robot’s queries only produce expansions, relying on the robot’s utterances as being sufficiently grounded:

(104) Aibo: Yes, hello, how do you do? (1)
      A042: I I’m good, and you, (laughter) (at=“quiet”) (at=“laughter”) I don’t know, (/a)(/a) (1)
      Aibo: which object should I head towards? (1)
      A042: towards (at=“slow”) the (at=“prominent”) cup (/a) on the (at=“prominent”) furthest, (/a) (/a) that’s furthest left.


Here the speaker aligns with the social part of the dialog beginning and furthermore with the preposition in the robot’s question, producing a reply that builds on the underlying construction of the robot’s utterance. Also in later adjacency pairs, A042 aligns only minimally with the surface features of the robot’s question, instead building on joint understanding:

(105) Aibo: Do you want me to go to the object that is in front?
      A042: (2) go, slightly beyond. - the object.

(106) Aibo: Do you want me to go to the glass?
      A042: (1) (at=“loud”) next (/a) to the glass. – with the (at=“loud”) red (/a) lid.

(107) Aibo: Excuse me, which of the objects did you mean?
      A042: (1) the one in the middle.

(108) Aibo: which object did you mean?
      A042: (2) (at=“lengthening”) the (/a) - red object, - furthest (at=“prominent”) right (/a) from you. (3)


In all of these examples, A042 re-uses some of the lexical material previously used by the robot, for instance, glass and object, but also uses pronominal reference and minimal extension. Later she re-uses the term object also across distance:

(109) A042: keep turning, (breathing) (4) stop. - (at=“quiet”) good. (/a) – (at=“loud”) the object, on your left.(/a)

In a study of syntactic alignment in very large dialogical and written corpora, Gries (2005) also reports long-distance alignment effects, both between and within speakers/authors. Thus, the phenomenon observed here is not unique to human-robot interaction, yet it casts some doubt on the automaticity of such linguistic choices. Another example of a person aligning with the robot’s social greeting is A043; this speaker also builds on the robot’s utterances but does not align with them:

(110) Aibo: Yes, hello, how do you do?
      A043: I’m fine thank you. –
      Aibo: which object should I head towards?
      A043: the ca+ the the the cup. (6) keep going straight ahead, (2)
      Aibo: Do you want me to go to the object that is in front? (1)
      A043: to the left.

Later, he first aligns with and then re-uses parts of the I-want-you-to-construction:

(111) Aibo: Do you want me to go to the glass? –
      A043: no, I want you to go to the middle box. (3) straight ahead please. (1) (laughter) (2)
      Aibo: Excuse me, which of the objects did you mean?
      A043: I want the (at=“prominent”) middle, (/a) - uh, - plastic box. with the red, lid please. (5)

He also makes use of the I-want-you-to-construction when not prompted by the robot:


(112) A043: so then turn around? - (laughter) (3) good okay. so then straight ahead? and I want you to go to the (at=”prominent”) cup. (/a) - on the (at=“prominent”) left. (/a) please.

We can conclude that linguistic alignment generally serves as a verbatim display of a shared basis (Clark and Schaefer, 1989b). The players align with the dialog opening to establish a social relationship, i.e. by displaying reciprocity; then, however, they assume a much broader shared basis, corresponding to the joint pretense of the robot being a social communication partner. Correspondingly, they align less, and rather use the initiation of a relevant next utterance as a grounding strategy to accept the robot’s utterances as common ground. In contrast, the non-players do not align to indicate a shared social basis, but employ alignment strategically as a solution to communicative failure. This leads to utterances consisting almost exclusively of expressions previously used by the robot. That is, they make use of linguistic material that can be assumed to be common ground since it is part of the discourse record. Thus, while alignment in both cases serves to indicate common ground, players and non-players use it for different purposes. The results from this qualitative analysis are supported by quantitative investigations; the Pearson product-moment correlations show small, but consistently negative correlation coefficients for the relationship between dialog beginnings and linguistic features of the robot’s utterances (cf. Table 6.16).16 That is, people who respond more to the social aspects of the robot’s greeting re-use fewer of the robot’s expressions.

Table 6.16  Correlations between dialog beginning and words used by the robot

       Compass   Glass   I mean   Towards   I want   Degrees   Rel-clauses
begin  −0.10     −0.43   −0.23    −0.26     −0.12    −0.15     −0.41

16. The linguistic features analyzed concern the occurrences of the expressions towards, I want, glass, degree and box from the robot’s utterances which object should I head towards, do you want me to move to the glass, a turn of 360 degrees is not useful and do you mean the object south-south east of the box? respectively, as well as instances of I mean, relative clauses and instances of the cardinal points of the compass.
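The Pearson product-moment coefficient used for Tables 6.16 and 6.17 can be illustrated with a minimal computation. The speaker scores below are invented for illustration only (the study’s raw counts are not reproduced here); the function is a plain-Python sketch of the standard formula.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # covariance term and the two standard-deviation terms (unnormalized;
    # the 1/n factors cancel in the ratio)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical speakers: higher scores for responding to the robot's social
# greeting co-occur with fewer re-used robot expressions, so the coefficient
# comes out negative, as in Table 6.16.
greeting = [2, 0, 1, 0, 2, 1]   # degree of response to the social greeting
reused   = [1, 5, 3, 4, 0, 2]   # number of robot expressions re-used

print(round(pearson_r(greeting, reused), 2))  # → -0.96
```

For real analyses one would use a library routine such as scipy.stats.pearsonr, which additionally returns a p-value; the hand computation above only serves to make the measure behind the reported coefficients explicit.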


In the Rolland data, communicative problems are not at issue to the same degree. What we can observe regarding the players here is that they, as in the Aibo dialogs, often build on the robot’s utterances. Yet in addition, they often make use of the robot’s linguistic features, which contributes to the fluency of the dialogs:

(113) Rolland: yes, hello, how do you do?
      R013: (1) fine, how are you. (breathing)
      Rolland: you can take us now to a place you want to name. (4) where are we going to?
      R013: we are going over d’the coffee table. (2)
      Rolland: did you say coffee table?
      R013: – yes I did.


Here the participant reuses the robot’s ‘where are we going’ for her utterance ‘we are going’ and the robot’s ‘did you?’ for her ‘I did’. Another example is R016:

(114) Rolland: yes, hello, how do you do?
      R016: (1) (hnoise) (3) oh (par) how are you, (/par)
      Rolland: (par) you can (/par) take us now to a place you want to name.
      R016: – (breathing) (2) take me, – (at=“lengthening”) to the (/a) computer please,
      Rolland: where are we going to?
      R016: (4) we’re going to the (at=“prominent”) stove, (/a) (3) or, hot plate.
      Rolland: is this part of the kitchen?
      R016: (1) this (at=“lengthening”) is (/a) the (at=“prominent”) hot (/a) plate. (2)
      Rolland: which name do you prefer, stove or kitchen?
      R016: – I prefer stove. –

Thus, in these cases alignment is not due to strategies concerning communicative failure, and, accordingly, alignment in the Rolland dialogs is positively associated with the dialog beginning. The data shown in Table 6.17 from the first (interactive) task in the Rolland dialogs show positive correlations between features of the robot’s utterances and those speakers who attend to social aspects of the robot’s first utterance. Above in Section 6.4.3 Personal Partner Models and Linguistic Choice, we have already seen evidence for a strong predictive role of speakers’ partner models as evident in their reactions to the robot’s polite greeting when we were investigating the quantitative distribution of speakers’ linguistic choices in these dialogs. Among these choices were also lexical choices in which speakers made use of linguistic features of the robot’s utterances; Table 6.17 shows the relationship between robot utterances, certain lexical choices and the correlation of these choices with the partner model as measured by the amount of alignment with the


robot’s social greeting. The table shows that the models of the partner as social or non-social correlate with speakers’ amounts of alignment, such that the partner model determines to what extent participants are likely to align with the robot’s utterances. Table 6.17 presents the analysis for the first task of the Rolland dialogs only; yet recall that participants made use of the robot’s where are we going to? both in the first and the third task in correlation with the dialog beginnings, and also exhibited a tendency to use the term in front to describe a location in both tasks.


Table 6.17  Correlations between users’ expressions and robot utterances

Linguistic feature   Robot utterance               r
take                 you can take us now…          0.52
we are going         where are we going to?        0.50
—                    where are we going to?        0.35
this is              is this the place…            0.43
in front             am I in front of it?          0.25
name                 … a place you want to name    0.41

While in principle both alignment and partner models may influence user behavior independently, the results of our empirical analyses suggest that the partner model determines how much speakers align with their partners. That is, people align to different degrees depending on their understanding of the socialness of the situation. Furthermore, due to their partner models, people may also align for different reasons and with respect to different linguistic features of the communication partner’s speech. Alignment was thus found to be in competition with the reliance on common ground as a shared basis. We can therefore conclude that the different models of the artificial partner as a social versus a non-social agent define the limits of alignment. The results of our investigation furthermore indicate that speakers make their linguistic choices strategically rather than automatically. This finding has consequences for theoretical approaches to alignment (Pickering and Garrod, 2004, 2007) as well as for the hypothesis that people transfer mindlessly (Nass and Moon, 2000). The fact alone that some speakers do not align with the robot’s first utterance shows that the understanding of the interaction as social or non-social is not an automatic response, as suggested by the interactive alignment model and predicted by the notions of media equation (Reeves and Nass, 1996) and mindless transfer (Nass and Moon, 2000). In contrast to these proposals, speakers’ choices correspond to their individual understandings of the situation and of their communication partner. Thus, the results of this study do not provide evidence for automatic processes in linguistic choice.


Furthermore, in contrast to the assumption made by Pickering and Garrod (2004), it is not the case that partner modeling occurs only when communicative success is at issue; due to their scripted nature, the human-robot dialogs are equally successful or unsuccessful for all speakers alike; yet only some speakers retain linguistic properties of their artificial communication partners’ utterances over long stretches of dialog, and they use them when they either consider communicative success at issue or in potentially face-threatening situations, such as rejections, where alignment can serve to indicate partial agreement and to highlight those areas in which the speaker disagrees with his or her partner (as suggested by Mills and Healey (2008)). Speakers have also been found to retain verbatim representations of their partners’ utterances if these features are taken as clues to the robot’s capabilities. Thus, our results support the suggestions made by linguists like Givón (2005) and social psychologists like Clark (1996) that participants create, and make use of, elaborate models of their partners concomitantly in dialog. These models then serve as the basis for communicative strategies of all kinds, among which may also be alignment.

chapter 7

Conclusions

The aim of this study was to investigate how addressees are taken into account in interaction. The approach taken in this study has provided a novel perspective on the phenomenon by comparing three different ‘simplified registers’ and by combining experimental, corpus-linguistic and conversation analytic methods in order to address specific issues. The choice of the varieties examined allowed the investigation of a multitude of linguistic properties involved in addressee orientation and of the types of representations speakers may build up of their communication partners, as well as the identification of a broad spectrum of influencing factors and resources speakers make use of when speaking for their addressees. Especially in the empirical investigation of robot-directed speech, specific hypotheses that concern the relationship between partner modeling, alignment and feedback could be tested. In this chapter, I summarize the results from the previous chapters and draw conclusions regarding the issues raised in the first two chapters of this study. The results will be presented in three different sections. First, I discuss the usefulness of understanding speech directed to children, foreigners and robots as ‘simplified registers’, considering both the notion of simplification and the notion of register. Second, I address the open issues outlined in Section 1.1 Open Issues and integrate the results of this study; Section 7.2 The Role of the Addressee in Interaction: Some Answers thus sums up the findings from each empirical chapter and relates them to each other in order to arrive at a more comprehensive picture of the processes involved in recipient design. In the third part of this chapter, in Section 7.3 General Conclusions, I present general conclusions concerning the nature of language emerging from the current study.

7.1  Simplified registers?

The comparison of our findings in the three empirical chapters should enable us to address the relationship between the three putative registers, their similarities and differences, and thus to answer the question of whether it makes sense to speak of the ways in which speakers address the particular communication partners investigated here as ‘simplified registers’. The answer to this question involves considering both the notion of simplification and the notion of register.


7.1.1  The notion of simplification

At the beginning of this investigation, we raised the following questions, among others:

– In which ways are the adaptations made in ‘simplified registers’ simpler?
– To what extent do the speakers’ adaptations facilitate understanding?
– Does simpler mean the same for all communication partners?

Several scholars argue that there is a general, universal notion of simplification (e.g. Kusters, 2003; Foley, 2006; McWhorter, 2007). For instance, Ferguson (1982) argues that simplification is surprisingly uniform; in his view, it is characterized by:

– smaller vocabulary, generic rather than specific terms, monomorphemic words, and paraphrase of semantically complex words;
– coordination, parataxis, invariable word order as well as absence of copula, pronouns, and function words;
– heavily reduced or no inflectional and derivational affixes, no allomorphy, and invariant stems;
– CV monosyllables and CVCV disyllables.




McWhorter (2007) views simplification from a different perspective, arguing that particularly overspecification, structural elaboration, and irregularity are eliminated in simplification processes as they appear in language contact situations (cf. McWhorter, 2011, p. 2). These elimination processes result in partly similar linguistic properties as those suggested by Ferguson (1982); we can expect, for instance, inflectional morphology to disappear if the meaning is expressed already by the content words of the sentence (cf. McWhorter, 2011, p. 6). Another suggestion on simplification comes from Kusters (2003), who identifies three principles:

– the economy principle, which means that there are as few inflectional categories as possible;
– the transparency principle, which asserts a trend towards a one-to-one correspondence between form and meaning; and
– the isomorphy principle, which assumes increased regularity.

Evidence for universal simplification strategies also comes from Foley (2006) who compares three typologically unrelated pidgins (Hiri Motu, Yimas Pidgin, Tok Pisin) and finds simplification processes that cannot be explained from the linguistic material of the source languages as a result of language contact. For instance, neither the contributing languages nor their ‘sisters’ have a basic binary


tense system like Yimas-Arafundi Pidgin. Therefore, he suggests that simplification plays a crucial role in pidginization processes; in particular, he proposes that the fact “that the end result of the process of pidginization in the three languages is so similar (…) suggests that universal constraints are operative here” (Foley, 2006, p. 40). He argues that the data support both Ferguson’s and Kusters’ suggestions and that the general principle underlying simplification is to ease the processing load of the hearer (Foley, 2006, p. 4). If these assumptions are correct, speakers will indeed know intuitively what makes an utterance more understandable. This is in line with a position widely held with respect to language acquisition; for instance, Brown (1977) suggests to parents that, in order to find an appropriate way of speaking to their child, they may simply concentrate on communication, i.e. that caregivers can trust their intuition (and the child’s subsequent response) as to what will make their utterances simple enough for the child to understand. Yet even if simplification strategies are universally applied in the way outlined above, the result still cannot be predicted; for instance, Mühlhäusler (1981) demonstrates that the application of simplifying devices does not lead to predictable results. He has a group of students replicate the test described in Ferguson (1975), in which students were asked to rewrite sentences according to “how they thought speakers of English trying to communicate with apparently uneducated non-Europeans would say” them (Ferguson, 1975, p. 3). In the results of his study, Mühlhäusler (1981) observes a broad range of different possibilities in which the same utterance can be simplified. For instance, the utterance “I haven’t seen the man you’re talking about” yields the following different ‘translations’ into foreigner talk (Mühlhäusler, 1981, p. 105):

I  no see this man.  no see man you talk about. Me  see man. No  no see man you talk about. Me  see um man you say. No  Man you talk about, I not see.  seen man you talk. No  You talk man. I not seen.  no look him man you say. Me

These data exhibit many of the features identified by Ferguson (1975): vowel additions, omission of articles, auxiliaries, do-support and conjunctions, as well as pronoun case substitution; however, the variation in the different reformulations shows that the application of these simplification strategies does not lead deterministically to the same results.


It can thus be expected that in the three different registers discussed, simplification may instantiate quite differently depending on the communication partner, and indeed the present investigation has shown that, with respect to the properties listed by Ferguson (1982), there are considerable differences both between and within the three varieties investigated. Table 7.1 summarizes the comparison of the features suggested by Ferguson (1982) and our findings on the three varieties discussed in this book. Wherever the table asserts ‘sometimes’, we found considerable variability concerning the feature under consideration.


Table 7.1  Similarities and differences between the so-called simplified registers

Ferguson (1982)                               CDS                                 FDS            RDS
reduced lexicon                               yes                                 sometimes      sometimes
generic rather than specific terms            basic level categories              sometimes      yes, but also technical vocabulary
monomorphemic words and diminutives           yes, but also vowel additions       sometimes      no
paraphrase of semantically complex words      sometimes                           sometimes      never
coordination and parataxis                    short MLU, yet word specific        sometimes      sometimes
invariable word order                         no: variation sets                  not observed   constructions used by the system
absence of copula, pronouns, function words   no, sometimes pronoun substitution  sometimes      sometimes telegraphic style
heavily reduced morphological marking,        no                                  sometimes      no
  no allomorphy, invariant stems
CV monosyllables and CVCV disyllables         sometimes                           no             yes (in English)

In child-directed speech, we have found a smaller vocabulary, as Ferguson suggests, though with a preference for basic level terms – these terms are located in the middle of a scale of generality versus specificity (see Cruse, 2000, ch. 13). Furthermore, while the words used are usually monosyllabic rather than polysyllabic, sometimes syllables are added to standard language terms, as, for instance, in doggie. Semantic paraphrase was not observed, but a tendency towards shorter utterances and parataxis can be found depending on the child’s knowledge of the words in the utterance (Roy et al., 2009). Regarding word order, in contrast to Ferguson’s proposal, high variability can be observed in CDS, in which utterances occur in variation sets in which one utterance constitutes a reformulation of another. There are occasional pronoun substitutions especially concerning first and



second person pronouns. Features that would cause the utterance to become ungrammatical, such as absence of the copula or function word and pronoun deletion, have however rarely been observed (e.g. Hatch, 1983; Newport et al., 1977). That is, most researchers report no grammatical or morphological modifications that would render CDS ungrammatical. Especially loss of inflection has not been reported. On the contrary, in languages in which diminutives are informative of the respective noun’s grammatical gender, increased use of diminutive suffixes was observed (Kempe et al., 2001). In CDS we do, however, observe attention to increased economy, such that caregivers restrict their utterances to short utterances with a limited vocabulary and restricted grammatical inventory. Furrow and Nelson (1986) argue that while individual adaptations in the input may not necessarily correlate with the acquisition of particular forms by the child, the simplifications altogether serve to make language more understandable: “children will find it easier to interpret utterances semantically if they are couched in motherese-type utterances and thus will find it easier to learn the grammar of the language” (Furrow and Nelson, 1986, p. 165). Furthermore, caregivers attend to increasing transparency, for instance, by increasing the vowel space and by exaggerating phonological contrasts and prosodic contours. Furthermore, an increase in one-to-one correspondence can be observed, for example, in the uses of prosodic prominence to mark new words (Fernald and Mazzie, 1991), as well as in the use of skewed distributions by means of which caregivers highlight prototypical slot fillers (see, for instance, Goldberg et al. (2004) and Casenhiser and Goldberg (2005) with respect to argument structure constructions and Stefanowitsch and Gries (2005) regarding the possessive construction). 
There is thus increased regularity and an avoidance of structural elaboration, just as Ferguson (1982), Kusters (2003) and McWhorter (2007) suggest; yet many other features of CDS that have been demonstrated to facilitate language acquisition, such as increased variability in the input, as well as the affective and attention-getting functions of the adjustments in speech to children, are not accounted for by the features suggested. Regarding foreigner-directed speech, many of the features observed by Ferguson (1982) have been found in the literature. Müller (2000), who investigates spontaneous, work-related interactions between native and non-native speakers, proposes that adults have a tendency towards an analytic factorization of language when talking to a non-native speaker; he argues that speakers attend to one-form–one-meaning correspondences, leaving out idiomatic and gestalt properties of language as well as everything understood to be unnecessary (cf. McWhorter, 2007; Kusters, 2003; Thomason, 2001), such as inflection, conjunction and pragmatic items like discourse particles. This analysis of the language for the language learner, he argues, is traded against the fluency, idiomaticity and
naturalness of natural language in its full form. His results thus support the features of simplification proposed by Ferguson (1982), as well as the principles of economy and transparency (Kusters, 2003) and the elimination of overspecification, structural elaboration and irregularity (McWhorter, 2007, 2011). That language contact between native and non-native speakers leads to simplification processes is also the perspective taken in Kortmann and Szmrecsanyi (2009). These authors investigate varieties of English for their use of linguistic features that can be argued to be either ornamentally complex, i.e. features that “complicate the system vis-à-vis the standard system, without clearly yielding an added communicative bonus” (Kortmann and Szmrecsanyi, 2009, p. 272), simplifying features, such as regularization of irregular patterns or reduction of the number of contrasts, or L2-simple features, i.e. “those that are known to recur in interlanguage varieties” (Kortmann and Szmrecsanyi, 2009, p. 274). Along these dimensions, they find different types of English varieties to differ; in particular, they distinguish between (1) traditional L1, low-contact varieties of English, such as East Anglia or Appalachian English, (2) high-contact L1 varieties, such as standard British or American English, but also Scottish English or Urban African American Vernacular English, (3) L2 varieties, such as Indian or Singapore English, and (4) pidgins and creoles. Regarding ornamental complexity, they find a cline from traditional L1 via high-contact L1 and pidgins and creoles to L2, with traditional low-contact varieties exhibiting the greatest number of ornamentally complex features and L2 varieties exhibiting the fewest. Regarding simplification, they find pidgins and creoles to exhibit the most simplifying features, and L2 varieties the fewest. 
Correspondingly, regarding L2-simple features, such as lack of inflection, they find the number of L2-simple features to be highest in pidgins and creoles, and again lowest in L2 varieties (Kortmann and Szmrecsanyi, 2009, p. 275). These results support the authors’ claim that languages differ regarding morphosyntactic complexity, and they show furthermore that high-contact languages may exhibit more L2-simple and other simplifying features than low-contact languages. The results, however, rely on survey data and thus only concern types of features, not their frequency. Kortmann and Szmrecsanyi (2009) therefore add a corpus study in which they investigate fifteen varieties from the three groups low-contact L1, high-contact L1 and L2 for the frequency of overt grammatical marking in 1000-word samples of each variety, the frequency of free and bound grammatical morphemes per sample, and the percentage of regular bound morphemes (Kortmann and Szmrecsanyi, 2009, p. 277). The results are systematic clines between traditional L1, high-contact L1 and L2, such that L2 has the fewest of these morphemes, traditional L1 the most, and high-contact varieties lie in the middle (Kortmann and Szmrecsanyi, 2009, p. 278). Given the general idea that contact varieties are
characterized by increasing transparency, however, more analytic features might have been expected in high-contact than in low-contact varieties. Yet in fact, the results show that the amount of analytic morphology correlates positively with the amount of synthetic morphology. These data thus do not show that language contact leads to an increase in analytic marking. However, additional studies of the regularity of the grammatical morphemes involved show that in L2, 82% of the bound grammatical morphemes are regular, compared to 71% in high-contact L1 and 65% in traditional L1. These results do indeed suggest that language contact promotes increased regularity in the language and thus support the claim that “second-language acquisition entails grammatical simplification” (McWhorter, 2011, p. 4). However, even though these studies provide indirect evidence for strategies of simplification in speech to non-native speakers, most studies analyzing actual encounters between native and non-native speakers do not report the occurrence of such grammatically simplifying features in FDS. Simplified phonetics, phonology and prosody have been observed to varying degrees, and the phonetic simplifications made for foreigners were found to be different from those made for children (e.g. Biersack et al., 2005). With the exception of Müller (2000), increased attention to exaggerated regularities and one-to-one correspondences has not been reported in the literature reviewed, nor has avoidance of overspecification been observed; on the contrary, in the context of interactional modifications, structural elaboration and overspecification, for instance by giving examples, have been found. The simplification strategies suggested by Ferguson (1982), Kusters (2003) and McWhorter (2007) can thus be observed only in a few contexts, not in interactions between native and non-native speakers in general. 
Whether these features occur depends on the task at issue, as well as on whether or not language learning is a jointly attended-to goal of the interaction (Wagner, 1996; Firth and Wagner, 2007; Firth, 2009). As we have seen above, if speakers aim to make communication work in order to get something done, this may lead to instantiations of simplification strategies, whereas a focus on teaching and language acquisition precludes the use of simplification strategies such as those described by Ferguson (1982). Thus, in FDS, the features suggested by Ferguson (1982) and Kortmann and Szmrecsanyi (2009) were found to be used only in some contexts and not in others. Yet in the domain of pidgins and creoles, too, speakers seem to employ simplification strategies selectively. The finding that no unified picture of simplification emerges corresponds to the observation by Roberts and Bresnan (2008) that even in pidgins, inflectional morphology may be affected by simplification processes to different degrees. They find not only that inflectional morphology does not disappear in all pidgins (i.e. maximally simplified linguistic varieties, see McWhorter (2007) or Kortmann and Szmrecsanyi (2009)), but also that there is a
systematic difference between different types of inflections. Roberts and Bresnan (2008) argue that while indeed much of the contextual inflection, which contributes to building up the sentence, such as subject-verb agreement, is lost, pidgins often retain inherent inflection, such as tense, aspect, mood, negation, number, gender or definiteness (Roberts and Bresnan, 2008, p. 27). This is not counterevidence to the claims made by Kusters (2003), McWhorter (2007) or Ferguson (1982), but it suggests that speakers do not simply leave out certain forms, such as all inflections; instead, speakers make choices that correspond to the functions of the items under consideration. From this perspective, accounts of simplification based on functional considerations may explain the observable phenomena better than accounts oriented towards the presence or absence of particular forms.

Concerning robot-directed speech, the vocabulary used has been found to depend to a great extent on participants’ understanding of the robot’s capabilities, as indicated, for instance, by the task it is expected to fulfill. In particular, in RDS many participants were found to restrict themselves to lexical and syntactic structures for which they (assumed they) had evidence that the robot would understand them (because it used them itself), and they treated lexical restrictedness as a potential problem source; however, as much as they used simple terms, they also employed technical, polysyllabic vocabulary to improve communication. There was thus no focus on monosyllabic or monomorphemic words, and there was no evidence for semantic paraphrase: if problems with the vocabulary were suspected, participants tried out other lexical items rather than paraphrasing their words semantically. Like the vocabulary, the mean length of utterance and the preference for particular syntactic structures were also found to be highly dependent on the partner model. 
Grammatical and morphological modifications are rare in the data investigated. Regarding syntactic constructions, participants generally stuck to what had worked once or to what had been used by the robot itself, which led to relatively little intrapersonal word order variation, whereas variation across participants was considerable in the absence of the robot’s own utterances. Regarding telegraphic style, function words, copulas and pronouns were occasionally found to be left out, even though these cases were rare; pronouns, however, were found to be rare in general. The phonetics, phonology and prosody of utterances directed at robots were found to be modified even when there were no signs of communication problems. The adjustments observed were syllable lengthening, increased loudness and strong emphasis, as well as pauses between utterances, words and even syllables. These features are associated with communication problems and can thus be taken as indicators of simplification or clarification strategies. Also, significant differences concerning the intonation contours with which the robot was addressed
were found, yet these depended on the robot’s appearance rather than resulting from simplification processes. A tendency towards one-to-one correspondences and increased regularity in robot-directed speech can be identified in participants’ strategy of staying with one formulation once it seems to have proven successful (Fischer and Moratz, 2001; Moratz et al., 2001). The amount of variability in linguistic choice, and thus the amount of regularity attended to, has however been found to depend on the tasks performed and on the robot’s feedback, as we have seen in the case of non-verbal versus verbal Rolland.

To sum up, the simplification strategies taken in RDS are very different from those suggested in Ferguson (1982) and also different from those reported for CDS and FDS. This observation is supported by studies that compare simplification strategies across communication partners directly; for instance, with respect to phonetics, phonology and prosody, simplification for children has been found to be different from simplification for foreign adults, as Biersack et al. (2005) and Uther et al. (2007) have shown. According to these studies, different phonetic and prosodic means are employed in order to simplify speech for the different (imagined) communication partners. Studies comparing the grammatical characteristics of CDS and FDS also report differences regarding syntactic correctness and pragmatic function. For instance, Hatch (1983) finds no grammatical simplifications (simplified negation, do-support) in CDS, compared to FDS. Freed (1981) reports similar reductions in syntactic complexity in both varieties; however, she takes the foreigner to be treated as a peer with cognitive and social presence, which is, according to her, not the case for speech to children (Freed, 1981, p. 35). She suggests that different functions are prominent in the two varieties, such that the general affective function of CDS is not assumed for talk to foreigners (cf. 
also DePaulo and Coleman, 1986). Fischer et al. (2011) compare speech to infants and speech to a simulated robot whose interaction with the environment consists in following movements, prominent colors or skin color with its eyes. The task for both the parents and the participants in the human-robot interaction experiments was to explain to their respective pre-verbal communication partner, i.e. either an eight- to ten-month-old infant or the simulated robot, how simple toys or household devices can be used. The robot was simulated in that it was displayed on a screen, yet it was fully autonomous in its responses to the tutors’ movements. The analyses of the speakers’ linguistic and action demonstration behaviors reveal that parents simplify their speech syntactically much more for their children than tutors do in the dialogs with the robot, but that parents’ action demonstration is in fact faster and contains shorter pauses than action demonstration for the simulated robot.
Thus, there are significant differences regarding all linguistic behaviors, yet in the one area in which the robot does provide feedback, namely on participants’ gestures, tutors slow down significantly compared to the action demonstrations for the young children. Simplification strategies are thus adapted to the respective communication partners’ perceived needs, which are perceived to be different for infants and robots. How participants attempt to make their utterances more comprehensible for a particular communication partner thus depends partly on who this communication partner is. In the three cases investigated, making specific adjustments to children, foreigners and robots seems appropriate, since what makes utterances more comprehensible for a child may differ considerably from what makes utterances more comprehensible for a non-native speaker or for a robot, and speakers seem to take this into consideration. Consequently, simplifications differ for different addressees. Simplification in the three situations under consideration is therefore realized quite differently; participants do not use similar simplification strategies in interaction with the three different communication partners investigated, and often not even with the same kind of partner, as, for instance, the heterogeneity of FDS and RDS demonstrates. However, part of the heterogeneity of the speech observed could be related to the different goals speakers may carry into the interactions, such that simplification was not always what participants were aiming for. Thus, the choices made cannot always be appropriately described as simplifications. The lack of phonological clarity in certain phases of CDS as input for the acquisition of phonological rules is one example, and some features of speech to foreigners have even been found to be counterproductive. 
That is, speakers may not provide helpful information for their listeners at all; for instance, we have seen that in FDS, speakers may speed up and produce pessimistic comments on the expected success of the interaction (Smith et al., 1991; Zuengler, 1991), and that in communication with robots, some of the supposed simplifications create the opposite effect, due to the workings of natural language processing technology. Thus, besides the considerable variation in simplification strategies both within and between registers that we have identified, simplifying is by no means all that speakers do in ‘simplified registers’. Nevertheless, many speakers clearly intend their utterances to become simpler when using these features, as, for instance, their clarification attempts following displays of communication problems show. Furthermore, speakers in all three registers employ means to clarify their utterances by making additions, which may range from the phonological to the morphosyntactic to the interactional. Yet their modifications do not always fulfill the functions they are supposed to fulfill; especially the phonetic and prosodic adjustments in RDS do not facilitate communication with automatic speech processing systems. Here, speakers’ partner
models are either too coarse-grained or too inaccurate to yield functionally appropriate adaptations. Thus, it is more appropriate to describe speakers’ linguistic choices as strategic rather than as functional, and speakers’ strategies depend on their goals and their evaluation of the affordances of the current situation. To conclude, the current discussion has shown that (a) speakers use partly different simplification strategies in communication with different communication partners, and (b) speakers may also have other interactional goals besides simplification. If the main goal of speakers talking to children, foreigners or robots is not to simplify their utterances, we may ask:

What are the functions of the adaptations?

The answer to this question differs for the speech occurring in the three situations. For instance, “ungrammatical” grammatical simplification is a powerful device for communication restricted to the ideational domain, i.e. when business needs to be done. In contrast, addressee-specific levels of “grammatical” grammatical simplification as well as interactional modifications have been shown to facilitate comprehension (e.g. Chaudron, 1983; Ellis, 1999), and in contrast to the former, they also serve to maintain interpersonal relationships and, possibly, facilitate acquisition (e.g. Ellis, 1999; De la Fuente, 2002). However, in FDS, adjustments were also observed that were more suited to presenting the native speaker as having higher status, or as not being responsible for the inability to master the joint task given the communication partner’s low language capabilities (for instance, through displays of interactional pessimism), than to facilitating communication. The adjustments made in speech to children were found to fulfill various purposes, such as maintaining conversation, establishing and maintaining joint attention, creating affectionate bonds and facilitating language acquisition. The simplification strategies speakers take are thus finely tuned to the affordances of the respective situation and in particular to their interactional goals in this situation; this is in accordance with the suggestions of CAT (Giles et al., 1991, 2006). Simplification may furthermore depend on the granularity of the partner model, which may be either a model of a specific addressee or a model of an unspecified group member; if it is the former, alignment and negotiation processes may play a role in updating the partner model and thus in the simplification strategies the respective speaker assumes to be appropriate. If it is the latter, speakers are less likely to adjust to their particular communication partner (Zuengler, 1991). 
In RDS, phonetic and prosodic adjustments have been found to be related to increasing the understandability of one’s utterances, and many adjustments on the lexical and syntactic levels were clearly designed to facilitate communication, such as, for instance, the re-use of words and constructions the robot used itself. However,
failures to adjust to the robot’s utterances, such as its social greeting, were also observed, indicating that in RDS, too, speakers’ identity needs may override the requirements of the current communication situation. To sum up, there is no single notion of simplification that predicts the properties of utterances directed at communication partners who are limited in their capabilities in one way or another. Instead, people take steps to facilitate communication in response to what they perceive to be the affordances of the current situation, which may differ considerably between speakers.

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.

7.1.2  The notion of register

The notion of register suggests that there is a (to some extent) predictable relationship between situational variables and language use (cf. Hasan, 1973, p. 281). The concept thus implies some homogeneity in linguistic choice, i.e. that “[c]haracteristic features and the probability of occurrence of other features are uniquely determined by aspects of the situation” (Lee, 2001, p. 46), or, in Halliday’s terms: “we can and do (and must) make inferences from the situation to the text, about the kinds of meaning that are likely to be exchanged; and also inferences from the text to the situation” (Halliday, 1978, p. 36). Language use and situation are thus understood to correlate (Bateman et al., 2007, p. 147). Furthermore, the association between linguistic features and situation is often understood to be conventional, normative: “Register is used when we view a text as the instantiation of a conventionalized, functional configuration of language tied to certain broad societal situations, that is, variety according to use” (Lee, 2001, p. 46). A prominent proponent of the view that the ways of speaking to children or to foreigners are best understood as simplified registers is Charles Ferguson, who claims that the characteristics of talk to children co-occur regularly and are so interrelated that they are best described as registers (Ferguson, 1982). However, he admits that there is so much variation in the degree to which these features occur, and in their relative incidence, that a range of intra-register variability must also be recognized. Ferguson therefore speaks of different “degrees of babyishness,” which “reflect the age of the addressee, the nature of the situation, the speaker’s estimate of the linguistic abilities of the child, the strength of the affective bond between the interactants, and so on” (Ferguson, 2004, p. 181). 
Ferguson holds that the conventionalized register ‘baby talk’ can also be made use of in other situations: “Like other registers, however, it may be extended to secondary uses” (Ferguson, 2004, p. 180). Such uses include the following: “it may be used in talking to pets, in coaxing other people (or even objects) to behave in the way the speaker wants, in calling attention to someone’s childishness, and in talk
between lovers” (Ferguson, 2004, p. 180). From this perspective, the proposal that CDS, FDS and RDS might constitute simplified registers (Ferguson, 1982) suggests that speakers’ linguistic behaviors in situations involving children, foreigners or robots as communication partners might be sufficiently specified by the fact that speakers are interacting with a particular communication partner who is perceived to be in need of linguistic simplification. However, the empirical findings on the three suspected varieties in this study reveal considerable effects of various determining factors, such as the task, activity or social relationships in the situations under consideration, as well as the speakers’ personal agendas and the partner’s linguistic and nonverbal behavior. Even in identical situations, people were found to behave very differently (see Sections 6.4.3, Personal Partner Models and Linguistic Choice and 6.6, The Relationship between Partner Modeling, Alignment and Feedback). These findings cast doubt on a possible conventional link (e.g. Halliday et al., 1964; Lee, 2001) between a situation, as defined by the communication partner, and the linguistic properties chosen. Instead, we have observed considerable heterogeneity in all three varieties investigated. In particular, while CDS turned out to be relatively homogeneous between interactants such that there were similar results for mothers, fathers and other adults (for instance, Snow, 1972; Barton and Tomasello, 1994; Roy et al., 2009), there was still considerable variation depending on the respective child’s receptive ability (see Cross et al., 1985), as well as considerable situational variation (see Section 4.2.4 Variation: Situation), not to mention the impact of caregivers’ SES (cf. Fernald et al., 2012). 
Thus, in different activities, speakers find different linguistic features useful, which should not be the case if the situation were sufficiently defined by the communication partner, as the term ‘simplified register’ suggests. Furthermore, older children addressing infants do not use the whole range of peculiar features observable in adult IDS, but acquire this speaking style feature by feature (Weppelman et al., 2003). This suggests that CDS is not a conventional variety as a whole, but that there is selective attention to particular linguistic structures and particular strategies taken specifically for a particular partner. Thus, CDS does not seem to be a conventional variety that is learned by exposure during speakers’ own infancy; instead, the ability to adjust appropriately to a partner is acquired over time and is functionally motivated. Moreover, the fact that caregivers fine-tune their utterances to the receptive capabilities of their children shows that CDS is continuously changing, becoming gradually more similar to adult-directed speech (cf. Snow, 1972; Cross et al., 1980, 1985). In addition, the relative homogeneity of IDS and CDS in a given task (Ninio, 2011) seems to correspond to the relative homogeneity in the modeling of the infant addressees and their capabilities. This position is supported by studies of IDS in various cultures, which have revealed that different ways of speaking to
infants and young children are related to different models of infants and their roles in society (Scheper-Hughes, 1985; Ochs and Schieffelin, 1995).

Foreigner-directed speech, in comparison, was found to be even more heterogeneous and therefore even less likely to constitute a conventional way of speaking (cf. Roche, 1989). The ways in which foreigners are addressed did not turn out to be sufficiently defined by the communication partner; instead, the interlocutors’ objectives play a considerable role, for instance, whether the interaction takes place in a business context (Firth and Wagner, 2007) or in a language learning situation (Varonis and Gass, 1985). Moreover, participants’ personal interactional goals were also found to be a defining factor (e.g. Zuengler, 1991). An important determinant of foreigner talk turned out to be the partner model, which in turn may be influenced by different cultural stereotypes, the goals of the interaction (in particular, language teaching versus other goals), speakers’ understanding of the interpersonal relationship, as well as the communication partner’s linguistic capabilities in interaction. FDS is thus characterized by numerous interpersonal differences, which makes it highly implausible that foreigner-directed speech is based on conventions alone. Roche (1989) comes to the same conclusion on the basis of his study of several interactions between native and non-native speakers in different situations. He observes considerable variability in the linguistic strategies speakers take, and participants’ actual behavior is largely unrelated to the conventional model of speech to foreigners that seems to be widely spread through the media. He thus suggests that there is a conventional variety, but that it is a caricature of real behavior rather than a model for it. 
The hypothesis that FDS is a conventional variety learned and passed on by the media was first proposed by Ferguson (1975); however, there is little evidence that, even if such conventional varieties exist, they serve as models for speakers’ linguistic choices. For instance, Mühlhäusler (1981) reports that although Tok Masta, the FT variety of Tok Pisin, is no longer in use, it still survives “as a special style in the Tok Pisin of some Papua New Guineans. Here it serves as a kind of marker of colonialist attitudes and outdated race relationships” (Mühlhäusler, 1981, p. 109). However, Tok Masta is only marginally intelligible to the average Tok Pisin speaker (Mühlhäusler, 1984, p. 287) and thus does not account for speakers’ observable behaviors in real situations of use. The same seems to apply to depictions of foreigners in the media. For instance, Meek (2006) observes many of the features proposed for FDS in Hollywood depictions of Native American speech. While there are usually no phonological peculiarities, ‘Hollywood Injun English’ is characterized by monotonous intonation and long pauses (Meek, 2006, pp. 97–98), and morphosyntactically by lack of contraction, by deletion or substitution of subject pronouns with either full nouns or dative pronouns, by deletion of determiners and auxiliaries, and by a highly
simplified tense system in accordance with stereotypes about different conceptions of time (Meek, 2006, p. 100). In contrast to the depiction in Hollywood movies, Native American varieties of English do not constitute a homogeneous variety similar to the stereotype described, but are instead characterized by the use of different dialects (Meek, 2006, p. 111). The depiction is thus a caricature without any correspondence to real speakers’ linguistic behavior. If there are therefore indeed conventionalized ways of talking to foreigners that are familiar to the speech community (e.g. Ferguson, 1975), then they do not account for speakers’ observable behaviors in FDS, but rather provide models of stereotypical behavior which are exploited for humorous purposes.

Finally, human-robot interaction is quite unlikely to be associated with particular conventional ways of speaking, given how little experience we have with robots that process natural language, let alone speech. There are as yet neither examples of ‘correct’ speech to robots nor much experience with talking robots or dialog systems in general that could serve as a model. Instead, speakers’ preconceptions play a considerable role, as does partner modeling based on the robot’s functionalities, limitations and linguistic behaviors. Moreover, speakers make use of linguistic material stemming from the robot; speakers may also use peculiar features, like technical vocabulary or an external reference system, if they expect the interaction to improve in this way. Corresponding to speakers’ different partner models, RDS was found to be highly heterogeneous as well; we found interpersonal differences even in identical situations, which could be related to different personal partner models, yet linguistic behavior also varied considerably depending on the task and on the robot’s linguistic output. 
The considerable variation observed in RDS provides further evidence that there is no conventional link between situation and speakers’ choices. The notion of a simplified register is therefore not suitable to account for the addressee orientation observed, since none of the varieties investigated was ultimately defined by the communication partner alone. More fine-grained distinctions could of course be made, taking into consideration different tasks or goals, which may display certain probabilities; but unless linguistic behaviors are paired with the participants’ own understandings of what a situation consists in and with participants’ interactional goals, there will always be considerable variation (cf. also Fischer, 2000b, 2011). This criticism is shared by other researchers; for instance, Lee writes: “Register has typically been used in a very uncritical fashion, to invoke ideas of ‘appropriateness’ and ‘expected norms,’ as if situational parameters of language use have an unquestionable, natural association with certain linguistic features and that social evaluations of contextual usage are given rather than conventionalized and contested” (Lee, 2001, p. 47).


264 Designing Speech for a Recipient

In order to address this problem, Systemic Functional Grammar assumes that the link between language and situation is a social-semiotic process (Halliday, 1978). In this model (cf. Halliday, 1985; Hasan, 1973), a situation is understood as a semantic configuration of the three dimensions field, mode and tenor. Furthermore, it is assumed that there is a certain probability that speakers will attend to certain metafunctions that correspond to these semantic configurations in a particular type of situation, that is, that speakers regard certain situationally specific functional configurations to be relevant in certain types of situation (Halliday, 1978). The lexico-grammar of a language is then functionally related to these metafunctions, and if speakers attend to certain metafunctions, they will make certain grammatical choices to fulfill these functions: “registers tend to differ in semantics and therefore in grammar and vocabulary” (Halliday, 1978, p. 43). Now, in our study, field, mode and tenor were not found to be the determining factors of the observable variation; however, the probabilistic nature of the model accounts for the observed heterogeneity since register only expresses a probable, not a conventional or deterministic, relationship. Nevertheless, a probabilistic perspective, though observationally adequate, takes an external view of the relationship between situation and language choice, and thus does not account for the mechanisms speakers themselves bring to bear when choosing their utterances. For instance, Hasan (2001) writes about child-directed speech: “all mothers do not talk the same way even while engaged in the same activity (…) Their sense of what the context is a context for is critically different” (Hasan, 2001, p. 7). Reasons for the variability may be the “subject’s social positioning” and her “stance to the universe” (Hasan, 2001, p. 8).
Thus, while describing probable speaker behaviors in certain types of situations, the notion of register may not be revealing with respect to the mechanisms that determine speakers’ choices. A question related to the discussion of the notion of register is whether there is one primary variety that serves as the prototype for the others (e.g. Ferguson, 2004, p. 180). It has been proposed that CDS constitutes the prototypical simplified register from which the others are derived (e.g. Hatch, 1983; DePaulo and Coleman, 1986; Prato-Previde et al., 2006); CDS is acquired very early, used by a broad range of speakers in the same culture, and it is the only variety that is relatively homogeneous, which makes it a better candidate for a prototypical variety than the others. The question is thus:

– Are the three varieties investigated related, with CDS being the prototype of a ‘simplified register’?

The hypothesis underlying this question is that speakers make use of their knowledge of ‘Baby Talk,’ for instance, when talking to a foreigner (Long, 1981; Hatch, 1983; Ferguson, 2004, p. 231). By comparing speech to children, to foreigners as
well as to mentally challenged and to normal adults, DePaulo and Coleman (1986) address the question of whether there is a prototype-based relationship between simplifications for different communication partners. Thus, they test empirically whether we can assume that CDS constitutes the prototype from which other ways of simplifying for particular communication partners are derived. However, they do not find evidence to support this hypothesis. The method by means of which they approach the problem provides as much evidence for the hypothesis that suspected competence plays the major role, and for the hypothesis that status is the main determining factor, as for the prototype hypothesis. They conclude: “apparently, BT1 is not a seamless whole that is extended only as such, but instead, it consists of a set of components” (DePaulo and Coleman, 1986, p. 956). The components they identify are clarification, simplification, timing, attention maintenance and affect (DePaulo and Coleman, 1986, p. 956). These components are affected differently by the different hypotheses; for instance, the competence hypothesis, which states that speech will be adapted to the presumed level of competence of the listener (DePaulo and Coleman, 1986, p. 952), suggests that children should be considered less competent than a mentally challenged adult, who should be considered less competent than a foreign adult, who in turn should be judged less competent than a native adult. This hypothesis accounts well for the findings regarding linguistic adjustments aimed at clarification, yet not for the other components. In contrast, the “status hypothesis accounts well for most of the findings on clarification, simplification and attention maintenance” (DePaulo and Coleman, 1986, p. 956).
This means that linguistic utterances are adapted to the status of the listener; in particular, the authors hold that children have the least status, while retarded adults have more status than children but less than foreign adults, who again have less status than native adults (DePaulo and Coleman, 1986, p. 952). Linguistic features designed to clarify and to simplify one’s speech as well as to maintain the listener’s attention are indeed ordered in this way. However, the predictions for the prototype hypothesis would in fact be the same as for the status hypothesis (DePaulo and Coleman, 1986, p. 952). Thus, it would also predict that while CDS is the prototype, speech to retarded adults is closer to the prototype than speech to foreigners, and that native adults are at the other end of the spectrum. Therefore, the methods used here do not allow any definite conclusions regarding the prototype hypothesis since children, retarded adults, non-native speakers and native adults may all be argued to differ with respect to suspected competences, as well as status and their relative distance to the prototype CDS, if there is one. DePaulo and Coleman (1986) therefore conclude that people adjust

1. ‘BT’ stands here for baby talk, i.e. child-directed speech.
to their respective communication partners on the basis of individual functional components (DePaulo and Coleman, 1986, p. 956). That speakers’ strategies are adapted feature by feature to the respective communication partner’s needs is supported by studies on pronunciation; Burnham et al. (2002) show that speakers make very subtle phonetic and prosodic distinctions concerning individual components of simplification. In particular, they find that while speakers use the same high pitch for their pets as they do for their infants (though not for other adults), they articulate the vowel space differently for children, adults and pets, such that it is exaggerated for infants, normal for adults and smaller than usual for pets. Thus, depending on whether language understanding is at issue or not, speakers adapt their articulation. Similarly, pitch is adapted to different communication partners for other, most likely emotional or attention-getting, functions. Biersack et al. (2005) and Uther et al. (2007) compare speech to children, foreigners and native adults, though unfortunately based only on imagined communication partners. They find the specific phonetic and prosodic adjustments made for foreigners to be different from those made for children. These results are in line with the findings of the current investigation, which suggest that the adjustments observable in interactions with children, foreigners and robots are results of speakers’ strategies, not properties of their linguistic behavior (cf. also Fischer, 2006b).
For instance, we have found people to adjust their speech differently for the three kinds of communication partners investigated; we have found them not to transfer mindlessly from knowledge of known communication situations to unfamiliar ones, as for instance from speaking to a dog to speaking to a dog-like robot; we have seen finetuning in CDS and high sensitivity to the verbal utterances robots produce, which are being oriented to both for alignment and for partner modeling. The observable linguistic behaviors are thus employed by the participants as solutions to the problems they (believe to) encounter. The sources of variability are therefore participants’ different hypotheses about what the problems consist in, and what the solutions could be. These hypotheses are inspired by the participants’ ideas about the capabilities of their communication partner, the task and language, but also depend on the speakers’ own agendas. This plethora of factors shaping speakers’ understandings of the respective situation leads to individual adjustments to the respective communication partner in a given situation. The link between linguistic choice and situation is consequently strategic, depending on the speakers’ understanding of the requirements of the situation. Our analyses have thus shown that speakers do not understand language as a simple tool that they take out and apply. Instead, they regard it as a sophisticated toolbox that comprises different tools for different tasks and tools that may differ in complexity.


The language chosen is consequently not a conventionalized reflex to situational variables but a set of solutions to particular problems that are employed strategically. The relationship between language and situation thus becomes a complex system of people’s ideas about the requirements of the situation and their linguistic solutions to these problems. While the current investigation does not allow us to make any assertions about psycholinguistic processing mechanisms, the current discussion nevertheless sheds some light on the question concerning the kinds of mechanisms involved in speaking for somewhat restricted communication partners and for those who are more similar to the respective speaker. Thus, in Section 1.1 Open Issues we asked:




– Do people draw on different mechanisms when formulating their utterances for somewhat restricted communication partners compared to when they are speaking to someone similar?

Pickering and Garrod (2013) suggest that speakers may use prediction by simulation if the partner is similar to themselves and prediction by association if the partner is judged to be rather different (e.g. Pickering and Garrod, 2013, p. 18). However, some of the commentators raise concerns about this hypothesis; for instance, Kashima et al. (2013) suggest that speakers’ intentions act as a mediating factor, such that neither association nor simulation accounts for the possibility that speakers may choose to diverge from their partners’ representations. Thus, communicative and action intentions modulate the perception–action link in joint activities (Kashima et al., 2013, p. 34). This criticism is related to that of Pezzulo and Dindo (2013), who suggest that partner orientation is strategic, and that speakers’ choice of signaling is strategic and intentional, often aiming for long-lasting communicative effects (Pezzulo and Dindo, 2013, p. 44). This position is in accordance with our finding above that alignment is largely determined by speakers’ partner models. Similarly, Mani and Huettig (2013) argue that many different mechanisms may actually be involved, not just the two proposed by Pickering and Garrod (2013), and that many of these are as yet largely unexplored (Mani and Huettig, 2013, p. 37). Moreover, Yoon and Brown-Schmidt (2013) argue that speakers have been demonstrated to always take their partners and the context into account, and that ego-centric models are thus neither necessary nor adequate. Our own analysis does not suggest that there are different mechanisms involved, since there is no evidence of categorical differences between speaking, for instance, to children or foreigners in comparison with speaking to other adults, or over time as the child or the linguistic competence of the non-native speaker develops.
That is, studies of CDS and FDS that involve a comparison with speech directed to other adults do not report any categorical differences but rather a smooth continuum (e.g. DePaulo and Coleman, 1986; Burnham et
al., 2002; Biersack et al., 2005; Uther et al., 2007). Furthermore, especially in FDS and RDS we have seen that there are usually many utterances that are completely ‘normal’, such that speakers switch between adjusted and non-adjusted utterances within the same interaction. Moreover, we have seen in the comparison of the three varieties that people adjust particular linguistic features for their communication partners, depending on their judgments of the affordances of the current communicative situation. Such an assessment must also involve a decision on how similar the communication partner really is, as Yoon and Brown-Schmidt (2013) also point out. Furthermore, our findings support a view of language use in which specific features are chosen strategically for specific partners, and thus a view in which speakers intentionally signal their understanding of the situation to their partners in order to make their behavior predictable, as suggested by Pezzulo and Dindo (2013) and by Kashima et al. (2013). Thus, even though the methods used in this study do not allow us to comment on psycholinguistic processing, the current findings favor a model in which speakers make individual strategic choices on the basis of their understandings of the current situation. In the absence of evidence about what the communication partner and the current situation require, as when speakers are faced with an automatic speech-processing robot, but also with strangers (see Clark, 1998b) or preverbal infants (see Kaye, 1980), people may need to take a guess at what is appropriate, or they may rely on stereotypes, since, as Clark (1998b) shows, people need to know which communities the communication partner belongs to before they can make their linguistic choices.
In the absence of such evidence, people may also pretend that their communication partner is of a certain kind (Clark, 1996, 1999); in this case, speakers’ insecurity about their communication partners’ needs is reflected in the variability of their choices, as apparent in IDS (see Kaye, 1980), in FDS (see Smith et al., 1991) and in RDS (see Section 6.4.3 Personal Partner Models and Linguistic Choice above). At the same time, this need seems to make speakers use whatever evidence they can get to build up partner models that may inform their linguistic choices. Assuming the existence of such a need (Clark, 1998b, p. 84) accounts for our finding above that people use the robot’s utterances to infer its preferences, as well as for the results of the study by Fischer et al. (2011), in which people adjusted their gestures to the robot’s speed of eye gaze while not adjusting their linguistic choices in the absence of any verbal feedback or signs of receptive ability from the robot. Given that linguistic choice seems to be determined by speakers’ specific solutions to perceived problems, does it make sense to speak of child-, foreigner- and robot-directed speech as registers at all? Although the observable linguistic properties seem to result from the users’ individual strategic decisions with respect to what they perceive as the requirements of the current situation, it was possible to identify groups of speakers with similar behaviors. Thus, speakers seem to cluster
around different assumptions about what the situation requires, that is, what problems are posed by the supposed nature of their communication partner, the task and the language. As we have seen in the discussion of RDS, speakers in general believe robots to be in need of formal descriptions, to be more likely to understand abstract (e.g. object, obstacle) rather than basic level category labels (e.g. cup, mug, box), and to be lexically, morphologically and syntactically restricted, as well as restricted in their perceptive capabilities. In these respects, all users seem to agree, and since, as Clark (1998b) suggests, speakers make their linguistic choices on the basis of their judgments about which community memberships they share with their communication partners, we can expect similar linguistic choices in response to similar judgments of community membership. Correspondingly, given the relatively homogeneous view of infants in Western societies (Scheper-Hughes, 1985), talking to infants and children in English elicits similar strategies from different speakers, resulting in relative homogeneity. While foreigner talk is very heterogeneous, different tendencies could be identified depending on interactants’ judgments of the affordances of the situation. And finally, for RDS we have seen that people who take the robot to be a social actor use similar linguistic features throughout the interactions. Thus, if speakers judge a situation in a similar way, they will speak in a similar way, which may still render the notion of register a useful starting point. This view of register corresponds to more recent conceptions of genre, in which genre is defined as a “socially significant activity” (Bateman et al., 2007, p. 147), a “linguistically-realized activity type” (Martin, 1985, p. 250).
By relating linguistic choices to certain activities, more structured relationships between language and situation can be defined than if the basis of comparison is a situation type (see Bateman et al., 2007). This notion also allows for dynamic changes in language choice based on different phases in the ongoing activity. Such a detailed, activity-based concept may account, for instance, for the observed differences in CDS depending on the activity caregiver and child are engaged in, and for the differences between business and language learning activities in FDS. In order to account for the findings on the role of interpersonal situation definitions like those observed in RDS, however, linguistic choice needs to be related to personal situation models, that is, personal understandings of the affordances of a given situation, which correspond to the kind and structure of the activity speakers assume to be engaged in. Yet genre theory, like register theory, assumes that social activities are social, not personal, interactional or even idiosyncratic, and that the link between social activities and linguistic choice is a statistical correlation. A genre or register from this perspective is a probabilistic description of a relatively homogeneous, common, socially construed understanding of an activity and the corresponding set of linguistic features chosen in response to a particular set of tasks. This notion covers,
for instance, genres like newspaper articles, where there are socially construed structures that are not entirely functionally motivated (see Bateman, 2006; Bateman et al., 2007; Hasan, 2001; Martin, 1992). For most of the potential registers analyzed here, socially shared expectations were not found to play a big role; instead, speakers’ choices could be characterized as mostly strategic (though not always functional in the sense of ‘effective’). Thus, we can assume a cline between activities in which behaviors are more functionally determined as ad hoc responses to the perceived affordances of the situation, and more socially (re)constructed activities, like writing a newspaper article. A notion of register as a variety grounded in the kinds of activities speakers understand themselves to be engaged in would thus be consistent with the patterning observed in the current investigation. A more recent proposal to account for linguistic variation according to situation is the usage-based approach to language (cf. Bybee, 2007; McClelland and Bybee, 2007; Langacker, 2000, 2008; Bresnan and Hay, 2008; Bresnan and Ford, 2010). The usage-based model, which allows for the retention of individual exemplars as well as for partial and full schematization, suggests that people store linguistic usage events as a “pairing of a vocalization, in all its specificity, with a conceptualization representing its full contextual understanding” (Langacker, 2000, p. 95). This conceptualization may be embedded in whole episodes related to all kinds of situational aspects.
Evidence for the exemplar-based model and the retention of complex, scene-based representations comes, for instance, from socio-phonetic variation; Bybee (2010) argues that “in order for a certain phonetic form to be associated with males or females, upper or lower socioeconomic class, or geographic region, these details of pronunciation experiences in oneself and others must be registered in memory and indexed to individuals or classes of individuals” (Bybee, 2010, p. 21). Thus, who says what to whom under which circumstances seems to be memorized by interactants (cf. also Trude and Brown-Schmidt, 2012). She concludes (Bybee, 2010, p. 31): “At every level of grammar evidence can be found for rich memory representations.” The results obtained in this investigation strongly support accounts of language knowledge that comprise both schematic and exemplar-based representations (e.g. Abbot-Smith and Tomasello, 2006; Bybee, 2010), since people store knowledge of surface features of whole chunks of their partners’ utterances, as we have seen both in CDS (Snow and Goldfield, 1983) and in RDS (see Section 6.6 The Relationship between Partner Modeling, Alignment and Feedback), where speakers make use of linguistic expressions previously used by their communication partners when communication problems occur. Since, however, speakers cannot know in advance whether communication problems will occur (see Schober, 2004), they must store such information about their partners concomitantly. The results of this investigation thus
support the usage-based view of language knowledge, at least for interactions with the same partner; at the same time, the results cannot be explained by models of language that regard each sentence uttered as generated by context-free resources. The current investigation has thus shown that people store very detailed characteristics of their communication partners (including their preferred positions, the words they use, etc.) as well as episodic knowledge. The usage-based model also holds that people retrieve linguistic features associated with similar situations. Since the usage-based model assumes language to consist of pairings between grammatical forms and conceptualizations, these conceptualizations may potentially also include knowledge of typical situations of use in the meaning components of a construction (cf. Fillmore, 1988; Fillmore et al., 1988; Fillmore and Atkins, 1992; Fischer, 2010a) in order to account for register effects (cf. Birkner, 2008). Such a model, which provides room for both particular, personal, and generalized, communal, experience, would account for the heterogeneity observed in the current study; as Bybee argues, “[s]ince exemplars provide a record of a speaker’s experience with language, exemplar models allow the direct representation of both variation and gradience” (Bybee, 2010, p. 32). The usage-based paradigm may therefore be a useful way of understanding the relationship between language and situation. On the other hand, Langacker (2008) argues that grammatical choices, because they constitute solutions to recurrent communicative problems that the speakers of a speech community have faced in the past and are likely to face in the future, are abstractions from concrete situations and thus schematic with respect to specific contexts of use. Furthermore, our investigation has shown that it is not objective aspects of situations but rather participants’ understandings of these situations that influence language choice.
Thus, speakers attend to communicative functions rather than to situational factors. Moreover, they also make functional distinctions when retaining aspects of linguistic use in situation. One example is the finding by Kraljic et al. (2008) that speakers choose what to retain and what to ignore: participants were found to ignore accidental factors of a speaker’s pronunciation and to retain features that they regarded as characteristic of the speaker. Thus, the same holds for the usage-based model as for the concept of register, namely that formal features are associated with functional, subjective understandings of the affordances of situations rather than with situations themselves. What we can thus conclude concerning the notion of register is that speakers use strategies, not linguistic properties, to fulfill those functions that they regard as relevant with respect to their own understanding of the situation. From the speakers’ perspectives, there are only functional choices based on their understandings of the affordances of the situation. In contrast, from an external perspective, certain regularities of speakers’ understandings of objectively identical situations may
be possible, which may then be probabilistically associated with certain types of situations or the activities participants are involved in. It is possible that in other situations speakers use linguistic choices to evoke a certain genre (for instance, when writing a newspaper article; cf. Bateman et al., 2007). In these situations, concepts like register or genre may be attended to and thus constitute members’ categories, that is, categories that are not only external descriptions of regularities of behavior but emic categories attended to by the participants themselves, since in such situations indicating the current activity or even behaving conventionally (see Crystal, 2001) may be crucial. Nevertheless, the question whether actually observable patternings of linguistic features are due to speakers’ orientation to a particular register or genre, or whether the observable patterns are just an epiphenomenon caused by the fact that speakers conceptualize the affordances of a situation in similar ways and thus make similar choices, is an entirely empirical issue. In the case of the three candidates for simplified registers presented in this study, speakers could not be shown to attend and orient to particular registers or genres. Instead, some speakers were found to understand the affordances of a given situation in similar ways and thus to make similar linguistic choices. From this perspective, relatively homogeneous linguistic choices and perceived affordances of situations may be correlated, which makes the notion of register useful only as an etic, external concept, not as a ‘member’s category’ (Sacks, 1996).

7.2  The role of the addressee in interaction – some answers

This section addresses those open issues raised in Section 1.1 Open Issues that concern how the addressee is oriented to in interaction. To begin with, the question of what is affected by addressee orientation can now be answered:

– What is influenced by the orientation to a particular other; that is, which linguistic features are affected by addressee orientation?

The empirical studies of the three supposed varieties confirm previous findings reported in Chapter 2 The Role of the Addressee: Central Notions, namely that every single linguistic level, from the phonetic, phonological and prosodic, via the syntactic and semantic, to the pragmatic or discourse-strategic (such as perspective taking), may be adjusted to the suspected needs of a communication partner. For instance, the comparative studies of CDS and FDS discussed in Chapters 4 Talking to Children and 5 Talking to Foreigners (e.g. Biersack et al., 2005) have shown that speakers adapt phonetic properties like the vowel space, as well as phonological and prosodic characteristics such as syllable lengthening and intonation contours, to their respective communication partners. Furthermore, lexico-grammatical adjustments were
found in all three varieties investigated; these adjustments were partly similar (like generally shorter MLUs), and partly peculiar to the situation under consideration (like the ‘ungrammatical grammatical’ modifications found in FDS and the use of technical language in RDS). Finally, pragmatic and discursive adjustments were found that were peculiar to each variety, such as the use of variation sets and the skewing of distributions towards prototypical slot fillers in CDS, the interactional modifications in FDS and the refusal to react to social information in RDS. Thus, indeed all linguistic levels may be adapted to the particular communication partner’s supposed needs. Next, we can address the nature and role of partner models, the role of alignment, the role of feedback, and how these three resources interact, combining the different findings from the previous chapters.

Partner modeling

In the current study, we have attempted to distinguish between online interactional processes based on alignment with the partner’s utterances, the collaboration between speaker and listener based on the listener’s feedback, and the role of partner modeling. Concerning the relationship between partner models and locally managed discursive practices, we therefore asked:


– Under which circumstances do speakers build up global models of their communication partners?
– Under which circumstances do speakers rather attend to local cues, i.e. when do we need to assume addressee orientation to be an interactional achievement based on the partner’s contributions?

A first important argument for partner modeling comes from the study of interpersonal variation in RDS; in the studies involving the robots Aibo and Rolland, considerable interpersonal variability in speakers’ preconceptions about the social role of the robot was observed, which was found to influence speakers’ linguistic choices even over time. Since the robots’ verbal and nonverbal behavior in these dialogs is identical, speakers must bring preconceptions into the interaction that have a considerable impact on their linguistic behavior in the subsequent interactions. This is also likely to be the case in FDS, as shown by Roche (1989), Smith et al. (1991) and Zuengler (1991), who observe interpersonal variation in interactions even with the same foreign communication partners. A second argument for partner modeling comes from the study of feedback in RDS; the communication partner’s verbal and nonverbal behavior could be shown to be exploited by the participants for building up and refining their partner models (see Section 6.5 The Role of the Partner’s Feedback). Here we have found participants to make use of features in the robot’s feedback in order to build up a model


274 Designing Speech for a Recipient

of its preferences and functionalities; in particular, a feedback utterance was interpreted by more than half of the participants as an indicator of a characteristic property of the robot, namely the need to be in front of an object in order to be able to perceive it. Thus, participants made use of local feedback to build up a global model of the artificial communication partner. A third argument for concomitant partner modeling is given by those differences that characterize participants’ speech to the verbal Aibo in contrast to the non-verbal Aibo and which were not caused by anything the robot had said itself (Section 6.3, Alignment; cf. Cha et al., 2014). Thus, participants’ linguistic choices changed significantly concerning features that cannot have been influenced by alignment with the robot’s utterances because they were not part of these utterances. These changes can only be due to significantly different partner models based on the robot’s utterances. Finally, additional evidence for the fact that addressee orientation relies heavily on partner modeling comes from studies like DePaulo and Coleman (1986) or Newman-Norlund et al. (2009), for instance, who find addressee orientation to differ for different addressees on the basis of the mere belief that one is interacting with a child, a foreigner or a native adult. Moreover, studies of the role of expectations in FDS, such as Warren-Leubecker and Bohannon III (1982), and of the influence of caregivers’ theories of children’s linguistic and cognitive development in CDS (see Ninio, 1979; Rowe, 2008) illustrate the impact of speakers’ partner models on the way speakers may talk to their partners. These findings can only be explained on the basis of cognitive models of the partner and of the communicative situation which participants construe.
By creating partner models, people partly rely on what the communication partner reveals about him-, her- or itself, and the process is thus interactionally driven; yet it is also dependent on people’s cognitive construal of the situation, as the impact of their preconceptions in all three varieties studied demonstrates. We shall return to the relationship between partner modeling and interactional negotiation below in our discussion of the role of feedback.

–  To which aspects of their communication partners do participants attend, what kinds of information about their partners do they treat as relevant, and how do their choices leave traces in participants’ linguistic behavior?

Especially in our investigation of RDS, Clark’s categories of common ground were found to be relevant to the participants such that they attended actively to establishing common ground with their unfamiliar communication partner. Especially in those cases in which information was not available to the speakers, the speakers’ attention to the categories proposed by Clark (1996) became apparent from the clarification questions they asked (see Section 6.4.3 Personal Partner Models and
Linguistic Choice). That speakers treated some of these categories as preconditions to formulating their instructions (cf. Schegloff, 1972a) provides evidence that people really take these categories into account. In the studies involving the robot Rolland, for instance, speakers were found to attend to their partners’ cognitive, linguistic and receptive capabilities, preferences, and linguistic productions, since these were relevant for the current interactions. In contrast, in the study comparing speech to the robots Aibo, Scorpion and Pioneer, we found participants to make little use of the robots’ appearance, even though previous studies (e.g. Roche, 1989; Giles and Franklyn-Stokes, 1989; DiSalvo et al., 2002; Baylor et al., 2006; Powers and Kiesler, 2006; Van Vugt et al., 2010) had shown that appearance can be attended to as a relevant clue. Given speakers’ general attention to establishing common ground, the lack of attention to the robots’ appearances in these dialogs may be due to the fact that the robots’ appearance had no impact on their functionalities, thus did not provide clues to particular properties of the robots, and was therefore not helpful for establishing common ground. Similarly, the studies by Cross et al. (1985) show that mothers do not rely on all cues equally in order to shape their utterances for their children; instead, mothers orient mainly to children’s receptive ability and not, for instance, to their children’s absolute age (and thus also not to body size, for instance) or to their cognitive abilities. Consequently, speakers only attend to those aspects of their communication partners that they understand as revealing with respect to their respective communicative goals. This finding is in line with constructivist approaches to context and common ground (see Chapter 2, The Role of the Addressee: Central Notions), which reject the idea that context may be objectively given (e.g. Scollon and Scollon, 2001).
In principle, however, all kinds of aspects of the partner may be attended to. These considerations lead us to the following question:

–  What about their partners do speakers model, and in how much detail?

In addition to all those aspects of common ground listed in Clark (1996), we found that people attend to and memorize individual words and constructions used by their partner, as well as their distributions; in CDS, for instance, Roy (2009) and Roy et al. (2009) find that both parents and the nanny kept track of words the child knew and words the child was about to learn and adjusted their utterance length accordingly. In RDS, we found participants to memorize words and constructions used by the robot and to make use of them later when communication problems occurred. Moreover, in perspective taking, participants were found to keep track of how often their partners had taken their perspective (Schober, 1995). So speakers retain very detailed properties of the surface realization of their partners’ utterances, and they keep track of the linguistic material presented by

their communication partners and its frequency of occurrence, even over longer stretches of talk (cf. also Heller et al., 2009; Bybee, 2010). To sum up, speakers may take very detailed information about their communication partners’ linguistic abilities into account, creating very elaborate models of their linguistic competence; furthermore, there is evidence that people make use of all kinds of other information about their communication partner, such as appearance and behavior, as, for instance, Nass and Brave (2005), Roche (1989), or Baylor et al. (2006), among many others, have shown. However, as we have seen above, people seem to pick these clues up selectively. That is, they pick them up if they regard them as clues to characteristics that may be interactionally relevant later. This became apparent in the study of robot appearance in Section 6.4.1, Appearance, in which the robots’ appearances were not revealing regarding the robots’ behaviors and were consequently not much attended to. The amount of detail modeled may thus also depend on speakers’ goals; in FDS, for instance, speakers sometimes orient more to stereotypes than to their real partners, depending on their identity needs (see Giles et al., 1991). Furthermore, in RDS, participants were found to be resistant to linguistic information conveyed about the robot in the robot’s verbal output if they did not regard their artificial communication partner as a social actor (see also Fischer, 2011). This suggests that people’s goals and identity needs play a considerable role in partner modeling as well (cf. Giles et al., 2006). We can thus answer the following question now, too:

–  What role do social factors, such as social status, identity needs or the interpersonal relationship between the speakers, play?

The considerable amount of interpersonal variation observed and the different personal partner models in RDS indicate that partner models crucially depend on speakers’ understanding of the social affordances of the situation; that is, speakers carry certain preconceptions or personal preferences into the objectively identical interactions with the robot, which are related to their willingness to understand the communication partner as a social actor. Our studies on RDS have revealed that the degree to which people are willing to understand the robot as a social actor predicts their subsequent linguistic behavior (cf. also Fischer, 2011). Speakers’ willingness to understand their communication partners as social actors may thus crucially influence whether speakers develop accurate partner models and adjust to their partners. The same effect was found by Lee et al. (2010), who, on the basis of suggestions by Fischer (2006a), identified correlations between users’ behavior in conversational openings and their conversational behavior throughout interactions. Similarly, interpersonal variation in FDS has been related to social and identity needs in a given situation, which influence speakers’ openness towards the foreign communication partner (cf. Roche, 1989; Smith et al., 1991; Zuengler, 1991). In
CDS, Haggan (2002) has shown that speakers’ understanding of the interpersonal relationship between caregiver and child may influence the way people report on speaking to children, if not their speech as well, thus explaining variation in the ways people adjust to children (or not) in different cultures (Ochs and Schieffelin, 1995; Ninio, 2011). Likewise, Turkle (2006) has shown that people may be biased towards developing certain interpersonal relationships with robotic pets based on their personal social and identity needs. Furthermore, in interactions between humans and a robot, Lee et al. (2006) find that the degree of loneliness of a person correlates with his or her judgment of the robot’s social presence. Social factors and matters of identity management may thus have a considerable impact on the amount and type of addressee orientation observable.


Alignment

Pickering and Garrod (2004) propose that recipient design takes place implicitly and automatically by means of alignment unless participants encounter communication problems. The results of the current investigation, however, clearly show that alignment does not take place automatically and that the interactive alignment model does not constitute an adequate representation of the role of the partner in communication. The questions we raised at the beginning of the study were:

–  Can alignment account for the addressee orientation observable?
–  Under which circumstances does alignment take place?
–  What is the relationship between partner modeling and alignment?

In IDS, alignment provides no explanatory concept for caregivers’ linguistic adjustments at all, since the pre-verbal infant does not yet produce linguistic material the caregiver could align with. Thus, other mechanisms need to be taken into account. This observation alone, however, does not suffice to discard the automatic alignment hypothesis, since Pickering and Garrod assume partner modeling to occur under special circumstances, for instance when the partner is very different from the speaker (cf. also Pickering and Garrod, 2013), which is the case in the communication with pre-verbal infants. However, since IDS has not been reported to require particular effort, the mechanism involved in speaking to infants is not likely to be very different from speaking to other communication partners, which casts doubt on the hypothesis that alignment is the default mechanism which is abandoned only under exceptional circumstances (Mani and Huettig, 2013). Furthermore, several studies have shown that parents adjust their speech not to children’s actual behavior but take into account what they are about to learn (e.g. Filipi, 2009; Roy et al., 2009). In foreigner-directed speech, speakers have been found to align sometimes and sometimes not. Pickering and Garrod (2004) suggest that speakers align
automatically by default and that they invoke explicit models of their communication partners only in exceptional situations, such as during deception. This, however, is taken to be cost-intensive. If Pickering and Garrod were right, then foreigner talk would be easier for speakers who align than for those who do not; interactional analyses, however, provide no indications that this is the case. On the contrary, in Snow et al. (1981), it was the more experienced employees who adjusted to their foreign communication partners. Furthermore, particularly in the analyses of human-robot interaction it became clear that alignment is used strategically, either to solve communication problems or to establish common ground with respect to the joint pretense of communicating as if the robot were a social actor (see Section 6.6, The Relationship between Partner Modeling, Alignment and Feedback). For instance, alignment was found to be used strategically to restrict one’s own speech to those constructions the robot would be likely to understand because it had used them itself. Moreover, alignment was not only found not to substitute for partner modeling as an automatic, effortless alternative; it was also found to depend on it. The partner model in fact determined the amount and functions of alignment. That is, once speakers had built up a partner model, they did not align with robot utterances that exhibited features that did not fit the model, as evidenced by the selective alignment with only certain pronunciation features in the study by Kraljic et al. (2008), and by the alignment with lexical material but not with the reference system reported on in Fischer and Wilde (2005): participants aligned with those aspects of the robot’s output on which they believed the robot to be an expert, whereas they did not align with robot output on matters in which they considered themselves to have more expertise.
Similarly, the selective alignment with the robot’s social greeting in Sections 6.4.3, Personal Partner Models and Linguistic Choice, and 6.6, The Relationship between Partner Modeling, Alignment and Feedback, shows that participants choose strategically what to align with, and some speakers were found to be resistant to alignment due to a certain preconception of what the human-robot communication situation is all about. This is in line with previous work in which speakers were found to align more with what they believe to be a cheap, very basic computer than with an expensive, top-quality system (Pearson et al., 2003, 2006a, b). Furthermore, the studies of alignment in the communication with the verbal robot Aibo show that participants align with the robot in order to facilitate communication, even though this may increase their own workload, as we have seen with the extrinsic reference system, which speakers employed even though it was completely opaque to them. Alignment thus interacts in complex ways with models of the partner and of the affordances of the current situation. From this perspective, alignment is merely one possible strategy speakers may or may not make use of. At the same time, both alignment and disalignment have been shown to have important
interactive functions (cf. Erickson, 2004), which supports strategic accounts of alignment depending on what the participants regard to be at issue. As Mills and Healey (2008) argue, alignment serves as a resource for the participants, rather than being an automatic process based on priming (cf. also Mills, 2014). Finally, the uptake of lexical and constructional material from the communication partner can partly be attributed to alignment, but as the human-robot experiments have shown, speakers must also rely on partner modeling when making their linguistic choices, since many of the adaptations speakers were found to make were not due to alignment, but due to different concepts of their artificial communication partner as developed in the course of the dialog and on the basis of their partner’s linguistic output. That is, in the interaction with robots, speakers were found to use the occurrence of one linguistic feature as a resource for inferring some others (see Section 6.3, Alignment). Thus, the partner’s linguistic behavior was used to refine the partner model, which in turn influenced further linguistic choices. To sum up, alignment seems to be one powerful means to cooperate in interaction, yet it crucially depends on participants’ interactional goals and partner models. It thus does not account for the addressee orientation observable, and it is determined by partner models rather than providing a computationally cheap alternative to them.

Feedback

The third resource considered in this study is the influence of the partner’s feedback on the speaker’s linguistic choices:

–  What does the partner’s feedback contribute to the formulation process?

The results of the literature overview in Chapter 2, The Role of the Addressee: Central Notions, have revealed that feedback plays a central role in the formulation process. As Gardner (2001) shows, speakers may rely on feedback, for instance, for determining whether the listener needs more information on the same topic or not. That is, he finds that the particular prosodic realizations of the feedback signal mm have certain interactional consequences, which include whether a topic is pursued further or not. Such a process is entirely locally managed, and speakers can rely on listeners’ responses to formulate their utterances. Feedback, even if it consists only of generic feedback signals like mm or uh-huh (Bavelas et al., 2000), may thus serve as an extremely detailed resource to display the listener’s current processing and information needs (cf. also Clark and Krych, 2004). As Bavelas et al. (2000) demonstrate, specific listener responses, which are related to the content and evaluation of the speaker’s utterances, play an even more important role since they support the speakers’ narrations. For instance, when
listeners use specific responses, such as opening their eyes wide in surprise, speakers produce narrations that are rated considerably higher in quality than if listeners are distracted (Bavelas et al., 2002). This is in line with studies by Kraut (1982) and Schober and Clark (1989), who find that addressees who were allowed to provide feedback solved tasks more accurately than overhearers. If the conclusions of these studies are correct and feedback may provide speakers with accurate views of their hearers’ current information needs, speakers may not need to build up models of their communication partners’ epistemic and deontic knowledge (Givón, 2005) at all and still be able to tailor their utterances very specifically to their partners’ needs. In the three case studies, further evidence showing that feedback plays a central role was found. In IDS, infants’ nonverbal behavior, especially attention and gaze direction, has been found to serve as a clue for caregivers regarding their addressees’ reception of their utterances; for instance, Murray and Trevarthen (1986) have shown that the online responses of even two-month-olds have an influence on the number of repetitions, declaratives and questions asked. Furthermore, Filipi (2009) has shown that caregivers understand nonverbal infants’ gaze and gesture to indicate comprehension or to initiate repair, and thus even pre-linguistic addressees may contribute to establishing joint attention and joint understanding. Other studies of CDS also provide evidence that addressee orientation is locally managed and interactionally achieved: Snow (1972), for instance, shows that mothers do not adjust their utterances to their own (and other) children to the same degree when the children are not present. This suggests that caregivers do not employ comprehensive models of their children, but rather let themselves be guided by their children’s reactions. This is in line with the studies by Cross et al.
(1980) and Cross et al. (1985), discussed in Chapter 4, Talking to Children, which suggest that it is mainly the child’s receptive ability that caregivers attend and orient to. The authors investigate hearing and hearing-impaired children of different age levels and find parents to be highly sensitive to their child’s respective receptive ability as apparent from the child’s linguistic and non-linguistic behavior. The authors therefore suggest receptive ability to be the most important predictor of the adaptations made. Thus, children’s feedback about their understanding of the caregivers’ utterances is suggested to play an important role in the shaping of CDS. Concerning foreigner-directed speech, the results obtained by Warren-Leubecker and Bohannon III (1982) indicate a considerable role of feedback. The authors find that subjects used longer utterances when the accomplice signaled that she understood. Conversely, feedback signaling failure to understand elicited significantly more repetitions and reformulations. Furthermore, subjects who received feedback indicating communication problems asked significantly more questions.


The study by Fischer et al. (2011) also indicates that the communication partner’s contingent feedback plays a crucial role in the way utterances are designed for the communication partner. In their study, participants adapted to a simulated robot only with respect to those behaviors for which they received feedback from the robot. That is, the information participants received from the robot’s appearance and non-linguistic feedback (eye gaze) did not suffice to provide them with unambiguous information with respect to which kinds of linguistic structures the robot understands, resulting in both heterogeneous and relatively complex linguistic behavior. In contrast, the robot’s eye gaze provided participants with feedback on their gestures, which the robot followed with its eyes; correspondingly, gestures were found to be systematically adapted to the robot’s suspected receptive ability – leading to gestures that were even more exaggerated than those caregivers produce when speaking to their eleven-month-old infants. Fischer et al. (2013) investigate the interaction between nonverbal contingent and explicit feedback in human-robot interaction and find that only if the robot reacts in socially contingent ways to the tutor’s instructions do tutors take the robot’s explicitly displayed, learned words into account by adjusting the ratio of keywords in relation to the number of words in general. Thus, socially contingent behavior serves as a clue to understanding by the partner, more so than the display of verbal knowledge. A further indicator that the partner’s behavior is considered online in the formulation process is the smoothness and apparent ease of conversation (cf. Pickering and Garrod, 2004). This is true even of human-robot interactions. For instance, the dialogs with the robot Rolland are characterized by considerable fluency.
Furthermore, in the dialogs with Aibo, those participants who reacted to the social greeting of the robot were found to co-construct utterances on the basis of the robot’s previous utterances, which results in very smooth dialogs (see Section 6.6, The Relationship between Partner Modeling, Alignment and Feedback). Moreover, the contingent interactions observed, for instance, between caregivers and children (e.g. Filipi, 2009) concern the local management of understanding in the ongoing interaction and are thus unlikely to be manageable on a purely cognitive level. These arguments speak for addressee orientation as an interactional achievement. However, since addressee orientation was also found to be partly based on partner modeling, the question arises how partner modeling and feedback are related:

–  What is the relationship between feedback and partner modeling?
–  What role does feedback play in the revision of partner models?

Above we have seen that feedback is made use of locally, i.e. contingently in interaction. One possible relationship between feedback and partner modeling is that people make use of feedback not only locally, but also globally to update and
revise their partner models; it is in principle possible that in addition to reacting contingently and locally to the partner’s contributions, speakers use their partners’ utterances to refine their partner models and to choose their next utterances according to their general, global model. Given the finding by Schegloff (1972a) that category membership relevant for the formulation of spatial descriptions is interactionally established, this possibility is very plausible. Besides the numerous clues to local effects of feedback, there is indeed evidence that speakers exploit their partners’ feedback globally, i.e. in terms of partner modeling. In RDS, for instance, we saw in qualitative analyses of reactions to robot utterances how participants made use of their partners’ feedback in a global way, using linguistic clues to infer the robot’s preferences and functionalities (see Section 6.5, The Role of the Partner’s Feedback). In particular, we have seen that a robot’s utterance was directly made use of to infer a robot’s preference, namely to be in front of an object in order to perceive it correctly. Furthermore, Fischer (2012b) shows that verbal feedback by the robot may guide users into more appropriate partner models, leading to much reduced numbers of out-of-domain vocabulary and to a much more precise understanding of the task. Feedback is thus oriented to by the speakers as a resource for partner modeling. Consequently, feedback plays a central role both locally and globally; addressee orientation thus does not exclusively depend on partner models and representations of common ground. This is particularly important since many psychological studies have focused on what kinds of information people represent about their communication partner; however, orientation towards the addressee may in fact be interactionally achieved (Schegloff, 1972a; Bavelas, 2005).
Clark (1998b) also argues that strangers will usually engage in an exchange of indicators of community membership in order to provide each other with a basis for their linguistic choices. If, however, partner modeling is regularly carried out on the basis of the feedback from the communication partner, then lack of partner modeling does not necessarily entail that the speaker is egocentric (cf. Horton and Keysar, 1996; Horton and Spieler, 2007); the speaker may rather rely on local cues coming from the communication partner, as suggested by Brown-Schmidt (2009). While the studies reported so far have shown that feedback contributes to partner modeling, the question arises whether the reverse is also true and whether partner models determine the limits of their own revisions. Evidence that updates of the partner model through interaction take place solely within the limits of the current partner model comes from our findings on RDS, where some people did not align with, and did not adjust to, the robot’s utterances if their first utterances already showed that they did not understand the robot to be a social actor. Thus, if a linguistic feature proposed by the communication partner does not fit the speaker’s model of his or her partner, he or she will not align with his or her
partner, as we have seen in Section 6.6, The Relationship between Partner Modeling, Alignment and Feedback, above. The same is true of other kinds of adjustments, as we have seen in Section 6.4.3, Personal Partner Models and Linguistic Choice, above; this is particularly problematic if the partner model is inaccurate or incomplete or determined by stereotypes, as may happen in the interaction with non-native communication partners (cf. Roche, 1989). Here, partner models define the limits and the possible range both of alignment and of their own revisions; that is, partner models are updated based on the partner’s feedback if and only if the feedback received is consistent with a possible extension or revision of the current partner model. To sum up, feedback contributes to addressee orientation in both local and global ways, providing both online information about listeners’ current needs and specific information for the updating of a partner model. At the same time, partner models may also restrict their own updating by means of feedback; local interactional achievement and the cognitive construal of the communication partner and of the affordances of the communication situation thus co-determine each other.


Linguistic choice

We can conclude the discussion now by addressing the question concerning the patterning of linguistic features in a given communication situation:

–  How are linguistic choices related to each other?

The adjustments speakers make as part of their recipient design are highly functional, as was demonstrated by the dependence of linguistic choice on the respective personal partner model; choices may, however, not be successful, due to inaccurate or incomplete partner models, as we have seen especially in the discussion of the functionality of the adjustments of FDS and RDS in Sections 5.2 and 6.2. That choices are made in accordance with speakers’ assessments of the affordances of the current communication situation does not, however, mean that speakers make their choices consciously; as with most linguistic knowledge, speakers do not need to be aware of the functional implications of their choices or of the range of alternatives they have at their disposal in order to make appropriate (for instance, grammatical) decisions. We have also seen some evidence that linguistic choices may come in packages, such that the robot’s use of some linguistic features makes people also use some others (see Section 6.3, Alignment). In particular, in the interactions with the verbal robot Aibo, participants used more sophisticated lexical items and syntactic constructions than when talking to the non-verbal Aibo, even though these linguistic structures did not occur in the robot’s utterances. Instead, the robot’s use of some complex linguistic features made participants use other complex features which
they suspected the robot to know as well. Thus speakers seem to have a model in mind of how easy or difficult a given structure is for a particular communication partner, given his or her linguistic productions. However, there is no evidence that the co-occurrence relationships observed are necessarily conventional, as suggested by some proponents of register theory (Ellis and Ure, 1969; Crystal, 2001); that a speaker who hears a robot use a relative clause construction uses a pseudo-cleft construction, for instance, may also be due to his or her ideas about the complexity of each construction and based on the assumption that if the robot knows the one, it will also know the other. Such kinds of reasoning may produce the same results as a conventional association between linguistic features, yet the underlying mechanisms are fundamentally different: one mechanism suggests that speakers opt for a particular register which comprises both relative clause and pseudo-cleft constructions; the other is based on tacit (folk) theories about the complexity of the individual linguistic structures and on distributional observations. The discussion of feedback in RDS above has shown that speakers do take some linguistic features as evidence for other capabilities, leading to inferences that are based on the tacit assumption that linguistic features form a structured inventory (cf. Goldberg, 2006). Thus, linguistic choices may be related to each other on the basis of similar expected difficulty for a given communication partner, i.e. adjusted to his, her or its perceived needs. Alternatively, the co-occurring linguistic choices may be based on retained knowledge of the situation-specific co-occurrence of linguistic features.
The hypothesis that the relationship between choices might be conventional is rendered improbable by the fact that the varieties analyzed are generally extremely heterogeneous, both across persons in the same communicative situation and across communication partners (see Section 7.1.2 The Notion of Register above). However, since people store large chunks of their communication partners’ utterances and take distributional information into account in general (Gomez and Gerken, 1999), it cannot be ruled out that people also retain co-occurrence relationships between items in similar situations; this view would be compatible with the usage-based model, which predicts that people keep track of all kinds of information on the usage of linguistic structures, including linguistic co-occurrence, and it is also in accordance with a probabilistic notion of register. Whether people merely keep track of such distributional characteristics or also use them to inform their own linguistic choices is, however, an open question. Given the heterogeneity of the three varieties investigated, there is not much evidence suggesting that people indeed attend to such co-occurrence relations during linguistic production.

Chapter 7.  Conclusions 285


7.3  General conclusions

To sum up the main findings of this investigation briefly: no evidence was found for well-defined simplified registers that speakers simply take out and apply. Instead, speakers seem to regard language as a sophisticated toolbox in which they have different tools for different tasks, tools that may differ in complexity and thus in appropriateness for different situations and communication partners. The linguistic properties of utterances chosen for a particular communication partner are thus not simple conventionalized reflexes of situational variables but a set of solutions to particular problems that are employed strategically. At the same time, we have found these strategic decisions to concern all linguistic levels; that is, all linguistic levels may be adapted to the particular communication partner’s supposed needs. The speakers’ strategic decisions regarding their particular listeners result in different simplification strategies in the communication with different communication partners, and also in other linguistic behaviors besides simplification. There is consequently no single notion of simplification that could be used to predict the properties of utterances directed at somewhat restricted communication partners. Instead, people take steps to facilitate communication in response to what they perceive to be the affordances of the current situation, which may differ considerably from person to person. Yet our analyses have shown that if speakers categorize a situation in similar ways, they will speak in similar ways, which still makes the notion of variety a useful starting point. While the notion of simplified register is therefore not suitable to account for the orientation toward the different addressees observed, linguistic behaviors can be paired with the participants’ own understandings of what a situation consists in and with their interactional goals.
There is thus a correlation between participants’ understandings of the affordances of the situation and certain linguistic choices. Accordingly, adjustments were based more on what speakers believed about their communication partners than on those partners’ real characteristics, as the occasional discrepancies in FDS and RDS (and to some extent even in CDS, see Gleason and Weintraub (1978)) show. We have found people to build up cognitive models of their communication partners, which they concomitantly draw on, yet which they also revise and update depending on the communication partner’s feedback. These partner models may comprise all kinds of information, and participants were found to orient actively towards building up partner models that provide them with a basis for their linguistic choices. Correspondingly, we have found people to keep numerous details about their partners in mind, including which words and linguistic constructions the partner had used, but also to rely on concomitant feedback. A model of the relationship between language and situation


emerging from the results of this study will consequently have to take people’s constructive sense-making processes and their personal episodes connecting language and situation into account. Furthermore, as Clark (1998b) suggests, we have found people to attend to the possible difficulties individual constructions and phrases may cause for their communication partners and thus to exhibit very specific knowledge about specific linguistic structures. Linguistic choices could thus be related to each other on the basis of similar expected difficulty for a given communication partner. Our findings accordingly support models assuming rich memory representations (e.g. Clark, 1996; Heller et al., 2009; Bybee, 2010; Duran et al., 2011; Brown-Schmidt, 2012). Memory limitations, such as the failure to recall knowledge about the communication partner (see Horton and Spieler, 2007), were not found to be an issue in any of the three varieties investigated; on the contrary, people were found to remember exact phrases and linguistic structures their communication partners had previously produced, and parents in particular were found to have very good knowledge of their children’s current linguistic capabilities, as well as of those they are about to acquire. Thus, people seem in principle capable of retaining detailed knowledge about both individual linguistic structures and their communication partners, which appears to include rich episodic knowledge (see also Trude and Brown-Schmidt (2012), who find people not only to retain specific pronunciation variants for individual speakers but also to use them during online processing for disambiguation).
While previous studies have raised doubts about the feasibility of rich memory representations and have thus suggested either ego-centric models (Horton and Keysar, 1996), simplified, two-bit models (Brennan et al., 2010) or automatic associations rather than full representations (Horton and Gerrig, 2002; Brown-Schmidt, 2012), our study has shown that while partner models play a considerable role in linguistic choice as well as with respect to other mechanisms, such as feedback and alignment, speakers also rely heavily on moment-by-moment evidence from the partner (cf. Fischer et al., 2011; Gann and Barr, 2012; Fischer et al., 2013, 2014) on the one hand and on appraisals of the whole situation and its affordances on the other. Thus, most situations (certainly those investigated here and in most other experimental studies) do not afford an exact representation of the mental states of the partner, either because people can rely on the partner’s online feedback, or because not much depends on it. Gann and Barr (2012) demonstrate that speakers indeed rely on their partners’ feedback whenever they can and start planning only if feedback is not available (and the object is new but the partner is familiar, so that keeping referents apart constitutes a relevant communicative task). Thus, with Clark and Schaefer (1989a), we have to assume that people represent their partners “well


enough for current purposes” (Clark and Schaefer, 1989a, p. 290).2 To conclude, partner models are not only concomitantly updated in interaction; speakers can also rely on their partners’ responses (Brown-Schmidt, 2012; Gann and Barr, 2012). Furthermore, the granularity of these models has been found to depend not only on people’s attitudes and identity needs, but also on their judgments of what the situation affords on the whole (cf. also Gann and Barr, 2012). The richness of memory representation is thus likely to be modulated according to these parameters as well. This finding also underlines the special role addressees play in comparison to overhearers; in particular, in Chapter 2 The Role of the Addressee: Central Notions we have seen that much research in psychology (e.g. Schober and Clark, 1989; Branigan et al., 2007), in sociology (e.g. Goffman, 1979) and in pragmatics (e.g. Levinson, 2006) proposes asymmetrical contributions of participants, ascribing a special status to addressees. The current investigation has shown that people indeed build up very detailed models of their communication partners and monitor their behaviors closely in order to take them into account; such processes necessarily have to be restricted to only a few special participants, i.e. addressees.

2.  Concerning people’s reading strategies, for instance, the notion of satisficing has been introduced to account for the fact that people do not generally target perfect understanding (cf. Krug, 2000).

Even though the current study did not probe into speakers’ online processing, the results nevertheless indicate that speakers in talk-in-interaction take their listeners into account at all times; in particular, we have found partner-specific effects at all levels and possible stages involved in linguistic production, including those structures that, no matter which psycholinguistic processing model is taken as a starting point, will always be at the core of language planning: argument structure constructions, grammatical voice, mean length of utterance, sentence complexity, as well as the integration of F0 peaks, stress and information structure, not to mention word choice and the selection of contents. Thus, speakers take their partners into account from early planning onwards. These results support one-stage processing models, in which addressee orientation is an integral part and not an add-on, and show that linguistic production is indeed a social activity (see Clark, 1996, 1998b; Levinson, 2006). Moreover, the current study has revealed that both cognitive and social processes contribute to designing one’s speech for a recipient; concerning cognitive processes, participants’ understandings of the affordances of the situation and of the respective perceived capabilities of the communication partner have been shown to play a major role. With respect to social-interactive processes, people have been found to align with their partners, respond contingently to their feedback, collaboratively construe situation representations and attend to social structures


like turn-taking or adjacency pairs. Feedback furthermore contributes to addressee orientation both by providing online information about listeners’ current needs and by feeding the updating of partner models. Feedback thus contributes to partner modeling, which underlines the dynamic nature of common ground, which is adjusted on the basis of interaction. Likewise, building up common ground was also found to be a constructive process, since speakers only attend to those aspects of their communication partners that they understand as revealing for their respective communicative goals. At the same time, partner models may also restrict their own updating by means of feedback; local interactional achievement and the cognitive construal of the communication partner and of the affordances of the communication situation thus go hand in hand. A similar relationship between cognitive and social processes has been observed with respect to alignment. Alignment, too, seems to be a powerful means to cooperate in interaction, yet it crucially depends on participants’ interactional goals and partner models. It thus does not by itself account for the addressee orientation observed, and it is determined by partner modeling rather than providing a computationally cheap alternative to it. For instance, we have found speakers to re-use linguistic structures their partners had used especially in the face of communication problems, but also to indicate common ground, particularly to establish a common starting point at the beginning of a dialog. These findings suggest alignment to be a social practice. At the same time, these processes could be shown to always take place in reference to a partner model and to speakers’ interactional goals.

[Figure: diagram whose nodes read ‘metarepresentations’, ‘affordances of situation (defines limits of PM)’, ‘personal agenda’, ‘partner – appearance – behavior’, ‘online update’, and ‘choice/behavior (including alignment)’, arranged around the partner model]

Figure 7.1  Model of the relationships relevant for recipient design

Finally, the situation is complicated by the fact that participants in interaction collaborate in construing the situation and what it is about; especially our findings on RDS demonstrate the constructive nature of the interaction, based on participants’ preconceptions, which resulted in considerable interpersonal variation. However, the results on the formulation of the robot’s feedback in Fischer (2012a,b) show that irrespective of the participants’ preconceptions, the partner’s utterances can


influence the interaction considerably. Thus, what the task and the situation consist in is construed by the participants implicitly, on the basis of their preconceptions, but also interactively, in exchange with each other. These findings suggest that “language use is really a form of joint action” (Clark, 1996, p. 3), and that linguistic knowledge is a dynamic and flexible representation of these usage events. Figure 7.1 visualizes the interrelationships identified. Our findings thus suggest that cognitive and social processes are deeply intertwined. The interdependencies between cognitive and social processes have considerable implications for the study of communication in general (cf. also Clark, 1996; Bavelas, 2005; Langacker, 2008, p. 479); in particular, there are methodological consequences: methods that deal with either the cognitive or the social separately account neither for the full complexity of interaction nor for speakers’ own reality. Thus, cognitive and social aspects of interaction need to be studied together; future theories and methodologies need to take these interdependencies into consideration, which may necessitate the development of new methods (or a fresh look at existing ones, as in this investigation). To conclude, Levinson (2006) argues that speakers produce each utterance in a way that allows the addressee to figure out its meaning: “The action to be interpreted can be presumed to have been designed to be transparent to this particular recipient” (Levinson, 2006, p. 45, emphasis original); the current study has shown how this is possible, and how participants take their partners into consideration. Linguistic choice was thus found to depend on cognitive-constructive as well as on social-interactional processes, and to be influenced by aspects of usage. Thus, as Haugh and Liddicoat (2009) point out, “linguistics cannot be completely divorced from the broader study of communication” (Haugh and Liddicoat, 2009, p. 8); producing an utterance is a cognitive and a social act that cannot be separated from its communicative purposes with respect to a particular communication partner.


Bibliography

Abbot-Smith, K. and H. Behrens (2006). How known constructions influence the acquisition of other constructions: The German passive and future constructions. Cognitive Science 30(6), 995–1026.
Abbot-Smith, K. and M. Tomasello (2006). Exemplar-learning and schematization in a usage-based account of syntactic acquisition. The Linguistic Review 23, 275–290.
Albin, D. D. and C. H. Echols (1996). Stressed and word-final syllables in infant-directed speech. Infant Behavior and Development 19, 401–418.
Allwood, J., K. Grammer, S. Kopp, and E. Ahlsen (2006). A framework for analyzing embodied communicative feedback in multimodal corpora. In LREC 2006 Workshop on Multimodal Corpora, Genoa, Italy, May 2006.
Allwood, J., J. Nivre, and E. Ahlsén (1992). On the semantics and pragmatics of linguistic feedback. Journal of Semantics 9, 1–26.
Amalberti, R., N. Carbonell, and P. Falzon (1993). User representations of computer systems in human–computer speech interaction. International Journal of Man–Machine Studies 38, 547–566.
Ambridge, B. and E. V. Lieven (2011). Child Language Acquisition: Contrasting Theoretical Approaches. Cambridge: Cambridge University Press.
Ambridge, B., A. L. Theakston, E. Lieven, and M. Tomasello (2006). The distributed learning effect for children’s acquisition of an abstract grammatical construction. Cognitive Development 23, 174–193.
Andonova, E. (2006). On changing mental models of a wheelchair robot. In K. Fischer (Ed.), Proceedings of the Workshop on ‘How People Talk to Computers, Robots, and Other Artificial Communication Partners’, Hansewissenschaftskolleg, Delmenhorst, April 21–23, 2006, pp. 131–139. SFB/TR8 Report 010-09 2006.
Andrist, S., E. Spannan, and B. Mutlu (2013). Rhetorical robots: Making robots more effective speakers using linguistic cues of expertise. In Proceedings of the 2013 Human-Robot Interaction Conference, pp. 341–348.
Arita, A., K. Hiraki, T. Kanda, and H. Ishiguro (2005). Can we talk to robots? Ten-month-old infants expected interactive humanoid robots to be talked to by persons. Cognition 95, B49–B57.
Auer, P. (2006). Mobility, contact and accommodation. In C. Llamas, L. Mullany, and P. Stockwell (Eds.), The Routledge Companion to Sociolinguistics, pp. 109–115. Routledge.
Avery, P., S. Ehrlich, and C. Yorio (1985). Prosodic domains in foreigner talk discourse. In S. M. Gass and C. E. Madden (Eds.), Input in Second Language Acquisition, pp. 214–229. Rowley, London, Tokyo: Newbury House.
Ball, P., H. Giles, and M. Hewstone (1985). Interpersonal accommodation and situational construals: An integrative formalisation. In H. Giles and R. N. St. Clair (Eds.), Recent Advances in Language, Communication and Social Psychology. London: Lawrence Erlbaum Associates.


Bangerter, A., H. H. Clark, and A. R. Katz (2004). Navigating joint projects in telephone conversations. Discourse Processes 37(1), 1–23.
Bannard, C., E. Lieven, and M. Tomasello (2009). Modeling children’s early grammatical knowledge. PNAS 106(41), 17284–17289.
Bard, E. C. and A. H. Anderson (1983). The unintelligibility of talk to children. Journal of Child Language 10(1), 265–292.
Bard, E. G., A. H. Anderson, Y. Chen, H. Nicholson, C. Havard, and S. Dalzel-Job (2007). Controlling the intelligibility of referring expressions in dialogue. Journal of Memory and Language 57(4), 616–641.
Bard, E. G., A. H. Anderson, C. Sotillo, M. Aylett, G. Doherty-Sneddon, and A. Newlands (2000). Controlling the intelligibility of referring expressions in dialogue. Journal of Memory and Language 42, 1–22.
Barr, D. J. (2008). Pragmatic expectations and linguistic evidence: Listeners anticipate but do not integrate common ground. Cognition 109, 18–40.
Barr, D. J. and B. Keysar (2002). Anchoring comprehension in linguistic precedents. Journal of Memory and Language 46, 391–418.
Barr, D. J. and B. Keysar (2007). Perspective taking and the coordination of meaning in language use. In M. J. Traxler and M. A. Gernsbacher (Eds.), Handbook of Psycholinguistics, pp. 901–938. New York: Academic Press.
Barton, M. E. and M. Tomasello (1994). The rest of the family: The role of fathers and siblings in early language development. In C. Gallaway and J. Richards (Eds.), Input and Interaction in Language Acquisition, pp. 109–134. Cambridge: Cambridge University Press.
Bateman, J. (2006). A social-semiotic view of interactive alignment and its computational instantiation. In K. Fischer (Ed.), Proceedings of the Workshop on ‘How People Talk to Computers, Robots, and Other Artificial Communication Partners’, Hansewissenschaftskolleg, Delmenhorst, April 21–23, 2006, pp. 157–170.
Bateman, J. A. (2008). Multimodality and Genre. London: Arnold.
Bateman, J. A., J. Delin, and E. Henschel (2007). Mapping the multimodal genres of traditional and electronic newspapers. In T. D. Royce and W. Bowcher (Eds.), New Directions in the Analysis of Multimodal Discourse, pp. 147–172. New Jersey: Lawrence Erlbaum Associates.
Bates, E., P. S. Dale, and D. Thal (1995). Individual differences and their implications for theories of language development. In P. Fletcher and B. MacWhinney (Eds.), The Handbook of Child Language, pp. 96–151. Oxford: Blackwell.
Batliner, A., K. Fischer, R. Huber, J. Spilker, and E. Nöth (2003). How to find trouble in communication. Speech Communication 40(1–2), 117–143.
Bavelas, J. (2007). Face-to-face dialogue as a micro-social context. In S. D. Duncan, J. Cassell, and E. T. Levy (Eds.), Gesture and the Dynamic Dimension of Language, pp. 127–146. Amsterdam/Philadelphia: John Benjamins.
Bavelas, J. B. (2005). The two solitudes: Reconciling social psychology and social interaction. In K. L. Fitch and R. E. Sanders (Eds.), Handbook of Language and Social Interaction, pp. 179–200. Mahwah, NJ: Lawrence Erlbaum Associates.
Bavelas, J. B., L. Coates, and T. Johnson (2000). Listeners as co-narrators. Journal of Personality and Social Psychology 79(6), 941–952.
Bavelas, J. B., L. Coates, and T. Johnson (2002). Listener responses as a collaborative process: The role of gaze. Journal of Communication 52, 566–580.


Bavelas, J. B., P. De Jong, S. Smock Jordan, and H. Korman (2014). The theoretical and research basis of co-constructing meaning in dialogue. Journal of Solution-focused Brief Therapy 2(2), 1–24.
Baylor, A. L., R. B. Rosenberg-Kima, and E. A. Plant (2006). Interface agents as social models: The impact of appearance on females’ attitude toward engineering. In CHI’06 Extended Abstracts, pp. 526–531.
Behrens, H. (2006). The input-output relationship in first language acquisition. Language and Cognitive Processes 21(1–3), 2–24.
Bell, A. (1984). Language style as audience design. Language in Society 13(2), 145–204.
Bell, A. (2001). Back in style: Reworking audience design. In P. Eckert and J. Rickford (Eds.), Style and Sociolinguistic Variation, pp. 139–169. Cambridge: Cambridge University Press.
Bernstein-Ratner, N. (1987). The phonology of parent–child speech. In K. Nelson and A. van Kleeck (Eds.), Children’s Language, Volume 6, pp. 159–174. Hillsdale, NJ: Erlbaum.
Betz, E. (2015). Recipient design in reference choice: Negotiating knowledge, access, and sequential trajectories. Zeitschrift für Gesprächsforschung 16, 137–173.
Biber, D. (1994). An analytical framework for register studies. In D. Biber and E. Finegan (Eds.), Sociolinguistic Perspectives on Register, pp. 31–56. Oxford and New York: Oxford University Press.
Biber, D. and S. Conrad (2001). Register variation: A corpus approach. In D. Schiffrin, D. Tannen, and H. Hamilton (Eds.), The Handbook of Discourse Analysis, pp. 175–196. Oxford: Blackwell.
Biber, D. and S. Conrad (2009). Register, Genre, and Style. Cambridge: Cambridge University Press.
Biber, D. and E. Finegan (1994). Introduction: Situating register in sociolinguistics. In D. Biber and E. Finegan (Eds.), Sociolinguistic Perspectives on Register, pp. 3–12. Oxford and New York: Oxford University Press.
Biersack, S., V. Kempe, and L. Knapton (2005). Fine-tuning speech registers: A comparison of the prosodic features of child-directed and foreigner-directed speech. In Interspeech-2005, pp. 2401–2404.
Bilmes, J. (2009). Taxonomies are for talking: Reanalyzing a Sacks classic. Journal of Pragmatics 41, 1600–1610.
Birkner, K. (2008). Was X betrifft: Textsortenspezifische Aspekte einer Redewendung. In A. Stefanowitsch and K. Fischer (Eds.), Konstruktionsgrammatik II: Von der Konstruktion zur Grammatik, pp. 59–80. Tübingen: Stauffenburg.
Bohannon III, J. N. and A. Marquis (1977). Children’s control of adult speech. Child Development 48, 1002–1008.
Bohannon III, J. N. and A. Warren-Leubecker (1988). Recent developments in speech to children: We’ve come a long way, baby-talk. Language Sciences 10(1), 89–110.
Bolinger, D. (1989). Intonation and its Uses: Melody in grammar and discourse. London: Arnold.
Bornstein, M. H. and D. L. Putnick (2012). Cognitive and socioemotional caregiving in developing countries. Child Development 83(1), 46–61.
Borovsky, A. and J. Elman (2006). Language input and semantic categories: A relation between cognition and early word learning. Journal of Child Language 33, 759–790.
Bortfeld, H. and S. E. Brennan (1997). Use and acquisition of idiomatic expressions in referring by native and non-native speakers. Discourse Processes 23, 119–147.
Bourhis, R. (1983). Language attitudes and self-reports of French-English language usage in Quebec. Journal of Multilingual and Multicultural Development 4, 163–179.


Bradlow, A. R. (2002). Confluent talker- and listener-related forces in clear speech production. In C. Gussenhoven and N. Warner (Eds.), Laboratory Phonology 7, pp. 241–273. Berlin/New York: Mouton de Gruyter.
Branigan, H. and J. Pearson (2006). Alignment in human-computer interaction. In K. Fischer (Ed.), Proceedings of the Workshop on ‘How People Talk to Computers, Robots, and Other Artificial Communication Partners’, Hansewissenschaftskolleg, Delmenhorst, April 21–23, 2006. SFB/TR8 Report 010-09 2006.
Branigan, H., M. Pickering, J. Pearson, J. McLean, and C. Nass (2003). Syntactic alignment between computers and people: The role of belief about mental states. In Proceedings of the Twenty-fifth Annual Conference of the Cognitive Science Society, Boston.
Branigan, H. P., M. J. Pickering, J. McLean, and A. Cleland (2007). Syntactic alignment and participant role in dialogue. Cognition 104, 163–197.
Branigan, H. P., M. J. Pickering, J. Pearson, and J. McLean (2010). Linguistic alignment between people and computers. Journal of Pragmatics 42, 2355–2368.
Breazeal, C. (2000). Sociable Machines: Expressive Social Exchange Between Humans and Robots. Ph.D. thesis, Department of Electrical Engineering and Computer Science, MIT.
Brennan, S. (1991). Conversations with and through computers. User Modeling and User-Adapted Interaction 1, 67–86.
Brennan, S. and H. H. Clark (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory and Cognition 22, 1482–1493.
Brennan, S. and C. Metzing (2004). Two steps forward, one step back: Partner-specific effects in a psychology of dialogue (commentary on Pickering & Garrod 2004). Behavioral and Brain Sciences 27(2), 24–25.
Brennan, S. E. (2005). How conversation is shaped by visual and spoken evidence. In J. Trueswell and M. Tanenhaus (Eds.), Approaches to Studying World-Situated Language Use: Bridging the language-as-product and language-action traditions, pp. 95–129. Cambridge, MA: MIT Press.
Brennan, S. E., A. Galati, and A. K. Kuhlen (2010). Two minds, one dialog: Coordinating speaking and understanding. Psychology of Learning and Motivation 53, 301–344.
Brennan, S. E. and J. E. Hanna (2009). Partner-specific adaptation in dialogue. Topics in Cognitive Science 1, 274–291.
Brennan, S. E., K. S. Schuhmann, and K. M. Batres (2013). Entrainment on the move and in the lab: The walking around corpus. In Proceedings of the 35th Annual Conference of the Cognitive Science Society, pp. 1934–1939. Austin, TX: Cognitive Science Society.
Brent, M. R. and T. A. Cartwright (1996). Distributional regularity and phonotactics are useful for segmentation. Cognition 61(1–2), 93–125.
Brent, M. R. and J. M. Siskind (2001). The role of exposure to isolated words in early vocabulary development. Cognition 81, B33–B44.
Bresnan, J. and M. F. Ford (2010). Predicting syntax: Processing dative constructions in American and Australian varieties of English. Language 86(1), 186–213.
Bresnan, J. and J. Hay (2008). Gradient grammar: An effect of animacy on the syntax of ‘give’ in New Zealand and American English. Lingua 118(2), 245–259.
Brindöpke, C., J. Häger, M. Johanntokrax, A. Pahde, M. Schwalbe, and B. Wrede (1995). Darf ich Dich Marvin nennen? Instruktionsdialoge in einem Wizard-of-Oz-Szenario: Szenariodesign und Auswertung. SFB-Report ‘Situierte künstliche Kommunikatoren’ 95/16, Universität Bielefeld.
Brooks, P. and M. Tomasello (1999). Young children learn to produce passives with nonce verbs. Developmental Psychology 35(1), 29–44.


Brown, P. and G. Dell (1987). Adapting production to comprehension: The explicit mention of instruments. Cognitive Psychology 19, 441–472.
Brown, P. and S. Levinson (1987). Politeness: Some Universals in Language Usage (2nd (original 1978) ed.). Cambridge: Cambridge University Press.
Brown, R. (1977). The place of baby talk in the world of language. In C. Snow and C. Ferguson (Eds.), Talking to Children: Language Input and Acquisition. Cambridge: Cambridge University Press.
Brown-Schmidt, S. (2009). Partner-specific interpretation of maintained referential precedents during interactive dialog. Journal of Memory and Language 61, 171–190.
Brown-Schmidt, S. (2012). Beyond common and privileged: Gradient representations of common ground in real-time language use. Language and Cognitive Processes 27(1), 62–89.
Brown-Schmidt, S., C. Gunlogson, and M. K. Tanenhaus (2008). Addressees distinguish shared from private information when interpreting questions during interactive conversation. Cognition 107, 1122–1134.
Bublitz, W. (1988). Supportive Fellow-Speakers and Cooperative Conversations: Discourse Topics and Topical Actions, Participant Roles and ‘Recipient Action’ in a Particular Type of Everyday Conversation. Amsterdam/Philadelphia: John Benjamins.
Burnham, D., C. Kitamura, and U. Vollmer-Conna (2002). What’s new, pussycat? On talking to babies and animals. Science 296, 1435.
Bybee, J. (2007). Frequency of Use and the Organization of Language. Oxford: Oxford University Press.
Bybee, J. (2010). Language, Usage and Cognition. Cambridge: Cambridge University Press.
Cameron-Faulkner, T., E. Lieven, and M. Tomasello (2003). A construction based analysis of child directed speech. Cognitive Science 27, 843–873.
Cartwright, T. A. and M. R. Brent (1997). Syntactic categorization in early language acquisition: Formalizing the role of distributional analysis. Cognition 63, 121–170.
Casenhiser, D. M. and A. E. Goldberg (2005). Mapping of phrasal form and meaning. Developmental Science 8, 500–508.
Cha, E., A. Dragan, J. Forlizzi, and S. S. Srinivasa (2014). Effects of speech on perceived capability. In Proceedings of the 2014 Human-Robot Interaction Conference HRI’14, Bielefeld, Germany.
Chaudron, C. (1983). Simplification of input: Topic reinstatements and their effects on L2 learners’ recognition and recall. TESOL Quarterly 17(3), 437–458.
Chidambaram, V., Y.-H. Chiang, and B. Mutlu (2012). Designing persuasive robots: How robots might persuade people using vocal and nonverbal cues. In Proceedings of the 2012 Human-Robot Interaction Conference, Boston, MA, pp. 293–300.
Chomsky, N. (1968). Language and Mind. New York: Harcourt Brace & World.
Chouinard, M. M. and E. V. Clark (2003). Adult reformulations of child errors as negative evidence. Journal of Child Language 30, 637–669.
Clahsen, H., J. Meisel, and M. Pienemann (1983). Deutsch als Zweitsprache: Der Spracherwerb ausländischer Arbeiter. Tübingen: Narr.
Clark, E. V. (1998a). Lexical structure and pragmatic directions in acquisition. In M. C. Gruber, D. Higgins, K. S. Olson, and T. Wysocki (Eds.), Chicago Linguistic Society: Papers from the Panels, Volume 34, pp. 437–446. Chicago Linguistic Society.
Clark, E. V. and J. Bernicot (2008). Repetition as ratification: How parents and children place information in common ground. Journal of Child Language 35, 349–371.
Clark, E. V. and B. Estigarribia (2011). Using speech and gesture to inform young children about unfamiliar word meanings. Gesture 11(1), 1–23.

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.

296 Designing Speech for a Recipient

Clark, E. V. and A. D.-W. Wong (2002). Pragmatic directions about language use: Offers of words and relations. Language in Society 31, 181–212.
Clark, H. (1999). How do real people communicate with virtual partners? In Proceedings of AAAI-99 Fall Symposium, Psychological Models of Communication in Collaborative Systems, November 5–7th, 1999, North Falmouth, MA. Menlo Park, Calif.: AAAI Press; Cambridge, Mass.: MIT Press.
Clark, H. H. (1992). Arenas of Language Use. Chicago: University of Chicago Press.
Clark, H. H. (1996). Using Language. Cambridge: Cambridge University Press.
Clark, H. H. (1998b). Communal lexicons. In K. Malmkjaer and J. Williams (Eds.), Context in Language Learning and Language Understanding, pp. 63–87. Cambridge: Cambridge University Press.
Clark, H. H. and T. B. Carlson (1992). Hearers and speech acts. In H. H. Clark (Ed.), Arenas of Language Use. Chicago: University of Chicago Press and Stanford: CSLI.
Clark, H. H. and M. A. Krych (2004). Speaking while monitoring addressees for understanding. Journal of Memory and Language 50(1), 62–81.
Clark, H. H. and C. Marshall (1981). Definite reference and mutual knowledge. In K. Joshi, B. Webber, and I. Sag (Eds.), Elements of Discourse Understanding, pp. 10–63. Cambridge: Cambridge University Press.
Clark, H. H. and C. Marshall (1992). Definite reference and mutual knowledge. In H. H. Clark (Ed.), Arenas of Language Use, pp. 9–59. Chicago: University of Chicago Press and Stanford: CSLI.
Clark, H. H. and E. F. Schaefer (1987). Concealing one’s meaning from overhearers. Journal of Memory and Language 26, 209–225.
Clark, H. H. and E. F. Schaefer (1989a). Collaborating on contributions to conversation. In R. Dietrich and C. Graumann (Eds.), Language Processing in a Social Context. Amsterdam: North Holland.
Clark, H. H. and E. F. Schaefer (1989b). Contributing to discourse. Cognitive Science 13, 259–294.
Clark, H. H. and E. F. Schaefer (1992). Dealing with overhearers. In H. H. Clark (Ed.), Arenas of Language Use, pp. 248–274. Chicago: University of Chicago Press and Stanford: CSLI.
Clark, H. H., R. Schreuder, and S. Buttrick (1992). Common ground and the understanding of demonstrative reference. In H. H. Clark (Ed.), Arenas of Language Use, pp. 78–99. Chicago: University of Chicago Press and Stanford: CSLI.
Clark, H. H. and D. Wilkes-Gibbs (1986). Referring as a collaborative process. Cognition 22, 1–39.
Clark, E. V. and P. M. Amaral (2010). Children build on pragmatic information in language acquisition. Language and Linguistics Compass 4(7), 445–457.
Conti-Ramsden, G. and S. Friel-Patti (1987). Situational variability in mother-child conversations. In K. Nelson and A. van Kleeck (Eds.), Children’s Language, Volume 6, pp. 43–63. Hillsdale, N.J.: Erlbaum.
Coupland, N. (2001). Language, situation, and the relational self: Theorizing dialect–style in sociolinguistics. In P. Eckert and J. Rickford (Eds.), Style and Sociolinguistic Variation, pp. 185–210. Cambridge: Cambridge University Press.
Coupland, N., J. Coupland, H. Giles, and K. Henwood (1988). Accommodating the elderly: Invoking and extending a theory. Language in Society 17, 1–41.
Cross, T. (1978). Motherese: Its association with rate of syntactic acquisition in young children. In N. Waterson and C. Snow (Eds.), The Development of Communication. London: Wiley & Sons.

Bibliography 297

Cross, T., J. E. Johnson-Morris, and T. G. Nienhuys (1980). Linguistic feedback and maternal speech: Comparisons of mothers addressing hearing and hearing-impaired children. First Language 1, 163–189.
Cross, T. G., T. G. Nienhuys, and M. Kirkman (1985). Parent–child interaction with receptively disabled children: Some determinants of maternal speech style. In K. Nelson (Ed.), Children’s Language, Volume 5, pp. 247–290. Hillsdale, NJ: Erlbaum.
Cruse, D. A. (2000). Meaning in Language. Oxford: Oxford University Press.
Cruttenden, A. (1994). Phonetic and prosodic aspects of baby talk. In C. Gallaway and B. J. Richards (Eds.), Input and Interaction in Language Acquisition, pp. 135–152. Cambridge: Cambridge University Press.
Crystal, D. (2001). Language and the Internet. Cambridge: Cambridge University Press.
Dabrowska, E. (2004). Language, Mind and Brain. Edinburgh: Edinburgh University Press.
Dabrowska, E. (2009). Words as constructions. In V. Evans and S. Pourcel (Eds.), New Directions in Cognitive Linguistics, pp. 201–223. Amsterdam: John Benjamins.
Dahlbäck, N., S. Swamy, C. Nass, F. Arvidsson, and J. Skågeby (2001). Spoken interaction with computers in a native or non-native language – same or different? In Proceedings of INTERACT 2001.
Dale, R. and M. J. Spivey (2006). Unraveling the dyad: Using recurrence analysis to explore patterns of syntactic coordination between children and caregivers in conversation. Language Learning 56(3), 391–430.
De Angeli, A. and S. Brahnam (2008). I hate you! Disinhibition with virtual partners. Interacting with Computers 20(3), 302–310.
De la Fuente, M. J. (2002). Negotiation and oral acquisition of L2 vocabulary. Studies in Second Language Acquisition 24, 81–112.
De Paulo, B. M. and J. D. Bonvillian (1978). The effect on language development of the special characteristics of speech addressed to children. Journal of Psycholinguistic Research 7(3), 189–211.
DePaulo, B. M. and L. Coleman (1986). Talking to children, foreigners, and retarded adults. Journal of Personality and Social Psychology 51, 945–959.
DePaulo, B. M. and L. Coleman (1987). Verbal and nonverbal communication of warmth to children, foreigners, and retarded adults. Journal of Nonverbal Behavior 11(2), 75–88.
Deppermann, A. (2015). When recipient design fails: Egocentric turn-design of instructions in driving school lessons leading to breakdowns of intersubjectivity. Zeitschrift für Gesprächsforschung 16, 63–101.
Deppermann, A. and H. Blühdorn (2013). Negation als Verfahren des Adressatenzuschnitts: Verstehenssteuerung durch Interpretationsrestriktionen. Deutsche Sprache 41(1), 6–30.
DiSalvo, C. F., F. Gemperle, J. Forlizzi, and S. Kiesler (2002). All robots are not created equal: The design and perception of humanoid robot heads. In DIS2002, London, pp. 321–326.
D’Odorico, L. and F. Franco (1985). The determinants of baby talk: Relationship to context. Journal of Child Language 12, 567–586.
Dominey, P. F. and C. Dodane (2004). Indeterminacy in language acquisition: The role of child directed speech and joint attention. Journal of Neurolinguistics 17, 121–145.
Drew, P. and J. Heritage (Eds.) (1992). Talk at Work: Language Use in Institutional and Work-Place Settings. Cambridge: Cambridge University Press.
Dreyfus, H. L. (1992). What Computers Still Can’t Do: A Critique of Artificial Reason. Cambridge, Massachusetts/London: The M.I.T. Press.
Du Bois, J. (2014). Towards a dialogic syntax. Cognitive Linguistics 25(3), 359–410.

Duran, N. D., R. Dale, and R. J. Kreutz (2011). Listeners invest in an assumed other’s perspective despite cognitive cost. Cognition 121, 22–40.
Edwards, D. (1994). Imitation and artifice in apes, humans, and machines. American Behavioral Scientist 37(6), 754–771.
Edwards, D. (2001). Discourse and Cognition. London, Thousand Oaks, New Delhi: Sage.
Ehlich, K. (1986). Interjektionen. Number 111 in Linguistische Arbeiten. Tübingen: Niemeyer.
Ellis, J. and J. Ure (1969). Language varieties: Register. In A. Meetham and R. Hudson (Eds.), Encyclopedia of Linguistics, Information and Control, pp. 251–259. Oxford: Pergamon.
Ellis, N. (2008). Implicit and explicit knowledge about language. In J. Cenoz and N. Hornberger (Eds.), Encyclopedia of Language and Education (2nd ed.), Volume 6: Knowledge about Language, pp. 1–13. Berlin: Springer.
Ellis, N. (2009). Optimizing the input: Frequency and sampling in usage-based and form-focussed learning. In M. H. Long and C. Doughty (Eds.), Handbook of Language Teaching, pp. 139–158. Oxford: Blackwell.
Ellis, N., M. O’Donnell, and U. Römer (2013). Usage-based language: Investigating the latent structures that underpin acquisition. Language Learning 63(1), 25–51.
Ellis, R. (1985). Understanding Second Language Acquisition. Oxford: Oxford University Press.
Ellis, R. (1993). Naturally simplified input, comprehension and second language acquisition. In M. Tickoo (Ed.), Simplification: Theory and Application. Singapore: SEAMEO Regional Language Centre.
Ellis, R. (1999). Theoretical perspectives on interaction and language learning. In R. Ellis (Ed.), Learning a Second Language through Interaction. Amsterdam/Philadelphia: Benjamins.
Englund, K. and D. Behne (2006). Changes in infant-directed speech in the first six months. Infant and Child Development 15, 139–160.
Erickson, F. (2004). Talk and Social Theory: Ecologies of Speaking and Listening in Everyday Life. Cambridge: Polity.
Falk, D. (2004). Prelinguistic evolution in early hominids: Whence motherese? Behavioral and Brain Sciences 27(4), 491–541.
Ferguson, C. (1994). Dialect, register, and genre: Working assumptions about conventionalization. In D. Biber and E. Finegan (Eds.), Sociolinguistic Perspectives on Register, pp. 15–30. Oxford and New York: Oxford University Press.
Ferguson, C. A. (1975). Toward a characterization of English foreigner talk. Anthropological Linguistics 17(1), 1–14.
Ferguson, C. A. (1977). Baby talk as a simplified register. In C. E. Snow and C. A. Ferguson (Eds.), Talking to Children: Language Input and Acquisition, pp. 209–235. Cambridge: Cambridge University Press.
Ferguson, C. A. (1981). ‘Foreigner talk’ as the name of a simplified register. International Journal of the Sociology of Language 28, 9–18.
Ferguson, C. A. (1982). Simplified registers and linguistic theory. In L. Obler and L. Menn (Eds.), Exceptional Language and Linguistics. New York: Academic Press.
Ferguson, C. A. (2004). Talking to children: A search for universals. In B. C. Lust and C. Foley (Eds.), First Language Acquisition – The Essential Readings, pp. 176–189. Oxford: Blackwell.
Fernald, A. (1985). Four-month-old infants prefer to listen to motherese. Infant Behavior and Development 8(2), 181–195.
Fernald, A. (1992). Human maternal vocalizations to infants as biologically relevant signals: An evolutionary perspective. In J. H. Barkow, L. Cosmides, and J. Tooby (Eds.), The Adapted Mind: Evolutionary Psychology and the Generation of Culture, pp. 391–428. New York & Oxford: Oxford University Press.

Fernald, A. (2000). Speech to infants as hyperspeech: Knowledge-driven processes in early word recognition. Phonetica 57, 242–254.
Fernald, A. and N. Hurtado (2006). Names in frames: Infants interpret words in sentence frames faster than words in isolation. Developmental Science 9(3), F33–F40.
Fernald, A., V. A. Marchman, and A. Weisleder (2012). SES differences in language processing skill and vocabulary are evident at 18 months. Developmental Science, 1–13.
Fernald, A. and C. Mazzie (1991). Prosody and focus in speech to infants and adults. Developmental Psychology 27, 209–221.
Fernald, A., G. W. McRoberts, and D. Swingley (2001). Infants’ developing competence in recognizing and understanding words in fluent speech. In J. Weissenborn and B. Hoehle (Eds.), Approaches to Bootstrapping: Phonological, Lexical, Syntactic, and Neurophysiological Aspects of Early Language Acquisition, Volume I, pp. 97–123. Amsterdam: Benjamins.
Fernald, A. and A. Weisleder (2011). Early language experience is vital to developing fluency in understanding. In S. Neuman and D. Dickinson (Eds.), Handbook of Early Literacy Research, Volume 3. New York: Guilford Publications.
Ferreira, V. and G. Dell (2000). Effect of ambiguity and lexical availability on syntactic and lexical production. Cognitive Psychology 40, 296–340.
Fetzer, A. and K. Fischer (Eds.) (2007). Lexical Markers of Common Grounds. Amsterdam: Elsevier.
Filipi, A. (2009). Toddler and Parent Interaction. Amsterdam/Philadelphia: John Benjamins.
Filipi, A. and R. Wales (2009). An interactionally situated analysis of what prompts shift in the motion verbs ‘come’ and ‘go’. In K. Coventry, T. Tenbrink, and J. A. Bateman (Eds.), Spatial Language and Dialogue. Oxford: Oxford University Press.
Fillmore, C. J. (1988). The mechanisms of construction grammar. In Proceedings of the 14th Meeting of the Berkeley Linguistic Society, pp. 35–55.
Fillmore, C. J. and B. T. Atkins (1992). Toward a frame-based lexicon: The semantics of risk and its neighbors. In A. Lehrer and E. F. Kittay (Eds.), Frames, Fields and Contrasts, pp. 75–102. Hillsdale, N.J.: Lawrence Erlbaum.
Fillmore, C. J., P. Kay, and M. O’Connor (1988). Regularity and idiomaticity in grammatical constructions: The case of let alone. Language 64(3), 501–538.
Finegan, E. and D. Biber (2001). Register variation and social dialect variation: The register axiom. In P. Eckert and J. R. Rickford (Eds.), Style and Sociolinguistic Variation. Cambridge: Cambridge University Press.
Firth, A. (2009). Doing not being a foreign language learner: English as a lingua franca in the workplace and (some) implications for SLA. International Review of Applied Linguistics in Language Teaching 47(1), 127–156.
Firth, A. and J. Wagner (1997). On discourse, communication, and (some) fundamental concepts in SLA research. The Modern Language Journal 81(3), 285–300.
Firth, A. and J. Wagner (2007). Second/foreign language learning as a social accomplishment: Elaborations on a reconceptualized SLA. The Modern Language Journal 91(5), 798–817.
Firth, J. R. (1968). Selected Papers of J. R. Firth, 1952–1959, edited by F. R. Palmer. London/Harlow: Longmans.
Fischer, K. (1999). Die Ikonizität der Pause: Zwischen kognitiver Last und kommunikativer Funktion. In I. Wachsmuth and B. Jung (Eds.), Proceedings der KogWis99, Bielefeld, September 1999, pp. 250–255. Gesellschaft für Kognitionswissenschaft: Infix.
Fischer, K. (2000a). From Cognitive Semantics to Lexical Pragmatics: The Functional Polysemy of Discourse Particles. Berlin, New York: Mouton de Gruyter.
Fischer, K. (2000b). What is a situation? Gothenburg Papers in Computational Linguistics 00–5, 85–92.

Fischer, K. (2003). Notes on analysing context. In P. Kühnlein, H. Rieser, and H. Zeevat (Eds.), Perspectives on Dialogue in the New Millennium, Number 114 in Pragmatics and Beyond New Series, pp. 193–214. Amsterdam: John Benjamins.
Fischer, K. (2004). Expressive speech characteristics in the communication with artificial agents. In Proceedings of the AISB 2004 Convention, Symposium on Language, Speech and Gesture for Expressive Characters, University of Leeds, 29 March – 1 April 2004, pp. 1–11.
Fischer, K. (2005). Discourse conditions for spatial perspective taking. In Proceedings of WoSLaD Workshop on Spatial Language and Dialogue, Delmenhorst, October 2005.
Fischer, K. (2006a). The role of users’ concepts of the robot in human-robot spatial instruction. In Proceedings of ‘Spatial Cognition ‘06’.
Fischer, K. (2006b). What Computer Talk Is and Isn’t: Human-Computer Conversation as Intercultural Communication. Saarbrücken: AQ.
Fischer, K. (2010a). Accounting for the role of situation in language use in a cognitive semantic representation of sentence mood. In D. Glynn and K. Fischer (Eds.), Quantitative Methods in Cognitive Semantics: Corpus-driven Approaches, pp. 179–200. Berlin/New York: Mouton de Gruyter.
Fischer, K. (2010b). Why it is interesting to investigate how people talk to computers and robots: Introduction to the special issue. Journal of Pragmatics 42, 2349–2354.
Fischer, K. (2011). Interpersonal variation in understanding robots as social actors. In Proceedings of HRI’11, March 6–9th, 2011, Lausanne, Switzerland, pp. 53–60.
Fischer, K. (2012a). Contingency, projection and attention to common ground as major design principles for robot feedback. In Proceedings of the Workshop on ‘Robot Feedback in Human-Robot Interaction: How to Make a Robot ‘Readable’ for a Human Interaction Partner’, at the 21st IEEE International Symposium on Robot and Human Interactive Communication, RoMan’12, Paris, France.
Fischer, K. (2012b). How different views of communication influence HCI design: The example of shaping. In Proceedings of the 12th Danish HCI Research Symposium (DHRS 2012), Sonderborg, Denmark.
Fischer, K. and J. A. Bateman (2006). Keeping the initiative: An empirically motivated approach to predicting user-initiated dialogue contributions in HCI. In Proceedings of the EACL’06, April 2006, Trento, Italy.
Fischer, K., K. Foth, K. Rohlfing, and B. Wrede (2011). Mindful tutors – linguistic choice and action demonstration in speech to infants and to a simulated robot. Interaction Studies 12(1), 134–161.
Fischer, K., K. Lohan, and K. Foth (2012, March). Levels of embodiment: Linguistic analyses of factors influencing HRI. In Proceedings of HRI’12, Boston, MA.
Fischer, K., K. Lohan, C. Nehaniv, and H. Lehmann (2013). Effects of different kinds of robot feedback. In Proceedings of the International Conference on Social Robotics. Berlin: Springer.
Fischer, K., K. S. Lohan, K. Rohlfing, and K. Foth (2014). Partner orientation in asymmetric communication: Evidence from contingent robot response. In Proceedings of the HRI’14 Workshop on Humans and Robots in Asymmetric Interactions, March 3rd, 2014, Bielefeld, Germany.
Fischer, K., K. S. Lohan, J. Saunders, C. Nehaniv, B. Wrede, and K. Rohlfing (2013). The impact of the contingency of robot feedback on HRI. In Proceedings of the 4th International Conference on Collaborative Technological Systems CTS’13, San Diego, May 20–24, 2013.
Fischer, K. and R. Moratz (2001). From communicative strategies to cognitive modelling. In Proceedings of the First International Workshop on ‘Epigenetic Robotics’, September 17–18, 2001, Lund, Sweden.

Fischer, K. and J. Saunders (2012). Getting acquainted with a developing robot. In Proceedings of ‘Human User Behavior’, IROS, Portugal. Springer.
Fischer, K. and R. Wilde (2005). Methoden zur Analyse interaktiver Bedeutungskonstitution. In C. Solte-Gresser, K. Struwe, and N. Ueckmann (Eds.), Von der Wirklichkeit zur Wissenschaft: Aktuelle Forschungsmethoden in den Sprach-, Literatur- und Kulturwissenschaften, Number 1 in Forum Literaturen Europas, pp. 163–174. Hamburg: LIT-Verlag.
Fishman, P. (1978). Interaction: The work women do. Social Problems 25, 397–406.
Fogg, B. J. and C. Nass (1997). Silicon sycophants: Effects of computers that flatter. International Journal of Human-Computer Studies 46(5), 551–561.
Fogg, B. (2003). Persuasive Technology: Using Computers to Change What We Think and Do. Morgan Kaufmann Publishers.
Foley, W. (2006). Universal constraints and local conditions in pidginization. Journal of Pidgin and Creole Languages 21(1), 1–44.
Forlizzi, J., J. Zimmerman, V. Manusco, and S. Kwak (2007). How interface agents affect interactions between humans and computers. In Designing Pleasurable Products and Interfaces, ACM Conference, 22–25 August, Helsinki, Finland, pp. 209–221.
Fox, B. (2008). Dynamics of discourse. In G. Antos, E. V. Ventola, and T. Weber (Eds.), Handbook of Applied Linguistics, pp. 255–284. Berlin: de Gruyter.
Fraser, N. and G. Gilbert (1991). Simulating speech systems. Computer Speech and Language 5, 81–99.
Freed, B. F. (1981). Foreigner talk, baby talk, native talk. International Journal of the Sociology of Language 28, 19–39.
French, L. and M. K. Pak (1995). Young children’s play dialogues with mothers and peers. In K. E. Nelson and Z. Réger (Eds.), Children’s Language, Volume 8, pp. 65–101. Hillsdale, N.J.: Lawrence Erlbaum.
Furrow, D., C. Nelson, and H. Benedict (1979). Mothers’ speech to children and syntactic development: Some simple relationships. Journal of Child Language 6, 423–442.
Furrow, D. and K. Nelson (1986). A further look at the motherese hypothesis: A reply to Gleitman, Newport & Gleitman. Journal of Child Language 13, 163–176.
Fussell, S. R. and R. M. Krauss (1992). Coordination of knowledge in communication: Effects of speakers’ assumptions about what others know. Journal of Personality and Social Psychology 62(3), 378–391.
Fussell, S. and R. Kraut (2004). Visual copresence and conversational coordination (commentary on Pickering & Garrod 2004). Behavioral and Brain Sciences 27(2), 196–197.
Gal, S. (1987). Linguistic repertoire. In U. Ammon, N. Dittmar, and K. Mattheier (Eds.), Sociolinguistics: An International Handbook of the Science of Language and Society, pp. 286–292. Berlin/New York: de Gruyter.
Galantucci, B. and N. Sebanz (2009). Joint action: Current perspectives. Topics in Cognitive Science 1, 255–259.
Galati, A. and S. E. Brennan (2009). Attenuating information in spoken communication: For the speaker, or for the addressee? Journal of Memory and Language 62, 35–51.
Galati, A. and S. E. Brennan (2013). Speakers adapt gestures to addressees’ knowledge: Implications for models of co-speech gesture. Language, Cognition and Neuroscience.
Gann, T. M. and D. J. Barr (2012). Speaking from experience: Audience design as expert performance. Language and Cognitive Processes, 1–23.
Gardner, R. (2001). When Listeners Talk. Amsterdam/Philadelphia: John Benjamins.
Gardner, R. and J. Wagner (Eds.) (2004). Second Language Conversations. London: Continuum.

Garfinkel, H. (1972). Remarks on ethnomethodology. In J. J. Gumperz and D. Hymes (Eds.), Directions in Sociolinguistics: The Ethnography of Communication. New York et al.: Holt, Rinehart and Winston.
Garrod, S. and M. J. Pickering (2007). Alignment in dialogue. In M. G. Gaskell and G. Altman (Eds.), The Oxford Handbook of Psycholinguistics, pp. 443–451. New York: Oxford University Press.
Garrod, S. and M. J. Pickering (2009). Joint action, interactive alignment, and dialog. Topics in Cognitive Science 1, 292–304.
Gass, S. (2003). Input and interaction. In C. J. Doughty and M. H. Long (Eds.), The Handbook of Second Language Acquisition. Malden/Oxford/Carlton: Blackwell.
Gass, S. M. (1997). Input, Interaction, and the Second Language Learner. Mahwah, N.J.: Lawrence Erlbaum Associates.
Gauvain, M., B. I. Fagot, C. Leve, and K. Kavanagh (2002). Instruction by mothers and fathers during problem solving with their young children. Journal of Family Psychology 16(1), 81–90.
Georgakopoulou, A. (1996). The audience shaping of text-strategies in spoken discourse: Adults vs. children addressees and the case of Modern Greek. Journal of Pragmatics 25, 649–674.
Giles, H., J. Coupland, and N. Coupland (Eds.) (1991). Contexts of Accommodation: Developments in Applied Sociolinguistics. Cambridge: Cambridge University Press.
Giles, H. and N. Coupland (1991). Language: Contexts and Consequences. Milton Keynes: Open University Press.
Giles, H. and A. Franklyn-Stokes (1989). Communicator characteristics. In M. Asante and W. Gudykunst (Eds.), Handbook of International and Intercultural Communication. Newbury Park, CA: Sage.
Giles, H., C. Hajek, V. Barker, M.-L. Chen, B. Zhang, M. Hummert, and M. Andersen (2006). Accommodation and institutional talk: Communicative dimensions of police-civilian interactions. In A. Weatherall, B. Watson, and C. Gallois (Eds.), The Social Psychology of Language and Discourse. Basingstoke, UK: Palgrave Macmillan.
Giles, H., M. Willemyns, C. Gallois, and M. C. Anderson (2006). Accommodating a new frontier: The context of law enforcement. In K. Fiedler (Ed.), Social Communication, pp. 129–162. New York: Psychology Press.
Givón, T. (2005). Context as Other Minds. Amsterdam/Philadelphia: John Benjamins.
Gleason, J. B. (1975). Fathers and other strangers: Men’s speech to young children. In D. Dato (Ed.), Developmental Psychology: Theory and Application, pp. 289–297. Georgetown University Press.
Gleason, J. B. and S. Weintraub (1978). Input language and the acquisition of communicative competence. In K. Nelson (Ed.), Children’s Language, Volume 1, pp. 171–222. New York, London, Sydney, Toronto: Gardner Press.
Gleitman, L. R., E. L. Newport, and H. Gleitman (1984). The current status of the motherese hypothesis. Journal of Child Language 11, 43–79.
Goffman, E. (1975). Frame Analysis: An Essay on the Organization of Experience. Cambridge: Harvard University Press.
Goffman, E. (1978). Response cries. Language 54, 787–815.
Goffman, E. (1979). Footing. Semiotica 25, 1–29.
Goldberg, A. (2006). Constructions at Work. Oxford/New York: Oxford University Press.
Goldberg, A., D. Casenhiser, and N. Sethuraman (2004). Learning argument structure generalizations. Cognitive Linguistics 15(3), 289–316.
Goldstein, M. H., J. Schwade, J. Briesch, and S. Syal (2010). Learning while babbling: Prelinguistic object-directed vocalizations indicate a readiness to learn. Infancy 15(4), 362–391.

Gomez, R. L. and L. Gerken (1999). Artificial grammar learning by 1-year-olds leads to specific and abstract knowledge. Cognition 70, 109–135.
Gong, L. (2008). How social is social responses to computers? The function of the degree of anthropomorphism in computer representations. Computers in Human Behavior 24, 1494–1509.
Goodman, J. C., P. S. Dale, and P. Li (2008). Does frequency count? Parental input and the acquisition of vocabulary. Journal of Child Language 35, 515–531.
Goodwin, C. (1986). Audience diversity, participation and interpretation. Text 6(3), 283–316.
Gries, S. T. (2005). Syntactic priming. Journal of Psycholinguistic Research 34(4), 365–399.
Groom, V., L. Takayama, P. Ochi, and C. Nass (2009). I am my robot: The impact of robot-building and robot form on operators. In Proceedings of the Human-Robot Interaction Conference: HRI 2009, San Diego, CA, pp. 31–36.
Grosz, B. J. (1982). Discourse analysis. In R. Kittredge and J. Lehrberger (Eds.), Sublanguage: Studies of Language in Restricted Semantic Domains, pp. 138–174. Berlin, New York: De Gruyter.
Grunow, H., T. J. Spaulding, R. L. Gomez, and E. Plante (2006). The effects of variation on learning word order rules by adults with and without language-based learning disabilities. Journal of Communication Disorders 39, 158–170.
Gumperz, J. (1982). Discourse Strategies. Number 1 in Studies in Interactional Sociolinguistics. Cambridge: Cambridge University Press.
Gumperz, J. J. (1977). Sociocultural knowledge in conversational inference. In M. Saville-Troike (Ed.), Linguistics and Anthropology, pp. 191–211. Washington, D.C.: Georgetown University Press.
Günthner, S. and K. König (2016). Kommunikative Gattungen in der Interaktion: Kulturelle und grammatische Praktiken im Gebrauch. In A. Deppermann, H. Feilke, and A. Linke (Eds.), Sprachliche und kommunikative Praktiken, pp. 177–204. Berlin/Boston: de Gruyter.
Hadelich, K., H. Branigan, M. J. Pickering, and M. W. Crocker (2004). Alignment in dialogue: Effects of visual versus verbal feedback. In Proceedings of the 8th Workshop on the Semantics and Pragmatics of Dialogue, Catalog’04, Barcelona, Spain, pp. 35–40.
Haggan, M. (2002). Self-reports and self-delusion regarding the use of motherese: Implications from Kuwaiti adults. Language Sciences 24, 17–28.
Hall, W. S., W. E. Nagy, and R. Linn (1984). Spoken Words: Effects of Situation and Social Group on Oral Word Usage and Frequency. Hillsdale et al.: Erlbaum.
Halliday, M. (1985). Introduction to Systemic Functional Grammar. London: Arnold.
Halliday, M. and C. Matthiessen (2004). Introduction to Functional Grammar (3rd ed.). London: Arnold.
Halliday, M., A. McIntosh, and P. Strevens (1964). The Linguistic Sciences and Language Teaching. London: Longman.
Halliday, M. A. (1978). Language as a Social Semiotic: The Social Interpretation of Language and Meaning. Baltimore, Md: University Park Press.
Hart, B. and T. R. Risley (1995). Meaningful Differences in the Everyday Experience of Young American Children. Baltimore: Brookes Publishing.
Hasan, R. (1973). Code, register and social dialect. In B. Bernstein (Ed.), Class, Code and Control, pp. 253–292. London & Boston: Routledge & Kegan Paul.
Hasan, R. (2001). Understanding talk: Directions from Bernstein’s sociology. International Journal of Social Research Methodology 4(1), 5–9.

Hatch, E. M. (1983). Psycholinguistics: A Second Language Perspective. Rowley/London/Tokyo: Newbury House.
Hatch, E. M., R. Shapira, and J. Wagner-Gough (1978). ‘Foreigner-talk’ discourse. ITL: Review of Applied Linguistics 39–40, 39–60.
Haugh, M. and A. Liddicoat (2009). Examining conceptualizations of communication. Australian Journal of Linguistics 29(1), 1–10.
Haviland, J. (1988). Minimal maxims: Cooperation and natural conversation in Zinacantán. Mexican Studies IV(1), 79–114.
Haworth, K. (2013). Audience design in the police interview: The interactional and judicial consequences of audience orientation. Language in Society 42(1), 45–69.
Hegel, F. (2012). Effects of a robot’s aesthetic design on the attribution of social capabilities. In Proceedings of the 21st IEEE International Symposium on Robot and Human Interactive Communication, RoMan’12, Paris, France.
Heine, B. and T. Kuteva (2007). The Genesis of Grammar: A Reconstruction. Oxford: Oxford University Press.
Heller, D., K. Skovbroten, and M. K. Tanenhaus (2009). Experimental evidence for speakers’ sensitivity to common vs. privileged ground in the production of names. In PRE-CogSci Workshop on the Production of Referring Expressions, Amsterdam, The Netherlands.
Henzl, V. M. (1973). Linguistic register of foreign language instruction. Language Learning 23(2), 207–222.
Henzl, V. M. (1979). Foreign talk in the classroom. IRAL XVII(2), 159–165.
Herberg, J. S., M. M. Saylor, P. Ratanaswasd, D. T. Levin, and D. M. Wilkes (2008). Audience-contingent variation in action demonstrations for humans and computers. Cognitive Science 32, 1003–1020.
Heritage, J. (1984). A change-of-state token and aspects of its sequential placement. In J. Atkinson and J. Heritage (Eds.), Structures of Social Action: Studies in Conversation Analysis, pp. 299–345. Cambridge University Press.
Heritage, J. (1985). Analyzing news interviews: Aspects of the production of talk for an overhearing audience. In Handbook of Discourse Analysis, Volume 3: Discourse and Dialogue. London: Academic Press.
Heritage, J. (1988). Explanations as accounts: A conversation analytic perspective. In C. Antaki (Ed.), Analysing Everyday Explanation: A Casebook of Methods, pp. 127–144. London etc.: Sage.
Heritage, J. (1998). Oh-prefaced responses to inquiry. Language in Society 29, 291–334.
Heritage, J. (2002). Oh-prefaced responses to assessments: A method of modifying agreement/disagreement. In C. Ford, B. Fox, and S. Thompson (Eds.), The Language of Turn and Sequence, pp. 196–224. New York: Oxford University Press.
Heritage, J. (2005). Cognition in discourse. In H. te Molder and J. Potter (Eds.), Conversation and Cognition, pp. 184–202. Cambridge: Cambridge University Press.
Heritage, J. (2007). Intersubjectivity and progressivity in references to persons (and places). In T. Stivers and N. J. Enfield (Eds.), Person Reference in Interaction: Linguistic, Cultural and Social Perspectives, pp. 255–280. Cambridge: Cambridge University Press.
Heritage, J. and M.-L. Sorjonen (1994). Constituting and maintaining activities across sequences: And-prefacing as a feature of question design. Language in Society 23, 1–29.
Hervey, S. (1992). Registering registers. Lingua 86, 189–206.
Hester, S. and P. Eglin (1997a). Membership categorization analysis: An introduction. In S. Hester and P. Eglin (Eds.), Culture in Action: Studies in Membership Categorization Analysis, pp. 1–23. International Institute for Ethnomethodology and University Press of America.

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.

Bibliography 305

Hester, S. and P. Eglin (1997b). The reflexive constitution of category, predicate and context in two settings. In S. Hester and P. Eglin (Eds.), Culture in Action. Studies in Membership Categorization Analysis, pp. 25–48. International Institute for Ethnomethodology and University Press of America. Hirschberg, J., D. Litman, and M. Swerts (2001). Detecting misrecognitions and corrections in spoken dialogue systems from aware sites. In ISCA Tutorial and Research Workshop on Prosody in Speech and Understanding, Red Bank, NJ. Hirsh-Pasek, K. and R. Treiman (1982). Doggerel: Motherese in a new context. Journal of Child Language 9, 229–237. Hitzler, S. (2013). Recipient design in institutioneller Mehrparteieninteraktion. Gesprächsforschung – Online-Zeitschrift zur verbalen Interaktion 14, 110–132. Hoff-Ginsberg, E. (1986). Function and structure in maternal speech: Their relation to the child’s development of syntax. Developmental Psychology 22, 155–163. Horton, W. S. and B. Keysar (1996). When do speakers take into account common ground? Cognition 59, 91–117. Horton, W. and R. Gerrig (2002). Speakers’ experiences and audience design: Knowing when and knowing how to adjust utterances to addressees. Journal of Memory and Language 47(4), 589–606. Horton, W. and R. Gerrig (2005a). Conversational common ground and memory processes in language production. Discourse Processes 40, 1–35. Horton, W. S. and R. J. Gerrig (2005b). The impact of memory demands on audience design during language production. Cognition 96, 127–142. Horton, W. S. and D. H. Spieler (2007). Age-related differences in communication and audience design. Psychology and Aging 22(2), 281–290. Hurtado, N., V. A. Marchman, and A. Fernald (2008). Does input influence uptake? Links between maternal talk, processing speed and vocabulary size in Spanish-learning children. Developmental Science 11(6), F31–F39. Hutchby, I. (1995). Aspects of recipient design in expert advice-giving on call-in radio. Discourse Processes 19, 219–238. Hutchby, I. (2001). Conversation and Technology: From the Telephone to the Internet. Cambridge: Polity. Hutchby, I. and R. Wooffitt (1998). Conversation Analysis. Cambridge: Polity. Huttenlocher, J., M. Vasilyeva, E. Cymerman, and S. Levine (2002). Language input at home and at school: Relation to syntax. Cognitive Psychology 45, 337–374. Hymes, D. (1972). Models of the interaction of language and social life. In J. J. Gumperz and D. Hymes (Eds.), Directions in Sociolinguistics. The Ethnography of Communication. New York etc.: Holt, Rinehart and Winston. Isaacs, E. A. and H. H. Clark (1987). References in conversations between experts and novices. Journal of Experimental Psychology: General 116, 26–37. Jakobson, R. (1960). Linguistics and poetics. In T. A. Sebeok (Ed.), Style in Language, pp. 350–377. Cambridge, MA: MIT Press. Janicki, K. and A. Mickiewicza (1986). Accommodation in native speaker – foreigner interaction. In J. House and S. Blum-Kulka (Eds.), Interlingual and Intercultural Communication, pp. 169–178. Tübingen: Narr. Jefferson, G. (1974). Error correction as an interactional resource. Language in Society 3, 181–199. Jefferson, G. (1983). Issues in the transcription of naturally-occurring talk: Caricature versus capturing pronunciational particulars. Tilburg Papers in Language and Literature 34, 1–12.

Jefferson, G. (1984). On the organization of laughter in talk about troubles. In J. Atkinson and J. C. Heritage (Eds.), Structures of Social Action: Studies in Conversation Analysis, pp. 346–369. Cambridge: Cambridge University Press. Jefferson, G. (2004). Glossary of transcript symbols with an introduction. In G. H. Lerner (Ed.), Conversation Analysis: Studies from the First Generation. Philadelphia: John Benjamins. Johnson, D. and J. Gardner (2007). The media equation and team formation: Further evidence for experience as a moderator. International Journal of Human-Computer Studies 65, 111–124. Johnson, D., J. Gardner, and J. Wiles (2004). Experience as a moderator of the media equation: The impact of flattery and praise. International Journal of Human-Computer Studies 61(3), 237–258. Johnstone, A., U. Berry, T. Nguyen, and A. Asper (1994). There was a long pause: Influencing turn-taking behaviour in human-human and human-computer spoken dialogues. International Journal of Human-Computer Studies 41, 383–411. Jörissen, S. (2011). ‘Ungleichungen zeigen’. Das Lösen von Übungsaufgaben im Mathematikunterricht. In R. Schmitt (Ed.), Unterricht ist Interaktion! Analysen zur De-facto-Didaktik. Mannheim: Amades. Ju, W. and L. Takayama (2008). Approachability: How people interpret automatic door movement as gesture. In Proceedings of the Conference on Design and Emotion 2008. Jusczyk, P. W., A. D. Friederici, J. M. Wessels, V. Y. Svenkerud, and A. M. Jusczyk (1993). Infants’ sensitivity to segmental and prosodic characteristics of words in their native language. Journal of Memory and Language 32, 402–420. Kanda, T., T. Miyashita, T. Osada, Y. Haikawa, and H. Ishiguro (2008). Analysis of humanoid appearances in human-robot interaction. IEEE Transactions on Robotics 24(3), 725–735. Kaplan, P. S., J.-A. Bachorowski, and P. Zarlengo-Strouse (1999). Child-directed speech produced by mothers with symptoms of depression fails to promote associative learning in four-month-old infants. Child Development 70, 560–570. Karmiloff, K. and A. Karmiloff-Smith (2002). Pathways to Language: From Fetus to Adolescent. Harvard University Press. Karrass, J., J. M. Braungart-Rieker, J. Mullins, and J. Burke Lefever (2002). Processes in language acquisition: The roles of gender, attention, and maternal encouragement of attention over time. Journal of Child Language 29, 519–543. Kashima, Y., H. Bekkering, and E. S. Kashima (2013). Communicative intentions can modulate the perception–action link. Behavioral and Brain Sciences 36(4), 33–34. Katz, J. T. (1981). Children’s second language acquisition: The role of foreigner talk in child-child interaction. International Journal of the Sociology of Language 28, 53–68. Kaye, K. (1980). Why we don’t talk baby talk to babies. Journal of Child Language 7, 489–507. Kempe, V., P. J. Brooks, and L. Pirott (2001). How can child-directed speech facilitate the acquisition of morphology? In Proceedings of the VIIIth International Congress for the Study of Child Language, pp. 1237–1247. Keren-Portnoy, T. (2006). Facilitation and practice in verb acquisition. Journal of Child Language 33, 487–518. Kirjavainen, M., A. Theakston, and E. Lieven (2009). Can input explain children’s me-for-I errors? Journal of Child Language 36(5), 1091–1114. Kitamura, C., C. Thanavishuth, D. Burnham, and S. Luksaneeyanawin (2002). Universality and specificity in infant-directed speech: Pitch modifications as a function of infant age and sex in a tonal and a non-tonal language. Infant Behavior and Development 24, 372–392.

Knoll, M. A., M. Uther, and A. Costall (2009). Effects of low-pass filtering on the judgment of vocal affect in speech directed to infants, adults and foreigners. Speech Communication 51, 210–216. Koda, T. and P. Maes (1996). Agents with faces: The effects of personification of agents. In IEEE Workshop on Robot and Human Communication, pp. 185–194. Koole, T. and J. Ten Thije (2001). The reconstruction of intercultural discourse: Methodological considerations. Journal of Pragmatics 33, 571–587. Kopp, S. (2006). How people talk to a virtual human – conversations from a real-world application. In K. Fischer (Ed.), Proceedings of the Workshop on ‘How People Talk to Computers, Robots, and Other Artificial Communication Partners’, Hansewissenschaftskolleg, Delmenhorst, April 21–23, 2006, pp. 101–111. Kortmann, B. and B. Szmrecsanyi (2009). World Englishes between simplification and complexification. In L. Siebers and T. Hoffmann (Eds.), World Englishes – Problems, Properties and Prospects: Selected papers from the 13th IAWE conference, pp. 265–285. Amsterdam: Benjamins. Kraljic, T. and S. E. Brennan (2005). Prosodic disambiguation of syntactic structure: For the speaker or for the addressee? Cognitive Psychology 50, 194–231. Kraljic, T., A. G. Samuel, and S. E. Brennan (2008). First impressions and last resorts: How listeners adjust to speaker variability. Psychological Science 19(4), 332–338. Krashen, S. D. (1981). Second Language Acquisition and Second Language Learning. Pergamon Press. Krause, J. (1992). Fazit und Ausblick: Registermodell versus metaphorischer Gebrauch von Sprache in der Mensch-Computer-Interaktion. In J. Krause and L. Hitzenberger (Eds.), Computertalk, Number 12 in Sprache und Computer, pp. 157–170. Hildesheim: Olms. Krause, J. and L. Hitzenberger (Eds.) (1992). Computer Talk. Hildesheim: Olms Verlag. Krauss, R. M. and J. S. Pardo (2004). Is alignment always the result of automatic priming? (Commentary on Pickering & Garrod 2004). Behavioral and Brain Sciences 27(2), 203–204. Krauss, R. M. and S. Weinheimer (1966). Concurrent feedback, confirmation and the encoding of referents in verbal communication. Journal of Personality and Social Psychology 4, 343–346. Kraut, R. E. (1982). Social presence, facial feedback, and emotion. Journal of Personality and Social Psychology 42, 853–863. Kriz, S., G. Anderson, and J. G. Trafton (2010). Robot-directed speech: Using language to assess first-time users’ conceptualizations of a robot. In HRI ’10: Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction, New York, NY, USA, pp. 267–274. ACM. Krug, S. (2000). Don’t Make Me Think! A Common Sense Approach to Web Usability. New Riders. Kuhlen, A. K. and S. E. Brennan (2011). Anticipating distracted addressees. Discourse Processes 47, 567–587. Kuhlen, A. K., A. Galati, and S. E. Brennan (2012). Gesturing integrates top-down and bottom-up information: Joint effects of speakers’ expectations and addressees’ feedback. Language and Cognition 4, 17–41. Küntay, A. and D. I. Slobin (1996). Listening to a Turkish mother: Some puzzles for acquisition. In D. Slobin, J. Gerhardt, A. Kyratzis, and J. Guo (Eds.), Social Interaction, Social Context, and Language. Essays in Honor of Susan Ervin-Tripp, pp. 265–286. Mahwah, N.J.: Erlbaum. Küntay, A. and D. I. Slobin (2001). Discourse behavior of lexical categories in Turkish child-directed speech: Nouns vs. verbs. In M. Almgren, A. Barrena, M. Ezeizabarrena, I. Idiazabal, and B. MacWhinney (Eds.), Research on Child Language Acquisition: Proceedings of the 8th Conference of the International Association for the Study of Child Language, pp. 928–946. Cascadilla Press.

Kusters, W. (2003). Linguistic Complexity: The Influence of Social Change on Verbal Inflection. Leiden: University of Leiden Center for Linguistics. Laakso, A. and L. Smith (2004). Pronouns predict verb meanings in child-directed speech. In K. Forbus, D. Gentner, and T. Regier (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society, pp. 767–772. Mahwah, NJ: Lawrence Erlbaum Associates. Laakso, A. and L. B. Smith (2007). Pronouns and verbs in adult speech to children: A corpus analysis. Journal of Child Language 34, 725–763. Laakso, M. and M.-L. Helasvuo (2005). Negotiating meaning, negotiating reference in interaction between parent and one-year-old child. Talk presented at IPrA’05, July 11–15th, 2005, Riva del Garda. Labov, W. (1966). The Social Stratification of English in New York City. Washington, D.C.: Center for Applied Linguistics. Labov, W. (1972). Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press. Ladegaard, H. J. (1995). Audience design revisited: Persons, roles and power relations in speech interactions. Language & Communication 15(1), 89–101. Lakoff, G. (1987). Women, Fire and Dangerous Things. What Categories Reveal about the Mind. The University of Chicago Press. Lakoff, R. (1972). Language and Woman’s Place. New York: Harper & Row. Lam, C. and C. Kitamura (2006). Developmental trends in infant preferences for affective intent in mothers’ speech. In P. Warren and C. I. Watson (Eds.), Proceedings of the 11th Australian International Conference on Speech Science and Technology, University of Auckland, New Zealand, pp. 100–105. Lane, L. W., M. Groisman, and V. S. Ferreira (2006). Don’t talk about pink elephants! Speakers’ control over leaking private information during language production. Psychological Science 17(4), 273–277. Langacker, R. (2000). Grammar and Conceptualization. Berlin: Mouton de Gruyter. Langacker, R. (2008). Cognitive Grammar: A Basic Introduction. New York: Oxford University Press. Lankenau, A. and T. Roefer (2001). A safe and versatile mobility assistant. IEEE Robotics and Automation Magazine 7, 29–37. Lanvers, U. (2004). Gender in discourse behaviour in parent-child-dyads: A literature review. Child: Care, Health & Development 30(5), 481–493. Lee, D. Y. (2001). Genres, registers, text types, domains, and styles: Clarifying the concepts and navigating a path through the BNC jungle. Language Learning and Technology 5(3), 37–72. Lee, E.-J. (2008). Flattery may get computers somewhere, sometimes: The moderating role of output modality, computer gender, and user gender. International Journal of Human-Computer Studies 66, 789–800. Lee, K. M., Y. Jung, J. Kim, and S. R. Kim (2006). Are physically embodied social agents better than disembodied social agents? The effects of physical embodiment, tactile interaction and people’s loneliness in human-robot interaction. International Journal of Human-Computer Studies 64, 962–973. Lee, M. K., S. Kiesler, and J. Forlizzi (2010). Receptionist or information kiosk: How do people talk with a robot? In Computer Supported Cooperative Work, Savannah, Georgia, pp. 31–40. Lee, S., S. Kiesler, I. Lau, and C. Chin (2005). Human mental model of humanoid robot. In IEEE International Conference on Robotics and Automation. Leeman, J. (2003). Recasts and second language development – beyond negative evidence. Studies in Second Language Acquisition 25, 37–63.

Lepper, G. (2000). Categories in Text and Talk. A Practical Introduction to Categorization Analysis. London, etc.: Sage. Levelt, W. (1983). Monitoring and self-repair in speech. Cognition 14, 41–104. Levin, D. T., S. C. Killingsworth, and M. M. Saylor (2008). Concepts about the capabilities of computers and robots: A test of the scope of adults’ theory of mind. In HRI’09, pp. 57–63. Levinson, S. (1983). Pragmatics. Cambridge: Cambridge University Press. Levinson, S. C. (2006). On the human ‘interaction engine’. In N. C. Enfield and S. C. Levinson (Eds.), Roots of Human Sociality: Culture, Cognition and Interaction, pp. 39–69. Oxford/New York: Berg. Levitan, R., S. Beňuš, A. Gravano, and J. Hirschberg (2014). Entrainment and turn-taking in human-human dialogue. In Proceedings of the AAAI Spring Symposium, Stanford. Levow, G.-A. (1998). Characterizing and recognizing spoken corrections in human-computer dialogue. In Proceedings of Coling/ACL ’98. Levow, G.-A. (1999). Understanding recognition failures in spoken corrections in human-computer dialogue. In Proceedings of the ESCA Workshop on Dialogue and Prosody, September 1st–3rd, 1999, De Koningshof, Veldhoven, The Netherlands, pp. 123–128. Lewis, D. (1969). Convention. Cambridge: Harvard University Press. Lieven, E. (1994). Crosslinguistic and crosscultural aspects of language addressed to children. In C. Gallaway and B. J. Richards (Eds.), Input and Interaction in Language Acquisition, pp. 56–73. Cambridge: Cambridge University Press. Lieven, E. (2010). Input and first language acquisition: Evaluating the role of frequency. Lingua 120, 2546–2556. Lieven, E. V. (1978). Conversations between mothers and young children: Individual differences and their possible implication for the study of language learning. In N. Waterson and C. E. Snow (Eds.), The Development of Communication, pp. 173–187. Chichester, New York, Brisbane, Toronto: John Wiley and Sons. Liu, H.-M., F.-M. Tsao, and P. K. Kuhl (2009). Age-related changes in acoustic modifications of Mandarin maternal speech to preverbal infants and five-year-old children: A longitudinal study. Journal of Child Language 36, 909–922. Liu, H.-M., P. K. Kuhl, and F.-M. Tsao (2003). An association between mothers’ speech clarity and infants’ discrimination skills. Developmental Science 6(3), F1–F10. Local, J. (1996). Conversational phonetics: Some aspects of news receipts in everyday talk. In Prosody in Conversation. Cambridge: Cambridge University Press. Lockridge, C. B. and S. E. Brennan (2002). Addressees’ needs influence speakers’ early syntactic choices. Psychonomic Bulletin & Review 9(3), 550–557. Long, M. H. (1981). Input, interaction, and second-language acquisition. In H. Winitz (Ed.), Native Language and Foreign Language Acquisition, Number 379, pp. 259–278. Annals of the New York Academy of Sciences. Lyster, R. and L. Ranta (1997). Corrective feedback and learner uptake: Negotiation of form in communicative classrooms. Studies in Second Language Acquisition 20, 37–66. Lyster, R., K. Saito, and M. Sato (2013). Oral corrective feedback in second language classrooms. Language Teaching 46(1), 1–40. Mackey, A. (1999). Input, interaction, and second language development. An empirical study of question formation in ESL. Studies in Second Language Acquisition 21, 557–587. Mackey, A. (2012). Input, Interaction and Corrective Feedback in L2 Classrooms. Oxford: Oxford University Press.

MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk. Mahwah, NJ: Lawrence Erlbaum. Madlener, K. (2014). Input optimization. Effects of type and token frequency in instructed second language learning. In H. Behrens and S. Pfänder (Eds.), Frequency Effects in Language. What Counts in Language Processing, Acquisition and Change. Berlin, New York: Mouton de Gruyter. Manders, K. and G. Hall (2002). Comparison, basic-level categories, and the teaching of adjectives. Journal of Child Language 29, 923–937. Mani, N. and F. Huettig (2013). Towards a complete multi-mechanism account of predictive language processing. Behavioral and Brain Sciences 36(4), 37–38. Mannle, S. and M. Tomasello (1987). Fathers, siblings, and the bridge hypothesis. In K. Nelson and A. van Kleeck (Eds.), Children’s Language, Volume 6, pp. 23–41. Hillsdale, N.J.: Erlbaum. Manson, J. H., G. A. Bryant, M. M. Gervais, and M. A. Kline (2013). Convergence of speech rate in conversation predicts cooperation. Evolution and Human Behavior 34(6), 419–426. Marchman, V. A., K. A. Adams, E. C. Loi, A. Fernald, and H. M. Feldman (2015). Early language processing efficiency predicts later receptive vocabulary outcomes in children born preterm. Child Neuropsychology, 1–16. Martin, J. R. (1985). Process and text: Two aspects of human semiosis. In D. Benson and W. Greave (Eds.), Systemic Perspectives on Discourse, Volume 1, Selected Theoretical Papers from the 9th International Systemic Workshop, pp. 248–274. Norwood, N.J.: Ablex. Martin, J. R. (1992). English Text. London: Arnold. McClelland, J. and J. Bybee (2007). The gradience of gradience: A reply to Jackendoff. The Linguistic Review 24(2), 437–455. McDonald, L. and D. Pien (1981). Mother conversational behaviour as a function of interactional intent. Journal of Child Language 9, 337–358. McWhorter, J. (2007). Language Interrupted. Signs of Non-Native Acquisition in Standard Language Grammars. Oxford University Press. McWhorter, J. H. (2011). Linguistic Simplicity and Complexity: Why Do Languages Undress? Boston/Berlin: De Gruyter Mouton. Meddeb, E. and P. Frenz-Belkin (2010). What? I didn’t say that! Speaking to write. Journal of Pragmatics 42(9), 2415–2429. Meek, B. A. (2006). And the injun goes “how!”: Representations of American Indian English in white public space. Language in Society 35, 93–128. Meltzoff, A. N., R. Brooks, A. P. Shon, and R. P. Rao (2010). “Social” robots are psychological agents for infants: A test of gaze following. Neural Networks 23, 966–972. Metzing, C. and S. E. Brennan (2003). When conceptual pacts are broken: Partner-specific effects on the comprehension of referring expressions. Journal of Memory and Language 49, 201–213. Mills, G. (2014). Dialogue in joint activity: Complementarity, convergence and conventionalization. New Ideas in Psychology 32, 158–173. Mills, G. J. (2007). The development of semantic co-ordination in dialogue: The role of direct interaction. Ph.D. thesis, Queen Mary University of London, Department of Computer Science. Mills, G. J. and P. G. Healey (2008). Semantic negotiation in dialogue: The mechanisms of alignment. In Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue, pp. 46–53. Mintz, T. H. (2003). Frequent frames as a cue for grammatical categories in child directed speech. Cognition 90, 91–117.

Mintz, T. H., E. L. Newport, and T. G. Bever (2002). The distributional structure of grammatical categories in speech to young children. Cognitive Science 26, 393–424. Mitchell, R. W. (2001). Americans’ talk to dogs: Similarities and differences with talk to infants. Research on Language and Social Interaction 34(2), 183–210. Monaghan, P. and M. H. Christiansen (2004). What distributional information is useful and usable in language acquisition? In Proceedings of the 26th Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum. Monteiro, M. and E. Keating (2009). Managing misunderstandings: The role of language in interdisciplinary scientific collaboration. Science Communication 31(1), 6–28. Moratz, R., K. Fischer, and T. Tenbrink (2001). Cognitive modelling of spatial reference for human-robot interaction. International Journal on Artificial Intelligence Tools 10(4), 589–611. Morel, M.-A. (1989). Computer-human communication. In M. Taylor, F. Neel, and D. Bouhuis (Eds.), The Structure of Multimodal Communication, pp. 323–330. Amsterdam: North-Holland Elsevier. Morgan, J. L., R. P. Meier, and E. L. Newport (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology 19(4), 498–550. Mori, J. (2007). Border crossings? Exploring the intersection of second language acquisition, conversation analysis, and foreign language pedagogy. The Modern Language Journal 91, 849–862. Mühlhäusler, P. (1981). Foreigner talk: Tok Masta in New Guinea. International Journal of the Sociology of Language 28, 93–113. Mühlhäusler, P. (1984). Good and bad pidgin: Nogut yu toktok kranki. In S. Wurm and P. Mühlhäusler (Eds.), Handbook of Tok Pisin, pp. 275–291. Pacific Linguistics. Müller, K. (2000). Lernen im Dialog. Tübingen: Narr. Murray, A. D., J. Johnson, and J. Peters (1990). Fine-tuning of utterance length to preverbal infants: Effects on later language development. Journal of Child Language 17(3), 511–525. Murray, L. and C. Trevarthen (1986). The infant’s role in mother–infant communications. Journal of Child Language 13, 15–29. Mutlu, B. (2011). Designing embodied cues for dialogue with robots. AI Magazine 32(4), 17–29. Mutlu, B., F. Yamaoka, T. Kanda, H. Ishiguro, and N. Hagita (2009). Non-verbal leakage in robots: Communication of intentions through seemingly unintentional behavior. In 6th International Conference on Human-Robot Interaction HRI ’09, pp. 61–68. Nabrings, K. (1981). Sprachliche Varietäten. Tübingen: Gunter Narr. Naigles, L. R. and E. Hoff-Ginsberg (1995). Input to verb learning: Evidence for the plausibility of syntactic bootstrapping. Developmental Psychology 31, 827–837. Naigles, L. R. and E. Hoff-Ginsberg (1998). Why are some verbs learned before other verbs? Effects of input frequency and structure on children’s early verb use. Journal of Child Language 25, 95–120. Nass, C. (2010). The Man Who Lied to his Laptop: What Machines Teach us about Human Relationships. New York: Penguin. Nass, C. and S. Brave (2005). Wired for Speech. How Voice Activates and Advances the Human-Computer Relationship. Cambridge, MA, London: MIT Press. Nass, C. and L. Gong (2004). Ten principles for designing human-computer dialog systems. In D. Dahl (Ed.), Practical Spoken Dialog Systems, pp. 25–40. Dordrecht, Boston, London: Kluwer.

Nass, C., K. Isbister, and E.-J. Lee (2000). Truth is beauty: Researching embodied conversational agents. In J. Cassell, J. Sullivan, S. Prevost, and E. Churchill (Eds.), Embodied Conversational Agents, pp. 374–402. Cambridge, MA/London: The MIT Press. Nass, C. and K. M. Lee (2000). Does computer-generated speech manifest personality? An experimental test of similarity-attraction. CHI Letters 2(1), 329–336. Nass, C. and Y. Moon (2000). Machines and mindlessness: Social responses to computers. Journal of Social Issues 56(1), 81–103. Nass, C. I. (2004). Etiquette equality: Exhibitions and expectations of computer politeness. Communications of the ACM 47(4), 35–37. Nelson, K. (1973). Structure and Strategy in Learning to Talk, Volume 38 of Monographs of the Society for Research in Child Development. Wiley/Blackwell. Newman, J. (2010). Balancing acts: Empirical pursuits in cognitive linguistics. In K. Fischer and D. Glynn (Eds.), Quantitative Methods in Cognitive Semantics: Corpus-driven Approaches. Berlin: Mouton de Gruyter. Newman-Norlund, S. E., M. L. Noordzij, R. D. Newman-Norlund, I. A. Volman, J. P. de Ruiter, P. Hagoort, and I. Toni (2009). Recipient design in tacit communication. Cognition 111, 46–54. Newport, E. L., H. Gleitman, and L. R. Gleitman (1977). Mother, I’d rather do it myself: Some effects and non-effects of maternal speech style. In C. E. Snow and C. A. Ferguson (Eds.), Talking to Children. Language Input and Acquisition, pp. 109–149. Cambridge: Cambridge University Press. Ninio, A. (1979). The naive theory of the infant and other maternal attitudes in two subgroups in Israel. Child Development 50, 976–980. Ninio, A. (2011). Syntactic Development: Its Input and Output. New York: Oxford University Press. Ninio, A. and C. E. Snow (1988). Language acquisition through language use: The functional sources of children’s early utterances. In Y. Levi, I. Schlesinger, and M. Braine (Eds.), Categories and Processes in Language Acquisition, pp. 11–30. Hillsdale, NJ: Erlbaum. Nothdurft, W. (1984). “…äh folgendes Problem äh…”. Die interaktive Ausarbeitung “des Problems” in Beratungsgesprächen. Number 57 in Forschungsberichte des Instituts für Deutsche Sprache Mannheim. Tübingen: Narr. Nowik-Stern, A., M. G. Clarkson, M. K. Morris, K. Shabazz, and M. J. Stephens (1996). Premature infants’ preferences for and responsivity to infant-directed speech. Infant Behavior and Development 19, 653. Ochs, E. and B. Schieffelin (1995). The impact of language socialization on grammatical development. In P. Fletcher and B. MacWhinney (Eds.), The Handbook of Child Language, pp. 73–94. Oxford: Blackwell. Okita, S., J. Bailenson, and D. L. Schwartz (2007). The mere belief of social interaction improves learning. In Proceedings of the 29th Meeting of the Cognitive Science Society, Nashville, USA, August 2007. Onnis, L., P. Monaghan, M. H. Christiansen, and N. Chater (2004). Variability is the spice of learning, and a crucial ingredient for detecting and generalising nonadjacent dependencies. In Proceedings of the 26th Annual Conference of the Cognitive Science Society. Onnis, L., H. Waterfall, and S. Edelman (2008). Learn locally, act globally: Learning language with variation set cues. Cognition 109, 423–430. Oviatt, S. (1995). Predicting spoken disfluencies during human-computer interaction. Computer Speech and Language 9, 19–35. Oviatt, S., J. Bernard, and G.-A. Levow (1998). Linguistic adaptations during spoken and multimodal error resolution. Language and Speech 41(3–4), 419–442.

Oviatt, S., C. Darves, and R. Coulston (2004). Toward adaptive conversational interfaces: Modeling speech convergence with animated personas. ACM Transactions on Computer-Human Interaction (TOCHI) 11(3). Paavola, L., S. Kunnari, I. Moilanen, and M. Lehtihalmes (2005). The functions of maternal verbal responses to prelinguistic infants as predictors of early communicative and linguistic development. First Language 25(2), 173–195. Paepcke, S. and L. Takayama (2010). Judging a bot by its cover: An experiment on expectation setting for personal robots. In Proceedings of Human Robot Interaction (HRI), Osaka, Japan. Pan, B. A., H. Feldman, and C. E. Snow (1993). Parental speech to low-risk and at-risk children. Manuscript. Pan, B. A., M. L. Rowe, J. D. Singer, and C. E. Snow (2005). Maternal correlates of growth in toddler vocabulary production in low-income families. Child Development 76(4), 763–782. Papousek, M., H. Papousek, and M. Haekel (1987). Didactic adjustments in fathers’ and mothers’ speech to their three-month-old infants. Journal of Psycholinguistic Research 16, 306–319. Parise, S., S. Kiesler, L. Sproull, and K. Waters (1996). My partner is a real dog: Cooperation with social agents. In Computer Supported Cooperative Work ’96, pp. 399–408. Pearson, J., J. Hu, H. P. Branigan, M. J. Pickering, and C. Nass (2006). Adaptive language behavior in HCI: How expectations and beliefs about a system affect users’ word choice. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Montréal, April 2006, pp. 1177–1180. Pearson, J., M. Pickering, H. Branigan, J. McLean, C. I. Nass, and J. Hu (2003). The influence of beliefs about an interlocutor on lexical and syntactic alignment: Evidence from human-computer dialogues. In Proceedings of the Twenty-fifth Annual Conference of the Cognitive Science Society, Boston, July 2003. Pearson, J., M. J. Pickering, H. P. Branigan, C. Nass, and J. Hu (2006). 
Influence of prior beliefs and (lack of) evidence of understanding on lexical alignment. In Proceedings of the 10th Annual Conference on Architectures and Mechanisms of Language Processing Conference. Pekarek-Döhler, S. (2011). Emergent grammar for all practical purposes: the on-line formatting of left and right dislocations in French conversations. In P. Auer and S. Pfänder (Eds.), Constructions: Emerging and emergent, pp. 45–87. Berlin / Boston: de Gruyter. Penman, R., T. Cross, J. Milgrom-Friedman, and R. Meares (1983). Mother’s speech to prelingual infants: A pragmatic analysis. Journal of Child Language 10, 17–34. Pentland, A. S. (2010). To signal is human. American Scientist 98, 204–211. Pezzulo, G. and H. Dindo (2013). Intentional strategies that make co-actors more predictable: The case of signaling. Behavioral and Brain Sciences 36(4), 43–44. Pickering, M. and S. Garrod (2013). An integrated theory of language production and comprehension. Behavioral and Brain Sciences 36(4), 1–64. Pickering, M. J. and S. Garrod (2004). Towards a mechanistic psychology of dialogue. Behavioral and Brain Sciences 27, 169–225. Pickering, M. J. and S. Garrod (2007). Do people use language production to make predictions during comprehension? Trends in Cognitive Science 11(3), 105–110. Pine, J. (1992). Maternal style at the early one-word stage: Re-evaluating the stereotype of the directive mother. First Language 12, 169–182. Pine, J. M. (1994). The language of primary caregivers. In C. Gallaway and B. J. Richards (Eds.), Input and Interaction in Language Acquisition, pp. 15–37. Cambridge: Cambridge University Press.

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.

314 Designing Speech for a Recipient

Pine, J. M. and E. V. Lieven (1993). Reanalysing rote-learned phrases: Individual differences in the transition to multi-word speech. Journal of Child Language 20, 551–571. Pirker, H. and G. Loderer (1999). I said “two ti-ckets”: How to talk to a deaf wizard. In Proceedings of the ESCA Workshop on Dialogue and Prosody, September 1st–3rd, 1999, De Koningshof, Veldhoven, The Netherlands, pp. 181–186. Pomerantz, A., J. Ende, and F. Erickson (1995). Precepting in a general medicine clinic: How preceptors correct. In G. Morris and R. Chenail (Eds.), The Talk of the Clinic, pp. 151–169. Hillsdale, New Jersey: Lawrence Erlbaum Associates. Powers, A. and S. Kiesler (2006). The advisor robot: Tracing people’s mental model from a robot’s physical attributes. In HRI ‘06, pp. 218–225. Powers, A., S. Kiesler, S. Fussell, and C. Torrey (2007). Comparing a computer agent with a humanoid robot. In HRI ‘07, pp. 145–152. Prato-Previde, E., G. Fallani, and P. Valsecchi (2006). Gender differences in owners interacting with pet dogs: An observational study. Ethology 112, 64–73. Pratt, J. A., K. Hauser, Z. Ugray, and O. Patterson (2007). Looking at human-computer interface design: Effects of ethnicity in computer agents. Interacting with Computers 19, 512–523. Quam, C., J. Yuan, and D. Swingley (2008). Relating intonational pragmatics to the pitch realizations of highly frequent words in English speech to infants. In Proceedings of the 30th Annual Meeting of the Cognitive Science Society, CogSci 2008, Washington, D.C., pp. 217–223. Rabain-Jamin, J. and E. Sabeau-Jouannet (1989). Playing with pronouns in French maternal speech to prelingual infants. Journal of Child Language 16, 217–238. Raffray, C. N., M. J. Pickering, and H. P. Branigan (2008). Relation priming, the lexical boost, and alignment in dialogue. Behavioral and Brain Sciences 31(4), 394–395. Ramey, C. T. and S. L. Ramey (2004). Early learning and school readiness: Can early intervention make a difference? 
Merrill-Palmer Quarterly 50(4), 471–491. Redford, M. A., B. L. Davis, and R. Miikkulainen (2004). Phonetic variability and prosodic structure in mothers. Infant Behavior & Development 27, 477–498. Reeves, B. and C. Nass (1996). The Media Equation. Stanford: CSLI and Cambridge: Cambridge University Press. Retherford, K. S., B. C. Schwartz, and R. S. Chapman (1981). Semantic roles and residual grammatical categories in mother and child speech: Who tunes into whom? Journal of Child Language 8, 583–608. Richards, B. J. (1994). Child-directed speech and influences on language acquisition: Methodology and interpretation. In C. Gallaway and B. J. Richards (Eds.), Input and Interaction in Language Acquisition, pp. 74–106. Cambridge: Cambridge University Press. Richards, M. and K. Underwood (1984). Talking to machines: How are people naturally inclined to speak? In Proceedings of the Ergonomics Society Annual Conference, pp. 62–67. London: Taylor & Francis. Rickford, J. R. and P. Eckert (2001). Introduction. In P. Eckert and J. R. Rickford (Eds.), Style and Sociolinguistic Variation, pp. 1–18. Cambridge: Cambridge University Press. Rickford, J. R. and F. McNair-Knox (1994). Addressee- and topic-influenced style shift: A quantitative sociolinguistic study. In D. Biber and E. Finegan (Eds.), Sociolinguistic Perspectives on Register, pp. 235–275. Oxford and New York: Oxford University Press. Roberts, F. (2004). Speaking to and for animals in a veterinary clinic: A practice for managing interpersonal interaction. Research on Language and Social Interaction 37(4), 421–446. Roberts, S. J. and J. Bresnan (2008). Retained inflectional morphology in pidgins: A typological study. Linguistic Typology 12, 269–302.


Bibliography 315

Roche, J. (1989). Xenolekte. Struktur und Variation im Deutsch gegenüber Ausländern. Berlin, New York: de Gruyter. Rohde, D. L. and D. C. Plaut (2002). Less is less in language acquisition. In P. Quinlin (Ed.), Connectionist Modelling of Cognitive Development. Hove, UK: Psychology Press. Ross, S. and I. M. Shortreed (1990). Japanese foreigner talk: Convergence or divergence? Journal of Asian Pacific Communication 1(1), 135–145. Rowe, M. L. (2008). Child-directed speech: Relation to socioeconomic status, knowledge of child development and child vocabulary skill. Journal of Child Language 35, 185–205. Rowland, C. F., J. Pine, E. Lieven, and A. L. Theakston (2003). Determinants of acquisition order in wh-questions: Re-evaluating the role of caregiver speech. Journal of Child Language 30, 609–635. Roy, B. C., M. C. Frank, and D. Roy (2009). Exploring word learning in a high-density longitudinal corpus. In Proceedings of the 31st Annual Conference of the Cognitive Science Society, Amsterdam, pp. 2106–2111. Roy, B. C., M. C. Frank, and D. Roy (2012). Relating activity contexts to early word learning in dense longitudinal data. In Proceedings of the 34th Annual Cognitive Science Conference. Sapporo, Japan. Roy, D. (2009). New horizons in the study of child language acquisition. In Proceedings of Interspeech 2009, Brighton. Ryskin, R. A., R. F. Wang, and S. Brown-Schmidt (2016). Listeners use speaker identity to access representations of spatial perspective during online language comprehension. Cognition 147, 75–84. Sachs, J. and J. Devin (1976). Young children’s use of age-appropriate speech styles in social interaction and role-playing. Journal of Child Language 3(1), 83–98. Sacks, H. (1972a). An initial investigation of the usability of conversational data for sociology. In D. Sudnow (Ed.), Studies in Social Interaction, pp. 31–74. New York: The Free Press. Sacks, H. (1972b). On the analyzability of stories by children. In J. Gumperz and D. 
Hymes (Eds.), Directions in Sociolinguistics: The Ethnography of Communication, pp. 325–345. New York: Holt, Rinehart and Winston. Sacks, H. (1984). Notes on methodology. In J. Atkinson and J. Heritage (Eds.), Structure of Social Action: Studies in Conversation Analysis, pp. 21–27. Cambridge: Cambridge University Press. Sacks, H. (1996). Lectures on Conversation, Volume I & II. Oxford: Blackwell. Sacks, H. and E. A. Schegloff (1979). Two preferences in the organization of reference to persons in conversation and their interaction. In G. Psathas (Ed.), Everyday Language: Studies in Ethnomethodology, pp. 15–21. New York: Irvington Publishers. Sacks, H., E. A. Schegloff, and G. Jefferson (1974). A simplest systematics for the organization of turn-taking for conversation. Language 50(4), 696–735. Saffran, J. R., S. D. Pollak, R. L. Seibel, and A. Shkolnik (2007). Dog is a dog is a dog: Infant rule learning is not specific to language. Cognition 105, 669–680. Sandini, G., G. Metta, and D. Vernon (Eds.) (2007). The iCub Cognitive Humanoid Robot. Heidelberg: Springer. Saville-Troike, M. (Ed.) (2003). The Ethnography of Communication: An Introduction (third ed.). Oxford: Blackwell. Saxton, M., P. Backley, and C. Gallaway (2005). Negative input for grammatical errors: Effects after a lag of 12 weeks. Journal of Child Language 32, 643–672. Schank, R. C. and R. Abelson (1975). Scripts, Plans, Goals, and Understanding. Hillsdale, N.J.: Erlbaum.


Schegloff, E. A. (1972a). Notes on a conversational practice: Formulating place. In D. Sudnow (Ed.), Studies in Social Interaction, pp. 75–119. New York: Free Press. Schegloff, E. A. (1972b). Sequencing in conversational openings. In J. J. Gumperz and D. Hymes (Eds.), Directions in Sociolinguistics. The Ethnography of Communication. New York etc.: Holt, Rinehart and Winston. Schegloff, E. A. (1980). Preliminaries to preliminaries: ‘Can I ask you a question’. Sociological Inquiry 50(3–4), 104–152. Schegloff, E. A. (1982). Discourse as an interactional achievement: Some uses of ‘uh huh’ and other things that come between sentences. In D. Tannen (Ed.), Analysing Discourse. Text and Talk. Washington: Georgetown University Press. Schegloff, E. A. (1992). To Searle on conversation: A note in return. In (On) Searle on Conversation, pp. 113–128. Amsterdam and Philadelphia: John Benjamins. Schegloff, E. A. (1997). Whose text, whose context? Discourse & Society 8(2), 165–187. Schegloff, E. A. (2000). On granularity. Annual Review of Sociology 26, 715–720. Schegloff, E. A. (2004). Putting the interaction back into dialogue (commentary on Pickering & Garrod 2004). Behavioral and Brain Sciences 27(2), 207–208. Schegloff, E. A. (2007). A tutorial on membership categorization. Journal of Pragmatics 39, 462–482. Schegloff, E. A. and H. Sacks (1973). Opening up closings. Semiotica VIII(4), 289–327. Scheper-Hughes, N. (1985). Culture, scarcity, and maternal thinking: Maternal detachment and infant survival in a Brazilian shantytown. Ethos 13(4), 291–317. Schermerhorn, P., M. Scheutz, and C. R. Crowell (2008). Robot social presence and gender: Do females view robots differently than males? In HRI ‘08, pp. 263–270. Scheutz, M., J. Forlizzi, and J. Voiklis (2016). Which robot am I thinking about? The impact of action and appearance on people’s evaluations of a moral robot. In Proceedings of the HRI Conference, Christchurch, NZ. Schick, L. (2014). 
Some people: From referential vagueness to social-moral socialization in middle school dance classes. Pragmatics and Society 5(2), 243–270. Schindler, S. and J. Kissler (2016). People matter: Perceived sender identity modulates cerebral processing of socio-emotional language feedback. NeuroImage 134, 160–169. Schmitt, R. and A. Deppermann (2009). “Damit Sie mich verstehen”: Genese, Verfahren und Recipient Design einer narrativen Performance. In M. Buss, S. Habscheid, S. Jautz, F. Liedtke, and J. G. Schneider (Eds.), Theatralität des sprachlichen Handelns: Eine Metaphorik zwischen Linguistik und Kulturwissenschaften, pp. 79–107. München: Fink. Schmitt, R. and R. Knöbl (2013). Recipient design aus multimodaler Sicht. Deutsche Sprache 41(3), 242–276. Schober, M. (1993). Spatial perspective taking in conversation. Cognition 47(1), 1–24. Schober, M. (1995). Speakers, addressees, and frames of reference: Whose effort is minimized in conversations about location? Discourse Processes 20(2), 219–247. Schober, M. (2004). Just how aligned are interlocutors’ representations? (commentary on Pickering & Garrod 2004). Behavioral and Brain Sciences 27, 209–210. Schober, M. and H. H. Clark (1989). Understanding by addressees and overhearers. Cognitive Psychology 21, 211–232. Schober, M. F. (1998). Different kinds of conversational perspective-taking. In S. C. Fussell and R. Kreuz (Eds.), Social and Cognitive Psychological Approaches to Interpersonal Communication, pp. 145–174. Hillsdale: Lawrence Erlbaum.


Schober, M. F. (2005). Conceptual alignment in conversation. In B. Malle and S. Hodges (Eds.), Other Minds: How Humans Bridge the Divide between Self and Others, pp. 239–252. New York: Guilford Press. Schober, M. F. (2009). Spatial dialogue between partners with mismatched abilities. In K. Coventry, T. Tenbrink, and J. A. Bateman (Eds.), Spatial Language and Dialogue. Oxford: Oxford University Press. Schober, M. F. and S. E. Brennan (2003). Processes of interactive spoken discourse: The role of the partner. In A. C. Graesser, M. A. Gernsbacher, and S. R. Goldman (Eds.), Handbook of Discourse Processes, pp. 123–164. Hillsdale: Lawrence Erlbaum. Schriver, K. (1997). The Dynamics of Document Design. New York etc.: John Wiley & Sons Inc. Scollon, R. and S. W. Scollon (2001). Discourse and intercultural communication. In D. Tannen, D. Schiffrin, and H. Hamilton (Eds.), Handbook of Discourse Analysis, pp. 538–547. Oxford: Basil Blackwell. Segal, J. (2010). Infant preferences for two properties of infant-directed speech. Master’s thesis, University of Maryland. Sequeira, P., P. Alves-Oliveira, T. Ribeiro, E. Di Tullio, S. Petisca, F. S. Melo, G. Castellano, and A. Paiva (2016). Discovering social interaction strategies for robots from restricted-perception Wizard-of-Oz studies. In Proceedings of the International Conference on Human-Robot Interaction, Christchurch, NZ, pp. 197–204. Sethuraman, N. (2004). Influence of parental input on learning argument structure constructions. In On-Line Proceedings Supplement of Boston University Child Language Development Conference 28. Shatz, M. (1977). How to do things by asking: Form-function relations in mothers’ questions to children. Paper presented at the biennial meeting of the Society for Research in Child Development, New Orleans. Shechtman, N. and L. M. Horowitz (2003). Media inequality in conversation: How people behave differently when interacting with computers and people. In CHI 2003, April 5–10, 2003, Ft. 
Lauderdale, Florida, Volume 5, pp. 281–288. Shintel, H. and B. Keysar (2009). Less is more: A minimalist account of joint action in communication. Topics in Cognitive Science 1, 260–273. Slaney, M. and G. McRoberts (2003). BabyEars: A recognition system for affective vocalizations. Speech Communication 39, 367–384. Slobin, D. I. (1975). On the nature of talk to children. In Foundations of Language Development: A Multi-Disciplinary Approach, Volume I, pp. 283–297. New York, San Francisco, London: Academic Press. Smith, S. W., N. Scholnick, A. Crutcher, and M. Simeone (1991). Foreigner talk revisited: Limits on accommodation to nonfluent speakers. In J. Blommaert and J. Verschueren (Eds.), The Pragmatics of International and Intercultural Communication. Amsterdam: Benjamins. Snow, C. E. (1972). Mothers’ speech to children learning language. Child Development 43, 549–565. Snow, C. E. (1977). Mothers’ speech research: From input to interaction. In C. E. Snow and C. A. Ferguson (Eds.), Talking to Children. Language Input and Acquisition, pp. 31–49. Cambridge: Cambridge University Press. Snow, C. E. (1994). Beginning from baby talk: Twenty years of research on input and interaction. In C. Gallaway and B. J. Richards (Eds.), Input and Interaction in Language Acquisition, pp. 3–12. Cambridge: Cambridge University Press.


Snow, C. E. (1995). Issues in the study of input: Finetuning, universality, individual and developmental differences, and necessary causes. In P. Fletcher and B. MacWhinney (Eds.), The Handbook of Child Language, pp. 180–193. Oxford: Blackwell. Snow, C. E. and B. A. Goldfield (1983). Turn the page please: Situation-specific language acquisition. Journal of Child Language 10, 551–569. Snow, C. E., R. Perlmann, and D. Nathan (1987). Why routines are different: Toward a multiple-factors model of the relation between input and language acquisition. In K. E. Nelson and A. van Kleek (Eds.), Children’s Language, Volume 6, pp. 65–97. Hillsdale, N.J.: Erlbaum. Snow, C. E., R. van Eeden, and P. Muysken (1981). The interactional origins of foreigner talk: Municipal employees and foreign workers. International Journal of the Sociology of Language 28, 81–91. Soderstrom, M. (2007). Beyond babytalk: Re-evaluating the nature and content of speech input to preverbal infants. Developmental Review 27, 501–532. Soderstrom, M., A. Seidl, D. G. Kemler Nelson, and P. W. Jusczyk (2003). The prosodic bootstrapping of phrases: Evidence from prelinguistic infants. Journal of Memory and Language 49(2), 249–267. Sokolov, J. L. (1993). A local contingency analysis of the fine-tuning hypothesis. Developmental Psychology 29(6), 1008–1023. Spence, M. J. and D. S. Moore (2003). Categorization of infant-directed speech: Development from 4 to 6 months. Developmental Psychobiology 42, 97–109. Spenneberg, D. and F. Kirchner (2002). Scorpion: A biomimetic walking robot. Robotik 1679, 677–682. Spenneberg, D. and F. Kirchner (2007). The bio-inspired scorpion robot: Design, control and lessons learned. In H. Zhang (Ed.), Climbing and Walking Robots, Towards New Applications, pp. 197–218. I-Tech Education and Publishing. Sperber, D. and D. Wilson (1986). Relevance: Communication and Cognition. Oxford: Blackwell. Stefanowitsch, A. and S. T. Gries (2005). Covarying collexemes. 
Corpus Linguistics and Linguistic Theory 1(1), 1–46. Stent, A. J., M. K. Huffman, and S. E. Brennan (2008). Adapting speaking after evidence of misrecognition: Local and global hyperarticulation. Speech Communication 50, 163–178. Stoll, S., K. Abbot-Smith, and E. Lieven (2009). Lexically restricted utterances in Russian, German, and English child-directed speech. Cognitive Science 33, 75–103. Strait, M., C. Cunning, and M. Scheutz (2014). Let me tell you! Investigating the effects of robot communication strategies in advice-giving situations based on robot appearance, interaction modality and distance. In Proceedings of the 2014 Human-Robot Interaction Conference HRI’14, Bielefeld, Germany. Strupka, E., O. Niebuhr, and K. Fischer (2016). Influence of robot gender and speaker gender on prosodic entrainment in HRI. In Interactive Session at the IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2016), New York City. Suchman, L. (2007). Human-Machine Reconfigurations: Plans and Situated Actions (2nd ed.). Cambridge: Cambridge University Press. Sylvester-Bradley, B. and C. Trevarthen (1978). Baby talk as an adaptation to the infant’s communication. In N. Waterson and C. E. Snow (Eds.), The Development of Communication, pp. 75–92. Chichester, New York, Brisbane, Toronto: John Wiley and Sons. Takayama, L. and C. Pantofaru (2009). Influences on proxemic behaviors in human-robot interaction. In Proceedings of IROS, pp. 5495–5502.


Tannen, D. (2004). Talking the dog: Framing pets as interactional resources in family discourse. Research on Language and Social Interaction 37(4), 399–420. Tannen, D. and C. Wallat (1986). Medical professionals and parents: A linguistic analysis of communication across contexts. Language in Society 15(3), 295. Tausczik, Y. and J. Pennebaker (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology 29, 24–54. Taylor, N., W. Donovan, S. Miles, and L. Leavitt (2009). Maternal control strategies, maternal language usage and children’s language usage at two years. Journal of Child Language 36, 381–404. Thar, E. (2009). Telefonbasierte natürlichsprachliche Dialogsysteme – Telefonieren mit einem Computer? Talk presented at the 44th AAG in Sonderborg, May 2009. Theakston, A. L., E. V. Lieven, J. M. Pine, and C. F. Rowland (2004). Semantic generality, input frequency and the acquisition of syntax. Journal of Child Language 31, 61–99. Thimm, C. (1998a). Alter – Sprache – Geschlecht. Sprach- und kommunikationswissenschaftliche Perspektiven auf das höhere Lebensalter. Frankfurt/New York: Campus. Thimm, C. (1998b). Partnerhypothesen, Handlungswahl und sprachliche Akkommodation. In B. Henn-Memmesheimer (Ed.), Sprachvarianz als Ergebnis kalkulierter Handlungswahl, pp. 49–63. Tübingen: Niemeyer. Thomason, S. G. (2001). Language Contact: An Introduction. Edinburgh University Press. Tomasello, M. (2003). Constructing a Language: A Usage-Based Theory of Language Acquisition. Cambridge, Mass.: Harvard University Press. Torrey, C., S. R. Fussell, and S. Kiesler (2013). How a robot should give advice. In Proceedings of the 2013 Human-Robot Interaction Conference, Boston, Mass. Trainor, L. J. and R. N. Desjardins (2002). Pitch characteristics of infant-directed speech affect infants’ ability to discriminate vowels. Psychonomic Bulletin & Review 9(2), 335–340. Trainor, L. J. and C. A. Zacharias (1998). 
Infants prefer higher-pitched singing. Infant Behavior and Development 21(4), 799–806. Trude, A. M. and S. Brown-Schmidt (2012). Talker-specific perceptual adaptation during online speech perception. Language and Cognitive Processes 27(7/8), 979–1001. Trudgill, P. (1972). Sex, covert prestige and linguistic change in the urban British English of Norwich. Language in Society 1(2), 175–195. Turkle, S. (1982). The subjective computer: A study in the psychology of personal computation. Social Studies of Science 12, 173–205. Turkle, S. (2006). A nascent robotics culture: New complicities for companionship. Technical report, AAAI Technical Report series. Turkle, S., W. Taggart, C. B. Kidd, and O. Dasté (2006). Relational artifacts with children and elders: The complexities of cybercompanionship. Connection Science 18(4), 347–361. Uther, M., M. Knoll, and D. Burnham (2007). Do you speak e-ng-l-i-sh? A comparison of foreigner- and infant-directed speech. Speech Communication 49, 2–7. Valdman, A. (1981). Sociolinguistic aspects of foreigner talk. International Journal of the Sociology of Language 28, 41–52. Valian, V. and L. Casey (2003). Young children’s acquisition of wh-questions: The role of structured input. Journal of Child Language 30, 117–143. Van Vugt, H., J. N. Bailenson, J. F. Hoorn, and E. A. Konijn (2010). Effects of facial similarity on user responses to embodied agents. ACM Transactions on Computer Human Interaction 17(2), 1–27.


Varonis, E. and S. Gass (1985). Non-native–non-native conversations: A model for negotiation of meaning. Applied Linguistics 6, 71–90. Veneziano, E. (2001). Displacement and informativeness in child-directed talk. First Language 21, 323–356. Vogt, P. and J. D. Mastin (2013). Rural and urban differences in language socialization and early vocabulary development in Mozambique. In M. Knauff, M. Pauen, N. Sebanz, and I. Wachsmuth (Eds.), Proceedings of the 35th Annual Conference of the Cognitive Science Society, pp. 3687–3692. Austin, TX: Cognitive Science Society. von Stutterheim, C. and U. Kohlmann (1998). Selective hearer adaptation. Linguistics 36(3), 517–549. Vosoughi, S., B. C. Roy, M. C. Frank, and D. Roy (2010a). Contributions of prosodic and distributional features of caregivers’ speech in early word learning. In Proceedings of the 32nd Annual Cognitive Science Conference. Portland, OR. Vosoughi, S., B. C. Roy, M. C. Frank, and D. Roy (2010b). Effects of caregiver prosody on child language acquisition. In Proceedings of the 5th International Conference on Speech Prosody. Chicago, IL. Wagner, J. (1996). Foreign language acquisition through interaction – a critical review of research on conversational adjustment. Journal of Pragmatics 26, 215–235. Wagner, J. and R. Gardner (2004). Introduction. In R. Gardner and J. Wagner (Eds.), Second Language Conversations, pp. 1–17. London: Continuum. Walker, K. and L. Armstrong (1995). Do mothers and fathers interact differently with their child or is it the situation which matters? Child: Care, Health and Development 21(3), 161–181. Wang, X. (2010). Features of input of second language acquisition. Journal of Language Teaching and Research 1(3), 282–284. Warren-Leubecker, A. and J. N. Bohannon III (1982). The effects of expectation and feedback on speech to foreigners. Journal of Psycholinguistic Research 11(3), 207–215. Weisleder, A. and A. Fernald (2013). 
Talking to children matters: Early language experience strengthens processing and builds vocabulary. Psychological Science 24(11), 2143–2152. Weiss, A., R. Bernhaupt, D. Schwaiger, M. Altmaninger, R. Buchner, and M. Tscheligi (2009). User experience evaluation with a wizard of oz approach: Technical and methodological considerations. In 9th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2009, Paris, France, December 7–10, 2009, pp. 303–308. Weizman, Z. O. and C. E. Snow (2001). Lexical input as related to children’s vocabulary acquisition: Effects of sophisticated exposure and support for meaning. Developmental Psychology 37(2), 265–279. Weppelman, T. L., A. Bostow, R. Schiffer, E. Elbert-Perez, and R. S. Newman (2003). Children’s use of the prosodic characteristics of infant-directed speech. Language & Communication 23, 63–80. Werker, J., J. Pegg, and P. McLeod (1994). A cross-language investigation of infant preference for infant-directed communication. Infant Behavior and Development 17, 321–331. Wolters, M., K. Georgila, J. D. Moore, and S. E. Macpherson (2009). Being old doesn’t mean acting old: How older users interact with spoken dialog systems. ACM Transactions on Accessible Computing 2(1), 2.1–2.39. Wooffitt, R., N. Fraser, N. Gilbert, and S. McGlashan (1997). Humans, Computers and Wizards: Conversation Analysis and Human (Simulated) Computer Interaction. Routledge.


Wrede, B., S. Buschkämper, C. Muhl, and K. Rohlfing (2006). Analysing feedback in HRI. In K. Fischer (Ed.), Proceedings of the Workshop on ‘How People Talk to Computers, Robots, and Other Artificial Communication Partners’, Hansewissenschaftskolleg, Delmenhorst, April 21–23, 2006, pp. 38–53. Wrede, B., S. Kopp, K. Rohlfing, M. Lohse, and C. Muhl (2010). Feedback in asymmetric interactions. Journal of Pragmatics 42(9), 2369–2384. Wu, S. and B. Keysar (2007). The effect of information overlap on communication effectiveness. Cognitive Science 31, 1–13. Yaeger-Dror, M. (2001). Primitives of a system for “style” and “register”. In P. Eckert and J. R. Rickford (Eds.), Style and Sociolinguistic Variation, pp. 170–184. Cambridge: Cambridge University Press. Yee, N., J. Bailenson, M. Urbanek, F. Chang, and D. Merget (2007). The unbearable likeness of being digital: The persistence of nonverbal social norms in online virtual environments. CyberPsychology and Behavior 10(1), 115–121. Yont, K. M., C. E. Snow, and L. Vernon-Feagans (2003). The role of context in mother-child interactions: An analysis of communicative intents expressed during toy play and book reading with 12-month-olds. Journal of Pragmatics 35, 435–454. Yoon, S. O. and S. Brown-Schmidt (2013). What is the context of prediction? Behavioral and Brain Sciences 36(4), 48–49. Yoon, S. O. and S. Brown-Schmidt (2014). Adjusting conceptual pacts in three-party conversation. Journal of Experimental Psychology: Learning, Memory and Cognition 40(4), 919–937. Zeschel, A. (2013). Gebrauchsfrequenz und Registerspezifik als Determinanten der Konstruktionswahl. Zeitschrift für Literaturwissenschaft und Linguistik 169, 81–96. Zoeppritz, M. (1985). Computer talk? Technical Report TN 85.05, IBM Heidelberg Scientific Center. Zoltan-Ford, E. (1991). How to get people to say and type what computers can understand. International Journal of Man-Machine Studies 34, 527–647. Zuengler, J. (1991). 
Accommodation in native–non-native interactions: Going beyond the “what” to the “why” in second-language research. In H. Giles, J. Coupland, and N. Coupland (Eds.), Contexts of Accommodation. Developments in Applied Sociolinguistics. Cambridge: Cambridge University Press.


Index

A accountability  16, 188, 192–193, 195 adjacency pair  20, 142–143, 244, 288 affect  41, 91, 108–109, 111, 114, 129, 133, 136, 146, 152–154, 199, 259, 265 affordances  29, 43, 76, 79, 172, 177, 188, 259–260, 268–272, 276, 278, 283, 285–288 age  8, 22–23, 28, 36, 43, 87, 90–93, 95, 100, 103, 106, 110–116, 119, 122, 124, 131–132, 140, 142, 144–146, 190, 260, 275, 280 Aibo  81–83, 85–86, 178, 182–183, 187, 195–196, 202, 204–208, 211, 213–214, 216, 218–220, 222, 226, 228, 234, 236–246, 273–275, 278, 281, 283 alignment  2, 4–7, 9, 12, 37, 47–56, 64, 66, 80, 118, 128, 138–140, 143, 145, 147, 165, 169, 171–177, 200–209, 225, 228, 230, 234, 236, 238–241, 243–249, 259, 261, 266–267, 270, 273–274, 277–279, 281, 283, 286, 288 automatic  54, 138–139, 143, 238, 240, 277 strategic  239, 241, 243 anthropology  17, 67, 87 anthropomorphism 190–191, 207, 210, 218, 225 appearance  10, 40, 70, 79–81, 103, 172, 176–178, 181, 210–212, 214–216, 218, 236–237, 257, 275–276, 281

argument structure constructions  98, 118, 137, 253, 287 asymmetry  45, 64, 222, 287 B Baby Talk  8, 87, 101, 106, 122, 138, 177, 260, 264–265 basic level terms  101, 105, 252, 269 birth order  110 body orientation  19, 103 bonding  112, 165, 169 C candidate understanding  18, 103, 134, 141, 159 Cantonese 123 caricature  155, 262–263 CASA  188–189, 195, 198, 215–216, 238, 269, 276, 278, 282 categories inference-rich  17, 42, 68, 238 Childes  96, 99–100, 103–104, 117–118 Chinese 92 clarification  1, 18, 26, 50, 60–61, 77–79, 85–86, 105, 122, 157–159, 161, 163, 181–182, 192–193, 210, 231–233, 256, 258, 265, 274 classroom discourse  22, 110, 130, 134, 168 classroom interaction  149–150, 155, 158, 162–163, 169–170 common ground  8, 15, 31–34, 36, 39–43, 45–47, 50, 59, 64–65, 77, 140, 210, 243, 245, 247, 274–275, 278, 282, 288

Communication Accommodation Theory  25, 55, 165, 170–171, 174 complexity  96, 100–101, 106, 115–116, 118, 123, 126–130, 133, 137, 140, 158, 207, 217, 227, 234–235, 238, 254, 257, 266, 284–285, 287, 289 computational simulation  90, 97, 99–100, 128, 138 concealing  21, 34, 40, 44, 50 conceptual pact  43, 53 consistency  16, 37, 94, 121, 139 construction grammar  98, 100, 118, 163, 271 constructivism  87, 127, 136, 275, 286 contextualization cue  46 contingency  4, 55, 63, 65, 102, 104, 107, 120, 128–129, 134–135, 142, 145, 147, 216, 235, 281–282, 287 convergence  26–27, 47, 166, 171, 174–175 Conversation Analysis  3, 6, 13, 16–19, 47, 61, 65, 69, 73, 75–76, 78, 249 corpus studies  29, 35, 49, 69–70, 72, 74–76, 79–81, 86, 94–96, 99–100, 103, 117–118, 131–132, 138, 163, 178–181, 186–187, 192, 197, 201–202, 205–206, 211–218, 220–221, 228, 231–232, 238–239, 244, 249, 254 counterproductive strategy  9, 123, 160–161, 167, 169, 258 Czech 150


D deprecatory language  197 depression  109, 131, 143 deviant case analysis  73–74, 178 dialogic resonance  49 diminutives  95, 122–123, 252–253 disalignment 50 disfluency  57–58, 89, 96, 131, 150, 175 distribution  23, 28–30, 34, 45, 49, 56, 79–80, 99–101, 108, 117–118, 139, 181, 197–198, 214, 246, 253, 273, 275 divergence  27, 50, 165–167, 171, 174–175 duration  89–92, 108, 113–115, 131–132, 151–153, 180, 200 Dutch 156 E eavesdropper  19, 21 ecological validity  4, 68 education  24, 40, 91, 106, 110–112, 131, 133, 143, 166 egocentricity  15, 34, 37–39, 45, 231, 282 emic  18, 76, 272 English  8, 23, 28, 55, 81, 83–84, 88, 96, 99, 106, 113, 115, 118, 123, 151–156, 160, 162, 166, 168–169, 173–175, 182, 184, 198, 201–203, 209–211, 214, 221, 227, 251–252, 254, 262–263, 269 AAVE 23–24 American  153, 184 Australian  113, 115 British 28 Liverpool 28 Native American  262–263 entrainment 47 ethnicity  23–24, 45, 189 Ethnography of Communication 42 etic  30–32, 46, 68, 72, 271–272, 275 etiquette  104, 190 exemplification 159 expert  21, 41, 44, 70, 172, 278 expertise  40, 42, 172, 195, 278 eye gaze  19, 144, 257, 268, 281

F F0  51, 88, 90, 94, 109, 114, 131–132, 139, 151, 287 face 26 face needs  19, 26 familiarity  19, 23–24, 35, 43, 95, 99, 107, 109, 118, 120, 137, 153, 198, 236, 263, 286 fathers  106–109, 112–113, 144, 261 feedback signal  18, 58–60, 62–63, 85, 144, 146, 203, 213, 217, 224, 230, 235, 279 field  28–30, 34, 264 finetuning  9, 95, 102, 115–119, 142, 145, 152, 173, 175, 204, 266 fixed frames  92, 98–99, 102, 127, 188 footing 20 fossilization  164, 167 frequency  80, 90, 93, 95, 98, 100, 108, 117, 127, 132, 135, 137–138, 163, 201, 254, 276 friends  41, 43, 149, 155, 165, 168–169 G gender  22, 28, 45, 73, 95, 107–109, 113–114, 122, 143, 189–190, 211, 216–218, 225, 253, 256 grammatical  23, 25, 29, 36, 39, 46, 48–49, 52, 87, 94–101, 104–105, 108, 110, 114–115, 136, 154–158, 160, 162, 164–165, 167–171, 175, 185–187, 199, 207, 213, 215, 217, 223, 226, 238, 253–257, 259, 264, 271–273, 283, 287 grammatical gender  95, 253, 256 genre  30–31, 42, 269, 272 German  14, 95, 99, 108, 117, 156–157, 167, 182, 197, 210 gestalt properties  151, 164, 253 H Hebrew 151 hesitation  61, 150 heterogeneity  9–10, 31, 115, 150, 158, 161, 167, 198, 218, 258, 261–264, 269, 271, 281, 284 Hiri Motu  157, 165, 168, 250

homogeneity  9, 12, 106, 167, 260–261, 269
humanoid robots  178, 190, 216, 234–235
hyperarticulation  91–92, 114–115, 150, 153, 180

I
identity  11, 23, 25–27, 55, 64, 175–176, 260, 276–277, 287
innateness hypothesis  87, 126, 136
intelligibility  131
intensity  88, 131–132, 139
interaction engine  2
interactional pessimism  160–161, 258–259
interactional sociolinguistics  20, 45
Interactive Alignment Model  37, 48, 50–51, 55–56, 140, 247, 277
interdisciplinarity  2, 13, 18, 65–67, 75
intonation  5, 61–62, 86, 88–90, 93–94, 108, 113, 115, 122, 124, 132, 153, 181–182, 204, 207, 215–216, 235, 256, 262, 272
isolated words  93

J
Japanese  166
joint action  1–2, 6, 15, 40–42, 48, 51, 121, 267, 289
joint attention  80, 103, 107, 112, 120, 202, 211, 259, 280

L
laboratory studies  10, 68, 70
language ownership  174
linguistic adjustments
  interactional  102, 159–164, 167, 169–170, 255, 259, 273
  lexical  13, 24–25, 27, 41, 46, 48, 55, 86, 92, 94, 96, 101–104, 108, 114, 121, 124, 131, 133–134, 137, 146, 154, 156–159, 165, 167, 169, 181–183, 199, 201, 205, 208, 212, 215, 217, 228, 238, 242–244, 246, 256, 259, 278–279, 283

Copyright © 2016. John Benjamins Publishing Company. All rights reserved.

Index 325

linguistic adjustments (continued)
  phonetic  13, 22, 27, 55, 88, 91–92, 108, 114, 151, 153, 178–181, 255, 257–259, 266, 270, 272
  phonological  13, 27, 29, 88, 90, 93, 123, 141, 150–151, 153–154, 167, 178–179, 181, 198, 253, 258, 262, 272
  pragmatic  1, 49, 87, 101, 103, 105, 110, 119, 158, 167, 187, 194, 197, 215, 220, 253, 257, 272–273
  prosodic  13, 21–22, 35, 51, 61–62, 88–91, 93–94, 108, 113, 124–126, 129, 132, 150–151, 153–154, 167, 178–181, 198, 204, 207, 225, 227, 230, 253, 257–259, 266, 272, 279
  semantic  1, 17, 28, 31, 48–49, 53, 97, 99, 101–105, 108, 110, 116–117, 119, 121, 128, 131, 135, 137, 139, 158, 167, 184, 187, 199, 252, 256, 264, 272
  syntactic  13–14, 27, 35, 48–49, 94–97, 99, 115–117, 119, 128, 130, 132, 135, 137, 139, 183, 185, 199–201, 208, 228, 238, 244, 256–257, 259, 272, 283
linguistic modification
  interactional  159–164, 167, 169, 255, 259, 273
linguistics  6, 67, 289
  computational  67
  developmental  67
  sociolinguistics  6, 13, 20, 22, 45
loudness  150, 154, 179–181, 193, 226, 232, 256

M
members’ categories  15, 18
membership analysis  13, 15–18, 74, 76–78, 172, 210, 234–235
membership categorization analysis  17–18
membership categorization device  16

methodology  1–2, 10, 12–13, 16, 29, 34–35, 47, 52, 58, 61, 65–66, 68–74, 76–77, 118–119, 122, 135–136, 150, 179, 201, 215, 289
mindless transfer  27, 188–191, 196, 198, 212–215, 237, 247, 266
minimal pairs  96
MLU  115–120, 124–125, 128, 132, 173, 176, 187, 203, 207, 212–214, 217, 252
mode  28–29, 44, 204, 264
modification
  grammatical  23, 25, 29, 36, 39, 46, 48–49, 52, 87, 94–101, 104–105, 108, 110, 114–115, 136, 154–158, 160, 162, 164–165, 167–171, 175, 185–187, 199, 207, 213, 215, 217, 223, 226, 238, 253–257, 259, 264, 271–273, 283, 287
  input  49, 52, 71, 75, 87, 90, 95, 97–98, 100–101, 105, 111, 113, 118, 126–137, 146, 160, 162–164, 167, 169–171, 179, 198, 253, 258
monologue  29, 35, 51, 69
motherese  87
multimodality  19

N
naïve theories  111–112, 143
nativism  136
negation  15, 154, 156–157, 256–257
non-native speakers  9, 55, 70, 149–150, 159–160, 165–166, 171, 173–175, 253–255, 262, 265
normalization  80
normativity  16, 188, 192–193, 195
norms  23, 25, 39–40, 42, 78, 114, 263
novice  44, 70

O
omission
  article  154, 157, 165, 251
  auxiliary  154
  copula  23, 96, 117, 154–157, 250, 252–253
  function words  165, 250

on-line production  69
one-bit model  4
overaccommodation  25, 165
overhearer  19–22, 40, 44, 49, 52, 56–57, 62, 78, 166, 280, 287

P
parentese  87
participation status  20, 53
participation structure  19–22, 25, 52
passive  94, 96, 132
past tense  155
pauses  89–90, 108, 150, 152, 154, 179–180, 193, 198–199, 256–257, 262
perspective taking  44–45, 48, 51, 65, 202, 272, 275
philosophy of language  32
phonotactics  90, 92–93
pidgin  157, 162, 165, 168–169, 174, 250–251, 254–256
Pioneer  81, 83, 86, 180–181, 187, 197, 211, 214–216, 222, 226, 236–237, 275
pitch  88–94, 108, 113–115, 123–124, 138, 150–153, 200, 212, 266
politeness  14, 21, 54–55, 100, 166, 188, 190, 194, 196, 198, 200, 203, 207, 213, 220–221, 224–227, 238, 246
possessives  23, 98, 154, 157, 253
poverty of stimulus  127
prediction by simulation  56, 267
present tense  23, 94, 212
pretense  103, 113, 146–147, 218, 221, 245, 268, 278
priming  37, 48–49, 51, 53, 138–140, 206, 238–239, 241, 279
principle of least collaborative effort  44, 159, 233
prosody  25, 46, 61

R
recast  94, 105, 107, 112, 118, 129, 134, 159, 163–164

recipient design  1, 3, 13–16, 18–20, 23, 74, 198, 249, 277, 283, 288
  counterfactual  14
redundancy  101, 119, 145
referee design  22–24
reformulation  72, 74, 76, 78–79, 104–106, 116, 122, 135, 140, 159, 175–176, 181, 239, 251–252, 280
register  1, 6–9, 11, 13, 27–31, 79, 87, 106, 149, 151, 154, 156, 198, 249, 260–261, 263–264, 269–272, 284–285
  probabilistic  284
register marker  28, 198
register studies  79
register theory  13, 27, 29, 269, 284
relevance theory  32–33
repetition  48–49, 53, 61, 95, 106, 108, 116, 122–124, 126, 128, 135, 139–140, 142, 146, 153, 159, 163, 175–176, 178–180, 195, 199, 232, 280
response cries  78
robots as social actors  188–189, 195, 198, 215–216, 238, 269, 276, 278, 282
Rolland  84–86, 178, 180, 182, 184, 192–197, 216, 220–223, 227–230, 232–233, 237–238, 246–247, 257, 273, 275, 281
Russian  95, 99

S
scarcity  112
Scorpion  81–82, 86, 178, 180–181, 187, 197, 211, 213–216, 222, 226, 236–237, 275
semantic configuration  264
SES  25, 91, 106, 109–112, 122, 130, 143, 261
siblings  8, 16, 90, 107, 110, 112–113, 131, 136
similarity attraction theory  26
simplification  7, 11, 114, 127–128, 157–158, 172, 212, 249–252, 254–261, 265–266, 285

simplified register  11, 87, 106, 149, 261, 263–264, 285
singing  88
situation  5, 9, 21, 26–32, 35, 40, 43, 46, 48, 50–51, 53–55, 57, 65, 69–70, 72–76, 79–81, 88, 102, 119–121, 125, 130, 141, 168–172, 176–177, 184, 186–188, 200–201, 205, 215, 218–219, 221, 225, 235–236, 240, 247, 259–264, 266–274, 276, 278, 283–289
  asymmetrical  64
  definition  272
  etic definition  27, 46, 171, 176, 264, 272
  members’ definition  69, 72–73, 76, 79, 263, 285
  model  48, 50–51, 53, 269
  type  28, 31, 264, 269
skewed input  98, 100, 127, 163, 253
slow speech  88–89, 150
social class  22, 111
social distance  25, 166
social identity theory  26
social practice  69, 191, 288
social psychology  6, 13, 68–69, 248
social status  11, 19, 26, 64, 168
socio-economic status  25, 91, 106, 109–112, 122, 130
Spanish  95, 153, 175
Speech Accommodation Theory  166
speech rate  26, 51, 88–89, 91, 94, 113–114, 117, 150–155, 167, 173–174, 179–180, 198
speech technology  75
speech to
  computers  1, 28, 54–55, 69, 71, 128, 177, 180–181, 186, 188–192, 196–198, 200–201, 210, 215–216, 219–221, 228–229, 231, 236, 239, 246, 278
  dogs  1, 8, 123, 137, 144, 211–212, 214, 228
  dolls  8
  elderly  8, 26
  pets  21, 219, 260, 266, 277

speeding up  173, 258
statistical analysis  69–70, 80, 133
stereotype  3, 10, 26, 154–155, 170, 172, 175–176, 211, 262–263, 268, 276, 283
stranger  41–43, 70, 145–146, 268, 282
stress  88–90, 150, 158–159, 164, 177, 287
style  20–21, 23–24
subcategorization  97
synchronization  47
synonymy  97, 158
syntactic dependency  97
Systemic Functional Grammar  28, 30–31, 264

T
tag question  20, 154, 157, 161
talk by babies  139
talking-down  165–166
Tangram  42, 57
taxonymy  97
Teacher Talk  26, 110, 130, 149, 162–163, 165, 169, 171, 173
technical vocabulary  183
telegraphic style  95, 185–186, 199, 252, 256
tenor  28–29, 264
tense marking  39, 95, 123, 156, 230, 250, 256, 263
Thai  113, 115
time pressure  4, 50
Tok Masta  156–157, 162, 165, 168, 174, 262
Tok Pisin  156–157, 162, 165, 250, 262
topic  5, 24, 30, 51, 61–63, 107, 112, 120, 153, 159, 174, 194, 208, 279
transcription conventions  86
turn allocation  20
turn-taking  49, 52, 143, 188, 192–193, 222, 288
two-bit models  286
type-token ratio  116, 155, 203, 213, 215, 217

U
universals  2, 8, 94, 112, 126, 250–251
usage-based model  270–271, 284

V
variability  88, 90, 92–93, 97, 107, 113, 115, 127, 149, 169, 186, 191, 215, 224, 226–228, 234, 237, 252–253, 257, 260, 262, 264, 266, 268, 273

variation sets  96–97, 252, 273
variety  1, 8
vowel space  91–92, 114–115, 152–153, 253, 266, 272

W
warmth  114, 152
well-formedness  96, 101, 123, 155
Wizard-of-Oz  71
word order  95–96, 99, 137, 156, 250, 252, 256

X
xenolect  149

Y
Yimas Pidgin  157, 165, 168, 250

This study asks how speakers adjust their speech to their addressees, focusing on the potential roles of cognitive representations such as partner models, automatic processes such as interactive alignment, and social processes such as interactional negotiation. The nature of addressee orientation is investigated on three candidates for so-called ‘simplified registers’: speech to children (also called motherese or baby talk), speech to foreigners (also called foreigner talk) and speech to robots. The volume integrates research from various disciplines, such as psychology, sociolinguistics, psycholinguistics and conversation analysis, and offers both overviews of child-directed, foreigner-directed and robot-directed speech and in-depth analyses of the processes involved in adjusting to a communication partner.

“Fischer has written a thorough, clear, and masterful contribution to understanding how people talk to children, foreigners, and robots. Do people have a model of their conversational partner, and if so, what is it? Fischer provides one of the most complete and penetrating reviews of these questions I have read. The test of an important work of science is not only how many questions it has answered, but how many it has raised. Fischer has succeeded admirably on both.”
Herbert H. Clark, Stanford University

“A distinctive and valuable contribution to the field.”
John Bateman, University of Bremen

ISBN 978 90 272 5675 1

John Benjamins Publishing Company