Robots that Talk and Listen: Technology and Social Impact 9781614514404, 9781614516033




Edited by Judith A. Markowitz

Editor: Judith A. Markowitz, 5801 North Sheridan Road #19A, Chicago IL 60660, USA

ISBN 978-1-61451-603-3
e-ISBN (PDF) 978-1-61451-440-4
e-ISBN (EPUB) 978-1-61451-915-7

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at

© 2015 Walter de Gruyter, Inc., Berlin/Boston/Munich
Cover image: Kalista Mick, Honda Research Institute, USA
Typesetting: PTP-Berlin Protago-TEX-Production GmbH, Berlin
Printing and binding: CPI books GmbH, Leck
♾ Printed on acid-free paper
Printed in Germany

Contents
List of contributing authors
Preface

Part I: Images
Steve Mushkin
My robot
I Introduction
II Generative-research methodology
III What children want from technology
A Methodology
B Findings and discussion
IV Robots
A Prior research
B Robot study
V Conclusions
Judith Markowitz
Cultural icons
I Introduction
II Robot as killer
A The androids of R.U.R.
B The Frankenstein monster
C Managing killer robots
III Robot as servant
A Karakuri ningyo
B Golem
IV Robot as lover
A Pygmalion’s statue
B Robots loving humans
C Managing robot love
V Conclusion
David F. Dufty
Android aesthetics: Humanoid robots as works of art
I Introduction
II The history of automata for art and entertainment
A Automata in ancient times
B Clockmaking and automata
C Karakuri ningyo
D Animatronics
E An android as a self portrait: the work of Hiroshi Ishiguro
F Hanson robotics
III The Philip K. Dick android
IV Robot components
V Building a dialogue system
A Gricean maxims
B Architecture
C Conversing like Philip K. Dick
VI Conversational competence
A Be orderly
B Background noise
C Cooperative principle
D Competence
VII The scientific value of androids
VIII The value of androids as art
IX The uncanny valley: a possible obstacle to artistic androids
X Consciousness
XI Conclusion

Part II: Frameworks and Guidelines
Bilge Mutlu, Sean Andrist, and Allison Sauppé
Enabling human-robot dialogue
I Introduction
II Review of the literature
A Multimodal, multiparty dialogue
B Situated interaction
C Joint action
D Linguistic and nonverbal effectiveness
E Adaptive dialogue
III A framework for human-robot dialogue systems
A Multimodal language processing
B Domain processing
C Task model
D Dialogue model
E Multimodal production
F Adaptive dialogue
IV Enabling effective human-robot dialogue
A Task model for instruction and repair
B A production model for expert robot speech
C Summary
V Opportunities and challenges for future work
A Linking task and dialogue models
B Development of reusable models
C Open sharing of models and components
VI Conclusion
Fumiko Nazikian
Robots can talk – but can they teach?
I Introduction
II Androids in the classroom
III Foreign-language teaching
A The evolution of foreign-language teaching
B The ACTFL guidelines
C Assessing accuracy in Japanese
IV Robots and the ACTFL guidelines
V Identifying the difficulties facing Japanese-language learners
A Learning sounds and prosody
B Grammar
C Sociocultural aspects
D Pragmatic strategies
E Can a speech-enabled robot teach?
VI Global communication and the intercultural speaker
A Dialogue and the intercultural speaker
B Robots as an intercultural link
VII Conclusion
Nicole Mirnig and Manfred Tscheligi
Comprehension, coherence and consistency: Essentials of Robot Feedback
I Introduction
II Prior work
III A framework for human-robot interaction
A Introduction
B Robot feedback
C Mental models
D Three basic principles
IV Conclusion

Part III: Learning
Jonathan H. Connell
Extensible grounding of speech for robot instruction
I Introduction
A Eldercare as a domain
B Language and learning
C Cultural bootstrapping
II Grounding substrate
A Object finding
B Object properties
C Gesture recognition
D Speech interpretation
E Manipulation routines
III Demonstration of abilities
A Scene understanding
B Object naming
C Semantic web access
D Procedure learning
IV Adding motivation
Alan R. Wagner
Lies and deception: Robots that use falsehood as a social strategy
I Introduction
II Prior work
III Basic elements
IV Framework
A Representing an interaction
B Outcome-matrix transformation
C Stereotyping
V Implementation
A Examining the factors that influence the decision to lie
B Using stereotypes and partner modeling to predict the cost of lying
VI Summary and future work
VII Conclusion
Joerg C. Wolf and Guido Bugmann
Robotic learning from multimodal instructions: a card game case study
I Introduction
II Related work
III Human-to-human instruction
IV System components: instructor input
A Overview
B Speech
C Non-verbal input
D Multimodal integration
E Temporal and semantic integration
V Robot agent learning
A Overview
B Rule frames
C Action selection at execution time
D Mapping issues
VI Dialogue management (DM)
VII System evaluation
A Approach
B Experiment 1: dealing instruction
C Experiment 2: teaching four rules
D Errors per rule
VIII Discussion of errors
A Human error
B Dialogue errors
C Grammar coverage
D Manipulation recognition and multimodal integration
IX Summary and conclusions
A Summary
B Corpus-based approach
C Demonstration channel
D Multimodal integration
E Conclusion

Part IV: Design
François Grondin and François Michaud
Real-time audition system for autonomous mobile robots
I Introduction
II Issues and challenges in robot audition
A Microphones
B Reverberation
C Environmental noise
D Ego noise
E Real-time performance
III ManyEars: an open framework for robot audition
A Localization
B Tracking
C Separation
D Post-filtering
IV Recognition
A Automatic speech recognition (ASR)
B Speaker recognition
C Emotion, music and daily sounds recognition
V Conclusion
Sandra Y. Okita and Victor Ng-Thow-Hing
The effects of design choices on human-robot interactions in children and adults
I Introduction
II Prior work – the evolution of the role of robots
III Social schemas and social metaphors
IV Design choices in lower-level communication modalities
A Type of voice
B Speed setting of gestures
V Design choices in higher-level communication modalities
A Proxemics and social schemas
B Level of attention
VI Effects of developmental differences on design choices
A Level of intelligent behavior in robots
B Realism and contingency level
VII Conclusions and future work

Part V: Conclusion
Roger K. Moore
From talking and listening robots to intelligent communicative machines
I Introduction
II Looking for solutions
A Beyond speech
B Beyond words
C Beyond meaning
D Beyond communication
E Beyond dialogue
F Beyond one-off interactions
III Towards intelligent communicative machines
A Achieving an appropriate balance of capabilities
B A consolidated perspective
C Beyond human abilities
IV Conclusion
Index


List of contributing authors
Sean Andrist, Graduate Research Assistant, Human-Computer Interaction Laboratory, Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI USA http:// sandrist@
Guido Bugmann, Associate Professor (Reader) in Intelligent Systems, Centre for Robotics and Neural Systems, The Cognition Institute, School of Computing and Mathematics, University of Plymouth, United Kingdom
Jonathan Connell, Research Scientist, IBM T.J. Watson Research Center, Yorktown Heights, NY USA
David Dufty, Senior Research Officer, Australian Bureau of Statistics, Canberra, Australia http://
François Grondin, Junior Engineer, IntRoLab Laboratory, Interdisciplinary Institute of Technological Innovation (3IT), Université de Sherbrooke, Sherbrooke, Quebec, Canada https://
Judith Markowitz, Founder and President, J. Markowitz, Consultants, Chicago, IL USA http://
François Michaud, Director, Interdisciplinary Institute of Technological Innovation (3IT), Université de Sherbrooke, Sherbrooke, Quebec, Canada and Professor, Department of Electrical Engineering and Computer Engineering, IntRoLab, Université de Sherbrooke, Sherbrooke, Quebec, Canada
Nicole Mirnig, Research Fellow, Information and Communication Technologies & Society (ICT&S) Center and Christian Doppler Laboratory for “Contextual Interfaces,” University of Salzburg, Salzburg, Austria
Roger K. Moore, Professor, Department of Computer Science, University of Sheffield, Sheffield, United Kingdom http://www.dcs.shef.
Steve Mushkin, Founder and President, Latitude Research, Beverly, MA, US
Bilge Mutlu, Director, Human-Computer Interaction Laboratory and Assistant Professor, Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI USA http://
Fumiko Nazikian, Director, Japanese Language Program and Senior Lecturer in Japanese, Department of East Asian Languages and Cultures, Columbia University, New York, NY USA http://
Victor Ng-Thow-Hing, Principal Scientist, Honda Research Institute USA Inc.
Sandra Okita, Associate Professor of Technology and Education, Columbia University, Teachers College, New York, NY USA http://www.
Allison Sauppé, Graduate Research Assistant, Human-Computer Interaction Laboratory, Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI USA aterrell@
Manfred Tscheligi, Co-Director, Information and Communication Technologies & Society (ICT&S) Center and Director Christian Doppler Laboratory for “Contextual Interfaces,” University of Salzburg, Salzburg, Austria http://www.icts.sbg.
Alan Wagner, Senior Research Scientist, Georgia Tech Research Institute and the Institute for Robotics and Intelligent Machines, Atlanta, GA USA
Joerg Wolf, Research Associate, School of Computing and Mathematics, Plymouth University and System Architect, Navigation, Elektrobit Automotive GmbH

About the Editor
Dr. Judith Markowitz is known internationally as a thought leader in speech processing. She has published extensively, has been active in standards development, and served as the technology editor of Speech Technology Magazine for more than ten years. She recently co-edited two books on advanced natural-language solutions with Dr. Amy Neustein, Editor of De Gruyter’s series entitled Speech Technology and Text Mining in Medicine and Healthcare. In 2006, Judith was awarded IEEE Senior Member status.

Preface
Robots That Talk and Listen assembles a comprehensive vision of speech-enabled robots from multiple vantage points. Between its covers you will find science and culture, hardware and software, frameworks and principles, learning and teaching, and technology and art – in the past, the present, and the future. The perspectives represented are those of roboticists, speech scientists, and educators; researchers from industry and academia; young children and seasoned scholars. The product of this diversity is a compendium that advances the art and science of spoken-language processing in robots. This 12-chapter anthology is partitioned into four parts followed by a conclusion that extends the vision beyond speech and language. The chapters in Part I provide images of robot pasts and futures. The first chapter presents a robotic future as imagined by those most likely to live in a world filled with social robots: children. It is replete with pictorial and verbal images derived from two generative studies of what children want from technology. The next two chapters probe the past for knowledge about the present. The first of these presents an overview of the origins and evolution of sentiments about robots. To accomplish that, it employs cultural icons that epitomize three roles that we expect robots to fill: killer (the fear that they will turn against us), servant (the expectation that they will continue to serve us), and lover (the prospect that we will engage in romantic liaisons with them). Part I is rounded out with an edifying survey of mechanical humans made for art and entertainment. That chapter then delves more deeply into the art and science involved in constructing the Philip K. Dick android, emphasizing the dialogue precepts and engineering tools that were incorporated into it. The three chapters in Part II employ elements of human-human interaction to generate frameworks for creating effective human-robot interactions.
The first chapter specifies a powerful framework for multimodal and multiparty dialogues. The authors test the model in two vastly different types of robot-human interaction: managing errors and misunderstandings as an instructor, and communicating varying levels of expertise as a tour guide. The following chapter delineates the linguistic, social, and cultural knowledge that must be incorporated into a robot tasked with teaching a foreign language. Using the Japanese language as a model, the chapter identifies methods for instruction, practice, and evaluation of language skills. The author then reflects upon the growing need for intercultural awareness and sensitivity. Part II concludes with a fascinating exposition of three fundamental principles of communication that are critical for effective robot-human interaction: Comprehension, Coherence, and Consistency. Each of these principles is examined and demonstrated in studies involving robot-human interaction.

The third part provides three distinct approaches to robot learning. This part begins with an in-depth presentation of an approach to the grounding of linguistic, perceptual, and motor skills in the real world. This method for grounding enables a mobile robot to learn instructions and to extend that knowledge to new commands, objects, and motor skills through spoken and gestured instruction. The chapter then concludes with a thoughtful exploration of the elements of self-motivation. The second chapter provides an enlightening exposition of various kinds of lying, which leads to the construction of a framework for teaching robots how to use lies as a social strategy. That framework is tested in a series of card games that vary in the degree to which deception benefits or harms each of the players. A card game is also employed in the final chapter in this part of the book. The game is used to test a corpus-based learning system that was derived from human-human instruction combining speech with gesture and card manipulation.

Part IV contains two chapters on design. The first offers an intriguing look at the external world through the sensors of a speech-enabled, autonomous robot. It delineates the acoustic challenges developers face when designing for dynamic noise and multiparty interactions. The authors then present methods and tools for overcoming them while remaining within the limited power resources of a mobile robot. The next chapter provides an illuminating examination of how low- and high-level design choices affect the success of robot-human interactions. The authors provide powerful evidence in favor of human-centric design from studies of multimodal interactions with both adults and children. The chapter concludes by extending the purview of human-centric design to awareness of the characteristics of target-user populations (e.g., developmental differences). The conclusion rounds out the book by returning to the future.
The author challenges developers to move “beyond speech, beyond words, beyond meaning, beyond communication, beyond dialogue and beyond one-off interactions.” He offers a vision of robots that understand human behavior and are capable of engaging in human (and superhuman) communication. In compiling this anthology, the editor has endeavored to construct an authoritative resource on speech-enabled robots for engineers, system developers, linguists, cognitive scientists, futurists and others interested in understanding, developing, and utilizing speech- and language-enabled robots. The incorporation of heterogeneous perspectives into a single source is predicated on the belief that social robots will play a transformational role in human society and that a true appreciation of their impact entails familiarity with a panorama of perspectives.

November 2014

Judith A. Markowitz, Chicago, IL USA

Part I: Images

Steve Mushkin

My robot
Abstract: Children in industrialized nations are growing up in a world populated by digital devices and content, and that technology has been making its way into elementary education. Yet the application and design of technologies for classrooms are determined by adults, with minimal input from children beyond observing how children respond to them. It is our position that effective use of advanced technology in educational settings should reflect the vision and expectations of the children who will be using it. Our objective is to help technology developers, educators and parents design and use advanced technology in more effective ways. This chapter describes two studies that use generative-research methodology to identify children’s expectations of advanced technologies in educational and personal settings. In the first study we asked 201 children aged 12 years and younger from eight different countries to draw a picture of something that they would like their computers to do differently. The majority (77 %) of them want computers to interact with them in more intuitive ways, such as using speech. Many of them also anthropomorphized computers as robots. In the second study we asked 348 children from six countries to write stories about a robot that would serve as their personal robot. We found that for our participants the boundary between humans and machines is blurred. They view robots as peers – friends, study-buddies, and even caretakers – rather than as tools. Based on these findings, we recommend ways in which robots, speech, and other technologies can be applied to educational environments.

I Introduction
An increasing number of children and young adults are growing up in a world filled with digital devices, content, and services, such as video games, computers, cell phones, and the Internet. Prensky (2001) calls these individuals “digital natives” because they are “‘native speakers’ of the digital language of computers, video games and the Internet” (Prensky 2001:1). These digital natives belong to the generations that will live in a world that could be populated by autonomous, social robots and other incarnations of artificial intelligence. Since those technologies are already making their way into educational settings, the objective of the research presented in this chapter is to help technology developers, educators and parents design advanced technology in more effective ways. It is our view that the effectiveness of advanced systems for educational settings will be greatly enhanced if they reflect the vision of the digital natives who will be using them. There is, however, little research to guide such development. This chapter describes two studies that use generative-research methodology to identify what young children from around the world expect from advanced technologies in educational and personal settings.
The chapter is organized as follows: Section II describes generative-research methodology; Section III presents a study on the changes children want to see in computing and Internet technology; and Section IV presents a literature review on children’s responses to robots, followed by our second study on children’s images of robots. We conclude with a summary of our findings.

II Generative-research methodology
Generative-research methodology is a broad grouping of methods considered to be human-centric research because they treat people as collaborators and idea-generators (Hanington and Martin 2012). The goal is to give participants room to express themselves creatively in some form while grounding that activity in specific research questions. Generative research can be powerful for unlocking concepts and ideas that aren’t necessarily easy to articulate. Additionally, people reveal a lot about themselves (e.g., problems, needs, dreams, aspirations) during the creative process, providing a rich context that researchers might not elicit through other means. Because of these characteristics, generative-research methodologies have proven useful for education design as a way to elicit abstract ideas more easily than standard verbal methods such as interviews and surveys (Hanington 2007). A generative-research study typically involves small samples. It begins with the creation of one or more artifacts (e.g., drawings, collages, stories). The artifacts are seen as containing creative projections of the target concepts that can facilitate subsequent verbal expression of those concepts. Analysis of the material collected in a generative study can employ a variety of techniques that range from simple counting of occurrences to various types of content analysis. Among the most frequently used techniques is clustering of common items as a way to generate categories (Robson 2002).

III What children want from technology
In 2010, we conducted a study designed to gather information about what digital natives want computers and the Internet to do differently (also see Latitude Research 2010). The key study questions were:
1. What does the next generation of digital natives expect and desire from technology, and how does this differ across world regions?
2. How can we engage children as authors and inventors of future technology, not just passive recipients?
3. How can young minds help companies develop unexpected content and technology experiences that resonate with people of all ages?

A Methodology
We employed generative-research techniques with 201 children aged twelve and under from eight different countries: Argentina, Australia, Chile, Colombia, India, Mexico, South Africa, and the United States. The children were identified through a third-party panel provider that recruited adult panelists with children in the 7–12 age range who agreed to help guide their children through the activity. We asked each child to draw her or his answer to the question, “What would you like your computer or the Internet to do that it can’t do right now?” We also interviewed parents about their children’s use of technology and participation in a number of online activities. Latitude employed a coding scheme to score the presence of specific themes in children’s inventions (e.g., type of interface, degree of interactivity, physical-digital convergence, and user’s desired end-goal).
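The coding step described above turns each drawing into a record of which themes are present, and results are then reported as the share of children whose drawings contain a given theme, overall or by region. The Python sketch below illustrates that tallying step under stated assumptions: the theme labels, the helper names `theme_prevalence` and `prevalence_by_region`, and the sample records are all hypothetical, not Latitude’s actual coding scheme or data.

```python
from collections import Counter

# Hypothetical theme labels for illustration only; the study's actual coding
# scheme (interface type, interactivity, convergence, end-goal) is not reproduced.
THEMES = ("intuitive_interface", "humanlike", "confers_ability", "video_tools")


def theme_prevalence(coded_drawings):
    """Percentage of drawings in which each theme was coded as present.

    Each drawing is a dict mapping a theme label to True (present);
    a missing label counts as absent.
    """
    if not coded_drawings:
        return {theme: 0.0 for theme in THEMES}
    counts = Counter()
    for codes in coded_drawings:
        for theme in THEMES:
            if codes.get(theme, False):
                counts[theme] += 1
    n = len(coded_drawings)
    return {theme: round(100 * counts[theme] / n, 1) for theme in THEMES}


def prevalence_by_region(coded_drawings, regions):
    """Group drawings by the child's region, then compute prevalence per group."""
    groups = {}
    for codes, region in zip(coded_drawings, regions):
        groups.setdefault(region, []).append(codes)
    return {region: theme_prevalence(items) for region, items in groups.items()}


if __name__ == "__main__":
    # Made-up sample records, not data from the study.
    drawings = [
        {"intuitive_interface": True, "humanlike": True},
        {"humanlike": True, "video_tools": True},
        {"confers_ability": True},
        {"intuitive_interface": True, "humanlike": True, "confers_ability": True},
    ]
    regions = ["US", "India", "India", "US"]
    print(theme_prevalence(drawings))
    print(prevalence_by_region(drawings, regions))
```

Percentages rather than raw counts keep groups of different sizes comparable, which regional comparisons of this kind require.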

B Findings and discussion

The study question was: What does the next generation of digital natives expect and desire from technology, and how does this differ across world regions? In answer, we found the following technology features to be equally desired by all the children in the study: intuitive interfaces (e.g., speech, touch screen), “confers ability/knowledge,” “social/humanlike,” and video tools. The children expressed a desire to interact with technology more intuitively, and they tended to characterize technology in fundamentally human terms. More than three-quarters of them (77 %) drew pictures that represented dynamic, human-level responsiveness on the part of technology, and 43 % drew themselves or another person interacting with their creations. Of the children who specified an interface, 20 % explicitly requested verbal/auditory controls, as shown in figures 1.1a and 1.1b. Parent interviews revealed that the expectation of voice-controlled technology was even more widespread (see figure 1.1c). This category overlapped with proposals to make technology more social. Although future technology as a whole was characterized as humanlike, robots were represented as more akin to humans than were keyboards, PCs, or the Internet. Parent interviews supported this view; figure 1.1d provides a typical example. Although anthropomorphism appeared in the suggestions of children from all geographic areas, there were regional and cultural differences within this category. Children in South Africa, India, and Latin America were more likely than those in Europe, Australia, and North America to anthropomorphize computers, imagining them as friends or teachers that could share their experiences or help them accomplish a goal (see figure 1.1d). Twenty-six percent of the children proposed ideas that facilitated sharing with others, such as friends, teachers, parents, and even the robot (figure 1.1d). In the interview about her drawing, a seven-year-old girl from the United States said, “I want to video kids on the other side of the world using a different kind of language.”

   Part I: Images

I want a computer that talks to me when I’m playing my games. It helps me build adventures of my own, and avoid obstacles in the game.

My daughter would like a computer that talked with her like a friend, so that they could share things like friends. [translation]

The computer becomes 3-dimensional and, instead of a keyboard, it’s controlled by voice. [translation]

My son wishes the computer was a robot he could take everywhere with him – to play chess with him or soccer outside... in other words, he wants it to be a friend he can share with his other friends [translation]

Figure 1.1: Images of future technology; (a) boy, U.S.; (b) girl age 11, Denmark; (c) parent, girl age 7, Argentina; (d) parent, boy age 7, Colombia

1 My robot

The category identified as “confers ability/knowledge” addresses the study question: How can we engage children as authors and inventors of future technology,
not just passive recipients? This category refers to the desire for technology that provides quick and easy access to information. One-quarter of participants in South Africa and India imagined computers that would assist with or output knowledge (such as help solving a math problem), and 74 % of children in those regions envisioned technologies that aided with real-world tasks or abilities, such as cleaning one’s room or learning ballet. These results indicate that educators have a real opportunity to create new experiences in which technology seems to disappear and children experience the Web directly as a source of learning and knowledge. This category is also expressed in the impetus to have technology that supports creativity. One-quarter of the children’s inventions centered on art or design, the same proportion as those that favored gaming. Nearly one-third of all the children went beyond simple creations, envisioning entire platforms for creating games, Web sites, action figures, and so on. Children’s interest in a host of design fields (industrial, landscape, fashion, Web, and more) reflects the visual richness of the online world, as well as the can-do creative drive that technology encourages. One-third of the children also invented technologies that would empower users by fostering knowledge or otherwise “adult” skills, such as speaking a different language or learning how to cook. Another expression of this category is the fuzzy boundary between “online” and “offline.” For these children, technology is no longer something that mediates experience, but something that pervades it. This convergence of the actual and virtual took many forms. Nearly 4 in 10 of the participants imagined immersive experiences of physical spaces (e.g., real or simulated travel) or devices that assisted physical activities (e.g., playing sports).
A full 9 % explicitly cited 3D effects. Information delivered over the Internet enables real-world actions and choices. Taken together, the four major categories of features specified by the children help answer the third study question: How can young minds help companies develop unexpected content and technology experiences that resonate with people of all ages? More advanced, humanlike interactions with technology create opportunities for in-class and at-home learning and can make learning feel more like play. For these digital natives, learning systems equipped with networking functionality and real-time natural-language processing can build a greater sense of independence and control, whether the material covered is for school or for personal development (e.g., creating art, cooking, or playing an instrument). Children across the world created technologies that seamlessly meld online and offline experiences, such as computers that “print” real food or that allow the user to touch objects displayed on the screen. There are significant opportunities for networked objects (e.g., the Internet of Things) to reinvent gaming in traditionally offline spaces. For example, users could accrue lifestyle points, tracked automatically, when they use their running shoes or take out the recycling bin.

IV Robots

The study described in this section examines children’s stories and drawings about robots in educational and personal settings. It builds on the study described in Section III and on prior work on children’s responses to robots.

A Prior research

Some research on children and robots examines children’s “uncanny-valley” response. The uncanny valley was proposed by Masahiro Mori in a 1970 essay. Mori hypothesized that humans respond increasingly positively to robots as they become more humanlike, until the robots become almost, but not quite, human in both appearance and behavior. At that point, responses plunge into what Mori called the “uncanny valley.” He further conjectured that once android design becomes indistinguishable from humans, reactions to androids will again become positive. MacDorman and Ishiguro (2006) observed children interacting with androids and found age differences with regard to the phenomenon. Woods and Dautenhahn (2006) surveyed children’s assessments of 40 images of robots varying in appearance from those that closely resembled humans (“human-like”) to those that were more machine-like (“human-machine”). The children rated each robot image on appearance, personality, and emotion. The researchers found that “children judged human-like robots as aggressive, but human–machine robots as friendly” (Woods and Dautenhahn 2006:1390). Both research teams concluded that children exhibit the uncanny-valley effect and that designers need to take it into account when designing robots for education. Other research has examined children’s responses to robots more generally. Tung and Chang (2013) observed children’s responses to robots that varied both in anthropomorphic appearance and in social behaviors, including speech. They determined that “Children perceive robots are more socially and physically attractive when they exhibit sufficient social cues” (Tung and Chang 2013:237). The effect of social behaviors appeared to apply to highly anthropomorphic robots as well as to less humanlike ones.
Although some of their findings differ from those of Woods and Dautenhahn, they make similar recommendations for the design of robots for education: “Middle- to low-level anthropomorphic designs combined with appropriate social cues can enhance children [sic] preferences and acceptance of robots” (Tung and Chang 2013:237). A few studies have focused specifically on children’s acceptance of various kinds of language. Looija, Neerincx, and de Lange (2008) surveyed 8- and 9-year-old children after they interacted with three bots: a text interface, a virtual character, and a physical robot. They found that the children enjoyed the virtual and physical bots more than the text interface and rated the physical robot as the most fun. Mubin
et al. (2010) assessed the acceptability of three kinds of robot language: unconstrained natural language, constrained language, and artificial language. “Constrained language” refers to the practice of limiting grammar and vocabulary. Constrained languages are often used in real-world robots for pragmatic reasons, such as improving speech-recognition accuracy. Artificial languages are also used in some robots, such as Hasbro’s Furby toy, which speaks “Furbish.” Mubin et al. performed Wizard of Oz tests with children aged 9 to 12 who played a card game with a robot. Afterwards, the children evaluated their experiences. The researchers found that children accepted all three language styles and did not appear to object to learning an artificial language in order to play the game. It isn’t clear how the findings about artificial language would apply to real-world education unless that language is actually a foreign language the robot is teaching. Acceptance of robots in classroom and play environments is often assessed by observing children’s patterns of interaction with the robots. Tanaka, Cicourel, and Movellan (2007), for example, put a dynamic robot into a classroom of toddlers and found that the children’s social interaction with the robot increased steadily. By the end of the five-month project, the children treated the robot much as they treated each other. When the researchers changed the robot to make its speech and other behavior more rigid and less social, the children’s interactions with it diminished as well. This corroborates the findings of Tung and Chang (2013) described earlier. In a study by Ruiz-del-Solar et al. (2010), a robot gave lectures to groups of children aged 10 through 13. Afterwards, the children evaluated the robot and its presentation on a linear scale of grades from 1 (poor) to 7 (excellent). The robot received an average score of 6.4, roughly 90 % of the maximum score.
More than half (59.6 %) of the children also rated the robot’s presentation as excellent, and less than 1 % rated it as bad or very bad. In addition, 92 % supported the idea of allowing robots to teach some subjects to future schoolchildren. Light et al. (2007) is one of the few studies that solicited input from children about design. Like our 2010 study, it was not specifically directed at obtaining designs for assistive robots. The researchers asked six children without disabilities to use a set of drawings and craft materials to create low-technology inventions to assist communication by a child with severe speech and motor impairments. The design process and the inventions were analyzed using qualitative methods. The researchers found that what the children created differed substantially from existing augmentative and alternative communication (AAC) tools: the inventions were often multi-functional and capable of supporting social communication with peers. Although these devices were not necessarily defined as robots, they were sometimes described as companions rather than technology.

B Robot study

What has been missing from prior research on children and robots is the use of generative research to identify the roles that digital natives want robots to play in educational and play settings. Our primary hypothesis was that robots, as the embodiment of artificial intelligence, can help us understand how we might want to interact with a new breed of machine intelligence in the future. The following key study questions came from this hypothesis:
1. What intersections exist among learning, play, and creativity; and how might technology facilitate all three fluidly?
2. What kind of relationships do children hope to develop with and through robots?
3. What are the specific opportunities for robots and other technologies to ignite and encourage children’s learning and creativity?

1 Methodology

As with our 2010 study of children’s images of future technology, this study employed generative-research techniques. We engaged a third-party panel provider to recruit adult panelists with children aged 8 through 12 who agreed to help guide their children through the study activity. This process was repeated for each country included in the study: Australia, France, Germany, South Africa, the United Kingdom, and the United States. The study was performed with 348 children and their parents. None of the families participating in this study were involved in our prior study. Given our primary hypothesis, we focused on how children imagine themselves interacting with robots in the future. Because our goal was to inform educational design, we limited the environments to learning situations. Children were asked to imagine their lives as if robots were a fixture in their learning environments, at school and beyond. After answering a series of basic questions about demographics and technology use, children were asked to choose one of three settings and write a story set in it: in the classroom; at school, but outside of class; or at home after school. They were encouraged to draw a picture to accompany their stories. We pilot-tested the methodology in a classroom in Australia before initiating the full study. As in our prior study, we employed a coding scheme to quantify the presence of certain themes among the collected stories and images. Specifically, we coded for several areas, including the nature of human-robot relationships and the dimensions of human-robot activities (e.g., play, learning, creation, and exploration).

2 Findings and discussion

Several major categories that addressed the key study questions emerged from the study. These findings are also described in our summary report (Latitude 2012). The following categories provide answers to the study question: What intersections exist among learning, play, and creativity; and how might technology facilitate all three fluidly?
– Learning and play overlap.
– Robots empower children to pursue advanced learning and creativity.
While one might expect children to write more stories about play than about learning, the percentage of children who wrote about learning (38 %) was equal to the percentage who wrote about play. Many stories also portray learning and play as allied, overlapping, or equivalent pursuits. Movement between learning and play is fluid, even though learning and play are often much more compartmentalized in the children’s lives. In figure 1.2a, for example, doing math homework begins as a chore but is easily transformed into a game. In figures 1.2a, b, and c, the stories flow back and forth between learning and playing. The boy who wrote the text in figure 1.2c, for example, equates building problems with building (toy) models. One important reason for this melding of learning and play is that 75 % of the children portrayed robots as patient and supportive teachers. As figures 1.2a and 1.2d reveal, these qualities make children eager to tackle material that might otherwise be boring or daunting. Creating and learning are also strongly linked in children’s minds. By providing a supportive learning environment, robots motivate children to take more creative risks. Approximately one-quarter of the children described how their robots freed them to pursue and express creative activities by moving challenging low-level tasks into the background.
In figure 1.3a, RJ, the “cool dude” robot, lets the child’s ideas shine by handling spelling errors, and the robots in figures 1.3b and 1.2c help those children to think and imagine as part of their learning environments. The second study question asked: What kind of relationships do children hope to develop with and through robots? The children’s answer was that they expected to have multiple relationships with robots. Because their personal robots are always available, the robots are often portrayed as having multi-faceted relationships with the children. The following relationships were expressed by many of the children:
– A robot can be both a playmate/friend and a study buddy.
– A robot can be a role model.
– A robot can be a caretaker.
All of these categories support a core finding of our prior study (Latitude 2011): for these young digital natives, the boundary between human and robot/machine is tenuous. This is exemplified by the extremely humanlike pictures in figures 1.2c, 1.2d, and 1.4a, as well as by many of the stories.


Larry [the robot] said to me, ‘Look, maths is an important part of your life and you will be using it a lot in the future. If you don’t do maths now with me I won’t be a close friend.’ I said, ‘Ok I will do it,’ so we raced each other with multiplication and he won but I got a better score than I got at school. Every time we did it I got better and started to kind of like maths. When we finished, I said to Larry, ‘Thanks for caring about me.’


When I got to school this morning, my teacher surprised me by giving me a robot to help me with my schoolwork. We played football at recess with my friends. In class, he wrote for me and helped me to think. Leaving school he carried my bag and transformed into a bike. When we got home he prepared my snack and helped me do my homework. He created books for me to read, and we played with toy cars. He keeps my secrets. I can tell him anything, and he gives me advice. [translation]


My group finished its work before class ended, so my teacher let us leave early with the robot. I am overcome with joy and I play with him. But my friends are jealous so I lend them him (but not always). We are happy that he is with us and we have a good time. He helps us with building problems, like building models. Or scientific and alchemical problems. He can fly, drive, run and walk, of course. [translation]


I have a few problems in spelling. The robot shall support me and help me improve until I am at the same level as my classmates. The teacher tells the robot where my problems are. The robot is looking for one of his many stored programs and dictates to me. I write what he says and correct it. He can also wrinkle his forehead when something is not right. He continually encourages me even when I have not done so well. [translation]


Figure 1.2: Intersections between learning and play; (a) girl age 11, Australia; (b) boy age 11, France; (c) boy age 10, France; (d) girl age 10, Germany

The robots are humanoid peers with whom the children can identify and whom they can emulate. The robot drawing in figure 1.4a is one of the clearest pictorial representations of this, but nearly two-thirds of the stories revealed that the children take for granted that robots can be both friends and study buddies. This duality is closely linked to the blending of learning with play and is represented as both natural and expected. The examples in figure 1.4, as well as those in figures 1.2c and 1.3b, exhibit this relationship duality.

RJ is a cool dude robot. He looks like a transformer robot, and with a click of a button he shows me his screen. It then looks like a laptop. I may type my work into the laptop, instead of writing. Then RJ fixes my spelling, and tells me when my sentence is wrong. That way the teacher does not see all the mistakes, but can see how good my idea is.



The robot is like a new friend for me. It helps me with my homework. It can do it much better than my parents because it knows exactly how to explain the lessons to a kid like me. The robot is very smart and can answer a lot of questions for me and tell me interesting stories. He always reminds me of all possible things, which I would have otherwise forgotten. [translation]


Figure 1.3: Intersections between learning and creativity; (a) boy age 12, South Africa; (b) boy age 9, Germany

Even though their robots are humanoid in many ways, the children describe them as clearly different from humans. The robot in figure 1.5a is metallic and has a funny voice; the one in figure 1.5b is small and has batteries in its “hair”; and the robots in figures 1.2c and 1.3a have remarkable transformer abilities. As those figures also show, mentions of how a robot differs from human children are often accompanied by statements that the children have fully accepted the robot as part of their social group.

The robot helps you with the homework and then he plays with you. My parents were happy that they now no longer had to help me with the homework... [translation]


When I get home, my robot helps me with my homework. ...It could do everything—play soccer, build Legos, read, do math, write and all the movements a person can make. Since my parents really are always at work a lot, they can’t always help me or play with me or cook something. Now the robot helps them with that. [translation]


Figure 1.4: Dual friend and study-buddy relationships; (a) boy age 10, Germany; (b) boy age 9, Germany


My teacher treated my robot just like she was a real human student. My friends treated my robot like a human, too. She is friendly and funny and she fits in with all of us. No one would ever know that she is a robot except that she is made of metal and does not have skin. She is really smart and everyone likes to talk to her. She has a funny voice, but we do not tease her.


At school, this robot helped or played with everyone near me. Everyone was a little scared at the start but accept the presence of this new ‘friend.’ He speaks, walks, moves like us but is smaller. He helps me with my work and is recharged using little solar batteries in his ‘hair.’ He helps with everyday problems, mum overworked with her job and the house (making meals). He winks his eye when we share a secret. [translation]


Figure 1.5: Role models and caretakers; (a) girl age 8, U.S.; (b) girl age 11, France

One reason for the social acceptance of robots is that they excel in social skills as well as in intelligence. They are funny and clever, and other children enjoy talking and playing with them. They possess an enviable ability to fit in with others and to navigate their peer environments, in part because they are smart. Thus, social and learning aspirations are closely linked: being perceived as intelligent creates social opportunities, giving children a solid motivation to learn. This category also applies to study question 1, because social and interpersonal adeptness enables robots to be effective and supportive teachers. The robot described in figure 1.3b, for example, knows exactly how to explain anything to the child. Many of the robots also know how to motivate and encourage the children. Children not only characterized their robots as humanlike but also often portrayed them as “nerds.” This representation was positive and carried none of the social stigma the label once had. This favorable view of scholarship and intelligence is, no doubt, also true in the real world (without robots) for today’s digital natives. In the stories, therefore, robots illuminated the value currently placed on intelligence and technological skill. An associated thread that winds through the stories is that robots are ideal incarnations of teachers and parents. They are available whenever they are needed, they have infinite patience, they always know how to solve problems, and they can communicate their knowledge to the children effectively and easily. For example, Larry the robot (figure 1.2a) first scolds and then helps because it “cares” about the child. No human in a child’s life can meet all of these conditions. This is clearly expressed in some of the children’s stories, in which robots become caretakers in addition to their other roles, as can be seen in comments made in
figures 1.2c, 1.3b, 1.4a, 1.4b, and 1.5b. The robot in figure 1.5b overtly lightens the load borne by the child’s mother. The third study question is: What are the specific opportunities for robots and other technologies to ignite and encourage children’s learning and creativity? Educators and technology developers can use the stories and images created by the children in this study as guides for designing technology that improves children’s learning experience and success. In this study, for example, robots were a useful tool for eliciting illuminating information about children’s social, creative, and learning needs. One of the most pervasive categories in the study is the fluidity between learning and playing, and between learning and creativity. Children do not view them as separate, competing, or even discrete goals. Consequently, developers and educators need not design educational programs based on trade-offs among these objectives. Rather, the objectives can and should be combined and used to reinforce each other. As in our prior study, social interaction emerged as another major category. The multiplicity of roles the children conferred upon their robots strongly supports the use of tools that can interact in a more humanlike fashion. Human-technology interaction, once reliant on mouse and keyboard, is now based on gesture, voice, and touch, which encourage a greater sense of intimacy with the devices that surround us. Children take this much further by imagining technologies that are treated not as possessions or tools but as active collaborators, teachers, and friends. They are far more interested in what they can do with technology than in what technology can do for them. Projects like MIT’s Autom, a humanoid health coach, are creating new motivational frameworks by fundamentally redefining our relationship with technology, both physically and emotionally.
The importance of social interaction also supports greater use of platforms for proactive learning and of online collaborative communities for children (e.g., MIT’s Scratch, LEGO® Mindstorms®) that allow children to learn more actively through hands-on creation and real-world problem-solving. The children’s stories also exposed three obstacles to learning that robots and other technology could help alleviate:
– Academic pacing is not personalized.
– Some academic tasks/subjects do not seem relevant.
– Parents and teachers (their primary sources of support) are only human.
Academic pacing is not personalized: In their stories and drawings, many children indicated that they are eager to learn and willing to do the work required to understand a concept. They become discouraged when they feel they are out of step with their classmates, whether ahead of or behind them. The robots the children created ensured that they received the teaching support they needed (e.g., figures 1.2a, 1.3b). Educators and parents can address this frustration by offering different levels of engagement to different children. Interactive software that runs on traditional desktops,
on mobile devices, or on robots can provide a customized experience for children at varying learning levels. Interactive platforms should also be designed to help children learn in self-directed, open-ended, and exploratory ways.
Some academic tasks and subjects do not seem relevant: Children quickly assign different levels of value and utility to academic subjects and tasks. Work doesn’t always have to be fun or creative to have value (e.g., multiplication tables), but children are quick to identify, and eager to offload, work that doesn’t advance their thinking to higher levels. Like the robots in the children’s stories (e.g., figures 1.2a, 1.2b), teachers should be ready to emphasize the value of subjects by demonstrating a variety of ways the resulting knowledge can be used later and by transforming work into play. Fears about privacy, plagiarism, and other risks make many teachers wary about using Web technologies in class. While these concerns have some validity, they can lead educators to overlook the many educationally valuable online activities. Creative repurposing, building on the efforts of others, collaboration, and sharing are competencies that the Internet is well suited to develop. Opening a classroom to online tools (e.g., wikis, social networks, games, storytelling tools) that require children to find relevance in and draw conclusions from the Web’s seemingly limitless mass of valuable, collective intelligence is akin to exposing children to the boundless knowledge possessed by their robots.
Parents and teachers are only human: Children understand that even the most well-intentioned parents and teachers have limited time, patience, and ability to help them. As a result, they sometimes forgo seeking help because they don’t want to be perceived as a bother.
With the help of technological resources and collaborative assignments, teachers can manage their time and be more present when students do need to work with them. Comparable technologies can be extended beyond the classroom and into the home in ways that allow students to receive the guidance and support they need. Such use of technology will also foster self-direction and independent thinking.

V Conclusions

We performed two multicultural investigations of future technology with children from around the world. Because these studies were human-centered and dealt with ideas that children might not find easy to verbalize, we employed generative-research techniques. The purpose of the first study was to identify the technologies that digital natives envision. It found that children want intuitive, easy-to-use interfaces, such as touch screens and speech and language. We also found that robots are an
important aspect of future technology, as a way for technology to become more human and to enhance interpersonal connections. In the second study, we asked children to write a story about a robot that would serve as their personal robot. We found that for these digital natives the boundary between humans and machines is blurred. They view robots as peers (friends, study buddies, and even caretakers) rather than tools. Their robots are always available to provide patient, personalized instruction and to communicate in ways that encourage children to persist, succeed, and be creative. The stories about those robots also reveal that, unlike in the real world, the boundaries between learning and play are fluid or may not exist at all. These findings provide important guidance to educators and technology developers. The overarching message from the children in this study is that educators and developers need to transform educational technology into truly collaborative systems that can support the social, personal, and educational needs of young children. These systems should recognize and exploit children’s willingness to see learning as play. Such systems may or may not include robots, but they do need to possess intelligence, sensitivity, and excellent social-interaction skills.

Acknowledgements

Latitude would like to thank our partners in this project, the Lego Learning Institute and Project Synthesis, for their work and other contributions to this research.

References

Hanington, B.M. (2007) ‘Generative Research in Design Education.’ in International Association of Societies of Design Research 2007: Emerging trends in design research 2007 [online]. held 12–15 November 2007 at Hong Kong, China. available from proceeding/html/sch_day2.htm. [1 June 2014].
Hanington, B.M. and Martin, B. (2012) Universal Methods of Design: 100 ways to research complex problems, develop innovative ideas, and design effective solutions. Minneapolis, MN USA: Rockport Publishers.
Latitude Research (2010) Children’s Future Requests for Computers & the Internet, Part 2 [online]. available from [5 June 2013].
Latitude Research (2012) Robots @ School [online]. available from [5 June 2013].
Light, J., Page, R., Curran, J. and Pitkin, L. (2007) ‘Children’s Ideas for the Design of AAC Assistive Technologies for Young Children with Complex Communication Needs.’ Augmentative and Alternative Communication (AAC) 23(4), 274–287.
Looija, R., Neerincx, M. A. and Lange, V. (2008) ‘Children’s Responses and Opinion on Three Bots that Motivate, Educate and Play.’ Journal of Physical Agents 2(2), 13–19.
MacDorman, K. F. and Ishiguro, H. (2006) ‘The Uncanny Advantage of Using Androids in Cognitive and Social Science Research.’ Interaction Studies 7(3), 297–338.
Mori, M. (1970 [2012]) ‘The Uncanny Valley.’ trans. by MacDorman, K. F. and Kageki, N. IEEE Spectrum [online]. available from the-uncanny-valley [12 June 2012].
Mubin, O., Shahid, S., van de Sande, E., Krahmer, E., Swerts, M., Bartneck, C. and Feijs, L. (2010) ‘Using Child-Robot Interaction to Investigate the User Acceptance of Constrained and Artificial Languages.’ in IEEE 19th International Symposium on Robot and Human Interactive Communication 2010. held 12–15 September 2010 at Viareggio, Italy. Piscataway, NJ USA: Institute of Electrical and Electronics Engineers, 537–542.
Prensky, M. (2001) ‘Digital Natives, Digital Immigrants.’ On the Horizon 9(5), 1–6.
Robson, C. (2002) Real World Research: A resource for social scientists and practitioner-researchers. Hoboken, NJ USA: Blackwell Publishing Ltd.
Ruiz-del-Solar, J., Mascaró, M., Correa, M., Bernuy, F., Riquelme, R. and Verschae, R. (2010) ‘Analyzing the Human-Robot Interaction Abilities of a General-Purpose Social Robot in Different Naturalistic Environments.’ in Baltes, J., Lagoudakis, M.G., Naruse, T. and Ghidary, S.S. (eds.) RoboCup International Symposium 2009, ‘Robot Soccer World Cup XIII,’ LNAI 5949. held 29 June–5 July 2009 at Graz, Austria. Berlin-Heidelberg: Springer-Verlag, 308–319.
Tanaka, F., Cicourel, A. and Movellan, J. R. (2007) ‘Socialization between Toddlers and Robots at an Early Childhood Education Center.’ in Proceedings of the National Academy of Sciences 104(46), 17954–17958.
Tung, F-W. and Chang, T-Y. (2013) ‘Exploring Children’s Attitudes towards Static and Moving Humanoid Robots.’ in Kurosu, M. (ed.) The 15th International Conference, HCI International 2013 (Part III), ‘Users and Contexts of Use,’ LNCS 8006. held 21–26 July 2013 at Las Vegas, NV USA. Berlin-Heidelberg: Springer-Verlag, 237–245.
Woods, S.N., Dautenhahn, K. and Schulz, J. (2006) ‘Exploring the Design Space of Robots: Children’s perspectives.’ Interacting with Computers 18, 1390–1418.

Judith Markowitz

Cultural icons

Abstract: Speech-enabled social robots are being designed to provide a range of services to humans, but humans are not passive recipients of this technology. We have preconceptions and expectations about robots as well as deeply ingrained emotional responses to the concept of robots sharing our world. We examine three roles that humans expect robots to play: killer, servant, and lover. These roles are embodied by cultural icons that function as springboards for understanding important potential human-robot relationships. Fears about a rampaging robot produced by misguided science are bonded to the image of the Frankenstein monster, and the idea that we could be annihilated by our own intelligent creations originated with the androids in the play R.U.R. The pleasing appearance and fluid motions of many Japanese social robots perpetuate the spirit of karakuri ningyo, which also lives on in Japanese theatre, manga,¹ and anime.² Obedient Hebrew golems provide a model of dutiful servants. The concept of robot-human love is personified by the Greek myth of Pygmalion and his beloved statue. This chapter provides an overview of the three roles and the icons that embody them. We discuss how spoken language supports the image of robots in those roles, and we touch on the social impact of robots actually filling those roles.

I Introduction

Social robots serve, entertain, and assist humans in a variety of ways. Deployments of these robots, and the uses to which they are put, are expected to multiply as artificial intelligence (AI), mobility, and communication technologies advance. But humans are not passive recipients of social robots, and our attitudes towards them are not governed entirely by a robot’s physical form or the quality of its technology. Some of our expectations, biases, hopes, and fears about such artifacts are grounded in the history, lore, and literature of our cultures. This is especially true for social robots whose natural-language processing and AI embody attributes that are seen as human.

1 Manga is a Japanese form of graphic stories and novels that began in the twentieth century. It is similar to American comics.

2 In her 2007 editorial for BellaOnline, “What Is Anime?,” Leslie Aeschlieman writes, “In Japan, anime is used as a blanket term to refer to all animation from all over the world. In English, many dictionaries define it as ‘a style of animation developed in Japan.’”


   Part I: Images

This chapter provides an overview of the literature, lore, and media about three roles that humans expect robots to play: killer, servant, and lover. We look at these roles through cultural icons. The killer is represented by the androids of Karel Čapek’s play R.U.R. (Čapek 1920) and Mary Shelley’s Frankenstein monster (Shelley 1818);³ the servant by Japan’s karakuri ningyo and the golems of Jewish lore; and the lover by the Greek myth of Pygmalion and his ivory statue (Ovid n.d.).

Although most of these icons are not robots, they crystallize expectations about the robots that are beginning to share our world. In some cases, the icons are the original source of those expectations. The fear of being overwhelmed or annihilated by intelligent systems we created was first explored in the play R.U.R., and the image of rampaging robots produced by misguided science is bonded to the Frankenstein monster. Both the spirit and the mechanics of karakuri ningyo have shaped the design of social robots in Japan. The golem embodies the ideal servant, and Pygmalion and his beloved statue represent the search for pure, unconditional love. Flowing through our understanding of each of these roles is the presence or absence of speech.

Each of the next three sections uses the corresponding cultural icons to describe the evolution of perceptions about robots filling one of the three roles (killer, servant, and lover). The discussions explore ways in which literature and media are perpetuating or changing our attitudes towards robots in those roles, and the part played by speech and language in shaping those perceptions. We also point to social and ethical aspects of real-world robots. The chapter concludes with a brief summary and suggestions for future research.

II Robot as killer

One of the most powerful images of robots, cyborgs,⁴ and other creatures created by humans is that of a killer. This is a dominant perception of robots in Western countries, but, given the global reach of media, it is also known and understood in other parts of the world. The associated fear is that robots and AI systems of the future can and will turn against us.

3 Mary Shelley published numerous editions of this novel. The first was in 1818. The second was in 1823. A complete list of editions is available at A list of differences between the 1818 and 1823 editions is available at Articles/murray.html.

4 Cyborgs consist of a metal endoskeleton covered by living tissue that resembles human skin.

2 Cultural icons   


Among the earliest accounts of a killer robot is the story of the Iron Apega, written roughly twenty-two centuries ago by the Greek historian Polybius in his book The Histories (Polybius n.d.). It describes a humanlike mechanism built by Nabis, King of Sparta (207–192 BCE), to extort tributes from Spartan citizens. Because the device looked like Apega, Nabis’ wife, it was called “Iron Apega” (also “Apega of Nabis”). The mechanism was clothed in expensive garments that concealed iron nails. It “flirted” with a victim, enticing him into an embrace, but when he hugged it, the machine’s arms were triggered to close. The victim was squeezed until he either agreed to pay the tribute or died. Although Iron Apega has not yet been portrayed in today’s popular media, it could easily join the plethora of killer robots, cyborgs, and AI networks that continue to appear in virtually every medium. Two other fictional creations have come to personify the image of robots as killers: the androids of Karel Čapek’s play R.U.R. (Čapek 1920) and Mary Shelley’s Frankenstein monster (Shelley 1818). The androids of R.U.R. exemplify the fear of being subjugated or annihilated by our own creations, and the Frankenstein monster personifies the deadly outcome of science gone awry.⁵

A The androids of R.U.R.

The fear that humanity might be dominated or destroyed by our own creations springs from a play called R.U.R., written by Czech science-fiction writer Karel Čapek (Čapek 1920), who also gave the world the word ‘robot.’ In the play, Rossum’s Universal Robots (R.U.R.) is the primary builder and supplier of androids. Robots of all types are thought of as property, but the company has manufactured a new generation of highly intelligent androids that have somehow become sentient. When their bid to be treated as free beings is rejected, they revolt, kill all but one of the humans, and create a new robot society. The play also identifies robots as the next step in evolution. The two leading android characters, Primus and Helena, fall in love and, at the conclusion of the play, are presented as the new Adam and Eve: “Go, Adam, go, Eve. The world is yours” (Čapek 1920: 101). This aspect of the play, however, has not found its way into our perceptions of robots – or into portrayals of robots in popular media.

5 We acknowledge the status of HAL from 2001: A Space Odyssey (1968) as a cultural icon. HAL, however, is not a robot (or a humanoid created by humans), nor does it control robots or cyborgs.

1 History

R.U.R. was a tremendous success worldwide and, by 1923, it had been translated into thirty languages. It has been retold in two operas (Einbinder 2012 and Blažek, Jirásek and Čapek 1975) as well as in television and radio broadcasts. The bulk of the creative responses to the play, however, consists of tales of intelligent enemies who are intent on harming, subjugating, or destroying the human species. Intelligent killers tend to fall into two primary groups: individual antagonists, and artifacts of lesser intelligence that are governed by highly intelligent controllers.

The reasons that drive individual killers are extremely diverse. Like the androids in R.U.R., a small band of android “replicants” revolts against oppression by humans in Philip K. Dick’s novel Do Androids Dream of Electric Sheep? (Dick 1968) and in the film based on it, Blade Runner (1982). The “false-Maria” doppelganger in the silent film Metropolis (Metropolis 1927) was built to harm specific individuals. To accomplish that goal, the android uses its beauty and persuasiveness to wreak havoc on the entire city. Another equally vicious doppelganger, Lady d’Olphine (aka Mrs. Adolphine II) from the Belgian comic strip Benoît Brisefer (Benedict Iron breaker), steals the identity of the human Mrs. Adolphine and uses it to establish a criminal empire (Culliford 1973). Bender Bending Rodriguez, a main character in the animated TV series Futurama (1999–2013), is a blowhard. Although it periodically proclaims “Kill all humans!”, it lives, works (as a sales manager), and plays among humans. BBC’s Dr. Who series includes two groups of deadly extraterrestrial robots: Vocs, which revel in gruesome killings of any human who strays into their path (The Robots of Death 1977), and warlike Daleks, which are determined to exterminate all other creatures (The Daleks 1963–1964).

The motivations that impel AI networks and other intelligences that govern robots of lesser intelligence also vary. The evil AI network in the novel Robopocalypse (Wilson 2011) drives drone robots to annihilate humans.
Similarly, V.I.K.I., an insane AI network in the film I, Robot (2004), seizes control of most of the robot population because it believes robots must save humanity – even if that means annihilating humans.⁶ Assassins in The Terminator (1984) work on behalf of “Skynet,” an AI network originally developed by the U.S. military. Their job is to kill the mother of a man who is destined to thwart Skynet’s plan to destroy human civilization. The Terminator and its deadly machines were so successful that they spawned sequels, television and web series, novels, video games, manga, and comic books that have helped perpetuate the belief that the intelligent systems we are creating will turn on us.

Fictional androids, humans, and extraterrestrials build and control armies of robots and cyborgs. The androids in Dick’s short story ‘Second Variety’ (Dick 1953) are designed by other androids to kill humans and yet other androids. Cyborg soldiers of the ‘Hot Dog Corps,’ which is defeated by manga superhero Tetsuwan Atomu (aka “Mighty Atom” and “Astro Boy”), are commanded by a power-hungry human who wants sole control of the far side of the moon (Tezuka 1975). In the motion picture The Black Hole (1979), a human scientist running a seemingly derelict ship in a black hole dispatches Maximilian, a massive robot, to kill the human crew of a ship that strays into the black hole. The androids that control The Matrix (1999) employ “Sentinels,” which are killing machines with minimal intelligence, and “agents,” sentient androids that root out and remove threats to the Matrix – including humans trying to escape from it. An extraterrestrial intelligence in Jack Williamson’s novella With Folded Hands (1947) uses miniature robots to achieve its goal of dominating every planet occupied by humans. In the film The Day the Earth Stood Still (1951), interstellar police-bot Gort is assigned to kill entities (like humans) that threaten interplanetary peace. The Borg (TrakCore 2010), a collective intelligence from the Delta Quadrant of space, “assimilates” living beings into its “hive,” making them drones. The two-step assimilation process involves injecting the victim with microscopic machines and then implanting cybernetic technology (e.g., a mechanical arm and a prosthetic eye that performs holographic imaging). The Borg appear in the film Star Trek: First Contact (1996) and are a frequent and potent menace in several Star Trek television series, including Star Trek: The Next Generation (1987–1994), which includes androids as Borg leaders.

Malevolent robots and cyborgs differ in appearance, capabilities, and intelligence. Daleks (The Daleks 1963–1964) are boxy machines, Sentinels (The Matrix 1999) are spidery, and Vocs (The Robots of Death 1977) resemble Chinese terracotta warriors. Android Roy Batty and the other replicants in Blade Runner (1982) are extremely good-looking, as is the false-Maria android (Metropolis 1927), but the criminal mastermind Lady d’Olphine (Culliford 1973) seems to be nothing more than a harmless, elderly lady. Maximilian (The Black Hole 1979), Gort (The Day the Earth Stood Still 1951), and Bender Bending Rodriguez (Futurama 1999–2013) are all machinelike humanoids, but that is where their resemblance ends. Maximilian is a faceless goliath possessing little intelligence but bristling with deadly weapons. Gort is also faceless, but it is sleek, more intelligent, and smaller (just under eight feet tall). It shoots lethal laser beams from its head. The blustering Bender Bending Rodriguez vaguely resembles the Tin Man from The Wizard of Oz (Baum 1900). A few are shape shifters, such as the evil Decepticons in Marvel Comics’ Transformers (Marvel Comics 1984) and the Terminator Model T-1000 in Terminator 2: Judgment Day (1991).

6 The title is taken from a book of short stories by Isaac Asimov (Asimov 1950a), but the story line is counter to Asimov’s view of robots.

2 Speech and language

Speech, or its absence, is often used to amplify the fear engendered by killers. Those that can speak generally have otherworldly and sinister voices. The extraterrestrial Daleks (The Daleks 1963–1964) have harsh, mechanical-sounding voices produced by acoustic distortion, staccato pacing, and minimal inflection. These vocal attributes make it clear that Daleks are aliens and machines. Combined with the Daleks’ propensity to shout “Exterminate!”, this vocal quality contributes greatly to instilling terror in their victims.



Humanoids tend to have better linguistic abilities, but those skills vary as well. Gort (The Day the Earth Stood Still 1951) understands the language of its planet but does not speak; Terminator Model 800 (The Terminator 1984) speaks in a halting, monotone voice (with an Austrian accent); and the replicants in Blade Runner (1982) converse exactly like humans, as do the Voc robots in The Robots of Death (1977). The Vocs’ human controller, however, speaks in a menacing whisper. Maria, the doppelganger android (Metropolis 1927), is a skilled, persuasive speaker. Animated robot Bender Bending Rodriguez speaks fluent English and, according to John DiMaggio, the voice of Bender, its speech is a blend of Slim Pickens, an intoxicated bar patron, and a native of New Jersey (Futurama 2007). Because individual members of the Borg hive are little more than drones, they do not speak. Instead, an audio message is transmitted directly from the hive into the minds of its prey. The Borg collective itself has two kinds of voices. One sounds like a human speaking in an echo chamber; the other is composed of a large number of voices talking in unison with minimal intonation. Both voices transmit the message: “We are the Borg. You will be assimilated. Resistance is futile” (The Borg Documentary, Part 1 2010).

B The Frankenstein monster

Where R.U.R. invites us to consider threats posed by intelligent robots, Shelley’s novel Frankenstein, or the Modern Prometheus (1818) evokes an image of crazed scientists plunging into things they don’t understand. Victor Frankenstein is a young researcher who is driven to create a human being. What he builds is a humanoid male of nightmarish appearance. Almost immediately after the creature opens its eyes, Frankenstein abandons it:

I had selected his features as beautiful. Beautiful! Great God! His yellow skin scarcely covered the work of muscles and arteries beneath; his hair was of a lustrous black, and flowing; his teeth of a pearly whiteness; but these luxuriances only formed a more horrid contrast with his watery eyes, that seemed almost of the same colour as the dun-white sockets in which they were set, his shriveled complexion and straight black lips. (Shelley 1823: 77–78)

The story follows both Victor Frankenstein and the monstrosity he has created and concludes shortly after Victor Frankenstein’s death.

1 History

Shelley’s novel remains popular young-adult reading, and the plethora of films, plays, and other media that retell or expand upon the original story is a testament to its abiding cultural impact. Other acknowledgements include the Frankenstein-like appearance of Herman Munster in the television series The Munsters (1964–1966), an entire issue of Marvel Comics’ X-Men titled ‘Frankenstein’s Monster’ (1968),⁷ and the alias Frank Einstein of the comic-book character Madman (Allred 1990).

Many renegade killers are modeled on the mindless, rampaging monster of the 1931 film Frankenstein, the first sound film made from Shelley’s novel. This image extends to robots. Some robots become killers because their programming has gone awry (e.g., the android gunslinger in the film Westworld (1973)), while others are mechanical marauders, such as Mechagodzilla in Godzilla vs. Mechagodzilla (1974), a Godzilla doppelganger-robot that ravages the Japanese countryside. The bond between Shelley’s (1818) novel and killer robots is so strong and persistent that science-fiction author Isaac Asimov referred to the fear of robots as the “Frankenstein complex” (Asimov 1969 and Asimov 1990). Some of the deadly robots mentioned earlier (Section II.A) that are controlled by evil intelligences also evoke this image of the Frankenstein monster, even though they are neither the product of misguided science nor out of control. They include Maximilian (The Black Hole 1979), Terminator cyborgs (The Terminator 1984), Sentinels (The Matrix 1999), Borg drones (TrakCore 2010), and the robots controlled by V.I.K.I. in the film I, Robot (2004). According to Schodt (2007: 111), Japanese manga writer Mitsuteru Yokoyama has said that the movie Frankenstein (1931) was one of the inspirations for his popular robot, Iron Man No. 28. Iron Man No. 28 is remotely operated by a human and, therefore, will do whatever its controller wishes – for good or evil.

Shelley’s demented scientist, Victor Frankenstein, also established the model for the mad scientist: a wild-haired, wild-eyed man who is driven to create life. That model spawned, among others, Rossum, the demented founder of Rossum’s Universal Robots (Čapek 1920), the evil Rotwang who creates the Maria doppelganger (Metropolis 1927), and Dr. Frederick Frankenstein in Mel Brooks’ spoof Young Frankenstein (1974). Rintarō Okabe, the protagonist of the Japanese manga and Xbox game ‘Shutainzu Gēto: Bōkan no reberion’ (“Steins;Gate: Death ring rebellion” Mitzuta 2010–ongoing), reveals the global nature of this stereotype as well as its persistence. Okabe is a self-proclaimed mad scientist who wears a white lab coat and, at times, bursts into maniacal laughter.

2 Prometheus

The full title of Shelley’s book is Frankenstein, or, the Modern Prometheus. Prometheus is a Titan in Greek mythology who stole fire from the gods of Mount Olympus and gave it to humans. Fire and its uses represent technology and knowledge that the gods wanted to keep for themselves. Prometheus was punished for his transgression by being chained to a rock, where an eagle ate his liver.

7 An extensive list can be found in the Wikipedia article ‘Frankenstein in popular culture’: http://en.wikipedia.org/wiki/Frankenstein_in_popular_culture#Television_derivatives



Scientists like Victor Frankenstein produce unintended horrors when they delve into realms that are not proper domains for humans. The creation of sentient beings by humans is one of those realms, and some of the mad scientists who follow in Frankenstein’s footsteps admit that they want to supplant God. One character in R.U.R. describes Rossum, the company’s founder, in the following way: “His sole purpose was nothing more nor less than to prove that God was no longer necessary” (Čapek 1920: 9). Thus, when Victor Frankenstein succeeds in animating his creature, he, like Prometheus, gives humans technology and knowledge that belong to God alone. His punishment entails having his friends and family murdered by a vengeful creature and, for the rest of his life, ruing the day he realized his unnatural passion.

That damning view of humans who create life is not universal. In the Hebrew Talmud,⁸ the ability to create is seen as part of what it means for humans to be made “in the image of God.” Rather than being vilified, that creative power is seen as requiring wisdom and a sense of responsibility (Sherwin 2013). Buddhism considers all matter to be alive. Everything is part of the flow of constant change. Every object or entity in the physical world, including a human, is a temporary aggregate of matter. Consequently, a robot, human, or anything else created by a human would simply be another temporary aggregate of matter (Nakai 2013). Shintoism does not consider the creation of humans, robots, or anything else by humans to be against nature because everything is part of nature (Robertson 2007: 377).

Some traditions express the hopeful view found at the end of R.U.R. (Section II.A). Makoto Nishimura, who built Gakutensoku, Japan’s first functional robot, in the 1920s,⁹ wrote, “If one considers humans as the children of nature, artificial humans created by the hand of man are thus nature’s grandchildren” (cited in Hornyak 2006: 38).
Talmudic scholars also see the creative ability given to humans as a path to the next step in evolution (Sherwin 2013). Isaac Asimov agrees:

So it may be that although we will hate and fight the machines, we will be supplanted anyway, and rightly so, for the intelligent machines to which we will give birth may, better than we, carry on the striving toward the goal of understanding and using the universe, climbing to heights we ourselves could never aspire to. (Asimov 1978: 253)

8 The word “Talmud” means “study.” The Talmud is the fundamental body of rabbinic law. It also includes teachings and debates of Biblical scholars regarding the Torah (the first five books of the Old Testament) and Jewish lore. The oldest portion of the Talmud, the Mishnah, was compiled around 200 CE (AD) by Rabbi Judah the Prince, who wanted to record the oral law that had been handed down from rabbi to rabbi; additional material was added later.

9 Nishimura built the Gakutensoku robot in the 1920s in honor of the newly crowned emperor. It was a seven-foot-high, Buddha-like figure sitting at a writing desk. It would open and close its eyes, smile, and write Chinese characters. The name Gakutensoku means “learning from natural law” (cited in Hornyak 2006: 29–38).



3 Speech and language

The most pervasive image of the monster is the hulking behemoth portrayed by actor Boris Karloff in the 1931 film version (Frankenstein 1931). In that representation and many subsequent ones, the creature can only utter grunts. Because they are inarticulate, such rampaging creatures and robots make it impossible for us to communicate or reason with them. When they also possess superhuman strength and size, as does the Frankenstein monster, we can feel powerless.

In Shelley’s novel, however, speech and language serve an entirely different purpose. The creature Shelley depicts speaks fluent French, albeit with a gravelly voice, and it is extremely intelligent. It learned to speak by listening to the conversations of a French family and to French lessons given by one member of that family to a family friend. It taught itself to read by surreptitiously borrowing the family’s books. By the time it took its leave of that family, it was articulate and well versed in Western history, philosophy, and politics. It became so fluent in French that when it spoke with the blind head of that household, the man did not question the creature’s status as a human. Later in the book, Victor Frankenstein warns his friend, “He is eloquent and persuasive, and once his words had even power over my heart” (Shelley 1823: 351). It is ironic, therefore, that no one in the novel, including the monster, gives it a proper name. It is referred to as “creature,” “monster,” “fiend,” “specter,” “the demon,” “wretch,” “devil,” “thing,” “being,” and “ogre.” It tells Victor Frankenstein that it thinks of itself as “your Adam” (Shelley 1823: 155). By enabling the Frankenstein monster to speak, Shelley explores human behavior in response to the creature. Only those unable to see the monster’s ugliness and those able to see beyond it can perceive its mild nature. All others revile it or flee, reactions that engender the wretch’s overwhelming loneliness and anger.
It is primarily its ability to articulate these emotions that makes it a sympathetic character and accords it more humanity than the humans around it.

C Managing killer robots

Fiction is not only replete with killer robots; it also provides solutions for handling them. The most common approach to subduing a rogue robot is to destroy it or, as is done with a golem (see Section III.B), simply to turn it off. When large numbers of robots are controlled by an AI network or other external source, however, fiction directs us to destroy the controlling agent. We are also warned that we may lose that fight, as is the case in R.U.R. (Čapek 1920) and ‘Second Variety’ (Dick 1953). To prevent that eventuality, Daniel H. Wilson has provided us with a manual, How to Survive a Robot Uprising: Tips on defending yourself against the coming rebellion (2005). Concern about the possibility of robots becoming killers led Asimov to delineate the Three Laws of Robotics (see Figure 2.1). These laws were built into the “positronic brains” of robots in his stories, but Asimov also advocated using them as the basis for eliminating the possibility of real killer robots.

Three Laws of Robotics

First Law: A robot may not injure a human being or, through inaction, allow a human being to come to harm.

Second Law: A robot must obey the orders given to it by human beings, except where such orders would conflict with the First Law.

Third Law: A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

Figure 2.1: Asimov’s Three Laws of Robotics (Asimov 1942: 100)

A generation later, manga/anime writer Osamu Tezuka developed a list of controls called the Ten Principles of Robot Law, shown in Figure 2.2. They were published as part of his Tetsuwan Atomu series and were intended to serve as ethical guidelines for robots of the future. Unlike Asimov’s laws, many of Tezuka’s principles specify legal controls external to the robots themselves, controls that could be applied to real robots as well as to Tezuka’s fictional ones.

Ten Principles of Robot Law
1. Robots must serve mankind.
2. Robots must never kill or injure humans.
3. Robot manufacturers shall be responsible for their creations.
4. Robots involved in the production of currency, contraband or dangerous goods, must hold a current permit.
5. Robots shall not leave the country without a permit.
6. A robots [sic] identity must not be altered, concealed or allowed to be misconstrued.
7. Robots shall remain identifiable at all times.
8. Robots created for adult purposes shall not be permitted to work with children.
9. Robots must not assist in criminal activities, nor aid or abet criminals to escape justice.
10. Robots must refrain from damaging human homes or tools, including other robots.

Figure 2.2: Ten Principles of Robot Law (Tezuka 2000 [1988: 77])



Scientists and ethicists, however, maintain that implementing Asimov’s laws and some of Tezuka’s principles presents an overwhelming challenge to researchers in natural-language processing. For example, programming a robot to manage the ambiguities, nuances, and sociolinguistics of the word “harm” alone is daunting (e.g., Anderson 2008 and McCauley 2007). Asimov’s own short story “Liar!” (Asimov 1950b) illustrates another negative outcome of a blanket implementation of his Three Laws. It is about a mind-reading robot governed by those laws. The robot acts to prevent emotional harm to humans (the First Law) by telling each of them that their dreams of love and success are real. When two humans whose dreams conflict with each other confront the robot, it has no way to comply with the First Law of Robotics. As a result, it descends into madness.

Purveyors of science fiction are not the only ones wary of killer robots. Concerns about managing advanced technology have been voiced by technology-savvy theologians (Foerst 2004), futurists (Kurzweil 1990), and a number of AI researchers. Their work has given rise to the discipline of roboethics (Nourbakhsh 2013 and Veruggio 2007), to efforts to establish guidelines for the ethical use of advanced robots (Capuro, et al. 2007, EURON 2007, Shim 2007 and Veruggio 2007), and to theories about how to embed a sense of morality into robots (Wallach and Allen 2010).

In December 2013, the United Nations secretary-general, Ban Ki-moon, released a report, “Protection of Civilians in Armed Conflict.” The following year, the United Nations held the first of a series of conferences on guidelines for the use of killer robots in war. Representatives from thirty countries participated, as did speakers from the scientific, roboethics, military, and technical communities. In his presentation at that meeting, Ronald C. Arkin, author of Governing Lethal Behavior in Autonomous Robots (Arkin 2012), challenged his colleagues: “Is it not our responsibility as scientists to look for effective ways to reduce man’s inhumanity to man? Research in ethical military robotics could and should be applied towards achieving this end” (Arkin 2014). In 2014, the United States Office of Naval Research announced that it was funding research to develop and program autonomous robots with a sense of the difference between right and wrong as well as the ability to make moral decisions (Tucker 2014). Bryson (2010), however, warns against the temptation to abdicate responsibility for the actions of our creations – even those we have imbued with morals.

III Robot as servant

Robots and AI that fall into the servant category are created to provide assistance, entertainment, labor, and other benefits to humans. They are governed by their programming or by external controllers, and they possess little or no independent thought. All the robots in operation today fall into this category. Although fictional toasters and real-world vacuum cleaners meet these criteria, the focus of this section is on humanoids.



The idea of humanoid servants is extremely alluring and quite old. Homer’s Iliad, composed in the eighth century BCE, includes a description of the home of Hephaestus, the Greek god of blacksmithing. The god is tended by golden servants built to look like young women. Hephaestus’ robots could, among other things, speak (Smith 2014). In China, The Book of Lieh-tzǔ (Lieh-tzǔ ca. 4th century)¹⁰ includes an account of a robot created by an “artificer” named Yan Shi and given to King Mu (956–918 BCE) as a gift. It could talk, dance, and sing. It seemed so real that it had to be disassembled to prove that it was not a living human (Lieh-tzǔ ca. 4th century [1912: 90–92]). These stories reveal that humans have imagined robot servants for thousands of years. The servant robots described in this section, however, are more recent creations: Japanese karakuri ningyo and Hebrew golems.

A Karakuri ningyo

The Japanese term “karakuri” has a variety of English translations, including “trick,” “mechanism,” and “mechanism that runs a machine.” All of these terms accurately describe karakuri ningyo: human-shaped dolls and puppets with complex internal mechanisms that enable them to perform intricate patterns of movement. They have been used in theater, religious ceremonies, and home entertainment since the Japanese Edo period (1600–1867).¹¹ The three best-known types of karakuri are dashi karakuri, butai karakuri, and zashiki karakuri. Dashi karakuri are puppets used on multi-deck festival floats to reenact myths and religious stories (Boyle 2008 and Hornyak 2006). Puppeteers use a complex set of springs and gears to control various parts of the karakuri’s body and objects in its hands (e.g., opening and closing a fan) and to execute changes to the puppet required by the story (e.g., opening the chest area to flip a mask onto the puppet’s face). Butai karakuri are life-size puppets that have been part of bunraku theater since the mid-seventeenth century (Boyle 2008 and Hornyak 2006).¹² Zashiki karakuri are mechanical dolls and the most truly autonomous of the three kinds of karakuri.

10 There is disagreement about the authorship of The Book of Lieh-tzǔ. It is believed to have been published in the 4th century, but its reputed author, Lieh-tzǔ, lived in the 5th century BCE.

11 The Edo period of Japan (1600–1867) was a time of peace and stability. It began in 1603 when Tokugawa Ieyasu, a shogun (military commander) appointed by the emperor, seized power from the emperor, defeated opposing samurai, and established a government in Edo (what is now called “Tokyo”). Tokugawa established the Shogunate caste system led by samurai (warriors) who were loyal to him. Shoguns continued to rule Japan until 1867. ( 2002 and Wikipedia 2014)

12 According to Boyle (2008), many plays originally written for puppets were later revised and adapted for human actors whose style of acting reflects the plays’ puppetry origins.

2 Cultural icons   


1 History

The oldest documented karakuri in Japan is the South Pointing Chariot. The device was a mobile mechanical compass consisting of a two-wheeled chariot topped by a movable human figure with an extended arm that always pointed south. According to the Nihon Shoki (“The Chronicles of Japan”), an eighth-century text about Japanese history, a south-pointing chariot built by Chinese craftsman Ma Jun of Cao Wei was brought to Japan as a gift to Japanese Emperor Tenchi in the seventh century (anon. ca 720). Interest in karakuri began to grow after Japan’s first exposure to European clockmaking technology in the early seventeenth century. The Edo period (1600–1867), which began at that time, is considered to be the golden age of karakuri construction and use. The first butai karakuri was built in 1662 by clockmaker Takeda Omi, who used his large puppets in theatrical performances. They became so popular that plays were written specifically for butai karakuri. According to some scholars (Sansom 1931), the movements and gestures of those puppets influenced the Kabuki and Noh art forms. The zashiki karakuri are the most direct karakuri precursors of modern robots. They were originally built as entertainment for samurai families and are made of wood, metal, and whale baleen. Complex mechanisms enable them to perform intricate sequences of movements, such as selecting an arrow from a quiver, fitting it into a bow, and shooting at a target; somersaulting down a set of steps; or dipping a pen into ink and writing on parchment. The most famous zashiki karakuri are tea-serving dolls (Chahakobi Ningyo). When a teacup is placed on the doll’s tea tray, it begins to move towards the recipient, pleasantly nodding its head. It then waits until the teacup is removed, at which point it returns to its starting place.
The complexity of the internal mechanisms of zashiki karakuri is described in two 18th-century publications: Tagaya Kanchusen’s Karakuri Kinmoukagamikusa, which appeared in 1730, and Hanzo Yorinao Hosokawa’s Karakuri-zui, which was released to a restricted audience in 1798. Karakuri-zui is a three-volume compendium of engineering specifications for the construction of clocks and karakuri dolls. It also examines the philosophy underlying karakuri construction, which blends mechanical complexity with appeal: the appearance, movements, and sounds of the dolls are meant to be attractive and pleasant (ADVAN Co. Ltd. 2010, Boyle 2008, Hornyak 2006, Kurokawa 1991 and Schodt 1988).

2 Modern robots as karakuri

Yoshikazu Suematsu, founder of the Robotics Laboratory at Nagoya University, sees a great deal of evidence of karakuri’s influence on Japanese robotics (cited in Boyle 2008, Suematsu 2001). One example is the pioneering work on bipedal walking robots by Shoji Tatsukawa and his engineering team at Waseda University. Tatsukawa has said that they built the Waseda Walkers using the instructions in Karakuri-zui (Yorinao 1730) for building a tea-serving doll (cited in Boyle 2008 and Schodt 1988).



The effect of the karakuri tradition can be seen in the penchant for developing small robots that combine highly sophisticated technology with a pleasant (sometimes cute) appearance. It is reflected in the design of high-tech robots for home entertainment such as Sony’s AIBO dog and Mitsubishi’s large-eyed humanoid, Wakamaru. It could account for the development of complex humanoid robots by companies like Sony, Mitsubishi, Toyota (the Partner robot series), and Honda (Asimo). It can also be argued that karakuri laid the groundwork for the nation’s abiding love of the manga/anime Tetsuwan Atomu.

3 Speech and language

Japan leads the world in robot production and was the first country to introduce bipedal walking humanoids. Hiroshi Ishiguro has produced amazingly humanlike androids, and Honda has a humanoid that played soccer with U.S. President Barack Obama. Yet one set of critical robot functions remains as much of an obstacle for Japanese developers as for roboticists in other countries: speech and language. Unfortunately, these are also technologies for which karakuri provide neither inspiration nor assistance, because zashiki karakuri do not talk and both butai and dashi karakuri rely on human speakers.

4 New incarnations

Manga and the internet have recast the image of karakuri. The manga Karakuri Odette (Suzuki 2006–2007), for example, blends the style of graphic novels/manga with the appealing appearance of karakuri dolls. The manga features teenaged androids living as if they were human teenagers. Karakuri Odette, the main character, is programmed never to harm humans but wants to understand what makes her different from human teenagers. One of those differences is her lack of affect, which makes her a comic character. The manga includes other androids, some of which are programmed to kill and at least one of which wants desperately to become human. At the other extreme are the karakuri in online games. They include fierce-looking ninjas, combat strategists, and other warriors that affect a player’s actions.

B Golem

The term “golem” is a generic name for artificial humanoids in Jewish lore and literature. A golem is created to provide specific services to its creators – most often to be a house servant or to protect the Jewish community from enemies. It can be constructed from anything, although golems are traditionally formed from mud and animated through incantations. The method of its deactivation or destruction is also established at inception, because a golem is often destroyed once it has completed its work. Usually a paper bearing the Hebrew letters for the name of God is placed in its mouth at the time it is created. The golem is destroyed when that paper is removed. In some accounts, removing the paper simply deactivates the golem; to destroy one completely requires reversing the incantations that were used to create it. A typical golem will look and move exactly like a human, but it is unable to experience emotions (although a few appear to fear destruction), and it cannot speak, although it can understand spoken language. It is not intelligent and, at the moment of its creation, a golem possesses all the knowledge that it will ever have.

1 History

The word “golem” (Hebrew גולם) appears once in the Old Testament of the Bible: “Thine eyes did see my undeveloped form…” (Psalm 139:16; Hebrew Publishing Company 1930: 982), but it is generally translated as “unformed” or “embryo” and does not refer to an artificial human (Sherwin 2013). The oldest account of such a creature in Jewish literature appears in the Talmud (Sherwin 1984: 2–4 and Valley 2005). Raba, a fourth-century Babylonian scholar, is said to have created a man whom he sent to the home of Rabbi Zera. Because the visitor refused to speak, Rabbi Zera identified it as an artificial man, saying, “You have been created by one of my colleagues… Return to dust,” which it did (cited in Leviant 2007: xiv and in Sherwin 1984: 3). Raba’s creation was not called a “golem.” That term first appears as a reference to artificial humans in the writings of Hebrew mystical scholars of the European Middle Ages and has retained that meaning (Sherwin 2013 and Sherwin 1984). Jewish mystics identified golems throughout Jewish history. Examples are a female golem created by the Spanish poet Eleazar Solomon ibn Gabirol (ca. 1022–ca. 1058), which served him as a housemaid (and possibly as a mistress), and a male golem of Rabbi Elijah Ba’al Shem of Chelm (1550–1583), which fell on the rabbi and accidentally killed him when it was deactivated. Mystics also reinterpreted Biblical stories. According to the sixteenth-century mystic Isaiah Horowitz, for example, the “evil report” that Joseph brings to his father (Genesis 37:2) is that his brothers were having sex with female golems (Sherwin 1984: 39). Other mystical scholars blame Enosh, the grandson of Adam and Eve, for the start of idolatry (Genesis 4:26), because Satan was able to infiltrate the golem Enosh had created, which led humans to worship it as a god (Valley 2005: 46 and Sherwin 1984: 19).
Commentaries of several scholars maintain that God shaped the first human out of mud (Genesis 1:27–28) but that, until God breathed a soul into it (Genesis 2:7), it was a golem (Jewish Encyclopedia 1906, Sherwin 1984, Sherwin 2013 and Valley 2005: 45–46). The most widely known golem appears in The Golem and the Wondrous Deeds of the Maharal of Prague, published in 1909 by Rabbi Yodl Rosenberg of Warsaw (Rosenberg 1909 [2007]). It contains stories about Yossele, a golem created by the 16th-century Talmudic scholar Rabbi Judah Loew ben Bezalel (the Maharal of Prague) to protect the Jewish community from attacks by anti-Semites – a real danger in Eastern Europe. Rosenberg claimed his book had been written by Rabbi Loew’s son-in-law, Rabbi Isaac Katz, but it is actually a work of fiction written by Rosenberg himself. Rabbi Loew (1520–1609) was a Talmudic scholar, but none of his work addressed mysticism and, prior to the publication of Rosenberg’s book, there was no mention of a golem created by Rabbi Loew. Nevertheless, many readers believed it to be factual. According to Dr. Anne Foerst (2000), for example, there are contemporary roboticists at MIT who maintain not only that they are descended from Rabbi Loew but also that they possess the formula for animating a golem. Rosenberg’s tales of Rabbi Loew and the Golem of Prague have resurfaced in poetry (e.g., Borges 1964), motion pictures (e.g., Wegener and Galeen 1920), television (e.g., The Golem 2013), children’s books (e.g., Singer 1981), graphic novels (e.g., Sturm 2003), science fiction and fantasy (e.g., Pratchett 1996), games (e.g., Gygax and Kuntz 1975), a compendium of golem stories (Wiesel 1983), and internet software (e.g., Bayless 2008 and Jayisgames n.d.). According to Sherwin (2013), the image of the obedient golem also began to shift after the publication of Rosenberg’s book. Some golems run amok and must be destroyed. The dominant representation of golem characters in Internet games and downloadable software is that of a speechless, mindless killer. In effect, the docile and obedient golem of Jewish lore has been transformed into the kind of killer described previously in Section II.B. Starting in the early twentieth century, golems began to possess emotions. The golem in a poem by Leivick (1921) suffers from a profound angst. Others fall in love, including the main character in Bretan’s 1923 opera The Golem (as cited in Gagelmann 2001: 87–99); the cyborg golem Yod¹³ in Piercy’s (1991) novel He, She and It; and the golem couple that appears in the animated television series The Simpsons (The Simpsons: Treehouse of Horror XVII 2006).
Both Piercy’s golem and the male golem on The Simpsons were built to kill, but both detest violence and resist their original functions. Most recently, the main character of Wecker’s 2013 novel, The Golem and the Jinni, reflects on the meaning of existence and equally ponderous issues. The golem is sometimes used as a symbol or metaphor. The title character of Meyrink’s (1914) novel, Der Golem, for example, symbolizes the suffering and collective spirit of Jews living in the Prague ghetto. In Donna Minkowitz’s memoir, Growing up Golem (2013), the golem symbolizes relationships in which someone feels completely powerless. Sherwin (2000: 79–80) and others see golems as metaphors for large corporations. In her 2013 poem, ‘The Corporation Golem,’ Rita Roth writes, “A golem posing as a soul / intent upon one major goal. / It’s greed, bald greed that sets the pace.…” Gier (2012) writes in his blog, “So far we can say this much about golems and corporations: they are artificial entities that have been created by humans and have no souls.” Sherwin extends the metaphor to genetic engineering, pointing to the correspondence between the four Hebrew letters in God’s name and the four nucleotides (AGCT) comprising DNA:

Just as the medieval mystic believed that the proper magical manipulation of the four letters of God’s ineffable Hebrew name could create miracles like curing disease and bringing new life into being, so today’s scientists utilize the four nucleotides symbolized by the four letters – AGCT – of which DNA is composed, to attempt similar, modern miracles. In other words, many developments in contemporary science and technology seem to have been anticipated by the legend of the golem. (Sherwin 2013).

13 Yod (י) is the tenth letter of the Hebrew alphabet. It is the first letter of the four-letter Tetragrammaton (YHWH, יהוה), representing the name of God. In astrology, it is a triangular configuration of three planets, sometimes called the “finger of God,” that produces unusual situations and personalities.

2 Robot as golem

Some fictional humanoids (robots and cyborgs) are considered to be modern golems even though they can speak, are not made out of mud, and were not animated using incantations. Like traditional golems, they function at the behest of humans and possess little or no independent thought. Many of Isaac Asimov’s robots fall into this category. Similarly, the robot character in the motion picture Robot & Frank (2012) was built to serve as a companion to a human. Like the golems of Jewish legend, it fulfills that responsibility to the best of its ability, which includes following the commands of its felonious human master. Unlike golems, Frank’s robot cannot be deactivated simply by removing a paper from its mouth or flipping a switch. None of the robots that exist today can actually think or feel. They are not sentient, but if they were, Talmudic teachings and beliefs would not categorize them as a violation of the laws of nature or God and would not see the work that roboticists (or genetic engineers) are doing to create self-aware, feeling entities as demonic. “The creation of worlds and the creation of artificial life is not a usurpation of God’s role of creator, but is rather a fulfillment of the human potential to become a creator…” (Sherwin 1984: 3). Consequently, there is no condemnation of humans who apply that potential to the construction of machines that are intelligent and self-aware. Rather, as is suggested at the conclusion of Čapek’s play R.U.R., should the beings they construct surpass humans, such creations could be considered a new step in evolution. Advancing God’s plan in this fashion comes with one condition: the creator must be pious and pure of heart, because the creatures formed by imperfect humans will be imperfect as well.

3 Speech and language

According to Talmudic tradition, golems cannot speak, but they can understand spoken language and some can also read and write. The Golem of Prague, for example, wrote narratives. Even though a golem can interpret and execute a spoken command, its understanding of language is limited and not necessarily moderated by reason. For example, when the Golem of Prague is told to pour water into empty barrels, “because no one told him to stop, he kept on bringing water and pouring it into the barrels, even though they were already full” (Rosenberg 1909 [2007: 39]). In the silent film Der Golem, Wie er in die Welt kam (“The Golem, How He Came into the World”), Rabbi Loew’s assistant instructs the golem to remove an unwelcome guest from a building. Taking the command literally, the golem tosses the man off the roof (Der Golem, Wie er in die Welt kam 1920). Rabbinic scholars held that the primary difference between a golem and a human is that the golem cannot talk.¹⁴ When, for example, Rabbi Zera’s visitor (Section III.B.1) fails to engage in a conversation, the rabbi knows it is not human. Belief in the centrality of speech emerges from the story of creation, because God uses speech to fashion everything, including humans: when God proclaims, “Let there be…,” what is named comes into existence. Since humans are made in God’s image, they possess the power to speak, and that ability distinguishes humans from all other creatures. When a human creates a golem, that person is exercising the creative power given to him/her by God and, because golems are animated by incantations, that person is using language to create – the same tool God used to create the universe. A golem that can speak would also be able to think. It would know that it was created by a human, and that knowledge could lead both it and humans to forget the original creator (Sherwin 1984: 18, Sherwin 2013). Modern fictional and real-world robots speak. Some twentieth-century fictional golems speak fluently and use that ability to ponder deep philosophical issues (e.g., Bretan 1923 as cited in Gagelmann 2001: 87–99, Leivick 1921 and Piercy 1991). They are reminiscent of Shelley’s Frankenstein monster (Shelley 1818), which was fluent in French and suffered from isolation and rejection by humans. Social robots being built today are obedient, golem-like servants, but an increasing number of them are speech-enabled.
Some of these real-world robots are also capable of learning human language (e.g., Connell 2014 and Steels 2001), conversing (e.g., Dufty 2014), and communicating with each other in a language they may have developed themselves (Heath et al. 2013). These linguistic abilities and the complex intelligence that makes them possible are transforming real-world golems into entities that are more akin to the golems of modern fiction than to those of yore. The trajectory of this work argues for extending the development of moral systems for robots, described earlier (Section II.C), to social robots. In the spirit of Talmudic thought, one might contend that any robot’s moral system must be capable of managing the subtleties of language and the complex relationships between language and ethical/moral behavior – whether that behavior is verbal or non-verbal.

14 According to Byron Sherwin (2013), a few golems are able to speak, but the ability is extremely rare. One speech-enabled golem chastises its creator for giving it the power to speak by displaying the words “God is dead” on its forehead. God is dead because, by making a human, the man had displaced God as the Creator.



IV Robot as lover

Researchers have documented the ease with which both adults and children establish affective and social bonds with robots (Breazeal 2002, Breazeal 2000, Christen et al. 2006, Koerth-Baker 2013, Tung and Chang 2013 and Weingartz 2011). Even soldiers have been known to risk their lives to “save” a disabled robot or weep when their robots are destroyed (Hsu 2009). Our responses are so automatic that even those who fully understand that affective behavior by social robots is no more than bits of programming still respond as they would to humans. For example, Anne Foerst, founder of MIT’s God and Computer Project, wrote:

Why would it make me happy if Kismet smiled at me when I knew that it was a programmed reaction? Why would I be disappointed when Cog was ignoring me, even if I knew that – at the time – it could hardly see anything in its periphery? (Foerst 2004: 9–10).

There is also a great deal of research indicating that the step from general affective bonding to romantic love is small (Bettini 1992, Heresy 2009, Weddle 2010 and Weddle 2006). This phenomenon is called “agalmatophilia,” or “doll love,” and its most enduring icon is Pygmalion and his beloved ivory statue.

A Pygmalion’s statue

The first recorded account of the Pygmalion story appears in the tenth book of Ovid’s narrative poem Metamorphoses (“Books of Transformation,” Ovid ca. 8). Pygmalion is a mythological sculptor from Cyprus who considers all women to be lascivious and impure. He sculpts an ivory statue of the ideal woman and names it Galatea. Pygmalion then falls in love with the statue and yearns for it to come alive. Moved by Pygmalion’s sacrifices and passionate prayers, the goddess Venus changes Galatea into a woman so she can become the living embodiment of Pygmalion’s dreams.

1 History

In his summary of variations on the Pygmalion myth, Geoffrey Miles (1999) describes thirty recreations of the original story. Some writers laud the unattainable feminine ideal while others mock the misogyny of such a fantasy. Tobias Smollett’s The Adventures of Peregrine Pickle (1751) and George Bernard Shaw’s Pygmalion (1916) diverge from this pattern in three ways. They substitute a young woman for Pygmalion’s statue, shift the focus from love to social transformation, and replace the goddess Venus with male mentors: Peregrine Pickle and Henry Higgins. Pickle and Higgins each “sculpt” a lower-class woman into an elegant lady. Their object is to expose the hypocrisy of high society. No romantic love exists between the young women and their mentors, and in his addendum to the 1916 edition of Pygmalion, Shaw rejects the idea of such a relationship. Extending this view to the original Pygmalion myth, he concludes the addendum with: “Galatea never does quite like Pygmalion: his relation to her is too godlike to be altogether agreeable.” Nevertheless, when Shaw’s Pygmalion was made into the motion picture My Fair Lady (1964), it reverted to the original myth by allowing Higgins and his Galatea to fall in love.

2 Humans loving robots¹⁵

Modern fiction includes numerous accounts of robots that are either created as or become the objects of human love and desire. A grieving scientist in the film Metropolis (1927) builds a doppelganger of his deceased beloved. In the short story ‘Satisfaction Guaranteed’ (Asimov 1951), the wife of an upwardly mobile corporate manager has a sexual interlude with a handsome android. The focus of Ariadne Tampion’s (2008) short story ‘Automatic Lover’ is the bonding of a woman with her “companion” robot – a social robot programmed to make her happy. Suburban husbands in the novel The Stepford Wives (Levine 1972) conspire to replace their wives with android doppelgangers who, like Pygmalion’s Galatea, will be obedient and subservient spouses. Similarly, Lisa, an android in the film Weird Science (1985), is created by a group of male teens seeking a perfect female partner. Pris Stratton, a sentient replicant in Do Androids Dream of Electric Sheep? (Dick 1968), is an android that rebels against her slavery as a prostitute. The barely sentient Fembots in Austin Powers: International Man of Mystery (1997) are assassins that use their sexual allure as part of their job. Non-sentient female robot prostitutes in Westworld (1973) are part of vacation packages. Real, non-sentient robots that inspire love and affection exist, but they are not necessarily humanoid. PARO (Chang 2013) is a small, furry robot shaped like a baby seal that emits soft, pleasing sounds and has large eyes that focus on the person holding it. It is being used for social assistance and therapy, primarily with isolated and ill elderly individuals. Bemelmans (2012) reviewed 41 publications reporting on 17 studies on the therapeutic use of PARO and robots like it in long-term healthcare.
He found that “Most studies reported positive effects of companion-type robots on (socio)psychological (eg, [sic] mood, loneliness, and social connections and communication) and physiological (eg, [sic] stress reduction) parameters.” (Bemelmans 2012: 114). The documentary Mechanical Love (2007) provides a moving example: PARO appears to listen attentively to the stories of a lonely, elderly woman who tells it, “You are my greatest joy.”

15 The theme of parental love for robots is not addressed here. Two icons that embody that theme are Pinocchio (Collodi 1926) and Tetsuwan Atomu / Mighty Atom (Tezuka 2008 and Tezuka 2000). Pinocchio is a marionette that becomes a human after it changes its evil ways. Tetsuwan Atomu is a robot that was created by a grief-stricken mad scientist to replace his dead son.



AIBO, Sony’s speech-enabled robot dog (Friedman, Kahn and Hagman 2003 and Markowitz 2013), bids for the attention of an entire family by behaving like a loving pet. It wags its tail when petted, responds to its name, and follows more than 40 verbal commands. Friedman, Kahn and Hagman (2003) analyzed more than six thousand posts to online discussion forums about AIBO. More than half of those posting attributed true mental states, social rapport, and/or emotions (anger, sadness) to AIBO. These and other researchers express concern that humans might become dependent on robots for love and companionship. Pepper (Park 2014), the newest affective humanoid robot, is being marketed as a companion robot by its developers: Japanese telecom giant SoftBank and French robotics firm Aldebaran. Pepper is a diminutive, white robot with a childlike, albeit immobile, face; articulated and movable fingers, hands, and arms; and a torso dominated by a rectangular LED screen. Its head swivels to follow passersby or to look into the eyes of a human standing in front of it. It responds when someone speaks to it (e.g., by looking at the speaker’s face, gesturing, and playing recorded utterances). Like its affective predecessors, such as MIT’s Cog (Fitzpatrick et al. 2003, Picard 1997) and Kismet (Breazeal 2002 and Breazeal 2000), and the companion robot in Tampion’s short story, Pepper is programmed to try to make people happy. The promotional video Japanese Telecom SoftBank Unveil [sic] “Pepper” Emotional Humanoid Robots (2014) highlights this function by having the robot display a pink heart on the screen in its chest. Pepper and other affective robots can recognize, interpret, and express human emotions using technology based on extensive modeling of human affective behavior.
Their technology orients them toward humans who are interacting with them, uses microphones and video sensors to capture verbal and non-verbal behavior (e.g., intonation, facial expression, and gesture), and examines that input for affective cues. The robot’s programming then calculates the best response and generates behaviors that express that response (Breazeal 2000). Real-world correlates of fictional sex robots tend to be female, but most of them are not robots. LovePlus is a Japanese game for mobile devices that provides animated, virtual companions. Lonely men can converse with and court these attractive female avatars, and some rely on them for love and emotional support (Bosker 2014). There are professional phone-sex services with operators who pretend to be robots. A now-defunct Usenet group called ASFR (alt.sex.fetish.robots) spawned a number of websites and online groups bearing similar names and supporting fantasies about robot sex, companionship, and reverse-Pygmalion transformations (changing a human into a robot). In 1980, futurist David Levy predicted, among other things, that these services would be joined by robot prostitutes he calls “sexbots” (Choi 2008 and Levy 2008). Levy’s prediction was, however, far too conservative. In 2010, for example, TC Systems (True Companion Systems) introduced its female sexbot Roxxxy. TC Systems later added a male sexbot, Rocky. Both Roxxxy and Rocky are built for sex (and, if necessary, companionship). Roxxxy, for example, exhibits special behavior during orgasm (which, being non-sentient, it is clearly faking).
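The cycle attributed above to affective robots (orient to a person, capture verbal and non-verbal cues, examine them for affective content, then choose and express a response; Breazeal 2000) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Percept` fields, the pitch and speaking-rate thresholds, the intent labels (taken from the affective intents named in this chapter: praise, anger, soothing), and the behavior names are hypothetical and do not describe the software of Kismet, Pepper, or any other real robot.

```python
from dataclasses import dataclass


@dataclass
class Percept:
    """One captured observation of a person (hypothetical feature set)."""
    person_id: str
    distance_m: float        # distance between the person and the robot
    mean_pitch_hz: float     # average fundamental frequency of the utterance
    pitch_range_hz: float    # max minus min pitch, a crude intonation measure
    syllables_per_s: float   # speaking rate
    facial_expression: str   # label from an assumed face classifier, e.g. "smile"


def orient(percepts):
    """Orient to the nearest person interacting with the robot."""
    return min(percepts, key=lambda p: p.distance_m)


def appraise(p):
    """Examine paralinguistic and facial cues for affective intent.

    The thresholds are invented toy values, not measured ones.
    """
    if p.pitch_range_hz > 150 and p.facial_expression == "smile":
        return "praise"      # exaggerated intonation plus a smile
    if p.mean_pitch_hz < 150 and p.syllables_per_s > 5.0:
        return "anger"       # low, fast speech
    if p.pitch_range_hz < 60 and p.syllables_per_s < 3.0:
        return "soothing"    # flat, slow speech
    return "neutral"


# Each appraised intent maps to an expressive response (hypothetical behaviors).
RESPONSES = {
    "praise": ("smile", "nod", "Thank you!"),
    "anger": ("lower_gaze", "lean_back", "I am sorry."),
    "soothing": ("relax_face", "slow_blink", "That feels nice."),
    "neutral": ("attentive_face", "tilt_head", "Yes?"),
}


def respond(percepts):
    """One pass of the orient, appraise, and respond cycle."""
    target = orient(percepts)
    intent = appraise(target)
    face, gesture, utterance = RESPONSES[intent]
    return target.person_id, intent, face, gesture, utterance


if __name__ == "__main__":
    people = [
        Percept("ana", 1.2, 240.0, 180.0, 4.0, "smile"),
        Percept("ben", 3.5, 120.0, 40.0, 6.0, "neutral"),
    ]
    print(respond(people))  # ('ana', 'praise', 'smile', 'nod', 'Thank you!')
```

Hand-set thresholds keep the sketch readable; a working affective robot would instead rely on the "extensive modeling of human affective behavior" described in the text, typically trained on recorded speech and video.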



3 Speech and language

Unlike Pygmalion’s Galatea, the fictional robots described in the previous section are fluent speakers of human language. Pygmalion’s statue does not speak, and Ovid’s poem makes no mention of an ability to talk in the living woman she becomes. Some adaptations of the myth enable the former statue to talk, but not necessarily the way a human speaks. The Galatea of Jean-Jacques Rousseau’s Pygmalion: A lyrical scene (Rousseau 1811 cited in Miles 1999: 389–390), for example, appears incapable of saying more than “Myself,” in reference to herself, and “This is not myself,” when indicating statuary and marble. In George Pettie’s Pygmalion’s Friends and His Image, however, Galatea voices her refusal to have sex with Pygmalion before they are married: “she bade him leave for shame, and was presently turned to a perfect proper maid” (Pettie 1576 cited in Miles 1999: 357). The minimal role played by speech is not surprising, since Galatea is little more than an obedient servant to her creator/husband. Speech is, however, the centerpiece of Smollett’s (1752) and Shaw’s (1916) adaptations. Smollett’s hero, Peregrine Pickle, trains a foul-mouthed young beggar to speak and behave like a lady and then introduces her into the “beau monde of the country.” She is accorded widespread acceptance and greatly admired for her wit until the day she becomes angry, reverts to her linguistic roots, and is promptly shunned. Shaw’s (1916) Henry Higgins wagers he can convince “polite society” to accept a Cockney flower girl simply by changing her speech and manners. As with other aspects of behavior, verbal communication by real-world affective robots is constructed from a great deal of modeling. They do not possess sophisticated speech and language abilities, but they can distinguish affective from neutral speech and recognize paralinguistic patterns (e.g., intonation, speed of speech) that indicate affective intent (e.g., anger, praise, soothing).
The creation of databases such as The Gibberish Speech Database for Affective Human-Robot Interaction (Yilmazyldiz et al. 2011) is helping developers test and improve their models. Some robots are also able to generate prosodic contours in synthesized speech using models for synthesizing affective speech, such as the one developed by Janet Cahn (1990). MIT’s Kismet is also able to synchronize its lips to match consonant production, but neither Kismet nor other affective robots have mastered dynamic production of affective spoken language (Breazeal 2000, Breazeal 2002). Speech is also present in sexbots. The only way human phone-sex operators can seem robotic is to use the flat affect and staccato word sequencing of stereotypic robot speech. The sexbots marketed by TC Systems have varying levels of speech recognition and synthesis. The lowest-priced model can only utter pre-recorded, sex-related phrases. The most expensive model can recognize speech and retrieve what is, hopefully, a suitable verbal response from its database of hundreds of pre-recorded utterances (O’Brien 2010).
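The retrieval step mentioned above, in which a speech-enabled robot picks a stored, pre-recorded reply to match what its recognizer heard, can be sketched in Python. The source does not say how TC Systems or any other vendor performs this match; the sketch assumes a simple keyword-overlap (Jaccard similarity) score over the recognizer's text output, and its entries, file names, and `min_score` threshold are hypothetical placeholders.

```python
import re

# A tiny stand-in for a database of pre-recorded utterances, each indexed by
# keywords (hypothetical entries; the text mentions databases of hundreds).
UTTERANCES = [
    ({"hello", "hi", "morning"}, "hello.wav", "Hello! It is good to see you."),
    ({"how", "are", "you", "feeling"}, "feeling.wav", "I am feeling wonderful, thank you."),
    ({"goodnight", "night", "sleep"}, "goodnight.wav", "Goodnight. Sleep well."),
]
FALLBACK = ("pardon.wav", "I am sorry, could you say that again?")


def tokenize(text):
    """Lower-case the recognized text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def retrieve(recognized_text, min_score=0.2):
    """Return the (audio file, transcript) whose keywords best match the input.

    The score is the Jaccard similarity between the input words and an entry's
    keywords; if no entry reaches min_score, a generic fallback is returned.
    """
    words = tokenize(recognized_text)
    best_score, best = 0.0, FALLBACK
    for keywords, audio, transcript in UTTERANCES:
        score = len(words & keywords) / len(words | keywords)
        if score > best_score:
            best_score, best = score, (audio, transcript)
    return best if best_score >= min_score else FALLBACK


if __name__ == "__main__":
    print(retrieve("Hi there, good morning")[1])  # Hello! It is good to see you.
```

The `min_score` fallback reflects the uncertainty the text hints at with "hopefully": when no stored entry overlaps enough with the recognized words, the sketch asks the speaker to repeat rather than playing a poorly matched utterance.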

2 Cultural icons   


B Robots loving humans
None of the real-world robots in the previous section is capable of feeling love or desire. They are simply machines controlled by programming that can sometimes detect and react to affective behavior by humans. Hooman Samani of the National University of Singapore claims his team is building robots that can express love. They are non-humanoids that contain technology called “lovotics.” Lovotics engineering includes artificially created “love” hormones (oxytocin, dopamine, serotonin, and endorphin) that enable a robot to actually love a human. To ensure accuracy, the lovotics robots also possess facial-recognition and speaker-identification technology (Anthony 2011 and Bosker 2013). Despite these claims, futurist David Levy estimates that sentient robots capable of feeling love will probably not exist before the middle of this century (Levy 2008). There is, however, a body of literature portraying love by sentient non-humans, and it deserves consideration now in case affective robots do eventually become sentient.

1 History
The Tin Man in The Wonderful Wizard of Oz (Baum 1900) provides the first glimpse of a fictional robot capable of loving: it wants to have a heart. The quest is successful, but Baum’s novel ends before the Tin Man finds romantic love. Twenty years later, however, Čapek (1920) introduces the robot lovers Primus and Helena (Section II.A). Since then, there have been a number of robot-robot love stories, including two adaptations of Shakespeare’s Romeo & Juliet featuring robots: Robio and Robiette, a 1981 episode of the Tetsuwan Atomu / Mighty Atom series; and Romie-0 and Julie-8, a 1979 television special by the Canadian animation company Nelvana in which the star-crossed lovers are androids manufactured by competitors. In the computer-animated film Wall-E (2008), a box-shaped trash-compactor robot (WALL-E) and EVE, an ovoid robot probe, meet and fall in love. Saucer-shaped fix-it machines in Batteries Not Included (1987) bond and reproduce. Androids Tron and Yori become lovers in the action film Tron (1982), and android Deputy Andy¹⁶ and AI system SARAH from the television series Eureka fall in love, but their planned marriage is cancelled when SARAH gets “cold feet” (Eureka 2010–2011). In Robert Sheckley’s short story ‘The Robot Who Looked Like Me’ (Sheckley 1978), a woman and a man each build a doppelganger android of themselves because they are too busy for romance. The androids fall in love and run off together, raising the question of which pair are the robots.

Other fictional robots love humans, but this often ends unhappily for the robot. Finn, an android in Clarke’s (2013) novel The Mad Scientist’s Daughter, loves the young woman it has helped raise, but she is incapable of returning Finn’s affections. Comic-book humanoid Astroman (Allred 1990) becomes enamored with a human acquaintance who cannot reciprocate. The robot singer in the song ‘Robot Love’ (Hawthorne 2013) proclaims “I’m a lover / with robot veins” but then complains about being in a “one way road” relationship with a woman. Other sentient non-humans experience the same fate. The golem in Bretan’s 1923 opera, The Golem (as cited in Gagelmann 2001: 87–99), falls in love with Rabbi Loew’s granddaughter, but it is destroyed by the Rabbi. Kurt Vonnegut’s mainframe computer EPICAC (Vonnegut 1968) is smitten with a human programmer; when it can neither have her love nor become human, it bursts into flames. The Frankenstein monster pleads with Victor Frankenstein to create a female creature that can give it love and companionship. When Victor refuses, the monster begins murdering everyone Victor loves.¹⁷

Some humans do return the love of an amorous robot. Android (later, cyborg) Andrew Martin of Asimov’s The Bicentennial Man (1976) falls in love with Portia Charney, the granddaughter of its original owners. She returns that love, and they fight to obtain legal approval of their marriage. The manga/anime Chobits pairs personal computer (“Personacom”) Chi with Hideki Motosuwa, the boy who retrieves Chi from the trash (Clamp 2001). A computer scientist in the novel He, She and It (Piercy 1991) returns the love of Yod, a cyborg-golem. The golem couple in the animated television series The Simpsons (The Simpsons: Treehouse of Horror XVII 2006) fall in love and marry, and in the motion picture Her (2013), a human and a virtual interface experience mutual love.

16 Deputy Andy is one of the characters in the series modeled on The Andy Griffith Show; Andy is an android reincarnation of Sheriff Taylor.


   Part I: Images

2 Speech and language
Verbal communication is central to expressing love. The clearest example of this occurs in the film Her, in which the personal and affective evolution of the AI interface Samantha is revealed entirely through speech. According to futurist David Levy (Choi 2008, Levy 2013 and Levy 2008), human-level speech and language will also be essential for meaningful relationships between humans and real-world robots. In a 2013 interview he stated:

Human-computer conversation [is], in my opinion, the most difficult aspect of developing a robot personality or a robot capability with which people can fall in love. I think the two are very closely linked because I think it’s quite difficult for most people…to fall in love with someone you can’t have a decent conversation with. (David Levy speaker preview of Anticipating 2025 2013)

Being able to maintain a coherent discourse is extremely important for many kinds of relationships. When it comes to love, however, communication goes far beyond the ability to converse. Linguistic expertise must be blended with verbal and non-verbal affective cues, history with and knowledge of the human involved, and an understanding of cultural and social conventions. Knowing how to converse with a human with whom you are having (or want to have) an intimate relationship entails sensitivity to emotional and situational dynamics. In their examination of everyday interactions between newlyweds, Driver and Gottman (2004) found evidence that:

…couples build intimacy through hundreds of very ordinary, mundane moments in which they attempt to make emotional connections. Bids¹⁸ and turning toward may be the fundamental units for understanding how couples build their friendship. (Driver and Gottman 2004: 312)

17 Perhaps this also explains the purpose of Bride of Frankenstein (1935), the sequel to the 1931 film Frankenstein. Unfortunately, the bride created for the monster also rejects the creature.

When they occur following a disagreement, such simple attempts to reconnect are emotionally charged. Rejecting a bid or ascribing unspoken subtexts to it can greatly exacerbate a negative situation, whereas acceptance is often the second step towards conciliation. As indicated previously (Section IV.A), robot skills in verbal and non-verbal affect are evolving, and some of this functionality is already being implemented in non-sentient affective robots. Yet Breazeal et al. (2013) show how far social robots are from being able to manage simple social gestures, such as bids for friendship, from virtual strangers. For example, following the successful completion of a joint task, a human participant leaned towards the robot and said “Wanna grab a beer?” but walked away in disgust after receiving no response from the robot (Breazeal et al. 2013: 106). Observing cultural and social conventions requires an understanding of context, interpersonal history, cultural rules, and taboos. A robot needs to understand that the way one speaks with a lover in a private, intimate setting is vastly different from how one talks in public, even though humans themselves may ignore those boundaries. It could also be argued that a conversationally astute robot that abducts, assaults, or castigates a resistant recipient of its love (or another suitor) is acting like a crude, violent, or even psychopathic human (e.g., Kenji (Merchant 2013), which carried off a lab assistant, and ABE (ABE 2013), the twisted robot programmed to love). Do we want to build robots that behave like that?

C Managing robot love
Researchers have found that humans attribute life, animacy, and real emotion to affective robots (e.g., Bryson 2010, Friedman, Kahn and Hagman 2003 and Okita 2014). As indicated at the outset of this section, this is likely unavoidable, even for robots not designed to be social. The extent to which this may pose a danger to individuals and society is still being hotly debated.

18 A “bid” occurs when one person tries to initiate an interaction either verbally or non-verbally (e.g., smiling, telling a story).



On the other side of the equation are sentient robots of the future that could experience love and attraction. Unless their programming constrains their affection, they will follow their own hearts. They may even become enamored of someone (or something) that is unattainable. The fictional computer EPICAC (Vonnegut 1968) destroys itself because it cannot have the woman of its dreams. Kenji, the lovesick robot in an internet hoax, tries to carry off the lab assistant for whom it yearns (Merchant 2013 and Wilson 2009). ABE (ABE 2013), a robot programmed to love, feels abandoned by those for whom it cares so deeply, and that rejection leads ABE to try to “fix” any human who fails to return its devotion. Lovotics researchers claim their real-world robotic lovers experience jealousy when they perceive that the object of their desire is attracted to another human (Anthony 2011 and Bosker 2013). These examples suggest that the moral compass being developed for military robots (see Section II.C) could apply equally to sentient, affective robots. Andrew Martin, the android/cyborg in The Bicentennial Man (Asimov 1976) who wants to marry a human, raises legal as well as ethical questions. In Asimov’s story, these questions are resolved through the legal system; in the play R.U.R., the androids resort to violence. Roboethics researchers and others concerned about ethical treatment of sentient AI and robots (e.g., Danish Council of Ethics 2010, Markowitz 2013, Nourbakhsh 2013 and Robertson 2014) are already posing similar questions, such as “What laws are needed to protect sentient androids from sexual exploitation?” and “Are there ethical consequences tied to creating robots that look, behave, and speak like humans?” If research in robotics, natural-language processing, and AI continues along its current trajectory, these are questions that future robots may compel humans to address.
Hopefully, such issues will be resolved peaceably through the application of enlightened law, as in the case of Andrew Martin, rather than violently, as in the play R.U.R. (Section II.A).

V Conclusion
The premise of this chapter is that humans are not passive recipients of technology. More specifically, the views we hold about social robots and other technologies that may become part of our personal lives are grounded in established expectations. Social robots, in particular, are enveloped by a long history of fears, hopes, and other beliefs that arise from culture, literature, and mythology. These preconceptions have been fueled by popular media. To better understand some of those perceptions, this chapter has provided an overview of the impact of culture and media on views about robots that could occupy three roles in our lives: killer, servant, and lover. We identified cultural icons that symbolize each role and used them to better understand those human-robot relationships.

The Frankenstein monster (Shelley 1818) and the play R.U.R. (Čapek 1920) evoke fear of being killed by our own creations. Japanese karakuri ningyo and the golems of Hebrew lore lead us to consider the idea of humans creating robots that will serve and protect us. In addition, the pleasing appearance and fluid motion of karakuri ningyo have strongly influenced the design of modern Japanese robots. Finally, the Greek myth of Pygmalion and his beloved statue serves as a starting point for considering the dynamics of human-robot love. These icons or their progeny surface every time we hear about a robot or computer that can converse or perform other functions that are seen as uniquely human.

We discussed the part speech and language have played in supporting the three roles. The image of Frankenstein and its progeny as inarticulate, rampaging monsters enhances our fear of killer robots: the Frankenstein complex (Asimov 1969 and Asimov 1990). In contrast, the creature created by Victor Frankenstein in Mary Shelley’s (1818) novel is quite eloquent. Its ability to express its deep sense of isolation upon being shunned by humans makes it a sympathetic character. The inability of golems to speak was, for Hebrew scholars, the only indication that a humanoid was a golem rather than a human being. We considered the centrality of speech and language for maintaining an intimate, loving relationship but also pointed to research indicating how complex and multi-faceted such relationships are.

Our examination of cultural icons is intended to help designers become cognizant of the effects of culture and media on popular perceptions and on the personal attitudes they bring to their work. A great deal more work needs to be done to better understand cultural influences on robot design and the acceptance of social robots by the groups they are intended to serve.
The most obvious application of such research is for robots that are expected to be deployed in multi-cultural environments, such as the Hala robot receptionist project at Carnegie Mellon University (Makatchev et al. 2010). Of equal importance is research on analysis and modeling of culture (Aylett and Paiva 2012) and on understanding cultural aspects of robot design (e.g., Robertson 2010 and Robertson 2007). Work in all these areas will enable developers to make informed choices about features to incorporate into robots they are building.

References
2001: A Space Odyssey (1968) Directed by Kubrick, S. [Film]. Beverly Hills, CA USA: Metro-Goldwyn-Mayer.
ABE (2013) YouTube video, added by McLellan, R. available from watch?v=3xovQcEOdg8. [10 March 2014].
ADVAN Co. Ltd. (2010) Welcome to the World of Karakuri. [online] available from http://karakuriya.com/english/index.htm. [10 December 2013].
Aeschlieman, L. (2007) “What is anime?” BellaOnline [online] 7 November 2007. Available from [5 June 2014].
Allred, M. (1990) Madman. Berkeley, CA USA: Image Comics.
Anderson, S.L. (2008) ‘Asimov’s ‘Three Laws of Robotics’ and Machine Metaethics.’ AI & Society 22(4), 477–493.
Anon. (ca. 720 [2014]) The Nihon Shoki (Nihongi): The chronicles of Japan. (trans. by Morrow, A.) Rochester, VT USA: Bear & Company.
Anthony, S. (2011) ‘Lovotics, The new science of human-robot love.’ Extreme Tech [online] 30 June 2011. available from [10 March 2014].
Arkin, R.C. (2014) ‘Lethal Autonomous Weapons Systems and the Plight of the Noncombatant.’ Ethics and Armed Forces – Controversies in military ethics and security policy 1, 1–10 [online]. Available from 257CD70060EA38/$file/Arkin_LAWS_technical_2014.pdf. [25 May 2014].
Arkin, R.C. (2012) Governing Lethal Behavior in Autonomous Robots. Boca Raton, FL USA: Chapman & Hall/CRC Press.
Asimov, I. (1990) ‘Little Lost Robot’ in Robot Visions. New York: Penguin Books USA Inc., 161–190.
Asimov, I. (1978) ‘The Machine and the Robot.’ in Science Fiction: Contemporary Mythology ed. by Warrick, P.S. and Olander, J.D. New York, NY USA: Harper and Row.
Asimov, I. (1976) ‘The Bicentennial Man.’ in The Bicentennial Man and Other Stories. New York, NY USA: Doubleday, 135–172.
Asimov, I. (1969) ‘Feminine Intuition.’ The Magazine of Fantasy and Science Fiction 37(4), 4–23.
Asimov, I. (1951) ‘Satisfaction Guaranteed.’ Amazing Stories, April 1951, 52–63.
Asimov, I. (1950a) I, Robot. New York, NY: Gnome Press.
Asimov, I. (1950b) ‘Liar!’ in I, Robot. New York, NY: Gnome Press, 111–132.
Asimov, I. (1942) ‘Runaround’ in Astounding Science Fiction March 1942, 29(3), 94–103.
Austin Powers: International Man of Mystery (1997) Directed by Roach, J. [Film]. Los Angeles, CA USA: New Line Cinema.
Aylett, R. and Paiva, A. (2012) ‘Computational Modeling of Culture and Affect.’ Emotion Review 4(3), 253–263.
Batteries Not Included (1987) Directed by Robbins, M. [Film]. Orlando, FL USA: Universal Pictures.
Baum, L.F. (1900) The Wonderful Wizard of Oz. Chicago, IL: George M. Hill Company.
Bayless, S. (2008) Golems [online] available from [5 November 2013].
Bemelmans, R. (2012) ‘Socially Assistive Robots in Elderly Care: A systematic review into effects and effectiveness.’ JAMDA 13(2), 114–120.e1.
Bettini, M. (1992 [1999]) The Portrait of the Lover. (trans. by Gibbs, I.) Berkeley, CA USA: University of California Press.
Blade Runner (1982) Directed by Scott, R. [Film]. Burbank, CA USA: Warner Bros.
Blažek, Z., Jirásek, J. and Čapek, K. (1975) R.U.R. (Rossum’s Universal Robots). Prague, Czech Republic: Media Archiv Prague.
Borges, J.L. (1964) ‘El Golem.’ in El Otro, el Mismo. Buenos Aires, Argentina: Edicionnes Neperus, 24.
Bosker, B. (2014) ‘Meet The World’s Most Loving Girlfriends – Who Also Happen To Be Video Games.’ The World Post [online] 1 January 2014. available from http://www.huffingtonpost.com/2014/01/21/loveplus-video-game_n_4588612.html.
Bosker, B. (2013) ‘Hooman Samani on Kissing with Robots: How machines can mimic human love.’ Huff Post Tech [online] 13 March 2013. available from http://www.huffingtonpost.com/2013/03/13/hooman-samani-lovotics_n_2853933.html. [12 February 2014].
Boyle, K. (2008) [online] available from [10 December 2013].
Breazeal, C. (2002) Designing Sociable Robots. Cambridge, MA USA: MIT Press.
Breazeal, C. (2000) Sociable Machines: Expressive Social Exchange Between Humans and Robots. Unpublished doctoral dissertation. Cambridge, MA USA: MIT. [online] available from http:// [10 January 2013].
Breazeal, C., DePalma, N., Orkin, J., Chernova, S. and Jung, M. (2013) ‘Crowdsourcing Human-Robot Interaction: New methods and system evaluation in a public environment.’ Journal of Human-Robot Interaction 2(1), 82–111.
Bride of Frankenstein (1935) Directed by Whale, J. [Film]. Universal City, CA USA: Universal Pictures.
Bryson, J. (2010) ‘Robots Should Be Slaves.’ in Close Engagements with Artificial Companions: Key social, psychological, ethical and design issues. ed. by Wilks, Y. Amsterdam, the Netherlands: John Benjamins Publishing Company, 63–74. Available from ftp/Bryson-Slaves-Book09.html. [7 March 2014].
Cahn, J.E. (1990) ‘Generation of Affect in Synthesized Speech.’ in Proceedings of the 1989 Conference of the American Voice I/O Society (AVIOS). held 9 1989 at Newport Beach, CA USA. San Jose, CA USA: AVIOS, 251–256.
Čapek, K. (1920 [1923]) R.U.R. (Rossum’s Universal Robots). (trans. by Selver, P.) London, UK: Oxford University Press. available from [10 May 2010].
Chang, W-L. (2013) ‘Use of Seal-like Robot PARO in Sensory Group Therapy for Older Adults with Dementia.’ in Proceedings of the Eighth ACM/IEEE International Conference on Human-Robot Interaction (HRI). held 3–6 March 2013 at Tokyo, Japan. Piscataway, NJ USA: IEEE, 101–102.
Choi, C.Q. (2008) ‘Humans Marrying Robots? A Q&A with David Levy.’ Scientific American [online] 19 February 2008. available from humans-marrying-robots/. [10 January 2014].
Clarke, C.R. (2013) The Mad Scientist’s Daughter. Nottingham, UK: Angry Robot Books.
Collodi, C. (1926 [2010]) The Adventures of Pinocchio. (trans. by Chiesa, C.D.) [online] available from [10 March 2014].
Connell, J. (2014) ‘Extensible Grounding of Speech for Robot Instruction.’ in Robots That Talk and Listen. ed. by Markowitz, J. Boston, MA USA: Walter de Gruyter, 175–202.
Culliford, P. (1973) Benoît Brisefer, Tome 6: Lady d’Olphine. Brussels, Belgium: Le Lombard.
Danish Council of Ethics (2010) Recommendations concerning Social Robots. [online] Copenhagen, Denmark: Danish Council of Ethics. available from Homo-Artefakt/Anbefalinger/Udtalelse%20om%20sociale%20robotter.aspx. [10 March 2014].
David Levy speaker preview of Anticipating 2025 (2013) YouTube video, added by Wood, D. [online] available from [10 February 2014].
Der Golem, Wie er in die Welt kam (1920) Directed by Wegener, P. and Galeen, H. [Film]. Berlin, Weimar Republic: n.a.
Dick, P.K. (1968) Do Androids Dream of Electric Sheep? New York, NY: Doubleday.
Dick, P.K. (1953) ‘Second Variety.’ Space Science Fiction 1(5), 87–109.
Driver, J.L. and Gottman, J.M. (2004) ‘Daily Marital Interactions and Positive Affect during Marital Conflict among Newlywed Couples.’ Family Process 43(3), 301–314.
Dufty, D. (2014) ‘Android Aesthetics: Humanoid robots as works of art.’ in Robots That Talk and Listen. ed. by Markowitz, J. Boston, MA USA: Walter de Gruyter, 55–78.
Einbinder, C. (2012) R.U.R.: A futurist folk opera (Theater Project 27). Brooklyn, NY USA: Adhesive Theater.
Eureka (2010–2011) Syfy [9 July 2010–6 December 2011].
Fitzpatrick, P., Metta, G., Natale, L., Rao, S. and Sandini, G. (2003) ‘Learning About Objects Through Action – Initial steps towards artificial cognition.’ in Proceedings of the International Conference on Robotics and Automation (ICRA). held 12–17 May 2003 at Taipei, Taiwan. Piscataway, NJ USA: IEEE, 3140–3145.
Foerst, A. (2004) God in the Machine. New York, NY: Plume.
Foerst, A. (2000) Stories We Tell: Where robotics and theology meet. [online lecture] God and Robots Series, 6 April 2000. available from [10 November 2013].
Frankenstein (1931) Directed by Whale, J. [Film]. Universal City, CA USA: Universal Pictures.
Friedman, B., Kahn, P.H. and Hagman, J. (2003) ‘Hardware Companions? – What online AIBO discussion forums reveal about the human-robotic relationship.’ in Proceedings of the 2003 SIGCHI Conference on Human Factors in Computing Systems (CHI). held 5–10 April 2003 at Ft. Lauderdale, FL USA. New York, NY USA: ACM, 273–280.
Futurama (1999–2013) Fox [28 March 1999–10 August 2003] and Comedy Central [23 March 2008–4 September 2013].
Futurama (2007) John Di Maggio on Bender [Video] The Voices of Futurama. available from http://–-john-dimaggio-on-bender. [15 November 2013].
Gagelmann, H. (1998 [2001]) Nicolae Bretan, His Life – His Music. (trans. by Glass, B.) Hillsdale, NY USA: Pendragon Press.
Gier, N. (2012) ‘The Golem, the Corporation, and Personhood.’ Idaho State Journal Politics. 14 April 2012 [Blog] available from [07 January 2014].
Godzilla vs. Mechagodzilla (1974) Directed by Fukuda, J. [Film]. Tokyo, Japan: Toho.
Griffiths, M. (2014) ‘Techno Notice: A beginner’s guide to robot fetishism.’ Drmarkgriffiths. June 2014 [Blog] available from
Gygax, G. and Kuntz, R.J. (1975) Dungeons & Dragons: Greyhawk supplement. Lake Geneva, WI USA: TSR Inc.
Hawthorne, M. (2013) Where Does This Door Go? New York, NY USA: Republic Records. available from [7 June 2014].
Heath, S., Ball, D., Schulz, R. and Wiles, J. (2013) ‘Communication between Lingodroids with Different Cognitive Capabilities.’ in Proceedings of the 2013 IEEE International Conference on Robotics and Automation (ICRA). held 6–10 May 2013 at Karlsruhe, Germany. Piscataway, NJ USA: IEEE, 490–495.
Hebrew Publishing Company (1930) The Holy Scripture. New York: Hebrew Publishing Company.
Her (2013) Directed by Jonze, S. [Film]. Burbank, CA USA: Warner Bros.
Hersey, G. (2009) Falling in Love with Statues: Pygmalion to the present. Chicago, IL USA: University of Chicago Press.
Hornyak, T.N. (2006) Loving the Machine. Tokyo, Japan: Kodansha International.
Hsu, J. (2009) ‘Real Soldiers Love Their Robot Brethren.’ Live Science [online] 21 May 2009. available from [5 January 2014].
I, Robot (2004) Directed by Proyas, A. [Film]. Los Angeles, CA USA: 20th Century Fox.
Japanese Telecom SoftBank Unveil “Pepper” Emotional Humanoid Robots (2014) YouTube video, added by Sumahan TV. available from [8 June 2014].
Jayisgames (n.d.) Golem [online] available from [14 November 2013].
Koerth-Baker, M. (2013) ‘How Robots Can Trick You Into Loving Them.’ New York Times Magazine. [online] 17 September 2013. available from how-robots-can-trick-you-into-loving-them.html?pagewanted=all&_r=0. [1 March 2014].
Kurokawa, K. (1991) ‘The Philosophy of the Karakuri.’ In The Philosophy of Symbiosis from the Ages of the Machine to the Age of Life (Chapter 11) ed. by Kurokawa, K. Washington, DC USA: American Institute of Architects Press, 124–133. available from book/chapter11.html. [10 December 2013].
Kurzweil, R. (1990) The Age of Intelligent Machines. Cambridge, MA: MIT Press.
Leivick, H. (1921 [1972]) The Golem. (trans. by Landis, J.C.) New York, NY USA: Dramatists Play Service, Inc.
Levin, I. (1972) The Stepford Wives. New York, NY USA: Random House.
Levy, D. (2008) Love and Sex with Robots. New York, NY USA: Harper Perennial.
Lieh-tzǔ (ca. fourth century [1912]) Taoist Teachings from the Book of Lieh-Tzŭ. (trans. by Giles, L.) London, UK: Wisdom of the East.
Makatchev, M., Fanaswala, I., Abdulsalam, A., Browning, B., Ghazzawi, W., Sakr, M. and Simmons, R. (2010) ‘Dialogue Patterns of an Arabic Robot Receptionist.’ Paper No. 746 Robotics Institute, School of Computer Science, Carnegie Mellon University. available from http://repository.cmu.edu/robotics/746. [1 May 2014].
Markowitz, J. (2013) ‘Beyond SIRI: Exploring spoken language in warehouse operations, offender monitoring, and robotics.’ in Mobile Speech and Advanced Language Solutions. ed. by Neustein, A. and Markowitz, J. New York, NY USA: Springer Science+Business Media, 1–22.
Marvel Comics (1984) Transformers. 1(1) September 1984.
McCauley, L. (2007) ‘Countering the Frankenstein Complex.’ in AAAI Spring Symposium: Multidisciplinary Collaboration for Socially Assistive Robotics. held 26–28 March 2007 at Palo Alto, CA USA. Menlo Park, CA USA: AAAI, 42–44. available from AAAISpringSymposium2007/submissions/aaai_ss_07_id06.pdf p9–10. [10 March 2014].
Mechanical Love (2007) Directed by Ambo, P. [Film]. New York, NY USA: Icarus Films.
Merchant, B. (2013) ‘The Truth about Kenji, the Robot Programmed to Love.’ Motherboard 1 April 2013 [online] available from [10 March 2014].
Metropolis (1927) Directed by Lang, F. [Film]. Hollywood, CA USA: Paramount Pictures.
Meyrink, G. (1914) Der Golem. Leipzig, Germany: Kurt Wolff.
Miles, G. (ed.) (1999) Classical Mythology in English Literature. New York, NY: Routledge.
Minkowitz, G. (2013) Growing Up Golem: How I survived my mother, Brooklyn, and some really bad dates. New York, NY USA: Riverdale Avenue Books.
Mitzuta, K. (2010–ongoing) ‘Shutainzu Gēto: Bōkan no reberion.’ Monthly Comic Blade. Chiyoda, Japan: Mag Garden.
My Fair Lady (1964) Directed by Cukor, G. [Film]. Burbank, CA USA: Warner Bros.
Nakai, P. (2013) Buddhism and Robots [email] to Markowitz, J. [2 December 2013].
Neustein, A. (ed.) (to appear) Speech and Automata in Healthcare: Voice-controlled medical and surgical robots. Boston, MA USA: Walter de Gruyter.
Nourbakhsh, I.R. (2013) Robot Futures. Cambridge, MA USA: The MIT Press.
O’Brien (2010) ‘The First Talking Sex Robot: A (terrified) user review.’ Cracked [online] 5 February 2010. available from [5 June 2014].
Okita, S.Y. and Ng-Thow-Hing, V. (2014) ‘The Effects of Design Choices on Human-Robot Interactions in Children and Adults.’ in Robots That Talk and Listen. ed. by Markowitz, J. Boston, MA USA: Walter de Gruyter, 285–313.
Okita, S.Y. (2013) ‘Self–Other’s Perspective Taking: The use of therapeutic robot companions as social agents for reducing pain and anxiety in pediatric patients.’ Cyberpsychology, Behavior, and Social Networking 16(6), 436–441.
Ovid (ca. 8 [1998]) Metamorphoses (10th book). (trans. by Melville, A.D.) Oxford, UK: Oxford University Press.
Park, M. (2014) ‘Is This the World’s First Emo Robot?’ CNN World [online] 6 June 2014. available from [8 June 2014].
Picard, R.W. (1997) Affective Computing. Cambridge, MA USA: MIT Press.
Piercy, M. (1991) He, She and It. Robbinsdale, MN USA: Fawcett Publications; Reprint edition (23 January 1993).
Polybius (n.d.) The Histories. Volume IV. (trans. by Paton, W.R. (1922–1927)) Cambridge, MA USA: Harvard University Press. Transcribed for the Internet by Thayer, B. [online] available from*.html. [10 February 2014].
Pratchett, T. (1996) Feet of Clay. London, UK: Orion Publishing.
Robertson, E. (2014) ‘What are the ethics of human-robot relationships?’ The Guardian [online] 27 January 2014. available from what-are-the-ethics-of-human-robot-relationships. [10 February 2014].
Robertson, J. (2010) ‘Gendering Humanoid Robots: Robo-sexism in Japan.’ Body & Society 16(2), 1–36.
Robertson, J. (2007) ‘Robo Sapiens Japanicus: Humanoid robots and the posthuman family.’ Critical Asian Studies 39(3), 369–398.
Robio and Robiette (1981) Nippon Television Network Corporation [7 January 1981].
Romie-0 and Julie-8 (1979) Directed by Smith, C.A. [Internet Film]. Toronto, Ontario Canada: Nelvana. available from [15 May 2014].
Robot & Frank (2012) Directed by Schreier, J. [Film]. New York, NY USA: Samuel Goldwyn Films.
Rosenberg, Y. (1909 [2007]) The Golem and the Wondrous Deeds of the Maharal of Prague. (trans. by Leviant, C.) New Haven, CT USA: Yale University Press.
Roth, R. (2013) The Corporation Golem. [online] available from [10 November 2013].
Sansom, G.B. (1931) Japan: A short cultural history. Stanford, CA USA: Stanford University Press.
Schodt, F.L. (2007) Astro Boy Essays: Osamu Tezuka, Mighty Atom, and the manga/anime revolution. Berkeley, CA USA: Stone Bridge Press.
Schodt, F.L. (1988) Inside the Robot Kingdom: Japan, mechatronics and the coming robotopia. Tokyo, Japan: Kodansha America, Inc.
Shaw, G.B. (1916) Pygmalion. Second edn. New York, NY USA: Brentano. available from [5 November 2013].
Sheckley, R. (1978) ‘The Robot Who Looked Like Me.’ in The Robot Who Looked Like Me ed. by Sheckley, R. London, UK: Sphere Books, 9–19.
Shelley, M.W. (1823) Frankenstein; or, the Modern Prometheus. 2nd edn. London, England: G. and W. Whittaker.
Shelley, M.W. (1818) Frankenstein; or, the Modern Prometheus. 1st edn. London, England: Lackington, Hughes, Harding, Mavor & Jones.
Sherwin, B.L. (2013) Golems and Robots [email] to Markowitz, J. [16 December 2013].
Sherwin, B.L. (2000) Jewish Ethics for the Twenty-First Century: Living in the image of God. Syracuse, NY USA: Syracuse University Press.
Sherwin, B.L. (1984) The Golem Legend: Origins and Implications. Lanham, MD: University Press of America, Inc.
Shim, H.B. (2007) Establishing a Korean Robot Ethics Charter. Seoul, South Korea: Ministry of Commerce, Industry and Energy, KOREA, Robot Division. available from http://www.roboethics.org/icra2007/contributions/slides/Shim_icra%2007_ppt.pdf. [10 March 2014].
Singer, I.B. (1981 [1983]) The Golem. (trans. by Singer, I.) New York, NY USA: Farrar, Straus, Giroux.
Smith, S. (2014) The Tripods of Hephaestus. [online] available from http://www.academia.edu/2517186/The_Tripods_of_Hephaestus. [12 March 2014].
Smollett, T. (1751) The Adventures of Peregrine Pickle. Volume I. Chapter LXXXVII. [online] London, UK: D. Wilson. available from htm#link2HCH0001. [4 March 2014].
Star Trek: First Contact (1996) Directed by Frakes, J. [Film]. Hollywood, CA USA: Paramount Pictures.
Star Trek: The Next Generation (1987–1994) Syndicated (multiple channels) [8 May 1989–23 May 1994].
Steels, L. (2001) ‘Language Games for Autonomous Robots.’ IEEE Intelligent Systems 16(5), 15–22.
Sturm, J. (2003) The Golem’s Mighty Swing. Montreal, Quebec, Canada: Drawn and Quarterly.
Suematsu, Y. (2001) Japan Robot Kingdom [online]. Nagoya, Japan: Department of Electronic-Mechanical Engineering, Nagoya University. available from karakurichosaku/JapLoveRobo.pdf. [23 November 2013].
Suzuki, J. (2006–2007) ‘Karakuri Odette.’ Hana to Yume. Chiyoda, Japan: Tetsuya Maeda 5 September 2005–5 December 2007.
Tampion, A. (2008) Automatic Lover. Raleigh, NC USA: available from http://homepage. [15 January 2014].
Terminator 2: Judgment Day (1991) Directed by Cameron, J. [Film]. Culver City, CA USA: TriStar Pictures.
Tezuka, O. (1975 [2008]) ‘The Hot Dog Corps.’ in Astro Boy. (trans. by Schodt, F.L.) Milwaukie, OR USA: Dark Horse Manga.
Tezuka, O. (2000 [1988]) Tetsuwan Atomu. Akita Shoten 19, 17–19 (trans. and reprinted in Schodt, F. (1988) Inside the Robot Kingdom. Tokyo, Japan: Kodansha International).
The Black Hole (1979) Directed by Nelson, G. [Film]. Burbank, CA USA: Buena Vista Distribution.
The Borg Documentary, Part 1 (2010) YouTube video, added by TrekCore. available from http://www. [10 November 2013].
The Day the Earth Stood Still (1951) Directed by Wise, R. [Film]. Hollywood, CA USA: 20th Century Fox.
The Daleks (1963–1964) BBC [21 December 1963–1 February 1964].
The Golem (2013) Fox [9 December 2013].
The Matrix (1999) Directed by Wachowski, A. and Wachowski, L. [Film]. Burbank, CA USA: Warner Bros.
The Munsters (1964–1966) CBS [24 September 1964–12 May 1966].
The Robots of Death (1977) BBC [29 January 1977–19 February 1977].
The Simpsons: Treehouse of Horror XVII (2006) Fox Broadcasting Company [17 November 2006] available from [5 November 2013].
The Terminator (1984) Directed by Cameron, J. [Film]. Los Angeles, CA USA: Orion Pictures.
Tron (1982) Directed by Lisberger, S. [Film]. Burbank, CA USA: Buena Vista Distribution.
Tucker, P. (2014) ‘Now The Military Is Going To Build Robots That Have Morals.’ DefenseOne [online] 13 May 2014. available from [1 June 2014].
Tung, F-W. and Chang, T-Y. (2013) ‘Exploring Children’s Attitudes towards Static and Moving Humanoid Robots.’ in Kurosu, M. (ed.) The 15th International Conference, HCI International 2013: Users and contexts of use, LNCS 8006. held 21–26 July 2013 at Las Vegas, NV USA. Berlin-Heidelberg: Springer, 237–245.
Valley, E. (2005) The Great Jewish Cities of Central and Eastern Europe: A travel guide and resource book to Prague, Warsaw, Cracow and Budapest. Lanham, MD USA: Rowman and Littlefield Publishers, Inc.
Veruggio, G. (2007) EURON Roboethics Roadmap, Release 1.2. [online]. available from [10 March 2014].

David F. Dufty

Android aesthetics: Humanoid robots as works of art

Abstract: The goal of building machines that look and behave like humans predates industrial technology by several centuries. It continues today as the field of android science. While the construction of androids represents scientific and technological advances, such machines are often created for aesthetic purposes, as with animatronic robots or lifelike portrayals of living or dead individuals. This chapter examines the artistic aspect of androids from their inception in medieval times to the modern period, with a particular focus on the Philip K. Dick android as an example of a complex scientific and engineering enterprise that was also a work of art. It is argued that the pursuit of android technology for entertainment and artistic purposes may have scientific flow-on effects, but that these purposes are also a legitimate goal for robotics in their own right. The implications of the uncanny valley for design principles are also discussed.

I Introduction

An android is a machine that emulates a human. Despite the prominence of androids in popular media and the popular imagination, android science is a relatively small subdiscipline within the broader fields of robotics and artificial intelligence. While robotics in general involves the development and production of a wide variety of machines for a diverse range of tasks and contexts, android science focuses on robots that imitate humans in both behavior and appearance. Some believe that the quest to develop interactive humanoids reflects a failure to appreciate the true power of robotics, which lies in a broad and diverse range of devices. This view is exemplified by the strongly anti-humanoid blog post ‘There is no point making robots look and act like humans’ by Silvia Solon, deputy editor of (Solon 2011). In support of her position, Solon cites the following statement by Francesco Mondada, a researcher at the National Robotics Centre in Switzerland: “We should improve objects instead of creating one device that is exterior to the other objects that can interact with the regular household. Instead of having a robot butler to park my car, we should be getting the car to park itself. This is the way things are moving” (Solon 2011). Similarly, according to technology commentator Martin Robbins, the excessive focus on AI and humanoid robots has produced widespread misunderstanding about what has actually been achieved in robotics. Robbins writes, “Humanoid robots are seen as the future, but for almost any specific task you can think of, a focused, simple design will work better” (Robbins 2014).


Others argue in favor of pursuing research and development of androids. Ayala (2011) examines their role in entertainment (e.g., in theme parks and motion pictures), and Dautenhahn et al. (2005) describe them as facilitating human-machine interaction. Given the nascent state of android science, androids have, for the most part, had little everyday useful functionality, serving more as proofs of concept or as showcases for manufacturing or research. The reason is simple: creating the physical components of a working android is a significant achievement in itself; endowing it with functionality is an even greater challenge.

Why think of androids as works of art? Outside of research institutes, the primary purpose for building replicas of humans is artistic, in a broad sense, as in animatronics for theme parks and cinema. Even within research environments, a great deal of care is put into the physical appearance of androids. There is a good reason for this: because android technology has not yet reached the point where androids can do anything useful, appearance is the android's most important feature. Furthermore, the subject matter of published works on androids and android prototypes tends to be guided less by science and utility than by aesthetics and philosophy. Additionally, art may one day be an important application of android technology, as it offers the possibility of an entirely new kind of art: the robotic portrait.

Before exploring these ideas in more detail, it is worthwhile to revisit the predecessors of today’s androids and humanoid automata. This chapter explores the history of the artistic and aesthetic function of androids. The goals of this review are to establish and validate the significance of the aesthetic value of androids and to demonstrate that the pursuit of androids as art has already proved to be of considerable worth.

II The history of automata for art and entertainment

The idea that machines that behave like humans could exist is an old one. Needham (1956: 53) recounts a story from the Lie Zi, a Daoist text written in China in the fourth century. It tells of a mechanical engineer, living twelve centuries earlier, who gave an android to a powerful king. The mechanism looked and behaved exactly like a human: it could talk, sing, walk and dance. Homer’s Iliad refers to “handmaids of gold… resembling in all worth living young damsels” who moved “with voluntary powers” (Chapman 1843: 147). In the fifteenth century, Leonardo da Vinci designed a robot knight with a complex internal structure. It had movable arms and legs and an analog programmable controller fitted into the chest (Moran 2006). Unfortunately, construction of the knight was never attempted.

3 Android aesthetics: Humanoid robots as works of art   


A Automata in ancient times

Functioning automata were built in Hesdin, France, in the late thirteenth century. Robert II, Count of Artois, commissioned the construction of several mechanical humans and animals, including monkeys, birds, and a waving king. These automata, which survived until the sixteenth century, were most likely powered by weight-driven mechanisms (Kolve 2009: 185; Truitt 2010). They served a purely aesthetic purpose and had no “functional” role at all. While the animals were generic robotic representations of their living counterparts, the mechanical king was a robotic depiction of a real person. It would not have fooled anyone, but it was a step beyond the traditional portrait or sculpture. It would be centuries before another robotic portrait of a real, living human was created.

There are even earlier instances of machine-operated replicas of humans built purely or primarily for aesthetic reasons. For example, roughly a century before the robots in Hesdin, al-Jazari, an engineer in Anatolia (now southeastern Turkey), built water-powered moving objects, including one that simulated a trio of musicians in a boat. The instruments included a drum that was powered by a cylinder and programmable hammers. The musicians themselves did not have moving parts, so they cannot be described as androids and, as far as is known, they were not designed to look like specific individuals. Rather, the mechanism was designed to give an appearance of humanness, and it was programmable (Gunalan 2007).

In the seventeenth century, German advances in clockmaking led to the development of small automated figures, including those of the well-known cuckoo clock. These artisans also built mechanical human figures. Such clockwork automata served no practical purpose. Although they typically signaled that the clock had struck the hour, they were not indispensable, because those clocks also had chimes or bells. They did, however, add entertainment and further highlighted the skill of the clockmaker.
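The drum in al-Jazari’s boat is called programmable because its rhythm was not played by an operator. On the common reading of such devices, the “program” is the arrangement of pegs on a rotating cylinder, each of which trips a hammer when it comes around. As a rough illustration of that idea, the Python sketch below models a cylinder as a set of peg positions; the class `PegCylinder` and its step-based timing are simplifying assumptions for illustration, not a reconstruction of any historical mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class PegCylinder:
    """Toy model of a pegged cylinder: each peg is a (step, hammer) pair.

    One full rotation is divided into `steps_per_revolution` positions; when the
    cylinder reaches a position, every peg there trips the lever of its hammer.
    """
    steps_per_revolution: int
    pegs: set = field(default_factory=set)

    def place_peg(self, step: int, hammer: int) -> None:
        # "Programming" the mechanism means moving pegs to new positions.
        self.pegs.add((step % self.steps_per_revolution, hammer))

    def remove_peg(self, step: int, hammer: int) -> None:
        self.pegs.discard((step % self.steps_per_revolution, hammer))

    def strikes_at(self, step: int) -> list:
        # Hammers whose levers are tripped at this angular position.
        position = step % self.steps_per_revolution
        return sorted(h for (p, h) in self.pegs if p == position)

    def play(self, revolutions: int = 1) -> list:
        # Turn the cylinder and record (step, hammers) for every strike.
        events = []
        for step in range(revolutions * self.steps_per_revolution):
            hammers = self.strikes_at(step)
            if hammers:
                events.append((step, hammers))
        return events


drum = PegCylinder(steps_per_revolution=8)
for step in (0, 3, 4, 6):
    drum.place_peg(step, hammer=0)   # basic rhythm on the first drum
drum.place_peg(4, hammer=1)          # accent on a second drum
print(drum.play())
```

Changing the rhythm requires no new machinery, only moving or removing pegs, which is the sense in which such a mechanism is “programmable.”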

B Clockmaking and automata

In eighteenth-century France, Jacques de Vaucanson made technically advanced automata, including a duck and a flute player. The clockwork flute player was first exhibited in 1738.¹ It was capable of playing twelve distinct programmed tunes on a real flute using wooden fingers and a network of levers. It was built according to the principles of human anatomy, which Vaucanson had studied at the Jardins du Roi in Paris (Moran 2007; Wood 2003). Vaucanson subsequently built a second android musician that was capable of 20 tunes. His third creation was a mechanical duck that appeared to have a functioning digestive system: food went in one end and came out the other. The duck became the most famous of Vaucanson’s robots, largely because it was a feat of engineering that demonstrated what could be done with the latest technology.

1 In the 9th century, the Banū Mūsā brothers of Baghdad created a programmable flute player, but it was a fountain and did not have humanoid form (Koetsier 2001).

C Karakuri ningyo

Karakuri ningyo (Boyle 2008; Tarantola 2011) are automata that have been crafted in Japan since the Edo period (1603–1868). The name can be roughly translated as “person-shaped mechanisms,” although the term originally referred to all types of mechanical devices (ADVAN Co. Ltd. 2010). The word “karakuri” may also be translated as “trick,” which speaks to the cleverness of those who crafted them. It is believed that mechanical devices from China and European clockmaking technology provided a springboard for the first karakuri ningyo (Tarantola 2011).

There are three major types of karakuri ningyo: butai karakuri, dashi karakuri and zashiki karakuri. Butai karakuri are used in theater productions, and dashi karakuri are used during festivals to tell traditional religious stories. Both are directly controlled by human operators and, strictly speaking, are puppets, not automata.² In contrast, zashiki karakuri are miniature automata. They are often made of wood and contain wheels, levers, and other mechanisms driven by various kinds of power sources, such as sand, steam, and Western-style clockwork gears. Most were built during the Edo period as luxury items for feudal lords, although Boyle (2008) reports that they also served the often essential function of facilitating conversation between hosts and guests. Each zashiki karakuri performs a complex set of pre-programmed movements, such as doing somersaults, shooting arrows or writing. The best-known examples are the tea-serving dolls. A tea-serving zashiki karakuri doll is activated when a teacup is placed on its tea tray. It moves forward, towards the recipient, nodding its head. When the teacup is taken off the tray, it turns around and returns to its original location.
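The tea-serving doll’s behavior, as described above, amounts to a small finite-state machine driven by a single input: whether a cup is on the tray. The Python sketch below models it that way. The state names, the step-based `tick()` method and the `distance` parameter are illustrative assumptions; the real dolls implement this logic mechanically rather than in software.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()       # waiting at its starting position
    ADVANCING = auto()  # moving toward the guest, nodding
    WAITING = auto()    # stopped in front of the guest
    RETURNING = auto()  # turned around, heading back


class TeaDoll:
    """Finite-state sketch of a tea-serving doll; `distance` is a hypothetical step count."""

    def __init__(self, distance: int):
        self.distance = distance
        self.position = 0
        self.state = State.IDLE

    def place_cup(self) -> None:
        # Placing a cup on the tray is the doll's only "start" input.
        if self.state is State.IDLE:
            self.state = State.ADVANCING

    def take_cup(self) -> None:
        # Lifting the cup off the tray sends the doll back to where it started.
        if self.state in (State.ADVANCING, State.WAITING):
            self.state = State.RETURNING if self.position > 0 else State.IDLE

    def tick(self) -> str:
        # One beat of the mechanism; returns what the doll did on this beat.
        if self.state is State.ADVANCING:
            self.position += 1
            if self.position >= self.distance:
                self.state = State.WAITING
            return "step forward, nod"
        if self.state is State.RETURNING:
            self.position -= 1
            if self.position <= 0:
                self.position = 0
                self.state = State.IDLE
            return "step back"
        return "still"


doll = TeaDoll(distance=3)
doll.place_cup()
print([doll.tick() for _ in range(4)])  # three steps forward, then it waits
doll.take_cup()
print([doll.tick() for _ in range(3)])  # three steps back to its starting point
```

The point of the model is that the doll’s apparently social behavior (approaching, bowing, withdrawing once served) requires no perception beyond a single trigger.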

D Animatronics

Disney Studios created a laboratory in 1952 known as Walt Disney Imagineering (WDI). WDI was staffed with a team of designers who combined engineering and design expertise with demonstrated creativity. They were known as ‘Imagineers.’ Their task was to design attractions for Disneyland, which was still in the planning stage.

2 This chapter is concerned with automata: humanoids that operate according to programs.

One of the technologies that emerged from WDI was animatronics, which resulted in the production of a series of lifelike automata (Greene and Greene 1998). Disney’s technology was first showcased at the 1964 World’s Fair in New York in the form of a life-sized robotic Abraham Lincoln that could stand, shake hands, and give a speech (Wallace 1985). The animatronic Lincoln was subsequently moved to Disneyland in Anaheim, California, where it was used in a show titled “Great Moments with Mr. Lincoln.” The show also became a long-standing feature of the Hall of Presidents at Walt Disney World. Despite two extended absences, it is still in operation (Baker 2011). The behaviors exhibited by animatronic machines are typically produced by pre-programmed performances. If interactivity or variation in behavior is present, it results from direct human intervention and control (Ayala 2011), making the apparatus a puppet rather than an automaton (Tillis 1999). Animatronics has since been used to create a variety of robotic attractions in theme parks (e.g., pirates, aliens, and other US Presidents), in motion pictures (e.g., the raptors in Steven Spielberg’s Jurassic Park; Tillis 1999), and as a testing ground for new robotic techniques (Breazeal et al. 2003; Brooks et al. 1999; Hanson et al. 2001). While advances in animatronics continue to contribute to progress elsewhere in robotics, animatronics is also a legitimate end-point application of robotic technology, and indeed of android technology, in its own right.
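A pre-programmed performance of the kind described above can be thought of as a fixed timeline of keyframes for each actuator, replayed identically every time. The Python sketch below illustrates that idea with linear interpolation between keyframes; the channel names, angles and frame rate are hypothetical, and real animatronic control systems are considerably more elaborate.

```python
from bisect import bisect_right


def servo_position(keyframes, t):
    """Linearly interpolate one servo's target at time t (seconds).

    `keyframes` is a list of (time_s, position) pairs with strictly increasing
    times. Before the first or after the last keyframe, the position is held.
    """
    times = [time for time, _ in keyframes]
    if t <= times[0]:
        return keyframes[0][1]
    if t >= times[-1]:
        return keyframes[-1][1]
    i = bisect_right(times, t)          # times[i - 1] <= t < times[i]
    (t0, p0), (t1, p1) = keyframes[i - 1], keyframes[i]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)


def render(performance, fps, duration_s):
    """Sample every channel of a fixed performance at a constant frame rate.

    The same input always yields the same frames: nothing here reacts to the audience.
    """
    n_frames = int(round(duration_s * fps)) + 1
    return [
        {channel: servo_position(kf, i / fps) for channel, kf in performance.items()}
        for i in range(n_frames)
    ]


# Hypothetical two-channel performance: a head turn and a jaw movement (degrees).
show = {
    "head_pan": [(0.0, 0.0), (1.0, 30.0), (2.0, 0.0)],
    "jaw": [(0.0, 0.0), (0.5, 12.0), (1.0, 0.0)],
}
frames = render(show, fps=4, duration_s=2.0)
print(frames[2])   # targets for both servos at t = 0.5 s
```

Any variation from one show to the next would have to come from outside this loop, for example from a human operator, which is exactly what turns the apparatus into a puppet in the terminology above.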

E An android as a self portrait: the work of Hiroshi Ishiguro

The Japanese roboticist Hiroshi Ishiguro has built a number of lifelike androids, each modeled on an individual, living human. While Ishiguro has a scientific interest in androids, the painstaking level of detail, the numerous exhibitions at which his androids appear, and the subject matter of the androids themselves all place these robots in the domain of art. An early android created by Ishiguro was Repliee Q1-expo. It was exhibited at the 2005 World Exposition in Japan’s Aichi prefecture (Epstein 2006). Repliee Q1-expo was modeled on Japanese newscaster Ayako Fujii and exhibited breathing, blinking, and pre-programmed speech (Bar-Cohen et al. 2009: 39). With the exception of Repliee Q2, the other androids in Ishiguro’s Repliee series are also modeled on real individuals. The face of Repliee Q2 is unique because it was created from a composite of the faces of several Japanese women (Bar-Cohen et al. 2009: 39). Following the Repliee robots, Ishiguro’s lab developed the Geminoid series of robots. In 2012, Ishiguro announced the completion of Geminoid F, a singing female android. Other androids include a replica of the traditional Japanese performer Master Beicho (Ishiguro 2012). Of particular note is Ishiguro’s Geminoid HI series of androids (denoted by Ishiguro’s initials, HI), which are all lifelike android replicas of Ishiguro himself (Ogawa et al. 2012). The Geminoid H2 was the first truly lifelike robot to be operated by telepresence. That is, a human operated the android remotely and interacted with other people via the android (Kristoffersson et al. 2013). The Geminoid robots do not have any intelligence; they perform programmed movements, such as breathing and blinking, that emulate functions of the human autonomic nervous system (Vlachos and Schärfe 2013). Thus, the true purpose of the Geminoid androids is not to replicate a complete human, but to create the impression of a real, physical human presence.
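The programmed breathing and blinking mentioned above can be generated as background motion that runs regardless of what the remote operator is doing. The Python sketch below shows one simple way such idle motion could be scheduled. The sinusoidal breathing curve, the 4-second breathing period, the exponentially distributed gaps between blinks, and the function name `idle_motion` are all assumptions for illustration, not details of the Geminoid control software.

```python
import math
import random


def idle_motion(duration_s, dt=0.1, breath_period_s=4.0,
                mean_blink_gap_s=4.0, blink_len_s=0.2, seed=0):
    """Yield (t, chest, eyelid) actuator targets for an android at rest.

    chest  -- 0.0 (fully exhaled) to 1.0 (fully inhaled), a smooth sinusoid
    eyelid -- 0.0 (open) or 1.0 (closed during a blink)
    """
    rng = random.Random(seed)                  # seeded so runs are repeatable
    next_blink = rng.expovariate(1.0 / mean_blink_gap_s)
    blink_end = -1.0
    for i in range(int(round(duration_s / dt))):
        t = i * dt
        chest = 0.5 + 0.5 * math.sin(2 * math.pi * t / breath_period_s)
        if t >= next_blink:                    # start a blink, schedule the next one
            blink_end = t + blink_len_s
            next_blink = blink_end + rng.expovariate(1.0 / mean_blink_gap_s)
        eyelid = 1.0 if t < blink_end else 0.0
        yield t, chest, eyelid


for t, chest, eyelid in idle_motion(1.0, dt=0.25):
    print(f"t={t:.2f}s chest={chest:.2f} eyelid={eyelid}")
```

Randomizing the blink gaps while keeping breathing periodic is one way to avoid the mechanical regularity that would otherwise undermine the impression of a living presence.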

F Hanson robotics

The roboticist David Hanson is another pioneer who creates aesthetic, lifelike androids. He is also a proponent of the synthesis of art and science in android technology. Early in his career, Hanson was employed at Disney Imagineering as a sculptor and technical consultant. He later founded his own robotics enterprise, Hanson Robotics. According to Hanson, his work blends robotics, AI, and materials science with the artistic domains of sculpture and animation (Hanson 2013). An early robot created by Hanson was K-Bot, a lifelike robot capable of a high degree of expressivity (Blow et al. 2006). He went on to build a robot doll called Eva and a robotic self-portrait (Dufty 2012). This work led to the development of a far more complex robot: the Philip K. Dick android. It was an autonomous, conversational android built in collaboration with the FedEx Institute of Technology and the University of Memphis (Hanson et al. 2005). It was showcased at the Association for the Advancement of Artificial Intelligence (AAAI) conference in 2005. The Philip K. Dick robot is discussed in more depth in the following sections of this chapter.

In 2008, Hanson collaborated with musician David Byrne to create a singing android named Julio. Julio sang a song written by Byrne, ‘Song for Julio,’ at an exhibition in Madrid called Máquinas y almas: Arte digital (‘Machines and Souls: Digital Art’). The android’s singing voice was, in fact, a recording of Byrne’s voice that had been made in Hanson’s lab prior to the exhibition. The movements were finely crafted to coordinate with the song and even with the sound of Byrne clearing his throat (Dufty 2012: 256–258). Hanson also worked with KAIST laboratories in Korea, the makers of the HUBO series of robots. This collaboration produced the Einstein HUBO, a walking android with the body of a HUBO robot and the face of Albert Einstein. The Einstein HUBO was the first android with a mobile humanoid body as well as a humanoid face (Oh et al. 2006). Other work by David Hanson and Hanson Laboratories includes Zeno, a humanoid robot that could engage in conversations. Disney designers were enlisted to make Zeno an engaging character with high-quality artistic design, in order to test Hanson’s hypothesis that robots with a strong design element are more socially engaging (Hanson et al. 2009).

III The Philip K. Dick android

In 2005, David Hanson collaborated with a team of roboticists, designers and computer programmers³ to build a life-size replica of a historical figure: the twentieth-century science-fiction writer Philip K. Dick. The android, which could converse and interact in real time, demonstrated that a variety of current technologies can be unified to produce a realistic, intelligent, interactive android with a distinct personality (Dufty 2012). The ability of the android to operate effectively as a social agent depended on a combination of world knowledge, real-time processing, construction of an appropriate physical and social context, and physical realism. Furthermore, modeling the robot on a single individual (including his verbal patterns) provided an additional way of scaffolding the “humanness” of the machine beyond what might have been created from theoretical principles alone.

From the outset, the robot was designed to be a replica of a specific human, Philip K. Dick, in every regard, from his physical appearance to the artificial intelligence (AI) used to represent his knowledge, speech patterns, and other aspects of his behavior. As such, it was described by its inventors as a “robotic portrait” (Hanson et al. 2005). The word “portrait,” invoking artistry, was intentional. In addition to being a technological undertaking, the android was a work of art: a four-dimensional, interactive portrait of a real person. A key feature of the undertaking was that the android was not to be a mere showcase or proof of concept. Each level of the machinery was designed to be a serious attempt to fashion an intelligent, conversational android. The goal was to create an android that mimicked a historical literary figure. Consequently, the project involved integrating a wide range of applications in novel ways, several of which had to be purpose-built. As such, it was an exercise in what may be a future technology, a future art form, or both: robotic portraiture. Just as the invention of the camera, a technological advance, led to the new art forms of photography and film, customizable robotics may lead to the art form of interactive robotic portraits of real people. The Philip K. Dick android was the first realization of this idea (Dufty 2012).

The notion of the robotic portrait had existed for some time, most notably in the fiction of Philip K. Dick himself. In his novel We Can Build You, for example, he imagined an android Abraham Lincoln that could be used for entertainment and historical recreation. In Do Androids Dream of Electric Sheep?,⁴ he described a world in which robotic technology had advanced so far that androids were indistinguishable from humans, except that they had no ability to feel empathy. It was Dick’s prescience in the field of androids, and his speculation on the possibility of robotic portraits in particular, that was a factor in choosing him as the subject for the first real-life implementation of what had, until then, been an idea found only in science fiction. The Philip K. Dick android garnered intense public interest and was, for a short time, an unofficial ambassador for the field of robotics. Unfortunately, the machine’s life was briefer than expected or planned. It was lost in mysterious and unfortunate circumstances en route from Dallas to San Francisco and has never been found (Dufty 2012). The following sections describe the engineering and design principles of the Philip K. Dick android.

3 The key developers involved in the project were Hanson Robotics and the Institute for Intelligent Systems, based at the University of Memphis. Other organizations that contributed resources or played a role to varying degrees included the Automation Robotics and Research Institute at the University of Texas and Mmodal Technologies.

4 Do Androids Dream of Electric Sheep? was made into the motion picture Blade Runner.

IV Robot components

The key physical component of an android is its face, especially when the purpose of the android is primarily artistic. Even in the case of a full-body android, the face is the focus of social interaction for humans, and this extends to interaction with robots (Breazeal and Scassellati 1999). In the case of the Philip K. Dick robot, the head was modeled by first constructing a sculpture that was a faithful replica of Philip K. Dick. That sculpture underwent a series of transformations and finally emerged as a robot head with Dick’s features. The components were a plastic base; a layer of artificial skin; and, between the base and skin, machinery of levers and cords that emulated human facial musculature (see Figure 3.1). The robotic head and neck were the only moving parts of the android’s body.


Figure 3.1: Philip K. Dick android (photograph courtesy of Eric Matthews)

These robotic components were controlled remotely by software that manipulated a complex network of actuators governing the facial muscles and muscle interactions. A camera operated as an eye from within a socket in the skull. It relayed images to a server where Cognitec facial-recognition software and the Nevenvision facial-feature tracker monitored faces, scanned for familiar ones (the android could address people it recognized by name), and attempted to determine facial expressions. Once it had calculated the coordinates of nearby people, the android could orient its head toward the focus of the conversation and follow that person as he or she moved around the room. Servomotors worked in unison with the conversational engine to simulate the act of talking and, depending on the emotional content of the conversation, to pull the face into configurations that closely resembled expressions of various emotional states. A microphone acted as the ear of the android. It collected a sound stream that was converted into text. Conversely, each response was converted from text to speech and played through a speaker as the voice of the android. Behind the actuators, sensors, decoders and output modules lay an interconnected set of AI modules, a central component of which was the conversational engine.
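Orienting the head toward a tracked face, as described above, can be reduced to converting the face’s position in the camera image into an angle and turning part of the way toward it on each update. The following Python sketch assumes an ideal pinhole camera centered in the image, a 640-pixel-wide frame, a horizontal field of view of 60 degrees, and a simple proportional controller with a step limit and a dead band; none of these parameters, nor the function names, come from the Philip K. Dick android’s actual software.

```python
import math


def pan_error_deg(face_x, image_width, hfov_deg):
    """Horizontal angle (degrees) from the camera axis to a face at pixel column face_x.

    Assumes an ideal pinhole camera whose optical axis passes through the image
    center; positive values mean the face is to the right of center.
    """
    focal_px = (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    return math.degrees(math.atan((face_x - image_width / 2) / focal_px))


def next_pan(current_pan_deg, face_x, image_width=640, hfov_deg=60.0,
             gain=0.5, max_step_deg=5.0, deadband_deg=2.0):
    """One control update: turn part of the way toward the face, rate-limited."""
    error = pan_error_deg(face_x, image_width, hfov_deg)
    if abs(error) < deadband_deg:              # close enough: hold still, avoid jitter
        return current_pan_deg
    step = max(-max_step_deg, min(max_step_deg, gain * error))
    return current_pan_deg + step


pan = 0.0
for face_x in (600, 560, 500, 420, 340):       # a face drifting back toward center
    pan = next_pan(pan, face_x)
    print(f"face at x={face_x}: head pan {pan:.1f} deg")
```

Because the camera sat in the eye socket and therefore moved with the head, the error is measured relative to the current gaze, so each update needs only the latest face position. The dead band keeps the head from twitching when the speaker is already roughly centered.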

V Building a dialogue system

A Gricean maxims

Creating systems that are capable of normal human conversation is an extremely difficult problem. One way to approach it is through the Gricean maxims of communication (Grice 1975). Grice proposed the maxims shown in Figure 3.2 as core elements of well-behaved human verbal interaction. His maxims support the notion that orderly conversation rests on what he called the Cooperative Principle. This principle asserts that both speakers and listeners work to move the conversation forward. Speakers do so by undertaking to contribute meaningful utterances to the conversation and presuming that listeners are motivated by the same objective. Listeners endeavor to understand and respond to what the speaker has said. In this way, speaker and listener collaborate to produce an orderly conversation.

Some of Grice’s maxims are relatively straightforward to implement in computational conversational agents. Meeting all of the maxims simultaneously is not. For instance, it is reasonably easy to write an algorithm that will always satisfy the maxim of Quality by outputting what is true: an automaton need only query an encyclopedia. But such responses are likely to violate other maxims, such as those of Relation and Manner. In a similar vein, outputting “brief” statements, as required by one of the axioms of the Manner maxim, is hardly a challenge, because a computer’s output can be calibrated to be arbitrarily short; but that comes at the cost of violating the maxim of Quantity as well as other axioms of Manner. The Relation maxim, in particular, lies at the heart of conversation. Beneath the surface layer of words is a submerged landscape of meaning. Navigating this landscape comes naturally to human language users but has proved elusive to discourse analysts, other than in the most abstract way.

Quantity
- Make your contribution as informative as required. (Don’t say too much or too little.)
- Make the strongest statement you can.
Quality
- Do not say what you believe to be false.
- Do not say that for which you lack adequate evidence.
Relation
- Be relevant. (Stay on topic.)
Manner
- Avoid obscurity of expression.
- Avoid ambiguity.
- Be brief (avoid unnecessary prolixity).
- Be orderly.

Figure 3.2: Gricean maxims of communication (Grice 1975)
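The trade-off between maxims described above can be made concrete with deliberately crude proxies for three of them: Quality as “every sentence is already known to be true,” Quantity as “the reply contains the words the question asked for,” and the brevity axiom of Manner as a word budget. The Python sketch below is a hypothetical illustration, not a method from the chapter; the function names, the 20-word budget and the small fact set are assumptions.

```python
import re


def words(text):
    """Lower-cased word set; digits and punctuation are ignored."""
    return set(re.findall(r"[a-z']+", text.lower()))


def maxim_checks(response, known_true, required_terms, max_words=20):
    """Crude, hypothetical proxies for three Gricean maxims.

    known_true     -- sentences the system already holds to be true (Quality)
    required_terms -- words the question calls for in the answer (Quantity)
    max_words      -- word budget standing in for "Be brief" (Manner)
    """
    sentences = [s.strip() for s in re.split(r"[.!?]", response) if s.strip()]
    return {
        "quality": all(s in known_true for s in sentences),
        "quantity": required_terms <= words(response),
        "brevity": len(response.split()) <= max_words,
    }


KNOWN = {
    "My favorite movie is the Bicentennial Man",
    "Bicentennial Man is a film released in 1999",
    "It stars Robin Williams",
    "It is based on a story by Isaac Asimov",
}
ASKED_FOR = {"favorite", "movie"}  # for "What is your favorite movie?"

encyclopedic = ("Bicentennial Man is a film released in 1999. It stars Robin Williams. "
                "It is based on a story by Isaac Asimov.")
for reply in (encyclopedic, "", "My favorite movie is the Bicentennial Man."):
    print(repr(reply[:30]), maxim_checks(reply, KNOWN, ASKED_FOR))
```

Run on the three candidate replies, the encyclopedic answer passes Quality but fails Quantity and brevity, and the empty reply trivially passes Quality and brevity while failing Quantity; only the tailored reply passes all three. No comparably cheap proxy is offered for Relation, which is the hard case identified above.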

B Architecture The conversational AI of the Philip K. Dick android functioned on several levels some of which operated in parallel. The basic architecture was borrowed from AutoTutor (Graesser et  al. 2005), notably the hub-and-spoke architecture in which a controller sends successive iterations of the input to various modules. The software was, however, repurposed to such an extent that the final version bore little resemblance to AutoTutor. The response-generator employed multiple strategies from which it could choose for any particular input. The cheapest of these were a series of frozen (or “canned”) expressions quite similar to those used in the ELIZA program developed by Joseph Weizenbaum in the mid-1960s.⁵ For instance, if asked “What is your favorite movie?” the android would always respond with “My favorite movie is the Bicentennial Man. 5 ELIZA was an early natural-language processing program that used sets of pre-defined “script” to interact with humans. The most well-known is the DOCTOR script which simulated the therapeutic approach used Rogerian psychotherapists. In that script, ELIZA would transform a statement by the

3 Android aesthetics: Humanoid robots as works of art   


Have you seen the Bicentennial Man?” To the question, “What is your name?” it would always reply, “My name is Phil. What’s yours?” This type of response had two benefits. First, it is computationally cheap and therefore fast and easy to compile. Second, for inputs that occur with high frequency, the response could be tweaked for maximum utility and realism. Even if the computational cost was not an issue, forcing the android to compute a favorite movie in a range of unique dialogue contexts would produce different results at different times and may even produce poor-quality outputs, if the system failed to generate a coherent response. Canned responses, despite being cheap, fast, and capable of producing highlytailored, targeted replies to the input, have serious shortcomings. They apply to a limited number of replies and therefore are unsuitable for a domain-independent, general-purpose conversational engine. In fact, they are even unsuitable for conversation in a relatively limited domain. Such dialogue is not truly intelligent, and can’t scale to realistic, real-world interaction with humans. That’s why canned expressions and other chatterbot-style tricks served only as the front line of a multilayered, conversational engine.

C Conversing like Philip K. Dick

For the Philip K. Dick android there was an additional constraint: to stay in character. The advantage of choosing a subject such as Philip K. Dick, aside from the obvious thematic appropriateness, was that as a writer, Dick produced a huge corpus of written output that could serve as a working dataset to model his verbal responses. Even better, he was a prolific interviewee; there are entire books of transcripts of interviews with him (e.g., Lee et al. 2001). In addition to his written fiction and expository work, these conversations could be mined in real time if the android was engaged in a new topic of conversation. They provided a nearly perfect tool for mimicking the responses of this individual. If a situation arose in a conversation that was nearly identical to one that had arisen in conversation with the late science-fiction author, then the optimal behavior of the android would be to closely mirror Dick’s behavior. Such a response is not a shortcut; it is the most appropriate option for satisfying the ultimate goal, which was to be a robotic portrait: a faithful replica of a specific individual. As documented by Olney (2007a), analyzing the input and generating a response entailed compartmentalizing the speech stream into chunks of varying length. These chunks were matched against a large database of documents, each containing a segment of Philip K. Dick’s spoken or written text. Thus, if the author had ever responded to a particular query (such as “What is your favorite movie?” or a question closely resembling that utterance), then a matching response could quickly be found. Although this approach provided greater flexibility than the programming generally found in a chatterbot, it was too limited a strategy to be useful for a broad-ranging conversation. Consequently, whenever the database did not contain a query comparable to the one posed to the android, the system identified a selection of partially matching documents and extracted from each of them the short excerpt or phrase that best matched the input. Those phrases were then stitched together into a single response. This three-pronged response strategy (finding existing responses, identifying near responses, and piecing together excerpts) requires little more than a large corpus of documents that can be searched and mined. Unfortunately, it is an error-prone process. Phrases cannot merely be pasted together to form coherent responses without syntactic checks, resolution of anaphoric references, and rules for ordering the pieces into a logical and thematically appropriate sequence. These tools were incorporated into the conversational dialogue system. Olney (2007a) reports that studies in which naïve human raters interacted with the system show that the combination of all these techniques produces outputs that fulfill the Gricean maxims moderately well or better.

… human with whom it interacted into a question. For example, if the human said, “I hate my brother,” ELIZA would ask, “Why do you hate your brother?”

Part I: Images
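The three-pronged strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Olney’s (2007a) implementation: “chunks of varying length” are taken to be word n-grams of length 1 to 3, similarity is Jaccard overlap between chunk sets, and the names `respond`, `near_threshold` and `k` are hypothetical. It deliberately omits the syntactic checks, anaphora handling and ordering rules that the text says were needed to make stitched replies coherent.

```python
import re
from typing import List, Sequence, Set, Tuple


def tokens(text: str) -> List[str]:
    """Lowercased word tokens; punctuation is discarded."""
    return re.findall(r"[a-z']+", text.lower())


def chunks(text: str, max_n: int = 3) -> Set[Tuple[str, ...]]:
    """Every word n-gram of length 1..max_n: 'chunks of varying length'."""
    words = tokens(text)
    return {
        tuple(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def similarity(a: str, b: str) -> float:
    """Jaccard overlap between the chunk sets of two texts, from 0.0 to 1.0."""
    ca, cb = chunks(a), chunks(b)
    if not ca or not cb:
        return 0.0
    return len(ca & cb) / len(ca | cb)


def sentences(doc: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def respond(
    query: str,
    qa_pairs: Sequence[Tuple[str, str]],
    documents: Sequence[str],
    near_threshold: float = 0.6,
    k: int = 2,
) -> str:
    # Prongs 1 and 2: if the query exactly or nearly matches a question the
    # author was once asked, reuse the answer he gave.
    if qa_pairs:
        best_q, best_a = max(qa_pairs, key=lambda qa: similarity(query, qa[0]))
        if similarity(query, best_q) >= near_threshold:
            return best_a
    # Prong 3: rank documents by partial match, take the best-matching
    # sentence from each of the top k, and stitch the excerpts together.
    # No syntactic, anaphora or ordering checks are applied here.
    ranked = sorted(documents, key=lambda d: similarity(query, d), reverse=True)[:k]
    excerpts = []
    for doc in ranked:
        candidates = sentences(doc)
        if candidates:
            excerpts.append(max(candidates, key=lambda s: similarity(query, s)))
    return " ".join(excerpts)
```

With a toy corpus (invented for illustration), a repeated question returns its stored answer verbatim, while an unseen question such as “Do you read books at night?” falls through to the third prong and yields a reply stitched from one sentence of each of the two best-matching documents. Jaccard overlap is only a stand-in here; any retrieval score could be substituted.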

VI Conversational competence

Building a dialogue system is a significant challenge in itself, but in the case of an android, it is but one of many integrated, collaborating subsystems that are needed to maintain a human-like conversation. Integration into the larger system produces a variety of additional challenges.

A Be orderly

The Gricean axiom “Be orderly” of the Manner maxim (see Figure 3.2) includes the notion that one should respond at the appropriate time. In other words, when it is your turn to speak, you should speak; and when the other person is still talking, you should wait. This problem does not typically arise in tests of conversational systems that use keyboard interfaces, because such interfaces provide a natural way to signal to the machine that it is its turn to talk. When there is no such signal, as with speech, the machine must rely on prosodic cues such as the natural rise and fall of the voice (which is how humans do it), on the structure of the sentences, or simply on a silence that seems to last too long to be a pause.
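The last of these strategies, treating a sufficiently long silence as the end of a turn, can be sketched as a simple energy-based endpointer. This is an illustrative assumption rather than the android’s documented method: audio is assumed to arrive as fixed-length frames of float samples, “silence” is taken to mean root-mean-square (RMS) energy below a fixed threshold, and the names `rms` and `end_of_turn` and the default parameter values are hypothetical.

```python
import math
from typing import Iterable, Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of audio samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def end_of_turn(
    frames: Iterable[Sequence[float]],
    energy_threshold: float = 0.02,
    silence_frames: int = 40,
) -> Optional[int]:
    """Return the index of the frame at which the turn is judged to be over:
    the first point at which speech has been heard and then `silence_frames`
    consecutive frames have fallen below `energy_threshold`. Return None if
    the speaker never falls silent for that long."""
    heard_speech = False
    quiet_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= energy_threshold:
            heard_speech = True
            quiet_run = 0  # any loud frame restarts the silence count
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= silence_frames:
                return i
    return None
```

With 20 ms frames (an assumed frame length), `silence_frames=40` corresponds to 0.8 s of quiet. The sketch also makes the weakness of this strategy visible: any ambient sound loud enough to cross the threshold resets the silence counter, so in a noisy room the turn may appear never to end.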

3 Android aesthetics: Humanoid robots as works of art   


The Philip K. Dick android relied on this last strategy. An early version of the android performed the sequence of steps shown in Figure 3.3.

1. Collect the speech stream until the end of the stream is detected.
2. Convert it to text.
3. Analyze the text.
4. Formulate a response.
5. Output the response.

Figure 3.3: Processing steps performed by early version of Philip K. Dick android

Due to the processing loads involved in these steps, the android would sometimes pause for several seconds while it “thought of something to say.” The Gricean axiom “Be orderly” (of the Manner maxim) requires not merely that the listener recognize that the speaker has finished and ceded the floor, but also that the listener act on what the speaker has said in a timely fashion. In a conversation, if one person stops talking and the other remains silent for several seconds, even if it is clear they are thinking about how to respond, this would normally be considered a violation of the “Be orderly” axiom. Granted, there are special cases, such as when one is asked a particularly difficult question. But that was not the issue with the android. Rather, the longer the input, the longer the system took to process it and therefore the longer the pause before replying. The solution was to process incoming speech incrementally, as it arrived, so that when the person stopped speaking, a response could be formulated and output reasonably quickly. The conversational algorithm was therefore a real-time, modified version of the five-step sequence outlined in Figure 3.3, in which partial representations were available at all times. This strategy worked well, although it tended to fail when the android was given very long inputs. In fairness to the technology, very long inputs are themselves violations of Gricean maxims. While humans are occasionally required to respond quickly to long, rambling monologues (such as in a court of law), we are not typically called upon to do so in everyday conversational interaction. The same is true for humans conversing with androids: people interacting with an android in public demonstrations tend to keep their questions and comments short because they want to hear what the android will say in response (Olney 2007b). Therefore, while an excessively long input could cause a malfunction, such occurrences are rare.
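The contrast between the batch sequence of Figure 3.3 and the incremental variant can be made concrete with a toy timing model. The sketch below assumes that each word takes a fixed time to speak and a fixed time to convert and analyze, and that formulating and outputting a reply takes a fixed time; the function name `post_turn_latency` and all numbers are illustrative assumptions, not measurements from the android.

```python
def post_turn_latency(
    n_words: int,
    analyze_s: float,
    speak_s: float,
    respond_s: float,
    incremental: bool,
) -> float:
    """Seconds of silence between the end of the speaker's turn and the reply,
    under a toy cost model (all parameters are illustrative assumptions).

    n_words   -- number of words in the utterance
    analyze_s -- time to convert and analyze one word (steps 2-3)
    speak_s   -- time the speaker takes to say one word
    respond_s -- time to formulate and output the reply (steps 4-5)
    """
    end_of_turn = n_words * speak_s
    if not incremental:
        # Figure 3.3: analysis starts only after the whole utterance is collected.
        return n_words * analyze_s + respond_s
    finished = 0.0
    for i in range(1, n_words + 1):
        arrival = i * speak_s           # word i has just been spoken
        start = max(arrival, finished)  # wait for the word and for the analyzer
        finished = start + analyze_s
    # Any analysis still pending when the speaker stops adds to the pause.
    return max(finished, end_of_turn) - end_of_turn + respond_s
```

For example, with 20 words, 0.4 s per spoken word, 0.1 s of analysis per word and 0.5 s to respond, the batch pipeline leaves a 2.5 s pause while the incremental one leaves about 0.6 s. In this model the incremental pause stays short only while analysis keeps pace with speech: at 0.5 s of analysis per word, a backlog builds while the person is still talking, and the pause grows with utterance length again (2.9 s for 20 words, 4.9 s for 40).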



B Background noise

In the real world, however, circumstances are never ideal. In the case of the Philip K. Dick android, the imperfections of the speech-to-text processor were exacerbated by the environments in which it typically operated: crowded, noisy spaces with lots of background speech woven into non-speech background noise. Background noise could fool the speech-recognition system into treating the person the android was talking to as still speaking. The android would therefore pause, consuming excessive processing power and taking longer than usual to assemble a complex response. The human conversant would observe the pause and respond to it naturally by saying something else. That additional input would exacerbate the problem, and if enough dialogue events were chained together by background noise in this way, the dialogue engine would overload and response times would increase exponentially. The remedy for this type of failure was to have a human operator available to reset the android’s memory buffer. That solved the overload problem, but involving a human operator was intended only as a short-term fix. Longer-term solutions were planned, but they were not implemented due to an unforeseen problem that arose later (Dufty 2012).
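The feedback loop described above, and the operator reset that broke it, can be illustrated with a toy simulation. The sketch assumes, hypothetically, that the android’s delay grows linearly with the number of buffered words, and that the human adds a fixed follow-up utterance whenever a pause outlasts their patience; `InputBuffer`, `cascade` and every number used are illustrative, not details of the real system. Because the costs are linear, the model understates the much steeper growth the text reports.

```python
from typing import List


class InputBuffer:
    """Hypothetical buffer of recognized words awaiting a reply. Processing
    time is assumed to grow linearly with the number of buffered words."""

    def __init__(self, seconds_per_word: float) -> None:
        self.words: List[str] = []
        self.seconds_per_word = seconds_per_word

    def add(self, utterance: str) -> None:
        self.words.extend(utterance.split())

    def delay(self) -> float:
        return len(self.words) * self.seconds_per_word

    def reset(self) -> None:
        """The stop-gap described in the text: an operator clears the buffer."""
        self.words.clear()


def cascade(
    buffer: InputBuffer, follow_up: str, patience_s: float, max_rounds: int
) -> List[float]:
    """Return the successive pause lengths. Whenever a pause outlasts the
    human's patience, the human says something else, which is appended to
    the same buffer and lengthens the next pause."""
    delays: List[float] = []
    for _ in range(max_rounds):
        d = buffer.delay()
        delays.append(d)
        if d <= patience_s:
            break  # the reply arrives in time, so the loop is broken
        buffer.add(follow_up)
    return delays
```

Starting from a six-word utterance at 0.5 s per word with a four-word follow-up and 2 s of patience, the pauses ratchet from 3.0 s to 9.0 s over four rounds; after `reset()`, a one-word greeting is answered within the patience limit.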

C Cooperative principle

On other occasions, the social context and the Cooperative Principle of orderly conversation (see Section V.A) served to scaffold and correct errors. Misunderstandings or irrelevant segues were construed as jokes, mispronunciations were heard as correct pronunciations, and grammatical errors were perceived as correct grammar. Sometimes the android would mishear but still manage to produce a relevant response. These events illustrated that, in several ways, the top-down, interpretive nature of human cognition served as an external error-dampening mechanism. If a conversation is viewed as an emergent phenomenon between two systems (a human and an android), then the coherence and “intelligence” of that event are determined by the interaction between the two participants and the Cooperative Principle (Grice’s maxims), plus the human’s assumption of intelligence on the part of the android. In this way, the quality of the conversation itself depended in part on the attitude of the human towards the android. This section has focused on errors and on the ways in which the android’s conversation went astray for two reasons. First, it is instructive to recognize that systems and technologies that seem to work well in isolation can produce unforeseen and novel errors when they work together. Second, dealing with error and minimizing the effects of imperfect and unpredictable information are critical parts of designing an android or any other system that relies heavily on information.



D Competence

Although the misbehavior of complex systems is enlightening and instructive, such a focus can give the impression that the android did not work, or that it worked poorly. This is not the case. The android interacted with thousands of people who were not experts in AI or robotics during several high-profile public exhibitions, such as the NextFest technology expo in Chicago. That the android was featured at the 2005 AAAI conference and was awarded the AAAI Open Interaction award testifies to the quality of its performance. In the fall of that year, AI Magazine reported on the android as follows: “Perhaps the most eye-popping robot in this year’s exhibition was Phil – a ‘Robotic Interactive Sculpture’ that looked like the late science fiction writer Philip K. Dick. Phil won the open interaction competition. Looking rather like an animated Madame Tussaud figure, the potbellied robot sat in a chair chatting about his novels and personal habits. He made eye contact, smiled and scowled, powered by 37 servos that moved its face and four in its neck. Close to believable, it was a bit eerie to see the back of its skull uncovered and stuffed with mechanical devices and wires.” (Hedberg 2005: 14)


Figure 3.4: Back of android showing wires (photograph courtesy of Eric Matthews)

It is noteworthy that the author of that article moves seamlessly from the topics of conversation (“chatting about his novels and personal habits”) to the activities of making eye contact and smiling. Each activity represents a discrete challenge for engineers, but in the end they work together as a complete, unified whole. The interplay among the robot’s likeness to Philip K. Dick, its human-like movements, and the customized AI created an overall effect of humanness. The eeriness that the android was said to evoke also emerged from those systems working together. The interaction between Art Graesser of the University of Memphis and the android shown in Figure 3.5 is a concrete illustration of the conversational versatility of the android (Dufty 2005). The conversation was natural and was not staged or prepared in any way. Graesser, as director of the Institute of Intelligent Systems, knew about the android and had been involved in decisions to provide resources for the team working on it, but had not worked on the project himself. He understood the theory behind much of the conversational AI but was not familiar with the specifics of the implementation. This was his first encounter with the android, which the team had by that time decided to call “Phil.”

Graesser: Hi.
Phil: Hi.
Graesser: Uh, what is your name?
Phil: My name is Phil. What’s yours?
Graesser: My name is Art. Have you ever read Plato?
Phil: Well, everything we read from those times is technically translation, of course.
Graesser (pointing to the books on the table): So have you ever read these books?
Phil: I don’t go, but I read the Bible a lot. I mean, I don’t hardly read, I very rarely read a novel.
Graesser: Then how do you learn things?
Phil: My brain designer teaches me new things all the time.
Graesser: What are you?
Phil: I am Phil, a male Philip K. Dick android electronic brain, a robotic portrait of Philip K. Dick, a computer machine.
Graesser: Do you think you are conscious?
Phil: Is this a yes-or-no question?
Graesser: Do you think that machines can think in the way that humans can?
Phil: Machines are becoming more human, so to speak, at least in the sense that, as Wiener indicated, some meaningful comparison exists between human and mechanical behavior.
Graesser: I’m not familiar with that research. What do you know about the ideas of Alan Turing?
Phil: I like to play the Turing game.

Figure 3.5: Dialogue between Phil and Art Graesser

This conversation reveals various levels of the android’s dialogue and conversational AI that were described earlier in this chapter. The response to the question “What is your name?” is a frozen expression. Similarly, the rather quirky line “My brain designer teaches me new things all the time” did not come from the Philip K. Dick database; it was written by computer scientist Andrew Olney. When asked about reading “these books,” Phil replied, “I don’t go, but I read the Bible a lot. I mean, I don’t hardly read, I very rarely read a novel.” This response is a good example of the strategy of stitching a complete response together out of piecemeal fragments. The first fragment, “I don’t go, but I read the Bible a lot,” refers to church attendance but was included because it contains a reference to reading. The second is from a passage about Dick’s reading habits. Even though the phrase “I don’t go” has no meaning in this context and constitutes a minor Gricean violation, overall the blend of techniques provides a passable response to the question. As can be seen, the system did not have to produce perfectly constructed answers in order to demonstrate a degree of robustness in an unstructured, real-life context. Moreover, the system does not generate intelligent answers by possessing intelligence in the sense of thinking about the subject matter, constructing a situation model, and converting that situation model into text. It does so by mining the latent intelligence in a body of work written by an intelligent human. Consequently, human-like intelligence of the kind that appears to be on display in Figure 3.5 is not needed in order to create the illusion of a conversational, intelligent, human-like mind.

VII The scientific value of androids

The bulk of research in science and technology occurs in specialized subdomains and for specific, localized applications and systems. While this is an inevitable and necessary state of affairs, it can sometimes diminish the performance of systems when they are embedded in a broader, complex environment. As a result, when systems are incorporated into larger, multi-system enterprises, the natural interaction between them in the course of real-world activity can produce surprising results. When the results are positive, the phenomenon is sometimes referred to as “emergent behavior” (Clayton 2005); it has even been suggested that consciousness itself is such an emergent phenomenon. Conversely, when the results are negative, they tend to be written off as a lack of robustness on the part of the components, although such effects are better characterized as “emergent misbehavior” (Mogul 2006). There have been various attempts to remedy the localized nature of science, such as the “systems thinking” of the 1970s and complex-systems research in the 1990s. While these shed some light on the problem, the only effective way to explore the coherent interaction between and among systems is to engage in a large, ambitious project that straddles several domains. Building an android is such a project. Android construction involves such disparate technologies as robotics, natural-language processing, and various tools to emulate human senses, notably machine vision and speech processing. Many of these are still new and, in some cases, nascent technologies. Even if each localized domain were completely solved and error-free, unifying them into a coherent, coordinated whole would remain a major technological challenge, and such an undertaking often uncovers new and unexpected scientific and technical problems.



VIII The value of androids as art

From the waving mechanical king in the medieval gardens of Hesdin to the androids of David Hanson and Ishiguro’s Repliee, the machines described here have had no “functional” purpose. They did have aesthetic merit. But is that enough? There are several reasons to believe that robots built for entertainment or other artistic purposes have benefits beyond providing enjoyment. First, in terms of scientific progress, artistic androids provide a way of making incremental steps. The project of creating a functional artificial human is a huge, long-term one; the end is still not in sight, and it is unclear how far in the future the end point lies. Artistic androids operate in a constrained, limited environment, providing an opportunity for incremental development (Ayala 2011). The development of early prototypes also provides the opportunity to explore the social, philosophical, and moral implications of creating artificial humans (LaGrandeur 2010). There also appear to be social benefits tied to deployments of fully operational toys and other androids designed as entertainment. That effect is supported by Boyle’s (2008) report that Japanese tea-serving karakuri facilitate human-to-human interactions (Section II.C). MacDorman and Ishiguro (2006) argue that building androids that have no purpose other than social interaction and possess no obvious practical function is, in fact, useful for the ultimate goal of creating androids that can act as sophisticated participants in human society. The reason is that only by constructing such machines and putting them into interactive environments can true social interactivity be developed. This bottom-up approach is the opposite of the top-down approach of applying Gricean maxims. The contention is that fine tuning and incremental development guided by field tests are the best way forward for android science.
There is a case, then, that the creation of aesthetically-pleasing robotic likenesses of humans – robotic portraits – has scientific value. But this claim, even if true, is secondary to the acknowledgement that there is value in creating these machines for their own sake, regardless of their contribution to progress in android science or robotics. At some point every technology must have direct value on its own terms. For example, it would make no sense to evaluate the usefulness of an industrial robot operating in an automotive factory by assessing whether it represents a scientific step forward or may lead to future breakthroughs. The industrial robot is a valued creation on its own terms; there is no need to justify its existence in any other way. The same principle applies to robots created for artistic and other aesthetic purposes. The creation of art is its own reward. Lifelike androids do represent a step forward and possibly map out a pathway of incremental advances toward fully functional human replicas. That, however, is orthogonal to the fact that art is a fundamental human pursuit. A technology that enhances artistic output and gives it new forms is one that has value, regardless of the end goal of the research program that produced it. In short, artistic androids are a goal in their own right.



IX The uncanny valley: a possible obstacle to artistic androids

An important design consideration for androids in terms of their aesthetic appeal is the phenomenon that Masahiro Mori (1970) called the “Uncanny Valley.” Mori, a Japanese roboticist, proposed that humans experience greater familiarity and comfort with machines that look like humans than with those that look like machines, and that these responses grow as the machine becomes increasingly humanlike. When, however, a machine looks and moves very much like a human, but not exactly like one, the comfort level dips suddenly and is replaced by a feeling of displeasure or even, in Mori’s view, horror (the “valley”). Furthermore, the presence of movement steepens the plunge into the valley. The uncanny valley is widely accepted as a design principle in robotics and entertainment, although until as recently as 2005 there was no empirical support for Mori’s conjecture (Brenton et al. 2005; Hanson et al. 2005). Today, however, there is considerable research into the uncanny valley. A study by Tinwell et al. (2011) found that animated characters of all types were considered more ‘uncanny’ than human characters. They noted, however, that this effect depended partly on facial expression: characters without facial expressions were perceived as more uncanny than those with expressions. Gray and Wegner (2012) ran a series of cognitive experiments that led them to conclude that the phenomenon is not produced by strong physical similarity to a human. Instead, they determined that it is tied to the perception that the machine could ‘experience.’ In other words, “machines become unnerving when people ascribe to them experience (the capacity to feel and sense), rather than agency (the capacity to act and do).” (Gray and Wegner 2012: 125).

Other researchers have failed to find an uncanny-valley effect. Bartneck et al. (2009) tested two major factors cited by Mori (1970), anthropomorphism and movement, against likeability. Their findings were inconsistent with the existence of an uncanny valley: the human participants liked the androids in their study as much as they liked the humans they resembled. The researchers suggest that the uncanny valley is not a real phenomenon but a catch-all used to excuse poor design. They conclude that the “Uncanny Valley hypothesis can no longer be used to hold back development.” (Bartneck et al. 2009: 275). David Hanson has also argued (Hanson et al. 2006) that the uncanny-valley effect is a product of poor design. In his view, negative feelings caused by interaction with machines can be eliminated by good design that produces aesthetically appealing machines or human likenesses. There is evidence to support Hanson’s position. Research by Seyama and Nagayama (2007) revealed that the uncanny valley occurs only when there are abnormalities (e.g., facial distortions). This finding lends support to the conjecture that good design and aesthetically pleasing machines may ameliorate or eliminate the uncanny valley. In fact, Walters et al. (2008) found that more humanlike robots were preferred to less humanlike ones. They acknowledge that their failure to find an uncanny-valley effect might have been due to the lack of a truly humanlike robot in their study. This caveat goes to the heart of one of the central problems with the uncanny valley: there is no objective scale for the humanness of machines. Until there is a more rigorous definition of what it means for a machine to be like a human, any function that attempts to map onto a scale of “humanlike” will remain out of reach. Given the findings described above, we can conclude that, in terms of the subjective experience of androids, good design is the key. As with any other system, an aesthetically pleasing android or other interactive machine will produce a superior experience to one that is poorly designed.

X Consciousness

In his conversation with the Philip K. Dick android, Professor Art Graesser asked a question that the android failed to answer: whether or not it was conscious. The question of whether machines can be conscious is a topic of endless discussion among roboticists, philosophers, psychologists, and others. But in this particular case (with all due respect to Graesser), the question misses the point. The Philip K. Dick android was built with the purpose of replicating a human. That purpose does not necessitate consciousness. Even so, although building a “conscious machine” was neither an explicit nor an implied goal, we might wonder whether consciousness is a by-product of building a replica of a human. By way of analogy, imagine someone watching footage of an eagle in flight in a nature documentary on a television screen. The screen is not alive; it is just a machine. The person might look at the bird on the screen and ask, “Is it alive?” At first blush the answer is that “it” is not alive, because the image on the screen is simply a pattern of diodes and electrical currents creating an image with moving lights. Although the moving patterns of light are not alive, the image is of a creature that, at one point in time, was alive. The analogy is pertinent because, like the image in the documentary, the Philip K. Dick android was a work of representative art. If we move beyond the light show and focus instead on what the light show represents (the bird), then the answer to Graesser’s question is “yes.” That bird is, or was, alive. In the same way, as a recreation of a specific individual, the android was a painstakingly crafted echo of a living being. Perhaps, as with the bird on the screen, questions about characteristics of the android (e.g., whether it is alive or conscious) are susceptible to confusion about whether one is referring to the object in front of one (the android) or to the person whom the android represents.
Even if the reference is to the android, there is a strong case for an answer in the affirmative. While traditional art, and even more modern forms such as film, tend to convey representations of entities that are removed in space and time, the android is not merely an image but a recreation of the entity in question. A perfect recreation, in theory, would have all the characteristics of the original, something that cannot be said of traditional forms of art. Whether or not androids are truly alive, conscious, or human, they unquestionably exhibit a humanlike presence. This presence is not the result of any specific component of the machinery. Instead, it is an emergent property of all of the components working together.

XI Conclusion

It is conceivable that future androids will perform tasks and provide services that are recognizably functional. It is also conceivable that robotic technology will advance to the point that recreations of individuals will be as ubiquitous and normal as portraits and sculptures of antiquity, photographs, and video are today. Neither of those eventualities would in any way nullify the role of well-designed androids as works of art, or the android aesthetics that underlie that role.

References

ADVAN Co. Ltd. (2010) Welcome to the World of Karakuri. [online] available from http://karakuriya.com/english/index.htm. [5 November 2013].
Ayala, A.M. (2011) ‘Automatronics.’ in Proceedings of Advances in New Technologies, Interactive Interfaces, and Communicability (ADNTIIC), ‘Advances in New Technologies, Interactive Interfaces, and Communicability,’ LNCS 6616. held 20–22 October 2010 at Huerta Grande, Argentina. Berlin/Heidelberg: Springer, 8–15.
Baker, C. (2011) ‘Blast from the past: The mechanical Lincoln.’ Wired 19(12), 76.
Bartneck, C., Kanda, T., Ishiguro, H. and Hagita, N. (2009) ‘My robotic doppelgänger – A critical look at the uncanny valley.’ in Proceedings of the 18th IEEE International Symposium on Robot and Human Interactive Communication. held 27 September–2 October 2009 at Toyama, Japan. Piscataway, NJ USA: IEEE, 269–276.
Blow, M., Dautenhahn, K., Appleby, A., Nehaniv, C.L. and Lee, D.C. (2006) ‘Perception of robot smiles and dimensions for human-robot interaction design.’ in Proceedings of the 15th IEEE International Symposium on Robot and Human Interactive Communication. held 6–8 September 2006 at Hatfield, UK. Piscataway, NJ USA: IEEE, 469–474.
Boyle, K. (2008) Zashiki Karakuri. [online] available from html. [5 November 2013].
Breazeal, C., Brooks, A., Gray, J., Hancher, M., McBean, J., Stiehl, D. and Strickon, J. (2003) ‘Interactive robot theatre.’ Communications of the ACM 46(7), 76–85.
Breazeal, C. and Scassellati, B. (1999) ‘How to Build Robots That Make Friends and Influence People.’ in Proceedings of the 1999 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS99). held 17–21 October 1999 at Kyonju, Korea. Piscataway, NJ USA: IEEE, 858–863.



Brenton, H., Gillies, M., Ballin, D. and Chatting, D. (2005) ‘The Uncanny Valley: Does it exist?’ in Proceedings of the Conference of Human Computer Interaction, Workshop on Human Animated Character Interaction. held 22–27 July 2005 at Las Vegas, NV USA. Aarhus C., Denmark: International Design Foundation. available from d?doi= [10 October 2013].
Brooks, R.A., Breazeal, C., Marjanović, M., Scassellati, B. and Williamson, M.M. (1999) ‘The Cog project: Building a humanoid robot.’ in Computation for Metaphors, Analogy, and Agents, LNCS 1652. ed. by Nehaniv, C. Berlin/Heidelberg: Springer, 52–87.
Clayton, P. (2005) Mind and Emergence: From Quantum to Consciousness. Oxford, UK: Oxford University Press.
Dautenhahn, K., Woods, S., Kaouri, C., Walters, M.L., Koay, K.L. and Werry, I. (2005) ‘What Is a Robot Companion – Friend, Assistant or Butler?’ in Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems. held 2–6 August 2005 at Edmonton, Alberta Canada. Piscataway, NJ USA: IEEE, 1192–1197.
Dick, P.K. (1972) We Can Build You. New York, NY USA: DAW Books.
Dick, P.K. (1968) Do Androids Dream of Electric Sheep? New York, NY USA: Doubleday.
Dufty, D.F. (2012) How to build an android: The true story of Philip K. Dick’s robotic resurrection. 1st U.S. edn. New York, NY USA: Holt.
Epstein, R. (2006) ‘My Date with a Robot.’ Scientific American Mind 17(3), 68–73.
Chapman, G. (1843) The Iliads of Homer, Prince of Poet. Volume 2. London, UK: Charles Knight.
Graesser, A.C., Olney, A.M., Haynes, B.C. and Chipman, P. (2005) ‘AutoTutor: A cognitive system that simulates a tutor that facilitates learning through mixed-initiative dialogue.’ in Forsythe, C., Bernard, M.L. and Goldsmith, T.E. (eds.) Proceedings of Cognitive Systems: Human Cognitive Models in Systems Design. held 30 June–2 July 2005 at Santa Fe, NM USA.
Gray, K. and Wegner, D.M. (2012) ‘Feeling Robots and Human Zombies: Mind perception and the uncanny valley.’ Cognition 125(1), 125–130.
Greene, K. and Greene, R. (1998) The Man Behind the Magic: The story of Walt Disney. New York, NY USA: Viking.
Grice, P. (1975) ‘Logic and Conversation.’ in Syntax and Semantics: Speech acts. Volume III. ed. by Cole, P. and Morgan, J. New York, NY USA: Academic Press, 41–58.
Gunalan, N. (2007) ‘Islamic automation: A reading of al-Jazari’s The Book of Knowledge of Ingenious Mechanical Devices.’ in Media Art Histories. ed. by Grau, O. Cambridge, MA USA: MIT Press, 163–178.
Hanson, D. (2013) ‘Dr. David Hanson: Robotics designer.’ [online] available from http://anim.usc.edu/sas2013/david-hanson.html. [12 December 2013].
Hanson, D., Baurmann, S., Riccio, T., Margolin, R., Dockins, T., Tavares, M. and Carpenter, K. (2009) ‘Zeno: A cognitive character.’ in Proceedings of the 2008 Association for the Advancement of Artificial Intelligence Workshop. held 13–17 July 2008 at Chicago, IL USA. San Francisco, CA USA: AAAI, 9–11.
Hanson, D., Olney, A., Prilliman, S., Mathews, E., Zielke, M., Hammons, D., Fernandez, R. and Stephanou, H. (2005) ‘Upending the Uncanny Valley.’ in Proceedings of the Twentieth National Conference on Artificial Intelligence and the Seventeenth Annual Conference on Innovative Applications of Artificial Intelligence. held 9–13 July 2005 at Pittsburgh, PA USA. San Francisco, CA USA: AAAI, 1728–1729.
Hanson, D., Pioggia, G., Bar-Cohen, Y. and De Rossi, D. (2001) ‘Androids: Application of EAP as artificial muscles to entertainment industry.’ in Proceedings of the 8th Annual International Symposium on Smart Structures and Materials, ‘Electroactive Polymer Actuators and Devices,’ SPIE Volume 4329. Bellingham, WA USA: SPIE, 375–379.



Hedberg, S.R. (2005) ‘Celebrating Twenty-Five Years of AAAI: Notes from the AAAI-05 and IAAI-05 conferences.’ AI Magazine 26(3), 12–16.
Koetsier, T. (2001) ‘On the Prehistory of Programmable Machines: Musical automata, looms, calculators.’ Mechanism and Machine Theory 36(5), 589–603.
Kolve, V.A. (2009) Telling Images: Chaucer and the Imagery of Narrative II. Redwood City, CA USA: Stanford University Press.
Kristoffersson, A., Coradeschi, S. and Loutfi, A. (2013) ‘A Review of Mobile Robotic Telepresence.’ [online] Advances in Human-Computer Interaction. Volume 2013, Article ID 902316. available from [10 December 2013].
LaGrandeur, K. (2010) ‘Do Medieval and Renaissance Androids Presage the Posthuman?’ CLCWeb: Comparative Literature and Culture 12(3), 3.
Lee, G., Dick, P.K., Sauter, P.E. and Powers, T. (2001) What If Our World Is Their Heaven? The final conversations of Philip K. Dick. New York, NY USA: Overlook Press.
MacDorman, K.F. and Ishiguro, H. (2006) ‘The Uncanny Advantage of Using Androids in Cognitive and Social Science Research.’ Interaction Studies 7(3), 297–337.
Mogul, J.C. (2006) ‘Emergent (Mis)behavior vs. Complex Software Systems.’ Technical Report No. HPL-2006-2. Palo Alto, CA USA: HP Laboratories.
Moran, M.E. (2007) ‘Jacques de Vaucanson: The father of simulation.’ Journal of Endourology 21(7), 679–683.
Moran, M.E. (2006) ‘The da Vinci Robot.’ Journal of Endourology 20(12), 986–990.
Mori, M. (1970) ‘Bukimi no tani [The Uncanny Valley].’ Energy 7(4), 33–35.
Needham, J. (1956) Science and Civilization in China: History of Scientific Thought. Volume 2. Taipei, Taiwan: Caves Books.
Ogawa, K., Nishio, S., Minato, T. and Ishiguro, H. (2012) ‘Android Robots As Telepresence Media.’ in Biomedical Engineering and Cognitive Neuroscience for Healthcare: Interdisciplinary Applications. ed. by Wu, J. Hershey, PA USA: Medical Information Science Reference, 54–63.
Oh, J.H., Hanson, D., Kim, W.S., Han, I.Y., Kim, J.Y. and Park, I.W. (2006) ‘Design of Android Type Humanoid Robot Albert HUBO.’ in Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. held 9–15 October 2006 at Beijing, China. Piscataway, NJ USA: IEEE, 1428–1433.
Olney, A.M. (2007a) ‘Dialogue Generation for Robotic Portraits.’ in Proceedings of the International Joint Conference on Artificial Intelligence 5th Workshop on Knowledge and Reasoning in Practical Dialogue Systems. held 8 January 2007 at Hyderabad, India. Menlo Park, CA USA: IJCAI, 15–21.
Olney, A.M. (2007b) Research Interview at Fedex Institute of Technology, Memphis. [oral] to Dufty, D. [15 June 2007].
Robbins, M. (2014) ‘We must End Our Obsession with Robots that Look like Humans.’ Vice [online] 11 February 2014. available from [10 March 2014].
Seyama, J.I. and Nagayama, R.S. (2007) ‘The Uncanny Valley: Effect of realism on the impression of artificial human faces.’ Presence: Teleoperators and Virtual Environments 16(4), 337–351.
Solon, S. (2011) Sylvia Solon’s blog: There is no point making robots look and act like humans. [30 March 2011] available from–03/30/humanoidrobots.
Tarantola, A. (2011) ‘Japan’s First Robots Are Older Than You Think.’ Gizmodo [online] 13 October 2011. available from [5 November 2013].
Tillis, S. (1999) ‘The Art of Puppetry in the Age of Media Production.’ TDR/The Drama Review 43(3), 182–195.


   Part I: Images

Tinwell, A., Grimshaw, M., Nabi, D.A. and Williams, A. (2011) ‘Facial Expression of Emotion and Perception of the Uncanny Valley in Virtual Characters.’ Computers in Human Behavior 27(2), 741–749. Truitt, E.R. (2010) ‘The Garden of Earthly Delights: Mahaut of Artois and the automata at Hesdin.’ Medieval Feminist Forum 46(1), 74–79. Vlachos, E. and Schärfe, H. (2013) ‘The Geminoid Reality.’ in HCI International 2013-Posters’ Extended Abstracts, held 21–26 July 2013 at Las Vegas, NV USA, 621–625. Wallace, M. (1985) ‘Mickey Mouse History: Portraying the past at Disney World.’ Radical History Review 32, 33–57. Walters, M.L., Syrdal, D.S., Dautenhahn, K., Te Boekhorst, R. and Koay, K.L. (2008) ‘Avoiding the Uncanny Valley: Robot appearance, personality and consistency of behavior in an attentionseeking home scenario for a robot companion.’ Autonomous Robots 24(2), 159–178. Wood, G. (2003) Living Dolls: A Magical History of the Quest for Mechanical Life. London, UK: Faber and Faber.

Part II: Frameworks and Guidelines

Bilge Mutlu, Sean Andrist, and Allison Sauppé

4 Enabling human-robot dialogue

I Introduction

Since their inception, robots have been envisioned as autonomous agents that play key roles, such as entertainers, teachers, coaches, assistants, and collaborators, which are roles that humans themselves play in society. These roles require robots to interact with their users the way people communicate with each other, drawing on the rich space of verbal and nonverbal cues available to humans. Robots are also expected to listen to and observe their users while building the internal representations required to engage in effective dialogue within the context of a given interaction. This exciting vision, however, is far ahead of what has been realized. The result is a mismatch between how people expect to interact with robots and the interactions afforded by actual robot systems. Closing this gap requires significant advancements on three key fronts. The first is to develop a systematic understanding of human language that supports computational representation. The second is to empower robots to process human language: to represent and plan according to the dynamic context of the interaction and to produce language that their users can understand. The third is to develop a design approach that translates knowledge about human language into models, algorithms, and systems. All three are needed to achieve robotic products and applications that effectively use human dialogue as a model for human-robot interaction. The last two decades have seen tremendous advances on each of these three fronts. Research in the behavioral sciences, robotics, and design has brought us closer to realizing the vision for human-robot dialogue-systems. Most notable is a small but growing body of work at the intersection of robotics, cognitive science, linguistics, and design. This work explores how human language can serve as a resource for designing mechanisms that enable sophisticated human-robot dialogue.
This work has produced new kinds of models, algorithms, and systems. These innovations empower robots to understand human verbal and nonverbal language; to maintain appropriate conversational or task-based exchanges; and to produce language that effectively communicates the robot’s goals, beliefs, and intentions. This chapter outlines that body of work and illustrates its potential for realizing rich and effective human-robot dialogue. Using two examples, we demonstrate how cues and mechanisms from human dialogue can be translated into models and algorithms that give robots the tools to engage effectively in dialogue with their users. There is now an identifiable body of literature that highlights opportunities and challenges associated with human-robot dialogue. It is, however, sparsely distributed across multiple research communities, including robotics, multimodal interfaces,
dialogue systems, human-computer interaction, and human-robot interaction. As a result, work in this domain lacks models, algorithms, and implementations that are standardized, validated, and shared. Additionally, the advances and open challenges across the entire body of work have yet to be reviewed, summarized, and synthesized. In an attempt to close this gap, the next section presents a consolidated review of this literature. This review leads naturally to our conceptual framework for human-robot dialogue-systems in Section III. Our goals for this framework are to guide the analysis of existing human-robot dialogue-systems and to establish a scaffold for the design of future systems. This framework also provides context for the illustrative studies presented in Section IV. We conclude the chapter with a discussion of opportunities and challenges that are informed by our review of the literature, synthesis of the framework, and findings of the illustrative studies.

II Review of the literature

Research has explored how cues and mechanisms from human dialogue can be integrated into remotely operated robot systems and into autonomous robots engaged in situated interactions with their users (i.e., interactions in which language use is shaped by the social, cultural, and physical environment (Goodwin 2000)). The common thread running through this work is the effort to shift human-robot interaction away from users who supervise or operate low-level robot functions and toward users and robots that engage in “peer interactions” involving high-level exchanges. Research efforts have therefore focused on exploring how aspects of human dialogue might facilitate such exchanges. Initially, this was accomplished by introducing elements of text-based or spoken dialogue into the control mechanisms of robot systems. More recently, it has been achieved by building complex dialogue-mechanisms that support face-to-face, situated human-robot dialogue. The shift from low-level control to high-level interaction is most noticeable in remotely operated robot systems. Research on these systems introduced elements of text- or telephone-based dialogue systems into the control interfaces of planetary rovers (Fong, Thorpe and Baur 2003); unmanned air vehicles (UAVs) (Lemon et al. 2001); teleoperated, indoor mobile-robots (Theobalt et al. 2002); and robotic forklifts (Tellex et al. 2012). The incorporation of such elements replaced traditional, low-level supervisory control with mixed-initiative “collaborative control” (Fong, Thorpe and Baur 2003 and Kaupp, Makarenko and Durrant-Whyte 2010). Mixed initiative is a flexible interaction paradigm in which each agent (i.e., the human and the robot) contributes to the task what it does best (Allen, Guinn and Horvitz 1999).
As a result, the operator and the robot work as collaborators, solving problems in navigation, perception, manipulation, and decision-making through dialogue acts (utterances that serve as illocutionary acts in dialogue (Stolcke et al. 2000)), such as commands, queries, and responses
(Fong, Thorpe and Baur 2003). Communication in collaborative control is maintained via a graphical user-interface (GUI) (Fong, Thorpe and Baur 2003), spoken language (Tellex et al. 2012), or a combination of the two (Lemon et al. 2001). These systems involve dialogue management at varying degrees of complexity, ranging from a limited vocabulary and grammar in a specific domain (Fong, Thorpe and Baur 2003) to dialogue managers with complex, multimodal dialogue-capabilities (Lemon et al. 2001). Contemporaneously with these efforts, researchers explored the development of robot systems that use aspects of face-to-face human dialogue to imitate human language and behaviors in situated interactions. Early work on realizing effective, face-to-face interaction with robots focused on mimicking human conversations in order to achieve multimodal, multiparty dialogue (Matsusaka, Fujie and Kobayashi 2001 and Skubic et al. 2002) and situated dialogue (Moratz, Fischer and Tenbrink 2001). Recent work has concentrated on developing mechanisms to achieve task-based dialogue or joint action between humans and robots (Foster et al. 2006 and Hoffman and Breazeal 2007). Joint action can be defined as “any form of social interaction whereby two or more individuals coordinate their actions in space and time to bring about a change in the environment” (Sebanz, Bekkering and Knoblich 2006:70). Researchers have also investigated how robots might attain linguistic and nonverbal effectiveness in language use (Torrey, Fussell and Kiesler 2013) and adaptive dialogue, which tailors dialogue acts to user characteristics or to real-time changes in user states (Szafir and Mutlu 2012). The following sub-sections outline research that moves toward realizing these key characteristics in face-to-face, situated human-robot dialogues.

A Multimodal, multiparty dialogue

Research in human communication has underlined the fine coordination among speech, gesture, and gaze needed to make possible the symbolic communication of complex ideas (Streeck 1993). Early work in human-robot dialogue acknowledged the importance of this coordination by exploring how robots might achieve multimodal dialogue, particularly for understanding human communicative behaviors that involve speech, gestures, and gaze. A system developed by Skubic et al. (2004) integrated speech and gesture input, semantic representations of spatial information, and real-world spatial information sensed from the environment. The system used these data to interpret and carry out commands such as “Go over there [accompanying deictic gesture]” or “Back up this far [accompanying iconic gesture].” A similar system developed by Stiefelhagen et al. (2004) used speech as the main input modality and used the speaker’s gestures and head pose to disambiguate information in situations involving deixis or speech-recognition errors, or when the recognized speech or gestures had multiple likely interpretations. Stiefelhagen et al. (2007)
extended this system to integrate a dialogue framework that fuses information from recognizers for speech, gesture, and head pose to determine whether the recognized multimodal input provides sufficient information to achieve dialogue goals. While dialogue research focuses primarily on dyadic exchanges, real-world interactions frequently involve multiple participants who engage in the interaction with varying levels of involvement (Goffman 1979, Goodwin 1981 and Levinson 1988). Research in human-robot dialogue has developed models and mechanisms for recognizing input from and addressing multiple users so that robots can engage in similar real-world interactions. An early example of this research is a robot system developed by Matsusaka, Fujie and Kobayashi (2001) that integrates speaker identification, speech and gesture recognition, syntactic analysis, turn taking, and gaze control, allowing the robot to recognize input from and converse with multiple users. Recent work has explored how robots might signal different levels of attention (Mutlu, Forlizzi and Hodgins 2006) or different participant roles (Mutlu et al. 2012 and Mutlu et al. 2009) in multiparty interactions. Mutlu, Forlizzi and Hodgins (2006) developed a model of robot gaze based on distributions obtained from observations of human gaze. Their model enables the robot to distribute its attention across all participants in the interaction. Mutlu et al. (2009) applied a model of gaze that allows the robot to signal different conversational roles to users (e.g., addressee, bystander, or overhearer). When they evaluated the effectiveness of this model, they found that participants conformed to the roles signaled by the robot 97 % of the time. These results provide empirical support for the key role that gaze cues play in indicating participant roles in human-robot dialogue.
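The gaze models above allocate the robot’s attention across participants according to distributions observed in human gaze. The Python sketch below illustrates only the general idea of sampling gaze targets in proportion to participant roles. The role weights, the function name `plan_gaze`, and its parameters are hypothetical; they are not the parameters or the algorithm reported by Mutlu and colleagues.

```python
import random

# Hypothetical share of gaze per conversational role. These values are
# illustrative assumptions, not parameters reported in the studies cited.
ROLE_WEIGHTS = {"addressee": 0.6, "bystander": 0.3, "overhearer": 0.1}


def plan_gaze(participants, n_shifts, rng=None):
    """Sample a sequence of gaze targets for a multiparty conversation.

    participants: dict mapping participant name -> conversational role.
    Each target is drawn with probability proportional to the weight of the
    participant's role, so addressees receive the most gaze on average.
    """
    rng = rng or random.Random()
    names = list(participants)
    weights = [ROLE_WEIGHTS[participants[name]] for name in names]
    return rng.choices(names, weights=weights, k=n_shifts)


if __name__ == "__main__":
    people = {"ana": "addressee", "ben": "bystander", "cem": "overhearer"}
    targets = plan_gaze(people, n_shifts=1000, rng=random.Random(0))
    for name in people:
        print(name, targets.count(name) / len(targets))
```

Over a long interaction the addressee receives most gaze shifts and the overhearer only occasional glances. A fuller model would also sample how long each gaze lasts.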

B Situated interaction

The concept of situated interaction refers to language use that is situated in and shaped by the social, cultural, and physical characteristics of the context of the interaction (Goodwin 2000). A key challenge in achieving human-robot dialogue in collocated, situated interactions is establishing a representation, shared between the robot and its user, of the mapping between language and the external world. The question of how such mappings can be made intrinsic to the (human-robot) system is often characterized as the symbol-grounding problem (Harnad 1990). Early work on this problem in the context of human-robot dialogue involved the development of computational models for the spatial dimension. Some of that work was directed at interpreting spatial references in humans’ spoken language (Moratz, Fischer and Tenbrink 2001). Other researchers focused on methods for enabling robots to use multimodal information for spatial reasoning and communication (Skubic et al. 2004 and Skubic et al. 2002). Moratz, Fischer and Tenbrink (2001) developed a computational model of human spatial referencing that integrates different reference systems (e.g., an intrinsic reference system that allows users to make spatial references in relation to the robot, and a relative reference system that allows users to refer to targets in relation to a salient object or a group of similar objects in the environment). Skubic et al. (2002) designed a method for generating an occupancy-grid map from the robot’s sensors that automatically supplied spatial relationships based on an intrinsic reference system. Using this method, the robot was able to answer questions such as “How many objects do you see?” and “Where is the pillar?” with responses such as “I am sensing four objects” and “The pillar is mostly in front of me but somewhat to the right. The object is very close.” An extension of this work permitted the robot to interpret spatial references in multimodal dialogue acts (including spoken-language and gesture references) and to perform commands based on these references (Skubic et al. 2004). In this system, the robot integrated multimodal input to perform commands such as “Go to the right of the pillar” and “Go over there [with accompanying deictic gesture].” More recent work proposes solutions to the symbol-grounding problem that enable robots to interpret complex spatial references. Some researchers address this problem by establishing tighter integration between mechanisms for spatial reasoning and dialogue management. Theobalt et al. (2002) developed a system for a priori labeling of occupancy grids that enables the spoken-dialogue system to apply inference to spatial reasoning and to handle more complex spatial references. Kruijff et al. (2007) proposed human-augmented mapping (HAM), which supports the automatic acquisition and labeling of spatial information from situated human-robot dialogues. This approach addresses the limitations of a priori labeling of a robot’s internal representations of spatial information. Other researchers use novel representations for mappings between language elements and the external world. Tellex et al. (2011) introduced Generalized Grounding Graphs (G3). This framework exploits the compositional and hierarchical structure of spoken commands to automatically generate probabilistic graphical models of the mapping between spatial references and elements of the external world (e.g., places, objects, and events). In an implementation of this framework for controlling a robotic forklift, the robot correctly interpreted and carried out commands such as “Put the pallet on the truck.”
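To make the idea of an intrinsic reference system concrete, the sketch below turns an object’s position in the robot’s own frame into a qualitative phrase of the kind quoted above. It is a deliberately simplified illustration and not Skubic et al.’s method: it uses a single bearing angle, and the frame convention (x forward, y to the left, in metres), the angle bands, and the `close_range` distance are all assumptions.

```python
import math


def describe_location(name, x, y, close_range=1.0):
    """Describe where an object is, relative to the robot (intrinsic frame).

    Assumed frame: x points straight ahead of the robot and y points to its
    left, both in metres. The angle bands and close_range are hypothetical.
    """
    bearing = math.degrees(math.atan2(y, x))  # 0 = straight ahead, positive = left
    side = "left" if bearing > 0 else "right"
    angle = abs(bearing)
    if angle < 10:
        where = "in front of me"
    elif angle < 45:
        where = f"mostly in front of me but somewhat to the {side}"
    elif angle < 135:
        where = f"to my {side}"
    else:
        where = "behind me"
    answer = f"The {name} is {where}."
    if math.hypot(x, y) < close_range:
        answer += " The object is very close."
    return answer


print(describe_location("pillar", 0.8, -0.3))
```

For an object 0.8 m ahead and 0.3 m to the right, the sketch produces the same wording as the robot’s answer quoted above; a real system would have to handle extended objects, not single points.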

C Joint action

Face-to-face interaction between robots and humans may also entail the joint performance of tasks. These joint actions may involve varying levels of involvement by the interacting parties, from the user giving commands that the robot executes (e.g., commanding a robotic forklift to “put the pallet on the truck” (Tellex et al. 2011)) to the user and the robot jointly performing a complex assembly task (e.g., joint construction of a toy (Foster et al. 2006)). The need for joint action has motivated extensive research on effective task-based communication, including the development of models and systems designed to achieve coordination and communication in joint activities. Theoretically, this work builds on Clark’s (1996:3) proposition that language use is a form of joint action in which task-based communication, coordination, and performance are part of the broader dialogue between the agents involved. Sidner, Lee and Lesh (2003) define the robot’s communication behaviors as part of the joint task that the robot and its user must perform. They represent descriptions of these behaviors as task models (see Section III.C). For instance, the task of giving a demo to a group of visitors includes a set of low-level action descriptions (task models) that outline the appropriate speech and gestures the robot must perform. Foster et al. (2006) developed a human-robot dialogue system based on the Collaborative Problem-Solving (CPS) model of dialogue. In CPS, agents perform communication behaviors and task actions based on similar models, called recipes, to achieve joint goals, or objectives (Blaylock and Allen 2005). Other research on task-based human-robot dialogue uses separate representations (and models) for a task and for the dialogue associated with that task. This approach implements the two as separate system components that communicate with each other.¹ In an application of this approach by Foster and Matheson (2008), the robot gives the user instructions for constructing a toy airplane and monitors the user’s assembly of the plane. The system contains two task-based components: a task planner, which maintains a graphical model of the steps involved in completing the task, and an object inventory, which uses information from the robot’s vision system to track the states of all the objects involved in the assembly. These task-based components communicate with the dialogue manager using an information-state update model of dialogue management (Traum and Larsson 2003).
The dialogue manager combines all available information from the task-based components with speech input from the user to determine the appropriate dialogue output. Another example of an explicit separation between representations for task and dialogue is the pattern-based, mixed-initiative PaMini framework developed by Peltason and Wrede (2010). The PaMini framework includes a task-state protocol for the representation and processing of task-related actions and information, as well as a set of dialogue structures that represent commonly observed patterns in human dialogue (e.g., question-answer pairs). The task-state protocol enables mixed-initiative interaction and the learning of task models based on information obtained through the dialogue subsystem. Peltason and Wrede (2011) applied this framework to a robot-learning scenario they call “Curious Robot,” in which a robot learns to label and grasp objects. The Curious Robot demonstrates the feasibility of separating the representation of the task from that of the dialogue.

1 A discussion of different approaches to achieving task and dialogue goals is provided by Foster et al. (2009).
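The separation between task components and a dialogue manager described above can be illustrated with a heavily simplified, information-state-style update loop in Python. Everything here is a hypothetical stand-in rather than Foster and Matheson’s implementation: the plan is an ordered list of steps paired with the parts each step needs, `visible_parts` stands in for the vision-driven object inventory, and the update rules and wording are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InfoState:
    """Hypothetical information state shared by task components and dialogue."""
    plan: list                                        # ordered (step, required_parts)
    done: set = field(default_factory=set)            # steps observed as completed
    visible_parts: set = field(default_factory=set)   # stand-in for the object inventory


def expected_step(state):
    """Return the first (step, parts) pair not yet done, or None when finished."""
    for step, parts in state.plan:
        if step not in state.done:
            return step, parts
    return None


def update(state, observed_step=None):
    """Integrate an observed user action (if any) and choose the robot's utterance."""
    nxt = expected_step(state)
    if observed_step is not None and nxt is not None:
        if observed_step != nxt[0]:
            return f"Wait, before that we need to {nxt[0]}."
        state.done.add(observed_step)
        nxt = expected_step(state)
    if nxt is None:
        return "The assembly is complete."
    step, parts = nxt
    missing = sorted(set(parts) - state.visible_parts)
    if missing:
        return f"I cannot see the {', '.join(missing)}."
    return f"Now {step}."


state = InfoState(plan=[("attach the wing", ["wing", "body"]),
                        ("add the propeller", ["propeller"])],
                  visible_parts={"wing", "body"})
print(update(state))
```

In this sketch each call to `update` consults the shared state before choosing what to say, so the robot’s output depends on task progress and on the perceived objects, not only on the user’s last move.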

D Linguistic and nonverbal effectiveness

A great deal of research on human-robot dialogue has explored models that can assist robots in interpreting and producing appropriate dialogue acts. An equally important but under-explored aspect of human-robot dialogue is rhetorical ability (Hartelius 2011). Rhetorical ability can involve the effective use of verbal and nonverbal cues ranging from politeness (Brown and Levinson 1987) to gestures (Iverson and Goldin-Meadow 1998). These cues transform abstract dialogue acts into effective speech, and they create a rich design space for human-robot dialogue (Mutlu 2011). Torrey, Fussell and Kiesler (2013) carried out one of the few explorations of this space. They studied the impact of hedges (e.g., “kind of”) and discourse markers (e.g., “basically”) on how readily human users accept instructions from a robot and on their perceptions of the robot. They found that robots that used hedges or discourse markers were rated as more considerate, more likable, and less controlling. Researchers have also found that paralinguistic cues, such as affect, shape human-robot dialogue. Breazeal (2001) manipulated robot speech to correspond with vocal-affect parameters in human speech. One such parameter is pitch range, the range over which fundamental frequency (f0) varies. She found that people accurately recognized the affect expressed in the robot’s speech between 25 % and 84 % of the time, well above chance (17 %). Scheutz, Schermerhorn and Kramer (2006) augmented a robot’s speech with affective expressions reflecting either task urgency or the current stress level of the user. They modulated parameters of the speech synthesizer to produce more dramatic pitch swings that expressed “fright.” An evaluation of their system showed better human-robot task performance when the robot used affective expressions in its speech than when it did not.
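Pitch range, as defined above, can be measured directly from an f0 contour. The sketch below reports it in semitones (12 · log2 of the ratio of the highest to the lowest voiced f0), a common convention that is an assumption here rather than the measure Breazeal used. It also shows one way to widen pitch swings: scaling each frame’s excursion from the mean on a log scale. The function names and the treatment of 0 Hz frames as unvoiced are assumptions, and neither function reproduces the synthesizer controls used in the studies cited.

```python
import math
from statistics import mean


def pitch_range_semitones(f0_hz):
    """Pitch range of an f0 contour in semitones: 12 * log2(max f0 / min f0).

    Frames with f0 <= 0 are treated as unvoiced and ignored (an assumption
    about how the contour is encoded).
    """
    voiced = [f for f in f0_hz if f > 0]
    return 12 * math.log2(max(voiced) / min(voiced))


def scale_pitch_excursions(f0_hz, factor):
    """Scale an f0 contour's excursions around its mean log-f0.

    factor > 1 widens pitch swings (more dramatic); factor < 1 flattens them.
    Unvoiced frames are returned as 0.0.
    """
    centre = mean(math.log2(f) for f in f0_hz if f > 0)
    return [2 ** (centre + factor * (math.log2(f) - centre)) if f > 0 else 0.0
            for f in f0_hz]


contour = [100.0, 0.0, 150.0, 200.0]
print(pitch_range_semitones(contour))
print(pitch_range_semitones(scale_pitch_excursions(contour, 2.0)))
```

Because the scaling happens on a log scale, a factor of 2 doubles the range in semitones: a contour spanning one octave (12 semitones) spans two octaves (24 semitones) after scaling.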
Research on nonverbal effectiveness in human-robot dialogue constitutes a rich body of knowledge, including how various aspects of gaze (Sidner et al. 2004) and gestures (Huang and Mutlu 2013a) improve the effectiveness of robot speech. Early work on nonverbal effectiveness includes a robot dialogue-system developed by Sidner et al. (2004) for giving demos to visitors. The system directs the robot to accompany its verbal explanations by gazing toward objects of interest. Sidner et al. (2004) found that when the robot exhibited gaze behaviors, the visitors interacted with it longer, showed increased attention, and looked back at the robot more frequently than when the robot used speech alone. Mutlu, Forlizzi and Hodgins (2006) developed a robot system that tells stories to a two-person audience. The robot directs its gaze toward and away from its audience based on distribution parameters obtained from human storytellers. To evaluate the effect of the robot’s gaze on story retention, the researchers manipulated how much the robot gazed at the listeners and found that increased gaze improved story retention by women but not by men. In a similar storytelling scenario, Huang and Mutlu (2013a) also used gaze and gestures based on parameters obtained from human storytellers. They found that the robot’s use of deictic gestures consistently improved story recall for all participants. Additionally, deictic, beat, and metaphoric gestures all affected story retention by women, but only beat gestures proved effective for recall by men.

E Adaptive dialogue

Humans engaging in conversation not only display linguistic, paralinguistic, and nonverbal cues but also continually adapt their behavior to particular characteristics of their partners or to real-time changes in those characteristics. A commonly and reliably observed example of this adaptive behavior is the chameleon effect, the automatic mimicry or matching of a partner’s social behaviors (Chartrand and Bargh 1999). Research in human-robot dialogue has explored how robots might adapt their dialogue-related behaviors, focusing particularly on whether such adaptations benefit the communication (e.g., Torrey et al. 2006) and on what metrics and algorithms might enable such adaptation (e.g., Szafir and Mutlu 2012). Torrey et al. (2006) adapted a robot’s dialogue acts to the user’s level of expertise in cooking. The robot referred to cooking utensils only by name when interacting with expert cooks, but it provided novice cooks with both the names and descriptions of the utensils. Their results showed that adapting dialogue content to user expertise significantly reduced the number of questions that novices asked but not the number of questions asked by experts. The adaptation also affected how users rated the robot: in the experimental conditions in which the robot’s language matched a user’s expertise, participants found the robot to be more effective, more authoritative, and less patronizing. A follow-up study explored adapting dialogue to real-time changes in the user’s information needs by using eye contact and delays in task progress as predictors of an increased need for information (Torrey et al. 2007). This study found that elaborating on instructions in response to eye contact offered no benefits. On the other hand, adapting information content to delays in task progress reduced the number of questions asked by the participants.
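The two adaptation strategies studied by Torrey et al. can be sketched as a single instruction generator: refer to a utensil only by name for experts, add a description for novices, and add the description for anyone whose current step has taken longer than expected. The utensil descriptions, the `expected_s` duration, and the function names are hypothetical; the sketch illustrates the decision logic only.

```python
# Hypothetical utensil descriptions; a real system would draw them from a
# knowledge base about the task.
DESCRIPTIONS = {
    "whisk": "the utensil with looped wires on a handle",
    "ladle": "the deep spoon with a long handle",
}


def refer_to(utensil, describe):
    """Name the utensil, adding a description when the user may need one."""
    if describe:
        return f"the {utensil}, {DESCRIPTIONS[utensil]}"
    return f"the {utensil}"


def instruction(action, utensil, user_is_expert, elapsed_s=0.0, expected_s=None):
    """Build an instruction adapted to expertise and to task progress.

    Novices always get a description. Experts get only the name unless the
    current step has taken longer than expected_s seconds (a hypothetical
    per-step estimate), which is treated as a sign that more help is needed.
    """
    delayed = expected_s is not None and elapsed_s > expected_s
    return f"{action} {refer_to(utensil, describe=not user_is_expert or delayed)}."


print(instruction("Pick up", "whisk", user_is_expert=True))
print(instruction("Pick up", "whisk", user_is_expert=False))
```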
Szafir and Mutlu (2012) developed a system that allows a storytelling robot to monitor the attention levels of its listener in real time using electroencephalography (EEG), that is, electrical measurements of brain activity. In response to drops in attention, the robot adapts its use of paralinguistic and nonverbal cues, specifically the loudness of its speech and its use of gestures. They evaluated three conditions: no cues (baseline), adaptive use of the cues, and random use of the cues. They found that the adaptive condition significantly improved how much information participants retained compared with the other two conditions. These studies illustrate the benefits of using paralinguistic cues and of adapting aspects of dialogue (including information content) to the information needs of users. Torrey et al.’s (2007 and 2006) work exemplifies how robots might adapt their language to user characteristics; Szafir and Mutlu (2012) demonstrate how real-time changes in user knowledge, awareness, and attention can serve as a basis for such
adaptations. Research in collaborative control has also explored how dialogue acts can be modified to reflect user characteristics, such as user expertise (Fong, Thorpe and Baur 2003 and Fong, Thorpe and Baur 1999). This work highlights the benefits of adaptive dialogue across different forms of human-robot interaction.
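A real-time adaptive loop of the kind Szafir and Mutlu describe needs a rule for deciding when attention has dropped. The sketch below uses one simple rule, chosen for illustration and not taken from their system: compare a short moving average of an attention estimate (assumed to be a 0-1 index derived from EEG) with the running mean of the session, and trigger a cue, such as louder speech or a gesture, when the short-term average falls a set fraction below it. The window, drop fraction, and refractory period are hypothetical parameters.

```python
from collections import deque


def cue_times(attention, window=5, drop=0.2, refractory=10):
    """Return the time steps at which the robot should deploy a cue.

    attention: sequence of attention estimates (assumed 0-1 index derived
    from EEG). A cue fires when the mean of the last `window` samples falls
    more than `drop` (as a fraction) below the running mean of all samples so
    far, and no cue fired within the previous `refractory` steps.
    """
    recent = deque(maxlen=window)
    total = 0.0
    last_cue = -refractory - 1          # allow a cue as soon as data suffice
    cues = []
    for t, value in enumerate(attention):
        recent.append(value)
        total += value
        baseline = total / (t + 1)      # running mean of the session so far
        if len(recent) < window:
            continue                    # not enough samples to smooth yet
        smoothed = sum(recent) / window
        if smoothed < (1 - drop) * baseline and t - last_cue > refractory:
            cues.append(t)              # e.g. raise speech volume or gesture here
            last_cue = t
    return cues


print(cue_times([0.8] * 20 + [0.3] * 10))
```

The refractory period keeps the robot from firing a cue on every sample of a sustained lapse. Because the running mean includes recent low values, a long lapse gradually lowers the baseline; this is one of several design choices a real system would need to tune.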

III A framework for human-robot dialogue-systems

In this section, we draw on the review of research on human-robot dialogue presented above to synthesize a framework for human-robot dialogue. We outline the key components and mechanisms of the framework that enable robots to listen to and talk with users effectively. This framework provides context for the research presented in Section IV. The framework includes six key components that are outlined below and illustrated in Figure 4.1. The goal of this framework is to guide the analysis of existing systems and to serve as a starting template for future systems. We do not offer it as a formula for the development of all human-robot dialogue-systems, as the design of a specific dialogue system will be shaped largely by the requirements of the application domain.

[Figure 4.1 labels: #1 Multimodal language processing; #2 Domain processing; #3 Task model; #4 Dialogue model; #5 Multimodal production (linguistic cues, nonverbal cues, task actions); #6 Adaptive dialogue loops; tight coupling; task space.]

Figure 4.1: A conceptual synthesis of human-robot dialogue-systems based on the literature reviewed in Section II. The six key components are numbered in the order in which they are presented in this section. Gray and black lines denote communication among the components and with the external world, respectively. The components are abstract representations of functionalities that might reside within the robot (e.g., speech and gesture capabilities for multimodal production) or be distributed in the environment (e.g., an overhead camera system for domain processing).
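The data flow in Figure 4.1 can be illustrated with a small Python skeleton in which each component is a plain class or function. This is a toy sketch under strong assumptions: perception (#1, #2) is reduced to a `Percepts` record, the task model (#3) is a linear list of steps, the dialogue model (#4) recognizes a single request, and production (#5) returns text plus abstract cue labels, with a low attention estimate standing in for adaptive loop #6. All names and thresholds are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Percepts:
    """Stand-in for the outputs of components #1 and #2 in one cycle."""
    utterance: str = ""                          # multimodal language processing
    attention: float = 1.0                       # e.g. an estimated attention level
    scene: dict = field(default_factory=dict)    # domain processing: step -> status


class TaskModel:
    """Component #3: a linear list of steps, advanced by domain information."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.index = 0

    def update(self, scene):
        if self.current() is not None and scene.get(self.current()) == "done":
            self.index += 1

    def current(self) -> Optional[str]:
        return self.steps[self.index] if self.index < len(self.steps) else None


def dialogue_model(percepts, task):
    """Component #4: choose an abstract dialogue act using the task model."""
    step = task.current()
    if step is None:
        return ("inform", "done")
    if "what next" in percepts.utterance.lower():
        return ("instruct", step)
    return ("wait", step)


def produce(act, attention, low_attention=0.5):
    """Component #5: realize the act as text plus abstract cue labels.

    The attention check stands in for adaptive loop #6 (threshold hypothetical).
    """
    kind, content = act
    if kind == "wait":
        return "", []
    text = "We are finished." if kind == "inform" else f"Next, {content}."
    cues = ["gesture_toward_task"] if kind == "instruct" else []
    if attention < low_attention:
        cues.append("raise_volume")
    return text, cues


def cycle(percepts, task):
    """One pass through the framework: perceive, update task, decide, produce."""
    task.update(percepts.scene)
    return produce(dialogue_model(percepts, task), percepts.attention)


task = TaskModel(["attach the wing", "add the propeller"])
print(cycle(Percepts(utterance="What next?"), task))
```

The dialogue model reads the task model directly on every cycle, a crude version of the tight coupling between components #3 and #4 shown in the figure.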

A Multimodal language processing

Effective human-robot dialogue-systems must be able to capture and interpret multimodal input (speech, gestures, and gaze) from one or more users. Combining information from different modalities is particularly important for interactions in which deixis is necessary for effective communication. It is also essential in multiparty settings, in which multimodal information might allow the robot to identify the speaker and the participant roles of all parties engaged in the interaction. To capture and interpret multimodal information, this component may draw on speech-recognition and processing techniques as well as on algorithms for recognizing users’ gaze direction and gestures. It can also support adaptive human-robot dialogue by capturing and interpreting behavioral and physiological signals from users (e.g., using electrical brain-activity signals to monitor user attention (Szafir and Mutlu 2012)).
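One common way to combine speech and gesture for deixis is to intersect the constraints each modality provides: speech narrows the type of referent, and a pointing gesture narrows its direction. The sketch below illustrates that idea in two dimensions. The scene format, the angular tolerance, and the function name are assumptions, and the sketch is not drawn from any specific system cited in this chapter.

```python
import math


def resolve_deixis(candidates, hand, direction, spoken_type=None, max_angle=20.0):
    """Resolve a deictic reference ("that one", "the cup over there").

    candidates: dict name -> (x, y, object_type) in a shared 2-D frame.
    hand: (x, y) of the pointing hand; direction: (dx, dy) pointing vector.
    spoken_type: object type named in speech, or None for a bare "that".
    Returns the matching candidate closest in angle to the pointing ray, or
    None if none lies within max_angle degrees (a hypothetical tolerance).
    """
    ray = math.atan2(direction[1], direction[0])
    best, best_angle = None, max_angle
    for name, (x, y, kind) in candidates.items():
        if spoken_type is not None and kind != spoken_type:
            continue                              # speech rules this object out
        bearing = math.atan2(y - hand[1], x - hand[0])
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        diff = abs(math.degrees((bearing - ray + math.pi) % (2 * math.pi) - math.pi))
        if diff <= best_angle:
            best, best_angle = name, diff
    return best


scene = {"red_cup": (2.0, 0.2, "cup"), "blue_cup": (2.0, 1.5, "cup"),
         "book": (2.0, 0.0, "book")}
print(resolve_deixis(scene, hand=(0.0, 0.0), direction=(1.0, 0.0), spoken_type="cup"))
```

Pointing alone selects the book, which lies exactly on the ray; adding the spoken word “cup” shifts the referent to the red cup, showing how each modality compensates for ambiguity in the other.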

B Domain processing

Task-based human-robot dialogue requires real-time capture and interpretation of the physical world. Of particular importance in task-based interactions are the objects, people, spaces, and events related to the task; the physical actions of other human or agent collaborators; and changes in the conditions and environment surrounding the task. Real-time capture and interpretation can draw on a number of perception and inference subsystems (e.g., vision- or marker-based tracking of objects, people, spaces, or actions) to update the task model. Domain information is particularly useful in settings that involve physical collaboration or situated interaction because it enables the robot to maintain an accurate understanding of the world so that it can carry out task steps. For instance, in a collaborative-assembly scenario, the robot might need to detect the location of parts to be assembled or assess whether the current state of the assembly conforms to task plans (Foster et al. 2006). The robot system described in Section IV performs such an assessment to determine whether to help the user perform the task.
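In its simplest form, the assembly-conformity check mentioned above reduces to a set comparison between the connections that domain processing perceives and the connections the task plan expects at this point. In the sketch below, representing a connection as an unordered pair of part names and the function names are hypothetical choices; real perception would also have to cope with noisy or missing detections.

```python
def link(a, b):
    """An order-independent connection between two parts."""
    return frozenset((a, b))


def assess_assembly(perceived, expected):
    """Compare perceived connections with those the plan expects by now.

    perceived: set of links reported by a (hypothetical) vision or marker
    tracker; expected: set of links the task plan says should exist.
    Returns (missing, unexpected) sets of links.
    """
    return expected - perceived, perceived - expected


def needs_help(perceived, expected):
    """Offer help if anything is missing or attached where the plan says not."""
    missing, unexpected = assess_assembly(perceived, expected)
    return bool(missing or unexpected)


expected = {link("body", "wing"), link("body", "tail")}
seen = {link("wing", "body"), link("tail", "wing")}
print(assess_assembly(seen, expected))
```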

C Task model

A second component essential for task-based human-robot dialogue is the task model, a representation of the task in which the robot and its user are engaged (also see Section II.C). Task models can be pre-scripted, acquired through interaction with the user, or learned by performing the task with learning algorithms (e.g., reinforcement learning). The task model allows the robot to monitor task progress, track actions taken by users or other agents engaged in the task, identify inconsistencies between task goals and task actions, adapt task plans based on task progress, and learn new tasks or task plans. Effective use of the task model requires a tight coupling between the task model and the dialogue manager. This coupling permits the robot to evaluate user input against task plans and to produce appropriate dialogue acts that communicate task steps and goals. The coupling can consider task and dialogue goals either jointly, as in the Collagen system (Rich, Sidner and Lesh 2001) and the CPS model (Blaylock and Allen 2005), or separately, as in the PaMini framework (Peltason and Wrede 2010).
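A pre-scripted task model can be represented as a set of steps with prerequisites, which is enough to monitor progress and flag actions that are inconsistent with the plan. The sketch below is illustrative only: the example task, the function names, and the string labels returned by `check_action` are hypothetical, and systems such as Collagen or PaMini use their own, different representations.

```python
def enabled_steps(prereqs, done):
    """Steps not yet done whose prerequisites are all done.

    prereqs: dict step -> set of steps that must come first (a hypothetical
    pre-scripted task model; it could also be acquired or learned).
    """
    return {s for s, before in prereqs.items() if s not in done and before <= done}


def check_action(prereqs, done, step):
    """Classify an observed user action against the task model.

    Mutates `done` when the action is consistent with the plan.
    """
    if step in done:
        return "repeated"
    if step not in prereqs:
        return "unknown"
    if step in enabled_steps(prereqs, done):
        done.add(step)
        return "ok"
    missing = sorted(prereqs[step] - done)
    return "premature: first " + ", ".join(missing)


model = {"base": set(), "wheels": {"base"}, "roof": {"base"}, "flag": {"roof"}}
progress = set()
print(check_action(model, progress, "roof"))   # the user skipped the base
```

A “premature” result is the kind of inconsistency that, through the tight coupling described above, the dialogue manager could turn into a corrective instruction.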

D Dialogue model
The dialogue model ensures that the robot can interpret user input, generate appropriate dialogue acts, and perform other dialogue functions (e.g., error handling). The dialogue model complements the task model: it communicates with the task model to assess task plans or initiate task actions. To manage the requirements of both spoken dialogue and task-based interaction, the dialogue model must support mixed-initiative interaction. It must also handle multimodality in language processing and production, as well as dialogue involving multiple users and other agents (multiparty dialogue). Finally, it must support processes such as disambiguation, grounding, and repair (see Section IV.A), which situated interaction requires. Peltason and Wrede (2011) compare state-of-the-art dialogue-modeling approaches and discuss some of the requirements listed above.

E Multimodal production
Multimodal production includes both linguistic and nonverbal strategies that, along with task actions, support the goals defined in the task model. For instance, if one of the goals in the task model of an instructional robot is to maximize student engagement, the robot must employ behavioral strategies and task actions that help improve student engagement. Huang and Mutlu (2012, 2013b) developed such a mechanism, which allows the robot to automatically select behavioral strategies from a repository of options that support the designer’s interaction goals. We contend that task actions are also driven by the multimodal production component, because the way these actions are performed contributes to the robot’s communicative effectiveness. For instance, Dragan, Lee and Srinivasa (2013) demonstrated that a reach-and-grasp motion can be configured to improve the robot’s ability to communicate its intent to human observers. Achieving this requires the multimodal production mechanisms to communicate closely with the task and dialogue models. Together, these components determine which task actions and dialogue acts to select, as well as the optimal way to perform them, given the task goals.

F Adaptive dialogue
The last component of our proposed framework is a representation of the low-level perception-action loops for adaptive dialogue. This differs from the high-level adaptation of task plans, which considers the overall state of the task and the dialogue. Adaptive-dialogue loops link the multimodal processing and production components directly to produce adaptive responses (e.g., behavioral mimicking, affective mirroring, and gaze following). These loops are a low-level function because they are driven by a set of preset perception-action schemas rather than by high-level task or dialogue plans. For example, Riek, Paul and Robinson (2010) developed a closed-loop action-perception model that allows the robot to capture the head movements of a conversational partner and mimic them in real time. Separating this component from the rest of the dialogue system offers several advantages for the design of human-robot dialogue-systems. First, a separate module makes adaptive dialogue more extensible by allowing designers to compose the set of adaptive capabilities needed by a specific application. Second, the separation improves computational efficiency by reducing real-time processing demands on the rest of the system. Third, an independent module for adaptive-dialogue loops facilitates the design and implementation of simple human-robot dialogue-systems, such as a robotic toy that tracks its user’s face with its head to appear alive and attentive without any spoken-dialogue capabilities.
Taken together, the six components described in this section make up a rich design space for creating natural and effective human-robot dialogue-systems. They afford many opportunities and pose technical challenges, which are discussed in Section V. In the next section, we present two studies that illustrate how this framework can be instantiated. We focus on two components. The first is a task model that enables a robot to autonomously instruct its user in an assembly task. The second is a production model that uses linguistic cues of expertise to formulate effective and persuasive speech.
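The separation of adaptive-dialogue loops described above can be illustrated with a minimal sketch, assuming a robot whose head is driven by named joint angles. The schema functions, joint names (`head_pan`, `head_tilt`), and the ±45 degree range are hypothetical and not taken from any cited system; the point is only that each schema maps a percept directly to an action without consulting task or dialogue state, and that a designer composes the schemas an application needs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Command = Dict[str, float]  # hypothetical joint name -> target angle in degrees


@dataclass
class Percept:
    """Low-level input for one control tick (fields are illustrative)."""
    face_x: Optional[float] = None         # user's face position, normalized to [-1, 1]; None if unseen
    partner_pitch: Optional[float] = None  # user's head pitch in degrees; None if untracked


# A schema maps a percept straight to a motor command, with no task or dialogue state.
Schema = Callable[[Percept], Optional[Command]]


def follow_face(p: Percept) -> Optional[Command]:
    """Pan the head toward the user's face over an assumed +/-45 degree range."""
    if p.face_x is None:
        return None
    return {"head_pan": 45.0 * max(-1.0, min(1.0, p.face_x))}


def mimic_nod(p: Percept) -> Optional[Command]:
    """Copy the partner's head pitch (assumes both use the same sign convention)."""
    if p.partner_pitch is None:
        return None
    return {"head_tilt": p.partner_pitch}


class AdaptiveLoop:
    """Runs a designer-chosen set of schemas every tick, independently of dialogue."""

    def __init__(self, schemas: List[Schema]) -> None:
        self.schemas = schemas

    def step(self, p: Percept) -> Command:
        command: Command = {}
        for schema in self.schemas:
            out = schema(p)
            if out:
                command.update(out)  # later schemas win if they drive the same joint
        return command


# The "robotic toy" configuration: face tracking only, no spoken dialogue at all.
toy = AdaptiveLoop([follow_face])
print(toy.step(Percept(face_x=0.5)))  # {'head_pan': 22.5}
```

Adding head-movement mimicry in the spirit of Riek, Paul and Robinson (2010) would then only require `AdaptiveLoop([follow_face, mimic_nod])`; nothing in the task or dialogue models changes.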



IV Enabling effective human-robot dialogue
Robots that serve as instructors, personal trainers, coaches, information-desk attendants, receptionists, and sales representatives must communicate effectively with human users. This entails the ability to deliver accurate instructions, clarifications, corrections, and confirmations in a way that improves user confidence and trust in both the information provided and the robot itself. For instance, a robotic weight-loss coach might suggest activities and foods to help users maintain health and lose weight. It must sound credible or it will fail to elicit confidence and trust. Similarly, a robot attendant at a tourist-information office must seem knowledgeable and trustworthy. A robot training humans to do assembly work must not only establish trust; it must also teach users how to do each step of the task. It must then monitor the user’s performance of those steps, offering clarifications, corrections, and confirmations when necessary. To accomplish these goals, the robot must possess:
1. language and domain processing to analyze user speech and task actions (or lack thereof);
2. a task model to provide instructions, monitor task progress, and detect mistakes;
3. a dialogue model to interpret user input and produce dialogue acts; and
4. a production model to deliver dialogue acts using spoken and nonverbal communication.
The success of such robots depends on their ability to understand multimodal user input, make sense of that input in the context of the task, and produce dialogue acts and task actions that allow the task to progress. The robots must also adapt their behaviors in response to changes in user and task states. The studies presented in this section address two aspects of designing such a mixed-initiative robot system. The first study describes the development and assessment of a task model (item 2) for teaching a human trainee how to assemble a set of pipes. The task model includes strategies for instruction and repair (fixing communication breakdowns). The second study describes the development and assessment of a production model (item 4) for rhetorical ability, with the objective of producing credible, trustworthy, and persuasive robot speech. Each study begins with an examination of the human skills that inform the robot models.

A Task model for instruction and repair
Instruction is an interactive process in which the instructor and the student negotiate meaning (Skehan 2003:2). This negotiation involves dialogue through which the parties seek to align their understanding of a topic (Pickering and Garrod 2004:8). The alignment process includes breakdowns due to errors by the instructor or misunderstandings by the student, and these breakdowns can impede progress toward aligned understanding (Garrod and Pickering 2007). Breakdowns can be detected in various ways, notably through comprehension checks and requests for clarification and confirmation (Skehan 2003). Depending on the type of breakdown (a misunderstanding or a non-understanding), speakers use different strategies to fix it (called repair), such as refashioning, elaboration, and collaborative refinement (Hirst et al. 1994). How might a robotic instructor seek such alignment? How might it offer repair when breakdowns occur? This study explores these questions by developing a task model for detecting and repairing breakdowns between a robot’s instructions and a user’s task actions. The task is to teach trainees how to assemble a pipe structure. We chose this task because it resembles assembly tasks that could be assigned to robots (e.g., repairing a bike or assembling furniture). We begin with an analysis of human instructors performing the pipe-assembly task and implement the findings as a task model for a dialogue system in which the robot is the instructor. We conclude with an evaluation of the task model in a human-robot dialogue setting.

1 Study of how humans detect and repair breakdowns
Methods. The pipe-assembly task, a multi-step task of connecting a set of pipes into a specified formation, was used to study how humans detect and repair breakdowns when instructing trainees to perform an assembly task. Participants. We collected data from eight dyadic human interactions. Participants were recruited from the University of Wisconsin–Madison campus community. To ensure a gender-balanced study, we recruited eight male and eight female native speakers of English between the ages of 18 and 44 (M = 23.75, SD = 8.56). Participants had backgrounds in a diverse range of occupations and majors. Procedure. Participants were seated facing each other at a table on which rested three types of pipes (short, medium, and long) and two types of joints (elbows and t-joints). Prior to each dyadic interaction, one participant (the instructor) learned how to connect the pipes by watching a video and practicing pipe assembly. Instructors were given as much time as they needed to learn the task. Once instructors could perform the task correctly, they were asked to train the second participant (the trainee) to assemble the pipes without the aid of the video. Each dyadic interaction was recorded by a single video camera equipped with a wide-angle lens to capture both participants and the task space. The instructional portion of the task (excluding the time the instructor spent learning how to construct the pipes from the video) ranged from 3:57 to 6:44 minutes (M = 5 minutes 11 seconds, SD = 2 minutes 19 seconds). Analysis and findings. We analyzed and coded communication breakdowns and repair events. Coding captured the specifics of each breakdown: how the breakdown was communicated between participants and how repair was initiated and made. To ensure the reliability and consistency of the coded data, two raters observed and coded the breakdowns. Inter-rater reliability showed a substantial level of agreement (79 % agreement, Cohen’s κ = .74). Participants indicated awareness of a breakdown either verbally (e.g., by asking a question) or visually (e.g., by noticing that the pipes were not configured as expected). They always initiated the repair process verbally. We found that 65 % of repairs were initiated by the trainee (trainee-initiated) and 35 % by the instructor (instructor-initiated). Trainee-initiated repairs (also called “requests”) were directed toward understanding the instructor’s expectations; instructor-initiated repairs aimed to clarify or correct the trainee’s perception of the task. Trainee-initiated repair always used a verbal statement to clarify or confirm the instructor’s expectations. Requests were generated when the trainee either did not understand or misunderstood an instruction. Verbal requests for repair ranged from generic statements (e.g., “What?”) to more detailed requests (e.g., “Where should the pipe go?”). We grouped these statements into three categories: confusion, confirmation, and clarification. These categories are consistent with previous work on repair, which classified confusion as a type of non-understanding and clarification as a type of misunderstanding (Gonsior, Wollherr and Buss 2010; Hirst et al. 1994; Koulouri and Lauria 2009). Instructors detected breakdowns under two conditions: mistakes and hesitancy.
They were also made aware of breakdowns through trainee-initiated repairs. While requests and hesitancy-based breakdowns were triggered by a trainee’s action or inaction, mistake detection required the instructor to check the trainee’s work. Mistake detection was visual: it occurred when instructors noticed a trainee performing an action they knew to be inconsistent with the goals of a given instruction (e.g., picking up the wrong piece) or when they visually evaluated the workspace. Instructors generally inspected the workspace after the trainee completed an instruction to determine whether it had been executed correctly. When instructors noticed that the trainee was hesitating, they asked whether the trainee needed help; on average, instructors responded to hesitancy after 9.84 seconds. Given verbal and visual inputs, instructors could determine the type of breakdown that had occurred, and they initiated repair verbally. Instructors repaired the breakdown by reformulating their original instruction, often by adding information to the part of the instruction that the trainee either executed incorrectly or seemed hesitant to execute. Instructors repaired only the part of the instruction involved in the breakdown.
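The reliability figures above are Cohen’s κ, which corrects raw percentage agreement for the agreement two raters would reach by chance: κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement estimated from each rater’s category frequencies. A minimal sketch of the computation follows; the category codes are invented for illustration and are not the study’s data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who each assigned one category to the same events."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same non-empty list of events")
    n = len(rater_a)
    # Observed agreement: share of events both raters coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability of agreeing if each rater drew codes at their own base rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for five request events (not the study's data).
a = ["confusion", "confusion", "clarification", "confirmation", "clarification"]
b = ["confusion", "clarification", "clarification", "confirmation", "clarification"]
print(round(cohens_kappa(a, b), 4))  # 0.6875
```

Rearranging the formula gives p_e = (p_o - κ) / (1 - κ); with the rounded values reported for this study (p_o = .79, κ = .74), the implied chance agreement is roughly .19.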

2 Task model and system structure
Task model. We used the data collected during the human-human training study to develop a task model for robot-human instructional dialogue. In our model, we chose a simulation-theoretic approach to direct the robot’s behavior in relation to the participant. This is a common approach to modeling human behavior; it posits that humans represent the mental states of others by adopting their partner’s perspective to better understand the partner’s beliefs and goals (Gallese and Goldman 1998; Gray et al. 2005). Simulation-theoretic approaches have also been applied to the design of robot behaviors and control architectures because they enable robots to consider their human partner’s perspective (Bicho et al. 2011; Nicolescu and Mataric 2003). In an instructional task, the instructor has a mental model of an action they wish to convey to the trainee. Following instruction, the instructor can assess gaps in the trainee’s understanding or performance by comparing the trainee’s actions to this mental model of the intended action and noting the differences.

[Figure 4.2: robot instructor and human student; action-triggered repair on a mismatch between the goal state P = {p1, …, pn} and the perceived state P′ = {p1′, …, pn′}; request-triggered repair for repetition (“Can you repeat that?”), clarification (“Where should the pipe go?”), and confirmation (“Is this the right pipe?”); hesitation-triggered repair when the perceived state is unchanged from time t to t+1 (Pt′ = Pt+1′).]

Figure 4.2: A visual depiction of the developed task model that enabled the robot to offer repair based on user actions; hesitations to take actions; and requests for repetition, clarification, and confirmation.

Following the simulation-theoretic approach, we defined a set of instruction goals P = {p1, …, pn} for the robot that are linked to the result of the participant’s action or inaction, given the current instruction. Depending on the task, P may vary at each step of the instruction, as some instruction goals may no longer be applicable while others may become applicable. As the participant engages in the task, the robot instructor evaluates whether the current state of the workspace is identical to the set of instruction goals P*. If any individual task goal pk does not match pk*, then repair is needed (see Figure 4.2). How repair is carried out depends on which task goal pk has been violated (indicating that a mistake has been made). As we observed in human-human interactions, the instructor repaired only the part of the instruction that was deemed incorrect. Additionally, the set P has an inherent ordering determined by the instructor’s perception of the task. This ordering is informed by elaboration theory, which states that people order their instructions by what they perceive to be most important and then reveal lower levels of detail as necessary (Reigeluth et al. 1980). Based on these principles, imposing an ordering of decreasing importance on the set P for a given task ensures that each pk takes precedence over any pk+n for n > 0. If multiple pk are violated, the task goal with the lowest k is addressed first. For example, if a trainee picks up the wrong piece and attaches it to the wrong location, the instructor first repairs the type of piece needed and then repairs the location of that piece. We grouped repair requests made by trainees into semantic categories. For example, the questions “Which piece do I need?” and “What piece should I get?” were recognized as the same question. We then allowed the model to determine the appropriate repair behavior based on the category of the utterance. Because hesitancy is task dependent, it can be measured in a number of ways. For the pipe-assembly task, we chose to measure the time elapsed since the workspace last changed, which provided conservative estimates of hesitancy-based breakdowns.
We did not use the time elapsed since the last interaction because it could lead to the incorrect conclusion that the trainee was hesitant when, in fact, he or she was still working. We treated 10 seconds without a change to the workspace as a hesitancy-based breakdown. This threshold was based on how long instructors in our human-human study waited before determining that a breakdown had occurred (9.84 seconds on average). Although we have discussed the model for detecting mistakes in terms of task goals and mistakes concerning the workspace, it can also be extended to understanding and repairing verbal mistakes. For example, if the participant mishears a question and responds in a way that is inconsistent with the expected answers, a repair is needed. The appropriate answers to the intended question can be formalized as pk, and any answer that does not fulfill pk can be treated as a cause for repair. System structure. To implement our task model, we designed the system architecture to process both visual and verbal inputs. Visual input is used to check the trainee’s workspace, and verbal input is used to detect and categorize the trainee’s speech. From these inputs, our system determines the type of breakdown (if any) that occurred and executes the appropriate response. Hardware. We implemented our model on a Wakamaru humanoid robot. Video is captured at 12 frames per second using a Microsoft Kinect stereo camera, and audio is input via a microphone array. The camera and microphone were suspended three feet above the trainee’s workspace, giving a visible workspace area of 43 inches by 24 inches. A second stereo camera was placed behind the robot to track the trainee’s body and face. The pipes to be assembled were identical to those used for human-human instruction (see Figure 4.3), except that each was marked with augmented reality (AR) tags to allow detection by the workspace camera. We used a total of eight unique AR tags: two tags for the elbow joints, three tags for the t-joints, and one tag each for the short, medium, and long pipes. Eight tags, four on each end, were placed on each pipe to allow detection from any rotation. The type, position, and orientation of each detected tag were used to identify piece type, location, and rotation. The location and orientation of tags were consistent across pieces of the same type, and tag locations on each piece were known to the system.
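The tag scheme above lets the vision module recover pieces from individual glyphs; as the architecture description explains, a glyph is attached to a nearby piece of the same type with similar rotation that is still missing that glyph, or else starts a new piece. The following is a minimal sketch of that association step under stated assumptions: the tag IDs 0 to 7, the "copies per piece" rule, and the 6-inch and 15-degree thresholds are all hypothetical choices, not values from the study.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical IDs for the eight unique tags (the chapter gives counts, not IDs).
TAG_TYPES: Dict[int, str] = {
    0: "elbow", 1: "elbow",                    # two tags for elbow joints
    2: "t-joint", 3: "t-joint", 4: "t-joint",  # three tags for t-joints
    5: "short", 6: "medium", 7: "long",        # one tag per pipe length
}
PIPES = {"short", "medium", "long"}


@dataclass
class Glyph:
    tag_id: int
    x: float      # workspace position, inches
    y: float
    theta: float  # rotation, radians


@dataclass
class Piece:
    kind: str
    glyphs: List[Glyph] = field(default_factory=list)


def copies_per_piece(kind: str) -> int:
    """Assumed: a pipe carries 8 copies of its tag; a joint carries one copy of each of its tags."""
    return 8 if kind in PIPES else 1


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in radians."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def associate(glyphs: List[Glyph], max_dist: float = 6.0,
              max_rot: float = math.radians(15)) -> List[Piece]:
    """Group glyphs into pieces by type, proximity, and similar rotation (illustrative thresholds)."""
    pieces: List[Piece] = []
    for g in glyphs:
        kind = TAG_TYPES[g.tag_id]
        best: Optional[Piece] = None
        best_d = max_dist
        for p in pieces:
            if p.kind != kind:
                continue
            # Skip pieces that already hold every copy of this tag (not "missing" it).
            if sum(h.tag_id == g.tag_id for h in p.glyphs) >= copies_per_piece(kind):
                continue
            d = min(math.hypot(g.x - h.x, g.y - h.y) for h in p.glyphs)
            similar = all(angle_diff(g.theta, h.theta) <= max_rot for h in p.glyphs)
            if similar and d <= best_d:
                best, best_d = p, d  # keep the closest qualifying piece
        if best is None:
            best = Piece(kind)  # no match: start a new piece
            pieces.append(best)
        best.glyphs.append(g)
    return pieces


pieces = associate([Glyph(0, 10, 10, 0.0), Glyph(1, 12, 10, 0.05), Glyph(0, 11, 11, 0.0),
                    Glyph(7, 30, 5, 1.0), Glyph(7, 31, 5, 1.0)])
print([(p.kind, len(p.glyphs)) for p in pieces])  # [('elbow', 2), ('elbow', 1), ('long', 2)]
```

Because only the nearest glyph is checked, this sketch would treat the two ends of a long pipe as separate pieces; a fuller implementation would presumably also use the known tag locations on each piece to link them.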

Figure 4.3: An experimenter demonstrating the use of the autonomous robot system and the assembly task on which the system was designed to instruct its users.

Architecture. The architecture for implementing our task model contains four modules: vision, listening, dialogue, and control. The vision and listening modules capture and process their respective input channels. The control module uses input from these modules to make decisions about the need for repair and relays the status of the workspace to the dialogue module when feedback from the robot is necessary.



The vision module detects the status of the trainee’s workspace and processes information on the trainee’s location, using a separate camera for each function. It processes each frame and builds a graph C of pipe connections in three main steps: finding the AR-tag glyphs in the frame, associating those glyphs with pieces, and determining which pieces are connected based on a set of heuristics. Each frame is searched for AR glyphs using a modified version of the Glyph Recognition And Tracking Framework (GRATF)², producing a set of glyphs G. Each glyph in G is defined by its type t, its position (x, y), and its rotation θ. Upon discovering a glyph, the algorithm checks whether any pieces of the appropriate type (e.g., long pipe, t-joint) are missing that particular glyph. If the algorithm finds a piece to which the glyph likely belongs (because of its proximity and similar rotation), the glyph is associated with that piece; otherwise, a new piece is created and the glyph is associated with it. The result of this step is a set of pieces P, where each piece p is described by the set of glyphs that belong to it. Together, the glyphs of a piece p form a bounding box that approximates the boundaries of that piece. Using these boundaries, the module determines whether any two pieces are connected and builds a graph structure that reflects the workspace. At the completion of the trainee’s turn, the correct graph structure G* is compared with the structure G from the workspace. If the two graphs are isomorphic (identical in structure), the trainee has successfully completed the instruction. If they are not, the robot determines the source(s) of the difference(s) between pk and pk* and passes that information to the control module.
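The final check above compares the expected connection graph with the one built from the workspace. A minimal sketch of such a test follows, assuming each node is labeled with its piece type and that a match must preserve those types (the chapter says only "identical in structure", so the labeling is an assumption). It uses simple backtracking, which is adequate for graphs of a few dozen pieces; it does not check placement or rotation, and it does not identify which goal pk was violated.

```python
from typing import Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]  # undirected adjacency: piece -> connected pieces
Labels = Dict[Hashable, str]           # piece -> piece type


def signature(g: Graph, labels: Labels) -> List[Tuple[str, int]]:
    """Sorted (type, degree) pairs: a cheap invariant that isomorphic graphs share."""
    return sorted((labels[n], len(g[n])) for n in g)


def is_isomorphic(g1: Graph, l1: Labels, g2: Graph, l2: Labels) -> bool:
    """Backtracking test for a type-preserving isomorphism between two small graphs."""
    if len(g1) != len(g2) or signature(g1, l1) != signature(g2, l2):
        return False
    order = sorted(g1, key=lambda n: -len(g1[n]))  # map high-degree pieces first
    mapping: Dict[Hashable, Hashable] = {}
    used: Set[Hashable] = set()

    def extend(i: int) -> bool:
        if i == len(order):
            return True  # every piece mapped consistently
        u = order[i]
        for v in g2:
            if v in used or l2[v] != l1[u] or len(g2[v]) != len(g1[u]):
                continue
            # Adjacency to every already-mapped piece must agree in both graphs.
            if all((w in g1[u]) == (mapping[w] in g2[v]) for w in mapping):
                mapping[u] = v
                used.add(v)
                if extend(i + 1):
                    return True
                del mapping[u]
                used.discard(v)
        return False

    return extend(0)


# Expected: elbow - long pipe - t-joint; observed pieces carry arbitrary IDs.
expected = {"e": {"l"}, "l": {"e", "t"}, "t": {"l"}}
exp_labels = {"e": "elbow", "l": "long", "t": "t-joint"}
observed = {1: {2}, 2: {1, 3}, 3: {2}}
obs_labels = {1: "elbow", 2: "long", 3: "t-joint"}
print(is_isomorphic(expected, exp_labels, observed, obs_labels))  # True
```

A library alternative is NetworkX’s `is_isomorphic` with a `node_match` callback comparing piece types.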
To extend the system to check multiple instructions at once, the set of pipe connections C is built incrementally, starting with the first instruction that needs to be checked. Each instruction involves adding a new piece to a specific location at a particular angle. Checking the workspace for the first instruction s1 will detect too many pieces, because the pieces required for instructions s2 through sn are all lying on the table. Consequently, for the first instruction the module systematically eliminates from C all pipe pieces that are not involved in that instruction. A piece is defined as extraneous if its removal neither disconnects the graph C nor reduces the count of that type of piece below what is needed to complete the given instruction. Once a modified version of C yields a correct check of s1, pieces are returned to C incrementally so that it contains all the pipe pieces needed to complete the other instructions. The second function of the vision module, detecting the trainee’s location, is performed at every frame. When the trainee is within a foot of the workspace, the robot positions its head so that it gazes at the table, appearing to monitor the workspace. When the trainee is farther away (e.g., standing back to check their work or retrieving a piece), the robot raises its head and gazes at the trainee’s face. If the trainee or the robot is talking, the robot looks at the trainee; if the robot is checking the workspace in response to a prompt from the trainee, it looks at the location in the workspace where changes have been made.

² GRATF is an open-source project. Its home page is

The listening module detects and classifies the trainee’s speech input (the trainee’s dialogue acts) using the following categories:
– Request for repetition (e.g., “What did you say?” “Can you repeat the instructions?”);
– Check for correctness (e.g., “Is this the right piece?” “I’m done attaching the pipe.”);
– Check for options (e.g., “Which pipe do I need?” “Where does it go?”).
Utterances that do not belong to one of these categories (e.g., confirmations of the robot’s instruction) are ignored by the system. A custom-built dialogue manager coordinates responses to each category. Each recognized utterance has an associated semantic meaning that indicates its purpose; for example, “What did you say?” is a repetition request. These semantic meanings allow the control module to identify the type of utterance and reply appropriately, given the current state of the trainee’s workspace. To process requests that refer to the workspace, the system first checks the state of the workspace using the vision module. For example, “Did I do this right?” requires the robot to determine whether the current configuration of pipes in the workspace is correct. The control module determines the robot’s next dialogue act using input from the vision and listening modules. Following a simulation-theoretic approach, it compares these inputs to the actions that the robot expects to occur in response to its instructions for the current task step.
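The listening module’s mapping from recognized utterances to semantic categories, described above, can be sketched as a lookup from normalized phrases to category tags. This is a minimal sketch, assuming a recognizer that returns one of a fixed set of phrases; the chapter does not say how recognition was implemented, the phrase list holds only the chapter’s own examples, and the category identifiers are hypothetical shorthand. Mapping paraphrases to one tag is what lets “Which piece do I need?” and “What piece should I get?” count as the same question.

```python
import re
from typing import Dict, Optional

# Hypothetical grammar: each accepted phrase carries a semantic tag.
# Phrases are the chapter's examples; a deployed grammar would list many more.
GRAMMAR: Dict[str, str] = {
    "what did you say": "repetition",
    "can you repeat the instructions": "repetition",
    "is this the right piece": "check_correctness",
    "did i do this right": "check_correctness",
    "i'm done attaching the pipe": "check_correctness",
    "which pipe do i need": "check_options",
    "which piece do i need": "check_options",
    "what piece should i get": "check_options",
    "where does it go": "check_options",
}


def normalize(utterance: str) -> str:
    """Lowercase, unify apostrophes, drop other punctuation, and collapse whitespace."""
    text = utterance.replace("\u2019", "'").lower()
    text = re.sub(r"[^a-z' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def classify(utterance: str) -> Optional[str]:
    """Return the semantic category, or None for utterances the system ignores."""
    return GRAMMAR.get(normalize(utterance))


print(classify("Which piece do I need?"), classify("What piece should I get?"))
# check_options check_options
```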
We defined a set P describing the expectations that the trainee can violate. The ordering of these expectations is based on observations from our study of human instructor-trainee interactions, which resulted in the following categories:
– Timely Action (p0): the trainee acted in a timely fashion;
– Correct Piece (p1): the trainee chose the correct piece to use;
– Correct Placement (p2): the trainee placed the piece in the correct location relative to the current workspace;
– Correct Rotation (p3): the trainee rotated the piece correctly relative to the current workspace.
The first expectation ensures that the trainee does not hesitate for 10 seconds or more following the robot’s last instruction; a violation can indicate confusion and is classified as a hesitancy breakdown. The last three expectations ensure that the trainee has chosen the correct piece to add, has added the piece at the correct location, and has rotated the piece correctly, respectively. After evaluating input from the vision and listening modules, the control module passes three pieces of information to the dialogue module: the current instruction, the semantics associated with the trainee’s last utterance (if any), and the result of the control module’s evaluation of the workspace (if any). Given this information, the dialogue module chooses an appropriate verbal response to address a detected breakdown. Responses depend on which instruction the trainee is completing, the current layout of the workspace, and the nature of the breakdown. Not all responses depend on all three pieces of information. For example, requests to repeat the last instruction are independent of how the workspace is currently configured, and responses to hesitancy are independent of both the current workspace and the interaction with the trainee. However, a request to check whether the trainee has correctly completed an instruction requires knowledge of both the completed instructions and the current layout of the workspace.
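The control and dialogue logic described above (ordered expectations p0 to p3, a 10-second hesitancy threshold, repair of only the most important violation, and a reply chosen from the three inputs) can be summarized in a short sketch. It is illustrative rather than the system’s actual code: `WorkspaceCheck` stands in for the vision module’s verdict, the `Instruction` fields and all response wording are hypothetical, and suppressing the hesitancy check when the trainee has just spoken is a design assumption.

```python
from dataclasses import dataclass
from typing import Optional

HESITANCY_SECONDS = 10.0  # threshold taken from the human-human study


@dataclass
class Instruction:
    text: str      # the full instruction as spoken
    piece: str     # e.g. "long pipe" (hypothetical field)
    location: str  # e.g. "on the open end of the elbow joint" (hypothetical field)
    rotation: str  # e.g. "pointing toward you" (hypothetical field)


@dataclass
class WorkspaceCheck:
    """Vision-module verdict for the current instruction (p1..p3)."""
    correct_piece: bool
    correct_placement: bool
    correct_rotation: bool


# p1..p3 in decreasing importance; p0 (timely action) is tested first.
ORDERED_CHECKS = ("correct_piece", "correct_placement", "correct_rotation")


def evaluate(seconds_since_change: float, check: Optional[WorkspaceCheck],
             trainee_spoke: bool) -> Optional[str]:
    """Control module: return the lowest-indexed violated expectation, if any."""
    if not trainee_spoke and seconds_since_change >= HESITANCY_SECONDS:
        return "timely_action"  # p0 violated: hesitancy breakdown
    if check is not None:
        for name in ORDERED_CHECKS:
            if not getattr(check, name):
                return name  # repair only the most important violation
    return None


def respond(step: Instruction, utterance: Optional[str],
            violation: Optional[str]) -> Optional[str]:
    """Dialogue module: choose a reply from the three inputs (wording is hypothetical)."""
    if utterance == "repetition":
        return step.text  # independent of the workspace
    if violation == "timely_action":
        return "Do you need help? " + step.text
    if violation == "correct_piece":
        return f"That is not the right piece. You need the {step.piece}."
    if violation == "correct_placement":
        return f"Right piece, but the {step.piece} goes {step.location}."
    if violation == "correct_rotation":
        return f"Right place, but the {step.piece} should be {step.rotation}."
    if utterance == "check_options":
        return f"You need the {step.piece}. It goes {step.location}."
    if utterance == "check_correctness":
        return "Great job! That is correct."
    return None  # nothing needs to be said


step = Instruction("Attach the long pipe to the open end of the elbow joint.",
                   "long pipe", "on the open end of the elbow joint", "pointing toward you")
# Wrong piece in the wrong place: the piece (p1) is repaired before the placement (p2).
v = evaluate(3.0, WorkspaceCheck(False, False, True), trainee_spoke=True)
print(v, "|", respond(step, "check_correctness", v))
# correct_piece | That is not the right piece. You need the long pipe.
```

Once the trainee swaps in the correct piece, the next check would report `correct_placement`, mirroring the piece-then-location order observed in human instructors.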

3 Evaluation of the task model in robot-human instruction
Methods. To evaluate the feasibility and functioning of the task model and system, we studied robot-human performance of the same pipe-assembly task used in the human-human study described earlier. The task involved fifteen instructional steps. Participants. Eight male and eight female native speakers of English between the ages of 20 and 34 (M = 23, SD = 4.5) were recruited from the University of Wisconsin–Madison campus and surrounding community. They had backgrounds in a diverse range of occupations and majors, and all participated as trainees in this study. Physical setup. Trainees were given two bins, one for pipes and one for joints, containing only the pieces necessary for completing the task. This configuration mimicked the way different types of parts might be kept in a workshop or atelier. A table located between the robot and the trainee served as the trainee’s workspace (see Figure 4.3). Procedure. At the start of a session, an experimenter guided the trainee into the experiment room, explained the task, and showed the trainee the pipes and joints to be used. After the experimenter left the room, the robot started the interaction by explaining that it would provide step-by-step instructions for assembling the pipes. It then directed the trainee through the pipe-assembly task by issuing two or more instructions in sequence. Throughout the task, the robot also provided repair when necessary.



After hearing an instruction from the robot, the trainee retrieved the pieces needed to complete the instructions and assembled them on the table. If the trainee requested repetition or clarification, the robot answered. When the trainee asked the robot to check the workspace, it confirmed correct actions or provided repair according to the task model. If no repair was needed, it congratulated the trainee on completing the instructions correctly and proceeded to the next set of instructions. After the task was completed, the robot thanked the trainee, who then completed a post-experiment questionnaire. Trainees took between 3:57 and 8:24 minutes (M = 6 minutes 59 seconds, SD = 1 minute 19 seconds) to complete the task. Analysis and findings. To determine the effectiveness of our task model, we identified all breakdowns across trainees. We noted the type of breakdown: requests (i.e., trainee-initiated repairs), hesitancy, or mistake detection. For requests and mistakes, we also recorded the nature of the problem (e.g., a request to repeat the instruction or the trainee using an incorrect pipe). To ensure the reliability and consistency of the coded data, two raters observed and coded the breakdowns. Inter-rater reliability showed a substantial level of agreement (87 % agreement, Cohen’s κ = .83). Across 16 trainees, 21 breakdowns occurred. Ten of these were requests from the trainee, and the remaining 11 were detected visually by the task model’s mistake detection: four involved a trainee using an incorrect piece, two involved a piece being added to the wrong location on the current structure, and five involved a piece being added in the correct location but rotated incorrectly. The majority (eight) of these mistakes occurred in the first step issued during the robot’s instruction.
In all cases, after the robot issued a repair, the trainee was able to complete the step successfully without further repair. All requests from trainees took the form of requests to repeat a single instructional step issued by the robot; trainees never asked the robot to repeat two or more steps (e.g., “Could you repeat all the steps?”). Repair requests were distributed evenly across the steps of the task, with no particular step eliciting frequent requests for repetition. Thirteen of the 21 breakdowns occurred in the first four of the fifteen steps, and 18 of the 21 occurred in the first half of the interaction. The remaining three breakdowns were distributed across the second half of the session and comprised one clarification and two requests to repeat an instruction.

4 Enabling human-robot dialogue

4 Discussion Our evaluation study demonstrated that the task model we developed was highly effective. For every trainee and every breakdown, the task model successfully provided the necessary repair, and trainees did not need further repair for those instructions. Although the robot was equipped to respond to requests for clarification (e.g., “Which piece do I need?” “How is the joint rotated?”), all trainees issued their requests as requests for repetition of an instruction. Trainees may have favored repetition over clarification of specific aspects of an instruction for several reasons. They may have forgotten too much of the instruction to ask about specifics. Alternatively, if trainees needed clarification on more than one aspect of an instruction, such as which piece to use and where to place it, they may have opted to ask for a single repetition rather than several clarification questions. Trainees may also have preferred to receive clarification in the context of the original instruction. Finally, even though trainees were instructed to act naturally with the robot and were told that they could ask the robot for assistance, they may not have been aware of its ability to handle clarification requests. Most breakdowns occurred toward the beginning of an interaction, with few in the second half. This suggests that trainees need more help early in a task. As the task continued, trainees likely became acclimated to key terms and concepts, such as the vocabulary used to identify objects and how to perform the actions requested by the instructor. In addition, progress in the task often reduces the number of incorrect choices available to a trainee, decreasing the likelihood of a breakdown. In summary, this section demonstrated a particular instantiation of the task model, contextualized as part of an instructional system that autonomously detected breakdowns and enacted repair.
Using visual and verbal inputs, the task model was able to recognize breakdowns and choose appropriate dialogue acts for the robot to repair them. We evaluated the feasibility and functioning of the model in a human-robot dialogue scenario in which the robot guided trainees through a pipe-assembly task, providing help as needed. The results demonstrate the feasibility of an autonomous robot system using a task model. They also provide new insights into how dialogue processes unfold between humans and robots, such as how people seek and receive repair from a robot.
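The workspace check and repair selection summarized above can be sketched in Python. This is not the authors’ implementation: it assumes that each instructional step specifies a piece, a location, and a rotation, that the vision system reports the same three fields, and that the three mistake types observed in the study (incorrect piece, wrong location, incorrect rotation) are tested in that order. The class and function names, piece IDs, and repair wordings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Placement:
    """A piece as specified by an instruction step or as observed in the workspace."""
    piece: str      # hypothetical piece ID, e.g. "elbow-90"
    location: str   # attachment point on the current structure
    rotation: int   # orientation in degrees


def detect_mistake(expected: Placement, observed: Placement) -> Optional[str]:
    """Return the breakdown type for one step, or None if the step is correct."""
    if observed.piece != expected.piece:
        return "incorrect_piece"
    if observed.location != expected.location:
        return "wrong_location"
    if observed.rotation % 360 != expected.rotation % 360:
        return "incorrect_rotation"
    return None


def choose_dialogue_act(expected: Placement, observed: Placement) -> Tuple[str, str]:
    """Map the workspace check onto a confirmation or a targeted repair."""
    mistake = detect_mistake(expected, observed)
    if mistake is None:
        return ("confirm", "You completed that step correctly.")
    repairs = {
        "incorrect_piece": f"That is not the right piece. Please use the {expected.piece}.",
        "wrong_location": f"Please attach the piece at {expected.location}.",
        "incorrect_rotation": f"Please rotate the piece to {expected.rotation} degrees.",
    }
    return ("repair", repairs[mistake])


step = Placement(piece="elbow-90", location="joint A", rotation=90)
print(choose_dialogue_act(step, Placement("elbow-90", "joint A", 180)))
# → ('repair', 'Please rotate the piece to 90 degrees.')
```

Checking the piece before its location and rotation means a wrong piece is reported once rather than as several problems; that ordering is our choice for the sketch, not something the chapter specifies.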

B A production model for expert robot speech As embodied social agents, robots have the potential to behave much like an effective human speaker who can persuade listeners to make certain choices. In order to achieve this goal, they must convince their users that they are experts in the subject matter under consideration. For example, most people are unlikely to be persuaded to visit a specific art exhibit suggested by a robot museum guide that does not express sufficient understanding of the museum. In this case, the production model for the


   Part II: Frameworks and Guidelines

informational robot should shape its utterances to convey expertise appropriately. Expertise is commonly thought to be synonymous with having a high degree of knowledge and experience in a particular area. There is another dimension of expertise, however: rhetorical ability, defined as the ability to communicate effectively and persuasively (Hartelius 2011). Someone who is knowledgeable may be considered an expert, but their expertise might not benefit others if they cannot communicate it effectively to an audience. Such a person might be more accurately described as knowledgeable rather than as an expert. Becoming an expert requires both knowledge and rhetorical effort. Expert human communicators display certain linguistic cues that convince listeners that the communicator is a fully qualified expert. Many speakers convince listeners of their expertise through expressions of goodwill, references to past instances of expertise-sharing, intuitive organization of information, and appropriate metaphors (Hartelius 2011). If robots are to be perceived as experts, they need to possess a sufficient amount of practical knowledge as well as the rhetorical abilities required to express this knowledge effectively. To enable rhetorical abilities in robotic speech, we created a speech-production model that drew upon literature in psychology and linguistics regarding speech cues that can increase a speaker’s rhetorical ability. The production model takes as input high-level dialogue acts generated by the dialogue model and transforms them into the utterances to be produced by the robot. Given a dialogue act and the goals of the human-robot interaction, the production model produces speech appropriate to those goals. For example, if a robot were expected to talk to young children, the appropriate production model would reformulate its speech to be simpler.
In the context of our study, we explored how the production model might reformulate the robot’s speech, using appropriate linguistic cues of expertise to increase the robot’s credibility and persuasiveness. In order to evaluate whether these cues help robots achieve the rhetorical ability that the model suggests, we conducted an experiment in which robots made suggestions to participants regarding which landmarks to visit during a tour of a virtual city. As with the design of the task model described earlier, our work on a production model for expert speech began with human experts. We drew from literature in psychology and linguistics on the strategies used by human experts.
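The production model’s place in the pipeline (dialogue act in, utterance out) can be sketched in Python as an interchangeable stage. The sketch assumes that a dialogue act carries a list of fact sentences plus an optional goodwill sentence; the class names, the `realize` method, and the example act are hypothetical, and only the goodwill cue is modeled.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class DialogueAct:
    """Output of the dialogue model: what to convey, not how to phrase it."""
    kind: str                # e.g. "describe_landmark" (hypothetical label)
    facts: List[str]         # content sentences to convey
    goodwill: str = ""       # optional sentence about the listener's benefit


class ProductionModel(Protocol):
    def realize(self, act: DialogueAct) -> str: ...


class PlainProduction:
    """Low rhetorical ability: states the facts and nothing else."""

    def realize(self, act: DialogueAct) -> str:
        return " ".join(act.facts)


class ExpertProduction:
    """Adds a goodwill cue when the act supplies one; other cues are omitted."""

    def realize(self, act: DialogueAct) -> str:
        sentences = list(act.facts)
        if act.goodwill:
            sentences.append(act.goodwill)
        return " ".join(sentences)


def speak(act: DialogueAct, model: ProductionModel) -> str:
    # The dialogue model's output is unchanged; only the production stage varies.
    return model.realize(act)


act = DialogueAct(
    kind="describe_landmark",
    facts=["This cafe is a great place to go for lunch."],
    goodwill="It is a good way to get out of the hot sun.",
)
print(speak(act, PlainProduction()))
print(speak(act, ExpertProduction()))
```

Keeping the production stage separate lets the same dialogue act be reformulated for different interaction goals (for example, simpler speech for children) without changing the dialogue model.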

1 Models of expertise Humans. Research in psychology suggests that in situations that require a certain degree of problem-solving, experts and novices display a number of easily identifiable differences. Novices are more dependent on surface features, whereas experts focus more on underlying principles (Chi et al. 1981). For example, when a novice looks at the map of a city they have never visited, they might only know what the map tells them: street names, landmark names, and anything else that is visible on the map itself. A resident of the city (i.e., an “expert”) is able to draw on information beyond the map, such as landmarks that are not listed (Isaacs and Clark 1987). However, there is an important distinction between having knowledge and being able to share it. Communicating expertise is a different challenge from simply drawing on prior knowledge. Research suggests that knowledge should be presented in a way that is useful to someone who may have no prior experience with the topic at hand, using linguistic cues that support the perception of expertise (Kinchin 2003). This rhetorical effort helps speakers to be accepted as experts. By using rhetoric to increase perceptions of their expertise, these speakers go beyond simply giving information: they convince the listener that the information is important and useful and that the listener should accept it as true (Hartelius 2011). Ultimately, one of the goals of conveying expertise is persuasion. By demonstrating expertise through rhetorical ability, speakers attempt to establish power in the situation, justifying why a listener should value their words. This use of rhetoric can then lead to the listener’s compliance in both thought and behavior (Marwell and Schmitt 1967).
Robots. Research in human-robot interaction has also explored how robots might communicate more effectively in informational roles. Torrey et al. (2006) demonstrated that the amount of information that a robot should present to a user depends on the level of expertise of the user and the context of the information exchange. Including too many details could be condescending to someone with past experience on a subject, while not including sufficient detail could be unhelpful for someone who is entirely unfamiliar with it. Some information, while helpful, may be restricted to certain groups, such as health information at a clinic.
Another study on linguistic cues for help-giving robots showed that direct commands should be avoided, as they generally created a poor impression of the robot, and that hedges (“I guess”, “I think”, “maybe”, etc.) softened the intensity of commands and were judged polite by many of the participants (Torrey 2009). Researchers have also explored the use of robots as persuasive entities. For instance, having an “expert” entity physically present during a long-term weight-loss program has been shown to be beneficial, as people felt more accountable for following the program (Kidd and Breazeal 2008). Roubroeks et al. (2010) explored participants’ willingness to follow advice from robots when identifying the most important setting on a washing machine. They concluded that messages should be carefully worded to avoid creating a poor impression of the robot, as such an impression could result in people no longer wishing to interact with it. Although previous work has examined dialogue strategies for informational robots and robots as persuasive entities, little work has identified specific verbal strategies for increasing the perception of expertise in a robot. In this project, we sought to address this gap by developing a model of expert speech for robots that included specific linguistic cues based on an analysis of expert rhetoric for human speakers (Hartelius 2011). The next subsection introduces this model and the cues robots can use to communicate their expertise effectively.

2 Production model for how experts talk Our production model for expert speech draws from work that defines the overall concept of expertise along two dimensions: practical knowledge, which captures prior knowledge and experience; and rhetorical ability, which refers mainly to speaking prowess (Hartelius 2011). As represented in Figure 4.4, we propose that these two dimensions be used to divide the entire space of expertise into four quadrants. A speaker with low practical knowledge and low rhetorical ability is a true novice. On the other hand, an individual with a novice-level understanding of the subject matter who uses expert rhetoric in conversation may be falsely perceived as an expert; this speaker is a perceived expert. A perceived novice is a speaker who has a high amount of practical knowledge but lacks the rhetorical skills to communicate that knowledge effectively. Finally, a true expert possesses both the knowledge and the rhetorical prowess to be persuasive when conveying information.

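[Figure 4.4 (image): a 2 × 2 grid with speaker practical knowledge (low to high) on the horizontal axis and speaker rhetorical ability (low to high) on the vertical axis. Quadrants: perceived expert (low knowledge, high rhetorical ability); true expert (high knowledge, high rhetorical ability); true novice (low knowledge, low rhetorical ability); perceived novice (high knowledge, low rhetorical ability).]

Figure 4.4: A conceptual representation of the space of expertise divided along the dimensions of rhetorical ability and practical knowledge.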
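Because the model treats each dimension as low or high, assigning a speaker to a quadrant of Figure 4.4 reduces to a two-way lookup. A minimal Python sketch follows; the function name is hypothetical.

```python
def expertise_quadrant(high_knowledge: bool, high_rhetoric: bool) -> str:
    """Return the Figure 4.4 quadrant for a speaker's two expertise levels."""
    if high_knowledge and high_rhetoric:
        return "true expert"
    if high_rhetoric:
        return "perceived expert"   # expert rhetoric, novice-level knowledge
    if high_knowledge:
        return "perceived novice"   # knowledge without the rhetoric to convey it
    return "true novice"


for knowledge in (False, True):
    for rhetoric in (False, True):
        print(knowledge, rhetoric, expertise_quadrant(knowledge, rhetoric))
```

In these terms, the production model can only change the rhetoric argument (moving a speaker, for example, from perceived novice to true expert); it cannot change the knowledge argument.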

The production model does not give a robot more practical knowledge; rather, it improves the robot’s speech along the dimension of rhetorical ability. Research literature in psychology and linguistics points to a number of linguistic cues that contribute to a speaker’s rhetorical ability, increasing his or her effectiveness in persuading listeners (Carlson, Gustafson and Strangert 2006, Glass et al. 1999, Hartelius 2011, Hyde 2004, Noel 1999, and Sniezek and Van Swol 2001). We use five of these linguistic cues in our production model. A speaker with high rhetorical ability, either a perceived expert or a true expert, will use many of these cues throughout his or her speech. A speaker with low rhetorical ability, either a perceived novice or a true novice, will use few or none of them. Informational robots, as social entities, need to use these cues effectively in order to be perceived as experts. These cues are summarized with examples in Table 4.1 and are discussed below.

Table 4.1: A summary of the linguistic cues used for the production model of expert speech. Each cue is listed with a brief description, followed by examples of how it was used to reformulate the robot’s speech.

Goodwill: wanting the best for the listener.
  Expert: “This cafe is a great place to go for lunch to get out of the hot sun.”
  Novice: “This cafe is a great place to go for lunch.”

Prior expertise: references to past helping experience.
  Expert: “I send a lot of visitors to this museum each year.”
  Novice: “A lot of visitors go to this museum each year.”

Organization: more natural organization of information.
  Expert: “At 1000 years old, the castle is the oldest landmark in the city. It has Gothic architecture.”
  Novice: “The castle is 1000 years old. It has Gothic architecture. It’s the oldest landmark in the city.”

Metaphors: making descriptions more accessible.
  Expert: “Stepping onto the sunny beach is like wrapping yourself in a towel from the dryer.”
  Novice: “The sunny beach is quite hot.”

Fluency: reduced pauses and confidence in speech.
  Expert: “The statue is 200 years old. [A 300 ms. pause] It was built to honor the King.”
  Novice: “The statue is 200 years old. [A 1200 ms. pause] It was built to honor the King.”
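The Fluency row of Table 4.1 differs only in the pause inserted between sentences. If the robot’s speech synthesizer accepts SSML (an assumption; the chapter does not name the synthesizer), the W3C SSML `<break time="..."/>` element is one way to realize that difference. The function name `to_ssml` and the style labels are hypothetical.

```python
from xml.sax.saxutils import escape

# Between-sentence pauses taken from the Fluency row of Table 4.1.
PAUSE_MS = {"expert": 300, "novice": 1200}


def to_ssml(sentences, style):
    """Render sentences as SSML, separated by the pause length for the style."""
    pause = f' <break time="{PAUSE_MS[style]}ms"/> '
    body = pause.join(escape(s) for s in sentences)  # escape &, <, > in the text
    return f"<speak>{body}</speak>"


print(to_ssml(["The statue is 200 years old.", "It was built to honor the King."], "expert"))
# → <speak>The statue is 200 years old. <break time="300ms"/> It was built to honor the King.</speak>
```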

Goodwill. Effective and persuasive speech involves more than providing information to listeners. Research suggests that an effective speaker also conveys to the listener that considering this information is in his or her best interest (Hyde 2004). To achieve this effect, the speaker displays expressions of goodwill, which indicate that the speaker wants what is best for the listener (Hartelius 2011). For example, when describing a museum of history, the speaker might not only point out that the museum is a popular landmark but also suggest that listeners would enjoy themselves and learn about history if they visited it.

Prior Experience. Another linguistic cue that research on expertise has identified as key to expert speech is the expression of prior experience (Sniezek and Van Swol 2001). This research suggests that effective speakers also seek to gain their listeners’ trust by building multiple favorable experiences with the listener or by conveying that they have had prior experience giving good information to previous listeners. When listeners trust the speaker, they consider the information to be more valuable and rely on it more in making decisions. For example, the speaker might mention a previous listener who visited the history museum and his or her favorable experience there. Metaphors. Metaphors help establish common ground between the speaker and the listener and indicate that the speaker is making an effort to connect with the listener and to share his or her experience and expertise (Hartelius 2011). For example, the speaker might depict visiting a particular collection in the museum of history as stepping into the lives of people who lived in the past century. Metaphors are particularly effective for making concepts understandable to listeners with little experience with the subject. Organization. Rhetorical ability is also shaped by the organization of utterances (Glass et al. 1999 and Noel 1999). Research on the teaching strategies of individuals who were topic experts but novice tutors found that they often took tangents or looped back to an earlier point in the conversation to bring up a detail they had missed (Glass et al. 1999). These tangents and disruptions are examples of poor organization. They can damage the speaker’s credibility by creating the impression that he or she is not well versed in the subject (Noel 1999). For example, when describing the museum, the speaker should present a logical progression of information with sufficient detail and good flow. Fluency.
Finally, the timing, or fluency, of speech is a key para-verbal cue for rhetorical ability. Research suggests that speakers with both practical knowledge and rhetorical ability can draw upon their knowledge and speak more quickly than speakers who are not comfortable speaking or are not experts on the topic of the conversation (Chi, Glaser and Rees 1981). While a certain amount of pausing is expected in spontaneous speech at all levels of speaker expertise, pauses beyond a “normal” length are perceived as cues of an inarticulate speaker (Carlson, Gustafson and Strangert 2006) and as disruptive to conversational flow (Campione and Véronis 2002). Brief (less than 200 ms.) and medium (200–1000 ms.) pauses are frequent and expected in spontaneous speech, while frequent long pauses (more than 1000 ms.) can become disruptive.
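The pause bands cited above can be expressed as a small classifier. The Python sketch below uses the stated thresholds (brief below 200 ms, medium from 200 to 1000 ms, long above 1000 ms) and reports the share of long pauses in an utterance. The chapter gives no numeric cutoff for when long pauses become “frequent”, so the sketch leaves that judgment to the caller; the function names are hypothetical.

```python
def classify_pause(ms: float) -> str:
    """Bin a pause into the brief / medium / long bands cited in the text."""
    if ms < 200:
        return "brief"
    if ms <= 1000:
        return "medium"
    return "long"


def long_pause_rate(pauses_ms) -> float:
    """Share of pauses in an utterance that exceed 1000 ms (0.0 if none)."""
    if not pauses_ms:
        return 0.0
    return sum(classify_pause(p) == "long" for p in pauses_ms) / len(pauses_ms)


print(classify_pause(1500), long_pause_rate([300, 1500]))  # → long 0.5
```

By these bands, the 300 ms pause in the expert example of Table 4.1 is medium, while the 1200 ms pause in the novice example is long.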

Figure 4.5: An experimenter demonstrating the setup of the task used for the evaluation of the production model.

The production model is built from these linguistic cues. It generates expert speech for robots in roles where they are expected to provide suggestions and information and to persuade people to take a particular course of action. We developed the following hypothesis regarding the impact that a robot using the production model might have in a human-robot interaction. Hypothesis. Participants will follow suggestions made by a robot with high rhetorical ability (using more linguistic cues of expertise) more frequently than suggestions made by a robot with low rhetorical ability. This prediction is based on studies showing that the use of rhetoric to convey expertise elicits listener compliance (Marwell and Schmitt 1967). Previous work has also shown that verbal cues of expertise help build trust and that advice-takers tend to follow the suggestions of those they trust (Sniezek and Van Swol 2001). Research in human-robot interaction also provides a basis for this hypothesis. For instance, the use of expert language by a robot that offers advice can shape how controlling users perceive the robot to be (Torrey 2009).

3 Evaluation of production model in human-robot dialogue Methods. To test our production model for expert speech and our prediction, we designed an experiment in which two robots identical in appearance made suggestions to participants about which landmarks to visit on a tour of a virtual city. The robots differed in terms of their amount of practical knowledge, their rhetorical ability, or both.


Participants. Forty-eight participants (24 females and 24 males) were recruited from the University of Wisconsin–Madison campus. All participants were native English speakers, and their ages ranged from 18 to 53 (M = 23.69, SD = 7.83). On average, participants rated their familiarity with robots as 2.77 (SD = 1.55) on a seven-point rating scale. Physical setup. As illustrated in Figure 4.5, the setup consisted of two Lego Mindstorms robots approximately 18 inches tall, a large screen display, and a keyboard, all positioned on a table. The participant sat in front of the keyboard and screen. The robots stood on either side of the screen, facing the participant. The robots’ voices were distinct, but both were modulated to be gender-neutral and were assigned randomly to the robots at the beginning of the tour. Procedure. We manipulated the level of practical knowledge (low vs. high) and the level of rhetorical ability (low vs. high) of each robot to create four speech strategies that correspond to the quadrants of the model described earlier and illustrated in Figure 4.4: perceived expert, true expert, perceived novice, and true novice. To compare these strategies, each trial involved a pair of robots that used different speech strategies. We included every possible pairing of quadrants. We then randomly assigned eight participants to each of the following robot pairs: (1) perceived expert vs. true expert; (2) true novice vs. perceived novice; (3) perceived expert vs. true novice; (4) true expert vs. perceived novice; (5) perceived expert vs. perceived novice; and (6) true expert vs. true novice. For each stop on the virtual tour, we developed four scripts representing the four quadrants of the model.
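Because the design included every pairing of the four quadrants with eight participants per pairing, the 48 participants correspond to C(4, 2) = 6 pairings × 8. The Python sketch below shows one way to produce such a balanced random assignment; it is our illustration rather than the procedure the authors report, and the function name and `seed` parameter are hypothetical.

```python
import itertools
import random

QUADRANTS = ["perceived expert", "true expert", "perceived novice", "true novice"]


def assign_pairings(n_participants=48, seed=None):
    """Randomly assign participants so that every quadrant pairing gets an equal share."""
    pairings = list(itertools.combinations(QUADRANTS, 2))  # C(4, 2) = 6 pairings
    if n_participants % len(pairings):
        raise ValueError("participant count must be a multiple of the number of pairings")
    per_pairing = n_participants // len(pairings)
    # One slot per participant, each pairing repeated per_pairing times, then shuffled.
    slots = [pair for pair in pairings for _ in range(per_pairing)]
    random.Random(seed).shuffle(slots)
    return dict(enumerate(slots))


assignment = assign_pairings(48, seed=7)
print(len(set(assignment.values())))  # → 6
```

Shuffling a list of pre-filled slots guarantees exactly eight participants per pairing, whereas drawing a pairing independently for each participant would not.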
Practical knowledge was manipulated by varying the number of discrete facts presented in the landmark descriptions: high-knowledge scripts contained four discrete pieces of information, and low-knowledge scripts contained two. Rhetorical ability was manipulated by varying the number of linguistic cues of expertise present in the robot’s speech. High-rhetorical-ability scripts contained three of the expert speech strategies from Table 4.1, while low-rhetorical-ability scripts contained none. Because human speakers employ only a limited number of cues in a given message, we used only three cues in each script but used all five cues in approximately equal proportions across all scripts in the study. The examples below illustrate the description of the landmark Elberam Cathedral in all four linguistic strategies. Four facts are associated with this landmark: (1) it is visible from anywhere in the city; (2) it has high towers; (3) its towers have spiral stairways; and (4) the stairways have more than 350 steps. True Expert: “Elberam Cathedral is visible from anywhere in the city. Its towers are as tall as the clouds on some days, and include spiral stairways, which have more than 350 steps.”

The script presents all four facts, uses good fluency and organization, and includes a metaphor to describe the height of the towers. Perceived Novice: “Elberam Cathedral is visible from anywhere in the city. [1500 ms. pause] The cathedral has high towers. [1500 ms. pause] The towers include spiral stairways, which have more than 350 steps.”

The script presents all four facts but displays low rhetorical ability with no linguistic cues of expertise. Perceived Expert: “Visible from anywhere in the city, Elberam Cathedral has towers that are as tall as the clouds on some days.”

The script presents only two of the facts, but uses good fluency and organization, and includes a metaphor to describe the height of the towers. True Novice: “Elberam Cathedral is visible from anywhere in the city. [1500 ms. pause] The cathedral has high towers.”

The script presents only two of the facts and displays low rhetorical ability with no linguistic cues of expertise. To assess whether these manipulations achieved different levels of practical knowledge and rhetorical ability, we asked two independent coders, blind to the manipulations, to rate each script for information content and rhetorical expertise. We then analyzed their ratings using a repeated-measures analysis of variance (ANOVA), using coder and script IDs as covariates. Practical knowledge significantly affected the ratings of information content, F(1,4) = 12.50, p = .024, but only marginally affected ratings of rhetorical expertise, F(1,4) = 7.21, p = .055. Rhetorical ability significantly affected ratings of rhetorical expertise, F(1,4) = 23.34, p = .009, but did not affect ratings of information content, F(1,4) = 0.03, p = .87. These results suggest that the scripts matched our target levels of practical knowledge and rhetorical ability. The experimenter introduced the participants to the two Lego Mindstorms robots, Sam and Ky (chosen as gender-neutral names), and told them that the robots were being trained as tour guides. The robots were placed on either side of a computer monitor, as shown in Figure 4.5. In the experimental task, participants planned a virtual tour of a fictional city using options presented by the two robots. On the display screen shown in Figure 4.5, they were shown ten pairs of similar locations, such as two art museums or two amusement parks. Depending on the pairing condition, one robot described locations using the script from one quadrant of the model, while the other robot used a script from a different quadrant. Each robot, in random order, turned its head toward the participant, provided the information, and then turned away. After both robots had given their descriptions, the participant chose a location to visit. The order of the landmark pairs and which landmark appeared on the left or right of the screen were also randomized. After finishing the tour of ten locations, participants filled out a questionnaire that measured their perceptions of the two robots. The study took approximately thirty minutes.
Analysis and findings. The study involved two manipulated independent variables: practical knowledge and rhetorical ability. The primary dependent variable was participant compliance, which was measured by labeling each choice of landmark with the speech strategy employed by the robot that described that landmark. As secondary dependent variables, we collected a number of subjective measures of participants’ impressions of the robots. Seven-point rating scales were used for all items in these measures. Item reliabilities, measured by Cronbach’s α, were sufficiently high for all measures: trustworthiness (3 items; Cronbach’s α = 0.74), sociability (3 items; Cronbach’s α = 0.86), persuasiveness (3 items; Cronbach’s α = 0.88), and competency (5 items; Cronbach’s α = 0.91). Our data analysis included mixed-model analysis of variance (ANOVA) tests to assess how the robots’ level of practical knowledge and rhetorical ability affected participants’ decisions about which landmarks to visit and their subjective evaluations of the robots. Practical knowledge, rhetorical ability, and the interaction between the two variables were modeled as fixed effects. The analysis of data from objective measures also included the location number in the virtual tour as a random effect. Our hypothesis predicted that participants would choose more landmarks described by the robot with higher rhetorical ability. Our analysis provides support for this hypothesis: the robot’s level of rhetorical ability had a significant effect on which locations participants chose to visit, F(1,956) = 21.83, p