THE EVOLUTION OF LANGUAGE
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
Photo sources (wikimedia.org): CC BY-SA 3.0: Baobab_06.jpg by Cilibul; Grey_Parrot.jpg by Maurice van Bruggen; Baby_teeth_in_human_infant.jpg by Chrisbwah; Neuron_in_tissue_culture.jpg by GerryShaw. CC BY 2.0: Rock_crusher_gears.jpg by Jeekc; Oxfam_East_Africa_-_Alice’s_Shop.jpg by Oxfam East Africa. Public domain: Spectrogram-buy.png by COMDJ. Bonobo photo used by permission of Emilie Genty. Robot photo reprinted with permission of Luc Steels.
THE EVOLUTION OF LANGUAGE
Proceedings of the 10th International Conference (EVOLANG10)
Copyright © 2014 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 978-981-4603-62-1
Printed in Singapore
PREFACE

This volume collects the refereed papers and abstracts of the 10th International Conference on the Evolution of Language (EVOLANG X), held in Vienna on 14th-17th April 2014. Submissions to the conference were solicited in two forms, papers and abstracts, and this is reflected in the structure of this volume.

The biennial EVOLANG conference is characterised by an invigorating, multi-disciplinary approach to the origins and evolution of human language, and brings together researchers from many fields including anthropology, archaeology, artificial life, biology, cognitive science, computer science, ethology, genetics, linguistics, neuroscience, palaeontology, primatology, psychology and statistical physics. The multi-disciplinary nature of the field makes the refereeing process for EVOLANG very challenging, and we are indebted to our panel of reviewers for their very conscientious and valuable efforts. A full list of the reviewers in the panel can be found on the following page.

Further thanks are due to:
• The EVOLANG committee: Rudolf Botha, Erica Cartmill, Jean-Louis Dessalles, Ramon Ferrer-i-Cancho, W. Tecumseh Fitch, James R Hurford, Chris Knight, Heidi Lyn, Luke McCrohon, Kazuo Okanoya, Nikolaus Ritt, Kenny Smith, Maggie Tallerman.
• The local organising committee: Nikolaus Ritt, Andreas Baumann, Sarolta Viola, Iris Vukovics.
• The workshop convenors: Bart de Boer, Carel Ten Cate, Bruno Gingras, Melanie Malzahn, Andrea Ravignani, Nikolaus Ritt, Luc Steels, Remi van Trijp, Freek van de Velde, Tessa Verhoef, Willem Zuidema.
• Our sponsors: John Benjamins Publishing Company.
• The plenary speakers: Michael Arbib, Rob Boyd, Bill Croft, Chris Knight, Jim Hurford, Ann Senghas, Joan Silk, Kenny Smith.
• Finally, and most importantly, the authors of all the contributions collected here.
To commemorate the 10th EVOLANG, this volume contains a special section on the evolution of the conference itself, including perspectives on its history and future. We are excited to mark this milestone and look forward to the next ten conferences.

Erica Cartmill, Seán Roberts, Heidi Lyn, and Hannah Cornish.
December 2013
PROGRAM COMMITTEE

Natasha Abner, Chris Abry, Michael Arbib, Kate Arnold, Jordi Arranz, Rie Asano, Mark Atkinson, Quentin Atkinson, Lluis Barceló i Coblijn, Andrea Baronchelli, Mark Bartlett, Aleksandrs Berdicevskis, Till Bergmann, Teresa Berjarano, Robert Berwick, Richard Blythe, Bart de Boer, Robert Boyd, Greg Bryant, Christine Caldwell, Angelo Cangelosi, Andrew Carstairs-McCarthy, Erica Cartmill, Matt Cartmill, Morten Christiansen, Brady Clark, Fred Coolidge, Kensy Cooperrider, Marie Coppola, Michael Corballis, Hannah Cornish, Chrissy Cuskley, Dan Dediu, Jean-Louis Dessalles, Mark Dingemanse, Daniel Dor, Michael Dunn, Mark Ellison, Nicholas Fay, Olga Feher, Vanessa Ferdinand, Ramon Ferrer i Cancho, Tecumseh Fitch, Molly Flaherty, Jacob Foster, Michael Franke, Roland Frey, Bruno Galantucci, Simon Garrod, Emilie Genty, David Gil, Bruno Gingras, Tao Gong, Chloe Gonseth, Tom Griffiths, Sabine van der Ham, Harald Hammarström, Steven Harnad, Stefan Hartmann, John Hawks, Catherine Hobaiter, Jean-Marie Hombert, Jim Hurford, Stephanie Jett, Sverker Johansson, Junko Kanero, Anne van der Kant, James Kirby, Simon Kirby, Chris Knight, Bob Ladd, Marion Laporte, David Leavens, Simon Levy, Katja Liebal, Philip Lieberman, Elena Lieven, David Lightfoot, Richard Littauer, Hannah Little, Gary Lupyan, Erkki Luuk, Heidi Lyn, Elaine Madsen, Michiru Makuuchi, Mauricio Martins, Doug Mastin, Luke McCrohon, Adrien Meguerditchian, Dominic Mitchell, Richard Moore, Juan Carlos Moreno Cabrera, Keelin Murray, Dillon Niederhut, Alan Nielsen, Mieko Ogura, Kazuo Okanoya, Irene Pepperberg, Amy Perfors, Simone Pika, Michael Pleyer, Camilla Power, Ljiljana Progovac, Sonia Ragir, Anne Reboul, Gareth Roberts, Seán Roberts, Hannah Rohde, Carmen Saldaña, Marieke Schouwstra, John Schumann, Thomas Scott-Phillips, Robert Seyfarth, Catriona Silvey, Barbora Skarabela, Katie Slocombe, Andrew Smith, Kenny Smith, Matt Spike, Michael Spranger, Kevin Stadler, James Steele, Nina Stobbe, Justin Sulik, Kenta Suzuki, Samarth Swarup, Whitney Tabor, Masanori Takezawa, Maggie Tallerman, Monica Tamariz, Bill Thompson, Remi van Trijp, Rory Turnbull, Natalie Uomini, Jaques Vauclair, Tessa Verhoef, Marilyn Vihman, Paul Vogt, Connie de Vos, Slawomir Wacewicz, Jeffrey Watumull, Stephanie White, Bodo Winter, James Winters, Hajime Yamauchi, Jordan Zlatev, Jelle Zuidema
CONTENTS

Preface
Program Committee

Perspectives on EVOLANG
The Evolution of EVOLANG
  Michael Arbib
  Cedric Boeckx
  Michael Corballis
  Bart de Boer
  Jean-Louis Dessalles
  Ramon Ferrer-i-Cancho
  W. Tecumseh Fitch
  James R. Hurford
  Sverker Johansson
  Simon Kirby
  Chris Knight
  Philip Lieberman
  Heidi Lyn
  Kazuo Okanoya
  Thom Scott-Phillips
  Luc Steels
  Maggie Tallerman

Papers
Diachronic processes in language as signaling under conflicting interests
  Christopher Ahern, Robin Clark
Syntactic development in phenotypic space
  Lluís Barceló-Coblijn, Antoni Gomila Benejam
Finding the underpinnings: The last quarter century
  Ted Bayne
Strategies for the emergence of first-order phrase structure
  Emilia Garcia Casademont, Luc Steels
What were we talking about? Exchanging social models as a route to language
  Martin Edwardes
Why might SOV be initially preferred and then lost or recovered? A theoretical framework
  Ramon Ferrer-i-Cancho
Linguistic animals: Understanding language through a comparative approach
  Piera Filippi
Creative compositionality from reinforcement learning in signaling games
  Michael Franke
Overlapping and synchronization in the song of the Indris (Indri indri)
  Marco Gamba, Valeria Torti, Giovanna Bonadonna, Gregorio Guzzo, Cristina Giacoma
A matter of perspective: Viewpoint phenomena in the evolution of grammar
  Michael Pleyer, Stefan Hartmann
A constructionist approach to the evolution of morphological complexity
  Stefan Hartmann
Language evolved for storytelling in a super-fast evolution
  Till Nikolaus von Heiseler
What iconicity can and cannot do for proto-Language
  Elizabeth Irvine
Did language evolve incommunicado?
  Sverker Johansson
Hunter-gatherer egalitarianism enabled grammar to evolve
  Chris Knight, Jerome Lewis
Grasping compositional patterns in an artificial language by Chinese participants
  Yau Wai Lam, Tao Gong
The emergence of compound signals
  Erkki Luuk, Hendrik Luuk
Modality switch in human language evolution
  Roland Mühlenbernd, Dankmar Enke, Matthias Villing, Natalja Gavrilov, Jonas David Nick
Broadcasting to the enemy: Deception as a solution in evolution of language
  L’udovít Malinovský
Recursion is not language domain-specific: Interim results of a research program
  Mauricio Martins
On the emergence of bilingualism in a communication “ALL” task as a result of competition between social conformism and language simplification
  Jerome Michaud
Establishing a communication system: Miscommunication drives abstraction
  Gregory J. Mills
On the reliability of unreliable information: Gossip as cultural memory
  Dominic Mitchell, Joanna J. Bryson, Gordon P. D. Ingram
Homo praedicans
  Albert F. H. Naccache
The phonatory culture hypothesis
  Dillon Niederhut
Evolution of tense and aspect
  Mieko Ogura, Takumi Inakazu, William S.-Y. Wang
Orofacial gestures in language evolution: The auditory feedback hypothesis
  Sylwester Orzechowski, Sławomir Wacewicz, Przemysław Żywiczyński
Iconicity and ape gesture
  Marcus Perlman, Nathaniel Clark, Joanne E. Tanner
Iterative vocal charades: The emergence of conventions in vocal communication
  Marcus Perlman, Rick Dale, Gary Lupyan
Constructions, construal and cooperation in the evolution of language
  Michael Pleyer, Nicolas Lindner
Female philopatry and egalitarianism as conditions for the emergence of intersubjectivity
  Camilla Power
The role of coordination in regularization
  Martina Pugliese, Vittorio Loreto, Christine Cuskley, Claudio Castellano, Francesca Colaiori, Francesca Tria
The psychology of biological clocks: A new framework for the evolution of rhythm
  Andrea Ravignani, Dan Bowling, Simon Kirby
The paradox of linguistic complexity and community size
  Florencia Reali, Nick Chater, Morten H. Christiansen
Social interaction influences the evolution of cognitive biases for language
  Seán G. Roberts, Bill Thompson, Kenny Smith
Understanding the linguistic structure and evolution of web search queries
  Rishiraj Saha Roy, M. Dastagiri Reddy, Niloy Ganguly, Monojit Choudhury
The role of iconicity in the evolution of linguistic structure
  Julio Santiago, Mónica Tamariz, Gabriella Vigliocco, David Vinson
Linearisation of adjectives: The grammatical face of perceptual/conceptual biases?
  Jakob M. Steixner
Supporting evidence for language polygenesis from Neanderthal-Human interbreeding
  Szeto Pui Yiu
Language emergence in the laboratory: A method suitable to dynamical systems analysis
  Whitney Tabor, Russell Richie, Harry Dankowicz
Is the syntax rubicon more of a mirage?
  Maggie Tallerman
Symbol extension and meaning generation in cultural evolution for displaced communication
  Kaori Tamura, Takashi Hashimoto
Fitness landscapes in cultural language evolution: A case study on German definite articles
  Remi van Trijp
Social word learning strategies in different cultures
  Paul Vogt, J. Douglas Mastin
The mental synthesis theory: The dual origin of human language
  Andrey Vyshedskiy
Cognitive factors motivating the evolution of word meanings: Evidence from corpora, behavioral data and encyclopedic network structure
  Bodo Winter, Graham Thompson, Matthias Urban
The magic number 4: Evolutionary pressures on semantic frame structure
  Dekai Wu

Abstracts

Rule learning in humans and animals
  Raquel G. Alhama, Remko Scha, Willem Zuidema
The Putty-nosed monkey ‘Pyow-Hack’ sequence: Compositional or an idiomatic expression?
  Kate Arnold, Klaus Zuberbühler
Primate pragmatics: Putty-nosed monkeys use contextual information to disambiguate the cause of alarm calls
  Kate Arnold, Klaus Zuberbühler
The evolution of human cognitive systems: Comparative approaches to language and music
  Rie Asano, Uwe Seifert
Sociocultural determiners of linguistic complexity
  Mark Atkinson, Kenny Smith, Simon Kirby
Speaking of language and evolution
  Christina Behme
Language disorders as windows on language evolution
  Antonio Benítez-Burraco, Cedric Boeckx
Zipf’s law across languages of the world: Towards a quantitative measure of lexical diversity
  Christian Bentz, Douwe Kiela
Informational structure of an emerging communication system is shaped by its environment
  Till Bergmann, Rick Dale, Gary Lupyan
Spirals in language evolution
  Katrien Beuls
Sound symbolism and the origins of language
  Damián E. Blasi, Morten H. Christiansen, Søren Wichmann, Harald Hammarström, Peter F. Stadler
The origins of combinatorial communication
  Richard A. Blythe, Thomas C. Scott-Phillips
A proposal concerning the gene network that regulates the shape of the language-ready brain
  Cedric Boeckx, Antonio Benítez-Burraco
Sign-theory and the origin of language
  Denis Bouchard
Social origins of rhythm? Synchrony and temporal regularity in human vocalization
  Daniel L. Bowling, Christian T. Herbst, W. Tecumseh Fitch
Bridging the gap: From bodily mimesis to speech
  Erin Brown, Jordan Zlatev
The emergence of combinatoriality in the cultural transmission of pop songs in a children’s gameshow
  Jon W. Carr
The cumulative cultural evolution of category structure in an open-ended meaning space
  Jon W. Carr, Hannah Cornish, Simon Kirby
Do talk to strangers: Maternal and non-maternal interaction in the transmission of primate gesture
  Erica A. Cartmill, Richard W. Byrne
The evolution of polysemy in child language
  Bernardino Casas, Neus Català, Ramon Ferrer-i-Cancho, Jaume Baixeries
Zebra finches can learn to recognize affixations
  Jiani Chen, Naomi Jansen, Carel ten Cate
Vocal communication in Gibbons
  Esther Clarke, Klaus Zuberbühler, Ulrich H. Reichard
The dissolution of language & speech following brain damage
  Chris Code
Frequency and stability of linguistic variants
  Christine Cuskley, Claudio Castellano, Francesca Colaiori, Vittorio Loreto, Martina Pugliese, Francesca Tria
Biological adaptation to cultural traits
  Bart de Boer
Language and speech are old: A review of the evidence and consequences for modern linguistic diversity
  Dan Dediu, Stephen C. Levinson
The role of the human political singularity in the emergence of language
  Jean-Louis Dessalles
Conversational infrastructure and the convergent evolution of linguistic items
  Mark Dingemanse, Francisco Torreira, N. J. Enfield
Words as unmotivated cues
  Pierce Edmiston, Gary Lupyan
Representations are selected: They don’t just drift
  T. Mark Ellison, Nicolas Fay, Monica Tamariz, Dale Barr
The cumulative cultural evolution of an instruction language
  Nicolas Fay, Mark Ellison, Riccardo Fusaroli, Kristian Tylen
Birds tutored with their own developing song produce wildtype-like song as adults
  Olga Feher, Kenta Suzuki, Kazuo Okanoya, Iva Ljubicic, Ofer Tchernichovski
Regularization in language evolution: On the joint contribution of domain-specific biases and domain-general frequency learning
  Vanessa Ferdinand, Simon Kirby, Kenny Smith
The effect of pitch enhancement on spoken language acquisition
  Piera Filippi, Bruno Gingras, W. Tecumseh Fitch
Language from gesture? Emergent transitivity marking in Nicaraguan Sign Language
  Molly Flaherty, Susan Goldin-Meadow, Ann Senghas, Marie Coppola, Lila Gleitman
Four wrong ideas in evolutionary linguistics
  Koji Fujita
Artificial grammar learning in infants, adults, and songbirds: What is shared, what is learned?
  Andreea Geambaşu, Clara C. Levelt, Michelle J. Spierings, Carel ten Cate
A revival of the homo loquens as a builder of labeled structures
  Tomás Goucha, Emiliano Zaccarella, Angela D. Friederici
Language development in children with laryngeal abnormalities identifies prerequisites for verbal protolanguage
  Caroline N. Green, Glenn E. Green
Multimodal communication in wild chimpanzees
  Catherine Hobaiter, Richard W. Byrne, Klaus Zuberbühler
Comparative method for determining lexical stress in nonsense words
  Marisa Hoeschele, W. Tecumseh Fitch
Sound symbolism and arbitrary sound-meaning relationships in language
  Mutsumi Imai, Michiko Asano, Guillaume Thierry, Keiichi Kitajo, Hiroyuki Okada, Sotaro Kita
Tracing language primitives: Phonosemantic realization of fundamental oppositional pairs
  Gerd Carling, Arthur Holmer, Niklas Johansson, Joost Van de Weijer, Jordan Zlatev
The origins of regularity in language: Why coordination matters
  Caroline Kamps, Vanessa Ferdinand, Simon Kirby
Efficient communication and language evolution
  Paul Kay
Evolutionary paths to compositional language
  Dimitar Kazakov, Mark Bartlett
Systems emerge: The cultural evolution of interdependent sequential behaviours in the lab
  Simon Kirby, Hannah Cornish, Kenny Smith
Formant tuning technique in vocalizations of non-human primates
  Hiroki Koda, Masumi Wakita, Nobuo Masataka, Takeshi Nishimura, Isao T. Tokuda, Chisako Oyakawa, Toshikuni Nihonmatus
Bow-and-arrow technology: Mapping human cognition and perhaps language evolution
  Alexandra Regina Kratschmer, Miriam Noël Haidle, Marlize Lombard
Patterns of variation in language and tool use: An ethnographic and comparative approach
  Anneliese Kuhle
From grasping to grooming to gossip
  David A. Leavens, Jared P. Taglialatela, William D. Hopkins
A multimodal perspective on ape communication
  Katja Liebal
Social structure from language games
  Dorota Lipowska, Adam Lipowski
Getting communication started: The superiority of gesture over non-linguistic vocalization
  Casey Lister, Nicolas Fay, T. Mark Ellison, Susan Goldin-Meadow
The effect of size of articulation space on the emergence of combinatorial structure
  Hannah Little, Bart de Boer
Comparative psychology and the evolution of language: Methodology matters
  Heidi Lyn
Pronomial characteristics of an evolved language: Is brevity an evolutionary advantage?
  Caroline Lyon, Joe Saunders, Chrystopher Nehaniv
Culture vs. biology: Adversarial coevolution during the evolution of the lexicon
  Luke McCrohon
From hand to mouth: Fine precision grip during mutual grooming elicited wide lip movements in wild Fongoli chimpanzees
  Adrien Meguerditchian, Marie Plouvier, Jill D. Pruetz, William D. Hopkins
The nature of language in interaction
  Ashley Micklos
Dogs need embodied directions: Children but not dogs possess skills needed for communicating with absent interlocutors
  Richard Moore, Bettina Mueller, Juliane Kaminski, Michael Tomasello
Is Gricean communication necessarily cooperative?
  Richard Moore
What Dwight L. Bolinger probably would have contributed to evolutionary linguistics
  Salikoko S. Mufwene
Motivated vs. conventional systematicity: Implications for language learning and the structure of the lexicon
  Alan Nielsen, Simon Kirby, Kenny Smith
The role of vocal learning in the acoustic communication of the Egyptian fruit bat
  Yosef Prat, Mor Taub, Yossi Yovel
Detecting differences between the languages of Neandertals and modern humans
  Seán G. Roberts, Dan Dediu, Stephen C. Levinson
The effect of iconicity on the emergence of combinatorial structure: An experimental study
  Gareth Roberts, Bruno Galantucci
Accelerated regions and the language faculty
  Carmen Saldaña
Chimpanzee food grunts are directed at specific individuals: Precursors for triadic communication?
  Anne Marijke Schel, Simon W. Townsend, Zarin Machanda, Klaus Zuberbühler, Katie E. Slocombe
About time: Semantic structure in emerging language
  Marieke Schouwstra
Handicaps are unnecessary for human communication
  Thomas C. Scott-Phillips, Maxwell N. Burton-Chellew, Stuart A. West
The origins of word meaning
  Catriona Silvey
Intentionality in the production of chimpanzee alarm calls
  Katie E. Slocombe, Simon W. Townsend, Zarin Machanda, Klaus Zuberbühler, Anne M. Schel
The cognitive underpinnings of metaphor as the driving force of language evolution
  Andrew D. M. Smith, Stefan H. Höfler
Prosodic cue weighting by zebra finches
  Michelle J. Spierings, Carel ten Cate
Minimal requirements for the emergence of learned signalling
  Matthew Spike, Kevin Stadler, Simon Kirby, Kenny Smith
Incremental recruitment language: A formalism for evolutionary semantics
  Michael Spranger
Momentum-based language change: A non-adaptive model of directional selection
  Kevin Stadler, Richard A. Blythe, Kenny Smith, Simon Kirby
Symbolisation and cognition
  Justin Sulik
The evolutionary relations between music and language: A cross-musical idiom approach from the comparative perspective of language and music
  Xiaoxia Sun, Uwe Seifert
Organization of language: Evaluation of modularity theories
  Adam Szalontai, Katalin Csiszar
Culture: Copying, compression and conventionality
  Mónica Tamariz, Simon Kirby
Model fitting and prediction for language evolution
  Bill Thompson, Vanessa Ferdinand
The effect of communication on category structure
  Bill Thompson, Catriona Silvey, Simon Kirby, Kenny Smith
On the relations between articulatory gestures and manual grasping
  Lari Vainio, Mikko Tiainen, Martti Vainio, Kaisa Tiippana
Learning speech-like signals from a skewed continuous distribution
  Sabine van der Ham, Bart de Boer
Development of language through shared intentionality and categorization
  Olga Vasileva
Iterated learning of sound systems and the emergence of tone categories
  Tessa Verhoef, Bart de Boer
Selection in the lexicon
  Annemarie Verkerk, Andreea S. Calude, Mark Pagel
Frequency-dependent bias affects the spread of human communication systems
  Bradley Walker, Nicolas Fay, T. Mark Ellison
Adaptive strategies in the origins of semantic categories
  Pieter Wellens
The influence of music on the perception of emotions in voice samples: Evolutionary implications
  Jacek Wilczyński, Sławomir Wacewicz, Przemysław Żywiczyński
Semantic crowding triggers systematically structured sign systems
  Michael Wilson, T. Mark Ellison, Nicolas Fay
Speech is characterized by robustness, neutrality and evolvability
  Bodo Winter
Experimentally investigating the role of context in the structuring of the linguistic system over cultural evolution
  James Winters, Simon Kirby, Kenny Smith
Neural networks, algebraic rules & human uniqueness
  Marieke S. Woensdregt, Willem Zuidema
Modelling language competition without prestige
  Meng Han Zhang, Tao Gong
Requirements on scenarios for the evolution of language and cognition
  Willem Zuidema

Authors Index
Perspectives on EVOLANG
THE EVOLUTION OF EVOLANG

The EVOLANG conference was born in Edinburgh in 1996. Eighteen years later the conference is still going strong. In 2014, the 10th EVOLANG conference will be held in Vienna. This landmark year presents a natural opportunity to reflect upon the state of the field and on the evolution of the conference itself. Though we do not (yet) have a formal society dedicated to the evolution of language, a lively community has formed around the conference. This community includes scholars from a surprising diversity of disciplines and countries. To support our growing community, we now have a website (evolang.org), with information about past and present conferences.

To commemorate the 10th EVOLANG, we have included a special section in which we pause to take stock, look backwards as well as forwards, and reflect on the cultural history of the conference and the broader field. To this end, we invited a number of scholars to contribute their perspectives on the evolution of EVOLANG. The contributors come from different countries, disciplines, and career stages. We focused on scholars who have been a part of the conference for many years, but also included a few scholars who began attending only recently. The pieces in this 10th anniversary section represent many distinct points of view. None of the comments were edited; they reflect the intellectual diversity and spirit of debate that characterize the conferences themselves.

We have also included two word clouds to illustrate changes in the topics, methods, and rhetoric over the history of the conference. The clouds on the following pages depict the 100 most frequent words in the text of the first and current EVOLANG proceedings (after eliminating some common words). The size of a word reflects its frequency. While language and evolution remain prominent (fortunately), changes in the frequency of other words illustrate the ways in which the conference has evolved.
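For readers curious how word clouds of this kind can be derived, the following is a minimal sketch rather than the procedure actually used for this volume. It counts word frequencies after dropping common words, keeps the most frequent ones, and maps each count to a display size. The stop-word list, the function names `top_words` and `font_sizes`, and the linear size scaling are all illustrative assumptions; the text above does not say which common words were removed or how size was computed.

```python
import re
from collections import Counter

# Hypothetical stop-word list: the common words actually eliminated
# for the published clouds are not specified in the text.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "that", "for", "on"}


def top_words(text, n=100, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


def font_sizes(ranked, min_pt=10.0, max_pt=60.0):
    """Map each word to a size that grows linearly with its count.

    Linear scaling relative to the most frequent word is an assumption;
    real word-cloud tools often use other scalings.
    """
    if not ranked:
        return {}
    top_count = ranked[0][1]
    return {
        word: min_pt + (max_pt - min_pt) * count / top_count
        for word, count in ranked
    }


if __name__ == "__main__":
    sample = "language evolution of language and the evolution of language gesture"
    ranked = top_words(sample, n=2)
    for word, size in font_sizes(ranked).items():
        print(f"{word}: {size:.1f} pt")
```

Running the same two functions over the full text of two proceedings volumes and comparing the resulting rankings would give the kind of contrast the clouds are meant to show.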
On a personal note, my first experience with EVOLANG was in Leipzig, 2004. I was just preparing to enter graduate school and I was struck by the disciplinary diversity and (relatively) good-natured disagreement on display at the conference. It felt like this was how interdisciplinary science should be done. I had the sense (and I still do) that the multifaceted approach promoted by the conference was the only way to gain traction on one of the hardest problems in science. We haven’t answered all the questions, and we are subject to the same fads and politics as any field, but I am continually impressed by the theoretical and methodological cross-fertilization embraced by the EVOLANG community. With funding bodies and academic institutions increasingly supporting interdisciplinary science, I hope that the next 10 conferences will bring us closer to a common vocabulary and a unified field of language origins research.

Erica Cartmill
UCLA, EVOLANG10 editorial committee chair
[Word cloud: 10th Conference, Vienna, April 2014]
Michael Arbib
University of Southern California
First EVOLANG conference: Harvard 2002

Over the years, I have attended many of the EvoLang meetings and invariably found them stimulating and have enjoyed the chance to catch up with friends and explore a wide range of issues. Rather than rehearse the many pluses, though, let me use these notes to suggest ways to move forward.

First, speaking as a reviewer of papers for many of the conferences, I think too many of the submissions are mere speculation. Yet our field is not so set in its ways that we should exclude “enthusiastic amateurs” if they bring seminal ideas. What can we do about this? (i) In the Call For Papers, key themes for papers could be succinctly defined; and we could explicitly exhort people “new to the field” to consult some key references as a basis for relating their contribution to the existing literature. (ii) I think the “2 page abstract” should be discontinued as an option for submissions to the conference. This is so short that it encourages top-of-the-head submissions in some cases, while giving too little space to exhibit an original train of thought in others.

Second, speaking as a neuroscientist, I am struck by how few EvoLang talks discuss the brain mechanisms underlying language, and how few of these few are based on more than a passing acquaintance with neuroscience. Surely the study of how brains came to support language should be a major theme informed by the latest findings of neuroscience. (I realize I am setting myself up for a fall, since I will be giving a plenary on this theme in Vienna.)

Third, speaking as someone who benefits from it, I am impressed by the trajectory to develop experiments and field observations on forms of communication and other forms of hierarchically organized behavior that can be analyzed to offer insight into the evolution of language.

Finally, speaking as a computational modeler, I want to recall the Utrecht meeting.
The good news was that the power of modeling was so well recognized that almost every time block had a parallel session on modeling. The bad news was that, as a result, experimentalists and field workers lost the chance to learn about modeling and thus to assess its value in their own work and to provide feedback that could tune the models to better address their data. I believe subsequent meetings have been and will be better structured for integrating modeling, experimentation and observation. Let me just add that models are of value not only if they provide a systematic explanation of specific data but also if they add to our vocabulary of processes and mechanisms through models that provide general insights as a basis for later more detailed data-respectful modeling.
Cedric Boeckx
ICREA/Universitat de Barcelona
First EVOLANG conference: Kyoto 2012

It is a testimony to the accuracy of Dobzhansky's well-known statement (“nothing makes sense in biology except in the light of evolution”) that many scholars take biolinguistics to be synonymous with evolutionary linguistics. And it is a testimony to the importance of Evolang that for many scholars, young and seasoned alike, the place to learn about advances in evolutionary linguistics is Evolang. In the course of its brief, but rich, history, Evolang has demonstrated that the 1866 ban from Paris is definitely passé, but especially for someone like me, who matured as a linguist along the banks of the Charles river, it has shown the needs and benefits of taking both biological and cultural factors seriously if we are to make progress in understanding the evolution of language. I recall being told that 'these Evolang folks' only confused phylogeny and glossogeny, but discovered a remarkably open-minded, pluralistic, welcoming environment focused not on ideologies, but on scientific problems that are too hard for a single kind of method or school of thought to solve. At Evolang I learned that it would be a mistake to insist on a strict separation of the internal and external factors that enter into the design of grammatical systems. I also saw how conferences with over 300 participants can run smoothly and retain the delightful atmosphere of a small workshop where everyone attends everyone else's talk. Darwin biographers regularly return to Edinburgh, convinced that the seeds of evolutionary theory were sown during the two years he spent there. I am convinced that future historians of the field of evolutionary linguistics will go back to the first Evolang meeting in Edinburgh when they retrace the steps that led to progress.
We may not have to wait too long to see this happen, because that first meeting in 1996 gave birth to a biennial tradition that we should all be proud to be a part of: it shows how complex knowledge can accumulate rapidly in successive generations of scholars with a desire to learn. In so doing, Evolang teaches us about language evolution by example.
Michael Corballis
University of Auckland
First EVOLANG conference: Paris 2000

The initiation of the EVOLANG conferences in 1996 opened a floodgate, releasing the topic of language evolution from the lingering effects of famous bans of the 1860s and the anti-evolutionary stance of Chomskyan linguistics. Even the first conference produced a remarkable variety of approaches to language evolution that were to mushroom over succeeding meetings. Some, such as Derek Bickerton, continued to argue in Chomskyan fashion for a single critical step from protolanguage to full human language. Bickerton maintained a distinctive if argumentative presence until the Barcelona conference in 2008, and may well be back. But on the whole, no holds were barred.

I sidled, as it were, into EVOLANG 2000 in Paris with the proposition that language evolved from manual gestures, having earlier floated the idea in The Lopsided Ape (1991). But the gestural theory has a centuries-old history, and had been revived in some detail by Gordon Hewes in 1973. I don’t think there was much interest in 2000, although Ann Senghas spoke on the grammaticalization of Nicaraguan Sign Language, and Sue Savage-Rumbaugh on her gesturing and pointing bonobos. If I recall correctly she suggested that Kanzi himself would present at the next conference, but careful observation of the presenters at EVOLANG 2002 suggested that Kanzi wasn’t one of them. In 2002, though, Michael Arbib and John Skoyles weighed in with mirror neurons, although Jim Hurford sounded a note of caution about a mirror-neuron take-over.

In 2006 in Rome gestural theory seemed to come of age, appropriately in Italy, where gesture rivals speech as the medium of expression. Vittorio Gallese talked on “the linguistic body,” Michael Arbib (again) on the “mirror system hypothesis,” Michael Tomasello on “ape gestures and human language.” Others who presented on ape gestures included Vauclair and Meguerditchian, Byrne and Cartmill, Pika and Liebal.
These themes have continued in subsequent meetings, more prominent perhaps in submitted presentations than in plenaries. A future meeting in Naples, gesture capital of Italy, might clinch the deal. Of course, it is not all about gesture, and there remain those skeptical of the gestural hypothesis, which is as it should be. And there was, and is, much, much more, with genetics a growing theme. EVOLANG may well be unique in the sheer diversity of its biological, archeological, linguistic, and computational expertise, and progress has been, dare I say, palpable. Long may it continue, and evolve.
Bart de Boer Vrije Universiteit Brussel First EVOLANG conference: London 1998 My scientific career has developed very much in parallel with EVOLANG. I did not attend the first edition, but my PhD supervisor Luc Steels did, and he immediately switched the research focus of his lab – including my thesis project – to evolution of language. In 1998 I was an enthusiastic young computer modeler in the stage of my PhD where the results finally got exciting. In 2014 I am running my own little group where experimental work, computer models and work based on archaeological and biological data are combined. In 1998 I was happy when my results looked somewhat like real data; in 2014 I am worried about statistical significance, experimental confounds, whether my samples are large enough, and whether my computer agents should be Bayesian or not. I think this reflects a maturation of the field of language evolution. In 1998 it was mostly a group of enthusiastic scientists with a shared interest in this decidedly un-mainstream topic. Everyone was mesmerized by the work that was going on in the different disciplines. In 1998 almost anything went, even rather speculative work, as long as it was exciting. However, even then it was already clear that different disciplines could improve their research not just by exchanging ideas and data, but also by learning from each other’s methodological strengths and weaknesses. In 2014, the work in language evolution has become much more rigorous: we are discovering methods by which questions that we could formerly only speculate about can now be investigated empirically. At the same time computer models have become more of a tool, rather than a goal in themselves. Although this means that the field is perhaps a bit less adventurous, there are still many exciting new results that appear every year.
As for the future, obvious progress is to be expected from the study of ancient DNA, from new fossil finds (personally I am hoping for a Homo erectus hyoid bone) and from animal studies. Less obviously perhaps, I am convinced that a better understanding of the interaction between individual learning and cultural and biological evolution will allow us to pinpoint more accurately which cognitive adaptations have undergone evolutionary pressure related to language. I also expect increasing consensus on an old origin of language: whereas in 1998 a proposed time depth of 50 000 years was still commonplace, nowadays 150 000 years is considered a recent date, and half a million years more likely. Finally, I think we should strive for language evolution to become part of the standard linguistics curriculum: after all (to paraphrase Dobzhansky) nothing in linguistics makes sense except in the light of evolution.
Jean-Louis Dessalles Telecom ParisTech First EVOLANG conference: Edinburgh 1996 In 1996, very few scholars who spoke at the Edinburgh Conference presented language as problematic. For most of them, it was all about a success story. The only problem was to reconstruct the happy sequence of events that led to Language. Language was obviously a marvel, and our task was merely to explain how our species happened to reach that evolutionary nirvana. Now, more and more scientists who gather at the Evolang Conference are aware of the conundrum: human information has a negative price. Humans are very talkative (they speak 16 000 words a day on average). Like scientists who pay to speak at conferences, individuals spend time and energy gathering information worth telling and wait their turn to deliver it during lengthy conversations. Why are human beings competing to give away, for free, information to their competitors? The Conference helped in exposing the weaknesses of “easy” answers such as the species’ advantage, group selection or reciprocal cooperation. Now, thanks to Evolang, language is no longer a marvel. It’s a mystery, an abnormality. To solve the conundrum, the various disciplines that meet at the Evolang Conference must be confronted with it again and again. My personal wish is to see future contributors concentrate more on evolution and more on language. → more on evolution, because Darwinian constraints are most often overlooked. Apparently, by Darwinian standards, language shouldn’t exist. The Conference must encourage more scientists to address the problem if we want it to be solved. → more on language, because many contributors to Evolang study abstract descriptions that they call “language”. We need more effort to be devoted to studying language as it is, rather than as we imagine it. Language is conversation, language is chatter. Language is a form of behaviour, and it can be easily observed “in the wild”.
We should explain the behaviour first, instead of ignoring it. I can summarize with one single word what I find to be missing in Evolang, and what Evolang is in the best position to discover: the function of language.
Ramon Ferrer-i-Cancho Universitat Politecnica de Catalunya First EVOLANG conference: Leipzig 2004 What if we are not the center? — I am very grateful to the Evolang conferences for having provided me with an advanced school for learning about language and evolution and the many related topics found at their crossroads. It is hard to find a milieu that is as multidisciplinary as Evolang (complex system meetings are the only serious competitor that I know). This wide field is evolving quickly (much faster than academic programs in my conservative and centralized university system). However, it still strikes me as strange to encounter claims about the uniqueness of human language, because a rigorous statistical test for determining whether a species has a language of an expressive capacity or sequential complexity equivalent to ours is not forthcoming. As a ‘theoretician’ I am worried about the right level of abstraction for connecting the dots. When we think about the origins of human language, we are biased towards our lineage but examples of complex cognition and societies in distant species with brains not necessarily as large as ours are constantly challenging us, strongly suggesting convergent evolution. Humankind has been the reference for this series of conferences but the evidence of complex communication in bacteria, millions of years before the earliest estimate for the emergence of some form of language in our ancestors, might help us to see the crux of the problem in the coming years. Evolang reminds me about two extreme models of scientific inquiry: the pope model, where a big ego gives you everything you think you must need to not get lost in the darkness of ignorance, and the swarm model, where the constant exchange of ideas of adaptive individuals who do not necessarily share the same knowledge or opinions brings discovery and real progress. I think that this conference has been more about the second. I hope we never lose sight of that.
W. Tecumseh Fitch University of Vienna First EVOLANG conference: Paris 2000 Although I missed the very first (Edinburgh) EvoLang Conference, I've attended most of the others, and I organized the 4th conference at Harvard in 2002, where among other things Noam Chomsky ended his long silence on the topic of language evolution. In my opinion, the quality of research presented at EvoLang has steadily increased. I've been particularly impressed at the readiness of younger attendees to fearlessly jump across traditional disciplinary boundaries, adopting an empirical, hypothesis-testing approach. As a biologist I've been very pleased to see increasing contributions adopting a comparative approach, using data from animal cognition and communication to inform our understanding of the biology and evolution of human language. But before we wallow in self-congratulation it is important to recognize that research in this field did not start with EvoLang or (as per a common misconception) Pinker & Bloom's 1990 BBS paper. Besides the classics, from Plato through Condillac to Darwin, two earlier blossomings of research in the 20th century cannot be ignored, especially a string of Current Anthropology articles (e.g. Hockett & Ascher 1964, or Gordon Hewes' 1973 eloquent defense of the gestural origins hypothesis, which introduced the term "protolanguage"), or Harnad and colleagues’ classic 1976 Annals volume. Most of the new ideas "discovered" in the post-1990 era had already been insightfully discussed back then, often with rich referencing to even earlier predecessors, and we ignore this history at our peril. And now back to wallowing: despite these predecessors, the rise of the current "EvoLang generation" clearly marks a new era. Unlike previous brief bursts of interest, this regular conference has successfully created a stable community: scientists who meet, debate, consider, and then meet again a few years later. 
I think this has had an incredibly positive effect on the study of the biology and evolution of language, and Jim Hurford deserves a lot of the credit for this, as the prime mover and steadfastly levelheaded and rational father of EvoLang. The cumulative effect is a true interdisciplinary community that certainly doesn't agree about everything (or indeed much at all) but at least is moving towards consensus about what the big issues are and how to address them scientifically. EvoLang deserves considerable credit for initially nurturing this maturation, and spurring it onward and upward. May many more great EvoLang conferences follow!
James R. Hurford University of Edinburgh First EVOLANG conference: Edinburgh 1996 Twenty years ago, work on language evolution tended to be done by isolated individuals. In some disciplines, such as anthropology and biology, the topic was at least recognized as one on which rational, if speculative, conclusions could be reached, backed up by empirical evidence and coherent theory. In others, such as linguistics, an interest in language evolution was generally seen as a sign that a colleague was wobbling off the rails of proper investigation. After our first conference in Edinburgh in 1996, several distinguished researchers of different specialisms told me the conference had opened their eyes to the possibility of significant advance through cross-fertilization between disciplines. A roboticist wrote that he had been on the edge of his chair with excitement throughout the proceedings. A primatologist wrote gratefully of her relief that the discussions had been so open to a breadth of ideas and that the questions had not been defined in biased ways by domain-blinkered theoretical assumptions. The EVOLANG conference, as it came to be known, has moved on in this spirit ever since. Back in self-made isolation wards, there are grumblings of various sorts. One kind is that EVOLANG proceedings are too speculative. Indeed they are speculative, as are all disciplines ultimately to some extent. You won't find the same degree of technicality at EVOLANG as you will at a specialist genetics or neuroscience conference. But equally, in cosily homogeneous meetings, where everyone starts roughly on the same page, you won't find very much with well-motivated general appeal outside the specialism. Language evolution research is inevitably multidisciplinary, not the province of one subject, not even linguistics. Selection for our proceedings has always been based on empirical evidence combined with rigorous argument. And we haven't necessarily started on the same page as each other.
What EVOLANG has done is to show those eager to look that there are bookfuls of other pages worth reading, to work toward a more coherent picture of all the influences that have made Homo sapiens, through biological and cultural evolution, the incomparable piece of work we are. We now have a forum where sensible ideas on language evolution are taken seriously. A community has been built.
Sverker Johansson Dalarna University First EVOLANG conference: Rome 2006 My first experience with Evolang was a rejection. I wrote a crappy abstract for the 2004 Leipzig conference, and got what I deserved. So for Rome in 2006, I submitted two abstracts just to be safe – and both got accepted ☺. From then on, I have been accepted to every Evolang – though I failed to attend in 2010, as my wife was nine months pregnant at the time and made it clear what she thought of my going to a conference while she went into labour. Evolang was already fairly mature when I first saw it in 2006. Fashions change, but quality and sophistication have gradually improved over the years. Nevertheless, Evolang remains highly heterogeneous. Despite many high-quality contributions, they do not really form a coherent body of knowledge. Not enough people work on connecting the dots. Explaining the evolution of any biological feature, including language, requires answers to the four questions of Tinbergen (1963): adaptation, history, proximate cause, and ontogeny. We should pay more attention to Tinbergen in our contributions, and make clear which questions we are addressing. Much miscommunication at Evolang stems from misunderstood Tinbergen levels. A deeper divide has been lurking for some years, and surfaced in earnest in Kyoto 2012: that between Chomskyan biolinguistics and everybody else. For many years, Chomsky totally dismissed evolutionary linguistics. But in the past decade, Chomsky and friends have built a parallel effort at elucidating the origins of language, under the label ‘biolinguistics’, without really connecting with mainstream Evolang, either intellectually or culturally. We have here a Kuhnian incommensurability problem, with contradictory views of the nature of language, and of what is to be explained at Evolang. Do we bridge this Kuhnian gap in Vienna 2014?
Simon Kirby University of Edinburgh First EVOLANG conference: Edinburgh 1996 I was in the final year of my PhD at Edinburgh when my inspirational supervisor, Jim Hurford, organized the First International Conference on the Evolution of Language. Nobody knew whether it was going to be a success, and indeed I recall some skepticism around the department about whether there was enough known about the evolution of language to mount a major international conference in this topic. Nevertheless, there was a distinct feeling of excitement as delegates met on the balcony of the conference venue at the foot of Edinburgh’s extinct volcano. Jim and I marveled at all these people who had come out into the open to declare an interest in this peculiar subject and who appeared to share our heretical belief that nothing in linguistics makes sense except in the light of evolution. One of the themes of that first conference (albeit in the smallest of the lecture rooms) was computational modeling. I was surprised to find anyone else working in this area, but was delighted when this grew to be a significant strand in future conferences in the series. This became the first of a handful of methodological obsessions of the field, as it sought ways to move from armchair theorizing to testable hypotheses. Nevertheless it presented, and continues to present, a challenge for the conference. Do we have specialist sessions that focus on methodology, or integrate this technical work in with the rest of the conference? I think we’ll know that the field has grown up when talks are grouped by the topic of the research rather than the methods being used to address that topic. Back in Edinburgh in 1996 it was obvious that language evolution is not just wildly interdisciplinary, but also has an unusual mix of researchers who dabble in language evolution “on the side” and those – especially younger academics – who see it as their main subject. 
The future of this field surely lies in this new generation of scholars who are well versed in multiple methods, and open to data and results from a diversity of disciplines. These are people who wish to make language evolution be their full-time job. The challenge for EVOLANG is to help develop the careers of these language evolution experts; to promote the topic in the subject areas in which these people need to find jobs; and to assist in the training of students in subjects such as linguistics, psychology, biology and cognitive science who want to move into this area.
Chris Knight University of East London (retired) First EVOLANG conference: Edinburgh 1996 Jim Hurford and I first met at an interdisciplinary conference on ‘Ritual and the Origins of Culture’, held in London in the School of Oriental, African and Asian studies in 1994. After the final session, when everyone was leaving, Jim approached me saying, ‘Chris, why were so few linguists involved?’ As local organiser I’d tried to be inclusive but took Jim’s point. Remaining in touch, we convened a modest symposium on language origins in Stratford, East London, the following year. On the last day, Jim volunteered to be local organiser for our next effort – a full-scale international conference to be held in Edinburgh in 1996. Edinburgh was wonderful. If I remember right, there were about 60 of us, each hugely enthusiastic and combining to produce a kaleidoscope of ideas. I remember thinking how paradoxical it all was. Here we were, all focusing on just one issue: how did our ancestors first start making themselves understood? Like everyone else, I had a go. Yet nobody understood a word of what I was saying. The same went for me – I kept wondering what on earth everyone else was talking about! It felt like Alice in Wonderland. None of the words we used – least of all ‘LANGUAGE’ – had any agreed meaning whatsoever. So here we all were, struggling intellectually with the origins of language – while ironically illustrating through our own linguistic failure just how difficult the problem must have been. My training as a social anthropologist tells me this: there can be no shared meanings without shared communal practices, shared rituals. We need more singing and dancing, more all-night feasts, more getting drunk together. 
The academic ritual of a conference every two years is a good deal better than nothing, but perhaps still insufficient to counteract the opposite tendency, which is to allow the routines and procedures of our numberless institutional specialisms to pull us imperceptibly apart. Now we’ve reached our tenth corroboree. In his two latest books – The Origins of Meaning and The Origins of Grammar – Jim brings the results together as much as anyone has ever done. Above all, we were lucky it was Jim who took things in hand. I can’t think of anyone more fair-minded, more resolutely pluralist or more acquainted with the issues in all their variety and breadth. Maybe EVOLANG under his stewardship hasn’t yet hammered out a shared scientific language for us, but we’ve come a long way.
Phillip Lieberman Brown University First EVOLANG conference: London 1998 OPPORTUNITIES MISSED: Charles Darwin formulated the principles and practices that have since guided evolutionary biology, including the comparative method which entails studying related living species to trace the evolution of a human attribute. Unfortunately EVOLANG conferences have departed from the Darwinian paradigm, instead presenting a mélange of enlightening studies weighed down by linguistic exercises that share many attributes of medieval scholasticism, and just-so stories. Protolanguages that lack syntax are discussed, ignoring comparative studies such as the chimpanzee ASL project of Allen and Beatrix Gardner. Another just-so story was Neanderthal vocal communication restricted to high-pitched humming, absurd given the archaeological record and the large hyoid bone of the Kebara Neanderthal fossil that supported a large larynx which would yield a low pitched voice. One recurring problem is that Noam Chomsky’s views on what constitutes research have permeated much of the agenda of EVOLANG conferences. Chomsky rejects the role of natural selection in evolution and variation, the feedstock for natural selection. Darwinian evolutionary theory involves no directing force guiding evolution, removing creation myths and the role of any God from the equation. Chomsky instead invokes a mystical “third force” that directs evolution towards formal elegance and “simplicity” – whatever that might be. Linguistic research has worn blinkers for generations, taking into account exceedingly narrow ranges of data that support particular theories, ignoring inconvenient facts. Pet theories are supported by data that reflect competence. Performance effects are irrelevant. Entire fields of research are off-limits. The role of culture in shaping a language and setting the conditions for natural selection hasn’t entered into the minds of mainstream linguists, though it’s clear that is the case. 
Selective sweeps on genes that confer adult lactose tolerance, for example, occurred only in cultures where cows, sheep or goats were domesticated. The needs of a culture appear to shape language. Is recursion a “universal” in light of Daniel Everett’s study of the Pirahã? Does recursion really have anything to do with language in light of Lashley’s insights on motor control? When I point out that the neural bases for human laryngeal control can be traced back to Therapsids, transitional mammal-like reptiles who lived millions of years ago, and apparently initially evolved to maintain mother-infant bonds, the reaction often is “so what,” missing the point that this exemplifies the mechanism that Charles Darwin pointed out in On the Origin of Species to account for the rapid transitions that mark the course of evolution, “…the fact that an organ originally constructed for one purpose…may be converted into one for a wholly different purpose…” (1859, p. 190). Hopefully, future EVOLANG conferences will take better account of the basic principles of evolutionary biology.
Heidi Lyn University of Southern Mississippi First EVOLANG conference: London 1998 Moving forward, in collective inquiry. My first experience with Evolang meetings was as a graduate student at the 2nd conference in London in 1998. As a student of artificial language with nonhumans, there were few conferences at which I felt that my work fit. Moving from talk to talk in London, I was overwhelmed by the sheer number of paradigms and topics that spoke directly to me. I had never thought that computers could be programmed to model communicative agents, or that robotic dogs could “talk” to one another. The archeological findings from the field, the philosophical discussions of the linguists, the peek into the brain processes by the neurobiologists – I couldn’t attend enough panels or speak to enough people. With each succeeding conference, while I became more familiar with the paradigms, the main players, and the themes, it became ever more clear that one of the strongest aspects of the Evolang community is its commitment to maintaining the multi-disciplinary program; otherwise the researchers in the separate fields would continue to be shockingly unaware of progress made outside their own field. One memory I have of attempting to remedy this is, at Harvard in 2002, inserting a slide of “top misconceptions I have heard about the Kanzi project at this conference”. Highlighting the diverse outlooks that abound in the field, at the Rome conference in 2006 I was asked, like many of us, to describe what would happen if a group of infants were placed in a language-free space during the first several years of life. Would they acquire language? Perhaps predictably, not one of us agreed on the answer. Evolang allows all of us, in all of our questing dissent, a place to present, learn, argue and hopefully move toward a better understanding of a phenomenon that we will likely never be able to comprehend with any certainty.
Kazuo Okanoya The University of Tokyo First EVOLANG conference: Paris 2000 I study birdsong in search of clues to the origins and evolution of language. Over the past decade many of my ideas have been inspired by attending Evolang and interacting with the diverse variety of researchers that the conference attracts. Some have been critical of my approach, but others have been supportive, and thanks to the conference I have established friendships with researchers from all over the world. I was fortunate enough to be asked to organize Evolang9 in Japan in 2012. Although Japan was hit by an earthquake and nuclear disaster the previous year, the conference attracted more than 330 participants, making it the largest Evolang conference up until that point. I thank Koji Fujita, Takashi Hashimoto, Luke McCrohon, and Ayumi Osawa for its success. The first Evolang I attended was Evolang3 in Paris, where I presented research on the song syntax of Bengalese finches. In my talk I put forward the hypothesis that behavioral syntax could evolve in the absence of semantics through a process of sexual selection, the implication being that syntax in human language could also have evolved in this way. My talk was enthusiastically welcomed by attendees such as Alison Wray and Chris Knight, who invited me to contribute to a book on language evolution that they were preparing. However, other attendees were quite critical of my hypothesis, saying that birdsong had nothing to do with language. Such criticism was highly useful in the further development of my ideas. At Evolang5 in Leipzig I was introduced to Simon Kirby by Terry Deacon and began collaboration with both of them. I felt my “bird story” had gradually come to be accepted by many researchers. I then visited Edinburgh and had the opportunity to talk with Jim Hurford and Maggie Tallerman. Jim was always interested in my bird story but Maggie was amicably critical of it.
After Evolang8 in Utrecht, I was unfortunate enough to be stuck in Europe because of the volcano eruption in Iceland, but this gave me the opportunity to begin collaboration with Bob Berwick who was also trapped. Together with Johan Bolhuis, we started to write a review paper under the ash cloud. In such ways attending Evolang has greatly expanded my horizons. My biggest hope for the future of Evolang is that it maintains its atmosphere of friendly criticism and heterogeneous openness.
Thom Scott-Phillips Durham University First EVOLANG conference: Rome 2006 The format of the conference itself has not changed greatly during my time, but the field itself has undergone at least one significant methodological change. This is the greatly increased use of laboratory experiments (see Scott-Phillips & Kirby, 2010 for a review; further experiments have been conducted since that article). Contrary to a common assumption, these experiments are not attempts to replicate the origins of language in the laboratory. Rather, they are a tool to study, in a controlled way, how various processes hypothesized to be of importance to language evolution actually work. These experiments have already proven their worth, and they will, I predict, continue to be of importance in the coming years, especially with regard to the cultural evolution of languages. What has not changed much, not only since I joined the field, but since the Evolang conferences began 20 years ago (and indeed before then too), are the foci of attention. Like linguistics in general, language evolution has mainly focused on phonetics, phonology, semantics, and, in particular, syntax. In contrast, pragmatics has been relatively little studied, as a quick glance at the index of any of the many edited collections on language evolution will show. This is, in my view, a mistake. In fact, as evolutionarily minded researchers, we more than anyone should be attuned to the importance of pragmatics for the study of language. Traits, be they cultural or biological, cannot evolve if they are never expressed in the wider environment. In this case, that means the use of language in communication, and the cognitive basis of that communication. These topics are precisely the domain of pragmatics.
For these reasons, it is regrettable that researchers such as Steven Levinson and Dan Sperber, both of whom have studied in great detail the links between pragmatics, cognition, and evolution, have never given plenary talks at the conference. If this trend were to change, the Evolang conferences would be doing the field a great service. Scott-Phillips, T. C., & Kirby, S. (2010). Language evolution in the laboratory. Trends in Cognitive Sciences, 14(9), 411-417.
Luc Steels IBE (UPF-CSIC), Barcelona and VUB, Brussels First EVOLANG conference: Edinburgh 1996 EVOLANG: PAST and FUTURE. I still remember vividly the first Evolang conference, brilliantly organised by Jim Hurford and Chris Knight. I accidentally heard about it only a few months before the event and felt very lucky that I could still get in. Language evolution had become my focus of attention only a year or so earlier and I was eager to present some intriguing results using agent-based modeling and concepts from artificial life (Steels, 1995). Of course I was also eager to learn what others had come up with and the conference was not a disappointment. I was on the edge of my chair the whole time and rushed from one session to another, disappointed that there were parallel sessions. Studdert-Kennedy, Lindblom and MacNeilage presented impressive research on the origins of speech which helped me to nurture the work of my students de Boer and Oudeyer on the origins of vowel systems. The anthropologists Knight, Dunbar and Power opened my eyes to the importance of sociality as a prerequisite for symbolic language. I had my first exposure to landmark work by Kirby and John Batali on agent-based modeling. And, on top of many other fascinating talks, there was a most enjoyable whisky tasting in Jim Hurford's house in Edinburgh. The main themes of Evolang have been shifting, with more emphasis now on animal communication, neurobiological prerequisites, genetics, and psychological experimentation, but the openness to different perspectives has fortunately remained. What do I personally wish for in the future? Evolutionary biology suggests signposts. First, there is an overarching theory of biological evolution, which informs all investigations in biology. We do not yet have a similar framework for language evolution, but I believe we should strive for it.
Second, the historical evolution of biological species is fundamental to biological evolution studies because biologists believe that the processes observable throughout history are the same as the ones at the birth of new species and life itself. I believe the same is true for language and hence the rich body of data from historical linguistics should be integrated more into Evolang. Third, evolutionary biology (indeed biology in general) increasingly uses computational modeling and synthetic methods. I strongly believe efforts to model language evolution are important and I wish there were more of them. These are future wishes but let's now celebrate the 10th anniversary and thank wholeheartedly the many people who have put in so much work to make all this happen. Steels, L. (1995). A self-organizing spatial vocabulary. Artificial Life Journal, 2(3).
Maggie Tallerman Newcastle University First EVOLANG conference: Edinburgh 1996 I became fascinated by language evolution in the latter half of the 1990s, my early interest genuinely motivated by a fear that some stranger would ask me where the language faculty came from and how long our species had had it. I had no idea; better do some reading, then. A decade or two elapses. I still have no answers, but now I can point to a massive body of work from across numerous scientific disciplines that is highly suggestive, though admittedly suggestive in many different ways. When I started teaching a course on language evolution for undergrads in linguistics, there was remarkably little for them to read. It wasn’t that there was no literature – we’d start with Pinker & Bloom’s seminal 1990 paper in BBS, ‘Natural language and natural selection’, plus Derek Bickerton’s 1990 book Language and Species, and a fair bit of the 1996 Lock & Peters volume Handbook of human symbolic evolution was relevant – but much work was inaccessible to students who weren’t particularly literate scientifically. This has changed hugely. We now have useful textbooks (e.g. by Johansson; Fitch; McMahon & McMahon) plus a dedicated OUP series of monographs and edited volumes. And EVOLANG has been marvellous for our emerging discipline. I find the conferences hugely stimulating; I learn stacks, I get mad, I revise my own ideas. Our preoccupations change over time; we are less obsessed with a polar continuity debate, realizing now that there are continuities and discontinuities. Linguists learn about evolutionary biology; primatologists and neurologists learn about language. As a discipline, we are young, and still have much to learn from other scientific fields. 
Most of all, linguists of all persuasions have finally engaged with evolutionary linguistics – not just Bickerton and Jim Hurford, two of my personal heroes, both highly influential from the start – but also people like Ray Jackendoff, Steve Anderson, Denis Bouchard, all eminent in generative linguistics and now producing excellent work for our subfield. I find our future bright. Thanks, Jim and Chris, for EVOLANG.
Papers
DIACHRONIC PROCESSES IN LANGUAGE AS SIGNALING UNDER CONFLICTING INTERESTS
CHRISTOPHER AHERN, ROBIN CLARK
Department of Linguistics, University of Pennsylvania, 619 Williams Hall, Philadelphia, 19104, United States
[email protected], [email protected]

Game theory has found broad application in modeling pragmatic reasoning in cases of common and conflicting interests. Here we extend these considerations to diachronic patterns of language use. We use evolutionary game theory to characterize the effect of conflicting interests on linguistic signaling in a population over time. We show how sufficiently large conflicts of interest give rise to particular patterns of signaling that can be used to model diachronic processes such as Jespersen’s Cycle.
1. Introduction

In the abstract, both Linguistics and Animal Communication deal with the transmission and interpretation of signals. Yet, the two differ markedly in the use of the term cooperation. Crucially, this difference hinges on the interests of the agents sending and receiving signals. In the Gricean tradition, speakers are taken to be cooperative when they make their contributions relevant to current conversational goals. This use presumes that interlocutors agree to the goals of a given exchange, and benefit from pursuing them. In the biological sense, speakers are cooperative when they forgo some gain to the benefit of others (Nowak, 2006). This use entails that speakers have countervailing goals, but do not pursue them. In what follows, we will use cooperation solely in the biological sense and consider the implications it has for patterns of signaling over time.

Cooperation in the biological sense is in need of evolutionary explanation. If senders have an incentive to pursue some goal, then they should do so. If, in doing so, senders have every incentive to deceive, then receivers should not listen. If receivers do not listen, then senders have no motive to signal in the first place. Crucially, the actions of senders depend on those of receivers and vice versa. Given this interdependence, conflicting interests undermine signaling. Senders will learn or evolve to dissimulate and receivers will learn or evolve to distrust. This process takes on the familiar form of the tragedy of the commons (Hardin, 1968). Individuals will always be tempted to exploit the common resource of credulity, to the collective detriment.
Silence or, at best, meaningless babble is the equilibrium state in the population: neither senders nor receivers have reason to unilaterally change their behavior. So, what keeps signaling in general, and human language in particular, from the downward spiral to silence?

The existence and persistence of signaling necessitate evolutionary mechanisms that mitigate conflicts of interest between senders and receivers. Various mechanisms have been proposed to solve this problem, with varying degrees of appropriateness for application to language (Scott-Phillips, 2008). This leads us to two general questions, which we can consider apart from the details of any particular set of mechanisms. First, how effective are these mechanisms in solving the problem of cooperation in signaling? Given that language exists, such mechanisms are clearly sufficient to stave off total collapse. Given that signals are not always used cooperatively, such mechanisms are not sufficient to ensure the idealized case of Gricean commonality. Second, if language is indeed subject to a host of competing pressures, what impact will this have on how linguistic signals are used over time? This second question will be our chief concern.

Using evolutionary game theory, we consider the range of conflict between senders and receivers, from pure common interest to pure conflicts of interest. We identify the crucial points that determine different regimes of signaling behavior: small conflicts allow for stable signaling, large conflicts lead to a collapse, and in between we observe cyclic signaling. The main contribution of this work is the application of this model to the well-known diachronic process of Jespersen’s Cycle. We show how even a slight tendency to overuse emphatic negation leads to the loss of emphasis, necessitating the introduction of a new emphatic form to take its place. This provides a broader connection between the forces affecting language at both evolutionary and historical time scales.
The rest of the paper is organized as follows. In the next section we provide an overview of the formal framework. We then turn to a particular parametrization of conflicting interests and examine stability. The resulting model is then applied to the cycle. We conclude with a brief discussion of our results and directions for future research.

2. Evolutionary Game Theory

Evolutionary game theory (Maynard Smith, 1982) gives us the formal tools to model interactions between agents with varying degrees of common and conflicting interests, without the overly strong assumption of perfect rationality. Here we introduce signaling games (Lewis, 1969) along with the equilibrium concept of an evolutionarily stable strategy as a model of signaling in a population.
2.1. Signaling Games

A signaling game consists of two players: a sender and a receiver. The sender has some private piece of information, $t \in T$, which can be thought of as some fact about the state of the world. The sender chooses a message, $m \in M$, to send to the receiver. A sender strategy $s \in [T \to M]$ specifies what message to send in any given state.ᵃ The receiver does not know what state of the world actually holds and must choose an action, $a \in A$, to interpret the message sent. A receiver strategy $r \in [M \to A]$ specifies what action to take upon receiving any message. The outcome of the game is determined by the state of the sender, the message sent, and the action taken by the receiver. The sender and the receiver have preferences over these outcomes, which are reflected in the utility functions $U_S$ and $U_R$ respectively, which map outcomes to the set of real numbers.

ᵃ For simplicity, we only consider pure sender (receiver) strategies, functions from states to messages (messages to actions). Mixed strategies are a straightforward generalization. A mixed sender strategy specifies a probability distribution over pure sender strategies, $\sigma = p_1 s_1 + \dots + p_k s_k$, where $\sum_i p_i = 1$. A mixed receiver strategy specifies a probability distribution over pure receiver strategies, $\rho = q_1 r_1 + \dots + q_k r_k$.

The possible combinations of sender and receiver strategies are strategy profiles. That is, a sender strategy in the set of possible sender strategies, $s \in S$, and a receiver strategy in the set of all possible receiver strategies, $r \in R$, yield a strategy profile $\langle s, r \rangle$. The expected utility for a given strategy profile can be given as follows, where $\delta$ specifies a probability distribution over $T$.

$$EU_S(s, r) = \sum_{t} \delta(t) \cdot U_S(t, r(s(t))), \qquad EU_R(s, r) = \sum_{t} \delta(t) \cdot U_R(t, r(s(t))) \tag{1}$$

For each possible state the sender and receiver strategies determine an outcome: $s(t)$ is the message the sender will employ and $r(s(t))$ is the action the receiver will take given that message. The expected utility is the sum of these outcomes weighted by the probability of the state that yields them.

2.2. Evolutionarily Stable Strategies

Signaling games provide a description of the interaction we are interested in, but we require some sort of solution concept to complete our model. An evolutionarily stable strategy (Maynard Smith, 1982) provides just that. Loosely speaking, a strategy is evolutionarily stable if, when played by an entire population, it is resistant to invasion by mutant strategies. For asymmetric games, such as signaling games, where players have different roles, a strategy profile is evolutionarily stable if and only if it constitutes a Strict Nash equilibrium (Selten, 1980).

Definition 1. A strategy profile $\langle s^*, r^* \rangle$ is a Strict Nash equilibrium if and only if:
• For all $s \in S$ such that $s \neq s^*$, $EU_S(s^*, r^*) > EU_S(s, r^*)$
• For all $r \in R$ such that $r \neq r^*$, $EU_R(s^*, r^*) > EU_R(s^*, r)$

This simply states that the sender and receiver do worse if they unilaterally deviate from the equilibrium. Thus for a given strategy profile to be an ESS its component strategies must be strict mutual best responses to each other.

3. Conflicting Interests

With the general framework in place, we now turn to a simple way of capturing the degree of conflict between senders and receivers by a single parameter. We then consider different constraints on signaling behavior. From this we determine the existence of evolutionarily stable strategies at various parameter values.

3.1. Parametrization

Conflicts of interest between a sender and the receiver are essentially a matter of misaligned preferences. To see this, consider the following game. Let there be two states $T = \{t_L, t_H\}$, where we treat $t_L$ as a low value on some scale and $t_H$ as a high value. For simplicity, $t_L = 0$ and $t_H = 1$, where both states are equiprobable, $\delta(t_L) = \delta(t_H) = \tfrac{1}{2}$. There is some finite set of messages $M = \{m_1, \dots, m_k\}$ available to the sender. Let the set of actions available to the receiver be the interval $A = [0, 1]$.

Now, we can consider preferences over outcomes. Receivers want to accurately sort the senders by type and act accordingly. For example, suppose there are actions $a_L$ and $a_H$ that maximize the receiver’s utility when playing with $t_L$ and $t_H$ senders respectively. Following Crawford and Sobel (1982), we define the sender and receiver utility functions as the following, where the constant of conflict, $b \in [0, 1]$, allows us to express how much some senders gain by exaggeration.

$$U_S(t, a) = -(a - [t + b])^2, \qquad U_R(t, a) = -(a - t)^2 \tag{2}$$
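Equations (1) and (2) can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes the two-state game of Section 3.1, and the names `STATES`, `DELTA`, `u_sender`, `u_receiver` and `expected_utilities`, as well as the two example profiles and the value b = 0.1, are hypothetical choices made here for illustration.

```python
# Sketch of equations (1) and (2) for the two-state game (assumed setup).
STATES = {"t_L": 0.0, "t_H": 1.0}   # state values from Section 3.1
DELTA = {"t_L": 0.5, "t_H": 0.5}    # equiprobable states

def u_sender(t, a, b):
    """Sender utility from (2): highest when the action equals t + b."""
    return -(a - (t + b)) ** 2

def u_receiver(t, a):
    """Receiver utility from (2): highest when the action equals t."""
    return -(a - t) ** 2

def expected_utilities(s, r, b):
    """Expected utilities from (1) for pure strategies.

    s maps each state name to a message; r maps each message to an action.
    """
    eu_s = sum(DELTA[k] * u_sender(STATES[k], r[s[k]], b) for k in STATES)
    eu_r = sum(DELTA[k] * u_receiver(STATES[k], r[s[k]]) for k in STATES)
    return eu_s, eu_r

# Two illustrative (hypothetical) strategy profiles.
separating = ({"t_L": "m1", "t_H": "m2"}, {"m1": 0.0, "m2": 1.0})
pooling = ({"t_L": "m1", "t_H": "m1"}, {"m1": 0.5, "m2": 0.5})

for name, (s, r) in [("separating", separating), ("pooling", pooling)]:
    print(name, expected_utilities(s, r, b=0.1))
```

With a small conflict such as b = 0.1, both players do better under the separating profile than under pooling, which matches the intuition that near-aligned interests support informative signaling.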
Note that the constant of conflict represents the distance between the interests of the sender and the receiver. When $b = 0$ interests are perfectly aligned. Senders wish to reveal their type: a $t_L$ sender’s payoff is maximized by $a_L$ and a $t_H$ sender’s payoff is maximized by $a_H$. These actions also maximize the receiver’s payoff. In contrast, when $b > 0$, their interests diverge. The sender does best when he convinces the receiver to take an action appropriate for a higher type. For example, if $b = 1$ then senders maximize their utility with the highest possible action. A $t_L$ sender’s payoff is maximized by $a_H$, as is a $t_H$ sender’s payoff. Note that a $t_H$ sender’s payoff is always maximized by $a_H$, regardless of the size of $b$.

3.2. Signaling

We can now consider the incentives of senders and receivers and define conditions that determine signaling behavior. We begin from the receiver’s perspective before turning to the sender. The receiver’s best course of action depends on the amount of information available regarding the sender’s type. For example, if senders separate themselves by using distinct signals, then the receiver should make use of this information by taking the unique appropriate action. What should the receiver do if the senders pool together using the same signal? In the absence of any information, the action that maximizes her expected utility is the action that corresponds to the average type of the population, which we will refer to as the pooling action, $a_P$.

$$a_P = \delta(t_L)\, a_L + \delta(t_H)\, a_H \tag{3}$$
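As a brief check on (3), the pooling action can be derived from the receiver's utility in (2). This is a sketch under the assumption that the receiver's best responses to identified types are $a_L = t_L$ and $a_H = t_H$, which follows from the quadratic form of $U_R$; the intermediate function $EU_R(a)$ is notation introduced here for the derivation.

```latex
\begin{align*}
EU_R(a) &= -\delta(t_L)\,(a - t_L)^2 - \delta(t_H)\,(a - t_H)^2, \\
\frac{d\,EU_R}{da} &= -2\,\delta(t_L)\,(a - t_L) - 2\,\delta(t_H)\,(a - t_H) = 0 \\
\Rightarrow\quad a_P &= \delta(t_L)\,t_L + \delta(t_H)\,t_H
  = \tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\cdot 1 = \tfrac{1}{2},
\end{align*}
```

where the last step uses $\delta(t_L) + \delta(t_H) = 1$. For the equiprobable two-state game of Section 3.1 the pooling action is therefore the midpoint of the action interval.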
We can now turn to the sender’s perspective and ask which of the receiver’s actions are preferred. We can state this generally in the following manner. For a sender of type $t$, the expected utility of an action $a'$ exceeds that of an action $a$, where $a' > a$, if the following holds.

$$\tfrac{1}{2}\,(a + a') < t + b \tag{4}$$

In other words, when the sender’s type plus the constant of conflict exceeds the average of the two actions, then the higher action $a'$ is preferred. This gives us a general recipe for determining the sender’s preferences. For any two receiver actions we can find their average. If this is less than the sender’s type plus the conflict, then the sender prefers the alternative higher action.

3.3. Stability

We are now in a position to consider the stability of various sorts of signaling behavior for various degrees of conflict. In particular, we will be interested in whether senders separate or pool, and any transitions between the two kinds of behavior. As a point of reference, we will consider a state of affairs where all sender types use a single message $m$, and receivers respond to this message with the pooling action $a_P$. Now, consider the availability of some alternate message $m'$, a neologism. Suppose that receivers take some action other than $a_P$ in response to this message. When would senders have an incentive to adopt a new message? The answer to this question depends on the degree of conflict.

Consider the case where $b$ is fairly small. Starting from the pooling state, do senders prefer some alternate action? We consider the types in turn. For $b \in [0, \tfrac{1}{4})$, $t_L$ senders prefer $a_L$ to $a_P$ and $t_H$ senders prefer $a_H$ to $a_P$. That is, both types of senders have an incentive to break from pooling. So, if receivers respond to $m'$ with any action other than $a_P$, then one type will have an incentive to adopt this new signal. Once this takes place the receiver will be able to separate the two types and act accordingly. Moreover, this constitutes an evolutionarily stable separating equilibrium. Neither type wishes to adopt the signal of the other, because it leads to a lower payoff.

If we increase $b$ we observe different constraints. For $b \in (\tfrac{1}{4}, \tfrac{1}{2})$, we again consider the types in turn, starting from the pooling equilibrium. Now $t_L$ senders prefer the pooling action to being identified, but $t_H$ senders still wish to be identified. Thus, $t_H$ senders have an incentive to break from the pooling by adopting a new signal. Receivers will do best by responding to this new signal accordingly and then separating the types. Note that neither sender type has an incentive to pool. Even though $t_L$ senders prefer the pooling outcome the most, they still prefer being identified to being mistaken for $t_H$ senders. Thus, separating is still evolutionarily stable.

Increasing $b$ further, we consider the case where $b \in (\tfrac{1}{2}, \tfrac{3}{4})$. As before, $t_H$ senders always have an incentive to adopt a new signal to be identified. Responding to this, receivers will be able to separate the types. However, once the types are separated, $t_L$ senders will always have an incentive to pool. That is, if receivers respond to $m'$ with $a_H$, then $t_L$ senders will adopt $m'$. Receivers will then respond with the pooling action and the cycle begins again.

Finally, we consider $b \in (\tfrac{3}{4}, 1]$. In this case, both types of senders prefer $a_H$. Any message that the receiver responds to with $a_H$ will be adopted by both types. However, if both types adopt the same message, then the receiver does best by responding with the pooling action. Pooling in this case is neologism-proof (Farrell, 1993): it cannot be disrupted by the introduction of new signals. Thus, we see that beyond a certain point, conflicts are too large for separating to be sustained, even temporarily.

For sufficiently small conflicts of interest, separating is the only evolutionarily stable strategy. As conflict grows separating is destabilized, but persists in a cyclic pattern. The availability of new signals allows a subset of the types to identify themselves and be separated from the others. Yet, eventually even this escape from pooling is cut off. By considering a range of common and conflicting interests we have identified a range of signaling behaviors, including cyclic behavior where messages are adopted, mimicked, and then abandoned.
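The regimes described above can be recovered from the preference rule stated in the text (a sender prefers the higher of two actions when their average is below $t + b$). The sketch below is illustrative only. It assumes $t_L = 0$, $t_H = 1$, $a_L = 0$, $a_H = 1$ and $a_P = \tfrac{1}{2}$, which gives thresholds at $b = \tfrac{1}{4}, \tfrac{1}{2}, \tfrac{3}{4}$; the function names and regime labels are hypothetical, and the boundary values themselves, which the text leaves open, are simply assigned by strict inequality.

```python
# Sketch: classify signaling regimes by the constant of conflict b (assumed setup).
T_L, T_H = 0.0, 1.0             # state values from Section 3.1
A_L, A_H, A_P = 0.0, 1.0, 0.5   # separating actions and the pooling action

def prefers_higher(t, b, a_low, a_high):
    """Preference rule from the text: the higher action is preferred
    when the average of the two actions is less than t + b."""
    return (a_low + a_high) / 2 < t + b

def regime(b):
    """Label the behavior for a given b, using only the low type's preferences
    (the high type prefers a_H throughout)."""
    low_prefers_pool_to_id = prefers_higher(T_L, b, A_L, A_P)   # a_P over a_L
    low_mimics_high = prefers_higher(T_L, b, A_L, A_H)          # a_H over a_L
    low_prefers_high_to_pool = prefers_higher(T_L, b, A_P, A_H) # a_H over a_P
    if low_prefers_high_to_pool:
        return "pooling"       # both types chase a_H; separation cannot start
    if low_mimics_high:
        return "cycling"       # t_H separates, t_L follows, receivers pool again
    if low_prefers_pool_to_id:
        return "separating (only t_H leaves pooling)"
    return "separating (both types leave pooling)"

for b in (0.1, 0.3, 0.6, 0.9):
    print(b, regime(b))
```

Only the low type's preferences are needed for the classification, because a $t_H$ sender's payoff is maximized by $a_H$ for every value of $b$.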
4. Diachronic Processes

Jespersen’s Cycle is a broad, cross-linguistic tendency for a change in negation. Starting from an initial point where plain and emphatic negation are two distinct forms, emphatic negation undergoes a kind of rhetorical devaluation and is bleached of its emphasis (Dahl, 2001).ᵇ Kiparsky and Condoravdi (2006) show the process in Greek repeating itself several times over the centuries; last century’s emphatic negation is bound to be used as this century’s plain negation.ᶜ

Table 1. Historical Forms of Negation in Greek

PLAIN            EMPHATIC           SOURCE
ου...τι          ου-δε...εν         Ancient Greek
(ου)δεν...τι     δεν...τιποτε       Early Medieval Greek
δεν...τιποτε     δεν...πραμα        Greek Dialects
δεν...πραμα      δεν...απαντοξη     Modern Cretan
The types of senders correspond to contextual standards of use. For example, one might reasonably say, “I didn’t move!” when one has moved an inch, but not “I didn’t move an inch!” In that case, $t_H$ senders represent a stricter standard of use than $t_L$ senders. Where the actions available to receivers are the formation of degrees of belief, receivers would be expected to form a higher degree of belief for any message sent exclusively by $t_H$ senders. If convincing receivers is the goal, regardless of type, we would expect both types to flock to such a message. Receivers would eventually become increasingly skeptical, responding more and more with the pooling action. The degree of belief that receivers have in the proposition would decrease over time. In other words, emphatic negation would lose its emphasis. In light of this, the $t_H$ senders can distinguish themselves by the use of some new signal. Neologisms are abundant, most often formulated by the addition of a minimizer (e.g. “not an inch”) or a generalizer (e.g. “not at all”) (Horn, 1989). Yet, just as $t_H$ senders might distinguish themselves by adopting a new signal, so too can the $t_L$ senders follow. Forms of negation are constantly being devalued and abandoned as interlocutors aim to convince others.

ᵇ Schwenter (2006) argues from synchronic data in Romance that the plain/emphatic distinction is not sufficient, and that the different forms of negation are sensitive to information structural constraints. Our analysis could be extended to this account.
ᶜ This table is constructed to show the connection between forms of different time periods; see Kiparsky and Condoravdi (2006) for more details and discussion.

5. Concluding Remarks

Here we have presented a model that allows us to explore various degrees of common and conflicting interests, noting a range of behavior, from separating, to cycling, to collapse, as conflict increases. We have focused on cyclic signaling, taking Jespersen’s Cycle as a prime example. Such models offer insight into what can be explained by considering signaling under both common and conflicting interests. The existence of neologisms is crucial to the stability of separating and pooling. Future work will involve understanding the interaction between neologism formation and various game dynamics (Hofbauer & Sigmund, 1998).

References

Crawford, V. P., & Sobel, J. (1982). Strategic information transmission. Econometrica, 1431–1451.
Dahl, O. (2001). Inflationary effects in language and elsewhere. In Frequency and the Emergence of Linguistic Structure (pp. 471–480). Philadelphia: John Benjamins Publishing Company.
Farrell, J. (1993). Meaning and Credibility in Cheap-Talk Games. Games and Economic Behavior, 5, 514–531.
Hardin, G. (1968). The Tragedy of the Commons. Science, 162(3859), 1292–1297.
Hofbauer, J., & Sigmund, K. (1998). Evolutionary Games and Population Dynamics. Cambridge University Press.
Horn, L. R. (1989). A natural history of negation. University of Chicago Press.
Kiparsky, P., & Condoravdi, C. (2006). Tracking Jespersen’s Cycle. In M. Janse (Ed.), Proceedings of the 2nd International Conference of Modern Greek Dialects and Linguistic Theory. University of Patras.
Lewis, D. (1969). Convention. Cambridge: Harvard University Press.
Maynard Smith, J. (1982). Evolution and the Theory of Games. Cambridge University Press.
Nowak, M. A. (2006). Five Rules for the Evolution of Cooperation. Science, 314(5805), 1560–1563.
Schwenter, S. A. (2006). Fine-tuning Jespersen’s cycle. In B. T. Birner & G. Ward (Eds.), Drawing the boundaries of meaning: Neo-Gricean studies in pragmatics and semantics in honour of Laurence R. Horn (pp. 327–344). John Benjamins.
Scott-Phillips, T. (2008). On the correct application of animal signalling theory to human communication. In A. D. M. Smith, K. Smith, & R. Ferrer i Cancho (Eds.), The evolution of language: Proceedings of the 7th International Conference on the Evolution of Language (pp. 275–282). Singapore: World Scientific Press.
Selten, R. (1980). A note on evolutionarily stable strategies in asymmetric animal conflicts. Journal of Theoretical Biology, 84(1), 93–101.
SYNTACTIC DEVELOPMENT IN PHENOTYPIC SPACE

LLUÍS BARCELÓ-COBLIJN
Human Evolution and Cognition group, Department of Philosophy, University of Murcia, Campus de Espinardo, Murcia, 30100, Spain

ANTONI GOMILA BENEJAM
Human Evolution and Cognition group, Department of Psychology, University of the Balearic Islands, crta. de Valldemossa km 7,5, Palma, 07122, Spain

The present work shows how an empirical study of language ontogeny can be accommodated within biological theory. Several analyses of linguistic corpora of first language acquisition have been represented by means of networks. The results show in each corpus a combination of linear and non-linear growth of both words and syntactic links. Four languages (five children) were analyzed. All children follow the same developmental pattern, characterized by transitions from one type of network to another, more complex one: from tree-like networks to scale-free networks, and then from this stage to scale-free networks with the feature known as “small-world”. These three stages can be thought of as developmental phenotypes. As such, they fit naturally within Pere Alberch's theory of developmental morphospace. Our proposal is valid for both typical and atypical linguistic development.
1. On networks and language

1.1. Introduction

The study of language acquisition is perhaps one of the most intriguing because, besides the linguistic analysis of data, it necessarily deals with the psychological side of language. Its place, then, is within the developmental process of brain and mind. It can be approached in several different ways: as a linguistic study in the classical sense, with neuroimaging techniques, as a more philosophical study, etc. In recent decades in particular, the study of language has received considerable attention from physicists. After the fruitful emergence of Communication Theory (Shannon, 1948; Shannon & Weaver, 1963), physicists have deepened their knowledge of complex systems and of how the elements that compose such systems interact. Especially since Watts & Strogatz’s (1998) publication on network evolution, this way of studying, representing and analyzing complex systems has proven applicable to many aspects of nature. Disease spreading, the nervous system and molecular interaction are examples of the range of application of network approaches. Human language is also a field where networks can be introduced in order to provide additional information about the cognitive system behind language.

After introducing the concept of network analysis, we will show how network analysis of linguistic data has also been imported into the language acquisition field. Applied longitudinally, this approach yields interesting results because it contributes new information that is not evident or easily retrieved through more traditional linguistic analyses. The results of several longitudinal studies of the acquisition of different first languages show that children typically follow the same pattern in the development of their ability to combine words. Moreover, this pattern of language growth is characterized by a combination of linear and non-linear growth. When development is non-linear, the system makes a transition to a higher level of complexity. Finally, we find that each level of complexity, represented by a type of network, can be framed without problems within the theory of development put forward by Pere Alberch (Alberch, 1989; Alberch et al., 1979; Alberch et al., 1996). This theory has recently been imported into the study of language phylogeny (Balari & Lorenzo, 2012), but has never been considered from the perspective of ontogeny. We propose the latter, and argue in favor of these network analyses as evidence of the existence of different linguistic phenotypes before the adult language faculty is fully developed.

1.2. Pere Alberch’s phenotypic morphospace

Pere Alberch is considered one of the fathers of the Evo-Devo stream adopted in biological theory and, by extension, in evolutionary theory.
In his work he showed that the traditional view of genes and evolution, proposed by Dobzhansky’s school of biology, was not enough to explain variation between species or variation within species. Some aspects of the evolution of individuals and species seem to have their explanation in processes that happen beyond the amino-acid chains. Environment is important at all levels (not just the external one, but also intra-organism), and the developmental trends followed by individuals can differ subtly, sometimes provoking deeper changes at some generation. Alberch (1989) explained “the logic of monsters”, showing why some cases of teratologies¹ are possible (like a two-headed individual among the chordates) and why others are not (e.g., the minotaur). He proposed to envision organic development through a phenotypic morphospace, in which possible phenotypes are represented: typical phenotypes group together in clusters, whereas atypical phenotypes fall outside the “boundaries” of these clusters. Atypical phenotypes represent deviations from the most followed developmental tendencies. Many of the atypical developmental tendencies turn out to be incompatible with life. A particular phenotype can change through time. These changes are often subtle, so that the phenotype does not fall outside the boundaries (linear change). If the phenotype reaches the border of a group of phenotypes, it becomes statistically unstable. Then, at some generation, it can happen that a descendant does not develop this border phenotype, but makes a leap to the next possible group of phenotypes (non-linear change).

¹ From τέρας (teras), “monster” in Greek.
Figure 1. Alberch’s phenotypic morphospace on the left; Balari & Lorenzo’s (2012) linguistic morphospace on the right.
Alberch’s proposal for representing evolution was able to explain (1) why these phenotypes arise again and again (because they are not transmitted genetically) and (2) why there are both subtle variations (linear progress) and radical variations (non-linear progress); moreover, (3) this representation of development is suitable for both ontogeny and phylogeny². Balari & Lorenzo (2012) have recently imported this approach into evolutionary linguistics, arguing that it is also possible to imagine that several kinds of cognitive systems have arisen in the evolutionary course of our species. Interestingly, they would differ regarding their computational potential. Balari and Lorenzo take the well-known Chomsky hierarchy as an example of possible phenotypes. The authors posit each level of the hierarchy as a possible phenotype (with the exception of the Turing machine level, which is considered biologically implausible). Hence, according to Balari and Lorenzo, the evolution of the language faculty can be pictured through a phenotypic morphospace where all biologically possible faculties of language could be represented. Whereas apes would be in one phenotypic cluster, current humans would have made a leap to a different phenotype, reaching the current computational potential.

² Alberch rejected Ernst Haeckel’s dictum that “ontogeny recapitulates phylogeny”. In this regard, see in particular Gomila (2010).

Although Balari & Lorenzo’s proposal is appealing, it seems really difficult to put to the test. There are no living representatives of other species of the subtribe Hominina whose computational potential, or whose externalizing systems, could be observed and compared. Therefore, a good way to test the plausibility of the idea of a linguistic morphospace is to focus on ontogeny, rather than on phylogeny. To do this, one needs (1) longitudinal studies of first language acquisition, (2) different phenotypes and (3) abrupt changes between phenotypes. Next, we present a series of empirical studies that do support the idea of a linguistic morphospace, in which several levels of computational potential have been identified, as well as abrupt transitions between them.

1.3. Syntactic network analyses

The syntactic analysis of human language and its representation by means of networks is nothing new. Differences among studies lie mostly in other aspects, such as the source of the data, how they are linguistically analyzed, or the distances between words that are taken into account. A pioneer in this field is Ferrer-i-Cancho, who has developed several works in this line, showing that the connection between words in (adult) human language reaches a particular level of complexity, represented by a scale-free network with the small-world effect (Ferrer-i-Cancho & Solé, 2001). The same happens when large texts are analyzed, for example, the novel Moby Dick (Solé et al., 2010). A good question is whether humans are born with such potential, or whether they instead develop it during infancy.
The studies carried out in language acquisition support the latter view and talk about a “syntactic explosion” (Lust, 2006). How can we explore this process with networks? This is the question that Corominas-Murtra et al. (2009) tried to answer. They analyzed “by hand” all the sentences from two corpora of children acquiring English (CHILDES database). Their results showed a series of networks in each case, which grew progressively up to a point at which there was an abrupt transition. The first networks were structurally simple, so-called tree-like networks. Then the networks became scale-free and finally developed the small-world property.
In short, since Watts & Strogatz’s (1998) seminal work, networks have been differentiated by the values of some features, like the clustering coefficient (C) or the path length (L):

Table 1. Network types according to Watts & Strogatz (1998), characterized by two values: clustering coefficient (C) and path length (L).

Type | C | L
Regular | high | high
Small-world | high | small
Random | small | small
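The two quantities in Table 1 can be computed directly from an adjacency structure. The following is a minimal sketch in plain Python, not the software used in the studies cited here; the function names (`clustering_coefficient`, `average_path_length`, `ring_lattice`) are hypothetical, and the sketch assumes an undirected, connected graph with at least two nodes, stored as a dictionary that maps each node to the set of its neighbours. C is taken as the mean of the local clustering coefficients and L as the mean shortest-path length over all pairs of nodes.

```python
from collections import deque


def clustering_coefficient(adj):
    """Mean local clustering coefficient C of an undirected graph."""
    total = 0.0
    for node, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            continue  # a node with fewer than two neighbours contributes 0
        nb = list(neighbours)
        # Count the links that exist among the neighbours of this node.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nb[j] in adj[nb[i]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length L (assumes a connected graph, >= 2 nodes)."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives shortest distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def ring_lattice(n, k):
    """Regular ring: every node is linked to its k nearest nodes on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


# A regular ring, and the same ring with one long-range shortcut added.
ring = ring_lattice(20, 2)
shortcut = ring_lattice(20, 2)
shortcut[0].add(10)
shortcut[10].add(0)

print("ring:    ", clustering_coefficient(ring), average_path_length(ring))
print("shortcut:", clustering_coefficient(shortcut), average_path_length(shortcut))
```

In this toy case the single shortcut lowers L while C stays close to its value in the regular ring, which is the direction of change that separates the "Regular" row of Table 1 from the "Small-world" row.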
Because one language is not enough to draw strong conclusions, further corpora from three more languages (Dutch, German and Spanish) were analyzed, utterance by utterance (Barceló-Coblijn, Corominas-Murtra & Gomila, 2012), yielding similar results. Children developed their ability to combine words following the same stages, represented by three types of networks, in very similar temporal periods. In fact, we observed that it is between the 25th and the 27th/28th month (depending on the individual) that there is an important increase of syntactic links in the network.

1.4. Procedure

A computer-friendly way to analyze human sentences syntactically is to focus on dependency relationships between words, so that a word (say, a noun) syntactically depends on a verb. The traditional representation is the one in which the words of a sentence are linked by arcs or arrows. This kind of syntactic annotation can be fed to a network program.
Figure 2. Linking words by syntactic dependence relationships.
Our team of linguists analyzed each corpus, sentence by sentence, following the guidelines in Corominas-Murtra (2007). The same criteria were kept throughout the analysis of each corpus: for example, if particles are considered to be the “governor” element in Germanic phrasal verbs, this criterion has been applied all along the corpus (e.g., Dutch: opstaan “stand up”; the expression “sta op” [imperative] would be: sta ← op). Typically, a graph is obtained from each sample of a corpus (a conversation transcription). The graphs from language acquisition data are usually divided into several components (or “sub-networks”). The largest one, called the Giant Connected Component (G), is the focus of our interest: as time elapses and the child grows, G becomes larger, and fewer other components of more than two nodes appear in the graph. This has been interpreted as a sign of the cohesion of the system.
Figure 3. Three graphs from three different periods of the Spanish corpus Aguirre “MAG”, at ages 1;09.15, 1;10.27 and 2;03.10. Age is represented as “Years;Months.Days”.
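The step from annotated dependencies to a graph and its Giant Connected Component can be sketched as follows. This is only an illustration under stated assumptions, not the annotation pipeline or network program used in the studies: the helper names (`build_network`, `connected_components`) and the toy word pairs are invented for the example, each pair is read as (dependent, governor), and links are treated as undirected when looking for components.

```python
from collections import defaultdict


def build_network(dependencies):
    """Build an undirected word network from (dependent, governor) pairs."""
    adj = defaultdict(set)
    for dependent, governor in dependencies:
        adj[dependent].add(governor)
        adj[governor].add(dependent)
    return dict(adj)


def connected_components(adj):
    """Return the components of the graph as sets of words, largest first."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Invented toy annotation: each word type is one node, so "the" is shared
# by two utterances and ties them into a single component.
pairs = [
    ("the", "dog"), ("dog", "runs"),
    ("the", "cat"), ("cat", "sleeps"),
    ("op", "sta"),
]

network = build_network(pairs)
components = connected_components(network)
giant = components[0]  # the Giant Connected Component, G

# Every edge inside G is counted once from each of its two endpoints.
edges_in_giant = sum(len(network[word]) for word in giant) // 2

print(len(components), len(giant), edges_in_giant)
```

Here G has five nodes and four edges, which is the tree-like situation in which the number of edges is about the same as the number of nodes. Tracking the size of G and its edge-to-node ratio over successive samples is one simple way to follow the kind of growth described in the text.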
Coinciding with previous works, when the network becomes small-world its hubs are always functional words (e.g., determiners), which emerge late and abruptly. Words of this kind do not carry the conceptual meaning of other words like nouns or adjectives; rather, the information they compress is often syntactic in nature.

1.5. Discussion

Until now, two works following the same procedure have reported network-based syntactic analyses of 5 languages, from corpora of 6 children (Corominas-Murtra et al. 2009; Barceló-Coblijn et al., 2012). In both works children follow the same developmental path, suggesting that humans typically
follow the same steps in order to rapidly develop the language faculty. In these two works the ability to combine words has been examined. In a first period, the network G is quite simple, often with an elongated tree-like shape. After a period of linear growth, in which the number of nodes is roughly the same as the number of edges, there is an abrupt transition and G becomes a scale-free network. The numbers of nodes and edges no longer increase linearly, and the number of edges (syntactic links) is always the higher of the two. In the last step the network becomes small-world, and the number of edges is typically double the number of nodes.

We propose to conceive of these steps as evidence of different syntactic phenotypes. Each phenotype represents a different level of complexity. The abrupt and late emergence of functional words as hubs highlights the abrupt transition towards a system governed by words with highly syntactic content. Importantly, if each phenotype can be portrayed by a different kind of network, this allows the study of deviations from the typical developmental path. Hence, an important prediction of this approach is that the syntactic networks of pathologies related to syntactic deficits should show differences in their structure. These pathological linguistic phenotypes would fall outside the borders of the typical linguistic phenotype of modern humans.

Acknowledgements

This research was supported by DGICYT Projects FFI2010-20759 and FFI2009-13416-C02-01 (Spanish Ministry of Economy and Competitiveness).

References

Alberch, P. (1989). The logic of monsters: Evidence for internal constraint in development and evolution. Geobios, 12: 21–57.
Alberch, P., & Blanco, M.J. (1996). Evolutionary patterns in ontogenetic transformation from laws to regularities. International Journal of Developmental Biology, 40, 845–858.
Alberch, P., Gould, S. J., Oster, G.F., & Wake, D.B. (1979). Size and shape in ontogeny and phylogeny. Paleobiology, 5(3), 296–317.
Balari, S. & Lorenzo, G. (2012). Computational Phenotypes: Towards an Evolutionary Developmental Biolinguistics. Oxford University Press, Oxford.
Barceló-Coblijn, L., Corominas-Murtra, B. & Gomila, A. (2012). Syntactic trees and small-world networks: syntactic development as a dynamical process. Adapt. Behav., 20(6): 427–442.
Corominas-Murtra, B. (2007). Network statistics on early English syntax: Structural criteria. ArXiv:0704.3708v2.
Corominas-Murtra, B., Valverde, S. & Solé, R. V. (2009). The ontogeny of scale-free syntax networks: Phase transitions in early language acquisition. Adv. Complex Syst., 12: 371–392.
Gomila, A. (2010). Evolutionary Psychology and the proper relationship between ontogeny and phylogeny. In Luis A. Pérez Miranda & Aitor Izagirre Madariaga (eds.), Advances in Cognitive Science: Learning, Evolution and Social Action, pp. 233–252. Servicio Editorial de la Universidad del País Vasco / Euskal Herriko Unibertsitateko Argitalpen Zerbitzuak.
Ferrer-i-Cancho, R., & Solé, R. V. (2001). The small world of human language. Proceedings of the Royal Society B: Biological Sciences, 268, 2261–2265.
Lust, B. (2006). Child Language: Acquisition and Growth. Cambridge University Press, Cambridge.
Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., Amin, N., Schwikowski, B., & Ideker, T. (2003). Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res., 13, 2498–2504.
Shannon, C. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27: 379–423, 623–656.
Solé, R. V., Corominas-Murtra, B., Valverde, S., & Steels, L. (2010). Language networks: Their structure, function, and evolution. Complexity, 15, 20–26.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of "small-world" networks. Nature, 393, 440–442.
Weaver, W. & Shannon, C. (1963). The Mathematical Theory of Communication. Univ. of Illinois Press.
FINDING THE UNDERPINNINGS: THE LAST QUARTER CENTURY
TED BAYNE
Independent Researcher
West Tisbury, MA, 02575 USA

Almost a quarter century ago Pinker and Bloom published a paper in Behavioral and Brain Sciences that proposed that evolution have a place at the table of discourse about language. That event together with many other less splashy events started an era where an evolutionary perspective penetrated neuroscience, neurobiology, anthropology, and genetics. Multiple disciplines worked together and the horizon of allowable hypotheses dramatically widened. While language evolution studies were lively pre-1990 (e.g. Ascher (1964) & Harnad (1976)), this submission contends that we have experienced a major shift in the post-1990 period that has had broad influence. Section 1 offers three examples of this trend. The shift grew out of epistemic changes in the way we approached the subject of a two-tiered evolutionary process (1) a more recent socio-linguistic layer and (2) the biological and semiotic underpinnings and precursors of this layer (much older). Table 1 compares relevant topics for pre-1990 versus present day. In Section 2, two domains are evaluated for these epistemic shifts: music origins and modern biosemiotics. Taken together, these two fields point to the emergence of significant preverbal semiotic capabilities, co-evolving with prehistoric Homo the underpinnings for spoken language and providing core language infrastructure “for free”. Today, the expanding language evolution discussion is surrounded by compatible, resource-rich disciplines.

1. Arousing the Scientific Imagination

This revolution was stimulated in multiple ways: (1) new thematic areas became scientifically “available” (e.g. the evolutionary perspective); (2) breakthroughs in one discipline cross-fertilized other disciplines (e.g. genetics, neuroscience, etc.); (3) new hypotheses forced re-thinking of old implicit assumptions¹; or (4) multidisciplinary work “hybridized” a subject (e.g. music origins, see below).

During this time, the horizontal proliferation of an evolutionary perspective into new fields (e.g. archeology and neurobiology) is well known². Far-reaching vertical explorations have re-conceptualized specific fields (e.g. Deacon (1997, 2011) and Porges (2011)). Here are three examples illustrating the trend.

¹ Clegg, M. (2012:58). The Evolution of the Human Vocal Tract. In Bannan ed. (2012:ch. 3). Clegg provides a model of how to contest widely held but under-challenged implicit assumptions [about laryngeal descent].
² Foley, R.A. (2012). Music and Mosaics: The Evolution of Human Abilities. In Bannan et al. (2012:31-34). Foley refers to “a paradigm shift in human evolutionary studies” due to interdisciplinary work from genetics and molecular phylogeny as well as other fields.

1.1 Pinker and Bloom: The Ripple Effect

Kenneally (2007:65-66), who chronicled the period and its debates, summarizes the impact of their well-known 1990 paper: “…although it’s not possible to determine the relative contribution of all these factors, it’s clear that together they had an impact. Before their paper, relatively few books and papers were published on the topic. Since then, many books and more than one thousand papers have been published on language evolution.” [“…of all these factors” meaning: of other papers in other journals]
1.2 Porges and the Polyvagal Theory

Porges’ intellectual process became a scientific model of the comprehensive application of an evolutionary perspective. Porges tells of his uphill struggle to master the paradoxical neurobiology of the visceral organs (Porges, 2011). When he thought he was “done” with his research without having to descend into deep evolutionary themes, an MD pointed out fatal errors in his work and Porges was back to “square one”. After years in the archives of NIH he emerged with a solid theory. He integrated a thorough evolutionary grasp of the layers of neurobiology present in the human body that support “fight-flight” and “play dead” responses while also normally allowing regulated emotion in the context³ of nuanced social relations. Additionally, the more he probed the role of affect in cognition, self-regulation, and nervous systems the more he grasped the functional oneness of mind and body. (The growing recognition that felt (primary process) emotions arise mainly from older subcortical structures is evidence that behaviorism’s grip is fading (Panksepp 2012:55-63) and this has encouraged an embodied, integrated, and alive way of viewing humans and animals with a functional interiority.) Porges has become a valuable resource for psychologists (e.g. Fosha et al, 2009, Schore, 2011, and Panksepp, 2012) who are turning increasingly to analysis and practice derived from neurobiological evidence. Porges summarizes:

“The evolution of the autonomic nervous system provides an organizing principle to interpret the adaptive significance of physiological responses in promoting social behavior. According to the polyvagal theory, the well-documented phylogenetic shift in neural regulation of the autonomic nervous system passes through three global stages, each with an associated behavioral strategy.”⁴

“Three neural circuits form a phylogenetically ordered response hierarchy that regulates behavioral and physiological adaptation to safe, dangerous, and life-threatening environments.”⁵

“Coincident with the separation of the middle ear bones, other phylogenetic transitions resulted in brainstem areas regulating the vagus becoming intertwined with the areas regulating the striated muscles of the face and head. The result of this transition was a dynamic social engagement system with social communication features (e.g. facial expression, head movements, vocalizations, and listening) interacting with visceral state regulation.”⁶

³ Compare Schore (2011) & Greenspan (2004:282-284), et al. On affect itself see Panksepp (2012).
⁴ Porges (2001). The Polyvagal Theory: Phylogenetic substrates of a social nervous system. International Journal of Psychophysiology 42:123-146.
⁵ Porges (2011: ch. 13).
⁶ Porges (2011: ch. 13): there is discussion of increased neural regulation of the vocalization system and coordination of vocalization with “respiratory effort and volume”.

The three neural layers are: the parasympathetic (ancient), sympathetic (newer), and social engagement (most modern). The three pithy quotes above speak to the centrality of an evolutionary perspective and show the epistemic shift that comes along with it: the infusion of a biological deep time into the visualization and conceptualization of human neurobiological layering. The social engagement layer represents a massive evolutionary investment clearly co-evolving with reciprocal semiosis within social bonds, a fundamental prerequisite for language emergence.⁷

1.3 Schore and the transformation of psychotherapy

Discussing the transformation of psychotherapy since 1990, Schore (2011)⁸ refers specifically to the Kuhnian concept of shift: both the fundamental configuration of human neurophysiology changed as well as the relation of the scientist or clinician to it (an epistemic change). The neuroscience that modernized (parent-child) attachment theory⁹ participates in this shift. Schore’s accounts of parent-child preverbal semiosis in the context of attachment relationships show a link with language acquisition. Early childhood brain development prepares a platform for the “verbal leap”.¹⁰ Schore’s relevance here lies in (1) his discussion of Kuhnian shift in his field, and (2) his careful descriptions of preverbal parent-child semiosis. These changes are also a result of evolutionary contributions from Porges and others.

⁷ Porges (2011:ch. 9-13). See Deacon’s “4th leg” in para. 2.2 below. Deacon is not alone in proposing that social bonds over time (loyalties) are related to language beginnings. See Wenseleers (2010).
⁸ See Introduction in Schore (2011) entitled “Toward a New Paradigm of Psychotherapy”.
⁹ Hart (2011), Konner (2010), & Schore (2011). Parent-child attachment theory originated from an evolutionary perspective (e.g. Bowlby’s “environment of evolutionary adaptedness”).
¹⁰ Compare Hrdy (2009:124-132) on extending Bowlby’s attachment theory.

2 Two Noble Domains

2.1 Music Origins

Increasing evidence from the music origins discussion points to musical capacities forming underpinnings for verbal language just as Porges’ social engagement neural circuit is a co-evolved prerequisite for semiosis in social bonding. Music has proven itself to be a fertile research modality for neuroscience itself.¹¹ Pinker’s famous “auditory cheesecake” remark was a premature dismissal¹². Much research has been done that is valuable to both music and language evolution domains. Bannan (2012:15) points out the shift that happened in his field: “It is only in the last fifteen years or so that research has focused on fully embracing the role of music into the evolutionary agenda, and encouraged speculation on the prehistoric soundscape and its function in shaping modern human abilities.”

It is not evolution of music as the B Minor Mass (cultural invention or exaptation?) that is at issue. Faculties underlying human musical ability are the serious research topics: (1) significant overlap between verbal and musical cognitive areas (Patel 2010); (2) imitating a provided rhythmic pattern¹³; (3) pitch perception¹⁴; (4) prosody perception¹⁵; (5) auditory anticipation¹⁶; (6) ascribing meaning to tonal information (non-verbal and natural sounds¹⁷); (7) highly versatile vocal production¹⁸; (8) body performance for social cohesion¹⁹; (9) the matching of tonal ranges in music vs. human voice²⁰; (10) preverbal/nonverbal vocal learning²¹; (11) metaphorical projection on nonverbal sounds or body movement²². Many of these are language precursors. Rhythmic patterning for motor systems of the brain and body are important for many human activities outside of early dance, drumming, etc.²³ It is likely that tonal information played a role in protolanguage prior to formal speech (Everett, 2012). A newborn uses preverbal melodic perception²⁴ to understand her caregiver, inferring her communicative intent (early mindreading). Bogolepova summarizes: "The right hemisphere of the neonate is actively involved in the perception of speech melody and the intonations of the voices of mother and surrounding people. The pre-speech stage of child development is characterized by interactions of the descriptive and emotional components due mainly to mechanisms operating within the hemispheres on the principle of non-verbal communication."²⁵ Symbolic forms occur in parent-child nonverbal semiosis before formal language develops.²⁶ In a parallel way, prehistoric instances of Homo probably developed metaphoric-symbolic forms of dance and music prior to formal verbal language use.²⁷ See Patel (2010) on the overlap between perception of affect in speech and in music.²⁸

[I. Morley sums up] The mechanisms for voice recognition are almost certainly evolutionarily far older than those for linguistic processing; the fact that timbre-rich musical sounds are processed exclusively by these mechanisms could suggest that musical processing also predates linguistic processing, or at least that the processing of tonal content predates the processing of semantic content.²⁹

¹¹ Bannan ed. (2012:12-17).
¹² From Pinker’s (1997:534) How the Mind Works. NY: Norton.
¹³ Hattori (2013). Human (vs. primate) abilities to imitate complex rhythmic patterns are vastly superior. Mirror neurons (Iacoboni, 2009) are present in primates and humans; see the section entitled “Neural Mirroring and Psychological Theories of Imitation” for detection in humans.
¹⁴ Diamond (2000): one vowel might have as many as eight different meanings in the Iyau language.
¹⁵ Elordieta (2012). Prosody perception requires dealing with tone, tempo, rhythm, timbre, prosody, and amplitude of speech (elements common to music as well). See also Bogolepova (2001).
¹⁶ Huron (2006).
¹⁷ Changizi (2011).
¹⁸ Hai, T.Q. & Bannan, N. (2012: 142). Vocal Traditions of the World. In Bannan ed. (2012:ch. 6). On an accompanying DVD, Hai demonstrates the highly versatile capabilities of the human vocal system.
¹⁹ Gamble, C. (2012: 101). When the Words Dry Up. In Bannan ed. (2012:ch. 4). Music as body-based performance with social meaning and emotional expressiveness is quite ancient.
²⁰ Porges (2011: ch. 13 Section entitled “The Frequency Band of Perceptual Advantage”).
²¹ Merker, B. (2012:215). The Vocal Learning Constellation. In Bannan ed. (2012:ch. 9). Vocal learning ranks high in this list.
²² Spitzer (2004: Part I).
²³ See ch. 16 in Wallin et al. (2000).
²⁴ See Part 2 in Malloch & Trevarthen (2009). This series of essays provides a strong case for “musicality” in preverbal parent-child communication. See Cuccio (2013) on infant theory of mind as a prerequisite for spoken language.
2.2 Modern Biosemiotics and a Reconceived Question of Consciousness Another instance of a transformative reframe of language evolution is the emergence of modern biosemiotics (e.g. see Deacon, 1997 and Schilhab et al, 2012). This growing interdisciplinary field asserts a robust substrate of evolved precursor30 capabilities (not “aimed” at spoken language) that verbal language might leverage and then co-evolve with. It forces a deeper appreciation for preverbal levels of human cognition, interpretation, perception, and presentation of meaning. Biosemiotics radically reduces the weight of explanation placed on a theory of verbal language evolution because it posits a Homo sapiens with more infrastructural capacities that are there “for free”. Deacon (1997) inspired the formation of two significant conferences named for his book (Symbolic Species Conference I and II). The Introduction to the Conference volume points to the unique blending Deacon achieved.31 The “four legs” of Deacon’s 1997 thesis 32 consist in (1) neoBaldwinian evolution (“multi-level selection”),33 (2) neuro-connectedness of the 25 Bogolepova (2001). See Schore (2011) on the development of infant brains and communicative abilities (preverbal). The ability to perceive pitches, tones, prosody, and rhythms and extract meaning from them is well presented. 26 Trevarthen, C. (2009). The Functions of Emotion in Infancy. In Fosha et al. (2009: ch. 3). Symbolic forms in communication occur before formal language. 27 Gamble, C. (2012:89-102). When the Words Dry Up. In Bannan ed. (2012:ch. 4). Symbol and metaphor underlie verbal language. It is likely these were present in Homo prior to formal verbal language like children’s early preverbal symbolic play. See Hart (2011:ch. 3) “Language Development and Play Ability”. Also, see Cross, I. in Bannan ed. (2012: 271-274). 28 Patel (2010). See Section 6.5. 29 Morley, I. (2012:118). Physiological Evolution and Musical Capacities. In Bannan ed. (2012:ch. 5). 
30 Hoffmeyer, J. & Kull, K. (2007:262) “Baldwin and Biosemiotics”. In Weber et al eds. (2007). See Christiansen (2012) on language tasks harnessing a domain-general neural system. 31 Schilhab et al. (2012:3). 32 The 1997 thesis was updated by Deacon in essays in Weber et al eds. (2007), then further sharpened by Deacon in Schilhab et al (2012:9). Deacon’s Multilevel selection is seen (among other related positions) as elevating the Baldwin effect from “triviality”. Also, see Yamauchi & Hashimoto (2010). 33 Deacon, T.W. (2007: 81). Multilevel Selection in a Complex Adaptive System: The Problem of Language Origins. In Weber et al eds. (2007).
45
prefrontal area to many pre-existing areas and organs such as the vocalization system (i.e. no single language module) 34 , (3) the integration of Peirce’s semiotics through weaving the symbolic innovation of Homo sapiens into previously evolved cognitive-perceptual layers (for icon and index), and (4) the anthropological evolution of social bonds that use previously evolved capacities as well as the symbolic innovation to extend relationships in time (social loyalties). 35 Linguistic “features” like recursion are not proprietary to a language “module” but rather are implicated in evolved symbolic semiosis of any kind and thus freely donated to language emergence 36 . This set of perspectives represents an epistemic change over the “language module” view of the earlier era as well as other implicit assumptions. An evolutionary perspective (e.g. Porges) has a multidimensional and layered model of change, leveraging the conserving characteristic of evolution itself. Older components can serve newer components in new settings. Newer components can “take over” older components and extend them. With an evolutionary view, cultural forms along with newer emergent abilities could be positioned as evolving to “harness” evolved older capacities.37 Language could evolve to match and ride on top of these human faculties 38 (many cognitive or semiotic hurdles had already been achieved before verbal language arrives on the scene.) Interestingly, neuroscience shows that between birth and age 3, human infants’ brains grow at a very rapid rate in the context of and in response to attachment experiences and communications with caregivers. The mainly right-brain communicative capabilities (visual and auditory) are non-verbal but increasingly effective. The auditory channel uses tonal-prosodic-rhythmic perception and the visual interprets facial gesture. The result is an intimate semiosis between two beings able to extract meaning from sounds and facial signs (Schore, 2011)39. 
In a substantial way, this preverbal early period helps to lay the groundwork for verbal language acquisition. There is evidence that the child learns meta-lessons about meaning and symbolic reference in a non-verbal world before she speaks complete sentences. As another example, there is evidence that numerosity
34
Seung, S. (2012) describes neonate brain development (the role of epigenetic growth is major). This summary was derived from the Introduction in Schilhab et al. (2012). See Diller & Cann (2013:248). They cite Deacon, T.W. (2003). Universal grammar and semiotic constraints. In Christiansen, M.H. and Kirby, S. (eds.), Language evolution. Oxford: Oxford UP. Also, see Hurford (2011:572-576). 37 Changizi (2011: ch. 1). Also, see Yamauchi & Hashimoto (2010:280-282). 38 The history of writing systems echo the verbal story. See Changizi (2009: ch. 4) and Wolf (2007:217): they agree that writing systems had to evolve to harness extant features of human biology. 39 Schore (2011). Modern attachment theory is enhanced by neuroscience. Early childhood brain development is increasingly understood in the light of neuroplasticity and neuro-malleability. See also C. Trevarthen “The Functions of Emotion in Infancy” in Fosha et al. (2009:ch. 3): “Clearly, the human brain is both an intentional organ and an intersubjective one before it is a linguistic one…” Also, see Hart (2011:ch. 2). See Cuccio (2013:3-6) on infant theory of mind functioning in a preverbal setting. 35 36
46
itself is a symbolic layer occurring well before language acquisition. 40 Moreover, early childhood symbolic play after 2 years adds a further layer of sophistication to the picture (e.g. Adams (2010:29) and Hart (2011:ch.3)). Recent work on consciousness 41 positions it right in the middle of Deacon’s “four legs” (see above) in a major reconceptualization. Deacon makes a strong case that the emerging complexity of a goal-oriented human organism created adaptive pressure for a functional consciousness able to mediate between organism and environment. Consciousness did not evolve separate from what the body was doing in the world. It evolved out of the escalating interaction between organism and environment with the requirement that the organism factor its own reality into that interaction (requiring sentience of sentience).42 Deacon summarizes: “For animals with brains, the organism and its distinctive teleodynamic characteristics will likely fail to persist (both in terms of resisting death and reproducing) if its higher-order teleodynamics of self-prediction [my emphasis] fails in some respect.” (Deacon, 2011:526) The importance of goal setting (intentionality) is that the brain (and awareness) is charged with the job of incessantly mediating between the organism and the environment (over time) to execute the goal (micro or macro). As organisms became more complex in their properties and functions, the brain increased its role to perform this mediative job, including its own being into its perceptual sets in order to negotiate self-predictive interaction with the environment. The temporal position of consciousness is in the “negative” space of the anticipated moment – vision and hearing are similarly occupied with the predictive moment to come43. System Language entity
Memory Academia
Table 1. Pre-1990 versus Current: Topical Comparisons Pre-1990 2014 (Some Just Emergent) Distributed support. Balwinian An innate Module.44 niche construction45. Biosemiotics “Phrenology” & localization brain assumptions. describes its precursors.46 Computational syntax model. More austere and pre-set. Rich and dynamic memory.47 Departmental silos. Exclusivity. Interdisciplinary. Inclusivity.
40 Coolidge & Overmann (2013).
41 Deacon (2011). See Huron (2009:108) for auditory predictive awareness, Changizi (2009) for visual predictive awareness, Posner et al (2012), and Weber et al eds. (2007:ch. 13). Lieberman (2013) aligns with Deacon on the lack of zombie processes auto-mediating between the intent-driven human organism and a complex environment.
42 Hoffmeyer, J. & Kull, K. (2007:258-263) “Baldwin and Biosemiotics”. In Weber et al eds. (2007).
43 Changizi (2009) for vision and Huron (2006) for auditory. Both channels are cognitively wired for predictive awareness. We see/hear what is based on prediction/anticipation (what is to be).
44 Lieberman (2013:5). “Innateness” creates an irresolvable genetic chicken-and-egg problem.
45 Yamauchi & Hashimoto (2010).
46 The founders of the modern interdiscipline were Thomas Sebeok and Thure von Uexkull. See Nattiez (1990 trans: ch. 1) and Weber et al eds. (2007:253-272) on biosemiotics as fundamental.
47 Bybee (2010: ch. 2). Also, Seung (2012:70-74).
Table 1. Continued

System | Pre-1990 | 2014 (Some Just Emergent)
Consciousness | Behaviorism: denial of interiority (earlier). Reified. Persistence of Cartesian mind-body. Consciousness: a “thing” & a “location”. | Functional integration with life forms. Intrinsic to life forms.48 Mind-body merge. Consciousness is embodied activity in time.49
Primate Capabilities | Less capable. Behaviorism held these assessments back. | More capabilities are seen to be present in “lower” animals once thought uniquely human.50
Altriciality | More constrained sense of infant intelligence and brain development. | Modern attachment theory51 and evolutionary perspective.52 Infant intelligence and neuroplasticity.53
Psychology and Psychotherapy | Emphasis on cognitive-behavioral, left-brain priority, drugs, behavior modification. | Right-brain emphasis,54 the unconscious, neuroplasticity, neurobiology, relational, affect-regulation.55
48 Deacon (2011). The general decline of Cartesian and Behaviorist tacit assumptions is essential.
49 Hoffmeyer, J. & Kull, K. (2007:258-263) “Baldwin and Biosemiotics”. In Weber et al eds. (2007).
50 de Waal (2013).
51 Hart (2011), Gopnik (2009), & Schore (2011).
52 Konner (2010).
53 Gopnik (2009). Gopnik describes infants’ surprising abilities to deal with truth vs. fiction, counterfactuals and improbabilities, and good vs. evil.
54 McGilchrist (2009: Part II).
55 Schore (2011), Greenspan et al. (2004), and Panksepp (2012).

Selected References

Adams, K. (2010). Unseen worlds: Looking through the lens of childhood. London: Jessica Kingsley.
Ascher, R. et al (1964). The human revolution. Current Anthropology, June 1964. Vol. 5 (3): 135-147.
Bannan, N. (2012). Music, language, and human evolution. Oxford: Oxford UP.
Bogolepova, I.N. & Malofeeva, L.I. (2001). “Characteristics of the Development of Speech Motor Areas 44 and 45”. Neurosci. Behav. Physiol. Vol 31, No 4, 8 July 2001.
Bybee, J. (2010). Language, usage, and cognition. Cambridge: Cambridge UP.
Changizi, M. (2009). The vision revolution. Dallas: BenBella.
Changizi, M. (2011). Harnessed. Dallas: BenBella.
Christiansen, M.H. et al (2012). Similar neural correlates for language and sequential learning. Language and Cognitive Processes 2012, 27(2), 231-256.
Coolidge, F. L., & Overmann, K. A. (2013:ch.7). The archaeology of number concept. In Botha, R. et al eds. (2013). The evolutionary emergence of language. Oxford: Oxford UP.
Cuccio, V. (2013). From a bodily-based format of knowledge to symbols. Biosemiotics. DOI 10.1007/s12304-013-9184-6.
Deacon, T. (1997). The symbolic species. New York: Norton.
Deacon, T. (2011). Incomplete nature. New York: Norton.
Diamond, J. (2000). The rise and fall of the third chimpanzee. London: Radius.
Diller, K.C. & Cann, R.L. (2013:ch.13). Genetics, evolution, and the innateness of language. In Botha, R. et al eds. (2013). The evolutionary emergence of language. Oxford: Oxford UP.
Elordieta, G. & Prieto, P. (eds.) (2012). Prosody and meaning. Berlin: de Gruyter Mouton.
Everett, D. (2012). Language: The cultural tool. New York: Pantheon Books.
Fosha, D., Siegel, D.J., & Solomon, M. eds. (2009). The healing power of emotion. New York: Norton.
Gopnik, A. (2009). The philosophical baby. New York: Farrar, Straus, & Giroux.
Greenspan, S. & Shanker, S.G. (2004). The first idea. Cambridge: Da Capo.
Harnad, S.R. et al (eds) (1976). Origins and evolution of language and speech. Annals of the New York Academy of Sciences vol. 280.
Hart, S. (2011 trans). The impact of attachment. New York: Norton.
Hattori, Y., Tomonaga, M. & Matsuzawa, T. (2013). Spontaneous synchronized tapping to an auditory rhythm in a chimpanzee. Sci. Rep. 3, 1566; DOI:10.1038/srep01566
Hrdy, S. B. (2009). Mothers and others. Cambridge: Belknap.
Hurford, J. (2011). The origins of grammar. Oxford: Oxford UP.
Huron, D. (2006). Sweet anticipation. Cambridge: MIT Press.
Iacoboni, M. (2009). Imitation, empathy, and mirror neurons. Annu. Rev. Psychol. 60:653-70.
Kenneally, C. (2007). The first word. New York: Penguin.
Konner, M. (2010). The evolution of childhood. Cambridge: Belknap.
Lieberman, P. (2013). The unpredictable species. Princeton: Princeton UP.
Malloch, S. & Trevarthen, C. (2009). Communicative musicality. Oxford: Oxford UP.
McGilchrist, I. (2009). The master and his emissary. New Haven: Yale UP.
Nattiez, J. (1990 trans). Music and discourse. Princeton: Princeton UP.
Panksepp, J. & Biven, L. (2012). The archaeology of mind: Neuroevolutionary origins of human emotions. New York: Norton.
Patel, A. D. (2010). Music, language, and the brain. Oxford: Oxford UP.
Porges, S. W. (2011). The polyvagal theory. New York: Norton.
Posner, M. I. (2012). Cognitive neuroscience of attention (2nd ed.). New York: The Guilford Press.
Schilhab, T., Stjernfelt, F., & Deacon, T. eds. (2012). The symbolic species evolved. New York: Springer.
Schore, A. N. (2011). The science of the art of psychotherapy. New York: Norton.
Seung, S. (2012). Connectome: How the brain’s wiring makes us who we are. Boston: Houghton Mifflin.
Spitzer, M. (2004). Metaphor and musical thought. Chicago: University of Chicago Press.
Wallin, N. L., Merker, B. & Brown, S. (2000). The origins of music. Cambridge: MIT Press.
Weber, B.H. & Depew, D.J. (2007). Evolution and learning: The Baldwin effect reconsidered. Cambridge: MIT Press paperback.
Wenseleers, T. et al (2010:ch. 6). “Social evolution theory: a review”. In Szekely, T. et al (eds) (2010). Social behavior: Genes, ecology and evolution. New York: Cambridge UP.
Wolf, M. (2007). Proust and the squid. New York: Harper.
Yamauchi, H. & Hashimoto, T. (2010). Relaxation of selection, niche construction, and the Baldwin effect in language evolution. Artificial Life (MIT) Fall 2010, Vol. 16, No. 4, pp. 271-287.
STRATEGIES FOR THE EMERGENCE OF FIRST-ORDER PHRASE STRUCTURE
EMILIA GARCIA CASADEMONT AND LUC STEELS
Institut de Biologia Evolutiva (UPF-CSIC) PRBB, Barcelona

We introduce and investigate strategies that lead to the emergence and sharing of first-order phrase structure in a population of agents playing language games. First-order phrase structures combine words into phrases but do not yet generalise to hierarchical or recursive phrases. We argue that syntax is motivated by the need to reduce cognitive effort, mainly by avoiding combinatorial search in parsing, and by the need for communicative accuracy.
1. Introduction

There has already been a substantial body of earlier work on the emergence of phrase structure. One widespread hypothesis (following from research in Iterated Learning) is that language learners use a learning algorithm (for example minimum description length learning) that seeks structure in the data, even if the data is not or only weakly structured (Smith, Kirby & Brighton, 2003). Once they have hypothesised structure, by overgeneralisation, learners impose it on their own utterances as speakers so that the next generation of learners picks up this structure again and possibly imposes more structure of their own. In this approach, the introduction of phrase structure is exclusively in the hands of the learner and is motivated by overcoming the transmission bottleneck.

In contrast, we argue here that syntax arises from the need to avoid combinatorial explosions in parsing and semantic ambiguity in interpretation. So we seek a functionalist as opposed to structuralist explanation. Moreover, the way structure arises is not through a transmission bottleneck but by stepwise invention, adoption and alignment of linguistic conventions in a population of agents based on a cultural selectionist dynamics (Steels, 2012). To develop the argument, we first consider the question of why human languages have syntax and then by what strategies this could arise.

2. Why do human languages have syntax?

Let us use the standard framework of the predicate calculus for representing meaning. Su, a situation model, is equal to a set of facts WSu = {f_1, ..., f_n}, where
each fact f_i = p_j(o_k) is a proposition stating that a predicate p_j is true for an object o_k in the present situation Su. A predicate p_j is decomposed into an attribute a_f (for example ‘color’) and a value v_g (for example ‘blue’), written as p_j = a_f-v_g, as in ‘color-blue’. In the present study, all predicates have only a single argument. Higher-order phrase structure requires predicates with multiple arguments, but this is outside the scope of this paper.

An object-description is equal to the set of all predicates in WSu that have the same object as argument. We assume a shared ontology consisting of an indefinite number of abstract attributes and values and a shared lexicon consisting of a set of words w_1, ..., w_m, where each word w_d introduces a predicate p_e and an argument using a variable ?v_f, as in w_d = p_e(?v_f) [a]. When using a lexical strategy, the speaker looks up in the lexicon the minimal set of words that expresses all the object-descriptions he wants to convey and transmits these words in any order to the hearer. The hearer then recovers the predicates p_1(?v_1), ..., p_l(?v_m) by consulting the lexicon and then attempts to find bindings for all the variables, such that for every word w_d in the utterance with meaning p_e(?v_f), there is a fact in the situation model f_i = p_j(o_k) where p_j = p_e and ?v_f is bound to o_k.

By way of illustration, we use Spanish as the object language and English for the names of predicates. For example, consider the following mini-lexicon:

word | meaning
verde | color-green(?x)
rojo | color-red(?u)
pequeño | size-small(?y)
grande | size-big(?v)
cubo | shape-cube(?z)
esfera | shape-sphere(?w)
Then, in order to express the two object-descriptions {color-green(o1), size-big(o1), shape-cube(o1)} and {shape-cube(o2)}, the speaker produces “verde cubo cubo grande” using free word order. The hearer recovers the predicates color-green(?x), size-big(?v), shape-cube(?z), shape-cube(?z2). When matching this to a situation model WS1 = {color-green(o1), size-big(o1), shape-cube(o1), color-red(o2), shape-cube(o2)}, the following consistent binding set is found: {?x = o1, ?v = o1, ?z = o1, ?z2 = o2}.

It often happens, as in this example, that several variables bind to the same object. It is useful to represent this information in advance of attempting to match the predicates against the situation model because then the matcher has to do less combinatorial work. This can be done by replacing all the variables that should bind to the same object by a single one, as in {color-green(?x), size-big(?x), shape-cube(?x)}.
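The matching step of the lexical strategy can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names `parse` and `consistent_bindings`, the fresh variable names `?v1`, `?v2`, ... and the brute-force search are all assumptions made here. The naive matcher returns every consistent assignment, of which the binding set quoted in the text is one.

```python
from itertools import product

# Mini-lexicon from the text: word -> predicate name.
LEXICON = {
    "verde": "color-green", "rojo": "color-red",
    "pequeño": "size-small", "grande": "size-big",
    "cubo": "shape-cube", "esfera": "shape-sphere",
}

def parse(utterance):
    """Return one (predicate, variable) pair per word, each with a fresh variable."""
    return [(LEXICON[w], f"?v{i}") for i, w in enumerate(utterance.split(), start=1)]

def consistent_bindings(predicates, situation):
    """Brute force (an assumption of this sketch): try every assignment of
    objects to variables and keep those that turn all predicates into facts."""
    objects = sorted({obj for _, obj in situation})
    variables = sorted({var for _, var in predicates})
    found = []
    for assignment in product(objects, repeat=len(variables)):
        binding = dict(zip(variables, assignment))
        if all((pred, binding[var]) in situation for pred, var in predicates):
            found.append(binding)
    return found

# Situation model from the text, as (predicate, object) facts.
situation = {
    ("color-green", "o1"), ("size-big", "o1"), ("shape-cube", "o1"),
    ("color-red", "o2"), ("shape-cube", "o2"),
}

# Free word order: four variables, so 2**4 assignments are tried.
free = consistent_bindings(parse("verde cubo cubo grande"), situation)

# Equalised variables: two variables, so only 2**2 assignments are tried.
equalised = [("color-green", "?x"), ("size-big", "?x"),
             ("shape-cube", "?x"), ("shape-cube", "?y")]
grouped = consistent_bindings(equalised, situation)
```

With free word order the matcher searches over one variable per word; once variables are equalised the same situation is searched over one variable per object, which is the reduction in combinatorial work described above.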
We call a list of predicates with equalised variables a predicate combination, and a set of predicate combinations that covers the whole utterance a predicate combination hypothesis, or simply hypothesis.

[a] Following standard AI convention, variables are denoted by symbols preceded by a question mark.

The comprehension process can be decomposed into two subprocesses: (i) Generate all possible hypotheses for the predicates provided by the words in an utterance so far and (ii) Test the hypotheses by keeping only those for which there exists a consistent binding set with the current (shared) situation model. The set of all hypotheses is called H (for hypothesis set) and the set of all hypotheses for which a consistent set of bindings can be found in the situation model is called M (for meaning set). If the cardinality of M is greater than 1 then there is semantic ambiguity.

How does H scale in relation to the number of words in an utterance? The number of hypotheses B_m is equal to the number of partitions of the set D of words in an utterance of size m, where a partition of D is defined as a set of nonempty, pairwise disjoint subsets of D whose union is D. B_m is known as the Bell number and is defined using the following equation (Bell, 1938):

\[ B_{m+1} = \sum_{k=0}^{m} \binom{m}{k} B_k \qquad (1) \]
with B_0 = B_1 = 1. So B_m, the cardinality of H, grows faster than exponentially with the number of words in the utterance (see Figure 1 (right)). This means that the sentence you are now reading (which contains 20 words) generates 51,724,158,235,372 partitions and hence possible hypotheses. (Assuming of course that there is no syntax.)

So this is the core of the problem. Without some way to limit the number of hypotheses, the lexical strategy will not be effective for utterances which contain more than a few words. We argue that this is the reason why syntax is used in human languages. Indeed, if extra information can be provided by the speaker so that the hearer can restrict the set of hypotheses, or consider as quickly as possible only those that are relevant given the ontology and current situation, then the combinatorial explosion in the interpretation process can be drastically reduced.

We use a variant of the Naming Game (originally introduced in Steels, 1995) to explore how this can be done, using the same methodology as used already for investigating the origins of agreement (Beuls & Steels, 2013), namely, by formulating and studying increasingly more sophisticated strategies until we end up with strategies that give rise to the kinds of structures we find in human languages, in this case phrase structure. Each strategy should be an improvement on the previous one, either by reducing the size of the set of possible meanings generated by the parser, or by reducing the inventory of constructions and thus the cognitive effort that is needed to store or learn them.

In the Naming Game, the speaker describes a set of properties of an object in the shared context using an utterance and the hearer then examines whether the meaning fits with the current situation. There can be more than one object and the hearer does not know how many. Moreover there is more than one word for the same object.
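The Bell-number count quoted in this section can be checked directly from the recurrence in equation (1). The following is a small sketch; the function name `bell_numbers` is chosen here for illustration.

```python
from math import comb

def bell_numbers(limit):
    """Return [B_0, ..., B_limit] using B_{m+1} = sum_{k=0}^{m} C(m, k) * B_k."""
    bell = [1]  # B_0 = 1
    for m in range(limit):
        bell.append(sum(comb(m, k) * bell[k] for k in range(m + 1)))
    return bell

bell = bell_numbers(20)
print(bell[20])  # 51724158235372, the figure quoted for a 20-word utterance
```

The first few values (1, 1, 2, 5, 15, 52) already show how quickly the hypothesis set of the lexical strategy outgrows what a hearer could test one by one.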
3. The Grouping Strategy

A first step towards a reduction of the set of hypotheses is achieved when the speaker puts the elements which refer to the same object together. We call this the grouping strategy (see Figure 1). For example, to express the two object-descriptions {color-green(o1), size-big(o1), shape-cube(o1)} and {shape-cube(o2)} with the words “verde”, “cubo”, “grande” and “cubo”, the speaker puts “verde”, “cubo” and “grande” together as in “verde cubo grande cubo”, instead of in some random order as before, such as “verde cubo cubo grande”.
Figure 1. Left: The grouping strategy progressively builds a search tree where the nodes provide the predicate combinations obtained from each consecutive word. When a word is encountered, its meaning is either added to an existing predicate combination on that path (extend), or a new predicate combination is started (add). Right: Scaling behavior of the lexical strategy and the grouping strategy. The x-axis plots m, the length of the utterance, and the y-axis the cardinality of H on a logarithmic scale.
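The extend/add search tree described in the caption can be sketched as follows. This is an illustrative reading of the grouping strategy, assuming that a new word can only extend the most recent group (since words for the same object are adjacent); the function name `grouping_hypotheses` is hypothetical.

```python
def grouping_hypotheses(words):
    """Enumerate every split of a word sequence into contiguous groups."""
    if not words:
        return [[]]
    hypotheses = [[[words[0]]]]  # the first word always starts a group
    for word in words[1:]:
        next_level = []
        for hyp in hypotheses:
            # extend: the word joins the most recent group on this path
            next_level.append(hyp[:-1] + [hyp[-1] + [word]])
            # add: the word starts a new group
            next_level.append(hyp + [[word]])
        hypotheses = next_level
    return hypotheses

hyps = grouping_hypotheses("verde cubo grande cubo".split())
print(len(hyps))  # 8, i.e. 2 ** (4 - 1)
```

Each word after the first doubles the number of paths, which gives 2^(m-1) hypotheses for an utterance of m words.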
Does the grouping strategy help to restrict the set of hypotheses? The computational complexity of the grouping strategy can be derived in a straightforward way from the underlying algorithm. The number of possible hypotheses in H is now 2^(m-1), with m the number of words in the utterance. The growth of hypotheses in relation to the utterance length is therefore significantly less than the Bell number (see Figure 1), but it is still exponential and does not really allow scaling up the size of the utterance yet. For example, for a sentence of 20 words we still have 524,288 possible combinations.

4. The Sequencing Strategy

A more sophisticated strategy is to introduce ordering among the words in a group, which implies that each agent has to store and learn an inventory of grammatical constructions which recognise and impose this ordering for a particular object description. Ordering helps in two ways:

(i) The boundaries of a group occasionally become delineated. For example, consider the utterance “pequeño verde esfera rojo grande cubo”. If the word sequences “pequeño verde esfera” and “rojo grande cubo” are already known to the hearer, then he can infer that there is a boundary between “esfera” and “rojo”. Some word combinations cannot appear because the same attribute (color) cannot occur more than once in the same object-description. On the other hand “esfera rojo” or “esfera rojo grande” may appear.

(ii) Stored sequences act like chunks (Figure 2 (left)). The building of a predicate combination matching with a complete object description can happen in one step instead of several, so that the search space shrinks. For example, “pequeño verde esfera rojo grande cubo” can be parsed in two steps rather than 5.
Figure 2. Left: The sequencing strategy stores a sequence of words as a single construction, so that the search space shrinks. Right: Convergence of constructions comparing the sequencing strategy and the pattern strategy (discussed in the next section). Both lead quickly to convergence of an optimal set for the situations the agents communicate about. Clearly the pattern strategy leads to fewer patterns and faster convergence.
How is an inventory of sequence constructions established? The speaker first uses the existing sequences in his inventory to produce an utterance and if some meanings are left unexpressed, he uses the grouping strategy with a random order but then stores the chosen order for the future. The hearer also uses first his existing inventory and the grouping strategy if some words do not form a sequence yet. After he has established a consistent unique interpretation, the hearer can infer which sequences are missing from his inventory and build new grammatical constructions. Because different speakers may invent different orderings, a mechanism is needed to allow the population to reach convergence. Each construction has an associated score. The sequences with the highest scores are used first (by speaker and hearer). After the game, the hearer uses a lateral inhibition learning rule to update the scores (Steels, 1998): The scores of winning constructions are increased and competitors (i.e. constructions with the same meaning but a different word ordering) are decreased. Figure 3 shows that this strategy leads to the typical winner-take-all dynamics of lateral inhibition, also observed in the Naming Game, where after an initial growth, variation is damped and the shared inventory settles on the optimum number (which is 20 in this case).
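The lateral inhibition update described above can be sketched as follows. The text does not give the update rule in detail, so the step size `delta`, the clipping of scores to the interval [0, 1] and the function name are all assumptions made for illustration.

```python
def lateral_inhibition(scores, winner, competitors, delta=0.1):
    """Reward the construction that was used successfully and punish
    constructions with the same meaning but a different word order.
    delta and the [0, 1] score range are assumptions of this sketch."""
    updated = dict(scores)
    updated[winner] = min(1.0, updated[winner] + delta)
    for competitor in competitors:
        updated[competitor] = max(0.0, updated[competitor] - delta)
    return updated

# Three competing orderings for the same object description.
scores = {
    "verde cubo grande": 0.5,
    "cubo verde grande": 0.5,
    "grande cubo verde": 0.5,
}
updated = lateral_inhibition(
    scores,
    winner="verde cubo grande",
    competitors=["cubo verde grande", "grande cubo verde"],
    delta=0.25,
)
# Speaker and hearer try the highest-scoring sequence first.
best = max(updated, key=updated.get)
```

Because the winner gains while every competitor loses on each successful game, a small early lead tends to be amplified, which is consistent with the winner-take-all dynamics reported for this strategy.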
Figure 3. The x-axis plots the number of games played. At each time instant two randomly chosen agents interact. The y-axis plots the running average of scores for each sequence construction. The population size is 10 (but could of course be scaled up). A clear winner-take-all effect is observed as one ordering becomes dominant for each possible object description. This happens very quickly after less than 2000 games (which is on average 400 per agent for a population of 10).
Figure 4 shows that the sequencing strategy is effective in minimising the number of constructions that agents have to apply before they find a solution. The sequencing strategy also reduces the frequency of semantic ambiguity in the interactions (see Figure 5 (left)).
[Figure 4: two panels, Grouping Strategy (left) and Sequencing Strategy (right). x-axis: Interactions (0 to 8000); y-axis: Computational Cost, 6-word utterance (0 to 20).]
Figure 4. Time evolution of the number of construction applications needed for the grouping strategy (left) and the sequencing strategy (right) for utterances of size m=6. For each language game (shown on the x-axis), the number of construction applications is shown (on the y-axis). We see that the sequencing strategy requires fewer computational steps.
5. The Pattern Strategy

Although the sequencing strategy leads to an improvement in search efficiency, this benefit is traded against memory load. The number of sequences is equal to the number of possible word sequences describing the same object and this grows exponentially with the number of attributes. Let a be the number of attributes and v the number of values for the attributes, assumed to be constant, then the maximum number of sequences is equal to

\[ \sum_{n=2}^{a} \binom{a}{n} \cdot v^{n} \cdot n! \qquad (2) \]
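Equation (2) can be evaluated directly. The sketch below simply computes the sum as written; the function name `max_sequences` is chosen here for illustration.

```python
from math import comb, factorial

def max_sequences(a, v):
    """Evaluate sum_{n=2}^{a} C(a, n) * v**n * n! for a attributes and v values."""
    return sum(comb(a, n) * v ** n * factorial(n) for n in range(2, a + 1))

print(max_sequences(3, 2))  # 72, the value for the present experiments
```

Evaluated as written, the sum gives 72 for a = 3 and v = 2, matching the figure reported for the present experiments. The larger values quoted in the text for a = 6, 8 and 10 are close to 3^a and do not follow from this sum (which gives 75960 for a = 6), so they may have been computed from a variant without the n! factor.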
For example, for v = 2, we get 727 sequences for a = 6, 6560 for a = 8, 59048 for a = 10, etc. Given that a = 3 and v = 2 in the present experiments, this is equal to 72. Agents do not make all possible sequence constructions because once they have already acquired a sequence that covers an object description the speaker does not need to make a new one. Nevertheless, the sequencing strategy does not adequately scale.

A solution is to use patterns, i.e. sequences defined in terms of lexical categories (or parts of speech) such as noun, adjective, adverb, etc. We call this the pattern strategy. It will restrict the inventory to manageable proportions while retaining the same advantages as the sequencing strategy, i.e. generating fewer hypotheses and shrinking the search space.

To implement the pattern strategy, the lexicon now stores with every word w_i not only the meaning but also an associated set of categories cat(w_i), and instead of sequence constructions we get pattern constructions that associate a combination of lexical categories with a particular sequential order. A pattern p is further denoted in terms of a sequence of lexical categories: p = [c_1, ..., c_n].

How do agents build (as speakers) and learn (as hearers) new patterns? Three primitive actions are needed: initialise, coerce and reuse:

[1] Initialise. Initially, no patterns exist and words have empty category sets. When a group has been constructed by the speaker or the hearer (using the grouping strategy discussed earlier), a new pattern is built by (i) creating a new category for each word in the group and adding this category to the category-set of each respective word in the lexicon, and (ii) creating the corresponding pattern construction for this sequence of categories.

[2] Coercion. When a pattern is partially matching with a subsequence of words in the input, then the word whose category is not matching can be coerced into filling that slot by adding the expected category c_j to the possible categories of the word. This is like coercing a noun (such as “google”) to be used as a verb (as in “she googled me on the Internet”). Coercion minimises the number of patterns that are used.

[3] Reuse. When a new pattern is created for a group of words (using initialise) and a word already belongs to some lexical categories, then it is advisable to reuse one of the existing categories of this word in the new pattern. Reuse minimises the number of categories.

Experimental results show unequivocally that the semiotic dynamics generated by these primitive operations together with the lateral inhibition learning rule leads to a shared grammar within the population of agents (see Figure 2 (right)).
As expected, the pattern strategy leads to fewer patterns compared to the sequencing strategy and hence to faster convergence. We get the same results with respect to efficiency as shown in Figure 4. Both the categories and the patterns are different for different agents, but we can nevertheless investigate how they converge by using MDS-plots (Figure 5).

Figure 5. Left: The bar plot compares the frequency of interactions with more than one possible interpretation after matching against the world model (y-axis: Semantic Ambiguity; bars: Grouping Strategy, Sequencing Strategy). Right: A Multi-Dimensional Scaling plot (axes Dim1 and Dim2) of the different categories of the agents. Clusters emerge, showing that agents have developed similar lexical categories.
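The three primitives of the pattern strategy (initialise, coerce, reuse) can be sketched for a single agent as below. This is a simplified illustration, not the authors' implementation: the class `PatternAgent`, its method names and the category labels `SYN-1`, `SYN-2`, ... are hypothetical, coercion is restricted to groups of the same length as the pattern with exactly one misfitting word, and scores and lateral inhibition are left out.

```python
from itertools import count

class PatternAgent:
    def __init__(self):
        self.categories = {}   # word -> set of lexical category names
        self.patterns = []     # each pattern is a list of category names
        self._ids = count(1)

    def initialise(self, group):
        """Build a pattern for a group of words. A word that already has a
        category reuses it (reuse); otherwise a new category is created."""
        pattern = []
        for word in group:
            cats = self.categories.setdefault(word, set())
            if cats:
                category = sorted(cats)[0]             # reuse
            else:
                category = f"SYN-{next(self._ids)}"    # new category
                cats.add(category)
            pattern.append(category)
        if pattern not in self.patterns:
            self.patterns.append(pattern)
        return pattern

    def matches(self, pattern, group):
        """True if every word carries the category expected at its position."""
        return len(pattern) == len(group) and all(
            cat in self.categories.get(word, set())
            for cat, word in zip(pattern, group))

    def coerce(self, pattern, group):
        """If exactly one word does not fit, add the expected category to it."""
        if len(pattern) != len(group):
            return False
        misfits = [(cat, word) for cat, word in zip(pattern, group)
                   if cat not in self.categories.get(word, set())]
        if len(misfits) != 1:
            return False
        cat, word = misfits[0]
        self.categories.setdefault(word, set()).add(cat)
        return True

agent = PatternAgent()
p1 = agent.initialise(["verde", "cubo"])             # two new categories
coerced = agent.coerce(p1, ["rojo", "cubo"])         # "rojo" joins the category of "verde"
p2 = agent.initialise(["grande", "verde", "esfera"]) # "verde" reuses its category
```

In this sketch, coercion lets “rojo cubo” be covered by the pattern first built for “verde cubo”, so no second pattern is stored, and reuse keeps “verde” in a single category across patterns, in line with the stated roles of the two operations.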
6. Conclusions

Clearly there are many opportunities to extend this work. By using slightly more intelligent strategies the learning efficiency and effectiveness of both the sequencing and the pattern strategy can be increased. Nevertheless the strategies presented here already demonstrate clearly the main thesis of the paper, namely that syntax is motivated by the need to avoid combinatorial search in parsing and semantic ambiguity in interpretation.

References

Bell, E. (1938). The iterated exponential integers. The Annals of Mathematics, 39.
Steels, L. (1995). A self-organizing spatial vocabulary. Artificial Life Journal, 2(3).
Steels, L. (1998). The Origins of Ontologies and Communication Conventions in Multi-Agent Systems. Journal of Agents and Multi-Agent Systems, 1(2), pp. 169-194.
Beuls, K., and L. Steels (2013). Agent-Based Models of Strategies for the Emergence and Evolution of Grammatical Agreement. PLOS ONE, 8(3), e58960. http://dx.plos.org/10.1371/journal.pone.0058960
Smith, K., S. Kirby, and H. Brighton (2003). Iterated learning: a framework for the emergence of language. Artificial Life, 9(4):371-386.
Steels, L. (ed.) (2012). Experiments in Cultural Language Evolution. John Benjamins Pub., Amsterdam.
WHAT WERE WE TALKING ABOUT? EXCHANGING SOCIAL MODELS AS A ROUTE TO LANGUAGE

MARTIN EDWARDES
Department of Education and Professional Studies, King’s College London, Franklin-Wilkins Building, Stamford Street, London SE1 9NH, United Kingdom

This paper looks at the role that social calculus and the exchange of social models could have played in the process leading to human language. It considers the nature and source of social calculus, its place in the evolution of humans, and its consequences for language. The paper is exploratory rather than evidential, but it does provide a plausible explanation for the appearance of grammatical form in human language.
1. Introduction

There has been considerable discussion on the “how” (e.g. Aitchison, 1996) and “why” (e.g. Dunbar, 2009) of language origins, and some discussion on the “who” (e.g. Johansson, 2013). The “where” of language origins is also now agreed: it is most likely to have happened in Africa, between the evolution of Homo sapiens and our diaspora across the globe (e.g. Tattersall, 2009). However, the “what” question remains largely unaddressed: what particular communicational activity required the suite of functions that typifies human language? It is not enough to view the versatility of grammatical language as the reason for its appearance; that is putting the effect before the cause. Instead, there must have been a particular cognitive function which could only be shared using a complex, language-like system; and, to become shareable, it must have involved information which was advantageous to both sender and receiver.

This paper considers the sharing of knowledge about social relationships as a primary linguistic event, and looks at the events that could have brought it about. Sharing social information is a communicative activity which seems to be exclusively human, a necessary feature of the reputation-driven (Engelmann et al., 2012), altruistically punishing (Boyd et al., 2003), reverse-dominant (Boehm, 1999) culture of modern humans. The paper relies on, and builds upon, Dunbar’s gossip hypothesis (Dunbar, 1996); but it looks at the cognitive and communicative structures that underlie gossip, rather than the grooming and socialisation functions that gossip provides.

2. A cognition of social relationships

A key feature of human socialisation is the ability to map relationships between others in our social group. Like many other animals, we are able to cognitively model other individuals and our relationships with them; and, like other primates, we can also model relationships between other individuals, and use those models to adjust our relationships with those other individuals.

At first glance, the modelling of relationships between other individuals (the two-argument form of A-relationship-B) would seem to be a simple extension of our capacity to model our own relationships with others (the one-argument form of relationship-A). There is, however, a considerable difference between the two forms. Relationship-A represents the capacities to reliably identify other individuals, and to associate emotional tags with those individuals; and both of these capacities seem to be evolutionarily quite ancient (e.g. Cooper et al., 2003). These relationship-A models are intimately personal: they represent our own image of the other individual, and our own emotional reaction to that individual; the modelled other and the emotions attached to that modelled other are closely intertwined.

This contrasts with the A-relationship-B model, where the images of the other individuals and the modelled emotional relationship between them are not our own images of, and relationships with, those individuals. Personally, I may be inimical to both Alf and Beth, but I have to be able to model their friendship as something separate to my own emotions. This also means that my model of Beth’s image of Alf has to be different from my own image of Alf; but the two images also have to represent the same individual.
This problem multiplies as the number of group members increases: I have to try to retain models of everyone’s images of everyone, and somehow produce a coherent understanding of the actual relationships in the group. Where relationship-A modelling requires a simple social arithmetic, A-relationship-B modelling requires a social calculus, or computational grammar. So in this modelling of group relationships we have a complex cognitive activity that requires many of the functions that typify human language. It involves segmentation, in that the modelled individuals and the modelled relationships have to be slotted into a standard form of A-relationship-B; it involves
differentiation, in that the relationships and modelled individuals serve different functions in the standard form; it involves abstraction, in that the relationship A has with B is distinct from my relationships with A and B; and it can involve directionality, in that the relationship in A-relationship-B may not be the same as in B-relationship-A. This last function may rely on the capacity to attribute false beliefs to others, a capacity which chimpanzees do not share with us (Call & Tomasello, 2008); and which, therefore, may well be exclusively human – at least in terms of current primate species.

A-relationship-B calculus is, however, quite ancient in other ways. Cheney and Seyfarth (2007) show how modern baboons (Papio hamadryas ursinus) maintain social hierarchies in which each baboon knows their place. They must give deference to those above them to avoid confrontation, and they expect deference from those below them. The hierarchy is linear and, by itself, it involves simple relationship-A modelling. However, baboons also keep track of the interactions of others in their group. They are able to identify who is making a call from the call itself, and they pay more attention when, for instance, a threat bark from a subordinate is followed by a fear bark from a dominant. Female baboons also seem to understand a hierarchy of families overlaying the individual hierarchy: after a confrontation, a reconciliation with another member of the antagonist’s family counts as a reconciliation with the antagonist. The best explanation for this is that baboons have a cognitive social calculus of A-relationship-B constructs, although they do not use this calculus in their communication. This may be because all baboon signals collocate with the event or object being signalled, as seems to be the case for all nonhuman primates; but the value in communicating A-relationship-B constructs is that they can be signalled when the events and objects are not present.
The capacity to reference absent, and therefore irreal, objects and events is another capacity which may be exclusive to humans among current primate species.

The question for the evolution of language, therefore, is not how A-relationship-B constructs became part of our social cognition, but what led to them becoming communicable. The answer to this question is likely to involve the development of a whole series of cognitive and physical capacities which all need their own explanation. However, the long time period between the likely appearance of social calculus in cognition and its use in communication provides a relaxed timetable for the evolution of all of the necessary capacities. The six million
years from the chimpanzee-human common ancestor to modern humans is time enough. For instance, there is enough time for a full phonological explanation, from complex sounds being made as costly signals (Gintis et al., 2001), through an attentional language-like phonology (MacNeilage, 2008), into a situation where the complex sounds take on their own arbitrary meanings (Hurford, 2007). There is time for the development of a fully co-operative culture, involving vigilant sharing to ensure equitable distribution of resources (Erdal & Whiten, 1994), reverse dominance to suppress alpha behaviour within groups (Boehm, 1999), and co-operative signalling based around modelling the needs and expectations of the signal receiver (Dessalles, 2007). This emphasis on overt collaboration over competition would create a species in which co-operation is the norm, and non-co-operative behaviour is altruistically punished (O’Gorman et al., 2009). Vigilant sharing can also lead to joint attention, turning the individual’s attention outward onto shared events and, in turn, leading to co-operative deixis (Tomasello, 2008). This generates an environment where cultural transmission of complex skills becomes possible: intentional teaching and learning can happen, and a cultural “ratchet effect” can take hold (Boyd et al., 2011). Knowledge becomes robust: it is duplicated across several brains, ensuring that it is not lost when individual brains die.

Within the development of our co-operative culture, there is enough time to explain the co-evolution of human culture and the co-operative signalling needed to support that culture. How this co-evolution could have happened has generated many complementary and competing explanations (e.g. McNamara et al., 2008; Ambrose, 2010; Pinker, 2010; Jablonka et al., 2012), and it should not be seen as either simple or inevitable.
However, while there is currently no single explanation for the co-evolution, the timescale means that we do not need to introduce a sudden or catastrophic evolutionary event to justify its development.
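The contrast drawn in this section between the "social arithmetic" of relationship-A modelling and the "social calculus" of A-relationship-B modelling can be made concrete by simply counting models. The sketch below is an illustration only, under assumptions that are not in the text: each member of a group holds one relationship-A model per other member and one A-relationship-B model per ordered pair of other members (ordered because the relationship may be directional). The function names and the example group sizes are hypothetical.

```python
def relationship_a_count(group_size: int) -> int:
    """Number of one-argument models: my own relationship with each other member."""
    return group_size - 1


def a_relationship_b_count(group_size: int, directed: bool = True) -> int:
    """Number of two-argument models: relationships between pairs of *other* members.

    With directed=True, A-relationship-B and B-relationship-A are counted
    separately, reflecting the directionality discussed in the text.
    """
    others = group_size - 1
    ordered_pairs = others * (others - 1)
    return ordered_pairs if directed else ordered_pairs // 2


# Illustrative group sizes (arbitrary choices, not data from the paper).
for size in (5, 15, 50, 150):
    print(size, relationship_a_count(size), a_relationship_b_count(size))
```

Under these assumptions the first count grows linearly with group size and the second quadratically, which is one way of reading the claim that the modelling problem multiplies as the group grows.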
3. Sharing social relationships

None of these effects, by themselves, required a language-like communication system; but they did set the scene, leading up to the point when humans first
began to share ideas that did require productive complexity, such as social calculus. This was, of course, the moment when we began to use gossip as a social lubricant (Dunbar, 1996); so it is likely that the sharing itself was motivated by the need to create new ways of grooming, or socialising with, each other.

An interesting feature of this sharing of social models is that it doesn’t necessarily rely on truth-values. Any information you share with me about your perception of the relationship between Alf and Beth tells me something about your own relationship with each of them, regardless of whether the utterance represents the actual relationship between Alf and Beth. There is useful information in your utterance beyond what the utterance says, a “meta” level which makes the utterance worth listening to regardless of direct semantic content. Because the receiver is listening to the sender as well as the message, the mere act of utterance creates value in the utterance.

This new way of meaning changes the signalling costs and benefits for both sender and receiver. Utterances can be cheap (and potentially dishonest) in terms of their direct message, while still being costly to the sender (and valuable to the receiver) in terms of their metamessages. In this environment, the true cost of information-giving is reputation, backed by altruistic punishment (Fehr & Gächter, 2002), which will tend to keep the direct message honest; but the difficult-to-fake information in the metamessages provides an added bonus for the receiver.

Once A-relationship-B utterances are being exchanged, other cognitive and linguistic capacities begin to emerge naturally from the signalling environment.
These include:
• Reflective selfhood – when someone offers me a social model that includes me as one of the protagonists, I have to be able to make an image of myself as a third party in the same way I make third-party images of others;
• Grammatical persons – when images of other and self are part of communication, the privileged communicative roles of sender and receiver need to be recognised and modelled;
• Temporality and modality – once the irrealis boundary of absent reference has been crossed, and the need for signal accuracy has been
mitigated, it is possible to introduce information which is not current or even not actual;
• Recursion – because of the conditionality of truth in the offered models, tagging received A-relationship-B models with the identity of the sender (C) provides deniability when they are re-broadcast; which, in turn, means that tagging received A-relationship-B-by-C models with the identity of the sender (D) becomes valuable … and so on. In theory, this iteratively nested tagging requires – or provides – the infinite recursion proposed by Hauser et al. (2002). However, as Dunbar (2004) shows, the number of nested levels actually possible is heavily constrained.

These capacities can emerge naturally out of the sharing of social models, using cognitive mechanisms developed for other purposes (Edwardes, 2010 & forthcoming); like the sharing of social models, they do not each need their own genetic explanation. So, while they extend the range and power of language, they do not rely on a cognitively specialised language engine for their expression. Instead, language develops as a series of responses to particular communicational needs.

4. Conclusion

This paper started with a specific question: what particular communicational activity required the suite of functions that typifies human language? While the sharing of social relationships may not be the only answer possible, it does seem to satisfy many of the issues that an attempt to answer this question inevitably raises. It does not need a special genetic explanation because it seems to be a relatively ancient cognitive mechanism; and, because it does not rely on special genetic explanations, it can be incorporated into a standard model of human evolution. In terms of communication, it does not require novel cognitive systems; and, while it does rely on a new communicative need, that need is justifiable in fitness terms.
Finally, the sharing of social relationships is itself a productive explanation for other aspects of being human, such as our capacity to model ourselves objectively. Shared social calculus may not be the final answer to the question posed above; but, like any scientific hypothesis, it provides an effective working model until something better comes along.
References

Aitchison, J. (1996). The Seeds of Speech: language origin and evolution. Cambridge: Cambridge University Press.
Ambrose, S. H. (2010). Coevolution of Composite-Tool Technology, Constructive Memory, and Language: Implications for the Evolution of Modern Human Behavior. In Current Anthropology, Volume 51, Supplement 1, June 2010, S135-S147.
Boehm, C. (1999). Hierarchy in the Forest: the evolution of egalitarian behaviour. Cambridge: Harvard University Press.
Boyd, R., Gintis, H., Bowles, S. & Richerson, P. J. (2003). The evolution of altruistic punishment. In PNAS, vol. 100, no. 6, 3531-3535.
Boyd, R., Richerson, P. J., & Henrich, J. (2011). The cultural niche: Why social learning is essential for human adaptation. In PNAS, vol. 108, suppl. 2, 10918-10925.
Call, J. & Tomasello, M. (2008). Does the chimpanzee have a theory of mind? 30 years later. In Trends in Cognitive Sciences, Vol. 12, No. 5, 187-192.
Cheney, D. L. & Seyfarth, R. M. (2007). Baboon Metaphysics: the evolution of a social mind. Chicago: University of Chicago Press.
Cooper, J. J., Ashton, C., Bishop, S., West, R., Mills, D. S. & Young, R. J. (2003). Clever hounds: social cognition in the domestic dog (Canis familiaris). In Applied Animal Behaviour Science 81, 229-244.
Dessalles, J. L. (2007). Why We Talk: the evolutionary origins of language. Oxford: Oxford University Press.
Dunbar, R. I. M. (1996). Grooming, Gossip and the Evolution of Language. London: Faber & Faber Ltd.
Dunbar, R. I. M. (2004). The Human Story: a new history of mankind's evolution. London: Faber & Faber Ltd.
Dunbar, R. I. M. (2009). Why only Humans Have Language. In R. Botha & C. Knight (Eds.), The Prehistory of Language. Oxford: Oxford University Press.
Edwardes, M. (2010). The Origins of Grammar: an anthropological perspective. London: Continuum.
Edwardes, M. (forthcoming). Awareness of self and awareness of selfness: why the capacity to self-model represents a novel level of cognition in humans. In Selected Papers from the UK Cognitive Linguistics Conference 4, July 2012.
Engelmann, J. M., Herrmann, E. & Tomasello, M. (2012). Five-Year Olds, but Not Chimpanzees, Attempt to Manage Their Reputations. In PLoS One, Vol. 7, Issue 10, e48433.
Erdal, D. & Whiten, A. (1994). On Human Egalitarianism: An Evolutionary Product of Machiavellian Status Escalation? In Current Anthropology, Vol. 35, No. 2, 175-183.
Fehr, E. & Gächter, S. (2002). Altruistic punishment in humans. In Nature, vol. 415, 10 January 2002, 137-140.
Gintis, H., Smith, E. A. & Bowles, S. (2001). Costly Signaling and Cooperation. In J. theor. Biol. 213, 103-119.
Hauser, M. D., Chomsky, N. & Fitch, W. T. (2002). The Faculty of Language: what is it, who has it, and how did it evolve? In Science, vol. 298, 22 November 2002, 1569-1579.
Hurford, J. R. (2007). The Origins of Meaning: language in the light of evolution. Oxford: Oxford University Press.
Jablonka, E., Ginsburg, S. & Dor, D. (2012). The co-evolution of language and emotions. In Phil. Trans. R. Soc. B 2012 367, 2152-2159.
Johansson, S. (2013). The Talking Neanderthals: What Do Fossils, Genetics, and Archeology Say? In Biolinguistics 7: 035-074.
MacNeilage, P. (2008). The Origin of Speech. Oxford: Oxford University Press.
McNamara, J. M., Barta, Z., Fromhage, L. & Houston, A. I. (2008). The coevolution of choosiness and cooperation. In Nature, Vol. 451, 10 January 2008, 189-192.
O’Gorman, R., Henrich, J. & Van Vugt, M. (2009). Constraining free riding in public goods games: designated solitary punishers can sustain human cooperation. In Proc. R. Soc. B 276, 323-329.
Pinker, S. (2010). The cognitive niche: Coevolution of intelligence, sociality, and language. In PNAS, 11 May 2010, vol. 107, suppl. 2, 8993-8999.
Tattersall, I. (2009). Human origins: Out of Africa. In PNAS, vol. 106, no. 38, 16018-16021.
Tomasello, M. (2008). Origins of Human Communication. Cambridge: MIT Press.
WHY MIGHT SOV BE INITIALLY PREFERRED AND THEN LOST OR RECOVERED? A THEORETICAL FRAMEWORK

RAMON FERRER-I-CANCHO

Complexity and Quantitative Linguistics Lab, TALP Research Center, Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Campus Nord, Edifici Omega, Jordi Girona Salgado 1-3. Barcelona, 08034, Catalonia (Spain)
[email protected]

Little is known about why SOV order is initially preferred and then discarded or recovered. Here we present a framework for understanding these and many related word order phenomena: the diversity of dominant orders, the existence of free word orders, the need of alternative word orders and word order reversions and cycles in evolution. Under that framework, word order is regarded as a multiconstraint satisfaction problem in which at least two constraints are in conflict: online memory minimization and maximum predictability.
1. Introduction

There is converging evidence that SOV (subject-object-verb) or its semantic correlate (agent-patient-action) is a word order emerging at the very origins of language (Dryer, 2005; Pagel, 2009; Gell-Mann & Ruhlen, 2011; Goldin-Meadow, So, Özyürek, & Mylander, 2008). However, the reasons why SOV is initially preferred, and later abandoned or readopted, are not well understood. This requires, in our opinion, introducing a theoretical framework.

The ordering of O, S, and V is a particular case of the ordering of a head (V) with two dependents (O and S). Let us consider a general case where a head and its $n$ dependents (complements or modifiers)^a must be arranged linearly (the 1st element has position 1, the 2nd element has position 2 and so on). Then $D_l$, the online memory cost of placing the head in position $l$ ($1 \leq l \leq n + 1$), can be defined as the sum of the cost of dependencies before and after the head, i.e. (Ferrer-i-Cancho, 2013b)

$D_l = \sum_{d=1}^{l-1} g(d) + \sum_{d=1}^{n+1-l} g(d),$   (1)

^a Although we are blindly borrowing the concept of head and dependent from syntactic theory, link direction or hierarchy are not relevant for our theoretical arguments.
where $g(d)$ is the cognitive cost of a syntactic dependency of length $d$, which is assumed to be a strictly monotonically increasing function of $d$. $D_l$ is minimized when the head is placed at the center (Ferrer-i-Cancho, 2013b). The optimal placement of the head would change if one wished to maximize the predictability of certain elements. To maximize the predictability of the head (e.g., V), the head should be placed last, while to maximize the predictability of the dependents (e.g., S and O) the head should be put first (Ferrer-i-Cancho, 2013a). Therefore, there is a conflict between minimum online memory and maximum predictability provided that $n > 1$.

2. Word order phenomena as a multiconstraint engineering problem

2.1. The diversity of word orders

According to the conflicts above, it is expected that there is not a single winner in the world-wide statistics of the dominant orderings for O, S and V. $N_1$, $N_2$ and $N_3$ are defined, respectively, as the number of languages whose dominant word order has the verb first, second or last. In a sample of 1377 languages, one finds $N_1 = 120$, $N_2 = 499$ and $N_3 = 569$ (Dryer, 2011), suggesting that the two possible strategies for maximizing predictability and the strategy for minimizing online memory expenditure are all represented in world languages. Verb initial orders suggest a strategy of maximizing the predictability of the subject and the object; verb final orders suggest a strategy of maximizing the predictability of the verb; central verbs suggest a strategy of minimizing online memory expenditure. Here it is not intended to establish whether the counts above indicate that a certain strategy is better than another in absolute terms for real languages (this is left for future work).
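The counts just quoted can be turned into shares of the sample with a few lines of arithmetic. The sketch below is only a convenience for the reader: the three counts and the sample size are those given above (Dryer, 2011), while the label given to the remainder (languages without a dominant order) is an assumption made here for the purpose of the illustration.

```python
SAMPLE_SIZE = 1377  # languages in the sample (Dryer, 2011)

# Counts of languages by position of the verb in the dominant order.
counts = {
    "verb first (N1)": 120,
    "verb second (N2)": 499,
    "verb last (N3)": 569,
}

# Assumption: every language not counted above lacks a dominant order.
no_dominant = SAMPLE_SIZE - sum(counts.values())
counts["no dominant order"] = no_dominant

# Share of the whole sample held by each category.
shares = {label: count / SAMPLE_SIZE for label, count in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} languages, {100 * share:.1f}% of the sample")
```

None of the three verb placements reaches a majority of the sample, which is the sense in which there is no single winner.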
Following a similar argument, it has been suggested that word order diversity could emerge from the struggle between two cognitive domains (the computational system of grammar and the direct interaction between the sensory-motor and the conceptual system) trying to impose their preferred structure on human language (Langus & Nespor, 2010). Our approach emphasizes the conflict between universal abstract principles of sequential processing and does not need to resort to any specific cognitive system or cognitive domain.

2.2. Languages lacking a dominant word order

The conflict between online memory and predictability implies that there is no unquestionable placement for the verb. This might explain the existence of the 14% of languages lacking a dominant word order (Dryer, 2011) and why a lack of dominant order is an intermediate stage between SOV, i.e. maximum predictability of the verb, and SVO, i.e. minimum online memory (Pagel, 2009).
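The memory side of this conflict can be checked numerically from Eq. 1. The following is a minimal sketch, assuming the simplest strictly increasing cost, $g(d) = d$; this choice of $g$ and the helper names `memory_cost`, `linear_cost` and `cheapest_positions` are introduced here for illustration and are not taken from the cited work.

```python
from typing import Callable, List


def memory_cost(l: int, n: int, g: Callable[[int], float]) -> float:
    """Eq. 1: online memory cost of a head at position l with n dependents."""
    if not 1 <= l <= n + 1:
        raise ValueError("head position must satisfy 1 <= l <= n + 1")
    before = sum(g(d) for d in range(1, l))          # dependents preceding the head
    after = sum(g(d) for d in range(1, n + 2 - l))   # dependents following the head
    return before + after


def linear_cost(d: int) -> float:
    """Assumed cost function g(d) = d (any strictly increasing g would do)."""
    return float(d)


def cheapest_positions(n: int, g: Callable[[int], float]) -> List[int]:
    """Head positions that minimize Eq. 1 for n dependents."""
    costs = {l: memory_cost(l, n, g) for l in range(1, n + 2)}
    best = min(costs.values())
    return [l for l, cost in costs.items() if cost == best]


# For a head with two dependents (e.g. V with S and O) the central
# position is the cheapest, while head first and head last tie.
for position in (1, 2, 3):
    print(position, memory_cost(position, 2, linear_cost))
print(cheapest_positions(2, linear_cost))
```

With $n = 2$ the central placement (as in SVO) is the unique minimum under this assumed $g$, whereas the placements favoured by predictability (head first or head last) are the most costly ones.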
2.3. Word order reversions in evolution

The conflict between online memory and predictability may shed light on word order reversions and cycles in word order evolution. A typical example is the reversion from SVO to SOV (Pagel, 2009; Gell-Mann & Ruhlen, 2011). For instance, Mandarin Chinese was originally an SOV language and became SVO; it is currently in the process of moving back to SOV and thus displays both orders (Li & Thompson, 1981; Goldin-Meadow et al., 2008). Cycling between SOV and SVO could be interpreted as cycling between two incompatible attractors: maximum predictability of the head and minimum online memory. This cycling could be the manifestation of a bistable system (Strogatz, 1994). Another example of reversion is the transition from SVO to VSO/VOS and then, occasionally, back to SVO (Gell-Mann & Ruhlen, 2011). We do not mean that reversions or cycles (repeated reversions) are a necessary consequence of the conflict between online memory and predictability; rather, that this conflict is a relevant factor underlying back-and-forth word order changes. We are providing a hypothesis rather than a complete explanation.

2.4. Alternative orders with a head at the center

It is well known that SVO is an alternative word order in languages where SVO is not the dominant word order (Greenberg, 1963). This suggests that having SVO as an alternative is a natural consequence of the conflict between maximizing predictability and minimizing online memory: if a language does not end up with a dominant order with the verb at the center, then it should have it as an alternative to compensate for the choice of a dominant order that does not comply with all the constraints.

2.5. Verb last in computer prediction experiments

Computer simulation is a powerful tool for word order research (Reali & Christiansen, 2009; Gong, Minnet, & Wang, 2009).
Recently, SOV has been obtained in two-stage computer simulation experiments with recurrent neural networks (learners) that have addressed the problem of the emergence of word order from the point of view of sequential learning (Reali & Christiansen, 2009). During the first stage, networks learned to predict the next element of number sequences and the best learners were selected. During the second stage, language was introduced and coevolved with the learners. The best language learners and the best learned language were selected. The best language learners had to comply with the additional constraint of maintaining the performance on number prediction of the first stage. Notice that predictability is an explicit selective pressure for the neural networks in both stages and that the languages that were selected in the second stage must have been strongly influenced by the pressure to maximize predictability. This suggests that SOV surfaces in these experiments because postponing the
verb is the optimal strategy when maximizing its predictability (Ferrer-i-Cancho, 2013a).

2.6. The preference for head last in simple sequences and its loss in complex sequences

When a head has at least two arguments and the sender maximizes the predictability of the last element, the last item has to be the head (Ferrer-i-Cancho, 2013a). Recent gestural communication experiments with only one head, i.e. a verb or an action, and two modifiers, i.e. a subject or actor and an object or patient, show a preference for placing the head at the end in simple sequences (Goldin-Meadow et al., 2008; Langus & Nespor, 2010), which suggests that the predictability of the head is being maximized. Notice that the null hypothesis that there is no prior preference for head placement cannot explain this phenomenon. A crucial finding is that this head last preference is lost in complex sequences (Langus & Nespor, 2010). The preference for head last in experiments with simple sequences and its loss in longer sequences suggests that (a) maximizing predictability dominates in short sequences (no memory-predictability conflict) while it competes with online memory for longer sequences and (b) sequence length is a critical parameter in word order phenomena. This means that predictability maximization would be the principle dominating in early stages of language evolution. To support hypotheses (a) and (b), we need to explain why online memory minimization would have been neglected for short sequences and recovered for longer sequences. To see this, consider that
• $n = 2$ is the minimum number of complements or modifiers needed to observe a preference for the verb to appear as the first or the last item of the sequence. This is the number of complements or modifiers in the gestural communication experiments where a preference for the actor (subject) and the patient (object) to precede the action (verb) has been found (Goldin-Meadow et al., 2008; Langus & Nespor, 2010).
• In all these experiments with short sequences, the dominant order emerging is head last (more precisely the sequence actor-patient-action). It has been demonstrated that this placement of the head maximizes the dependency lengths (Ferrer-i-Cancho, 2013a).
• But as there are only $n = 2$ complements/modifiers, the cost of head last is simply $D_3 = g(1) + g(2)$, the smallest among all head last configurations with $n \geq 2$ ($D_3$ is indeed the minimum value of $D_{n+1}$ for $n \geq 2$). The results of the gestural experiments suggest that the online memory cost of a head last configuration can indeed be neglected and thus only predictability matters. As expected, the preference for head last in gestural experiments involving longer sequences (for instance a main clause and a
subordinate clause) disappears, suggesting that the conflict between online memory minimization and predictability concerns especially sequences that are long or complex enough.
• However, $n = 2$ does not seem to be enough to warrant that the online memory cost can be neglected, since it has been shown above that about one third of world languages follow the SVO or the OVS order, which is a case of $n = 2$. The key is the fact that, in the gestural experiments where a preference for head last is found, elements are atomic (i.e. made of a single “word” or unit), which gives, as will be demonstrated, a further online memory advantage with regard to the case of the ordering of subject, verb and object in world languages. So far, distance has been measured in elements (constituents) for simplicity, but certain elements can be made, for instance, of a subordinate clause, which happens in real languages and certain experiments (Langus & Nespor, 2010). Thus, online memory cost can be estimated more accurately if distance is measured in words (Ferrer-i-Cancho, 2008). In that case, the minimum online memory cost is achieved when elements are atomic (Appendix A). Thus, the experiments in Goldin-Meadow et al. (2008) and Langus and Nespor (2010), where elements are atomic, follow the setup where online memory cost can be most easily neglected, and thus, not surprisingly, head last surfaces. In contrast, subjects, objects and verbs are not necessarily atomic, which may explain why central verb orders are found in about one third of world languages (including languages lacking a dominant word order) (Dryer, 2011) or in the gestural experiments with complex sequences (Langus & Nespor, 2010), despite their a priori disadvantages in terms of predictability of the verb. Notice that the abundance of verb last orders ($N_3$) does not contradict the principle of online memory cost minimization.
Indeed, the relative position of adjectives and verbal auxiliaries in verb last orders can be explained in terms of online memory cost minimization (Ferrer-i-Cancho, 2008). Thus, the fact that a language has SOV as dominant does not mean that online memory cost minimization is inactive. Langus and Nespor (2010) attribute the preference for head last order in simple gestural experiments to a dissociation between communication and language, but such dissociation is not necessary. Here we simply argue that a principle of language, i.e. online memory minimization, can be neglected in those cases and thus one is able to explain the results of these experiments through another principle of language, i.e. maximum predictability. Neither of these principles is specific to language. The hypothesis of a correlation between sequence complexity and the pressure for minimizing online memory suggests the following questions:
• Why does SVO not appear more frequently in world languages? The statistical evidence indicating that SVO is the second most frequent order after
SOV has already been reviewed. This raises two questions. First, do all languages have a high sequential complexity? The stability of SOV might be higher in languages producing syntactically simpler sentences. The issue of whether world languages have the same complexity is a matter of debate in linguistics (Sampson, 2009). Second, what is the best way of measuring the abundance of a word order? It turns out that the most frequent word order, if frequency is measured in number of speakers and not in number of languages (Dryer, 2011), is SVO by far (Bentz & Christiansen, 2010).
• Why does SVO (or the symmetric OVS) not appear more clearly in the gestural experiments of Langus and Nespor (2010) with complex sentences? Evolving towards SVO from SOV may need (1) more time, (2) more interaction between individuals and (3) more individuals than in the bounded experiments of Langus and Nespor (2010). We suggest that the speed of the evolution towards SVO or its accessibility may depend on at least these three factors. The need for (1) is supported by the fact that lacking a dominant word order is a transient configuration between SOV and SVO (Pagel, 2009). As for (3), notice that it has been argued that the degree of optimization of a language may depend on its number of speakers (Sampson, 2009; McWhorther, 2001).

Our interpretation of the preference for head last in simple sequences and its loss in complex sequences clearly differs from that of Langus and Nespor (2010). While they argue that the computational system of grammar has been bypassed in simple gestural experiments showing a preference for head last, our interpretation is simply that in this case sequences are short enough to make the online memory cost negligible (Appendix). In our interpretation there is no flip of systems. The principles of sequential processing are constant but the strength of a particular principle depends on the experimental conditions.

3. Final remarks

Our theoretical conflict between principles should not be interpreted as predicting that all verb placements (verb first, verb last or central verb) should have about the same abundance after discarding languages lacking a dominant word order and controlling for sentence length, number of constituents, their size and other factors. The point is that a ring backbone defines the most likely transitions between orderings of S, V and O (Ferrer-i-Cancho, 2013b) and thus some configurations (e.g., verb initial orders) are more difficult to reach, despite their optimality, due to the initial preference for SOV and the attraction towards SVO (Ferrer-i-Cancho, 2013b). Word orders cannot be fully explained with the individual cognitive biases considered above (Dunn, Greenhill, Levinson, & Gray, 2011; Ferrer-i-Cancho, 2013b).
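The idea that some orders are harder to reach from SOV can be illustrated with a small sketch. It rests on an assumption that is not spelled out in the text above: that the ring connects two orders whenever one is obtained from the other by swapping two adjacent constituents. Under that assumption, the six orders of S, O and V form a cycle, and a breadth-first search gives the number of swaps separating each order from SOV. The function names are hypothetical.

```python
from collections import deque


def adjacent_swaps(order: str) -> list[str]:
    """Orders reachable by swapping one pair of adjacent constituents."""
    neighbours = []
    for i in range(len(order) - 1):
        symbols = list(order)
        symbols[i], symbols[i + 1] = symbols[i + 1], symbols[i]
        neighbours.append("".join(symbols))
    return neighbours


def swap_distances(start: str) -> dict[str, int]:
    """Breadth-first search: minimum number of adjacent swaps from `start`."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacent_swaps(current):
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    return distances


# Distance of every order of S, O, V from the assumed initial order SOV.
for order, steps in sorted(swap_distances("SOV").items(), key=lambda item: item[1]):
    print(order, steps)
```

In this simplified picture SVO is one swap away from SOV, whereas the verb initial orders VSO and VOS are two and three swaps away, which is at least consistent with the remark that verb initial orders are more difficult to reach.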
Acknowledgements

The essence of the ideas above started circulating in January 2009 and was presented at the Kickoff Meeting “Linguistic Networks”, held at Bielefeld University (Germany) on June 5, 2009. We thank the participants, especially G. Heyer and A. Mehler, for valuable discussions. We are also grateful to E. Santacreu-Vasut, Simon Kirby and the reviewers for comments on the current version. This work was supported by the grant BASMATI (TIN2011-27479-C04-03) from the Spanish Ministry of Science and Innovation.

Appendix: Online memory cost function

$D'_l$, a more accurate online memory cost function than $D_l$, is introduced next. Imagine that there are $n + 1$ constituents, each of which can be made of more than one word, so that the sequence length in words is $m \geq n + 1$. The term main head or root is used to refer to the head of the head constituent, which is the head of V for the particular case of the ordering of S, V, O. Dependencies are now formed between the head word of each complement/modifier and the head of the root constituent, following the conventions of dependency grammar (Mel'čuk, 1988). We assume that $g(d)$ is defined for $d \in [1, m - 1]$. If the main head or root belongs to the $l$-th constituent of the sequence ($1 \leq l \leq n + 1$), the total online memory cost of the dependencies between the root and the heads of its $n$ complements/modifiers may be defined as

$D'_l = \sum_{i=1}^{l-1} g(d_{i,l}) + \sum_{i=l+1}^{n+1} g(d_{i,l})$,

where $d_{i,j}$ (with $i, j \in [1, n + 1] \subset \mathbb{N}$) is the distance in words between the head word of the $i$-th constituent and that of the $j$-th constituent. If constituents are made of a single word, one has $d_{i,j} = |i - j|$ and thus $D'_l = D_l$ (recall Eq. 1). The point is that, given a sequence of constituents, $D'_l$ is minimum when constituents are atomic (i.e. constituents are made of a single word; in that case, the only word of the constituent is the head).
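The cost function defined above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: g(d) = d is a hypothetical choice of a monotonically increasing function, constituents are described by their size in words and by the offset of their head word (both hypothetical encodings), and constituents are indexed from 0 in the code rather than from 1 as in the text.

```python
def head_word_positions(sizes, head_offsets):
    """Return the word position (0-based) of the head of each constituent."""
    positions, start = [], 0
    for size, offset in zip(sizes, head_offsets):
        positions.append(start + offset)
        start += size
    return positions


def online_memory_cost(sizes, head_offsets, root, g=lambda d: d):
    """Sum g(distance in words) between the root head and every other head.

    `root` is the 0-based index of the constituent containing the main head;
    g defaults to the identity, an assumed monotonically increasing function.
    """
    pos = head_word_positions(sizes, head_offsets)
    return sum(g(abs(p - pos[root])) for i, p in enumerate(pos) if i != root)


# Three atomic constituents with the root last (e.g. S, O, V as single words).
atomic_cost = online_memory_cost([1, 1, 1], [0, 0, 0], root=2)
# Same order of constituents, but with multi-word constituents.
complex_cost = online_memory_cost([2, 1, 3], [0, 0, 2], root=2)
print(atomic_cost, complex_cost)
```

With these example values the cost is 3 for atomic constituents and 8 for the multi-word case, in line with the claim that the cost is minimum when constituents are atomic.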
To see it, assume $i \neq l$ and notice that $d_{i,l} \geq |i - l|$ (with equality if and only if the $i$-th constituent, the $l$-th and the intermediate constituents are atomic) and $g(d_{i,l}) \geq g(|i - l|)$ (since $g(d)$ is a monotonically increasing function of $d$).

References

Bentz, C., & Christiansen, M. H. (2010). Linguistic adaptation at work? The change of word order and case system from Latin to the Romance languages. In A. Smith, M. Schouwstra, B. de Boer, & K. Smith (Eds.), Proceedings of the eighth international conference on the evolution of language (p. 26-33). London: World Scientific.
Dryer, M. (2005). Order of subject, object and verb. In M. Haspelmath, M. S. Dryer, D. Gil, & B. Comrie (Eds.), The world atlas of language structures. Oxford: Oxford University Press.
Dryer, M. (2011). Order of subject, object and verb. In M. Dryer & M. Haspelmath (Eds.), The world atlas of language structures online. Munich: Max Planck Digital Library. (Available online at http://wals.info/chapter/81. Accessed on 2013-04-23.)
Dunn, M., Greenhill, S. J., Levinson, S. C., & Gray, R. D. (2011). Evolved structure of language shows lineage-specific trends in word-order universals. Nature, 473, 79-82.
Ferrer-i-Cancho, R. (2008). Some word order biases from limited brain resources. A mathematical approach. Advances in Complex Systems, 11(3), 393-414.
Ferrer-i-Cancho, R. (2013a). The placement of the head that maximizes predictability: an information theoretic approach. Submitted.
Ferrer-i-Cancho, R. (2013b). The placement of the head that minimizes online memory: a complex systems approach. Language Dynamics and Change, in press. (http://arxiv.org/abs/1309.1939)
Gell-Mann, M., & Ruhlen, M. (2011). The origin and evolution of word order. Proceedings of the National Academy of Sciences USA, 108(42), 17290-17295.
Goldin-Meadow, S., So, W. C., Özyürek, A., & Mylander, C. (2008). The natural order of events: how speakers of different languages represent events nonverbally. Proceedings of the National Academy of Sciences, 105(27), 9163-9168.
Gong, T., Minett, J. W., & Wang, W. S.-Y. (2009). A simulation study of word order bias. Interaction Studies, 10(1), 51-76.
Greenberg, J. H. (1963). Some universals of grammar with particular reference to the order of meaningful elements. In J. H. Greenberg (Ed.), Universals of language (p. 73-113). London: MIT Press.
Langus, A., & Nespor, M. (2010). Cognitive systems struggling for word order. Cognitive Psychology, 60(4), 291-318.
Li, C. N., & Thompson, S. A. (1981). Mandarin Chinese: a functional reference grammar. Berkeley: University of California Press.
McWhorther, J. H. (2001). The power of Babel. New York: Times Books.
Mel'čuk, I. (1988). Dependency syntax: theory and practice. Albany: State University of New York Press.
Pagel, M. (2009). Human language as a culturally transmitted replicator. Nature Reviews Genetics, 10(6), 405-415.
Reali, F., & Christiansen, M. H. (2009). Sequential learning and the interaction between biological and linguistic adaptation in language evolution. Interaction Studies, 10, 5-30.
Sampson, G. (2009). A linguistic axiom challenged. In G. Sampson, D. Gil, & P. Trudgill (Eds.), Language complexity as an evolving variable (p. 1-18). Oxford: Oxford University Press.
Strogatz, S. H. (1994). Nonlinear dynamics and chaos: With applications to physics, biology, chemistry and engineering. Reading, MA: Perseus Books.
LINGUISTIC ANIMALS: UNDERSTANDING LANGUAGE THROUGH A COMPARATIVE APPROACH

PIERA FILIPPI
Department of Cognitive Biology, University of Vienna, Althanstrasse 14, Vienna, 1090, Austria

With the aim of clarifying the definition of humans as “linguistic animals”, in the present paper I functionally distinguish three types of language competences: i) language as a general biological tool for communication, ii) “perceptual syntax”, iii) propositional language. Following this terminological distinction, I review pivotal findings on animals’ communication systems, which constitute useful evidence for the investigation of the nature of three core components of humans’ faculty of language: semantics, syntax, and theory of mind. In fact, although the capacity to process and share utterances with an open-ended structure is uniquely human, some isolated components of our linguistic competence are shared with nonhuman animals. Therefore, as I argue in the present paper, the investigation of animals’ communicative competence provides crucial insights into the range of cognitive constraints underlying humans’ ability of language, enabling at the same time the analysis of its phylogenetic path as well as of the selective pressures that have led to its emergence.
1. Introduction

The aim of this paper is to help refine the definition of humans as “linguistic animals”. Indeed, language, the distinctive feature of the human mind, consists of a complex set of closely intertwined processes that make it possible to share and convey an open-ended set of different utterances. Despite being pervasive in human cognition, language is one of the most challenging subjects in science. However, research on the origins of language through a comparative approach to animals’ communication systems constitutes a highly fertile way to explore which particular cognitive processes underlie humans’ linguistic competence, making it unique. In fact, as I will describe in the following sections, recent research suggests that some language-related processes are shared with a wide range of nonhuman animals. Conversely, in pointing out the cognitive processes that human and nonhuman animals share, these studies help us understand the distinctive features of humans’ faculty of language.

As a matter of methodological clarity, it is worth identifying three kinds of language faculties: i) language meant in a broad sense, as a general biological tool that allows communication, ii) “perceptual syntax”, iii) propositional language. The first meaning refers to the capacity to produce visual and/or acoustic signs in association with specific referents. The second kind refers to the ability to process structural patterns among sensory elements, with no semantic values involved. The third type of linguistic competence is the ability to understand and speak one or more natural languages. These consist of a specific set of morpho-syntactic rules that govern the combination of items, enabling the generation of a potentially infinite set of utterances. Typically, communication through propositional language takes place within shared frames of attention, where the speakers intend to mutually inform each other or influence behaviors. While nonhuman animals’ “linguistic” capacities fall under the first two kinds of the faculty of language, humans’ linguistic abilities include all three types of language competence.

I will use this terminological distinction between the three orders of language for the comparative investigation of the following constitutive components of language: semantics, syntax and the ability to attribute mental states to other individuals. Based on this conceptual distinction, I will argue that nonhuman and human animals share language-related cognitive traits (namely the first two orders of language) that were critical for the evolutionary emergence of propositional language. In particular, two types of cognitive commonalities can reveal pivotal evolutionary data on the faculty of language: “homologies” and “analogies”. In fact, the comparative approach to animals’ systems of communication indicates the presence of language-related homologies (traits that two or more species have inherited from a common ancestor) and analogies (traits that distant species have evolved independently, in response to the same selective pressures).
Hence, the identification of these traits enables the analysis of the phylogenetic path of the core traits constituting the modern faculty of language, as well as of the selective pressures that have led to the origin of human language.

2. Semantics

As to the semantic valence of nonhuman calls, extensive evidence suggests that a taxonomically diverse range of species use “functionally referential” signals, i.e. sequences of sounds whose acoustic structure provides listeners with sufficient information about the situational context or the object triggering the signal emission. The ability to produce functionally referential calls is an instance of the first type of the faculty of language, namely of the capacity to produce visual and/or acoustic signs in association with specific referents. Thus, this type of signal possesses informative value linked, for instance, to the presence of food or predators, sexual availability, social aggression, individual identity, or pair bonding quality. Indeed, several non-primate species (e.g. birds, frogs, rats, bats, chickens, bees) have been shown to produce these calls (see Hauser, 1996 for a review). Given the pervasive presence of this important linguistic feature in widely distant species, it is very difficult to assess its status as an analogous or a homologous trait in the tested species. Among mammals, the ability to produce referential calls has been reported for suricates (Manser, 2001) and, among primates, for rhesus macaques (Hauser & Marler, 1993), vervet monkeys (Seyfarth, Cheney, & Marler, 1980) and chimpanzees (Notman & Rendall, 2005). These data indicate the presence of crucial homologous traits that might have constrained the evolution of human language.

3. Syntax

As von Uexküll (1934/1992) has observed, different biocognitive processes ground the epistemic access to each species-specific environment (Umwelt). Following this theory, I identify the salient process that distinguishes each species-typical cognitive competence with the ability of “perceptual syntax”, i.e. the faculty to connect perceptual elements according to structural rules. Specifically, in each species this ability comes in different orders of computational complexity and is used in specific sensory domains. In order to avoid conceptual confusion, it is worth emphasizing at the outset that by the term perceptual syntax I refer to the meaning modelled on the Greek word syntaxis, composed of “syn” (together, with) and “taxis” (order, connection, coordination of the parts according to structural rules). This meaning must be kept conceptually distinct from the modern meaning of the term “syntax”, which is intrinsically tied to the semantic values of the lexical units occurring within the sentence context.

3.1. Perceptual Syntax: Production

The question I would like to address in this section is whether there is any observable evidence that the ability to produce patterns of perceptual items is present in animals’ communication systems. Indeed, evidence suggests that songbirds, parrots, cetaceans, and nonhuman primates possess the ability to concatenate the units of their vocalizations following specific sets of rules (Clarke, Reichard, & Zuberbühler, 2006; Dahlin & Wright, 2012; Okanoya, 2004; Suzuki, Buck, & Tyack, 2006).
Thus, these data indicate that the ability to produce structured sequences of sounds is an analogous trait shared among phylogenetically distant species, which has evolved under the evolutionary pressures linked to sexual selection, territory defense, or group bonding (see Hauser, 1996). Recent studies provide evidence that nonhuman primates’ communicative vocalizations also result from specific combinations of vocal units, which convey different meanings such as the presence of food or predators (Crockford & Boesch, 2005), or the quality of the social relationship between the vocalizing individuals (Geissmann, 2002). These findings indicate that perceptual syntax is a fundamental homologous feature. Interestingly, recent field research conducted on anthropoids has revealed the existence of a few rudimentary cases of affixation. For instance, a study conducted on Campbell’s monkeys provides evidence of their ability to add a vocal unit to a specific alarm call (e.g. the one signaling the presence of leopards) in order to broaden its referent into a general alert call, signaling, for instance, an arboreal disturbance or a distant predator (Ouattara, Lemasson, & Zuberbühler, 2009). As the authors of this research suggest, this acoustic affix affects the overall meaning of the call. Similarly, a study conducted on the putty-nosed monkey (Arnold, Pohlner, & Zuberbühler, 2008) indicates that this species uses two types of signals (pyows and hacks) and that inverting them generates different meaning effects. Hence, the data reviewed in this paragraph confirm that some species of animals have the ability not only to concatenate sounds according to specific rules, but also to produce multiple referential effects by means of different sound patterns. However, although these kinds of call sequences are crucial for the investigation of the evolution of language, it is worth emphasizing that they are not governed by any systematic combinatorial rule enabling the generation of a flexible, open-ended set of new vocal productions, as is the case in human natural languages.

3.2. Perceptual Syntax: Perception

Previous research has investigated the ability to process linguistically-based information in chimpanzees, sea lions, dolphins (for a review see Schusterman & Gisiner, 1988) and parrots (Pepperberg, 2009). In the last decade, a much refined comparative approach to animals’ “linguistic” capacities has adopted artificial languages generated according to the computational rules described in Chomsky’s formal language theory (1956).
As Chomsky observes, in propositional language one can identify different levels of computational complexity that can be processed by an abstract computational machine or automaton. The simplest of these levels is described by the regular (or finite state) grammar, where all an automaton has to know is the initial state, the rule of transition to the next item and the set of final (or accepting) states. Any level above the finite state grammar requires greater cognitive resources in terms of memory and computational abilities. The aim of this research paradigm is to explore nonhuman animals’ computational capacities, and thus to map the evolutionary path and selective pressures that led to the computational complexity typical of human language. Within this research framework, the first study was conducted by Fitch & Hauser (2004). The authors compared the ability of a nonhuman primate species (cotton-top tamarins) and of a group of adult humans to distinguish a regular grammar from a supra-regular grammar. They found that both human and nonhuman primates are able to parse a regular grammar, whereas only humans correctly processed a supra-regular pattern.
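The difference between the two levels can be sketched in Python. The patterns used here, (AB)^n for the regular grammar and A^nB^n for the supra-regular one, are chosen as standard illustrations and should be read as assumptions rather than as a description of the experimental stimuli. The first acceptor needs only an initial state, a transition table and a set of accepting states; the second needs additional memory, here a counter. All function names are hypothetical.

```python
def accepts_regular(sequence):
    """Finite-state acceptor for (AB)^n with n >= 1: no memory beyond the current state."""
    transitions = {
        ("start", "A"): "after_A",
        ("after_A", "B"): "after_AB",
        ("after_AB", "A"): "after_A",
    }
    accepting = {"after_AB"}
    state = "start"
    for symbol in sequence:
        state = transitions.get((state, symbol))
        if state is None:  # no transition defined: reject
            return False
    return state in accepting


def accepts_supra_regular(sequence):
    """Acceptor for A^n B^n with n >= 1: requires counting, i.e. extra memory."""
    count = 0          # number of A items still waiting for a matching B
    seen_b = False
    for symbol in sequence:
        if symbol == "A":
            if seen_b:  # an A after a B breaks the pattern
                return False
            count += 1
        elif symbol == "B":
            seen_b = True
            count -= 1
            if count < 0:  # more B items than A items
                return False
        else:
            return False
    return seen_b and count == 0


for seq in ("ABAB", "AABB", "AAB"):
    print(seq, accepts_regular(seq), accepts_supra_regular(seq))
```

The counter in the second function stands for the additional memory that any level above the finite state grammar demands; a fixed transition table alone cannot check that the number of A items equals the number of B items for arbitrary n.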
Interestingly, further studies have provided evidence that songbirds are also able to identify supra-regular patterns (Abe & Watanabe, 2011; but see Seki, Suzuki, Osawa, & Okanoya, 2013), thus pointing to this ability as a crucial analogous trait evolved in humans and songbirds under the same selective dynamics, perhaps linked to sexual attraction or territory defense.

Given these considerations, and following Deacon (1997), I suggest that what makes humans’ ability of language species-specific is the intrinsic possibility that the connections within the linguistic structure are not merely perceptual, but have an external referential value. In fact, while nonhuman animals possess the ability to recognize several perceptually structured signals, or bilateral associations between one auditory or visual token and a corresponding external object or action (indexical connection), humans perceive relationships between referential objects through the logical and morphosyntactic relationships between the linguistic tokens: a cognitive process that implicitly orients linguistic acts. Thus, in keeping with this philosophical paradigm, I identify the uniquely human faculty of language with the ability to combine semantic units within a network of combinatorial and logical relationships (Deacon, 1997) that can be linked to the state of affairs in the external world (Wittgenstein, 1922/2007). I see exactly in this ability the core cognitive process underlying a) the capacity to speak (or to reason) in verbal propositions and b) the general human faculty of language expressed, for instance, in the ability to draw visual conceptual maps or to compute mathematical expressions.

4. Theory of mind

Finally, a study concerning the evolutionary dynamics of language cannot disregard the research on the precursors of a capacity that had a key role in determining the specificity of human cognition: the ability to attribute mental states to conspecifics within a frame of shared intentions and joint actions (Tomasello & Farrar, 1986). As Fitch, Huber & Bugnyar (2010) observe, this ability requires a set of multiple processes: a) detecting someone’s gaze, b) looking in the direction of the signaler’s visual focus, c) looking at the signaler’s visual target of attention, i.e. knowing what the signaler sees, and d) attributing states of mind, i.e. knowing what the signaler knows. For instance, in order to infer a signaler’s focus of attention, one should be able to detect the signaler’s gaze and to follow its direction. Indeed, recent studies have shown that the ability to identify someone’s gaze and head direction is present in a wide range of species (Emery, 2000). The ability to look in the same direction as the signaler is cognitively more complex. Interestingly, research conducted on multiple species (Schloegl, Kotrschal, & Bugnyar, 2007; Téglás, Gergely, Kupán, Miklósi, & Topál, 2012) indicates that this ability is possessed by a number of birds and mammals. Moreover, recent studies suggest that the ability to share the signaler’s visual target of attention, or in other words, to know what he or she sees, is also present in multiple species that are phylogenetically both distant from and close to humans (Bugnyar & Heinrich, 2006; Hare & Tomasello, 1999). These data provide insights into the evolutionary constraints that might have evolved in distant species driven by similarly strong social dynamics.

Moreover, extensive research has been dedicated to animals’ ability to infer others’ states of mind. Since much of the observed behavior might be explained merely in terms of associative learning, the interpretation of these studies is highly controversial. Critically, in order to assess whether nonhuman animals possess the ability to know what other individuals know or think, and to influence their behavior accordingly, one has to distinguish the signaler’s perspective from the receiver’s. Indeed, as Seyfarth and Cheney (2003) observe, although callers are not aware of the state of knowledge of the receivers, nor do they signal with the goal of affecting it, the resulting effect is to expose the listeners to useful information, triggering a behavioral response. Ultimately, I propose that the ability to infer other individuals’ states of mind is grounded in the faculty to integrate multiple pieces of information linked to: a) past experience, b) a shared frame of knowledge, and c) implicit expectations about the interlocutors.

5. Conclusions

The investigation of animals’ “linguistic” competence provides crucial insights into the range of cognitive constraints underlying human language. Progress in this research area requires strong efforts in integrating discussions from multiple fields of investigation such as linguistics, biology, philosophy of language, psychology and neuroscience. This kind of interdisciplinary perspective will certainly provide a strong scientific background on the human faculty of language: one of the most complicated biological phenomena. However, although highly fruitful, this approach to language necessitates specific methodological expedients.
For instance, in order to prevent the fruitless impasse typical of debates among experts of different fields, it is necessary to find common functional definitions to work upon. This will favor the generation of further empirically and theoretically grounded hypotheses on the evolutionary path of human language and, consequently, on the distinctive features that make it unique. This research paradigm will broaden the understanding of the faculty of language, and maybe also of the interplay between language, rationality, and perception. Ultimately, this kind of approach might also suggest fruitful avenues for the investigation of the constraints that guide the ontogenetic emergence of human language.
Acknowledgements

Funding was provided by a European Research Council Advanced Grant SOMACCA (No. 230604) awarded to William Tecumseh Fitch. The funders had no role in the decision to publish or the preparation of the manuscript.

References

Abe, K., & Watanabe, D. (2011). Songbirds possess the spontaneous ability to discriminate syntactic rules. Nature Neuroscience, 14(8), 1067–1074.
Arnold, K., Pohlner, Y., & Zuberbühler, K. (2008). A forest monkey’s alarm call series to predator models. Behavioral Ecology and Sociobiology, 62(4), 549–559.
Bugnyar, T., & Heinrich, B. (2006). Pilfering ravens, Corvus corax, adjust their behaviour to social context and identity of competitors. Animal Cognition, 9(4), 369–376.
Chomsky, N. (1956). Three models for the description of language. IRE Transactions on Information Theory, 2(3), 113–124.
Clarke, E., Reichard, U. H., & Zuberbühler, K. (2006). The syntax and meaning of wild gibbon songs. PLoS ONE, 1, e73.
Crockford, C., & Boesch, C. (2005). Call combinations in wild chimpanzees. Behaviour, 142(4), 397–421.
Dahlin, C. R., & Wright, T. F. (2012). Does syntax contribute to the function of duets in a parrot, Amazona auropalliata? Animal Cognition, 15(4), 647–656.
Deacon, T. W. (1997). The symbolic species: The co-evolution of language and the human brain. New York: Norton & Company.
Emery, N. J. (2000). The eyes have it: the neuroethology, function and evolution of social gaze. Neuroscience & Biobehavioral Reviews, 24(6), 581–604.
Fitch, W. T., & Hauser, M. D. (2004). Computational constraints on syntactic processing in a nonhuman primate. Science, 303(5656), 377–380.
Fitch, W., Huber, L., & Bugnyar, T. (2010). Social cognition and the evolution of language: constructing cognitive phylogenies. Neuron, 65(6), 795–814.
Geissmann, T. (2002). Duet-splitting and the evolution of gibbon songs. Biological Reviews of the Cambridge Philosophical Society, 77(01), 57–76.
Hare, B., & Tomasello, M. (1999). Domestic dogs (Canis familiaris) use human and conspecific social cues to locate hidden food. Journal of Comparative Psychology, 113(2), 173–177.
Hauser, M. D., & Marler, P. (1993). Food-associated calls in rhesus macaques (Macaca mulatta): II. Costs and benefits of call production and suppression. Behavioral Ecology, 4(3), 206–212. doi:10.1093/beheco/4.3.206
Hauser, M. D. (1996). The evolution of communication. Cambridge, MA: The MIT Press.
Manser, M. B. (2001). The acoustic structure of suricates' alarm calls varies with predator type and the level of response urgency. Proceedings of the Royal Society of London. Series B: Biological Sciences, 268(1483), 2315–2324.
Notman, H., & Rendall, D. (2005). Contextual variation in chimpanzee pant hoots and its implications for referential communication. Animal Behaviour, 70(1), 177–190.
Okanoya, K. (2004). Song syntax in Bengalese finches: proximate and ultimate analyses. Advances in the Study of Behavior, 34, 297–346.
Ouattara, K., Lemasson, A., & Zuberbühler, K. (2009). Campbell's monkeys concatenate vocalizations into context-specific call sequences. PNAS, 106(51), 22026–22031.
Pepperberg, I. M. (2009). The Alex studies: cognitive and communicative abilities of grey parrots. Cambridge, MA: Harvard University Press.
Schloegl, C., Kotrschal, K., & Bugnyar, T. (2007). Gaze following in common ravens, Corvus corax: ontogeny and habituation. Animal Behaviour, 74(4), 769–778.
Seki, Y., Suzuki, K., Osawa, A. M., & Okanoya, K. (2013). Songbirds and humans apply different strategies in a sound sequence discrimination task. Frontiers in Psychology, 4.
Seyfarth, R. M., Cheney, D. L., & Marler, P. (1980). Monkey responses to three different alarm calls: evidence of predator classification and semantic communication. Science, 210(4471), 801–803.
Seyfarth, R. M., & Cheney, D. L. (2003). Signalers and receivers in animal communication. Annual Review of Psychology, 54(1), 145–173.
Suzuki, R., Buck, J. R., & Tyack, P. L. (2006). Information entropy of humpback whale songs. The Journal of the Acoustical Society of America, 119(3), 1849.
Téglás, E., Gergely, A., Kupán, K., Miklósi, Á., & Topál, J. (2012). Dogs' gaze following is tuned to human communicative signals. Current Biology, 22(3), 209–212.
Tomasello, M., & Farrar, M. J. (1986). Joint attention and early language. Child Development, 1454–1463.
Uexküll, J. von, & Kriszat, G. (1992). A stroll through the worlds of animals and men: A picture book of invisible worlds. Semiotica, 89(4). doi:10.1515/semi.1992.89.4.319. (Original paper published in 1934).
Wittgenstein, L. (2007). Tractatus logico-philosophicus. New York, NY: Cosimo. (Original work published 1922).
CREATIVE COMPOSITIONALITY FROM REINFORCEMENT LEARNING IN SIGNALING GAMES
MICHAEL FRANKE
Institute for Logic, Language and Computation, Universiteit van Amsterdam, Amsterdam, 1098XH, The Netherlands

Compositional language use shows in creatively associating hitherto unencountered meanings and forms in systematic ways. I submit that compositionality, as a key feature of human language, is no reason not to see a continuum between human speech and animal communication. Basic forms of compositional creativity presuppose surprisingly little cognitive sophistication. If changes in agents’ behavioral dispositions are susceptible to similarities between different meanings and, independently, to similarities between different forms, creative compositionality can emerge in a signaling game model with reinforcement learning.
1. Introduction

A decisive step in the evolution of language was the transition from a holophrastic term language to a compositional language (Jackendoff, 1999). A holophrastic language consists of simple expressions that are individually meaningful, but are not combined in meaningful ways. In contrast, a compositional language has structured linguistic expressions which are built up from simpler individually meaningful parts. The meaning of a complex expression is related in a systematic way to the meaning of the parts that it comprises. Human language can be used holophrastically, but is compositional. Evidence for holophrastic communication in animals has long been known (cf. Seyfarth, Cheney, & Marler, 1980). Animals also combine signals into sequences with novel meanings (cf. Arnold & Zuberbühler, 2006; Ouattara, Lemasson, & Zuberbühler, 2009). Language-trained primates even creatively produce short sequences of meaningful elements to express new meanings (cf. Marks Greenfield & Savage-Rumbaugh, 1990).

A compositional language has many advantages over a non-compositional one: it can convey more with fewer means, is therefore less susceptible to noise, can be learned from fewer examples, and much else. But in order to understand how the transition from a holophrastic to a compositional language might have been possible, it is unsatisfactory to simply point to a potential evolutionary advantage of compositionality once it is there (contra Nowak & Krakauer, 1999).
The relevant question is rather by what mechanism early forms of compositionality could have arisen in the context of a holophrastic system. Many learning mechanisms are capable of linking a structured meaning space with a structured space of potential expressions, and so provide potential answers to the how?-question we are after (Batali, 1998; Kirby, 2002). It is good to know that it is possible for rather sophisticated agents to learn, and even generate, a compositional language. But once we know it, the key question becomes what the minimal cognitive abilities are that could lead to the transition in question. Skyrms (2010) addresses this question in a game theoretic setting and suggests seeing the beginning of compositional language in a model first introduced by Barrett (2007). The following paragraphs will introduce this model, together with the relevant background on signaling games. I proceed to argue that the Barrett-Skyrms model misses a key feature of compositionality, namely that it is a flexible and potentially creative ability to associate novel expressions with novel meanings. But rudimentary forms of creative compositionality do not presuppose much sophistication. Agents who perceive similarities between world states and (unrelated) similarities between signals can evolve a disposition to creatively exploit existing associations between states and signals. This can be demonstrated by a simple signaling game model using Roth-Erev reinforcement learning with two defensible amendments: (i) a spill-over mechanism that distributes accumulated rewards also to non-actualized contingencies in proportion to how similar they are to the successful actual contingency (cf. O’Connor, 2013), and (ii) a small amount of lateral inhibition (cf. Steels, 1995).

Signaling Games. Signaling games were invented by Lewis (1969). In the simplest case, an unbiased random process selects one out of two possible world states.
The sender knows the selected state, but the receiver does not. The sender sends one out of two signals. The receiver perceives the signal and chooses one out of two acts. If the chosen act matches the world state, the game is a success for both players, otherwise a failure. If the sender uses one signal consistently in the first state, and another in the other, and if the receiver chooses the appropriate act after each signal, sender and receiver will always play successfully. Such behavior of sender and receiver, as it were, bestows a meaning on the signals: each signal comes to be associated with a unique world state and its corresponding action. Reinforcement Learning. Players could arrive at meaningful signal use in manifold ways. Much depends on their cognitive abilities. From an evolutionary perspective, and in line with the kind of methodological minimalism advocated above, it is interesting to assume that players are rather unsophisticated, incapable of rational decision making and possibly even unaware that they are playing a game. Basic forms of reinforcement learning (RL) are relevant for this purpose and have been studied well in this context. Basic RL assumes that each player keeps
an implicit record of the past successes associated with each action choice, the so-called accumulated rewards. The sender keeps a record for each state-signal pair, the receiver one for each signal-act pair. Whenever a play was successful, agents add a reward to the pair that was actually used. Initially, all accumulated rewards are 1. Accumulated rewards inform the agents’ dispositions to act. In the simplest case, the probability that the sender selects signal $m$ in state $t$ is given by the relative accumulated rewards for that pair, i.e., the accumulated rewards for $t$ and $m$ divided by the sum of all accumulated rewards for $t$ and all possible signals: $p(m \mid t) = ar(t, m) / \sum_{m'} ar(t, m')$. Similarly for the receiver. Argiento et al. (2009) proved that this form of RL will eventually settle into a communicative constellation for the basic signaling game sketched above. If there are more states or signals, or if states are not equiprobable, things change. E.g., with three equiprobable states and three signals basic RL leads to a fully communicative state in ca. 95% of simulation runs (Barrett, 2007).

The Barrett-Skyrms Model. The meaningfulness that arises in Lewis-style signaling models is holophrastic. But a slight extension of the model raises hopes that very rudimentary forms of compositional meaning can also be traced in this way. Barrett (2007, 2009) studied signaling games with two signals, four states and corresponding actions. Instead of one sender, there are two. If each sender can send one out of two signals, it is possible to communicate exactly which world state obtains. Simulations of RL show that this situation almost always ensues. In the fully communicative situation, each sender’s signal conveys one bit of information about which of the four world states is actual. The receiver puts the necessary information together and “infers” what the right choice of action is; or at least this is how it looks from the outside.
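As a concrete reference point, the basic RL scheme described above (accumulated rewards initialised to 1, choice probabilities proportional to accumulated rewards, a reward of 1 added after each success) could be sketched as follows. This is only an illustrative sketch, not the code used for the cited simulations; the function names, the reward size of 1, the number of rounds and the random seed are assumptions made here.

```python
import random

def choose(weights, rng):
    """Sample an index with probability proportional to its accumulated reward."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def run_basic_rl(n_states=2, n_signals=2, rounds=10_000, seed=0):
    """Basic reinforcement learning in a Lewis signaling game (illustrative)."""
    rng = random.Random(seed)
    # Accumulated rewards start at 1 for every state-signal and signal-act pair.
    sender = [[1.0] * n_signals for _ in range(n_states)]
    receiver = [[1.0] * n_states for _ in range(n_signals)]
    for _ in range(rounds):
        state = rng.randrange(n_states)      # unbiased choice of world state
        signal = choose(sender[state], rng)  # p(m | t) = ar(t, m) / sum over m' of ar(t, m')
        act = choose(receiver[signal], rng)
        if act == state:                     # success: reinforce only the pairs actually used
            sender[state][signal] += 1.0
            receiver[signal][act] += 1.0
    return sender, receiver

def success_rate(sender, receiver):
    """Probability of a successful play under the current dispositions."""
    n_states = len(sender)
    total = 0.0
    for t in range(n_states):
        row_sum = sum(sender[t])
        for m, weight in enumerate(sender[t]):
            total += (weight / row_sum) * (receiver[m][t] / sum(receiver[m]))
    return total / n_states

if __name__ == "__main__":
    sender, receiver = run_basic_rl()
    print(f"success rate after learning: {success_rate(sender, receiver):.3f}")
```

Before any learning the success rate in the two-state game is 1/2, since all choices are uniformly random; calling `run_basic_rl(n_states=3, n_signals=3)` gives the three-state variant mentioned above.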
Skyrms (2010) suggests a variant of the multiple-sender model as a step towards understanding the origins of compositionality. The crucial observation is that the two-sender case is formally equivalent to a set-up with one sender who may send one out of two signals twice in sequence. Nothing else changes. RL still frequently leads to fully communicative signal use. But we have complex signals now, made up of two parts. Each part communicates one bit of information about the state. Skyrms therefore suggests that we see here “a simple kind of compositionality” because “[t]he information in a complex signal is a function [my emphasis] of the information in its parts” (Skyrms, 2010). I partly agree, partly disagree. Although we can describe the situation as one where the meaning of a complex signal is a function of its parts, there is no justification for doing so. A simpler description is that the receiver has simply learned to respond to four signals in the right way. Nothing hinges on the fact that, to us, these four signals are composed of two individual signals. The dispositions of the receiver to react to complex signals do not depend on their composition. Similarly, the sender has simply learned to emit one out of four signals that are implemented
as a bit-string of length two. There is no indication in the model that the agents have learned to apply a function of the meaning of the basic signals of which the complex signal is composed. They never actually use the simple signals. If they did, they might use them in ways unrelated to their “meaning contribution” to the complex signals. An explanation of basic forms of compositionality (or basic conjunctive inferences for that matter) requires an explanation of how agents acquire a rule-like disposition that shows when applied to novel stimuli. Only then is there a justification within the model to assume that they use the meaning of basic signals to arrive at the meaning of a complex signal.

Creativity and Spill-Over RL. Compositional signal use should show in creativity in the application of behaviorally acquired meaningfulness. Only then are we justified in describing behavior as following a functional combination of meaning. But creativity in this sense is at odds with basic RL. We would like to see whether RL-learners can be creative when confronted with a novel stimulus. But basic RL does not influence choice dispositions in non-actualized contingencies. So basic RL-learners will make uniform random choices in novel situations. Variants of RL in which rewards are accumulated also for non-actualized contingencies exist (cf. O’Connor, 2013). I submit that creative use of acquired meaningfulness is possible if rewards spill over to non-actualized contingencies in proportion to their similarity with the actualized one. Suppose that in state $t$ signal $m$ has led to payoff $x \ge 0$. (I focus on the sender from here on; the receiver part is analogous.) Basic RL adds $x$ to the accumulated rewards $ar(t, m)$ only. In contrast, spill-over RL adds $x$ to all $ar(t', m')$ in proportion to how similar the pair $\langle t', m' \rangle$ is to $\langle t, m \rangle$. Concretely, if similarity between pairs is a number between 0 and 1, then we add $x$ times the similarity of $\langle t, m \rangle$ and $\langle t', m' \rangle$ to $ar(t', m')$.
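The spill-over rule just stated can be written as a single update over all state-signal pairs. The following is a minimal sketch under assumptions: accumulated rewards are stored in a dictionary keyed by (state, signal), the similarity measure is passed in as a function, and both example similarity functions are hypothetical stand-ins chosen only to show the contrast with basic RL.

```python
def spill_over_update(ar, actual, payoff, pair_sim):
    """Add the payoff to every pair, scaled by its similarity to the actual pair.

    ar       -- dict mapping (state, signal) pairs to accumulated rewards
    actual   -- the (state, signal) pair used in the successful play
    payoff   -- the reward x >= 0
    pair_sim -- function returning a similarity in [0, 1] for two pairs
    """
    for pair in ar:
        ar[pair] += payoff * pair_sim(actual, pair)

def identity_sim(p, q):
    """All distinct pairs are maximally dissimilar: this recovers basic RL."""
    return 1.0 if p == q else 0.0

def toy_sim(p, q):
    """Hypothetical similarity: 1 for the same pair, 0.25 for any other pair."""
    return 1.0 if p == q else 0.25

if __name__ == "__main__":
    ar = {(t, m): 1.0 for t in "xy" for m in "ab"}
    spill_over_update(ar, ("x", "a"), 2.0, toy_sim)
    print(ar)
```

With `identity_sim` only the pair that was actually used is reinforced, which is the sense in which basic RL treats all pairs as maximally distinct.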
Basic and spill-over RL differ in their assumptions about the learner’s secondary dispositions. Secondary dispositions are a learner’s dispositions to change his (primary) dispositions to act given feedback about success or failure. Spill-over RL presupposes that agents’ secondary dispositions are sensitive to similarity of choice-point/action pairs. Basic RL does so, too, but also assumes that all pairs are maximally distinct. Depending on what kind of stimuli and similarities are at stake, spill-over RL presupposes less cognitive sophistication. The spill-over may be due to an inability to distinguish sharply.

Model. The simplest non-trivial case where spill-over RL might lead to compositional creativity is a signaling game with six states and six signals. Three states and three signals are simple, three of each complex. If A and B are simple states (signals), then AB is a complex state (signal) built from A and B. Obvious ways of thinking about complex states and signals are meaning conjunction and signal sequencing, but other ways of combination are conceivable. We can remain entirely abstract here. Each state/signal has similarity 1 to itself. Similarity is a
symmetric notion, and complex state/signal AB bears a similarity 0 ≤ s ≤ 1 to both A and B, and 0 to all others. The similarity of pair $\langle t, m \rangle$ and $\langle t', m' \rangle$ is the similarity of $t$ and $t'$ times the similarity of $m$ and $m'$. If s = 0, we obtain basic RL. Notice that agents’ secondary dispositions are sensitive to the similarity among states, and the similarity among signals, but not to similarities between a state and a signal, or even similarities between associating this signal with this state and that signal with that state. So, our assumptions about similarities do not smuggle in a particular cognitive ability at all; on the contrary. This set-up promises to help explain spontaneous and creative compositional signal use. Suppose the sender has signals for states $t_A$ and $t_B$, namely $m_A$ and $m_B$. Suppose the sender is in state $t_{AB}$ for the first time. $t_{AB}$ bears marks of both $t_A$ and $t_B$ but is identical to neither. Perhaps, so the intuition goes, accumulated rewards of using $m_A$ and $m_B$ successfully in the past in states $t_A$ and $t_B$ have spilled over sufficiently to build a strong association between the hitherto unseen state $t_{AB}$ and the hitherto unused signal $m_{AB}$. Maybe this association is strong enough for $m_{AB}$ to be the most likely action choice in $t_{AB}$. This would then truly be a creative compositional use of a complex signal based on what its parts mean.

Lateral Inhibition. Unfortunately, not all values of s achieve this. The pair $\langle t_{AB}, m_A \rangle$ is at least as similar to $\langle t_A, m_A \rangle$ as $\langle t_{AB}, m_{AB} \rangle$ is. If 0 < s < 1/2, the accumulated rewards of $\langle t_{AB}, m_{AB} \rangle$ will be strictly lower than those of $\langle t_{AB}, m_A \rangle$ (see below). Spill-over RL alone is not enough to evolve strong dispositions for creative compositionality for small s. Things change if we add the possibility of lateral inhibition, which is another standard variation on classical RL models (Steels, 1995).
If 0 ≤ i ≤ 1 is a parameter for lateral inhibition, then, if $\langle t, m \rangle$ is part of a successful play, we subtract i from the sender’s accumulated rewards for all $\langle t, m' \rangle$ and $\langle t', m \rangle$ (where $t' \neq t$ and $m' \neq m$), if the result is non-negative, and otherwise set the accumulated rewards to 0. (Likewise, for the receiver.) Positive i helps acquire a compositional language, although it is not strictly necessary either. It helps because, intuitively speaking, lateral inhibition does not affect the association of $\langle t_{AB}, m_{AB} \rangle$, but diminishes the association of $\langle t_{AB}, m_A \rangle$ and $\langle t_{AB}, m_B \rangle$. Lateral inhibition is not an innocuous assumption. A positive i suggests that the agents tend towards one-to-one associations between choice points and actions. Some psychologists see evidence for a related tendency in language acquisition, the so-called mutual exclusivity bias: when learning new words children seem to assume that different objects must have different names and different names must refer to different objects (Clark, 2009).
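Putting the model's similarity structure together with spill-over and lateral inhibition, one sender update after a successful play might look as follows. This is a sketch under stated assumptions rather than the implementation behind the reported results: states and signals are represented by the labels A, B, C, AB, AC, BC, and spill-over is applied before inhibition, an ordering the text does not specify.

```python
SIMPLE = ("A", "B", "C")
COMPLEX = ("AB", "AC", "BC")
ITEMS = SIMPLE + COMPLEX  # the same labels serve for states and for signals

def sim(x, y, s):
    """Similarity of two states (or two signals): 1 to itself, s between a
    complex item and each of its simple parts, 0 otherwise."""
    if x == y:
        return 1.0
    short, long_ = sorted((x, y), key=len)
    if len(short) == 1 and len(long_) == 2 and short in long_:
        return s
    return 0.0

def sender_update(ar, t, m, payoff, s, i):
    """Update accumulated rewards after a successful play of signal m in state t.

    Assumed order: spill-over first, then lateral inhibition.
    """
    # Spill-over: pair similarity is state similarity times signal similarity.
    for (t2, m2) in ar:
        ar[(t2, m2)] += payoff * sim(t, t2, s) * sim(m, m2, s)
    # Lateral inhibition: pairs sharing the state or the signal (but not both)
    # with the successful pair lose i, floored at 0.
    for (t2, m2) in ar:
        if (t2 == t) != (m2 == m):
            ar[(t2, m2)] = max(ar[(t2, m2)] - i, 0.0)

if __name__ == "__main__":
    ar = {(t, m): 1.0 for t in ITEMS for m in ITEMS}
    sender_update(ar, "A", "A", payoff=1.0, s=0.5, i=0.25)
    print(ar[("AB", "A")], ar[("AB", "AB")])
```

In this sketch the pair (AB, AB) receives spill-over of s squared and is never inhibited by a success of (A, A), whereas (AB, A) receives spill-over s but loses i, which is the asymmetry the paragraph above describes.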
[Figure 1: plot of the parameter space with s on the horizontal axis and i on the vertical axis (both from 0 to 1), divided into a region labelled “creative compositionality”, where $ar(t_{AB}, m_{AB}) > ar(t_{AB}, m_A)$, and a region labelled “reliance on basic meaning”, where $ar(t_{AB}, m_{AB}) < ar(t_{AB}, m_A)$.]

Figure 1. Comparison of accumulated rewards in the mean field limit when agents use a fully communicative language for simple states and signals. The plot shows the regions of the parameter space $i, s \in [0, 1]$ where the complex signal $m_{AB}$ is the most likely choice in unfamiliar state $t_{AB}$.
Analysis. Suppose the sender uses $m_A$ exclusively in state $t_A$, $m_B$ in $t_B$ and $m_C$ in $t_C$. The mean field change in accumulated rewards is easy to calculate:

$$
\begin{aligned}
\dot{ar}(t_{AB}, m_A) = \dot{ar}(t_{AB}, m_B) &= \tfrac{1}{3} \max(s - i, 0) \\
\dot{ar}(t_{AB}, m_{AC}) = \dot{ar}(t_{AB}, m_{BC}) &= s^2 / 3 \\
\dot{ar}(t_{AB}, m_C) &= 0 \\
\dot{ar}(t_{AB}, m_{AB}) &= 2 s^2 / 3
\end{aligned}
$$

In the mean field limit, the probability that the sender spontaneously chooses the “compositional” message converges to $p(m_{AB} \mid t_{AB}) = s^2 / (2 s^2 + \max(s - i, 0))$. If $i \ge s$, then this probability peaks at $1/2$. If $i > s - 2 s^2$, the accumulated rewards of the creative compositional pairing $\langle t_{AB}, m_{AB} \rangle$ will be the highest for that choice point. The region of the parameter space where this holds is shown in Figure 1.

Simulations. Limit results are important, but do not inform us about short-term dynamics. Numerical simulations do. Figure 2 shows results from spill-over RL for different parameter values. Initially, accumulated rewards were 1. Agents first played with only simple states and signals for $10^4$ rounds. In ca. 99% of trials a communicative code evolved. 100 of these were recorded for each parameter pair. Agents then continued to play with the full state and signal set. The plots show the average relative accumulated reward at choice point $t_{XY}$ for options $m_{XY}$ and $m_X$. With $X, Y \in \{A, B, C\}$ each plotted point is an average of 3 times 100 data points. We see that under favorable parameter values a dominant disposition for creative compositionality co-evolves quickly, together with the basic meaning association of simple states and simple signals.

[Figure 2: grid of plots with columns i = 0, i = 0.1, i = 0.2 and rows s = 0.05, s = 0.15, s = 0.25; horizontal axis: rounds ($10^3$ to $10^6$); vertical axis: average relative accumulated reward (0 to 1).]

Figure 2. Results of numerical simulations of spill-over RL when there is no initial simple language in place. For different values of parameters the plots show average relative accumulated rewards in state $t_{XY}$ for using $m_{XY}$ (black) and $m_X$ (gray).

Discussion. Even unsophisticated agents can acquire a disposition to creatively use signals in a new environment in a basic compositional way. The main ability needed for that is to have secondary dispositions that are suitably sensitive to any perceived similarity between states and any perceived similarity between signals. For rudimentary forms of compositionality, it is not necessary that agents, as it were, look for structural similarity between states and signals. If agents could track more information about similarities, they would presumably evolve more elaborate compositional systems. It is tempting to speculate that a further step towards a human-like compositional language would involve recognizing similarities in the dynamically shifting patterns in the way a set of signals is used. But this is beyond the scope of this contribution, and irrelevant for a demonstration that basic forms of compositionality can arise already if agents merely track similarities among states and, independently, similarities among signals. Evolving creative compositional dispositions is not a practical certainty. Only some parameter constellations readily allow for it. Low values for i and s seem most reasonable for unsophisticated agents. But it is then that creative compositionality is unlikely to evolve. This may explain why we have seen only little direct evidence of it in animal communication systems so far. Still, the model presented here makes clear that a continuous transition from holophrastic to compositional coding is possible. It might be objected that spontaneous compositional language use, albeit possibly the most likely choice, is never certain, i.e., $p(m_{AB} \mid t_{AB})$ is at most 1/2. I believe that this is a good prediction that again excludes the presumably erroneous prediction that creative compositionality should be much more widespread in the animal kingdom than current evidence suggests. In sum, I propose that the model demonstrates that a path from holistic language to creative compositionality exists also for non-sophisticated RL-learners, but that the likelihood of finding this path is naturally upper-bounded.
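The mean field expressions from the Analysis paragraph are easy to evaluate numerically, which makes the upper bound of 1/2 and the boundary $i > s - 2s^2$ concrete. The sketch below simply encodes those two expressions; the function names are chosen here for illustration.

```python
def p_compositional(s, i):
    """Mean field limit of p(m_AB | t_AB) = s^2 / (2 s^2 + max(s - i, 0))."""
    denom = 2 * s**2 + max(s - i, 0.0)
    if denom == 0:
        raise ValueError("undefined for s = 0: no reward ever reaches state t_AB")
    return s**2 / denom

def compositional_is_top_choice(s, i):
    """True if ar(t_AB, m_AB) grows faster than ar(t_AB, m_A).

    Compares 2 s^2 / 3 with max(s - i, 0) / 3; for s > 0 this is
    equivalent to i > s - 2 s^2.
    """
    return 2 * s**2 / 3 > max(s - i, 0.0) / 3

if __name__ == "__main__":
    for s, i in [(0.25, 0.0), (0.25, 0.2), (0.75, 0.0)]:
        print(s, i, round(p_compositional(s, i), 3), compositional_is_top_choice(s, i))
```

For example, with s = 0.25 and no inhibition the compositional signal is not the top choice, while i = 0.2 is enough to make it so, in line with the region sketched in Figure 1.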
References

Argiento, R., Pemantle, R., Skyrms, B., & Volkov, S. (2009). Learning to signal: Analysis of a micro-level reinforcement model. Stochastic Processes and their Applications, 119, 373–390.
Arnold, K., & Zuberbühler, K. (2006). Language evolution: Semantic combinations in primate calls. Nature, 441, 303.
Barrett, J. A. (2007). Dynamic partitioning and the conventionality of kinds. Philosophy of Science, 74, 527–546.
Barrett, J. A. (2009). The evolution of coding in signaling games. Theory and Decision, 67, 223–237.
Batali, J. (1998). Computational simulations of the emergence of grammar. In J. R. Hurford, M. Studdert-Kennedy, & C. Knight (Eds.), Evolution of language: Social and cognitive bases. Cambridge, UK: Cambridge University Press.
Clark, E. V. (2009). Lexical meaning. In E. L. Bavin (Ed.), Child language (pp. 283–300). New York: Cambridge University Press.
Jackendoff, R. (1999). Possible stages in the evolution of the language capacity. Trends in Cognitive Sciences, 3(7), 272–279.
Kirby, S. (2002). Learning, bottlenecks and the evolution of recursive syntax. In T. Briscoe (Ed.), Linguistic evolution through language acquisition: Formal and computational models (pp. 173–204). Cambridge University Press.
Lewis, D. (1969). Convention: A philosophical study. Cambridge, MA: Harvard University Press.
Marks Greenfield, P., & Savage-Rumbaugh, E. (1990). Grammatical combination in Pan paniscus: Process of learning and invention in the evolution and development of language. In S. Parker & K. Gibson (Eds.), “Language” and intelligence in monkeys and apes (pp. 540–578). Cambridge University Press.
Nowak, M. A., & Krakauer, D. C. (1999). The evolution of language. PNAS, 96, 8028–8033.
O’Connor, C. (2013). The evolution of vagueness. Erkenntnis. (To appear)
Ouattara, K., Lemasson, A., & Zuberbühler, K. (2009). Campbell’s monkeys use affixation to alter call meaning. PLoS ONE, 4(11), e7808.
Seyfarth, R. M., Cheney, D. L., & Marler, P. (1980). Monkey responses to three different alarm calls: Evidence of predator classification and semantic communication. Science, 210(4471), 801–803.
Skyrms, B. (2010). Signals: Evolution, learning, and information. Oxford: Oxford University Press.
Steels, L. (1995). A self-organizing spatial vocabulary. Artificial Life, 2(3), 319–332.
OVERLAPPING AND SYNCHRONIZATION IN THE SONG OF THE INDRIS (Indri indri)

MARCO GAMBA, VALERIA TORTI, GIOVANNA BONADONNA, GREGORIO GUZZO, CRISTINA GIACOMA
Department of Life Sciences and Systems Biology, University of Torino, Via Accademia Albertina 13, Torino, 10123, Italy

In Indri indri, males and females within a social group emit loud, long-distance calls in a coordinated manner. An indri may start emitting a vocal utterance before the end of another individual's contribution, resulting in different degrees of overlap between individual songs. This study provides the first quantitative analysis of the individual overlap between indri males and females, showing that the adult pair members mainly overlap each other. The adult female song also has an effect on male singing in 35% of the cases, while the timing of a song unit could predict the occurrence of a unit sung by another group-member in only 10% of the other dyads (e.g. male-female or male-youngster).
1. Introduction

The interplay between synchronous and asynchronous displays in communicative interactions is based on sequential symmetry formations and symmetry breaks, which may involve postural, gestural, and vocal displays (Rotondo & Boker, 2002; Adger-Antonikowski, 2008). Thus, it has been hypothesized that the spatio-temporal structure of the formation and breaking of symmetry can be diagnostic of social and cognitive aspects of human dyadic relationships (Boker & Rotondo, 2002). Moreover, it has been reported that gender and dominance influence dyad interactions, affecting synchrony and symmetry of the behavioral displays (Rotondo & Boker, 2002). Non-verbal communication in nonhuman primates may have been the substrate for the development of 'protolinguistic' forms of communication (Bickerton, 2000). Nonhuman primates' long-distance calls have often been viewed within a larger evolutionary framework, from the perspective of human vocal and musical rhythms (Merker, 1999; 2000; Fitch, 2006; Geissmann, 2000). These series of vocal emissions have been indicated as a possible evolutionary
precursor of human music (Merker, 2000; Merker & Okanoya, 2007; Merker et al. 2009). Previous research showed that simultaneous calling patterns given by several individuals within a social group are present in various terrestrial animals including wolves (Harrington & Asa, 2003; Mazzini et al., 2013), coyotes (McCarley, 1975), and jackals (Jaeger et al., 1996), and in a few primate species such as gibbons (Geissmann, 2000; Merker, 1999) and indris (Giacoma et al., 2010; Maretti et al., 2010). According to Brown (2007), one of the features that these primate 'songs' share with music is “the matching of time”. The organization of indris is based around a socially monogamous pair where the female is dominant (Pollock, 1975). Group size usually varies between 2 and 6 animals (Torti et al., 2013). Loud singing in indris follows a sequential organization (Pollock, 1986), which shows evident species- and sex-specific features (Giacoma et al., 2010; Gamba et al., 2011) and may serve to convey different messages (Bonadonna et al., 2013; Torti et al., 2013). We aimed to investigate whether the ability to synchronize calling during group chorusing, which may be a phylogenetic parallel to singing ability in humans, is present in Indri indri, a species that diverged from the human evolutionary path more than 65 million years ago (Steiper & Young, 2006). Singing synchronization was assessed by measuring the percentage of overlap shown by individual singers and by evaluating whether the timing of a song unit could predict the occurrence of a unit sung by another group-member.

2. Materials & Methods

2.1. Study sites and subjects

We observed and recorded a total of 9 groups in 3 areas of dense tropical forest in Madagascar: 3 groups in Andasibe-Mantadia National Park, Analamazaotra Reserve, 3 groups in Mitsinjo Station Forestière, and 3 groups in the Maromizaha Forest. The indris lived in socially monogamous groups composed of an adult pair and a few other individuals.
Very often an additional individual (often a male), whose relatedness with the adult pair is unknown, was present in the social group. We recorded 12 adult males, 12 females, and 7 youngsters.

2.2. Sampling and equipment

We collected data in the field over a total of 21 months (3 months a year), between September and December from 2006 to 2012. We carried out observations of one group per day from 06:00 am to 1:00 pm. Each indri was individually recognized by natural markings. All recordings were made without the use of playback stimuli and nothing was done to modify the behavior of the indris. We have received permits for this research from “Direction des Eaux et
Forêts” and “Madagascar National Parks” (formerly ANGAP). We recorded 77 songs, consisting of duets and choruses, with a maximum of six individuals singing in the same song. All utterances were recorded at a distance from 2 to 10 m, because all the study groups were habituated. Focal animal sampling allowed the attribution of each vocalization to a signaler. Recordings were made using Sennheiser ME 66 and ME 67 and AKG CK 98 microphones, facing the focal animals (Altmann 1974) during the emission of the songs. The microphone output signal was recorded at a sampling rate of 44.1 kHz using a solid-state digital audio recorder (Marantz PMD671, SoundDevices 702 or Olympus S100).

2.3. Acoustic analyses

Segments containing indri songs were edited using Praat 5.3.46 (Boersma & Weenink 2008) and we copied each song to a single audio file (in AIFF format). We then identified each individual emission and saved this information in a Praat textgrid. By merging the textgrids of different indris we estimated the overlap between individuals, which was expressed as a percentage of song duration. Durations of the overlapping and non-overlapping parts of each song and timing of the centre points of each song unit were saved within Praat and exported to a Microsoft© Excel spreadsheet (Gamba & Giacoma 2007; Gamba et al. 2012).

2.4. Statistical analyses

Differences in the percentage of song overlap between individuals were tested using a Mann-Whitney U test. The predictive power of the song unit timing in one individual over another was evaluated using the Granger causality test (Granger, 1969). We computed the bivariate Granger causality test in two directions for each dyad of indris singing in a chorus (Wessa, 2013), tracking whether they were males, females or youngsters. Probability values were corrected using the Sidak post-hoc method. Statistical analyses were performed using IBM SPSS Statistics 20.0 for Mac and R (Hornik, 2013).
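To illustrate the overlap measure described in Section 2.3, the following sketch computes the percentage of a song during which two singers vocalise simultaneously, given each singer's emissions as (start, end) intervals in seconds. It is an illustration under assumptions, not the Praat procedure actually used: the interval representation, the function names, and the choice of total song duration as the denominator are all assumed here.

```python
def merge_intervals(intervals):
    """Sort intervals and merge any that touch or overlap."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def overlap_percentage(singer_a, singer_b, song_duration):
    """Percentage of the song during which both singers vocalise at once."""
    a, b = merge_intervals(singer_a), merge_intervals(singer_b)
    shared = 0.0
    ia = ib = 0
    while ia < len(a) and ib < len(b):
        lo = max(a[ia][0], b[ib][0])
        hi = min(a[ia][1], b[ib][1])
        if hi > lo:
            shared += hi - lo
        # Advance whichever interval ends first.
        if a[ia][1] < b[ib][1]:
            ia += 1
        else:
            ib += 1
    return 100.0 * shared / song_duration

if __name__ == "__main__":
    female = [(0.0, 2.0), (4.0, 6.0)]  # hypothetical unit timings
    male = [(1.0, 5.0)]
    print(overlap_percentage(female, male, song_duration=10.0))
```

Expressing overlap relative to an individual's own singing time instead of the whole song would only require changing the denominator.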
We reported means and standard deviations (± SD) of the variables measured.

3. Results

3.1. Overlapping

All of the songs we recorded showed intra-group overlapping between singers. We found that adult males and adult females had a remarkable part of their song not overlapped by others, 21 ± 6% and 25 ± 5% respectively. These percentages were significantly higher than those shown by youngsters of the same sex for
both males (12 ± 7%; N = 11, Z = -2.041, p = 0.041) and females (10 ± 3%; N = 11, Z = -2.717, p = 0.007). Overlapping within the adult pair (N_pair = 8; 13 ± 5%) was significantly higher than the percentage of overlapping between a pair member and any other individual within the social group (0.004 < p_Sidak < 0.013).

3.2. Synchronization

In indris, the females' contribution to the song served as a template for males. When singing in chorus with more than one male, the female contribution covered the total duration of the song. Males, on the contrary, sang in different parts of the song, showing little overlap with one another. We tested for causality between the timing of the centre-point of a unit of an emitter and the timing of the centre-point of a unit by another singer in the same chorus. We could detect an effect of youngsters' singing on the pattern of adult males and females in 11% and 10% of the analyzed songs respectively. We obtained similar results (10%) evaluating the effect of female singing on the youngsters. We found that adult male singing had an effect on the female song coordination and the youngster contribution in 33% of the samples. The highest values were found for the effect of the female's song on adult males, where we found a significant result in 35% of the cases.

4. Discussion
Our current knowledge about primate singing is that these calls have an innate basis but they can be refined during development (Pistorio et al., 2006; Lemasson et al., 2011). Although field experience gives clear indications of interindividual coordination during group chorusing, this was rarely investigated in previous primatological research. The role of song features in reproduction and in intra-group and inter-group regulation in indris has been made clear by previous studies. The song units are sexually dimorphic (Giacoma et al., 2010), and singing serves to maintain cohesion between group members and to regulate the spacing between neighboring groups (Torti et al., 2013). The song also mediates crucial aspects of indris' reproductive biology (Bonadonna et al., 2013). Our analyses of indris showed that overlap between the male and the female of an adult pair is much more likely to occur than any other overlap between singers. This is different from previous evidence collected on singing primates where a young gibbon was often overlapping its mother (Merker & Cox 1999) while adult sex-specific individual songs were largely separated during the song (Geissmann, 2000).
Synchronous calling bouts may provide a measure of male/female cooperation as well as vocal skill (Merker, 1999). Both these factors assure that a song is informative, signaling territorial occupation to conspecifics and introducing sexual selection pressures on the calling behavior of territorial groups. The songs may stimulate male or female migrations and the formation of new groups. In this perspective, overlapping of units emitted by the adult pair may indeed provide conspecifics at a distance with a signal of their cohesion. Synchronous calling extends the reach of the signal by decibel summation and is likely to inform a greater number of conspecifics, increasing the potential mating opportunities on the one hand, and acting as a deterrent to potential distant intruders on the other hand. The emission of non-overlapping units may play a significant role in communicating with animals at shorter range. Indris' advertisement songs, exchanged across the territorial boundaries, may play a role in spacing neighboring groups, potentially stimulating territory intrusions, extra-pair copulations and vocal fights, or, in reverse, may deter encroachments and help conserve the existing conditions. The evidence derived from the analysis of the units of singing dyads is less conclusive. The percentage of songs in which we found significance for an effect of an individual contribution on another is only 22% of the sample. The effect of the female's song on the adult male's contribution appears stronger than those found in the other dyads, but still limited. Although we cannot exclude that indris have the ability to synchronize their calls during the song, the evolutionary scenario by which the human capacity to organize synchronous vocal emissions may have emerged still has to find a nonhuman parallel.
Acknowledgements

This study is part of a larger field research on Indri indri supported by the Università degli Studi di Torino and the ACP Science and Technology Programme of the ACP Group of States, with the financial assistance of the European Union, through the Projects BIRD (Biodiversity Integration and Rural Development; N° FED/2009/217077) and SCORE (Supporting Cooperation for Research and Education; Contract n. ACP RPR 118 # 36) and by grants from the Parco Natura Viva - Centro Tutela Specie Minacciate. We are also grateful to Compagnia di San Paolo for co-financing the “Progetto 2008-10 Innovazione Laboratori didattici - Facoltà di Scienze MFN” which allowed purchasing lab equipment used by G.B., V.T., and G.G. during the field activities. We thank Ministère de l'Environnement et des Forêts (MEF) and Madagascar National Parks for granting the research permits. We are also grateful to GERP (Groupe d'Etudes et des Recherche sur les Primates) for allowing us to collect data in the forest under its management. We, finally, thank Rose Marie Randrianarison, Dr. Jonah Ratsimbazafy, Dr. Cesare Avesani Zaborra and Dr. Caterina Spiezio for their important role in the organization of the Maromizaha field station, all the research and international guides, Lanto and Mamatin for their help and logistic
support, and Dr Olivier Friard for help with the computational analyses. The contents of this document are the sole responsibility of the authors and can under no circumstances be regarded as reflecting the position of the European Union. References Adger-Antonikowski, A. (2008). A Functionalist Perspective of Language Ability and Behavioral Synchrony in the Development of Emotion Regulation. Albany, USA: UMI Dissertations Publishing. Altmann, J. (1974). Observational study of behavior: sampling methods. Behavior, 49, 227-267 [Reprinted In L.D. Houck and L.C. Drickamer (Eds.), Foundations of Animal Behavior. U Chicago Press, 1996]. Bickerton, D. (2000). How protolanguage became language. In M. StuddertKennedy, C. Knight, J.R. Hurford (Eds.), The evolutionary emergence of language: social function and the origins of linguistic form. Cambridge, UK: Cambridge University Press. Boersma, P., Weenink, D. (2008). Praat: doing phonetics by computer (version 5.0 27). Computer program. Boker, S. M., Rotondo, J. L. (2002). Symmetry building and symmetry breaking in synchronized movement. Advances in consciousness research, 42, 163174. Bonadonna, G., Torti, V., Randrianarison, R.M., Martinet, N., Gamba, M. & Giacoma, C. (2013). Behavioral correlates of extra-pair copulation in Indri indri. Primates, DOI 10.1007/s10329-013-0376-0. Brown, S. (2007). Contagious heterophony: A new theory about the origins of music. Musicae Scientiae, 1, 3-26. Fitch, W.T. (2006). The biology and evolution of music: a comparative perspective. Cognition, 100, 173–215. Gamba, M., Giacoma, C. (2007). Quantitative acoustic analysis of the vocal repertoire of the crowned lemur. Ethology Ecology And Evolution, 19, 323343. Gamba, M., Favaro, L., Torti, V., Sorrentino, V., Giacoma, C. (2011). Vocal tract flexibility and variation in the vocal output in wild indris. Bioacoustics, 20, 251-266. Gamba, M., Friard, O., & Giacoma, C. (2012). 
Vocal tract morphology determines species-specific features in vocal signals of lemurs (Eulemur). International Journal of Primatology, 33, 1453-1466. Geissmann, T. (2000). Gibbon song and human music from an evolutionary perspective. In N.L. Wallin, B. Merker and S. Brown (Eds.), The Origins of Music (pp. 103–123). Cambridge, MA: The MIT Press. Giacoma, C., Sorrentino, V., Rabarivola, C., & Gamba, M. (2010). Sex differences in the song of Indri indri. International Journal of Primatology, 31, 539-551.
95
Granger, C.W.J. (1969). Investigating Causal Relations by Econometric Models and Cross-Spectral Methods. Econometrica, 37, 424-438. Harrington, F.H., & Asa, C.S. (2003). Wolf communication: behavior, ecology, and conservation. In L.D. Mech and L. Boitani (Eds.), Wolves: Behavior, Ecology and Conservation (pp. 96–99). Chicago: University of Chicago Press. Jaeger, M.M., Pandit, R.K., & Haque, E. (1996). Seasonal Differences in Territorial Behavior by Golden Jackals in Bangladesh: Howling versus Confrontation. Journal of Mammalogy, 77, 768-775. Lemasson, A., Ouattara, K., Petit, E.J., & Zuberbuhler, K. (2011). Social learning of vocal structure in anon-human primate. BMC Evolutionary Biology, 11, 362. Maretti, G., Sorrentino, V., Finomana, A., Gamba, M., & Giacoma, C. (2010). Not just a pretty song: an overview of the vocal repertoire of Indri indri. Journal of anthropological sciences, 88, 151-165. Mazzini, F., Townsend, S.W., Viranyi, Z., & Range, F. (2013). Wolf howling is mediated by relationship quality rather than underlying emotional stress. Current Biology, 23, 1677-1680. McCarley, H. (1975). Long-distance vocalizations of coyotes (Canis latrans). Journal of Mammalogy, 56, 847-856. Merker, B. (1999). Synchronous chorusing and the origins of music. Musicae Scientiae, Special issue 1999–2000, 59–73. Merker, B. (2000). Synchronous chorusing and human origins. In N.L. Wallin, B. Merker and S. Brown (Eds.), The Origins of Music (pp. 315–327). Cambridge, MA: The MIT Press. Merker, B., & Cox, C. (1999). Development of the female great call in Hylobates gabriellae: A case study. Folia Primatologica, 70, 96-106. Merker, B., & Okanoya, K. (2007). The natural history of human language: Bridging the gaps without magic. In C. Lyon, L. Nehaniv and A. Cangelosi (Eds.), Emergence of communication and language (pp. 403-420). London: Spring-Verlag. Merker, B., Eckerdal, M., & Eckerdal, P. (2009). On the role and origin of isochrony in human rhythmic entrainment. 
Cortex, 45, 4-17.
Pistorio, A., Vintch, B., & Wang, X. (2006). Acoustic analyses of vocal development in a New World primate, the common marmoset (Callithrix jacchus). Journal of the Acoustical Society of America, 120, 1655-1670.
Pollock, J.I. (1975). The social behavior and ecology of Indri indri. PhD thesis, University College London.
Rotondo, J. L., & Boker, S. M. (2002). Behavioral synchronization in human conversational interaction. Advances in Consciousness Research, 42, 151-162.
Torti, V., Gamba, M., Rabemananjara, Z.H., & Giacoma, C. (2013). The songs of the indris: contextual variation in the long distance calls of a lemur. Italian Journal of Zoology, DOI: 10.1080/11250003.2013.845261.
Steiper, M.E., & Young, N.M. (2006). Primate molecular divergence dates. Molecular Phylogenetics and Evolution, 41, 384-394.
Wessa, P. (2013). Free Statistics Software, Office for Research Development and Education, version 1.1.23-r7, URL http://www.wessa.net/
A MATTER OF PERSPECTIVE: VIEWPOINT PHENOMENA IN THE EVOLUTION OF GRAMMAR

MICHAEL PLEYER
Department of English, Universität Heidelberg, Kettengasse 12, Heidelberg, D-69117, Germany, [email protected]

STEFAN HARTMANN
German Department, University of Mainz, Jakob-Welder-Weg 18, Mainz, D-55099, Germany, [email protected]

Language provides a variety of means to conceptualize objects, states, events, and abstract entities in different ways and from different perspectives. These so-called ‘construal operations’ play a key role in Cognitive Linguistics. Using construal operations pertaining to viewpoint and perspectivation as an example, this paper aims to demonstrate how different theoretical and methodological approaches can be combined to yield a better understanding of how languages systematically make use of general cognitive capacities of perspective-taking, -setting, and -sharing. These insights can in turn shed light on the evolution of specific grammatical phenomena as well as on the evolution of language more generally.
1. Introduction

Viewpoint phenomena are pervasive in language (cf. e.g. Langacker, 1996). Consider a sentence like The next Evolang will be held in Vienna. A spatial location is indicated with the help of the preposition in, and the future tense locates the event in time, which, together with the modifier next, also indicates the conceptualizer’s temporal stance between two Evolang conferences. The passive voice sets the agent ‘off-stage’ (Langacker, 1987), thereby limiting the conceptualizer’s viewing frame to the unfolding of the event itself. Within the broader framework of Cognitive Linguistics, both Langacker’s Cognitive Grammar (cf. Langacker, 1987, 1991) and Talmy’s Cognitive Semantics (cf. Talmy, 2000) have pointed out the importance of dynamic meaning construal in linguistic communication. The notions of perspective and viewpoint figure prominently both in Langacker’s (e.g. 1987, 1991) typology of ‘construal operations’ and in Talmy’s (e.g. 2000) ‘schematic systems’ framework. We
propose that this view of language has key implications for studying the evolution of specific grammatical phenomena as well as the evolution of language more generally (Pleyer, 2012). Specifically, Cognitive Linguistics can contribute to specifying the cognitive processes which represent prerequisites for the evolution of language and for the dynamic process of meaning construal that characterizes linguistic interaction. In addition, there have been increasing efforts to verify central claims of Cognitive Linguistics on an empirical basis (cf. Janda, 2013). Corpus studies as well as experimental methods have been applied to investigate a broad range of grammatical phenomena. In sum, these studies, some of which we review below, have lent support to the Cognitive-Linguistic approach of conceptualizing language as a structured inventory of constructions at different levels of abstraction. These are seen to serve as ‘prompts’ for the embodied simulation of different kinds of entities (objects, events, actions, relations). Moreover, recent research in Cognitive Linguistics has emphasized the fundamentally social-interactional and intersubjective nature of language (e.g. Verhagen, 2005). These dimensions have also been stressed in much of language evolution research (e.g. Tomasello, 2008; Fitch, 2010). They also play a key role in the investigation of viewpoint phenomena and their evolution. Overall, these developments make Cognitive Linguistics highly amenable to interdisciplinary integration. This indicates that incorporating theorizing and results from Cognitive Linguistics into Evolutionary Linguistics promises to be a fruitful enterprise.

In the remainder of this paper, we will discuss two examples of grammatical phenomena tightly connected to viewpoint and perspectivation. First, we discuss aspectual framing (e.g. What happened vs.
What was happening), which provides a prime example of the Cognitive-Linguistic hypothesis that the shaping and the ‘construal’ of a proposition is as important an aspect of linguistic semantics as the conceptual content of an expression (cf. Langacker, 1987). Then we turn to the phenomenon of ‘subjectification’, i.e. the process by which the conceptualizer’s stance is grammaticalized ‘into’ an expression. We then discuss how these linguistic findings can be linked to what is known about perspective-taking capacities in humans and other animals. We conclude that by looking at these linguistic phenomena and the underlying capacities that support them from the perspective of Cognitive Linguistics and Evolutionary Linguistics, we can gain significant insights into the emergence of perspectivation in language and its role in the evolution of language.
2. Viewpoint Phenomena: Subjectivity and Aspectual Framing

Ever since its inception in the 1980s, Cognitive Linguistics has emphasized the key role of visual and spatial cognition in language. This follows quite naturally from the key Cognitive-Linguistic assumption that language has to be regarded as an integral part of human cognition and therefore as tightly interconnected with other cognitive systems. These hypotheses have been supported by a wide range of empirical research. For example, Zwaan (2004) presents evidence that in language processing, comprehenders construct experiential simulations of the described situation. Furthermore, a series of experimental studies suggests that language comprehenders represent object distance not only visually, but also auditorily (cf. Winter & Bergen, 2012). Another series of experiments lends support to the idea that grammatical person orchestrates the perspective from which an event or situation is simulated (cf. Bergen, 2012, pp. 110-114). On a more abstract level, it has been suggested that a variety of grammatical phenomena can best be explained in terms of viewpoint phenomena. One of the most widely discussed and most thoroughly investigated viewpoint phenomena is grammatical aspect.

2.1. Aspectual Framing

Grammatical aspect provides a powerful means to conceptualize the unfolding of events in different ways (cf. Croft, 2012, p. 4). According to Comrie (1976, p. 3), “aspects are different ways of viewing the internal temporal constituency of a situation”. The variation between perfective and imperfective aspectual framing in English (e.g. She played vs. She was playing) in particular has been widely discussed in Cognitive Linguistics. Cognitive Grammar characterizes aspectual framing in terms of viewpoint: In the case of the English progressive, “the position from which the situation is viewed is contained in the ongoing process itself (so that any boundaries are not ‘in view’)” (Verhagen, 2007, p. 51).
Child language studies as well as experimental approaches have supported this view. Crucially, the ‘involved viewpoint’ plays an important role in the acquisition of progressive aspect: In child-directed speech, progressively framed sentences tend to be used when the event denoted by the verb is still unfolding (cf. Ibbotson et al., 2013). Cook-Gumperz & Kyratzis (2001) show that in pretend play situations of 3- to 4-year-olds, progressive constructions are tied to children taking an involved viewpoint on actions they take part in (e.g. I’m making soup). In experimental setups such as sentence completion or event description tasks, participants tend to describe situations in more detail when the sentence to be completed or the question to be answered is framed progressively (cf. Matlock et al.,
2012). Furthermore, Bergen & Wheeler (2010) show that progressive, but not perfect, sentences about hand motion facilitate manual action in the same direction. This indicates that progressive aspect evokes a higher degree of ‘immersion’ of the conceptualizer. The spatial nature of grammatical aspect becomes particularly obvious in languages in which spatial expressions have come to be grammaticalized as aspectual markers. Locative constructions are used for progressive aspect in a variety of both genetically related and unrelated languages, e.g. in a number of African languages, in French (être en train de), and in Dutch (aan het V zijn) (cf. Booij, 2008). The conceptualization evoked by the Dutch aan het construction has been addressed by Flecken & Gerwien (2013). More specifically, they investigate the interaction between participants’ duration estimations of progressive and non-progressive event descriptions and the inherent duration of the respective events. They found that “the progressive form extends duration estimations for short events, whereas it shortens the perceived duration of inherently medium and long events.” They argue that by means of progressive aspect, conceptualizers take an involved viewpoint by selecting a time interval that falls within the total duration of the event. Taken together, these results lend support to the hypothesis that both the acquisition and the comprehension of grammatical aspect are fundamentally grounded in physical reality and social interaction.

2.2. Subjectivity and Subjectification

Most Cognitive-Linguistic approaches attribute a key role to the conceptualizer, who figures prominently in expressions of attitudes and mental states. Traugott (1997) defines subjectification as the process “whereby meanings become increasingly based in the speaker's subjective belief state, or attitude toward what is said”. For example, the epistemic uses of promise and threaten can be seen as cases of subjectification.
A sentence such as He promised to be stout when grown up (Defoe, 1722, OED) obviously does not refer to a commissive speech act, as in She promised to be home at eight, but rather expresses the speaker’s belief about what the person referred to will look like in the future. For Langacker (1990), subjectification pertains to the degree to which a conceptualizer is construed as ‘offstage’. In a sentence like I believe he’ll be stout when grown up, the subject of conceptualization is explicitly mentioned and therefore, in Langacker’s terms, construed as ‘onstage’. In He promised to be stout when grown up, by contrast, the conceptualizer is grammaticalized ‘into’ the construction, as it were. Consequently, subjectification can be seen as tightly connected to perspectival
construal operations as it relates to the conceptualizer’s vantage point and role in a viewing relationship (cf. Langacker, 1990). The process of subjectification is a prime example of how the meaning of a linguistic construction can change over time due to the availability of different construal options and shifts in the prototypicality of these particular options. In the case of promise, only the performative reading seems to be available in the first stage of its development, while the various epistemic usage variants only develop from the 16th century onwards (cf. Traugott, 1997, pp. 186f.). In a usage-based model (Barlow & Kemmer, 2000), these processes can be modeled as reconfigurations of a complex semantic network.

3. Evolutionary Origins of Perspectival Construal Operations

From the point of view of Evolutionary Linguistics, the phenomena presented in the previous sections present us with the challenge of explaining how the ability of dynamic perspectival construal in language and cognition evolved. More specifically, we can ask which precursors to these abilities can be found in nonhuman animals and how these capacities develop in ontogeny. Perspective-taking in humans is a complex skill which is based on a set of many interacting capacities and motivations. Some of these capacities and motivations are considered to be uniquely human, whereas others seem to be shared with other primates, especially the great apes. These can thus be seen as the evolutionary foundation or platform of the perspectival abilities underlying the emergence of viewpoint phenomena in language and cognition. From around one year of age, human infants show the ability and motivation to share perspectives in direct engagement. The capacity to understand what others experience starts to develop a few months later and seems to be fully developed around 24 to 30 months.
Understanding how another person sees something only seems to emerge around their third birthday, with the full-blown capacity to explicitly confront and reflect on different perspectives emerging around four years of age (Moll & Meltzoff, 2011). Many studies suggest that chimpanzees exhibit rudimentary forms of perspective-taking. That is, “chimpanzees, like humans, understand that others see, hear and know things” (Call & Tomasello, 2008, p. 190). Importantly, though, the structure of social perspective-taking and -setting in humans goes well beyond these capacities. Perspective-taking in humans can be conceptualized as a dynamical process of intersubjective, participatory sense-making, which is based on embodied interaction and the mutual incorporation of embodied perspectives (Fuchs, 2012). For example, humans do not only understand and take other people’s perspectives, but, in
contrast to chimpanzees, they make use of their perspective-taking capabilities in a fundamentally cooperative, declarative, and informative kind of communication (cf. Tomasello, 2008). This means that the evolution of the human drive to share perspectives and psychological states with others – something which Fitch (2010, pp. 130f.) refers to as ‘Mitteilungsbedürfnis’ – was of fundamental importance in the evolution of language. The examples presented in section 2 demonstrate that certain features of human languages can be seen as crucially relying on this evolved human capability for dynamic perspectival construal both in the visual-spatial and in the socio-cognitive sense. In particular, we argue that perspectival construal operations play a key role in the evolution of grammar. One particularly striking development supporting this hypothesis can be found in Nicaraguan Sign Language (NSL). Over a very short period of time, this emerging sign language developed morphological devices for linking arguments with their respective verbs, which fundamentally rely on spatial contrasts (cf. Senghas, 2000). More precisely, signers of the second, but not of the first, cohort use sign direction as a morphological device to express semantic roles. In doing so, they consistently choose a character view representation, i.e. they represent the relative positions of the characters involved in the event to be expressed from the respective point of view of these characters (cf. Senghas, 2000).

4. Conclusion

The findings discussed above strongly suggest that the human perspectival drive and the socio-cognitive capacities connected to it play an important role in the emergence of perspectivation and viewpoint phenomena in language and cognition. This, we argue, is the most important aspect and function of language from an evolutionary and cognitive perspective.
Language and the perspectival construal operations of individual languages evolved as a means to conceptualize objects, states, events, and abstract entities in different ways and from different perspectives. These linguistic construal operations serve as prompts for the creation of embodied simulations. The overall capacity for linguistic perspectivation depends on general cognitive capacities. Most important among these is the human capacity for perspective-taking, -setting, and -sharing. From the point of view of Cognitive Linguistics and Evolutionary Linguistics, we thus have to further explicate how these capacities are tied to the phenomena of viewpoint and perspective in language. As we have shown above, understanding these capacities requires combining the insights to be gained from different approaches such as experimental studies, child language
research, comparative studies, and investigations into the historical evolution of fully developed as well as emerging languages. Moreover, investigating the evolution of these underlying capacities can in turn help provide a better understanding of grammatical phenomena such as those discussed in this paper.

References

Barlow, M., & Kemmer, S. (Eds.) (2000). Usage-Based Models of Language. Stanford: CSLI Publications.
Bergen, B. K. (2012). Louder than Words. The New Science of How the Mind Makes Meaning. New York: Basic Books.
Bergen, B., & Wheeler, K. (2010). Grammatical Aspect and Mental Simulation. Brain and Language, 112, 150-158.
Booij, G. E. (2008). Constructional Idioms as Products of Linguistic Change. The aan het + Infinitive Construction in Dutch. In A. Bergs and G. Diewald (Eds.), Constructions and Language Change (pp. 79-104). Berlin, New York: De Gruyter.
Call, J., & Tomasello, M. (2008). Does the Chimpanzee Have a Theory of Mind? 30 Years Later. Trends in Cognitive Sciences, 12, 187-192.
Comrie, B. (1976). Aspect. An Introduction to the Study of Verbal Aspect and Related Problems. Cambridge: Cambridge University Press.
Cook-Gumperz, J., & Kyratzis, A. (2001). Pretend Play. Trial Ground for the Simple Present. In M. Pütz, et al. (Eds.), Applied Cognitive Linguistics I (pp. 41-62). Berlin, New York: De Gruyter.
Croft, W. (2012). Verbs. Aspect and Causal Structure. Oxford: Oxford University Press.
Fitch, W.T. (2010). The Evolution of Language. Cambridge: Cambridge University Press.
Flecken, M., & Gerwien, J. (2013). Grammatical Aspect Influences Event Duration Estimations. In M. Knauff et al. (Eds.), Cooperative Minds. Proceedings of the 35th Annual Meeting of the Cognitive Science Society (pp. 2309-2314). Austin, TX: Cognitive Science Society.
Fuchs, T. (2012). The Phenomenology and Development of Social Perspectives. Phenomenology and the Cognitive Sciences (online first), doi: 10.1007/s11097-012-9267-x.
Hare, B., Call, J., & Tomasello, M. (2001).
Do Chimpanzees Know What Conspecifics Know? Animal Behaviour, 61, 139-151.
Ibbotson, P., Lieven, E., & Tomasello, M. (2013). The Communicative Contexts of Grammatical Aspect Use in English. Journal of Child Language (online first), doi: 10.1017/S0305000913000135.
Janda, L. A. (Ed.) (2013). Cognitive Linguistics. The Quantitative Turn. The Essential Reader. Berlin, New York: De Gruyter.
Langacker, R. W. (1987). Foundations of Cognitive Grammar. Vol. 1. Theoretical Prerequisites. Stanford: Stanford University Press.
Langacker, R. W. (1990). Subjectification. Cognitive Linguistics, 1, 5-38.
Langacker, R. W. (1991). Foundations of Cognitive Grammar. Vol. 2. Descriptive Application. Stanford: Stanford University Press.
Langacker, R. W. (1996). Viewing in Cognition and Grammar. In P. W. Davis (Ed.), Alternative Linguistics. Descriptive and Theoretical Modes (pp. 153-212). Amsterdam, Philadelphia: John Benjamins.
Matlock, T., Sparks, D., Matthews, J. L., Hunter, J., & Huette, S. (2012). Smashing New Results on Aspectual Framing. How People Talk about Car Accidents. Studies in Language, 36, 700-721.
Moll, H., & Meltzoff, A. N. (2011). Perspective-Taking and its Foundation in Joint Attention. In J. Roessler, H. Lerman and N. Eilan (Eds.), Perception, Causation & Objectivity (pp. 286-304). Oxford: Oxford University Press.
Pleyer, M. (2012). Cognitive Construal, Mental Spaces and the Evolution of Language and Cognition. In T. C. Scott-Phillips, M. Tamariz, E. A. Cartmill and J. R. Hurford (Eds.), The Evolution of Language. Proceedings of the 9th International Conference (pp. 288-295). Singapore: World Scientific.
Senghas, A. (2000). The Development of Early Spatial Morphology in Nicaraguan Sign Language. In S. C. Howell et al. (Eds.), Proceedings of the 24th Annual Boston University Conference on Language Development (pp. 696-707). Somerville, MA: Cascadilla Press.
Talmy, L. (2000). Toward a Cognitive Semantics. 2 vol. Cambridge: MIT Press.
Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and Sharing Intentions. The Origins of Cultural Cognition. Behavioral and Brain Sciences, 28, 675-691.
Tomasello, M. (2008). Origins of Human Communication. Cambridge: MIT Press.
Traugott, E. C. (1997). Subjectification and the Development of Epistemic Meaning. The case of promise and threaten. In T. Swan and O. J.
Westvik (Eds.), Modality in Germanic Languages (pp. 185-210). Berlin, New York: De Gruyter.
Verhagen, A. (1995). Subjectification, Syntax, and Communication. In D. Stein and S. Wright (Eds.), Subjectivity and Subjectivisation. Linguistic Perspectives (pp. 103-128). Cambridge: Cambridge University Press.
Verhagen, A. (2007). Construal and Perspectivization. In D. Geeraerts and H. Cuyckens (Eds.), The Oxford Handbook of Cognitive Linguistics (pp. 48-81). Oxford: Oxford University Press.
Winter, B., & Bergen, B. (2012). Language Comprehenders Represent Object Distance both Visually and Auditorily. Language and Cognition, 4, 1-16.
Zwaan, R. A. (2004). The Immersed Experiencer. Toward an Embodied Theory of Language Comprehension. Psychology of Learning and Motivation, 44, 35-62.
A CONSTRUCTIONIST APPROACH TO THE EVOLUTION OF MORPHOLOGICAL COMPLEXITY

STEFAN HARTMANN
German Department, University of Mainz, Jakob-Welder-Weg 18, D-55099 Mainz, Germany, [email protected]

The domain of morphology provides a particularly challenging area of research not only for general linguistics, but also for the study of language evolution. This paper discusses which insights can be gained from the theoretical framework of Construction Grammar (CxG) with regard to the evolution of morphology. It is shown that the CxG model of linguistic knowledge is perfectly compatible with an emergentist account of morphological complexity.
1. Introduction

Morphology has been called “the conceptual centre of linguistics” (Spencer & Zwicky, 1998) since it is tightly connected with phonology, syntax, semantics, and pragmatics. Nevertheless, with the notable exception of Carstairs-McCarthy’s (e.g., 2010) body of work, morphology has so far only played a minor role in the study of language evolution. This paper aims at elucidating the merits and possibilities of studying the evolution of morphology in a Construction Grammar (CxG) framework. More specifically, I will argue that regarding morphological patterns as constructions in the CxG sense allows for a straightforward account of the emergence of morphology in the course of the cultural evolution of language. In this view, morphology evolves quite naturally in a bottom-up and usage-based fashion as a ‘linking element’ between syntax and lexicon. The emergence of morphology is therefore tightly connected with the evolution of the syntax-lexicon continuum.

This paper draws on three core assumptions, which will be spelled out in more detail in the subsequent sections. First, language is a Complex Adaptive System (CAS) constituted by the interrelated dynamic systems of biological evolution, socio-cultural transmission, and individual learning, which operate at the phylogenetic, glossogenetic, and ontogenetic timescales, respectively (cf. Steels, 2011; Kirby, 2012). In the course of cultural evolution, this complex
system adapts to domain-general learning and processing biases (cf. Christiansen & Chater, 2008). In this view, it can be assumed that developments observed in historical language change as well as in language acquisition can inform an account of the emergence of morphology. Second, morphology and syntax do not constitute separate autonomous modules of the grammar; neither do lexicon and grammar constitute separate modules of language (cf. Taylor, 2012). Instead, any natural human language can be exhaustively described in terms of constructions, which are – in line with the CAS approach – intrinsically dynamic, since their meaning constantly has to be ‘negotiated’ in actual language use (cf. Lewandowska-Tomaszczyk, 1985). Thus, they are constantly subject to constructional change (Hilpert, 2013), which also entails constant shifts on the syntax-lexicon continuum. The third major assumption, which follows from the first two, is that grammar is entirely usage-based. As Tomasello (2009) puts it: “Meaning is use” and “structure emerges from usage”.

The remainder of this paper is divided into three parts. First I will discuss the role of morphology in constructionist accounts of linguistic knowledge. Then I will show how these theoretical considerations can be combined with empirical findings from child language research, historical linguistics, and comparative studies to yield a more comprehensive picture of the cognitive capacities as well as the socio-cultural factors that play a role in the evolution of morphology. A brief conclusion then outlines a possible scenario of how, and under which pressures, morphologically complex constructional schemas might have evolved.

2. Morphology in Construction Grammar

Construction Grammar (CxG) serves as an umbrella term for a family of linguistic theories that see constructions as the basic units of language.
Constructions are defined as form-meaning pairings at different levels of abstraction including morphemes, words, idioms, and abstract phrasal patterns (cf. Hoffmann & Trousdale, 2013). On this view, morphology takes a middle place on a scale ranging from atomic, highly specific items to complex, highly schematic patterns, the so-called syntax-lexicon continuum. Indeed, morphology is itself quite heterogeneous: Some morphological constructions (e.g. N+N compounding) can be assigned a position closer to the ‘syntax’ pole, while others (e.g. suppletion: go – went) are more ‘lexical’ in nature. Morphological constructions can be represented as constructional schemas such as the one in (1) for nominal compounds like milkman in English. Importantly, these schemas can also capture non-morphemic processes such as conversion, as the constructional schema for Dutch conversion (e.g. bouw ‘to build’ > bouw ‘building’) in (2) demonstrates.
(1) [[a]_{Xk} [b]_{Ni}]_{Nj} ↔ [SEM_i with relation R to SEM_k]_j (from Booij, 2010, p. 17)
(2) [[x]_{Vj}]_{N[-neuter],i} ↔ [ACTION_j]_i (from Booij, 2010, p. 40)

In other words, morphologically complex words do not necessarily have to exhibit complex morphemic structure. They can also be complex by virtue of bearing a systematic structural and/or semantic relation to another word, as is the case in clipping. The clipped forms sax for saxophone, par for paragraph, and vet for veterinarian (Schmid, 2011, p. 217), for example, are all formed in accordance with the same constructional schema. Moreover, they are all marked for informality. However, word-formation products can also lose their relationship to their base words by means of lexicalization. For example, flu has lost the aspect of informality and, over the course of the 20th century, has surpassed influenza as the default term for this disease even in more formal text types, as a search of the TIME magazine corpus of American English (Davies, 2007) reveals: While the token frequency of influenza decreases significantly (Kendall’s τ = -0.67, p < 0.05), the frequency of flu increases (τ = 0.61, p < 0.05). Consequently, the proportion of flu in relation to the sum total of instances of both flu and influenza also increases significantly from about 5% in the 1920s to more than 80% in the 2000s (τ = 0.78, p < 0.01).

CxG conceives of the mental representation of language in terms of a so-called constructicon (cf. e.g. Goldberg, 2006), which can be metaphorically described as a ‘mental corpus’ (Taylor, 2012): Language users keep track of the linguistic utterances they encounter and form generalizations over these instances of language use at different levels of abstraction. In the domain of morphology, this means that language users abstract away constructional schemas from usage patterns.
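The trend statistic reported above for flu and influenza is Kendall's τ, a rank correlation between decade and frequency. A minimal sketch of the computation might look as follows. The per-decade counts are invented placeholders rather than the TIME corpus figures, the function name kendall_tau is our own, and the sketch computes the simple tau-a variant without a significance test, so it is not expected to reproduce the values reported in the text.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x2 - x1) * (y2 - y1)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical token counts per decade (illustrative only, NOT corpus data).
decades = [1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000]
flu = [3, 10, 8, 40, 55, 50, 90, 120, 150]
influenza = [60, 45, 50, 30, 20, 25, 15, 12, 10]

# Proportion of "flu" among all tokens of both variants, per decade.
share_of_flu = [f / (f + i) for f, i in zip(flu, influenza)]

tau_share = kendall_tau(decades, share_of_flu)
print(f"tau for share of flu over time: {tau_share:.2f}")
```

A value close to 1 means that the share of flu rises in almost every pairwise comparison of decades. Obtaining a p-value would additionally require a permutation test or a statistics library.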
These schemas then serve as templates against which specific instances of morphological patterns are understood and which can in turn be used productively in accordance with the constraints defined by the respective schema. Importantly, the constraints imposed by constructional schemas are entirely usage-based. They come about in a strictly bottom-up fashion and can be overridden by language change, i.e. by changes in usage.

3. The Evolution of Morphological Complexity

Although it is hardly a matter of debate that the underlying capacities of language are – at least in the specific arrangement that made the emergence of language possible – uniquely human, most cognitively oriented approaches assume that the grammars of human languages make use of general cognitive
principles which evolved for non-linguistic purposes. Specifically, Tomasello (2009) proposes that the capacity to acquire a language fundamentally relies on two more general functions, namely, intention-reading and pattern-finding. Precursors of both can arguably be found in non-human animals (cf. Tomasello, 2008). For example, Endress et al. (2009) found that cotton-top tamarin monkeys “can spontaneously (no training) acquire an affixation rule that shares important properties with our inflectional morphology”. They conclude that some of the cognitive mechanisms underlying affixation rely on perception or memory primitives. However, the pattern-finding abilities involved in linguistic morphology are of course much more complex and can be discerned by drawing on insights from language change and language acquisition. Concerning the latter, CxG assumes that children learn language from the statistical features of their input (cf. Goldberg, 2006, p. 71), making use of the skewed frequencies characterizing this input (cf. Taylor, 2012). Despite a variety of open questions and language-specific differences, there is a great deal of evidence that children acquire both inflectional and derivational morphology by first making rather small generalizations and then building up the morphological schemas in a piecemeal fashion (cf. Behrens, 2009). To be sure, the pattern-finding abilities involved in the acquisition of morphology go well beyond simple categorization capabilities. They also involve highly sophisticated cognitive processes such as metaphor and metonymy, which have been studied extensively in Cognitive Linguistics (cf. Croft & Cruse, 2004). For example, Dirven (1999) argues that conceptual metonymy of event schemas, operating at the predicate-argument level, is involved in English noun > verb conversion, e.g. nurse > (to) nurse sb.
In this example, the typical activities of a nurse are metonymically mapped to the verb yielded by the constructional schema for N>V conversion. Children tend to adopt and overgeneralize this pattern, cf. e.g. German lampen ‘(to) lamp (i.e. use a flashlight)’ (Behrens, 2011, p. 165). The cognitive processes involved in language acquisition also play a role in historical language change. The importance of pattern-finding in the cultural evolution of morphological patterns is obvious in the diachronic development of many morphological constructions, e.g. in the case of the emergence of a new ablaut class in German: Verbs such as melken ‘(to) milk’ or spinnen ‘(to) spin’ changed their inflection pattern from melken - malk - gemolken to melken - molk - gemolken (cf. Nowak, 2013). This pattern became entrenched over time so that other verbs such as schwimmen ‘to swim’ tend to be inflected according to this pattern, as well. In the view presented in the preceding sections, accounting for the evolution of morphology is tantamount to accounting for the emergence of
morphologically complex constructional schemas. For example, in the often-cited example of the reanalysis of -gate from Watergate (which, being a proper name, can be treated as a simplex construction [Watergate]), a complex constructional schema [[x]Ni [gate]N]Nj emerges (cf. Booij, 2010, p. 90). On the meaning pole, the suffix -gate imposes the meaning ‘scandal’ on whatever lexical item is inserted into the open slot of the schema. Both on the form and on the meaning pole of a construction, abstraction and schematicity emerge diachronically through language use. This is evident in the well-attested process of grammaticalization (Hopper & Traugott, 2003). For example, the Germanic form *līka- was associated with the fairly concrete meaning ‘body’. Through frequent use in compounds, it gradually became an affix (e.g. adverbial -ly in English, adjectival -lich in German). In a usage-based view, “there is every reason to assume that the very first grammatical constructions emerged in the same way as those observed in more recent history” (Bybee, 2010, p. 202). Therefore, diachronic language change, most importantly grammaticalization phenomena, can shed light on the origins of structure in language (cf. Heine & Kuteva, 2007). Of course, the processes at work in the emergence of morphology go beyond grammaticalization. For example, reanalysis, pragmatic factors, and grammaticalization conspire in the emergence of the suffix -gate. In the initial stages of its productive use, it could only be understood because of the association with Watergate, i.e. its use presupposed some common ground between interlocutors; hence, this example also highlights the pragmatic and intersubjective as well as cultural factors involved in the emergence of morphological patterns. To sum up, morphology evolved – and keeps on evolving – through actual language use. This ties in neatly with the scenario outlined by Bybee (2010, p.
203): “Grammar developed gradually as language was used and as the capacities of humans or our ancestors increased to accommodate a large vocabulary, more abstract categories and many automated sequences.”

4. Conclusion

In a constructionist perspective, the evolution of morphology in an emergentist way (cf. e.g. Hopper, 1998) seems entirely conceivable. Given the assumption of a syntax-lexicon continuum, in a CxG perspective, the emergence of morphology is closely related to both syntax and lexicon. In the case of syntax, we can assume the gradual emergence of syntactic complexity, starting with the mere succession of lexical items (e.g. farmer kill duckling) and eventually, through schema abstraction over usage patterns, yielding highly complex constructional
schemas. Of course, the open slots of syntactic constructions are more often filled by some lexical items than by others – indeed, empirical investigations of specific constructions reveal highly skewed frequencies, often even Zipfian distributions (cf. Goldberg, 2006, p. 76). Consequently, specific lexical items tend to occur together – and as countless cases in the traceable history of human languages demonstrate, “items that are used together fuse together” (Bybee, 2007, p. 316). Experimental approaches in an iterated learning framework have shown that linguistic structure is subject to pressures for compressibility and simplicity in the course of cultural evolution (e.g. Smith et al., 2013). These pressures are at work in the evolution of morphology, as well. If the scenario outlined above is correct, morphological complexity emerged in response to the need to compress a fairly broad range of patterns of thought in an efficient way. At the same time, the pressure for simplicity pertains both to the form and to the meaning pole of a (morphological) construction. These pressures are closely tied to pragmatic and discursive needs. For example, the lexicalization of a frequent complex word such as wheelchair ensures efficient communication, again – as in the case of -gate – drawing on common ground and shared extra-linguistic knowledge. In this view, there is no need to assume a specific ‘module’ for morphology to evolve on the phylogenetic/biological timescale. Instead, the emergence of morphology can be treated as a process of cultural evolution. While the assumptions concerning the cognitive capacities underlying morphology differ fundamentally from Carstairs-McCarthy’s (2010) approach, the scenario he outlines is in principle compatible with the theory presented here. He assumes that the evolution of morphology begins with purely phonological alternations (“proto-‘allomorphy’”). 
These are reanalyzed in a fashion similar to umlaut patterns, which at first were purely phonologically conditioned (e.g. foot - feet). Indeed, this scenario bears important similarities to the scenario for the evolution of syntactically complex constructional schemas presented above: We start with distinct items and eventually arrive at generalization and schema abstraction. These considerations also open up new perspectives on the debate about the “building block” metaphor (cf. e.g. Langacker, 1987, p. 452f.): Indeed, this metaphor applies quite straightforwardly to a hypothetical protolanguage consisting solely of lexical items. To a certain extent, it also applies to a hypothetical protolanguage consisting of lexical items and rudimentary syntax. However, given the storage and processing constraints of the human brain, such a “building-block” language would be neither very efficient nor capable of expressing complex states of affairs. As language adapts to the human brain (Christiansen & Chater, 2008), it makes use of domain-general capabilities
which can be subsumed under Tomasello’s broad notion of pattern-finding, blurring the boundaries between syntax and the lexicon. Now, we are dealing with schemas rather than building blocks. At the same time, the emergence of morphology fundamentally draws on common ground and intersubjective meaning construal (Tomasello’s ‘intention reading’). Hence, in a usage-based constructionist perspective, cognitive, social-interactional, and cultural factors conspire in the evolution of morphological complexity.

Acknowledgments

I wish to thank Michael Pleyer, Jonas Nölle, Andreas Hölzl, and three anonymous reviewers for their helpful comments and suggestions on a previous draft of this paper. Remaining errors are of course mine.

References

Behrens, H. (2009). Grammatical Categories. In E. L. Bavin (Ed.), The Cambridge Handbook of Child Language (pp. 199-215). Cambridge: Cambridge University Press.
Behrens, H. (2011). Die Konstruktion von Sprache im Spracherwerb. In A. Lasch & A. Ziem (Eds.), Konstruktionsgrammatik III. Aktuelle Fragen und Lösungsansätze (pp. 166-179). Tübingen: Stauffenburg.
Booij, G. E. (2010). Construction Morphology. Oxford: Oxford University Press.
Bybee, J. L. (2007). Frequency of Use and the Organization of Language. Oxford: Oxford University Press.
Bybee, J. L. (2010). Language, Usage and Cognition. Cambridge: Cambridge University Press.
Carstairs-McCarthy, A. (2010). The Evolution of Morphology. Oxford: Oxford University Press.
Christiansen, M. H., & Chater, N. (2008). Language as Shaped by the Brain. Behavioral and Brain Sciences, 31, 489-558.
Croft, W., & Cruse, A. (2004). Cognitive Linguistics. Cambridge: Cambridge University Press.
Davies, M. (2007). The TIME Magazine Corpus. (100 Million Words, 1920s-2000s.) Available online at http://corpus.byu.edu/time/ (last accessed 05/09/2013).
Dirven, R. (1999). Conversion as a Conceptual Metonymy of Event Schemata. In K.-U. Panther & G. Radden (Eds.), Metonymy in Language and Thought (pp. 275-287). Amsterdam/Philadelphia: John Benjamins.
Endress, A. D., Cahill, D., Block, S., Watumull, J., & Hauser, M. D. (2009). Evidence of an Evolutionary Precursor to Human Language Affixation in a Non-Human Primate. Biology Letters (online, doi 10.1098/rsbl.2009.0445).
Goldberg, A. E. (2006). Constructions at Work. The Nature of Generalization in Language. Oxford: Oxford University Press.
Heine, B., & Kuteva, T. (2007). The Genesis of Grammar. A Reconstruction. Oxford: Oxford University Press.
Hilpert, M. (2013). Constructional Change in English. Cambridge: Cambridge University Press.
Hoffmann, T., & Trousdale, G. (2013). Construction Grammar. Introduction. In T. Hoffmann & G. Trousdale (Eds.), The Oxford Handbook of Construction Grammar (pp. 1-12). Oxford: Oxford University Press.
Hopper, P. J. (1998). Emergent Grammar. In M. Tomasello (Ed.), The New Psychology of Language. Vol. 1 (pp. 155-174). Mahwah, NJ: Erlbaum.
Hopper, P. J., & Traugott, E. C. (2003). Grammaticalization. 2nd ed. Cambridge: Cambridge University Press.
Kirby, S. (2012). Language is an Adaptive System. The Role of Cultural Evolution in the Origins of Structure. In M. Tallerman & K. R. Gibson (Eds.), The Oxford Handbook of Language Evolution (pp. 589-604). Oxford: Oxford University Press.
Langacker, R. W. (1987). Foundations of Cognitive Grammar. Vol. 1. Theoretical Prerequisites. Stanford: Stanford University Press.
Lewandowska-Tomaszczyk, B. (1985). On Semantic Change in a Dynamic Model of Language. In J. Fisiak (Ed.), Historical Semantics, Historical Word-Formation (pp. 297-323). Berlin, New York, Amsterdam: Mouton.
Nowak, J. (2013). spinnen - sponn? - gesponnen. Die Alternanz x-o-o als Alternative zum "Schwachwerden". In P. M. Vogel (Ed.), Sprachwandel im Neuhochdeutschen (pp. 169-184). Berlin, New York: De Gruyter.
Schmid, H.-J. (2011). English Morphology and Word-Formation. An Introduction. 2nd ed. Berlin: Erich Schmidt Verlag.
Smith, K., Tamariz, M., & Kirby, S. (2013). Linguistic Structure is an Evolutionary Trade-Off between Simplicity and Expressivity. In M. Knauff et al. (Eds.), Proceedings of the 35th Annual Meeting of the Cognitive Science Society (pp. 1348-1353). Austin, TX: Cognitive Science Society.
Spencer, A., & Zwicky, A. M. (1998). Introduction. In A. Spencer & A. M. Zwicky (Eds.), The Handbook of Morphology (pp. 1-10). Oxford: Blackwell.
Steels, L. (2011). Modeling the Cultural Evolution of Language. Physics of Life Reviews, 8, 339-356.
Taylor, J. R. (2012). The Mental Corpus. Oxford: Oxford University Press.
Tomasello, M. (2008). Origins of Human Communication. Cambridge: MIT Press.
Tomasello, M. (2009). The Usage-Based Theory of Language Acquisition. In E. L. Bavin (Ed.), The Cambridge Handbook of Child Language (pp. 69-87). Cambridge: Cambridge University Press.
LANGUAGE EVOLVED FOR STORYTELLING IN A SUPERFAST EVOLUTION

TILL NIKOLAUS VON HEISELER
Independent, Lützowstraße 81, 10785 Berlin, Germany

The problem of language evolution can be split into two questions: What was language selected for? And what possible scenario could allow this adaptation process to occur? This paper addresses only the first question. Our hypothesis is that language was initially selected for narration (an utterance that refers to a past action) and only later was used for other purposes. We provide evidence from different fields and perspectives and show that selection for narration is not only suggested by the structure of language itself but also has the potential to solve other unanswered questions: the emergence of episodic memory and the ability to understand others as having beliefs, thoughts, desires, etc. (theory of mind). Furthermore, our hypothesis provides a new perspective on the problem of human altruism. This paper is meant not as a final result, but as a proposal for a provocative working hypothesis (see title) and a prelude to a model that could explain why narration (or storytelling) came under such immense selective pressure.
1. Introduction

The problem of language evolution implies two questions: first, what was language selected for? And second, in what concrete scenario could this selection have actually happened? The first question concerns the evolutionary function of language. The second question addresses the actual scenario in which this function became essential. We could reconstruct the function of language simply by analyzing its structure, without considering why this function was put under selective pressure. However, the traditional view often starts with the preconception that language is a complex system of communication and then speculates about its communicative advantages. To avoid such speculations, we start with a structural analysis of language, and then infer the unique function of this structure to which language is adapted.

Besides the evolution of language, there are other unsolved problems of human evolution. One of these problems is uniquely human altruism. Humans cooperate with other individuals even if they know they will not be repaid for doing so; infants are born with an innate general altruism that later narrows down to certain persons and certain situations (Tomasello, 2009). Other problems concern the super-fast evolution of certain human traits (Lahn et al., 2004), especially (1) the ability to understand others as having states of mind such as beliefs, thoughts, desires, and so on (theory of mind) and (2) episodic memory, that is, the ability to remember certain occurrences and events of the past in a narrative order.

2. Language and time traveling

2.1 Did language evolve from animal communication?

One crucial question is whether language evolved from animal communication.
Figure 1 | Three possibilities of language evolution. In this figure we illustrate three possibilities of language evolution. In the first row, language is just an extension of animal communication. In the second row, language emerges from animal communication; however, what makes it language comes in at the very end and requires a saltation (between 3b and 4b). In the third row, the exclusive function of language emerges separately from the animal communication system. It develops until it fulfils most of the functions that the animal communication system fulfilled (but there still exist some forms of communication other than language).
There are three possible scenarios for the evolution of language (Figure 1). (1) Language evolved from animal communication and is just a more powerful instrument of communication; it is not essentially different from animal communication and probably is a result of multiple innovations. (2) Language evolved from animal communication, but at a certain point it underwent a fundamental change. (3) Language did not evolve from animal communication; instead, it evolved to fulfil a completely new function.
If the first scenario were true, there would be no problem about language in the first place: language would be a trait just like other traits and either would have no special function (or structure) or this special function would be a result of many gradual transitions. The second scenario is the most inelegant concept, because a saltation emerges out of nowhere (e.g., an integration of recursion). The third scenario proposes that language evolved for an utterly new function. It is the most elegant concept, because it requires just one explanation: what is the new function? According to this concept, the key to the problem of language evolution is this exclusive function, a function that animal communication systems do not fulfil. There is some neuroscientific evidence supporting the third concept: language and animal communication are not coded in the same area of the brain. Instead, the action-recognition system of primates has been suggested to be homologous with Broca’s area in the human brain, a region in the frontal lobe of the left hemisphere that is linked to speech production (e.g. Arbib, 2005; Rizzolatti & Sinigaglia, 2008).

2.2 The unique structure of language and its exclusive function

We may not know exactly how language works (Bargmann et al., 2013), but we do know some things about its structure. The core of the sentence – the verb – refers to something that is not visible on its own and produces various thematic roles (agent, patient, instrument, etc.), so-called ‘slots’ (Jackendoff, 1983). This means that the basic units of language are not words but propositional sentences; words are only fission products of sentences. The sentence ‘An ape grips the grape’, for example, symbolizes a holistic sensorial experience, the perception of an action.
The distinctions used are not ‘in the world’ but are the result of a categorization that implies a distinction between apes and non-apes, gripping and all other possible actions, and grapes and all other objects. Thus we could say: the perception of an action is decomposed by means of syntactic structure (the only words that actually correspond to real objects on their own are proper names). But why should an individual decompose reality in this way? We can think of three reasons: (a) for the purpose of reasoning (to understand ‘what is going on’), (b) to tell someone something that happened earlier and out of sight, that is, to narrate displaced actions and events, or (c) for future planning. We define narration as an utterance that refers to a displaced action (excluding a signal that is solely an index to a displaced object or animal). In its prototypical form, it does not require a narrated series of events.
2.3 Language is adapted to narration

In the following section, we analyze the relation between syntax and narration.

1. Narration is possible only with syntax. The vervet monkey (Chlorocebus pygerythrus) has three different alarm calls, depending on the particular predator it is warning against: leopards, snakes, or eagles (Seyfarth, 1980). In contrast to direct communication, in this case there is a reference to the predator as the third party. But reference to this third party is not to something absent, but rather to an overlooked presence. So the call is not a symbol used in communication; instead, the signal dissolves the communicative situation: the receiver does not react to the sender but to what the signal refers to. A single utterance will always be interpreted as information about the present: if someone cries out “Fire!”, she does not mean that the Library of Alexandria burned down more than two thousand years ago, but that there is a fire here and now. Thus, the displacement of an action is possible only if an utterance is given a context through other words and within a sentence; syntax is therefore the basis of narration.

2. Apes use signalling only for imperatives. The gestures of trained chimpanzees are mostly imperative, designed to bring reward or advantage to the sender. Human language, by contrast, includes “declarative statements as well as imperative ones” (Corballis, 2011, p. 163). If language evolved for an exclusive reason, it seems plausible that it evolved for declarative statements. In addition, humans are experts at storytelling: they not only enjoy stories but can differentiate and select between good and poor narratives.

3. Goodall (Conversation with Freddy Gray, 2010) supports the claim that the most important unique function of language is the displacement of action, that is, the communication of events that are not present.
Asked in an interview about the most significant difference between chimpanzees and humans, she responded: “Chimps … are unable to communicate about things that aren’t present” (Goodall, 2010). And Corballis (2011, pp. 113–114) writes:

Language may have evolved … so that we can share our mental travels through time. … I think that grammatical language evolved primarily to enable us to share episodes. … Language is exquisitely designed to communicate ‘who did what to whom, where, when and why.’

We can further infer that the exclusive function of language is the communication of an absent action, because the existence of a present action could be communicated simply by pointing. Indeed, it can be shown that many of the trappings of grammar are solutions to the problem of conveying episodic information (Corballis & Suddendorf, 2007). The basis for memory of past events (episodic memory) is the understanding of actions of conspecifics. What are the neurocognitive foundations for understanding social actions?

2.4 The mirror neuron system, gesture, and language

In this section, we investigate how the mirror neuron system is related to language. We will see that the ability to understand action – and not the communication of animals – is the precursor to language. One of the major breakthroughs in cognitive neuroscience in recent decades has been the discovery of neurons in the cerebral cortex of rhesus monkeys that fire when an action is performed by the subject as well as when it is observed (Rizzolatti et al., 1996). Understanding could therefore be interpreted as internalized imitation. This means that understanding is more complex than imitation and can be broken down into two steps: spontaneous imitation and the suppression of this reflex (internalization). Since the understanding of action implies imitation and internalization (suppression of the physical imitation reflex), the main challenge of signing a verb is not the signing (simulation of the action) itself, but remembering a past action at the particular moment of language use. The challenge for the receiver would be to understand a simulation in the frame of a narration (to understand that signs refer to past actions). Objects could be spontaneously signed by the signer showing their use (in sign languages, these kinds of signs are called ‘manipulators’). The first utterance (no matter how primitive) that referred to a past action conferring an evolutionary advantage on the sender would probably start an escalating evolutionary process, as a result of which the narrative ability would develop even further.
The development of the vocabulary (restricted to manipulators) could be extraordinarily fast because the development would rely on a natural lexicon founded in the social reasoning system of primates.

3. The speed of evolution and the development of unique human traits

Recent studies (Lahn et al., 2004) suggest that humans developed through a ‘super-fast’ evolutionary process found nowhere else in the animal kingdom. The speed of evolution depends on the production of variation, on the probability of the variation being an improvement, on the number of fields in which improvements are possible, on evolutionary pressure, and on the number of individuals involved in the evolutionary process. Bigger changes will be based on mutations of the gene expression and include insertions and deletions (Britten, 2010). But those bigger changes can be positively selected only if there is something to improve. Narrative abilities, however, can improve on many levels simultaneously, for example: voice (pleasant sound, tone, and depth); rhythm and melody; lexicon size; syntactic structures in the dimensions of clarity, beauty, elegance, and complexity (great processing costs for the speaker, low processing costs for the receiver); and dramaturgy and suspense.

Alongside episodic memory, having a theory of mind is an important prerequisite for being able to tell a complex story. First, a good storyteller always needs to keep two things in mind: the whole story (his own knowledge of the narration) and what he has related of it so far (the knowledge of the receiver of the story). Second, to understand a story, both the narrator and the receiver need to understand the motivations and beliefs of the hero. However, the narrator has to make sure not only that he himself understands the motives of the hero, but also that the receiver understands them at every given point in the story. Thus, to understand whether the receiver grasps the story, the narrator needs a second-order theory of mind (i.e., he needs to understand what the receiver believes the hero believes). Not only language but also both a theory of mind and episodic memory will be put under selective pressure in the adaptation process for narration (or narrative abilities). If there is a reproductive advantage to narrative utterance, then all cognitive abilities connected to narration could evolve, and positively selected mutations could be recombined through sexual reproduction.

4. Conclusion

In our model, language is neither adapted simply to the brain nor the brain directly to language; instead, both language and brain are adapted to storytelling. The key to language evolution is therefore to model a scenario in which storytelling would be strongly positively selected.
We therefore reject the widespread misconception that language had the initial function of coordinating action in complex situations, exchanging pragmatic information, etc. The problem with such ideas is that the coordination of a group does not confer individual reproductive success on the speaker. The benefits in such a situation are only on the side of the receiver. Moreover, any model that proposes cooperation needs to confront two long-debated problems: the problem of human altruism and the problem of group selection. Models that propose collaboration and shared intention as a prerequisite of language development find themselves in a Catch-22 if they cannot explain the emergence of human altruism and theory of mind. However, the first narrative utterance (a sign that refers to a past event) could be the starting point of language evolution if narration conferred a reproductive advantage on the narrator. The reproductive advantage could, for example, be based on sexual selection. This scenario could also explain the super-fast evolution of various other unique cognitive abilities. If the reputation of every individual is stored in and distributed through circulating narrations, then non-kin, non-reciprocal altruism seems possible (Nowak & Sigmund, 2005). This solves the problem of human altruism. But this does not necessarily mean that narration in fact evolved for this purpose. Though human altruism would indeed suggest new forms of collaboration, this would pay off only on the level of group selection. Though group selection is possible in the sense that one hominin group could replace others, it cannot be the foundation of the development of any trait. To think of any advantage on any level as the foundation of evolution without explaining how the gene pool of a population changes its allele frequency is a teleological illusion, that is, it conflates cause and effect. The actual development of traits is always connected with a modelling of the gene pool, and this can happen only if individuals with different genes – within a population – have different reproduction rates. To think of human altruism as the cause of language evolution would therefore repeat the common mistake of confusing the later advantage of a trait with a possible cause of its evolution (Tinbergen, 1963). In other words, circulating narrations would solve the problem of human altruism, but this altruism cannot be the cause of the development of language. On this level, our result is a negative one: this paper does not model a concrete scenario for language evolution. Thus, this paper is but a prelude to a much more complex model that explains why narrative abilities were put under such extreme selective pressure in the evolution of our ancestors.
References

Arbib, M. A. (2005). From monkey-like action recognition to human language: An evolutionary framework for neurolinguistics. Behavioral and Brain Sciences, 28, pp. 105–167.
Bargmann, S., Götze, C., Weskott, T., Holler, A., & Webelhuth, G. (2013). An Empirical Investigation into Binding-Theoretic Reconstruction Effects in Restrictive Relative Clauses of German. Berlin, Humboldt University: Linguistic Evidence.
Beckner, C., Blythe, R., Bybee, J., Christiansen, M. H., Ellis, N. C., Holland, J., . . . Schoenemann, T. (2009). Language is a complex adaptive system. Language Learning, 59 (s1), pp. 1–26.
Britten, R. J. (2010, October 4). Transposable element insertions have strongly affected human evolution. Biological Sciences – Evolution, 107, pp. 19945–19948.
Corballis, M. C. (2011). The Recursive Mind: The Origins of Human Language, Thought, and Civilization. Princeton: Princeton University Press.
Corballis, M. C., & Suddendorf, T. (2007). Memory, Time and Language. In C. Pasternak (Ed.), What makes us human (pp. 18–36). Oxford: Oneworld.
Goodall, J. (2010, April 10). Conversation with Freddy Gray. Spectator.
Hockett, C. (1960). The Origin of Speech. Scientific American, 203, pp. 89–97.
Jackendoff, R. (1983). Semantics and cognition. Cambridge, MA: MIT Press.
Kirby, S. (2012). Language is an Adaptive System: The Role of Cultural Evolution in the Origins of Structure. In M. Tallerman & K. R. Gibson (Eds.), The Oxford handbook of language evolution (pp. 589–604). Oxford: Oxford University Press.
Lahn, B., Dorus, S., Vallender, E. J., Evans, P. D., Anderson, J. R., Gilbert, S. L., & Mahowald, M. (2004, December 29). Accelerated Evolution of Nervous System Genes in the Origin of Homo sapiens. Cell, 119, pp. 1027–1040.
Nowak, M., & Sigmund, K. (2005). Evolution of indirect reciprocity. Nature, 437(7063), pp. 1291–1298.
Rizzolatti, G., & Sinigaglia, C. (2008). Empathie und Spiegelneuronen. Frankfurt am Main: Suhrkamp.
Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, pp. 131–141.
Seyfarth, R. M. (1980). Monkey responses to three different alarm calls: evidence of predator classification and semantic communication. Science, 210, pp. 801–803.
Tinbergen, N. (1963). On aims and methods of Ethology. Zeitschrift für Tierpsychologie, 20, pp. 410–433.
Tomasello, M. (2003). Constructing a language. Cambridge, MA: Harvard University Press.
Tomasello, M. (2009). Why we cooperate. Cambridge, MA: MIT Press.
WHAT ICONICITY CAN AND CANNOT DO FOR PROTOLANGUAGE

ELIZABETH IRVINE
Department of Philosophy, Australian National University, Coombs Close Canberra, 2601, Australia, [email protected]

It is often claimed or assumed that iconicity played a crucial role in bootstrapping the emergence of early linguistic systems and symbolic capacities, because the meanings of iconic signs are easier to comprehend than those of arbitrary signs, and so demand less complex cognitive capacities. In this paper, it is argued that instead a separate category of non-iconic but non-arbitrary signs is sufficient to get very simple linguistic systems off the ground, based on a range of biologically grounded cross-modal associations. Yet this highlights a potential explanatory gap between the capacities of domain-specific systems for interpreting signs (used early on), and the capacities of a domain-general system for interpreting meaning from form (iconic signs) or a broader symbolic capacity (arbitrary signs), which evolved later. However, despite being unable to bootstrap the development of a symbolic capacity, iconicity may still have played a role in building the lexicon of languages used by individuals who have passed this ‘gap’, and so already possess (something like) a domain-general symbolic capacity. Ways of testing these claims are provided.
1. Introduction: Iconicity and Bootstrapping

Iconic signs, where the meaning of the sign is ‘transparent’ in its form, are often treated as a way of bootstrapping the emergence of early linguistic systems and linguistic capacities. In particular, early linguistic systems based on iconic signs may bypass the need for a fully developed symbolic capacity, and so allow simple linguistic systems to emerge and co-evolve with the full suite of cognitive capacities associated with modern language use. Part of this evolution may be culturally driven, with iconic signs becoming arbitrary over time (e.g. Gasser 2004). However, discussions about the role of iconicity in the emergence of linguistic systems often waver between the notion of imagic iconicity, where a specific sign’s form in some way resembles its meaning, and diagrammatic iconicity or systematicity, where a space of forms is related to a space of meanings, but where individual form-meaning pairs may be arbitrary.
This paper focuses on the role of imagic iconicity in particular for two reasons. First, even if systematicity was present in early linguistic systems, the cognitive mechanisms underlying the interpretation of specific (possibly imagic iconic) forms are likely to be different from those related to mapping and comprehending systematic relationships in spaces of form-meaning pairs. Second, imagic iconicity is difficult enough to define, identify and measure to be theoretically and empirically interesting in its own right. Here, different types of apparent imagic iconicity are analysed. It is argued that signs claimed to be iconic, and crucially related to the emergence of language, are not in fact straightforward cases of iconic signs, though they are related to their meaning in a non-arbitrary (e.g. biologically grounded) way. This highlights a potential problem in the current literature: moving too easily from talk about domain specific capacities for the interpretation of apparently iconic signs (here argued to be in fact non-iconic but non-arbitrary signs) to talk about a domain general ability to treat and interpret signs as iconic. However, even if iconicity did not play a role in the emergence of early linguistic systems and symbolic capacities, reasons are given why iconic signs may nevertheless have been prevalent in the later development of linguistic systems. Suggestions of how to test these claims are also provided.
2. What Counts as Imagic Iconicity?
The aim of this section is not to come up with a formal definition or measure of imagic iconicity (see e.g. Verhoef et al. 2011 for discussion), but to identify two graded features of (apparently) iconic signs, and to see if the recognition of iconicity is necessary or likely for the comprehension of early linguistic signs. One way of grading iconicity is in terms of the source of the (potentially) iconic resemblance relation between the form and meaning of a sign.
The second graded feature is the extent to which the user or recipient treats the sign as iconic. These are discussed in more detail below.
2.1. What is Not Iconic? Ape Gestures
First, despite the links between iconicity, gesture, and research into non-human primate communication (e.g. Tomasello 2008), it is reasonably well accepted that ape gestures are (typically) not iconic (though see e.g. Russon and Andrews 2011). Play gestures, begging gestures, and the nursing poke are intention-movements or attention-getters, and function as parts of ritualized behaviours:
“...ape intention-movements rest on the natural tendency of recipients to anticipate the next step in an action sequence” (Tomasello 2008, p. 62). While these signs can be treated as iconic by outsiders, they are not iconic for their users. These signs get their meaning by being parts of (on-line) action sequences, not from recognition or interpretation of resemblance relations.
2.2. Iconicity in Cross-Modal Associations
There is a large body of work showing that humans (and likely other animals) naturally associate properties of one modality with another (see Spence 2011 for review). Many of these associations pick up on useful statistical features of the signaler or the environment (e.g. big things tend to be heavier), while other associations may just be accidents of development. These phenomena have been found to aid comprehension of signs (see Perniss et al. 2010 for examples), and to potentially play a role in early language evolution as a solution to the symbol-grounding problem (Simner et al. 2010), all in virtue of their apparent iconicity. However, the important feature of these signs does not seem to be their iconicity, but their non-arbitrariness, in the sense of being biologically grounded. First, there are grades of (apparent) iconicity present in cross-modal associations. Some reflect common associations found in the environment (including the body), and these can generate signs that embody a recognizable resemblance relation between the sign and its meanings, and so can potentially be treated as iconic signs. On the other hand, cross-modal associations based on (more or less) accidental neural wiring are less likely to generate signs that embody recognizable resemblance relations, and so are unlikely to be treated as iconic. In this case, there is a continuum of iconicity in cross-modal associations, sometimes with very little (potential) iconicity at all.
Yet whatever the (potential) grades of iconicity of signs based on cross-modal associations, they are fairly easy for humans to interpret. The claim here then is that when interpreting the meaning of signs based on cross-modal associations, it is not necessarily the (graded) iconic status of these signs that does the work, but the way that neural wiring routes information in the brain. So, signs based on cross-modal associations are non-arbitrary since they are biologically grounded, but they need not be treated as iconic by recipients in order to comprehend their meaning. This more conservative treatment of the interpretation of these kinds of signs suggests that iconicity need not have played a major role in the emergence of early linguistic systems and capacities.
2.3. Pantomime and Iconic Gesture
A case of iconicity that is more often invoked to explain the emergence of early linguistic systems is iconicity in gesture. The gestural channel is particularly suited for iconic representation, combining obvious spatial and temporal features, along with the capacity for pantomiming events, which can be difficult in the vocal channel (see e.g. Fay et al. 2013). One treatment of the iconicity in gesture comes in the form of embodied approaches to language, including work on mirror neurons (e.g. Arbib 2012, Gentilucci and Corballis 2006). Here, bodily or manual signs that simulate actions can be interpreted using systems of action understanding. Since simulated actions resemble ‘real’ actions, they are at least potentially iconic. However, as above, a more conservative approach is to identify the mirror neuron system as another way of interpreting non-arbitrary, rather than iconic, signs, as they are based on (domain specific) learned associations. Some evidence in favour of this is that children aged 2-4 use action-based gestures not as symbolic or decontextualized labels for objects, but as descriptions of what one can do with the object (‘action associates’) (Marentette and Nicoladis 2011). So, signs get their meaning because they are simulated actions, processed by a specialist system, rather than by a domain general mechanism for recognizing iconicity or resemblance relations. The worry is again how to move from this small set of signs and meanings into a wider space of iconic signs and resemblance relations, and arbitrary symbols and meanings. Further, while iconicity in gesture may aid understanding of new signs in adults (e.g. Brown 2012, Ortega and Morgan 2010), it does not seem to help children acquiring language. For example, in hearing children, Capirci & Volterra (2008) found that iconic gestures do not always precede vocal symbols, and Nicoladis et al.
(2010) found that children are no better at matching quantity to iconic signs (fingers) than to arbitrary signs (Arabic number symbols). Namy (2008) uses the result that recognition of iconic gestures only really starts after 2 years of age, when verbal communication starts to come online, to argue that the capacity for understanding any kind of representation underlies use of both iconic and symbolic signs. Similarly, for deaf children, “the iconicity found in a sign language does not appear to play a significant role in the way the language is processed or learned” (Goldin-Meadow and Alibali, 2013, pg. 270). Again, it may be (domain specific) non-arbitrariness, rather than imagic iconicity, that plays a role in the emergence of very early linguistic capacities and systems.
2.4. Summary: Non-Iconic, Non-Arbitrary Signs
Iconicity is assumed to play a major role in the emergence of early linguistic systems because it provides an easy way of reading meaning from the form of a sign. Instead, the sections above outline why it pays to be careful about separating iconicity from non-arbitrariness. Iconicity invokes the idea of a domain general mechanism for recognizing resemblance relations, while in fact a more conservative take on early sign comprehension identifies a number of relevant (more or less) domain specific cognitive capacities, often relying on cross-modal associations. In this case, the ability to understand the meaning of signs in one domain does not guarantee understanding elsewhere, and the emergence of a domain general ability to recognize meaning in signs is still unexplained. To move towards an explanation, it may be necessary to identify a wider range of non-iconic but non-arbitrary signs, the mechanisms used to interpret them, and how they interact to scaffold the evolution and development of (something like) a domain general symbolic capacity.
3. Emergence vs. Development of Languages
The arguments above are aimed at critiquing the role of iconicity in the emergence of early linguistic systems. However, iconic signs may still have played an important role in the development of these systems once they, and some form of symbolic capacity, emerged. First, even if imagic iconicity only aids comprehension of signs for individuals who already use language and already possess a symbolic capacity, it can still play a role in the extension of a lexicon. That is, for existing language users, novel iconic signs are easy to understand and learn, and so provide a quick and easy way to expand a vocabulary. However, a range of factors affect the best way to build a lexicon.
One of the main problems with iconic signs is that for a range of similar meanings, the signs could have similar forms, and so be difficult to disambiguate (Gasser 2004). Iconic signs, particularly as they become shortened and simplified, may also become harder to understand for individuals who do not share a history with other users of the sign. However, supposing that early language users had small lexicons relieves a pressure for expressivity that might favour the use of arbitrary signs. Further, even if these iconic signs are similar, early language users were likely part of reasonably small communities who had a substantial shared background (Meir et al. 2010). In addition, use of early linguistic systems would have been strongly
related to physical, social and task context, so combined with shared background, this may have been sufficient to disambiguate similar iconic signs. Further, signs that are not originally iconic for their users may become iconic with the development of more complex cognitive abilities. For example, early non-iconic gestural signs (ape gestures, simulated actions) could become fully iconic, both through a change in the way that the recipient interprets the sign (recognizing a resemblance relation), and through the presence of more complex communicative intentions on the part of the producer (the producer intends that the resemblance relation is noticed and interpreted in a particular way). In this case, existing potentially high-grade iconic signs could easily become parts of an iconic lexicon. Finally, imitation is a common form of learning and potentially provides a wide range of iconic gestures. For example, imitating the behaviours of others in order to learn complex manual skills could provide a range of gestures with shared iconic meaning for a community of language users (e.g. about the means to acquire or prepare food). In this case, imitation and developing linguistic abilities provide a way of generating new iconic signs that are easy to comprehend and easy to spread across a community of reasonably competent language users.
4. Conclusion and Suggestions for Experimental Work
It has been suggested here that it pays to be careful about the use of the term iconicity, and how it is used in claims (or assumptions) about its role in the evolution of early linguistic systems. Instead of playing a crucial role in getting linguistic systems and symbolic capacities off the ground, iconicity comes in at a later stage, and in a different way. A more conservative approach is to treat signs in early linguistic systems as non-iconic but non-arbitrary.
Interpretation of these signs does not rely on a general mechanism for recognizing (iconic) meaning within form, but on a range of (more or less) domain specific features of cognitive processing. This then puts pressure on accounts that rely on iconicity per se to explain the emergence of linguistic systems and a symbolic capacity. Instead, questions shift to how a domain general ability to recognize iconicity evolved from a range of domain specific capacities suited to interpreting non-iconic, but non-arbitrary signs, and what the later roles of iconicity might have been in the development of languages.
One way of testing the ideas in the first section would be to try to separate out and compare ongoing success in a communication game where dyads must come up with labels for objects in a single medium (e.g. as in Verhoef et al. 2011), across signs that recipients rate as either iconic, non-iconic but non-arbitrary, or arbitrary. Given the arguments above, one hypothesis is that there would be a clear ranking of communicative success across (high to low): 1) signs that receivers rate as iconic, and so recognize a resemblance relation, 2) signs for which a meaning ‘feels right’ but no resemblance relation is recognized (non-iconic but non-arbitrary signs), 3) signs that receivers rate as arbitrary. This could be used to illustrate the claim that simple linguistic systems can be built on signs that lie between full arbitrariness and iconicity, but that take advantage of domain specific features of cognitive processing. The claims made in the second section could also be tested experimentally. One way would be to modify Scott-Phillips’s (2010) ‘embodied communication game’, where a potential communication channel is effectively hidden from communicating pairs, some of whom find it and complete the task, and some of whom do not (here, movements that are required to ‘win’ the game, and that are visible to both players, can also function as a communication channel). One prediction from the discussion above is that, having hidden a channel suited for iconic (but non-systematic) signs, and one for arbitrary (but non-systematic) signs, pairs should be just as successful in finding each type of channel. However, pairs having found the iconic channel should then develop a larger lexicon more quickly.
Acknowledgements
Many thanks to Erin Brown, Chrissy Cuskley, James Winters, Michael Pleyer, Evan Kidd, Kim Sterelny, a seminar audience at ANU, and particularly to Sean Roberts, for discussion and guidance through the literature.
References
Arbib, M. A. (2012).
How the Brain got Language: The Mirror System Hypothesis. Oxford University Press.
Brown, J. E. (2012). The evolution of symbolic communication: An embodied perspective. Unpublished PhD Thesis, University of Edinburgh.
Capirci, O. & Volterra, V. (2008). Gesture and speech: The emergence and development of a strong and changing partnership. Gesture, 8, 22-44.
Fay, N., Arbib, M., & Garrod, S. (2013). How to bootstrap a human communication system. Cognitive Science, 37, 1356-1367.
Gasser, M. (2004). The origins of arbitrariness in language. In Proceedings of the annual conference of the Cognitive Science Society, 26, 4-7.
Gentilucci, M. & Corballis, M. C. (2006). From manual gesture to speech: A gradual transition. Neuroscience & Biobehavioral Reviews, 30, 949-960.
Goldin-Meadow, S., & Alibali, M. W. (2013). Gesture’s role in speaking, learning, and creating language. Annual Review of Psychology, 123, 448-453.
Marentette, P. & Nicoladis, E. (2011). Preschoolers’ interpretations of gesture: Label or action associate? Cognition, 121, 386-399.
Meir, I., Sandler, W., Padden, C., & Aronoff, M. (2010). Emerging sign languages. In M. Marschark and P. Spencer (eds.), Oxford Handbook of Deaf Studies, Language, and Education. Oxford University Press.
Namy, L. L. (2008). Recognition of iconicity doesn’t come for free. Developmental Science, 11, 841-846.
Nicoladis, E., Pika, S., & Marentette, P. (2010). Are number gestures easier than number words for preschoolers? Cognitive Development, 25, 247-261.
Ortega, G. & Morgan, G. (2010). Comparing child and adult development of a visual phonological system. In Language, interaction and acquisition, 1, pp. 67-81. Amsterdam: John Benjamins Publishing Company.
Perniss, P., Thompson, R. L. & Vigliocco, G. (2010). Iconicity as a general property of language: evidence from spoken and signed languages. Frontiers in Psychology, 1, Article 227.
Russon, A. & Andrews, K. (2011). Orangutan pantomime: Elaborating the message. Biology Letters, 7, 627-630.
Scott-Phillips, T. C. (2010). The evolution of communication: Humans may be exceptional. Interaction Studies, 11, 78-99.
Simner, J., Cuskley, C., & Kirby, S. (2010). What sound does that taste? Crossmodal mappings across gustation and audition. Perception, 39, 553-569.
Spence, C. (2011). Crossmodal correspondences: A tutorial review. Attention, Perception, and Psychophysics, 73, 971-995.
Sterelny, K. (2012). Language, gesture, skill: The co-evolutionary foundations of language.
Phil. Trans. of the Royal Society B, 367, 2141-2151.
Tomasello, M. (2008). Origins of Human Communication. Cambridge, MA: MIT Press.
Verhoef, T., Kirby, S., & Padden, C. (2011). Cultural emergence of combinatorial structure in an artificial whistled language. In Proceedings of the 33rd annual conference of the cognitive science society, pp. 483-488.
DID LANGUAGE EVOLVE INCOMMUNICADO?
SVERKER JOHANSSON
School of Education and Communication, University of Jönköping, Box 1026, Jönköping, SE-551 11, Sweden [email protected]
It is commonly assumed in evolutionary linguistics that language evolved for communication. But much recent work in biolinguistics, e.g. Chomsky (2010), proposes instead that language evolved for purely internal use, as a cognitive tool, with no externalization until a later stage in language evolution. How well supported is our general assumption of communicative language origins, really? Does it make sense to have instead an early stage with internal language only? I will review the arguments invoked in favor of the incommunicado hypothesis, and critically evaluate their strength.
1. Introduction
When investigating the evolutionary origins of any feature of an organism, one pertinent question is whether it is an adaptation, and if so, what it is an adaptation for. What provided the adaptive advantage that originally made the feature spread in the population (cf. Tinbergen, 1963), what was its function? In most work on evolutionary linguistics, it is generally assumed that language evolved for communicative purposes (Pinker, 1998; Jackendoff, 2002; Johansson, 2005; Dessalles, 2009; Givón, 2009; Bickerton, 2009, among many others). The vast majority of specific hypotheses proposed for language evolution likewise postulate a selective advantage that has something to do with communication. What distinguishes these hypotheses is mainly what the communication is about, and why and for whom the communication provides an evolutionary benefit. But there has long been a minority of scholars who propose that the original function of language lay in some other area than communication (Jerison, 1973; Chomsky, 2002; Newmeyer, 2004; Turner, 2004; Mirolli & Parisi, 2006; Lupyan, 2008). In recent years, a specific non-communicative hypothesis, growing out of the minimalist program (Chomsky, 1995), has gained a fair number of supporters among those who study language origins under the label ‘biolinguistics’ (Chomsky, 2010; Piattelli-Palmarini, 2010; Berwick & Chomsky, 2011; Berwick, 2011; Boeckx, 2013). According to this hypothesis, language was originally a cognitive tool: “...language is not properly regarded as a system of communication. It is a system for
expressing thought, ...” (Chomsky, 2002, p 76), “If so, then it appears that language evolved, and is designed, primarily as an instrument of thought.” (Chomsky, 2007, p 22). For some non-negligible period of time during language evolution, language was not externalized at all, nor linearized, but was instead used purely inside the head with no connection to sensorimotor systems (Chomsky, 2010; Hauser, 2011b; Boeckx, 2013). The use of language for communication is not only viewed as evolutionarily secondary, it is also downgraded in importance. Chomsky argues that “The use of language for communication might turn out to be a kind of epiphenomenon.” (2002, p 107). Language evolution was incommunicado. Communication between proponents of communicative and non-communicative language origins hypotheses has not always been fruitful and constructive; the debate sometimes looks like an illustration of the incommensurability of paradigms (cf. Kuhn, 1962). My purpose in this work is to attempt to bridge the incommensurability gap, coming from the communicative side and looking at the arguments of the other side, critically but without dismissing them out of hand. I will first summarize briefly the arguments proposed in favor of the incommunicado hypothesis, after which I will go on to review the various problems and weaknesses of the hypothesis.
2. Arguments for non-communicative language origins
Indubitably language is an important cognitive tool for humans today, and our “inner speech” constitutes a major fraction of actual language usage (Chomsky, 2002), but that in itself does not prove that cognition was its original raison d’être; language is likewise an important communicative tool today. The main arguments provided in favor of the incommunicado hypothesis are instead, briefly, as follows:
• Asymmetry of the interfaces.
This argument is strongly rooted in the minimalist paradigm, with the core of language regarded as a computational module forming an optimal connection between the conceptual-intentional (C-I) and sensorimotor (SM) interfaces. As argued by e.g. Chomsky (2010), the existence of manifest imperfections at the SM side entails that the locus of perfection must rather be at the C-I side, implying that language was optimized for the C-I-interface first, with SM tacked on afterwards.
• Language badly designed for communication. This argument is closely related to the preceding one, but proceeds from a different starting point. It is argued by e.g. Berwick (2011) that language has numerous universal features that are difficult to explain if language evolved for communication, from which it is concluded that language didn’t evolve for communication.
• Modality-independence of language. As is well known, language works perfectly fine in a variety of modalities, notably both spoken and signed.
Chomsky (2010) invokes this as further evidence of the asymmetry of the interfaces; apparently language can work with several different externalization modules (different SM interfaces?), which likewise is taken to imply that externalization was a secondary development, retrofitted to an existing internal language.
• Lack of reference relation in language. As argued by Chomsky (2010): “... there appears to be no reference in human language and thought ... Referring is an action, and the internal symbols that are used to refer do not pick out mind-independent objects.” (p. 57) The symbols of speech, in contrast with animal calls, are only internally connected, without external reference, something which makes more sense if externalization and communication are secondary aspects of language.
• The sudden emergence of language. Logically, there is no necessary connection between incommunicado and sudden origins. Nevertheless, much of the incommunicado literature proposes a saltational origin for language, with a single critical mutation providing syntax: “The simplest assumption, ..., is that the generative procedure emerged suddenly as the result of a minor mutation.” (Berwick & Chomsky, 2011, p. 29), with similar statements also found in e.g. Piattelli-Palmarini (2010) and Chomsky (2005, 2010); cf. the “hopeful monsters” of Goldschmidt (1940). Piattelli-Palmarini (2010) is also strongly anti-adaptationist and anti-selectionist, in contrast with Chomsky (2010) who provides a selectional scenario for the spread of the putative single mutation. The saltational hypothesis is connected with and supported by the belief that language is of sudden recent origin, perhaps 50,000–100,000 years ago (e.g. Berwick & Chomsky, 2011).
3. Reasons why incommunicado is implausible
Here I will first provide comments on and counterarguments to the arguments reviewed in the previous section.
After that I will proceed with other considerations that are problematic for the incommunicado hypothesis.
• Asymmetry of the interfaces. This argument carries weight only within the minimalist paradigm, and is powerless unless it is assumed that the minimalist thesis (Chomsky, 2000) is basically true, both in that the picture of language as a bridge between interfaces is correct, and in that language is optimal. Here the incommensurability problem is manifest; from a perspective outside minimalism, the imperfections on the SM side, which Chomsky (2010) concedes, could rather be interpreted as falsifying the postulated perfection of language, in which case Chomsky’s asymmetry argument would be downgraded to just an ad hoc effort to save his thesis.
• Language badly designed for communication. The strength of this argument likewise depends on the underlying assumption that language is optimal in some sense, well designed for whatever it is designed for. But this assumption is not self-evidently true; imperfections are a common outcome in evolutionary tinkering (cf. Jacob, 1977). The eye, for example, has obviously evolved for vision, but the reverse orientation of the retina and the resulting blind spot is just as obviously bad design for vision. Furthermore, do we understand the whole system of human communication well enough to be sure that the oddities invoked by Berwick (2011) are really badly designed for communication? Berwick mainly talks about parsing difficulties, but that is only one aspect of communication; language as a communicative tool is presumably some kind of compromise between the interests of the speaker and the interests of the hearer, and parsing difficulties for the hearer might be compensated by corresponding advantages for the speaker.
• Modality-independence of language. This is indeed a very interesting feature of our language faculty, which any serious hypothesis of language origins must account for. But the account of Chomsky (2010) is not the only possibility. Notably, the various hypotheses postulating an early gestural stage in language evolution (e.g. Corballis, 2002) account for it in a totally different way. Polymodal hypotheses might likewise provide a natural explanation for modality-independent language evolution in a social communicative context (Dor, Knight & Lewis, in press).
• Lack of reference relation in language. Again, Chomsky (2010) correctly identifies an interesting and unusual feature of our language faculty, in the very indirect ways that our symbols connect with any kind of external reality.
But the lack of direct reference is a problem also in the incommunicado scenario; if the putative language of thought is to be used for thinking about anything useful, external reference remains an issue. The incommunicado hypothesis does not solve this problem; it would still be the case that “[t]hese properties... have to be accounted for somehow in the study of their evolution. How, no one has any idea.” (Chomsky, 2010, p. 58).
• The sudden emergence of language. Saltational hypotheses have a well-deserved poor reputation in evolutionary biology (Kinsella, 2009; Iordansky, 2006; Futuyma, 1998; Cojocaru, 2009), and recent developments in evo-devo have not done much to change that (Gardner, 2013), contra the claims of Piattelli-Palmarini (2010) and Berwick (2011). Having a single large-effect mutation lead to something that is viable and perhaps even useful is not common, but does happen occasionally, due to robust developmental constraints and modular architecture. But having it lead to something that is both totally novel, and optimal or perfect in any reasonable sense,
is unlikely in the extreme (Kinsella, 2009). Positing a saltational origin for language effectively amounts to positing sheer dumb luck as an explanation, which is far from the principled explanations sought by Chomsky (2005). The belief that language popped up suddenly and recently less than 100,000 years ago is not supported by recent results from archeology and genetics. Instead proxies for language, including externalization, appear gradually and piecemeal over a long period, in both early Homo sapiens and Neanderthals (Johansson, 2011, 2013; Dediu & Levinson, 2013). In summary, none of the arguments invoked in favor of the incommunicado hypothesis is compelling. Furthermore, the notion of an incommunicado period in language evolution has a number of other problems as well. For one thing, it does not pass the “chimp test” (Bickerton, 2008); the selective advantage described by Chomsky (2010) for incommunicado language would be just as applicable to chimps as to (proto-)humans, so why don’t chimps have it as well? If language is primarily and originally a cognitive tool, it is odd that so much of the language machinery in our head is geared towards connecting with the sensorimotor interface, and that so much of the structure of language, notably the enforced seriality of spoken language, is likewise shaped by the demands of externalization. Even more odd is the fact that when we do use language as a cognitive tool, we typically use it in the form of ‘inner speech’, an apparent internalized form of the externalization of language; this makes more sense with an originally externalized language later adopted for internal use. Language acquisition is also problematic. Hauser (2011b) correctly notes that such a non-externalized language could never be transmitted, any “vocabulary” developed by one individual would die with that individual. 
Hauser then goes on to say that this somehow supports a long incommunicado period in language evolution — but I would argue the opposite; in the putative incommunicado period, it is not clear how language could ever be acquired in ontogeny, given how heavily language acquisition depends on external input. Even though the poverty of the stimulus for language acquisition is notorious, acquisition totally fails without any stimulus at all; cf. the case of Genie (Curtiss, 1977). How did language acquisition ever get off the ground in any individual with incommunicado language? Concerning the neural organisation of language, the incommunicado hypothesis would predict close connections and non-dissociability between cognition and core syntax in the brain. The rather sparse results available from studies examining this issue are mixed. Monti, Parsons, and Osherson (2009) find dissociation between language and logic in the brain, whereas Baldo et al (2010) find that relational reasoning is not separate from language, and Baldo et al (2005) that complex problem solving is impaired both in aphasia patients and in healthy subjects attending to a verbal distractor task. Broca’s aphasia is noteworthy in that syntax is severely impaired, but other cognitive functions are often largely spared — unlike many other aphasics, Broca
patients typically retain their previous non-verbal IQ (Grodzinsky, 2000). If there is any aphasia where core syntax is damaged, it is Broca’s. But if core syntax were vital for cognition and conceptual thought, Broca’s aphasia would entail severely impaired cognition. The evidence from empirical data on cognition in pre-linguistic infants and non-linguistic animals likewise indicates that language is not a prerequisite for complex cognition or conceptual reasoning (Fitch, 2009; Hauser, 2011a), and that cognitive representation predates language in both ontogeny and phylogeny (Givón, 2009).
4. Conclusions
There is no compelling evidence in favor of the incommunicado hypothesis, at least not outside the minimalist paradigm, and a number of issues, notably language acquisition, are highly problematic. Within the minimalist paradigm, the conjunction of the assumption of perfect language and the manifest imperfection of language externalization does provide some incentive to downgrade the importance of externalization and postulate that language was perfected without externalization. But even within that context, the incommunicado hypothesis would appear to create more problems than it solves. But the failure of the incommunicado hypothesis does not in itself prove that language evolved for communication. The arguments for and against the various communicative hypotheses need to be evaluated on their own merits.
References
Baldo et al. (2005). Is problem solving dependent on language? Brain & Lang 92:240-250.
Baldo et al. (2010). Is relational reasoning dependent on language? a voxel-based lesion symptom mapping study. Brain & Lang 113:59-64.
Berwick, R. C. (2011). Syntax facit saltum redux: Biolinguistics and the leap to syntax. In A. M. D. Sciullo & C. Boeckx (Eds.), The biolinguistic enterprise. new perspectives on the evolution and nature of the human language faculty. Oxford: Oxford University Press.
Berwick, R. C., & Chomsky, N. (2011).
The biolinguistic program: The current state of its development. In A. M. D. Sciullo & C. Boeckx (Eds.), The biolinguistic enterprise. new perspectives on the evolution and nature of the human language faculty. Oxford: Oxford University Press. Bickerton, D. (2008). Two neglected factors in language evolution. In A. D. M. Smith, K. Smith, & R. Ferrer i Cancho (Eds.), Evolution of language evolang7. World scientific. Bickerton, D. (2009). Adam’s tongue. New York: Hill & Wang. Boeckx, C. (2013). Lexicon, syntax, and grammar: Biolinguistic concerns. Presented at Congrès International des Linguistes, Genève, 21-27 July.
Chomsky, N. (1995). The minimalist program. Cambridge: MIT Press. Chomsky, N. (2000). Minimalist inquiries: the framework. In R. Martin, D. Michaels, & J. Uriagereka (Eds.), Step by step: Essays on minimalist syntax in honor of howard lasnik (p. 89-155). Cambridge, MA: MIT Press. Chomsky, N. (2002). On nature and language. Cambridge University Press. Chomsky, N. (2005). Three factors in language design. Ling Inquiry 36:1-22. Chomsky, N. (2007). Of minds and language. Biolinguistics 1:9-27. Chomsky, N. (2010). Some simple evo devo theses: how true might they be for language? In R. K. Larson, V. Déprez, & H. Yamakido (Eds.), The evolution of human language. biolinguistic perspectives. Cambridge University Press. Cojocaru, I. (2009). The revolutionary transition from essentialist to populationary thinking in biology. Analele Științifice ale Universității Al. I. Cuza Iași, s. Biologie animală, Tom LV. Corballis, M. C. (2002). From hand to mouth: the origins of language. Princeton: Princeton University Press. Curtiss, S. (1977). Genie: a psycholinguistic study of a modern-day “wild child”. New York: Academic Press. Dediu, D., & Levinson, S. C. (2013). On the antiquity of language: the reinterpretation of neandertal linguistic capacities and its consequences. Frontiers in Psychology 4(397), 1-17, doi:10.3389/fpsyg.2013.00397. Dessalles, J.-L. (2009). Why we talk. Oxford University Press. Dor, Knight & Lewis. (in press). The social origins of language: Early society, communication and polymodality. Oxford: Oxford University Press. Fitch, W. T. (2009). Prolegomena to a future science of biolinguistics. Biolinguistics 3.4:283-320. Futuyma, D. J. (1998). Evolutionary biology, 3rd ed. Sunderland: Sinauer. Gardner, A. (2013). Darwinism, not mutationism, explains the design of organisms. Progress in Biophysics and Molecular Biology 111:97-98. Givón, T. (2009). The adaptive approach to grammar. In D. Bickerton & E.
Szathmáry (Eds.), Biological foundations and origin of syntax. MIT Press. Goldschmidt, R. (1940). The material basis of evolution. New Haven: Yale University Press. Grodzinsky, Y. (2000). The neurology of syntax: language use without broca’s area. Behav & Brain Sci 23(1):1-21. Hauser, M. D. (2011a). Evolingo: The nature of the language faculty. In Piattelli-Palmarini, Salaburu, & Uriagereka (Eds.), Of minds and language. a dialogue with noam chomsky in the basque country. Oxford: Oxford University Press. Hauser, M. D. (2011b). The illusion of biological variation: a minimalist approach to the mind. In Piattelli-Palmarini, Salaburu, & Uriagereka (Eds.), Of minds and language. a dialogue with noam chomsky in the basque country. Oxford: Oxford University Press.
Iordansky, N. N. (2006). The problem of the evolutionary saltations. Zhurnal Obshchei Biologii 67(4):256-267. Jackendoff, R. (2002). Foundations of language. brain, meaning, grammar, evolution. Oxford: Oxford University Press. Jacob, F. (1977). Evolution and tinkering. Science 196:1161-1166. Jerison, H. J. (1973). Evolution of the brain and intelligence. New York: Academic Press. Johansson, S. (2005). Origins of language: constraints on hypotheses. Amsterdam: Benjamins. Johansson, S. (2011). Constraining the time when language evolved. Linguistic and Philosophical Investigations 10:45-59. Johansson, S. (2013). The talking neanderthals: What do fossils, genetics and archeology say? Biolinguistics 7:35-74. Kinsella, A. R. (2009). Language evolution and syntactic theory. Cambridge University Press. Kuhn, T. S. (1962). The structure of scientific revolutions. Chicago: University of Chicago Press. Lupyan, G. (2008). Extracommunicative functions of language: verbal interference causes categorization impairments. In A. D. M. Smith, K. Smith, & R. Ferrer i Cancho (Eds.), Evolution of language - evolang7. World scientific. Mirolli, M., & Parisi, D. (2006). Talking to oneself as a selective pressure for the emergence of language. In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The evolution of language. World scientific. Monti, Parsons, & Osherson. (2009). The boundaries of language and thought in deductive inference. Proc Nat Acad Sci 106:12554-12559. Newmeyer, F. J. (2004). Cognitive and functional factors in the evolution of grammar. In P. Hackney (Ed.), Evolution of language: Fifth international conference, leipzig. http://www.ling.ed.ac.uk/evolang/2004/evolang5.pdf. Piattelli-Palmarini, M. (2010). What is language, that it may have evolved, and what is evolution, that it may apply to language. In R. K. Larson, V. Déprez, & H. Yamakido (Eds.), The evolution of human language. biolinguistic perspectives. Cambridge University Press. Pinker, S. (1998).
The evolution of the human language faculty. In N. G. Jablonski & L. C. Aiello (Eds.), The origin and diversification of language. San Francisco: California Academy of Sciences. Tinbergen, N. (1963). On aims and methods in ethology. Zeitschrift für Tierpsychologie 20:410-433. Turner, H. (2004). The appearance of design in grammatical universals as evidence of adaptation for non-communicative functions. In P. Hackney (Ed.), Evolution of language: Fifth international conference, leipzig. http://www.ling.ed.ac.uk/evolang/2004/evolang5.pdf.
HUNTER-GATHERER EGALITARIANISM ENABLED GRAMMAR TO EVOLVE
CHRIS KNIGHT
Independent Researcher, Brockley, London SE4
JEROME LEWIS
University College London, Anthropology, Taviton Street, London, WC1H 0BW
For grammar to evolve, it was not enough for social relations to become more cooperative. Before grammaticalization processes could get under way, primate-style dominance/submission dynamics had to be decisively countered by coalitionary resistance culminating in an egalitarian social order based on ‘reverse dominance’. Only once unprecedentedly trusting ingroup relationships had been established on this basis could full intersubjectivity emerge, enabling the flexible joint attention structures necessary for grammaticalisation to begin.
1.1. Hunter-gatherer social systems
Woodburn’s (1982 and 2005) analyses of shared cultural traits across the world’s hunter-gatherer societies show that they are either ‘immediate-return’ or ‘delayed-return’. Focusing on the simplest of these systems – immediate-return societies – he elucidates a set of social practices characteristic of these ‘assertively egalitarian’ hunter-gatherers. Egalitarian economic relations are maintained by pressures to share which are imposed on anyone with more than they can immediately consume, so preventing saving and accumulation. People are systematically disengaged from property, hence from its potential to create dependency. Everyone has direct individual access to the resources necessary for survival. All can move freely as they please; no one can coerce others to do their will. People who brag or try to impose themselves on others are mercilessly teased, avoided or even exiled. Such societies include some Pygmy groups in Central Africa (Aka, Baka, Bayaka, Biaka, Efe, Mbendjele, Mbuti), Hadza in Tanzania, some San groups in Namibia and Botswana, several groups in India such as the Andaman Islanders, Hill Pandaram and Nayaka, and in south-east Asia, the Agta, Batek, Maniq, Penan and others. The distribution of these immediate-return traits across the world suggests that such social systems are likely to have great antiquity.
Indeed, Whiten and Erdal (2012) describe this remarkable set of similarities as the ‘human socio-cognitive niche’. Since primates often demonstrate related capacities, we can make inferences about the ancestral foundations enabling the emergence of five core components distinctive of early modern humans. These are (1) egalitarianism, (2) a gendered division of labour, (3) intersubjectivity, (4) language and (5) cultural transmission. The greatest number of contemporary hunter-gatherers in the world – estimates ranging from 100,000 to 500,000 people – inhabit the forests of the Congo Basin. By combining linguistic, material culture and genetic evidence with paleoecology of the region, Bahuchet (1996: 112) concludes that the ancestors of today’s Pygmies began populating this area 40-30,000 years ago, when forests were expanding. As the forests subsequently regressed, these groups became isolated in forest refuges, leading to the genetic and linguistic diversity between them today. Bahuchet (1996, Table 5.1, p. 109) tabulates his observations of cultural similarities and differences between Pygmy groups stretching from west to east across the Congo Basin. Yodel and polyphonic music are consistently associated with hunting and gathering, forest mobility, camps made of leaf and liana huts, woven-handled axes, and an egalitarian social order. In addition, gender roles are based on blood symbolism (Lewis 2008) while all exhibit a predatory engagement with their environment and with neighbouring cultures (Köhler and Lewis 2002). These similarities suggest that their shared traits are likely to be remnants of a more ancient Pygmy culture dating back to an ancestral population whose diversification is described above. Chen et al (2000) demonstrate how today’s Pygmy populations are related genetically to extant San Bushman populations, having once formed a single ancestral population. 
Noting their shared polyphonic singing styles, Grauer makes the strong claim that ‘Biaka Pygmies … could represent one of the oldest human populations … (and that) the Kung exhibited a set of related haplotypes that were positioned closest to the root of the human mtDNA phylogeny, suggesting that they too, represent one of the most ancient African populations … (therefore) the almost indistinguishable musical practices of the two groups may well date to at least the time of their divergence from the same population – a period that could … date to at least 76,000 years ago, but possibly as much as 102,000 years ago (Chen et al 2000: 1371).’ Grauer (2007: 6)
The evidence converges on an association between hunting and gathering, egalitarianism, a gendered division of labour, life governed by playful mimetic ritual – and a highly distinctive musical style. Too specific to emerge from convergent evolution, and with genetic evidence proving a shared past, these elements are best interpreted as components of a highly resilient, effective and ancient African hunter-gatherer culture. This insight provides important clues to help explain why out of 220 primate species just one speaks.
1.2. Continuity or discontinuity?
We now turn to nonhuman primates. According to Cheney and Seyfarth (2005), the demands of primate social life, with its alliances, friendships and rivalries ‘create selective pressures for just the kind of complex, abstract conceptual abilities that are likely to have preceded the earliest forms of linguistic communication.’ They propose (2005: 153) that ‘the primate mind evolved in an environment characterized by intense social competition, that such competition created selective pressures favoring structured, hierarchical, rule-governed intelligence, and that such social intelligence shares many formal features with linguistic intelligence.’
This is a strong version of the continuity hypothesis: distinctively human linguistic capacities evolved without rupture or break from a starting point in primate cognition and communication (cf. Darwin 1871). Primate listeners, note Cheney and Seyfarth (2005: 147), are experts at gleaning rich information from ‘relatively impoverished’ vocal signals. Despite occasional deceit, these vocalisations generally constitute reliable evidence of the information they convey. Female baboon ‘threat-grunts’, for example, are given only by dominants to those beneath them in the hierarchy. ‘Fear barks’, meanwhile, ‘are unambiguous signals of subordination.’ (p. 148). The authors continue: ‘Baboons clearly understand the difference between Hannah threatens Sylvia and Sylvia threatens Hannah. It is also likely that nonhuman primates can represent descriptive modifiers (a big leopard as opposed to a small one) and prepositions that specify locations (a leopard in a tree as opposed to one under the tree).’ (p. 151)
They further describe primate comprehension as representational, based on discrete values, hierarchically structured, rule-governed, open-ended, propositional and independent of modality. These features, they point out, are precisely those characteristic of linguistic syntax. (p. 152)
Note, however, that despite all this, no such complex information is intentionally encoded in vocal sequences by those producing the signals. No syntactic structure in a baboon’s vocal output distinguishes, say, Hannah threatens Sylvia from Sylvia threatens Hannah. Rather, the ‘syntactical’ properties listed above are features exclusively of comprehension, as Cheney and Seyfarth
concede. Comprehension and production, then, are quite different capacities, the two evolving along radically divergent trajectories. We suggest that the factors responsible for this striking disconnect help explain why language never evolves among nonhuman primates. In our view, the ‘intense social competition’ invoked by Cheney and Seyfarth as an engine of language evolution is better understood as an obstacle, helping to explain why grammaticalisation does not occur at all among nonhuman primates, despite their intelligence. As evolutionary psychologists, Whiten and Erdal seek a balance between continuity and discontinuity. They stress that despite intriguing continuities, the group politics prevailing among nonhuman primates differ radically from those of human hunter-gatherers. Crucially, ‘while chimpanzee political dynamics may carry the seeds of egalitarianism, chimpanzees are far from egalitarian’ (Whiten and Erdal 2012: 2127). The overall ‘socio-cognitive niche’ inhabited by our ape relatives is qualitatively unlike that in which language evolved. Central to their argument is what Whiten and Erdal (2012) term ‘deep social mind’. Minds are ‘deeply social’ when they mutually interpenetrate, each side actively representing other minds while simultaneously assisting others to reciprocate in kind. Significant in this context are our ‘co-operative eyes’ (Kobayashi and Kohshima 2001). Human eyes are almond-shaped and horizontally aligned, the relatively dark iris standing out against a white sclera background. Eyes of this unusual kind are adaptive in enabling us to follow and reveal our direction of gaze, inferring one another’s focus of attention (Tomasello et al. 2007). The very different eye morphology of, say, an adult male gorilla – dark iris against dark facial surround – suggests adaptation for inscrutability, not transparency. Such eyes are designed for looking out, not looking in. 
A dominant male gorilla needs to survey others’ mental states but has no interest in exposing itself to comparable surveillance in return. Whiten and Erdal argue persuasively that ‘deep social mind’ cannot evolve under such circumstances. A vertical dominance hierarchy necessarily pits ‘self’ against ‘other’, making it difficult to reverse roles and perspectives in the necessary way. Where others are not one’s equals, it becomes correspondingly difficult to step into their shoes, viewing one’s own goals and motivations by
adopting their standpoint. Without such ‘egocentric perspective reversal’, language cannot even begin to evolve (Steels and Loetzsch 2009; Steels 2009).
1.3. The need for trust
What levels of mutuality and trust are necessary for grammaticalisation to begin? In exploring such questions, an analysis of the range of communicative styles used by BaYaka Pygmies is illuminating. What stands out is these peoples’ playfulness, creativity and vocal-gestural freedom. Communicative strategies include animal and forest mimicry, reenactments, impersonations, sign languages, simultaneous use of multiple languages, song, dance and ritual ‘spirit play’ (Lewis 2009). Participation in polyphonic spirit play is central to these peoples’ sense of belonging and identity. Genetically distant Pygmy groups with many thousands of years of separation, today speaking different languages, correctly recognize themselves as egalitarian forest hunter-gatherers based only on hearing one another sing in this unusual style (Lewis 2013). The BaYaka ethnography provides an example of how words and rules can arise from gendered forms of mimicry. Men use deceptive mimicry to lure game animals within reach and in many other aspects of hunting and subsequent story-telling. Women use hilarious mimicry to collectively shame non-sharers or would-be cheats or show-offs, thereby imposing a normative social order from below. Collective action from below undermines anyone’s ability to establish hierarchical relations with others. Such ‘reverse dominance’ (Boehm 1999; 2012) serves to uphold levels of trust and shared perspective sufficient for selected elements of performance to be abbreviated, conventionalised and creatively combined in language-like ways (Lewis 2009, in press). The contrast with primate social life is striking. The forms taken by primate dominance vary widely, with despotic systems at one end of the scale and relatively relaxed, more democratic arrangements at the other (van Schaik 1989).
Yet through all such variation, ape and monkey political dynamics remain highly competitive, preventing even the most intelligent animals from inventing or maintaining ‘symbolic’ traditions. The difficulty is that words are, by primate standards, ‘fakes’. They are volitionally controlled vocalisations whose meanings are ‘arbitrary’, based on social agreement and – unlike primate vocalisations – effective even when radically disengaged from bodily or emotional causation. So for listeners to accept them, there must exist unusually high levels of community-wide trust. Our central argument is that even when nonhuman primates form cooperative
alliances, these are unlikely to be sufficiently stable or inclusive to enable a lexicon to cumulatively evolve. Limited trust prompts receivers to reject playful mimicry, responding preferentially to signals that they can recognise as intrinsically reliable because ‘hard-to-fake’. This rules out group-wide acceptance of imaginative fictions, precluding the cultural evolution not just of grammatical complexity but even of limited symbolic vocal communication.
1.4. The origins of grammar
To explain how grammar first evolved, Heine and Kuteva (2007) invoke grammaticalization – that continuous historical process in which free-standing words develop into grammatical markers, while these in turn become ever more specialised and grammatical. For grammaticalization to start, a basic precondition is freedom to innovate. Suppose an early speech community possessed just a few noun-like mimetic items such as ‘spear’ or ‘dance’. What would stop them from using these as verbs? Imagine a noun meaning ‘spear’. Why not use it to ‘spear’ a pig? Once you have a noun meaning a ‘dance’, why not use it to say ‘let’s dance’? Only if you were worried about grammar would this pose any difficulty. Heine and Kuteva point out that boundaries between grammatical categories are not as rigid as some theorists imagine. Categories like ‘noun’ and ‘verb’ arise out of usage; they certainly don’t need to be hardwired from the outset in the brain. Fixed categorical boundaries are unlikely precisely because the point of departure is ‘let’s pretend’ or ‘metaphor’ in the broadest sense – saying one thing while meaning another. This immediately introduces flexibility into the system. In a language without tense markers, for example, a speaker might point forward in space to indicate future tense. In real life, time is not space, but a sympathetic listener should get the point. Grammaticalisation in general works this way.
Over time, as the functions of words diversify through usage, they become subject to subtle changes in the way they can be deployed. Preferences become habits, habits eventually become grammatical rules. Before long, unconsciously and collaboratively, the community will have constructed for itself a fully grammaticalised language. Note, however, that to release this dynamic in the first instance, listeners must license one another to be imaginative, drawing on their most playful instincts at the expense of all former reliability constraints.
1.5. Conclusion
The logic of grammaticalization underlying language change over historical time is unique to our species. At first sight, it seems puzzling that even highly intelligent apes, when living in the wild, fail to transmit learned communicative techniques in such a way as to prompt the cumulative evolution of at least rudimentary grammatical structure. In this paper we have argued that the explanation is not so much cognitive deficiency considered in isolation as the absence of sufficient mutual trust. For grammaticalization to get under way, evolving humans had to give one another the freedom to ‘fake’ signal sequences at will. Individuals had to feel licensed to combine and convey messages which, though literally ‘false’, could be interpreted as figuratively or metaphorically ‘true’ (Knight 2008). Primates don’t grant one another such freedom. The reasons for this are deep-rooted, being inseparable from the competitive nature of primate sociality and politics. On these grounds, we conclude that before grammaticalization could get under way, primate-style dominance/submission dynamics had to be decisively overthrown and replaced by the ‘reverse dominance’ principles of egalitarian hunter-gatherers.
References
Bahuchet, S. (1996). Fragments pour une Histoire de la Forêt Africaine et de son Peuplement: les Données Linguistiques et Culturelles, in C. M. Hladik, A. Hladik, H. Pagezy, O. F. Linares, G. J. A. Koppert et A. Froment (eds.), L’alimentation en Forêt Tropicale: Interactions Bioculturelles et Perspectives de Développement. Paris: Éditions UNESCO, 97–119. Boehm, C. (1999). Hierarchy in the Forest: The Evolution of Egalitarian Behavior. Cambridge, MA: Harvard University Press. Boehm, C. (2012). Moral Origins: The Evolution of Virtue, Altruism, and Shame. New York: Basic Books. Chen, Y-S., Olckers, A., Schurr, T. G., Kogelnik, A. M., Huoponen, K. and Wallace, D. C. (2000).
Mitochondrial DNA variation in the South African Kung and Khwe – and their genetic relationships to other African populations. American Journal of Human Genetics 66:1362–1383. Cheney, D. L. and Seyfarth, R. M. (2005). Constraints and preadaptations in the earliest stages of language evolution. Linguistic Review 22: 135-59. Darwin, C. (1871). The Descent of Man, and Selection in Relation to Sex, 2 vols. London: Murray. Grauer, V. (2007). New perspectives on the Kalahari debate: A tale of two ‘genomes’. Before Farming 2:4 online edition.
Heine, B. and T. Kuteva (2007). The Genesis of Grammar: A Reconstruction. Oxford: Oxford University Press. Knight, C. (2008). ‘Honest fakes’ and language origins. Journal of Consciousness Studies, 15, No. 10–11, 2008, pp. 236–48. Kobayashi, H., and Kohshima, S. (2001). Unique morphology of the human eye and its adaptive meaning: Comparative studies on external morphology of the primate eye, Journal of Human Evolution 40: 419–435. Köhler, A., and Lewis, J. (2002). Putting hunter-gatherer and farmer relations in perspective: A commentary from Central Africa. In Kent, S. Ethnicity, Hunter-Gatherers, and the ‘Other’: Association or Assimilation in Southern Africa? Washington: Smithsonian Institute, pp. 276-305. Lewis, J. (2008). Ekila: blood, bodies and egalitarian societies. Journal of the Royal Anthropological Institute (N.S.) 14: 297-315. Lewis, J. (2009). As well as words: Congo Pygmy hunting, mimicry, and play, in R. Botha, and C. Knight (eds.), The Cradle of Language: Studies in the Evolution of Language. Oxford University Press, Oxford, pp. 236-256. Lewis, J. (2013). A cross-cultural perspective on the significance of music and dance on culture and society, with insight from BaYaka Pygmies. In M. Arbib (ed.) Language, Music and the Brain: A Mysterious Relationship. Cambridge MA: MIT. Lewis, J. (in press). BaYaka Pygmy multi-modal and mimetic communication traditions. In D. Dor, C. Knight and J. Lewis (eds.), The Social Origins of Language. Oxford: Oxford University Press. Steels, L. and Loetzsch, M. (2009). Perspective alignment in spatial language. In K. R. Coventry, T. Tenbrink and J. A. Bateman (eds.), Spatial Language and Dialogue. Oxford University Press, pp. 70-89. Steels, L. (2009) Is sociality a crucial prerequisite for the origins of language? In Botha, R. and C. Knight (eds), The Prehistory of Language. Oxford: Oxford University Press. Tomasello, M., Hare, B., Lehmann, H., and Call, J. (2007).
Reliance on head versus eyes in the gaze following of great apes and human infants: The Cooperative Eye Hypothesis, Journal of Human Evolution 52: 314–320. van Schaik, C. P. (1989). The ecology of social relationships amongst female primates. In The Behavioral Ecology of Humans and Other Mammals, ed., V. F. Standen, pp. 195-218. Oxford: Blackwell. Whiten, A. and Erdal, D. (2012). The human socio-cognitive niche and its evolutionary origins. Phil. Trans. R. Soc. B 2012 367, 2119-2129. Woodburn, J. (1982). Egalitarian Societies. Man, the Journal of the Royal Anthropological Institute, 17, 3: 431-51. Woodburn, J. (2005). Egalitarian societies revisited. In Widlok, T. and Tadesse, W. G., Property and Equality. New York & Oxford: Berghahn. Vol. 1, pp. 18-31.
GRASPING COMPOSITIONAL PATTERNS IN AN ARTIFICIAL LANGUAGE BY CHINESE PARTICIPANTS
YAU WAI LAM, TAO GONG
Department of Linguistics, University of Hong Kong, Pokfulam Road, Hong Kong
We conduct an artificial language learning experiment on Chinese participants, and discover that these non-alphabetic language users can grasp compositional structures in alphabetic utterances encoding a semantically more salient feature (shape) and a less salient feature (texture). Statistical analyses of the experimental results reveal (a) simultaneous learning of compositional items and of the sequential order that regulates these items, (b) a female advantage in generalizing learnt knowledge to novel instances, and (c) a freedom of syntax from a certain type of semantic saliency. These findings complement available experimental and simulation studies on issues concerning compositionality, and inspire some reconsideration of the semantics-syntax correlation in language.
1. Introduction
Compositionality (the principles by which meanings of complex expressions are constructed from their subparts via regulating rules, Krifka, 2001) and its evolution in human language have been popular research topics in evolutionary linguistics (Christiansen & Kirby, 2003). Apart from computer simulations (e.g., Kirby, 1999; Vogt, 2005; Gong, 2011), artificial language learning (ALL) experiments have recently joined the endeavor to tackle these issues. Some of these experiments have shown that compositionality reflects linguistic adaptation (compositional segments adapted to capturing distinct meaning niches), and that socio-cultural factors (e.g., the bottleneck during cultural transmission) play important roles in the origin of compositionality (Kirby et al., 2008; Cornish et al., 2009). Available ALL experiments are based primarily on alphabetic language users, and their focus lies largely on the origin of compositionality out of an initially randomized artificial language. In those experiments, compositional items are often induced either on purpose or through recall errors. Although these ALL experiments explicitly show that semantic features (e.g., shape, color, or motion) end up being encoded by compositional items and that a consistent structure of these items arises naturally, it remains unclear: (a) whether predefined compositional
items and structures can be grasped by alphabetic or non-alphabetic language users; and (b) whether saliency (relative conspicuousness of distinct semantic concepts in a given situation) of different semantic items affects the acquisition of structures that regulate the utterances encoding these items. Psychological experiments have revealed a consistent saliency hierarchy among visual features such as colors, shapes, and textures: to humans, colors are more salient than shapes, and shapes are more salient than textures (Şaykol et al., 2004). It is interesting to see whether such a semantic constraint can influence language structure. One way to examine this question is to analyze whether humans perform differently in grasping structures that are, or are not, consistent with this saliency hierarchy. In this paper, following the step-training paradigm of De Boer & Verhoef (2012), we conducted an ALL experiment on non-alphabetic language (Chinese) users. We designed an artificial language using a set of predefined compositional items and an arbitrary order regulating these items in utterances. For the sake of simplicity, the semantics was reduced to two dimensions (shape and texture), and we formed two versions of the artificial language that had the “shape+texture” (consistent with the saliency hierarchy) and “texture+shape” (the reverse of this hierarchy) orders respectively. We tested (a) whether the participants could, after repeated exposure, grasp the compositional items and orders and apply this learnt knowledge in judging familiar and novel meanings or utterances; and (b) whether this semantic saliency affected the learning of the compositional items and orders. In the following sections, Sec. 2 describes the experiment, Sec. 3 reports and analyzes the experimental results, and Sec. 4 discusses our findings and points out future directions of this study.
2. Methods and Materials
2.1. Participants
This experiment was approved by the College Research Ethics Committee (CREC) of the University of Hong Kong. Thirty-two students from this university, whose native language was Putonghua or Cantonese, volunteered for the experiment (16 females; age range: 18-24, average: 20.5). All participants had normal or corrected-to-normal eyesight and no brain injuries. They signed a consent form before the experiment and were paid after completing it.
2.2. Experimental Materials
We designed an artificial language, the utterances of which were written in an alphabetic script and the semantics of which included 16 visual stimuli (created with PhotoImpact X3), each formed by one of four shapes (star, square, triangle, and circle) and one of four textures (stripes, dots, zigzag, and checkerboard). The utterances had various lengths and combinations, including consonant-vowel clusters and those with or without consonants at onset or coda positions. An utterance could be divided into two parts, respectively encoding a shape and a texture. Based on the two orders of these two parts, we defined two versions of the artificial language. Table 1 shows the meaning-utterance mappings of these languages. Sixteen participants (8 females) were assigned to learn Ver. 1 (“shape+texture”) of the artificial language, and the others to learn Ver. 2 (“texture+shape”).
Table 1. Meaning-utterance mappings of the two versions of the artificial language. Hyphens (not shown in the utterances presented to participants) are inserted to show the segments. Dashed cells are utterances not shown at the learning stages.
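The construction of the two language versions can be illustrated with a minimal sketch. The segment strings below are hypothetical placeholders chosen for illustration only (the actual segments are those of Table 1, which are not reproduced here), and the function name `build_language` is likewise an assumption. The sketch only shows how one shape segment and one texture segment combine, in either order, into a 16-item meaning-utterance mapping.

```python
from itertools import product

SHAPES = ["star", "square", "triangle", "circle"]
TEXTURES = ["stripes", "dots", "zigzag", "checkerboard"]

# Hypothetical segments for illustration; the real ones are listed in Table 1.
SHAPE_SEG = {"star": "ka", "square": "mo", "triangle": "ip", "circle": "tun"}
TEXTURE_SEG = {"stripes": "le", "dots": "bas", "zigzag": "o", "checkerboard": "ri"}


def build_language(order):
    """Map every (shape, texture) meaning to an utterance in the given segment order."""
    if order not in ("shape+texture", "texture+shape"):
        raise ValueError("order must be 'shape+texture' or 'texture+shape'")
    language = {}
    for shape, texture in product(SHAPES, TEXTURES):
        s, t = SHAPE_SEG[shape], TEXTURE_SEG[texture]
        # Ver. 1 puts the shape segment first, Ver. 2 the texture segment first.
        language[(shape, texture)] = s + t if order == "shape+texture" else t + s
    return language


ver1 = build_language("shape+texture")
ver2 = build_language("texture+shape")
```

With these placeholder segments, the meaning (star, dots) is written "kabas" in Ver. 1 and "baska" in Ver. 2; the two versions differ only in segment order, not in the segments themselves.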
2.3. Procedure
The experimental procedure was implemented in E-Prime. During the experiment, the participants sat comfortably in front of a laptop in a dim, quiet room. They were asked to learn an artificial language through three five-minute blocks, with optional two-minute breaks in between. The whole experiment lasted about 20 minutes. Each block consisted of a learning stage and a testing stage. At each learning stage, 12 out of the 16 meaning-utterance mappings of the artificial language were presented to the participants three times in a fixed predefined order (Fig. 1(a)). This order ensured that any consecutive presentations contained distinct shapes and textures, which made it hard for participants to immediately notice the semantic or utterance similarity. At the testing stages of the first two blocks, participants were asked 20 questions presented in a random order. Ten of these questions were meaning selection questions, in each of which one utterance and three meanings were shown to the participants, who were instructed to indicate, by key pressing, the
meaning encoded by the utterance (Fig. 1(b)). The candidate meanings in each question shared at most one shape or texture. The other ten questions were utterance selection questions, in each of which the participants were asked to match one meaning to one of three candidate utterances (Fig. 1(c)). All meanings and utterances shown in the learning stages appeared at least once and at most twice in these 20 testing questions. This setting removed the confounding factor that participants might be more sensitive to some meanings or utterances. The purpose of including multiple learning and testing stages was to trace the learning progress. At the testing stage of the third block, apart from the 20 normal questions, we added four meaning selection and four utterance selection questions, all of which involved the four utterances or meanings not exposed during the learning stages (the dashed cells in Table 1). These eight questions aimed to evaluate whether the participants could generalize the learnt items and structure to judge novel instances. Together with the 20 normal questions, these 28 questions were presented to the participants in a random order. After completing the experiment, the participants also filled in post-experiment questionnaires to indicate how they processed the artificial language.
Figure 1. (a) Example of presentations at the learning stages; (b) Example of meaning selection questions; (c) Example of utterance selection questions.
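The constraint on the learning-stage order (no two consecutive presentations share a shape or a texture) can be sketched as a small backtracking search. This is not the procedure used to build the actual order, which is not described in the text. The choice of held-out meanings (one per shape and per texture) and the names compatible and presentation_order are assumptions made for illustration.

```python
import random
from itertools import product

SHAPES = ["star", "square", "triangle", "circle"]
TEXTURES = ["stripes", "dots", "zigzag", "checkerboard"]

# Assumption: the four held-out meanings pair each shape with one texture.
HELD_OUT = set(zip(SHAPES, TEXTURES))
LEARNED = [m for m in product(SHAPES, TEXTURES) if m not in HELD_OUT]


def compatible(a, b):
    """True if two meanings differ in both shape and texture."""
    return a[0] != b[0] and a[1] != b[1]


def presentation_order(items, repeats=3, seed=0):
    """Find a sequence showing each item `repeats` times with no
    consecutive pair sharing a shape or a texture (backtracking search)."""
    rng = random.Random(seed)
    remaining = {item: repeats for item in items}
    total = len(items) * repeats
    sequence = []

    def extend():
        if len(sequence) == total:
            return True
        candidates = [m for m, n in remaining.items()
                      if n > 0 and (not sequence or compatible(sequence[-1], m))]
        rng.shuffle(candidates)
        # Prefer items with the most presentations left, to avoid dead ends.
        candidates.sort(key=lambda m: -remaining[m])
        for m in candidates:
            remaining[m] -= 1
            sequence.append(m)
            if extend():
                return True
            sequence.pop()
            remaining[m] += 1
        return False

    if not extend():
        raise ValueError("no valid presentation order exists")
    return sequence


order = presentation_order(LEARNED)
```

With 12 learned meanings shown three times each, the resulting sequence has 36 presentations. Fixing the seed keeps the order identical across participants, in line with the fixed predefined order described above.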
3. Results
To analyze the experimental results, we extracted the correctness and reaction time data in answering the 20 normal questions asked at the three testing stages and the eight additional questions asked only at the third testing stage. For the normal questions, the participants performed similarly in answering the 10 meaning selection and 10 utterance selection questions (paired t-test: for correctness, t(95)=0.7564, p=0.4513; for reaction time, t(95)=0.4744, p=0.6363), which matched their answers in the post-experiment questionnaires. In addition, male and female participants performed similarly in answering these 20 questions (group t-test: for correctness,
t(94)=-1.4537, p=0.1494; for reaction time, t(94)=0.1598, p=0.8734). Furthermore, a two-way repeated measures ANOVA (two factors: the two versions of the artificial language and the three learning stages) revealed that the version of the artificial language had no significant effect on correctness (F(1,15)=0.690, p=0.4191, η2=0.0106) or reaction time (F(1,15)=0.005, p=0.9445, η2 [...]
SHAPE > TEXTURE > COLOR > MATERIAL, where the dimensions to the left are further removed from the noun than the ones on the right[a].
[a] The focus on adjectives with clear perceptual correlates is necessary for the aims of this work. Omitting ‘higher’ adjectives denoting evaluative and speaker-related aspects is a consequence of the focus on extralinguistic correlates, which can only be assessed for adjective classes/feature dimensions with well-defined perceptual content.
Within parts of the syntactic literature, AORs have been described as reflexes of a hierarchy of specific functional projections (Scott, 2002), although the source of such a hierarchy, interpreted by Scott as a sequence of functional projections coded in Universal Grammar, remains contested. A possible extralinguistic basis for AORs has been explored both from a theoretical (Bouchard, 2002) and a psycholinguistic (Martin, 1969, and much subsequent work) vantage point. Some early works (Danks & Schwenk, 1972, and later) assumed a “Pragmatic Communication Rule”, whereby the most relevant features, i.e. those least predictable, would be mentioned first to aid understanding by the listener. This account, first developed for English, predicts that the typical order for languages with noun-adjective order should be N A1 A2 A3 (where the indices indicate the different adjectives’ usual position in English). Contrary to that, we typically find a fully inverted order, i.e. N A3 A2 A1 (Richards, 1975; Plank, 2007).
A more promising approach is to derive adjective order from the (perceived) intrinsicality of the feature dimensions involved. Two of the best pieces of evidence for this are the forced choice paradigm of Byrne (1979), illustrated in (1)-(2), and the naming task from Belke (2006). Collocating two contradictory adjectives of the same class, Byrne found that the only (marginal) interpretation was one with the adjective closer to the noun representing a permanent and typical feature, and the other one a transient and accidental one.
(1) a slow fast dog
    a. * ‘a rocket powered Saint Bernard’
    b. ‘an aging greyhound’
(2) a fast slow dog
    a. ‘a rocket powered Saint Bernard’
    b. * ‘an aging greyhound’
Belke (2006) provides evidence against an incremental formulation account and argues for what she calls the ‘perceptual classification account’ of AORs. By contrasting the categories COLOUR and SIZE in a referential communication task, she shows that colour, not size (i.e., the category closer to the noun), is processed automatically even when it is task-irrelevant. Subjects were tasked with identifying a target in an array of objects varying by size and colour. In the ‘size-only condition’, i.e. when size alone uniquely identified the target, they overspecified for colour 87% of the time with a ‘neutral’ instruction, rising to 99.7% under time constraints. In the colour-only condition, overspecification for size occurred in no more than 26.7% (42.9%) of trials. One shortcoming of Belke (2006) is that the contrast is exclusively between COLOUR and SIZE, one of which is an intersective, absolute category, and the other relative. It is not clear whether her results straightforwardly extend to the relative ordering among different subtypes of absolute adjectives, such as
COLOUR/TEXTURE/SHAPE. Indeed, her discussion indicates that she considers the absolute/relative dichotomy to be crucially involved in producing the asymmetries she observed, in that colour is conceptualised first and enters a closer connection to the noun mainly because it can be assessed without comparison to other objects.
2. Experiments
2.1. Experiment 1: Reaction times in a visual search paradigm
This experiment tests reaction times as a function of varying adjective order. Set up like a simple video game, it increases in difficulty at every level. Levels have a fixed duration, and subjects proceed to the next level if and only if they score 50% or more correct clicks.
Figure 1. Top: verbal instruction (English version). Bottom: visual array subsequently displayed (dark grey=blue, light grey=red)
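The progression rule of Experiment 1 can be written down as a minimal sketch. Only the 50% criterion comes from the text. What happens when a subject misses the criterion is not specified there, so repeating the level is an assumption, as are the function name next_level and the treatment of a level without any clicks as not passed.

```python
def next_level(level, correct_clicks, total_clicks):
    """Return the level to play next.

    A subject advances if and only if at least 50% of the clicks in the
    level were correct. Assumption: otherwise the same level is repeated,
    and a level without any clicks counts as not passed.
    """
    if total_clicks > 0 and 2 * correct_clicks >= total_clicks:
        return level + 1
    return level
```

Comparing 2 * correct_clicks with total_clicks avoids floating-point rounding exactly at the 50% boundary.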
After the description of a target object appears on screen to be read by the participants, it is replaced by a random array of objects, displayed for 1500 ms at level 1, and for increasingly shorter times thereafter. One and only one of the objects satisfies the description. Overall the objects vary along three feature dimensions (SIZE, SHAPE, COLOUR) and fall into two different categories (stylised faces and houses, see the screenshot in Figure 1).
The main parameter of interest, recorded by the programme, is whether the instruction was given in a canonical order (‘large round blue house’) or a non-canonical one. The goal is to determine the effect on search time and error rate as dependent variables. Potential other confounds such as position on screen and redundancy of the description (e.g. whether the target was the only large round object displayed) are also recorded and can thus be statistically controlled for.
A measurable effect of word order is insufficient to decide between competing hypotheses about the nature and origin of adjectival or feature hierarchies[b]. Taken alone, increased error rates and search times could even be showing the cost of dealing with the marked constructions themselves. Quantifying the effect, however, allows the results of this experiment to be used as a baseline for further studies which target contextual effects more explicitly. The experiment itself can give hints by allowing mitigating and aggravating factors to be estimated through its in-built controls. Different hypotheses about the nature of AORs, while they may agree that a processing effect of non-canonical order is to be expected, predict different, if any, interactions with those other factors.
[b] Thanks to an anonymous reviewer for reminding me to make this point more explicit.
2.2. Further steps: Implicit categorisation task
A non-verbal implicit categorisation task can be used to assess which feature dimensions are preferentially construed as distinctive. This experiment again uses the categories of SHAPE, COLOUR, and SIZE. In each round, an array of 2 x 3 objects varying along those three dimensions is shown. One of these dimensions, randomly chosen in each session, is relevant, while the two remaining dimensions are distractors. Subjects are awarded a positive score, which ultimately determines payout, when they choose an item of the right category, while the choice of one category over the other along the other dimensions is inconsequential. In any single round, there can be several right choices that may vary along irrelevant dimensions.
The only instruction participants receive is to try to score as much as they can, and due to the random assignment, not even the experimenter knows which dimension will be relevant in the current session. It is recorded how long it takes participants to determine the relevant category, using a threshold of 14 correct choices among the last 15 rounds. The choices (whether right or wrong for the purposes of payout) give further clues to the process of hypothesis formation and rejection. A prediction of the intrinsicality account of AORs is that the ease with which participants recognise a dimension as relevant correlates with the linguistic factor of closeness to the noun[c].
The non-verbal design of the experiment has the benefit of allowing direct cross-species comparisons[d] with existing work on concept learning in model species (Roberts & Mazmanian, 1988; Goto, Wills, & Lea, 2004, among others), or even the specific design of animal experiments to match this task. Thus it becomes possible, at least in principle, to trace the cross-species distribution and reconstruct the phylogeny and ecology of a cognitive trait underlying one particular aspect of human language(s).
[c] The literature on the ‘shape bias’ (Landau, Smith, & Jones, 1988), and the more recent elaborations on which contexts do and do not trigger it (Wilcox & Chapa, 2004, a.m.o.) suggest that SHAPE might be an outlier here: It could be preferentially relied upon not because it is interpreted as the most intrinsic among several attributes applied to an object, but because it is not interpreted as an external attribute at all and chosen as the marker of kind membership.
[d] The minimal instruction subjects do get is comparable with many experimental animals’ previous experience with operant learning conditions on unrelated tasks.
3. Summary
The clear physical character of the referents of adjectives of shape, colour, and size makes it possible to study constraints on their syntax as a relatively approachable aspect of how the ‘Faculty of Language in the Broad Sense’ (Hauser, Chomsky, & Fitch, 2002) shapes Language(s). This provides a basis for identifying analogous and homologous cognitive constraints in other species (Fitch, Huber, & Bugnyar, 2010; Fitch, 2012). The experiments presented are steps towards such an empirical study of the Grammar-Cognition interface and its biological and evolutionary basis.
Acknowledgements
The work presented here has benefited from discussions with and comments from (in alphabetical order) Boban Arsenijević, Dan Bowling, Tecumseh Fitch, Győrgy Gergely, Katharina Hartmann, Dalina Kallulli, Martin Prinzhorn, Viola Schmitt, Gesche Westphal-Fitch, the audience at the Workshop ‘The Adjective - Semantic, Pragmatic, and Discursive Approaches’ (Clermont-Ferrand, 2013), and three anonymous reviewers invited by the organisers of EvoLang 10. Where I ignore their suggestions, I do so at my own peril.
References
Alexiadou, A. (2001). Adjective syntax and noun raising: Word order asymmetries in the DP as the result of adjective distribution. Studia Linguistica, 55(3), 217–248.
Belke, E. (2006). Visual determinants of preferred adjective order. Visual Cognition, 14(3), 261–294.
Bouchard, D. (2002). Adjectives, numbers, and interfaces: Why languages vary. Elsevier.
Byrne, B. (1979). Rules of prenominal adjective order and the interpretation of “incompatible” adjective pairs. Journal of Verbal Learning and Verbal Behavior, 18(1), 73–78.
Cinque, G. (2010). The syntax of adjectives. Cambridge, MA: MIT Press.
Danks, J. H., & Schwenk, M. A. (1972). Prenominal adjective order and communication context. Journal of Verbal Learning and Verbal Behavior, 11, 183–187.
Dixon, R. M. W., & Aikhenvald, A. Y. (Eds.). (2004). Adjective classes: A cross-linguistic typology (Vol. 1). Oxford: Oxford UP.
Fitch, W. T. (2012). Evolutionary Developmental Biology and Human Language Evolution: Constraints on Adaptation. Evolutionary Biology, 39(4), 613–637.
Fitch, W. T., Huber, L., & Bugnyar, T. (2010). Social Cognition and the Evolution of Language: Constructing Cognitive Phylogenies. Neuron, 65, 795–814.
Goto, K., Wills, A. J., & Lea, E. G. (2004). Global-feature classification can be acquired more rapidly than local-feature classification in both humans and pigeons. Animal Cognition, 7, 109–113.
Hauser, M., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298, 1569–1579.
Landau, B., Smith, L. B., & Jones, S. S. (1988). The importance of shape in early lexical learning. Cognitive Development, 3(3), 299–321.
Martin, J. (1969). Semantic determinants of preferred adjective order. Journal of Verbal Learning and Verbal Behavior, 8, 697–704.
Plank, F. (2007). Extent and limits of linguistic diversity as the remit of typology – but through constraints on what is diversity limited? Linguistic Typology, 11, 43–68.
Richards, M. M. (1975). Pragmatic communication rule of adjective ordering: A critique. American Journal of Psychology, 88(2), 201–215.
Roberts, W. A., & Mazmanian, D. S. (1988). Concept learning at different levels of abstraction by pigeons, monkeys, and people. Journal of Experimental Psychology: Animal Behavior Processes, 14(3), 247–260.
Scott, G. (2002). Stacked Adjectival Modification and the Structure of Nominal Phrases. In G. Cinque (Ed.), Functional structure in DP and IP (Vol. 1, pp. 91–120). Oxford: Oxford University Press.
Wilcox, T., & Chapa, C. (2004). Priming infants to attend to color and pattern information in an individuation task. Cognition, 90(3), 265–302.
SUPPORTING EVIDENCE FOR LANGUAGE POLYGENESIS FROM NEANDERTHAL-HUMAN INTERBREEDING
SZETO PUI YIU
Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR
Contrary to popular belief in the last decade, recent comparative genomic studies provide strong evidence for the occurrence of interbreeding between the Neanderthals and Homo sapiens. Based on anatomical, genetic, and archaeological evidence, this study argues that the Neanderthals were biologically language-ready, but their simple social system may have made language unnecessary for them until they started interacting with Homo sapiens. The successful interbreeding between the Neanderthals and Homo sapiens further suggests that the former were likely able to acquire language. This paper argues that the Neanderthals’ successful acquisition of human language may serve as supporting evidence for the language polygenesis hypothesis.
1. Introduction
1.1. Background Information
First discovered in the Neander Valley, Germany, in 1856, Homo neanderthalensis, more commonly known as the Neanderthals, is the longest known of all extinct Homo species (Klein, 2003). Neanderthals and Homo sapiens (hereafter used interchangeably with humans) shared a common ancestor dated to around 400,000-600,000 BP, and they went along separate paths thereafter, the former in Eurasia and the latter in Africa (Finlayson, 2004). It was believed that the Neanderthals went extinct rather rapidly after a group of humans (often known as the Cro-Magnons) left Africa and invaded Eurasia at around 45,000 BP (Klein, 2003). The mainstream belief in the last decade was that the Neanderthals went extinct at around 30,000 BP, with no gene flow between them and humans (Lewin, 2005). However, the Neanderthal Genome Project (Green et al., 2010) shows that Neanderthals share some genes with present-day non-African humans but not with present-day Africans, which provides strong evidence for the occurrence of gene flow from the Neanderthals to the
ancestors of present-day non-African humans. The findings of the Neanderthal Genome Project certainly have a huge impact on the study of human evolution. Based on the latest findings on Neanderthal-human interbreeding, this study aims to discuss the language capacity of the Neanderthals, and the implications of such findings for language evolution.
1.2. Relationship between Homo sapiens and other Homo species
While it is widely agreed that Homo sapiens evolved from Homo erectus in Africa (Lewin, 2005), not until very recently did we know about the possible interbreeding between Homo sapiens and other Homo species, which leads us to a more complete understanding of our evolutionary history. Apart from the Neanderthal Genome Project, another recent landmark in the study of human evolution is the discovery of the Denisovans (Krause et al., 2010), a previously unknown Homo species found in southern Siberia. Subsequent genomic analysis shows that present-day Australasian people carry about 5% Denisovan DNA (Reich et al., 2010). As the Denisovans are a newly discovered species, we currently know relatively little about them. Therefore, Neanderthal-human interbreeding and its implications are the focus of this study. However, as discussed later, the evidence for Denisovan-human interbreeding can make some of the points raised in this paper more persuasive.
2. Language Capacity of Neanderthals
Given that there was interbreeding between the Neanderthals and humans, there must have been communication between the two Homo species. Whether the Neanderthals could use language to communicate has long interested scholars from various fields. This paper argues that the Neanderthals were biologically language-ready, based on anatomical and genetic evidence. The relationship between archaeological evidence and language capacity is also discussed in this section.
2.1. Anatomical evidence
The anatomy of the brain and vocal tract is considered in this section. First, it is well agreed that our language capacity is related to our brain size and complexity. Given that a Neanderthal brain is as large as (or even slightly larger than) and as complex as a human brain (Broca’s area, which is closely related to language production, is present in the brain of both Homo species) (Lewin, 2005), a Neanderthal brain was possibly able to cope with the high complexity of language. Second, the vocal tract anatomy of the Neanderthals was also believed to be a key factor determining whether the Neanderthals could have language. The Neanderthals were widely believed to be unable to produce articulate human speech because of their vocal tract anatomy (Lieberman, 1984; 1991). However, a phonetic study (Boë et al., 2007) illustrates that structured speech was anatomically possible in the Neanderthals, as a Neanderthal vocal tract is comparable to that of a 10-year-old human child, and that it is in fact the motor control of the articulators that ultimately determines whether the Neanderthals could produce a wide range of speech sounds. The fine motor control of articulators such as the jaw, the tongue, and the lips is related to FOXP2, the so-called language gene.
2.2. Genetic evidence
As language is a highly complex system, it is extremely difficult, if not impossible, to exhaustively identify all the genes which are relevant to language. The complexity is reflected by the fact that most pedigree analyses of language disorder were not consistent with a single defective gene (Chow, 2005). Nonetheless, the FOXP2 gene has received the most attention since it was confirmed to be associated with a speech and language disorder which runs in the KE family. Although it is misleading to refer to FOXP2 as the language gene, it is currently the best-known gene with a clear association with language, so we focus on FOXP2 in this section. As the FOXP2 gene is associated with orofacial muscle control and the activation of Broca’s area, it is essential not only to the articulation of speech sounds but also to our grammatical capabilities (Hurford, 2012).
Genetic studies show that the human FOXP2 gene has undergone two amino-acid-changing mutations since the human-chimpanzee split, and the Neanderthal FOXP2 gene had the same two amino-acid-changing mutations (Krause et al., 2007). In other words, Neanderthals and humans share the same FOXP2 variant. This discovery further suggests that language use was possible among the Neanderthals. In short, both the anatomical and the genetic evidence support the view that the Neanderthals were biologically language-ready.
2.3. Archaeological evidence
Language does not fossilize, but artifacts do. It is possible to deduce the social and linguistic behavior of ancient hominids from their material culture (Davidson, 2003). Archaeologists believe that the trajectory of the complexity of tool technology through time may reveal something about the change in language capacity (Lewin, 2005). However, it is noteworthy that tool use and language use involve different cognitive processes, and they can only be regarded as having an analogical relationship. In this section, two aspects of material culture, namely stone tools and artwork, are discussed.
2.3.1. Stone tools
Given the durability of stone-tool assemblages, the study of such artifacts figures prominently in our understanding of human evolution (Mellars, 1996). In Europe, stone-tool cultures prior to the Neolithic are termed the Paleolithic, which is further subdivided into the Lower Paleolithic, the Middle Paleolithic, and the Upper Paleolithic (Lewin, 2005). There is a good match between the archaeological and fossil evidence in Europe: the Neanderthals were always associated with the Middle Paleolithic, the less advanced stone-tool industry, while humans were always associated with the Upper Paleolithic, the more advanced stone-tool industry (Shea, 2011). As Diamond (1992) describes, Upper Paleolithic tools have a much higher complexity and diversity than their Lower Paleolithic counterparts. According to Lewin (2005), stone-tool complexity may reflect social complexity; beyond a certain degree of social complexity, abstract and complicated notions about social norms and patterns would frequently be the topic of “conversation”, which would not likely be possible without language. The less advanced stone-tool technology associated with the Neanderthals may imply that they had a simpler social structure than the contemporaneous Homo sapiens.
Therefore, the Neanderthals were probably biologically language-ready, but their culture and social structure were relatively simple, suggesting that language might not have been an indispensable component of their society.
2.3.2. Artwork
The prevalence of artwork and jewellery is a distinguishing feature of the Upper Paleolithic culture (Finlayson, 2004; Klein, 2003; Lewin, 2005; Mellars, 2004). The appreciation of artwork and ornaments is a sign of the increase in symbolic behaviour, which, according to Finlayson (2004), reflects the
increasingly sophisticated social system of Homo sapiens. Musical instruments such as flutes and rattles are also associated with the Upper Paleolithic culture yet evidently lacking from the Lower Paleolithic one (Klein, 2003; Mellars, 2004; Lewin, 2005). The absence of such achievements among the Neanderthals also suggests that they had a less developed culture and simpler social structure.
2.4. Interim summary
Anatomical and genetic evidence suggests that the Neanderthals were probably biologically language-ready. On the other hand, archaeological evidence suggests that the Neanderthals probably had a relatively simple social structure, which indicates that language was probably not essential in a Neanderthal society. In addition to one’s biological language-readiness, one’s possession of language may also depend on the environment to which one is exposed. This issue, which is relevant to the question of whether the Neanderthals had language, is examined in the next section.
3. Interaction between Neanderthals and Homo sapiens
Fossil evidence shows that the Neanderthals and Homo sapiens co-existed for around 15,000 years. After the years of co-existence, the Neanderthals seem to have gone extinct. As there is no evidence of direct physical conflict between the two Homo species (Horan et al., 2005), the more sophisticated social structure and material culture of Homo sapiens were probably the key that allowed them to adapt better to the ever-changing environment, especially given that the period of Neanderthal-human co-existence coincided with the last glacial period (Finlayson, 2004). A possible scenario is that most Neanderthal groups were outcompeted by human groups, and eventually died out. However, given that there was gene flow from the Neanderthals to Homo sapiens, some Neanderthals must have been able to survive and reproduce.
As the Neanderthals were probably biologically ready for language and other complex behavior, their degree of behavioral plasticity was probably on a par with that of Homo sapiens. Some Neanderthals probably managed to pick up the technology and culture associated with humans through complex interaction and imitation. Mastery of language was essential for the Neanderthals to communicate effectively with humans and get accepted as members of society. Such Neanderthals eventually merged into the human populations, making a traceable contribution to the human gene pool. Interestingly, while nuclear DNA analysis shows gene flow between the two Homo species, mitochondrial DNA (mtDNA) analysis does not. Given that
mtDNA is exclusively maternally inherited, the presence of Neanderthal nuclear DNA (inherited both maternally and paternally) and the absence of Neanderthal mtDNA in modern non-African human populations suggest that successful interbreeding occurred only between male Neanderthals and female Homo sapiens. Interbreeding between male Homo sapiens and female Neanderthals might have been rare, absent, or unable to produce fertile offspring. Given such an interbreeding pattern, we cannot rule out the possibility that the male Neanderthals mated with the female Homo sapiens by force without being accepted as members of the human society, as it is likely that the Homo sapiens were reluctant to mate with the culturally less developed Neanderthals. Even so, the presence of Neanderthal DNA in all present-day non-African human groups shows that complete merger of the Neanderthal-human hybrids into the contemporaneous Homo sapiens groups must have occurred extensively, which strongly suggests that the hybrids’ ability to handle language and other complex skills was not significantly hindered despite their carrying 50% Neanderthal DNA, as long as they had the opportunity to be exposed to the required social and environmental stimuli.
4. Implications for Language Evolution
The most significant implication of the above findings is that language-readiness does not necessarily lead to actual language use. Language is a bio-cultural product: on the one hand, we are genetically disposed to possess a neural system and body structures that support language use; on the other hand, language is a cumulative product which is developed and transmitted in societies. Language use is unlikely in Homo species which lived in highly isolated groups with simple social structure even if they were biologically language-ready.
As the Neanderthals and their concurrent Homo sapiens had the same variant of the FOXP2 gene, and both of them were anatomically language-ready, Homo species may have been language-ready for at least 400,000 – 600,000 years. The tool technology associated with the hominids at that time was the Lower Paleolithic industries, which constitute a primitive material culture. Given that tool complexity reflects social complexity, the hominids associated with Lower Paleolithic industries probably had very simple social structures and were unlikely to have actual language use. As a bio-cultural product, language may have emerged multiple times in different regions and evolved as several lineages independent of each other. Given that language use is probably related to cultural development, and the material cultures of different regions of the world developed at different rates in
history (Shea, 2011), it is unlikely for all present-day languages to share a single common ancestor. This may help to explain why there are such a large number of language families in the world which cannot be convincingly shown to be genetically related, and why it seems next to impossible to reconstruct a “Proto-World”, despite the hard work of many historical linguists for decades. The analyses made in this study support some previous studies which highlight the likelihood of language polygenesis through mathematical modeling (e.g. Coupé & Jean-Marie, 2005; Freedman & Wang, 1996).

References

Boë, L.-J. (2007). The vocal tract of newborn humans and Neanderthals. Journal of Phonetics, 35, pp. 564-81.
Chow, K. L. (2005). Speech and Language - A Human Trait Defined by Molecular Genetics. In J. W. Minett, & W. S.-Y. Wang (Eds.), Language Acquisition, Change and Emergence (pp. 21-45). Hong Kong: City University of Hong Kong Press.
Coupé, C., & Jean-Marie, H. (2005). Polygenesis of linguistic strategies: a scenario for the emergence of languages. In J. W. Minett, & W. S.-Y. Wang (Eds.), Language Acquisition, Change and Emergence: Essays in Evolutionary Linguistics (pp. 153-201). Hong Kong: City University of Hong Kong Press.
Davidson, I. (2003). The archeological evidence of language origins. In M. H. Christiansen, & S. Kirby (Eds.), Language Evolution (pp. 140-57). Oxford: Oxford University Press.
Diamond, J. (1992). The Third Chimpanzee. New York: HarperCollins Publishers.
Finlayson, C. (2004). Neanderthals and Modern Humans. New York: Cambridge University Press.
Freedman, D. A., & Wang, W. S.-Y. (1996). Language Polygenesis: A Probabilistic Model. Anthropological Science, 104, pp. 131-38.
Green, R. E., Krause, J., Briggs, A. W., Maricic, T., Stenzel, U., Kircher, M., et al. (2010). A Draft Sequence of the Neandertal Genome. Science, 328, pp. 710-22.
Horan, R. D., Bulte, E., & Shogren, J. F. (2005). How trade saved humanity from biological exclusion: an economic theory of Neanderthal extinction. Journal of Economic Behavior & Organization, 58, pp. 1-29.
Hurford, J. R. (2012). The Origins of Grammar. Oxford: Oxford University Press.
Klein, R. G. (2003). Whither the Neanderthals? Science, 299, pp. 1525-27.
Krause, J., Fu, Q.-M., Good, J. M., Viola, B., Shunkov, M. V., Derevianko, A. P., et al. (2010). The complete mitochondrial DNA genome of an unknown hominin from southern Siberia. Nature, 464, pp. 894-97.
Krause, J., Lalueza-Fox, C., Orlando, L., Enard, W., Green, R. E., Burbano, H., et al. (2007). The derived FOXP2 variant of modern humans was shared with Neandertals. Current Biology, 17, pp. 1908-12.
Lewin, R. (2005). Human Evolution (5th ed.). Oxford: Blackwell Publishing Ltd.
Lieberman, P. (1984). The biology and evolution of language. Cambridge: Harvard University Press.
Lieberman, P. (1991). Uniquely human: The evolution of speech, thought, and selfless behavior. Cambridge: Harvard University Press.
Mellars, P. (1996). The Neanderthal Legacy. New Jersey: Princeton University Press.
Mellars, P. (2004). Neanderthals and the modern human colonization of Europe. Nature, 432, pp. 461-65.
Reich, D., Green, R. E., Kircher, M., Krause, J., Patterson, N., Durand, E. Y., et al. (2010). Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature, 468, pp. 1053-60.
Shea, J. J. (2011). Refuting a Myth About Human Origins. American Scientist, 99, pp. 128-35.
LANGUAGE EMERGENCE IN THE LABORATORY: A METHOD SUITABLE TO DYNAMICAL SYSTEMS ANALYSIS
WHITNEY TABOR AND RUSSELL RICHIE
Department of Psychology, University of Connecticut
Storrs, CT USA 06269-1020
[email protected]

HARRY DANKOWICZ
University of Illinois

The last decade has seen an explosion of studies examining language emergence experimentally (a literature known as Experimental Semiotics). While revealing, most studies in this domain place a complex set of constraints on behavior, which makes formal analysis of the results challenging. Wishing to take a dynamical systems approach in which we could comprehensively analyze the typology of behavioral outcomes, we devised a coordination task based on Roberts and Goldstone's (2011) number summing game and studied the process by which the groups arrived at a coordinating scheme. While we have not yet seen evidence for traditional hallmarks of natural language in our game, we believe two features of the results offer a helpful perspective on language evolution: (i) the variation across groups is organized around equivalence classes of strategies (egalitarian points) that stem from the combination of the task structure with the physical capabilities of the organisms, suggesting a dynamical-systems-based framework for understanding parametric variation in natural languages different from conceptions of Universal Grammar as an information structure, and (ii) individual behavior appears to be a sum of impulses that generally approximate but also tend to resist the egalitarian points, suggesting that construing linguistic structure as self-organized, rather than as biologically specified principles and parameters, may help address language emergence.
1. Introduction

In the past 5-10 years, a new paradigm for studying language evolution has emerged: Experimental Semiotics (ES). In such studies, human participants are brought into the lab to accomplish a joint task, while they are prevented from using established language channels (e.g. speech). To coordinate their actions, participants invent surprisingly complex systems, shedding light on the emergence of combinatoriality and compositionality (Galantucci, 2005), and how language acquisition shapes language evolution (Kirby, Cornish, & Smith, 2008), among other phenomena. Here, we help extend the ES approach by seeking a formal framework in which to situate the research. Generative grammar (e.g., Chomsky, 1981) is an initially
obvious choice, but the fluidity and gradualness of emerging communication systems seem incommensurate with the rigid symbols and rules of generative approaches. As Brighton and Kirby (2006) and others have suggested, an alternative may lie in Dynamical Systems Theory (DST), the mathematical theory of systems that change. Conceptualizing a system as a point on a multi-dimensional, connected manifold, DST is well suited to capture continuous shift from one part of the behavior space to another, as a language emerges and changes. Moreover, the structural patterns that language systems exhibit can be thought of as stabilities in the manifold, i.e., regions that the system is pulled to by virtue of the interactions among the forces on it. One of the challenges to applying dynamical systems ideas in prior paradigms of ES is that, although the systems clearly have stabilities, a formal catalog of those stabilities is usually not feasible. This seems at least partly due to the complex structure of the task domains investigated. Here, we explore a task domain that supports a rich range of behaviors, but the behaviors are organized around stabilities whose layout and principles of organization are easily characterized. This allows us to carefully ask questions about the source of structure in the group behavior. We suggest that the evidence supports a self-organization framework, different from the information-processing perspective usually adopted in studies of cognition. Self-organization refers to situations in which many small entities (e.g., particles, insects, humans) engaged in repeated interactions with feedback exhibit organized structure at the scale of the group (Kukona & Tabor, 2011).

1.1. Task overview

In a group coordination experiment reported by Roberts and Goldstone (2011), participants (in groups ranging in size from 2 to 17) contributed integers independently to try to make their sum match an unknown target.
After each attempt, participants were told whether the obtained sum was too high or too low relative to the target, or given the difference between the sum and target. Groups played repeatedly with the same target until the objective was met. We refer to this game as the Integer Summing Game. In our study, we employed a similar game with groups of 4. In our version, unlike Roberts and Goldstone’s, participants were explicitly told the goal number $G$ and, after each attempt, given the list, $c_1$, $c_2$, $c_3$, and $c_4$, of integers contributed by the 4 group members. This was done in order to allow participants to develop very precise models of the group system, in much the way grammars are very precisely learned. We refer to our version of the game as the Integer Summing Game with Detailed Feedback (ISGDF). By definition, the possible solutions to the ISGDF in the 4-dimensional integer lattice $\mathbb{Z}^4$ lie on the 3-dimensional integer simplex given by

$$c_1 + c_2 + c_3 + c_4 = G \tag{1}$$
As we make clear in more detail below, the ISGDF, at least under the parameters we use, does not produce communicative interaction that has well-known, hallmark properties of natural language. For example, there is little evidence of representation or compositionality. Nevertheless, we suggest that there is a more fundamental set of properties that are relevant to understanding the variety of natural language and its evolution. First, the game requires individuals to agree upon one particular strategy, e.g. three players entering zero and the fourth entering the goal (henceforth "Triple Zero Strike"), among several that are equivalent (e.g., the various permutations of which player enters the goal), much as language requires a community to choose among words and rules that are equivalent (e.g. 'apple' versus 'pomme', or head-complement and not complement-head ordering). Second, certain game strategies, e.g. Triple Zero Strike, seem to be more stable in the long run than other strategies, e.g. full egalitarianism, where players split the goal four ways; we find a natural language analog in cross-linguistically favored traits, e.g. SVO (= Subject-Verb-Object word order) and SOV over OSV, OVS, VSO, and VOS (experimental studies provide evidence for processing differences between the favored types and the disfavored ones; e.g., Hall, Mayberry, & Ferreira, 2013). Finally, we present evidence that persistent idiosyncratic variation is a central feature of the way humans solve our task, consistent with a self-organization view of language evolution.

2. Method: Integer Summing Game with Detailed Feedback (ISGDF)

In our version of the ISGDF, each player sits at a computer terminal on which is displayed a box for entering text, a Goal number, a list of the previous contributions of all the players with the previous contribution of the viewing player marked, and a number indicating the Total of the previous contributions. On each trial, each player types an integer into the text box and presses the Return key.
If a player enters his or her integer before all the other players have entered theirs, a message appears below the text box which says, "Waiting for Other Players". Once all of the players have entered their numbers, the sum of their guesses is calculated and printed on the screen. If the sum equals the goal, a message appears indicating success, and a new goal is generated. If the sum does not equal the goal, players must try again with the same Goal until they achieve it. Each event of contributing 4 numbers to the Total is called a trial and the series of trials that eventually result in achievement of a particular Goal is called a round. In the current design, players played up to 20 rounds with goals chosen in the range 1199 or played for 1/2 hour, whichever occurred first. The four contributions for each trial were stored for later analysis. Being interested in the organization of well-formed coordinations, which we think of as analogous to adult grammatical language, we focused our investigations on the solutions reached at the end of each round. We normalized the data by dividing each set of player contributions by the Goal of the current round (effectively making the Goal 1 for all rounds), and then projected the four dimensional solutions from all rounds of all groups onto the three dimensional solution subspace of Equation 1 [a]. In the next section, we test two hypotheses that together form a distinctive claim of our dynamical perspective: there should be clustering around the egalitarian points and idiosyncratic stability of individual behaviors.

3. Result 1: Clustering

Human participants seem inclined toward egalitarian play, even though the Triple Zero Strike solution is most efficient. We thus identified 15 egalitarian points in the solution space (Equation 1) around which we predicted that behavior would cluster: all the points combining some number (from 0 to 3) of zeros with egalitarian contributions from the remainder (e.g. for a Goal of 22, [0, 0, 11, 11] is an egalitarian point). We defined a statistical model, Soft Egalitarian Density, to be the sum of 15 normal distributions with means at these loci. We sampled the model by generating 100 points of Gaussian noise around each of the 15 egalitarian points [b]. We assessed the hypothesis that the data cluster around the egalitarian points by comparing the gradient of the model density to the gradient of the observed density in regions where there was substantial data.

In particular, some of the observed data were concentrated precisely on individual points. To smooth the data into a density function comparable to the model density, we added 100 noisy versions to each of the observed data points. We then considered the tiling of the solution space consisting of cubes of side length 0.2, with one cube centered at the origin in $\mathbb{R}^3$ and sides aligned with the basis vectors. For each cube that had at least 20 data points in it, we computed the center of mass of the data within the cube and formed a vector, called the data gradient, connecting the center of the cube to the data center of mass. Similarly, for the model sample, within this cube, we identified the center of mass and formed a vector, called the model gradient, connecting the center of the same cube to the model sample center of mass. For each of the 49 cubes that satisfied the 20-points criterion, we measured the cosine of the angle between the data gradient and the model gradient. A one-sided t-test confirmed that these cosines were, on average, greater than 0 (t(53) = 11.63, p < .001), indicating that the data gradient and the model gradient point in approximately the same direction in most parts of the space (only 4 of 54 cubes had nonpositive cosines). Figure 1 shows the data along with the egalitarian points. Removing the Triple Zero Strike solutions still yielded a mean cosine significantly greater than zero (t(36) = 4.71, p < .001), indicating that there is significant clustering even at non-optimal stabilities.

[a] We generated an orthonormal basis for the null space of [1, 1, 1, 1] and used the matrix of these vectors to map points in $\mathbb{R}^4$ into points in $\mathbb{R}^3$.
[b] The Euclidean distance between the closest two peaks was 0.29 in the transformed space; we chose variance = 17% of this distance (i.e., $\sigma^2 = 0.05$), to make sure the peaks did not substantially overlap.
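The construction of the egalitarian points and their projection into three dimensions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the particular (Helmert-style) basis is one possible choice, since the text specifies only that some orthonormal basis for the null space of [1, 1, 1, 1] was used.

```python
from itertools import combinations
from math import dist, sqrt


def egalitarian_points(n_players=4):
    """Normalized egalitarian points: some players contribute 0 and the
    remaining contributors split the Goal (normalized to 1) equally."""
    points = []
    for n_contrib in range(1, n_players + 1):  # 1..4 contributors = 3..0 zeros
        for contributors in combinations(range(n_players), n_contrib):
            points.append(tuple(1.0 / n_contrib if i in contributors else 0.0
                                for i in range(n_players)))
    return points


# Assumed basis: one orthonormal basis for the null space of [1, 1, 1, 1].
# Any other orthonormal basis of that space gives the same distances.
BASIS = (
    (1 / sqrt(2), -1 / sqrt(2), 0.0, 0.0),
    (1 / sqrt(6), 1 / sqrt(6), -2 / sqrt(6), 0.0),
    (1 / sqrt(12), 1 / sqrt(12), 1 / sqrt(12), -3 / sqrt(12)),
)


def normalize(contributions, goal):
    """Divide each contribution by the Goal, so every solution sums to 1."""
    return tuple(c / goal for c in contributions)


def project(point):
    """Map a point of R^4 to its three coordinates with respect to BASIS."""
    return tuple(sum(b * x for b, x in zip(row, point)) for row in BASIS)


points = egalitarian_points()
projected = [project(p) for p in points]
closest = min(dist(a, b) for a, b in combinations(projected, 2))
print(len(points), round(closest, 2))  # 15 0.29
```

The count of 15 follows from choosing which players contribute (1 + 4 + 6 + 4 subsets), and the smallest distance between two projected points, 1/√12 ≈ 0.29, agrees with the peak spacing reported in footnote b.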
Figure 1. Points are normalized values of $(c_1, c_2, c_3, c_4)$ from Equation (1), evaluated at each solution. Blue dots mark egalitarian points. Red asterisks mark 4-person game data. Green asterisks mark 3-person games (one person failed to show; computer played 0 for fourth). Data points are jittered for visibility.
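The cube-wise comparison of data and model gradients described in Result 1 can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function names are hypothetical, points are assumed to be already projected into three dimensions, and cubes without any model sample points are simply skipped (the text does not say how such cubes were handled).

```python
from collections import defaultdict
from math import floor, sqrt
from statistics import mean, stdev


def cube_index(point, side=0.2):
    """Index of the cube containing the point; cube (0, 0, 0) is centered
    at the origin and cube k is centered at k * side on each axis."""
    return tuple(floor(x / side + 0.5) for x in point)


def centers_of_mass(points, side=0.2):
    """Group points by cube; return {cube index: (count, center of mass)}."""
    groups = defaultdict(list)
    for p in points:
        groups[cube_index(p, side)].append(p)
    return {k: (len(ps), tuple(sum(axis) / len(ps) for axis in zip(*ps)))
            for k, ps in groups.items()}


def cosine(u, v):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def gradient_cosines(data, model_sample, side=0.2, min_count=20):
    """Cosine between data gradient and model gradient for every cube
    holding at least min_count data points."""
    data_com = centers_of_mass(data, side)
    model_com = centers_of_mass(model_sample, side)
    cosines = {}
    for k, (count, com) in data_com.items():
        if count < min_count or k not in model_com:
            continue
        center = tuple(i * side for i in k)
        g_data = tuple(c - z for c, z in zip(com, center))
        g_model = tuple(c - z for c, z in zip(model_com[k][1], center))
        if any(g_data) and any(g_model):  # skip zero-length gradients
            cosines[k] = cosine(g_data, g_model)
    return cosines


def t_statistic(values):
    """One-sample t statistic for the hypothesis that the mean exceeds 0."""
    return mean(values) / (stdev(values) / sqrt(len(values)))
```

Passing `list(gradient_cosines(data, model_sample).values())` to `t_statistic` gives the kind of one-sided test statistic reported in the text; a positive mean cosine means the data and the model are displaced in similar directions within the same cubes.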
4. Result 2: Persistence of Idiosyncratic Stabilities

A standard view of the source of clustered patterning in natural language typology is that there is a Universal Grammar which has a limited inventory of parameter settings (Chomsky, 1981). Each setting specifies a cluster of behaviors that go together, and only a limited subset of the conceivable combinations of all behaviors is observed. The current system is an approximately analogous case in that there is an infinity of conceivable solutions to the Integer Summing Game but only a small finite number of them are favored (the 15 egalitarian points).

When data exhibit variation not accounted for by a structural model, a standard approach is to treat the variation as independently distributed random noise added to the model. Our self-organization perspective predicts that the structure and variation are not independent parts of the system; instead, the structure emerges from the individual impulses of the contributors, shaped by feedback during their interactions. We thus expect the variation around the structure to not be noisy but to consist of micro-stabilities (tendencies to persist in idiosyncratic behaviors).

To explore this contrast between the classical and self-organizing perspectives we built a model of our data set in the classical vein: we hypothesized two fundamentally different mental states underlying each participant’s behavior: Contribute Something and Contribute Nothing. Contribute Something is the foundation of egalitarian behavior and it also gives rise to striker behavior when other participants are contributing zero. At the beginning of play, all participants have a high likelihood (0.95) of choosing Contribute Something. On the first play,
each contributes $\mathrm{round}(G/N + \eta_0)$, where $N$ is the number of players and $\eta_0$ is normally distributed random noise with mean 0 and variance $K/4$, where $K = 7$ (this value controls the number of steps it takes, on average, for a model group to succeed on a goal; we calibrated it numerically as described below). After the first play, each player reviews the previous play of each of the other players; if the contribution is less than 1/8 of $G$, the player is assumed to be contributing nothing, otherwise contributing something (1/8 = 1/2 times 1/4, the minimum mean value of Contribute Something with 4 players). The players draw their next contribution from $\mathrm{round}(G \cdot p_i + \eta(i))$, where $p_i$ is 0 if player $i$ is in Contribute Nothing, and $p_i$ is $1/N_C$ if player $i$ is in Contribute Something and estimates $N_C$ contributors (self included) from the last round. The noise is given by

$$\eta(i) = K \cdot \frac{1 - \frac{1}{N_C}}{N_C} \cdot \mathcal{N} \tag{2}$$

where $\mathcal{N}$ is normally distributed noise with mean 0 and variance 1. Before each trial, the current state of each player is selected as follows: if the current state is Contribute Something, then switch to Contribute Nothing with probability $M \cdot \frac{1 - 1/N_C}{N_C}$, where $M$ is a constant; otherwise, do not switch states. In the limit of infinite rounds, this model converges on the Triple Zero Strike behavior. Searching numerically in the parameter space, we chose $M$ and $K$ so that the number of trials it took the models to succeed on the 20 goal numbers in the experiment had approximately the same mean as each of the human group trial counts and, at the same time, the proportion of groups that discovered Triple Zero Strike within the 20 rounds was approximately the same as that of the human proportion. Though it is a dynamical feedback model, this model makes a standard assumption about the way variability is related to structure: Variability = Structure + Noise.

A t-test based on the Result 1 procedure applied to the model data indicated significant alignment of the agent-based model with the Soft Egalitarian Density prediction. To test for idiosyncratic stability, we employed Recurrence Quantification Analysis (RQA; Webber & Zbilut, 1994; Zbilut & Webber, 1992). RQA provides a two dimensional map of self-similarity in the structure of a time series. In particular, the RQA plot of a time series $T$ containing $L$ elements is an $L \times L$ matrix $Q$ in which $Q(i, j)$ is 1 if $T(i) = T(j)$ and 0 otherwise. Note that $Q(i, i) = 1\ \forall i$. Various measures of the spatial distribution of ones and zeros are used to assess structure in the time series. One measure, Percent Determinism (Equation 3), indicates the extent to which the time series repeats subsections of itself in different places.

$$PD = \frac{\sum_{l=l_{\min}}^{L-1} l \cdot P(l)}{2 \sum_{i=2}^{L} \sum_{j=1}^{i-1} Q(i, j)} \tag{3}$$
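To make the Percent Determinism measure concrete, the following Python sketch computes it directly from a time series. It rests on an assumed reading of Equation 3: PD is taken to be the share of off-diagonal recurrent points that lie on slope-1 diagonal lines of length at least l_min. Because Q is symmetric, the code counts only the points above the main diagonal, which leaves the ratio unchanged. The function name and the `ignore_value` parameter are hypothetical; the latter mimics setting recurrence to zero for trials on which a given value (for instance zero) was played.

```python
def percent_determinism(series, l_min=2, ignore_value=None):
    """Share of off-diagonal recurrent points on diagonal lines >= l_min.

    Q(i, j) = 1 when series[i] == series[j]; points whose value equals
    ignore_value are treated as non-recurrent.
    """
    n = len(series)
    recurrent = 0  # recurrent points above the main diagonal
    on_lines = 0   # those lying on diagonal lines of length >= l_min
    for offset in range(1, n):      # each diagonal parallel to the main one
        run = 0                     # length of the current diagonal line
        for i in range(n - offset):
            match = (series[i] == series[i + offset]
                     and series[i] != ignore_value)
            if match:
                recurrent += 1
                run += 1
            else:
                if run >= l_min:
                    on_lines += run
                run = 0
        if run >= l_min:            # line that reaches the edge of the plot
            on_lines += run
    return on_lines / recurrent if recurrent else 0.0


print(percent_determinism([1, 2, 3, 1, 2, 3]))  # 1.0
print(percent_determinism([1, 2, 1, 3, 1]))     # 0.0
```

In the first series the only recurrences form one diagonal line of length 3, so every recurrent point belongs to a line; in the second the recurrences of the value 1 are isolated points, so none does.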
Figure 2. (a) RQA plot of a typical model group. (b) RQA plot of a typical human group.
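For readers who want to experiment with the classical two-state model that generates the model groups, here is a minimal Python sketch of a single round. Several details are assumptions made for illustration and are not confirmed by the text: the value of M (only K = 7 is reported), the reading of the noise scale and switching probability as (1 - 1/N_C)/N_C, the choice that players in Contribute Nothing play exactly 0 (including on the first play), and the `max_trials` cap that guarantees termination.

```python
import random


def play_round(goal, n_players=4, K=7.0, M=0.1, p_start=0.95,
               max_trials=1000, rng=None):
    """Simulate one round; return the list of trials (each a list of plays)."""
    rng = rng or random.Random()
    # True = Contribute Something, False = Contribute Nothing.
    something = [rng.random() < p_start for _ in range(n_players)]
    # First play: equal share plus noise with variance K / 4.
    plays = [round(goal / n_players + rng.gauss(0.0, (K / 4) ** 0.5)) if s else 0
             for s in something]
    history = [plays]
    while sum(plays) != goal and len(history) < max_trials:
        new_plays = []
        for i in range(n_players):
            # Estimated contributors: self plus others who played >= G / 8.
            n_c = 1 + sum(1 for j, c in enumerate(plays)
                          if j != i and c >= goal / 8)
            spread = (1 - 1 / n_c) / n_c
            # Contribute Something may switch to Contribute Nothing;
            # the reverse switch never happens.
            if something[i] and rng.random() < M * spread:
                something[i] = False
            if something[i]:
                noise = K * spread * rng.gauss(0.0, 1.0)
                new_plays.append(round(goal / n_c + noise))
            else:
                new_plays.append(0)
        plays = new_plays
        history.append(plays)
    return history


trials = play_round(22, rng=random.Random(0))
print(len(trials), trials[-1])
```

Under this reading, a player who believes they are the only contributor (N_C = 1) has zero noise and zero switching probability, which is what makes Triple Zero Strike an absorbing configuration of the model.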
Here, $l$ varies over lengths of positive-trending, slope-1 diagonal lines of recurrent points in $Q$. $P(l)$ is the proportion of off-diagonal recurrent points that are part of lines of length $l$. We set $l_{\min}$ to its lowest meaningful value, 2 (if $l_{\min} = 1$, then $PD = 1$ for data that have any nontrivial recurrence). RQA applied to a time series of white noise is likely to yield $PD$ close to zero, while a completely deterministic process yields $PD = 1$. Deterministic behavior in the Integer Summing Game indicates temporary (or permanent) stability of the guessing patterns. Thus, to ask how stability in the human data compared to stability in the model, we assessed percent determinism for each player. In keeping with the self-organization hypothesis, we expected the percent determinism to be significantly greater in the humans. Figure 2 shows a sample human RQA plot alongside a model RQA plot with a similar number of trials. Because playing zero is highly stable in both the humans and the models, and thus swamps the determinism estimation, we set recurrence to zero for all trials where zero was played. For the 9 groups in which 4 players completed all 20 trials, a t-test indicated that the human $PD$ was indeed significantly greater than the model $PD$ ($PD_{human}$ = 0.52 (0.18); $PD_{model}$ = 0.29 (0.06); t(16) = 3.74, p < .01).

5. Conclusions

In this work we presented human behavioral data from a coordination task based on Roberts and Goldstone's (2011) number summing task. We found two key results: (i) groups’ collective strategies tended to cluster around a few well-defined egalitarian points, a tendency we believe parallels human languages’ tendencies to cluster around a small finite number of parameter settings; and (ii) individuals persist in idiosyncratic behaviors that deviate from the egalitarian points identified in (i), despite the fact that they tend to converge on the egalitarian points over time.
Behavior (ii) is of particular interest because it constitutes a way in which the coordination game we have investigated behaves differently from what is traditionally assumed in models of language behavior. The behavior is consistent with
a self-organizing account of how coordinated group behavior arises and changes over time. The self-organization account predicts that the global stabilities result from the gradual shaping of individual stabilities, thus predicting the simultaneous existence of individual and global stabilities. This is different from standard language models, which assume that the group-independent, long-duration stabilities stem from fixed biologically-specified commonalities among the organisms involved and that idiosyncratic behaviors constitute noisy deviations from these behaviors. We suggest that because of the importance of feedback-shaped stabilities, a dynamical systems approach is suitable to our task, and may offer helpful insights into natural language evolution.

Acknowledgements

Thanks to Katherine Alfred, Pyeong Whan Cho, Olivia Harold, Brendan Innes, Milod Kazerounian, Corinne LaPorte-Cauley, Robert Powers III, Christopher Richard, and Sarah Tunewicz, who helped run the experiments, and to three anonymous reviewers for helpful feedback. This material is based on work supported by the United States National Science Foundation Grant Nos. 1246920 (NSF INSPIRE) and 1059662 (NSF PAC) and a United States National Institutes of Health (NIH) P01 HD 001994 grant to Haskins Laboratories.

References

Brighton, H., & Kirby, S. (2006). Understanding linguistic evolution by visualizing the emergence of topographic mappings. Artificial Life, 12(2), 229–42.
Chomsky, N. (1981). Lectures on government and binding. Dordrecht: Foris.
Galantucci, B. (2005). An experimental study of the emergence of human communication systems. Cognitive Science, 29(5), 737–67.
Hall, M. L., Mayberry, R. I., & Ferreira, V. S. (2013). Cognitive constraints on constituent order: Evidence from elicited pantomime. Cognition, 129, 1–17.
Kirby, S., Cornish, H., & Smith, K. (2008). Cumulative cultural evolution in the laboratory: an experimental approach to the origins of structure in human language. Proceedings of the National Academy of Sciences of the United States of America, 105(31), 10681–6.
Kukona, A., & Tabor, W. (2011). Impulse processing: A dynamical systems model of incremental eye movements in the visual world paradigm. Cognitive Science, 35, 1009–1051.
Roberts, M. E., & Goldstone, R. L. (2011). Adaptive group coordination and role differentiation. PLoS One, 6(7), e22377.
Webber, C. L., Jr., & Zbilut, J. P. (1994). Dynamical assessment of physiological systems and states using recurrence plot strategies. Journal of Applied Physiology, 76(2), 965–973.
Zbilut, J. P., & Webber, C. L., Jr. (1992). Embeddings and delays as derived from quantification of recurrence plots. Physics Letters A, 171(3-4), 199–203.
IS THE SYNTAX RUBICON MORE OF A MIRAGE?

MAGGIE TALLERMAN
Linguistics Section, Newcastle University, Newcastle NE1 7RU, UK

Minimalist thinking proposes that the narrow language faculty arose as a saltation as a result of a Merge mutation, and was initially used in problem-solving etc. as a solely internal ‘language of thought’, with EXTERNALIZATION only appearing later. Moreover, natural selection played little part in the evolution of language. Nor were there any precursors to full language; no kind of gradually-evolving protolanguage existed in hominin evolution. On the basis of the properties of two central components of language, the lexicon and syntactic displacement, I argue against each of these claims.
1. Minimalist views of language evolution
Recent literature within the Minimalist/Biolinguistic framework (e.g. Chomsky 2010, 2012; Berwick & Chomsky 2011; Berwick 2011) promotes four interrelated claims concerning language evolution.
1. The language faculty emerges as the result of a recent saltation, i.e. a very sudden evolutionary step. For instance, Chomsky talks of the ‘narrow window of the “sudden emergence”, which [leads] one to expect something more like a snowflake than the outcome of extensive ... “tinkering” over a long period’ (2010: 59). The assumption is that the contents of the language faculty – or specifically the parts pertaining to ‘narrow syntax’ – are extremely limited: the ‘human syntactic engine’ (Berwick 2011) consists merely of words and their features plus the Merge operation, which creates hierarchical structure. Moreover, all that was required for the syntax rubicon to be crossed was some minor genetic mutation or ‘slight rewiring of the brain’ (Berwick & Chomsky 2011), which gave rise to Merge. This occurred in a single individual, providing the crucial bridge from no language to full language in a single leap.
2. ‘Externalization’, i.e. communicating linguistically with conspecifics, is a secondary phenomenon, appearing well after solely internal uses of language: ‘The earliest stage of language would have been […] a language of thought,
available for use internally’ (Chomsky 2010: 55), and ‘all relevant biological and evolutionary research leads to the conclusion that the process of externalization is secondary’ (Berwick & Chomsky 2011: 32). The idea is that Merge initially applied in thought processes, and was adaptive because it enhanced planning and problem-solving abilities. Assuming that the relevant mutation was transmissible to offspring, a community of (internal) language-users would form over time, at which point externalization would be advantageous.
3. The role of natural selection in developing the narrow language faculty was negligible. If the contents of the faculty are essentially words + Merge, and the latter results from a minor mutation, this minimizes the aspects of language that require explanation in evolutionary terms. Rather than assuming that small, incremental changes occurred in a gradually-developing language faculty, requiring many thousands of years of adaptive change to reach full-blown language, the Minimalist conception de-emphasizes the role of natural selection. Instead, proponents stress the role played by ‘third factor effects’ – natural or physical laws of form.
4. There are no precursors to full syntax; no pre-syntactic stages occurred in the evolution of language: ‘there is no room in this picture for any precursors to language – say a language-like system with only short sentences’ (Berwick & Chomsky 2011: 31). Under this view, once Merge occurred, then the full panoply of linguistic structures, including all recursive constructions, became instantly available to our ancestors. This, of course, is consistent with the idea that language was not shaped over long periods by natural selection.
In this paper I present counterarguments to each of these four claims, which together I will term the Strong Minimalist Thesis (SMT), after Berwick & Chomsky (2011: 30).
Argumentation centres on the properties of two core areas of ‘narrow syntax’: 1) the lexicon; 2) syntactic displacement, i.e. ‘movement’. I defend a gradualist approach to the evolution of the language faculty, contra Berwick (2011: 70), who states that ‘One does not need to advance incremental, adaptationist arguments with intermediate steps between some protolanguage and full natural language to explain much, perhaps all, of natural language’s design’. I argue that only an incremental and adaptive approach can capture the specific properties of both the lexicon and syntactic displacement.
2. ‘Conceptual atoms of the lexicon’
To build the ‘syntactic engine’ under the SMT notion of language, we must first have lexical items, since without these there will be nothing to merge. Human lexical items clearly have some relationship with – and presumably ultimately derive from – concepts. The SMT position seems to be that human-type concepts arise first in evolution, before any kind of language is present, and are subsequently (somehow) made available to the Merge operation, which turns them into lexical items, all without externalization. However, the relevant literature does not distinguish clearly between the ‘conceptual atoms of the lexicon’ (Berwick & Chomsky 2011: 30) and words; for instance, Chomsky (2010: 57) mentions ‘concepts and lexical items (to the extent that they differ – …far from a simple question)’, and Chomsky (2012) goes further, denying that there is any difference between lexical items and concepts. This is a curious position. While concepts are either fully innate, or else canalized, developing reliably with experience, lexical items are entirely learned. Concepts are private, and are not much affected by the concepts of our conspecifics, whereas words are public constructs, with their meanings, sounds/shapes, selectional restrictions and syntax established through usage in a language community. Other animals certainly have private concepts of some kind (e.g. vervet monkeys have concepts of different predator classes which are honed by experience), but these bear no relationship to shared lexical items. A further critical distinction between lexical items and concepts is of course that only the former can COMBINE. Assuming that the Merge operation itself did come about via a ‘minor mutation’, what makes it possible for lexical items to combine at all? Chomsky (2008) proposes that lexical items exhibit EDGE FEATURES which permit them to be merged: transitive verbs seek out arguments as complements, determiners seek out nouns, and so on. 
But this merely begs the question: where do the relevant edge features come from? They are formal properties: for instance, assassinate requires a complement that is not only +human but is also +socially prominent. Private concepts cannot have edge features, because such features are only established via externalization, effectively being negotiated between speakers as the meaning and usage of lexical items are settled. These features vary over time, exactly because ‘externalization’ gives rise to new syntactic patterns for lexical items, as language change shows clearly (a simple example is the current change in the syntax of the verb impact in British English, which is becoming transitive for many younger speakers, whereas for older speakers it takes a PP complement (X impacts on Y)). This suggests that SMT claim (2) – that externalization is a
secondary phenomenon – cannot be correct: no Merge of pre-externalized concepts could occur in the putative ‘language of thought’ because this has no way to acquire edge features. Rather, as in other primates, there were simple non-combinable concepts but no lexicon before language was externalized, and a shared lexicon (along with human-type concepts) becomes established in each speech community via usage. These conclusions also militate against claim (1): the idea that the narrow language faculty is a saltation. Even assuming that Merge itself arose suddenly, the lexicon over which it must operate did not. There can be no instantaneous lexicon, yet the lexicon is clearly a core part of the LF. The first hominin speakers probably used a small set of words analogous to Jackendoff’s (2002) ‘defective’ lexical items (hello, wow, ouch, oops, shh etc.); these lack edge features (they have sound and meaning but no lexical category and no syntax) so remain non-combinable, and reflect a period in human evolution in which only dyadic engagement occurred (Tomasello et al. 2005); in other words, speakers could not yet refer to external entities. When TRIADIC engagement evolved as part of the suite of specifically human features which Tomasello terms SHARED INTENTIONALITY, referential lexical items started to appear (Hurford 2007); now, for the first time, two speakers could converge attention jointly on a third entity – and label it. This accounts for the gradual development of a shared lexicon, with speakers acquiring learned, conventional form/meaning associations. The fact that both lexical items and their edge features are learned remains unaccounted for in the Minimalist approach, which assumes that internal concepts arose first, then were somehow endowed with edge features in order to Merge, all to serve internal thought, and that externalization followed. 
It is also evident that, contrary to claim (3), natural selection plays a role here: just as shared intentionality is adaptive, allowing humans to benefit by cooperating in joint ventures, so a developing referential lexicon is adaptive, since community labels can shortcut (ape-like) learning of categories by trial and error (Cangelosi & Harnad 2001). Having categorical distinctions, for instance for putative predators or safe foodstuffs, is clearly adaptive. Learning shared labels speeds up the acquisition of categorical knowledge by eliminating the need for each person to categorize for themself, so a public “E-lexicon” is adaptive. An E-lexicon (for each evolving language) that expands over time in hominin evolution is thus predicted. Crucially, the lexicon is not stored permanently in the genome, but in the speech community (Pinker & Bloom 1990). The beauty of this system is that each speaker need only know a subset of the community’s total E-lexicon, yet the latter can continue to grow, with each generation adding to it. Moreover, nothing in the SMT accounts for the
astonishing size of the “I-lexicon” – the amount each person can learn, store and retrieve – but assuming that the ability to learn as much as possible of the large, ambient E-lexicon is also adaptive (not least in increasing opportunities to communicate and thus develop social relationships, which itself is adaptive in a highly cooperative species), then the potential size of the I-lexicon will increase gradually in phylogeny. A purely private lexicon, on the other hand, has little potential for growth, since other people’s labels cannot be added to it. Finally, note crucially that permanent “I-storage” of lexical items is – rather ironically – fully dependent on externalization: studies of both language attrition and tip-of-the-tongue phenomena indicate that vocabulary and lexical access decline quickly if lexical items are not used in public. So lexical items as we know them cannot evolve without externalization of language, and this also establishes their publicly-shared edge features, which, in turn, are what makes syntax possible. But such features cannot emerge instantaneously either, since they develop as words are used. There is no reason to assume that even the earliest REFERENTIAL lexical items were combinable. Single (proto)words, still lacking lexical categories and thus lacking syntax, must be a precursor to any two-word stage: Merge can’t operate until there are single words available to be merged. In fact, syntactic categories are only formed when words combine in consistent patterns. This strongly suggests that the ‘no protolanguage’ claim (4) is incorrect: individual words were first used separately, and early combinations would not be syntactically consistent, or even syntactic at all.
The first combinations would be merely paratactic word collocations, incidental nonce occurrences of two vaguely-related words in the same utterance; if used again, these collocations start to become established, and in time words gain formal syntactic requirements, including their lexical categories: edge features emerge gradually, item by item. This picture of a lexicon developing incrementally within a speech community is supported by recent studies of language creation, e.g. Kegl et al. (1999) on Nicaraguan Sign Language (ISN); Sandler et al. (in press) on Al-Sayyid Bedouin Sign Language. Conventionalized lexical items are not there right from the start, but instead develop as the language is transmitted horizontally and vertically. The lexicon emerges gradually via externalization. For example, Kegl et al. demonstrate that transitivity in verbs is an EMERGENT property in ISN; in other words, the relevant edge features develop over time, contrary to the expectations of the SMT.
3. Syntactic displacement and externalization of language
According to the SMT, the property of syntactic displacement (‘movement’) comes ‘for free’ (Chomsky 2012: 57) with Merge: ‘Crucially, the operation Merge yields the familiar displacement property of language: the fact that we pronounce phrases in one position, but interpret them somewhere else as well’ (Berwick & Chomsky 2011: 31). Here, I argue that contrary to claim (2) – the idea that externalization was secondary – displacement only emerges via language use, and had no role in a putative initial ‘language of thought’. Syntactic displacement occurs in wh-questions: What is Kim eating __ ?; in passives such as The cheese was eaten __ by Gromit, and elsewhere. ‘External Merge’ builds headed hierarchical structures from lexical items, giving rise to constructions such as Kim is eating the cheese or Kim is eating what. ‘Internal Merge’ (displacement) takes a segment of a construction and copies it at the left edge: what Kim is eating what. The original position of the copied item is then suppressed: only the last-Merged (highest) instance of what is actually pronounced in externalization, What is Kim eating __ ?, though of course speakers understand that what remains the object of eating. Berwick & Chomsky (2011: 32) suggest that for the speaker, ‘the computational burden is greatly eased’ by this suppression, but that the burden of interpretation is thrust onto the hearer instead, who must reconstruct the original argument position of the wh-element. In turn, they say, the fact that computational efficiency (only pronounce the highest copy) always wins out over interpretive-communicative efficiency (pronounce both copies) demonstrates that, in accordance with claim (2), ‘language evolved as an instrument of internal thought, with externalization a secondary process’ (ibid).
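The two operations just described can be made concrete with a small sketch. The Python below is purely illustrative and is not taken from the Minimalist literature: the names `Word`, `Phrase`, `external_merge`, `internal_merge` and `spell_out` are hypothetical, a copy is modelled as the same object occurring twice in the tree, only single words are displaced, and subject-auxiliary inversion is ignored.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass(eq=False)  # eq=False: two Words are "the same" only if they are the same object
class Word:
    form: str


@dataclass(eq=False)
class Phrase:
    left: "Tree"
    right: "Tree"


Tree = Union[Word, Phrase]


def external_merge(a: Tree, b: Tree) -> Phrase:
    """Combine two separate objects into one hierarchical structure."""
    return Phrase(a, b)


def leaves(tree: Tree) -> List[Word]:
    """Return the words of a structure in left-to-right order, copies included."""
    if isinstance(tree, Word):
        return [tree]
    return leaves(tree.left) + leaves(tree.right)


def internal_merge(tree: Tree, item: Word) -> Phrase:
    """Re-merge an item that already occurs inside the structure at its left edge."""
    if not any(leaf is item for leaf in leaves(tree)):
        raise ValueError("item must already occur inside the structure")
    return Phrase(item, tree)


def spell_out(tree: Tree) -> str:
    """Pronounce each word once, at its first (here: highest) occurrence."""
    pronounced: Set[int] = set()
    forms: List[str] = []
    for leaf in leaves(tree):
        if id(leaf) in pronounced:
            continue  # lower copy: present in the structure but not pronounced
        pronounced.add(id(leaf))
        forms.append(leaf.form)
    return " ".join(forms)


what = Word("what")
clause = external_merge(
    Word("Kim"),
    external_merge(Word("is"), external_merge(Word("eating"), what)),
)
question = internal_merge(clause, what)
```

In this sketch `spell_out(clause)` gives `Kim is eating what`, while `spell_out(question)` gives `what Kim is eating`: the lower copy of what is still in the structure, where it remains the object of eating, but only the highest copy is pronounced.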
In fact, suggests Berwick (2011: 70f) ‘displacement makes language processing and communication more difficult, not less difficult – yet another argument that language is not designed for communication’. At first glance, this reasoning appears sound: externalized language would surely be easier to process if arguments always remained in their original position, where their theta roles are assigned – then it would be clear what argument belonged to which head verb or preposition etc. But this takes no account of the goals of communication. For instance, consider the following dialogue:
Speaker A: What’s happened? You look really down.
Speaker B:
(1) a. #Lightning struck my sister yesterday.
    b. My sister was struck by lightning yesterday.
While (1a), with no displacement, sounds unnatural (hence marked with #), (1b) – with displacement – is the reasonable answer in this context, because Speaker B needs to foreground the most salient argument of struck, which is what passivization does. Contra Berwick, language is absolutely designed for communication: syntactic displacement is the tool of externalization. As noted above, under the SMT displacement is said to come ‘for free’ when Merge appeared; it ought, then, to be part of the initial language of thought, if this contains Merge. But this is entirely the wrong way round. We use displacement primarily not in thought, but for communicative purposes: we ask wh-questions to which we require an answer from an interlocutor, we focalize or topicalize elements according to discourse requirements, such as distinguishing between given and new information, we make animate theme arguments into subjects by passivization to foreground them, as seen above, and we extrapose heavy noun phrases to make for an easier parse. All these displacement structures only exist because of discourse, coming about via externalization, and could not develop in a purely internal ‘language of thought’, which contains no dialogue. Again, studies of emerging languages support this view, showing for instance that a fronting construction for topicalized elements appeared in ISN as the language developed. Displacement occurs incrementally, construction by construction, contra the ‘language as saltation’ claim (1). Moreover, displacement constructions are typically intimately tied to the functional categories that signal them – categories which emerge gradually over time via GRAMMATICALIZATION (Heine & Kuteva 2007). Externalized language use is certainly necessary for grammaticalization to occur. 
For instance, the crucial semantic distinction between active The snake ate the bird and passive The snake was eaten by the bird is signalled solely by the two functional elements: auxiliary be + -en participle. Passives have numerous distinct pathways of development cross-linguistically, but all appear to involve grammaticalization in some form or other. Like the lexicon, then, displacement arises via externalization. Even taking into account only the contents of the ‘narrow’ language faculty, I have argued that externalization is crucial in forming the properties of two major components: the lexicon and syntactic displacement. Under this view, there is no syntax rubicon in language evolution.
References
Berwick, R. C. (2011). Syntax facit saltum redux: Biolinguistics and the leap to syntax. In Di Sciullo & Boeckx (Eds.), (pp. 65–99).
Berwick, R. C., & Chomsky, N. (2011). The biolinguistic program: the current state of its development. In Di Sciullo & Boeckx (Eds.), (pp. 19–41).
Cangelosi, A., & Harnad, S. (2001). The adaptive advantage of symbolic theft over sensorimotor toil: Grounding language in perceptual categories. Evolution of Communication, 4, 117–142.
Chomsky, N. (2008). On phases. In R. Freidin, C. Otero, & M. L. Zubizarreta (Eds.), Foundational issues in linguistic theory (pp. 133–166). Cambridge, MA: MIT Press.
Chomsky, N. (2010). Some simple evo devo theses: how true might they be for language? In R. K. Larson, V. Déprez, & H. Yamakido (Eds.), The evolution of human language: Biolinguistic perspectives (pp. 45–62). Cambridge: Cambridge University Press.
Chomsky, N. (2012). The science of language: Interviews with James McGilvray. Cambridge: Cambridge University Press.
Di Sciullo, A. M., & Boeckx, C. (Eds.) (2011). The biolinguistic enterprise. Oxford: Oxford University Press.
Heine, B., & Kuteva, T. (2007). The genesis of grammar: A reconstruction. Oxford: Oxford University Press.
Hurford, J. R. (2007). The origins of meaning: Language in the light of evolution. Oxford: Oxford University Press.
Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford: Oxford University Press.
Kegl, J., Senghas, A., & Coppola, M. (1999). Creation through contact: Sign language emergence and sign language change. In M. DeGraff (Ed.), Language creation and language change: Creolization, diachrony and development (pp. 179–237). Cambridge, MA: MIT Press.
Pinker, S., & Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13, 707–784.
Sandler, W., Meir, I., Padden, C., & Aronoff, M. (in press). Language emergence: Al-Sayyid Bedouin Sign Language. In N. Enfield, P. Kockelman, & J. Sidnell (Eds.), Cambridge handbook of linguistic anthropology. Cambridge: Cambridge University Press.
Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences, 28, 675–735.
SYMBOL EXTENSION AND MEANING GENERATION IN CULTURAL EVOLUTION FOR DISPLACED COMMUNICATION
KAORI TAMURA & TAKASHI HASHIMOTO
School of Knowledge Science, Japan Advanced Institute of Science and Technology (JAIST), 1-1, Asahidai, Nomi, Ishikawa, 923-1292, Japan
Displaced communication is a unique feature of human linguistic communication. We hypothesized that the sender’s symbol extension and the receiver’s meaning generation are key factors in displaced communication. We examined these two processes through an experiment and found that in the repetition of the processes, the symbol system changed from iconic to figurative. Therefore, we claim that cultural evolution of the symbol system is important in the realization of displaced communication.
1. Introduction
1.1. Displacement of Human Language
Displacement is one of the defining features of human language. The term refers to the ability to talk about things that are remote in space or time (or both) from the context of the utterance (Hockett, 1960). Displacement is a concept that defines the difference between human linguistic communication and symbolic communication in animals. Understanding displacement is thought to contribute to the study of the origin and evolution of human language. However, it is not clear which aspects of displacement are unique to humans. Here, we investigate displacement as a unique part of human language by focusing on displacement that is brought out in communication, considering the displacement of a sender and receiver (Tamura & Hashimoto, 2012). In addition, we claim that displacement unique to humans concerns not symbol grounding but symbol extension and meaning generation (Hashimoto, 2007). What matters is displacement in communication that lets the receiver understand absent objects or, more specifically, things the receiver does not know. Think of communication in which a sender and receiver try to share knowledge about an object. If they have seen the object before and have symbols to represent it, they can share knowledge about it even if the object is not immediately present, as long as they remember the correspondence relation. However, only the existing correspondence relations between the symbols and
object are not enough for the sender and receiver to reach a common understanding of the object, if the receiver has not seen it before. The sender needs to extend the symbolic relationship to represent the object to the receiver. On the other hand, the receiver guesses what the sender is trying to communicate by generating a hypothesized meaning of the symbol, which represents something the receiver has not known before. Communication to share knowledge that the receiver has not known can be realized by repeating such interaction between the sender and receiver. The motivation of our study is to examine how the sender’s symbol extension and the receiver’s meaning generation take place to realize displaced communication.
1.2. Previous Results
In the previous study, it was observed that senders used two kinds of expressions to communicate about absent objects in graphical communication tasks (Tamura & Hashimoto, 2013).
Metaphoric Expression: Expressions that represent the feature of an absent object with another object that typically has the feature. For example, in the task to communicate “sour apple” to the receiver, one sender used a lemon to express “sour” (see the leftmost drawing in Fig. 1). This kind of expression can be regarded as a metaphoric expression; the sender likened a “sour apple” to a lemon to represent the feature “sour.”
Motion Expression: Expressions that represent the feature of an absent object with motions that typically cause the feature. For example, in the task to communicate “solid fire” to the receiver, one sender used the motion “hit” to express the concept “solid” (see the rightmost drawing in Fig. 1). This kind of expression can be regarded as a metonymic expression; the sender referred to “solid fire” in terms of the motion to hit, which is an adjacent event when we feel “solid” in daily life.
Fig. 1. Examples of metaphoric expressions and motion expressions. From the left, a metaphoric expression and a motion expression for “sour apple” and those for “solid fire.”
These expressions are considered examples of the sender’s symbol extensions. So far, we observed metaphoric expression and motion expression separately and found that metaphoric expression was used more to express what the receiver did not know (Tamura & Hashimoto, 2013). Motion expression was used as much as metaphoric expression to express what the receiver already
knew, though it was used less often to express what the receiver did not know. We also observed that both metaphoric and motion expressions were used together in one picture. Thus, the two expressions might work together to describe objects that receivers do not know about. We also observed examples in which the receivers generated meanings through communication based on hypothesis formation (Tamura & Hashimoto, 2013). They formed a hypothesis about what the sender was trying to say from the drawing. The sender guessed the receiver’s understanding from the reply and tried to modify it by drawing a new picture. Through repetition of these interactions, the receiver came to understand the sender’s message. In the experiment, the participants did not succeed in displaced communication at first but came to realize it after several interactions. Based on these previous results, we hypothesize that there will be changes in the use of metaphoric and motion expressions during the tasks. In the present study, we examine this hypothesis by comparing the usage frequency of these expressions in the first and last halves of the tasks. This paper is organized as follows. In the second section, we explain the experimental framework of displaced communication, our experimental settings, and two kinds of drawing tasks to be compared. In the third section, we describe the experiment results, and in the fourth section, we discuss them in detail. The last section provides the conclusion.
2. Experiment
2.1. Experimental Framework for Displaced Communication
We adopt an experimental semiotics approach, which has drawn attention recently (Scott-Phillips & Kirby, 2010; Galantucci & Garrod, 2011). A graphical medium is often used for referential communication tasks (e.g., Fay et al., 2008). We constructed an experimental framework for displaced communication utilizing graphical interaction, similar to Fay et al.
(2008), and examined the process of symbol extension and meaning generation in cultural evolution.
2.2. Experimental Setting
The experiment assigned two roles corresponding to a speaker and listener in conversation: a “sender,” who expressed an assigned object by drawing it (or something to represent it), and a “receiver,” who identified the referent of the sender’s drawing. Two participants were paired together and given 8 exchanges for each object. Thirty-six Japanese graduate students (18 pairs) participated. The sender was verbally assigned an object (i.e., a noun phrase composed of an adjective and a noun) and then drew a picture on a tablet PC with his/her finger within 2 minutes. The receiver was instructed to guess what the object was and reply with a combination of an adjective and a noun within 1 minute.
The receiver was not given any word candidates (of either adjectives or nouns) beforehand, and was told only that the targets were not complicated words. The receiver’s reply was verbally fed back to the sender by the experimenter, and the sender then drew a new picture about the same object. The sender and receiver repeated these interactions 8 times. The participants communicated in separate rooms so that they would not be able to use other communicative media such as verbal exchanges or eye contact. They could use only black lines for drawing. The use of linguistic characters (e.g., alphabet letters or kana) or meaningful symbols (e.g., algebraic signs) was prohibited.
2.3. Two Types of Drawing Tasks
We set two kinds of tasks for comparison. The sender was assigned one of the two tasks below.
Not-in-memory task: A combination of an adjective and a noun that was unfamiliar to the receiver (e.g., cold Christmas tree, soft traffic light, sour fire)
In-memory task: A combination of an adjective and a noun that was familiar to the receiver (e.g., cold water, soft pillow, sour apple)
The medium of drawing enables communication based on iconic representation. We adopted nouns that are easily understood as icons by drawing their shapes. Iconic symbols are sufficient to communicate the meaning of nouns but not adjectives, because adjectives are impossible to express directly by drawing. In In-memory tasks, it is easy for receivers to predict an adjective by understanding the noun. However, in Not-in-memory tasks, the receiver’s understanding of the noun is not a hint for the prediction of the adjective. Thus, it is expected that some devices will be used more in the Not-in-memory tasks in the experiment in order to express unconventional combinations of adjectives and nouns.
3. Results
To examine whether there were any changes during the tasks, we compared the usage frequency of the expressions in the following four categories between the tasks:
a. Only metaphoric expression
b. Only motion expression
c. Both metaphoric and motion expressions
d. Neither metaphoric nor motion expression
In the Not-in-memory tasks, the number of correct answers for adjectives
was significantly higher in the last four turns (M = 6.25, SD = 0.96) compared to the first four turns (M = 2.75, SD = 1.71, t(3.40) = -3.06, p = .012, r = .83). Also, in the In-memory tasks, the number of correct answers for adjectives was significantly higher in the last half (M = 12.0, SD = 3.16) than in the first half (M = 7.00, SD = 0.82, t(6) = -3.58, p = .047, r = .78). It was expected that more devices to express adjectives were used in the last half. Therefore, we separated the tasks into two groups for analysis, the first half and last half, and looked for significant differences between the two.
Fig. 2 shows the transitions of the number of pairs for four different uses of devices. As observed in Tamura and Hashimoto (2013), the number of times only metaphoric expressions were used in whole turns was significantly larger in Not-in-memory tasks (M = 3.75, SD = 1.67) than in In-memory tasks (M = 2.25, SD = 0.89, t(14) = -2.25, p = .041, r = .52, Fig. 2a). However, there was no significant difference between the tasks in the last half (t(6) = -1.10, p = .315, r = .41). Also, the number of pairs who used only motion expressions did not differ between the tasks in either the first (t(6) = 0.17, p = .870, r = .07) or last half (t(6) = 1.63, p = .154, r = .56, Fig. 2b).
The number of pairs who used both metaphoric and motion expressions was significantly larger in the last half in the Not-in-memory tasks (M = 7.75, SD = 1.71) than in the In-memory tasks (M = 4.50, SD = 0.58, t(6) = -3.61, p = .011, r = .83, Fig. 2c). On the other hand, the number of pairs who used neither metaphoric nor motion expression was significantly smaller in the last half in the Not-in-memory tasks (M = 3.25, SD = 1.26) than in the In-memory tasks (M = 5.50, SD = 0.58, t(6) = 3.25, p = .017, r = .80, Fig. 2d).
Fig. 2. The transition of the numbers of pairs Fig. 2. forThe fourtransition different of uses theofnumbers metaphoric of pairs and for motion four different uses of metaphoric and motion expressions. expressions.
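The text does not say how the effect size r was derived. A common convention, assumed here, is r = sqrt(t² / (t² + df)). The sketch below applies that conversion to three of the (t, df) pairs reported for Fig. 2; the function name `effect_size_r` is illustrative and not taken from the paper.

```python
import math


def effect_size_r(t: float, df: float) -> float:
    """Convert a t statistic and its degrees of freedom into the effect size r.

    Assumes the conventional formula r = sqrt(t^2 / (t^2 + df)).
    """
    return math.sqrt(t * t / (t * t + df))


# (t, df) pairs reported in the text for Fig. 2a, 2c and 2d.
reported = {"Fig. 2a": (-2.25, 14), "Fig. 2c": (-3.61, 6), "Fig. 2d": (3.25, 6)}

for label, (t, df) in reported.items():
    print(label, round(effect_size_r(t, df), 2))
```

Under this assumption the three values round to the r values reported for those panels, which suggests, but does not prove, that this is the conversion the authors used.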
4. Discussion: Cultural Evolution of Symbol Systems
How is displaced communication realized? We claim that it can be realized by certain changes to the symbol systems. Here, we consider displacement from the perspective of the cultural evolution of the symbol system and discuss what kind of change in the symbol system arises through the repetition of symbol extension and meaning generation.
Changes of symbol systems have been observed in Nicaraguan Sign Language (NSL). Senghas et al. (2004) revealed that children who used sign language analyzed complex events into basic elements and sequenced these elements into hierarchically structured expressions. This is a change from a holistic to a compositional symbol system. Pyers and Senghas (2009) found that the second cohort of NSL signers used more mental-state verbs, and that its false-belief performance improved relative to the first cohort. This can be interpreted as a change that makes it easier for the symbol system to refer to others’ mental states. It seems that the cognitive abilities of NSL signers did not change before and after NSL developed. Rather, it is more reasonable to think there was a change of symbolic systems, that is, a cultural evolution of symbol systems.
Similar changes of symbol systems have been observed in the field of experimental semiotics (Galantucci & Garrod, 2011). Kirby et al. (2008) used an artificial language to test cultural evolution in the laboratory. In their experiment, a symbolic system changed from holistic to compositional through iterated learning by human participants. Fay et al. (2003) confirmed that the transition from iconic to symbolic graphical expressions occurs through interaction between participants.
In our study, metaphoric expression, considered as one symbol extension by senders, was used significantly more often to express what the receiver did not know. In addition, in the last half, it was not motion expression alone but the combination of metaphoric and motion expressions that was used significantly more often to express what the receiver did not know. Thus, motion expression alone, a sort of metonymic expression, may not be sufficient for determining the target adjectives, but when combined with metaphoric expression, motion expression can serve as symbol extension. The sender expressed the object that the receiver did not know by extending iconic symbols to metaphorical and metonymical symbols. The receiver interpreted these extended symbols through meaning generation in hypothesis formation: the receiver formed a hypothesis about the sender’s message from the sender’s extended symbols, then expressed the hypothesis as a reply. The receiver first interpreted the symbols as iconic but, through repeated interactions, gradually understood that the sender used them as figurative symbols. The sender also guessed the receiver’s understanding from the replies and tried to modify the receiver’s understanding by drawing new pictures. Thus, the sender and receiver came to develop metaphoric and metonymic symbols that can express adjectives. This can be considered as the change of their symbol
system from iconic to figurative. It is suggested that the repetition of the sender’s symbol extension and the receiver’s meaning generation is an important process in the change from an iconic to a figurative system.
The senders also used expressions like “balloon” and “contrast” in addition to metaphoric expressions, though the frequencies of use were not significant. “Balloon” was used only in combination with metaphoric expressions; it can serve as a structure that distinguishes the source concept and target concept in source-target mapping. “Contrast” was used in combination with metaphoric and/or motion expressions; it can serve as a structure that aids the understanding of target adjectives by comparing them with the opposite feature. These expressions are considered to function as devices to support receivers’ understanding.
5. Conclusion
We investigated the process of displaced communication using a graphical communication task experiment. By analyzing transitions in the use of metaphoric and motion expressions, we found that displaced communication is realized by the change of the symbolic system from an iconic to a figurative system, including metaphoric and motion expressions, through the sender’s symbol extension and the receiver’s meaning generation.
The symbol systems formed in the present experiment are shared only within a pair of participants. In this regard, these symbol systems may correspond to the level of homesign in NSL. A more structured symbol system might be formed in communication within a larger group. We should examine what kind of changes to the symbol systems occur when the sender communicates with multiple receivers, especially with a different receiver in each turn.
Acknowledgements
The authors thank Takuma Torii for his fruitful discussion. This work was supported by a Grant-in-Aid for Scientific Research (No. 23300085) of the Ministry of Education, Culture, Sports, Science, and Technology, Japan.
References
Fay, N., Garrod, S., Lee, J., & Oberlander, J. (2003). Understanding interactive graphical communication. Proceedings of the 25th Annual Conference of the Cognitive Science Society, Boston, MA.
Fay, N., Garrod, S., & Roberts, L. (2008). The fitness and functionality of culturally evolved communication systems. Philosophical Transactions of the Royal Society B: Biological Sciences, 363, 3553-3561.
Galantucci, B., & Garrod, S. (2011). Experimental semiotics: A review. Frontiers in Human Neuroscience, 5, 1-15.
Hashimoto, T. (2007). How is displacement of symbols possible? Consideration through constructive modeling of grammaticalization (in Japanese). The Society of Instrument and Control Engineers (SICE) Symposium on Systems and Information, 205-210.
Hockett, C. F. (1960). The origin of speech. Scientific American, 203(3), 89-96.
Kirby, S., Cornish, H., & Smith, K. (2008). Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language. Proceedings of the National Academy of Sciences, 105, 10681-10686.
Pyers, J., & Senghas, A. (2009). Language promotes false-belief understanding: Evidence from learners of a new sign language. Psychological Science, 20, 805-812.
Senghas, A., Kita, S., & Ozyurek, A. (2004). Children creating core properties of language: Evidence from an emerging sign language in Nicaragua. Science, 305, 1779-1782.
Scott-Phillips, T. C., & Kirby, S. (2010). Language evolution in the laboratory. Trends in Cognitive Sciences, 14, 411-417.
Tamura, K., & Hashimoto, T. (2012). Displacement in communication. The Evolution of Language: Proceedings of the 9th International Conference (evolang9), Kyoto, Japan.
Tamura, K., & Hashimoto, T. (2013). Self-other hypothesis formation in displaced communication (in Japanese). Proceedings of the 27th Annual Conference of the Japanese Society for Artificial Intelligence, Toyama, Japan.
FITNESS LANDSCAPES IN CULTURAL LANGUAGE EVOLUTION: A CASE STUDY ON GERMAN DEFINITE ARTICLES
REMI VAN TRIJP
Sony Computer Science Paris, 6 Rue Amyot, Paris, 75005, France
Computational experiments in cultural language evolution are important because they help to reveal the cognitive mechanisms and cultural processes that continuously shape and reshape the structure and knowledge of language. However, understanding the intricate relations between these mechanisms and processes can be a daunting challenge. This paper proposes to recruit the concept of fitness landscapes from evolutionary biology and computer science for visualizing the “linguistic fitness” of particular language systems. Through a case study on the German paradigm of definite articles, the paper shows how such landscapes can shed a new and unexpected light on non-trivial cases of language evolution. More specifically, the case study falsifies the widespread assumption that the paradigm is the accidental by-product of linguistic erosion. Instead, it has evolved to optimize the cognitive and perceptual resources that language users employ for achieving successful communication.
1. Introduction
There is a wide consensus among researchers working on cultural language evolution that language is a complex adaptive system (CAS; Steels, 2000; Beckner et al., 2009), which means that many linguistic phenomena are emergent properties of the complex interplay between cognitive, perceptual and social constraints on language users as they engage with each other in local interactions. Computational models are an important tool of the CAS approach because they demonstrate the consequences of these complex dynamics on a scale and time course that is unimaginable with human subjects (e.g. Baronchelli, Chater, Pastor-Satorras, & Christiansen, 2012; Beuls & Steels, 2013; Fay & Ellison, 2013). As noted by Coupé, Shuai, and Gong (2013, p. 121), these models have recently shifted from investigating the evolution of simple one-to-one mappings to more realistic linguistic communication systems. However, this increase in sophistication comes with the need for new methods of deciphering what is going on in the models. This paper proposes to recruit the concept of fitness landscapes from evolutionary biology and computer science, and to apply it to cultural language evolution. Through a case study on German, it will illustrate how fitness landscapes can help to solve long-standing puzzles in language evolution.
1.1. Linguistic Selectionism and Fitness Landscapes
This paper subscribes to the theory of cultural language evolution as proposed by Steels (2011). One of the cornerstones of the theory is the biologically-inspired mechanism of linguistic selectionism, which involves two kinds of processes:
1. Processes that create linguistic variation in a population.
2. Processes that select variants to become dominant conventions.
The theory thus hypothesizes that linguistic variants may be selected if they increase the “linguistic fitness” of a language. In order to test this hypothesis, we need a good understanding of the relations between the space of possible languages (given the linguistic variants at hand) and their linguistic fitness. Fitness landscapes can offer a valuable tool for visualizing these relations.
Fitness landscapes were first introduced in evolutionary biology by Wright (1932), who visualized the space of possible gene combinations as a field, where the “height” of the field corresponds to the fitness of a genotype (in this case its replication rate). The problem of evolution can then be conceptualized as “a mechanism by which the species may continually find its way from lower to higher peaks in such a field” (p. 358–359). This idea soon became generalized to computer science, where it is used for evolutionary optimization (Richter, 2010).
Applying this approach to cultural language evolution, we can thus generate a space of possible languages through the processes that create variation in a population. Next, we can define a “fitness function” that evaluates how well each possible language is adapted for communication. As I will explain in section 2, this fitness function aggregates several linguistic selection criteria that language users experience when engaging in linguistic interactions, such as communicative success, disambiguation power and processing efficiency.
1.2. A Linguistic Puzzle
The best way to demonstrate the validity of the proposed method is to ground it in a concrete case study of language evolution that looks like an outright challenge to the theory. The experiments in this paper focus on the evolution of the German paradigm of definite articles, which is widely considered as one of the most intriguing puzzles in linguistics. The big mystery of the paradigm is as follows. Each article is marked for three dimensions: case (nominative, accusative, dative and genitive), number (singular and plural) and gender (masculine, neuter and feminine). A fully transparent paradigm would thus consist of 24 distinct articles, one for each combination of these dimensions. As illustrated in Figure 1, however, the actual German paradigm has been evolving further and further away from such transparency. The Figure displays the paradigm as it appears in Old High German (OHG, 900–1100; Wright, 1906), Middle High German (MHG, 1100–1500; Wright, 1916) and New
High German (NHG, from 1500 onwards). Gray cells in the paradigm indicate when distinct forms collapsed into “syncretic forms” (i.e. where the same form covers multiple cells). One striking observation is that the OHG-paradigm counts twice as many distinct forms as the current system. For instance, where NHG has one syncretic form die for nominative and accusative plurals, OHG had a three-way gender distinction between masculine, neuter and feminine. So why did the speakers of German allow this more transparent system to crumble down to its current form? What can this case study tell us about language evolution?

Old High German
            Singular                              Plural
       Masc    Neut    Fem                   Masc    Neut    Fem
Nom    dër     daz̹     diu                   die     diu     deo
Acc    dën     daz̹     die                   die     diu     deo
Dat    dëmu    dëmu    dëru                  dēn     dēn     dēn
Gen    dës     dës     dëru                  dëro    dëro    dëro

Middle High German
            Singular                              Plural
       Masc          Neut          Fem           Masc    Neut    Fem
Nom    dër           daz̹           diu           die     diu     die
Acc    dën           daz̹           die           die     diu     die
Dat    [illegible]   [illegible]   [illegible]   dën     dën     dën
Gen    dës           dës           dër           dër     dër     dër

New High German
            Singular                              Plural
       Masc    Neut    Fem                   Masc    Neut    Fem
Nom    der     das     die                   die     die     die
Acc    den     das     die                   die     die     die
Dat    dem     dem     der                   den     den     den
Gen    des     des     der                   der     der     der
Figure 1. This Figure shows the system of German definite articles in three different time periods: Old High German, Middle High German and New High German.
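The arithmetic behind the “24 distinct articles” of a fully transparent paradigm, and the degree of syncretism in the modern system, can be checked with a short sketch. The data structures and the helper `nhg_article` are illustrative assumptions, not part of the paper’s model; the forms are the standard New High German definite articles.

```python
from itertools import product

CASES = ["nom", "acc", "dat", "gen"]
NUMBERS = ["sg", "pl"]
GENDERS = ["masc", "neut", "fem"]

# Every (case, number, gender) cell of a fully transparent paradigm.
cells = list(product(CASES, NUMBERS, GENDERS))

# New High German definite articles; the plural does not distinguish gender.
NHG_SINGULAR = {
    "masc": {"nom": "der", "acc": "den", "dat": "dem", "gen": "des"},
    "neut": {"nom": "das", "acc": "das", "dat": "dem", "gen": "des"},
    "fem": {"nom": "die", "acc": "die", "dat": "der", "gen": "der"},
}
NHG_PLURAL = {"nom": "die", "acc": "die", "dat": "den", "gen": "der"}


def nhg_article(case: str, number: str, gender: str) -> str:
    """Look up the NHG article for one paradigm cell (hypothetical helper)."""
    if number == "pl":
        return NHG_PLURAL[case]
    return NHG_SINGULAR[gender][case]


paradigm = {cell: nhg_article(*cell) for cell in cells}
distinct_forms = sorted(set(paradigm.values()))

# How many cells each syncretic form covers.
coverage = {
    form: [cell for cell, filled in paradigm.items() if filled == form]
    for form in distinct_forms
}

print(len(cells), "cells,", len(distinct_forms), "distinct forms")
for form in distinct_forms:
    print(form, len(coverage[form]))
```

The 24 cells are filled by only six distinct forms in New High German, with die alone covering eight cells.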
The answers suggested in the literature contradict each other. The most popular explanation is that non-systematic syncretism is simply a historical accident caused by phonological and morphological changes (Baerman, 2009). However, new syncretic forms do not randomly enter the paradigm. In fact, the data show such strong tendencies that Hawkins (2004, p. 63–86) argues that the increase in syncretism follows a universal hierarchy for case (nom > acc > dat) and gender (masc, fem > neut), whereby distinctions in lower dimensions are lost before distinctions in higher dimensions (e.g. dative before accusative). However, none of these answers can explain why the paradigm declined so rapidly from OHG to MHG, but then remained relatively stable for more than five centuries despite the availability of simpler variants in the Low German dialects (Shrier, 1965).
2. Experimental Set-Up
The predictions of the aforementioned hypotheses can be tested through computational experiments. The experiments reported here start with a bidirectional processing model of German in Fluid Construction Grammar (see van Trijp, 2011, 2013 and www.fcg-net.org/demos/german-case/ for an online demo).
2.1. Processes that Create Variation
While the rest of the grammar remains fixed, the experiments generate a space of possible variations by changing the paradigm of definite articles, after which each variation can be evaluated in terms of its linguistic fitness. Since we are interested in explaining how the OHG-paradigm may have evolved into its current form, the experiment starts with a computational reconstruction of the OHG-system.
Variation is caused by two pairs of widely attested processes. The first pair consists of phonological processes that either “erode” or “expand” forms. For instance, the process apocope is a force of erosion in which the last sound of a word is dropped, e.g. [dɛmu] → [dɛm]. Phonological expansion works in the other direction, where new sounds can be attached to a form. Phonological processes can leave the distinctions of a paradigm intact, or they can cause forms to collapse.
The second set of processes consists of attraction and repulsion. Attraction may happen when two forms are phonologically close to each other. In this case, the form with the highest type frequency will attract the other form and effectively usurp its functions. For instance, example (1) illustrates the phonological distance from the OHG-article die (in the center of the spider chart) to the other articles of the OHG-paradigm. As can be seen, die [diə] is phonologically closest to diu [diu] and deo [deo]. There is thus a high probability that one of these forms will attract the others, and thereby increase the syncretism of the paradigm.
(1) [Spider chart: phonological distance from die, at the center, to the other OHG articles dër, diu, deo, dëru, daz̹, dën, dēn and dëmu.]
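The erosion and attraction processes described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Fluid Construction Grammar implementation: forms are plain strings with one character per sound, plain edit distance stands in for the paper’s phonological distance, type frequency is taken to be the number of cells a form fills, and the names `apocope`, `edit_distance`, `attraction_step` and the `max_distance` threshold are all hypothetical.

```python
from collections import Counter


def apocope(form: str) -> str:
    """Erosion: drop the last sound (here, the last character) of a form."""
    return form[:-1] if len(form) > 1 else form


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, a crude stand-in for phonological distance."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def attraction_step(paradigm: dict, form_a: str, form_b: str, max_distance: int = 1) -> dict:
    """Attraction: if two forms are close enough, the form with the higher
    type frequency takes over every cell filled by the other form."""
    if edit_distance(form_a, form_b) > max_distance:
        return dict(paradigm)
    frequency = Counter(paradigm.values())
    if frequency[form_a] >= frequency[form_b]:
        winner, loser = form_a, form_b
    else:
        winner, loser = form_b, form_a
    return {cell: (winner if form == loser else form) for cell, form in paradigm.items()}


# Toy fragment with illustrative cell assignments (not the full OHG paradigm).
toy = {
    ("nom", "pl", "masc"): "diə",
    ("acc", "pl", "masc"): "diə",
    ("acc", "sg", "fem"): "diə",
    ("nom", "pl", "neut"): "diu",
    ("acc", "pl", "neut"): "diu",
    ("nom", "pl", "fem"): "deo",
}

print(apocope("dɛmu"))
print(edit_distance("diə", "diu"), edit_distance("diə", "deo"))
print(sorted(set(attraction_step(toy, "diə", "diu").values())))
```

In this toy fragment die absorbs diu because the two forms differ in a single sound and die fills more cells. Repulsion would be the inverse move: assigning a fresh form to one cell of a syncretic form.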
The opposite of attraction is repulsion. Repulsion occurs when one of the cells that is covered by a syncretic form breaks free and gets its own distinct form. Repulsion therefore increases the transparency of a paradigm.
2.2. Linguistic Fitness
Each generated variation is tested for both parsing and production against a corpus of 360 declarative utterances, which exposes the variation to all possible combinations of case, number and gender in transitive and ditransitive patterns. In parsing, the model tries to disambiguate the argument structure that underlies the utterance (i.e. ‘who did what to whom’). The linguistic fitness of a variation is evaluated as a weighted average of four measures: disambiguation power (based on the number of utterances that the language is able to disambiguate), processing efficiency (based on the number of
primitive operations that the language faculty has to perform in order to parse or produce sentences), ease of articulation (based on the number of movements that articulators such as the lips and tongue have to make when pronouncing sounds) and acoustic distinctiveness (based on the phonological distance between words). All of these measures are formally defined and discussed by van Trijp (2013). Disambiguation power weighs 85%, processing efficiency 13%, and ease of articulation and acoustic distinctiveness weigh 1% each. These weights were obtained experimentally through standard feature weighting methods in order to find the best fit of the model to the empirical data on the evolution of German articles.
3. Experimental Results
The resulting fitness landscape for German is shown in Figure 2. The X-axis shows the average length of the articles in each variation, and the Y-axis shows the number of distinct case forms (at most 18, because the genitive is ignored in the experiments as it does not mark a core argument role). The height of the landscape corresponds to linguistic fitness. The first remarkable result is that, perhaps counterintuitively, the language does not require many distinct articles to reach a high fitness value: two distinct forms already push the fitness beyond 80%, and all other variations are very close to each other, with linguistic fitness values between 85 and 92%.
[Figure 2: surface plot with axes “Average article length”, “Number of articles” and “Linguistic Fitness”; contour bands 0.8-0.9 and 0.9-1.0.]
Figure 2. The fitness landscape for German definite articles. The paradigm is quite robust to change and already reaches a linguistic fitness of more than 80% with only two articles. The contour map beneath the landscape shows that the fittest variants have paradigms of 3 to 7 articles.
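The height plotted in Figure 2 is the weighted average defined in section 2.2. A minimal sketch of that aggregation follows, assuming each of the four measures has already been normalised to the range 0 to 1; the measure names as dictionary keys, the function `linguistic_fitness` and the example scores are illustrative and not taken from the paper (the measures themselves are defined in van Trijp, 2013).

```python
from typing import Mapping

# Weights reported in section 2.2.
WEIGHTS = {
    "disambiguation_power": 0.85,
    "processing_efficiency": 0.13,
    "ease_of_articulation": 0.01,
    "acoustic_distinctiveness": 0.01,
}


def linguistic_fitness(scores: Mapping[str, float]) -> float:
    """Weighted average of the four measures, each assumed to lie in [0, 1]."""
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return weighted_sum / total_weight


# Hypothetical scores for one paradigm variant (not values from the paper).
example_variant = {
    "disambiguation_power": 1.0,
    "processing_efficiency": 0.7,
    "ease_of_articulation": 0.8,
    "acoustic_distinctiveness": 0.6,
}

print(round(linguistic_fitness(example_variant), 3))
```

Because disambiguation power carries 85% of the weight, any variant that still disambiguates most utterances lands on the high plateau, which is consistent with the observation that two distinct forms already push fitness beyond 80%.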
The large plateau of high fitness values indicates that the language is remarkably robust for changes in its case system, which may explain the enormous variation in the case systems of German dialects (Shrier, 1965). Nevertheless, the contour map beneath the landscape suggests that some variants on the plateau still have higher fitness values, with the best paradigms consisting of three to seven articles.
Figure 3 zooms in on all values above 60% for linguistic fitness, disambiguation power and processing efficiency; and it marks where Old and New High German are situated in the landscape. The Figure reveals that the plateau is in fact what cyclists call a “false flat”, meaning that there is a low-gradient climb. The results thus confirm that the evolution of the German paradigm of definite articles can be conceptualized as an upwards movement in a fitness landscape without any intermediate “valleys” that need to be bridged.
[Figure 3: line plot of linguistic fitness, disambiguation power and processing efficiency (Y-axis, 0.6 to 1) against the number of markers (X-axis, 1 to 18), with the positions of Old, Middle and New High German marked.]
Figure 3. This Figure shows the processing efficiency of different article systems (ranging from a fully syncretic paradigm to a maximally transparent one), their impact on the language’s disambiguation power and its linguistic fitness. The Y-axis zooms in on values of more than 60%. The Figure shows that the language achieves a higher fitness score with the NHG-paradigm than with the OHG-paradigm.
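The idea of an upward path without intermediate valleys can be made concrete with a greedy ascent over a one-dimensional toy landscape. Everything here is an assumption made for illustration: the fitness values in `TOY_LANDSCAPE` are invented (they only mimic a gentle slope with a peak at five forms), the state is simply the number of distinct forms, and a move changes that number by one, loosely standing in for attraction (one form fewer) and repulsion (one form more).

```python
from typing import Callable, List

# Invented fitness values indexed by the number of distinct forms (2 to 18).
TOY_LANDSCAPE = {n: 0.92 - 0.005 * abs(n - 5) for n in range(2, 19)}
TOY_LANDSCAPE[1] = 0.60  # a fully syncretic paradigm scores much lower


def fitness(n_forms: int) -> float:
    return TOY_LANDSCAPE[n_forms]


def neighbours(n_forms: int) -> List[int]:
    """One form fewer (attraction) or one form more (repulsion)."""
    return [m for m in (n_forms - 1, n_forms + 1) if m in TOY_LANDSCAPE]


def hill_climb(start: int, fit: Callable[[int], float]) -> List[int]:
    """Greedy ascent: move to the fittest neighbour until none is fitter."""
    path = [start]
    current = start
    while True:
        best = max(neighbours(current), key=fit)
        if fit(best) <= fit(current):
            return path
        current = best
        path.append(current)


def has_no_valley(path: List[int], fit: Callable[[int], float]) -> bool:
    """True if fitness never decreases along the path."""
    values = [fit(state) for state in path]
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


path = hill_climb(10, fitness)
print(path, has_no_valley(path, fitness))
```

On a “false flat” like this one, every step towards the peak is a small improvement, so selection alone can carry the paradigm there. A landscape with a valley between start and peak would instead stop the greedy ascent at a local optimum.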
The smooth evolution towards NHG is also confirmed when looking at the fittest linguistic variants generated through the processes of variation. For instance, just as what happened in German, early variants typically lose the gender distinctions in nominative and accusative plural forms (i.e. die vs. deo vs. diu). Collapsing these forms significantly increases the processing efficiency of the paradigm. Since gender and number are also marked on German nouns, the loss of the gender distinction does not affect the language’s disambiguation power.
When looking at the variants that evolve in the other direction – namely increasing the number of distinct forms through repulsion – we first see a reestablishment of the nominative-accusative distinction for singular-neuter nouns and for all plural forms. Twelve articles then suffice for achieving maximum disambiguation power if they are arranged in a proper way. However, each additional form comes at the cost of processing efficiency, which explains why smaller paradigms are preferred over larger ones.
4. Discussion and Conclusion
The experimental results of the case study provide support for the theory of cultural language evolution as proposed by Steels (2011). More specifically, the theory predicts that linguistic variants may become dominant in a population if they offer a selective advantage for communication. The case study has demonstrated that linguistic selectionism can indeed explain even the seemingly erratic evolution of German definite articles. Moreover, the fittest variants that were generated by the computational model correspond to those changes that actually happened in the German language.
The experiments therefore cast serious doubt on the alternative explanations for the evolution of German definite articles as discussed in section 1.2. First, all systematic trends in the increase of syncretism emerge as a side-effect of linguistic selection. The model thus shows that there is no need to posit universal case or gender hierarchies, as hypothesized by Hawkins (2004). Secondly, it is very unlikely that the syncretic forms of the paradigm evolved as a historical accident rather than as a result of selection. Without selection, all variants have an equal chance of “survival”, which results in an explosion of the space of possible variations. In such a large space, it is highly improbable that the language has followed a path that consistently moves upwards in the fitness landscape by sheer accident.
In sum, this paper has demonstrated that fitness landscapes are a valuable tool for computational experiments in cultural language evolution. Fitness landscapes allow researchers to visualize the relations between a space of possible languages and their linguistic fitness, which helps to understand the complex dynamics between cognitive, perceptual and social forces that shape and reshape the structure and knowledge of language.
Acknowledgements
The work presented in this paper was funded by and conducted at Sony CSL Paris. The idea to apply fitness landscapes to cultural language evolution first came up in discussions with Luc Steels (Sony CSL Paris - CSIC-UPF Barcelona) and Pieter Wellens (VUB Artificial Intelligence Laboratory Brussels) in the Summer of 2012, and was later also suggested to me by Robert C. Berwick (MIT) and Dan Dediu (Max Planck Institute Nijmegen) at the Workshop on Language, Cognition and Computational Models (Paris, 28-29 May 2013).
References
Baerman, M. (2009). Case syncretism. In A. Malchukov & A. Spencer (Eds.), The Oxford handbook of case (pp. 219–230). Oxford: OUP.
Baronchelli, A., Chater, N., Pastor-Satorras, R., & Christiansen, M. (2012). The biological origin of linguistic diversity. PLoS ONE, 7(10), e48029.
Beckner, C., Blythe, R., Bybee, J., Christiansen, M. H., Croft, W., Ellis, N. C., Holland, J., Ke, J., Larsen-Freeman, D., & Schoenemann, T. (2009). Language is a complex adaptive system: Position paper. Language Learning, 59(s1), 1–26.
Beuls, K., & Steels, L. (2013). Agent-based models of strategies for the emergence and evolution of grammatical agreement. PLoS ONE, 8(3), e58960.
Coupé, C., Shuai, L., & Gong, T. (2013). Review of the 9th international conference on the evolution of language (evolang9). Biolinguistics, 7, 112–131.
Fay, N., & Ellison, T. M. (2013). The cultural evolution of human communication systems in different sized populations: Usability trumps learnability. PLoS ONE, 8(8), e71781.
Hawkins, J. (2004). Efficiency and complexity in grammars. Oxford: OUP.
Richter, H. (2010). Evolutionary optimization and dynamic fitness landscapes: From reaction-diffusion systems to chaotic CML. In I. Zelinka, S. Celikovský, H. Richter, & G. Chen (Eds.), Evolutionary algorithms and chaotic systems (pp. 409–446). Berlin: Springer.
Shrier, M. (1965). Case systems in German dialects. Language, 41(3), 420–438.
Steels, L. (2000). Language as a complex adaptive system. In M. Schoenauer (Ed.), Proceedings of PPSN VI: Lecture notes in computer science (pp. 17–26). Berlin: Springer-Verlag.
Steels, L. (2011). Modeling the cultural evolution of language. Physics of Life Reviews, 8(4), 339–356.
van Trijp, R. (2011). Feature matrices and agreement: A case study for German case. In L. Steels (Ed.), Design patterns in Fluid Construction Grammar (pp. 205–235). Amsterdam: John Benjamins.
van Trijp, R. (2013). Linguistic assessment criteria for explaining language change: A case study on syncretism in German definite articles. Language Dynamics and Change, 3(1), 105–132.
Wright, J. (1906). An Old High German primer (2nd ed.). Oxford: Clarendon Press.
Wright, J. (1916). A Middle High German primer (2nd ed.). Oxford: Clarendon Press.
Wright, S. (1932). The roles of mutation, inbreeding, crossbreeding, and selection in evolution. In Proceedings of the sixth international congress on genetics (pp. 355–366).
SOCIAL WORD LEARNING STRATEGIES IN DIFFERENT CULTURES
PAUL VOGT, J. DOUGLAS MASTIN
Tilburg center for Cognition and Communication, Tilburg University, PO Box 90153, Tilburg, 5000 LE, the Netherlands
One of the distinguishing features of the human language capacity is the ability to acquire relatively large vocabularies. This is a remarkable feature, because learning word-meaning mappings is a hard problem due to huge amounts of referential uncertainty (Quine, 1960). For example, Vogt (2012) has shown that under idealised circumstances, a statistical learner cannot learn an adult-sized lexicon within a lifetime when there is no control for referential uncertainty. Given the human capacity to learn large vocabularies, we must have evolved word learning strategies to reduce referential uncertainty. Of the large variety of possible word learning strategies, social strategies, such as coordinating joint attention (Tomasello, 1995), are perhaps considered the most distinctive for human language evolution. The question, however, is to what extent specific social strategies are applied universally across cultures.
Keller (2012) has described three prototypical learning environments in which children are raised: industrial urban societies, non-industrial urban environments, and non-industrial rural communities. These three learning environments support the development of different levels of autonomy: individual psychological autonomy in industrial urban, communal psychological autonomy in non-industrial urban and (communal) action autonomy in non-industrial rural communities. These differences very much reflect the needs that the lifestyles of the communities demand of children when they grow up (Keller, 2012).
We investigate how different social word learning strategies are applied in these three learning environments. To this aim, we have carried out longitudinal field studies among 40 infants and their (extended) families in the Netherlands, urban Mozambique and rural Mozambique.
We carried out naturalistic observations and collected scores on expressive vocabulary through parental checklists at infants’ average ages of 1;1, 1;5 and 2;1. Of each recording, we transcribed child-directed speech (CDS), child speech, attentional states (e.g., object play, dyadic interactions and joint attention), non-verbal communication (e.g., deictic gestures, conventional gestures, ritualised
play and motoric stimulation) and amounts of time infants engage with different members of the (extended) family. In addition to assessing the amounts with which these occur, we assessed how they correlate with vocabulary size.
The results reveal major differences in language socialisation. For instance, we see that Dutch infants received twice as much CDS as urban Mozambican infants, who were exposed to five times as much CDS as rural infants. In addition, Dutch infants engaged more in object related activities, such as object play and joint attention, and less in dyadic interactions. In Mozambique we found that joint attention has a positive correlation to vocabulary in the urban community, but a negative one in the rural community. In both these communities the amounts infants engaged in dyadic interactions correlate positively with later vocabulary. These results are reflected in the non-verbal signals addressed to infants. While in the Netherlands there were substantially more deictic gestures, in urban Mozambique there was substantially more ritualised play, and in rural Mozambique we observed relatively more motoric stimulation. Finally, language socialisation in the Netherlands was mostly facilitated by the parents, while in Mozambique socialisation was much more distributed over the extended families and contained many multiparty interactions. Their language socialisation was to a large extent also facilitated by siblings; in the rural community, siblings provided as much socialisation as the mothers.
To conclude, our extensive field study shows remarkable quantitative and qualitative differences in language socialisation, which is in line with previous ethnographic research (Kuntay, Nakamura, & Ates Sen, in press). In particular, the results confirm the characterisation of the three prototypical learning environments proposed by Keller (2012).
While the Dutch children appear to be more cognitively stimulated, the development of social knowledge and motoric skills is more stimulated in rural Mozambique, while the urban Mozambican learning environment is a combination of both. Since the lifestyle of the industrial society is relatively new on an evolutionary timescale, social word learning strategies that are considered crucial based on studies from industrial societies, such as joint attention, may have played a less crucial role in the origins of language.
References
Keller, H. (2012). Autonomy and relatedness revisited: Cultural manifestations of universal human needs. Child Development Perspectives, 6(1), 12–18.
Kuntay, A. C., Nakamura, K., & Ates Sen, B. (in press). Crosslinguistic and crosscultural approaches to pragmatic development. In D. Matthews (Ed.), Pragmatic development.
Quine, W. V. O. (1960). Word and object. Cambridge University Press.
Tomasello, M. (1995). Joint attention as social cognition.
Vogt, P. (2012). Exploring the robustness of cross-situational learning under Zipfian distributions. Cognitive Science, 36, 726–739.
THE MENTAL SYNTHESIS THEORY: THE DUAL ORIGIN OF HUMAN LANGUAGE
ANDREY VYSHEDSKIY, PHD
Boston University, 5 Cummington St. Boston, MA, USA
The origin of language remains one of the greatest mysteries of all time. The last 150 years, since Darwin’s theory, have been marked by great discoveries in genetics and paleoanthropology. However, the discussion of the evolution of the human intellect and language is as vigorous as it was in Darwin’s times. While studying the neuroscience of consciousness, I was struck by certain facts about mental imagery that seemed to shed some light on the process of the evolution of the human mind. That research resulted in a simple, parsimonious theory of the evolution of the human mind that makes clear and testable predictions. I propose that two independent forces drove the evolution of the human brain. First, under the strong selection pressure from stalking motionless predators hidden by the tall grass of the savanna, our ancestors evolved greater top-down control of perception. The quality and precision of the stone tools manufactured by hominins starting 2.5 million years ago indicate that the hominin mind acquired an ability to generate a mental template of the future chopper; their visual system had evolved to actively and intentionally control its percept. At the same time, a greater demand for cooperation influenced the development of a speech box. The archeological and genetic data suggest that as of 600,000 years ago hominins acquired a nearly modern vocal apparatus. When a single mutation around 100,000 years ago triggered the delay of the maturation of the prefrontal cortex, it created a favorable environment for the two developing traits to collude.
The synergetic combination of a greater control of perception coupled with a modern speech apparatus resulted in the cultural invention of a syntactic communication system and the concomitant acquisition of mental synthesis: the ability to voluntarily imagine any novel object. These behaviorally modern humans excelled at performing mental simulations, which resulted in the dramatic acceleration of technological progress; the human population exploded and humans quickly settled most habitable areas of the planet. Armed with the ability to mentally simulate any plan and then to communicate it to their companions, humans rapidly became the dominant species.
1. The neurological mechanism of mental synthesis
This manuscript presents a theory of the evolution of the human mind and suggests experiments that could be done to test, refute, or validate the hypothesis. The basis of the theory is a simple, yet fundamental question: what happens neurologically when two objects, never before seen together (say, an apple on top of a whale), are imagined together for the first time? We know that
a familiar object, such as an apple or a whale, is represented in the brain by a neuronal ensemble. When one sees or recalls such an object, the neurons of that object’s neuronal ensemble tend to activate into synchronous resonant activity (Quiroga et al., 2008). The neuronal ensemble binding mechanism, based on the Hebbian principle “neurons that fire together, wire together,” came to be known as the binding-by-synchrony hypothesis (Singer, 2007). However, while the Hebbian principle explains how we perceive a familiar object, it does not explain the infinite number of novel objects that humans can voluntarily imagine. The neuronal ensembles encoding those objects cannot jump into spontaneous synchronized activity on their own since the parts forming those novel images have never been seen together. I propose that to account for imagination, the binding-by-synchrony hypothesis would need to be extended to include the phenomenon of mental synthesis whereby the brain actively and intentionally synchronizes independent neuronal ensembles into one morphed image. Thus, the apple neuronal ensemble is synchronized with the whale neuronal ensemble, and the two disparate objects are perceived together. The synchronization mechanism of mental synthesis is likely responsible for many imaginative and creative traits that philosophers and scientists have recognized as being uniquely human, despite not having a precise neurological understanding of the process. For example, Lev Vygotsky claims, “Imagination is a new formation that is not present in the consciousness of the very young child, is totally absent in animals, and represents a specifically human form of conscious activity” (Vygotsky, 1933). Ian Tattersall writes, “... if there is one single thing that distinguishes humans from other life-forms, living or extinct, it is the capacity for symbolic thought: the ability to generate complex mental symbols and to manipulate them into new combinations. 
This is the very foundation of imagination and creativity: of the unique ability of humans to create a world in the mind...” (Tattersall, 1999). One function that uses mental synthesis is human language. When we speak we use mental synthesis to communicate a novel image (“My house is the second one on the left, just across the road from the red gate”) and we rely on the listener to use mental synthesis to make sense of our words and follow our instructions. Flexible syntax, prepositions, adjectives, verb tenses, and other common elements of grammar, all facilitate the human ability to communicate an infinite number of novel images with the use of a finite number of words. Without mental synthesis it would be impossible to understand the difference between the task of “putting a bowl behind a cup” and the task of “putting a bowl in front of the cup.” The innate function of mental synthesis and a culturally acquired communication system are the two main components of human language. What is the neurological basis of mental synthesis? Neuronal ensembles storing sensory memories are largely located in the posterior cortex (temporal, parietal and occipital lobes), while the temporal organization of behavior is primarily a function of the frontal cortex (Fuster, 2008). Single neuron
recordings in monkeys (Miyashita, 2004) and functional brain imaging studies in humans (Fuster, 2008) demonstrated that memory recall is associated with activation of neuronal ensembles in the posterior sensory cortex and is under the executive control of the prefrontal cortex (PFC). I hypothesized that mental synthesis is performed by the PFC, which activates and synchronizes independent neuronal ensembles in-phase with each other. When two or more independent neuronal ensembles are activated to fire synchronously in one mental frame they are consciously experienced as one unified object or scene. In this process humans can manufacture an unlimited number of novel mental images and plan their future actions through mental simulation of the physical world. The PFC can be viewed as a puppeteer controlling its puppets (memories encoded in neuronal ensembles stored in the posterior cortex). By pulling the strings that connect the puppeteer to its puppets, the PFC activates and changes the firing phase of the neuronal ensembles retrieved into working memory. Phase-synchronized neuronal ensembles are consciously experienced as a novel whole object or scene. For example, to imagine something you have never seen before, such as your favorite cup on top of your computer’s keyboard, your PFC (1) activates the neuronal ensemble of the cup, (2) activates the neuronal ensemble of the keyboard, and then (3) synchronizes the two ensembles in time. In this process the PFC relies on isochronous connections between the neurons of the PFC and the neurons of the posterior sensory cortex. Without isochronous connections it would be impossible to synchronize the neuronal ensemble puppets located at varying physical distances from the PFC puppeteer. There is significant evidence that isochronicity of the connections between the puppeteer and its puppets may be accomplished by experience-dependent differential myelination of those connections (Fields, 2008).
Additional axonal insulation provided by multiple layers of the myelin sheath can dramatically increase conduction velocity and compensate for the uneven distance between the puppeteer and its puppets (Salami et al., 2003). The experience essential for training the isochronous connections is provided by a syntactic communication system.
2. The dual origin of human language
Acquisition of language is commonly cited as the reason that humans dramatically changed their behavior around 100,000 years ago. However, to this day, there is vigorous scientific discussion around the following two questions: what part of language was acquired 100,000 years ago and why was language acquired so abruptly (Hauser et al., 2002). Charles Darwin envisioned the origin of language simply as the acquisition of a mechanical ability to produce sounds, which provides the basis for a communication system. A modified version of that hypothesis claims that a culturally acquired communication system enabled language acquisition (Ulbaek, 1998). Opponents of the “nurture” hypothesis argue that human syntactic language cannot be taught to animals and therefore
humans must be unique in their genetic predisposition to language. Spearheading the “nature” hypothesis is Noam Chomsky who argues that a genetic mutation, which took place 100,000 years ago, enabled the innate faculty of language. Both parties agree that hominin behavior changed dramatically some 100,000 years ago, but the nurture party argues that a communication system was culturally acquired at that time, while the nature party argues that a genetic mutation predisposed humans to language acquisition. My analysis suggests that the nurture and nature theories are not mutually exclusive and that, in fact, both theories are correct. Sometime around 100,000 years ago, there was a genetic mutation that predisposed humans to the acquisition of mental synthesis, the innate component of language, and there was a change from a non-syntactic to a syntactic communication system, which quickly followed the genetic mutation and completed the giant step of language acquisition. Humans now possessed both components of language: a communication system complete with spatial prepositions and flexible syntax and the ability to synthesize words into an infinite number of novel mental images. Until 100,000 years ago, the two components of language developed slowly and independently for several million years. Natural selection mandated that our ancestors should be in groups in order to survive. Cooperation was in high demand for the highly social and very adventurous H. erectus, and it was aided by the verbal communication system, which improved in H. erectus starting some two million years ago. The archeological (Tattersall, 1999) and genetic (FOXP2, Krause et al., 2007) evidence unequivocally points to the fact that, as of 600,000 years ago, hominins had acquired a nearly modern speech apparatus. Noam Chomsky is absolutely right: the H.
erectus species that existed 600,000 years ago lacked the innate part of human language; the species lacked mental synthesis and was thus unable to mentally synthesize any novel images. Without mental synthesis, H. erectus could not use a flexible syntax, verb tenses or spatial prepositions. Their communication system was a non-syntactic, finite communication system, similar to the communication system of chimpanzees, but with a larger vocabulary enabled by the nearly modern articulate vocal apparatus. Even in the absence of mental synthesis, the large number of words provided a significant advantage for the mobile H. erectus who was often moving from one place to another. The syntactic communication system acquired by humans 100,000 years ago grew out of this non-syntactic communication system of H. erectus. Concurrently, another independent evolutionary force was preparing the hominin brain for acquisition of the innate part of language. In their new habitat, hominins were continuously selected for their superior ability to recognize motionless, stalking predators partially obscured by the tall savanna grass. I hypothesize that predator recognition in the savanna became significantly more difficult for hominins and that the selection of hominins capable of faster and
better predator recognition became a major evolutionary force. While their chimpanzee-like ancestors were safe from most ground-dwelling predators in the treetop canopies, hominins venturing into the open savanna were exposed to many new predators including big cats, hyenas, and the now extinct saber-toothed cats. Unable to fight off the bigger and faster predators, the four-foot-tall hominins had only one option for survival: long-distance recognition and avoidance of the stalking, motionless predators. All primates, including humans, recognize motionless objects more slowly than moving objects (Matsuno & Tomonaga, 2006). For our tree-dwelling chimpanzee-like ancestors, fast recognition of motionless stalking felines was not such a big deal: to attack, a cat had to move itself toward a tree, making itself much easier to detect. However, when hominins moved away from the trees into the savannas, the situation reversed. Now the hominins were moving and the camouflaged predators could remain motionless and blend into the background. Recognition of stationary ambiguous visual stimuli is not a trivial matter. The hominin brain was rewired to facilitate the detection of motionless predators concealed by vegetation. As a result of these neurological changes, modern humans detect stationary visual targets an order of magnitude faster than chimpanzees (Matsuno & Tomonaga, 2006) and integrate local elements significantly better than chimpanzees (Fagot & Tomonaga, 2001). The PFC plays an active role in the visual processing of ambiguous stimuli (Windmann et al., 2006). It is likely that the increased control by the PFC of neuronal ensembles in the posterior cortex resulting in an improved recognition of hidden, stalking predators began in australopithecines and improved continuously over the next six million years of hominin evolution.
Stone tools manufactured by hominins indicate that as of 2.6 million years ago, the hominin PFC was able to actively and intentionally control its percept. According to Ian Tattersall, “To make a carefully shaped hand ax from a lump of rock not only demanded a sophisticated appreciation of how stone can be fashioned by fracture, but a mental template in the mind of the toolmaker that determined the eventual form of the tool” (Tattersall, 1999). The neuronal ensemble of a mental template was clearly an active modification of the neuronal ensemble of the cobble. The PFC of H. habilis was able to voluntarily shift neurons representing flakes out-of-phase with the neuronal ensemble of the cobble, to perceive the remaining synchronously firing neurons as the mental template of the chopper. H. habilis was then able to physically shape the cobblestone to match his mental template of the chopper. H. habilis was not capable of synchronizing multiple neurons in-phase with each other (mental synthesis), but he was able to selectively desynchronize neurons in order to perceive an object he had never experienced before. Acquisition of the ability to form a mental template was an evolutionary stepping-stone on the road towards mental synthesis. In the next 2.5 million years, predation further selected hominins for their ability to analyze the savanna scenery for the presence of motionless stalking
predators concealed by vegetation. The improvement in quality of the stone tools manufactured by H. erectus in comparison to those made by H. habilis is evidence of the improving top-down control of the PFC over its conscious percept: H. erectus was better able to detect “invisible” predators concealed by vegetation and to see an “invisible” hand axe inside a flint stone. As of 600,000 years ago, H. erectus had a nearly modern speech apparatus and was able to intentionally control its perception by desynchronizing parts of existing neuronal ensembles. However, the species was not yet capable of synchronizing independent neuronal ensembles into novel mental images; it was not capable of mental synthesis. Two events separated H. erectus from acquiring mental synthesis and an infinite syntactic communication system, one genetic and one cultural: a genetic mutation that would significantly slow down maturation of the PFC and the concurrent cultural acquisition of a syntactic communication system. Maturation of the PFC in modern humans is delayed dramatically compared to chimpanzees and monkeys (Liu et al., 2012). Overall, humans are born with a less mature brain and develop 1.5 to 2 times slower than chimpanzees: molar teeth erupt three years later and humans become sexually active roughly five years after the chimps do (Zollikofer & Ponce de León, 2010). However, the delay in synaptic maturation in the PFC from a few months in chimpanzees and macaques to more than five years in humans (Liu et al., 2012) and the delay in axon myelination in the PFC (Miller et al., 2012) are much more dramatic compared to this overall delay in maturation. The mental synthesis theory connects a specific genetic mutation to language acquisition and describes the neurological mechanism affected by the mutation. I became convinced that the genetic mutation that triggered the dramatic delay of the maturation of the PFC was the “last” mutation, which finally enabled language acquisition.
This conviction was based on the following logic: On its own, without the simultaneous acquisition of a full syntactic language, the genetic mutation that caused this dramatic delay in the maturation schedule of the PFC is completely deleterious and therefore highly unlikely to become “fixed” within a population. A developmentally delayed recipient of this mutation would have had a significantly longer childhood, a longer dependence on parents, and an increased chance of early death as a consequence of such childhood immaturity. This mutation becomes advantageous only with the concurrent acquisition of a syntactic communication system, in which case the mutation would have aided in the ontogenetic development of isochronous connections between the PFC and the posterior sensory cortex. Extending the period of neuroplasticity would have provided the PFC puppeteer several extra years to fine-tune its voluntary control over the firing phase of the neuronal ensemble puppets located in the posterior sensory cortex. This fine control of the firing phase of the neuronal ensembles is crucial for synchronization of those neural ensembles (which lies at the basis of humans’ ability to synthesize any novel mental images in the
process of mental synthesis). The physical length of connections between the PFC and the memory storage areas in the posterior cortex varies at least tenfold. The development of isochronic connections to neuronal ensembles located in the posterior cortex likely involves adjustment of conduction velocity in individual fibers via experience-dependent differential myelination of those fibers. This experience is normally provided by a syntactic communication system. Without early exposure to a syntactic communication system, even modern children do not acquire mental synthesis. A non-syntactic communication system (e.g. home sign) is inadequate for ontogenetic acquisition of isochronic connections between the PFC puppeteer and the neuronal ensemble puppets located in the posterior cortex. A modern child who is linguistically deprived during the critical period is unable to convert a finite non-syntactic communication system (e.g. home sign) into an infinite syntactic communication system on his or her own (Grimshaw et al., 1998; Curtiss, 1988; Morford, 2003; Hyde et al., 2011; Ramírez et al., 2012). However, as has been seen in the case of deaf children in Nicaragua, it is possible for a group of children to spontaneously invent a syntactic communication system (Kegl et al., 1999; Senghas, 2004). It follows that a syntactic communication system must have been invented by at least two hominins who were living together during their critical period. Once two children (most likely siblings or twins), both carriers of the mutation that delayed the maturation of the PFC, invented a few prepositions, they would have converted their tribe’s non-syntactic finite communication system into an infinite syntactic communication system. With just a few prepositions, their normal conversations with each other would have trained the isochronic connections between their prefrontal and posterior cortical areas, thus allowing them to attain mental synthesis.
A mutation that was deleterious in the absence of a syntactic communication system became a highly advantageous mutation due to the simultaneous acquisition of a culturally transmitted syntactic communication system and mental synthesis. The mental synthesis theory proposes that the two parts of language were propelled by unrelated evolutionary forces until about 100,000 years ago: the communication system developed independently of the PFC’s ever-increasing control of perception (future mental synthesis); there was no synergy in their development. This separation explains why the process was so slow until 100,000 years ago. Neither the slowly increasing top-down control of perception by the PFC nor the improving speech apparatus was enough on its own to trigger acquisition of either mental synthesis or a syntactic communication system. However, their combination was. Increased top-down control of perception and improvements in speech accumulated a necessary critical mass and colluded some time around 100,000 years ago to produce the uniquely human ability of mental synthesis. Mental synthesis (the innate, natural component of language) enabled adding
prepositions, adjectives and verb tenses to the communication system. The finite, non-syntactic communication system of H. erectus was then converted into the infinite, recursive syntactic language that we know today. At the same time, the expanding communication system (the nurture component of language) contributed to the further development of mental synthesis in successive generations of children. The humans coming out of Africa some 65,000 years ago were very much like us. They were now in possession of both components of human language: the culturally acquired syntactic communication system along with the innate predisposition towards mental synthesis.
3. A wish list of experiments
An important component of a theory is that it should be falsifiable. During the development of the mental synthesis theory, I had a major concern: how do I make a theory about mainly subjective phenomena (internal thoughts) falsifiable? Therefore, in the complete monograph, I discuss some predictions from the theory and provide an accompanying “wish list” of experiments that could be done to test, refute or validate those predictions. Please download the complete monograph from: www.mobilereference.com/a/mind.pdf
References
Fagot, J., & Tomonaga, M. (2001). Effects of element separation on perceptual grouping by humans and chimpanzees. Animal Cognition, 4, 171–177.
Fields, R. D. (2008). White matter in learning. Trends in Neurosciences, 31, 361–370.
Fuster, J. M. (2008). The prefrontal cortex. Academic Press.
Grimshaw et al. (1998). First-language acquisition in adolescence. Brain and Language, 63, 237–255.
Hauser et al. (2002). The faculty of language: What is it, who has it? Science, 298, 1569–1579.
Hyde, D. C. et al. (2011). Spatial and numerical abilities without a complete natural language. Neuropsychologia, 49(5), 924–936.
Kegl, J. et al. (1999). In Language Creation and Language Change. MIT Press.
Krause, J. et al. (2007). The derived FOXP2 variant of modern humans was shared with Neandertals. Current Biology, 17(21), 1908–1912.
Liu, X. et al. (2012). Extension of cortical synaptic development distinguishes humans from chimpanzees and macaques. Genome Research, 22, 611–622.
Matsuno, T., & Tomonaga, M. (2006). Visual search for moving and stationary items in chimpanzees and humans. Behavioural Brain Research, 172, 219–232.
Miller, D. J. et al. (2012). Prolonged myelination in human neocortical evolution. Proceedings of the National Academy of Sciences, 109, 16480–16485.
Miyashita, Y. (2004). Cognitive memory: cellular and network machineries and their top-down control. Science Signaling, 306(5695), 435.
Morford, J. (2003). Grammatical development in adolescent first-language learners. Linguistics, 41, 681–721.
Quiroga, R. Q., Kreiman, G., Koch, C., & Fried, I. (2008). Sparse but not ‘grandmother-cell’ coding in the medial temporal lobe. Trends in Cognitive Sciences, 12(3), 87–91.
Ramírez, N. F., Lieberman, A. M., & Mayberry, R. I. (2012). The initial stages of first-language acquisition begun in adolescence: when late looks early. Journal of Child Language, 1(1), 1–24.
Salami, M. et al. (2003). Change of conduction velocity by regional myelination yields constant latency. Proceedings of the National Academy of Sciences, 100, 6174–6179.
Singer, W. (2007). Binding by synchrony. Scholarpedia, 2(12), 1657.
Spocter, M. A., et al. (2010). Wernicke’s area homologue in chimpanzees. Proceedings of the Royal Society B: Biological Sciences, 277, 2165–2174.
Tattersall, I. (1999). Becoming human. Oxford University Press.
Ulbaek, I. (1998). In Approaches to the evolution of language (pp. 30–43). Cambridge University Press.
Vygotsky, L. (1933). Play and its role in the mental development of the child. Translated by Catherine Mulholland from Voprosy psikhologii, 1966, No. 6.
Windmann, S. et al. (2006). Role of the prefrontal cortex in attentional control over bistable vision. Journal of Cognitive Neuroscience, 18(3), 456–471.
Zollikofer & Ponce de León (2010). The evolution of hominin ontogenies. Seminars in Cell & Developmental Biology, 21, 441–452.
COGNITIVE FACTORS MOTIVATING THE EVOLUTION OF WORD MEANINGS: EVIDENCE FROM CORPORA, BEHAVIORAL DATA AND ENCYCLOPEDIC NETWORK STRUCTURE
BODO WINTER1, GRAHAM THOMPSON1 & MATTHIAS URBAN2
1 Cognitive and Information Sciences, University of California, Merced, 5200 North Lake Rd., Merced, 95340, U.S.A.
2 Leiden University Centre for Linguistics, University of Leiden, Postbus 9515, 2300 RA Leiden, the Netherlands
Recent linguistic work suggests that the meaning of some words may evolve in a directional fashion: for instance, words for ‘skin’ may develop the meaning of ‘bark’ more easily than the other way round. Here, we investigate the underpinnings of proposed directional semantic changes by looking at synchronic data. We show that words that have been identified as candidates for the origin of semantic change (such as ‘skin’) are more frequent in English, have more word senses in Webster, more associations in free word association data and more connections in a network model of Wikipedia. These findings highlight key cognitive principles in the evolution of word meaning, ultimately showing that directional semantic change may be highly motivated.
1. Semantic Change
Language evolution research frequently focuses on the evolution of structural aspects of language, such as compositionality. Perhaps because semantic change often seems haphazard and unpredictable, the evolution of word meanings is less explored.
In a cross-linguistic study on 149 different languages, Urban (2011) found 45 concept pairs that suggest predictable regularity. Take, for example, the concept pair ‘bark’ and ‘skin’ [1]. In the cross-linguistic sample, the word for ‘skin’ is frequently used to express ‘bark’ (e.g., a language might have a word that can roughly be translated to mean ‘tree skin’), but not the other way round, i.e., there is no language that has an expression such as ‘human bark’. Urban (2011) suggests that such synchronic asymmetries in how concepts are expressed may be explained by directional trends in semantic change. This proposal is supported by an analysis of attested cases in Indo-Aryan languages (Urban, 2011: 15-24).
[1] We use single quotation marks for concepts and italics for words.
Table 1 lists a subset of the 45 asymmetrical concept pairs. In the sample, the word to the right of each arrow was frequently expressed by a morphologically complex form that contains the word on the left. We will call the word on the left “unmarked” and the word on the right “marked,” with the diachronic interpretation that the unmarked concept may be a frequent source and the marked concept a frequent target of semantic change.
Table 1. Some of the concept pairs proposed as developing directionally. For the full list and discussion, see Urban (2011: 9-13). Arrows point from cross-linguistically unmarked to marked concepts.
Nature > Nature    Human > Human     Human > Nature    Nature > Human
animal > bird      breast > milk     skin > bark       egg > testicle
sun > moon         mouth > lip       mouth > beak      sun > clock
grass > straw      belly > womb      saliva > foam     shadow > mirror
cloud > fog        heart > belly     house > nest      bird > airplane
honey > wax        liver > lungs     tongue > flame    foam > lungs
Most of the concept pairs in Table 1 are characterized by some form of relation, such as similarity (e.g., ‘cloud’~‘fog’), contiguity (‘honey’~‘wax’) or taxonomy (‘animal’~‘bird’). But mere association does not explain why the direction of change only goes one way, i.e., it does not predict that ‘animal’>‘bird’ is more likely than ‘bird’>‘animal’. What motivates directional patterns in word formation? Why do speakers use the word for ‘skin’ to express the meaning of ‘bark’ but not vice versa? In this paper, we explore these questions by looking at synchronic correlates of the observed cross-linguistic asymmetries. Following the lead of Jäger and Rosenbach’s “diachronic priming hypothesis” (2008), we look at how asymmetries in synchronic mental associations may be useful for predicting diachronic change. To this end, we use word association data from behavioral experiments, expecting to see that words for marked concepts tend to be associated with words for unmarked concepts, and less so the other way round. We explore a related idea by looking at associations between concepts in encyclopedias. Using a network model of Wikipedia, where articles correspond to nodes and hyperlinks to connections, we expect articles for unmarked concepts to have more connections. Association data and associations in encyclopedias can be seen as direct synchronic reflections of the trends observed by Urban (2011). To further understand where these association asymmetries come from, we look at English
corpora and behavioral data from reaction time studies. We expect unmarked concepts to be more cognitively accessible. That is, we expect them to be more frequent in corpora, and we expect them to elicit faster reaction times. Here, the idea is that when new expressions are formed from old ones, these expressions are likely to be based on more accessible and more frequent concepts. Finally, we test an old idea from linguistics: It has been proposed that words become semantically extended by virtue of being used in diverse contexts (see, already, Zipf, 1949: 19-31; see also Calude & Pagel, 2011: 1106). Based on this idea, we further expect unmarked concepts to occur in a greater variety of textual contexts and to have more senses listed in dictionaries. In all of our analyses, we look at correlations between the cross-linguistic data and a single well-studied language, English. We focus on English for this first case study; future work will need to explore other languages.
2. Results
2.1. Mental associations
Nelson, McEvoy and Schreiber (1998) asked over 6,000 participants to list the first word that came to mind in response to cues such as “BOOK ____.” For a given concept pair A,B this produces a forward association probability P(A,B) (the proportion of people responding “B” after seeing “A”) and a backward association probability P(B,A) (the proportion responding “A” after seeing “B”). The word animal, for example, has a forward association of 0.016 to the word bird, and a backward association of 0.04. This means that more people responded animal when cued by bird than responded bird when cued by animal. Naturally, the English free association data does not contain exact matches for all of the 45 concept pairs in Urban (2011). In fact, we found only 8 cases in which both words of a pair came up as cue and target in the database (such as animal and bird discussed above).
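The forward and backward probabilities defined above can be illustrated with a minimal sketch. The response counts below are invented toy values, not the Nelson et al. norms, and the function name `association_probability` is a hypothetical helper introduced only for this illustration.

```python
from collections import Counter


def association_probability(responses, cue, target):
    """Proportion of participants who produced `target` when shown `cue`.

    `responses` maps each cue word to a Counter of the words that
    participants gave in response to it (hypothetical data layout).
    """
    counts = responses.get(cue)
    if not counts:
        # The cue was never presented, so no association can be estimated.
        return 0.0
    total = sum(counts.values())
    return counts.get(target, 0) / total


# Toy response counts (assumed for illustration; 10 participants per cue).
toy_responses = {
    "animal": Counter({"dog": 6, "cat": 3, "bird": 1}),
    "bird": Counter({"fly": 5, "animal": 4, "nest": 1}),
}

# P(A,B): responding "bird" after seeing "animal".
forward = association_probability(toy_responses, "animal", "bird")
# P(B,A): responding "animal" after seeing "bird".
backward = association_probability(toy_responses, "bird", "animal")

print(f"forward={forward:.2f} backward={backward:.2f}")
```

In this toy example the backward association exceeds the forward one, which is the direction of asymmetry expected when the first word names the unmarked concept.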
Probably due to sparse data, the difference between unmarked and marked concepts only approaches significance (t(7)=2.01, p=0.084). The numerical trend nevertheless suggests that words for marked concepts tend to elicit words for unmarked concepts more than the reverse (Fig. 1a). Measures less affected by sparseness are the mean forward association and the mean backward association of a word, averaged across all of the words it is associated with. Take, for example, the concept pair ‘animal’ > ‘bird’. The word animal has forward associations to dog (0.293), cat (0.12), zoo (0.022), as well as to many other words. It also has “incoming” backward associations from dog (0.026), cat (0), zoo (0.649) (here, 0 means that no participant responded animal after seeing cat). We can take the
mean of all backward associations and subtract the mean of all forward associations, giving us an index of association asymmetry (cf. Hill, Korhonen & Bentz, 2013). A higher asymmetry index indicates more incoming than outgoing associations. The index is higher for words corresponding to unmarked concepts (t(75)=2.68, p