Acoustic Communication in Animals: From Insect Wingbeats to Human Music 9819908302, 9789819908301

This book is the first volume of the bioacoustics series published by the Society for Bioacoustics. This volume provides

English Pages 230 [231] Year 2023

Table of contents :
Preface
Reference
Contents
Chapter 1: Using Knowledge About Human Vocal Behavior to Understand Acoustic Communication in Animals and the Evolution of Lan...
1.1 How Can We Apply What We Know About Humans to the Study of Animal Communication?
1.2 The Impact of Using Humans as a Model Species for Animal Communication
1.2.1 Impacts on the Time Domain
1.2.2 Impacts on the Spectral Domain
1.2.2.1 Octave Equivalence
1.2.2.2 Consonance
1.2.3 Impacts on the Structural Domain
1.3 Conclusions
References
Chapter 2: Acoustic Communication in Fruit Flies and Mosquitoes
2.1 Introduction
2.2 Acoustic Behaviors
2.2.1 Fruit Flies
2.2.1.1 Courtship Song of Males
2.2.1.2 Females Emit Songs During Copulation
2.2.1.3 Sound Pulses During Agonistic Interactions
2.2.2 Mosquitoes
2.2.2.1 Mosquito Flight Tones
2.2.2.2 Male Hearing Behaviors
2.2.2.3 Female Hearing Behaviors
2.3 Ear Anatomy and Function
2.3.1 Fruit Flies
2.3.1.1 Ear Anatomy
2.3.1.2 The Auditory Neural Circuit in the Brain
2.3.1.3 Sound Localization
2.3.2 Mosquitoes
2.3.2.1 Ear Anatomy and Function
2.3.2.2 Auditory Neuropile in the Brain
2.3.2.3 Sound Localization
References
Chapter 3: Multiple Functions of Ultrasonic Courtship Song in Moths
3.1 Bat-Predator and Insect-Prey
3.2 Discovery of Unconscious Ultrasonic Communication
3.3 Male Courtship Ultrasound
3.3.1 High-Intensity Ultrasound
3.3.2 Low-Intensity Ultrasound
3.4 Secondary Evolution of Ultrasonic Courtship Song
3.4.1 Deceptive Function in Courtship Song
3.4.2 Discrimination Before the Evolution of Mate Recognition
3.5 Self-Feedback Via Courtship Song
3.6 Perspectives
References
Chapter 4: Recent Progress in Studies on Acoustic Communication of Crickets
4.1 Introduction
4.2 Acoustic Signal and Sexual Selection in Crickets
4.2.1 Overview of Acoustic Signals and Female Preference in Crickets
4.2.2 Remaining Key Questions
4.3 Effect of Conspecifics on Acoustic Communication of Crickets
4.4 Effect of Heterospecifics on Acoustic Communication of Crickets
4.4.1 Predator-Prey Interaction
4.4.2 Competition for Calling Sites
4.4.3 Acoustic Masking Interference
4.5 Acoustic Communication of Crickets in the Anthropocene
4.6 Conclusion
References
Chapter 5: Vocal Imitation, A Specialized Brain Function That Facilitates Cultural Transmission in Songbirds
5.1 Imitation, Social Learning, and Cultural Transmission
5.2 Vocal Imitation in Social Animals
5.3 Vocal Imitation in Songbirds
5.4 Neural Mechanism of Vocal Imitation in Songbirds
5.5 Comparative View to Dissect the Mechanism of Imitation
5.6 Future Directions
References
Chapter 6: Dancing in Singing Songbirds: Choreography in Java Sparrows
6.1 Introduction
6.1.1 Song-Dance Courtship in Estrildid Finches
6.1.2 Song-Dance Courtship in Java Sparrows
6.2 Methods
6.2.1 Examining Song-Dance Coordination in Males
6.2.2 Comparing Dance Sequence Complexity Between Males and Females
6.3 Results
6.3.1 Song-Dance Coordination in Males
6.3.2 Dance Sequences of Males and Females
6.4 Discussion
References
Chapter 7: Vocal Communication in Corvids: Who Emits, What Information and Benefits?
7.1 Vocal Communication as a Window into the Understanding of Animal Cognition
7.2 Social Function of Vocal Signals in Group-Living Mammals and Birds
7.2.1 Contact Call: Signals for Sender's Social Information
7.2.2 Alarm Call: Signals for Potential Risk
7.3 Vocal Communication in Corvids
7.3.1 Social Ecology of Corvids as a Foundation for Vocal Communication of Conspecific and Heterospecific Information
7.3.2 Contact Call and Individual Recognition
7.3.3 Alarm Call: Vocal Signals for Recognizing Identity and Behavior of Heterospecific Animals
7.3.4 Alarm Calls in Non-Breeding Groups: Information and Function
7.4 Future Direction
References
Chapter 8: Affiliation, Synchronization, and Rhythm Production by Birds
8.1 Introduction
8.2 Affiliative Interactions Between Mates
8.3 Vocal Mimicry Between Cage Mates
8.4 Behavioral Contagion Among Individuals
8.5 Rhythmic Synchronization
8.5.1 Budgerigars
8.5.2 Bengalese Finches
8.5.2.1 Experiment 1
8.5.2.2 Experiment 2
8.5.2.3 Experiment 3
8.5.2.4 Experiment 4
8.6 Timing Coordination with Others in Budgerigars
8.7 General Discussion
References
Chapter 9: Cockatiels: A Research Subject for Studying Capability for Music Production
9.1 Why Cockatiels?
9.2 Development of Acoustic Patterns of the Songs
9.3 Synchronization to a Playback of the Melody of Human Music
9.4 Re-arrangement of the Melody by the Birds
9.5 Creation of Novel Sound Sequences by the Birds
9.6 Another Music-Like Behaviour Seen in the Birds
9.7 Functions of These Music-Like Behaviours in Cockatiels
9.8 What Cockatiels Teach Us About Acoustic Communication and Diversity
References
Chapter 10: Acoustic Properties and Biological Significance of Ultrasonic Vocalizations in Rodents: Emotional Expressions
10.1 Introduction
10.2 Types of Mouse USVs
10.2.1 Female-to-Female and Male-to-Male USVs
10.2.2 Functional Considerations for F-F and M-M USVs
10.2.3 Basic Acoustic Properties of Mouse USVs
10.2.4 Who Is the Sender of USVs?
10.3 Mouse pupUSVs
10.3.1 Development of pupUSVs and Thermoregulation
10.3.2 Factors Influencing pupUSVs
10.3.3 Receiver's Response
10.4 Mouse Courtship Vocalizations
10.4.1 Expression of Sexual Motivation
10.4.2 Individual Differences: Environmental and Genetic Factors
10.4.3 Female Responses
10.4.4 Characteristics as a Sexual Display
10.5 Rat USVs
10.5.1 Basic Characteristics of Rat USVs and Differences from Mouse USVs
10.5.2 Rat pupUSVs
10.5.3 Adult Distress Calls
10.5.4 Adult Positive Calls
10.6 Conclusion and Future Directions
References
Chapter 11: Effects of Acoustic Interference on the Echolocation Behavior of Bats
11.1 Introduction
11.2 Echolocation Pulse Design
11.3 Acoustic Interference in Bat Echolocation
11.3.1 Auditory Masking
11.3.2 Clutter Interference
11.3.3 Cocktail Party Nightmare
11.3.3.1 Spectral Adjustments
11.3.3.2 Temporal Adjustments
11.3.3.3 Acoustic Interference in CF-FM Echolocating Bats
11.4 Conclusion
References
Chapter 12: Diverse Sound Use and Sensitivity in Auditory Communication by Chimpanzees (Pan troglodytes)
12.1 Sensitivity to Frequency and Spectral of Sound
12.2 Flexibility in Vocalization
12.3 Using Sound in the Context of Auditory Communication Outside of Vocalization
12.4 Auditory Communication Uniquely Found in Captivity or in the Wild
12.5 Body Gestures in Auditory Communication
References
Chapter 13: The Interplay Among the Linguistic Environment, Language Perception, and Production in Children's Language-Specifi...
13.1 Introduction
13.1.1 Compound Word Formation and Rendaku Voicing
13.1.2 Rendaku Conditions
13.1.3 Preschooler-Specific Prosodically Based Rendaku Strategy
13.2 Study 1
13.2.1 Research Questions
13.2.2 Working Hypotheses
13.2.3 Methods
13.2.3.1 Participants and Ethical Considerations
13.2.3.2 Procedure
13.2.3.3 Compound Noun Formation Task with Cross-Modal Linguistic Stimuli: Visual (Orthography) and Auditory (Speech) Informat...
13.2.3.4 Design and Materials
13.2.3.5 Measures
13.2.4 Results
13.2.5 Discussion of the First Experiment
13.3 Study 2
13.3.1 Methods
13.3.1.1 Procedure (Compound Comprehension Task)
13.3.1.2 Materials
13.3.2 Results and Discussion
13.4 Summary and Conclusion
An Example of Experimental Stimuli (No-Orthography Condition)
Verbal Instructions in the Experiment
Cross-Modal Linguistic Stimuli Used in the Production Task
List of Stimuli for 16 E2s Used in the Test Trial
Example of Stimuli Used in the Forced Choice Tasks
Two Types of Pitch-Accent Assignment
References
Chapter 14: Sound Processing in the Auditory Periphery: Toward Speech Communication and Music Comprehension
14.1 Introduction
14.2 Mechanism of Auditory Periphery
14.3 Peripheral Speech Processing
14.4 Peripheral Processing of Musical Tones
14.5 Boundary Between Speech and Music
References
Yoshimasa Seki Editor

Acoustic Communication in Animals From Insect Wingbeats to Human Music (Bioacoustics Series Vol.1)

Editor
Yoshimasa Seki
Department of Psychology
Aichi University
Toyohashi, Japan

ISBN 978-981-99-0830-1
ISBN 978-981-99-0831-8 (eBook)
https://doi.org/10.1007/978-981-99-0831-8

© The Society for Bioacoustics 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

I thank all the people who take this book in their hands. This is the first book in a series which will be produced by The Society for Bioacoustics, a society which was launched in Japan in 2014. While not all contributing authors are members of the Society, I personally asked each of them to write a chapter for this book and they completed excellent, thought-provoking manuscripts. I deeply appreciate the contributions of the authors. As previous studies have mentioned (e.g., Bradbury and Vehrencamp 2011), the definition of “communication” is difficult. Nevertheless, most animals generate vibrations and sounds while they are moving their body parts (including the vocal organs). Those sounds can be research targets of bioacoustics studies. Further, how animals use those sounds, especially in inter-individual relationships, is a topic of great interest to people. Therefore, “acoustic communication in animals” is a fitting title for the first book in this series. Marisa Hoeschele (Acoustics Research Institute, Austrian Academy of Sciences) and her colleagues wrote an introduction for this book. They provided a nice overview of studies for acoustic communication in animals. In subsequent chapters, the authors introduce some fascinating recent topics in animal communication research, ranging from invertebrates to humans. This “cross-sectional approach” looking at a range of species at once is one of the characteristics of this book. Of course, each species occupies its own space on the tip of a branch of the phylogenetic tree; therefore, “from invertebrates to humans” does not mean “from inferior to superior.” Another characteristic of this book is the variety of author backgrounds. The research fields studied and the research methods employed also vary widely, from molecular biology and neurobiology to psychology and human brain imaging. Nevertheless, the authors developed their chapters under the consideration of ethology and evolution. 
I believe readers will recognize the profundity of the topics in each chapter. Traditionally, some Japanese people have unique perspectives on sounds which are produced by animals. For example, Bashô, the most famous Haiku poet of the Edo era, composed a very short poem, “Furu ike ya kawazu tobikomu mizu no oto.” This is the most famous poem in Japan, which depicts a scene in which “frog jump into old pond making sound.” (Note: indefinite and definite articles, plural forms, and the third person singulars are not used in Japanese sentences. This ambiguity may stimulate our imagination for the scene.) As another example, school children in Japan learn a song “Mushi no Koe” meaning “voices (or sounds, songs) of insects,” in which some crickets’ songs and the associated onomatopoeia appear in the lyrics. Most of the authors currently live in Japan. Therefore, they may have been influenced by this culture, and the contents of the chapters may reflect their perspectives. This is another unique characteristic of this book.

Upon concluding this preface, I thank the staff of Springer Nature, especially Fumiko Yamaguchi, for supporting my work on this publication. I hope this book will help you to deepen your understanding of acoustic communication in animals.

Toyohashi, Japan

Yoshimasa Seki

Reference

Bradbury JW, Vehrencamp SL (2011) Principles of animal communication, 2nd edn. Sinauer, Sunderland, MA

Contents

1 Using Knowledge About Human Vocal Behavior to Understand Acoustic Communication in Animals and the Evolution of Language and Music
Marisa Hoeschele, Dan C. Mann, and Bernhard Wagner

2 Acoustic Communication in Fruit Flies and Mosquitoes
Matthew P. Su and Azusa Kamikouchi

3 Multiple Functions of Ultrasonic Courtship Song in Moths
Ryo Nakano

4 Recent Progress in Studies on Acoustic Communication of Crickets
Takashi Kuriwada

5 Vocal Imitation, A Specialized Brain Function That Facilitates Cultural Transmission in Songbirds
Masashi Tanaka

6 Dancing in Singing Songbirds: Choreography in Java Sparrows
Masayo Soma and Mari Shibata

7 Vocal Communication in Corvids: Who Emits, What Information and Benefits?
Yuiko Suzuki and Ei-Ichi Izawa

8 Affiliation, Synchronization, and Rhythm Production by Birds
Yuko Ikkatai and Yoshimasa Seki

9 Cockatiels: A Research Subject for Studying Capability for Music Production
Yoshimasa Seki

10 Acoustic Properties and Biological Significance of Ultrasonic Vocalizations in Rodents: Emotional Expressions
Shota Okabe and Kouta Kanno

11 Effects of Acoustic Interference on the Echolocation Behavior of Bats
Kazuma Hase, Kohta I. Kobayasi, and Shizuko Hiryu

12 Diverse Sound Use and Sensitivity in Auditory Communication by Chimpanzees (Pan troglodytes)
Yuko Hattori

13 The Interplay Among the Linguistic Environment, Language Perception, and Production in Children’s Language-Specific Development
Takayo Sugimoto

14 Sound Processing in the Auditory Periphery: Toward Speech Communication and Music Comprehension
Toshie Matsui

Chapter 1

Using Knowledge About Human Vocal Behavior to Understand Acoustic Communication in Animals and the Evolution of Language and Music

Marisa Hoeschele, Dan C. Mann, and Bernhard Wagner

Abstract Humans have an extensive vocal repertoire including language, music, and innate sounds such as laughing, screaming, and crying. When it comes to understanding acoustic communication in other animals, it can be difficult to know to which domain their vocalizations compare. In addition, in humans we have information about both the meaning and fundamental structural units of the sounds we produce. In contrast, in other animals we have neither of these things by default. This makes it very difficult to know where to begin analysis. Often researchers begin by, e.g., recording members of a species in different contexts and looking for vocal differences. While this has led to some important insights, imagine if another species were to do the same thing with human vocalizations: They would likely greatly underestimate the information present in human sounds. Here, we will provide examples showing how approaching animal communication with humans as a model species can potentially answer whether any other species may have abilities that match our own. In addition, we will show how evaluating current hypotheses about the functional and evolutionary origins of human language and music provides insights on the purposes of similar abilities in other species.

Keywords Musicality · Language · Acoustics · Vocalizations · Animal communication

M. Hoeschele (✉) · B. Wagner
Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria

D. C. Mann
Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria
Konrad Lorenz Institute of Ethology, University of Veterinary Medicine, Vienna, Vienna, Austria

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
Y. Seki (ed.), Acoustic Communication in Animals, https://doi.org/10.1007/978-981-99-0831-8_1

There are a lot of parallels between human language and music and the acoustic communication in other species, and many examples of how studying animal communication can lead to insights about human language and music (e.g., Doupe and Kuhl 1999; Fitch 2005, 2017; Patel 2003). For example, research with zebra finches led to the insight that the FoxP2 gene is relevant for the sensory-guided motor learning necessary for vocal learning across species including human language (Scharff and Petri 2011). Another example is that studying the neural pathways of animals with complex vocal learning (learning entirely new signals via imitation or invention) has clarified what fundamental connections in the brain (Jarvis 2019) and genetic underpinnings (Pfenning et al. 2014; Cahill et al. 2021) are needed to be able to develop a vocal communication system that resembles language.

But what about the other way around? Given that we know so much about humans, can we use that knowledge to learn more about the communication of other animals? Humans have the most studied and best understood acoustic communication system of any species on the planet. We humans typically think of our own communication system as residing on a separate level from other species. Many studies have spent time trying to identify the unique aspects of human language relative to the abilities of other animals. For example, it has been argued that language requires especially complex cognitive processing such as recursion (Hauser et al. 2002) that may be rare or absent in other species. However, it is likely that the majority of human behaviors differ from those of other species only in degree, not kind, as Darwin famously argued as early as 1871 (although it is important to consider that differences in degree could lead to what we might consider to be differences in kind at higher levels of cognition; see Penn et al. 2008). Other species have often simply not been studied in the same way as humans (e.g., see Mann and Hoeschele 2020 for an example with regard to vocal behavior), so it may be premature to treat our own vocal behaviors as being categorically distinct from other species.
And even if they are in some ways distinct, in many ways they are likely to have strong parallels to the communication systems of other species. Here, we argue that if we start from human abilities and consider how these abilities are studied differently than those of other species, then we can open up surprising new areas of research that can be informative for animal communication. In the next section, we will explain how this approach—learning about animal communication starting from human acoustic behaviors—can be put into action. Afterward, we will discuss some examples of how this approach has informed our own work, and then, we will conclude with an outlook to future work.

1.1 How Can We Apply What We Know About Humans to the Study of Animal Communication?

There are endless topics of study in the research area covering human acoustic communication—music perception, linguistics, phonetics, music performance—all of these are vast areas of study with many niche subdomains. But not all of it is likely to be relevant across species. So how can we choose which topics to study across species? The answer lies mainly in the topic’s connection to culture. Topics that are culturally specific (e.g., studying the style of Mozart; McClary 1986), or culturally divergent (e.g., variations between justice systems around the world, Greenberg 2001) are not likely to be relevant across species. Instead, cross-cultural similarities are what is likely to be interesting for cross-species study (e.g., parallels in musical composition cross-culturally; e.g., Hulse et al. 1992, or the fact that justice itself is an important topic cross-culturally; e.g., Greenberg 2001). The reason that cross-cultural similarities are an especially good starting point is that they represent human-wide abilities. Culturally specific/divergent topics are likely the result of human cultural evolution as it happened to have occurred on Earth. If another population of humans existed on another planet, they may well have additional specific/divergent ways of being. As an example, it would not make sense to ask whether an animal knows how to use chopsticks because it is an invention that humans had to develop and then have to learn. However, if the divergences are studied over enough cultures, sometimes broader parallels can be found. For example, the broader question about whether other animals use tools while eating may be applicable, because eating tools (e.g., forks, chopsticks, edible tools such as bread, tinfoil wrap, jugs for beverages, skewers) appear in a variety of forms across human cultures. Cross-cultural parallels can be difficult to identify because they tend to feel trivial. 
In music, for example, although style varies greatly across cultures and genres, some basic aspects are very common, such as our attention to relative pitch (we recognize a melody even if the frequency has been shifted up or down) and the use of a steady beat (see, e.g., Honing et al. 2015). For language, there is a limited set of phonemes across all cultures (Pfitzinger and Niebuhr 2011), and the timing between utterances when speaking with another person is highly conserved cross-culturally (Stivers et al. 2009). These make excellent topics for cross-species study. However, phenomena that are more tempting to study are those that seem to be much more impressive or surprising aspects of human behavior but which may actually be artifacts of laboratory testing (e.g., human uniqueness in decision making; Ludvig et al. 2014), not cross-culturally valid (e.g., the Müller–Lyer visual illusion; Segall et al. 1968), or difficult to empirically demonstrate even in humans (e.g., consciousness; Melloni et al. 2021; Allen and Trestman 2017).

One way to try to discover the more trivial cross-cultural parallels is to take the perspective of an alien researcher: How would we study humans if we were not humans ourselves? And what would we make of typical human behaviors? For example, an alien researcher might say “Aha, this creature is using a tool to consume food” if it observes someone using a fork. This perspective can help highlight the big picture.

Additionally, culturally universal topics can include everything from physiology and psychophysical abilities/limitations to behavior and cognition. The former tends to be easier to study, and much work has been done, for example, comparing the hearing ranges of different species (e.g., Okanoya and Dooling 1987; Osmanski and Wang 2011), the physiological structure of different hearing apparatuses (e.g., Stiebler and Ehret 1985; Köppl 2001), and how spatial hearing is accomplished (see, e.g., Ashida and Carr 2011). Here, we are focused on the latter: human behavior and cognition, specifically related to acoustic communication. In the next section, we detail several specific examples of how the approach of understanding animal communication based on human behavior and cognition has been useful in our past research.
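As a concrete illustration of the relative pitch example mentioned in this section (a melody remains recognizable when all of its frequencies are shifted up or down), the following minimal Python sketch compares two note sequences by the intervals between successive notes rather than by their absolute frequencies. The function names, the tolerance value, and the example frequencies are illustrative assumptions and are not taken from any of the cited studies.

```python
import math

def intervals_in_semitones(frequencies_hz):
    """Interval, in semitones, between each pair of successive notes."""
    return [12 * math.log2(b / a)
            for a, b in zip(frequencies_hz, frequencies_hz[1:])]

def same_relative_pitch(melody_a, melody_b, tolerance=0.05):
    """True if both melodies have the same interval pattern.

    `tolerance` (in semitones) is an arbitrary illustrative threshold.
    """
    ia = intervals_in_semitones(melody_a)
    ib = intervals_in_semitones(melody_b)
    return len(ia) == len(ib) and all(
        abs(x - y) <= tolerance for x, y in zip(ia, ib))

# Hypothetical four-note melody (approximately C4, D4, E4, C4).
original = [261.63, 293.66, 329.63, 261.63]
# The same melody with every frequency multiplied by a constant factor.
shifted = [f * 1.5 for f in original]
# A melody whose third note has been changed (approximately F4).
altered = [261.63, 293.66, 349.23, 261.63]

print(same_relative_pitch(original, shifted))  # interval pattern preserved
print(same_relative_pitch(original, altered))  # interval pattern changed
```

A listener (or animal) relying only on absolute frequency would treat `original` and `shifted` as entirely different stimuli, whereas a comparison of intervals treats them as the same melody.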

1.2 The Impact of Using Humans as a Model Species for Animal Communication

Using humans as a model species for vocal communication may at first appear counterintuitive, as human vocal communication is so complex. However, this is actually an advantage rather than a disadvantage. To better understand the complexities of human vocal communication, it becomes necessary to separate different aspects of acoustic communication into distinct domains (e.g., the time domain, the spectral domain, and the structural domain). While human communication remains rather complex in all of these domains, once the domains are established, it becomes clear that some of these complexities are shared by other species. As such, humans can serve as a model species for studies in non-human species whose vocal communication may not be as complex overall but is similar in individual domains. We now go on to demonstrate this by discussing three domains (the time domain, the spectral domain, and the structural domain) and explaining how using human vocal communication as a model has aided efforts to understand animal communication as a whole. Overall, our goal is to show that taking a step back and identifying almost trivial aspects of human language/music behavior and cognition can be fruitful in finding abilities in other species that depend on these very same building blocks.

1.2.1 Impacts on the Time Domain

Rhythm is critical in both music and language. We use rhythm in order to synchronize with each other whether it is by playing music together, dancing together, or speaking together without interrupting or leaving awkward pauses.


Research on acoustic parallels between humans and other animals often began with avian species, because we anthropomorphically identified “songbirds” as a group of species that “sing.” Rhythm was thus also expected of these birds. Over one hundred years ago, some musical theorists such as Billroth (1895) suggested that a variety of bird species sang rhythmically. Later data supported this initial intuition. Experimental studies beginning in the mid-1900s showed that songbirds can perceive rhythmic information in acoustic signals. For example, jackdaws (Coloeus monedula) were able to distinguish stress patterns (i.e., strong-weak-strong-weak vs. strong-weak-weak) regardless of tempo or timbre (Reinert 1965). About 20 years later, it was shown that European starlings (Sturnus vulgaris) were able to distinguish rhythmic and arrhythmic patterns (Hulse et al. 1984). Rhythmic information also appeared to be important in birds’ own vocalizations (e.g., Slabbekoorn and ten Cate 1999). We now know that songbirds in particular learn new vocalizations throughout development via experience, and this ability is supported by neural connections that have surprising parallels to the neural connections found in humans and not in non-human primates and most other mammals (Jarvis 2007). Songbirds are one of the few groups of animals that learn entirely new vocalizations based on experience (Tyack 2020). While non-human primates were considered vocal non-learners in the categorical view of vocal learning, in some cases primates can modify their innate vocalizations based on experience. Because of this, vocal learning is now often considered a more continuous feature (see, e.g., Wirthlin et al. 2019). While classic vocal learners, such as humans and songbirds, learn entirely new vocalizations from experience, many animals achieve a more subtle form of vocal learning, such as changing the frequency of an innate vocalization. We humans are the only primates to have this classic vocal learning ability, and this is supported by a brain structure that is more similar to that of songbirds than to that of other primates in terms of overarching connections between the auditory regions and other regions.

Taken together, the fact that birds (1) are vocal learners and (2) seemed to be able to identify important rhythmic information may lead to optimism about the relative rhythmic abilities of birds and humans. However, more recent studies that looked more closely at how birds evaluate rhythm in acoustic signals showed that these animals may be more dissimilar to us humans than we originally thought. Specifically, although the birds can solve some tasks that involve the timing of elements, they are often not using what we would refer to as rhythm. Instead of paying attention to the overarching rhythmic structure (e.g., the fact that beats occur regularly), the birds mainly paid attention to very specific features of the signal (e.g., the exact length of time between elements). In other words, the birds did not perceive what we humans would consider the core part of rhythm: the regularity of particular elements to one another, and the pattern of strong and weak elements within that overarching structure. Instead, they memorized either specific time intervals or the order in which sounds were presented regardless of the timing. Pigeons (Hagmann and Cook 2010), non-vocal learners, had the most trouble with this and showed no evidence of perceiving any kind of regularity. Zebra finches and budgerigars, both vocal learning species, did recognize some but not other aspects of regularity. A few individual budgerigars, however, did identify the overarching rhythmic structure (ten Cate et al. 2016). So it is possible for these species, but may not be the default way of processing sounds. Because the same might be said of the older studies with birds as well, this raises the question of what the definition of “rhythm” should be in order to distinguish between what we expected the birds to be able to do and what they actually did.

An important consideration here is that animals tend to make rhythmic patterns with a steady pulse even inadvertently. For example, a woodpecker pecking at some wood (Wang et al. 2011) or a songbird performing a trill (Podos 1997) both may simply be performing at maximal rate, which results in a steady pulse. Similar rhythmic patterns can be found in much more common behaviors such as breathing and walking across species. Defining rhythm turns out not to be an easy task, because it gets at the heart of the issue at hand. Do we define it in a way that it captures human music but excludes everything else? Or do we define it in a way that incorporates the abilities of other animals?

Importantly for this discussion, at the same time as some of the more recent rhythmic perception studies were happening, other studies focused on what was commonly thought to be a human-only ability until less than 15 years ago: motor entrainment. Motor entrainment is the ability to synchronize bodily movement with an external acoustic signal. Snowball the cockatoo was discovered through videos of animals on the social media platform YouTube. This cockatoo was visited by researchers, and from experiments where Snowball’s favorite music was slowed down and sped up, it appeared that Snowball was able to track and move to the beat just like humans (Patel et al. 2009). Further studies suggested that this ability was common among some vocal learning species (Schachner et al.
2009) although vocal non-learning species could in some cases be trained to entrain to a beat (Cook et al. 2013; Hattori et al. 2013). In a broader sense however, entrainment to acoustic signals can be found in a wide variety of species, most notably anurans and arthropods (see Greenfield 2005), and these abilities can exceed the abilities of humans (see, e.g., Jacoby et al. 2021 as human comparison). However, it is unclear whether the mechanism for this rhythmic behavior is the same as that of synchronizing to a musical beat. Patel (2021) outlines three ways that dancing along with music is different from what anurans and arthropods do: (1) the beat in music is far from as obvious as the on-or-off nature of a cricket chirp, (2) entrainment to music is possible at a wide variety of tempos (speeds) instead of a narrow range of tempos, and (3) there is a domain switch from acoustics to movement. If we extend rhythm from acoustic rhythms to other domains, the problem becomes even more complicated. Despite humans having brain adaptations meant to process acoustic rhythmic information from birth (Winkler et al. 2009) which does not appear to be found in other primates (Honing et al. 2012; Merchant et al. 2015) other primates do not necessarily lack non-acoustic rhythmic abilities. In fact, chimpanzees perform what’s called a “conga line” where they move rhythmically together but without any acoustic information (Lameira et al. 2019). Perhaps human

vocal learning allowed our species to apply rhythmic abilities to the acoustic domain in ways that match other vocal learning species. In sum, the human ability to sing, dance, and synchronize together via acoustic communication has enabled us to study a very wide-reaching new area of research that is relevant to disparate species throughout the animal kingdom. Rhythm affects frogs, insects, birds, and mammals alike. By considering abilities that are important in human behavior, we identified several important features of rhythm perception and production that may be relevant across species. This work has led to a reevaluation of the definitions and extent of rhythm. On the one hand, rhythm is everywhere in our unconscious behaviors such as walking and breathing; on the other hand, there are very specific, measurably distinct abilities, such as motor entrainment and the perception of a steady pulse in a stream of sound, that are critical in human rhythmic behaviors. There are likely to be other specific abilities that are relevant to non-human animals only, but by noting these human-specific abilities we were able to tease apart many foundational aspects of rhythmic behavior. Now, as we strive toward clearer definitions of rhythm, it is this comparative work, together with a look toward our own species, that has allowed us to provide scaffolding for this multi-faceted set of abilities we colloquially refer to simply as rhythm.
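
One way to make the difference between entraining across tempos and moving at a fixed rate concrete is to measure how tightly movement times cluster at a single phase of the beat cycle. The sketch below is an illustration only, not the analysis used in the studies cited above: the function name `synchronization_strength`, the mean-resultant-length measure, and the example timings are all assumptions.

```python
import cmath
import math


def synchronization_strength(movement_times, beat_period, beat_offset=0.0):
    """Mean resultant length of movement phases relative to a steady beat.

    Returns a value near 1.0 when movements fall at a consistent phase of the
    beat cycle and near 0.0 when they are spread evenly across the cycle.
    The measure and all names are illustrative assumptions.
    """
    if not movement_times:
        raise ValueError("need at least one movement time")
    # Convert each movement time to an angle within the beat cycle.
    phases = [
        2 * math.pi * (((t - beat_offset) / beat_period) % 1.0)
        for t in movement_times
    ]
    # Average the unit vectors; the length of the mean reflects phase consistency.
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)


# Hypothetical mover that bobs every 0.5 s regardless of the music.
fixed_rate_moves = [0.5 * k for k in range(6)]

# Against a 0.5 s beat the movements look synchronized...
original_tempo = synchronization_strength(fixed_rate_moves, beat_period=0.5)
# ...but slowing the music to a 0.6 s beat shows that the mover did not adapt.
slowed_tempo = synchronization_strength(fixed_rate_moves, beat_period=0.6)

# Hypothetical mover that tracks the slowed beat keeps a consistent phase.
adaptive_moves = [0.6 * k for k in range(6)]
adaptive_slowed = synchronization_strength(adaptive_moves, beat_period=0.6)
```

With these hypothetical timings, the fixed-rate mover scores 1.0 at the original tempo but close to 0 once the beat is slowed, whereas the adaptive mover stays close to 1.0. This is why changing the tempo of the stimulus is informative about entrainment.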

1.2.2 Impacts on the Spectral Domain

The spectral domain is highly important to human vocal communication. A prime aspect of human spectral communication is pitch, which is usually the percept of the fundamental frequency of a sound. High or low pitch as well as the contour of the pitch (i.e., the changing of pitch over time) is used to convey different information in human communication (e.g., Clark and Yallop 1990; Yip 2002; see also Stevens et al. 2013). Beyond pitch, the human voice is a harmonic sound, meaning that beyond the fundamental frequency (usually perceived as pitch (e.g., Hirst and Looze 2021)), our voices feature overtones at integer multiples of the fundamental frequency. This is highly important to speech/language, as these overtones can be filtered by the position of our vocal apparatus resulting in the vowels and consonants that constitute words. The field of research regarding the spectral domain of human and non-human vocal communication and its perception is vast and has many subfields (such as phonology, musicology, psychoacoustics, neurobiology). For reasons of brevity and expertise, we wish to focus here on two examples of how using humans as a model species for vocal communication can lead to interesting insights with regard to two phenomena related to human vocal harmonics: octave equivalence and consonance preference.

1.2.2.1 Octave Equivalence

To humans, sounds separated by a doubling in frequency (the musical interval of “an octave”) sound similar (Burns 1999; Patel 2003; Hoeschele et al. 2012b; Wagner et al. 2022). Despite differences between cultures in how musical scales are structured and how pitch is perceived (Hove et al. 2009), the octave is used as a basis of pitch perception in a variety of musical cultures from around the world (e.g., Burns 1999; but see Jacoby et al. 2019; McDermott et al. 2016). Using humans as a model species for perception of octave equivalence is a particularly good example of how this approach can lead to new insights. Octave equivalence is primarily known to exist in humans because we experience it firsthand. However, for a long time, it was difficult to show the effect empirically even in humans, with early studies finding contradictory results (Allen 1967; Deutsch 1972; Dowling and Hollombe 1977; Krumhansl and Shepard 1979; Kallman 1982; see also Burns 1999). Early octave equivalence studies with non-human animals similarly attempted to test whether they perceive the phenomenon. The earliest such work is a study by Blackwell and Schlosberg (1943), who studied octave equivalence in rats. The rats were trained to discriminate between notes and subsequently probed with notes that were an octave higher. The rats responded similarly to the octave-transposed notes, which was interpreted as octave equivalence perception. However, Burns (1999) criticized this study for not controlling for harmonics, meaning that the octave may already have been contained in the training stimuli. Another study tested European starlings (Sturnus vulgaris) for octave equivalence perception (Cynx 1993). While this study did not find octave equivalence in the tested species, it was later found that the paradigm used also failed to show octave equivalence in human participants (Hoeschele et al. 2012b).
Therefore, it became clear that humans needed to be established more clearly as a model species, testing them using paradigms that would also be applicable with other species. As such, a non-verbal operant conditioning paradigm was developed by Hoeschele et al. (2012b) which succeeded in demonstrating octave equivalence in humans. While it was not the first study to succeed in doing so (see, e.g., Kallman 1982), it was unique in that its paradigm could readily be applied to other species. This was first implemented with black-capped chickadees (Poecile atricapillus; Hoeschele et al. 2013). Yet, this species still did not show octave equivalence. As such, the next logical question was why humans in particular might have octave equivalence. Octave equivalence emerges in humans in early childhood, during pre-verbal infanthood (Demany and Armand 1984). A biological basis of the phenomenon has therefore been suggested, with the physical structure of the human voice being put forward as a likely candidate for such a basis (e.g., Bowling and Purves 2015; Schwartz et al. 2003; Terhardt 1984). As the human voice is a harmonic sound, it has frequencies occurring at integer multiples of the fundamental frequency, i.e., at frequencies ×2, ×3, ×4, ×5, ×6, and so on. Therefore, because octaves are a doubling in frequency, the human voice prominently features octaves:

the harmonics at ×2, ×4, ×8, and so on (each a doubling of the one before) form successive octaves above the fundamental. When learning to speak, human children at times need to copy sounds presented by adult voices that are outside of their own higher vocal ranges. In such cases, children will produce successful imitations by transposing their fundamental frequency by an octave relative to the adult voices (Peter et al. 2008, 2009; see also Peter et al. 2015). This is perceived by humans as a successful imitation (see Hoeschele 2017). Physically speaking, this makes sense, as a sound transposed by an octave shares as many overlapping harmonics with the original sound as is possible for two sounds with distinct fundamental frequencies. As such, it appears logical that octave equivalence is connected to vocal learning, an idea dubbed the “vocal learning hypothesis.” Yet, black-capped chickadees, who failed the non-verbal operant octave equivalence paradigm, are a vocal learning species (Hoeschele et al. 2013). An explanation seemed to be that it is not vocal learning alone that necessitates octave equivalence, but only vocal learning that involves imitation of sounds outside of an individual’s vocal range (as is the case in human children but not in black-capped chickadees). Therefore, our group ran a study with budgerigars, a small Australian parrot species which is capable of imitating sounds outside of its vocal range (e.g., the human voice). However, budgerigars did not show octave equivalence either (Wagner et al. 2019) in a study using the verified non-verbal paradigm from Hoeschele et al. (2012a, 2013). Indeed, the budgerigars showed an opposite effect to humans, reacting similarly to the black-capped chickadees. Therefore, it appears that vocal learning is likely not at the root of octave equivalence perception across species. As stated above, budgerigars are capable of producing uncannily accurate imitations of sounds outside their vocal range such as human voices.
However, unlike human children, they do not appear to use octave equivalence to achieve these imitations. Perhaps they instead imitate formant frequencies of the human voice which occur in their preferred range of vocalization (Scanlan 1999). To conclude, our finding that vocal learning does not necessitate octave equivalence is in line with the three studies that—at least tentatively—have found octave equivalence in non-human species. Wright et al. (2000) showed that two rhesus macaques (Macaca mulatta) generalized melodies that were transposed by an octave but not by a smaller musical interval. Richards et al. (1984) documented that a bottlenose dolphin (Tursiops truncatus) spontaneously transposed sounds outside of her preferred whistle range by an octave. Similarly, house finches tutored by canaries transposed canary vocalizations by an octave in imitating their vocalizations (Mann et al. 2020). As only two of these three species are vocal learning species, we here find further support for the idea that vocal learning is not essential to octave equivalence. In this section, we have shown how using our knowledge about human vocalizations with regard to octave equivalence has not only propelled forward our understanding of a human trait but also engendered new ideas about vocal learning in different species. Yet, the question remains what actually is at the root of the phenomenon. We once more turn to humans as our model species to arrive at new hypotheses regarding the roots of octave equivalence. These may be less related to

vocal learning than to another phenomenon, namely human preference for “consonance” which we will discuss now before reaching a final conclusion.
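
The relation between the harmonic series and the octave, and the octave transposition described above for children imitating adult voices, can be made concrete with a small sketch. It assumes an idealized voice whose harmonics sit at exact integer multiples of the fundamental; the function names, the 110 Hz model pitch, and the 200 to 400 Hz vocal range are hypothetical values chosen for illustration.

```python
def harmonic_series(f0, n_harmonics):
    """Frequencies of the first n harmonics of an idealized harmonic sound."""
    return [k * f0 for k in range(1, n_harmonics + 1)]


def octaves_above_fundamental(f0, n_harmonics):
    """Harmonics that lie a whole number of octaves above the fundamental.

    These are the harmonics whose number is a power of two (1, 2, 4, 8, ...).
    """
    return [k * f0 for k in range(1, n_harmonics + 1) if (k & (k - 1)) == 0]


def transpose_into_range(target_hz, low_hz, high_hz):
    """Shift a pitch by whole octaves until it falls in [low_hz, high_hz).

    Assumes a positive target and a range spanning at least one octave,
    so that a solution always exists.
    """
    if target_hz <= 0 or low_hz <= 0:
        raise ValueError("frequencies must be positive")
    if high_hz < 2 * low_hz:
        raise ValueError("range must span at least one octave")
    f = float(target_hz)
    while f < low_hz:      # too low: move up one octave at a time
        f *= 2.0
    while f >= high_hz:    # too high: move down one octave at a time
        f /= 2.0
    return f


# Hypothetical adult model at 110 Hz and a child whose range is 200-400 Hz.
adult_octaves = octaves_above_fundamental(110, 8)
child_imitation = transpose_into_range(110, low_hz=200, high_hz=400)
```

In this example the child's imitation lands on 220 Hz, one octave above the model and coinciding with the model's second harmonic.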

1.2.2.2 Consonance

The term “consonance” is used to describe combinations of tones that are perceived as pleasant by humans. It has long been observed that the defining feature of these combinations is the separation of their frequencies by small integer frequency ratios (e.g., Bowling and Purves 2015; Krumhansl 1990; Terhardt 1984). The octave with its ratio of 2:1 is the most consonant interval, followed by the “perfect fifth” at 3:2 and the “perfect fourth” at 4:3. The opposite of consonant intervals would be “dissonant” intervals with relatively complex frequency ratios (e.g., a tritone with a ratio of 45:32). Such intervals are often considered less pleasant or even unpleasant to listen to (e.g., Wagner et al. 2020). A number of studies tested non-human species for consonance perception. The methods and goals of these studies varied, however. Some studies only tested whether certain species could discriminate between consonance and dissonance but not whether they preferred one over the other. Japanese macaques (Macaca fuscata; Izumi 2000), European starlings (Hulse et al. 1995), and Java sparrows (Lonchura oryzivora; Watanabe et al. 2005) were shown to be able to learn to discriminate between consonance and dissonance. The ability of pigeons and black-capped chickadees to discriminate complex chords also suggests that they would be capable of such a discrimination (Brooks and Cook 2010; Hoeschele et al. 2012a; see Toro and Crespo 2017 for a review). Other studies also investigated actual preferences for listening to consonant over dissonant stimuli. The results of these studies were often null, such as for Túngara frogs (Physalaemus pustulosus; Akre et al. 2014), Campbell’s monkeys (Cercopithecus campbelli; Koda et al. 2013), and cotton-top tamarins (Saguinus oedipus; McDermott and Hauser 2004). Where results were not null, they were later criticized or came into conflict with later results.
A study that documented a preference for consonance in albino rats (Rattus norvegicus; Fannin and Braud 1971) was contradicted by results from a later study by Crespo-Bojorque and Toro (2015). Another study documenting consonance preference by an infant chimpanzee (Sugimoto et al. 2010) was criticized by Chiandetti and Vallortigara (2011), who pointed out a lack of control. However, there is some evidence for consonance preference in birds derived from field studies: in some bird species, such as the musician wren (Cyphorhinus arada; Doolittle and Brumm 2012) and the hermit thrush (Catharus guttatus; Doolittle et al. 2014), songs contain consonant intervals. In the great tit, the production of consonant interval sequences correlates positively with mating success (Richner 2016). Yet none of these studies is directly comparable to studies with humans, and as such it again became relevant to consider where human consonance preference originates. In ontogeny, consonance preference occurs early, during pre-verbal infanthood (Masataka 2006; Perani et al. 2010; Schellenberg and Trehub 1996; Trainor et al. 2002; Trainor and Heinmiller 1998; Trehub 2003; Zentner and Kagan 1996, 1998;

but see Plantinga and Trehub 2014). As such, a biological basis of the phenomenon has been suggested (e.g., Bowling and Purves 2015; Schwartz et al. 2003; Terhardt 1984; but see McDermott et al. 2016 and Bowling et al. 2017 for discussion). Above we have discussed how the octave is featured prominently in the harmonic series of the overtones of our voice. Considering the harmonic series again, because all the harmonics are integer multiples of the fundamental, it becomes clear that it prominently features the most consonant intervals, namely the octave (2:1), the perfect fifth (3:2), and the perfect fourth (4:3). Human preference for consonance also has been shown to correlate with a preference for harmonic sounds (Bowling and Purves 2015; Bowling et al. 2017, 2018; Cousineau et al. 2012; McDermott et al. 2010). As such, a preference for human voices could translate into a preference for consonant musical intervals, another aspect of the “vocal similarity hypothesis.” As most non-human animals’ voices are also harmonic, a preference for their own voices could lead to consonance preference in many species. Evolutionarily speaking, preferring the voices of one’s own species could be beneficial, e.g., in distinguishing animate from inanimate objects (the latter do not usually produce harmonic sounds, as described by, e.g., Chiandetti and Vallortigara 2011). Here, once again, our knowledge about human vocal behavior allows for new testable hypotheses. A helpful approach with regard to the vocal similarity hypothesis is the implementation of cross-species studies. For example, one could test whether animals will prefer consonance over dissonance in a setup where they would benefit from attending to/approaching a conspecific’s vocalizations. Indeed, such a study was conducted by Chiandetti and Vallortigara (2011).
In their paradigm, newly hatched chicks (Gallus domesticus) that were incubated in acoustic isolation preferentially approached one out of two identical imprinting objects if it was associated with a consonant piano melody as opposed to a dissonant version of the same melody. The chicks may have used consonance as a cue as to which object was more likely to be their mother because a mother hen’s clucking is a harmonic sound. Chiandetti and Vallortigara’s (2011) study thus empirically demonstrated a species’ attraction to musical consonance as a proxy for attraction to vocalizations. However, the study was not directly comparable to studies with humans. As such, Wagner et al. (2020) conducted a similar place preference study where humans could freely choose whether they wanted to spend time with dissonance or with consonance. Humans preferred consonance in this setup. Budgerigars, in a parallel study, did not (Wagner et al. 2020). Note, however, that in budgerigars vocal output is not as clearly harmonic as it is in humans (Lavenex 1999; Tu et al. 2011), meaning that this result does not contradict the vocal similarity hypothesis. Considering the results from non-human consonance and octave equivalence studies together with what we know about humans allows for new insights about the phenomena across species. In a review paper, Wagner and Hoeschele (2022) combine the findings we reviewed above to suggest that both consonance preference and octave equivalence are constrained by multiple interplaying factors. Specifically, they proposed four essential traits for a species’ attention to, attraction to, and/or production of harmonic information, namely: (1) vocal learning, (2) clear harmonic vocalizations, (3) differing vocal ranges, and (4) simultaneous vocalization/duetting.

We have already discussed the first three of these traits above. The importance of the fourth trait once again stems from considering what we know about human vocal communication: Simultaneous vocalizing and duetting are found in human music. There the harmonic series is particularly important, as harmonics need to be taken into account for produced sounds to merge into one. Recently, the cross-cultural “pleasantness” of consonant intervals has been hotly debated (Jacoby et al. 2019; Athanasopoulos et al. 2021; but see Bowling et al. 2017). According to some studies, “pleasantness” as such may be less important than the perceptual effects of consonant intervals merging into one another. McPherson et al. (2020) describe how simultaneously occurring notes were more likely to be perceived as one note if they were separated by consonant intervals. This is less surprising when considering that greater overlap in harmonic information between notes is directly related to the perceived consonance. As such, the octave as the most consonant interval should show the greatest perceptual fusion because there is most harmonic overlap between the compound notes. Indeed, the basis of octave equivalence has recently been suggested to be in perceptual merging of sounds within a harmonic structure due to neural phase locking (Demany et al. 2021). As humans possess all of the proposed four traits underlying octave equivalence/ consonance from Wagner and Hoeschele (2022), they may be especially prone to attending to harmonic information. However, many non-human species possess subsets of these traits. This makes it possible to test the importance of each trait by comparing species with different subsets of these abilities. By conducting such research, we may also find that non-human animals possess additional traits that may be important to their vocal communication and which influences their acoustic perceptions. 
As such, starting from what we know about human vocal communication opened up a line of research that also allows for further insights into the traits and abilities of other species. The main question here could be summarized as: In which ways do other species use the physics of sound and how is this integrated in their cognition?
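
The link between small-integer frequency ratios and harmonic overlap can be checked with exact arithmetic. The following sketch assumes idealized tones with harmonics at exact integer multiples of their fundamentals; `shared_harmonic_fraction` and the choice of 96 harmonics are illustrative assumptions, not a model taken from the studies cited above.

```python
from fractions import Fraction


def shared_harmonic_fraction(ratio, n_harmonics=96):
    """Share of the upper tone's first n harmonics that coincide with some
    harmonic of the lower tone.

    `ratio` is the upper fundamental divided by the lower fundamental,
    for example Fraction(3, 2) for a perfect fifth.
    """
    ratio = Fraction(ratio)
    shared = sum(
        1
        for k in range(1, n_harmonics + 1)
        # The k-th upper harmonic coincides with a lower harmonic exactly
        # when k * ratio is a whole number.
        if (k * ratio).denominator == 1
    )
    return Fraction(shared, n_harmonics)


# Interval ratios written as upper over lower frequency.
intervals = {
    "octave": Fraction(2, 1),
    "perfect fifth": Fraction(3, 2),
    "perfect fourth": Fraction(4, 3),
    "tritone": Fraction(45, 32),
}
overlap = {name: shared_harmonic_fraction(r) for name, r in intervals.items()}
```

Under these assumptions every harmonic of the upper tone coincides with a harmonic of the lower tone for the octave, every second one for the fifth, every third one for the fourth, and only every 32nd one for the tritone, which mirrors the ordering from most to least consonant.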

1.2.3 Impacts on the Structural Domain

Not only does human communication depend on our perception and production of rhythmic and spectral features of sound, but both music and language depend on structural information as to how we organize which type of element will occur in a sequence. The study of human language (as well as human speech) has been one of the areas where human-based research has been used quite extensively to understand animal communication. Ironically, this has often been the case because researchers have set human linguistic traits as “special” compared to non-human communicative or cognitive systems. Researchers usually have good, logical reasons to think that some language-related trait is, in fact, unique to humans. For example, categorical perception—the ability to perceive gradient stimuli as categorical—seems to be a well-suited adaptation for overcoming imprecision in the highly variable and rapid

linguistic signal (Liberman et al. 1957). Yet, chinchillas (Chinchilla chinchilla) and budgerigars (Melopsittacus undulatus) can both be trained to categorically perceive the same types of sounds that humans were tested on (stop consonant pairs like /p/-/b/, /t/-/d/, /k/-/g/). Furthermore, swamp sparrows (Melospiza georgiana) tested on their species-typical vocal units show similar perceptual boundaries along acoustic gradients (Lachlan and Nowicki 2015; Nelson and Marler 1989). For practically every domain of linguistics, there exists some animal that demonstrates a language-related trait (see Bugnyar et al. 2016; Doupe and Kuhl 1999; Fitch 2017; Pika et al. 2018 and references therein). Naturally, these data re-contextualize human linguistic abilities, but importantly for animal research, the data also provide important insights into non-human animal communication and cognition. For instance, while “formants” (vocal tract resonances) were long known to be an important cue for vowels in spoken human language, formant research in animals demonstrated the ubiquity of the source-filter theory in tetrapod vocalizations (Fitch 1999; Taylor and Reby 2010) and that formants can serve non-linguistic functions (namely, as a cue to body size; Reby et al. 2005). There are still concepts, tools, and approaches used to study human language and human speech that can be further applied in non-human animal acoustic communication. In particular, the concept of the “segment” has great potential (see Mann and Hoeschele 2020). We define “segments” as discrete units that are divided in the acoustic stream by rapid transitions in one or more acoustic measures (like amplitude or frequency). Importantly, these units are not necessarily divided by intervals of silence. In humans, “segment” is a term that can be applied to both “phones” and “phonemes” (the former refers to physical, acoustic units while the latter is the mental abstraction of those units).
So, segments are central in the study of both phonetics (the study of speech sounds) and phonology (the study of the abstract organization of speech sounds). In spite of its central role in spoken human language research, the segment has received relatively little attention in animal acoustic communication. The vast majority of acoustic communication systems are analyzed using silence as a boundary marker (e.g., syllables, songs). In humans, silence is an important acoustic cue but it is not a reliable marker of word or even sentence boundaries (Wang et al. 2010). So, a silence-bounded utterance from a speaker will likely contain a large number of syntactic, semantic, and phonological units. This information would not be apparent without referencing segments. Non-human animals could similarly be packing information into their “syllables” (we use the term “syllable” for the smallest acoustic unit divided by silence in non-humans; because “syllable” has a separate meaning in linguistics, we use “utterance” for humans). If comparative researchers want to make the most acoustically analogous comparisons to spoken human language, segmental information should not be ignored. These data could reveal a layer of complexity to some species’ vocal behavior or could potentially reveal some broader principles about acoustic communication. Non-humans do not need to be conveying the same type of information as human language for segments to provide compelling data.
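
The difference between silence-bounded syllables and transition-bounded segments can be sketched in a few lines. The frame values, thresholds, and function names below are hypothetical; a real analysis would need acoustic measures and thresholds appropriate to the species and recording.

```python
def silence_bounded_units(amplitude, silence_threshold):
    """Split frames into units separated by silent frames.

    Returns a list of (start, end) frame index pairs, with end exclusive.
    """
    units, start = [], None
    for i, a in enumerate(amplitude):
        if a > silence_threshold and start is None:
            start = i                      # sound begins
        elif a <= silence_threshold and start is not None:
            units.append((start, i))       # sound ends at a silent frame
            start = None
    if start is not None:
        units.append((start, len(amplitude)))
    return units


def segments_within_unit(frequency, unit, jump_threshold):
    """Split one silence-bounded unit wherever the frequency changes abruptly
    between consecutive frames (a rapid transition without silence)."""
    start, end = unit
    segments, seg_start = [], start
    for i in range(start + 1, end):
        if abs(frequency[i] - frequency[i - 1]) > jump_threshold:
            segments.append((seg_start, i))
            seg_start = i
    segments.append((seg_start, end))
    return segments


# Hypothetical frame-wise measurements: one continuous sound whose
# frequency jumps from about 1 kHz to about 3 kHz halfway through.
amplitude = [0.0, 0.8, 0.9, 0.9, 0.8, 0.9, 0.8, 0.0]
frequency = [0, 1000, 1010, 1005, 3000, 3010, 2990, 0]

syllables = silence_bounded_units(amplitude, silence_threshold=0.1)
segments = [
    segments_within_unit(frequency, unit, jump_threshold=500)
    for unit in syllables
]
```

A silence-based analysis finds a single syllable in this example, while the transition-based pass recovers two segments inside it.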

While segmental analyses are relatively rare when compared with syllable or song level analyses, segments have been described and analyzed in a few species. For instance, banded mongooses (Mungos mungo) produce syllables composed of two segments (Jansen et al. 2013). The initial segment is a harsh, aperiodic signal while the final segment is harmonic. These segments seem to provide distinct functional information, as well. The initial segment carries individual identity cues while the duration of the final segment varies by behavioral context. Similarly, dingoes (Canis familiaris dingo) have two-segment syllables with a noisy bark preceding a harmonic howl (Déaux et al. 2016). These bark-howl syllables are produced when an individual is threatened but, unlike with the banded mongooses, both segments seem to carry individual identity cues. Western lowland (Gorilla gorilla gorilla) and mountain (Gorilla beringei beringei) gorilla close calls are built from a repertoire of five basic units. Each of these units can be produced in isolation or as segments in multi-segment syllables (Hedwig et al. 2014). Within syllables, some unit types co-occur more frequently and some transitions are more common than others, which suggests there may be some organizational rules or biases. At the same time, the transitions between the types were not completely fixed (e.g., “t2” was usually followed by “t4” but could also be followed by “a1,” “t3,” or itself), suggesting more flexibility than is often assumed with non-human primates. In the banded mongoose, the dingo, and the western gorilla, some or all of the segments can be produced as isolated syllables, suggesting combinatorial abilities that could parallel human phonological abilities (Bowling and Fitch 2015; Fitch 2012).
There are other systems that seem to have discrete units within syllables, but where it is more difficult to classify the units into clear segment types or to assign a functional purpose to the segments (unlike with the banded mongoose segments, for instance). Multiple species of shorebirds produce distress calls in which a continuously varying tonal signal switches to some non-linear acoustic phenomenon, like deterministic chaos, subharmonics, or frequency jumps (Miller et al. 2022). In a non-linear system, a continuous change in some parameter (e.g., airflow, sub-glottal/syringeal pressure, etc.) can lead to a discrete outcome, like biphonation or frequency jumps. This has been found in chimpanzee pant hoots, as well, which commonly end with biphonation, subharmonics, deterministic chaos, or frequency jumps (Riede et al. 2004). Similar transitions from harmonic signals to non-linear acoustics have been described in mammals (Wilden et al. 1998), birds (Zollinger et al. 2008), frogs (Ryan and Guerra 2014), and fish (Rice et al. 2011). It is unclear how these non-linear dynamics relate to potential units in vocal repertoires. In humans, non-linear acoustic phenomena can be evidence of vocal fold pathology (e.g., development of cysts or polyps; Herzel et al. 1994). It is also possible that these segments are not structured units that are the result of precise motor planning, but rather are simply by-products of biophysical interactions in the sound-producing organs, much like voice cracks in humans (Fitch et al. 2002; Wilden et al. 1998). This does not necessarily mean there is no adaptive function for non-linear acoustics. For instance, in shorebirds, because non-linearities arise in distress calls, segments should be less structured and could be more like human infant crying where unpredictability serves to overcome habituation to vocalizations (Fitch et al.

2002). Another possibility is that non-linear phenomena could be utilized in complex, structured sequences. In this case, signalers could produce more varied and complex vocal behavior while relying on less neural control. Fee et al. (1998) found that acoustic transitions within syllables in the complex, learned song of zebra finches (Taeniopygia guttata), were more dynamic than would be expected given the continuous changes in the syringeal or respiratory parameters. Segmental systems, when compared to systems where the most basic units are divided by silence, permit less ambiguous information to be transmitted over a shorter period of time (Jansen et al. 2013; Mann and Hoeschele 2020). So, segmental systems might be more common as complexity (e.g., larger repertoire size, less stereotypy, more socially learned elements, etc.) increases. However, there are even less data on segments in vocal learning species, even among those that produce complex song, like songbirds or cetaceans. This may be a reflection of the difficulty in analyzing these complex signals rather than a genuine lack of segmental systems, though. For instance, in songbird research, segments are discussed as a potential unit of analysis (Williams 2004, called “elements”), even though they are rarely analyzed. In analyses of cetacean tonal-like acoustic signals, within-syllable frequency changes are used as a metric of signal complexity (Kershenbaum et al. 2013; May-Collado and Wartzok 2008). While these measures are not explicitly segmental analyses, these approaches do incorporate data related to within-syllable boundaries. In many species, individuals or populations repeat stereotyped frequency contours, so the within-syllable frequency changes are not random. At the same time, distinct frequency contours are present within groups and even within individual repertoires. 
For instance, bottlenose dolphin signature whistles are clearly distinct from one individual to the next (Tursiops truncatus, Janik 1999). This suggests the timing, direction, and magnitude of the frequency changes encode information. Thus far, most of these analyses convert this segmental information into a single metric. In bottlenose dolphins, May-Collado and Wartzok (2008) summed the frequency changes over the signals as a proxy for complexity. Kershenbaum et al. (2013) went further by converting the continuous signals into a sequence of segments. Using a Parsons algorithm, each sound was broken into a fixed number of segments which were assigned to one of seven categories based on the magnitude and direction of the frequency contour (e.g., flat, large rise, medium fall, etc.). Each signal was thus a sequence drawn from seven segment types, so the authors could use Shannon entropy to quantify the complexity of the signal. In two (likely) vocal learning species, developmental data suggest segments could provide insight into how adult vocalizations are formed. In harp seals (Phoca groenlandica), pups produce a wide range of vocalizations, many of which seem to be made up of segments (or “parts,” Miller and Murray 1995; note that while there are no clear data on vocal learning in harp seals, there are for the closely related harbor seal; Reichmuth and Casey 2014). Some vocalizations could be composed of more than a dozen segments (the maximum was 16 segments) and all ten individuals in the sample produced calls that had at least three segments. Miller and Murray (1995) argue the complexity of these vocalizations seems functionally

16

M. Hoeschele et al.

“unnecessary.” Pups produce them in mother–pup interactions but it is not clear why the pups would need such structured and variable calls when communicating with their mothers. Especially when compared to other pinniped species (like elephant seals, and most otariids), harp seal mother–pup vocalizations are more complex even though other species seem to face more pressures that might be expected to increase complexity (e.g., longer relationships, higher breeding density, and more pup mobility). Miller and Murray hypothesize that the vocalizations might be akin to songbird subsong or human babbling, where juveniles explore the articulatory space while learning to produce more consistent and precise vocalizations which match the functional purpose needed for that species. Thus, segmental systems might not be difficult to develop, but might be unnecessary or even counter-productive to the communicative needs of many species. In the harp seals, segments seem to be more present in early stages of vocal development, but the opposite pattern seems to exist in intermediate roundleaf bats (Hipposideros larvatus; a member of family Rhinolophidae which Knörnschild 2014 labels as “highly promising” for vocal production learning). Chi et al. (2020) found that infant intermediate roundleaf bats produce discrete syllables in the first few days after birth. But, as time goes on, the discrete syllables are produced with shorter and shorter intervals of silence. By adulthood, the syllables are no longer discrete, but are merged into a single syllable. In some cases, the individual components of the syllables seem obvious only in light of the ontological development. These data suggest adults use segments in their vocal system but need time to develop the appropriate vocal motor control that permits tightly coordinated articulations. 
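The two whistle metrics described above (a summed frequency change, and a Parsons-style coding followed by Shannon entropy) can be illustrated with a minimal sketch. This is not the published implementation of either study: the function names, the number of segments, and the three relative-change thresholds are assumptions chosen only for illustration, and the contour is assumed to be a list of positive frequency values in Hz sampled at regular intervals.

```python
import math
from collections import Counter


def summed_frequency_change(contour):
    """Sum of absolute frequency changes along a contour (Hz)."""
    return sum(abs(b - a) for a, b in zip(contour, contour[1:]))


def resample(contour, n_points):
    """Linearly interpolate a contour to n_points equally spaced samples."""
    if len(contour) < 2 or n_points < 2:
        raise ValueError("need at least two contour points and two samples")
    last = len(contour) - 1
    samples = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        samples.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return samples


def parsons_code(contour, n_segments=10,
                 flat_max=0.01, small_max=0.05, medium_max=0.15):
    """Code a contour as a sequence over seven segment types.

    The thresholds are hypothetical relative changes per segment:
    below flat_max is "flat"; otherwise small, medium, or large,
    combined with the direction (rise or fall).
    """
    points = resample(contour, n_segments + 1)
    symbols = []
    for start, end in zip(points, points[1:]):
        change = (end - start) / start
        size = abs(change)
        if size < flat_max:
            symbols.append("flat")
            continue
        if size < small_max:
            magnitude = "small"
        elif size < medium_max:
            magnitude = "medium"
        else:
            magnitude = "large"
        direction = "rise" if change > 0 else "fall"
        symbols.append(f"{magnitude} {direction}")
    return symbols


def shannon_entropy(symbols):
    """Shannon entropy (bits) of the symbol frequencies in a sequence."""
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Under these assumptions, a perfectly flat contour yields a single symbol type and an entropy of zero, whereas a contour that uses several segment types in similar proportions yields a higher entropy.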
For species that have very large syllable repertoires, segmental analyses could help to reduce the number of basic units and make description more tractable, or could open up new avenues for cross-species comparisons. For instance, while budgerigar (Melopsittacus undulatus) warble has been described as having only eight syllable classes, there is considerable within-class variation. In describing budgerigar warble, Farabaugh et al. (1992) stated that “[v]ariability was so extreme that exact repeated renditions of particular syllables were relatively rare, and series of syllables composing particular songs like those of oscine songbirds were virtually non-existent” (p. 118). However, while there may be little stereotypy in syllables and their organization, there may be more structure at the segmental level. Syllables are built from two broad segment classes, a pattern that somewhat parallels the consonant–vowel distinction found in all spoken human languages (Mann et al. 2021). Furthermore, Mann et al. found that budgerigars from multiple populations show a strong tendency for producing more aperiodic segments in syllable-initial positions while producing harmonic segments in syllable-medial positions, another parallel with human segmental biases. Critically, this work with budgerigar segments was made possible with a model that was based on human segments. In sum, considering the level at which human vocalizations would need to be evaluated in order to fully decode our species’ vocalizations points to a critical difference between human and many non-human animal studies: the importance of the segment. Studying segments in non-human species enables us to find structure in species’ vocalizations that so far were considered novel at every rendition (such as budgerigar warble) and/or to simplify the repertoire of a given species by identifying units that repeat within syllables (such as the discrete units of young roundleaf bats that appear within adult syllables). Segment analysis may be crucial to understanding a given vocalization’s meaning or purpose. Additionally, segmental analysis allows us to form hypotheses about the necessities and limitations of a given species’ vocalizations depending on other factors such as the extent of the species’ vocal learning abilities.
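A positional bias of the kind reported above for budgerigar warble could, in principle, be tallied as in the following sketch. It is an illustration only and not the analysis used by Mann et al. (2021): the function names are hypothetical, each syllable is assumed to be already segmented and labeled as a list of "aperiodic" or "harmonic" segments, and a one-segment syllable is counted as syllable-initial.

```python
from collections import Counter, defaultdict


def position_of(index, length):
    """Label a segment as syllable-initial, -medial, or -final."""
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def class_proportions_by_position(syllables):
    """Proportion of each segment class at each syllable position."""
    counts = defaultdict(Counter)
    for syllable in syllables:
        for index, segment_class in enumerate(syllable):
            counts[position_of(index, len(syllable))][segment_class] += 1
    return {
        position: {cls: n / sum(tally.values()) for cls, n in tally.items()}
        for position, tally in counts.items()
    }
```

Given labeled syllables, comparing the proportion of aperiodic segments in initial versus medial positions gives a simple descriptive view of the bias; a formal test would additionally need to account for individual and population differences.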

1.3 Conclusions

Here, we showed how the study of animal acoustic communication can be enhanced by considering what we know about the human species. We showed that features that have cross-cultural importance for humans may well be relevant for other species. This has received little attention to date in the acoustic domain, mainly because the tools to study it were not available in the past. As we hope we have shown, studying human abilities in other species is best accomplished through direct comparison: that is, by studying humans under the same conditions as the compared species in question. In all cases, it is of critical importance to consider how we would see a particular human behavior if we were analyzing it in another species. Then, when we have identified what we want to study, the method with which we study it also needs to be considered. When it comes to analyzing vocal production, this is somewhat straightforward, although even here it is important to consider when and how the vocalizations were recorded across species. For other behavioral measures, this can become quite complicated. Using operant conditioning and place preference paradigms can help this endeavor; however, differences in hearing ability, situational relevance, and acoustic relevance all need to be taken into account. For paradigms that use reward, the reward relevance also needs to be considered and adjusted across species. We hope we have demonstrated that, despite concerns about anthropomorphism when comparing humans to other animals (e.g., Wynne 2004), careful consideration and comparison between non-human animal and human behavior can be extremely helpful in the study of animal behavior, at least with regard to animal communication. Perhaps, with new insights about the building blocks of human music and language, the ability to communicate with other animals is on the horizon.
Acknowledgments Dan Mann is supported by funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement no. 101028815.

References
Akre KL, Bernal X, Rand AS, Ryan MJ (2014) Harmonic calls and indifferent females: no preference for human consonance in an anuran. Proc Biol Sci 281(1789):20140986. https://doi.org/10.1098/rspb.2014.0986
Allen O (1967) Octave discriminability of musical and non-musical subjects. Psychon Sci 7(12):421–422. https://doi.org/10.3758/BF03331154
Allen C, Trestman M (2017) Animal consciousness. In: Schneider S, Velmans M (eds) The Blackwell companion to consciousness. Wiley, pp 63–76. https://doi.org/10.1002/9781119132363.ch5
Ashida G, Carr CE (2011) Sound localization: Jeffress and beyond. Curr Opin Neurobiol 21(5):745–751. https://doi.org/10.1016/j.conb.2011.05.008
Athanasopoulos G, Eerola T, Lahdelma I, Kaliakatsos-Papakostas M (2021) Harmonic organisation conveys both universal and culture specific cues for emotional expression in music. PLoS One 16(1):e0244964. https://doi.org/10.1371/journal.pone.0244964
Billroth T (1895) Wer ist musikalisch? Gebrüder Paetel, Berlin
Blackwell HR, Schlosberg H (1943) Octave generalization, pitch discrimination, and loudness thresholds in the white rat. J Exp Psychol 33(5):407–419. https://doi.org/10.1037/h0057863
Bowling DL, Fitch WT (2015) Do animal communication systems have phonemes? Trends Cogn Sci 19(10):555–557. https://doi.org/10.1016/j.tics.2015.08.011
Bowling DL, Purves D (2015) A biological rationale for musical consonance. Proc Natl Acad Sci U S A 112(36):11155–11160. https://doi.org/10.1073/pnas.1505768112
Bowling DL, Hoeschele M, Gill KZ, Tecumseh Fitch W (2017) The nature and nurture of musical consonance. Music Percept 35(1):118–121
Bowling DL, Purves D, Gill KZ (2018) Vocal similarity predicts the relative attraction of musical chords. Proc Natl Acad Sci U S A 115(1):216–221. https://doi.org/10.1073/pnas.1713206115
Brooks DI, Cook RG (2010) Chord discrimination by pigeons. Music Percept 27(3):183–196. https://doi.org/10.1525/mp.2010.27.3.183
Bugnyar T, Reber SA, Buckner C (2016) Ravens attribute visual access to unseen competitors. Nat Commun 7(9):10506. https://doi.org/10.1038/ncomms10506
Burns EM (1999) Intervals, scales, and tuning. In: Deutsch D (ed) Psychology of music, 2nd edn. Academic Press, pp 215–264. https://doi.org/10.1016/B978-012213564-4/50008-1
Cahill JA, Armstrong J, Deran A, Khoury CJ, Paten B, Haussler D, Jarvis ED (2021) Positive selection in noncoding genomic regions of vocal learning birds is associated with genes implicated in vocal learning and speech functions in humans. Genome Res 31(11):2035–2049. https://doi.org/10.1101/gr.275989.121
Chi T, Liu M, Tan X, Sun K, Jin L, Feng J (2020) Syllable merging during ontogeny in Hipposideros larvatus. Bioacoustics 29(4):387–398. https://doi.org/10.1080/09524622.2019.1610906
Chiandetti C, Vallortigara G (2011) Chicks like consonant music. Psychol Sci 22(10):1270–1273. https://doi.org/10.1177/0956797611418244
Clark, Yallop C (1990) An introduction to phonetics and phonology, Blackwell
Cook P, Rouse A, Wilson M, Reichmuth C (2013) A California sea lion (Zalophus californianus) can keep the beat: motor entrainment to rhythmic auditory stimuli in a non vocal mimic. J Comp Psychol 127(4):412–427. https://doi.org/10.1037/a0032345
Cousineau M, McDermott J, Peretz I (2012) The basis of musical consonance as revealed by congenital amusia. Proc Natl Acad Sci 109(48):19858–19863. https://doi.org/10.1073/pnas.1207989109
Crespo-Bojorque P, Toro JM (2015) The use of interval ratios in consonance perception by rats (Rattus norvegicus) and humans (Homo sapiens). J Comp Psychol 129(1):42–51. https://doi.org/10.1037/a0037991

Cynx J (1993) Auditory frequency generalization and a failure to find octave generalization in a songbird, the European starling (Sturnus vulgaris). J Comp Psychol 107(2):140–146. https://doi.org/10.1037/0735-7036.107.2.140
Darwin C (1871) The descent of man, and selection in relation to sex. J. Murray, London. https://doi.org/10.5962/bhl.title.2092
Deaúx EC, Allen AP, Clarke JA, Charrier I (2016) Concatenation of “alert” and “identity” segments in dingoes’ alarm calls. Sci Rep 6(February):1–9. https://doi.org/10.1038/srep30556
Demany L, Armand F (1984) The perceptual reality of tone chroma in early infancy. J Acoust Soc Am 76(1):57–66. https://doi.org/10.1121/1.391006
Demany L, Monteiro G, Semal C, Shamma S, Carlyon RP (2021) The perception of octave pitch affinity and harmonic fusion have a common origin. Hear Res 404:108213. https://doi.org/10.1016/j.heares.2021.108213
Deutsch D (1972) Octave generalization and tune recognition. Percept Psychophys 11:411–412. https://doi.org/10.3758/BF03206280
Doolittle E, Brumm H (2012) O canto do Uirapuru: consonant intervals and patterns in the song of the musician wren. J Interdiscip Music Stud 6(1):55–85. https://doi.org/10.4407/jims.2013.10.003
Doolittle EL, Gingras B, Endres DM, Fitch WT (2014) Overtone-based pitch selection in hermit thrush song: unexpected convergence with scale construction in human music. Proc Natl Acad Sci 111(46):16616–16621. https://doi.org/10.1073/pnas.1406023111
Doupe AJ, Kuhl PK (1999) Birdsong and human speech: common themes and mechanisms. Annu Rev Neurosci 22:567–631. https://doi.org/10.1146/annurev.neuro.22.1.567
Dowling WJ, Hollombe AW (1977) The perception of melodies distorted by splitting into several octaves: effects of increasing proximity and melodic contour. Percept Psychophys 21(1):60–64. https://doi.org/10.3758/BF03199469
Fannin HA, Braud WG (1971) Preference for consonant over dissonant tones in the albino rat. Percept Mot Skills 32(1):191–193. https://doi.org/10.2466/pms.1971.32.1.191
Farabaugh SM, Brown ED, Dooling RJ (1992) Analysis of warble song of the budgerigar: Melopsittacus undulatus. Bioacoustics 4(2):111–130. https://doi.org/10.1080/09524622.1992.9753211
Fee MS, Shraiman B, Pesaran B, Mitra PP (1998) The role of nonlinear dynamics of the syrinx in the vocalizations of a songbird. Nature 395(6697):67–71. https://doi.org/10.1038/25725
Fitch WT (1999) Acoustic exaggeration of size in birds via tracheal elongation: comparative and theoretical analyses. J Zool 248:31–48. https://doi.org/10.1017/S095283699900504X
Fitch WT (2005) The evolution of music in comparative perspective. Ann N Y Acad Sci 1060:29–49. https://doi.org/10.1196/annals.1360.004
Fitch WT (2012) Segmental structure in banded mongoose calls. BMC Biol 10:2–5. https://doi.org/10.1186/1741-7007-10-98
Fitch WT (2017) Empirical approaches to the study of language evolution. Psychon Bull Rev 24(1):3–33. https://doi.org/10.3758/s13423-017-1236-5
Fitch WT, Neubauer J, Herzel H (2002) Calls out of chaos: the adaptive significance of nonlinear phenomena in mammalian vocal production. Anim Behav 63(3):407–418. https://doi.org/10.1006/anbe.2001.1912
Greenberg J (2001) Studying organizational justice cross-culturally: fundamental challenges. Int J Confl Manag 12(4):365–375
Greenfield MD (2005) Mechanisms and evolution of communal sexual displays in arthropods and anurans. Adv Study Behav 35(05):1–62. https://doi.org/10.1016/S0065-3454(05)35001-7
Hagmann CE, Cook RG (2010) Testing meter, rhythm, and tempo discriminations in pigeons. Behav Process 85(2):99–110. https://doi.org/10.1016/j.beproc.2010.06.015
Hattori Y, Tomonaga M, Matsuzawa T (2013) Spontaneous synchronized tapping to an auditory rhythm in a chimpanzee. Sci Rep 3(1566):1–6. https://doi.org/10.1038/srep01566
Hauser MD, Chomsky N, Fitch WT (2002) The faculty of language: what is it, who has it, and how did it evolve? Science 298(5598):1569–1579. https://doi.org/10.1126/science.298.5598.1569

Hedwig D, Robbins MM, Mundry R, Hammerschmidt K, Boesch C (2014) Acoustic structure and variation in mountain and western gorilla close calls: a syntactic approach. Behaviour 151(8):1091–1120. https://doi.org/10.1163/1568539X-00003175
Herzel H, Berry D, Titze IR, Saleh M (1994) Analysis of vocal disorders with methods from nonlinear dynamics. J Speech Hear Res 37(5):1008–1019. https://doi.org/10.1044/jshr.3705.1008
Hirst D, Looze C (2021) Fundamental frequency and pitch. In: Knight R, Setter J (eds) The Cambridge handbook of phonetics (Cambridge handbooks in language and linguistics). Cambridge University Press, Cambridge, pp 336–361. https://doi.org/10.1017/9781108644198.014
Hoeschele M (2017) Animal pitch perception: melodies and harmonies. Comp Cogn Behav Rev 12:5–18. https://doi.org/10.3819/CCBR.2017.120002
Hoeschele M, Cook RG, Guillette LM, Brooks I, Sturdy CB (2012a) Black-capped chickadee (Poecile atricapillus) and human (Homo sapiens) chord discrimination. J Comp Psychol 126(1):57–67. https://doi.org/10.1037/a0024627
Hoeschele M, Weisman RG, Sturdy CB (2012b) Pitch chroma discrimination, generalization, and transfer tests of octave equivalence in humans. Atten Percept Psychophys 74(8):1742–1760. https://doi.org/10.3758/s13414-012-0364-2
Hoeschele M, Weisman RG, Sturdy CB, Hahn A, Guilette L (2013) Chickadees fail standardized operant tests for octave equivalence. Anim Cogn 16(4):599–609. https://doi.org/10.1007/s10071-013-0597-z
Honing H, Merchant H, Háden GP, Prado L, Bartolo R (2012) Rhesus monkeys (Macaca mulatta) detect rhythmic groups in music, but not the beat. PLoS One 7(12):e51369. https://doi.org/10.1371/journal.pone.0051369
Honing H, ten Cate C, Peretz I, Trehub SE (2015) Without it no music: cognition, biology and evolution of musicality. Philos Trans R Soc Lond B Biol Sci 370:20140088. https://doi.org/10.1098/rstb.2014.0088
Hove MJ, Sutherland ME, Krumhansl CL (2009) Ethnicity effects in relative pitch. In: Proceedings of the 31st annual meeting of the cognitive science society, vol 31. pp 2256–2261
Hulse SH, Humpal J, Cynx J (1984) Discrimination and generalization of rhythmic and arrhythmic sound patterns by European starlings (sturnus vulgaris). Music Percept 1(4):442–464. https://doi.org/10.2307/40285272
Hulse SH, Takeuchi AH, Braaten RF (1992) Perceptual invariances in the comparative psychology of music. Music Percept 10(2):151–184. https://doi.org/10.2307/40285605
Hulse SH, Bernard DJ, Braaten RF (1995) Auditory discrimination of chord-based spectral structures by European starlings (Sturnus vulgaris). J Exp Psychol Gen 124(4):409–423. https://doi.org/10.1037/0096-3445.124.4.409
Izumi A (2000) Japanese monkeys perceive sensory consonance of chords. J Acoust Soc Am 108(6):3073–3078. https://doi.org/10.1121/1.1323461
Jacoby N, Undurraga EA, McPherson MJ, Valdés J, Ossandón T, McDermott JH (2019) Universal and non-universal features of musical pitch perception revealed by singing. Curr Biol 29(19):3229–3243. https://doi.org/10.1016/j.cub.2019.08.020
Jacoby N, Polak R, London J (2021) Extreme precision in rhythmic interaction is enabled by role-optimized sensorimotor coupling: analysis and modelling of West African drum ensemble music. Philos Trans R Soc Lond B Biol Sci 376(1835):20200331. https://doi.org/10.1098/rstb.2020.0331
Janik VM (1999) Pitfalls in the categorization of behaviour: a comparison of dolphin whistle classification methods. Anim Behav 57(1):133–143. https://doi.org/10.1006/anbe.1998.0923
Jansen DA, Cant MA, Manser MB (2013) Segmental concatenation of individual signatures and context cues in banded mongoose (Mungos mungo) close calls. BMC Biol 10(1):97. https://doi.org/10.1186/1741-7007-10-97
Jarvis ED (2007) Neural systems for vocal learning in birds and humans: a synopsis. J Ornithol 148(SUPPL. 1):35–44. https://doi.org/10.1007/s10336-007-0243-0

Jarvis ED (2019) Evolution of vocal learning and spoken language. Science 366(6461):50–54. https://doi.org/10.1126/science.aax0287
Kallman HJ (1982) Octave equivalence as measured by similarity ratings. Percept Psychophys 32(1):37–49. https://doi.org/10.3758/BF03204867
Kershenbaum A, Sayigh LS, Janik VM (2013) The encoding of individual identity in dolphin signature whistles: how much information is needed? PLoS One 8(10):1–7. https://doi.org/10.1371/journal.pone.0077671
Knörnschild M (2014) Vocal production learning in bats. Curr Opin Neurobiol 28:80–85. https://doi.org/10.1016/j.conb.2014.06.014
Koda H, Basile M, Olivier M, Remeuf K, Nagumo S, Blois-Heulin LA (2013) Validation of an auditory sensory reinforcement paradigm: Campbell’s monkeys (Cercopithecus campbelli) do not prefer consonant over dissonant sounds. J Comp Psychol 127(3):265–271. https://doi.org/10.1037/a0031237
Köppl C (2001) Tonotopic projections of the auditory nerve to the cochlear nucleus angularis in the barn owl. J Assoc Res Otolaryngol 2(1):41–53. https://doi.org/10.1007/s101620010027
Krumhansl CL (1990) Cognitive foundations of musical pitch. Oxford University Press
Krumhansl CL, Shepard RN (1979) Quantification of the hierarchy of tonal functions within a diatonic context. J Exp Psychol Hum Percept Perform 5(4):579–594. https://doi.org/10.1037/0096-1523.5.4.579
Lachlan RF, Nowicki S (2015) Context-dependent categorical perception in a songbird. Proc Natl Acad Sci 112(6):1892–1897. https://doi.org/10.1073/pnas.1410844112
Lameira AR, Eerola T, Ravignani A (2019) Coupled whole-body rhythmic entrainment between two chimpanzees. Sci Rep 9(1):18914. https://doi.org/10.1038/s41598-019-55360-y
Lavenex PB (1999) Vocal production mechanisms in the budgerigar (Melopsittacus undulatus): the presence and implications of amplitude modulation. J Acoust Soc Am 106(1):491–505. https://doi.org/10.1121/1.427079
Liberman AM, Harris KS, Hoffman HS, Griffith BC (1957) The discrimination of speech sounds within and across phoneme boundaries. J Exp Psychol 54(5):358–368. https://doi.org/10.1037/h0044417
Ludvig EA, Madan CR, Pisklak JM, Spetch ML (2014) Reward context determines risky choice in pigeons and humans. Biol Lett 10(8):20140451. https://doi.org/10.1098/rsbl.2014.0451
Mann DC, Hoeschele M (2020) Segmental units in nonhuman animal vocalization as a window into meaning, structure, and the evolution of language. Anim Behav Cogn 7(2):151–158. https://doi.org/10.26451/abc.07.02.09.2020
Mann DC, Lahti DC, Waddick L, Mundinger P (2020) House finches learn canary trills. Bioacoustics 1-17:215. https://doi.org/10.1080/09524622.2020.1718551
Mann DC, Fitch WT, Tu HW, Hoeschele M (2021) Universal principles underlying segmental structures in parrot song and human speech. Sci Rep 11(1):1–13. https://doi.org/10.1038/s41598-020-80340-y
Masataka N (2006) Preference for consonance over dissonance by hearing newborns of deaf parents and of hearing parents. Dev Sci 9:46–50
May-Collado LJ, Wartzok D (2008) A comparison of bottlenose dolphin whistles in the Atlantic Ocean: factors promoting whistle variation. J Mammal 89(5):1229–1240. https://doi.org/10.1644/07-MAMM-A-310.1
McClary S (1986) A musical dialectic from the enlightenment: Mozart’s “piano concerto in G major, K. 453”, movement 2. Cult Crit 4(4):129. https://doi.org/10.2307/1354338
McDermott J, Hauser M (2004) Are consonant intervals music to their ears? Spontaneous acoustic preferences in a nonhuman primate. Cognition 94(2):B11–B21. https://doi.org/10.1016/j.cognition.2004.04.004
McDermott JH, Lehr AJ, Oxenham AJ (2010) Individual differences reveal the basis of consonance. Curr Biol 20(11):1035–1041. https://doi.org/10.1016/j.cub.2010.04.019

McDermott JH, Schultz AF, Undurraga EA, Godoy RA (2016) Indifference to dissonance in native Amazonians reveals cultural variation in music perception. Nature 535(7613):547–550. https://doi.org/10.1038/nature18635
McPherson MJ, Dolan SE, Durango A, Ossandon T, Valdés J, Undurraga EA, Jacoby N, Godoy RA, McDermott JH (2020) Perceptual fusion of musical notes by native Amazonians suggests universal representations of musical intervals. Nat Commun 11(2786):1–14. https://doi.org/10.1038/s41467-020-16448-6
Melloni L, Mudrik L, Pitts M, Koch C (2021) Making the hard problem of consciousness easier. Science 372(6545):911–912. https://doi.org/10.1126/science.abj3259
Merchant H, Grahn J, Trainor L, Rohrmeier M, Fitch WT (2015) Finding the beat: a neural perspective across humans and non-human primates. Philos Trans R Soc Lond B Biol Sci 370(20140093):1–16. https://doi.org/10.1098/rstb.2014.0093
Miller EH, Murray AV (1995) Structure, complexity, and organization of vocalizations in harp seal (Phoca groenlandica) pups. In: Sensory systems of aquatic mammals. pp 237–264
Miller EH, Kostoglou KN, Wilson DR, Weston MA (2022) Anatomy of avian distress calls: structure, variation, and complexity in two species of shorebird (Aves: Charadrii). Behaviour 153(3):1–35. https://doi.org/10.1163/1568539X-bja10147
Nelson DA, Marler P (1989) Categorical perception of a natural stimulus continuum: birdsong. Science 244(4907):976–978. https://doi.org/10.1126/science.2727689
Okanoya K, Dooling RJ (1987) Hearing in passerine and psittacine birds: a comparative study of absolute and masked auditory thresholds. J Comp Psychol 101(1):7–15. http://www.ncbi.nlm.nih.gov/pubmed/3568610
Osmanski MS, Wang X (2011) Measurement of absolute auditory thresholds in the common marmoset (Callithrix jacchus). Hear Res 277(1–2):127–133. https://doi.org/10.1016/j.heares.2011.02.001
Patel AD (2003) Language, music, syntax and the brain. Nat Neurosci 6(7):674–681. https://doi.org/10.1038/nn1082
Patel AD (2021) Vocal learning as a preadaptation for the evolution of human beat perception and synchronization. Philos Trans R Soc B: Biol Sci 376:20200326. https://doi.org/10.1098/rstb.2020.0326
Patel AD, Iversen JR, Bregman MR, Schulz I (2009) Experimental evidence for synchronization to a musical beat in a nonhuman animal. Curr Biol 19(10):827–830. https://doi.org/10.1016/j.cub.2009.03.038
Penn DC, Holyoak KJ, Povinelli DJ (2008) Darwin’s mistake: explaining the discontinuity between human and nonhuman minds. Behav Brain Sci 31(2):109–130; discussion 130-178. https://doi.org/10.1017/S0140525X08003543
Perani D, Saccuman MC, Scifo P, Spada D, Andreolli G, Rovelli R, Koelsch S (2010) Functional specializations for music processing in the human newborn brain. Proc Natl Acad Sci 107(10):4758–4763. https://doi.org/10.1073/pnas.0909074107
Peter B, Stoel-Gammon C, Kim D (2008) Octave equivalence as an aspect of stimulus-response similarity during nonword and sentence imitations in young children. In: Proceedings of the 4th international conference on speech prosody, SP 2008, pp 731–734
Peter B, Larkin T, Stoel-Gammon C (2009) Octave-shifted pitch matching in nonword imitations: the effects of lexical stress and speech sound disorder. J Acoust Soc Am 126(4):1663–1666. https://doi.org/10.1121/1.3203993
Peter B, Foster B, Haas H, Middleton K, McKibben K (2015) Direct and octave-shifted pitch matching during nonword imitations in men, women, and children. J Voice 29(260):260.e21–260.e30. https://doi.org/10.1016/j.jvoice.2014.06.011
Pfenning AR, Hara E, Whitney O, Rivas MV, Wang R, Roulhac PL, Howard JT, Wirthlin M, Lovell PV, Ganapathy G, Mouncastle J, Moseley MA, Thompson JW, Soderblom EJ, Iriki A, Kato M, Gilbert MTP, Zhang G, Bakken T et al (2014) Convergent transcriptional specializations in the brains of humans and song-learning birds. Science 346(6215):1256846. https://doi.org/10.1126/science.1256846

Pfitzinger HR, Niebuhr O (2011) Historical development of phonetic vowel systems: the last 400 years. In: Proceedings of the 17th international congress of phonetic sciences, vol 160. p 163
Pika S, Wilkinson R, Kendrick KH, Vernes SC (2018) Taking turns: bridging the gap between human and animal communication. Proc Biol Sci 285(1880):20180598. https://doi.org/10.1098/rspb.2018.0598
Platinga J, Trehub SE (2014) Revisiting the innate preference for consonance. J Exp Psychol Hum Percept Perform 40(1):40–49. https://doi.org/10.1037/a0033471
Podos J (1997) A performance constraint on the evolution of trilled vocalizations in a songbird family (Passeriformes: Emberizidae). Evolution 51(2):537–551. https://doi.org/10.1111/j.1558-5646.1997.tb02441.x
Reby D, McComb K, Cargnelutti B, Darwin C, Fitch WT, Clutton-Brock T (2005) Red deer stags use formants as assessment cues during intrasexual agonistic interactions. Proc Biol Sci 272(1566):941–947. https://doi.org/10.1098/rspb.2004.2954
Reichmuth C, Casey C (2014) Vocal learning in seals, sea lions, and walruses. Curr Opin Neurobiol 28:66–71. https://doi.org/10.1016/j.conb.2014.06.011
Reinert J (1965) Takt- und Rhythmusunterscheidung bei Dohlen. Z Tierpsychol 22(6):623–671. https://doi.org/10.1111/j.1439-0310.1965.tb01683.x
Rice AN, Land BR, Bass AH (2011) Nonlinear acoustic complexity in a fish “two-voice” system. Proc Biol Sci 278(1725):3762–3768. https://doi.org/10.1098/rspb.2011.0656
Richards DG, Wolz JP, Herman LM (1984) Vocal mimicry of computer-generated sounds and vocal labeling of objects by a bottlenosed dolphin, Tursiops truncatus. J Comp Psychol 98(1):10–28. https://doi.org/10.1037/0735-7036.98.1.10
Richner H (2016) Interval singing links to phenotypic quality in a songbird. Proc Natl Acad Sci U S A 113(45):12763–12767. https://doi.org/10.1073/pnas.1610062113
Riede T, Owren MJ, Arcadi AC (2004) Nonlinear acoustics in pant hoots of common chimpanzees (pan troglodytes): frequency jumps, subharmonics, biphonation, and deterministic chaos. Am J Primatol 64(3):277–291. https://doi.org/10.1002/ajp.20078
Ryan MJ, Guerra MA (2014) The mechanism of sound production in tungara frogs and its role in sexual selection and speciation. Curr Opin Neurobiol 28:54–59. https://doi.org/10.1016/j.conb.2014.06.008
Scanlan J (1999) The function and significance of inter-species acoustic cues in the transformation of budgerigar (Melopsittacus undulatus) sounds into “speech”. Int J Comp Psychol 12(3):111–152
Schachner A, Brady TF, Pepperberg IM, Hauser MD (2009) Spontaneous motor entrainment to music in multiple vocal mimicking species. Curr Biol 19(10):831–836. https://doi.org/10.1016/j.cub.2009.03.061
Scharff C, Petri J (2011) Evo-devo, deep homology and FoxP2: implications for the evolution of speech and language. Philos Trans R Soc Lond B Biol Sci 366(1574):2124–2140. https://doi.org/10.1098/rstb.2011.0001
Schellenberg EG, Trehub SE (1996) Natural musical intervals: evidence from infant listeners. Psychol Sci 7(5):272–277. https://doi.org/10.1111/j.1467-9280.1996.tb00373
Schwartz DA, Howe CQ, Purves D (2003) The statistical structure of human speech sounds predicts musical universals. J Neurosci 23(18):7160–7168. https://doi.org/10.1523/JNEUROSCI.23-18-07160.2003
Segall M, Campbell D, Herskovits M (1968) The influence of culture on visual perception. In: Toch H, Smith C (eds) Social perception. Van Nostrand
Slabbekoorn H, ten Cate C (1999) Collared dove responses to playback: slaves to the rhythm. Ethology 105(5):377–391. https://doi.org/10.1046/j.1439-0310.1999.00420.x
Stevens CJ, Keller PE, Tyler MD (2013) Tonal language background and detecting pitch contour in spoken and musical items. Psychol Music 41(1):59–74. https://doi.org/10.1177/0305735611415749

Stiebler I, Ehret G (1985) Inferior colliculus of the house mouse. I. A quantitative study of tonotopic organization, frequency representation, and tone-threshold distribution. J Comp Neurol 238(1):65–76. https://doi.org/10.1002/cne.902380106
Stivers T, Enfield NJ, Brown P, Englert C, Hayashi M, Heinemann T et al (2009) Universals and cultural variation in turn-taking in conversation. Proc Natl Acad Sci U S A 106(26):10587–10592. https://doi.org/10.1073/pnas.0903616106
Sugimoto T, Kobayashi H, Nobuyoshi N, Kiriyama Y, Takeshita H, Nakamura T, Hashiya K (2010) Preference for consonant music over dissonant music by an infant chimpanzee. Primates 51(1):7–12. https://doi.org/10.1007/s10329-009-0160-3
Taylor AM, Reby D (2010) The contribution of source-filter theory to mammal vocal communication research. J Zool 280(3):221–236. https://doi.org/10.1111/j.1469-7998.2009.00661.x
ten Cate C, Spierings M, Hubert J, Honing H (2016) Can birds perceive rhythmic patterns? A review and experiments on a songbird and a parrot species. Front Psychol 7(May):1–14. https://doi.org/10.3389/fpsyg.2016.00730
Terhardt E (1984) The concept of musical consonance: a link between music and psychoacoustics. Music Percept 1(3):276–295. https://doi.org/10.2307/40285261
Toro J, Crespo P (2017) Consonance processing in the absence of relevant experience: evidence from nonhuman animals. Comp Cogn Behav Rev 12:33–44. https://doi.org/10.3819/CCBR.2017.120004
Trainor LJ, Heinmiller BM (1998) The development of evaluative responses to music. Infant Behav Dev 21(1):77–88. https://doi.org/10.1016/S0163-6383(98)90055-8
Trainor LJ, Tsang CD, Cheung VHM (2002) Preference for sensory consonance in 2- and 4-month-old infants. Music Percept 20(2):187–194. https://doi.org/10.1525/mp.2002.20.2.187
Trehub SE (2003) The developmental origins of musicality. Nat Neurosci 6(7):669–673. https://doi.org/10.1038/nn1084
Tu HW, Osmanski MS, Dooling RJ (2011) Learned vocalizations in budgerigars (Melopsittacus undulatus): the relationship between contact calls and warble song. J Acoust Soc Am 129(4):2289–2299. https://doi.org/10.1121/1.3557035
Tyack PL (2020) A taxonomy for vocal learning. Philos Trans R Soc Lond B Biol Sci 375(1789):20180406. https://doi.org/10.1098/rstb.2018.0406
Wagner B, Hoeschele M (2022) The links between pitch, timbre, musicality, and social bonding from cross-species research. Comp Cogn Behav Rev 17:1–20. https://doi.org/10.3819/CCBR.2022.170002
Wagner B, Mann DC, Afroozeh S, Staubmann G, Hoeschele M (2019) Octave equivalence is not linked to vocal mimicry: budgerigars fail standardized operant tests for octave equivalence. Behaviour 156(5–8):480–504. https://doi.org/10.1163/1568539X-00003538
Wagner B, Bowling DL, Hoeschele M (2020) Is consonance attractive to budgerigars? No evidence from a place preference study. Anim Cogn 23(5):973–987. https://doi.org/10.1007/s10071-020-01404-0
Wagner B, Sturdy C, Weisman RG, Hoeschele M (2022) Pitch chroma information is processed in addition to pitch height information with more than two pitch-range categories. Atten Percept Psychophys 84:1757–1771. https://doi.org/10.3758/s13414-022-02496-1
Wang YT, Green JR, Nip ISB, Kent RD, Kent JF (2010) Breath group analysis for reading and spontaneous speech in healthy adults. Folia Phoniatr Logop 62(6):297–302. https://doi.org/10.1159/000316976
Wang L, CheungJason JTM, Pu F, Li D, Zhang M, Fan Y (2011) Why do woodpeckers resist head impact injury: a biomechanical investigation. PLoS One 6(10):1–8. https://doi.org/10.1371/journal.pone.0026490
Watanabe S, Uozumi M, Tanaka N (2005) Discrimination of consonance and dissonance in Java sparrows. Behav Process 70(2):203–208. https://doi.org/10.1016/j.beproc.2005.06.001
Wilden I, Herzel H, Peters G, Tembrock G (1998) Subharmonics, biphonation, and deterministic chaos in mammal vocalization. Bioacoustics 9:171–196. https://doi.org/10.1080/09524622.1998.9753394



Chapter 2

Acoustic Communication in Fruit Flies and Mosquitoes

Matthew P. Su and Azusa Kamikouchi

Abstract Acoustic communication between conspecific males and females, mediated by wing movements, forms a key part of premating behavior in both fruit flies and mosquitoes. However, there are substantial differences in how, where, and when this communication is conducted, resulting in distinct ear anatomy and function. Here, we compare acoustic behaviors and ear anatomy between the two groups. Given the relative lack of genetic tools available until recently in mosquitoes, much remains unclear regarding the fundamental underpinnings of hearing. We describe the neural circuitry underlying hearing behaviors in fruit flies, and how equivalent mosquito circuitry can be investigated in the future.

Keywords Insect ear · Mating behavior · Courtship song · Phonotaxis

2.1 Introduction

Intra- and interspecific acoustic communication influences key behaviors across the animal kingdom (Chen and Wiens 2020). Such communication requires both a sender to produce sounds and a receiver to process the auditory information they contain. Hearing therefore plays a key role in animal communication, and two major types of sound detection organ have evolved because sound energy is delivered to animals in two ways (Warren and Nowotny 2021). These types can be broadly defined as either pressure receivers, which detect fluctuations in pressure, or movement receivers, which detect oscillations of particles in a medium (including air, liquids, and other media) (Albert and Kozlov 2016). While mammalian tympanic ears serve as exemplary pressure receivers, the antennal ears of fruit flies and mosquitoes instead act as movement receivers, resulting in a distinct acoustic landscape. One notable difference is a reduction in the working range of antennal ears, as they can perceive acoustic stimuli only within a close distance of the source (~cm) due to the near-field nature of the particle velocity component of sound (Albert and Göpfert 2015).

This close-range auditory system plays a major role in determining a diverse array of mating behaviors and communication across many insect species. Studies on fruit fly acoustic communication, for example, have identified both male and female songs, as well as aggressive sound generation during social interactions between conspecifics (Ishikawa and Kamikouchi 2016). In particular, research using Drosophila melanogaster, the most widely utilized insect model organism in biosciences, has revealed the molecular, cellular, and neural-circuit mechanisms that underlie their acoustic communication. Recent extensions/adaptations of Drosophila studies into mosquito auditory research have helped elucidate hearing function and behavior at multiple levels in insects with even more complex auditory systems (Andrés et al. 2020). These investigations into the fundamental mechanisms underlying insect hearing have helped to deepen our (neuro-)scientific understanding of how insects perceive and process auditory information. This information is not only of conceptual value, but can also be put to practical use: as mosquitoes rely on hearing for mating, interfering with their hearing systems or behaviors can disrupt this mating (Xu et al. 2022). This is theoretically possible via novel insecticides or acoustic traps, but field application has thus far proven challenging.

Here, we first review acoustic communication in fruit flies and describe the progress made in understanding fruit fly auditory systems, before connecting these results with equivalent research in mosquitoes. Despite both fruit flies and mosquitoes belonging to the order Diptera (meaning both can be considered true flies), mosquito research lags far behind due to a historical lack of genetic and experimental tools that has only just started to be addressed. We therefore describe the expected future research path for mosquito studies, which require an additional layer of complexity due to the distinct nature of mosquito mating. Finally, we highlight fundamental questions that must be answered for the successful implementation of novel mosquito control methods targeting hearing.

M. P. Su
Graduate School of Science, Nagoya University, Nagoya, Aichi, Japan
Institute for Advanced Research, Nagoya University, Nagoya, Aichi, Japan

A. Kamikouchi (✉)
Graduate School of Science, Nagoya University, Nagoya, Aichi, Japan
Institute for Advanced Research, Nagoya University, Nagoya, Aichi, Japan
Graduate School of Life Science, Tohoku University, Sendai, Miyagi, Japan

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
Y. Seki (ed.), Acoustic Communication in Animals, https://doi.org/10.1007/978-981-99-0831-8_2

2.2 Acoustic Behaviors

In both fruit flies and mosquitoes, sounds generated by wing vibrations serve as acoustic signals in conspecific communication for mating. The mating of the fruit fly Drosophila melanogaster (D. melanogaster) takes place on the ground and is characterized by the male’s stereotyped courtship behaviors toward a target female, including the wing-vibrating behavior that emits the courtship song (Bastock and Manning 1955; Greenspan and Ferveur 2003). The mating of mosquitoes, in contrast, occurs in midair. Prior to mating, male mosquitoes form large aggregations (“swarms”) in which males detect the faint flight tones of the small number of female mosquitoes that fly nearby (Clements 1999). Detection of these sounds triggers phonotactic attraction of multiple males to females; males then compete with each other for copulation (Andrés et al. 2020). Mosquito acoustic behaviors thus involve an additional spatial dimension compared with those of D. melanogaster. In this chapter, we describe the stereotyped acoustic behaviors of fruit flies, and how this research can be extended to understanding male mosquito phonotaxis.

2.2.1 Fruit Flies

2.2.1.1 Courtship Song of Males

In D. melanogaster, males emit a close-range courtship song toward a female by vibrating one wing during courtship (Shorey 1962) (Fig. 2.1a). This behavior is innate and does not need to be learned. By the early 1980s, the songs of over 100 species of Drosophila and the related drosophilid genus Zaprionus had been described (Ewing 1983). The courtship song of each Drosophila species carries unique temporal patterns (Ewing and Bennet-Clark 1968; Cowling and Burnet 1981) and is widely considered to be a crucial component of the flies’ premating barriers, contributing to both reproductive isolation and speciation in drosophilid flies (Ritchie et al. 1999).

The majority of Drosophila species, including D. melanogaster, have more than one type of courtship song. The song of D. melanogaster males comprises a sine song and two types of pulse song (Clemens et al. 2018). These songs have no fixed order; males pattern their songs using feedback cues from partners as well as their internal states (Calhoun et al. 2019). Closely related species differ in major parameters of the song, such as pulse type, interpulse interval (IPI), and intrapulse frequency (IPF) of the pulse song, and the presence and frequency of the sine song. Among seven sibling species of the melanogaster subgroup, D. simulans and D. mauritiana have courtship songs that are similar to, but distinct from, those of D. melanogaster, consisting of a mixture of the sine song and pulse songs (Cowling and Burnet 1981; Riabinina et al. 2011). D. erecta also produces both songs, but not in a mixture. D. yakuba and D. ananassae males produce only the pulse song, and D. teissieri males produce two discrete and unique waveform elements (Riabinina et al. 2011).

Species-specific song elements, especially IPIs, are widely considered to be important for both reproductive isolation and speciation in drosophilid flies. The pulse song IPI of D. melanogaster has been reported to be approximately 30–40 ms, whereas in D. simulans, the closest relative of D. melanogaster, the IPI is about 55–65 ms (Cowling and Burnet 1981). The courtship song of each species increases female receptivity to copulation (Bennet-Clark and Ewing 1969). Previous studies have shown that in D. melanogaster, pulse songs with a conspecific IPI increase females’ mating receptivity more than those with heterospecific IPIs, indicating that the IPI is a critical parameter of the courtship song (Bennet-Clark and Ewing 1969; Ritchie et al. 1999; Ohashi et al. 2023). D. simulans females also increase their receptivity more when exposed to a pulse song with a conspecific IPI than to those with heterospecific IPIs (Ritchie et al. 1999; Ohashi et al. 2023). At early ages, males of these two species share similar IPI values, with immature D. melanogaster males having particularly broad IPI distributions in their pulse songs. Maturation is necessary to establish species specificity (Moulin et al. 2001).

Fig. 2.1 Acoustic communication of fruit flies and mosquitoes. (a) (left) A male Drosophila melanogaster vibrates his wing to attract a female, generating a courtship song. (left inset) A typical pulse song waveform. (right) Male wing extension during courtship. (b) (left) Mosquito mating occurs within male-dominated swarms. Males utilize phonotactic attraction to female-specific WBFs to mediate courtship. (right) Male phonotaxis is sufficiently robust that speakers can be used to attract and trap males

During the courtship ritual, the male emits a song when he is close to a target female or orienting toward her. The wing closer to the female is extended, and males quickly switch between wings when circling the female (Bastock and Manning 1955; Morley et al. 2018). Intriguingly, males adjust their song intensity to compensate for female distance, which is estimated from complex visual stimulus features (Coen et al. 2016). Studies in D. melanogaster revealed that the song behavior of males is controlled by neuronal circuits expressing the sex determination factors Fruitless (fru) and Doublesex (dsx) (dsx+ fru+ neurons). Many of these dsx+ fru+ neurons are male-specific or sexually dimorphic (Dickson 2008).

In the early stages of the courtship ritual, the female fly exhibits a rejection response consisting primarily of escape and aggression components. The male typically continues courtship behavior after such rejections, which allows female flies to spend time evaluating males by detecting a variety of signals, such as male-derived pheromones and songs. The probability of behavioral transition from female rejection to acceptance increases as male courtship proceeds until the final stage of the ritual, when the female slows her locomotion and opens her vaginal plate to accept copulation (Billeter et al. 2006).

Song response behavior is observed not only in females but also in males. When exposed to male courtship songs in a single-sex group situation, male flies start chasing each other and form so-called courtship chains (Eberl et al. 1997). This chaining behavior is the most obvious behavioral phenotype of fruit flies upon exposure to the courtship song and therefore has long been used to quantify the auditory response of flies (Eberl et al. 1997; Kamikouchi et al. 2009). Males of both D. melanogaster and D. simulans exhibit chaining behavior upon exposure to the courtship song (Yoon et al. 2013), suggesting this behavior is evolutionarily conserved.

A previous report in D. melanogaster has shown that the IPI window of song response behavior is fine-tuned by prior auditory experiences (Li et al. 2018). When females are exposed to a song carrying a conspecific IPI for several days before being challenged by a courting male, their preference for the conspecific IPI over a heterospecific IPI becomes stronger than in naïve females. Likewise, when males are exposed to a conspecific song for several days, or reared in a group of males, their preference for the conspecific IPI is more strongly tuned than in naïve, or isolated, males. This phenomenon in D. melanogaster is known as song preference learning and requires the GABAergic system for establishment (Li et al. 2018).
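The IPI is simply the time between successive pulses, so the species ranges quoted above (approximately 30–40 ms for D. melanogaster and about 55–65 ms for D. simulans) can be illustrated with a few lines of arithmetic. The following Python sketch is only an illustration of that definition. The function names, the use of the median, and the pulse trains are assumptions made here for the example, and it is not a model of how flies or the cited studies evaluate songs.

```python
from statistics import median

# Approximate pulse-song IPI ranges quoted in the text (ms).
# The dictionary and function names are illustrative, not from any cited study.
IPI_RANGES_MS = {
    "D. melanogaster": (30.0, 40.0),
    "D. simulans": (55.0, 65.0),
}


def interpulse_intervals(pulse_times_ms):
    """Return the differences between successive pulse onset times (ms)."""
    return [later - earlier
            for earlier, later in zip(pulse_times_ms, pulse_times_ms[1:])]


def match_species(pulse_times_ms):
    """Return the species whose quoted IPI range contains the median IPI, else None."""
    ipis = interpulse_intervals(pulse_times_ms)
    if not ipis:
        return None  # fewer than two pulses: no interval can be measured
    typical = median(ipis)
    for species, (low, high) in IPI_RANGES_MS.items():
        if low <= typical <= high:
            return species
    return None


# Hypothetical pulse onset times (ms), invented for illustration.
print(match_species([0, 35, 70, 104, 140]))  # median IPI of 35 ms
print(match_species([0, 60, 121, 180]))      # median IPI of 60 ms
print(match_species([0, 48, 96, 144]))       # 48 ms lies between the two ranges
```

Using the median rather than the mean is a choice made for this sketch so that a single missed pulse does not dominate the estimate; the broad IPI distributions of immature males described above would need a different treatment.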

2.2.1.2 Females Emit Songs During Copulation

Although it had previously been assumed that only male flies sing during mating, recent findings have demonstrated that female Drosophila also sing by wing vibration in copula (Kerwin et al. 2020). This female copulation song is produced by bilateral wing vibrations, which occur throughout copulation, and is acoustically distinct from the male song. Female copulation song production requires dsx-positive neurons. Males that hear female songs during copulation may increase their allocation of receptivity-suppressing seminal fluid molecules (Kerwin et al. 2020). Females of D. simulans, D. mauritiana, and D. sechellia, close relatives of D. melanogaster, also emit copulation songs. While the acoustic parameters of male courtship songs display marked interspecies differences, female song structure in these species is very similar.

2.2.1.3 Sound Pulses During Agonistic Interactions

During sexual rejection and inter-male aggression, both sexes of D. melanogaster also produce a third type of sound pulse (Paillette et al. 2012; Versteven et al. 2017; Swain and von Philipsborn 2021). These agonistic sounds are observed in immature females or males in response to male courtship behavior. The sounds, emitted by flicking both wings in a stereotyped manner, consist of pulses with more variable IPIs than the male’s courtship pulse song (Paillette et al. 2012). Aggression sound signals are also produced by wing flicking in males during combat over resources such as food and mates. These signals likewise consist of pulses with more variable IPIs than the male’s courtship pulse song (Jonsson et al. 2011; Versteven et al. 2017). Playback of aggression sounds was found to increase the number of aggressive encounters between male flies (Versteven et al. 2017).

2.2.2 Mosquitoes

2.2.2.1 Mosquito Flight Tones

In sharp contrast to fruit flies, for whom mating occurs on a 2-D plane with songs produced by both sexes during copulation/mating, mosquito mating (and thus acoustic communication) occurs mid-flight (Clements 1999). Unlike fruit flies and songbirds, therefore, mosquitoes do not produce specific courtship songs but instead rely on the sounds produced by the beating of their wings during regular flight. These flight tones, or wing beat frequencies (WBFs), are sexually dimorphic, with males having a significantly greater WBF than females (~750 Hz as compared to ~500 Hz) (Andrés et al. 2020). While these WBFs appear relatively conserved across the major human-biting mosquito genera (Aedes, Anopheles, and Culex), they can vary greatly in non-human-biting species (Kim et al. 2021).

Mosquito WBFs are dependent on a number of factors, including temperature and time of day (Villarreal et al. 2017; Somers et al. 2022). In an interesting comparison to fruit flies, a female-specific splicing isoform of dsx (dsxF) is necessary for females to produce female WBFs; the flight of homozygous dsxF mutant Anopheles gambiae females instead produces an intersex WBF approximately halfway between male and female flight tones (Su et al. 2020).

2.2.2.2 Male Hearing Behaviors

Male mosquito hearing behavior centers around phonotaxis: direct attraction to the sound of flying females. This behavior has been identified across multiple mosquito genera, though the extent of attraction varies somewhat (Iikeshoji 1985). As female WBFs are broadly similar across multiple species, it appears that conspecific mating is maintained via the timing and location of mating rather than differential sound production (Diabaté et al. 2009). Mosquito mating largely occurs in swarms: male-dominated aggregations that form not only at specific times of day (and are thus driven by an internal circadian clock; Wang et al. 2021a) but also at distinct locations (Hartberg 1971; Sawadogo et al. 2013). Males gather in groups over visual markers, with this gathering likely mediated via pheromone sensing, though directly contradictory reports exist (Mozūraitis et al. 2020; Poda et al. 2022). The distance at which males fly above the marker results in species segregation (Poda et al. 2019); males from two distinct Anopheles species swarm around the same markers, but with distinct spatial differences, reducing interspecific mating opportunities (Diabaté et al. 2009). While males return to swarming locations repeatedly throughout their lives (Ekechukwu et al. 2015), females largely encounter swarms only once, as after a first mating females are unlikely to remate (Degner and Harrington 2016). Females appear to fly around the edge of swarms rather than entering directly (Sawadogo et al. 2014). Males locate females by listening for their distinct WBFs, with the aforementioned sexual dimorphism in WBF meaning that even in male-dominated spaces, females are always identifiable (Fig. 2.1b).

Male mosquito attraction to sound is sufficiently strong that phonotaxis can be replicated using speakers playing pure tones mimicking female WBFs (Fig. 2.1b). By coupling speakers with fans and traps to prevent males leaving once they have approached the speaker, males can be removed from the mating population (Staunton et al. 2021). However, although numerous lab-based experiments have demonstrated the efficiency of these sound traps under carefully controlled conditions, translation to the field has proven challenging, and only daily catches of a handful of males have proven possible.

2.2.2.3 Female Hearing Behaviors

Unlike male mosquito phonotaxis, which has been studied for over a century, female hearing behaviors remain largely under-researched. Indeed, despite the significant size of the female mosquito ear as compared to every other insect species, it was long presumed that female mosquitoes were deaf, as researchers were unable to elicit phonotactic responses in major mosquito species. Although later research into hearing function demonstrated that female Aedes aegypti mosquitoes showed all major hallmarks of active hearing (Göpfert and Robert 2001), thus rebutting notions of deafness, quantifiable hearing behaviors remained unreported.

Progress was finally made, however, by studying mosquito species that bite not humans, but frogs. Multiple species, including Uranotaenia macfarlanei and Culex territans, have been found to be attracted to sounds mimicking specific frog mating calls (Toma et al. 2005; Borkent and Belton 2006; Bartlett-Healy et al. 2008). These sounds thus enable females to identify hosts for blood feeding in the absence of heat signatures.

Further advances were made shortly after in the human-biting species Aedes aegypti. Females from this species were found to shift their WBFs to match males not at the fundamental tone, but instead at harmonics of this fundamental (Cator et al. 2009). This appears to be the direct result of sexual dimorphisms in mosquito flight tones; females with a WBF of 500 Hz match males not at their 750 Hz tone, but at 1500 Hz (the females’ third, and the males’ second, harmonic). These results represented not only the first evidence of female human-biting mosquito hearing behaviors, but also the first demonstration of such a complex acoustic communication system. Equivalent experiments on this behavior, denoted “harmonic convergence,” in other species (including Anopheles gambiae and Culex quinquefasciatus) uncovered similar conserved mechanisms across species (Warren et al. 2009; Pennetier et al. 2010). Females from these species thus appeared to use harmonic convergence as a selective mechanism to screen potential mates and choose only those deemed suitable on (unknown) acoustic criteria.

Building on these preliminary findings, later research found that harmonic convergence occurred not only at this 3:2 ratio of the females’ third and the males’ second harmonic, but at multiple ratios (Aldersley et al. 2016). Mated females also showed significant reductions in convergence activity, in agreement with their general refractoriness to copulation following mating (League et al. 2019). Despite this growing body of evidence, more detailed investigations into the precise timing of harmonic convergence revealed that convergence was most likely to occur at a late stage in the courtship process, suggesting that female selection based on male flight tones was unlikely (Aldersley and Cator 2019). Furthermore, a thorough analysis of previously published datasets found that harmonic convergence was as likely to occur in males and females recorded separately as in mating pairs, implying that this apparent convergence was instead the result of chance (Somers et al. 2022). This finding unfortunately appears to return female mosquito hearing behavior research to where it stood in 2008, with no clear path forward.
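The arithmetic behind the 3:2 example above (a 500 Hz female and a 750 Hz male meeting at 1500 Hz) can be checked with a short search over integer harmonics. The Python sketch below is a minimal illustration only. The function name, the 1 Hz tolerance, and the limit of ten harmonics are assumptions chosen here, not criteria used in the cited studies, and the sketch says nothing about whether convergence in real recordings exceeds chance (Somers et al. 2022).

```python
def lowest_shared_harmonic(female_wbf_hz, male_wbf_hz,
                           max_harmonic=10, tolerance_hz=1.0):
    """Find the lowest-frequency pair of harmonics that coincide.

    Returns (female_harmonic, male_harmonic, frequency_hz), where the
    frequency is that of the female harmonic, or None if no pair of
    harmonics up to max_harmonic lies within tolerance_hz.
    The tolerance and harmonic limit are illustrative assumptions.
    """
    best = None
    for n in range(1, max_harmonic + 1):        # female harmonic number
        for m in range(1, max_harmonic + 1):    # male harmonic number
            female_freq = n * female_wbf_hz
            male_freq = m * male_wbf_hz
            if abs(female_freq - male_freq) <= tolerance_hz:
                if best is None or female_freq < best[2]:
                    best = (n, m, female_freq)
    return best


# WBF values from the text: ~500 Hz (female) and ~750 Hz (male).
print(lowest_shared_harmonic(500, 750))
```

For these values the search returns the females’ third and the males’ second harmonic at 1500 Hz; the higher coincidences it skips (for example the sixth and fourth harmonics) are integer multiples of the same 3:2 ratio.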

2.3 Ear Anatomy and Function

Both fruit flies and mosquitoes use an antenna to detect sound. This antennal ear comprises two parts: the receiver and the auditory organ (Johnston’s organ; JO). The receiver is located distally and picks up the mechanical energy of sound (i.e., the particle velocity component of sound). The JO is located within the second segment of the antenna (a2) and detects receiver movement evoked by sound stimuli. The JO is the site of auditory mechanotransduction and houses JO neurons as receptor cells. JO neurons are the primary neurons of the auditory pathway of both fruit flies and mosquitoes. The antennae of fruit flies and mosquitoes vibrate in a nonlinear, active manner (Göpfert and Robert 2001, 2002). JO neurons actively amplify the sound-induced vibrations they transduce. In this section, we give an overview of the anatomy and function of the ear in fruit flies and mosquitoes. We also summarize the auditory neural circuit in the fruit fly brain that processes song information to execute specific behavioral responses.
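Section 2.3.1.1 notes that, for the theoretical dipole of a singing male wing, amplitude is expected to decrease as 1/r³ with distance r from the source (Bennet-Clark 1984). To give a feel for how steep this is, the following Python sketch compares that law with the 1/r decrease of sound pressure from a point source in the far field. It is a back-of-the-envelope illustration only: the function names and the reference distance are assumptions made here, and real song fields will deviate from the idealized power law.

```python
import math


def relative_amplitude(distance, reference_distance=1.0, exponent=3):
    """Amplitude at `distance` relative to `reference_distance`,
    assuming amplitude is proportional to 1 / r**exponent.
    Both distances must use the same unit."""
    if distance <= 0 or reference_distance <= 0:
        raise ValueError("distances must be positive")
    return (reference_distance / distance) ** exponent


def attenuation_db(distance, reference_distance=1.0, exponent=3):
    """The same amplitude ratio expressed in decibels (20 * log10)."""
    ratio = relative_amplitude(distance, reference_distance, exponent)
    return 20 * math.log10(ratio)


# Doubling the distance under the 1/r^3 law quoted in the text,
# and under a 1/r law for comparison.
print(relative_amplitude(2.0))                   # one eighth of the amplitude
print(round(attenuation_db(2.0), 1))             # about -18 dB per doubling
print(round(attenuation_db(2.0, exponent=1), 1)) # about -6 dB per doubling
```

Under these assumptions each doubling of distance costs roughly 18 dB instead of 6 dB, which is consistent with the centimeter-scale working range described for antennal ears.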

2.3.1 Fruit Flies

2.3.1.1 Ear Anatomy

In the fruit fly, the feathery arista at the distal part of the antenna serves as the sound receiver (Fig. 2.2a). It vibrates back and forth in response to acoustic stimulation by catching the particle velocity component of sound. The receiver vibrations of fruit flies are restricted by a single central axis, meaning movement is essentially limited to a single dimension, and are maximally sensitive to forces perpendicular to the arista length (Göpfert and Robert 2003). The arista is mechanically coupled to the third antennal segment (a3) (Nadrowski et al. 2008). Accordingly, the movement of the arista is converted to the oscillation of the base of a3, the hook, which is embedded in a2 and connects to the JO (Fig. 2.2b). This connection informs JO neurons of sound receiver movements; JO neurons, bipolar neurons with an axon at the brain side and a cilium at the hook side of the cell body, are activated by the oscillation of the hook.

From the theoretical dipole of a singing fruit fly male wing, sound attenuation is expected to be high, with amplitude decreasing as 1/r³, where r is the distance from the sound source (Bennet-Clark 1984). A female fly experiences an airspeed on the order of 0.1 cm/s as she listens to a male courtship song (Bennet-Clark 1971; Morley et al. 2018). To detect such a faint sound, the antennal receiver of fruit flies increases its sensitivity by vibrating in a nonlinear, active manner (Göpfert and Robert 2002). JO neurons mediate this active process, through which quiet sound stimuli are amplified at the receiver (Göpfert and Robert 2003; Göpfert et al. 2006). Moreover, the resonant characteristics of the antennae vary with sound intensity; low intensities (faint sounds) generate lower resonant frequencies than higher intensities (loud sounds) through an active process, effectively altering the stiffness of the sound receiver (Göpfert and Robert 2002, 2003; Albert et al. 2007). In D. melanogaster, active mechanical feedback from JO neurons shifts the sound receiver’s best frequency from 800 Hz to 200 Hz (i.e., from the passive state to the active state) under no sound stimulus conditions (Göpfert et al. 2005).

Fig. 2.2 The antennal ear of fruit flies and mosquitoes. (a) (left) The ears of mammals (top) and fruit flies (bottom). (right) The fruit fly ear is located at the antenna. The auditory organ, Johnston’s organ (JO), is located within the second segment of the antenna (a2). a1, the first segment of the antenna; a3, the third segment of the antenna. Modified from Kamikouchi et al. 2006 with permissions. (b) JO anatomy. Horizontal section of the a2 (right panel) shows two clusters of JO neurons (JONs). Dashed line in the left panel represents the section position of the right panel. Modified from Kamikouchi et al. 2006 with permissions. (c) Mosquito antennal ear anatomy. Male ears (left) are distinct from female ears (right) due to differences in antennal fibrillae. The ears of both sexes are comprised of a flagellum, containing flagellar neurons (red), and a JO containing JO neurons (blue region). Each neuronal group projects to a distinct brain region. (d) Male (left) and female (right) mosquito flagellar vibration frequencies. Exposure to specific neurotransmitters can alter the flagellar vibrations of both sexes; neurotransmitters with known effects on frequency tuning (estimated at ~500 Hz and ~225 Hz in males and females, respectively, prior to exposure) are listed

The JO houses ~480 JO neurons in D. melanogaster (Kamikouchi et al. 2006). These neurons are stretch-sensitive mechanosensory neurons activated and inactivated when their cilia are stretched and compressed, respectively (Kamikouchi et al. 2009). Traditionally, JO neurons have been anatomically and functionally classified into five subgroups, namely JO-A, B, C, D, and E (Kamikouchi et al. 2006). JO neurons of each subgroup send their axons to a distinct zone in the antennal mechanosensory and motor center (AMMC) in the brain. Each of the AMMC zones, denoted zones A, B, C, D, and E, is innervated by a corresponding subgroup of JO neurons (JO-A, B, C, D, and E, respectively). In addition to these five subgroups, a recent EM-based reconstruction study has identified a new subgroup of JO neurons, JO-F (Hampel et al. 2020). Axons of JO-F neurons project to zone F, a distinct zone located in the ventromedial region of the AMMC in the fly brain. Due to differences in their intrinsic and extrinsic properties, JO neuron subgroups respond differently to antennal movement. JO-A and JO-B neurons are vibration-sensitive neurons and are maximally activated by high (>100 Hz) and low ( [...]

[...] 90 dB SPL (sound pressure level in decibels, relative to 20 μPa) at 1 m] and low-frequency sounds (e.g., 20 Hz to 20 kHz), are audible to humans and, thus, have long been subjects of study. These songs have the advantage of being perceived by intended receivers (female mates) over long distances because of their low attenuation in air. Males of singing insects, such as cicadas, crickets, and katydids, hence use calling songs to attract female mates that are distant from the male signalers. However, conspicuous signals subject the signalers to eavesdropping by unintended receivers, including eared predators, parasitoids, and conspecific male rivals (i.e., satellite males) (Zuk and Kolluru 1998; Gerhardt and Huber 2002; Hedwig and Robert 2014; Legett et al. 2019). Predatory gleaning bats and eared parasitoid flies locate sound-generating insects using passive hearing (Belwood and Morris 1987; Zuk et al. 2006; Alem et al. 2011; Siemers et al. 2012). In contrast to calling songs, the quiet “soft” courtship songs found in male corn borer moths, Ostrinia spp. (Crambidae), are less susceptible to eavesdropping; it was therefore predicted that whispering ultrasounds for copulation are widespread among moths (Nakano et al. 2008). A survey of 26 species of eared moths belonging to the families Noctuidae, Erebidae, Pyralidae, Crambidae, and Geometridae revealed that males of 11 species (42%) produced relatively low-intensity ( [...]

[...] 70 dB SPL at the position of the moth. Since the males of O. furnacalis and S. litura do not emit courtship songs loudly, flying moths do not exhibit last-ditch reactions, such as protean and unpredictable flight responses, to their actual courtship songs. Therefore, it is deduced that the female recognizes the soft courtship song of males as a distant bat echolocation call and remains stationary in order not to be detected by the bat. The equivalent reactions of females to bat echolocation calls and male courtship songs support the idea that eared moths evolved acoustic sexual communication based on preexisting sensory biases.

3.4.2 Discrimination Before the Evolution of Mate Recognition

In eared moths, evolutionary modification from a bat avoidance response to a positive response to the attractive signal of mates is an essential process for forming sexual acoustic communication. In the lesser waxmoth, Achroia grisella (Pyralidae), for example, female moths evade echolocation calls of bats (negative phonotaxis) but are attracted to male ultrasonic calling songs (positive phonotaxis) (Rodríguez and Greenfield 2004; Greenfield and Weber 2000; Greenfield and Hohendorf 2009). In the lichen moth, Manulea japonica (Erebidae), and the yellow peach moth, Conogethes punctiferalis (Crambidae), females distinguish bat echolocation calls from male courtship songs based on the temporal pattern and show the mate acceptance posture to a conspecific male singing a courtship song (Nakano et al. 2013, 2014). Except for M. japonica and C. punctiferalis, the required behavioral reversal (negative response to bat echolocation calls and positive response to male courtship songs) might not be critical in mating. Courtship songs work over a relatively short range, which makes positive orientation, as seen toward calling songs, unnecessary. Thus, because courtship song was added to the typical moth mating system, in which female sex pheromones attract male moths over long distances, the production of courtship songs has probably become more typical than that of loud calling songs in moths. Adaptive sophistication of moth acoustic communication based on the exploitation of preexisting sensory biases results in matching the properties of the sender’s sounds to the detection abilities of the receiver. Courtship ultrasounds would be tuned toward the sound intensity and frequency range detectable by receivers that evolved hearing to detect bat echolocation calls. Afterward, the acoustic signal of the male sender and the response of the female receiver coevolved through sexual selection (Greenfield 2014). In erebid tiger and lichen moths, males produce courtship songs whose temporal structure differs from that of bat ultrasounds. In addition, individuals of both sexes emit aposematic or jamming signals from the same metathoracic tymbals against bat echolocation calls. The preference/mate recognition of females for male courtship songs in this group is most likely to have evolved after the evolution of anti-bat ultrasounds. Currently, females never generate anti-bat ultrasounds in response to male songs. Thus, experimentally demonstrating sensory exploitation in moth acoustic communication with the mate-recognition signal is difficult because the preference of the female for the current ultrasonic signal of the male evolved after the development of the original evasive action to bat echolocation ultrasounds. Regarding the presence of male courtship ultrasounds, Spodoptera moths will provide appropriate examples: some do not use courtship ultrasounds for mating, whereas S. litura and Spodoptera exigua, for instance, emit prominent male ultrasounds in courtship (Nakano and Nagamine 2019).

R. Nakano

3.5 Self-Feedback Via Courtship Song

In the lucerne moth Nomophila nearctica (Pyraloidea, Crambidae), courting males sequentially generate pairs of ultrasonic clicks with a dominant frequency of 112 kHz and a sound pressure level of 61 dB SPL at 10 cm (Fig. 3.2a) (Nakano and Nagamine 2019). Males possess a pair of mesothoracic smooth tymbals, and only the tymbal on one side is used at a time to generate ultrasonic clicks. Courtship ultrasounds are emitted from the right or left tymbal when a male bends its abdomen to the left or right, respectively, for genital coupling: the tymbal on the side opposite to the direction of abdominal bending is used for ultrasound production (Fig. 3.2a). For successful copulation, males must persist in courtship, with abdominal bending and ultrasound emission, because baseline mating success is relatively low (about 30–40%) even in intact females and males (Fig. 3.2b). When the tympanic membranes of the female or the tymbals of the male are destroyed, mating success decreases further to less than 10%. In this situation, playback of male courtship ultrasounds to a female with functional hearing rescues the low mating success, raising it from 10% to 30% or more (Fig. 3.2b). The same rescue is achieved by broadcasting echolocation calls of a sympatric insectivorous bat, Eptesicus fuscus. Additionally, flying and resting female moths alike take evasive actions in response to simulated ultrasounds of both male moths and E. fuscus bats. These findings corroborate that N. nearctica females do not discriminate between conspecific male ultrasounds and bat echolocation calls and that male courtship ultrasounds have a deceptive function toward female mates, as shown in O. furnacalis and S. litura.

Fig. 3.2 Ultrasonic communication using a male courtship song in the lucerne moth, Nomophila nearctica (Pyraloidea, Crambidae). (a) Male moths emit ultrasonic clicks while bending the abdomen for genital coupling. (b) Mating success is enhanced by male courtship song, and a call of a sympatric bat (Eptesicus fuscus) enhances it as well. (c) The number of genital contacts in 5 min decreases significantly when a singing male cannot hear a male courtship song or bat echolocation call

For acoustic moths in which males generate bat-like ultrasounds to make female receivers motionless, one fundamental question arises: Is a male sender directly affected by hearing its own ultrasonic courtship song with the deceptive function? To explore this, the number of copulation attempts accompanied by abdominal bending was analyzed in behavioral experiments with male moths whose genitalia had been surgically ablated so that they were incapable of copulating with a focal female. The mean number of genital contacts with a female in 5 min was more than 20 both in males with intact tymbal organs and in sham-operated males, which had intact tymbal organs but injured metathoracic surface cuticles (the positive control for ablation of tymbal organs) (Fig. 3.2c). By contrast, aphonic males with destroyed tymbal organs soon quit their copulation attempts despite the presence of nearby females: the mean number was five or fewer. When male courtship ultrasounds or E. fuscus echolocation calls were broadcast, the number of genital contacts in aphonic males with ablated tymbals increased significantly compared with the no-stimulus condition, whereas background room noise had no such effect. These data suggest that male courtship songs motivate the male senders themselves in the courtship context. The mechanism of this self-feedback via the male's own courtship ultrasounds is still unknown, but it is anticipated that odor processing in the central nervous system (i.e., the brain) of male moths is sensitized by ultrasound stimuli (Anton et al. 2011).
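The source level reported above for N. nearctica clicks (61 dB SPL at 10 cm) can be put in perspective with a rough spreading-loss estimate. The following is only a minimal sketch, assuming an idealized point source with spherical spreading (a 20 log10 distance law); the function name and the optional absorption parameter are hypothetical, and no absorption value is given in this chapter, so it defaults to zero. At ultrasonic frequencies real atmospheric absorption is not negligible, so the values printed here should be read as upper bounds.

```python
import math


def spl_at_distance(spl_ref_db, ref_distance_m, distance_m, absorption_db_per_m=0.0):
    """Estimate the sound pressure level at distance_m.

    Assumes spherical spreading from a point source measured at ref_distance_m.
    absorption_db_per_m is a hypothetical extra linear loss term (default 0);
    the chapter does not provide a value for it.
    """
    if ref_distance_m <= 0 or distance_m <= 0:
        raise ValueError("distances must be positive")
    spreading_loss = 20.0 * math.log10(distance_m / ref_distance_m)
    absorption_loss = absorption_db_per_m * (distance_m - ref_distance_m)
    return spl_ref_db - spreading_loss - absorption_loss


# 61 dB SPL at 10 cm is the value reported in the text for courting males.
for d in (0.1, 0.2, 1.0):
    print(f"{d:.1f} m: {spl_at_distance(61.0, 0.1, d):.1f} dB SPL")
```

Under these assumptions the level drops by about 6 dB for each doubling of distance and by 20 dB for each tenfold increase, which is consistent with the idea that such soft songs function only at close range.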

3.6 Perspectives

Male moths often produce brief courtship ultrasounds of low sound intensity (Nakano and Nagamine 2019). Owing to the “sensory bias” of human ears toward relatively low-frequency (20 Hz to 20 kHz), high-intensity, and long sounds, such as the calling songs of cicadas and crickets, the actual number of insects communicating with soft courtship songs has long been underestimated (Balenger 2015; Reichard and Anderson 2015). With the recent development of portable microphones and recording devices that can easily capture weak (ultra)sounds, the use of soft sounds for private communication will be revealed in more moth species. Subsequently, these mating sounds will be classified as deceptive signals, true signals, or something else. These studies will be essential for deepening our understanding of the evolutionary scenario of acoustic communication in moths. Studies on how original negative responses in defensive behavior, such as the freezing response to predator signals, are modified may contribute to our knowledge of the evolution of novel communication modalities.

References

Alem S, Koselj K, Siemers BM, Greenfield MD (2011) Bat predation and the evolution of leks in acoustic moths. Behav Ecol Sociobiol 65:2105–2116. https://doi.org/10.1007/s00265-011-1219-x

Anton S, Evengaard K, Barrozo RB, Anderson P, Skals N (2011) Brief predator sound exposure elicits behavioral and neuronal long-term sensitization in the olfactory system of an insect. Proc Natl Acad Sci U S A 108:3401–3405. https://doi.org/10.1073/pnas.1008840108
Balenger SL (2015) Stridulated soft song by singing insects. Anim Behav 105:275–280. https://doi.org/10.1016/j.anbehav.2015.03.024
Belwood JJ, Morris GK (1987) Bat predation and its influence on calling behavior in Neotropical katydids. Science 238:64–67. https://doi.org/10.1126/science.238.4823.64
Conner WE (1999) ‘Un chant d’appel amoureux’: acoustic communication in moths. J Exp Biol 202:1711–1723. https://doi.org/10.1242/jeb.202.13.1711
Conner WE, Corcoran AJ (2012) Sound strategies: the 65-million-year-old battle between bats and insects. Annu Rev Entomol 57:21–39. https://doi.org/10.1146/annurev-ento-121510-133537
Endler JA, Basolo AL (1998) Sensory ecology, receiver biases and sexual selection. Trends Ecol Evol 13:415–420. https://doi.org/10.1016/S0169-5347(98)01471-2
Fernández Y, Dowdy NJ, Conner WE (2020) Extreme duty cycles in the acoustic signals of tiger moths: sexual and natural selection operating in parallel. Integr Org Biol 2:obaa046. https://doi.org/10.1093/iob/obaa046
Gerhardt HC, Huber F (2002) Acoustic communication in insects and anurans: common problems and diverse solutions. University of Chicago Press, Chicago
Greenfield MD (2014) Acoustic communication in the nocturnal Lepidoptera. In: Hedwig B (ed) Insect hearing and acoustic communication, Animal signals and communication, vol 1. Springer, Berlin, Heidelberg, pp 81–100
Greenfield MD, Hohendorf H (2009) Independence of sexual and anti-predator perceptual functions in an acoustic moth: implications for the receiver bias mechanism in signal evolution. Ethology 115:1137–1149. https://doi.org/10.1111/j.1439-0310.2009.01700.x
Greenfield MD, Weber T (2000) Evolution of ultrasonic signalling in wax moths: discrimination of ultrasonic mating calls from bat echolocation signals and the exploitation of an anti-predator receiver bias by sexual advertisement. Ethol Ecol Evol 12:259–279. https://doi.org/10.1080/08927014.2000.9522800
Hedwig B, Robert D (2014) Auditory parasitoid flies exploiting acoustic communication of insects. In: Hedwig B (ed) Insect hearing and acoustic communication, Animal signals and communication, vol 1. Springer, Berlin, Heidelberg, pp 45–63
Ishikawa Y (2020) Insect sex pheromone research and beyond. Entomology monographs. Springer, Singapore
Kawahara AY, Plotkin D, Espeland M, Meusemann K, Toussaint EFA, Donath A, Gimnich F, Frandsen PB, Zwick A, dos Reis M, Barber JR, Peters RS, Liu S, Zhou X, Mayer C, Podsiadlowski L, Storer C, Yack JE, Misof B, Breinholt JW (2019) Phylogenomics reveals the evolutionary timing and pattern of butterflies and moths. Proc Natl Acad Sci U S A 116:22657–22663. https://doi.org/10.1073/pnas.1907847116
Kindl J, Kalinová B, Červenka M, Jílek M, Valterova I (2011) Male moth songs tempt females to accept mating: the role of acoustic and pheromonal communication in the reproductive behaviour of Aphomia sociella. PLoS One 6:e26476. https://doi.org/10.1371/journal.pone.0026476
Legett HD, Page RA, Bernal XE (2019) Synchronized mating signals in a communication network: the challenge of avoiding predators while attracting mates. Proc R Soc Lond B Biol Sci 286:20191067. https://doi.org/10.1098/rspb.2019.1067
Miller LA, Surlykke A (2001) How some insects detect and avoid being eaten by bats: tactics and countertactics of prey and predator. Bioscience 51:570–581. https://doi.org/10.1641/0006-3568(2001)051[0570:HSIDAA]2.0.CO;2
Minet J, Surlykke A (2003) Auditory and sound producing organs. In: Kristensen NP (ed) Lepidoptera, moths and butterflies, Morphology and physiology, vol 2. Walter de Gruyter, Berlin, pp 289–323
Nakano R, Mason AC (2018) Early erratic flight response of the lucerne moth to the quiet echolocation calls of distant bats. PLoS One 13:e0202679. https://doi.org/10.1371/journal.pone.0202679

Nakano R, Nagamine K (2019) Loudness-duration tradeoff in ultrasonic courtship songs of moths. Front Ecol Evol 7:244. https://doi.org/10.3389/fevo.2019.00244
Nakano R, Ishikawa Y, Tatsuki S, Surlykke A, Skals N, Takanashi T (2006) Ultrasonic courtship song in the Asian corn borer moth, Ostrinia furnacalis. Naturwissenschaften 93:292–296. https://doi.org/10.1007/s00114-006-0100-7
Nakano R, Skals N, Takanashi T, Surlykke A, Koike T, Yoshida K, Maruyama H, Tatsuki S, Ishikawa Y (2008) Moths produce extremely quiet ultrasonic courtship songs by rubbing specialized scales. Proc Natl Acad Sci U S A 105:11812–11817. https://doi.org/10.1073/pnas.0804056105
Nakano R, Takanashi T, Fujii T, Skals N, Surlykke A, Ishikawa Y (2009) Moths are not silent, but whisper ultrasonic courtship songs. J Exp Biol 212:4072–4078. https://doi.org/10.1242/jeb.032466
Nakano R, Takanashi T, Skals N, Surlykke A, Ishikawa Y (2010a) To females of a noctuid moth, male courtship songs are nothing more than bat echolocation calls. Biol Lett 6:582–584. https://doi.org/10.1098/rsbl.2010.0058
Nakano R, Takanashi T, Skals N, Surlykke A, Ishikawa Y (2010b) Ultrasonic courtship songs of male Asian corn borer moths assist copulation attempts by making the females motionless. Physiol Entomol 35:76–81. https://doi.org/10.1111/j.1365-3032.2009.00712.x
Nakano R, Ihara F, Mishiro K, Toyama M (2012a) Male courtship ultrasound produced by mesothoracic tymbal organs in the yellow peach moth Conogethes punctiferalis (Lepidoptera: Crambidae). Appl Entomol Zool 47:129–135. https://doi.org/10.1007/s13355-012-0099-5
Nakano R, Takanashi T, Ihara F, Mishiro K, Toyama M, Ishikawa Y (2012b) Ultrasonic courtship song of the yellow peach moth, Conogethes punctiferalis (Lepidoptera: Crambidae). Appl Entomol Zool 47:87–93. https://doi.org/10.1007/s13355-012-0092-z
Nakano R, Takanashi T, Surlykke A, Skals N, Ishikawa Y (2013) Evolution of deceptive and true courtship songs in moths. Sci Rep 3:2003. https://doi.org/10.1038/srep02003
Nakano R, Ihara F, Mishiro K, Toyama M, Toda S (2014) Double meaning of courtship song in a moth. Proc R Soc Lond B Biol Sci 281:20140840. https://doi.org/10.1098/rspb.2014.0840
Nakano R, Takanashi T, Surlykke A (2015a) Moth hearing and sound communication. J Comp Physiol A 201:111–121. https://doi.org/10.1007/s00359-014-0945-8
Nakano R, Ihara F, Mishiro K, Toyama M, Toda S (2015b) High duty cycle pulses suppress orientation flights of crambid moths. J Insect Physiol 83:15–21. https://doi.org/10.1016/j.jinsphys.2015.11.004
Nakano R, Ito A, Tokumaru S (2022) Sustainable pest control inspired by prey-predator ultrasound interactions. Proc Natl Acad Sci U S A 119:e2211007119. https://doi.org/10.1073/pnas.2211007119
Obara Y (1979) Bombyx mori mating dance: an essential in locating the female. Appl Entomol Zool 14:130–132. https://doi.org/10.1303/aez.14.130
Reichard DG, Anderson RC (2015) Why signal softly? The structure, function and evolutionary significance of low-amplitude signals. Anim Behav 105:253–265. https://doi.org/10.1016/j.anbehav.2015.04.017
Rodríguez RL, Greenfield MD (2004) Behavioural context regulates dual function of ultrasonic hearing in lesser waxmoths: bat avoidance and pair formation. Physiol Entomol 29:159–168. https://doi.org/10.1111/j.1365-3032.2004.00380.x
Roeder KD (1962) The behaviour of free-flying moths in the presence of artificial ultrasonic pulses. Anim Behav 10:300–304. https://doi.org/10.1016/0003-3472(62)90053-2
Ryan MJ (1998) Sexual selection, receiver biases, and the evolution of sex differences. Science 281:1999–2003. https://doi.org/10.1126/science.281.5385.1999
Siemers BM, Kriner E, Kaipf I, Simon M, Greif S (2012) Bats eavesdrop on the sound of copulating flies. Curr Biol 22:R563–R564. https://doi.org/10.1016/j.cub.2012.06.030
Spangler HG (1985) Sound production and communication by the greater wax moth (Lepidoptera: Pyralidae). Ann Entomol Soc Am 78:54–61. https://doi.org/10.1093/aesa/78.1.54

Spangler HG (1988) Moth hearing, defense, and communication. Annu Rev Entomol 33:59–81. https://doi.org/10.1146/annurev.en.33.010188.000423
Spangler HG, Greenfield MD, Takessian A (1984) Ultrasonic mate calling in the lesser wax moth. Physiol Entomol 9:87–95. https://doi.org/10.1111/j.1365-3032.1984.tb00684.x
Takanashi T, Nakano R, Surlykke A, Tatsuta H, Tabata J, Ishikawa Y, Skals N (2010) Variation in courtship ultrasounds of three Ostrinia moths with different sex pheromones. PLoS One 5:e13144. https://doi.org/10.1371/journal.pone.0013144
Yack JE (2004) The structure and function of auditory chordotonal organs in insects. Microsc Res Tech 63:315–337. https://doi.org/10.1002/jemt.20051
Yager DD (2012) Predator detection and evasion by flying insects. Curr Opin Neurobiol 22:201–207. https://doi.org/10.1016/j.conb.2011.12.011
Zuk M, Kolluru GR (1998) Exploitation of sexual signals by predators and parasitoids. Q Rev Biol 73:415–438. https://doi.org/10.1086/420412
Zuk M, Rotenberry JT, Tinghitella RM (2006) Silent night: adaptive disappearance of a sexual signal in a parasitized population of field crickets. Biol Lett 2:521–524. https://doi.org/10.1098/rsbl.2006.0539

Chapter 4: Recent Progress in Studies on Acoustic Communication of Crickets

Takashi Kuriwada

Abstract Acoustic communication between male and female crickets has been studied at both proximate and ultimate levels. This chapter focuses on studies at the ultimate level. The fitness benefits of female preference for male songs have long been studied. As a direct benefit, increased fecundity in females has been reported in many cases. As an indirect benefit, it is often reported that the sons of attractive males are themselves attractive. However, some unresolved problems limit our understanding of the acoustic communication of crickets, including relationships with other sexually dimorphic traits such as weapon traits and the influence of conspecific social relationships on communication. For example, socially isolated males are aggressive toward females, which hinders male reproductive success. Furthermore, acoustic signals can be used as cues by heterospecific individuals. Predators and parasitoids eavesdrop on the songs of crickets to locate and attack them. Moreover, the song of one cricket species may interfere with the communication of other species (acoustic masking interference). Finally, I explore how anthropogenic disturbances can impact cricket communication.

Keywords Acoustic masking interference · Acoustic signal · Anthropocene · Eavesdrop · Female preference · Sexual selection · Noise · Urbanization

4.1 Introduction

In Japan, people have enjoyed the acoustic signals of orthopteran insects such as crickets and katydids since ancient times. Japanese scientists have taken an interest in the micro-level biology of orthopterans, such as their neurobiology and physiology; however, their ecology and evolution have been poorly investigated in Japan. Outside of Japan, orthopteran insects, especially crickets (Gryllidae), are useful model organisms for macro-level biological research (Gerhardt and Huber 2002) because the quantification of mating signals and female preference is relatively easy to achieve through playback experiments. Acoustic communication in crickets has been intensively studied to test theories of sexual selection, the evolution of signals, and speciation.

Cricket song can be used as both a signal and a cue. Maynard-Smith and Harper (2003) defined a signal as “any act or structure which alters the behavior of other organisms, which evolved because of that effect, and which is effective because the receiver’s response has also evolved.” Meanwhile, a cue refers to a trait that has not evolved for the purpose of communication yet can provide information to an observer about the location, identity, or condition of the target organism (Maynard-Smith and Harper 2003).

In this chapter, I explore the current research on acoustic communication in crickets as follows. First, I briefly review how cricket song functions as an important signal in the process of sexual selection. Second, I show how acoustic communication functions as a cue for conspecific and heterospecific individuals. Finally, I describe how human disturbance can affect cricket acoustic communication.

T. Kuriwada (✉) Laboratory of Zoology, Faculty of Education, Kagoshima University, Kagoshima, Japan
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. Y. Seki (ed.), Acoustic Communication in Animals, https://doi.org/10.1007/978-981-99-0831-8_4

4.2 Acoustic Signal and Sexual Selection in Crickets

4.2.1 Overview of Acoustic Signals and Female Preference in Crickets

In orthopteran insects, especially Gryllidae, males generally produce three distinct types of acoustic signals: calling, courtship, and aggressive songs (Alexander 1961; Gerhardt and Huber 2002). Calling songs are used in the long-range attraction of females and the demonstration of territory to other males; courtship songs are used in short-range courtship immediately prior to copulation; aggressive songs are produced during combative encounters with rival males. The temporal structure of calling songs has been studied in terms of sexual selection. Studies have investigated whether calling songs reflect male phenotypic and genetic characteristics and are targets for female preference, and if so, how females benefit from their preference (Zuk and Simmons 1997; Gerhardt and Huber 2002).

Producing songs can reduce the chance of survival, putting singing males at a disadvantage in terms of natural selection. Song production has high energy costs (Hoback and Wagner Jr 1997). In addition, cricket calling songs can attract predators and parasitoids (reviewed by Zuk and Kolluru 1998). In the theory of sexual selection, such costs mean that only males in good condition can produce songs, which guarantees the honesty of the acoustic signal (Maynard-Smith and Harper 2003). In addition, cricket songs are considered reliable index signals because they cannot be faked (Maynard-Smith and Harper 2003). In either case, male quality is expected to correlate with features of song structure. For example, researchers have found correlations between nutritional conditions and calling rate, chirp rate of calling songs, and/or time spent calling in field crickets (e.g., Scheuber et al. 2003; Hunt et al. 2004; Judge et al. 2008; but see Harrison et al. 2013). Furthermore, body size and calling song features such as dominant frequency have often been found to be significantly positively correlated (Simmons 1988; but see Miyashita et al. 2016). Moreover, male crickets of the species Acheta domesticus, Gryllus campestris, and Teleogryllus commodus that produce more attractive calling songs have higher immunocompetence (Ryder and Siva Jothy 2000; Jacot et al. 2004; Simmons et al. 2005).

Female preference for certain song characteristics has also been shown in field crickets. Appendix 4 of Gerhardt and Huber (2002) provides an extensive summary of female preference for male acoustic signals in anurans and insects, including crickets. The primary function of calling songs is the identification of appropriate conspecific mates (Honda-Sumi 2005). In addition, calling songs vary among individuals, and females are attracted to certain song structures over others. For example, T. occipitalis females prefer long chirps with trills over the distinct chirps (Kuriwada et al. 2020).

Furthermore, the direct and indirect benefits to females of preferring certain calling songs have been investigated. Direct benefits improve the fitness of the female by enhancing either the number of successful offspring or her own survival; they include better nuptial gifts, a greater quality or quantity of sperm, or access to better territories. Indirect benefits refer to the provision of superior genetic quality for offspring, which improves the female’s fitness by increasing the mating success or viability of her offspring.
The two main hypothesized mechanisms for achieving these indirect benefits are the “sexy son” and the “good genes” hypotheses. In support of the “sexy son” hypothesis, the temporal structure of calling songs, such as pulses per trill and calling bout length, is known to be highly heritable in some field crickets (e.g., Hedrick 1988; Webb and Roff 1992; Gray and Cade 1999; Simmons 2004; Hunt et al. 2007). Therefore, the male offspring of a male whose song is preferred by females are expected to emit songs that females also prefer and thus to have high mating success. Under the “good genes” hypothesis, calling songs often indicate the body condition (which in turn reflects resource acquisition ability), body size, and immunocompetence of males. If these traits are heritable, female preference based on the calling song could enhance the viability of the offspring.

Empirical studies have examined the direct and indirect benefits of female mate preference in crickets. Many of these studies defined male attractiveness based on female mating responses such as latency to copulation and copulation success of focal males (i.e., no-choice test: Shackleton et al. 2005). For example, a shorter latency to copulation indicates that the male is more attractive to females. It is worth noting that these indices of female preference are influenced not only by calling songs but also by other factors, including courtship songs and cuticular hydrocarbons (e.g., Thomas and Simmons 2009; Leonard and Hedrick 2010). Therefore, rather than focusing on specific sexually selected traits, a more realistic approach to studying female preference in crickets is to use no-choice tests (Kelly and Adam-Granger 2020). Simmons (1987) showed that the offspring of G. bimaculatus females that were allowed to choose their mates had shorter developmental periods and higher survival rates than those of females that had been allocated mates. Meanwhile, sons of attractive G. bimaculatus males had longer developmental times and higher mating success (Wedell and Tregenza 1999). In the ground cricket Allonemobius socius, the father’s mating success is positively correlated with its son’s mating success but negatively with its daughter’s fecundity (Fedorka and Mousseau 2004). This indicates that mating biases for high-quality males can result in conflicting sex-specific offspring fitness returns. Similarly, in the house cricket Acheta domesticus, females that mated with attractive males had a shorter lifespan than those that mated with unattractive males but had more attractive sons. Consequently, the direct cost to females was compensated by indirect benefits (Head et al. 2005). Recent comprehensive research in the field cricket G. firmus found that the offspring of more attractive males were not superior in either body condition or immunocompetence to the offspring of less attractive males. However, females received direct fecundity benefits from mating with more attractive males (Kelly and Adam-Granger 2020).

Here, I introduce research that focuses on the attractiveness of the calling song itself and the benefits of the preference for song structure. In the variable field cricket Gryllus lineaticeps, females preferred calling songs with higher chirp rates and longer chirp durations (Wagner Jr 1996). Females that mated with males that produced longer chirps showed delayed oviposition, and females that delayed oviposition lived longer. Additionally, there was a positive correlation between the male chirp rate and the number of sperm transferred to females; thus, females that mated with males with higher chirp rates produced a greater quantity of fertile eggs (Wagner Jr and Harper 2003).
Because there was a negative phenotypic correlation between chirp rate and chirp duration, females must trade off survival benefits against reproductive benefits when choosing mates. To summarize these studies, female mate preference in crickets may have more direct benefits than indirect benefits. In addition, in terms of indirect benefits, there is more evidence in support of the “sexy son” hypothesis than the “good genes” hypothesis in crickets.

Although calling song is the most studied feature of sexual selection in crickets, female preference, relationships with male characteristics, and heritability have also been investigated for courtship song (e.g., Wagner Jr and Reiser 2000; Gray and Eckhardt 2001; Rantala and Kortet 2003; Tregenza et al. 2006; Simmons et al. 2010). Because courtship songs are emitted when males are close to females, low-quality males may invest more in the courtship song to promote current mating success. Indeed, there are some reports that the courtship song is not condition-dependent (e.g., Gray and Eckhardt 2001; Harrison et al. 2013). Calling song attracts distant mates; thus, it has a high cost for an uncertain reward if there are no mates nearby, i.e., calling males are essentially gambling with their energy. However, when a potential mate is close, the chance of actually mating is high, and thus so is the chance of a fitness gain from investing a lot of energy in courtship at this time. A comparable pattern has been reported for calling song: in the bell cricket Meloimorpha japonica, older males invest more of their resources in calling effort because they have fewer future reproductive opportunities, but signal honesty is maintained on average over the entire lifetime (Kuriwada and Kasuya 2011). To shed light on the evolution of honest signals, it is important to investigate the cost of such disproportionate investment in courtship songs.

Most of the studies presented here were based on laboratory experiments. Recently, field studies have been conducted on the acoustic communication of crickets. More information on these field studies can be obtained from the following website: https://www.wildcrickets.org/ (“Wild crickets: evolution of nature”).

4.2.2 Remaining Key Questions

As noted above, male acoustic signals for females can be divided into calling songs and courtship songs. In addition to these songs, the male ground cricket Dianemobius csikii produces song types when it is near females; in this case, the male continues to produce the courtship song near the female for more than 20 min before switching to a different type of song, after which copulation occurs within 10–20 s (Kuriwada In preparation). Although the function of this song type has not yet been verified, the song has a completely different temporal structure to the courtship song and is always emitted after it. Investigation of acoustic signals of other less commonly studied cricket species (i.e., not Gryllus and Teleogryllus species) may lead to the discovery of this new repertoire of songs. Furthermore, there is evidence that courtship songs have another function; in the field cricket T. occipitalis, approximately 10% of males emit courtship songs during male–male contests (i.e., same-sex sexual behavior). This behavior weakens the intensity of the contest, thus reducing risk of injury of the males (Fig. 4.1, Kuriwada 2017). Thus, although the intensive study of the acoustic signal of crickets, there is still room for unexplored discoveries. Although research has typically concentrated on acoustic signals, other signals (such as chemical signals and physical traits) are also important in cricket communication. For example, an interaction between the acoustic and chemical signals has been observed in G. integer; in this study, females accepted courtship by chemical signals more quickly after exposure to attractive calling songs than after exposure to unattractive calling songs (Leonard and Hedrick 2010). Furthermore, male crickets not only use aggressive songs in male–male competition but also have weapon traits for direct fighting. For example, Judge and Bonanno (2008) examined sizes of the head and mouthparts, including the maxillae and mandibles, of G. 
pennsylvanicus and identified them to be sexually dimorphic weaponry. The horn-headed cricket Loxoblemmus doenitzi exhibits a distinctly dimorphic head shape, with males having flat heads and triangular horns that are used for male–male contest behavior (Fig. 4.2), while females’ heads are rounded and hornless (Kim et al. 2011). Interestingly, Kuriwada (2016a) found no significant correlations between the temporal structures of calling songs and horn length. Additionally, calling efforts such as nightly calling duration and phrase interval were affected by recent diet quality whereas horn length was not (Kuriwada 2016a). These results suggest that acoustic

68

T. Kuriwada

Fig. 4.1 (a) Effects of courtship song on the intensity of male–male contest behavior in Teleogryllus occipitalis. Each data point is represented by a dot, and the width of violin plot represents the relative frequency of intensity of contest. The intensity of contest was recorded using a categorical scale of aggression, where 0 = mutual avoidance, 1 = antenna fencing and aggressive song, 2 = bilateral maxillae/ mandible spreading, 3 = maxillae/mandible engagement, and 4 = grappling. When males emitted courtship song, the intensity of contests significantly decreased ( p < 0.01). (Modified from Kuriwada 2017). (b) Image of the male on the right emitting courtship song to the male on the left

signals are more reflective of short-term influences of resource availability, whereas morphological structures may only reflect long-term resource quality. Therefore, the combination of signals reflects different aspects of male quality. The relationship between acoustic signals and weapon traits for male–male competition is also an interesting topic. Intra-specific interactions also affect acoustic communication in crickets. Male G. bimaculatus reared in groups rarely exhibit aggressive displays, whereas interactions between males of previously isolated crickets typically become aggressive (Stevenson and Rillich 2013). If such social isolation promotes male aggression, it is likely that courtship behavior between sexes would also be affected by isolation. Indeed, isolated G. bimaculatus males showed higher levels of aggression toward the antennae of females than males reared in groups (Nagamoto et al. 2005). To examine whether enhanced aggression due to social isolation affects male mating success, Kuriwada (2016b) set up three treatments: one male, two males, or one male and one female. Each treatment group was reared in an isolated container for 7 days;

4 Recent Progress in Studies on Acoustic Communication of Crickets

Fig. 4.2 Images of Loxoblemmus doenitzi. (a) Dorsal view. (b) Frontal view of five male heads with varying horn length. Dotted lines indicate how horn length is measured

thereafter, males were presented with females. Male aggression toward females and copulation success were measured. Males reared alone or with females were more aggressive toward females and had lower mating success than males reared with other males. These results show that social isolation from same-sex individuals escalates male aggression toward females. Additionally, when male G. bimaculatus encountered heavier females (i.e., females with higher fecundity), calling effort increased, irrespective of copulation success (Kuriwada 2022). These results suggest that social relationships are a critical factor affecting mating behavior, even though the species is normally considered solitary. Although acoustic communication in crickets has been studied intensively, new discoveries continue to be made and unresolved problems remain, so research in this area is likely to keep developing.

4.3 Effect of Conspecifics on Acoustic Communication of Crickets

While the previous section discussed the song of the cricket as a signal, this and the subsequent sections discuss the song as a cue. As previously described, signals function as intentional forms of communication, while cues are involuntary indicators of an organism's identity or condition. Some crickets use songs as cues to indicate the presence and status of nearby conspecifics. For example, songs operate as a cue for males to assess the number of nearby rivals, while females can assess the availability of mates. There is evidence that conspecific male calling songs operate as cues that indicate a greater risk of sperm competition, leading to increased male sexual investment. In a classic example, males of the cricket species A. domesticus and Gryllodes supplicans have been observed to transfer more sperm to females when exposed to visual and auditory cues from cohabiting conspecific males than when no other males are nearby (Gage and Barnard 1996). A series of experiments described below investigated the effects of calling songs on reproductive investment and life history in male and female T. commodus field crickets. In these studies, recordings of calling songs were broadcast to crickets during the nymph stage, with song quality experimentally manipulated and differences in male density simulated by changing the number of speakers. The authors found that when reared in an environment simulating a high density of high-quality conspecific males, male nymphs delayed maturity to grow larger and called less frequently than those reared in a simulation of low male density (Kasumovic et al. 2011). In addition, males exposed to high calling rates as nymphs exhibited a more dramatic decline in calling effort in late adulthood (Kasumovic et al. 2012). 
These results indicate that the songs of other males heard during the nymph stage operate as a cue for the number and quality of rivals, allowing nymphs to adjust their development to avoid competition or compete more effectively with rivals. Meanwhile, females reared under the same experimental simulations matured more rapidly and allocated more resources to egg production when reared in a simulation of a higher density of high-quality males (Kasumovic et al. 2011). Additionally, female nymphs reared with greater variation in calling rate were more responsive to male song as adults (Kasumovic et al. 2012) and produced higher quality offspring when mated with more attractive males (Santana et al. 2020) than those reared with only high calling rates. Similarly, calling song exposure at the adult stage increased ovary mass in female G. firmus, especially in long-winged individuals (Conroy and Roff 2018). These results show that male calling songs operate as acoustic cues to influence the reproductive strategies of conspecific females. These are not the only ways in which calling songs operate as conspecific cues; another example is a phenomenon known as “satellite behavior.” Satellite behavior is a mating tactic by which a male with poor competitive ability does not produce song itself but stays near a higher quality calling male, aiming to intercept females that are attracted (Gerhardt and Huber 2002). Thus, in species that exhibit this

behavior, lower quality males are expected to be drawn to attractive male songs. Indeed, in the house cricket A. domesticus, smaller males are known to exhibit stronger phonotaxis to male calling songs (Kiflawi and Gray 2000). However, copulation in crickets begins when the female mounts a male. Therefore, the extent to which interception of females by satellite males is successful is not understood (Gerhardt and Huber 2002). Another explanation for male phonotactic responses is the aggressive displacement hypothesis, which predicts that males with higher competitive ability may rely on positive phonotaxis to displace competitors from suitable calling sites (Jang 2011). More competitive or attractive males exhibited strong phonotaxis to attractive calling songs in the field crickets G. integer and G. texensis (Leonard and Hedrick 2009; McCarthy et al. 2013). This evidence shows that calling songs are used not only as signals to prospective mates but also as cues by conspecifics. In the subsequent sections, I will explore the effects of calling songs on interspecific interactions.

4.4 Effect of Heterospecifics on Acoustic Communication of Crickets

4.4.1 Predator–Prey Interaction

Crickets are the target prey or host for many predators and parasitoids, and their conspicuous calling songs are often used as cues for locating them, a behavior referred to as eavesdropping. For example, the Mediterranean house gecko Hemidactylus turcicus exhibits positive phonotaxis to the calling song of the decorated cricket G. supplicans (Sakaluk and Belwood 1984). Similarly, the heron Florida coerulea uses the song of short-tailed crickets, Anurogryllus celerinictus, as a cue for hunting (Bell 1979). Perhaps the best-documented instance of eavesdropping on cricket song is observed in parasitoid flies of the genus Ormia, wherein female flies locate male crickets by following their calling songs and then deposit their larvae on the host (Cade 1975). On two Hawaiian islands, the Pacific field cricket T. oceanicus lost the ability to produce calling songs over just 12–20 generations in response to the introduction of the parasitoid fly O. ochracea (Zuk et al. 2006). Interestingly, the loss of song on both islands is an instance of convergent evolution, owing to different genetic mutations between the two populations (Pascoal et al. 2014). Silent mate location on these islands was facilitated by a combination of pre-existing mate-locating strategies, such as satellite behavior (Bailey and Zuk 2008; Tinghitella et al. 2009). In addition, a newly evolved song has been discovered in these populations after the original song was lost (Tinghitella et al. 2018). Although its function in courtship is unclear, the novel song appears to attract females from a distance (Tinghitella et al. 2018). Eavesdropping by predators and parasitoids has an important impact on the

evolution of acoustic communication in crickets (Dobbs et al. 2020). For a more extensive review of the eavesdropping of acoustic signals, refer to Zuk and Kolluru (1998).

4.4.2 Competition for Calling Sites

To introduce another perspective, this section extends the aggressive displacement hypothesis introduced in Sect. 4.3 to interspecific interactions. Suitable calling sites are often scarce, leading to competition between cricket species. Therefore, it would be favorable for males with higher competitive ability to use the song of other species as a cue to identify calling sites and aggressively displace the males of less competitive species. Two field cricket species, T. occipitalis and L. equestris, coexist in the same habitat on Amami Oshima Island in Japan and emit calling songs in cracks in the ground surface (Kuriwada 2020). The body size of T. occipitalis is larger than that of L. equestris (body length: 2–2.5 times; body weight: 3 times; Kuriwada 2020). Furthermore, T. occipitalis outperforms L. equestris in interspecific resource competition under laboratory conditions (Tajima et al. 2019). Therefore, the aggressive displacement hypothesis predicts that T. occipitalis will exhibit a positive phonotactic response to the calling songs of L. equestris. However, in a playback experiment, neither T. occipitalis nor L. equestris males exhibited any phonotactic responses to heterospecific songs (Kuriwada 2020). Why did crickets not respond to songs of other species? One potential reason is that a suitable calling site for one species may not be suitable for another species. Another explanation is that if auditory perception is selective for the conspecific frequency, then a heterospecific song may not be registered by the focal species. The mean dominant frequencies of T. occipitalis and L. equestris songs were approximately 4500 and 5000 Hz, respectively. In G. bimaculatus, the auditory neuron has sharp directional sensitivity only within a limited frequency range very close to the 5 kHz optimum, corresponding to the peak frequency of its own species' calling song (Popov et al. 1978). In fact, female T. occipitalis can locate conspecific males based on the calling song without being disturbed by the calling song of L. equestris, although whether these cricket species are capable of frequency tuning has not yet been confirmed (Fig. 4.3, Kuriwada et al. 2020).

4.4.3 Acoustic Masking Interference

Such frequency tuning of field crickets is likely to be necessary because of the co-occurrence of multiple species that use acoustic signals. In the wild, multiple species of crickets, katydids, and frogs often chorus simultaneously in the same habitat. Therefore, acoustic masking interference is an important problem in conspecific acoustic communication and is an example of reproductive interference.

Fig. 4.3 Effects of conspecific song type and presence of heterospecific song on female preference. The time spent by female Teleogryllus occipitalis to choose between an average calling song and either an attractive or an unattractive calling song was measured. "hs" indicates the presence of heterospecific (Loxoblemmus equestris) calling song. Each data point is represented by a dot, and the median and interquartile range are shown with the boxplot. There were no statistically significant differences among the treatments (p > 0.05). A large proportion of females chose the more attractive song. (Modified from Kuriwada et al. 2020)

Reproductive interference refers to instances when individuals of different species sexually interact during mating behavior, with one or both species suffering a fitness cost (reviewed by Gröning and Hochkirch 2008; Shuker and Burdfield-Steel 2017; Kyogoku and Wheatcroft 2020). Reproductive interference can reduce population growth, and its impact is strong because it is positively frequency dependent. Because individuals of the more abundant species are less likely to experience heterospecific encounters, an initially abundant species can drive the rapid extinction of a rarer species via reproductive interference (Kyogoku and Wheatcroft 2020). However, as noted above, multiple species using acoustic signals do coexist in the field. This suggests that these species have developed strategies for avoiding acoustic masking interference. Schmidt and Balakrishnan (2015) summarized various strategies for achieving this, including temporal, spatial,

and spectral partitioning and frequency tuning. Presumably, such mechanisms exist among species that have coexisted for long enough for evolution to occur. However, there may be acoustic masking interference from invasive species because invasive and native species have not coexisted on an evolutionary time scale. Indeed, in anurans, advertisement calls of native Brazilian frogs are plastically modified in response to the calls of the invasive bullfrog Lithobates catesbeianus (Medeiros et al. 2017). Little research has been conducted on orthopteran species, although one study found that calls of the invasive Coquí frog did not affect female phonotaxis in the Pacific field cricket T. oceanicus (Zuk et al. 2017). In Japan, two non-native cricket species, A. domesticus and G. bimaculatus, are widely used as live food for predatory pet animals. If the crickets escape into the wild, biological interactions with native species will likely occur. Since these species can move quickly and jump powerfully, the risk of escape into the wild is high. Although not a cricket, the Dubia roach Blaptica dubia, which is native to South America and commercially traded as live food, has been found in central Japan, where it is estimated to be able to survive through two winters (Kato and Yamasako 2021). Thus, non-native animals sold as live food may be able to survive in the field. It is important to examine not only interspecific resource competition but also acoustic masking interference between invasive and native crickets.
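The frequency dependence of reproductive interference described in this section can be illustrated with simple arithmetic. The following minimal sketch assumes that individuals encounter each other at random in proportion to abundance. This assumption, the function name `heterospecific_encounter_fraction`, and the population sizes are hypothetical and are not taken from the studies cited above.

```python
def heterospecific_encounter_fraction(n_self: int, n_other: int) -> float:
    """Expected fraction of a focal species' encounters that are heterospecific.

    Assumes random encounters in proportion to abundance, which is an
    illustrative simplification and not a model from the cited studies.
    """
    if n_self < 0 or n_other < 0:
        raise ValueError("abundances must be non-negative")
    total = n_self + n_other
    if total == 0:
        raise ValueError("at least one individual is required")
    return n_other / total


# Hypothetical abundances: 900 individuals of species A, 100 of species B.
abundant = heterospecific_encounter_fraction(900, 100)  # viewpoint of species A
rare = heterospecific_encounter_fraction(100, 900)      # viewpoint of species B
print(f"abundant species: {abundant:.2f}, rare species: {rare:.2f}")
```

Under these assumed numbers, about 10% of encounters are heterospecific for the abundant species and about 90% for the rare one, which is why the cost of interference is expected to fall mainly on the rarer species.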

4.5 Acoustic Communication of Crickets in the Anthropocene

Disturbance of acoustic communication by anthropogenic noise can be considered a form of acoustic masking interference in a broad sense. Indeed, acoustic communication of bats, birds, frogs, fish, and insects is often hampered by anthropogenic noise (reviewed by Swaddle et al. 2015; Candolin 2019; Heinen-Kay et al. 2021; Cronin et al. 2022). Within this topic, the most commonly studied group is birds, followed by frogs (Heinen-Kay et al. 2021). This is because much anthropogenic noise (such as traffic noise) consists of low-frequency sound that is closer in frequency to bird and frog acoustic signals than to those of invertebrates. However, in some field cricket species, such as G. bimaculatus, anthropogenic noise prevents female phonotaxis toward male calling songs, even when the frequency band of the calling song is above that of traffic noise (e.g., Schmidt et al. 2014; Bent et al. 2018). Exposure to traffic noise before maturation in T. oceanicus hindered the development of female mate location behavior, regardless of whether traffic sounds were present at testing (Gurule-Small and Tinghitella 2018). Interestingly, although females are less likely to respond to male songs under noisy conditions, noise decreases the rejection responses to certain unattractive songs in the grasshopper Chorthippus biguttulus (Reichert and Ronacher 2015). This implies that anthropogenic noise may weaken the intensity of sexual selection in invertebrates.

Differences between the songs of orthopteran insects reared in noisy and quiet conditions have also been reported (e.g., in the grasshopper C. biguttulus (Lampe et al. 2012, 2014), the tree cricket Oecanthus (Costello and Symes 2014; Orci et al. 2016), and the field cricket G. bimaculatus (Gallego-Abenza et al. 2020)). In a study on six species of Oecanthus tree crickets, the dominant frequency of calling songs did not change depending on the noise level, but males were less likely to call (Costello and Symes 2014). Similarly, male O. pellucens used shortened echemes (sets of syllables) and paused singing with increasing noise levels. However, males did not modify the fundamental frequency of their song (Orci et al. 2016). In G. bimaculatus, the inter-chirp duration increased with exposure to noise, but the dominant frequency did not change (Gallego-Abenza et al. 2020). Furthermore, males from areas close to roads exhibited lower chirp rates than those collected further from the road. This implies that crickets can adapt their behavior when exposed to constant noise pollution in order to maintain effective signaling (Gallego-Abenza et al. 2020). In contrast to these results, male grasshoppers C. biguttulus from roadside habitats call at a higher dominant frequency than males from quieter habitats (Lampe et al. 2012). The potential mechanisms of these changes in song structure and/or calling effort in response to noise are developmental plasticity (Lampe et al. 2014; Orci et al. 2016; Gallego-Abenza et al. 2020) and genetic changes (Lampe et al. 2014). However, few studies have examined whether the changes in frequency and temporal structure of songs in noisy environments are adaptive. Namely, there is no evidence that the changed song is more likely to be identified by females under noisy conditions. 
Kuriwada (2023) showed that there are differences in the temporal structure of calling songs in the ground cricket Dianemobius nigrofasciatus between rural and urban habitats. However, the songs of urban males were not easier for females to localize under traffic noise than the songs of rural males. It will be a future challenge to determine whether the changes in calling songs of urban crickets are adaptive under noisy conditions. Noise pollution is not the only anthropogenic disturbance to crickets; other factors caused by human activity, such as artificial light at night (ALAN), chemical emissions, and habitat fragmentation, often co-occur in urban areas. When these disturbances co-occur, they can modify each other's effects in various ways. The combined effects can be classified as antagonistic (less than the sum of the individual impacts) or synergistic (more than the sum of the individual impacts) (Hale et al. 2017). Thus, to examine the effects of anthropogenic noise on cricket communication, it is necessary to consider the combined effects of other anthropogenic disturbance types (Halfwerk and Slabbekoorn 2015). Anthropogenic noise and ALAN are two major urban-specific factors that strongly impact wildlife through sensory pollution (Swaddle et al. 2015). As described above, there are several studies on the effects of noise on the mating of crickets. Although there are relatively few studies on the impact of ALAN on acoustic communication, Levy et al. (2021) showed that ALAN alters the timing of calls in G. bimaculatus. Females chronically exposed to high ALAN mounted males significantly less often, whereas males reared under high levels of ALAN (i.e.,

100 lx; the equivalent of a brightly lit urban area) were mounted more often by females (Botha et al. 2017). A study by Rebar et al. (2022) is one of the few to examine the combined effects of noise and ALAN. Rebar et al. (2022) reared G. veletis crickets from emergence to adulthood in one of four treatments (control, traffic noise, ALAN, or traffic noise and ALAN) and examined the single and combined effects of the treatments on life history and reproductive traits. In particular, traits related to mating were investigated. The search time in response to calling song playback was shorter in females reared under noisy conditions than in control females. However, ALAN treatment did not significantly affect the search time. The time to mount a male was shortened by the combined ALAN and noise treatments but was more influenced by noise treatment alone. In general, the combined effects of noise and ALAN are neither additive nor synergistic. Under similar experimental conditions, Ichikawa and Kuriwada (2023) showed a weak synergistic effect of noise and ALAN on the survival rate of the ground cricket D. nigrofasciatus. However, no significant combined effects on developmental time and diapause rate were observed (Ichikawa and Kuriwada 2023), and the combined effect on acoustic communication was not examined. As mentioned, there are only a few studies on the combined effects of anthropogenic disturbances. Thus, the extent of their impact remains unclear but could potentially be substantial. The study of adaptive evolution to urban-specific environmental factors, such as noise and ALAN, is a research area that has attracted considerable attention (Verrelli et al. 2022). Crickets are excellent research targets because they use easily quantifiable acoustic signals for communication, are abundant in both urban and non-urban environments, have an appropriate body size, and are easy to breed over successive generations.
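The classification of combined effects used in this section (antagonistic: less than the sum of the individual impacts; synergistic: more than the sum) amounts to a comparison with the additive expectation. The sketch below is illustrative only. The function name `classify_combined_effect`, the `tolerance` parameter, and the example numbers are assumptions, and it presumes that both single effects are expressed as magnitudes in the same direction and in the same units. A real analysis would test the interaction statistically rather than compare point estimates.

```python
def classify_combined_effect(effect_a: float, effect_b: float,
                             effect_combined: float,
                             tolerance: float = 0.0) -> str:
    """Compare a combined effect with the sum of the two single effects.

    All effects are magnitudes relative to a control, in the same units
    (an assumption made for this illustration).
    """
    expected_additive = effect_a + effect_b
    if abs(effect_combined - expected_additive) <= tolerance:
        return "additive"
    if effect_combined > expected_additive:
        return "synergistic"
    return "antagonistic"


# Hypothetical reductions in a trait, in percentage points:
# stressor A alone = 10, stressor B alone = 15.
print(classify_combined_effect(10, 15, 40))  # larger than 10 + 15
print(classify_combined_effect(10, 15, 18))  # smaller than 10 + 15
print(classify_combined_effect(10, 15, 25))  # equal to 10 + 15
```

With these made-up values, the three calls return "synergistic", "antagonistic", and "additive", respectively. The `tolerance` argument is a crude stand-in for measurement uncertainty.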

4.6 Conclusion

Acoustic communication in crickets has long been studied as a model system in neuroscience, genetics, physiology, ecology, and evolutionary biology. Recently, the songs of crickets have been studied not only as signals between conspecifics but also as cues for conspecifics and heterospecifics. Furthermore, in the Anthropocene, the impact of anthropogenic noise on acoustic communication and the adaptation of crickets to urban environments are interesting subjects. There are many unexplored areas not covered in this chapter, such as speciation and the origin of acoustic communication.

Acknowledgments The author gratefully acknowledges support from the Special Budget of MEXT: Establishment of Research and Education Network on Biodiversity and Its Conservation in the Satsunan Islands, and JSPS KAKENHI (Grant Numbers 16K21244 and 19K06842).

References Alexander RD (1961) Aggressiveness, territoriality, and sexual behavior in field crickets (Orthoptera: Gryllidae). Behaviour 17:130–223 Bailey NW, Zuk M (2008) Acoustic experience shapes female mate choice in field crickets. Proc R Soc B 275(1651):2645–2650 Bell PD (1979) Acoustic attraction of herons by crickets. J.N.Y. Entomol Soc 87:126–127 Bent AM, Ings TC, Mowles SL (2018) Anthropogenic noise disrupts mate searching in Gryllus bimaculatus. Behav Ecol 29(6):1271–1277 Botha LM, Jones TM, Hopkins GR (2017) Effects of lifetime exposure to artificial light at night on cricket (Teleogryllus commodus) courtship and mating behaviour. Anim Behav 129:181–188 Cade W (1975) Acoustically orienting parasitoids: fly phonotaxis to cricket song. Science 190(4221):1312–1313 Candolin U (2019) Mate choice in a changing world. Biol Rev 94:1246–1260 Conroy LP, Roff DA (2018) Adult social environment alters female reproductive investment in the cricket Gryllus firmus. Behav Ecol 29(2):440–447 Costello RA, Symes LB (2014) Effects of anthropogenic noise on male signalling behaviour and female phonotaxis in Oecanthus tree crickets. Anim Behav 95:15–22 Cronin AD, Smit JA, Muñoz MI, Poirier A, Moran PA, Jerem P, Halfwerk W (2022) A comprehensive overview of the effects of urbanisation on sexual selection and sexual traits. Biol Rev 97:1325–1345 Dobbs OL, Talavera JB, Rossi SM, Menjivar S, Gray DA (2020) Signaler–receiver–eavesdropper: risks and rewards of variation in the dominant frequency of male cricket calls. Ecol Evol 10(21): 12364–12371 Fedorka KM, Mousseau TA (2004) Female mating bias results in conflicting sex-specific offspring fitness. Nature 429(6987):65–67 Gage AR, Barnard CJ (1996) Male crickets increase sperm number in relation to competition and female size. Behav Ecol Sociobiol 38(5):349–353 Gallego-Abenza M, Mathevon N, Wheatcroft D (2020) Experience modulates an insect’s response to anthropogenic noise. 
Behav Ecol 31:90–96 Gerhardt HC, Huber F (2002) Acoustic communication in insects and anurans: common problems and diverse solutions. Physiol Entomol 28(1):62–63 Gray DA, Cade WH (1999) Quantitative genetics of sexual selection in the field cricket, Gryllus integer. Evolution 53(3):848–854 Gray DA, Eckhardt G (2001) Is cricket courtship song condition dependent? Anim Behav 62(5): 871–877 Gröning J, Hochkirch A (2008) Reproductive interference between animal species. Q Rev Biol 83(3):257–282 Gurule-Small GA, Tinghitella RM (2018) Developmental experience with anthropogenic noise hinders adult mate location in an acoustically signalling invertebrate. Biol Lett 14(2):20170714 Hale R, Piggott JJ, Swearer SE (2017) Describing and understanding behavioral responses to multiple stressors and multiple stimuli. Ecol Evol 7:38–47 Halfwerk W, Slabbekoorn H (2015) Pollution going multimodal: the complex impact of the humanaltered sensory environment on animal perception and performance. Biol Lett 11:20141051 Harrison SJ, Thomson IR, Grant CM, Bertram SM (2013) Calling, courtship, and condition in the fall field cricket, Gryllus pennsylvanicus. PloS One 8(3):e60356 Head M, Hunt J, Jennions M, Brooks R (2005) The indirect benefits of mating with attractive males outweigh the direct costs. PLoS Biol 3:e33 Hedrick AV (1988) Female choice and the heritability of attractive male traits: an empirical study. Am Nat 132(2):267–276 Heinen-Kay JL, Kay AD, Zuk M (2021) How urbanization affects sexual communication. Ecol Evol 11:17625–17650

Hoback WW, Wagner WE Jr (1997) The energetic cost of calling in the variable field cricket, Gryllus lineaticeps. Physiol Entomol 22(3):286–290 Honda-Sumi E (2005) Difference in calling song of three field crickets of the genus Teleogryllus: the role in premating isolation. Anim Behav 69(4):881–889 Hunt J, Brooks R, Jennions MD, Smith MJ, Bentsen CL, Bussiere LF (2004) High-quality male field crickets invest heavily in sexual display but die young. Nature 432(7020):1024–1027 Hunt J, Blows MW, Zajitschek F, Jennions MD, Brooks R (2007) Reconciling strong stabilizing selection with the maintenance of genetic variation in a natural population of black field crickets (Teleogryllus commodus). Genetics 177(2):875–880 Ichikawa I, Kuriwada T (2023) The combined effects of artificial light at night and anthropogenic noise on life history traits in ground crickets. Ecol Res Jacot A, Scheuber H, Brinkhof MW (2004) Costs of an induced immune response on sexual display and longevity in field crickets. Evolution 58(10):2280–2286 Jang Y (2011) Male responses to conspecific advertisement signals in the field cricket Gryllus rubens (Orthoptera: Gryllidae). PloS One 6:e16063 Judge KA, Bonanno VL (2008) Male weaponry in a fighting cricket. PloS One 3:e3980 Judge KA, Ting JJ, Gwynne DT (2008) Condition dependence of male life span and calling effort in a field cricket. Evolution 62(4):868–878 Kasumovic MM, Hall MD, Try H, Brooks RC (2011) The importance of listening: juvenile allocation shifts in response to acoustic cues of the social environment. J Evol Biol 24(6): 1325–1334 Kasumovic MM, Hall MD, Brooks RC (2012) The juvenile social environment introduces variation in the choice and expression of sexually selected traits. Ecol Evol 2(5):1036–1047 Kato T, Yamasako J (2021) First field record of an introduced pet-feeder cockroach, Blaptica dubia (Serville, 1838) (Blattidae, Blaberinae), in a temperate zone of Japan. 
Entomol Sci 24(1):76–78 Kelly CD, Adam-Granger É (2020) Mating with sexually attractive males provides female Gryllus firmus field crickets with direct but not indirect fitness benefits. Behav Ecol Sociobiol 74(7): 1–12 Kiflawi M, Gray DA (2000) Size–dependent response to conspecific mating calls by male crickets. Proc R Soc B 267(1458):2157–2161 Kim H, Jang Y, Choe JC (2011) Sexually dimorphic male horns and their use in agonistic behaviors in the horn-headed cricket Loxoblemmus doenitzi (Orthoptera: Gryllidae). J Ethol 29:435–441 Kuriwada T (2016a) Horn length is not correlated with calling efforts in the horn-headed cricket Loxoblemmus doenitzi (Orthoptera: Gryllidae). Entomol Sci 19:228–232 Kuriwada T (2016b) Social isolation increases male aggression toward females in the field cricket Gryllus bimaculatus. Pop Ecol 58:147–153 Kuriwada T (2017) Male–male courtship behaviour, not relatedness, affects the intensity of contest competition in the field cricket. Anim Behav 126:217–220 Kuriwada T (2020) Male responses to conspecific and heterospecific songs in two field cricket species. J Ethol 38(1):99–105 Kuriwada T (2022) Encounter with heavier females changes courtship and fighting efforts of male field crickets Gryllus bimaculatus (Orthoptera: Gryllidae). J Ethol 40(2):145–151 Kuriwada T, Kasuya E (2011) Age-dependent changes in calling effort in the bell cricket Meloimorpha japonica. J Ethol 29(1):99–105 Kuriwada T, Kawasaki R, Kuwano A, Reddy GV (2020) Mate choice behavior of female field crickets is not affected by exposure to heterospecific calling songs. Environ Entomol 49(3): 561–565 Kyogoku D, Wheatcroft D (2020) Heterospecific mating interactions as an interface between ecology and evolution. J Evol Biol 33(10):1330–1344 Lampe U, Schmoll T, Franzke A, Reinhold K (2012) Staying tuned: grasshoppers from noisy roadside habitats produce courtship signals with elevated frequency components. Funct Ecol 26(6):1348–1354

Lampe U, Reinhold K, Schmoll T (2014) How grasshoppers respond to road noise: developmental plasticity and population differentiation in acoustic signalling. Funct Ecol 28(3):660–668
Leonard AS, Hedrick AV (2009) Male and female crickets use different decision rules in response to mating signals. Behav Ecol 20:1175–1184
Leonard AS, Hedrick AV (2010) Long-distance signals influence assessment of close range mating displays in the field cricket, Gryllus integer. Biol J Linn Soc 100(4):856–865
Levy K, Wegrzyn Y, Efronny R, Barnea A, Ayali A (2021) Lifelong exposure to artificial light at night impacts stridulation and locomotion activity patterns in the cricket Gryllus bimaculatus. Proc R Soc B 288(1959):20211626
Maynard-Smith J, Harper D (2003) Animal signals. Oxford University Press, New York, NY
McCarthy TM, Keyes J, Cade WH (2013) Phonotactic behavior of male field crickets (Gryllus texensis) in response to acoustic calls from conspecific males. J Insect Behav 26:634–648
Medeiros CI, Both C, Grant T, Hartz SM (2017) Invasion of the acoustic niche: variable responses by native species to invasive American bullfrog calls. Biol Invasions 19(2):675–690
Miyashita A, Kizaki H, Sekimizu K, Kaito C (2016) No effect of body size on the frequency of calling and courtship song in the two-spotted cricket, Gryllus bimaculatus. PLoS One 11:e0146999
Nagamoto J, Aonuma H, Hisada M (2005) Discrimination of conspecific individuals via cuticular pheromones by males of the cricket Gryllus bimaculatus. Zoolog Sci 22:1079–1088
Orci KM, Petróczki K, Barta Z (2016) Instantaneous song modification in response to fluctuating traffic noise in the tree cricket Oecanthus pellucens. Anim Behav 112:187–194
Pascoal S, Cezard T, Eik-Nes A et al (2014) Rapid convergent evolution in wild crickets. Curr Biol 24(12):1369–1374
Popov AV, Markovich AM, Andjan AS (1978) Auditory interneurons in the prothoracic ganglion of the cricket, Gryllus bimaculatus DeGeer. J Comp Physiol 126(2):183–192
Rantala MJ, Kortet R (2003) Courtship song and immune function in the field cricket Gryllus bimaculatus. Biol J Linn Soc 79(3):503–510
Rebar D, Bishop C, Hallett AC (2022) Anthropogenic light and noise affect the life histories of female Gryllus veletis field crickets. Behav Ecol 33:731–739
Reichert MS, Ronacher B (2015) Noise affects the shape of female preference functions for acoustic signals. Evolution 69:381–394
Ryder JJ, Siva-Jothy MT (2000) Male calling song provides a reliable signal of immune function in a cricket. Proc Biol Sci 267(1449):1171–1175
Sakaluk SK, Belwood JJ (1984) Gecko phonotaxis to cricket calling song: a case of satellite predation. Anim Behav 32(3):659–662
Santana EM, Machado G, Kasumovic MM (2020) Pre-maturation social experience affects female reproductive strategies and offspring quality in a highly polyandrous insect. Behav Ecol Sociobiol 74(11):1–11
Scheuber H, Jacot A, Brinkhof MW (2003) Condition dependence of a multicomponent sexual signal in the field cricket Gryllus campestris. Anim Behav 65(4):721–727
Schmidt AK, Balakrishnan R (2015) Ecology of acoustic signalling and the problem of masking interference in insects. J Comp Physiol A 201(1):133–142
Schmidt R, Morrison A, Kunc HP (2014) Sexy voices–no choices: male song in noise fails to attract females. Anim Behav 94:55–59
Shackleton MA, Jennions MD, Hunt J (2005) Fighting success and attractiveness as predictors of male mating success in the black field cricket, Teleogryllus commodus: the effectiveness of no-choice tests. Behav Ecol Sociobiol 58:1–8
Shuker DM, Burdfield-Steel ER (2017) Reproductive interference in insects. Ecol Entomol 42(S1):65–75
Simmons LW (1987) Female choice contributes to offspring fitness in the field cricket, Gryllus bimaculatus (De Geer). Behav Ecol Sociobiol 21(5):313–321
Simmons LW (1988) Male size, mating potential and lifetime reproductive success in the field cricket, Gryllus bimaculatus (De Geer). Anim Behav 36:372–379


Simmons LW (2004) Genotypic variation in calling song and female preferences of the field cricket Teleogryllus oceanicus. Anim Behav 68(2):313–322
Simmons LW, Zuk M, Rotenberry JT (2005) Immune function reflected in calling song characteristics in a natural population of the cricket Teleogryllus commodus. Anim Behav 69(6):1235–1241
Simmons LW, Tinghitella RM, Zuk M (2010) Quantitative genetic variation in courtship song and its covariation with immune function and sperm quality in the field cricket Teleogryllus oceanicus. Behav Ecol 21(6):1330–1336
Stevenson PA, Rillich J (2013) Isolation associated aggression: a consequence of recovery from defeat in a territorial animal. PLoS One 8:e74965
Swaddle JP, Francis CD, Barber JR, Cooper CB, Kyba CCM, Dominoni DM, Shannon G, Aschehoug E, Goodwin SE, Kawahara AY, Luther D, Spoelstra K, Voss M, Longcore T (2015) A framework to assess evolutionary responses to anthropogenic light and sound. Trends Ecol Evol 30:550–560
Tajima S, Yamamoto K, Kuriwada T (2019) Interspecific interference competition between two field cricket species. Entomol Sci 22(3):311–316
Kuriwada T (2023) Differences in calling song and female mate location behaviour between urban and rural crickets. Biol J Linn Soc
Thomas ML, Simmons LW (2009) Sexual selection on cuticular hydrocarbons in the Australian field cricket, Teleogryllus oceanicus. BMC Evol Biol 9(1):1–12
Tinghitella RM, Wang JM, Zuk M (2009) Preexisting behavior renders a mutation adaptive: flexibility in male phonotaxis behavior and the loss of singing ability in the field cricket Teleogryllus oceanicus. Behav Ecol 20(4):722–728
Tinghitella RM, Broder ED, Gurule-Small GA, Hallagan CJ, Wilson JD (2018) Purring crickets: the evolution of a novel sexual signal. Am Nat 192(6):773–782
Tregenza T, Simmons LW, Wedell N, Zuk M (2006) Female preference for male courtship song and its role as a signal of immune function and condition. Anim Behav 72(4):809–818
Verrelli BC, Alberti M, Des Roches S et al (2022) A global horizon scan for urban evolutionary ecology. Trends Ecol Evol S0169-5347(22):00190–00192
Wagner WE Jr (1996) Convergent song preferences between female field crickets and acoustically orienting parasitoid flies. Behav Ecol 7(3):279–285
Wagner WE Jr, Harper CJ (2003) Female life span and fertility are increased by the ejaculates of preferred males. Evolution 57(9):2054–2066
Wagner WE Jr, Reiser MG (2000) The importance of calling song and courtship song in female mate choice in the variable field cricket. Anim Behav 59(6):1219–1226
Webb KL, Roff DA (1992) The quantitative genetics of sound production in Gryllus firmus. Anim Behav 44(5):823–832
Wedell N, Tregenza T (1999) Successful fathers sire successful sons. Evolution 53(2):620–625
Zuk M, Kolluru GR (1998) Exploitation of sexual signals by predators and parasitoids. Q Rev Biol 73:415–438
Zuk M, Simmons LW (1997) Reproductive strategies of the crickets (Orthoptera: Gryllidae). In: Choe JC, Crespi BJ (eds) The evolution of mating systems in insects and arachnids. Cambridge University Press, Cambridge, pp 89–109
Zuk M, Rotenberry JT, Tinghitella RM (2006) Silent night: adaptive disappearance of a sexual signal in a parasitized population of field crickets. Biol Lett 2(4):521–524
Zuk M, Tanner JC, Schmidtman E, Bee MA, Balenger S (2017) Calls of recently introduced coquí frogs do not interfere with cricket phonotaxis in Hawaii. J Insect Behav 30(1):60–69

Chapter 5

Vocal Imitation, A Specialized Brain Function That Facilitates Cultural Transmission in Songbirds

Masashi Tanaka

Abstract We humans are one of the rare animals that imitate others. Imitation enables us to transmit language, music, and other skills and knowledge across generations, contributing to the growth of complex cultures and civilizations. Among mammals, only a handful of species have been reported to imitate others, whereas vocal imitation is surprisingly common in some avian groups, including songbirds. Similar to human speech, birdsong is a sequence of complex vocalizations transmitted across generations through imitative learning. Recent advances in neuroscience have begun to elucidate the specialized neural circuits underlying vocal imitation in songbirds. This chapter focuses on songbirds and some other social animals capable of vocal imitation and discusses how studies on these animals can help us understand not only our ability to imitate but also our sociality and complex cultures.

Keywords Songbird · Birdsong · Vocal learning · Imitation · Cultural transmission

5.1 Imitation, Social Learning, and Cultural Transmission

When we learn a new skill or acquire new knowledge, learning from others is generally much more efficient than learning on our own. For this reason, we often seek an appropriate tutor when starting to learn a new sport, musical instrument, or other cultural activity. Speech acquisition is no exception. When we learn our first language during childhood, caregivers often serve as models of appropriate verbal communication. Although humans readily learn vocalizations and other motor actions from others, such imitative learning is an exceptional feat. Indeed, prior studies have shown that only a limited number of animal species can learn skills and knowledge from other individuals.

M. Tanaka (✉)
Faculty of Letters, Arts, and Sciences, Waseda University, Tokyo, Japan
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
Y. Seki (ed.), Acoustic Communication in Animals, https://doi.org/10.1007/978-981-99-0831-8_5


Fig. 5.1 Imitation process. After deciding which models to imitate, a pupil must memorize the model behavior as a template with which the pupil evaluates its own behavior while practicing until the pupil completes the imitation

Learning from others, or social learning, can include any type of learning that is influenced by other individuals in society (Heyes 1994). Social animals aggregate in groups to forage, mate, and survive predation efficiently. These animals can obtain information about the environment from any behavior of others, even from a change in gait or a piercing scream. While such social learning in a broad sense might be evident in many social animals, some of these animals further possess a special type of social learning, imitative learning, which allows them to accurately reproduce the behaviors of others. Imitative learning is one of the rarest abilities in the animal kingdom.

Imitation is the ability to precisely copy others’ behavior. It is probably a complex form of learning consisting of multiple, simpler forms of learning (Fig. 5.1). Before imitating others, a pupil must first decide which model to imitate. The decision might precede observation of the actual behavior if the pupil believes that the model’s behaviors are generally useful, as is often observed in over-imitation by human children. More commonly, however, observation of a useful behavior would trigger the decision to imitate. If a pupil has no idea about the value of the model behavior, the decision can depend on other criteria such as the feasibility of the imitation or a social bond with the model.

Once a pupil decides which model to imitate, the pupil must memorize the model’s behavior. Understanding the purpose of the behavior is not always necessary, but the pupil must have a grasp of how to reproduce the model behavior in order to use it effectively as a reference template. If the model behavior is hard to reproduce, the pupil may have to practice for a long period of time, comparing its own behavior with the model template. The pupil can dynamically update the template if the model is available for reference. Otherwise, the pupil must keep the template in memory until the imitation is completed. The pupil also has to maintain the motivation to learn, as imitation is often a costly process that requires a long period of practice.

Since imitation is such a complex form of learning, requiring various cognitive and motor functions, it is not surprising that only a limited number of animal species have the ability. We humans readily learn new skills from others, but it has been a matter of debate whether non-human primates can imitate others (Tomasello 1999; Whiten and van de Waal 2018). It appears that even our closest relatives struggle to copy others’ behaviors: chimpanzees can only learn basic gestures and number recognition after intensive training (Terrace et al. 1979; Matsuzawa 1985).

Theories have suggested that imitation has contributed to the development of human culture and sociality by boosting cognitive abilities such as understanding of others, theory of mind, and empathy (Tomasello 1999; Meltzoff and Decety 2003; Ramachandran and Oberman 2006; Ramachandran 2011). Imitation is especially important when learning complex cultural behaviors (Tanaka 2022). Culture can include a wide variety of human activities, but one common feature is that they are transmitted from person to person. Since cultural behaviors are often highly unique, they are most efficiently transmitted through imitation rather than individual invention. It would be hard to imagine how we could develop our complex cultures and civilizations without the ability to imitate (Tomasello 1999).
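The imitation process summarized in Fig. 5.1 (memorize a template, then practice while comparing one's own behavior with it) can be illustrated with a toy simulation. The Python sketch below is only a minimal illustration under strong assumptions that are not part of this chapter: a behavior is reduced to a short list of numbers, and the function names `imitation_error` and `practice`, the feature values, the step size, and the number of trials are all hypothetical. The trial-and-error rule used here (keep a variation only if it lies closer to the template) is one simple stand-in for practice, not a model of any real learner.

```python
import random


def imitation_error(template, attempt):
    """Mean absolute mismatch between the memorized template and one attempt."""
    return sum(abs(t - a) for t, a in zip(template, attempt)) / len(template)


def practice(template, n_trials=2000, step=0.1, seed=0):
    """Trial-and-error practice against a fixed template (hypothetical rule)."""
    rng = random.Random(seed)
    # The pupil starts from an arbitrary behavior unrelated to the model.
    current = [rng.uniform(0.0, 1.0) for _ in template]
    best = imitation_error(template, current)
    errors = [best]
    for _ in range(n_trials):
        # Try a small random variation of the current behavior.
        trial = [x + rng.gauss(0.0, step) for x in current]
        trial_error = imitation_error(template, trial)
        # Keep the variation only if it is closer to the template.
        if trial_error < best:
            current, best = trial, trial_error
        errors.append(best)
    return current, errors


# Hypothetical features of the model behavior (e.g., four normalized parameters).
model_behavior = [0.2, 0.8, 0.5, 0.9]
# Step 1: memorize the model behavior as a template.
template = list(model_behavior)
# Step 2: practice while evaluating each attempt against the template.
final, errors = practice(template)
print(f"mismatch before practice: {errors[0]:.3f}, after practice: {errors[-1]:.3f}")
```

In this sketch the mismatch with the template never increases from one trial to the next, which mirrors the idea that the stored template is the reference against which each attempt is evaluated.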

5.2 Vocal Imitation in Social Animals

Much like imitation of body movements, vocal imitation is a rare ability found in only a limited number of animal species. Humans with normal hearing and motor function readily learn the speech of their first language during childhood. Speech acquisition occurs at least partly through imitative learning, in which infants try to reproduce the vocalizations of surrounding adults (Kuhl and Meltzoff 1996). In contrast, non-human primates generally do not learn vocalizations from others, except for some species such as chimpanzees, gibbons, and marmosets, which can modulate innate vocalizations to some extent depending on social experience (Egnor and Hauser 2004). The limited vocal learning in non-human primates is probably due not to motor constraints but to neural limitations (Fitch et al. 2016). It would be plausible to assume that the brain function for imitating vocalizations evolved in the human lineage.

Intriguingly, however, vocal imitation has been reported in some animals other than primates. There are reports of vocal imitation in gray seals and elephants (Poole et al. 2005; Stansbury and Janik 2019). Cetaceans are among the most studied animals with vocal imitation (Ridgway et al. 2012). For example, bottlenose dolphins imitate vocalizations in a social and interactive environment (Reiss and McCowan 1993). Killer whales imitate unfamiliar sounds if the imitation is rewarded with social interaction (Abramson et al. 2018). Sperm whales imitate clicking sounds from parents and can transmit the sounds across generations for decades (Rendell and Whitehead 2003). Beluga whales, which have been reported to imitate body movements (Abramson et al. 2017), can also imitate human vocalizations (Murayama et al. 2014). Furthermore, humpback whales produce a complex sequence of vocalizations called “song” (Payne and McVay 1971), which is thought to be learned through imitation. Humpback whales in distant areas transmit different types of songs (Noad et al. 2000). The whale song propagates rapidly through distant populations (Garland et al. 2011). Such rapid propagation of a song is also reported in bowhead whales (Stafford et al. 2018).


These reports of vocal imitation in species distantly related to humans suggest that this ability has evolved independently multiple times in the history of animal evolution, serving specific functions in those animals. Notably, most animals that imitate vocalizations are highly social, suggesting a social function of vocal imitation. Humans, whales, and other animals that imitate vocalizations generally have a large brain (Smaers et al. 2021), and encephalization, a large brain relative to body size, is thought to be associated with sociality (Shultz and Dunbar 2010). Nevertheless, it appears that small animals such as bats also have the ability to imitate vocalizations (Knörnschild et al. 2010).

Moreover, vocal imitation can be found even in small birds. It is widely known that parrots are excellent at imitating human vocalizations. Recent studies have begun to explore the imitation ability of African gray parrots (Giret et al. 2009), budgerigars (Farabaugh et al. 1994), and cockatiels (Seki 2021). Notably, high encephalization is reported in some bird species (Tsuboi et al. 2018). Owing to the high neuron density in the bird brain, the total number of neurons in these relatively large birds can be comparable to that of primates (Olkowicz et al. 2016). However, vocal imitation is evident even in the smallest birds such as songbirds and hummingbirds (Marler and Slabbekoorn 2004; Johnson and Clark 2020). It is therefore more plausible that specialized neural circuits, rather than the overall size of the brain, are key to the ability to imitate the vocalizations of others.

5.3 Vocal Imitation in Songbirds

Songbirds belong to the suborder Passeri and are best known for their beautiful songs. Songbirds include a wide variety of species: finches, ravens, starlings, canaries, sparrows, birds of paradise, and others. It is widely known that some songbirds, such as mynahs, can imitate human speech (Thorpe 1959). Wild birds such as European starlings and mockingbirds are also famous for imitating a variety of vocalizations of other birds (West et al. 1983; Gammon and Altizer 2011). In fact, many songbird species studied so far appear to be great imitators that learn songs from other individuals.

Birdsong is a sequence of complex vocalizations transmitted from bird to bird (Marler and Slabbekoorn 2004). Similar to dialects of human speech, geographically separated populations of the same songbird species transmit different types of songs (Marler and Tamura 1964; Baker et al. 2001). Experiments have shown that juvenile songbirds can learn songs from an adult tutor even if the tutor is from another area or even another species (Clayton 1987). If a juvenile bird is socially isolated with no exposure to an appropriate tutor, it cannot acquire a normal song (Thorpe 1958). Similarly, deafening a juvenile bird disrupts its ability to imitate a song, resulting in abnormal songs after development (Konishi and Nottebohm 1969). These findings show that songbirds have to imitate others to acquire normal, species-typical songs.


Fig. 5.2 Rare abilities shared by humans and songbirds. Humans and songbirds share rare abilities such as cultural transmission of skillful behaviors, complex vocal communication, and high sociality. Imitation ability could have contributed to the evolution of these rare abilities shared by these distant species

Songbirds have been a useful model in which to study how animals imitate vocalizations. Studies have found behavioral, neural, and genetic parallels between song learning and human speech acquisition (Zeigler and Marler 2008). Although evolutionarily distant from humans, songbirds share several unusual abilities with humans (Fig. 5.2). First, birdsong is a cultural trait that is transmitted through imitation, just as humans transmit speech and other cultural traits in society (Bruno et al. 2021; Tanaka 2022). Studies suggest that birdsong may be pleasing to a bird’s ear (Riters 2011), similar to our positive experience during cultural activities. Songbirds also exhibit skillful behaviors such as dance, nest building, and tool use, which could be analogous to human arts. Second, songbirds and humans rely heavily on vocal communication. Notably, songbirds may also use syntactic rules (Okanoya 2004; Abe and Watanabe 2011; Suzuki et al. 2020), analogous to human language. Third, songbirds are highly social animals. Much like many mammals, songbirds are mostly altricial, so they need caregivers to feed them after hatching. Moreover, songbirds can maintain strong social bonds with specific individuals for a long time.

It is curious how species as distant as songbirds and humans have come to share such unique abilities. One possibility is that convergent evolution conferred one of these abilities on both lineages, which then guided the evolution of the others. For example, the evolution of vocal imitation might have facilitated social abilities in these species. Indeed, theories suggest that imitation is the source from which humans acquired unique abilities such as culture, language, and other skills (Tomasello 1999). Elucidating the brain function of vocal imitation in songbirds might therefore help us understand how complex cultures, communication, and sociality have evolved in these animals.

5.4 Neural Mechanism of Vocal Imitation in Songbirds

Vocal imitation in songbirds generally proceeds in two stages (Fig. 5.1) (Marler and Slabbekoorn 2004; Zeigler and Marler 2008). The first is the sensory learning stage, in which a juvenile bird listens to adult songs and memorizes them for future imitation. The second is the sensorimotor learning stage, in which the juvenile practices the song again and again until it masters it. Although the detailed neural mechanism of vocal imitation remains to be clarified, recent advances in neuroscience have identified the key neural circuits for song imitation in some songbird species: zebra finches, canaries, and Bengalese finches.

Obviously, the auditory circuits are crucial to vocal imitation. Impeding the neural function of the caudomedial nidopallium (NCM), one of the higher auditory areas that can encode tutor song memory (Phan et al. 2006; Yanagihara and Yazaki-Sugiyama 2016), blocks song imitation (London and Clayton 2008; Katic et al. 2022). Deafening also blocks song imitation by disrupting not only the formation of auditory memory but also auditory feedback, which pupils need to evaluate their own vocalizations accurately (Konishi and Nottebohm 1969).

Vocal imitation requires precise coordination of motor commands based on a comparison between auditory memory and the bird’s own vocalizations. Such coordination should be best controlled by an interface between auditory and motor circuits in the brain. Indeed, it appears that vocal imitation is regulated by such an interface in the songbird brain, a motor cortical area called “HVC.” HVC was formerly an abbreviation of “hyperstriatum ventrale pars caudalis,” but it is now used as a proper name because the region turned out to be not a striatal but a cortical area in the pallium (Reiner et al. 2004). HVC is a sensorimotor area that receives auditory signals from auditory areas and a thalamic nucleus. HVC then sends motor-related signals to downstream cortical areas and the basal ganglia.
HVC is a critical hub of a specialized neural circuit, the “song system,” which is involved in singing and song learning (Fig. 5.3). Studies have found that different pathways originating from HVC play specialized roles in song imitation (Fee and Scharff 2010; Ikeda et al. 2020; Mooney 2020). A subset of HVC neurons sends signals to a motor cortical area, the robust nucleus of the arcopallium (RA). This signaling pathway, originating from HVC neurons that project to RA (HVC-RA neurons), is called the “song motor pathway.” Similar to neurons in the mammalian primary motor cortex, RA neurons that receive inputs from HVC-RA neurons project to the brainstem, innervating motor neurons that control the vocal organ (the “syrinx”) and respiratory muscles. Impeding the song motor pathway by lesioning HVC or sectioning HVC-RA axons disrupts singing of the learned song while sparing innate vocalizations and the motivation to sing (Aronov et al. 2008).

Fig. 5.3 Song system in the songbird brain. A sensorimotor cortical area HVC is a critical hub of the “song system,” a specialized neural circuit for singing behavior. HVC integrates inputs from auditory areas, a thalamic area (omitted in this figure), and modulatory dopaminergic signals from the midbrain including PAG. HVC then provides motor signals to multiple downstream pathways to control song learning and singing

Another type of HVC neuron, the HVC-X neuron, projects to a basal ganglia region, “Area X.” Similar to corticostriatal neurons in mammals, HVC-X neurons mostly synapse onto striatal medium spiny neurons in Area X. Medium spiny neurons project to pallidal neurons in Area X, which in turn provide inputs to a motor thalamic nucleus (the medial portion of the dorsolateral nucleus of the thalamus: DLM). DLM then provides inputs to a premotor cortical area (the lateral magnocellular nucleus of the anterior nidopallium: LMAN), which projects to RA as well as Area X, forming a loop homologous to the mammalian cortico-basal ganglia loop. The circuits originating from HVC-X neurons are called the “anterior forebrain pathway.” Studies have shown that this pathway is essential to song learning through its modulation of vocal variability in pitch, amplitude, and sequence (Kao et al. 2005; Tanaka et al. 2016; Kojima et al. 2018; Sánchez-Valpuesta et al. 2019; Singh Alvarado et al. 2021). Area X also receives dense innervation from dopamine neurons in the midbrain ventral tegmental area (VTA) and substantia nigra pars compacta (SNc). Recent studies suggest that these dopamine neurons send vocal error signals to Area X, driving reinforcement learning that improves the song (Gadagkar et al. 2016).

The third type is a small number of HVC neurons that project back to an auditory area, avalanche (Av), which is also necessary for song learning (Roberts et al. 2017). It appears that Av projects to a cortical area (the ventral portion of the intermediate arcopallium: AIV) that can modulate the vocal error computation in VTA (Mandelblat-Cerf et al. 2014; Kearney et al. 2019). Av may compute vocal error signals by comparing the motor command from HVC with the resultant auditory inputs from auditory areas.

These three pathways are known to be important for vocal imitation. The precise mechanism that initiates song imitation remained unknown until recently, but a study showed that dopamine signaling in HVC can induce plastic changes in HVC to drive vocal imitation (Tanaka et al. 2018). The dopamine inputs to HVC originate mainly from the midbrain periaqueductal gray (PAG) (Appeltants et al. 2000).
PAG neurons encode an appropriate tutor song and thereby induce the rapid formation of auditory responses to the tutor song in HVC (Tanaka et al. 2018). It has been suggested that the stabilization of dendritic spines in HVC neurons is a correlate of the formation of tutor song memory (Roberts et al. 2010). Dopamine from PAG probably potentiates synaptic inputs from auditory areas such as the nucleus interfacialis of the nidopallium (NIf) (Zhao et al. 2019) and stabilizes the auditory memory that guides subsequent song imitation. Notably, HVC interneurons are known to encode the improvement of the song during practice (Vallentin et al. 2016). Plastic changes in HVC would therefore also be important for the sensorimotor phase of song imitation. The downstream pathways of HVC are known to provide feedback to HVC, which can influence HVC signaling (Hamaguchi et al. 2016; Ikeda et al. 2020). It is plausible that plasticity in multiple neural circuits of the song system contributes cooperatively to the complex learning involved in song imitation.

Recent advances in neuroscience have begun to allow exploration of the detailed function of microcircuits in songbirds. It should not be long before studies elucidate the specialized neural circuits underlying song imitation, which would help us understand how such complex learning was conferred on a limited number of animal species, including humans.
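The pathways described in this section can be written down as a small directed graph, which makes it easy to trace how a signal leaving HVC could reach a given target. The Python sketch below is a deliberately simplified reading of the text rather than an anatomical model. Node names follow the abbreviations used above; "auditory areas" and "brainstem" are lumped nodes; the AIV-to-VTA link stands for the reported modulation of error computation in VTA; the thalamic input and the feedback projections to HVC are omitted; and the names `SONG_SYSTEM` and `shortest_route` are hypothetical.

```python
from collections import deque

# Directed projections named in this section (source -> list of targets).
# This is a simplified, hypothetical encoding for illustration only.
SONG_SYSTEM = {
    "auditory areas": ["HVC"],
    "PAG": ["HVC"],                 # dopaminergic input to HVC
    "HVC": ["RA", "Area X", "Av"],  # three classes of HVC projection neurons
    "RA": ["brainstem"],            # song motor pathway
    "Area X": ["DLM"],              # anterior forebrain pathway
    "DLM": ["LMAN"],
    "LMAN": ["RA", "Area X"],
    "Av": ["AIV"],
    "AIV": ["VTA"],                 # assumed link: AIV modulates error computation in VTA
    "VTA": ["Area X"],              # dopaminergic input to Area X
}


def shortest_route(graph, start, goal):
    """Breadth-first search for the route with the fewest projections."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for target in graph.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append(path + [target])
    return None  # no route in this simplified graph


motor_route = shortest_route(SONG_SYSTEM, "HVC", "brainstem")
loop_route = shortest_route(SONG_SYSTEM, "Area X", "RA")
error_route = shortest_route(SONG_SYSTEM, "Av", "Area X")
print(" -> ".join(motor_route))
print(" -> ".join(loop_route))
print(" -> ".join(error_route))
```

Under these assumptions, the first route corresponds to the song motor pathway, the second to the output of the anterior forebrain pathway onto RA, and the third to the route by which Av could influence dopaminergic signals arriving in Area X.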

5.5 Comparative View to Dissect the Mechanism of Imitation

Findings on song imitation in songbirds can potentially be translated into an understanding of imitation in other animals. For example, it appears that song imitation is driven by the dense projection from midbrain dopaminergic neurons to motor cortical areas in songbirds (Tanaka et al. 2018). Whereas dopaminergic innervation of motor cortical areas is sparse in rodents, the motor cortices of primates receive dense dopaminergic inputs (Berger et al. 1991; Williams and Goldman-Rakic 1998). Therefore, dense dopamine projections to motor cortical areas may serve unique functions shared by songbirds and primates.

Although non-human primates are generally not good at vocal imitation, they can sometimes reproduce the motor actions of other individuals. Such imitative learning in primates might be better explained as a result of emulation of others’ goals or local enhancement of attention (Tomasello 1999). Nevertheless, there are reports that some primates can culturally transmit complex behaviors across generations (Whiten et al. 2009; Falótico et al. 2019). Moreover, theories have suggested that mirror neurons found in motor cortical areas of monkeys are involved in higher cognitive functions such as imitation by encoding the same action whether it is performed by self or by others (Rizzolatti and Craighero 2004; Iacoboni 2009). Mirror neurons are also reported in HVC of swamp sparrows (Prather et al. 2008). Although there is probably no counterpart of HVC in the mammalian brain (Pfenning et al. 2014), studies suggest that HVC is analogous to Broca’s area, a cortical area involved in human speech, based on their regulation of vocal sequence and timing (Long et al. 2016). Similar to other motor cortices, Broca’s area seems to receive dopaminergic inputs (Aalto et al. 2005). Although the dopaminergic innervation of motor-related areas in primates and songbirds is probably not homologous, these pathways may have evolved independently in primates and birds and enhanced their ability to imitate.

Imitation in the vocal and motor domains shares several characteristics, including the existence of a sensitive period, an innate motivation to learn from others, and the importance of social interaction. Some birds and cetaceans are reported to imitate both vocalizations and body movements (Zentall 2004; Abramson et al. 2017), suggesting that vocal and motor imitation may not be completely independent brain functions. Vocal imitation facilitates complex communication useful for strengthening social bonds. Similarly, motor imitation can be related to various social functions: heightening social functions (Pope et al. 2015), stimulating familiarity (Paukner et al. 2009), and facilitating sympathy (Iacoboni 2009). The association between imitation, sociality, and complex cultures might be key to understanding why imitation was acquired by only a limited number of animal species.

5.6 Future Directions

Dissecting the close links between imitation, sociality, and culture might not be straightforward because their associations may vary depending on the behavioral contexts and animal species involved. For example, social interaction with a live tutor is known to be the most effective way to drive song imitation in many songbird species (Kroodsma and Pickert 1984), vocal imitation in bottlenose dolphins (Reiss and McCowan 1993), and human speech learning (Kuhl et al. 2003). However, some songbird species such as song sparrows and canaries can learn songs played from a speaker without social interaction (Marler and Peters 1977, 1988). White-crowned sparrows also learn songs from a speaker, especially when the songs include species-specific introductory whistles (Soha and Marler 2000). There are therefore clear species differences in how imitation and social interaction are linked.

Some songbirds have a strong innate bias or preference to selectively imitate certain song elements. Swamp sparrows are inclined to learn species-specific song syllables (Podos 1996; Podos et al. 1999). There is large variability in zebra finch songs, which could be explained by individual preference (Podos 1997; Mets and Brainard 2018) or by partners’ preference, which can drive sexual selection of songs (Nowicki et al. 2001). Indeed, when an abnormal song is transmitted from bird to bird through social interaction, the transmitted song gradually approaches the species-typical song (Fehér et al. 2009). Some of the innate predispositions could be explained by genetic or motor constraints (Podos 1997; Mets and Brainard 2018), while other factors might also be involved (Gardner et al. 2005). Indeed, studies on hybrid songbirds have begun to elucidate genetic predispositions in song imitation (Wang et al. 2019). However, cultural transmission of songs would be driven not solely by personal bias but also by the influence of society and culture. Notably, enculturated animals such as chimpanzees reared by humans (Tomasello et al. 1993; Custance et al. 1995; Buttelmann et al. 2007) and domesticated dogs (Range et al. 2011) are reported to possess a better ability to imitate.

Experimental evidence is still lacking to explain why only a limited number of animal species acquired imitation. Although imitation is useful for quickly learning skills and knowledge in society, the exact benefit of spending a long period of time imitating complex songs remains to be clarified in songbirds. One prediction is that vocal imitation, like motor imitation, is involved in social functions and contributes to the development of complex cultures. Comparative perspectives will be necessary to understand the function of imitation under various experimental conditions in diverse animals. Songbirds, in particular, offer a unique opportunity to explore how the brain circuits for imitation are linked to sociality and culture, both of which are foundations of our civilizations.

References

Aalto S, Brück A, Laine M, Någren K, Rinne JO (2005) Frontal and temporal dopamine release during working memory and attention tasks in healthy humans: a positron emission tomography study using the high-affinity dopamine D2 receptor ligand [11C]FLB 457. J Neurosci 25:2471
Abe K, Watanabe D (2011) Songbirds possess the spontaneous ability to discriminate syntactic rules. Nat Neurosci 14:1067–1074
Abramson JZ, Hernández-Lloreda MV, Esteban J-A, Colmenares F, Aboitiz F, Call J (2017) Contextual imitation of intransitive body actions in a beluga whale (Delphinapterus leucas): a “do as other does” study. PLoS One 12:e0178906
Abramson JZ, Hernández-Lloreda MV, García L, Colmenares F, Aboitiz F, Call J (2018) Imitation of novel conspecific and human speech sounds in the killer whale (Orcinus orca). Proc R Soc B Biol Sci 285:20172171
Appeltants D, Absil P, Balthazart J, Ball GF (2000) Identification of the origin of catecholaminergic inputs to HVC in canaries by retrograde tract tracing combined with tyrosine hydroxylase immunocytochemistry. J Chem Neuroanat 18:117–133
Aronov D, Andalman AS, Fee MS (2008) A specialized forebrain circuit for vocal babbling in the juvenile songbird. Science 320:630–634
Baker MC, Baker EM, Baker MSA (2001) Island and island-like effects on vocal repertoire of singing honeyeaters. Anim Behav 62:767–774
Berger B, Gaspar P, Verney C (1991) Dopaminergic innervation of the cerebral cortex: unexpected differences between rodents and primates. Trends Neurosci 14:21–27
Bruno JH, Jarvis ED, Liberman M, Tchernichovski O (2021) Birdsong learning and culture: analogies with human spoken language. Annu Rev Linguist 7:449–472
Buttelmann D, Carpenter M, Call J, Tomasello M (2007) Enculturated chimpanzees imitate rationally. Dev Sci 10:F31–F38
Clayton NS (1987) Song learning in cross-fostered zebra finches: a re-examination of the sensitive phase. Behaviour 102:67–81
Custance DM, Bard KA, Whiten A (1995) Can young chimpanzees (Pan troglodytes) imitate arbitrary actions? Hayes & Hayes (1952) revisited. Behaviour 132:837–859
Egnor SER, Hauser MD (2004) A paradox in the evolution of primate vocal learning. Trends Neurosci 27:649–654
Falótico T, Proffitt T, Ottoni EB, Staff RA, Haslam M (2019) Three thousand years of wild capuchin stone tool use. Nat Ecol Evol 3:1034–1038
Farabaugh SM, Linzenbold A, Dooling RJ (1994) Vocal plasticity in budgerigars (Melopsittacus undulatus): evidence for social factors in the learning of contact calls. J Comp Psychol 108:81–92
Fee MS, Scharff C (2010) The songbird as a model for the generation and learning of complex sequential behaviors. ILAR J 51:362–377
Fehér O, Wang H, Saar S, Mitra PP, Tchernichovski O (2009) De novo establishment of wild-type song culture in the zebra finch. Nature 459:564–568
Fitch WT, de Boer B, Mathur N, Ghazanfar AA (2016) Monkey vocal tracts are speech-ready. Sci Adv 2:e1600723
Gadagkar V, Puzerey PA, Chen R, Baird-Daniel E, Farhang AR, Goldberg JH (2016) Dopamine neurons encode performance error in singing birds. Science 354:1278–1282
Gammon DE, Altizer CE (2011) Northern mockingbirds produce syntactical patterns of vocal mimicry that reflect taxonomy of imitated species. J Field Ornithol 82:158–164
Gardner TJ, Naef F, Nottebohm F (2005) Freedom and rules: the acquisition and reprogramming of a bird’s learned song. Science 308:1046–1049
Garland EC, Goldizen AW, Rekdahl ML, Constantine R, Garrigue C et al (2011) Dynamic horizontal cultural transmission of humpback whale song at the ocean basin scale. Curr Biol 21:687–691
Giret N, Péron F, Nagle L, Kreutzer M, Bovet D (2009) Spontaneous categorization of vocal imitations in African grey parrots (Psittacus erithacus). Behav Process 82:244–248
Hamaguchi K, Tanaka M, Mooney R (2016) A distributed recurrent network contributes to temporally precise vocalizations. Neuron 91:680–693
Heyes CM (1994) Social learning in animals: categories and mechanisms. Biol Rev 69:207–231
Iacoboni M (2009) Imitation, empathy, and mirror neurons. Annu Rev Psychol 60:653–670
Ikeda MZ, Trusel M, Roberts TF (2020) Memory circuits for vocal imitation. Curr Opin Neurobiol 60:37–46
Johnson KE, Clark CJ (2020) Ontogeny of vocal learning in a hummingbird. Anim Behav 167:139–150
Kao MH, Doupe AJ, Brainard MS (2005) Contributions of an avian basal ganglia–forebrain circuit to real-time modulation of song. Nature 433:638–643
Katic J, Morohashi Y, Yazaki-Sugiyama Y (2022) Neural circuit for social authentication in song learning. Nat Commun 13:4442
Kearney MG, Warren TL, Hisey E, Qi J, Mooney R (2019) Discrete evaluative and premotor circuits enable vocal learning in songbirds. Neuron 104:559–75.e6
Knörnschild M, Nagy M, Metz M, Mayer F, von Helversen O (2010) Complex vocal imitation during ontogeny in a bat. Biol Lett 6:156–159
Kojima S, Kao MH, Doupe AJ, Brainard MS (2018) The avian basal ganglia are a source of rapid behavioral variation that enables vocal motor exploration. J Neurosci 38:9635–9647
Konishi M, Nottebohm F (1969) Experimental studies on the ontogeny of avian vocalizations. In: Bird vocalizations. Cambridge University Press, London, pp 29–48
Kroodsma DE, Pickert R (1984) Sensitive phases for song learning: effects of social interaction and individual variation. Anim Behav 32:389–394
Kuhl PK, Meltzoff AN (1996) Infant vocalizations in response to speech: vocal imitation and developmental change. J Acoust Soc Am 100:2425–2438
Kuhl PK, Tsao F-M, Liu H-M (2003) Foreign-language experience in infancy: effects of short-term exposure and social interaction on phonetic learning. Proc Natl Acad Sci 100:9096–9101
London SE, Clayton DF (2008) Functional identification of sensory mechanisms required for developmental song learning. Nat Neurosci 11:579–586
Long MA, Katlowitz KA, Svirsky MA, Clary RC, Byun TM et al (2016) Functional segregation of cortical regions underlying speech timing and articulation. Neuron 89:1187–1193
Mandelblat-Cerf Y, Las L, Denisenko N, Fee MS (2014) A role for descending auditory cortical projections in songbird vocal learning. Elife 3:e02152
Marler P, Peters S (1977) Selective vocal learning in a sparrow. Science 198:519–521
Marler P, Peters S (1988) Sensitive periods for song acquisition from tape recordings and live tutors in the swamp sparrow, Melospiza georgiana. Ethology 77:76–84
Marler PR, Slabbekoorn H (2004) Nature’s music: the science of birdsong. Jordan Hill, Elsevier Science & Technology, Amsterdam
Marler P, Tamura M (1964) Culturally transmitted patterns of vocal behavior in sparrows. Science 146:1483–1486
Matsuzawa T (1985) Use of numbers by a chimpanzee. Nature 315:57–59
Meltzoff AN, Decety J (2003) What imitation tells us about social cognition: a rapprochement between developmental psychology and cognitive neuroscience. Philos Trans R Soc Lond B Biol Sci 358:491–500
Mets DG, Brainard MS (2018) Genetic variation interacts with experience to determine interindividual differences in learned song. Proc Natl Acad Sci 115:421–426
Mooney R (2020) The neurobiology of innate and learned vocalizations in rodents and songbirds. Curr Opin Neurobiol 64:24–31
Murayama T, Iijima S, Katsumata H, Arai K (2014) Vocal imitation of human speech, synthetic sounds and beluga sounds, by a beluga (Delphinapterus leucas). Int J Comp Psychol 27:369–384
Noad MJ, Cato DH, Bryden MM, Jenner MN, Jenner KC (2000) Cultural revolution in whale songs. Nature 408:537
Nowicki S, Searcy WA, Hughes M, Podos J (2001) The evolution of bird song: male and female response to song innovation in swamp sparrows. Anim Behav 62:1189–1195
Okanoya K (2004) Song syntax in Bengalese finches: proximate and ultimate analyses. In: Slater PJB, Rosenblatt JS, Snowdon CT (eds) Advances in the study of behavior, vol 34. Elsevier Academic Press, San Diego, CA, pp 297–346
Olkowicz S, Kocourek M, Lučan RK, Porteš M, Fitch WT et al (2016) Birds have primate-like numbers of neurons in the forebrain. Proc Natl Acad Sci 113:7255–7260
Paukner A, Suomi SJ, Visalberghi E, Ferrari PF (2009) Capuchin monkeys display affiliation toward humans who imitate them. Science 325:880–883
Payne RS, McVay S (1971) Songs of humpback whales. Science 173:585–597
Pfenning AR, Hara E, Whitney O, Rivas MV, Wang R et al (2014) Convergent transcriptional specializations in the brains of humans and song-learning birds. Science 346:1256846
Phan ML, Pytte CL, Vicario DS (2006) Early auditory experience generates long-lasting memories that may subserve vocal learning in songbirds. Proc Natl Acad Sci 103:1088–1093
Podos J (1996) Motor constraints on vocal development in a songbird. Anim Behav 51:1061–1070
Podos J (1997) A performance constraint on the evolution of trilled vocalizations in a songbird family (Passeriformes: Emberizidae). Evolution 51:537–551
Podos J, Nowicki S, Peters S (1999) Permissiveness in the learning and development of song syntax in swamp sparrows. Anim Behav 58:93–103
Poole JH, Tyack PL, Stoeger-Horwath AS, Watwood S (2005) Elephants are capable of vocal learning. Nature 434:455–456
Pope SM, Russell JL, Hopkins WD (2015) The association between imitation recognition and socio-communicative competencies in chimpanzees (Pan troglodytes). Front Psychol 6:188
Prather JF, Peters S, Nowicki S, Mooney R (2008) Precise auditory–vocal mirroring in neurons for learned vocal communication. Nature 451:305–310
Ramachandran VS (2011) The tell-tale brain: a neuroscientist’s quest for what makes us human. W W Norton, New York, NY, p 357
Ramachandran VS, Oberman LM (2006) Broken mirrors: a theory of autism. Sci Am 295:62–69
Range F, Huber L, Heyes C (2011) Automatic imitation in dogs. Proc R Soc B Biol Sci 278:211–217
Reiner A, Perkel DJ, Bruce LL, Butler AB, Csillag A et al (2004) Revised nomenclature for avian telencephalon and some related brainstem nuclei. J Comp Neurol 473:377–414
Reiss D, McCowan B (1993) Spontaneous vocal mimicry and production by bottlenose dolphins (Tursiops truncatus): evidence for vocal learning. J Comp Psychol 107:301–312
Rendell LE, Whitehead H (2003) Vocal clans in sperm whales (Physeter macrocephalus). Proc Biol Sci 270:225–231
Ridgway S, Carder D, Jeffries M, Todd M (2012) Spontaneous human speech mimicry by a cetacean. Curr Biol 22:R860–R861
Riters LV (2011) Pleasure seeking and birdsong. Neurosci Biobehav Rev 35:1837–1845
Rizzolatti G, Craighero L (2004) The mirror-neuron system. Annu Rev Neurosci 27:169–192
Roberts TF, Tschida KA, Klein ME, Mooney R (2010) Rapid spine stabilization and synaptic enhancement at the onset of behavioural learning. Nature 463:948–952
Roberts TF, Hisey E, Tanaka M, Kearney MG, Chattree G et al (2017) Identification of a motor-to-auditory pathway important for vocal learning. Nat Neurosci 20:978–986
Sánchez-Valpuesta M, Suzuki Y, Shibata Y, Toji N, Ji Y et al (2019) Corticobasal ganglia projecting neurons are required for juvenile vocal learning but not for adult vocal plasticity in songbirds. Proc Natl Acad Sci 116:22833–22843
Seki Y (2021) Cockatiels sing human music in synchrony with a playback of the melody. PLoS One 16:e0256613
Shultz S, Dunbar R (2010) Encephalization is not a universal macroevolutionary phenomenon in mammals but is associated with sociality. Proc Natl Acad Sci 107:21582–21586
Singh Alvarado J, Goffinet J, Michael V, Liberti W, Hatfield J et al (2021) Neural dynamics underlying birdsong practice and performance. Nature 599:635–639
Smaers JB, Rothman RS, Hudson DR, Balanoff AM, Beatty B et al (2021) The evolution of mammalian brain size. Sci Adv 7:eabe2101
Soha JA, Marler P (2000) A species-specific acoustic cue for selective song learning in the white-crowned sparrow. Anim Behav 60:297–306
Stafford KM, Lydersen C, Wiig Ø, Kovacs KM (2018) Extreme diversity in the songs of Spitsbergen’s bowhead whales. Biol Lett 14:20180056
Stansbury AL, Janik VM (2019) Formant modification through vocal production learning in gray seals. Curr Biol 29:2244–49.e4
Suzuki TN, Wheatcroft D, Griesser M (2020) The syntax-semantics interface in animal vocal communication. Philos Trans R Soc Lond B Biol Sci 375:20180405
Tanaka M (2022) A comparative perspective on animal cultures. WASEDA RILAS J 10:61–71
Tanaka M, Singh Alvarado J, Murugan M, Mooney R (2016) Focal expression of mutant huntingtin in the songbird basal ganglia disrupts cortico-basal ganglia networks and vocal sequences. Proc Natl Acad Sci 113:E1720–E1727
Tanaka M, Sun F, Li Y, Mooney R (2018) A mesocortical dopamine circuit enables the cultural transmission of vocal behaviour. Nature 563:117–120
Terrace HS, Petitto LA, Sanders RJ, Bever TG (1979) Can an ape create a sentence? Science 206:891–902
Thorpe WH (1958) The learning of song patterns by birds, with especial reference to the song of the chaffinch Fringilla coelebs. Ibis 100:535–570
Thorpe WH (1959) Talking birds and the mode of action of the vocal apparatus of birds. Proc Zool Soc London 132:441–455
Tomasello M (1999) The cultural origins of human cognition. Harvard University Press, Cambridge, MA
Tomasello M, Savage-Rumbaugh S, Kruger AC (1993) Imitative learning of actions on objects by children, chimpanzees, and enculturated chimpanzees. Child Dev 64:1688–1705
Tsuboi M, van der Bijl W, Kopperud BT, Erritzøe J, Voje KL et al (2018) Breakdown of brain–body allometry and the encephalization of birds and mammals. Nat Ecol Evol 2:1492–1500
Vallentin D, Kosche G, Lipkind D, Long MA (2016) Inhibition protects acquired song segments during vocal learning in zebra finches. Science 351:267–271
Wang H, Sawai A, Toji N, Sugioka R, Shibata Y et al (2019) Transcriptional regulatory divergence underpinning species-specific learned vocalization in songbirds. PLoS Biol 17:e3000476
West MJ, Stroud AN, King AP (1983) Mimicry of the human voice by European starlings: the role of social interaction. Wilson Bull 95:635–640
Whiten A, van de Waal E (2018) The pervasive role of social learning in primate lifetime development. Behav Ecol Sociobiol 72:80
Whiten A, McGuigan N, Marshall-Pescini S, Hopper LM (2009) Emulation, imitation, overimitation and the scope of culture for child and chimpanzee. Philos Trans R Soc Lond B Biol Sci 364:2417–2428
Williams SM, Goldman-Rakic PS (1998) Widespread origin of the primate mesofrontal dopamine system. Cereb Cortex 8:321–345
Yanagihara S, Yazaki-Sugiyama Y (2016) Auditory experience-dependent cortical circuit shaping for memory formation in bird song learning. Nat Commun 7:11946
Zeigler HP, Marler P (2008) Neuroscience of birdsong. Cambridge University Press, Cambridge
Zentall TR (2004) Action imitation in birds. Anim Learn Behav 32:15–23
Zhao W, Garcia-Oscos F, Dinh D, Roberts TF (2019) Inception of memories that guide vocal learning in the songbird. Science 366:83–89

Chapter 6

Dancing in Singing Songbirds: Choreography in Java Sparrows

Masayo Soma and Mari Shibata

Abstract Courtship singing in songbirds is often accompanied by gestural displays, similar to human vocalization or music that solicits body movements. This suggests that sound communication can potentially function in multimodal contexts; however, prior songbird research has focused primarily on the acoustic domain. In an effort to understand the multimodal signaling associated with singing, we analyzed the simultaneous singing and dancing courtship displays of Java sparrows. Specifically, we investigated the degree of singing-dancing temporal coordination in males as well as individual variability in dance sequences in males and females, as only males sing, but both sexes engage in courtship duet dancing. The results revealed a strong temporal relationship between the commencement of hopping and the production of song notes in males, which was affected not only by song learning but also by the identity of the female that received the courtship display. In addition, the dancing sequence was more complex in males than in females. Although it remains unexplained how such among- and within-individual variations contribute to the message content of courtship in the Java sparrow, multimodal courtship is not merely a byproduct of singing and warrants further scrutiny in future investigations.

Keywords Audio-visual communication · Estrildid finch · Dance · Mutual courtship · Sexual signal

M. Soma (✉)
Department of Biology, Faculty of Science, Hokkaido University, Sapporo, Japan

M. Shibata
Biosystems Science Course, The Graduate School of Life Science, Hokkaido University, Sapporo, Japan

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
Y. Seki (ed.), Acoustic Communication in Animals, https://doi.org/10.1007/978-981-99-0831-8_6

6.1 Introduction

Birdsong is one of the best-studied acoustic communication systems in animals, although its role in multimodal contexts has received limited attention up to this point. Some bird species are known to exhibit gestural displays (visual displays) concurrently with vocalizations (e.g., Dalziell et al. 2013; Ota et al. 2015, 2017; Ligon et al. 2018), just as some people tend to add gestures or body movements when speaking or singing, whether intentionally or unconsciously (Corballis 2009; Zentner and Eerola 2010; Fujii et al. 2014; Pouw et al. 2021). In some instances, such body postures may be inseparable from vocalizations owing to their functions in the respiratory and syringeal motor systems responsible for vocal production; for instance, King penguins Aptenodytes patagonicus adopt a particular posture to produce characteristic display calls (Kriesell et al. 2020), and male brown-headed cowbirds Molothrus ater move their wings in synchrony with songs (Cooper and Goller 2004). Similarly, humans cannot move their lips independently of the phonemes they plan to produce because lips play a crucial role in sound production (Fowler and Saltzman 1993). However, this does not imply that lip movements are incapable of encoding information beyond what sounds alone can convey. Specifically, it is known that human speech perception and language acquisition rely on visual cues to the extent that audio-visual stimuli consisting of mismatched speech sounds and mouth postures result in false perceptions of spoken syllables (McGurk effect: McGurk and Macdonald 1976; Rosenblum et al. 1997). It remains unclear as to whether the same can be said about the beak movement and song production of songbirds. In the zebra finch Taeniopygia guttata, the association between beak movements and song production is not very strong (Williams 2001, 2004), and the role of visual stimuli in song learning is still under investigation (Varkevisser et al. 2022a, b). 
Whether or not motion-vocal coordination is obligatory, multimodal signals can convey more information than unimodal signals (Partan and Marler 1999; Higham and Hebets 2013; Partan 2013; Miles and Fuxjager 2018; Mitoyen et al. 2019; Pouw et al. 2021). In general, signals transferred via multiple modalities are more likely to be detected by the receiver because one modality can serve as a backup for the others (Johnstone 1996; Bro-Jørgensen 2010). For example, the tap-dancing courtship display of cordon bleu finches Uraeginthus cyanocephalus can be delivered via multiple modalities: the particularly fast foot-tapping movements could serve as a visual signal under conditions with sufficient light, and they also produce conspicuous sounds and possibly substrate-borne vibrations that travel through the perch (Ota et al. 2015, 2017, 2018; Ota 2020; Ota and Soma 2022). Moreover, temporally coupled signals or signal components would convey more powerful messages than uncoordinated signals (Higham and Hebets 2013; Mitoyen et al. 2019). In the túngara frog Physalaemus pustulosus, females were more attracted to temporally synchronized multimodal stimuli (i.e., calls and inflation of the vocal air sac) of a robotic frog than to asynchronous stimuli (Taylor et al. 2011), and in magpie-larks Grallina cyanoleuca, in which mating pairs perform well-coordinated audio-visual duets consisting of antiphonal vocalizations and matching wing movements (Hall 2000; Hall and Magrath 2007; Ręk and Magrath 2016), precise coordination of the vocal and visual components was more effective as a cooperative display to defend territory (Ręk 2018).

6.1.1 Song-Dance Courtship in Estrildid Finches

Although estrildid finches (family: Estrildidae) are primarily studied and known for acoustic communication because of two outstanding model species of birdsong in this taxonomic group (i.e., zebra and Bengalese finches), their courtship rituals comprise multi-component and multimodal signals, frequently characterized by singing-dancing performances (Zanollo et al. 2013; Ota et al. 2015, 2018; Soma and Garamszegi 2015; Soma and Mori 2015; Ullrich et al. 2016; Gomes et al. 2017; Soma 2018; Soma et al. 2019; Ota 2020). What is most intriguing about estrildid finches is that they exhibit striking interspecific variation in sexual signals (i.e., courtship displays) and in the presence or absence of sexual dimorphism in them, despite the uniform breeding ecology shared among all estrildid species, which includes social monogamy, long-term pair bonding, and a gregarious nature (absence of a distinct territory) (Soma and Garamszegi 2015, 2018; Soma 2018). Despite previous attempts to unveil the general selective forces underlying such interspecific and between-sex diversity using comparative approaches (Soma and Garamszegi 2015; Gomes et al. 2017), it still remains unclear what promoted the evolution of courtship communication and plumage ornamentation in this taxonomic group. However, behavioral studies have indicated that courtship dance shared by both sexes contributes to pair formation or maintenance (Soma and Iwama 2017; Ota et al. 2018), similar to the function of female songs or duetting (Langmore 1998; Hall 2004, 2009). Typically in estrildid finches, a courting individual stands next to its prospective mate on a perch and sings while dancing; the dance is a repetitive sequence of species-stereotypical motions, such as hopping, pivoting, or bill-wiping (Goodwin 1982; Restall 1996; Baptista et al. 1999; Soma and Iwama 2017), and can be accompanied by holding straw material in the bill (nest material holding display, Soma 2018). 
In a considerable proportion of estrildid species, females share whole or part of courtship dance elements with males (Soma and Garamszegi 2015), with females occasionally displaying solo dance performances (Ota et al. 2015; Ota 2020) or initiating courtship interactions that lead to mutual duet-like dancing with males (Soma and Iwama 2017). While song and dance are frequently co-expressed in many estrildid species (Goodwin 1982; Ota et al. 2015), the prevalence of female dance is not evolutionarily correlated with the occurrence of female songs, suggesting that vocal and visual signals have independent functions or mechanisms (Soma and Garamszegi 2015). However, the role of visual signals in courtship remains unclear.

6.1.2 Song-Dance Courtship in Java Sparrows

We studied the Java sparrow Lonchura oryzivora, a member of the family Estrildidae, in order to understand the function of audio-visual courtship by shedding light on visual signals. As in closely related species such as the Bengalese finch Lonchura striata var. domestica, only males sing in this species (Seller 1979; Kagawa and Soma 2013; Lewis et al. 2021), whereas male and female Java sparrows share identical courtship dance elements and engage in mutual dancing (Soma and Iwama 2017) (Table 6.1, Fig. 6.1). Previous investigations on the courtship of Java sparrows in captivity revealed that the presence of duet dancing is more important for successful mating than song alone (Soma and Iwama 2017). This may come as a surprise, considering that individual variations in songs are typically a major factor in mate choice among songbirds (Searcy 1992; Riebel 2009; Byers and Kroodsma 2009; Soma and Garamszegi 2011; see also Snyder and Creanza 2019). It is likely that dance, as a visual signal, is of greater use than vocalization in close-distance communication. As shown in Fig. 6.1 (see also Soma and Iwama 2017), Java sparrow courtship typically starts with an interactive exchange of bill-wiping motions between the sexes, which is followed by a gradual increase in the number of hops incorporated between bill-wipes in both sexes, leading to singing in males and copulation solicitation display (CSD) in females when mating is successful. If the bill-wiping and hopping (BH sequence) is performed by one sex alone and unanswered by the other, copulation is less likely to occur (Soma and Iwama 2017). Java sparrow males frequently perform song and dance simultaneously, although the temporal coordination between them has not yet been examined. 
At least in the zebra finch, where only males sing and duet dancing is absent, males have been observed to synchronize their hopping with introductory notes, which are defined as repeating note types sung at the start of songs (Ullrich et al. 2016). In particular, their analysis revealed that a video frame of hopping (defined as the displacement of both feet from the perch) coincided with the duration of the introductory notes (Ullrich et al. 2016). In this study, we aimed to examine the temporal synchronization between hopping and introductory song notes in male Java sparrows, using precise timing measurements of their behaviors and accounting for individual variations in song phenotypes.

Table 6.1 Comparison of the characteristics of singing and dancing components used by Java sparrows in their courtship displays

| | Song | Dance |
| --- | --- | --- |
| Performer sex | Male | Male and female |
| Communication style | Unilateral (from male to female) | Interactive (duet between male and female) |
| Development | Practicing in juveniles | Practicing in juveniles |
| Social learning | Yes (vocal learning from tutor/father) | No |
| Neural mechanism | Song system | Unknown |
| Components | Song note types vary among individuals owing to vocal learning | Bill-wipe and hop are shared among all conspecifics |

Fig. 6.1 Schematic depiction of typical courtship interactions between the sexes of the Java sparrow. (a) Only males sing songs that start with a recurrence of introductory notes and are followed by the main song phrase. (b) Before a male starts to sing, both males and females simultaneously perform courtship dancing, which consists of repeated bill-wipes and hops. In this chapter, we examined the song-dance coordination by focusing on the timing of introductory notes and hopping (Sect. 6.2.1) and dance sequence complexity by analyzing bill-wipe and hopping (BH) sequences (Sect. 6.2.2)

In Java sparrows, each male has a single song type that is characterized by a note-type repertoire and note order (Hasegawa et al. 2011; Kagawa and Soma 2013; Lewis et al. 2021). Previous studies on the songs of Java sparrows showed that the phenotypic features of male songs are determined by social learning from models (i.e., the social or foster fathers), but are unaffected by genetic fathers when cross-fostering is applied in early life (Lewis et al. 2021) (Table 6.1). Therefore, social fathers and sons have similar songs not only in terms of song acoustic features (e.g., note types and sequences) (Lewis et al. 2021), but also temporal parameters, such as note-to-note intervals. These result in very similar songs among the members of the same song lineage and large distinctions between males from different lineages, at least in captive populations (Lewis et al. 2021). Consequently, the rhythmicity in singing is learned by every male. What about hopping rhythms? We discovered that Java sparrow juveniles started practicing dancing earlier than they started practicing songs (singing subsongs) and spent considerable time practicing until they could perform singing and dancing simultaneously during development (Soma et al. 2019). Considering that the dance elements (bill-wipe and hop) are stereotyped within species, it is highly improbable that they are socially acquired. Through dance practice, Java sparrow juveniles improve their motor performance, such as hopping rate, and become able to sing and dance simultaneously near the end of the song-learning period (Soma et al. 2019). Given these developmental trajectories of audio-visual courtship (Table 6.1), it is likely that song-dance coordination could be influenced by song phenotypes that differ across males and song lineages. Alternatively, if dancing rhythmicity is intrinsically programmed and universally shared among conspecifics, there may be little song-dance coordination. In addition, because of the dyadic interaction of their courting, it is possible that the dancing behaviors of males would be affected by those of the females with whom they interact (reviewed in de Reus et al. 2021). In other words, the BH sequence of an individual could be the outcome of mutual interactions between the two sexes, with sequence complexity varying depending on which sex leads and which follows.

6.2 Methods

We used a total of 34 captive adult Java sparrows (male, n = 20; female, n = 14; age: 12–58 months old) from lab-bred captive populations with known pedigrees. Each male subject was randomly paired with a female and placed in an observation cage (25 × 27 × 22 cm) in a soundproof box for 1–4 h in order to record mating interactions using a video camera (Q3HD ZOOM, 60 fps). We repeated the observation for particular pairs that demonstrated courtship in order to maximize the sample size of the dance data, resulting in uneven observation durations for each individual and pair. However, we ensured that the dance data of each individual consisted of pairings with multiple partners; otherwise, we could not distinguish individual variations from variations among pairs.

6.2.1 Examining Song-Dance Coordination in Males

We collected 88 dancing bouts from 12 males from a total of 335 h of video recordings, excluding individuals with insufficient data. Each dance bout had 4.5 introductory notes on average (total n = 396). As shown in Figs. 6.1 and 6.2, male Java sparrows tend to hop and produce introductory notes simultaneously at the start of singing. We assessed the interval from the commencement of the hop to the onset of the note using Adobe Premiere Pro and Audition (Adobe Systems Inc.). Positive values of the onset-to-onset interval indicate hop-vocal overlap (Fig. 6.2). We used a nested analysis of variance (ANOVA) to test the effects of male identity and song lineage on vocal-motor coordination, where male identity was nested within the song lineage, and female identity was entered as an explanatory factor.

Fig. 6.2 (a) Male Java sparrows were observed to vocalize introductory notes while hopping. (b) The histogram depicts the frequency distribution of the intervals between the commencement of hopping and the production of the note for the 12 males, with typical hop duration marked by a gray background
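As a rough illustration of the onset-to-onset interval described above, the following sketch computes note onset minus hop onset for paired events. It is not the authors' workflow, which relied on Adobe Premiere Pro and Audition; the function names, the use of frame indices for hop onsets, and the example values are all hypothetical, and only the 60 fps frame rate is taken from the Methods.

```python
from statistics import mean

FPS = 60  # video frame rate reported in the Methods


def frame_to_seconds(frame_index, fps=FPS):
    """Convert a video frame index to a time in seconds."""
    return frame_index / fps


def onset_intervals(hop_onset_frames, note_onsets_s, fps=FPS):
    """Return note onset minus hop onset (in seconds) for paired hops and notes.

    A positive value means the note began after the hop had started.
    """
    if len(hop_onset_frames) != len(note_onsets_s):
        raise ValueError("each note must be paired with exactly one hop")
    return [note - frame_to_seconds(hop, fps)
            for hop, note in zip(hop_onset_frames, note_onsets_s)]


# Hypothetical data: hop onsets as frame indices, note onsets in seconds.
hops = [60, 120, 180]
notes = [1.10, 2.15, 3.05]
intervals = onset_intervals(hops, notes)
mean_interval = mean(intervals)
```

With 60 fps video, hop onsets can only be resolved to about 17 ms (one frame), which limits the precision of any interval computed this way.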

6.2.2 Comparing Dance Sequence Complexity Between Males and Females

From the video recordings mentioned above (Sect. 6.2.1), we coded the dance BH sequences of both sexes (dance bouts: male, n = 90; female, n = 45). Specifically, we determined the BH sequence repeated in each dancing bout by focusing on the sequence shortly before the onset of songs and bill clicks (cf. Soma and Mori 2015). For example, BHBHBH and BBBBHHHH are relatively simple sequences in which the occurrence of each B or H is highly predictable from the preceding dance element, whereas BBHBHBBH is a more complex sequence in which the occurrence of B or H is far less predictable, even with knowledge of the preceding elements. This degree of sequential complexity can be quantified using the entropy of a Markov chain (e.g., Da Silva et al. 2000; Choi et al. 2022). Briefly, the more randomly and unpredictably the dance elements (B and H) occur in a sequence, the larger the entropy of the Markov model. Figure 6.3 depicts a schematic illustration of the entropy calculations based on a third-order Markov model, in which the preceding three elements are used to determine the occurrence of each focal element. In this study, we calculated first- through fifth-order entropy for each dance sequence.

Fig. 6.3 Schematic examples of high and low third-order entropy BH sequences. Each matrix displays transition probabilities derived from each BH sequence shown above. As Bird A has more unpredictable transition patterns than Bird B, its entropy is higher, indicating that its sequence is more complex

To test sex differences in dance sequence complexity measured as entropy, we employed linear mixed-effects (LME) models in which n-th order entropy was entered as a response variable, sex was entered as an explanatory variable, and individual identity was included as a random effect to account for the non-independence of the data from the same individual. Considering that the length of the dance sequence (total number of dance elements per bout) can influence entropy, dance length was also included as an explanatory variable in these models.
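The n-th order entropy described in this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `markov_entropy` is hypothetical, entropy is taken to be the conditional entropy (in bits) of each element given the preceding n elements, weighted by how often each context occurs, and transition probabilities are estimated from raw counts within a single sequence.

```python
from collections import Counter, defaultdict
from math import log2


def markov_entropy(sequence: str, order: int) -> float:
    """Conditional entropy (bits) of each element given the preceding
    `order` elements, estimated from transition counts in `sequence`."""
    if order < 1:
        raise ValueError("order must be at least 1")
    if len(sequence) <= order:
        raise ValueError("sequence must be longer than the model order")

    # Count how often each element follows each context of length `order`.
    transitions = defaultdict(Counter)
    for i in range(order, len(sequence)):
        context = sequence[i - order:i]
        transitions[context][sequence[i]] += 1

    total = sum(sum(counts.values()) for counts in transitions.values())

    entropy = 0.0
    for counts in transitions.values():
        n_context = sum(counts.values())
        weight = n_context / total          # empirical P(context)
        for n in counts.values():
            p = n / n_context               # empirical P(element | context)
            entropy -= weight * p * log2(p)
    return entropy


# Example BH sequences taken from the text (B = bill-wipe, H = hop).
for seq in ("BHBHBH", "BBBBHHHH", "BBHBHBBH"):
    print(seq, round(markov_entropy(seq, order=1), 3))
```

Under these assumptions, the strictly alternating sequence has a first-order entropy of zero and BBHBHBBH has the largest value of the three. With short bouts and high orders, each context is observed only a few times, so the estimates become unreliable, which is in line with the point that sequence length can influence entropy.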

6.3

Results

6.3.1

Song-Dance Coordination in Males

In accordance with previous findings in the zebra finch (Ullrich et al. 2016), we found that male Java sparrows coordinate vocalization with dancing. Specifically, 96.7% of the observed introductory notes (n = 383) were accompanied by hopping, whereas only 3.3% of introductory notes were produced without hopping (n = 13). As shown in Fig. 6.2, introductory notes were emitted during hopping with tight timing: the interval between the commencement of hopping and the production of the introductory note was only approximately 0.05–0.25 s. Moreover, the timing of vocalization and hopping varied among males, which was mostly explained by differences in song lineages (Table 6.2, Fig. 6.4, nested ANOVA: P < 0.001), suggesting that song learning influenced dance-song coordination. In addition, the identity of the female that received the courtship dance had a statistically significant effect on the timing (Table 6.2, ANOVA: P < 0.001), possibly because of the interactive nature of the courtship dance of the Java sparrow.
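The proportions of introductory notes with and without hopping follow directly from the reported counts. The short check below is only an illustrative calculation with hypothetical variable names, not part of the original analysis.

```python
# Counts of introductory notes reported in the text.
notes_with_hop = 383
notes_without_hop = 13
total_notes = notes_with_hop + notes_without_hop

# Percentages rounded to one decimal place.
pct_with_hop = round(100 * notes_with_hop / total_notes, 1)
pct_without_hop = round(100 * notes_without_hop / total_notes, 1)

print(total_notes, pct_with_hop, pct_without_hop)
```

The total matches the number of introductory notes given in the methods, and the two percentages sum to 100 because every note falls into exactly one category.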

6.3.2

Dance Sequences of Males and Females

We identified statistically significant sex differences in the first, second, and third orders of entropy (P < 0.012) but not in the fourth and fifth orders (Fig. 6.5). The effect of dance length was statistically significant for third- through fifth-order entropy (P < 0.01) but not for first- and second-order entropy. In these LME models, the random effect of bird identity was also statistically significant (P < 0.01), indicating that each bird tended to exhibit similar dance sequence complexity at various times or with different partners. As a supplemental analysis, we investigated the relationships between paired partners by examining the correlation of entropy within pairs; however, there were no statistically significant correlations between paired males and females.

Table 6.2 The effects of song lineage, male and female identities on vocal-motor coordination measured as the onset-to-onset interval of the hop and note

Factor                                 df    F        P
Song lineage                           7     9.208    < 0.001
Male identity (within song lineage)    3     0.538    0.657
Female identity                        10    3.106    < 0.001

Fig. 6.4 Frequency distribution of hop and note onset-to-onset intervals (s) for each male. As in Fig. 6.2, the gray background in each panel indicates the typical hop duration. The males in each square are descended from the same song lineage, indicating that they share a similar song type


Fig. 6.5 (a) Comparisons of first- to fifth-order BH sequence entropy between sexes (*: P < 0.05). (b) Individual variation in third-order entropy of BH sequences

6.4

Discussion

Multimodal aspects of mating interactions are important for understanding the content of animal communication and elucidating the underlying mechanisms (Spezie et al. 2022). In this study, we identified individual variation in the courtship dancing of the Java sparrow, a species in which males sing, whereas females do not. The observed strong coordination between hopping and introductory song notes in males (Fig. 6.4) corroborated previous findings in a related species, the zebra finch (Ullrich et al. 2016), and provided novel insights into this relationship, as the precise timing of hops and notes differed among males from different song lineages. Male and female Java sparrows interact by executing a BH sequence before males start the coordinated hop-note display (Fig. 6.1). Although the BH sequence was shared by both sexes, sequential complexity was higher in males and differed among individuals (Fig. 6.5). These findings imply that the visual components of multimodal courtship play a significant role in sexual communication.

The temporal coupling between dancing and singing in Java sparrows (in this study) and zebra finches (Ullrich et al. 2016) could be explained by ultimate and proximate causes, which are not necessarily mutually exclusive. From the former perspective, we would expect coordinated multimodal signals to provide a fitness advantage. The degree of temporal coordination of multiple (multimodal) signals would influence their efficacy, with synchronized signals attracting mates or repelling rivals more effectively than asynchronous signals. This idea has frequently been tested using robotic animals (Mitri et al. 2013) or virtual video stimuli (Varkevisser et al. 2022b). For example, female frogs did not favor mating calls presented with asynchronous movements of air sacs by a robotic frog (Taylor et al. 2011). Although this could reflect a preference against unnatural stimuli, as calling without synchronized air sac inflation is impossible in nature, avian research that focused on the coupling of two independent signals reported similar results (Ręk and Magrath 2016; Ręk 2018). In wolf spiders (Schizocosa ocreata) as well, females responded more strongly to synchronized playback stimuli composed of a video of the visual courtship display and vibratory signals of males (Kozak and Uetz 2016). However, synchronization between male calling and visual courtship display did not influence female preference in ring doves (Mitoyen et al. 2021).

From the latter (proximate) viewpoint, it remains unclear whether hop-vocal coordination in the Java sparrow stems from a mechanistic or physiological constraint.
Given the vocal-respiratory system in songbirds (Schmidt and Wild 2014) and the relationship between respiratory control and locomotive (wing or leg) movements in birds (Boggs 1997; Cooper and Goller 2004), it may not be surprising that vocalization and hopping cannot be combined with arbitrary timings. Acoustic features and temporal patterns of song notes are socially learned and stereotyped among male Java sparrows belonging to the same song lineage (Soma 2011; Lewis et al. 2021). However, the finding that vocal-motor coordination in male Java sparrows was affected by both song learning and the identity of the female receiver was unexpected. It suggests that vocal-motor coordination is robust enough to be repeatable among males with similar song types and flexible enough to be affected by partnered females. Interestingly, in another estrildid species, the blue-capped cordon-bleu Uraeginthus cyanocephalus, whether or not songs accompanied dancing affected note sequences in songs (Ota and Gahr 2022), indicating that the signals in the two modalities interact with each other not only temporally but also at higher levels of integration. These phenomena remind us of the impact of music on human nature, which is characterized by singing and dancing (Mehr et al. 2019).

An additional interesting aspect of the courtship of the Java sparrow is that males and females appear to interact mutually through dancing (Soma and Iwama 2017; Soma et al. 2019). They frequently mirrored the motions of their partners (bill-wipes and hops) in ways that appeared to be turn-taking or synchronization, although the “rules” underlying such interactions have not been quantitatively assessed in previous or current research. To understand the interactive nature of duet dancing in the Java sparrow, the present descriptions of variation in dance sequences among individuals and between sexes will be useful.

In contrast to birdsong, where sequential complexity is seen as an outcome of sexual selection and has proven to be associated with mating success in at least some species (see also Soma and Garamszegi 2011; Backhouse et al. 2022), the fitness benefits of performing complex sequences of dance displays remain unclear. In greater flamingos Phoenicopterus roseus, the group-based sexual display consists of a repertoire of stereotyped motions, and its sequential complexity depends on age and breeding status rather than sex, indicating that the dance sequence is a condition-dependent sexual signal in both males and females (Perrot et al. 2016). Similarly, in a wolf spider, males with a complex sequence of vibratory mating signals were more successful at mating (Choi et al. 2022). It is likely that in the Java sparrow, intra-individual repeatability and sex differences in dance sequence complexity are the outcomes of sexual selection, which is consistent with our observation that males performed more complex dance sequences than females.

Alternatively, it is also possible that the sequential complexity in each sex results from interactions within pairs. To address a similar question, researchers examined the joint entropy of duet dancing in the red-crowned crane Grus japonensis, which is renowned for the intricate duet-dancing sequences of its mating pairs. They discovered that more complex (less coordinated) duets were associated with better mating success (Takeda et al. 2018). However, it should be noted that unpredictable (complex) signals are not always favored. The opposite may be true, depending on the species.
Indeed, in a lekking species known for dual-male displays (lance-tailed manakin, Chiroxiphia lanceolata), courtship displays in the presence of females were more predictable and coordinated between alpha and beta males than those in the absence of females (Vanderbilt et al. 2015).

In conclusion, using detailed behavioral analyses, we demonstrated among-individual variation in the courtship dance of Java sparrows. Although it remains unexplained how such phenotypic variation contributes to the message content of courtship, the courtship dance is evidently not merely a byproduct of singing and warrants further scrutiny, considering that only songs have been extensively investigated and are well understood in songbirds.

Acknowledgments This work was supported by Japan Society for the Promotion of Science Grants-in-Aid for Young Scientists (Nos. 23680027, 16H06177) received by MS.

References

Backhouse F, Dalziell AH, Magrath RD, Welbergen JA (2022) Higher-order sequences of vocal mimicry performed by male Albert’s lyrebirds are socially transmitted and enhance acoustic contrast. Proc R Soc B Biol Sci 289:20212498. https://doi.org/10.1098/rspb.2021.2498
Baptista LF, Lawson R, Visser E, Bell DA (1999) Relationships of some mannikins and waxbills in the estrildidae. J Ornithol 140:179–192


Boggs DF (1997) Coordinated control of respiratory pattern during locomotion in birds. Am Zool 37:41–53. https://doi.org/10.1093/icb/37.1.41
Bro-Jørgensen J (2010) Dynamics of multiple signalling systems: animal communication in a world in flux. Trends Ecol Evol 25:292–300. https://doi.org/10.1016/j.tree.2009.11.003
Byers BE, Kroodsma DE (2009) Female mate choice and songbird song repertoires. Anim Behav 77:13–22. https://doi.org/10.1016/j.anbehav.2008.10.003
Choi N, Adams M, Fowler-Finn K, Knowlton E, Rosenthal M, Rundus A, Santer RD, Wilgers D, Hebets EA (2022) Increased signal complexity is associated with increased mating success. Biol Lett 18:292–300. https://doi.org/10.1098/rsbl.2022.0052
Cooper BG, Goller F (2004) Multimodal signals: enhancement and constraint of song motor patterns by visual display. Science 303:544–546. https://doi.org/10.1126/science.1091099
Corballis MC (2009) The evolution and genetics of cerebral asymmetry. Philos Trans R Soc B Biol Sci 364:867–879. https://doi.org/10.1098/rstb.2008.0232
Da Silva ML, Piqueira JRC, Vielliard JME (2000) Using Shannon entropy on measuring the individual variability in the Rufous-bellied thrush Turdus rufiventris vocal communication. J Theor Biol 207:57–64. https://doi.org/10.1006/jtbi.2000.2155
Dalziell AH, Peters RA, Cockburn A, Dorland AD, Maisey AC, Magrath RD (2013) Dance choreography is coordinated with song repertoire in a complex avian display. Curr Biol 23:1132–1135. https://doi.org/10.1016/j.cub.2013.05.018
de Reus K, Soma M, Anichini M, Gamba M, De Heer KM, Lense M, Bruno JH, Trainor L, Ravignani A (2021) Rhythm in dyadic interactions. Philos Trans R Soc B Biol Sci 376:20200337. https://doi.org/10.1098/rstb.2020.0337
Fowler CA, Saltzman E (1993) Coordination and coarticulation in speech production. Lang Speech 36:171–195
Fujii S, Watanabe H, Oohashi H, Hirashima M, Nozaki D, Taga G (2014) Precursors of dancing and singing to music in three- to four-months-old infants. PloS One 9:1–12. https://doi.org/10.1371/journal.pone.0097680
Gomes ACR, Funghi C, Soma M, Sorenson MD, Cardoso GC (2017) Multimodal signalling in estrildid finches: song, dance and colour are associated with different ecological and life-history traits. J Evol Biol 30:1336–1346. https://doi.org/10.1111/jeb.13102
Goodwin D (1982) Estrildid finches of the world. Cornell University Press, New York, NY
Hall ML (2000) The function of duetting in magpie-larks: conflict, cooperation, or commitment? Anim Behav 60:667–677. https://doi.org/10.1006/anbe.2000.1517
Hall ML (2004) A review of hypotheses for the functions of avian duetting. Behav Ecol Sociobiol 55:415–430. https://doi.org/10.1007/s00265-003-0741-x
Hall ML (2009) A review of vocal duetting in birds. Adv Study Behav 40:67–121. https://doi.org/10.1016/S0065-3454(09)40003-2
Hall ML, Magrath RD (2007) Temporal coordination signals coalition quality. Curr Biol 17:R406–R407. https://doi.org/10.1016/j.cub.2007.04.022
Hasegawa A, Soma M, Hasegawa T (2011) Male traits and female choice in Java sparrows: preference for large body size. Ornithol Sci 10:73–80. https://doi.org/10.2326/osj.10.73
Higham JP, Hebets EA (2013) An introduction to multimodal communication. Behav Ecol Sociobiol 67:1381–1388. https://doi.org/10.1007/s00265-013-1590-x
Johnstone RA (1996) Multiple displays in animal communication: “backup signals” and “multiple messages”. Philos Trans R Soc Lond B Biol Sci 351:329–338
Kagawa H, Soma M (2013) Song performance and elaboration as potential indicators of male quality in Java sparrows. Behav Processes 99:138–144
Kozak EC, Uetz GW (2016) Cross-modal integration of multimodal courtship signals in a wolf spider. Anim Cogn 19:1173–1181. https://doi.org/10.1007/s10071-016-1025-y
Kriesell HJ, Le Bohec C, Cerwenka AF, Hertel M, Robin JP, Ruthensteiner B, Gahr M, Aubin T, Düring DN (2020) Vocal tract anatomy of king penguins: morphological traits of two-voiced sound production. Front Zool 17:1–11. https://doi.org/10.1186/s12983-020-0351-8


Langmore NE (1998) Functions of duet and solo songs of female birds. Trends Ecol Evol 13:136–140. https://doi.org/10.1016/S0169-5347(97)01241-X
Lewis RN, Soma M, de Kort SR, Gilman RT (2021) Like father like son: cultural and genetic contributions to song inheritance in an estrildid finch. Front Psychol 12:654198. https://doi.org/10.3389/fpsyg.2021.654198
Ligon RA, Diaz CD, Morano JL, Troscianko J, Stevens M, Moskeland A, Laman TG, Scholes E (2018) Evolution of correlated complexity in the radically different courtship signals of birds-of-paradise. PLoS Biol 16:1–24. https://doi.org/10.1371/journal.pbio.2006962
McGurk H, Macdonald J (1976) Hearing lips and seeing voices. Nature 264:746–748
Mehr SA, Singh M, Knox D, Ketter DM, Pickens-Jones D, Atwood S, Lucas C, Jacoby N, Egner AA, Hopkins EJ, Howard RM, Hartshorne JK, Jennings MV, Simson J, Bainbridge CM, Pinker S, O’Donnell TJ, Krasnow MM, Glowacki L (2019) Universality and diversity in human song. Science 366:eaax0868. https://doi.org/10.1126/science.aax0868
Miles MC, Fuxjager MJ (2018) Animal choreography of song and dance: a case study in the Montezuma oropendola, Psarocolius montezuma. Anim Behav 140:99–107. https://doi.org/10.1016/j.anbehav.2018.04.006
Mitoyen C, Quigley C, Fusani L (2019) Evolution and function of multimodal courtship displays. Ethology 125:503–515. https://doi.org/10.1111/eth.12882
Mitoyen C, Quigley C, Boehly T, Fusani L (2021) Female behaviour is differentially associated with specific components of multimodal courtship in ring doves. Anim Behav 173:21–39. https://doi.org/10.1016/j.anbehav.2020.12.014
Mitri S, Wischmann S, Floreano D, Keller L (2013) Using robots to understand social behaviour. Biol Rev Camb Philos Soc 88:31–39. https://doi.org/10.1111/j.1469-185X.2012.00236.x
Ota N (2020) Tap dancers in the wild: field observations of multimodal courtship displays in socially monogamous songbirds. Naturwissenschaften 107:30. https://doi.org/10.1007/s00114-020-01686-x
Ota N, Gahr M (2022) Context-sensitive dance–vocal displays affect song patterns and partner responses in a socially monogamous songbird. Ethology 128:61–69. https://doi.org/10.1111/eth.13240
Ota N, Soma M (2022) Vibrational signals in multimodal courtship displays of birds. In: Hill PSM, Mazzoni V, Stritih-Peljhan N, Virant-Doberlet M, Wessel A (eds) Biotremology: physiology, ecology, and evolution. Springer International, Cham, pp 237–259
Ota N, Gahr M, Soma M (2015) Tap dancing birds: the multimodal mutual courtship display of males and females in a socially monogamous songbird. Sci Rep 5:16614. https://doi.org/10.1038/srep16614
Ota N, Gahr M, Soma M (2017) Songbird tap dancing produces non-vocal sounds. Bioacoustics 26:161–168. https://doi.org/10.1080/09524622.2016.1231080
Ota N, Gahr M, Soma M (2018) Couples showing off: audience promotes both male and female multimodal courtship display in a songbird. Sci Adv 4:1–7. https://doi.org/10.1126/sciadv.aat4779
Partan SR (2013) Ten unanswered questions in multimodal communication. Behav Ecol Sociobiol 67:1523–1539. https://doi.org/10.1007/s00265-013-1565-y
Partan S, Marler P (1999) Communication goes multimodal. Science 283:1272–1273. https://doi.org/10.1126/science.283.5406.1272
Perrot C, Béchet A, Hanzen C, Arnaud A, Pradel R, Cézilly F (2016) Sexual display complexity varies non-linearly with age and predicts breeding status in greater flamingos. Sci Rep 6:1–10. https://doi.org/10.1038/srep36242
Pouw W, Proksch S, Drijvers L, Gamba M, Holler J, Kello C, Schaefer RS, Wiggins GA (2021) Multilevel rhythms in multimodal communication. Philos Trans R Soc B Biol Sci 376:1–9. https://doi.org/10.1098/rstb.2020.0334
Ręk P (2018) Multimodal coordination enhances the responses to an avian duet. Behav Ecol 29:411–417. https://doi.org/10.1093/beheco/arx174


Ręk P, Magrath RD (2016) Multimodal duetting in magpie-larks: how do vocal and visual components contribute to a cooperative signal’s function? Anim Behav 117:35–42. https://doi.org/10.1016/j.anbehav.2016.04.024
Restall R (1996) Munias and Mannikins. Pica Press, Haarlem
Riebel K (2009) Song and female mate choice in zebra finches: a review. Adv Study Behav 40:197–238. https://doi.org/10.1016/S0065-3454(09)40006-8
Rosenblum LD, Schmuckler MA, Johnson JA (1997) The McGurk effect in infants. Percept Psychophys 59:347–357. https://doi.org/10.3758/BF03211902
Schmidt MF, Wild JM (2014) The respiratory-vocal system of songbirds: anatomy, physiology, and neural control, 1st edn. Elsevier B.V, Amsterdam
Searcy WA (1992) Song repertoire and mate choice in birds. Am Zool 32:71–80. https://doi.org/10.1093/icb/32.1.71
Seller TJ (1979) Unilateral nervous control of the syrinx in Java sparrows (Padda oryzivora). J Comp Physiol A 129:281–288. https://doi.org/10.1007/BF00657664
Snyder KT, Creanza N (2019) Polygyny is linked to accelerated birdsong evolution but not to larger song repertoires. Nat Commun 10:1–15. https://doi.org/10.1038/s41467-019-08621-3
Soma MF (2011) Social factors in song learning: a review of Estrildid finch research. Ornithol Sci 10:89. https://doi.org/10.2326/osj.10.89
Soma M (2018) Sexual selection in Estrildid finches, with further review of the evolution of nesting material holding display in relation to cooperative parental nesting. Jpn J Anim Psychol 68:121–130. https://doi.org/10.2502/janip.68.2.2
Soma M, Garamszegi LZ (2011) Rethinking birdsong evolution: meta-analysis of the relationship between song complexity and reproductive success. Behav Ecol 22:363–371. https://doi.org/10.1093/beheco/arq219
Soma M, Garamszegi LZ (2015) Evolution of courtship display in Estrildid finches: dance in relation to female song and plumage ornamentation. Front Ecol Evol 3:1–11. https://doi.org/10.3389/fevo.2015.00004
Soma M, Garamszegi LZ (2018) Evolution of patterned plumage as a sexual signal in estrildid finches. Behav Ecol 29:676–685. https://doi.org/10.1093/beheco/ary021
Soma M, Iwama M (2017) Mating success follows duet dancing in the Java sparrow. PloS One 12:e0172655. https://doi.org/10.5061/dryad.s6v4p
Soma M, Mori C (2015) The songbird as a percussionist: syntactic rules for non-vocal sound and song production in Java sparrows. PloS One 10:1–10. https://doi.org/10.1371/journal.pone.0124876
Soma M, Iwama M, Nakajima R, Endo R (2019) Early-life lessons of the courtship dance in a dance-duetting songbird, the Java sparrow. R Soc Open Sci 6:190563. https://doi.org/10.1098/rsos.190563
Spezie G, Quigley C, Fusani L (2022) Learned components of courtship: a focus on postural displays, choreographies and construction abilities, 1st edn. Elsevier, Amsterdam
Takeda KF, Hiraiwa-Hasegawa M, Kutsukake N (2018) Uncoordinated dances associated with high reproductive success in a crane. Behav Ecol 30:101–106. https://doi.org/10.1093/beheco/ary159
Taylor RC, Klein BA, Stein J, Ryan MJ (2011) Multimodal signal variation in space and time: how important is matching a signal with its signaler? J Exp Biol 214:815–820. https://doi.org/10.1242/jeb.043638
Ullrich R, Norton P, Scharff C (2016) Waltzing Taeniopygia: integration of courtship song and dance in the domesticated Australian zebra finch. Anim Behav 112:285–300. https://doi.org/10.1016/j.anbehav.2015.11.012
Vanderbilt CC, Kelley JP, DuVal EH (2015) Variation in the performance of cross-contextual displays suggests selection on dual-male phenotypes in a lekking bird. Anim Behav 107:213–219. https://doi.org/10.1016/j.anbehav.2015.06.023


Varkevisser JM, Mendoza E, Simon R, Manet M, Halfwerk W, Scharff C, Riebel K (2022a) Multimodality during live tutoring is relevant for vocal learning in zebra finches. Anim Behav 187:263–280. https://doi.org/10.1016/j.anbehav.2022.03.013
Varkevisser JM, Simon R, Mendoza E, How M, van Hijlkema I, Jin R, Liang Q, Scharff C, Halfwerk WH, Riebel K (2022b) Adding colour-realistic video images to audio playbacks increases stimulus engagement but does not enhance vocal learning in zebra finches. Anim Cogn 25:249–274. https://doi.org/10.1007/s10071-021-01547-8
Williams H (2001) Choreography of song, dance and beak movements in the zebra finch (Taeniopygia guttata). J Exp Biol 204:3497–3506
Williams H (2004) Birdsong and singing behavior. Ann N Y Acad Sci 1016:1–30. https://doi.org/10.1196/annals.1298.029
Zanollo V, Griggio M, Robertson J, Kleindorfer S (2013) Males with a faster courtship display have more white spots and higher pairing success in the diamond firetail, Stagonopleura guttata. Ethology 119:344–352. https://doi.org/10.1111/eth.12071
Zentner M, Eerola T (2010) Rhythmic engagement with music in infancy. Proc Natl Acad Sci U S A 107:5768–5773. https://doi.org/10.1073/pnas.1000121107

Chapter 7

Vocal Communication in Corvids: Who Emits, What Information and Benefits?

Yuiko Suzuki and Ei-Ichi Izawa

Abstract Corvids live in complex societies in which individuals form dominance hierarchies and affiliative relationships. Such complex individual-based social lives are thought to drive the flexible social skills involved in the audio-vocal communication among corvids. Recent studies suggest the flexibility of audio-vocal communication in corvids, such as cross-modal audio-visual individual recognition, food-associated referential signaling, and volitional vocal control associated with arbitrary visual stimuli. Corvids are also kleptoparasitic foragers that scrounge food resources produced by heterospecific predatory animals, such as wolves and humans. Such foraging ecology is assumed to be the foundation for corvid cognition and behavior that are sensitive to the behavior of heterospecific animals. In this chapter, we review the audio-vocal communication among corvids, especially contact calls and alarm calls, and its cognitive underpinning from the viewpoints of social and foraging ecologies.

Keywords Corvid · Bird · Cognition · Contact call · Alarm call · Kleptoparasitism

7.1

Vocal Communication as a Window into the Understanding of Animal Cognition

Vocal communication is advantageous for animals because signals propagate widely and rapidly toward multiple recipients and allow interactions between individuals in visually blocked environments (such as forests) and across long distances (Bradbury and Vehrencamp 2011). In this chapter, we review the vocal communication among corvids, a highly social group of birds that includes crows, ravens, rooks, jackdaws, and jays, and the relevant cognition. Corvids are an interesting model for understanding the evolutionary triadic relationship among individual-based social ecology, cognitive ability, and neural underpinning. This is because corvids, especially Corvus spp. (e.g., crows, ravens, jackdaws, and rooks), (1) live in fluid social groups where individuals recognize each other and establish dominance hierarchies and affiliative relationships among members for competition and cooperation, (2) show sophisticated cognitive skills under wild and laboratory conditions, and (3) possess an enlarged non-laminar pallium, a homologue of the mammalian cortex (Güntürkün and Bugnyar 2016; Nomura and Izawa 2017). All of these features of corvids have evolved independently of those of mammals, from which the avian lineage diverged approximately 350 million years ago. Thus, research on the ecology, cognition, and brain of corvids could allow us to understand the convergent and divergent aspects of the evolution of animal communication at both proximate and ultimate levels.

In this chapter, among the variety of vocal communication in corvids, we focus on acoustic communication that can involve individual recognition: contact calls, used for social contact with conspecifics, and alarm calls, used for sharing a perceived threat with conspecifics. Here, we first introduce studies on contact calls and alarm calls in non-corvid birds and mammals, and then we review studies on corvids.

Y. Suzuki · E.-I. Izawa (✉) Department of Psychology, Keio University, Tokyo, Japan e-mail: izawa@flet.keio.ac.jp
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 Y. Seki (ed.), Acoustic Communication in Animals, https://doi.org/10.1007/978-981-99-0831-8_7

7.2

Social Function of Vocal Signals in Group-Living Mammals and Birds

7.2.1

Contact Call: Signals for Sender’s Social Information

Contact calls are vocal signals that are common in mammals and birds. These calls play a crucial role in maintaining contact with specific individuals such as mothers and infants (chicks), pair-bonded partners, and group members, especially in visually blocked situations (Kondo and Watanabe 2009). The acoustic structures of contact calls are stable within individuals and differ between individuals, thereby encoding individuality and other information, such as sex and age. Thus, for receivers, a contact call is a reliable signal for recognizing the identity and/or social information of the signal sender. For example, in breeding colonies of emperor penguins (Aptenodytes forsteri), chicks find their parents returning from foraging by the parents’ contact calls, even in environments made noisy by the similar contact calls of neighboring chicks and parents (Jouventin et al. 1999). Adult pallid bats (Antrozous pallidus) emit contact calls to find colony mates in a crevice roost; because they change colony location every 1–2 days, they find their mates based not on spatial cues but on the acoustic characteristics of contact calls (Arnold and Wilkinson 2011).

While inter-individual differences and intra-individual similarities of contact calls are useful for representing individual identity, similarity of contact calls among allied individuals has also been found in mammals and birds. In budgerigars (Melopsittacus undulatus), a small gregarious parrot, experimental group housing of unfamiliar individuals resulted in convergent acoustic structures of contact calls (Farabaugh et al. 1994; Hile and Striedter 2000). Pygmy marmosets (Cebuella pygmaea), living in family groups, have intra-group similarities but inter-group differences in the acoustic structures of contact calls (Elowson and Snowdon 1994). Interestingly, the acoustic structures of contact calls in male-female pairs converge following pair-bond formation (Snowdon and Elowson 1999). These studies suggest that contact calls potentially represent group membership or partnerships based on the plasticity of their acoustic structures, although there is little evidence that receiver individuals recognize group membership or pair-bond partnerships based on contact calls (e.g., Hopp et al. 2001).

7.2.2

Alarm Call: Signals for Potential Risk

Alarm calls are also common in birds and mammals, particularly in prey species. Alarm calls serve as a propagation of risks or threats, typically predators, as perceived by the vocal sender. Various animals possess multiple types of alarm calls with distinct acoustic structures, which are associated with predator types and/or urgency of the threat. Vervet monkeys (Chlorocebus pygerythrus) discriminatively produce three types of alarm calls depending on predator types, such as leopards, eagles, and snakes, and group members receiving calls showed discriminative anti-predator behaviors, such as climbing a tree for a leopard call, looking up for an eagle call, and looking down for a snake call (Seyfarth et al. 1980a, b; Cheney and Seyfarth 2007). The alarm calls of meerkats (Suricata suricatta) have been shown to represent predator types, such as carnivores and raptors, as well as the urgency of predation risk in acoustic structures (Manser 2001). When the urgency of predation risk is low or high, alarm calls become harmonic or noisy, respectively. In meerkat alarm calls, the predator types were reflected in differences in the lowest frequency formant. The formant frequency is determined by the resonance of the vocal tract, which is anatomically different among individuals. Consequently, the alarm calls of meerkats include individual identity (Townsend et al. 2014). Although alarm calls of velvet monkeys and meerkats represent the sender’s identity in acoustic structures, it remains unclear whether receivers of alarm calls recognize the identity of the sender (Schibler and Manser 2007). The social relationship between the sender and receiver affects whether the alarm calls function as an oncoming-risk signal. Similar alarm calls result in different responses of recipient behavior depending on the social relationship. In degus (Octodon degus), a social rodent, alarm calls emitted by adults cause higher rates of alert behavior in recipient degus than those by subadults. 
The alarm calls of adult degus contain frequency-modulated syllables, but those of subadults do not. Receivers, especially older individuals, may use the frequency-modulated syllables found only in adult alarm calls as cues for alert behavior (Nakano et al. 2013). Predation risk is critical for any individual, so receivers must respond sensitively to alarm signals. Such a tight linkage between alarm signals and escape responses is open to tactical use of alarm calls by senders to deceptively scare receivers away, which is referred to as false alarm calling. Indeed, false alarm calls have been reported in both interspecific and intraspecific communication. For example, fork-tailed drongos (Dicrurus adsimilis) use various types of alarm calls, including drongo-specific calls and mimicked alarm calls of meerkats and pied babblers (Turdoides bicolor). Drongos emit alarm calls when predators approach, and meerkats and babblers eavesdrop on these calls to escape from predation risk. However, drongos also emit alarm calls in the absence of predation risk to deceptively scare heterospecific animals away and steal their food (Flower et al. 2014). Subordinate great tits (Parus major) have been found to use false alarm calls to startle dominant conspecifics and heterospecifics away from occupied food resources (Møller 1988).

7.3 Vocal Communication in Corvids

7.3.1 Social Ecology of Corvids as a Foundation for Vocal Communication of Conspecific and Heterospecific Information

Corvids, such as crows, ravens, and jays, flexibly change social structures depending on the time of day, season, age, and natural history (Goodwin 1986). Corvids are monogamous and form lifelong pair-bonds. Pair-bonded adult males and females spend time together throughout the year and cooperate to breed in their territory. In contrast, juveniles and non-pair-bonded adults aggregate and range repeatedly over wide non-territorial areas. These aggregations of juveniles and non-breeder adults are so-called fission-fusion groups, in which members repeatedly leave and re-join (e.g., Loretto et al. 2017). Both territorial pairs and free-floating singletons often share unoccupied foraging areas during the daytime and outside breeding seasons, and roost together with hundreds or even thousands of conspecifics at night (Goodwin 1986). Even in such fluid groups of crows and ravens, individuals recognize each other and form dominance hierarchies and affiliative relationships via aggressive and affiliative (e.g., allopreening) interactions, respectively, which helps to avoid the escalation of agonistic conflicts over limited resources in both captive and wild populations (Chiarati et al. 2010; Miyazawa et al. 2020; Boucherie et al. 2022). Such social ecology is assumed to serve as a socio-ecological foundation for physical, visual, and audio-vocal communication among conspecifics at various spatial distances in corvids.

Another ecological foundation for vocal communication in corvids, especially crows and ravens, is kleptoparasitism in foraging. Kleptoparasitism is a foraging strategy that exploits or scrounges food resources produced by conspecific and heterospecific animals (Nishimura 2010). Ravens are known to preferentially follow foraging packs of wolves, which are potential predators of ravens, and to steal prey killed by the wolves (Vucetich et al. 2004). Ravens have also been documented to approach the gunshot sounds of human hunters, suggesting the exploitation of prey animals killed by humans (White 2005). Similar foraging strategies have been documented in urban areas, where ravens and crows scavenge anthropogenic food sources, such as garbage (Loretto et al. 2016; Benmazouz et al. 2021). Such a kleptoparasitic foraging strategy is assumed to be a socio-ecological foundation for crows and ravens to flexibly use behavioral tactics, including vocal communication, in interspecific conflicts over food sources by responding sensitively to food-producing predators, such as wolves and humans.

7.3.2 Contact Call and Individual Recognition

Similar to various birds and mammals, crows and ravens use contact calls to recognize specific individuals. In large-billed crows (C. macrorhynchos), a field study reported vocal exchanges of a single-element ka call between wild individuals, which indicates that the ka call is a contact call (Kondo et al. 2010b). The acoustic structures of ka calls were found to be consistent within but different between individuals, suggesting that this call could serve as a vocal cue for individual recognition (Kondo et al. 2010a). An experimental study using an expectancy-violation paradigm found that large-billed crows matched ka calls with the appearance of familiar conspecifics, indicating audio-visual cross-modal individual recognition (Kondo et al. 2012). In addition, such cross-modal matching between ka calls and the appearance of conspecifics was confirmed only for group members and not for non-group members, suggesting that individual recognition involves learning to associate the voice and appearance of conspecifics that are repeatedly encountered in their fission-fusion lives. In ravens, individual recognition based on contact calls lasted for up to 3 years. In a vocal playback experiment with captive groups of ravens, the birds showed contact-call responses to the played-back calls of affiliative group members from which they had been separated for up to 3 years but did not respond to calls of non-affiliative familiar or unfamiliar individuals (Boeckle and Bugnyar 2012).

Ravens also possess a type of contact call to recruit conspecifics in the foraging context. Yell calls of ravens were relatively loud, suited to long-distance propagation, and emitted at food sources to form an aggregation of conspecifics, especially when the food was too large to be monopolized by the finder(s) (Heinrich 1988). A specific type of yell, the haa call, was produced by the first few finders of a food resource that was difficult to access due to, for example, the presence of nearby predators. Acoustic analysis and behavioral tests have suggested that the acoustic structures of haa calls encode individual identity and that calling rates change depending on the type of food, such as meat and animal chow (Bugnyar et al. 2001; Boeckle et al. 2012). A recent follow-up study found that the acoustic properties and emission rates of haa calls were affected by the signalers’ age and sex, as well as by the presence of affiliative individuals nearby (Sierro et al. 2020). These pieces of evidence suggest that haa calls, a type of contact call in ravens, potentially serve as a referential signal to recipients.

7.3.3 Alarm Call: Vocal Signals for Recognizing Identity and Behavior of Heterospecific Animals

Corvids use alarm calls to recognize the type and/or identity of predators. Siberian jays (Perisoreus infaustus), a small corvid living in kin-based family groups, were found to emit different anti-predator calls associated with specific hawk behaviors, such as searching and attacking (Griesser 2008). In addition, a playback experiment using different types of anti-predator calls revealed that the jays exhibited specific escape responses depending on the type of call. This study suggests that jays share information on predation risk among group members by using alarm calls, thereby reducing mortality. Unlike jays, crows are generally thought to be under less predation risk. However, crows possess alarm calls to communicate about potential threats, such as humans. American crows (C. brachyrhynchos) have been demonstrated to quickly learn to recognize the faces of dangerous individuals (Marzluff et al. 2010). In a field experiment, an experimenter wearing a mask of a unique face captured and released several individuals of crow flocks in the wild. Marzluff et al. (2010) found that, after the experimental capture and release, both captured and non-captured crows in the flock scolded and mobbed specifically in response to the approach of a person wearing the ‘captured’ mask and that the scolding response was observed even 2 years or more after the capture. A follow-up field study found that the scolding/mobbing response to a dangerous person spread spatially to uncaptured crows inhabiting areas surrounding the capture site across 3 years, with increasing intensity and consistency of the response (Cornell et al. 2012). These studies suggest that vocal communication using alarm calls plays a crucial role in the social learning of potential threats, such as the visual information of heterospecific animals (e.g., human individuals).

A similar alarm response to specific human individuals as a potential danger has been reported in breeding jackdaws (C. monedula). Lee et al. (2019) used a vocal playback technique to evaluate whether jackdaws socially learn to recognize dangerous humans based on conspecific alarm calls. In this experiment, an unfamiliar human approaching the nest box of the focal jackdaw was presented together with a playback of conspecific scold calls. In subsequent encounters with the ‘dangerous’ person without alarm calls, the jackdaws exhibited a heightened fear response and shortened latency to return to the nest box. This finding also suggests a role of alarm call communication in the visual social learning of a potential threat such as humans.

7.3.4 Alarm Calls in Non-Breeding Groups: Information and Function

Most studies on alarm calls in corvids have examined pair-bonded adults during breeding seasons. Alarm calls in non-breeder groups comprising juveniles and unpaired adults have rarely been investigated. In large-billed crows, the use of alarm calls in non-breeder groups has been suggested to be related to dominance rank. A previous study of a captive group of unpaired large-billed crows revealed that sequential ka-ka calls, an alarm signal in this species (Kuroda 1990), were produced most often by the first-dominant male, although no relationship was found between dominance rank and the emission rate of contact calls (Kondo and Hiraiwa-Hasegawa 2015). Sentinel alarm-vocalization behavior by a few specific individuals within a group has been found only in cooperatively breeding species such as meerkats (Manser 1999) and pied babblers (Turdoides bicolor; Manser 1999), in which kin-based cooperation accounts for the benefits of alarm behavior. This is not the case for the non-breeder group of large-billed crows. Although dominance signaling, which could be beneficial for mate choice (Grafen 1990), might explain why the first-dominant male emits alarm calls most frequently, no study supporting this hypothesis has yet been reported.

Our recent findings suggest that alarm ka-ka calls of large-billed crows may represent unfamiliar human males. We experimentally evaluated how the first-ranked male of a captive non-breeder group used ka-ka calls in response to the potential threat of an approaching human. In the experiment, a person from one of the four ‘familiar/unfamiliar’ × ‘male/female’ categories approached the group in an outdoor aviary. Familiar males and females were the daily caretakers of the captive group. We analyzed the acoustic structures of alarm ka-ka calls emitted by the first-ranked male in response to the approaching person. Principal component analysis of the acoustic parameters revealed that ka-ka calls fell into three types based on acoustic structure (see spectrograms in Fig. 7.1). Interestingly, call type C (Fig. 7.1c) was emitted only in response to unfamiliar males. No such category-specific responses were found to unfamiliar females or to familiar males and females. These results suggest that the first-ranked male of a non-breeder group may signal the potential threat of unfamiliar human males by using a specific type of alarm ka-ka call, although the cues on which the first-ranked crow based its identification of unfamiliar males were not determined. Given the cognitive ability of crows to discriminate human individuals and sex based on visual information (Marzluff et al. 2010; Bogale et al. 2011; Lee et al. 2019), alarm ka-ka calls of the first-ranked crow likely represent unfamiliar males as a threat, even for the non-breeder group. However, what benefits could pay for the alarm behavior remains an open question.

Producing alarm ka-ka calls in response to specific human individuals suggests the ability of crows to learn and use the association of alarm call production with a certain visual cue of specific heterospecific individuals. This flexible learning ability is consistent with the aforementioned evidence of social learning in crows and jackdaws to recognize dangerous persons based on the face (Marzluff et al. 2010; Cornell et al. 2012; Lee et al. 2019). Further support for this ability was reported in a recent study using operant conditioning in which subject carrion crows (C. corone) were shown to learn to produce and withhold contact-call-like vocalizations depending on the color of visual stimuli (Brecht et al. 2019). Although the calls emitted by the crows in this study were not alarm calls, this finding suggests a learning ability to control call production associated with arbitrary visual objects. Similar flexible use and/or plasticity of alarm calls was found in a recent study of green monkeys (Chlorocebus sabaeus), in which the monkeys rapidly learned to associate aerial alarm calls with a novel flying object (e.g., a drone) (Wegdell et al. 2019), suggesting that alarm calls are more plastic and modifiable by experience than previously thought. For crows and ravens, especially in anthropogenic environments, humans are both potential predators and producers of food resources, much as wolves are in the winter forest, given the kleptoparasitic foraging ecology of crows and ravens. Such risk-and-benefit conflicts associated with humans in foraging ecology may drive the flexible ability to recognize conspecific and heterospecific individuals in audio-vocal communication in crows and ravens.

Fig. 7.1 Spectrograms of alarm ka-ka calls of large-billed crows. (a–c) Three types of ka-ka calls emitted by the first-ranked male in response to different persons approaching. The calls are characterized as a sequence of ka elements with a dominant frequency band at around 1 kHz and 0.2–0.5 s inter-element intervals. (d) Three-dimensional plot of principal component scores derived from multiple acoustic parameters of ka-ka calls by the first-ranked male

7.4 Future Direction

In this chapter, we have reviewed the vocal communication behavior and cognition of corvids, a highly social avian group. One of the most important characteristics of corvids’ audio-vocal communication is that it operates on an individual basis, which is tightly associated with a complex social ecology consisting of inter-individual competition (e.g., dominance ranks) and affiliation. Such individual-based social lives in corvids are assumed to be an ecological foundation that facilitates the cognitive ability to flexibly use behavioral tactics, especially in crows and ravens. Foraging ecology, especially kleptoparasitism, of crows and ravens is thought to be another crucial ecological foundation for their behavioral and cognitive flexibility. The kleptoparasitic strategy requires crows and ravens not only to adjust their behavior when competing with conspecific rivals over food resources but also to control the risk of predation by heterospecific animals. Such conflicting demands in foraging ecology could also drive the behavior and cognition of crows and ravens to be more flexible and/or tactical. These ecologically grounded behavioral and cognitive flexibilities of crows and ravens are underpinned by the ability to recognize various types of information, such as individuals, social relationships, food, risk, and contexts, as semantic components for communication. Such information should be processed not only at the single-modality level (e.g., visual or audio-vocal alone) but also at the cross-modal integration level. Indeed, vocal communication using contact and alarm calls in corvids, introduced in this chapter, involves the cognitive processing of multimodal semantic and referential information, a prerequisite for human language, which goes beyond any single modality. These arguments are not new and have long been made for audio-vocal communication in other birds and mammals (e.g., Cheney and Seyfarth 2007; Suzuki 2021). Nevertheless, future research on vocal communication in corvids has the potential to elucidate the evolutionary triadic relationships among individual-based social ecology, cognitive ability, and neural systems in non-mammalian lineages.

References

Arnold BD, Wilkinson GS (2011) Individual specific contact calls of pallid bats (Antrozous pallidus) attract conspecifics at roosting sites. Behav Ecol Sociobiol 65:1581–1593
Benmazouz I, Jokimäki J, Lengyel S, Juhász L, Kaisanlahti-Jokimäki M-L, Kardos G, Paládi P, Kövér L (2021) Corvids in urban environments: a systematic global literature review. Animals 11:3226
Boeckle M, Bugnyar T (2012) Long-term memory for affiliates in ravens. Curr Biol 9:801–806
Boeckle M, Szipl G, Bugnyar T (2012) Who wants food? Individual characteristics in raven yells. Anim Behav 84:1123–1130
Bogale BA, Aoyama M, Sugita S (2011) Categorical learning between “male” and “female” photographic human faces in jungle crows (Corvus macrorhynchos). Behav Process 86:109–118
Boucherie PH, Gallego-Abenza M, Massen JJM, Bugnyar T (2022) Dominance in a socially dynamic setting: hierarchical structure and conflict dynamics in ravens’ foraging groups. Phil Trans Roy Soc B 377:20200446
Bradbury JW, Vehrencamp SL (2011) Principles of animal communication, 2nd edn. Sinauer, London
Brecht KF, Hage SR, Gavrilov N, Nieder A (2019) Volitional control of vocalizations in corvid songbirds. PLoS Biol 17:e3000375
Bugnyar T, Kijne M, Kotrschal K (2001) Food calling in ravens: are yells referential signals? Anim Behav 61:949–958

Cheney DL, Seyfarth RM (2007) Baboon metaphysics. The University of Chicago Press, Chicago, IL
Chiarati E, Canestrari D, Vera R, Marcos JM, Baglione V (2010) Linear and stable dominance hierarchies in cooperative carrion crows. Ethology 116:346–356
Cornell HN, Marzluff JM, Pecoraro S (2012) Social learning spreads knowledge about dangerous humans among American crows. Proc Biol Sci 279:499–508
Elowson AM, Snowdon CT (1994) Pygmy marmosets, Cebuella pygmaea, modify vocal structure in response to changed social environment. Anim Behav 47:1267–1277
Farabaugh SM, Linzenbold A, Dooling RJ (1994) Vocal plasticity in budgerigars (Melopsittacus undulatus): evidence for social factors in the learning of contact calls. J Comp Psychol 108:81–92
Flower TP, Gribble M, Ridley AR (2014) Deception by flexible alarm mimicry in an African bird. Science 344:513–516
Goodwin D (1986) Crows of the world, 2nd edn. British Museum (Natural History), London
Grafen AJ (1990) Biological signals as handicaps. J Theor Biol 144:517–546
Griesser M (2008) Referential calls signal predator behavior in a group-living bird species. Curr Biol 18:69–73
Güntürkün O, Bugnyar T (2016) Cognition without cortex. Trends Cogn Sci 20:291–303
Heinrich B (1988) Winter foraging at carcasses by three sympatric corvids, with emphasis on recruitment by the raven, Corvus corax. Behav Ecol Sociobiol 23:141–156
Hile AG, Striedter GF (2000) Call convergence within groups of female budgerigars (Melopsittacus undulatus). Ethology 106:1105–1114
Hopp SL, Jablonski P, Brown JL (2001) Recognition of group membership by voice in Mexican jays, Aphelocoma ultramarina. Anim Behav 62:297–303
Jouventin P, Aubin T, Lengagne T (1999) Finding a parent in a king penguin colony: the acoustic system of individual recognition. Anim Behav 57:1175–1183
Kondo N, Hiraiwa-Hasegawa M (2015) The influence of social dominance on calling rate in the large-billed crow (Corvus macrorhynchos). J Ornithol 156:775–782
Kondo N, Watanabe S (2009) Contact calls: information and social function. Jpn Psychol Res 51:197–208
Kondo N, Izawa E-I, Watanabe S (2010a) Perceptual mechanism for vocal individual recognition in jungle crows (Corvus macrorhynchos): contact call signature and discrimination. Behaviour 147:1051–1072
Kondo N, Watanabe S, Izawa E-I (2010b) A temporal rule in vocal exchange among large-billed crows Corvus macrorhynchos in Japan. Ornithol Sci 9:83–91
Kondo N, Izawa E-I, Watanabe S (2012) Crows cross-modally recognize group members but not non-group members. Proc Biol Sci 279:1937–1942
Kuroda N (1990) The jungle crows in Tokyo. Yamashina Institute of Ornithology, Abiko
Lee VE, Régli N, McIvor GE, Thornton A (2019) Social learning about dangerous people by wild jackdaws. R Soc Open Sci 6:191031
Loretto M-C, Reimann S, Schuster R, Graulich DM, Bugnyar T (2016) Shared space, individually used: spatial behaviour of non-breeding ravens (Corvus corax) close to a permanent anthropogenic food source. J Ornithol 157:439–450
Loretto MC, Schuster R, Itty C, Marchand P, Genero F, Bugnyar T (2017) Fission-fusion dynamics over large distances in raven non-breeders. Sci Rep 7:380
Manser MB (1999) Response of foraging group members to sentinel calls in suricates, Suricata suricatta. Proc Biol Sci 266:1013–1019
Manser MB (2001) The acoustic structure of suricates’ alarm calls varies with predator type and the level of response urgency. Proc Biol Sci 26:2315–2324
Marzluff JM, Walls J, Cornell HN, Withey JC, Craig DP (2010) Lasting recognition of threatening people by wild American crows. Anim Behav 79:699–707

Miyazawa E, Seguchi A, Takahashi N, Motai A, Izawa E-I (2020) Different patterns of allogrooming between same-sex and opposite-sex in non-breeder groups of wild-caught large-billed crows (Corvus macrorhynchos). Ethology 126:195–206
Møller AP (1988) False alarm calls as a means of resource usurpation in the great tit Parus major. Ethology 79:25–30
Nakano R, Nakagawa R, Tokimoto N, Okanoya K (2013) Alarm call discrimination in a social rodent: adult but not juvenile degu calls induce high vigilance. J Ethol 31:115–121
Nishimura K (2010) Kleptoparasitism and cannibalism. In: Breed MD, Moore J (eds) Encyclopedia of animal behavior. Academic Press, Amsterdam, pp 253–258
Nomura T, Izawa E-I (2017) Avian brains: insights from development, behaviors and evolution. Dev Growth Differ 59:244–257
Schibler F, Manser MB (2007) The irrelevance of individual discrimination in meerkat alarm calls. Anim Behav 74:1259–1268
Seyfarth RM, Cheney DL, Marler P (1980a) Vervet monkey alarm calls: semantic communication in a free-ranging primate. Anim Behav 28:1070–1094
Seyfarth RM, Cheney DL, Marler P (1980b) Monkey responses to three different alarm calls: evidence of predator classification and semantic communication. Science 210:801–803
Sierro J, Loretto M-C, Szipl G, Massen JJM, Bugnyar T (2020) Food calling in wild ravens (Corvus corax) revisited: who is addressed? Ethology 126:257–266
Snowdon CT, Elowson AM (1999) Pygmy marmosets modify calls structure when paired. Ethology 105:893–908
Suzuki NT (2021) Animal linguistics: exploring referentiality and compositionality in bird calls. Ecol Res 36:221–231
Townsend SW, Charlton BD, Manser MB (2014) Acoustic cues to identity and predator context in meerkat barks. Anim Behav 94:143–149
Vucetich JA, Peterson RO, Waite TA (2004) Raven scavenging favours group foraging in wolves. Anim Behav 67:1117–1126
Wegdell F, Hammerschmidt K, Fischer J (2019) Conserved alarm calls but rapid auditory learning in monkey responses to novel flying objects. Nat Ecol Evol 3:1039–1042
White C (2005) Hunters ring dinner bell for ravens: experimental evidence of a unique foraging strategy. Ecology 86:1057–1060

Chapter 8

Affiliation, Synchronization, and Rhythm Production by Birds

Yuko Ikkatai and Yoshimasa Seki

Abstract Musical entrainment is widely observed in humans but is rarely noted in other animals. The “vocal learning and rhythmic synchronization” hypothesis offers a possible explanation for this disparity. It suggests that the ability to predict auditory beats and move to a certain rhythm is connected to the complex neural circuitry for vocal learning. Parrots are excellent vocal learners, which makes them one of the best research subjects for examining this hypothesis. Therefore, this chapter deals with the budgerigar (Melopsittacus undulatus), a small parrot species. We also focus on the highly social behavior of this species. We review the behavioral mechanisms by which budgerigars establish and maintain relationships with other individuals and then describe their capability to synchronize rhythmically with metronomic sounds. Subsequently, we compare this capability in budgerigars with the abilities of a songbird species as another lineage of vocal learners. We discuss the possibility that the substrates of the capability for rhythmic synchronization are linked to the substrates for vocal learning and social behaviors.

Keywords Avian · Mimicry · Behavioral contagion · Timing coordination · Sensorimotor coordination

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-981-99-0831-8_8.

Y. Ikkatai (✉) College of Human and Social Sciences, Kanazawa University, Kanazawa, Japan

Y. Seki Department of Psychology, Aichi University, Toyohashi, Japan

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. Y. Seki (ed.), Acoustic Communication in Animals, https://doi.org/10.1007/978-981-99-0831-8_8

8.1 Introduction

Humans readily entrain to rhythmic beats. However, no evidence demonstrating the ability of non-human animals to move their body parts in synchrony with human music existed until a paper about a parrot named Snowball was published (Patel et al. 2009). Snowball, a cockatoo, spontaneously entrained to the beats of human music presented with varied timing. Such entrainment is probably unnecessary for birds; hence, it is unlikely that it evolved through natural and/or sexual selection. Schachner et al. (2009) later documented similar behaviors exhibited by another famous parrot, an African grey named Alex. Why do such interesting reports involve parrots? One possible reason is that they are vocal learners. The “vocal learning and rhythmic synchronization” hypothesis (Patel 2006) proposes that the capacity for rhythmic synchronization evolved as a by-product of the capacity for vocal learning. The hypothesis appears reasonable because both vocal learning and synchronization of body movements to rhythmic stimuli rest on a substrate for transforming sensory inputs into motor outputs. Another characteristic of parrots may be a further reason for this capability: parrots often establish strong social bonds, sometimes even with their human caretakers. Social rewards could amplify their beat perception and synchronization (Patel 2021). Therefore, we first review the social behaviors of budgerigars (Melopsittacus undulatus), a species of small social parrot that has been widely used as a subject in vocal learning research: affiliation between mates, vocal mimicry between cage mates, and behavioral contagion. We then describe the capabilities of budgerigars concerning rhythmic synchronization to metronomic sounds. Further, we compare the capabilities of budgerigars with the abilities of Bengalese finches (Lonchura striata var. domestica), a songbird species that represents another lineage of vocal learners. Such a comparison could provide grounds for discussing the relationship between the capabilities for vocal learning and rhythmic synchronization. Finally, we discuss the possibility that the substrates of the capability for rhythmic synchronization are linked to the substrates for social behaviors between individuals.

8.2 Affiliative Interactions Between Mates

Budgerigars establish social relationships with other individuals and forge particularly strong bonds with their mating partners. In Australia, wild budgerigars congregate in flocks of dozens to hundreds of individuals of different sexes and ages. They perform most activities, such as flying, feeding, perching, roosting, and drinking, in a flock (Wyndham 1980). The basic unit of a flock is a male-female pair bond. Pairs leave the flock during the breeding season to breed. Females incubate eggs in nests in trees. Males form small flocks with neighboring individuals to forage for food for their nesting female partners and fledglings (Wyndham 1980). In captivity, the pair bond is usually established within a few weeks (e.g., Hile et al. 2000). Pair bonds appear to be maintained for life, but empirical data are limited. A captive experiment revealed that pair bonds resumed even after a 70-day separation into unisexual groups (Trillmich 1976).

Affiliative interactions such as warbling songs, head shaking, courtship feeding, and copulation are important in budgerigars for maintaining the pair bond (e.g., Brockway 1964b; Wyndham 1980). Female budgerigars do not necessarily accept the affiliative actions of males during the mating process. Females often reject such actions through aggressive behaviors such as pecking attacks, pecking motions, and displacement (Brockway 1964a). Ikkatai and Izawa (2012) kept two groups of 10 budgerigars each and observed the pair formation process. The frequencies of both affiliative and aggressive behaviors increased just before the pair bond was formed, but aggressive behaviors decreased significantly after the pair bond was established. Affiliative behaviors also decreased slightly after the pair bond was established but were subsequently sustained at a constant high level. This finding indicates that affiliation contributes to both the formation and maintenance of pair bonds (Ikkatai and Izawa 2012).

Affiliative interactions after conflicts are labeled post-conflict affiliation (PC affiliation) and aid in restoring valuable relationships. PC affiliation is classified into two types. The first type occurs between the parties to an aggressive interaction (i.e., reconciliation). The relationship between the parties may be damaged by agonistic encounters; such affiliation helps the parties manage conflict and avoid losing their relationship.
The second type of affiliation develops between one party to an aggressive interaction and a third-party bystander (i.e., consolation and bystander affiliation). Such affiliations are thought to help reduce stress. PC affiliation has been reported primarily in primate species (e.g., Fujisawa et al. 2005; Cordoni et al. 2006; Kutsukake and Castles 2004). Interestingly, some corvid birds engage in PC affiliation, but the patterns of occurrence vary according to species and individual relationships (Seed et al. 2007; Fraser and Bugnyar 2010, 2011; Logan et al. 2013). Ikkatai et al. (2016b) observed agonistic interactions and their outcomes in pair-bonded budgerigars. Their study reported that males initiated both types of PC affiliation more often than females. A defeated male often initiated reconciliatory affiliation with his female partner. This male-biased pattern is not unique to the PC context; it is also common in non-PC contexts. Bystander males more often initiated third-party affiliation with their female partners when the latter were defeated in extra-pair conflicts. In contrast, bystander females more often initiated affiliation with victorious male partners. These findings indicate that budgerigars possess behavioral mechanisms that allow them to maintain high-quality relationships within a monogamous breeding system.

8.3 Vocal Mimicry Between Cage Mates

Budgerigars learn new vocalizations throughout life. Interestingly, cage mates converge in their vocalization patterns, and this ability could strengthen their social bonds. Each bird produces short contact calls, and the acoustic patterns of these calls vary from individual to individual. Experimental studies have shown that budgerigars can discriminate between the contact calls of others (Park and Dooling 1985; Dooling 1986). Farabaugh et al. (1994) formed two groups of three males each and investigated how their contact calls changed. In one group, two different birds imitated each other’s contact calls between 1 and 2 weeks. Bartlett and Slater (1999) also reported that when a new male was introduced to a well-established male group, the newcomer converged his contact call with those of the other three members in 15–20 days. Hile and Striedter (2000) kept two groups of four females each and observed their contact calls for 8 weeks. The females came to share their calls in 4–7 weeks. Hile et al. (2000) paired nine males and nine females so as to maximize the differences in contact calls between the male and female of each pair. The pairs shared identical contact calls in around 2 weeks on average, and eight of the nine males converged their contact calls on those of their paired females. This asymmetrical call convergence is considered to be correlated with the formation and maintenance of the pair bond in budgerigars (Hile et al. 2000). Interestingly, vocal convergence has not been reported in experimental operant settings (Manabe et al. 1995, 1997; Manabe and Dooling 1997).

8.4 Behavioral Contagion Among Individuals

Budgerigars can mimic and learn the behaviors of others. Heyes and Saggerson (2002) investigated imitative learning using a two-object/two-action test. The subject budgerigars (n = 28) observed a conspecific demonstrator (n = 8) using its beak to remove objects (blue and black stoppers) from a hole and feed from it. The objects could be removed by pulling up or pushing down. The subject birds were then given access to the objects and attempted to obtain food. The experimenter recorded the objects and behaviors selected by the subject birds. The results revealed that the test birds removed the stoppers in the same direction as the demonstrator. Additionally, budgerigars imitated the behaviors of a virtual demonstrator presented by video playback (Mottley and Heyes 2003). Mui et al. (2008) trained 22 budgerigars to peck and step on a manipulandum. The subject birds then learned a discrimination task in two groups. Individuals in the compatible group were rewarded when they pecked while watching video playback of pecking and when they stepped while watching video playback of stepping. Conversely, subjects in the incompatible group were rewarded when they pecked while watching video playback of stepping and when they stepped while watching video playback of pecking. Both groups learned their task, but the compatible group delivered the correct response more often than the incompatible group (Mui et al. 2008). Richards et al. (2009) used 38 male budgerigars as subjects and let them watch a virtual demonstrator, a conspecific that fed by pecking or stepping on the stopper. The subjects were allowed to access the stopper after 24 h had passed. The results showed that subject birds that had observed a demonstrator pecking were more likely to imitate the behavior after 24 h than those that had observed a demonstrator stepping on the stopper. Contagious yawning and stretching have also been reported in budgerigars. Miller et al. (2012) observed a group of 21 birds in captivity for 15 days. Yawning and stretching occurred at short intervals after other individuals had yawned and stretched. Gallup et al. (2015) randomly paired budgerigars in an experimental setting and observed their yawning. The frequency of contagious yawning increased when the two birds could see each other compared with when they could not. Contagious yawning was also elicited using video stimuli of other budgerigars yawning (Gallup et al. 2015). The social nature of budgerigars can also be observed in their feeding behaviors. The authors of this chapter investigated whether video playback of conspecific or non-conspecific feeding facilitated feeding in budgerigars and Bengalese finches. The amount of food consumed by the birds was compared between two conditions: the presentation of non-feeding conspecific/non-conspecific videos and the presentation of feeding conspecific/non-conspecific videos (Ikkatai and Seki 2016). Our results demonstrated that conspecific feeding videos facilitated food consumption in budgerigars but not in Bengalese finches. The authors of this chapter further investigated whether the social nature of budgerigars could also be observed via a type of audio-visual telecommunication system.
For this purpose, two budgerigars located in two separate sound-attenuated boxes were brought face to face through a monitor and a loudspeaker. Their behaviors were recorded and categorized to examine behavioral contagion between the birds. Recordings were made in two experimental conditions: (1) the “interactive two-way condition,” in which the two birds could communicate in real time, and (2) the “non-interactive (one-way) condition,” in which a budgerigar appeared on the monitor as a pre-recorded video. Synchronous locomotion was observed more often between the birds in the interactive two-way condition. Contagious ruffling, wing-stretching, and yawning were also observed; however, the number of occurrences did not differ statistically between the two conditions (Ikkatai et al. 2016a). This study demonstrated the intensely social nature of budgerigars and their strong inclination to imitate each other’s behaviors. Interestingly, Heyes (2011) described “automatic imitation” as the mimicking of observed behavior in non-imitative situations, which requires sensorimotor coordination and relies on a mirror neuron system (Rizzolatti et al. 1996). A type of mirror neuron has also been found in the vocal learning system of songbirds (Prather et al. 2008).

8.5 Rhythmic Synchronization

8.5.1 Budgerigars

A study that analyzed the movements of dancing animals in a large number of YouTube videos to investigate their capacity for entrainment to musical beats reported negative results for rhythmic synchronization in budgerigars (Schachner et al. 2009). However, as described in the introductory section, budgerigars should possess this capacity if the vocal learning hypothesis is valid, because they are prominent vocal learners. Therefore, several experiments were designed to test this capacity under controlled conditions. Hasegawa et al. (2011) isochronously presented short (300 ms) audio-visual stimuli (a combination of a 3-kHz pure tone and an LED illumination) to budgerigars, and the birds were trained by operant conditioning with food rewards to peck the LED key only when the stimuli were presented. Thus, all that the birds were required to do in this task was to repeatedly peck a key in response to each stimulus presentation. Nevertheless, the peck timing was distributed around the onset of the stimulus, which was earlier than expected from the estimated reaction time. Hasegawa et al. also found negative mean asynchrony (NMA), a response pattern known from rhythmic tapping tasks in humans (Repp 2005): the mean response time of the budgerigars slightly preceded the stimulus onset. Thus, the birds anticipated the timing of the incoming stimuli using the tempo of the metronome, indicating that the operant training allowed them to entrain to the rhythmic stimuli. Seki and Tomyta (2019) further investigated whether budgerigars spontaneously synchronize their peck timing to a metronomic rhythm. Their study used operant conditioning with food rewards to train naïve birds to peck two keys (left and right) alternately eight times. Subsequently, a metronomic sound was played back during the task, although the birds were still required only to peck the keys eight times.
The peck timing did not determine whether or not the subjects could obtain food rewards. Therefore, the spontaneous adjustment of the peck timing to the stimuli could be assumed if the peck timing of the birds was affected by the timing of the sound stimuli. The results revealed that the stimuli modestly but significantly influenced the peck timing in two of the five budgerigars.
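One common way to quantify whether peck timing is affected by the timing of a periodic stimulus is to express each peck as a phase within the metronome cycle and to measure how tightly the phases cluster. The sketch below is a minimal illustration in Python and is not the analysis used by Seki and Tomyta (2019). The function names, the assumption that the first stimulus onset occurs at time zero, and the example peck times are all hypothetical.

```python
import cmath
import math


def peck_phases(peck_times_ms, ioi_ms):
    """Map each peck time to a phase in [0, 2*pi) within the metronome cycle.

    Assumes (hypothetically) that stimulus onsets fall at 0, ioi_ms, 2*ioi_ms, ...
    """
    return [2 * math.pi * ((t % ioi_ms) / ioi_ms) for t in peck_times_ms]


def mean_resultant_length(phases):
    """Return R in [0, 1]: near 1 when phases cluster, near 0 when they spread evenly."""
    if not phases:
        raise ValueError("at least one phase is required")
    vector_sum = sum(cmath.exp(1j * p) for p in phases)
    return abs(vector_sum) / len(phases)


# Hypothetical peck times (ms) for a 600 ms metronome.
locked = [50, 650, 1250, 1850]   # always 50 ms after an onset
spread = [0, 150, 300, 450]      # evenly spread over the cycle

r_locked = mean_resultant_length(peck_phases(locked, 600))
r_spread = mean_resultant_length(peck_phases(spread, 600))
print(round(r_locked, 3), round(r_spread, 3))
```

A value of R close to 1 for a given bird would indicate that pecks fall at a consistent point in the metronome cycle, whereas a value close to 0 would indicate that the sound had little influence on peck timing.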

8.5.2 Bengalese Finches

Bengalese finches are vocal learners; however, their vocal production is less flexible than that of most parrots. Bengalese finches learn their songs during a critical period, and they have a lesser capacity for acquiring heterospecific songs, at least in comparison to parrots (Okanoya 2012). Thus, we hypothesized that, under the “vocal learning hypothesis,” finches would have a lesser capacity for rhythmic synchronization than parrots.


We examined this hypothesis using ten two-year-old male Bengalese finches (unpublished data). Briefly, the birds were trained to peck a single LED key using an operant procedure similar to that of the budgerigar studies described above (i.e., Hasegawa et al. 2011; Seki and Tomyta 2019; Kishimoto and Seki 2022). The birds were trained to peck the key during the 300 ms stimulus presentation period. The stimulus comprised an LED illumination accompanied by a 3000 Hz tone presented at 70 dB SPL at the position of the bird’s head, because Bengalese finches are sensitive to this frequency (see Woolley et al. 2001). Stimuli were subsequently presented in a metronomic rhythm, and the inter-stimulus onset interval (IOI) was set at 600, 750, 900, or 1050 ms.

8.5.2.1 Experiment 1

The birds were required to peck the key five times without missing. Pecks made during the stimulus period were considered hits, and this period was extended to encompass the entire length of the stimulus plus 20% of the IOI before stimulus onset and after stimulus offset (i.e., “the acceptable period”). We measured the time lag between the stimulus onset and the response for the fourth and fifth pecks in each trial. Figure 8.1a shows the distribution of the lags. A significant peck timing bias appeared after the stimulus onset in all IOI conditions. The lag durations matched the estimated reaction time (ERT) (see below). If the birds had anticipated incoming stimuli using the tempo of the metronome, the pecks would have appeared around or before the stimulus onset. This condition is labeled negative mean asynchrony (NMA): the mean response time slightly precedes the stimulus onset (see Repp 2005; Hasegawa et al. 2011). The inter-response interval (IRI) modes were 600, 750, 890, and 1040 ms for the respective IOI conditions (Fig. 8.2a). Therefore, the birds probably waited for every cue and then pecked the key in reaction to the received signal. To estimate reaction times to the stimuli, a random IOI test was run for six of the ten birds after Experiments 1 and 2 (i.e., the stimulus trains were not isochronous, so the subjects could not anticipate the timing of an incoming stimulus). The peck timing mode was 210 ms, indicating that the birds needed this time to produce each peck in response to a cue.

Fig. 8.1 Distribution of the timing of the last two pecks in Experiments 1 (a) and 2 (b). The shaded bars display significant biases (p < 0.05, with the Bonferroni correction). Zero marks the stimulus onset. The length of the circumference corresponds to the acceptable period. No bias would appear in the graphs when birds pecked at random. Arrows indicate the estimated reaction times (ERTs)

Fig. 8.2 Inter-response intervals (IRIs) noted in Experiments 1 (a) and 2 (b)
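The acceptable period and the NMA criterion described in Experiment 1 can be written out as a short calculation. The following Python sketch assumes that lags are measured in milliseconds relative to stimulus onset and that the boundaries of the acceptable period are inclusive. The function names and the example lags are hypothetical and are not data from the experiment.

```python
from statistics import mean

STIMULUS_MS = 300      # stimulus duration stated in the text
MARGIN_PERCENT = 20    # margin on each side, as a percentage of the IOI


def acceptable_period(ioi_ms):
    """Return (start, end) in ms relative to stimulus onset."""
    margin = ioi_ms * MARGIN_PERCENT / 100
    return (-margin, STIMULUS_MS + margin)


def is_hit(lag_ms, ioi_ms):
    """True if a peck at lag_ms falls inside the acceptable period (inclusive bounds assumed)."""
    start, end = acceptable_period(ioi_ms)
    return start <= lag_ms <= end


def shows_nma(lags_ms):
    """Negative mean asynchrony: the mean lag precedes stimulus onset (is below zero)."""
    return mean(lags_ms) < 0


for ioi in (600, 750, 900, 1050):
    print(ioi, acceptable_period(ioi))

# Hypothetical lags: a reactive bird pecks about one reaction time after onset,
# an anticipating bird pecks slightly before onset.
reactive_lags = [200, 215, 210, 220]
anticipating_lags = [-30, -10, 5, -25]
print(shows_nma(reactive_lags), shows_nma(anticipating_lags))
```

Under these assumptions, a bird whose lags cluster near the 210 ms reaction time would not meet the NMA criterion, which is consistent with the interpretation that the finches reacted to each cue rather than anticipating it.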

8.5.2.2 Experiment 2

After producing four pecks accompanied by an audio-visual metronome without missing, the birds were required to produce two extra pecks with no stimulus cues (i.e., they had to follow “imaginary” stimulus cues for these pecks). This task was expected to be more difficult for the birds, and their performance less stable, than in Experiment 1. Thus, we collected twice as much data (50 reinforced trials × 2 sessions). One bird stopped engaging in the task in the 1050 ms IOI condition, and we recorded a total of 7800 pecks (i.e., two pecks × 100 trials × 10 birds × four IOI conditions, minus two pecks × 100 trials × one bird × one IOI condition). Unlike Experiment 1, a significant bias appeared around the onset of the “imaginary” stimulus for the timing of the fifth and sixth pecks (Fig. 8.1b). The IRI modes were 600, 730, 870, and 990 ms for the respective IOI conditions (Fig. 8.2b). The birds needed to use the preceding metronomic information (Video 8.1) to produce the fifth and sixth pecks. Because the key pecks appeared around the onset of each “imaginary” stimulus, we inferred that the birds reproduced rhythmic intervals like those of the preceding metronomic stimuli. However, the mean time lags between the pecks and the onset of the imaginary stimulus were all positive (+22, +64, +54, and +52 ms for the respective IOI conditions), revealing no evidence of NMA.
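The peck total reported for Experiment 2 follows from simple bookkeeping over the design. The Python sketch below reproduces that arithmetic; the function name and its `missing_cells` parameter (the number of bird-by-IOI combinations without data) are hypothetical conveniences, not terms used by the authors.

```python
def total_pecks(pecks_per_trial, trials, birds, ioi_conditions, missing_cells=0):
    """Count analyzed pecks in a fully crossed design.

    missing_cells is the number of (bird, IOI condition) combinations
    for which no data were collected.
    """
    full_design = pecks_per_trial * trials * birds * ioi_conditions
    lost = pecks_per_trial * trials * missing_cells
    return full_design - lost


# Experiment 2: 2 pecks x 100 trials x 10 birds x 4 IOI conditions,
# with one bird dropping out of one IOI condition.
experiment_2 = total_pecks(2, 100, 10, 4, missing_cells=1)
print(experiment_2)  # 7800
```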

8.5.2.3 Experiment 3

The procedure was identical to that of Experiment 1, except that only the auditory cue was used. The auditory stimuli were sufficient for the birds to complete the task. We analyzed 2000 pecks in total (i.e., two pecks × 50 trials × five birds × four IOI conditions). A significant bias was observed in the timing of pecks before the ERT (Fig. 8.3a), a finding different from the results of Experiment 1. Therefore, the training for Experiment 2 could have affected the strategies adopted by the birds, and/or the difference in sensory modality could have affected the results. The IRI modes were 590, 810, 880, and 1050 ms for the respective IOI conditions (Fig. 8.4a). The mean time lags from the onset of stimuli were all positive (+111, +72, +80, and +33 ms for the respective IOI conditions), providing no evidence of NMA.

8.5.2.4 Experiment 4

The procedure resembled that of Experiment 2, except that only the auditory cue was used. One bird stopped engaging in the task in the 600, 750, and 1050 ms IOI conditions. Thus, we analyzed a total of 3400 pecks (two pecks × 100 trials × five birds × four IOI conditions, minus two pecks × 100 trials × one bird × three IOI conditions). A significant bias was noted in the timing of pecks before the ERT (Fig. 8.3b). The IRI modes were 540, 770, 860, and 1060 ms for the respective IOI conditions (Fig. 8.4b). The mean time lags from the onset of stimuli were all positive (+110, +78, +53, and +44 ms), with no evidence of NMA.

The Bengalese finch subjects did not display rhythmic synchronization in Experiment 1, unlike budgerigars (Hasegawa et al. 2011). However, the birds learned to use the tempo of the preceding metronomic stimuli and could reproduce the rhythm in Experiment 2. Therefore, Bengalese finches may have the potential for rhythmic motor production in time with a metronome, but this ability may not manifest unless it is necessary; otherwise, the birds may prefer simply to react to each individual stimulus cue. In this sense, their capability for rhythmic synchronization is lower than that of budgerigars.

Fig. 8.3 The distribution of the timing of the last two pecks in Experiments 3 (a) and 4 (b)

Fig. 8.4 Inter-response intervals (IRIs) noted in Experiments 3 (a) and 4 (b)
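The values reported across the four experiments can be tabulated to make the comparison explicit. The Python sketch below uses only numbers stated in the text; the variable and function names are hypothetical, and the last mean lag for Experiment 3 is taken as positive because the text states that all lags were positive.

```python
IOIS_MS = [600, 750, 900, 1050]

# IRI modes (ms) as reported in the text, listed in IOI order.
IRI_MODES_MS = {
    "Experiment 1": [600, 750, 890, 1040],
    "Experiment 2": [600, 730, 870, 990],
    "Experiment 3": [590, 810, 880, 1050],
    "Experiment 4": [540, 770, 860, 1060],
}

# Mean lags (ms) from stimulus onset, reported for Experiments 2 to 4 only.
MEAN_LAGS_MS = {
    "Experiment 2": [22, 64, 54, 52],
    "Experiment 3": [111, 72, 80, 33],
    "Experiment 4": [110, 78, 53, 44],
}


def iri_deviation(iri_modes_ms):
    """Difference between each IRI mode and the corresponding IOI (ms)."""
    return [mode - ioi for mode, ioi in zip(iri_modes_ms, IOIS_MS)]


def any_negative_lag(lags_ms):
    """True if any mean lag precedes stimulus onset, as NMA would require."""
    return any(lag < 0 for lag in lags_ms)


for name, modes in IRI_MODES_MS.items():
    print(name, "IRI mode minus IOI (ms):", iri_deviation(modes))

for name, lags in MEAN_LAGS_MS.items():
    print(name, "any negative mean lag:", any_negative_lag(lags))
```

Laid out this way, the IRI modes stay within 60 ms of the IOI in every condition, while none of the reported mean lags is negative, which matches the conclusion that the finches reproduced the intervals without showing NMA.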

8.6 Timing Coordination with Others in Budgerigars

Kishimoto and Seki (2022) trained budgerigars to take alternating turns in producing key pecks. Two cages, each equipped with an LED key, were placed adjacent to each other, and each subject bird was put in a separate cage. The procedure was as follows: (1) peck timing was signaled by the illumination of the LED on one side; (2) when the bird on that side pecked the key, that LED was turned off and the LED on the other side was turned on. The repetition of this sequence caused the birds to form pseudo-turn-taking sequences. Subsequently, the authors analyzed the peck timing and found that some birds pecked the key shortly before the LED was turned on. This study did not investigate the rhythmic synchronization capability of budgerigars; however, its outcomes suggest that budgerigars can use the motion of other birds as a cue to anticipate the timing of incoming events. This finding suggests that, along with their ability for vocal learning, the social nature of budgerigars may play a role in their capability for rhythmic synchronization.

8.7 General Discussion

This chapter examined the capability of parrots for rhythmic synchronization to musical beats in connection with their capacity for vocal learning. To this end, we reviewed existing studies on social behaviors in budgerigars before introducing our investigations into the capacities of budgerigars and Bengalese finches for rhythmic synchronization. We posited the potential interrelatedness of vocal learning abilities, social behaviors, and rhythmic synchronization. The capacity for rhythmic synchronization or musical entrainment could vary even among vocal learners. Human beings far surpass other animal species in dancing to musical beats, and human individuals entrain to music almost without exception. A cockatoo and an African grey parrot have been reported to spontaneously entrain to musical beats (Patel et al. 2009; Schachner et al. 2009). No evidence of spontaneous musical entrainment exists in budgerigars (Schachner et al. 2009); however, they have exhibited rhythmic synchronization to metronomes with NMA in a key-peck task (Hasegawa et al. 2011). Bengalese finches reproduced rhythmic intervals only after they were intensively trained by an additional operant conditioning procedure (Experiment 2), and did not display NMA. From the standpoint of the vocal learning and rhythmic synchronization hypothesis, neural substrates for flexible vocal imitation such as those found in parrot brains could be involved in spontaneous rhythmic synchronization or musical entrainment (the neural circuits for vocal learning in parrot brains are more complex than those in songbird brains; Chakraborty et al. 2015). Vocal learning performance also varies according to species. Human beings display a prominent capability to modify the acoustic characteristics of their vocalizations. Some parrots and myna birds excel at this task, and many finches show a moderate capability for vocal learning. Parrots and human beings are open-ended vocal learners, whereas Bengalese finches are closed-ended vocal learners. The vocal learning window for the Bengalese finches used in this study had already closed before the experiments were conducted, even though adult birds still use auditory feedback for song production (Okanoya and Yamaguchi 1997; Woolley and Rubel 1997).
Thus, from the standpoint of the vocal learning and rhythmic synchronization hypothesis, the limited ability for rhythmic pecking behavior in the Bengalese finches we used for our experiments could be explained by the limited vocal learning capability of this species. To clarify the issue, we can also compare the physical origins of rhythmic synchronization with other behaviors, such as walking, flying, and swimming, that can emerge from a variety of physical mechanisms that differ between species. Behaviors that resemble each other could originate in different mechanisms even within an individual. For example, one organ may take over the function of another if the original organ suffers an injury that impairs its function. The same could be true of the capability for rhythmic synchronization: it could rely on the substrates for vocal learning in one species but not in another. We can interpret our results as congruent with the vocal learning and rhythmic synchronization hypothesis. The results could also indicate that the capacity for vocal learning may not be enough on its own to generate spontaneous rhythmic synchronization comparable to that exhibited by human beings and budgerigars (i.e., a tap or peck around the stimulus onset without additional training such as that in Experiment 2 of the Bengalese finch study). The latter idea seems aligned with the counterargument against the vocal learning and rhythmic synchronization hypothesis.

Fig. 8.5 Some parrot studies reviewed in this chapter suggest a possible mechanism that supports the capability for rhythmic synchronization

The results of chimpanzee (Hattori et al. 2013, 2015) and bonobo (Large and Gray 2015) studies have allowed scholars to contend that rhythmic synchronization originated from cooperative behaviors and social interactions. However, the vocal learning and social hypotheses are not mutually exclusive. Parrots, including budgerigars, are excellent vocal learners, but that is not all. As this chapter has shown, they establish stable relationships with others through vocal and motor communication. They imitate each other’s vocalizations and other behaviors. Occasionally, this mimicking manifests as behavioral contagion. These aspects could be involved in the substrate of rhythmic synchronization in budgerigars and may also enhance their social relationships. Moreover, like rhythmic synchronization, imitation and contagion behaviors require the transformation of sensory information into motor outputs (Fig. 8.5). Human beings sometimes call out to keep time during activities that require collaboration, such as rowing at a regatta or pulling a rope in a tug-of-war. Keeping time aloud could facilitate the rhythmic synchronization of the actions of the individual members of a group. This example also recalls the classical yo-he-ho hypothesis (Crystal 2005) of the origin of language. The social hypothesis captures part of the truth about the origin of rhythmic synchronization, but the role of vocal learning still merits discussion, and vice versa. When we hear people shouting to keep time as they collaborate on a task, we also tend to match the pitch of the vocalized sounds. An out-of-pitch vocalization by a group member could disrupt the timing of the collaborative behavior. Therefore, “spectral synchronization” (Podlipniak 2017) could be important in vocalization between individuals in such cooperative tasks, in addition to the synchronization of vocal timing. Obviously, the capability for vocal learning is necessary for spectral synchronization (also see Chap. 9).
Thus, the excellent rhythmic synchronization capability in human beings (and maybe some parrots) or their facility of entrainment to musical beats could involve both vocal learning and social behaviors. The superior rhythmic synchronization performance of a species could be explained by not one or the other of the two substrates, but their combination.


References

Bartlett P, Slater PJ (1999) The effect of new recruits on the flock specific call of budgerigars (Melopsittacus undulatus). Ethol Ecol Evol 11(2):139–147
Brockway BF (1964a) Ethological studies of the budgerigar (Melopsittacus undulatus): non-reproductive behavior. Behaviour 22(3–4):193–222
Brockway BF (1964b) Ethological studies of the budgerigar: reproductive behavior. Behaviour 23(3–4):294–323
Chakraborty M, Walløe S, Nedergaard S, Fridel EE, Dabelsteen T, Pakkenberg B et al (2015) Core and shell song systems unique to the parrot brain. PLoS One 10(6):e0118496
Cordoni G, Palagi E, Tarli SB (2006) Reconciliation and consolation in captive western gorillas. Int J Primatol 27(5):1365–1382
Crystal D (2005) How language works: how babies babble, words change meaning, and languages live and die. Penguin, Harmondsworth
Dooling RJ (1986) Perception of vocal signals by budgerigars (Melopsittacus undulatus). Exp Biol 45(3):195–218
Farabaugh SM, Linzenbold A, Dooling RJ (1994) Vocal plasticity in budgerigars (Melopsittacus undulatus): evidence for social factors in the learning of contact calls. J Comp Psychol 108(1):81–92
Fraser ON, Bugnyar T (2010) Do ravens show consolation? Responses to distressed others. PLoS One 5(5):e10605
Fraser ON, Bugnyar T (2011) Ravens reconcile after aggressive conflicts with valuable partners. PLoS One 6(3):e18118
Fujisawa KK, Kutsukake N, Hasegawa T (2005) Reconciliation pattern after aggression among Japanese preschool children. Aggress Behav 31(2):138–152
Gallup AC, Swartwood L, Militello J, Sackett S (2015) Experimental evidence of contagious yawning in budgerigars (Melopsittacus undulatus). Anim Cogn 18(5):1051–1058
Hasegawa A, Okanoya K, Hasegawa T, Seki Y (2011) Rhythmic synchronization tapping to an audio–visual metronome in budgerigars. Sci Rep 1(1):1–8
Hattori Y, Tomonaga M, Matsuzawa T (2013) Spontaneous synchronized tapping to an auditory rhythm in a chimpanzee. Sci Rep 3:1566
Hattori Y, Tomonaga M, Matsuzawa T (2015) Distractor effect of auditory rhythms on self-paced tapping in chimpanzees and humans. PLoS One 10(7):e0130682
Heyes C (2011) Automatic imitation. Psychol Bull 137(3):463–483
Heyes C, Saggerson A (2002) Testing for imitative and nonimitative social learning in the budgerigar using a two-object/two-action test. Anim Behav 64(6):851–859
Hile AG, Striedter GF (2000) Call convergence within groups of female budgerigars (Melopsittacus undulatus). Ethology 106(12):1105–1114
Hile AG, Plummer TK, Striedter GF (2000) Male vocal imitation produces call convergence during pair bonding in budgerigars, Melopsittacus undulatus. Anim Behav 59(6):1209–1218
Ikkatai Y, Izawa E (2012) Relevance of social interactions to pair bond formation and maintenance in budgerigars, Melopsittacus undulatus (in Japanese). Stud Sociol Psychol Educ: Inq Hum Soc 73:49–55
Ikkatai Y, Seki Y (2016) Effect of conspecific and heterospecific video playback on food consumption in budgerigars and Bengalese finches. Psychologia 59(2–3):81–90
Ikkatai Y, Okanoya K, Seki Y (2016a) Observing real-time social interaction via telecommunication methods in budgerigars (Melopsittacus undulatus). Behav Process 128:29–36
Ikkatai Y, Watanabe S, Izawa EI (2016b) Reconciliation and third-party affiliation in pair bond budgerigars (Melopsittacus undulatus). Behaviour 153(9–11):1173–1193
Kishimoto R, Seki Y (2022) Response timing of budgerigars in a turn-taking task under operant conditioning. Behav Process 198:104638
Kutsukake N, Castles DL (2004) Reconciliation and post-conflict third-party affiliation among wild chimpanzees in the Mahale Mountains, Tanzania. Primates 45(3):157–165
Large EW, Gray PM (2015) Spontaneous tempo and rhythmic entrainment in a bonobo (Pan paniscus). J Comp Psychol 129(4):317
Logan CJ, Emery NJ, Clayton NS (2013) Alternative behavioral measures of postconflict affiliation. Behav Ecol 24(1):98–112
Manabe K, Dooling RJ (1997) Control of vocal production in budgerigars (Melopsittacus undulatus): selective reinforcement, call differentiation and stimulus control. Behav Process 41:117–132
Manabe K, Kawashima T, Staddon JER (1995) Differential vocalization in budgerigars: towards an experimental analysis of naming. J Exp Anal Behav 63:111–126
Manabe K, Staddon JER, Cleaveland JM (1997) Control of vocal repertoire by reward in budgerigars (Melopsittacus undulatus). J Comp Psychol 111:50–62
Miller ML, Gallup AC, Vogel AR, Vicario SM, Clark AB (2012) Evidence for contagious behaviors in budgerigars (Melopsittacus undulatus): an observational study of yawning and stretching. Behav Process 89(3):264–270
Mottley K, Heyes C (2003) Budgerigars (Melopsittacus undulatus) copy virtual demonstrators in a two-action test. J Comp Psychol 117(4):363–370
Mui R, Haselgrove M, Pearce J, Heyes C (2008) Automatic imitation in budgerigars. Proc R Soc B Biol Sci 275(1651):2547–2553
Okanoya K, Yamaguchi A (1997) Adult Bengalese finches (Lonchura striata var. domestica) require real-time auditory feedback to produce normal song syntax. J Neurobiol 33(4):343–356
Okanoya K (2012) Behavioural factors governing song complexity in Bengalese finches. Int J Comp Psychol 25:44–59
Park TJ, Dooling RJ (1985) Perception of species-specific contact calls by budgerigars (Melopsittacus undulatus). J Comp Psychol 99(4):391
Patel AD (2006) Musical rhythm, linguistic rhythm, and human evolution. Music Percept 24(1):99–104
Patel AD (2021) Vocal learning as a preadaptation for the evolution of human beat perception and synchronization. Philos Trans R Soc B 376(1835):20200326
Patel AD, Iversen JR, Bregman MR, Schulz I (2009) Experimental evidence for synchronization to a musical beat in a nonhuman animal. Curr Biol 19(10):827–830
Podlipniak P (2017) The role of the Baldwin effect in the evolution of human musicality. Front Neurosci 11:542
Prather JF, Peters S, Nowicki S, Mooney R (2008) Precise auditory–vocal mirroring in neurons for learned vocal communication. Nature 451(7176):305–310
Repp BH (2005) Sensorimotor synchronization: a review of the tapping literature. Psychon Bull Rev 12:969–992
Richards C, Mottley K, Pearce J, Heyes C (2009) Imitative pecking by budgerigars, Melopsittacus undulatus, over a 24 h delay. Anim Behav 77(5):1111–1118
Rizzolatti G, Fadiga L, Gallese V, Fogassi L (1996) Premotor cortex and the recognition of motor actions. Cogn Brain Res 3(2):131–141
Schachner A, Brady TF, Pepperberg IM, Hauser MD (2009) Spontaneous motor entrainment to music in multiple vocal mimicking species. Curr Biol 19(10):831–836
Seed AM, Clayton NS, Emery NJ (2007) Postconflict third-party affiliation in rooks, Corvus frugilegus. Curr Biol 17(2):152–158
Seki Y, Tomyta K (2019) Effects of metronomic sounds on a self-paced tapping task in budgerigars and humans. Curr Zool 65(1):121–128
Trillmich F (1976) The influence of separation on the pair bond in budgerigars (Melopsittacus undulatus; Aves, Psittacidae). Z Tierpsychol 41:396–408
Woolley SM, Rubel EW (1997) Bengalese finches Lonchura Striata domestica depend upon auditory feedback for the maintenance of adult song. J Neurosci Res 17(16):6380–6390
Woolley SM, Wissman AM, Rubel EW (2001) Hair cell regeneration and recovery of auditory thresholds following aminoglycoside ototoxicity in Bengalese finches. Hear Res 153(1):181–195
Wyndham E (1980) Diurnal cycle, behaviour and social organization of the budgerigar Melopsittacus undulatus. Emu 80(1):25–33

Chapter 9

Cockatiels: A Research Subject for Studying Capability for Music Production

Yoshimasa Seki

Abstract Some cockatiels are capable of imitating human music through song, and this ability is potentially a very interesting topic for researchers studying acoustic communication in animals. However, the vocalizations of this species, particularly their vocal sequences, have received very little attention in the academic literature. This chapter reviews my own studies (including unpublished data) investigating the vocal behaviour of cockatiels, focusing on their capability for the imitation of human music and creative vocal production. Three hand-raised cockatiels in my laboratory were exposed to a melody of human music, which was performed through whistling. The birds produced vocal outputs in response to the melody before they had even fully fledged. Then, the birds spontaneously began to imitate the melody. The process by which the imitation developed varied among the birds. After they copied the melody, some of the birds spontaneously sang the melody in synchrony with a playback of the melody, much like humans do when they sing Happy Birthday together. Further, the cockatiels modified the patterns of the melody spontaneously. They also created some novel sound sequences without the presence of the model sounds. These findings provide insights into acoustic communication between conspecifics and heterospecifics. Given that we live in a world in which diverse creatures are living together, understanding others is of great importance.

Keywords Parrots · Rhythm · Synchronization · Creativity · Drumming · Percussion

Y. Seki (✉)
Department of Psychology, Aichi University, Toyohashi, Japan

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
Y. Seki (ed.), Acoustic Communication in Animals, https://doi.org/10.1007/978-981-99-0831-8_9

Every year, numerous melodies originate all over the world. Some of these melodies are completely novel, unlike anything that already exists. Therefore, music is often associated with creativity in humans (e.g., Dietrich and Kanso 2010) and some people think the capability to create music is unique to humans. Why do they think this? Some people think that non-human animals are instinctively constrained in what sound sequences they are capable of producing; thus, the degree of acoustic variation produced by non-human animals is less than that of humans. Another common thought is that, unlike humans, other animals use these types of sounds only in contexts involving reproduction (e.g., courtship and territory defence). Further, some think that the substrates of music production (as well as the functions of the mind) are much less constrained in humans than in animals, for the reasons stated above. However, if we focus on specific individuals, rather than groups of humans and other animals, is this really the case?

As a group, humans have inherited and accumulated huge legacies from their ancestors (including music) in multiple ways (e.g., oral transmission, writing and other methods). Further, many types of musical instruments have been developed, not by individuals within a lifetime, but by people throughout a long human history, continuously drawing upon the work of others. Moreover, in many cultures, people are exposed to various forms of music from childhood. It is undeniable that these factors differentiate humans from other animals. However, if we focus on individuals as opposed to groups, the difference between humans and other animals may not be as large as we originally thought. We cannot test this question in humans, because it is not feasible to design an experiment in which humans are isolated from their own society and culture. Instead, we can examine the capabilities of non-human animals which have been brought up in a human-centric culture. One famous study of this kind is Kellogg and Kellogg (1933), in which a young chimpanzee was raised by a human family, with a mother, father, and baby sibling. The chimpanzee was treated as a child by the parents. It is not easy to do something like this; in comparison, raising animals with exposure to human music is relatively easy. In addition, we need to consider another point in addressing the question above: what is music? It is difficult to define the concept of music; thus, it is also difficult to determine what we can refer to as “music” in the natural behaviour of non-human animals.
However, we discriminate music from other sounds (of course, there are some exceptions). People from one culture may easily judge a melody of popular music from another culture as a kind of music, regardless of whether they have been exposed to that culture previously. Along this line, instead of searching for sounds similar to human music in the wild, we may be able to present actual popular human music to non-human animals to determine their musical tendencies. Several studies have investigated the preference for melodies of human music (or relevant sound sequences) in various non-human animals, e.g., in songbirds (Watanabe and Nemoto 1998), budgerigars (Hoeschele and Bowling 2016), domestic chicks (Chiandetti and Vallortigara 2011), rats (Otsuka et al. 2009), cats (Snowdon et al. 2015a), dogs (Bowman et al. 2017), elephants (French et al. 2018), non-human primates (McDermott and Hauser 2007; Koda et al. 2013; Mingle et al. 2014) and even goldfish (Shinozuka et al. 2013). These studies only considered one side of the coin: perception of human music by non-human animals. However, we should also consider the other side: production (including imitation) of human music by non-human animals. But which animal species is the most suitable for this purpose? If we were to make a list of non-human animals that produce music-like sounds, songbirds would appear at the top of that list. Some famous composers of classical music have been inspired by birdsongs to create melodies (Baptista and Keister
2005). Some musical instruments were made to mimic bird vocalizations (Baptista and Keister 2005). Moreover, songbirds have been referred to as a model species by researchers investigating the capability and origin of human music (e.g., Fitch 2006; Roeske et al. 2020; also see, Snowdon et al. 2015b). However, some scientists mentioned that birdsong does not bear much similarity to human music in some aspects (e.g., Araya-Salas 2012). From this perspective, the famous sulphur-crested cockatoo, Snowball, exhibited quite interesting behaviour. Patel et al. (2009) demonstrated that the cockatoo spontaneously moved in synchrony with various beats of human music. Further, Snowball created original and diverse motor patterns for dancing (Keehn et al. 2019). These reports suggest that melodies of human music were particularly salient to this bird. In addition, the bird produced motor outputs in sync with the rhythm of human music. Therefore, the behaviour shown by Snowball is quite impressive in relation to the present topic. However, synchronization to a musical beat is not the same as copying a melody. In order to copy a melody of human music without musical instruments, pitch or frequency must be somehow represented and modulated. The vocal organs of some animals may serve this purpose. As a next step, we can examine the imitation of human music by non-human animals using vocal outputs. As shown in previous chapters, songbirds are capable of learning novel acoustic patterns based on their own auditory experience. However, this does not always mean that songbirds learn exactly what they heard (there are some exceptions like lyrebirds which imitate a broad range of acoustic stimuli, including artificial noise; e.g., Mulder and Hall 2013). To learn novel sounds requires large physical and cognitive costs (e.g., developing a complex song control neural system, as referenced in the previous chapter). 
Thus, it is reasonable to assume that they may have an internal filter to choose which sounds to copy. This means some songbirds have an innate limitation for learning novel sounds, including the songs of closely related species (e.g., Takahasi and Okanoya 2010). Therefore, we cannot expect that every vocal learning species will imitate the melodies of human music. This could be particularly true for popular laboratory animals, such as zebra finches and Bengalese finches. Nevertheless, there are a few examples in the literature that report songbirds imitating human music. One of the most famous examples is an anecdote called “Mozart’s starling” (Haupt 2017). Wolfgang Mozart came across a European starling that was singing a melody he had composed himself (the composition is known today as K. 453), and he bought the bird (West et al. 2004). More recently, an article reported that another European starling named Kuro sang human music (West and King 1990). Therefore, European starlings may exhibit the tendency to imitate human music. How about other species? Anecdotally, a captive bullfinch in England, named Bullie, was said to sing the national anthem (BBC News 2007). In addition, an academic paper was published in which bullfinches imitated a melody of music performed by human whistling (Nicolai et al. 2014).

9.1 Why Cockatiels?

As described above, occasionally some songbirds imitate human music. What do we know about parrots, another clade of vocal learners in Aves? Some researchers have been interested in this question. In one of the earliest documented studies, Lashley (1913) presented various auditory stimuli as models for the imitation of an Amazon parrot. The study made no mention of the parrot copying musical melodies; however, the bird would sing quite readily when a melody was played, but much less to the presentation of each note of the melody played alone. This means the Amazon parrot was sensitive not to sound notes themselves but to the melodic sequence of them. Another study (Gramza 1970) may contribute knowledge to this topic as well. In the study, vocal mimicry by budgerigars was investigated under various conditions. Budgerigars were exposed to pure tone sequences under different background conditions. The birds mimicked the sound sequence less often in an environment in which music was played in the background than in an acoustically bare environment. More recently, a study reported that an African Grey parrot was trained to produce a vocal sequence following a sound sequence produced by a piano. Each sequence produced by the bird followed similar musical rules as the previous piano sequence in terms of the frequency ratios between the notes (Bottoni et al. 2003). The aforementioned studies suggest that parrots are a potentially good research subject for the present topic; however, they did not demonstrate that parrots imitate melodies of human music. Nevertheless, we can readily see direct evidence of parrots singing melodies of human music on the internet. In fact, it is well known among aviculturists that parrots are just as competent as (or, maybe superior to) songbirds in imitating human music. Probably, the most famous species that mimics the melodies of human music is the cockatiel (Fig. 9.1). 
There are many YouTube videos in which a cockatiel is singing human music, such as the Mickey Mouse Club March and My Neighbour Totoro. However, in those examples the cockatiels do not actually pronounce human words to mimic the lyrics, but just imitate the melody using pure tone-like sounds that are similar to human whistling.

Fig. 9.1 Cockatiels

There are also some YouTube videos in which African Grey parrots sing songs of human music and do attempt to pronounce human words. This is quite interesting, and I have no doubt that this species can sing human songs. However, to analyse the acoustic structures of their vocal sounds imitating human words is not trivial. Therefore, I decided to study “imitation of a melody of human music” rather than “imitation of human songs” in parrots. It is much easier to analyse the whistle-like vocal sounds of cockatiels imitating human music. This is one advantage of using them in academic research for the present topic. Another advantage is that with cockatiels, the mimicry of human music is very obvious, so that it is unequivocal what melody the cockatiel is singing. A third advantage is that they are one of the most popular pet birds in the world, and it is easy to keep them in captivity. For these reasons, in 2017 I obtained six juvenile cockatiels from a local breeder that were around 20–25 days post hatch (dph). At that time, the sex of the birds was not known. Eventually, it was determined that three of the six birds were males and the others were females. The three males (Birds PY, C and PK) were lutino cockatiels and were pale yellow in colour. PY was obtained first, then, C and PK were obtained later on the same day. Of all the six birds, only the three males showed any capability for the imitation of music, therefore, hereafter, I describe the vocal behaviour of the males.

9.2 Development of Acoustic Patterns of the Songs

At the beginning of the study, the cockatiels were exposed to a live performance of my whistling. I whistled a melody similar to the “Mickey Mouse Club March.” Later, my whistling was digitally recorded to use as a template for their imitation. The playback of the recording was used as an auditory stimulus, in addition to my live whistling. It is likely that the melody became a salient stimulus for the birds soon after they first heard it. For example, while the melody was presented, juvenile bird PY frequently produced calls coordinating his vocal timing with the melody (Fig. 9.2). This behaviour began spontaneously soon after the bird’s first exposure to the melody. Then, the birds spontaneously began to imitate the melody. It is difficult to pinpoint the exact time when they began to imitate the melody because they were already vocalizing some sounds when I obtained them from the breeder. However, by

A Yes. U > A No. U≒A
U unaccented words, A accented words

non-native vocabulary and judged that the stimuli (both existent native vocabulary and novel words were given) were non-native vocabulary, to which rendaku does not apply. Our results show that literate preschoolers used orthographic cues in rendaku processing. Preschoolers used different strategies, based on the types of orthography provided in the stimuli. Preschoolers in the no-orthography and Hiragana conditions showed similar tendencies: They applied rendaku to unaccented words more often than to accented ones, whereas those in the Katakana condition seemed reluctant to apply it. It follows from these results that children attend to orthographic cues but continue to use the prosodically based rendaku strategy (Table 13.3).

Why, then, were preliterate preschoolers in the Katakana condition not willing to apply rendaku to unaccented words more often than to accented words? We can think of the following reasons. Preliterate children may be aware of some correspondence between the types of orthography and word categories. They may know that Hiragana is used for a certain category of words and Katakana for another. They may also be aware that Katakana is used to represent non-native Japanese vocabulary, to which rendaku does not apply. Preschoolers first define the rendaku word category based on prosodic information, namely pitch-accent (preschooler-specific rendaku strategy). Our literate preschoolers could differentiate rendaku strategies based on a rough Katakana vs. non-Katakana distinction, but not based on a Hiragana vs. Katakana distinction. We need their longitudinal data to consider the orthographic effects on the developmental changes in children’s language processing.

We examined how literacy affects Japanese children’s mental representations and the developmental discontinuity found in the language-specific rule of “rendaku” processing.
Rendaku is a morphophonemic process in Japanese that voices the initial obstruent of the second element (=E2) of a compound (ori “folding” + kami “paper” → origami “paper folding”: /k/→/g/; Vance 2015). Neuropsychological studies on adult subjects have shown that rendaku involves multiple brain areas that are responsible for phonological (sounds), syntactic (grammar), and semantic (meaning) information of language (Ogata et al. 2000; Sakai et al. 2006). In adult grammar, rendaku applies if the E2 is native vocabulary (Yamato morpheme) and contains no voiced obstruent consonant (the two rendaku conditions; Fukuda and Fukuda 1999). Recent psycholinguistic studies on rendaku acquisition have revealed that preschoolers do not follow adult rendaku conditions. Children originally construct the
prosodically based preschooler-specific rendaku processing (= the preschooler-specific rendaku strategy; Sugimoto 2013a, b) but show a more adult-like processing strategy in middle childhood (Sugimoto 2016a). The preschooler-specific rendaku strategy was observed among English–Japanese bilinguals and blind children (Sugimoto 2016a, 2019). It is clear that children change their rendaku processing strategies, which leads to the developmental discontinuity between early and middle childhood, but we do not know the exact mechanism underlying it. What factors affect their developmental discontinuity? One possible factor would be lexical awareness through the acquisition of Japanese literacy. Typically developing children learn the traditional Japanese writing system comprising three types of orthography: Hiragana, Katakana, and kanji. Hiragana is used for native vocabulary (=rendaku words) and Katakana is used for loan words. Blind children usually learn only one type of orthography: the Braille alphabet, a writing system using combinations of raised dots. How do different orthographic systems affect children’s rendaku processing? We conducted experiments using compound noun production tasks (Nicoladis 2003; Sugimoto 2016a) to investigate factors responsible for developmental discontinuity. Our subjects were typically developing children, congenitally blind children, and children with severe visual disorders, as well as the adult counterparts of each group. We compared the rendaku patterns of children before and after acquiring literacy. Our findings are twofold. First, preliterate preschoolers, both blind and sighted, showed prosodically based rendaku processing. However, after acquiring literacy they showed adult-like rendaku processing. Second, adults’ rendaku processing seemed different between the blind and sighted.
That is, whereas preliterate blind children showed preschooler-specific rendaku processing just like other preliterate preschoolers, both literate blind children with Braille alphabetic knowledge and literate blind adults showed overgeneralizations of rendaku processing. These results show that children’s mastery of literacy motivates their representational redescriptions (Karmiloff-Smith 1992) of the rendaku word category, which, in turn, determine developmental discontinuity and differences in language processing. The types of orthography that children learn to use, that is, the Braille alphabet alone or three types of Japanese characters corresponding to each of the Japanese word categories, determine the directions of the children’s representational redescriptions, resulting in a qualitative difference in rendaku processing strategies between the literate blind and the sighted. Longitudinal experiments were conducted to investigate how literacy affects children’s developmental changes in language processing. Blind Japanese children showed unique developmental tendencies in their language processing strategies in middle childhood after acquiring literacy. The fundamental mechanism behind this developmental change is that the types of orthography learned by children, namely the Braille alphabet alone or three types of Japanese characters corresponding to each of the Japanese word categories, determine the directions of their representational redescriptions. This results in a qualitative difference in language processing strategies between the literate blind and sighted children.
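The two adult rendaku conditions described earlier in this section (E2 is native vocabulary, and E2 contains no voiced obstruent) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the procedure used in these studies: it works on plain romanized strings without accent marks, the onset mapping in `VOICING` and the function names are hypothetical, and the forms `kaze` and `kamera` are extra examples that do not come from the experiments. Real rendaku has many lexical exceptions that this sketch ignores.

```python
# Illustrative sketch of the adult rendaku conditions on plain romaji input.
# All names below (VOICING, compound, ...) are assumptions for this example.

# Voiceless initial obstruents and their voiced counterparts.
# Two-letter onsets come first so "ts", "sh", "ch" match before "t", "s".
VOICING = [
    ("ts", "z"), ("sh", "j"), ("ch", "j"),
    ("k", "g"), ("s", "z"), ("t", "d"), ("h", "b"), ("f", "b"),
]

# Letters that spell voiced obstruents in this romanization.
VOICED_OBSTRUENTS = set("bdgzj")


def has_voiced_obstruent(word: str) -> bool:
    """Return True if the word already contains a voiced obstruent."""
    return any(ch in VOICED_OBSTRUENTS for ch in word)


def voice_initial(word: str) -> str:
    """Voice the initial obstruent of the word, if it has one."""
    for voiceless, voiced in VOICING:
        if word.startswith(voiceless):
            return voiced + word[len(voiceless):]
    return word


def compound(e1: str, e2: str, e2_is_native: bool) -> str:
    """Join E1 and E2, voicing E2's onset only if both conditions hold."""
    if e2_is_native and not has_voiced_obstruent(e2):
        e2 = voice_initial(e2)
    return e1 + e2


if __name__ == "__main__":
    print(compound("ori", "kami", True))         # origami
    print(compound("megane", "karasu", True))    # meganegarasu
    print(compound("kami", "kaze", True))        # kamikaze (blocked by z)
    print(compound("himawari", "kamera", False)) # himawarikamera (loanword)
```

The preschooler-specific strategy discussed in this chapter differs from this sketch in that it is driven by pitch accent rather than by lexical category, so it would need the accent pattern of E2 as an additional input.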

T. Sugimoto

Fig. 13.2 Developmental change in the children’s rendaku processing strategy

Preliterate preschoolers, both the blind and the sighted, showed prosodically based rendaku processing. After acquiring literacy, they showed adult-like rendaku processing (Fig. 13.2). Blind children showed a developmental change in rendaku processing after acquiring literacy in Braille. Both literate blind children (with Braille alphabetic knowledge) and literate blind adults showed overgeneralizations of rendaku processing. Hypothesis A was thus supported: Literacy motivates children’s redefinition of rendaku (a → b → c).

(a) Literacy development based on different orthographic systems (Kana or Braille).
(b) Different redefinitions of the rendaku word category.
(c) Distinctive rendaku processing strategies.

13.3 Study 2

We have seen that preliterate children find a unique way to rely on the prosodic property of words in rendaku processing. How do they learn to effectively process right-headed compounds that have their syntactic head at the end of the compounds? Hirose and Mazuka (2017) demonstrated that Japanese-speaking adults and first graders both show anticipatory compound processing based on the language-specific compound accent rule (=CAR). That is, children aged 6–7 years can exploit compound prosody to disambiguate the structure and meaning of a given compound. However, we do not know exactly when and how children start exploiting the CAR to properly comprehend compounds. Thus, we investigated Japanese-speaking children’s acquisition of the CAR and their development of compound processing. We conducted longitudinal experiments using compound comprehension tasks on 65 Japanese-speaking children aged between 2 and 4 years. We found that children’s compound processing strategies changed after they acquired CAR. Before acquiring it, children could not identify the compound head. Instead, they showed a
language-general parsing preference for the left-most part of a compound. Our results suggest that children’s acquisition of language-specific CAR enables their compound processing. We investigated Japanese-speaking children’s acquisition of the CAR and their development of compound processing. We conducted cross-sectional and longitudinal experiments using compound comprehension tasks on 101 Japanese-speaking children aged between 2 and 5 years. We sought to investigate the developmental process of compound processing in young children in relation to their acquisition of CAR and lexical knowledge. Accordingly, we asked the following research questions.

1. When and how do children start exploiting the CAR to comprehend compounds?
2. Is the CAR (or prosodic information) alone sufficient to comprehend compounds?
3. What kind of information do children use to comprehend compounds?

13.3.1 Methods

We conducted cross-sectional and longitudinal studies on children aged between 2 and 5 years (Table 13.4).

13.3.1.1 Procedure (Compound Comprehension Task)

We conducted an experiment using a forced-choice task on Japanese-speaking children aged between 2 and 4 years. The children were tested individually in a quiet room. Each child was given a set of four pictures on the PC screen. After listening to a compound noun through headphones, the children were asked to choose the one picture that best denoted the word they heard. We found two types of compound processing during development: E1- and E2-based processing. We used forced-choice tasks of identifying one out of four pictures on a PC monitor (Appendix 13.5).

13.3.1.2 Materials

A total of 14 endocentric (=head final) compounds were used. We used seven non-native words and seven native words in this experiment.

Table 13.4 Experimental designs

                                Participants                          N (females)
1. Cross-sectional study        Children aged between 2 and 5 years   101 (55)
2. Longitudinal study (T1–3)    Children aged between 2 and 4 years   20 (10)

E1 + E2 = Compound Noun, e.g., megane + karasu → meganegarasu (“eyeglasses” + “crow” → “glasses-wearing crow”).

13.3.2 Results and Discussion

The following results ensued (Table 13.5 and Fig. 13.3): Our two-way ANOVA (N2 Word Type (2) × Age Group (5)) found a significant interaction between word types and age groups [F(4,84) = 2.963, p < 0.01] and a significant simple effect of word types on age groups [F(1,84) = 15.712, p < 0.01]. These results indicate that children aged 2 years can exploit the CAR to comprehend compounds with non-native N2s. However, they do not rely on the CAR alone to properly process compound nouns: they need lexical categorical knowledge of E2 to recognize native E2s. Thus, they use different information in processing compounds. Children aged 2 years can make use of the CAR to comprehend compounds properly with type A N2s (=E2s). However, they had difficulty comprehending compounds with native N2s (=E2s) (Fig. 13.4).

Table 13.5 Descriptive statistics by word condition

Age groups (MA; age range)    A. Non-native N2 (SD)    B. Native N2 (SD)    N (females)
2 years (32 m; 24–35 m)       5.0 (1.27)               3.25 (1.39)          16 (8)
3 years (39 m; 36–41 m)       5.06 (1.55)              3.5 (1.72)           18 (9)
3.5 years (46 m; 42–47 m)     5.63 (1.41)              4.46 (1.69)          24 (12)
4 years (53 m; 48–59 m)       6.57 (1.42)              6.18 (1.42)          28 (14)
5 years (61 m; 60–65 m)       7 (0.0)                  7 (0.0)              15 (7)
Total (24–65 m)               5.74 (1.36)              4.67 (1.96)          101 (55)

Fig. 13.3 The development of compound processing by word type

13 The Interplay Among the Linguistic Environment, Language Perception,. . .

Fig. 13.4 The developmental trajectory from the longitudinal study (compounds with native N2s)

1. Children aged 2 years initially showed left-headed compound processing.
2. They shifted from left- to right-headed processing at around 3 years of age.
3. Their vocabulary age at T1 predicted their performance at T2 (r = 0.69).
4. Children combined both prosodic and lexical knowledge to develop their compound-processing ability.

We investigated Japanese-speaking children’s acquisition of the CAR and their development of compound processing. We conducted cross-sectional and longitudinal experiments using compound comprehension tasks on 101 Japanese-speaking children aged between 2 and 5 years. We found that children’s compound-processing strategies changed after they acquired CAR. Children could not identify the compound head before they acquired CAR. Instead, they showed a language-general parsing preference for the left-most part of a compound. Our results show that children’s acquisition of the language-specific CAR and lexical knowledge enabled their compound processing.

13.4 Summary and Conclusion

A series of language production and comprehension studies showed some of the developmental characteristics of Japanese-speaking children’s language-specific development. When they have no orthographic knowledge, Japanese-speaking
preschoolers first characterize the lexical category (Native, Sino-Japanese, or Foreign) based on pitch-accent, that is, the preschooler-specific rendaku strategy. After acquiring orthography, children gradually change their rendaku processing strategy, with information such as that on the relationship between orthography and word category. Something more is necessary for their redefinition of the rendaku category, which may cause a developmental change in their rendaku processing strategy.

Acknowledgments This is a widely revised version of my talks presented at two annual meetings of the Cognitive Science Society, one in 2016 (Philadelphia, USA) and the other in 2019 (Montreal, Canada). The author is grateful to all the participants in this study. This work was financially supported in part by JSPS KAKENHI Grant #17K02764, which was awarded to the author, as well as by the Rendaku Project organized by the National Institute of Japanese Language and Linguistics (NINJAL).

An Example of Experimental Stimuli (No-Orthography Condition)

Verbal Instructions in the Experiment

E1: Kore-wa himawari-desu. “Here’s a hima’wari.”
E2: Kore-wa karasu-desu. “Here’s a ka’rasu.”
Compound: Koreni namae-o tuketekudasai. “How would you name it?”

Cross-Modal Linguistic Stimuli Used in the Production Task

① Presentation of only visual stimuli (Control)

② Presentation of cross-modal stimuli (Hiragana condition)

List of Stimuli for 16 E2s Used in the Test Trial

8 known words
Accented: 1. ta’nuki “racoon dog”; 2. ka’rasu “crow”; 3. ho’uki “broom”; 4. ho’taru “light bug”
Unaccented: 1. sakura “cherry blossom”; 2. tsukue “desk”; 3. hatake “field”; 4. kuruma “car”

8 novel words (Old Japanese)
Accented/unaccented^a: 1. tokama “reaping hook”; 2. hikime “long arrow”; 3. sasara “an old musical instrument”; 4. koromo “Kimono”
Unaccented/accented^a: 1. tekona “virgin”; 2. hokai “food container”; 3. tatara “bellows”; 4. hokora “god’s palace”

E1: himawari “sunflower”; E2: 16 words listed above
^a For eight novel words, we created two types of pitch-accent assignment patterns to counterbalance possible phonotactic effects. We divided children in each of the three conditions into two groups and used different pitch-accent assignment for each group in each condition

Example of Stimuli Used in the Forced Choice Tasks

Two Types of Pitch-Accent Assignment Groups

Novel word stimuli (E2)           Group A of each condition                      Group B of each condition
Tekona, hokai, tatara, hokora     Accented, e.g., te’kona (anti-penult)          Unaccented, e.g., tekona (no pitch accent)
Tokama, hikime, sasara, koromo    Unaccented, e.g., tokama (no pitch accent)     Accented, e.g., to’kama (anti-penult)

References

Ambridge B, Lieven E (2011) Child language acquisition: contrasting theoretical approaches. Cambridge University Press, Cambridge
Bauer L (2009) Typology of compounds. In: Lieber R, Stekauer P (eds) The Oxford handbook of compounding. Oxford University Press, Oxford, pp 343–356
Clark E, Gelman S, Lane N (1985) Compound nouns and category structure in young children. Child Dev 56:84–94
Di Sciullo A (2009) Why are compounds a part of human language? A view from asymmetry theory. In: Lieber R, Stekauer P (eds) The Oxford handbook of compounding. Oxford University Press, Oxford, pp 145–177
Fukuda SE, Fukuda S (1999) The operation of rendaku in the Japanese specifically language-impaired: a preliminary investigation. Folia Phoniatr Logop 51(1–2):36–54
Fukuda SE, Fukuda S (2001) An asymmetric impairment in Japanese complex verbs in specific language impairment. Cogn Stud Bull Jpn Cogn Sci Soc 8(1):63–84
Hirose Y, Mazuka R (2017) Exploiting pitch accent information in compound processing: a comparison between adults and 6- to 7-year-old children. Lang Learn Dev 13(4):375–394
Itô J, Mester R-A (1986) The phonology of voicing in Japanese: theoretical consequences for morphological accessibility. Linguistic Inquiry 49–73
Itô J, Mester R-A (1995) Japanese phonology. In: Goldsmith JA, Riggle J, Alan CL (eds) The handbook of phonological theory. Blackwell, Oxford, pp 817–838
Itô J, Mester R-A (2001) Alternations and distributional patterns in Japanese phonology. J Phonetic Soc Jpn 5(2):54–60
Karmiloff-Smith A (1992) Beyond modularity: a developmental perspective on cognitive science. The MIT Press
Kubozono H (2006) Where does loanword prosody come from: a case study of Japanese loanword accent. Lingua 116(7):1140–1170
Kubozono H (2011) Japanese pitch accent. In: Oostendorp M, Hume E, Rice K (eds) The Blackwell companion to phonology. Blackwell, Oxford, pp 2879–2907
Kubozono H (2015) Introduction to Japanese phonetics and phonology. In: Kubozono H (ed) Handbook of Japanese phonetics and phonology. Walter de Gruyter, Berlin, pp 1–40
Labrune L (2013) A cross-linguistic approach to Rendaku-like compound-markers, with special reference to Korean and Basque. Talk presented at 3rd international conference on Phonetics & Phonology, Tokyo
McCawley J (1968) The phonological component of a grammar of Japanese. Mouton, The Hague
Nicoladis E (2003) What compound nouns mean to preschool children. Brain Lang 84(1):38–49
Nicoladis E (2007) Preschool children’s acquisition of compounds. In: Libben G, Jarema G (eds) The representation and processing of compound words. Oxford University Press, Oxford, pp 96–124
Ogata E, Hayshi R, Imaizumi S, Hirata N, Mori K (2000) Neural processing mechanism for Rendaku and accent rules in compound word recognition. Onsei 99(678):17–24, IEICE
Omura A (2000) Observation methods. In: Research methods in educational psychology (=kyouikusinrigaku-no gihou). Fukumura Shuppan, Tokyo
Sakai H, Miyatani M, Tanaka J, Yoshimura M, Maruisi S, Muranaka H (2006) Hanashikotoba niokeru fukugougoshori no nokinou. (=Cortical function for information binding in compound word processing in spoken language: an fMRI investigation.). Proc Cogn Sci Soc Jpn 23:34–37
Stoel-Gammon C (2011) Relationships between lexical and phonological development in young children. J Child Lang 38(1):1–34
Sugimoto T (2013a) The role of pitch accent in the acquisition of Rendaku. Poster presented at the 3rd international conference on phonetics and phonology, Tokyo
Sugimoto T (2013b) The acquisition of Rendaku: the lexical strata and the Lyman’s law. Proceedings of the 146th annual meeting of the Linguistic Society of Japan, pp 67–72
Sugimoto T (2015a) Development of preschooler-specific language processing strategy: a longitudinal study. Proceedings of the 25th annual meeting of Japan Society of Developmental Psychology, p 238
Sugimoto T (2015b) Children’s language processing during the transition from early childhood to middle childhood. Proceedings of the Annual Meeting of Japan Society of Educational Psychology, p 332
Sugimoto T (2016a) Acquisition of Rendaku by bilingual children: a case study. NINJAL Res Papers 11:1–9
Sugimoto T (2016b) Children’s use of orthographic cues in language processing. In: The Proceedings of the 38th Annual Conference of the Cognitive Science Society. Cognitive Science Society, Austin, TX, pp 883–888
Sugimoto T (2017) The developmental aspects of children’s rendaku processing strategies. In: Vance T et al (eds) Papers in Rendaku research (Rendaku-no Kenkyu: selected papers from the rendaku project). Kaitakusha Co. Ltd, Tokyo, pp 181–197
Sugimoto T (2019) Equifinality and multifinality in children’s language development. Lang Cult 41:101–114
Vance T (2015) Rendaku. In: Kubozono H (ed) Handbook of Japanese phonetics and phonology. Walter de Gruyter, Berlin, pp 397–441

Chapter 14: Sound Processing in the Auditory Periphery: Toward Speech Communication and Music Comprehension

Toshie Matsui

Abstract When we communicate through spoken language or share music, sound first enters through the auditory periphery. Our auditory system extracts the pitch of a sound and the shape of its resonator (the body of an instrument or the vocal tract). Using pitch information, we recognize pitch accents and remember melodies. Using resonator shape information, we recognize vowels and distinguish between instruments. This chapter provides an overview of the kinds of sound processing in our auditory system that lead to an understanding of speech and music, based on previous studies using psychological experimental approaches.

Keywords Speech · Music · Pitch · Source-filter model · Vocal tract · Melody · Timbre

14.1 Introduction

Our hearing system recognizes consonants and vowels in speech sounds and comprehends them as words. It also recognizes pitch transitions and types of instruments in musical sounds and perceives them as music. Our sense of hearing receives and processes various sounds other than voices and music. We can sense when someone is trying to tell us something and infer what is happening around us. Depending on whether the subject is speech, music, or something else, the important sound features and their combinations differ. However, as long as the medium is sound, it should undergo common processing in the auditory periphery regardless of the input. More complex cognition becomes possible through processing in the periphery, followed by the extraction of basic information and by the classification and integration of that information at a higher level. Speech and music, which are identified in this chapter as representative forms of communication in human beings, also share the same information extracted from the periphery, and speech-specific and music-specific processing are performed in the subsequent stages.

In this chapter, we discuss the important elements that are extracted by the common processing of the auditory periphery in completely different forms of sound communication such as human speech and music. Additionally, we examine previous studies that characterize speech and music and discuss how these characteristics may be related to peripheral processing. Speech and music are often treated separately; however, their perception and cognition are deeply related. The convergence of the two domains is described at the end of this chapter.

T. Matsui (✉) The Electronics-Inspired Interdisciplinary Research Institute, Toyohashi University of Technology, Toyohashi, Japan e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 Y. Seki (ed.), Acoustic Communication in Animals, https://doi.org/10.1007/978-981-99-0831-8_14

14.2 Mechanism of Auditory Periphery

In the human auditory system, as in that of many mammals, sound is processed stepwise through the outer ear, middle ear, inner ear, auditory nerve, subcortical nuclei, and cortex. The fringe of cartilage that protrudes from the side of the head is called the auricle. It collects sound. In addition, the shape of the auricle modifies the frequency spectrum of a sound depending on the direction of its source, which can serve as a cue for perceiving the sound source direction, especially its elevation (perceptual experiments: Gardner and Gardner 1973; head-related transfer function (HRTF) simulation research based on auricle shape: Takemoto et al. 2012). The eardrum, a thin membrane, lies at the innermost part of the external auditory canal and separates it from the middle ear. The eardrum vibrates in response to fluctuations in sound pressure, converting sound waves into mechanical vibrations. These vibrations are transmitted to the cochlea via the auditory ossicles, which are the smallest bones in the human body.

The cochlea consists of two and a half turns of a tube with a diameter of about 2 mm and a length of about 35 mm. The ossicle vibration that reaches the cochlea sets the lymph fluid in motion, which in turn vibrates the basilar membrane that extends along the scala media inside the cochlea. The thickness and width of the basilar membrane change from the base to the apex of the cochlea, so that the vibrations are resolved by frequency in the form of membrane resonances. Auditory nerve fibers connected to the basilar membrane fire in synchrony with the resonance of the membrane. The resulting nerve signals are then transmitted to the central nervous system through the auditory nerve. Thus, the middle ear (ossicles) and the inner ear (cochlea) together provide an impedance-matching function for efficiently converting air vibration to liquid vibration.

There are two ways of encoding the vibration of the cochlear basilar membrane into neural firing.
The cochlear basilar membrane performs frequency resolution because its resonance frequency differs from place to place depending on its thickness and width. When a certain site on the basilar membrane resonates strongly, the auditory nerve fibers connected to that site fire frequently. Thus, the frequency characteristics of the sound input to the peripheral auditory system are encoded as neural excitation patterns (place coding). In addition, the auditory nerve fires when it receives signals from the inner hair cells located on the basilar membrane. The stereocilia on the upper side of the inner hair cells (opposite the basilar membrane) open their mechanosensitive ion channels only during specific phases of the oscillation. As a result, nerve firing is tied to a particular phase of the vibration. This property is called phase locking, and it preserves temporal information regarding the period of the vibration waveform (temporal coding). With these two encoding methods, we perceive sound features from multiple perspectives.

The information that is extracted by each nucleus in the pathway from the auditory nerve to the central nervous system is mentioned only briefly here because it remains controversial. Before the signal reaches the cerebral cortex, it passes through many nuclei between the brainstem and the primary auditory cortex, such as the cochlear nucleus, superior olivary complex, nucleus of the lateral lemniscus, inferior colliculus, and medial geniculate body (see Pickles 2013 for a detailed description of the auditory system). By the stage of the primary auditory cortex, the processing for frequency analysis, pitch extraction, and sound image localization, which constitutes the direct perception of sound, is almost complete. Information from the left and right ears already crosses at the stage of the superior olivary complex.

Some of this brainstem activity can be measured on the scalp as auditory brainstem responses. This is a type of electroencephalogram (EEG) and reflects responses at the brainstem level with latencies of 10 ms or less. Healthy listeners show a typical waveform pattern, and its presence or absence can be used to assess brainstem function. It is also used for newborn hearing screening. Another brainstem-related feature is the preservation of sound frequency structure from the nuclei in the brainstem to the primary auditory cortex.
This is called tonotopicity because the characteristic frequency corresponds to the spatial arrangement of innervation. It is comparable to retinotopy in vision. There are, however, conflicting reports regarding the characteristic frequency gradient of the tonotopic map, and no consensus has been reached (see recent studies, e.g., Langers and van Dijk 2012; Baumann et al. 2013; Saenz and Langers 2014). At levels higher than the brainstem, the auditory cortex is linked to functions such as language.

As briefly outlined above, the auditory system extracts information step by step and integrates it at higher levels to create the final auditory image. An auditory image is something that is recognized as a unified object, such as a voice, an instrumental sound, or an environmental sound, while changing from moment to moment. While the question of how we perceive things holistically often remains unresolved, features of auditory images can often be explained by the results of peripheral processing. The following sections describe how peripheral auditory processing defines the relationship between sounds, focusing on speech and music.
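
The two encoding schemes described in this section can be illustrated with a toy calculation. The sketch below is not a model of the cochlea. It assumes that the place code can be approximated by the signal energy at a few hypothetical channel frequencies, and that phase locking can be approximated by one "spike" at each upward zero crossing. The sampling rate, channel frequencies, and all function names are illustrative assumptions.

```python
import math

FS = 16000  # sampling rate in Hz (illustrative assumption)

def tone(freq_hz, dur_s=0.05, phase=0.3):
    """Pure tone sampled at FS."""
    n = int(FS * dur_s)
    return [math.sin(2 * math.pi * freq_hz * i / FS + phase) for i in range(n)]

def channel_energy(signal, centre_hz):
    """Energy near one channel frequency (crude stand-in for a resonating site)."""
    re = sum(s * math.cos(2 * math.pi * centre_hz * i / FS) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * centre_hz * i / FS) for i, s in enumerate(signal))
    return (re * re + im * im) / len(signal) ** 2

def place_code(signal, centres):
    """Place cue: the channel frequency that responds most strongly."""
    energies = [channel_energy(signal, c) for c in centres]
    return centres[energies.index(max(energies))]

def spike_times(signal):
    """Temporal cue: one 'spike' per upward zero crossing, in seconds."""
    return [i / FS for i in range(1, len(signal)) if signal[i - 1] < 0 <= signal[i]]

def mean_interval(spikes):
    """Average time between successive spikes."""
    gaps = [b - a for a, b in zip(spikes, spikes[1:])]
    return sum(gaps) / len(gaps)

centres = [250, 500, 1000, 2000, 4000]   # hypothetical channel frequencies in Hz
x = tone(500)
best_channel = place_code(x, centres)                # which "site" resonates
period_ms = 1000 * mean_interval(spike_times(x))     # period kept by spike timing
print(best_channel, round(period_ms, 2))
```

For the 500 Hz tone, both cues point to the same frequency: the strongest channel is the 500 Hz one, and the spikes are spaced 2 ms apart.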

14.3 Peripheral Speech Processing

Human vocalization is more diverse than that of other primates. The evolution of human beings toward upright bipedalism created a large space in the oral cavity. Changes in the shape of the articulators in the oral cavity (especially the tongue) produced a structure that allows complex and rapid movement (Takemoto 2001; Takemoto et al. 2006). Human vocalization also relies on the acquisition of a neural basis that allows voluntary control of breathing during speech (Maclarnon and Hewitt 2004). More recently, Nishimura et al. (2022) showed through anatomy and vocalization modeling that a loss of laryngeal complexity during the evolution from nonhuman primates to human beings brought greater vocal stability.

Roughly described, human speech originates from a buzz-like sound emitted from the vocal folds. This glottal sound becomes speech sound through frequency modification in the vocal tract, which is the articulating organ. In signal-processing terms, this process can be described as the combination of a source and a filter. This idea is called the source–filter model of speech (Chiba and Kajiyama 1941; Fant 1970). The vocal folds, which provide the sound source, and the articulators, which work as the filter, move almost independently.

The two factors that constitute the human speech production process represented by the source–filter model are reflected in two aspects of the acoustic signal. The glottal source wave, which is the sound source of the voice, is generated by the glottis periodically opening and closing. Therefore, the glottal source wave determines the temporal periodicity (fundamental frequency) of the vocal acoustic signal. Furthermore, specific vowels are produced depending on the shape of the vocal tract at the time the source wave is generated. Since the vocal tract can be regarded as a resonance tube, a difference in the shape of the vocal tract results in a difference in resonance frequencies.
A peak at a resonance frequency is called a formant, and each vowel has its own combination of formants. Consonants are generated depending on where, for how long, and at what moment the vocal tract is closed. Compared with vowels, they generally appear as short-time changes in waveform and frequency.

The speech sounds we use in our daily communication are formed through such a process. For example, in pitch-accent languages such as Japanese and tone languages such as Chinese, the same consonant-vowel combination (or a vowel alone) can be pronounced as different words by changing the fundamental frequency transition. While the vocal tract shape changes in the same way, the two utterances differ in the transition of the glottal source period. Conversely, if different consonants and vowels are uttered at a fixed fundamental frequency, their glottal source wave remains the same while their vocal tract shapes change in different patterns. (Even if the same word is pronounced with the same pitch accent, the pattern of the glottal sound source may differ if the emotions conveyed are different.)

Speech produced in this manner is analyzed at the auditory periphery. As explained in Sect. 14.2, after the signal is frequency resolved on the cochlear basilar membrane, the sound information is encoded based on two separate aspects: which sites on the cochlear basilar membrane resonated and at which period the auditory nerves fired. In signal-processing terms, the frequency-resolving function of the cochlear basilar membrane can be regarded as a bank of filters, which are called auditory filters. Phase locking of the auditory nerve preserves the period of the output waveform from each auditory filter. Some representative computational models of hearing explain the perceived period of an acoustic signal by summing the periodicity of the firing timing across all channels of the auditory filterbank (Patterson et al. 1995; Glasberg and Moore 1990; see also the physiological model suggested by Hewitt and Meddis 1994).

For example, suppose that a voiced vowel is input to the auditory system. In the frequency domain, the spectral envelope of a voiced vowel has several peaks (formants), and the overall envelope has a slope in which the power decreases as the frequency increases. In the time domain, a voiced sound signal has a periodically repeating structure. The auditory system encodes the spectral shape of the vowel as the locations of resonance on the cochlear basilar membrane, and the period of the vowel’s waveform as the firing timing of the auditory nerve. When a speaker utters different vowels with the same fundamental frequency, the listener’s auditory system receives different place information about the resonance and common temporal information about the neural firing. The listener then perceives that different vowels are uttered at the same pitch. If the same vowel is uttered at several different fundamental frequencies, the listener’s auditory system receives different temporal information and common place information. The listener then perceives that the same vowel is uttered at several different pitches. The three stages of utterance, auditory encoding, and perceptual representation thus roughly correspond to one another.
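
The correspondence between production and encoding described above can be sketched numerically with the source–filter idea. The following is a minimal illustration under stated assumptions, not the auditory models cited in the text: the glottal source is idealized as an impulse train, each formant is a simple two-pole resonator, the temporal cue is the lag of the largest autocorrelation, and the place cue is the strongest harmonic. The formant values are rough, hypothetical choices placed on harmonics of the fundamental for clarity.

```python
import math

FS = 8000        # sampling rate in Hz (assumption)
F0 = 125         # fundamental frequency in Hz, i.e. a period of 64 samples
N = 64 * 32      # 32 glottal periods

def glottal_source():
    """Idealised source: one impulse per glottal period."""
    period = FS // F0
    return [1.0 if i % period == 0 else 0.0 for i in range(N)]

def resonator(x, freq_hz, bw_hz=80.0):
    """Two-pole resonator standing in for one vocal tract resonance (formant)."""
    r = math.exp(-math.pi * bw_hz / FS)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / FS)
    a2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for s in x:
        out = s + a1 * y1 + a2 * y2
        y.append(out)
        y1, y2 = out, y1
    return y

def vowel(formants_hz):
    """Filter the shared source with a cascade of formant resonators."""
    y = glottal_source()
    for f in formants_hz:
        y = resonator(y, f)
    return y

def period_in_samples(x, min_lag=16, max_lag=128):
    """Temporal cue: lag with the largest autocorrelation."""
    def acf(lag):
        return sum(x[i] * x[i + lag] for i in range(len(x) - lag))
    return max(range(min_lag, max_lag + 1), key=acf)

def strongest_harmonic_hz(x):
    """Place cue: the harmonic of F0 that carries the most energy."""
    def power(freq):
        re = sum(s * math.cos(2 * math.pi * freq * i / FS) for i, s in enumerate(x))
        im = sum(s * math.sin(2 * math.pi * freq * i / FS) for i, s in enumerate(x))
        return re * re + im * im
    return max((k * F0 for k in range(1, 32)), key=power)

vowel_a = vowel([750, 1250])   # rough /a/-like formants (hypothetical values)
vowel_i = vowel([250, 2250])   # rough /i/-like formants (hypothetical values)
periods = (period_in_samples(vowel_a), period_in_samples(vowel_i))
peaks = (strongest_harmonic_hz(vowel_a), strongest_harmonic_hz(vowel_i))
print(periods, peaks)
```

Both synthetic vowels share the same period of 64 samples (common temporal information), while their strongest spectral peaks fall at different frequencies (different place information).
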
In the case of unvoiced (whispered) vowels, turbulence in the airflow is the sound source because whispering does not involve the opening and closing of the vocal folds. Therefore, the periodicity of the speech becomes ambiguous, and consequently the perceived pitch also becomes ambiguous. Although there is no clear pitch, a certain kind of “pitch” is often produced in whispering by changing the formants, and we often use it as intonation in whispers (e.g., Higashikawa et al. 1996).

However, the same vowel does not consistently produce the same pattern of auditory nerve excitation. Different speakers have different formant frequencies. Groups with different average vocal tract lengths (VTLs), such as male adults, female adults, and children, have different average formant frequencies (Hillenbrand et al. 1995). Formant frequencies decrease as a child’s VTL grows with height (Lee et al. 1999; Huber et al. 1999). Yet, we can converse with children as well as with male and female adults. Thus, we identify a certain vowel by associating it not with a specific formant frequency or a specific spectral envelope, but with the information obtained by “normalizing” the spectral envelope (the excitation pattern of the auditory nerve).

We can also extract speaker information. When the excitation pattern for vowels is normalized, the information on the size of the vocal tract is separated from it. We can also discriminate vocal tract sizes with a certain degree of accuracy. In general, estimating the size of an unseen opponent is an important ability for one’s own safety.
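
The relationship between vocal tract length and formant frequencies can be made concrete with a textbook idealization that is not part of this chapter: a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n − 1)c / (4L). The sketch below uses this approximation with hypothetical tract lengths. The "normalization" shown, dividing by the first formant, is only a toy stand-in for the normalization performed by the auditory system.

```python
SPEED_OF_SOUND_CM_S = 35000.0   # approximate speed of sound in the vocal tract

def tube_formants(vtl_cm, count=3):
    """Resonances of a uniform tube closed at one end: (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * vtl_cm)
            for n in range(1, count + 1)]

def normalise(formants):
    """Divide by the first formant so that only the pattern of ratios remains."""
    return [f / formants[0] for f in formants]

adult = tube_formants(17.5)   # hypothetical adult vocal tract length in cm
child = tube_formants(12.5)   # hypothetical child vocal tract length in cm

print(adult, child)
print(normalise(adult), normalise(child))
```

In this idealization, the shorter tract raises every formant by the same factor (17.5 / 12.5 = 1.4), while the normalized pattern is identical for both speakers. The scale factor carries the size information and the ratios carry the shape information.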

Many experiments have been conducted to directly estimate a person’s height and weight from the voice (van Dommelen and Moxness 1995; Gonzalez 2004; Pisanski et al. 2014). Experimental results have also been reported in which listeners misjudge a speaker’s physique when the speaker deliberately modifies the voice to convey a different body size (Pisanski and Reby 2021). These experiments showed that speech provides cues to the speaker’s size, though its effect on perception is limited. However, these limited effects may be due to the small size variance among real speakers and to the fact that estimating height and weight is itself a difficult task.

In an EEG experiment with newborns (Vestergaard et al. 2009), newborns showed mismatch negativity (MMN) responses similar to those of adults with respect to the size discrimination of instrumental sounds. Behavioral experiments with newborns (Pietraszewski et al. 2017) also showed that newborns can recognize the relationship between speech sound and body size. Blind listeners have been shown to discriminate speaker size with the same degree of accuracy and error as sighted listeners (Pisanski et al. 2016, 2017). These results suggest that speaker size discrimination is an innate function of the auditory system.

Recordings of real human voices were used in these experiments. To investigate the discrimination threshold for vocal tract size, signal processing that allows independent control of vocal tract shape and vocal tract size is necessary. STRAIGHT, a speech analysis/synthesis system, was created as a method to independently control the waveform period and the scaling of the spectral envelope of speech signals (Kawahara et al. 1999, 2008). This corresponds to independently controlling the fundamental frequency of the utterance and the size of the vocal tract.
Scaling only the spectral envelope in the frequency domain makes it sound as if the speaker’s size has changed while the vowel type and perceived pitch are preserved. This scaling in the frequency domain corresponds to the change in resonant frequency that occurs when the size of an acoustic tube is physically changed. Several experiments have been conducted with synthetic speech to measure the sensitivity to scaling of the spectral envelope: speaker size discrimination for vowels (Smith et al. 2005) and for syllables (Ives et al. 2005). The former reported a just noticeable difference (JND) of 8% for size discrimination with vowels, whereas the latter reported a JND of 5% with syllables. An experiment using a different speech synthesis method, one that does not rely on a vocoder such as STRAIGHT, reported JNDs of 5–6% for formant frequencies (Pisanski and Rendall 2011). These JNDs for vocal tract size are smaller than the 11% JND for the loudness of noise (Miller 1947). The perception of size from sound arguably has a relatively high resolution in the auditory system.

Most of these previous studies used voiced sounds. Voiced speech is driven by a stream of glottal pulses, whereas unvoiced (whispered) speech is driven by continuous noise (Tartter 1989, 1991). When the vowel recognition rate and size discrimination accuracy were measured for voiced and unvoiced speech, the JND of the geometric mean of the formant frequencies was approximately 5% for both (Irino et al. 2012). This demonstrates that human beings can detect speaker size with high accuracy in both voiced and unvoiced speech. Computational processes for size extraction have also been proposed (Irino et al. 2017; Matsui et al. 2022). These studies are based on the normalization function of excitation patterns in the auditory model (Irino and Patterson 2002).

Numerous perceptual experiments in previous research have shown that the auditory system can encode speech into three components: the glottal source sound, the shape of the vocal tract, and the size of the vocal tract. These experimental results inspired computational models implementing the possible processing steps. Although the proposed process is based primarily on perceptual experiments with manipulated speech, it has gained credibility because of its consistency with the physiological function of the auditory system. Hearing, however, does not exclusively process speech. The next section considers how this process works for musical sounds.
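
To give a feel for what a JND of 5–8% in vocal tract size means, the following sketch counts how many JND-sized steps fit between two speakers. It assumes that the JND is a constant proportion across the whole range (a Weber-like simplification that the cited studies do not necessarily claim), and the size ratio of 1.4 is a hypothetical value chosen for illustration.

```python
import math

def discriminable_steps(size_ratio, jnd):
    """Number of JND-sized proportional steps needed to span a given size ratio."""
    return math.log(size_ratio) / math.log(1.0 + jnd)

JND_VOWELS = 0.08      # Smith et al. 2005, as cited in the text
JND_SYLLABLES = 0.05   # Ives et al. 2005, as cited in the text
RATIO = 1.4            # hypothetical ratio between two vocal tract lengths

steps_vowels = discriminable_steps(RATIO, JND_VOWELS)
steps_syllables = discriminable_steps(RATIO, JND_SYLLABLES)
print(round(steps_vowels, 1), round(steps_syllables, 1))
```

Under these assumptions, a size ratio of 1.4 spans roughly four discriminable steps for vowels and roughly seven for syllables.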

14.4 Peripheral Processing of Musical Tones

Both speech and music serve to communicate with others. To communicate, it is necessary to freely (spontaneously) change the sound signal that is transmitted to others. In speech, consonants and vowels are produced by changing the shape of the vocal tract over time. Changes in the periodicity of vocal fold vibration are used for pitch accent or for the expression of emotions and intentions. In music (this section is limited to modern Western music), pitch transitions shape the melody, which is an important element of music. Layering multiple pitches builds harmony. Rhythms are created by temporal variations on top of periodic beats. Combining instruments with various sounding mechanisms produces orchestration. Signals thus produced under the control of the speaker or performer reach the listener and are understood as speech or music.

The previous sections discussed how the human auditory system works and how it analyzes speech. Here, we discuss how the analysis by the auditory periphery functions in music perception and cognition, especially in modern Western music. Since the nineteenth century, research on both speech and music has addressed the question of how the human auditory system extracts pitch, a psychological quantity, from the physical properties of sound (Helmholtz 1863; Seebeck 1841, referred to in Plomp 1967). Especially in modern Western music, pitch has been considered an important aspect of music because a set of discrete pitches creates a musical scale and builds harmony. The use in musical performance of both sounds with a definite pitch and sounds with an ambiguous pitch may have led to an interest in the physical properties of sound that produce pitch.

As with speech, the periodicity of sound is perceived as pitch. Musical sounds with clear periodicity, such as those of string and wind instruments, have a clear pitch and can carry a melody. Musical sounds with unclear periodicity, such as those of drums and cymbals, have an unclear pitch and cannot form melodies or harmonies. Sung melodies are clear when sung with voiced sounds, but melodies sung in whispers can convey only their general shape.
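
The contrast between sounds with clear and unclear periodicity can be quantified in a simple way. The sketch below is an illustration under stated assumptions rather than a model of pitch perception: a harmonic complex stands in for a string or wind tone, uniform random noise stands in for a cymbal-like sound, and "periodicity" is taken to be the largest normalized autocorrelation over a range of candidate lags.

```python
import math
import random

FS = 8000  # sampling rate in Hz (assumption)

def harmonic_tone(f0_hz, n_samples):
    """Sum of the first five harmonics with 1/k amplitudes (stand-in for a pitched tone)."""
    return [sum(math.sin(2 * math.pi * k * f0_hz * i / FS) / k for k in range(1, 6))
            for i in range(n_samples)]

def noise(n_samples, seed=0):
    """Uniform random noise (stand-in for a sound without clear pitch)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def periodicity(x, min_lag=20, max_lag=200):
    """Largest normalised autocorrelation over the candidate lags (near 1 = periodic)."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = x[:-lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        best = max(best, num / den)
    return best

tone_strength = periodicity(harmonic_tone(200, 2000))
noise_strength = periodicity(noise(2000))
print(round(tone_strength, 3), round(noise_strength, 3))
```

The harmonic tone yields a value close to 1 at the lag that matches its period, whereas the noise stays close to 0 at every lag, so there is no period from which a definite pitch could be extracted.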

Section 14.2 explained that the pitch of a sound arises from the extraction of the periodicity of the acoustic signal in the auditory periphery. However, it has been experimentally shown that the “pitch” of sound, as the term has mainly been used in the field of music, is not one-dimensional. Sounds with the same pitch name (C, D, E, . . .), that is, sounds that differ by an octave, evoke very similar sensations. Although sounds in different octaves are very similar, the impression of their height differs. The sensation commonly called “pitch” is thought to comprise two dimensions: a cycle that repeats every octave and a height direction that changes across octaves. Based on these characteristics, pitch is considered to have a helical psychological representation (Shepard 1964, 1982), which has been confirmed by the three-dimensional structure obtained with multi-dimensional scaling (MDS) of the results of psychological experiments (Ueda and Ohgushi 1987). The dimension that repeats every octave is called “pitch chroma,” and the height direction is called “pitch height.”

Pitch chroma and pitch height can also be explained by peripheral auditory encoding. A note one octave higher has half the period of the original note. In terms of temporal coding, notes separated by octave intervals are therefore very similar, because every period of the lower note contains a whole number of periods of the higher note. In contrast, the octave periodicity of pitch chroma is not readily accounted for by the frequency spectrum, that is, by place coding. Based on this explanation, pitch chroma is deemed closely related to the temporal information of neural firing, whereas pitch height is well explained by the excitation pattern of the auditory nerve. If two instruments play the same note on the score, one may give the impression of being more “shrill” or “metallic” than the other. Even sounds whose pitch is not clear can give rise to a certain sense of height; for example, a cymbal sound gives a higher impression than a bass drum sound.
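
The two dimensions of pitch can be illustrated by decomposing a frequency into a chroma and an octave. The sketch below assumes twelve-tone equal temperament with A4 = 440 Hz and uses MIDI-style note numbering. Both are conventions introduced here for illustration and are not part of the studies cited above.

```python
import math

A4_HZ = 440.0   # reference tuning (assumption)
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def helix_coordinates(freq_hz):
    """Return (chroma name, octave number, period in ms) for a frequency."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / A4_HZ))
    note_number = 69 + semitones_from_a4   # MIDI-style numbering: A4 is 69
    chroma = NAMES[note_number % 12]       # position on the cycle (pitch chroma)
    octave = note_number // 12 - 1         # position along the axis (pitch height)
    return chroma, octave, 1000.0 / freq_hz

low = helix_coordinates(220.0)
mid = helix_coordinates(440.0)
high = helix_coordinates(880.0)
print(low, mid, high)
```

The three frequencies share the chroma A but lie in octaves 3, 4, and 5, and the period halves with each octave step. This matches the observation that octave-related notes are similar in their temporal structure while differing in height.
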
Even if the pitch chroma of two sounds is the same, their pitch height can differ. This phenomenon also occurs in speech, where one of two different vowels with the same fundamental frequency can sound shriller than the other because their frequency centroids are different. The difference in the frequency centroid is encoded as a difference in pitch height in terms of the helical structure of pitch.

As mentioned earlier, pitch chroma and pitch height originate from different information encoded in the auditory periphery. The rationale for this, along with octave similarity, is the limitation of phase locking. The upper frequency limit of phase locking of the auditory nerve in mammals is said to be about 4–5 kHz (see the review by Heil and Peterson 2017). Some other reports claim 10 kHz or higher (Heinz et al. 2001; Recio-Spinoso et al. 2005). The upper limit of the fundamental frequency at which a melody can be identified is also around 4–5 kHz (Ward 1954). Melody shapes above this fundamental frequency are not clearly identifiable (Attneave and Olson 1971). Temporal information in the auditory periphery is thus necessary for the identification of melody, an important element in music.

The place and temporal information that the auditory periphery provides for individual sounds may also explain performance when pitch-related musical features are perceived and recognized. For identifying a melody, two cues are important: the melody contour, which is the up-down pattern of the pitch, and the intervals between adjacent tones (Dowling and Fujitani 1971). In a study that experimentally investigated whether the melody contour and individual intervals depend on place or temporal information, melody contours could be identified from either place or temporal information; however, intervals could not be identified from place information alone (Matsui and Tsuzaki 2008). The same study also revealed that temporal information is necessary for the tonal framework that defines the identity of individual intervals in modern music. This is not surprising given that scale components are based on the periodicity of sounds; however, it is an example of how the perception of music can be explained by auditory peripheral information.

While pitch has been the focus so far as an important element in modern Western music, timbre is also an important element that can convey information about the sound source. Various instruments are used in real music. Different instruments are used for different timbres (and often because different pitch ranges are required). The technique of combining different timbres to create a new impression enriches music in various forms, including multi-person ensembles, organ stops, and modern orchestrations. ANSI defines timbre as “. . . that attribute of auditory sensation in terms of which a listener can judge that two sounds similarly presented and having the same loudness and pitch are dissimilar” (American National Standards Institute, and Committee on Bioacoustics, S3, American National Standards Institute, Acoustical Society of America 1973). As timbre is defined as anything other than loudness and pitch, it is an aspect that does not have a clear unidimensional definition (strictly speaking, loudness and pitch are not completely independent of timbre; however, this is not discussed in detail here).
Several studies have explored the relationship between the physical properties of sound and subjective impressions of timbre (see the review by McAdams and Giordano 2009). The similarity between timbres is measured in perceptual experiments, and the results are mapped onto a multi-dimensional space, called a timbre space, using multi-dimensional scaling. The axes that best describe the timbre space are the spectral centroid, the logarithm of attack time, spectral flux (evolution of the spectral shape over the duration of the tone), and spectral instability.

Timbre indicates when instruments are different, and it also serves as an indicator that sounds are derived from the same instrument. Thus, we can judge whether instrument bodies have the same shape. We can understand that the violin, viola, and cello belong to the same family, and one is often misidentified as another (Giordano and McAdams 2010). This process may be the same as that in speech perception, in which the same vowel can be identified even if the speaker size is different. It is also possible to discriminate body size within instruments of the same family (van Dinther and Patterson 2006; Plazak and McAdams 2017). These perceptual phenomena and experimental results demonstrate that the timbre space is multidimensional but contains a one-dimensional size axis from small to large.

Unlike pitch, transition patterns along this one-dimensional size axis have been experimentally confirmed to be difficult to remember and identify (Matsui and Tsuzaki 2013). This suggests that the musical framework known as “Klangfarbenmelodie,” proposed by Schönberg in 1911, is not as perceptually tractable as melody. In speech experiments, it is likewise difficult to remember the order of the contents (types of vowels) when sounds with different resonator sizes are presented consecutively (Takeshima et al. 2010). The “timbre” in music, including the shape and size of resonators, can serve to classify mixed sounds and clarify the time-series information to be conveyed.
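
Of the timbre-space axes mentioned in this section, the spectral centroid is the simplest to compute. The sketch below uses one common definition, the amplitude-weighted mean frequency of the partials (other definitions weight by power). The two harmonic spectra are hypothetical and merely represent a dull and a bright tone played at the same fundamental frequency.

```python
def spectral_centroid(f0_hz, amplitudes):
    """Amplitude-weighted mean frequency of a harmonic spectrum."""
    freqs = [f0_hz * (k + 1) for k in range(len(amplitudes))]
    return sum(f * a for f, a in zip(freqs, amplitudes)) / sum(amplitudes)

F0 = 220.0                           # same note for both tones
dull = [1.0, 0.5, 0.25, 0.125]       # energy concentrated in low harmonics (hypothetical)
bright = [0.125, 0.25, 0.5, 1.0]     # energy concentrated in high harmonics (hypothetical)

centroid_dull = spectral_centroid(F0, dull)
centroid_bright = spectral_centroid(F0, bright)
print(round(centroid_dull, 1), round(centroid_bright, 1))
```

Both tones have the same fundamental frequency and therefore the same pitch chroma, but the second has a centroid nearly twice as high. This is the kind of difference that the text associates with a "shriller" impression and a greater pitch height.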

14.5 Boundary Between Speech and Music

Speech and music, representatives of sound-mediated communication, are often treated as completely unrelated in research on perception and cognition. As described earlier, however, they have much in common in auditory peripheral processing and in how the shared information is subsequently linked to perception and cognition. A model created in one domain may allow us to understand perceptual phenomena in the other domain. The commonalities mentioned here concern the early stages in the perception of speech and music. At higher levels of processing, such as linguistic comprehension of speech and comprehension of musical expressions, the processing differences between speech and music become even more pronounced. Measuring brain activity while listening to speech or music has been established as a research method to clarify the differences and commonalities, along with the method of constructing a psychological model from the results of perceptual experiments. Brain function research in this area has continued to grow (e.g., Patel 2008; Koelsch 2012, which summarize brain function research, though often with reference to language models). New experimental results continue to be reported, clarifying how much of their processing mechanisms speech and music share and how much is separate.

Finally, we introduce experiments showing that speech and music, the two representative forms of sound we use to communicate, are domains that can be confused. When a phrase of speech is played repeatedly, the listener begins to perceive it as a song (Deutsch et al. 2011). Recent studies have shown that the degree of emphasis on the fundamental frequency (F0) of the speech stimulus and the order in which the F0-manipulated speech is played can change the likelihood of this speech-to-song illusion (STS). They also reported that once listeners recognized a phrase as a song, they could not return to hearing it as speech (Groenveld et al. 2020). In addition, STS has been reported to be promoted by the characteristics of the speech itself and by the linguistic background of the listeners (STS is likely to occur when non-native speakers hear phrases containing many vowels, nasals, and approximants; Rathcke et al. 2021). Although systematic research on STS is still in its formative stages, these studies suggest that the boundary between speech and music (songs) in our auditory system is more blurred than expected.

14

Sound Processing in the Auditory Periphery: Toward Speech. . .

229

References

American National Standards Institute, Committee on Bioacoustics, S3, American National Standards Institute, Acoustical Society of America (1973) American national standard psychoacoustical terminology. American National Standards Institute, New York
Attneave F, Olson RK (1971) Pitch as a medium: a new approach to psychophysical scaling. Am J Psychol 84(2):147–166
Baumann S, Petkov CI, Griffiths TD (2013) A unified framework for the organization of the primate auditory cortex. Front Syst Neurosci 7:11
Chiba T, Kajiyama M (1941) The vowel: its nature and structure. Tokyo-Kaiseikan, Tokyo
Deutsch D, Henthorn T, Lapidis R (2011) Illusory transformation from speech to song. J Acoust Soc Am 129(4):2245–2252
Dowling WJ, Fujitani DS (1971) Contour, interval, and pitch recognition in memory for melodies. J Acoust Soc Am 49(2 Pt 2):524–531
Fant G (1970) Acoustic theory of speech production, 2nd edn. Mouton
Gardner MB, Gardner RS (1973) Problem of localization in the median plane: effect of pinnae cavity occlusion. J Acoust Soc Am 53(2):400–408
Giordano BL, McAdams S (2010) Sound source mechanics and musical timbre perception: evidence from previous studies. Music Percept 28(2):155–168
Glasberg BR, Moore BC (1990) Derivation of auditory filter shapes from notched noise data. Hear Res 47(1–2):103–138
Gonzalez J (2004) Formant frequencies and body size of speaker: a weak relationship in adult humans. J Phon 32(2):277–287
Groenveld G, Burgoyne JA, Sadakata M (2020) I still hear a melody: investigating temporal dynamics of the speech-to-song illusion. Psychol Res 84(5):1451–1459
Heil P, Peterson AJ (2017) Spike timing in auditory-nerve fibers during spontaneous activity and phase locking. Synapse 71(1):5–36
Heinz MG, Colburn HS, Carney LH (2001) Evaluating auditory performance limits: I, one-parameter discrimination using a computational model for the auditory nerve. Neural Comput 13(10):2273–2316
Helmholtz H (1863) Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik (On the sensations of tones)
Hewitt MJ, Meddis R (1994) A computer model of amplitude-modulation sensitivity of single units in the inferior colliculus. J Acoust Soc Am 95(4):2145–2159
Higashikawa M, Nakai K, Sakakura A, Takahashi H (1996) Perceived pitch of whispered vowels-relationship with formant frequencies: a preliminary study. J Voice 10(2):155–158
Hillenbrand J, Getty LA, Clark MJ, Wheeler K (1995) Acoustic characteristics of American English vowels. J Acoust Soc Am 97(5 Pt 1):3099–3111
Huber JE, Stathopoulos ET, Curione GM, Ash TA, Johnson K (1999) Formants of children, women, and men: the effects of vocal intensity variation. J Acoust Soc Am 106(3 Pt 1):1532–1542
Irino T, Patterson RD (2002) Segregating information about the size and shape of the vocal tract using a time-domain auditory model: the stabilised wavelet-Mellin transform. Speech Comm 36(3):181–203
Irino T, Aoki Y, Kawahara H, Patterson RD (2012) Comparison of performance with voiced and whispered speech in word recognition and mean-formant-frequency discrimination. Speech Comm 54(9):998–1013
Irino T, Takimoto E, Matsui T, Patterson RD (2017) An auditory model of speaker size perception for voiced speech sounds. Interspeech 2017:1153–1157
Ives DT, Smith DRR, Patterson RD (2005) Discrimination of speaker size from syllable phrases. J Acoust Soc Am 118(6):3816–3822


T. Matsui

Kawahara H, Masuda-Katsuse I, De Cheveigne A (1999) Restructuring speech representations using a pitch-adaptive time–frequency smoothing and an instantaneous-frequency based F0 extraction: possible role of a repetitive structure in sounds. Speech Comm 27(3–4):187–207
Kawahara H, Morise M, Takahashi T, Nisimura R, Irino T, Banno H (2008) Tandem-STRAIGHT: a temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation. In: 2008 IEEE international conference on acoustics, speech and signal processing, pp 3933–3936
Koelsch S (2012) Brain and music. Wiley, Chichester
Langers DRM, van Dijk P (2012) Mapping the tonotopic organization in human auditory cortex with minimally salient acoustic stimulation. Cereb Cortex 22(9):2024–2038
Lee S, Potamianos A, Narayanan S (1999) Acoustics of children’s speech: developmental changes of temporal and spectral parameters. J Acoust Soc Am 105(3):1455–1468
Maclarnon A, Hewitt G (2004) Increased breathing control: another factor in the evolution of human language. Evol Anthropol 13(5):181–197
Matsui T, Tsuzaki M (2008) Functional differences between tonotopic and periodic information in recognition of transposed melodies: how do local cues affect global features? Acoust Sci Technol 29(5):309–319
Matsui T, Tsuzaki M (2013) Independence of mental representations for tonotopic and periodic scales in perceptual judgment of vowel-like sounds. Acoust Sci Technol 34(6):436–439
Matsui T, Irino T, Uemura R, Yamamoto K, Kawahara H, Patterson RD (2022) Modelling speaker-size discrimination with voiced and unvoiced speech sounds based on the effect of spectral lift. Speech Comm 136:23–41
McAdams S, Giordano BL (2009) The perception of musical timbre. In: Hallam S, Cross I, Thaut M (eds) The Oxford handbook of music psychology. Oxford University Press, Oxford, pp 72–80
Miller GA (1947) Sensitivity to changes in the intensity of white noise and its relation to masking and loudness. J Acoust Soc Am 19(4):609–619
Nishimura T, Tokuda IT, Miyachi S, Dunn JC, Herbst CT, Ishimura K, Kaneko A, Kinoshita Y, Koda H, Saers JPP, Imai H, Matsuda T, Larsen ON, Jürgens U, Hirabayashi H, Kojima S, Fitch WT (2022) Evolutionary loss of complexity in human vocal anatomy as an adaptation for speech. Science 377(6607):760–763
Patel AD (2008) Music, language, and the brain. Oxford University Press, Oxford
Patterson RD, Allerhand MH, Giguere C (1995) Time-domain modeling of peripheral auditory processing: a modular architecture and a software platform. J Acoust Soc Am 98(4):1890–1894
Pickles JO (2013) An introduction to the physiology of hearing, 4th edn. Koninklijke Brill NV, Leiden
Pietraszewski D, Wertz AE, Bryant GA, Wynn K (2017) Three-month-old human infants use vocal cues of body size. Proc Biol Sci 284(1856):20170656
Pisanski K, Reby D (2021) Efficacy in deceptive vocal exaggeration of human body size. Nat Commun 12(1):968
Pisanski K, Rendall D (2011) The prioritization of voice fundamental frequency or formants in listeners’ assessments of speaker size, masculinity, and attractiveness. J Acoust Soc Am 129(4):2201–2212
Pisanski K, Fraccaro PJ, Tigue CC, O’Connor JJM, Feinberg DR (2014) Return to Oz: voice pitch facilitates assessments of men’s body size. J Exp Psychol Hum Percept Perform 40(4):1316–1331
Pisanski K, Oleszkiewicz A, Sorokowska A (2016) Can blind persons accurately assess body size from the voice? Biol Lett 12(4):20160063
Pisanski K, Feinberg D, Oleszkiewicz A, Sorokowska A (2017) Voice cues are used in a similar way by blind and sighted adults when assessing women’s body size. Sci Rep 7(1):10329
Plazak J, McAdams S (2017) Perceiving changes of sound-source size within musical tone pairs. Psychomusicol Music Mind Brain 27(1):1–13
Plomp R (1967) Pitch of complex tones. J Acoust Soc Am 41(6):1526–1533
Rathcke T, Falk S, Dalla Bella S (2021) Music to your ears. Music Percept 38(5):499–508


Recio-Spinoso A, Temchin AN, van Dijk P, Fan Y-H, Ruggero MA (2005) Wiener-kernel analysis of responses to noise of chinchilla auditory-nerve fibers. J Neurophysiol 93(6):3615–3634
Saenz M, Langers DRM (2014) Tonotopic organization of the human auditory cortex. Hear Res 307:43–52
Seebeck A (1841) Beobachtungen über einige bedingungen der entstehung von tönen [Observations on some conditions for the formation of tones]. Ann Phys Chem 53:417–436. https://doi.org/10.1002/andp.18411290702
Shepard RN (1964) Circularity in judgments of relative pitch. J Acoust Soc Am 36(12):2346–2353
Shepard RN (1982) Geometrical approximations to the structure of musical pitch. Psychol Rev 89(4):305–333
Smith DRR, Patterson RD, Turner R, Kawahara H, Irino T (2005) The processing and perception of size information in speech sounds. J Acoust Soc Am 117(1):305–318
Takemoto H (2001) Morphological analyses of the human tongue musculature for three dimensional modeling. J Speech Lang Hear Res 44(1):95–107
Takemoto H, Honda K, Masaki S, Shimada Y, Fujimoto I (2006) Measurement of temporal changes in vocal tract area function from 3D cine-MRI data. J Acoust Soc Am 119(2):1037–1049
Takemoto H, Mokhtari P, Kato H, Nishimura R, Iida K (2012) Mechanism for generating peaks and notches of head-related transfer functions in the median plane. J Acoust Soc Am 132(6):3832–3841
Takeshima C, Tsuzaki M, Irino T (2010) Perception of vowel sequence with varying speaker size. Acoust Sci Technol 31(2):156–164
Tartter VC (1989) What’s in a whisper? J Acoust Soc Am 86(5):1678–1683
Tartter VC (1991) Identifiability of vowels and speakers from whispered syllables. Percept Psychophys 49(4):365–372
Ueda K, Ohgushi K (1987) Perceptual components of pitch: spatial representation using a multidimensional scaling technique. J Acoust Soc Am 82:1193–1193
van Dinther R, Patterson RD (2006) Perception of acoustic scale and size in musical instrument sounds. J Acoust Soc Am 120(4):2158–2176
van Dommelen WA, Moxness BH (1995) Acoustic parameters in speaker height and weight identification: sex-specific behaviour. Lang Speech 38(3):267–287
Vestergaard MD, Haden GP, Shtyrov Y, Patterson RD, Pulvermüller F, Denham SL, Sziller I, Winkler I (2009) Auditory size-deviant detection in adults and newborn infants. Biol Psychol 82(2):169–175
Ward WD (1954) Subjective musical pitch. J Acoust Soc Am 26(3):369–380