English Pages [1050] Year 2019
THE OXFORD HANDBOOK OF
MUSIC AND THE BRAIN
Edited by
MICHAEL H. THAUT and
DONALD A. HODGES
Great Clarendon Street, Oxford, OX2 6DP, United Kingdom
Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries
© Oxford University Press 2019
The moral rights of the authors have been asserted
First Edition published in 2019
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above
You must not circulate this work in any other form and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press 198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2019943710
ISBN 978–0–19–880412–3
ebook ISBN 978–0–19–252613–7
Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY
Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.
TABLE OF CONTENTS
List of Contributors
SECTION I INTRODUCTION
1. The Neuroscientific Study of Music: A Burgeoning Discipline
Donald A. Hodges and Michael H. Thaut

SECTION II MUSIC, THE BRAIN, AND CULTURAL CONTEXTS
2. Music Through the Lens of Cultural Neuroscience
Donald A. Hodges
3. Cultural Distance: A Computational Approach to Exploring Cultural Influences on Music Cognition
Steven J. Morrison, Steven M. Demorest, and Marcus T. Pearce
4. When Extravagance Impresses: Recasting Esthetics in Evolutionary Terms
Bjorn Merker

SECTION III MUSIC PROCESSING IN THE HUMAN BRAIN
5. Cerebral Organization of Music Processing
Thenille Braun Janzen and Michael H. Thaut
6. Network Neuroscience: An Introduction to Graph Theory Network-Based Techniques for Music and Brain Imaging Research
Robin W. Wilkins
7. Acoustic Structure and Musical Function: Musical Notes Informing Auditory Research
Michael Schutz
8. Neural Basis of Rhythm Perception
Christina M. Vanden Bosch der Nederlanden, J. Eric T. Taylor, and Jessica A. Grahn
9. Neural Basis of Music Perception: Melody, Harmony, and Timbre
Stefan Koelsch
10. Multisensory Processing in Music
Frank Russo

SECTION IV NEURAL RESPONSES TO MUSIC: COGNITION, AFFECT, LANGUAGE
11. Music and Memory
Lutz Jäncke
12. Music and Attention, Executive Function, and Creativity
Psyche Loui and Rachel E. Guetta
13. Neural Correlates of Music and Emotion
Patrik N. Juslin and Laura S. Sakka
14. Neurochemical Responses to Music
Yuko Koshimori
15. The Neuroaesthetics of Music: A Research Agenda Coming of Age
Elvira Brattico
16. Music and Language
Daniele Schön and Benjamin Morillon

SECTION V MUSICIANSHIP AND BRAIN FUNCTION
17. Musical Expertise and Brain Structure: The Causes and Consequences of Training
Virginia B. Penhune
18. Genomics Approaches for Studying Musical Aptitude and Related Traits
Irma Järvelä
19. Brain Research in Music Performance
Eckart Altenmüller, Shinichi Furuya, Daniel S. Scholz, and Christos I. Ioannou
20. Brain Research in Music Improvisation
Michael G. Erkkinen and Aaron L. Berkowitz
21. Neural Mechanisms of Musical Imagery
Timothy L. Hubbard
22. Neuroplasticity in Music Learning
Vesa Putkinen and Mari Tervaniemi

SECTION VI DEVELOPMENTAL ISSUES IN MUSIC AND THE BRAIN
23. The Role of Musical Development in Early Language Acquisition
Anthony Brandt, Molly Gebrian, and L. Robert Slevc
24. Rhythm, Meter, and Timing: The Heartbeat of Musical Development
Laurel J. Trainor and Susan Marsh-Rollo
25. Music and the Aging Brain
Laura Ferreri, Aline Moussard, Emmanuel Bigand, and Barbara Tillmann
26. Music Training and Cognitive Abilities: Associations, Causes, and Consequences
Swathi Swaminathan and E. Glenn Schellenberg
27. The Neuroscience of Children on the Autism Spectrum with Exceptional Musical Abilities
Adam Ockelford

SECTION VII MUSIC, THE BRAIN, AND HEALTH
28. Neurologic Music Therapy in Sensorimotor Rehabilitation
Corene Thaut and Klaus Martin Stephan
29. Neurologic Music Therapy for Speech and Language Rehabilitation
Yune S. Lee, Corene Thaut, and Charlene Santoni
30. Neurologic Music Therapy Targeting Cognitive and Affective Functions
Shantala Hegde
31. Musical Disorders
Isabelle Royal, Sébastien Paquette, and Pauline Tranchant
32. When Blue Turns to Gray: The Enigma of Musician’s Dystonia
David Peterson and Eckart Altenmüller

SECTION VIII THE FUTURE
33. New Horizons for Brain Research in Music
Michael H. Thaut and Donald A. Hodges

Index
LIST OF CONTRIBUTORS
Eckart Altenmüller, Institute of Music Physiology and Musicians’ Medicine (IMMM), University of Music, Drama and Media, Germany
Aaron L. Berkowitz, Department of Neurology, Brigham and Women’s Hospital, Harvard Medical School, USA
Emmanuel Bigand, CNRS, UMR5022, Laboratoire d’Etude de l’Apprentissage et du Développement, Université de Bourgogne, France and Institut Universitaire de France, France
Anthony Brandt, The Shepherd School of Music, USA
Elvira Brattico, Center for Music in the Brain (MIB), Department of Clinical Medicine, Aarhus University, Denmark and The Royal Academy of Music, Aarhus/Aalborg, Denmark
Thenille Braun Janzen, Music and Health Science Research Collaboratory (MaHRC), University of Toronto, Canada
Steven M. Demorest, Northwestern University, USA
Michael G. Erkkinen, Department of Neurology, Brigham and Women’s Hospital, USA
Laura Ferreri, Cognition and Brain Plasticity Group, Bellvitge Biomedical Research Institute, Hospitalet de Llobregat, Barcelona and Department of Cognition, Development and Educational Psychology, University of Barcelona, Spain. Laboratoire d’Etude des Mécanismes Cognitifs, Université Lumière Lyon 2, 69676 Lyon, France
Shinichi Furuya, Sony Computer Science Laboratories Inc., Japan
Molly Gebrian, University of Wisconsin-Eau Claire, Department of Music and Theatre Arts, USA
Jessica A. Grahn, Brain and Mind Institute, Western University, Canada
Rachel E. Guetta, The National Center for PTSD, VA Boston Healthcare System, USA
Shantala Hegde, Clinical Neuropsychology and Cognitive Neuroscience Center and Music Cognition Laboratory, Department of Clinical Psychology, National Institute of Mental Health and Neurosciences, Bengaluru, India
Donald A. Hodges, University of North Carolina at Greensboro, USA
Timothy L. Hubbard, Arizona State University, USA and Grand Canyon University, USA
Christos I. Ioannou, Institute of Music Physiology and Musicians’ Medicine (IMMM), University of Music, Drama and Media, Germany
Lutz Jäncke, Division of Neuropsychology, Institute of Psychology, University of Zurich, Switzerland
Irma Järvelä, Department of Medical Genetics, University of Helsinki, Finland
Patrik N. Juslin, Department of Psychology, Uppsala University, Sweden
Stefan Koelsch, Department for Biological and Medical Psychology, University of Bergen, Norway
Yuko Koshimori, Music and Health Research Collaboratory (MaHRC), University of Toronto, Canada
Yune S. Lee, Department of Speech and Hearing Science, The Ohio State University, USA
Psyche Loui, Northeastern University, USA
Susan Marsh-Rollo, Auditory Development Lab, McMaster University, Canada
Bjorn Merker, Independent Scholar, Kristianstad, Sweden
Benjamin Morillon, Institut de Neurosciences des Systèmes, Aix-Marseille Université & INSERM, Marseille, France
Steven J. Morrison, University of Washington, USA
Aline Moussard, Centre de Recherche de l’Institut Universitaire de Gériatrie de Montréal (CRIUGM), Canada
Adam Ockelford, School of Education, University of Roehampton, London, UK
Sébastien Paquette, International Laboratory for Brain, Music and Sound Research (BRAMS), Université de Montréal, Québec, Canada
Marcus T. Pearce, Queen Mary University of London, UK and Aarhus University, Denmark
Virginia B. Penhune, Department of Psychology, Concordia University, Canada
David Peterson, Institute for Neural Computation, University of California San Diego, USA
Vesa Putkinen, Turku PET Centre, University of Turku, Turku, Finland
Isabelle Royal, Département de psychologie, Université de Montréal, Québec, Canada
Frank Russo, Ryerson University, Canada
Laura S. Sakka, Department of Psychology, Uppsala University, Sweden
Charlene Santoni, Faculty of Music, University of Toronto, Canada
E. Glenn Schellenberg, Department of Psychology, University of Toronto Mississauga, Canada
Daniel S. Scholz, Institute of Music Physiology and Musicians’ Medicine (IMMM), University of Music, Drama and Media, Germany
Daniele Schön, Institut de Neurosciences des Systèmes, Aix-Marseille Université & INSERM, France
Michael Schutz, Institute for Music and the Mind, McMaster University, Canada
L. Robert Slevc, Department of Psychology, University of Maryland, USA
Klaus Martin Stephan, SRH Gesundheitszentrum Bad Wimpfen, Germany
Swathi Swaminathan, Rotman Research Institute, Baycrest Health Sciences, Canada
J. Eric T. Taylor, Brain and Mind Institute, Western University, Canada
Mari Tervaniemi, Cognitive Brain Research Unit, Department of Psychology and Logopedics, Faculty of Medicine, University of Helsinki, Helsinki, Finland and Cicero Learning, Faculty of Educational Sciences, University of Helsinki, Helsinki, Finland
Corene Thaut, Faculty of Music, University of Toronto, Canada
Michael H. Thaut, Music and Health Science Research Collaboratory (MaHRC), University of Toronto, Canada
Barbara Tillmann, CNRS, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics team, France and University of Lyon, France
Laurel J. Trainor, Department of Psychology, Neuroscience & Behavior, McMaster University, Canada
Pauline Tranchant, Département de psychologie, Université de Montréal, Canada
Christina M. Vanden Bosch der Nederlanden, Brain and Mind Institute, Western University, Canada
Robin W. Wilkins, University of North Carolina at Greensboro, USA
SECTION I
INTRODUCTION
CHAPTER 1
THE NEUROSCIENTIFIC STUDY OF MUSIC: A BURGEONING DISCIPLINE
DONALD A. HODGES AND MICHAEL H. THAUT
This book is the result of a considerable amount of effort by fifty-four authors from thirteen countries. Beyond that, it represents the work of hundreds of researchers over the past fifty years or so. The neuroscientific study of music, or neuromusical research as it may be called, has grown and expanded significantly over several decades. The purpose of this chapter is twofold. The first portion provides a brief historical perspective on music and neuroscience. The second presents an overview of the eight sections and thirty-three chapters of this book.
V
B D
Space limitations do not permit a detailed historical overview of neuromusical research. Rather, the intent is to provide glimpses of early, pioneering efforts. In 1977, R. A. Henson included historical notes on neuromusical research in the ground-breaking book on music and the brain he edited along with Macdonald Critchley (Critchley & Henson, 1977). John Brust (2003) also provided a historical perspective. More recently, Eckart Altenmüller, Stanley Finger, and François Boller edited a two-volume set on music, neurology, and neuroscience (2015a, 2015b) that provides far greater depth and detail. The first volume focuses on historical connections and perspectives and the second on evolution, the musical brain, and medical conditions and therapies. From these and other sources, here are a few glimpses into the growing field of music–brain research.
• Franz Joseph Gall (1758–1828), the founder of phrenology, identified music as one of the twenty-seven faculties of the mind (Elling, Finger, & Whitaker, 2015); in Fig. 1, you can see the music faculty, listed as Tune, just above the eye. Among many others who pursued this notion, Madam Luise Cappiani (1901) gave an address at the American Institute of Phrenology in which she discussed phrenology, physiology, and psychology in connection with music and singing.
• In the 1860s and 1870s, British neurologist John Hughlings Jackson (1835–1911) made cogent observations about children who could not speak but who could sing (Lorch & Greenblatt, 2015). Speaking of one speechless child, Jackson said, “It is worthy of remark that when he sings he can utter certain words … but he can only do so while singing” (Jackson, 1871, p. 430). By 1888, German neurologist August Knoblauch (1863–1919) had coined the term “amusia” (Graziano & Johnson, 2015) and created a model with five music centers: an auditory center for the perception of musical tones, a motor center for musical production, an idea center for the analysis and comprehension of music, a visual system for reading musical notation, and a motor system for writing musical notation (Johnson & Graziano, 2003). Damage to any of these five centers could lead to nine disorders, grouped into perception or production impairments. Richard Wallaschek (1860–1917), John Edgren (1849–1929), and others also investigated the loss of musical abilities in relation to brain function (Henson, 1977).
• The first electroencephalographic (EEG) recording in humans was made by Hans Berger in 1924 (Haas, 2003). Less than twenty-five years later, researchers were studying musicogenic epilepsy by means of EEG (Shaw & Hill, 1947). By the mid-1970s, investigators were utilizing event-related potentials (ERPs) in relation to music (Schwent, Snyder, & Hillyard, 1976). They found N100 responses (negative waves peaking between 80 and 120 ms after the onset of a stimulus) reflecting pre-attentive perception of pitch changes.
• In 1981, Roland, Skinhøj, and Lassen asked participants to make same-different judgments on tone-rhythm patterns taken from the Seashore Tests of Musical Talent while undergoing positron emission tomography (PET) scans. They found widespread activations, including differences between left and right hemispheric processing.
• Roland Beisteiner reported on three experiments conducted in Vienna in 1995 in which he used functional magnetic resonance imaging (fMRI), along with direct current EEG (DC-EEG) and magnetoencephalography (MEG), to demonstrate the viability of these methods in the study of music. Finger and hand movements, approximating those used in playing the piano, elicited strong activations in primary and supplementary motor cortices. Since that time, fMRI has become a predominant methodology in neuromusical research.
• Recent years have seen the development of several additional methodologies, including transcranial magnetic stimulation (TMS), voxel-based morphometry (VBM), tensor-based morphometry (TBM), diffusion tensor imaging (DTI), and genomics approaches. Also, new data analysis techniques are being developed, such as network science (described by Wilkins, this volume).
FIGURE 1. A phrenological map of the brain. Music is listed as “Tune” and appears just above the eye. Source: By William Walker Atkinson, 1862–1932 [No restrictions], via Wikimedia Commons. https://upload.wikimedia.org/wikipedia/commons/7/71/How_to_know_human_nature_its_inner_states_and_outer_forms_%281919%29_%2814784651435%29.jpg
From these earliest explorations into music and the brain, neuromusical research has exploded in recent decades, as indicated in Fig. 2. What began as fledgling, pioneering efforts from the 1940s to the 1960s has burgeoned into a relative flood of publications in the 2000s.
FIGURE 2. The number of published articles obtained from a simple “music and brain” search in PubMed (https://www.ncbi.nlm.nih.gov/pubmed/).
Given their variety and ubiquity, human musical experiences are complex and mysterious. Philosophers, ethnomusicologists, music theorists, and many others have spilled countless barrels of ink trying to explicate the phenomenon of music. Why do we respond to music so powerfully? What does it mean? Why do we have it at all? Explaining how music “works” in the human brain is no less daunting.

Of necessity, neuroscientists frequently take a reductionist approach (Bickle, 2003; Krakauer, Ghazanfar, Gomez-Marin, MacIver, & Poeppel, 2017). Findings from work going on at one level (e.g., networks) are not necessarily integrated into work at another level (e.g., genomics). Furthermore, results are often parsed according to methodology (e.g., fMRI and ERP). As stated, some of this is of necessity; after all, notions derived from activations generated across 30 minutes of music listening and monitored by fMRI are not immediately compatible with results from an experimental design with musical stimuli of just a few seconds as recorded by MEG.

To avoid a crazy-quilt, scattershot view of music, broad overviews attempting to blend disparate findings have appeared from time to time in the literature. Whether in articles (e.g., Peretz & Zatorre, 2005; Warren, 2008), chapters (e.g., Marin & Perry, 1999; Schlaug, 2003), or books (e.g., Critchley & Henson, 1977; Koelsch, 2012), these reviews are critically important in moving us toward a more coherent, unified understanding of music in the brain. There are certain advantages to having a singular view of one or two authors, or even in focusing the discussion in a limited word count. The present volume, on the other hand, has strengths in the diversity and expertise of fifty-four authors who have written approximately 350,000 words on music and neuroscience. In the next portion of this chapter, we provide an overview of their thirty-three chapters.
CHAPTER OVERVIEWS
As this introductory chapter comprises the first section, these overviews will concentrate on sections II through VIII.
II. Music, the Brain, and Cultural Contexts
2. Music through the lens of cultural neuroscience, Donald A. Hodges.
3. Cultural distance: A computational approach to exploring cultural influences on music cognition, Steven J. Morrison, Steven M. Demorest, and Marcus T. Pearce.
4. When extravagance impresses: Recasting esthetics in evolutionary terms, Bjorn Merker.

The three chapters in Section II aim to put the neuroscientific study of music into a larger cultural context. First, Donald Hodges revisits a longstanding notion that musical experiences have both biological and cultural underpinnings. Biology and culture are so intertwined that there is no clear way to separate the two, and no need to, either. Rather, the new field of cultural neuroscience provides increased understanding of how biological and cultural aspects constrain and enhance each other.

Next, Steven Morrison, Steven Demorest, and Marcus Pearce present a model of cultural distance, a computational means of determining how closely the musics of disparate cultures are related. Unfamiliar music whose statistical patterns of pitch and rhythm closely approximate those of one’s own music may be easier to process than music with widely divergent patterns. Such a model may be useful in future neuroimaging studies of cross-cultural music processing.

In the final chapter in this section, Bjorn Merker presents a persuasive argument that our human aesthetic responses to music arise from elements at play in the development of large and complex birdsong repertoires. Responses among birds may range from boredom to interest/curiosity. In humans, a hedonic reversal leads to being impressed, being moved, or to awe and sublimity at the extreme.

Taken together, these three chapters remind us that findings from the neuroscientific study of music must always be placed into broader cultural contexts if they are to be fully understood.
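The idea of comparing the statistical patterns of pitch in two bodies of music can be illustrated with a toy calculation. The sketch below is not the model presented by Morrison, Demorest, and Pearce; it swaps in a much simpler stand-in, the Jensen-Shannon divergence between distributions of melodic intervals, and the three short melodies (given as MIDI note numbers) are invented purely for illustration. All function and variable names are assumptions.

```python
from collections import Counter
from math import log2


def interval_distribution(melody):
    """Relative frequency of each melodic interval (in semitones)."""
    intervals = [b - a for a, b in zip(melody, melody[1:])]
    counts = Counter(intervals)
    total = sum(counts.values())
    return {iv: n / total for iv, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2); 0 = identical, 1 = no overlap."""
    keys = set(p) | set(q)
    # Mixture distribution halfway between p and q.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Invented example melodies (MIDI note numbers), not real corpus data.
familiar = [60, 62, 64, 65, 67, 65, 64, 62, 60]
transposed = [62, 64, 66, 67, 69, 67, 66, 64, 62]  # same intervals, new key
divergent = [60, 61, 66, 67, 61, 60, 66, 72, 60]   # wide, unusual leaps

near = js_divergence(interval_distribution(familiar),
                     interval_distribution(transposed))
far = js_divergence(interval_distribution(familiar),
                    interval_distribution(divergent))
```

Under these assumptions, the transposed melody sits at zero distance from the familiar one because its interval statistics are identical, while the melody built from unusual leaps lies farther away. A realistic measure would also need rhythm, larger corpora, and a model of listener expectation.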
III. Music Processing in the Human Brain
5. Cerebral organization of music processing, Thenille Braun Janzen and Michael H. Thaut.
6. Network neuroscience: An introduction to graph theory network-based techniques for music and brain imaging research, Robin W. Wilkins.
7. Acoustic structure and musical function: Musical notes informing auditory research, Michael Schutz.
8. Neural basis of rhythm perception, Christina M. Vanden Bosch der Nederlanden, J. Eric T. Taylor, and Jessica A. Grahn.
9. Neural basis of music perception: Melody, harmony, and timbre, Stefan Koelsch.
10. Multisensory processing in music, Frank Russo.

Authors in Section III explore what we know about how music is processed in the human brain. Thenille Braun Janzen and Michael Thaut present an organizational scheme based upon ascending auditory pathways, auditory-frontal networks, auditory-motor networks, and auditory-limbic networks. The most advanced research has moved beyond identifying which parts of the brain are involved at specific points in the processing stream and is beginning to look increasingly at how these various brain regions interact in real time. The complexity of music processing, involving aspects such as preference, socio-cultural contexts, musical expertise, and so on, poses a daunting challenge, but substantial progress is being made.

One advancement, according to Robin Wilkins, is network science, which utilizes graph theory techniques and analysis as a means of understanding structural and functional connectivity in the brain. Network science moves us closer to learning how the brain communicates with itself in the dynamic process of responding to music. A further advantage may be that it allows for monitoring task performance during music listening conditions much longer than brief excerpts.

Michael Schutz continues the discussion in the next chapter with a more fine-grained examination of how micro-timing changes in musical stimuli are processed in the brain as music unfolds over time. Constant, rapid fluctuations in overtone spectra require sophisticated neural tracking mechanisms. Indeed, one of the deficiencies of early synthesized music, and to some extent of some auditory perception research, is a lack of ecological validity owing to temporally invariant musical stimuli.

In the next chapter, Christina Vanden Bosch der Nederlanden, J. Eric T. Taylor, and Jessica Grahn provide an overview of the research on how the brain processes and produces musical rhythms. Auditory-motor networks are particularly important in beat finding and other rhythmic processes. Our brain’s ability to perceive and produce rhythms has wide-ranging implications for many aspects of human behavior.

Stefan Koelsch expands the discussion into an examination of the neural underpinnings of melodic, harmonic, and timbral perception. Numerous and widespread brain regions are involved in processing music. Because infants and individuals without formal music training can process melody, harmony, and timbre successfully, musicality is clearly a natural ability of the human brain.

Although much of the extant research focuses on particular sensory modalities, ultimately a more ecologically valid understanding arises from the integration of multiple sensory inputs, and this topic is taken up by Frank Russo. An integrated, multisensory view of music processing involves auditory, visual, somatosensory, vestibular, and motor systems. This necessarily involves extensive, widely-distributed but locally-specialized neural networks (Sergent, Zuck, Terriah, & MacDonald, 1992).

Overall, the six chapters of Section III remind us that music is a whole-brain experience, with numerous intertwining and interacting neural networks. Enormous progress has been made in ferreting out all the disparate components and their entangled interrelationships, especially with the advent of rapidly evolving technologies, but there are still puzzles left to solve.
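For readers unfamiliar with graph theory, the following minimal sketch shows the kind of calculation that network approaches build on: a matrix of pairwise correlations between regions is thresholded into a graph, and simple measures such as degree and local clustering are read off it. The four-region matrix, the 0.5 cutoff, and all names are invented for illustration; they do not come from any study discussed in this volume, and real analyses involve many more regions and more careful thresholding.

```python
from itertools import combinations


def threshold_graph(corr, cutoff):
    """Build undirected adjacency sets from a symmetric correlation matrix."""
    n = len(corr)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if abs(corr[i][j]) >= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def degree(adj):
    """Number of connections per node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def clustering(adj):
    """Fraction of each node's neighbour pairs that are themselves linked."""
    out = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            out[node] = 0.0
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        out[node] = links / (k * (k - 1) / 2)
    return out


# Hypothetical correlations among four regions (illustrative values only).
toy_corr = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.6, 0.2],
    [0.7, 0.6, 1.0, 0.5],
    [0.1, 0.2, 0.5, 1.0],
]
adj = threshold_graph(toy_corr, cutoff=0.5)
node_degree = degree(adj)
node_clustering = clustering(adj)
```

In this toy graph, region 2 has the most connections, while regions 0 and 1 sit in a fully connected triangle. Measures of this kind are one way to describe how tightly groups of regions work together.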
IV. Neural Responses to Music
11. Music and memory, Lutz Jäncke.
12. Music and attention, executive function, and creativity, Psyche Loui and Rachel Guetta.
13. Neural correlates of music and emotion, Patrik Juslin and Laura Sakka.
14. Neurochemical responses to music, Yuko Koshimori.
15. The neuroaesthetics of music: A research agenda coming of age, Elvira Brattico.
16. Music and language, Daniele Schön and Benjamin Morillon.

The six chapters comprising Section IV delve into the ways the brain responds to music. Once again, we see multiple overlapping and mutually reinforcing domains. All meaningful musical experiences involve memory in one way or another. Lutz Jäncke explores discrete, music-only, and shared memory systems that involve auditory processing, episodic, autobiographic, semantic, and implicit memories, as well as motor programs, emotion, and motivation. Each of these components has neural correlates designed for encoding, storing, and retrieving musical memories. Such a diffuse and distributed network may help explain commonly reported musical influences on non-musical memory formation.

Psyche Loui and Rachel Guetta tackle relationships between music and attention, executive function, and creativity. The topic of attention in music can be informed by general theories of attention, as well as those specifically applied to musical stimuli. Passive music listening experiences are less likely to affect executive functions, but research is ongoing concerning whether and to what extent active musicing affects executive functions in terms of near and far transfers and in terms of relevant neural mechanisms. Attention and executive functions, along with their attendant brain networks, are both connected to musical creativity.

Patrik Juslin and Laura Sakka provide a thorough and detailed review of neuroimaging studies related to music and emotion. Although certain brain regions have been more or less consistently implicated in the processing of musical emotions, much is still unclear. For example, it is not always certain in some experimental designs whether participants are “merely” perceiving or actually experiencing musical emotions. Juslin and Sakka provide methodological recommendations for moving the field forward.

Neurochemical responses are the basis for musical emotions, and Yuko Koshimori reviews recent work in this emerging field. Musical experiences induce the release of neurotransmitters (e.g., dopamine, serotonin, and acetylcholine), neuropeptides (e.g., beta-endorphin, oxytocin, and arginine vasopressin), steroid hormones (e.g., cortisol), and peripheral immune biomarkers. In addition to the main area of research concerning neurochemical responses in music listening and music performance experiences, another primary course of investigation involves the intentional manipulation of neurochemicals via music in a variety of health and wellness issues (e.g., Parkinson’s disease, chronic pain, and stress).

Elvira Brattico’s discussion of neuroaesthetics combines but also moves beyond the previous chapters in this section; this is another emerging field that demonstrates the maturing of neuromusical research. Building on decades of previous work in music perception, cognition, and more recently emotion, neuroaesthetics investigates matters such as brain areas involved in liking, preference, and aesthetic judgments. While this undoubtedly introduces more subjectivity into the discussion, it also moves us closer to a core human experience that lies at the root of music’s importance.

Music and language are both ubiquitous aspects of the human experience and questions about the nature of and relationships between the two have been asked and the answers debated for centuries. Now neuroscientists are posing new questions, such as “to what extent are music and language processed in distinct, shared, or homologous networks?” Daniele Schön and Benjamin Morillon give answers to this and related questions based on current evidence. They also discuss the effects of musical experiences on language acquisition and skills.

As was the case with Section III, Section IV demonstrates the tremendous complexity of human musical experiences from a neuroscientific standpoint. Steadily, patiently, over a period of time and with new technologies and methodologies, a clearer picture is emerging.
V. Musicianship and Brain Function
17. Musical expertise and brain structure: The causes and consequences of training, Virginia Penhune.
18. Genomics approaches for studying musical aptitude and related traits, Irma Järvelä.
19. Brain research in music performance, Eckart Altenmüller, Shinichi Furuya, Daniel Scholz, and Christos Ioannou.
20. Brain research in music improvisation, Michael Erkkinen and Aaron Berkowitz.
21. Neural mechanisms of musical imagery, Timothy Hubbard.
22. Neuroplasticity in music learning, Vesa Putkinen and Mari Tervaniemi.

Authors of the six chapters comprising Section V are all concerned with unraveling knotty issues surrounding the ways musicianship and brain function interact with each other. Virginia Penhune begins with the notion that musical training affects numerous brain structures, including gray and white matter, auditory cortex and association areas, motor regions, frontal regions, and parietal cortex. Some variances between adult musicians and non-musicians may be due to pre-existing differences, but sufficient research exists to support the contention that long-term musical training produces many of these changes. Penhune also discusses reasons why music has such strong effects on brain plasticity.

Irma Järvelä takes us on a tour of genomics, specifically the role of genetics in human musicality. Genes influencing inner ear development, auditory pathways, and cognition are all linked to musical aptitude. In addition, genomics research suggests that music and language have a common evolutionary heritage and that genes play a role in the effects music has on the body.

Eckart Altenmüller, Shinichi Furuya, Daniel Scholz, and Christos Ioannou examine the contributions that prolonged extensive goal-directed practice, multisensory-motor integration, high arousal, and emotional and social rewards make toward inducing brain plasticity. They discuss motor planning and control, and finally musician’s dystonia, that is, plasticity-induced loss of skills or what they call de-expertise.

Michael Erkkinen and Aaron Berkowitz review neuroimaging studies of music improvisation. Using PET, fMRI, tDCS (transcranial direct current stimulation), and EEG, researchers have implicated numerous brain regions involved in the spontaneous creation of music. Overall, improvisation activates a broad network of brain regions involving cognitive control and monitoring, motor planning and execution, multimodal sensation, motivation, emotional/limbic processing, and language regions.

Timothy Hubbard describes and discusses auditory and motor neural mechanisms supporting musical imagery. Involuntary musical imagery includes anticipatory musical imagery, musical hallucinations, schizophrenia, earworms, and synesthesia. Embodied musical imagery is covered in such examples as spatial and force metaphors, the role of mimicry, the distinction between the inner ear and inner voice, the effects of mental practice on performance, musical imagery and dance, and musical affect.

Vesa Putkinen and Mari Tervaniemi are concerned with neural plasticity in music learning. Focusing primarily on studies employing ERPs derived from EEG and MEG, they found evidence to support the contention that musical training enhances domain-general auditory processing skills, though far transfer to executive functions is less certain. They also contend that training alone does not account for all the differences between musicians and non-musicians, as self-selection is a confound in terms of predisposing factors.

These six chapters push beyond the nature of passive music listening situations into the realm of active musicing experiences. While we cannot pretend that we fully understand what is transpiring in the brain of Daniel Barenboim as he conducts a Mahler symphony, by fits and starts, patient marching, and occasional leaping, we are moving forward.
VI. Developmental Issues in Music and the Brain
23. The role of musical development in early language acquisition, Anthony Brandt, Molly Gebrian, and Robert Slevc.
24. Rhythm, meter, and timing: The heartbeat of musical development, Laurel J. Trainor and Susan Marsh-Rollo.
25. Music and the aging brain, Laura Ferreri, Aline Moussard, Emmanuel Bigand, and Barbara Tillmann.
26. Music training and cognitive abilities: Associations, causes, and consequences, Swathi Swaminathan and E. Glenn Schellenberg.
27. The neuroscience of children on the autism spectrum with exceptional musical abilities, Adam Ockelford.

Throughout the lifespan, musical experiences have consequences for brain development. Anthony Brandt, Molly Gebrian, and Robert Slevc examine the role of early musical experiences in language acquisition. Evidence suggests that speech is initially processed by infants as a type of music. Initially entangled in the child’s brain, speech and music gradually develop into independent modalities. Though many of the differences between speech and music starkly divide them, timbral aspects of phonemes and prosodic elements of melodic and rhythmic inflection provide a common bridge.

Laurel Trainor and Susan Marsh-Rollo focus on the special role that rhythmic elements play in musical development. Initially, infants use timing cues to perceive and respond to emotional information. As they become enculturated to their surroundings, they develop oscillatory brain rhythms that link auditory and motor aspects of entrainment. Eventually, perceptual awareness of the synchronicity of movements among people enables them to make reliable judgments of trust and friendship.

Laura Ferreri, Aline Moussard, Emmanuel Bigand, and Barbara Tillmann report on the role music can play in improving cognition and promoting well-being and social connection at the other end of the lifespan. Their chapter is divided into two major sections. The first concentrates on music’s contributions to healthy aging, including underlying brain regions. The second examines the role of music-based therapeutic approaches dealing with age-related issues such as memory, language, motor functions, and emotions and well-being.

Swathi Swaminathan and E. Glenn Schellenberg review relationships between music training and cognitive abilities. Positive associations are reported for measures of general cognitive, visuospatial, and language abilities, as well as academic achievement and healthy aging. However, with the exception of some linkages between musical training and specific language skills, causal evidence is lacking, inconsistent, or weak.

In the final chapter in this section, Adam Ockelford presents a neuroscientific model accounting for exceptional musicianship among some children on the autism spectrum. In these special cases, children process language and everyday sounds as if they were music. For these individuals, then, music takes precedence over language and other everyday sounds.

From birth to death, and in all cognitive conditions, music plays an important role in the human experience. We have known this anecdotally and now we are beginning to understand requisite brain processes.
VII. Music, the Brain, and Health
28. Neurologic Music Therapy in sensorimotor rehabilitation, Corene Thaut and Klaus Martin Stephan.
29. Neurologic Music Therapy for speech and language rehabilitation, Yune Lee, Corene Thaut, and Charlene Santoni.
30. Neurologic Music Therapy targeting cognitive and affective functions, Shantala Hegde.
31. Musical disorders, Isabelle Royal, Sébastien Paquette, and Pauline Tranchant.
32. When blue turns to gray: The enigma of musician’s dystonia, David Peterson and Eckart Altenmüller.
The greatest preponderance of neuromusical research is basic research, an attempt to understand how music is processed in the brain. To date, the strongest forays into applied research come in the area of health. The five chapters in Section VII demonstrate the tremendous strides that have been taken in utilizing the power of music for more healthy living. Music is important in the development, rehabilitation, and maintenance of sensorimotor function, especially as it relates to neurologic disorders. Corene Thaut and Klaus Martin Stephan discuss the role of neurologic music therapy (NMT) in the facilitation of motor function in such populations as those with Parkinson’s disease, stroke, traumatic brain injury (TBI), multiple sclerosis, cerebral palsy, autism, and the healthy elderly. They cover acquired movement disorders, degenerative diseases, and developmental disorders. Yune Lee, Corene Thaut, and Charlene Santoni explore the efficacy of using NMT interventions for the treatment of dysarthria, apraxia of speech,
aphasia, fluency, sensory deficits, voice disorders, and dyslexia. Eight standardized clinical techniques in the speech and language domain include Melodic Intonation Therapy (MIT), Musical Speech Stimulation (MUSTIM), Rhythmic Speech Cueing (RSC), Vocal Intonation Therapy (VIT), Oral Motor and Respiratory Exercises (OMREX), Therapeutic Singing (TS), Developmental Speech and Language Training Through Music (DSLM), and Symbolic Communication Training Through Music (SYCOM). Built-in temporal processes for both rhythm and speech are mediated by corticostriatal circuitries comprising the basal ganglia, the supplementary motor area (SMA), the premotor cortex, and the frontal operculum. Shantala Hegde discusses the use of NMT to improve cognitive and affective functioning in such neurological conditions as TBI, stroke/cerebrovascular accident, dementia, other degenerative conditions like Parkinson’s disease, and in major psychiatric conditions such as schizophrenia, bipolar affective disorders, as well as common psychiatric conditions such as anxiety and depression. Music can play an important role in cognitive rehabilitation as it engages auditory, motor, language, cognitive, and emotional functions across cortical and subcortical brain regions. Although early results are promising, considerably more research using standardized NMT techniques is needed. Isabelle Royal, Sébastien Paquette, and Pauline Tranchant focus their attention on musical deficiencies due to congenital or acquired amusia and musical anhedonia. Some individuals are born with an inability to process pitch or rhythm; others acquire such deficits as a result of brain trauma or stroke. Musical anhedonia may affect approximately 2 percent of the population; even though these individuals are able to interpret music’s emotional content, they derive no pleasure from it. Collectively, the study of amusia provides a unique opportunity to study neural structures underlying music processing. 
David Peterson and Eckart Altenmüller investigate musician’s dystonia (MD), the enigmatic disorder that selectively interferes with the voluntary motor control necessary for musical performance. MD includes such pathological features as abnormalities in inhibition, sensorimotor integration, and plasticity at many levels of the central nervous system. Increasing understanding of the underlying neurological processes may lead to improved management and possibly prevention of MD.
Centuries of music therapy in a broad sense of the term (e.g., as in the role of the shaman or medicine man in many societies worldwide) and decades of “modern” music therapy have clearly demonstrated the healing powers of music. We are just now, however, at the cusp of explaining these effects from a neuroscientific standpoint. Ensuing years will undoubtedly see tremendous progress in these applications.
VIII. The Future
33. New horizons for brain research in music, Michael Thaut and Donald Hodges.
In the final chapter, our aim is to identify noteworthy developments in music-brain research and identify a few key areas for future research. As demonstrated throughout this book, significant strides are being made in a wide variety of important areas, including network modeling and connectivity analyses, genomics and neurotransmitter imaging, and clinical neuroscience research. Somewhat lagging is neuroimaging work in musicians’ health, music education, and collaborative efforts with music philosophers. A few final comments concerning the content of this book: anyone reading multiple chapters is likely to discover that there are some overlaps in coverage. That is, subtopics may be discussed in more than one chapter. We chose not to delete most of these places during the editorial process for two main reasons: (1) Subtopics frequently need to be reintroduced in various chapters to provide context for the main topic at hand. (2) In using slightly different wording or citing different sources, various authors provide a richer understanding. Conversely, there are still a few topics that are not covered in this volume. To do so would require expanded coverage beyond what is possible at this point. Furthermore, it should be noted that research in certain areas is moving so quickly that new findings are changing our understanding on a very short timescale. Rapid release of individual chapters online counteracts this problem to a certain extent. We are extremely pleased with the contributions these authors have made to the literature on music and the brain.
SECTION II
MUSIC, THE BRAIN, AND CULTURAL CONTEXTS
CHAPTER 2
MUSIC THROUGH THE LENS OF CULTURAL NEUROSCIENCE
DONALD A. HODGES
Introduction to Cultural Neuroscience
Scholars have long recognized the co-equal roles biology and culture play in the phenomenon we call music (e.g., Blacking, 1973). Fifty years ago, Gaston (1968), quoting Dobzhansky, expressed the idea clearly and succinctly. In asking how we developed characteristics of humanness, he wrote:
To begin to answer this question, it is not necessary to separate the biology from the culture of man [italics in the original]. They go hand in hand. “The fact which must be stressed, because it has frequently been missed or misrepresented, is that the biological and cultural evolutions are parts of the same process” (Dobzhansky, 1962, p. 22). This means that the part of man’s culture we call music has a biological as well as a cultural basis. (p. 11)
To be certain, the pendulum of our understanding has sometimes swung toward one and away from the other, when nature is favored over nurture and vice versa. However, for the moment, let us take it as axiomatic that both are necessary. Even so, “the problem of reconciling ‘cultural’ and ‘biological’ approaches to music, and indeed to the nature of mind itself,
remains” (Cross & Morley, 2009, p. 61). The purpose of this chapter, then, is not to debate whether biology and culture are both necessary components of human musical experiences, nor to determine the extent of the contribution from each, but rather to examine some of the recent evidence that supports this contention. One reason to take another look at an old, and perhaps well-established, concept is to add newer understandings from the field of cultural neuroscience. Cultural neuroscience is an emerging field of study that has arisen as a means of investigating relationships between culture and brain (Chiao, Li, Seligman, & Turner, 2016; Han et al., 2013). Chiao (2009) sees three components of the cultural neuroscience toolbox:
• Cultural psychologists investigate what cultural values, beliefs, and practices influence human behavior and how they do so.
• Neuroscientists use a variety of approaches to determine the role of the brain.
• Neurogeneticists investigate genetic regulation of brain mechanisms that support cognitive, emotional, and social behaviors.
Using these three components, Han and Ma (2015) proposed a culture–behavior–brain (CBB) loop model of human development (Fig. 1). Culturally contextualized behaviors (CC-Behavior) occur within a specific cultural context but may not occur outside that culture. Culturally voluntary behaviors (CV-Behavior) are guided by specific cultural mores that become embedded in the brain. Genes moderate culture–brain interactions by affecting brain anatomy and some behavioral and cognitive characteristics; likewise, there are mutual gene–culture influences. Some of these genetic influences take place over thousands of years and some occur within a given lifespan.
FIGURE 1. Illustration of the CBB loop model of human development. Cultural environments contextualize human behaviors. Learning novel cultural beliefs and the practice of different behavioral scripts in turn modify the functional organization of the brain. The modified brain then guides individual behavior to voluntarily fit into a cultural context and meanwhile to modify current cultural environments. Direct interactions also occur between culture and brain without overt behavior. Abbreviations: CBB, culture–behavior–brain; CC-Behavior, culturally contextualized behavior; CV-Behavior, culturally voluntary behavior. Reprinted from Trends in Cognitive Sciences 19(11), Shihui Han and Yina Ma, A culture-behavior-brain loop model of human development, pp. 666–676, Figure 1, doi.org/10.1016/j.tics.2015.08.010, Copyright © 2015 Elsevier Ltd. All rights reserved.
Because a full explication of cultural neuroscience would require an extended discussion beyond this chapter, a more straightforward way to approach cultural neuroscience is to examine the implications of the following: “Cultural practices adapt to neural constraints, and the brain adapts to cultural practice” (Ambady & Bharucha, 2009, p. 342). Let us examine both of these in turn, specifically as they relate to music.
Cultural Practices Adapt to Neural Constraints
Although it is difficult to predict precise biological limits for human performance, a reasonable assumption is that biological factors place restrictions on human musicality. We can hear musical pitches only within a delimited frequency range, typically 20 Hz–20,000 Hz at the extremes. We can sing only so high; for example, Mozart stretched the limits when he wrote an F above high C in the Queen of the Night aria from The Magic Flute (“Der Hölle Rache kocht in meinem Herzen” from Die Zauberflöte). Even so, musicians are capable of amazing feats. Smith (1953) reported that one pianist performed the 6266 notes of Schumann’s Toccata in C Major, Op. 7 in 4’20” at a rate of 24.1 notes per second. Toscanini was credited with a phenomenal memory, reportedly having memorized 250 symphonic works and 100 operas (Marek, 1975). Although it is certainly possible for someone to play these pieces faster or memorize more scores, surely there must be some limits. Perceptually and cognitively, Wagner and Chinese operas push many listeners to the extreme. Going beyond human limits, however, Cage’s Organ²/ASLSP (As SLow aS Possible) is currently being performed in a church in Halberstadt, Germany in what is projected to take 639 years (Wakin, 2006). At this speed, it is possible for any person to hear only a fraction of the entire performance.
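The playing rate quoted from Smith (1953) can be checked with simple arithmetic. The short Python sketch below uses only the two figures given in the text (6266 notes, 4 minutes 20 seconds); the function and variable names are illustrative assumptions, not anything taken from the cited source.

```python
# Figures quoted in the text (Smith, 1953): 6266 notes played in 4'20".
NOTES = 6266
DURATION_SECONDS = 4 * 60 + 20  # 4 minutes 20 seconds = 260 s


def notes_per_second(notes: int, seconds: float) -> float:
    """Average number of notes played per second."""
    return notes / seconds


rate = notes_per_second(NOTES, DURATION_SECONDS)
mean_interval_ms = 1000 / rate  # average time available per note

print(f"{rate:.1f} notes per second")            # 24.1 notes per second
print(f"{mean_interval_ms:.1f} ms per note on average")
```

At this rate the performer has, on average, only a few hundredths of a second per note, which gives a concrete sense of how close such performances may come to motor limits.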
The Brain Adapts to Cultural Practice
Just as the brain shapes what we do, what we do shapes the brain. Neurologist Frank Wilson (1998) wrote a compelling account of how the brain and the hand co-evolved. Over time, developmental changes allowed us to use our hands for an increasingly wider variety of tasks, such as grasping, throwing, pounding, manipulating tools, and so on, and these newly-acquired skills, in turn, spurred further brain development. Of course, it is not just the hand in isolation. In chipping stone tools, for example, listening carefully to the sound of the stone being shaped is critical to a successful result, as one extra strike may cause the rock to break. Creating bone flutes (Conard, Malina, & Münzel, 2009) or lithophones (Cross, Zubrow, & Cowan, 2002), rock percussion instruments out of flint blades, would require similar interactions of hand, ear, and brain. In the case of flutes, tinkering with where to place finger holes and
how to direct the air (i.e., whether as a notched, block, or transverse flute) requires considerable ingenuity (Kunej & Turk, 2000). Wilson encapsulates these ideas in speaking about the co-evolution of the brain and the musical hand: What we are left with when we seek to explain musical talent on a biological basis seems best characterized as an assembly of neurologic and behavioral potentials that arise from within and are uniquely defined by specific cultures. (1998, p. 224)
Another example of cultural practice influencing brain development can be seen in the organization of the hearing mechanism. Tonotopic organization maintains a frequency map on the basilar membrane in the inner ear that is maintained throughout the auditory pathway all the way to the auditory cortex. Pantev and colleagues (1998) demonstrated that for trained musicians a pitch map overlays the frequency map, as responses were 25 percent larger to piano tones than to pure tones; this was not true for controls who had never learned to play a musical instrument. Similarly, violinists and trumpeters showed more robust responses to tones from their instrument than to pure tones (Pantev, Roberts, Schultz, Engelien, & Ross, 2001). Since musical tones from Western instruments (i.e., piano, violin, trumpet) are cultural artifacts, it is difficult to account for these results unless the brain has adapted itself to environmental experiences. In contrast to the two ends of a continuum (i.e., either nature or nurture), human behavior, generally, and musical behavior, specifically, are a combination of the two. In the following sections, we will briefly examine genetic influences on musical behavior, neural plasticity, cultural influences on innate infant responses to music, the search for music universals, and cross-cultural music research.
Genetic Influences on Musical Behavior
Genetic instructions provide another example of biological restrictions that can be modified by environmental experiences. Although genes provide instructions that influence nearly everything about us, including both physical features (e.g., hair and eye color) and behavior, genetic instructions
are not inviolable; rather, daily living and life’s experiences influence gene expression, including those associated with learning and memory (Rampon et al., 2000). However, interpreting gene–environment interactions is not without difficulty. What makes the situation so problematic is that some environmental circumstances that might influence genetic expression are themselves open to genetic influence. In reviewing the status of current understanding, Manuck and McCaffery state that, “… it seems reasonable to assume that most dimensions of measured experience will have both environmental and genetic determinants …” (2014, p. 63), even if there is no clear way of disentangling the two. Ullén, Hambrick, and Mosing (2016) discussed interactions between environment and genetic instructions in the development of expertise. In contrast to a focus on deliberate practice as the sole determiner of expert performance, they proposed a multifactorial gene–environment interaction model (MGIM) of expert performance (Fig. 2). According to this model, expertise results from an array of factors that work in tandem. High-level expertise (e.g., musical performance) cannot simply be a matter of enough hours of deliberate practice. Genetic and non-genetic factors, along with their interactions, are necessary. For example, in a large study of twins (N = 10,500), genetic influences accounted for the amount of practice time (69 percent of the variance in males and 41 percent in females) (Mosing, Madison, Pedersen, Kuja-Halkola, & Ullén, 2014).
FIGURE 2. Schematic summary of main elements of the multifactorial gene–environment interaction model (MGIM). At the phenotypic level (upper part), the MGIM assumes that psychological traits such as abilities, personality, interests, and motivation are associated with the domain and intensity of practice. Specific examples of variables that have been shown to be involved in various forms of expertise are provided in italics under each general heading. Practice will cause adaptations of neural mechanisms involved in expertise and can also influence relevant physical body properties. Furthermore, neural mechanisms related to trait differences may impact expertise independently of practice. Both genetic and non-genetic factors (lower part) influence the various variables that are involved in expertise at the phenotypic level. These influences are likely to be complex and involve both gene–environment interaction effects and covariation between genes and environment (G–E covariation). Reprinted from Psychological Bulletin 142(4), Fredrik Ullén, David Zachary Hambrick, and Miriam Anna Mosing, Rethinking expertise: A multifactorial gene–environment interaction model of expert performance, pp. 427–446, doi.org/10.1037/bul0000033, Copyright © 2016 American Psychological Association.
Contemporary research is providing increasingly refined understandings of genetic–musical behavior interactions. For example, gene expression is differentially upregulated or downregulated for music listening or for music performance (Kanduri et al., 2015a, b). Excellent reviews of the role of genetics in music, documenting interaction between genes and environment
are found in Mosing, Peretz, & Ullén (2018), Yi, McPherson, Peretz, Berkovic, & Wilson (2014), and Yi, McPherson, & Wilson (2018). See also Chapter 18.
Neural Plasticity
Musicians are models of neural plasticity (Münte, Altenmüller, & Jäncke, 2002). That is, many changes have been documented in the brains of musicians as a result of training. Table 1 is not intended to be an exhaustive list, either of neural adaptations or number of relevant sources, but rather to show only a few of the ways that adult musicians’ brains have been modified by music learning experiences. Several investigators have concluded that these changes are more likely a result of intense music learning experiences than that these musicians were born with “different” brains (Hyde et al., 2009; Norton et al., 2005; Schlaug, Norton, Overy, & Winner, 2005; Schlaug et al., 2009). In a confirming study, identical twins, with one member of each pair having piano lessons and the other one not, showed significant differences in brain anatomy attributed to musical training (Manzano & Ullén, 2018).
Table 1. Changes in musicians’ brains
Region | Change | Source
Anatomical changes
Cerebellum | Greater volume in males, but not females | Hutchinson et al., 2003
Corpus callosum | Area 3 of CC enlarged | Schlaug et al., 2009
Gray matter | Greater volume in motor, auditory, and visuospatial areas | Bermudez & Zatorre, 2005; Gaser & Schlaug, 2003
Sensorimotor cortex | Identifying markers in precentral cortex for string players (RH) and pianists (LH) | Bangert & Schlaug, 2006
White matter | Positive correlations between amount of practice time and white matter organization | Bengtsson et al., 2005
Functional changes
Auditory cortex | Increased cortical representation for musical tones over pure tones | Pantev et al., 1998, 2001
Multimodal integration areas | Increased activity in convergence zones | Hodges et al., 2005
RH motor cortex | Increased cortical representation for string players | Elbert et al., 1995
Secondary auditory cortex | Superior sound localization in conductors | Münte et al., 2001
Temporal and frontal lobes | Enhanced MMN for chord alterations | Koelsch et al., 1999; Tervaniemi et al., 1999
Visual cortex | Minimal deactivation of visual cortex during difficult auditory tasks | Hodges et al., 2010
RH = right hemisphere; LH = left hemisphere; CC = corpus callosum; MMN = mismatch negativity, a component of event-related potentials in response to a violation of an expected rule (e.g., a wrong note in a tonal musical passage).
Actually, formal study, or in musical parlance practice, is not necessary for musical experiences to elicit changes in the brain. With the possible exception of those with congenital amusia (Peretz, Brattico, Järvenpää, & Tervaniemi, 2009), nearly everyone learns the music of the surrounding culture, even in the absence of formal training. For example, people generally have no trouble successfully processing the accompanying
musical track while watching movies and television. This was confirmed in a study in which scores on combined music aptitude tests were normally distributed in a population, suggesting that “moderate musical aptitude is common and does not need formal training” (Oikkonen & Järvelä, 2014, p. 1104). One of the critical challenges infants face is to make sense of what initially appears to be a chaotic world. Fortunately, they come into the world remarkably able to detect patterns and structures in the environment based on the frequency with which they are encountered. Moreover, they are often able to do this in the absence of explicit feedback. Statistical learning, as it is called, is foundational for understanding how we process both auditory (Saffran, Aslin, & Newport, 1996; Saffran, Johnson, Aslin, & Newport, 1998) and visual stimuli (Kirkham, Slemmer, & Johnson, 2002; Turk-Browne, Jungé, & Scholl, 2005). Music and language are the two primary auditory inputs that have been studied. Regarding music, statistical learning plays a role in the perception of melody (Creel, Newport, & Aslin, 2004), harmony (Jonaitis & Saffran, 2009), timbre (Tillmann & McAdams, 2004), and the acquisition of absolute pitch (Saffran, 2003; Saffran & Griepentrog, 2001). Gestalt organizing principles appear to be important in the statistical learning process (Creel, Newport, & Aslin, 2004). Work on the neural structures involved in statistical learning is just beginning (e.g., Karuza et al., 2013); however, there is every reason to believe that advancements in this area will continue to be made. In the meantime, additional support for innate neural structures subserving music came with the discovery that congenital amusics (persons with music processing deficits) can learn unfamiliar words as easily as controls, but not musical patterns (Peretz, Saffran, Schön, & Gosselin, 2012); in other words, mere exposure is not sufficient without the requisite intact neural mechanisms.
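One common way to make the idea of statistical learning concrete is through transitional probabilities: how often one element of a stream is followed by another. The Python sketch below is a minimal illustration under stated assumptions. The three-tone "words" and their ordering are hypothetical and are not stimuli from any of the studies cited above; the function name is likewise an assumption made for this example.

```python
from collections import Counter


def transitional_probabilities(stream):
    """Return P(next | current) for every adjacent pair in the stream."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


# Hypothetical three-tone "words"; no tone is shared between words.
WORDS = {"w1": ["C", "E", "G"], "w2": ["D", "F", "A"], "w3": ["B", "G#", "F#"]}
# Hypothetical ordering in which no word immediately repeats.
ORDER = ["w1", "w2", "w3", "w1", "w3", "w2", "w1", "w2", "w3", "w2", "w1", "w3"]

stream = [tone for word in ORDER for tone in WORDS[word]]
tp = transitional_probabilities(stream)

print("within word   C -> E:", tp[("C", "E")])  # always followed by E
print("across words  G -> D:", tp[("G", "D")])  # followed by D only sometimes
```

In this toy stream, transitions inside a word are perfectly predictable while transitions across word boundaries are not. A learner sensitive to that difference could, in principle, segment the stream into units without any explicit feedback.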
Experience-expectant processes (e.g., language and music) are largely driven by genes; the brain prepares itself, largely through genetic processes, to learn any language(s) that the person might encounter (Kuhl & Rivera-Gaxiola, 2008). Experience-dependent processes (e.g., English or Spanish; jazz or Chinese opera) rely more on learning experiences. Thus, infants have the capability of processing any musical style they might encounter (Hannon & Trehub, 2005; Winkler, Háden, Ladinig, Sziller, & Honing, 2009), but the particular musical style or styles depends upon the
environment in which they are raised. Galvan (2010) created a model whereby neural plasticity is a result of both development and learning (Fig. 3). Rather than being independent, autonomous processes, development and learning are part of a continuum. Genetic instructions and learning experiences work together to shape the brain. Experience-expectant mechanisms rely more on development, while experience-dependent mechanisms rely more on learning.
FIGURE 3. This working model illustrates that development and learning exist on a continuum, as each independently and simultaneously influence neural plasticity. While development is largely guided by experience-expectant mechanisms, it also receives input from experience-dependent mechanisms. Similarly, learning is mostly guided by experience-dependent mechanisms, but also receives experience-expectant input. Reprinted from Human Brain Mapping 31(6), Adriana Galván, Neural plasticity of development and learning, pp. 879–90, Figure 1+, doi.org/10.1002/hbm.21029, Copyright © 2010, John Wiley and Sons.
Looking for explanations of how these changes occur in the brain leads us to two basic brain development processes, neural pruning and myelination, both of which have been implicated in musical studies. Each process is driven by both genetic instructions and lived experiences.
Neural Pruning
Early on in development, the brain overproduces synapses, the connections between neurons (Berk, 2017). Different brain regions peak at different times, but by age 2 there may be as many as 50 percent more synapses in a given area than will be present during adulthood (Stiles, Reilly, Levine, Trauner, & Nass, 2012). Following the peak of this rapid proliferation of synapses, a protracted period of decline extends throughout childhood and into early adulthood. Operating on a “use it or lose it” basis, unused synapses are selectively pruned, leaving a sculpted brain (Gogate, Giedd, Janson, & Rapoport, 2001). The number of possible connections (100 trillion, or 10¹⁴, synapses in the cerebral cortex) is far too great to be determined by genetics alone. Rather, the general outlines are genetically programmed, with selective pruning guided by sensory and motor experience, psychoactive drugs, gonadal hormones, parent–child relationships, peer relationships, stress, intestinal flora, and diet (Kolb & Gibb, 2011). Changes in cortical thickness as a result of pruning are associated with behavior. Sculpting the brain is not simply a matter of deleting unused cells and synapses. At the same time this is happening, new synapses are being formed throughout the lifetime. Synapses formed early in life are “expecting” certain life experiences that will prune them into optimal networks. Later forming synapses are more localized and specific to particular learning experiences. “Thus, experiences are changing neural networks by both adding and pruning synapses” (Kolb & Gibb, 2011, p. 268).
Myelination
As neurons communicate among themselves by creating neural networks, they have numerous dendrites for input but only one axon for output. Over time, axons are covered in a fatty sheath called myelin that enhances transmission speed up to 100 times and improves efficiency (Zull, 2002). Genetic instructions drive myelination in a process that moves through the
brain from bottom to top and back to front. Thus, it is only in one’s early to mid-20s that the frontal lobes are fully myelinated, and increasing myelination is related to enhanced cognitive functioning (Webb, Monk, & Nelson, 2001). Because myelin is white in appearance, the core of the brain is called white matter; here, billions of fibers connect different regions of gray matter into neural networks (Filley, 2005). Although genetic instructions are essential, myelination is also responsive to learning experiences, as “neurons that wire together fire together” and “neurons that fire apart wire apart—or neurons out of sync fail to link” (Doidge, 2007, pp. 63–64). In other words, when we engage repeatedly in a thought or action (e.g., practicing scales), the neural network(s) supporting those processes becomes stronger with repetitive stimulation (Fields, 2009). Specifically, learning experiences elicit more wrappings of the axon, making message transmission increasingly efficient. Thus, for example, Bengtsson et al. (2005) found that practicing the piano induced changes in white matter plasticity. Changes were greater during childhood than during adolescence or adulthood. Improved efficiency comes at a cost, as myelination decreases flexibility in neural responses. That is, the brain places restrictions on itself such that what is learned limits what can be learned (Quartz, 2003). The more attuned to surrounding cultural expressions (e.g., language, music, etc.) children become, the less responsive they are to other cultural expressions (Pons, Lewkowicz, Soto-Faraco, & Sebastián-Gallés, 2009). Responding appropriately to unfamiliar tonal and rhythmic structures becomes more difficult once one has learned the music of the surrounding culture (Patel, Meltznoff, & Kuhl, 2004).
C
I R
I M
I
There is significant evidence that the fetus responds to sounds during the last trimester before birth, as activations in the primary auditory cortex were recorded using fMRI in the left hemisphere of fetuses at 33 weeks gestation (Jardi et al., 2008). Newborns as early as 1–3 days old responded to music with activations in both hemispheres (Perani et al., 2010). Excerpts of
Western tonal music registered primarily in the right hemisphere (RH), while altered or dissonant versions reduced RH responses and activated left inferior frontal cortex and limbic structures. “These results demonstrate that the infant brain shows a hemispheric specialization in processing music as early as the first postnatal hours” (Perani et al., 2010, p. 4758). Similarly, using near-infrared spectroscopy (NIRS), researchers found that neonates registered speech and music sounds in both hemispheres, with more coherent responses to speech in the left hemisphere (LH) (Kotilahti et al., 2010). Regarding music specifically, researchers used event-related potentials (ERP) to determine that newborns can process musical pitch intervals (Stefanics et al., 2009), distinguish pitch from timbre (Háden et al., 2009), detect the beat in music (Winkler et al., 2009), and create expectations for tonal patterns (Carral et al., 2005). While one cannot rule out the effects of learning entirely, it seems clear that we come into the world prepared to process musical sounds. The foregoing suggests inborn proclivities for musical processing, but not predetermined responses to specific styles of music. After reviewing the literature, Hannon and Trainor (2007) concluded that neural networks “become increasingly specialized for encoding the musical structure of a particular culture” (p. 470). As an example, Shahin, Roberts, and Trainor (2004) found that auditory evoked potentials in 4- and 5-year-olds were larger in those who received Suzuki music lessons compared to controls who did not; even larger responses were generated by tones from the instrument studied (i.e., piano or violin). To conclude this section, we look at two studies in which researchers examined the effects of enculturation more closely. Mehr and colleagues conducted several experiments designed to explore the ways in which infants imbue music with social meanings.
Five-month-old infants heard one of two novel songs presented by a parent, by a toy, or by a friendly, but unfamiliar adult in person and subsequently via video (Mehr, Song, & Spelke, 2016). Later, these infants heard two novel individuals sing the familiar and then the unfamiliar song. Those infants who had previously heard a parent sing the familiar song preferred (i.e., looked longer at) the new person singing it rather than the new person singing the unfamiliar song. The amount of exposure to the song received at home was correlated with the length of selective attention. These effects were not found in the infants who initially heard the familiar song emanating from a toy or a
socially-unrelated person. Thus, songs sung by caretakers embody social meanings for five-month-old infants. In an extension, eleven-month-old infants were randomly assigned to one of two groups; one group listened to one of two novel songs sung by a parent, while the others heard a song that emanated from a toy activated by a parent (Mehr & Spelke, 2017). Subsequently, they viewed a video of two new people, each singing one of the songs. In a following silent condition, two people appeared next to each other, each presenting and endorsing an object, such as a small stuffed toy or models of an apple or pear, and the infant was allowed to reach toward the objects. Preference was indicated by eye gaze and touching. Infants in both groups chose the object presented by the singer of the familiar song. Clearly, infants preferred familiar songs regardless of whether they were learned by hearing the parent sing or by playing with a musical toy. Even though both groups chose the familiar song, infants who heard parents sing the song gazed longer at the object than those who heard it coming from a toy. Again, music was imbued with social meanings.
T
S
M
U
René Dubos (1981) coined the term invariants, by which he meant characteristics of human culture that are universal in a general sense but particularized in each culture. Language, clothing, and shelter are some examples, as are art and music. A common way to approach an understanding of the ubiquity of music around the world is to separate universal from culture-specific features. Ethnomusicologists have taken on this challenge in many articles (e.g., Boiles, 1984; List, 1984; Merriam, 1964; Nettl, 1977, 1983, 2000, 2005; Nketia, 1984). By universal, they mean “more common than not” or “typical” and certainly not that every culture employs a particular feature. There will nearly always be exceptions. Nevertheless, there is abundant evidence to support the contention that all human societies engage in what may be called or recognized as music (Cross, 2007, 2009–2010; Cross & Morley, 2009; Nettl, 2005). Such universal behavior is likely supported by underlying
biological mechanisms (Turner & Ioannides, 2009), such as genetic influences (see Chapter 18). One line of support for music’s long-standing role in human development comes from archaeological findings. Although the earliest evidence of art is shrouded in the mists of time, there are tantalizing hints such as the Venus of Tan-Tan, a quartzite sculpture dated from between 300,000 and 500,000 years ago (DeFelipe, 2011) or cave paintings from 64,800 years ago (Hoffman et al., 2018). Granted, these earliest findings are controversial, having been created by Homo heidelbergensis and Homo neanderthalensis, respectively, and are not direct evidence of music. However, there is reason to believe that music was also part of the early human behavioral repertoire (e.g., Mithen, 2006). Here are just a few examples of supporting evidence:
• 70,000 years ago: Cave paintings depict a bow, which anthropologists contend was used as a musical instrument as well as a weapon (Kendig & Levitt, 1982); musical bows have been found worldwide (Mason, 1897).
• 60,000 years ago: Artifacts in a cave in Lebanon indicate ceremonies involving singing and dancing (Constable, 1973). This was made more plausible when a video was made of a contemporary Aboriginal Australian executing a cave painting in the presence of singing and dancing as part of a religious ritual (Mumford, 1967). Acoustically, the best places for singing and chanting are those caves with the most art; rooms with poor acoustics rarely have paintings (Allman, 1994; Cross & Morley, 2009; Morley, 2006).
• 40,000–20,000 years ago: Cave paintings of musicians and dancers (Prideaux, 1973) are found along with whistles, pipes, flutes, and bone and rock percussion instruments (Blake & Cross, 2008; Cross et al., 2002; Dams, 1985; Kunej & Turk, 2000).
Of course, a more extensive treatment of this topic would provide many more details, but even these few examples should suffice to show that humans have always and everywhere been musical.
Singing is common among all cultures (Lomax, 1968), as is the singing of lullabies and dancing to music (McDermott, 2008). Lullabies appear to
possess common features (Trehub, Unyk, & Trainor, 1993). The use of musical instruments is so common as to be nearly universal, if not completely so (Wade, 2009). Instruments are often classified into idiophones (struck instruments such as gongs and rattles), membranophones (drums), aerophones (flutes and other wind instruments), chordophones (stringed instruments), corpophones (body percussion and hand clapping), and electrophones (mechanical and electrical instruments) (Hornbostel & Sachs, 1992; Wade, 2009). Drake and Bertrand (2001) proposed five candidates for universals in temporal processing: segmentation and grouping, predisposition toward regularity, active search for regularity, temporal zone for optimal processing, and predisposition toward simple duration ratios. Even some basic emotions appear to be recognized in music cross-culturally (Adachi, Trehub, & Abe, 2004; Balkwill & Thompson, 1999; Balkwill, Thompson, & Matsunaga, 2004; Fritz et al., 2009), although subtle emotions are strongly affected by culture (Davies, 2010; Gregory & Varney, 1996). Given the enormous variety of music and musicing around the world, and even given the fact that some cultures do not have a specific word for music in their language (Cross & Morley, 2009; Dissanayake, 2009), it should be no surprise that there is scant agreement on universal features. However, Brown and Jordania (2011) proposed four types of universals:
• Type 1: Conserved Universals occur in all musical utterances and include the use of discrete pitches, octave equivalence, phrase structures, and so on.
• Type 2: Predominant Patterns occur in all musical systems or styles and include musical scales with seven or fewer pitches per octave, a predominance of precise rhythms, use of idiophones and drums, and so on.
• Type 3: Common Patterns are musical patterns that, while not universal, are widespread. Examples might include the unity of Jewish musical traditions following the diaspora, with Ashkenazic styles in Russia and northern Europe and Sephardic styles in Persia, India, Spain, and the Mediterranean basin (Bahat, 1980). Another example would be religious music, such as Buddhist or Christian musical practices in many different countries.
• Type 4: Range Universals are particular categories of music or musical behavior that are expressed across a wide range of possibilities. For example, all music could be placed into a classification of multipart textures, ranging from monophony through heterophony and homophony to polyphony.
The first three categories are based strongly on Nettl’s (2000) gradient-of-universality approach. The authors then provide a list of seventy items related to music’s sound structures (i.e., pitch, rhythm, melodic structure and texture, form, vocal style, expressive devices, and instruments) and extra-musical features (i.e., contexts, contents, and behavior). Just as the twelve tones of Western music’s chromatic scale provide for an infinite number of realizations, so might these putative universals provide the structure of human music within which the cultural variations are also infinite. Continued research, however, is critically necessary. Some candidates for relatively universal functions and roles of music in worldwide cultures have also been offered (Table 2).
Table 2. Functions and roles of music

Source     Function or role

Music provides the function of:
M          Emotional expression
M          Aesthetic enjoyment
M & G      Entertainment
M & G      Communication
M, C, & G  Symbolic representation
M & C      Physical response/coordination of action
M          Enforcing conformity to social norms
M & G      Validation of social institutions and religious rituals
M          Contribution to the continuity and stability of culture
M          Contribution to the integration of society
C          Regulation of an individual’s emotional, cognitive, or physiological state
C          Mediation between self and other

Traditional roles of music include:
G          Lullabies
G          Games
G          Work music
G          Dancing
G          Storytelling
G          Ceremonies and festivals
G          Battle
G          Ethnic or group identity
G          Salesmanship
G          Healing
G          Trance
G          Court music

C = Clayton (2009), G = Gregory (1997), M = Merriam (1964).
On balance, evidence supports the notion that biological and cultural aspects combine and interact to create whatever may be universal about music. Ongoing cross-cultural music research is critical to advancing our understanding.
Cross-Cultural Music Research
In thinking about cross-cultural music research, it should be noted that one difficulty in our current understanding of neurocognition is that 94 percent of the participants in psychological experiments come from only 12 percent of the world’s population (Arnett, 2008), and 90 percent of published neuroimaging studies come from Western countries (Chiao, 2009). This imbalance is likely even greater for music cognition, and it is alarmingly exacerbated by the rapid Westernization of the globe, such that it will soon be much more difficult to find listeners who have not been exposed to Western music. Time is running out for us to have access to indigenous, authentic musical performers and listeners. A few tentative conclusions can be drawn from the relatively small number of cross-cultural music research studies published:
1. The most general finding is that enculturation strongly affects how one interprets and understands music from within and without the home culture (Curtis & Bharucha, 2009; Demorest, Morrison, Beken, & Jungbluth, 2008; Demorest, Morrison, Nguyen, & Bodnar, 2016; Kessler, Hansen, & Shepard, 1984). The cultural distance hypothesis suggests that musical processing is more efficient and accurate when unfamiliar music is similar to one’s own cultural music and less so the farther removed the unfamiliar music becomes (Demorest & Morrison, 2016; see also Chapter 3).
2. Given the caveats for music universals from the previous section, there are probable cognitive and emotional processes that support all musical experiences, but these can be highly modified by enculturation (e.g., Krumhansl et al., 2000; Laukka, Eerola, Thingujam, & Yamasaki, 2013; Neuhaus, 2003).
3. Certain basic emotions (e.g., happiness, sadness, anger) may be identifiable in unfamiliar music, but less so emotions that are more culture specific (Balkwill & Thompson, 1999; Balkwill et al., 2004; Fritz et al., 2009; Laukka et al., 2013). There is some evidence that psychophysical variables or acoustic cues (e.g., tempo, loudness, complexity) play a role in determining emotional expressions (Balkwill & Thompson, 1999; Balkwill et al., 2004).
4. Music enculturation begins early in infancy (Morrison, Demorest, & Stambaugh, 2008; Soley & Hannon, 2010; Trainor, Marie, Gerry, Whiskin, & Unrau, 2012); bimusicalism, similar to bilingualism, can result from sufficient early exposure (Wong, Roy, & Margulis, 2009). Once well established, however, enculturated processes may be somewhat resistant to change through training (Morrison, Demorest, Campbell, Bartolome, & Roberts, 2013).
5. Active musicing is more efficient than passive exposure in establishing enculturated music processes (Trainor et al., 2012); however, passive exposure, in the form of statistical learning, is sufficient for inculcating a basic understanding of one’s own cultural music (Drake & Ben El Heni, 2003).
Regarding brain responses in cross-cultural research, a few additional points can be made. Familiar music (i.e., from one’s own culture) and unfamiliar music may elicit responses in similar (Demorest & Morrison, 2003) or nearby brain regions (Matsunaga, Yokosawa, & Abe, 2012). Also, brain activations to culturally unfamiliar music may differ more in degree than in substance (Demorest & Osterhout, 2012; Morrison & Demorest, 2009; Morrison, Demorest, Aylward, Cramer, & Maravilla, 2003; Nan, Knösche, & Friederici, 2006). However, different brain regions may also be activated in response to familiar and unfamiliar music depending on specific tasks required of participants (Nan, Knösche, Zysset, & Friederici, 2008). Finally, cultural experiences influence both the perception and memory of music at behavioral and neurological levels (Demorest et al., 2010). Taken as a whole, cross-cultural research supports the main contention of this chapter, namely that musical experiences are an intricate and complicated combination of biological and cultural processes.
Because biological mechanisms may influence how enculturation proceeds and enculturation may impose biological constraints, the two are highly interrelated. As stated at the outset, our purpose is not in attempting to separate the two, as that is not only impossible but an artificial schism; rather, it is to recognize that one informs the other.
Conclusion
Viewed through the lens of cultural neuroscience, the central thesis of this chapter is that biological and cultural aspects of musical experiences are inextricably intertwined. Virtually nothing about musical experiences is purely biological or purely cultural. We might consider a tree with its root system as a visual analogy (Fig. 4). Let the trunk represent musicality as a universal aspect of being human. Let the branches represent major cultural traditions and the smaller twigs and leaves stand for particular musical genres and styles.1 The cultural distance hypothesis (Demorest & Morrison, 2016) suggests that leaves on the same branch (i.e., nearby musical styles) are more understandable to listeners than leaves on the opposite side of the tree. Supporting the visible part of the tree is a dense, deep-seated root system. These roots represent the supporting biological and cultural underpinnings of music. Each root is an amalgam of biological and cultural aspects, such that it is impossible to disentangle the Gordian knot.
FIGURE 4. A visual analogy for human musicality. The roots represent biological and cultural underpinnings. The trunk represents musicality as a universal aspect of humankind. The branches represent different cultural traditions and the twigs and leaves represent particular musical genres and styles.
It is important to remember that the object of study in neuromusical research is not a brain that sits in a jar on a shelf in some lab; it is inside a living person with a personality, with all manner of proclivities, potentialities, and internal and external motivations and influences. Being mindful of these biocultural interactions does not mean that it is possible to separate biology from culture, but rather that research findings must be
interpreted with an awareness of these mutual influences. Let us hope that research within a cultural neuroscience perspective will proceed at an ever-increasing pace so that we can learn as much as possible about the biocultural aspects of music before it is too late and there are no more indigenous, authentic musicians and music listeners to study.
References
Adachi, M., Trehub, S., & Abe, J.-I. (2004). Perceiving emotion in children’s songs across age and culture. Japanese Psychological Research 46(4), 322–336.
Allman, W. (1994). The stone age present. New York: Simon & Schuster.
Ambady, N., & Bharucha, J. (2009). Culture and the brain. Current Directions in Psychological Science 18(6), 342–345.
Arnett, J. (2008). The neglected 95 percent: Why American psychology needs to become less American. American Psychologist 63(7), 602–614.
Bahat, A. (1980). The musical traditions of the oriental Jews. The World of Music 22(2), 46–55.
Balkwill, L.-L., & Thompson, W. (1999). A cross-cultural investigation of the perception of emotions in music: Psychophysical and cultural cues. Music Perception 17(1), 43–64.
Balkwill, L.-L., Thompson, W., & Matsunaga, R. (2004). Recognition of emotion in Japanese, Western, and Hindustani music by Japanese listeners. Japanese Psychological Research 46(4), 337–349.
Bangert, M., & Schlaug, G. (2006). Specialization of the specialized in features of external human brain morphology. European Journal of Neuroscience 24(6), 1832–1834.
Bengtsson, S., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience 8(9), 1148–1150.
Berk, L. (2017). Development through the lifespan. New York: Pearson Education.
Bermudez, J., & Zatorre, R. (2005). Differences in gray matter between musicians and nonmusicians. The neurosciences and music II: From perception to performance. Annals of the New York Academy of Sciences 1060, 395–399.
Blacking, J. (1973). How musical is man? Seattle: University of Washington Press.
Blake, E., & Cross, I. (2008). Flint tools as portable sound-producing objects in the upper paleolithic context: An experimental study. In P. Cunningham, J. Heeb, & R. Paardekooper (Eds.), Experiencing archaeology by experiment (pp. 1–19). Oxford: Oxbow Books.
Boiles, C. (1984). Universals of musical behavior: A taxonomic approach. The World of Music 26(2), 50–64.
Brown, S., & Jordania, J. (2011). Universals in the world’s musics. Psychology of Music 41(2), 229–248.
Carral, V., Huotilainen, M., Ruusuvirta, T., Fellman, V., Näätänen, R., & Escera, C. (2005). A kind of auditory “primitive intelligence” already present at birth. European Journal of Neuroscience 21(11), 3201–3204.
Chiao, J. (2009). Cultural neuroscience: A once and future discipline. Progress in Brain Research 178, 287–304.
Chiao, J., Li, S., Seligman, R., & Turner, R. (Eds.). (2016). The Oxford handbook of cultural neuroscience. Oxford: Oxford University Press.
Clayton, M. (2009). The social and personal functions of music in cross-cultural perspective. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (pp. 35–44). Oxford: Oxford University Press.
Conard, N., Malina, M., & Münzel, C. (2009). New flutes document the earliest musical tradition in southwestern Germany. Nature 460(7256), 737–740.
Constable, G. (1973). The Neanderthals. New York: Time-Life Books.
Creel, S., Newport, E., & Aslin, R. (2004). Distant melodies: Statistical learning of nonadjacent dependencies in tone sequences. Journal of Experimental Psychology 30(5), 1119–1130.
Cross, I. (2007). Music and cognitive evolution. In L. Barrett & R. Dunbar (Eds.), The Oxford handbook of evolutionary psychology (pp. 649–667). Oxford: Oxford University Press.
Cross, I. (2009–2010). The evolutionary nature of musical meaning. Musicæ Scientiæ, Special Issue 2009–2010, 179–200.
Cross, I., & Morley, I. (2009). The evolution of music: Theories, definitions and the nature of the evidence. In S. Malloch & C. Trevarthen (Eds.), Communicative musicality (pp. 61–81). Oxford: Oxford University Press.
Cross, I., Zubrow, E., & Cowan, F. (2002). Musical behaviours and the archaeological record: A preliminary study. In J. Mathieu (Ed.), Experimental archaeology. British Archaeological Reports International Series 1035 (pp. 25–34). Oxford: BAR Publishing.
Curtis, M. E., & Bharucha, J. J. (2009). Memory and musical expectation for tones in cultural context. Music Perception 26(4), 365–375.
Dams, L. (1985). Paleolithic lithophones: Descriptions and comparisons. Oxford Journal of Archaeology 4(1), 31–46.
Davies, S. (2010). Emotions expressed and aroused by music: Philosophical perspectives. In P. Juslin & J. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 15–43). Oxford: Oxford University Press.
DeFelipe, J. (2011). The evolution of the brain, the human nature of cortical circuits, and intellectual creativity. Frontiers in Neuroanatomy 5(29), 1–17.
Demorest, S., & Morrison, S. (2003). Exploring the influence of cultural familiarity and expertise on neurological responses to music. Annals of the New York Academy of Sciences 999, 112–117.
Demorest, S., & Morrison, S. (2016). Quantifying culture: The cultural distance hypothesis of melodic expectancy. In J. Chiao, S.-C. Li, R. Seligman, & R. Turner (Eds.), The Oxford handbook of cultural neuroscience (pp. 183–196). Oxford: Oxford University Press.
Demorest, S., Morrison, S., Beken, M., & Jungbluth, D. (2008). Lost in translation: An enculturation effect in music memory performance. Music Perception 25(3), 213–223.
Demorest, S., Morrison, S., Nguyen, V., & Bodnar, E. (2016). The influence of contextual cues on cultural bias in music memory. Music Perception 33(5), 590–600.
Demorest, S., Morrison, S., Stambaugh, L., Beken, M., Richards, T., & Johnson, C. (2010). An fMRI investigation of the cultural specificity of music memory. Social Cognitive and Affective Neuroscience 5(2–3), 282–291.
Demorest, S., & Osterhout, L. (2012). ERP responses to cross-cultural melodic expectancy violations. Annals of the New York Academy of Sciences 1252, 152–157.
Dissanayake, E. (2009). Root, leaf, blossom, or bole: Concerning the origin and adaptive function of music. In S. Malloch & C. Trevarthen (Eds.), Communicative musicality: Exploring the basis of human companionship (pp. 17–30). Oxford: Oxford University Press.
Dobzhansky, T. (1962). Mankind evolving. New Haven, CT: Yale University Press.
Doidge, N. (2007). The brain that changes itself. New York: Penguin.
Drake, C., & Ben El Heni, J. (2003). Synchronizing with music: Intercultural differences. Annals of the New York Academy of Sciences 999, 429–437.
Drake, C., & Bertrand, D. (2001). The quest for universals in temporal processing in music. Annals of the New York Academy of Sciences 930, 17–27.
Dubos, R. (1981). Celebrations of life. New York: McGraw-Hill.
Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased cortical representation of the fingers of the left hand in string players. Science 270(5234), 305–307.
Fields, D. (2009). The other brain. New York: Simon & Schuster.
Filley, C. (2005). White matter and behavioral neurology. In J. Ulmer, L. Parsons, M. Moseley, & J. Gabrieli (Eds.), White matter in cognitive neuroscience. Annals of the New York Academy of Sciences 1064, 162–183.
Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., … Koelsch, S. (2009). Universal recognition of three basic emotions in music. Current Biology 19(7), 573–576.
Galvan, A. (2010). Neural plasticity of development and learning. Human Brain Mapping 31(6), 879–890.
Gaser, C., & Schlaug, G. (2003). Gray matter differences between musicians and nonmusicians. Annals of the New York Academy of Sciences 999, 514–517.
Gaston, E. (1968). Man and music. In E. Gaston (Ed.), Music in therapy (pp. 7–29). New York: Macmillan.
Gogate, N., Giedd, J., Janson, K., & Rapoport, J. (2001). Brain imaging in normal and abnormal brain development: New perspectives for child psychiatry. Clinical Neuroscience Research 1(4), 283–290.
Gregory, A. (1997). The roles of music in society: The ethnomusicological perspective. In D. Hargreaves & A. North (Eds.), The social psychology of music (pp. 123–140). Oxford: Oxford University Press.
Gregory, A., & Varney, N. (1996). Cross-cultural comparisons in the affective response to music. Psychology of Music 24(1), 47–52.
Háden, G., Stefanics, G., Vestergaard, M., Denham, S., Sziller, I., & Winkler, I. (2009). Timbre-independent extraction of pitch in newborn infants. Psychophysiology 46(1), 69–74.
Han, S., & Ma, Y. (2015). A culture–behavior–brain loop model of human development. Trends in Cognitive Sciences 19(11), 666–676.
Han, S., Northoff, G., Vogeley, K., Wexler, B., Kitayama, S., & Varnum, M. (2013). A cultural neuroscience approach to the biosocial nature of the human brain. Annual Review of Psychology 64, 335–359.
Hannon, E. E., & Trainor, L. J. (2007). Music acquisition: Effects of enculturation and formal training on development. Trends in Cognitive Sciences 11(11), 466–472.
Hannon, E. E., & Trehub, S. E. (2005). Metrical categories in infancy and adulthood. Psychological Science 16(1), 48–55.
Hodges, D., Burdette, J., & Hairston, D. (2005). Aspects of multisensory perception: The integration of visual and auditory information processing in musical experiences. In G. Avanzini, L. Lopez, S. Koelsch, & M. Majno (Eds.), The neurosciences and music II: From perception to performance. Annals of the New York Academy of Sciences 1060, 175–185.
Hodges, D., Hairston, W., Maldjian, J., & Burdette, J. (2010). Keeping an open mind’s eye: Mediation of cross-modal inhibition in music conductors. In S. M. Demorest, S. J. Morrison, & P. S. Campbell (Eds.), Proceedings of the 11th International Conference on Music Perception and Cognition (ICMPC 11) (pp. 415–416). Seattle, Washington.
Hoffman, D., Standish, C., Garcia-Diez, M., Pettitt, P., Milton, J., Zilhão, J., … Pike, A. (2018). U-Th dating of carbonate crusts reveals Neandertal origin of Iberian cave art. Science 359(6378), 912–915.
Hornbostel, E., & Sachs, C. (1992). Classification of musical instruments. In H. Meyers (Ed.), Ethnomusicology: An introduction (pp. 444–461). New York: W. W. Norton.
Hutchinson, S., Lee, L., Gaab, N., & Schlaug, G. (2003). Cerebellar volume of musicians. Cerebral Cortex 13(9), 943–949.
Hyde, K., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A., & Schlaug, G. (2009). Musical training shapes brain development. Journal of Neuroscience 29(10), 3019–3025.
Jardi, R., Pins, D., Houfflin-Debarge, V., Chaffiotte, C., Rocourt, N., Pruvo, J.-P., … Thomas, P. (2008). Fetal cortical activation to sound at 33 weeks of gestation: A functional MRI study. NeuroImage 42(1), 10–18.
Jonaitis, E., & Saffran, J. (2009). Learning harmony: The role of serial statistics. Cognitive Science 33(5), 951–968.
Kanduri, C., Kuusi, T., Ahvenainen, M., Philips, A., Lähdesmäki, H., & Järvelä, I. (2015a). The effect of music performance on the transcriptome of professional musicians. Scientific Reports 5, 9506. doi:10.1038/srep09506
Kanduri, C., Raijas, P., Ahvenainen, M., Philips, A., Ukkola-Vuoti, L., Lähdesmäki, H., & Järvelä, I. (2015b). The effect of listening to music on human transcriptome. PeerJ 3, e830. doi:10.7717/peerj.830
Karuza, E., Newport, E., Aslin, R., Starling, S., Tivarus, M., & Bavelier, D. (2013). The neural correlates of statistical learning in a word segmentation task: An fMRI study. Brain and Language 127(1), 46–54.
Kendig, F., & Levitt, G. (1982). Overture: Sex, math and music. Science Digest 90(1), 72–73.
Kessler, E. J., Hansen, C., & Shepard, R. N. (1984). Tonal schemata in the perception of music in Bali and the West. Music Perception 2(2), 131–165.
Kirkham, N., Slemmer, J., & Johnson, S. (2002). Visual statistical learning in infancy: Evidence for a domain general learning mechanism. Cognition 83(2), B35–B42.
Koelsch, S., Schröger, E., & Tervaniemi, M. (1999). Superior pre-attentive auditory processing in musicians. NeuroReport 10(6), 1309–1313.
Kolb, B., & Gibb, R. (2011). Brain plasticity and behaviour in the developing brain. Journal of the Canadian Academy of Child and Adolescent Psychiatry 20(4), 265–276.
Kotilahti, K., Nissilä, I., Näsi, T., Lipiäinen, L., Noponen, T., Meriläinen, P., … Fellman, V. (2010). Hemodynamic responses to speech and music in newborn infants. Human Brain Mapping 31(4), 595–603.
Krumhansl, C. L., Toivanen, P., Eerola, T., Toiviainen, P., Järvinen, T., & Louhivuori, J. (2000). Cross-cultural music cognition: Cognitive methodology applied to North Sami yoiks. Cognition 76(1), 13–58.
Kuhl, P., & Rivera-Gaxiola, M. (2008). Neural substrates of language acquisition. Annual Review of Neuroscience 31, 511–534.
Kunej, D., & Turk, U. (2000). New perspectives on the beginnings of music: Archaeological and musicological analysis of a middle Paleolithic bone “flute.” In N. Wallin, B. Merker, & S. Brown (Eds.), The origins of music (pp. 235–268). Cambridge, MA: MIT Press.
Laukka, P., Eerola, T., Thingujam, N., & Yamasaki, T. (2013). Universal and culture-specific factors in the recognition and performance of musical affect expressions. Emotion 13(3), 434–449.
List, G. (1984). Concerning the concept of the universal and music. The World of Music 26(2), 40–47.
Lomax, A. (1968). Folk song style and culture. New Brunswick, NJ: Transaction Books.
McDermott, J. (2008). The evolution of music. Nature 453(7193), 287–288.
Manuck, S., & McCaffery, J. (2014). Gene–environment interaction. Annual Review of Psychology 65, 41–70. Manzano, O., & Ullén, F. (2018). Same genes, different brains: Neuroanatomical differences between monozygotic twins discordant for musical training. Cerebral Cortex 28(1), 1–8, 387–394. Marek, G. (1975). Toscanini. London: Vision Press. Mason, O. (1897). Geographical description of the musical bow. American Anthropologist 10(11), 377–380. Matsunaga, R., Yokosawa, K., & Abe, J. (2012). Magnetoencephalography evidence for different brain subregions serving two musical cultures. Neuropsychologia 50(14), 3218–3227. Mehr, S., Song, L., & Spelke, E. (2016). For 5-month-old infants, melodies are social. Psychological Science 27(4), 486–501. Mehr, S., & Spelke, E. (2017). Shared musical knowledge in 11-month-old infants. Developmental Science 21(2), e12542. doi:10.1111/desc.12542 Merriam, A. (1964). The anthropology of music. Chicago, IL: Northwestern University Press. Mithen, S. (2006). The singing Neanderthals: The origins of music, language, mind, and society. Cambridge, MA: Harvard University Press. Morley, I. (2006). The evolutionary origins and archaeology of music: An investigation into the prehistory of human musical capacities and behaviors (Doctoral dissertation). University of Cambridge, Cambridge. Darwin College Research Reports, DCRR-002. Retrieved from https://www.darwin.cam.ac.uk/drupal7/sites/default/files/Documents/publications/dcrr002.pdf Morrison, S., & Demorest, S. (2009). Cultural constraints on music perception and cognition. In J. Y. Chiao (Ed.), Progress in brain research, Vol. 178: Cultural neuroscience: Cultural influences on brain function (pp. 67–77). Amsterdam: Elsevier. Morrison, S., Demorest, S., Aylward, E., Cramer, S., & Maravilla, K. (2003). fMRI investigation of cross-cultural music comprehension. NeuroImage 20(1), 378–384. Morrison, S., Demorest, S., Campbell, P., Bartolome, S., & Roberts, J. (2013). 
Effect of intensive instruction on elementary students’ memory for culturally unfamiliar music. Journal of Research in Music Education 60(4), 363–374. Morrison, S., Demorest, S., & Stambaugh, L. (2008). Enculturation effects in music cognition: The role of age and music complexity. Journal of Research in Music Education 56(2), 118–129. Mosing, M., Madison, G., Pedersen, N., Kuja-Halkola, R., & Ullén, F. (2014). Practice does not make perfect: No causal effect of music practice on music ability. Psychological Science 25(9), 1795–1803. Mosing, M., Peretz, I., & Ullén, F. (2018). Genetic influences on music expertise. In D. Hambrick, G. Campitelli, & B. Macnamara (Eds.), The science of expertise: Behavioral, neural, and genetic approaches to complex skill (pp. 272–282). New York: Routledge. Mumford, L. (1967). The myth of the machine. New York: Harcourt Brace Jovanovich. Münte, T., Altenmüller, E., & Jäncke, L. (2002). The musician’s brain as a model of neuroplasticity. Nature Reviews Neuroscience 3(6), 473–478. Münte, T., Kohlmetz, C., Nager, W., & Altenmüller, E. (2001). Superior auditory spatial tuning in conductors. Nature 409(6820), 580. Nan, Y., Knösche, T., & Friederici, A. (2006). The perception of musical phrase structure: A cross-cultural ERP study. Brain Research 1094(1), 179–191. Nan, Y., Knösche, T., Zysset, S., & Friederici, A. (2008). Cross-cultural music phrase processing: An fMRI study. Human Brain Mapping 29(3), 312–328. Nettl, B. (1977). On the question of universals. The World of Music 19, 2–13. Nettl, B. (1983). The study of ethnomusicology. Urbana, IL: University of Illinois Press.
Nettl, B. (2000). An ethnomusicologist contemplates universals in musical sound and culture. In N. Wallin, B. Merker, & S. Brown (Eds.), The origins of music (pp. 463–472). Cambridge, MA: MIT Press. Nettl, B. (2005). The study of ethnomusicology: Thirty-one issues and concepts. Champaign, IL: University of Illinois Press. Neuhaus, C. (2003). Perceiving musical scale structures: A cross-cultural event-related brain potentials study. Annals of the New York Academy of Sciences 999, 184–188. Nketia, J. (1984). Universal perspectives in ethnomusicology. The World of Music 26(2), 3–20. Norton, A., Winner, E., Cronin, K., Overy, K., Lee, D., & Schlaug, G. (2005). Are there pre-existing neural, cognitive, or motoric markers for musical ability? Brain and Cognition 59(2), 124–134. Oikkonen, J., & Järvelä, I. (2014). Genomics approaches to study musical aptitude. Bioessays 36(11), 1102–1108. Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L., & Hoke, M. (1998). Increased auditory cortical representation in musicians. Nature 392(6678), 811–814. Pantev, C., Roberts, L., Schultz, M., Engelien, A., & Ross, B. (2001). Timbre-specific enhancement of auditory cortical representations in musicians. NeuroReport 12(1), 169–174. Patel, A., Meltzoff, A., & Kuhl, P. (2004). Cultural differences in rhythm perception: What is the influence of native language? In S. Lipscomb, R. Ashley, R. Gjerdingen, & P. Webster (Eds.), Proceedings of the 8th International Conference on Music Perception and Cognition. Evanston, IL: Northwestern University. CD-ROM. Perani, D., Saccuman, M., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., … Koelsch, S. (2010). Functional specializations for music processing in the human newborn brain. Proceedings of the National Academy of Sciences 107(10), 4758–4763. Peretz, I., Brattico, E., Järvenpää, M., & Tervaniemi, M. (2009). The amusic brain: In tune, out of key, and unaware. Brain 132(5), 1277–1286. Peretz, I., Saffran, J., Schön, D., & Gosselin, N.
(2012). Statistical learning of speech, not music, in congenital amusia. Annals of the New York Academy of Sciences 1252, 361–367. Pons, F., Lewkowicz, D., Soto-Faraco, S., & Sebastián-Gallés, N. (2009). Narrowing of intersensory speech perception in infancy. Proceedings of the National Academy of Sciences 106(26), 10598–10602. Prideaux, T. (1973). Cro-Magnon man. New York: Time-Life Books. Quartz, S. (2003). Learning and brain development: A neural constructivist perspective. In P. Quinlan (Ed.), Connectionist models of development (pp. 279–309). New York: Psychology Press. Rampon, C., Jiang, C., Dong, H., Tang, Y.-P., Lockhart, D., Schultz, P., … Hu, Y. (2000). Effects of environmental enrichment on gene expression in the brain. Proceedings of the National Academy of Sciences 97(23), 12880–12884. Saffran, J. (2003). Absolute pitch in infancy and adulthood: The role of tonal structure. Developmental Science 6(1), 35–47. Saffran, J., Aslin, R., & Newport, E. (1996). Statistical learning by 8-month-old infants. Science 274(5294), 1926–1928. Saffran, J., & Griepentrog, G. (2001). Absolute pitch in infant auditory learning: Evidence for developmental reorganization. Developmental Psychology 37(1), 74–85. Saffran, J., Johnson, E., Aslin, R., & Newport, E. (1998). Statistical learning of tone sequences by human infants and adults. Cognition 70(1), 27–52. Schlaug, G., Forgeard, M., Zhu, L., Norton, A., Norton, A., & Winner, E. (2009). Training-induced neuroplasticity in young children. The neurosciences and music III. Annals of the New York Academy of Sciences 1169, 205–208.
Schlaug, G., Norton, A., Overy, K., & Winner, E. (2005). Effects of music training on the child’s brain and cognitive development. The neurosciences and music II: From perception to performance. Annals of the New York Academy of Sciences 1060, 219–230. Shahin, A., Roberts, L., & Trainor, L. (2004). Enhancement of auditory cortical development by musical experience in children. NeuroReport 15(12), 1917–1921. Smith, H. (1953). From fish to philosopher. Garden City, NY: Doubleday Anchor. Soley, G., & Hannon, E. (2010). Infants prefer the musical meter of their own culture: A cross-cultural comparison. Developmental Psychology 46(1), 286–292. Stefanics, G., Háden, G., Sziller, I., Balázs, L., Beke, A., & Winkler, I. (2009). Newborn infants process pitch intervals. Clinical Neurophysiology 120(2), 304–308. Stiles, J., Reilly, J., Levine, S., Trauner, D., & Nass, R. (2012). Neural plasticity and cognitive development: Insights from children with perinatal brain injury. Oxford: Oxford University Press. Tervaniemi, M., Kujala, A., Alho, K., Virtanen, J., Ilmoniemi, R., & Näätänen, R. (1999). Functional specialization of the human auditory cortex in processing phonetic and musical sounds: A magnetoencephalographic (MEG) study. NeuroImage 9(3), 330–336. Tillmann, B., & McAdams, S. (2004). Implicit learning of musical timbre sequences: Statistical regularities confronted with acoustical (dis)similarities. Journal of Experimental Psychology: Learning, Memory, and Cognition 30(5), 1131–1142. Trainor, L., Marie, C., Gerry, D., Whiskin, E., & Unrau, A. (2012). Becoming musically enculturated: Effects of music classes for infants on brain and behavior. Annals of the New York Academy of Sciences 1252, 129–138. Trehub, S., Unyk, A., & Trainor, L. (1993). Adults identify infant-directed music across cultures. Infant Behavior and Development 16(2), 193–211. Turk-Browne, N., Jungé, J., & Scholl, B. (2005). The automaticity of visual statistical learning.
Journal of Experimental Psychology: General 134(4), 552–564. Turner, R., & Ioannides, A. (2009). Brain, music and musicality: Inferences from neuroimaging. In S. Malloch & C. Trevarthen (Eds.), Communicative Musicality (pp. 147–181). Oxford: Oxford University Press. Ullén, F., Hambrick, D., & Mosing, M. (2016). Rethinking expertise: A multifactorial gene–environment interaction model of expert performance. Psychological Bulletin 142(4), 427–446. Wade, B. (2009). Thinking musically: Experiencing music, expressing culture (2nd ed.). New York: Oxford University Press. Wakin, D. (2006). John Cage’s long music composition in Germany changes a note. New York Times, May 6. Retrieved September 26, 2017 from http://www.nytimes.com/2006/05/06/arts/music/06chor.html Webb, S., Monk, C., & Nelson, C. (2001). Mechanisms of postnatal neurobiological development: Implications for human development. Developmental Neuropsychology 19(2), 147–171. Wilson, F. (1998). The hand: How its use shapes the brain, language, and human culture. New York: Vintage Books. Winkler, I., Háden, G., Ladinig, O., Sziller, I., & Honing, H. (2009). Newborn infants detect the beat in music. Proceedings of the National Academy of Sciences 106(7), 2468–2471. Wong, P., Roy, A., & Margulis, E. (2009). Bimusicalism: The implicit dual enculturation of cognitive and affective systems. Music Perception 27(2), 81–88. Yi, T., McPherson, G., Peretz, I., Berkovic, S., & Wilson, S. (2014). The genetic basis of music ability. Frontiers in Psychology 5(658), 1–19. Yi, T., McPherson, G., & Wilson, S. (2018). The molecular genetic basis of music ability and music-related phenotypes. In D. Hambrick, G. Campitelli, & B. Macnamara (Eds.), The science of
expertise: Behavioral, neural, and genetic approaches to complex skill (pp. 283–303). New York: Routledge. Zull, J. (2002). The art of changing the brain: Enriching the practice of teaching by exploring the biology of learning. Sterling, VA: Stylus Publishing.
1
To be more accurate, each leaf should have a different shape to represent the individuality of various musical styles.
CHAPTER 3
CULTURAL DISTANCE: A COMPUTATIONAL APPROACH TO EXPLORING CULTURAL INFLUENCES ON MUSIC COGNITION
STEVEN J. MORRISON, STEVEN M. DEMOREST, AND MARCUS T. PEARCE
Introduction
As with many psychological constructs, much of what has been reported in research on the cognitive processing of music is limited to data collected from individuals from a small subset of cultural contexts (Henrich, Heine, & Norenzayan, 2010). Further, the music that is typically employed for the purposes of testing and exploration tends to be drawn from a similarly small set of music practices and mostly consists of that constructed within the Western diatonic framework. This includes Western classical music as well as many North American and Western European folk and popular genres. This is striking given that music is often regarded as a particularly prominent and powerful manifestation of culture. Music is a common way for individuals to assert cultural identity (Frith, 1996) and, as such, its value
arguably lies as much in its cultural and stylistic distinctiveness as in any universal qualities it may possess. Musical systems are somewhat closed in that each describes a set of practices and conventions within which performances, pieces, or whatever might be the appropriate musical “unit” are understood and evaluated. These same practices and conventions can also serve as touchstones against which individuals push in the spirit of creativity and innovation. People come to inhabit a musical system through various combinations of formal learning—conservatory training, for example, as a means of gaining knowledge of avant-garde art music—and informal learning—becoming steeped in Cajun music as a result of growing up in the southern region of the US state of Louisiana.
In this chapter, our purpose is to emphasize music as an intercultural phenomenon. As such, we will not focus on the particularities of any specific music cultural tradition, nor will we examine the concept of musical universals or the structural or acoustical candidates for such a distinction. Rather, we will dedicate our attention to interactions between music cultures, to what happens when music moves across cultural boundaries.
From a sociological perspective, it has been useful to view the construct of culture from a somewhat dichotomous perspective in which the notion of the cultural insider can be contrasted with that of the cultural outsider (Merton, 1972). Contemporary scholarship has drawn attention to the complexity of this comparison and the considerable subjectivity that lies at the heart of such an often oversimplified bifurcation (for an examination of this issue in the field of music research, see Trulsson & Burnard, 2016). Although music is often associated with cultural identity and therefore susceptible to insider/outsider categorization, the ease with which an individual interacts with any given culture’s music may be more nuanced.
Culture-based differences in the way listeners and performers interact with and respond to music are often delineated by ethnic identity or geographical location which are, in turn, generally treated as categorical constructs. As such, they tend to oversimplify complex relationships, obscure considerable within-group variability, and, most critically for the present purpose, do not hold up well when considering a brain-based understanding of music processing. The cultural dimension of music provides context for critical tests of music as a neurological phenomenon. The conclusion that particular brain
regions or neurological pathways are associated with human music processing can be tested by examining whether such relationships are evident across musical and cultural contexts. Likewise, the strength or extent of neural activity may offer insights into the ways in which particular music parameters function within musical systems. Cultural roots of music practice also offer a critical test of principles of formal musical learning. Teaching and learning practices often vary from culture to culture and, given that they are often directed toward within-culture music, likely interact with the idiosyncratic elements of the music being taught. The prospect of learning—even at a fundamental level—an unfamiliar music tradition as a performer or as a listener provides a context in which culture-general learning strategies or pathways might be tested. Similarly, it provides a framework in which “from the ground up” skill or schema development can be observed, particularly through more informal learning pathways in which exposure and self-directed discovery feature prominently. At the neurological level, learning within a culturally unfamiliar context might provide further evidence of experience-based neural plasticity as well as potential interactions with already-learned music conventions. Given the incremental nature of music learning (formal or informal) and the imprecision of insider/outsider classifications, cross-cultural studies of both music perception and music learning would benefit from a more nuanced view of cross-cultural differences in musical traditions, one that is more continuous than categorical. Below we will explore the construct of cultural distance as one potential approach. Cultural distance has been examined at a societal level (Hofstede, 1983) through the development of a suite of measures found to effectively account for culture-based variability among workers.
Since its publication, this construct has been used primarily in the fields of business and economics; however, it has also been employed in a number of cross-cultural designs including, occasionally, those related to music (Baek, 2015). The principle of cultural distance—as a way to conceptualize a culture-specific phenomenon in relation to its manifestation in other cultures—is evident in research on more specific cultural practices, as well. Kuhl (1991), for example, posited a “perceptual magnet effect” to explain early language learning processes and the manner in which infants’ speech perception quickly gravitates toward commonly used phonemic prototypes. Similarly, individuals demonstrate better memory (Golby,
Gabrieli, Chiao, & Eberhardt, 2001) as well as better recognition of emotional expression (Chiao et al., 2008) for same-race faces. In both instances, more differentiated face recognition correlated with increased neural activity in fusiform areas and amygdala, respectively. In this chapter, we will provide a brief overview of cross-cultural research in music cognition. We will consider studies that have compared individuals’ interactions with culturally familiar and unfamiliar music, those that have compared responses by participants from different cultural backgrounds, and those that have employed fully comparative designs in which participants of different cultural backgrounds interact with each other’s music tradition. Among the previous research, we will summarize some of our own recent work that has focused on identifying musical parameters—specifically pitch and rhythm—that appear to make a particularly strong contribution to the differences arising from cross-cultural music interactions. Based on this work, we will then describe the construct of cultural distance as a conceptual and analytical means of interpreting and perhaps predicting cross-cultural responses to music.
R
L
The purpose of this review is to provide a brief overview of topics in music cognition that have been explored through a cross-cultural lens. (For more thorough treatment of this topic see reviews by Morrison & Demorest, 2009 and Patel & Demorest, 2013.) Researchers have explored the cross-cultural perception of music emotion, preference, musical structures of scale and key, rhythm and meter, and larger formal elements, as well as musical memory. Participants in these studies have spanned the gamut from infancy to adulthood offering a picture of how culture influences music cognition and how that influence changes with age and experience.
Cross-Cultural Explorations of Emotion
The single largest body of cross-cultural research in music cognition has to do with the recognition of emotion in music. With the exception of a very small number of studies (e.g., Egermann, Fernando, Chuen, & McAdams, 2015), the research has focused not on emotion induction, or how music makes you feel, but on the ability to recognize emotional states present in music stimuli. On the surface, this seems a curious choice given the somewhat flexible nature of emotion recognition even within a cultural group. However, emotion proves to be an excellent choice for exploring cultural universality versus particularity in music cognition because emotions refer not just to cognitive categories, but to physical states that can be mimicked acoustically (Juslin, 2000; Juslin & Laukka, 2003). Cross-cultural studies have explored Western listeners’ perceived emotion in music of India (Balkwill, 2006; Balkwill & Thompson, 1999; Balkwill, Thompson, & Matsunaga, 2004; Deva & Vermani, 1975; Gregory & Varney, 1996; Keil & Keil, 1966), perception of Western music by non-Western listeners, including Congolese pygmies (Egermann et al., 2015) and the Mafa people of northern Cameroon (Fritz et al., 2009), Western listeners’ perception of Congolese pygmy music (Egermann et al., 2015), and the cross-cultural communication of emotion involving performers and listeners from Swedish, Indian, and Japanese music cultures (Laukka, Eerola, Thingujam, Yamasaki, & Beller, 2013). The findings can be summarized briefly as follows: A limited set of emotions can be recognized in music regardless of cultural familiarity. The emotions most consistently recognized (happy, sad, angry) vary in arousal in ways that mimic physiological states. Other emotion recognition judgments show influences of cultural familiarity. There are several theories of emotion recognition that attempt to model this combination of psychophysical and cultural cues in emotion recognition judgments.
One of the first theories was the Cue Redundancy Model (CRM) proposed by Balkwill and Thompson (1999). According to this model, emotions in music are decoded by attending to cues in the musical stimulus consisting of psychophysical cues (sound intensity, tempo, melodic complexity, pitch range, etc.) and culture-specific cues like the use of a certain instrument or tonality to communicate a particular emotional state. This allows in-culture listeners to use more information in their emotion recognition judgments, but it also allows out-of-culture listeners to access basic emotional information regardless of familiarity. The authors later proposed a more
refined model called Fractionating Emotional Systems or FES (Thompson & Balkwill, 2010). FES attempts to explain how the culture-specific and culture-general cues proposed in CRM function in development. They propose that all emotion communication is built on a phylogenetic base of shared cues involved in being human. As we age we incorporate ontogenetic cues for both music and language prosody into our emotional vocabulary in a more culturally specific way. Fritz (2013) has proposed a “dock-in” model of emotion recognition that is consistent with previous models in stating “all music cultures contain both universal and culture-specific features” (p. 514). It differs from previous models in that it proposes that different cultures may “dock in” to only a subset of universal music codes and that cross-cultural understanding can be explained in part by the overlap in universal features employed. This notion of overlap between cultures is similar to the cultural distance hypothesis discussed below, though in the cultural distance approach the comparison of cultural systems rests on a simulation of the cognitive processing of musical structure rather than on a comparison of stimulus features. When evaluating the findings of cross-cultural research in emotion perception it is important to keep in mind that, of all of the studies listed, only three (Egermann et al., 2015; Gregory & Varney, 1996; Laukka et al., 2013) were fully comparative, that is, featuring both listeners and musical stimuli from all cultures involved (Patel & Demorest, 2013). It may be difficult to generalize these findings to other non-Western listeners or musics. While the experience of emotions is a human universal, the notion that music contains an emotional message, rather than a functional or social one, may be somewhat culturally specific.
Given that most of the studies cited here asked listeners from Western or Western-influenced cultures to identify the emotions in non-Western music, and that much of that music came from a single non-Western culture (India), it is difficult to determine the cultural appropriateness of emotion judgments in music. As Fritz (2013) observed in relation to one specific comparison involving members of a society indigenous to a remote region of Cameroon, “the musical expression of a variety of emotions like fearfulness and sadness, while recognized in the Western stimuli by the Mafa participants, are—according to interviews with Mafa individuals—never represented in the traditional music of the Mafa people” (p. 512).
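The Cue Redundancy Model described above can be made concrete with a minimal sketch. It assumes, purely for illustration, that the cues signalling an intended emotion can be listed as two sets per excerpt, and that a listener has access to the culture-specific set only when the excerpt comes from the listener's own music culture. The class, function, cue names, and culture labels are hypothetical and are not taken from Balkwill and Thompson (1999).

```python
from dataclasses import dataclass, field


@dataclass
class Excerpt:
    """A musical excerpt with the cues that signal its intended emotion."""
    culture: str  # hypothetical culture label
    psychophysical: set = field(default_factory=set)    # e.g., tempo, intensity
    culture_specific: set = field(default_factory=set)  # e.g., instrument, tonality


def available_cues(excerpt, listener_culture):
    """Return the cues a listener can draw on under the sketch's assumption.

    Psychophysical cues are available to everyone; culture-specific cues
    are added only for in-culture listeners.
    """
    cues = set(excerpt.psychophysical)
    if listener_culture == excerpt.culture:
        cues |= excerpt.culture_specific
    return cues


# Illustrative excerpt with invented cue names.
excerpt = Excerpt(
    culture="culture_A",
    psychophysical={"tempo", "sound_intensity"},
    culture_specific={"instrument_choice", "tonality_convention"},
)

insider = available_cues(excerpt, "culture_A")   # four cues
outsider = available_cues(excerpt, "culture_B")  # two cues
```

In this toy setup the out-of-culture listener still receives the psychophysical cues, which mirrors the model's claim that basic emotional information remains accessible regardless of familiarity, while the in-culture listener has redundant information to draw on.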
Cross-Cultural Explorations of Music Preference
Music preference research also explores affective responses to music, not in terms of how music codes affect and emotion, but rather by examining the conditions under which listeners experience pleasure when hearing music. As LeBlanc proposed in his theoretical model, “Music preference decisions are based upon the interaction of input information and the characteristics of the listener, with input information consisting of the musical stimulus and the listener’s cultural environment” (1982, p. 29). Music educators have long been interested in music preference as a cross-cultural phenomenon in part due to their commitment to providing a culturally diverse music education. Researchers in music education have looked at how children’s preference for music of other cultures develops and its relationship to familiarity and other musical features. Researchers have explored the musical qualities that might influence preference judgments across cultures (Demorest & Schultz, 2004; Flowers, 1980; Fung, 1994; Morrison & Yeh, 1999; Shehan, 1981) and whether instruction in a culture’s music can influence preference (Heingartner & Hall, 1974; Shehan, 1985). As with the research on emotion, the bulk of studies explore how Western listeners respond to non-Western music and are not fully comparative. Findings show that preference for culturally unfamiliar music can be increased with exposure—most of these studies were conducted in formal educational settings among school-age and college populations—but it does not extend to novel pieces from the culture. Also, students prefer music that has properties of their own culture, such as Westernized arrangements of non-Western music (Demorest & Schultz, 2004). To summarize the findings, the more familiar sounding something is culturally, the more likely listeners are to like it.
However, while exposure can increase preference for out-of-culture music, it does so only for learned pieces and does not generalize to the style as a whole (Shehan, 1985).
Cross-Cultural Explorations of Musical Structure
One of the debates surrounding music and culture is the extent to which there are deep structures in music that are relatively invariant across cultures (cf. Brown & Jordania, 2013). Given humans’ shared biology and the apparent human need to engage in musical behavior, it is plausible that certain structural features would be present in most, if not all, musics. Through cross-cultural explorations of musical structure, researchers have sought to identify some of the structural features or perceptual processes that work across cultures as well as the points at which music cognition becomes more culturally bound.
Scale and Key Perception
Some of the earliest cross-cultural work done on scale perception included infants (Lynch & Eilers, 1991, 1992; Lynch, Eilers, Oller, & Urbano, 1990; Lynch, Eilers, Oller, Urbano, & Wilson, 1991; Lynch, Short, & Chua, 1995). In a series of studies the authors tested whether pitch deviations could be detected when presented in the context of familiar (major/minor) versus unfamiliar (pelog) scale contexts. They found that deviations were better detected for familiar scale contexts for both adults and children, with the exception of infants aged 6–12 months, who performed similarly across the two contexts. While these studies represent an important early attempt to examine scale perception, they were hampered by methodological issues pertaining to the way in which stimuli were created and the possible interference of absolute pitch strategies. There has been a significant amount of work examining whether tonal relationships or tonal hierarchies (Krumhansl & Shepard, 1979) can be perceived by out-of-culture listeners (Castellano, Bharucha, & Krumhansl, 1984; Kessler, Hansen, & Shepard, 1984; Krumhansl, 1995; Krumhansl, Louhivuori, Toiviainen, Jarvinen, & Eerola, 1999; Krumhansl et al., 2000). The research has included music and participants from a variety of cultures in the designs and the findings have been mixed. The general sense is that out-of-culture listeners can employ more global strategies involving tone proximity and frequency of occurrence within the stimulus materials to mimic insider tonality judgments, but only up to a point. When judgments become more complex (Krumhansl et al., 1999, 2000) or require specific cultural knowledge (Curtis & Bharucha, 2009), cultural influences on tonal cognition become more pronounced. This
suggests that tonality perception, like emotion perception, provides both general and specific cues for listeners depending on their cultural background. Two recent fully comparative studies (Raman & Dowling, 2016, 2017) demonstrate the relative influence of global versus cultural factors in tonality judgments. In a series of four experiments across two studies the authors explored the sensitivity of Western and Carnātic-trained musicians to two types of modulations in Carnātic melodies. The rāgamālikā modulation is more typical in Carnātic music and corresponds to the less frequent parallel minor (C major to C minor) modulation in Western music. The grahabēdham modulation is less common in Carnātic music, but more common in Western music as it corresponds to a modulation to the relative minor (C major to A minor). They tested modulation identification (both accuracy and speed), tonal profiles, and active probe tone response during modulation. While results varied somewhat across the different experiments, they found, in general, that cultural background influenced speed and accuracy in modulation detection with Indian listeners more accurate overall. Response time varied by the cultural familiarity of the modulation, with Indians faster for rāgamālikās and Westerners faster for grahabēdhams. They also found that Western musicians’ tone profile responses, while relying on global information about frequency and distribution of tones, were sometimes influenced by a misapplication of Western major/minor judgments in Carnātic tone profiles. The authors reference the Cue Redundancy Model reviewed above as a possible explanation for the mix of global and cultural cues employed by both groups of musicians. Other approaches to cross-cultural tonal cognition have included event-related potential (ERP) responses to tasks involving out-of-culture scale violations (Neuhaus, 2003; Renninger, Wilson, & Donchin, 2006) and melodic expectancy violations (Demorest & Osterhout, 2012).
In general, listeners were less sensitive to out-of-culture scale deviations unless they could detect the deviations using a culture-specific strategy. Another area of research has addressed whether linguistic background shapes musical ability. Researchers have found that tonal language speakers are generally better at general pitch discrimination (Giuliano, Pfordresher, Stanley, Narayana, & Wicha, 2011; Pfordresher & Brown, 2009; Wong et al., 2012) and even at pitch accuracy in singing (Pfordresher & Brown, 2009) than
those from non-tonal linguistic backgrounds. The authors suggest that fine-grained pitch processing is central to the acquisition of a tonal language and therefore better developed among these individuals (Pfordresher & Brown, 2009).
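The "frequency of occurrence" strategy mentioned in the discussion of tonal hierarchies can be illustrated with a short sketch. It assumes a melody is given as a list of MIDI note numbers, tallies how often each of the twelve pitch classes occurs, and correlates that tally with a set of listener ratings. The melody, the function names, and the idea of using a plain Pearson correlation are illustrative assumptions, not a reconstruction of the analyses in the studies cited above.

```python
from collections import Counter
from math import sqrt


def pitch_class_distribution(midi_notes):
    """Relative frequency of each of the 12 pitch classes in a melody."""
    if not midi_notes:
        raise ValueError("melody must contain at least one note")
    counts = Counter(note % 12 for note in midi_notes)
    total = len(midi_notes)
    return [counts.get(pc, 0) / total for pc in range(12)]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Invented toy melody (MIDI numbers); pitch class 0 occurs 4 times out of 10.
melody = [60, 62, 64, 60, 67, 64, 60, 65, 62, 60]
distribution = pitch_class_distribution(melody)
```

Given twelve probe-tone ratings from a listener, `pearson(distribution, ratings)` would indicate how closely the ratings track the tone statistics of the excerpt itself. A high value is what one would expect from a listener relying on the global strategy rather than on culture-specific knowledge.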
Rhythm and Meter Perception
Rhythm and meter perception has received much more attention in music cognition over the last ten to fifteen years, and with that attention has come a commensurate increase in cross-cultural exploration. Researchers have examined when infants’ responses to meter become culturally biased (Hannon & Trehub, 2005a, 2005b; Soley & Hannon, 2010), the influence of linguistic rhythm on rhythm perception (Hannon, 2009; Iversen, Patel, & Ohgushi, 2008; Patel & Daniele, 2003; Yoshida et al., 2010), and cultural influences on rhythmic perception and performance (Cameron, Bentley, & Grahn, 2015; Drake & Ben El Heni, 2003; Polak, London, & Jacoby, 2016; Stobart & Cross, 2000). In all of these investigations researchers have found varying degrees of cultural influence in rhythm processing in adults and infants, with infants demonstrating a preference for the meters of their home culture as early as 4–8 months (Soley & Hannon, 2010), even when those meters were more complex. Unlike adults, monocultural infants were equally responsive to metric violations within both familiar and unfamiliar meters (Hannon & Trehub, 2005a) and infants as old as 12 months demonstrated enough flexibility to “reset” their perceptual responses with sufficient exposure to an unfamiliar meter (Hannon & Trehub, 2005b). While language acquisition has often been a focus of tonal cognition, several studies have found relationships between the rhythmic qualities of language and musical rhythms (Hannon, 2009; Patel & Daniele, 2003) and rhythm grouping (Iversen et al., 2008; Yoshida et al., 2010) of instrumental music from the culture. In a recent fully comparative study, Cameron and colleagues (2015) tested Western-born and East African musicians’ performance on three rhythmic tasks: discriminating between two patterns, reproducing rhythm patterns, and tapping a steady beat to rhythmic patterns.
Patterns were drawn from East African and Western music and the authors predicted that musicians would show a cultural advantage for all three tasks. As with
previous cross-cultural work, however, they found that while the two performance tasks (rhythm reproduction and beat tapping) showed an in-culture advantage, the groups were equally adept at rhythm discrimination. This study was particularly noteworthy for including both perception and performance measures, as many studies feature one or the other.
Phrasing and Form

Researchers have explored the influence of enculturation on phrase boundary perception (Nan, Knösche, & Friederici, 2006; Nan, Knösche, Zysset, & Friederici, 2008) and musical tension (Wong, Chan, Roy, & Margulis, 2011) through neuroscientific measures. Two fully comparative ERP studies (Nan, Knösche, & Friederici, 2009; Nan et al., 2006) tested Chinese and German musicians’ and non-musicians’ ability to detect phrase boundaries cross-culturally in unfamiliar excerpts. Results showed a clear in-culture advantage on the behavioral task, and early positive ERP components (100–450 ms) distinguished the two groups of participants for Chinese music (familiar only to the Chinese participants). Neurologically, both groups exhibited a Closure Positive Shift, suggesting they were sensitive to phrase boundaries in both cultures. A follow-up study with only German participants used an fMRI paradigm (Nan et al., 2008) to scan participants while they heard phrased and unphrased examples of Western and Chinese melodies that they were asked to classify by culture. All participants were better at recognizing in-culture examples, and the researchers found that participants exhibited generally higher activation when listening to the Chinese melodies in regions associated with attention and auditory processing, suggesting that out-of-culture music is more demanding for those processes. In most of the studies reviewed thus far, there are differences between in-culture and out-of-culture responses to a variety of musical tasks, from emotion and preference to basic musical structures. However, the results are almost always tempered by an awareness that some aspects of music processing can be done without relying on culturally specific strategies, using more global cues and responding to familiar-sounding aspects of unfamiliar cultures.
In the next section, we review a series of studies on cross-cultural music memory that have led us to propose a possible explanatory framework for musical enculturation.
Cross-Cultural Explorations of Music Memory

In a series of experiments over the last decade or so we have used recognition memory as a way of assessing how effectively in-culture and out-of-culture music is processed. The studies have explored both behavioral (Demorest, Morrison, Beken, & Jungbluth, 2008; Morrison, Demorest, Campbell, Bartolome, & Roberts, 2012; Morrison, Demorest, & Stambaugh, 2008) and neurological (Demorest et al., 2010; Morrison, Demorest, Aylward, Cramer, & Maravilla, 2003) responses to culturally familiar (Western or Turkish) and culturally unfamiliar (Turkish or Chinese) music. In addition, we explored whether memory performance was influenced by training (Demorest et al., 2008; Morrison et al., 2012) or complexity (Morrison et al., 2008). The primary finding of this research has been that there is an “enculturation effect,” or cultural bias, in listening such that culturally unfamiliar music is consistently less effectively processed even when considering matters of age, training, and complexity. Further, this effect appears in both Western and non-Western born listeners. This finding was strengthened by the work of another group that tested memory and tension judgments in monomusical and bimusical participants in the United States and India (Wong, Roy, & Margulis, 2009) and found a similar recognition memory effect for monomusical, but not for bimusical, participants. It should be noted that in most cases out-of-culture recognition memory was above chance and demonstrated improvement with repeated testing (Morrison et al., 2012); however, the observed difference between in- and out-of-culture memory performance remained. Despite the consistency of the enculturation effect, we did not have a good explanation for its cause: that is, what aspect of out-of-culture music was interfering with listeners’ ability to hear and remember it? What was so unfamiliar about culturally unfamiliar music?
Was it timbre, tonality, rhythm, melody, or some combination? In a recent study (Demorest, Morrison, Nguyen, & Bodnar, 2016), we sought to strip away contextual variables in an attempt to attenuate or eliminate the effects of enculturation on memory performance. We also explored the possible influence of music preference as a variable influencing attention and memory. Western-born participants (N = 128) were randomly assigned to conditions in which they heard the same music excerpts presented in one of three contexts: full instrumental ensemble (the original version), a single-voice melody on
piano, or a single-voice isochronous pitch sequence also on piano. In each condition participants heard a block of three longer Western art music excerpts and a block of three longer Turkish art music excerpts in a counterbalanced order. After each example, they were asked to rate their preference for the excerpt. After each set of three examples they completed a twelve-item recognition memory test with six targets (taken from the excerpts heard previously) and six foils (taken from a musically different and previously unheard part of the same pieces). Regardless of the listening condition, participants demonstrated superior memory for in-culture examples, suggesting that none of the contextual changes mitigated memory performance for out-of-culture music. In-culture memory performance was influenced by context, but out-of-culture memory performance was not. Preference was higher overall for in-culture music, but there was no significant correlation between preference scores and memory performance across cultures. This suggested that the process of enculturation involved a kind of informal learning of deeper structure involving commonly heard sequences of pitch relationships. Based on these findings we concluded, “If our understandings of out-of-culture music are filtered through in-culture expectations, then a comparison of the statistical properties of a listener’s home culture with that of an unfamiliar culture might yield predictive information about subsequent memory performance” (Demorest et al., 2016, p. 597). We labeled the notion of a statistical comparison between music cultures across one or more selected parameters as cultural distance (Demorest & Morrison, 2016) in an effort to convey the potentially continuous rather than dichotomous relationship among music cultural practices.
In the next section, we will discuss the construct of cultural distance as an explanatory framework and present illustrative work in cross-cultural corpus analysis that lends support to its central premise.
Cultural Distance
Throughout the body of research that examines cross-cultural cognitive processes associated with music, the logic of the underlying design typically sets individuals and/or music examples from one cultural
background in contrast with individuals and/or music from another cultural background. Such designs impose a dichotomous relationship between that which is culturally familiar or culturally similar and that which is unfamiliar or dissimilar. On one scale, this might be seen as reflecting the in-group and out-of-group dynamic. However, such bifurcation blurs the fluidity that characterizes musical interactions (Cross, 2008). That is, from the point of view of an individual encultured in a particular music tradition, the music of a culturally unfamiliar tradition may seem surprisingly accessible in one case or virtually impenetrable in another. It is this distinction—and the continuum of increasing or decreasing similarity from one’s own music— that we propose can be productively explored using the concept of cultural distance (Demorest & Morrison, 2016). The way in which an individual interacts with music is mediated by the properties common to the prevailing music of that individual’s culture. The music on which one was “brought up” provides the framework by which subsequent music experiences are judged as typical or atypical. Put another way, the statistical likelihood of events that characterize the music of one’s home culture governs not only the way in which one interacts with novel pieces from within that same cultural tradition, but also with music from culturally unfamiliar music traditions. One scans for common and familiar patterns both where they are likely to be found and where they may not be likely at all. This situation suggests a way in which an individual’s responses to and facility with culturally unfamiliar music may be interpreted or, indeed, predicted. Specifically, we have hypothesized that the degree to which the musics of any two cultures differ in the statistical patterns of pitch and rhythm will predict how well a person from one of the cultures can process the music of the other. (Demorest & Morrison, 2016, p. 189)
Based on this cultural distance hypothesis, music cultures with considerable overlap of patterns would likely allow for more efficient and effective processing that might be observed through such responses as recognition memory, error detection, phrase parsing, or metric identification, to name a few. In order to test this proposition, we first need a way to ascertain the statistical properties of structural parameters considered typical of a given culture’s music. IDyOM (Information Dynamics of Music; Pearce, 2005) is a computational model of auditory expectation that uses statistical learning
and probabilistic prediction to acquire and process internal representations of the structure of a musical style. Using the intervallic content of melody as an illustration, IDyOM generates a probability distribution over the set of possible intervals leading to each note in the melody. IDyOM generates probability distributions that are conditioned upon the preceding musical context and the prior musical experience of the model. The probability of each note can be log-transformed to yield its information content according to the model (MacKay, 2003), which reflects how unexpected the model finds a note in a particular context. IDyOM is a variable-order Markov model (Begleiter, El-Yaniv, & Yona, 2004; Bell, Cleary, & Witten, 1990; Bunton, 1997; Cleary & Teahan, 1997) which uses a multiple-viewpoint framework (Conklin & Witten, 1995) to represent music. This means that IDyOM has several features that go beyond the capabilities of standard Markov (or n-gram) models: first, it combines predictions from models of different order (using different length contexts for prediction); second, it adapts the maximum order used depending on the context; third, it combines predictions from a long-term model (intended to reflect effects of long-term exposure to a musical style) and a short-term model (reflecting dynamic learning of repeated structure within a given piece of music); and fourth, it is able to combine models of different representations of the musical surface (e.g., chromatic pitch, pitch contour, pitch interval and scale degree for predicting pitch; duration, duration ratio, duration contour for predicting rhythm). IDyOM has been shown to predict accurately Western listeners’ pitch expectations in behavioral, physiological, and EEG studies (e.g., Egermann, Pearce, Wiggins, & McAdams, 2013; Hansen & Pearce, 2014; Omigie, Pearce, & Stewart, 2012; Omigie, Pearce, Williamson, & Stewart, 2013; Pearce, 2005; Pearce, Ruiz, Kapasi, Wiggins, & Bhattacharya, 2010). 
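The relationship between conditional probability and information content described here can be illustrated with a deliberately simplified sketch. The Python code below is not IDyOM: it is a fixed-order (bigram) model over pitch intervals with add-alpha smoothing, whereas IDyOM is a variable-order, multiple-viewpoint model. The interval alphabet, the smoothing constant, and all function names are assumptions made only for illustration.

```python
import math
from collections import Counter, defaultdict

# Assumed alphabet: melodic intervals of up to an octave in either direction.
ALPHABET = list(range(-12, 13))

def train_bigram(melodies):
    """Count how often each interval follows each preceding interval."""
    counts = defaultdict(Counter)
    for intervals in melodies:
        for prev, nxt in zip(intervals, intervals[1:]):
            counts[prev][nxt] += 1
    return counts

def information_content(counts, prev, nxt, alpha=1.0):
    """Return -log2 P(nxt | prev), using add-alpha smoothing over ALPHABET."""
    context = counts.get(prev, Counter())
    total = sum(context.values())
    p = (context[nxt] + alpha) / (total + alpha * len(ALPHABET))
    return -math.log2(p)

def mean_information_content(counts, intervals):
    """Average information content over all predicted intervals in a melody."""
    pairs = list(zip(intervals, intervals[1:]))
    if not pairs:
        raise ValueError("need at least two intervals to make a prediction")
    return sum(information_content(counts, a, b) for a, b in pairs) / len(pairs)
```

Under this toy model, a continuation that was frequent in the training melodies receives a low information content (it is expected), while a continuation never seen after a given context receives a high one (it is surprising).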
In many circumstances, IDyOM provides a more accurate model of listeners’ pitch expectations than static rule-based models (e.g., Narmour, 1990; Schellenberg, 1997). Rule-based models consist of fixed rules (e.g., a small interval is expected to be followed by another small interval in the same direction) which cannot be modified by experience and therefore do not predict any differences in perception between music cultures. Although such models may describe the perception of listeners from a given culture they do not constitute accurate models of cognition since they cannot account for the observed effects of enculturation reviewed above, and they
often prove less accurate than IDyOM in accounting for within-culture perception (Hansen & Pearce, 2014; Pearce, 2005; Pearce, Ruiz, et al., 2010). Furthermore, IDyOM accounts well for other psychological processes in music perception, including similarity perception (Pearce & Müllensiefen, 2017), recognition memory performance (Agres, Abdallah, & Pearce, 2018), phrase boundary perception (Pearce, Müllensiefen, & Wiggins, 2010), and aspects of emotional experience (Egermann et al., 2013; Gingras et al., 2015; Sauvé, Sayad, Dean, & Pearce, 2017). To illustrate the construct of cultural distance, we trained three IDyOM models to simulate listeners with enculturation in three different musical styles: first, a Western model trained on a corpus of European folk songs to simulate the perception of a Western listener enculturated in Western tonal music; second, a Chinese model trained on a corpus of Chinese folk songs to simulate the perception of a Chinese listener enculturated in Chinese traditional music; and third, a Turkish model trained on a corpus of Turkish Makam melodies to simulate the perception of a Turkish listener enculturated in Turkish Makam music. The corpus of Western tonal music consists of 769 German folk songs from the Essen Folk Song Collection (Schaffrath, 1992, 1994, 1995), extracted from the datasets fink and erk. The corpus of Chinese music consists of 858 Chinese folk songs from the Essen Folk Song Collection, extracted from the datasets han and natmin. The corpus of Turkish Makam music consists of 805 Makam melodies extracted from the SymbTR database (Karaosmanoğlu, 2012).1 See Table 1 for further details of the corpora used to train the model simulations.
Empty and non-monophonic compositions were first removed from all corpora. Furthermore, we removed duplicate compositions using a conservative procedure that considers two compositions duplicates if they share the same opening four melodic pitch intervals regardless of rhythm. The pitch system used in Turkish Makam music is microtonal and does not precisely map onto the Western (approximately) twelve-fold equal division of the octave (Bozkurt, Ayangil, & Holzapfel, 2014). Since IDyOM’s pitch matching is exact this would cause the Western and Chinese models to assign zero probabilities to every pitch in the Turkish corpus. A simple (though not unproblematic) way of addressing this issue is to round each pitch in the Turkish corpus to the nearest semitone, which enables comparisons to be made between the corpora. For studies with Western participants, this corresponds to the assumption that listeners perceive microtonal pitches categorically, aggregating microtonal pitches to the nearest semitone category. There is some evidence that listeners do in fact perceive pitch categorically in this way, at least in certain circumstances (Burns & Campbell, 1994; Perlman & Krumhansl, 1996). In this example, any responses among Western listeners that demonstrated differences between Western melodies and these “pitch-Westernized” Turkish melodies would underestimate the dissimilarity experienced between the two corpora, conservatively producing type II errors (false negatives) rather than type I errors (false positives). Each model was used to make both within-culture and between-culture predictions. For the within-culture predictions, IDyOM estimates the information content of every event in every composition in the corpus, using ten-fold cross-validation (Kohavi, 1995) to create training and test sets from the same corpus. 
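The preprocessing steps described above (semitone rounding, duplicate removal by opening intervals, and ten-fold cross-validation) might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: melodies are assumed to be lists of pitch numbers in (possibly fractional) semitones, ties at exactly half a semitone are rounded upward, and all function names are hypothetical.

```python
import math
import random

def round_to_semitone(pitch):
    """Map a (possibly microtonal) pitch in fractional semitones to the nearest semitone."""
    return int(math.floor(pitch + 0.5))  # assumption: exact halves round upward

def opening_intervals(pitches, n=4):
    """Return the first n melodic pitch intervals; rhythm is ignored entirely."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))[:n]

def remove_duplicates(corpus, n=4):
    """Keep only the first melody for each distinct opening n-interval signature."""
    seen = set()
    kept = []
    for pitches in corpus:
        key = opening_intervals(pitches, n)
        if key not in seen:
            seen.add(key)
            kept.append(pitches)
    return kept

def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) pairs such that every item is tested exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [m for j, fold in enumerate(folds) if j != i for m in fold]
        yield train, test
```

Because the duplicate check compares intervals rather than absolute pitches, two transpositions of the same opening are treated as duplicates, which is one reason the procedure can be called conservative.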
For between-culture predictions, IDyOM is first trained on the within-culture corpus (e.g., the Western corpus for the Western model) and then estimates the information content of every note in every composition in a different corpus representing the comparison culture (e.g., the Chinese or Turkish corpus for the Western model). IDyOM was configured to use only its long-term model (or LTM, simulating long-term exposure to a musical style) trained on the appropriate corpus; the short-term model (simulating dynamic learning of repeated patterns within a piece of music) was not used. Other than these differences regarding training corpora, all models were configured identically using the default parameters described in Pearce (2005). In all cases, information content was
averaged across notes for each composition yielding a value representing the mean unpredictability of that composition for a given model. For each comparison between cultures (Western vs. Turkish, Western vs. Chinese, Turkish vs. Chinese), we then plot the data for each composition in the two corresponding corpora: information content for one model is plotted on the abscissa while information content for the second model is plotted on the ordinate. The line of equality (x = y) indicates equivalence between the two models. Compositions lying on this line do not distinguish the two cultures, being equally predictable for each model; in other words, they should be equally familiar and predictable to listeners enculturated in either of the two cultures. Positions near the origin represent compositions that are simple within both cultures—that is, they are highly predictable insofar as most incidences of a selected feature are quite common—while positions far from the origin represent compositions that are complex— unpredictable, uncommon—within both cultures. Positions further away from the line of equality represent compositions that are predictable for the simulated model of one culture but unpredictable for the simulated model of the other culture. Distance from the line of equality, therefore, provides a quantitative measure of cultural distance based on information-theoretic modeling of enculturation in musical styles. Fig. 1A illustrates how cultural distance is computed for a comparison between IDyOM models trained on the Western corpus and the Chinese corpus using a pitch interval representation. By rotating the data points through 45°, Fig. 1B shows the same data with Cultural Distance on the ordinate and culture-neutral complexity on the abscissa. In this example, IDyOM correctly classifies 98 percent of the folk songs by culture (Chinese vs. Western).
FIGURE 1. Modeling cultural distance between the Western and Chinese corpora using a pitch interval representation. A: The information content of the Western model plotted against that of the Chinese model with the x = y line shown. B: A 45° rotation of A such that the ordinate represents cultural distance and the abscissa culture-neutral complexity. For each style, the ten compositions with most extreme cultural distance are highlighted.
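The geometry of Fig. 1 can be made concrete with a short Python sketch. Given a composition's mean information content under two models, a 45° rotation yields one coordinate along the line of equality (culture-neutral complexity) and one perpendicular to it (cultural distance). The sign convention, the scaling by √2, the tie-breaking rule, and the function names are assumptions for illustration; the published analysis may differ in these details.

```python
import math

def rotate_45(ic_model_a, ic_model_b):
    """Rotate the point (ic_model_a, ic_model_b) so the line x = y becomes the abscissa.

    Returns (complexity, distance): complexity is the position along the line of
    equality, and distance is the signed perpendicular offset from that line.
    """
    complexity = (ic_model_a + ic_model_b) / math.sqrt(2)
    distance = (ic_model_b - ic_model_a) / math.sqrt(2)
    return complexity, distance

def classify(distance, label_a="A", label_b="B"):
    """Assign a composition to the culture whose model finds it more predictable.

    A positive distance means model B was more surprised than model A, so the
    composition is attributed to culture A. Ties go to B (an arbitrary choice).
    """
    return label_a if distance > 0 else label_b
```

A composition with identical information content under both models has a distance of zero and therefore does not distinguish the two cultures, matching the interpretation of the line of equality given above.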
As mentioned above, IDyOM is capable of modeling different attributes of the musical surface and combining the predictions made by those models. For each comparison between cultures, cultural distance is computed for models predicting pitch structure alone (using a representation of pitch interval), rhythmic structure alone (using a representation of inter-onset interval), and for models using a combined representation of pitch and rhythmic structure (for which a melodic event is represented as a pair of values, one for the preceding pitch interval and one for the preceding inter-onset interval). For each cultural comparison and each of the three representations, ten compositions with the highest Cultural Distance were selected for each of the two cultures compared. These compositions are highlighted in Fig. 1 for the pitch interval representation. Table 2 shows the mean Cultural Distance values for each combination of cultural comparison and model representation for the corpus as a whole and for the ten selected compositions. Note that this Cultural Distance measure reflects both corpora included in the comparison. Thus, there is only partial overlap between the different comparisons (e.g., five of the ten Chinese songs
selected in the German comparison are the same as those selected in the Turkish comparison; five for the two Turkish comparisons and two for the two German comparisons). Note also that this Cultural Distance measure may be asymmetrical such that one culture is on average more distant from the second than the second is from the first (e.g., in the case of the Western and Chinese comparison, see Table 2). For all three cultural comparisons, as shown in Table 2, the IDyOM simulations produce positive correlations between the cultures for rhythm predictions, much more so than for pitch predictions, which yield no correlation (Western/Chinese), a small positive correlation (Western/Turkish), or a moderate negative correlation (Turkish/Chinese). This suggests that pitch is a more important indicator of cultural distance between these styles than rhythm. For each of the three representations used in each of the three comparisons, one-sample t-tests indicate that the mean cultural distance is significantly different from zero (p < 0.01) for both corpora involved in the comparison.
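For readers who wish to reproduce this kind of summary on their own data, the two statistics mentioned above can be sketched in Python with the standard library alone. The sketch assumes that the correlations are Pearson correlations between the two models' per-composition information content, and it returns only the t statistic and degrees of freedom; obtaining a p value would additionally require a t distribution (for example, from a statistics package). The function names are hypothetical.

```python
import math
import statistics

def one_sample_t(values, mu=0.0):
    """Return (t, degrees_of_freedom) for a one-sample t-test of the mean against mu."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu) / (sd / math.sqrt(n))
    return t, n - 1

def pearson_r(xs, ys):
    """Return the Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Here `one_sample_t` would be applied to the cultural distance values of one corpus in one comparison, and `pearson_r` to the paired information content values that the two models assign to the same compositions.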
Limitations

The analysis of two or more types of music along any given musical parameter (for example, pitch as in the illustration above) or combination of parameters imposes the assumption that such an analysis is valid within each music type. While a music tradition such as Western art music (at least that from or deriving from the common practice period of approximately the mid-seventeenth to early twentieth centuries) has a well-established history of analysis and interpretation based, in part, on both sequential and concurrent pitch interval relationships, the same may not be said of other traditions. Tools such as IDyOM offer the flexibility to examine cultural distance according to a variety of individual or combinations of musical parameters. Nevertheless, any specific configuration runs the risk of privileging one parametric hierarchy over another. Thus, in terms of cross-cultural research, such statistical models will virtually always impose the perspective of a particular music tradition, at least to some degree. This limitation has ramifications for fully comparative studies in that the degree to which a parameter holds primacy for one set of participants may not hold true for the other. Much as emotion recognition, so familiar to the experience of westernized listeners, did not figure meaningfully in the music tradition of the Mafa (Fritz, 2013), the statistical likelihood of patterns of pitch may contribute less to musical thinking among Rwandans and more to North Americans (as in Cameron et al., 2015) than does the complexity of patterns of rhythm. In this way, cultural distance is a tool through which one can isolate norms for one or more musical parameters as well as provide a particular perspective on musical meaning-making. A related limitation is that IDyOM currently requires symbolic score-like input in which notes are represented as discrete events with discrete properties (e.g., onset time, pitch). This does not readily accommodate musical cultures which depend heavily on timbral, dynamic, or textural changes. The same is true of musical cultures that have no written tradition, where the distinction between composition and performance is blurred or nonexistent or where music is inextricably combined with other modes of communication (Cross, 2014).
Despite the emphasis here on the advantageous aspects of familiarity, without question novelty is an attractive characteristic of music. Models of musical expectancy (e.g., Huron, 2006; Meyer, 1956) describe the interest inherent in and stimulation derived from that which is unfamiliar and surprising in music. The constant curiosity for new musical ideas suggests ongoing willingness to explore less “predictable” musical scenarios. With much of the world’s music readily—and in many cases instantly— accessible, such willingness leads as easily to unfamiliar music traditions as to the remoter corners of one’s own. We have used cultural distance as a means of explaining processing difficulties (as operationalized by
recognition memory); however, it is equally viable as a tool to examine such positive aspects of music experience as interest and surprise. Although Cook (2008) was referring specifically to musicologists, his description can arguably be construed more broadly: “Practically all of us are at least to some degree musically multilingual … as a result one understands even the tradition(s) in which one is most ‘at home’ as options amongst other options, understands them in relation to other traditions rather than as absolutes” (p. 63).
Conclusion

Research on cross-cultural music interactions has demonstrated that responses to culturally familiar and unfamiliar music, as well as responses by individuals encultured in different music traditions, can be either remarkably similar or strikingly different depending on the task and the music presented. Theoretical models such as Cue Redundancy (Balkwill & Thompson, 1999) or Fritz’s (2013) dock-in model have framed cross-cultural music interactions as consisting of culture-general and culture-specific components. The manner in which these models account for areas of overlap between music cultures and distinctions unique to each music culture fits well with recent research findings as well as with the concept of cultural distance. However, absent from their construal of shared and unique features is a middle ground of “culturally specific but similar” components that, while mutually proprietary and uniquely meaningful to each culture, may be somewhat accommodating to strategies for listening, performing, and meaning-making deployed by individuals from outside the culture. This similar-but-not-shared aspect of the cultural distance construct can help account for memory responses, reported above, to out-of-culture music that were less successful than for in-culture music but were still above chance (e.g., Demorest et al., 2008). Likewise, it also provides an explanation in cases where listeners have applied familiar listening strategies to culturally unfamiliar music only to encounter ultimate confusion (e.g., Curtis & Bharucha, 2009). Eventually, the trajectory of complexity within a culturally unfamiliar system takes a listener or performer past the point that learned patterns can accommodate. On the whole, responses to musics that demonstrate considerable overlap may show greater consistency than those to musics with very few points of commonality. Thus, one can make a distinction between the apparent “ease” with which an individual can move between music cultures and the more likely case of greater opportunities afforded by some unfamiliar music cultures to successfully deploy familiar strategies. This is potentially useful for neurological investigations of music processing. Responses to culturally unfamiliar music have generally been reported to differ more by degree than by presence or location. That is, music appears to recruit similar neural systems regardless of its cultural familiarity, though the strength or extent of that activity may differ according to the music encountered (e.g., Nan et al., 2008; Demorest et al., 2010). The model of cultural distance is a tool that provides a continuous rather than categorical conceptualization of cross-cultural music research designs. Such a correlational approach may lend itself well to the fine-grained, incremental, and plastic manner in which neurological processes and pathways develop and are deployed. We are not suggesting that through the learning of an unfamiliar array of patterns one can gain access to the full, rich experience of culturally situated musical contexts. Music represents a broad range of activities and relationships that may only have tenuous connections to structural parameters like melodic or rhythmic intervals. Much of music’s meaning is derived from where, when, and how it occurs quite apart from how it is put together (Small, 1998). Rather, we suggest that cultural distance may be a useful lens through which specific aspects of the cognitive processing of music, particularly musical structure, may be predicted, investigated, analyzed, and interpreted.
Much of the research on cross-cultural musical interactions has involved measurement of such things as memory, affective response, detection of differences, verbal or written description, and preference. In virtually all cases these outcomes were prompted through listening tasks, a way of experiencing music that, while ecologically valid and obviating any need for previous training, is covert and arguably accommodating of varied interpretations and strategies. In contrast, investigations of cross-cultural performance contexts may yield new insights into the ways in which individuals navigate unfamiliar musical terrain. More directly observable
performance-based interactions may shed additional light on the processes by which one grapples with, accommodates, or eventually gains facility with musics that are differently organized. Earlier we posed the question of what happens when music crosses cultural boundaries. The construct of cultural distance provides a more graduated, incremental way of conceptualizing the relationship between the familiar and the unfamiliar. It allows for the fluidity characteristic of musical interactions, recognizes the porous nature of music categorization, and accounts for the variability found within any music tradition. For research purposes, cultural distance offers a way by which dichotomous models of music—insider/outsider, familiar/unfamiliar, own/other—can be refined to test a more nuanced picture of musical meaning-making. In this way, cross-cultural music interactions might be viewed less as the crossing of a boundary and more as the undertaking of a trip.
1 The Essen Folk Song Collection was retrieved from: http://kern.humdrum.org/cgi-bin/browse?l=/essen. The SymbTR database was retrieved from: https://github.com/MTG/SymbTr.
CHAPTER 4
WHEN EXTRAVAGANCE IMPRESSES: RECASTING ESTHETICS IN EVOLUTIONARY TERMS
BJORN MERKER
INTRODUCTION
When we constrain language by meter and rhyme in poetry, or when we adorn mundane earthenware pottery with decorative markings, we are making matters more complicated than utility or instrumental purposes dictate. Whole art forms, such as music, lay claim to human resources without yielding obvious returns in survival benefits. A candidate benefit such as the promotion of group cohesion through music-mediated bonding (Huron, 2001) begs the question of why humans need music to bond when non-human animals bond perfectly well without it (Lim & Young, 2006). We share with them the system of socially contingent circulation of “hormones of affiliation” (oxytocin and vasopressin, Heinrichs, von Dawans, & Domes, 2009), so enhanced expression of relevant receptors in nuclei of the basal forebrain (Kelly & Goodson, 2014) would seem to provide a less cumbersome way to increase our bonding propensities. Even assuming music does play a role in the human case (Pearce, Launay, & Dunbar, 2015), the evolutionary question of how and why music acquired a capacity to facilitate bonding remains (Pinker, 1997, p. 528). The paucity of well-supported utilitarian accounts of the function of music has led some to regard music as a by-product of other mechanisms of the mind (Pinker, 1997, p. 534) or as a culturally invented “technology” (Patel, 2008, p. 400).
The approach to be detailed in what follows traces the human propensity to expend resources on the arts to the same selection pressures that have compelled a number of species of non-human animals to maintain cultural traditions featuring large repertoires of elaborate song on a learned basis. In tracing that analogy we will uncover a psychological/neural mechanism specifically involved in esthetic judgments that frames the question of the emotional impact of music in a new way. To do so, we need to approach the relevant animal displays from first principles onwards to arrive at the recent elaboration of Zahavi’s handicap principle (Zahavi, 1975) into the “developmental stress hypothesis” for the function of large and complex song repertoires in birds with vocal learning (Hasselquist, Bensch, & von Schantz, 1996; MacDougall-Shackleton & Spencer, 2012; Nowicki, Searcy, & Peters, 2002a).
S
A E
, I: T L
Among songbirds with vocal production learning (Janik & Slater, 1997) there is an association between a high duty cycle for song (i.e., a large amount of continuous singing per day), large song repertoires, and high pattern variety among songs (Baylis, 1982). This correlation is presumably driven by the fact that protracted production of monotonous singing loses the attention of its audience by the ubiquitous mechanism of habituation (Hartshorne, 1956; Kroodsma, 1978; Sachs, 1967; Sokolov, 1963). Persistent singing is energetically costly and takes place at the expense of useful activities such as foraging. Why, then, prolong the song display beyond the boredom threshold, thus incurring the additional cost of acquiring the means to produce elaborate song? Whence the waste and frivolity of virtuoso performance?
Because the costs of signaling are paid for by the same metabolic engine that foots the bill for survival, the very fact of surviving with the added burden of exaggerated signaling is in itself informative. It supplies proof positive that the signaler is capable of sustaining that additional burden. The capacity is therefore necessarily an aspect of signaler quality, a circumstance Amotz Zahavi codified in what he named the “handicap principle” (Zahavi, 1975), a principle that completes the Darwinian theory of sexual selection (Darwin, 1871). The logic of the handicap principle is quite general, and is by no means limited to promoting potential genetic benefits to offspring. A male in an agonistic interaction with a conspecific needs to assess the actual fighting ability of the rival, and not his genetic potential. Similarly a female in a species with obligate bi-parental care must assess the extent of a prospective mate’s capacity to invest in the care of offspring. For a variety of reasons that capacity can and does vary independently of the genes he contributes to those offspring. In these cases and others, a display of excess capacity in the form of elaborate signaling can indicate capacity in the relevant behavioral dimension, provided such signaling actually relates, directly or indirectly, to abilities and resources employed in the behavioral dimension of interest to the receiver. An example involving a direct relationship between signaling and a desired or feared quality in the signaler is physical fitness itself. Loud singing for many hours on a daily basis proves that the singer has the energy reserves, predator vigilance, stamina, and foraging ability to sustain such behavior without succumbing. For a receiver this means that those same resources are available for other uses should the animal’s circumstances or needs require it. 
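The logic of the handicap principle can be made concrete with a toy calculation. The following Python sketch is an illustration only, not a model taken from Zahavi (1975): it assumes a single daily energy budget, a fixed maintenance cost (`BASELINE_COST`), and a display cost, all of them hypothetical quantities in arbitrary units. Under these assumptions, a signaler that survives while displaying at a given cost has thereby shown that its budget is at least the sum of the two costs.

```python
from dataclasses import dataclass

# Hypothetical maintenance cost per day, in arbitrary energy units (an assumption).
BASELINE_COST = 10.0


@dataclass(frozen=True)
class Signaler:
    energy_budget: float  # energy the animal can gather per day (hypothetical)
    signal_cost: float    # energy spent on display per day (hypothetical)


def survives(bird: Signaler) -> bool:
    """A signaler persists only if its budget covers maintenance plus display."""
    return bird.energy_budget >= BASELINE_COST + bird.signal_cost


def inferred_minimum_budget(signal_cost: float) -> float:
    """Lower bound a receiver can place on the budget of any surviving signaler."""
    return BASELINE_COST + signal_cost


strong = Signaler(energy_budget=20.0, signal_cost=8.0)
weak_bluffer = Signaler(energy_budget=12.0, signal_cost=8.0)

print(survives(strong))              # True
print(survives(weak_bluffer))        # False
print(inferred_minimum_budget(8.0))  # 18.0
```

In this sketch the display is informative only because both costs are paid from the same budget, which is the point made in the text. If display were free, survival would reveal nothing about the size of the budget.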
Indirect relationships between signaling and signaler qualities can be quite remote, as illustrated by numerous laboratory and field studies inspired by the “developmental stress hypothesis” over the past two decades (Hasselquist et al., 1996; Nowicki et al., 2002a). The learned acquisition of elaborate song is a protracted and demanding sequence of intertwined perceptual, attentional, memory, and motor challenges that unfolds after hatching in a still developing brain. The sequence of passive song memorization, followed by stage-wise practice of vocal skill spanning over weeks and months, interacts with and feeds back upon the development and neural maturation of an elaborate system of interconnected forebrain nuclei dedicated to song learning and production (Iwaniuk & Nelson, 2003; reviewed in Nowicki et al., 2002a). Thus, the size of the song control nuclei of the mature songbird forebrain correlates not only with average song repertoire size across species, but with the repertoire size and song proficiency achieved by individuals within a species (Gahr, 2000; Garamszegi & Eens, 2004). The latter circumstance is of central biological significance, because repertoire size and song proficiency are factors used by females in choosing a mate (Nowicki, Searcy, & Peters, 2002b). Each sequential stage of this delicately tuned two-way interaction between neural development and behavior is susceptible to perturbation by a variety of external stressors and disturbances (hence the name “developmental stress hypothesis”). They include, but are not limited to, immune challenges, disease and parasites, nutritional status dependent on parental provisioning and later the bird’s own foraging ability, environmental pollutants, and disruptions at the nest (reviewed by MacDougall-Shackleton & Spencer, 2012). The management of such encumbrances consumes developmental resources which otherwise would have been available for the practice-dependent growth of the song system. A large repertoire and proficient song performance accordingly can only be acquired by an individual who as a nestling was cared for by well-functioning parents, who grew up in a secure nest, was subsequently unencumbered by disease and parasites, and—in possession of sharp faculties, memory capacity, foraging ability, and predator vigilance—engaged in hundreds of hours of successful singing practice. Whatever impairs the post-hatching growth of a bird’s system of song nuclei, and whatever keeps the bird from attending to and practicing song is later evident as deficits in the size and perfection of its mature song repertoire.
This makes a large repertoire of complex song a direct causal reflection of an individual’s successful passage through a demanding and varied developmental obstacle course. The more demanding the performance to be acquired, the more comprehensive a measure of an individual’s personal history and qualities lies implicit in the perfected, mature songbout. In effect, then, an individual’s level of song proficiency sums up, in a single performance, the entire developmental history of the singer, and as such provides an all-round certificate of competence, of all-around individual phenotypic quality. It tells its audience, in a way impossible to counterfeit, that the singer comes from, as it were, “a good background.” Potential mates and rivals thus do well to take a singer displaying mastery and virtuosity seriously. Though none of this is likely to be accomplished without an adequate genetic background, it is the finished phenotype and not the genotype that fights with rivals and helps a bonded female provision her offspring and defend the nest. Hence the importance of markers for phenotypic quality when decisions in these regards have to be made on the spot during a brief breeding season. That is what the expert songbout provides, conferring on the singer high priority as mate or rival. Provided, that is, that there are ears competent to assess the quality of the songbout, and to discriminate an outstanding performance from a middling one. This in turn leads us to the crux of cultural esthetics, namely the means by which receivers judge performance quality and the critical dependence of those means on the cultural song tradition within which the performance takes place.
S
A
, II: A B B
The circumstances outlined in the previous section help us understand why a brown thrasher accumulates a song repertoire estimated to contain over 1800 separate melodies (Kroodsma & Parker, 1977), or how the sounds of as many as 76 different species of birds from two continents can be identified in the song repertoire of a single marsh warbler individual (Dowsett-Lemaire, 1979). As we have seen, the sole reason to take a performance seriously is the protracted and demanding process of its acquisition. It is only the lengthy and exacting course of pattern acquisition and vocal skill practice that makes a songbout an all-round index of phenotypic quality. Accordingly, 1800 melodies produced by impromptu invention on the spot ought to impress less than the same number acquired by meticulous copying from the local song tradition. Why should this be so? Only the local song tradition (or other local sounds in the case of bird mimics) provides the intended audience with a standard or norm by which to judge the extent of a singing individual’s proficiency and repertoire coverage. The listeners or judges grew up in the same general neighborhood as the singer. They were therefore exposed to the same song tradition and other ambient sounds, and committed them to memory even if, as is the case for the females of some species, they do not themselves sing (the females of many species do in fact sing, see Riebel, 2003). Females are sensitive both to how much has been learned by a male and to how well his performance matches the shared standard, and they make their mating decisions accordingly (Nowicki et al., 2002b). Only against the background of the intimate knowledge of the local lore shared between performer and audience is it possible to tell the extent to which a given performer has achieved mastery. By the same token, potential usurpers without apprenticeship in that lore but attempting to fake it need not apply. Even in species that copy the sounds of other species or environmental sounds into their repertoire (perhaps generations ago: Baylis, 1982), and thus lack species-specific constraints on the patterns that may be acquired, the repertoire is typically acquired, first and foremost, from the local song tradition carried by conspecifics. Because only exact duplication of the received pattern proves that the bird actually attended to and discriminated the perceptual details of its model and then practiced its articulatory complexities to perfection, copying fidelity is part of the standard. A whole circuit in the song control system of birds—the so-called anterior forebrain pathway—is dedicated to using auditory feedback of the bird’s own singing voice to gradually shape its vocal output to match the model stored in memory (see Konishi, 2004 for review).
Complexity itself is therefore not the point of the performance. Random strings abound in complexity—in fact, by one measure they are ultimately complex (see, for example, Grassberger, 1986, Fig. 1)—but their complexity is of a kind that does not lend itself to comparative assessment. If individuals are to be compared one with another, the extent of their acquisition of content from a common, shared, pool of content, namely the local song tradition or soundscape, is essential for ranking their performances. If that tradition and soundscape is richer than what any single individual can easily master, then the extent of an individual’s repertoire “coverage” of that material, that is, the size of an individual’s tradition-based repertoire, is a veridical measure of his or her song learning capacity.
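The contrast between raw complexity and tradition-based coverage can be sketched in a few lines of Python. This is a hedged illustration, not an analysis from the chapter: compressed size (via `zlib`) stands in as a crude proxy for complexity, songs are represented as short syllable strings, and the names `complexity_proxy`, `coverage`, and the toy `tradition` are assumptions introduced here.

```python
import random
import zlib


def complexity_proxy(sequence: str) -> int:
    """Compressed size in bytes: a crude stand-in for sequence complexity."""
    return len(zlib.compress(sequence.encode("ascii")))


def coverage(repertoire: set, tradition: set) -> float:
    """Fraction of the shared tradition that a repertoire reproduces exactly."""
    if not tradition:
        raise ValueError("tradition must contain at least one song")
    return len(repertoire & tradition) / len(tradition)


# A random syllable string is hard to compress; a repetitive one is easy.
rng = random.Random(0)  # fixed seed so the run is repeatable
random_string = "".join(rng.choice("ABCD") for _ in range(1000))
repetitive_string = "ABCD" * 250

# Toy local tradition of five song types (hypothetical).
tradition = {"ABAC", "CDDC", "BBCA", "DACB", "ACDB"}
copier = {"ABAC", "CDDC", "BBCA", "DACB"}      # four songs copied from the tradition
improviser = {"AAAA", "DDDA", "CABD", "BDBD"}  # four invented songs, none traditional

print(complexity_proxy(random_string) > complexity_proxy(repetitive_string))  # True
print(coverage(copier, tradition))      # 0.8
print(coverage(improviser, tradition))  # 0.0
```

Both toy singers command four songs, but only the copier can be placed on a scale shared with other singers and with the audience. The random string scores high on the complexity proxy, yet that score says nothing about what was learned from the common pool.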
FIGURE 1. Schematic depiction of the “information dimension” outlined in the text. It is a composite of concepts and findings of three authors studying responses to novelty in the 1960s (Bindra, 1959; Sachs, 1967; Sokolov, 1963). The dimension spans from maximal certainty (minimal prediction error) at the bottom, to maximal uncertainty (maximal prediction error) at the top. Behavioral reactions are presented in the left-hand column, and inferred psychological states in the right-hand column.
We can conjecture that only a performance that draws on a sufficiently broad sample of the listener’s recognition memory for the local song tradition—a sample large enough to challenge and even tax the listeners’ powers of apprehension—will be taken seriously as a proof of competence. The better each song string reproduces the traditional model the better will their aggregate fill this function. A judgment of competence might have to be upgraded to one of mastery if, in addition to featuring extensive coverage of the traditional lore and high fidelity reproduction, the performance starts pushing the limits of the listener’s recognition memory. This would happen when the songbout includes material that in fact forms part of the tradition but was “missed” by the listener’s own ontogenetic acquisition process, or when it features patterns introduced by the singer as virtuoso embellishments. Under such circumstances the listener has good reason to take the performance seriously indeed. In either case the performer is giving proof of a capacity beyond that possessed by the listener, given the proviso that in either case such “excess” material (excess from the standpoint of the listener) fits seamlessly into the framework of the received form. It is copying fidelity (supported by what I have called a “conformal motive,” Merker, 2005) that gives the resulting cultural song tradition the temporal inertia needed for it to serve as a standard of judgment. It in effect stabilizes the tradition against too rapid accumulation of inevitable copying errors. Thus stabilized, it provides individual learners with a vehicle by means of which to advertise the quality of their developmental history through the quality and scope of their command of the local tradition, and their audience with a standard by which to judge the same performance. We turn now to the inner workings of the listener’s responsiveness in this regard.
Squandering as Asset, III: M…
Judgments of a singer’s performance are fraught with consequences for listeners, be they potential rivals or mates. In the case of the opposite sex it determines the partner with whom one or more breeding seasons—even a lifetime—will be spent, and in the case of same-sex rivalry it determines matters as important as the quality of the territory on which foraging and the rearing of offspring will take place. Much therefore hinges on the assessment of the songbout that serves as a proxy for the phenotypic qualities it underwrites, as covered in the previous two sections. How then to compare and judge the streams of intricately patterned sound emanating from the throats of singers (perhaps not even visible to their judges)? Something must intervene in the psychology of the receiver/judge/listener between apprehension of the songbout and the real-life choice the receiver makes on its basis. That something can hardly be formal analysis of the contents of the songbout, but ought to be some form of intuitive summary measure of the extent to which the songbout taps and taxes the listener’s knowledge of the local song tradition. Some form of global emotional summary thus lies close at hand. As we have seen, the repertoire size, model fidelity, pattern complexity, and ease or elegance of delivery of a songbout must be measured against
the local song tradition as its standard. It is assessable, therefore, only against a background of prior familiarity with that local song tradition (or soundscape, in the case of mimics). A principal function of the bulging forebrain system of warm-blooded animals (i.e., birds and mammals, which are large-brained compared to the rest of the animal kingdom) is to determine the extent to which current sensory afference pushes or exceeds the boundaries of prior stimulus familiarity. This quantity has variously been called surprisal (Tribus, 1961), novelty (Berlyne, 1960; Bindra, 1959; Sachs, 1967; Sokolov, 1963), surprisingness (Kamin, 1969), prediction error (not named as such: Rescorla & Wagner, 1972), and expectancy violation (Meyer, 1956). It has been related specifically to esthetics by Berlyne (1971). Though there are differences in emphasis and detail behind these names, they all have a shared functional principle at their core, readily interpretable in informal Bayesian terms (Rohrmeier and Koelsch, 2012). The operation of this principle is captured by the free energy formulation of the logistics of bi-directional learning networks pioneered by Geoffrey Hinton and colleagues (Hinton & Zemel, 1994), subsequently popularized by Karl Friston and others (Clark, 2013; Friston, 2002). Implemented through an elaborate neural system which besides its neocortical parts involves the hippocampus, amygdala, and diencephalic and midbrain way stations (see Merker, 2007a, Fig. 3 and Merker, 2013a, Fig. 2) this function converts the informational content of sensory experience to a running emotional summary in real time of the extent to which the pattern of current afference exceeds the bounds of prior experience. When those bounds are exceeded, this system signals caution, apprehension, fear, and even terror, along a dimension that represents the magnitude of novelty, expectancy violation, or prediction error. 
For present purposes, it suffices to conceive of movement along this dimension as being signaled by increasing levels of central activation. Central activation is reflected in cortical gamma oscillations (Merker, 2013b), and peripherally in the specifically cholinergic aspect of sympathetic activity reflected in skin conductance changes (Shields, MacDowell, Fairchild, & Campbell, 1987), which vary linearly with the intensity of psychological (emotional) activation (Bradley & Lang, 2000).
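The quantity at issue can be made concrete with a small sketch. It is an illustration only, not a model proposed in this chapter: it takes surprisal in the standard information-theoretic sense, the negative base-2 logarithm of an event's probability, and estimates those probabilities from how often each song element occurred in a listener's prior exposure. The function names `surprisal_bits` and `element_surprisals`, the smoothing parameter `alpha`, and the single-letter element labels are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def surprisal_bits(probability):
    """Surprisal of an event with the given probability, in bits."""
    return -math.log2(probability)


def element_surprisals(familiar, heard, alpha=1.0):
    """Surprisal of each heard element under a smoothed frequency model.

    familiar: sequence of elements making up the listener's prior exposure.
    heard:    sequence of elements in the current songbout.
    alpha:    additive smoothing constant (an assumption of this sketch),
              so that never-heard elements get a small nonzero probability.
    """
    counts = Counter(familiar)
    vocabulary = set(familiar) | set(heard)
    total = len(familiar) + alpha * len(vocabulary)
    return [surprisal_bits((counts[e] + alpha) / total) for e in heard]


# A listener who has mostly heard element "a", rarely "b", and never "c".
prior_exposure = list("aaab")
print(element_surprisals(prior_exposure, ["a", "b", "c"]))
```

Under these assumptions the familiar element carries the least surprisal and the never-heard element the most, which is the ordering the informational dimension is meant to capture.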
FIGURE 2. The same spectrum of psychological states as in the right-hand column of Fig. 1, paired with their counterparts in the domain of esthetics. The context of existential safety that frames esthetic experience occasions a “hedonic reversal” of valence in the upper reaches of the esthetic information dimension, here designated “danger zone.”
Long before the recent formal treatments of this system were inaugurated, its behavioral and psychological aspects had been studied by psychologists and physiologists interested in the learning dynamics of habituation to novelty already cited. Their results can be summarized in terms of a pattern of graded responsiveness along a single psychological/emotional dimension. Cognitively it spans a spectrum from total certainty to total uncertainty, behaviorally from sleep to freezing, and emotionally from boredom to terror. Between the latter two extremes lies a gradient of emotional states ranging from mild interest, to active curiosity, caution, and fear, as depicted in Fig. 1. When the prior stimulus familiarity of such a system, stored as recognition memory, includes a massive repertoire of local song, acquired during the intensive song learning stage of ontogeny, the normal operation of this system renders it a sensitive detector of the extent to which a currently experienced songbout pushes or exceeds the boundaries of the listener’s recognition memory. To the extent that it does, the system will deliver the selfsame real-time emotional summary along the central activation dimension for that songbout as for any other sensory experience. An impoverished sample of the local song tradition will be experienced as “boring.” An adequate sample rendered with confidence or flair will be experienced as “interesting.” Finally, a bout whose pattern richness taxes the limits of the listener’s recognition memory would be experienced as
apprehension or fear, were it not for the fact that it is set apart from other activities by its character of performance, or “display” in behavioral biology terms, and is recognized as such by all concerned. Thus framed and contextually constrained, the superior performance induces not outright apprehension or fear in the listener, but a “tamed” version of the same in the form of being “touched,” “moved,” “impressed,” and—at the high end of the informational dimension—even “awed” by what is heard (cf. Konečni, 2005, 2015). This hedonic shift instantiates, in other words, the principle that in a context of safety, negative emotions may undergo a “hedonic reversal” to be experienced as positive (Apter, 1982; Bloom, 2010; Strohminger, 2013). The principal proposal of this chapter, then, is this: The biological roots of the esthetic emotions, animal and human, are to be found in this informational dimension of telencephalic operations in large-brained species. So far these esthetic emotions have been discussed primarily in the context of human responses to art (Berlyne, 1971; Konečni, 2005, 2015; Konečni, Brown, & Wanic, 2008; Kuehnast, Wagner, Wassiliwizky, Jacobsen, & Menninghaus, 2014; Scherer & Zentner, 2001; Scherer, Zentner, & Schacht, 2001–2002). For brevity, I propose to use the expressions “moving” and “being moved” (Konečni, 2005, 2015), and at times the equivalent “impressing,” “impressive,” and “being impressed,” as shorthand for phenomena associated with the mid-range of emotional responsiveness to esthetic stimuli, flanked by “interest” at the less intense, and by “awe” at the more intense, end of the range, as depicted in Fig. 2. The reason the heart of a Bengalese finch female starts beating faster on hearing a tape recording of an accomplished male singer (Okanoya, 2004) would accordingly be “because she is moved or impressed” by what she hears. 
And if we are indeed on the grounds of emotion we should be able to specify an action tendency or behavioral bias promoted by that emotion (Ekman, 1999; Fontaine & Scherer, 2013; Frijda, 1987; Izard, 2007). In view of what has gone before the answer is not far to seek: the action tendency promoted by the more intense levels of being impressed is that of “yielding,” “surrender,” “submission,” or “capitulation” to the source of the impressive performance, be the performer a potential mate or a rival. In a sense, the hedonic reversal from “fear” to “being moved” or “impressed” is reflected in a replacement of the behavioral tendency to escape by a tendency to yield or surrender. And if, finally, we ask for the eliciting
stimulus or antecedent that evokes the emotion of being impressed, the answer can only be “an outstanding performance.” We can accordingly sum up this excursion into the biology of “squandering as asset” by saying that an “outstanding performance” before listeners conversant with the tradition to which the performance belongs will move or impress those listeners, and that their emotional response of being impressed is realized in an action tendency toward “surrender,” directed at the performer exhibiting mastery through the performance. This ascription of the effect of esthetic stimuli to the operation of the information-related emotional dimension sets them clearly apart from both motivational systems in general (their hedonic aspects included, for which see Bloom, 2010) and from the domain of basic emotions as a whole (Ekman, 1999). The boredom-to-awe spectrum is but the full unpacking of one of the basic emotions, variously named “interest” (Izard, 2007) or “surprise” (Ekman, 1999). In keeping with the “cerebral” nature of this emotional spectrum, the neural system for learning, producing, apprehending, and judging song in birds with vocal learning is concentrated in the telencephalon of their forebrain (Jarvis, 2007). Finally, to counter possible misunderstanding of the role assigned to emotion in the present proposal: the fact that the process of assessing the merits of a songbout is mediated by an emotional variable (“being moved or impressed”) by no means implies that the patterns of the song somehow “portray emotion,” are about emotion, or are a vehicle for communicating emotions. They portray nothing outside of themselves. What they communicate is command of repertoire, complexity, and mastery of execution, not anything encoded, language-like, in those patterns (more on this in the section “The Psychological Impact of Music”). 
When a song is performed by an accomplished singer, a listener attuned to the relevant song tradition registers appreciation of the performance in the form of being interested, moved, or awed, according to the degree of command of tradition and virtuosity displayed therein. The emotion is about the pattern, and not the other way around. That the pattern in turn reflects the all-round phenotypic qualities of the performer is what allows esthetics to be cast in evolutionary terms, if the argument developed in this chapter has any merit.
T
H
C
We are now ready to make a swift transition to human arts and esthetics, and we do so via human music on the plausible assumption that the first form of human music proper was song. In fact, song may have preceded speech in our evolutionary history, perhaps in the form of song and dance in a group setting as a first form of the human arts (Merker, 2005, 2008). A strong reason to make these assumptions is provided by the fact that humans, unlike our closest relatives among the apes, indeed unlike any other primate, are vocal learners, and more specifically, are vocal production learners (Janik & Slater, 1997; see also Doupé & Kuhl, 1999). Among non-human animals, this capacity for learning to reproduce by voice novel sound patterns originally received by ear has most commonly evolved to serve learned song. Therefore, the default assumption regarding the function for which our own capacity for vocal learning originated would be song as well. If so, learned song preceded speech in our evolutionary ancestry (see Merker, 2012, 2015 for details), and we have landed squarely in the constellation of factors outlined in previous sections as critical for the origin and maintenance of complex cultural traditions of ritual lore in animals with learned song. As a biological trait, this would include the motivational mechanism of a conformal motive ensuring fidelity to tradition (Merker, 2005), the role of prior familiarity in appreciation (cf. Madison & Schiölde, 2017), as well as the ultimate purpose for taking on the burden of acquisition, namely to impress a competent audience with one’s command of the shared lore. As we saw in the section “Squandering as Asset, II,” fidelity to tradition coupled with a shared exposure history furnishes a standard of judgment short of which the tradition eventually collapses into idiosyncratic caprice without grounds for comparing one performance with another. 
Trends in Western art over the past century have tended to obscure the fundamental nature of this connection. In fact, it has been actively combatted as a fetter on the exercise of untrammeled creativity. In good agreement with the present perspective, the history of contemporary art accordingly abounds in examples of idiosyncratic caprice exercised in the absence of shared criteria for comparing one performance or creation with another. Proof of this assertion surfaces from time to time in the form of adventitious revelations that expose the arbitrary nature of the judgments
involved (Cheston, 2015; Jordan-Smith, 1960; Museum of Hoaxes, 2005. Also: Wikipedia entries for Disumbrationism, Pierre Brassau). Each step toward such a state of affairs typically meets with opposition when it first occurs. Presumably this trend in the serious arts of Western culture (poetry, painting, and music first and foremost, though not limited to these) would not have proceeded as far as it has were it not for a more general cultural ambience in the West emphasizing the inherent value of novelty and the sanctity of artistic freedom, buttressed by the Romantic myth of artistic genius. That myth emphasizes the role of rare artistic endowment over that of diligent mastery of a tradition in the genesis of great art (Smith, 1924; Waterhouse, 1926). This cultural ambience eventually ripened into outright celebration of iconoclasm in the course of the twentieth century. Yet even then, with each advance of idiosyncratic license, voices were raised in protest, sometimes trenchantly so. One illustrative example occurred when a faction of musicians in the modern jazz genre abandoned all reliance on traditional form in what they styled “free form jazz.” The jazz bassist and band leader Charles Mingus, a creative musician by no means a stranger to innovation, witnessed a key event in this development. It was Ornette Coleman’s controversial 1960 performances at the New York City “Five Spot” jazz club. Mingus commented: “… if the free-form guys could play the same tune twice, then I would say they were playing something … Most of the time they use their fingers on the saxophone and they don’t even know what’s going to come out. They’re experimenting” (Wikipedia entry “Charles Mingus”). On another occasion he noted, “They don’t even know their Parker” (B. Merker, personal observation). Note that Mingus’ comments are by no means directed against creativity or innovation as such. 
They remind us, rather, of the necessity, under circumstances where freedom is in fact possible because the means of artistic expression are learned, of a shared exposure history to ground substantive assessment of artistic merit. It is that shared background that supplies the crucial “common currency” by which alone the informational emotion of “being impressed” serves as an index of comparative value across different performances. Without that anchoring in a shared tradition, the emotional reaction of being impressed becomes as arbitrary and idiosyncratic as the performances themselves. The bulwark against bluff has been broken.
The reaction of being impressed by outstanding artistic creations for which one has been prepared by an appropriate exposure history is ubiquitous across the arts. It must not be confounded with the kind of emotional responses that originate in personal associations forged in the course of significant life events. A tune that figured prominently in a teenage romantic infatuation may, when encountered years or even decades later, compel strong feelings on an associative basis without reflecting on the tune’s artistic merits (Konečni, 2005; Rauhe, 2003; Scherer & Zentner, 2001). It is otherwise when we encounter a piece of music, perhaps for the first time, for which our listening history of the genre to which it belongs has equipped us to appreciate its masterfully patterned content, and we groan and even weep in admiration (Gabrielsson, 2011; Scherer et al., 2001–2002; see also Konečni, 2005). We may even feel our skin covered in goosebumps, and a shiver or chill traverse our spine (reviewed in Hunter & Schellenberg, 2010; see further Gabrielsson, 2011; Scherer & Zentner, 2001; Silvia & Nusbaum, 2011; Vickhoff, Åström, & Theorell, 2012). But what a peculiar way to express our admiration, by sighing, groaning, chills, and even tears! Our analysis of animal cultural esthetics allows us to make sense of these peculiar behavioral and physiological tokens of being impressed. An ordinary trigger for bodily reactions such as shivers, goose-bumps (piloerection), or chills is genuine fear (Marks, 1969, pp. 2, 39). They are the peripheral expressions of the central fear state, as it engages the autonomic (sympathetic) nervous system on an automatic, involuntary basis. These low-level autonomic (involuntary) reactions apparently remain patent even under circumstances where an esthetic stimulus taps the fear range of the information dimension, but on contextual grounds undergoes a hedonic reversal, as already covered. 
Thus the shivers, chills, and goosebumps betray the origin of the emotional impact of strong esthetic experiences in the fear range of the informational dimension depicted in Figs. 1 and 2, in good agreement with the present interpretive framework (cf. Benedek & Kaernbach, 2011). Similarly for the sighing, groaning, and weeping elicited by strong esthetic experiences. Tears appear to be the most common bodily response to strong experiences of music (Gabrielsson, 2011; Scherer et al., 2001–2002; see also Konečni, 2005). A prominent ordinary setting for such
reactions is the experience of personal loss, for which such reactions serve a largely involuntary expressive function (e.g., Averill, 1979; Frijda, 1988). As we saw in the previous section, the action tendency contingent on being esthetically moved should be a readiness, indeed an urge, to yield, submit, surrender, or capitulate to the source of an impressive performance, be it a rival who has bested us by a masterly performance, or a suitor who has penetrated our defenses by the same. In either case, loss is implicit in the act of surrender. Being bested by a rival is attended by a direct loss of status and its perquisites. What hovers in the evolutionary background, as we have seen, is the potential for physical attack from an agonist whose masterful performance, according to the developmental stress hypothesis, advertises his all-round superior phenotypic characteristics. In surrendering to a suitor, one loses freedom of choice in matters as important as the parentage of one’s offspring, along with loss of personal independence for the considerable stretch of time that the partnership will last. More abstractly conceived, a certain giving up (loss) of self is implicit in every act of submission. Arthur Schopenhauer emphasized “forgetfulness of self” in discussing esthetics, and its special relation to experiences of the sublime which he illustrated by way of landscape painting (Schopenhauer, 1844/1966, vol. I, pp. 200ff., vol. II. pp. 369ff.). For a recent discussion of this important (and once celebrated) topic in esthetics, see Konečni (2005, 2011). Absorption in the pattern-stream of a musical performance promotes forgetfulness of anything extraneous, including one’s sense of self. Such absorption, given the requisite level of background familiarity, will be all the more complete and compelling in the case of outstanding performances, not only because their masterly patterning invites it, even compels it, but because they tax our powers of apprehension. 
Then self-surrender and forgetfulness of self may reach a peak, a circumstance that may bear on the psychology of transcendental and religious experiences that are a prominent aspect of strong experiences of music (Gabrielsson, 2011). What drives tears to our eyes even though we are not actually sad or grieving, then, is the tacit sense of loss coupled to the action tendency of surrender promoted by an outstanding performance. The phenomenon is not even strictly confined to arts and esthetics: similar responses can occur on witnessing an outstanding performance in, say, sports. To prevent misunderstanding, note that none of this is to be taken to mean that the
connection between such reactions and surrender is directly present to the minds of those experiencing them. The listening mind is typically absorbed in the pattern of the performance, far from the sadness of loss or the cold hand of fear. In keeping with the hedonic reversal invoked here, happiness and joy are typical of these intense experiences (Gabrielsson, 2011). These caveats regarding what might be present to the mind of the listener/beholder do not mean, however, that the evolutionary logic of capitulating to the originator of a masterful display necessarily is a matter of our distant ancestry only. There is no dearth of examples of strangers soliciting casual amorous liaisons with famous creators of art on the basis of encountering their creations alone (Lipsius, 1919; Miller, 2000, p. 331; see also Nettle & Clegg, 2006). In sum, only where artistry is embedded in a tradition that poses a challenge of acquisition for its practitioners and also has shaped the sensibilities of the intended audience does the latter’s emotional response of being impressed provide a measure of artistic merit. It is only when both conditions are met that a causal connection between intuitive response and artistic merit is in fact patent, according to the psychological account given in the section “Squandering as Asset, III.” Such was typically the state of affairs throughout human cultures until the advent of modernity in the West, and even there it still holds for its popular culture within any of its given subcultures. By the same token, where anything can be art, nothing in fact is.
The Psychological Impact of Music: N… “M…” … “E…”
The thesis that art generally, and music specifically, exerts its effect via the informational dimension defined in the section “Squandering as Asset, III” has obvious consequences for the much discussed issue of “music and meaning” and its subdomain “music and emotion” (Davies, 1994; Juslin & Sloboda, 2001; Meyer, 1956; Robinson, 1997). It does so by allowing us to draw a principled distinction between the undoubted psychological impact of music on the one hand, and questions of its carrying meaning as well as of its portraying or inducing emotions on the other.
Strictly speaking only the sentences of language “mean” at all, as most trenchantly argued by Staal (Staal, 1989). The multilevel combinatorics of phonemes and morphemes by which language performs its arbitrary (in the sense of conventional) mapping between the form of utterances and their meaning (compare “bord,” “Tisch,” and “table” for the selfsame type of object in Swedish, German, and English, respectively) constitutes a bona fide code for representing and conveying meaning. This code is so detailed and comprehensive that virtually every difference in the strings of phonemes that make up sentences makes a difference in the information conveyed by those sentences. This lexically semanticized syntactic code turns sequential patterns of vocally produced sounds into statements about things that bear not the slightest resemblance to those sound sequences themselves (see the “table” example above). Thus we think and communicate about objects, events, matters of fact, states of the world, ideas, intentions, beliefs and desires, without limit, using the same few dozen phonemes to do so. This is what it means to “mean,” namely that something encodes something other than itself, which is its meaning. Much of what compels our interest, carries significance, and recruits our psychological engagement—in short, has psychological impact—does so without the detour of meaning in this sense. Our non-linguistic perception and cognition quite generally operates on patterns of sensory input by discriminating, segmenting, grouping, classifying and generalizing within and across them, and not by using them as vehicles for encoding matters other than themselves. Something need not, in other words, mean in order to be meaningful: witness the experience of a magnificent sunset as but one of an infinitude of cases in point. 
Viewed in this light, the patterns of music define themselves as perceptual objects that engage the informational dimension of our perceptual/cognitive capacities in the manner of auditory analogs of visually presented arabesques or the shifting patterns of a turning kaleidoscope, to use Hanslick’s felicitous metaphors (Hanslick, 1854). As such they need to “sound good” (and “better,” and “best”), not to refer to circumstances other than themselves. In so doing they exploit the limitless pattern-generativity music conquers for itself by a combinatorics of “particulate” elements drawn from the discretized continua of pitch and duration (for which see Abler, 1989; Merker, 2002; Merker, Morley, & Zuidema, 2015). This limitless generativity fits ill with a conception of
music as a device for portraying or evoking the limited set of subjective states that make up our emotions. Not only is the empirical evidence supporting that conception weak (Konečni, 2003, 2008; Konečni et al., 2008; Scherer, 2003), but weighty arguments have been leveled against it, arguments for which Hanslick’s 1854 essay is still the unsurpassed locus classicus (Hanslick, 1854; see also Davies, 1994; Zangwill, 2004). It is the common experience of having been moved by music, even to the point of tears or chills in some instances, that has lent credence to the notion that music, somehow, is “about” emotions, or exerts its effects by inducing them. This “being moved” or being impressed by music is indeed an emotional response. But as we saw in the section “Squandering as Asset, III,” that response moves up and down the intensity dimension of a single one of the basic emotions, rather than across them. The fact that music has an emotional impact in this sense must accordingly be sharply distinguished from the claim that music is about emotions in the plural, or aims at evoking emotions, again in the plural, which in either case it would have to do to fit the metaphor of being a “language of emotions” (Spencer, 1911). To the extent that the patterns of music refer to circumstances other than themselves (e.g., storms, battles, or dancing peasants in programmatic music) they tend to do so by dynamically or otherwise mimicking, resembling, or caricaturing the things to be evoked (Hanslick, 1854). That is not how language carries meaning except in the special cases of onomatopoeia and some of the uses of prosody, both of which lie outside of the central coding device that gives the sentences of language their unique and unbounded capacity to mean. So even when music is intended to mean —which is far from always the case—it does not mean, it mimics. In song and music without lyrics it is the vocal or instrumental patterns themselves that are the information conveyed. 
As detailed in previous sections, our emotional response to these patterns concerns the inducing patterns themselves as they interact with the background of our prior musical familiarity. When that background relates to the genre to which the patterns belong, the specificity, scope, and intricacy of the interaction is commensurately enhanced. It is here that the infinite pattern generativity of music comes into its own. It furnishes the makings of the untold structural devices (variation and repetition, various symmetries and asymmetries, etc.) needed to create temporal trajectories capable of sustaining our interest in the face of the ubiquitous habituability of our cognitive equipment, a
habituability that converts every novelty to “old hat” in the course of a few encounters. It is not to our emotions that this content is addressed in the first place, but to our imagination, as Eduard Hanslick, following Arthur Schopenhauer, insisted (Hanslick, 1854; Schopenhauer, 1844/1966, vol. II, pp. 447ff.).1 Engaging the contours of our recognition memory the ever-varied patterns of music trigger a variety of familiarity-based expectancies which their temporally unfolding melodic, rhythmic, and harmonic patterns confirm, violate, or complement, generating tensions, their resolution, and new expectancies in ever-shifting peregrinations across the sensibility landscape sculpted by the listener’s history of prior exposure (Meyer, 1956; Narmour, 1977; Schopenhauer, 1844/1966, vol. II, p. 455). For a given musical listening experience, it is the cumulative effect—presumably in “leaky integrator” fashion—of the particular sequence of twists and turns along the temporal trajectory of this interaction that determines how far up the information dimension toward “awe” a given experience of music takes us and thus the extent to which it moves or impresses us. In this sense “being moved” or “impressed” is a specifically esthetic emotion. It may even be deemed the esthetic emotion (Konečni, 2005, 2011, 2015), generated by hedonic reversal in the danger zone of the information dimension. In both cultural history and the listening history of individuals, the infinite space of musical combinatorics differentiates into occupied subregions according to genre (cf. Merker, 2002, pp. 11–12). Individual specimens that make up any given region of this space will exhibit greater or lesser elegance with greater or lesser degrees of well-formedness (see, e.g., Lerdahl & Jackendoff, 1983) and greater or lesser efficacy in stirring a given listener’s imagination on encounter. 
And just as in other perceptual and cognitive systems, pattern invariants are bound to be extracted across the sampled space in accordance with a variety of shared structural characteristics, clustering musical impressions under high-level descriptors. Thus the categories “nostalgia” (sentimental, dreamy, melancholic), “power” (energetic, triumphant, heroic), and “tension” (agitated, nervous, impatient, irritated) extracted by Zentner and colleagues from responses to a diverse sample of European classical music (Zentner, Grandjean, & Scherer, 2008). The authors interpret their results in terms of a model of music-specific emotions. They might also be construed in terms of high-level
intuitive (statistical) pattern-classification, ranging across the vast and multiform world of musical patterns that have accumulated in a given musical culture and whose uneven sampling has shaped the musical familiarity and sensibilities of any given listener. The perspective on the psychological impact of music presented here by no means relegates music to an abstract domain of formalist connoisseurship. Some of its patterns—say of rhythmic music meant to support dancing—access a presumably species-specific predisposition for bodily entrainment to isochrony-based auditory patterns, and help optimize such entrainment (Merker, 2014; Merker, Madison, & Eckerdal, 2009). The central role of music in youth and popular culture suffices to dispel any overly formalist notion of its nature, and fits well with the evolutionary perspective on esthetics presented here.
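The “leaky integrator” accumulation invoked earlier in this section can be sketched in a few lines. The sketch is an assumption-laden illustration rather than a model specified in the text: it treats each musical event as contributing a momentary expectancy-violation value, and lets a running level retain a fixed fraction of its previous value while absorbing the new input. The function name `leaky_integrate`, the parameter `leak`, and the input values are hypothetical.

```python
def leaky_integrate(violations, leak=0.8):
    """Running summary of a sequence of expectancy-violation values.

    violations: momentary values, one per musical event (hypothetical units).
    leak:       fraction of the previous level retained at each step;
                values nearer 1.0 give a longer memory (an assumption).
    Returns the level after each event.
    """
    level = 0.0
    trajectory = []
    for value in violations:
        # Old level decays by `leak`; the new input fills the remainder.
        level = leak * level + (1.0 - leak) * value
        trajectory.append(level)
    return trajectory


# A run of surprising events followed by a run of fully predictable ones.
print(leaky_integrate([1.0, 1.0, 1.0, 0.0, 0.0], leak=0.5))
```

On this reading, a sustained run of violations pushes the level upward along the information dimension, while a stretch of predictable material lets it decay, so that the summary depends on the order of events as well as on their sum.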
Conclusion
To sum up: the emotional impact of music is best understood not by analogy to the meaning encoded in language, nor by assimilation to the biology of basic emotions, but through the behavioral biology of the Zahavian handicap principle (Zahavi, 1975) and its psychological ramifications. Where handicaps take the form of esthetic displays, from the peacock’s tail to the vocal artistry of pied butcherbirds (Taylor, 2009), mechanisms for judging their quality must be in place, typically in the medium of an emotional dimension spanning from boredom, via interest/curiosity, to being impressed, with awe and a sense of sublimity at its high end. As I have been at pains to make credible, the elaboration of Zahavi’s handicap principle in the developmental stress hypothesis for the size and complexity of birdsong repertoires provides an eminently plausible interpretive framework for the nature and function of human song and music as well. It dispels the appearance of frivolity encumbering our expenditure of effort and resources on acquiring and producing the pattern richness of human song and music. By exact analogy to the case of learned birdsong, it gives us a means to display command and mastery of a trove of culturally patterned and transmitted lore. Such command and mastery
serves not only as a badge of competence in the culture, but as a certificate of the phenotypic traits needed to achieve that competence. In our case today, music is not alone in providing such a shorthand certificate of phenotypic competence. It was eventually supplemented by language performing that same function among others. The two domains share not only the pattern generativity of a combinatorics of discrete elements, but also the mechanism of vocal learning, and the cerebral equipment for pattern-assessment. It is even possible that language grew out of song in a glacial movement of contextual semanticization of song repertoires, as detailed in Merker (2012). For music, in the setting of a cultural tradition of pattern familiarity shared between performer and listener, the circumstances reviewed here allow a given performance to be appreciated, and even to be assessed, on occasion, as an outstanding one. And that, I submit, is when extravagance impresses, and what is more, when it rightfully should impress, according to the recasting of esthetics in evolutionary terms that has been the burden of this chapter.
References
Abler, W. L. (1989). On the particulate principle of self-diversifying systems. Journal of Social and Biological Structures 12(1), 1–13.
Apter, M. J. (1982). The experience of motivation: The theory of psychological reversals. New York: Academic Press.
Averill, J. R. (1979). The functions of grief. In C. Izard (Ed.), Emotions in personality and psychopathology (pp. 339–368). New York: Plenum Press.
Baylis, J. R. (1982). Avian vocal mimicry: Its function and evolution. In D. E. Kroodsma & E. H. Miller (Eds.), Acoustic communication in birds (pp. 51–83). New York: Academic Press.
Benedek, M., & Kaernbach, C. (2011). Physiological correlates and emotional specificity of human piloerection. Biological Psychology 86(3), 320–329.
Berlyne, D. E. (1960). Conflict, arousal, and curiosity. New York: McGraw-Hill.
Berlyne, D. E. (1971). Aesthetics and psychobiology. New York: Appleton Century.
Bindra, D. (1959). Stimulus change, reactions to novelty, and response decrement. Psychological Review 66(2), 96–103.
Bloom, P. (2010). How pleasure works. New York: W. W. Norton.
Bradley, M. M., & Lang, P. J. (2000). Measuring emotion: Behavior, feeling and physiology. In R. D. Lane, L. Nadel, & G. Ahern (Eds.), Cognitive neuroscience of emotion (pp. 242–276). New York: Oxford University Press.
Cheston, P. (2015). Artist in legal row claims “former workshop sold her paint-spattered carpet as genuine works.” Retrieved from http://www.standard.co.uk/news/london/artist-in-legal-row-claims-former-workshop-sold-her-paint-spattered-carpet-as-genuine-works-a2947666.html
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences 36(3), 181–204.
Darwin, C. (1871). The descent of man and selection in relation to sex. New York: D. Appleton & Company.
Davies, S. (1994). Musical meaning and expression. Ithaca, NY: Cornell University Press.
Doupé, A. J., & Kuhl, P. K. (1999). Birdsong and human speech: Common themes and mechanisms. Annual Review of Neuroscience 22, 567–631.
Dowsett-Lemaire, F. (1979). The imitative range of the song of the Marsh Warbler, Acrocephalus palustris, with special reference to imitations of African birds. Ibis 121(4), 453–468.
Ekman, P. (1999). Basic emotions. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion (pp. 45–60). Chichester: John Wiley and Sons.
Fontaine, J. J. R., & Scherer, K. R. (2013). Emotion is for doing: The action tendency component. In J. J. R. Fontaine, K. R. Scherer, & C. Soriano (Eds.), Components of emotional meaning: A sourcebook (Chapter 11). Oxford Scholarship Online. Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199592746.001.0001
Frijda, N. H. (1987). Emotion, cognitive structure, and action tendency. Cognition and Emotion 1(2), 115–143.
Frijda, N. H. (1988). Laws of emotion. American Psychologist 43(5), 349–358.
Friston, K. (2002). Functional integration and inference in the brain. Progress in Neurobiology 68(2), 113–143.
Gabrielsson, A. (2011). Strong experiences with music. Oxford: Oxford University Press.
Gahr, M. (2000). Neural song control system of hummingbirds: Comparison to swifts, vocal learning (songbirds) and nonlearning (suboscines) passerines, and vocal learning (budgerigars) and nonlearning (dove, owl, gull, quail, chicken) nonpasserines. Journal of Comparative Neurology 426(2), 182–196.
Garamszegi, L. Z., & Eens, M. (2004). Brain space for a learned task: Strong intraspecific evidence for neural correlates of singing behavior in songbirds. Brain Research Reviews 44(2–3), 187–193.
Grassberger, P. (1986). Toward a quantitative theory of self-generated complexity. International Journal of Theoretical Physics 25(9), 907–938.
Hanslick, E. (1854). Vom musikalisch Schönen. Beiträge zur Revision der Ästhetik der Tonkunst. Leipzig: Weigel.
Hartshorne, C. (1956). The monotony threshold in singing birds. Auk 73, 176–192.
Hasselquist, D., Bensch, S., & von Schantz, T. (1996). Correlation between song repertoire, extrapair paternity and offspring survival in the great reed warbler. Nature 381(6579), 229–232.
Heinrichs, M., von Dawans, B., & Domes, G. (2009). Oxytocin, vasopressin, and human social behavior. Frontiers in Neuroendocrinology 30(4), 548–557.
Hinton, G. E., & Zemel, R. S. (1994). Autoencoders, minimum description length, and Helmholtz free energy. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems 6 (pp. 3–10). San Mateo, CA: Morgan Kaufmann.
Hunter, P. G., & Schellenberg, E. G. (2010). Music and emotion. In M. R. Jones, R. R. Fay, & A. N. Popper (Eds.), Music perception (pp. 129–164). New York: Springer.
Huron, D. (2001). Is music an evolutionary adaptation? Annals of the New York Academy of Sciences 930, 43–61.
Iwaniuk, A. N., & Nelson, J. E. (2003). Developmental differences are correlated with relative brain size in birds: A comparative analysis. Canadian Journal of Zoology 81(12), 1913–1928.
Izard, C. E. (2007). Basic emotions, natural kinds, emotion schemas, and a new paradigm. Perspectives on Psychological Science 2(3), 260–280.
Janik, V. M., & Slater, P. J. B. (1997). Vocal learning in mammals. Advances in the Study of Behavior 26, 59–99.
Jarvis, E. D. (2007). Neural systems for vocal learning in birds and humans: A synopsis. Journal of Ornithology 148, 35–44.
Jordan-Smith, P. (1960). The road I came; some recollections and reflections concerning changes in American life and manners since 1890. Caldwell, Idaho: Caxton Printers.
Juslin, P. N., & Sloboda, J. A. (Eds.). (2001). Music and emotion: Theory and research. Oxford: Oxford University Press.
Kamin, L. J. (1969). Predictability, surprise, attention, and conditioning. In R. Church & B. Campbell (Eds.), Punishment and aversive behavior (pp. 279–296). New York: Appleton-Century-Crofts.
Kelly, A. M., & Goodson, J. L. (2014). Social functions of individual vasopressin–oxytocin cell groups in vertebrates: What do we really know? Frontiers in Neuroendocrinology 35(4), 512–529.
Konečni, V. J. (2003). Review of P. N. Juslin and J. A. Sloboda (Eds.), Music and emotion: Theory and research. Music Perception 20, 332–341.
Konečni, V. J. (2005). The aesthetic trinity: Awe, being moved, thrills. Bulletin of Psychology and the Arts 5(2), 27–44.
Konečni, V. J. (2008). Does music induce emotion? A theoretical and methodological analysis. Psychology of Aesthetics, Creativity, and the Arts 2(2), 115–129.
Konečni, V. J. (2011). Aesthetic trinity theory and the sublime. Philosophy Today 55, 64–73.
Konečni, V. J. (2015). Being moved as one of the major aesthetic emotional states: A commentary on “Being moved: linguistic representation and conceptual structure.” Frontiers in Psychology 6, 343.
Konečni, V. J., Brown, A., & Wanic, R. (2008). Comparative effects of music and recalled life-events on emotional state. Psychology of Music 36(3), 289–308.
Konishi, M. (2004). The role of auditory feedback in birdsong. In H. P. Ziegler & P. Marler (Eds.), The behavioral neurobiology of birdsong. Annals of the New York Academy of Sciences 1016, 463–475.
Kroodsma, D. E. (1978). Continuity and versatility in birdsong: Support for the monotony threshold hypothesis. Nature 274(5672), 681–683.
Kroodsma, D. E., & Parker, L. D. (1977). Vocal virtuosity in the brown thrasher. Auk 94, 783–785.
Kuehnast, M., Wagner, V., Wassiliwizky, E., Jacobsen, T., & Menninghaus, W. (2014). Being moved: Linguistic representation and conceptual structure. Frontiers in Psychology: Emotion Science 5, 1242.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press.
Lim, M. M., & Young, L. J. (2006). Neuropeptidergic regulation of affiliative behavior and social bonding in animals. Hormones and Behavior 50(4), 506–517.
Lipsius, I. M. (1919). Liszt und die Frauen. Leipzig: Breitkopf & Härtel.
MacDougall-Shackleton, S. A., & Spencer, K. A. (2012). Developmental stress and birdsong: Current evidence and future directions. Journal of Ornithology 153(Suppl. 1), S105–S117.
Madison, G., & Schiölde, G. (2017). Repeated listening increases the liking for music regardless of its complexity: Implications for the appreciation and aesthetics of music. Frontiers in Neuroscience 11, 147.
Marks, I. M. (1969). Fears and phobias. New York: Academic Press.
Merker, B. (2002). Music: The missing Humboldt system. Musicae Scientiae 6(1), 3–21.
Merker, B. (2005). The conformal motive in birdsong, music and language: An introduction. In G. Avanzini, L. Lopez, S. Koelsch, & M. Majno (Eds.), The neurosciences and music II: From perception to performance. Annals of the New York Academy of Sciences 1060, 17–28.
Merker, B. (2007a). Consciousness without a cerebral cortex: A challenge for neuroscience and medicine. Behavioral and Brain Sciences 30(1), 63–134.
Merker, B. (2007b). Music at the limits of the mind. In G. Kugiumutzakis (Ed.), Sympantiki Armonia, Musike kai Epistimi. Ston Miki Theodoraki [Universal harmony, music and science. In honour of Mikis Theodorakis]. Heraklion: Crete University Press.
Merker, B. (2008). Ritual foundations of human uniqueness. In S. Malloch & C. Trevarthen (Eds.), Communicative musicality (pp. 45–59). Oxford: Oxford University Press.
Merker, B. (2012). The vocal learning constellation: Imitation, ritual culture, encephalization. In N. Bannan & S. Mithen (Eds.), Music, language and human evolution (pp. 215–60). Oxford: Oxford University Press.
Merker, B. (2013a). The efference cascade, consciousness, and its self: Naturalizing the first person pivot of action control. Frontiers in Psychology 4, article 501, 1–20.
Merker, B. (2013b). Cortical gamma oscillations: The functional key is activation, not cognition. Neuroscience & Biobehavioral Reviews 37(3), 401–417.
Merker, B. (2014). Groove or swing as distributed rhythmic consonance: Introducing the groove matrix. Frontiers in Human Neuroscience 8, article 454, 1–4.
Merker, B. (2015). Seven theses on the biology of music and language. Signata 6, 195–213.
Merker, B., Madison, G., & Eckerdal, P. (2009). On the role and origin of isochrony in human rhythmic entrainment. Cortex 45(1), 4–17.
Merker, B., Morley, I., & Zuidema, W. (2015). Five fundamental constraints on theories of the origins of music. Philosophical Transactions of the Royal Society of London: Biology 370(1664), 20140095. doi:10.1098/rstb.2014.0095
Meyer, L. B. (1956). Emotion and meaning in music. Chicago, IL: University of Chicago Press.
Miller, G. F. (2000). The mating mind: How sexual choice shaped the evolution of human nature. New York: Doubleday.
Museum of Hoaxes (2005). Monkey art fools expert. Retrieved from http://hoaxes.org/weblog/comments/monkey_art_fools_expert
Narmour, E. (1977). Beyond Schenkerism: The need for alternatives in music analysis. Chicago, IL: University of Chicago Press.
Nettle, D., & Clegg, H. (2006). Schizotypy, creativity and mating success in humans. Proceedings of the Royal Society of London B: Biological Sciences 273, 611–615. doi:10.1098/rspb.2005.3349
Nowicki, S., Searcy, W. A., & Peters, S. (2002a). Brain development, song learning and mate choice in birds: A review and experimental test of the “nutritional stress hypothesis.” Journal of Comparative Physiology A: Sensory, Neural, and Behavioral Physiology 188, 1003–1004.
Nowicki, S., Searcy, W. A., & Peters, S. (2002b). Quality of song learning affects female response to male bird song. Proceedings of the Royal Society of London B: Biological Sciences 269, 1949–1954.
Okanoya, K. (2004). Song syntax in Bengalese finches: Proximate and ultimate analyses. Advances in the Study of Behavior 34, 297–345.
Patel, A. D. (2008). Music, language, and the brain. Oxford: Oxford University Press.
Pearce, E., Launay, J., & Dunbar, R. I. M. (2015). The ice-breaker effect: Singing mediates fast social bonding. Royal Society Open Science 2, 150221. Retrieved from http://dx.doi.org/10.1098/rsos.150221
Pinker, S. (1997). How the mind works. New York: Penguin Putnam.
Rauhe, H. (2003). Musik heilt und befreit. In H. G. Bastian & G. Kreutz (Eds.), Musik und Humanität. Interdisziplinäre Grundlagen für (musikalische) Erziehung und Bildung (pp. 182–191). Mainz: Schott.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non-reinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Riebel, K. (2003). The “mute” sex revisited: Vocal production and perception learning in female songbirds. Advances in the Study of Behavior 33, 49–86.
Robinson, J. (Ed.). (1997). Music and meaning. Ithaca, NY: Cornell University Press.
Rohrmeier, M. A., & Koelsch, S. (2012). Predictive information processing in music cognition. A critical review. International Journal of Psychophysiology 83, 164–175.
Sachs, E. (1967). Dissociation of learning in rats and its similarities to dissociative states in man. In J. Zubin & H. Hunt (Eds.), Comparative psychopathology: Animal and human (pp. 249–304). New York: Grune and Stratton.
Scherer, K. R. (2003). Why music does not produce basic emotions. In R. Breslin (Ed.), Proceedings of the Stockholm Music Acoustic Conference, 2 vols., Vol. 1 (pp. 25–28). Retrieved from http://www.speech.kth.se/smac03
Scherer, K. R., & Zentner, M. R. (2001). Emotional effects of music: Production rules. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 361–392). Oxford: Oxford University Press.
Scherer, K. R., Zentner, M. R., & Schacht, A. (2001–2002). Emotional states generated by music: An exploratory study of music experts. Musicae Scientiae, Special Issue: Current trends in the study of music and emotion, 149–171.
Schopenhauer, A. (1844/1966). The world as will and representation (2nd ed.; orig. ed. 1819). Trans. E. F. J. Payne, 2 vols. New York: Dover.
Shields, S. A., MacDowell, K. A., Fairchild, S. B., & Campbell, M. L. (1987). Is mediation of sweating cholinergic, adrenergic, or both? A comment on the literature. Psychophysiology 24(3), 312–319.
Silvia, P. J., & Nusbaum, E. C. (2011). On personality and piloerection: Individual differences in aesthetic chills and other unusual aesthetic experiences. Psychology of Aesthetics, Creativity, and the Arts 5(3), 208–214.
Smith, L. P. (1924). Four words: Romantic, originality, creative, genius. Oxford: Clarendon Press.
Sokolov, E. N. (1963). Higher nervous functions: The orienting reflex. Annual Review of Physiology 25, 545–580.
Spencer, H. (1911). On the origin and function of music. In Essays on education and kindred subjects (pp. 312–330). London: J. M. Dent & Sons.
Staal, F. (1989). Rules without meaning. New York: Peter Lang.
Strohminger, N. S. (2013). The hedonics of disgust (Doctoral dissertation). University of Michigan. Retrieved from https://deepblue.lib.umich.edu/handle/2027.42/97960
Taylor, H. (2009). Towards a species songbook: Illuminating the vocalisations of the Australian pied butcherbird (Cracticus nigrogularis) (Doctoral dissertation). University of Western Sydney.
Tribus, M. (1961). Thermodynamics and thermostatics: An introduction to energy, information and states of matter, with engineering applications. New York: Van Nostrand.
Vickhoff, B., Åström, R., & Theorell, T. (2012). Musical piloerection. Music and Medicine 4, 82–89.
Waterhouse, F. A. (1926). Romantic “originality.” The Sewanee Review 34, 40–49.
Zahavi, A. (1975). Mate selection: A selection for a handicap. Journal of Theoretical Biology 53(1), 205–214.
Zangwill, N. (2004). Against emotion: Hanslick was right about music. British Journal of Aesthetics 44(1), 29–43.
Zentner, M., Grandjean, D., & Scherer, K. R. (2008). Emotions evoked by the sound of music: Characterization, classification, and measurement. Emotion 8(4), 494–521.
¹ For Schopenhauer’s influence on Hanslick, see Merker (2007b), to which can be added the fact that the “smoking gun” of that influence in Hanslick’s final paragraph was eliminated from all but the first edition of Hanslick’s famous essay.
SECTION III
MUSIC PROCESSING IN THE HUMAN BRAIN
CHAPTER 5
CEREBRAL ORGANIZATION OF MUSIC PROCESSING
THENILLE BRAUN JANZEN AND MICHAEL H. THAUT
Introduction
Understanding the neural underpinnings of music processing is a central theme in cognitive neuroscience, as evidenced by the growing body of literature on this topic. Neuroimaging research developed over the past 20 years has successfully mapped several cortical and subcortical brain regions that support music processing. This chapter provides a broad panorama of the current knowledge concerning the anatomical and functional basis of music processing in the healthy brain. To that end, we focus on core brain networks implicated in music processing, emphasizing the anatomical and functional interactions between cortical and subcortical areas within auditory-frontal networks, auditory-motor networks, and auditory-limbic networks. Finally, we review recent studies investigating how brain networks organize themselves in a naturalistic music listening context. The term network here denotes a collection of regions that are activated to support a particular function, referencing structural and functional connections between these regions. With that, we move beyond the “where” and “when” of task-related activity to start understanding how
different brain networks interact to support cognitive, perceptual, and motor functions.
N
B
M H
P B
The Ascending Auditory Pathways
Music perception begins with the decoding of acoustic information. Acoustic signals such as voices and music enter the human ear and trigger a cascade of signal transpositions along the auditory pathways (Fig. 1). Incoming auditory signals are transmitted by the outer and middle ear to the cochlea of the inner ear, where acoustic information is translated into neural activity. Acoustic properties such as sound frequency are represented tonotopically along the basilar membrane of the cochlea; tonotopy refers to the systematic topographical arrangement of neurons as a function of their response to tones of different frequencies. This tonotopic organization is found throughout the auditory neuraxis (Humphries, Liebenthal, & Binder, 2010; Zatorre, 2002).
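The place-to-frequency mapping that underlies cochlear tonotopy can be illustrated numerically. The sketch below uses Greenwood's frequency-position function with parameter values commonly quoted for the human cochlea. Neither the function nor the constants come from this chapter; they are assumptions brought in for illustration, and the function names are hypothetical.

```python
import math

# Assumed human parameters for Greenwood's function (not taken from this chapter):
# f(x) = A * (10 ** (a * x) - k), with x the fractional distance from the apex.
A_HZ = 165.4
ALPHA = 2.1
K = 0.88


def place_to_frequency(x):
    """Characteristic frequency (Hz) at fractional position x (0 = apex, 1 = base)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 (apex) and 1 (base)")
    return A_HZ * (10 ** (ALPHA * x) - K)


def frequency_to_place(f_hz):
    """Inverse mapping: fractional position along the membrane for frequency f_hz."""
    return math.log10(f_hz / A_HZ + K) / ALPHA


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"x = {x:.2f}  ->  {place_to_frequency(x):9.1f} Hz")
```

Under these assumed parameters, low frequencies map to the apex and high frequencies to the base, and equal steps along the membrane correspond to roughly equal frequency ratios over most of the range.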
FIGURE 1. The neural auditory pathway consists of an interconnecting cascade of processing nodes from the cochlear nucleus (CN) up to primary auditory cortex (AC) and higher-level auditory regions in superior temporal cortex (STC). Abbreviations: CN, cochlear nucleus; SOC, superior olivary complex; IC, inferior colliculus; HC, hippocampus; MGB, medial geniculate body; AC, auditory cortex; STC, superior temporal cortex. Reprinted from Progress in Neurobiology 123(1), Sascha Frühholz, Wiebke Trost, and Didier Grandjean, The role of the medial temporal limbic system in processing emotions in voice and music, pp. 1–17, https://doi.org/10.1016/J.PNEUROBIO.2014.09.003, Copyright © 2014 Elsevier Ltd. All rights reserved.
Outside of the cochlea, dendrites of the spiral ganglion cells synapse with the base of the hair cells located in the organ of Corti on the basilar membrane. Triggered by the movement of the hair cells on the basilar membrane, the spiral ganglion cells are the first neurons to fire an action potential in the auditory pathway and transmit all the brain’s auditory input via their axons synapsing with the dendrites of the cochlear nuclei (Amunts, Morosan, Hilbig, & Zilles, 2012; Froud et al., 2015; Nayagam, Muniak, & Ryugo, 2011). The majority of the fibers (70 percent) cross over to the opposite hemisphere starting at the levels of the cochlear nuclei (contralateral
pathway), while some remain on the same incoming side (ipsilateral pathway). The acoustic information is highly preprocessed by a series of brainstem nuclei before reaching the cortex. Basic acoustic features such as sound intensity, signal onsets, periodicity, and signal location are extracted in the cochlear nucleus, lateral lemniscus, and the superior olivary complex. A secondary pathway originates in the ventral cochlear nucleus, from which some fibers project to the reticular formation, a general arousal system in the lower brainstem. Descending (efferent) fiber tracts from the reticular formation form the audio-spinal pathway by connecting with the motor neurons in the spinal cord to mediate reflexive motor responses to sound and to prime motor neural excitability (Horn, 2006; Huffman & Henson, 1990; Rossignol & Melvill Jones, 1976). The secondary ascending (afferent) pathway inhibits lower auditory centers to elevate hearing thresholds and alert the cortex to incoming auditory signals. In the primary ascending pathway, the superior olivary complex is the first relay station of the brainstem where cochlear inputs from both left and right sides converge. This convergence provides the anatomical basis for sound localization: timing and intensity differences between the incoming left and right signals are measured to determine sound angles (Grothe, 2000; Tollin, 2003). More complex spectral and temporal decoding of the acoustic signals occurs in the inferior colliculus. Functional magnetic resonance imaging research with animals has shown that the spectral and temporal dimensions of the acoustic signals are distinctly mapped in the inferior colliculus, indicating that, in addition to the tonotopic maps, the temporal envelope of the acoustic signals is also topographically represented there (Baumann et al., 2011). The last cross-lateral projections are at the inferior colliculus level.
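The timing comparison attributed above to the superior olivary complex can be illustrated with a simple geometric model. The sketch below uses Woodworth's spherical-head approximation to relate source azimuth to the interaural time difference (ITD). The formula, the head radius, and the speed of sound are assumptions introduced here and are not taken from this chapter. The code describes only the physical cue, not the neural computation, and it ignores intensity differences.

```python
import math

# Assumed constants (illustrative values, not from this chapter).
HEAD_RADIUS_M = 0.0875        # approximate adult head radius
SPEED_OF_SOUND_M_S = 343.0    # air at about 20 degrees Celsius


def itd_seconds(azimuth_deg):
    """ITD for a distant source at the given azimuth (-90 to 90 degrees).

    Woodworth approximation: ITD = (r / c) * (theta + sin(theta)).
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


def azimuth_from_itd(itd, tol_deg=1e-9):
    """Recover the azimuth (degrees) from an ITD by bisection.

    Works because itd_seconds increases monotonically on [-90, 90].
    """
    lo, hi = -90.0, 90.0
    while hi - lo > tol_deg:
        mid = (lo + hi) / 2.0
        if itd_seconds(mid) < itd:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for angle in (0, 15, 45, 90):
        print(f"{angle:3d} deg -> ITD of {itd_seconds(angle) * 1e6:6.1f} microseconds")
```

With these assumed values the largest ITD, for a source directly to one side, is a little over 0.6 ms, which gives a sense of the temporal precision that binaural comparison requires.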
The last subcortical node in the primary ascending pathway is the medial geniculate body, which is comprised of multiple subdivisions. The ventral nucleus of the medial geniculate body is tonotopically organized and is the main ascending route to the primary auditory cortex, while its other subdivisions project widely to both primary and non-primary auditory cortex. Importantly, the auditory pathway does not only consist of ascending projections; it also has rich top-down projections that are critical for modulation of neural responses in the subcortical auditory centers and for learning-induced plasticity (Bajo, Nodal, Moore, & King, 2010; Suga &
Ma, 2003). In general, conduction in the auditory pathway is faster and stronger for the contralateral pathway.

The human auditory cortex is located in the posterior part of the superior temporal lobe, covering Heschl’s gyrus and parts of the planum temporale and the posterior superior temporal gyrus. More specifically, the primary auditory cortex is largely located in the medial part of Heschl’s gyrus (corresponding to Brodmann area 41, BA41), and its core auditory region is tonotopically organized such that different subregions of the cortex are sensitive to different frequency bands (Langers, 2014; Norman-Haignere, Kanwisher, & McDermott, 2013). The primary auditory cortex performs fine-grained and specific analysis of acoustic features, such as frequency (Da Costa et al., 2011; Humphries et al., 2010; Warren, Uppenkamp, Patterson, & Griffiths, 2003) and spectro-temporal modulation (Schönwiesner & Zatorre, 2009), playing a key role in the transformation of acoustic features into auditory percepts (e.g., from sound frequency into pitch percept) (Griffiths & Warren, 2004). Lesion studies and functional imaging research have identified the lateral Heschl’s gyrus as a pitch-sensitive area, suggesting that pitch percepts are represented in this particular region of the auditory cortex (for review, see Zatorre & Zarate, 2012). After the initial decoding of acoustic information in the primary auditory cortex, the information is transmitted to the secondary auditory cortex (located in the planum temporale and the planum polare) and to higher-level associative cortex in the superior temporal cortex and superior temporal sulcus.
Areas of the non-primary auditory cortex are involved in a number of functions crucial for establishing a cognitive representation of the acoustic environment, including the representation of auditory objects (auditory Gestalt formation), which entails processes such as the analysis of the contour of a melody, spatial grouping, extraction of inter-sound relationships, and stream segregation (Griffiths & Warren, 2002, 2004; for review, see Koelsch, 2011). Within the non-primary auditory cortex, there are multiple differentiated networks that have distinct functional roles (Cammoun et al., 2015). Consistent evidence indicates that the superior temporal gyrus, both anterior and posterior to Heschl’s gyrus, plays an important role in melodic processing (for review, see Janata, 2015; Peretz & Zatorre, 2005; Zatorre & Zarate, 2012). For instance, the superior temporal lobe (including
both the superior temporal gyrus and the superior temporal sulcus) has been identified in studies examining melodic contour processing (Lee, Janata, Frost, Hanke, & Granger, 2011; Patterson, Uppenkamp, Johnsrude, & Griffiths, 2002; Schindler, Herdener, & Bartels, 2013; Tramo, Shah, & Braida, 2002), perception of melodic intervals (Klein & Zatorre, 2015), sound spectral envelope (Warren, Jennings, & Griffiths, 2005), and categorical perception of major and minor chords (Klein & Zatorre, 2011). Interestingly, studies have shown that the posterior region of the auditory cortex is more sensitive to decoding changes in pitch height (which refers to the spectral weighting of a sound), whereas more anterior areas are more sensitive to changes in pitch chroma (which is a feature related to the relative position of a pitch within a scale), indicating that pitch dimensions may have distinct representations in the human auditory cortex (Warren et al., 2003). Recently emerging evidence suggests that the parietal cortex and posterior regions of the superior temporal sulcus are key brain areas for multisensory integration, where information from auditory, visual, tactile, and multisensory stimuli converge via a patchy distribution of inputs, followed by integration in the intervening cortex (Beauchamp, Argall, Bodurka, Duyn, & Martin, 2004; Beauchamp, Nath, & Pasalar, 2010; Beauchamp, Yasar, Frye, & Ro, 2008). Functional differences have also been reported between the left and right auditory cortices, whereby the left auditory cortical areas have a higher degree of temporal sensitivity, whereas corresponding areas on the right auditory cortex have a greater spectral resolution (Andoh & Zatorre, 2011; Cha, Zatorre, & Schönwiesner, 2016; Perani, 2012; Santoro et al., 2014; Stewart, Overath, Warren, Foxton, & Griffiths, 2008; Tervaniemi et al., 2000; Warrier et al., 2009). 
Notably, research has repeatedly shown a right-hemisphere bias for fine-grained spectral processing and a preferential response in the left hemisphere for temporal features of sounds, which supports the hypothesis that these functional asymmetries at early stages of auditory processing may be related to the intrinsic properties of each cortical hemisphere (Zatorre & Zarate, 2012). However, the pattern of activation between hemispheres can be modulated by stimulus complexity and/or task demands (Brechmann & Scheich, 2005; Hyde, Peretz, & Zatorre, 2008; Schön, Gordon, & Besson, 2005; Stewart et al., 2008) or by music training (Ohnishi et al., 2001; Proverbio, Orlandi, & Pisanu, 2016).
With respect to music perception, the findings outlined thus far reveal a hierarchical organization of auditory processing (Stewart et al., 2008; Wessinger et al., 2001; see also de Heer, Huth, Griffiths, Gallant, & Theunissen, 2017). The primary auditory cortex plays a crucial role in extracting individual pitches and pitch changes within the melody, whereas non-primary auditory areas are involved in determining relationships between pitches to define the melody contour. More abstract processes required to establish syntactic relationships and meaning occur largely in regions outside of the auditory cortex, including the frontal cortex.
Auditory-Frontal Networks
The transformation of the auditory information into a musically meaningful tonal context involves several areas of the frontal cortex. Studies of music syntax, utilizing primarily expectancy violation paradigms, have demonstrated that regions of the inferior frontal gyrus respond to harmonic expectancy violations (Bianco et al., 2016; Janata, Birk, et al., 2002; Koelsch et al., 2002; Koelsch, Fritz, Schulze, Alsop, & Schlaug, 2005; Maess, Koelsch, Gunter, & Friederici, 2001; Seger et al., 2013; Tillmann, Janata, & Bharucha, 2003). Reports have repeatedly indicated that the cortical network comprising the inferior frontolateral cortex (corresponding to BA44), inferior frontal gyrus, the anterior portion of the superior temporal gyrus, and the ventral premotor cortex is involved in the processing of musical structure (for review, see Koelsch, 2006, 2011). This network appears to be specialized in establishing syntactic relationships by evaluating the harmonic relationship between incoming tonal information and a preceding harmonic sequence, thus detecting musical-structural irregularities and organizing fast short-term predictions of upcoming musical events (Koelsch, 2006). Recent imaging research has also suggested that rhythmic and melodic deviations in musical sequences may recruit different cortical areas: pitch deviations engage a neural network comprising auditory cortices, inferior frontal and prefrontal areas, whereas rhythmic deviations of a musical sequence recruit neural networks involving the posterior parts of the auditory cortices and parietal areas (Lappe, Lappe, & Pantev, 2016; Lappe, Steinsträter, & Pantev, 2013).
These findings are in accordance with the dual-pathway model of auditory processing, which hypothesizes that two auditory processing pathways originate from the primary auditory cortex, each contributing to processing different higher-order aspects of auditory stimuli (Belin & Zatorre, 2000; Bizley & Cohen, 2013; Hickok & Poeppel, 2007; Rauschecker & Scott, 2009). The anterior-ventral auditory pathway—which projects from anterior superior temporal gyrus to anterior inferior frontal gyrus and prefrontal areas—is predominantly involved in perceiving auditory objects and processing auditory spectral features. For instance, it has been shown that the inferior frontal gyrus and related areas of the ventrolateral prefrontal cortex are activated during phonological and semantic processing, non-verbal auditory sound detection (Kiehl, Laurens, Duty, Forster, & Liddle, 2001), discrimination and auditory feature detection (Gaab, Gaser, Zaehle, Jancke, & Schlaug, 2003; Zatorre, Bouffard, & Belin, 2004), and auditory working memory (Kaiser, Ripper, Birbaumer, & Lutzenberger, 2003), which reinforces the assumption that these areas play a more fundamental role in auditory processing. On the other hand, the posterior-dorsal stream—which connects posterior superior temporal gyrus with posterior inferior frontal gyrus, posterior parietal cortex, and premotor cortex—has been implicated in extracting spectral motion and temporal components of an auditory stimulus, thus processing how frequencies change over time (review: Plakke & Romanski, 2014; Zatorre & Zarate, 2012). 
Recent evidence indicates that the dorsal pathway of auditory processing also plays an important role in calculating and comparing pitch or temporal manipulations within a context and using this auditory information to select and prepare appropriate motor responses (Belin & Zatorre, 2000; Chen, Rae, & Watkins, 2012; Foster, Halpern, & Zatorre, 2013; Hickok & Poeppel, 2007; Loui, 2015; Saur et al., 2008; Warren, Wise, & Warren, 2005). Frontal cortex activity has also been associated with the cognitive demands or the stimulus properties of a task. Tasks that require maintenance and rehearsal of musical information activate the working memory functional network, comprising the ventrolateral premotor cortex (encroaching on Broca’s area), dorsal premotor cortex, the planum temporale, inferior parietal lobe, the anterior insula, and subcortical structures (Koelsch et al., 2009; Royal et al., 2016; Schulze, Zysset, Mueller, Friederici, & Koelsch, 2011). The medial prefrontal cortex (primarily the medial orbitofrontal region) appears to be particularly engaged in tasks requiring self-referential judgments (Alluri et al., 2013; Zysset, Huber, Ferstl, & von Cramon, 2002), musical semantic memory (Groussard et al., 2010; Platel, Baron, Desgranges, Bernard, & Eustache, 2003), and music-evoked autobiographical memory (Janata, 2009; Von Der Heide, Skipper, Klobusicky, & Olson, 2013). Parietal and ventrolateral prefrontal regions are differentially activated depending on the relative attentional demands of the tasks (Alho, Rinne, Herron, & Woods, 2014; Janata, Tillmann, & Bharucha, 2002; Maidhof & Koelsch, 2011; Satoh, Takeda, Nagata, Hatazawa, & Kuzuhara, 2001). Involuntary musical imagery (that is, the spontaneous experience of having music looping in one’s head) is associated with cortical thickness in regions of the right frontal and temporal cortices as well as the anterior cingulate and left angular gyrus (Farrugia, Jakubowski, Cusack, & Stewart, 2015). On the other hand, voluntary musical imagery (the generation of a mental representation of music or musical attributes in the absence of real sound input) engages secondary auditory cortices, the parietal cortex, inferior frontal regions, the supplementary motor area (SMA) and pre-SMA (Brown & Martinez, 2007; Halpern, Zatorre, Bouffard, & Johnson, 2004; Harris & De Jong, 2014; Peretz et al., 2009; Zatorre, Halpern, Perry, Meyer, & Evans, 1996). Neural activity in motor areas during perception or mental imagery of sounds has been repeatedly reported when musicians listen to a well-rehearsed musical sequence (Bangert et al., 2006; D’Ausilio, Altenmüller, Olivetti Belardinelli, & Lotze, 2006; Harris & De Jong, 2014; Haueisen & Knösche, 2001) or when pianists watch silent video recordings of hands playing a keyboard (Baumann et al., 2007; Bianco et al., 2016; Hasegawa et al., 2004).
Activation of the fronto-parietal motor-related network (comprising Broca’s area, the premotor region, intraparietal sulcus, and inferior parietal region) was also found when non-musicians listened to a piano piece they learned to play (Lahav, Saltzman, & Schlaug, 2007). These studies collectively show that the mere perception or mental imagery of sounds (which would normally be associated with a specific action) can automatically trigger representations of the movement necessary to produce these sounds, providing strong evidence that perception and action are intrinsically coupled in the human brain and in cognition (for review, see Keller, 2012; Maes, Leman, Palmer, & Wanderley, 2014; Novembre & Keller, 2014).
Auditory-Motor Networks
Projections from motor cortex to the auditory cortex are an architectural feature common to many animal species (Schneider, Nelson, & Mooney, 2014). Animal research has indeed proven to be an important model to investigate the synaptic and circuit mechanisms by which the motor cortex interacts with auditory cortical activity (e.g., Merchant, Perez, Zarco, & Gamez, 2013; Nelson et al., 2013; Roberts et al., 2017; Schneider & Mooney, 2015). For instance, a recent study in mice found that axons from the secondary motor cortex make synapses onto both excitatory and inhibitory neurons in deep and superficial layers of the auditory cortex and that a subset of these neurons extends axons to various subcortical areas important for auditory processing (Nelson et al., 2013). The analysis of local field potentials of behaving macaques has also provided valuable insight regarding the neural underpinnings for beat synchronization, showing, for example, that beta-band oscillations may enable communication between distributed circuits involving the striato-thalamocortical network during rhythm perception and production (for a review, see Merchant & Bartolo, 2018; Merchant, Grahn, Trainor, Rohrmeier, & Fitch, 2015; see also Chapter 8). Recent research has also identified fiber projections transmitting auditory signals into motor regions in the human brain (Fernández-Miranda et al., 2015). Fernández-Miranda and colleagues demonstrated that the left and right arcuate fascicle, a white matter fiber tract that links lateral temporal cortex with frontal areas, is segmented into subtracts with distinct fiber terminations (Fig. 2). One set of fibers terminates at the ventral precentral and caudal middle frontal gyri (BA4, BA6), providing direct projections from auditory cortex to motor areas (primary motor cortex, premotor cortex).
FIGURE 2. Subtracts of the left arcuate fascicle with terminations on primary motor cortex and premotor cortex, corresponding to Brodmann areas BA6 and BA4 (ventral precentral and caudal middle frontal gyri). Reprinted by permission from Brain Structure and Function 220 (3), Asymmetry, connectivity, and segmentation of the arcuate fascicle in the human brain, Juan C. Fernández-Miranda, Yibao Wang, Sudhir Pathak, Lucia Stefaneau, Timothy Verstynen, and Fang-Cheng Yeh, pp. 1665–1680, https://doi.org/10.1007/s00429-014-0751-7 © Springer-Verlag Berlin Heidelberg, 2014.
Further evidence of functional coordination between auditory and motor cortices has been provided by a robust body of neuroimaging research. Studies have shown that listening to and encoding auditory rhythms internally increases auditory-motor brain connectivity (Chen, Penhune, & Zatorre, 2008a; Chen, Zatorre, & Penhune, 2006; Fujioka, Trainor, Large, & Ross, 2012; Grahn & Brett, 2007), and that the coupling among cortical motor and auditory areas is strengthened with musical training (Chen, Penhune, & Zatorre, 2008b; Grahn & Rowe, 2009; Palomar-García, Zatorre, Ventura-Campos, Bueichekú, & Ávila, 2017). Studies have also found that corticospinal excitability is modulated by music with a strong beat (“groove”), which suggests that merely listening to musical rhythm elicits activity in motor-output pathways from the primary motor cortex to the spinal cord (Giovannelli et al., 2013; Michaelis, Wiener, & Thompson, 2014; Stupacher, Hove, Novembre, Schütz-Bosbach, & Keller, 2013). Further evidence of auditory-motor coupling at the spinal cord level is provided by research showing that delivering transcranial magnetic stimulation in time with the music facilitates corticospinal excitability in muscles involved in foot tapping (i.e., tibialis anterior, gastrocnemius) (Wilson & Davey, 2002; see also Thaut, McIntosh, Prassas, & Rice, 1992), and that the degree of corticospinal excitability depends on musical training, being greater in trained musicians (D’Ausilio et al., 2006; Stupacher et al., 2013). Finally, extensive neurophysiological evidence indicates that auditory and motor regions communicate through oscillatory activity and that the cortical loop between these areas generates temporal predictions that are crucial in auditory perceptual learning and for the perception of, and entrainment to, musical rhythms (Fujioka et al., 2012; Large, Herrera, & Velasco, 2015; Large & Snyder, 2009; Ross, Barat, & Fujioka, 2017; for review: Merchant et al., 2015; Morillon & Baillet, 2017; Ross, Iversen, & Balasubramaniam, 2016). Therefore, evidence at multiple levels of inquiry suggests that there is a strong functional and anatomical link between auditory and motor-related areas, and that many components of the motor system are deeply involved in auditory perceptual learning, in the generation of predictions, as well as in the perception of, and entrainment to, musical rhythms. Interconnectivity between auditory and motor-related areas is crucial for time perception and for the production of timed movements.
Temporal processing and sensorimotor synchronization involve complex functional networks comprising several distant cortical and subcortical brain areas, including the cerebellum, the basal ganglia (predominantly the putamen), thalamus, the SMA and pre-SMA, premotor cortex (PMC), and the auditory cortex (for review: Chauvigné, Gitau, & Brown, 2014; Iversen & Balasubramaniam, 2016; Leow & Grahn, 2014; Merchant et al., 2015; Teki, Grube, & Griffiths, 2012). Although the specific role of each area is still emerging, recent studies have reached consensus that there are at least two distinct networks involved in timing: one is centered on the role of the cerebellum in the processing of sensory prediction errors, motor adaptation, and duration-based timing, and the second is based on the role of the basal ganglia and the SMA in beat-based timing and internally driven rhythmic movements.
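The distinction between duration-based and beat-based timing can be made concrete with a small sketch. It is an illustration only, not a model taken from the studies cited above: the function names and the example rhythm are hypothetical. It assumes that a duration-based code stores each inter-onset interval in absolute seconds, whereas a beat-based code expresses each interval as a fraction of an underlying beat period.

```python
from typing import List


def absolute_intervals(onsets_s: List[float]) -> List[float]:
    """Duration-based code: each inter-onset interval in seconds."""
    return [later - earlier for earlier, later in zip(onsets_s, onsets_s[1:])]


def beat_relative_intervals(onsets_s: List[float], beat_period_s: float) -> List[float]:
    """Beat-based code: each interval as a multiple of the beat period."""
    return [interval / beat_period_s for interval in absolute_intervals(onsets_s)]


# Hypothetical rhythm with a 0.5 s beat, and the same rhythm at half tempo.
rhythm = [0.0, 0.5, 1.0, 1.25, 1.5, 2.0]
rhythm_half_tempo = [2 * t for t in rhythm]

fast_code = beat_relative_intervals(rhythm, beat_period_s=0.5)
slow_code = beat_relative_intervals(rhythm_half_tempo, beat_period_s=1.0)

print(absolute_intervals(rhythm))             # changes with tempo
print(absolute_intervals(rhythm_half_tempo))
print(fast_code == slow_code)                 # relative code is tempo-invariant
```

Under these assumptions the absolute intervals differ between the two tempi while the beat-relative code is identical, which is one way to read the claim that the two timing mechanisms carry different information.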
Cortico-Cerebellar Network
The cerebellum receives segregated projections from prefrontal, frontal, parietal, and superior temporal regions via the pontine nuclei in the brainstem (Fig. 3). Output projections are then sent from the cerebellar cortex to specialized deep cerebellar nuclei, which in turn project back, via the thalamus, to the region of the cerebral cortex from which the initial projection originated (Koziol, Budding, & Chidekel, 2011; Schmahmann & Pandya, 1997). These parallel cortico-cerebellar loops place the cerebellum in a unique position to use all the information it receives from the neocortex to build, through a learning process, an internal “model” that contains all of the dynamic processes that are required to perform a specific movement or behavior. This feedforward information (an efference copy of the motor command) is used to generate a representation of the expected sensory consequences of that command, and to compute error signals that can produce online changes to adjust its execution and/or to improve future predictions (for review, see Sokolov, Miall, & Ivry, 2017; Wolpert, Miall, & Kawato, 1998). Indeed, research has demonstrated that the cerebellum is key in establishing sensory prediction errors by processing signal discrepancies between the expected sensory consequences of a stimulus/movement and the actual sensory input (Baumann et al., 2015; Koziol et al., 2014; Manto et al., 2012; Tseng, Diedrichsen, Krakauer, Shadmehr, & Bastian, 2007). These error signals are essential for sensorimotor control, motor adaptation, and learning because they allow rapid adjustments in the motor output and refinement of future sensory predictions in order to reduce the variability of subsequent actions (Doyon, Penhune, & Ungerleider, 2003; Petter, Lusk, Hesslow, & Meck, 2016; Shadmehr, Smith, & Krakauer, 2010; Sokolov et al., 2017).
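The logic of a forward model that learns from sensory prediction errors can be sketched in a few lines. This is a deliberately minimal illustration, not the cerebellar algorithm itself: it assumes a single scalar gain that maps a motor command onto its expected sensory consequence, and a simple delta-rule update. The function name, the learning rate, and the gain values are all hypothetical.

```python
from typing import List, Tuple


def adapt_forward_model(
    true_gain: float,
    commands: List[float],
    learning_rate: float = 0.2,
    initial_gain: float = 1.0,
) -> Tuple[float, List[float]]:
    """Update an internal gain estimate from sensory prediction errors.

    On each trial the model predicts the sensory consequence of a motor
    command, compares it with the actual feedback, and uses the
    discrepancy to refine its next prediction (delta rule).
    """
    gain = initial_gain
    errors = []
    for command in commands:
        predicted = gain * command        # expected sensory consequence
        actual = true_gain * command      # feedback from the (perturbed) world
        error = actual - predicted        # sensory prediction error
        errors.append(error)
        gain += learning_rate * error * command
    return gain, errors


# Hypothetical perturbation: feedback is suddenly 1.5 times larger than expected.
final_gain, trial_errors = adapt_forward_model(true_gain=1.5, commands=[1.0] * 50)
print(round(final_gain, 3))
print(round(trial_errors[0], 3), round(trial_errors[-1], 6))
```

In this sketch the prediction error is largest on the first perturbed trial and shrinks as the internal estimate approaches the true gain, mirroring the idea that error signals both correct ongoing output and refine future predictions.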
FIGURE 3. Diagram of cortico-cerebellar and basal ganglia-thalamo-cortical networks and the intricate connectivity between these circuits. The basal ganglia-thalamo-cortical timing network normally involves the SMA, PFC, Striatum, PPC, GPe, Th, STN, VTA, and SN. The cerebellar network involves the Cb Cortex, PN, DN, and IO. Note that the cerebellum is also connected to multiple cortical and subcortical regions, and that reciprocal connections between the basal ganglia and the cerebellum are not illustrated. Abbreviations: PFC, prefrontal cortex; SMA, supplementary motor area; PPC, posterior parietal cortex; Th, thalamus; GPe, globus pallidus; STN, subthalamic nucleus; SN, substantia nigra; VTA, ventral tegmental area; PN, pontine nuclei; DN, dentate nucleus; Cb, cerebellar cortex; IO, inferior olive. Reproduced with permission from Petter et al. (2016).
A growing body of research evidence indicates that cortico-cerebellar networks are predominantly engaged in movement synchronization to externally cued stimuli, but less involved in self-paced or internally guided motor behaviors (Brown, Martinez, & Parsons, 2006; Buhusi & Meck, 2005; Chauvigné et al., 2014; Del Olmo, Cheeran, Koch, & Rothwell, 2007; Grahn & Rowe, 2013; Manto et al., 2012; Thaut et al., 2009; Witt, Laird, & Meyerand, 2008). These findings concur with the cerebellum’s role in integration of sensory and motor information, basic sensory prediction related to motor timing, and temporal adaptation during sensorimotor synchronization (Diedrichsen, Criscimagna-Hemminger, & Shadmehr, 2007; Gao et al., 1996; Manto et al., 2012; Mayville, Jantzen, Fuchs, Steinberg, & Kelso, 2002; Rao et al., 1997; Schwartze, Keller, & Kotz, 2016; Shadmehr et al., 2010; Thaut, Demartin, & Sanes, 2008; Tseng et al., 2007). The premotor cortex is also known to play a role in movements guided by external sensory stimuli and is thought to be particularly involved in aspects of prediction related to motor timing and temporal adaptation, and in integrating higher-order features of sound with the appropriately timed and organized motor response (Chapin et al., 2010; Chen et al., 2008b; Jahanshahi et al., 1995; Jäncke, Loose, Lutz, Specht, & Shah, 2000; Kornysheva & Schubotz, 2011; Pecenka, Engel, & Keller, 2013; Schubotz, 2007). Studies have indeed identified fronto-olivocerebellar pathways that connect the dorsal portions of the dentate nucleus in the cerebellum to motor areas such as the primary motor cortex and the premotor cortex (Dum, 2002; Middleton & Strick, 2001; Schmahmann & Pandya, 1997). The olivocerebellar network is thought to be an important neural loop in the cerebellar adaptation of sensorimotor forward models due to its capacity to directly modulate the output signals sent from the cerebellum back to sensorimotor cortical areas (Koziol et al., 2011; Sokolov et al., 2017). The inferior olive is a brainstem nucleus that receives significant projections from the sensorimotor cortex and is one of the main sources of input to the cerebellar cortex. Excitatory axons originating in the inferior olive, known as climbing fibers, project to Purkinje cells in the cerebellar sensorimotor cortex and to the deep cerebellar nuclei.
This microcircuit is completed with Purkinje cells in the cerebellar cortex sending inhibitory projections to the deep cerebellar nuclei (including the dentate nucleus), which in turn send projections back to the inferior olive and to the cerebral cortex via the thalamus (Fig. 3). Some models suggest that this cortico-cerebellar network is involved in detecting sequences of cortical input activity and generating precisely timed output activity in response, hence contributing to the optimization and coordination of neocortical network activity involved in cognitive and motor processes (Durstewitz, 2003; Fatemi et al., 2012, p. 792; Mauk & Buonomano, 2004; Medina & Mauk, 2000; Molinari et al., 2005; Molinari, Leggio, & Thaut, 2007; Thaut et al., 2009). Alternatively, other theories hypothesize that the olivocerebellar circuit has the electrophysiological characteristics of a neural clock capable of generating accurate absolute timing signals, suggesting that the cerebellum is specialized for providing an explicit temporal representation (Allman, Teki, Griffiths, & Meck, 2014; Ashe & Bushara, 2014; Ivry, Spencer, Zelaznik, & Diedrichsen, 2002; Spencer, Ivry, & Zelaznik, 2005; Teki et al., 2012). Converging evidence indicates that the cerebellum is also implicated in measuring and storing the absolute duration of sub-second time intervals of discrete perceptual events (for review: Allman et al., 2014; Petter et al., 2016; Teki et al., 2012). Several studies have demonstrated that the cerebellum is crucial for perceptual tasks requiring temporal discrimination, processing of target duration, detecting the timing onset of discrete perceptual events, detecting violations of temporal expectancies, and processing complex temporal events such as polyrhythmic stimuli and non-metric rhythms (Grahn & Rowe, 2009; Grube, Cooper, Chinnery, & Griffiths, 2010; Kotz, Stockert, & Schwartze, 2014; O’Reilly, Mesulam, & Nobre, 2008; Paquette, Fujii, Li, & Schlaug, 2017; Schwartze, Rothermich, Schmidt-Kassow, & Kotz, 2011; Teki, Grube, Kumar, & Griffiths, 2011; Tesche & Karhu, 2000; Thaut et al., 2008). Recent functional imaging and transcranial stimulation research demonstrated that cerebellar lobules VI and VIIA in the vermis are especially active in perceptual tasks involving duration-based timing (Grube, Cooper, et al., 2010; Grube, Lee, Griffiths, Barker, & Woodruff, 2010; Keren-Happuch, Chen, Ho, & Desmond, 2014; Lee et al., 2007; O’Reilly et al., 2008). The notion that distinct cerebellar regions are activated depending on the context and the different aspects of timing is supported by neuroimaging studies demonstrating that the cerebellum is topographically organized so that different regions of the cerebellum manage information from different domains (Kelly & Strick, 2003; Keren-Happuch et al., 2014; Koziol et al., 2011; Stoodley & Schmahmann, 2009, 2010).
Although the cerebellum has long been known for its importance in motor behavior and timing, current research has firmly established the cerebellum’s critical role in modulating cognitive functions including attention, emotion, executive function, language, working memory, and music perception (for review, see Baumann et al., 2015; Buckner, 2013; Koziol et al., 2014; Sokolov et al., 2017). Recent studies indeed suggest that the cerebellum plays a role in processing pitch and timbre (Alluri et al., 2012; Parsons, 2012; Parsons, Petacchi, Schmahmann, & Bower, 2009; Pfordresher, Mantell, Brown, Zivadinov, & Cox, 2014; Thaut, Trimarchi, & Parsons, 2014; Toiviainen, Alluri, Brattico, Wallentin, & Vuust, 2014). For instance, Thaut and colleagues (2014) described common and distinct neural substrates underlying processing of the different components of rhythmic structure (i.e., pattern, meter, tempo), and also found that melody processing induced activity in different regions when compared to rhythm (e.g., right anterior insula and various cerebellar areas). Another study showed that alterations of auditory feedback during piano performance, particularly pitch disruptions, increased activity in the cerebellum (Pfordresher et al., 2014), which is consistent with the understanding that the cerebellum is involved in monitoring sensory prediction errors, including pitch information. The cerebellum has also been implicated in the processing of affective sounds (Alluri et al., 2015; Pallesen et al., 2005; for review: Frühholz, Trost, & Kotz, 2016), and in working memory tasks such as recognizing musical motifs (e.g., Burunat, Alluri, Toiviainen, Numminen, & Brattico, 2014; see also Ito, 2008; Marvel & Desmond, 2010), supporting the idea that the cerebellum is a multipurpose neural mechanism capable of influencing a wide range of functional processes.
Basal Ganglia-Thalamo-Cortical Network
Mounting evidence suggests that a distributed network comprising the basal ganglia (particularly the putamen), thalamus, and cortical areas such as the SMA and pre-SMA, premotor cortex and auditory cortex, is engaged in beat perception (for review: Leow & Grahn, 2014; Merchant et al., 2015; Petter et al., 2016; Teki et al., 2012). The basal ganglia are thought to play a key role in predicting upcoming events based on a relative timing mechanism, that is, where temporal intervals are coded relative to a periodic beat interval (Grahn & Brett, 2007; Grahn, Henry, & McAuley, 2011; Grahn & Rowe, 2013; Grube, Cooper, et al., 2010; Grube, Lee, et al., 2010; Kotz, Brown, & Schwartze, 2016; Nozaradan, Schwartze, Obermeier, & Kotz, 2017; Teki et al., 2011). These findings are consistent with studies showing the involvement of the basal ganglia in reward prediction, associative learning, and harmonic processing (e.g., Salimpoor, Benovoy, Larcher, Dagher, & Zatorre, 2011; Salimpoor, Zald, Zatorre, Dagher, & McIntosh, 2015; Seger et al., 2013). Functional connectivity between basal ganglia (putamen), cortical motor areas (premotor cortex and SMA), and auditory cortex increases significantly when listening to rhythms with a clear beat, suggesting that the basal ganglia and the SMA are important for the representation of pulse and rhythm even in the absence of movement (Chen et al., 2008a; Grahn & Brett, 2007; Grahn & Rowe, 2009; Stupacher et al., 2013). Neural pathways connecting the basal ganglia and the SMA have been identified in studies using in vivo imaging tractography (Akkal, Dum, & Strick, 2007; Lehéricy et al., 2004), showing that corticostriatal connections are part of a distributed network that supports different aspects of timing (Fig. 3). There are strong indications that the basal ganglia (putamen) and SMA are predominantly involved in maintaining the internal representation of the beat intervals in sensorimotor tasks (beat continuation). This notion is supported by studies showing that there is greater activation of the putamen and SMA during the continuation phase of synchronization-continuation tasks, that is, when the external reference cues are no longer available (Cunnington, Bradshaw, & Iansek, 1996; Grahn & Rowe, 2013; Halsband, Ito, Tanji, & Freund, 1993; Rao et al., 1997). These findings concur with research describing the role of the SMA in timed movements performed in the absence of any pacing stimulus (i.e., self-paced or internally guided motor behaviors) (Coull, Vidal, & Burle, 2016; Harrington & Jahanshahi, 2016; Lima, Krishnan, & Scott, 2016; Nachev, Kennard, & Husain, 2008; Witt et al., 2008). Activity in the SMA and basal ganglia during internally generated movements has also been investigated in non-human primates (for review: Merchant & Bartolo, 2018; Merchant et al., 2015). The analysis of local field potentials of behaving macaques has demonstrated, for instance, that greater beta-band (15–30 Hz) activity in the putamen was observed during the continuation phase of synchronization-continuation tasks, suggesting that beta-band oscillations may enable communication between a distributed set of circuits including the motor cortico-basal ganglia-thalamo-cortical circuit (Bartolo, Prado, & Merchant, 2014).
Interestingly, the study also found gamma activity (30–50 Hz) in some local fields in the putamen during the synchronization phase of the task, suggesting that the putamen may also be involved in local computations associated with sensorimotor processing during beat synchronization. The physiological mechanism underlying the processing of temporal information involving the basal ganglia-thalamo-cortical circuit is likely mediated by dopamine receptors located on corticostriatal neurons in the nigrostriatal pathway (for review, see Agostino & Cheng, 2016; Allman et al., 2014; Buhusi & Meck, 2005; Petter et al., 2016). Evidence suggests that striatal medium spiny neurons in the dorsal striatum (comprising putamen and caudate nucleus) are crucial to duration discrimination in the seconds-to-minutes range due to their role in large-scale oscillatory networks connecting mesolimbic, nigrostriatal, and mesocortical dopaminergic systems (Buhusi & Meck, 2005; Merchant, Harrington, & Meck, 2013). The striatal beat-frequency model suggests that the neural mechanisms of interval timing are based on the entrainment of the oscillatory activity of striatal neurons and cortical neural oscillators (Matell & Meck, 2004). The role of dopamine in interval timing accuracy and precision is supported by studies showing that patients with disorders that involve dopaminergic pathways, such as Parkinson’s disease, Huntington’s disease, and schizophrenia, have difficulties in timing-related tasks, and that dopaminergic medication can ameliorate these issues (Harrington et al., 2011; Jahanshahi et al., 2010; see review in Allman & Meck, 2012; Coull, Cheng, & Meck, 2011). A recent study also showed that dopamine depletion in healthy individuals attenuated the activity in the putamen and SMA and directly interfered with the processing of temporal information (Coull, Hwang, Leyton, & Dagher, 2012). Pharmacological studies have also made significant advances in understanding how dopamine affects the activity of corticostriatal circuits and what roles the different dopaminergic receptors play in timing behavior (for review, see Agostino & Cheng, 2016; Narayanan, Land, Solder, Deisseroth, & DiLeone, 2012). Taken together, it is clear that cortico-cerebellar and basal ganglia-thalamo-cortical networks have complementary roles in temporal perception and motor timing, and the challenge for future studies is to further understand how these networks interact in both motor and nonmotor functions.
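The general idea behind beat-frequency accounts of interval timing can be illustrated with a toy simulation. It is loosely inspired by the striatal beat-frequency model mentioned above and should not be read as an implementation of Matell and Meck (2004). It assumes a bank of oscillators with hypothetical frequencies between 5 and 15 Hz that all start in phase at interval onset, a stored "snapshot" of their phases at the trained duration, and a detector that responds most strongly when the current phases match that snapshot. All names and parameter values are illustrative assumptions.

```python
import math
from typing import List, Tuple

Phase = Tuple[float, float]  # (cosine, sine) of an oscillator's phase


def oscillator_state(freqs_hz: List[float], t: float) -> List[Phase]:
    """Phase of every oscillator at time t, assuming all start in phase at t = 0."""
    return [(math.cos(2 * math.pi * f * t), math.sin(2 * math.pi * f * t))
            for f in freqs_hz]


def similarity(state_a: List[Phase], state_b: List[Phase]) -> float:
    """Mean phase agreement between two states (close to 1.0 when they coincide)."""
    return sum(ca * cb + sa * sb
               for (ca, sa), (cb, sb) in zip(state_a, state_b)) / len(state_a)


def estimate_interval(freqs_hz: List[float], trained_duration_s: float,
                      window_s: float = 3.0, step_ms: int = 10) -> float:
    """Return the time at which the oscillator pattern best matches the stored one."""
    stored = oscillator_state(freqs_hz, trained_duration_s)
    n_steps = int(window_s * 1000 / step_ms)
    times = [i * step_ms / 1000 for i in range(n_steps + 1)]
    return max(times, key=lambda t: similarity(oscillator_state(freqs_hz, t), stored))


# Hypothetical oscillator bank: 21 frequencies from 5 to 15 Hz in 0.5 Hz steps.
freqs = [5.0 + 0.5 * k for k in range(21)]
detected = estimate_interval(freqs, trained_duration_s=1.5)
print(detected)
```

Because each oscillator runs at a different frequency, the full pattern of phases recurs only rarely, so the moment of best match within the search window identifies the trained duration. The search window is kept short here because this particular set of frequencies repeats its pattern every 2 s.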
Recently emerging evidence from neuroanatomical studies using virus transneuronal tracers demonstrates that the cerebellum and the basal ganglia are reciprocally connected and that these subcortical structures are indeed part of an integrated network (Bostan, Dum, & Strick, 2013; Caligiore et al., 2017; Chen, Fremont, Arteaga-Bracho, & Khodakhah, 2014; Kotz et al., 2016; Pelzer, Melzer, Timmermann, von Cramon, & Tittgemeyer, 2017). Models discussing the possible ways in which the cortico-cerebellar and striato-thalamo-cortical networks may integrate to support time perception and sensorimotor synchronization have been recently proposed, instigating further investigations (Lusk, Petter, Macdonald, & Meck, 2016; Petter et al., 2016; Teki et al., 2012).
Auditory-Limbic Networks
The limbic and the auditory systems are highly interconnected and form an important part of the core neural network involved in affective sound processing (Frühholz et al., 2016). Direct and indirect pathways between the auditory system and limbic areas have been described in the literature (Fig. 4B) (Frühholz, Trost, & Grandjean, 2014; Janak & Tye, 2015). For instance, the amygdala (specifically, the lateral part of the basolateral complex) receives direct projections from the superior temporal cortex (LeDoux, 2007), and animal research suggests that there may also be a direct connection with the primary auditory cortex (Reser, Burman, Richardson, Spitzer, & Rosa, 2009). The amygdala is also interconnected with subcortical nodes of the ascending auditory pathway, receiving direct projections from the medial geniculate body and sending projections to the inferior colliculus, supporting the notion that less complex sounds (i.e., short high-intensity sounds or aversive sounds) may be transmitted to the amygdala through a fast subcortical circuit (Fig. 4B) (Frühholz et al., 2016; Pannese, Grandjean, & Frühholz, 2016). Recent theories suggest that this direct link between the auditory thalamus and the amygdala plays an important role in fast responses to sound whereas a “slow” network projecting from thalamus to primary auditory cortex to association cortex to amygdala may govern interpretive labeling/understanding responses during music processing and music-evoked emotions (Huron, 2006; Juslin & Västfjäll, 2008).
FIGURE 4. (A) The neural auditory ascending pathway. (B) Amygdala and hippocampal connections to the auditory system. The amygdala receives direct input from the MGB of the thalamus (1) and from higher-level auditory cortex in STC (line 2), which both project to the lateral nucleus of the basolateral (l) complex of the amygdala. Tracing studies in animals also report connections between AC and the amygdala (dashed line 2). The basal nucleus (b) of the basolateral complex has efferent connection to the IC (line 3). The accessory nucleus (ac), the medial (m), and the central nucleus (c) are not directly connected to the auditory system. The hippocampus (hc) shows direct (line 2) and indirect (lines 1) connections to the auditory cortex. A direct connection exists from the CA1 region to the higher-level auditory cortex (line 2). Indirect connections mainly provide input to the hippocampal formation by connections from the STC to the parahippocampal gyrus (phg), to the perirhinal cortex (prc) and the entorhinal cortex (erc), all line 1, which figure as input relays to the hippocampus. Abbreviations: MGB, medial geniculate body; STC, superior temporal cortex; IC, inferior colliculus; CN, cochlear nucleus; SOC, superior olivary complex; AC, auditory cortex; SUB, subiculum; DG, dentate gyrus. Reprinted with permission from Frühholz et al. (2014).
Mounting evidence from functional neuroimaging research shows that music can modulate activity in several brain areas of the limbic system, such as the amygdala, the hippocampal formation, right ventral striatum (including the nucleus accumbens) extending into the ventral pallidum, caudate nucleus, insula, the cingulate cortex, and the orbitofrontal cortex (for review: Koelsch, 2014; Zatorre, 2015). Studies have demonstrated that music that is perceived as joyful elicits strong response in the superficial nuclei group of the amygdala, an area that seems to be particularly involved in extracting the social significance of signals that convey basic socioaffective information (Koelsch et al., 2013; Koelsch & Skouras, 2014; Lehne, Rohrmeier, & Koelsch, 2013). Activity changes in response to joyful, unpleasant, or sad music, were also found in the (right) laterobasal amygdala, an area that has been implicated in acquisition, encoding, and retrieval of both positive and negative associations, and processing cues that predict either positive or negative reinforcement (Brattico et al., 2011; Koelsch et al., 2013; Koelsch, Fritz, v. Cramon, Müller, & Friederici, 2006; Mitterschiffthaler, Fu, Dalton, Andrew, & Williams, 2007; Pallesen et al., 2005). The laterobasal amygdala is involved in the regulation of neural input into the hippocampal formation, another area that responds to musicevoked emotions such as tenderness, peacefulness, nostalgia, or wonder (Burunat et al., 2014; Choppin et al., 2016; Koelsch et al., 2013; Mitterschiffthaler et al., 2007; Trost, Ethofer, Zentner, & Vuilleumier, 2012; for review: Koelsch, 2014). The hippocampus, in turn, receives projections
from the auditory system; these, however, are mediated by the parahippocampal gyrus, the perirhinal cortex, and the entorhinal cortex (Fig. 4B) (for review, see Frühholz et al., 2014; Koelsch, 2014). Activity changes in the ventral striatum (including the nucleus accumbens) have also been found in response to pleasant music (Blood & Zatorre, 2001; Koelsch et al., 2006; Menon & Levitin, 2005; Mueller et al., 2015; Salimpoor et al., 2013; Zatorre & Salimpoor, 2013). In particular, the nucleus accumbens has been shown to respond to intense feelings of music-evoked pleasure and reward (Blood & Zatorre, 2001; Salimpoor et al., 2011, 2013), suggesting that functional connectivity between the auditory cortex and ventral striatum (including the nucleus accumbens) is crucial for experiencing pleasure in music (Martínez-Molina, Mas-Herrero, Rodríguez-Fornells, Zatorre, & Marco-Pallarés, 2016; Sachs, Ellis, Schlaug, & Loui, 2016; Salimpoor et al., 2013). Music-evoked pleasure can lead to dopamine release in distinct anatomical areas: an increase in dopamine availability in the dorsal striatum is associated with the anticipation of reward, whereas an increase in dopamine in the ventral striatum occurs during the rewarding experience itself (Blood & Zatorre, 2001; Menon & Levitin, 2005; Salimpoor et al., 2011, 2015; Zatorre & Salimpoor, 2013). Aesthetic pleasure results from the integration between subcortical dopaminergic regions and higher-order cortical areas of the brain (for review, see Salimpoor et al., 2015). It has been shown, for instance, that functional connectivity between the nucleus accumbens and the auditory cortex, as well as the fronto-striatal circuit (involving ventral and dorsal subdivisions of the striatum and frontal areas such as the inferior frontal gyri, prefrontal cortex, and orbitofrontal cortex), predicts whether individuals will decide to purchase a song (Salimpoor et al., 2013).
Recently emerging data from transcranial magnetic stimulation research further support a direct role of the fronto-striatal circuit in both the affective responses and the motivational aspects of music-induced reward (Mas-Herrero, Dagher, & Zatorre, 2017). The ventromedial prefrontal cortex and adjacent orbitofrontal cortex are involved in high-level emotional processing, such as reward detection and valuation, and are the main cortical inputs to the nucleus accumbens, again reinforcing the notion that fronto-striatal circuits are highly involved in integrating and evaluating reward-related stimuli and in making decisions about them (for review: Haber & Knutson, 2010; Salimpoor et al., 2015; see also Chapter 14).
Recent findings suggest that the auditory cortex also plays a crucial role in the emotional processing of sounds, beyond mere acoustical analysis (Frühholz et al., 2016; Koelsch, Skouras, & Lohmann, 2018). Koelsch et al. (2018) found that fear stimuli (compared with joy stimuli) evoked higher network centrality in both anterior and posterior auditory association cortex, suggesting that the auditory cortex may play a central role in the affective processing of auditory information. Moreover, findings also indicated that the auditory cortex is functionally connected with a widespread network involved in emotion processing, which includes limbic/paralimbic structures (cingulate, insular, parahippocampal, and orbitofrontal cortex, as well as the ventral striatum), and also extra-auditory neocortical areas (visual, somatosensory, and motor-related areas, and attentional structures). These results expand the traditional view that sensory cortices have mere perceptual functions and highlight the importance of investigating the functional connectivity between brain regions.
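The notion of network centrality invoked above can be illustrated with a small sketch. It assumes that functional connectivity is estimated as the absolute Pearson correlation between regional time series and that centrality is scored with eigenvector centrality computed by power iteration; the cited studies may differ in their exact pipeline. The five simulated "regions", the function names, and all parameter values are hypothetical and serve illustration only.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long time series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def connectivity_matrix(series):
    """Absolute pairwise correlations, with a zero diagonal (no self-links)."""
    n = len(series)
    return [
        [0.0 if i == j else abs(pearson(series[i], series[j])) for j in range(n)]
        for i in range(n)
    ]


def eigenvector_centrality(weights, iterations=200):
    """Power iteration: a node scores high if its neighbours score high."""
    n = len(weights)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = [sum(weights[i][j] * scores[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in updated))
        scores = [v / norm for v in updated]
    return scores


def simulate_regions(n_samples=2000, seed=0):
    """Hypothetical data: region 0 shares its signal with regions 1-3;
    region 4 is independent noise."""
    rng = random.Random(seed)
    hub = [rng.gauss(0, 1) for _ in range(n_samples)]
    followers = [[h + rng.gauss(0, 1) for h in hub] for _ in range(3)]
    isolated = [rng.gauss(0, 1) for _ in range(n_samples)]
    return [hub] + followers + [isolated]


centrality = eigenvector_centrality(connectivity_matrix(simulate_regions()))
for region, score in enumerate(centrality):
    print(f"region {region}: centrality {score:.3f}")
```

In this toy network the shared-signal region receives the highest score and the independent region the lowest, which mirrors the sense in which a cortical area can be said to host influential network nodes. Real analyses operate on thousands of voxels and require preprocessing and statistical inference that are omitted here.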
Brain Network Interactions
Recent advances in neuroimaging analysis methods have allowed researchers to address questions of functional connectivity, interregion coupling, and networked computations that go beyond the “where” and “when” of task-related activity, providing new insights about how different brain networks interact to support cognitive, perceptual, and motor functions (Friston, 2011). Among the topics recently explored in music neuroscience is how brain networks organize themselves in a naturalistic music listening situation, wherein data acquisition takes place while participants listen to entire songs in an uninterrupted fashion, thus emulating real-life listening experiences (Alluri et al., 2012, 2013, 2015; Burunat et al., 2014; Koelsch & Skouras, 2014; Koelsch et al., 2018; Lehne et al., 2013; Sachs et al., 2016; Toiviainen et al., 2014). Studies using novel data-driven methods to investigate neural correlates of musical feature processing using fMRI data have found, for instance, that timbral feature processing during naturalistic listening conditions engages sensory and default mode network cerebrocortical areas as well as cognitive areas of the cerebellum, whereas musical pulse and tonality processing recruit cortical
and subcortical cognitive, motor, and emotion-related circuits (Alluri et al., 2012; Toiviainen et al., 2014). The orbitofrontal cortex and the anterior cingulate cortex, which are associated with aesthetic judgments and self-referential appraisal, are also recruited while listening to full musical pieces (Alluri et al., 2013; Reybrouck & Brattico, 2015; Sachs et al., 2016). Moreover, music containing lyrics seems to particularly increase activity in the left auditory cortex, corroborating the hypothesis of hemispheric lateralization (Alluri et al., 2013; Brattico et al., 2011). Collectively, these findings support the notion that music processing requires timely coordination of large-scale cognitive, motor, and limbic brain circuitry. Research has also demonstrated that music preference and music expertise can modulate functional brain connectivity during passive music listening. A recent study found that the default mode network (a network of interacting brain regions that is important for internally focused thoughts) was more functionally connected when people listened to unfamiliar music they liked compared with music they disliked, and that listening to one’s favorite music increased connectivity between auditory brain areas and the hippocampus (Wilkins, Hodges, Laurienti, Steen, & Burdette, 2014). These findings were recently expanded by a study showing that musicians and non-musicians use different neural networks during music listening (Alluri et al., 2017). Whole-brain network analysis revealed that, while the dominant hubs during passive music listening in non-musicians encompassed regions related to the default mode network, in musicians the primary neural hubs engaged during music listening comprised cerebral and cerebellar sensorimotor regions.
The study also showed that musicians have enhanced connectivity in the motor and sensory homunculus representing the upper limbs and torso during the listening task, suggesting that experts tend to process music using an action-based approach whereas non-musicians use a perception-based approach (Alluri et al., 2017; see also Moore, Schaefer, Bastin, Roberts, & Overy, 2014). Evidence for the reconfiguration of human brain functional networks during music listening has also been provided by electroencephalography (EEG) studies (Adamos, Laskaris, & Micheloyannis, 2018; Klein, Liem, Hänggi, Elmer, & Jäncke, 2016; Rogenmoser, Zollinger, Elmer, & Jäncke, 2016; Sänger, Müller, & Lindenberger, 2012; Wu et al., 2012; Wu, Zhang, Ding, Li, & Zhou, 2013). Overall, findings concur that music processing
induces changes in the functional organization of neural synchronies by increasing intraregional and interregional oscillatory synchronizations. These findings support the evidence that music, like other higher cognitive tasks, requires the activation of different cortical and subcortical regions in an organized and cooperative manner (Bhattacharya & Petsche, 2005).
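Oscillatory synchronization between two recording sites is often quantified with a phase-locking value (PLV), the length of the average unit vector of their phase differences. The following is a minimal sketch under stated assumptions: the phases are simulated directly rather than extracted from band-pass-filtered EEG (which would normally require, for example, a Hilbert transform), and the function names, sampling rate, and jitter values are hypothetical. It is not a description of the method used in any particular study cited above.

```python
import cmath
import math
import random


def phase_locking_value(phases_a, phases_b):
    """Length of the mean unit vector of the phase differences (0 to 1)."""
    vectors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(vectors)) / len(vectors)


def simulate_phases(n_samples=1000, seed=1):
    """Hypothetical phases (radians) for three channels.

    Channel b follows channel a at a fixed lag with small jitter;
    channel c has phases drawn independently at random.
    """
    rng = random.Random(seed)
    step = 2 * math.pi * 10 / 250  # 10 Hz rhythm sampled at 250 Hz (assumed)
    a = [step * t for t in range(n_samples)]
    b = [p - 0.5 + rng.gauss(0, 0.2) for p in a]
    c = [rng.uniform(-math.pi, math.pi) for _ in range(n_samples)]
    return a, b, c


a, b, c = simulate_phases()
print(f"locked pair:    PLV = {phase_locking_value(a, b):.2f}")
print(f"unrelated pair: PLV = {phase_locking_value(a, c):.2f}")
```

A value close to 1 indicates a stable phase relationship between the two channels, whereas a value close to 0 indicates none. An increase in such an index across electrode pairs is one way in which increased interregional synchronization can be expressed.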
Summary
Uncovering the neural underpinnings of music processing is a central theme in cognitive neuroscience, as evidenced by the robust body of literature on this topic. Neuroimaging research conducted over the past 20 years has successfully identified several brain regions involved in the complex set of cognitive processes underlying music perception, memory, emotion, and performance, providing the foundation upon which research has started to explore how these different brain regions interact to support music processing. This chapter has provided a broad panorama of the current knowledge concerning the anatomical and functional basis of music processing from a network perspective. Starting with the trajectory of auditory stimuli through the ascending auditory pathway, we described how interactions between auditory and frontal cortical areas are crucial for the transformation of acoustic information into a musically meaningful tonal context and for the integration of sound events over time in working memory, and we outlined the role of frontal areas in autobiographical memories, attention, and musical imagery. Anatomical and functional coordination between auditory and motor-related areas was also discussed in order to understand how cortical and subcortical areas are involved in sensorimotor synchronization and temporal processing, focusing more specifically on the roles of cortico-cerebellar and basal ganglia-thalamo-cortical networks. Auditory and limbic interactions were discussed in relation to affective sound processing and music-evoked emotions, pointing to the importance of the integration between subcortical dopaminergic regions and higher-order cortical areas for aesthetic pleasure. Finally, we reviewed recent studies investigating how brain networks organize themselves in a naturalistic music listening context. Collectively, this robust body of literature suggests that music processing requires timely coordination of
large-scale cognitive, motor, and limbic brain networks, setting the stage for a new generation of music neuroscience research on the dynamic organization of brain networks underlying music processing.
References
Adamos, D. A., Laskaris, N., & Micheloyannis, S. (2018). Harnessing functional segregation across brain rhythms as a means to detect EEG oscillatory multiplexing during music listening. Journal of Neural Engineering 15, 036012. Agostino, P. V., & Cheng, R. K. (2016). Contributions of dopaminergic signaling to timing accuracy and precision. Current Opinion in Behavioral Sciences 8, 153–160. Akkal, D., Dum, R. P., & Strick, P. L. (2007). Supplementary motor area and presupplementary motor area: Targets of basal ganglia and cerebellar output. Journal of Neuroscience 27(40), 10659–10673. Alho, K., Rinne, T., Herron, T. J., & Woods, D. L. (2014). Stimulus-dependent activations and attention-related modulations in the auditory cortex: A meta-analysis of fMRI studies. Hearing Research 307, 29–41. Allman, M. J., & Meck, W. H. (2012). Pathophysiological distortions in time perception and timed performance. Brain 135(3), 656–677. Allman, M. J., Teki, S., Griffiths, T. D., & Meck, W. H. (2014). Properties of the internal clock: First- and second-order principles of subjective time. Annual Review of Psychology 65, 743–771. Alluri, V., Brattico, E., Toiviainen, P., Burunat, I., Bogert, B., Numminen, J., & Kliuchko, M. (2015). Musical expertise modulates functional connectivity of limbic regions during continuous music listening. Psychomusicology 25(4), 443–454. Alluri, V., Toiviainen, P., Burunat, I., Kliuchko, M., Vuust, P., & Brattico, E. (2017). Connectivity patterns during music listening: Evidence for action-based processing in musicians. Human Brain Mapping 38(6), 2955–2970. Alluri, V., Toiviainen, P., Jääskeläinen, I. P., Glerean, E., Sams, M., & Brattico, E. (2012). Large-scale brain networks emerge from dynamic processing of musical timbre, key and rhythm. NeuroImage 59(4), 3677–3689. Alluri, V., Toiviainen, P., Lund, T. E., Wallentin, M., Vuust, P., Nandi, A. K., … Brattico, E. (2013). From Vivaldi to Beatles and back: Predicting lateralized brain responses to music.
NeuroImage 83, 627–636. Amunts, K., Morosan, P., Hilbig, H., & Zilles, K. (2012). Auditory system. In J. K. Mai & G. Paxinos (Eds.), The human nervous system (3rd ed., pp. 1270–1300). London: Elsevier. Andoh, J., & Zatorre, R. J. (2011). Interhemispheric connectivity influences the degree of modulation of TMS-induced effects during auditory processing. Frontiers in Psychology 2, 161. Ashe, J., & Bushara, K. (2014). The olivo-cerebellar system as a neural clock. In H. Merchant & V. de Lafuente (Eds.), Neurobiology of interval timing: Advances in experimental medicine and biology (pp. 155–166). New York: Springer. Bajo, V. M., Nodal, F. R., Moore, D. R., & King, A. J. (2010). The descending corticocollicular pathway mediates learning-induced auditory plasticity. Nature Neuroscience 13(2), 253–260. Bangert, M., Peschel, T., Schlaug, G., Rotte, M., Drescher, D., Hinrichs, H., … Altenmüller, E. (2006). Shared networks for auditory and motor processing in professional pianists: Evidence from
fMRI conjunction. NeuroImage 30(3), 917–926. Bartolo, R., Prado, L., & Merchant, H. (2014). Information processing in the primate basal ganglia during sensory-guided and internally driven rhythmic tapping. Journal of Neuroscience 34(11), 3910–3923. Baumann, O., Borra, R. J., Bower, J. M., Cullen, K. E., Habas, C., Ivry, R. B., … Sokolov, A. A. (2015). Consensus paper: The role of the cerebellum in perceptual processes. Cerebellum 14(2), 197–220. Baumann, S., Griffiths, T. D., Sun, L., Petkov, C. I., Thiele, A., & Rees, A. (2011). Orthogonal representation of sound dimensions in the primate midbrain. Nature Neuroscience 14(4), 423–425. Baumann, S., Koeneke, S., Schmidt, C. F., Meyer, M., Lutz, K., & Jancke, L. (2007). A network for audio-motor coordination in skilled pianists and non-musicians. Brain Research 1161(1), 65–78. Beauchamp, M. S., Argall, B. D., Bodurka, J., Duyn, J. H., & Martin, A. (2004). Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nature Neuroscience 7(11), 1190–1192. Beauchamp, M. S., Nath, A. R., & Pasalar, S. (2010). fMRI-guided transcranial magnetic stimulation reveals that the superior temporal sulcus is a cortical locus of the McGurk effect. Journal of Neuroscience 30(7), 2414–2417. Beauchamp, M. S., Yasar, N. E., Frye, R. E., & Ro, T. (2008). Touch, sound and vision in human superior temporal sulcus. NeuroImage 41(3), 1011–1020. Belin, P., & Zatorre, R. J. (2000). “What,” “where” and “how” in auditory cortex. Nature Neuroscience 3(10), 965–966. Bhattacharya, J., & Petsche, H. (2005). Phase synchrony analysis of EEG during music perception reveals changes in functional connectivity due to musical expertise. Signal Processing 85(11), 2161–2177. Bianco, R., Novembre, G., Keller, P. E., Kim, S.-G., Scharf, F., Friederici, A. D., … Sammler, D. (2016). Neural networks for harmonic structure in music perception and action. NeuroImage 142, 454–464. Bizley, J. K., & Cohen, Y. E. (2013).
The what, where and how of auditory-object perception. Nature Reviews Neuroscience 14(10), 693–707. Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences 98(20), 11818–11823. Bostan, A. C., Dum, R. P., & Strick, P. L. (2013). Cerebellar networks with the cerebral cortex and basal ganglia. Trends in Cognitive Sciences 17(5), 241–254. Brattico, E., Alluri, V., Bogert, B., Jacobsen, T., Vartiainen, N., Nieminen, S., & Tervaniemi, M. (2011). A functional MRI study of happy and sad emotions in music with and without lyrics. Frontiers in Psychology 2, 308. Brechmann, A., & Scheich, H. (2005). Hemispheric shifts of sound representation in auditory cortex with conceptual listening. Cerebral Cortex 15(5), 578–587. Brown, S., & Martinez, M. J. (2007). Activation of premotor vocal areas during musical discrimination. Brain and Cognition 63(1), 59–69. Brown, S., Martinez, M. J., & Parsons, L. M. (2006). The neural basis of human dance. Cerebral Cortex 16(8), 1157–1167. Buckner, R. L. (2013). The cerebellum and cognitive function: 25 years of insight from anatomy and neuroimaging. Neuron 80(3), 807–815. Buhusi, C. V., & Meck, W. H. (2005). What makes us tick? Functional and neural mechanisms of interval timing. Nature Reviews Neuroscience 6(10), 755–765.
Burunat, I., Alluri, V., Toiviainen, P., Numminen, J., & Brattico, E. (2014). Dynamics of brain activity underlying working memory for music in a naturalistic condition. Cortex 57, 254–269. Caligiore, D., Pezzulo, G., Baldassarre, G., Bostan, A. C., Strick, P. L., Doya, K., … Herreros, I. (2017). Consensus paper: Towards a systems-level view of cerebellar function: The interplay between cerebellum, basal ganglia, and cortex. Cerebellum 16(1), 203–229. Cammoun, L., Thiran, J. P., Griffa, A., Meuli, R., Hagmann, P., & Clarke, S. (2015). Intrahemispheric cortico-cortical connections of the human auditory cortex. Brain Structure & Function 220(6), 3537–3553. Cha, K., Zatorre, R. J., & Schönwiesner, M. (2016). Frequency selectivity of voxel-by-voxel functional connectivity in human auditory cortex. Cerebral Cortex 26(1), 211–224. Chapin, H. L., Zanto, T., Jantzen, K. J., Kelso, J. A. S., Steinberg, F., & Large, E. W. (2010). Neural responses to complex auditory rhythms: The role of attending. Frontiers in Psychology 1, 547–558. Chauvigné, L. A. S., Gitau, K. M., & Brown, S. (2014). The neural basis of audiomotor entrainment: An ALE meta-analysis. Frontiers in Human Neuroscience 8, 776. Chen, C. H., Fremont, R., Arteaga-Bracho, E. E., & Khodakhah, K. (2014). Short latency cerebellar modulation of the basal ganglia. Nature Neuroscience 17(12), 1767–1775. Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008a). Listening to musical rhythms recruits motor regions of the brain. Cerebral Cortex 18(12), 2844–2854. Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008b). Moving on time: Brain network for auditory-motor synchronization is modulated by rhythm complexity and musical training. Journal of Cognitive Neuroscience 20(2), 226–239. Chen, J. L., Rae, C., & Watkins, K. E. (2012). Learning to play a melody: An fMRI study examining the formation of auditory-motor associations. NeuroImage 59(2), 1200–1208. Chen, J. L., Zatorre, R. J., & Penhune, V. B. (2006).
Interactions between auditory and dorsal premotor cortex during synchronization to musical rhythms. NeuroImage 32(4), 1771–1781. Choppin, S., Trost, W., Dondaine, T., Millet, B., Drapier, D., Vérin, M., … Grandjean, D. (2016). Alteration of complex negative emotions induced by music in euthymic patients with bipolar disorder. Journal of Affective Disorders 191, 15–23. Coull, J. T., Cheng, R. K., & Meck, W. H. (2011). Neuroanatomical and neurochemical substrates of timing. Neuropsychopharmacology 36(1), 3–25. Coull, J. T., Hwang, H. J., Leyton, M., & Dagher, A. (2012). Dopamine precursor depletion impairs timing in healthy volunteers by attenuating activity in putamen and supplementary motor area. Journal of Neuroscience 32(47), 16704–16715. Coull, J. T., Vidal, F., & Burle, B. (2016). When to act, or not to act: That’s the SMA’s question. Current Opinion in Behavioral Sciences 8, 14–21. Cunnington, R., Bradshaw, J. L., & Iansek, R. (1996). The role of the supplementary motor area in the control of voluntary movement. Human Movement Science 15(5), 627–647. D’Ausilio, A., Altenmüller, E., Olivetti Belardinelli, M., & Lotze, M. (2006). Cross-modal plasticity of the motor cortex while listening to a rehearsed musical piece. European Journal of Neuroscience 24(3), 955–958. Da Costa, S., van der Zwaag, W., Marques, J. P., Frackowiak, R. S. J., Clarke, S., & Saenz, M. (2011). Human primary auditory cortex follows the shape of Heschl’s gyrus. Journal of Neuroscience 31(40), 14067–14075. de Heer, W. A., Huth, A. G., Griffiths, T. L., Gallant, J. L., & Theunissen, F. E. (2017). The hierarchical cortical organization of human speech processing. Journal of Neuroscience 37(27), 6539–6557.
Del Olmo, M. F., Cheeran, B., Koch, G., & Rothwell, J. C. (2007). Role of the cerebellum in externally paced rhythmic finger movements. Journal of Neurophysiology 98(1), 145–152. Diedrichsen, J., Criscimagna-Hemminger, S. E., & Shadmehr, R. (2007). Dissociating timing and coordination as functions of the cerebellum. Journal of Neuroscience 27(23), 6291–6301. Doyon, J., Penhune, V., & Ungerleider, L. G. (2003). Distinct contribution of the cortico-striatal and cortico-cerebellar systems to motor skill learning. Neuropsychologia 41(3), 252–262. Dum, R. P. (2002). An unfolded map of the cerebellar dentate nucleus and its projections to the cerebral cortex. Journal of Neurophysiology 89(1), 634–639. Durstewitz, D. (2003). Self-organizing neural integrator predicts interval times through climbing activity. Journal of Neuroscience 23(12), 5342–5353. Farrugia, N., Jakubowski, K., Cusack, R., & Stewart, L. (2015). Tunes stuck in your brain: The frequency and affective evaluation of involuntary musical imagery correlate with cortical structure. Consciousness and Cognition 35, 66–77. Fatemi, S. H., Aldinger, K. A., Ashwood, P., Bauman, M. L., Blaha, C. D., Blatt, G. J., … Welsh, J. P. (2012). Consensus paper: Pathological role of the cerebellum in autism. Cerebellum 11(3), 777–807. Fernández-Miranda, J. C., Wang, Y., Pathak, S., Stefaneau, L., Verstynen, T., & Yeh, F. C. (2015). Asymmetry, connectivity, and segmentation of the arcuate fascicle in the human brain. Brain Structure & Function 220(3), 1665–1680. Foster, N. E. V., Halpern, A. R., & Zatorre, R. J. (2013). Common parietal activation in musical mental transformations across pitch and time. NeuroImage 75, 27–35. Friston, K. J. (2011). Functional and effective connectivity: A review. Brain Connectivity 1(1), 13–36. Froud, K. E., Wong, A. C. Y., Cederholm, J. M. E., Klugmann, M., Sandow, S. L., Julien, J.-P., … Housley, G. D. (2015).
Type II spiral ganglion afferent neurons drive medial olivocochlear reflex suppression of the cochlear amplifier. Nature Communications 6(1), 7115. Frühholz, S., Trost, W., & Grandjean, D. (2014). The role of the medial temporal limbic system in processing emotions in voice and music. Progress in Neurobiology 123, 1–17. Frühholz, S., Trost, W., & Kotz, S. A. (2016). The sound of emotions: Towards a unifying neural network perspective of affective sound processing. Neuroscience & Biobehavioral Reviews 68, 96–110. Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2012). Internalized timing of isochronous sounds is represented in neuromagnetic beta oscillations. Journal of Neuroscience 32(5), 1791–1802. Gaab, N., Gaser, C., Zaehle, T., Jancke, L., & Schlaug, G. (2003). Functional anatomy of pitch memory: An fMRI study with sparse temporal sampling. NeuroImage 19(4), 1417–1426. Gao, J. H., Parsons, L. M., Bower, J. M., Xiong, J., Li, J., & Fox, P. T. (1996). Cerebellum implicated in sensory acquisition and discrimination rather than motor control. Science 272(5261), 545–547. Giovannelli, F., Banfi, C., Borgheresi, A., Fiori, E., Innocenti, I., Rossi, S., … Cincotta, M. (2013). The effect of music on corticospinal excitability is related to the perceived emotion: A transcranial magnetic stimulation study. Cortex 49(3), 702–710. Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience 19(5), 893–906. Grahn, J. A., Henry, M. J., & McAuley, J. D. (2011). fMRI investigation of cross-modal interactions in beat perception: Audition primes vision, but not vice versa. NeuroImage 54(2), 1231–1243. Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat: Premotor and striatal interactions in musicians and nonmusicians during beat perception. Journal of Neuroscience 29(23), 7540–7548.
Grahn, J. A., & Rowe, J. B. (2013). Finding and feeling the musical beat: Striatal dissociations between detection and prediction of regularity. Cerebral Cortex 23(4), 913–921. Griffiths, T. D., & Warren, J. D. (2002). The planum temporale as a computational hub. Trends in Neurosciences 25(7), 348–353. Griffiths, T. D., & Warren, J. D. (2004). What is an auditory object? Nature Reviews Neuroscience 5(11), 887–892. Grothe, B. (2000). The evolution of temporal processing in the medial superior olive, an auditory brainstem structure. Progress in Neurobiology 61(6), 581–610. Groussard, M., Viader, F., Hubert, V., Landeau, B., Abbas, A., Desgranges, B., … Platel, H. (2010). Musical and verbal semantic memory: Two distinct neural networks? NeuroImage 49(3), 2764–2773. Grube, M., Cooper, F. E., Chinnery, P. F., & Griffiths, T. D. (2010). Dissociation of duration-based and beat-based auditory timing in cerebellar degeneration. Proceedings of the National Academy of Sciences 107(25), 11597–11601. Grube, M., Lee, K. H., Griffiths, T. D., Barker, A. T., & Woodruff, P. W. (2010). Transcranial magnetic theta-burst stimulation of the human cerebellum distinguishes absolute, duration-based from relative, beat-based perception of subsecond time intervals. Frontiers in Psychology 1, 171. Haber, S. N., & Knutson, B. (2010). The reward circuit: Linking primate anatomy and human imaging. Neuropsychopharmacology 35(1), 4–26. Halpern, A. R., & Zatorre, R. J. (1999). When that tune runs through your head: A PET investigation of auditory imagery for familiar melodies. Cerebral Cortex 9(7), 697–704. Halpern, A. R., Zatorre, R. J., Bouffard, M., & Johnson, J. A. (2004). Behavioral and neural correlates of perceived and imagined musical timbre. Neuropsychologia 42(9), 1281–1292. Halsband, U., Ito, N., Tanji, J., & Freund, H. J. (1993). The role of premotor cortex and the supplementary motor area in the temporal control of movement in man. Brain 116(1), 243–266. Harrington, D.
L., Castillo, G. N., Greenberg, P. A., Song, D. D., Lessig, S., Lee, R. R., & Rao, S. M. (2011). Neurobehavioral mechanisms of temporal processing deficits in Parkinson’s disease. PLoS ONE 6(2), e17461. Harrington, D. L., & Jahanshahi, M. (2016). Reconfiguration of striatal connectivity for timing and action. Current Opinion in Behavioral Sciences 8, 78–84. Harris, R., & De Jong, B. M. (2014). Cerebral activations related to audition-driven performance imagery in professional musicians. PLoS ONE 9(4), e93681. Hasegawa, T., Matsuki, K. I., Ueno, T., Maeda, Y., Matsue, Y., Konishi, Y., & Sadato, N. (2004). Learned audio-visual cross-modal associations in observed piano playing activate the left planum temporale: An fMRI study. Cognitive Brain Research 20(3), 510–518. Haueisen, J., & Knösche, T. R. (2001). Involuntary motor activity in pianists evoked by music perception. Journal of Cognitive Neuroscience 13(6), 786–792. Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience 8(5), 393–402. Horn, A. K. E. (2006). The reticular formation. Progress in Brain Research 151, 127–155. Huffman, R. F., & Henson, O. W. (1990). The descending auditory pathway and acousticomotor systems: Connections with the inferior colliculus. Brain Research Reviews 15(3), 295–323. Humphries, C., Liebenthal, E., & Binder, J. R. (2010). Tonotopic organization of human auditory cortex. NeuroImage 50(3), 1202–1211. Huron, D. B. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA: MIT Press. Hyde, K. L., Peretz, I., & Zatorre, R. J. (2008). Evidence for the role of the right auditory cortex in fine pitch resolution. Neuropsychologia 46(2), 632–639.
Ito, M. (2008). Control of mental activities by internal models in the cerebellum. Nature Reviews Neuroscience 9(4), 304–313. Iversen, J. R., & Balasubramaniam, R. (2016). Synchronization and temporal processing. Current Opinion in Behavioral Sciences 8, 175–180. Ivry, R. B., Spencer, R. M., Zelaznik, H. N., & Diedrichsen, J. (2002). The cerebellum and event timing. Annals of the New York Academy of Sciences 978, 302–317. Jahanshahi, M., Jenkins, I. H., Brown, R. G., Marsden, C. D., Passingham, R. E., & Brooks, D. J. (1995). Self-initiated versus externally triggered movements: I. An investigation using measurement of regional cerebral blood flow with PET and movement-related potentials in normal and Parkinson’s disease subjects. Brain 118(4), 913–933. Jahanshahi, M., Jones, C. R. G., Zijlmans, J., Katzenschlager, R., Lee, L., Quinn, N., … Lees, A. J. (2010). Dopaminergic modulation of striato-frontal connectivity during motor timing in Parkinson’s disease. Brain 133(3), 727–745. Janak, P. H., & Tye, K. M. (2015). From circuits to behaviour in the amygdala. Nature 517(7534), 284–292. Janata, P. (2009). The neural architecture of music-evoked autobiographical memories. Cerebral Cortex 19(11), 2579–2594. Janata, P. (2015). Neural basis of music perception. In G. G. Celesia & G. Hickok (Eds.), Handbook of clinical neurology: The human auditory system (Vol. 129, pp. 187–205). Amsterdam: Elsevier. Janata, P., Birk, J., Van Horn, J., Leman, M., Tillmann, B., & Bharucha, J. J. (2002). The cortical topography of tonal structures underlying Western music. Science 293(5539), 2425–2430. Janata, P., Tillmann, B., & Bharucha, J. J. (2002). Listening to polyphonic music recruits domain-general attention and working memory circuits. Cognitive, Affective & Behavioral Neuroscience 2(2), 121–140. Jäncke, L., Loose, R., Lutz, K., Specht, K., & Shah, N. (2000). Cortical activations during paced finger-tapping applying visual and auditory pacing stimuli.
Cognitive Brain Research 10(1–2), 51–66. Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. Behavioral and Brain Sciences 31(5), 559–575. Kaiser, J., Ripper, B., Birbaumer, N., & Lutzenberger, W. (2003). Dynamics of gamma-band activity in human magnetoencephalogram during auditory pattern working memory. NeuroImage 20(2), 816–827. Keller, P. E. (2012). Mental imagery in music performance: Underlying mechanisms and potential benefits. Annals of the New York Academy of Sciences 1252(1), 206–213. Kelly, R. M., & Strick, P. L. (2003). Cerebellar loops with motor cortex and prefrontal cortex of a nonhuman primate. Journal of Neuroscience 23(23), 8432–8444. Keren-Happuch, E., Chen, S. H. A., Ho, M. H. R., & Desmond, J. E. (2014). A meta-analysis of cerebellar contributions to higher cognition from PET and fMRI studies. Human Brain Mapping 35(2), 593–615. Kiehl, K. A., Laurens, K. R., Duty, T. L., Forster, B. B., & Liddle, P. F. (2001). Neural sources involved in auditory target detection and novelty processing: An event-related fMRI study. Psychophysiology 38(1), 133–142. Klein, C., Liem, F., Hänggi, J., Elmer, S., & Jäncke, L. (2016). The “silent” imprint of musical training. Human Brain Mapping 37(2), 536–546. Klein, M. E., & Zatorre, R. J. (2011). A role for the right superior temporal sulcus in categorical perception of musical chords. Neuropsychologia 49(5), 878–887. Klein, M. E., & Zatorre, R. J. (2015). Representations of invariant musical categories are decodable by pattern analysis of locally distributed BOLD responses in superior temporal and intraparietal
sulci. Cerebral Cortex 25(7), 1947–1957. Koelsch, S. (2006). Significance of Broca’s area and ventral premotor cortex for music-syntactic processing. Cortex 42(4), 518–520. Koelsch, S. (2011). Toward a neural basis of music perception: A review and updated model. Frontiers in Psychology 2, 110. Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15(3), 170–180. Koelsch, S., Fritz, T., Schulze, K., Alsop, D., & Schlaug, G. (2005). Adults and children processing music: An fMRI study. NeuroImage 25(4), 1068–1076. Koelsch, S., Fritz, T., v. Cramon, D. Y., Müller, K., & Friederici, A. D. (2006). Investigating emotion with music: An fMRI study. Human Brain Mapping 27(3), 239–250. Koelsch, S., Gunter, T. C., v. Cramon, D. Y., Zysset, S., Lohmann, G., & Friederici, A. D. (2002). Bach speaks: A cortical “language-network” serves the processing of music. NeuroImage 17(2), 956–966. Koelsch, S., Schulze, K., Sammler, D., Fritz, T., Müller, K., & Gruber, O. (2009). Functional architecture of verbal and tonal working memory: An fMRI study. Human Brain Mapping 30(3), 859–873. Koelsch, S., & Skouras, S. (2014). Functional centrality of amygdala, striatum and hypothalamus in a “small-world” network underlying joy: An fMRI study with music. Human Brain Mapping 35(7), 3485–3498. Koelsch, S., Skouras, S., Fritz, T., Herrera, P., Bonhage, C., Küssner, M. B., & Jacobs, A. M. (2013). The roles of superficial amygdala and auditory cortex in music-evoked fear and joy. NeuroImage 81, 49–60. Koelsch, S., Skouras, S., & Lohmann, G. (2018). The auditory cortex hosts network nodes influential for emotion processing: An fMRI study on music-evoked fear and joy. PLoS ONE 13(1), e0190057. Kornysheva, K., & Schubotz, R. I. (2011). Impairment of auditory-motor timing and compensatory reorganization after ventral premotor cortex stimulation. PLoS ONE 6(6), e21421. Kotz, S. A., Brown, R. M., & Schwartze, M. (2016). 
Cortico-striatal circuits and the timing of action and perception. Current Opinion in Behavioral Sciences 8, 42–45. Kotz, S. A., Stockert, A., & Schwartze, M. (2014). Cerebellum, temporal predictability and the updating of a mental model. Philosophical Transactions of the Royal Society B: Biological Sciences 369(1658), 20130403. Koziol, L. F., Budding, D., Andreasen, N., D’Arrigo, S., Bulgheroni, S., Imamizu, H., …Yamazaki, T. (2014). Consensus paper: The cerebellum’s role in movement and cognition. Cerebellum 13(1), 151–177. Koziol, L. F., Budding, D. E., & Chidekel, D. (2011). Sensory integration, sensory processing, and sensory modulation disorders: Putative functional neuroanatomic underpinnings. Cerebellum 10(4), 770–792. Lahav, A., Saltzman, E., & Schlaug, G. (2007). Action representation of sound: Audiomotor recognition network while listening to newly acquired actions. Journal of Neuroscience 27(2), 308–314. Langers, D. R. M. (2014). Assessment of tonotopically organised subdivisions in human auditory cortex using volumetric and surface-based cortical alignments. Human Brain Mapping 35(4), 1544–1561. Lappe, C., Lappe, M., & Pantev, C. (2016). Differential processing of melodic, rhythmic and simple tone deviations in musicians: An MEG study. NeuroImage 124, 898–905.
Lappe, C., Steinsträter, O., & Pantev, C. (2013). Rhythmic and melodic deviations in musical sequences recruit different cortical areas for mismatch detection. Frontiers in Human Neuroscience 7, 260. Large, E. W., Herrera, J. A., & Velasco, M. J. (2015). Neural networks for beat perception in musical rhythm. Frontiers in Systems Neuroscience 9, 159. Large, E. W., & Snyder, J. S. (2009). Pulse and meter as neural resonance. Annals of the New York Academy of Sciences 1169, 46–57. LeDoux, J. (2007). The amygdala. Current Biology 17(20), R868–R874. Lee, K.-H., Egleston, P. N., Brown, W. H., Gregory, A. N., Barker, A. T., & Woodruff, P. W. R. (2007). The role of the cerebellum in subsecond time perception: Evidence from repetitive transcranial magnetic stimulation. Journal of Cognitive Neuroscience 19(1), 147–157. Lee, Y. S., Janata, P., Frost, C., Hanke, M., & Granger, R. (2011). Investigation of melodic contour processing in the brain using multivariate pattern-based fMRI. NeuroImage 57(1), 293–300. Lehéricy, S., Ducros, M., Krainik, A., Francois, C., Van De Moortele, P. F., Ugurbil, K., & Kim, D. S. (2004). 3-D diffusion tensor axonal tracking shows distinct SMA and pre-SMA projections to the human striatum. Cerebral Cortex 14(12), 1302–1309. Lehne, M., Rohrmeier, M., & Koelsch, S. (2013). Tension-related activity in the orbitofrontal cortex and amygdala: An fMRI study with music. Social Cognitive and Affective Neuroscience 9(10), 1515–1523. Leow, L. A., & Grahn, J. A. (2014). Neural mechanisms of rhythm perception: Present findings and future directions. Advances in Experimental Medicine and Biology 829, 325–338. Lima, C. F., Krishnan, S., & Scott, S. K. (2016). Roles of supplementary motor areas in auditory processing and auditory imagery. Trends in Neurosciences 39(8), 527–542. Loui, P. (2015). A dual-stream neuroanatomy of singing. Music Perception 32(3), 232–241. Lusk, N. A., Petter, E. A., Macdonald, C. J., & Meck, W. H. (2016). 
Cerebellar, hippocampal, and striatal time cells. Current Opinion in Behavioral Sciences 8, 186–192. Maes, P.-J., Leman, M., Palmer, C., & Wanderley, M. M. (2014). Action-based effects on music perception. Frontiers in Psychology 4, 1008. Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in Broca’s area: An MEG study. Nature Neuroscience 4(5), 540–545. Maidhof, C., & Koelsch, S. (2011). Effects of selective attention on syntax processing in music and language. Journal of Cognitive Neuroscience 23(9), 2252–2267. Manto, M., Bower, J. M., Conforto, A. B., Delgado-García, J. M., Da Guarda, S. N. F., Gerwig, M., … Timmann, D. (2012). Consensus paper: Roles of the cerebellum in motor control: The diversity of ideas on cerebellar involvement in movement. Cerebellum 11, 457–487. Martínez-Molina, N., Mas-Herrero, E., Rodríguez-Fornells, A., Zatorre, R. J., & Marco-Pallarés, J. (2016). Neural correlates of specific musical anhedonia. Proceedings of the National Academy of Sciences 113(46), E7337–E7345. Marvel, C. L., & Desmond, J. E. (2010). Functional topography of the cerebellum in verbal working memory. Neuropsychology Review 20(3), 271–279. Mas-Herrero, E., Dagher, A., & Zatorre, R. J. (2017). Modulating musical reward sensitivity up and down with transcranial magnetic stimulation. Nature Human Behaviour 2(1), 27–32. Matell, M. S., & Meck, W. H. (2004). Cortico-striatal circuits and interval timing: Coincidence detection of oscillatory processes. Cognitive Brain Research 21(2), 139–170. Mauk, M. D., & Buonomano, D. V. (2004). The neural basis of temporal processing. Annual Review of Neuroscience 27, 307–340. Mayville, J. M., Jantzen, K. J., Fuchs, A., Steinberg, F. L., & Kelso, J. A. S. (2002). Cortical and subcortical networks underlying syncopated and synchronized coordination revealed using fMRI.
Human Brain Mapping 17(4), 214–229. Medina, J. F., & Mauk, M. D. (2000). Computer simulation of cerebellar information processing. Nature Neuroscience 3(Suppl.), 1205–1211. Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological connectivity of the mesolimbic system. NeuroImage 28(1), 175–184. Merchant, H., & Bartolo, R. (2018). Primate beta oscillations and rhythmic behaviors. Journal of Neural Transmission 125, 461–470. Merchant, H., Grahn, J., Trainor, L., Rohrmeier, M., & Fitch, W. T. (2015). Finding the beat: A neural perspective across humans and non-human primates. Philosophical Transactions of the Royal Society B: Biological Sciences 370(1664), 20140093. Merchant, H., Harrington, D. L., & Meck, W. H. (2013). Neural basis of the perception and estimation of time. Annual Review of Neuroscience 36, 313–336. Merchant, H., Perez, O., Zarco, W., & Gamez, J. (2013). Interval tuning in the primate medial premotor cortex as a general timing mechanism. Journal of Neuroscience 33(21), 9082–9096. Michaelis, K., Wiener, M., & Thompson, J. C. (2014). Passive listening to preferred motor tempo modulates corticospinal excitability. Frontiers in Human Neuroscience 8, 252. Middleton, F. A., & Strick, P. L. (2001). Cerebellar projections to the prefrontal cortex of the primate. Journal of Neuroscience 21(2), 700–712. Mitterschiffthaler, M. T., Fu, C. H. Y., Dalton, J. A., Andrew, C. M., & Williams, S. C. R. (2007). A functional MRI study of happy and sad affective states induced by classical music. Human Brain Mapping 28(11), 1150–1162. Molinari, M., Leggio, M. G., Filippini, V., Gioia, M. C., Cerasa, A., & Thaut, M. H. (2005). Sensorimotor transduction of time information is preserved in subjects with cerebellar damage. Brain Research Bulletin 67, 448–458. Molinari, M., Leggio, M. G., & Thaut, M. H. (2007). The cerebellum and neural networks for rhythmic sensorimotor synchronization in the human brain. Cerebellum 6(1), 18–23. 
Moore, E., Schaefer, R. S., Bastin, M. E., Roberts, N., & Overy, K. (2014). Can musical training influence brain connectivity? Evidence from diffusion tensor MRI. Brain Sciences 4(2), 405–427. Morillon, B., & Baillet, S. (2017). Motor origin of temporal predictions in auditory attention. Proceedings of the National Academy of Sciences 114(42), E8913–E8921. Mueller, K., Fritz, T., Mildner, T., Richter, M., Schulze, K., Lepsien, J. J., … Möller, H. E. (2015). Investigating the dynamics of the brain response to music: A central role of the ventral striatum/nucleus accumbens. NeuroImage 116, 68–79. Nachev, P., Kennard, C., & Husain, M. (2008). Functional role of the supplementary and presupplementary motor areas. Nature Reviews Neuroscience 9, 856–869. Narayanan, N. S., Land, B. B., Solder, J. E., Deisseroth, K., & DiLeone, R. J. (2012). Prefrontal D1 dopamine signaling is required for temporal control. Proceedings of the National Academy of Sciences 109(50), 20726–20731. Nayagam, B. A., Muniak, M. A., & Ryugo, D. K. (2011). The spiral ganglion: Connecting the peripheral and central auditory systems. Hearing Research 278(1–2), 2–20. Nelson, A., Schneider, D. M., Takatoh, J., Sakurai, K., Wang, F., & Mooney, R. (2013). A circuit for motor cortical modulation of auditory cortical activity. Journal of Neuroscience 33(36), 14342– 14353. Norman-Haignere, S., Kanwisher, N., & McDermott, J. H. (2013). Cortical pitch regions in humans respond primarily to resolved harmonics and are located in specific tonotopic regions of anterior auditory cortex. Journal of Neuroscience 33(50), 19451–19469. Novembre, G., & Keller, P. E. (2014). A conceptual review on action-perception coupling in the musicians’ brain: What is it good for? Frontiers in Human Neuroscience 8, 603.
Nozaradan, S., Schwartze, M., Obermeier, C., & Kotz, S. A. (2017). Specific contributions of basal ganglia and cerebellum to the neural tracking of rhythm. Cortex 95, 156–168. O’Reilly, J. X., Mesulam, M. M., & Nobre, A. C. (2008). The cerebellum predicts the timing of perceptual events. Journal of Neuroscience 28(9), 2252–2260. Ohnishi, T., Matsuda, H., Asada, T., Aruga, M., Hirakata, M., Nishikawa, M., … Imabayashi, E. (2001). Functional anatomy of musical perception in musicians. Cerebral Cortex 11(8), 754–760. Pallesen, K. J., Brattico, E., Bailey, C., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S. (2005). Emotion processing of major, minor, and dissonant chords: A functional magnetic resonance imaging study. Annals of the New York Academy of Sciences 1060, 450–453. Palomar-García, M. Á., Zatorre, R. J., Ventura-Campos, N., Bueichekú, E., & Ávila, C. (2017). Modulation of functional connectivity in auditory-motor networks in musicians compared with nonmusicians. Cerebral Cortex 27(5), 2768–2778. Pannese, A., Grandjean, D., & Frühholz, S. (2016). Amygdala and auditory cortex exhibit distinct sensitivity to relevant acoustic features of auditory emotions. Cortex 85, 116–125. Paquette, S., Fujii, S., Li, H. C., & Schlaug, G. (2017). The cerebellum’s contribution to beat interval discrimination. NeuroImage 163, 177–182. Parsons, L. M. (2012). Exploring the functional neuroanatomy of music performance, perception, and comprehension. The Cognitive Neuroscience of Music 930(1), 211–231. Parsons, L. M., Petacchi, A., Schmahmann, J. D., & Bower, J. M. (2009). Pitch discrimination in cerebellar patients: Evidence for a sensory deficit. Brain Research 1303, 84–96. Patterson, R. D., Uppenkamp, S., Johnsrude, I. S., & Griffiths, T. D. (2002). The processing of temporal pitch and melody information in auditory cortex. Neuron 36(4), 767–776. Pecenka, N., Engel, A., & Keller, P. E. (2013). 
Neural correlates of auditory temporal predictions during sensorimotor synchronization. Frontiers in Human Neuroscience 7, 380. Pelzer, E. A., Melzer, C., Timmermann, L., von Cramon, D. Y., & Tittgemeyer, M. (2017). Basal ganglia and cerebellar interconnectivity within the human thalamus. Brain Structure and Function 222(1), 381–392. Perani, D. (2012). Functional and structural connectivity for language and music processing at birth. Rendiconti Lincei 23(3), 305–314. Peretz, I., Gosselin, N., Belin, P., Zatorre, R. J., Plailly, J., & Tillmann, B. (2009). Music lexical networks: The cortical organization of music recognition. Annals of the New York Academy of Sciences 1169, 256–265. Peretz, I., & Zatorre, R. J. (2005). Brain organization for music processing. Annual Review of Psychology 56, 89–114. Petter, E. A., Lusk, N. A., Hesslow, G., & Meck, W. H. (2016). Interactive roles of the cerebellum and striatum in sub-second and supra-second timing: Support for an initiation, continuation, adjustment, and termination (ICAT) model of temporal processing. Neuroscience & Biobehavioral Reviews 71, 739–755. Pfordresher, P. Q., Mantell, J. T., Brown, S., Zivadinov, R., & Cox, J. L. (2014). Brain responses to altered auditory feedback during musical keyboard production: An fMRI study. Brain Research 1556, 28–37. Plakke, B., & Romanski, L. M. (2014). Auditory connections and functions of prefrontal cortex. Frontiers in Neuroscience 8, 199. Platel, H., Baron, J. C., Desgranges, B., Bernard, F., & Eustache, F. (2003). Semantic and episodic memory of music are subserved by distinct neural networks. NeuroImage 20(1), 244–256. Proverbio, A. M., Orlandi, A., & Pisanu, F. (2016). Brain processing of consonance/dissonance in musicians and controls: A hemispheric asymmetry revisited. European Journal of Neuroscience 44(6), 2340–2356.
Rao, S. M., Harrington, D. L., Haaland, K. Y., Bobholz, J. A., Cox, R. W., & Binder, J. R. (1997). Distributed neural systems underlying the timing of movements. Journal of Neuroscience 17(14), 5528–5535. Rauschecker, J. P., & Scott, S. K. (2009). Maps and streams in the auditory cortex: Nonhuman primates illuminate human speech processing. Nature Neuroscience 12(6), 718–724. Reser, D. H., Burman, K. J., Richardson, K. E., Spitzer, M. W., & Rosa, M. G. P. (2009). Connections of the marmoset rostrotemporal auditory area: Express pathways for analysis of affective content in hearing. European Journal of Neuroscience 30(4), 578–592. Reybrouck, M., & Brattico, E. (2015). Neuroplasticity beyond sounds: Neural adaptations following long-term musical aesthetic experiences. Brain Sciences 5(1), 69–91. Roberts, T. F., Hisey, E., Tanaka, M., Kearney, M. G., Chattree, G., Yang, C. F., … Mooney, R. (2017). Identification of a motor-to-auditory pathway important for vocal learning. Nature Neuroscience 20(7), 978–986. Rogenmoser, L., Zollinger, N., Elmer, S., & Jäncke, L. (2016). Independent component processes underlying emotions during natural music listening. Social Cognitive and Affective Neuroscience 11(9), 1428–1439. Ross, B., Barat, M., & Fujioka, T. (2017). Sound-making actions lead to immediate plastic changes of neuromagnetic evoked responses and induced β-band oscillations during perception. Journal of Neuroscience 37(24), 5948–5959. Ross, J. M., Iversen, J. R., & Balasubramaniam, R. (2016). Motor simulation theories of musical beat perception. Neurocase 22(6), 558–565. Rossignol, S., & Melvill Jones, G. (1976). Audio-spinal influence in man studied by the H-reflex and its possible role on rhythmic movements synchronized to sound. Electroencephalography and Clinical Neurophysiology 41(1), 83–92. Royal, I., Vuvan, D. T., Zendel, B. R., Robitaille, N., Schönwiesner, M., & Peretz, I. (2016). 
Activation in the right inferior parietal lobule reflects the representation of musical structure beyond simple pitch discrimination. PLoS ONE 11(5), e0155291. Sachs, M. E., Ellis, R. J., Schlaug, G., & Loui, P. (2016). Brain connectivity reflects human aesthetic responses to music. Social Cognitive and Affective Neuroscience 11(6), 884–891. Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nature Neuroscience 14(2), 257–264. Salimpoor, V. N., Van Den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J. (2013). Interactions between the nucleus accumbens and auditory cortices predict music reward value. Science 340(6129), 216–219. Salimpoor, V. N., Zald, D. H., Zatorre, R. J., Dagher, A., & McIntosh, A. R. (2015). Predictions and the brain: How musical sounds become rewarding. Trends in Cognitive Sciences 19(2), 86–91. Sänger, J., Müller, V., & Lindenberger, U. (2012). Intra- and interbrain synchronization and network properties when playing guitar in duets. Frontiers in Human Neuroscience 6, 312. Santoro, R., Moerel, M., De Martino, F., Goebel, R., Ugurbil, K., Yacoub, E., & Formisano, E. (2014). Encoding of natural sounds at multiple spectral and temporal resolutions in the human auditory cortex. PLoS Computational Biology 10(1), e1003412. Satoh, M., Takeda, K., Nagata, K., Hatazawa, J., & Kuzuhara, S. (2001). Activated brain regions in musicians during an ensemble: A PET study. Cognitive Brain Research 12(1), 101–108. Saur, D., Kreher, B. W., Schnell, S., Kummerer, D., Kellmeyer, P., Vry, M.-S., … Weiller, C. (2008). Ventral and dorsal pathways for language. Proceedings of the National Academy of Sciences 105(46), 18035–18040.
Schindler, A., Herdener, M., & Bartels, A. (2013). Coding of melodic gestalt in human auditory cortex. Cerebral Cortex 23(12), 2987–2993. Schmahmann, J. D., & Pandya, D. N. (1997). The cerebrocerebellar system. International Review of Neurobiology 41, 31–38, 38a, 39–60. Schneider, D. M., & Mooney, R. (2015). Motor-related signals in the auditory system for listening and learning. Current Opinion in Neurobiology 33, 78–84. Schneider, D. M., Nelson, A., & Mooney, R. (2014). A synaptic and circuit basis for corollary discharge in the auditory cortex. Nature 513(7517), 189–194. Schön, D., Gordon, R. L., & Besson, M. (2005). Musical and linguistic processing in song perception. Annals of the New York Academy of Sciences 1060(1), 71–81. Schonwiesner, M., & Zatorre, R. J. (2009). Spectro-temporal modulation transfer function of single voxels in the human auditory cortex measured with high-resolution fMRI. Proceedings of the National Academy of Sciences 106(34), 14611–14616. Schubotz, R. I. (2007). Prediction of external events with our motor system: Towards a new framework. Trends in Cognitive Sciences 11(5), 211–218. Schulze, K., Zysset, S., Mueller, K., Friederici, A. D., & Koelsch, S. (2011). Neuroarchitecture of verbal and tonal working memory in nonmusicians and musicians. Human Brain Mapping 32(5), 771–783. Schwartze, M., Keller, P. E., & Kotz, S. A. (2016). Spontaneous, synchronized, and corrective timing behavior in cerebellar lesion patients. Behavioural Brain Research 312, 285–293. Schwartze, M., Rothermich, K., Schmidt-Kassow, M., & Kotz, S. A. (2011). Temporal regularity effects on pre-attentive and attentive processing of deviance. Biological Psychology 87(1), 146– 151. Seger, C. A., Spiering, B. J., Sares, A. G., Quraini, S. I., Alpeter, C., David, J., & Thaut, M. H. (2013). Corticostriatal contributions to musical expectancy perception. Journal of Cognitive Neuroscience 25(7), 1062–1077. Shadmehr, R., Smith, M. A., & Krakauer, J. W. (2010). 
Error correction, sensory prediction, and adaptation in motor control. Annual Review of Neuroscience 33, 89–108. Sokolov, A. A., Miall, R. C., & Ivry, R. B. (2017). The cerebellum: Adaptive prediction for movement and cognition. Trends in Cognitive Sciences 21(5), 313–332. Spencer, R. M. C., Ivry, R. B., & Zelaznik, H. N. (2005). Role of the cerebellum in movements: Control of timing or movement transitions? Experimental Brain Research 161(3), 383–396. Stewart, L., Overath, T., Warren, J. D., Foxton, J. M., & Griffiths, T. D. (2008). fMRI evidence for a cortical hierarchy of pitch pattern processing. PLoS ONE 3(1), e1470. Stoodley, C. J., & Schmahmann, J. D. (2009). Functional topography in the human cerebellum: A meta-analysis of neuroimaging studies. NeuroImage 44(2), 489–501. Stoodley, C. J., & Schmahmann, J. D. (2010). Evidence for topographic organization in the cerebellum of motor control versus cognitive and affective processing. Cortex 46(7), 831–844. Stupacher, J., Hove, M. J., Novembre, G., Schütz-Bosbach, S., & Keller, P. E. (2013). Musical groove modulates motor cortex excitability: A TMS investigation. Brain and Cognition 82(2), 127–136. Suga, N., & Ma, X. (2003). Multiparametric corticofugal modulation and plasticity in the auditory system. Nature Reviews Neuroscience 4(10), 783–794. Teki, S., Grube, M., & Griffiths, T. D. (2012). A unified model of time perception accounts for duration-based and beat-based timing mechanisms. Frontiers in Integrative Neuroscience 5, 90. Teki, S., Grube, M., Kumar, S., & Griffiths, T. D. (2011). Distinct neural substrates of duration-based and beat-based auditory timing. Journal of Neuroscience 31(10), 3805–3812.
Tervaniemi, M., Medvedev, S. V., Alho, K., Pakhomov, S. V., Roudas, M. S., Van Zuijen, T. L., & Näätänen, R. (2000). Lateralized automatic auditory processing of phonetic versus musical information: A PET study. Human Brain Mapping 10(2), 74–79. Tesche, C. D., & Karhu, J. J. T. (2000). Anticipatory cerebellar responses during somatosensory omission in man. Human Brain Mapping 9(3), 119–142. Thaut, M. H., Demartin, M., & Sanes, J. N. (2008). Brain networks for integrative rhythm formation. PLoS ONE 3(5), e2312. Thaut, M. H., McIntosh, G. C., Prassas, S. G., & Rice, R. R. (1992). Effect of rhythmic auditory cuing on temporal stride parameters and EMG patterns in normal gait. Neurorehabilitation and Neural Repair 6(4), 185–190. Thaut, M. H., Stephan, K. M., Wunderlich, G., Schicks, W., Tellmann, L., Herzog, H., … Hömberg, V. (2009). Distinct cortico-cerebellar activations in rhythmic auditory motor synchronization. Cortex 45(1), 44–53. Thaut, M. H., Trimarchi, P., & Parsons, L. (2014). Human brain basis of musical rhythm perception: Common and distinct neural substrates for meter, tempo, and pattern. Brain Sciences 4(2), 428– 452. Tillmann, B., Janata, P., & Bharucha, J. J. (2003). Activation of the inferior frontal cortex in musical priming. Annals of the New York Academy of Sciences 999, 209–211. Toiviainen, P., Alluri, V., Brattico, E., Wallentin, M., & Vuust, P. (2014). Capturing the musical brain with Lasso: Dynamic decoding of musical features from fMRI data. NeuroImage 88, 170–180. Tollin, D. J. (2003). The lateral superior olive: A functional role in sound source localization. Neuroscientist 9(2), 127–143. Tramo, M. J., Shah, G. D., & Braida, L. D. (2002). Functional role of auditory cortex in frequency processing and pitch perception. Journal of Neurophysiology 87(1), 122–139. Trost, W., Ethofer, T., Zentner, M., & Vuilleumier, P. (2012). Mapping aesthetic musical emotions in the brain. Cerebral Cortex 22(12), 2769–2783. 
Tseng, Y., Diedrichsen, J., Krakauer, J. W., Shadmehr, R., & Bastian, A. J. (2007). Sensory prediction errors drive cerebellum-dependent adaptation of reaching. Journal of Neurophysiology 98(1), 54– 62. Von Der Heide, R. J., Skipper, L. M., Klobusicky, E., & Olson, I. R. (2013). Dissecting the uncinate fasciculus: Disorders, controversies and a hypothesis. Brain 136(6), 1692–1707. Warren, J. D., Jennings, A. R., & Griffiths, T. D. (2005). Analysis of the spectral envelope of sounds by the human brain. NeuroImage 24(4), 1052–1057. Warren, J. D., Uppenkamp, S., Patterson, R. D., & Griffiths, T. D. (2003). Separating pitch chroma and pitch height in the human brain. Proceedings of the National Academy of Sciences 100(17), 10038–10042. Warren, J. E., Wise, R. J. S., & Warren, J. D. (2005). Sounds do-able: Auditory-motor transformations and the posterior temporal plane. Trends in Neurosciences 28(12), 636–643. Warrier, C., Wong, P., Penhune, V., Zatorre, R., Parrish, T., Abrams, D., & Kraus, N. (2009). Relating structure to function: Heschl’s gyrus and acoustic processing. Journal of Neuroscience 29(1), 61– 69. Wessinger, C. M., VanMeter, J., Tian, B., Van Lare, J., Pekar, J., & Rauschecker, J. P. (2001). Hierarchical organization of the human auditory cortex revealed by functional magnetic resonance imaging. Journal of Cognitive Neuroscience 13(1), 1–7. Wilkins, R. W., Hodges, D. A., Laurienti, P. J., Steen, M., & Burdette, J. H. (2014). Network science and the effects of music preference on functional brain connectivity: From Beethoven to Eminem. Scientific Reports 4(1), 6130.
Wilson, E. M. F., & Davey, N. J. (2002). Musical beat influences corticospinal drive to ankle flexor and extensor muscles in man. International Journal of Psychophysiology 44(2), 177–184. Witt, S. T., Laird, A. R., & Meyerand, M. E. (2008). Functional neuroimaging correlates of fingertapping task variations: An ALE meta-analysis. NeuroImage 42(1), 343–356. Wolpert, D. M., Miall, R. C., & Kawato, M. (1998). Internal models in the cerebellum. Trends in Cognitive Sciences 2(9), 338–347. Wu, J., Zhang, J., Ding, X., Li, R., & Zhou, C. (2013). The effects of music on brain functional networks: A network analysis. Neuroscience 250, 49–59. Wu, J., Zhang, J., Liu, C., Liu, D., Ding, X., & Zhou, C. (2012). Graph theoretical analysis of EEG functional connectivity during music perception. Brain Research 1483, 71–81. Zatorre, R. J. (2002). Auditory cortex. In V. S. Ramachandran (Ed.), Encyclopedia of the Human Brain (pp. 289–301). Amsterdam: Elsevier. Zatorre, R. J. (2015). Musical pleasure and reward: Mechanisms and dysfunction. Annals of the New York Academy of Sciences 1337(1), 202–211. Zatorre, R. J., Bouffard, M., & Belin, P. (2004). Sensitivity to auditory object features in human temporal neocortex. Journal of Neuroscience 24(14), 3637–3642. Zatorre, R. J., Halpern, A. R., Perry, D. W., Meyer, E., & Evans, A. C. (1996). Hearing in the mind’s ear: A PET investigation of musical imagery and perception. Journal of Cognitive Neuroscience 8(1), 29–46. Zatorre, R. J., & Salimpoor, V. N. (2013). From perception to pleasure: Music and its neural substrates. Proceedings of the National Academy of Sciences 110(Suppl. 2), 10430–10437. Zatorre, R. J., & Zarate, J. (2012). Cortical processing of music. In D. Poeppel, T. Overath, A. N. Popper, & R. R. Fay (Eds.), The human auditory cortex: Springer handbook of auditory research (Vol. 43, pp. 261–294). New York: Springer. Zysset, S., Huber, O., Ferstl, E., & von Cramon, D. Y. (2002). 
The anterior frontomedian cortex and evaluative judgment: An fMRI study. NeuroImage 15(4), 983–991.
CHAPTER 6
NETWORK NEUROSCIENCE: AN INTRODUCTION TO GRAPH THEORY NETWORK-BASED TECHNIQUES FOR MUSIC AND BRAIN IMAGING RESEARCH
ROBIN W. WILKINS
In this chapter, I provide an introduction to network neuroscience techniques and methods that may be successfully applied to neuroimaging data for brain-based music research. Included in this chapter is a background to the field of network science more broadly, as an approach to the study of complex systems, in addition to the more currently accepted graph theory techniques and applied analysis methods. The chapter has two main components. The first is an introductory overview of some of the specific network-based techniques that may be applied to neuroimaging data for understanding structural and functional brain connectivity. For those interested in pursuing the effects of music on brain
connectivity, it is important to understand that there is a difference between network-based brain connectivity analyses and other conventional correlation measures of connectivity analysis. This is particularly true within the most prominent area of resting-state connectivity research (Biswal, Kylen, & Hyde, 1997; Biswal, Yetkin, Haughton, & Hyde, 1995; Fox et al., 2005; Greicius, Krasnow, Reiss, & Menon, 2003) as well as the default mode network (Broyd et al., 2009; Buckner, Andrews-Hanna, & Schacter, 2008; Raichle, 2001). At present, terms such as “brain networks,” “functional connectivity,” or “brain connectivity” frequently appear in the brain imaging literature. Nonetheless, readers are cautioned that not all connectivity terms are scientifically interchangeable or mathematically equal in their approach. Non-trivial statistical differences depend on whether network-based or correlational statistical methods are used to analyze and describe structural and functional brain connectivity (Bassett & Sporns, 2017; Bullmore & Sporns, 2009; Greicius et al., 2003; Stam, 2014). In addition, the field of network neuroscience is currently highly active, and readers will find more refined network measures being generated and reported regularly. Because the field remains a neuroscientific frontier, the second section of this chapter presents some of the more promising implications of applying these network neuroscience techniques to advance our understanding of the effects of music on structural and functional brain network connectivity. Ultimately, the evidence gained by applying these techniques may prove useful for a host of neurological questions and neurorehabilitation avenues surrounding musical experiences and the brain.
Overview of Network Science
Network-based approaches to the study of complex systems have become ubiquitous in a wide variety of research areas (Barabási & Albert, 1999; Newman, 2003; Watts & Strogatz, 1998). Grounded in the mathematical foundation of graph theory (Euler, 1736), network methods have led to a greater understanding of the interactions between components in systems as disparate as social networks, biological systems, communication arrays, and transportation networks (Barabási, 2002; Newman, 2003; Watts, 2003;
Watts & Strogatz, 1998). In addition, the fields of neuroscience and neuroimaging have greatly benefited from a network science approach (Bassett & Sporns, 2017; Stam & Reijneveld, 2007). Studying the brain as a complex system presents an opportunity to understand how structural and functional features contribute to dynamic mental phenomena of the brain. Importantly, network-based methods open new opportunities in experimental design and move beyond correlation analyses of neuroimaging data by providing more advanced statistical measures for evaluating whole-brain connectivity (Bassett & Bullmore, 2006; Bullmore & Sporns, 2009; Fox, Zhang, Snyder, & Raichle, 2009; Sporns, Chialvo, Kaiser, & Hilgetag, 2004; Sporns, Tononi, & Kötter, 2005). Here, the brain is subdivided into regions (represented as network nodes), and interregional interactions (represented as network edges) are estimated from structural or functional neuroimaging modalities, including functional magnetic resonance imaging (fMRI), electroencephalography (EEG), diffusion tensor imaging (DTI), and magnetoencephalography (MEG) (Friston, Frith, Turner, & Frackowiak, 1995; Logothetis, 2008; Stam & Reijneveld, 2007; Tuch, Reese, Wiegell, & Wedeen, 2003; Wedeen, Hagmann, Tseng, Reese, & Weisskoff, 2005). Recent advances in a network-based understanding of the brain depart dramatically from the conventional approaches of brain-activation-focused experiments and earlier statistical analysis methods for neuroimaging data (Savoy, 2005; Shirer, Ryali, Rykhlevskaia, Menon, & Greicius, 2012). Now, rather than trying to understand brain function through isolated areas of brain response activation, researchers are able to explore neurological responses throughout the entire brain, as an interconnected system. The recognition that the brain is a complex system is transforming our more traditional understanding of it (Bassett & Sporns, 2017; Betzel et al., 2012). 
Approaching the brain as a system presents an opportunity to uncover patterns in interregional interactions that are not apparent with conventional neuroimaging approaches to experimental design and analysis methods (Bassett, Khambhati, & Grafton, 2017; Bassett & Sporns, 2017; He & Evans, 2010; Sporns et al., 2005). This is particularly advantageous for questions surrounding music and brain imaging research. Unlike conventional neuroimaging analyses, a central impetus behind network-based analyses arises from the hypothesis that a network approach provides a more accurate representation of the brain as an interconnected system, an
organizational property that is often overlooked in more conventional neuroscientific approaches (Telesford, Simpson, Burdette, Hayasaka, & Laurienti, 2011). Perhaps more importantly, network methods allow for a statistically principled investigation of different brain states and neurological disorders under a common representational framework (Bassett & Bullmore, 2009; Moussa et al., 2011; Sporns et al., 2004). Network-based methods not only refine the outcomes of existing techniques, but also typify a paradigm shift for representing the brain’s structure and functional connectivity dynamics. This approach offers quantitatively different maps, where networks, consisting of nodes (e.g., voxels of neurons or brain regions) and links (e.g., anatomical or functional connections), are endowed with topological properties. Studying the brain at these various levels has led to the emergence of substantial evidence from the newer field of network neuroscience, a now firmly established brain-based scientific frontier (Bassett & Sporns, 2017). Within the brain, music affects an intricate set of complex neural processing systems (Alluri et al., 2012, 2013; Koelsch, 2009; Schlaug, 2001, 2009a; Thaut, Demartin, & Sanes, 2008; Wilkins, 2015; Wilkins, Hodges, Laurienti, Steen, & Burdette, 2012, 2014; Zatorre, Evans, Meyer, & Gjedde, 1992). These include structural components associated with sensory processing as well as functional elements implicated in memory, cognition, and mood fluctuation. Because music affects such diverse systems in the brain, it is an ideal candidate for analysis using a network-based approach (Guye, Bettus, Bartolomei, & Cozzone, 2010; Wilkins, 2015). A network approach represents a conceptual revolution beyond standard statistical approaches by bringing together researchers from a variety of disciplines to work on complex problems that defy understanding through confinement within any single discipline (West, 2011). 
With recent technological and analytical advances, we are witnessing an explosion in the quantity of network data and the subsequent comprehensiveness of information gleaned by generating network-based maps of complex systems’ data at each spatiotemporal scale. Importantly, network-based methods offer a natural mathematical framework that not only refines the outcomes of existing statistical analysis techniques, but also typifies a paradigm shift for representing complex systems’ structure and dynamics. New and highly detailed information may now be extracted from the intricacies of complex systems (Mitchell, 2009; Strogatz, 2001).
Consequently, new and rewarding solutions are being obtained to address problems important to society (Wang, González, Hidalgo, & Barabási, 2009; West, 2011). For readers interested in learning more about the emerging area of networks, the book Linked gives a user-friendly account of developments in the study of networks (Barabási, 2002). In addition, Six Degrees offers a sociologist’s view of historical discoveries, both old and new (Watts, 2003).
Introduction to Network Metrics
As a field of interdisciplinary statistical physics, network science provides a host of robust statistical techniques and methods for investigating the structure and function of complex systems that display behaviors that defy explanation by the study of the systems’ elements in isolation (Barabási & Albert, 1999; Girvan & Newman, 2002). Network science is based on the branch of mathematics called graph theory (Euler, 1736; Newman, 2003). A graph is simply a mathematical representation of any real-world network that is made up of interconnected elements. In its most basic form, a graphed network is a collection of points, referred to as vertices or nodes, connected together by lines, referred to as links or edges (see Fig. 1). A simple graph is thus a set of nodes together with a set of edges. Nodes represent the fundamental elements of the system, such as people, and the edges represent the connections between pairs of nodes, such as friendships between pairs of people. It is important to note that networks can be either directed or undirected, depending on the type of network and the data provided. Undirected networks are networks where information can pass in either direction along a link, with no particular or specific flow pattern. Directed networks, on the other hand, imply that information flows in one direction along each link. Finally, networks can be weighted or unweighted, depending on whether each link carries a measure of connection strength or simply records that a connection exists. For a more detailed discussion, see Newman (2006).
FIGURE 1. Demonstration of a network. This network is comprised of 13 nodes. Nodes are shown as numbered circles. Nodes are connected to other nodes within the network by edges or links (shown as connecting lines).
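The distinctions drawn above (undirected versus directed, weighted versus unweighted) can be made concrete with a small sketch. The five-node network, its edges, and its weights are invented for illustration and do not correspond to Fig. 1; the helper names are hypothetical, and only the Python standard library is used.

```python
# Toy network: the node labels, edges, and weights below are invented for illustration.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
weights = [0.9, 0.4, 0.7, 0.2, 0.5]  # hypothetical connection strengths, one per edge


def build_undirected(edge_list):
    """Adjacency sets for an undirected graph: each edge is stored in both directions."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def build_directed(edge_list):
    """Adjacency sets for a directed graph: an edge (u, v) runs from u to v only."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set())  # keep v as a node even if it has no outgoing edge
    return adjacency


def build_weighted(edge_list, edge_weights):
    """Undirected weighted graph stored as {node: {neighbor: weight}}."""
    adjacency = {}
    for (u, v), w in zip(edge_list, edge_weights):
        adjacency.setdefault(u, {})[v] = w
        adjacency.setdefault(v, {})[u] = w
    return adjacency


undirected = build_undirected(edges)
directed = build_directed(edges)
weighted = build_weighted(edges, weights)

print(sorted(undirected[3]))  # neighbors of node 3 when direction is ignored
print(sorted(directed[3]))    # nodes that node 3 points to
print(weighted[3][4])         # strength of the link between nodes 3 and 4
```

The same five edges thus yield three different objects, which is why the choice of network type needs to be reported alongside any network statistic.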
The most basic network metric is degree. Within a network, the degree of a node is simply the number of connections the node has to other nodes within the rest of the network (Bullmore & Sporns, 2009; Strogatz, 2001). The degrees of all the nodes within the network form a degree distribution (Amaral, Scala, Barthelemy, & Stanley, 2000). In random networks, where all connections are equally possible, the degree distribution is typically Gaussian (i.e., normal) with a symmetrically centered distribution. Complex networks, on the other hand, generally result in a non-Gaussian degree distribution with a long tail toward high degree nodes. In the depiction of the network shown in Fig. 2, nodes are connected by links. Node 9 has edges or links that connect it to four other nodes within the network. Thus, Node 9 in Fig. 2 has a degree of four. The path length between two nodes is calculated by counting the minimum number of edges information must pass through when going from one node to the other within the network. The path length of a network can be compared to that of a randomly connected network with the same number of nodes and connection links. Likewise, in any network collection of nodes, the degree of the collection can be compared to the degree that might occur in a randomly connected network of the same size (i.e., the total number of nodes within the network) and density (i.e., the proportion of possible connections that are actually present). In Fig. 3, we can see that nodes within a network can have an equal probability of connecting to each and every other node within the network. If all nodes in the network connect to all of their nearest neighboring nodes, we would say that the network is regular (i.e., completely connected to its nearest neighbors). If, on the other hand, we investigated the possibility of the connections of a node within a random network, we would see a different result. In random networks, all connections are equally probable, resulting in an approximately Gaussian degree distribution.
FIGURE 2. Demonstration of the Network Statistic Degree. This figure depicts the degree of a network node. In this network, Node 9 has edge connections to four other nodes in the network. Thus, Node 9 has a degree of four. Note that Node 9 connects to Nodes 6, 7, 8, and 11 within the network but does not connect to Node 10 or Node 12.
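Degree, the degree distribution, and path length can each be computed in a few lines. In the sketch below, only Node 9's links to Nodes 6, 7, 8, and 11 are taken from the description of Fig. 2; the remaining edges are assumptions added so that path lengths can be illustrated, and the function names are hypothetical.

```python
from collections import Counter, deque

# Node 9's four links follow the Fig. 2 description; the other edges are assumed.
edges = [(9, 6), (9, 7), (9, 8), (9, 11), (6, 7), (11, 12), (12, 10)]

adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)


def degree(adj, node):
    """Number of links attached to a node."""
    return len(adj[node])


def shortest_path_length(adj, source, target):
    """Minimum number of edges between two nodes (breadth-first search).

    Returns None when no path exists.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adj[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Degree distribution: how many nodes have each degree value.
degree_distribution = Counter(len(neighbors) for neighbors in adjacency.values())

print(degree(adjacency, 9))                   # 4, matching the Fig. 2 caption
print(shortest_path_length(adjacency, 8, 10)) # 4 edges: 8-9-11-12-10 in this toy network
print(dict(degree_distribution))
```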
FIGURE 3. Depiction of three networks: regular, small-world, and random. This figure demonstrates differences in connections within three networks that have the same number of nodes. The regular network has connections with all neighboring nodes but no long-range connections. The random network has haphazard connections throughout the network. In contrast, the small-world network has primarily nearest neighbor connections but also some long-range connections across the network. This is referred to as the “small-world” effect. Small-world networks have been revealed to be a property of the brain.
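The effect of a few long-range connections on a regular network can be checked numerically. The following is a minimal sketch, not the procedure used to draw Fig. 3: it builds a ring lattice in which every node links to its two nearest neighbors on each side, then adds two hand-picked shortcuts. Adding shortcuts, rather than rewiring existing links as in Watts and Strogatz (1998), is a simplification made here so the example stays deterministic; the network size and shortcut positions are arbitrary assumptions.

```python
from collections import deque


def ring_lattice(n, k):
    """Regular ring: each node is linked to its k nearest neighbors on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def bfs_distances(adj, source):
    """Shortest path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def average_path_length(adj):
    """Mean shortest path length over all node pairs (assumes a connected network)."""
    total = 0
    pairs = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        for t in adj:
            if t != s:
                total += dist[t]
                pairs += 1
    return total / pairs


regular = ring_lattice(20, 2)

# Copy the lattice and add two long-range shortcuts (positions chosen arbitrarily).
small_world = {node: set(neighbors) for node, neighbors in regular.items()}
for u, v in [(0, 10), (5, 15)]:
    small_world[u].add(v)
    small_world[v].add(u)

regular_length = average_path_length(regular)
small_world_length = average_path_length(small_world)
print(regular_length, small_world_length)
```

Two extra links out of forty are enough to shorten the average path, while the nearest-neighbor structure of the lattice is left intact.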
As shown in Fig. 3, in the regular network we can see that each node is connected to each and every other neighboring node, but does not have
long-range connections to nodes across the network. The regular network is considered completely connected at the level of nearest neighbors. However, in a random network, node connections are arbitrary. In contrast to both the regular and the random network, the small-world network depicted in the center of Fig. 3 shows that most nodes connect to neighboring nodes. However, this network has a few nodes with long-range connections to other network nodes. Thus, while the regular network has a lot of node-to-nearest-neighbor connections, the small-world network also has a few distinct long-range nodal connections that, in turn, generate close proximity through direct connectivity (Amaral et al., 2000). These direct connections are found regardless of node location (i.e., regional proximity). This phenomenon of a “small-world” effect is a widely recognized characteristic of complex brain networks (Bassett & Bullmore, 2006; Watts & Strogatz, 1998). In random networks, all node degree connections are possible. In most complex systems, however, high degree nodes tend to connect to other high degree nodes. In other words, the network does not scale regularly. A scale-free network is a network where the degree distribution follows a power law. Thus, in complex systems, rather than high degree nodes exhibiting random connection to any particular node, high degree nodes tend to self-select by connecting to other high degree nodes and therefore generate a non-Gaussian distribution that is scale-free. Intuitively, when considered as a characteristic framework for understanding the brain, this makes sense. The brain selectively utilizes its high degree connections as resources in an efficient fashion in order to coordinate a host of widely distributed system-level functions. These complex networks are termed “scale-free.” To recap, nodes in complex systems, such as the brain, generally have a non-Gaussian degree distribution, often with a long tail toward a high degree.
Complex brain networks exhibit characteristics of small-world networks where nodes tend to connect to other nodes in disparate regions of the network (Bullmore & Sporns, 2012). Finally, the degree distributions of nodes in complex networks are scale-free and follow a power law (Barabási & Albert, 1999). If the nearest neighbors of a node are also directly connected to each other they form a cluster (Watts & Strogatz, 1998). Nodes that tend to cluster are considered hubs (see Fig. 4). As the term implies, hubs function as connection “interchanges” within the network. The clustering coefficient
quantifies the number of connections that exist between the nearest neighbors of a node as a proportion of the maximum number of possible connections. Random networks have a low average clustering whereas complex networks typically have high clustering. Those nodes with high degrees, as hubs, are considered central to the network and can demonstrate their importance to the overall functioning brain network. This is important when considering application to the brain. Understanding brain function, and how it may be structurally or functionally altered or remediated via musical experiences, has important implications for understanding the effects of music and musical training as well as treating a variety of neurological conditions and disorders (El Haj, Fasotti, & Allain, 2012; Hodges & Wilkins, 2015; Hyde et al., 2009; Schlaug, 2009a; Wilkins, 2015; Wilkins et al., 2012, 2014; Wilkins et al., 2018; Wong, Skoe, Russo, Dees, & Kraus, 2007).
FIGURE 4. Demonstration of a hub. Node 7, shown as a darker circle, is central to all the other nodes in the network and is therefore a hub. Note that Node 7 has a degree of five, but due to its high centrality, Node 7 is also considered a hub within the entire network.
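The clustering coefficient described above, the number of links that exist among a node's nearest neighbors divided by the maximum number possible, can be written directly from that definition. The toy network is invented for illustration and is not the network of Fig. 4; the function name is hypothetical.

```python
from itertools import combinations

# Invented toy network, not taken from any figure in this chapter.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5)]

adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)


def clustering_coefficient(adj, node):
    """Fraction of possible links among a node's neighbors that actually exist."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no neighbor pairs to connect
    existing = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    possible = k * (k - 1) / 2
    return existing / possible


for node in sorted(adjacency):
    print(node, round(clustering_coefficient(adjacency, node), 3))

# Network-level summary: the mean of the node-level coefficients.
average_clustering = sum(
    clustering_coefficient(adjacency, node) for node in adjacency
) / len(adjacency)
print(round(average_clustering, 3))
```

Node 1 has three neighbors (2, 3, and 4), and two of the three possible links among them exist, so its coefficient is 2/3. Node 5 has a single neighbor and is assigned zero by convention in this sketch.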
Hubs are part of a class of network measurements termed centrality. Centrality analysis measures how many of the shortest paths between pairs of nodes pass through a given node, that is, how often information must pass through that node on its way to its final destination within the network (Zuo et al., 2011; see Fig. 4). Centrality measures are currently an active and ongoing area of research, and there are several specific mathematical approaches to calculating unique characteristics of centrality metrics in the brain, including betweenness, eigenvector, and leverage centrality, among others (Borgatti, 2005; Joyce, Laurienti, Burdette, & Hayasaka, 2010; Newman, 2005). In concept, centrality functions like highway interchanges or subway “transfer-stops”
by calculating those nodes, as central hubs, that play an important functional role in the network. A node with high centrality, as a hub, is considered crucial to the network. As one could envision in Fig. 4, if the central hub is damaged or removed the network will become fragmented and communication across the network will be affected accordingly. Conversely, yet perhaps equally enticingly, if a hub were able to be restored or trained, there would be functional implications as well. Evidence indicates that the function of a complex network requires the maintenance of specific hubs that have high degree connections as node clusters. These hubs, importantly, are not necessarily adjacent and may be located in widely distributed brain regions (Bullmore & Sporns, 2012). Presently, there are provincial hubs that have high within-module degree and low participation coefficient as well as connector hubs with a high participation coefficient. However, the most widely accepted metric currently substantiated in the brain imaging literature is the “rich club,” those regions with densely interconnected connector hubs (Bullmore & Sporns, 2012). The selection and removal of a few critical nodes that are hubs can inflict havoc and potentially dismantle the entire functional or structural network (Albert, Jeong, & Barabási, 2000). Again, this has implications for the brain. Evidence from network neuroscience has exposed how the brain’s network resilience to attack helps protect its fragility and potential vulnerabilities. Damage within brain regions or specific trauma to particular brain network hubs would likely have impact on the brain functional network. 
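Both ideas in this passage, that a hub lies on many shortest paths and that removing it fragments the network, can be illustrated on a small example. The sketch below computes betweenness centrality for an unweighted, undirected network using a standard breadth-first accumulation scheme (Brandes' algorithm, named here as the technique used in the sketch rather than the method of any cited study), and then counts connected components after the hub is deleted. The network is hypothetical: Node 7 is given a degree of five, as in the Fig. 4 caption, but the remaining wiring is assumed.

```python
from collections import deque

# Hypothetical network: Node 7 links to five nodes; the other two edges are assumed.
edges = [(7, 1), (7, 2), (7, 3), (7, 4), (7, 5), (1, 2), (3, 4)]

adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)


def betweenness_centrality(adj):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    centrality = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of increasing distance from s
        predecessors = {v: [] for v in adj}
        path_count = {v: 0 for v in adj}  # number of shortest paths from s
        path_count[s] = 1
        dist = {v: -1 for v in adj}
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate from the farthest nodes inward
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != s:
                centrality[w] += dependency[w]
    # Each unordered pair was counted from both endpoints in an undirected graph.
    return {v: c / 2 for v, c in centrality.items()}


def connected_components(adj):
    """List of node sets, one per connected piece of the network."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in component:
                    component.add(w)
                    queue.append(w)
        seen |= component
        components.append(component)
    return components


def remove_node(adj, node):
    """Copy of the network with one node and all of its links deleted."""
    return {v: neighbors - {node} for v, neighbors in adj.items() if v != node}


centrality = betweenness_centrality(adjacency)
print(centrality[7])                                        # 8.0: eight node pairs route through Node 7
print(len(connected_components(adjacency)))                 # 1 piece with the hub in place
print(len(connected_components(remove_node(adjacency, 7)))) # 3 pieces once the hub is removed
```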
Conversely, if external stimuli such as music or experiences in musical training can potentially re-route connections to specific hubs in brain regions important for healthy brain function, or even temporarily restore hub connections within traumatized regions, research suggests the brain may experience enhanced or therapeutic functional results (see Fig. 7) (Raglio et al., 2015; Sachs, Ellis, Schlaug, & Loui, 2016; Shirer et al., 2012; Sihvonen et al., 2017; Thaut et al., 2009; Wilkins et al., 2012, 2014). This would also be demonstrated in related functional brain concepts within the neuroimaging literature such as neuroplasticity, neurorestoration, and neurorehabilitation (Herholtz & Zatorre, 2012; Kraus & Chandrasekaran, 2010; Schlaug, 2009a, 2009b; Zatorre & Samson, 1991). Assortativity is the correlation between the degrees of connected nodes. Positive assortativity indicates that high degree nodes tend to preferentially
self-select to connect with other high degree nodes. Again, these degree distributions, where high degree nodes connect to other high degree nodes, result in the “small-world phenomenon” (Barabási & Albert, 1999; Watts & Strogatz, 1998). A negatively assortative network, on the other hand, indicates that high degree nodes tend to connect to low degree nodes. Community structure is a network metric for the measurement of the interconnectedness of nodes within a network (Newman & Girvan, 2004). Somewhat similar in concept to the partition approach when similar types of houses can be mapped into local nearby geographic sections or neighborhoods, community structure measures the topological configuration of the network by partitioning the network to calculate those nodes that exhibit and share more inner connections than outer node connections (see Fig. 5). Community structure analysis is performed by creating non-overlapping collections of highly interconnected nodes, or “modules” of nodes, that are statistically more connected to each other than to other nodes within the overall network (Girvan & Newman, 2002; Newman & Girvan, 2004). Modules are subsets of strongly connected nodes within the brain network. Modularity is defined as the quality of a particular partition of the network into modules (Newman & Girvan, 2004). Computationally, modularity (often referred to as Q) reflects the number of links between nodes within a module minus what would be expected given a random distribution of links between all nodes regardless of modules. This value varies from 0 to 1, with a higher value reflecting stronger community structure. In brief, in order to calculate the consistency of modular organization across time, the networks are first partitioned into distinct modules (i.e., separate communities) using a choice of algorithm approaches such as those found in Blondel, Guillaume, Lambiotte, and Lefebvre (2008), among others. 
These methods include optimization algorithms for modularity analysis that operate by identifying, through an iterative process, partitions of the network into subsets of highly connected nodes compared to other connected nodes’ modularity. In community structure detection procedures, the brain network is partitioned through multiple iterations, as repetitive calculations to detect which subdivisions throughout the entire network have modules that result in the maximum number of within-group edges and the minimum number of between-group edges (Newman & Girvan, 2004).
FIGURE 5. Community Structure. This figure depicts how a network (left panel) can be analyzed into separate communities. Community structure is a statistical detection procedure that measures those nodes that exhibit more highly interconnected nodes, compared to other nodal connections within the network. This network has three sub-graphed communities (shown in green, red, and blue circles, middle and right panels). Notice that each community is still sparsely connected, through connector hubs, to other nodes that are in other communities. Communities can be highly connected despite their spatial or regional proximity within the brain. Community structure is a statistic that is also referred to as “modularity.”
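The verbal definition of modularity given above, links within modules minus what would be expected if links were placed at random, corresponds to the Newman and Girvan (2004) quantity Q = Σ_c [L_c/m − (d_c/2m)²], where m is the total number of links, L_c the number of links inside module c, and d_c the summed degree of the nodes in module c. The sketch below evaluates Q for a given partition; it does not search for the best partition, which is what algorithms such as that of Blondel et al. (2008) do. The six-node network and both partitions are invented for illustration.

```python
# Two triangles joined by a single link; an invented example.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]


def modularity(edge_list, communities):
    """Newman-Girvan modularity Q of a partition of an unweighted, undirected graph."""
    m = len(edge_list)
    membership = {
        node: index for index, community in enumerate(communities) for node in community
    }
    node_degree = {}
    for u, v in edge_list:
        node_degree[u] = node_degree.get(u, 0) + 1
        node_degree[v] = node_degree.get(v, 0) + 1

    internal_links = [0] * len(communities)   # L_c: links with both ends in module c
    summed_degree = [0] * len(communities)    # d_c: total degree of nodes in module c
    for u, v in edge_list:
        if membership[u] == membership[v]:
            internal_links[membership[u]] += 1
    for node, d in node_degree.items():
        summed_degree[membership[node]] += d

    return sum(
        internal_links[c] / m - (summed_degree[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )


two_modules = [{1, 2, 3}, {4, 5, 6}]
one_module = [{1, 2, 3, 4, 5, 6}]

q_two = modularity(edges, two_modules)
q_one = modularity(edges, one_module)
print(round(q_two, 4))  # 0.3571: the two triangles form a clear community structure
print(q_one)            # 0.0: a single module is no better than chance
```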
Community detection procedures are computationally intensive and are impacted by the choice of node parcellation schemes. In addition, an atlas or region-of-interest (ROI) based network will necessarily be different from a voxel-based network, due to the size of the network and node selection. Robust network partitioning into modules requires partitioning the individual network into modules across multiple iterations, in order to capture the most representative modular structure (Blondel et al., 2008; Fortunato, 2010; Newman, 2006). In an effort to calculate module comparisons based on groups of people or different conditions, datasets from groups of people or conditions can be further strengthened through the application of an additional statistical procedure termed Scaled Inclusivity (Steen, Hayasaka, Joyce, & Laurienti, 2011). Scaled Inclusivity takes into account each subject’s modules and then cross-compares it to each and every other person’s modules to determine which subject’s modules are most representative of the group (Stanley et al., 2013; Steen et al., 2011; Wilkins et al., 2014). Importantly, scaled inclusivity also accounts for the negative (absence) of a node within each person’s module and thus “scales” the calculation accordingly. Again, there are several different community
detection procedures that divide the functional subsets within the network across the brain topology and are measured through several different optimization procedures (Blondel et al., 2008; Fortunato, 2010; Mucha, Richardson, Macon, Porter, & Onnela, 2010). Community structure analysis, which calculates nodes that share connections with each other as non-overlapping groups, is also called modularity (Newman, 2006). In closing this introductory section on network methods, there are a host of robust statistical graph theory approaches that can be used to describe networks that are beyond the scope of this chapter including, but not limited to: multiplex, multilayer, multislice, multitype, hierarchical, multiweighted, interacting, interdependent, and coupled networks. For a complete review of fundamental brain network measurements, see Rubinov and Sporns (2010). In summary, there are numerous network-based metrics that can be applied to brain imaging data. In any network, there can be different, yet potentially equally informative, measurements about the components of a network. These graph theory techniques account for characteristics of the network by measuring specific components and their unique interactions (Telesford et al., 2011). The choice of nodes for network generation frequently varies from study to study. It is important to stress that the choice of node parcellation scheme and procedure is key for understanding the robustness of a particular network and subsequent results. Research has substantiated that voxel-based brain imaging networks differ substantially from region or atlas based networks in terms of choice of nodal parcellation (Cohen et al., 2008; Craddock, James, Holtzheimer, Hu, & Mayberg, 2012; Hayasaka & Laurienti, 2010; Mumford et al., 2010; Stanley et al., 2013).
Depending on the type of imaging modality (e.g., fMRI, EEG, DSI, DTI, or MEG), the choice of node parcellation scheme(s) and approach to the actual node selection will, necessarily, be different. Currently, there is an absence of a fully agreed upon approach to node selection and studies can range from single neuron to voxel-based as well as brain regions-of-interest primarily determined by the neuroimaging literature brain atlases (Craddock et al., 2012; Power et al., 2011; Stanley et al., 2013; Wang, Zuo, & He, 2010). This inherently alters how the connectivity results and analyses are interpreted. A brain network comprised of a 90-node network is obviously going to be different than network-based statistics performed on a 21,000 voxel-based network. The means of node selection in brain
networks largely determines the subsequent neurobiological interpretation of the network results (Butts, 2009). Readers are again encouraged to determine whether research reports have selected nodes based on results from previous neuroimaging literature a priori, somewhat like a predefined seek-and-search, which may eliminate important information before the results and analyses are performed, or whether the brain network and statistics were generated without biases a priori and the subsequent analyses performed without prior intentional selection toward findings in any particular region or specific area of the brain. Again, neither is necessarily “better” than the other, but it is certainly worth making the distinction as the field of music and brain connectivity research moves forward. In closing, this section highlights the fundamental graph theory metrics from network science. Each network-based statistic provides a different layer of information that leads to a fuller understanding of brain connectivity.
Generating Brain Networks: Steps for Network-Based Neuroimaging Analysis
Generating a brain network requires multiple processing steps for analyses. In brief, functional magnetic resonance imaging (fMRI) or other neuroimaging data (EEG, MEG) are acquired. Once the data are acquired, several statistical procedures are applied to prepare the data for network analysis. These procedures are typically performed as data preprocessing steps but are frequently reported under the data processing section within peer-reviewed research reports. The preprocessing of fMRI data involves skull stripping of the acquired neuroimaging data (i.e., revealing the brain only) and the application of several essential statistical procedures that include motion correction, slice timing correction, realignment, coregistration of structural and functional images, normalization, and smoothing. An excellent explanation of the statistical techniques used on fMRI data may be found in Lindquist et al. (2018). Processing fMRI data for network-based analysis is only performed after completion of the preprocessing and correlation procedures through a series of statistical steps via command line data processing. There are several fMRI data processing applications available online such as the FMRIB Software Library (FSL), AFNI, FreeSurfer, Diffusion Analysis and Tracula, and Statistical Parametric Mapping (SPM).
Brain network generation and analysis is currently an active area of research. As a result, network-based analyses include emerging procedures and new statistical methods that are being created and applied, with new results being published regularly. Unlike the more conventional connectivity analyses, generating brain networks (i.e., graph-theory based networks) subsequent to the data processing phase requires several more advanced statistical procedures in order to achieve actual graph-theory based network generation and analysis. Due to the high computational load, network procedures and approaches as well as most state-of-the-art network processing and analyses are still managed through various in-house data processing scripts, typically in UNIX/Linux, MATLAB, and/or Python computing languages. However, there are several useful network toolkits and software applications that are freely available including the Brain Connectivity Toolbox, the Functional Connectivity (CONN) Toolbox, and GraphVar (Kruschwitz, List, Waller, Rubinov, & Walter, 2015; Rubinov & Sporns, 2010; Whitfield-Gabrieli & Nieto-Castanon, 2012). Due to the nature of network neuroscience as an emerging field in brain science, there is also the option of developing new statistical network measurements and approaches, including more advanced computer scripts, that apply to specific procedures or statistical analyses. At present, most of these are created for a new network property or for comparing different properties. This process will continue as the field grows and will certainly further advance our understanding of both structural and functional brain networks in terms of cognition and perception, in addition to neurological health and disease.
These newer network analyses statistics and algorithms are typically published in methods sections and are frequently reported under methods as “in-house” processing scripts, many times in the supplemental methods section of a peer-reviewed publication. It is quite common for new network statistics and in-house processing scripts to be employed for working with fMRI data for network analysis. Thus, apart from the aforementioned network statistics, the field remains to be defined fully in terms of which newer network methods are considered sufficiently robust as “gold standards.” Again, researchers are cautioned that this is particularly true for node parcellation and node choice selection (Stanley et al., 2013). For any network analysis, once the fMRI data have been processed, a connectivity matrix must be generated. In brief, for connectivity analysis
(often referred to as “functional connectivity”), a cross-correlation procedure is applied between each node and each and every other node. Current neuroimaging technology limits functional brain network analysis to nodes above the millimeter scale, meaning that many potentially interacting neurons and synapses will be represented together within each individual node in human brain networks. Once the cross-correlation (i.e., the connectivity) matrix is generated, a thresholding statistic is applied to the data. A set of statistical thresholding procedures is performed on each correlation so that the resulting matrix can be binarized to reveal the strongest connections in the network. Thresholding is currently another active area of network research (Van den Heuvel et al., 2017). Thresholding, by design, eliminates at least some of the brain network connections. Correlation matrices can be measured through thresholding iterations across all possible data points, from 0.01 to 1. Indeed, thresholding procedures have been applied across varying data points and examined for their robust characteristics. For example, when the threshold is expressed as the proportion of connections retained, having too high a threshold (e.g., 0.95 or 1.0) necessarily includes nearly all correlation connections (exceedingly strong and very weak). Thus, the resulting thresholded matrix is not informative. However, results reveal that similarly sized networks show less inter-subject network fragmentation with thresholds set at 0.2, 0.25, or 0.3. There are currently several different statistical approaches to thresholding, and the choice among them is not inconsequential; these include proportional, relative, and absolute thresholds, among others (Van den Heuvel et al., 2017). Researchers interested in reading more about different consequences that may result from varying threshold statistical approaches will find more detailed information in Van Wijk, Stam, and Daffertshofer (2010) and Van den Heuvel et al. (2017).
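The trade-off described here can be seen in a toy example of proportional thresholding, in which a fixed fraction of the strongest correlations is kept and the result is binarized. The correlation values below are invented, the function names are hypothetical, and a five-node matrix is far too small for the specific threshold values quoted above to carry over; the sketch only shows the direction of the effect.

```python
# Invented, symmetric correlation matrix for five nodes (illustration only).
corr = [
    [1.0, 0.8, 0.7, 0.1, 0.2],
    [0.8, 1.0, 0.6, 0.2, 0.1],
    [0.7, 0.6, 1.0, 0.3, 0.2],
    [0.1, 0.2, 0.3, 1.0, 0.9],
    [0.2, 0.1, 0.2, 0.9, 1.0],
]


def proportional_threshold(matrix, density):
    """Keep the strongest `density` fraction of node pairs and binarize the result."""
    n = len(matrix)
    pairs = sorted(
        ((matrix[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    keep = round(density * len(pairs))
    adjacency = [[0] * n for _ in range(n)]
    for _, i, j in pairs[:keep]:
        adjacency[i][j] = 1
        adjacency[j][i] = 1
    return adjacency


def count_edges(adjacency):
    """Number of undirected links in a binary adjacency matrix."""
    return sum(sum(row) for row in adjacency) // 2


def count_components(adjacency):
    """Number of disconnected pieces; more pieces means a more fragmented network."""
    n = len(adjacency)
    seen = set()
    components = 0
    for start in range(n):
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if adjacency[i][j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
    return components


for density in (0.2, 0.5, 1.0):
    a = proportional_threshold(corr, density)
    print(density, count_edges(a), count_components(a))
```

In this toy matrix, keeping 20% of the pairs leaves the network in three pieces, keeping 50% yields a single connected network, and keeping 100% retains every pair, weak or strong, so the binarized matrix no longer distinguishes anything.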
Again, the goal of thresholding the correlation matrix is to preserve the strongest connections and density of the network. Additionally, thresholding procedures are implemented to prevent excessive fragmentation and inadvertent insertion of randomness into the data, while simultaneously eliminating the weaker connections. All thresholding is performed on the connectivity matrices prior to applying any graph theory statistics for network-based analyses. The result from this thresholding procedure is considered a widely accepted and most fundamental step prior to any network-based analysis. The results of thresholding procedures reveal the adjacency matrix (Aij). It is important to note that, unlike typical correlation analyses of resting-state data with music as functional connectivity analyses
(e.g., intrinsic connectivity, radial connectivity) oftentimes reported in the brain imaging literature, all advanced network-based statistics and analyses are performed on the adjacency matrix data. Thus, the choices of parcellation scheme in terms of node selection and thresholding procedures are critical for examining brain networks. A current lack of an agreed upon approach to node selection has led to the analysis of functional brain networks across an extensive range of scales. While individual neurons may be considered as nodes, this has only been successful for more simplistic networks, such as that of C. elegans (Sporns & Kötter, 2004; Towlson, Vertes, Ahnert, Schafer, & Bullmore, 2013). It is still not currently possible to noninvasively image or computationally analyze the brain’s estimated 100 billion neurons, each with ∼7,000 synapses (Stanley et al., 2013). Presently, a comprehensive and unanimously agreed upon nodal definition is still outstanding, making the selection of node options one of the more central challenges in network analyses of neuroimaging data (Stanley et al., 2013). Again, readers are encouraged to note that not all connectivity approaches reported in the neuroimaging literature are mathematically or statistically interchangeable in their approaches. While prevalent brain connectivity literature employs correlation procedures, network-based (graph theory) connectivity methods stem from the field of network science. A full explanation of the technical and statistical steps used in brain imaging is found in the wide set of fMRI literature, although several articles highlight components of these techniques and network region-of-interest or voxel-based network comparisons (Hagberg, Schult, & Swart, 2008; Hayasaka & Laurienti, 2010). A complete review of statistics for fMRI data can be found in Lindquist et al. (2018). Fig. 6 is a pictorial description of a more typical data processing stream and network generation pipeline.
The pipeline depicted here is for fMRI data. Each of these steps must be performed before any network-based statistics can be applied to individual datasets and any network-based statistical comparisons can be made across groups of people.
FIGURE 6. Processing stream for brain network analysis. Functional time series are correlated and then binarized through thresholding procedures to create an adjacency matrix, representing the strongest connections between every possible pair of nodes. The adjacency matrix is subsequently mapped onto brain space following network-based statistical analyses. For network analysis, functional magnetic resonance imaging (fMRI) data is processed in multiple steps through what is typically referred to as a pipeline. Reproduced from Wilkins (2015).
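The final stages of the processing stream, from node time series to correlation matrix to adjacency matrix to a first network statistic, can be sketched end to end. Everything in the example is an assumption made for illustration: the four short time series are invented, the absolute threshold of 0.5 is arbitrary, and negative correlations are simply discarded, which is one of several possible choices. Real pipelines operate on preprocessed data with thousands of time points and many more nodes.

```python
import math

# Invented time series for four nodes (six time points each).
time_series = [
    [1, 2, 3, 4, 5, 6],     # node 0
    [2, 4, 6, 8, 10, 12],   # node 1: rises together with node 0
    [6, 5, 4, 3, 2, 1],     # node 2: falls as node 0 rises
    [1, -1, 1, -1, 1, -1],  # node 3: unrelated oscillation
]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


n_nodes = len(time_series)

# Step 1: correlate every node with every other node.
correlation = [
    [pearson(time_series[i], time_series[j]) for j in range(n_nodes)]
    for i in range(n_nodes)
]

# Step 2: absolute threshold, then binarize. Assumption: only positive
# correlations above the cutoff are kept; the diagonal is left at zero.
THRESHOLD = 0.5
adjacency = [
    [1 if i != j and correlation[i][j] > THRESHOLD else 0 for j in range(n_nodes)]
    for i in range(n_nodes)
]

# Step 3: network statistics are computed on the adjacency matrix, here node degree.
degrees = [sum(row) for row in adjacency]

print(adjacency)
print(degrees)  # [1, 1, 0, 0]: only nodes 0 and 1 remain linked
```

Note that node 2 is perfectly anticorrelated with node 0 yet ends up with no links under this thresholding choice, a reminder that the handling of negative correlations affects the resulting network.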
In summary, in terms of some of the broader categories of network statistical properties and their role in the analyses of the overall brain network (Rubinov & Sporns, 2010), there are particular metrics useful for brain segregation, integration, and influence. Examples of segregation of brain networks include clustering, motifs, and community structure or modularity. Integration of brain networks includes distance, path length, and efficiency measures, among others, while influence includes network metrics of degree, participation, and betweenness (Bassett & Sporns, 2017; Bullmore & Sporns, 2009). Thus, neuroimaging investigators are cautioned, in regard to network-based brain imaging studies with music, to select, for each network statistic and imaging modality, the most robust node measures possible in order to avoid spurious results.
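Of the integration measures named above, efficiency is the only one not yet defined in this section. A common formulation, used here as an assumption rather than as the definition adopted by any particular study cited in this chapter, is global efficiency: the average of the inverse shortest path length over all ordered pairs of nodes, with unreachable pairs contributing zero. The three tiny networks below are invented for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def global_efficiency(adj):
    """Mean inverse shortest path length; disconnected pairs count as zero."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += 1.0 / d
    return total / (n * (n - 1))


chain = {0: {1}, 1: {0, 2}, 2: {1}}           # 0 - 1 - 2
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}  # every pair directly linked
fragmented = {0: {1}, 1: {0}, 2: set()}       # node 2 is isolated

print(round(global_efficiency(chain), 4))       # 0.8333
print(global_efficiency(triangle))              # 1.0
print(round(global_efficiency(fragmented), 4))  # 0.3333
```

Unlike average path length, this measure remains defined when thresholding leaves a network fragmented, which is one reason it is often reported alongside path length.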
Since the original network-based investigation into the effects of music on the brain, “Network Science: A New Method for Investigating the Complexity of Musical Experiences in the Brain” (Wilkins et al., 2012), that paper and those that followed have generated new insight into how and why music affects network-based functional and structural brain connectivity using EEG, DTI, DSI, and fMRI data (Fauvel et al., 2014; Hodges & Wilkins, 2015; Karmonik et al., 2016; Koelsch, Skouras, & Lohmann, 2018; Liu, Abu-Jamous, et al., 2017; Liu, Brattico, et al., 2017; Wilkins, 2015; Wilkins et al., 2014; Wu et al., 2012; Wu, Zhang, Ding, Liu, & Zhou, 2013). The evidence resulting from a network-based approach to the brain (Bassett & Bullmore, 2006; Bassett & Sporns, 2017; Bullmore & Sporns, 2009) provides us with substantial confirmation that network neuroscience not only advances our understanding of the brain, but simultaneously holds promise for new understandings regarding the effects of music and musical training on structural and functional brain networks in both neurological health and disease as well as various compromised and functional brain states (Bigand et al., 2015; Blum et al., 2017; Fauvel et al., 2014; Gaser & Schlaug, 2003; Greicius, 2008; Gusnard, Akbudak, Shulman, & Raichle, 2001a, 2001b; Karmonik et al., 2016; Koelsch et al., 2018; Magee, Clark, Tamplin, & Bradt, 2017; Moussa et al., 2011; Raglio et al., 2015; Sihvonen et al., 2017; Wilkins et al., 2018; Wu et al., 2013). Network neuroscience presents opportunities in experimental designs previously beyond the scope of classic neuroimaging analyses (i.e., “one region-one behavior”). While conventional activation-style designs for traditional experimental neuroimaging research are still valid, being able to pursue questions about the brain’s entire system in a statistically principled manner presents an opportunity to advance our understanding of music and the brain.
Newer evidence suggests that music may provide a means to affect information flow in the brain network (Karmonik et al., 2016) as well
as changes in functional measures that accompany gray matter volume changes from musical expertise (Fauvel et al., 2014). Results reveal that the functional brain network responds to preferred music listening by creating communities within pivotal regions of the default mode network, a network widely accepted to support self-reflective and mind-wandering processes important for brain function (Wilkins et al., 2014), and that a favorite song can spontaneously separate the functional network into distinct communities between the auditory cortex and the hippocampus, a region recognized for memory encoding. Dynamic functional connectivity analyses of data collected while people listened to continuous music stimuli previously suggested to influence anxiety and anger show significant measures of intrinsic connectivity within the salience network (Lindquist et al., 2018). More recent evidence suggests that whole-brain responses to naturalistic music listening spontaneously alter the resting brain to stimulate significant hubs within attentional control regions of the anterior cingulate, highlighting how the network system may potentially optimize or restore aspects of neurological function by resourcing attentional circuit-breaker mechanisms (Wilkins et al., 2018). Compared to the brain at rest, network analyses also indicate a significant reduction during naturalistic music listening in betweenness centrality within the amygdala, a region implicated in emotional responses linked to anxiety and avoidance behaviors, suggesting a systems-level decrease in these affective responses while listening to ambient background music (Wilkins et al., 2018). Recent evidence also reveals that different auditory regions exhibit significant functional network characteristics during music-evoked emotional experiences of fear and joy (Koelsch et al., 2018). 
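To make the notion of betweenness centrality concrete, the following is a minimal sketch in Python, assuming an unweighted, undirected graph and using Brandes' shortest-path counting algorithm. The two toy graphs and their node labels ("amygdala", "auditory", and so on) are hypothetical placeholders chosen for illustration; they are not data from any of the studies cited above, and the function names are likewise assumptions.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def betweenness_centrality(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted twice in an undirected graph.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy networks: in "rest" every path runs through one node;
# in "music" direct links between the other nodes bypass it.
rest_edges = [("auditory", "amygdala"), ("amygdala", "hippocampus"),
              ("amygdala", "cingulate")]
music_edges = rest_edges + [("auditory", "hippocampus"),
                            ("auditory", "cingulate"),
                            ("hippocampus", "cingulate")]

rest_bc = betweenness_centrality(build_graph(rest_edges))
music_bc = betweenness_centrality(build_graph(music_edges))
print(rest_bc["amygdala"], music_bc["amygdala"])
```

In this toy case the labelled node lies on every shortest path in the first graph and on none in the second, which is the kind of change a drop in betweenness centrality describes. Real analyses would work with thresholded connectivity matrices and appropriate statistical testing rather than hand-drawn edges.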
The substantial questions and promising potential surrounding brain responses to musical experiences have been, in many ways, outside the scope of previously available tools and of the more conventional activation-based experimental approaches and analysis techniques. It is easy to understand how music and brain imaging investigators at all levels may occasionally feel a sense of unease in dealing comprehensively with a network-based approach to music and the brain. Under such circumstances, it is tempting for neuroimaging scientists who are pursuing questions about music to remain within the confines of conventional activation analyses. A similar historical response can be found when neuroimaging scientists were first considering the connectivity of the
Default Mode Network: “The suggested link between the processing taking place at rest and its physiology is one that can have no direct relevance for neuroimaging” (Morcom & Fletcher, 2007, p. 1075; for a complete update on this commentary see also Raichle, 2001). This type of statement is arguably true, if one’s experimental music and brain horizons are limited to previous techniques and analyses in functional neuroimaging science. This chapter suggests, however, that such a finite agenda will be depleted eventually if not nourished by the broader implications and understanding of brain function that these emerging network science techniques may serve. In closing, the main objective of this chapter is to highlight the graph theory methods and network science evidence that persuades us towards complex systems thinking and the field of network neuroscience. While conventional approaches provide evidence of brain activation to music, a network-based approach takes a different perspective, which is that a full understanding of brain activity—including brain responses to musical experiences—critically depends on studying the brain as a complex system (Bassett & Sporns, 2017; Bullmore & Sporns, 2009; Wilkins, 2015) through the application of network (graph theory) techniques (Bassett & Bullmore, 2009). A network-based analysis provides us with statistical rigor to study detailed patterns of neural connections throughout the entire system of the brain. This approach can be applied to data collected while people are listening to continuous music, as well as comparing brain responses to different types of music and the brains of people with musical training (Fig. 7) (Wilkins et al., 2012). 
These complex connections, or brain networks, help reveal the architectural and functional scaffolding that ultimately illuminates the brain’s dynamic behaviors as robust statistical connectivity patterns, including the brain’s intrinsic (i.e., resting-state) activity that may be affected while listening to music including that present in the default mode network regions of the brain (Broyd et al., 2009; Raichle, 2001; Wilkins et al., 2014).
FIGURE 7. Depiction of high degree hubs based on musical genre. Note the consistency of high degree hubs in the auditory regions while people (N = 21) listened to continuous classical music, in this case Beethoven’s 1st Symphony, Mvt. 1 London Symphony Orchestra. A 21,000 × 21,000 voxel-based matrix was used for the network-based statistical analyses. Reproduced from Wilkins et al. (2012, pp. 282–283). © 2012 by the International Society for the Arts, Sciences and Technology, published by MIT Press.
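The idea of a "high degree hub" can be illustrated with a minimal sketch, assuming a symmetric correlation matrix that is binarized at a fixed threshold. The 5 × 5 matrix, the threshold of 0.5, and the function name `degree_hubs` are hypothetical choices for illustration only; they do not reproduce the voxel-based analysis described in the caption.

```python
def degree_hubs(corr, threshold, top_fraction=0.2):
    """Return (degree list, indices of top-degree nodes) for a thresholded matrix.

    corr is a symmetric list-of-lists of correlation values; an edge is
    assumed between nodes i and j (i != j) when corr[i][j] >= threshold.
    """
    n = len(corr)
    degree = [
        sum(1 for j in range(n) if j != i and corr[i][j] >= threshold)
        for i in range(n)
    ]
    # Keep at least one node; take the highest-degree fraction as "hubs".
    k = max(1, round(top_fraction * n))
    ranked = sorted(range(n), key=lambda i: degree[i], reverse=True)
    return degree, ranked[:k]

# Hypothetical toy correlation matrix (not measured data).
toy_corr = [
    [1.0, 0.8, 0.7, 0.6, 0.1],
    [0.8, 1.0, 0.2, 0.1, 0.0],
    [0.7, 0.2, 1.0, 0.3, 0.1],
    [0.6, 0.1, 0.3, 1.0, 0.5],
    [0.1, 0.0, 0.1, 0.5, 1.0],
]

degree, hubs = degree_hubs(toy_corr, threshold=0.5)
print(degree, hubs)
```

At the scale mentioned in the caption (roughly 21,000 nodes), a pure-Python list of lists would be impractical; array or sparse-matrix libraries would be the more realistic choice, and the thresholding strategy itself is a methodological decision rather than a fixed convention.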
Although there are still fundamental questions about music and the brain that remain unresolved, network science offers key tools that hold promise for providing answers about complex systems in new ways. As the field continues to advance, network neuroscience and the study of brain connectivity, through network-based statistics, will open new experimental and theoretical avenues for understanding how structural brain connectivity leads to dynamic brain function. The discussion in this chapter, in particular, illustrates how network-based approaches may help address fundamental questions surrounding the promising effects of music in neurological research and rehabilitation (Hodges & Wilkins, 2015; Kotchoubey, Pavlov, & Kleber, 2015; Thaut et al., 2008). As a computationally robust field, network neuroscience provides a new mathematical framework for investigating complex systems that goes beyond previously conventional approaches to experimental design and neuroimaging research. Methods from graph theory provide a robust, well-established framework for assessing brain connectivity, both locally and globally, offering a rigorous opportunity to expansively and non-invasively explore the entire human brain under whole brain activity experiences (Bullmore & Sporns, 2009; Rubinov & Sporns, 2010). Analyses can reveal
patterns of both structural and functional brain connectivity. A network neuroscience approach provides unprecedented opportunities for examining the effects of musical experiences on the human brain. The methods and techniques presented here provide an opportunity for researchers to pursue questions that may further advance the field of music and brain research, deepening our scientific understanding surrounding the effects of music on the brain.
REFERENCES
Albert, R., Jeong, H., & Barabási, A.-L. (2000). Error and attack tolerance of complex networks. Nature 406(6794), 378–382. Alluri, V., Toiviainen, P., Jaaskelainen, J. P., Glerean, E., Sams, M., & Brattico, E. (2012). Large-scale brain networks emerge from dynamic processing of musical timbre, key and rhythm. NeuroImage 59(4), 3677–3689. Alluri, V., Toiviainen, P., Lund, T. E., Wallentin, M., Vuust, P., Nandi, A. K., … Brattico, E. (2013). From Vivaldi to Beatles and back: Predicting lateralized brain responses to music. NeuroImage 83, 627–636. Amaral, L. A., Scala, A., Barthelemy, M., & Stanley, H. E. (2000). Classes of small-world networks. Proceedings of the National Academy of Sciences 97(21), 11149–11152. Barabási, A.-L. (2002). Linked: The new science of networks. Cambridge, MA: Perseus Publishing. Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science 286(5439), 509–512. Bassett, D. S., & Bullmore, E. (2006). Small-world brain networks. Neuroscientist 12(6), 512–523. Bassett, D. S., & Bullmore, E. (2009). Human brain networks in health and disease. Current Opinion in Neurology 22(4), 340–347. Bassett, D. S., Khambhati, A. N., & Grafton, S. T. (2017). Emerging frontiers of neuroengineering: A network science of brain connectivity. Annual Review of Biomedical Engineering 19, 327–352. Bassett, D. S., & Sporns, O. (2017). Network neuroscience. Nature Neuroscience 20(3), 353–364. Betzel, R. F., Erickson, M. A., Abell, M., O’Donnell, B. F., Hetrick, W. P., & Sporns, O. (2012). Synchronization dynamics and evidence for a repertoire of network states in resting EEG. Frontiers in Computational Neuroscience 6. Retrieved from https://doi.org/10.3389/fncom.2012.00074 Bigand, E., Tillmann, B., Peretz, I., Zatorre, R. J., Lopez, L., & Majno, M. (Eds.). (2015). The neurosciences and music V: Cognitive stimulation and rehabilitation. Annals of the New York Academy of Sciences 1337. Biswal, B. B., Kylen, J. V., & Hyde, J. S. (1997). 
Simultaneous assessment of flow and BOLD signals in resting-state functional connectivity maps. NMR in Biomedicine 10(4–5), 165–170. Biswal, B. B., Yetkin, F. Z., Haughton, V. M., & Hyde, J. S. (1995). Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magnetic Resonance in Medicine 34(4), 537–541. Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics 2008. Retrieved from
https://doi.org/10.1088/1742-5468/2008/10/P10008 Blum, K., Simpatico, T., Febo, M., Rodriquez, C., Dushaj, K., Li, M., … Badgaiyan, R. D. (2017). Hypothesizing music intervention enhances brain functional connectivity involving dopaminergic recruitment: Common neuro-correlates to abusable drugs. Molecular Neurobiology 54(5), 3753– 3758. Borgatti, S. (2005). Centrality and network flow. Social Networks 27(1), 55–71. Broyd, S. J., Demanuele, C., Debener, S., Helps, S. K., James, C. J., & Sonuga-Barke, E. J. S. (2009). Default-mode brain dysfunction in mental disorders: A systematic review. Neuroscience & Biobehavioral Reviews 33(3), 279–296. Buckner, R. L., Andrews-Hanna, J. R., & Schacter, D. L. (2008). The brain’s default mode network: Anatomy, function, and relevance to disease. Annals of the New York Academy of Sciences 1124, 1–38. Bullmore, E., & Sporns, O. (2009). Complex brain networks: Graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10(3), 186–198. Bullmore, E., & Sporns, O. (2012). The economy of brain network organization. Nature Reviews Neuroscience 13(5), 336–349. Butts, C. T. (2009). Revisiting the foundations of network analysis. Science 325(5939), 414–416. Cohen, A. L., Fair, D. A., Dosenbach, N. U. F., Miezin, F. M., Dierker, D., & Van Essen, D. C. (2008). Defining functional areas in individual human brains using resting functional connectivity MRI. NeuroImage 41, 45–57. Craddock, R. C., James, G. A., Holtzheimer, P. E., Hu, X. P., & Mayberg, H. S. (2012). A whole brain fMRI atlas generated via spatially constrained spectral clustering. Human Brain Mapping 33(8), 1914–1928. El Haj, M., Fasotti, L., & Allain, P. (2012). The involuntary nature of music-evoked autobiographical memories in Alzheimer’s disease. Consciousness and Cognition 21(1), 238–246. Euler, L. (1736). Solutio problematis ad geometriam situs pertinentis. Commentarii Academiae Scientiarum Imperialis Petropolitanae 8, 128–140. 
Reprinted and translated in N. L. Biggs, E. K. Lloyd, & R. J. Wilson, Graph Theory 1736–1936 (pp. 3–8). Oxford: Oxford University Press, 1976. Fauvel, B., Groussard, M., Chetelat, G., Fouquet, M., Landeau, B., Eustache, F., … Platel, H. (2014). Morphological brain plasticity induced by musical expertise is accompanied by modulation of functional connectivity at rest. NeuroImage 90, 179–188. Fortunato, S. (2010). Community detection in graphs. Physics Reports 486(3–5), 75–174. Fox, M. D., Snyder, A. Z., Vincent, J. L., Corbetta, M., Van Essen, D. C., & Raichle, M. E. (2005). The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proceedings of the National Academy of Sciences 102(27), 9673–9678. Fox, M. D., Zhang, D., Snyder, A. Z., & Raichle, M. E. (2009). The global signal and observed anticorrelated resting state brain networks. Journal of Neurophysiology 101(6), 3270–3283. Friston, K. J., Frith, C. D., Turner, R., & Frackowiak, R. S. (1995). Characterizing evoked hemodynamics with fMRI. NeuroImage 2(2), 157–165. Gaser, C., and Schlaug, G. (2003). Brain structures differ between musicians and non-musicians. Journal of Neuroscience 23(27), 9240–9245. Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99(12), 7821–7826. Greicius, M. (2008). Resting-state functional connectivity in neuropsychiatric disorders. Current Opinion in Neurology 21(4), 424–430. Greicius, M., Krasnow, B., Reiss, A. L., & Menon, V. (2003). Functional connectivity in the resting brain: A network analyses of the default mode hypothesis. Proceedings of the National Academy of
Sciences 100(1), 253–258. Gusnard, D. A., Akbudak, E., Shulman, G. L., & Raichle, M. E. (2001a). Medial prefrontal cortex and self-referential mental activity: Relation to a default mode of brain function. Proceedings of the National Academy of Sciences 98(7), 4259–4264. Gusnard, D. A., Akbudak, E., Shulman, G. L., & Raichle, M. E. (2001b). Role of medial prefrontal cortex in a default mode of brain function. NeuroImage 13(6), S414. Guye, M., Bettus, G., Bartolomei, F., & Cozzone, P. (2010). Graph theoretical analysis of structural and functional connectivity MRI in normal and pathological brain networks. Magnetic Resonance Materials in Physics, Biology and Medicine 23(5–6), 409–421. Hagberg, A. A., Schult, D. A., & Swart, P. J. (2008). Exploring network structure, dynamics, and function using NetworkX. In G. Varoquaux, T. Vaught, & J. Millman (Eds.), Proceedings of the 7th Python in Science Conference (SciPy2008) (pp. 11–15). Pasadena, CA. Hayasaka, S., & Laurienti, P. J. (2010). Comparison of characteristics between region- and voxel-based network analyses in resting-state fMRI data. NeuroImage 50(2), 499–508. He, Y., & Evans, A. (2010). A review of structural and functional brain connectivity. Current Opinion in Neurology 23(4), 341–350. Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity: Behavior, function, and structure. Neuron 76(3), 486–502. Hodges, D. A., & Wilkins, R. W. (2015). How and why does music move us? Answers from psychology and neuroscience. Music Educators Journal 101(4), 41–47. Hyde, K. L., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A. C., & Schlaug, G. (2009). Musical training shapes structural brain development. Journal of Neuroscience 29(10), 3019–3025. Joyce, K. E., Laurienti, P. J., Burdette, J. H., & Hayasaka, S. (2010). A new measure of centrality for brain networks. PLoS ONE 5(8), e12200. Karmonik, C., Brandt, A., Anderson, J. R., Brooks, F., Lytle, J., Silverman, E., & Frazier, J. 
T (2016). Music listening modulates functional connectivity and information flow in the human brain. Brain Connectivity 6(8), 632–641. Koelsch, S. (2009). A neuroscientific perspective on music therapy. Annals of the New York Academy of Sciences 1169, 374–384. Koelsch, S., Skouras, S., & Lohmann, G. (2018). The auditory cortex hosts network nodes influential for emotion processing: An fMRI study on music-evoked fear and joy. PLoS ONE 13(1), e0190057. Kotchoubey, B., Pavlov, Y. G., & Kleber, B. (2015). Music in research and rehabilitation of disorders of consciousness: Psychological and neurophysiological foundations. Frontiers in Psychology 6, 1763. Retrieved from https://doi.org/10.3389/fpsyg.2015.01763 Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills. Nature Reviews Neuroscience 11(8), 599–605. Kruschwitz, J. D., List, D., Waller, L., Rubinov, M. & Walter, H. (2015). GraphVar: A user-friendly toolbox for comprehensive graph analyses of functional brain connectivity. Journal of Neuroscience Methods 245, 107–115. Lindquist, K. A., Pendl, S., Brooks, J. A., Wilkins, R. W., Kraft, R. A., & Gao, W. (2018). Dynamic functional connectivity of intrinsic networks during emotions. NeuroImage. Under review. Liu, C., Abu-Jamous, B., Brattico, E., & Nandi, A. K. (2017). Towards tunable consensus clustering for studying functional brain connectivity during affective processing. International Journal of Neural Systems 27(2), 1650042. doi:10.1142/S0129065716500428 Liu, C., Brattico, E., Abu-Jamous, B., Pereira, C. S., Jacobsen, T., & Nandi, A. K. (2017). Effect of explicit evaluation on neural connectivity related to listening to unfamiliar music. Frontiers in Human Neuroscience 11, 611. Retrieved from https://doi.org/10.3389/fnhum.2017.00611
Logothetis, N. K. (2008). What we can do and what we cannot do with fMRI. Nature 453(7197), 869–878. Magee, W. L., Clark, I., Tamplin, J., & Bradt, J. (2017). Music interventions for acquired brain injury. Cochrane Database of Systematic Reviews 1, CD006787. doi:10.1002/14651858.CD006787.pub3 Mitchell, M. (2009). Complexity: A guided tour. Oxford: Oxford University Press. Morcom, A. M., & Fletcher, P. C. (2007). Does the brain have a baseline? Why we should be resisting a rest. NeuroImage 37(4), 1073–1082. Moussa, M. N., Vechlekar, C. D., Burdett, J. H., Steen, M. R., Hugenschmidt, C. E., & Laurienti, P. J. (2011). Changes in cognitive state alter human functional brain networks. Frontiers in Human Neuroscience 5, 1–15. Retrieved from https://doi.org/10.3389/fnhum.2011.00083 Mucha, P. J., Richardson, T., Macon, K., Porter, M. A., & Onnela, J. P. (2010). Community structure in time-dependent, multiscale, and multiplex networks. Science 328(5980), 876–878. Mumford, J. A., Horvath, S., Oldham, M. C., Langfelder, P., Geschwind, D. H., & Poldrack, R. A. (2010). Detecting network modules in fMRI time series: A weighted network analysis approach. NeuroImage 52(4), 1465–1476. Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review 45, 167– 256. Newman, M. E. J. (2005). Power laws, Pareto distributions and Zipf’s law. Contemporary Physics 46(5), 323–351. Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences 103(23), 8577–8582. Newman, M. E., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E 69(2 Pt. 2), 026113. Power, J. D., Cohen, A. L, Nelson, S. M., Wig, G. S., Barnes, K. A., Church, J. A., … Petersen, S. E. (2011). Functional network organization of the human brain. Neuron 72(4), 665–678. Raglio, A., Attardo, L., Gontero, G., Rollino, S., Groppo, E., & Granieri, E. (2015). 
Effects of music and music therapy on mood in neurological patients. World Journal of Psychiatry 5(1), 68–78. Raichle, M. E. (2001). A default mode of brain function. Proceedings of the National Academy of Sciences 98(2), 676–682. Rubinov, M., & Sporns, O. (2010). Complex network measures of brain connectivity: Uses and interpretations. NeuroImage 52(3), 1059–1069. Sachs, M. E., Ellis, R. J., Schlaug, G., & Loui, P. (2016). Brain connectivity reflects aesthetic responses to music. Social Cognitive and Affective Neuroscience 11(6), 884–891. Savoy, R. A. (2005). Experimental design in brain activation MRI: Cautionary tales. Brain Research Bulletin 67, 361–365. Schlaug, G. (2001). The brain of musicians. A model for functional and structural adaptation. Annals of the New York Academy of Sciences 930, 281–299. Schlaug, G. (2009a). Listening to music facilitates brain recovery processes. Annals of the New York Academy of Sciences 1169, 372–373. Schlaug, G. (2009b). Music, musicians, and brain plasticity. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (pp. 197–207). Oxford: Oxford University Press. Schlaug, G., Marchina, S., & Norton, A. (2009). Evidence for plasticity in white matter tracts of chronic aphasic patients undergoing intense intonation-based speech therapy. Annals of the New York Academy of Sciences 1169, 385–394. Shirer, W. R., Ryali, S., Rykhlevskaia, E., Menon, V., & Greicius, M. D. (2012). Decoding subject-driven cognitive states with whole-brain connectivity patterns. Cerebral Cortex 22(1), 158–165.
Sihvonen, A. J., Sarkamo, T., Leo, V., Tervaniemi, M., Altenmuller, E., & Soinila, S. (2017). Music-based interventions in neurological rehabilitation. The Lancet Neurology 16(8), 648–660. Sporns, O., Chialvo, D., Kaiser, M., & Hilgetag, C. (2004). Organization, development and function of complex brain networks. Trends in Cognitive Sciences 8(9), 418–425. Sporns, O., & Kötter, R. (2004). Motifs in brain networks. PLoS Biology 2(11), e369. Sporns, O., Tononi, G., & Kötter, R. (2005). The human connectome: A structural description of the human brain. PLoS Computational Biology 1(4), e42. Stam, C. J. (2014). Modern network science of neurological disorders. Nature Reviews Neuroscience 15, 683–695. Stam, C. J., & Reijneveld, J. P. (2007). Graph theoretical analysis of complex networks in the brain. Nonlinear Biomedical Physics 1, 3. doi:10.1186/1753-4631-1-3 Stanley, M. L., Moussa, M. N., Paolini, B. M., Lyday, R., Burdette, J. H., & Laurienti, P. J. (2013). Defining nodes in complex networks. Frontiers in Computational Neuroscience 7, 169. Retrieved from https://doi.org/10.3389/fncom.2013.00169 Steen, M., Hayasaka, S., Joyce, K., & Laurienti, P. (2011). Assessing the consistency of community structure in complex networks. Physical Review E 84(1–2), 016111. Strogatz, S. H. (2001). Exploring complex networks. Nature 410(6825), 268–276. Telesford, Q. K., Simpson, S. L., Burdette, J. H., Hayasaka, S., & Laurienti, P. J. (2011). The brain as a complex system: Using network science as a tool for understanding the brain. Brain Connectivity 1(4), 295–308. Thaut, M. H., Demartin, M., & Sanes, J. N. (2008). Brain networks for integrative rhythm formation. PLoS ONE 3, e2312. Thaut, M. H., Gardiner, J. C., Holmberg, D., Horwitz, J., Kent, L., Andrews, G., … McIntosh, G. R. (2009). Neurologic music therapy improves executive function and emotional adjustment in traumatic brain injury rehabilitation. Annals of the New York Academy of Sciences 1169, 406–416. Towlson, E., Vertes, P. 
E., Ahnert, S., Schafer, W. R., & Bullmore, E. T. (2013). The rich club of the C. elegans neuronal connectome. Journal of Neuroscience 33(15), 6380–6387. Tuch, D. S., Reese, T. G., Wiegell, M. R., & Wedeen, V. J. (2003). Diffusion MRI of complex neural architecture. Neuron 40(5), 885–895. Van den Heuvel, M. P., de Lange, S. C., Zalesky, A., Seguin, C., Yeo, B. T. T., & Schmidt, R. (2017). Proportional thresholding in resting-state fMRI functional connectivity networks and consequences for patient-control connectome studies: Issues and recommendations. NeuroImage 152, 437–449. Van Wijk, B. C., Stam, C. J., & Daffertshofer, A. (2010). Comparing brain networks of different size and connectivity density using graph theory. PloS ONE 5, e13701. Wang, J., Zuo, X., & He, Y. (2010). Graph-based network analysis of resting-state functional MRI. Frontiers in Systems Neuroscience 4, 16. Retrieved from https://doi.org/10.3389/fnsys.2010.00016 Wang, M., González, C. A., Hidalgo, & Barabási, A.-L. (2009). Understanding the spreading patterns of mobile phone viruses. Science 324(5930), 1071–1076. Watts, D. J. (2003). Six degrees: The science of a connected age. New York: W. W. Norton. Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature 393(6684), 440–442. Wedeen, V. J., Hagmann, P., Tseng, W. Y., Reese, T. G., & Weiskoff, R. M. (2005). Mapping complex tissue architecture with diffusion spectrum magnetic resonance imaging. Magnetic Resonance in Medicine 54(6), 1377–1386. West, B. J. (2011). Overview 2010 of ARL program on network science for human decision making. Frontiers in Physiology 2, 76. Retrieved from https://doi.org/10.3389/fphys.2011.00076
Whitfield-Gabrielli, S., & Nieto-Castanon, A. (2012). Conn: A functional connectivity toolbox for correlated and anticorrelated brain networks. Brain Connectivity 2(3). doi:10.1089/brain.2012.0073 Wilkins, R. W. (2015). Network science and the effects of music on the human brain (Doctoral dissertation). University of North Carolina at Greensboro. Wilkins, R. W., Giridharan, S., Johnston, M., Brooks, J. A., Lindquist, K. A., & Kraft, R. A. (2018). Changes in resting-state functional brain networks during naturalistic music listening. In preparation. Wilkins, R. W., Hodges, D. A., Laurienti, M., Steen, M., & Burdette, J. H. (2012). Network science: A new method for investigating the complexity of musical experiences in the brain. Leonardo 45(3), 282–283. Wilkins, R. W., Hodges, D. A., Laurienti, M., Steen, M., & Burdette, J. H. (2014). Network science and the effects of music preference on functional brain connectivity: From Beethoven to Eminem. Scientific Reports 4, 6130. doi: 10.1038/srep06130 Wong, P. C., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10(4), 420–422. Wu, J., Zhang, J., Ding, X., Liu, D., & Zhou, C. (2013). The effects of music on brain functional networks: A network analyses. Neuroscience 250, 49–59. Wu, J., Zhang, J., Liu, C., Liu, D., Ding, X., & Zhou, C. (2012). Graph theoretical analysis of EEG functional connectivity during music perception. Brain Research 1483, 71–81. Zatorre, R., Evans, A., Meyer, E., & Gjedde, A. (1992). Lateralization of phonetic and pitch discrimination in speech processing. Science 256(5058), 846–849. Zatorre, R., & Samson, S. (1991). Role of the right temporal neocortex in retention of pitch in auditory short-term memory. Brain 114(6), 2403–2417. Zuo, X., Ehmke, R., Mennes, M., Imperati, D., Castellanos, X., Sporns, O., & Milham, M. (2011). Network centrality in the human functional connectome. 
Cerebral Cortex 22(8), 1862–1875.
CHAPTER 7
ACOUSTIC STRUCTURE AND MUSICAL FUNCTION: MUSICAL NOTES INFORMING AUDITORY RESEARCH
MICHAEL SCHUTZ
I
O
Beethoven’s Fifth Symphony has intrigued audiences for generations. In opening with a succinct statement of its four-note motive, Beethoven deftly lays the groundwork for hundreds of measures of musical development, manipulation, and exploration. Analyses of this symphony are legion (Schenker, 1971; Tovey, 1971), informing our understanding of the piece’s structure and historical context, not to mention the human mind’s fascination with repetition. In his intriguing book The first four notes, Matthew Guerrieri deconstructs the implications of this brief motive (2012), illustrating that great insight can be derived from an ostensibly limited grouping of just four notes. Extending that approach, this chapter takes an even more targeted focus, exploring how groupings related to the harmonic structure of individual notes lend insight into the acoustical and perceptual basis of music listening.
Extensive overviews of auditory perception and basic acoustical principles are readily available (Moore, 1997; Rossing, Moore, & Wheeler, 2013; Warren, 2013) discussing the structure of many sounds, including those important to music. Additionally, several texts now focus specifically on music perception and cognition (Dowling & Harwood, 1986; Tan, Pfordresher, & Harré, 2007; Thompson, 2009). Therefore this chapter focuses on a previously under-discussed topic within the subject of musical sounds—the importance of temporal changes in their perception. This aspect is easy to overlook, as the perceptual fusion of overtones makes it difficult to consciously recognize their individual contributions. Yet changes in the amplitudes of specific overtones excited by musical instruments as well as temporal changes in the relative strengths of those overtones play a crucial role in musical timbre. Western music has traditionally focused on properties such as pitch and rhythm, yet contemporary composers are increasingly interested in timbre, to the point where it can on occasion even serve as a composition’s primary focus (Boulez, 1987; Hamberger, 2012). And although much previous scientific research on the neuroscience of music as well as music perception has focused on temporally invariant tones, there has been increasing recognition in the past decade that broadening our toolbox of stimuli is important to elucidating music’s psychological and neurological basis. Consequently, understanding the role of temporal changes in musical notes holds important implications for psychologists, musicians, and neuroscientists alike. Traditional musical scores give precise information regarding the intensity of each instrument throughout a composition in the form of dynamic markings. But for obvious practical reasons scores never specify the rapid intensity changes found in each overtone of an individual note. 
At most, composers hint at their preferences through descriptive terms such as “sharper/duller” or vague instructions (“as if off in the distance”), and performers draw on stylistic considerations to make such decisions, e.g., by following period-specific performance practice. To a large extent, both the harmonic structure of a note and the changes in that structure over time are natural consequences of an instrument’s physical structure. For example, the rapid decay of energy in harmonics shortly after the onset of a vibraphone note contrasts with the long sustain of its fundamental, contributing to its characteristic sound.
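The contrast between quickly decaying upper partials and a slowly decaying fundamental can be sketched numerically. The following minimal Python example assumes each partial has its own exponential amplitude envelope; the frequency ratios, starting amplitudes, and decay constants in `PARTIALS` are illustrative placeholders, not measurements of a vibraphone, and the function names are hypothetical.

```python
import math

# (frequency ratio to fundamental, initial amplitude, decay time constant in s)
# All values are assumed for illustration only.
PARTIALS = [
    (1, 1.0, 1.5),   # fundamental: long sustain
    (2, 0.6, 0.15),  # upper partial: fast decay
    (3, 0.4, 0.08),  # upper partial: faster decay
]

def partial_amplitudes(t, partials):
    """Amplitude of each partial at time t under exponential decay."""
    return [amp * math.exp(-t / tau) for _ratio, amp, tau in partials]

def synthesize(f0, partials, n_samples, sample_rate=8000):
    """Sum decaying sinusoids to produce n_samples of a single note."""
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        value = sum(
            amp * math.exp(-t / tau) * math.sin(2 * math.pi * f0 * ratio * t)
            for ratio, amp, tau in partials
        )
        samples.append(value)
    return samples

onset = partial_amplitudes(0.0, PARTIALS)
later = partial_amplitudes(1.0, PARTIALS)
note = synthesize(220.0, PARTIALS, n_samples=800)
print(onset, later)
```

Under these assumed values, the upper partials carry a substantial share of the energy at onset but are negligible a second later, so the spectrum of the note changes over time even though the notated pitch and dynamic do not.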
Musical notation clearly reflects changes in the intensity of collections of notes (e.g., crescendos, sfz) but never the changes within notes themselves. While understandable, this decision mirrors the lack of attention to changes in overtone intensity in many psychophysical descriptions of sound, as well as in perceptual experiments with auditory stimuli. This is unfortunate, as these intensity changes play an important role in efforts to synthesize “realistic” sounding musical notes, an issue of great relevance to composers creating electronic music. They also play an important role in the discussions of tone quality so crucial to music educators training young ears, not to mention sound editors and engineers exploring which dynamic changes are important to capture and preserve when recording, mixing, or compressing high-quality audio. This chapter summarizes research on both the perceptual grouping of overtones and their rapid temporal changes, placing it in a broader context by highlighting connections to another important topic: how individual notes are perceptually grouped into chords. Finally, it concludes with a discussion of mounting evidence that auditory stimuli devoid of complex temporal changes may lead to experimental outcomes that fail to generalize to real-world listening, and on occasion can suggest errant theoretical frameworks and basic principles.
G
N C
: D H
The vertical alignment of notes gives rise to musical harmonies ranging from lush to biting, from soothing to scary. Consequently, composers carefully design complex groupings whose musical effects hinge on small changes in their arrangement. For example, major and minor chords differ significantly in their neural processing (Pallesen et al., 2005; Suzuki et al., 2008) and evoke distinct affective responses (Eerola, Friberg, & Bresin, 2013; Heinlein, 1928; Hevner, 1935). Yet from the standpoint of acoustic structure this change is small: a half-step in the third (i.e., “middle note”) of a musical chord (Aldwell, Schachter, & Cadwallader, 2002). In absolute terms, this represents a relatively small shift in the raw acoustic information, moving one of three notes the smallest permissible musical distance.
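How small the major/minor difference is in acoustic terms can be shown with a short calculation. This sketch assumes twelve-tone equal temperament and the common A4 = 440 Hz tuning reference, neither of which is specified in the text; the function name `triad` is hypothetical.

```python
A4_HZ = 440.0            # assumed tuning reference
SEMITONE = 2 ** (1 / 12)  # frequency ratio of one half-step in equal temperament

def triad(root_hz, quality):
    """Frequencies of a root-position triad; only the third differs by quality."""
    intervals = {"major": (0, 4, 7), "minor": (0, 3, 7)}[quality]
    return [root_hz * SEMITONE ** k for k in intervals]

c4 = A4_HZ * SEMITONE ** -9  # C4 lies nine half-steps below A4
major = triad(c4, "major")
minor = triad(c4, "minor")

for name, chord in (("major", major), ("minor", minor)):
    print(name, [round(f, 2) for f in chord])
print("third ratio:", major[1] / minor[1])
```

Under these assumptions the root and fifth are identical in the two chords, and the third moves by a single factor of 2^(1/12), a change of a little under 6% in the frequency of one note out of three.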
From a raw acoustic perspective, this is particularly unremarkable in a richly orchestrated passage, yet the shift from major to minor can lead to significant changes in a passage’s character. Individuals with cochlear implants—which offer relatively coarse pitch discrimination—are often unable to hear these distinctions, and often find music listening problematic (Wang et al., 2012). Fortunately most hear these changes quite readily, as evidenced by a literature on the detection of “out of key” notes shifted by a mere semi-tone (Koelsch & Friederici, 2003; Pallesen et al., 2005). Although musical acculturation occurring at a relatively young age (Corrigall & Trainor, 2010, 2014) aids this process, even musically untrained individuals are capable of detecting small changes (Schellenberg, 2002). Notes of different pitch are often grouped together into a single musical object—a chord. Typically consisting of three or more individual notes, chords function as a “unit” and together lay out the harmonic framework or backbone of a musical passage. The specific selection of simultaneous notes (i.e., harmonically building chords) has profound effects on the listening experience of audiences, forming one of the key building blocks of strong physiological responses to music (Lowis, 2002; Sloboda, 1991). The masterful selection of notes, rhythms, and instruments requires both intuition and craft, and basic principles are articulated in numerous treatises on composition (Clough & Conley, 1984), and guidelines to orchestration (Alexander & Broughton, 2008; Rimsky-Korsakov, 1964). Yet another aspect of musical sound’s vertical structure plays a crucial role in the listening experience, even if it is under less direct control by composers— the “vertical structure” (i.e., harmonic content) of individual notes—as well as the time-varying changes to these components. 
This topic forms the primary focus of this chapter, for much as the study of individual notes can lend insight into our perception of musical passages, studying the rich, time-varying structure of concurrent harmonics can lend insight into our perception of individual notes.
G
H I
: D N
The complexities in composers’ grouping of individual notes into chords are well known (Aldwell et al., 2002), yet the musical importance of individual harmonics is less transparent, even though single notes produced by musical instruments contain incredible sophistication and nuance (Hjortkjaer, 2013). Musical instruments produce sounds rich in overtones, which for pitched instruments generally consist of harmonics at integer multiples of the fundamental (Dowling & Harwood, 1986; Tan et al., 2010), as well as other non-harmonic energy (particularly during a sound’s onset). The lawful structure of these overtones serves as an important binding cue, triggering a decision by the perceptual system to blend overtones such that “the listener is not usually directly aware of the separate harmonics” (Dowling & Harwood, 1986, p. 24). Although some musicians develop the ability to “hear out” individual components of their instruments (Jourdain, 1997, p. 35), in general this collection of frequencies fuses into a single musical unit. Consequently for practical matters the complex structure of individual notes is of less musical interest than the composer’s complex selection of structural cues (Broze & Huron, 2013; Huron & Ollen, 2003; Patel & Daniele, 2003; Poon & Schutz, 2015), or the performer’s interpretation of those cues (Chapin, Jantzen, Kelso, Steinberg, & Large, 2010). Although the musical importance of small note-to-note variations in amplitude with respect to phrasing and expressivity (Bhatara, Tirovolas, Duan, Levy, & Levitin, 2011; Repp, 1995) is widely recognized, the small moment-to-moment amplitude variations in individual overtones have received less research attention. Musical sounds contain overtones shifting in their relative strength over time (Jourdain, 1997, p. 35), and some textbooks explicitly note the importance of these dynamic changes (Thompson, 2009, p. 59). 
Yet the role of spectra is often presented as time-invariant and described through summaries of spectral content irrespective of temporal changes in a note’s spectra. Musical instruments produce notes rich in temporal variation, not only in their overall amplitudes, but even with respect to the envelopes of individual harmonics. For example, Fig. 1 visualizes a musical note performed on the trumpet (left panel) and clarinet (right panel), based on instrument sounds provided by the University of Iowa Electronic Music Studios (Fritts, 1997). The intensity (z axis) of energy extracted from each harmonic (x axis) is graphed over time (y axis). These 3D visualizations
illustrate the temporal complexity of harmonics bound into the percept of a single note. In fact, even when divorced from melodic context, expressive timing, performers’ intentions regarding phrasing, and numerous other considerations, analysis of isolated notes affords invaluable insight. Small temporal variations in each overtone play a key role in the degree to which synthesized notes sound “real” rather than “artificial.” Highly trained musicians can routinely produce different variations on a single note (“brighter,” “more legato,” “shimmery,” etc.), which involve intentionally varying both the balance and temporal changes in a note’s overtones.
FIGURE 1. Visualization of single notes produced by a trumpet (left) and clarinet (right), illustrating their complex temporal structure. Although the trumpet spectrum changes more dynamically than the clarinet, each partial is in constant flux. The goal of these 3D figures is to illustrate the dynamic nature of the harmonic structure of musical tones. Consequently they are not complete acoustical analyses (which are readily available elsewhere), but serve to highlight information lost in temporally invariant power spectra.
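The kind of per-harmonic analysis visualized in Fig. 1 can be approximated with a short-time Fourier analysis. The following is only a minimal sketch, not the procedure used to create the figure: the function name `harmonic_envelopes`, the frame and hop sizes, and the three-harmonic test tone are all assumptions chosen for illustration.

```python
import numpy as np

def harmonic_envelopes(signal, sr, f0, n_harmonics=8, frame=2048, hop=512):
    """Track the magnitude of each harmonic of f0 over time.

    Returns an array of shape (n_frames, n_harmonics).
    """
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    # FFT bin closest to each harmonic (integer multiple of f0)
    bins = [int(np.argmin(np.abs(freqs - f0 * h)))
            for h in range(1, n_harmonics + 1)]
    rows = []
    for start in range(0, len(signal) - frame + 1, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame] * window))
        rows.append(spectrum[bins])
    return np.array(rows)

# Hypothetical test tone (not a real instrument): harmonic 1 is steady,
# harmonic 2 decays, and harmonic 3 grows over one second.
sr = 44100
t = np.arange(sr) / sr
f0 = 441.0
tone = (np.sin(2 * np.pi * f0 * t)
        + np.exp(-5 * t) * np.sin(2 * np.pi * 2 * f0 * t)
        + t * np.sin(2 * np.pi * 3 * f0 * t))

env = harmonic_envelopes(tone, sr, f0, n_harmonics=3)
```

Plotting the columns of `env` against frame index would give one curve per harmonic, which is the information a time-invariant power spectrum discards.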
As tones synthesized without adequate temporal changes often sound uninteresting or “fake,” composers of electronic music, producers, instrument manufacturers, and other musical professionals pay top dollar for high quality audio samplings of instruments needed for their artistic purposes. Some creators of electronic music prefer samples of real musical
sounds over efforts to synthesize these sounds (Risset & Wessel, 1999), in part due to the temporal complexity of accurately realizing the temporal changes in individual musical notes, as well as our sensitivity to small changes (or the lack thereof) in electronically generated tones. From a psychological perspective, what is so crucial about the structure of individual notes? What are the acoustic differences between life-like and dull renditions of individual instruments? The importance of dynamic changes in an individual note’s harmonics can be most usefully understood within the context of musical timbre—a complex, multidimensional property that has proven incredibly challenging to even define, let alone explain. Unfortunately for timbre enthusiasts, this property is often treated as a “miscellaneous category” (Dowling & Harwood, 1986, p. 63) accounting for the perceptual experience of “everything about a sound which is neither loudness nor pitch” (ANSI, 1994; Erickson, 1975). In other words, timbre is often defined less by what it is than what it is not (Risset & Wessel, 1999). This oppositional approach is sensible given the multitude of acoustic factors known to play a role in its perception (Caclin, McAdams, Smith, & Winsberg, 2005; McAdams, Winsberg, Donnadieu, de Soete, & Krimphoff, 1995).
A
S
M T
One particularly useful technique for studying musical timbre is multidimensional scaling (MDS), which allows for exploration absent of assumptions about which acoustic properties are most important. Many studies using this approach will present a variety of individual notes matched for pitch and intensity, asking participants to rate their similarity (or more often, dissimilarity). Analysis of dissimilarity ratings affords construction of a multidimensional space allowing for visualization of the “perceptual distance” between different pairs of notes. Early studies found spectral properties play a crucial role (Miller & Carterette, 1975), and subsequent work has refined our understanding of their role on both the neural (Tervaniemi, Schröger, Saher, & Näätänen, 2000) and perceptual (Grey & Gordon, 1978; Trehub, Endman, & Thorpe, 1990) levels.
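As a rough illustration of how dissimilarity ratings become a spatial map, the sketch below implements classical (Torgerson) MDS with NumPy. This is an assumption made for simplicity: the studies cited above may well have used other MDS variants, and the instrument labels and rating values here are invented placeholders, not data from any experiment.

```python
import numpy as np

def classical_mds(dissimilarity, n_dims=2):
    """Embed items in n_dims dimensions from a symmetric dissimilarity matrix."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    # Keep the n_dims largest eigenvalues (eigh returns ascending order)
    order = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical mean dissimilarity ratings (0 = identical), for illustration only
labels = ["trumpet", "trombone", "clarinet", "cello"]
ratings = np.array([
    [0.0, 2.0, 5.0, 6.0],
    [2.0, 0.0, 4.0, 3.0],
    [5.0, 4.0, 0.0, 5.0],
    [6.0, 3.0, 5.0, 0.0],
])
coords = classical_mds(ratings, n_dims=2)
```

Each row of `coords` is one note's position in the recovered space; the distance between two rows approximates the rated dissimilarity between the corresponding notes. Interpreting the axes (e.g., as spectral or temporal properties) is a separate step that requires acoustic analysis of the stimuli.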
Consequently, the role of spectra in timbre is well explained in numerous textbooks on auditory perception and music cognition (Dowling & Harwood, 1986; Tan et al., 2010; Thompson, 2009, p. 48), typically through visualizations of power spectra, similar to Fig. 2.
FIGURE 2. Power spectra of trumpet and clarinet. These plots accurately convey the trumpet’s energy at many harmonics in contrast to the clarinet’s energy primarily at odd numbered harmonics. However, power spectra fail to convey any information about the temporal changes in harmonic amplitude so crucial to a sound’s timbre.
Power spectra provide a useful, time-invariant summary of the relative harmonic strength. By collapsing along the temporal dimension shown in Fig. 1, Fig. 2 summarizes one of the characteristic distinctions between brass and woodwind instruments—that trumpets produce energy at all harmonics, whereas clarinets primarily emphasize alternate harmonics. Yet power spectra fail to capture the dynamic changes prominent in natural musical instruments, and the perceptual difference between synthesizing the information represented in Fig. 1 and Fig. 2 is striking. For interactive demonstrations of these differences, pedagogical tools useful for both teaching and research purposes are freely available from www.maplelab.net/pedagogy. The shortcomings of power spectra are clear in cases where temporal cues play key roles not only in the realism of a musical sound, but in the distinction between different musical timbres. For example, the top row of Fig. 3 shows power spectra for notes produced on the trombone vs. cello.1 This visual similarity in power spectra is somewhat surprising, given the
markedly different methods of sound production in these instruments—a brass tube driven by lips on a mouthpiece vs. a bow drawn across a string. Additionally, cellos and trombones function differently in most musical compositions, suggesting their perception is distinct. Although this distinction is not apparent from their power spectra, it is clear in the middle row of Fig. 3 showing changes in harmonic strength over time. The bottom row provides a visualization of tones synthesized using the power spectra in the first row—illustrating what is retained and what is lost in time-invariant visualizations of musical sounds.
FIGURE 3. Visualizations of trombone (left) and cello (right). Panels in top row illustrate similarity in these instruments’ power spectra, despite the clear acoustical differences shown in the middle panels. Bottom panels visualize tones synthesized using static power spectra (i.e., ignoring temporal changes in the strength of individual harmonics).
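The contrast between the middle and bottom rows of Fig. 3 can be sketched with simple additive synthesis. The example below is a hedged approximation rather than the method used for the figure: it treats the time-averaged amplitude of each harmonic as a stand-in for a static spectrum, and the function name `additive_synth` and the three envelope shapes are hypothetical.

```python
import numpy as np

def additive_synth(f0, envelopes, sr):
    """Sum harmonics of f0, each scaled by its own amplitude envelope.

    envelopes has shape (n_harmonics, n_samples).
    """
    n_harm, n = envelopes.shape
    t = np.arange(n) / sr
    harmonics = np.arange(1, n_harm + 1)[:, None]
    return np.sum(envelopes * np.sin(2 * np.pi * f0 * harmonics * t), axis=0)

sr = 22050
t = np.arange(sr) / sr

# Hypothetical time-varying envelopes for three harmonics (illustrative shapes)
dynamic_env = np.stack([
    np.exp(-2 * t),
    0.6 * np.exp(-6 * t),
    0.4 * (1 - np.exp(-10 * t)) * np.exp(-3 * t),
])

# Static version: every harmonic held at its time-averaged amplitude
static_env = np.repeat(dynamic_env.mean(axis=1, keepdims=True), sr, axis=1)

dynamic_tone = additive_synth(220.0, dynamic_env, sr)
static_tone = additive_synth(220.0, static_env, sr)
```

Both tones share the same average strength for each harmonic, yet only `dynamic_tone` retains the moment-to-moment changes in relative harmonic strength.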
Certain aspects of temporal dynamics are recognized as playing an important role in musical timbre. For example, both the rise time (initial onset) of notes (Grey, 1977; Krimphoff, McAdams, & Winsberg, 1994) and gross temporal structure (i.e., amplitude envelope) have been shown to be important (Iverson & Krumhansl, 1993). As an extreme example, reversing the temporal structure of a note qualitatively changes its timbre, such that a piano note played “backwards” sounds more like a reed-organ than a piano (Houtsma, Rossing, & Wagenaars, 1987). It is important to note that in this case the power spectra for piano notes played either forwards or backwards are identical, yet the experience of listening to these renditions differs markedly. Even beyond dramatic changes such as backwards listening, temporal changes are known to play an important role in sounds from natural instruments. However, interest in the connection between temporal dynamics and timbre has largely focused on a sound’s onset (Gordon, 1987; Strong & Clark, 1967) rather than changes throughout its sustain period. For example, past studies have shown that insensitivity to a tone’s onset correlates with reading deficits (Goswami, 2011). Tone onsets are also crucial to distinguishing between musical timbres (Skarratt, Cole, & Gellatly, 2009), and their removal leads to confusion of instruments otherwise easily differentiable (Saldanha & Corso, 1964).2
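The observation that forward and backward renditions share a power spectrum follows from a basic property of the Fourier transform: time-reversing a real signal changes only the phase of each frequency component, not its magnitude. A minimal numerical check is sketched below, using a hypothetical decaying two-component tone in place of a recorded piano note.

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr

# Hypothetical "percussive" note: two sinusoids under an exponential decay
note = np.exp(-6 * t) * (np.sin(2 * np.pi * 220 * t)
                         + 0.5 * np.sin(2 * np.pi * 440 * t))
reversed_note = note[::-1]

# Magnitude spectra of the forward and backward versions
mag_forward = np.abs(np.fft.rfft(note))
mag_backward = np.abs(np.fft.rfft(reversed_note))
same_spectrum = np.allclose(mag_forward, mag_backward)

# The time at which each version reaches peak amplitude differs sharply
peak_forward = np.argmax(np.abs(note)) / sr
peak_backward = np.argmax(np.abs(reversed_note)) / sr
```

Here `same_spectrum` evaluates to true even though the forward note peaks near its start and the reversed note peaks near its end, which is exactly the information a static spectrum cannot represent.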
T S
U
T M
V P
R
Although temporal changes in the strengths of individual harmonics clearly play an important role in musical sounds, these changes are rightly recognized by experimental psychologists as potentially confounding (or at least introducing noise into) perceptual experiments. Not only will different instruments (along with variations in mouthpieces, mallets, bows, etc.)
make consistency challenging when using natural musical tones, the complexity of changes in recordings of nominally steady-state notes runs contrary to the level of control desirable for scientific experimentation. If an experimenter’s goal is to explore the role of pitch difference in auditory stream segregation, short pure tones with minimal amplitude variation offer clear benefits for drawing strong, replicable conclusions elucidating some aspects of our auditory perceptual organization. Consequently, the high degree of emphasis placed upon tightly constrained, easily reproducible stimuli incentivizes the use of simplified tones lacking temporal variation beyond simplistic onsets and offsets. This raises important questions about what kinds of stimuli are used to assess auditory perception. Although simplified sounds aid researchers in avoiding problematic confounds, their over-use could lead to challenges with generalizing their findings to natural sounds with the kinds of temporal variations shown in Fig. 1. In order to explore the kinds of sounds used in research on music perception, my team surveyed 118 empirical papers published in the journal Music Perception from experiments dating back to its inception in 1983, based on a previous comprehensive bibliometric survey (Tirovolas & Levitin, 2011). Primarily interested in determining the amount of amplitude variation found in the temporal structures of auditory stimuli, we classified every stimulus used in each of the 212 surveyed experiments as either “flat” (i.e., lacking temporal variation), “percussive” (decaying notes such as those produced by the piano, cowbell, or marimba), or “other”—sounds such as those produced by sustained instruments like the French horn or human voice. Fig. 4 illustrates examples of each stimulus class.
FIGURE 4. Wave forms of different sounds found in the survey of stimuli used in Music Perception (Schutz & Vaisberg, 2014). Reproduced from Music Perception: An Interdisciplinary Journal 31(3), Michael Schutz and Jonathan M. Vaisberg, Surveying the temporal structure of sounds used in music perception, pp. 288–296, doi:10.1525/mp.2014.31.3.288, Copyright © 2014, The Regents of the University of California.
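Two of these stimulus classes are easy to sketch in code. The example below shapes the same pure tone with a “flat” and a “percussive” amplitude envelope; the function names, the 10 ms ramp, and the decay rate are assumptions chosen for illustration and are not parameters taken from the survey.

```python
import numpy as np

def pure_tone(freq, dur, sr=44100):
    """Return a time axis and a sine wave of the given frequency and duration."""
    t = np.arange(int(dur * sr)) / sr
    return t, np.sin(2 * np.pi * freq * t)

def flat_envelope(t, ramp=0.01):
    """Constant amplitude apart from short linear onset and offset ramps."""
    dur = t[-1]
    env = np.minimum(t / ramp, (dur - t) / ramp)
    return np.clip(env, 0.0, 1.0)

def percussive_envelope(t, ramp=0.01, decay_rate=8.0):
    """Rapid onset followed by a continuous exponential decay."""
    return np.clip(t / ramp, 0.0, 1.0) * np.exp(-decay_rate * t)

t, carrier = pure_tone(440.0, 0.5)
flat_tone = carrier * flat_envelope(t)
percussive_tone = carrier * percussive_envelope(t)
```

Because both tones share the same carrier, any perceptual difference between them can be attributed to temporal structure alone, which is the logic behind comparing such stimuli in controlled experiments.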
The most surprising outcome from this survey was that although most articles included a wealth of technical information on spectral structure, duration, and the exact model of headphones or speakers used to present the
stimuli, about 35 percent failed to define the stimuli’s temporal structure. This finding is not unique to Music Perception: my team found similar problems with under-specification in the journal Attention, Perception & Psychophysics (Gillard & Schutz, 2013). More important than under-specification, both surveys revealed a strong bias against sounds with the kinds of temporal variations common to musical instruments. Although flat tones lend themselves well to tight experimental control and consistent replication amongst different labs, they fail to capture the richness of the sounds forming the backbone of the musical listening experience. Yet they remain prominent in a wide range of research on auditory perception on tasks purportedly designed to illuminate generalizable principles of auditory perception. Prominent researchers have noted that the world is “[not] replete with examples of naturally occurring auditory pedestals [i.e., flat amplitude envelopes]” (Phillips, Hall, & Boehnke, 2002, p. 199). Yet flat tones, which are clearly far removed from the complexity of natural musical sounds, appear to be the normative approach to research on auditory perception, as shown in Fig. 5. Note that each of the three musical instruments visualized not only exhibits constant temporal changes, but temporal changes in the amplitudes of each individual harmonic. This dynamic fluctuation contrasts starkly with the flat tones favored in auditory perception research shown in the bottom right panel. This over-fixation on sounds lacking meaningful amplitude variation is not confined to behavioral work; a large-scale review of auditory neuroscience research concluded with a note of caution that important properties of functions of the auditory system will only be fully understood when researchers begin employing envelopes that “involve modulation in ways that are closer to real-world tasks faced by the auditory system” (Joris, Schreiner, & Rees, 2004, p. 570). 
The acoustic distance between the temporally dynamic musical sounds and temporally constrained flat tones common in auditory perception and neuroscience research raises important questions about the degree to which theories and models derived from these experiments generalize to musical listening. The complexities of balancing competing needs for experimental control and ecological relevance are significant, and will serve as the focus of the following section.
FIGURE 5. Single notes produced by an oboe (upper left), French horn (upper right), and viola (lower left) illustrate their temporal complexity. Although their specific mix of harmonics varies, these instruments all exhibit constant changes in the strength of each harmonic over the tone’s duration. This temporal complexity contrasts strongly with the temporal simplicity of the flat tone depicted in the lower right panel, which lacks temporal variation beyond abrupt onsets/offsets, and no change in relative strength of harmonics.
O
M
C S
S
This focus on tightly constrained stimuli is not necessarily problematic; control of extraneous variables is essential to researchers’ ability to draw strong conclusions from individual experiments. Consistency in the
synthesis of stimuli amongst different labs holds many advantages with respect to replication, an issue of increasing importance to the field as a whole. And in some circumstances the real-world associations inherent in temporally complex sounds can pose obstacles to answering key questions. For example, researchers exploring acoustic attributes of unpleasant sounds illustrate that frequency range (Kumar, Forster, Bailey, & Griffiths, 2008), spectral “roughness” (Terhardt, 1974), and the relative mix of harmonics-to-noise (Ferrand, 2002) are key factors, issues important for engineers designing human–computer auditory interfaces. Yet a direct ranking of sounds shows that vomiting is regarded as one of the most unpleasant (Cox, 2008), an outcome related less to its specific acoustic properties than to the obvious real-world associations (McDermott, 2012). In some cases these real-world associations may be regarded as confounds obfuscating the general principles at hand. Therefore, in some inquiries aimed at understanding the relationship between acoustic structure and perceptual response, it is not only reasonable but actually necessary to use sounds devoid of referents. This issue of disentangling the effects attributable to associations vs. acoustic features is of particular importance in the perception of music, given the rich and complex relationship between music, memory, and emotion. Familiar compositions can evoke memories as a result of past associations, for example from a history of personal listening/performance (Schulkind, Hennis, & Rubin, 1999) or use in film sound tracks (e.g., those used by Vuoskoski and Eerola, 2012). Indeed songs from popular television shows are so familiar they have even been used to assess the pervasiveness of absolute pitch amongst the general population (Schellenberg & Trehub, 2003). Consequently, synthesized tones lacking real-world associations serve a useful purpose in advancing our understanding of auditory perception. 
However, although artificial sounds devoid of real-world associations that afford precise control/replication offer advantages in certain circumstances, their simplicity can pose barriers to fully understanding music perception. In fact, auditory psychophysics’ focus on “control” (Neuhoff, 2004) and the study of isolated parameters absent their natural context (Gaver, 1993) is an issue of long-standing concern in some corners of the auditory perception community. This is of particular importance to understanding music, as composers, performers, conductors, and recording
engineers devote great attention to slight nuances of musical timbre. Yet the same differences so useful in artistic creation often serve as confounds within the realm of auditory psychophysics. This raises important questions about the types of stimuli that should be used in experiments designed to address questions related to music listening. Can artificial sounds abstracted from our day-to-day musical experiences lead to experimental outcomes that generalize to listening outside the laboratory? Perceptual experiments exploring audio-visual integration in musical contexts offer a useful case study in the consequences of ignoring the role of musical sounds’ dynamic temporal structures. A large body of audio-visual integration research using temporally simplistic sounds has concluded that vision rarely influences auditory evaluations of duration3 (Fendrich & Corballis, 2001; Walker & Scott, 1981; Welch & Warren, 1980). However, a musical experiment exploring an ongoing debate amongst percussionists led to a surprising break with widely accepted theory. In that series of studies an internationally acclaimed musician attempted to create long and short notes on the marimba, a tuned, wooden bar instrument similar to the xylophone. Notes on the marimba are percussive (Fig. 4, middle panel), with continuous temporal variation in their structure as the energy transferred into the bar (by striking) gradually dissipates as a result of friction, air resistance, etc. Whether or not the duration of these notes can be intentionally varied has been long debated in the percussion community (Schutz & Manning, 2012). However, an assessment of an expert percussionist’s ability to control note duration demonstrated that these gestures are in fact acoustically inconsequential, but trigger an illusion in which the longer physical gesture used to strike the instrument affects perception of the resulting note’s duration (Schutz & Lipscomb, 2007). 
Musical implications (Schutz, 2008) aside, this finding represents a clear break from previously accepted views on the integration of sight and sound (Fendrich & Corballis, 2001; Walker & Scott, 1981; Welch & Warren, 1980). The surprising ability of percussionists to shape perceived note duration despite previous experimental work to the contrary stems in large part from a bias in the temporal structure of stimuli used in auditory research. Subsequent experiments illustrate that movements derived from the percussionists’ gesture (Schutz & Kubovy, 2009b) integrate with sounds exhibiting decaying envelopes (e.g., piano notes, produced from the impact
of a hammer on string), but fail to integrate with the sustained tones produced by the clarinet or French horn (Schutz & Kubovy, 2009a). As the clarinet differs in many properties from the marimba and piano, a direct test of temporal structure using pure tones (i.e., sine waves) shaped with decaying vs. invariant amplitude envelopes found that visual information integrated with the temporally dynamic percussive tones, but not the temporally invariant flat tones previously used in audio-visual integration experiments (Schutz, 2009). This distinction between the outcomes of experiments with tones using temporally dynamic vs. static amplitude envelopes is important in assessing the degree to which lab-based tasks inform our understanding of listening in the real world. For example, temporal structure can play a key role in the well-known audio-visual bounce effect (ABE), in which two circles approach each other, overlap, and then move to their original starting point. Although this ambiguous display can be perceived as depicting circles either “bouncing off” or “passing through” one another, a brief tone coincident with the moment of overlap enhances the likelihood of seeing a bounce (Sekuler, Sekuler, & Lau, 1997). However, not all sounds affect this integrated perception in the same way. Sounds synthesized with decaying envelopes mimicking impact events trigger significantly more bounce percepts than their mirror images (Grassi & Casco, 2009). The temporal structure of individual tones also plays a role in a variety of “general” perceptual tasks assessed primarily using tones lacking dynamic temporal changes, leading to different experimental outcomes in tasks ranging from learning associations (Schutz, Stefanucci, Baum, & Roth, 2017) to perceiving pitches (Neuhoff & McBeath, 1996), assessing event duration (Vallet, Shore, & Schutz, 2014), and segmenting auditory streams (Iverson, 1995). 
Overlooking the importance of temporal structure in auditory perception can even lead to misguided theoretical claims used to inform ongoing research programs. For example, as discussed previously a great deal of audio-visual integration research involves temporally simplified tones ensuring experimental control. However, interest in the role of the natural connection between sight and sound has been considered in discussions regarding the “unity assumption” (Welch, 1999) and/or “identity decision” (Bedford, 2004). That research explores the idea that event unity between sight and sound plays an important role in the binding decision, such that
stimuli perceived as “going together” are more likely to bind. For example, in the well-known “ventriloquist effect” the sound of a ventriloquist’s voice is perceptually bound with concurrent lip movements of their puppets (Abry, Cathiard, Robert-Ribes, & Schwartz, 1994; Bonath et al., 2007). Unfortunately, the natural real-world relationships between sights and sounds often pose challenges for the controlled manipulations so important to experimental research. For example, tightly controlled, psychophysically inspired studies of multimodal speech help clarify the importance of event unity in multisensory integration. Gender-matched faces and voices (the sound of a male producing a syllable paired with the lip movements of either a male or a female articulating that syllable) bind more strongly than gender-mismatched faces and voices (Vatakis & Spence, 2007). This finding offers strong evidence for the unity assumption, raising important questions about the degree to which it applies to auditory stimuli beyond speech. A series of experiments assessing the role of the unity assumption with musical stimuli involved pairing the sound of a piano note and plucked guitar string with video recordings of the movements used to produce these sounds. Following their earlier procedures, this approach found no evidence of the unity assumption playing a role in this non-speech musical task (as well as other stimuli such as a hammer striking ice vs. a bouncing ball). This outcome contributed to the conclusion that the unity assumption applied only to speech stimuli (Vatakis, Ghazanfar, & Spence, 2008). However, as summarized below, subsequent research that considered the importance of auditory temporal structure found strong evidence for the unity assumption in non-speech tasks. The piano and guitar sounds used by Vatakis et al. 
(2008) exhibited similar amplitude envelopes, a property defining the gross temporal structure of a sound (i.e., the summation of changes in the amplitudes of spectral components). Building upon their approaches to assessing binding using musical notes produced by the marimba and cello, my team found evidence for the unity assumption when assessing sounds that involved clearly differentiable amplitude envelopes (Chuen & Schutz, 2016). In hindsight, the traditional focus on flat tones in auditory psychophysics research helped obscure the otherwise obvious similarity in temporal structure of the guitar and piano notes used by Vatakis et al. (2008). Given the relatively small proportion of auditory perception studies using natural sounds, this oversight is understandable, and the use of natural sounds in
psychophysics experiments is laudable given the general focus on temporally invariant stimuli, which “often seems to have limited direct relevance for understanding the ability to recognize the nature of complex natural acoustic source events” (Pastore, Flint, Gaston, & Solomon, 2008, p. 13). From these examples, it is clear that the time-varying structure of natural sounds (or lack thereof) can meaningfully influence the outcomes of psychological experiments. This is true whether researchers’ goals are to explore natural listening or to better understand the theoretical structure and function of the auditory system. This issue holds important implications even for experiments aimed at elucidating generalized principles of perceptual processing rather than explicitly assessing the role of dynamic temporal changes. Together, these concerns are consistent with those raised previously by proponents of ecological acoustics such as John Neuhoff, who argue that “the perception of dynamic, ecologically valid stimuli is not predicted well by the results of many traditional experiments using static stimuli” (2004, p. 5).
Conclusions
Traditional studies of specific sequences of notes such as the four-note opening of Beethoven’s Fifth Symphony provide useful insight into both the theoretical structure of musical passages, as well as their larger cultural relevance. Much as the constant movement of pitches and rhythms gives rise to lively melodies, the continual variations in temporal structure (for multiple simultaneous harmonics) play an important role in musical listening. However, as this information is not notated in musical scores and is often under-emphasized in scientific discourse, the importance of these dynamic changes is not always fully recognized. This “insight” is well understood amongst those involved in sound synthesis and virtual modeling of musical instruments. However, the need for tight experimental control for stimuli used in experimental work on auditory perception and auditory neuroscience has incentivized the use of simple time-invariant flat tones. Although they offer important methodological benefits, their distance from musical sounds can pose limitations on their ability to inform our
understanding of natural listening. With modern recording and sound synthesis approaches we now have the ability to generate auditory stimuli exhibiting the rich temporal variation of natural musical sounds, while also affording the precise control so crucial for avoiding confounds—raising exciting new possibilities for future innovation and discovery. Looking toward the future, research assessing core questions of auditory perception using temporally complex sounds will help clarify the degree to which existing theories and models apply to our perception of natural sounds such as those produced by musical instruments.
Acknowledgments
Funding supporting this research was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC), Social Sciences and Humanities Research Council of Canada (SSHRC), and the Ontario Early Researcher Award (ERA). I would like to thank Maxwell Ng for his assistance in creating the visualizations of the instrument sounds used throughout this chapter.
R
Abry, C., Cathiard, M. A., Robert-Ribes, J., & Schwartz, J. L. (1994). The coherence of speech in audio-visual integration. Current Psychology of Cognition 13, 52–59. Acoustical Society of America Standards Secretariat (1994). Acoustical Terminology ANSI S1.1–1994 (ASA 111-1994). American National Standard. ANSI/Acoustical Society of America. Alais, D., & Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration. Current Biology 14(3), 257–262. Aldwell, E., Schachter, C., & Cadwallader, A. (2002). Harmony & voice leading (3rd ed.). Boston, MA: Schirmer. Alexander, P. L., & Broughton, B. (2008). Professional orchestration: The first key. Solo instruments & instrumentation note, volume 1 (3rd ed.). Petersburg, VA: Alexander Publishing. Bedford, F. L. (2004). Analysis of a constraint on perception, cognition, and development: One object, one place, one time. Journal of Experimental Psychology: Human Perception and Performance 30(5), 907–912. Bhatara, A., Tirovolas, A. K., Duan, L. M., Levy, B., & Levitin, D. J. (2011). Perception of emotional expression in musical performance. Journal of Experimental Psychology: Human Perception and Performance 37(3), 921–934.
Bonath, B., Noesselt, T., Martinez, A., Mishra, J., Schwiecker, K., Heinze, H.-J., & Hillyard, S. A. (2007). Neural basis of the ventriloquist illusion. Current Biology 17(19), 1697–1703. Boulez, P. (1987). Timbre and composition—timbre and language. Contemporary Music Review 2(1), 161–171. Broze, Y., & Huron, D. (2013). Is higher music faster? Pitch–speed relationships in Western compositions. Music Perception: An Interdisciplinary Journal 31(1), 19–31. Caclin, A., McAdams, S., Smith, B. K., & Winsberg, S. (2005). Acoustic correlates of timbre space dimensions: A confirmatory study using synthetic tones. Journal of the Acoustical Society of America 118(1), 471–482. Chapin, H., Jantzen, K., Kelso, J. A. S., Steinberg, F., & Large, E. W. (2010). Dynamic emotional and neural responses to music depend on performance expression and listener experience. PLoS ONE 5, 1–14. Chuen, L., & Schutz, M. (2016). The unity assumption facilitates cross-modal binding of musical, non-speech stimuli: The role of spectral and amplitude cues. Attention, Perception, & Psychophysics 78(5), 1512–1528. Clough, J., & Conley, J. (1984). Basic harmonic progressions. New York: W. W. Norton. Corrigall, K. A., & Trainor, L. J. (2010). Musical enculturation in preschool children: Acquisition of key and harmonic knowledge. Music Perception: An Interdisciplinary Journal 28(2), 195–200. Corrigall, K. A., & Trainor, L. J. (2014). Enculturation to musical pitch structure in young children: Evidence from behavioral and electrophysiological methods. Developmental Science 17(1), 142–158. Cox, T. J. (2008). Scraping sounds and disgusting noises. Applied Acoustics 69(12), 1195–1204. Dowling, W. J., & Harwood, D. L. (1986). Music cognition. Orlando, FL: Academic Press. Eerola, T., Friberg, A., & Bresin, R. (2013). Emotional expression in music: Contribution, linearity, and additivity of primary musical cues. Frontiers in Psychology 4, 1–12. Retrieved from https://doi.org/10.3389/fpsyg.2013.00487 Erickson, R.
(1975). Sound Structure in Music. Berkeley, CA: University of California Press. Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415(6870), 429–433. Fendrich, R., & Corballis, P. M. (2001). The temporal cross-capture of audition and vision. Perception & Psychophysics 63(4), 719–725. Ferrand, C. T. (2002). Harmonics-to-noise ratio: An index of vocal aging. Journal of Voice 16(4), 480–487. Fritts, L. (1997). University of Iowa Electronic Music Studios. University of Iowa. Retrieved from http://theremin.music.uiowa.edu/MIS.html Gaver, W. (1993). What in the world do we hear? An ecological approach to auditory event perception. Ecological Psychology 5(1), 1–29. Gillard, J., & Schutz, M. (2013). The importance of amplitude envelope: Surveying the temporal structure of sounds in perceptual research. In Proceedings of the Sound and Music Computing Conference (pp. 62–68). Stockholm, Sweden. Gordon, J. W. (1987). The perceptual attack time of musical tones. Journal of the Acoustical Society of America 82(1), 88–105. Goswami, U. (2011). A temporal sampling framework for developmental dyslexia. Trends in Cognitive Sciences 15(1), 3–10. Grassi, M., & Casco, C. (2009). Audiovisual bounce-inducing effect: Attention alone does not explain why the discs are bouncing. Journal of Experimental Psychology: Human Perception and Performance 35(1), 235–243.
Grey, J. M. (1977). Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America 61(5), 1270–1277. Grey, J. M., & Gordon, J. W. (1978). Perceptual effects of spectral modifications on musical timbres. Journal of the Acoustical Society of America 63(5), 1493–1500. Guerrieri, M. (2012). The first four notes: Beethoven’s Fifth and the human imagination. New York: Alfred A. Knopf. Hamberger, C. L. (2012). The evolution of Schoenberg’s Klangfarbenmelodie: The importance of timbre in modern music. The Pennsylvania State University. Retrieved from https://etda.libraries.psu.edu/files/final_submissions/8130 Heinlein, C. P. (1928). The affective characters of the major and minor modes in music. Journal of Comparative Psychology 8, 101–142. Hevner, K. (1935). The affective character of the major and minor modes in music. American Journal of Psychology 47(1), 103–118. Hjortkjaer, J. (2013). The musical brain. In J. O. Lauring (Ed.), An introduction to neuroaesthetics: The neuroscientific approach to aesthetic experience, artistic creativity, and arts appreciation (pp. 211–244). Copenhagen: Museum Tusculanum Press. Houtsma, A. J. M., Rossing, T. D., & Wagennars, W. M. (1987). Auditory demonstrations on compact disc. Journal of the Acoustical Society of America. New York: Acoustical Society of America/Eindhoven: Institute for Perception Research. Huron, D., & Ollen, J. (2003). Agogic contrast in French and English themes: Further support for Patel and Daniele (2003). Music Perception: An Interdisciplinary Journal 21(2), 267–271. Iverson, P. (1995). Auditory stream segregation by musical timbre: Effects of static and dynamic acoustic attributes. Journal of Experimental Psychology: Human Perception and Performance 21, 751–763. Iverson, P., & Krumhansl, C. L. (1993). Isolating the dynamic attributes of musical timbre. Journal of the Acoustical Society of America 94, 2594–2603. Joris, P. X., Schreiner, C. E., & Rees, A. (2004). 
Neural processing of amplitude-modulated sounds. Physiological Reviews 84, 541–577. Jourdain, R. (1997). Music, the brain, and ecstasy: How music captures our imagination. New York: William Morrow and Company. Kendall, R. A. (1986). The role of acoustic signal partitions in listener categorization of musical phrases. Music Perception 4(2), 185–213. Koelsch, S., & Friederici, A. D. (2003). Toward the neural basis of processing structure in music. Annals of the New York Academy of Sciences 999, 15–28. Krimphoff, J., McAdams, S., & Winsberg, S. (1994). Caractérisation du timbre des sons complexes. II. Analyses acoustiques et quantification psychophysique. Journal de Physique IV Colloque 4, 625–628. Kumar, S., Forster, H. M., Bailey, P., & Griffiths, T. D. (2008). Mapping unpleasantness of sounds to their auditory representation. Journal of the Acoustical Society of America 124(6), 3810–3817. Lowis, M. J. (2002). Music as a trigger for peak experiences among a college staff population. Creativity Research Journal 14(3–4), 351–359. McAdams, S., Winsberg, S., Donnadieu, S., de Soete, G., & Krimphoff, J. (1995). Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes. Psychological Research 58(3), 177–192. McDermott, J. (2012). Auditory preferences and aesthetics: Music, voices, and everyday sounds. In R. J. Dolan & T. Sharot (Eds.), Neuroscience of preference and choice: Cognitive and neural mechanisms (pp. 227–257). London: Academic Press.
Miller, J. R., & Carterette, E. C. (1975). Perceptual space for musical structures. Journal of the Acoustical Society of America 58(3), 711–720. Moore, B. C. J. (1997). An introduction to the psychology of hearing (4th ed.). London: Academic Press. Neuhoff, J. G. (2004). Ecological psychoacoustics (J. G. Neuhoff, Ed.). Amsterdam: Elsevier/Academic Press. Neuhoff, J. G., & McBeath, M. K. (1996). The Doppler illusion: The influence of dynamic intensity change on perceived pitch. Journal of Experimental Psychology: Human Perception and Performance 22(4), 970–985. Pallesen, K. J., Brattico, E., Bailey, C., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S. (2005). Emotion processing of major, minor, and dissonant chords: A functional magnetic resonance imaging study. Annals of the New York Academy of Sciences 1060, 450–453. Pastore, R. E., Flint, J., Gaston, J. R., & Solomon, M. J. (2008). Auditory event perception: The source–perception loop for posture in human gait. Perception & Psychophysics 70(1), 13–29. Patel, A. D., & Daniele, J. R. (2003). Stress-timed vs. syllable-timed music? A comment on Huron and Ollen (2003). Music Perception: An Interdisciplinary Journal 21(2), 273–276. Phillips, D. P., Hall, S. E., & Boehnke, S. E. (2002). Central auditory onset responses, and temporal asymmetries in auditory perception. Hearing Research 167(1–2), 192–205. Poon, M., & Schutz, M. (2015). Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech. Frontiers in Psychology 6, 1–13. Retrieved from https://doi.org/10.3389/fpsyg.2015.01419 Repp, B. H. (1995). Quantitative effects of global tempo on expressive timing in music performance: Some perceptual evidence. Music Perception: An Interdisciplinary Journal 13(1), 39–57. Rimsky-Korsakov, N. (1964). Principles of orchestration (M. Steinberg, Ed.). New York: Dover. Risset, J.-C., & Wessel, D. L. (1999). Exploration of timbre by analysis and synthesis. In D. 
Deutsch (Ed.), The Psychology of Music (pp. 113–169). San Diego, CA: Gulf Professional Publishing. Rossing, T. D., Moore, R. F., & Wheeler, P. A. (2013). The science of sound (3rd ed.). London: Pearson Education. Saldanha, E. L., & Corso, J. F. (1964). Timbre cues and the identification of musical instruments. Journal of the Acoustical Society of America 36(11), 2021–2026. Schellenberg, E. G. (2002). Asymmetries in the discrimination of musical intervals: Going out-of-tune is more noticeable than going in-tune musical intervals. Music Perception: An Interdisciplinary Journal 19(2), 223–248. Schellenberg, E. G., & Trehub, S. E. (2003). Good pitch memory is widespread. Psychological Science 14(3), 262–266. Schenker, H. (1971). Analysis of the first movement. In E. Forbes (Ed.), Beethoven Symphony No. 5 in C minor (pp. 164–182). New York: W. W. Norton. Schulkind, M. D., Hennis, L. K., & Rubin, D. C. (1999). Music, emotion, and autobiographical memory: They’re playing your song. Memory & Cognition 27(6), 948–955. Schutz, M. (2008). Seeing music? What musicians need to know about vision. Empirical Musicology Review 3(3), 83–108. Schutz, M. (2009). Crossmodal integration: The search for unity (Dissertation). University of Virginia. Schutz, M., & Kubovy, M. (2009a). Causality and cross-modal integration. Journal of Experimental Psychology: Human Perception and Performance 35(6), 1791–1810. Schutz, M., & Kubovy, M. (2009b). Deconstructing a musical illusion: Point-light representations capture salient properties of impact motions. Canadian Acoustics 37(1), 23–28.
Schutz, M., & Lipscomb, S. (2007). Hearing gestures, seeing music: Vision influences perceived tone duration. Perception 36(6), 888–897. Schutz, M., & Manning, F. (2012). Looking beyond the score: The musical role of percussionists’ ancillary gestures. Music Theory Online 18, 1–14. Schutz, M., Stefanucci, J., Baum, S. H., & Roth, A. (2017). Name that percussive tune: Associative memory and amplitude envelope. Quarterly Journal of Experimental Psychology 70(7), 1323–1343. Schutz, M., & Vaisberg, J. M. (2014). Surveying the temporal structure of sounds used in music perception. Music Perception: An Interdisciplinary Journal 31(3), 288–296. Sekuler, R., Sekuler, A. B., & Lau, R. (1997). Sound alters visual motion perception. Nature 385(6614), 308. Skarratt, P. A., Cole, G. G., & Gellatly, A. R. H. (2009). Prioritization of looming and receding objects: Equal slopes, different intercepts. Attention, Perception, & Psychophysics 71(4), 964–970. Sloboda, J. (1991). Music structure and emotional response: Some empirical findings. Psychology of Music 19(2), 110–120. Strong, W., & Clark, M. (1967). Perturbations of synthetic orchestral wind-instrument tones. Journal of the Acoustical Society of America 41(2), 277–285. Suzuki, M., Okamura, N., Kawachi, Y., Tashiro, M., Arao, H., Hoshishiba, T., … Yanai, K. (2008). Discrete cortical regions associated with the musical beauty of major and minor chords. Cognitive, Affective, & Behavioral Neuroscience 8(2), 126–131. Tan, S.-L., Pfordresher, P. Q., & Harré, R. (2007). Psychology of music: From sound to significance. New York: Psychology Press. Terhardt, E. (1974). On the perception of periodic sound fluctuations (roughness). Acta Acustica United with Acustica 30, 201–213. Tervaniemi, M., Schröger, E., Saher, M., & Näätänen, R. (2000). Effects of spectral complexity and sound duration on automatic complex-sound pitch processing in humans: A mismatch negativity study. Neuroscience Letters 290, 66–70. Thompson, W. F. (2009).
Music, thought, and feeling: Understanding the psychology of music. New York: Oxford University Press. Tirovolas, A. K., & Levitin, D. J. (2011). Music perception and cognition research from 1983 to 2010: A categorical and bibliometric analysis of empirical articles in Music Perception. Music Perception: An Interdisciplinary Journal 29(1), 23–36. Tovey, D. F. (1971). The Fifth Symphony. In E. Forbes (Ed.), Beethoven Symphony No. 5 in C minor (pp. 143–150). New York: W. W. Norton. Trehub, S. E., Endman, M. W., & Thorpe, L. A. (1990). Infants’ perception of timbre: Classification of complex tones by spectral structure. Journal of Experimental Child Psychology 49(2), 300–313. Vallet, G., Shore, D. I., & Schutz, M. (2014). Exploring the role of amplitude envelope in duration estimation. Perception 43(7), 616–630. Vatakis, A., Ghazanfar, A. A., & Spence, C. (2008). Facilitation of multisensory integration by the “unity effect” reveals that speech is special. Journal of Vision 8(9), 1–11. Vatakis, A., & Spence, C. (2007). Crossmodal binding: Evaluating the “unity assumption” using audiovisual speech stimuli. Perception & Psychophysics 69(5), 744–756. Vuoskoski, J. K., & Eerola, T. (2012). Can sad music really make you sad? Indirect measures of affective states induced by music and autobiographical memories. Psychology of Aesthetics, Creativity, and the Arts 6, 1–10. Walker, J. T., & Scott, K. J. (1981). Auditory-visual conflicts in the perceived duration of lights, tones and gaps. Journal of Experimental Psychology: Human Perception and Performance 7(6), 1327–1339.
Wang, S., Liu, B., Dong, R., Zhou, Y., Li, J., Qi, B., … Zhang, L. (2012). Music and lexical tone perception in Chinese adult cochlear implant users. The Laryngoscope 122, 1353–1360. Warren, R. M. (2013). Auditory perception: A new synthesis. Amsterdam: Elsevier. Welch, R. B. (1999). Meaning, attention, and the “unity assumption” in the intersensory bias of spatial and temporal perceptions. In G. Aschersleben, T. Bachmann, & J. Musseler (Eds.), Cognitive contributions to the perception of spatial and temporal events (pp. 371–387). Amsterdam: Elsevier. Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory discrepancy. Psychological Bulletin 88(3), 638–667.
1 All analyses of notes in this chapter are based on additional samples from the University of Iowa Electronic Music studios (Fritts, 1997).
2 However, presenting notes without transients as part of a melodic sequence (rather than as isolated tones) may mitigate this confusion (Kendall, 1986).
3 Provided that the acoustic information is of sufficient quality (Alais & Burr, 2004; Ernst & Banks, 2002).
CHAPTER 8
NEURAL BASIS OF RHYTHM PERCEPTION
CHRISTINA M. VANDEN BOSCH DER NEDERLANDEN, J. ERIC T. TAYLOR, AND JESSICA A. GRAHN
I
To experience music, listeners must be able to pick up on the temporal relationships among events as they unfold. These temporal relationships are characterized by the rhythm, or the pattern of time intervals between the onsets of events in music (see Fig. 1). Unlike sculpture or painting, music and dance require us to comprehend rhythmic structure in order to perceive and produce them. One of the most intriguing phenomena in music is that when we listen to rhythm, we perceive a regular, recurring pulse or beat (Cooper & Meyer, 1960; Large, 2008), which allows us to bob our heads and clap our hands in time to the music. This psychologically generated beat does not always have to align with the note onsets in a rhythm, as evidenced by the fact that we mentally continue the beat through gaps in the music (see Fig. 1). We further organize the musical beat into alternations of strong and weak beats at multiple hierarchical timescales, called meter (Epstein, 1995; Lerdahl & Jackendoff, 1983). The meter helps us distinguish between, for example, a waltz (i.e., triple meter) and a march (i.e., duple meter) depending on whether we hear the strong beat fall on every third or every second beat, respectively.
FIGURE 1. Rhythm is represented by the black dots on each line, whereas the beat is represented by the bold black lines occurring every other beat (duple meter). (A) represents a simple metrical pattern, in which events fall on the beat more often than not. (B) depicts a complex metrical pattern, with some events occurring on the beat while many others do not. Finally, (C) illustrates a syncopated rhythm, in which note events always occur off the beat.
Despite the ease with which humans pick up on the beat and synchronize their movements to music, it is not trivial to understand how human brains perceive and process rhythm. Musical rhythms are beat-based, which means that the pattern of onsets gives rise to the feeling of an underlying pulse or framework. Perceiving a beat can make it easier to predict and act on upcoming events in a rhythmic sequence. However, many other naturally occurring rhythms in our environment do not have a regular pulse or beat, such as walking, talking, or a car engine turning over. Such rhythms are called non-beat-based. Different mechanisms have been proposed to account for the way that humans encode beat- and non-beat-based rhythms. Absolute timing mechanisms encode exact durations of all intervals in a sequence, whereas relative timing mechanisms encode when intervals start and stop in relation to the beat. If there is no regular beat, then absolute timing is likely necessary to encode the rhythm. However, with a beat, relative timing may be used. There is evidence for distinct neural networks associated with absolute and relative timing (Teki, Grube, Kumar, & Griffiths, 2011), with participants relying on either mechanism depending on the nature of the rhythm and task demands.

A number of approaches are used to understand rhythm processing, incorporating methodologies based on behavior, neuroimaging, and patient studies. Measuring behavior is fundamental to our understanding of rhythm because it can provide a direct measure of how we move to music. However, much of behavioral research is correlational: distinct measures of stimulus characteristics and tapping variability may be related to one another, but the stimulus characteristics may not cause tapping variability, as there may be a third unmeasured variable that is the true influencer of performance. Although neuroimaging approaches are ideal for discovering more about when and where in the brain rhythm is processed, some neuroimaging studies fail to include (or are unable to include for methodological reasons) behavioral measures of rhythm processing, which makes it difficult to determine exactly how differences in neural activation relate to real-world outcomes. That is, simply because there are differences in activation for two different rhythms does not necessarily mean that participants will also perceive them differently. Finally, studies of patients with brain damage or dysfunction provide significant insights, but rely on natural accidents, which do not lead to the same amount or location of damage in each individual. This makes it difficult to understand which lesioned areas are truly necessary for rhythm processing or whether a particular combination of areas is required to perceive rhythm. Of course, a combination of these approaches, together with methods that focally disrupt ongoing neural processing, such as transcranial magnetic stimulation (TMS), has given rise to a rich literature on human rhythm perception. This chapter will review the current literature on the neural basis of rhythm perception, which highlights important brain areas for perceiving a beat, and how the human brain entrains to rhythms in music.
F
B
To understand how the brain processes rhythm, especially rhythms that have a beat (as is the case in most music), it is first necessary to create rhythmic stimuli that are capable of inducing the percept of a strong beat, and similar stimuli that either do not induce a beat percept at all, or only weakly so. This allows researchers to compare scenarios in which participants feel the beat more or less strongly (or not at all), but other aspects of the task are equivalent (e.g., the presence of acoustically similar sounds, having to listen to or reproduce rhythms). The stimuli must be as physically similar as possible so that when activation for strong- and weak-beat rhythms is compared, activation differences will not arise from other differences between stimuli apart from the percept of a beat. To solve this problem, researchers take advantage of the fact that the strength of a beat percept can be driven by perceptual accents, or perceived emphases on certain tones in a rhythm that generally mark the beat. These perceptual accents differ from physical accents (e.g., changes in loudness or pitch on certain notes), because they arise from the timing of the tones, even though the physical properties of all the tones are identical (Brochard, Abecasis, Potter, Ragot, & Drake, 2003; Povel & Okkerman, 1981). For instance, people perceive accents on every other event in an evenly spaced sequence (e.g., tick-tock of a clock), or perceive accents on notes that are preceded or followed by long silent intervals, even if the events are the same duration, loudness, and pitch as events not surrounded by silence. By creating sequences that differ in their pattern of temporal onsets, but are identical in all other respects (e.g., number of tones, duration, loudness, pitch), researchers can create rhythms with varying degrees of a beat percept that are matched in other ways. For example, in metric simple rhythms (see Fig. 1A), the timing of the tones is selected to create regularly occurring perceptual accents, which induce a clear and steady beat. In these cases, the intervals between tones form whole-integer ratios (e.g., 2, 2, 1, 1, 2, in which numbers represent multiples of an arbitrary time interval (e.g., 1 = 250 ms) between tone onsets). These can be compared to metric complex rhythms (see Fig. 1B), in which tone onsets are syncopated: the perceptual accents occur irregularly (they do not always coincide with the beat) and thus do not induce a strong beat percept (Povel & Essens, 1985). The irregular perceptual accents of complex rhythms also make it more difficult to synchronize with them (Patel, Iversen, Chen, & Repp, 2005).
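As a rough illustration of the interval notation above, the following Python sketch converts a list of interval ratios into tone onset times and reports the proportion of positions on an assumed beat grid that coincide with an onset. The function names, the 500 ms beat period, and the second, deliberately off-beat pattern are assumptions made for illustration and are not taken from the studies cited. Onset coincidence is only a crude stand-in for the accent-based accounts discussed above.

```python
def onsets_from_intervals(ratios, unit_ms=250):
    """Turn inter-onset intervals (multiples of unit_ms) into onset times in ms.

    The first tone starts at 0 ms; each ratio gives the time to the next onset.
    """
    onsets = [0]
    for ratio in ratios:
        onsets.append(onsets[-1] + ratio * unit_ms)
    return onsets


def beat_coverage(onsets, beat_period_ms):
    """Proportion of beat-grid positions (0, period, 2*period, ...) that
    coincide exactly with a tone onset. Assumes integer millisecond onsets."""
    beats = list(range(0, onsets[-1] + 1, beat_period_ms))
    onset_set = set(onsets)
    hits = sum(1 for beat in beats if beat in onset_set)
    return hits / len(beats)


if __name__ == "__main__":
    # Interval pattern quoted in the text, with 1 = 250 ms.
    simple = onsets_from_intervals([2, 2, 1, 1, 2])
    # Hypothetical contrast pattern (not from the source) that avoids the grid.
    off_beat = onsets_from_intervals([1, 2, 2, 2, 1])
    # Assumed beat period of two units (500 ms).
    print(simple, beat_coverage(simple, 500))
    print(off_beat, beat_coverage(off_beat, 500))
```

With these assumptions, every beat position in the first pattern carries a tone, whereas most beat positions in the second pattern fall in silence, which is the kind of contrast the matched stimuli are designed to produce.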
A third category of rhythmic sequences is non-metric, in which onsets occur in non-integer ratio sequences (e.g., onsets occurring at 1, 1.3, 0.4, 1.7 intervals apart) rather than the integer ratio sequences common to metric simple and complex rhythms. These stimuli sound almost random, so much so that listeners have a strong inclination to reproduce them by incorrectly tapping back integer ratio sequences (Collier & Wright, 1995; Essens, 1986; Ravignani, Delgado, & Kirby, 2017). Importantly, the perceptual difference between simple and complex rhythms is a product of whether perceptual accents coincide with tone onsets, as both types of rhythms are composed of integer-ratio intervals. Now that we have covered the notion of metricality and the distinction between simple and complex rhythms, we can dive into recent findings
from cognitive neuroscience that describe how neural processes give rise to rhythm perception. The basic logic of fMRI studies involves measuring the brain’s online metabolic activity in at least two different experimental conditions, and comparing the pattern of differences. These differences indicate which neural structures respond during a behavior or cognitive function. For example, comparing brain responses to simple, complex, and non-metric rhythms that differ in the strength of beat perception but are otherwise perceptually similar, might pinpoint the neural structures involved in perceiving the beat. Using this approach, greater activity has been seen in certain motor structures, namely the basal ganglia (BG) and supplementary motor area (SMA), while hearing metric simple rhythms (which have a clear beat) compared to complex or non-metric stimuli (which have little or no beat; Grahn & Brett, 2007). Greater activity occurred regardless of participants’ musical training, suggesting a fundamental role for those areas in rhythm perception. Responses in other motor areas, namely the premotor cortex (PMC) and cerebellum, were observed for all sequences, and did not vary depending on the presence or absence of a beat. All these motor areas were implicated in more general timing and sequence processing as well (Chen, Penhune, & Zatorre, 2008), but the basal ganglia and SMA appear to be particularly responsive to beat processing. A similar study revealed a distributed network that predicts individual differences in beat perception, in which strong beat perceivers display greater SMA, ventrolateral prefrontal cortex (PFC), and medial PFC activity compared to weak beat perceivers. The activated network in strong beat perceivers was more distributed through frontal and motor areas, whereas the activated network in weak beat perceivers was more limited to auditory processing (Chen, Zatorre, & Penhune, 2006). 
While degree of motor response in activated networks differs depending on the task and the stimuli, one conclusion is clear: rhythm perception recruits participation from a network of motor areas, even when movements are not required. The recruitment of basal ganglia (specifically, the putamen) during beat perception was further confirmed by a later fMRI study that examined the neural responses to the beat when the beat was induced by different types of accents, or emphases (Grahn & Rowe, 2009). The beat was either emphasized by changes in loudness on beat tones (strong external accents), or perceptual accents created by timing of the tones without loudness changes as described in metric simple stimuli above (weak external
accents), or the tones were unaccented, and thus any beat imposed by listeners was generated internally (internal accents). Regardless of the accent used to induce beat perception, greater putamen activity was observed relative to control non-beat rhythms. Moreover, greater connectivity (taken as a measure of communication between brain areas) was observed between the putamen and the SMA, PMC, and auditory cortex. In musicians, externally and internally generated beats activated different subcomponents of the motor network, modulating connectivity between PMC and auditory cortex.

fMRI studies provide insights into the metabolic state of the brain under different conditions, but there are other physiological markers we can use to decipher the processes involved in rhythm perception. TMS can be used to briefly activate the connections between the brain, spinal cord, and distal musculature, giving us a snapshot of the body’s corticospinal excitability at any given moment. Put simply, a strong magnetic field is induced on the surface of a participant’s head using a powerful electric current, safely contained within an insulated coil. This field passes a few centimeters through the head and induces a small and localized electric field inside the brain that causes neurons within it to fire. This means experimenters can directly and non-invasively stimulate neuronal firing. The neuronal firing triggered by TMS delivered to primary motor cortex (M1) results in involuntary muscle twitches. Measuring, at the muscle, the amplitude of the electrical signal that causes it to contract, called a motor evoked potential (MEP), gives a reliable index of corticospinal excitability, or the motor system’s readiness for action.
For example, stimulating M1 in pianists elicits greater amplitude MEPs from hand muscles when they listen to a piece they have played compared to an unfamiliar piece (D’Ausilio, Altenmüller, Olivetti Belardinelli, & Lotze, 2006), suggesting that their motor system automatically responds to pieces they have learned. Rhythm researchers have used this TMS-MEP logic to measure the motor system’s excitability during beat perception. For example, MEP amplitudes measured from the ankle were greater when TMS pulses were delivered in time with the beats of strong beat compared to weak beat rhythms (Cameron, Stewart, Pearce, Grube, & Muggleton, 2012). Increased MEP amplitude in response to the beat is in line with the aforementioned fMRI literature on greater basal ganglia and SMA activations during perception of simple compared to complex sequences. Increases in excitability also
happen in response to music. In a recent study, musicians listened to “high-groove” or “low-groove” music while receiving TMS, where groove is “a musical quality that makes us want to move with the rhythm or beat” (Stupacher, Hove, Novembre, Schutz-Bosbach, & Keller, 2013, p. 127). MEP amplitudes were greater for high-groove versus low-groove music, and this effect was more pronounced for MEPs elicited by on-beat versus off-beat pulses, indicating that motor system readiness was greatest on the beat. Although these studies do not directly implicate the basal ganglia in rhythm perception (these structures are too deep within the brain for the transcranial magnetic field to penetrate), the MEPs are measured downstream of the central nervous system, implicating the possible modulation of the entire motor system.

The last source of evidence for the role of the motor system in rhythm perception comes from neuropsychological cases. There are patients whose rhythm perception has been altered by a disease or disorder, or by lesions due to stroke. For example, Parkinson’s disease (PD) is characterized by rigidity of movement, tremor, and slowness, and is caused by the progressive deterioration of the dopaminergic pathway in the basal ganglia. Given the aforementioned review of the basal ganglia’s role in rhythm perception, and PD patients’ documented difficulty with perception and production of isochronous rhythms (Harrington, Haaland, & Hermanowitz, 1998), Grahn and Brett (2009) surmised that this patient population would also display difficulties with beat perception. Indeed, healthy older participants found it easier to perceive simple, beat-based rhythms versus complex rhythms, whereas PD patients did not display this advantage, indicating they were less able to use the simple rhythms’ beat-based structure to perform the discrimination task. The authors concluded that healthy basal ganglia appear to be necessary for processing rhythms with a strong beat.
In a follow-up to this study, PD patients who were on L-DOPA, a drug that increases dopamine, discriminated simple rhythms better (but complex rhythms worse) when they were on versus off their medication (Cameron, Pickett, Earhart, & Grahn, 2016). Rhythm discrimination performance was correlated with the severity of the Parkinson’s disease. Taken together, the results indicated that healthy dopaminergic function influences beat-based timing. Finally, the ability to adapt to changes in tempo is severely hampered by focal basal ganglia lesions due to stroke (Schwartze, Keller,
Patel, & Kotze, 2011). Thus, overall, these neuropsychological studies point to an essential role of the basal ganglia in normal rhythm perception.
Oscillatory Mechanisms
An alternative way of examining beat perception is through the neural dynamics of excitation and inhibition, which lead to cyclical activity changes in populations of neurons. These cyclical changes are called oscillations. Neuronal activity oscillates spontaneously in the brain, but when listeners receive rhythmic input, the phase and period of ongoing neural oscillations can be influenced to match, or phase-lock to, the incoming signal (e.g., Picton, John, Dimitrijevic, & Purcell, 2003; Schroeder, Lakatos, Kajikawa, Partan, & Puce, 2008). Indeed, rhythmic stimuli like music or language can act as a pacing signal that allows listeners to more accurately attend to relevant information in the continuous signal (Henry & Obleser, 2012). This finding is consistent with behavioral work from the dynamic attending literature that finds fluctuations in attention over time (Jones & Boltz, 1989; Large & Jones, 1999). That is, attention fluctuates periodically, with peaks of concentrated attention that occur more strongly with increasing periodicity of a stimulus. Oscillatory attentional and neural dynamics help explain a large body of literature in music cognition that finds better performance for events (pitch or interval discrimination) happening on a strong beat (when attention may be at its peak in the oscillatory phase) compared to a weak beat (for a recent review see Henry & Herrmann, 2014). The ongoing oscillatory neural dynamics of the brain also make behavioral predictions for musical rhythm and beat perception. Neural resonance theory (Large, 2008; Large & Snyder, 2009) demonstrates that the interaction of rhythmic input with a bank of ongoing neural oscillators can give rise to several key facets of human rhythm and beat perception described throughout this chapter. As described above, humans experience music as a stable, regular pattern in time. However, the surface rhythm is often not periodic. 
A rhythm may initially have several events that fall on the beat, but events will also fall on both strong and weak metrical positions (Figs. 1A and 1B) and may even begin to consistently fall
on the off-beat, as is the case in syncopated rhythms (Fig. 1C). The mathematical model of neural resonance theory predicts that perception of the beat in this instance would remain stable because the initial rhythmic input from the rhythms of Figs. 1A and 1B acts to reset the phase and period of ongoing neural oscillators, leading to a beat percept that is quite persistent, even in the face of conflicting evidence. Further, the physics of ongoing oscillators allow listeners to maintain a rhythmic pulse, even in the absence of environmental input (e.g., through a silent gap in a song). Finally, a key prediction of neural resonance theory is that neural oscillations resonate with rhythmic input, which results in peaks of activation at harmonics or subharmonics of the input rhythms. These harmonics (3:1 or 2:1) and subharmonics (1:3 or 1:2) are related to the way that listeners hear perceptual accents on alternating events (see section on “Oscillatory Mechanisms”), and place stronger perceptual emphasis on certain tones, such as downbeats, in metrical groupings. Taken together, neural resonance theory explains how humans (a) perceive a regular pulse from irregular rhythmic input (i.e., in the absence of strictly periodic input, such as a metronome), (b) maintain the feeling of a pulse or beat that persists when sound ceases, and (c) experience alternations of strong and weak beats and begin to organize music into a hierarchical metrical framework. That is, listeners “hear musical events in relation to these patterns because they are intrinsic to the physics of the neural systems involved in perceiving, attending, and responding to auditory stimuli” (Large & Snyder, 2009, p. 52). Neural oscillatory perspectives on rhythm perception have been fruitful because they rely on the temporal dynamics of rhythm processing instead of relying on an index of rhythm processing at a single moment in time or as an average of brain activity over time. 
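The persistence of a beat percept through silence can be illustrated with a deliberately simplified sketch. The code below is not the mathematical model of neural resonance theory; it is a toy oscillator, written under our own assumptions, that nudges its phase and period toward nearby tone onsets and free-runs when no onset falls close to a predicted beat. The function name `track_beat`, the gain values, and the quarter-period acceptance window are all hypothetical choices made for illustration.

```python
import numpy as np

def track_beat(onsets, period0, phase_gain=0.5, period_gain=0.2, n_beats=12):
    """Toy beat tracker: predict beat times with phase and period correction.

    onsets  : tone onset times in seconds
    period0 : initial guess for the beat period in seconds
    Returns the predicted beat times and the final period estimate.
    """
    onsets = np.asarray(onsets, dtype=float)
    period = float(period0)
    beat = onsets[0]  # align the first predicted beat with the first onset
    beats = []
    for _ in range(n_beats):
        beats.append(beat)
        # Signed distance from the predicted beat to the nearest onset.
        err = onsets - beat
        nearest = err[np.argmin(np.abs(err))]
        if abs(nearest) < period / 4:
            # An onset lies close to the prediction: correct phase and period.
            phase_step = phase_gain * nearest
            period += period_gain * nearest
        else:
            # No nearby onset (silence or off-beat event): keep oscillating.
            phase_step = 0.0
        beat = beat + phase_step + period
    return np.array(beats), period

# Six isochronous tones every 0.5 s, followed by silence.
onsets = np.arange(6) * 0.5
beats, period = track_beat(onsets, period0=0.6)
print(np.round(beats, 3), round(period, 3))
```

In this sketch the oscillator starts with a period that is too slow (0.6 s), adapts toward the 0.5 s inter-onset interval while tones are present, and then continues to generate beats at the adapted period after the last tone, which is the qualitative behavior the text attributes to ongoing oscillators.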
One popular way to examine beat perception has been through the frequency tagging approach. This methodology allows researchers to characterize at what rates listeners hear strong events happening, or have heightened attention, in musical rhythms (e.g., events happening at a rate of 2 or 3 Hz). A landmark study demonstrated that when participants heard a metronome (i.e., evenly spaced, unaccented sequence of tones), but were asked to perceive the metronome in groupings of two or three tones, there was a peak in the power of the EEG spectrum related to the particular frequency they imagined. Thus, when participants heard the rhythm in groupings of two
there was greater power at the frequency related to a binary grouping compared to the slower frequency related to a ternary grouping and vice versa (Nozaradan, Peretz, Missal, & Mouraux, 2011). When participants were trained to move to an ambiguous rhythm in either a duple or a triple meter, there was subsequently greater power in the EEG spectrum at the frequency they moved to (Chemin, Mouraux, & Nozaradan, 2014), even when simply listening and no longer moving. Such findings demonstrate that beat perception is not simply stimulus-driven, but that listeners can and do impose a beat on a sequence that can be observed in neural activity. Oscillatory activity in particular frequency bands, which are unrelated to the particular frequency of the input stimulus, is also important for characterizing rhythm processing. When researchers looked at induced (i.e., not phase-locked to the event onset) rather than evoked (i.e., phase-locked to the stimulus onset) activity using electroencephalography, they found that activity of high-frequency oscillations from 20–60 Hz followed the pattern of processing that was observed in many behavioral studies of rhythm processing: listeners do not simply react to note onsets after they occur. Instead, they anticipate a beat and even “feel” that beat when a note is occasionally omitted. These researchers showed that there was a peak in induced high-frequency oscillations that occurred in anticipation of tone onsets even when the tone was omitted (Snyder & Large, 2005). Further studies found separate functional roles for beta (15–30 Hz) and gamma (30–80 Hz) frequency bands (Fujioka, Trainor, Large, & Ross, 2009). Activity in the beta band reflected motor processing and was important for coordinating auditory-motor interactions when processing the beat in music, whereas gamma-band activity was associated with the same endogenous anticipatory processing of the beat found in previous studies. 
In a follow-up study, Fujioka and colleagues (Fujioka, Trainor, Large, & Ross, 2012) found that induced beta-band activity increased in anticipation of the beat and varied with tempo, further suggesting an endogenous generator. In contrast, a sharp decrease in induced beta occurred immediately after the onset of the beat, but this decrease followed the same pattern regardless of stimulus presentation characteristics, suggesting that beta desynchronization was simply a response to hearing a tone, and not reflective of anticipation. Importantly, this activity originates both from auditory cortex and from sensorimotor cortex, again highlighting the role beta-band activity plays in coordinating auditory-motor interactions (Fujioka et al., 2012). These
auditory-motor interactions in the beta band could have important consequences for preparation of motor movements that are important in more ecologically valid musical experiences, such as when the listener grooves along with the music. Again, these studies highlight that rhythm processing is not just a faithful tracking of acoustic input, but rather involves the perception of beat- and meter-related periodicities that are not necessarily part of a stimulus. These studies also highlight the integral role motor processing plays in the perception of the beat in music, even when listeners are not moving or tapping along to the music. There is a growing interest in neuroscientific investigations of rhythm processing to characterize the way that humans entrain to music by looking at oscillatory dynamics at the beat frequency and by considering auditory-motor dynamics in other frequency bands. While many approaches have shed considerable light on the ways listeners perceive rhythm, it is important to carefully consider how we interpret whether differences in peak power of the EEG spectrum are related to beat or stimulus characteristics (Henry, Herrmann, & Grahn, 2017). Neural resonance theory has shown that many facets of beat perception in humans emerge naturally from the physical interactions of multiple internal oscillators and rhythmic input, but it does not explain all aspects of beat perception, such as how children learn to be better perceivers of the beat in music, or how beat perception changes based on culture or musical experience. Future research is needed to understand more about how oscillatory activity in the brain interacts with music experience and how such experiences are maintained or weighted in such a dynamical system.
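The frequency-tagging logic discussed in this section can be illustrated with a short sketch on synthetic data. The signal below is not real EEG: it is a sum of sinusoids plus noise that we constructed to mimic a listener who imposes a binary grouping on an isochronous sequence. The beat rate (2.4 Hz), the sampling rate, the amplitudes, and the helper name `amplitude_at` are all assumptions chosen for illustration, and the analysis is a plain Fourier amplitude spectrum rather than any particular published pipeline.

```python
import numpy as np

def amplitude_at(signal, fs, freq):
    """Amplitude of `signal` at `freq` (Hz), read from the one-sided FFT spectrum."""
    spectrum = np.abs(np.fft.rfft(signal)) * 2 / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq))]

fs = 250.0                      # sampling rate in Hz (assumed)
n = int(fs * 60)                # 60 s of data gives 1/60 Hz frequency resolution
t = np.arange(n) / fs
rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

beat_hz = 2.4                   # rate of the tones (illustrative value)
binary_hz = beat_hz / 2         # every second tone accented: 1.2 Hz
ternary_hz = beat_hz / 3        # every third tone accented: 0.8 Hz

# Synthetic "listener" who imposes a binary meter: energy at the beat rate
# and at the binary subharmonic, none at the ternary subharmonic.
eeg = (1.0 * np.sin(2 * np.pi * beat_hz * t)
       + 0.5 * np.sin(2 * np.pi * binary_hz * t)
       + rng.normal(0.0, 1.0, n))

binary_amp = amplitude_at(eeg, fs, binary_hz)
ternary_amp = amplitude_at(eeg, fs, ternary_hz)
print(f"binary: {binary_amp:.3f}  ternary: {ternary_amp:.3f}")
```

Comparing the spectral amplitude at the binary and ternary frequencies is the core of the approach: the acoustic input is the same in both listening conditions, so a difference at these frequencies is attributed to the grouping the listener imposes. As the text cautions, interpreting such peaks in real recordings requires care, because stimulus characteristics can also contribute to them.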
L
M
Music and language are both important forms of human communication. Although there are many similarities among these two domains, one of the key similarities related to rhythm processing is that music and language both unfold sequentially in time and are hierarchically structured (Patel, 2003). Yet the temporal characteristics of how they unfold are different (Ding et al., 2017). As is clear from the discussion thus far, listeners perceive a musical beat that, despite surface irregularities, leads to the
perception of beat events unfolding at regular intervals, with alternating strong and weak beats according to the meter of the music. In language, there is a long history of debate over whether there are isochronous units in speech rhythms between successive syllable onsets or between successive stressed syllable onsets. However, after carefully annotating speech intervals at consonant and vowel onsets, little evidence has been found for regularity between syllables or stressed syllables in the acoustic signal, although other patterns of more or less vocalic variability did emerge (Grabe & Low, 2002; but see Brown, Pfordresher, & Chow, 2017). Despite the lack of evidence for an isochronous interval in speech, spoken utterances are still comprised of rhythmic, albeit less regular than in music, peaks in the acoustic signal that are important for helping listeners form expectations. There is growing evidence that better neural tracking of the rhythmic syllable onsets in speech is important for language comprehension (e.g., Peelle, Gross, & Davis, 2013). Given the importance of understanding rhythmic relationships in language, there is growing interest in how musical training or musical ability is related to language abilities in a wide range of listeners. A large body of literature has focused on the relationship between reading and rhythm processing. For instance, compared to age and reading level matched peers, individuals with dyslexia are worse at neurally tracking low-frequency (e.g., delta and theta frequency bands) temporal information in the speech signal (Power, Colling, Mead, Barnes, & Goswami, 2016). Delta (0–4 Hz) and theta (4–8 Hz) frequencies roughly correspond to the rate at which phrases and syllables unfold in the speech stream, respectively. Some researchers have even posited that the deficits related to phonological awareness and analysis in developmental dyslexia are actually caused by temporal processing deficits. 
In particular, some evidence suggests that adults with dyslexia oversample the speech stream in high-frequency oscillatory bands that may be related to phonological onsets, which leads to greater power at frequencies that may be irrelevant for processing phonetic information in speech (Lehongre, Ramus, Villiermet, Schwartz, & Giraud, 2011). Further evidence of a relationship between language and temporal processing abilities comes from individuals with specific language impairments, including dyslexia, for whom there are positive correlations between musical training and language outcomes, with unique predictive power coming from rhythm perception skills (Flaugnacco
et al., 2015; Habib et al., 2016; Zuk et al., 2017). Enhanced language processing as a result of music ability may be particularly related to beat-based processing abilities, and not better encoding of rhythmic intervals in general, as studies have shown that regularity detection is particularly related to language and literacy in adults from a wide range of language backgrounds (Bekius, Cope, & Grube, 2016; Grube, Cooper, & Griffiths, 2013). While the studies outlined above show evidence that neurally following rhythmic input in speech is important for developing normal language and reading skills and that music training seems to be related to behavioral language outcomes, there are very few studies that have compared beat perception or production abilities and related them directly to neural tracking of rhythmic input. However, one study has shown that there is an association between rhythm production abilities in preschoolers and encoding of fundamental frequency in a single utterance (i.e., “da”) through auditory brainstem response (Woodruff-Carr, White-Schwoch, Tierney, Strait, & Kraus, 2014). Further research is necessary to establish a link between rhythm perception or production and neural tracking of low-frequency information in speech. Tracking the rhythmic fluctuations in the amplitude envelope of speech may also be different from the types of rhythmic entrainment discussed above. Therefore, it is important to determine whether these language and reading studies rely on mechanisms similar to those of musical rhythm entrainment, such that neural activity can remain entrained to the syllable rate even when the utterance is removed, or whether these studies represent a more stimulus-dependent form of rhythmic processing than that found for musical rhythms.
D
R
Rhythm perception is an important skill for myriad domains, including music, language, and movement, among others. The ubiquity of rhythmic information in our everyday environment highlights the importance for developing rhythm processing skills early in development. Indeed, rhythm processing seems to be important also for social-emotional development. Children who are able to synchronize to music show not only better parsing
of events unfolding in time, but also show better social-emotional processing as a result of synchronization. After playing musical instruments together, 4-year-olds showed higher rates of spontaneous helping compared to children who were not encouraged to synchronize their actions with a partner (Kirschner & Tomasello, 2010). A similar pattern can be found as young as 14 months of age in a paradigm that induces synchrony between a child and an adult by having the experimenter bounce the child in an infant carrier in synchrony with another adult (Cirelli, Einarson, & Trainor, 2014). Infants bounced synchronously showed more prosocial helping behaviors than infants bounced out of synchrony. It is clear that beat processing is advantageous for normal development, but indexing beat perception in infancy is difficult given the limitations young infants have making overt behavioral responses. This is where neural measures are particularly useful for examining beat perception at the earliest stages of development. Newborns listening to musical sequences show larger neural mismatch responses when an omission occurs on a strong beat compared to a weak beat (Winkler, Haden, Ladinig, Sziller, & Honing, 2009), providing evidence for beat processing in humans from birth. Further, using the frequency-tagging approach described above, infants who heard an ambiguous rhythm that could be perceived in either a duple or triple meter had peaks corresponding to the beat and both metrical frequencies (Cirelli, Spinelli, Nozaradan, & Trainor, 2016). However, infants with either more experience in music classes or more musically engaged parents showed greater peaks in the EEG spectrum related to duple compared to triple meter perception (Brochard et al., 2003). This finding is in line with culture-specific patterns suggesting that Western listeners prefer simple integer ratios, with a bias toward duple meter groupings. 
Later in childhood, even when children are capable of making behavioral or motor responses, the immaturity of their motor system makes it unclear whether a lack of beat perception abilities or motor immaturity is at the root of differences between the way children and adults process rhythm and beat. For instance, although children move when they hear music, there is little evidence that they actually synchronize their movements to the beat, which makes it unclear whether children are poor beat perceivers or poor dancers. As described above, beta-band activity reflects auditory-motor interactions and researchers have used this approach to show that beat processing may not
become mature until after age 7. Seven-year-olds’ beta-band activity during beat processing only showed the adult-like pattern of desynchronization and subsequent rebound in anticipation of the beat for slow rhythms, not fast rhythms (Cirelli, Bosnyak, et al., 2014). These findings in the beta band align well with behavioral findings suggesting that sensorimotor synchronization with music does not reach adult-like accuracy until 8 or 9 years of age (McAuley, Jones, Holub, Johnston, & Miller, 2006). Together these findings suggest that beat perception is intact from birth, but that the auditory-motor processing capabilities required to synchronize movements to music develop well into childhood.
C E
P R
P
Clues to the neural mechanisms of rhythm perception can be gleaned from comparative studies between humans and other species that have similar abilities, but different brains. Some of the best examples of rhythmic entrainment come from various bird species, such as cockatiels (Patel, Iversen, Bregman, & Schulz, 2009), certain parrots (Schachner, Brady, Pepperberg, & Hauser, 2009), and budgerigars (Hasegawa, Okanoya, Hasegawa, & Seki, 2011), who can all bob their heads in time with a simple rhythm. Although none of these animals appear to reach human sophistication with rhythmic entrainment—for example, they have difficulty with complex rhythms, or with adaptation to novel tempos—their ability to synchronize with simple rhythms has led some researchers to theorize that beat perception is a corollary ability to the development of vocal learning (Patel, 2006). Although this idea is further supported by the presence of simple synchronization abilities in other vocal, non-bird species such as bonobos, chimpanzees, and possibly elephants (Hattori, Tomonaga, & Matsuzawa, 2013; Large & Gray, 2015; Poole, Tyack, Stoeger-Horwath, & Watwood, 2005), recent demonstrations of rhythmic entrainment in a sea lion—a decidedly non-vocal species—pose complications for the theory that beat perception is a corollary development to vocal learning (Cook, Rouse, Wilson, & Reichmuth, 2013). The sea lion not only synchronizes to simple rhythms, she satisfies more stringent tests of rhythmic entrainment
previously observed only in humans, such as being able to adapt to changes in tempo. Most cross-species research is done on monkeys instead of better vocal learners (like the aforementioned bird species) for reasons of convenience (e.g., established monkey neurophysiology labs, similar brains) and closer evolutionary ancestry. Monkey rhythmic entrainment is impoverished by comparison to humans: macaques can time intervals very accurately (Zarco, Merchant, Prado, & Mendez, 2009), and can synchronize with simple isochronous sequences, but they have more reactionary, as opposed to anticipatory, actions. Online measures of neural activity (i.e., local field potentials, LFPs) during synchronization tasks indicate that monkeys’ putaminal cells are interval-sensitive, with different populations representing different durations by bursts of gamma- or beta-band oscillations (Bartolo, Prado, & Merchant, 2014). Thus, monkeys are good at timing the individual intervals that make up a rhythm, but they do not appear to synchronize as accurately as humans do when multiple intervals are presented in a sequence. Monkeys also do not appear to process nonisochronous rhythms the way that humans do. EEG studies in monkeys show no event-related potentials (ERPs) corresponding to unexpected events (as indexed by the mismatch negativity, or MMN, to unexpected beat omissions); the MMN represents the detection of something out of place, so if the monkeys don’t perceive the beat in the rhythm, then the omissions won’t be out of place (Honing, Merchant, Háden, Prado, & Bartolo, 2012). Moreover, simple rhythmic deviants elicit changes in gaze and expression, and auditory cortex LFPs (Selezneva et al., 2013). Structurally, these sequential timing tasks rely on the motor cortico-basal-ganglia-thalamo-cortical (mCBGT) circuit in humans. The monkey analogue to this network also appears to be heavily involved in motor timing and sequencing (Merchant, Pérez, Zarco, & Gámez, 2013). 
However, the reciprocal connections between auditory cortex and the mCBGT circuit that exist in humans are not matched in monkeys. Instead, the monkey mCBGT appears to be more strongly connected to visual cortex, which may explain why monkeys lack strong rhythmic entrainment, and perform better on visual synchrony tasks than auditory ones (for a review, see Merchant & Honing, 2014). This structural discrepancy, along with the behavior differences between humans and monkeys, may explain why strong rhythmic entrainment appears to be a decidedly human ability.
C
-M R
I P
Thus far, this chapter has focused on rhythm perception in the auditory modality. However, rhythms include temporally patterned stimuli in any modality. For example, the isochronous blinking of a car’s turn signal is a visual rhythm, and your phone’s vibrating notification is a tactile rhythm. In this section, we will discuss how rhythm is perceived in non-auditory modalities, focusing on vision. Predictably, the neural correlates of rhythm perception differ between modalities, but some are also shared. These shared substrates might be a clue to the neural representation of rhythm in a pure, temporal sense, uncontaminated by modality-specific processing: the sine qua non of rhythm perception. Like audition, vision is sensitive to temporal regularities in the environment. For example, visual-spatial attention is biased toward reliably repeating patterns (Zhao, Al-Aidroos, & Turk-Browne, 2013). Unlike audition, rhythmic visual stimuli, such as a blinking dot, do not lead to a strong sense of beat: Auditory rhythms are reproduced and remembered better than visual ones (Glenberg, Mann, Altman, Forman, & Procise, 1989; Collier & Logan, 2000, respectively). While it is true that audition generally has better temporal sensitivity than vision (e.g., Goldstone & Lhamon, 1972) this does not explain why auditory rhythms give rise to a sense of beat and visual ones do not. Recently, researchers have instantiated visual rhythms with more dynamic stimuli in an attempt to capitalize on the visual system’s sensitivity to motion and acceleration. A blinking stimulus isn’t visually natural, but a moving one is. Concordantly, rotating bars and bouncing balls can give rise to a sense of beat in a manner similar to auditory stimuli (Grahn, 2012; Hove, Iversen, Zhang, & Repp, 2013; Iversen, Patel, Nicodemus, & Emmorey, 2015). 
Even more naturalistic stimuli, like watching a dancer or following a conductor’s baton, give rise to timing advantages illustrative of beat perception (Luck & Sloboda, 2009; Su & Salazar-Lopez, 2016). The message from this new literature is that although audition wins over other modalities for temporal processing superiority, rhythm processing is possible in other modalities when the stimuli are crafted to follow that sense’s priorities.
Given that visual rhythm processing is possible, how does the brain do it? One possibility is that visual rhythm processing piggy-backs on the rhythmically-superior auditory and motor resources. According to this view, visual rhythm perception involves the creation of an internal auditory rhythm to accompany visual stimuli. Evidence for this perspective was demonstrated in an fMRI task where participants watched or heard rhythmic stimuli in counterbalanced blocks of a tempo adaptation task; visual sequences produced a stronger sense of beat and stronger bilateral putamen activity when preceded by the auditory task block versus with no prior auditory experience with the task (Grahn, Henry, & McAuley, 2011). This change in brain response during the visual task following the auditory block resembled the activation observed in auditory tasks alone (Grahn & Brett, 2007). When the visual task preceded the auditory block, there was no enhancement to rhythm perception or brain response in the basal ganglia, indicating that it was not simply a practice effect. This study used blinking visual rhythms that do not readily elicit rhythm perception, so the authors suggested that the observed behavior and brain responses reflected the co-opting of typical auditory rhythm perception to achieve the perception of a visual rhythm. In a later fMRI study with discrete and moving visual and auditory stimuli, this putamen activity was shown to reflect a supra-modal rhythm perception response: Activity in the putamen corresponded to the strength of synchrony with an ongoing rhythm, regardless of the modality and without prior auditory experience with the stimuli (Hove, Fairhurst, Kotz, & Keller, 2013). This idea of a supra-modal, or modality-general, process underpinning rhythm perception received further support from a study measuring ERPs in response to temporal expectancy violations in an adaptive tempo task with auditory and visual stimuli (Pasinski, McAuley, & Snyder, 2016). 
They found larger-amplitude ERPs for the auditory task, but patterns similar to those of the visual response, suggesting again the presence of a modality-general rhythm perception network, likely rooted in the basal ganglia and the motor system.
I
D
M T
Rhythm processing abilities vary widely in the general population. It is not difficult to run into someone who proclaims that she has two left feet, and there is evidence of individuals who are actually “beat deaf.” These individuals cannot align their movements to the beat of a musical piece despite being able to synchronize to a metronome (Phillips-Silver et al., 2011). Differences in experience, such as music training, can enhance rhythm perception and production. But individual differences in abilities that are associated with beat perception can lead people to encode, store, and act on auditory information in different ways. For instance, individual differences in auditory short-term memory (STM) and regularity detection are associated with better rhythm abilities, especially when reproducing longer rhythms (Grahn & Schuit, 2012). Music training also accounts for unique variance in rhythm reproduction abilities compared to auditory STM and regularity detection, although musical training may only influence rhythm perception abilities in certain tasks (Bauer, Kreutz, & Herrmann, 2015; Grahn & Brett, 2007; Geiser, Ziegler, Jancke, & Meyer, 2009). Regularity detection is also correlated with activation in auditory-motor areas, including left SMA and left dorsal and ventral premotor areas, which may indicate that people who are better at detecting the beat in music also rely more heavily on transforming rhythms into auditory-motor representations instead of relying purely on auditory cues (Grahn & Schuit, 2012). These findings are similar to previous work showing that strong beat perceivers showed greater activation in SMA compared to weak beat perceivers, when listening to an ambiguous rhythm (Grahn & McAuley, 2009). Individual motor abilities may also be important for predicting individual differences in preferred tempo (120 bpm or 2 Hz), which is the rate at which listeners feel most comfortable tapping to music or a metronome (McAuley et al., 2006). 
An individual’s specific peak frequency in the beta range, assessed during a motor tapping task, predicts preferred tempo (Bauer et al., 2015), providing additional evidence that auditory-motor interactions can lead to differences in the way that people prefer to entrain to music. Although much of the literature on rhythm and beat perception makes claims about commonalities across individuals in the neural processing of rhythms, there is considerable variation in the way humans respond to rhythms. These individual differences are particularly important to consider when trying to use rhythm as a therapeutic tool, as in patients with PD.
Although rhythmic stimulation may have seemingly miraculous effects for some individuals, there are many others for whom rhythmic stimulation may have no effect or perhaps even a negative effect on gait (Leow, Parrott, & Grahn, 2014; Nombela et al., 2013). Further research is necessary to characterize what factors lead to these individual differences in neural processing of rhythm, including auditory-motor interactions, musical background, and biological differences to better target interventions to the individual.
M
J
A
So far, we have examined rhythm as a subject of the perceiver and his or her brain. Realistically, rhythms must also have creators, making rhythm perception an inherently social topic: It depends upon the perception of others’ actions. One of the themes of this chapter has been the contribution of the motor system to the perception of rhythm; unsurprisingly, it may be through the shared architecture of our motor systems that we perceive music and rhythm so fluently when expressed by other people. The idea of motor system involvement in rhythm perception follows from the discovery of the mirror neuron system in monkeys and analogous systems in humans (see Rizzolatti & Craighero, 2004, for a review), a network that responds not only to one’s own movements, but also to seeing or hearing movements of others. This discovery was rapidly adapted to explain motor simulation: that we unconsciously mimic, or concurrently represent, others’ movements within our own motor system (Gallese & Goldman, 1998). It is useful to think of motor simulation as a way to represent observed actions by the same motoric structures that execute them. Later, motor simulation was employed to explain the empathic nature of movement in art (Freedberg & Gallese, 2007). Evidence for the shared representation of action in art observation has been demonstrated in dance (Cross, Hamilton, & Grafton, 2006) and painting (Leder, Bär, & Topolinski, 2012; Taylor, Witt, & Grimaldi, 2012), but it is most prominently espoused in music, explaining findings such as the automatic activation of hand-controlling motor areas in pianists while listening to piano performance (Haueisen & Knösche, 2001), the co-activation of auditory areas in
violinists when they mimic violin actions (Lotze, Scheler, Tan, Braun, & Birbaumer, 2003), and various effects describing interference between music listening and musical performance, which occurs because both processes depend upon activation of the motor system (e.g., Drost, Rieger, Brass, Gunter, & Prinz, 2005; Drost, Rieger, & Prinz, 2007; Taylor & Witt, 2015). As we have seen, rhythm perception involves the motor system: Feeling the beat is an inherently motoric phenomenon. We can apply the logic of motor simulation to the challenging demands of timing and joint action in music. Perceiving rhythm in a social setting may also require the concurrent representation of others’ actions in the listener’s motor system. In an inventive TMS study, pianists were required to play the right-hand part of a duet where the left-hand part was either rehearsed by them at an earlier time or unrehearsed. This left-hand part would undergo regular changes in timing that the subject would adapt to. Right-hemisphere (read: left-hand) TMS interfered with tempo adaptation only for duets where the subject had previously rehearsed the left-handed accompanying piece. This indicates that keeping time with a duet partner involved the online co-representation of that partner’s piece, which was disrupted by the TMS (Novembre, Ticini, Schütz-Bosbach, & Keller, 2013). This is evidence for the role of motor simulation in the perception of rhythm and timing during joint musical action. This flexible adaptation is not surprising given the motor system’s ability to represent the temporal dynamics of observed actions (Press, Cook, Blakemore, & Kilner, 2011). This co-representation of observed and executed actions is important for any kind of rhythmic cooperation. To study rhythmic cooperation, a group of researchers created a virtual partner in an adaptive timing task so that the degree of timing cooperation could be tightly controlled. 
Subjects tapping along with the virtual partner’s changing rhythm exhibited increased activity in premotor areas when the partner was cooperative versus difficult to follow along with, suggesting that this kind of rhythmic co-action depends on simulated internal representations of the co-actor (Fairhurst, Janata, & Keller, 2012).
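One way to picture how a virtual partner's degree of cooperation can be controlled is a linear phase-correction rule, in which each agent shifts its next event by a fraction of the current asynchrony. The sketch below is only an illustration under stated assumptions and is not the procedure reported by Fairhurst et al. (2012): the function name, the gain parameters `alpha_partner` and `alpha_human`, the simulated "human," and all numeric values are hypothetical.

```python
def simulate_tapping(alpha_partner, alpha_human=0.3, period=0.5,
                     n_taps=30, initial_asynchrony=0.08):
    """Return the asynchrony (tap time minus tone time, in seconds) at each event.

    alpha_partner: fraction of the asynchrony the virtual partner corrects
                   (0 = ignores the tapper, larger = more cooperative).
    alpha_human:   fraction of the asynchrony the simulated tapper corrects.
    All values are illustrative assumptions, not taken from any study.
    """
    tone_time = 0.0
    tap_time = initial_asynchrony
    asynchronies = []
    for _ in range(n_taps):
        asynchrony = tap_time - tone_time
        asynchronies.append(asynchrony)
        # The partner moves its next tone toward the tap ...
        tone_time = tone_time + period + alpha_partner * asynchrony
        # ... and the tapper moves the next tap toward the tone.
        tap_time = tap_time + period - alpha_human * asynchrony
    return asynchronies


if __name__ == "__main__":
    for alpha in (0.0, 0.5):
        series = simulate_tapping(alpha_partner=alpha)
        print(f"alpha_partner={alpha}: asynchrony after 5 taps = {series[5]:.5f} s")
```

In this toy model the asynchrony shrinks by the factor (1 - alpha_human - alpha_partner) on every event, so a partner with a larger gain is easier to stay in time with; setting the gain to zero leaves all of the correction to the tapper.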
Conclusion
As we have seen, the neuroscience of rhythm is a vibrant field of study that can be approached from many angles. Despite this variety, there has been a common thread throughout this chapter: the involvement of motor processes during the perception of rhythm and beat. Given the role of the motor system in fundamental timing processes, it is not surprising that similar networks should become involved with the perception of rhythm. What interests us is the reliability and the variety of motor system participation in rhythm processing, whether it is the recruitment of the basal ganglia during the perception of strong beats as revealed by fMRI (see section on “Feeling the Beat”), the heightened corticospinal excitability of toe-tapping as revealed by TMS (see section on “Feeling the Beat”), the auditory-motor coordination of beta-band patterns recorded from EEG (see section on “Oscillatory Mechanisms”), the co-development of movement and rhythm production in children (see section on “Development of Rhythm”), or the use of others’ actions to guide joint action (see section on “Comparative Psychology and Evolution of Rhythm Perception”). These are just a few of the many ways in which the auditory and motor systems interact to produce the rich experience of rhythm perception and production. Promising new research offers these auditory-motor interactions as the basis for therapies to help patients with neurodegenerative diseases of the basal ganglia (e.g., Spaulding et al., 2013), or patients with developmental language disorders, such as dyslexia (e.g., Flaugnacco et al., 2015). Our ability to neurally process rhythms is important not only for being able to clap along to our favorite song, but also for examining fundamental psychological questions ranging from individual differences in perception and production to what makes humans distinct from other species. Future work on the neural bases of rhythm perception has the potential to inform a wide range of domains including aesthetics, evolution, and human perception and production.
References
Bartolo, R., Prado, L., & Merchant, H. (2014). Information processing in the primate basal ganglia during sensory-guided and internally driven rhythmic tapping. Journal of Neuroscience 34(11), 3910–3923.
Bauer, A. K., Kreutz, G., & Herrmann, C. S. (2015). Individual musical tempo preference correlates with EEG beta rhythm. Psychophysiology 52(4), 600–604.
Bekius, A., Cope, T., & Grube, M. (2016). The beat to read: A cross-lingual link between rhythmic regularity perception and reading skill. Frontiers in Human Neuroscience 10, 425. Retrieved from https://doi.org/10.3389/fnhum.2016.00425
Brochard, R., Abecasis, D., Potter, D., Ragot, R., & Drake, C. (2003). The “ticktock” of our internal clock: Direct brain evidence of subjective accents in isochronous sequences. Psychological Science 14(4), 362–366.
Brown, S., Pfordresher, P. Q., & Chow, I. (2017). A musical model of speech rhythm. Psychomusicology: Music, Mind, and Brain 27(2), 95–112.
Cameron, D. J., Pickett, K. A., Earhart, G. M., & Grahn, J. A. (2016). The effect of dopaminergic medication on beat-based auditory timing in Parkinson’s disease. Frontiers in Neurology 7, 19. Retrieved from https://doi.org/10.3389/fneur.2016.00019
Cameron, D. J., Stewart, L., Pearce, M. T., Grube, M., & Muggleton, N. G. (2012). Modulation of motor excitability by metricality of tone sequences. Psychomusicology: Music, Mind, and Brain 22(2), 122–128.
Chemin, B., Mouraux, A., & Nozaradan, S. (2014). Body movement selectively shapes the neural representation of musical rhythms. Psychological Science 25(12), 2147–2159.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008). Listening to musical rhythms recruits motor regions of the brain. Cerebral Cortex 18(12), 2844–2854.
Chen, J. L., Zatorre, R. J., & Penhune, V. B. (2006). Interactions between auditory and dorsal premotor cortex during synchronization to musical rhythms. NeuroImage 32(4), 1771–1781.
Cirelli, L. K., Bosnyak, D., Manning, F. C., Sinelli, C., Marie, C., Fujioka, T., … Trainor, L. J. (2014). Beat-induced fluctuations in auditory cortical beta-band activity: Using EEG to measure age-related changes. Frontiers in Psychology 5, 742. Retrieved from https://doi.org/10.3389/fpsyg.2014.00742
Cirelli, L. K., Einarson, K. M., & Trainor, L. J. (2014). Interpersonal synchrony increases prosocial behavior in infants. Developmental Science 17(6), 1003–1011.
Cirelli, L. K., Spinelli, C., Nozaradan, S., & Trainor, L. J. (2016). Measuring neural entrainment to beat and meter in infants: Effects of music background. Frontiers in Neuroscience 10, 229. Retrieved from https://doi.org/10.3389/fnins.2016.00229
Collier, G. L., & Logan, G. (2000). Modality differences in short-term memory for rhythms. Memory & Cognition 28(4), 529–538.
Collier, G. L., & Wright, C. E. (1995). Temporal rescaling of simple and complex ratios in rhythmic tapping. Journal of Experimental Psychology: Human Perception and Performance 21(3), 602–627.
Cook, P., Rouse, A., Wilson, M., & Reichmuth, C. (2013). A California sea lion (Zalophus californianus) can keep the beat: Motor entrainment to rhythmic auditory stimuli in a non vocal mimic. Journal of Comparative Psychology 127(4), 412–427.
Cooper, G., & Meyer, L. B. (1960). The rhythmic structure of music. Chicago, IL: University of Chicago Press.
Cross, E. S., Hamilton, A. F. D. C., & Grafton, S. T. (2006). Building a motor simulation de novo: Observation of dance by dancers. NeuroImage 31(3), 1257–1267.
D’Ausilio, A., Altenmuller, E., Olivetti Belardinelli, M., & Lotze, M. (2006). Cross-modal plasticity of the motor cortex while listening to a rehearsed musical piece. European Journal of Neuroscience 24(3), 955–958.
Ding, N., Patel, A. D., Chen, L., Butler, H., Luo, C., & Poeppel, D. (2017). Temporal modulations in speech and music. Neuroscience & Biobehavioral Reviews 81(Part B), 181–187.
Drost, U. C., Rieger, M., Brass, M., Gunter, T. C., & Prinz, W. (2005). Action-effect coupling in pianists. Psychological Research 69(4), 233–241.
Drost, U. C., Rieger, M., & Prinz, W. (2007). Instrument specificity in experienced musicians. Quarterly Journal of Experimental Psychology 60(4), 527–533.
Epstein, D. (1995). Shaping time: Music, the brain, and performance. New York: Macmillan.
Essens, P. J. (1986). Hierarchical organization of temporal patterns. Perception & Psychophysics 40(2), 69–73.
Fairhurst, M. T., Janata, P., & Keller, P. E. (2012). Being and feeling in sync with an adaptive virtual partner: Brain mechanisms underlying dynamic cooperativity. Cerebral Cortex 23(11), 2592–2600.
Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schön, D. (2015). Music training increases phonological awareness and reading skills in developmental dyslexia: A randomized control trial. PLoS ONE 10(9), e0138715.
Freedberg, D., & Gallese, V. (2007). Motion, emotion and empathy in esthetic experience. Trends in Cognitive Sciences 11(5), 197–203.
Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2009). Beta and gamma rhythms in human auditory cortex during musical beat processing. Annals of the New York Academy of Sciences 1169, 89–92.
Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2012). Internalized timing of isochronous sounds is represented in neuromagnetic beta oscillations. Journal of Neuroscience 32(5), 1791–1802.
Gallese, V., & Goldman, A. (1998). Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences 2(12), 493–501.
Geiser, E., Ziegler, E., Jancke, L., & Meyer, M. (2009). Early electrophysiological correlates of meter and rhythm processing in music perception. Cortex 45(1), 93–102.
Glenberg, A. M., Mann, S., Altman, L., Forman, T., & Procise, S. (1989). Modality effects in the coding reproduction of rhythms. Memory & Cognition 17(4), 373–383.
Goldstone, S., & Lhamon, W. T. (1972). Auditory-visual differences in human temporal judgment. Perceptual and Motor Skills 34(2), 623–633.
Grabe, E., & Low, L. (2002). Durational variability in speech and the rhythm class hypothesis. In N. Warner & C. Gussenhoven (Eds.), Papers in laboratory phonology 7 (pp. 515–546). Berlin: Mouton de Gruyter.
Grahn, J. A. (2012). See what I hear? Beat perception in auditory and visual rhythms. Experimental Brain Research 220(1), 51–61.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience 19(5), 893–906.
Grahn, J. A., & Brett, M. (2009). Impairment of beat-based rhythm discrimination in Parkinson’s disease. Cortex 45(1), 54–61.
Grahn, J. A., Henry, M. J., & McAuley, J. D. (2011). fMRI investigation of cross-modal interactions in beat perception: Audition primes vision, but not vice versa. NeuroImage 54(2), 1231–1243.
Grahn, J. A., & McAuley, J. D. (2009). Neural bases of individual difference in beat perception. NeuroImage 47(4), 1894–1903.
Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat: Premotor and striatal interactions in musicians and nonmusicians during beat perception. Journal of Neuroscience 29(23), 7540–7548.
Grahn, J. A., & Schuit, D. (2012). Individual difference in rhythmic ability: Behavioral and neuroimaging investigations. Psychomusicology: Music, Mind, and Brain 22(2), 105–121.
Grube, M., Cooper, F. E., & Griffiths, T. D. (2013). Auditory temporal-regularity processing correlates with language and literacy skill in early adulthood. Cognitive Neuroscience 3(3–4), 225–230.
Habib, M., Lardy, C., Desiles, T., Commeiras, C., Chobert, J., & Besson, M. (2016). Music and dyslexia: A new musical training method to improve reading and related disorders. Frontiers in Psychology 7, 26. Retrieved from https://doi.org/10.3389/fpsyg.2016.00026
Harrington, D. L., Haaland, K. Y., & Hermanowitz, N. (1998). Temporal processing in the basal ganglia. Neuropsychology 12(1), 3–12.
Hasegawa, A., Okanoya, K., Hasegawa, T., & Seki, Y. (2011). Rhythmic synchronization tapping to an audio-visual metronome in budgerigars. Scientific Reports 1, 120. doi:10.1038/srep00120
Hattori, Y., Tomonaga, M., & Matsuzawa, T. (2013). Spontaneous synchronized tapping to an auditory rhythm in a chimpanzee. Scientific Reports 3, 1566. doi:10.1038/srep01566
Haueisen, J., & Knösche, T. R. (2001). Involuntary motor activity in pianists evoked by music perception. Journal of Cognitive Neuroscience 13(6), 786–792.
Henry, M. J., & Herrmann, B. (2014). Low-frequency neural oscillations support dynamic attending in temporal context. Timing and Time Perception 2(1), 62–86.
Henry, M. J., Herrmann, B., & Grahn, J. A. (2017). What can we learn about beat perception by comparing brain signals and stimulus envelopes? PLoS ONE 12(2), e0172454.
Henry, M. J., & Obleser, J. (2012). Frequency modulation entrains slow neural oscillations and optimizes human listening behavior. Proceedings of the National Academy of Sciences 109(49), 20095–20100.
Honing, H., Merchant, H., Háden, G. P., Prado, L., & Bartolo, R. (2012). Rhesus monkeys (Macaca mulatta) detect rhythmic groups in music, but not the beat. PLoS ONE 7(12), e51369.
Hove, M. J., Fairhurst, M. T., Kotz, S. A., & Keller, P. E. (2013). Synchronizing with auditory and visual rhythms: An fMRI assessment of modality differences and modality appropriateness. NeuroImage 67, 313–321.
Hove, M. J., Iversen, J. R., Zhang, A., & Repp, B. H. (2013). Synchronization with competing visual and auditory rhythms: Bouncing ball meets metronome. Psychological Research 77(4), 388–398.
Iversen, J. R., Patel, A. D., Nicodemus, B., & Emmorey, K. (2015). Synchronization to auditory and visual rhythms in hearing and deaf individuals. Cognition 134, 232–244.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review 96(3), 459–491.
Kirschner, S., & Tomasello, M. (2010). Joint music making promotes prosocial behavior in 4-year-old children. Evolution and Human Behavior 31(5), 354–364.
Large, E. W. (2008). Resonating to musical rhythm: Theory and experiment. In S. Grondin (Ed.), Psychology of time (pp. 189–231). Bingley: Emerald.
Large, E. W., & Gray, P. M. (2015). Spontaneous tempo and rhythmic entrainment in a bonobo (Pan paniscus). Journal of Comparative Psychology 129(4), 317–328.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review 106(1), 119–159.
Large, E. W., & Snyder, J. S. (2009). Pulse and meter as neural resonance. Annals of the New York Academy of Sciences 1169, 46–57.
Leder, H., Bär, S., & Topolinski, S. (2012). Covert painting simulations influence aesthetic appreciation of artworks. Psychological Science 23(12), 1479–1481.
Lehongre, K., Ramus, F., Villiermet, N., Schwartz, D., & Giraud, A.-L. (2011). Altered low-gamma sampling in auditory cortex accounts for the three main facets of dyslexia. Neuron 72(6), 1080–1090.
Leow, L.-A., Parrott, T., & Grahn, J. A. (2014). Individual differences in beat perception affect gait responses to low- and high-groove music. Frontiers in Human Neuroscience 8, 1–12. Retrieved from https://doi.org/10.3389/fnhum.2014.00811
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press.
Lotze, M., Scheler, G., Tan, H. R., Braun, C., & Birbaumer, N. (2003). The musician’s brain: Functional imaging of amateurs and professionals during performance and imagery. NeuroImage 20(3), 1817–1829.
Luck, G., & Sloboda, J. A. (2009). Spatio-temporal cues for visually mediated synchronization. Music Perception: An Interdisciplinary Journal 26(5), 465–473.
McAuley, J. D., Jones, M. R., Holub, S., Johnston, H. M., & Miller, N. S. (2006). The time of our lives: Life span development of timing and event tracking. Journal of Experimental Psychology: General 135(3), 348–367.
Merchant, H., & Honing, H. (2014). Are non-human primates capable of rhythmic entrainment? Evidence for the gradual audiomotor evolution hypothesis. Frontiers in Neuroscience 7, 274. Retrieved from https://doi.org/10.3389/fnins.2013.00274
Merchant, H., Pérez, O., Zarco, W., & Gámez, J. (2013). Interval tuning in the primate medial premotor cortex as a general timing mechanism. Journal of Neuroscience 33(21), 9082–9096.
Nombela, C., Rae, C. L., Grahn, J. A., Barker, R. A., Owen, A. M., & Rowe, J. B. (2013). How often does music and rhythm improve patients’ perception of motor symptoms in Parkinson’s disease? Journal of Neurology 260(5), 1404–1405.
Novembre, G., Ticini, L. F., Schütz-Bosbach, S., & Keller, P. E. (2013). Motor simulation and the coordination of self and other in real-time joint action. Social Cognitive and Affective Neuroscience 9(8), 1062–1068.
Nozaradan, S., Peretz, I., Missal, M., & Mouraux, A. (2011). Tagging the neuronal entrainment to beat and meter. Journal of Neuroscience 31(28), 10234–10240.
Pasinski, A. C., McAuley, J. D., & Snyder, J. S. (2016). How modality specific is processing of auditory and visual rhythms? Psychophysiology 53(2), 198–208.
Patel, A. D. (2003). Rhythm in language and music. Annals of the New York Academy of Sciences 999, 140–143.
Patel, A. D. (2006). Musical rhythm, linguistic rhythm, and human evolution. Music Perception: An Interdisciplinary Journal 24(1), 99–104.
Patel, A. D., Iversen, J. R., Bregman, M. R., & Schulz, I. (2009). Experimental evidence for synchronization to a musical beat in a nonhuman animal. Current Biology 19(10), 827–830.
Patel, A. D., Iversen, J. R., Chen, Y., & Repp, B. H. (2005). The influence of metricality and modality on synchronization with a beat. Experimental Brain Research 163(2), 226–238.
Peelle, J. E., Gross, J., & Davis, M. H. (2013). Phase-locked responses to speech in human auditory cortex are enhanced during comprehension. Cerebral Cortex 23(6), 1378–1387.
Phillips-Silver, J., Toiviainen, P., Gosselin, N., Piche, O., Nozaradan, S., Palmer, C., & Peretz, I. (2011). Born to dance but beat deaf: A new form of congenital amusia. Neuropsychologia 49(5), 961–969.
Picton, T. W., John, M. S., Dimitrijevic, A., & Purcell, D. (2003). Human auditory steady-state responses. International Journal of Audiology 42, 177–219.
Poole, J. H., Tyack, P. L., Stoeger-Horwath, A. S., & Watwood, S. (2005). Animal behaviour: Elephants are capable of vocal learning. Nature 434(7032), 455–456.
Povel, D. J., & Essens, P. (1985). Perception of temporal patterns. Music Perception: An Interdisciplinary Journal 2(4), 411–440.
Povel, D. J., & Okkerman, H. (1981). Accents in equitone sequences. Perception & Psychophysics 30(6), 565–572.
Power, A. J., Colling, L. J., Mead, N., Barnes, L., & Goswami, U. (2016). Neural encoding of the speech envelope by children with developmental dyslexia. Brain and Language 160, 1–10.
Press, C., Cook, J., Blakemore, S. J., & Kilner, J. (2011). Dynamic modulation of human motor activity when observing actions. Journal of Neuroscience 31(8), 2792–2800.
Ravignani, A., Delgado, T., & Kirby, S. (2017). Musical evolution in the lab exhibits rhythmic universals. Nature Human Behaviour 1(1), 0007. doi:10.1038/s41562-016-0007
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience 27, 169–192.
Schachner, A., Brady, T. F., Pepperberg, I. M., & Hauser, M. D. (2009). Spontaneous motor entrainment to music in multiple vocal mimicking species. Current Biology 19(10), 831–836.
Schroeder, C. E., Lakatos, P., Kajikawa, Y., Partan, S., & Puce, A. (2008). Neuronal oscillations and visual amplification of speech. Trends in Cognitive Sciences 12(3), 106–113.
Schwartze, M., Keller, P. E., Patel, A. D., & Kotz, S. A. (2011). The impact of basal ganglia lesions on sensorimotor synchronization, spontaneous motor tempo, and the detection of tempo changes. Behavioural Brain Research 216(2), 685–691.
Selezneva, E., Deike, S., Knyazeva, S., Scheich, H., Brechmann, A., & Brosch, M. (2013). Rhythm sensitivity in macaque monkeys. Frontiers in Systems Neuroscience 7. Retrieved from https://doi.org/10.3389/fnsys.2013.00049
Snyder, J. S., & Large, E. W. (2005). Gamma-band activity reflects the metric structure of rhythmic tone sequences. Cognitive Brain Research 24(1), 117–126.
Spaulding, S. J., Barber, B., Colby, M., Cormack, B., Mick, T., & Jenkins, M. E. (2013). Cueing and gait improvement among people with Parkinson’s disease: A meta-analysis. Archives of Physical Medicine and Rehabilitation 94(3), 562–570.
Stupacher, J., Hove, M. J., Novembre, G., Schutz-Bosbach, S., & Keller, P. E. (2013). Musical groove modulates motor cortex excitability: A TMS investigation. Brain and Cognition 82(2), 127–136.
Su, Y. H., & Salazar-López, E. (2016). Visual timing of structured dance movements resembles auditory rhythm perception. Neural Plasticity 2016, 1678390. doi:10.1155/2016/1678390
Taylor, J. E. T., & Witt, J. K. (2015). Listening to music primes space: Pianists, but not novices, simulate heard actions. Psychological Research 79(2), 175–182.
Taylor, J. E. T., Witt, J. K., & Grimaldi, P. J. (2012). Uncovering the connection between artist and audience: Viewing painted brushstrokes evokes corresponding action representations in the observer. Cognition 125(1), 26–36.
Teki, S., Grube, M., Kumar, S., & Griffiths, T. D. (2011). Distinct neural substrates of duration-based and beat-based auditory timing. Journal of Neuroscience 31(10), 3805–3812.
Winkler, I., Haden, G. P., Ladinig, O., Sziller, I., & Honing, H. (2009). Newborn infants detect the beat in music. Proceedings of the National Academy of Sciences 106(7), 2468–2471.
Woodruff Carr, K., White-Schwoch, T., Tierney, A. T., Strait, D. L., & Kraus, N. (2014). Beat synchronization predicts neural speech encoding and reading readiness in preschoolers. Proceedings of the National Academy of Sciences 111(40), 14559–14564.
Zarco, W., Merchant, H., Prado, L., & Mendez, J. C. (2009). Subsecond timing in primates: Comparison of interval production between human subjects and rhesus monkeys. Journal of Neurophysiology 102(6), 3191–3202.
Zhao, J., Al-Aidroos, N., & Turk-Browne, N. B. (2013). Attention is spontaneously biased toward regularities. Psychological Science 24(5), 667–677.
Zuk, J., Bishop-Liebler, P., Ozernov-Palchik, O., Moore, E., Overy, K., Welch, G., & Gaab, N. (2017). Revisiting the “enigma” of musicians with dyslexia: Auditory sequencing and speech abilities. Journal of Experimental Psychology: General 146(4), 495–511.
CHAPTER 9
NEURAL BASIS OF MUSIC PERCEPTION: MELODY, HARMONY, AND TIMBRE
STEFAN KOELSCH
Introduction
“Music” is a special case of sound. As opposed to animal song and drumming (e.g., birdsong, ape drumming), music is produced by humans. As opposed to noise, and noise-textures (e.g., wind, fire crackling, rain, water bubbling), musical sounds have a structural organization. In the time domain, the most fundamental principle of musical structure is the temporal organization of sounds based on an isochronous grid (the tactus, or “beat”), although there are notable exceptions (such as some kinds of meditation music, or some pieces of modern art music, such as the famous Atmosphères by Ligeti). In the frequency (pitch) domain, the most fundamental principle of musical structure is an organization of pitches according to the overtone series, resulting in simple (e.g., pentatonic) scales. Note that the production of overtone-based scales is, in turn, rooted in the perceptual properties of the auditory system, especially in octave and “fifth equivalence” (Terhardt, 1991), and that inharmonic spectra (e.g., of inharmonic metallophones) give rise to different scales, such as the pelog and slendro scales (Sethares, 2005). Thus, for a vast number of musical traditions around the globe and through human history, these two principles (isochronous beat and scale-pitch) form the nucleus of a universal musical grammar. Out of this nucleus, a seemingly infinite number of musical systems, styles, and compositions evolved, and this evolution appears to have followed computational principles described, for example, in the Chomsky hierarchy and its extensions (Rohrmeier, Zuidema, Wiggins, & Scharff, 2015), that is, local relationships between sounds based on a finite state grammar, nonlocal relationships between sounds based on a context-free grammar, and possibly a context-sensitive grammar (Rohrmeier et al., 2015).
Note that the term “language” also refers to structured sounds that are produced by humans. Similar to music, spoken language also has melody, rhythm, accents, and timbre. However, in language normally only one individual speaks at a time (otherwise the language cannot be understood, and the sound is unpleasant). By contrast, music immediately affords, by virtue of its fundamental structural principles, that several individuals produce sounds together (while the music still makes sense and sounds good). In this sense, language is the music of the individual, and music is the language of the group. The fact that music can be produced only by humans rests on the uniquely human ability to synchronize movements (including vocalizations) flexibly in a group to an external pulse (see also Merchant & Honing, 2014; Merker, Morley, & Zuidema, 2015). Finally, several scholars have noted that “language,” in turn, is a special case of music. For example, Ulrich (personal communication) once noted that language is music distorted by (propositional) semantics. In this regard, the terms “music” and “language” both refer to structured sounds that are produced by humans as a means of social interaction, expression, diversion, or evocation of emotion, with language, in addition, affording the property of propositional semantics. The following sections will review neuroscientific research about the perception of musical sounds, in particular with regard to the structural processing of melodies and harmonies.
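The link between the overtone series, octave and fifth equivalence, and a pentatonic scale can be made concrete with a small worked example. The sketch below assumes one common textbook construction (stacking pure fifths, the 3:2 ratio between the third and second harmonics, and folding each result back into a single octave); it is an illustration only, not a claim about how any particular musical tradition derived its scale, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def fold_into_octave(ratio):
    """Octave equivalence: halve or double a ratio until it lies in [1, 2)."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


def stacked_fifths_scale(n_notes=5):
    """Stack pure fifths (3:2) on a root and fold each into one octave."""
    fifth = Fraction(3, 2)  # ratio of the 3rd to the 2nd harmonic
    return sorted(fold_into_octave(fifth ** k) for k in range(n_notes))


def cents(ratio):
    """Size of an interval in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


if __name__ == "__main__":
    for ratio in stacked_fifths_scale():
        print(f"{str(ratio):>6}  {cents(ratio):7.1f} cents")
```

With five notes this yields the frequency ratios 1, 9/8, 81/64, 3/2, and 27/16 relative to the root, which correspond to a major pentatonic scale in Pythagorean tuning. An inharmonic spectrum would not favor these ratios, which is consistent with the point about pelog and slendro above.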
We Do Not Only Hear with Our Cochlea
The auditory system evolved phylogenetically from the vestibular system. Interestingly, the vestibular nerve contains a substantial number of acoustically responsive fibers. The otolith organs (saccule and utricle) are sensitive to sounds and vibrations (Todd, Paillard, Kluk, Whittle, & Colebatch, 2014), and the vestibular nuclear complex in the brainstem exerts a major influence on spinal (and ocular) motoneurons in response to loud sounds with low frequencies, or with sudden onsets (Todd et al., 2014; Todd & Cody, 2000). Moreover, both the vestibular nuclei and the auditory cochlear nuclei in the brainstem project to the reticular formation (also in the brainstem), and the vestibular nucleus also projects to the parabrachial nucleus, a convergence site for vestibular, visceral, and autonomic processing in the brainstem (Balaban & Thayer, 2001; Kandler & Herbert, 1991). Such projections initiate and support movements and contribute to the arousing effects of music. Thus, subcortical processing of sounds does not only give rise to auditory sensations, but also to muscular and autonomic responses, and the stimulation of motoneurons and autonomic neurons by low-frequency beats might contribute to the human impetus to “move to the beat” (Grahn & Rowe, 2009; Todd & Cody, 2000). In addition to vibrations of the vestibular apparatus and cochlea, sounds also evoke resonances in vibration receptors, that is, in the Pacinian corpuscles (which are sensitive from 10 Hz to a few kHz, and located mainly in the skin, the retroperitoneal space in the belly, the periosteum of the bones, and the sex organs), and maybe even responses in mechanoreceptors of the skin that detect pressure. The famous international concert percussionist Dame Evelyn Glennie is profoundly deaf and hears mainly through vibrations felt in the skin (personal communication with Dame Glennie), and probably in the vestibular organ. 
Thus, we do not only hear with our cochlea, but also with the vestibular apparatus and mechanoreceptors distributed throughout our body.
Auditory Feature Extraction in the Brainstem and Thalamus
Neural activity originating in the auditory nerve is progressively transformed in the auditory brainstem, as indicated by different neural response properties for the periodicity of sounds, timbre (including roughness, or consonance/dissonance), sound intensity, and interaural disparities in the superior olivary complex and the inferior colliculus (Geisler, 1998; Langner & Ochse, 2006; Pickles, 2008; Sinex, Guzik, Li, & Henderson Sabes, 2003). The inferior colliculi can already initiate flight and defensive behavior in response to threatening stimuli (even before the acoustic information reaches the auditory cortex; Cardoso, Coimbra, & Brandão, 1994; Lamprea et al., 2002), providing evidence of relatively elaborate auditory processing as early as the brainstem. This stands in contrast to the visual system: Philip Bard (1934) observed early on that decortication (removing the neocortex) led to blindness in cats and dogs, but not to deafness. Although the hearing thresholds appeared to be elevated, the animals were capable of differentiating sounds. From the thalamus (particularly via the medial geniculate body) neural impulses are mainly projected into the auditory cortex (but note that the thalamus also projects auditory impulses into the amygdala and the medial orbitofrontal cortex; Kaas, Hackett, & Tramo, 1999; LeDoux, 2000; Öngür & Price, 2000). The exact mechanisms underlying pitch perception are not known (and will not be discussed here), but it is clear that both space information (originating from the tonotopic organization of the cochlea) and time information (originating from the integer time intervals of neural spiking in the auditory nerve) contribute to pitch perception (Moore, 2008). Importantly, the auditory pathway does not only consist of bottom-up, but also of top-down projections; nuclei such as the dorsal nucleus of the inferior colliculus presumably receive even more descending than ascending projections from diverse auditory cortical fields (Huffman & Henson, 1990). Given the massive top-down projections within the auditory pathway, it is also becoming increasingly clear that top-down predictions play an important role in pitch perception (Malmierca, Anderson, & Antunes, 2015). Within the predictive coding framework (currently one of the dominant theories on sensory perception), such top-down projections are thought to afford passing on backward predictions, while forward sensory information is passed bottom-up, signaling prediction errors, that is, sensory information that does not match a prediction (Friston, 2010).
Numerous studies investigated decoding of frequency information in the auditory brainstem using the frequency-following response (FFR; Kraus & Chandrasekaran, 2010). The FFR can be elicited pre-attentively and is thought to originate mainly from the inferior colliculus (but note also that it is likely that the auditory cortex is at least partly involved in shaping the FFRs, e.g., by virtue of top-down projections to the inferior colliculus, referred to above). Using FFRs, Wong and colleagues (Wong, Skoe, Russo, Dees, & Kraus, 2007) measured brainstem responses to three Mandarin tones that differed only in their (F0) pitch contours. Participants were amateur musicians and non-musicians, and results revealed that musicians had more accurate encoding of the pitch contour of the phonemes (as reflected in the FFRs) than non-musicians. This finding indicates that the auditory brainstem is involved in the encoding of pitch contours of speech information (vowels), and that the correlation between the FFRs and the properties of the acoustic information is modulated by musical training. Similar training effects on FFRs elicited by syllables with a dipping pitch contour have also been observed in native English speakers (non-musicians) after a training period of 14 days (with eight 30-minute sessions; Song, Skoe, Wong, & Kraus, 2008). The latter results show the contribution of the brainstem in language learning and its neural plasticity in adulthood. A study by Strait and colleagues (Strait, Kraus, Skoe, & Ashley, 2009) also reported musical training effects on the decoding of the acoustic features of an affective vocalization (an infant’s unhappy cry), as reflected in auditory brainstem potentials. This suggests (a) that the auditory brainstem is involved in the auditory processing of communicated states of emotion (which substantially contributes to the decoding and understanding of affective prosody), and (b) that musical training can lead to a finer tuning of such (subcortical) processing.
Acoustical Equivalency of “Timbre” and “Phoneme”
With regard to a comparison between music and speech, it is worth mentioning that, in terms of acoustics, there is no difference between a phoneme and the timbre of a musical sound (and it is only a matter of convention that some phoneticians prefer terms such as “vowel quality” or “vowel color” to “timbre”). Both are characterized by the two physical correlates of timbre: spectrum envelope (i.e., differences in the relative amplitudes of the individual “harmonics,” or “overtones”) and amplitude envelope (also sometimes called the amplitude contour or energy contour of the sound wave, i.e., the way that the loudness of a sound changes over time, particularly with regard to the on- and offset of a sound). Aperiodic sounds can also differ in spectrum envelope (see, e.g., the difference between /ʃ/ and /s/), and timbre differences related to amplitude envelope play a role in speech, e.g., in the shape of the attack for /b/ vs. /w/ and /ʃ/ vs. /tʃ/.
Auditory Feature Extraction in the Auditory Cortex
As mentioned earlier, auditory information is projected mainly via the subdivisions of the medial geniculate body into the primary auditory cortex (PAC, corresponding to Brodmann’s area 41) and adjacent secondary auditory fields (corresponding to Brodmann’s areas 42 and 52; for a detailed description of primary auditory “core,” and secondary auditory “belt” fields, as well as their connectivity, see Kaas & Hackett, 2000). With regard to the functional properties of primary and secondary auditory fields, a study by Petkov and colleagues (Petkov, Kayser, Augath, & Logothetis, 2006) showed that, in the macaque monkey, all of the PAC core areas, and most of the surrounding belt areas, show a tonotopic organization (the tonotopic organization is clearest in the field A1, and some belt areas seem to show only weak, or no, tonotopic organization). These auditory areas perform a more fine-grained, and more specific, analysis of acoustic features compared to the auditory brainstem. For example, Tramo and colleagues (Tramo, Shah, & Braida, 2002) reported that a patient with bilateral lesion of the PAC (a) had normal detection thresholds for sounds (i.e., the patient could say whether there was a tone or not), but (b) had elevated thresholds for determining whether two tones had the same pitch or not (i.e., the patient had difficulty detecting fine-grained frequency differences between two subsequent tones), and (c) had markedly increased thresholds for determining the pitch direction (i.e., the patient had great difficulty saying whether the second tone was higher or lower in pitch than the first tone, even though he could tell that both tones differed).1 Note
that the auditory cortex is also involved in a number of other functions, such as auditory sensory memory, extraction of inter-sound relationships, discrimination and organization of sounds as well as sound patterns, stream segregation, automatic change detection, and multisensory integration (for reviews see Hackett & Kaas, 2004; Winkler, 2007; some of these functions are also mentioned further in the following). Moreover, the (primary) auditory cortex is involved in the transformation of acoustic features (such as frequency information) into percepts (such as pitch height and pitch chroma). For example, a sound with the frequencies 200 Hz, 300 Hz, and 400 Hz is transformed into the pitch percept of 100 Hz. Lesions of the (right) PAC result in a loss of the ability to perceive residue pitch (or “virtual pitch”) in both animals (Whitfield, 1980) and humans (Zatorre, 1988), and neurons in the anterolateral region of the PAC show responses to a missing fundamental frequency (Bendor & Wang, 2005). Moreover, magnetoencephalographic (MEG) data indicate that response properties in the PAC depend on whether or not a missing fundamental of a complex tone is perceived (Patel & Balaban, 2001; data were obtained from humans). Note, however, that combination tones emerge already in the cochlea, and that the periodicity of complex tones is coded in the spike pattern of auditory brainstem neurons; therefore, different mechanisms contribute to the perception of residue pitch on at least three different levels (basilar membrane, brainstem, and auditory cortex). However, the studies by Zatorre (1988) and Whitfield (1980) suggest that, compared to the brainstem or the basilar membrane, the auditory cortex plays a more prominent role for the transformation of acoustic features into auditory percepts (such as the transformation of information about the frequencies of a complex sound, as well as about the periodicity of a sound, into a pitch percept). 
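The residue-pitch example given above (partials at 200, 300, and 400 Hz heard as a pitch of 100 Hz) can be restated as simple arithmetic. The following minimal sketch assumes ideal, exactly harmonic partials with integer frequencies and takes their greatest common divisor as the implied fundamental. The function name `residue_pitch_hz` is hypothetical, and the calculation is an idealization only; it is not a model of the cochlear, brainstem, or cortical mechanisms discussed here.

```python
from functools import reduce
from math import gcd


def residue_pitch_hz(partials_hz):
    """Return the greatest common divisor of integer partial frequencies (in Hz).

    Assumption: the partials are exact integer multiples of a common
    fundamental, so the GCD stands in for the perceived residue pitch.
    """
    if not partials_hz:
        raise ValueError("need at least one partial")
    return reduce(gcd, partials_hz)


# The example from the text: 200, 300, and 400 Hz imply a 100 Hz fundamental.
print(residue_pitch_hz([200, 300, 400]))
```

Real listeners tolerate slightly mistuned partials, which an exact divisor cannot capture; the sketch is meant only to make the numerical relation explicit.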
Warren and colleagues (Warren, Uppenkamp, Patterson, & Griffiths, 2003) reported that changes in pitch chroma involve auditory regions anterior of the PAC (covering parts of the planum polare) more strongly than changes in pitch height. Conversely, changes in pitch height appear to involve auditory regions posterior of the PAC (covering parts of the planum temporale) more strongly than changes in pitch chroma (Warren et al., 2003). Moreover, with regard to functional differences between the left and the right PAC, as well as neighboring auditory association cortex, several studies suggest that the left auditory cortex (AC) has a higher resolution of
temporal information than the right AC, and that the right AC has a higher spectral resolution than the left AC (Hyde, Peretz, & Zatorre, 2008; Perani et al., 2010; Zatorre, Belin, & Penhune, 2002). Finally, the auditory cortex also prepares acoustic information for further conceptual and conscious processing. With regard to the meaning of sounds, for example, even a short single tone can sound “bright,” “rough,” or “dull.” That is, the timbre of a single sound is already capable of conveying meaning information. Operations within the (primary and adjacent) auditory cortex related to auditory feature analysis are reflected in electrophysiological recordings in brain-electric responses that have latencies of about 10 to 100 ms, particularly middle-latency responses, including the auditory P1 (a response with positive polarity and a latency of around 50 ms), and the later auditory N100 component (the N1 is a response with negative polarity and a latency of around 100 ms). Such brain-electric responses are also referred to as “event-related potentials” (ERPs) or “evoked potentials.”
Echoic Memory and Gestalt Formation
While auditory features are extracted, the acoustic information enters the auditory sensory memory (or “echoic memory”), and representations of auditory Gestalten (Griffiths & Warren, 2004) or “auditory objects” are formed. The auditory sensory memory (ASM) retains information only for a few seconds, and information stored in the ASM fades quickly. The ASM is thought to store physical features of sounds (such as pitch, intensity, duration, location, timbre, etc.), sound patterns, and even abstract features of sound patterns (e.g., Paavilainen, Simola, Jaramillo, Näätänen, & Winkler, 2001). Operations of the ASM are at least partly reflected electrically in the mismatch negativity (MMN, e.g., Näätänen, Tervaniemi, Sussman, Paavilainen, & Winkler, 2001). The MMN is an ERP with negative polarity and a peak-latency of about 100–200 ms and appears to receive its main contributions from neural sources located in the PAC and adjacent auditory (belt) fields, with additional (but smaller) contributions
from frontal cortical areas (for reviews, see Deouell, 2007; Schönwiesner et al., 2007). Auditory sensory memory operations are indispensable for music perception; therefore, practically all MMN studies are inherently related to, and relevant for, the understanding of the neural correlates of music processing. As will be outlined below, numerous MMN studies have contributed to this issue (a) by investigating different response properties of the ASM to musical and speech stimuli, (b) by using melodic and rhythmic patterns to investigate auditory Gestalt formation, and/or (c) by studying effects of long- and short-term musical training on processes underlying ASM operations. Especially the latter studies have contributed substantially to our understanding of neuroplasticity (i.e., to changes in neuronal structure and function due to experience), and thus to our understanding of the neural basis of learning (for a review see Tervaniemi, 2009). Here, suffice it to say that MMN studies showed effects of long-term musical training on the processing of sound localization, pitch, melody, rhythm, musical key, timbre, tuning, and timing (e.g., Koelsch, Schröger, & Tervaniemi, 1999; Putkinen, Tervaniemi, Saarikivi, de Vent, & Huotilainen, 2014; Rammsayer & Altenmüller, 2006; Tervaniemi, Castaneda, Knoll, & Uther, 2006; Tervaniemi, Janhunen, Kruck, Putkinen, & Huotilainen, 2016). Auditory oddball paradigms were also used to investigate processes of melodic and rhythmic grouping of tones occurring in tone patterns (such grouping is essential for auditory Gestalt formation, see also Sussman, 2007), as well as effects of musical long-term training on these processes. 
These studies showed effects of musical training (a) on the processing of melodic patterns (Fujioka, Trainor, Ross, Kakigi, & Pantev, 2004; Tervaniemi, Ilvonen, Karma, Alho, & Näätänen, 1997; Tervaniemi, Rytkönen, Schröger, Ilmoniemi, & Näätänen, 2001; Zuijen, Sussman, Winkler, Näätänen, & Tervaniemi, 2004; in these studies, patterns consisted of four or five tones), (b) on the encoding of the number of elements in a tone pattern (Zuijen, Sussman, Winkler, Näätänen, & Tervaniemi, 2005), and (c) on the processing of patterns consisting of two voices (Fujioka, Trainor, Ross, Kakigi, & Pantev, 2005). The formation of auditory Gestalten entails processes of perceptual separation, as well as processes of melodic, rhythmic, timbral, and spatial grouping. Such processes have been summarized under the concepts of auditory scene analysis and auditory stream segregation (Bregman, 1994).
Grouping of acoustic events follows Gestalt principles such as similarity, proximity, and continuity (for acoustic cues used for perceptual separation and auditory grouping see Darwin, 1997, 2008). In everyday life, such operations are not only important for music processing, but also, for instance, for separating a speaker’s voice during a conversation from other sound sources in the environment. That is, these operations are important because their function is to recognize and to follow acoustic objects, and to establish a cognitive representation of the acoustic environment. It appears that the planum temporale (which is part of the auditory association cortex) is a crucial structure for auditory scene analysis and stream segregation, particularly due to its role for the processing of pitch intervals and sound sequences (Griffiths & Warren, 2002; Patterson, Uppenkamp, Johnsrude, & Griffiths, 2002; Snyder & Elhilali, 2017).
Musical Expectancy Formation: Processing Local Dependencies
Processing regularities of subsequent sounds can be performed based on two different principles: first, based on the regularities inherent in the acoustical properties of the sounds, for example, pitch (after a sequence of several sounds with the same pitch, a sound with a different pitch sounds irregular). This type of processing is assumed to be performed by the auditory sensory memory, and processing of irregular sounds is reflected in the MMN (discussed earlier). Note that the extraction of the regularity underlying such sequences does not require memory capabilities beyond the auditory sensory memory (i.e., the regularity is extracted in real time, on a moment-to-moment basis). I have referred previously to such syntactic processes as “knowledge-free structuring” (Koelsch, 2012). Second, the local arrangement of elements in language and music includes numerous regularities that cannot simply be extracted on a moment-to-moment basis but have to be learned over an extended period of time (“local” refers here to the arrangement of adjacent, or directly succeeding, elements). For example, it usually takes months, or even years, to learn the syntax of a language, and it takes a considerable amount of exposure and learning to establish (implicit) knowledge of the statistical
regularities of a certain type of music. I have referred previously to such syntactic processes as “musical expectancy formation” (Koelsch, 2012). An example for local dependencies in music captured by “musical expectancy formation” is the bigram table of chord transition probabilities extracted from a corpus of Bach chorales in a study by Rohrmeier and Cross (2008). That table, for example, showed that after a dominant seventh chord, the most likely chord to follow is the tonic. It also showed that a supertonic is nine times more likely to follow a tonic than a tonic following a supertonic. This is important, because the acoustic similarity of tonic and supertonic is the same in both cases, and therefore it is very difficult to explain this statistical regularity simply based on acoustic similarity. Rather, this regularity is specific for this kind of major-minor tonal music, and thus has to be learned (over an extended period of time) to be represented accurately in the brain of a listener. Notably, even non-musicians are sensitive to such statistical regularities and pick up statistical structures without explicit intent. This ability is explored within the frameworks of statistical learning (Saffran, Aslin, & Newport, 1996) and implicit learning (Cleeremans, Destrebecqz, & Boyer, 1998), both of which have been argued to investigate the same underlying learning phenomenon (Dienes, 2012; Perruchet & Pacton, 2006). Although statistical learning appears to be domain-general (Conway & Christiansen, 2005), it has most prominently been investigated in the context of language acquisition, especially word learning (for a review see Romberg & Saffran, 2010), as well as music (for reviews see Ettlinger, Margulis, & Wong, 2011; François & Schön, 2014; Rohrmeier & Rebuschat, 2012). 
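The notion of a bigram table of chord transition probabilities can be made concrete with a short sketch. The code below estimates P(next chord | current chord) by counting adjacent chord pairs. The four progressions in `toy_corpus` are hypothetical and chosen purely for illustration; they are not the Bach chorale corpus, and the resulting numbers are not those reported by Rohrmeier and Cross (2008). The function name `bigram_probabilities` is likewise an assumption.

```python
from collections import Counter, defaultdict


def bigram_probabilities(sequences):
    """Estimate P(next chord | current chord) from a list of chord sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every pair of directly succeeding chords.
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    table = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        table[current] = {chord: n / total for chord, n in followers.items()}
    return table


# Hypothetical toy progressions (illustration only, not real corpus data).
toy_corpus = [
    ["I", "ii", "V7", "I"],
    ["I", "IV", "V7", "I"],
    ["I", "ii", "V7", "vi"],
    ["I", "vi", "ii", "V7", "I"],
]

table = bigram_probabilities(toy_corpus)
print(table["V7"])                 # followers of the dominant seventh
print(table["I"].get("ii", 0.0))   # tonic -> supertonic
print(table["ii"].get("I", 0.0))   # supertonic -> tonic
```

Even in this toy material the table is asymmetric: the supertonic follows the tonic, but the tonic never directly follows the supertonic, although the two chords are equally similar acoustically in both orders.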
With regard to statistical learning paradigms, word learning has been argued to be grounded, at least in part, in sequence prediction: in a continuous stream of syllables, sequences of events linked with high statistical conditional probability likely correspond to words, whereas syllable transitions with low predictability may likely be indicative of word-boundaries (François & Schön, 2014; Marcus, Vijayan, Rao, & Vishton, 1999; Saffran, Newport, & Aslin, 1996). Thus, tracking conditional probability relations between syllables has been regarded as highly relevant for the extraction of candidate word forms (Hay, Pelucchi, Estes, & Saffran, 2011; Saffran, 2001). In music, representations of musical regularities guiding local dependencies serve the formation of a musical expectancy (“musical” is
italicized here to clearly differentiate this type of expectancy formation from the formation of expectancies based simply on acoustical regularities). In addition, integrating information across the extracted units eventually reveals distributional properties (Hunt & Aslin, 2010; Thiessen, Kronstein, & Hufnagle, 2013). Extracted statistical properties provide an important basis for predictions which guide the processing of sensory information (Friston, 2010; Friston & Kiebel, 2009; Thiessen et al., 2013). Stimuli that are hard to predict (e.g., the syllable after a word boundary) have been hypothesized to increase processing load (Friston, 2010; Friston & Kiebel, 2009). Such an increase in processing load has been found to be reflected neurophysiologically in ERP components such as the N100 and the N400: during successful stream segmentation, word-onsets evoke larger N100 and N400 ERPs compared to more predictable positions within the word in adults (e.g., Abla, Katahira, & Okanoya, 2008; François, Chobert, Besson, & Schön, 2013; François & Schön, 2011, 2014; Schön & François, 2011; Teinonen & Huotilainen, 2012), and similar ERP responses have been observed even in newborns (Teinonen, Fellman, Näätänen, Alku, & Huotilainen, 2009). When participants learn local dependencies (i.e., statistical regularities underlying the succession of sounds), irregular sounds elicit a statistical MMN (or sMMN; Koelsch, Busch, Jentschke, & Rohrmeier, 2016), which is maximal between around 130–220 ms, and has a frontal distribution (Daikoku, Yatomi, & Yumoto, 2014; Furl et al., 2011; Koelsch et al., 2016; Paraskevopoulos, Kuchenbuch, Herholz, & Pantev, 2012).
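The idea that low conditional probabilities between syllables mark word boundaries can be sketched as follows. The code computes the transitional probability of each adjacent syllable pair in a continuous stream and places a boundary wherever that probability falls below a threshold. The three nonsense words, the fixed word order, the threshold of 0.9, and the function names are all assumptions made for illustration; this is a toy procedure, not the analysis used in any of the studies cited.

```python
from collections import Counter


def transitional_probabilities(stream):
    """Return P(b | a) for every adjacent pair (a, b) in the stream."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])  # the last item has no successor
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(stream, threshold):
    """Split the stream wherever the transitional probability is below threshold."""
    tp = transitional_probabilities(stream)
    units, current = [], [stream[0]]
    for a, b in zip(stream, stream[1:]):
        if tp[(a, b)] < threshold:
            units.append("".join(current))  # low predictability: close the unit
            current = []
        current.append(b)
    units.append("".join(current))
    return units


# Hypothetical nonsense words, concatenated without pauses in a fixed order.
words = [["tu", "pi", "ro"], ["go", "la", "bu"], ["bi", "da", "ku"]]
order = [0, 1, 2, 0, 2, 1, 0, 1, 2, 1, 0, 2]
stream = [syllable for i in order for syllable in words[i]]

units = segment(stream, threshold=0.9)
print(units)
```

Within each word the next syllable is fully predictable in this toy stream, whereas the syllable after a word-final syllable varies, so the boundaries recovered by the threshold coincide with the word boundaries.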
So far, this has been investigated in statistical learning paradigms in which participants are presented over a period of several dozen minutes with streams of “triplets” (i.e., sounds arranged in threes), with the triplets being designed such that succession of tones within and between triplets follows exactly specified statistical regularities. It is important to understand that, within the Chomsky hierarchy, a finite state automaton is required to process both the regularities underlying the generation of the physical MMN (phMMN) and abstract-feature MMN (afMMN) on the one side (i.e., “knowledge-free structuring”), and the sMMN on the other (i.e., “musical expectancy formation”). In other words, a finite state grammar is sufficient to process these two types of regularities. However, they are represented psychologically and neurophysiologically in fundamentally different ways (because the processing of regularities that do
not require long-term memory, i.e., “knowledge-free structuring,” differs neurocognitively from the processing of regularities stored in long-term memory, i.e., “musical expectancy formation”). The local transition probabilities underlying the generation of the phMMN and afMMN are stored in auditory sensory memory (and if the probabilities change, the sensory representations of the new transition probabilities are dynamically updated). By contrast, deviants in statistical learning paradigms, like those employed in the MEG studies described above (Daikoku et al., 2014; Daikoku, Yatomi, & Yumoto, 2015; Furl et al., 2011; Koelsch et al., 2016; Paraskevopoulos et al., 2012), require an extended period of learning, and the mismatch response associated with statistical learning reflects the processing of local dependencies based on (implicit) knowledge about statistical regularities. That is, the mismatch response associated with statistical learning is based on memory representations beyond the capabilities of sensory memory. With regard to music, this also means that fundamentally different neurocognitive systems process different types of local syntactic dependencies in music, even though they can be captured by the same (finite state) automaton within the Chomsky hierarchy.
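The point that one and the same finite state formalism can serve two different memory systems can be illustrated with a small sketch. The class below is a first-order transition model whose only parameter is how many past transitions it retains: a short window stands in for auditory sensory memory (dynamically updated), and an unlimited store stands in for knowledge acquired over extended exposure. The class name `TransitionModel`, the window of eight transitions, and the tone labels are assumptions for illustration; none of these values is physiological.

```python
from collections import deque


class TransitionModel:
    """First-order (finite-state) estimate of P(next | current)."""

    def __init__(self, window=None):
        # window=None keeps every transition (stand-in for long-term knowledge);
        # an integer keeps only the most recent ones (stand-in for sensory memory).
        self.transitions = deque(maxlen=window)

    def observe(self, sequence):
        """Store all adjacent pairs of the sequence."""
        self.transitions.extend(zip(sequence, sequence[1:]))

    def probability(self, current, following):
        """Relative frequency of `following` after `current` in the stored pairs."""
        followers = [b for a, b in self.transitions if a == current]
        if not followers:
            return 0.0
        return followers.count(following) / len(followers)


short_term = TransitionModel(window=8)
long_term = TransitionModel()

exposure = ["A", "B", "C"] * 50   # hypothetical triplet stream
recent = ["A"] * 9                # a recent run of repeated tones

for model in (short_term, long_term):
    model.observe(exposure)
    model.observe(recent)

print(short_term.probability("A", "B"))  # only the recent context counts
print(long_term.probability("A", "B"))   # dominated by the long exposure
```

After the run of repeated tones, the short-window model treats a B after an A as entirely unexpected, whereas the long-term model still rates it as the most likely continuation. Both are finite state models; they differ only in the span of memory on which the probabilities are based.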
Musical Structure Building: Processing Nonlocal Dependencies
As described in the previous section, tonal music involves representations of single events and local relationships on short timescales. However, many composers designed nested hierarchical syntactic structures spanning longer timescales, potentially up to entire movements of symphonies and sonatas (Salzer, 1962; Schenker, 1956). Hierarchical syntactic structure (involving the potential for nested nonlocal dependencies) is a key component of the human language capacity (Chomsky, 1995; Fitch & Hauser, 2004; Friederici, Bahlmann, Heim, Schubotz, & Anwander, 2006; Hauser, Chomsky, & Fitch, 2002; Nevins, Pesetsky, & Rodrigues, 2009), and frequently produced and perceived in everyday life. For example, in the sentence “the boy who helped Peter kissed Mary,” the subject relative clause “who helped Peter” is nested into the main clause “the boy kissed Mary,” creating a nonlocal hierarchical dependency between “the boy” and
“kissed Mary.”2 Music theorists have described analogous hierarchical structures for music. Schenker (1956) was the first to describe musical structures as organized hierarchically, in a way that musical events are elaborated (or prolonged) by other events in a recursive fashion. According to this principle, for example, a phrase (or set of phrases) can be conceived of as an elaboration of a basic underlying tonic–dominant–tonic progression. Schenker further argued that this principle can be expanded to even larger musical sequences, up to entire musical movements. In addition, Hofstadter (1979) was one of the first to argue that a change of key embedded in a superordinate key (such as a tonal modulation away from and returning to an initial key) constitutes a prime example of recursion in music. Based on similar ideas, several theorists have developed formal descriptions of the analysis of hierarchical structures in music (Lerdahl & Jackendoff, 1983; Rohrmeier, 2011; Steedman, 1984), including the Generative Theory of Tonal Music (GTTM) by Lerdahl and Jackendoff (1983), and the Generative Syntax Model (GSM) by Rohrmeier (2011). Humans are capable of processing hierarchically organized structures including nonlocal dependencies in music (Dibben, 1994; Koelsch, Rohrmeier, Torrecuso, & Jentschke, 2013; Lerdahl & Krumhansl, 2007; Serafine, Glassman, & Overbeeke, 1989), driven by the human capacity to perceive and produce hierarchical, potentially recursive structures (Chomsky, 1995; Hauser et al., 2002; Jackendoff & Lerdahl, 2006). Using chorales by J. S. Bach (see Figure 1) a recent study (Koelsch et al., 2013) showed that hierarchically incorrect final chords of a musical period (violating the nonlocal prolongation of the beginning of the period) elicit a negative brain-electric potential which is maximal between 150 and 300 ms and had frontal preponderance.
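The idea that a key region embedded in a superordinate key must be closed before the superordinate region itself is closed can be sketched with a stack, which is the kind of memory a finite state automaton lacks. In the toy representation below, each event either opens or closes a tonal region; the representation, the function name `is_well_nested`, and the key labels are assumptions for illustration, and the sketch implements neither the GTTM nor the GSM.

```python
def is_well_nested(events):
    """Check that every opened key region is closed in properly nested order.

    events: list of (action, key) tuples, where action is "open" or "close".
    """
    stack = []
    for action, key in events:
        if action == "open":
            stack.append(key)
        elif action == "close":
            # A region may only be closed by its own key, innermost first.
            if not stack or stack.pop() != key:
                return False
        else:
            raise ValueError(f"unknown action: {action!r}")
    return not stack  # nothing may remain open at the end


# Hypothetical sequence: an initial key region with one embedded region.
original = [("open", "G"), ("open", "C"), ("close", "C"), ("close", "G")]
# The same sequence with the opening region moved to another key.
modified = [("open", "D"), ("open", "C"), ("close", "C"), ("close", "G")]

print(is_well_nested(original))
print(is_well_nested(modified))
```

In the second sequence every local step is unremarkable; the violation lies only in the relation between the first and the last event, which is what makes the dependency nonlocal.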
FIGURE 1. Nonlocal dependencies in music. (a) Original version of J. S. Bach’s chorale Liebster Jesu, wir sind hier. The first phrase ends on an open dominant (see chord with fermata) and the second phrase ends on a tonic (dotted rectangle). The tree structure above the scores represents a schematic diagram of the harmonic dependencies. The two thick vertical lines (separating the first and the second phrase) visualize that the local dominant (V, rectangle above the fermata) is not immediately followed by a resolving tonic chord, but implies its resolution with the final tonic (indicated by the dotted arrow). The same dependency exists between initial and final tonic (indicated by the solid arrow). This illustrates the nonlocal (long-distance) dependency between the initial and final tonic regions and tonic chords, respectively (also illustrated by the solid arrow). The chords belonging to a key other than the initial key (see function symbols in square brackets) represent one level of embedding. (b) Modified version (the first phrase, i.e. notes up to the fermata, was transposed downwards by the pitch interval of one fourth, see light gray scores). The tree structure above the scores illustrates that the second phrase is not compatible with an expected tonic region (indicated by the dotted line), and that the last chord (a tonic of a local cadence, dotted rectangle) neither prolongs the initial tonic, nor closes the open dominant (see solid and dotted lines followed by question mark). In both (a) and (b), roman numerals indicate scale degrees. T, S, and D indicate the main tonal functions (tonic, subdominant, dominant) of the respective part of the sequence. Squared brackets indicate scale degrees relative to the local key (in the original version, the function symbols in square brackets indicate that the local key of C major is a subdominant region of the initial key of G major).
Note that the term “hierarchical” is used here to refer to a syntactic organizational principle of musical sequences by which elements are organized in terms of subordination and dominance relationships (Lerdahl & Jackendoff, 1983; Rohrmeier, 2011; Steedman, 1984). Such hierarchical structures can be established through the recursive application of rules, analogous to the establishment of hierarchical structures in language (Chomsky, 1995). In both linguistics and music theory, such hierarchical dependency structures are commonly represented using tree graphs. The term “hierarchical” is sometimes also used in a different sense, namely to indicate that certain pitches, chords, or keys within pieces occur more frequently than others and thus establish a frequency-based ranking of structural importance (Krumhansl & Cuddy, 2010). That is not the sense intended here. Numerous other studies using EEG, MEG, and fMRI have previously investigated processing of musical syntax using melodies (with regular and irregular tones) or chord sequences (with regular and irregular harmonies, for reviews see Koelsch, 2009, 2012; Patel, 2008). In all of these studies, the processes of “musical expectancy formation” (involving processing of local dependencies) and “musical structure building” (involving processing of hierarchically organized nonlocal dependencies) were confounded (as is usually the case in “real” music). For example, in the sequences shown in
Figure 2b, the final chord of the upper sequence is a tonic (I), which is the most likely chord to follow a dominant (V). The final chord of the lower sequence is a supertonic (II), which is less likely to follow a dominant. Thus, the local transition probability from V to II is lower than from V to I (in other words, the local dependency of I on V is stronger, i.e., more regular, than of II on V).
FIGURE 2. (a) Examples of chord functions: The chord built on the first scale tone is denoted as the tonic, the chord on the second tone as the supertonic, and the chord on the fifth tone as the dominant. (b) The dominant-tonic progression represents a regular ending of a harmonic sequence (top), the dominant-supertonic progression is less regular and unacceptable as a marker of the end of a harmonic progression (bottom sequence, the arrow indicates the less regular chord). (c) ERPs elicited in a passive listening condition by the final chords of the two sequence types shown in (b). Both sequence types were presented in pseudorandom order equiprobably in all twelve major keys. Brain responses to irregular chords clearly differ from those to regular chords (best to be seen in the black difference wave, regular subtracted from irregular chords). The first difference between the two waveforms is maximal around 200 ms after the onset of the fifth chord (ERAN, indicated by the long arrow) and taken to reflect processes of music-syntactic analysis. The ERAN is followed by an N5 taken to reflect processes of harmonic integration (short arrow). (d) Activation foci (small spheres) reported by functional imaging studies on music-syntactic processing using chord sequence paradigms (Koelsch, Fritz, et al., 2005; Maess et al., 2001; Koelsch et al., 2002; Tillmann et al., 2003) and melodies (Janata et al., 2002). Large gray disks show the mean coordinates of foci (averaged for each hemisphere across studies, coordinates refer to standard stereotaxic space). Reprinted from Trends in Cognitive Sciences, 9(12), Stefan Koelsch and Walter A. Siebel, Towards a neural basis of music perception, pp. 578–584, Copyright © 2005 Elsevier Ltd. All rights reserved.
At the same time, the final tonic “prolongs” the initial tonic, whereas the final supertonic does not. Therefore, the nonlocal dependency between initial and final chord is fulfilled in the upper sequence and violated in the bottom sequence. Figure 2c shows brain-electric responses to the final chords of the sequences shown in Figure 2b: the irregular supertonics elicit an ERAN (early right anterior negativity, indicated by the arrow) compared to the regular tonic chords. Importantly, as described earlier, the ERAN elicited here is a conglomerate of the sMMN (due to processing the local dependency violation) and the “hierarchical ERAN” (due to the processing of the nonlocal dependency violation). A study by Zhang and colleagues (Zhang, Zhou, Chang, & Yang, 2018), however, nicely showed effects of nonlocal context on local harmonic processing using the ERAN. The ERAN has a larger amplitude in individuals with musical training, is reduced by strong attentional demands, but can be elicited even if participants ignore the musical stimulus (for a review see Koelsch, 2012). Most studies reporting an ERAN used harmonies as stimuli, but the ERAN can also be elicited by melodies (e.g., Carrus, Pearce, & Bhattacharya, 2013; Fiveash, Thompson, Badcock, & McArthur, 2018; Miranda & Ullman, 2007; Zendel, Lagrois, Robitaille, & Peretz, 2015). Moreover, a study by Sun and colleagues (Sun, Liu, Zhou, & Jiang, 2018) reported that the ERAN can also be elicited by rhythmic syntactic processing. Interestingly, a study by Przysinda and colleagues (Przysinda, Zeng, Maves, Arkin, & Loui, 2017) showed differential ERAN responses in classical and jazz musicians depending on their preferences for irregular, or unusual, harmonies. The ERAN is relatively immune to predictions: the ERAN latency, but not amplitude, is influenced by veridical expectations (Guo & Koelsch, 2016).
However, Vuvan and colleagues (Vuvan, Zendel, & Peretz, 2018) reported that random feedback (including false feedback) on participants’ detection of out-of-key tones in melodies modulated the ERAN amplitude, possibly suggesting that attention-driven changes in the confidence in predictions (i.e., changes in the precision of predictions) might alter the ERAN amplitude. Recent studies also report that the ERAN is absent in individuals with “amusia” (Sun, Lu, et al., 2018), or that pitch-judgment tasks can eliminate the ERAN in amusics (Zendel et al., 2015). In children, the ERAN becomes visible around the age of 30 months (Jentschke, Friederici, & Koelsch, 2014), and several studies have reported ERAN responses in pre-school children (Corrigall & Trainor, 2014;
Jentschke, Koelsch, Sallat, & Friederici, 2008; Koelsch, Grossmann, Gunter, Hahne, & Friederici, 2003). Children with specific language impairment show a reduced (or absent) ERAN (Jentschke et al., 2008), whereas neurophysiological correlates of language-syntactic processing are developed earlier, and more strongly in children with musical training (Jentschke & Koelsch, 2009). Functional neuroimaging studies using chord sequences (similar to those shown in Figure 2b, e.g., Koelsch et al., 2002; Koelsch, Fritz, Schulze, Alsop, & Schlaug, 2005; Maess, Koelsch, Gunter, & Friederici, 2001; Tillmann, Janata, & Bharucha, 2003; Villarreal, Brattico, Leino, Østergaard, & Vuust, 2011) or melodies (Janata, Tillmann, & Bharucha, 2002) suggest that music-syntactic processing involves the pars opercularis of the inferior frontal gyrus (corresponding to BA 44v; Amunts et al., 2010) bilaterally, but with right-hemispheric weighting (see the spheres in Figure 2d). It seems likely that the involvement of BA 44v in music-syntactic processing is mainly due to the hierarchical processing of (syntactic) information: This part of Broca’s area is involved in the hierarchical processing of syntax in language (e.g., Friederici et al., 2006; Makuuchi, Bahlmann, Anwander, & Friederici, 2009), the hierarchical processing of action sequences (e.g., Fazio et al., 2009; Koechlin & Jubault, 2006), and possibly also in the processing of hierarchically organized mathematical formulas and termini (Friedrich & Friederici, 2009; although activation in the latter study cannot clearly be assigned to BA 44 or BA 45). Finally, using an artificial musical grammar, a recent study by Cheung and colleagues (Cheung, Meyer, Friederici, & Koelsch, 2018) reported activation of BA 44v associated with the processing of nonlocal (nested) dependencies (however, note that dependencies in that study were not hierarchically organized). 
It appears that inferior BA 44 is not the only structure involved in music-syntactic processing: additional structures include the superior part of the pars opercularis (Koelsch et al., 2002), ventral premotor cortex (PMCv; Janata et al., 2002; Koelsch, Fritz, et al., 2005; Parsons, 2001), and the anterior portion of the STG (Koelsch, Fritz, et al., 2005). The PMCv possibly contributes to the processing of local music-syntactic dependencies (i.e., information based on a finite state grammar): activations of PMCv have been reported in a variety of functional imaging studies on auditory processing using musical stimuli, linguistic stimuli, auditory oddball paradigms, pitch discrimination tasks, and serial prediction tasks,
underlining the importance of these structures for the sequencing of structural information, the recognition of structure, and the prediction of sequential information (Janata & Grafton, 2003). With regard to language, Friederici (2004) reported that activation foci of functional neuroimaging studies on the processing of hierarchically organized long-distance dependencies and transformations are located in the posterior IFG (with the mean of the coordinates reported in that article being located in the inferior pars opercularis), whereas activation foci of functional neuroimaging studies on the processing of local dependency violations are located in the PMCv (see also Friederici et al., 2006; Makuuchi et al., 2009; Opitz & Kotz, 2011). Moreover, patients with lesions in the PMCv show disruption of the processing of finite state, but not phrase-structure, grammar (Opitz & Kotz, 2011). That is, in the above-mentioned experiments that used chord sequence paradigms to investigate the processing of harmonic structure, the music-syntactic processing of the chord functions probably involved processing of both finite state grammar (local dependencies) and phrase-structure (or “context-free”) grammar (hierarchically organized nonlocal dependencies). The music-syntactic analysis involved a computation of the harmonic relation between a chord function and the context of preceding chord functions (phrase-structure grammar). Such a computation is more difficult (and less common) for irregular than for regular chord functions, and this increased difficulty is presumably reflected in a stronger activation of (inferior) BA 44 in response to irregular chords. 
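The distinction drawn above between local dependencies and hierarchically organized nonlocal dependencies can be made concrete with a small sketch. The code below is a toy illustration only and is not taken from any of the cited studies: the element names `A1`, `A2`, `B1`, `B2` and both function names are hypothetical. The first check inspects adjacent pairs only (a simple special case of a finite state grammar), whereas the second keeps track of pending elements with a stack, which is one common way to verify nested dependencies.

```python
# Toy illustration (assumption: not the stimuli or method of the cited studies).
# Hypothetical elements: each "opener" must eventually be closed by its partner.
PARTNER = {"A1": "B1", "A2": "B2"}
CLOSERS = set(PARTNER.values())


def locally_regular(sequence):
    """Check adjacent pairs only: an opener that is directly followed by a
    closer must be followed by its own partner. No memory beyond one element."""
    for prev, nxt in zip(sequence, sequence[1:]):
        if prev in PARTNER and nxt in CLOSERS and PARTNER[prev] != nxt:
            return False
    return True


def hierarchically_regular(sequence):
    """Check nested dependencies with a stack: closers must match the
    pending openers in reverse order, and nothing may be left open."""
    pending = []
    for element in sequence:
        if element in PARTNER:
            pending.append(PARTNER[element])
        elif not pending or pending.pop() != element:
            return False
    return not pending


nested = ["A1", "A2", "B2", "B1"]          # well-formed nesting
broken_nesting = ["A1", "A2", "B2", "B2"]  # outer (nonlocal) dependency violated
local_violation = ["A1", "B2"]             # adjacent pair already irregular

for name, seq in [("nested", nested),
                  ("broken_nesting", broken_nesting),
                  ("local_violation", local_violation)]:
    print(name, locally_regular(seq), hierarchically_regular(seq))
```

In this sketch the sequence `broken_nesting` passes the adjacent-pair check but fails the stack-based check, which is the sense in which a purely local analysis can miss a violation of a nonlocal dependency.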
In addition, the local transition probability from the penultimate to the final chord is lower for the dominant–supertonic progression than for the dominant–tonic progression (finite state grammar), and the computation of the (less predicted) lower-probability progression is presumably reflected in a stronger activation of PMCv in response to irregular chords. The stronger activation of both BA 44 and PMCv appears to correlate with the perception of a music-syntactically irregular chord as “unexpected” (although emotional effects of irregular chords probably originate from BA 47, discussed below). Note that the ability to process context-free grammar is available to humans, whereas non-human primates are apparently not able to master such grammars (Fitch & Hauser, 2004). Thus, it is highly likely that only humans can adequately process music-syntactic information at the phrase-structure level. It is also worth noting that numerous studies showed that
even “non-musicians” (i.e., individuals who have not received formal musical training) have a highly sophisticated (implicit) knowledge about musical syntax (e.g., Tillmann, Bharucha, & Bigand, 2000). Such knowledge is presumably acquired during listening experiences in everyday life. Finally, it is important to note that violations of musical expectancies also have emotional effects, such as surprise, or tension (Huron, 2006; Koelsch, 2014; Lehne & Koelsch, 2015; Meyer, 1956). Consequently, musical irregularity is confounded with emotion-eliciting effects, and it is difficult to disentangle cognitive and emotional effects of music-syntactic irregularities in neuroscientific experiments. For example, a study by Koelsch and colleagues (Koelsch, Fritz, et al., 2005) reported activation foci in both BA 44 and BA 47 (among other structures) in response to musical expectancy violations, and a study by Levitin and Menon (2005) reported activation of BA 47 (without BA 44) in response to scrambled (unpleasant) vs. normal music. BA 47 is paralimbic, five-layered palaeocortex (not neocortex), and activation of this region with musical irregularities is most likely due to emotional effects (this is also consistent with an fMRI study reporting that musical tension correlates with neural activity in BA 47; Lehne, Rohrmeier, & Koelsch, 2014). Note that, because BA 47 is not neocortex, it is problematic to consider this region as a “language area.” Moreover, BA 47 is adjacent to BA 44/45/46, thus activation foci originating in Broca’s area can easily be misplaced in BA 47. Based on receptorarchitectonic (and cytoarchitectonic) data, a study by Amunts et al. (2010) showed that BA 47 does not cluster together with BA 44/45/46 (Broca’s area in the wider sense), nor with BA 6 (PMC). As mentioned earlier, hierarchical processing of syntactic information from different domains (such as music and language) requires contributions from neural populations located in BA 44. 
However, it is still possible that, although such neural populations are located in the same brain area, entirely different (non-overlapping) neural populations serve the syntactic processing of music and language within the same area. That is, perhaps the neural populations mediating language-syntactic processing in BA 44 are different from neural populations mediating music-syntactic processing in the same area. Therefore, the strongest evidence for shared neural resources for the syntactic processing of music and language stems from experiments that revealed interactions between music-syntactic and language-syntactic
processing (Carrus et al., 2013; Fedorenko, Patel, Casasanto, Winawer, & Gibson, 2009; Koelsch, Gunter, Wittfoth, & Sammler, 2005; Patel, Iversen, Wassenaar, & Hagoort, 2008; Slevc, Rosenberg, & Patel, 2009; Steinbeis & Koelsch, 2008). In these studies, chord sequences or melodies were played simultaneously with (visually presented) sentences, and it was shown, for example, that the ERAN elicited by irregular chords interacted with the left anterior negativity (LAN) elicited by linguistic (morpho-syntactic) violations (Koelsch, Gunter, et al., 2005; Steinbeis & Koelsch, 2008). Thus, music-syntactic processes can interfere with language-syntactic processes. In summary, neurophysiological studies show that music- and language-syntactic processes engage overlapping resources (presumably located in the inferior frontolateral cortex), and evidence showing that these resources underlie music- and language-syntactic processing is provided by experiments showing interactions between ERP components reflecting music- and language-syntactic processing (in particular LAN and ERAN). Importantly, such interactions are observed in the absence of interactions between LAN and MMN (that is, between language-syntactic processing and acoustic deviance processing, the latter reflected in the MMN), and in the absence of interactions between the ERAN and the N400 (i.e., between music-syntactic and language-semantic processing). Therefore, the reported interactions between LAN and ERAN are syntax-specific and are not elicited by just any kind of irregularity.
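The notion of a local transition probability, invoked earlier in this section for the dominant–tonic versus dominant–supertonic progression, can be illustrated with a short sketch. The chord sequences below are an invented toy corpus (an assumption made purely for illustration, not data from any cited study), and the function names are hypothetical. The code estimates first-order transition probabilities from bigram counts and converts them to surprisal in bits.

```python
import math
from collections import Counter, defaultdict

# Invented toy corpus of chord-function sequences (assumption, not real data).
# I = tonic, ii = supertonic, IV = subdominant, V = dominant, vi = submediant.
TOY_CORPUS = [
    ["I", "IV", "V", "I"],
    ["I", "ii", "V", "I"],
    ["I", "vi", "IV", "V", "I"],
    ["I", "IV", "V", "vi"],
    ["I", "V", "ii", "V", "I"],
]


def transition_probabilities(corpus):
    """Estimate P(next chord | previous chord) from adjacent-pair counts."""
    counts = defaultdict(Counter)
    for sequence in corpus:
        for prev, nxt in zip(sequence, sequence[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for prev, followers in counts.items()
    }


def surprisal(probabilities, prev, nxt):
    """Information content (in bits) of hearing `nxt` after `prev`."""
    return -math.log2(probabilities[prev][nxt])


probs = transition_probabilities(TOY_CORPUS)
for final_chord in ("I", "ii"):
    print(f"V -> {final_chord}: p = {probs['V'][final_chord]:.3f}, "
          f"surprisal = {surprisal(probs, 'V', final_chord):.3f} bits")
```

In this toy corpus the dominant is followed by the tonic in four of six cases and by the supertonic in one of six, so the dominant–supertonic progression carries the higher surprisal. Such a first-order model captures local dependencies only and says nothing about the hierarchical relations discussed above.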
CONCLUDING REMARKS
As a concluding remark I would like to emphasize that even individuals without formal musical training show sophisticated abilities with regard to the decoding of musical information, the acquisition of knowledge about musical syntax, the processing of musical information according to that knowledge, and the understanding of music. This finding supports the notion that musicality is a natural ability of the human brain. Such musical abilities are important for making music together in groups, and thus for the beneficial social effects promoted by musical group activities (such as cooperation and social cohesion, e.g., Koelsch, 2014; Tarr, Launay, & Dunbar, 2014). The natural musical abilities of humans are also important
for the acquisition and the processing of language. For example, differentiating between vowels, consonants, and lexical tones is a highly sophisticated capability of the human auditory system. Tonal languages rely on a meticulous decoding of pitch information, and both tonal and nontonal languages require an accurate analysis of speech prosody to decode structure and meaning of speech. Infants use such prosodic cues to acquire information about word and phrase boundaries (possibly even about word meaning). The assumption of an intimate connection between music and speech is corroborated by the reviewed findings of overlapping and shared neural resources for music and language processing in both adults and children. These findings suggest that the human brain, particularly at an early age, does not treat language and music as separate domains, but rather treats language as a special case of music, and music as a special case of sound.
REFERENCES
Abla, D., Katahira, K., & Okanoya, K. (2008). On-line assessment of statistical learning by event-related potentials. Journal of Cognitive Neuroscience 20(6), 952–964. Amunts, K., Lenzen, M., Friederici, A. D., Schleicher, A., Morosan, P., Palomero-Gallagher, N., & Zilles, K. (2010). Broca’s region: Novel organizational principles and multiple receptor mapping. PLoS Biology 8(9), e1000489. Balaban, C. D., & Thayer, J. F. (2001). Neurological bases for balance–anxiety links. Journal of Anxiety Disorders 15(1), 53–79. Bard, P. (1934). On emotional expression after decortication with some remarks on certain theoretical views: Part II. Psychological Review 41(5), 424. Bendor, D., & Wang, X. (2005). The neuronal representation of pitch in primate auditory cortex. Nature 436(7054), 1161–1165. Bregman, A. S. (1994). Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: MIT Press. Cardoso, S. H., Coimbra, N. C., & Brandão, M. L. (1994). Defensive reactions evoked by activation of NMDA receptors in distinct sites of the inferior colliculus. Behavioural Brain Research 63(1), 17–24. Carrus, E., Pearce, M. T., & Bhattacharya, J. (2013). Melodic pitch expectation interacts with neural responses to syntactic but not semantic violations. Cortex 49(8), 2186–2200. Cheung, V., Meyer, L., Friederici, A. D., & Koelsch, S. (2018). The right inferior frontal gyrus processes hierarchical non-local dependencies in music. Scientific Reports 8, 3822. doi:10.1038/s41598-018-22144-9 Chomsky, N. (1995). The minimalist program. Cambridge, MA: MIT Press. Cleeremans, A., Destrebecqz, A., & Boyer, M. (1998). Implicit learning: News from the front. Trends in Cognitive Sciences 2(10), 406–416.
Conway, C. M., & Christiansen, M. H. (2005). Modality-constrained statistical learning of tactile, visual, and auditory sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition 31(1), 24–39. Corrigall, K. A., & Trainor, L. J. (2014). Enculturation to musical pitch structure in young children: Evidence from behavioral and electrophysiological methods. Developmental Science 17(1), 142– 158. Daikoku, T., Yatomi, Y., & Yumoto, M. (2014). Implicit and explicit statistical learning of tone sequences across spectral shifts. Neuropsychologia 63, 194–204. Daikoku, T., Yatomi, Y., & Yumoto, M. (2015). Statistical learning of music-and language-like sequences and tolerance for spectral shifts. Neurobiology of Learning and Memory 118, 8–19. Darwin, C. J. (1997). Auditory grouping. Trends in Cognitive Sciences 1(9), 327–333. Darwin, C. J. (2008). Listening to speech in the presence of other sounds. Philosophical Transactions of the Royal Society B: Biological Sciences 363(1493), 1011–1021. Deouell, L. Y. (2007). The frontal generator of the mismatch negativity revisited. Journal of Psychophysiology 21(3/4), 188–203. Dibben, N. (1994). The cognitive reality of hierarchic structure in tonal and atonal music. Music Perception 12(1), 1–25. Dienes, Z. (2012). Conscious versus unconscious learning of structure. In P. Rebuschat & J. Williams (Eds.), Statistical learning and language acquisition (pp. 337–364). Berlin: Walter de Gruyter. Ettlinger, M., Margulis, E. H., & Wong, P. C. (2011). Implicit memory in music and language. Frontiers in Psychology 2. Retrieved from https://doi.org/10.3389/fpsyg.2011.00211 Fazio, P., Cantagallo, A., Craighero, L., D’Ausilio, A., Roy, A. C., Pozzo, T., … Fadiga, L. (2009). Encoding of human action in Broca’s area. Brain 132(7), 1980–1988. Fedorenko, E., Patel, A., Casasanto, D., Winawer, J., & Gibson, E. (2009). Structural integration in language and music: Evidence for a shared system. Memory & Cognition 37(1), 1–19. Fitch, W. 
T., & Hauser, M. D. (2004). Computational constraints on syntactic processing in a nonhuman primate. Science 303(5656), 377–380. Fiveash, A., Thompson, W. F., Badcock, N. A., & McArthur, G. (2018). Syntactic processing in music and language: Effects of interrupting auditory streams with alternating timbres. International Journal of Psychophysiology 129(1), 31–40. François, C., Chobert, J., Besson, M., & Schön, D. (2013). Music training for the development of speech segmentation. Cerebral Cortex 23(9), 2038–2043. Francois, C., & Schön, D. (2011). Musical expertise boosts implicit learning of both musical and linguistic structures. Cerebral Cortex 21(10), 2357–2365. François, C., & Schön, D. (2014). Neural sensitivity to statistical regularities as a fundamental biological process that underlies auditory learning: The role of musical practice. Hearing Research 308, 122–128. Friederici, A. D. (2004). Processing local transitions versus long-distance syntactic hierarchies. Trends in Cognitive Sciences 8(6), 245–247. Friederici, A. D., Bahlmann, J., Heim, S., Schubotz, R. I., & Anwander, A. (2006). The brain differentiates human and non-human grammars: Functional localization and structural connectivity. Proceedings of the National Academy of Sciences 103(7), 2458–2463. Friedrich, R., & Friederici, A. D. (2009). Mathematical logic in the human brain: Syntax. PLoS ONE 4(5), e5599. Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience 11(2), 127–138. Friston, K., & Kiebel, S. (2009). Predictive coding under the free-energy principle. Philosophical Transactions of the Royal Society B: Biological Sciences 364(1521), 1211–1221.
Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2004). Musical training enhances automatic encoding of melodic contour and interval structure. Journal of Cognitive Neuroscience 16(6), 1010–1021. Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2005). Automatic encoding of polyphonic melodies in musicians and nonmusicians. Journal of Cognitive Neuroscience 17(10), 1578–1592. Furl, N., Kumar, S., Alter, K., Durrant, S., Shawe-Taylor, J., & Griffiths, T. D. (2011). Neural prediction of higher-order auditory sequence statistics. NeuroImage 54(3), 2267–2277. Geisler, C. D. (1998). From sound to synapse: Physiology of the mammalian ear. New York: Oxford University Press. Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat: Premotor and striatal interactions in musicians and nonmusicians during beat perception. Journal of Neuroscience 29(23), 7540–7548. Griffiths, T. D., & Warren, J. D. (2002). The planum temporale as a computational hub. Trends in Neurosciences 25(7), 348–353. Griffiths, T. D., & Warren, J. D. (2004). What is an auditory object? Nature Reviews Neuroscience 5(11), 887–892. Guo, S., & Koelsch, S. (2016). Effects of veridical expectations on syntax processing in music: Event-related potential evidence. Scientific Reports 6, 19064. doi:10.1038/srep19064 Hackett, T. A., & Kaas, J. (2004). Auditory cortex in primates: Functional subdivisions and processing streams. In M. S. Gazzaniga (Ed.), The cognitive neurosciences (pp. 215–232). Cambridge, MA: MIT Press. Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science 298(5598), 1569–1579. Hay, J. F., Pelucchi, B., Estes, K. G., & Saffran, J. R. (2011). Linking sounds to meanings: Infant statistical learning in a natural language. Cognitive Psychology 63(2), 93–106. Hofstadter, D. R. (1979). Gödel, Escher, Bach. New York: Basic Books. Huffman, R. F., & Henson, O. W. (1990). 
The descending auditory pathway and acousticomotor systems: Connections with the inferior colliculus. Brain Research Reviews 15(3), 295–323. Hunt, R. H., & Aslin, R. N. (2010). Category induction via distributional analysis: Evidence from a serial reaction time task. Journal of Memory and Language 62(2), 98–112. Huron, D. B. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA: MIT Press. Hyde, K. L., Peretz, I., & Zatorre, R. J. (2008). Evidence for the role of the right auditory cortex in fine pitch resolution. Neuropsychologia 46(2), 632–639. Jackendoff, R., & Lerdahl, F. (2006). The capacity for music: What is it, and what’s special about it? Cognition 100(1), 33–72. Janata, P., & Grafton, S. T. (2003). Swinging in the brain: Shared neural substrates for behaviors related to sequencing and music. Nature Neuroscience 6(7), 682–687. Janata, P., Tillmann, B., & Bharucha, J. J. (2002). Listening to polyphonic music recruits domain-general attention and working memory circuits. Cognitive, Affective, & Behavioral Neuroscience 2(2), 121–140. Jentschke, S., Friederici, A. D., & Koelsch, S. (2014). Neural correlates of music-syntactic processing in two-year old children. Developmental Cognitive Neuroscience 9, 200–208. Jentschke, S., & Koelsch, S. (2009). Musical training modulates the development of syntax processing in children. NeuroImage 47(2), 735–744. Jentschke, S., Koelsch, S., Sallat, S., & Friederici, A. D. (2008). Children with specific language impairment also show impairment of music-syntactic processing. Journal of Cognitive Neuroscience 20(11), 1940–1951.
Johnsrude, I. S., Penhune, V. B., & Zatorre, R. J. (2000). Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain 123(1), 155–163. Kaas, J. H., & Hackett, T. A. (2000). Subdivisions of auditory cortex and processing streams in primates. Proceedings of the National Academy of Sciences 97(22), 11793–11799. Kaas, J. H., Hackett, T. A., & Tramo, M. J. (1999). Auditory processing in primate cerebral cortex. Current Opinion in Neurobiology 9(2), 164–170. Kandler, K., & Herbert, H. (1991). Auditory projections from the cochlear nucleus to pontine and mesencephalic reticular nuclei in the rat. Brain Research 562(2), 230–242. Koechlin, E., & Jubault, T. (2006). Broca’s area and the hierarchical organization of human behavior. Neuron 50(6), 963–974. Koelsch, S. (2009). Music-syntactic processing and auditory memory: Similarities and differences between ERAN and MMN. Psychophysiology 46(1), 179–190. Koelsch, S. (2012). Brain and music. Chichester: Wiley-Blackwell. Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15(3), 170–180. Koelsch, S., Busch, T., Jentschke, S., & Rohrmeier, M. (2016). Under the hood of statistical learning: A statistical MMN reflects the magnitude of transitional probabilities in auditory sequences. Scientific Reports 6, 19741. doi:10.1038/srep19741 Koelsch, S., Fritz, T., Schulze, K., Alsop, D., & Schlaug, G. (2005). Adults and children processing music: An fMRI study. NeuroImage 25(4), 1068–1076. Koelsch, S., Grossmann, T., Gunter, T. C., Hahne, A., & Friederici, A. D. (2003). Children processing music: Electric brain responses reveal musical competence and gender differences. Journal of Cognitive Neuroscience 15(5), 683–693. Koelsch, S., Gunter, T. C., Cramon, D. Y. von, Zysset, S., Lohmann, G., & Friederici, A. D. (2002). Bach speaks: A cortical “language-network” serves the processing of music. NeuroImage 17(2), 956–966. Koelsch, S., Gunter, T. 
C., Wittfoth, M., & Sammler, D. (2005). Interaction between syntax processing in language and in music: An ERP study. Journal of Cognitive Neuroscience 17(10), 1565–1577. Koelsch, S., Rohrmeier, M., Torrecuso, R., & Jentschke, S. (2013). Processing of hierarchical syntactic structure in music. Proceedings of the National Academy of Sciences 110(38), 15443– 15448. Koelsch, S., Schröger, E., & Tervaniemi, M. (1999). Superior pre-attentive auditory processing in musicians. Neuroreport 10(6), 1309–1313. Koelsch, S., & Siebel, W. A. (2005). Towards a neural basis of music perception. Trends in Cognitive Sciences 9(12), 578–584. Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills. Nature Reviews Neuroscience 11(8), 599–605. Krumhansl, C. L., & Cuddy, L. L. (2010). A theory of tonal hierarchies in music. Music Perception 36, 51–87. Lamprea, M. R., Cardenas, F. P., Vianna, D. M., Castilho, V. M., Cruz-Morales, S. E., & Brandão, M. L. (2002). The distribution of Fos immunoreactivity in rat brain following freezing and escape responses elicited by electrical stimulation of the inferior colliculus. Brain Research 950(1–2), 186–194. Langner, G., & Ochse, M. (2006). The neural basis of pitch and harmony in the auditory system. Musicae Scientiae 10(1), 185. LeDoux, J. E. (2000). Emotion circuits in the brain. Annual Review of Neuroscience 23, 155–184.
Lehne, M., & Koelsch, S. (2015). Toward a general psychological model of tension and suspense. Frontiers in Psychology 6. Retrieved from https://doi.org/10.3389/fpsyg.2015.00079 Lehne, M., Rohrmeier, M., & Koelsch, S. (2014). Tension-related activity in the orbitofrontal cortex and amygdala: An fMRI study with music. Social Cognitive and Affective Neuroscience 9(10), 1515–1523. Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press. Lerdahl, F., & Krumhansl, C. L. (2007). Modeling tonal tension. Music Perception 24(4), 329–366. Levitin, D. J., & Menon, V. (2005). The neural locus of temporal structure and expectancies in music: Evidence from functional neuroimaging at 3 tesla. Music Perception: An Interdisciplinary Journal 22(3), 563–575. Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in the area of Broca: An MEG-study. Nature Neuroscience 4(5), 540–545. Makuuchi, M., Bahlmann, J., Anwander, A., & Friederici, A. D. (2009). Segregating the core computational faculty of human language from working memory. Proceedings of the National Academy of Sciences 106(20), 8362–8367. Malmierca, M. S., Anderson, L. A., & Antunes, F. M. (2015). The cortical modulation of stimulus-specific adaptation in the auditory midbrain and thalamus: A potential neuronal correlate for predictive coding. Frontiers in Systems Neuroscience 9, 19. Retrieved from https://doi.org/10.3389/fnsys.2015.00019 Marcus, G. F., Vijayan, S., Rao, S. B., & Vishton, P. M. (1999). Rule learning by seven-month-old infants. Science 283(5398), 77–80. Merchant, H., & Honing, H. (2014). Are non-human primates capable of rhythmic entrainment? Evidence for the gradual audiomotor evolution hypothesis. Frontiers in Neuroscience 7, 274. Retrieved from https://doi.org/10.3389/fnins.2013.00274 Merker, B., Morley, I., & Zuidema, W. (2015). Five fundamental constraints on theories of the origins of music. 
Philosophical Transactions of the Royal Society B: Biological Sciences 370(1664), 20140095. Meyer, L. B. (1956). Emotion and meaning in music. Chicago, IL: University of Chicago Press. Miranda, R. A., & Ullman, M. T. (2007). Double dissociation between rules and memory in music: An event-related potential study. NeuroImage 38(2), 331–345. Moore, B. C. J. (2008). An introduction to the psychology of hearing (5th ed.). Bingley: Emerald. Näätänen, R., Tervaniemi, M., Sussman, E., Paavilainen, P., & Winkler, I. (2001). “Primitive intelligence” in the auditory cortex. Trends in Neurosciences 24(5), 283–288. Nevins, A., Pesetsky, D., & Rodrigues, C. (2009). Pirahã exceptionality: A reassessment. Language 85(2), 355–404. Öngür, D., & Price, J. L. (2000). The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans. Cerebral Cortex 10(3), 206–219. Opitz, B., & Kotz, S. A. (2011). Ventral premotor cortex lesions disrupt learning of sequential grammatical structures. Cortex 48(6), 664–673. Paavilainen, P., Simola, J., Jaramillo, M., Näätänen, R., & Winkler, I. (2001). Preattentive extraction of abstract feature conjunctions from auditory stimulation as reflected by the mismatch negativity (MMN). Psychophysiology 38(2), 359–365. Paraskevopoulos, E., Kuchenbuch, A., Herholz, S. C., & Pantev, C. (2012). Statistical learning effects in musicians and non-musicians: An MEG study. Neuropsychologia 50(2), 341–349. Parsons, L. (2001). Exploring the functional neuroanatomy of music performance, perception, and comprehension. Annals of the New York Academy of Sciences 930, 211–231. Patel, A. D. (2008). Music, language, and the brain. Oxford: Oxford University Press.
Patel, A. D., & Balaban, E. (2001). Human pitch perception is reflected in the timing of stimulus-related cortical activity. Nature Neuroscience 4(8), 839–844. Patel, A. D., Iversen, J. R., Wassenaar, M., & Hagoort, P. (2008). Musical syntactic processing in agrammatic Broca’s aphasia. Aphasiology 22(7), 776–789. Patterson, R. D., Uppenkamp, S., Johnsrude, I. S., & Griffiths, T. D. (2002). The processing of temporal pitch and melody information in auditory cortex. Neuron 36(4), 767–776. Perani, D., Saccuman, M. C., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., … Koelsch, S. (2010). Functional specializations for music processing in the human newborn brain. Proceedings of the National Academy of Sciences 107(10), 4758–4763. Perruchet, P., & Pacton, S. (2006). Implicit learning and statistical learning: One phenomenon, two approaches. Trends in Cognitive Sciences 10(5), 233–238. Petkov, C. I., Kayser, C., Augath, M., & Logothetis, N. K. (2006). Functional imaging reveals numerous fields in the monkey auditory cortex. PLoS Biology 4(7), e215. Pickles, J. O. (2008). An introduction to the physiology of hearing (3rd ed.). Bingley: Emerald. Przysinda, E., Zeng, T., Maves, K., Arkin, C., & Loui, P. (2017). Jazz musicians reveal role of expectancy in human creativity. Brain and Cognition 119, 45–53. Putkinen, V., Tervaniemi, M., Saarikivi, K., de Vent, N., & Huotilainen, M. (2014). Investigating the effects of musical training on functional brain development with a novel melodic MMN paradigm. Neurobiology of Learning and Memory 110, 8–15. Rammsayer, T., & Altenmüller, E. (2006). Temporal information processing in musicians and nonmusicians. Music Perception 24(1), 37–48. Rohrmeier, M. (2011). Towards a generative syntax of tonal harmony. Journal of Mathematics and Music 5(1), 35–53. Rohrmeier, M., & Cross, I. (2008). Statistical properties of tonal harmony in Bach’s chorales. 
In Ken’ichi Miyazaki, Mayumi Adachi, Yuzuru Hiraga, Yoshitaka Nakajima, and Minoru Tsuzaki (Eds.), Proceedings of the 10th International Conference on Music Perception and Cognition. ICMPC (CD-ROM). Rohrmeier, M., & Rebuschat, P. (2012). Implicit learning and acquisition of music. Topics in Cognitive Science 4(4), 525–553. Rohrmeier, M., Zuidema, W., Wiggins, G. A., & Scharff, C. (2015). Principles of structure building in music, language and animal song. Philosophical Transactions of the Royal Society B: Biological Sciences 370(1664), 20140097. Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. Wiley Interdisciplinary Reviews: Cognitive Science 1(6), 906–914. Saffran, J. R. (2001). Words in a sea of sounds: The output of infant statistical learning. Cognition 81(2), 149–169. Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science 274(5294), 1926–1928. Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of distributional cues. Journal of Memory and Language 35(4), 606–621. Salzer, F. (1962). Structural hearing: Tonal coherence in music (Vol. 1). New York: Dover Publications. Schenker, H. (1956). Neue musikalische theorien und phantasien: Der freie satz (2nd ed.). Vienna: Universal Edition. Schön, D., & François, C. (2011). Musical expertise and statistical learning of musical and linguistic structures. Frontiers in Psychology 2, 167. Retrieved from https://doi.org/10.3389/fpsyg.2011.00167 Schönwiesner, M., Novitski, N., Pakarinen, S., Carlson, S., Tervaniemi, M., & Näätänen, R. (2007). Heschl’s gyrus, posterior superior temporal gyrus, and mid-ventrolateral prefrontal cortex have
different roles in the detection of acoustic changes. Journal of Neurophysiology 97(3), 2075–2082. Serafine, M. L., Glassman, N., & Overbeeke, C. (1989). The cognitive reality of hierarchic structure in music. Music Perception 6(4), 397–430. Sethares, W. A. (2005). The gamelan. In W. A. Sethares, Tuning, timbre, spectrum, scale (pp. 165– 187). Berlin: Springer. Sinex, D. G., Guzik, H., Li, H., & Henderson Sabes, J. (2003). Responses of auditory nerve fibers to harmonic and mistuned complex tones. Hearing Research 182(1–2), 130–139. Slevc, L. R., Rosenberg, J. C., & Patel, A. D. (2009). Making psycholinguistics musical: Self-paced reading time evidence for shared processing of linguistic and musical syntax. Psychonomic Bulletin & Review 16(2), 374–381. Snyder, J. S., & Elhilali, M. (2017). Recent advances in exploring the neural underpinnings of auditory scene perception. Annals of the New York Academy of Sciences 1396, 39–55. Song, J. H., Skoe, E., Wong, P. C. M., & Kraus, N. (2008). Plasticity in the adult human auditory brainstem following short-term linguistic training. Journal of Cognitive Neuroscience 20(10), 1892–1902. Steedman, M. J. (1984). A generative grammar for jazz chord sequences. Music Perception 2(1), 52– 77. Steinbeis, N., & Koelsch, S. (2008). Shared neural resources between music and language indicate semantic processing of musical tension-resolution patterns. Cerebral Cortex 18(5), 1169–1178. Strait, D. L., Kraus, N., Skoe, E., & Ashley, R. (2009). Musical experience and neural efficiency: Effects of training on subcortical processing of vocal expressions of emotion. European Journal of Neuroscience 29(3), 661–668. Sun, L., Liu, F., Zhou, L., & Jiang, C. (2018). Musical training modulates the early but not the late stage of rhythmic syntactic processing. Psychophysiology 55(2), e12983. Sun, Y., Lu, X., Ho, H. T., Johnson, B. W., Sammler, D., & Thompson, W. F. (2018). 
Syntactic processing in music and language: Parallel abnormalities observed in congenital amusia. NeuroImage: Clinical 19, 640–651. Sussman, E. S. (2007). A new view on the MMN and attention debate: The role of context in processing auditory events. Journal of Psychophysiology 21(3), 164–175. Tarr, B., Launay, J., & Dunbar, R. I. (2014). Music and social bonding: “Self–other” merging and neurohormonal mechanisms. Frontiers in Psychology 5, 1096. Retrieved from https://doi.org/10.3389/fpsyg.2014.01096 Teinonen, T., Fellman, V., Näätänen, R., Alku, P., & Huotilainen, M. (2009). Statistical language learning in neonates revealed by event-related brain potentials. BMC Neuroscience 10(1), 21. Teinonen, T., & Huotilainen, M. (2012). Implicit segmentation of a stream of syllables based on transitional probabilities: An MEG study. Journal of Psycholinguistic Research 41(1), 71–82. Terhardt, E. (1991). Music perception and sensory information acquisition: Relationships and low-level analogies. Music Perception: An Interdisciplinary Journal 8(3), 217–239. Tervaniemi, M. (2009). Musicians—same or different? Annals of the New York Academy of Sciences 1169, 151–156. Tervaniemi, M., Castaneda, A., Knoll, M., & Uther, M. (2006). Sound processing in amateur musicians and nonmusicians: Event-related potential and behavioral indices. Neuroreport 17(11), 1225–1228. Tervaniemi, M., Ilvonen, T., Karma, K., Alho, K., & Näätänen, R. (1997). The musical brain: Brain waves reveal the neurophysiological basis of musicality in human subjects. Neuroscience Letters 226(1), 1–4. Tervaniemi, M., Janhunen, L., Kruck, S., Putkinen, V., & Huotilainen, M. (2016). Auditory profiles of classical, jazz, and rock musicians: Genre-specific sensitivity to musical sound features.
Frontiers in Psychology 6, 1900. Retrieved from https://doi.org/10.3389/fpsyg.2015.01900 Tervaniemi, M., Rytkönen, M., Schröger, E., Ilmoniemi, R. J., & Näätänen, R. (2001). Superior formation of cortical memory traces for melodic patterns in musicians. Learning & Memory 8(5), 295–300. Thiessen, E. D., Kronstein, A. T., & Hufnagle, D. G. (2013). The extraction and integration framework: A two-process account of statistical learning. Psychological Bulletin 139(4), 792–814. Tillmann, B., Bharucha, J., & Bigand, E. (2000). Implicit learning of tonality: A self-organized approach. Psychological Review 107(4), 885–913. Tillmann, B., Janata, P., & Bharucha, J. J. (2003). Activation of the inferior frontal cortex in musical priming. Cognitive Brain Research 16(2), 145–161. Todd, N. P. M., & Cody, F. W. (2000). Vestibular responses to loud dance music: A physiological basis of the “rock and roll threshold”? Journal of the Acoustical Society of America 107(1), 496– 500. Todd, N. P. M., Paillard, A., Kluk, K., Whittle, E., & Colebatch, J. (2014). Vestibular receptors contribute to cortical auditory evoked potentials. Hearing Research 309, 63–74. Tramo, M. J., Shah, G. D., & Braida, L. D. (2002). Functional role of auditory cortex in frequency processing and pitch perception. Journal of Neurophysiology 87(1), 122–139. Villarreal, E. A. G., Brattico, E., Leino, S., Østergaard, L., & Vuust, P. (2011). Distinct neural responses to chord violations: A multiple source analysis study. Brain Research 1389, 103–114. Vuvan, D. T., Zendel, B. R., & Peretz, I. (2018). Random feedback makes listeners tone-deaf. Scientific Reports 8(1), 7283. Warren, J. D., Uppenkamp, S., Patterson, R. D., & Griffiths, T. D. (2003). Separating pitch chroma and pitch height in the human brain. Proceedings of the National Academy of Sciences 100(17), 10038–10042. Whitfield, I. (1980). Auditory cortex and the pitch of complex tones. Journal of the Acoustical Society of America 67(2), 644–647. Winkler, I. 
(2007). Interpreting the mismatch negativity. Journal of Psychophysiology 21(3–4), 147– 163. Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10(4), 420–422. Zatorre, R. J. (1988). Pitch perception of complex tones and human temporal-lobe function. Journal of the Acoustic Society of America 84, 566–572. Zatorre, R. J. (2001). Neural specializations for tonal processing. Annals of the New York Academy of Sciences 930, 193–210. Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: Music and speech. Trends in Cognitive Sciences 6(1), 37–46. Zendel, B. R., Lagrois, M.-É., Robitaille, N., & Peretz, I. (2015). Attending to pitch information inhibits processing of pitch information: The curious case of amusia. Journal of Neuroscience 35(9), 3815–3824. Zhang, J., Zhou, X., Chang, R., & Yang, Y. (2018). Effects of global and local contexts on chord processing: An ERP study. Neuropsychologia 109, 149–154. Zuijen, T. L. von, Sussman, E., Winkler, I., Näätänen, R., & Tervaniemi, M. (2004). Grouping of sequential sounds: An event-related potential study comparing musicians and nonmusicians. Journal of Cognitive Neuroscience 16(2), 331–338. Zuijen, T. L. von, Sussman, E., Winkler, I., Näätänen, R., & Tervaniemi, M. (2005). Auditory organization of sound sequences by a temporal or numerical regularity: A mismatch negativity study comparing musicians and non-musicians. Cognitive Brain Research 23(2–3), 270–276.
1 For similar results obtained from patients with (right) PAC lesions see Johnsrude, Penhune, and Zatorre (2000) and Zatorre (2001).
2 Note that a finite state automaton will only (mis)understand that “Peter kissed Mary”!
CHAPTER 10
MULTISENSORY PROCESSING IN MUSIC
FRANK RUSSO
INTRODUCTION
Definitions of music tend to be unimodal in nature, often including some version of the idea that music is organized sound with aesthetic intent. Even philosophical treatises that attempt to define music in broad terms tend to overlook multisensory aspects (Nattiez, 1990; Thomas, 1983). However, multisensory aspects abound. For instance, the facial expressions and body gestures of a performer may be perceived through the visual system and the mechanical vibrations produced by a musical instrument may be perceived through the somatosensory system. Sensorimotor networks may also give rise to cascade effects. For example, motor activity in response to a beat may give rise to micro-movements of the head and torso, which may in turn lead to vestibular stimulation. When the motor activity becomes entrained it may serve as its own channel of sensory input. As such, the perception of music is often multisensory, integrating inputs from auditory, visual, somatosensory, vestibular, and motor areas.
This chapter has three main sections. The first provides an overview of theory and evidence regarding multisensory processing. The second considers auditory-only processing with a focus on lateralization, basic modularity, and pathways. This sets the stage for the final section, which considers non-auditory and multisensory processing of pitch, timbre, and rhythm. In each subsection corresponding to a dimension of music, psychophysical evidence is presented before reviewing the extant neuroscientific evidence. Where no neuroscientific evidence exists, proposals have been made about the types of neural processing that may be involved.
MULTISENSORY PROCESSING
It has often been noted that speech is perceived by eye and by ear. This is normally characterized as an opportunity to minimize uncertainty as it allows the brain to capitalize on convergences. However, it can also represent a sensory-processing challenge in that information from across two channels must somehow be bound together into a common representation. This challenge may be even greater in music given the additional channels of sensory information that are routinely involved and the intentional use of uncertainty as a compositional device. Nevertheless, under most conditions, multisensory information in music is successfully integrated, yielding a coherent and stable multisensory percept.
Information from across the senses may be integrated in a manner that is cognitive or perceptual (Schutz, 2008). Cognitive integration takes place after information from two or more channels has been processed independently (see review and meta-analysis concerning audio-visual music by Platz & Kopiez, 2012). A classic musical example of this type of integration is the influence of performer attractiveness on judgments of performance quality (Wapnick, Mazza, & Darrow, 2000). In this example, information from one channel does not so much alter perception in another as influence how those perceptions are evaluated.
Another musical example that reflects cognitive multisensory integration concerns the “blue note” in live jazz and blues. Blue notes are often accompanied by a visual display that conveys negatively valenced emotion (e.g., wincing of the eyes, shaking, or rolling the head back). Thompson, Graham, and Russo (2005) sought to assess the effect of this practice by using twenty clips of a blues concert performed by B.B. King. Although all of the selected clips possessed some level of dissonance, half were performed with a relatively neutral facial expression. Two groups of participants were asked to provide judgments of dissonance. One group made judgments in an auditory-only condition and the other made judgments in an audio-visual condition. Results revealed that visual information influenced judgments of dissonance, such that the difference between dissonant and neutral performances was greater in the audio-visual condition. However, it would be erroneous to conclude that information from the visual and auditory channels had been integrated at the level of perceptual representation.
Integration at the perceptual level is said to take place when information from across the senses is integrated in a manner that is automatic and preattentive (Arieh & Marks, 2008; Spence, 2011). All of the multisensory examples considered in the rest of this chapter meet these simple criteria. However, the neural mechanisms allowing for perceptual integration are by no means uniform. To foreshadow, there are at least three main types of mechanisms that have been implicated. The mechanisms vary with respect to network size but all involve some form of direct or indirect communication between primary sensory areas of the brain (see Fig. 1).
FIGURE 1. Schematic diagram of brain circuitry underpinning three mechanisms of multisensory integration (STS = Superior Temporal Sulcus; IFG = Inferior Frontal Gyrus; S = Somatosensory Cortex; A = Auditory Cortex; V = Visual Cortex). Top panel diagrams first mechanism involving primary sensory areas only. Second panel diagrams second mechanism involving the first mechanism in addition to a known multisensory area, superior-temporal sulcus (STS). Bottom panel diagrams third mechanism that may be described as sensorimotor. It builds on the second mechanism adding feedback connections from a known motor planning area, inferior frontal gyrus (IFG). Subcortical contributions from the superior colliculus not diagrammed.
First, a basic form of multisensory integration occurs when unisensory input activates areas of primary sensory cortex that are not normally associated with that input. This phenomenon has been observed following sensory deprivation that is permanent (e.g., blindness) or temporary (e.g., blindfold), suggesting a role for rapid cortical plasticity (Merabet et al., 2008). Complementary evidence has been found using unisensory information. For example, auditory cortex may be activated by lip reading in the context of silent speech (Calvert et al., 1997) or silent tactile stimulation (Foxe et al., 2002). Although not strictly multisensory, these examples reveal the existence of lateral connections between primary sensory areas and suggest the potential for integration without the involvement of higher-order multisensory areas (Foxe & Schroeder, 2005). Second, evidence has been observed for a “superadditive” neural response to multisensory input that is greater than the neural response for equivalent unisensory inputs. Most of the evidence for superadditivity has been found using intercellular recordings in the superior colliculus using animal models (Stein & Meredith, 1993). However, using non-invasive imaging methods, evidence for superadditivity has also been found in the cerebral cortex. For instance, a superadditive response has been observed in superior temporal sulcus using audio-tactile and audio-visual stimuli (Beauchamp, Yasar, Frye, & Ro, 2008). This body of evidence suggests a mechanism for multisensory integration that relies on hierarchical processing involving the progressive convergence of pathways. Finally, evidence is emerging from electrophysiological, neuroimaging, and brain stimulation studies for the functional role of connectivity across broad expanses of sensorimotor cortex (Frith & Hasson, 2016; Keil, Müller, Ihssen, & Weisz, 2012; Luo, Liu, & Poeppel, 2010; Luo & Poeppel, 2007). 
Synchronized oscillations across multisensory and motor areas may serve to integrate and select task-relevant information from across the senses. Sensory input may feed forward leading to a predictive motor code that is informed by priors (empirically based expectations about movement patterns). In turn, this predictive code can feed back to multisensory areas allowing for comparison with incoming sensory input (Kilner, Friston, & Frith, 2007). This body of evidence emphasizes the inherent uncertainty that exists in sensory information and the important role that the motor system can have in disambiguating that uncertainty. This sensorimotor mechanism allows for context-sensitive multisensory integration that relies on feedforward and feedback connections (Senkowski, Schneider, Foxe, & Engel, 2008).
In addition to investigating the particular mechanisms underpinning multisensory integration, research has attempted to explain the extent to which the different senses will contribute to the perception of a multisensory stimulus. The likelihood of integrating information from across the senses is lawfully related to the extent to which information about a signal appears to overlap in space and time. In other words, when the audio and visual aspects of a signal are delayed in time or separated in space, the likelihood of integration is reduced. In addition, the law of inverse effectiveness states that multisensory integration is inversely proportional to the effectiveness of the strongest unisensory response (Meredith & Stein, 1986; Stein & Meredith, 1993). Hence, if an auditory input is robust enough on its own to facilitate some functional goal, it will be resistant to influence from non-auditory information. If the auditory input is weak due to a compromised sensory system, perceptual ambiguity, or masking from noise, then the likelihood of integrating information from other senses increases. Maximum-likelihood estimation (MLE) methods have been used to model psychophysical as well as neural findings (Alais & Burr, 2004; Ernst & Banks, 2002; Gu, Angelaki, & DeAngelis, 2008; Rohe & Noppeney, 2015).
Based on Bayesian probability theory, MLE models are essentially a weighted linear sum that combines signals from different senses (Angelaki, Gu, & DeAngelis, 2009; Ernst & Bülthoff, 2004). The weight assigned to each signal is determined by stimulus or perceiver characteristics that influence signal reliability. Like the inverse effectiveness rule, the critical assumption in this approach is that inherent uncertainty exists in sensory information.
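To make the weighted-sum idea concrete, the following is a minimal sketch of reliability-weighted cue combination, assuming independent Gaussian noise in each sensory estimate. The function name, the semitone values, and the noise levels are hypothetical and chosen for illustration only; they are not taken from the studies cited above.

```python
def mle_combine(estimates, sigmas):
    """Combine unisensory estimates of one quantity by reliability weighting.

    estimates: one estimate per sense (e.g., perceived interval size)
    sigmas:    standard deviation of the noise in each estimate;
               a smaller sigma means a more reliable signal
    Assumes independent Gaussian noise in each channel (an assumption
    of this sketch, not a claim about any particular study).
    """
    # Reliability is the inverse of the variance.
    reliabilities = [1.0 / (s ** 2) for s in sigmas]
    total = sum(reliabilities)
    # Each weight is that channel's share of the total reliability.
    weights = [r / total for r in reliabilities]
    # The combined estimate is the weighted linear sum.
    combined = sum(w * x for w, x in zip(weights, estimates))
    # The combined noise is never larger than the best single channel.
    combined_sigma = (1.0 / total) ** 0.5
    return combined, combined_sigma, weights


# Hypothetical example: an auditory estimate of 7 semitones (sigma = 1)
# and a visual estimate of 9 semitones (sigma = 2).
combined, combined_sigma, weights = mle_combine([7.0, 9.0], [1.0, 2.0])
print(weights, combined, combined_sigma)
```

With these illustrative numbers the more reliable auditory estimate receives the larger weight, and the combined estimate is less variable than either unisensory estimate, which is the central prediction of the MLE account.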
AUDITORY PROCESSING
Despite the extensive involvement of non-auditory areas in music processing, there is no mistaking that the auditory cortex is the central hub for processing music in the neurotypical brain. Rather than an undifferentiated whole, the auditory cortex is best understood as a collection of modules that work together as an “auditory network” enabling the processing of separate dimensions of music. The exposition of these modules is briefly reviewed here to allow for comparison with processing of the same dimensions as experienced by other senses. While more exhaustive reviews of auditory neuroscience may be found elsewhere in this volume, the focus of the brief review provided here sets the stage for subsequent discussion of evidence for non-auditory input activating auditory cortex.
The area known as the auditory core exists in both hemispheres including the superior temporal gyrus of the temporal lobe and extending into the lateral sulcus as well as the transverse temporal gyri that run toward the center of the brain. The latter is often referred to as Heschl’s gyrus, which is the first structure in the cortex that reflects the tonotopic map that originates in the cochlea. Some research has suggested the existence of separate caudal and rostral tonotopic maps with mirror-like orientations (Formisano et al., 2003). Additional tonotopic maps have been found in the belt area surrounding the core (Rauschecker, Tian, & Hauser, 1995; Rauschecker, Tian, Pons, & Mishkin, 1997). Beyond the belt area lies a tertiary area of auditory cortex known as the parabelt. The parabelt is thought to have functionally distinct subdivisions (Kaas & Hackett, 2000). The caudal subdivision abuts and is interconnected with the superior temporal sulcus. Together, this caudal subdivision of the parabelt and the superior temporal sulcus constitute the posterior hub of the auditory-motor pathway (more details on pathways below).
An early PET study by Zatorre & Belin (2001) indicated that in both hemispheres, temporal variation of auditory input engages the core, whereas spectral variation engages the belt. However, responses to temporal features (i.e., with relevance for rhythm) were clearly biased toward the left and responses to spectral features (i.e., with relevance for pitch and timbre) were clearly biased toward the right. This apparent pattern of hemispheric specialization has been further validated by the results of neuropsychological studies involving patients with cortical lesions. In general, patients with lesions in the right hemisphere have more impaired pitch processing than do those with lesions in the left hemisphere. For example, lesions in the right hemisphere lead to weaker pitch discrimination (Milner, 1962; Johnsrude, Penhune, & Zatorre, 2000), weaker perception of the missing fundamental (Zatorre, 1988), weaker sensitivity to pitch direction (Johnsrude, Penhune, & Zatorre, 2000), and weaker sensitivity to the global pitch contour (Peretz, 1990).
On the basis of neurophysiological and psychophysical data, Poeppel (2001) proposed a similar (but speech-specific) hemispheric specialization that focused on the window of temporal integration. He proposed that the left hemisphere had a short integration window (20–50 ms) that supports processing of formant transitions and that the right hemisphere had a long integration window (150–250 ms) that supports processing of intonation contours. This specialization may ultimately be rooted in differences in the volume of white-matter tissue across the two hemispheres. A post-mortem study by Anderson, Southern, and Powers (1999) found a higher volume of white-matter tissue in the belt area of the left hemisphere compared to the right due to greater thickness of the myelin sheathing.
More recent neuroimaging research has further validated this proposed explanation for hemispheric specialization. For example, Hyde, Peretz, & Zatorre (2008) found that activation in the right hemisphere increased parametrically as a function of the pitch distance between consecutive tones. In contrast, they observed only a coarse-grain differentiation in the left hemisphere. Auditory evoked potentials have also been used to elucidate hemispheric specialization. Neurons in the right hemisphere have been found to possess sharper frequency tuning than those in the left hemisphere (Liégeois-Chauvel, Giraud, Badier, Marquis, & Chauvel, 2012).
Pathways
Much like the “what” (ventral) and “where” (dorsal) visual pathways originally proposed to explain functional organization in the visual system (Goodale & Milner, 1992), there are two main auditory pathways leading out of auditory cortex and terminating in frontal areas (Zatorre, Chen, & Penhune, 2007). A ventral auditory pathway is thought to be involved primarily with category-based representations (e.g., phonemes). A dorsal “auditory-motor” pathway is thought to be specialized for sensorimotor translations of time-varying information that is not categorical. This pathway may be particularly important in the context of learning a new piece of music (Lahav, Saltzman, & Schlaug, 2007; Schalles & Pineda, 2015), perceiving emotion in music (McGarry, Pineda, & Russo, 2015; Thompson et al., 2005; Vines, Krumhansl, Wanderley, Dalca, & Levitin, 2011), and in the type of feedback monitoring required for performance, particularly in continuous pitch instruments like voice or violin (Loui, 2015; Zatorre et al., 2007). The auditory-motor pathway involves reciprocal connections between inferior frontal gyrus and posterior subdivisions of the superior temporal gyrus (auditory parabelt) and superior temporal sulcus (multisensory area).
MULTISENSORY PROCESSING OF PITCH
Visuomotor Influences
Numerous studies have demonstrated that the size of a sung melodic interval can be judged directly through the visual system. When videos of sung melodic intervals are presented to observers without audio, they are able to accurately scale them according to size (Thompson & Russo, 2007). This ability does not appear to require music or vocal training, which argues against a cognitive account based on long-term memory associations, and further suggests that some aspects of the visual information provide reliable cues for judging interval size. Video-based tracking has shown that larger intervals possess more head movement, eyebrow raising, and mouth opening. The influence of visual information on perception of size in sung melodic intervals persists even under point-light presentation conditions in which the dynamic information in the display is retained while eliminating static visual cues (Abel, Li, Russo, Schlaug, & Loui, 2016).
The visual channel continues to influence the perceived size of sung melodic intervals even when audio is present (Russo, Sandstrom, & Maksimowski, 2011; Thompson et al., 2005; Thompson, Russo, & Livingstone, 2010). The mouth area may be particularly important in judging the size of sung melodic intervals, as reducing the level of audibility in an audio-visual presentation (by increasing the level of background noise) causes observers to increase the proportion of gaze directed toward the mouth (Russo et al., 2011). However, the visual influence on auditory judgments has been found to be mitigated for participants with a young onset of musical training (Abel et al., 2016). One interpretation of this finding is that early-trained musicians possess a stronger audio-motor representation of sung melodic intervals. This enhancement in priors may allow them to focus on auditory input or rely less heavily on non-auditory input when presented with multisensory musical stimuli. This prioritization may be further reinforced through experience playing in groups where orthogonal streams of audio and visual information may co-exist.
But how can we be sure that vision influences melodic pitch processing at a perceptual (vs. cognitive) level? One behavioral means of assessing whether multisensory integration is perceptual is to utilize a dual-task paradigm. Thompson et al. (2010) presented participants with sung melodic intervals accompanied by facial expressions used to perform a small or large interval (two and nine semitones, respectively). Participants were asked to count the number of 0’s and 1’s that were superimposed over the singer’s face during performance of each interval. The conditions were blocked by digit speed (300 or 700 msec per digit) as well as task demand (single or dual task). Results revealed that the influence of the visual information on auditory judgments of sung melodic interval size was not moderated by cognitive load.
These findings suggest that the integration was automatic and pre-attentive.
The cortical underpinnings of this example of multisensory integration in music may originate in motion selective areas of the dorsal visual pathway, such as the medial temporal and the medial superior temporal areas. Both of these areas are adjacent to the posterior bank of the superior temporal sulcus, a known multisensory area that projects to premotor areas allowing for sensorimotor translations of dynamic sensory input (Kilner, 2011). There are also reciprocal connections from premotor to superior temporal sulcus allowing for a predictive coding model of action involving sensory representations (Kilner et al., 2007). This type of predictive coding may be particularly important in shaping auditory judgments of action on the basis of visual input alone or in situations where auditory input is ambiguous for some reason (e.g., an individual with severe hearing loss or an individual with normal hearing listening in low signal-to-noise conditions).
The mechanism proposed for visual perception and audio-visual integration of melodic pitch information involves feedforward and feedback connections along the dorsal stream. Feedforward connections provide multisensory input to motor planning areas. Feedback connections provide predictive coding of movement informed by priors that can be compared with incoming sensory information (Kilner et al., 2007; Maes, Leman, Palmer, & Wanderley, 2014). In the case of individuals with severe hearing loss, there may also be an additional contribution owed to visual activation of the auditory cortex in belt areas (Finney, 2001; Röder, Stock, Bien, Neville, & Rösler, 2002). Research in animal models suggests that belt areas undergo profound plastic changes following a period of auditory deprivation, which leads in some cases to enhanced visual processing. Lomber, Meredith, and Kral (2010) showed that deactivation of posterior belt areas selectively eliminates enhancements to visual localization, whereas deactivation of the dorsal belt areas eliminates enhancement of visual motion detection.
Transcranial magnetic stimulation (TMS) has been used as one means of investigating the assumed involvement of motor areas in processing sung melodic intervals (Royal, Lidji, Théoret, Russo, & Peretz, 2015). Nonmusicians were given brief training that enabled them to apply a label to intervals of different size (e.g., unison, octave).
Following training, facilitative TMS was applied over motor cortex, while participants observed a pitch interval label that was immediately followed by the audio-visual presentation of a sung interval. Participants were required to make a forced-choice judgment regarding whether the pitch interval label matched the pitch interval contained in the two-note vocal melody. Motor-evoked potentials recorded from the mouth muscles contralateral to the hemisphere receiving stimulation were found to increase relative to baseline for large pitch intervals and decrease for small pitch intervals, suggesting that some type of motor simulation was taking place.
Another line of evidence in support of motor involvement in perception of song may be found in EEG research investigating the sensorimotor (or mu) wave. The oscillatory generators of the sensorimotor wave can be found in the inferior frontal gyrus, and to a lesser extent in the inferior parietal lobe. The sensorimotor wave becomes desynchronized when an individual moves intentionally or when they observe others moving intentionally, and the extent of desynchronization is enhanced under multisensory presentation conditions (Kaplan & Iacoboni, 2007; McGarry, Russo, Schalles, & Pineda, 2012). These data have been interpreted as evidence of an internal simulation involving motor planning and proprioception. While some controversy exists regarding the putative mirror system responsible for the sensorimotor wave (Hickok, 2009), its responsiveness to observation of intentional action is less equivocal. A meta-analysis by Fox et al. (2016), involving eighty-five studies, found significant event-related desynchronization during observation of intentional action (Cohen’s d = 0.31, N = 1,508). With regard to music stimuli, evidence has been found for sensorimotor desynchronization in response to audio-only presentations of isolated sung notes (Lévêque & Schön, 2013) and audio-visual presentations of sung melodic intervals (McGarry et al., 2015). Although it seems likely, it remains to be determined whether sensorimotor desynchronization in response to song is greater in multisensory compared to unisensory presentation conditions.
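Desynchronization of the sensorimotor wave is commonly quantified as the percentage change in band power relative to a pre-stimulus baseline. The sketch below illustrates that calculation under the assumption that band power has already been estimated for each trial; the function name and the power values are hypothetical, and the sign convention (negative values indicating desynchronization) is only one of several in use.

```python
def percent_power_change(baseline_power, event_power):
    """Percentage change in band power from baseline to event period.

    baseline_power: per-trial band power before stimulus onset
    event_power:    per-trial band power during action observation
    Negative values indicate desynchronization (a drop in power)
    under the sign convention assumed in this sketch.
    """
    mean_baseline = sum(baseline_power) / len(baseline_power)
    mean_event = sum(event_power) / len(event_power)
    return 100.0 * (mean_event - mean_baseline) / mean_baseline


# Hypothetical power values in arbitrary units for three trials.
baseline = [10.0, 10.0, 10.0]
observation = [7.0, 8.0, 6.0]
print(percent_power_change(baseline, observation))
```

Comparing this index across audio-only, visual-only, and audio-visual presentations is one way the open question raised above could be addressed.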
Somatosensory Influences
Because all sound arises from a source of mechanical vibration, it should be no surprise that evidence exists for perception of pitch and other musical dimensions on the basis of vibrotactile input (i.e., mechanical vibration of the skin). Detection thresholds for vibrotactile stimuli show peak sensitivity around 250 Hz, and a sharp decline in sensitivity (i.e., larger thresholds) below 100 Hz (Hopkins, Maté-Cid, Fulford, Seiffert, & Ginsborg, 2016; Morioka & Griffin, 2005; Verrillo, 1992). Thresholds are also smaller in smooth (vs. hairy) skin due to increased mechanoreceptor density (Verrillo & Bolanowski, 1986), and with large (vs. small) contactor areas due to effects of spatial summation (Morioka & Griffin, 2005). Pitch discrimination thresholds obtained with vibrotactile stimuli tend to be about five times greater than those obtained with auditory stimuli (Branje, Maksimowski, Karam, Fels, & Russo, 2010; Verrillo, 1992). In addition to this relatively poor pitch discrimination ability, there is no convincing psychophysical evidence for vibrotactile pitch discriminations beyond about 1,000 Hz.
Single cell recording in macaques has revealed that low-frequency vibrotactile stimuli can activate belt areas of auditory cortex (Schroeder et al., 2001). Convergent evidence has been found in imaging studies involving adults with normal hearing. Low-frequency vibrotactile stimuli have been shown to activate auditory cortex bilaterally (Levänen, Jousmäki, & Hari, 1998), particularly in posterior belt areas (Schürmann, Caetano, Hlushchuk, Jousmäki, & Hari, 2006). The extent of auditory activations observed in deaf participants is more widespread than that observed in normal hearing participants (Auer, Bernstein, Sungkarat, & Singh, 2007), likely due to neuroplastic changes following sensory deprivation.
One question resulting from this work is whether activation of auditory areas by vibrotactile stimuli is direct or whether it is the result of projections from somatosensory areas. Using MEG, Caetano and Jousmäki (2006) were able to track the time course of vibrotactile activations. They presented normal hearing participants with 200 Hz vibrotactile stimuli delivered to the fingertips. An initial response was observed in somatosensory cortex, peaking around 60 ms, followed by transient responses in auditory and secondary somatosensory cortices between 100 and 200 ms. Finally, a sustained response was observed in auditory cortex between 200 and 700 ms.
Although these studies all present unisensory stimuli, taken together, these findings suggest a likely mechanism for audio-tactile integration that is hierarchical involving a progressive convergence of auditory and somatosensory pathways. One of the main areas of sensory convergence in the cortex appears to be the posterior subdivisions of the auditory parabelt and the superior temporal sulcus (see Fig. 2).
FIGURE 2. Schematic sagittal view of the human brain featuring modules and pathways that are involved in the multisensory perception of music.
MULTISENSORY PROCESSING OF TIMBRE
Visuomotor Influences
Saldaña and Rosenblum (1993) presented participants with audio-visual presentations of cello tones where bowing and plucking were crossed across the senses. So, for example, observers were presented with a multisensory stimulus in which the audio channel consisted of an unequivocal plucking sound and the visual channel presented an unequivocal bowing movement. Much like the “McGurk effect” upon which this study is based (McGurk & Macdonald, 1976), auditory judgments were influenced by visual information. For instance, plucking sounds were more likely to be heard as bowing when accompanied by bowing visual movement. The authors interpreted their results with regard to an automatic internal motor simulation that is driven by auditory and visual information. Much like the explanation for sung melodic pitch, an internal motor simulation may have provided a predictive coding model of action involving sensory representations. The output of the predictive coding model may have been integrated with direct auditory input at the level of the superior temporal sulcus. Consistent with this interpretation, fMRI work involving multisensory speech has consistently implicated the superior temporal sulcus and superior temporal gyrus (Callan et al., 2003, 2004; Jones & Callan, 2003). Similar evidence has been found with multisensory tool use, and the extent of activation in the superior temporal sulcus appears to adhere to the law of inverse effectiveness (Stevenson & James, 2009).
Somatosensory Influences
Several studies have investigated the ability to discriminate timbre using vibrotactile stimuli. Russo, Ammirante, and Fels (2012) found that deaf and hearing observers were able to accurately distinguish instrument timbres on the basis of vibrotactile information alone. Deaf and hearing participants were also able to distinguish timbre on the basis of synthetic tones that differed only with regard to spectral envelope (dull vs. bright). This ability persisted even though numerous controls were put in place to ensure that participants received no trace of residual auditory input. Russo et al. (2012) proposed that vibrotactile discrimination involves the cortical integration of spectral information filtered through frequency-tuned mechanoreceptors. There are four known channels that respond to touch (Bolanowski, Gescheider, Verrillo, & Checkosky, 1988), and each is sensitive to a unique range of the frequency spectrum. This allows the mechanoreceptors to collectively code for spectral shape in the same way that has been proposed for critical bands in the auditory system (Makous, Friedman, & Vierck, 1995). It would only take two such channels to allow for the coding of spectral tilt. A follow-up study revealed that deaf participants are able to discriminate sung vowels and that extent of difference in spectral tilt between pairs strongly predicted their discriminability (Ammirante, Russo, Good, & Fels, 2013).
In addition to the influence of vibrotactile stimulation on passive reception of timbre, it seems likely that such stimulation provides performers with valuable timbre information during active performance (Marshall & Wanderley, 2011). As an example, the string vibrations of a piano are detectable at the level of the key press. Vibration detection thresholds are reduced under natural playing conditions involving active touch (Papetti, Järveläinen, Giordano, Schiesser, & Frohlich, 2017) and the co-occurrence of sound at the same frequency (Ro, Hsu, Yasar, Caitlin Elmore, & Beauchamp, 2009). Perhaps not surprisingly, the perception of sound quality as evaluated by the performer has been shown to be influenced by vibration that is felt through the keys (Fontana, Papetti, Järveläinen, & Avanzini, 2017).
To date, there have been no neural studies investigating auditory-tactile perception of timbre. However, it seems likely that this ability would depend on direct projections from somatosensory cortex to posterior belt areas of auditory cortex (see top panel of Fig. 1). These direct projections are likely to be right lateralized because of thinner myelin sheathing in the right auditory cortex (Anderson et al., 1999), which may better support communication across frequency channels, thus enabling spectral analysis.
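The claim that two frequency-tuned channels suffice to code spectral tilt can be illustrated with a toy calculation. The sketch below is not a model of any real mechanoreceptor channel: the Gaussian tuning curves, the center frequencies, the bandwidth, and the two synthetic tones are all hypothetical values chosen for illustration.

```python
import math


def channel_response(partials, center_hz, bandwidth_octaves=1.0):
    """Energy passed by one hypothetical frequency-tuned channel.

    partials: list of (frequency_hz, amplitude) pairs describing a tone
    The channel is modeled as a Gaussian tuning curve on a log-frequency
    axis; this shape is an assumption of the sketch.
    """
    total = 0.0
    for freq_hz, amplitude in partials:
        distance = math.log2(freq_hz / center_hz) / bandwidth_octaves
        gain = math.exp(-0.5 * distance ** 2)
        total += (gain * amplitude) ** 2
    return total


def two_channel_tilt_db(partials, low_center_hz=60.0, high_center_hz=250.0):
    """Spectral tilt as the high-to-low channel energy ratio in decibels."""
    low = channel_response(partials, low_center_hz)
    high = channel_response(partials, high_center_hz)
    return 10.0 * math.log10(high / low)


# Two synthetic tones with the same partials but different spectral envelopes.
freqs = [62.5, 125.0, 250.0, 500.0]
dull = [(f, 1.0 / (i + 1)) for i, f in enumerate(freqs)]  # energy falls with frequency
bright = [(f, 1.0) for f in freqs]                        # flat envelope

print(two_channel_tilt_db(dull), two_channel_tilt_db(bright))
```

In this toy case the bright tone yields the larger high-to-low ratio, so a comparison of just two channel outputs separates the two envelopes even though the tones share the same partial frequencies.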
Multisensory Perception of Rhythm
Visuomotor Influences
Rhythm involves the metrical patterning and grouping of tones that is shaped by intensity and duration. Visual influences have been found to affect the ability to track rhythm as well as the low-level dimensions that contribute to rhythm (e.g., loudness and duration). Because percussionists do not have the ability to independently control the intensity and duration of the notes that they produce, the use of gestures may be particularly important in shaping these dimensions (Schutz, 2008). Rosenblum and Fowler (1991) recorded handclaps of varying intensity. They presented participants with audio-visual pairings of the handclaps that were either congruent or incongruent. Although participants were asked to base loudness judgments only on what they heard, the visual information presented had a systematic influence on loudness judgments.
Schutz and colleagues have shown that expressive gestures are also able to influence the duration of a performed note. Their initial study utilized recordings of notes performed on a marimba with “long” and “short” gestures (Schutz & Lipscomb, 2007). Audio and visual channels were recombined to form congruent and incongruent audio-visual pairings. These pairings were presented to listeners and they were asked to make duration estimations on the basis of sound alone. Although the auditory content of the recordings had no effect on estimations of duration, the visual presentation influenced perceived duration such that long gestures lengthened notes and short gestures shortened notes. This effect persisted even when visual content was substituted with a point-light display, suggesting that the effect was based on the dynamics of visual movement (Schutz & Kubovy, 2009). The ability to synchronize to metrical structures created by discrete visual flashes has been found to be inferior to synchronization with discrete auditory tones that have the same temporal characteristics (Patel, Iversen, Chen, & Repp, 2005). However, the auditory advantage is almost eliminated if visual rhythms are presented using continuous stimuli such as a bouncing ball (Grahn, 2012; Hove, Fairhurst, Kotz, & Keller, 2013; Iversen, Patel, Nicodemus, & Emmorey, 2015). Imaging results have shown that activation in the putamen, a key timing area involved in motor planning and beat perception (Grahn & Brett, 2007), parallels results obtained with sensorimotor synchronization tasks. In particular, continuous visual stimuli led to greater activation of the putamen than did visual flashes, approaching activation levels obtained with auditory beeps. This finding suggests that the ability to synchronize to metrical structure is not simply contingent on the channel of sensory input but also on the nature of stimulus presentation (Grahn, 2012; Hove et al., 2013; Ross, Iversen, & Balasubramaniam, 2016). 
While discrete events are optimal with auditory stimuli, continuous events lead to better outcomes with visual stimuli. Some evidence suggests that deaf individuals possess some advantage in tracking visual rhythms (Iversen et al., 2015). This advantage may be due to neuroplastic changes resulting from sensory deprivation and life-long experience with signing (Bavelier et al., 2000, 2001). Referring back to Fig. 1, the strength of direct visual input to auditory-motor pathways is likely enhanced in deaf individuals.
Many studies have used EEG to assess neural entrainment to the beat. When the frequency of the beat is within the range of human movement (e.g., 1 to 4 Hz), large swathes of cortex entrain to that frequency. These neural oscillations will persist even after a rhythmic stimulus has been temporarily paused. Depending on when the rhythmic stimulus is resumed, the entrained neural oscillations will either increase or decrease in power (Simon & Wallace, 2017). Power decreases when the rhythmic stimulus anticipates the beat (too early) and it increases when the rhythmic stimulus is resumed on the beat (on time). However, if the beat is resumed as an audio-visual event, there is no modulation of power in the entrained neural oscillations. These findings reveal that multisensory inputs are not equivalent to auditory inputs with respect to entrainment. One interpretation is that multisensory input is “highly reliable or salient” and that resources should be allocated to processing it independently from the oscillations manifesting from the original auditory-only beat. This pattern of neural findings may also help to explain results from sensorimotor synchronization studies revealing superior synchronization using multisensory rhythms compared with auditory-only rhythms (Elliott, Wing, & Welchman, 2010; Varlet, Marin, Issartel, Schmidt, & Bardy, 2012).
Although visual influences on the perception of rhythm can be powerful, it is important to acknowledge that many listeners will choose to listen with their eyes closed under challenging conditions. One interpretation of this phenomenon is that the visual information is somehow distracting. In a task involving temporal order judgments of varying complexity, researchers found progressively greater deactivation of visual cortical areas as temporal asynchronies approached discrimination thresholds (Hairston et al., 2008).
This finding is perhaps best understood from the perspective of the inverse effectiveness rule (Stein & Meredith, 1993), whereby deactivation of the visual cortex protects against integration of potentially aberrant timing information from the visual system in a task that is well handled by audition.
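The notion of neural entrainment to the beat discussed earlier in this section is often quantified as the strength of a recorded signal at the beat frequency. The following is a minimal sketch of that idea using a single-frequency Fourier estimate on a synthetic signal. The sampling rate, the 2 Hz beat, and the 7.3 Hz background component are assumptions chosen for illustration; this is not the analysis pipeline of Simon and Wallace (2017) or of any other study cited here.

```python
import cmath
import math

def amplitude_at(signal, sample_rate_hz, freq_hz):
    """Estimate the amplitude of the sinusoidal component at freq_hz
    by projecting the signal onto a complex exponential."""
    n = len(signal)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate_hz)
        for i, x in enumerate(signal)
    )
    return 2 * abs(acc) / n

# Synthetic "recording": a 2 Hz component (within the 1 to 4 Hz range
# mentioned in the text) plus an unrelated 7.3 Hz component.
SAMPLE_RATE_HZ = 100.0
N_SAMPLES = 1000  # 10 s, so 2.0, 3.0 and 7.3 Hz all complete whole cycles
synthetic = [
    math.sin(2 * math.pi * 2.0 * i / SAMPLE_RATE_HZ)
    + 0.5 * math.sin(2 * math.pi * 7.3 * i / SAMPLE_RATE_HZ)
    for i in range(N_SAMPLES)
]

beat_amplitude = amplitude_at(synthetic, SAMPLE_RATE_HZ, 2.0)      # close to 1.0
off_beat_amplitude = amplitude_at(synthetic, SAMPLE_RATE_HZ, 3.0)  # close to 0.0
```

Comparing such an estimate before and after a pause in the stimulus is one way the increases and decreases in power described above could be expressed numerically.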
Somatosensory Influences
Some evidence exists for the somatosensory system contributing to the perception of rhythm. Tranchant et al. (2017) asked deaf and hearing participants to synchronize movements to a vibrotactile beat delivered through a vibrating platform. Hearing participants were also asked to synchronize movements to the same beat delivered through audition and without vibrotactile stimulation. Results revealed that most participants were able to synchronize to the vibrotactile beat with no differences between groups. However, for hearing participants, synchronization performance was better in the auditory condition than in the vibrotactile condition.
Other studies have demonstrated that sensorimotor synchronization to a beat is possible using vibrotactile stimulation applied to the fingertip (Brochard, Touzalin, Després, & Dufour, 2008; Elliott et al., 2010), toe (Müller et al., 2008), or to the back (Ammirante, Patel, & Russo, 2016). Findings have revealed that synchronization to a simple (metronomic) vibrotactile beat can be as accurate as synchronization to an auditory beat but only under certain conditions. For example, Müller et al. (2008) found equivalence on the fingertip but not the toe and Ammirante et al. (2016) found equivalence on the back, but only when a large portion of the back was stimulated. Presumably, spatial summation (involving integration of information across receptors) improved the somatosensory response to rhythmic information (Gescheider, Bolanowski, Pope, & Verrillo, 2002).
Ammirante et al. (2016) also included an audio-tactile condition to investigate multisensory integration. Results indicated that sensorimotor synchronization in the auditory condition was consistently equivalent to that in the auditory-tactile condition, regardless of contactor size. These results may be interpreted with respect to the maximum likelihood estimation model (Ernst & Banks, 2002), where auditory information represents a highly reliable cue that is resistant to integration with information from a somewhat less reliable channel of sensory input (vibrotactile). The results of Ammirante et al. (2016) may also be considered with respect to sensorimotor models of perception (Fig. 1, Panel 3).
The Action Simulation for Auditory Prediction (ASAP) model suggests that our ability to find the beat in rhythm is based on an internal simulation of periodic motor activity (Patel & Iversen, 2014). A secondary hypothesis posited in the model is that beat perception evolved from mechanisms required for verbal communication, as both involve periodic timing and the integration of motor and auditory information. This hypothesis is supported in part by the observation that beat synchronization exists robustly in vocal-learning species that are only distantly related to humans (e.g., parrots and elephants) and not at all in non-human primates (Merchant, Grahn, Trainor, Rohrmeier, & Fitch, 2015). As vocal communication is primarily based in the auditory modality, it follows that cognitive and neurological timing mechanisms would show a preference for auditory stimuli. Again, this prediction is confirmed by evidence demonstrating that sensorimotor synchronization to auditory stimuli tends to be superior to sensorimotor synchronization to visual or vibrotactile stimuli.
Current research in my lab led by Sean Gilmore is using EEG and source analysis to investigate the extent to which neural entrainment to the beat is possible under audio-only, vibrotactile-only, and audio-vibrotactile stimuli. On the basis of the behavioral results of Ammirante et al. (2016), we expect to find that neural entrainment in motor planning areas will be weakest for vibrotactile stimuli and that no differences will exist between audio and audio-tactile conditions.
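The maximum likelihood estimation account mentioned above (Ernst & Banks, 2002) can be made concrete with a short worked example. In that model each cue is weighted by its reliability (the inverse of its variance), and the variance of the combined estimate is the inverse of the summed reliabilities. The sketch below applies this rule to two hypothetical estimates of beat onset time; the millisecond values and variances are invented for illustration and are not data from Ammirante et al. (2016).

```python
def mle_combine(estimates, variances):
    """Reliability-weighted cue combination.

    Each cue's weight is its inverse variance divided by the sum of
    inverse variances; the combined variance is 1 / (sum of inverse
    variances)."""
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, 1.0 / total, weights

# Hypothetical cues: a precise auditory estimate (SD 10 ms) and a
# less precise vibrotactile estimate (SD 30 ms) of the same beat onset.
auditory_ms, auditory_var = 0.0, 10.0 ** 2
tactile_ms, tactile_var = 20.0, 30.0 ** 2

combined_ms, combined_var, weights = mle_combine(
    [auditory_ms, tactile_ms], [auditory_var, tactile_var]
)
# weights are approximately [0.9, 0.1]; combined_ms is approximately 2.0;
# combined_var is approximately 90, only slightly below the auditory 100.
```

With these invented numbers the auditory cue carries about 90% of the weight and adding the tactile cue barely lowers the variance, which is one way to read the finding that auditory and auditory-tactile synchronization were equivalent.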
Movement-Based Influences
Both passive and active head movements are capable of stimulating the vestibular system (Cullen & Roy, 2004). Given that people actively move their heads while listening to music, it would seem that vestibular stimulation is commonplace in music listening. Moreover, given that vestibular cortex is extensively connected with other sensory systems, it stands to reason that there are ample opportunities for multisensory integration in music that involve the vestibular system.
Phillips-Silver and Trainor (2005) assessed the contribution of the vestibular system to multisensory rhythm using an ambiguous auditory rhythm. Such a rhythm can be encoded in duple form (a march) or in triple form (a waltz). The rhythms were presented to infants while they were bounced on every second or every third beat. On the basis of a head-turn preference procedure, researchers were able to conclude that when infants were bounced on every second beat, they were coding the ambiguous rhythm in duple form, and when they were bounced on every third beat they coded the rhythm in triple form. A follow-up experiment in the same study showed that blindfolding infants mitigated but did not eliminate the effect, which confirms that this example of multisensory integration in rhythm does not depend on visual perception.
Two other studies by Trainor and colleagues have confirmed that these effects of auditory-vestibular integration in music persist into adulthood. In one study, adults were trained to bounce in duple or triple time while listening to an ambiguous rhythm. A subsequent listening test showed that adults identified an auditory version of the rhythm pattern with accented beats that matched their bouncing experience as more similar than a version whose accents did not match (Phillips-Silver & Trainor, 2007). Because this study involved self-motion, it was not able to separate out the contributions of vestibular and proprioceptive cues. However, a follow-up study involving direct galvanic stimulation of the vestibular system was able to provide evidence that auditory and vestibular information are integrated in rhythm perception in adults even in the absence of movement. In single cell recordings involving animal models, the posterior parietal cortex appears to be a likely locus of multisensory integration involving vestibular input (Bremmer, Schlack, Duhamel, Graf, & Fink, 2001). This area happens to be proximal to other cortical areas that have been implicated as contributing to multisensory processing (i.e., posterior superior temporal gyrus, auditory parabelt, and medial temporal areas).
Other researchers have considered the consequences of multisensory integration resulting from moving to the beat. Manning and Schutz (2013) had participants move or simply listen to an isochronous beat. A final tone was presented following a brief pause and participants were asked whether it was consistent with the timing of the preceding sequence. Accuracy in this timing task was superior in the movement condition.
In a follow-up study, it was found that the accuracy gains in this timing task are greater in percussionists than in non-percussionists, suggesting a role for experience with moving to the beat (Manning & Schutz, 2016). It seems likely that the multisensory timing cues resulting from moving to the beat would lead to stronger neural entrainment to the beat. Indeed, EEG research involving an ambiguous rhythm has shown that entrainment is stronger after participants have been trained to move to the rhythm in a way that suggests a binary or ternary form (Chemin, Mouraux, & Nozaradan, 2014). In addition, the entrainment gains were detectable at frequencies related to the meter of movement.
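The duple and triple interpretations of an ambiguous rhythm described in this section can be written out explicitly. The sketch below is a simple illustration, assuming a six-beat cycle and a 3 Hz beat rate (both hypothetical values, not the stimuli of the studies cited): it marks which beats receive a movement accent under each interpretation and computes the frequency at which those accents recur, which is the kind of meter-related frequency at which entrainment gains might be sought.

```python
def accent_pattern(n_beats, group_size):
    """Mark accented beats (1) when movement falls on every
    group_size-th beat, starting from the first beat."""
    return [1 if i % group_size == 0 else 0 for i in range(n_beats)]

def meter_frequency_hz(beat_hz, group_size):
    """Rate at which accented beats recur for a given grouping."""
    return beat_hz / group_size

BEAT_HZ = 3.0  # assumed beat rate, within the 1 to 4 Hz movement range

duple = accent_pattern(6, 2)    # [1, 0, 1, 0, 1, 0], march-like
triple = accent_pattern(6, 3)   # [1, 0, 0, 1, 0, 0], waltz-like

duple_hz = meter_frequency_hz(BEAT_HZ, 2)   # 1.5
triple_hz = meter_frequency_hz(BEAT_HZ, 3)  # 1.0
```

The same six unaccented beats are consistent with either pattern, so it is the movement (or bouncing) that selects between the two groupings.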
Summary and Conclusion
This chapter has provided theory and evidence regarding multisensory processing in music. Three mechanisms were proposed and a broad range of evidence was reviewed. Fig. 3 provides a schematic depiction of this review focusing on brain areas and connections that underpin multimodal processing of pitch, timbre, and rhythm. Solid lines are used to indicate connections that have been validated using multiple lines of evidence. Dashed lines are used to indicate connections that are more theoretical with only limited validation. Regardless of the evidential status, the proposed connection strength is reflected by line thickness. Due to space considerations, this review has been necessarily selective in topics considered. A more exhaustive consideration of the subject could have broadened the focus to include multisensory perception of lyrics (Quinto, Thompson, Russo, & Trehub, 2010), expressivity (Vuoskoski, Thompson, Clarke, & Spence, 2014), and emotion (Thompson, Russo, & Quinto, 2008; Vines et al., 2011), as well as examples of multisensory integration that are better understood from an associative or cognitive perspective (e.g., North, 2012; North, Hargreaves, & McKendrick, 1999; Wapnick et al., 2000). Nonetheless, this chapter has attempted to make the case that our conceptualization of music should be multisensory. Although the majority of individuals will justifiably focus on sound as the core of music processing, a more inclusive and nuanced consideration of music takes a multisensory perspective, involving the integration of inputs from auditory, visual, somatosensory, vestibular, and motor areas.
FIGURE 3. Schematic representation of cortical connections supporting multisensory perception of music.
Acknowledgments
Funding supporting this research was provided by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC). I would like to thank Fran Copelli for assistance with figures and discussion of concepts. Sean Gilmore and Michael Schutz provided valuable feedback on earlier drafts of this chapter.
References
Abel, M. K., Li, H. C., Russo, F. A., Schlaug, G., & Loui, P. (2016). Audiovisual interval size estimation is associated with early musical training. PLoS ONE 11(10), 1–12.
Alais, D., & Burr, D. (2004). Ventriloquist effect results from near-optimal bimodal integration. Current Biology 14(3), 257–262.
Ammirante, P., Patel, A. D., & Russo, F. A. (2016). Synchronizing to auditory and tactile metronomes: A test of the auditory-motor enhancement hypothesis. Psychonomic Bulletin & Review 23(6), 1882–1890.
Ammirante, P., Russo, F. A., Good, A., & Fels, D. I. (2013). Feeling voices. PLoS ONE 8(1), 1–5.
Anderson, B., Southern, B. D., & Powers, R. E. (1999). Anatomic asymmetries of the posterior superior temporal lobes: A postmortem study. Neuropsychiatry, Neuropsychology, and Behavioral Neurology 12(4), 247–254.
Angelaki, D. E., Gu, Y., & DeAngelis, G. C. (2009). Multisensory integration: Psychophysics, neurophysiology, and computation. Current Opinion in Neurobiology 19(4), 452–458.
Arieh, Y., & Marks, L. E. (2008). Cross-modal interaction between vision and hearing: A speed–accuracy analysis. Perception & Psychophysics 70(3), 412–421.
Auer, E. T., Bernstein, L. E., Sungkarat, W., & Singh, M. (2007). Vibrotactile activation of the auditory cortices in deaf versus hearing adults. Neuroreport 18(7), 645–648.
Bavelier, D., Brozinsky, C., Tomann, A., Mitchell, T., Neville, H., & Liu, G. (2001). Impact of early deafness and early exposure to sign language on the cerebral organization for motion processing. Journal of Neuroscience 21(22), 8931–8942.
Bavelier, D., Tomann, A., Hutton, C., Mitchell, T., Corina, D., Liu, G., & Neville, H. (2000). Visual attention to the periphery is enhanced in congenitally deaf individuals. Journal of Neuroscience 20(17), RC93.
Beauchamp, M. S., Yasar, N. E., Frye, R. E., & Ro, T. (2008). Touch, sound and vision in human superior temporal sulcus. NeuroImage 41(3), 1011–1020.
Bolanowski, S. J., Gescheider, G. A., Verrillo, R. T., & Checkosky, C. M. (1988). Four channels mediate the mechanical aspects of touch. Journal of the Acoustical Society of America 84(5), 1680–1694.
Branje, C., Maksimowski, M., Karam, M., Fels, D. I., & Russo, F. A. (2010). Vibrotactile display of music on the human back. Proceedings of the 3rd International Conference on Advances in Computer–Human Interactions, ACHI 2010 (pp. 154–159). Retrieved from https://doi.org/10.1109/ACHI.2010.40
Bremmer, F., Schlack, A., Duhamel, J. R., Graf, W., & Fink, G. R. (2001). Space coding in primate posterior parietal cortex. NeuroImage 14(1), S46–S51.
Brochard, R., Touzalin, P., Després, O., & Dufour, A. (2008). Evidence of beat perception via purely tactile stimulation. Brain Research 1223, 59–64.
Caetano, G., & Jousmäki, V. (2006). Evidence of vibrotactile input to human auditory cortex. NeuroImage 29(1), 15–28.
Callan, D. E., Jones, J. A., Munhall, K., Callan, A. M., Kroos, C., & Vatikiotis-Bateson, E. (2003). Neural processes underlying perceptual enhancement by visual speech gestures. Neuroreport 14(17), 2213–2218.
Callan, D. E., Jones, J. A., Munhall, K., Kroos, C., Callan, A. M., & Vatikiotis-Bateson, E. (2004). Multisensory integration sites identified by perception of spatial wavelet filtered visual speech gesture information. Journal of Cognitive Neuroscience 16(5), 805–816.
Calvert, G. A., Bullmore, E. T., Brammer, M. J., Campbell, R., Williams, S. C. R., McGuire, P. K., … David, A. S. (1997). Activation of auditory cortex during silent lipreading. Science 276(5312), 593–596.
Chemin, B., Mouraux, A., & Nozaradan, S. (2014). Body movement selectively shapes the neural representation of musical rhythms. Psychological Science 25(12), 2147–2159.
Cullen, K. E., & Roy, J. E. (2004). Signal processing in the vestibular system during active versus passive head movements. Journal of Neurophysiology 91(5), 1919–1933.
Elliott, M. T., Wing, A. M., & Welchman, A. E. (2010). Multisensory cues improve sensorimotor synchronisation. European Journal of Neuroscience 31(10), 1828–1835.
Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415(6870), 429–433.
Ernst, M. O., & Bülthoff, H. H. (2004). Merging the senses into a robust percept. Trends in Cognitive Sciences 8(4), 162–169.
Finney, E. M., Fine, I., & Dobkins, K. R. (2001). Visual stimuli activate auditory cortex in the deaf. Nature Neuroscience 4(12), 1171–1173.
Fontana, F., Papetti, S., Järveläinen, H., & Avanzini, F. (2017). Detection of keyboard vibrations and effects on perceived piano quality. Journal of the Acoustical Society of America 142(5), 2953–2967.
Formisano, E., Kim, D. S., Di Salle, F., Van De Moortele, P. F., Ugurbil, K., & Goebel, R. (2003). Mirror-symmetric tonotopic maps in human primary auditory cortex. Neuron 40(4), 859–869.
Fox, N. A., Yoo, K. H., Bowman, L. C., Cannon, E. N., Ferrari, P. F., Bakermans-Kranenburg, M. J., … Van IJzendoorn, M. H. (2016). Assessing human mirror activity with EEG mu rhythm: A meta-analysis. Psychological Bulletin 142(3), 291–313.
Foxe, J. J., & Schroeder, C. E. (2005). The case for feedforward multisensory convergence during early cortical processing. Neuroreport 16(5), 419–423.
Foxe, J. J., Wylie, G. R., Martinez, A., Schroeder, C. E., Javitt, D. C., Guilfoyle, D., … Murray, M. M. (2002). Auditory-somatosensory multisensory processing in auditory association cortex: An fMRI study. Journal of Neurophysiology 88(1), 540–543.
Frith, C. D., & Hasson, U. (2016). Mirroring and beyond: Coupled dynamics as a generalized framework for modelling social interactions. Philosophical Transactions of the Royal Society B: Biological Sciences 371(1693), 20150366. Retrieved from https://doi.org/10.1098/rstb.2015.0366
Gescheider, G. A., Bolanowski, S. J., Pope, J. V., & Verrillo, R. T. (2002). A four-channel analysis of the tactile sensitivity of the fingertip: Frequency selectivity, spatial summation, and temporal summation. Somatosensory and Motor Research 19(2), 114–124.
Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences 15(1), 20–25.
Grahn, J. A. (2012). See what I hear? Beat perception in auditory and visual rhythms. Experimental Brain Research 220(1), 51–61.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience 19(5), 893–906.
Gu, Y., Angelaki, D. E., & DeAngelis, G. C. (2008). Neural correlates of multisensory cue integration in macaque MSTd. Nature Neuroscience 11(10), 1201–1210.
Hairston, W. D., Hodges, D. A., Casanova, R., Hayasaka, S., Kraft, R., Maldjian, J. A., & Burdette, J. H. (2008). Closing the mind’s eye: Deactivation of visual cortex related to auditory task difficulty. Neuroreport 19(2), 151–154.
Hickok, G. (2009). Eight problems for the mirror neuron theory of action understanding in monkeys and humans. Journal of Cognitive Neuroscience 21(7), 1229–1243.
Hopkins, C., Maté-Cid, S., Fulford, R., Seiffert, G., & Ginsborg, J. (2016). Vibrotactile presentation of musical notes to the glabrous skin for adults with normal hearing or a hearing impairment: Thresholds, dynamic range and high-frequency perception. PLoS ONE 11(5), e0155807. Retrieved from https://doi.org/10.1371/journal.pone.0155807
Hove, M. J., Fairhurst, M. T., Kotz, S. A., & Keller, P. E. (2013). Synchronizing with auditory and visual rhythms: An fMRI assessment of modality differences and modality appropriateness. NeuroImage 67, 313–321.
Hyde, K. L., Peretz, I., & Zatorre, R. J. (2008). Evidence for the role of the right auditory cortex in fine pitch resolution. Neuropsychologia 46(2), 632–639.
Iversen, J. R., Patel, A. D., Nicodemus, B., & Emmorey, K. (2015). Synchronization to auditory and visual rhythms in hearing and deaf individuals. Cognition 134, 232–244.
Johnsrude, I. S., Penhune, V. B., & Zatorre, R. J. (2000). Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain 123(1), 155–163.
Jones, J. A., & Callan, D. E. (2003). Brain activity during audiovisual speech perception: An fMRI study of the McGurk effect. Neuroreport 14(8), 1129–1133.
Kaas, J. H., & Hackett, T. A. (2000). Subdivisions of auditory cortex and processing streams in primates. Proceedings of the National Academy of Sciences 97(22), 11793–11799.
Kaplan, J. T., & Iacoboni, M. (2007). Multimodal action representation in human left ventral premotor cortex. Cognitive Processing 8(2), 103–113.
Keil, J., Müller, N., Ihssen, N., & Weisz, N. (2012). On the variability of the McGurk effect: Audiovisual integration depends on prestimulus brain states. Cerebral Cortex 22(1), 221–231.
Kilner, J. M. (2011). More than one pathway to action understanding. Trends in Cognitive Sciences 15(8), 352–357.
Kilner, J. M., Friston, K. J., & Frith, C. D. (2007). Predictive coding: An account of the mirror neuron system. Cognitive Processing 8(3), 159–166.
Lahav, A., Saltzman, E., & Schlaug, G. (2007). Action representation of sound: Audiomotor recognition network while listening to newly acquired actions. Journal of Neuroscience 27(2), 308–314.
Levänen, S., Jousmäki, V., & Hari, R. (1998). Vibration-induced auditory-cortex activation in a congenitally deaf adult. Current Biology 8(15), 869–872.
Lévêque, Y., & Schön, D. (2013). Listening to the human voice alters sensorimotor brain rhythms. PLoS ONE 8(11), 1–10.
Liégeois-Chauvel, C., Giraud, K., Badier, J. M., Marquis, P., & Chauvel, P. (2012). Intracerebral evoked potentials in pitch perception reveal a functional asymmetry of human auditory cortex. Annals of the New York Academy of Sciences 930, 117–132.
Lomber, S. G., Meredith, M. A., & Kral, A. (2010). Cross-modal plasticity in specific auditory cortices underlies visual compensations in the deaf. Nature Neuroscience 13(11), 1421–1427.
Loui, P. (2015). A dual-stream neuroanatomy of singing. Music Perception: An Interdisciplinary Journal 32(3), 232–241.
Luo, H., Liu, Z., & Poeppel, D. (2010). Auditory cortex tracks both auditory and visual stimulus dynamics using low-frequency neuronal phase modulation. PLoS Biology 8(8), e1000445. Retrieved from http://dx.plos.org/10.1371/journal.pbio.1000445.g007
Luo, H., & Poeppel, D. (2007). Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron 54(6), 1001–1010.
McGarry, L. M., Pineda, J. A., & Russo, F. A. (2015). The role of the extended MNS in emotional and nonemotional judgments of human song. Cognitive, Affective, & Behavioral Neuroscience 15(1), 32–44. Retrieved from https://doi.org/10.3758/s13415-014-0311-x
McGarry, L. M., Russo, F. A., Schalles, M. D., & Pineda, J. A. (2012). Audio-visual facilitation of the mu rhythm. Experimental Brain Research 218(4), 527–538.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature 264(5588), 746–748.
Maes, P.-J., Leman, M., Palmer, C., & Wanderley, M. M. (2014). Action-based effects on music perception. Frontiers in Psychology 4. Retrieved from https://doi.org/10.3389/fpsyg.2013.01008
Makous, J. C., Friedman, R. M., & Vierck, C. J. (1995). A critical band filter in touch. Journal of Neuroscience 15(4), 2808–2818.
Manning, F. C., & Schutz, M. (2013). “Moving to the beat” improves timing perception. Psychonomic Bulletin & Review 20(6), 1133–1139.
Manning, F. C., & Schutz, M. (2016). Trained to keep a beat: Movement-related enhancements to timing perception in percussionists and non-percussionists. Psychological Research 80(4), 532–542.
Marshall, M. T., & Wanderley, M. M. (2011). Examining the effects of embedded vibrotactile feedback on the feel of a digital musical instrument. New Interfaces for Musical Expression (June), 399–404.
Merabet, L. B., Hamilton, R., Schlaug, G., Swisher, J. D., Kiriakopoulos, E. T., Pitskel, N. B., … Pascual-Leone, A. (2008). Rapid and reversible recruitment of early visual cortex for touch. PLoS ONE 3(8). Retrieved from https://doi.org/10.1371/journal.pone.0003046
Merchant, H., Grahn, J., Trainor, L., Rohrmeier, M., & Fitch, W. T. (2015). Finding the beat: A neural perspective across humans and non-human primates. Philosophical Transactions of the Royal Society B: Biological Sciences 370(1664), 20140093. Retrieved from https://doi.org/10.1098/rstb.2014.0093
Meredith, M. A., & Stein, B. E. (1986). Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. Journal of Neurophysiology 56(3), 640–662.
Milner, B. (1962). Laterality effects in audition. In V. B. Mountcastle (Ed.), Interhemispheric relations and cerebral dominance (pp. 177–195). Baltimore, MD: Johns Hopkins University Press.
Morioka, M., & Griffin, M. J. (2005). Thresholds for the perception of hand-transmitted vibration: Dependence on contact area and contact location. Somatosensory and Motor Research 22(4), 281–297.
Müller, K., Aschersleben, G., Schmitz, F., Schnitzler, A., Freund, H. J., & Prinz, W. (2008). Inter- versus intramodal integration in sensorimotor synchronization: A combined behavioral and magnetoencephalographic study. Experimental Brain Research 185(2), 309–318.
Nattiez, J.-J. (1990). Music and discourse: Toward a semiology of music. Princeton, NJ: Princeton University Press.
North, A. C. (2012). The effect of background music on the taste of wine. British Journal of Psychology 103(3), 293–301.
North, A. C., Hargreaves, D. J., & McKendrick, J. (1999). The influence of in-store music on wine selections. Journal of Applied Psychology 84(2), 271–276.
Papetti, S., Jarvelainen, H., Giordano, B. L., Schiesser, S., & Frohlich, M. (2017). Vibrotactile sensitivity in active touch: Effect of pressing force. IEEE Transactions on Haptics 10(1), 113–122.
Patel, A. D., & Iversen, J. R. (2014). The evolutionary neuroscience of musical beat perception: The Action Simulation for Auditory Prediction (ASAP) hypothesis. Frontiers in Systems Neuroscience 8, 57. Retrieved from https://doi.org/10.3389/fnsys.2014.00057
Patel, A. D., Iversen, J. R., Chen, Y., & Repp, B. H. (2005). The influence of metricality and modality on synchronization with a beat. Experimental Brain Research 163(2), 226–238.
Peretz, I. (1990). Processing of local and global musical information by unilateral brain-damaged patients. Brain 113(4), 1185–1205.
Phillips-Silver, J., & Trainor, L. J. (2005). Feeling the beat: Movement influences infant rhythm perception. Science 308(5727), 1430.
Phillips-Silver, J., & Trainor, L. J. (2007). Hearing what the body feels: Auditory encoding of rhythmic movement. Cognition 105(3), 533–546.
Platz, F., & Kopiez, R. (2012). When the eye listens: A meta-analysis of how audio-visual presentation enhances the appreciation of music performance. Music Perception 30(1), 71–83.
Poeppel, D. (2001). Pure word deafness and the bilateral processing of the speech code. Cognitive Science 25(5), 679–693.
Quinto, L., Thompson, W. F., Russo, F. A., & Trehub, S. E. (2010). A comparison of the McGurk effect for spoken and sung syllables. Attention, Perception, & Psychophysics 72(6), 1450–1454.
Rauschecker, J. P., Tian, B., & Hauser, M. (1995). Processing of complex sounds in the macaque nonprimary auditory cortex. Science 268(5207), 111–114.
Rauschecker, J. P., Tian, B., Pons, T., & Mishkin, M. (1997). Serial and parallel processing in rhesus monkey auditory cortex. Journal of Comparative Neurology 382(1), 89–103.
Ro, T., Hsu, J., Yasar, N. E., Caitlin Elmore, L., & Beauchamp, M. S. (2009). Sound enhances touch perception. Experimental Brain Research 195(1), 135–143.
Röder, B., Stock, O., Bien, S., Neville, H., & Rösler, F. (2002). Speech processing activates visual cortex in congenitally blind humans. European Journal of Neuroscience 16(5), 930–936.
Rohe, T., & Noppeney, U. (2015). Cortical hierarchies perform Bayesian causal inference in multisensory perception. PLoS Biology 13(2). Retrieved from https://doi.org/10.1371/journal.pbio.1002073
Rosenblum, L. D., & Fowler, C. A. (1991). Audiovisual investigation of the loudness-effort effect for speech and nonspeech events. Journal of Experimental Psychology: Human Perception and Performance 17(4), 976–985.
Ross, J. M., Iversen, J. R., & Balasubramaniam, R. (2016). Motor simulation theories of musical beat perception. Neurocase 22(6), 558–565.
Royal, I., Lidji, P., Théoret, H., Russo, F. A., & Peretz, I. (2015). Excitability of the motor system: A transcranial magnetic stimulation study on singing and speaking. Neuropsychologia 75, 525–532.
Russo, F. A., Ammirante, P., & Fels, D. I. (2012). Vibrotactile discrimination of musical timbre. Journal of Experimental Psychology: Human Perception and Performance 38(4), 822–826.
Russo, F. A., Sandstrom, G. M., & Maksimowski, M. (2011). Mouth versus eyes: Gaze fixation during perception of sung interval size. Psychomusicology: Music, Mind, and Brain 21(1–2), 98–107.
Saldaña, H. M., & Rosenblum, L. D. (1993). Visual influences on auditory pluck and bow judgments. Perception & Psychophysics 54(3), 406–416.
Schalles, M. D., & Pineda, J. A. (2015). Musical sequence learning and EEG correlates of audiomotor processing. Behavioural Neurology, 2015. Retrieved from https://doi.org/10.1155/2015/638202
Schroeder, C. E., Lindsley, R. W., Specht, C., Marcovici, A., Smiley, J. F., & Javitt, D. C. (2001). Somatosensory input to auditory association cortex in the macaque monkey. Journal of Neurophysiology 85(3), 1322–1327. Schürmann, M., Caetano, G., Hlushchuk, Y., Jousmäki, V., & Hari, R. (2006). Touch activates human auditory cortex. NeuroImage 30(4), 1325–1331. Schutz, M. (2008). Seeing music? What musicians need to know about vision. Empirical Musicology Review 3(3), 83–108. Schutz, M., & Kubovy, M. (2009). Deconstructing a musical illusion: Point-light representations capture salient properties of impact motions. Canadian Acoustics 37(1) 23–28. Schutz, M., & Lipscomb, S. (2007). Hearing gestures, seeing music: Vision influences perceived tone duration. Perception 36(6), 888–897. Senkowski, D., Schneider, T. R., Foxe, J. J., & Engel, A. K. (2008). Crossmodal binding through neural coherence: Implications for multisensory processing. Trends in Neurosciences 31(8), 401– 409. Simon, D. M., & Wallace, M. T. (2017). Rhythmic modulation of entrained auditory oscillations by visual inputs. Brain Topography 30(5), 565–578. Spence, C. (2011). Crossmodal correspondences: A tutorial review. Attention, Perception, & Psychophysics 73(4), 971–995. Stein, B. E., & Meredith, M. A. (1993). The merging of the senses. Cambridge, MA: MIT Press. Stevenson, R. A., & James, T. W. (2009). Audiovisual integration in human superior temporal sulcus: Inverse effectiveness and the neural processing of speech and object recognition. NeuroImage 44(3), 1210–1223. Thomas, C. (1983). Music as heard: A study in applied phenomenology. New Haven, CT: Yale University Press. Thompson, W. F., Graham, P., & Russo, F. A. (2005). Seeing music performance: Visual influences on perception and experience. Semiotica 156(1/4), 203–227. Thompson, W. F., & Russo, F. A. (2007). Facing the music. Psychological Science 18(9), 756–757. Thompson, W., Russo, F., & Livingstone, S. (2010). 
Facial expressions of singers influence perceived pitch relations. Psychonomic Bulletin & Review 17(3), 317–322. Thompson, W. F., Russo, F. A., & Quinto, L. (2008). Audio-visual integration of emotional cues in song. Cognition & Emotion 22(8), 1457–1470. Tranchant, P., Shiell, M. M., Giordano, M., Nadeau, A., Peretz, I., & Zatorre, R. J. (2017). Feeling the beat: Bouncing synchronization to vibrotactile music in hearing and early deaf people. Frontiers in Neuroscience 11. Retrieved from https://doi.org/10.3389/fnins.2017.00507 Varlet, M., Marin, L., Issartel, J., Schmidt, R. C., & Bardy, B. G. (2012). Continuity of visual and auditory rhythms influences sensorimotor coordination. PLoS ONE 7(9). Retrieved from https://doi.org/10.1371/journal.pone.0044082 Verrillo, R. T. (1992). Vibration sensation in humans. Music Perception: An Interdisciplinary Journal 9(3), 281–302. Verrillo, R. T., & Bolanowski, S. J. (1986). The effects of skin temperature on the psychophysical responses to vibration on glabrous and hairy skin. Journal of the Acoustical Society of America 80(2), 528–532. Vines, B. W., Krumhansl, C. L., Wanderley, M. M., Dalca, I. M., & Levitin, D. J. (2011). Music to my eyes: Cross-modal interactions in the perception of emotions in musical performance. Cognition 118(2), 157–170. Vuoskoski, J. K., Thompson, M. R., Clarke, E. F., & Spence, C. (2014). Crossmodal interactions in the perception of expressivity in musical performance. Attention, Perception, & Psychophysics, 76(2), 591–604.
Wapnick, J., Mazza, J. K., & Darrow, A. A. (2000). Effects of performer attractiveness, stage behaviour, and dress on evaluation of children’s piano performances. Journal of Research in Music Education 323(4), 323–335. Zatorre, R. J. (1988). Pitch perception of complex tones and human temporal-lobe function. Journal of the Acoustical Society of America 84(2), 566–572. Zatorre, R. J., & Belin, P. (2001). Spectral and temporal processing in human auditory cortex. Cerebral Cortex 11(10), 946–953. Zatorre, R. J., Chen, J. L., & Penhune, V. B. (2007). When the brain plays music: Auditory–motor interactions in music perception and production. Nature Reviews Neuroscience 8, 547–558.
SECTION IV
NEURAL RESPONSES TO MUSIC: COGNITION, AFFECT, LANGUAGE
CHAPTER 11
MUSIC AND MEMORY
LUTZ JÄNCKE
INTRODUCTION
Music listening, music composing, and music-making are strongly associated with memory processes. For example, when we listen to music we might remember the title, the melody, the singer or musicians, and the circumstances in which we heard the music for the first time. It is also possible that we catch the gist when listening to a particular piece of music without explicitly knowing the details of the piece. These are the most obvious memory aspects associated with music. However, some people might even be able to remember a single tone or a tone interval without relying on a reference tone. Music might also help to boost our memory and help us to consolidate what we have learned. These few examples demonstrate that music is associated in many ways with memory processes. In this chapter, I will discuss these associations and provide some examples of, and possible future applications for, music supporting memory processes. Before examining the typical music-related memory aspects, I will discuss some basic principles of the human memory system.
H
M
G
Human memory comprises several parts: (1) sensory memory, (2) short-term memory, (3) working memory, and (4) long-term memory. Sensory memory stores sensory information for a very short period. This memory system is strongly associated with the neural networks that process sensory information. At this stage, the information is not yet processed, interpreted, or encoded. The working memory system is central not only to memory processes but to many, if not all, cognitive functions. Its main functions are often summarized as “maintenance and manipulation” to express the fact that working memory not only holds but also manipulates information. Holding information for a short period of time without any cognitive manipulation is a matter of short-term memory. Manipulation, which is a main pillar of the working memory system, is strongly related to executive functions, pattern recognition, long-term memory, encoding for long-term memory, language and music comprehension, problem solving, and even creativity. All of these rely on the working memory system, which is therefore pivotal for nearly all music functions and particularly for music memory. Because so many functions are associated with working memory, the neural networks involved in working memory processes are not focal but are distributed over many brain areas.
In long-term memory, encoded material is stored for longer time periods, sometimes up to many decades. Long-term memory is divided into an explicit and an implicit memory system. The explicit memory system contains consciously available information and comprises semantic and episodic memory. Semantic memory contains conscious memory of facts, while episodic memory is a system for holding events: memory traces associated with places, times, emotions, and other concept-based knowledge of an experience. Explicit memory (sometimes also called declarative memory) is not a simple store; rather, it is a mechanism that constructs the past on the basis of stored and new information using specific strategies (e.g., retrieval schemas, which will be described later). The neural underpinnings of the explicit memory system are relatively complex and contain so-called “bottleneck structures” in mesiotemporal brain areas (including the hippocampus) and networks in temporal, parietal, and frontal brain areas. Thus, the explicit memory system is based on a distributed network with a mesiotemporal core system.
The implicit memory system contains information that is not easy to verbalize but can be used without consciously thinking about it. The networks controlling this implicit memory system do not overlap with the neural networks for the explicit memory system. The neural networks for implicit memory mainly comprise premotor, cerebellar, and basal ganglia structures.
M
P
M L
The psychological processes and the neural underpinnings of music listening have been studied quite intensively. These studies have shown that music is processed in a cascade of steps that begins with segregation within the auditory stream, followed by the extraction and integration of a variety of acoustic features, leading to cognitive memory-related processes that induce personal, often emotional, experiences. Thus, listening to music can be conceived of as a hierarchical, continuous serial-to-parallel conversion during which the auditory stream (a stream of tones and chords) is integrated into melody chunks, and these melody chunks are then integrated into an entire melody (Fig. 1). Working memory processes are pivotal for this serial-to-parallel conversion, since the tonal and/or musical information is stored temporarily and continuously manipulated.
FIGURE 1. Schematic description of the serial-to-parallel conversion, which can be conceived of as a form of integration of serial information on different levels. t1–t10 represent different tones presented in serial order. m1 to m3 are the integrated tones combining to form melody fragments. At the next level, these melody fragments are integrated into a larger melody cluster or even into the entire musical piece.
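The hierarchical integration shown in Figure 1 can be illustrated with a minimal sketch. It is only a toy illustration of the grouping idea, not a model of auditory processing: the function name `chunk` and the fragment size of four tones are assumptions made for the example and are not taken from the figure.

```python
from typing import List, Sequence, Tuple


def chunk(items: Sequence[str], size: int) -> List[Tuple[str, ...]]:
    """Group a serial sequence into consecutive chunks of at most `size` items."""
    return [tuple(items[i:i + size]) for i in range(0, len(items), size)]


# Ten tones arriving one after another (t1..t10 in Figure 1).
tones = [f"t{i}" for i in range(1, 11)]

# Level 1: serial tones are integrated into melody fragments (m1..m3).
# The fragment size of 4 is an illustrative assumption.
fragments = chunk(tones, 4)

# Level 2: the fragments are integrated into one larger melody unit.
melody = tuple(fragments)

for index, fragment in enumerate(fragments, start=1):
    print(f"m{index}: {fragment}")
```

With ten tones and a fragment size of four, the sketch yields three fragments, which matches the three melody fragments in the figure only because of the chosen size.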
The sound sequences are woven into a melodic contour of pitch and rhythm. These melodic contours do not appear to be the product of bottom-up processes, since the listener is not a passive receiver but is actively engaged in processing the music. In this context, the listener uses acoustic memories, aesthetic judgments, and expectations and combines them to understand and interpret the particular piece of music (the schema concept is discussed later in the chapter). Thus, the listener stores many aspects of the auditory stimuli (such as pitch, pitch interval, timbre, and rhythm) in memory. Based on this stored information, the listener constructs an integrated memory of the particular melody. In the following, I will describe some memory processes associated with tone, tone interval, and melody processing in more detail.
Tone Memory
Even non-musicians are relatively good at remembering and recognizing single tones or the pitch of a melody. For example, in an experiment conducted by Gaab and colleagues (Gaab, Gaser, Zaehle, Jancke, & Schlaug, 2003), non-musicians performed well in pitch memory tasks in which they were asked to decide whether the last or second-to-last tone of a tone sequence was the same as or different from the first tone. The recognition rate for the tones was astonishingly high, with an accuracy of about 66 percent. The authors also conducted fMRI measurements during pitch memory learning. When relating pitch memory performance to the task-related hemodynamic responses, they found that activity in the supramarginal gyrus and the dorsolateral cerebellum, bilaterally, correlated significantly with good task performance. The authors suggest that (besides the auditory cortex) the supramarginal gyrus and the dorsolateral cerebellum may play a critical role in short-term storage of pitch information.
Absolute pitch listeners are much better at memorizing tones and chords. Absolute pitch (AP) is defined as the ability to identify a note without relying on a reference tone (Levitin & Rogers, 2005; Takeuchi & Hulse, 1993). It is a rare ability with an incidence of 1 percent in the general population, although Asian people speaking tonal languages have a higher rate (Deutsch, Henthorn, Marvin, & Xu, 2006). Absolute pitch is thought to originate from an interplay of genetic factors (Gregersen, Kowalsky, Kohn, & Marvin, 1999), early exposure to music (Gregersen, Kowalsky, Kohn, & Marvin, 2001), and intensity of musical training (Gregersen et al., 2001). Currently, a two-component model is discussed as an explanation of this extraordinary ability. In the context of this model, it is suggested that AP is constituted by one perceptual mechanism (i.e., “categorical perception”) and two cognitive mechanisms, “pitch memory” (i.e., explicit memory) and “pitch labeling” (i.e., implicit associative memory), whereby the latter has been suggested as constituting the load-bearing skeleton of AP. Several neurophysiological and neuroanatomical studies support this suggestion. One main finding in this context is that frontotemporal areas are strongly activated during tone listening and tone memory tasks in AP listeners and that these regions are specifically and strongly interconnected, both functionally and anatomically (Elmer, Rogenmoser, Kühnis, & Jäncke, 2015; Rogenmoser, Elmer, & Jäncke, 2015; Zatorre, Perry, Beckett, Westbury, & Evans, 1998). Although these findings are interesting and important for understanding the neural underpinnings of tone perception and tone memory, listening to single tones and remembering them are not adequate tasks for understanding musical listening and the associated memory processes in their entirety.
Tone Interval Memory
More important for understanding music-related memory processes are the psychological and neurophysiological processes that are operative during tone sequence and melody listening tasks. Even non-musicians are very good at recognizing melodies, based largely upon the relative sizes of the intervals between successive pitches. This ability is robustly preserved even when the entire frequency range of the music is shifted up or down (i.e., during transposition). This ability, which is called relative pitch (RP) processing, is strongly shaped, or even entirely acquired, early during development. For example, Trainor and colleagues (Trainor, McDonald, & Alain, 2002) showed that 5.5- to 6.5-month-old infants preferred to listen to a particular melody that they had heard repeatedly (compared to a novel melody). In this experiment, the authors also demonstrated that the AP information was more or less unimportant. Most important, however, was the long-term representation of the melody, which is based on the tone intervals. In a further electrophysiological experiment (Plantinga & Trainor, 2005), it was shown that RP interval processing occurs in a more or less automatic fashion, as demonstrated by mismatch negativities (MMN) to deviations from known pitch intervals. Since the MMN is commonly regarded as a neurophysiological marker of preattentive change detection, the authors conclude that pitch interval perception occurs automatically. Further studies have substantiated these findings by showing that encoding accuracy increases with increasing length of the tone sequences (Lee, Janata, Frost, Martinez, & Granger, 2015). The authors interpret these findings as support for the idea that it is easier for subjects to apply particular Gestalt principles to longer than to shorter tone sequences.
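The point that a melody coded by its intervals survives transposition can be made concrete with a small sketch. It assumes pitches are written as MIDI note numbers (one unit per semitone, 60 being middle C by common convention). The example melody and the function names `intervals` and `transpose` are hypothetical and chosen only for illustration.

```python
from typing import List


def intervals(notes: List[int]) -> List[int]:
    """Return the successive pitch intervals of a melody, in semitones."""
    return [later - earlier for earlier, later in zip(notes, notes[1:])]


def transpose(notes: List[int], semitones: int) -> List[int]:
    """Shift every pitch of a melody by the same number of semitones."""
    return [note + semitones for note in notes]


# Hypothetical four-note melody (C4, D4, E4, C4) as MIDI note numbers.
melody = [60, 62, 64, 60]
shifted = transpose(melody, 5)

print(melody == shifted)                        # prints False: absolute pitches differ
print(intervals(melody) == intervals(shifted))  # prints True: the interval code is unchanged
```

A representation based on absolute pitches would treat the two versions as different items, whereas the interval code treats them as the same melody.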
Tonal Working Memory
When we listen to music we integrate the incoming sequential auditory information. This makes it necessary to hold some auditory information in memory for a short period of time and to combine it with the next incoming sounds of a melody. Thus, we have to hold auditory information and, based on our knowledge of musical structure, combine the tone sequences into melodies. Without such a mechanism, it would be impossible for us to follow and understand even the shortest musical piece. From this description, it is clear that short-term memory processes (maintaining auditory information for a short period of time) as well as cognitive processes (manipulating, combining, and predicting) are involved here. This combination of maintenance and manipulation of incoming stimuli has led to the formulation of the working memory (WM) concept.
The classical WM model was mostly developed using verbal material (Baddeley & Hitch, 1974). According to this model, verbal information is processed by a phonological loop, which is further subdivided into a passive storage component (phonological store) and an active rehearsal mechanism (articulatory rehearsal process). The passive storage component is assumed to store auditory or speech-based information for a few seconds. In addition, an attentional control system (the central executive) controls and supervises the phonological loop. In a later version of the WM model, the mutual interaction between long-term memory (LTM) and WM was recognized by proposing an episodic buffer (Baddeley, 2010). Recent developments have led to a more domain-general model of WM (Cowan, 2011; Oberauer & Lewandowsky, 2011). This newer model proposes polymodal LTM representations of items, which are activated either by incoming sensory input or by volition, thus becoming available for attentional selection. Based on these theoretical contributions, WM is now understood as a limited-capacity system that binds information from the phonological loop, stores information in a multimodal code, and enables the interaction between WM and LTM under the supervision of attention and executive control.
Behavioral Findings of Tonal Working Memory
Although the classical WM model is well elaborated, it has been unclear whether music information (e.g., tones, chords, and timbre) is processed within the WM system in the same way as verbal information. As mentioned above, the classical WM model was designed on the basis of verbal information and does not explicitly specify whether the phonological loop also processes non-verbal information. In behavioral WM studies, the rehearsal process is typically manipulated by introducing stimuli that are similar to those to be held in mind (e.g., the phonological similarity effect). Other paradigms manipulate the length of the to-be-remembered items (e.g., the word, or sequence, length effect). An important part of the classical WM model is that verbal information can be maintained in verbal WM by internal articulatory rehearsal (within the phonological loop). But does such internal rehearsal also exist for pitch and timbre information? Few studies to date have tried to answer this question, and they have come to conflicting conclusions (for an excellent summary see the review by Schulze & Koelsch, 2012). However, as Schulze and Koelsch (2012) correctly point out, the conflicting results are mainly due to the different paradigms and stimulus settings used. On closer inspection of these findings, a more or less clear picture emerges. There is clear evidence that a tonal WM indeed exists in which tonal information is rehearsed. However, the subjects must be able to rehearse the material. Rehearsal is possible if the subjects are familiar with the tone information and if the to-be-remembered tone information is salient enough (i.e., when tones are used whose frequencies correspond to the frequencies of the Western chromatic scale, or if the frequency differences between the tones used are not smaller than one semitone).
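The one-semitone criterion mentioned above can be expressed as a short calculation. The sketch assumes twelve-tone equal temperament, in which a semitone corresponds to a frequency ratio of 2^(1/12), so the distance between two frequencies in semitones is 12 · log2(f2/f1). The function names and the example frequencies are hypothetical; the studies cited here may have defined their stimuli differently.

```python
import math


def semitone_distance(f1_hz: float, f2_hz: float) -> float:
    """Distance between two frequencies in equal-tempered semitones (assumed tuning)."""
    return abs(12.0 * math.log2(f2_hz / f1_hz))


def at_least_one_semitone(f1_hz: float, f2_hz: float) -> bool:
    """Check whether two tones differ by one semitone or more."""
    return semitone_distance(f1_hz, f2_hz) >= 1.0


# Hypothetical tone pairs, in hertz.
print(at_least_one_semitone(440.0, 450.0))  # prints False: less than a semitone apart
print(at_least_one_semitone(440.0, 494.0))  # prints True: about two semitones apart
```

Under the account given above, only pairs of the second kind would be distinct enough to support rehearsal.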
Behavioral studies directly comparing verbal and tonal WM are relatively rare. Early studies (Deutsch, 1970; Salamé & Baddeley, 1989) reported that tones or instrumental music as intervening stimuli interfered more strongly with WM tasks for tones than for phonemes or syllables. These studies were therefore taken as support for separate, specialized tonal and verbal WM systems. However, Semal and colleagues (Semal, Demany, Ueda, & Hallé, 1996) discovered that the frequency relations between the intervening stimuli and the standard stimuli are most important for explaining the results of the behavioral WM experiments. They found that the pitch similarity of the intervening stimuli (words or tones) had a greater effect on performance than the particular modality (verbal or tonal) of the intervening stimuli. Thus, the pitches of both verbal and tonal stimuli are processed in the same WM system. This auditory WM system always comes into play when the to-be-remembered information is auditorily coded. For example, in a suppression experiment (during which the subjects had to either sing or speak during the retention period), recognition accuracy for both tone and digit sequences decreased, regardless of whether the suppression material was verbal or non-verbal (Schendel & Palmer, 2007). This experiment again demonstrates that musical or verbal suppression does not selectively impair verbal or tonal WM. A further experiment uncovered expertise-related influences on the tonal WM system (Williamson, Baddeley, & Hitch, 2010). The results of this experiment showed decreased performance if the tone sequences consisted of more proximal (similar) pitches compared to more distal (dissimilar) pitches, an effect resembling the phonological similarity effect in the verbal WM domain. Thus, one should rather speak of an auditory WM in which the similarity in pitch of the auditory information determines interference effects. These interference effects are independent of whether the standard or intervening stimuli are tones, vowels, syllables, or words. In other words, items that sound similar interfere with each other. There is a single auditory-based WM system that is used for all auditory information, regardless of whether it is verbal or non-verbal auditory material. This might explain how musical training can improve verbal working memory (discussed later). However, in the context of music listening, one has to keep in mind that there may be acoustic information that cannot be rehearsed. This could be specific timbre or pitch information. In this case, the subjects cannot take advantage of the phonological loop (or a general auditory rehearsal mechanism). In such situations, the auditory information is retained for a short period of time in specific feature maps.
Neuroanatomical Correlates of Working Memory
With the advent of modern brain imaging techniques, it is now possible to identify the neural networks that are involved in controlling WM processes. Several studies have examined the neural underpinnings of auditory WM using verbal material. These studies have shown that Broca’s area and premotor areas are the core regions involved in the internal rehearsal of verbal material (for a review of these studies see Schulze & Koelsch, 2012). Besides these core regions, the insular cortex and the cerebellum also seem to be involved in internal rehearsal of verbal information. The neural underpinning of the phonological store has been suggested to rely on parietal areas, including the inferior and superior parietal lobules, and on the posterior perisylvian brain (particularly including the left posterior planum temporale). While parietal brain areas most likely reflect increased engagement of attentional resources (which, incidentally, fits nicely with the pivotal role of attention in WM processes according to the new domain-general WM models: Oberauer & Lewandowsky, 2011), the left posterior planum temporale is possibly involved in the temporary storage of verbal information during WM tasks. On the basis of these findings, and because posterior perisylvian brain areas also support speech processing, it has been proposed that they act as an auditory–motor interface for WM (Hickok & Poeppel, 2007). These findings suggest a dual-stream model of speech processing with a ventral stream involved in speech comprehension (supporting lexical access) and a left-dominant dorsal stream, comprising the planum temporale, that enables sensory–motor integration. Through this stream the perceived speech signals are mapped onto articulatory representations in frontal brain areas (Elmer, Hänggi, Meyer, & Jäncke, 2013).
Far fewer neuroimaging studies have directly investigated the neural underpinnings of WM for tones. However, the few studies that have examined tonal WM revealed that, in non-musicians, all structures involved in tonal WM were also involved in verbal WM. In summary, consistently across studies (Schulze, Mueller, & Koelsch, 2011, 2013; Schulze, Zysset, Mueller, Friederici, & Koelsch, 2011), data obtained from non-musicians indicate a considerable overlap of neural resources underlying WM for both verbal and tonal information. This common network is mainly left-lateralized and fronto-parietal, and includes Broca’s area, parietal areas, and the planum temporale.
Memory for Music
When we listen to music, we often recognize the musical piece quite well. Sometimes we remember the title of the musical piece or even further information such as the text, composer, and the main instruments. Listeners are sometimes even very accurate in reproducing familiar music by singing and moving rhythmically to the music (Frieler et al., 2013; Halpern, 1989; Levitin, 1994). Similar to verbal and non-verbal memory, musical memory can be divided into implicit (unconscious), semantic, and episodic musical memory (the latter two memory systems are conscious) (Platel, 2005). Implicit musical memory can best be seen in neurological patients. For example, Johnson and colleagues (Johnson, Kim, & Risse, 1985) exploited the so-called mere exposure effect in the context of music listening experiments. The mere exposure effect was first demonstrated and described by Zajonc (1968) as a psychological phenomenon in which subjects tend to develop a preference for items simply because they have been repeatedly confronted with these items. In the study by Johnson and colleagues, patients with Korsakoff’s syndrome preferred an unfamiliar musical piece after only one previous presentation, compared to new musical pieces. However, these patients did very poorly in a music recognition test. Halpern and O’Connor (2000) found the same dissociation in normal elderly listeners, who were at chance in recognizing just-presented melodies. However, these subjects liked these musical pieces better than new melodies. A similar distinction between explicit and implicit music memory was drawn by Samson and Peretz (2005). On the basis of a comprehensive analysis of neurological patients suffering from lesions in either the right or the left temporal lobe, they concluded that right temporal lobe structures have a crucial role in the formation of melody representations that support priming and memory recognition, which are both more implicit memory processes, whereas left-sided temporal lobe structures are more involved in the explicit retrieval of melodies. Mere exposure effects have also been shown in healthy subjects (Green, Bærentsen, Stødkilde-Jørgensen, Roepstorff, & Vuust, 2012; Honing & Ladinig, 2009). These and further similar studies in this area gave rise to the suggestion that there is indeed an implicit musical memory, which demonstrates different features compared to explicit musical memory. Implicit musical memory in normal and healthy subjects appears, for example, during incidental music listening, during which we might move or hum without explicitly knowing which musical piece we are listening to. This happens quite often nowadays, especially when we use our mobile devices (e.g., an iPhone) while we stroll through the street, drive a car, or jog.
Semantic musical memory is defined as memory for music excerpts without associating them with the context in which the listener learned the excerpt. Thus, we do not associate and remember the temporal (when) or spatial (where) circumstances under which we encoded and learned the musical piece. Musical semantic memory may represent a form of musical lexicon, separate from a verbal lexicon, even though strong links certainly exist between them. Interestingly, musical pieces can be associated with non-music semantic memory, as Koelsch and colleagues have shown (Koelsch et al., 2004). They demonstrated that short music excerpts can prime concrete and even abstract words. This priming effect occurred even when the musical pieces were unknown to the subject. Evidently, music can carry meaning. The precise psychological mechanism responsible for this interesting association between music and meaning is currently not entirely understood, but this study shows in particular that musical information is strongly embedded in a distributed memory network.
Episodic musical memory, on the other hand, is defined as the capacity to recognize a musical excerpt for which the spatiotemporal context during learning can be recalled (when, where, under which circumstances, and with which people). 
A particular form of episodic musical memory is autobiographical musical memory. This part of memory is activated when we listen to music that is strongly associated with past experiences in our own life. A further memory concept, which is similar to autobiographical memory, is the so-called memory for “nostalgia.” Nostalgia has been defined as an affective process sometimes accompanying autobiographical memories (Wildschut, Sedikides, Arndt, & Routledge, 2006), giving rise to (mostly) positive and (sometimes) negative affect (such as sadness). Nostalgia is strongly associated with personality traits, which explains the obvious inter-individual differences in the presence of this effect.
The different facets of musical memory have been the focus of substantial research in recent years. Based on this research, we now know that the different musical memory systems mentioned earlier can be modulated by different psychological aspects comprising (1) intrinsic musical features such as timbre or tempo, (2) the emotional and arousal components, and (3) individual schemas and musical structure. A further issue influencing music memory processes, which incidentally is relatively new, pertains to (4) the particular brain activation pattern during encoding and retrieval of music information. In the following, I will discuss these issues in more detail.
Intrinsic Features of Musical Pieces
Halpern and Müllensiefen (2008) manipulated timbre and tempo in order to examine their influence on implicit and explicit memory for musical pieces. They asked their study participants to encode forty unfamiliar short tunes. After that, the participants were asked to give explicit and implicit memory ratings for a list of eighty tunes, which included the forty that had previously been heard. To measure implicit memory, a rating of the pleasantness of old and new melodies was used. Measures reflecting explicit memory performance were obtained by calculating the difference between the recognition confidence ratings of old and new melodies. Half of the forty previously heard tunes differed in timbre or tempo from the first exposure. Changes in timbre and in tempo both impaired explicit memory measures, and a change in tempo also made implicit tune recognition worse. These findings support the hypothesis that an implicit musical memory indeed exists, and further show that implicit music memory is influenced only by tempo variations. Explicit music memory, by contrast, was influenced by both timbre and tempo. These and further similar studies in this area gave rise to the suggestion that there is indeed an implicit musical memory, which demonstrates different features compared to explicit musical memory.
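The old-versus-new difference measures described above can be sketched as a simple calculation. The ratings below are invented purely for illustration and are not data from Halpern and Müllensiefen (2008). Taking the difference of mean ratings is an assumption about how such a score could be computed; the original scoring procedure may differ in detail.

```python
from statistics import mean
from typing import Sequence


def old_new_difference(old_ratings: Sequence[float], new_ratings: Sequence[float]) -> float:
    """Mean rating for previously heard tunes minus mean rating for new tunes."""
    return mean(old_ratings) - mean(new_ratings)


# Invented recognition-confidence ratings (explicit measure).
confidence_old = [5, 6, 4, 5]
confidence_new = [2, 3, 3, 2]

# Invented pleasantness ratings (implicit measure, via the mere exposure effect).
pleasantness_old = [4, 5, 4, 5]
pleasantness_new = [4, 4, 4, 4]

explicit_score = old_new_difference(confidence_old, confidence_new)
implicit_score = old_new_difference(pleasantness_old, pleasantness_new)

print(explicit_score)  # prints 2.5
print(implicit_score)  # prints 0.5
```

A positive score on either measure indicates that previously heard tunes were treated differently from new ones. Comparing the two scores across timbre and tempo changes is what allows explicit and implicit memory to be dissociated.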
Emotion and Arousal Induced by Music
Several studies have shown that emotion and arousal evoked by musical pieces influence retrieval and recognition of music. The main finding is that emotional and arousing musical pieces are remembered better than pieces that are less emotional and arousing (Alonso, Dellacherie, & Samson, 2015; Eschrich, Münte, & Altenmüller, 2005, 2008; Ferreri & Rodriguez-Fornells, 2017; Parks & Clancy Dollinger, 2014; Peretz et al., 2009; Vieillard & Gilet, 2013) (but for contradictory results, see Altenmüller, Siggel, Mohammadi, Samii, & Münte, 2014). This memory-enhancing effect is thought to be based on at least two different and partly interacting mechanisms: (1) activation of the mesolimbic system, and (2) an increase in the number of associations within the semantic associative network. Emotional and rewarding music strongly activates the mesolimbic reward system (Salimpoor, Zald, Zatorre, Dagher, & McIntosh, 2015). The mesolimbic system is a relatively small brain system (including the nucleus accumbens and the ventromedial prefrontal cortex), which is important for the control of emotion, reward, and learning and which is mediated mainly by dopamine. Dopamine is also widely recognized to be the critical transmitter involved in addiction processes, for example, in virtually all forms of drug abuse (including heroin, alcohol, cocaine, and nicotine abuse). Even psychological addictions (e.g., gaming) are associated with particular activations within the dopamine system (Kühn et al., 2011). Other forms of reward, such as positive social interactions, likewise activate dopaminergic neurons and are powerful aids to attention and learning (Keitz, Martin-Soelch, & Leenders, 2003). Dopamine is thought to strengthen synaptic potentiation in memory networks activated during learning and consolidation of the music material. Thus, dopamine also promotes plastic adaptations in brain areas involved in the control of trained and practiced tasks. 
A further transmitter involved in music listening is serotonin. Serotonin levels are significantly higher when subjects are exposed to music they find pleasing (Evers & Suhr, 2000). Several (mostly animal) studies have suggested a particular role of serotonin in learning and memory processes (Meneses & Liy-Salmeron, 2012). However, it is not entirely clear whether serotonin plays a positive or inhibitory role in memory formation. It may actually be memory enhancing in one brain area and inhibiting in another. Nevertheless, these transmitter systems (together with several others) may support learning and memory processes. However,
one has to acknowledge that not only positively evaluated and rewarding music is preferentially stored in musical memory but also negative or simply arousing music. That these non-rewarding musical pieces are strongly anchored in music memory can be better explained by the associative memory models, which will be explained in the next paragraph. However, it should be kept in mind that this model is also useful in explaining the role of emotion in general, irrespective of valence and arousal. In the context of the semantic associative network model of memory formation (Bower, 1981) or the Search of Associative Memory (SAM) model (Raaijmakers & Shiffrin, 1981), it has also been proposed that emotions are used as contextual information linked to the to-be-remembered item. These models assume that emotions are represented in a network of nodes together with the musical piece. Thus stimulation and “activation” of emotion nodes would create a form of spreading activation that lowers the threshold of excitation of all associatively linked nodes and thus helps to retrieve the music memory trace from memory. We will come to this model and the extension later on. This model is particularly suited to explain why even unpleasant music might be remembered well. This issue has not been studied so far, but from introspection it is known that we sometimes strongly dislike particular musical pieces despite recognizing them relatively accurately.
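The spreading-activation idea described above can be illustrated with a minimal sketch. The node names, link weights, thresholds, and the `boost` parameter are hypothetical values chosen for illustration; they are not taken from Bower (1981) or from the SAM model, and the sketch covers only a single step of spreading.

```python
# Toy illustration of spreading activation in an associative network.
# All names and numbers are illustrative assumptions, not model parameters
# from Bower (1981) or Raaijmakers and Shiffrin (1981).

def spread_activation(links, thresholds, source, boost=0.5):
    """Return retrieval thresholds after activating the node `source`.

    Every node directly linked to `source` has its threshold lowered in
    proportion to the link weight. Unlinked nodes keep their threshold.
    """
    lowered = dict(thresholds)
    for neighbor, weight in links.get(source, {}).items():
        lowered[neighbor] = max(0.0, thresholds[neighbor] - boost * weight)
    return lowered


# An emotion node linked strongly to piece_A and weakly to piece_B.
links = {"sadness": {"piece_A": 0.8, "piece_B": 0.2}}
thresholds = {"piece_A": 1.0, "piece_B": 1.0, "piece_C": 1.0}

after = spread_activation(links, thresholds, "sadness")
print(after)
```

In this sketch the strongly linked piece ends up with the lowest threshold and is therefore the easiest to retrieve, while the unlinked piece is unaffected. Because the mechanism depends on the link rather than on valence, it applies equally to liked and disliked pieces.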
Individual Schemas and Music Structure
Different listeners may understand the same musical piece in very different ways. They may have varying degrees of appreciation for the musical structure and they may differ on how it fits into the cultural context. In order to describe and understand how we individually perceive and memorize music, I will use the well-known schema concept (Piaget, 1923). In other words, schemas are a form of cognitive heuristic which automatically makes assumptions about the music and, although not completely accurate, enables us to make quick judgments. Schemas are a product of our experiences and can be adjusted or refined throughout our entire lives. These schemas help us to understand various musical pieces; they can influence our music memories, or influence which musical piece we pay attention to, and thus affect the chunks of information that are
available for encoding long-term music memories. Additionally, when we try to remember a musical memory, schemas can help us to piece together memories from it. These schemas determine how (and whether) we encode, consolidate, and remember a particular piece of music. A schema can even prevent us from encoding, for example, when we are not interested in or strongly dislike a particular musical piece. In such situations, we will not focus our attention on this piece and at the end we will remember it poorly. On the other hand, it is possible that a particular musical piece fits perfectly to a stored schema (which incidentally is positively evaluated), in which case we direct our attention to this piece of music and insert it preferentially into our memory system. Incidentally, we know from several neurophysiological studies that focusing attention on a particular auditory stimulus enhances neural activation in the auditory cortex (Jäncke, Mirzazade, & Shah, 1999). Thus, attention gives rise to focal neural activation increases in specific brain areas and thus can influence learning, consolidation, and improved retrieval of stored information. While schemas depend on the individual subject and how the subject “organizes” the neural networks and mental structures for processing incoming information, the musical structure itself also plays a pivotal role in learning and remembering musical pieces. There are long and short pieces, some of them are monotone while others vary dynamically across the entire piece. Some pieces use several musical themes appearing in different forms while others only use one more or less simple theme. In other words, musical structure is defined by the degree of change within different levels of the musical piece. Many researchers use the terms “information” or “complexity” to describe the musical structure (Werbik, 1971). In this context, information refers to redundancy. 
If the next note in a piece of music is relatively determined by the preceding notes, it conveys little new information about the piece. Thus, a musical piece containing complicated changes on many levels of its structure contains more information than a piece that is repetitive and for which the next notes and beats are easily predictable on the basis of the preceding notes and beats. In the context of music memory, it is obvious that complexity of the musical piece affects how it is encoded, consolidated, and recalled. The more complicated (and complex) a musical piece, the more difficult it is to encode and remember it. However, whether we can
learn and remember complicated and complex music also depends on our mental structure and the schemas we have available for music perception and music memory. Those who have mental structures for complex music will find it easier to learn and retrieve them. Thus, there should be a strong interaction between the mental structure for music and the musical structure itself for forming musical memory. As far as I know, this has not been studied explicitly in the music domain. However, in other domains it has frequently been shown that experts (with specific and optimized mental structures) are partly exceptional in discriminating, learning, and recognizing information from their fields of expertise (Gobet, 1998; Rawson & Van Overschelde, 2008). Thus, it is most likely that expertise in music (even low level expertise) will have substantial influence on music memory. Nevertheless, it is left to future studies to show that the available mental structure for music has indeed an influence on musical memory.
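The notion of musical information as (un)predictability of the next note, described above, can be made concrete with a small sketch. It estimates the first-order conditional entropy of a note sequence, that is, the average number of bits needed for the next note given only the immediately preceding note. The one-note context, the function name, and the two example sequences are assumptions made for illustration; this is not the measure used by Werbik (1971).

```python
import math
from collections import Counter


def conditional_entropy(notes):
    """Average information (in bits) of the next note given the previous note.

    Probabilities are estimated from the transition counts in `notes`.
    A fully predictable sequence yields 0 bits.
    """
    pairs = Counter(zip(notes, notes[1:]))   # counts of (previous, next)
    contexts = Counter(notes[:-1])           # counts of each previous note
    total = sum(pairs.values())
    entropy = 0.0
    for (prev, _nxt), count in pairs.items():
        p_pair = count / total
        p_next_given_prev = count / contexts[prev]
        entropy -= p_pair * math.log2(p_next_given_prev)
    return entropy


# Hypothetical note sequences, written as pitch names.
repetitive = ["C", "G"] * 8             # C G C G ...: next note is determined
varied = list("CDECGEDCEGCDGEDC")       # several possible continuations

print(conditional_entropy(repetitive))
print(conditional_entropy(varied))
```

The repetitive sequence carries no new information per note under this estimate, whereas the varied one carries more. A listener's schemas could be thought of as changing the probabilities used in the estimate, which is one way to express the proposed interaction between mental structure and musical structure.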
Brain Activation during Encoding and Retrieval of Music
Only a few studies have examined the neural underpinnings of music memory so far. The few fMRI studies have uncovered mostly similar findings (Altenmüller et al., 2014; Ford, Addis, & Giovanello, 2011; Gagnepain et al., 2017; Groussard et al., 2010; Janata, 2009; Margulis, Mlsna, Uppunda, Parrish, & Wong, 2009; Plailly, Tillmann, & Royet, 2007; Platel, 2005; Watanabe, Yagishita, & Kikyo, 2008). However, there are also some differences depending on the paradigm used and the particular music memory system studied. All studies identified a strong involvement of the bilateral temporal brain area including the primary and secondary auditory cortex (within the superior temporal gyrus) and temporal brain areas known to be involved in language and memory processing (the middle and inferior temporal gyrus). In addition, all studies reported the involvement of frontal brain areas during music recognition. Mostly, the left inferior frontal cortex is involved. When it comes to episodic music memory, bilateral frontal cortex activations have also been reported with slightly right-sided dominance. Sometimes the precuneus has also been reported as being activated during episodic music memory tasks. When autobiographical music memory is tested, hemodynamic responses in default-mode network (DMN) regions increase, including lateral parietal, temporal, medial prefrontal, and posterior cingulate cortices (Ford et al., 2011; Janata, 2009).
Although these studies adequately demonstrate that a distributed cortical network is involved in music memory processes, one has to keep in mind that the fMRI environment and the obtained hemodynamic measures are not optimal for studying the neural underpinnings of music processing in general and music memory processes in particular. The loud and partly annoying measurement environment is suboptimal for music presentation and even for fine-grained cognitive processes. A major drawback of many fMRI studies (and those mentioned earlier) is the fact that mostly very short fragments of musical pieces (10–20 seconds) have been used, which may have precluded the complex cognitive and emotional processes associated with natural music listening. In addition, the hemodynamic responses are slow and only partly correlate with the underlying neurophysiological activations (Logothetis, 2008). In future experiments it would be extremely helpful to study the neural underpinnings of the different music memory systems using silent and less annoying neurophysiological measurement techniques, such as EEG, MEG, or NIRS, which provide the possibility of working with natural music stimuli. Currently, there are no studies using these techniques with the types of experimental paradigms that were used in the aforementioned fMRI studies. Thus, it is of utmost importance to study the neurophysiological oscillations, intracortical current densities, and coherences during music memory tasks. This would provide the opportunity to study the neural underpinnings of music memory processes using more natural experimental situations. Until now, many music perception studies have been published using these techniques and more natural music stimuli. Since music perception implicitly makes use of music memory processes, these studies have uncovered findings that are also interesting for music memory research. 
For example, listening to natural music is associated with activations in distributed neural systems comprising bilaterally temporal and frontal brain areas (Jäncke & Alahmadi, 2016). In addition, particular coherences between adjacent and distant brain areas are obvious during music perception (Bhattacharya & Petsche, 2001; Bhattacharya, Petsche, Feldmann, & Rescher, 2001; Bhattacharya, Petsche, & Pereda, 2001; Jäncke, 2012) and other music-related tasks (Bangert & Altenmüller, 2003). Thus, these studies partly correspond with fMRI studies in showing that music perception (and thus partly music memory) is controlled via a
distributed neural network binding together brain systems involved in auditory, memory, attention, sequence processing, and executive functions. These neurophysiological findings could be used to understand the possible enhancing effects of music on cognitive tasks (which I will summarize in the next section). In his review article, Wolfgang Klimesch summarized his EEG findings on memory research (Klimesch, 1999) and reported that “good” and “bad” memory performers substantially differ in terms of the time courses of event-related desynchronizations (ERD) in the upper alpha and theta band during a semantic judgment task. The results indicate within the first 1000 ms after presenting the test stimuli, good memory performance is associated with a significantly larger extent of alpha band desynchronization. The opposite holds true for the theta band where good memory performance is reflected by a larger extent of synchronization during the first 1000 ms. In this respect, the phasic responses of these frequency bands reflect the quality and performance of the memory. In addition, tonic changes of these frequency bands are also related to the performance in memory, cognition, and perception. For example, increased tonic alpha band and decreased theta band power are associated with increased performance in various cognitive and perceptual tasks. In this respect, it seems obvious that attempts are made to influence the tonic and phasic oscillations in the alpha and theta bands in certain brain areas in such a way that the functions performed by these brain areas run optimally. This has been done by Klimesch and colleagues (Klimesch, Sauseng, & Gerloff, 2003). They induced increased alpha-band oscillation in parietal brain areas using transcranial magnetic stimulation (TMS) prior to the performance of spatial intelligence tasks. By doing this, they increased the tonic alpha band power in parietal areas. 
As a result of this manipulation, the subjects substantially improved their cognitive performance. Thus, it is conceivable that music listening might influence brain activation in a similar way leading to an improvement in several ongoing cognitive processes.
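The event-related desynchronization (ERD) measure referred to above is commonly expressed as the percentage change of band power in a test interval relative to a reference interval. The sketch below assumes the convention in which positive values indicate desynchronization (a power decrease) and negative values indicate synchronization; the function name and the band-power numbers are hypothetical and serve only to show the arithmetic.

```python
def erd_percent(reference_power, test_power):
    """Percentage band-power change relative to a reference interval.

    Assumed convention: ERD% = (R - A) / R * 100, where R is the band power
    in the reference interval and A the band power in the test interval.
    Positive values mean desynchronization, negative values synchronization.
    """
    if reference_power <= 0:
        raise ValueError("reference_power must be positive")
    return (reference_power - test_power) / reference_power * 100.0


# Hypothetical band-power values (arbitrary units) for one participant.
alpha_erd = erd_percent(reference_power=10.0, test_power=6.0)  # power drops
theta_erd = erd_percent(reference_power=5.0, test_power=7.0)   # power rises

print(alpha_erd, theta_erd)
```

Under this convention, the pattern described for good memory performers would correspond to a larger positive value in the upper alpha band and a more negative value in the theta band during the first second after stimulus onset.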
Music as Memory Enhancer
Can music be used as a memory enhancer? When asking this question one has to distinguish which aspect of memory should benefit from music. In fact, there are different influences of music on memory performance. First, we have to discuss whether musicians or non-professional but musically trained subjects benefit from musical training in terms of improved memory performance (e.g., improved verbal working memory or improved long-term memory). Second, does background music exert beneficial or even detrimental effects on cognitive functions? Third, can music be used to enhance memory functions? And fourth, is music beneficial for clinical samples? In the following I will summarize some of the important findings in this field.
Musical Proficiency and Memory
An often-asked question in the context of music research is whether musicians outperform non-musicians in non-musical memory tasks. In other words, is there a kind of transfer from music proficiency to non-musical abilities? A very recent meta-analysis aimed to clarify whether musicians indeed perform better than non-musicians in various memory tasks (Talamini, Altoè, Carretti, & Grassi, 2017). By searching published work on this topic in international databases, the authors collected twenty-nine studies that used fifty-three different memory tasks (e.g., working memory and long-term memory tasks with different materials). For these studies and memory tests, they calculated Hedges’ g, a measure of the effect size adjusted for small groups. These g values were interpreted according to the criteria suggested by Cohen (1988): small effect = 0.2 to 0.5; medium effect = 0.5 to 0.8; large effect > 0.8. Using this measure, they found that musicians performed better than non-musicians in terms of long-term memory (small effect: g = 0.29), short-term memory (medium effect: g = 0.57), and working memory (medium effect: g = 0.56). They also controlled for the influence of moderator variables (e.g., stimulus material: tonal, verbal, or visuospatial) and identified that the musician’s advantage for short-term and working memory was larger with tonal stimuli, moderate with verbal stimuli, and small or null with visuospatial stimuli. Thus, one is
relatively safe in concluding that musicians are really better, even in non-music-related memory processes. But why are they better? Currently, two possibilities are available to explain this finding: (1) a kind of Pygmalion effect or (2) a consequence of musical training. According to the Pygmalion (or Rosenthal) effect, musicians might perform better than non-musicians because the researchers expected musicians to do better, which might induce an improvement in their performance. However, differences between musicians and non-musicians have not been reported for all cognitive tasks (Schellenberg, 2001). There are only a few tasks (including memory functions) for which musicians show enhanced performance. Another possibility could be that individuals with better memory are more likely to become musicians. This is also not very likely since individuals with good memory can become very skilled and successful in other domains outside the music business. They could become good academics, economists, or philosophers. Thus, this hypothesis is not very helpful in explaining the memory advantage in musicians. On the other hand, a better memory might be a consequence of music training. This musical training might have positively influenced (1) auditory processing, (2) overlapping neural networks for speech and music functions, and (3) active learning strategies, such as chunking and sensorimotor integration. Improved auditory processing has been demonstrated in many experiments (Kühnis, Elmer, & Jäncke, 2014; Marie, Magne, & Besson, 2011). This improved ability could be helpful in memory tasks, especially when stimuli are presented orally, because a better auditory encoding of the item to be remembered could strengthen the trace of the stimulus in the listener’s memory. In addition, encoding via the working memory system makes use of the phonological/tonal/auditory loop of the working memory system (described earlier). 
Thus, musicians might use their superior auditory functions to use the early auditory encoding more efficiently than non-musicians. Incidentally, two studies (Okhrei, Kutsenko, & Makarchuk, 2017; Talamini, Carretti, & Grassi, 2016) revealed no difference between musicians and non-musicians in short-term memory tasks when verbal stimuli were presented visually, thus supporting the hypothesis that auditory encoding is the important link here. A further possible reason for the superior memory performance in musicians could be based on the strong overlap between neural networks and psychological functions involved in
speech and music processing. For example, phonological awareness, reading ability, and music perception are controlled by overlapping networks (Anvari, Trainor, Woodside, & Levy, 2002; Flaugnacco et al., 2015). Music performance is a multisensorial issue involving associating the music notation with the sound of the notes, and the motor responses. These associations have to be built up during learning to play a music instrument. This particular type of training is initially effortful and demands attentional and executive control. Nevertheless, music training might therefore enhance active learning strategies, such as chunking and attentional control, functions that are essential to developing a good memory.
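The effect sizes reported earlier in this section are Hedges’ g values, that is, standardized mean differences with a correction for small samples. The following sketch shows one common way to compute g from group summary statistics and to apply the interpretation thresholds quoted above. The approximate correction factor 1 − 3/(4(n1 + n2) − 9), the function names, the label for values below 0.2, and the example group statistics are assumptions for illustration; the sketch does not reproduce the computations of Talamini et al. (2017).

```python
import math


def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference with a small-sample correction.

    Uses the pooled standard deviation and the common approximation
    J = 1 - 3 / (4 * (n1 + n2) - 9) for the correction factor.
    """
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (mean1 - mean2) / pooled_sd
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


def effect_label(g):
    """Label |g| with the thresholds quoted in the text (Cohen, 1988)."""
    size = abs(g)
    if size > 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"  # label chosen here; the text gives no name


# Hypothetical memory scores: 20 musicians versus 20 non-musicians.
g = hedges_g(mean1=10.0, sd1=2.0, n1=20, mean2=9.0, sd2=2.0, n2=20)
print(round(g, 3), effect_label(g))

# Labels for the pooled effects reported in the meta-analysis.
for reported in (0.29, 0.57, 0.56):
    print(reported, effect_label(reported))
```

The correction factor is always slightly below 1, so g is somewhat smaller than the uncorrected standardized difference, and the gap shrinks as the groups get larger.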
Influence of Background Music on Learning and Recall
The influence of background music on various tasks and cognitive processes has been studied and discussed for quite a long while. A meta-analysis conducted by Kämpfe and colleagues (Kämpfe, Sedlmeier, & Renkewitz, 2010) revealed that background music does not have a uniform effect on the performance of tasks. Based on these findings, one might tentatively conclude that the effect of background music on cognitive function can be attributed to general arousal and mood changes (Schellenberg & Weiss, 2013). In one of these studies (Jaencke & Sandmann, 2010), EEG activity was recorded during encoding of verbal material. The authors found no influence of background music on verbal learning. There was, however, a substantially stronger alpha band desynchronization during the first 800–1200 ms after presentation of the stimulus to be learned when background music was playing. Four seconds later this changed to a substantial alpha band synchronization. According to the results presented by Klimesch (1999), this could indicate that background music presentation slightly improves the neural underpinnings of encoding (indicated by the phasic alpha band desynchronization) followed by a more efficient consolidation (indicated by the later and more tonic alpha band synchronization). But these neurophysiological changes do not correlate with the memory performance
since the latter did not change during background listening. In a further study by Kussner and colleagues (Kussner, de Groot, Hofman, & Hillen, 2016), the authors reported unstable effects of background music on learning performance. While they found no influence of background music on learning in the first experiment, the exact replication in a second experiment revealed a beneficial effect of background music. However, they identified that beta band power measured at baseline before the learning experiment (which served as an index of trait arousal) correlated with the learning performance. Thus, general arousal as indicated by resting state beta band activity could indicate a good starting point for subsequent learning. Whether background music might positively influence learning and memory in clinical populations or in elderly subjects suffering from age-dependent memory declines is still disputed. Some studies have shown that background music can enhance memory performance in the elderly. Using NIRS, Ferreri et al. (2014) reported improved learning during background music listening conditions in elderly subjects. This learning improvement was accompanied by decreased prefrontal cortex blood flow, which the authors interpret as less activation and less “disturbing effort” during encoding. If these findings hold true in replications, they will open new perspectives for countering the often decreased episodic memory performance in the elderly.
Music as Memory Modulator in Healthy Subjects
A recent set of studies revealed that emotional arousal evoked by music can enhance memory consolidation (Judde & Rickard, 2010). The authors of this study presented music excerpts immediately, 20 minutes, or 45 minutes after the encoding of verbal material. During this post-learning period, the subjects relaxed. One week later the same subjects took part in a retention experiment in which they were tested on whether they remembered the words they had learned one week before. The retention performance was significantly enhanced, regardless of valence, when music presentation occurred at 20 minutes, but not immediately or 45 minutes after encoding. The authors explain this facilitatory effect of music
presentation on long-term memory in the context of what is currently known about the time course of memory consolidation. Memory consolidation is time-dependent since the biochemical processes modulating synaptic processes need some time (at least 25 minutes) to develop and to install the new and altered synaptic contacts in the memory networks, including the release of various hormones into the bloodstream (i.e., epinephrine, norepinephrine, and cortisol) (McGaugh, 2000). Thus, when arousing music (irrespective of the valence) is presented exactly 20–25 minutes post-learning, memory consolidation is enhanced. In a subsequent experiment the authors demonstrated that learning emotional material was attenuated when relaxing music was presented during the post-learning phase (Rickard, Wong, & Velik, 2012). Thus, relaxing music may counter the increased arousal levels that might enhance the formation of emotional memory containing negative and unwanted memories. A number of studies have investigated how memory performance changes when the words to be learned are sung (Calvert & Tart, 1993; Kilgour, Jakobson, & Cuddy, 2000; McElhinney & Annett, 1996; Tamminen, Rastle, Darby, Lucas, & Williamson, 2017; Wallace, 1994). The learning materials have been words, lyrics, or ballads. Although these studies differ in terms of the particular paradigms used, they all came to more or less the same conclusion: sung verbal material is better recalled. However, the benefit of the sung modality increased as familiarity with the melody increased. In some studies the sung benefit was entirely restricted to conditions in which the song was familiar to the participants (Calvert & Tart, 1993; Tamminen et al., 2017). These results can be best explained in the context of the SAM theory. When new information is encoded it is easier and more efficient to “connect” this new information with already stored memory traces, such as those of familiar music. 
In other words, familiar music offers the possibility of attaching new information to it. Michael Thaut and his colleagues (Peterson & Thaut, 2007) examined the neural underpinnings associated with the presentation and encoding of sung verbal material compared to spoken verbal material. For this they measured EEG during the presentation of the material to be learned and calculated what they called “learning-related changes in coherence (LRCC)” to quantify the learning-related changes of brain oscillations between the scalp electrodes. Using this measure, they found increased coherences within and between left and right frontal areas in several
frequency bands during encoding of sung words. These results are interpreted as support for the hypothesis that verbal learning in the context of musical presentation strengthens coherent oscillations in frontal cortical networks, which are known to be involved in encoding and retrieval of memory information. Although the neurophysiological findings are compelling and consistent with what we know about the neural underpinnings of working memory and other memory processes (i.e., changes of brain synchronization during learning and retrieval), there was no difference in terms of behavioral performance for the sung or spoken material. It might be possible that the learning material was too easy and thus induced floor effects or that the sung material was not related strongly enough to familiar melodies.
Music as Memory Modulator in Neurological Patients
There is currently substantial interest in finding non-invasive interventions to rehabilitate the cognitive impairments of neurological patients. In particular, patients suffering from memory impairments have been targeted in recent research in order to identify possible beneficial effects of music on memory impairments. One of the first and most important studies with neurological patients was published by Särkämö et al. (2008). In a single-blind, randomized, controlled experiment with sixty stroke patients, of whom twenty listened daily for one hour to self-selected music, the study revealed substantial improvements in verbal memory and focused attention only for those patients listening to self-selected music. The control subjects (listening to an audio book or receiving no additional intervention) did not show any improvement in these cognitive functions. This study was one of the first to demonstrate beneficial effects of music listening on cognitive recovery in neurological patients. A number of studies have shown that patients with Alzheimer’s disease (AD) recognize lyrics that they heard sung more reliably than lyrics heard in the spoken modality (Simmons-Stern, Budson, & Ally, 2010). Besides this improvement, they also showed substantial improvements in memorizing the semantic content of the lyrics learned in the sung modality
(Simmons-Stern et al., 2012). Other studies have shown that the material learned in the sung condition was relatively robust since the patients recognize these stimuli relatively well even after longer periods of time (Moussard, Bigand, Belleville, & Peretz, 2012, 2014; Palisson et al., 2015). Incidentally, similar findings have been shown in multiple sclerosis patients (Thaut, Peterson, McIntosh, & Hoemberg, 2014). Although some of the beneficial effects of sung material were relatively small (e.g., Moussard et al., 2014), they all can be explained within the same theoretical framework. We know that music processing is associated with brain activations in a distributed neural network including many brain areas. This increases the likelihood that even in case of strong degenerations of some brain areas, other network parts are intact, which then can be used for encoding and consolidation. A further possibility could be that musical information could be used as “context” information, which can be used to “attach” the newly learned information (similar to the SAM theory proposed for memory processes in healthy subjects).
A M
M
M
In principle, music memory is not that different from the “classical” memory system. However, there are some fundamental and important differences when it comes to natural music and how it is processed, stored, and retrieved. Music is a dynamic stimulus evolving over time, so when listeners listen to music they have to integrate the incoming sequential auditory information and apply specific memory-based mechanisms (Gestalt perception, chunking, etc.) to form this sequence into a musical piece. Thus, music listening is not only a matter of simple auditory information processing; it is much more, since several psychological functions are involved, from working memory to several aspects of long-term memory. What is, however, special for music perception and music memory is the fact that widely distributed neural networks are involved in perceiving and recognizing musical pieces. Figure 2 demonstrates schematically the specific nature of the musical memory, which is partly derived from the SAM model proposed by Raaijmakers and Shiffrin (1981)
and from Kalveram’s model of inverse processing (Kalveram & Seyfarth, 2009).
FIGURE 2. Schematic description of the memory system associating music information with many non-music aspects.
As one can see from Fig. 2, auditory information is fed to the memory storage, which can be conceived of as a kind of correlation storage where auditory information (a_t) is associated with lots of different information resulting in a set of efferences (e_t). In this sense, music information can be associated with motor programs, which is particularly important for musicians who have learned to generate music by manipulating instruments. For that they need specific and highly specialized motor programs, which they can use to operate their instrument. But non-musicians also have an audio-motor coupling (which may even be innate), which becomes obvious when we listen to rhythmic music. In these
situations, we tend to move according to the music rhythm. Besides motor programs, episodic, autobiographical, semantic, and implicit memory information are also associated with music information. Even emotion and motivation can be related to the incoming auditory information. These associations can be conceived of as correlations of varying strength, with the correlation strength depending on the frequency of repetition and the salience of the associated information. Some of these correlations give rise to conscious perceptions (explicit memory) while others remain unconscious (implicit memory). Executive functions can enhance the processing of incoming information by directing attention to particular information, ultimately enhancing the neural activation of those areas that are involved in processing this information. We can also apply executive functions to direct our attention to particular correlations, thus enhancing their likelihood of resulting in an appropriate efference. This can also lead to a kind of suppression and/or inhibition of other correlations. In this context, perceptual and memory schemas can be applied according to which we select or enhance incoming information or the pattern of correlations within the storage. This model can also be operated in an inverse fashion. For example, when a person wants to change his current mood he might “activate” the correlations within the storage associated with a particular emotion. This will “activate” images of those musical pieces that evoke the desired emotion. Thus, the goal (to evoke a particular emotion) is now fed into the storage, which activates those efferences leading to the emotion in question. No generation has listened to music as often as our generation. According to a 2016 US survey estimate, more than 90 percent of the population reported listening to music for an average of 25 hours a week (Nielsen, 2017). 
It is therefore unsurprising that music is a frequently used cue for autobiographical memories, because music is associated with a great deal of everyday information. Music can thus serve as an efficient means to assess and stimulate our autobiographical memory. Music is a complex stimulus that carries much information and evolves over time. This could be the reason why music processing is associated with distributed neural network activations. It is relatively easy to link music information to multimodal information. Music can carry meaning, emotion, and episodic non-music information; it can also trigger and control motor behavior. The multiplicity and versatility of the music network offer many possibilities for inserting new information. This could be the reason why several music-related learning strategies improve memory functions. However, future studies are necessary to show whether music interventions can be used to improve memory functions in both healthy and neurologically impaired subjects.
References
Alonso, I., Dellacherie, D., & Samson, S. (2015). Emotional memory for musical excerpts in young and older adults. Frontiers in Aging Neuroscience 7, 23. Retrieved from https://doi.org/10.3389/fnagi.2015.00023
Altenmüller, E., Siggel, S., Mohammadi, B., Samii, A., & Münte, T. F. (2014). Play it again, Sam: Brain correlates of emotional music recognition. Frontiers in Psychology 5, 114. Retrieved from https://doi.org/10.3389/fpsyg.2014.00114
Anvari, S. H., Trainor, L. J., Woodside, J., & Levy, B. A. (2002). Relations among musical skills, phonological processing, and early reading ability in preschool children. Journal of Experimental Child Psychology 83(2), 111–130.
Baddeley, A. (2010). Working memory. Current Biology 20(4), R136–R140.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), Psychology of Learning and Motivation (Vol. 8, pp. 47–89). New York: Academic Press.
Bangert, M., & Altenmüller, E. O. (2003). Mapping perception to action in piano practice: A longitudinal DC-EEG study. BMC Neuroscience 4, 26. Retrieved from https://doi.org/10.1186/1471-2202-4-26
Bhattacharya, J., & Petsche, H. (2001). Enhanced phase synchrony in the electroencephalograph gamma band for musicians while listening to music. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 64(1 Pt. 1), 012902.
Bhattacharya, J., Petsche, H., Feldmann, U., & Rescher, B. (2001). EEG gamma-band phase synchronization between posterior and frontal cortex during mental rotation in humans. Neuroscience Letters 311(1), 29–32.
Bhattacharya, J., Petsche, H., & Pereda, E. (2001). Long-range synchrony in the gamma band: Role in music perception. Journal of Neuroscience 21(16), 6329–6337.
Bower, G. H. (1981). Mood and memory. The American Psychologist 36, 129–148.
Calvert, S. L., & Tart, M. (1993). Song versus verbal forms for very-long-term, long-term, and short-term verbatim recall. Journal of Applied Developmental Psychology 14(2), 245–260.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York: Lawrence Erlbaum Associates.
Cowan, N. (2011). The focus of attention as observed in visual working memory tasks: Making sense of competing claims. Neuropsychologia 49, 1401–1406.
Deutsch, D. (1970). Tones and numbers: Specificity of interference in immediate memory. Science 168(3939), 1604–1605.
Deutsch, D., Henthorn, T., Marvin, E., & Xu, H. (2006). Absolute pitch among American and Chinese conservatory students: Prevalence differences, and evidence for a speech-related critical period. Journal of the Acoustical Society of America 119(2), 719–722.
Elmer, S., Hänggi, J., Meyer, M., & Jäncke, L. (2013). Increased cortical surface area of the left planum temporale in musicians facilitates the categorization of phonetic and temporal speech sounds. Cortex 49(10), 2812–2821.
Elmer, S., Rogenmoser, L., Kühnis, J., & Jäncke, L. (2015). Bridging the gap between perceptual and cognitive perspectives on absolute pitch. Journal of Neuroscience 35(1), 366–371.
Eschrich, S., Münte, T. F., & Altenmüller, E. O. (2005). Remember Bach: An investigation in episodic memory for music. Annals of the New York Academy of Sciences 1060, 438–442.
Eschrich, S., Münte, T. F., & Altenmüller, E. O. (2008). Unforgettable film music: The role of emotion in episodic long-term memory for music. BMC Neuroscience 9, 48. Retrieved from https://doi.org/10.1186/1471-2202-9-48
Evers, S., & Suhr, B. (2000). Changes of the neurotransmitter serotonin but not of hormones during short time music perception. European Archives of Psychiatry and Clinical Neuroscience 250(3), 144–147.
Ferreri, L., Bigand, E., Perrey, S., Muthalib, M., Bard, P., & Bugaiska, A. (2014). Less effort, better results: How does music act on prefrontal cortex in older adults during verbal encoding? An fNIRS study. Frontiers in Human Neuroscience 8, 301. Retrieved from https://doi.org/10.3389/fnhum.2014.00301
Ferreri, L., & Rodriguez-Fornells, A. (2017). Music-related reward responses predict episodic memory performance. Experimental Brain Research 235(12), 3721–3731.
Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schon, D. (2015). Music training increases phonological awareness and reading skills in developmental dyslexia: A randomized control trial. PLoS ONE 10(9), e0138715.
Ford, J. H., Addis, D. R., & Giovanello, K. S. (2011). Differential neural activity during search of specific and general autobiographical memories elicited by musical cues. Neuropsychologia 49(9), 2514–2526.
Frieler, K., Fischinger, T., Schlemmer, K., Lothwesen, K., Jakubowski, K., & Müllensiefen, D. (2013). Absolute memory for pitch: A comparative replication of Levitin’s 1994 study in six European labs. Musicae Scientiae: The Journal of the European Society for the Cognitive Sciences of Music 17(3), 334–349.
Gaab, N., Gaser, C., Zaehle, T., Jancke, L., & Schlaug, G. (2003). Functional anatomy of pitch memory: An fMRI study with sparse temporal sampling. NeuroImage 19(4), 1417–1426.
Gagnepain, P., Fauvel, B., Desgranges, B., Gaubert, M., Viader, F., Eustache, F., … Platel, H. (2017). Musical expertise increases top-down modulation over hippocampal activation during familiarity decisions. Frontiers in Human Neuroscience 11, 472. Retrieved from https://doi.org/10.3389/fnhum.2017.00472
Gobet, F. (1998). Expert memory: A comparison of four theories. Cognition 66(2), 115–152.
Green, A. C., Bærentsen, K. B., Stødkilde-Jørgensen, H., Roepstorff, A., & Vuust, P. (2012). Listen, learn, like! Dorsolateral prefrontal cortex involved in the mere exposure effect in music. Neurology Research International 2012, 846270. Retrieved from http://dx.doi.org/10.1155/2012/846270
Gregersen, P. K., Kowalsky, E., Kohn, N., & Marvin, E. W. (1999). Absolute pitch: Prevalence, ethnic variation, and estimation of the genetic component. American Journal of Human Genetics 65(3), 911–913.
Gregersen, P. K., Kowalsky, E., Kohn, N., & Marvin, E. W. (2001). Early childhood music education and predisposition to absolute pitch: Teasing apart genes and environment. American Journal of Medical Genetics 98(3), 280–282.
Groussard, M., La Joie, R., Rauchs, G., Landeau, B., Chetelat, G., Viader, F., … Platel, H. (2010). When music and long-term memory interact: Effects of musical expertise on functional and structural plasticity in the hippocampus. PLoS ONE 5(10), e13225.
Halpern, A. R. (1989). Memory for the absolute pitch of familiar songs. Memory & Cognition 17(5), 572–581.
Halpern, A. R., & Müllensiefen, D. (2008). Effects of timbre and tempo change on memory for music. Quarterly Journal of Experimental Psychology 61(9), 1371–1384.
Halpern, A. R., & O’Connor, M. G. (2000). Implicit memory for music in Alzheimer’s disease. Neuropsychology 14(3), 391–397.
Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience 8(5), 393–402.
Honing, H., & Ladinig, O. (2009). Exposure influences expressive timing judgments in music. Journal of Experimental Psychology: Human Perception and Performance 35(1), 281–288.
Jaencke, L., & Sandmann, P. (2010). Music listening while you learn: No influence of background music on verbal learning. Behavioral and Brain Functions 6, 3. Retrieved from https://doi.org/10.1186/1744-9081-6-3
Janata, P. (2009). The neural architecture of music-evoked autobiographical memories. Cerebral Cortex 19(11), 2579–2594.
Jäncke, L. (2012). The dynamic audio-motor system in pianists. Annals of the New York Academy of Sciences 1252, 246–252.
Jäncke, L., & Alahmadi, N. (2016). Detection of independent functional networks during music listening using electroencephalogram and sLORETA-ICA. Neuroreport 27(6), 455–461.
Jäncke, L., Mirzazade, S., & Shah, N. J. (1999). Attention modulates activity in the primary and the secondary auditory cortex: A functional magnetic resonance imaging study in human subjects. Neuroscience Letters 266(2), 125–128.
Johnson, M. K., Kim, J. K., & Risse, G. (1985). Do alcoholic Korsakoff’s syndrome patients acquire affective reactions? Journal of Experimental Psychology: Learning, Memory, and Cognition 11(1), 22–36.
Judde, S., & Rickard, N. (2010). The effect of post-learning presentation of music on long-term word-list retention. Neurobiology of Learning and Memory 94(1), 13–20.
Kalveram, K. T., & Seyfarth, A. (2009). Inverse biomimetics: How robots can help to verify concepts concerning sensorimotor control of human arm and leg movements. Journal of Physiology 103(3–5), 232–243.
Kämpfe, J., Sedlmeier, P., & Renkewitz, F. (2010). The impact of background music on adult listeners: A meta-analysis. Psychology of Music 39(4), 424–448.
Keitz, M., Martin-Soelch, C., & Leenders, K. L. (2003). Reward processing in the brain: A prerequisite for movement preparation? Neural Plasticity 10(1–2), 121–128.
Kilgour, A. R., Jakobson, L. S., & Cuddy, L. L. (2000). Music training and rate of presentation as mediators of text and song recall. Memory & Cognition 28(5), 700–710.
Klimesch, W. (1999). EEG alpha and theta oscillations reflect cognitive and memory performance: A review and analysis. Brain Research Reviews 29(2), 169–195.
Klimesch, W., Sauseng, P., & Gerloff, C. (2003). Enhancing cognitive performance with repetitive transcranial magnetic stimulation at human individual alpha frequency. European Journal of Neuroscience 17(5), 1129–1133.
Koelsch, S., Kasper, E., Sammler, D., Schulze, K., Gunter, T., & Friederici, A. D. (2004). Music, language and meaning: Brain signatures of semantic processing. Nature Neuroscience 7(3), 302–307.
Kühn, S., Romanowski, A., Schilling, C., Lorenz, R., Mörsen, C., Seiferth, N., … IMAGEN Consortium (2011). The neural basis of video gaming. Translational Psychiatry 1(11), e53.
Kühnis, J., Elmer, S., & Jäncke, L. (2014). Auditory evoked responses in musicians during passive vowel listening are modulated by functional connectivity between bilateral auditory-related brain regions. Journal of Cognitive Neuroscience 26(12), 2750–2761.
Kussner, M. B., de Groot, A. M., Hofman, W. F., & Hillen, M. A. (2016). EEG beta power but not background music predicts the recall scores in a foreign-vocabulary learning task. PLoS ONE 11(8), e0161387.
Lee, Y. S., Janata, P., Frost, C., Martinez, Z., & Granger, R. (2015). Melody recognition revisited: Influence of melodic Gestalt on the encoding of relational pitch information. Psychonomic Bulletin & Review 22(1), 163–169.
Levitin, D. J. (1994). Absolute memory for musical pitch: Evidence from the production of learned melodies. Perception & Psychophysics 56, 414–423.
Levitin, D. J., & Rogers, S. E. (2005). Absolute pitch: Perception, coding, and controversies. Trends in Cognitive Sciences 9(1), 26–33.
Logothetis, N. K. (2008). What we can do and what we cannot do with fMRI. Nature 453(7197), 869–878.
McElhinney, M., & Annett, J. M. (1996). Pattern of efficacy of a musical mnemonic on recall of familiar words over several presentations. Perceptual and Motor Skills 82(2), 395–400.
McGaugh, J. L. (2000). Memory: A century of consolidation. Science 287(5451), 248–251.
Margulis, E. H., Mlsna, L. M., Uppunda, A. K., Parrish, T. B., & Wong, P. C. M. (2009). Selective neurophysiologic responses to music in instrumentalists with different listening biographies. Human Brain Mapping 30(1), 267–275.
Marie, C., Magne, C., & Besson, M. (2011). Musicians and the metric structure of words. Journal of Cognitive Neuroscience 23(2), 294–305.
Meneses, A., & Liy-Salmeron, G. (2012). Serotonin and emotion, learning and memory. Reviews in the Neurosciences 23(5–6), 543–553.
Moussard, A., Bigand, E., Belleville, S., & Peretz, I. (2012). Music as an aid to learn new verbal information in Alzheimer’s disease. Music Perception: An Interdisciplinary Journal 29(5), 521–531.
Moussard, A., Bigand, E., Belleville, S., & Peretz, I. (2014). Learning sung lyrics aids retention in normal ageing and Alzheimer’s disease. Neuropsychological Rehabilitation 24(6), 894–917.
Nielsen (2017). Nielsen Music year-end report 2016. Retrieved from http://www.nielsen.com/us/en/press-room/2017/nielsen-releases-2016-us-year-end-musicreport.html
Oberauer, K., & Lewandowsky, S. (2011). Modeling working memory: A computational implementation of the Time-Based Resource-Sharing theory. Psychonomic Bulletin & Review 18(1), 10–45.
Okhrei, A., Kutsenko, T., & Makarchuk, M. (2017). Performance of working memory of musicians and non-musicians in tests with letters, digits, and geometrical shapes. Biologija 62(4), 207–215.
Palisson, J., Roussel-Baclet, C., Maillet, D., Belin, C., Ankri, J., & Narme, P. (2015). Music enhances verbal episodic memory in Alzheimer’s disease. Journal of Clinical and Experimental Neuropsychology 37(5), 503–517.
Parks, S. L., & Clancy Dollinger, S. (2014). The positivity effect and auditory recognition memory for musical excerpts in young, middle-aged, and older adults. Psychomusicology: Music, Mind, and Brain 24(4), 298–308.
Peretz, I., Gosselin, N., Belin, P., Zatorre, R. J., Plailly, J., & Tillmann, B. (2009). Music lexical networks. Annals of the New York Academy of Sciences 1169, 256–265.
Peterson, D. A., & Thaut, M. H. (2007). Music increases frontal EEG coherence during verbal learning. Neuroscience Letters 412(3), 217–221.
Piaget, J. (1923). Le langage et la pensée chez l’enfant: Études sur la logique de l’enfant. Retrieved from http://pubman.mpdl.mpg.de/pubman/item/escidoc:2375486/component/escidoc:2375485/Piaget_1923_language_pensee_enfant.pdf
Plailly, J., Tillmann, B., & Royet, J.-P. (2007). The feeling of familiarity of music and odors: The same neural signature? Cerebral Cortex 17(11), 2650–2658.
Plantinga, J., & Trainor, L. J. (2005). Memory for melody: Infants use a relative pitch code. Cognition 98(1), 1–11.
Platel, H. (2005). Functional neuroimaging of semantic and episodic musical memory. Annals of the New York Academy of Sciences 1060, 136–147.
Raaijmakers, J. G., & Shiffrin, R. M. (1981). Search of associative memory. Psychological Review 88(2), 93–134.
Rawson, K. A., & Van Overschelde, J. P. (2008). How does knowledge promote memory? The distinctiveness theory of skilled memory. Journal of Memory and Language 58(3), 646–668.
Rickard, N. S., Wong, W. W., & Velik, L. (2012). Relaxing music counters heightened consolidation of emotional memory. Neurobiology of Learning and Memory 97(2), 220–228.
Rogenmoser, L., Elmer, S., & Jäncke, L. (2015). Absolute pitch: Evidence for early cognitive facilitation during passive listening as revealed by reduced P3a amplitudes. Journal of Cognitive Neuroscience 27(3), 623–637.
Salamé, P., & Baddeley, A. (1989). Effects of background music on phonological short-term memory. Quarterly Journal of Experimental Psychology Section A 41(1), 107–122.
Salimpoor, V. N., Zald, D. H., Zatorre, R. J., Dagher, A., & McIntosh, A. R. (2015). Predictions and the brain: How musical sounds become rewarding. Trends in Cognitive Sciences 19(2), 86–91.
Samson, S., & Peretz, I. (2005). Effects of prior exposure on music liking and recognition in patients with temporal lobe lesions. Annals of the New York Academy of Sciences 1060, 419–428.
Särkämö, T., Tervaniemi, M., Laitinen, S., Forsblom, A., Soinila, S., Mikkonen, M., … Hietanen, M. (2008). Music listening enhances cognitive recovery and mood after middle cerebral artery stroke. Brain: A Journal of Neurology 131, 866–876.
Schellenberg, E. G. (2001). Music and nonmusical abilities. Annals of the New York Academy of Sciences 930, 355–371. Reprinted in G. E. McPherson (Ed.), The child as musician: A handbook of musical development (2nd ed., pp. 149–176). Oxford: Oxford University Press, 2016.
Schellenberg, E. G., & Weiss, M. W. (2013). Music and cognitive abilities. In D. Deutsch (Ed.), The Psychology of Music (3rd ed., pp. 499–550). London: Academic Press.
Schendel, Z. A., & Palmer, C. (2007). Suppression effects on musical and verbal memory. Memory & Cognition 35(4), 640–650.
Schulze, K., & Koelsch, S. (2012). Working memory for speech and music. Annals of the New York Academy of Sciences 1252, 229–236.
Schulze, K., Mueller, K., & Koelsch, S. (2011). Neural correlates of strategy use during auditory working memory in musicians and non-musicians. European Journal of Neuroscience 33(1), 189–196.
Schulze, K., Mueller, K., & Koelsch, S. (2013). Auditory stroop and absolute pitch: An fMRI study. Human Brain Mapping 34(7), 1579–1590.
Schulze, K., Zysset, S., Mueller, K., Friederici, A. D., & Koelsch, S. (2011). Neuroarchitecture of verbal and tonal working memory in nonmusicians and musicians. Human Brain Mapping 32, 771–783.
Semal, C., Demany, L., Ueda, K., & Hallé, P. A. (1996). Speech versus nonspeech in pitch memory. Journal of the Acoustical Society of America 100(2 Pt. 1), 1132–1140.
Simmons-Stern, N. R., Budson, A. E., & Ally, B. A. (2010). Music as a memory enhancer in patients with Alzheimer’s disease. Neuropsychologia 48(10), 3164–3167.
Simmons-Stern, N. R., Deason, R. G., Brandler, B. J., Frustace, B. S., O’Connor, M. K., Ally, B. A., & Budson, A. E. (2012). Music-based memory enhancement in Alzheimer’s disease: Promise and limitations. Neuropsychologia 50(14), 3295–3303.
Takeuchi, A. H., & Hulse, S. H. (1993). Absolute pitch. Psychological Bulletin 113(2), 345–361.
Talamini, F., Altoè, G., Carretti, B., & Grassi, M. (2017). Musicians have better memory than nonmusicians: A meta-analysis. PLoS ONE 12(10), e0186773.
Talamini, F., Carretti, B., & Grassi, M. (2016). The working memory of musicians and nonmusicians. Music Perception: An Interdisciplinary Journal 34(2), 183–191.
Tamminen, J., Rastle, K., Darby, J., Lucas, R., & Williamson, V. J. (2017). The impact of music on learning and consolidation of novel words. Memory 25(1), 107–121.
Thaut, M. H., Peterson, D. A., McIntosh, G. C., & Hoemberg, V. (2014). Music mnemonics aid verbal memory and induce learning-related brain plasticity in multiple sclerosis. Frontiers in Human Neuroscience 8, 395. Retrieved from https://doi.org/10.3389/fnhum.2014.00395
Trainor, L. J., McDonald, K. L., & Alain, C. (2002). Automatic and controlled processing of melodic contour and interval information measured by electrical brain activity. Journal of Cognitive Neuroscience 14(3), 430–442.
Vieillard, S., & Gilet, A.-L. (2013). Age-related differences in affective responses to and memory for emotions conveyed by music: A cross-sectional study. Frontiers in Psychology 4, 711. Retrieved from https://doi.org/10.3389/fpsyg.2013.00711
Wallace, W. T. (1994). Memory for music: Effect of melody on recall of text. Journal of Experimental Psychology: Learning, Memory, and Cognition 20(6), 1471–1485.
Watanabe, T., Yagishita, S., & Kikyo, H. (2008). Memory of music: Roles of right hippocampus and left inferior frontal gyrus. NeuroImage 39(1), 483–491.
Werbik, H. (1971). Informationsgehalt und emotionale Wirkung von Musik. Mainz: B. Schott.
Wildschut, T., Sedikides, C., Arndt, J., & Routledge, C. (2006). Nostalgia: Content, triggers, functions. Journal of Personality and Social Psychology 91(5), 975–993.
Williamson, V. J., Baddeley, A. D., & Hitch, G. J. (2010). Musicians’ and nonmusicians’ short-term memory for verbal and musical sequences: Comparing phonological similarity and pitch proximity. Memory & Cognition 38(2), 163–175.
Zajonc, R. B. (1968). Attitudinal effects of mere exposure. Journal of Personality and Social Psychology 9(2 pt. 2), 1–27.
Zatorre, R. J., Perry, D. W., Beckett, C. A., Westbury, C. F., & Evans, A. C. (1998). Functional anatomy of musical processing in listeners with absolute pitch and relative pitch. Proceedings of the National Academy of Sciences 95(6), 3172–3177.
CHAPTER 12
MUSIC AND ATTENTION, EXECUTIVE FUNCTION, AND CREATIVITY
PSYCHE LOUI AND RACHEL E. GUETTA
Introduction
Attention is “the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Focalization, concentration, of consciousness are of its essence” (James, 1890, p. 403). Executive functions are “a family of top-down mental processes needed when you have to concentrate and pay attention … three core EFs: inhibition [inhibitory control, including self-control (behavioral inhibition) and interference control (selective attention and cognitive inhibition)], working memory (WM), and cognitive flexibility (also called set shifting, mental flexibility, or mental set shifting and closely linked to creativity)” (Diamond, 2013, pp. 1–2). Creativity is “the ability to produce work that is novel (i.e., original, unexpected), high in quality, and appropriate (i.e., useful, meets task constraints)” (Sternberg, Lubart, Kaufman, & Pretz, 2005, p. 351). How does music, as “organized sound” (Varèse & Wen-Chung, 1966), intersect with these cognitive capacities of the human mind? In this chapter,
we provide a general overview of the contemporary research at the intersection of music and attention, executive functions, and creativity. On one hand, we see that musical sounds provide an optimal stimulus set with which to understand the fundamental properties of attention, executive functions, and creativity. On the other hand, music also offers a window through which researchers may assess effects of long-term training on more general cognitive function, as well as neurocognitive development throughout the lifespan.
Music and Attention
There are many ways to conceptualize the vast literature on attention. Perhaps as a result, research on the intersection between attention and music has been similarly fragmented. Nevertheless, research on music and attention has followed the trends of psychology and neuroscience more generally, and musical stimuli have served as a useful model to tease apart several models of attention. Here we provide a general overview of the disparate theories on attention, before turning to its intersection with the work on music more specifically.
Theories of Attention
Patel’s OPERA hypothesis (Patel, 2011b) posits that one of several reasons why music training benefits the neural encoding of speech is through attention: by engaging shared brain networks between music and speech that are associated with focused attention. Attention has been thought of in terms of early versus late selection theories, and in terms of its operation over space and time. Early selection theories focus on sensory processing and more exogenous (reflexive) sources of information, whereas late selection theories focus more on feature selection and more cognitive, endogenous operations. Theories of early versus late attention differ in their posited effects of perceptual selection, enhancement, and cognitive focus along various points
in the perceptual-cognitive pathway, or along the gradient of primary to association areas in the human cortex. Such early versus late selection theories of attention pertain to when, temporally, along the classic sensory-cognitive pathway attentional processes might operate. Early selection theories generally focus on sensory processing (closer to the sensory periphery) whereas late selection theories focus more on cognitive processing. Evidence for early selection comes from findings from the dichotic listening paradigm in which event-related brain potentials were recorded. The amplitude of the N1, an event-related brain response generated in response to sounds, is enhanced in response to sounds in the attended ear relative to the unattended ear (Woldorff & Hillyard, 1991). Magnetoencephalography work subsequently localized the source of this attentional enhancement to the auditory cortex (Woldorff et al., 1993). Since the auditory cortex is part of the primary sensory cortices, the finding that this early cortical way station acts on attentional processing as early as 100 ms after sound presentation provides convincing evidence for early selection.

Theories that posit relatively late selection conceptualize attention as a feature-based or object-based operation. In particular, the feature integration theory (Treisman & Gelade, 1980) posits that attention operates by combining pre-attentively selected features within a busy scene. Support for this comes from illusory conjunctions, in which unattended features of visual objects, such as color and shape, are sometimes combined to give rise to an illusory percept of a nonexistent object. While this theory has received considerable interest, the definitions of features in vision may not so readily transfer to audition. In the auditory modality, stimulus representation has been described as hierarchical, as shown by psychophysical and modeling studies. 
At the lowest rung of the hierarchy of stimulus representation there are “primitive features” such as acoustic frequency, whereas at higher levels there are more complex, emergent features such as virtual pitch, which combine with other features to form objects. Attention can be enhanced by cueing at the appropriate level, thus reducing uncertainty (Hafter & Saberi, 2001). Object-based attention offers a direct comparison between visual and auditory processing. Much like the visual system combines features to form objects, the auditory system forms objects by grouping together sound elements that share features such as frequency and harmonic structure (Shinn-Cunningham, 2008). The
temporal evolution of these features is especially relevant for object formation in the auditory system. At a low-level timescale, the auditory system may group together sounds based on similar fine-grained temporal features such as attack time, while at a higher-level timescale, distinct tones may be grouped together based on temporal proximity to give rise to beat perception. Beat perception has been proposed as an attentional mechanism, through which different temporal objects such as tones are combined together to form larger units such as rhythms and phrases (De Freitas, Liverence, & Scholl, 2014; Grahn, 2012). The rhythmic effects of attention over time will be revisited later in this section. Evidence for late selection in auditory neurophysiology comes from findings of later attention-related enhancements in event-related brain responses such as the P300 (Purves et al., 2008). In addition, cases of late selection are supported by the neuropsychological literature, in which patients with lesions in the right parietal cortex present with a lack of awareness of their contralesional (usually left) visual field. In these cases, the successful perception of one feature can sometimes reduce the detectability of another, simultaneously present feature, a condition known as extinction. In the auditory/musical modality, interesting evidence comes from the use of an auditory illusion in a case of auditory extinction (Deouell, Deutsch, Scabini, Soroker, & Knight, 2007). This study took advantage of Deutsch’s scale illusion, in which presenting subjects with alternating high-pitched and low-pitched tones to the left and right ear paradoxically leads to the percept of a stream of high tones in the right ear and low tones in the left (Deutsch, 1974). When the patient with auditory neglect was presented with these illusion stimuli, he reported only hearing the high-pitched stream. 
The fact that he only heard the right-lateralized stream, rather than the right-ear stimulus, suggests that some forms of perceptual analysis, such as the formation of auditory streams, take place before attention operates and remain intact despite its disruption in hemispatial neglect, thus providing support for late selection.

A third line of literature supports a combination of early and late selection theories in showing attention-related enhancements of mid-latency brain responses to sound. For example, the mismatch negativity, an event-related potential generated around 200 ms after the onset of unexpected sounds, is both pre-attentively generated and modulated by attention (Woldorff, Hillyard, Gallen, Hampson, & Bloom, 1998). More specific to the music literature, the Early Right Anterior Negativity (ERAN), an event-related potential in response to unexpected musical chords such as the Neapolitan chord (Koelsch, Gunter, Friederici, & Schröger, 2000), is also pre-attentively generated but modulated by attention: when subjects were directing attention away from auditory stimuli in a visual task, they nevertheless showed an ERAN in response to the Neapolitan chord; however, its amplitude was larger in the attended condition (Loui, Grent-’t-Jong, Torpey, & Woldorff, 2005). Taken together, the best available resolution for the debate on early versus late selection holds that attention acts on multiple levels of the perceptual-cognitive or primary-association continuum, by selecting relevant features and processing them more fully at more sensory stages, and also by combining selected features to form coherent objects, streams, or scenes at later association stages.
Selection and Filtering
While the controversy amongst the early versus late selection camps continues, other work has focused on the roles of attention for selecting and filtering (Hafter, Sarampalis, & Loui, 2008). Perhaps the most common example of attentional filtering is the famous cocktail party effect, our remarkable ability to focus on one speaker amidst a noisy environment (Cherry, 1953). In contrast, Broadbent (1982) noted that peripheral stimuli may also take over attention and processing, such as in the “breakthrough of the unattended” phenomenon. Bregman’s (1994) theory of auditory scene analysis posits that we stream or segregate distinct auditory stimuli by means of top-down knowledge as well as bottom-up perceptual processing, based on acoustic features such as frequency and amplitude co-modulation. This auditory stream segregation, the dividing of our world into separate sound-emitting objects, helps us to make sense of the sounds around us. Music listening, thus, entails many aspects of analyzing a busy auditory scene. In Western music, for instance, at various times we are continually separating and fusing the different voices within the musical surface to perceive melody and harmony. This act of auditory scene analysis requires selective and divided attention, and interacts with training (Loui & Wessel, 2007). In
music, the objects to which we attend may pertain to horizontal aspects such as melody, vertical aspects such as harmony, and timbral aspects such as spectral centroid and amplitude envelope. Attended features or objects may also be music-theoretically defined components such as specific chord changes and harmonies. They may also pertain to rhythm, meter, and/or larger-scale musical structure such as form.
Attending to Musical Pitch and Harmonicity
The musical surface is rich with different types of information, all of which can direct our attention as we listen. Frequency, pitch, and harmonicity can act as predictive cues, guiding our attention toward the cued feature. Early psychophysical work had shown that subjects were better at detecting tones that were presented at an expected frequency as well as an expected pitch, giving rise to the idea that cues can combine hierarchically, as reviewed above (Hafter & Saberi, 2001). However, cues do not have to share perceptual features with the target in order to drive attention: Voluntary attention to a cue frequency heightens sensitivity for a different target frequency; furthermore, a visual cue can direct attention toward an auditory frequency (Hafter, Schlauch, & Tang, 1993). Thus, auditory sensitivity increases not only for what is physically presented, but also for what is attended. Signal detection is easier not only when the signal shares perceptual features with the attended cue, thus enabling involuntary or exogenous cueing, but also whenever the cue provides some information that can endogenously (or voluntarily) reduce uncertainty and increase predictability about the target in an ongoing task. These effects of endogenous cueing also guide expectations in a higher-level musical context. Based on our long-term knowledge from encountering music in our culture, humans have developed expectations for commonly co-occurring musical structures such as in harmony, melody, and musical syntax. Reaction time studies have shown that our knowledge of musical syntax can act as a prime or a cue that directs attention toward musically expected stimuli, thus reducing reaction time for harmonically expected musical structures and increasing reaction time for unexpected structures (Bharucha & Stoeckig, 1986; Marmel, Tillmann, & Dowling,
2008). This enhanced attentional processing due to the priming effect of tonality is not tied to tasks that involve reacting to the feature of musical expectation itself; its effects even spread to visual processing (Escoffier & Tillmann, 2008). The priming effect of tonal expectations has been shown in non-musicians as well as musicians, suggesting that they result from implicitly learned expectations rather than from explicit musical training (Bigand, Poulin, Tillmann, Madurell, & D’Adamo, 2003). However, the effect of tonal expectations does depend on selective attention: when the task is to attend selectively to the melodic contour of a chord progression, musically trained subjects were more affected by unexpected harmonies, showing both reaction time costs and benefits relative to musically untrained subjects, who were slower overall but not affected by different unexpected chord progressions (Loui & Wessel, 2007). This again points to the analysis of complex musical materials (such as chord progressions with different voices) as auditory scenes with different streams of information in local as well as global contexts, a view echoed in other cognitive and electrophysiological studies (Justus & List, 2005; List, Justus, Robertson, & Bentin, 2007).
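The sensitivity changes discussed in the cueing studies above are commonly quantified in psychophysics with the index d′ from signal detection theory, computed as the difference between the z-transformed hit rate and false-alarm rate. The following is a minimal sketch under the equal-variance Gaussian assumption; the function name `d_prime` and the hit and false-alarm rates are hypothetical and invented for illustration, not data from any of the cited studies.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(H) - z(F), equal-variance signal detection model."""
    if not (0.0 < hit_rate < 1.0 and 0.0 < false_alarm_rate < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal distribution
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates for a tone-detection task (illustrative only).
cued = d_prime(hit_rate=0.85, false_alarm_rate=0.20)    # target at the cued frequency
uncued = d_prime(hit_rate=0.65, false_alarm_rate=0.20)  # target at an uncued frequency
print(f"cued d' = {cued:.2f}, uncued d' = {uncued:.2f}")
```

Because d′ separates sensitivity from response bias, a cueing benefit expressed this way cannot be explained simply by listeners becoming more willing to say "yes" after a cue.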
Temporal Attention, Prediction, and Entrainment of Musical Stimuli
In addition to operating over different points in frequency, pitch, and harmonicity, attention also operates over time. Perhaps the most influential recent view of how music can contribute to the discussion of attention comes from the idea that music unfolds over time in the form of rhythm, the pattern of inter-onset intervals that enables the cognitive system to chunk the incoming sound stimuli in a hierarchical manner (Longuet-Higgins & Lee, 1982; Povel & Essens, 1985). The idea that attention is temporally based is not incompatible with the object-based views of attention reviewed earlier in this chapter, but more recently there has been a shift of interest specifically toward how attention changes dynamically over time. This is modeled by the Dynamic Attending Theory, which posits that attention fluctuates in rhythmically predictable pulses, giving rise to different levels of detection and identification of stimuli
presented at different times relative to the attentional rhythm (Jones, 1976; Jones & Boltz, 1989; Jones, Moynihan, MacKenzie, & Puente, 2002). Compelling evidence for the Dynamic Attending Theory comes from psychophysical studies, in which subjects were better at same-different judgments in pitch when the pitch to be judged occurred at a rhythmically predictable time (Jones et al., 2002). The study of rhythmic attention has recently become closely tied to the study of rhythmic oscillations in the brain. The idea that there are intrinsic rhythmic fluctuations in the brains of humans and other mammals is not new, going back to the late 1800s and popularized by Hans Berger in the 1920s (Millett, 2001). Berger discovered that by recording electrical signals from the human scalp, he could observe spontaneous electrical fluctuations of the electroencephalogram (EEG) at the rate of ~10 Hz, which he coined as the alpha rhythm. The power of alpha-band activity is highest during states of rest and relaxation. In contrast, activity increases in different frequency bands such as beta (~20 Hz), gamma (>30 Hz), and delta (2–4 Hz) have been observed during different mental states. These bands of oscillatory activity, and the physical relationships between them, are hypothesized to have functional significance for enabling long-range neuronal communications across the brain. In particular, beta activity is shown to track the beat during the perception and imagery of rhythmic music (Fujioka, Ross, & Trainor, 2015). Rhythmic synchronization to the beat frequency is strongest over the motor areas (Nozaradan, Zerouali, Peretz, & Mouraux, 2013), suggesting an involvement of the motor system in attending to the beat, consistent with fMRI work (Grahn & Brett, 2007). 
Furthermore, bursts of activity in the beta band are found to originate in the left sensorimotor cortex and influence activity in the auditory cortex, suggesting that the motor system, with its intrinsic oscillatory activity in the beta band, guides rhythmic attention in the auditory system (Morillon & Baillet, 2017). Together, the recent literature shows that musical rhythm drives auditory attention via the entrainment of oscillatory neuronal activity at multiple frequencies, which originates in the motor system but is tightly coupled with the auditory system. In addition to being important for understanding attention to musical rhythm, these findings also pertain to speech, which contains multiple temporal modulations at specific frequencies (Ding et al., 2017). Selective attention in the real world likely entails listening at various timescales, which affects different patterns of
neural and behavioral entrainment (Henry, Herrmann, & Obleser, 2015). Understanding how the brain organizes these fluctuating rhythms may have implications for designing music targeted toward enhancing attention. New approaches to music composition have inserted rhythmic components (e.g., fast rhythmic amplitude modulations) into the musical stimulus to target specific neuronal oscillations, with the ultimate goal of improving cognition (James et al., 2017). This approach is promising as it may offer therapeutic possibilities for music-based training of executive functions that makes use of the rhythmic temporal properties of attention to achieve optimal goal-directed behavior.
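The central claim of the Dynamic Attending Theory discussed in this section, that attention waxes and wanes with an expected rhythm, can be illustrated with a toy model. The sketch below is a deliberate simplification and not the model of Jones and colleagues: it assumes that attentional energy follows a raised cosine that peaks at each expected beat, with no adaptation of period or phase. The function name `attentional_energy` and the 0.5 s beat period are hypothetical choices made for illustration.

```python
import math


def attentional_energy(t: float, period: float, phase: float = 0.0) -> float:
    """Toy attentional pulse in [0, 1] that peaks at expected beat times.

    Assumption: a fixed raised-cosine oscillation, not an adaptive oscillator.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * (t - phase) / period))


period = 0.5  # seconds per beat, i.e., 120 beats per minute (a 2 Hz beat rate)
probes = [
    ("on the beat", 2.0),
    ("slightly early", 1.95),
    ("halfway between beats", 2.25),
]
for label, t in probes:
    print(f"{label:>22}: {attentional_energy(t, period):.2f}")
```

In this sketch, a target presented on the beat coincides with maximal attentional energy, and a target presented halfway between beats coincides with the minimum, which mirrors the qualitative pattern of better pitch judgments at rhythmically predictable times.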
Music and Executive Functions
Executive functions (EFs) include processes related to planning and self-control, as well as attention, working memory, mental inhibition, and cognitive flexibility (Diamond, 2013). This subset of cognitive functions enables us to readily manipulate and prioritize information, filter through distractors, balance our thoughts, and switch between tasks to optimize cognitive performance. Without these processes, we would not be able to concentrate on important tasks, think before acting, adapt to unexpected challenges, resist temptations, or generally function cognitively in our daily lives. The fundamental EFs, namely inhibition, interference control, working memory, and cognitive flexibility, play important roles in development, intelligence, and social and cognitive health. The question as to whether and how EFs are enhanced through either passive music listening or more active long-term musical training has gained increased attention. The proposition that music and musical training may influence executive functioning has been a topic of debate in recent years, perhaps first widely popularized in media and public interest by the Mozart Effect (Rauscher, Shaw, & Ky, 1993). The idea that merely listening to music could improve our grades in school, our ability to focus, or even our general IQ was at once exciting and applicable, not to mention marketable. Since the inception of the Mozart Effect, however, research has debunked the idea that passively listening to Mozart can transfer to cognitive gains outside of the musical domain. And so, the questions
remain: Does music training confer non-musical advantages? If so, how? The long-term effect of music training is arguably the most active area of music and the brain research today. This section will delineate the current theories and literature on what potential effects music may have on EFs, the roles of near versus far transfer, and the functions of specific neural mechanisms on EFs and transfer. Since the Mozart Effect has largely been discredited, the focus in music cognition research has been the long-term, more effortful effects of musical training. Unlike passive listening, long-term music training engages more of our neural and cognitive circuitry and thus can be expected to induce structural and functional plastic changes in the brain. The importance of discerning whether musical training promotes any advantages to EFs relates to the question of the transferring of skills, or transfer. The transfer and generalization of learning and skills from one area to another, then, can increase general cognitive capacities. Near transfer occurs within a specific modality (e.g., music and speech) whereas far transfer occurs between two less obviously related domains (e.g., music and IQ or music and conflict monitoring). While nearer forms of transfer between music and related areas have been demonstrated, far transfer is harder to prove.
Association Studies Suggesting Near Transfer
Studies of near transfer, as a means to understand the possible effects of music on related cognitive abilities and EFs, include association studies looking at groups of children and adults, some musically trained and some untrained. From these comparison studies between subjects with different levels of musical training, we know that training has measurable effects on the brain as indicated by auditory evoked responses, such as those generated from the brainstem (Kraus & Chandrasekaran, 2010). Patel’s OPERA hypothesis postulates that musical training benefits the neural encoding of speech in five ways, the first of which is overlap between neural resources for music and speech (Patel, 2011a). This is supported by many known associations between musical training, speech, and language skills. For instance, musical training improves auditory skills such as pitch discrimination, which is associated with children’s reading
abilities and phonemic knowledge, providing evidence of an association between musical abilities and the EF needed for reading and linguistic processing (Lamb & Gregory, 1993). Children with better pitch perception and production abilities also perform better at phonemic awareness tests even after controlling for intelligence and musical training (Loui, Kroog, Zuk, Winner, & Schlaug, 2011), providing additional support for shared neural resources for musical (pitch) and speech (phonemic) awareness. Advantages of pitch discrimination generalize to tasks that involve the perception of pitch in speech, and may be generally helpful in non-musical, cognitive tasks (Lolli, Lewenstein, Basurto, Winnik, & Loui, 2015). Still, association studies of near transfer lack certain clarity due to potential confounds, such as parental income, education, and other indirect causes of non-random allocation of participants. Theoretically, an influential model that has been proposed to underlie near transfer between music and language is Patel’s shared syntactic resource integration hypothesis (SSRIH) (Patel, 2003). The SSRIH proposes that syntax in language and music share a common set of processes, executed in temporal and frontal brain regions. The proposal of a synergistic processing scheme between music and language was demonstrated when both reaction time and reading comprehension were especially taxed due to the need to simultaneously integrate syntactically ambiguous grammar and harmonic violations (Slevc, Rosenberg, & Patel, 2009). Supporting the SSRIH, these findings reinforce the theory that music and language draw on a common pool of limited processing resources for approaching and making sense of incoming elements into syntactic structures. The resolution of perceptual and cognitive conflicts, or cognitive control, then, has been implicated in both music and linguistic processing. 
Patel’s demonstration of interactive effects between the two modalities suggests the presence of near transfer from syntactic processing of music to syntactic processing of language. Although the SSRIH posits shared resources between music and language, the nature of this resource is unclear. Slevc and Okada (2015) suggest that cognitive control, and the implicated prefrontal cortical mechanisms, may be one shared resource between the musical and linguistic domains. And while the intersection of music and language has not historically been focused on EFs, the idea that cognitive control may be controlling both syntactic domains is worth noting. The points of
convergence between processing and filtering amongst language and music, as well as the notion of transfer, may help to explain a possible mechanism by which musical training enhances cognitive functions such as EFs. These findings have generalizable implications for immediate and long-term cognitive transfer from musical training to, say, reading exercises and vice versa. Slevc and Okada’s theory that cognitive control may be one shared resource between the musical and linguistic domains is important in understanding how detection and resolution of conflict occur when expectations are violated and interpretations must be reworked, as in the case of grammatical and harmonic violations. By this account, musical training involves not just the incremental processing and integration of musical elements as they occur sequentially, but also the generation of musical predictions and expectations, which must sometimes be prioritized and revised in response to evolving musical input. An additional study investigating the relationship between music and EFs evaluated musical experience and its ability to predict individual differences on inhibition, updating, and set-switching in both auditory and visual modalities (Slevc, Davey, Buschkuehl, & Jaeggi, 2016). Musical ability did predict better performance on both auditory and visual updating tasks, even when controlling for a variety of potential confounds such as age, handedness, bilingualism, and socioeconomic status. Musical ability was not, however, clearly related to inhibitory control and was unrelated to set-switching behavior. Such mixed results from this group show that the extra-musical gains associated with musical ability are not limited to auditory processes, but rather to specific aspects of EFs. 
This supports a process-specific, but modality-general relationship between musical experience and non-musical aspects of cognition, thereby also bolstering the potential of near and far transfer.
Far Transfer
The hypothesis that music training enhances EFs assumes that far transfer of cognitive skills takes place as a result of training; however, far transfer has not been reliable across studies (Sala & Gobet, 2017b). On one hand,
cross-sectional studies comparing musicians and non-musicians have shown positive effects on EF: Adult musicians perform better on measures of cognitive flexibility, working memory, and verbal fluency, and musically trained children also perform better on behavioral and fMRI indices of verbal fluency, rule representation, and task switching (Zuk, Benjamin, Kenyon, & Gaab, 2014). On the other hand, cross-sectional studies are still limited by the fundamental possibility that results may be due to similar confounds as the association studies, such as differences in parental education, socio-economic status (although these were mostly controlled for in the previous study), or some aspect of exposure in the home environment that is outside of the experimenter’s control, as well as pre-existing differences before initiating training. Long-term differences in EF performance, only after controlling for these potential confounds, would provide a convincing basis for the possibility of far transfer.
Longitudinal Studies on Far Transfer
Longitudinal studies aim to eliminate these confounds, and the randomized controlled trial is still hailed as the gold standard for such experimental designs. In that regard, some longitudinal studies do provide support for music-to-EF transfer. Several longitudinal studies have tested the effects of music lessons on IQ. Preschool children who received weekly music training for six months showed higher gains on performance IQ tests than musically untrained counterparts, with effects being observable as early as the age of 3 (Gromko & Poorman, 1998). Still, some of these extra-musical gains could have been attributed to non-musical factors such as time spent with the class and with the instructor, as these were not given in the no-treatment control group. Thus, an active control group is an important improvement to the design of these longitudinal studies. A 2004 longitudinal study tested the relationship between music lessons and general intelligence, here IQ (Schellenberg, 2004). The study assigned 144 children to either music lessons on keyboard or voice, or to control groups with either drama lessons or no lessons. Children in the two music groups exhibited greater increases in full-scale IQ from pre- to post-lessons, as measured by the WISC-III (Wechsler, 1991). Although the effect was fairly
small, the demonstrated enhancements generalized across all IQ subtests, index scores, and standardized measures of academic achievement. Further, the drama group exhibited improvements in measures of social behavior that were not evident amongst the music group. Here, the presence of active control groups provides more substantial evidence for the possibility of far transfer.
Behavioral Changes and Neural Mechanisms
In addition to a drama lesson control group, other studies have compared music training against sports and visual art training as active control groups. One study compared the effects of two interactive computerized training programs in music and visual art on preschool children (Moreno et al., 2011). Children in the music group showed enhanced performance on verbal intelligence measures after only 20 days of training. Furthermore, this boosted performance was positively correlated with changes in event-related potential (ERP) measures during an executive-function task (the go/no-go task, requiring cognitive control and inhibition), here demonstrating far transfer. Such longitudinal studies with randomized, active control groups provide the most impressive evidence of the far transfer effects of music to extra-musical gains. In another longitudinal behavioral and ERP study, Habibi and colleagues (2016) compared children in music training, children in sports training, and a no-training matched control group. Children with musical training showed an improvement in their ability to detect auditory changes, as measured by cortical auditory evoked potentials to musical notes after one year of training. Specifically, the P1 amplitude, an ERP measure of auditory cortical activity, decreased significantly for all three groups, though with the largest decrease in the music group from baseline to year 2 (Habibi, Cahn, Damasio, & Damasio, 2016). A particularly robust difference between the three groups is the decrease in P1 amplitude and latency in the music group elicited by piano tones in the passive task. As decreased P1 amplitude and latency are observed in adults, these results may suggest accelerated maturity of auditory processing as a result of music training.
Combining cross-sectional and longitudinal data in a behavioral and fMRI study in children and adults, Ellis and colleagues showed that musically trained subjects were superior at melodic discrimination, with the number of hours of practice predicting the behavioral improvement. Interestingly, the underlying changes in brain activity involved increased leftward asymmetry in the supramarginal gyrus (SMG). Longitudinal fMRI data showed changes in activity of the left SMG during melodic discrimination that correlated with hours of practice, after controlling for age and previous training (Ellis, Bruijn, Norton, Winner, & Schlaug, 2013). As the left SMG is a region implicated in short-term auditory working memory, these training-related changes in left SMG activity may suggest improved working memory function over time, by co-opting brain areas that are otherwise involved in systems that are not normally engaged for music. It is worth noting that while Moreno et al. showed transfer to a non-auditory task, Habibi et al. and Ellis et al. showed effects of long-term training on neural processing of sounds, which did not involve transfer per se. Nevertheless, the neural mechanisms that changed as a result of training, that is, the left SMG and the neural generators of the P1, may be relatively domain-general, respectively subserving working memory and auditory processing more generally. The combined use of neuroimaging, electrophysiology, and behavioral tasks is fruitful for investigating transfer effects of musical training, as it provides clues as to the underlying neural mechanism behind transfer. The evolution of functional neural signatures over the course of longitudinal studies may be informative not only of how music training affects the brain, but also of how neural processes develop more generally throughout the lifespan.
Negative Findings
Studies reviewed thus far have reported positive transfer effects for near transfer, and more limited but nevertheless successful results on far transfer. However, not all reports have been positive, and the effect sizes of far transfer have been small, as shown by a recent meta-analysis of the far transfer effects of musical training (Sala & Gobet, 2017a, b). Mehr and
colleagues found no reliable evidence for non-musical cognitive benefits from brief preschool music lessons (Mehr, Schachner, Katz, & Spelke, 2013). Preschool children were either given music classes, arts instruction, or no lessons. After six weeks, the participants were assessed in four distinct cognitive areas in which older arts students have been reported to excel: spatial-navigational reasoning, visual form analysis, numerical discrimination, and receptive vocabulary. At first, music class participants showed greater spatial-navigational ability than those in the visual arts class, while children from the visual arts class showed greater visual form analysis ability than children from the music class. However, the researchers were unable to replicate this trend. In the end, the children who were provided with music classes performed no better overall than those with visual arts or no classes. These findings demand caution in interpreting other positive findings for enhanced executive functioning as a result of music instruction. It may be important to note, however, that the brief training sessions from this study do not readily compare to long-term musical training. Furthermore, the selection of transfer tasks needs to take into account the underlying mechanism that could lead to transfer.
Conclusions and Implications
While the popularized Mozart Effect is highly confounded, the benefits of long-term musical training on EFs seem to be promising. Music may also have protective effects against age-related hearing loss: For instance, oscillatory neural activity of older adults is less flexible to speech-paced rhythms, especially during focused attention (Henry, Herrmann, Kunke, & Obleser, 2017). While neural entrainment to speech is disrupted in older age, it may be possible that extended music lessons, which bolster speech perception at a younger age, can protect against some of this disintegration later in life (White-Schwoch, Carr, Anderson, Strait, & Kraus, 2013). Thus, further understanding the influences of musical training on executive function is crucial, as our ability to flexibly manipulate mental information is not only necessary for successful functioning in everyday life, but also has implications throughout our lifetime.
Music and Creativity
While executive function pertains to the ability of the cognitive system to work with conflicting constraints, creativity pertains to relatively unconstrained thought processes. Thinking “outside of the box” is a foremost marvel of the human mind. The ability to be creative, or to produce output that is at once novel and unexpected, yet useful and appropriate, requires some domain-specific knowledge (Csikszentmihalyi, 1996; Sternberg & Lubart, 1999; Sternberg et al., 2005). While the exact mechanisms contributing to the creative processes are still unknown, there is evidence that creativity relies on real-time contributions of multiple constituent mental processes (Goldenberg, Mazursky, & Solomon, 1999). These mental processes involve selective attention and stream segregation, long-term and autobiographical as well as working memory, idea generation and evaluation, and expectation and prediction, as well as the ability to switch between these processes. Creativity, then, incorporates some of the fundamental EFs, such as attention and mental flexibility. Creativity does differ from other components of executive function, however, in its form of thought. While executive function entails the ability to engage in deliberation and strongly constrained thinking, creative thinking has fewer deliberate constraints (Christoff, Irving, Fox, Spreng, & Andrews-Hanna, 2016). And it is due to this relatively unconstrained nature that the study of creativity has been more elusive and imprecise. In a creativity task there is no single correct answer, yet there are more and less creative answers. The standard definition of creativity is bipartite: for a work to be considered creative, it has to be both novel and useful/appropriate (Runco & Jaeger, 2012). Historical and empirical musicologists have long been interested in finding novelty in pieces of music relative to their context. 
This is important both for better understanding of existing works, and for the possibility of generating novel works (Collins, 2016). In contrast, the usefulness of music is difficult, if not impossible, to define. Most might expect that for artistic domains including music, the concept of usefulness in music opens up more questions than it answers, and is therefore not a good definition at all. Appropriateness is easier to define as being within the stylistic or genre-based context, for example, sonata form, variations on a theme, or classical versus jazz versus experimental music improvisation. To
be considered appropriate, one has to stay primarily within an expected genre, or within the style. In that regard, creativity in music must be considered within its historical and stylistic context. This dependence on the environment applies to creativity more generally, which must be considered relative to the domain, the field, and the creator (Csikszentmihalyi, 1996).
Musical Improvisation as a Model of Creativity
Psychological studies on creativity and music have considered creativity as a set of cognitive functions. The study of musical improvisation offers a window into creativity, which is predicated upon novel combinations of existing skills (Limb & Braun, 2008). A systematic literature review of neuroscience of musical improvisation shows shared neural networks between musical improvisation and other forms of creativity, such as artistic or scientific creativity. Generally, a network of prefrontal regions is involved in musical improvisation as well as every other form of creativity (Beaty, 2015). At the same time, there are also some differences between music, artistic, and scientific creativity (e.g., insight problems). As shown in a meta-analysis of fMRI studies on creativity (Boccia, Piccardi, Palermo, Nori, & Palmiero, 2015), musical creativity often involves auditory-motor networks, such as the supplementary motor areas, in addition to other prefrontal regions that are consistently active in creativity studies. Improvisation training is fundamentally cognitive training (Biasutti, 2015). Teaching improvisation in the classroom can not only increase creativity among students (Norgaard, 2017), it may also inform cognitive theories of creativity and improvisation (Norgaard, Spencer, & Montiel, 2013). A critical review of PET, fMRI, and EEG studies on creativity showed that although there is some convergence on the importance of the prefrontal cortex, there are nevertheless many holes in the literature that would benefit from further investigations (Sawyer, 2011). A systematic understanding of musical improvisation that combines multiple methods in musical information retrieval, psychophysics and psychometrics, and cognitive neuroscience will be useful for a thorough understanding of what creativity means, and how to foster creativity in pedagogy.
Neuroimaging Studies of Music and Creativity
(For a detailed overview of neuroimaging studies on improvisation, see Chapter 20.) With the advent of fMRI and the engineering of MR-compatible musical instruments (Hollinger, Steele, Penhune, Zatorre, & Wanderley, 2007), it became possible to observe functional correlates of human brain activity during jazz improvisation, comparing it to the closest non-improvised control condition. The first fMRI study on jazz improvisation compared improvised versus overlearned conditions in novel melodies and musical scales (Limb & Braun, 2008). Results showed many loci of activations, with a general trend of more activity in mesial regions during improvisation, especially in the prefrontal cortex. Another fMRI study looked at piano improvisation as an auditory-motor sequencing problem (Bengtsson, Csikszentmihalyi, & Ullén, 2007). This study also compared the task of improvisation against the task of reproducing a previously created improvisation from memory. The most significant difference in brain activity between improvisation and reproduction conditions was again found in the pre-supplementary motor area (pre-SMA); however, the improvisation condition also showed higher activity in dorsolateral prefrontal cortex and dorsal premotor cortex. Together, results are consistent with Limb and Braun (2008) in identifying a network of interacting prefrontal areas active during improvisation. Similarly, another fMRI study on musical improvisation (Berkowitz & Ansari, 2008) tested similar experimental and control conditions of improvisation versus reproduction, but with the additional comparison between rhythmic and melodic improvisation and control conditions. Results showed more activations as well as deactivations for melodic improvisation relative to rhythmic improvisation, with effects centered around motor planning regions in the frontal lobe, specifically the premotor cortex. 
Freestyle rap is another form of musical creativity that involves heavy use of rhythmic improvisation as opposed to melodic improvisation. One fMRI study compared brain activity during spontaneous freestyle rap to conventional rehearsed performance (Liu et al., 2012). During the freestyle condition, rap artists showed an upregulation of mesial regions (presumably important for idea generation and/or self-referential processes) and
downregulation of lateral regions which reflect rule learning. The mesial regions are part of a larger group of regions that are intrinsically correlated in their activity, together known as the Default Network (Fox & Raichle, 2007). In contrast, the lateral regions, such as the dorsolateral prefrontal cortex, are part of a larger network consistently active during executive functions (Executive Control Network) (Shirer, Ryali, Rykhlevskaia, Menon, & Greicius, 2012). Although most studies in musical creativity have shown improvisation-related activity in prefrontal regions (including the medial prefrontal cortex, the dorsolateral prefrontal cortex, the cingulate cortex, and the pre-SMA), other studies have observed activity in the classic language and emotion networks. One study showed activity in the inferior frontal gyrus, also known as Broca’s area, while jazz musicians were interacting by “trading fours” (Donnay, Rankin, Lopez-Gonzalez, Jiradejvong, & Limb, 2014) and improvising to communicate a specific positive or negative emotional intent (McPherson, Barrett, Lopez-Gonzalez, Jiradejvong, & Limb, 2016). Broca’s area is also the known neural generator of the ERAN, an electrophysiological marker for the processing of musically unexpected events (Maess, Koelsch, Gunter, & Friederici, 2001), and recent work has shown a larger ERAN in jazz improvising musicians, suggesting increased involvement of Broca’s area following improvisation training (Przysinda, Zeng, Maves, Arkin, & Loui, 2017). Functional connectivity from fMRI results also showed that duration of improvisation experience was negatively correlated with fronto-parietal areas in the executive control network, but positively correlated with functional connectivity between areas within the auditory-motor network (Pinho, De Manzano, Fransson, Eriksson, & Ullén, 2014). 
Based on these recent studies, it appears that areas important for auditory-motor functions, including the language network, are as intrinsic to musical creativity as the aforementioned default and executive control networks.
Data-Driven Correlates of Creativity
While the literature has generally defined creativity as the tendency to produce novel and appropriate output, determining how creative a given output is has generally required the consensual assessment of multiple raters (Amabile, 1982), a relatively time-consuming technique that can be sensitive to rater bias. With recent advances in musical information retrieval, it may be fruitful to relate the definition of creativity to information that can be gleaned from the creative output itself. Since people who are more creative tend to produce more fluent, original, and flexible output (Silvia, Beaty, & Nusbaum, 2013), it may be useful to operationally define creativity as fluent production of high information content. Information theory offers many possible measures, foremost among them entropy, which was defined by Shannon (1948) and subsequently used in neuroscience (Friston, 2010) and in music cognition (Hansen & Pearce, 2014). Information retrieval tools such as the musical information retrieval toolbox (Lartillot & Toiviainen, 2007) now provide relatively data-driven measures of musical information content such as entropy, as well as harmonic movement, spectral centroid change, and onset detection. Applying these types of information retrieval techniques to musical performances may yield useful information about the player’s creativity. A new and potentially fruitful approach relates the entropy of musical production to brain structure in order to reveal brain–behavior correlations, an approach beginning to be adopted in recent studies (Arkin, Przysinda, Pfeifer, Zeng, & Loui, 2019; Zeng, Przysinda, Pfeifer, Arkin, & Loui, 2018). As data-driven approaches become increasingly sophisticated, it becomes even more important to keep relating studies of music to studies of the brain, in search of unifying approaches to data that might inform both fields. We can move toward biomarkers of creativity by rigorously defining outcome measures and relating these outcome measures to data from the brain. In this way, music offers a promising route toward a more useful conceptualization of creativity.
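To make the idea of entropy as an index of information content concrete, the following is a minimal sketch that computes Shannon entropy over the empirical distribution of pitches in a note sequence. It is an illustration only, not the method used in the studies cited above: the function name `shannon_entropy` and the two example sequences (given as MIDI note numbers) are hypothetical, and the sketch ignores note order, rhythm, and any learned expectations that model-based measures would take into account.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the empirical distribution of `symbols`.

    H = sum over distinct symbols of p * log2(1 / p), where p is the
    relative frequency of each symbol in the sequence.
    """
    total = len(symbols)
    if total == 0:
        raise ValueError("need at least one symbol")
    counts = Counter(symbols)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical pitch sequences as MIDI note numbers (illustrative data only).
repetitive = [60, 60, 60, 60, 60, 60, 60, 60]  # one pitch repeated
varied = [60, 62, 64, 65, 67, 69, 71, 72]      # eight distinct pitches

print(shannon_entropy(repetitive))  # 0.0
print(shannon_entropy(varied))      # 3.0
```

Under this simplified measure, a performance that repeats a single pitch carries no information, whereas one that uses eight pitches equally often carries three bits per note. A measure of this kind could in principle be computed per performer and then related to ratings or brain data, although any such analysis would depend on choices (which features to count, how to handle sequence structure) that this sketch does not address.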
Personality and Cognitive Profiles of Creative Musicians
Examining the personality and cognitive profiles of creative musicians has also lent interesting insight into the neuropsychological study of creativity. Jazz musicians tend to be more creative, as measured by the Divergent Thinking Test (Benedek, Borovnjak, Neubauer, & Kruse-Weber, 2014). These differences are not confined to the musical domain but also extend to domain-general indicators of divergent thinking outside the musical realm. Kleinmintz and colleagues (Kleinmintz, Goldstein, Mayseless, Abecasis, & Shamay-Tsoory, 2014) also showed higher divergent thinking scores and better alternative uses task performance in improvising musicians, an effect that was mediated by idea evaluation (i.e., the evaluation of creativity). Furthermore, Przysinda and colleagues (2017) showed higher scores on the divergent thinking task among jazz musicians. In terms of personality measures, Benedek and colleagues (2014) showed different personality profiles in jazz and improvisational musicians. Specifically, jazz and improv musicians are more open to experience, as are jazz listeners (Rentfrow & Gosling, 2003). This is consistent with the creativity literature in general: there is a consistent statistical association between creativity and openness to experience (McCrae, 1987). Although this association is well replicated, the direction of causality is unknown. Perhaps being open to experience makes you more creative; perhaps being creative makes you more open to experience; or perhaps both are due to some other variable(s). Hopefully, neurocognitive knowledge of creativity will inform better music making in performance and in the classroom (Biasutti, 2015), while improving our understanding of how musical knowledge might transfer to extra-musical outcomes in other areas of cognition.
Conclusions
Following the definition of music as organized sound that we adopted at the beginning of this chapter, we have now seen that organized sounds are generated in many situations that are barely musical, if at all. For example, experimental stimuli in an auditory research study are intentionally organized sounds that vary in their musicality. The extent to which these intentional sounds come to be perceived as music may depend on our attention to their context and to the many elements of the musical surface. The literature we reviewed also shows that fundamentally, the human mind is “an anticipator, an expectation-generator” (Dennett, 2008). As expectation shapes all that we experience, how we perceive music also depends on our expectation. Music interfaces with many aspects of cognition: from attention, which is linked to stimulus processing and selection, to creativity, which involves generating new stimuli as well as reacting to them. At another level, music requires and influences executive function, the collection of our brain’s central executive processes, which we must deploy to interact with music. Open questions pertain to the intersection of these three sections: Does better executive function give rise to better creativity? Or are the two constructs inversely related? How does attention to specific elements of the musical surface enable or enhance creativity? Understanding these seemingly disparate aspects of cognitive function as interrelated can drive the formulation of new and interesting research questions, which might inform our understanding of music as well as cognitive science more generally.
References
Amabile, T. M. (1982). Social psychology of creativity: A consensual assessment technique. Journal of Personality and Social Psychology 43(5), 997–1013.
Arkin, C., Przysinda, E., Pfeifer, C., Zeng, T., & Loui, P. (2017). Information content predicts creativity in musical improvisation: A behavioral and voxel-based morphometry study. Under review.
Arkin, C., Przysinda, E., Pfeifer, C., Zeng, T., & Loui, P. (2019). Grey matter correlates of creativity in musical improvisation. Under review.
Beaty, R. E. (2015). The neuroscience of musical improvisation. Neuroscience & Biobehavioral Reviews 51, 108–117.
Benedek, M., Borovnjak, B., Neubauer, A. C., & Kruse-Weber, S. (2014). Creativity and personality in classical, jazz and folk musicians. Personality and Individual Differences 63, 117–121.
Bengtsson, S. L., Csikszentmihalyi, M., & Ullén, F. (2007). Cortical regions involved in the generation of musical structures during improvisation in pianists. Journal of Cognitive Neuroscience 19, 830–842.
Berkowitz, A. L., & Ansari, D. (2008). Generation of novel motor sequences: The neural correlates of musical improvisation. NeuroImage 41(2), 535–543.
Bharucha, J. J., & Stoeckig, K. (1986). Reaction time and musical expectancy: Priming of chords. Journal of Experimental Psychology: Human Perception and Performance 12(4), 403–410.
Biasutti, M. (2015). Pedagogical applications of the cognitive research on music improvisation. Frontiers in Psychology 6. Retrieved from https://doi.org/10.3389/fpsyg.2015.00614
Bigand, E., Poulin, B., Tillmann, B., Madurell, F., & D’Adamo, D. A. (2003). Sensory versus cognitive components in harmonic priming. Journal of Experimental Psychology: Human Perception and Performance 29(1), 159–171.
Boccia, M., Piccardi, L., Palermo, L., Nori, R., & Palmiero, M. (2015). Where do bright ideas occur in our brain? Meta-analytic evidence from neuroimaging studies of domain-specific creativity. Frontiers in Psychology 6, 1195. Retrieved from https://doi.org/10.3389/fpsyg.2015.01195
Bregman, A. S. (1994). Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: MIT Press.
Broadbent, D. E. (1982). Task combination and selective intake of information. Acta Psychologica 50(3), 253–290.
Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and with two ears. Journal of the Acoustical Society of America 25(5), 975–979.
Christoff, K., Irving, Z. C., Fox, K. C., Spreng, R. N., & Andrews-Hanna, J. R. (2016). Mind-wandering as spontaneous thought: A dynamic framework. Nature Reviews Neuroscience 17(11), 718–731.
Collins, D. (2016). The act of musical composition: Studies in the creative process. New York: Routledge.
Csikszentmihalyi, M. (1996). Creativity: Flow and the psychology of discovery and invention. New York: HarperCollins.
De Freitas, J., Liverence, B. M., & Scholl, B. J. (2014). Attentional rhythm: A temporal analogue of object-based attention. Journal of Experimental Psychology: General 143(1), 71–76.
Dennett, D. C. (2008). Kinds of minds: Toward an understanding of consciousness. New York: Basic Books.
Deouell, L. Y., Deutsch, D., Scabini, D., Soroker, N., & Knight, R. T. (2007). No disillusions in auditory extinction: Perceiving a melody comprised of unperceived notes. Frontiers in Human Neuroscience 1, 15. Retrieved from https://doi.org/10.3389/neuro.09.015.2007
Deutsch, D. (1974). An illusion with musical scales. Journal of the Acoustical Society of America 56(S1). Retrieved from https://doi.org/10.1121/1.1914084
Diamond, A. (2013). Executive functions. Annual Review of Psychology 64, 135–168.
Ding, N., Patel, A. D., Chen, L., Butler, H., Luo, C., & Poeppel, D. (2017). Temporal modulations in speech and music. Neuroscience & Biobehavioral Reviews 81(Part B), 181–187.
Donnay, G. F., Rankin, S. K., Lopez-Gonzalez, M., Jiradejvong, P., & Limb, C. J. (2014). Neural substrates of interactive musical improvisation: An fMRI study of “trading fours” in jazz. PLoS ONE 9, e88665.
Ellis, R. J., Bruijn, B., Norton, A. C., Winner, E., & Schlaug, G. (2013). Training-mediated leftward asymmetries during music processing: A cross-sectional and longitudinal fMRI analysis. NeuroImage 75, 97–107.
Escoffier, N., & Tillmann, B. (2008). The tonal function of a task-irrelevant chord modulates speed of visual processing. Cognition 107(3), 1070–1083.
Fox, M. D., & Raichle, M. E. (2007). Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging. Nature Reviews Neuroscience 8(9), 700–711.
Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience 11, 127–138.
Fujioka, T., Ross, B., & Trainor, L. (2015). Beta-band oscillations represent auditory beat and its metrical hierarchy in perception and imagery. Journal of Neuroscience 35(45), 15187–15198.
Goldenberg, J., Mazursky, D., & Solomon, S. (1999). Creative sparks. Science 285(5433), 1495–1496.
Grahn, J. A. (2012). See what I hear? Beat perception in auditory and visual rhythms. Experimental Brain Research 220(1), 51–61.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience 19(5), 893–906.
Gromko, J. E., & Poorman, A. S. (1998). The effect of music training on preschoolers’ spatial-temporal task performance. Journal of Research in Music Education 46(2), 173–181.
Habibi, A., Cahn, B. R., Damasio, A., & Damasio, H. (2016). Neural correlates of accelerated auditory processing in children engaged in music training. Developmental Cognitive Neuroscience 21, 1–14.
Hafter, E. R., & Saberi, K. (2001). A level of stimulus representation model for auditory detection and attention. Journal of the Acoustical Society of America 110, 1489. Retrieved from https://doi.org/10.1121/1.1394220
Hafter, E. R., Sarampalis, A., & Loui, P. (2008). Auditory attention and filters. In W. Yost (Ed.), Auditory perception of sound sources (pp. 115–142). Dordrecht: Springer.
Hafter, E. R., Schlauch, R. S., & Tang, J. (1993). Attending to auditory filters that were not stimulated directly. Journal of the Acoustical Society of America 94, 743–747. Retrieved from https://doi.org/10.1121/1.408203
Hansen, N. C., & Pearce, M. T. (2014). Predictive uncertainty in auditory sequence processing. Frontiers in Psychology 5, 1052. Retrieved from https://doi.org/10.3389/fpsyg.2014.01052
Henry, M. J., Herrmann, B., Kunke, D., & Obleser, J. (2017). Aging affects the balance of neural entrainment and top-down neural modulation in the listening brain. Nature Communications 8, 15801. doi:10.1038/ncomms15801
Henry, M. J., Herrmann, B., & Obleser, J. (2015). Selective attention to temporal features on nested time scales. Cerebral Cortex 25(2), 450–459.
Hollinger, A., Steele, C., Penhune, V., Zatorre, R., & Wanderley, M. (2007). fMRI-compatible electronic controllers. In Proceedings of the 7th international conference on New Interfaces for Musical Expression (pp. 246–249). New York: ACM. doi:10.1145/1279740.1279790
James, T., Przysinda, E., Sampaio, G., Woods, K. J. P., Hewett, A., Morillon, B., & Loui, P. (2017). Acoustic effects on oscillatory markers of sustained attention. Presentation at the International Conference on Auditory Cortex. Banff, Canada.
James, W. (1890). The principles of psychology. New York: Henry Holt.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review 83(5), 323–355.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review 96(3), 459–491.
Jones, M. R., Moynihan, H., MacKenzie, N., & Puente, J. (2002). Temporal aspects of stimulus-driven attending in dynamic arrays. Psychological Science 13(4), 313–319.
Justus, T., & List, A. (2005). Auditory attention to frequency and time: An analogy to visual local-global stimuli. Cognition 98(1), 31–51.
Kleinmintz, O. M., Goldstein, P., Mayseless, N., Abecasis, D., & Shamay-Tsoory, S. G. (2014). Expertise in musical improvisation and creativity: The mediation of idea evaluation. PLoS ONE 9, e101568.
Koelsch, S., Gunter, T. C., Friederici, A. D., & Schröger, E. (2000). Brain indices of music processing: Nonmusicians are musical. Journal of Cognitive Neuroscience 12(3), 520–541.
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills. Nature Reviews Neuroscience 11, 599–605.
Lamb, S. J., & Gregory, A. H. (1993). The relationship between music and reading in beginning readers. Educational Psychology 13(1), 19–27.
Lartillot, O., & Toiviainen, P. (2007). A Matlab toolbox for musical feature extraction from audio. In Proceedings of the 10th International Conference on Digital Audio Effects (pp. 237–244). Bordeaux, France. Retrieved from http://dafx.labri.fr/main/papers/p237.pdf
Limb, C. J., & Braun, A. R. (2008). Neural substrates of spontaneous musical performance: An fMRI study of jazz improvisation. PLoS ONE 3, e1679.
List, A., Justus, T., Robertson, L. C., & Bentin, S. (2007). A mismatch negativity study of local–global auditory processing. Brain Research 1153, 122–133.
Liu, S., Chow, H. M., Xu, Y., Erkkinen, M. G., Swett, K. E., Eagle, M. W., … Braun, A. R. (2012). Neural correlates of lyrical improvisation: An fMRI study of freestyle rap. Scientific Reports 2, 834. doi:10.1038/srep00834
Lolli, S., Lewenstein, A. D., Basurto, J., Winnik, S., & Loui, P. (2015). Sound frequency affects speech emotion perception: Results from congenital amusia. Frontiers in Psychology 6. Retrieved from https://doi.org/10.3389/fpsyg.2015.01340
Longuet-Higgins, H. C., & Lee, C. S. (1982). The perception of musical rhythms. Perception 11(2), 115–128.
Loui, P., Grent-’T-Jong, T., Torpey, D., & Woldorff, M. (2005). Effects of attention on the neural processing of harmonic syntax in Western music. Cognitive Brain Research 25(3), 678–687.
Loui, P., Kroog, K., Zuk, J., Winner, E., & Schlaug, G. (2011). Relating pitch awareness to phonemic awareness in children: Implications for tone-deafness and dyslexia. Frontiers in Psychology 2, 111. Retrieved from https://doi.org/10.3389/fpsyg.2011.00111
Loui, P., & Wessel, D. (2007). Harmonic expectation and affect in Western music: Effects of attention and training. Perception & Psychophysics 69(7), 1084–1092.
McCrae, R. R. (1987). Creativity, divergent thinking, and openness to experience. Journal of Personality and Social Psychology 52(6), 1258–1265.
McPherson, M. J., Barrett, F. S., Lopez-Gonzalez, M., Jiradejvong, P., & Limb, C. J. (2016). Emotional intent modulates the neural substrates of creativity: An fMRI study of emotionally targeted improvisation in jazz musicians. Scientific Reports 6, 18460. doi:10.1038/srep18460
Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in Broca’s area: An MEG study. Nature Neuroscience 4, 540–545.
Marmel, F., Tillmann, B., & Dowling, W. J. (2008). Tonal expectations influence pitch perception. Perception & Psychophysics 70(5), 841–852.
Mehr, S. A., Schachner, A., Katz, R. C., & Spelke, E. S. (2013). Two randomized trials provide no consistent evidence for nonmusical cognitive benefits of brief preschool music enrichment. PLoS ONE 8(12), e82007.
Millett, D. (2001). Hans Berger: From psychic energy to the EEG. Perspectives in Biology and Medicine 44(4), 522–542.
Moreno, S., Bialystok, E., Barac, R., Schellenberg, E. G., Cepeda, N. J., & Chau, T. (2011). Short-term music training enhances verbal intelligence and executive function. Psychological Science 22(11), 1425–1433.
Morillon, B., & Baillet, S. (2017). Motor origin of temporal predictions in auditory attention. Proceedings of the National Academy of Sciences 114(42), E8913–E8921.
Norgaard, M. (2017). Developing musical creativity through improvisation in the large performance classroom. Music Educators Journal 103(3), 34–39.
Norgaard, M., Spencer, J., & Montiel, M. (2013). Testing cognitive theories by creating a pattern-based probabilistic algorithm for melody and rhythm in jazz improvisation. Psychomusicology: Music, Mind, and Brain 23(4), 243–254.
Nozaradan, S., Zerouali, Y., Peretz, I., & Mouraux, A. (2013). Capturing with EEG the neural entrainment and coupling underlying sensorimotor synchronization to the beat. Cerebral Cortex 25(3), 736–747.
Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience 6, 674–681.
Patel, A. D. (2011a). Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Frontiers in Psychology 2, 142. Retrieved from https://doi.org/10.3389/fpsyg.2011.00142
Patel, A. D. (2011b). Why does musical training benefit the neural encoding of speech? A new hypothesis. Journal of the Acoustical Society of America 130, 2398. Retrieved from https://doi.org/10.1121/1.3654612
Pinho, A. L., De Manzano, O., Fransson, P., Eriksson, H., & Ullén, F. (2014). Connecting to create: Expertise in musical improvisation is associated with increased functional connectivity between premotor and prefrontal areas. Journal of Neuroscience 34(18), 6156–6163.
Povel, D.-J., & Essens, P. (1985). Perception of temporal patterns. Music Perception: An Interdisciplinary Journal 2(4), 411–440.
Przysinda, E., Zeng, T., Maves, K., Arkin, C., & Loui, P. (2017). Jazz musicians reveal role of expectancy in human creativity. Brain and Cognition 119, 45–53.
Purves, D., Cabeza, R., Huettel, S. A., Labar, K. S., Platt, M. L., Woldorff, M. G., & Brannon, E. M. (2008). Cognitive neuroscience. Sunderland: Sinauer Associates.
Rauscher, F. H., Shaw, G. L., & Ky, C. N. (1993). Music and spatial task performance. Nature 365(6447), 611.
Rentfrow, P. J., & Gosling, S. D. (2003). The do re mi’s of everyday life: The structure and personality correlates of music preferences. Journal of Personality and Social Psychology 84(6), 1236–1256.
Runco, M. A., & Jaeger, G. J. (2012). The standard definition of creativity. Creativity Research Journal 24(1), 92–96.
Sala, G., & Gobet, F. (2017a). Does far transfer exist? Negative evidence from chess, music, and working memory training. Current Directions in Psychological Science 26(6), 515–520.
Sala, G., & Gobet, F. (2017b). When the music’s over: Does music skill transfer to children’s and young adolescents’ cognitive and academic skills? A meta-analysis. Educational Research Review 20, 55–67.
Sawyer, K. (2011). The cognitive neuroscience of creativity: A critical review. Creativity Research Journal 23(2), 137–154.
Schellenberg, E. G. (2004). Music lessons enhance IQ. Psychological Science 15(8), 511–514.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal 27(3), 379–423.
Shinn-Cunningham, B. G. (2008). Object-based auditory and visual attention. Trends in Cognitive Sciences 12(5), 182–186.
Shirer, W. R., Ryali, S., Rykhlevskaia, E., Menon, V., & Greicius, M. D. (2012). Decoding subject-driven cognitive states with whole-brain connectivity patterns. Cerebral Cortex 22(1), 158–165.
Silvia, P. J., Beaty, R. E., & Nusbaum, E. C. (2013). Verbal fluency and creativity: General and specific contributions of broad retrieval ability (Gr) factors to divergent thinking. Intelligence 41(5), 328–340.
Slevc, L. R., Davey, N. S., Buschkuehl, M., & Jaeggi, S. M. (2016). Tuning the mind: Exploring the connections between musical ability and executive functions. Cognition 152, 199–211.
Slevc, L. R., & Okada, B. M. (2015). Processing structure in language and music: A case for shared reliance on cognitive control. Psychonomic Bulletin & Review 22(3), 637–652.
Slevc, L. R., Rosenberg, J. C., & Patel, A. D. (2009). Making psycholinguistics musical: Self-paced reading time evidence for shared processing of linguistic and musical syntax. Psychonomic Bulletin & Review 16(2), 374–381.
Sternberg, R. J., & Lubart, T. (1999). The concept of creativity: Prospects and paradigms. In R. Sternberg (Ed.), Handbook of creativity 1 (pp. 3–15). Cambridge: Cambridge University Press.
Sternberg, R. J., Lubart, T. I., Kaufman, J. C., & Pretz, J. E. (2005). Creativity. In K. Holyoak & R. G. Morrison (Eds.), The Cambridge handbook of thinking and reasoning (pp. 351–370). Cambridge: Cambridge University Press.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology 12(1), 97–136.
Varèse, E., & Wen-Chung, C. (1966). The liberation of sound. Perspectives of New Music 5(1), 11–19.
White-Schwoch, T., Carr, K. W., Anderson, S., Strait, D. L., & Kraus, N. (2013). Older adults benefit from music training early in life: Biological evidence for long-term training-driven plasticity. Journal of Neuroscience 33(45), 17667–17674.
Woldorff, M. G., Gallen, C. C., Hampson, S. A., Hillyard, S. A., Pantev, C., Sobel, D., & Bloom, F. E. (1993). Modulation of early sensory processing in human auditory cortex during auditory selective attention. Proceedings of the National Academy of Sciences 90(18), 8722–8726.
Woldorff, M. G., & Hillyard, S. A. (1991). Modulation of early auditory processing during selective listening to rapidly presented tones. Electroencephalography and Clinical Neurophysiology 79(3), 170–191.
Woldorff, M. G., Hillyard, S. A., Gallen, C. C., Hampson, S. R., & Bloom, F. E. (1998). Magnetoencephalographic recordings demonstrate attentional modulation of mismatch-related neural activity in human auditory cortex. Psychophysiology 35(3), 283–292.
Zeng, T., Przysinda, E., Pfeifer, C., Arkin, C., & Loui, P. (2017). Structural connectivity predicts success in musical improvisation. Under review.
Zeng, T., Przysinda, E., Pfeifer, C., Arkin, C., & Loui, P. (2018). White matter connectivity reflects success in musical improvisation. bioRxiv.
Zuk, J., Benjamin, C., Kenyon, A., & Gaab, N. (2014). Behavioral and neural correlates of executive functioning in musicians and non-musicians. PLoS ONE 9, e99868.
CHAPTER 13
NEURAL CORRELATES OF MUSIC AND EMOTION
PATRIK N. JUSLIN AND LAURA S. SAKKA
Introduction
When it comes to explaining the universal attraction of music as a human phenomenon, few aspects loom larger than the emotional responses it arouses. Music listeners may experience anything from startle reflexes and changes in arousal to discrete emotions such as happiness, sadness, interest, and nostalgia, as well as profound aesthetic emotions (Juslin, 2019). Such experiences are the “driving force” behind most people’s engagement with music, and might have far-reaching implications for their well-being and health (e.g., MacDonald, Kreutz, & Mitchell, 2012; Thaut & Wheeler, 2010). When systematic studies of music and emotion finally took off, around the millennium (Juslin & Sloboda, 2001), it was inevitable that neuropsychological research would play a role in that trend. While imaging studies could constrain psychological theorizing, psychological theories could guide imaging studies and help to organize their findings. Coinciding with a reappraisal of the role of emotion in human behavior in the neurosciences (Damasio, 1994), the end of the 1990s saw the first brain imaging studies focusing on emotions in music (Blood, Zatorre, Bermudez, & Evans, 1999).
Mapping the neural correlates of emotional responses to music turned out to be more difficult than initially expected, however. Even such a seemingly delimited domain as emotion appears to involve a wide range of subcortical and cortical areas, distributed across the brain (Koelsch, 2014); and unfortunately, the relevant brain regions do not come in neat little packages, which can be interpreted easily by researchers. Hence, to account for the neural correlates of musical emotions could turn out to be one of the great challenges in the neuroscience of music. The goal of this chapter is to offer a theoretical and empirical review of studies of the neural correlates of emotional responses to music, carried out over the last thirty-five years. The remainder of the chapter is structured as follows: First, we provide basic definitions and distinctions of the field of musical affect. Second, we present a theoretical framework, which could serve to organize the field. Third, we review seventy-eight empirical studies, published between 1982 and 2016. We distinguish different empirical approaches in these studies and draw general conclusions based on their results. Finally, we consider the implications of these findings and offer some methodological recommendations for future studies.
Musical Affect: Definitions and Distinctions
Emotions belong to the field of affect, which covers a range of phenomena. The common and defining feature is valence (i.e., the evaluation of an object, person, or event as being positive or negative). Most researchers also require a certain degree of arousal, in order to distinguish affect from purely cognitive judgments. Accordingly, musical affect could comprise anything from preference (e.g., liking a piece) and mood (a mild, objectless, and long-lasting affective state, e.g., feeling gloomy after hearing sad music in the background all morning) to aesthetic judgment (e.g., rating a composition as valuable as “art”). Most brain studies to date, however, have arguably focused on emotions, as defined by Juslin (2011, p. 114):

Emotions are relatively brief, intense, and rapidly changing reactions to potentially important events (subjective challenges or opportunities) in the external or internal environment—often of a social nature—which involve a number of subcomponents (cognitive changes, subjective feelings, expressive behavior, and action tendencies) that are more or less ‘synchronized’ during an emotional episode.
Changes in the intensity, quality, and complexity of an emotion could occur from moment to moment, and such changes can be captured in terms of shifts along emotion dimensions such as arousal and valence (Russell, 1980). However, emotions may also be analyzed in terms of qualitatively distinct categories (e.g., joy, sadness, awe, nostalgia), which remain throughout an episode (Izard, 1977). Both categorical and dimensional approaches receive some support in empirical studies (e.g., Harmon-Jones, Harmon-Jones, & Summerell, 2017), though we agree with Zentner’s (2010) view that dimensional models are ultimately unable to do justice to the richness or specificity of emotional responses to music. Most researchers in the domain seem to agree that music can influence emotions (for reviews, see Juslin & Sloboda, 2010), so the primary aim of current research is rather to understand the nature of this process: how it “works.” In the following section, we describe a framework that can serve to organize and guide research.
First, we need to make a distinction between perception and induction of emotions: We may simply perceive (or recognize) an emotion expressed in the music or we may actually feel an emotion in ourselves. The distinction is important, because different psychological processes (and hence different neural substrates) may be involved, depending on the type of process. Whenever practically feasible, it is advisable to measure multiple emotion components (self-reported feeling, expression, psychophysiology) in order to draw more valid conclusions about the occurrence of an aroused emotion. (If researchers do not find a coherent response in multiple emotion components, there is reason to suspect that “only” perception of emotion has occurred.)
Psychological Theory: A Multi-Mechanism Framework
To explain emotional responses to music, we need to uncover the psychological mechanisms that produce perceived or induced emotion. Broadly speaking, the mechanism refers to those causal processes through which an outcome is brought into being. In the present context, this involves a functional (i.e., psychological) description of what the brain is “doing” in principle (e.g., retrieving a memory). Such a process description at the psychological level must not be confused with the separate question of where in the brain the process is implemented, or with the phenomenological experience it seeks to explain (Dennett, 1987). Several authors have proposed possible mechanisms underlying perception and induction of emotions in music, typically involving one or a few possibilities (see Berlyne, 1971; Clynes, 1977; Juslin, 2001; Langer, 1957; Meyer, 1956; Scherer & Zentner, 2001; Sloboda & Juslin, 2001). Space limitations prevent us from reviewing previous work here, but a parsimonious way to organize current theory is provided by the ICINAS-BRECVEMAC framework, fully described in Juslin (2019) and briefly summarized below.
Emotion Perception
The first part of the acronym ICINAS-BRECVEMAC stands for Iconic-Intrinsic-Associative, and refers to three ways in which music carries emotional meaning. Although the case can be made that emotion perception is a more straightforward process than emotion induction, even perceived emotions may need to be decomposed into different subprocesses. Accordingly, based on the seminal distinction made by Dowling and Harwood (1986), Juslin (2013b) proposes that there are three distinct “layers” of musical expression of emotion. Each layer corresponds to a specific type of coding of emotional meaning (see Fig. 1).
FIGURE 1. Multiple-layer conceptualization of musical expression of emotions. Reproduced from Patrik N. Juslin, What does music express? Basic emotions and beyond, Frontiers in Psychology: Emotion Science 4(596), Figure 2, doi: 10.3389/fpsyg.2013.00596 © 2013 Juslin. This work is licensed under the Creative Commons Attribution License (CC BY 3.0). It is attributed to the author Patrik N. Juslin.
The core layer is based on iconically coded basic emotions. Icon refers to how music carries emotional meaning based on a formal resemblance between the music and other events that have an emotional tone (such as emotional speech and gesture). This core layer may explain findings of cross-modal parallels (Juslin & Laukka, 2003) and universal recognition of basic emotions (i.e., sadness, happiness, anger, fear, and love/tenderness) in both speech (Bryan & Barrett, 2008) and music (Fritz et al., 2009). The core layer may be extended, qualified, and even modified by two additional layers based on intrinsic and associative coding, respectively, which enable listeners to perceive also more complex or ambiguous emotions. The two additional layers are less cross-culturally invariant and depend more on the context and the listener’s individual learning (Juslin, 2019). Intrinsic coding refers to how music carries meaning based on syntactic relationships within the music itself, how one part of the music may “refer” to another part of the music, thus contributing to shifting levels of stability, tension, or arousal (“affective trajectories”; e.g., Spitzer, 2013). Associative coding, finally, refers to how music carries emotional meaning based on a more arbitrary association (e.g., temporal or spatial contiguity); a piece of music can be perceived as expressive of an emotion just because something in the music (e.g., a melodic theme) has been repeatedly linked with other emotionally meaningful events in the past, either through chance or by design (e.g., Wagner’s “Leitmotif” strategy; see Dowling & Harwood, 1986).
To illustrate this further in a musical piece, the overall emotion category or broad “emotional tone” (e.g., sadness) might be specified by iconically coded features (e.g., slow tempo, minor mode, low and often falling pitch contour, legato articulation); this basic emotion category is given “expressive shape” by intrinsically coded features (e.g., local structural features such as syncopations, dissonant intervals, and melodic appoggiaturas), creating “tension” and “release,” which contribute to more time-dependent and complex nuances of the same emotion category (e.g., sadness vs. hopelessness); to this we add the final and more personal layer of expression (e.g., that the listener associates the piece with a particular person, event, or physical location). It appears plausible that the three sources of perceived emotions, which might occur alone or in combination, involve partly different neural correlates (Juslin, 2019).
Emotion Induction
Our main focus in this chapter will be on induced emotion, which appears to be more complex in terms of its neural substrates. Here, a multimechanism framework is clearly called for. The second part of the ICINAS-BRECVEMAC acronym refers to nine psychological mechanisms for induction of emotions (listed below), which may be activated by music (and other stimuli). An evolutionary perspective on human perception of sounds suggests that the survival of our ancient ancestors depended on their ability to detect patterns in sounds, derive meaning from them, and adjust their behavior accordingly (Juslin, 2013a; cf. Hodges & Sebald, 2011). This behavioral function can be achieved in a multitude of ways, reflecting the phylogenetic origin of our emotions. The human brain did not develop from scratch. It is the result of a long evolutionary process, during which newer brain structures were gradually
imposed on older structures (Gärdenfors, 2003). Brain circuits are laid out like the concentric layers of an onion, functional layer upon functional layer. One consequence of this arrangement, which is the result of natural selection rather than design, is that emotion can be evoked at multiple levels of the brain (Juslin, 2019). Hence, the first author of this chapter has postulated a set of induction mechanisms involving (more or less) distinct brain networks, which have developed gradually and in a specific order during evolution, from simple reflexes to complex judgments. Different mechanisms rely on different kinds of mental representation (e.g., associative, analogical, sensorimotoric), which serve to guide future action. All mechanisms have in common that they can be triggered by a “musical event” (broadly defined as music, listener, and context). The mechanisms are:
• Brainstem reflex, a hard-wired attention response to subjectively “extreme” values of basic acoustic features, such as loudness, speed, and timbre (e.g., Davis, 1984); you may become startled and surprised by the loud beginning of a rock song during a live concert.
• Rhythmic entrainment, a gradual adjustment of an internal body rhythm, such as heart rate, towards an external rhythm in the music (e.g., Harrer & Harrer, 1977); you may experience excitement when your heart rate is becoming gradually synchronized with a captivating and slightly faster rhythm in a piece of techno music at a nightclub.
• Evaluative conditioning, a regular pairing of a piece of music and other positive or negative stimuli leading to a conditioned association (e.g., Blair & Shimp, 1992); you may feel happy when you happen to hear a song that has repeatedly occurred in festive contexts previously.
• Contagion, an internal “mimicry” of the perceived voice-like emotional expression of the music (e.g., Juslin, 2001); you may experience sadness when you hear a slow, quiet, low-pitched performance of a classical piece on the cello, featuring much vibrato and rubato.
• Visual imagery, inner images of an emotional character conjured up by the listener through a metaphorical mapping of the musical structure (Osborne, 1980); you may become relaxed when you indulge in mental images of a landscape suggested by a piece of “new-age” music.
• Episodic memory, a conscious recollection of a particular event from the listener’s past that is “triggered” by the music (Baumgartner, 1992); you may experience nostalgia when a song evokes a vivid personal memory from the specific time you met your current partner in life.
• Musical expectancy, a response to the gradual unfolding of the syntactical structure of the music, and its expected or unexpected continuations (Meyer, 1956); you may feel anxious due to uncertainty created by phrases without a clear tonal center in an “avant-garde” piece.
• Aesthetic judgment, a subjective evaluation of the aesthetic value of the music, based on an individual set of weighted criteria (Juslin, 2013a); you may take pleasure in the exceptional beauty of a Bach composition, or may admire the exceptional skills of a great performer.
In addition to these eight mechanisms, music can also arouse emotions through the default mechanism for induction of emotions: Cognitive goal appraisal (Scherer, 1999). You may become annoyed when a neighbor plays music late at night, blocking your goal of going to sleep. Cognitive appraisal appears less important in musical settings, however (Juslin, Liljeström, Västfjäll, Barradas, & Silva, 2008). For further elaboration and predictions for each mechanism, see Juslin (2019). One implication of the framework is that before one can understand an emotion in any given situation, it is necessary to know which of these mechanisms is in operation. This is because each mechanism has its own process characteristics, in terms of information focus, key brain regions, degree of cultural impact and learning, ontogenetic development, induced emotions, induction speed, availability to consciousness, dependence on musical structure, and so forth. Armed with these theoretical principles of music and emotion, we are ready to take a look at the empirical work carried out to date. Our review will be restricted to studies that explicitly focus on musical affect. (Aesthetic responses are reviewed in Chapter 15, this volume.)
R
E
S
General Overview
In this section, we summarize seventy-eight neuropsychological studies, published between 1982 and 2016 (see Appendix table). Studies have been grouped with regard to methodology: PET/fMRI (38 studies, 49 percent), EEG (22 studies, 28 percent), lesions (16 studies, 20 percent), and dichotic listening (2 studies, 3 percent). They are described in terms of listeners, musical stimuli, contrast/design, method, main findings, and type of affect (e.g., measuring induced vs. perceived emotions; categories, dimensions, preferences). The categorization concerning induced vs. perceived emotion is not entirely straightforward, because brain studies do not always distinguish the processes in the design. (Previous reviews of the field have tended to inter-mix studies that focus on different aspects, induced vs. perceived emotion.) Sample size varies depending on method (PET/fMRI: M = 16.31; EEG: M = 32.00; lesions: M = 14.44; dichotic listening: M = 18.00) but tends to be relatively small overall. Note that PET/fMRI and EEG studies have focused mostly on induced emotion, whereas lesion and dichotic listening studies have focused mostly on perceived emotion. Blood flow studies have used mostly fMRI (as opposed to PET) and “real” (as opposed to synthesized) music, and have mostly adopted dimensional (66 percent) as opposed to discrete (34 percent) approaches to emotion. EEG studies have also used mostly “real” music, but have adopted dimensional (34 percent) and discrete (31 percent) approaches to a roughly equal degree. Lesion studies have (in contrast to other studies) used mainly synthesized music, and have mostly studied discrete emotions (75 percent), rather than dimensions (38 percent). Such differences between studies that use different methods should clearly be kept in mind when interpreting the overall results.
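The percentages reported for the four methods can be checked against the study counts with a few lines of arithmetic. The sketch below is an illustration only, not part of the review itself: it assumes that the shares were rounded so as to sum to 100 percent, and it uses largest-remainder rounding, which is one common way of doing so (the function name is hypothetical). Under that assumption, the counts reproduce the reported figures.

```python
from math import floor

def largest_remainder_percentages(counts):
    """Round shares to whole percentages that sum to exactly 100.

    Assumption (not stated in the chapter): the reported percentages were
    adjusted to sum to 100; largest-remainder rounding is one way to do that.
    """
    total = sum(counts.values())
    exact = {name: 100 * n / total for name, n in counts.items()}
    rounded = {name: floor(share) for name, share in exact.items()}
    leftover = 100 - sum(rounded.values())
    # Hand the remaining points to the shares with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - rounded[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        rounded[name] += 1
    return rounded

# Study counts as given in the text (78 studies in total).
counts = {"PET/fMRI": 38, "EEG": 22, "lesions": 16, "dichotic listening": 2}
percentages = largest_remainder_percentages(counts)
print(percentages)
# {'PET/fMRI': 49, 'EEG': 28, 'lesions': 20, 'dichotic listening': 3}
```

Rounding each share independently would instead give 21 percent for lesion studies (16 of 78 is about 20.5 percent) and a total of 101 percent.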
Empirical Approaches
The “contrast” and “emotion” columns in the appendix table are suggestive of the kind of empirical approach adopted in the study. Some early studies tended to use an open-ended exploratory approach, which simply presents listeners with supposedly “emotional” music, to see which regions might be affected. Although such an approach was defensible in the early stages, it makes it difficult to interpret the results (e.g., “It is not possible to disentangle the different subcomponents of the activation due to limitations of this experimental design,” Alfredson, Risberg, Hagberg, & Gustafson, 2004, p. 165). Thus, for instance, it may not be clear whether the study has measured perceived or induced emotion, in the absence of control conditions or converging measures. We identify at least five possible approaches in the neuropsychological study of emotions, which can serve different aims. These have been adopted, implicitly or explicitly, in music studies to highly varying degrees. We briefly summarize these approaches, before looking closer at the actual data.
1. A first approach appears to serve mainly to demonstrate that stimuli do arouse emotions by comparing the results to previous studies of emotions. Although most musicians and listeners would seem to take the emotional powers of music for granted, it has been the matter of some controversy whether music really evokes emotions (Kivy, 1990). A landmark study by Blood and Zatorre (2001) revealed, for the first time, that pleasurable responses to music influence “core” regions of the brain already linked to emotion, such as the amygdala, the hippocampus, and the ventral striatum. The demonstration of blood-flow changes in such regions appeared to make musical emotions more “real,” in the eyes of some observers.
But data of this kind were sometimes oversold: a lot was made of the finding that enjoyment of music involves the same “reward circuits” in the brain as other forms of pleasure such as food, sex, and drugs (e.g., the nucleus accumbens); yet this discovery is not that surprising. It would have been far more surprising to discover unique “reward circuits” only for music. The major conclusion of this approach is that “the brain areas affected by emotions to music are similar to those reported in other brain studies of emotion.”
2. A second approach speaks to the previously discussed distinction between perceived and induced emotions. A meta-analysis of PET and fMRI studies of perception and induction of emotion in general (outside music) by Wager et al. (2008) concluded that the two processes involve peak activations of different brain regions, supporting the idea that these are distinct processes. Some authors argue that the processes can be distinguished in terms of prefrontal activation, such that perceived emotion activates mainly the right hemisphere (regardless of the emotion) whereas evoked emotion is lateralized according to valence: positive emotions in the left hemisphere, negative in the right (e.g., Blonder, 1999; Davidson, 1995). To the best of our knowledge, no music study thus far has directly contrasted perception and induction of emotion, but attempts to interpret data along those lines have been made (Juslin & Sloboda, 2001, p. 456). We review further evidence below. The preliminary conclusion of this approach is that “perception and induction of emotions may involve different patterns of brain activation.”
3. A third approach, already hinted at above, aims mainly to discriminate neural patterns of affective responses with regard to their valence (positive/negative). This approach has been adopted by several studies in the general emotion field. For instance, Chikazoe and colleagues (Chikazoe, Lee, Kriegeskorte, & Anderson, 2014) were able to find particular patterns with significant correlations to the degree of positive or negative valence experienced by subjects. A similar approach is often used in music. In fact, in our estimation, the use of an explicit (“positive vs. negative,” “pleasant vs. unpleasant”) or implicit (“happy vs. sad,” “consonant vs. dissonant”) valence dimension is the most common approach in blood-flow studies.
Several studies indicate that positive affect is handled in the left hemisphere, whereas negative affect is handled in the right (see Altenmüller, Schurmann, Lim, & Parlitz, 2002; Daly et al., 2014; Flores-Gutiérrez et al., 2007; Schmidt & Trainor, 2001; Tsang, Trainor, Santesso, Tasker, & Schmidt, 2001). Not all of the studies seem to follow this pattern, however—at least on first view. A problem is that in some cases, it is difficult to know for sure
whether a study has measured perceived or evoked emotion, since multicomponent indices were not used. For instance, it is an open question whether ratings of pleasantness of music in some studies are just that (ratings of the stimuli) or whether they index feelings of pleasure. If there is insufficient control over which process is actually elicited in studies, this can explain the mixed findings. We submit that the results suggest some degree of specificity in terms of valence, but the nature of these patterns and their interpretation remain contested. Yet, a preliminary conclusion of this approach is that “neural correlates can distinguish the valence of musically aroused affect.”
4. A fourth approach seeks to obtain links between discrete emotions and neural structures. This is part of an ongoing debate about whether there is emotion-specificity in responding more generally. Some neuroscientists claim to have been able to distinguish neural activity in terms of discrete emotions (see Damasio et al., 2000; Kassam, Markey, Cherkassky, Loewenstein, & Just, 2013; Murphy, Nimmo-Smith, & Lawrence, 2003; Saarimäki et al., 2016). We should clearly acknowledge, however, that the hypothesis of emotion-specific activation remains controversial. A recent review failed to obtain any evidence that discrete emotions can be consistently localized to distinct brain regions (cf. Clark-Polner, Wager, Satpute, & Barrett, 2016). This is sometimes cited as evidence against a discrete emotions approach. However, the very same review also failed to obtain evidence of specific regions linked with dimensions such as valence. Hence, the authors argue that the localization hypothesis for affective states, whether discrete or dimensional, is flawed in general. Previous neuropsychological studies may, indeed, have been too eager to localize particular emotions in specific parts of the brain.
Some tendencies in a similar direction may be found in music research also—for instance, linking the amygdala to fear perception (Peretz, 2001), and the hippocampus to tender emotions (Koelsch, 2014), although both these structures are clearly involved in a much wider range of emotions. There is a risk here that neuroscientists “claim” certain areas as “music-specific” or “emotion-specific” when, in fact, they are neither.
In our view, both the proponents and critics of the emotion-specificity approach have tended to confuse causal mechanisms with affective outcomes: there is no reason to assume emotion specificity in the former (e.g., a “memory area” may be active across emotions), even though there is specificity in the felt emotions (nostalgia vs. awe). In a meta-analysis, Lindquist and colleagues (Lindquist, Wager, Kober, Bliss-Moreau, & Barrett, 2012) observed a set of interacting brain regions commonly involved in basic psychological operations of both an emotional and nonemotional nature during emotion experience, across a range of discrete emotion categories. The authors argue that this finding is consistent with a “constructive” approach to emotion (Barrett, 2017). However, it is equally consistent with the BRECVEMAC framework presented earlier. The major conclusion of this approach, then, is that “although there may be some limited level of emotion specificity in regions linked to conscious emotional experience, most areas involve domain-general processes (such as memory) which are active not only during emotions.” This, then, leads us to the fifth and final approach.
5. The fifth approach focuses on underlying psychological processes or brain functions; that is, mechanisms (e.g., Cabeza & Nyberg, 2000). By carefully isolating distinct psychological processes in the experimental design, one can link neural correlates to mental functions. For example, episodic memories might involve a partly distinct brain network from conditioned responses. This approach is the “essence” of neuropsychology and has been successful in the neurosciences more generally. Yet this approach is still rare in the music field (Janata, 2009; Steinbeis, Koelsch, & Sloboda, 2006). Over time, one may discern a change from basic lateralization studies (e.g., dichotic listening) and a search for individual brain structures to a consideration of more complex and distributed networks.
But we are not aware of any study of neural correlates so far that contrasts different psychological mechanisms. (We will consider such an approach later in the chapter.) Thus, the increasing awareness of the role of mechanisms has not yet translated into concrete designs. This becomes clear when we take a closer look at the findings.
Summary of Brain Imaging Data
At the current stage, the data that are potentially most informative when it comes to pinpointing neural correlates of musical emotion come from the (38) brain imaging studies conducted to date. Tables 1 and 2 summarize the main findings for perceived and induced emotion, respectively, in terms of broad brain areas for which blood-flow changes have been reported. Ideally, the interpretation should be made in terms of “networks” (Bressler & Menon, 2010), rather than “isolated” regions, but current results do not yet enable such interpretations.
Some broad conclusions can be drawn based on the findings. First, music listening can cause changes in blood flow in “core” regions for emotional processing. Second, as noted by Peretz (2010, p. 119), “there is not a single, unitary emotional system underlying all emotional responses to music.” On the contrary, a fairly broad range of cortical and subcortical brain regions seem to be linked to musical emotions. Most of these belong to the (extended) limbic system and include the amygdala, the hippocampus, the striatum (including nucleus accumbens), the cingulate cortex, the insula, the prefrontal and orbitofrontal cortex, the cerebellum, the frontal gyrus, the parahippocampal gyrus, and various brainstem structures. The data in Tables 1 and 2 also enable us to compare induced and perceived emotions. As may be seen, there is some overlap between the
brain regions reported. This could reflect two things: (a) that there is some extent of overlap in the neural correlates of these processes or (b) that studies have not sufficiently distinguished between the processes—such that some studies that ostensibly focus on induced emotion have measured perceived emotion and vice versa; or that some studies measure both processes at the same time—leading to “noisy” data. Few studies have measured multiple components of emotion so as to enhance the validity of conclusions about induced emotions (discussed at the beginning of this chapter). In any case, note that there are certain differences in the findings for the two processes: Only for induction of emotion have several studies reported changes in the amygdala, the striatum (including nucleus accumbens), and the hippocampus. At least some of these areas may thus distinguish induced emotions from mere perception, though studies that directly contrast the two processes under controlled conditions are clearly required to confirm this hypothesis.1 Beyond these simple and relatively trivial conclusions, interpretations of the findings tend to become more difficult and “impressionistic” in nature. Given a general lack of “process-pure” manipulations of mechanisms, researchers have to rely on “informed speculations” about the possible role of different brain structures and networks. These are typically based on general knowledge of the brain, but tend to be relatively vague. This is because the analyses involve very broad brain areas which have been proposed to be involved in a wide range of different psychological processes; that is, they have poor “selectivity” (Poldrack, 2006) when it comes to “revealing” specific psychological processes. Koelsch (2014, p. 
172) submits that observed changes in the amygdala “could be because music is perceived as a stimulus with social significance owing to its communicative properties.” This is, indeed, one possibility— but we really do not know. And even if this notion is correct, it does not offer very precise information about the functional role of the amygdala. An additional problem is that this form of “reverse inference” about cognitive process is not deductively valid (Poldrack, 2006). Normally, we would infer from brain imaging data that “when cognitive process X is active, then brain area Y is active”—not the other way around. Let us be clear: This is not a matter of competence. Informed speculations and interpretations by distinguished neuroscientists like Stefan
Koelsch or Isabelle Peretz are as good as they get. The problem is rather that in the absence of process-specific experimental manipulation in the field as a whole, theoretical interpretations are rendered difficult for a number of reasons. A first problem is that brain imaging “cannot disentangle correlation from causation” (Peretz, 2010, p. 114); a related problem is that results from imaging studies tend to be “overinclusive” (Peretz, 2010, p. 114); therefore, “it is not always easy to determine if the activity is related to emotional or non-emotional processing of the musical structure” (Peretz, 2010, p. 112). Indeed, the same brain structure can serve different roles both within and across domains (Kreutz & Lotze, 2007). In addition, as implied by the ICINAS-BRECVEMAC framework, cognition and emotion are not neatly separated in the brain: specific cognitive processes may be involved depending on the mechanism responsible for the perceived or induced emotion. The specific listener task (self-report of felt affect, ratings of melodies, or mere listening) may also affect the patterns of brain activation/deactivation, and so may differences with respect to the music stimuli (“real” vs. “synthesized” music, “familiar” vs. “unfamiliar,” “self-selected” vs. “experimenter-selected”). All of these issues conspire to make interpretations of findings from brain imaging studies problematic. This has not prevented researchers from suggesting how to organize the findings with regard to the processes of perception and induction, respectively.
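The logic of the “reverse inference” problem (Poldrack, 2006) can be made concrete with Bayes' rule: the probability that a psychological process is engaged, given that a region is active, depends on how often that region is also active when the process is not engaged. The following is a minimal sketch of that reasoning. All probabilities are hypothetical values chosen for illustration, not estimates from any study, and the function name is our own.

```python
def reverse_inference(p_act_given_proc, p_act_given_not, prior):
    """Return P(process engaged | region active) via Bayes' rule.

    p_act_given_proc: P(region active | process engaged)
    p_act_given_not:  P(region active | process not engaged)
    prior:            P(process engaged) before seeing the imaging data
    """
    evidence = p_act_given_proc * prior + p_act_given_not * (1.0 - prior)
    return p_act_given_proc * prior / evidence

# Hypothetical numbers: the region responds to the process 80% of the time.
# A poorly selective region is also active in 60% of tasks without the process;
# a highly selective region in only 10%.
poorly_selective = reverse_inference(0.8, 0.6, 0.5)
highly_selective = reverse_inference(0.8, 0.1, 0.5)

print(round(poorly_selective, 2))  # 0.57
print(round(highly_selective, 2))  # 0.89
```

With these illustrative values, activation in the poorly selective region barely moves the estimate above the prior of 0.5, which is the sense in which broad areas such as the amygdala offer weak grounds for inferring a specific process.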
Perception of Emotions
Double brain dissociations between emotional judgments and melody recognition (Peretz & Gagnon, 1999), and between emotional judgments and basic music perception (Peretz, Gagnon, & Bouchard, 1998), initially led Peretz (2001) to postulate an “emotion module,” dedicated to perception of emotion in music. Subsequently, she proposed that a more distributed network, originally evolved to process vocal emotions, has been “invaded” by music, such that emotional speech and emotional music will share neural resources (Peretz, 2010). This idea has received some support (Escoffier, Zhong, Schirmer, & Qiu, 2013) and is in line with documented
parallels in emotions between speech and music (Juslin & Laukka, 2003). Studies on emotions in speech suggest a network of areas primarily in the (right) frontal and parietal lobes, including the inferior frontal gyrus (Schirmer & Kotz, 2006). The possibility of cross-modal parallels can be explored in the context of the present results (Table 1). For perceived emotions, the most frequently reported regions are frontal areas (73 percent of studies) and the frontal gyrus (45 percent). Note that Escoffier et al. (2013) found that tracing of emotions in both speech and music was related to activity in the medial superior frontal gyrus (SFG). Moreover, Nair, Large, Steinberg, and Kelso (2002) discovered that listening to expressive (as compared to “mechanical”) music performances increased activity in the right inferior frontal gyrus. These findings seem consistent with the “shared-resources hypothesis” (further evidence of a shared neural code was recently reported by Paquette, Takerkart, Saget, Peretz, & Belin, 2018). There are some additional brain regions implicated in emotion-perception studies. Curiously, there are three studies (27 percent) that report changes in the cerebellum during perceived emotion, and five studies (45 percent) that report changes in the anterior cingulate cortex (which occurs also in evoked emotion; cf. Table 2). We return to these findings later. It has further been argued that the perception of dissonance is linked to the parahippocampal gyrus (Blood et al., 1999). This notion receives support from lesion studies showing that this basic ability suffers after damage to the parahippocampal gyrus (Gosselin et al., 2006). Only two of eleven studies (18 percent) report changes in the amygdala (see Table 1), though it has been found that recognition of “scary” music suffers after damage to the amygdala (Gosselin et al., 2005).
It cannot be completely ruled out that the two studies really measured evoked emotion rather than just perceived (since they featured unpleasant stimuli that may have evoked some negative emotion). In summary, the most consistent results are that perception of emotions in music involves the frontal cortex and the frontal gyrus—and, perhaps, some right hemisphere lateralization (Bryden, Ley, & Sugarman, 1982).
Induction of Emotions
For induction of emotions, a larger number of brain regions have been reported (Table 2). The most frequently reported areas include the amygdala (63 percent of studies), the frontal cortex (70 percent; Pfc 37 percent), the ventral striatum/NAc (44 percent), the hippocampus (52 percent), the insula (48 percent), and the anterior cingulate cortex (41 percent). However, note that the results vary a lot from study to study, in ways that are not easy to explain. For example, it may be seen in Table 2 that there are numerous additional regions that were reported in only one or a few studies. These include the parahippocampus, the thalamus, the basal ganglia, the cerebellum, motor regions, and the brainstem. One approach to this problem is to look for areas that are consistently activated across studies in the hope that this will reveal an emotion network that is invariably involved in the process. Thus, for instance, Koelsch and colleagues (Koelsch, Siebel, & Fritz, 2010) argue that a network consisting of the amygdala, the hippocampus, the parahippocampus, the temporal poles, and the pregenual cingulate cortex may play a consistent role in emotional processing of music. But is there support for the idea of a set of brain regions that are consistently activated? Close inspection of Table 2 reveals that few brain regions are reported in more than about half of the studies which purported to measure induced emotions. If areas are not consistently found to be influenced, how is this to be interpreted? Some of this variability is surely due to methodological problems and consequent measurement error. This could include differences in how regions of interest (ROI) are defined, or in the assumptions made in the analysis. But assuming that limbic regions were “prime suspects” in the analyses, the variability is still too large to be accounted for by (only) this factor. 
In principle, one may argue that if these studies have tried to measure emotion and the listed regions are not consistently activated across studies, then either these areas are not related to emotions, or these studies have not consistently managed to induce any emotion. However, a different interpretation suggested by the BRECVEMAC framework (and supported by meta-analyses of “general” emotion findings; cf. Lindquist et al., 2012) is that the variability is due to different psychological mechanisms being activated in different investigations (depending on the musical stimuli, the listeners, and the situation, as well as the experimental procedure). This possibility is elaborated in the following section.
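The consistency argument can be illustrated with the percentages quoted in the text for induced emotion. The sketch below simply tabulates those reported values and picks out the regions reported in more than half of the studies. The 50 percent threshold and the helper name are our own illustrative choices, not a criterion used in the review.

```python
# Percentage of induced-emotion imaging studies reporting each region,
# as quoted in the text (see Table 2 of the chapter).
reported_percent = {
    "amygdala": 63,
    "frontal cortex": 70,
    "ventral striatum/NAc": 44,
    "hippocampus": 52,
    "insula": 48,
    "anterior cingulate cortex": 41,
}

def regions_above(threshold, table):
    """List regions reported in more than `threshold` percent of studies,
    ordered from most to least frequently reported."""
    selected = [region for region, pct in table.items() if pct > threshold]
    return sorted(selected, key=lambda region: table[region], reverse=True)

majority = regions_above(50, reported_percent)
print(majority)
# ['frontal cortex', 'amygdala', 'hippocampus']
```

On these figures only three areas pass the threshold, and the hippocampus (52 percent) does so only barely, which is in keeping with the observation that few regions are reported in more than about half of the studies.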
T
M
P
A
If neuropsychology “aims to relate neural mechanisms to mental functions” (Peretz, 2010, p. 99), and most previous studies have not tried to manipulate mechanisms that involve distinct mental functions (discussed earlier), it is hard to resist the conclusion that studies in this field have somehow attempted to do neuropsychology, although without the psychology. There is one exception: Janata (2009) focused specifically on the process of autobiographical memory and found that dorsal regions of the medial prefrontal cortex responded to the relative degree of autobiographical salience of musical stimuli (rated post-hoc). We believe that a more principled approach, which aims to target specific mechanisms, might lead to more interpretable results (Juslin, Barradas, & Eerola, 2015; Juslin, Harmat, & Eerola, 2014). Based on the assumptions that most studies of musical emotion have lacked the needed specificity, in terms of stimulus manipulation and procedures, to separate different underlying mechanisms, and that neuroscience studies in general psychology have reached a higher level of theoretical sophistication, we propose hypotheses from various sub-domains (e.g., memory, imagery, language). These might be tested in designs that manipulate specific mechanisms, in a humble attempt to uncover more mechanism-specific brain networks (Juslin, 2019). Emotional responses to music can be expected to involve three general types of brain regions: (1) brain regions always involved during music perception (e.g., the primary auditory cortex), (2) regions always involved in the conscious experience of emotion, regardless of the “source” of the emotion (candidates may include the rostral anterior cingulate and the medial prefrontal cortex; see, e.g., Lane, 2000, pp. 356–358), and (3) regions involved in information-processing that differs depending on the mechanism that caused the emotion. 
The last category of regions may involve processes (e.g., syntactic processing, episodic memory) that do not in themselves imply that emotions have been aroused: They may also occur in the absence of emotions (e.g., Pessoa, 2013). Based on these notions, we propose the following (preliminary) hypotheses for emotion induction. (Neural correlates of aesthetic judgments are discussed in Chapter 15, this volume).
Brainstem reflexes involve the reticulospinal tract, which travels from the reticular formation of the brainstem, and the intralaminar nuclei of the thalamus (Davis, 1984; Kinomura, Larsson, Gulyás, & Roland, 1996). “Alarm signals” to auditory events can be emitted as early as at the level of the inferior colliculus of the brainstem (Brandao, Melo, & Cardoso, 1993), producing startle reflexes and increased arousal. Studies show that the reticulospinal tract is required for the acoustic startle response, because lesions in this tract abolish the response (Boulis, Kehne, Miserendino, & Davis, 1990). Yet, although the neural circuitry that “mediates” the acoustic startle is located entirely within the brainstem, the system can be modulated by higher neural tracts (Miserendino, Sananes, Melia, & Davis, 1990). Rhythmic entrainment has been less examined, but could involve neural oscillation patterns to rhythmic stimulation in early auditory areas, motor areas (sensorimotor cortex, supplementary motor area), the cerebellum, and the basal ganglia (see Fujioka, Trainor, Large, & Ross, 2012; Tierney & Kraus, 2013; Trost et al., 2014), perhaps primed early on by reticulospinal pathways in the brainstem (Rossignol & Melvill Jones, 1976). The cerebellum could be particularly important in “active” entrainment (coordination of a motor response; e.g., Grahn, Henry, & McAuley, 2011), whereas the caudate nucleus of the basal ganglia could be the crucial area during “passive” entrainment to auditory stimulation (Trost et al., 2014). Evaluative conditioning (EC) involves particularly the lateral nucleus of the amygdala and the interpositus nucleus of the cerebellum (e.g., Fanselow & Poulos, 2005; Johnsrude, Owen, White, Zhao, & Bohbot, 2000; Sacchetti, Scelfo, & Strata, 2005). Hippocampal activation may also occur, if the EC depends strongly on the context, but only the amygdala seems to be required for EC to occur (LeDoux, 2000).
The timing of the delivery of the CS and US used in conditioning is important, which may explain why the cerebellum is active in conditioning (like another time-dependent process, rhythmic entrainment). We argue that the amygdala is mainly involved in the evaluation of the stimulus whereas the cerebellum is involved in the timing of the response (Cabeza & Nyberg, 2000). Emotional contagion from music will presumably include brain regions for the perception of emotions from the voice (and, hence, presumably of emotions from voice-like characteristics of music), mainly right-lateralized inferior frontal areas (including the frontal gyrus) and the basal ganglia (Adolphs, Damasio, & Tranel, 2002; Paulmann, Ott, & Kotz, 2011; Schirmer & Kotz, 2006), and also “mirror neurons” in premotor regions, in particular regions involved in perceiving emotional vocalizations (e.g., Paquette et al., 2018; Warren et al., 2006; cf. Koelsch, Fritz, von Cramon, Müller, & Friederici, 2006).
Visual imagery involves visual representations in the occipital lobe that are spatially mapped and activated in a “top-down” manner during imagery (Charlot, Tzourio, Zilbovicius, Mazoyer, & Denis, 1992; Goldenberg, Podreka, Steiner, Franzén, & Deecke, 1991). This requires the intervention of an attention-demanding process of image generation, which appears to have a left temporo-occipital localization (e.g., Farah, 2000). Self-reported imagery vividness correlates with activation of the visual cortex in imaging studies (Cui, Jeter, Yang, Montague, & Eagleman, 2007), and the visual cortex may also be activated during music listening (e.g., Thornton-Wells et al., 2010). Episodic memory can be divided into various stages (e.g., encoding, retrieval). The conscious experience of recollection of an episodic memory seems to involve the medial temporal lobe, especially the hippocampus (e.g., Nyberg, McIntosh, Houle, Nilsson, & Tulving, 1996) and the medial prefrontal cortex (Gilboa, 2004; for similar results in music, see Janata, 2009). Additional areas correlated with episodic memory retrieval include the precuneus (Wagner, Shannon, Kahn, & Buckner, 2005), the entorhinal cortex (Haist, Gore, & Mao, 2001), and the amygdala (in the case of emotional memories; Dolcos, LaBar, & Cabeza, 2005). Musical expectancy refers to expectancies that involve syntactical relationships between different parts of the musical structure (Meyer, 1956), somewhat akin to syntax in language.
Lesion studies indicate that several areas of the left perisylvian cortex are involved in various aspects of syntactical processing (Brown, Hagoort, & Kutas, 2000), and parts of Broca’s area increase their activity when sentences increase in syntactical complexity (Caplan, Alpert, & Waters, 1998; Stromswold, Caplan, Alpert, & Rauch, 1996; for music, see Maess, Koelsch, Gunter, & Friederici, 2001). Musical expectancy also involves monitoring of conflicts between expected and actual music sequences. This may recruit parts of the anterior cingulate (Botvinick, Cohen, & Carter, 2004) or orbitofrontal cortex (Koelsch, 2014). It should be noted that nearly all of the brain regions proposed above have been reported in at least one imaging study of music listening, and many have been reported frequently.
Detailed predictions for neural correlates of emotion perception, based on the ICINAS-BRECVEMAC framework (Juslin, 2019), have not been proposed earlier, but the reported blood-flow changes are at least consistent with sources of perceived emotions in terms of iconic similarity with emotional speech (e.g., the right frontal gyrus), intrinsically coded tension in musical structure (e.g., the anterior cingulate cortex), and associative coding based on classical conditioning (e.g., the cerebellum; see Table 1). Overlapping brain areas between evoked and perceived emotion (Tables 1 and 2) could reflect similar processes, such as emotion perception (prefrontal brain areas, involved in the induction mechanism contagion) and conflict monitoring (the anterior cingulate cortex, which is involved in both intrinsic sources of perceived emotions and the expectancy mechanism for emotion induction). We emphasize, however, that all “post-hoc” speculations of this type must be treated with caution: The relevant distinctions between processes must be made at the stage of experimental design (Juslin et al., 2014, 2015), rather than in the interpretations afterwards.
Concluding Remarks: A Field in Need of an Agenda?
Nearly a decade ago, Peretz (2010) observed that the neuropsychology of music and emotion was in its infancy. Yet she seemed optimistic: “It is remarkable how much progress has been accomplished over the last decade” (Peretz, 2010, p. 119). We take a slightly more pessimistic view of the current state of the art: the field may have become a “toddler,” but the results are fragmented. Most studies seem to make sense when considered on their own, but the different studies do not add up to a consistent “big picture.” When it comes to understanding which brain regions are involved in music and emotion, and their respective roles in the underlying processes, it is not obvious that the field has advanced much since the seminal studies carried out nearly twenty years ago (Blood & Zatorre, 2001). Yes, some brain areas have been (more or less) consistently reported across different studies, but we still do not know which roles they play.
We suggest that this reflects the lack of a systematic research program, which truly attempts to link specific psychological processes to brain networks. The previously outlined ICINAS-BRECVEMAC framework provides one promising way to address this issue. We recognize, however, that there may be other ways of “slicing the pie.” The important thing is that we do not try to eat the pie randomly, because that is bound to get messy. We argue that future research designs need to become increasingly sensitive to psychological process distinctions. To this end, we propose three ways of enhancing progress in the domain: (1) actively manipulating and contrasting different psychological mechanisms in the same experimental design (cf. Juslin et al., 2014); (2) employing convergent measures to support conclusions about the engagement of each mechanism (see Juslin et al., 2015, Table 7) and about whether perception or induction of emotion has occurred (e.g., Lundqvist, Carlsson, Hilmersson, & Juslin, 2009); (3) analyzing sets of regions as networks (as opposed to analyzing single regions), in order to increase the selectivity of response in the brain region of interest (Poldrack, 2006; cf. Koelsch, Skouras, & Lohmann, 2018). We also recommend the use of more systematic control conditions (i.e., contrasts) to rule out “alternative interpretations”—contrasting different mechanisms not only with one another, but also with “non-emotional” music listening, listening to “mere sounds,” to silence, etc., in order to isolate the brain networks that are selectively involved in: music listening per se; emotions in general; specific emotion categories and dimensions; psychological mechanisms; and more domain-general cognitive processes (e.g., attention). 
One hitherto unexplored possibility is to use transcranial magnetic stimulation (Pascual-Leone, Davey, Rothwell, Wassermann, & Puri, 2002) to disrupt brain activity at crucial times and locations to prevent mechanisms from becoming activated by music events.
Some scholars argue that an understanding of musical emotions is important in order to better understand emotions in general (Koelsch et al., 2010). Indeed, because music engages so fully with our emotions, music can sometimes reveal the nature of our “emotional machinery” more clearly than the stimuli normally used to study emotions. The fact that music appears to be so “abstract,” meaning that our “post-hoc” rationalizations for emotions cannot be made to easily fit, may help us to think more clearly about the “true” causes of our emotions (Juslin, 2019). Current theory of music and emotion suggests that responses are mediated by a wide range of mechanisms, rather than just cognitive appraisal. However, there is an unfortunate disconnect between theory in the field and empirical studies of the neural correlates, which prevents brain studies from realizing their full potential; when psychological theory becomes reflected in the experimental design of brain imaging studies, that is when things are bound to get exciting.
References
(Articles marked * are included in the empirical review)
Adolphs, R., Damasio, H., & Tranel, D. (2002). Neural systems for recognition of emotional prosody: A 3-D lesion study. Emotion 2, 23–51. *Alfredson, B. B., Risberg, J., Hagberg, B., & Gustafson, L. (2004). Right temporal lobe activation when listening to emotionally significant music. Applied Neuropsychology 11, 161–166. *Altenmüller, E., Schurmann, K., Lim, V. K., & Parlitz, D. (2002). Hits to the left, flops to the right: Different emotions during listening to music are reflected in cortical lateralisation patterns.
Neuropsychologia 40, 2242–2256. *Ball, T., Rahm, B., Eickhoff, S. B., Schulze-Bonhage, A., Speck, O., & Mutschler, I. (2007). Response properties of human amygdala subregions: Evidence based on functional MRI combined with probabilistic anatomical maps. PLoS ONE 2, e307. Barrett, L. F. (2017). How emotions are made: The secret life of the brain. Boston: Houghton Mifflin Harcourt. Baumgartner, H. (1992). Remembrance of things past: Music, autobiographical memory, and emotion. Advances in Consumer Research 19, 613–620. *Baumgartner, T., Esslen, M., & Jäncke, L. (2005). From emotion perception to emotion experience: Emotions evoked by pictures and classical music. International Journal of Psychophysiology 60, 34–43. *Baumgartner, T., Lutz, K., Schmidt, C. F., & Jäncke, L. (2006). The emotional power of music: How music enhances the feeling of affective pictures. Brain Research 1075, 151–164. Berlyne, D. E. (1971). Aesthetics and psychobiology. New York: Appleton Century Crofts. Blair, M. E., & Shimp, T. A. (1992). Consequences of an unpleasant experience with music: A second-order negative conditioning perspective. Journal of Advertising 21, 35–43. Blonder, L. X. (1999). Brain and emotion relations in culturally diverse populations. In A. L. Hinton (Ed.), Biocultural approaches to the emotions (pp. 275–296). Cambridge: Cambridge University Press. *Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences 98, 11818–11823. *Blood, A. J., Zatorre, R. J., Bermudez, P., & Evans, A. C. (1999). Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nature Neuroscience 2, 382–387. Botvinick, M. M., Cohen, J. D., & Carter, C. S. (2004). Conflict monitoring and anterior cingulate cortex. Trends in Cognitive Sciences 8, 539–546. Boulis, N. M., Kehne, J. H., Miserendino, M. J. 
D., & Davis, M. (1990). Differential blockade of early and late components of acoustic startle following intrathecal infusion of 6-cyano-7-nitroquinoxaline-2,3-dione (CNQX) or D,L-2-amino-5-phosphonovaleric acid (AP-5). Brain Research 520, 240–246. Brandao, M. L., Melo, L. L., & Cardoso, S. H. (1993). Mechanisms of defense in the inferior colliculus. Behavioral Brain Research 58, 49–55. *Brattico, E., Alluri, V., Bogert, B., Jacobsen, T., Vartiainen, N., Nieminen, S., & Tervaniemi, M. (2011). A functional MRI study of happy and sad emotions in music with and without lyrics. Frontiers in Psychology 2, 308. Retrieved from https://doi.org/10.3389/fpsyg.2011.00308 Bressler, S. L., & Menon, V. (2010). Large-scale brain networks in cognition: Emerging methods and principles. Trends in Cognitive Sciences 14, 277–290. Brown, C. M., Hagoort, P., & Kutas, M. (2000). Postlexical integration processes in language comprehension: Evidence from brain-imaging research. In M. S. Gazzaniga (Ed.), The new cognitive neurosciences (2nd ed., pp. 881–895). Cambridge, MA: MIT Press. *Brown, S., Martinez, M. J., & Parsons, L. M. (2004). Passive music listening spontaneously engages limbic and paralimbic systems. Neuroreport 15, 2033–2037. Bryant, G. A., & Barrett, H. C. (2008). Vocal emotion recognition across disparate cultures. Journal of Cognition and Culture 8, 135–148. *Bryden, M. P., Ley, R. G., & Sugarman, J. H. (1982). A left-ear advantage for identifying the emotional quality of tonal sequences. Neuropsychologia 20, 83–87.
Cabeza, R., & Nyberg, L. (2000). Imaging cognition II: An empirical review of 275 PET and fMRI studies. Journal of Cognitive Neuroscience 12, 1–47. Caplan, D., Alpert, N., & Waters, G. (1998). Effects of syntactic structure and propositional number on patterns of regional cerebral blood flow. Journal of Cognitive Neuroscience 10, 541–542. *Caria, A., Venuti, P., & de Falco, S. (2011). Functional and dysfunctional brain circuits underlying emotional processing of music in autism spectrum disorders. Cerebral Cortex 21, 2838–2849. *Chapin, H., Jantzen, K., Kelso, J. S., Steinberg, F., & Large, E. (2010). Dynamic emotional and neural responses to music depend on performance expression and listener experience. PLoS ONE 5, e13812. Charlot, V., Tzourio, N., Zilbovicius, M., Mazoyer, B., & Denis, M. (1992). Different mental imagery abilities result in different regional cerebral blood flow activation patterns during cognitive tasks. Neuropsychologia 30, 565–580. Chikazoe, J., Lee, D. H., Kriegeskorte, N., & Anderson, A. K. (2014). Population coding of affect across stimuli, modalities, and individuals. Nature Neuroscience 17, 1114–1122. Clark-Polner, E., Wager, T. D., Satpute, A. B., & Barrett, L. F. (2016). Neural fingerprinting: Meta-analysis, variation, and the search for brain-based essences in the science of emotion. In L. F. Barrett, M. Lewis, & J. M. Haviland-Jones (Eds.), Handbook of emotions (4th ed., pp. 146–165). New York: Guilford Press. Clynes, M. (1977). Sentics: The touch of emotions. New York: Doubleday. Cui, X., Jeter, C. B., Yang, D., Montague, P. R., & Eagleman, D. M. (2007). Vividness of mental imagery: Individual variability can be measured objectively. Vision Research 47, 474–478. *Daly, I., Malik, A., Hwang, F., Roesch, E., Weaver, J., Kirke, A., … Nasuto, S. J. (2014). Neural correlates of emotional responses to music: An EEG study. Neuroscience Letters 573, 52–57. *Daly, I., Williams, D., Hallowell, J., Hwang, F., Kirke, A., Malik, A., … Nasuto, S.
J. (2015). Music-induced emotions can be predicted from a combination of brain activity and acoustic features. Brain and Cognition 101, 1–11. Damasio, A. (1994). Descartes’ error: Emotion, reason, and the human brain. New York: Avon Books. Damasio, A. R., Grabowski, T. J., Bechara, A., Damasio, H., Ponto, L. L. B., Parvizi, J., & Hichwa, R. D. (2000). Subcortical and cortical brain activity during the feeling of self-generated emotions. Nature Neuroscience 3, 1049–1056. Davidson, R. J. (1995). Cerebral asymmetry, emotion, and affective style. In R. J. Davidson & K. Hugdahl (Eds.), Brain asymmetry (pp. 361–387). Cambridge, MA: MIT Press. Davis, M. (1984). The mammalian startle response. In R. C. Eaton (Ed.), Neural mechanisms of startle behavior (pp. 287–342). New York: Plenum Press. Dennett, D. C. (1987). The intentional stance. Cambridge, MA: MIT Press. Dolcos, F., LaBar, K. S., & Cabeza, R. (2005). Remembering one year later: Role of the amygdala and the medial temporal lobe memory system in retrieving emotional memories. Proceedings of the National Academy of Sciences 102, 2626–2631. Dowling, W. J., & Harwood, D. L. (1986). Music cognition. New York: Academic Press. *Eldar, E., Ganor, O., Admon, R., Bleich, A., & Hendler, T. (2007). Feeling the real world: Limbic response to music depends on related content. Cerebral Cortex 17, 2828–2840. *Escoffier, N., Zhong, J., Schirmer, A., & Qiu, A. (2013). Emotions in voice and music: Same code, same effect? Human Brain Mapping 34, 1796–1810. Fanselow, M. S., & Poulos, A. M. (2005). The neuroscience of mammalian associative learning. Annual Review of Psychology 56, 207–234. Farah, M. J. (2000). The neural bases of mental imagery. In M. S. Gazzaniga (Ed.), The new cognitive neurosciences (2nd ed., pp. 965–974). Cambridge, MA: MIT Press.
*Field, T., Martinez, A., Nawrocki, T., Pickens, J., Fox, N. A., & Schanberg, S. (1998). Music shifts frontal EEG in depressed adolescents. Adolescence 33, 109–116. *Flores-Gutierrez, E. O., Diaz, J.-L., Barrios, F. A., Favila-Humara, R., Guevara, M. A., del Rio-Portilla, Y., & Corsi-Cabrera, M. (2007). Metabolic and electric brain patterns during pleasant and unpleasant emotions induced by music masterpieces. International Journal of Psychophysiology 65, 69–84. *Flores-Gutiérrez, E. O., Díaz, J.-L., Barrios, F. A., Guevara, M. Á., del Río-Portilla, Y., Corsi-Cabrera, M., & del Flores-Gutiérrez, E. O. (2009). Differential alpha coherence hemispheric patterns in men and women during pleasant and unpleasant musical emotions. International Journal of Psychophysiology 71, 43–49. Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., … Koelsch, S. (2009). Universal recognition of three basic emotions in music. Current Biology 19, 1–4. Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2012). Internalized timing of isochronous sounds is represented in neuromagnetic beta oscillations. Journal of Neuroscience 32, 1791–1802. *Gagnon, L., & Peretz, I. (2000). Laterality effects in processing tonal and atonal melodies with affective and nonaffective task instructions. Brain & Cognition 43, 206–210. Gärdenfors, P. (2003). How homo became sapiens: On the evolution of thinking. Oxford: Oxford University Press. Gilboa, A. (2004). Autobiographical and episodic memory: One and the same? Evidence from prefrontal activation in neuroimaging studies. Neuropsychologia 42, 1336–1349. Goldenberg, G., Podreka, I., Steiner, M., Franzén, P., & Deecke, L. (1991). Contributions of occipital and temporal brain regions to visual and acoustic imagery: A SPECT study. Neuropsychologia 29, 695–702. *Gosselin, N., Peretz, I., Hasboun, D., Baulac, M., & Samson, S. (2011).
Impaired recognition of musical emotions and facial expressions following anteromedial temporal lobe excision. Cortex 47, 1116–1125. *Gosselin, N., Peretz, I., Johnsen, E., & Adolphs, R. (2007). Amygdala damage impairs emotion recognition from music. Neuropsychologia 45, 236–244. *Gosselin, N., Peretz, I., Noulhiane, M., Hasboun, D., Beckett, C., Baulac, M., & Samson, S. (2005). Impaired recognition of scary music following unilateral temporal lobe excision. Brain 128, 628– 640. *Gosselin, N., Samson, S., Adolphs, R., Noulhiane, M., Roy, M., Hasboun, D., … Peretz, I. (2006). Emotional responses to unpleasant music correlates with damage to the parahippocampal cortex. Brain 129, 2585–2592. *Goydke, K. N., Altenmüller, E., Möller, J., & Münte, T. (2004). Changes in emotional tone and instrumental timbre are reflected by the mismatch negativity. Cognitive Brain Research 21, 351– 359. Grahn, J. A., Henry, M. J., & McAuley, J. D. (2011). fMRI investigation of cross-modal interactions in beat perception: Audition primes vision, but not vice versa. NeuroImage 54, 1231–1243. *Green, A. C., Baerentsen, K., Stodkilde-Jorgensen, H., Wallentin, M., Roepstorff, A., & Vuust, P. (2008). Music in minor activates limbic structure: A relationship with dissonance? Neuroreport 19, 711–715. *Griffiths, T. D., Warren, J. D., Dean, J. L., & Howard, D. (2004). “When the feeling’s gone”: A selective loss of musical emotion. Journal of Neurology, Neurosurgery & Psychiatry 75, 344–345. Haist, F., Gore, J. B., & Mao, H. (2001). Consolidation of human memory over decades revealed by functional magnetic resonance imaging. Nature Neuroscience 4, 1139–1145. Harmon-Jones, E., Harmon-Jones, C., & Summerell, E. (2017). On the importance of both dimensional and discrete models of emotion. Behavioral Sciences 7, 66.
Harrer, G., & Harrer, H. (1977). Music, emotion, and autonomic function. In M. Critchley & R. A. Henson (Eds.), Music and the brain: Studies in the neurology of music (pp. 202–216). London: William Heinemann Medical Books. Hodges, D., & Sebald, D. (2011). Music in the human experience: An introduction to music psychology. New York: Routledge. *Hsieh, S., Hornberger, M., Piguet, O., & Hodges, J. R. (2012). Brain correlates of musical and facial emotion recognition: Evidence from the dementias. Neuropsychologia 50, 1814–1822. Izard, C. E. (1977). The emotions. New York: Plenum Press. *Janata, P. (2009). The neural architecture of music-evoked autobiographical memories. Cerebral Cortex 19, 2579–2594. *Jeong, J.-W., Diwadkar, V. A., Chugani, C. D., Sinsoongsud, P., Muzik, O., Behen, M. E., … Chugani, D. C. (2011). Congruence of happy and sad emotion in music and faces modifies cortical audiovisual activation. NeuroImage 54, 2973–2982. Johnsrude, I. S., Owen, A. M., White, N. M., Zhao, W. V., & Bohbot, V. (2000). Impaired preference conditioning after anterior temporal lobe resection in humans. Journal of Neuroscience 20, 2649– 2656. Juslin, P. N. (2001). Communicating emotion in music performance: A review and a theoretical framework. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 309–337). Oxford: Oxford University Press. Juslin, P. N. (2011). Music and emotion: Seven questions, seven answers. In I. Deliège & J. Davidson (Eds.), Music and the mind (pp. 113–135). Oxford: Oxford University Press. Juslin, P. N. (2013a). From everyday emotions to aesthetic emotions: Toward a unified theory of musical emotions. Physics of Life Reviews 10, 235–266. Juslin, P. N. (2013b). What does music express? Basic emotions and beyond. Frontiers in Psychology: Emotion Science 4, 596. Juslin, P. N. (2019). Musical emotions explained. Oxford: Oxford University Press. Juslin, P. N., Barradas, G., & Eerola, T. (2015). 
From sound to significance: Exploring the mechanisms underlying emotional reactions to music. American Journal of Psychology 128, 281– 304. Juslin, P. N., Harmat, L., & Eerola, T. (2014). What makes music emotionally significant? Exploring the underlying mechanisms. Psychology of Music 42, 599–623. Juslin, P. N., & Laukka, P. (2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin 129, 770–814. Juslin, P. N., Liljeström, S., Västfjäll, D., Barradas, G., & Silva, A. (2008). An experience sampling study of emotional reactions to music: Listener, music, and situation. Emotion 8, 668–683. Juslin, P. N., & Sloboda, J. A. (Eds.). (2001). Music and emotion: Theory and research. Oxford: Oxford University Press. Juslin, P. N., & Sloboda, J. A. (Eds.). (2010). Handbook of music and emotion: Theory, research, applications. Oxford: Oxford University Press. Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. Behavioral and Brain Sciences 31, 559–575. *Kamiyama, K. S., Abla, D., Iwanaga, K., & Okanoya, K. (2013). Interaction between musical emotion and facial expression as measured by event-related potentials. Neuropsychologia 51, 500– 505. Kassam, K. S., Markey, A. R., Cherkassky, V. L., Loewenstein, G., & Just, M. A. (2013). Identifying emotions on the basis of neural activation. PLoS ONE 8, e66032. *Khalfa, S., Schon, D., Anton, J. L., & Liégeois-Chauvel, C. (2005). Brain regions involved in the recognition of happiness and sadness in music. Neuroreport 16, 1981–1984.
Kinomura, S., Larsson, J., Gulyás, B., & Roland, P. E. (1996). Activation by attention of the human reticular formation and thalamic intralaminar nuclei. Science 271, 512–515. Kivy, P. (1990). Music alone: Reflections on a purely musical experience. Ithaca, NY: Cornell University Press. Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15, 170–180. *Koelsch, S., Fritz, T., & Schlaug, G. (2008). Amygdala activity can be modulated by unexpected chord functions during music listening. Neuroreport 19, 1815–1819. *Koelsch, S., Fritz, T., von Cramon, D. Y., Müller, K., & Friederici, A. D. (2006). Investigating emotion with music: An fMRI study. Human Brain Mapping 27, 239–250. *Koelsch, S., Kilches, S., Steinbeis, N., & Schelinski, S. (2008). Effects of unexpected chords and of performer’s expression on brain responses and electrodermal activity. PLoS ONE 3, e2631. *Koelsch, S., Remppis, A., Sammler, D., Jentschke, S., Mietchen, D., Fritz, T., … Siebel, W. A. (2007). A cardiac signature of emotionality. European Journal of Neuroscience 26, 3328–3338. Koelsch, S., Siebel, W. A., & Fritz, T. (2010). Functional neuroimaging. In P. N. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 313–344). Oxford: Oxford University Press. *Koelsch, S., Skouras, S., Fritz, T., Herrera, P., Bonhage, C., Küssner, M. B., & Jacobs, A. M. (2013). The roles of superficial amygdala and auditory cortex in music-evoked fear and joy. NeuroImage 81, 49–60. Koelsch, S., Skouras, S., & Lohmann, G. (2018). The auditory cortex hosts network nodes influential for emotion processing: An fMRI study on music-evoked fear and joy. PLoS ONE 13, e0190057. Kreutz, G., & Lotze, M. (2007). Neuroscience of music and emotion. In W. Gruhn & F. Rauscher (Eds.), Neurosciences in music pedagogy (pp. 143–167). New York: Nova. *Kreutz, G., Ott, U., & Wehrum, S. (2006). Cerebral correlates of musically-induced emotions: An fMRI-study. 
In M. Baroni et al. (Eds.), Proceedings of the Ninth International Conference on Music Perception and Cognition (ICMPC). Bologna, August 22–26. Lane, R. D. (2000). Neural correlates of conscious emotional experience. In R. D. Lane & L. Nadel (Eds.), Cognitive neuroscience of emotion (pp. 345–370). Oxford: Oxford University Press. Langer, S. K. (1957). Philosophy in a new key. Cambridge, MA: Harvard University Press. LeDoux, J. E. (2000). Cognitive-emotional interactions: Listen to the brain. In R. D. Lane & L. Nadel (Eds.), Cognitive neuroscience of emotion (pp. 129–155). Oxford: Oxford University Press. *Lehne, M., Rohrmeier, M., & Koelsch, S. (2014). Tension-related activity in the orbitofrontal cortex and amygdala: An fMRI study with music. Social Cognitive and Affective Neuroscience 9, 1515– 1523. *Lerner, Y., Papo, D., Zhdanov, A., Belozersky, L., & Hendler, T. (2009). Eyes wide shut: Amygdala mediates eyes-closed effect on emotional experience with music. PLoS ONE 4, e6230. *Liégeois-Chauvel, C., Bénar, C., Krieg, J., Delbé, C., Chauvel, P., Giusiano, B., & Bigand, E. (2014). How functional coupling between the auditory cortex and the amygdala induces musical emotion: A single case study. Cortex 60, 82–93. *Lin, Y.-P., Wang, C.-H., Jung, T.-P., Wu, T.-L., Jeng, S.-K., Duann, J.-R., & Chen, J.-H. (2010). EEG-based emotion recognition in music listening. IEEE Transactions on Biomedical Engineering 57, 1798–1806. Lindquist, K. A., Wager, T. D., Kober, H., Bliss-Moreau, E., & Barrett, L. F. (2012). The brain basis of emotion: A meta-analytic review. Behavioral and Brain Sciences 35, 121–143. *Logeswaran, N., & Bhattacharya, J. (2009). Crossmodal transfer of emotion by music. Neuroscience Letters 455, 129–133.
Lundqvist, L.-O., Carlsson, F., Hilmersson, P., & Juslin, P. N. (2009). Emotional responses to music: Experience, expression, and physiology. Psychology of Music 37, 61–90. MacDonald, R., Kreutz, G., & Mitchell, L. (Eds.). (2012). Music, health, and well-being. Oxford: Oxford University Press. Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in Broca’s area: A MEG study. Nature Neuroscience 4, 540–545. *Matthews, B. R., Chang, C.-C., De May, M., Engstrom, J., & Miller, B. L. (2009). Pleasurable emotional response to music: A case of neurodegenerative generalized auditory agnosia. Neurocase 15, 248–259. *Mazzoni, M., Moretti, P., Pardossi, L., Vista, M., & Muratorio, A. (1993). A case of music imperception. Journal of Neurology, Neurosurgery & Psychiatry 56, 322–324. *Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological connectivity of the mesolimbic system. NeuroImage 28, 175–184. Meyer, L. B. (1956). Emotion and meaning in music. Chicago: University of Chicago Press. Miserendino, M. J. D., Sananes, C. B., Melia, K. R., & Davis, M. (1990). Blocking of acquisition but not expression of conditioned fear-potentiated startle by NMDA antagonists in the amygdala. Nature 345, 716–718. *Mitterschiffthaler, M. T., Fu, C. H., Dalton, J. A., Andrew, C. M., & Williams, S. C. (2007). A functional MRI study of happy and sad affective states induced by classical music. Human Brain Mapping 28, 1150–1162. *Mizuno, T., & Sugishita, M. (2007). Neural correlates underlying perception of tonality-related emotional contents. Neuroreport 18, 1651–1655. *Mueller, K., Fritz, T., Mildner, T., Richter, M., Schulze, K., Lepsien, J., … Möller, H. E. (2015). Investigating the dynamics of the brain response to music: A central role of the ventral striatum/nucleus accumbens. NeuroImage 116, 68–79. Murphy, F. C., Nimmo-Smith, I., & Lawrence, A. D. (2003). Functional neuroanatomy of emotions: A meta-analysis. 
Cognitive, Affective, & Behavioral Neuroscience 3, 207–233. *Nair, D. G., Large, E. W., Steinberg, F., & Kelso, J. A. S. (2002). Perceiving emotion in expressive piano performance: A functional MRI study. In K. Stevens et al. (Eds.), Proceedings of the 7th International Conference on Music Perception and Cognition, July 2002 (CD rom). Adelaide, Australia: Causal Productions. Nyberg, L., McIntosh, A. R., Houle, S., Nilsson, L.-G., & Tulving, E. (1996). Activation of medial-temporal structures during episodic memory retrieval. Nature 380, 715–717. *Omar, R., Hailstone, J. C., Warren, J. E., Crutch, S. J., & Warren, J. D. (2010). The cognitive organization of music knowledge: A clinical analysis. Brain 133, 1200–1213. *Omar, R., Henley, S., Bartlett, J. W., Hailstone, J. C., Gordon, E., Sauter, D. A., … Warren, J. D. (2011). The structural neuroanatomy of music emotion recognition: Evidence from frontotemporal lobar degeneration. NeuroImage 56, 1814–1821. Osborne, J. W. (1980). The mapping of thoughts, emotions, sensations, and images as responses to music. Journal of Mental Imagery 5, 133–136. *Pallesen, K. J., Brattico, E., Bailey, C., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S. (2005). Emotion processing of major, minor, and dissonant chords: A functional magnetic resonance imaging study. Annals of the New York Academy of Sciences 1060, 450–453. Paquette, S., Takerkart, S., Saget, S., Peretz, I., & Belin, P. (2018). Cross-classification of musical and vocal emotions in the auditory cortex. Annals of the New York Academy of Sciences 1423, 329–337. Pascual-Leone, A., Davey, N. J., Rothwell, J., Wassermann, E. M., & Puri, B. K. (Eds.). (2002). Handbook of transcranial magnetic stimulation. Oxford: Oxford University Press.
Paulmann, S., Ott, D. V. M., & Kotz, S. A. (2011). Emotional speech perception unfolding in time: The role of the basal ganglia. PLoS ONE 6, e17694. *Perani, D., Saccuman, M. C., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., … Koelsch, S. (2010). Functional specializations for music processing in the human newborn brain. Proceedings of the National Academy of Sciences 107, 4758–4763. *Pereira, C. S., Teixeira, J., Figueiredo, P., Xavier, J., Castro, S. L., & Brattico, E. (2011). Music and emotions in the brain: Familiarity matters. PLoS ONE 6, e27241. Peretz, I. (2001). Listen to the brain: A biological perspective on musical emotions. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 105–134). Oxford: Oxford University Press. Peretz, I. (2010). Towards a neurobiology of musical emotions. In P. N. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 99–126). Oxford: Oxford University Press. *Peretz, I., & Gagnon, L. (1999). Dissociation between recognition and emotional judgment for melodies. Neurocase 5, 21–30. *Peretz, I., Gagnon, L., & Bouchard, B. (1998). Music and emotion: Perceptual determinants, immediacy, and isolation after brain damage. Cognition 68, 111–141. Pessoa, L. (2013). The cognitive-emotional brain: From interactions to integration. Cambridge, MA: MIT Press. *Petrini, K., Crabbe, F., Sheridan, C., & Pollick, F. E. (2011). The music of your emotions: Neural substrates involved in detection of emotional correspondence between auditory and visual music actions. PLoS ONE 6, e19165. Poldrack, R. A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences 10, 59–63. Rossignol, S., & Melvill Jones, G. (1976). Audio-spinal influence in man studied by the H-reflex and its possible role on rhythmic movements synchronized to sound. Electroencephalography and Clinical Neurophysiology 41, 83–92. Russell, J. A. (1980).
A circumplex model of affect. Journal of Personality and Social Psychology 39, 1161–1178. Saarimäki, H., Gotsopoulos, A., Jaaskelainen, I. P., Lampinen, J., Vuilleumier, P., Hari, R., … Nummenmaa, L. (2016). Discrete neural signatures of basic emotions. Cerebral Cortex 26, 2563–2573. Sacchetti, B., Scelfo, B., & Strata, P. (2005). The cerebellum: Synaptic changes and fear conditioning. The Neuroscientist 11, 217–227. *Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. (2011). Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nature Neuroscience 14, 257–262. *Salimpoor, V. N., van den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J. (2013). Interactions between the nucleus accumbens and auditory cortices predict music reward value. Science 340, 216–219. *Sammler, D., Grigutsch, M., Fritz, T., & Koelsch, S. (2007). Music and emotion: Electrophysiological correlates of the processing of pleasant and unpleasant music. Psychophysiology 44, 293–304. *Satoh, M., Nakase, T., Nagata, K., & Tomimoto, H. (2011). Musical anhedonia: Selective loss of emotional experience in listening to music. Neurocase 17, 410–417. Scherer, K. R. (1999). Appraisal theories. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion (pp. 637–663). Chichester: Wiley.
Scherer, K. R., & Zentner, M. R. (2001). Emotional effects of music: Production rules. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 361–392). Oxford: Oxford University Press. Schirmer, A., & Kotz, S. A. (2006). Beyond the right hemisphere: Brain mechanisms mediating vocal emotional processing. Trends in Cognitive Sciences 10, 24–30. *Schmidt, B., & Hanslmayr, S. (2009). Resting frontal EEG alpha-asymmetry predicts the evaluation of affective musical stimuli. Neuroscience Letters 460, 237–240. *Schmidt, L. A., & Trainor, L. J. (2001). Frontal brain electrical activity (EEG) distinguishes valence and intensity of musical emotions. Cognition & Emotion 15, 487–500. *Schmidt, L. A., Trainor, L. J., & Santesso, D. L. (2003). Development of frontal electroencephalogram (EEG) and heart rate (ECG) responses to affective musical stimuli during the first 12 months of post-natal life. Brain and Cognition 52, 27–32. *Shahabi, H., & Moghimi, S. (2016). Toward automatic detection of brain responses to emotional music through analysis of EEG effective connectivity. Computers in Human Behavior 58, 231–239. *Singer, N., Jacoby, N., Lin, T., Raz, G., Shpigelman, L., Gilam, G., … Hendler, T. (2016). Common modulation of limbic network activation underlies musical emotions as they unfold. NeuroImage 141, 517–529. Sloboda, J. A., & Juslin, P. N. (2001). Psychological perspectives on music and emotion. In P. N. Juslin & J. A. Sloboda (Eds.), Music and emotion: Theory and research (pp. 71–104). Oxford: Oxford University Press. Spitzer, M. (2013). Sad flowers: Affective trajectory in Schubert’s Trockne Blumen. In T. Cochrane, B. Fantini, & K. R. Scherer (Eds.), The emotional power of music (pp. 7–21). Oxford: Oxford University Press. *Spreckelmeyer, K. N., Altenmüller, E., Colonius, H., & Münte, T. F. (2013). Preattentive processing of emotional musical tones: A multidimensional scaling and ERP study. Frontiers in Psychology 4, 656. 
doi:10.3389/fpsyg.2013.00656 *Spreckelmeyer, K. N., Kutas, M., Urbach, T. P., Altenmüller, E., & Münte, T. F. (2006). Combined perception of emotion in pictures and musical sounds. Brain Research 1070, 160–170. *Steinbeis, N., & Koelsch, S. (2009). Understanding the intentions behind man-made products elicits neural activity in areas dedicated to mental state attribution. Cerebral Cortex 19, 619–623. *Steinbeis, N., Koelsch, S., & Sloboda, J. A. (2006). The role of musical structure in emotion: Investigating neural, physiological, and subjective emotional responses to harmonic expectancy violations. Journal of Cognitive Neuroscience 18, 1380–1393. Stromswold, K., Caplan, D., Alpert, N., & Rauch, S. (1996). Localization of syntactic comprehension by positron emission tomography. Brain and Language 52, 452–473. *Suzuki, M., Okamura, N., Kawachi, Y., Tashiro, M., Arao, H., Hoshishiba, T., … Yanai, K. (2008). Discrete cortical regions associated with the musical beauty of major and minor chords. Cognitive, Affective, & Behavioral Neuroscience 8, 126–131. Thaut, M. H., & Wheeler, B. L. (2010). Music therapy. In P. N. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 819–848). Oxford: Oxford University Press. Thornton-Wells, T. A., Cannistraci, C. J., Anderson, A. W., Kim, C. Y., Eapen, M., Gore, J. C., … Dykens, E. M. (2010). Auditory attraction: Activation of visual cortex by music and sound in Williams syndrome. American Journal on Intellectual and Developmental Disabilities 115, 172–189. Tierney, A., & Kraus, N. (2013). The ability to move to a beat is linked to the consistency of neural responses to sound. Journal of Neuroscience 33, 14981–14988.
*Trost, W., Ethofer, T., Zentner, M. R., & Vuilleumier, P. (2012). Mapping aesthetic musical emotions in the brain. Cerebral Cortex 22, 2769–2783. Trost, W., Frühholz, S., Schön, D., Labbé, C., Pichon, S., Grandjean, D., & Vuilleumier, P. (2014). Getting the beat: Entrainment of brain activity by musical rhythm and pleasantness. NeuroImage 103, 55–64. *Tsang, C. D., Trainor, L. J., Santesso, D. L., Tasker, S. L., & Schmidt, L. A. (2001). Frontal EEG responses as a function of affective musical features. Annals of the New York Academy of Sciences 930, 439–442. Wager, T. D., Barrett, L. F., Bliss-Moreau, E., Lindquist, K. A., Duncan, S., Kober, H., & Mize, J. (2008). The neuroimaging of emotion. In M. Lewis, J. M. Haviland-Jones, & L. F. Barrett (Eds.), Handbook of emotions (3rd ed., pp. 249–267). New York: Guilford Press. Wagner, A. D., Shannon, B. J., Kahn, I., & Buckner, R. L. (2005). Parietal lobe contributions to episodic memory retrieval. Trends in Cognitive Sciences 9, 445–453. Warren, J. E., Sauter, D. A., Eisner, F., Wiland, J., Dresner, M. A., Wise, R. J., … Scott, S. K. (2006). Positive emotions preferentially engage an auditory-motor “mirror” system. Journal of Neuroscience 26, 13067–13075. Zentner, M. R. (2010). Homer’s prophecy: An essay on music’s primary emotions. Music Analysis 29, 102–125.
1
The dopaminergic mesolimbic reward pathway involving nucleus accumbens is a prime suspect when it comes to positive emotions. The hippocampus and the amygdala are both considered key areas for emotional memories, which will presumably not be involved as much during comparatively “neutral” perception of emotions.
CHAPTER 14
NEUROCHEMICAL RESPONSES TO MUSIC
YUKO KOSHIMORI
Introduction
Music is used in various settings in everyday life and can modulate our mood, emotion, arousal, motivation, and movement. These music-induced effects can be objectively assessed using neuroimaging techniques and peripheral biomarkers. For example, functional neuroimaging studies have demonstrated that listening to music alters brain activity in various brain regions in the mesocorticolimbic pathways such as the anterior cingulate cortex, orbitofrontal cortex, insula, amygdala, hippocampus, and ventral striatum, which are implicated in reward, motivation, and emotional behaviors, in addition to the brain regions in the motor pathways such as the premotor and supplementary motor areas, thalamus, basal ganglia, and cerebellum. These neuroimaging studies reveal important anatomical information that also allows us to infer what brain functions music can modulate. However, within the same anatomical region, different neuroreceptors are expressed. Knowledge of the neurochemical functions can uncover more specific effects of music on various brain functions and help us to better understand the effects of music on brain pathology. Neurochemical functions can be measured using positron emission tomography (PET). PET imaging is a nuclear medicine imaging technique
that quantifies the chemical/biological processes of molecules in vivo by injecting radiolabeled molecules (i.e., radioligands; Venneti, Lopresti, & Wiley, 2013). The radioligands typically resemble endogenous biological molecules and bind specific biological targets. This allows the distribution of the molecules in the brain to be mapped. A radioligand is synthesized by labeling a precursor molecule with a short-lived radionuclide such as carbon-11 (t1/2 = 20.4 min) or fluorine-18 (t1/2 = 109.8 min). After the radioligand is injected intravenously, it enters the bloodstream, crosses the blood–brain barrier, and binds target receptors or proteins in the brain. The radioligands can be agonists that induce downstream signaling in a manner similar to the endogenous molecules or antagonists that block the receptor and prevent it from being available to the endogenous molecules (Gunn, Slifstein, Searle, & Price, 2015). Radioligands have been developed for multiple targets, including the dopamine (DA), serotonin (5-HT), norepinephrine (NE), opioid, and acetylcholine (ACh) systems, among others (Gunn et al., 2015). PET imaging employing these radioligands can reveal music-induced release of neurotransmitters that bind the target neuroreceptors. On the other hand, the major limitations of PET imaging are cost, invasiveness, and limited accessibility. Because of these limitations, there have been few studies using PET imaging to investigate music-induced neurochemical changes. Accordingly, the research findings discussed in this chapter are primarily based on the molecular concentrations or secretion rates of peripheral biomarkers in blood (e.g., plasma and platelets), saliva, and urine. It should be noted that some central and peripheral chemicals serve different functions (e.g., norepinephrine), and whether some peripheral measures reflect the central measures is debatable (e.g., oxytocin), which is discussed later in this chapter. 
This chapter covers neurotransmitters including DA, 5-HT, NE, and ACh; neuropeptides such as β-endorphin, oxytocin (OT), and arginine vasopressin (AV), as well as their receptors and associated genes; steroid hormones such as cortisol; and peripheral immune biomarkers.
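The half-lives quoted above for carbon-11 and fluorine-18 help explain why PET radioligands are logistically demanding. As a rough illustration only, the sketch below applies the standard exponential decay law to those two half-lives. The function name and the chosen time points are illustrative assumptions, not part of any study protocol discussed in this chapter.

```python
# Half-lives quoted in the text, in minutes.
HALF_LIFE_MIN = {"carbon-11": 20.4, "fluorine-18": 109.8}


def fraction_remaining(half_life_min: float, elapsed_min: float) -> float:
    """Fraction of the initial radioactivity left after elapsed_min minutes.

    Standard decay law: N(t) / N(0) = 0.5 ** (t / t_half).
    """
    return 0.5 ** (elapsed_min / half_life_min)


if __name__ == "__main__":
    # Illustrative time points (assumed here, not taken from the chapter).
    for nuclide, t_half in HALF_LIFE_MIN.items():
        for elapsed in (30, 60, 90):
            left = fraction_remaining(t_half, elapsed)
            print(f"{nuclide}: {left:.1%} of activity left after {elapsed} min")
```

Under these assumptions, roughly 13% of a carbon-11 label remains one hour after synthesis, compared with roughly 68% of a fluorine-18 label. This difference is one reason carbon-11 ligands generally have to be produced close to the scanner.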
Dopamine Systems
Dopamine (DA) is synthesized in the cytosol of catecholaminergic neurons in the ventral tegmental area (VTA) and in the substantia nigra pars compacta (SNpc) of the brain. DA neurons in the VTA project to the ventral striatum/nucleus accumbens (NAc), amygdala, and hippocampus, as well as to medial prefrontal regions such as the orbitofrontal and anterior cingulate cortices, whereas DA neurons in the SNpc project to the dorsal striatum (i.e., putamen and caudate nucleus). These DA pathways are commonly referred to as the mesocorticolimbic and nigrostriatal dopamine pathways, respectively. The former pathway is associated with emotional/motivational functions whereas the latter pathway is more involved in executive/cognitive and sensorimotor functions (Solís & Moratalla, 2018). There are five dopamine receptor subtypes, which are classified based on their functional properties and subdivided into D1-like and D2-like families. The D1-like family consists of D1 and D5 receptors and the D2-like family consists of D2, D3, and D4 receptors. DA is also synthesized in the adrenal medulla and acts as a hormone along with other catecholamines (see section “Norepinephrine Systems”). Numerous functional neuroimaging studies have demonstrated that music alters brain activity in the DA pathways associated with reward/motivation (Menon & Levitin, 2005; Salimpoor, van den Bosch, Kovacevic, McIntosh, Dagher, & Zatorre, 2013), emotion/pleasure (Koelsch, 2014; Mueller et al., 2015; Salimpoor, Benovoy, Larcher, Dagher, & Zatorre, 2011), as well as motor functions (Grahn & Rowe, 2009). However, to date there has been only one study that investigated dopaminergic transmission in the ventral and dorsal striatum associated with musical pleasure (Salimpoor et al., 2011). 
This study employed PET with [11C]raclopride, a D2/D3 receptor antagonist, and found greater DA release in the bilateral dorsal and ventral striatum, most notably in the right caudate nucleus and the right NAc, during listening to pleasurable music selected by the participants compared to neutral music. Furthermore, greater DA release in the right caudate nucleus was associated with a greater number of peak-pleasure moments, or “chills,” whereas greater DA release in the right NAc was associated with more intense chills. These anatomically distinct roles of the subregions in music listening were further elucidated by analyzing the time course of their activation using functional magnetic resonance imaging (fMRI). The increased brain activity in the right caudate nucleus occurred several seconds prior to experiencing the
pleasurable peak while the enhanced activity in the right NAc occurred during the pleasurable moments. The authors interpreted these findings as indicating that the former structure is involved in the anticipation and prediction of pleasure and the latter structure in experiencing pleasure. This study demonstrated that musical pleasure is associated with DA release in the ventral striatum, particularly in the NAc. However, because only individuals who regularly experience “chills” during music listening were selected to participate, further research is needed to investigate whether DA is also released during listening to pleasurable music in those who have never experienced “chills.” In addition to musical pleasure, the role of DA in music perception and auditory-motor entrainment was investigated in Parkinson’s disease (PD), which is primarily characterized by loss of dopaminergic neurons in the SNpc, resulting in depletion of dopaminergic input to the dorsal striatum. The involvement of DA in these functions was investigated by testing people with PD on and off dopaminergic medication. One study showed that dopaminergic medication improved music perception (Cameron, Pickett, Earhart, & Grahn, 2016). However, it is unknown whether the improvement was due to the medication or to the practice effect that was also observed in the healthy participants. Another study using PET with [11C]-DTBZ, which binds the vesicular monoamine transporter type 2 (VMAT2), did not find a strong association between dopaminergic denervation and auditory-motor task performance (Miller et al., 2013). However, when the people with PD were grouped based on the similarity of dopaminergic denervation, the auditory-motor synchronization accuracy paralleled the pattern of denervation. On the other hand, two fMRI studies did not implicate DA in auditory-motor entrainment (Elsinger et al., 2003; Jahanshahi et al., 2010). 
In addition to these central dopaminergic functions, dopamine-associated gene expression and peripheral dopaminergic levels were also investigated. The expression of the alpha-synuclein gene (SNCA), which maintains DA neuronal homeostasis (Murphy, Rueter, Trojanowski, & Lee, 2000; Oczkowska, Kozubski, Lianeri, & Dorszewska, 2014), was upregulated in professional musicians after a two-hour concert performance compared to after a music-free control condition (Kanduri, Kuusi, et al., 2015), as well as in listeners with more than 10 years of musical experience and those with high musical aptitude after listening to 20 minutes of classical music (Kanduri, Raijas, et al., 2015). The latter study also reported that the
upregulation was absent after a music-free control condition or in listeners with no significant musical experience (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015). A few studies investigated the dopaminergic levels in peripheral samples, as well as associated psychological measures, using an auditory stimulus or music. Two studies reported decreased dopaminergic levels: one in urine samples following daily listening to binaural beats for 60 days in healthy adults, who also reported decreased trait anxiety (Wahbeh, Calabrese, & Zwickey, 2007), and the other in plasma samples following a 12-week dance movement therapy (DMT), which combines music, light exercise, and sensory stimulation, in female adolescents with mild depression whose psychological distress was also reduced (Jeong et al., 2005). One study reported no change in the plasma dopaminergic level following listening to music (high-uplifting or low-uplifting) in healthy adults who had performed a stress-inducing task (Hirokawa & Ohira, 2003). However, it is unknown whether the stress-inducing task affected the plasma dopaminergic level before listening to music. To summarize, there is some evidence that music enhances dopaminergic function. Listening to pleasurable music induces dopamine release, and music upregulates SNCA expression, which may facilitate dopaminergic neurotransmission. However, these responses may occur only in specific people (i.e., those with extensive musical training or those who regularly experience “chills” when listening to pleasurable music). Further studies are needed to investigate these effects in individuals with varying music education/training levels and listening habits and experiences, as well as using different genres of music. Music may also be able to reduce peripheral DA levels and psychological disturbances. Future studies including both clinical and healthy control participants are needed to clarify these effects. 
In addition, some PD studies suggest a role for DA in music perception and auditory cuing. Further pharmacological studies in people with PD are needed to address the limitations of previous studies and clarify this role.
Endogenous Opioid Systems
Central endogenous opioid systems (EOS) comprise three opioid peptides (β-endorphin, enkephalins, and dynorphin) and their three receptors, Mu (μ), Delta (δ), and Kappa (κ) (Benarroch, 2012). Neurons containing β-endorphin are localized in two main areas, the arcuate nucleus of the hypothalamus and the nucleus tractus solitarius of the brainstem, which send widespread projections to the rest of the brain. Enkephalin and dynorphin are primarily located in local neurons. Opioid receptors are widely but differentially expressed throughout the central nervous system (CNS): the μ receptor is the most abundant receptor in the cerebral cortex, amygdala, thalamus, brainstem, dorsal horn and dorsal root ganglion (DRG) neurons; the δ receptor is mostly expressed in the olfactory system, striatum, limbic cortex, dorsal horn and DRG neurons; and the κ receptor in the claustrum, striatum, hippocampus, hypothalamus, brainstem, and dorsal horn. The EOS is involved in various functions such as reward, pain modulation, stress responses, and autonomic control. In the previous section, the study by Salimpoor et al. (2011) showed that musical pleasure was associated with DA release in the ventral striatum. However, the central EOS, acting synergistically with DA, likely plays a primary role in the positive affect or pleasure induced by music (Chanda & Levitin, 2013). Both endogenous and exogenous opioids can activate DA neurons in the VTA (Hjelmstad, Xia, Margolis, & Fields, 2013) that innervate the NAc. Both EOS and dopamine systems are involved in reward mechanisms, which consist of liking/pleasure (core reactions to hedonic experience of receiving a reward), wanting (motivational aspect), and learning (association and cognitive representations) (Berridge & Kringelbach, 2015), but animal research favors the notion that the EOS but not the DA system generates pleasure. 
Specifically, stimulation of μ opioid receptors in the rostrodorsal part of the medial shell in the NAc, the posterior ventral pallidum, as well as anterior orbitofrontal cortex and posterior insula (Castro & Berridge, 2017) enhances pleasure. In fact, two studies reported that blocking opioid receptors attenuated musical thrills (Goldstein, 1980) and pleasure (Mallik, Chanda, & Levitin, 2017) in response to participant-selected music. In addition to the role of the central EOS in musical pleasure, several studies investigated plasma levels of β-endorphin, which is released from the anterior pituitary gland (see also the section “Neuroendocrine Systems II”), acts as a hormone associated with stress (Kreutz, Murcia, & Bongard,
2012), and therefore functions differently from the β-endorphin in the CNS (Veening, Gerrits, & Barendregt, 2012). Listening to techno-music increased the plasma β-endorphin level, accompanied by increases in other psychophysiological measures as well as changes in emotional states, whereas listening to classical music did not affect them (Gerra et al., 1998). Interestingly, this study also showed an association between the β-endorphin responses to music and personality traits. Higher β-endorphin responses were associated with lower novelty seeking. In contrast to enhanced EOS responses, a decrease in the plasma concentration of β-endorphin was reported in response to experimenter-selected relaxation music, accompanied by a reduction in worries, fear, and blood pressure in coronary patients (Vollert, Störk, Rose, & Möckel, 2003), as well as after a single one-hour singing session in choirs affected by cancer including carers, bereaved carers, and patients (Fancourt, Williamon, et al., 2016). In this latter study, the β-endorphin level showed negative correlations with the levels of immune biomarkers and a positive correlation with another stress biomarker. Another study reported a decrease in response to experimenter-selected classical music and imagery, but not to music only or imagery only (McKinney, Tims, Kumar, & Kumar, 1997). In summary, musical pleasure is associated with the central EOS and music induces changes in the plasma concentration of β-endorphin. As the EOS plays an important role in various functions and has two different functional systems (i.e., central and peripheral systems), more research is needed to replicate and extend existing literature. 
Suggested future studies include PET studies that investigate both EOS and dopamine systems associated with musical pleasure/reward, studies that investigate the effects of different genres of music and personal traits on the release of β-endorphin, as well as studies that investigate the effects of music on EOS associated with pain modulation (i.e., μ opioid receptors in the central pain network; Benarroch, 2012) and stress regulation including both healthy participants and those with pain and stress. When the cerebrospinal fluid (CSF) or peripheral β-endorphin levels are assessed, the diurnal fluctuations should be taken into account. Moreover, it should be noted that plasma β-endorphin responses only weakly reflect those in the CSF, although the two are not entirely independent (Veening et al., 2012).
Serotonin Systems
Serotonin (5-HT) is synthesized in the raphe nuclei of the brainstem. Some of the 5-HT neurons project to the dorsal cochlear nucleus (DCN) and others send ascending projections to the inferior colliculus (IC), in which auditory neurons express multiple subtypes of 5-HT receptors (Hurley & Sullivan, 2012). A few studies demonstrated that the pharmacological stimulation of 5-HT receptors altered auditory perception and thereby subjective feelings in healthy participants. Specifically, a 5-HT receptor 2A (5-HT2A) agonist altered the neural response to both participant-selected, personally meaningful music and experimenter-selected non-meaningful music (Barrett, Preller, Herdener, Janata, & Vollenweider, 2017), enhanced the emotion induced by experimenter-selected music (Kaelen et al., 2015), and enhanced subjective experiences (mental imagery) accompanied by greater brain connectivity during listening to experimenter-selected music (Kaelen et al., 2016). These studies suggest that the variance in the neuroreceptor expression may play a role in subjective musical experiences. Several studies investigated genetic associations between 5-HT systems and musical ability/behavior using conventional genetic approaches such as genome-wide linkage scans, association studies, copy number variation studies, and candidate gene association. However, the associations are weak and inconclusive. In a small sample, musical traits were associated with the protocadherin-alpha gene (Ukkola-Vuoti et al., 2013), which is important for maturation of the serotonergic projections (Katori et al., 2009), and with the galactose mutarotase gene, which plays a role in 5-HT release and in membrane trafficking of the 5-HT transporter (Djurovic et al., 2009). 
The serotonin transporter gene (SLC6A4), which regulates 5-HT supply to the receptors, has been associated with musical memory (Granot et al., 2007) and choir participation (Morley et al., 2012), whereas it showed weak associations with musical aptitude (Ukkola, Onkamo, Raijas, Karma, & Järvelä, 2009) and no association with active music listening (Ukkola-Vuoti et al., 2011). Serotonin is also implicated in behavioral states such as stress and emotional behavior (Hurley & Sullivan, 2012), as well as various psychiatric and neurologic disorders such as depression, anxiety disorders, obsessive-compulsive disorder, dementia, and post-traumatic stress disorder
(Bandelow et al., 2017). One study measured the platelet content of 5-HT as a model of neural biochemistry and reported a decrease in response to experimenter-selected unpleasant music compared to pleasant music, suggesting that unpleasant music induces emotional stress or negative emotions, which led to 5-HT release and decreased intracellular 5-HT content of the serotonergic neurons, reflected by the 5-HT content of platelets (Evers & Suhr, 2000). Another study reported that the plasma serotonin concentration increased in female adolescents with mild depression who received DMT whereas it decreased in those who had no intervention (Jeong et al., 2005). However, the 5-HT levels did not significantly differ between these two groups after the experimental session. There are also two studies that reported no 5-HT changes following music interventions (Kumar et al., 1999; Wahbeh et al., 2007). In summary, the literature provides only weak evidence for associations between 5-HT systems and music. However, music modulates the activity in brain regions associated with emotion (Koelsch, 2014) and musical activities can influence social behavior and interaction. Similarly, the 5-HT systems play an important role in emotional behavior and social interaction (Hurley & Sullivan, 2012). In addition, 5-HT closely interacts with the neuropeptides oxytocin and arginine vasopressin, which are implicated in social behavior and social reward (Albers, 2015; Dölen, Darvishzadeh, Huang, & Malenka, 2013) and have been associated with social aspects of music and musical activities. Therefore, more studies are needed to fully understand the relationships between central 5-HT systems and music.
Neuroendocrine Systems I (Posterior Pituitary)
Two neuropeptides released from the posterior pituitary are oxytocin (OT) and arginine vasopressin (AV). They are highly conserved across species (Johnson & Young, 2017) and modulate social behaviors (Bachner-Melman & Ebstein, 2014), including social cognition (Donaldson & Young, 2008) and social affiliation (Insel, 2010), as well as reproductive behaviors. They are also implicated in psychiatric disorders such as autism spectrum disorder (Bachner-Melman & Ebstein, 2014; Donaldson & Young, 2008).
OT and AV are collectively called nonapeptides because they are composed of nine amino acid residues (Acher & Chauvet, 1995). They are predominantly synthesized in the magnocellular neurons in the hypothalamic supraoptic and paraventricular nuclei and are released both centrally and peripherally, into the circulation through the posterior pituitary (Johnson & Young, 2017), thereby acting as neuromodulators or neurohormones (Bachner-Melman & Ebstein, 2014; Donaldson & Young, 2008). Although several nonapeptide receptors have been identified in the brain, OT receptor (OTR), vasopressin receptor 1a (V1aR), and vasopressin receptor 1b (V1bR) have been a major focus of investigation. These nonapeptide receptors are expressed throughout auditory and mesolimbic pathways (Johnson & Young, 2017).
Oxytocin
Several studies investigated peripheral OT responses to musical activities. A single 30-minute singing lesson increased the serum OT level compared to the baseline level in both professional and amateur singers (Grape, Sandgren, Hansson, Ericson, & Theorell, 2003). Compared to a chatting group, a singing group showed an increase in the salivary OT level and an improvement in psychological well-being (Kreutz et al., 2012). In another study, the plasma OT level increased in a small sample of four singers after improvised singing, but it did not change after pre-composed singing (Keeler et al., 2015). Furthermore, a group of boys with mild emotional disturbance aged between 8 and 12 years showed an increased level of salivary OT in the free session of group drumming compared to the practice session, which was not observed in the same age group of girls or an older aged group of boys (Yuhi et al., 2017). In contrast to these findings, two studies reported that group singing reduced the OT levels. One study found a decrease in the salivary OT level after choir singing (Schladt et al., 2017). However, this change was not observed after solo singing in the same participants. Instead, the OT level increased after solo singing. Another study reported that singing in a single 70-minute choir rehearsal was associated with a decrease in the salivary OT
level across three populations affected by cancer (Fancourt, Williamon, et al., 2016). In addition to these studies with musical activities, the effect of passive listening was investigated in two studies. An elevated plasma OT level was reported in cardiac surgery patients who listened passively to experimenter-selected “soothing” music (soft, relaxing music at 60–80 beats per minute with a volume of 50–60 dB) for 30 minutes one day after the surgery, but not in those who rested without listening to music (Nilsson, 2009). An elevated plasma OT level was also observed in participants with Williams Syndrome (WS) who listened to their favorite music that elicited positive emotions (Dai et al., 2012).
Arginine Vasopressin
Arginine vasopressin receptor 1A (AVPR1A) is one of the main genes that have been associated with musical activities and related behaviors in genome-wide linkage and association studies (Bachner-Melman et al., 2005; Granot et al., 2007; Mariath et al., 2017; Ukkola et al., 2009; Ukkola-Vuoti et al., 2011). The AVPR1A microsatellites have been associated with musical working memory (Granot et al., 2007; but also see Granot, Uzefovsky, Bogopolsky, & Ebstein, 2013), musical aptitude (Ukkola et al., 2009), active music listening (Ukkola-Vuoti et al., 2011), and a wide range of musical abilities (e.g., musical abilities associated with tempo, rhythm, dynamics, vocality, and pitch, as well as creativity and development of musical ideas and accompaniment) (Mariath et al., 2017). Apart from these genetic studies, the relationships between AV and music have been little explored. The only study that has measured the AV level was the study by Dai et al. (2012) mentioned above, which also found an increase in the AV level in participants with WS. In summary, music induces peripheral OT responses and there is some genetic association between AVPR1A and music. However, the directional changes are not consistent among those OT studies. The elevated OT levels are generally implicated in positive social experiences (Chanda & Levitin, 2013). However, OT is also released in response to various kinds of stress (Brown, Cardoso, & Ellenbogen, 2016; de Jong et al., 2015; Pierrehumbert
et al., 2010). The reduction in the OT level may reflect lower arousal and stress during choir singing (Schladt et al., 2017). Taking blood samples may cause stress and increase the OT level in some participants at the baseline measurement, confounding the findings. Alternatively, the inconsistent findings may also be partially derived from different sampling methods. Some studies measured the OT levels in plasma and others measured the salivary OT. These peripheral levels are used as a proxy for the central OT levels. However, there are no strong correlations between central (CSF) and peripheral measures, or among the peripheral measures themselves (Carson et al., 2015; Hoffman, Brownley, Hamer, & Bulik, 2012; Javor et al., 2014; Lefevre et al., 2017; Valstad et al., 2017). Other possible factors explaining the inconsistencies include the influence of gonadal steroids (Insel, 2010) and subjective experiences of the musical activities employed in the study (Yuhi et al., 2017). The baseline measurement in a healthy control group is also needed to understand the directional changes and interactions when clinical populations are studied. To date, there has been only one study that investigated the effects of music on both neuropeptides in a clinical population and no study in healthy participants. For future studies measuring both neuropeptides, it should be noted that OT and AV show similar directional changes for some social behaviors such as pair bonding (Caldwell, 2017), whereas they show different effects in some cases, and opposite effects for aggression (Ferris, 1992; MacLean et al., 2017), anxiety and stress (Bachner-Melman & Ebstein, 2014; Heinrichs, von Dawans, & Domes, 2009), and social approach (Thompson & Walton, 2004). 
In fact, several electrophysiological experiments revealed their differential regulations of excitatory projections in the limbic system (Campbell-Smith, Holmes, Lingawi, Panayi, & Westbrook, 2015; Huber, Veinante, & Stoop, 2005; Lubin, Elliot, Black, & Johns, 2003; Numan et al., 2010). Furthermore, animal research suggests that neuroanatomical distribution of their receptors may be critical for determining function.
N
S P
II (A )
A neuroendocrine system commonly referred to as the hypothalamic-pituitary-adrenal (HPA) axis releases cortisol as its main effector hormone (Spencer, Chun, Hartsock, & Woodruff, 2018). Cortisol plays an important role in circadian and stress regulation. In the absence of stressors, basal cortisol levels fluctuate in a circadian fashion, and the levels rise in response to acute physical or psychological stressors as well as circadian entrainment. Circadian and stress-induced cortisol secretion is determined by the neurohormone corticotropin-releasing factor (CRF), which is produced in and secreted from the medial paraventricular nucleus of the hypothalamus. In response to CRF, the anterior pituitary produces and secretes adrenocorticotropic hormone (ACTH) and β-endorphin (see also the section “Endogenous Opioid Systems”). Triggered by ACTH, cortisol is synthesized in the adrenal cortex. It passively diffuses into the adrenal vein and is carried throughout the circulatory system. In addition to CRF, vasopressin (AVP) is also involved in this secretory process. The cortisol levels discussed in this section are primarily salivary unless mentioned otherwise. Salivary cortisol is a valid and reliable measure of unbound hormone in blood (Kirschbaum & Hellhammer, 1994). Cortisol has been the most studied stress biomarker in response to music (Chanda & Levitin, 2013; Fancourt, Ockelford, & Belai, 2014; Hodges, 2010). 
There is a general consensus that relaxing music, regardless of whether it is experimenter- or participant-selected, reduces cortisol levels (Beaulieu-Boire et al., 2013; Chanda & Levitin, 2013; Chen, Sung, Lee, & Chang, 2015; Fancourt et al., 2014; Hodges, 2010; Jayamala, Lakshmanagowda, Pradeep, & Goturu, 2015; Kreutz et al., 2012; Mejía-Rubalcava, Alanís-Tavira, Mendieta-Zerón, & Sánchez-Pérez, 2015; but also see null findings by Chen et al., 2015; Chlan, Engeland, & Savik, 2013; Good et al., 2013; Tan, McPherson, Peretz, Berkovic, & Wilson, 2014). However, when experimenter-selected relaxing music and participant-selected music from a choice of genres were compared, participant-selected music was more effective in reducing the cortisol level, showing a prolonged effect post-surgery (Leardi et al., 2007). Another study suggests that the sound of rippling water may be more effective than relaxing music in reducing the cortisol level (Thoma et al., 2013). However, neither was significantly different from the control condition without acoustic stimulation.
In addition to the effect of relaxing music, several studies investigated the effect of stimulating music on cortisol levels. Six studies reported that stimulating music also reduced cortisol levels: in female adolescents with chronic depression (Field et al., 1998), in surgical patients (Koelsch et al., 2011), in hypertensive participants (Möckel et al., 1995), in dancers (Quiroga Murcia, Kreutz, Clift, & Bongard, 2010), in participants with lung infection (le Roux, Bouic, & Bester, 2007), and in healthy males (Ooishi, Mukai, Watanabe, Kawato, & Kashino, 2017), whereas other studies found an increase in healthy participants (Brownley, McMurray, & Hackney, 1995; Gerra et al., 1998; Hébert, Béland, Dionne-Fournelle, Crête, & Lupien, 2005; Karageorghis et al., 2017). These results suggest that stimulating music can either attenuate or enhance the cortisol level, which may depend on participant characteristics and/or their music preferences. Furthermore, some studies suggest that music in general may reduce cortisol levels. For example, both participant-selected chill-inducing music and music that participants disliked significantly reduced cortisol levels in both male and female participants (Fukui & Toyoshima, 2013). Additionally, listening to music regardless of genre (Mozart, Strauss, and ABBA) led to a significant reduction in serum cortisol concentrations, which were also significantly lower than those in the silence condition (Trappe & Voit, 2016). Another study reported that both repetitive drumming and instrumental meditation music decreased cortisol levels (Gingras, Pohler, & Fitch, 2014). Taken together, cortisol appears to be responsive to music in general. Cortisol responses to music were also investigated in surgical patients before, during, and after surgery. 
Listening to participant-selected and experimenter-selected music during and after surgery prevented the cortisol level from increasing and/or decreased the cortisol level post-surgery (Graversen & Sommer, 2013; Nilsson, 2009; Schneider, Schedlowski, Schürmeyer, & Becker, 2001; Tabrizi, Sahraei, & Rad, 2012; but also see Lin, Lin, Huang, Hsu, & Lin, 2011). Comparing different periods of music listening, one study reported that listening to experimenter-selected relaxing music following surgery was more effective in reducing the level of serum cortisol than listening to music in the pre- and peri-operative periods (Nilsson, Unosson, & Rawal, 2005). Altogether, these studies suggest that listening to music is beneficial
to surgical patients by reducing cortisol levels. Schneider et al. (2001) reported that the majority of patients in the music group attributed the beneficial effect of music to distraction. Thus, how music exerts its beneficial effect on the cortisol level and other behavioral measures in surgical patients needs further investigation. The counteracting effect of music on elevated cortisol levels induced by acute stressors was also studied in younger healthy participants. Experimenter-selected relaxing music helped to reduce the cortisol level immediately following a psychological stressor, whereas the silence condition led to an increase during the same recovery period (Khalfa, Dalla Bella, Roy, Peretz, & Lupien, 2003), suggesting that relaxing music facilitates faster recovery from the stressor. On the other hand, in another study experimenter-selected relaxing music did not lower cortisol levels after exposure to a psychological stressor, although it prevented stress-induced increases in heart rate, systolic blood pressure, and anxiety compared to the silence condition (Knight & Rickard, 2001). Another study used an acute physiological stressor and demonstrated that tapping to experimenter-selected positive music post-stressor was associated with more positive mood and stronger cortisol responses (i.e., an increase) compared to tapping to neutral music (Koelsch et al., 2016). The positive mood was also associated with the greater cortisol response to the acute stressor in the music group. The authors interpreted these findings as indicating that the stronger cortisol response may reflect an early sign of an immuno-enhancing response to the acute stressor rather than a higher stress level, because the music group overall had a more positive mood (Koelsch et al., 2016). There was no effect of music on the level of ACTH in this study. The inconsistent findings of these studies may be partly due to the different types of stressor used and to how music was applied. 
Moreover, the effects of group musical activities on endocrine responses have been studied. Singing was associated with a reduction in endocrine responses (Fancourt, Aufegger, & Williamon, 2015; Fancourt, Williamon, et al., 2016; Schladt et al., 2017). Cortisol reduction was greater for choir singing than for solo singing and was accompanied by a reduction in salivary OT (Schladt et al., 2017). In addition, the effect of group singing on endocrine responses was modulated by the conditions of performance (Fancourt et al., 2015). More specifically, reduced levels of cortisol and cortisone were observed only in the low-stress condition (singing without
an audience) compared to the high-stress condition (singing in a live concert). On the other hand, no endocrine changes were found following a single session of group drumming (Bittman et al., 2001) or multiple sessions of group drumming (Fancourt, Perkins, et al., 2016) in healthy participants. Music therapy, in which musical and other activities are led by a therapist, showed mixed results. Following guided imagery and music (GIM) therapy, which combines relaxation techniques and listening to classical music, the cortisol level was lower than in the silence condition in healthy participants (McKinney et al., 1997) and in individuals on sick leave (Beck, Hansen, & Gold, 2015). On the other hand, there were no endocrine responses to individualized music therapy in older healthy adults (Suzuki, Kanamori, Nagasawa, Tokiko, & Takayuki, 2007); to individualized music therapy or a multisensory stimulation environment including auditory stimulation in older adults with severe dementia (Valdiglesias et al., 2017); or to movement music therapy in older healthy adults (Shimizu et al., 2013). One study investigated the social effect of music listening on the cortisol level (Linnemann, Strahler, & Nater, 2016). Listening to music in the presence of others (mostly friends), but not listening alone, attenuated the secretion of cortisol. However, the presence of others alone significantly explained the variance in the cortisol level. In addition, the findings of this study should be interpreted with caution since the time intervals between music listening and the cortisol measurement were unknown. Interestingly, listening to music for relaxation was associated with significant reductions in the subjective stress level and in the cortisol concentration in healthy participants (Linnemann, Ditzen, Strahler, Doerr, & Nater, 2015). Moreover, the reduction in the cortisol level was not associated with the perception of the music as relaxing. 
The authors emphasized the importance of non-musical, contextual factors such as the reasons for music listening. It would be interesting to compare this cortisol response with the response to a non-musical control activity undertaken for relaxation. This study also showed that listening to music for distraction increased the stress level, which contrasts with the findings in surgical patients (Schneider et al., 2001). This may be due to differences in participants’ characteristics and/or circumstances.
The effects of music on endocrine measures may differ by sex. For example, testosterone showed opposite responses in men and women: music decreased testosterone in men but increased it in women (Fukui & Toyoshima, 2013; Fukui & Yamashita, 2003). Cortisol responses may also differ between men and women. In one study, men and women showed different trajectories of cortisol levels during a recovery period with music after strenuous exercise. This was observed regardless of musical tempo (Karageorghis et al., 2017). In another study, the cortisol level decreased more steeply in men than in women in both choir and solo singing (Schladt et al., 2017). In contrast, other studies did not find any sex effect on the level of cortisol associated with music listening (Fukui & Yamashita, 2003; Nater, Abbruzzese, Krebs, & Ehlert, 2006). The studies discussed above included adult participants. Several studies investigated cortisol responses to music in younger age groups. In schoolchildren, two extra hours of musical activities, including singing, moving, dancing, or playing instruments, during a school year resulted in a reduction of the cortisol level measured in the afternoon at the end of the school year. However, this result reached statistical significance only when a one-tailed t-test was used (Lindblad, Hogmark, & Theorell, 2007). In preterm infants, exposure to live instrumental music reduced the cortisol level along with improvements in oxygen desaturations, apneas, and pain (Schwilling et al., 2015). On the other hand, recorded lullabies did not affect the cortisol level or sleep–awake behavior (Dorn et al., 2014). Another study also did not find any effect of a recorded lullaby combined with touch on the cortisol level (Qiu et al., 2017). This study, however, showed that following the intervention, blood β-endorphin was significantly increased, accompanied by decreased pain responses. 
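The dependence of the schoolchildren result on a one-tailed test can be made concrete with a small worked example. The sketch below is illustrative only: the change scores are hypothetical values invented for the demonstration (not data from Lindblad et al., 2007), and the helper `t_cdf` is an assumed stdlib-only approximation of the Student's t distribution rather than any study's analysis pipeline. It shows that, for the same paired data, the one-tailed p-value is half the two-tailed one when the effect lies in the predicted direction, so a result can cross the 0.05 threshold under one test and not the other.

```python
import math
from statistics import mean, stdev


def t_cdf(t, df, steps=2000):
    """Approximate the Student's t CDF with Simpson's rule (steps must be even)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    area = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area


# Hypothetical post-minus-pre cortisol changes (arbitrary units); NOT real data.
changes = [-3, 1, -2, -4, 0, -1, -5, 2, -2, -1]
n = len(changes)

# Paired t statistic: mean change divided by its standard error.
t_stat = mean(changes) / (stdev(changes) / math.sqrt(n))
cdf = t_cdf(t_stat, n - 1)

p_one_tailed = cdf                    # H1: mean change < 0 (a reduction)
p_two_tailed = 2 * min(cdf, 1 - cdf)  # H1: mean change != 0

print(f"t = {t_stat:.3f}, one-tailed p = {p_one_tailed:.4f}, "
      f"two-tailed p = {p_two_tailed:.4f}")
```

With these invented values the reduction counts as significant at the 0.05 level only under the directional hypothesis, which is why a finding reported with a one-tailed test is usually read as weaker evidence than one that survives a two-tailed test.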
To summarize, the evidence that music reduces the cortisol level is relatively conclusive. The beneficial effects of music may be associated with distraction from aversive states (Chanda & Levitin, 2013) in the context of acute stressors (Linnemann, Kappert, et al., 2015) and/or with the listener’s intention in listening to music (Linnemann, Ditzen, et al., 2015). Further studies are needed to clarify how music exerts beneficial effects on stress biomarkers. Endocrine responses are primarily studied in relation to stress, where multiple factors can affect the findings, for example whether the stressor is acute or chronic (Koelsch et al., 2016) or
whether it is psychological, physiological, or physical. In addition, the appropriate stress response differs depending on the circadian phase (Spencer et al., 2018). Therefore, further studies are warranted to elucidate the effect of music on stress responses.
Norepinephrine Systems
Norepinephrine (NE) neurons are located in the brainstem, primarily in the locus coeruleus (LC), whose axons project widely to the cerebral cortex, limbic regions, thalamus, and cerebellum as well as to the spinal cord. The major NE projection from the LC is thought to play an important role in stress responses and various psychiatric disorders (Hurley, Flashman, Chow, & Taber, 2010). In addition, the NE neurons located in the caudal pons and medulla are involved in the function of the sympathetic nervous system (SNS), regulating the autonomic responses of heart rate, blood pressure, and respiration. The activation of the SNS induced by physical or psychological stressors releases NE, which stimulates the adrenal glands, which in turn synthesize and secrete hormonal norepinephrine, epinephrine, and dopamine (Kreutz et al., 2012). Music has been studied as an intervention to reduce stress and to normalize the SNS. A single therapeutic session using relaxing music reduced the plasma level of epinephrine in critically ill patients, which was accompanied by reductions in the amount of sedative drug required, in blood pressure, and in heart rate (Conrad et al., 2007). Similarly, another study reported that music therapy sessions using familiar music lowered the plasma NE level in elderly people with dementia and cardiovascular disease compared to those without music therapy (Okada et al., 2009). The patients in the music therapy group also showed improvement in other SNS measures and a reduction in the number of congestive heart failure events. On the other hand, music therapy sessions increased both NE and epinephrine levels in males with Alzheimer’s disease (Kumar et al., 1999). The increased levels were normalized at a six-week follow-up. Two studies demonstrated the differential effects of stimulating and relaxing music. Experimenter-selected slow rhythm (classical) music decreased the plasma NE level whereas experimenter-selected fast rhythm
(music from action movies) increased the plasma epinephrine level in healthy male participants (Yamamoto et al., 2003). These changes did not affect subsequent exercise performance. Similarly, using salivary alpha-amylase as a surrogate biomarker for the SNS, energizing music increased its activity whereas relaxing music decreased it in healthy participants (Linnemann, Ditzen, et al., 2015). One study showed effects of music and of musical genre on NE and epinephrine levels in patients but not in healthy participants (Möckel et al., 1995). The hypertensive participants who selected modern classical music from a choice of preselected genres showed a reduction in the NE level, whereas those who selected meditative music showed a reduced epinephrine level. The participants in both groups also showed reductions in other stress biomarkers. Another study showed that religious Islamic music selected by an experimenter reduced the plasma NE level, whereas classical music increased it, in Muslim participants who listened to the music during a dental procedure; these effects were accompanied by changes in the same direction in systolic blood pressure. The pre- to post-procedure differences in the NE level and in systolic blood pressure in the religious music group also differed significantly from those in the classical music and no-music groups (Maulina, Djustiana, & Shahib, 2017). These studies suggest that the significance of the music to the listener may play an important role in exerting positive effects on hormonal and physiological measures. 
In contrast, there were no catecholaminergic changes in response to experimenter-selected stimulating music (Hirokawa & Ohira, 2003) or experimenter-selected relaxing music in healthy participants (Gerra et al., 1998; Hirokawa & Ohira, 2003); experimenter-selected relaxing music in post-operative critically ill patients (Conrad et al., 2007); participant-selected music from a list, from a choice of genre, or of their preferences in preoperative patients (Lin et al., 2011; Schneider et al., 2001; Wang, Kulkarni, Dolev, & Kain, 2002) or in those receiving ventilator support (Chlan, Engeland, & Anthony, 2007); or participant-selected music from a choice of genre in patients under general anesthesia (Migneault et al., 2004) or in post-operative patients (Lin et al., 2011). Furthermore, experimenter-selected positive music from various genres and with varying tempo that evokes feelings of pleasure and happiness did not change the levels of NE
compared to neutral auditory stimuli following an acute physiological stressor (Koelsch et al., 2016). To summarize, the literature shows conflicting results on peripheral catecholaminergic responses to music. Music tends to decrease catecholamine levels in some individuals with medical conditions. The tempo/rhythm of music may be an important factor influencing the responses. This may reflect the fact that the auditory nuclei in the brainstem and the midbrain that encode auditory temporal information (Griffiths, Uppenkamp, Johnsrude, Josephs, & Patterson, 2001) are innervated by the NE system from the LC (Levitt & Moore, 1979; Thompson, 2003). The beneficial effects observed in listening to religious Islamic music among Muslim participants (Maulina et al., 2017) suggest top-down regulation of the NE system and SNS by music. The underlying mechanisms for the effects of music on the SNS are also discussed in review papers (Fancourt et al., 2014; Juslin & Västfjäll, 2008).
Peripheral Immune System
The immune system functions to protect and defend the body against infection and damage from foreign organisms and toxins, while maintaining checks and balances to prevent self-reactivity. It has two branches: innate and adaptive immunity. The innate immune responses occur immediately following an insult and are the first component of the immune system to be activated against invasion (Turvey & Broide, 2010). They include the activation of immune cells such as granulocytes and monocytes/macrophages, and the secretion of pro-inflammatory cytokines such as interleukin (IL)-1β, IL-6, tumor necrosis factor alpha (TNF-α), and interferon-gamma (IFN-γ) to upregulate the acute inflammatory response. In contrast, the adaptive immune system, consisting of B cells and T cells, is slower acting, with its responses occurring days to weeks after exposure. Unlike the innate immune system, the adaptive immune system is capable of memory and is able to adjust in response to pathogens. Anti-inflammatory cytokines, which include IL-1Ra, IL-4, IL-6, IL-10, IL-11, IL-13, and TNF-β, modulate the inflammatory immune response to prevent the harmful effects of prolonged immune system activation. It should be noted that immune cells such as natural killer (NK) cells and dendritic cells cannot be clearly
defined as innate or adaptive and that some cytokines have both pro- and anti-inflammatory properties depending on the amount of the cytokines expressed, the length of time they are expressed, or which form of the receptors the cytokines activate (Rainville, Tsyglakova, & Hodes, 2018).
Immune Cells
Group drumming led to an increase in NK cell levels in healthy adults (Bittman et al., 2001) and increased CD4+ T cell and memory T cell counts in older adults, but not in younger adults (Koyama et al., 2009). In contrast, listening to experimenter-selected relaxing music during surgery decreased NK cell levels, which was not observed in patients who chose their music from preselected music (Leardi et al., 2007).
Cytokines
Among cytokines, IL-6 (which has both pro- and anti-inflammatory properties) has been the most researched in relation to music. Music therapy sessions using relaxing music reduced IL-6 levels, which was accompanied by reductions in SNS biomarkers, in surgical patients (Conrad et al., 2007) and in elderly people with cerebrovascular disease and dementia (Okada et al., 2009). Experimenter-selected classical music also decreased the IL-6 level among older adults who liked the genre, which was accompanied by an increase in the expression of μ opioid receptors (Stefano, Zhu, Cadet, Salamon, & Mantione, 2004), whereas it did not change the levels of other cytokines. On the other hand, group drumming exercises led to an increase in the IL-6 level, along with increased levels of pro-inflammatory IFN-γ, in older adults, which was not observed in younger adults (Koyama et al., 2009). Because Koyama et al. (2009) also reported increased CD4+ T cell and memory T cell counts only in older adults, the increased IL-6 level may be anti-inflammatory. Although it appeared that IL-6 showed “the greatest levels of responsiveness” (Fancourt et al., 2014, p. 18), more recent studies showed
otherwise (Beaulieu-Boire et al., 2013; Fancourt, Perkins, et al., 2016; Fancourt, Williamon, et al., 2016; Koelsch et al., 2016). Further research is needed to determine whether or not IL-6 is a sensitive immune biomarker in response to music. Other cytokines also showed responses to music. The level of anti-inflammatory IL-1 was increased, along with a cortisol reduction, in response to participant-selected music compared to the control conditions (Bartlett, Kaufman, & Smeltekop, 1993). In another study, anti-inflammatory IL-4 was increased, accompanied by a reduction in a pro-inflammatory marker, monocyte chemoattractant protein (MCP), in response to multiple group drumming sessions (Fancourt, Perkins, et al., 2016). Another study reported increases in inflammatory markers, including pro-inflammatory IL-2 and soluble IL-2 receptor α; anti-inflammatory IL-4; and IL-17, which displays both pro- and anti-inflammatory profiles, along with improved affect and reductions in cortisol, β-endorphin, and OT levels, in response to a single session of singing in choirs affected by cancer (Fancourt, Williamon, et al., 2016). One study found that Mozart, but not Beethoven or Schubert, downregulated the levels of anti-inflammatory IL-4, IL-10, and IL-13 and upregulated the levels of pro-inflammatory cytokines such as IFN-γ and IL-12, which was also associated with alleviated allergic skin responses (Kimata, 2003). The findings reported by Kimata (2003) may reflect an enhancement of pro-inflammatory responses induced by music, similar to the increased cortisol responses following the acute physiological stressor (Koelsch et al., 2016).
Immunoglobulin A
Along with these peripheral immune biomarkers, immunoglobulin A (IgA) is one of the most commonly studied immune biomarkers associated with music. Immunoglobulin A is a major serum immunoglobulin that is predominantly produced in the bone marrow and mediates various protective functions through interaction with specific receptors and immune mediators (Woof & Ken, 2006). Immunoglobulin A is also a principal antibody class in the external secretions that bathe the vast mucosal surfaces of the gastrointestinal, respiratory, and genitourinary tracts and plays an
important role in first-line immune protection. Secretory and serum IgA have different biochemical and immunochemical properties and are produced by cells with different organ distributions. Therefore, different methods of immunization can induce either secretory or serum IgA responses or a combination of both. In general, research has yielded consistent results: music increases the concentration or secretion rate of secretory IgA (S-IgA) (Chanda & Levitin, 2013; Fancourt et al., 2014; Hodges, 2010), suggesting that music enhances immunity in healthy individuals. Furthermore, the S-IgA increase was greater when engaging in group singing compared to passive listening (Beck, Cesario, Yousefi, & Enamoto, 2000; Kreutz, Bongard, Rohrmann, Hodapp, & Grebe, 2004; Kuhn, 2002). Another study showed that S-IgA was increased only in response to “designer music” that brings positive feelings, but not to relaxing (new age) or rock music (McCraty, Atkinson, & Rein, 1996). However, a few studies have reported no changes in the levels of IgA. In two studies, the serum levels of IgA did not change in patients who listened to experimenter-selected calming music post-surgery (Nilsson et al., 2005) or joyful music (which was described to the patients as “relaxing” acoustic stimulation to reduce noise) before, during, or after surgery (Koelsch et al., 2011). The absence of a music effect on plasma IgA concentrations may be due to the effects of local anesthetic infiltration (Nilsson et al., 2005) or to differences between S-IgA and serum IgA in their response to music (Woof & Ken, 2006). Furthermore, two studies reported no changes following stressors such as eating adverse/allergic food (Kejr et al., 2010) or a stressful cognitive task (Hirokawa & Ohira, 2003). The immunoenhancement effect of music may be limited to healthy individuals without exposure to stressors. To summarize, there is some evidence that music induces changes in immune biomarkers. 
S-IgA appears to respond most consistently and robustly to music in healthy individuals. Music-induced increase of S-IgA is interpreted as immunoenhancement. Future studies can investigate how long this effect lasts and whether music experiences and habits modulate the effect. Although music induces responses of other immune biomarkers, the interpretations can be challenging due to the inconsistency in the directional changes of cytokines with different inflammatory properties.
An interesting observation from animal research is that individual differences in the peripheral immune system influence the development of stress susceptibility, demonstrated by higher circulating levels of IL-6 and leukocytes in susceptible mice compared to resilient and control mice (Hodes et al., 2014; Rainville et al., 2018). Therefore, it may be useful to separate participants depending on the baseline level of immune biomarkers. Furthermore, as immune biomarkers are closely connected with hormones (Yovel, Shakhar, & Ben-Eliyahu, 2001), sex may need to be accounted for in the study design.
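The suggestion to separate participants by baseline immune level can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `participants` list, the IL-6 values, and the change scores are all hypothetical, and a median split is only one simple way to stratify (treating baseline as a continuous covariate is a common alternative). The point is that responses in opposite directions in the two strata can be hidden when only the pooled mean is reported.

```python
from statistics import mean, median

# Hypothetical participants: (baseline IL-6 in pg/mL, post-minus-pre change).
# These numbers are invented for illustration; they are NOT real data.
participants = [
    (1.0, -0.25), (1.5, -0.5), (2.0, 0.0),
    (3.0, 0.5), (3.5, 0.75), (4.0, 0.25),
]

# Median split on the baseline biomarker level.
cut = median(baseline for baseline, _ in participants)
low_group = [change for baseline, change in participants if baseline <= cut]
high_group = [change for baseline, change in participants if baseline > cut]

pooled_change = mean(change for _, change in participants)
low_change = mean(low_group)
high_change = mean(high_group)

print(f"cut-off = {cut}, pooled = {pooled_change}, "
      f"low-baseline = {low_change}, high-baseline = {high_change}")
```

In this invented example the pooled mean change is small because the low-baseline and high-baseline strata move in opposite directions, which is the kind of pattern that stratifying by baseline is meant to reveal.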
Cholinergic Systems
Cholinergic neurons are localized in the basal forebrain, pedunculopontine tegmental nucleus (PPT), and laterodorsal tegmental nucleus (LDT). The latter two nuclei are collectively termed the pontomesencephalic tegmentum (PMT) nuclei. The PMT nuclei send widespread projections to the spinal cord, thalamus, basal forebrain, and frontal cortex. Major acetylcholine (ACh) receptors include nicotinic (nAChR) and muscarinic (mAChR) receptors, which are expressed in the auditory system (Metherate, 2011; Morley & Happe, 2000). Cholinergic modulation of auditory functions is well studied, and animal research has demonstrated that cholinergic neurons are responsive to simple auditory stimuli such as pure tones (Koyama, Jodo, & Kayama, 1994) and clicks (Reese, Garcia-Rill, & Skinner, 1995a, b). However, whether acoustic information including music induces cholinergic responses in the human brain is unknown. In animal research, some neurons in the primary auditory cortex send direct glutamatergic projections to the superior olivary complex, as well as to the PMT, which innervates the IC and the auditory thalamus (Motts & Schofield, 2010). These observations suggest that auditory stimuli activating the primary auditory cortex may be able to affect the activity of cholinergic neurons in the PMT, influencing various functions such as arousal, the sleep–wake cycle, motor control, and motivation and reward behavior (Schofield, 2010). Cholinergic PMT cells are connected with dopaminergic neurons in the VTA (Chen, Nakamura, Kawamura, Takahashi, & Nakahara, 2006; Pan & Hyland, 2005) and these connections are likely to be involved
in reward behavior (Pan & Hyland, 2005), while the connections between the PPT and the BG are associated with motor functions. The cholinergic PPT neurons responsive to clicks (Reese et al., 1995a, b) may underlie part of the mechanisms for auditory-motor entrainment. There is also a network in which the mediodorsal nucleus of the thalamus projects to cholinergic and non-cholinergic neurons in the globus pallidus that in turn project to the auditory cortex (Moriizumi & Hattori, 1992), which may also be associated with auditory-motor functions.
Discussion and Future Directions
Research demonstrates that music induces responses of neurochemicals as well as peripheral hormones and immune biomarkers, along with concomitant functional changes. Some of these have been extensively studied and yield relatively consistent responses (e.g., a reduction in cortisol and an increase in S-IgA), while others have been little studied and/or show inconsistent results. In addition, few studies have directly investigated CNS responses. As both the neuroscientific study of music and the clinical application of music are of growing interest, more studies are needed to elucidate and confirm the neurochemical responses to music and acoustic information by employing more rigorous study designs. Future studies need to consider participant characteristics such as age, sex, trait and state depression and anxiety levels, baseline neurochemical levels, polymorphisms associated with musical ability, music education/training levels, and music listening habits and preferences. At the same time, more studies are needed to investigate the effects of these individual characteristics on neurochemical responses in order to determine the important confounding variables in music studies. Moreover, studies of clinical populations or older healthy adults need to include control groups consisting of participants without the medical conditions or of younger healthy adults, to determine whether the target group differs from the control group in the baseline level of neurochemical measures and in how it reacts to the music intervention. The existing literature has used different methods to evaluate the molecular responses to music. Some studies simply compared the levels between pre-
and post-music intervention and others additionally included control silence conditions. In addition, some studies used passive listening and others used group musical activities. In order to isolate the specific effects of music, future studies need to include control conditions well matched with music conditions in terms of attention, engagement, and interactions (e.g., passive listening to music versus passive listening to an audio book, as suggested by Chanda & Levitin, 2013). In general, research shows that participant-selected music elicits greater responses than experimenter-selected music. When experimenter-selected music is used in experimental studies or in clinical settings, participants’ ratings of the selected music for emotional dimensions and liking may help to explain the findings or account for some of the inconsistency and variance in the responses, although subjective and objective hedonic reactions do not always correspond (Berridge & Kringelbach, 2015). More specific descriptions of the music may also help to clarify which components of music are important for inducing such responses. Furthermore, concomitant measures of other relevant biomarkers and physiological/emotional/behavioral data are useful for determining whether observed neurochemical responses are beneficial. Demonstrated correlations may help to interpret the findings and the underlying mechanisms. Moreover, findings that use peripheral measures to infer brain functions should be interpreted with caution unless the measures are well-validated proxies for the central measures. In addition, the timing of measurement is important for some biomarkers, and thus multiple measurements over a period of time may be able to capture a more distinct response. 
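The design contrast described above, between a simple pre/post comparison and a comparison that also includes a control condition, can be illustrated with a short calculation. The following is a sketch under stated assumptions: all cortisol values are hypothetical numbers chosen for the example, and the subtraction of the control-group change from the music-group change (a difference-in-differences estimate) is one simple way to express the idea, not the method of any particular study cited here.

```python
from statistics import mean

# Hypothetical salivary cortisol values (arbitrary units); NOT real data.
music_pre = [12, 14, 11, 13]
music_post = [9, 11, 9, 10]
control_pre = [12, 13, 12, 14]    # e.g., a matched silence or audio book condition
control_post = [11, 12, 10, 13]

# Pre/post change within each condition.
music_change = mean(music_post) - mean(music_pre)
control_change = mean(control_post) - mean(control_pre)

# Change attributable to music beyond what also occurs in the control condition.
music_specific_change = music_change - control_change

print(f"music: {music_change}, control: {control_change}, "
      f"music-specific: {music_specific_change}")
```

In this invented example, part of the pre/post drop also appears in the control condition (for instance through rest or the circadian decline), so a pre/post comparison alone would overstate the music-specific effect.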
To date, only one study has directly addressed neurochemical changes associated with music listening (Salimpoor et al., 2011); it used PET with the D2/D3 receptor antagonist [11C]raclopride (see section “Dopamine Systems” for details). However, D2/D3 receptor agonists such as [11C]-(+)-PHNO (Rabiner & Laruelle, 2010; Willeit et al., 2006) may be more advantageous for investigating functional changes in the ventral striatum, because it is more sensitive than the antagonist [11C]raclopride to competition from endogenous dopamine following administration of dopamine-releasing stimuli (Narendran et al., 2010; Shotbolt et al., 2012; Willeit et al., 2008) and shows up to 20-fold higher affinity for D3 over D2 receptors, providing higher sensitivity and allowing better quantification of the D3 receptor subtype in the ventral striatum (Graff-Guerrero et al., 2008; Narendran et al., 2006). Three radioligands have been developed for opioid receptors: [11C]carfentanil, targeting µ opioid receptors (Frost et al., 1989); [11C]diprenorphine, a non-selective opioid receptor ligand (Jones et al., 1994); and the more recent [11C]-LY2795050, targeting κ opioid receptors (Naganawa et al., 2015). For the neuroimmune system, a number of radioligands have been developed that target translocator protein (TSPO), which is localized to the outer mitochondrial membrane of glial cells and has been used as a biomarker of the neuroimmune system and neuroinflammation in normal aging and in various diseases and disorders (Gunn et al., 2015). In addition, radioligands for 5-HT and cholinergic receptor subtypes, as well as for NE, have been developed. When PET studies are conducted, demographic characteristics such as age and sex, as well as body mass, are important confounding variables to take into account (Gunn et al., 2015). Polymorphisms can also have a great impact on binding; for example, a TSPO polymorphism produces three different binding phenotypes (Owen et al., 2011). Alongside the neuroreceptors and proteins that can be studied using PET imaging, the recent development of proton magnetic resonance spectroscopy at high magnetic field strengths allows more reliable estimation of the amino acids glutamine, glutamate, and gamma-aminobutyric acid (Ciurleo, Di Lorenzo, Bramanti, & Marino, 2014). This neuroimaging technique may be able to shed light on cortico-cortical interactions and top-down modulation in music processing. Moreover, pharmacological studies using a double-blind, placebo-controlled crossover design, combined with the more accessible fMRI, would help to elucidate the role of neurochemicals in music-associated complex functions such as cognition and emotional behavior.
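In PET displacement designs of the kind discussed above, the outcome is often expressed as the percent change in radioligand binding potential between a control scan and a stimulus scan, with lower binding in the stimulus scan interpreted as greater competition from the endogenous transmitter. The Python sketch below illustrates only that arithmetic. The function name and the binding potential values are hypothetical and do not come from any study cited in this chapter, and real analyses involve kinetic modeling and region definition that are not shown.

```python
def percent_bp_change(bp_control, bp_stimulus):
    """Percent reduction in binding potential from control to stimulus scan.

    A positive result means binding was lower during the stimulus scan,
    which is conventionally read as increased endogenous transmitter release.
    """
    if bp_control <= 0:
        raise ValueError("control binding potential must be positive")
    return (bp_control - bp_stimulus) / bp_control * 100.0


# Hypothetical regional binding potentials for one participant.
bp_neutral_music = 2.50      # control condition (assumed value)
bp_pleasurable_music = 2.35  # stimulus condition (assumed value)

change = percent_bp_change(bp_neutral_music, bp_pleasurable_music)
```

Because demographic characteristics and polymorphisms can affect baseline binding, expressing the effect as a within-participant percentage is one way to reduce, though not remove, between-participant variability.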
References
Acher, R., & Chauvet, J. (1995). The neurohypophysial endocrine regulatory cascade: Precursors, mediators, receptors, and effectors. Frontiers in Neuroendocrinology 16(3), 237–289. Albers, H. E. (2015). Species, sex and individual differences in the vasotocin/vasopressin system: Relationship to neurochemical signaling in the social behavior neural network. Frontiers in Neuroendocrinology 36, 49–71. Bachner-Melman, R., Dina, C., Zohar, A. H., Constantini, N., Lerer, E., Hoch, S., … Ebstein, R. P. (2005). AVPR1a and SLC6A4 gene polymorphisms are associated with creative dance performance. PLoS Genetics 1(3), 394–403. Retrieved from https://doi.org/10.1371/journal.pgen.0010042 Bachner-Melman, R., & Ebstein, R. P. (2014). The role of oxytocin and vasopressin in emotional and social behaviors. Handbook of Clinical Neurology 124, 53–68. Bandelow, B., Baldwin, D., Abelli, M., Bolea-Alamanac, B., Bourin, M., Chamberlain, S. R., … Riederer, P. (2017). Biological markers for anxiety disorders, OCD and PTSD: A consensus statement. Part II: Neurochemistry, neurophysiology and neurocognition. World Journal of Biological Psychiatry 18(3), 162–214. Barrett, F. S., Preller, K. H., Herdener, M., Janata, P., & Vollenweider, F. X. (2017). Serotonin 2A receptor signaling underlies LSD-induced alteration of the neural response to dynamic changes in music. Cerebral Cortex (December), 1–12. Retrieved from https://doi.org/10.1093/cercor/bhx257 Bartlett, D., Kaufman, D., & Smeltekop, R. (1993). The effects of music listening and perceived sensory experience on the immune system as measured by interleukin-1 and cortisol. Journal of Music Therapy 30(4), 194–209. Beaulieu-Boire, G., Bourque, S., Chagnon, F., Chouinard, L., Gallo-Payet, N., & Lesur, O. (2013). Music and biological stress dampening in mechanically-ventilated patients at the intensive care unit ward: A prospective interventional randomized crossover trial. Journal of Critical Care 28(4), 442–450. Beck, B. D., Hansen, A. M., & Gold, C. (2015). Coping with work-related stress through guided imagery and music (GIM): Randomized controlled trial. Journal of Music Therapy 52(3), 323–352. Beck, R. J., Cesario, T. C., Yousefi, A., & Enamoto, H. (2000). Choral singing, performance perception, and immune system changes in salivary immunoglobulin A and cortisol. Music Perception: An Interdisciplinary Journal 18(1), 87–106. Benarroch, E. E. (2012). Endogenous opioid systems: Current concepts and clinical correlations. Neurology 79, 807–814. Berridge, K. C., & Kringelbach, M. L. (2015). Pleasure systems in the brain. Neuron 86(3), 646–664. Bittman, B., Berk, L., Felten, D., Westengard, J., Simonton, O., Pappas, J., & Ninehouser, M. (2001). Composite effects of group drumming music therapy on modulation of neuroendocrine-immune parameters in normal subjects. Alternative Therapies 7(1), 38–47. Brown, C. A., Cardoso, C., & Ellenbogen, M. A. (2016). A meta-analytic review of the correlation between peripheral oxytocin and cortisol concentrations. Frontiers in Neuroendocrinology 43, 19–27. Brownley, K. A., McMurray, R. G., & Hackney, A. C. (1995). Effects of music on physiological and affective responses to graded treadmill exercise in trained and untrained runners. International Journal of Psychophysiology 19(3), 193–201. Caldwell, H. K. (2017). Oxytocin and vasopressin: Powerful regulators of social behavior. Neuroscientist 23(5), 517–528. Cameron, D. J., Pickett, K. A., Earhart, G. M., & Grahn, J. A. (2016). The effect of dopaminergic medication on beat-based auditory timing in Parkinson’s disease. Frontiers in Neurology 7, 1–8. Retrieved from https://doi.org/10.3389/fneur.2016.00019
Campbell-Smith, E. J., Holmes, N. M., Lingawi, N. W., Panayi, M. C., & Westbrook, R. F. (2015). Oxytocin signaling in basolateral and central amygdala nuclei differentially regulates the acquisition, expression, and extinction of context-conditioned fear in rats. Learning & Memory 22(5), 247–257. Carson, D. S., Berquist, S. W., Trujillo, T. H., Garner, J. P., Hannah, S. L., Hyde, S. A., … Parker, K. J. (2015). Cerebrospinal fluid and plasma oxytocin concentrations are positively correlated and negatively predict anxiety in children. Molecular Psychiatry 20(9), 1085–1090. Castro, D. C., & Berridge, K. C. (2017). Opioid and orexin hedonic hotspots in rat orbitofrontal cortex and insula. Proceedings of the National Academy of Sciences 114(43), E9125–E9134. Chanda, M. L., & Levitin, D. J. (2013). The neurochemistry of music. Trends in Cognitive Sciences 17(4), 179–191. Chen, C. J., Sung, H. C., Lee, M. S., & Chang, C. Y. (2015). The effects of Chinese five-element music therapy on nursing students with depressed mood. International Journal of Nursing Practice 21(2), 192–199. Chen, J., Nakamura, M., Kawamura, T., Takahashi, T., & Nakahara, D. (2006). Roles of pedunculopontine tegmental cholinergic receptors in brain stimulation reward in the rat. Psychopharmacology 184(3–4), 514–522. Chlan, B. L. L., Engeland, W. C., & Anthony, A. (2007). Influence of music on the stress response in patients receiving mechanical ventilatory support: A pilot study. American Journal of Critical Care 16(2), 141–146. Chlan, L. L., Engeland, W. C., & Savik, K. (2013). Does music influence stress in mechanically ventilated patients? Intensive and Critical Care Nursing 29(3), 121–127. Ciurleo, R., Di Lorenzo, G., Bramanti, P., & Marino, S. (2014). Magnetic resonance spectroscopy: An in vivo molecular imaging biomarker for Parkinson’s disease? BioMed Research International 2014, 519816. 
Retrieved from https://doi.org/10.1155/2014/519816 Conrad, C., Niess, H., Jauch, K., Bruns, C., Hartl, W., & Welker, L. (2007). Overture for growth hormone: Requiem for interleukin-6? Critical Care Medicine 35(12), 2709–2713. Dai, L., Carter, C. S., Ying, J., Bellugi, U., Pournajafi-Nazarloo, H., & Korenberg, J. R. (2012). Oxytocin and vasopressin are dysregulated in Williams syndrome, a genetic disorder affecting social behavior. PLoS ONE 7(6), e38513. Retrieved from https://doi.org/10.1371/journal.pone.0038513 de Jong, T. R., Menon, R., Bludau, A., Grund, T., Biermeier, V., Klampfl, S. M., … Neumann, I. D. (2015). Salivary oxytocin concentrations in response to running, sexual self-stimulation, breastfeeding and the TSST: The Regensburg Oxytocin Challenge (ROC) study. Psychoneuroendocrinology 62, 381–388. Djurovic, S., Le Hellard, S., Kähler, A. K., Jönsson, E. G., Agartz, I., Steen, V. M., … Andreassen, O. A. (2009). Association of MCTP2 gene variants with schizophrenia in three independent samples of Scandinavian origin (SCOPE). Psychiatry Research 168(3), 256–258. Dölen, G., Darvishzadeh, A., Huang, K. W., & Malenka, R. C. (2013). Social reward requires coordinated activity of nucleus accumbens oxytocin and serotonin. Nature 501(7466), 179–184. Donaldson, Z. R., & Young, L. J. (2008). Oxytocin, vasopressin, and the neurogenetics of sociality. Science 322(5903), 900–904. Correction (2009): Science 323(5920), 1429. Dorn, F., Wirth, L., Gorbey, S., Wege, M., Zemlin, M., Maier, R. F., & Lemmer, B. (2014). Influence of acoustic stimulation on the circadian and ultradian rhythm of premature infants. Chronobiology International 31(9), 1062–1074. Elsinger, C. L., Rao, S. M., Zimbelman, J. L., Reynolds, N. C., Blindauer, K. A., & Hoffmann, R. G. (2003). Neural basis for impaired time reproduction in Parkinson’s disease: An fMRI study. Journal of the International Neuropsychological Society 9(7), 1088–1098.
Evers, S., & Suhr, B. (2000). Changes of the neurotransmitter serotonin but not of hormones during short time music perception. European Archives of Psychiatry and Clinical Neuroscience 250(3), 144–147. Fancourt, D., Aufegger, L., & Williamon, A. (2015). Low-stress and high-stress singing have contrasting effects on glucocorticoid response. Frontiers in Psychology 6, 1–5. Retrieved from https://doi.org/10.3389/fpsyg.2015.01242 Fancourt, D., Ockelford, A., & Belai, A. (2014). The psychoneuroimmunological effects of music: A systematic review and a new model. Brain, Behavior, and Immunity 36, 15–26. Fancourt, D., Perkins, R., Ascenso, S., Carvalho, L. A., Steptoe, A., & Williamon, A. (2016). Effects of group drumming interventions on anxiety, depression, social resilience and inflammatory immune response among mental health service users. PLoS ONE 11(3), 1–16. Retrieved from https://doi.org/10.1371/journal.pone.0151136 Fancourt, D., Williamon, A., Carvalho, L. A., Steptoe, A., Dow, R., & Lewis, I. (2016). Singing modulates mood, stress, cortisol, cytokine and neuropeptide activity in cancer patients and carers. Ecancermedicalscience 10, 1–13. Retrieved from https://doi.org/10.3332/ecancer.2016.631 Ferris, C. (1992). Role of vasopressin in aggressive and dominant/subordinate behaviors. Annals of the New York Academy of Sciences 652, 212–226. Field, T., Martinez, A., Nawrocki, T., Pickens, J., Fox, N., & Schanberg, S. (1998). Music shifts frontal EEG in depressed adolescents. Adolescence 33(129), 109–116. Frost, J. J., Douglass, K. H., Mayberg, H. S., Dannals, R. F., Links, J. M., Wilson, A. A., … Wagner, H. N. (1989). Multicompartmental analysis of [11C]-Carfentanil binding to opiate receptors in humans measured by positron emission tomography. Journal of Cerebral Blood Flow & Metabolism 9(3), 398–409. Fukui, H., & Toyoshima, K. (2013). Influence of music on steroid hormones and the relationship between receptor polymorphisms and musical ability: A pilot study. 
Frontiers in Psychology 4, 1– 8. Retrieved from https://doi.org/10.3389/fpsyg.2013.00910 Fukui, H., & Yamashita, M. (2003). The effects of music and visual stress on testosterone and cortisol in men and women. Neuro Endocrinology Letters 24(3–4), 173–180. Gerra, G., Zaimovic, A., Franchini, D., Palladino, M., Giucastro, G., Reali, N., … Brambilla, F. (1998). Neuroendocrine responses of healthy volunteers to “techno-music”: Relationships with personality traits and emotional state. International Journal of Psychophysiology 28, 99–111. Gingras, B., Pohler, G., & Fitch, W. T. (2014). Exploring shamanic journeying: Repetitive drumming with shamanic instructions induces specific subjective experiences but no larger cortisol decrease than instrumental meditation music. Plos ONE 9(7). Retrieved from https://doi.org/10.1371/journal.pone.0102103 Goldstein, A. (1980). Thrills in response to music and other stimuli. Physiological Psychology 8(1), 126–129. Good, M., Albert, J. M., Arafah, B., Anderson, G. C., Wotman, S., Cong, X., … Ahn, S. (2013). Effects on postoperative salivary cortisol of relaxation/music and patient teaching about pain management. Biological Research for Nursing 15(3), 318–329. Graff-Guerrero, A., Willeit, M., Ginovart, N., Mamo, D., Mizrahi, R., Rusjan, P., … Kapur, S. (2008). Brain region binding of the D2/3 agonist [11C]-(+)- PHNO and the D2/3 antagonist [11C]raclopride in healthy humans. Human Brain Mapping 29(4), 400–410. Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat: Premotor and striatal interactions in musicians and nonmusicians during beat perception. Journal of Neuroscience 29(23), 7540–7548. Granot, R. Y., Frankel, Y., Gritsenko, V., Lerer, E., Gritsenko, I., Bachner-Melman, R., … Ebstein, R. P. (2007). Provisional evidence that the arginine vasopressin 1a receptor gene is associated with musical memory. Evolution and Human Behavior 28(5), 313–318.
Granot, R. Y., Uzefovsky, F., Bogopolsky, H., & Ebstein, R. P. (2013). Effects of arginine vasopressin on musical working memory. Frontiers in Psychology 4, 1–12. Retrieved from https://doi.org/10.3389/fpsyg.2013.00712 Grape, C., Sandgren, M., Hansson, L., Ericson, M., & Theorell, T. (2003). Does singing promote well-being? An empirical study of professional and amateur singers during a singing lesson. Integrative Physiological & Behavioral Science 38(1), 65–74. Graversen, M., & Sommer, T. (2013). Perioperative music may reduce pain and fatigue in patients undergoing laparoscopic cholecystectomy. Acta Anaesthesiologica Scandinavica 57(8), 1010–1016. Griffiths, T. D., Uppenkamp, S., Johnsrude, I., Josephs, O., & Patterson, R. D. (2001). Encoding of the temporal regularity of sound in the human brainstem. Nature Neuroscience 4(6), 633–637. Gunn, R. N., Slifstein, M., Searle, G. E., & Price, J. C. (2015). Quantitative imaging of protein targets in the human brain with PET. Physics in Medicine and Biology 60(22), R363–R411. Hébert, S., Béland, R., Dionne-Fournelle, O., Crête, M., & Lupien, S. J. (2005). Physiological stress response to video-game playing: The contribution of built-in music. Life Sciences 76(20), 2371–2380. Heinrichs, M., von Dawans, B., & Domes, G. (2009). Oxytocin, vasopressin, and human social behavior. Frontiers in Neuroendocrinology 30(4), 548–557. Hirokawa, E., & Ohira, H. (2003). The effects of music listening after a stressful task on immune functions, neuroendocrine responses, and emotional states in college students. Journal of Music Therapy 40(3), 189–211. Hjelmstad, G. O., Xia, Y., Margolis, E. B., & Fields, H. L. (2013). Opioid modulation of ventral pallidal afferents to ventral tegmental area neurons. Journal of Neuroscience 33(15), 6454–6459. Hodes, G. E., Pfau, M. L., Leboeuf, M., Golden, S. A., Christoffel, D. J., Bregman, D., … Russo, S. J. (2014). Individual differences in the peripheral immune system promote resilience versus susceptibility to social stress. Proceedings of the National Academy of Sciences 111(45), 16136–16141. Hodges, D. (2010). Psychophysiological measures. In P. Juslin & J. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 279–312). Oxford: Oxford University Press. Hoffman, E. R., Brownley, K. A., Hamer, R. M., & Bulik, C. M. (2012). Plasma, salivary, and urinary oxytocin in anorexia nervosa: A pilot study. Eating Behaviors 13(3), 256–259. Huber, D., Veinante, P., & Stoop, R. (2005). Vasopressin and oxytocin excite distinct neuronal populations in the central amygdala. Science 308(5719), 245–248. Hurley, L. M., & Sullivan, M. R. (2012). From behavioral context to receptors: Serotonergic modulatory pathways in the IC. Frontiers in Neural Circuits 6, 1–17. Retrieved from https://doi.org/10.3389/fncir.2012.00058 Hurley, R. A., Flashman, L. A., Chow, T. W., & Taber, K. H. (2010). The brainstem: Anatomy, assessment, and clinical syndromes. Journal of Neuropsychiatry and Clinical Neuroscience 22(1), 2–6. Retrieved from https://doi.org/10.1176/appi.neuropsych.23.2.121 Insel, T. R. (2010). The challenge of translation in social neuroscience: A review of oxytocin, vasopressin, and affiliative behavior. Neuron 65(6), 768–779. Jahanshahi, M., Jones, C. R. G., Zijlmans, J., Katzenschlager, R., Lee, L., Quinn, N., … Lees, A. J. (2010). Dopaminergic modulation of striato-frontal connectivity during motor timing in Parkinson’s disease. Brain 133(3), 727–745. Javor, A., Riedl, R., Kindermann, H., Brandstatter, W., Ransmayr, G., & Gabriel, M. (2014). Correlation of plasma and salivary oxytocin in healthy young men: Experimental evidence. Neuro Endocrinology Letters 35(6), 470–473.
Jayamala, A. K., Lakshmanagowda, P. B., Pradeep, G. C. M., & Goturu, J. (2015). Impact of music therapy on breast milk secretion in mothers of premature newborns. Journal of Clinical and Diagnostic Research 9(4), CC04–CC06. Jeong, Y. J., Hong, S. C., Myeong, S. L., Park, M. C., Kim, Y. K., & Suh, C. M. (2005). Dance movement therapy improves emotional responses and modulates neurohormones in adolescents with mild depression. International Journal of Neuroscience 115(12), 1711–1720. Johnson, Z. V., & Young, L. J. (2017). Oxytocin and vasopressin neural networks: Implications for social behavioral diversity and translational neuroscience. Neuroscience & Biobehavioral Reviews 76, 87–98. Jones, A. K. P., Cunningham, V. J., Ha-Kawa, S. K., Fujiwara, T., Liyii, Q., Luthra, S. K., … Jones, T. (1994). Quantitation of [11C]diprenorphine cerebral kinetics in man acquired by PET using presaturation, pulse-chase and tracer-only protocols. Journal of Neuroscience Methods 51(2), 123– 134. Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. Behavioral and Brain Sciences 31(5), 559–575; discussion 575–621. Kaelen, M., Barrett, F. S., Roseman, L., Lorenz, R., Family, N., Bolstridge, M., … Carhart-Harris, R. L. (2015). LSD enhances the emotional response to music. Psychopharmacology 232(19), 3607– 3614. Kaelen, M., Roseman, L., Kahan, J., Santos-Ribeiro, A., Orban, C., Lorenz, R., … Carhart-Harris, R. (2016). LSD modulates music-induced imagery via changes in parahippocampal connectivity. European Neuropsychopharmacology 26(7), 1099–1109. Kanduri, C., Kuusi, T., Ahvenainen, M., Philips, A. K., Lähdesmäki, H., & Järvelä, I. (2015). The effect of music performance on the transcriptome of professional musicians. Scientific Reports 5, 1–7. Retrieved from https://doi.org/10.1038/srep09506 Kanduri, C., Raijas, P., Ahvenainen, M., Philips, A. K., Ukkola-Vuoti, L., Lähdesmäki, H., & Järvelä, I. (2015). 
The effect of listening to music on human transcriptome. PeerJ 3, e830. Retrieved from https://doi.org/10.7717/peerj.830 Karageorghis, C. I., Bruce, A. C., Pottratz, S. T., Stevens, R. C., Bigliassi, M., & Hamer, M. (2017). Psychological and psychophysiological effects of recuperative music postexercise. Medicine & Science in Sports & Exercise 50(4), 739–746. Katori, S., Hamada, S., Noguchi, Y., Fukuda, E., Yamamoto, T., Yamamoto, H., … Yagi, T. (2009). Protocadherin-α family is required for serotonergic projections to appropriately innervate target brain areas. Journal of Neuroscience 29(29), 9137–9147. Keeler, J. R., Roth, E. A., Neuser, B. L., Spitsbergen, J. M., Waters, D. J. M., & Vianney, J.-M. (2015). The neurochemistry and social flow of singing: Bonding and oxytocin. Frontiers in Human Neuroscience 9, 1–10. Retrieved from https://doi.org/10.3389/fnhum.2015.00518 Kejr, A., Gigante, C., Hames, V., Krieg, C., Mages, J., König, N., … Diel, F. (2010). Receptive music therapy and salivary histamine secretion. Inflammation Research 59(Suppl. 2), 217–218. Khalfa, S., Dalla Bella, S., Roy, M., Peretz, I., & Lupien, S. J. (2003). Effects of relaxing music on salivary cortisol level after psychological stress. Annals of the New York Academy of Sciences 999, 374–376. Kimata, H. (2003). Listening to Mozart reduces allergic skin wheal responses and in vitro allergen-specific IgE production in atopic dermatitis patients with latex allergy. Behavioral Medicine 29(1), 15–19. Kirschbaum, C., & Hellhammer, D. H. (1994). Salivary cortisol in psychoneuroendocrine research: Recent developments and applications. Psychoneuroendocrinology 19(4), 313–333. Knight, W. E. J., & Rickard, N. S. (2001). Relaxing music prevents stress-induced increase in subjective anxiety, systolic blood pressure, and heart rate in healthy males and females. Journal of Music Therapy 38(4), 254–272. Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15(3), 170–180. Koelsch, S., Boehlig, A., Hohenadel, M., Nitsche, I., Bauer, K., & Sack, U. (2016). The impact of acute stress on hormones and cytokines, and how their recovery is affected by music-evoked positive mood. Scientific Reports 6, 1–11. Retrieved from https://doi.org/10.1038/srep23008 Koelsch, S., Fuermetz, J., Sack, U., Bauer, K., Hohenadel, M., Wiegel, M., … Heinke, W. (2011). Effects of music listening on cortisol levels and propofol consumption during spinal anesthesia. Frontiers in Psychology 2, 1–9. Retrieved from https://doi.org/10.3389/fpsyg.2011.00058 Koyama, M., Wachi, M., Utsuyama, M., Bittman, B., Hirokawa, K., & Kitagawa, M. (2009). Recreational music-making modulates immunological responses and mood states in older adults. Journal of Medical and Dental Sciences 56, 79–90. Koyama, Y., Jodo, E., & Kayama, Y. (1994). Sensory responsiveness of “broad-spike” neurons in the laterodorsal tegmental nucleus, locus coeruleus and dorsal raphe of awake rats: Implications for cholinergic and monoaminergic neuron-specific responses. Neuroscience 63(4), 1021–1031. Kreutz, G., Bongard, S., Rohrmann, S., Hodapp, V., & Grebe, D. (2004). Effects of choir singing or listening on secretory immunoglobulin A, cortisol, and emotional state. Journal of Behavioral Medicine 27(6), 623–635. Kreutz, G., Murcia, C., & Bongard, S. (2012). Psychoneuroendocrine research on music and health: An overview. In R. MacDonald, G. Kreutz, & L. Mitchell (Eds.), Music, health, and wellbeing (pp. 457–476). Oxford: Oxford University Press. Kuhn, D. (2002). The effects of active and passive participation in musical activity on the immune system as measured by salivary immunoglobulin A (SIgA). Journal of Music Therapy 39(1), 30–39. Kumar, A., Tims, F., Cruess, D., Mintzer, M., Ironson, G., Loewenstein, D., … Kumar, M. (1999). 
Music therapy increases serum melatonin levels in patients with Alzheimer’s disease. Alternative Therapies in Health Medicine 5(9), 49–57. Leardi, S., Pietroletti, R., Angeloni, G., Necozione, S., Ranalletta, G., & Del Gusto, B. (2007). Randomized clinical trial examining the effect of music therapy in stress response to day surgery. British Journal of Surgery 94(8), 943–947. le Roux, F., Bouic, P., & Bester, M. (2007). The effect of Bach’s Magnificat on emotions, immune, and endocrine parameters during physiotherapy treatment of patients with infectious lung conditions. Journal of Music Therapy 44(2), 156–168. Lefevre, A., Mottolese, R., Dirheimer, M., Mottolese, C., Duhamel, J. R., & Sirigu, A. (2017). A comparison of methods to measure central and peripheral oxytocin concentrations in human and non-human primates. Scientific Reports 7(1), 17222. Retrieved from https://doi.org/10.1038/s41598-017-17674-7 Levitt, P., & Moore, R. Y. (1979). Origin and organization of brainstem catecholamine innervation in the rat. Journal of Comparative Neurology 186(4), 505–528. Lin, P. C., Lin, M. L., Huang, L. C., Hsu, H. C., & Lin, C. C. (2011). Music therapy for patients receiving spine surgery. Journal of Clinical Nursing 20(7–8), 960–968. Lindblad, F., Hogmark, Å., & Theorell, T. (2007). Music intervention for 5th and 6th graders: Effects on development and cortisol secretion. Stress and Health 23(1), 9–14. Linnemann, A., Ditzen, B., Strahler, J., Doerr, J. M., & Nater, U. M. (2015). Music listening as a means of stress reduction in daily life. Psychoneuroendocrinology 60, 82–90. Linnemann, A., Kappert, M. B., Fischer, S., Doerr, J. M., Strahler, J., & Nater, U. M. (2015). The effects of music listening on pain and stress in the daily life of patients with fibromyalgia
syndrome. Frontiers in Human Neuroscience 9, 1–10. Retrieved from https://doi.org/10.3389/fnhum.2015.00434 Linnemann, A., Strahler, J., & Nater, U. M. (2016). The stress-reducing effect of music listening varies depending on the social context. Psychoneuroendocrinology 72, 97–105. Lubin, D., Elliot, J., Black, M., & Johns, J. (2003). An oxytocin antagonist infused into the central nucleus of the amygdala increases maternal aggressive behavior. Behavioral Neuroscience 117(2), 195–201. McCraty, R., Atkinson, M., & Rein, G. (1996). Music enhances the effect of positive emotional states on salivary IgA. Stress Medicine 12, 167–175. McKinney, C. H., Tims, F. C., Kumar, A. M., & Kumar, M. (1997). The effect of selected classical music and spontaneous imagery on plasma β-endorphin. Journal of Behavioral Medicine 20(1), 85–99. MacLean, E. L., Gesquiere, L. R., Gruen, M. E., Sherman, B. L., Martin, W. L., & Carter, C. S. (2017). Endogenous oxytocin, vasopressin, and aggression in domestic dogs. Frontiers in Psychology 8. Retrieved from https://doi.org/10.3389/fpsyg.2017.01613 Mallik, A., Chanda, M. L., & Levitin, D. J. (2017). Anhedonia to music and mu-opioids: Evidence from the administration of naltrexone. Scientific Reports 7, 1–8. Retrieved from https://doi.org/10.1038/srep41952 Mariath, L. M., da Silva, A. M., Kowalski, T. W., Gattino, G. S., De Araujo, G. A., Figueiredo, F. G., … Schuch, J. B. (2017). Music genetics research: Association with musicality of a polymorphism in the AVPR1A gene. Genetics and Molecular Biology 40(2), 421–429. Maulina, T., Djustiana, N., & Shahib, M. N. (2017). The effect of music intervention on dental anxiety during dental extraction procedure. The Open Dentistry Journal 11(1), 565–572. Mejía-Rubalcava, C., Alanís-Tavira, J., Mendieta-Zerón, H., & Sánchez-Pérez, L. (2015). Changes induced by music therapy to physiologic parameters in patients with dental anxiety. Complementary Therapies in Clinical Practice 21(4), 282–286. 
Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological connectivity of the mesolimbic system. NeuroImage 28(1), 175–184. Metherate, R. (2011). Functional connectivity and cholinergic modulation in auditory cortex. Neuroscience & Biobehavioral Reviews 35(10), 2058–2063. Migneault, B., Girard, F., Albert, C., Chouinard, P., Boudreault, D., Provencher, D., … Girard, D. C. (2004). The effect of music on the neurohormonal stress response to surgery under general anesthesia. Anesthesia & Analgesia 98(2), 527–532. Miller, N. S., Kwak, Y., Bohnen, N. I., Müller, M. L. T. M., Dayalu, P., & Seidler, R. D. (2013). The pattern of striatal dopaminergic denervation explains sensorimotor synchronization accuracy in Parkinson’s disease. Behavioural Brain Research 257, 100–110. Möckel, M., Störk, T., Vollert, J., Röcker, L., Danne, O., Hochrein, H., … Frei, U. (1995). Stress reduction through listening to music: Effects on stress hormones, hemodynamics and mental state in patients with arterial hypertension and in healthy persons. Deutsche Medizinische Wochenschrift 120(21), 745–752. Moriizumi, T., & Hattori, T. (1992). Choline acetyltransferase-immunoreactive neurons in the rat entopeduncular nucleus. Neuroscience 46(3), 721–728. Morley, A. P., Narayanan, M., Mines, R., Molokhia, A., Baxter, S., Craig, G., … Craig, I. (2012). AVPR1A and SLC6A4 polymorphisms in choral singers and non-musicians: A gene association study. PLoS ONE 7(2), 2–8. Retrieved from https://doi.org/10.1371/journal.pone.0031763 Morley, B. J., & Happe, H. K. (2000). Cholinergic receptors: Dual roles in transduction and plasticity. Hearing Research 147(1–2), 104–112.
Motts, S. D., & Schofield, B. R. (2010). Cholinergic and non-cholinergic projections from the pedunculopontine and laterodorsal tegmental nuclei to the medial geniculate body in guinea pigs. Frontiers in Neuroanatomy 4, 1–8. Retrieved from https://doi.org/10.3389/fnana.2010.00137 Mueller, K., Fritz, T., Mildner, T., Richter, M., Schulze, K., Lepsien, J., … Möller, H. E. (2015). Investigating the dynamics of the brain response to music: A central role of the ventral striatum/nucleus accumbens. NeuroImage 116, 68–79. Murphy, D. D., Rueter, S. M., Trojanowski, J. Q., & Lee, V. M. (2000). Synucleins are developmentally expressed, and alpha-synuclein regulates the size of the presynaptic vesicular pool in primary hippocampal neurons. Journal of Neuroscience 20(9), 3214–3220. Naganawa, M., Zheng, M.-Q., Henry, S., Nabulsi, N., Lin, S.-F., Ropchan, J., … Huang, Y. (2015). Test-retest reproducibility of binding parameters in humans with 11C-LY2795050, an antagonist PET radiotracer for the opioid receptor. Journal of Nuclear Medicine 56(2), 243–248. Narendran, R., Mason, N. S., Laymon, C. M., Lopresti, B. J., Velasquez, N. D., May, M. A., … Frankle, W. G. (2010). A comparative evaluation of the dopamine D(2/3) agonist radiotracer [11C] (-)-N-propyl-norapomorphine and antagonist [11C]raclopride to measure amphetamine-induced dopamine release in the human striatum. Journal of Pharmacology and Experimental Therapeutics 333(2), 533–539. Narendran, R., Slifstein, M., Guillin, O., Hwang, Y., Hwang, D. R., Scher, E., … Laruelle, M. (2006). Dopamine (D2/3) receptor agonist Positron Emission Tomography radiotracer [11C]-(+)-PHNO is a D3 receptor preferring agonist in vivo. Synapse 60(7), 485–495. Nater, U. M., Abbruzzese, E., Krebs, M., & Ehlert, U. (2006). Sex differences in emotional and psychophysiological responses to musical stimuli. International Journal of Psychophysiology 62(2), 300–308. Nilsson, U. (2009). 
Soothing music can increase oxytocin levels during bed rest after open-heart surgery: A randomised control trial. Journal of Clinical Nursing 18(15), 2153–2161. Nilsson, U., Unosson, M., & Rawal, N. (2005). Stress reduction and analgesia in patients exposed to calming music postoperatively: A randomized controlled trial. European Journal of Anaesthesiology 22(2), 96–102. Numan, M., Bress, J. A., Ranker, L. R., Gary, A. J., DeNicola, A. L., Bettis, J. K., & Knapp, S. E. (2010). The importance of the basolateral/basomedial amygdala for goal-directed maternal responses in postpartum rats. Behavioural Brain Research 214(2), 368–376. Oczkowska, A., Kozubski, W., Lianeri, M., & Dorszewska, J. (2014). Mutations in PRKN and SNCA genes important for the progress of Parkinson’s disease. Current Genomics 14(8), 502–517. Okada, K., Kurita, A., Takase, B., Otsuka, T., Kodani, E., Kusama, Y., … Mizuno, K. (2009). Effects of music therapy on autonomic nervous system activity, incidence of heart failure events, and plasma cytokine and catecholamine levels in elderly patients with cerebrovascular disease and dementia. International Heart Journal 50(1), 95–110. Ooishi, Y., Mukai, H., Watanabe, K., Kawato, S., & Kashino, M. (2017). Increase in salivary oxytocin and decrease in salivary cortisol after listening to relaxing slow-tempo and exciting fast-tempo music. PLoS ONE 12(12), 1–16. Retrieved from https://doi.org/10.1371/journal.pone.0189075 Owen, D. R. J., Gunn, R. N., Rabiner, E. A., Bennacef, I., Fujita, M., Kreisl, W. C., … Parker, C. A. (2011). Mixed-affinity binding in humans with 18-kDa translocator protein ligands. Journal of Nuclear Medicine 52(1), 24–32. Pan, W. X., & Hyland, B. I. (2005). Pedunculopontine tegmental nucleus controls conditioned responses of midbrain dopamine neurons in behaving rats. Journal of Neuroscience 25(19), 4725–4732. Pierrehumbert, B., Torrisi, R., Laufer, D., Halfon, O., Ansermet, F., & Beck Popovic, M. (2010). Oxytocin response to an experimental psychosocial challenge in adults exposed to traumatic experiences during childhood or adolescence. Neuroscience 166(1), 168–177. Qiu, J., Jiang, Y.-F., Li, F., Tong, Q.-H., Rong, H., & Cheng, R. (2017). Effect of combined music and touch intervention on pain response and β-endorphin and cortisol concentrations in late preterm infants. BMC Pediatrics 17(1), 1–7. Retrieved from https://doi.org/10.1186/s12887-016-0755-y Quiroga Murcia, C., Kreutz, G., Clift, S., & Bongard, S. (2010). Shall we dance? An exploration of the perceived benefits of dancing on well-being. Arts & Health 2(2), 149–163. Rabiner, E. A., & Laruelle, M. (2010). Imaging the D3 receptor in humans in vivo using [11C](+)-PHNO positron emission tomography (PET). International Journal of Neuropsychopharmacology 13(3), 289–290. Rainville, J. R., Tsyglakova, M., & Hodes, G. E. (2018). Deciphering sex differences in the immune system and depression. Frontiers in Neuroendocrinology (August). Retrieved from https://doi.org/10.1016/j.yfrne.2017.12.004 Reese, N. B., Garcia-Rill, E., & Skinner, R. D. (1995a). Auditory input to the pedunculopontine nucleus: I. Evoked potentials. Brain Research Bulletin 37(3), 257–264. Reese, N. B., Garcia-Rill, E., & Skinner, R. D. (1995b). Auditory input to the pedunculopontine nucleus: II. Unit responses. Brain Research Bulletin 37(3), 265–273. Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nature Neuroscience 14(2), 257–262. Salimpoor, V. N., van den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J. (2013). Interactions between the nucleus accumbens and auditory cortices predict music reward value. Science 340(6129), 216–219. Schladt, T. M., Nordmann, G. C., Emilius, R., Kudielka, B. M., de Jong, T. R., & Neumann, I. D. (2017). Choir versus solo singing: Effects on mood, and salivary oxytocin and cortisol concentrations. Frontiers in Human Neuroscience 11, 1–9. 
Retrieved from https://doi.org/10.3389/fnhum.2017.00430 Schneider, N., Schedlowski, M., Schürmeyer, T. H., & Becker, H. (2001). Stress reduction through music in patients undergoing cerebral angiography. Neuroradiology 43(6), 472–476. Schofield, B. R. (2010). Projections from auditory cortex to midbrain cholinergic neurons that project to the inferior colliculus. Neuroscience 166(1), 231–240. Schwilling, D., Vogeser, M., Kirchhoff, F., Schwaiblmair, F., Boulesteix, A. L., Schulze, A., & Flemmer, A. W. (2015). Live music reduces stress levels in very low-birthweight infants. Acta Paediatrica (Oslo, Norway), 104(4), 360–367. Shimizu, N., Umemura, T., Hirai, T., Tamura, T., Sato, K., & Kusaka, Y. (2013). Effects of movement music therapy with the naruko clapper on psychological, physical and physiological indices among elderly females: A randomized controlled trial. Gerontology 59(4), 355–367. Shotbolt, P., Tziortzi, A. C., Searle, G. E., Colasanti, A., Van Der Aart, J., Abanades, S., … Rabiner, E. A. (2012). Within-subject comparison of [11C]-(+)-PHNO and [11C]raclopride sensitivity to acute amphetamine challenge in healthy humans. Journal of Cerebral Blood Flow and Metabolism 32(1), 127–136. Solís, O., & Moratalla, R. (2018). Dopamine receptors: Homomeric and heteromeric complexes in l‑DOPA‑induced dyskinesia. Journal of Neural Transmission 1, 1–8. Retrieved from https://doi.org/10.1007/s00702-018-1852-x Spencer, R. L., Chun, L. E., Hartsock, M. J., & Woodruff, E. R. (2018). Glucocorticoid hormones are both a major circadian signal and major stress signal: How this shared signal contributes to a dynamic relationship between the circadian and stress systems. Frontiers in Neuroendocrinology 49, 52–71.
Stefano, G. B., Zhu, W., Cadet, P., Salamon, E., & Mantione, K. J. (2004). Music alters constitutively expressed opiate and cytokine processes in listeners. Medical Science Monitor: International Medical Journal of Experimental and Clinical Research 10(6), MS18–MS27. Suzuki, M., Kanamori, M., Nagasawa, S., Tokiko, I., & Takayuki, S. (2007). Music therapy-induced changes in behavioral evaluations, and saliva chromogranin A and immunoglobulin A concentrations in elderly patients with senile dementia. Geriatrics & Gerontology International 7(1), 61–71. Tabrizi, E. M., Sahraei, H., & Rad, S. M. (2012). The effect of music on the level of cortisol, blood glucose and physiological variables. EXCLI Journal 11, 556–565. Retrieved from https://doi.org/10.3389/fpsyg.2011.00058 Tan, Y. T., McPherson, G. E., Peretz, I., Berkovic, S. F., & Wilson, S. J. (2014). The genetic basis of music ability. Frontiers in Psychology 5, 1–19. Retrieved from https://doi.org/10.3389/fpsyg.2014.00658 Thoma, M. V., La Marca, R., Brönnimann, R., Finkel, L., Ehlert, U., & Nater, U. M. (2013). The effect of music on the human stress response. PLoS ONE 8(8), 1–12. Retrieved from https://doi.org/10.1371/journal.pone.0070156 Thompson, A. M. (2003). Pontine sources of norepinephrine in the cat cochlear nucleus. Journal of Comparative Neurology 457(4), 374–383. Thompson, R. R., & Walton, J. C. (2004). Peptide effects on social behavior: Effects of vasotocin and isotocin on social approach behavior in male goldfish (Carassius auratus). Behavioral Neuroscience 118(3), 620–626. Trappe, H.-J., & Voit, G. (2016). The cardiovascular effect of musical genres. Deutzsches Ärzteblatt International 113(20), 347–352. Turvey, S. E., & Broide, D. H. (2010). Innate immunity. Journal of Allergy and Clinical Immunology 125(2 Suppl. 2), S24–S32. Ukkola-Vuoti, L., Kanduri, C., Oikkonen, J., Buck, G., Blancher, C., Raijas, P., … Järvelä, I. (2013). 
Genome-wide copy number variation analysis in extended families and unrelated individuals characterized for musical aptitude and creativity in music. PLoS ONE 8(2). Retrieved from https://doi.org/10.1371/journal.pone.0056356 Ukkola-Vuoti, L., Oikkonen, J., Onkamo, P., Karma, K., Raijas, P., & Järvelä, I. (2011). Association of the arginine vasopressin receptor 1A (AVPR1A) haplotypes with listening to music. Journal of Human Genetics 56(4), 324–329. Ukkola, L., Onkamo, P., Raijas, P., Karma, K., & Järvelä, I. (2009). Musical aptitude is associated with AVPR1A-Halotypes. PLoS ONE 4(5), e5534. Retrieved from https://doi.org/10.1371/journal.pone.0005534 Valdiglesias, V., Maseda, A., Lorenzo-López, L., Pásaro, E., Millán-Calenti, J. C., & Laffon, B. (2017). Is salivary chromogranin A a valid psychological stress biomarker during sensory stimulation in people with advanced dementia? Journal of Alzheimer’s Disease 55(4), 1509–1517. Valstad, M., Alvares, G. A., Egknud, M., Matziorinis, A. M., Andreassen, O. A., Westlye, L. T., & Quintana, D. S. (2017). The correlation between central and peripheral oxytocin concentrations: A systematic review and meta-analysis. Neuroscience & Biobehavioral Reviews 78, 117–124. Veening, J. G., Gerrits, P. O., & Barendregt, H. P. (2012). Volume transmission of beta-endorphin via the cerebrospinal fluid: A review. Fluids and Barriers of the CNS 9(1), 1. Retrieved from https://doi.org/10.1186/2045-8118-9-16 Venneti, S., Lopresti, B. J., & Wiley, C. A. (2013). Molecular imaging of microglia/macrophages in the brain. Glia 61(1), 10–23. Retrieved from https://doi.org/10.1002/glia.22357 Vollert, J., Störk, T., Rose, M., & Möckel, M. (2003). Musik als begleitende therapie bei koronarer herzkrankheit. Deutsche Medizinische Wochenschrift, 128, 2712–2716.
Wahbeh, H., Calabrese, C., & Zwickey, H. (2007). Binaural beat technology in humans: A pilot study to assess psychologic and physiologic effects. Journal of Alternative and Complementary Medicine 13(1), 25–32. Wang, S., Kulkarni, L., Dolev, J., & Kain, Z. (2002). Music and preoperative anxiety: A randomized, controlled study. Anesthesia & Analgesia 94(6), 1489–1494. Willeit, M., Ginovart, N., Graff, A., Rusjan, P., Vitcu, I., Houle, S., … Kapur, S. (2008). First human evidence of d-amphetamine induced displacement of a D2/3agonist radioligand: A [11C]-(+)PHNO positron emission tomography study. Neuropsychopharmacology 33(2), 279–289. Willeit, M., Ginovart, N., Kapur, S., Houle, S., Hussey, D., Seeman, P., & Wilson, A. A. (2006). High-affinity states of human brain dopamine D2/3 receptors imaged by the agonist [11C]-(+)PHNO. Biological Psychiatry 59(5), 389–394. Woof, J. M., & Ken, M. A. (2006). The function of immunoglobulin A in immunity. Journal of Pathology 208(2), 270–282. Yamamoto, T., Ohkuwa, T., Itoh, H., Kitoh, M., Terasawa, J., Tsuda, T., … Sato, Y. (2003). Effects of pre-exercise listening to slow and fast rhythm music on supramaximal cycle performance and selected metabolic variables. Archives of Physiology and Biochemistry 111(3), 211–214. Yovel, G., Shakhar, K., & Ben-Eliyahu, S. (2001). The effects of sex, menstrual cycle, and oral contraceptives on the number and activity of natural killer cells. Gynecologic Oncology 81(2), 254–262. Yuhi, T., Kyuta, H., Mori, H.-A., Murakami, C., Furuhara, K., Okuno, M., … Higashida, H. (2017). Salivary oxytocin concentration changes during a group drumming intervention for maltreated school children. Brain Sciences 7, 152. Retrieved from https://doi.org/10.3390/brainsci7110152
CHAPTER 15
THE NEUROAESTHETICS OF MUSIC: A RESEARCH AGENDA COMING OF AGE
ELVIRA BRATTICO
In history, the study of music, how it is perceived and appreciated and how it is created (composed) and produced (performed), has been approached in two broadly distinct ways. On one hand, music has been studied as a succession of compositions and composers and how these are acclaimed in different epochs. This “humanistic” approach uses the descriptive methods of history, sociology, and philosophy, and it is often identified with musicology proper. Within this approach, philosophical aesthetics of music finds its place (Scruton, 1999): the goal is to describe the change of musical taste over time, namely the explicit or unspoken principles that govern the consensus on what is considered musically acceptable and admirable (“beautiful”) and what is not. The peculiarity of this “humanistic” approach is its attention to the work of a single composer or musician, narrated to highlight the uniqueness and exceptionality of his/her work, and its non-replicable contribution to humanity (Zeki, 2014). On the other hand, music has also been studied analytically with methods resembling those of the natural sciences more than those of the humanities. Music theory in
primis and systematic musicology in secundis have identified the conventions that underlie music composition, namely the recipes for creating music, derived from the work of generally recognized composers, and the constant laws of perception that govern how music is understood and appreciated. With the advent of cognitive science, this “systematic” approach, grounded in the scientific method, has been inspired by the computer metaphor in the search for universal rules that govern how we perceive, appreciate, and produce music (Sloboda, 1985). The search for the laws of music perception and cognition has profited from neurological findings, in which patients with brain lesions in auditory temporal areas showed a loss of musical perceptual abilities accompanied by preservation of other auditory perceptual skills (Peretz & Zatorre, 2005). These studies, when supported by the opposite findings, namely when showing a double dissociation between music and language perception, provided the grounds for the initial influential models of music perception and production, listing a set of modules, each dedicated to encapsulated and automatic subskills (Peretz & Coltheart, 2003). This line of research, bridging systematic musicology with brain-lesion studies, reached its climax in the 1980s and early 1990s. The 1990s, called the “decade of the brain,” also witnessed a surge of interest in answering epistemological (perception- and cognition-related) questions with experiments on healthy volunteers using methods borrowed from neurophysiology and neurology. New brain scanning devices such as magnetoencephalography (MEG, measuring the magnetic fields around ion currents produced by neurotransmission) and functional magnetic resonance imaging (fMRI, measuring neuron-activity-dependent hemoglobin changes in blood flow in the brain) opened the study of music-related brain functions to a broader group of researchers, without the need to study rare brain-lesion patients. 
Healthy volunteers could increasingly be measured during music tasks without causing any harm to them, apart from the short-lasting discomfort of the experimental session. This variation of the systematic approach peaked in the 2000s and is called “cognitive neuroscience of music” (Levitin & Tirovolas, 2009; Peretz & Zatorre, 2003; Samson, Dellacherie, & Platel, 2009) or more simply “neurosciences of music” (Altenmüller et al., 2012; Bigand & Tillmann, 2015). According to these accounts, music corresponds to a biological function, involving universal features that are shared by all humans ontogenetically (since birth)
and phylogenetically (since the appearance of Homo sapiens). More complex models of music perception, cognition, and emotions started to emerge, incorporating findings that pointed at shared rather than modular neural resources dedicated to music, in relation to other auditory functions (Frühholz, Trost, & Kotz, 2016; Koelsch & Siebel, 2005; Patel, 2008). Hence, in cognitive neuroscience of music, the main goals have been, and still are, the search for brain specializations for music (as opposed to speech), the determination of the neural foundations of music perception, emotion, and production, and the identification of music effects on other brain functions. Overall, the predominant topics and models within cognitive neuroscience of music leave little space for aesthetic processes such as evaluative judgments, appreciation, and taste formation. In recent years, though, we have been witnessing a paradigm shift within the “systematic” approach, centered on a revised conceptualization of music, that might ultimately reconcile this approach with the traditional “humanistic” one. This shift has been initiated by studies that have focused on the subjective experience of music listening, rather than its objective, physical attributes. In these studies, experimental participants were asked to bring their own music to the laboratory, and their individual reactions to the music heard became the focus of investigation, irrespective of which object induced those reactions (Blood & Zatorre, 2001; Brattico et al., 2016). This experience is referred to as aesthetic when it originates in association with an artistic, human-made object without clear utilitarian functions. In several philosophical conceptualizations, what matters in art and music is the phenomenological content of the individual experience. 
The scientific method applied to the study of this experience is called empirical aesthetics, when mainly behavioral methods are used, or neuroaesthetics, when also brain research techniques are applied. In empirical aesthetics and neuroaesthetics, researchers strive to fragment the aesthetic experience into subprocesses or stages that can be studied separately and that, when replicated, can produce a predictable outcome. However, since the human mind possesses an embodied craving for beauty, harmony, and symmetry, some artistic-object features that generate an aesthetic experience occur more frequently than others (Chatterjee & Vartanian, 2016; Conway & Rehding, 2013; Pearce et al., 2016; Smith, 2005).
Indeed, art and music are forms of human expression that are as old as our species is (Aubert et al., 2014; Curtis, 2006). Hence, the aesthetic experience of music (and other arts) must be a biological as well as a cultural phenomenon. This point of view does not in any way downplay the act of creation, but rather emphasizes the fact that an aesthetic experience has aspects that are amenable to analysis in terms of biological frameworks. A recent cross-cultural study (Savage, Brown, Sakai, & Currie, 2015) provides further support for studying music as eliciting aesthetic experiences common to all humans. This study showed that the well-studied, statistically predominant perceptual and cognitive features of music (pitch: use of discrete pitches, small intervals, and melodic contours; rhythm: isochronous beat and multiples of beats; form: short phrases lasting less than 9 sec) are accompanied by other features that have been thus far marginalized in scientific investigation, namely instrumentation (concurrent use of voice and instruments), performance style (chest voice), and social context (performed in groups and by males). These features relate to aspects of music that are relevant in an aesthetic experience and that have been thus far mainly related to cultural transmission rather than biology: for instance, mastering the style of the music is often a prerequisite for reaching a positive aesthetic outcome, and the type of social context is also a determinant of a musical aesthetic experience. In line with this, a meta-analysis has summarized the reasons for listening to music (Schafer, Sedlmeier, Stadtler, & Huron, 2013), illustrating, from the subjective experiential viewpoint, how music can be addressed by scientific investigation. Among the 129 surveyed reasons from the literature, three main factors emerged: social relatedness, self-awareness, and mood regulation/arousal. 
The last factor supports previous claims that music listening behavior is explained by the emotional and aesthetic impact of music. The other two factors have been less studied with neuroscience methods, also due to the limitations intrinsic to the experimental setup. In the present chapter, I first describe the general framework of neuroaesthetics of art that has inspired the advocated paradigm shift from music neuroscience to music neuroaesthetics, and then provide some putative reasons for the slow emergence of this field of research, as opposed to the neuroaesthetics of visual arts. Then, I list some of the main findings obtained within music neuroaesthetics that have been organized in the few models existing in the literature. The discussion is dedicated to the frontiers
in the study of intra-subject neural interactions between brain areas that give rise to aesthetic responses and to the latest attempts at capturing the neural attributes of inter-subject interactions during musical performance.
T
N D
S
N A
The term neuroaesthetics was first coined by Semir Zeki almost two decades ago (Zeki, 1999) to indicate a multidisciplinary field of research, focused at first on visual art, merging a long history of philosophical and empirical aesthetics with the methodology of cognitive and affective neurosciences (Chatterjee, 2011; Chatterjee & Vartanian, 2014, 2016; Conway & Rehding, 2013; Nadal & Pearce, 2011; Pearce et al., 2016). Neuroaesthetics seeks to understand the neural principles underlying the different processes that compose a human aesthetic experience with an artistic object (Livingstone & Hubel, 2002). An aesthetic experience has been defined as a psychological state determined by interaction with an object to which we intend to attribute (evaluate/appraise) positive or negative qualities according to perceptual, cognitive, affective, or cultural criteria. It is intrinsically different from other affective experiences due to a special attitude (also referred to as focus, stance, or pre-classification) toward the object. According to a Kantian notion, this aesthetic stance is often characterized by being disinterested, distanced from the primary emotional needs of the organism (Leder, Gerger, Brieber, & Schwarz, 2014). According to a somewhat tautological definition, an aesthetic experience is “an experience of a particular kind of object that has been designed to produce such an experience” (Bundgaard, 2015, p. 788). According to this conceptualization, an aesthetic experience arises when, through perceptual-representational processes, we attribute to the stimulus a meaning based on aesthetic evaluation. While there exist some universal laws of preference for some stimulus configurations (e.g., according to Gestalt laws humans tend to like symmetry, equilibrium, and order due to the organizational function of the organism; Cupchik, 2007; Eysenck, 1942), the stimulus alone is not by itself the source of an aesthetic experience. Rather, it is the intentional
relation and attitude that the subject has with the stimulus. Because of this, subjectivity is intrinsic in aesthetic responses. A stimulus that is aesthetically appealing to one person can be repulsive to another. These variations derive from both the internal state, including the personal experience of previous encounters with the stimulus, and the attitudes toward the stimulus, the current mood, and the innate biological predispositions for processing the stimulus and for having an aesthetic experience as a whole (Pelowski, Markey, Forster, Gerger, & Leder, 2017). Along this conceptualization, the research field of neuroaesthetics is dedicated to studying how the brain facilitates the human capacity for experiencing phenomena as “aesthetic” and for creating objects that evoke such experience. To delve into these aims, one can choose two possible directions of investigation, as also conceptualized by Brattico (2015), Cupchik and colleagues (Cupchik, Vartanian, Crawley, & Mikulis, 2009), Jacobsen and Beudt (2017), and Pelowski et al. (2017): on one hand, the bottom-up perceptual facilitation of aesthetic responses based on the physical properties of an artwork, and, on the other hand, the feedback and feedforward relationship between top-down, intentional orientation of attention and the artwork. Following Redies (Redies, 2015), this dualism in how aesthetic phenomena are studied can be represented as a dichotomy between formalist and contextual theories. Formalist theories propose that the aesthetic experience relies on formal properties of the stimulus (e.g., symmetry, sensual beauty), which are considered to be universal and based on human brain physiology. Often in these theories, aesthetic responses to art are described as automatic and independent from conscious control (Zeki, 2013). In turn, in contextual theories the aesthetic experience depends on the intention of the artist and the circumstances under which the artwork has been created and is displayed. 
Some of these theories focus on contemporary abstract art, characterized by a lesser role given to sensory features (Jacobsen, 2014; Leder, Belke, Oeberst, & Augustin, 2004; Pelowski, Markey, Lauring, & Leder, 2016). Some proposals also attempt a reconciliation between the two opposite stances, modeling the impact of top-down and bottom-up factors depending on the type of artistic stimulus at hand. For instance, in the model by Redies (2015), external information, meaning the stimulus features and the context in which the stimulus is displayed, is distinct from internal representation, meaning the beholder's subjective representation of and reaction to the stimulus. In this particular
model, aesthetic experience is reached only with favorable encoding and cognitive mastering of the stimulus. In most proposals, mainly focused on visual art (Pearce et al., 2016; Pelowski et al., 2016), the aesthetic experience seems to emerge from the interaction of cognitive, affective, and evaluative processes, involving at least three different brain processes: (a) an enhancement of low-level sensory processing; (b) high-level top-down processing and activation of cortical areas involved in evaluative judgment; (c) an engagement of the reward circuit, including cortical and subcortical regions. The initial efforts within neuroaesthetics of visual art involved measurements of subjects’ brain activity while they evaluated the beauty or preference of artistic versus natural pictures (e.g., Vartanian & Goel, 2004), while they rated the beauty or correctness of abstract visual patterns (e.g., Jacobsen & Höfel, 2003), or while they viewed abstract, still life, landscape, or portrait pictures classified as beautiful, ugly, or neutral prior to the brain scanning session (e.g., Kawabata & Zeki, 2004). After these inspiring works, a great number of publications using neuroimaging and neurophysiological techniques have followed. Current neuroaesthetic research has fractionated human responses to art into the main outcomes of aesthetic emotions (e.g., pleasure, being moved, interest), preference (e.g., conscious liking), and judgment (e.g., beauty), associating to each of them a replicable and reliable pattern of neural and physiological activity (Brattico et al., 2016; Brattico, Bogert, & Jacobsen, 2013; Brattico & Pearce, 2013; Chatterjee & Vartanian, 2014, 2016; Istok, Brattico, Jacobsen, Ritter, & Tervaniemi, 2013; Jacobsen, 2014; Leder, Markey, & Pelowski, 2015; Nieminen, Istok, Brattico, Tervaniemi, & Huotilainen, 2011; Pearce et al., 2016; Pelowski et al., 2016; Reybrouck & Brattico, 2015). 
In these proposals, aesthetic emotions are the subjective feelings elicited by an artistic object, whereas aesthetic judgments are defined as subjective evaluations based on an individual set of criteria. Moreover, several factors affecting the aesthetic experience have been targeted by neuroscientific investigation: environment, intentions, familiarity, expertise, and attitudes. In the latest overarching proposal, called the Vienna Integrated Model of Art Perception or VIMAP (Pelowski et al., 2017), bottom-up processing of low-level, artwork-derived features, comprising perceptual analysis, implicit memory integration, and explicit classification, is conjoined with top-down factors. Among those latter factors, cognitive mastery, namely the matching of all
information collected in previous processing stages to existing predictions and schemata, plays a central role and leads to the creation of meaning and associations. Brain substrates of the different stages of the visual aesthetic experience have also been identified, particularly in visual cortices for feature analysis, dorsolateral prefrontal cortex for cognitive mastery, default-mode network regions, error-monitoring regions of the anterior cingulate cortex, limbic regions (particularly, insula and amygdala) for controlling emotions, and orbitofrontal cortex for integrating signals from cognitive and emotional brain regions and issuing aesthetic judgments. Lately, while the initial efforts, and the majority of efforts overall, have concentrated on visual art (paintings), researchers keen on the neuroaesthetics approach have expanded their interest from visual art toward several other artistic domains, such as sculpture (Di Dio, Macaluso, & Rizzolatti, 2007), architecture (Coburn, Vartanian, & Chatterjee, 2017), dance (Calvo-Merino, Glaser, Grezes, Passingham, & Haggard, 2005; Calvo-Merino, Jola, Glaser, & Haggard, 2008), and poetry (Wassiliwizky, Koelsch, Wagner, Jacobsen, & Menninghaus, 2017). In the past few years, the field has seen fast growth, with several special issues of journals and books (e.g., Huston, Nadal, Mora, Agnati, & Cela Conde, 2015; Martindale, Locher, & Petrov, 2007), reviews (Chatterjee, 2011; Chatterjee & Vartanian, 2014, 2016; Leder & Nadal, 2014; Nadal et al., 2008; Pearce et al., 2016; Pelowski et al., 2016, 2017), and conferences (e.g., Nadal & Pearce, 2011). While critiques do exist (Tallis, 2008, 2011), and are indeed welcome for a healthy scientific debate, in the past two years the status of neuroaesthetics, especially for visual arts, has changed from that of a contingent or trendy field to that of a mature discipline (Chatterjee, 2011; Leder & Nadal, 2014; Pearce et al., 2016).
N
: A R M
A
Like other artistic domains, music is phylogenetically universal: it has existed across all human cultures and epochs. It might be even older than our species, Homo sapiens: a flute with two holes carved in a bear bone was found in 1996 in a cave in Slovenia that was inhabited by Neanderthals
(Aubert et al., 2014; Seghdi & Brattico, in press). Music is also ontogenetically universal, considering that it is the first form of communication between a newborn and a parent and the last one to disappear when all other cognitive functions have been worn away by neurodegenerative decay (Golden et al., 2017; Jacobsen et al., 2015; Matrone & Brattico, 2015). Music shares all these aspects, namely universality, evolutionary functions, emotional impact, and expressivity, with other forms of art. Moreover, music is characterized by responses that are aesthetic in nature, since they involve a variety of emotional processes that typically are associated with, and temporally precede, evaluative (subjective) decisions to consciously like the music heard and to attribute to it (objective) properties of beauty, mastery, or interest, as well as to seek the same experience again. These processes form a learning motivational loop that ultimately generates a set of preferences and habits called musical taste. For instance, the top reasons why we listen to music (Laukka, 2007; McDonald & Stewart, 2008) and even why we become musicians (Juslin & Laukka, 2004; Sloboda, 1992) are related to the aesthetic responses that music evokes: enjoyment, being moved, entertainment, and beauty. Also, when asked to name the adjectives that best describe the aesthetic value of music, hundreds of university students indicated “beautiful” as the most common word (Istok et al., 2009). Hence, cognitive neuroscience can regard music as a form of expressive art, rather than an auditory domain to be contrasted with the other auditory domains of speech/language, as proposed in a first essay dedicated to the emerging field (Brattico & Pearce, 2013), aligning itself with the recent progress of neuroaesthetics. 
Along this line of thought, already in the late 1800s, the German philosopher Eduard Hanslick (1825–1904) underlined the strong links between music and aesthetics as opposed to the utilitarian function of speech: “Speech and music have their centres of gravity at different points, around which the characteristics of each are grouped: and while all specific laws of music will centre in its independent forms of beauty, all laws of speech will turn upon the correct use of sounds as a medium of expressing ideas” (Hanslick, 1954, pp. 94–95). In a second essay dedicated to music neuroaesthetics (Hodges, 2016), the field was described as counting two distinct research agendas. The first one is a “broad” agenda that studies music perception, cognition, and emotion, without explicit reference to aesthetics or to any aesthetic concept, and
which can be identified with the broader field of cognitive neuroscience of music. The second one is a research agenda of “narrow” scope that can be identified as the “core” neuroaesthetics of music, since it deals primarily with aesthetic processes, and it explicitly refers to preference, aesthetic emotions, and beauty (or other aesthetic) judgments. The increasing number of studies under the umbrella of the “core” neuroaesthetics of music often do not explicitly refer to any specific model of the musical aesthetic experience, but they typically contain the word “aesthetic” when describing findings. The goal of the “core” neuroaesthetics of music is to determine how the neuronal processing of multisensorial signals leads to aesthetic responses during music listening and performance. Aesthetic responses include emotions (such as sensory and conscious pleasure or enjoyment, being moved), liking or preference, and aesthetic judgment. The present chapter aims at identifying the main themes that separate music neuroaesthetics from the broader cognitive neurosciences of music (see Fig. 1).
FIGURE 1. Diagram illustrating the standing of the field of neuroaesthetics of music within broader human cognitive neuroscience studies.
E
M A
M E
Even if the past few years have witnessed several studies on aesthetic-related phenomena during music listening, the scientific questions asked have often been addressed without any explicit reference to overarching aesthetic frameworks, unlike in visual research (Brattico & Pearce, 2013; Hodges, 2016). A critical integrative analysis of thirty-one empirical aesthetic studies conducted between 1990 and 2015 (selected from the 1,450 references initially retrieved) (Tiihonen, Brattico, Maksimainen, Wikgren, & Saarikallio, 2017) noted that scientific investigations of pleasure, one of the main subjective aesthetic responses to any artwork, have been contextualized within aesthetic frameworks and concepts for the visual modality (studies using stimuli from the figurative arts, such as paintings or sculptures), whereas for the music modality they were linked to the basic neuroscientific literature on primary pleasure (or the absence of it). This analysis confirms that visual empirical aesthetics and neuroaesthetics are active fields with a number of established and well-recognized frameworks, whereas research on music is dominated by sensory and basic emotion models. The current situation can be attributed to the scarcity of brain-based models of aesthetic processes in music, which has limited efforts toward overarching interpretations of the individual neuroscientific findings obtained. One of these models (illustrated in Fig. 2) is characterized by a chronometric distinction of the information processing stages leading to aesthetic responses. This and further developments by the same authors establish a distinction between pre-attentive, low-level perceptual and emotional stages, and reflective processes involving cognitive control (Brattico, 2015; Brattico et al., 2013; Brattico & Pearce, 2013; Nieminen et al., 2011; Reybrouck & Brattico, 2015). These stages lead to the three main outcomes of an aesthetic experience, namely emotion, preference, and judgment (Brattico, 2015; Brattico & Pearce, 2013).
These previous accounts include a locationist view combined with a temporal information processing description of the brain mechanisms involved in the aesthetic experience of music: each temporally evolving stage depends on a distinct set of specific brain structures. The final outcomes of the aesthetic experience require the succession of all previous stages in order to materialize. For instance, according to Brattico et al. (2013), conscious liking judgments can be issued after the brainstem, thalamus, and limbic regions have quickly reacted to salient features of the sound, and after the frontotemporal cortex has encoded and integrated those sound features with learned cognitive schemata, using parietal and action observation neural resources for attributing emotional connotations to the sounds (see Fig. 2).
If all these stages are successfully completed, and if limbic, prefrontal, and mentalizing brain regions are conspicuously activated, then a liking judgment, possibly accompanied also by a beauty verdict, is issued.
FIGURE 2. A schematic representation of a previous framework concerning the timing, localization, and effects of neural processes contributing to aesthetic experience (modified from Brattico et al., 2013). The lower block shows how the various processes evolve as a function of time, beginning from the first sensory analyses to the main outcomes of aesthetic emotions, preference, and judgments. The upper block illustrates their rough anatomic locations and connections in the human brain. ABR = auditory brainstem response; LPP = late positive potential.
Other influential models that inform research on music neuroaesthetics, although not explicitly referring to the aesthetic experience as a whole, have targeted either music-induced emotions or mainly pleasure (irrespective of other emotions). The most influential model for music-induced emotions was first proposed by Juslin and Västfjäll (2008) and identifies six main mechanisms that are supposed to explain the induction of any musical emotion: brainstem reflexes (the automatic reactions to salient, potentially important, features of sounds), evaluative conditioning (deriving from repeated pairing of music with positive or negative stimuli), emotional contagion (when music mimics a bodily or vocal emotional expression), visual imagery (association with visual images during listening), episodic memory (elicitation of a memory for a particular event), and musical expectancy. This latter mechanism has been strongly linked with two important forces accounting for a rewarding musical experience: predictability and surprise (Huron, 2006). During listening, we use our former encounters with music and our implicit knowledge of musical conventions to consciously or implicitly anticipate the outcomes of the musical “paths,” wondering where they might lead us (Huron, 2006, 2009). According to Huron (2006, 2009), Imagination, Tension, Prediction, Reaction, and Appraisal (ITPRA) create a loop leading to musical pleasure: anticipating future events in music through imagination creates both physiological and psychological tension, and both unconscious and conscious predictions for specific features are formed; the final outcome is a reaction leading to a conscious appraisal response (whether the outcome is good, bad, or something in between).
In a summarizing effort, Vuust & Kringelbach (2010) identified a dichotomy between extra-musical mechanisms that rely, for example, on associations with past events or other emotional sounds and the intra-musical mechanism of anticipation. According to this latter mechanism, a musical structure would be aesthetically pleasing when it optimally challenges learned predictions for incoming events (Vuust & Witek, 2014; Witek, Clarke, Wallentin, Kringelbach, & Vuust, 2014; Witek, Kringelbach, & Vuust, 2015). In the
brain, dopaminergic neurotransmission between ventral tegmental area, ventral striatum (including the nucleus accumbens), amygdala, and insula up to the orbitofrontal cortex, is associated with desire for a reward, especially when it comes unexpectedly (prediction error), whereas dorsal striatum and opioid neurotransmission seem to be related to the actual pleasurable reaction (Berridge & Kringelbach, 2015; Kringelbach & Berridge, 2017; Salimpoor, Benovoy, Larcher, Dagher, & Zatorre, 2011). To the initial six emotion-inducing mechanisms proposed by Juslin and Västfjäll (2008), another two were added by Juslin (2013). The first one was rhythmic entrainment. This mechanism is particularly interesting from the neuroscience perspective as it has been linked to the neural mechanism that synchronizes the firing frequency of neuronal assemblies to the pulse of the music heard (Large & Snyder, 2009), although not at tempi below 1 note per second (Doelling & Poeppel, 2015). In some cases, this neuronal entrainment can be observed even in the spectral domain. For instance, a dissonant sound seems to elicit neuronal activity that periodically oscillates at the same frequency as the beats (amplitude modulations) of the sound (Fishman et al., 2001; Pallesen et al., 2015). The other added mechanism was aesthetic judgment (Juslin, 2013), which begins when a special aesthetic attitude is adopted and which is based on a set of individual criteria determining a preference for or rejection of a particular musical piece. Aesthetic judgment, according to Juslin (2013), accounts for the special nature of music-induced emotions that distinguishes them from mundane emotions (such as sadness for a sudden loss) as well as for the common incidence of mixed emotions induced by music representing negative emotions but producing pleasurable feelings of enjoyment. Notably, in this model, the distinction between emotions, preference, and judgments is made, similarly to Brattico et al.
(2013), although it differs from that because of a lesser emphasis on temporally succeeding, neurally distinct processes. The most comprehensive accounts of the aesthetic experience (Brattico et al., 2013; Hargreaves & North, 2010; Hodges, 2016; Juslin, 2013) also cover the context, namely the external physical environment surrounding the individual during a musical activity. The listening experience changes depending on whether it is consumed alone or with peers, in a concert hall or at home. The listener, that is, the internal state of the individual (attention, intention, attitude, motivation, personality) cannot be omitted
either (Brattico et al., 2013; Brattico, 2015; Hargreaves & North, 2010; Hodges, 2016; Reybrouck & Brattico, 2015); a specific internal state can either impose an incidental music consumption, in the case of a distracted person with no intention to have any musical exposure, or cause a full aesthetic experience with positive responses, in the case of the avid concertgoer.
M
B A
S R
R M
In the information processing models of the aesthetic experience presented above, the extraction of acoustic features in brainstem, thalamus, and sensory cortices is the first necessary stage. In music (but also in visual art, according to some models), emotional responses, described also as reactions or reflexes, occur already at an early stage and are closely predicted by the physical content of the stimulus (Brattico et al., 2013; Pearce, 2015; Reybrouck & Brattico, 2015). For instance, a rough, dissonant chord can alone excite neuronal assemblies in the limbic system, such as the amygdala and parahippocampal gyrus (Blood, Zatorre, Bermudez, & Evans, 1999; Gosselin et al., 2006; Pallesen et al., 2005). In the case of early emotional reactions to sounds, causing immediate sensory pleasure, limbic regions can be activated even without the involvement of higher-order brain areas. A dissociation between fast and slow routes for pleasure (described in Brattico, 2015; Kringelbach & Berridge, 2017), is visible in studies involving tasks that distract subjects from deliberate evaluation of sounds. For instance, in Bogert et al. (2016) limbic regions were activated in response to emotionally stereotypical music clips only when subjects were focusing their attention on descriptive aspects of the sounds, whereas they were downregulated when subjects had to direct their conscious attention to the emotions expressed by the music. An intermediate stage of the aesthetic experience, explicitly mentioned in Brattico et al. (2013) and Juslin (2013) as well as several visual models (Pelowski et al., 2016, 2017), includes integration of features and the modulation by existing cultural knowledge. This stage requires the involvement of lateral prefrontal cortex, particularly the inferior frontal
gyrus, and premotor areas. These brain regions have been repeatedly implicated in the detection of incongruous sound events that violate expectations based on previous knowledge of musical conventions. The predictive coding theory of brain function suggests that in both auditory and frontal regions of the brain prior predictions are continuously applied top-down to an incoming signal and, when an error occurs between priors and actual signal, predictions are changed in a bottom-up feedback loop for minimizing free energy (Friston, 2005; Vuust, Ostergaard, Pallesen, Bailey, & Roepstorff, 2009). These prediction errors can be measured by using the event-related potential (ERP) technique and by focusing on brain responses such as the N100, the mismatch negativity (MMN), or the early right anterior negativity (ERAN) (Koelsch, 2011; Koelsch & Siebel, 2005), tracking the information content (probabilistically based on the occurrence of sounds in the preceding context) or subjective expectancy of sounds (Pearce, Ruiz, Kapasi, Wiggins, & Bhattacharya, 2010). During the intermediate stage, discrete emotions expressed by music are perceived and possibly even induced. While in Juslin’s (2013) model emotions are considered as an outcome of the different psychological and neural mechanisms activated during a listening experience, in Brattico et al.’s (2013) model emotions are perceived and felt before other aesthetic outcomes occur. Support for this view comes from studies showing the independence between conscious, thought-related aesthetic processes and emotional processes (Bogert et al., 2016; Brielmann & Pelli, 2017; Liu et al., in press). A recent meta-analysis of fMRI studies on musical emotions highlights a set of regions in the brain that form the core of the functional network that processes musical emotions, namely nucleus accumbens, amygdala, hippocampus, insula, cingulate cortex, orbitofrontal cortex, and temporal pole (Koelsch, 2014).
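The information content of a sound event, as mentioned above, is its negative log probability given the preceding context: the less probable the event, the higher its information content. The Python sketch below estimates it from bigram counts over a toy melody. The function name `information_content`, the add-one smoothing, and the melody itself are illustrative assumptions and do not reproduce the statistical model used in the cited studies.

```python
import math
from collections import Counter


def information_content(corpus, context, event, alphabet):
    """Return -log2 P(event | context), estimated from bigram counts.

    Add-one smoothing (an assumption of this sketch) keeps unseen
    continuations from receiving zero probability.
    """
    pair_counts = Counter(zip(corpus, corpus[1:]))
    context_count = sum(
        count for (previous, _), count in pair_counts.items()
        if previous == context
    )
    probability = (pair_counts[(context, event)] + 1) / (
        context_count + len(alphabet)
    )
    return -math.log2(probability)


# Toy "melody" written as note names; purely hypothetical data.
melody = ["C", "D", "E", "C", "D", "E", "C", "D", "G"]
notes = sorted(set(melody))

expected = information_content(melody, "D", "E", notes)    # frequent continuation
surprising = information_content(melody, "D", "G", notes)  # rare continuation
print(f"IC(E | D) = {expected:.2f} bits, IC(G | D) = {surprising:.2f} bits")
```

In this toy example the rare continuation carries more bits than the frequent one, which is the sense in which an unexpected sound is said to have high information content.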
One kind of emotional response to music is conscious pleasure or enjoyment, closely related to liking and preference. In existing models, enjoyment and conscious liking are described as aesthetic outcomes since they require a deliberate decision and an evaluative act deriving from the integration of the preceding cognitive and emotional information processing stages (Brattico et al., 2013; Juslin, 2013). From the brain perspective, conscious pleasure and liking, often accompanied by the bodily response of chills, have been consistently associated with activity of mesolimbic brain regions of the reward circuit, including the nucleus accumbens, the ventral
tegmental area, the amygdala, the insula, the orbitofrontal cortex, and the ventromedial prefrontal cortex, which rely on the neurotransmitter dopamine (Blood & Zatorre, 2001; Blum et al., 2010; Chanda & Levitin, 2013; Koelsch, 2014; Salimpoor et al., 2013; Zatorre, 2015). A third kind of aesthetic outcome is aesthetic judgment (“this music is beautiful”). As visible from Table 1, only a few studies have analyzed aesthetic judgments, even though beauty is the most frequently mentioned criterion when freely associating a word with the aesthetic value of music (Istok et al., 2009). A series of studies aimed at contrasting aesthetic versus cognitive responses to the same musical stimuli in order to reveal the specificity and chronometry of the neural mechanisms that govern aesthetic processes occurring during music listening. The first of these studies (Brattico, Jacobsen, De Baene, Glerean, & Tervaniemi, 2010) was conducted using electroencephalography (EEG): subjects listened to the same 180 musical sequences while deciding either whether the sequences sounded correct or incorrect (descriptive task) or whether they liked them or not (evaluative task). Results showed larger frontal negativities for the evaluative than for the descriptive task and the involvement of more neural resources in “aesthetic” listening. In terms of brain structures, the orbitofrontal cortex is repeatedly found active in association with beauty judgments of music (similarly to beauty judgments of visual art) (Brattico et al., submitted; Ishizu & Zeki, 2011).
P
A
I
M
: N A
R
Recent years have seen a change in the way brain physiology is described, from a locationist view where each structure subserves one or a few main functions, to a distributed view where the brain is described as a complex dynamic system and where the interactions between its components govern cognitive functions (Bassett & Sporns, 2017; Medaglia, Lynall, & Bassett, 2015). This novel view derives from the technological and scientific progress of network neuroscience, namely the marriage between network science and cognitive neuroscience (Bassett & Sporns, 2017). Network techniques are mathematical tools to describe complex systems organized in networks that change over time (dynamics) (Medaglia et al., 2015; Newman, 2010).
In previous overviews of the music neuroaesthetic field (Brattico & Pearce, 2013; Hodges, 2016), studies from network neuroscience received little mention. Indeed, most studies on functional connectivity have been published in the past two years. For instance, it has recently been found that functional connectivity between the superior temporal gyrus (where the auditory cortex is located), the inferofrontal cortex (where hierarchical predictions for sounds are computed), and reward regions determines the pleasurable rewarding responses to music, or the absence of them (Martínez-Molina, Mas-Herrero, Rodriguez-Fornells, Zatorre, & Marco-Pallares, 2016; Sachs, Ellis, Schlaug, & Loui, 2016; Salimpoor et al., 2013; Wilkins, Hodges, Laurienti, Steen, & Burdette, 2014). In a study where subjects had to decide how much money they would spend to buy songs, the connections between the nucleus accumbens and its surrounding regions (the amygdala and the hippocampus) predicted how much a participant would spend on each song (Salimpoor et al., 2013). The importance of the neural interactions between the nucleus accumbens and the auditory cortex for determining aesthetic pleasure in music has also been noted by studies aiming to identify the neural sources of individual differences in pleasurable reactions to musical sounds (Keller et al., 2013; Martínez-Molina et al., 2016; Sachs et al., 2016). These studies originate from the recently empirically documented observation that music is not universally liked and appreciated; rather, individuals vary greatly in their sensitivity to musical reward, ranging from musicophilics, characterized by acute craving for music and increased responsiveness to and interest in musical sounds (Sacks, 2007), to musical anhedonics, with a total indifference to music (Mas-Herrero, Marco-Pallares, Lorenzo-Seva, Zatorre, & Rodriguez-Fornells, 2013; Mas-Herrero, Zatorre, Rodriguez-Fornells, & Marco-Pallares, 2014).
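Functional connectivity of the kind discussed above is often estimated as the statistical dependence between the activity time courses of two brain regions. The following Python sketch uses the Pearson correlation as one common estimate. The region labels and signal values are hypothetical and serve only to illustrate the computation; they are not data from the cited studies, which may have relied on other connectivity measures.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long time series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("time series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(variance_x * variance_y)


# Hypothetical regional signals (arbitrary units), one value per scan.
signals = {
    "auditory_cortex":   [0.1, 0.4, 0.9, 0.7, 0.2, 0.0, 0.5, 0.8],
    "nucleus_accumbens": [0.0, 0.3, 0.8, 0.9, 0.3, 0.1, 0.4, 0.7],
    "control_region":    [0.5, 0.1, 0.4, 0.2, 0.6, 0.3, 0.1, 0.5],
}

# One connectivity value for every unordered pair of regions.
regions = list(signals)
connectivity = {
    (a, b): pearson_r(signals[a], signals[b])
    for i, a in enumerate(regions)
    for b in regions[i + 1:]
}
for (a, b), r in connectivity.items():
    print(f"{a} <-> {b}: r = {r:+.2f}")
```

With these made-up values, the two regions whose signals rise and fall together obtain a high correlation, whereas the pairing with the unrelated control signal does not.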
A recent study using diffusion tensor imaging (DTI) has shown that the white-matter tracts between the posterior portion of the superior temporal lobe and emotion- and reward-processing regions such as the anterior insula and the medial prefrontal cortex explain the individual differences in reward sensitivity to music (Sachs et al., 2016). In that study, reward sensitivity was quantified by the number of chills experienced by each individual combined with the degree of physiological changes (heart rate and skin conductance response) during listening to music inducing chills versus neutral music. Another study
(Martínez-Molina et al., 2016) used the newly developed Barcelona Music Reward Questionnaire (BMRQ) to identify music-specific anhedonic, hedonic, and hyperhedonic subjects. Participants were scanned with fMRI during a music listening task, in which they rated the pleasantness of the music excerpts, and a gambling task, in which they either won or lost a symbolic amount of money. Results showed decreased regional activity in the ventral striatum (including the nucleus accumbens) in anhedonics and increased regional activity in hyperhedonics, as well as downregulated functional connectivity between this area and the right superior temporal gyrus in anhedonics. These results were obtained only in relation to pleasantness responses to the music and not to the gambling task. These findings are not confined to receptive pleasure during listening but also relate to the desire to move to rhythmic aspects of the music. A study by Witek et al. (forthcoming) found local changes in directed effective connectivity between motor (dorsomedial prefrontal) and reward (striatal) networks during maximal rhythm-induced pleasurable urge to move. In addition, they showed that maximal pleasurable desire to move to sound was predicted by a meta-stable brain network organization, namely a neural organization lying between an ordered and a disordered state (computed as whole-brain shuffling speed of effective connectivity matrices) (Deco, Kringelbach, Jirsa, & Ritter, 2017). These and other studies compellingly demonstrate that functional connectivity between the superior temporal gyrus (where the auditory cortex is located), the inferofrontal cortex (where hierarchical predictions for sounds are computed), and reward regions of the brain is linked with pleasurable rewarding responses to music, or the absence of them (Martínez-Molina et al., 2016; Sachs et al., 2016; Salimpoor et al., 2013; Wilkins et al., 2014).
Notably, the neural transmission between these brain areas is regulated by the monoamine neurotransmitter dopamine, which has been linked to incentive salience and motivation for acting, namely to the “wanting” phase of the reward cycle (Kringelbach & Berridge, 2017). A very recent investigation has discovered a molecular link between affective sensitivity to (musical) sounds and dopamine functionality (Quarto et al., 2017): a functional variation in a dopamine receptor gene modulates the impact of sounds on mood states and emotion-related prefrontal and striatal brain activity.
The studies reviewed above, while having the important merit of revealing the complex architecture subserving the rewarding experience of music listening, have not examined whether this experience can be consumed spontaneously, even with casual listening, or whether it requires focused attention and a particular attitude (sometimes referred to as the aesthetic stance). A recent study (Liu et al., in press) contrasted conditions varying in the type of focused attentional involvement toward the music requested from subjects. Similarly to previous findings (Bogert et al., 2016; Brattico et al., 2016; Liu, Abu-Jamous, Brattico, & Nandi, 2017), the study observed a co-activation in a network of mesiotemporal limbic structures, including the nucleus accumbens, in response to the liked musical stimuli, irrespective of whether subjects were focusing on making a conscious liking evaluation or not. Functional connectivity within prefrontal and parieto-occipital regions was instead obtained for the liking judgments.
F
C
P
Until now, the musical experience has been analyzed from the point of view of the subject. Yet, music (like other arts) can represent a means of communication between the judgmental intentions of the perceiver and the meaning-making intentions of the composer/artist. The act of meaning attribution, which is essential to an aesthetic experience, as argued, for example, by Chatterjee and Vartanian (2014), Pearce et al. (2016), Leder et al. (2004), and Menninghaus et al. (2017), cannot exist without the assignment of an intention to the agent producing the artistic object (Acquadro, Congedo, & De Ridder, 2016). Modern neuroscience offers unprecedented opportunities to capture the essence of such aesthetic processes, thanks to the hyperscanning approach, namely the synchronized brain recordings of two or more persons doing an experimental task together (Hari, Henriksson, Malinen, & Parkkonen, 2015; Konvalinka & Roepstorff, 2012; Zhdanov et al., 2015; Zhou, Bourguignon, Parkkonen, & Hari, 2016). Even if “core” neuroaesthetics of music presently takes little account of motor production, the mirror neuron or action observation system (a set of neurons in the fronto-parietal regions of the brain that
responds when watching others doing a motor action; Freedberg & Gallese, 2007; Gallese & Freedberg, 2007; Rizzolatti, Fadiga, Gallese, & Fogassi, 1996) has been proposed as a key mechanism allowing aesthetic responses to music in an interactive situation (Molnar-Szakacs & Overy, 2006). According to one model (Molnar-Szakacs & Overy, 2006), music is described as hierarchically organized sequences of motor acts synchronous with auditory information and activating both the auditory cortex and motor regions of the action observation network in the posterior inferior frontal gyrus (BA 44) and adjacent premotor cortex. In this model, the anterior insula serves to evaluate the internal visceral changes derived from music and relay these changes to activity in the limbic system, which ultimately is responsible for the complex affective experiences originating from music listening. The co-activation of the same motor systems in musician and perceiver is supposed to allow the co-representation and sharing of emotions during an aesthetic musical experience. Hence, future studies using hyperscanning techniques might measure the aesthetic value of a musical interaction and determine the responsible neural mechanisms. Initial investigations measuring the inter-subject coupling of electroencephalographic signals (especially in the beta frequency range) from guitarists playing in a duet prove the feasibility of this approach (Lindenberger, Li, Gruber, & Müller, 2009; Müller, Sanger, & Lindenberger, 2013). To conclude, the agenda of the neuroaesthetics of music, by addressing questions related to intra- and inter-subjectivity during a musical activity, comes close to the essence of music and of what we are as humans.
While there still is the risk of “biologism,” researchers working under the music neuroaesthetics umbrella reach out to the “humanistic” approach since they strive to explain how “musical appreciation is dependent on culture, memory, mood and many other factors such as personal taste” (Tallis, 2011, p. 54).
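As an illustration of the inter-subject coupling of electroencephalographic signals mentioned above for hyperscanning studies, the Python sketch below computes a phase-locking value (PLV), one widely used index of phase synchronization between two signals. It is a sketch under stated assumptions: the signals are synthetic, the 20 Hz frequency merely stands in for the beta range, phase is estimated by simple complex demodulation, and all function and variable names are hypothetical. The cited duet studies may have used different synchronization measures.

```python
import cmath
import math
import random


def instantaneous_phase(signal, freq_hz, sample_rate_hz):
    """Estimate the phase at freq_hz by complex demodulation.

    The signal is multiplied by a complex exponential at the target
    frequency and smoothed with a moving average spanning one cycle.
    """
    window = max(1, round(sample_rate_hz / freq_hz))
    demodulated = [
        x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate_hz)
        for n, x in enumerate(signal)
    ]
    return [
        cmath.phase(sum(demodulated[i:i + window]) / window)
        for i in range(len(demodulated) - window + 1)
    ]


def phase_locking_value(phases_a, phases_b):
    """Return |mean(exp(i * (phase_a - phase_b)))|, a value in [0, 1]."""
    unit_vectors = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(unit_vectors) / len(unit_vectors))


random.seed(0)            # reproducible synthetic noise
SAMPLE_RATE_HZ = 200
BETA_HZ = 20              # assumed stand-in for the beta range
times = [n / SAMPLE_RATE_HZ for n in range(2 * SAMPLE_RATE_HZ)]

# Two "players" oscillating at the same frequency with a fixed phase lag,
# plus an unrelated signal oscillating at a different frequency.
player_a = [math.sin(2 * math.pi * BETA_HZ * t) for t in times]
player_b = [
    math.sin(2 * math.pi * BETA_HZ * t - 0.8) + random.gauss(0, 0.3)
    for t in times
]
unrelated = [math.sin(2 * math.pi * 23 * t) for t in times]

phase_a = instantaneous_phase(player_a, BETA_HZ, SAMPLE_RATE_HZ)
plv_locked = phase_locking_value(
    phase_a, instantaneous_phase(player_b, BETA_HZ, SAMPLE_RATE_HZ)
)
plv_unlocked = phase_locking_value(
    phase_a, instantaneous_phase(unrelated, BETA_HZ, SAMPLE_RATE_HZ)
)
print(f"PLV locked pair: {plv_locked:.2f}, unrelated pair: {plv_unlocked:.2f}")
```

A PLV close to 1 indicates a stable phase relation between the two signals, whereas a value close to 0 indicates that their phase difference drifts over time.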
REFERENCES
Acquadro, M. A., Congedo, M., & De Ridder, D. (2016). Music performance as an experimental approach to hyperscanning studies. Frontiers in Human Neuroscience 10, 242. Retrieved from https://doi.org/10.3389/fnhum.2016.00242
Alluri, V., & Toiviainen, P. (2015). Musical expertise modulates functional connectivity of limbic regions during continuous music listening. Psychomusicology: Music, Mind, and Brain 25(4), 443–454.
Altenmüller, E., Demorest, S. M., Fujioka, T., Halpern, A. R., Hannon, E. E., Loui, P., … Zatorre, R. J. (2012). Introduction to the neurosciences and music IV: Learning and memory. Annals of the New York Academy of Sciences 1252, 1–16.
Aubert, M., Brumm, A., Ramli, M., Sutikna, T., Saptomo, E. W., Hakim, B., … Dosseto, A. (2014). Pleistocene cave art from Sulawesi, Indonesia. Nature 514(7521), 223–227.
Bassett, D. S., & Sporns, O. (2017). Network neuroscience. Nature Neuroscience 20(3), 353–364.
Berns, G. S., Capra, C. M., Moore, S., & Noussair, C. (2010). Neural mechanisms of the influence of popularity on adolescent ratings of music. NeuroImage 49(3), 2687–2696.
Berridge, K. C., & Kringelbach, M. L. (2015). Pleasure systems in the brain. Neuron 86(3), 646–664.
Bigand, E., & Tillmann, B. (2015). Introduction to the neurosciences and music V: Cognitive stimulation and rehabilitation. Annals of the New York Academy of Sciences 1337, vii–ix.
Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences 98(20), 11818–11823.
Blood, A. J., Zatorre, R. J., Bermudez, P., & Evans, A. C. (1999). Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nature Neuroscience 2(4), 382–387.
Blum, K., Chen, T. J., Chen, A. L., Madigan, M., Downs, B. W., Waite, R. L., … Gold, M. S. (2010). Do dopaminergic gene polymorphisms affect mesolimbic reward activation of music listening response? Therapeutic impact on Reward Deficiency Syndrome (RDS). Medical Hypotheses 74(3), 513–520.
Bogert, B., Numminen-Kontti, T., Gold, B., Sams, M., Numminen, J., Burunat, I., … Brattico, E. (2016). Hidden sources of joy, fear, and sadness: Explicit versus implicit neural processing of musical emotions. Neuropsychologia 89, 393–402.
Brattico, E. (2015). From pleasure to liking and back: Bottom-up and top-down neural routes to the aesthetic enjoyment of music. In M. Nadal, J. P. Huston, L. Agnati, F. Mora, & C. J. Cela Conde (Eds.), Art, aesthetics, and the brain (pp. 303–318). Oxford: Oxford University Press.
Brattico, E., Bogert, B., Alluri, V., Tervaniemi, M., Eerola, T., & Jacobsen, T. (2016). It’s sad but I like it: The neural dissociation between musical emotions and liking in experts and laypersons. Frontiers in Human Neuroscience 9, 676. Retrieved from https://doi.org/10.3389/fnhum.2015.00676
Brattico, E., Bogert, B., & Jacobsen, T. (2013). Toward a neural chronometry for the aesthetic experience of music. Frontiers in Psychology 4, 206. Retrieved from https://doi.org/10.3389/fpsyg.2013.00206
Brattico, E., Brattico, P., & Jacobsen, T. (2009). The origins of the aesthetic enjoyment of music: A review of the literature. Musicae Scientiae 13(2), 15–39.
Brattico, E., Brusa, A., Fernandes, H. M., Jacobsen, T., Gaggero, G., Toiviainen, P., Vuust, P., & Proverbio, A. M. (submitted). The beauty and the brain: Investigating the neural correlates of musical beauty during a realistic listening experience.
Brattico, E., Jacobsen, T., De Baene, W., Glerean, E., & Tervaniemi, M. (2010). Cognitive vs. affective listening modes and judgments of music: An ERP study. Biological Psychology 85(3), 393–409.
Brattico, E., & Pearce, M. T. (2013). The neuroaesthetics of music. Psychology of Aesthetics, Creativity, and the Arts 7, 48–61.
Brattico, P., Brattico, E., & Vuust, P. (2017). Global sensory qualities and aesthetic experience of music. Frontiers in Neuroscience 11. Retrieved from https://doi.org/10.3389/fnins.2017.00159
Brielmann, A. A., & Pelli, D. G. (2017). Beauty requires thought. Current Biology 27(10), 1506–1513.e3.
Brown, S., Gao, X., Tisdelle, L., Eickhoff, S. B., & Liotti, M. (2011). Naturalizing aesthetics: Brain areas for aesthetic appraisal across sensory modalities. NeuroImage 58(1), 250–258.
Bundgaard, H. (2015). Feeling, meaning, and intentionality: A critique of the neuroaesthetics of beauty. Phenomenology and the Cognitive Sciences 14(4), 781–801.
Calvo-Merino, B., Glaser, D. E., Grezes, J., Passingham, R. E., & Haggard, P. (2005). Action observation and acquired motor skills: An FMRI study with expert dancers. Cerebral Cortex 15(8), 1243–1249.
Calvo-Merino, B., Jola, C., Glaser, D. E., & Haggard, P. (2008). Towards a sensorimotor aesthetics of performing art. Consciousness and Cognition 17(3), 911–922.
Chanda, M. L., & Levitin, D. J. (2013). The neurochemistry of music. Trends in Cognitive Sciences 17(4), 179–193.
Chapin, H., Jantzen, K., Kelso, J. A., Steinberg, F., & Large, E. (2010). Dynamic emotional and neural responses to music depend on performance expression and listener experience. PLoS ONE 5(12), e13812.
Chatterjee, A. (2011). Neuroaesthetics: A coming of age story. Journal of Cognitive Neuroscience 23(1), 53–62.
Chatterjee, A., & Vartanian, O. (2014). Neuroaesthetics. Trends in Cognitive Sciences 18(7), 370–375.
Chatterjee, A., & Vartanian, O. (2016). Neuroscience of aesthetics. Annals of the New York Academy of Sciences 1369, 172–194.
Coburn, A., Vartanian, O., & Chatterjee, A. (2017). Buildings, beauty, and the brain: A neuroscience of architectural experience. Journal of Cognitive Neuroscience 29(9), 1521–1531.
Conway, B. R., & Rehding, A. (2013). Neuroaesthetics and the trouble with beauty. PLoS Biology 11, e1001504.
Cupchik, G. C. (2007). A critical reflection on Arnheim’s Gestalt theory of aesthetics. Psychology of Aesthetics, Creativity, and the Arts 1(1), 16–24.
Cupchik, G. C., Vartanian, O., Crawley, A., & Mikulis, D. J. (2009). Viewing artworks: Contributions of cognitive control and perceptual facilitation to aesthetic experience. Brain and Cognition 70(1), 84–91.
Curtis, G. (2006). The cave painters. New York: Anchor Books.
Deco, G., Kringelbach, M. L., Jirsa, V. K., & Ritter, P. (2017). The dynamics of resting fluctuations in the brain: Metastability and its dynamical cortical core. Scientific Reports 7, 3095. doi:10.1038/s41598-017-03073-5
Di Dio, C., Macaluso, E., & Rizzolatti, G. (2007). The golden beauty: Brain response to classical and renaissance sculptures. PLoS ONE 11, 1–9.
Doelling, K. B., & Poeppel, D. (2015). Cortical entrainment to music and its modulation by expertise. Proceedings of the National Academy of Sciences 112(45), E6233–E6242.
Eysenck, H. J. (1942). The experimental study of the “good Gestalt”: A new approach. Psychological Review 49(4), 344–364.
Fishman, Y. I., Volkov, I. O., Noh, M. D., Garell, P. C., Bakken, H., Arezzo, J. C., … Steinschneider, M. (2001). Consonance and dissonance of musical chords: Neural correlates in auditory cortex of monkeys and humans. Journal of Neurophysiology 86(6), 2761–2788.
Freedberg, D., & Gallese, V. (2007). Motion, emotion and empathy in esthetic experience. Trends in Cognitive Sciences 11(5), 197–203.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences 360(1456), 815–836. Frühholz, S., Trost, W., & Kotz, S. A. (2016). The sound of emotions: Towards a unifying neural network perspective of affective sound processing. Neuroscience & Biobehavioral Reviews 68, 96–110. Gallese, V., & Freedberg, D. (2007). Mirror and canonical neurons are crucial elements in esthetic response. Trends in Cognitive Sciences 11(10), 411. Golden, H. L., Clark, C. N., Nicholas, J. M., Cohen, M. H., Slattery, C. F., Paterson, R. W., … Warren, J. D. (2017). Music perception in dementia. Journal of Alzheimer’s Disease 55(3), 933–949. Gosselin, N., Samson, S., Adolphs, R., Noulhiane, M., Roy, M., Hasboun, D., … Peretz, I. (2006). Emotional responses to unpleasant music correlates with damage to the parahippocampal cortex. Brain 129(10), 2585–2592. Hanslick, E. (1954). On the musically beautiful. Indianapolis: Hackett (English translation from the 8th ed. 1891). Hargreaves, D. J., & North, A. C. (2010). Experimental aesthetics and liking for music. In P. N. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 515–46). Oxford: Oxford University Press. Hari, R., Henriksson, L., Malinen, S., & Parkkonen, L. (2015). Centrality of social interaction in human brain function. Neuron 88(1), 181–193. Hodges, D. A. (2016). The neuroaesthetics of music. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (2nd ed., pp. 247–262). Oxford: Oxford University Press. Huron, D. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA: MIT Press. Huron, D. (2009). Aesthetics. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (pp. 151–159). Oxford: Oxford University Press. Huston, J. P., Nadal, M., Agnati, L., Mora, L., & Cela Conde, C. J. (Eds.). (2015). Art, aesthetics and the brain. 
Oxford: Oxford University Press. Ishizu, T., & Zeki, S. (2011). Toward a brain-based theory of beauty. PLoS ONE 6, e21852. Istok, E., Brattico, E., Jacobsen, T., Krohn, K., Mueller, M., & Tervaniemi, M. (2009). Aesthetic responses to music: A questionnaire study. Musicae Scientiae 13, 183–206. Istok, E., Brattico, E., Jacobsen, T., Ritter, A., & Tervaniemi, M. (2013). “I love rock ’n’ roll”: Music genre preference modulates brain responses to music. Biological Psychology 92(2), 142–151. Jacobsen, J. H., Stelzer, J., Fritz, T. H., Chetelat, G., La Joie, R., & Turner, R. (2015). Why musical memory can be preserved in advanced Alzheimer’s disease. Brain 138(8), 2438–2450. Jacobsen, T. (2014). Domain specificity and mental chronometry in empirical aesthetics. British Journal of Psychology 105(4), 471–473. Jacobsen, T., & Beudt, S. (2017). Domain generality and domain specificity in aesthetic appreciation. New Ideas in Psychology 47, 97–102. Jacobsen, T., & Höfel, L. (2003). Descriptive and evaluative judgment processes: Behavioral and electrophysiological indices of processing symmetry and aesthetics. Cognitive, Affective, & Behavioral Neuroscience 3(4), 289–299. Juslin, P. N. (2013). From everyday emotions to aesthetic emotions: Towards a unified theory of musical emotions. Physics of Life Reviews 10(3), 235–266. Juslin, P. N., & Laukka, P. (2004). Expression, perception, and induction of musical emotions: A review and a questionnaire study of everyday listening. Journal of New Music Research 33(3), 217–238.
Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. Behavioral and Brain Sciences 31(5), 559–575. Kawabata, H., & Zeki, S. (2004). Neural correlates of beauty. Journal of Neurophysiology 91(4), 1699–1705. Keller, J., Young, C. B., Kelley, E., Prater, K., Levitin, D. J., & Menon, V. (2013). Trait anhedonia is associated with reduced reactivity and connectivity of mesolimbic and paralimbic reward pathways. Journal of Psychiatric Research 47(10), 1319–1328. Koelsch, S. (2011). Toward a neural basis of music perception: A review and updated model. Frontiers in Psychology 2, 110. Retrieved from https://doi.org/10.3389/fpsyg.2011.00110 Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15, 170–180. Koelsch, S., & Siebel, W. A. (2005). Towards a neural basis of music perception. Trends in Cognitive Sciences 9(12), 578–584. Konvalinka, I., & Roepstorff, A. (2012). The two-brain approach: How can mutually interacting brains teach us something about social interaction? Frontiers in Human Neuroscience 6, 215. Retrieved from https://doi.org/10.3389/fnhum.2012.00215 Kornysheva, K., von Cramon, D. Y., Jacobsen, T., & Schubotz, R. I. (2010). Tuning-in the beat: Aesthetic appreciation of musical rhythms correlates with a premotor activity boost. Human Brain Mapping 31(1), 48–64. Kringelbach, M. L., & Berridge, K. C. (2017). The affective core of emotion: Linking pleasure, subjective well-being, and optimal metastability in the brain. Emotion Review 9(3), 191–199. Kühn, S., & Gallinat, J. (2012). The neural correlates of subjective pleasantness. NeuroImage 61(1), 289–294. Large, E. W., & Snyder, J. S. (2009). Pulse and meter as neural resonance. Annals of the New York Academy of Sciences 1169, 46–57. Laukka, P. (2007). Uses of music and psychological well-being among the elderly. Journal of Happiness Studies 8(2), 215–241. Leder, H., Belke, B., Oeberst, A., & Augustin, D. (2004). 
A model of aesthetic appreciation and aesthetic judgements. British Journal of Psychology 95(4), 489–508. Leder, H., Gerger, G., Brieber, D., & Schwarz, N. (2014). What makes an art expert? Emotion and evaluation in art appreciation. Cognition and Emotion 28, 1137–1147. Leder, H., Markey, P. S., & Pelowski, M. (2015). Aesthetic emotions to art: What they are and what makes them special. Comment on “The quartet theory of human emotions: An integrative and neurofunctional model” by S. Koelsch et al. Physics of Life Reviews 13, 67–70. Leder, H., & Nadal, M. (2014). Ten years of a model of aesthetic appreciation and aesthetic judgments: The aesthetic episode—developments and challenges in empirical aesthetics. British Journal of Psychology 105(4), 443–464. Lehne, M., & Koelsch, S. (2015). Tension-resolution patterns as a key element of aesthetic experience: Psychological principles and underlying brain mechanisms. In J. P. Huston, M. Nadal, F. Mora, L. Agnati, & C. J. Cela Conde (Eds.), Art, aesthetics, and the brain (pp. 285–302). Oxford: Oxford University Press. Levitin, D. J., & Tirovolas, A. K. (2009). Current advances in the cognitive neuroscience of music. Annals of the New York Academy of Sciences 1156, 211–231. Lindenberger, U., Li, S. C., Gruber, W., & Müller, V. (2009). Brains swinging in concert: Cortical phase synchronization while playing guitar. BMC Neuroscience 10, 22. Retrieved from https://doi.org/10.1186/1471-2202-10-22 Liu, C., Abu-Jamous, B., Brattico, E., & Nandi, A. K. (2017). Towards tunable consensus clustering for studying functional brain connectivity during affective processing. International Journal of
Neural Systems 27(2), doi:10.1142/S0129065716500428 Liu, C., Brattico, E., Abu-Jamous, B., Pereira, C. S., Jacobsen, T., & Nandi, A. K. (in press). Effect of explicit evaluation on the neural connectivity related to listening to unfamiliar music. Frontiers in Human Neuroscience. Retrieved from https://doi.org/10.3389/fnhum.2017.00611 Livingstone, M., & Hubel, D. H. (2002). Vision and art: The biology of seeing. New York: Harry N. Abrams. McDonald, C., & Stewart, L. (2008). Uses and functions of music in congenital amusia. Music Perception 25(4), 345–355. Martindale, C., Locher, P., & Petrov, V. M. (2007). Evolutionary and neurocognitive approaches to aesthetics, creativity and the arts. Amityville, NY: Baywood Publishing. Martínez-Molina, N., Mas-Herrero, E., Rodriguez-Fornells, A., Zatorre, R. J., & Marco-Pallares, J. (2016). Neural correlates of specific musical anhedonia. Proceedings of the National Academy of Sciences 113, E7337–E7345. Mas-Herrero, E., Dagher, A., & Zatorre, R. J. (2018). Modulating musical reward sensitivity up and down with transcranial magnetic stimulation. Nature Human Behaviour 2, 27–32. Mas-Herrero, E., Marco-Pallares, J., Lorenzo-Seva, U., Zatorre, R. J., & Rodriguez-Fornells, A. (2013). Individual differences in music reward experiences. Music Perception 31(2), 118–138. Mas-Herrero, E., Zatorre, R. J., Rodriguez-Fornells, A., & Marco-Pallares, J. (2014). Dissociation between musical and monetary reward responses in specific musical anhedonia. Current Biology 24(6), 699–704. Matrone, C., & Brattico, E. (2015). The power of music on Alzheimer’s disease and the need to understand the underlying molecular mechanisms. Journal of Alzheimer’s Disease and Parkinsonism 5. doi:10.4172/2161-0460.1000196 Medaglia, J. D., Lynall, M. E., & Bassett, D. S. (2015). Cognitive network neuroscience. Journal of Cognitive Neuroscience 27(8), 1471–1491. Menninghaus, W., Wagner, V., Hanich, J., Wassiliwizky, E., Jacobsen, T., & Koelsch, S. (2017). 
The distancing-embracing model of the enjoyment of negative emotions in art reception. Behavioral and Brain Sciences 40, 1–58. Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physiological connectivity of the mesolimbic system. NeuroImage 28(1), 175–184. Molnar-Szakacs, I., & Overy, K. (2006). Music and mirror neurons: From motion to “e”motion. Social Cognitive and Affective Neuroscience 1(3), 235–241. Montag, C., Reuter, M., & Axmacher, N. (2011). How one’s favorite song activates the reward circuitry of the brain: Personality matters! Behavioural Brain Research 225(2), 511–514. Müller, V., Höfel, L., Brattico, E., & Jacobsen, T. (2010). Aesthetic judgments of music in experts and laypersons: An ERP study. International Journal of Psychophysiology 76(1), 40–51. Müller, V., Sanger, J., & Lindenberger, U. (2013). Intra- and inter-brain synchronization during musical improvisation on the guitar. PLoS ONE 8, e73852. Nadal, M., Munar, E., Capo, M. A., Rossello, J., & Cela-Conde, C. J. (2008). Towards a framework for the study of the neural correlates of aesthetic preference. Spatial Vision 21(3–5), 379–396. Nadal, M., & Pearce, M. T. (2011). The Copenhagen neuroaesthetics conference: Prospects and pitfalls for an emerging field. Brain and Cognition 76(1), 172–183. Newman, M. E. J. (2010). Networks: An introduction. Oxford: Oxford University Press. Nieminen, S., Istok, E., Brattico, E., Tervaniemi, M., & Huotilainen, M. (2011). The development of aesthetic responses to music and their underlying neural and psychological mechanisms. Cortex 47(9), 1138–1146. Pallesen, K. J., Bailey, C. J., Brattico, E., Gjedde, A., Palva, J. M., & Palva, S. (2015). Experience drives synchronization: The phase and amplitude dynamics of neural oscillations to musical chords
are differentially modulated by musical expertise. PLoS ONE 10, e0134211. Pallesen, K. J., Brattico, E., Bailey, C., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S. (2005). Emotion processing of major, minor, and dissonant chords: a functional magnetic resonance imaging study. Annals of the New York Academy of Sciences 1060, 450–453. Patel, A. (2008). Music, language, and the brain. Oxford: Oxford University Press. Pearce, M. T. (2015). Effects of expertise on the cognitive and neural processes involved in musical appreciation. In J. P. Huston, M. Nadal, F. Mora, L. Agnati, & C. J. Cela Conde (Eds.), Art, aesthetics, and the brain (pp. 319–338). Oxford: Oxford University Press. Pearce, M. T., Ruiz, M. H., Kapasi, S., Wiggins, G. A., & Bhattacharya, J. (2010). Unsupervised statistical learning underpins computational, behavioural, and neural manifestations of musical expectation. NeuroImage 50(1), 302–313. Pearce, M. T., Zaidel, D. W., Vartanian, O., Skov, M., Leder, H., Chatterjee, A., & Nadal, M. (2016). Neuroaesthetics: The cognitive neuroscience of aesthetic experience. Perspectives on Psychological Science 11(2), 265–279. Pelowski, M., Markey, P. S., Forster, M., Gerger, G., & Leder, H. (2017). Move me, astonish me … delight my eyes and brain: The Vienna Integrated Model of top-down and bottom-up processes in Art Perception (VIMAP) and corresponding affective, evaluative, and neurophysiological correlates. Physics of Life Reviews 21, 80–125. Pelowski, M., Markey, P. S., Lauring, J. O., & Leder, H. (2016). Visualizing the impact of art: An update and comparison of current psychological models of art experience. Frontiers in Human Neuroscience 10, 160. doi:10.3389/fnhum.2016.00160 Pereira, C. S., Teixeira, J., Figueiredo, P., Xavier, J., Castro, S. L., & Brattico, E. (2011). Music and emotions in the brain: Familiarity matters. PloS ONE 6(11), e27241. Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience 6, 688–691. 
Peretz, I., & Zatorre, R. (Eds.). (2003). The cognitive neuroscience of music. Oxford: Oxford University Press. Peretz, I., & Zatorre, R. J. (2005). Brain organization for music processing. Annual Review of Psychology 56, 89–114 Quarto, T., Fasano, M. C., Taurisano, P., Fazio, L., Antonucci, L. A., Gelao, B., … Brattico, E. (2017). Interaction between DRD2 variation and sound environment on mood and emotion-related brain activity. Neuroscience 341, 9–17. Redies, C. (2015). Combining universal beauty and cultural context in a unifying model of visual aesthetic experience. Frontiers in Human Neuroscience 9, 218. Retrieved from https://doi.org/10.3389/fnhum.2015.00218 Reybrouck, M., & Brattico, E. (2015). Neuroplasticity beyond sounds: Neural adaptations following long-term musical aesthetic experiences. Brain Sciences 5(1), 69–91. Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research 3(2), 131–141. Sachs, M. E., Ellis, R. J., Schlaug, G., & Loui, P. (2016). Brain connectivity reflects human aesthetic responses to music. Social Cognitive and Affective Neuroscience 11(6), 884–891. Sacks, O. (2007). Musicophilia: Tales of music and the brain. New York: Vintage. Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nature Neuroscience 14, 257–262. Salimpoor, V. N., Van Den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J. (2013). Interactions between the nucleus accumbens and auditory cortices predict music reward value. Science 340(6129), 216–219.
Salimpoor, V. N., & Zatorre, R. J. (2013). Neural interactions that give rise to musical pleasure. Psychology of Aesthetics, Creativity, and the Arts 7, 62–75. Samson, S., Dellacherie, D., & Platel, H. (2009). Emotional power of music in patients with memory disorders: Clinical implications of cognitive neuroscience. Annals of the New York Academy of Sciences 1169, 245–255. Savage, P. E., Brown, S., Sakai, E., & Currie, T. E. (2015). Statistical universals reveal the structures and functions of human music. Proceedings of the National Academy of Sciences 112, 8987–8992. Schafer, T., Sedlmeier, P., Stadtler, C., & Huron, D. (2013). The psychological functions of music listening. Frontiers in Psychology 4, 511. Retrieved from https://doi.org/10.3389/fpsyg.2013.00511 Scruton, R. (1999). The aesthetics of music. Oxford: Oxford University Press. Seghdi, N., & Brattico, E. (in press). The phylogenetic roots of music. Biokulturelle Menneske. Sloboda, J. A. (1985). The musical mind. Oxford: Oxford University Press. Sloboda, J. A. (1992). Empirical studies of emotional response to music. In M. R. Jones & S. Holleran (Eds.), Cognitive Bases of Musical Communication (pp. 33–46). Washington, DC: American Psychological Association. Smith, C. U. (2005). Evolutionary neurobiology and aesthetics. Perspectives in Biology and Medicine 48(1), 17–30. Steinbeis, N., & Koelsch, S. (2009). Understanding the intentions behind man-made products elicits neural activity in areas dedicated to mental state attribution. Cerebral Cortex 19(3), 619–623. Suzuki, M., Okamura, N., Kawachi, Y., Tashiro, M., Arao, H., Hoshishiba, T., … Yanai, K. (2008). Discrete cortical regions associated with the musical beauty of major and minor chords. Cognitive, Affective, & Behavioral Neuroscience 8(2), 126–31. Tallis, R. (2008). The limitations of a neurological approach to art: Review of Neuroarthistory: From Aristotle and Pliny to Baxandall and Zeki by John Onians (Yale University Press, 2008). 
Lancet 372, 19–20. Tallis, R. (2011). Reflections of a metaphysical flaneur. London and New York: Routledge. Tiihonen, M., Brattico, E., Maksimainen, J., Wikgren, J., & Saarikallio, S. (2017). Constituents of music and visual-art related pleasure: A critical integrative literature review. Frontiers in Psychology 8, 1218. Retrieved from https://doi.org/10.3389/fpsyg.2017.01218 Trost, W., Ethofer, T., Zentner, M., & Vuilleumier, P. (2012). Mapping aesthetic musical emotions in the brain. Cerebral Cortex 22(12), 2769–2783. Trost, W., Frühholz, S., Cochrane, T., Cojan, Y., & Vuilleumier, P. (2015). Temporal dynamics of musical emotions examined through intersubject synchrony of brain activity. Social Cognitive and Affective Neuroscience 10(12), 1705–1721. Trost, W., Frühholz, S., Schön, D., Labbé, C., Pichon, S., Grandjean, D., & Vuilleumier, P. (2014). Getting the beat: Entrainment of brain activity by musical rhythm and pleasantness. NeuroImage 103, 55–64. Vartanian, O., & Goel, V. (2004). Neuroanatomical correlates of aesthetic preference for paintings. Neuroreport 15(5), 893–897. Vuust, P., & Kringelbach, M. L. (2010). The pleasure of making sense of music. Interdisciplinary Science Reviews 35(2), 166–182. Vuust, P., Ostergaard, L., Pallesen, K. J., Bailey, C., & Roepstorff, A. (2009). Predictive coding of music: Brain responses to rhythmic incongruity. Cortex 45(1), 80–92. Vuust, P., & Witek, M. A. (2014). Rhythmic complexity and predictive coding: A novel approach to modeling rhythm and meter perception in music. Frontiers in Psychology 5, 1111. Retrieved from https://doi.org/10.3389/fpsyg.2014.01111
Wassiliwizky, E., Koelsch, S., Wagner, V., Jacobsen, T., & Menninghaus, W. (2017). The emotional power of poetry: Neural circuitry, psychophysiology and compositional principles. Social Cognitive and Affective Neuroscience 12(8), 1229–1240. Wilkins, R. W., Hodges, D. A., Laurienti, P. J., Steen, M., & Burdette, J. H. (2014). Network science and the effects of music preference on functional brain connectivity: From Beethoven to Eminem. Scientific Reports 4, 6130. doi:10.1038/srep06130 Witek, M. A., Clarke, E. F., Wallentin, M., Kringelbach, M. L., & Vuust, P. (2014). Syncopation, body-movement and pleasure in groove music. PLoS ONE 9, e94446. Witek, M. A., Gilson, M., Clarke, E. F., Wallentin, M., Deco, G., Kringelbach, M. L., & Vuust, P. (forthcoming). The brain dynamics of musical groove: Whole-brain modelling of effective connectivity reveals increased metastability of reward and motor networks. Nature Communication. Witek, M. A., Kringelbach, M. L., & Vuust, P. (2015). Musical rhythm and affect: Comment on “The quartet theory of human emotions: An integrative and neurofunctional model” by S. Koelsch et al. Physics of Life Reviews 13, 92–94. Zatorre, R. J. (2015). Musical pleasure and reward: Mechanisms and dysfunction. Annals of the New York Academy of Sciences 1337, 202–211. Zeki, S. (1999). Inner vision: An exploration of art and the brain. Oxford: Oxford University Press. Zeki, S. (2013). Clive Bell’s “Significant Form” and the neurobiology of aesthetics. Frontiers in Human Neuroscience 7, 730. Retrieved from https://doi.org/10.3389/fnhum.2013.00730 Zeki, S. (2014). Neurobiology and the humanities. Neuron 84(1), 12–14. Zhdanov, A., Nurminen, J., Baess, P., Hirvenkari, L., Jousmaki, V., Makela, J. P., … Parkkonen, L. (2015). An internet-based real-time audiovisual link for dual MEG recordings. PLoS ONE 10, e0128485. Zhou, G., Bourguignon, M., Parkkonen, L., & Hari, R. (2016). Neural signatures of hand kinematics in leaders vs. followers: A dual-MEG study. 
NeuroImage 125, 731–738.
CHAPTER 16
MUSIC AND LANGUAGE
DANIELE SCHÖN AND BENJAMIN MORILLON
INTRODUCTION

While music and language may differ in terms of their structures and functions, they share the distinctive feature of being dynamically organized in time; the information they carry is intrinsically contained in the temporal dimension. A frequently asked question is whether music and language are processed by similar or different brain regions, neural networks, or cortical oscillatory processes, and to what extent the brain circuitry is specialized for them relative to other stimuli. In order to tackle these issues, it is worth keeping some principles in mind. Nikolaas Tinbergen and David Marr described different levels of analysis that must, in their valuable opinion, be taken into account if one wants to understand behavior and complex systems (Marr, 1982; Tinbergen, 1963). Marr’s three levels of analysis (computational, algorithmic, and implementational) are particularly suited to the study of brain functions. Because music and language differ in terms of surface acoustic features and serve different purposes, the computations needed to process them differ. On the other hand, at the implementation level, the same organ and a myriad of cells process both music and language. The key program in modern cognitive neurosciences is thus to tackle the algorithmic level (Poeppel, 2012): Are similar or different algorithms involved in the processing of music and language? And what are they?

In this chapter, we will begin with a historical perspective, where the human brain is described from a phrenological viewpoint. Then, we will describe the common functions and operations in music and language, discuss the methodological limitations in current approaches, and present the resource-sharing hypothesis. We will then describe the interdependency between music and language, notably how musical training improves language skills, before trying to bridge music and language in a single context. We will conclude by describing a promising avenue: studies that adopt a dynamical standpoint to understand music and language.
O
M
M L
From a historical perspective, the comparative study of music and language brain functions dates back to early observations of deficits acquired following a brain lesion. Since then, language and musical disorders have been described with different terms: aphasia and amusia. This distinction comes along with a deeper distinction between the language and musical domains, which by the end of the nineteenth century had been formalized along very different paths, structural for one and historical for the other. Language is analyzed as a formal system of different elements, while music is viewed in a historical perspective as an artistic behavior. Language and music are thus viewed as two highly distinct human domains. In this context, the observation of selective impairment of language or musical abilities fits in very well and also complies with the idea that different functions are implemented in different brain regions. The birth of cognitive sciences is strongly influenced by this vision of language as a specific and uniquely human function with dedicated neural structures and of music as a different human “artistic” function.

At the end of the 1950s Noam Chomsky was convinced that the principles underlying language structure are biologically determined: every individual has the same language potential because it is genetically transmitted, independently of socio-cultural differences. This scientific and political view of language development has had a tremendous impact on the fields of linguistics, cognitive sciences, and neurosciences. It stands in clear contrast with that of another giant of psychology, B. F. Skinner. Skinner considered the mind as
a tabula rasa whereon only experience could add knowledge. The two giants faced each other in an intellectual duel. Chomsky’s most famous attack (1959) is the argument from the poverty of the stimulus: the child exposed to a limited amount of linguistic stimuli is able to generalize to new linguistic constructions using the rules acquired on the initial set. According to Chomsky, the trial-and-error learning mechanisms defended by the behaviorists would not be an appropriate model for language acquisition, since language is acquired by listening to correct sentences. This observation, as well as the fact that a confined brain lesion such as one in Broca’s area may induce a specific language deficit (agrammatism), led to Chomsky’s suggestion that syntactic knowledge may be partly innate. Curiously, Chomsky did not remark that music acquisition follows principles very similar to those of language acquisition: early acquisition, generativity, and learning from correct structures.

Chomsky’s work strongly inspired that of Jerry Fodor, who in the early 1980s wrote The modularity of mind (1983). In this view, the mind (and the brain) would be organized in independent modules with specific functions. Again, Fodor’s view is strongly influenced by, and reinforces, the results of the neuropsychological literature, which was digging deeper and deeper into specific deficits following focal brain lesions. The functioning of the brain seems quite simple: every region has a specific functional role, and a lesion causes a deficit that may be very specific, for instance affecting noun and verb processing independently (Hillis & Caramazza, 1995).

It is within this context that the field of neuropsychology of music develops, beyond previous anecdotal accounts. As in every new field, the desire to gain identity and acknowledgment is strong. Music is thus studied as a special human faculty with dedicated brain areas. This vision is also constrained by some intrinsic limitations of the field of neuropsychology of music. 
First, research on musical skills in brain lesion patients requires a neurologist or neuropsychologist with a musical background. Indeed, while testing language skills may appear a simple task, assessing musical abilities definitely requires special skills, even more so in the era of “pencil and paper.” The second limitation is the Western idea that music is the prerogative of a few people, called musicians, and thus that it only makes sense to assess musical abilities in experienced musicians such as composers, conductors, or performers with musical education (Basso & Capitani, 1985; Luria, Tsevetkova, & Futer, 1965). Altogether this gives access to a limited
amount of data strongly influenced by the modular approach, with musical functions clearly distinct from other human abilities. This is the vision that is well summarized in the article entitled “Modularity of music processing” (Peretz & Coltheart, 2003): several single case studies are used to defend not only the hypothesis of modularity of music and language, but also the modularity of different levels of music processing. However, focusing on a single function, even more so when using a single methodological approach (for instance, brain lesions), will systematically lead toward a modular interpretation of reality. In other words, focusing only on language syntactic processing in Broca’s aphasics will necessarily lead one to conclude that the left inferior frontal operculum is involved (or not) in syntactic processing. This may in turn be interpreted in a modular perspective: syntax is independently and specifically processed in the left frontal operculum.

By contrast, a comparative approach will give a broader and more complex picture. Patel (2003), considering language and musical syntax, claims that, while these may seem very different, there are several commonalities, such as the need to build an integrated flow of information that takes into account a certain number of rules. Here we can clearly see the full power of the comparative approach, which requires us to go beyond a circular definition of cognitive function (e.g., syntax is syntax) in order to compare apparently different functions (e.g., syntax and harmony) that can possibly be redefined in terms of a more elementary function with a greater psychobiological validity. In the case of syntax and harmony, finding common substrates requires one to redefine the object of the study (i.e., some elementary operation common to both). 
In the neuroimaging era, the first two decades were dominated by a modular approach, whereas the last decade has emphasized the importance of the network and its connections. Cognitive neurosciences have also gained access to the functioning of nonpathological brains using highly sophisticated experimental designs. This has allowed a breakdown of both language and music processing into more elementary operations. Although the search for biomarkers has somewhat consolidated the innatist model, several major criticisms have been developed further. For instance, studies on the zebra finch, a species of bird well known for its ability to learn new songs, showed that it is the learning process that alters the neuronal circuits. The maturation of synaptic inhibition onto premotor neurons is correlated with learning but not age
(Vallentin, Kosche, Lipkind, & Long, 2016). This shows that even in a species in which one could think that the rules governing song acquisition are genetically encoded, the environment plays an important role. Of course, putting a zebra finch in a cage with a cat (and assuming that the cat did not eat the bird) would not allow the bird to learn how to meow; that is to say, genes do play a major role. When considering the case of language and music, two extremely refined forms of communication, while the human species specificity is certainly genetically encoded, this does not imply that whatever allows the development of language is specific to language and not shared with music. In other words, if language and music are specific to humans in their capacity to convey an extraordinary amount of information, one should not misinterpret this in terms of different evolutionary or developmental trajectories of language and music.

Psychology of music and neurosciences of music are recent fields of research. The major limitation of new disciplines (and of humans) is their strong desire to build their own identity, which often occurs to the detriment of considering neighboring disciplines (and identities). Our field has also yielded to this temptation by building musical cognitive models that initially ignored other potentially inspiring and similar domains, such as language. We will now see that music shares several cognitive operations with language.
COMMON FUNCTIONS AND OPERATIONS IN MUSIC AND LANGUAGE
Both language and music serve a highly sophisticated communicative function. While we will refrain here from giving a definition of what language is and what music is, it is important to keep in mind that they require a huge number of different perceptual and cognitive operations. To perceive both music and language, one of the first operations that needs to be implemented is the discrimination of sounds. The two phonemes [d] and [t] are quite similar but need to be distinguished, as is the case for a C and a B in music or for the same pitch played by an oboe or a bassoon. Sounds can be characterized in terms of a limited number of spectral features and these features are relevant to both musical and
linguistic sounds. The analysis of the acoustic features of sounds takes place in the cochlea and in several subcortical relays up to the primary auditory cortex. There is a suggestion that the auditory cortex may be asymmetric in the use of temporal windows of analysis, with the left auditory cortex preferring short windows of integration and the right auditory cortex preferring longer windows (Giraud et al., 2007; Poeppel, 2003; Zatorre, Belin, & Penhune, 2002). This hypothesis has been used to defend the idea that language, requiring short windows of analysis to discriminate consonants, is preferentially processed in the left hemisphere, while music, requiring longer windows of analysis to discriminate pitch, is preferentially processed in the right hemisphere. While the debate is still open, one should keep in mind that language perception is not just consonant discrimination, but also requires us to take into account other features, such as pitch in tone or stress, that require longer windows of analysis. On the other hand, music is often considered in our Western society and by non-musicians as mostly relying on pitch discrimination. However, any good musician will claim that an extremely important feature of music is the sound quality, which, unlike pitch, is not stationary and requires short analysis windows. The scenario is thus more complicated than it is often depicted, and the idea of the cortex performing parallel processing on any acoustic input, yielding complementary pieces of information, seems necessary to overcome the simplistic monolithic distinction between language and music.

Generating a different pattern of neuronal responses to every sound would lead, in everyday life, to an infinite number of sound representations. This is why sounds are categorized. Two acoustically different tokens of [b] will thus be perceived as a single [b]. 
Two different high Es of the violin will be perceived as E, even if one is slightly lower than the other; an A and a C note of a piano will be perceived as “piano” sounds. Categorization is necessary and common to both language and music and it allows us to make sense of the world, by reducing its intrinsic variety to a finite and limited number of categories. Categorical representations of sounds are possibly distributed across neuronal populations within the human auditory cortices, including primary auditory areas (Belin, Zatorre, Lafaille, Ahad, & Pike, 2000; Liebenthal, Binder, Spitzer, Possing, & Medler, 2005; Rauschecker & Scott, 2009; Rauschecker & Tian, 2000; Staeren, Renvall, De Martino, Goebel, & Formisano, 2009),
although motor regions also seem to play a role in representing, for instance, phonemic acoustic features (Cheung, Hamilton, Johnson, & Chang, 2016). We rarely perceive sounds in isolation, but rather in a complex flow. This requires us to build a structure that evolves in time, taking into account the different phonemes of a sentence or tones of a melody. Building such a structure requires at the very least a working memory capacity that allows sound representations to be manipulated. Sounds are grouped into larger units and this grouping depends upon our previous experience with these sounds. In other words, we take advantage of our previous experience with the world and build multiple statistical distributions of sounds. Different distributions will account for different grouping strategies: for instance, streaming a specific voice or musical instrument at a cocktail party or in a musical ensemble (Elhilali & Shamma, 2008); grouping phonemes or tones together to build words or melodies according to the transitional probabilities of phonemes or tones (Saffran, Aslin, & Newport, 1996; Saffran, Johnson, Aslin, & Newport, 1999; Schön et al., 2008). These statistical distributions are built on the memory traces of what we have previously perceived and strongly influence our upcoming perception of the world. In fact, following these statistical distributions, several rules may emerge that allow us to simplify even further the complex and continuous auditory flow. Importantly, in both language and music, the distributions can also be computed over symbolic units. These distributions or internal models at different feature levels have two major consequences. The first, cited earlier, is that they allow us to generate new sequences having similar statistical properties, in other words, new sentences or melodies complying with the rules of the musical or linguistic system. The second is that they allow us to make accurate predictions about upcoming events. 
Listening to a person speaking or playing, we are able to anticipate, to a certain degree, what is going to be said or played and when. Considering the very fast and changing nature of the auditory flow, this ability is of utmost importance and it explains why sounds (phonemes or tones) missing from a speech or musical signal can be restored by the brain and appear to be heard (DeWitt & Samuel, 1990). In this respect music is particularly challenging because it may require us to anticipate several distinct streams of features simultaneously. For instance, when listening to a symphony orchestra or a string quartet, several melodic
lines take place at the same time and need to be anticipated in order to perceive a sense of continuity in the music. Overall, language and music are characterized by a limited set of acoustic features, categorized by the human brain into a limited set of representations, and subjected to similar rules of statistical learning.
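The grouping of syllables or tones according to transitional probabilities, mentioned above, can be made concrete with a small sketch. It is only an illustration under stated assumptions: the three-syllable "words", the stream length, and the boundary threshold of 0.75 are hypothetical choices, not the stimuli or the procedure of the cited studies. The same computation would apply to tones in place of syllables.

```python
import random
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate P(next | current) for every adjacent pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # the last item has no successor
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(syllables, tp, threshold):
    """Place a boundary wherever the transitional probability dips below threshold."""
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tp[(prev, nxt)] < threshold:
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words


# Hypothetical lexicon (illustrative only, not published stimuli).
lexicon = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku")]
rng = random.Random(0)
# Continuous stream: 300 randomly chosen words concatenated without pauses.
stream = [syl for _ in range(300) for syl in rng.choice(lexicon)]

tp = transitional_probabilities(stream)
recovered = set(segment(stream, tp, threshold=0.75))
print(recovered)
```

Within a word each syllable is always followed by the same syllable, so the transitional probability is 1; across word boundaries it is close to one third, which is why a simple threshold is enough to recover the words in this toy case.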
O
R
S
Since most research in cognitive neuroscience has been guided by the assumption that brain regions are specialized for a given function, studies on music and language have addressed the question of whether music and language share common neural substrates. This has often been referred to as the notion of overlap (Patel, 2011). The idea is simple. If one could show that there is a strong overlap for music and language processing, this would go against a modular and domain-specific view. However, there are more problems with this approach than one might imagine. We will review them briefly in the following section together with some neuroimaging findings. The first problem is purely methodological. Indeed, many published works using fMRI, including those comparing music and speech processing, use a subtraction logic. Namely, results are a statistical contrast that only allows us to see which areas show a greater signal in one condition compared to another. This is referred to as the tip of the iceberg problem. Indeed, it may well be that by contrasting a language and a music task one finds a peak in a given region. This is then interpreted as a specific area dedicated to language (or to music, depending upon the direction of the subtraction; see for instance Rogalsky & Hickok, 2011). However, this completely ignores the possibility that there is a large common substrate that is invisible when making the subtraction (e.g., 100 and 101 share 100, but 101 − 100 only shows 1). This approach is Manichean and suffers from its lack of quantitative descriptions. These studies have, therefore, a methodological bias toward highlighting differences rather than commonalities. A second series of problems concerns the experimental designs that have been used. Indeed, only a few studies have directly used the same participants and the same experiment to compare music and language processing. Comparing
results across studies will also tend to show differences that may not be due to brain computations but to differences in the populations, acquisition, or analysis pipelines. Even when assessing music and language processing in the same participant, there remains the challenge of comparing comparable conditions. This goes beyond the fact that speech and music stimuli are by nature acoustically different since, if this were the only difference, it should affect only the primary auditory cortex. The real challenge is to define the proper elementary operation and balance the difficulty level of the task across linguistic and musical stimuli. Defining the operation is already quite challenging because it requires a “good” model of what to compare. Of course, comparing music and language does not make any sense, because there is no such thing as a function for music in the brain. Thus, music and language need to be reduced to more elementary functions as described earlier. But even comparing syntactic processing is not trivial. Indeed, one needs to choose which syntactic level to compare in language (syntactic embedding and gender agreement do not imply the same operations) and find an appropriate analogy in music. Then, the researcher is still left with a complicated issue, that of the difficulty level. For instance, in comparing the role of pitch in music and in language prosody, one should ascertain that the difficulty level of the task is comparable across materials rather than using a fixed criterion (e.g., detect a 15 percent pitch change) that may be trivial with music but not with speech (Schön, Magne, & Besson, 2004). Another important issue is raised by Peretz and colleagues: It is important to keep in mind that neural overlap does not necessarily entail neural sharing. The neural circuits established for musicality may be intermingled or adjacent to those used for a similar function in language and yet be neurally separable. 
For example, mirror neurons are interspersed among purely motor-related neurons in pre-motor regions of the macaque cortex (Rizzolatti & Craighero, 2004). Similarly, the neurons responsible for the computation of some musical feature may be interspersed among neurons involved in similar aspects in speech. (Peretz, Vuvan, Lagrois, & Armony, 2015, p. 3)
The problem that is raised here is the scale problem of human anatomy. Historically, there has been a very rough distinction of music and language in terms of hemispheric dominance and this led many people to believe that language is processed by the left hemisphere and music by the right hemisphere. We now clearly know that this is not the case (Lindell, 2006;
Vigneau et al., 2011). Then, there have been more specific claims that the left Broca’s area would be language specific, but this has also been falsified, by showing for instance that musical harmony (Koelsch et al., 2002; Maess, Koelsch, Gunter, & Friederici, 2001) and rhythm processing (Herdener et al., 2012) are mediated by the same regions processing language syntax (Friederici & Kotz, 2003). Further work based on multivariate pattern analysis has shown that within overlapping regions, distinct brain patterns of responses can be found to linguistic and musical sounds (Abrams et al., 2010; Fedorenko, McDermott, Norman-Haignere, & Kanwisher, 2012). However, these differences could be accounted for in terms of differences in the stimuli manipulation or in the task. For instance, Abrams et al. (2010) compared scrambled versions of music and speech to normal music and speech and used a fixed scrambling window of 350 ms. As the authors acknowledge, it could be that music and speech have inherently different acoustical regularities and structures, rendering one material more “scrambled” than the other. Also, different patterns of activation in common brain areas may result from the same neural population reacting differently to music and language (Kunert & Slevc, 2015). The argument raised by Peretz advocates for the possibility of music dedicated neurons, adjacent to language dedicated neurons. While this is of course a non-falsifiable hypothesis for the moment, one should not think of music or language as a whole, but in terms of precisely defined elementary operations. If these operations are required with both language and music material, then there would be no reason for the brain to produce two extremely intermingled networks computing the same algorithm. On the other side it is clear that the rules determining gender agreement or those affecting tonality modulations are necessarily represented in different neural networks. 
Thus, claiming that differences may always subsist at a smaller scale is a recursive argument that does not really add much to the debate (besides the fact that at a quantum level, music and language can be described by the same equations). In our view, the major advances will not come from single unit recordings showing neurons specific to the last chord of a precise Haydn piano sonata, but rather from neurocomputational models precisely describing what particular operations are subtended by a given neural network when listening to speech and to music.
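The tip of the iceberg problem raised earlier in this section, in connection with subtraction logic, can be illustrated with toy numbers. The sketch below is an assumption-laden illustration: the four "voxels" and their amplitudes are hypothetical, and the voxel-wise minimum is only a crude descriptive stand-in for a common substrate, not a statistical test used in the cited studies.

```python
def subtraction_contrast(a, b):
    """Voxel-wise difference, as in a classical A > B contrast."""
    return [x - y for x, y in zip(a, b)]


def shared_response(a, b):
    """Voxel-wise minimum: the response level present in both conditions."""
    return [min(x, y) for x, y in zip(a, b)]


# Hypothetical response amplitudes (arbitrary units) in four voxels.
language = [101, 100, 40, 5]
music = [100, 100, 5, 40]

difference = subtraction_contrast(language, music)
common = shared_response(language, music)

print(difference)  # [1, 0, 35, -35]
print(common)      # [100, 100, 5, 5]
```

In the first two voxels the contrast is 1 and 0 although both conditions drive a response of about 100, so a map of differences alone says nothing about how large the common response is.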
A more promising approach, in our view, is to study whether two different levels of music and language processing interact or not. Indeed, the interaction is a measure of the extent to which two processes influence each other and as such it can be used to infer that one process is not independent of the other. Several studies have tackled this issue by using interference paradigms. For instance, Slevc and colleagues (Slevc, Rosenberg, & Patel, 2009) measured the reading time of garden path sentences and found that it was influenced by the simultaneous presentation of irrelevant harmonically unexpected chords, while it was not affected by timbrally unexpected chords (e.g., a different instrument). These results have been interpreted as evidence for shared music–language resources processing structural (syntactic) relations. Because the task-irrelevant music is processed automatically, it uses some of these resources, resulting in suboptimal processing of the syntactic relations in language. Other studies have used this approach to show an interaction between melodic and syntactic processing (Fedorenko, Patel, Casasanto, Winawer, & Gibson, 2009), harmonic and syntactic processing but not semantic processing (Hoch, Poulin-Charronnat, & Tillmann, 2011), and harmonic processing and word recall (Fiveash & Pammer, 2014). This has also been coupled to electrophysiological measures, confirming that melodic or harmonic unexpected events affect the syntax-related left anterior negativity (Carrus, Pearce, & Bhattacharya, 2013; Koelsch, Gunter, Wittfoth, & Sammler, 2005). Interestingly, Sammler et al. (2013) showed a co-localization of early components elicited by musical and linguistic syntactic deviations using intracranial recordings. Surprisingly, few neuroimaging studies have exploited what is possibly the most natural setting to compare music and language, namely a stimulus that combines both speech and music: song. 
The use of songs has the clear advantage of solving the problem of using different stimuli in the language and musical tasks. Schön et al. (2010) used an interference paradigm based on sung sentences and showed that the processing demands of melodic and lexical/phonological processing interact in a large network including bilateral temporal cortex and left inferior frontal cortex. Importantly, most voxels sensitive to the lexical/phonological manipulation are also sensitive to the interaction between the lexical/phonological and the melodic dimensions. In other words, there seem to be very few voxels that are involved in lexical/phonological processing and are not influenced by melodic structure (see Fig. 1).
FIGURE 1. Number of surviving voxels for the main effect of lexical/phonological dimension as a function of the threshold of the interaction between phonological and melodic dimensions. The dotted vertical line indicates the p-value of 0.05 for the mask. The right edge corresponds to a very conservative p-value (adapted from Schön et al., 2010).
Similarly, Sammler et al. (2010) using an adaptation paradigm, showed a strong integration between melodic and phonological levels in song in the dorsal pathway with a degree of integration decaying toward anterior regions of the left STS, possibly resulting from the processing of meaning of words. This integration of melodic and phonological dimension is also in line with the findings that a sung language is more easily learned than a spoken language (Schön et al., 2008). Kunert and colleagues (Kunert, Willems, Casasanto, Patel, & Hagoort, 2015) showed an effect of musical harmonic deviancy on language syntax processing in the left inferior frontal gyrus. Notably this effect was not present when the deviancy in the musical stimulus was limited to the acoustic level (louder sound). Interestingly the authors also showed, in a behavioral study, an effect of the syntactic structure of sentences on the performance of a musical harmonic judgment task, confirming the idea of shared resources.
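The logic of the interference paradigms discussed above rests on an interaction in a two-by-two design: the cost of a linguistic manipulation is compared with and without a musical manipulation. The sketch below shows this contrast on hypothetical mean reading times in milliseconds; the numbers and the factor coding are assumptions for illustration and are not taken from any of the cited experiments.

```python
def interaction_contrast(cell_means):
    """Difference of differences in a 2 x 2 design.

    Keys are (language_difficult, chord_unexpected), each coded 0 or 1.
    Returns the language effect with an unexpected chord minus the
    language effect with an expected chord.
    """
    effect_unexpected = cell_means[(1, 1)] - cell_means[(0, 1)]
    effect_expected = cell_means[(1, 0)] - cell_means[(0, 0)]
    return effect_unexpected - effect_expected


# Hypothetical pattern in which the two manipulations interact.
interacting = {(0, 0): 400, (1, 0): 450, (0, 1): 405, (1, 1): 500}
# Hypothetical pattern in which the two costs simply add up.
additive = {(0, 0): 400, (1, 0): 450, (0, 1): 420, (1, 1): 470}

print(interaction_contrast(interacting))  # 45
print(interaction_contrast(additive))     # 0
```

A contrast of zero corresponds to two independent costs that add up, whereas a non-zero contrast indicates that the cost of one manipulation depends on the other, which is the pattern taken as evidence that the two processes draw on common resources.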
One may wonder how to reconcile these data suggesting shared resources with the “older” data from neuropsychological studies, which point rather to a specificity and independence of several levels of language and music processing. However, very few studies have tried to systematically assess the co-existence of language and musical deficits, even for the most studied language deficit following a lesion in Broca’s area. Ani Patel was the first to investigate brain-damaged individuals, and more specifically aphasic individuals with grammatical comprehension problems in language, in order to see whether they also have a deficit in processing structural musical relations (Patel, Iversen, Wassenaar, & Hagoort, 2008). Broca’s aphasic patients and controls had to judge whether or not a set of sentences contained a grammatical or semantic error. A similar task was used with harmonic errors introduced into musical chord sequences. In a second experiment participants were tested using an implicit harmonic priming procedure. Both experiments showed that the aphasic patients have impaired musical syntactic processing. Importantly, this took place in the absence of low-level deficits, and with a preserved short-term memory for pitch patterns. This scenario is complicated by the fact that not all agrammatic patients necessarily show a musical deficit (Slevc, Faroqi-Shah, Saxena, & Okada, 2016). Along similar lines, Sammler and colleagues (Sammler, Koelsch, & Friederici, 2011) showed a reduction or extinction of the typical electrophysiological marker of musical syntax processing in agrammatic patients with a lesion in the left inferior frontal cortex. These results are consistent with the hypothesis that Broca’s area computes a rather domain-general “syntactic” processing, but a huge amount of work still remains to be done with brain-lesioned patients.
M
T
L
S
We have seen that the approach of studying music and language brain correlates is limited by a number of methodological problems that render the interpretation of the results in terms of sharing or not of the resources rather complex. Another way to address the sharing resources hypothesis is to investigate whether music training affects the way the brain processes language, and vice versa. The reasoning is the following. Musical expertise
requires an intense training often starting at an early age. As a result of learning, all the operations required by music perception and production will be affected by this training and become more efficient. If some of these operations are also required by language perception and production, then one should be able to observe a more efficient processing whenever the appropriate language processing levels are investigated. By contrast with the approach described above, the validation of this hypothesis does not necessarily require brain imaging data, insofar as behavioral differences can be taken as evidence that resource sharing exists. Psychologists and some neuroscientists often use the term “transfer of learning.” This term is, however, rather vague as it seems to point to some sort of magic transfer of learning from one domain to another or from one function to another function without specifying how this transfer would actually take place. However, an alternative explanation is to hypothesize that these so-called transfer effects are simply due to an elementary function that is shared by both music and language processing. According to this view there is no transfer taking place, but only sharing of functions and resources. Importantly, while there is no clear way of showing how transfer could be possibly implemented, shared elementary operations can be defined via careful experimental manipulations. Considering the early steps of sound analysis helps to clarify this point. The group of Nina Kraus has studied for many years the effect of music training on sound perception in general, including speech. 
Using EEG and focusing on high-frequency (>200 Hz) neural responses, possibly occurring principally at the subcortical level, this group of researchers has shown that, compared to non-musicians, musicians have a stronger representation of several features of speech sounds, including the fundamental frequency (Wong, Skoe, Russo, Dees, & Kraus, 2007), the harmonics (Kraus & Chandrasekaran, 2010), and rapid transients that may be important in distinguishing consonants (Parbery-Clark, Tierney, Strait, & Kraus, 2012). Overall, the correlation between the neural response and the stimulus is greater in musicians than in non-musicians, independently of whether the stimulus is a music or a speech sound (Musacchia, Sams, Skoe, & Kraus, 2007). Most importantly, this correlation is more resistant to acoustic noise in musicians. In other words, musicians seem to be able to filter out the noise better than non-musicians (Parbery-Clark, Skoe, & Kraus, 2009). Interestingly, some of these differences can be
observed in adults that had a few years of music training during childhood, thus showing that these changes last in time and do not necessarily require a long-lasting and intense training (Skoe & Kraus, 2012). Moreover, these differences induced by music training are not simply due to a better processing of any sound feature. Indeed, results of a recent experiment show that music training can facilitate the selective processing of certain relevant features of speech. In this study, Intartaglia and colleagues (Intartaglia, White-Schwoch, Kraus, & Schön, 2017) compared French and American participants listening to an American phoneme, not existing in French. The comparison of the neural signatures showed that American participants had a more robust representation compared to French participants. The differences concerned the high formant frequencies that are necessary to encode the specific features of consonants and vowels. They then tested French musicians and the differences with the Americans disappeared. In other words, music training seems to allow a better encoding of the relevant features of speech sounds, even when these sounds are not familiar. When interpreting these overall results one should keep in mind that two possible non-exclusive explanations co-exist. First, the subcortical relays may be more efficient in sound processing due to massive bottom-up processing. In this case one can clearly see that there is no need to advocate for a transfer effect. There is a dedicated auditory subcortical network that processes both musical and linguistic sounds. If this network becomes more efficient via intensive musical training, then speech processing will also benefit from the enhanced efficiency. 
Second, the cortical regions are known to send efferent signals to the subcortical relays and these modulatory top-down signals may play a role in enhancing the representation of certain features of sounds or in reducing the noise (Strait, Kraus, Parbery-Clark, & Ashley, 2010; Tenenbaum, Kemp, Griffiths, & Goodman, 2011). In this perspective, the changes are possibly due to an enhanced connectivity that allows a finer modulatory activity of cortical over subcortical activity. Independently of whether these enhanced subcortical representations reflect a bottom-up or a top-down modulation, these results are important in interpreting the differences that may be observed at a more integrated level. Indeed, differences observed at a phonological, syntactic, or prosodic level may result from a cascade effect of early auditory processing differences.
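The stimulus-to-response correlation mentioned above, and its sensitivity to noise, can be sketched on simulated signals. Everything in this example is an assumption made for illustration: the 100 Hz sinusoidal "stimulus", the additive Gaussian noise model, and the two noise levels are hypothetical and do not reproduce the recordings or analysis pipeline of the cited studies (which, among other things, would need to account for the neural transmission delay).

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def simulated_response(stimulus, noise_sd, rng):
    """Toy response: the stimulus waveform plus Gaussian noise."""
    return [s + rng.gauss(0.0, noise_sd) for s in stimulus]


rng = random.Random(1)
fs = 2000  # sampling rate in Hz (hypothetical)
# 200 ms of a 100 Hz sinusoid standing in for a periodic speech sound.
stimulus = [math.sin(2 * math.pi * 100 * t / fs) for t in range(fs // 5)]

faithful = simulated_response(stimulus, 0.5, rng)  # low-noise encoding
degraded = simulated_response(stimulus, 2.0, rng)  # high-noise encoding

r_faithful = pearson(stimulus, faithful)
r_degraded = pearson(stimulus, degraded)
print(r_faithful > r_degraded)
```

In this toy setting a lower correlation simply reflects more noise in the simulated response; the sketch says nothing about whether such a difference arises from bottom-up or top-down mechanisms, which is the question left open in the text.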
The studies on prosody and phoneme perception in musicians are particularly sensitive to this issue. Indeed, pitch is important in speech at the supra-segmental level, by signaling the emotional content of an utterance (Kotz et al., 2003), the linguistic structure (Steinhauer, Alter, & Friederici, 1999), and certain syntactic features such as to determine whether the utterance is a question or not (Astésano, Besson, & Alter, 2004). Pitch contour also plays a role at the segmental level in tone languages: it plays a linguistically contrastive function. Musicians are more accurate in detecting subtle pitch variations in both music and speech prosody. These variations in speech prosody are detected earlier by musicians’ brains and elicit more distinguishable event-related potentials compared to normal speech (Schön et al., 2004). This has been replicated with 8-year-old musician children (Magne, Schön, & Besson, 2006). Music lessons also seem to promote sensitivity to emotions conveyed by speech prosody. Indeed, musically trained adults perform better than untrained adults in discrimination and identification of emotional prosody (Thompson, Schellenberg, & Husain, 2004). Finally, musicians are more accurate at identifying, reproducing, or discriminating Mandarin tones (Gottfried & Riester, 2000; Gottfried, Staby, & Ziemer, 2004; Marie, Delogu, Lampis, Belardinelli, & Besson, 2011). However, as previously stated, it is difficult to know to what extent these differences are due to cortical or subcortical plasticity. Considering that anatomical differences have been observed at the cortical level in the auditory cortex (Benner et al., 2017; Kleber et al., 2016; Schlaug, Jäncke, Huang, Staiger, & Steinmetz, 1995; Shahin, Bosnyak, Trainor, & Roberts, 2003), it seems reasonable to believe that the whole auditory network is modified by music training, thus affecting speech processing at multiple levels. 
Interestingly, previous studies provided evidence for a positive relationship between the function or the anatomy of the planum temporale and performance during syllable categorization (Elmer, Hänggi, Meyer, & Jäncke, 2013). Recently, Elmer and colleagues (Elmer, Hänggi, & Jäncke, 2016) provided evidence for a relationship between planum temporale connectivity, musicianship, and phonetic categorization. They found an increased connectivity between the left and right plana temporalia in musicians compared to non-musicians. This increased connectivity positively correlated with the performance in a phonetic categorization task as well as with musical aptitudes. Indeed, music training seems to affect the
sensitivity to some acoustic features that are important for the categorization of syllables, in particular temporal features such as voice-onset time (Chobert, Marie, François, Schön, & Besson, 2011; Zuk et al., 2013). Very few studies have examined whether musical expertise influences the processing of the temporal structure of speech. While isochrony is absent in speech, several nested temporal hierarchies are present (Cummins & Port, 1998; Ghitza, 2011; Giraud & Poeppel, 2012). Musicians outperform non-musicians when asked to judge the lengthening of a syllable in a sentence (Marie, Magne, & Besson, 2011). Also, independently of whether musicians direct attention to the temporal or semantic content, they are more sensitive to subtle changes in the temporal structure of speech than non-musicians (Magne et al., 2006; Marie, Delogu, et al., 2011). Milovanov et al. (2009) reported a positive correlation between musical aptitudes and sensitivity to syllable discrimination in children. In artificial language learning, speech segmentation results from the capacity to parse a continuous stream of syllables and to build and maintain probabilistic relationships among the different elements (syllables) that compose words. François and Schön (2011) showed that musicians have improved segmentation skills compared to non-musicians. Indeed, when listening to a new stream of an artificial language, they are faster and more accurate at segmenting the continuous stream. Children already show an improvement in speech segmentation after only one year of music training (François, Chobert, Besson, & Schön, 2012). This ability, namely discovering word boundaries in the continuous stream of natural speech, is of utmost importance during language learning in the first years of life (Saffran et al., 1996). The evidence concerning an effect of music training on language semantic and syntactic levels is rather scarce. 
One study showed that music training seems to influence semantic aspects of language processing (Dittinger et al., 2016). However, in this study, French participants had to learn new words that were in Thai language. Thus, differences may be due to the difficulty of the task at the perceptual level in terms of discriminating Thai tokens that differed in terms of pitch or vowel length. At the neural level, results indicate an increased functional connectivity in the ventral and dorsal streams of the left hemisphere during retrieval of novel words in musicians compared to non-musicians (Dittinger, Valizadeh, Jäncke, Besson, & Elmer, 2018). An effect of musical expertise on syntactic
processing was shown by Jentschke and Koelsch (2009) with earlier and larger evoked responses to syntactic errors in children with musical training. However, others described that differences are absent at the behavioral level and that musical expertise does not modulate the amplitude of responses evoked by syntactic violations but only the topographical distribution (Fitzroy & Sanders, 2013). Thus, the evidence that music training affects language semantic and syntactic processing is not yet compelling and further studies are awaited. Overall, while the theoretical framework of transfer of learning remains uncertain, there is a rather massive amount of data pointing to an improvement induced by music training at different levels of speech and language processing. Patel (2014) has tried to formalize the conditions under which music training may be beneficial to speech processing. In the OPERA hypothesis (Overlap, Precision, Emotion, Repetition, and Attention) he suggests that, in order for music training to enhance speech processing, music and speech need to share sensory or cognitive processing mechanisms and music must place higher demands on these mechanisms compared to speech. These mechanisms are tightly bound to the music emotional rewards system (Salimpoor et al., 2013). The last ingredients of music-induced and speech-related neural plasticity would be the fact that music training requires a repetition of sound patterns and gestures for an enormous amount of time under conditions of highly focused attention.
B
M
L
When considering the effects of music training on speech and language abilities, one should keep in mind that most of the studies described here compared adult professional musicians to a group of adult non-musicians. This comparison has two methodological weaknesses. The first concerns the possibility of pre-existing differences, namely musicians already differed from non-musicians before starting to make music. The second is that music training is a complex activity, often involving individual lessons, group activities, theory classes, and so on. This makes it impossible to know what factors in music training had an impact on speech and language abilities.
Both criticisms can be addressed by running longitudinal studies assessing the absence of differences before the beginning of music training (Chobert, François, Velay, & Besson, 2012; François et al., 2012), and comparing the music-training group with a control group involved in an activity with a similar setting (e.g., visual arts, theater). However, this approach is time-consuming and costly, insofar as it requires following two groups of children for a long period of time (often one year), testing them at least twice, and coordinating the two training programs. There is an alternative methodological approach that lies somewhere between the interference or interaction approach and the group comparison described earlier. The idea is to test the effect of music stimulation on speech perception. This has proven particularly successful in the temporal domain. Indeed, speech and music have a similar hierarchical temporal scaffolding (Haegens & Golumbic, 2018; Schön & Tillmann, 2015). A series of studies has shown that priming the temporal structure of speech using a musical rhythmic prime can induce a speech processing benefit (Cason, Astésano, & Schön, 2015; Cason & Schön, 2012; Chern, Tillmann, Vaughan, & Gordon, 2018; Przybylski et al., 2013). These studies showed a benefit of rhythmic priming both in phoneme detection and in a grammaticality judgment task. This approach has been particularly effective with language-impaired populations. For instance, passive listening to a regular rhythmic prime improved performance in a grammaticality judgment task in children with dyslexia or specific language impairment (SLI; Bedoin, Brisseau, Molinier, Roch, & Tillmann, 2016; Przybylski et al., 2013) and in patients with a basal ganglia lesion (Kotz, Gunter, & Wonneberger, 2005). 
While these results support the importance of temporal predictions, the ability to anticipate in time upcoming events, for language processing, it is not clear whether the benefit at the grammatical level is mediated by a selective effect at the syntactic level or by improved speech perception. For instance Cason and colleagues (Cason, Hidalgo, Isoard, Roman, & Schön, 2015) have shown that priming the temporal structure of a sentence with music improved phoneme perception in hearing-impaired children. Most of these studies have used a passive listening approach. However, an active approach, requiring the intervention of the audio-motor network seems to have a stronger effect than passive listening (Cason, Astésano, & Schön, 2015; Morillon & Baillet, 2017; Morillon, Schroeder, & Wyart,
2014). An interesting avenue for the future is to test the effect of a single session of music training on several levels of speech and language processing. This seems to us a good compromise between all the abovementioned approaches, insofar as it avoids the criticism of pre-existing differences and allows strict control of the content of the training session without “reducing” music to passive listening to an isochronous metronome. Recently, Hidalgo and colleagues (Hidalgo, Falk, & Schön, 2017) used this type of approach to investigate temporal adaptation in speech interaction in hearing-impaired children. They showed that a 30-minute session of active rhythmic training facilitated access to the temporal structure of verbal interactions and improved performance in a simple turn-taking task. One of the factors prompting research in the domain of music and language is the possibility of using music to remediate language impairment. Thus, fundamental research supports the therapeutic approach of using music to recover impaired functions by defining which aspects of music training benefit language processing and at which levels of processing. While it is not the aim of this chapter to review this literature (see Chapter 29 by Lee, Thaut, and Santoni, this volume), it is important to note that the underlying neuroscientific models supporting the use of music in language rehabilitation have changed. For instance, the development of melodic intonation therapy to recover language function in non-fluent aphasic patients was driven in part by the idea that patients can learn a new way to speak through singing by using the right hemisphere (Albert, Sparks, & Helm, 1973; Zumbansen, Peretz, & Hébert, 2014). Forty years later, our knowledge of the spatiotemporal dynamics underlying music and language on the one hand, and of the pathophysiology of language disorders on the other, has been refined.
Stahl and colleagues (Stahl, Kotz, Henseler, Turner, & Geyer, 2011) have shown that, in non-fluent aphasia, rhythm rather than melody may be the most relevant aspect of the musical intervention, especially when patients present a lesion of the basal ganglia, subcortical structures involved in motor coordination and the processing of temporal information (Kotz & Schwartze, 2010). Interestingly, several recent works on the use of music for language rehabilitation point to an important role of the rhythmic aspect of music. More precisely, musical training targeted toward improving rhythmic perception and production resulted in improved phonological and reading
skills (Bhide, Power, & Goswami, 2013; Cogo-Moreira, de Avila, Ploubidis, & de Jesus Mari, 2013; Flaugnacco et al., 2015; Moore, Branigan, & Overy, 2017; Overy, 2000). These results suggest a shared substrate and point to temporal processing as playing a major role in language processing. This fits with the temporal sampling framework proposed by Goswami (2011) for dyslexia and by extension for SLI. Building on the neural resonance theory positing internal oscillators guiding attention over time (Large & Jones, 1999), Goswami suggests that deficits in syllabic segmentation and other sequential processes may result from impaired rhythmic entrainment leading to difficulties in sampling information over time. Along a similar line, Tierney and Kraus (2014) proposed the precise auditory timing hypothesis (PATH) that suggests that neural entrainment in auditory and motor cortex, and the interaction between them, underlies many of the behavioral aspects of both language and music processing. We will now describe music and language with a temporal focus.
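The notion of an internal oscillator that entrains to a rhythmic input can be made concrete with a toy example. The sketch below is a generic error-correcting oscillator, assuming simple linear phase and period correction. It is not the model of Large and Jones (1999), nor an implementation of the temporal sampling framework or of PATH. The function name `entrain` and the gain values are hypothetical choices made for illustration only.

```python
def entrain(onsets, initial_period, phase_gain=0.5, period_gain=0.2):
    """Track event onsets with a simple error-correcting oscillator (illustrative only).

    Returns the list of prediction errors (seconds) and the final period estimate.
    The gains are hypothetical values, not parameters taken from the literature.
    """
    period = initial_period
    expected = onsets[0] + period  # first prediction, made from the first onset
    errors = []
    for onset in onsets[1:]:
        error = onset - expected  # positive when the event arrives later than predicted
        errors.append(error)
        period += period_gain * error  # slow adaptation of the internal period
        expected += phase_gain * error + period  # phase correction, then step one period
    return errors, period


# Isochronous sequence at 2 Hz (one event every 0.5 s); the oscillator starts too slow.
onsets = [0.5 * i for i in range(41)]
errors, period = entrain(onsets, initial_period=0.6)
print(f"first error: {errors[0]:+.3f} s, last error: {errors[-1]:+.6f} s")
print(f"final period estimate: {period:.4f} s")
```

With these settings the prediction error shrinks over successive events and the period estimate approaches 0.5 s. Using smaller gains (for example 0.1 and 0.05) makes the errors shrink more slowly, which offers only a loose analogy to weaker entrainment and should not be read as a model of any clinical population.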
A Temporal Focus for Language and Music
As of today, the most promising approach to understanding information processing seems to us to be a dynamical one that focuses on the temporal dimension. This perspective can be operationalized in complementary ways, ranging from characterizing temporal regularities within sensory inputs to investigating the time-resolved patterns of neural activity implicated in sensory processing, in terms of both frequency-resolved neural oscillations and neural network dynamics. The underlying motivation is to describe information processing at the algorithmic (or representational) level, as first proposed by David Marr (Marr, 1982; Poeppel, 2012): in other words, to understand how the system does what it does, and more precisely what representations it uses, how they emerge, and how they are manipulated. Describing the time constants or the temporal profile of activity of each of these neural algorithms constitutes a preliminary stage toward this ultimate goal. While this approach can be carried out separately for music and language, a direct comparison of the
two is also useful to distinguish general processing steps from more specific ones. In the speech domain, David Poeppel has theorized this approach in the “asymmetric sampling in time” hypothesis (Giraud & Poeppel, 2012; Poeppel, 2003). Basically, speech can be described as a multi-timescale signal, with a hierarchical organization composed of phonemic, syllabic, and prosodic information (among others). At the neural level, both parallel and sequential processing occur, with gamma (~30 Hz), theta (~5 Hz), and delta (~2 Hz) oscillations being specifically engaged by these multi-timescale, quasi-rhythmic properties of speech, and tracking its dynamics. Giraud and Poeppel argue that such neural oscillations “are foundational in speech and language processing, ‘packaging’ incoming information into units of the appropriate temporal granularity” (Giraud & Poeppel, 2012, p. 511). Interestingly, music is also characterized by a multi-timescale structure, with rhythm and meter hierarchically organized (Vuust & Witek, 2014). However, in an acoustic characterization of the temporal modulations in music and speech, Ding and colleagues (2017) recently highlighted that their temporal modulation rates differ. While the main tempo of music is around 2 Hz (120 bpm), speech is primarily characterized by a temporal modulation around 5 Hz, which corresponds to the syllabic rate. At least two complementary avenues can be drawn from this result. First, the distinction between music and speech modulation properties could be at the origin of some of their computational differences. In a fascinating paradigm, Oded Ghitza showed that the intelligibility of time-compressed speech can be greatly enhanced if periods of silence of the appropriate duration are inserted (Ghitza, 2011; Ghitza & Greenberg, 2009).
Oscillation-based models of speech perception best explain these data: optimum intelligibility is achieved when the syllable rhythm falls within the range of theta-frequency brain rhythms (~4–10 Hz), comparable to the rate at which segments and syllables are articulated in conversational speech. Follow-up experiments were performed in the music domain, where participants had to identify the musical key of time-compressed short melodic sequences (Farbood, Marcus, & Poeppel, 2013; Farbood, Rowland, Marcus, Ghitza, & Poeppel, 2015). These experiments showed that the insertion of silent gaps was beneficial to performance, in line with the speech experiments, providing compelling clues about possible oscillatory mechanisms underlying the segmentation of auditory information. However,
the two experiments in the music domain were not conclusive with regard to the preferred rate of processing, observed at 2–3 Hz or 5–7 Hz, respectively. While the former result would be compatible with the fact that the main tempo of music is around 2 Hz, suggesting that the distinct acoustic modulation properties of music and speech shape their respective perceptual analysis, the latter would be compatible with the idea that the auditory cortex parses information at the theta rate, and that such sampling operates rather independently of the nature of the acoustic signal (music or speech). Second, the most important characteristic shared by music and language acoustic signals is that both have strong temporal constraints (i.e., a salient main modulation rate, at ~2 and ~5 Hz, respectively), leading to strong temporal predictions. Temporal predictions are believed to play a fundamental role in the way we sample sensory information, in particular in the auditory domain (Jones, 1976; Nobre & van Ede, 2018; Schroeder & Lakatos, 2009). Behavioral experiments show that anticipating the moment of occurrence of an upcoming event optimizes its processing by improving the quality of sensory information (Jaramillo & Zador, 2011; Morillon, Schroeder, Wyart, & Arnal, 2016; Rohenkohl, Cravo, Wyart, & Nobre, 2012). Current theories and empirical findings suggest that this enhancement is achieved by the entrainment of low-frequency neuronal oscillations, which temporally modulates the excitability of task-relevant neuronal populations (Cravo, Rohenkohl, Wyart, & Nobre, 2013; Large & Jones, 1999; Schroeder & Lakatos, 2009). Such entrainment, principally observed in sensory cortices (Besle et al., 2011; Lakatos et al., 2013), would be possible thanks to the downward propagation of temporal prediction signals, recently shown to originate in the motor system (Morillon & Baillet, 2017).
These signals would be responsible for the predictive alignment of the neuronal excitability phase of ongoing oscillations in sensory cortex with upcoming events, possibly through top-down phase reset (e.g., Park, Ince, Schyns, Thut, & Gross, 2015; Stefanics et al., 2010). A recent proposition by Arnal and colleagues (Rimmele, Morillon, Poeppel, & Arnal, submitted) is that time estimation relies on the neural recycling of action circuits (Coull, 2011) and is implemented by internal, non-conscious “simulation” of movements in most ecological situations (Arnal, 2012; Arnal & Giraud, 2012; Schubotz, 2007). On this view, temporal predictions correspond to a covert form of active sensing
(Morillon, Hackett, Kajikawa, & Schroeder, 2015; Schroeder, Wilson, Radman, Scharfman, & Lakatos, 2010). In other words, the efferent motor signals that are generated when synchronizing our actions to predictable events are also generated during the passive perception of such regularities (Arnal, 2012; Patel & Iversen, 2014). When temporal regularities occur at the timescale of natural actions or movements, the motor system is recruited (Chen, Penhune, & Zatorre, 2008; Du & Zatorre, 2017; Grahn & Rowe, 2012; Merchant, Grahn, Trainor, Rohrmeier, & Fitch, 2015; Teki, Grube, Kumar, & Griffiths, 2011; Zatorre, Chen, & Penhune, 2007). The great richness of the repertoire of motor schemes (gestures) makes it possible to simulate (and predict) the occurrence of sensory events with great accuracy and to process them with greater precision (Morillon et al., 2016; Schubotz, 2007), offering a flexible tool to predict precisely “when” events will occur and to select relevant information in time. Given the refinement of our motor expertise and the complexity of our repertoire of actions, this means that we can use internal simulation of action to anticipate temporal trajectories. This conception is compatible with various forms of “motor theories” of speech perception (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967), in which the covert simulation of actions can lead to a given sensory configuration. While temporal predictions play a critical role in both music and language, this role differs in multiple ways. First, music is much more rhythmic than speech, hence predictions are more precise. Second, while temporal predictions have primarily a contextual role in language, helping to optimize the extraction of relevant information, they serve a much more fundamental purpose in music. Indeed, musical rhythm has a remarkable capacity to move our minds and bodies. This is because it is part of the information content in itself, rather than being a contextual cue (as in language).
In a compelling review article, Vuust and Witek (2014) hypothesize that music exploits general principles of brain functioning, notably its organization as a Bayesian, predictive system, to optimize our pleasure and desire to move. In any case, these distinctions highlight that music stimulates the dorsal auditory stream much more than language does, as this pathway is involved in audio-motor transformation (Hickok & Poeppel, 2007) and temporal information processing (Morillon & Baillet, 2017). As a consequence, musical training or musical stimulation strengthens the connectivity between auditory and motor cortices, which has
beneficial effects for speech comprehension (Falk, Lanzilotti, & Schön, 2017), especially in noisy conditions (Du & Zatorre, 2017), and for phonological and reading skills in children (Flaugnacco et al., 2015), as described earlier. Overall, while music and language differ in both structure and function, they share the property of being temporal in essence. Adopting a dynamical approach thus seems the most promising avenue for understanding how the human brain interacts with this type of multisensory environment.
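The modulation rates discussed in this section (about 2 Hz for music, about 5 Hz for speech) can be illustrated with a minimal sketch that finds the dominant rate of an amplitude envelope from its spectrum. The envelopes below are synthetic cosines rather than real recordings, and the function names (`bpm_to_hz`, `toy_envelope`, `dominant_modulation_rate`) and the 0.5–32 Hz search band are assumptions made for this example. This is not the analysis pipeline of Ding and colleagues (2017).

```python
import numpy as np


def bpm_to_hz(bpm):
    """Convert a tempo in beats per minute to a rate in Hz (beats per second)."""
    return bpm / 60.0


def toy_envelope(rate_hz, duration_s=20.0, fs=200.0):
    """Synthetic amplitude envelope fluctuating at rate_hz (a stand-in for real audio)."""
    t = np.arange(int(duration_s * fs)) / fs
    return 0.5 * (1.0 + np.cos(2.0 * np.pi * rate_hz * t))


def dominant_modulation_rate(envelope, fs, fmin=0.5, fmax=32.0):
    """Return the frequency (Hz) of the largest spectral peak of the envelope."""
    env = np.asarray(envelope, dtype=float)
    env = env - env.mean()  # remove the constant offset so 0 Hz does not dominate
    spectrum = np.abs(np.fft.rfft(env * np.hanning(env.size)))
    freqs = np.fft.rfftfreq(env.size, d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)  # assumed search band
    return float(freqs[band][np.argmax(spectrum[band])])


fs = 200.0
music_like = toy_envelope(bpm_to_hz(120), fs=fs)  # 120 bpm corresponds to 2 Hz
speech_like = toy_envelope(5.0, fs=fs)            # roughly the syllabic rate
print("music-like peak (Hz):", dominant_modulation_rate(music_like, fs))
print("speech-like peak (Hz):", dominant_modulation_rate(speech_like, fs))
```

With real signals, the envelope would first have to be extracted from the audio (for example by rectification and low-pass filtering), and the resulting spectrum would be broad rather than a single sharp peak.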
References
Abrams, D. A., Bhatara, A., Ryali, S., Balaban, E., Levitin, D. J., & Menon, V. (2010). Decoding temporal structure in music and speech relies on shared brain resources but elicits different fine-scale spatial patterns. Cerebral Cortex 21(7), 1507–1518. Albert, M. L., Sparks, R. W., & Helm, N. A. (1973). Melodic intonation therapy for aphasia. Archives of Neurology 29, 130–131. Arnal, L. (2012). Predicting “when” using the motor system’s beta-band oscillations. Frontiers in Human Neuroscience 6, 225. Retrieved from https://doi.org/10.3389/fnhum.2012.00225 Arnal, L., & Giraud, A. L. (2012). Cortical oscillations and sensory predictions. Trends in Cognitive Sciences 16(7), 390–398. Astésano, C., Besson, M., & Alter, K. (2004). Brain potentials during semantic and prosodic processing in French. Cognitive Brain Research 18(2), 172–184. Basso, A., & Capitani, E. (1985). Spared musical abilities in a conductor with global aphasia and ideomotor apraxia. Journal of Neurology, Neurosurgery & Psychiatry 48(5), 407–412. Bedoin, N., Brisseau, L., Molinier, P., Roch, D., & Tillmann, B. (2016). Temporally regular musical primes facilitate subsequent syntax processing in children with specific language impairment. Frontiers in Neuroscience 10. Retrieved from https://doi.org/10.3389/fnins.2016.00245 Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature 403(6767), 309–312. Benner, J., Wengenroth, M., Reinhardt, J., Stippich, C., Schneider, P., & Blatow, M. (2017). Prevalence and function of Heschl’s gyrus morphotypes in musicians. Brain Structure and Function 222(8), 1–17. Besle, J., Schevon, C. A., Mehta, A. D., Lakatos, P., Goodman, R. R., McKhann, G. M., … Schroeder, C. E. (2011). Tuning of the human neocortex to the temporal dynamics of attended events. Journal of Neuroscience 31(9), 3176–3185. Bhide, A., Power, A., & Goswami, U. (2013). 
A rhythmic musical intervention for poor readers: A comparison of efficacy with a letter-based intervention. Mind, Brain, and Education 7(2), 113–123. Carrus, E., Pearce, M. T., & Bhattacharya, J. (2013). Melodic pitch expectation interacts with neural responses to syntactic but not semantic violations. Cortex 49(8), 2186–2200. Cason, N., Astésano, C., & Schön, D. (2015). Bridging music and speech rhythm: Rhythmic priming and audio-motor training affect speech perception. Acta Psychologica 155, 43–50.
Cason, N., Hidalgo, C., Isoard, F., Roman, S., & Schön, D. (2015). Rhythmic priming enhances speech production abilities: Evidence from prelingually deaf children. Neuropsychology 29(1), 102. Cason, N., & Schön, D. (2012). Rhythmic priming enhances the phonological processing of speech. Neuropsychologia 50(11), 2652–2658. Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008). Listening to musical rhythms recruits motor regions of the brain. Cerebral Cortex 18(12), 2844–2854. Chern, A., Tillmann, B., Vaughan, C., & Gordon, R. L. (2018). New evidence of a rhythmic priming effect that enhances grammaticality judgments in children. Journal of Experimental Child Psychology 173, 371–379. Cheung, C., Hamilton, L. S., Johnson, K., & Chang, E. F. (2016). The auditory representation of speech sounds in human motor cortex. eLife 5, e12577. Chobert, J., François, C., Velay, J. L., & Besson, M. (2012). Twelve months of active musical training in 8- to 10-year-old children enhances the preattentive processing of syllabic duration and voice onset time. Cerebral Cortex 24(4), 956–967. Chobert, J., Marie, C., François, C., Schön, D., & Besson, M. (2011). Enhanced passive and active processing of syllables in musician children. Journal of Cognitive Neuroscience 23(12), 3874–3887. Chomsky, N. (1959). A review of B. F. Skinner’s Verbal Behavior. Language 35(1), 26–58. Cogo-Moreira, H., de Avila, C. R. B., Ploubidis, G. B., & de Jesus Mari, J. (2013). Effectiveness of music education for the improvement of reading skills and academic achievement in young poor readers: A pragmatic cluster-randomized, controlled clinical trial. PLoS ONE 8(3), e59984. Coull, J. T. (2011). Discrete neuroanatomical substrates for generating and updating temporal expectations. In S. Dehaene & E. Brannon (Eds.), Space, time and number in the brain: Searching for the foundations of mathematical thought (pp. 87–101). Amsterdam: Elsevier. Cravo, A. M., Rohenkohl, G., Wyart, V., & Nobre, A. C. (2013). 
Temporal expectation enhances contrast sensitivity by phase entrainment of low-frequency oscillations in visual cortex. Journal of Neuroscience 33(9), 4002–4010. Cummins, F., & Port, R. (1998). Rhythmic constraints on stress timing in English. Journal of Phonetics 26(2), 145–171. DeWitt, L. A., & Samuel, A. G. (1990). The role of knowledge-based expectations in music perception: Evidence from musical restoration. Journal of Experimental Psychology: General 119(2), 123–144. Ding, N., Patel, A. D., Chen, L., Butler, H., Luo, C., & Poeppel, D. (2017). Temporal modulations in speech and music. Neuroscience & Biobehavioral Reviews 81(B), 181–187. Dittinger, E., Barbaroux, M., D’Imperio, M., Jäncke, L., Elmer, S., & Besson, M. (2016). Professional music training and novel word learning: From faster semantic encoding to longer-lasting word representations. Journal of Cognitive Neuroscience 28(10), 1584–1602. Dittinger, E., Valizadeh, S. A., Jäncke, L., Besson, M., & Elmer, S. (2018). Increased functional connectivity in the ventral and dorsal streams during retrieval of novel words in professional musicians. Human Brain Mapping 39(2), 722–734. Du, Y., & Zatorre, R. J. (2017). Musical training sharpens and bonds ears and tongue to hear speech better. Proceedings of the National Academy of Sciences 5, 201712223. Retrieved from https://doi.org/10.1073/pnas.1712223114 Elhilali, M., & Shamma, S. A. (2008). A cocktail party with a cortical twist: How cortical mechanisms contribute to sound segregation. Journal of the Acoustical Society of America 124(6), 3751–3771.
Elmer, S., Hänggi, J., & Jäncke, L. (2016). Interhemispheric transcallosal connectivity between the left and right planum temporale predicts musicianship, performance in temporal speech processing, and functional specialization. Brain Structure and Function 221(1), 331–344. Elmer, S., Hänggi, J., Meyer, M., & Jäncke, L. (2013). Increased cortical surface area of the left planum temporale in musicians facilitates the categorization of phonetic and temporal speech sounds. Cortex 49(10), 2812–2821. Falk, S., Lanzilotti, C., & Schön, D. (2017). Tuning neural phase entrainment to speech. Journal of Cognitive Neuroscience 29(8), 1378–1389. Farbood, M. M., Marcus, G., & Poeppel, D. (2013). Temporal dynamics and the identification of musical key. Journal of Experimental Psychology: Human Perception & Performance 39(4), 911–918. Farbood, M. M., Rowland, J., Marcus, G., Ghitza, O., & Poeppel, D. (2015). Decoding time for the identification of musical key. Attention, Perception, & Psychophysics 77(1), 28–35. Fedorenko, E., McDermott, J. H., Norman-Haignere, S., & Kanwisher, N. (2012). Sensitivity to musical structure in the human brain. Journal of Neurophysiology 108(12), 3289–3300. Fedorenko, E., Patel, A., Casasanto, D., Winawer, J., & Gibson, E. (2009). Structural integration in language and music: Evidence for a shared system. Memory & Cognition 37(1), 1–9. Fitzroy, A. B., & Sanders, L. D. (2013). Musical expertise modulates early processing of syntactic violations in language. Frontiers in Psychology 3, 603. Retrieved from https://doi.org/10.3389/fpsyg.2012.00603 Fiveash, A., & Pammer, K. (2014). Music and language: Do they draw on similar syntactic working memory resources? Psychology of Music 42(2), 190–209. Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schön, D. (2015). Music training increases phonological awareness and reading skills in developmental dyslexia: A randomized control trial. PLoS ONE 10(9), e0138715. Fodor, J. A. (1983). 
The modularity of mind: An essay on faculty psychology. Cambridge, MA: MIT Press. François, C., Chobert, J., Besson, M., & Schön, D. (2012). Music training for the development of speech segmentation. Cerebral Cortex 23(9), 2038–2043. François, C., & Schön, D. (2011). Musical expertise boosts implicit learning of both musical and linguistic structures. Cerebral Cortex 21(10), 2357–2365. Friederici, A. D., & Kotz, S. A. (2003). The brain basis of syntactic processes: Functional imaging and lesion studies. NeuroImage 20, S8–S17. Ghitza, O. (2011). Linking speech perception and neurophysiology: Speech decoding guided by cascaded oscillators locked to the input rhythm. Frontiers in Psychology 2. Retrieved from https://doi.org/10.3389/fpsyg.2011.00130 Ghitza, O., & Greenberg, S. (2009). On the possible role of brain rhythms in speech perception: Intelligibility of time-compressed speech with periodic and aperiodic insertions of silence. Phonetica 66, 113–126. Giraud, A. L., Kleinschmidt, A., Poeppel, D., Lund, T. E., Frackowiak, R. S., & Laufs, H. (2007). Endogenous cortical rhythms determine cerebral specialization for speech perception and production. Neuron 56(6), 1127–1134. Giraud, A. L., & Poeppel, D. (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience 15(4), 511–517. Goswami, U. (2011). A temporal sampling framework for developmental dyslexia. Trends in Cognitive Sciences 15(1), 3–10. Gottfried, T. L., & Riester, D. (2000). Relation of pitch glide perception and Mandarin tone identification. Journal of the Acoustical Society of America 108(5), 2604.
Gottfried, T. L., Staby, A. M., & Ziemer, C. J. (2004). Musical experience and Mandarin tone discrimination and imitation. Journal of the Acoustical Society of America 115(5), 2545. Grahn, J. A., & Rowe, J. B. (2012). Finding and feeling the musical beat: Striatal dissociations between detection and prediction of regularity. Cerebral Cortex 23(4), 913–921. Haegens, S., & Golumbic, E. Z. (2018). Rhythmic facilitation of sensory processing: A critical review. Neuroscience & Biobehavioral Reviews 86, 150–165. Herdener, M., Humbel, T., Esposito, F., Habermeyer, B., Cattapan-Ludewig, K., & Seifritz, E. (2012). Jazz drummers recruit language-specific areas for the processing of rhythmic structure. Cerebral Cortex 24(3), 836–843. Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience 8, 393–402. Hidalgo, C., Falk, S., & Schön, D. (2017). Speak on time! Effects of a musical rhythmic training on children with hearing loss. Hearing Research 351, 11–18. Hillis, A. E., & Caramazza, A. (1995). Representation of grammatical categories of words in the brain. Journal of Cognitive Neuroscience 7(3), 396–407. Hoch, L., Poulin-Charronnat, B., & Tillmann, B. (2011). The influence of task-irrelevant music on language processing: Syntactic and semantic structures. Frontiers in Psychology 2. Retrieved from https://doi.org/10.3389/fpsyg.2011.00112 Intartaglia, B., White-Schwoch, T., Kraus, N., & Schön, D. (2017). Music training enhances the automatic neural processing of foreign speech sounds. Scientific Reports 7(1), 12631. Jaramillo, S., & Zador, A. M. (2011). The auditory cortex mediates the perceptual effects of acoustic temporal expectation. Nature Neuroscience 14, 246–251. Jentschke, S., & Koelsch, S. (2009). Musical training modulates the development of syntax processing in children. NeuroImage 47(2), 735–744. Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. 
Psychological Review 83(5), 323–355. Kleber, B., Veit, R., Moll, C. V., Gaser, C., Birbaumer, N., & Lotze, M. (2016). Voxel-based morphometry in opera singers: Increased gray-matter volume in right somatosensory and auditory cortices. NeuroImage 133, 477–483. Koelsch, S., Gunter, T. C., Cramon, D. Y. V., Zysset, S., Lohmann, G., & Friederici, A. D. (2002). Bach speaks: A cortical “language-network” serves the processing of music. NeuroImage 17(2), 956–966. Koelsch, S., Gunter, T. C., Wittfoth, M., & Sammler, D. (2005). Interaction between syntax processing in language and in music: An ERP study. Journal of Cognitive Neuroscience 17(10), 1565–1577. Kotz, S. A., Gunter, T. C., & Wonneberger, S. (2005). The basal ganglia are receptive to rhythmic compensation during auditory syntactic processing: ERP patient data. Brain and Language 95(1), 70–71. Kotz, S. A., Meyer, M., Alter, K., Besson, M., von Cramon, D. Y., & Friederici, A. D. (2003). On the lateralization of emotional prosody: An event-related functional MR investigation. Brain and Language 86(3), 366–376. Kotz, S. A., & Schwartze, M. (2010). Cortical speech processing unplugged: A timely subcortico-cortical framework. Trends in Cognitive Sciences 14(9), 392–399. Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills. Nature Reviews Neuroscience 11(8), 599–605. Kunert, R., & Slevc, L. R. (2015). A commentary on: “Neural overlap in processing music and speech.” Frontiers in Human Neuroscience 9. Retrieved from https://doi.org/10.3389/fnhum.2015.00330
Kunert, R., Willems, R. M., Casasanto, D., Patel, A. D., & Hagoort, P. (2015). Music and language syntax interact in Broca’s area: An fMRI study. PLoS ONE 10(11), e0141069. Lakatos, P., Musacchia, G., O’Connel, M. N., Falchier, A. Y., Javitt, D. C., & Schroeder, C. E. (2013). The spectrotemporal filter mechanism of auditory selective attention. Neuron 77, 750–761. Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review 106(1), 119–159. Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review 74, 431–461. Liebenthal, E., Binder, J. R., Spitzer, S. M., Possing, E. T., & Medler, D. A. (2005). Neural substrates of phonemic perception. Cerebral Cortex 15(10), 1621–1631. Lindell, A. K. (2006). In your right mind: Right hemisphere contributions to language processing and production. Neuropsychology Review 16(3), 131–148. Luria, A. R., Tsevetkova, L. S., & Futer, D. S. (1965). Aphasia in a composer. Journal of the Neurological Sciences 2(3), 288–292. Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in Broca’s area: An MEG study. Nature Neuroscience 4(5), 540–545. Magne, C., Schön, D., & Besson, M. (2006). Musician children detect pitch violations in both music and language better than nonmusician children: Behavioral and electrophysiological approaches. Journal of Cognitive Neuroscience 18(2), 199–211. Marie, C., Delogu, F., Lampis, G., Belardinelli, M. O., & Besson, M. (2011). Influence of musical expertise on segmental and tonal processing in Mandarin Chinese. Journal of Cognitive Neuroscience 23(10), 2701–2715. Marie, C., Magne, C., & Besson, M. (2011). Musicians and the metric structure of words. Journal of Cognitive Neuroscience 23(2), 294–305. Marr, D. (1982). Vision. San Francisco, CA: Freeman. Merchant, H., Grahn, J., Trainor, L., Rohrmeier, M., & Fitch, W. T. (2015). 
Finding the beat: A neural perspective across humans and non-human primates. Philosophical Transactions of the Royal Society B: Biological Sciences 370(1664), 20140093. doi:10.1098/rstb.2014.0093 Milovanov, R., Huotilainen, M., Esquef, P. A., Alku, P., Välimäki, V., & Tervaniemi, M. (2009). The role of musical aptitude and language skills in preattentive duration processing in school-aged children. Neuroscience Letters 460(2), 161–165. Moore, E., Branigan, H. & Overy, K. (2017). Exploring the role of auditory-motor synchronisation in the transfer of music to language skills in dyslexia. Outstanding Poster Award talk at Neurosciences and Music VI conference. Morillon, B., & Baillet, S. (2017). Motor origin of temporal predictions in auditory attention. Proceedings of the National Academy of Sciences 114(42), E8913–E8921. Morillon, B., Hackett, T. A., Kajikawa, Y., & Schroeder, C. E. (2015). Predictive motor control of sensory dynamics in auditory active sensing. Current Opinion in Neurobiology 31, 230–238. Morillon, B., Schroeder, C. E., & Wyart, V. (2014). Motor contributions to the temporal precision of auditory attention. Nature Communications 5, 5255. Morillon, B., Schroeder, C. E., Wyart, V., & Arnal, L. H. (2016). Temporal prediction in lieu of periodic stimulation. Journal of Neuroscience 36(8), 2342–2347. Musacchia, G., Sams, M., Skoe, E., & Kraus, N. (2007). Musicians have enhanced subcortical auditory and audiovisual processing of speech and music. Proceedings of the National Academy of Sciences 104(40), 15894–15898. Nobre, A. C., & van Ede, F. (2018). Anticipated moments: Temporal structure in attention. Nature Reviews Neuroscience 19, 34–38.
Overy, K. (2000). Dyslexia, temporal processing and music: The potential of music as an early learning aid for dyslexic children. Psychology of Music 28(2), 218–229. Parbery-Clark, A., Skoe, E., & Kraus, N. (2009). Musical experience limits the degradative effects of background noise on the neural processing of sound. Journal of Neuroscience 29(45), 14100–14107. Parbery-Clark, A., Tierney, A., Strait, D. L., & Kraus, N. (2012). Musicians have fine-tuned neural distinction of speech syllables. Neuroscience 219, 111–119. Park, H., Ince, R. A. A., Schyns, P. G., Thut, G., & Gross, J. (2015). Frontal top-down signals increase coupling of auditory low-frequency oscillations to continuous speech in human listeners. Current Biology 25(12), 1649–1653. Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience 6(7), 674–681. Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Frontiers in Psychology 2, 142. doi:10.3389/fpsyg.2011.00142 Patel, A. D. (2014). Can nonlinguistic musical training change the way the brain processes speech? The expanded OPERA hypothesis. Hearing Research 308, 98–108. Patel, A. D., & Iversen, J. R. (2014). The evolutionary neuroscience of musical beat perception: The Action Simulation for Auditory Prediction (ASAP) hypothesis. Frontiers in Systems Neuroscience 8, 57. Retrieved from https://doi.org/10.3389/fnsys.2014.00057 Patel, A. D., Iversen, J. R., Wassenaar, M., & Hagoort, P. (2008). Musical syntactic processing in agrammatic Broca’s aphasia. Aphasiology 22(7–8), 776–789. Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience 6(7), 688–691. Peretz, I., Vuvan, D., Lagrois, M. É., & Armony, J. L. (2015). Neural overlap in processing music and speech. Philosophical Transactions of the Royal Society B: Biological Sciences 370(1664), 20140090. Poeppel, D. (2003). 
The analysis of speech in different temporal integration windows: Cerebral lateralization as “asymmetric sampling in time.” Speech Communication 41(1), 245–255. Poeppel, D. (2012). The maps problem and the mapping problem: Two challenges for a cognitive neuroscience of speech and language. Cognitive Neuropsychology 29(1–2), 34–55. Przybylski, L., Bedoin, N., Krifi-Papoz, S., Herbillon, V., Roch, D., Léculier, L., … Tillmann, B. (2013). Rhythmic auditory stimulation influences syntactic processing in children with developmental language disorders. Neuropsychology 27(1), 121–131. Rauschecker, J. P., & Scott, S. K. (2009). Maps and streams in the auditory cortex: Nonhuman primates illuminate human speech processing. Nature Neuroscience 12, 718–724. Rauschecker, J. P., & Tian, B. (2000). Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proceedings of the National Academy of Sciences 97(22), 11800–11806. Rimmele, J. M., Morillon, B., Poeppel, D., & Arnal, L. H. (submitted). The proactive and flexible sense of timing. Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience 27, 169–192. Rogalsky, C., & Hickok, G. (2011). The role of Broca’s area in sentence comprehension. Journal of Cognitive Neuroscience 23(7), 1664–1680. Rohenkohl, G., Cravo, A. M., Wyart, V., & Nobre, A. C. (2012). Temporal expectation improves the quality of sensory information. Journal of Neuroscience 32(24), 8424–8428. Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science 274(5294), 1926–1928.
Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition 70(1), 27–52. Salimpoor, V. N., van den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J. (2013). Interactions between the nucleus accumbens and auditory cortices predict music reward value. Science 340(6129), 216–219. Sammler, D., Baird, A., Valabrègue, R., Clément, S., Dupont, S., Belin, P., & Samson, S. (2010). The relationship of lyrics and tunes in the processing of unfamiliar songs: A functional magnetic resonance adaptation study. Journal of Neuroscience 30(10), 3572–3578. Sammler, D., Koelsch, S., Ball, T., Brandt, A., Grigutsch, M., Huppertz, H. J., … Friederici, A. D. (2013). Co-localizing linguistic and musical syntax with intracranial EEG. NeuroImage 64, 134– 146. Sammler, D., Koelsch, S., & Friederici, A. D. (2011). Are left fronto-temporal brain areas a prerequisite for normal music-syntactic processing? Cortex 47(6), 659–673. Schlaug, G., Jäncke, L., Huang, Y., Staiger, J. F., & Steinmetz, H. (1995). Increased corpus callosum size in musicians. Neuropsychologia 33(8), 1047–1055. Schön, D., Boyer, M., Moreno, S., Besson, M., Peretz, I., & Kolinsky, R. (2008). Songs as an aid for language acquisition. Cognition 106(2), 975–983. Schön, D., Gordon, R., Campagne, A., Magne, C., Astésano, C., Anton, J. L., & Besson, M. (2010). Similar cerebral networks in language, music and song perception. NeuroImage 51(1), 450–461. Schön, D., Magne, C., & Besson, M. (2004). The music of speech: Music training facilitates pitch processing in both music and language. Psychophysiology 41(3), 341–349. Schön, D., & Tillmann, B. (2015). Short- and long-term rhythmic interventions: Perspectives for language rehabilitation. Annals of the New York Academy of Sciences 1337, 32–39. Schroeder, C. E., & Lakatos, P. (2009). Low-frequency neuronal oscillations as instruments of sensory selection. 
Trends in Neurosciences 32(1), 9–18. Schroeder, C. E., Wilson, D. A., Radman, T., Scharfman, H., & Lakatos, P. (2010). Dynamics of active sensing and perceptual selection. Current Opinion in Neurobiology 20, 172–176. Schubotz, R. I. (2007). Prediction of external events with our motor system: Towards a new framework. Trends in Cognitive Sciences 11(5), 211–218. Shahin, A., Bosnyak, D. J., Trainor, L. J., & Roberts, L. E. (2003). Enhancement of neuroplastic P2 and N1c auditory evoked potentials in musicians. Journal of Neuroscience 23(13), 5545–5552. Skoe, E., & Kraus, N. (2012). A little goes a long way: How the adult brain is shaped by musical training in childhood. Journal of Neuroscience 32(34), 11507–11510. Slevc, L. R., Faroqi-Shah, Y., Saxena, S., & Okada, B. M. (2016). Preserved processing of musical structure in a person with agrammatic aphasia. Neurocase 22(6), 505–511. Slevc, L. R., Rosenberg, J. C., & Patel, A. D. (2009). Making psycholinguistics musical: Self-paced reading time evidence for shared processing of linguistic and musical syntax. Psychonomic Bulletin & Review 16(2), 374–381. Staeren, N., Renvall, H., De Martino, F., Goebel, R., & Formisano, E. (2009). Sound categories are represented as distributed patterns in the human auditory cortex. Current Biology 19(6), 498–502. Stahl, B., Kotz, S. A., Henseler, I., Turner, R., & Geyer, S. (2011). Rhythm in disguise: Why singing may not hold the key to recovery from aphasia. Brain 134(10), 3083–3093. Stefanics, G., Hangya, B., Hernadi, I., Winkler, I., Lakatos, P., & Ulbert, I. (2010). Phase entrainment of human delta oscillations can mediate the effects of expectation on reaction speed. Journal of Neuroscience 30(41), 13578–13585. Steinhauer, K., Alter, K., & Friederici, A. D. (1999). Brain potentials indicate immediate use of prosodic cues in natural speech processing. Nature Neuroscience 2(2), 191–196.
Strait, D. L., Kraus, N., Parbery-Clark, A., & Ashley, R. (2010). Musical experience shapes top-down auditory mechanisms: Evidence from masking and auditory attention performance. Hearing Research 261(1), 22–29. Teki, S., Grube, M., Kumar, S., & Griffiths, T. D. (2011). Distinct neural substrates of duration-based and beat-based auditory timing. Journal of Neuroscience 31(10), 3805–3812. Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011). How to grow a mind: Statistics, structure, and abstraction. Science 331(6022), 1279–1285. Thompson, W. F., Schellenberg, E. G., & Husain, G. (2004). Decoding speech prosody: Do music lessons help? Emotion 4(1), 46–64. Tierney, A., & Kraus, N. (2014). Auditory-motor entrainment and phonological skills: Precise auditory timing hypothesis (PATH). Frontiers in Human Neuroscience 8. Retrieved from https://doi.org/10.3389/fnhum.2014.00949 Tinbergen, N. (1963). On aims and methods of ethology. Ethology 20, 410–433. Vallentin, D., Kosche, G., Lipkind, D., & Long, M. A. (2016). Inhibition protects acquired song segments during vocal learning in zebra finches, Science 351(6270), 267–271. Vigneau, M., Beaucousin, V., Hervé, P. Y., Jobard, G., Petit, L., Crivello, F., … Tzourio-Mazoyer, N. (2011). What is right-hemisphere contribution to phonological, lexico-semantic, and sentence processing? Insights from a meta-analysis. NeuroImage 54(1), 577–593. Vuust, P., & Witek, M. A. G. (2014). Rhythmic complexity and predictive coding: A novel approach to modeling rhythm and meter perception in music. Frontiers in Psychology 5, 1111. Retrieved from https://doi.org/10.3389/fpsyg.2014.01111 Wong, P. C., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10(4), 420–422. Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: Music and speech. Trends in Cognitive Sciences 6(1), 37–46. 
Zatorre, R. J., Chen, J. L., & Penhune, V. B. (2007). When the brain plays music: Auditory–motor interactions in music perception and production. Nature Reviews Neuroscience 8, 547–558. Zuk, J., Ozernov-Palchik, O., Kim, H., Lakshminarayanan, K., Gabrieli, J. D. E., Tallal, P., & Gaab, N. (2013). Enhanced syllable discrimination thresholds in musicians. PLoS ONE 8(12), e80546. Zumbansen, A., Peretz, I., & Hébert, S. (2014). Melodic intonation therapy: Back to basics for future research. Frontiers in Neurology 5. Retrieved from https://doi.org/10.3389/fneur.2014.00007
SECTION V
MUSICIANSHIP AND BRAIN FUNCTION
CHAPTER 17
MUSICAL EXPERTISE AND BRAIN STRUCTURE: THE CAUSES AND CONSEQUENCES OF TRAINING
VIRGINIA B. PENHUNE
Introduction
Over the past twenty years, brain imaging studies have demonstrated that music training can change brain structure, predominantly in the auditory-motor network that underlies music performance. These studies have also shown that brain structural variation is related to performance on a range of musical tasks, and that even short-term training can result in brain plasticity. In this chapter, we will argue that the observed differences in brain structure between experts and novices derive from at least four sources. First, there may be pre-existing individual differences in structural features supporting specific skills that predispose people to undertake music training. Second, lengthy and consistent training likely produces structural change in the brain networks tapped by performance through repeated cycles of prediction, feedback, and error-correction that drive learning. Third, the timing of practice during specific periods of development may result in
brain changes that do not occur at other periods of time, and which may promote future learning and plasticity. Fourth, both the rewarding nature of music itself and the reward value of practice and accurate performance may make music training a particularly effective driver of brain plasticity.
S
B A
D M
There is now a relatively large body of brain imaging data showing differences in gray- (GM) and white-matter (WM) architecture between musicians and non-musicians (see Fig. 1). In adults all of these studies are cross-sectional, and typically compare music students or professionals with controls selected to have very little music training. One of the most common and expected findings is that music training is associated with enhancements in auditory regions, particularly Heschl’s gyrus (HG), the region of primary auditory cortex. These studies have found that musicians commonly show greater gyrification of HG (Schneider et al., 2002; Schneider et al., 2005), and greater GM volume or cortical thickness (CT) in this region (Bermudez, Lerch, Evans, & Zatorre, 2009; Foster & Zatorre, 2010; Gaser & Schlaug, 2003; Karpati, Giacosa, Foster, Penhune, & Hyde, 2017; Schneider et al., 2002, 2005). These differences have been shown to be related to indices of music proficiency (Schneider et al., 2002, 2005), hours of music practice (Foster & Zatorre, 2010), variations in EEG and MEG responses to auditory signals (Schneider et al., 2002, 2005), and performance on melody discrimination and rhythm reproduction tasks (Foster & Zatorre, 2010; Karpati et al., 2017).
FIGURE 1. Regions of the dorsal auditory pathway affected by music training. Illustrates brain regions found to show structural changes in musicians compared to non-musicians. These include the auditory (superior temporal gyrus, STG), parietal, premotor cortex (PMC), and inferior frontal gyrus (IFG) regions in the dorsal auditory pathway, as well as the connecting fibers of the arcuate fasciculus. Also pictured are the cerebellum and corticospinal tract (CST). Regions not shown are the corpus callosum and basal ganglia.
The second most common finding is enhancement in motor regions of the brain, including GM in primary motor, premotor, and parietal regions, as well as the cerebellum and basal ganglia. In addition, consistent increases have been observed in white-matter pathways, including the corpus callosum, descending motor tracts, and sensorimotor connections. One of the first studies in this domain found that the length of the central sulcus, and by inference the size of the motor cortex (M1), was larger in trained musicians, and that earlier onset of training was related to greater length (Amunts et al., 1997). This finding has been replicated in subsequent
studies using whole-brain analysis techniques (Bermudez et al., 2009; Gaser & Schlaug, 2003). Differences between musicians and non-musicians have also been observed in the corpus callosum (CC), the primary white-matter pathway connecting the two hemispheres. In another early investigation, it was found that the surface area of the anterior half of the CC was larger in musicians, and that this difference was greatest for those who began training before age 7 (Schlaug, Jancke, Huang, Staiger, & Steinmetz, 1995). Musicians have also been found to have greater white-matter integrity in the CC as measured using diffusion tensor imaging (DTI), with these measures being related to hours of practice (Bengtsson et al., 2005), as well as to age of start and performance on a sensory-motor synchronization task (Steele, Bailey, Zatorre, & Penhune, 2013). In the descending motor pathways, changes in DTI measures have been observed to be related to hours of practice in childhood (Bengtsson et al., 2005). Changes in subcortical structures have also been observed, with a recent study reporting that musicians have greater gray-matter volume in the putamen (Vaquero et al., 2016), and others showing enhancements in cerebellar gray- (Gaser & Schlaug, 2003; Hutchinson, Lee, Gaab, & Schlaug, 2003) and white-matter (Abdul-Kareem, Stancak, Parkes, Al-Ameen, et al., 2011). However, a more recent study from our laboratory using cerebellar-specific segmentation techniques found no differences in either gray- or white-matter volumes between musicians and non-musicians, but that musicians who began training before age 7 had reduced volumes in cerebellar regions specifically related to motor timing (Baer et al., 2015). Other regions found to differ between musicians and non-musicians are in frontal and parietal cortex, including regions important for language (pars opercularis and triangularis; areas 44 and 45) and working memory (dorsolateral: 9/46; and ventrolateral prefrontal cortex: 47/12). 
Enhanced GM density has been observed in areas 44/45 that is related to years of music experience (Abdul-Kareem, Stancak, Parkes, & Sluming, 2011; James et al., 2014; Sluming et al., 2002), and to performance on a test of absolute pitch (Bermudez et al., 2009). Importantly, musicians have also been found to have greater white-matter integrity as measured with DTI in the arcuate fasciculus, the pathway connecting auditory, parietal, and inferior frontal regions (Halwani, Loui, Ruber, & Schlaug, 2011). Musicians have also been reported to have greater cortical thickness in DLPFC; and interregional variability in cortical thickness is correlated across a broader
range of auditory and motor regions in musicians compared to controls (Bermudez et al., 2009). Finally, several studies have reported greater gray-matter volume in parietal regions (Foster & Zatorre, 2010; Gaser & Schlaug, 2003; James et al., 2014), which are engaged in sensorimotor transformations and planning that are relevant for playing a musical instrument (Andersen & Cui, 2009; Gogos et al., 2010; Rauschecker, 2011). In particular, Foster and Zatorre (2010) found that both gray-matter volume and cortical thickness were related to performance on a test of melodic discrimination in a group of people with varying levels of music experience. Taken together, cross-sectional studies in adult musicians provide evidence that long-term practice produces structural changes in regions of the dorsal auditory-motor network that has been shown in functional imaging studies to be recruited during playing (Brown, Zatorre, & Penhune, 2015; Chen, Penhune, & Zatorre, 2008; Herholz & Zatorre, 2012; Novembre & Keller, 2014).
D
I R
T
-
P
Studying effects of music training in childhood is important not only because that is when lessons typically begin, but also because we know that sensorimotor experience during early sensitive periods in development can have differential impacts on long-term brain plasticity. The first longitudinal study in children examined the effects of 15 months of piano training in 6- to 8-year-olds (Hyde et al., 2009). Longitudinal studies are critical because they allow us to establish more direct causal connections between training and any observed changes in the brain. This study found that children who received training did not differ from untrained children at baseline, but showed gray-matter enhancements in auditory and motor cortex, as well as enlargement of the corpus callosum. Most importantly, the volume of auditory cortex was found to be related to performance on tests of melody and rhythm discrimination, and the volume of motor cortex was found to be related to performance on a test of fine-motor skill. These results are supported by a second longitudinal study which found that 6- to
8-year-old children participating in a music training program had greater WM integrity in the CC after two years (Habibi et al., 2017). There was also some evidence of reduced cortical thinning in right compared to left posterior auditory cortex. Taken together, these longitudinal results indicate that even relatively short-term training in childhood can produce changes in behavior and brain structure. Most importantly, changes occurred in the same regions of the auditory-motor network (auditory cortex, M1, and the CC) that have been shown to differ after long-term training in adults. The parallel between longitudinal changes in childhood and cross-sectional findings in adults supports the inference that the structural differences observed in adults are indeed the result of training. The only other anatomical study in children found that in a large group of 8- to 10-year-olds, the volume of HG was larger in those who practiced more, and was associated with measures of music aptitude, as well as behavioral and MEG measures of auditory processing (Seither-Preisler, Parncutt, & Schneider, 2014). This is consistent with a longitudinal EEG study in children showing enhancements of auditory evoked responses to musical features (Putkinen, Tervaniemi, Saarikivi, Ojala, & Huotilainen, 2014). Interestingly, however, no changes in HG volume were observed when examining possible longitudinal effects after 13 months of additional training. Further, hierarchical regression models predicting HG volume found that aptitude accounted for a greater proportion of the variance than practice time. The authors interpreted these last two findings as indicating that anatomical predispositions make a greater contribution to musical outcomes than training. However, it is also possible that training-related plastic changes had already occurred in the period preceding the study. 
Most children began lessons between 6 and 7 years old, and thus had already been playing for one to two years. The issue of whether predispositions or training contribute most to observed structural differences between musicians and non-musicians has long been debated, with little data that can directly contribute to settling the argument. As will be discussed further in this chapter, some data from untrained adults show that individual differences in specific anatomical features are related to performance or learning of musical tasks, providing indirect evidence that pre-existing anatomical features may mediate the potential to acquire musical skills (Foster & Zatorre, 2010; Li et al., 2014;
Paquette, Fujii, Li, & Schlaug, 2017; Schneider et al., 2005). The finding described earlier of larger HG volume in children who practice more, which does not change over time, can also be considered as evidence for a pre-existing structural feature associated with musical skill (Seither-Preisler et al., 2014). Work with twins has shown that the propensity to practice is heritable, and that genes appear to account for a large portion of the variance in music abilities (Mosing, Madison, Pedersen, Kuja-Halkola, & Ullén, 2014). However, a very recent study from this same group compared brain structure in monozygotic twins discordant for music practice. They found that the twins who played had greater cortical thickness in auditory and motor regions as well as WM enhancements in the corpus callosum compared to those who did not (de Manzano & Ullén, 2018). These findings provide the most definitive support yet for the causal effect of music training on brain structure. In an effort to synthesize these apparently opposing results, the authors have proposed a gene–environment interaction model of musical skill and its impact on the brain (Ullén, Hambrick, & Mosing, 2016). This model proposes that multiple genetic predispositions subserving specifically musical skills, such as auditory and motor abilities, as well as non-specific cognitive and personality factors, contribute to the likelihood that someone will engage in training. They also hypothesize that environmental factors interact with genetic predispositions to either promote or discourage persistence. We would further propose that the timing of music experience interacts with both predispositions and normative brain maturation to influence long-term behavioral and brain plasticity (see Fig. 2).
FIGURE 2. Gene–maturation–environment interactions. Illustrates the interaction between genes, brain maturation, and specific training. Genetic variation leads to individual differences in brain structures for musical aptitudes such as auditory perception and motor dexterity. Genetic variation also regulates other non-specific aptitudes, such as cognitive skills and personality factors, including openness and the propensity to practice. Maturation produces normative changes that peak at different times depending on the brain region. Experience, such as music training, then interacts with both pre-existing individual differences, and normative maturation to change brain structure and plasticity. Experience also feeds back on genes through gene–environment interactions that can further enhance or limit plasticity.
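The variance-partitioning logic behind the hierarchical regression finding discussed earlier in this section (aptitude versus practice time as predictors of HG volume) can be illustrated with a minimal Python sketch. This is not the analysis used by Seither-Preisler et al. (2014): the function names and every data value below are hypothetical, and the two-predictor closed form for R² is used only to keep the example free of external libraries.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r2_two_predictors(y, x1, x2):
    """R^2 of a linear regression of y on x1 and x2 (closed form)."""
    r_y1 = pearson_r(y, x1)
    r_y2 = pearson_r(y, x2)
    r_12 = pearson_r(x1, x2)
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def incremental_r2(y, first, second):
    """Variance in y explained by `second` over and above `first`."""
    return r2_two_predictors(y, first, second) - pearson_r(y, first) ** 2


# Hypothetical values for eight children (illustration only, not real data).
aptitude = [10, 12, 9, 15, 14, 11, 16, 13]            # aptitude test score
practice = [2, 5, 1, 4, 6, 3, 5, 2]                   # weekly practice hours
hg_volume = [3.1, 3.4, 3.0, 3.9, 3.8, 3.3, 4.0, 3.5]  # arbitrary units

print("Unique variance, aptitude:", incremental_r2(hg_volume, practice, aptitude))
print("Unique variance, practice:", incremental_r2(hg_volume, aptitude, practice))
```

Entering each predictor last, in turn, gives its unique contribution. A larger value for aptitude than for practice would correspond to the pattern the authors describe, although such a result cannot by itself separate predisposition from training that took place before the first measurement.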
T
I
D T
A very important question in understanding the effect of music training on brain structure is the interaction between brain development and music training. Anecdotal evidence from the lives of famous musicians suggests that an early start of training can promote the development of extraordinary skill in adulthood (Jorgensen, 2011). Evidence from animal and human studies also shows that early experience, such as specific auditory exposure (Chang & Merzenich, 2003; de Villers-Sidani, Chang, Bao, & Merzenich, 2007), or enriched sensorimotor environments (Kolb et al., 2012), can have long-term effects on behavior and the brain. Two important early studies provided suggestive evidence that the impact of music training on brain structure was related to the age of start, with those who begin earlier showing greater enhancements in the size of
M1 (Amunts et al., 1997) or the surface area of the corpus callosum (Schlaug et al., 1995). However, without specific controls, the age of start of training is typically confounded with the total years of training, making it impossible to attribute the observed differences to the age at which training began. In addition, these studies did not link the observed neuroanatomical differences to relevant behavior. To address these issues, a series of studies have compared behavior and brain structure in early- (ET < age 7) and late-trained (LT > age 7) musicians (see Fig. 3; see also Baer et al., 2015; Bailey & Penhune, 2010, 2012, 2013; Bailey, Zatorre, & Penhune, 2014; Steele et al., 2013; Vaquero et al., 2016). In these studies we matched ET and LT groups on important potential confounding variables including: years of music experience, years of formal training, and hours of current practice. In addition, we assessed cognitive measures such as non-verbal IQ and auditory working memory which might be thought to be related to the capacity for early training. Most importantly, we assessed performance on relevant musical skills, such as rhythm reproduction and melody discrimination. The age 7 cut-off for ET and LT groups was initially drawn from the study by Schlaug et al. (1995) and was essentially arbitrary. However, using a large sample of behavioral data, we have been able to show that the likely age range where early training has its strongest effect is between 7 and 9 (Bailey & Penhune, 2013). Behaviorally our studies have shown that adult musicians who begin training before age 7 outperform those who begin later on rhythm reproduction and melody discrimination tasks (Bailey & Penhune, 2010, 2012). Drawing on this work, we collected a large sample of ET and LT musicians with behavioral, T1 and DTI data. 
Analysis using deformation-based morphometry on the T1 data found that ET musicians show enlargement in the region of the ventral premotor cortex (vPMC), and that the volume of this region is related to performance on the rhythm synchronization task (Bailey et al., 2014). These findings are consistent with fMRI studies showing that vPMC is active when both musicians and non-musicians are performing the same rhythm task (Chen et al., 2008). In the same sample, DTI measures showed that ET musicians also had enhanced WM integrity in the posterior mid-body of the corpus callosum, the location of fibers connecting M1 and PMC in the two hemispheres (Steele et al., 2013). We interpreted these findings based on data about normative maturation in these regions, and the relative contribution of
genes and environment to their variability. A large, cross-sectional developmental sample showed that GM volume in anterior motor regions, including M1 and PMC, has its peak period of growth between 6 and 8 years old (Giedd et al., 1999). Similarly, the size of the anterior region of the CC shows its peak increase at the same time (Westerhausen et al., 2011), and variability of this region is more strongly influenced by environmental than genetic factors (Chiang et al., 2009). Based on these data, we can hypothesize that early training at the time of peak maturational change in motor regions and the CC may enhance brain plasticity. In addition, the relatively stronger contribution of environment to the size of anterior CC in adults suggests that it might be more susceptible to the impact of music training. We interpreted these findings as demonstrating a scaffold, or metaplastic, effect where early training promotes brain plasticity which is sustained or augmented by later practice (Steele et al., 2013). Our findings in the PMC and CC appear to tell a straightforward story in which early training produces enlargement or enhancement of brain structure. However, more recent findings make it clear that reality is not so simple. Using the same sample described earlier, we examined GM and WM volumes in the cerebellum using a novel multi-atlas segmentation technique that labels all thirteen lobules in both hemispheres (Baer et al., 2015). In addition, we tested these musicians and controls on a classic auditory-motor tapping and continuation task (Repp, 2005). The cerebellum has been linked to a range of sensory and motor timing functions that are likely to be relevant for music training and performance (Koziol et al., 2014; Sokolov, Miall, & Ivry, 2017). And, as described earlier, previous work had found greater cerebellar GM volume in trained musicians (Gaser & Schlaug, 2003; Hutchinson et al., 2003). 
However, the results of our study showed that ET musicians had smaller volumes of cerebellar lobules IV, V, and VI compared to LT musicians. Strikingly, earlier age of start, greater music experience, and better timing performance were all correlated with smaller cerebellar volumes. Better timing performance was specifically associated with smaller volumes of right lobule VI which has been functionally linked to perceptual and motor timing (E, Chen, Ho, & Desmond, 2014; Ivry, Spencer, Zelaznik, & Diedrichsen, 2002). This is consistent with another recent study which found that early-trained pianists
had smaller GM volume in the right putamen, and lower timing variability when playing scales (Vaquero et al., 2016).
FIGURE 3. Findings from studies examining structural differences between early- (ET; before age 7) and late-trained (LT; after age 7) musicians. Panel A on the left is taken from Bailey et al., 2014 and shows GM enhancement in the ventral premotor cortex (vPMC) in ET musicians. Panel A on the right is taken from Steele et al., 2013 and shows enhanced FA in the posterior midbody of the corpus callosum. Panel B on the left is taken from Vaquero et al., 2016 and shows reduced GM in the putamen in ET musicians. Panel B on the right is taken from Baer et al., 2015 and shows reduced volume of left cerebellar lobule VIIIa. The graphs at the bottom of each panel show the relationship of volume changes with the age of onset of training.
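The early- versus late-trained comparisons summarized in Figure 3 depend on groups that are matched on potential confounds such as years of music experience. One simple way to check matching of this kind is a standardized mean difference between groups. The Python sketch below is illustrative only: the function names, the 0.25 threshold, and all data values are assumptions rather than anything reported in the studies cited.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Cohen's d computed with the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def is_matched(group_a, group_b, threshold=0.25):
    """True when the absolute standardized difference is below the threshold.

    The default threshold is an illustrative assumption, not a fixed standard.
    """
    return abs(standardized_mean_difference(group_a, group_b)) < threshold


# Hypothetical groups (illustration only, not real data).
et_years_experience = [14, 16, 15, 17, 13]
lt_years_experience = [15, 14, 16, 16, 14]
et_age_of_start = [5, 6, 4, 6, 5]
lt_age_of_start = [9, 10, 8, 11, 9]

# Years of experience should be matched; age of start should differ by design.
print("Experience matched:", is_matched(et_years_experience, lt_years_experience))
print("Age of start matched:", is_matched(et_age_of_start, lt_age_of_start))
```

A design along these lines only supports attributing group differences to age of start if every matching variable passes the check while age of start itself clearly does not.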
So, why does training affect the cerebellum differently than the cortex, and how do these findings challenge our understanding of the effects of early experience? There are several features of cerebellar anatomy that may explain this result. First, developmental studies show that peak growth in the cerebellum occurs much later than in most of the cortex, between the ages of 12 and 18 (Tiemeier et al., 2010). Thus early experience may have a different effect on cerebellar plasticity, such that experience leads to greater efficiency and reduced expansion. Second, the cerebellum is unique in being structurally homogeneous, with identical cytoarchitecture and input–output circuitry throughout (Schmahmann, 1997). In the motor system, the cerebellar circuits are known to play a role in error-correction and optimization. Because these circuits are uniform across the structure, it is hypothesized that they perform the same role in optimizing a wide variety of functions in the regions to which they are connected (Balsters, Whelan, Robertson, & Ramnani, 2013; Koziol et al., 2014; Sokolov et al., 2017). The cerebellar regions that are smaller in ET musicians in our study are connected to frontal motor and association regions, including M1, PMC, and prefrontal cortex (Diedrichsen, Balsters, Flavell, Cussans, & Ramnani, 2009; Kelly & Strick, 2003). Based on this information, it is possible that training-related skills and cortical expansion might be supported by greater optimization and reduced expansion in the cerebellum. If this is true, then cortical and cerebellar changes with training should be inversely related.
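The prediction that closes this section, an inverse relationship between cortical and cerebellar changes, could in principle be examined with a one-sided permutation test on the correlation between the two measures. The Python sketch below shows one possible approach under stated assumptions. The function names, the number of permutations, and the per-participant values are all hypothetical, and no such analysis is reported in the studies cited.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_negative(x, y, n_perm=2000, seed=0):
    """Observed r and a one-sided p-value for a negative correlation.

    The p-value is the share of shuffled pairings whose correlation is at
    least as negative as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(x, shuffled) <= observed + 1e-12:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical per-participant volumes in arbitrary units (not real data).
cortical_volume = [1.0, 1.2, 1.1, 1.4, 1.3, 1.6, 1.5, 1.7]
cerebellar_volume = [2.9, 2.8, 2.9, 2.5, 2.7, 2.3, 2.4, 2.2]

r, p = permutation_p_negative(cortical_volume, cerebellar_volume)
print("r =", r, "one-sided p =", p)
```

A reliably negative correlation would be consistent with the optimization account, but in cross-sectional data it would not show that one change drives the other.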
A
S
-T
T
Differences in brain structure between musicians and non-musicians have generally been attributed to long and intensive training. However, it is more
likely that they result from an interaction between training-induced plasticity and pre-existing individual differences in the brain that predispose certain people to engage in music (see Fig. 2). While there is little direct evidence for specific brain features that predispose an individual to become a musician, evidence from studies of individual differences in music ability and response to training can provide some clues. Individual differences in auditory and motor regions of untrained individuals have been linked to performance on specific musical tasks, and to the ability to learn to play an instrument. GM concentrations in auditory regions and the amygdala were found to be correlated with interval discrimination in a large sample unselected for music training (Li et al., 2014). Similarly, in a sample selected to have a range of musical experience, GM concentration and cortical thickness in auditory and parietal regions were found to be related to the ability to discriminate melodies that had been transposed (Foster & Zatorre, 2010). Finally, a recent study found that cerebellar volumes were related to beat perception in musicians (Paquette et al., 2017). Individual differences in WM tracts connecting auditory and motor regions, and in motor output pathways have been found to be related to faster learning of short melodies (Engel et al., 2014). Further, WM integrity in the left arcuate fasciculus and the temporal segment of the CC have been found to predict individual differences in auditory-motor synchronization (Blecher, Tal, & Ben-Shachar, 2016). 
Findings showing that brain structural features can predict musical skills are consistent with results in related domains, where the volume of auditory cortex was found to be associated with the ability to learn linguistic pitch discrimination (Wong et al., 2008), and the volume of both auditory cortex (Golestani, Molko, Dehaene, LeBihan, & Pallier, 2007; Golestani, Paus, & Zatorre, 2002) and the arcuate fasciculus have been found to be related to foreign language sound learning (Vaquero, Rodriguez-Fornells, & Reiterer, 2017). Very importantly, however, aptitude for music training likely relies on more than pure auditory or motor skill. Heritability studies show that the propensity to practice appears to be genetically transmitted (Mosing et al., 2014), and that personality variables such as “openness to experience” are also associated with lifetime practice (Butkovic, Ullén, & Mosing, 2015). Thus, an individual with exceptional pre-existing skills must also have the right personality characteristics to undertake long-term training, and the openness to engage with new people, places, and ideas. A talented
individual who does not like to practice, or hates stress, travel, and challenge is unlikely to become a professional musician.
B
I
A
T
Taken together, the current data on brain structure in musicians suggest that there may be pre-existing structural features, likely in the auditory-motor network supporting musical skill, that predispose individuals to pursue music training. Once training begins, the long-term effects on behavior and brain structure depend on the age of start, and thus on the interaction between training and the maturational trajectories of these regions and their connections. Early training may produce a type of scaffold or metaplasticity effect. Metaplasticity is a term that originates from studies of hippocampal learning mechanisms, and denotes the idea that experience can change the potential for plasticity of a synapse (for review see Altenmüller & Furuya, 2016; Herholz & Zatorre, 2012). When applied to the context of music, it is the idea that training during specific phases of brain development can have long-term effects on how those regions change in response to future experience. Evidence for metaplastic effects resulting from music training comes from studies showing that musicians have enhanced learning of sensory and motor skills (Herholz, Boh, & Pantev, 2011; Ragert, Schmidt, Altenmüller, & Dinse, 2004; Rosenkranz, Williamon, & Rothwell, 2007), and greater increases in M1 activity during learning (Hund-Georgiadis & von Cramon, 1999). Thus we can think of early training as a scaffold on which later training can build (Bailey et al., 2014; Steele et al., 2013). Along with these training-specific metaplastic effects, evidence from heritability studies indicates that skills and abilities not specific to music may also contribute to promoting or limiting plasticity; these include the propensity to practice (Mosing et al., 2014), as well as personality and cognitive variables that can support training (Butkovic et al., 2015).
Why Is Music Such an Effective Driver of Brain Plasticity?
Why does music training produce such robust changes in brain structure? One very obvious answer is practice: lots of practice. For the studies reviewed here, the average length of training for musicians was 15–20 years. This is the equivalent of thousands of hours of practice across a large portion of the person’s life. While the idea that simply practicing long enough will result in expertise has been largely debunked (for review, see Mosing et al., 2014), long-term, consistent practice is strongly associated with expertise in a range of domains (Macnamara, Hambrick, & Oswald, 2014). Further, in the studies reviewed here, the length of training is typically strongly related to both structural brain differences and task performance. The impact of practice on brain organization is supported by studies in animals showing that practice on new motor tasks is associated with expanded representations in motor areas (Elbert, Pantev, Wienbruch, Rockstroh, & Taub, 1995; Nudo, Milliken, Jenkins, & Merzenich, 1996), changes in MR measures of gray- and white-matter (Scholz, Allemang-Grand, Dazai, & Lerch, 2015; Scholz, Niibori, Frankland, & Lerch, 2015), and increased numbers of synapses and dendritic spines (Kleim, Barnaby, et al., 2002; Kleim, Freeman, et al., 2002; Kleim et al., 2004). Neuronal changes in gray matter that are related to learning include neurogenesis, synaptogenesis, and changes in neuronal morphology. In white matter, learning-related changes include increases in the number of axons, axon diameter, packing density of fibers, and myelination (Zatorre, Fields, & Johansen-Berg, 2012).
A second reason that music training may be particularly effective in driving brain plasticity is the highly specific nature of practice. The majority of musicians are experts on a single instrument; thus they perform millions of repetitions of the same movements, and listen attentively to an even larger number of associated sounds.
When practicing, a musician imagines and plans a precise sequence of sounds and the movements required to produce them. Once the plan is set in motion, they use auditory and somatosensory information to detect subtle deviations in sound and movement, implementing adjustments to enhance performance. Practice is therefore a repeated prediction, feedback, and error-correction cycle. Auditory-motor prediction is thought to be a central function of the dorsal stream, particularly of the premotor cortex (PMC). Brain imaging studies have shown increased activity in the PMC when people listen to melodies that they have learned to play (Chen, Rae, & Watkins, 2012; Lahav, Saltzman, & Schlaug, 2007), and recent work from our laboratory has shown that transcranial magnetic stimulation (TMS) over dorsal PMC disrupts learning of auditory-motor associations (Lega, Stephan, Zatorre, & Penhune, 2016). Feedback and error-correction are key components of motor learning (Diedrichsen, Shadmehr, & Ivry, 2010; Sokolov et al., 2017; Wolpert, Diedrichsen, & Flanagan, 2011), and studies of both motor and sensory learning show that functional and structural changes in the brain are driven by decreases in error and improved precision. For example, learning to juggle (Scholz, Klein, Behrens, & Johansen-Berg, 2009), to balance on a tilting board (Taubert et al., 2010), or to perform a complex visuomotor task (Lakhani et al., 2016; Landi, Baguear, & Della-Maggiore, 2011) has been shown to produce changes in gray- or white-matter architecture that were related to decreases in error with learning. Thus error-driven learning, particularly during periods of high developmental plasticity, may be an important contributor to structural brain changes measured in adult musicians.
Another reason that music training may be so successful in producing brain plasticity is that it is inherently multisensory. To produce music, performers must learn to link sounds to actions, but they must also link visual, somatosensory, and proprioceptive feedback to these sounds and actions. As described earlier, training is a prediction–feedback–error-correction cycle in which musicians use all their sensory resources to produce the perfect sound. Sounds are linked to actions relatively rapidly, as has been shown by changes in the strength of motor activity during passive listening to learned melodies after short-term training (Bangert et al., 2006; D’Ausilio, Altenmüller, Olivetti Belardinelli, & Lotze, 2006; Lega et al., 2016; Stephan, Brown, Lega, & Penhune, 2016).
In particular, it was shown that learning to play a melody resulted in greater changes in the activity of auditory cortex than learning to remember the melody by listening alone (Lappe, Herholz, Trainor, & Pantev, 2008). This may partly be based on strong intrinsic connections between the auditory and motor systems (Chen et al., 2012; Poeppel, 2014; Zatorre, Chen, & Penhune, 2007). But it can also be hypothesized that co-activation of circuits deriving from multiple senses may drive plasticity even more strongly than input from a single sense (Lee & Noppeney, 2011, 2014).
A final feature of music training that is likely crucial in promoting plasticity is the rewarding nature of performance. There are three aspects of reward that may stimulate plasticity: first, the rewarding nature of music itself that is experienced through playing; second, the intrinsic reward of performing, both for the player and through the acclaim it may bring; and third, the potentially rewarding nature of practice and the pleasure of accurate performance. The intrinsic pleasure derived from music appears to be common to most people (Mas-Herrero, Marco-Pallares, Lorenzo-Seva, Zatorre, & Rodriguez-Fornells, 2011), and is hypothesized to be based on the same dopamine-modulated, predictive systems that regulate reward in other domains with direct biological consequences, including drugs, food, sex, and money (Salimpoor, Zald, Zatorre, Dagher, & McIntosh, 2015). Thus learning to produce a rewarding stimulus, such as music, is likely to be rewarding to the player. We also know that learning and brain plasticity are strongly affected by the reward value of what is learned. Animal studies show that brain plasticity associated with auditory learning is greater when the information to be learned is rewarded, or behaviorally relevant. For example, the responses of neurons in the auditory cortex of ferrets were modulated by the reward value of stimuli (David, Fritz, & Shamma, 2012). Further, pairing a tone with stimulation of dopamine circuits in the brainstem increased the selectivity of responding in auditory neurons tuned to the same tone (Bao, Chan, & Merzenich, 2001). Importantly, dopamine has been shown to modulate motor learning in both humans and animals (Floel et al., 2005; Tremblay et al., 2009, 2010), possibly through the reinforcement and habit-formation circuitry of the striatum (Graybiel & Grafton, 2015; Haith & Krakauer, 2013). Thus, if the output of practice, a beautiful piece of music, is rewarding and stimulates dopamine release, then playing such a piece should promote learning. It is also likely that the social benefits of playing music add to this type of reward.
Finally, humans seem to have a strong internal motivation to practice and perfect many skills, even if those skills do not have immediate physiological, psychological, or social outcomes. In addition to music, people spend hours perfecting their golf swing, playing video games, or baking elaborate cakes. All of these skills require practice, and the outcome of practice is often not immediate. Thus we hypothesize that practice itself may be rewarding, and that the prediction–feedback–error-correction cycle that is important for learning may be motivating across a range of domains. When musicians are learning a new and challenging piece, or perfecting an old one, they know exactly what they want it to sound like. This representation is translated into a motor plan, and both the imagined outcome and the plan become predictions against which they will measure their performance. When musicians attempt to play the piece, they will likely make errors, which lead to corrections and learning; but when they play the piece as imagined, they experience the reward of accurate performance. Because error feedback and reward are so important for learning, these mechanisms seem like strong candidates for promoting brain plasticity, but they have been little explored.
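The prediction, feedback, and error-correction cycle described in this section can be illustrated with a toy simulation. The sketch below is not a model taken from the studies reviewed here; it is a minimal delta-rule learner, and the function name `practice`, the scalar "motor plan," the learning rate, and the noise level are all assumptions chosen for illustration. On each trial the learner executes its current plan, compares the outcome with the intended target, and shifts the plan by a fraction of the error.

```python
import random


def practice(target, trials=200, learning_rate=0.2, motor_noise=0.05, seed=0):
    """Toy prediction-feedback-correction loop (illustrative assumption only).

    target: the intended outcome, e.g. a pitch in arbitrary units.
    Returns the list of absolute errors, one per trial.
    """
    rng = random.Random(seed)
    command = 0.0  # current motor plan (hypothetical scalar)
    errors = []
    for _ in range(trials):
        # Prediction: the plan is executed, perturbed by motor noise.
        outcome = command + rng.gauss(0.0, motor_noise)
        # Feedback: compare what was produced with what was intended.
        error = target - outcome
        errors.append(abs(error))
        # Error correction: shift the plan by a fraction of the error.
        command += learning_rate * error
    return errors


errors = practice(target=1.0)
early = sum(errors[:10]) / 10
late = sum(errors[-10:]) / 10
print(f"mean |error|, first 10 trials: {early:.3f}")
print(f"mean |error|, last 10 trials:  {late:.3f}")
```

In this sketch the error shrinks over trials until it is limited only by motor noise, which is the sense in which repeated, highly specific practice can be described as error-driven learning.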
Where Do We Go From Here?
Bringing together the data from this review, we suggest three directions for future research.
(1) Currently, most studies examine GM and WM differences separately, or do not directly link them through analysis. Analyses typically target differences in individual regions, when it is very likely that plasticity changes occur at the network level. Additionally, groups are defined a priori rather than derived through data-driven approaches based on participant characteristics such as training duration or age-of-start. Implementing these kinds of analyses requires large samples with multiple imaging measures. This implies a multi-center, data-sharing approach where standard behavioral and imaging protocols are implemented to allow aggregation of results.
(2) A related goal for music neuroscientists in the next ten years should be the establishment of standardized test batteries with age-based norms that can be administered across locations. A number of groups have been working on the development of tests aimed at children and adults (Dalla Bella et al., 2017; Ireland, Parker, Foster, & Penhune, in press; Mullensiefen, Gingras, Musil, & Stewart, 2014; Peretz et al., 2013). Important features of such batteries are availability, standardized administration, and up-to-date norms.
(3) Studies are needed that target gene–maturation–environment interactions, which will allow us to understand the complex interactions between pre-existing individual differences in ability and the type and timing of music training. Music-specific databases and standard instruments would contribute to the feasibility of such work.
References
Abdul-Kareem, I. A., Stancak, A., Parkes, L. M., Al-Ameen, M., Alghamdi, J., Aldhafeeri, F. M., … Sluming, V. (2011). Plasticity of the superior and middle cerebellar peduncles in musicians revealed by quantitative analysis of volume and number of streamlines based on diffusion tensor tractography. Cerebellum 10(3), 611–623.
Abdul-Kareem, I. A., Stancak, A., Parkes, L. M., & Sluming, V. (2011). Increased gray matter volume of left pars opercularis in male orchestral musicians correlate positively with years of musical performance. Journal of Magnetic Resonance Imaging 33(1), 24–32.
Altenmüller, E., & Furuya, S. (2016). Brain plasticity and the concept of metaplasticity in skilled musicians. Advances in Experimental Medicine and Biology 957, 197–208.
Amunts, K., Schlaug, G., Jancke, L., Steinmetz, H., Schleicher, A., Dabringhaus, A., & Zilles, K. (1997). Motor cortex and hand motor skills: Structural compliance in the human brain. Human Brain Mapping 5(3), 206–215.
Andersen, R. A., & Cui, H. (2009). Intention, action planning, and decision making in parietal-frontal circuits. Neuron 63(5), 568–583.
Baer, L., Park, M., Bailey, J., Chakravarty, M., Li, K., & Penhune, V. (2015). Regional cerebellar volumes are related to early musical training and finger tapping performance. NeuroImage 109, 130–139.
Bailey, J. A., & Penhune, V. B. (2010). Rhythm synchronization performance and auditory working memory in early- and late-trained musicians. Experimental Brain Research 204(1), 91–101.
Bailey, J. A., & Penhune, V. B. (2012). A sensitive period for musical training: Contributions of age of onset and cognitive abilities. Annals of the New York Academy of Sciences 1252, 163–170.
Bailey, J. A., & Penhune, V. B. (2013). The relationship between the age of onset of musical training and rhythm synchronization performance: Validation of sensitive period effects. Frontiers in Neuroscience 7, 227. Retrieved from https://doi.org/10.3389/fnins.2013.00227
Bailey, J. A., Zatorre, R. J., & Penhune, V. B. (2014). Early musical training is linked to gray matter structure in the ventral premotor cortex and auditory-motor rhythm synchronization performance. Journal of Cognitive Neuroscience 26(4), 755–767.
Balsters, J. H., Whelan, C. D., Robertson, I. H., & Ramnani, N. (2013). Cerebellum and cognition: Evidence for the encoding of higher order rules. Cerebral Cortex 23(6), 1433–1443.
Bangert, M., Peschel, T., Schlaug, G., Rotte, M., Drescher, D., Hinrichs, H., … Altenmüller, E. (2006). Shared networks for auditory and motor processing in professional pianists: Evidence from fMRI conjunction. NeuroImage 30(3), 917–926.
Bao, S., Chan, V. T., & Merzenich, M. M. (2001). Cortical remodelling induced by activity of ventral tegmental dopamine neurons. Nature 412(6842), 79–83.
Bengtsson, S., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience 8(9), 1148–1150.
Bermudez, P., Lerch, J. P., Evans, A. C., & Zatorre, R. J. (2009). Neuroanatomical correlates of musicianship as revealed by cortical thickness and voxel-based morphometry. Cerebral Cortex 19(7), 1583–1596.
Blecher, T., Tal, I., & Ben-Shachar, M. (2016). White matter microstructural properties correlate with sensorimotor synchronization abilities. NeuroImage 138, 1–12.
Brown, R. M., Zatorre, R. J., & Penhune, V. B. (2015). Expert music performance: Cognitive, neural, and developmental bases. Progress in Brain Research 217, 57–86.
Butkovic, A., Ullén, F., & Mosing, M. A. (2015). Personality-related traits as predictors of music practice: Underlying environmental and genetic influences. Personality and Individual Differences 74, 133–138.
Chang, E. F., & Merzenich, M. M. (2003). Environmental noise retards auditory cortical development. Science 300(5618), 498–502.
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008). Moving on time: Brain network for auditory-motor synchronization is modulated by rhythm complexity and musical training. Journal of Cognitive Neuroscience 20(2), 226–239.
Chen, J. L., Rae, C., & Watkins, K. E. (2012). Learning to play a melody: An fMRI study examining the formation of auditory-motor associations. NeuroImage 59(2), 1200–1208.
Chiang, M. C., Barysheva, M., Shattuck, D. W., Lee, A. D., Madsen, S. K., Avedissian, C., … Thompson, P. M. (2009). Genetics of brain fiber architecture and intellectual performance. Journal of Neuroscience 29(7), 2212–2224.
D’Ausilio, A., Altenmüller, E., Olivetti Belardinelli, M., & Lotze, M. (2006). Cross-modal plasticity of the motor cortex while listening to a rehearsed musical piece. European Journal of Neuroscience 24(3), 955–958.
Dalla Bella, S., Farrugia, N., Benoit, C. E., Begel, V., Verga, L., Harding, E., & Kotz, S. A. (2017). BAASTA: Battery for the Assessment of Auditory Sensorimotor and Timing Abilities. Behavior Research Methods 49(3), 1128–1145.
David, S. V., Fritz, J. B., & Shamma, S. A. (2012). Task reward structure shapes rapid receptive field plasticity in auditory cortex. Proceedings of the National Academy of Sciences 109(6), 2144–2149.
de Manzano, O., & Ullén, F. (2018). Same genes, different brains: Neuroanatomical differences between monozygotic twins discordant for musical training. Cerebral Cortex 28(1), 387–394.
de Villers-Sidani, E., Chang, E. F., Bao, S., & Merzenich, M. M. (2007). Critical period window for spectral tuning defined in the primary auditory cortex (A1) in the rat. Journal of Neuroscience 27(1), 180–189.
Diedrichsen, J., Balsters, J. H., Flavell, J., Cussans, E., & Ramnani, N. (2009). A probabilistic MR atlas of the human cerebellum. NeuroImage 46(1), 39–46.
Diedrichsen, J., Shadmehr, R., & Ivry, R. B. (2010). The coordination of movement: Optimal feedback control and beyond. Trends in Cognitive Sciences 14(1), 31–39.
E, K. H., Chen, S. H., Ho, M. H., & Desmond, J. E. (2014). A meta-analysis of cerebellar contributions to higher cognition from PET and fMRI studies. Human Brain Mapping 35(2), 593–615.
Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased cortical representation of the fingers of the left hand in string players. Science 270(5234), 305–307.
Engel, A., Hijmans, B. S., Cerliani, L., Bangert, M., Nanetti, L., Keller, P. E., & Keysers, C. (2014). Inter-individual differences in audio-motor learning of piano melodies and white matter fiber tract architecture. Human Brain Mapping 35(5), 2483–2497.
Floel, A., Breitenstein, C., Hummel, F., Celnik, P., Gingert, C., Sawaki, L., … Cohen, L. G. (2005). Dopaminergic influences on formation of a motor memory. Annals of Neurology 58(1), 121–130.
Foster, N. E., & Zatorre, R. J. (2010). Cortical structure predicts success in performing musical transformation judgments. NeuroImage 53(1), 26–36.
Gaser, C., & Schlaug, G. (2003). Brain structure differences between musicians and non-musicians. Journal of Neuroscience 23(27), 9240–9245.
Giedd, J., Blumenthal, J., Jeffries, N., Castellanos, F., Liu, H., Zijdenbos, A., … Rapoport, J. (1999). Brain development during childhood and adolescence: A longitudinal MRI study. Nature Neuroscience 2(10), 861–863.
Gogos, A., Gavrilescu, M., Davison, S., Searle, K., Adams, J., Rossell, S. L., … Egan, G. F. (2010). Greater superior than inferior parietal lobule activation with increasing rotation angle during mental rotation: An fMRI study. Neuropsychologia 48(2), 529–535.
Golestani, N., Molko, N., Dehaene, S., LeBihan, D., & Pallier, C. (2007). Brain structure predicts the learning of foreign speech sounds. Cerebral Cortex 17(3), 575–582.
Golestani, N., Paus, T., & Zatorre, R. (2002). Anatomical correlates of learning novel speech sounds. Neuron 35, 997–1010.
Graybiel, A. M., & Grafton, S. T. (2015). The striatum: Where skills and habits meet. Cold Spring Harbor Perspectives in Biology 7(8), a021691. doi:10.1101/cshperspect.a021691
Habibi, A., Damasio, A., Ilari, B., Veiga, R., Joshi, A. A., Leahy, R. M., … Damasio, H. (2017). Childhood music training induces change in micro and macroscopic brain structure: Results from a longitudinal study. Cerebral Cortex 1–12. doi:10.1093/cercor/bhx286
Haith, A. M., & Krakauer, J. W. (2013). Model-based and model-free mechanisms of human motor learning. Advances in Experimental Medicine and Biology 782, 1–21.
Halwani, G. F., Loui, P., Ruber, T., & Schlaug, G. (2011). Effects of practice and experience on the arcuate fasciculus: Comparing singers, instrumentalists, and non-musicians. Frontiers in Psychology 2, 156. Retrieved from https://doi.org/10.3389/fpsyg.2011.00156
Herholz, S. C., Boh, B., & Pantev, C. (2011). Musical training modulates encoding of higher-order regularities in the auditory cortex. European Journal of Neuroscience 34(3), 524–529.
Herholz, S. C., & Zatorre, R. (2012). Musical training as a framework for brain plasticity: Behavior, function, and structure. Neuron 76(3), 486–502.
Hund-Georgiadis, M., & von Cramon, D. (1999). Motor-learning-related changes in piano players and non-musicians revealed by functional magnetic-resonance signals. Experimental Brain Research 125(4), 417–425.
Hutchinson, S., Lee, L. H., Gaab, N., & Schlaug, G. (2003). Cerebellar volume of musicians. Cerebral Cortex 13(9), 943–949.
Hyde, K. L., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A. C., & Schlaug, G. (2009). Musical training shapes structural brain development. Journal of Neuroscience 29(10), 3019–3025.
Ireland, K., Parker, A., Foster, N., & Penhune, V. (in press). Rhythm and melody tasks for school-aged children with and without musical training: Age-equivalent scores and reliability. Frontiers in Auditory Cognitive Neuroscience.
Ivry, R. B., Spencer, R. M., Zelaznik, H. N., & Diedrichsen, J. (2002). The cerebellum and event timing. Annals of the New York Academy of Sciences 978, 302–317.
James, C. E., Oechslin, M. S., Van De Ville, D., Hauert, C. A., Descloux, C., & Lazeyras, F. (2014). Musical training intensity yields opposite effects on grey matter density in cognitive versus sensorimotor networks. Brain Structure & Function 219(1), 353–366.
Jorgensen, H. (2011). Instrumental learning: Is an early start a key to success? British Journal of Music Education 18(3), 227–239.
Karpati, F. J., Giacosa, C., Foster, N. E. V., Penhune, V. B., & Hyde, K. L. (2017). Dance and music share gray matter structural correlates. Brain Research 1657, 62–73.
Kelly, R., & Strick, P. (2003). Cerebellar loops with motor cortex and prefrontal cortex of a nonhuman primate. Journal of Neuroscience 23(23), 8432–8444.
Kleim, J. A., Barnaby, S., Cooper, N., Hogg, T., Reidel, C., Remple, M., & Nudo, R. (2002). Motor learning-dependent synaptogenesis is localized to functionally reorganized motor cortex. Neurobiology of Learning and Memory 77(1), 63–77.
Kleim, J. A., Freeman, J. H., Jr., Bruneau, R., Nolan, B. C., Cooper, N. R., Zook, A., & Walters, D. (2002). Synapse formation is associated with memory storage in the cerebellum. Proceedings of the National Academy of Sciences 99(20), 13228–13231.
Kleim, J. A., Hogg, T., VandenBerg, P., Cooper, N., Bruneau, R., & Remple, M. (2004). Cortical synaptogenesis and motor map reorganization occur during late, but not early, phase of motor skill learning. Journal of Neuroscience 24(3), 628–633.
Kolb, B., Mychasiuk, R., Muhammad, A., Li, Y., Frost, D. O., & Gibb, R. (2012). Experience and the developing prefrontal cortex. Proceedings of the National Academy of Sciences 109(Suppl. 2), 17186–17193.
Koziol, L. F., Budding, D., Andreasen, N., D’Arrigo, S., Bulgheroni, S., Imamizu, H., … Yamazaki, T. (2014). Consensus paper: The cerebellum’s role in movement and cognition. Cerebellum 13(1), 151–177.
Lahav, A., Saltzman, E., & Schlaug, G. (2007). Action representation of sound: Audiomotor recognition network while listening to newly acquired actions. Journal of Neuroscience 27(2), 308–314.
Lakhani, B., Borich, M. R., Jackson, J. N., Wadden, K. P., Peters, S., Villamayor, A., … Boyd, L. A. (2016). Motor skill acquisition promotes human brain myelin plasticity. Neural Plasticity 2016, 7526135. doi:10.1155/2016/7526135
Landi, S. M., Baguear, F., & Della-Maggiore, V. (2011). One week of motor adaptation induces structural changes in primary motor cortex that predict long-term memory one year later. Journal of Neuroscience 31(33), 11808–11813.
Lappe, C., Herholz, S. C., Trainor, L. J., & Pantev, C. (2008). Cortical plasticity induced by short-term unimodal and multimodal musical training. Journal of Neuroscience 28(39), 9632–9639.
Lee, H., & Noppeney, U. (2011). Long-term music training tunes how the brain temporally binds signals from multiple senses. Proceedings of the National Academy of Sciences 108(51), E1441–E1450.
Lee, H., & Noppeney, U. (2014). Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech, and music. Frontiers in Psychology 5, 868. Retrieved from https://doi.org/10.3389/fpsyg.2014.00868
Lega, C., Stephan, M. A., Zatorre, R. J., & Penhune, V. (2016). Testing the role of dorsal premotor cortex in auditory-motor association learning using transcranical magnetic stimulation (TMS). PLoS ONE 11(9), e0163380.
Li, X., De Beuckelaer, A., Guo, J., Ma, F., Xu, M., & Liu, J. (2014). The gray matter volume of the amygdala is correlated with the perception of melodic intervals: a voxel-based morphometry study. PLoS ONE 9(6), e99889.
Macnamara, B. N., Hambrick, D. Z., & Oswald, F. L. (2014). Deliberate practice and performance in music, games, sports, education, and professions: A meta-analysis. Psychological Science 25(8), 1608–1618.
Mas-Herrero, E., Marco-Pallares, J., Lorenzo-Seva, U., Zatorre, R. J., & Rodriguez-Fornells, A. (2011). Individual differences in music reward experiences. Music Perception 31(2), 118–138.
Mosing, M. A., Madison, G., Pedersen, N. L., Kuja-Halkola, R., & Ullén, F. (2014). Practice does not make perfect: No causal effect of music practice on music ability. Psychological Science 25(9), 1795–1803.
Mullensiefen, D., Gingras, B., Musil, J., & Stewart, L. (2014). The musicality of non-musicians: An index for assessing musical sophistication in the general population. PLoS ONE 9(2), e89642.
Novembre, G., & Keller, P. E. (2014). A conceptual review on action-perception coupling in the musicians’ brain: What is it good for? Frontiers in Human Neuroscience 8, 603. Retrieved from https://doi.org/10.3389/fnhum.2014.00603
Nudo, R., Milliken, G., Jenkins, W., & Merzenich, M. (1996). Use-dependent alterations of movement representations in primary motor cortex of adult squirrel monkeys. Journal of Neuroscience 16(2), 785–807.
Paquette, S., Fujii, S., Li, H. C., & Schlaug, G. (2017). The cerebellum’s contribution to beat interval discrimination. NeuroImage 163, 177–182.
Peretz, I., Gosselin, N., Nan, Y., Caron-Caplette, E., Trehub, S. E., & Beland, R. (2013). A novel tool for evaluating children’s musical abilities across age and culture. Frontiers in Systems Neuroscience 7, 30. Retrieved from https://doi.org/10.3389/fnsys.2013.00030
Poeppel, D. (2014). The neuroanatomic and neurophysiological infrastructure for speech and language. Current Opinion in Neurobiology 28, 142–149.
Putkinen, V., Tervaniemi, M., Saarikivi, K., Ojala, P., & Huotilainen, M. (2014). Enhanced development of auditory change detection in musically trained school-aged children: A longitudinal event-related potential study. Developmental Science 17(2), 282–297.
Ragert, P., Schmidt, A., Altenmüller, E., & Dinse, H. (2004). Superior tactile performance and learning in professional pianists: Evidence for meta-plasticity in musicians. European Journal of Neuroscience 19(2), 473–478.
Rauschecker, J. (2011). An expanded role for the dorsal auditory pathway in sensorimotor control and integration. Hearing Research 271, 16–25.
Repp, B. (2005). Sensorimotor synchronization: A review of the tapping literature. Psychonomic Bulletin and Review 12(6), 969–992.
Rosenkranz, K., Williamon, A., & Rothwell, J. C. (2007). Motorcortical excitability and synaptic plasticity is enhanced in professional musicians. Journal of Neuroscience 27(19), 5200–5206.
Salimpoor, V. N., Zald, D. H., Zatorre, R. J., Dagher, A., & McIntosh, A. R. (2015). Predictions and the brain: How musical sounds become rewarding. Trends in Cognitive Sciences 19(2), 86–91.
Schlaug, G., Jancke, L., Huang, Y., Staiger, J. F., & Steinmetz, H. (1995). Increased corpus callosum size in musicians. Neuropsychologia 33(8), 1047–1055.
Schmahmann, J. (1997). The cerebrocerebellar system. In J. Schmahmann (Ed.), The Cerebellum and Cognition (Vol. 41, pp. 31–55). San Diego, CA: Academic Press.
Schneider, P., Scherg, M., Dosch, H., Specht, H., Gutschalk, A., & Rupp, A. (2002). Morphology of Heschl’s gyrus reflects enhanced activation in the auditory cortex of musicians. Nature Neuroscience 5(7), 688–694.
Schneider, P., Sluming, V., Roberts, N., Scherg, M., Goebel, R., Specht, H. J., … Rupp, A. (2005). Structural and functional asymmetry of lateral Heschl’s gyrus reflects pitch perception preference. Nature Neuroscience 8(9), 1241–1247.
Scholz, J., Allemang-Grand, R., Dazai, J., & Lerch, J. P. (2015). Environmental enrichment is associated with rapid volumetric brain changes in adult mice. NeuroImage 109, 190–198.
Scholz, J., Klein, M. C., Behrens, T. E., & Johansen-Berg, H. (2009). Training induces changes in white-matter architecture. Nature Neuroscience 12(11), 1370–1371.
Scholz, J., Niibori, Y., Frankland, P. W., & Lerch, J. P. (2015). Rotarod training in mice is associated with changes in brain structure observable with multimodal MRI. NeuroImage 107, 182–189.
Seither-Preisler, A., Parncutt, R., & Schneider, P. (2014). Size and synchronization of auditory cortex promotes musical, literacy, and attentional skills in children. Journal of Neuroscience 34(33), 10937–10949.
Sluming, V., Barrick, T., Howard, M., Cezayirli, E., Mayes, A., & Roberts, N. (2002). Voxel-based morphometry reveals increased gray matter density in Broca’s area in male symphony orchestra musicians. NeuroImage 17(3), 1613–1622.
Sokolov, A. A., Miall, R. C., & Ivry, R. B. (2017). The cerebellum: Adaptive prediction for movement and cognition. Trends in Cognitive Sciences 21(5), 313–332.
Steele, C. J., Bailey, J. A., Zatorre, R. J., & Penhune, V. B. (2013). Early musical training and white-matter plasticity in the corpus callosum: Evidence for a sensitive period. Journal of Neuroscience 33(3), 1282–1290.
Stephan, M. A., Brown, R., Lega, C., & Penhune, V. (2016). Melodic priming of motor sequence performance: The role of the dorsal premotor cortex. Frontiers in Neuroscience 10, 210. Retrieved from https://www.frontiersin.org/articles/10.3389/fnins.2016.00210
Taubert, M., Draganski, B., Anwander, A., Muller, K., Horstmann, A., Villringer, A., & Ragert, P. (2010). Dynamic properties of human brain structure: Learning-related changes in cortical areas and associated fiber connections. Journal of Neuroscience 30(35), 11670–11677.
Tiemeier, H., Lenroot, R. K., Greenstein, D. K., Tran, L., Pierson, R., & Giedd, J. N. (2010). Cerebellum development during childhood and adolescence: A longitudinal morphometric MRI study. NeuroImage 49(1), 63–70.
Tremblay, P. L., Bedard, M. A., Langlois, D., Blanchet, P. J., Lemay, M., & Parent, M. (2010). Movement chunking during sequence learning is a dopamine-dependent process: A study conducted in Parkinson’s disease. Experimental Brain Research 205(3), 375–385.
Tremblay, P. L., Bedard, M. A., Levesque, M., Chebli, M., Parent, M., Courtemanche, R., & Blanchet, P. J. (2009). Motor sequence learning in primate: Role of the D2 receptor in movement chunking during consolidation. Behavioural Brain Research 198(1), 231–239.
Ullén, F., Hambrick, D. Z., & Mosing, M. A. (2016). Rethinking expertise: A multifactorial gene–environment interaction model of expert performance. Psychological Bulletin 142(4), 427–446.
Vaquero, L., Hartmann, K., Ripolles, P., Rojo, N., Sierpowska, J., Francois, C., … Altenmüller, E. (2016). Structural neuroplasticity in expert pianists depends on the age of musical training onset. NeuroImage 126, 106–119.
Vaquero, L., Rodriguez-Fornells, A., & Reiterer, S. M. (2017). The left, the better: White-matter brain integrity predicts foreign language imitation ability. Cerebral Cortex 27(8), 3906–3917.
Westerhausen, R., Luders, E., Specht, K., Ofte, S. H., Toga, A. W., Thompson, P. M., … Hugdahl, K. (2011). Structural and functional reorganization of the corpus callosum between the age of 6 and 8 years. Cerebral Cortex 21(5), 1012–1017.
Wolpert, D. M., Diedrichsen, J., & Flanagan, J. R. (2011). Principles of sensorimotor learning. Nature Reviews Neuroscience 12(12), 739–751.
Wong, P. C., Warrier, C. M., Penhune, V. B., Roy, A. K., Sadehh, A., Parrish, T. B., & Zatorre, R. J. (2008). Volume of left Heschl’s gyrus and linguistic pitch learning. Cerebral Cortex 18(4), 828–836.
Zatorre, R. J., Chen, J., & Penhune, V. (2007). When the brain plays music: Sensory-motor interactions in music perception and production. Nature Reviews Neuroscience 8, 547–558.
Zatorre, R. J., Fields, R. D., & Johansen-Berg, H. (2012). Plasticity in gray and white: Neuroimaging changes in brain structure during learning. Nature Neuroscience 15(4), 528–536.
CHAPTER 18
GENOMICS APPROACHES FOR STUDYING MUSICAL APTITUDE AND RELATED TRAITS
IRMA JÄRVELÄ
Genomic Approaches to Study Human Traits
Each cell in the human body contains 46 chromosomes, made up of ~3 billion nucleotides containing about 20,000 individual genes (Dixon-Salazar & Gleeson, 2010). Of these 20,000 genes, the functions of 4,000 have been uncovered to date (http://www.omim.org/). About 1.5 percent of the genome encodes amino acids that form the proteins, the building blocks of human tissues and organs. The human cerebral cortex is made up of ~20 billion neurons, each of which makes an average of 7,000 synaptic contacts (Dixon-Salazar & Gleeson, 2010). Compared to other primates, the human brain exhibits higher expression of genes for synaptic transmission and plasticity, and higher energy metabolism (Cáceres et al., 2003). Genomic approaches enable the study of biological phenomena in an unbiased and hypothesis-free fashion, without prior knowledge of the biological background of the phenotype of interest (Lander, 2011). Molecular genetic analyses can be applied to study human traits based on their molecular properties rather than anatomic regions. Next-generation sequencing technology has facilitated the identification of individual genetic variants (“genetic selfies”) at decreasing cost (Lindor, Thibodeau, & Burke, 2017). This has been exemplified in medical research, where thousands of genes that cause inherited diseases or predisposition to common diseases have been identified. Molecular genetic studies are based on Mendelian rules: children inherit half of their genes from their mother and half from their father. The inherited variants remain the same throughout life. This is the unique strength of DNA studies in the identification of genetic variants associated with human traits. Using statistical methods, genetic loci and alleles that are associated with the trait under study can be identified in the human genome. Genes located in the associated regions, together with their pathways, are the candidate genes whose functions may explain the biological characteristics of the trait under study. Environmental factors (lifestyle) can affect the expression and regulation of genes. The effect of environmental triggers on the expression and regulation of genes can be studied, for example, by RNA- and microRNA-sequencing in humans and model organisms. Methods of genomics and bioinformatics can be applied to combine the data to identify genes and alleles, their regulation, and the pathways linked to musical aptitude and music-related behavioral traits (e.g., music education, listening, performing, creating music; see Fig. 1).
FIGURE 1. The mode of inheritance of human traits spans from monogenic, that is, caused by a single gene, to multifactorial inheritance caused by numerous predisposing variants and environmental factors. Based on genetic and genomic studies, musical aptitude is inherited as a multifactorial trait for which both predisposing genetic variants and exposure to music as an environmental factor are needed (Oikkonen et al., 2015; Park et al., 2012; Pulli et al., 2008).
M
A
B T
Musical practices represent distinctive cognitive abilities of humans. In biological (genetic) terms, musical aptitude is a complex cognitive trait that involves the auditory pathway (inner ear, brainstem, auditory cortex) and several brain regions. Music is sound that is detected by hair cells in the inner ear and transmitted as electrical signals through the midbrain to the auditory cortex. About 1 percent of all human genes have a function in hearing; of them, at least 80 are known to cause hearing loss (http://hereditaryhearingloss.org/) (Atik, Bademci, Diaz-Horta, Blanton, & Tekin, 2017). The brain is naturally very sensitive to environmental exposure to music (Perani et al., 2010) and to music training (see, e.g., Herholz & Zatorre, 2012; Koelsch, 2010). This sensitivity is age-dependent, as it is for language (Penhune & de Villers-Sidani, 2014; White, Hutka, Williams, & Moreno, 2013) and for vocal learning in songbirds (Rothenberg, Roeske, Voss, Naguib, & Tchernichovski, 2014). The sensitivity may be linked to the emotional content characteristic of musical sounds, which affect human body functions (Nakahara, Masuko, Kinoshita, Francis, & Furuya, 2011; Salimpoor, Benovoy, Larcher, Dagher, & Zatorre, 2011). However, the molecular mechanisms and biological pathways mediating the effects of music remain largely unknown.
It may be that the ability to detect musical sounds serves as a prerequisite for appreciating music. This ability is called musical aptitude in this chapter. Musical aptitude can include, for example, the abilities to perceive and understand intensity, pitch, timbre, and tone duration, and the rhythm and structure they form in music. Carl Seashore developed a battery of six subtests that measure pitch, intensity, time, consonance, tonal memory, and rhythm (Seashore, Lewis, & Saetveit, 1960). The Seashore tests for pitch (SP) and for time (ST) consist of pair-wise comparisons of the physical properties of sound and are used to measure simple sensory capacities, such as the ability to detect small differences in tone pitch or length. Karma (1994) developed a music test (KMT) that measures recognition of the structure of music, including recognition of melodic contour, grouping, and relational pitch processing. Auditory structuring ability can be defined as the ability to detect sound patterns in time (Karma, 1994). A similar kind of pattern recognition, resembling the gestalt principles at work in the recognition of music structure, is found in many other fields, such as sport and poetry (comprising language and speech) (Justus & Hutsler, 2005).
In zebra finches, the acoustic features of song syllables (pitch and timbre) and the species-specific typical gap durations (rhythm) between song syllables are detected by different neurons (Araki, Bandi, & Yazaki-Sugiyama, 2016). Temporal coding of inter-syllable silent gaps seems to be preserved when birds are exposed to different song environments, suggesting that temporal gap coding is innate and species-specific, whereas syllable morphology coding is more experience-dependent (Araki et al., 2016). The detection of gaps resembles the detection of pauses in music structure in humans. This is related to following tones over time, which evokes anticipatory responses because of the cognitive expectations and prediction cues involved in listening to music (Salimpoor, Zald, Zatorre, Dagher, & McIntosh, 2015).
Combined music test scores (KMT, SP, ST) were normally distributed among participants with no specific music education (Oikkonen & Järvelä, 2014), suggesting that the ability to detect pitch, time, and sound patterns is common in populations with no music training. Abilities that animals exhibit without the need for training are referred to as innate traits. The possession of a natural musical ability may explain why musical practices are common and present in all societies.
It has been observed that musicianship clusters in families. How much of this aggregation is due to genetic factors and how much to environmental factors, such as exposure to music? Several studies have analyzed the inheritance of musical traits. In a twin study using the Distorted Tunes Test (DTT), in which the subjects’ task was to recognize wrong tones incorporated into simple popular melodies, the correlation between the test scores was 0.67 in monozygotic and 0.44 in dizygotic twins (Drayna, Manichaikul, de Lange, & Snieder, 2001). The heritability (defined as the proportion of the total variance of the phenotype that is genetic, h² = VG/VP, where VG is the genetic variance and VP is the overall variance of the phenotype) of the auditory structuring ability test (Karma Music Test, KMT) was 0.46 in the Finnish families examined (Oikkonen et al., 2015). Carl Seashore’s subtests of pitch (SP) and time discrimination (ST) measure the ability to detect small differences between two sequentially presented tones. Their heritability estimates were 0.68 for SP and 0.21 for ST, and the heritability of the combined KMT, SP, and ST score (COMB) was 0.60 (Oikkonen et al., 2015). The heritability of the pitch perception accuracy (PPA) test, which is based on singing, was 40 percent (Park, Lee, Kim, & Ju, 2012).
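The relation between twin correlations and heritability can be illustrated with a minimal sketch. It assumes Falconer's formula, h² ≈ 2(rMZ − rDZ), a textbook approximation that is not the estimation method of the studies cited above, so its output should not be read as a published estimate. The function names are hypothetical.

```python
def heritability_from_variances(v_g: float, v_p: float) -> float:
    """h^2 = V_G / V_P: the share of phenotypic variance that is genetic."""
    if v_p <= 0:
        raise ValueError("phenotypic variance must be positive")
    return v_g / v_p


def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Rough h^2 from twin correlations (Falconer's formula, assumed here)."""
    h2 = 2.0 * (r_mz - r_dz)
    # Clamp to [0, 1], since sampling noise can push the raw value outside it.
    return max(0.0, min(1.0, h2))


# Distorted Tunes Test correlations quoted in the text (Drayna et al., 2001).
dtt_estimate = falconer_heritability(r_mz=0.67, r_dz=0.44)
print(f"Falconer approximation for the DTT: {dtt_estimate:.2f}")
```

Model-based analyses of twin and family data can give different values from this approximation, because they may also model shared environment and measurement error.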
A genetic component has also been demonstrated in rare music phenotypes such as congenital amusia (Peretz, Cummings, & Dube, 2007) and absolute pitch (AP) (Baharloo, Service, Risch, Gitschier, & Freimer, 2000). Congenital amusia, often referred to as “tone deafness,” is a disorder in which a subject’s ability to perceive or produce music is disturbed. A family aggregation study estimated the sibling relative risk (λS) to be 10.8, which suggests a genetic contribution to the trait (Peretz et al., 2007). Another extreme trait is absolute pitch, the ability to identify and name pitches without a reference pitch, for which the sibling relative risk (λS) has been estimated to range from 7.8 to 15.1 (Baharloo et al., 2000). In fact, music perception belongs to a class of human cognitive abilities that has been shown to be highly familial. In the Finnish families, 52 percent of the professional musicians had one or both parents who were also professional musicians (Fig. 2).
FIGURE 2. Parental music education is related to children’s music education. High music education is common among parents of professional musicians (n = 100). Reproduced from Irma Järvelä, Genomics studies on musical aptitude, music perception, and practice, Annals of the New York Academy of Sciences, Special Issue: The Neurosciences and Music 6, p. 2, Figure 1, doi:10.1111/nyas.13620, Copyright © 2018, New York Academy of Sciences.
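The sibling relative risk (λS) quoted above for congenital amusia and absolute pitch is the ratio of the recurrence risk in siblings of affected individuals to the prevalence in the general population. The sketch below shows the arithmetic only; the counts and the prevalence are hypothetical placeholders, not data from the cited studies, and the function name is an assumption.

```python
def sibling_relative_risk(affected_siblings: int, total_siblings: int,
                          population_prevalence: float) -> float:
    """lambda_S = (risk among siblings of affected people) / (population prevalence)."""
    if total_siblings <= 0:
        raise ValueError("total_siblings must be positive")
    if not 0.0 < population_prevalence <= 1.0:
        raise ValueError("population_prevalence must be in (0, 1]")
    sibling_risk = affected_siblings / total_siblings
    return sibling_risk / population_prevalence


# Hypothetical illustration: 12 of 40 siblings affected, 3 percent prevalence.
lambda_s = sibling_relative_risk(affected_siblings=12, total_siblings=40,
                                 population_prevalence=0.03)
print(f"lambda_S = {lambda_s:.1f}")
```

A value well above 1 indicates familial clustering, although shared family environment can contribute to it as well as shared genes.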
E
M
A
Evolution is based on genetic alleles that are transmitted from generation to generation. Music cultures can develop in diverse directions, but they remain linked to the alleles in the human genome that underlie biologically determined human traits. Favorable alleles are enriched in the gene pool, showing high allele frequencies associated with the beneficial trait, whereas damaging alleles that cause harmful effects tend to disappear from the gene pool. The universality of music in all societies suggests that beneficial alleles do underlie music-related behavior. However, it is not known what distinguishes humans from other primates with regard to musical ability, or what the biological determinants underlying artistic cognitive traits are. It is notable that modern humans have an auditory center that functions identically to that of the first primates that lived millions of years ago (Parker et al., 2013). Adaptive convergent sequence evolution has also been found in echolocating bats and dolphins (Montealegre-Z, Jonsson, Robson-Brown, Postles, & Robert, 2012), implying that numerous genes are linked not only to hearing but also to vision. Interestingly, several birdsong genes were shown to be upregulated by listening to and performing music (Guo et al., 2007; Horita et al., 2012; Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015; Pfenning et al., 2014). These data suggest that the machinery that facilitates the hearing of sounds is highly conserved. It enables communication via sounds, which is important for the survival of humans and other species. Vocal learning in songbirds shows features similar to those found in humans (Araki et al., 2016). Recent studies on zebra finches have shown that two different types of neurons in the auditory cortex register song syllables (Araki et al., 2016).
One type identifies the acoustic features of song syllables (pitch and timbre), which are affected more by the environment, whereas the other type detects the species-specific typical gap durations (rhythm) between song syllables, which are preserved (Araki et al., 2016).
Advanced cognitive abilities are characteristic of humans and are likely to be the recent result of positive selection (Sabeti et al., 2006). For example, FOXP2, which has been implicated in human speech and language, has been under positive selection during recent human evolution (Enard et al., 2002). As genetic evolution is much slower than cultural evolution, we and others (Honing, Ten Cate, Peretz, & Trehub, 2015) hypothesize that the genetic variants associated with musical aptitude have a pivotal role in the development of music culture. In comparison, in songbirds, the evolution of song culture is the result of a multigenerational process in which the song is developed by vertical transmission in a species-specific fashion, suggesting genetic constraints (Lipkind & Tchernichovski, 2011). This emphasizes the importance of the selection of parental singing skills and their genetic background in evolution.
According to the Mendelian rules, half of each parent’s genes are transmitted directly to the offspring. In fact, the genetic component is larger, because the other half of the parental genes, which is not transmitted to the children, shapes parental behavior and thereby affects the children’s development. Concordantly, Hambrick et al. (2014) have shown that training in music accounts for about 30 percent of music performance in professional musicians, implying that other factors, including genes, have a larger effect. In a Swedish twin study, it was found that willingness to practice music is an independent personality trait that has a high heritability (40–70 percent) (Mosing, Madison, Pedersen, Kuja-Halkola, & Ullén, 2014). These results point to a greater and independent role of genetic factors contributing to music perception and practice.
Genomic approaches can be used to identify the regions of positive selection in the human genome. Variations in the music test scores of auditory structuring ability (Karma Music Test; KMT) and Carl Seashore’s subtests of pitch (SP) and time discrimination (ST) suggest that the underlying alleles may have been targets of selection.
When three methods for detecting selection, the haplotype-based methods haploPS and XP-EHH and the allele frequency-based method FST, were applied to the combined phenotype of the three music test scores described earlier (COMB), hundreds of genes were found in the selection regions (Liu et al., 2016). Several of them were known to be involved in auditory perception and inner ear development (DICER1, FGF20, CUX1, SPARC, KIF3A, TGFB3, LGR5, GPR98, PAX8, COL11A1, USH2A, PROX1). The findings are consistent with the convergent evolution of genes related to auditory processes and communication in other species (Montealegre-Z et al., 2012; Parker et al., 2013; Zhang et al., 2014). Some genes were known to affect cognition and memory (e.g., GRIN2B, IL1A, IL1B, RAPGEF5) and reward mechanisms (RGS9). Interestingly, several genes were linked to song perception and production in songbirds (e.g., FOXP1, RGS9, GPR98, GRIN2B, VLDLR). Of these, GPR98, which is expressed in the song control nuclei of the vocalizing songbird (zebra finch), has been found to be under positive selection in the songbird lineage (Pfenning et al., 2014).
Some hypotheses can be constructed based on previous biological knowledge about the identified genes. FOXP2 has been implicated in an inherited language disorder (Lai, Fisher, Hurst, Vargha-Khadem, & Monaco, 2001) that causes disturbances in the ability to detect timing (rhythm) but not pitch in music (Alcock, Passingham, Watkins, & Vargha-Khadem, 2000). This is concordant with the different brain cells that are responsible for pitch and timing in songbirds (Araki et al., 2016). FOXP1 and another candidate gene, VLDLR, a direct target gene of human FOXP2 (Ayub et al., 2013; Vernes et al., 2007), belong to the singing-regulated gene networks in the zebra finch. VLDLR, the very-low-density lipoprotein receptor, is a member of the Reelin pathway, which affects learned vocalization (Hilliard, Miller, Fraley, Horvath, & White, 2012). GRIN2B is associated with learning, brain plasticity, and cognitive performance in humans (Kauppi, Nilsson, Adolfsson, Eriksson, & Nyberg, 2011) and belongs to the ten prioritized genes in the convergent analysis of musical traits in animals and humans (Oikkonen, Onkamo, Järvelä, & Kanduri, 2016). RGS9 is expressed in the striatum and belongs to the regulator of G-protein signaling (RGS) gene family, which plays a key role in regulating the intracellular signaling of G-protein coupled receptors, such as dopamine receptors. The data support previous findings on the role of the dopaminergic pathway and its link to the reward mechanism as molecular determinants in the positive selection of music (Salimpoor et al., 2011).
This preliminary study identified a large number of functionally relevant candidate genes that may underlie the evolution of music. Further studies may give a more accurate picture once methods to analyze polygenic selection become available (Qian, Deng, Lu, & Xu, 2013).
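To make the idea of an allele frequency-based selection statistic concrete, the following minimal sketch computes Wright's FST for one biallelic variant in two groups as (HT − HS)/HT, where HS is the mean within-group expected heterozygosity and HT is the expected heterozygosity of the pooled group. Equal group weights, the allele frequencies, and the function names are assumptions made for illustration; the exact estimator and comparison groups used by Liu et al. (2016) are not reproduced here.

```python
def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst_two_groups(p1: float, p2: float) -> float:
    """Wright's F_ST = (H_T - H_S) / H_T for two equally weighted groups."""
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0  # mean within-group
    h_t = heterozygosity((p1 + p2) / 2.0)                  # pooled group
    if h_t == 0.0:
        return 0.0  # the variant is fixed for the same allele in both groups
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies, chosen only to show the range of the statistic.
for p1, p2 in [(0.5, 0.5), (0.8, 0.2), (1.0, 0.0)]:
    print(f"p1={p1:.1f}, p2={p2:.1f} -> F_ST={fst_two_groups(p1, p2):.2f}")
```

Values near 0 mean the groups share similar allele frequencies, whereas values near 1 mean strong differentiation, which is the kind of signal that selection scans look for across many variants.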
G
-W A
L M
A T
To identify genetic markers associated with a trait such as musical aptitude, the phenotype must first be defined. As musical aptitude is a complex cognitive trait, it is likely that its individual components have distinct molecular backgrounds. Each of these components (subphenotypes) can be analyzed separately, and they can also be combined. In a genome-wide study of musical aptitude, nearly 800 family members were tested for auditory structuring ability (Karma Music Test, KMT) (Karma, 1994) and for perception of pitch and time in music (Seashore et al., 1960), and a combined score of the three tests (COMB) was calculated. When the family material was analyzed for 660,000 genetic markers, several associated loci were found in the human genome (Oikkonen et al., 2015). The identified loci contained candidate genes that affect inner ear development and neurocognitive processes, which are necessary for music perception. The highest probability of linkage was obtained at 4q22 (Oikkonen et al., 2015). Earlier, chromosome 4q22 had been identified in a smaller set of families using a microsatellite marker scan (Pulli et al., 2008). The strongest association (in unrelated subjects) was found upstream of GATA binding protein 2 (GATA2) at chromosome 3q21.3. GATA2 is a relevant candidate gene, as it regulates the development of cochlear hair cells (Haugas, Lilleväli, Hakanen, & Salminen, 2010) and the inferior colliculus (IC) (Lahti, Achim, & Partanen, 2013), which is important in tonotopic mapping, that is, the processing of sounds of different frequencies in the brain. Interestingly, GATA2 is abundantly expressed in dopaminergic neurons (Scherzer et al., 2008) that release dopamine during emotional arousal to music (Salimpoor et al., 2011). Several plausible candidate genes were located at 4p14 with the highest probability of linkage in the family study (Oikkonen et al., 2015).
Pitch perception accuracy (SP) was linked to a region next to the protocadherin 7 gene (PCDH7), which is expressed in the cochlear (Lin et al., 2012) and amygdaloid (Hertel, Redies, & Medina, 2012) complexes. PCDH7 is a relevant candidate gene for pitch perception, as it functions in the hair cells of the cochlea that recognize pitches (Gosselin, Peretz, Johnsen, & Adolphs, 2007). The amygdala is the emotional center of the human brain affected by music (Koelsch, 2010). Interestingly, the homologous gene PCDH15 also affects hair cell sensory transduction, and PCDH15 and cadherin type 23 (CDH23; another candidate gene, at chromosome 16) form a tip-link with each other in sensory hair cells (Sotomayor, Weihofen, Gaudet, & Corey, 2012). Moreover, the Pcdha gene cluster was found in the CNV study of musical aptitude (Ukkola-Vuoti et al., 2013). Platelet-derived growth factor receptor alpha polypeptide (PDGFRA) is expressed in the hippocampus (Di Pasquale et al., 2003), which is associated with learning and memory. The potassium channel tetramerisation domain containing 8 (KCTD8) is expressed in the spiral ganglion of the cochlea (Metz, Gassmann, Fakler, Schaeren-Wiemers, & Bettler, 2011). KCTD8 also interacts with the GABA receptors GABRB1 and GABRB2; of these, GABRB1 protein is reduced in schizophrenia, bipolar disorder, and major depression, diseases that severely affect human cognition and mood regulation (Fatemi, Folsom, Rooney, & Thuras, 2013). Cholinergic receptor, nicotinic alpha 9 (neuronal) (CHRNA9) (Katz et al., 2004) and the paired-like homeobox 2b (PHOX2B) (Ousdal et al., 2012) on chromosome 4 also affect inner ear development. In addition, PHOX2B increases amygdala activity and autonomic functions (blood pressure, heart rate, and respiration) that are reported to be affected by music (Blood & Zatorre, 2001).
The genome-wide analyses performed on Mongolian families using the pitch perception accuracy (PPA) test identified a partly shared genetic region on chromosome 4q (Park et al., 2012). The statistically most significant locus found in a genome-wide linkage study of absolute pitch (AP) is located at 8q24.21 (Theusch, Basu, & Gitschier, 2009). The results suggest that musical aptitude is an innate ability that is affected by several predisposing genetic variants (Fig. 1).
Genome-wide copy number variation (CNV) analysis revealed that regions containing candidate genes for neuropsychiatric disorders were associated with musical aptitude (Ukkola-Vuoti et al., 2013). A deletion covering the protocadherin-α gene cluster 1–9 (PCDHA 1–9) was associated with low music test scores (COMB) in both familial and sporadic cases.
PCDHAs affect synaptogenesis and maturation of the serotonergic projections in the brain and Pcdha mutant mice show abnormalities in learning and memory (Katori et al., 2009).
T P
E
M H
P T
Music acts as an environmental trigger. Numerous studies have shown that listening to and performing classical music have an effect on the human body (Blood & Zatorre, 2001; Salimpoor et al., 2011). When genome-wide RNA expression profiles were compared before and after listening to classical music and after a “music-free” control session, the activity of genes involved in dopamine secretion and transport (SNCA, RTN4, and SLC6A8) and in learning and memory (SNCA, NRGN, NPTN, RTN4) was enhanced (Kanduri, Raijas, et al., 2015). Of the affected genes, SNCA (George, Jin, Woods, & Clayton, 1995), NRGN (Wood, Olson, Lovell, & Mello, 2008), and RGS2 affect song learning and singing in songbirds (Clayton, 2000), suggesting a shared evolutionary background of sound perception between vocalizing birds and humans. It is noteworthy that the effect of music was detectable only in musically experienced listeners. The lack of an effect in novices could be explained by differences in the amount of exposure to music, which is known to affect brain structure and function (Elbert, Pantev, Wienbruch, Rockstroh, & Taub, 1995; Gaser & Schlaug, 2003), by unfamiliarity with the music (Salimpoor, Benovoy, Longo, Cooperstock, & Zatorre, 2009), or by musical anhedonia (Martínez-Molina, Mas-Herrero, Rodríguez-Fornells, Zatorre, & Marco-Pallarés, 2016). In addition, listening to music increased the expression of the target genes of the dopaminoceptive neuronal glucocorticoid receptor (NR3C1), which increases the synaptic concentration of dopamine linked to rewarding and reinforcing properties (Ambroggi et al., 2009). It is of note that NR3C1 is also a key molecule in the regulation of addictive behavior.
Music performance by professional musicians involves a wide spectrum of cognitive and multisensory motor skills, whose molecular basis is largely unknown.
The effect of music performance on the genome-wide peripheral blood transcriptome of professional musicians was analyzed by collecting RNA samples before and after a two-hour concert performance and after a “music-free” control session. The upregulated genes were found to affect dopaminergic neurotransmission, motor behavior, neuronal plasticity, and neurocognitive functions including learning and memory. Specifically, performance of music by professional musicians increased the expression of FOS, DUSP1, and SNCA and of genes involved in catecholamine biosynthesis and dopamine metabolism (Kanduri, Kuusi, et al., 2015). Interestingly, SNCA, FOS, and DUSP1 are involved in song perception and production in songbirds. Thus, both listening to and performing music affected partially the same genes as songbird singing. It is noteworthy that although the brains of songbirds are small, they have about twice the neuron density of primate brains of the same mass. Thus, the large number of neurons can contribute to the neural basis of cognitive capacity (Enard, 2016).
In both listening to and performing music (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015), one of the strongest activations was detected in the alpha-synuclein gene (SNCA), which has a physiological role in the development of nerve cells and in the release of neurotransmitters, especially dopamine, from presynaptic cells. Dopamine is responsible for motor functions. Genes known to affect the growth and plasticity of nerve cells were also activated, and genes affecting neurodegeneration were inactivated (Kanduri, Raijas, et al., 2015). SNCA is located in the best linkage region of musical aptitude on chromosome 4q22.1 and is regulated by GATA2, residing at 3q21, the region with the most significant association with musical aptitude, thus linking the results of the GWA study and the transcriptional profiling studies to the same locus (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015; Oikkonen et al., 2015) (Fig. 3). GATA2 is abundantly expressed in dopaminergic neurons and binds to intron-1 of endogenous neuronal SNCA to regulate its expression. The results are in agreement with neurophysiological studies in which increases in endogenous dopamine have been detected in the striatum when listening to music (Blood & Zatorre, 2001). Interestingly, SNCA is a causative gene for Parkinson’s disease (with disturbed dopamine metabolism) (Petrucci, Ginevrino, & Valente, 2016), and variations in SNCA predispose to Lewy-body dementia (Peuralinna et al., 2008).
Listening to music and music performance had partially different effects on gene expression.
Some genes, such as ZNF223 and PPP2R3A, were downregulated after music listening but upregulated after music performance (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015). ZNF223 is a zinc-finger transcription regulator similar to the immediate early response gene (IEG) ZNF225 (also known as ZENK or EGR1), which regulates the song control system of songbirds (Dong & Clayton, 2008). PPP2R3A, abundantly expressed in the striatum, is known to integrate the effects of dopamine and other neurotransmitters (Ahn et al., 2007). Other IEGs, such as FOS and DUSP1, which are known to be involved in the song control nuclei of songbirds, were upregulated only after music performance (Kanduri, Kuusi, et al., 2015), but not after music listening (Kanduri, Raijas, et al., 2015). Many other song perception-related genes in songbirds, like RGS2, were found to be differentially regulated after listening to music, but not after music performance (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015). The differences are plausibly due, for example, to the different types of musical activity and the different study subjects.
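The before/after design with a “music-free” control session can be sketched as a paired comparison for a single gene. The example below computes the mean log2 fold change within each session and subtracts the control change from the music change. The expression values, the pseudocount, and the function name are hypothetical, and the cited studies relied on genome-wide pipelines with statistical testing that this sketch does not attempt to reproduce.

```python
import math


def mean_log2_fold_change(before, after, pseudocount=1.0):
    """Mean paired log2(after / before) across subjects for one gene."""
    if len(before) != len(after) or not before:
        raise ValueError("before and after must be non-empty and equally long")
    changes = [
        math.log2((a + pseudocount) / (b + pseudocount))  # pseudocount avoids log(0)
        for b, a in zip(before, after)
    ]
    return sum(changes) / len(changes)


# Hypothetical expression values for one gene in three subjects.
music_before, music_after = [7.0, 15.0, 31.0], [15.0, 31.0, 63.0]
control_before, control_after = [7.0, 15.0, 31.0], [7.0, 15.0, 31.0]

music_change = mean_log2_fold_change(music_before, music_after)
control_change = mean_log2_fold_change(control_before, control_after)
net_change = music_change - control_change  # change attributable to the music session
print(f"music: {music_change:+.2f}, control: {control_change:+.2f}, net: {net_change:+.2f}")
```

A positive net change corresponds to upregulation after the music session relative to the control session, and a negative one to downregulation.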
FIGURE 3. The results of DNA- and RNA-studies of music-related traits converge at chromosome 4q22. The alpha-synuclein gene (SNCA) upregulated by listening to music (Kanduri, Raijas, et al., 2015) and music performance by professional musicians (Kanduri, Kuusi, et al., 2015) is located at the most significant region of musical aptitude (Oikkonen et al., 2015; Park et al., 2012; Pulli et al., 2008) and regulated by GATA2, associated with musical aptitude (Oikkonen et al., 2015). Reproduced from Irma Järvelä, Genomics studies on musical aptitude, music perception, and practice, Annals of the New York Academy of Sciences, Special Issue: The Neurosciences and Music 6, p. 4, Fig. 2, doi:10.1111/nyas.13620, Copyright © 2018, New York Academy of Sciences.
At the molecular level, auditory perception processes have been shown to exhibit convergent evolution across species (Sotomayor et al., 2012; Zhang et al., 2014). Among the genes involved is protocadherin 15 (PCDH15), also found in the human genome-wide association study of musical aptitude (Oikkonen et al., 2015). Gene expression specializations have also been detected in the regions of the brain that are essential for auditory perception and production, in both humans and songbirds (Pfenning et al., 2014; Salimpoor et al., 2011; Whitney et al., 2014).
C
A
Integration of data from various species helps to prioritize the genes most relevant to the phenotype. A rich literature exists about genes affecting the vocal learning of different species, especially songbirds (Clayton, 2013; Pfenning et al., 2014), and recently data have been gathered about candidate genes associated with human musical traits (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015; Liu et al., 2016; Oikkonen et al., 2015; Park et al., 2012; Pulli et al., 2008). When the candidate genes found so far for musical aptitude, music listening, and music performance are ranked together with genes identified in vocalizing animal species, data about brain- and tissue-specific molecules and pathways can be utilized, which is not possible in human studies alone. Convergent analysis of genes identified in vocalizing animals and human music-related traits revealed that the most common candidate genes were activity-dependent immediate early genes (IEGs), including EGR1, FOS, ARC, BDNF, and DUSP1 (Oikkonen et al., 2016). IEGs respond to sensory and motor stimuli in the brain. Of these, EGR1 is widely expressed in brain regions that affect cognition, emotional response, and sensitivity to reward in the rat (Duclot & Kabbaj, 2017). EGR1 is upregulated by song perception and production in songbirds (Avey, Kanyo, Irwin, & Sturdy, 2008; Drnevich et al., 2012). Interestingly, EGR1 was the only gene ranked highly in all human phenotypes (music listening, music performing, and musical aptitude). In contrast, PHIP, noradrenalin, and NR4A2 were ranked among the top molecules in the whole sample as well as within music listening studies, but not within music performance (e.g., singing) related studies, whereas DUSP1, PKIA, and DOPEY2 were the top genes specifically in music practice. These results support at least partially different molecular backgrounds for music-related processes.
FOS and DUSP1 were activated when professional musicians played a concert (Kanduri, Kuusi, et al., 2015). Other candidate genes, like FOXP2 and GRIN2B, have been shown to be critical for vocal communication in songbirds (Haesler et al., 2004) and for cognitive development, including speech, in humans (Hu, Chen, Myers, Yuan, & Traynelis, 2016), and they are located in the selection regions for musical aptitude (Liu et al., 2016). There are still limitations in comparative studies, as avian genomes contain only ~70 percent as many genes as the human genome (Zhang et al., 2014).
Convergent evidence for genes involved in functions like cognition, learning, and memory has been reported in music-related activities (Oikkonen et al., 2016). Several pathways were identified that describe the interaction and function of the identified genes. Among them, the CDK5 signaling pathway regulates cognitive functions in the brain. Interestingly, the MEK gene, a member of the CDK5 signaling pathway, is necessary for song learning in songbirds (London & Clayton, 2008). There is a partially shared genetic predisposition for musical abilities and general cognition (Mosing et al., 2014; Mosing, Madison, Pedersen, & Ullén, 2016). Human cognitive capacity has evolved rapidly; it is therefore highly likely that human-specific pathways and genes underlie human musical abilities. Cognition-related genes are thus a plausible group of candidate genes for elucidating the more recent evolution of music-related traits.
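The idea of prioritizing genes by convergent evidence can be sketched as a simple count of how many independent lines of evidence report each gene. This counting scheme is an assumption made for illustration and is not the scoring procedure of Oikkonen et al. (2016); the evidence categories and the placeholder gene names (GENE_A to GENE_D) are hypothetical.

```python
from collections import Counter


def rank_by_convergent_evidence(evidence):
    """Rank genes by the number of evidence sources that report them.

    `evidence` maps a source name to the genes reported by that source.
    Ties are broken alphabetically so that the ranking is deterministic.
    """
    counts = Counter()
    for genes in evidence.values():
        counts.update(set(genes))  # count each gene at most once per source
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical evidence sets with placeholder gene names.
evidence = {
    "human_musical_aptitude": ["GENE_A", "GENE_B"],
    "human_music_listening": ["GENE_A", "GENE_C"],
    "human_music_performance": ["GENE_A", "GENE_B", "GENE_D"],
    "songbird_vocal_learning": ["GENE_A", "GENE_B", "GENE_C"],
}

ranking = rank_by_convergent_evidence(evidence)
for gene, n_sources in ranking:
    print(f"{gene}: supported by {n_sources} source(s)")
```

A gene reported by every source, like GENE_A in this toy input, ends up at the top of the list, which mirrors how a gene found across all phenotypes and species would be prioritized.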
B
B A
C M
Creativity in music is an essential part of the development of music culture and industry. Creative activity in music, composing, improvising, or arranging, is common (Oikkonen et al., 2016). Some evidence for the biological basis of creativity in music has been obtained from brain imaging studies where composing (Brown, Martinez, & Parsons, 2006) and improvising musical pieces are shown to affect several regions in the brain such as the medial prefrontal cortex, premotor areas, and auditory cortex (Dietrich & Kanso, 2010; Limb & Braun, 2008; Liu et al., 2012). Listening to music has been shown to increase dopamine in a human PET study (Salimpoor, 2011). So far, genomics approaches have been applied rarely to creative activities in general or specifically in musical activities. Dopaminergic genes appeared to be upregulated in genomic studies of musical aptitude
and related traits (Kanduri, Kuusi, et al., 2015; Kanduri, Raijas, et al., 2015; Oikkonen et al., 2015), and some of them, such as FOS and FOXP2, have also been found in songbirds (Murugan, Harward, Scharff, & Mooney, 2013; Nordeen, Holtzman, & Nordeen, 2009). The dopamine D4 receptor gene (DRD4) is an interesting candidate for creativity in music. It mediates dopamine signaling at neuronal synapses. Two of its signaling variants (7R and 2R) have been associated not only with novelty seeking and altruism, but also with financial risk taking and heavy drinking. This kind of behavior can be seen as a sensitivity toward influences from the environment (Kitayama, King, Hsu, Liberzon, & Yoon, 2016). It may be that carriers of the 7R/2R variants have the capacity to adopt new ways of behavior, such as creating new music (Kitayama et al., 2016) (Fig. 4). This may serve as an example of how genetic variants can affect cultural evolution (Kim & Sasaki, 2014). In fact, many composers are known to have composed music describing actual occurrences in society.
FIGURE 4. Several music-related traits are found to be linked to dopaminergic metabolism.
According to a large epidemiological study from Sweden, individuals in creative professions were more likely to suffer from bipolar disorder (Kyaga et al., 2011), and creative professions were overrepresented among the first-degree relatives of patients with neuropsychiatric disorders (e.g., schizophrenia and bipolar disorder), indicating familial co-segregation of creativity and neuropsychiatric disorders (Kyaga et al., 2013). When the known genes and alleles associated with neuropsychiatric disorders (Lee, Ripke, Neale, & Cross-Disorder Group, 2013; Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014) were analyzed among artistic professions including musicians, the risk alleles were more prevalent in the artistic professions (Power et al., 2015). When professional musicians played a traditional classical concert, several genes reported to be mutated in neuropsychiatric or neurodegenerative diseases were affected (Kanduri, Kuusi, et al., 2015). This finding may reflect creative activities plausibly linked to music performance. Thus, molecular genetic studies give evidence that artistic creativity and neuropsychiatric disorders partially share the same predisposing genetic variants. Creativity is likely rewarding, whereas diseases cause suffering. It is currently not known which of the numerous risk alleles of neuropsychiatric disorders are required, or which individual, family, and environmental protective or risk factors underlie complex phenotypes like creativity and neuropsychiatric disorders.
Conclusion
Empirical research on the biological background of music-related human traits has been introduced using genomics methods. Genes affecting inner ear development, dopaminergic systems, learning, and memory were found as candidate genes for musical aptitude and for listening to and performing music. In addition, several genes previously known to affect vocal learning in songbirds were identified as candidate genes for music perception and practice. Activity-dependent immediate early genes (IEGs) were the most commonly ranked genes in humans and songbirds in the convergent analysis. IEGs like EGR1 are critical mediators of gene–environment interactions characterized by rapid and dynamic responses to neuronal activity and
reward-related synaptic plasticity (Duclot & Kabbaj, 2017), also reported in music-related studies (Salimpoor et al., 2011; Schneider et al., 2002). IEGs could thus serve as plausible candidate genes to mediate the effects of music as an environmental factor. Replication studies and studies using epigenomics methods are warranted to further elucidate the biological background of music-related traits.
References
Ahn, J. H., Sung, J. Y., McAvoy, T., Nishi, A., Janssens, V., Goris, J., … Nairn, A. C. (2007). The B"/PR72 subunit mediates Ca2+-dependent dephosphorylation of DARPP-32 by protein phosphatase 2A. Proceedings of the National Academy of Sciences 104(23), 9876–9881. Alcock, K. J., Passingham, R. E., Watkins, K., & Vargha-Khadem, F. (2000). Pitch and timing abilities in inherited speech and language impairment. Brain & Language 75(1), 34–46. Ambroggi, F., Turiault, M., Milet, A., Deroche-Gamonet, V., Parnaudeau, S., Balado, E., … Tronche, F. (2009). Stress and addiction: Glucocorticoid receptor in dopaminoceptive neurons facilitates cocaine seeking. Nature Neuroscience 12(3), 247–249. Araki, M., Bandi, M. M., & Yazaki-Sugiyama, Y. (2016). Mind the gap: Neural coding of species identity in birdsong prosody. Science 354(6317), 1282–1287. Atik, T., Bademci, G., Diaz-Horta, O., Blanton, S. H., & Tekin, M. (2017). Whole-exome sequencing and its impact in hereditary hearing loss. Genetics Research 97, e4. doi:10.1017/S001667231500004X Avey, M. T., Kanyo, R. A., Irwin, E. L., & Sturdy, C. B. (2008). Differential effects of vocalization type, singer and listener on ZENK immediate early gene response in black-capped chickadees (Poecile atricapillus). Behavioural Brain Research 188(1), 201–208. Ayub, Q., Yngvadottir, B., Chen, Y., Xue, Y., Hu, M., Vernes, S. C., … Tyler-Smith, C. (2013). FOXP2 targets show evidence of positive selection in European populations. American Journal of Human Genetics 92(5), 696–706. Baharloo, S., Service, S. K., Risch, N., Gitschier, J., & Freimer, N. B. (2000). Familial aggregation of absolute pitch. American Journal of Human Genetics 67(3), 755–758. Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences 98(20), 11818–11823. Brown, S., Martinez, M. J., & Parsons, L. M. (2006). 
Music and language side by side in the brain: A PET study of the generation of melodies and sentences. European Journal of Neuroscience 23(10), 2791–2803. Cáceres, M., Lachuer, J., Zapala, M. A., Redmond, J. C., Kudo, L., Geschwind, D. H., … Barlow, C. (2003). Elevated gene expression levels distinguish human from non-human primate brains. Proceedings of the National Academy of Sciences 100(22), 13030–13035. Clayton, D. F. (2000). The genomic action potential. Neurobiology of Learning and Memory 74(3), 185–216. Clayton, D. F. (2013). The genomics of memory and learning in songbirds. Annual Review of Genomics and Human Genetics 14, 45–65.
Di Pasquale, G., Davidson, B. L., Stein, C. S., Martins, I., Scudiero, D., Monks, A., & Chiorini, J. A. (2003). Identification of PDGFR as a receptor for AAV-5 transduction. Nature Medicine 9, 1306–1312. Dietrich, A., & Kanso, R. (2010). A review of EEG, ERP, and neuroimaging studies of creativity and insight. Psychological Bulletin 136(5), 822–848. Dixon-Salazar, T. J., & Gleeson, J. G. (2010). Genetic regulation of human brain development: Lessons from Mendelian diseases. Annals of the New York Academy of Sciences 1214, 156–167. Dong, S., & Clayton, D. F. (2008). Partial dissociation of molecular and behavioral measures of song habituation in adult zebra finches. Genes, Brain and Behavior 7(7), 802–809. Drayna, D., Manichaikul, A., de Lange, M., & Snieder, H. (2001). Genetic correlates of musical pitch recognition in humans. Science 291(5510), 1969–1972. Drnevich, J., Replogle, K. L., Lovell, P., Hahn, T. P., Johnson, F., Mast, T. G., … Clayton, D. F. (2012). Impact of experience-dependent and -independent factors on gene expression in songbird brain. Proceedings of the National Academy of Sciences 109(Suppl. 2), 17245–17252. Duclot, F., & Kabbaj, M. (2017). The role of Early Growth Response 1 (EGR1) in brain plasticity and neuropsychiatric disorders. Frontiers in Behavioral Neuroscience 11, 35. Retrieved from https://doi.org/10.3389/fnbeh.2017.00035 Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased cortical representation of the fingers of the left hand in string players. Science 270(5234), 305–307. Enard, W. (2016). The molecular basis of human brain evolution. Current Biology 26(20), R1109–R1117. Enard, W., Przeworski, M., Fisher, S. E., Lai, C. S., Wiebe, V., Kitano, T., … Pääbo, S. (2002). Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418(6900), 869–872. Fatemi, S. H., Folsom, T. D., Rooney, R. J., & Thuras, P. D. (2013). 
Expression of GABAA α2-, β1- and ε-receptors are altered significantly in the lateral cerebellum of subjects with schizophrenia, major depression and bipolar disorder. Translational Psychiatry 3, e303. doi:10.1038/tp.2013.64 Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musicians. Journal of Neuroscience 23(27), 9240–9245. George, J. M., Jin, H., Woods, W. S., & Clayton, D. F. (1995). Characterization of a novel protein regulated during the critical period for song learning in the zebra finch. Neuron 15, 361–372. Gosselin, N., Peretz, I., Johnsen, E., & Adolphs, R. (2007). Amygdala damage impairs emotion recognition from music. Neuropsychologia 45(2), 236–244. Guo, Y. P., Sun, X., Li, C., Wang, N. Q., Chan, Y. S., & He, J. (2007). Corticothalamic synchronization leads to c-fos expression in the auditory thalamus. Proceedings of the National Academy of Sciences 104(28), 11802–11807. Haesler, S., Wada, K., Nshdejan, A., Morrisey, E. E., Lints, T., Jarvis, E. D., & Scharff, C. (2004). FoxP2 expression in avian vocal learners and non-learners. Journal of Neuroscience 24(13), 3164–3175. Hambrick, D. Z., Oswald, F. L., Altmann, E. M., Meinz, E. J., Gobet, F., & Campitelli, G. (2014). Deliberate practice: Is that all it takes to become an expert? Intelligence 45, 34–45. Haugas, M., Lilleväli, K., Hakanen, J., & Salminen, M. (2010). Gata2 is required for the development of inner ear semicircular ducts and the surrounding perilymphatic space. Developmental Dynamics 239(9), 2452–2469. Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity: Behavior, function, and structure. Neuron 76(3), 486–502. Hertel, N., Redies, C., & Medina, L. (2012). Cadherin expression delineates the divisions of the postnatal and adult mouse amygdala. Journal of Comparative Neurology 520(17), 3982–4012.
Hilliard, A. T., Miller, J. E., Fraley, E. R., Horvath, S., & White, S. A. (2012). Molecular microcircuitry underlies functional specification in a basal ganglia circuit dedicated to vocal learning. Neuron 73(3), 537–552. Honing, H., Ten Cate, C., Peretz, I., & Trehub, S. E. (2015). Without it no music: Cognition, biology and evolution of musicality. Philosophical Transactions of the Royal Society of London B: Biological Sciences 370(1664), 20140088. doi:10.1098/rstb.2014.0088 Horita, H., Kobayashi, M., Liu, W.-C., Oka, K., Jarvis, E. D., & Wada, K. (2012). Specialized motor-driven dusp1 expression in the song systems of multiple lineages of vocal learning birds. PLoS ONE 7, e42173. Hu, C., Chen, W., Myers, S. J., Yuan, H., & Traynelis, S. F. (2016). Human GRIN2B variants in neurodevelopmental disorders. Journal of Pharmacological Sciences 132(2), 115–121. Justus, T., & Hutsler, J. J. (2005). Fundamental issues in the evolutionary psychology of music: Assessing innateness and domain specificity. Music Perception 23(1), 1–27. Kanduri, C., Kuusi, T., Ahvenainen, M., Philips, A. K., Lähdesmäki, H., & Järvelä, I. (2015). The effect of music performance on the transcriptome of professional musicians. Scientific Reports 5, 9506. doi:10.1038/srep09506 Kanduri, C., Raijas, P., Ahvenainen, M., Philips, A. K., Ukkola-Vuoti, L., Lähdesmäki, H., & Järvelä, I. (2015). The effect of listening to music on human transcriptome. PeerJ 3, e830. Retrieved from https://doi.org/10.7717/peerj.830 Karma, K. (1994). Auditory and visual temporal structuring: How important is sound to musical thinking? Psychology of Music 22(1), 20–30. Katori, S., Hamada, S., Noguchi, Y., Fukuda, E., Yamamoto, T., Yamamoto, H., … Yagi, T. (2009). Protocadherin-alpha family is required for serotonergic projections to appropriately innervate target brain areas. Journal of Neuroscience 29(29), 9137–9147. Katz, E., Elgoyhen, A. B., Gómez-Casati, M. E., Knipper, M., Vetter, D. E., Fuchs, P. A., & Glowatzki, E. 
(2004). Developmental regulation of nicotinic synapses on cochlear inner hair cells. Journal of Neuroscience 24(36), 7814–7820. Kauppi, K., Nilsson, L.-G., Adolfsson, R., Eriksson, E., & Nyberg, L. (2011). KIBRA polymorphism is related to enhanced memory and elevated hippocampal processing. Journal of Neuroscience 31, 14218–14222. Kim, H. S., & Sasaki, J. Y. (2014). Cultural neuroscience: Biology of the mind in cultural contexts. Annual Review of Psychology 65, 487–514. Kitayama, S., King, A., Hsu, M., Liberzon, I., & Yoon, C. (2016). Dopamine-system genes and cultural acquisition: The norm sensitivity hypothesis. Current Opinion in Psychology 8, 167–174. Koelsch, S. (2010). Towards a neural basis of music-evoked emotions. Trends in Cognitive Sciences 14(3), 131–137. Kyaga, S., Landén, M., Boman, M., Hultman, C. M., Långström, N., & Lichtenstein, P. (2013). Mental illness, suicide and creativity: 40-year prospective total population study. Journal of Psychiatric Research 47(1), 83–90. Kyaga, S., Lichtenstein, P., Boman, M., Hultman, C., Långström, N., & Landén, M. (2011). Creativity and mental disorder: Family study of 300,000 people with severe mental disorder. British Journal of Psychiatry 199(5), 373–379. Lahti, L., Achim, K., & Partanen, J. (2013). Molecular regulation of GABAergic neuron differentiation and diversity in the developing midbrain. Acta Physiologica (Oxford) 207(4), 616–627. Lai, C. S., Fisher, S. E., Hurst, J. A., Vargha-Khadem, F., & Monaco, A. P. (2001). A forkhead-domain gene is mutated in a severe speech and language disorder. Nature 413(6855), 519–523. Lander, E. S. (2011). Initial impact of the sequencing of the human genome. Nature 470, 187–197.
Lee, S. H., Ripke, S., Neale, B. M., & Cross-Disorder Group of the Psychiatric Genomics Consortium (2013). Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nature Genetics 45(9), 984–994. Limb, C. J., & Braun, A. R. (2008). Neural substrates of spontaneous musical performance: An FMRI study of jazz improvisation. PLoS ONE 3, e1679. Lin, J., Yan, X., Wang, C., Guo, Z., Rolfs, A., & Luo, J. (2012). Anatomical expression patterns of delta-protocadherins in developing chicken cochlea. Journal of Anatomy 221(6), 598–608. Lindor, N. M., Thibodeau, S., & Burke, W. (2017). Whole-genome sequencing in healthy people. Mayo Clinic Proceedings 92(1), 159–172. Lipkind, D., & Tchernichovski, O. (2011). Colloquium paper: Quantification of developmental birdsong learning from the subsyllabic scale to cultural evolution. Proceedings of the National Academy of Sciences 108(Suppl. 3), 15572–15579. Liu, S., Chow, H. M., Xu, Y., Erkkinen, M. G., Swett, K. E., Eagle, M. W., … Braun, A. R. (2012). Neural correlates of lyrical improvisation: An fMRI study of freestyle rap. Scientific Reports 2, 834. doi:10.1038/srep00834 Liu, X., Kanduri, C., Oikkonen, J., Karma, K., Raijas, P., Ukkola-Vuoti, L., … Järvelä, I. (2016). Detecting signatures of positive selection associated with musical aptitude in the human genome. Scientific Reports 6, 21198. doi:10.1038/srep21198 London, S. E., & Clayton, D. F. (2008). Functional identification of sensory mechanisms required for developmental song learning. Nature Neuroscience 11(5), 579–586. Martínez-Molina, N., Mas-Herrero, E., Rodríguez-Fornells, A., Zatorre, R. J., & Marco-Pallarés, J. (2016). Neural correlates of specific musical anhedonia. Proceedings of the National Academy of Sciences 113(46), E7337–E7345. Metz, M., Gassmann, M., Fakler, B., Schaeren-Wiemers, N., & Bettler, B. (2011). Distribution of the auxiliary GABAB receptor subunits KCTD8, 12, 12b, and 16 in the mouse brain. 
Journal of Comparative Neurology 519(8), 1435–1454. Montealegre-Z, F., Jonsson, T., Robson-Brown, K. A., Postles, M., & Robert, D. (2012). Convergent evolution between insect and mammalian audition. Science 338(6109), 968–971. Mosing, M. A., Madison, G., Pedersen, N. L., Kuja-Halkola, R., & Ullén, F. (2014). Practice does not make perfect: No causal effect of music practice on music ability. Psychological Science 25(9), 1795–1803. Mosing, M. A., Madison, G., Pedersen, N. L., & Ullén, F. (2016). Investigating cognitive transfer within the framework of music practice: Genetic pleiotropy rather than causality. Developmental Science 19(3), 504–512. Murugan, M., Harward, S., Scharff, C., & Mooney, R. (2013). Diminished FoxP2 levels affect dopaminergic modulation of corticostriatal signaling important to song variability. Neuron 80(6), 1464–1476. Nakahara, H., Masuko, T., Kinoshita, H., Francis, P. R., & Furuya, S. (2011). Performing music can induce greater modulation of emotion-related psychophysiological responses than listening to music. International Journal of Psychophysiology 81(3), 152–158. Nordeen, E. J., Holtzman, D. A., & Nordeen, K. W. (2009). Increased Fos expression among midbrain dopaminergic cell groups during birdsong tutoring. European Journal of Neuroscience 30(4), 662–670. Oikkonen, J., Huang, Y., Onkamo, P., Ukkola-Vuoti, L., Raijas, P., Karma, K., … Järvelä, I. (2015). A genome-wide linkage and association study of musical aptitude identifies loci containing genes related to inner ear development and neurocognitive functions. Molecular Psychiatry 20(2), 275–282.
Oikkonen, J., & Järvelä, I. (2014). Genomics approaches to study musical aptitude. Bioessays 36(11), 1102–1108. Oikkonen, J., Onkamo, P., Järvelä, I., & Kanduri, C. (2016). Convergent evidence for the molecular basis of musical traits. Scientific Reports 6, 39707. doi:10.1038/srep39707 Ousdal, O. T., Anand Brown, A., Jensen, J., Nakstad, P. H., Melle, I., Agartz, I., … Andreassen, O. A. (2012). Associations between variants near a monoaminergic pathways gene (PHOX2B) and amygdala reactivity: A genome-wide functional imaging study. Twin Research and Human Genetics 15(3), 273–285. Park, H., Lee, S., Kim, H. J., & Ju, Y. S. (2012). Comprehensive genomic analyses associate UGT8 variants with musical ability in a Mongolian population. Journal of Medical Genetics 49(12), 747–752. Parker, J., Tsagkogeorga, G., Cotton, J. A., Liu, Y., Provero, P., Stupka, E., & Rossiter, S. J. (2013). Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502(7470), 228–231. Penhune, V., & de Villers-Sidani, E. (2014). Time for new thinking about sensitive periods. Frontiers in Systems Neuroscience 8, 55. Retrieved from https://doi.org/10.3389/fnsys.2014.00055 Perani, D., Saccuman, M. C., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., … Koelsch, S. (2010). Functional specializations for music processing in the human newborn brain. Proceedings of the National Academy of Sciences 107(10), 4758–4763. Peretz, I., Cummings, S., & Dube, M. P. (2007). The genetics of congenital amusia (tone deafness): A family-aggregation study. American Journal of Human Genetics 81(3), 582–588. Petrucci, S., Ginevrino, M., & Valente, E. M. (2016). Phenotypic spectrum of alpha-synuclein mutations: New insights from patients and cellular models. Parkinsonism & Related Disorders 22(Suppl. 1), S16–S20. Peuralinna, T., Oinas, M., Polvikoski, T., Paetau, A., Sulkava, R., Niinistö, L., … Myllykangas, L. (2008). Neurofibrillary tau pathology modulated by genetic variation of alpha-synuclein. 
Annals of Neurology 64(3), 348–352. Pfenning, A. R., Hara, E., Whitney, O., Rivas, M. V., Wang, R., Roulhac, P. L., … Jarvis, E. D. (2014). Convergent transcriptional specializations in the brains of humans and song-learning birds. Science 346(6215), 1256846. Power, R. A., Steinberg, S., Bjornsdottir, G., Rietveld, C. A., Abdellaoui, A., Nivard, M. M., … Stefansson, K. (2015). Polygenic risk scores for schizophrenia and bipolar disorder predict creativity. Nature Neuroscience 18(7), 953–955. Pulli, K., Karma, K., Norio, R., Sistonen, P., Göring, H. H., & Järvelä, I. (2008). Genome-wide linkage scan for loci of musical aptitude in Finnish families: Evidence for a major locus at 4q22. Journal of Medical Genetics 45(7), 451–456. Qian, W., Deng, L., Lu, D., & Xu, S. (2013). Genome-wide landscapes of human local adaptation in Asia. PLoS ONE 8, e54224. Rothenberg, D., Roeske, T. C., Voss, H. U., Naguib, M., & Tchernichovski, O. (2014). Investigation of musicality in birdsong. Hearing Research 308, 71–83. Sabeti, P. C., Schaffner, S. F., Fry, B., Lohmueller, J., Varilly, P., Shamovsky, O., … Lander, E. S. (2006). Positive natural selection in the human lineage. Science 312(5780), 1614–1620. Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nature Neuroscience 14(2), 257–262. Salimpoor, V. N., Benovoy, M., Longo, G., Cooperstock, J. R., & Zatorre, R. J. (2009). The rewarding aspects of music listening are related to degree of emotional arousal. PLoS ONE 4, e7487.
Salimpoor, V. N., Zald, D. H., Zatorre, R. J., Dagher, A., & McIntosh, A. R. (2015). Predictions and the brain: How musical sounds become rewarding. Trends in Cognitive Sciences 19(2), 86–91. Scherzer, C. R., Grass, J. A., Liao, Z., Pepivani, I., Zheng, B., Eklund, A. C., … Schlossmacher, M. G. (2008). GATA transcription factors directly regulate the Parkinson’s disease-linked gene alpha-synuclein. Proceedings of the National Academy of Sciences 105(31), 10907–10912. Schizophrenia Working Group of the Psychiatric Genomics Consortium (2014). Biological insights from 108 schizophrenia-associated genetic loci. Nature 511(7510), 421–427. Schneider, P., Scherg, M., Dosch, H. G., Specht, H. J., Gutschalk, A., & Rupp, A. (2002). Morphology of Heschl’s gyrus reflects enhanced activation in the auditory cortex of musicians. Nature Neuroscience 5(7), 688–694. Seashore, C., Lewis, D., & Saetveit, J. (1960). Seashore measures of musical talents. New York: Psychological Corporation. Sotomayor, M., Weihofen, W. A., Gaudet, R., & Corey, D. P. (2012). Structure of a force-conveying cadherin bond essential for inner-ear mechanotransduction. Nature 492(7427), 128–132. Theusch, E., Basu, A., & Gitschier, J. (2009). Genome-wide study of families with absolute pitch reveals linkage to 8q24.21 and locus heterogeneity. American Journal of Human Genetics 85(1), 112–119. Ukkola-Vuoti, L., Kanduri, C., Oikkonen, J., Buck, G., Blancher, C., Raijas, P., … Järvelä, I. (2013). Genome-wide copy number variation analysis in extended families and unrelated individuals characterized for musical aptitude and creativity in music. PLoS ONE 8, e56356. Vernes, S. C., Spiteri, E., Nicod, J., Groszer, M., Taylor, J. M., Davies, K. E., … Fisher, S. E. (2007). High-throughput analysis of promoter occupancy reveals direct neural targets of FOXP2, a gene mutated in speech and language disorders. American Journal of Human Genetics 81(6), 1232–1250. White, E. J., Hutka, S. A., Williams, L. J., & Moreno, S. 
(2013). Learning, neural plasticity and sensitive periods: Implications for language acquisition, music training and transfer across the lifespan. Frontiers in Systems Neuroscience 7, 90. Retrieved from https://doi.org/10.3389/fnsys.2013.00090 Whitney, O., Pfenning, A. R., Howard, J. T., Blatti, C. A., Liu, F., Ward, J. M., … Jarvis, E. D. (2014). Core and region-enriched networks of behaviorally regulated genes and the singing genome. Science 346(6215), 1256780. Wood, W. E., Olson, C. R., Lovell, P. V., & Mello, C. V. (2008). Dietary retinoic acid affects song maturation and gene expression in the song system of the zebra finch. Developmental Neurobiology 68(10), 1213–1224. Zhang, G., Li, C., Li, Q., Li, B., Larkin, D. M., Lee, C., … Wang, J. (2014). Comparative genomics reveals insights into avian genome evolution and adaptation. Science 346(6215), 1311–1320.
CHAPTER 19
BRAIN RESEARCH IN MUSIC PERFORMANCE
ECKART ALTENMÜLLER, SHINICHI FURUYA, DANIEL S. SCHOLZ, AND CHRISTOS I. IOANNOU
Introduction
Music performance is based on extensive training and playing experience. It provides an excellent model for studying changes in brain functions and structures along with increasing expertise, a phenomenon usually referred to as plasticity of the human brain. Especially in professional musicians, the demands placed on the nervous system by music performance are very high, and performance provides a uniquely rich multisensory and motor experience to the player. As confirmed by neuroimaging studies, playing music depends on a strong coupling of perception and action mediated by sensory, motor, and multimodal integration areas distributed throughout the brain. A pianist, for example, must draw on a whole set of complex skills, including translating visual analysis of musical notation into motor actions, coordinating multisensory information with bimanual motor activity, developing fine motor skills in both hands coupled with metric precision, and monitoring auditory feedback to fine-tune a performance as it progresses. In this chapter, we summarize research on the effects of musical training on brain function, brain connectivity, and brain structure. First, we address factors inducing and continuously driving brain plasticity in dedicated
musicians, arguing that prolonged goal-directed practice, multisensory–motor integration, high arousal, and emotional and social rewards contribute to these plasticity-induced brain adaptations. Subsequently, we briefly review the neuroanatomy and neurophysiology underpinning musical activities by focusing on the perception of sound, integration of sound and movement, and the physiology of motor planning and motor control. Next, we review the literature on functional changes in brain activation and brain connectivity along with the acquisition of musical skills. In the following section, we focus on structural adaptations in the gray matter of the brain and in fiber tract density associated with music learning. We critically discuss the findings that structural changes are mostly seen when starting musical training after the age of 7 years, whereas functional optimization is more effective before this age. Finally, we briefly address the phenomenon of de-expertise, reviewing studies which provide evidence that intensive music-making can induce dysfunctional changes accompanied by a degradation of skilled motor behavior, also termed “musician’s dystonia” (see Peterson & Altenmüller, this volume). This condition, which is frequently highly disabling, mainly affects male classical musicians with a history of compulsive working behavior, anxiety disorder, or chronic pain. We conclude with a concise summary of the role of brain plasticity, meta-plasticity, and maladaptive plasticity in the acquisition and loss of musicians’ expertise.
Performing Music as a Driver of Brain Plasticity
Performing music at a professional level is one of the most demanding and fascinating human experiences. Singing and playing an instrument involve the precise execution of very fast and, in many instances, extremely complex movements that must be structured and coordinated with continuous auditory, somatosensory, and visual feedback. Furthermore, performance requires retrieval of musical, motor, and multisensory information from both short-term and long-term memory and relies on continuous planning of an ongoing performance in working memory. The consequences of motor actions have to be anticipated, monitored, and adjusted almost in real time
(Brown, Penhune, & Zatorre, 2015). At the same time, music should be expressive, requiring the performance to be enriched with a complex set of innate and acculturated emotional gestures. Practice is required to develop all of these skills and to execute these complex tasks. Ericsson and colleagues (Ericsson, Krampe, & Tesch-Römer, 1993) undertook one of the most influential studies on practice, with students at the Berlin Academy of Music. They considered not only time invested in practice but also quality of practice, and proposed the concept of “deliberate practice” as a prerequisite for attaining excellence. Deliberate practice combines goal-oriented, structured, and effortful practicing with motivation, resources, and focused attention. Ericsson and colleagues argued that a major distinction between professional and amateur musicians, and generally between more successful and less successful learners, is the amount of deliberate practice undertaken during the many years required to develop instrumental skills to a high level (Ericsson & Lehmann, 1996). Extraordinarily skilled musicians therefore exert a great deal more effort and concentration during their practice than less skilled musicians, and are more likely to plan, imagine, monitor, and control their playing by focusing their attention on what they are practicing and how it can be improved. Furthermore, they can be eager to build up a network of supportive peers, frequently involving family and friends. The concept of deliberate practice has been refined since it became clear that not only the amount of deliberate practice, but also the point in life at which intense goal-directed practice begins, is an important variable. In the auditory domain, for example, critical periods (“windows of opportunity”) exist for the acquisition of so-called “absolute” or “perfect” pitch. Absolute pitch denotes the ability to name pitches without a reference pitch. 
It is mediated by auditory long-term memory and is strongly linked to intense early musical experience, usually before the age of 7 years (Baharloo, Johnston, Service, Gitschier, & Freimer, 1998; Miyazaki, 1988; Sergeant, 1968). However, genetic predisposition may play a role since absolute pitch is more common in certain East Asian populations and may run in families (Baharloo, Service, Risch, Gitschier, & Freimer, 2000; Gregersen, Kowalsky, Kohn, & Marvin, 2001). In the sensorimotor domain, early practice before age 7 years leads to optimized and more stable motor programs (Furuya, Klaus, Nitsche, Paulus, & Altenmüller, 2014) and to smaller yet more efficient neuronal networks, compared to practice
commencing later in life (Vaquero et al., 2016). This means that for specific sensorimotor skills, such as fast and independent finger movements, sensitive periods exist during development and maturation of the central nervous system, comparable to those for auditory and somatosensory skills (Ragert, Schmidt, Altenmüller, & Dinse, 2003). The issue of nature vs. nurture, or genetic predisposition vs. environmental influences and training in musical skills, is complex, since the success of training is itself subject to genetic variability. General observation suggests that outcomes will not be identical for all individuals receiving the same amount of training. Evidence supporting the contribution of pre-existing individual differences comes from a large Swedish twin study showing that the propensity to practice is partially heritable (Mosing, Madison, Pedersen, Kuja-Halkola, & Ullén, 2014). Corrigall and colleagues investigated the contribution of cognitive and personality variables to music training, showing that those who engage in music perform better on cognitive tasks, have better educated parents, and describe themselves as more “open to experience” on personality scales (Corrigall, Schellenberg, & Misura, 2013). Findings are also beginning to accumulate in the music performance domain, indicating that learning outcomes can be predicted in part based on pre-existing structural or functional brain features (Herholz, Coffey, Pantev, & Zatorre, 2016). A convincing example of dysfunctional genetic predisposition is the inability to acquire auditory skills in congenital amusia, a hereditary condition characterized by absent or highly deficient pitch perception (Gingras, Honing, Peretz, Trainor, & Fisher, 2015). In the sensorimotor domain, musician’s dystonia, the loss of motor control in skilled movements while playing an instrument, has a strong genetic background in about one-third of affected musicians (Schmidt et al., 2009). 
On the other hand, training is clearly necessary for musical expertise, with a large number of researchers reporting that the length of musical experience is strongly correlated with performance on a range of musical tasks, as well as with brain function and structure (Amunts et al., 1997; Bengtsson et al., 2005; Bermudez, Lerch, Evans, & Zatorre, 2008; Chen, Penhune, & Zatorre, 2008a; Oechslin, Imfeld, Loenneker, Meyer, & Jäncke, 2010). Predispositions and experience contribute to musical expertise, and the relative balance between the two factors may differ in specific aspects of the many different musical subskills. Furthermore, it seems that there
exist early sensitive periods during which musical stimulation or training of subskills has to take place in order to establish fertile ground for growing extraordinary expertise later in life. This is best illustrated by the scaffold metaphor (Steele, Bailey, Zatorre, & Penhune, 2013). An early start to training develops the “scaffold” for building a “skyscraper-like” level of expertise later in life, whereas a late start of training allows only for moderate results even after long and intense training. Of course these scaffolds may differ from one domain to the next. For example, an outstanding virtuoso like the legendary pianist Lang Lang, known for his breathtaking finger dexterity, may require both highly relevant inherited traits and intense early sensorimotor training. Other musicians such as the late French singer Edith Piaf, known for her emotional expressivity but somehow lacking in technique, may have started technical exercises late in life but had genetic and biographical conditions allowing her to build up emotional depth, a character trait we feel and value, despite the difficulty in operationalizing it for precise study. Performing music at a professional level relies on a range of subskills, which are represented in different, though overlapping brain networks. Auditory skills such as the abovementioned perfect pitch, sensitivity to timing variations (e.g., “groove”) and to micro-pitches (e.g., tuning of a violin), or auditory long-term memory (e.g., memorizing a 12-tone series), are mainly processed in the temporal lobes of both hemispheres with a right hemisphere bias (Zatorre, 2001). However, signs of auditory and musical expertise can already be detected in the ascending auditory pathway at the brainstem level (Skoe & Kraus, 2013). 
Sensorimotor skills, such as low two-point discrimination thresholds (the ability to discern that two nearby objects touching the skin are two distinct points) and high tactile sensitivity (e.g., left fifth finger in professional violinists), bimanual or quadrupedal coordination (e.g., for piano and organ playing), fast finger movements (e.g., right hand arpeggios on the classical guitar), or complex hand postures (e.g., left hand on the electric guitar), are represented in premotor, motor, and parietal cortical areas, and in subcortical brain structures such as the basal ganglia and the cerebellum (Altenmüller & Furuya, 2015). Emotional and performance skills are supported by individualized prefrontal and orbitofrontal cortical regions and by the limbic system. Self-monitoring, anticipation of the consequences of one’s actions, motivation, and focusing attention (all contributing to goal-directed “deliberate”
practice), recruit a highly diverse network, including lateral prefrontal cortices, parietal cortices, limbic structures, and particularly motivational pathways, including the accumbens nucleus, and memory structures such as the hippocampus (Zatorre & Salimpoor, 2013). All of these regions and the interconnecting nerve fibers are subject to modifications in function and structure in association with musical practice, a phenomenon which is based on brain plasticity. Brain plasticity denotes the general ability of our central nervous system to adapt throughout the lifespan to changing environmental conditions, body biomechanics, and new tasks. Brain plasticity is most typically observed for complex tasks with high behavioral relevance activating circuits involved in emotion, motivation, and reward. The continued activities of accomplished musicians are ideal for providing the prerequisites of brain plasticity (for a review see Schlaug, 2015). In musical expertise, the abovementioned processes are accompanied by changes in the function of the brain’s neuronal networks, as a result of a strengthening of synaptic connections, and in changes of its gross structure. With respect to mechanisms and microstructural effects of plasticity, our understanding of the molecular and cellular processes underlying these adaptations is far from complete. Brain plasticity may occur on different time scales. For example, the efficiency and size of synapses may be modified in a time window of seconds to minutes, while the growth of new synapses and dendrites may require hours to days. An increase in gray matter density, which mainly reflects an enlargement of neurons due to increased metabolism, needs at least several weeks. White matter density also increases as a consequence of musical training. 
This effect is primarily due to an enlargement of myelin cells which wrap around the nerve fibers (axons) and dendrites, greatly contributing to the velocity of the electrical impulses traveling along them. Under conditions requiring rapid information transfer and high temporal precision these myelin cells adapt by growing, and as a consequence nerve conduction velocity increases. Finally, brain regions involved in specific tasks may be enlarged after long-term training due to the growth of structures supporting nervous function, for example, in the blood vessels that are necessary for oxygen and glucose transportation (for a comprehensive review see Taubert, Villringer, & Ragert, 2012).
There are four main reasons why we believe that these effects on brain plasticity are more pronounced in music performance than in other skilled activities. First, the intensity of goal-directed training is extremely high; students admitted to a German state conservatory have spent an average of 10 years and 10,000 hours of deliberate practice in order to pass the demanding entrance examinations (Ericsson et al., 1993). Second, related to the above, musical training in those individuals who later become professional musicians usually starts very early, sometimes before age 6 years when the adaptability of the central nervous system is at its highest. Third, musical activities are strongly linked to conditions of high arousal and positive emotions, but also to stressors such as music performance anxiety. Neuroactive hormones, such as adrenalin (arousal), endorphins (joy), dopamine (rewarding experience), and stress hormones (fear of failure) support neuroplastic adaptations. Fourth, performing music in public is frequently accompanied by strong social feelings best described as a sense of connectedness and meaning. As a consequence, increased release of oxytocin and serotonin will similarly enhance plastic adaptations (Zatorre & Salimpoor, 2013). However, we should be careful in claiming that music produces more prominent plastic adaptations in the brain compared to other skilled activities as the methodology of group comparisons in brain plasticity research might produce a bias. For example, group investigations into professional classical pianists compared to “non-musicians,” such as in our study by Vaquero et al. (2016), might be influenced by differences in sample homogeneity. 
As opposed to many skilled activities, such as playing golf or other sports or other creative professions, such as writing or painting, classical pianists experience from a very young age similar acculturation and take part in highly homogeneous activities due to the canonical nature of their training. The latter study similar etudes of Hanon, Czerny, and Chopin for many years, and this may well produce more uniform brain adaptations, which dominate any individual changes. In other pursuits such as the visual arts, creative writing, architecture, jazz improvisation, and music composition, individualized training may produce more diverse effects that are masked in group statistics.
B
R M
I : A Q
P O
Playing a musical instrument or singing at a professional level requires highly refined auditory, sensorimotor, and emotional-communicative skills that are acquired over many years of extensive training, and that have to be stored and maintained through further regular practice. Auditory feedback is needed to improve and perfect performance, and activity of emotion-related brain areas is required to render a performance vivid and touching. Performance-based music-making therefore relies primarily on a highly developed auditory–motor–emotion integration capacity, which is reflected on the one hand in increased neuronal connectivity and on the other hand in functional and structural adaptations of brain areas supporting these activities. In the following, we give a quick overview of the many brain regions involved in making music (for a review see Brown et al., 2015). Music perception involves primary and secondary auditory areas (A1, A2) and auditory association areas (AA) in the two temporal lobes. The primary auditory area, localized in the upper portion of the temporal lobe in Heschl’s gyrus, receives its main input from the inner ears via the ascending auditory pathway. It is mainly involved in basic auditory processing such as pitch and loudness perception, perception of time structures, and spectral decomposition. The left primary auditory cortex is specialized in the rapid analysis of time structures, such as differences in voice onset times when articulating “da” or “ta.” The right, on the other hand, deals primarily with the spectral decomposition of sounds. The secondary auditory areas surround the primary area in a belt-like formation. More complex auditory features such as timbre are processed in the secondary auditory areas (Koelsch, 2011). Finally, in the auditory association areas, auditory gestalt perception takes place. Auditory gestalts can be understood, for example, as pitch-time patterns like melodies and words.
In right-handers, and in about 95 percent of all left-handers, Wernicke’s area in the left posterior portion of the upper temporal lobe is specialized in language decoding (Kraus, McGee, & Koch, 1998). In contrast to the early auditory processing of simple acoustic structures, listening to music is a far more complex task. Music is experienced not only
as an acoustic structure over time, but also as patterns, associations, emotions, expectations, and so on. Such experiences rely on a complex set of perceptive, cognitive, and emotional operations. Integrated over time, and frequently linked to biographic memories, they enable us to experience strong emotions, processed in structures of the limbic system such as the ventral tegmental area of the mesencephalon or the accumbens nucleus in the basal forebrain (Salimpoor et al., 2013). Memories and social emotions evoked during music listening and playing involve the hippocampus, deep in the temporal lobe, and the dorsolateral prefrontal cortex, mainly in the right hemisphere. Making music relies on voluntary skilled movements which involve four cortical regions in both hemispheres: the primary motor area (M1) located in the precentral gyrus directly in front of the central sulcus; the supplementary motor area (SMA) located anteriorly to M1 of the frontal lobe and the inner (medial) side of the cortex; the cingulate motor area (CMA) below the SMA and above the corpus callosum on the inner (medial) side of the hemisphere; and the premotor area (PMA), which is located adjacent to the lateral aspect of the primary motor area (see Fig. 1).
FIGURE 1. Brain regions involved in sensory and motor music processing. (The abbreviation “a” stands for “area.”) Left hemisphere is shown in the foreground (lower right); right hemisphere in the background (upper left). The numbers relate to the respective Brodmann’s areas, a labeling of cortical regions according to the fine structure of the nervous tissue.
SMA, PMA, and CMA can be described as secondary motor areas, because they are used to process movement patterns rather than simple movements. In addition to cortical regions, the motor system includes the subcortical structures of the basal ganglia, and the cerebellum. Steady kinaesthetic feedback is also required to control any guided motor action and comes from the primary somatosensory area (S1) behind the central sulcus in the parietal lobe. This lobe is involved in many aspects of movement processing and is an area where information from multiple sensory regions converges. In the posterior parietal area, body coordinates in space are monitored and calculated, and visual information is transferred into these coordinates. As far as musicians are concerned, this area is prominently activated during tasks involving multisensory integration, for example, during sight-reading, the playing of complex pieces of music (Haslinger et al., 2005), and the transformation of musical pitch information
into movement coordinates (Brown et al., 2013) and of musical notation into corresponding motor actions (Stewart et al., 2003). The primary motor area (M1) represents the movements of body parts distinctly, in systematic order. The representation of the leg is located on the top and the inner side of the hemisphere, the arm in the upper portion, and the hand and mouth in the lower portion of M1. This representation of distinct body parts in corresponding brain regions is called “somatotopic” or “homuncular” order. Just as the motor homunculus is represented upside-down, so too is the sensory homunculus on the other side of the central sulcus. The proportions of both the motor and the sensory homunculi are markedly distorted since they are determined by the density of motor and sensory innervations of the respective body parts. For example, control of fine movements of the tongue requires many more nerve fibers transmitting information to this muscle, compared to control of the muscles of the back. Therefore, the hand, lips, and tongue require almost two-thirds of the neurons in this area (Roland & Zilles, 1996). However, as further explained below, the relative representation of body parts may be modified by usage. Moreover, the primary motor area does not simply represent individual muscles; multiple muscular representations are arranged in a complex way so as to allow the execution of simple types of movements rather than the activation of a specific muscle. This process is a consequence of the fact that a two-dimensional array of neurons in M1 has to code for three-dimensional movements in space (Gentner & Classen, 2006). Put more simply, our brain does not represent muscles but rather movements. The supplementary motor area (SMA) is mainly involved in the sequencing of complex movements and in the triggering of movements based on internal cues.
It is particularly engaged when the execution of a sequential movement depends on internally stored and memorized information. It therefore is important for both rhythm and pitch processing because of its role in sequencing and the hierarchical organization of movement (Hikosaka & Nakamura, 2002). Skilled musicians and nonmusicians engage the SMA either when performing music or when imagining listening to or performing music (de Manzano & Ullén, 2012; Herholz & Zatorre, 2012). This finding suggests that the SMA may be crucial for experts’ ability to plan music segment-by-segment during performance.
The premotor area (PMA) is primarily engaged when our motor system has to react to external stimuli, such as acoustic or visual prompts. Anticipation, planning, and preparation of movement patterns in response to visual cues have been attributed to the function of PMA (Stetson & Anderson, 2015). It is involved in the learning, execution, and recognition of limb movements and seems to be particularly concerned with the integration of visual information, which is necessary for movement planning. The PMA is also responsible for processing complex rhythms (Chen, Penhune, & Zatorre, 2008b). The function of the cingulate motor area (CMA) is still under debate. Electrical stimulation and brain imaging studies demonstrate its involvement in movement selection in situations when movements are critical to obtain reward or avoid punishment. This fact points towards close links between the cingulate gyrus and the emotion processing limbic system. The CMA may therefore play an important role in mediating cortical cognitive and limbic-emotional functions, for example, in error processing during a musical performance (Herrojo-Ruiz, Jabusch, & Altenmüller, 2009). The basal ganglia, located deep inside the cerebral hemispheres, are interconnected reciprocally via the thalamus to the motor and sensory cortices, thus constituting a loop of information flow between cortical and subcortical areas. They are indispensable for any kind of voluntary action and play a crucial role in organizing sequences of motor actions. The basal ganglia are therefore the structures mainly involved in automation of skilled movements such as sequential finger movements (Seger, 2006). Their special function consists of selecting appropriate motor actions and comparing the goal and course of those actions with previous experience. The middle putamen in particular seems to be involved in storing fast and automated movement programs. It is subject to plastic adaptations in professional musicians. 
Furthermore, in the basal ganglia the flow of information between the cortex and the limbic emotional systems, in particular the amygdala and the accumbens nucleus, converges. It is therefore assumed that the basal ganglia process and control the emotional evaluation of motor behavior in terms of expected reward or punishment (for a review see Haber, 2003). The cerebellum is an essential contributor to the timing and accuracy of fine-tuned movements. It is thought to play a role in correcting errors and in
learning new skills. The cerebellum has been hypothesized to be part of a network including parietal and motor cortex that encodes predictions of the internal models of these skills. The term “internal model” refers to a neural process that simulates the response of the motor system in order to estimate the outcome of a motor command. The cerebellum is connected to almost all regions of the brain, including those important for memory and higher cognitive functions. It has been proposed that this structure serves as a universal control system that contributes to learning, and to optimizing a range of functions across the brain (Ramnani, 2014).
T
E B
M F
T
With advanced techniques, brain function can be precisely assessed. Activity changes of brain networks, connectivity measures between brain areas on a small and a large scale, and even the amount of nerve cells activated in response to musical stimuli can be estimated (for a review on methodology see Altenmüller, Münte, & Gerloff, 2004). The neural bases of refined auditory processing in musicians are well understood. In 1998, Pantev and colleagues provided a first indication that extensive musical training can plastically alter receptive functions (Pantev et al., 1998). Equivalent current dipole strength, a measure of mass neuronal activation, was computed from evoked magnetic fields generated in auditory cortex in response to piano tones and to pure tones of equal fundamental frequency and loudness. In musicians, the responses to piano tones (but not to pure tones) were ~25 percent larger than in non-musicians. In a study of violinists and trumpeters, this effect was most pronounced for tones from each musician’s own type of instrument (Hirata, Kuriki, & Pantev, 1999). In a similar way, evoked neural responses to subtle alterations in rhythm or pitch are much more pronounced in musicians than in non-musicians (Münte, Nager, Beiss, Schroeder, & Altenmüller, 2003). Even functions such as sound localization that operate on basic acoustic properties have shown effects of plasticity and expertise amongst different groups of musicians. A conductor, more than any other musician, is likely to depend on spatial localization for successful performance. For example,
he might need to guide his attention to a certain player in a large orchestra. In one study, professional conductors were found to be better than pianists and non-musicians at separating adjacent sound sources in the periphery of the auditory field. This behavioral selectivity was paralleled by modulation of evoked brain responses, which were selective for the attended source in conductors, but not in pianists or non-musicians (Münte, Kohlmetz, Nager, & Altenmüller, 2001). These functional adaptations are not restricted to the auditory cortex, but can be observed in subcortical areas of the ascending auditory pathway: musically trained individuals have enhanced brainstem representations of musical sound wave-forms (Wong, Skoe, Russo, Dees, & Kraus, 2007). Refined somatosensory perception constitutes another basis of high-level performance. The kinaesthetic sense is especially important. It allows for control and feedback of muscle and tendon tension as well as joint positions, which enables continuous monitoring of finger, hand, and lip position in the frames of body and instrument coordinates (e.g., the keyboard, the mouthpiece). Intensive musical training has also been associated with an expansion of the functional representation of finger or hand maps, as demonstrated in magnetoencephalography (MEG) studies. For example, the somatosensory representation of the left fifth digit in string players was found to be larger than that of non-musicians (Elbert, Pantev, Wienbruch, Rockstroh, & Taub, 1995). Musicians who had begun training early in life ( left, lobules VI/VII) during rhythmic improvisation when compared with melodic improvisation or “Notes.” Melodic
improvisation was not associated with increased pre-SMA functional connectivity with other brain regions. The left PMd was not associated with changes in functional connectivity during either task. The GLM (general linear model) contrast of free improvisation (“Free” versus “Notes”) was associated with increased activity in prefrontal, motor, and lateral frontal networks, including the pre-SMA, left PMd, and left DLPFC extending into the left IFG (pars triangularis), and with decreased activity within the bilateral inferior occipital gyrus, right precentral gyrus extending across the midline into the right superior parietal cortex, bilateral medial frontal gyrus, left superior parietal lobe, and right inferior parietal lobe.
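The GLM contrast described above can be illustrated for a single voxel. The following is only a minimal sketch, not the analysis pipeline used in the study: it assumes one signal value per task block and a design of non-overlapping 0/1 condition regressors without an intercept, in which case the least-squares beta for each condition is simply the mean signal in that condition. The function names and all signal values are hypothetical and made up for illustration.

```python
from statistics import mean


def condition_betas(signal, labels):
    """Least-squares betas for non-overlapping indicator regressors.

    With one 0/1 column per condition and no intercept, the OLS estimate
    for each column equals the mean signal within that condition.
    """
    return {
        cond: mean(s for s, lab in zip(signal, labels) if lab == cond)
        for cond in set(labels)
    }


def contrast(betas, weights):
    """Weighted sum of betas, e.g. {"Free": 1, "Notes": -1}."""
    return sum(w * betas[cond] for cond, w in weights.items())


# Made-up signal values for one hypothetical voxel, one value per block.
labels = ["Notes", "Notes", "Free", "Free", "Notes", "Free"]
signal = [1.0, 1.2, 1.9, 2.1, 0.8, 2.0]

betas = condition_betas(signal, labels)
# Positive values mean more activity during "Free" than during "Notes".
free_vs_notes = contrast(betas, {"Free": 1.0, "Notes": -1.0})
print(round(free_vs_notes, 3))
```

In a real analysis the regressors would be convolved with a hemodynamic response model and the contrast would be tested statistically across subjects; the sketch only shows what the contrast weights do.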
Conclusions/Highlighted Discussion
This study demonstrated differential activation of high-order motor regions during musical tasks that isolate melodic and improvisational freedom, and showed that the functional connectivity of these regions with other brain regions varies depending on the task. The authors argue that the pre-SMA plays a critical role in motor timing and the hierarchical control and sequencing of movements, which is important in both melody and rhythm generation. The authors suggest that the PMd may be important for melodic and spatial processing based on evidence showing the region to be important for the cognitive aspects of visuomotor integration and spatial targeting of movement sequences. The regions associated with free improvisation (“Free” versus “Notes”), namely the pre-SMA, dorsal PMC, and DLPFC, are known to be associated with “explicit processing of novel motor sequences.” The authors contrast their results with those of Limb and Braun (2008), who showed task-related deactivations in these areas in musicians with expertise specifically in improvisation, and suggest that expertise may make the tasks less cognitively demanding. Expert improvisers may utilize “implicit, routine, and automated behavior” strategies, which is reflected neurologically in “a more caudal distribution of activity in the SMA and PMd.” The observed increase in functional connectivity of the pre-SMA with the cerebellum during rhythmic improvisation was anticipated given the cerebellum’s role in motor timing, and demonstrates the region’s ability to modulate its interactions with other areas based on task demands.
Neural Correlates of Lyrical Improvisation: An fMRI Study of Freestyle Rap (Liu et al., 2012)
Design
This study sought to understand musical improvisation using a different creative modality than the keyboard studies described earlier: freestyle rap. Twelve freestyle rap artists with more than five years of experience rapped over an eight-measure instrumental track at 85 beats per minute. The experimental conditions included “Conventional,” where the subjects were asked to rap a memorized lyric they had been given prior to the scanning session, and “Improvise,” where the subjects were asked to improvise lyrics. Behavioral measurements included blinded ratings of the creative use of language and rhythm in the improvised compositions (by an expert panel), and the number of syllables per minute. The subjects also completed standard bedside neurological tests of generative verbal fluency, with both phonological and semantic constraints.
Results
Behavioral: The subjects generated the same number of syllables in both the “Improvise” and “Conventional” conditions. On tests of verbal fluency, the subjects scored above the 80th percentile when compared with age and education matched controls in both semantic and phonemic tests, suggesting superior abilities.
fMRI: The main neuroimaging contrast designed to isolate the unique aspects of improvisation (“Improvise” versus “Conventional”) revealed increased activity within several functional networks, including prefrontal (left medial PFC extending from the frontopolar cortex to the pre-SMA), language/perisylvian (left IFG, left MTG, left STS, and left fusiform), and motor (left cingulate motor area [CMA], left pre-SMA, left dorsal PMC, left caudate, left globus pallidus, and right posterior cerebellum and vermis) regions. There was decreased activity within the right DLPFC extending from orbital to superior regions. A parametric analysis using the expert-rated creativity scores as predictors of regional activity showed higher creativity scores during “Improvise” to be associated with higher activity in the left posterior and
middle temporal gyrus, left medial PFC near the superior frontal sulcus, and the left PCC. A functional connectivity analysis showed that during “Improvise” (versus “Conventional”), activity in the left medial PFC (seed selected by the parametric results) had reduced connectivity with the right DLPFC and increased connections with the anterior perisylvian (e.g., left IFG) and cortical motor areas (e.g., CMA, ACC, pre-SMA, and PMd). To define the extent of this improvisation-associated medial PFC network further, functional connectivity studies were repeated using the areas of increased medial PFC connectivity as seeds (e.g., left IFG, pre-SMA, CMA); these regions all showed increased connectivity with the left amygdala. When the left amygdala was used as a seed, there was a widespread, bilaterally-distributed neural network of increased connectivity, which included the right IFG, right inferior parietal lobule, and bilateral insula.
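The idea behind a seed-based functional connectivity comparison can be sketched as follows. This is a simplified illustration under stated assumptions, not the method reported by the authors: connectivity is taken here to be the Pearson correlation between a seed region's time series and a target region's time series, computed separately for each condition and then compared. The function names and every time-series value are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(vx * vy)


def connectivity_change(seed_a, target_a, seed_b, target_b):
    """Seed-target correlation in condition A minus that in condition B."""
    return pearson(seed_a, target_a) - pearson(seed_b, target_b)


# Made-up time series for a hypothetical seed and target region.
seed_improvise = [0.1, 0.4, 0.3, 0.8, 0.6]
target_improvise = [0.2, 0.5, 0.3, 0.9, 0.7]
seed_conventional = [0.1, 0.4, 0.3, 0.8, 0.6]
target_conventional = [0.5, 0.4, 0.6, 0.3, 0.5]

change = connectivity_change(
    seed_improvise, target_improvise, seed_conventional, target_conventional
)
# A positive value means stronger coupling in the first condition.
print(change > 0)
```

Repeating the same computation with each strongly coupled target as the new seed corresponds to the iterative step described above, in which regions of increased medial PFC connectivity were themselves used as seeds.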
Conclusions/Highlighted Discussion
The study showed similar results to Limb and Braun (2008), with a dissociation of medial activation and lateral deactivation within the PFC during musical improvisation. The authors suggest that this pattern reflects “a state in which internally-motivated, stimulus-independent behaviors are allowed to unfold in the absence of conscious volitional control.” The medial PFC, which regulates motivational drive and guides self-generated behaviors, is normally regulated by the DLPFC, where executive control occurs and ongoing adjustments are made “to ensure that actions conform to explicit goals.” The authors speculate that information from the medial PFC could bypass the DLPFC via its rich connections to the CMA, an area known to integrate affective and cognitive representations to guide behavior. The deactivations of the right DLPFC and other elements of the dorsal attention network (e.g., intraparietal sulcus) are explained as follows: “top-down attentional processes mediated by this network may be attenuated during improvisation, consistent with the notion that a state of defocused attention enables the generation of novel, unexpected associations that underlie spontaneous creative activity.” The areas of activation during improvisational tasks tended to be lateralized to the left, whereas the deactivations were more right-lateralized (e.g., DLPFC). The authors suggest the dominant hemisphere activations
are consistent given the unique demands of freestyle rap, an inherently language-based musical form, and may reflect “spontaneous phonetic encoding and articulation of rapidly selected words during improvisation … and spontaneous incorporation into established rhythmic patterns … which may place additional demands on these regions.” The widespread, bilaterally-distributed network identified during the functional connectivity analyses using the medial PFC as the initial seed may underlie “multi-modal sensory processing and the representation of subjective experience, and that as a whole, this entire network is more effectively coupled during spontaneous behavior—perhaps facilitating what has been described as a psychological ‘flow’ state.” The correlation of creativity scores with neural activity in the MTG and STS may reflect superior verbal fluency, as these areas are important for accessing the mental lexicon, and the medial PFC—also associated with higher scores of creativity—may suggest a role for motivation/drive in innovative compositions. The authors argue that the DLPFC deactivations, which were not reported in other studies of musical improvisation, may be the result of fewer secondary cognitive demands associated with other studies’ tasks.
Neural Correlates of Musical Creativity: Differences between High and Low Creative Subjects (Villarreal et al., 2013)
Design
This study was designed to examine rhythmic improvisation, and look for differences in subjects deemed to have high or low rhythmic creative capacities. Twenty-four music therapy students were presented with one of fourteen rhythms played on a cymbal, and asked to either repeat the rhythm they just heard (“Repeat”) or create a new rhythm based on the presented rhythmic pattern (“Create”). Behavioral measurements included the number of variations from the original sequence (“fluidity”) and the type of variations used (“flexibility”).
Based on these performance measures, the subjects were divided into two groups—a less creative group (LCG) and a high creative group (HCG). MRI analyses included both standard task comparisons and a parametric analysis based on fluidity and flexibility scores.
Results
Behavioral: There was a wide but bimodal distribution in both fluidity and flexibility scores, and the parameters were strongly correlated. This allowed for a relatively clear separation into the HCG and LCG groups.
fMRI: Comparisons of HCG versus LCG (when “Create” and “Repeat” tasks were combined) demonstrated increased signal in the left pre- and postcentral gyrus and left DLPFC. The main improvisational contrast, “Create” versus “Repeat,” revealed increased signal in the SMA, DLPFC, and right ventral lateral PFC, when both groups were collapsed into one (HCG+LCG). When examining the groups separately for this contrast, the HCG showed increased activity in the left DLPFC, right insula, and right ventral lateral PFC, whereas the LCG showed increased signal in the left precentral gyrus and SMA. The contrast of these maps (“Create” versus “Repeat” for the HCG compared with the LCG) revealed only an uncorrected activation within the left DLPFC and right insula. Parametric analysis showed that flexibility scores covaried with signal in the left DLPFC, right ventral lateral PFC, and right insula.
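What it means for a behavioral score to "covary" with signal in a parametric analysis can be sketched with a simple regression across subjects. This is a minimal illustration that assumes a single region, one signal value per subject, and an ordinary least-squares line; the study's actual model may differ. The function name and all numbers are hypothetical.

```python
def ols_slope_intercept(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor must vary across subjects")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


# Made-up per-subject values: flexibility score and regional signal.
flexibility = [1, 2, 3, 4, 5]
region_signal = [0.2, 0.4, 0.6, 0.8, 1.0]

slope, intercept = ols_slope_intercept(flexibility, region_signal)
# A positive slope means signal rises with the flexibility score.
print(slope > 0)
```

A positive slope that survives statistical testing across subjects is the kind of relationship summarized above as flexibility scores covarying with signal in a region.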
Conclusions/Highlighted Discussion
The authors argue that the left DLPFC and right insular activations seen in the HCG during improvisation, combined with these regions’ positive correlation with creativity scores in the parametric analysis, represent “widespread integration of networks associated with cognitive, motivational, and emotional processes,” which may be important for novel idea generation. The DLPFC activity observed in the HCG, similar to that suggested in Bengtsson et al. (2007), reflects “a greater focus of attention, greater reliance on working memory to retain diverse musical images in their mind while other images were being processed, greater inhibition of interfering stimuli to avoid adhering to the original rhythmical patterns, and greater amount of manipulating to organize their products into unique and recognizable original combinations.” The tasks employed are similar to those used by Bengtsson et al. (2007), and may help explain why the DLPFC findings differ from those of Limb and Braun (2008) and Liu et al. (2012). The insular activity, via its interactions with other regions, “serves to develop subjective emotional and motivational states and to translate these states into specific action plans. … and the correlation between anterior insula activation and creativity … likely reflects a positive association between the capacity to integrate information and creativity level.” The LCG showed only SMA activity, and this was attributed to the SMA’s role in a network that includes cortical (SMA, IPL), basal ganglia, and cerebellar structures to integrate sensory and motor information during performances involving rhythmic movements in response to auditory stimuli. The authors suggest SMA activity may have been more prominent in the LCG because their compositions did not differ from the originally presented rhythmic stimuli.
Connecting to Create: Expertise in Musical Improvisation Is Associated with Increased Functional Connectivity between Premotor and Prefrontal Areas (Pinho, de Manzano, Fransson, Eriksson, & Ullén, 2014)
Design
This study examined the role of musical expertise as it relates to the neural correlates of musical improvisation. Thirty-nine pianists with a wide variety of musical and improvisational experience improvised melodies under four conditions: “Tonal,” where they improvised using six different pitches from a Western musical scale (major or minor); “Atonal,” where they improvised using six randomly chosen pitches that did not all belong to the same Western scale and that included at least one interval greater than a third; “Happy,” where they were asked to improvise a “happy” melody without pitch constraints; and “Fearful,” where they were asked to improvise a “fearful” melody without pitch constraints. The subjects also completed a survey where they estimated their total practice hours on the piano, hours spent specifically on classical training, and hours spent improvising. Behavioral measurements included performance measures (such as how accurately subjects adhered to the task instructions, e.g., whether they used the correct notes of the scale), musical complexity, and the survey data. Neuroimaging measurements included standard contrasts across conditions, although the main contrast of interest was improvisation versus rest. A functional connectivity analysis was also performed, which sought to correlate the survey data on experience with regional functional connectivity during musical improvisation.
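As a rough illustration of what a seed-based functional connectivity measure involves, the sketch below correlates the time series of one "seed" region with the time series of other regions. It is a minimal sketch only: the use of Pearson correlation, the function names, and the toy signals are all assumptions for illustration, and the chapter does not state which estimator the study used.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def seed_connectivity(seed, regions):
    """Correlate a seed time series with each named region's time series."""
    return {name: pearson_r(seed, series) for name, series in regions.items()}


# Toy signals (hypothetical values, not data from the study).
seed = [1.0, 2.0, 3.0, 4.0, 5.0]
regions = {
    "region_a": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with the seed
    "region_b": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as the seed rises
}
connectivity = seed_connectivity(seed, regions)
```

In a study of this kind, a per-subject connectivity value for each seed and target pair would then be related to a covariate such as hours of improvisational experience.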
Results
Behavioral: There was significant variability in the amount of experience amongst the pianists, both overall and specifically for improvisation. The amount of improvisational training did not correlate with measures of musical complexity.
fMRI: There were strong correlations of the BOLD signal across all four experimental conditions, and as such, the conditions were collapsed into one (“Improvisation”) for the purposes of imaging analysis. Improvisational experience was negatively correlated with neural activity during “Improvisation” (contrasted with rest) in several right hemisphere regions, namely the DLPFC, IFG, anterior insula, and angular gyrus. In a functional connectivity analysis, improvisational experience was associated with higher connectivity between prefrontal, premotor, and motor regions of the frontal lobe during improvisation (contrasted with rest). This was shown using six different seed regions within the bilateral DLPFC, pre-SMA, and PMd, although the most extensive connectivity was seen using the right PMd. Additional areas of increased functional connectivity outside the frontal lobe were also observed using these seeds, including the parietal, posterior temporal, primary sensorimotor, and cerebellar regions. The regions affected by improvisational experience in each of these analyses (right hemisphere regions for neural activity, and bilateral frontal regions for functional connectivity) were non-overlapping in their anatomical distribution. All of these effects were independent of the amount of classical piano experience or the age of the pianists.
Conclusions/Highlighted Discussion
This study demonstrates a link between the type of training and the functional neuroanatomy underlying improvised musical performance. The authors suggest that “greater functional connectivity of the frontal brain regions seen in the most experienced participants may reflect a more efficient integration of representations of musical structures at different levels of abstraction. A higher functional connectivity with the seed regions was observed with premotor regions and parietal and prefrontal association cortex and the cerebellum, suggesting the training-related functional reorganizations may affect both cognitive and sensorimotor aspects of improvisation.” The authors suggest that the reduced activations of the right DLPFC and parietal regions observed in those with more extensive improvisational experience may indicate “automation and reduced top-down cognitive control,” similar to what was reported by Limb and Braun (2008) and Liu et al. (2012). The authors explain the finding of training being associated with reduced brain activity but increased connectivity between regions during the task of musical improvisation as signifying that “skilled improvisational performance may thus be characterized by both lower demands on executive control and a more efficient interaction within the network of involved brain areas.”
Addressing a Paradox: Dual Strategies for Creative Performance in Introspective and Extrospective Networks (Pinho, Ullén, Castelo-Branco, Fransson, & de Manzano, 2016)
Design
This study employed the same methods as Pinho et al. (2014), in which 39 pianists were asked to improvise melodies on a keyboard under a number of conditions, including “Tonal,” “Atonal,” “Happy,” and “Fearful,” as described above. The purpose was to compare improvisational tasks with “emotional” intention (e.g., happy, fearful) with those governed by explicit rules based on pitch sets (e.g., tonal, atonal). Similar performance measurements were gathered, including the accuracy of performance against the task instructions and characterization of musical complexity. MRI measurements included GLM contrasts of the BOLD signal across tasks, and functional connectivity between the DLPFC and other brain regions during the pitch-set and emotional conditions.
Results Behavioral: Performers used more keystrokes and showed higher musical complexity during the emotional conditions (“Happy” and “Fearful”) compared with the pitch-set conditions (“Tonal” and “Atonal”). fMRI: In the contrast of the pitch-set conditions versus the emotional conditions, there was increased activity within the bilateral DLPFC (extended on the right throughout the middle frontal gyrus into the PMd), inferior parietal lobes, inferior temporal gyri, left inferior occipital gyri, and left cerebellum. During the opposite contrast—emotional versus pitch-set conditions—there was increased signal within the left dorsal medial PFC (in the superior medial gyrus), left medial orbital gyrus, bilateral IFG, bilateral insula (extending into the amygdala), left STG, left mid-cingulate, right precentral gyrus, left central sulcus, right Rolandic operculum, and bilateral occipital gyri. The right DLPFC seed ROI used in the functional connectivity analyses was chosen as the region of overlap in the DLPFC between the GLM contrast of pitch-set versus emotion and a previously reported DLPFC area whose activity is related to improvisation practice (Pinho et al., 2014). Functional connectivity during pitch-set condition (compared with emotional sets) showed increased connectivity of the right DLPFC with motor areas (bilateral PMd, left PMv, left SMA), auditory areas (bilateral STG), left primary sensorimotor cortex, left parietal lobe, and right cerebellum. During emotional conditions, the right DLPFC showed more connectivity with parts of the default-mode network (medial PFC and medial parietal regions). The left DLPFC showed a similar connectivity pattern as the right DLPFC during emotional conditions, but not during pitch-set conditions.
Conclusions/Highlighted Discussion
This study demonstrates that improvisation-associated neural activity and connectivity are modulated by emotional and musical constraints. The pitch-set task, the authors suggest, requires “an explicit approach to creative thinking,” and consequently the DLPFC is more active and functionally connected to premotor, sensorimotor, and cerebellum, which are important for “integrating goal-oriented information, that is, internal (musical) and external (response set) constraints, for attentional selection, that is, cognitive control of action sequencing and motor execution.” The authors argue that “top-down executive control extends to the level of motor execution,” and the DLPFC, PMd, and parietal cortex “constitute an ‘intentional framework’ for sensorimotor processing.” Emotional improvisation, in contrast, may rely on a more “implicit” strategy, which is reflected in reduced DLPFC activity and its increased connections with parts of the default mode. The increased activity of the medial PFC during emotional improvisation is notable given its role in “representing the affective meaning of stimuli,” and its “functional interconnections with cortical, striatal, and limbic regions … [that] allow convergence of sensorimotor integration and visceromotor control in the processing of emotionally salient information and regulation of behavior.” The authors point to evidence of tonal representations in the medial PFC, which “may enable associative processes between music, emotion, and memories.” During emotional conditions, the IFG controls response selection “based on retrieval and sequencing processes that … utilized internalized musical syntactic rules and semantic associations.” The authors argue these two modes of musical improvisation represent two neurological “meta-systems,” one an executive system “where the DLPFC drives integration of sensory, autonomic, and goal-related information to implement adaptive control,” and another an integrative system “constituted 
primarily by the default mode network, where largely automated processes in specialized brain systems are organized under the influence of the MPFC for the flexible integration of exogenous and endogenous information.” An individual may shift between these two cognitive modes depending on their training and the improvisational context.
Neural Substrates of Interactive Musical Improvisation: An fMRI Study of “Trading Fours” in Jazz (Donnay, Rankin, Lopez-Gonzalez, Jiradejvong, & Limb, 2014)
Design
This study sought to examine musical improvisation as it occurs with an interlocutor, as in “trading fours” in jazz. Eleven professional musicians proficient in jazz piano performance interacted musically with an interlocutor by alternating four-bar phrases with each other. The constraints on these interactions between the musician pair characterized the experimental conditions, which included “Scale-Control,” where only quarter notes and repeated playing of the D Dorian scale were permitted; “Scale-Improv,” where melodies were improvised using the D Dorian scale, but only quarter notes were allowed; “Jazz-Control,” where subjects played a memorized composition with background accompaniment; and “Jazz-Improv,” where melodies and rhythms were unrestricted and played with background accompaniment. Behavioral assessments included measurement of the performance and quantification of the musical interactions between the subject and interlocutor, including note density, pitch class distribution, pitch class transition, duration distribution, duration transition, interval distribution, interval transitions, and melodic complexity.
Results Behavioral: There were more notes played in the “Jazz-Improv” condition compared with the “Jazz-Control,” with the comparable “Scale” conditions showing no difference. Melodic complexity was highest and most variable for “Jazz-Improv.” The melodies traded in phrase pairings were related to each other in terms of duration, pitch, interval, and melodic complexity. fMRI: The main contrast in the MRI data, improvised melodies versus controls, revealed a widespread pattern of activation and deactivation in both “Scale” and “Jazz” conditions. Areas of increased activity included language (bilateral IFG pars opercularis and triangularis, bilateral [right
more so than left] posterior STG within Wernicke’s area), prefrontal (bilateral DLPFC), motor (bilateral SMA), parietal areas (bilateral IPL, bilateral SPL), bilateral SMG, and bilateral middle occipital gyrus. Areas of decreased signal included prefrontal areas (bilateral dorsal prefrontal cortex over the superior frontal gyrus and middle frontal gyrus), default mode areas (bilateral angular gyrus, bilateral precuneus), and motor areas (bilateral precentral gyrus). This pattern of BOLD signal change was similar in both “Jazz” and “Scale” paradigms. Functional connectivity measured during improvisational exchanges revealed increased connectivity between the left and right IFG, and anticorrelations between the bilateral IFG and STG, and the left IFG and bilateral angular gyri.
Conclusions/Highlighted Discussion
The study demonstrates that improvised musical exchanges are associated with increased activity in a network that includes traditional perisylvian language areas and their right-sided homologues (e.g., IFG, posterior STG), prefrontal and attentional regions (bilateral DLPFC, IPL), premotor/motor areas (e.g., bilateral SMA, precentral gyrus), and parietal regions (e.g., SPL, IPL). The right IFG was felt to be important for the “detection of task-relevant cues, such as those involved in the identification of salient harmonic and rhythmic elements,” and the right STG important for auditory short-term memory, as would be required to keep track of the interlocutor’s ongoing improvisations. The bilateral IFG is important in syntactic processing of music and speech, and the STG has been implicated in harmonic processing. The authors suggest a link between linguistic and musical discourse, and point to shared regions of activity in this study and those using a speech interlocutor, as well as similarities in their hierarchical structures, and propose that both utilize a “common neural network for syntactic operations.” Increased activity within the DLPFC during improvisation was felt to represent increased conscious self-monitoring of musical behavior in the social musical setting, and possibly also increased working memory demands associated with trading fours. The authors speculate that increased activity in the sensorimotor areas represents a “primed” state “as the musician prepares to execute unplanned ideas in a spontaneous context.”
The authors suggest that the functional deactivations within the bilateral angular gyrus, and its reduced connectivity with left IFG, may be “indicative of the lesser role semantic processing has in moment-to-moment recall and improvisatory musical generation whereby only musical syntactic information is exchanged and explicit meaning is intangible and possibly superfluous.” This study suggests that social paired musical improvisations may utilize inferior frontal systems important for hierarchical structuring of musical and linguistic discourse (e.g., musical syntax), and require increased working memory demands and harmonic processing. Areas important for the communication of explicit semantic ideas (and their functional connections) are less active during these exchanges, suggesting a deemphasis of these features in musical conversation.
Emotional Intent Modulates the Neural Substrates of Creativity: An fMRI Study of Emotionally Targeted Improvisation in Jazz Musicians (McPherson, Barrett, Lopez-Gonzalez, Jiradejvong, & Limb, 2016)
Design
This study sought to examine the relationship between musical improvisation and emotional processing. Twelve professional jazz pianists with greater than five years of professional experience were asked to improvise melodies in response to emotional cues. Subjects were shown individual photographs of facial expressions, each representing one of three emotional valence states (positive, ambiguous, or negative), and were asked to improvise melodies that best represented the presented facial expression. The three experimental conditions, “Positive,” “Negative,” and “Ambiguous,” were contrasted with a lower-level musical baseline task, where the subject played ascending and descending chromatic scales (“Chromatic”). Behavioral measurements included musical performance features, including note density (notes per second), note duration distribution (variable length of individual notes), note maxima and minima (highest and lowest pitch), mode, and key.
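The performance features listed above are simple descriptive statistics over a stream of recorded notes. The sketch below shows one way such features could be computed; the note representation (onset, duration, MIDI pitch), the function names, and the toy excerpt are assumptions for illustration and are not taken from the study.

```python
from collections import Counter


def note_density(notes, block_seconds):
    """Notes per second over a block of known length."""
    if block_seconds <= 0:
        raise ValueError("block_seconds must be positive")
    return len(notes) / block_seconds


def duration_distribution(notes):
    """Count how often each note duration (in seconds) occurs."""
    return Counter(duration for _, duration, _ in notes)


def pitch_extremes(notes):
    """Return (lowest, highest) MIDI pitch played in the block."""
    pitches = [pitch for _, _, pitch in notes]
    return min(pitches), max(pitches)


# Hypothetical 2-second excerpt: (onset_s, duration_s, midi_pitch).
notes = [(0.0, 0.5, 60), (0.5, 0.5, 64), (1.0, 0.25, 67), (1.25, 0.75, 72)]
density = note_density(notes, block_seconds=2.0)
durations = duration_distribution(notes)
lowest, highest = pitch_extremes(notes)
```

Mode and key are omitted here because estimating them requires a key-finding method that the chapter does not describe.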
Results
Behavioral: The emotional valence of the facial expressions was associated with differences in performance, as “positive” improvisations were most apt to be performed in a major key (71% of the time, compared with 31% for negative and 46% for ambiguous), had higher note maxima (whereas “negative” conditions had lower note minima), highest note density (followed by ambiguous, then negative), and significantly more notes of shorter duration.
fMRI: When combining all groups, improvisation (versus chromatic) was associated with increased signal in the left IFG, and decreased signal in the bilateral medial and lateral frontopolar cortex, DLPFC, angular gyrus, precuneus, and bilateral mid-cingulate. Emotional valence was associated with different regions of brain activity during improvisation. Positive improvisations were associated with decreased signal within left hippocampus, and more extensive deactivation in the DLPFC, angular gyrus, and precuneus compared with negative/ambiguous. Both negative and ambiguous improvisations were associated with increased activity in the bilateral SMA, and negative improvisations were associated with decreased signal within the bilateral hippocampi. During improvisational blocks, the contrast of positive versus ambiguous showed increased activity within limbic areas (left hippocampus, left amygdala, right parahippocampal gyrus). The contrast of negative versus ambiguous revealed increased signal in dorsal medial prefrontal (right ACC [BA9]), posterior default mode (left angular gyrus [BA39]), high-order sensory (SMG [BA40]), and limbic regions (right hippocampus), and decreased signal within motor (right cerebellum, left primary motor [BA4]), and auditory areas (bilateral Heschl’s gyrus). Negative and ambiguous versus positive revealed increased signal in prefrontal (bilateral frontopolar cortex [BA10]), right ACC (BA 32), right insula (BA13 and 47), and perisylvian areas (right SMG [BA40], bilateral middle temporal [BA22]).
The contrast of positive versus negative revealed increased signal only within the right cerebellum. Viewing the emotional expression itself was not associated with any significant differences in brain activity, which suggests the observed difference during improvisation did not simply reflect viewing the emotional stimulus itself. Functional connectivity analyses using seeds within the left amygdala and left insula revealed changes in connectivity associated with the emotional valence of the stimulus. During positive improvisations (versus chromatic), the left amygdala had reduced connectivity with the left cerebellum, and the left insula had lower connectivity with areas important for attention and executive functioning (left superior frontal gyrus, bilateral middle frontal gyrus), high-order sensory processing (left SMG), and primary sensorimotor functions (precentral and postcentral gyri), and increased connectivity with visual areas (middle occipital gyrus). During negative improvisations, the left amygdala had lower connectivity with the right IFG and left postcentral gyrus. When contrasting positive versus negative emotions during improvisational trials (not versus chromatic), the left amygdala had greater connectivity with left-sided attention/executive areas (superior medial and superior frontal gyri, IPS), ACC, and high-order sensory areas (SMG). Using the same contrast, the left insula showed increased connectivity with the Rolandic operculum and reduced connectivity with midbrain (including substantia nigra).
Conclusions/Highlighted Discussion The study reveals a network of brain regions important for musical improvisation (e.g., deactivations within the angular gyrus, precuneus, medial PFC; activations of the IFG) and demonstrates that activity in these and other regions is altered by the intended emotional valence of the compositions. Positive improvisations were associated with robust deactivations of the DLPFC, which, in association with a lack of increased activity in the SMA (which is active during tasks requiring continuous monitoring of motor output), may “indicate that positive improvisation induces a deeper state of flow than negative or ambiguous improvisation.” Negative improvisations are associated with increased insular connectivity with the substantia nigra, a midbrain nucleus containing neurons with dopaminergic projections to subcortical reward centers. The insula is known to represent afferent information about internal body states, and the authors suggest that negative improvisations may be associated with “binding of
visceral awareness” within the insula without any “real-life” negative consequences, creating a potentially rewarding situation. This may depend on maintaining “cognitive distance” from the performance, which they argue is substantiated by the finding of increased activity within the SMA and frontopolar cortex during negative improvisations, which are regions known to be involved in cognitive control and self-monitoring. The authors suggest that positive and negative musical improvisation may be pleasurable by different mechanisms: “While positive emotional targets enable more widespread hypofrontality and deeper flow states during spontaneous creativity, negative emotional targets may be more closely linked to a stronger visceral experience and greater activity in reward processing areas of the brain during improvisation.” This study demonstrates that emotional intent activates different neural networks during musical improvisation, and that positive and negative emotions utilize different aspects of attentional, limbic, and sensory processing during the generation of novel melodies.
Positron Emission Tomography (PET)
In PET imaging, radioactively-labeled molecules important for blood flow and metabolism (e.g., glucose, oxygen) are injected intravenously. The tracer is taken up by different brain regions based on the metabolic demands of the local tissue, which correlates with neural activity.
Music and Language Side by Side in the Brain: A PET Study of the Generation of Melodies and Sentences (Brown, Martinez, & Parsons, 2006)
Design
This study was designed to investigate the functional neuroanatomical link between the spontaneous, improvisational aspects of language and music. The investigative approach involved tasks where subjects improvised musical and linguistic ideas. The subjects were ten university students with musical experience, but not necessarily expertise; to be eligible they needed to demonstrate proficiency in accurately reproducing presented melodies vocally in key and with superimposed harmonies. The subjects completed the following experimental tasks: (1) “melody generation,” where incomplete, novel six-second melodies were presented aurally and subjects were asked to generate and sing “an appropriate phrase” that completed them using the syllable /da/; (2) “sentence generation,” where incomplete, novel sentence fragments were presented and subjects were asked to generate “semantically and syntactically appropriate” phrases that completed the fragments; and (3) “rest,” where subjects sat with their eyes closed. Behavioral measures included a measurement of the melodic and verbal responses. Standard voxel-wise PET analyses were performed for each of the tasks, and the group whole-brain flow images for the rest condition were subtracted from those for the two experimental tasks, to reveal the functional anatomy specific to melody and sentence generation above that of rest.
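The subtraction logic described above can be sketched in a few lines: a rest image is subtracted voxel by voxel from a task image, and voxels whose difference exceeds a cutoff are flagged as increases or decreases. This is a minimal sketch under stated assumptions: the flat list representation, the fixed numeric cutoff, and all names and values are hypothetical, and a real analysis would use statistical thresholds across subjects instead of a raw cutoff.

```python
def subtract_images(task, rest):
    """Voxel-wise task-minus-rest difference for two equal-length images."""
    if len(task) != len(rest):
        raise ValueError("images must have the same number of voxels")
    return [t - r for t, r in zip(task, rest)]


def voxels_above(difference, cutoff):
    """Indices of voxels whose task-minus-rest difference exceeds the cutoff."""
    return [i for i, d in enumerate(difference) if d > cutoff]


def voxels_below(difference, cutoff):
    """Indices of voxels whose task-minus-rest difference falls below the cutoff."""
    return [i for i, d in enumerate(difference) if d < cutoff]


# Toy four-voxel "images" (hypothetical flow values, not study data).
rest = [50.0, 52.0, 48.0, 51.0]
melody = [50.5, 60.0, 41.0, 51.0]

difference = subtract_images(melody, rest)
increased = voxels_above(difference, 5.0)    # candidate activations
decreased = voxels_below(difference, -5.0)   # candidate deactivations
```

Running the same subtraction for the sentence task and intersecting the flagged voxels is one simple way to picture how a shared network between two tasks could be identified.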
Results The authors found that melody generation (relative to rest) was associated with increased PET signal in the SMA, pre-SMA, primary motor, lateral premotor, frontal operculum, anterior insula, primary auditory, secondary auditory, and superior temporal polar cortices. The SMA, primary motor, and frontal opercular signals were bilateral, and the auditory cortices were lateralized to smaller foci in the right hemisphere, and more widespread on the left. An extensive subcortical network was also identified, including thalamus, putamen, globus pallidus, caudate, midbrain, pons, and cerebellum. The bilateral parieto-occipital cortices were deactivated. A similar, overlapping network was identified for sentence generation. The network of shared activation included the bilateral SMA, left primary motor, bilateral premotor, left IFG (pars triangularis), left primary auditory, bilateral secondary auditory, anterior insular, and left anterior cingulate cortices; the subcortical areas were nearly identical between the two tasks. Regions specific for melody generation were the dorsal right temporal pole and right frontal operculum.
Conclusions/Highlighted Discussion
The authors identified a wide regional network that is associated with the generation of new melodies that includes motor, language, auditory, limbic, and subcortical areas. The authors suggest that these regions support processes integral to improvisation, including “(i) accessing rules of harmony, and (ii) re-ordering, rhythmically altering, re-harmonizing, or concatenating the stimulus or recalled musical associations to generate musically-appropriate phrases.” The opercular and planum polare activations are hypothesized to subserve the “use of implicit knowledge for harmonic and melodic rules,” the premotor, basal ganglia, and cerebellar activity to subserve the “representation of rhythmic musical features” such as meter, and the insula to reflect “kinaesthetically based musical expressivity.” The other areas (the SMA, ACC, premotor areas, and basal ganglia) are “likely to be involved in the improvised manipulation of musical structures or perceived in the stimulus,” and also to aid in the response selection of “generated possibilities to determine the next note in a phrase.” The authors suggest that music and language share many resources, including those for audition and vocalization, and use parallel resources for phonological generativity of different semantic units.
Transcranial Direct Current Stimulation (tDCS)
Transcranial direct current stimulation (tDCS) utilizes an externally applied electrical current to stimulate a brain region. tDCS can increase or decrease activity in the targeted brain region depending on factors such as the polarity of stimulation and intrinsic properties of the neural populations of the stimulated brain regions.
Anodal tDCS to Right Dorsolateral Prefrontal Cortex Facilitates Performance for Novice Jazz Improvisers but Hinders Experts (Rosen et al., 2016)
Design This study investigated how musical improvisation is affected when tDCS is directed toward the right DLPFC, given conflicting findings in this region in prior studies of musical improvisation. Seventeen jazz piano players with a range of improvisational experience improvised melodies with both hands on a full-size keyboard over background accompaniment. Improvisational blocks consisted of 6 sixteen-bar jazz songs. The subjects completed three sessions, with each session consisting of rest, an improvisational block, and ending with additional non-musical cognitive tasks (not analyzed). Each session used a different form of tDCS stimulation over the right DLPFC (at the F4 electrode), including a “sham” condition (only 30 seconds of stimulation), anodal stimulation (designed to “turn on” the region), and cathodal stimulation (designed to “turn off” the region). After each session, subjects were asked to choose their best performances. In addition, expert judges rated the compositions on creativity, technical proficiency, and aesthetic appeal. The musicians also completed a questionnaire about their musical background (improvisational experience, musical style, etc.).
Results
The individual components of the expert ratings of performance were all positively correlated with one another, and were collapsed into a single “quality” score. The musical scores improved with increasing prior improvisational experience. When all subjects were considered together, tDCS (i.e., sham, anodal, cathodal) did not affect musical quality. However, there was a quality-by-expert interaction: right DLPFC stimulation increased the musical quality in less experienced subjects (anodal more than cathodal) and decreased quality in experts (anodal only).
Conclusion/Highlighted Discussion Based on data from prior reports, the authors hypothesized that anodal stimulation to the right DLPFC would improve the improvisational performance in less experienced individuals by enhancing top-down conscious control mechanisms (“Type 2 processes”), and that cathodal stimulation would improve performance in more experienced improvisers
by enhancing implicit, automatic performance (“Type 1 processes”) that is hypothesized to occur in hypofrontal states with expertise. The finding that anodal (i.e., activating) stimulation led to increased quality of performance in novices and decreased quality in experts is consistent with this idea. The authors suggest that right DLPFC stimulation may enhance cognitive processes that are important for creativity more generally, such as working memory, attention, inhibitory control, and visuospatial memory. The authors argue that right DLPFC anodal stimulation may also activate and strengthen a functionally connected network of brain regions including prefrontal, premotor, and motor areas, which may “appear similar to more experienced musicians,” or it may increase theta coherence, which is believed to integrate “widely distributed neural networks that underlie creativity.” Experts did not benefit from this stimulation, they argue, because it disrupted their highly trained neural networks by recruiting explicit, top-down processing, “similar to what happens when one attends to the components of a well-learned skill, causing performance decrements.” The authors argue that cathodal stimulation did not have the expected effects opposite to those of anodal stimulation, owing to the unclear inhibitory effects of cathodal stimulation or to compensation from other cognitive domains. Novices may have benefited from cathodal stimulation for reasons similar to those hypothesized for experts, by allowing them to “perform using a more bottom-up approach.”
Electroencephalography (EEG)
Electroencephalography uses scalp electrodes to record electrical signals from the brain.
The Brain Network Underpinning Novel Melody Creation (Adhikari et al., 2016) Design
This study sought to examine the electrophysiological signatures associated with musical improvisation using EEG. Nineteen experienced musicians with piano proficiency were tested using five experimental conditions, “Play-Prelearned,” “Play-Improvised,” “Imagine-Prelearned,” “Imagine-Improvised,” and “Rest.” The subjects performed melodies on a keyboard during the “Play” conditions, and imagined melodies during the “Imagine” conditions. The “Prelearned” conditions consisted of playing (or imagining) one of four eight-quarter-note melodies memorized by subjects before the test session. Improvisational sessions were restricted to quarter notes within the same tonal range as the “Prelearned” melodies. All conditions (including rest) were paced by a metronome. Behavioral data included performance accuracy (melodic and rhythmic) and an originality score. During all sessions, 64-channel EEG data were collected, and spectral measures of peak amplitude, coherence, and Granger causality (“directional causal influence from one oscillatory process to another”) were calculated at different nodes and compared between conditions.
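As a rough illustration of the coherence measure named above, the following sketch estimates magnitude-squared coherence between two synthetic signals using SciPy. It is not the authors’ pipeline: the sampling rate, the simulated 10 Hz rhythm, the noise level, and the frequency bands compared are all assumptions chosen only for demonstration.

```python
import numpy as np
from scipy.signal import coherence

fs = 250.0  # assumed sampling rate in Hz (not reported in the summary above)
rng = np.random.default_rng(0)
t = np.arange(0, 20, 1 / fs)

# Two synthetic "electrodes" that share a 10 Hz rhythm plus independent noise.
shared = np.sin(2 * np.pi * 10 * t)
ch_a = shared + 0.5 * rng.standard_normal(t.size)
ch_b = shared + 0.5 * rng.standard_normal(t.size)

# Welch-averaged magnitude-squared coherence:
# values near 0 mean unrelated signals, values near 1 mean tightly coupled signals.
freqs, coh = coherence(ch_a, ch_b, fs=fs, nperseg=512)

alpha_band = (freqs >= 8) & (freqs <= 12)   # band containing the shared rhythm
off_band = (freqs >= 30) & (freqs <= 40)    # band containing only independent noise

alpha_coh = coh[alpha_band].max()
off_coh = coh[off_band].mean()
print(f"peak coherence 8-12 Hz: {alpha_coh:.2f}")
print(f"mean coherence 30-40 Hz: {off_coh:.2f}")
```

Coherence is high only where the two channels carry a common oscillation, which is why it is commonly used as an index of frequency-specific coupling between recording sites.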
Results Subjects’ performance was generally accurate: they performed the correct order of tones on 88 percent of prelearned trials, and only slightly anticipated the metronomic beat. During play conditions, improvisation (versus prelearned) was associated with higher peak amplitudes over left frontal, left central, bilateral parietal, and bilateral occipital nodes. The same contrast during imagined performance showed a similar area of peak amplitude difference over the left frontal region, and novel regions within right lateral temporal areas. The anatomical sources of the EEG signal during these tasks were calculated to correspond to the left superior frontal gyrus (SFG), SMA, left IPL, DLPFC, and right superior temporal gyrus (STG). There was globally increased alpha power in all leads (most pronounced in the parieto-occipital regions) during prelearned versus improvised performances during the play tasks, and slightly increased beta power in the frontal and parietal regions. There were no power differences in any frequency range when comparing the “Imagine-Improvised” and “Imagine-Prelearned” conditions.
The Granger causality analysis revealed dynamic intra-network interactions. During overt musical performance, improvisation was associated with decreased causal influences from the SFG to SMA, SMA to IPL, and IPL to SFG. The strength of the connectivity between these regions was also negatively correlated with the originality of the compositions.
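The logic of a Granger causality analysis can be sketched in a few lines. The example below is a simplified time-domain version, whereas the study describes spectral measures computed between source-localized nodes, so it should be read only as an illustration of the underlying idea: one signal “Granger-causes” another if its past improves prediction of the other beyond what that signal’s own past provides. The function names, model order, and simulated coupling strengths are hypothetical.

```python
import numpy as np

def lagged(series, order):
    """Columns series[t-1], ..., series[t-order] for t = order .. n-1."""
    n = series.size
    return np.column_stack([series[order - k : n - k] for k in range(1, order + 1)])

def granger(source, target, order=2):
    """ln(restricted / full residual variance) for the direction source -> target."""
    y = target[order:]
    own_past = lagged(target, order)
    both_past = np.hstack([own_past, lagged(source, order)])

    def residual_variance(predictors):
        design = np.column_stack([np.ones(y.size), predictors])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        return np.var(y - design @ beta)

    return np.log(residual_variance(own_past) / residual_variance(both_past))

# Simulated pair of signals in which x drives y with a one-sample delay (assumed values).
rng = np.random.default_rng(1)
n = 2000
x = np.zeros(n)
y = np.zeros(n)
noise_x = rng.standard_normal(n)
noise_y = rng.standard_normal(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + noise_x[t]
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + noise_y[t]

gc_xy = granger(x, y)  # influence of x on y: expected to be clearly positive
gc_yx = granger(y, x)  # influence of y on x: expected to be near zero
print(f"x -> y: {gc_xy:.3f}   y -> x: {gc_yx:.3f}")
```

Because the measure is directional, it can distinguish a reduced influence from one node to another (for example, SFG to SMA) from a reduced influence in the opposite direction, which an undirected measure such as coherence cannot do.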
Conclusions/Highlighted Discussion The authors propose that the finding of increased alpha power during the overt prelearned tasks reflects top-down inhibition or suppression of potentially interfering alternative responses (e.g., the other three prelearned melodies). The increase in beta power during the prelearned task may reflect “improvement in cerebral integrative and motor functions,” and “planning and execution of motor movements.” The authors suggest the SMA may be involved in motor readiness, motor imagery involving covert vocalizations, and “monitoring of current and planned motor movements.” The left IPL and right STG are believed to be involved in a feedback loop involving somatosensory and auditory perception. The frontal areas (SFG, right DLPFC) may be involved in cognitive control of musical improvisation. The causality analyses showing reduced influence within the SFG to SMA to IPL to SFG loop during improvisation align with the hypofrontality hypothesis, whereby “top-down control may inhibit a creative process driven by bottom-up processes.” The authors argue that during more complex improvisations, the information flow is reversed through parts of the network (e.g., the SFG receives information from the SMA), resulting in bottom-up processing during more creative output. The authors conclude that “creative performance in a real-time musical improvisational task involves regions that may function outside of the top-down control networks usually seen in traditional decision-making tasks.” This may be driven by the time constraints related to the task, wherein deliberate decision making about individual note choices is not possible, resulting in reliance on “bottom-up processes to control note choices using aesthetic rules that our advanced musician participants have internalized during a lifetime of music engagement.”
Creativity as a Distinct Trainable Mental State: An EEG Study of Musical Improvisation (Lopata, Nowicki, & Joanisse, 2017) Design
This study used EEG to evaluate three questions about the neural substrates underlying improvisation. The first was whether there is a difference in frontal alpha activity between musical improvisation, rote playback, and passive listening, since synchronous frontal alpha oscillations are hypothesized to serve as a marker of implicit, bottom-up “Type 1” creative processes (see above). The second was to look for changes in alpha synchronization associated with improvisational expertise/training, and the third was to assess whether changes in alpha-band activity correlate with the quality of improvised compositions, as rated by experts. Twenty-two musicians with a wide variety of musical experience (range: 4–48 years; mean 18.5; SD 11.7) were split into two groups, one with formal institutional training in improvisation (“FITI”) and the other without (“non-FITI”). Prior to testing, the musicians were shown three charts of 16 bars of chord progressions and given the diatonic structures for each progression (e.g., C-blues, G-major), but without overlying melodies. The experimental tasks were performed in the same order each time: “Listen,” where a melody was played and the subjects passively listened; “Learn,” where subjects actively learned to play the melody on a keyboard; “Imagine Playback,” where subjects imagined playing the prior melody; “Actual Playback,” where subjects overtly played the learned melody; “Imagine Improvisation,” where the subjects imagined improvising melodies over the chord progressions; and “Actual Improvisation,” where they improvised over the chords. Behavioral measures included an expert assessment, made using a questionnaire, of the creativity of the improvisations.
EEG data included measurements of upper alpha range power (10–12 Hz), which were calculated for each condition (versus a pre-stimulus reference interval) at each electrode, and a measurement of synchronization across electrodes.
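To make the power measure concrete, the sketch below computes upper-alpha (10–12 Hz) power for a task interval and expresses it as a percentage change from a reference interval, a common convention in which positive values indicate synchronization. The exact formula used in the study is not given above, so the percent-change definition, the sampling rate, the function names, and the simulated signals are all assumptions.

```python
import numpy as np

def band_power(signal, fs, lo, hi):
    """Mean periodogram power of `signal` between lo and hi Hz (inclusive)."""
    windowed = (signal - signal.mean()) * np.hanning(signal.size)
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2 / signal.size
    freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    return spectrum[band].mean()

def relative_power_change(task, reference, fs, lo=10.0, hi=12.0):
    """Percent change in band power from reference to task (positive = synchronization)."""
    ref_power = band_power(reference, fs, lo, hi)
    return 100.0 * (band_power(task, fs, lo, hi) - ref_power) / ref_power

fs = 250.0  # assumed sampling rate in Hz
rng = np.random.default_rng(2)
t = np.arange(0, 4, 1 / fs)

# Synthetic single-electrode data: an 11 Hz rhythm whose amplitude doubles during the task.
reference = 1.0 * np.sin(2 * np.pi * 11 * t) + rng.standard_normal(t.size)
task = 2.0 * np.sin(2 * np.pi * 11 * t) + rng.standard_normal(t.size)

change = relative_power_change(task, reference, fs)
print(f"upper-alpha power change: {change:+.0f}%")
```

Normalizing against a reference interval removes differences in absolute power between electrodes and subjects, so that values can be compared across conditions and groups.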
Results In both the FITI and non-FITI groups, there was increased frontal alpha synchronization in the right hemisphere during “Listen,” “Playback,” and “Improvisation” tasks. During “Improvisation,” however, the FITI group showed increased right hemisphere alpha power compared with the non-FITI group. In the FITI group, alpha synchronization was higher during improvisation compared with both “Listen” and “Playback,” suggesting a unique interaction in alpha synchrony with improvisation and expertise. Also in the FITI group there were positive correlations between the left and right hemisphere frontal alpha synchrony for all tasks. In the non-FITI group, there was a strong negative correlation of right hemisphere alpha synchrony with both musical and improvisational experience during “Improvisation”; this was true to a lesser extent during “Listen” and “Playback.” In the FITI group, there were positive correlations of left hemisphere alpha synchrony with age, musical experience, and improvisational experience during all tasks. There were no significant differences in creativity scores between FITI and non-FITI groups, but in the FITI group only, there was a positive correlation between creativity scores and right hemisphere alpha synchronization.
Conclusions/Highlighted Discussion The study demonstrates that frontal alpha synchronization is associated with musical improvisation, is enhanced by formal training experience, and is associated with more creative performances in the most experienced improvisers. The authors interpreted the increased frontal alpha synchronization during improvisation to be “evidence of an underlying creative mental state characterized by immersion in a Type 1 spontaneous processing mode,” that suggests “top-down processing and internal focus of attention,” and not merely “a suppression of executive functions and logical-rational thought processes.” The authors take the increased right hemisphere alpha synchronization in the FITI group during “Improvisation” to “support the view of a special role of right frontal brain areas in the generation of original ideas, and as benefitting from expertise and development through training.”
In the non-FITI group, the finding of negative correlations of right hemisphere alpha synchronization with musical and improvisational experience during improvisation is interpreted to reflect this group’s “lack of immersion in Type 1 spontaneous processing,” and suggests an engagement with music that is more deliberate than spontaneous. The authors argue that the correlation between right frontal alpha synchrony and creativity scores in those with FITI suggests that “Type 1 spontaneous processing tends to yield higher quality improvised performance” in these experts.
S
D
The reviewed literature investigating the neuroanatomical substrates underlying musical improvisation explores different aspects of this complex behavior, including: the differential neural processing of rhythm and melody during musical improvisation (Berkowitz & Ansari, 2008; Pinho et al., 2016), the role of emotion (McPherson et al., 2016; Pinho et al., 2016), the relationship to language (Brown et al., 2006; Liu et al., 2012), the impact of an interlocutor (Donnay et al., 2014), and how expertise and training (Berkowitz & Ansari, 2010; Lopata et al., 2017; Pinho et al., 2014), the creative output (Lopata et al., 2017; Villarreal et al., 2013), and direct electrical stimulation (Rosen et al., 2016) modulate neural activity within the identified networks underlying musical improvisation. Although these studies varied widely in their task design, overlap is seen in a broad network of brain regions involved in cognitive control and monitoring, motor planning and execution, multimodal sensation, motivation, emotional/limbic processing, and language. The brain networks involved in musical improvisation perform domain-general processes that are recruited for the spontaneous generation of music. For example, de Manzano and Ullén (2012b) showed that many of the regions implicated in musical improvisation are also active when generating random keystrokes, which suggests that musical improvisation involves networks that are important for tasks involving freely-generated actions more broadly. Brown et al. (2006) demonstrated that musical improvisation recruits regions also involved in sentence generation. Analyzing known
functions of brain regions involved in improvisation provides insights into the domain-general cognitive modalities that contribute to musical improvisation. In the following, we review the implications of the described studies (for additional reviews, see Beaty, 2015; Beaty et al., 2016).
Attentional Networks and the Prefrontal Cortex Almost all studies of musical improvisation in the literature implicate the prefrontal cortex, a region important for a number of cognitive, behavioral, and affective functions. A theme that emerges in the reviewed studies is the distinctive roles of the medial and lateral aspects of the PFC, regions known to subserve different cognitive processes. Medial PFC: The medial PFC (de Manzano & Ullén, 2012b; Limb & Braun, 2008; Liu et al., 2012) is important for internally-focused attention, self-generated actions, motivation, social cognition, and self-referential thinking. It has widespread connections with limbic, heteromodal sensory, and other prefrontal areas. In musical improvisation, the medial PFC may be important in the coordination and expression of internally-motivated behaviors, serving an integrative role combining multiple cognitive processes in the pursuit of internal goals (Limb & Braun, 2008). Improvisation of music with an emotional intention is associated with increased medial PFC activity as part of a broader network involving the insula and IFG (Pinho et al., 2016). Activity in more caudal medial prefrontal regions is associated with higher creativity scores during improvised rap, and this region has second- and third-order connections with a widespread, bilateral network involving limbic, motor, and multimodal perceptual areas (Liu et al., 2012). Facilitation of these connections during improvised rap may be important in enhancing the creative product (Liu et al., 2012). Liu et al. (2012) also report the medial PFC to be functionally disconnected from the DLPFC during improvisational tasks, a finding the authors speculate may allow for unplanned, spontaneous idea generation outside the constraint of top-down conscious control. ACC: The dorsal ACC is activated in several studies of musical improvisation (Berkowitz & Ansari, 2008; de Manzano & Ullén, 2012b; Lu
et al., 2015). The ACC is sometimes considered an extension of the medial PFC given its anatomical proximity (just posterior), and has rich connections with cognitive/attentional, affective, and motor areas. It is connected with both “top-down” and “bottom-up” attentional networks, and is important for error detection and monitoring, as well as reward-based learning. The ACC participates in the selection of appropriate actions based on predicted reward/affective values of competing plans, and monitors errors in these predictions. This region and its function may serve an important role in musical decision making during the moment-to-moment unfolding of musical improvisation. Dorsolateral PFC: The DLPFC, in contrast to the medial PFC, is a prefrontal region that is important in “top-down” processing and externally-directed attention, and plays a critical role in executive functions including working memory, planning, and multi-step cognitive processes. Findings for the DLPFC in studies of musical improvisation conflict. It has been reported as having either increased (Bengtsson et al., 2007; de Manzano & Ullén, 2012a, 2012b; Donnay et al., 2014; Pinho et al., 2016; Villarreal et al., 2013) or decreased (Donnay et al., 2014; Limb & Braun, 2008; Liu et al., 2012; McPherson et al., 2016; Pinho et al., 2014, 2016) activity, depending on the study and the tasks involved. In studies demonstrating increased DLPFC activation, it has been reported to reflect top-down guidance of motor planning and response selection (Bengtsson et al., 2007), reliance on working memory, inhibition of competing stimuli (Rosen et al., 2016; Villarreal et al., 2013), integration of goal-oriented information for attentional selection (Pinho et al., 2016), and increased conscious self-monitoring when engaging with a musical interlocutor (Donnay et al., 2014).
Direct current stimulation of the right DLPFC was associated with an increased quality of improvisational musical performance in novices (Rosen et al., 2016), suggesting that activity in the region can facilitate creative performance in certain subjects. During musical improvisation, reduced lateral PFC activity is suggested to represent a suspension of top-down, goal-directed, conscious control and self-monitoring functions, which allows more remote associations and unplanned, less predictable solutions to unfold (Limb & Braun, 2008). This is most commonly reported dorsally (Limb & Braun, 2008; Liu et al., 2012), but also more ventrally extending into the lateral orbitofrontal cortex (Limb & Braun, 2008; Liu et al., 2012). This mechanism of creative
expression is supported by electrophysiological studies showing reduced causal influences from the DLPFC (e.g., SFG) on premotor and parietal areas during improvisation, with a reduced strength of connection between these areas predictive of more creative performance (Adhikari et al., 2016). This hypofrontality mechanism may be a marker of expertise, as improvisational experience is negatively correlated with right DLPFC activity (Pinho et al., 2014). In EEG studies, alpha synchronization, a hypothesized marker of spontaneous, internally-focused attention, is enhanced in subjects with formal improvisational training, and is associated with increased musical creativity scores (Lopata et al., 2017). Stimulation of the right DLPFC using tDCS was associated with a reduction in the quality of improvisational performance in experts (Rosen et al., 2016). These effects may be limited to those with specific improvisational training, as opposed to musical training more broadly, as several studies requiring only the latter (Berkowitz & Ansari, 2008; Villarreal et al., 2013) demonstrate increased DLPFC activity during improvisation, and those with more creative products showed increased activity within the DLPFC (Villarreal et al., 2013). DLPFC activity, and its functional connectivity, can be modulated by the nature of the improvisational task, which may be a reflection of the cognitive approach to creative expression (Pinho et al., 2016). Tasks that require an explicit approach to creativity (e.g., limited to a specific pitch set) are associated with relative increases in DLPFC activity when compared to those requiring implicit strategies (e.g., improvise an emotion) (Pinho et al., 2016). The pitch-set improvisational task in Pinho et al.
(2016) was associated with increased DLPFC functional connectivity with motor, auditory, and parietal regions—possibly reflecting an intentional, top-down executive network—and the emotional improvisational tasks with increased connections with default mode regions—a more integrative network. Improvising with the intent of expressing positive emotions is associated with reduced DLPFC activity (McPherson et al., 2016).
Motor Regions
Motor areas control the body’s movements and include primary motor cortex (e.g., precentral gyrus) and higher-order regions important for planning, sequencing, initiation, and monitoring of movement (e.g., PMd, PMv, SMA, pre-SMA), emotionally-guided movement (e.g., CMA), patterning and sequencing of movements (basal ganglia), and coordination of movements (cerebellum). Given that musical improvisation can only be externalized through movement, it is not surprising that all of these regions are involved in musical improvisation. The SMA and pre-SMA are important in the selection, initiation, timing, and monitoring of motor movements, and are thought to play a role in the rhythmic patterning during improvisational tasks (Bengtsson et al., 2007; Brown et al., 2006; de Manzano & Ullén, 2012a; Donnay et al., 2014; Liu et al., 2012; Villarreal et al., 2013) and hierarchical control of motor sequencing (de Manzano & Ullén, 2012a). The pre-SMA is more strongly connected with the cerebellum during tasks of rhythmic improvisation (de Manzano & Ullén, 2012a), which highlights its role in timing, and also with the limbic areas during freestyle rap (Liu et al., 2012), suggesting an interaction beyond that of other motor areas. The connections between higher-order motor regions may be modulated by training, as more experienced improvisers show increased connections of the SMA and PMd with a widespread network involving prefrontal, premotor, motor, parietal, and auditory regions (Pinho et al., 2014). The PMd is reported in many studies (Bengtsson et al., 2007; Berkowitz & Ansari, 2008; Brown et al., 2006; de Manzano & Ullén, 2012a; Limb & Braun, 2008; Liu et al., 2012). It is suggested to play a role in sensorimotor integration, whereby sensory information (often visual) is used to guide the sequencing and planning of motor movements; it is also important for internally-generated actions, and is connected with prefrontal areas.
The region may be important in reading musical notation (Bengtsson et al., 2007), melodic performance (Bengtsson et al., 2007), and more broadly in top-down, explicit processing of novel motor sequencing (de Manzano & Ullén, 2012a). The CMA is thought to guide the selection of voluntary movements based on expected rewards, and is known to integrate limbic information to guide motor behaviors. It is associated with increased activity during musical improvisation (Liu et al., 2012). During freestyle rap, the CMA shows increased functional connectivity with the amygdala as part of a
broader network integrating affective, motor, and perceptual processes (Liu et al., 2012). The authors speculate it may represent an alternative pathway of behavioral expression occurring outside DLPFC-mediated, explicit motor selection.
Limbic/Affective Processing Given the emotional nature of musical improvisation, it is not surprising that limbic areas are involved. These regions help to represent emotion, motivation, and memory, and are connected richly with the autonomic nervous system, which provides information about internal body states. There are reports of reduced activity in limbic regions, including the hypothalamus, amygdala, hippocampus, parahippocampal gyrus, temporopolar cortex, and ventral striatum, which may be indicative of the positive emotional valence associated with improvising (Limb & Braun, 2008). During freestyle rap, the medial PFC shows increased functional connectivity with the amygdala via the IFG, CMA, and pre-SMA (Liu et al., 2012). Insula: The insula is important in representing subjective emotional and motivational states as it receives interoceptive inputs from the body via the autonomic nervous system and integrates this information with sensory, limbic, hedonic, and cognitive inputs. It is also important in salience detection, and serves an important role in switching between different large-scale networks (e.g., central executive and default mode networks). Successful incorporation of highly-integrated emotional representations into specific motor plans may be important for creative expression under certain conditions, a process that may be enhanced when guided by the medial PFC. Activity in the insular cortices has been reported in several studies of musical improvisation (de Manzano & Ullén, 2012b; Limb & Braun, 2008; Pinho et al., 2016; Villarreal et al., 2013), with reports of both increased (Brown et al., 2006; de Manzano & Ullén, 2012b; Pinho et al., 2016; Villarreal et al., 2013) and decreased (Limb & Braun, 2008) activity. When comparing groups with high versus low creativity scores on a task of rhythmic improvisation, the right insula was associated with increased creativity scores, and its activity was positively correlated with higher scores (Villarreal et al., 2013).
Improvisation with emotional intent is
associated with bilateral insular activation (Pinho et al., 2016), and the expression of negatively-valenced emotional improvisations is associated with increased insular connectivity with the midbrain substantia nigra (McPherson et al., 2016).
Language Areas The left IFG is critical for expressive language and syntax/grammar, functioning as part of Broca’s area. The IFG may be involved in other functions, including response inhibition, mirroring of external motor movements, generative verbal fluency, and hierarchical motor sequencing. The IFG has been implicated in several studies of improvisation (Berkowitz & Ansari, 2008; de Manzano & Ullén, 2012a, 2012b; Donnay et al., 2014; Limb & Braun, 2008; Liu et al., 2012; McPherson et al., 2016; Pinho et al., 2016), and is thought to play a role in the generation and selection of motor sequences (Berkowitz & Ansari, 2008), novel musical phrases (de Manzano & Ullén, 2012b), detection of salient harmonic and rhythmic elements (Donnay et al., 2014), and hierarchical structuring of musical phrases (Donnay et al., 2014). During freestyle rap, the medial PFC has enhanced functional connections with the left IFG as part of a broader, integrative network. Donnay et al. (2014) demonstrate the IFG to be functionally disconnected from areas important for communication of explicit semantic information (e.g., angular gyrus) during improvisational tasks that utilize a musical interlocutor, and the authors suggest that in musical communication, explicit semantic knowledge is superfluous, and that “acoustic-phonologic-analysis” areas are paramount.
Sensory Processing Sensory information is represented in the cortex in a hierarchical manner. Incoming sensory information is initially processed as simple unimodal representations (primary sensory areas), and progressively organized into complex unimodal representations (secondary sensory cortex), then later
combined with other sensory modalities in heteromodal regions. These regions are subject to both bottom-up and top-down regulation. A number of studies report increased activity within primary and unimodal sensory areas (Bengtsson et al., 2007; Brown et al., 2006; Limb & Braun, 2008). This may be related to task demands, suggesting a role for increased sensory processing during improvisation, or possibly a release phenomenon with reduced top-down inhibition (Limb & Braun, 2008). Auditory: Auditory sensory streams are located within the superior and lateral temporal areas, and musical improvisation is associated with activation of these areas, including the STG (Bengtsson et al., 2007; Brown et al., 2006; Donnay et al., 2014; Limb & Braun, 2008; Liu et al., 2012; Pinho et al., 2016), MTG (Limb & Braun, 2008; Liu et al., 2012), and ITG (Limb & Braun, 2008). The posterior superior temporal areas (Bengtsson et al., 2007; Limb & Braun, 2008; Liu et al., 2012), a region known to be important in highly-structured auditory processing, are involved in auditory working memory (Bengtsson et al., 2007; Donnay et al., 2014). This region may be part of an auditory-motor feedback loop where auditory information is utilized online to guide the next musical idea via higher-order motor planning through instrumental performance (Bengtsson et al., 2007), and may aid the retrieval of stored musical motifs (Bengtsson et al., 2007). The posterior temporal regions may also be important in harmonic processing (Donnay et al., 2014). Activity in this region is associated with higher scores of creativity during improvised rap (Liu et al., 2012). Somatosensory: The SPL, a secondary somatosensory region, has been reported to have increased activity in several studies (Berkowitz & Ansari, 2008; Donnay et al., 2014; Limb & Braun, 2008), but not all (Liu et al., 2012). Activity in the primary sensory areas is also reported.
This may reflect task demands, although in these studies the motor output (and thus somatosensory feedback) in both the improvisational tasks and controls was similar, and as such was suggested to represent a “generalized intensification of activity in all sensory modalities” associated with musical spontaneity (Limb & Braun, 2008). Visual: The occipital cortex, which is the site of hierarchical visual processing, including the fusiform and lingual gyri, was activated in a number of studies (Bengtsson et al., 2007; Donnay et al., 2014; Limb & Braun, 2008; Liu et al., 2012; Pinho et al., 2016), and has been reported to
reflect visual demands associated with using a musical score to guide improvisation (Bengtsson et al., 2007).
Heteromodal Sensory Processing and the Parietal Lobes The parietal lobe is important for many cognitive functions, as it is situated between multiple sensory areas (somatosensory, auditory, visual) and has widespread connections with the frontal cognitive and motor areas. Depending on the subregion, it is involved in top-down attentional processing and executive functioning (IPS) and bottom-up attentional processing (TPJ), serves as a sensory guide for movement, and binds together discrete visual elements into a coherent sensory “whole”; it is also important for a diversity of higher cognitive processes such as visual imagination, mirror phenomena, calculations, navigation, feelings of familiarity, and knowledge of directionality. Angular gyrus: The angular gyrus is a heteromodal sensory region that is part of the default mode network, an interconnected set of cortical regions that subserve internal states characterized by defocused attention, mind-wandering, and recalling autobiographical information. It lies at the temporoparietal junction (TPJ), which is also implicated in bottom-up attentional processes as part of a broader ventral attention network involving the ventral frontal cortex. This network is important for identifying and orienting to behaviorally relevant stimuli that occur unexpectedly. Several studies show reduced activity in TPJ during improvisational tasks (Berkowitz & Ansari, 2010; Brown et al., 2006; Donnay et al., 2014; Limb & Braun, 2008; Liu et al., 2012; McPherson et al., 2016). This may reflect the role of expertise, as Berkowitz and Ansari (2010) showed reduced TPJ activity in experts compared to novices, and Pinho et al. (2014) show activity in this region during improvisational tasks to be inversely correlated with improvisational (but not overall musical) experience.
The role these deactivations play in spontaneous musical expression is uncertain, but may reflect a broad reduction in top-down attentional control and increased automation of complex behavior (Pinho et al., 2014), or increased top-down control with explicit, goal-directed
suppression of bottom-up stimuli that compete for attention (Berkowitz & Ansari, 2010). EEG evidence is more supportive of the former hypothesis (Adhikari et al., 2016). In addition to the lateral parietal regions, musical improvisation is associated with other parietal areas such as the PCC, a default-mode network region with strong connections to the medial PFC, lateral parietal areas, and temporolimbic areas. Deactivation in the PCC is seen during improvisation (Limb & Braun, 2008; Liu et al., 2012), although increased activity in this region is correlated with higher scores of creativity during improvised rap (Liu et al., 2012). Improvisation was associated with reduced activity within the precuneus (McPherson et al., 2016), a region near the PCC with overlapping functionality. There is increased bilateral activity of the SMG reported in several studies (Donnay et al., 2014; Limb & Braun, 2008), but not all (Liu et al., 2012), and this may relate to its role in preparation to execute learned motor actions (i.e., praxis).
Conclusion
Improvising is associated with changes in brain regions involved in attention, higher-order motor processing, limbic processing, unimodal and multimodal sensory processing, and linguistic processing. Activity in these regions is not unique to musical improvisation, but rather subserves domain-general cognitive processes that are recruited when improvising. From a cognitive neuroscience perspective, improvisation can be seen as a process in which auditory-motor representations are retrieved from memory storage, selected and combined based on stylistic rule-based constraints, and then executed through the motor system based on real-time sensorimotor and emotional evaluation. This is analogous to what occurs in spoken language. Expertise in improvisation appears to require various types of attentional shifts: from top-down attention involving the DLPFC (and dorsal attention network) to a state in which conscious monitoring is suspended and creative products are generated spontaneously and implicitly to produce more creative works; but also at times inhibition of bottom-up processing (rTPJ
deactivation) in order to preserve top-down, goal-directed states of internal motivation, represented by activation of the medial prefrontal areas, which themselves interface with a widespread network involving sensory, motor, and limbic regions during improvisation. This medial-lateral prefrontal dissociation of activity, seen most clearly in experts, may underlie the reported psychological state of flow, whereby complex, goal-directed actions are allowed to be expressed effortlessly. Musical improvisation provides a unique substrate for the study of the neural basis of creativity, providing insights into how domain-general cognitive processes can themselves be creatively recombined in real time to create spontaneous works of art.
REFERENCES
Adhikari, B. M., Norgaard, M., Quinn, K. M., Ampudia, J., Squirek, J., & Dhamala, M. (2016). The brain network underpinning novel melody creation. Brain Connectivity 6(10), 772–785.
Beaty, R. E. (2015). The neuroscience of musical improvisation. Neuroscience & Biobehavioral Reviews 51, 108–117.
Beaty, R. E., Benedek, M., Silvia, P. J., & Schacter, D. L. (2016). Creative cognition and brain network dynamics. Trends in Cognitive Sciences 20(2), 87–95.
Bengtsson, S. L., Csikszentmihalyi, M., & Ullén, F. (2007). Cortical regions involved in the generation of musical structures during improvisation in pianists. Journal of Cognitive Neuroscience 19(5), 830–842.
Berkowitz, A. L., & Ansari, D. (2008). Generation of novel motor sequences: The neural correlates of musical improvisation. NeuroImage 41(2), 535–543.
Berkowitz, A. L., & Ansari, D. (2010). Expertise-related deactivation of the right temporoparietal junction during musical improvisation. NeuroImage 49(1), 712–719.
Brown, S., Martinez, M. J., & Parsons, L. M. (2006). Music and language side by side in the brain: A PET study of the generation of melodies and sentences. European Journal of Neuroscience 23(10), 2791–2803.
de Manzano, O., & Ullén, F. (2012a). Activation and connectivity patterns of the presupplementary and dorsal premotor areas during free improvisation of melodies and rhythms. NeuroImage 63(1), 272–280.
de Manzano, O., & Ullén, F. (2012b). Goal-independent mechanisms for free response generation: Creative and pseudo-random performance share neural substrates. NeuroImage 59(1), 772–780.
Donnay, G. F., Rankin, S. K., Lopez-Gonzalez, M., Jiradejvong, P., & Limb, C. J. (2014). Neural substrates of interactive musical improvisation: An fMRI study of “trading fours” in jazz. PLoS ONE 9(2), e88665.
Limb, C. J., & Braun, A. R. (2008). Neural substrates of spontaneous musical performance: An fMRI study of jazz improvisation. PLoS ONE 3(2), e1679.
Liu, S., Chow, H. M., Xu, Y., Erkkinen, M. G., Swett, K. E., Eagle, M. W., … Braun, A. R. (2012). Neural correlates of lyrical improvisation: An fMRI study of freestyle rap. Scientific Reports 2, 834. doi:10.1038/srep00834
Lopata, J. A., Nowicki, E. A., & Joanisse, M. F. (2017). Creativity as a distinct trainable mental state: An EEG study of musical improvisation. Neuropsychologia 99, 246–258.
Lu, J., Yang, H., Zhang, X., He, H., Luo, C., & Yao, D. (2015). The brain functional state of music creation: An fMRI study of composers. Scientific Reports 5, 12277. doi:10.1038/srep12277
McPherson, M. J., Barrett, F. S., Lopez-Gonzalez, M., Jiradejvong, P., & Limb, C. J. (2016). Emotional intent modulates the neural substrates of creativity: An fMRI study of emotionally targeted improvisation in jazz musicians. Scientific Reports 6, 18460. doi:10.1038/srep18460
Pinho, A. L., de Manzano, O., Fransson, P., Eriksson, H., & Ullén, F. (2014). Connecting to create: Expertise in musical improvisation is associated with increased functional connectivity between premotor and prefrontal areas. Journal of Neuroscience 34(18), 6156–6163.
Pinho, A. L., Ullén, F., Castelo-Branco, M., Fransson, P., & de Manzano, O. (2016). Addressing a paradox: Dual strategies for creative performance in introspective and extrospective networks. Cerebral Cortex 26(7), 3052–3063.
Rosen, D. S., Erickson, B., Kim, Y. E., Mirman, D., Hamilton, R. H., & Kounios, J. (2016). Anodal tDCS to right dorsolateral prefrontal cortex facilitates performance for novice jazz improvisers but hinders experts. Frontiers in Human Neuroscience 10, 579. Retrieved from https://doi.org/10.3389/fnhum.2016.00579
Villarreal, M. F., Cerquetti, D., Caruso, S., Schwarcz López Aranguren, V., Gerschcovich, E. R., Frega, A. L., & Leiguarda, R. C. (2013). Neural correlates of musical creativity: Differences between high and low creative subjects. PLoS ONE 8(9), e75427.
CHAPTER 21
NEURAL MECHANISMS OF MUSICAL IMAGERY
TIMOTHY L. HUBBARD
INTRODUCTION
In the early years of the cognitive approach to psychology, cognitive processes were considered analogous to software and the brain was considered analogous to hardware. Software and hardware can be viewed as relatively independent, and so there was not much focus on the neural mechanisms of cognitive processes. However, with the development of brain imaging technologies that allowed examination of functioning in intact living brains, researchers began to make significant advances in linking different cognitive processes with different neural mechanisms, and questions about the neural mechanisms of cognition became more central. Music offered an excellent venue for investigation of neural mechanisms of cognition (e.g., Peretz & Zatorre, 2003) and brain plasticity (e.g., Herholz & Zatorre, 2012; Schlaug, 2015). The importance of understanding neural mechanisms of cognition was underscored by the emergence of the notion of embodied cognition, an approach which suggests that cognitive functioning is influenced by characteristics and properties of embodied experience (e.g., Barsalou, 2008; Gibbs, 2005; Shapiro, 2010; Wilson, 2002). Indeed, there have recently been calls for an embodied cognition approach to the study of music (e.g., Cox, 2016). Most papers in psychology and neuroscience of music focused on perception, cognition, and performance (e.g., Levitin & Tirovolas, 2009), and there has been less focus on musical imagery.
This chapter will focus on neural mechanisms of musical imagery across a range of domains. Music is generally considered an auditory stimulus, but perceptual and cognitive representation of music can involve non-auditory (e.g., kinesthetic) information, and musical imagery involves auditory and non-auditory components. Studies that involve auditory and non-auditory components of musical imagery and that have implications for understanding neural mechanisms of musical imagery are considered. Studies involving only behavioral or psychophysical measures of musical imagery are reviewed in Hubbard (2010, 2013a, 2013b, 2018, forthcoming) and are not considered here unless those studies have implications for understanding neural mechanisms of musical imagery. Studies involving neuroscience of music that do not generate testable predictions regarding musical imagery are also not considered here. The similarity of imagery and perception of musical stimuli is addressed, and results from studies involving behavioral and psychophysical methods, clinical studies of brain-damaged individuals, and physiological data involving electrophysiology and brain imaging are considered. Involuntary musical imagery is addressed, and examples involving anticipatory musical imagery, musical hallucinations, musical imagery accompanying schizophrenia, earworms, and the relative lack of musical imagery in synesthesia are considered. Embodied musical imagery is addressed, and examples involving spatial and force metaphors, the role of mimicry, the distinction between the inner ear and inner voice, the effects of mental practice on performance, musical imagery and dance, and musical affect are considered. A brief summary and conclusions are then presented.
IMAGERY AND PERCEPTION OF MUSIC
Imagery often seems to exhibit perception-like qualities, and a starting point for many studies of musical imagery involves the similarity of imagery and perception. There have been three main approaches to examining the relationship between imagery and perception, and these involve (a) behavioral and psychophysical studies; (b) studies of patients with brain damage; and (c) brain-imaging methods such as electroencephalography (EEG), positron emission tomography (PET), and functional magnetic resonance imaging (fMRI).
Behavioral and Psychophysical
There are many similarities in behavioral and psychophysical data regarding musical imagery and music perception. Properties of perceived and imaged musical tones such as pitch (Hubbard & Stoeckig, 1988) and timbre (Crowder, 1989) prime subsequently perceived tones with matching properties. Imaged tempo for a familiar tune matches the typical performance tempo for that tune (Halpern, 1988b; Jakubowski, Farrugia, & Stewart, 2016), and studies in which participants scanned through an imaged melody found that relative latencies between notes are preserved (e.g., Halpern, 1988a; Zatorre, Halpern, & Bouffard, 2010; Zatorre, Halpern, Perry, Meyer, & Evans, 1996). Musical images preserve harmonic relatedness and tonality (Hubbard & Stoeckig, 1988; Vuvan & Schmuckler, 2011) and exhibit a weak form of absolute pitch (Halpern, 1989; Schellenberg & Trehub, 2003). Pitch acuity is similar in perceived and imaged musical pitch, but temporal acuity is worse in musical imagery than in perception (Janata & Paroo, 2006). Not surprisingly, pitch acuity in imagery is better in participants with more musical training (Cebrian & Janata, 2010b). Right-handed experimental participants instructed to image a voice often localize that voice on their right side (Prete, Marzoli, Brancucci, & Tommasi, 2016), consistent with the right ear advantage for speech, and it could be hypothesized that side preferences found in music perception should be found for musical imagery. In general, findings are consistent with hypotheses that musical imagery preserves structural and temporal properties of a musical stimulus and that imagery of musical stimuli involves many of the same neural mechanisms as music perception.
Brain Damage
Although there have been numerous studies of the effects of brain damage on music perception, cognition, and production (for reviews, see Marin & Perry, 1999; Peretz & Zatorre, 2005; Stewart, von Kriegstein, Warren, & Griffiths, 2006), there have been relatively few studies of musical imagery in patients with brain damage. The studies that have been reported typically compared performance involving imagery in brain-damaged patients with performance on the same task in a control group. Patients with damage to the right temporal lobe performed worse on pitch comparisons in imagery and in perception than did patients with damage to the left temporal lobe or control participants (Zatorre & Halpern, 1993). Halpern (2003) suggested these lesion data and subsequent imaging data (from Zatorre et al., 1996) demonstrated the right superior temporal gyrus is involved in comparisons of pitch in imagery (see also Samson & Zatorre, 1991). Patients with right temporal lobe damage to the area including Heschl’s gyrus do not perceive a missing fundamental (Zatorre, 1988), and this is consistent with a role for this area in top-down representation of pitch. Patients with right hemisphere damage have difficulty in processing information regarding musical interval and musical contour (Liégeois-Chauvel, Peretz, Babaï, Laguitton, & Chauvel, 1998; Peretz, 1990) and in identification of sad music (Khalfa, Schon, Anton, & Liégeois-Chauvel, 2005), and this predicts such patients would have similar difficulties in musical imagery. More positively, music influences brain plasticity, and so it could be predicted that musical imagery might be useful in the treatment of some neurological damage or disorders (e.g., melodic intonation therapy, Peretz, 2013; also Bringas et al., 2015; Sabaté, Llanos, & Rodriguez, 2008; Särkämö, Altenmüller, Rodriguez-Fornells, & Peretz, 2016).
Clinical studies of individuals with trauma-induced amusia (e.g., Marin & Perry, 1999; Satoh, 2014) or congenital amusia (e.g., Peretz, 2013) have shed light on neural mechanisms of music processing, but musical imagery has typically not been studied in such individuals. Amusias might have a basis in perception or memory (Peretz, 2002); to the extent an amusia involves dysfunction of memory, imagery might be impacted (e.g., Satoh, 2014, explicitly identifies memory as internal imagery), but to the extent an amusia involves dysfunction in perception, imagery might be relatively spared. Also, parallels between types of amusia and types of aphasia (e.g., receptive, production) suggest there may be some overlap in neural mechanisms that process music and neural mechanisms that process language (cf. Besson & Schön, 2003; Marin & Perry, 1999; Patel, 2008). Additionally, findings that patients with amusia have difficulty in spatial tasks such as mental rotation (Douglas & Bilkey, 2007; but see Tillmann et al., 2010), coupled with findings that some types of musical imagery manipulation involve cortical areas implicated in mental rotation (Zatorre et al., 2010), suggest such patients might have impaired musical imagery. Studies of patients with amusia suggest music functions are not as strongly lateralized as language functions (Alossa & Castelli, 2009), and this has been confirmed in non-patient studies as well (e.g., Parsons, 2003; Platel et al., 1997). Also, presence of amusia predicts deficits in auditory emotion recognition in schizophrenia, and this might reflect development of music and language from the same musical protolanguage (Kantrowitz et al., 2014).
Physiological Measures
Many studies recorded physiological measures in an attempt to understand neural mechanisms of musical imagery. These studies typically involved electrophysiology such as EEG and event-related potential (ERP) or brain imaging such as PET and fMRI (for review, see Koelsch, 2012).
Electrophysiology
Imaging a melody results in more high-band synchronized alpha than does perceiving a melody (Schaefer, Vlek, & Desain, 2011; Villena-González, López, & Rodríguez, 2016), and alpha is increased during imagery of more complex tones (van Dijk, Nieuwenhuis, & Jensen, 2010). Emitted potentials occur when a musical note is expected but not presented (Cebrian & Janata, 2010b; Janata, 2001), and these are similar to evoked potentials elicited by presentation of a musical note. There are differences in size of the N1 in response to a perceived tone as a function of image accuracy and whether preceding tones were imaged or perceived (Cebrian & Janata, 2010a). If a participant deliberately generates an auditory image appropriate to a stimulus seen in a visual picture, P2 and LPC are increased (Wu, Mai, Chan, Zheng, & Luo, 2006). A larger mismatch in loudness or pitch between imaged tones and subsequent perceived tones elicits a larger N2 (Wu, Mai, Yu, Qin, & Luo, 2010) and lower-pitched or louder images and percepts evoke a larger N1 and LPC (Wu, Yu, Mai, Wei, & Luo, 2011). Accented beats in a sequence of imaged or perceived beats result in a larger positive amplitude after 180–250 milliseconds and a larger negative amplitude after 350 milliseconds (Vlek, Schaefer, Gielen, Farquhar, & Desain, 2011). Relatedly, rhythmic aspects of melody are more easily isolated in EEG than are pitch or melody-driven aspects (Schaefer, Desain, & Suppes, 2009). Mismatch negativity is evoked in musicians for perceived and for imaged musical stimuli (Herholz, Lappe, Knief, & Pantev, 2008; Yumoto et al., 2005). Continuation of a lyric in imagery during an unexpected silent gap in familiar music results in several changes in perceptual, attentional, and cognitive components of ERPs (Gabriel et al., 2016). In highly trained musicians, ERPs while reading a visual musical score are indistinguishable from ERPs while listening to auditory notes (Simoens & Tervaniemi, 2013). In general, imagery of a musical stimulus results in generation of ERP or EEG patterns similar to those generated by perception of a musical stimulus.
Brain Imaging
There have been numerous studies involving brain imaging during processing of musical stimuli (for reviews, see Koelsch, 2010, 2012; also Peretz & Zatorre, 2003) and changes in the brain related to musical training (Wan & Schlaug, 2013). There is substantial overlap of cortical areas activated in musical imagery and activated in music perception, especially in Wernicke’s area and its right hemisphere homologue (Zhang, Chen, Wen, Lu, & Liu, 2017) and auditory association areas (e.g., Daselaar, Porat, Huijbers, & Pennartz, 2010; Herholz, Halpern, & Zatorre, 2012; Zatorre et al., 1996). Spontaneous imagery during an unexpected gap in a well-known musical piece (Kraemer, Macrae, Green, & Kelley, 2005) or during a silent gap prior to the start of an expected music track on a familiar CD (Leaver, van Lare, Zielinski, Halpern, & Rauschecker, 2009) involves activation of auditory association areas as well as prefrontal and motor areas. Auditory imagery may activate frequency-specific regions in primary auditory cortex (Oh, Kwon, Yang, & Jeong, 2013). When participants listen to four-part harmony, there is greater activation in bilateral temporal lobes, cingulate gyrus, and medial cerebellum when participants focus on the harmony as a whole, but greater activation of superior parietal, bilateral precuneus, and bilateral orbital frontal cortices if participants focus on a particular (e.g., alto) line (Satoh, Takeda, Nagata, Hatazawa, & Kuzuhara, 2001). Judgment of similarities of perceived timbres and of imaged timbres results in similar cortical activation (Halpern, Zatorre, Bouffard, & Johnson, 2004): secondary auditory cortex and supplementary motor cortex are activated in both imagery and perception, but primary auditory cortex is activated only in perception (see also Zhang et al., 2017). Indeed, passive listening to music by musicians (Haueisen & Knösche, 2001) and non-musicians (Perrone-Capano, Volpicelli, & di Porzio, 2017) who remain motionless results in activation of cortical motor areas. Participants who self-report more vivid musical imagery exhibit greater activation in right superior temporal gyrus and prefrontal cortex (Herholz et al., 2012) and in right parietal cortex (Zatorre et al., 2010). Higher self-reported vividness of auditory imagery correlates with gray matter volume in left inferior parietal lobe, medial superior frontal gyrus, middle frontal gyrus, and left supplementary motor area (Lima et al., 2015). Application of TMS over the right hemisphere (to disrupt cortical activation) disrupts pitch discrimination (Halpern, 2003). Imagery reversal of a musical stimulus (i.e., scanning backward through a melody; Zatorre et al., 2010) activates intraparietal sulcus and ventrolateral and dorsolateral frontal cortex (areas involved in manipulating sensory information). Musicians who read a musical score initially exhibit activation in occipital areas that spreads to midline parietal and then to left temporal auditory association areas and right premotor areas, and this pattern could reflect emergence of notational audiation, that is, auditory imagery of a piece of music that is evoked by reading the musical score of that piece (Schürmann, Raij, Fujiki, & Hari, 2002). Participants instructed to image a single note exhibit activation of bilateral superior temporal gyri, medial and inferior frontal gyri, and precuneus (Yoo, Lee, & Choi, 2001). Overall, brain imaging studies generally support the idea that neural mechanisms are shared between imagery and perception and between imagery and production (see later subsection on “Mental Practice and Performance”), although there are exceptions (e.g., primary auditory cortex is less likely to be activated during imagery than during perception).
INVOLUNTARY MUSICAL IMAGERY
The majority of laboratory studies of musical imagery involve images created in response to a stimulus or task demand, and as noted above, these studies suggest such imagery generally recruits neural mechanisms similar to those used in music perception, cognition, and performance. However, musical imagery can be involuntary and occur spontaneously and without conscious control. Five types of involuntary musical imagery are considered here, namely (a) anticipatory musical imagery, (b) musical hallucinations, (c) musical imagery in schizophrenia, (d) earworms, and (e) synesthesia.
Anticipatory Musical Imagery
Involuntary musical imagery can reflect anticipation of an upcoming or ongoing musical stimulus. As noted earlier, when participants encounter an unexpected silent gap when listening to a familiar melody, they often report continuation of the melody in imagery; such continuation is linked with activation in auditory association areas and, when linguistic information (e.g., lyrics) is not available, in primary auditory cortex (Kraemer et al., 2005). Similarly, listeners who expect a musical stimulus but are presented with silence exhibit emitted potentials similar to the evoked potentials that occur when a musical stimulus is perceived (Janata, 2001). As noted earlier, when listening to a familiar CD, participants often experience mental imagery of an upcoming track during the silent period before that track; such imagery is linked with activity in rostral prefrontal cortex and motor areas (Leaver et al., 2009). Notational audiation can be considered anticipatory musical imagery, as content of audiation anticipates what would be heard if the notated music were performed. Indeed, ERPs in highly trained musicians during visual note reading are indistinguishable from ERPs during auditory note perception (Simoens & Tervaniemi, 2013). The existence of anticipatory musical imagery is consistent with hypotheses that imagery is an internal predictive process (e.g., Neisser, 1976; Tian & Poeppel, 2012) and that anticipatory musical imagery might be linked with expectations that contribute to musical affect (cf. Huron, 2006; Juslin & Västfjäll, 2008).
Musical Hallucinations
In voluntary musical imagery, individuals have volitional control over imagery and are aware that the sound does not emanate from a stimulus in the environment. In musical hallucinations, there is no volitional control over imagery and sounds are perceived to emanate from objects in the environment. Musical hallucinations are classified as idiopathic if they occur in the absence of associated psychopathology (other than hearing impairment) and as symptomatic if they are associated with concurrent psychopathology such as depression or schizophrenia (Coebergh, Lauw, Bots, Sommer, & Blom, 2015). Common etiological factors for musical hallucinations are brain injury, epilepsy, psychiatric disorder, and intoxication/pharmacology (Evers, 2006; Evers & Ellger, 2004). Musical hallucinations can accompany hearing loss (e.g., Hammeke, McQuillen, & Cohen, 1983), possibly because lack of auditory input disinhibits cortical mechanisms of auditory imagery and perception (Griffiths, 2000). Patients with hearing loss might experience musical imagery rather than other types of auditory imagery because music is more predictable and repetitive than are other types of auditory stimuli (Kumar et al., 2014). Dysfunction of temporal cortex (e.g., Kasai, Asada, Yumoto, Takeya, & Matsuda, 1999), right hemisphere focal brain lesions (Berrios, 1991; but see Keshavan, Davis, Steingard, & Lishman, 1992; Kumar et al., 2014), and activity in right superior temporal gyrus (Penfield & Perot, 1963), posterior middle right temporal lobe (Griffiths, Jackson, Spillane, Friston, & Frackowiak, 1997), superior temporal sulcus (Bernardini, Attademo, Blackmon, & Devinsky, 2017), and cerebellum (Griffiths, 2000) are linked to the presence of musical hallucinations. However, given that cerebral localization of music processing is dependent upon musical background and experience, the relationship between neural mechanisms and musical hallucinations could exhibit significant individual differences.
Schizophrenia
The most commonly investigated psychopathology within auditory imagery literature is schizophrenia. Although the majority of investigations of schizophrenia focused on auditory hallucinations involving verbal stimuli (e.g., Cho & Wu, 2013; Evans, McGuire, & David, 2000; Johns et al., 2001; McGuire et al., 1996; Shergill, Bullmore, Simmons, Murray, & McGuire, 2000), cases of musical hallucinations have been documented. Saba and Keshavan (1997) documented sixteen patients with schizophrenia who reported musical hallucinations. Musical imagery in schizophrenia is typically hallucinatory (not under voluntary control), and Baba and colleagues (Baba, Hamada, & Koca, 2003) suggested a model of musical hallucination in schizophrenia in which musical imagery becomes more obsessive in quality, is perceived as originating outside the individual, and is ultimately accepted as part of the self. The content of musical hallucinations in schizophrenia is often described as religious, and this is consistent with observations that delusions in schizophrenia often contain religious themes (e.g., Galant-Swafford & Bota, 2015). Brain imaging acquired during a schizophrenic patient’s musical hallucinations revealed increased activity in right orbitofrontal cortex (Bleich-Cohen, Hendler, Pashinian, Faragian, & Poyurovsky, 2011). Relatedly, differences in brain activation patterns of patients with schizophrenia and controls when spoken sentences were imaged in another person’s voice, but not when sentences were imaged in the participant’s own voice (McGuire et al., 1995), suggest articulatory information relative to the inner voice (discussed later in the chapter) might be overly represented in schizophrenia.
Earworms
Perhaps the fastest growing area of research on musical imagery during the past several years involves earworms (also referred to as involuntary musical imagery, stuck-song syndrome, brain worms, sticky music, intrusive musical imagery, and perpetual music track; see Williams, 2015). Earworms are a fragment of a song or melody that repeatedly and involuntarily occupies an individual’s awareness. Unlike musical hallucinations, earworms are generally not considered to reflect psychopathology and are usually not considered distressing by those who experience them (Beaty et al., 2013; Halpern & Bartlett, 2011; Hemming & Merrill, 2015). Research on earworms has focused on descriptive phenomenology and behavioral correlates (for summary, see Hubbard, forthcoming), and there has been little consideration of neural mechanisms of earworms. Levitin (2007) suggested earworms occur when neural areas representing a specific piece of music get stuck in “playback mode.” Farrugia and colleagues (Farrugia, Jakubowski, Cusack, & Stewart, 2015) found that frequency of occurrence of earworms was related to cortical thickness in the right frontal and temporal cortices and anterior cingulate, whereas affective aspects of involuntary musical imagery were related to gray matter volume in right temporopolar and parahippocampal cortices. It could be predicted that neural mechanisms previously shown to be involved in voluntary musical imagery might be activated during earworms, and there might be additional (or lack of) activation in other areas or differences in time course of activation that reflect the involuntary nature of earworms (e.g., differences in voluntary voice imagery and involuntary voice hallucinations; Linden et al., 2011).
Synesthesia
Synesthesia occurs if a stimulus in one dimension or modality induces systematic and idiosyncratic perceptual experience of a specific stimulus in a different dimension or modality (e.g., hearing a specific sound induces visual experience of a specific color; see Baron-Cohen & Harrison, 1997; Cytowic, 2002; Robertson & Sagiv, 2005). Reports of synesthesia in which a non-musical stimulus elicits musical imagery are rare (e.g., see listings in Cytowic & Eagleman, 2011; Day, 2016), and perhaps the most well-known is that of composer Jean Sibelius, who experienced different musical chords when viewing different colors (Pearce, 2007).1 It might be tempting to consider musical hallucinations or earworms as forms of synesthesia, but neither musical hallucinations nor earworms match the typical phenomenology of synesthesia (e.g., specific synesthetic experiences are evoked by specific stimuli and consistent over long periods of time). Most research on neural mechanisms of synesthesia focused on color-grapheme synesthesia (in which perception of letters or numerals induced experience of color; e.g., Rouw & Scholte, 2010), and there has been little research involving neural mechanisms of synesthesia involving musical imagery.
Possible neural mechanisms of synesthesia involve activation (Ramachandran & Hubbard, 2001) or disinhibition (e.g., Grossenbacher & Lovelace, 2001) of cross-connections between sensory areas, and so one speculative possibility is that lack of evoked musical imagery in synesthesia might be related to the general lack of activation in primary auditory cortex during musical imagery. Another speculative possibility is that musical perception already involves non-auditory (e.g., kinesthetic) elements, and so activation of other non-auditory information in musical imagery is not experienced as synesthesia per se.
EMBODIED MUSICAL IMAGERY
Explanations of musical phenomena that are based on properties of the human body have a long history (e.g., dissonance and consonance reflect beat interference along the basilar membrane; Greenwood, 1961; see also Hodges, 2009). However, recent developments in cognitive science suggest characteristics of embodied experience more actively influence perception, cognition, and action (e.g., motor theory of speech perception, Liberman & Mattingly, 1985; mirror neurons, Iacoboni, 2009; Oztop, Kawato, & Arbib, 2006). Indeed, observations that music spontaneously engages our bodies in multiple ways (e.g., tapping along with a beat, attributing an accent pattern to isochronous beats) suggest music offers a promising venue in which to investigate embodied cognition (e.g., see Reybrouck, 2001). Aspects of embodiment that are relevant to musical imagery include (a) spatial and force metaphors, (b) use of mimicry, (c) the inner ear and inner voice distinction, (d) mental practice and performance, (e) the relationship between music and dance, and (f) musical affect.
Spatial and Force Metaphors
Much of human cognition is based on metaphor (Lakoff & Johnson, 1980), and many prevalent metaphors reflect properties of embodiment and might influence image schemata (including motor imagery) and other aspects of cognition (Lakoff & Johnson, 1999). One example involves the notion of pitch height (for discussion, see Cox, 2016). Faster auditory frequencies are judged to be “higher” in pitch than are slower auditory frequencies, and responding to stimuli in specific spatial locations is usually improved when visual stimuli higher in the picture plane are associated with faster auditory frequencies (Deroy, Fernandez-Prieto, Navarra, & Spence, 2018; Elkin & Leuthold, 2011; Keller, Dalla Bella, & Koch, 2010). Related to pitch height are notions that a sequence of notes forms a contour and that melody moves in steps and leaps such that notes successive in time are represented as motion in space (Johnson & Larson, 2003). More broadly, Larson (2012) suggested analogues of physical inertia, gravitational attraction, and magnetism occur in music, and Hubbard (2017) addressed the possibility of an analogue of momentum in music. Eitan and Granot (2006; Eitan & Timmers, 2010) identify many motion metaphors in musical space (e.g., crescendo is associated with approach and with acceleration). Neural mechanisms of spatial and force metaphors have not received extensive research, although it could be predicted that cortical areas involved in processing motion information could be activated (e.g., much as a still photograph depicting a specific direction of motion activates cortical motion processing areas; e.g., Senior et al., 2000; Senior, Ward, & David, 2002) in both music perception and musical imagery.
Mimicry
Listening to or recalling music has been suggested to involve motor mimicry (Cox, 2016). Many musical sounds provide information regarding the human motor action that produced those sounds (including information involving spatial and force metaphors), and musical imagery is often accompanied by visual or motor images related to the sound source (Godøy, 2001). Cox suggested an important component of music comprehension is imitating, either overtly or covertly, the sound-producing actions of performers. Such imitative movements might involve movements appropriate to playing an instrument or subvocal imitation of musical sounds, and musical features (e.g., pitch, duration, strength, etc.) might be represented mimetically; indeed, even simple tapping along with the beat might be considered mimicry. Western popular music has been dominated by music that is easily singable or danceable (Cox, 2016), and this is consistent with the importance of embodiment and mimicry in music processing. Such mimicry might involve overt physical action or covert firing of mirror neurons. Mirror neurons can be activated by sounds associated with a given action (Kohler et al., 2002), and so might be involved in neural activity relevant to singing or playing an instrument. One consequence of such mimicry is that musical imagery involves kinesthetic and proprioceptive information (Hubbard, 2013b), and the importance of kinesthetic and proprioceptive information in auditory and musical imagery is seen in the distinction between the inner ear and inner voice and in separate roles of auditory imagery and kinesthetic imagery in mental practice and performance.
Inner Ear and Inner Voice
In addition to perceiving sounds generated by stimuli in the environment, humans perceive sounds they generate with their bodies, most commonly vocalizations (e.g., speaking, singing). Just as listening to external sounds or generating vocalizations involves the ear or the voice, respectively, auditory imagery of external sound or vocalization has been hypothesized to involve the “inner ear” or “inner voice,” respectively (see Hubbard, 2010, 2013b). A distinction between the inner ear and inner voice underscores one way in which musical imagery (and auditory imagery in general) reflects embodied experience, as elements of the inner voice are linked to articulatory gestures involved in speech, singing, or other sound production. The distinction between the inner ear and inner voice is often related to Baddeley’s model of working memory (Baddeley, 1986, 2000), which contains a phonological store used for retention of auditory material and an articulatory rehearsal mechanism that recodes stimuli for the phonological store. More specifically, the inner ear is linked to a passive phonological store and the inner voice is linked to a more active articulatory rehearsal mechanism. Evidence for the existence of such separate processes is based on a variety of findings (see Hubbard, 2010, 2013b, 2018, forthcoming). Just as the phonological store and articulatory rehearsal mechanism are separate structures or processes that generally work together but can be experimentally separated, Smith and colleagues (Smith, Wilson, & Reisberg, 1995) suggested the same is true of the inner voice and inner ear. Activation of motor areas in musical and auditory imagery has been found in multiple studies (for review, see Lima, Krishnan, & Scott, 2016; Zatorre & Halpern, 2005). Consistent with this, if participants cannot subvocalize during an experimental task involving auditory imagery, thus interfering with potential articulatory activity, performance on some tasks involving auditory imagery is affected (e.g., Reisberg, Smith, Baxter, & Sonenshine, 1989; Smith et al., 1995); this suggests motor activity in the form of articulatory gestures influences at least some auditory imagery (see also Aleman & van’t Wout, 2004). When tasks specified by Smith et al. (1995) to utilize the inner ear or inner voice were given to schizophrenia patients, there were no differences in performance (Evans et al., 2000), and this is consistent with confusion of self-generated and other-generated vocalization in schizophrenia. Studies of Halpern, Zatorre, and colleagues (Halpern et al., 2004; Halpern & Zatorre, 1999; Zatorre et al., 1996) suggest motor areas are activated in auditory imagery of instrumental (non-vocal) stimuli (Halpern & Zatorre, 1999), and this is consistent with Baddeley and Logie’s (1992) claim that the articulatory mechanism is involved in rehearsal of non-vocal stimuli. Relatedly, activation of cerebellar regions involved in control of the tongue and lips occurs during musical imagery (Herholz et al., 2012).
Evidence of subvocalization in musical imagery is found in studies of notational audiation, as recognition of a familiar melody embedded within a larger musical score is disrupted more by phonatory interference than by rhythmic or auditory interference (e.g., Brodsky, Henik, Rubenstein, & Zorman, 2003), and EMG activity near the larynx is increased during reading of a musical score (Brodsky, Kessler, Rubenstein, Ginsborg, & Henik, 2008).
Mental Practice and Performance
The distinction between the inner ear and inner voice suggests motor information contributes to auditory imagery, and a role for motor information in auditory imagery can be seen in studies of the effects of musical imagery in mental practice and performance. The role of mental imagery in musical performance is reviewed in detail in Keller (2012), and findings most relevant to understanding neural mechanisms of musical imagery are briefly considered here. When string players performed or imaged a performance of a specific piece, the times taken to play or image were highly correlated, and frontal lobes, cerebellum, parietal lobe, and supplementary motor area, but not primary auditory cortex, were activated during imagery (Langheim, Callicott, Mattay, Duyn, & Weinberger, 2002). When professional or amateur violinists performed or imaged a performance, somatosensory cortex was activated, and activation was more focused in professionals in imagery and in performance (Lotze, Scheler, Tan, Braun, & Birbaumer, 2003); it was speculated that musical training strengthened connections between auditory and movement areas of the cortex. Violinists exhibited activation of bilateral frontal opercular regions in preparation for and during musical imagery of performance and during performance (Kristeva, Chakarov, Schulte-Mönting, & Spreer, 2003) and exhibited activation in bilateral frontal opercular regions and in sensorimotor, premotor, and supplementary motor areas during imagery of performance and during performance (Nirkko, Baader, Loevblad, Milani, & Wiesendanger, 2000). Coherence of EEG recorded from near the supplementary motor area of a violoncellist was highest while imagining playing scales, less when imagining playing a familiar piece by Bach, and lowest when listening to the same piece by Bach (Petsche, von Stein, & Filz, 1996).
Pianists who were presented with a musical score and imaged playing it or played it on a silent keyboard exhibited overlap in activation of premotor areas in imagery and in performance, and activation was greater during performance; however, primary motor cortex and posterior parietal cortex were active during performance and not during imagery (Meister et al., 2004). Pianists and non-musicians passively listened to a short piano melody or arbitrarily pressed keys on a soundless keyboard, and in both tasks pianists exhibited increased activation in dorsolateral and inferior frontal cortex, superior temporal gyrus, supramarginal gyrus, and supplementary motor and premotor areas (Bangert et al., 2006). Analogous similarities are observed with comparison of imagery and perception. Pianists who listened to familiar pieces exhibited activation in motor regions appropriate for which fingers would have produced the notes (Haueisen & Knösche, 2001) and exhibited activation in auditory areas when they watched a silent video of someone fingering piano keys (Haslinger et al., 2005). Similarly, when pictures of hand configurations for playing guitar chords were shown, guitar players exhibited greater activation in inferior parietal and ventral premotor cortex than did musically untrained observers (Vogt et al., 2007). However, even though there is overlap between neural areas activated during imagery and neural areas activated during perception or performance, there are unique elements to each (e.g., see Zhang et al., 2017), and it is not necessarily the case that a common area of activation implies a similar mental representation (for discussion, see Linke & Cusack, 2015). Experimental participants who scored higher on a test of auditory imagery performed better on a subsequent performance following practice on a silent keyboard in which auditory feedback was not provided (Highben & Palmer, 2004). Guitarists or vocalists performed best with a mixture of mental and physical practice (Theiler & Lippmann, 1995), and mental practice was more effective when musical pieces were relatively easy and less effective than physical practice when musical pieces were more difficult (Cahn, 2008). Pitch encoding of piano students is enhanced if those students make finger tapping movements as if they were playing a piano (Mikumo, 1994), and pitch acuity in auditory imagery and ability to synchronize in a tapping task are positively correlated (Pecenka & Keller, 2009). Relatedly, imaged tempo of popular music is more accurate when individuals tap as they image (Jakubowski et al., 2016).
Imaged singing activates parietal and motor areas including Broca’s area and its right hemisphere homologue (e.g., Baumann et al., 2007) and also activates areas associated with emotional processing including anterior cingulate cortex, anterior temporal lobe, and bilateral amygdala (Kleber, Birbaumer, Veit, Trevorrow, & Lotze, 2007). A case study of a pianist suggested imagery aided in managing tasks and integration of intention and action (Davidson-Kelly, Schaeffer, Moran, & Overy, 2015). In general, mental practice can facilitate subsequent performance (Driskell, Copper, & Moran, 1994; Lotze, 2013), presumably because musical imagery reinforces or strengthens connections made during physical practice, and this suggests a role of motor activation and motor processes in musical imagery.
Dance
Perhaps the most obvious form of embodiment of musical information is dance, which involves production of bodily movements that map onto properties of music. In general, movements of the body can parallel (mimic) movements in music (e.g., slowing near the end of a movement, as when runners slow before stopping and ritardandi occur at the end of a musical piece; Friberg & Sundberg, 1999), and movements of the body in response to a specific piece of music can reflect the rhythm, tempo, meter, and articulation of that music (Fraisse, 1982; Mitchell & Gallaher, 2001). Whether musical imagery influences kinesthetic information in dance, and whether kinesthetic information in dance influences musical imagery, is not known. Given that auditory imagery preserves structural and temporal information of the referent stimulus (Hubbard, 2010, 2013a, 2013b), coupled with structural similarity of music and dance (Krumhansl & Schenck, 1997; Vines, Krumhansl, Wanderley, & Levitin, 2006), relationships between kinesthetic imagery of dance and auditory imagery of music (which would contain kinesthetic information) could be predicted. Such relationships might influence behavior (e.g., musical imagery of ascending or high pitches might facilitate rising or sustained bodily movement, and musical imagery of legato musical notes might facilitate smoother bodily movement) as well as produce similar patterns of cortical activation. Relatedly, findings that auditory stimuli facilitate nondance body movements (e.g., in Parkinson’s disease; Rizzonelli, Kim, Gladow, & Mainka, 2017; Sabaté et al., 2008; Thaut et al., 1996) suggest auditory imagery of music might be useful in the treatment of motor disorders.
Musical Affect
The evolutionary origins of music have been linked to communication of emotional information (e.g., Bryant, 2013; Snowdon, Zimmerman, & Altenmüller, 2015), and this might account for the common observation of a link between music and affect (for review, see Juslin & Sloboda, 2001). Indeed, music perception increases activation in mesocorticolimbic areas, especially in the amygdala and hippocampus (e.g., Blood & Zatorre, 2001; for review, Koelsch, 2010). Listening to music is linked with release of dopamine in dorsal and ventral striatum, and the amount of dopamine released appears related to the amount of pleasure experienced (Salimpoor, Benovoy, Larcher, Dagher, & Zatorre, 2011). Furthermore, perception of music is linked with an increase of oxytocin (Chanda & Levitin, 2013), which is linked with social bonding. To the extent that musical imagery involves activation of the same neural mechanisms as music perception, cognition, and production, musical imagery would presumably be linked with affect. Indeed, as noted earlier, the majority of earworms are experienced as pleasant. Also, if images function as anticipatory predictive processes (e.g., Neisser, 1976; Tian & Poeppel, 2012), then matching of musical imagery to subsequent music perception might result in positive affect resulting from a successful prediction (cf. expectancy as a contributor to emotion; Huron, 2006; Juslin & Västfjäll, 2008). Given that perceived music might activate different cortical areas as a function of whether that music is perceived as happy or sad (Khalfa et al., 2005; Mitterschiffthaler, Fu, Dalton, Andrew, & Williams, 2007), analogous patterns of cortical activation could be predicted during musical imagery.
Summary and Conclusions
Musical imagery is phenomenologically similar to music perception, cognition, and production, and studies of musical imagery are often modeled on studies of music perception, cognition, and production. Studies of musical imagery have included behavioral and psychophysical measures, clinical studies of brain-damaged patients, and electroencephalography and brain imaging measures. These studies often found or suggested parallels between neural mechanisms involved in music perception, cognition, and production and neural mechanisms involved in musical imagery. Musical imagery leads to emitted potentials similar to evoked potentials in music perception, and mismatches between perception and imagery can influence ERP components such as N1, N2, P2, LPC, and MMN. Auditory association areas in frontal and prefrontal cortex are activated during musical imagery, and the right temporal lobe seems critical for generation and judgment of pitch. Greater vividness of musical imagery is linked with greater activation in right superior temporal gyrus and prefrontal cortex, and manipulation of musical imagery activates intraparietal and frontal regions activated in other spatial tasks. However, there are some differences in activation patterns; for example, primary auditory cortex is usually activated in music perception but is usually not activated in music imagery. Overall, neural mechanisms involved in musical imagery, like neural mechanisms involved in music perception, cognition, and production, are distributed throughout the cerebral hemispheres and the cerebellum. An initially surprising finding was that motor areas of the cortex are often activated during musical imagery. This suggests that motor information might contribute to musical imagery, and in fact, motor information has been suggested to contribute to auditory imagery more generally. Researchers proposed a distinction between the inner ear, which involves auditory information, and the inner voice, which involves articulatory information in addition to auditory information. Studies in which the possibility of subvocalization was manipulated support such a distinction. Relatedly, studies of imagery in musical practice and performance highlight how motor activation and information contribute to musical imagery and how musical imagery contributes to performance. Similarly, engagement of the motor system (e.g., tapping along with the beat) improves accuracy of musical imagery, and there is greater activity in motor areas for musicians observing a musical performance on their trained instrument than for non-musicians observing the same performance. The role of the motor system in musical imagery is consistent with an embodied cognition approach and with spatial and force metaphors in the representation of music.
Relatedly, mimicry in the form of covert (e.g., neural activation) or overt action is involved in music perception and musical imagery, and music perception and musical imagery might influence our motor system (e.g., dance). Indeed, given the connection between motor activation in music and effects of music on brain plasticity, it could be predicted that musical imagery might be a useful adjunct in treatment of some motor disorders. Musical imagery occurs in a wide range of domains. Imagery can be voluntary, and it is these voluntary images that previously received the most study. Musical imagery can also occur involuntarily, and examples of involuntary musical imagery include anticipatory musical imagery, pathologies such as musical hallucinations and schizophrenia, and earworms. Anticipatory musical imagery predicts upcoming musical experience, and this is similar to the predictive aspects of other types of imagery and might contribute to musical affect. Relatedly, affective reactions to perceived music are linked to specific neurochemicals and areas of cortical activation, and it could be predicted that musical imagery might involve those same mechanisms. Earworms reflect the common experience of a melodic fragment that individuals “cannot get out of their heads,” and although there have recently been many studies focusing on descriptive phenomenology and behavioral correlates of earworms, there have been few studies examining neural mechanisms of earworms. It can be predicted that neural mechanisms of involuntary imagery overlap with the neural mechanisms involved in voluntary imagery, and any observed differences in activation patterns could inform not just theories of musical representation, but theories of cognitive control more generally. Overall, musical imagery occurs in a variety of situations; involves neural mechanisms involved in music perception, cognition, and production; is an important part of subjective experience; and reflects the embodied nature of cognition.
References
Aleman, A., & van’t Wout, M. (2004). Subvocalization in auditory-verbal imagery: Just a form of motor imagery? Cognitive Processing 5(4), 228–231.
Alossa, N., & Castelli, L. (2009). Amusia and musical functioning. European Neurology 61(5), 269–277.
Baba, A., Hamada, H., & Koca, H. (2003). Musical hallucinations in schizophrenia. 2. Relations with verbal hallucinations. Psychopathology 36(2), 104–110.
Baddeley, A. D. (1986). Working memory. New York: Oxford University Press.
Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences 4(11), 417–423.
Baddeley, A. D., & Logie, R. H. (1992). Auditory imagery and working memory. In D. Reisberg (Ed.), Auditory imagery (pp. 179–197). Hillsdale, NJ: Lawrence Erlbaum Associates.
Bangert, M., Peschel, T., Schlaug, G., Rotte, M., Drescher, D., Hinrichs, H., … Altenmüller, E. (2006). Shared networks for auditory and motor processing in professional pianists: Evidence from fMRI conjunction. NeuroImage 30(3), 917–926.
Baron-Cohen, S., & Harrison, J. E. (Eds.). (1997). Synaesthesia: Classic and contemporary readings. Cambridge, MA: MIT Press/Blackwell.
Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology 59, 617–645.
Baumann, S., Koeneke, S., Schmidt, C. F., Meyer, M., Lutz, K., & Jänke, L. A. (2007). A network for audio-motor coordination in skilled pianists and non-musicians. Brain Research 1161, 65–78.
Beaty, R. E., Burgin, C. J., Nusbaum, E. C., Kwapil, T. R., Hodges, D. A., & Silvia, P. J. (2013). Music to the inner ears: Exploring individual differences in musical imagery. Consciousness and Cognition 22(4), 1163–1173.
Bernardini, F., Attademo, L., Blackmon, K., & Devinsky, O. (2017). Musical hallucinations: A brief review of functional neuroimaging findings. CNS Spectrums 22(5), 397–403.
Berrios, G. E. (1991). Musical hallucinosis: A statistical analysis of 46 cases. Psychopathology 24(6), 356–360.
Besson, M., & Schön, D. (2003). Comparison between language and music. In I. Peretz & R. J. Zatorre (Eds.), The cognitive neuroscience of music (pp. 269–293). New York: Oxford University Press.
Bleich-Cohen, M., Hendler, T., Pashinian, A., Faragian, S., & Poyurovsky, M. (2011). Obsessive musical hallucinations in a schizophrenic patient: Psychopathological and fMRI characteristics. CNS Spectrums 16(7), 153–156.
Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences 98(20), 11818–11823.
Bringas, M. L., Zaldivar, M., Rojas, P. A., Martinez-Montes, K., Chongo, D. M., Ortega, M. A., … Valdes-Sosa, P. A. (2015). Effectiveness of music therapy as an aid to neurorestoration of children with severe neurological disorders. Frontiers in Neuroscience 9, 427. Retrieved from https://doi.org/10.3389/fnins.2015.00427
Brodsky, W., Henik, A., Rubenstein, B. S., & Zorman, M. (2003). Auditory imagery from musical notation in expert musicians. Perception & Psychophysics 65(4), 602–612.
Brodsky, W., Kessler, Y., Rubenstein, B. S., Ginsborg, J., & Henik, A. (2008). The mental representation of music notation: Notational audiation. Journal of Experimental Psychology: Human Perception and Performance 34(2), 427–445.
Bryant, G. A. (2013). Animal signals and emotion in music: Coordinating affect across groups. Frontiers in Psychology 4, 990. Retrieved from https://doi.org/10.3389/fpsyg.2013.00990
Cahn, D. (2008). The effects of varying ratios of physical and mental practice, and task difficulty on performance of a tonal pattern. Psychology of Music 36, 179–191.
Cebrian, A. N., & Janata, P. (2010a). Electrophysiological correlates of accurate mental image formation in auditory perception and imagery tasks. Brain Research 1342, 39–54.
Cebrian, A. N., & Janata, P. (2010b). Influences of multiple memory systems on auditory mental image acuity. Journal of the Acoustical Society of America 127, 3189–3202.
Chanda, M. L., & Levitin, D. J. (2013). The neurochemistry of music. Trends in Cognitive Sciences 17(4), 179–193.
Cho, R., & Wu, W. (2013). Mechanisms of auditory verbal hallucination in schizophrenia. Frontiers in Psychiatry 4, 155. Retrieved from https://doi.org/10.3389/fpsyt.2013.00155
Coebergh, J. A. F., Lauw, R. F., Bots, R., Sommer, I. E. C., & Blom, J. D. (2015). Musical hallucinations: Review of treatment effects. Frontiers in Psychology 6, 814. Retrieved from https://doi.org/10.3389/fpsyg.2015.00814
Cox, A. (2016). Music and embodied cognition. Bloomington, IN: Indiana University Press.
Crowder, R. G. (1989). Imagery for musical timbre. Journal of Experimental Psychology: Human Perception and Performance 15(3), 472–478.
Cytowic, R. E. (2002). Synesthesia: A union of the senses (2nd ed.). Cambridge, MA: MIT Press.
Cytowic, R. E., & Eagleman, D. M. (2011). Wednesday is indigo blue: Discovering the brain of synesthesia. Cambridge, MA: MIT Press.
Daselaar, S. M., Porat, Y., Huijbers, W., & Pennartz, C. M. (2010). Modality-specific and modality-independent components of the human imagery system. NeuroImage 52(2), 677–685.
Davidson-Kelly, K., Schaeffer, R. S., Moran, N., & Overy, K. (2015). “Total Inner Memory”: Deliberate uses of multimodal musical imagery during performance preparation. Psychomusicology: Music, Mind and Brain 25(1), 83–92.
Day, S. A. (2016). Synesthetes: A handbook. CreateSpace Independent Publishing Platform.
Deroy, O., Fernandez-Prieto, I., Navarra, J., & Spence, C. (2018). Unraveling the paradox of spatial pitch. In T. L. Hubbard (Ed.), Spatial biases in perception and cognition (pp. 77–93). New York: Cambridge University Press.
Douglas, K. M., & Bilkey, D. K. (2007). Amusia is associated with deficits in spatial processing. Nature Neuroscience 10(7), 915–921.
Driskell, J. E., Copper, C., & Moran, A. (1994). Does mental practice enhance music performance? Journal of Applied Psychology 79(4), 481–492.
Eitan, Z., & Granot, R. Y. (2006). How music moves: Musical parameters and listeners’ images of motion. Music Perception 23(3), 221–247.
Eitan, Z., & Timmers, R. (2010). Beethoven’s last piano sonata and those who follow crocodiles: Cross-domain mappings of auditory pitch in a musical context. Cognition 114(3), 405–422.
Elkin, J., & Leuthold, H. (2011). The representation of pitch in auditory imagery: Evidence from S-R compatibility and distance effects. Journal of Cognitive Psychology 23(1), 76–91.
Evans, C. L., McGuire, P. K., & David, A. S. (2000). Is auditory imagery defective in patients with auditory hallucinations? Psychological Medicine 30(1), 137–148.
Evers, S. (2006). Musical hallucinations. Current Psychiatry Reports 8(3), 205–210.
Evers, S., & Ellger, T. (2004). The clinical spectrum of musical hallucinations. Journal of the Neurological Sciences 227(1), 55–65.
Farrugia, N., Jakubowski, K., Cusack, R., & Stewart, L. (2015). Tunes stuck in your brain: The frequency and affective evaluation of involuntary musical imagery correlate with cortical structure. Consciousness and Cognition 35, 66–77.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music (pp. 149–181). New York: Academic Press.
Friberg, A., & Sundberg, J. (1999). Does music performance allude to locomotion? A model of final ritardandi derived from measurements of stopping runners. Journal of the Acoustical Society of America 105(3), 1469–1484.
Gabriel, D., Wong, T. C., Nicolier, M., Giustiniani, J., Mignot, C., Noiret, N., … Vandel, P. (2016). Don’t forget the lyrics! Spatiotemporal dynamics of neural mechanisms spontaneously evoked by gaps of silence in familiar and newly learned songs. Neurobiology of Learning and Memory 132, 18–28.
Galant-Swafford, J., & Bota, R. (2015). Musical hallucinations in schizophrenia. Mental Illness 7(1), 6065.
Gibbs, R. W. (2005). Embodiment and cognitive science. New York: Cambridge University Press.
Godøy, R. I. (2001). Imagined action, excitation, and resonance. In R. I. Godøy & H. Jørgensen (Eds.), Musical imagery (pp. 237–250). New York: Taylor & Francis.
Greenwood, D. D. (1961). Critical bandwidth and the frequency coordinates of the basilar membrane. Journal of the Acoustical Society of America 33, 1344–1356.
Griffiths, T. D. (2000). Musical hallucinosis in acquired deafness: Phenomenology and brain substrate. Brain 123(10), 2065–2076.
Griffiths, T. D., Jackson, M. C., Spillane, J. A., Friston, K. J., & Frackowiak, R. S. J. (1997). A neural substrate for musical hallucinosis. Neurocase 3(3), 167–172.
Grossenbacher, P. G., & Lovelace, C. T. (2001). Mechanisms of synesthesia: Cognitive and physiological constraints. Trends in Cognitive Sciences 5(1), 36–41.
Halpern, A. R. (1988a). Mental scanning in auditory imagery for songs. Journal of Experimental Psychology: Learning, Memory, and Cognition 14, 434–443.
Halpern, A. R. (1988b). Perceived and imaged tempos of familiar songs. Music Perception 6(2), 193–202.
Halpern, A. R. (1989). Memory for the absolute pitch of familiar songs. Memory & Cognition 17(5), 572–581.
Halpern, A. R. (2003). Cerebral substrates of musical imagery. In I. Peretz & R. Zatorre (Eds.), The cognitive neuroscience of music (pp. 217–230). New York: Oxford University Press.
Halpern, A. R., & Bartlett, J. C. (2011). The persistence of musical memories: A descriptive study of earworms. Music Perception 28(4), 425–432.
Halpern, A. R., & Zatorre, R. J. (1999). When that tune runs through your head: A PET investigation of auditory imagery for familiar melodies. Cerebral Cortex 9(7), 697–704.
Halpern, A. R., Zatorre, R. J., Bouffard, M., & Johnson, J. A. (2004). Behavioral and neural correlates of perceived and imagined musical timbre. Neuropsychologia 42(9), 1281–1292.
Hammeke, T. A., McQuillen, M. P., & Cohen, B. A. (1983). Musical hallucinations associated with acquired deafness. Journal of Neurology, Neurosurgery, and Psychiatry 46(6), 570–572.
Haslinger, B., Erhard, P., Altenmüller, E., Schroeder, U., Boecker, H., & Ceballos-Baumann, A. O. (2005). Transmodal sensorimotor networks during action observation in professional pianists. Journal of Cognitive Neuroscience 17(2), 282–293.
Haueisen, J., & Knösche, T. (2001). Involuntary motor activity in pianists evoked by music perception. Journal of Cognitive Neuroscience 13(6), 786–792.
Hemming, J., & Merrill, J. (2015). On the distinction between involuntary musical imagery, musical hallucinosis, and musical hallucinations. Psychomusicology: Music, Mind, and Brain 25(4), 435–442.
Herholz, S. C., Halpern, A. R., & Zatorre, R. J. (2012). Neuronal correlates of perception, imagery, and memory for familiar tunes. Journal of Cognitive Neuroscience 24(6), 1382–1397.
Herholz, S. C., Lappe, C., Knief, A., & Pantev, C. (2008). Neural basis of music imagery and the effect of musical expertise. European Journal of Neuroscience 28(11), 2352–2360.
Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity: Behavior, function, and structure. Neuron 76(3), 486–502.
Highben, Z., & Palmer, C. (2004). Effects of auditory and motor mental practice in memorized piano performance. Bulletin of the Council for Research in Music Education 159, 58–65.
Hodges, D. A. (2009). Bodily responses to music. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (2nd ed., pp. 183–196). New York: Oxford University Press.
Hubbard, T. L. (2010). Auditory imagery: Empirical findings. Psychological Bulletin 136(2), 302–329.
Hubbard, T. L. (2013a). Auditory aspects of auditory imagery. In S. Lacey & R. Lawson (Eds.), Multisensory imagery (pp. 51–76). New York: Springer.
Hubbard, T. L. (2013b). Auditory imagery contains more than audition. In S. Lacey & R. Lawson (Eds.), Multisensory imagery (pp. 221–247). New York: Springer.
Hubbard, T. L. (2017). Momentum in music: Musical succession as physical motion. Psychomusicology: Music, Mind, and Brain 27(1), 14–30.
Hubbard, T. L. (2018). Some methodological and conceptual considerations in studies of auditory imagery. Auditory Perception and Cognition 1, 6–41.
Hubbard, T. L. (forthcoming). Some anticipatory, kinesthetic, and dynamic aspects of auditory imagery. In M. Grimshaw, M. Walther-Hansen, & M. Knakkergaard (Eds.), The Oxford handbook of sound and imagination. New York: Oxford University Press. Hubbard, T. L., & Stoeckig, K. (1988). Musical imagery: Generation of tones and chords. Journal of Experimental Psychology: Learning, Memory, and Cognition 14, 656–667. Huron, D. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA: MIT Press. Iacoboni, M. (2009). Imitation, empathy, and mirror neurons. Annual Review of Psychology 60, 653– 670. Jakubowski, K., Farrugia, N., & Stewart, L. (2016). Probing imagined tempo for music: Effects of motor engagement and musical experience. Psychology of Music 44(6), 1274–1288. Janata, P. (2001). Brain electrical activity evoked by mental formation of auditory expectations and images. Brain Topography 13(3), 169–193. Janata, P., & Paroo, K. (2006). Acuity of auditory images in pitch and time. Perception & Psychophysics 68(5), 829–844. Johns, L. C., Rossell, S., Frith, C., Ahmad, F., Hemsley, D., Kuipers, E., & McGuire, P. K. (2001). Verbal self-monitoring and auditory verbal hallucinations in patients with schizophrenia. Psychological Medicine 31(4), 705–715. Johnson, M. L., & Larson, S. (2003). “Something in the way she moves” - Metaphors of musical motion. Metaphor and Symbol 18(2), 63–84. Juslin, P. N., & Sloboda, J. A. (Eds.). (2001). Music and emotion: Theory and research. New York: Oxford University Press. Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. Behavioral and Brain Sciences 31(5), 559–621. Kantrowitz, J. T., Scaramello, N., Jakubovitz, A., Lehrfeld, J. M., Laukka, P., Elfenbein, H. A., … Javitt, D. C. (2014). Amusia and protolanguage impairments in schizophrenia. Psychological Medicine 44(13), 2739–2748. Kasai, K., Asada, T., Yumoto, M., Takeya, J., & Matsuda, H. (1999). 
Evidence of functional abnormality in the right auditory cortex during musical hallucinations. Lancet 354, 1703–1704. Keller, P. E. (2012). Mental imagery in musical performance: Underlying mechanisms and potential benefits. Annals of the New York Academy of Sciences 1252, 206–213. Keller, P. E., Dalla Bella, S., & Koch, I. (2010). Auditory imagery shapes movement timing and kinematics: Evidence from a musical task. Journal of Experimental Psychology: Human Perception and Performance 36(2), 508–513. Keshavan, M. S., Davis, A. S., Steingard, S., & Lishman, W. A. (1992). Musical hallucinosis: A review and synthesis. Neuropsychiatry, Neuropsychology, & Behavioral Neurology 5(3), 211–223. Khalfa, S., Schon, D., Anton, J.-L., & Liégeois-Chauvel, C. (2005). Brain regions involved in the recognition of happiness and sadness in music. Neuroreport 16(18), 1981–1984. Kleber, B., Birbaumer, N., Veit, R., Trevorrow, T., & Lotze, M. (2007). Overt and imagined singing of an Italian aria. NeuroImage 36(3), 889–900. Koelsch, S. (2010). Towards a neural basis of music-evoked emotions. Trends in Cognitive Sciences 14(3), 131–137. Koelsch, S. (2012). Brain and music. Cambridge, MA: Wiley-Blackwell. Kohler, E., Keysers, C., Umiltà, M. A., Fogassi, L., Gallese, V., & Rizzolatti, G. (2002). Hearing sounds, understanding actions: Action representation in mirror neurons. Science 297(5582), 846–848. Kraemer, D. J. M., Macrae, C. N., Green, A. E., & Kelley, W. M. (2005). Musical imagery: Sound of silence activates auditory cortex. Nature 434, 158.
Kristeva, R., Chakarov, V., Schulte-Mönting, J., & Spreer, J. (2003). Activation of cortical areas in music execution and imagining: A high-resolution EEG study. NeuroImage 20(3), 1872–1883. Krumhansl, C. L., & Schenck, D. L. (1997). Can dance reflect the structural and expressive qualities of music? A perceptual experiment on Balanchine’s choreography of Mozart’s Divertimento No. 15. Musicae Scientiae 1, 63–85. Kumar, S., Sedley, W., Barnes, G. R., Teki, S., Friston, K. J., & Griffiths, T. D. (2014). A brain basis for musical hallucinations. Cortex 52(100), 86–97. Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago, IL: University of Chicago Press. Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its challenge to western thought. New York: Basic Books. Langheim, F. J., Callicott, J. H., Mattay, V. S., Duyn, J. H., & Weinberger, D. R. (2002). Cortical systems associated with covert music rehearsal. Neuroimage 16(4), 901–908. Larson, S. (2012). Musical forces: Motion, metaphor and meaning in music. Bloomington, IN: Indiana University Press. Leaver, A. M., van Lare, J., Zielinski, B., Halpern, A. R., & Rauschecker, J. P. (2009). Brain activation during anticipation of sound sequences. Journal of Neuroscience 29(8), 2477–2485. Levitin, D. J. (2007). This is your brain on music: The science of a human obsession. New York: Penguin Group. Levitin, D. J., & Tirovolas, A. K. (2009). Current advances in the cognitive neuroscience of music. Annals of the New York Academy of Sciences 1156, 211–231. Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition 21, 1–36. Liégois-Chauvel, C., Peretz, I., Babaï, M., Laguitton, V., & Chauvel, P. (1998). Contribution of different cortical areas in the temporal lobes in music processing. Brain 121(10), 1853–1867. Lima, C. F., Krishnan, S., & Scott, S. K. (2016). Roles of supplementary motor areas in auditory processing and auditory imagery. 
Trends in Neurosciences 39(8), 527–542. Lima, C. F., Lavan, N., Evans, S., Agnew, Z., Halpern, A. R., Shanmugalingam, P., … Scott, S. K. (2015). Feel the noise: Relating individual differences in auditory imagery to the structure and function of sensorimotor systems. Cerebral Cortex 25(11), 4638–4650. Linden, D. E. J., Thornton, K., Kuswanto, C. N., Johnston, S. J., van de Ven, V., & Jackson, M. C. (2011). The brain’s voices: Comparing nonclinical auditory hallucinations and imagery. Cerebral Cortex 21(2), 330–337. Linke, A. C., & Cusack, R. (2015). Flexible information coding in human auditory cortex during perception, imagery, and STM of complex sounds. Journal of Cognitive Neuroscience 27(7), 1322–1333. Lotze, M. (2013). Kinesthetic imagery of musical performance. Frontiers in Human Neuroscience 7, 280. Retrieved from https://doi.org/10.3389/fnhum.2013.00280 Lotze, M., Scheler, G., Tan, H. R., Braun, C., & Birbaumer, N. (2003). The musician’s brain: Functional imaging of amateurs and professionals during performance and imagery. Neuroimage 20(3), 1817–1829. McGuire, P. K., Silbersweig, D. A., Wright, I., Murray, R. M., David, A. S., Frackowiak, R. S. J., & Frith, C. D. (1995). Abnormal monitoring of inner speech: A physiological basis for auditory hallucinations. Lancet 346, 596–600. McGuire, P. K., Silbersweig, D. A., Murray, R. M., David, A. S., Frackowiak, R. S., & Frith, C. D. (1996). Functional anatomy of inner speech and auditory verbal imagery. Psychological Medicine 26(1), 29–38. Marin, O. S. M., & Perry, D. W. (1999). Neurological aspects of music perception and performance. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 653–724). New York: Academic Press.
Meister, I. G., Krings, T., Foltys, H., Boroojerdi, B., Müller, M., Töpper, R., & Thron, A. (2004). Playing piano in the mind: An fMRI study on music imagery and performance in pianists. Cognitive Brain Research 19(3), 219–228. Mikumo, M. (1994). Motor encoding strategy for pitches of melodies. Music Perception 12(2), 175– 197. Mitchell, R. W., & Gallaher, M. C. (2001). Embodying music: Matching music and dance in memory. Music Perception 19(1), 65–85. Mitterschiffthaler, M. T., Fu, C. H. Y., Dalton, J. A., Andrew, C. M., & Williams, S. C. R. (2007). A functional MRI study of happy and sad affective states induced by classical music. Human Brain Mapping 28(11), 1150–1162. Neisser, U. (1976). Cognition and reality: Principles and implications of cognitive psychology. New York: W. H. Freeman. Nirkko, A. C., Baader, A. P., Loevblad, K.-O., Milani, P., & Wiesendanger, M. (2000). Cortical representation of music production in violin players: Behavioral assessment and functional imaging of finger sequencing, bimanual coordination and music specific brain activation. NeuroImage 11(5), S106. Oh, J., Kwon, J. H., Yang, P. S., & Jeong, J. (2013). Auditory imagery modulates frequency-specific areas in the human auditory cortex. Journal of Cognitive Neuroscience 25(2), 175–187. Oztop, E., Kawato, M., & Arbib, M. (2006). Mirror neurons and imitation: A computationally guided review. Neural Networks 19(3), 254–271. Parsons, L. M. (2003). Exploring the functional neuroanatomy of music performance, perception, and comprehension. In I. Peretz & R. Zatorre (Eds.), The cognitive neuroscience of music (pp. 247– 268). New York: Oxford University Press. Patel, A. D. (2008). Music, language, and the brain. New York: Oxford University Press. Pearce, J. M. S. (2007). Synaesthesia. European Neurology 57(2), 120–124. Pecenka, N., & Keller, P. E. (2009). Auditory pitch imagery and its relationship to musical synchronization. Annals of the New York Academy of Sciences 1169, 282–286. 
Penfield, W., & Perot, P. (1963). The brain’s record of auditory and visual experience. Brain 86(4), 595–696. Peretz, I. (1990). Processing of local and global musical information by unilateral brain-damaged patients. Brain 113(4), 1185–1205. Peretz, I. (2002). Brain specialization for music. Neuroscientist 8(4), 374–382. Peretz, I. (2013). The biological foundations of music: Insight from congenital amusia. In D. Deutsch (Ed.). The psychology of music (3rd ed., pp. 551–564). New York: Academic Press. Peretz, I., & Zatorre, R. J. (Eds.). (2003). The cognitive neuroscience of music. New York: Oxford University Press. Peretz, I., & Zatorre, R. J. (2005). Brain organization for music processing. Annual Review of Psychology 56, 89–114. Perrone-Capano, C., Volpicelli, F., & di Porzio, U. (2017). Biological bases of human musicality. Review of Neuroscience 28(3), 235–245. Petsche, H., von Stein, A., & Filz, O. (1996). EEG aspects of mentally playing an instrument. Cognitive Brain Research 3(2), 115–123. Platel, H., Price, C., Baron, J.-C., Wise, R., Lambert, J., Frackowiak, R. S. J., … Eustache, F. (1997). The structural components of music perception: A functional anatomical study. Brain: A Journal of Neurology 120(2), 229–243. Prete, G., Marzoli, D., Brancucci, A., & Tommasi, L. (2016). Hearing it right: Evidence of hemispheric lateralization in auditory imagery. Hearing Research 332, 80–86.
Ramachandran, V. S., & Hubbard, E. M. (2001). Synaesthesia: A window into perception, thought and language. Journal of Consciousness Studies 8(12), 3–34. Reisberg, D., Smith, J. D., Baxter, D. A., & Sonenshine, M. (1989). “Enacted” auditory images are ambiguous; “pure” auditory images are not. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology 41(3), 619–641. Reybrouck, M. (2001). Musical imagery between sensory processing and ideomotor simulation. In R. I. Godøy & H. Jørgensen (Eds.), Musical imagery (pp. 117–135). New York: Taylor & Francis. Rizzonelli, M., Kim, J. H., Gladow, T., & Mainka, S. (2017). Musical stimulation with feedback in gait training for Parkinson’s disease. Psychomusicology: Music, Mind, and Brain 27, 213–218. Robertson, L. C., & Sagiv, N. (Eds.). (2005). Synesthesia: Perspectives from cognitive neuroscience. New York: Oxford University Press. Rouw, R., & Scholte, H. S. (2010). Neural basis of individual differences in synesthetic experiences. Journal of Neuroscience 30(18), 6205–6213. Saba, P. R., & Keshavan, M. S. (1997). Musical hallucinations and musical imagery: Prevalence and phenomenology in schizophrenic inpatients. Psychopathology 30, 185–190. Sabaté, M., Llanos, C., & Rodriguez, M. (2008). Integration of auditory and kinesthetic information in motion: Alterations in Parkinson’s disease. Neuropsychology 22(4), 462–468. Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically distant dopamine release during anticipation and experience of peak emotion to music. Nature Neuroscience 14, 257–262. Samson, S., & Zatorre, R. J. (1991). Recognition memory for text and melody of songs after unilateral temporal lobe lesion: Evidence for dual encoding. Journal of Experimental Psychology: Learning, Memory, and Cognition 17(4), 793–804. Särkämö, T., Altenmüller, E., Rodriguez-Fornells, A., & Peretz, I. (2016). Editorial. 
Music, brain, and rehabilitation: Emerging therapeutic applications and potential neural mechanisms. Frontiers in Human Neuroscience 10, 103. Retrieved from https://doi.org/10.3389/fnhum.2016.00103 Satoh, M. (2014). Musical processing in the brain: A neuropsychological approach through cases with amusia. Austin Journal of Clinical Neurology 1(2), 1009. Satoh, M., Takeda, K., Nagata, N., Hatazawa, J., & Kuzuhara, S. (2001). Activated brain regions in musicians during an ensemble: A PET study. Cognitive Brain Research 12(1), 101–108. Schaefer, R. S., Desain, P., & Suppes, P. (2009). Structural decomposition of EEG signatures of melodic processing. Biological Psychology 82(3), 253–259. Schaefer, R. S., Vlek, R. J., & Desain, P. (2011). Music perception and imagery in EEG: Alpha band effects of task and stimulus. International Journal of Psychophysiology 82(3), 254–259. Schellenberg, E. G., & Trehub, S. E. (2003). Good pitch memory is widespread. Psychological Science 14(3), 262–266. Schlaug, G. (2015). Musicians and music making as a model for the study of brain plasticity. Progress in Brain Research 127, 37–55. Schürmann, M., Raij, T., Fujiki, N., & Hari, R. (2002). Mind’s ear in a musician: Where and when in the brain. NeuroImage 16(2), 434–440. Senior, C., Barnes, J., Giampietroc, V., Simmons, A., Bullmore, E. T., Brammer, M., & David, A. S. (2000). The functional neuroanatomy of implicit-motion perception or “representational momentum.” Current Biology 10(1), 16–22. Senior, C., Ward, J., & David, A. S. (2002). Representational momentum and the brain: An investigation of the functional necessity of V5/MT. Visual Cognition 9(1), 81–92. Shapiro, L. (2010). Embodied cognition. New York: Routledge. Shergill, S. S., Bullmore, E., Simmons, A., Murray, R., & McGuire, P. (2000). Functional anatomy of auditory verbal imagery in schizophrenic patients with auditory hallucinations. American Journal
of Psychiatry 157(10), 1691–1693. Simoens, V. L., & Tervaniemi, M. (2013). Auditory short-term memory activation during score reading. PLoS ONE 8(1), e53691. Smith, J. D., Wilson, M., & Reisberg, D. (1995). The role of subvocalization in auditory imagery. Neuropsychologia 33(11), 1433–1454. Snowdon, C. T., Zimmerman, E., & Altenmüller, E. (2015). Music evolution and neuroscience. Progress in Brain Research 217, 17–34. Spiller, M. J., Jonas, C. N., Simner, J., & Jansari, A. (2015). Beyond visual imagery: How modality-specific is enhanced mental imagery in synesthesia? Consciousness and Cognition 31, 73–85. Stewart, L., von Kriegstein, K., Warren, J. D., & Griffiths, T. D. (2006). Music and the brain: Disorders of musical listening. Brain 129(10), 2533–2553. Thaut, M. H., McIntosh, G. C., Rice, R. R., Miller, R. A., Rathbun, J., & Brault, J. M. (1996). Rhythmic auditory stimulation in gait training with Parkinson’s disease patients. Movement Disorders 11(2), 193–200. Theiler, A. M., & Lippman, L. G. (1995). Effects of mental practice and modeling on guitar and vocal performance. Journal of General Psychology 122(4), 329–343. Tian, X., & Poeppel, D. (2012). Mental imagery of speech: Linking motor and perceptual systems through internal simulation and estimation. Frontiers in Human Neuroscience 6, 314. Retrieved from https://doi.org/10.3389/fnhum.2012.00314 Tillmann, B., Jolicoeur, P., Ishihara, M., Gosselin, N., Bertrand, O., Rossetti, Y., & Peretz, I. (2010). The amusic brain: Lost in music, but not in space. PLoS ONE 5(4), e10173. van Dijk, H., Nieuwenhuis, I. L., & Jensen, O. (2010). Left temporal alpha band activity increases during working memory retention of pitches. European Journal of Neuroscience 31(9), 1701–1707. Villena-González, M., López, V., & Rodríguez, E. (2016). Data of ERPs and spectral alpha power when attention is engaged on visual or verbal/auditory imagery. Data in Brief 7, 882–888. Vines, B. W., Krumhansl, C. L., Wanderley, M. M., & Levitin, D. 
J. (2006). Cross-modal interactions in the perception of musical performance. Cognition 101(1), 80–113. Vlek, R. J., Schaefer, R. S., Gielen, C. C. A. M., Farquhar, J. D. R., & Desain, P. (2011). Shared mechanisms in perception and imagery of auditory accents. Clinical Neurophysiology 122(8), 1526–1532. Vogt, S., Buccino, G., Wohlschlager, A. M., Canessa, N., Shah, N. J., Zilles, K., … Fink, G. R. (2007). Prefrontal involvement in imitation learning of hand actions: Effects of practice and expertise. NeuroImage 37(4), 1371–1383. Vuvan, D. T., & Schmuckler, M. A. (2011). Tonal hierarchy representations in auditory imagery. Memory & Cognition 39(3), 477–490. Wan, C. Y., & Schlaug, G. (2013). Brain plasticity induced by musical training. In D. Deutsch (Ed.). The psychology of music (3rd ed., pp. 565–581). New York: Academic Press. Williams, T. I. (2015). The classification of involuntary musical imagery: The case for earworms. Psychomusicology: Music, Mind, and Brain 25(1), 5–13. Wilson, M. (2002). Six views of embodied cognition. Psychonomic Bulletin & Review 9(4), 625–636. Wu, J., Mai, X., Chan, C. C. H., Zheng, Y., & Luo, Y. (2006). Event related potentials during mental imagery of animal sounds. Psychophysiology 43(6), 592–597. Wu, J., Mai, X., Yu, Z., Qin, S., & Luo, Y. (2010). Effects of discrepancy between imagined and perceived sounds on the N2 component of the event-related potential. Psychophysiology 47(2), 289–298. Wu, J., Yu, Z., Mai, X., Wei, J., & Luo, Y. (2011). Pitch and loudness information encoded in auditory imagery as revealed by event-related potentials. Psychophysiology 48(3), 415–419.
Yoo, S. S., Lee, C. U., & Choi, B. G. (2001). Human brain mapping of auditory imagery: Event-related functional MRI study. Neuroreport 12(14), 3045–3049. Yumoto, M., Matsuda, M., Itoh, K., Uno, A., Karino, S., Siatoh, O., … Kaga, K. (2005). Auditory imagery mismatch negativity elicited in musicians. Neuroreport 16(11), 1175–1178. Zatorre, R. J. (1988). Pitch perception of complex tones and human temporal-lobe function. Journal of the Acoustical Society of America 84(2), 566–572. Zatorre, R. J., & Halpern, A. R. (1993). Effect of unilateral temporal-lobe excision on perception and imagery of songs. Neuropsychologia 31(3), 221–232. Zatorre, R. J., & Halpern, A. R. (2005). Mental concerts: Musical imagery and the auditory cortex. Neuron 47(1), 9–12. Zatorre, R. J., Halpern, A. R., & Bouffard, M. (2010). Mental reversal of imagined melodies: A role for the posterior parietal cortex. Journal of Cognitive Neuroscience 22(4), 775–789. Zatorre, R. J., Halpern, A. R., Perry, D. W., Meyer, E., & Evans, A. C. (1996). Hearing in the mind’s ear: A PET investigation of musical imagery and perception. Journal of Cognitive Neuroscience 8(1), 29–46. Zhang, Y., Chen, G., Wen, H., Lu, K. H., & Liu, Z. (2017). Musical imagery involves Wernicke’s area in bilateral and anti-correlated network interactions in musicians. Scientific Reports 7(1), 17068.
1
Interestingly, a number of prominent composers are suspected of experiencing, or have reported experiencing, synesthesia in which musical stimuli evoked different colors or other visual qualities (e.g., Leonard Bernstein, Duke Ellington, Billy Joel, Franz Liszt, Olivier Messiaen, Nikolai Rimsky-Korsakov, Alexander Scriabin). The greater prevalence of non-musical imagery (e.g., visual color) triggered by a musical stimulus (e.g., pitch), coupled with the relative lack of musical imagery (e.g., pitch) triggered by a non-musical stimulus (e.g., visual color), is consistent with findings that auditory stimuli evoke non-auditory qualities in a large percentage of synesthetes, but non-auditory stimuli evoke auditory qualities in a very small percentage of synesthetes (Spiller, Jonas, Simner, & Jansari, 2015). Also, it is not clear if the apparent lack of auditory imagery induced in synesthesia is due to a limitation of synesthesia or to a bias in reporting.
CHAPTER 22
NEUROPLASTICITY IN MUSIC LEARNING
VESA PUTKINEN AND MARI TERVANIEMI
INTRODUCTION
One of the central aims of neuroscience is to understand how experience shapes the brain. Animal studies have demonstrated experience-driven neuroplasticity at various spatial and temporal scales, with some of the earliest evidence coming from studies investigating the effects of enriched environments on brain function and anatomy (Buonomano & Merzenich, 1998; Praag, Kempermann, & Gage, 2000). During the last three decades, musical training has emerged as a popular model for brain plasticity in humans (Herholz & Zatorre, 2012). Since mastering a musical instrument requires years of intense deliberate practice and relies on a multitude of perceptual and cognitive skills, it is a reasonable prediction that the brains of musicians and musical laypersons should differ in many respects. Indeed, since the mid-1990s, an ever-increasing number of neuroimaging and electrophysiological studies have reported differences between adult musicians and non-musicians in neural markers of sensory processing and motor function as well as neural correlates of higher-order, domain-general cognitive functions. Although these findings are typically discussed in the context of expertise and plasticity, it is also widely recognized that, in addition to training, predisposing factors most likely contribute to these
group differences. Obviously, cross-sectional studies in adults cannot tease apart the contribution of these factors. As a consequence, several research groups have started to conduct longitudinal studies in children in an attempt to establish the causal role of training in driving structural and functional differentiation of the brain. Structural imaging studies have provided evidence for numerous differences in neural architecture between adult musicians and non-musicians, including differences in gray matter anatomy of auditory and somatosensory systems (Bermudez, Lerch, Evans, & Zatorre, 2009; Gaser & Schlaug, 2003; Schneider et al., 2002), extra-sensory regions (Bermudez et al., 2009; Gaser & Schlaug, 2003; Sluming et al., 2002), and the organization of the white matter tracts (Bengtsson et al., 2005; Halwani, Loui, Rüber, & Schlaug, 2011; Schlaug, Jäncke, Huang, Staiger, & Steinmetz, 1995). Here, we focus mainly on studies that have investigated functional reorganization of the musician brain and, in particular, on studies employing event-related potentials (ERPs) or fields (ERFs) derived from electroencephalography (EEG) and magnetoencephalography (MEG), respectively. This literature spans more than twenty years and includes many of the first reports of the neural correlates of musical expertise and, notably, the majority of the more recent longitudinal studies conducted in this framework thus far.
E E
M C
S
P
M Some of the earliest evidence for neuroplastic effects of musical training came from studies that employed source-modeling of electromagnetic brain responses to investigate cortical somatosensory and auditory representations in adult musicians (Elbert, Pantev, Wienbruch, Rockstroh, & Taub, 1995; Pantev et al., 1998). It was found that string players showed stronger cortical responses to pneumatic stimulation of the left-hand digits used to control the pitch of the instrument whereas no differences between the musicians and control subjects were found for the right-hand digits. In the
auditory domain, musicians showed stronger responses to piano tones than non-musicians but not to pure tones, indicating that auditory cortical representations were enhanced in musicians particularly for acoustically rich, musical sounds. Other early electrophysiological evidence indicated that the tuning of the auditory cortical representations in musicians was strongest for the timbre of the musicians’ trained instrument (Pantev, Roberts, Schulz, Engelien, & Ross, 2001). Consequently, these findings were interpreted as evidence for training-induced plasticity, a conclusion bolstered by the finding that the magnitude of the enhancement was correlated with the years of musical training or the age of training onset (Elbert et al., 1995; Pantev et al., 1998). Also in the mid-1990s, several research groups began to examine how the auditory system of musicians and non-musicians generates expectations of upcoming auditory events based on preceding auditory input. In the first of these studies, musicians and non-musicians listened to familiar and unfamiliar musical sound sequences and evaluated whether the ending tone was harmonically or rhythmically congruous or not (Besson & Faïta, 1995; Besson, Faïta, & Requin, 1994). Musicians outperformed the controls in the behavioral task (except for the incongruities in the familiar musical phrases, which were detected by both groups with equal success) and showed augmented and faster late-latency brain responses (termed late positive component, LPC) to the incongruent endings relative to the non-musicians, indicating that formal knowledge of musical rules modifies the neural mechanisms underlying musical expectancies (Besson & Faïta, 1995). However, the LPC did not differ significantly between the groups in another condition where the task was simply to listen to the musical phrases but not explicitly decide whether the ending tones were congruous or incongruous. 
These were the first demonstrations that ERPs are sensitive to differences in musical sophistication and indicated that musical training could modify relatively late (400–800 ms after stimulus onset) postperceptual cognitive processes related to the overt detection of unexpected musical events. Later studies found evidence that the superior sound discrimination in musicians extended to earlier stages of the auditory processing stream and was evident even for sound processing that occurs outside the focus of attention. One example of an ERP index used in such studies is the early right anterior negativity (ERAN) that is elicited by harmonically inappropriate
chords within chord cadences. The ERAN peaks around 200 ms and its neural sources have been localized to bilateral inferior frontal gyri, with a particularly strong contribution from the right-hemisphere homologue of Broca’s area (Maess, Koelsch, Gunter, & Friederici, 2001). The ERAN is thought to rely on learned music-syntactic rules (Koelsch, 2009) and can be obtained from non-musicians (Koelsch, Gunter, Friederici, & Schröger, 2000) and even from young children (Koelsch et al., 2003). However, the amplitude of this response is augmented in musicians (Koelsch, Schmidt, & Kansok, 2002) and musically trained children (Jentschke & Koelsch, 2009), indicating that, even though implicit learning of harmonic regularities typical for one’s culture is sufficient for the pre-attentive detection of harmony violations, formal training has an additional enhancing effect on this ability. The mismatch negativity (MMN) has been used particularly widely to examine early processing of musical sound regularities in musicians and non-musicians. The MMN has been conceptualized as a response to sounds that deviate from predictions that are generated based on regularities that the auditory system automatically extracts from the sound environment (Näätänen, Paavilainen, Rinne, & Alho, 2007; Näätänen, Tervaniemi, Sussman, Paavilainen, & Winkler, 2001). The MMN appears to originate from auditory cortical and prefrontal sources (Rinne, Alho, Ilmoniemi, Virtanen, & Näätänen, 2000; Schönwiesner et al., 2007) and is typically seen as a negative peak in the ERP between 100 and 250 ms from the onset of a sound that deviates from some regular aspects of a sound stream. These regularities can range from simple ones such as repeating pitch to more abstract rules (Näätänen et al., 2001; Paavilainen, 2013). 
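As a rough illustration of how such a response is commonly quantified, the sketch below averages epochs for standard and deviant sounds, subtracts the two averages to obtain a difference wave, and then searches the 100–250 ms window for the most negative point. It is a minimal toy example and not the pipeline of any study cited here: the simulated data, the 500 Hz sampling rate, the 170 ms peak latency, and the function names (`average_epochs`, `mmn_peak`, `simulate_epochs`) are all assumptions made for illustration.

```python
import math
import random


def average_epochs(epochs):
    """Average equal-length epochs sample by sample to obtain an ERP."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def mmn_peak(standard_epochs, deviant_epochs, sfreq, window=(0.100, 0.250)):
    """Return (latency in s, amplitude) of the most negative point of the
    deviant-minus-standard difference wave inside `window`.
    Sample 0 is assumed to coincide with sound onset."""
    standard_erp = average_epochs(standard_epochs)
    deviant_erp = average_epochs(deviant_epochs)
    difference = [d - s for d, s in zip(deviant_erp, standard_erp)]
    start = int(round(window[0] * sfreq))
    stop = int(round(window[1] * sfreq))
    peak_index = min(range(start, stop + 1), key=lambda i: difference[i])
    return peak_index / sfreq, difference[peak_index]


def simulate_epochs(n_epochs, sfreq, n_samples, deflection, rng):
    """Hypothetical toy data: Gaussian noise plus an optional deflection
    centred at 170 ms (an assumed latency, chosen for illustration)."""
    epochs = []
    for _ in range(n_epochs):
        epoch = []
        for i in range(n_samples):
            t = i / sfreq
            bump = deflection * math.exp(-((t - 0.170) ** 2) / (2 * 0.025 ** 2))
            epoch.append(bump + rng.gauss(0.0, 1.0))
        epochs.append(epoch)
    return epochs


rng = random.Random(0)
SFREQ = 500  # Hz, assumed sampling rate
standards = simulate_epochs(400, SFREQ, 200, 0.0, rng)   # frequent sounds
deviants = simulate_epochs(100, SFREQ, 200, -3.0, rng)   # rare deviants
latency, amplitude = mmn_peak(standards, deviants, SFREQ)
print(f"Difference-wave peak: {amplitude:.2f} (arbitrary units) at {latency * 1000:.0f} ms")
```

In this toy setting, a larger (more negative) peak amplitude or an earlier peak latency in one group than in another would correspond to the kind of group difference discussed in this section.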
According to the most widely adopted theoretical account, the MMN is related to the adjustment of the regularity representations in light of the new sound information when the expectations based on these representations are disconfirmed (Winkler, Denham, & Nelken, 2009). The “automaticity” of the MMN refers to the fact that this response can be elicited even when participants are not actively attending to the stimuli but are, for example, watching a silent video or performing a dichotic listening task (Näätänen et al., 2007). This property of the MMN helps to rule out biases introduced by differences in motivation or alertness between musicians and non-musicians. Furthermore, MMN-like responses can be obtained from infants and even from newborn babies, and the MMN thereby allows the
investigation of early musically relevant perceptual skills and their maturation (Hannon & Trainor, 2007). Finally, the amplitude of the MMN has been found to correlate with the accuracy of overt stimulus discrimination (Näätänen et al., 2007) and to increase with short-term laboratory training mimicking aspects of musical training (Lappe, Herholz, Trainor, & Pantev, 2008; Menning, Roberts, & Pantev, 2000; Paraskevopoulos, Kuchenbuch, Herholz, & Pantev, 2012a). Thus, the MMN appears to reflect individual differences in sound discrimination skills and is sensitive to plasticity of the auditory system. The earliest MMN study to compare musicians and non-musicians found stronger MMNs in the musicians to occasional mistuning of the third of a repeating major triad chord while no group differences were found for MMNs elicited by occasional pitch changes in simple pure tones (Koelsch, Schröger, & Tervaniemi, 1999). Thus, the enhanced neural pitch discrimination in musicians was specific for the more musically meaningful context. Subsequent studies have reported larger or earlier MMNs in musicians to various types of sound changes including sound omissions (Rüsseler, Altenmüller, Nager, Kohlmetz, & Münte, 2001), changes in contour or interval structure of melodies (Brattico, Näätänen, & Tervaniemi, 2001; Fujioka, Trainor, Ross, Kakigi, & Pantev, 2004, 2005; Tervaniemi, Rytkönen, Schröger, Ilmoniemi, & Näätänen, 2001), the temporal and numerical organization of sound groups (van Zuijen, Sussman, Winkler, Näätänen, & Tervaniemi, 2004, 2005), and to infrequent non-prototypical chords (Brattico et al., 2008). 
Interestingly, electromagnetic MMN-like mismatch responses elicited by audio-visual incongruities have been found to engage different cortical regions and connectivity patterns in musicians and non-musicians, which could be due to musicians’ training in sight reading (Paraskevopoulos, Kraneburg, Herholz, Bamidis, & Pantev, 2015; Paraskevopoulos, Kuchenbuch, Herholz, & Pantev, 2012b). In these studies, the musicians had received training in classical music. This framework was a fruitful starting point for this line of research since the intensity and rigor of the training could be expected to lead to large effect sizes. Furthermore, the training history of the musicians was typically well documented and thereby the brain data could be readily correlated with the years/hours of instrumental practice. In the early 2000s, however, studies began to examine music experts from other genres as well, in order to test whether the brain basis of musical expertise is more or less
the same for all musicians, or whether fine-grained differences, based on different demands set by various instruments or genres, could be reflected in the functional brain indices. The first studies in this framework indicated that not only classical musicians but also rock and jazz musicians are more accurate than laypersons in discriminating various sound features such as timing (jazz musicians) and spatial sound source location (amateur rock musicians) (Tervaniemi, Castaneda, Knoll, & Uther, 2006; Vuust et al., 2005). Studies comparing jazz, rock, and classical musicians to laypersons have found that jazz musicians showed superior neural discrimination for the majority of sound features in a multi-feature MMN paradigm (Vuust, Brattico, Seppänen, Näätänen, & Tervaniemi, 2012) and that musicians with backgrounds in one of these genres display different MMN response profiles across different types of changes in melodic sound patterns (Tervaniemi, Janhunen, Kruck, Putkinen, & Huotilainen, 2015). The relationship between speech and music continues to be debated, but these domains clearly have some interesting similarities and appear to rely on some of the same neural mechanisms (Patel, 2011). This raises the question: if musicians are more sensitive to various features in music, are they also superior in the processing of speech sounds? Indeed, Tervaniemi et al. (2009) and Marie and colleagues (Marie, Kujala, & Besson, 2012) showed musicians to be more sensitive to some acoustic changes in speech sounds, while a more recent study found that musicians displayed larger MMNs to pitch, vowel, duration, and voice-onset time changes in spoken syllables (Kühnis, Elmer, Meyer, & Jäncke, 2013). However, in the study of Tervaniemi et al. 
(2009), the enhanced brain responses to pitch changes were not observed in musicians in a passive listening condition, but only when they were instructed to listen to the sounds, indicating that enhanced neural sound discrimination in musicians might, in some situations, become apparent only with top-down modulation of sensory processing (see also Tervaniemi, Just, Koelsch, Widmann, & Schröger, 2005).
S
E
A P
The auditory brainstem response (ABR) is another electrophysiological measure that has been widely used to compare sound encoding between musicians and non-musicians (Kraus & Chandrasekaran, 2010). The ABR elicited by complex sounds such as phonemes or musical intervals is typically characterized by a transient series of peaks within the first few milliseconds followed by a sustained response that closely mimics the periodic features of the sound. As the nomenclature suggests, the ABR is thought to originate mainly from brainstem nuclei (Chandrasekaran & Kraus, 2010), although there is emerging evidence for a cortical contribution to the response (Coffey, Herholz, Chepesiuk, Baillet, & Zatorre, 2016; Coffey, Musacchia, & Zatorre, 2017). The sustained portion of the ABR, termed the frequency following response (FFR), preserves the spectro-temporal features of the stimulus with high fidelity. Therefore, the FFR lends itself to the study of the encoding of sound features that are important for pitch and timbre processing in music but also essential for differentiating speech sounds. Indeed, the majority of the studies employing the FFR in musicians and non-musicians have been conducted using speech sounds and indicate that, relative to musically untrained controls, musicians show more robust coding of the spectrum of speech stimuli or faster or stronger neural responses at the very early stages of sound encoding (Kraus & Chandrasekaran, 2010; Strait & Kraus, 2014). Furthermore, there is evidence to suggest that these group differences are particularly pronounced in challenging listening conditions such as in the presence of background noise (Coffey, Mogilever, & Zatorre, 2017; Strait & Kraus, 2014). 
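One simple way to make "robustness of spectral coding" concrete is to measure how strongly a response follows the fundamental frequency (F0) of the stimulus. The sketch below is a toy illustration under stated assumptions, not the analysis used in the studies cited above: it simulates two responses containing a periodic component at an assumed F0 of 100 Hz buried in noise, and estimates the amplitude at F0 and at a control frequency with a single-bin discrete Fourier transform. The sampling rate, duration, signal strengths, and function names (`amplitude_at`, `simulate_ffr`) are hypothetical.

```python
import math
import random


def amplitude_at(signal, sfreq, freq):
    """Estimate the amplitude of the sinusoidal component of `signal` at
    `freq` (Hz) by projecting onto a cosine and a sine at that frequency
    (a single-bin discrete Fourier transform)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / sfreq) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / sfreq) for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n


def simulate_ffr(sfreq, duration, f0, strength, rng):
    """Hypothetical toy response: a periodic component at f0 and its second
    harmonic, scaled by `strength`, plus unit-variance Gaussian noise."""
    n = int(round(sfreq * duration))
    response = []
    for i in range(n):
        t = i / sfreq
        periodic = math.sin(2 * math.pi * f0 * t) + 0.5 * math.sin(2 * math.pi * 2 * f0 * t)
        response.append(strength * periodic + rng.gauss(0.0, 1.0))
    return response


rng = random.Random(1)
SFREQ = 2000     # Hz, assumed sampling rate
DURATION = 0.2   # s, chosen so that F0 completes a whole number of cycles
F0 = 100         # Hz, assumed stimulus fundamental frequency

robust = simulate_ffr(SFREQ, DURATION, F0, 1.0, rng)  # stronger phase locking
weak = simulate_ffr(SFREQ, DURATION, F0, 0.4, rng)    # weaker phase locking

robust_f0 = amplitude_at(robust, SFREQ, F0)
weak_f0 = amplitude_at(weak, SFREQ, F0)
robust_off = amplitude_at(robust, SFREQ, 1.5 * F0)    # control frequency, no signal

print(f"robust response at F0: {robust_f0:.2f}")
print(f"weak response at F0:   {weak_f0:.2f}")
print(f"robust response off F0: {robust_off:.2f}")
```

In this simplified picture, a group showing "more robust" encoding would yield a larger amplitude at F0 (and its harmonics) relative to the noise floor estimated at neighboring frequencies.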
Such findings have raised hopes that musical training could be used to alleviate problems of speech-in-noise perception and other auditory processing deficits that can occur in language and other neurological disorders as well as in normal aging (Alain, Zendel, Hutka, & Bidelman, 2014; Skoe & Kraus, 2010). The first of the FFR studies comparing musicians and non-musicians investigated the encoding of pitch contours in spoken Mandarin Chinese syllables (Wong, Skoe, Russo, Dees, & Kraus, 2007). In Mandarin Chinese, the meaning of syllables depends on the pitch contour, and a previous study had shown that native speakers of this language display enhanced pitch tracking as indexed by the ABR (Krishnan, Gandour, Ananthakrishnan, & Vijayaraghavan, 2015). The study by Wong et al. in turn found evidence for more robust pitch tracking in musicians than in non-musicians. As none of the subjects had previous exposure to Mandarin, the results suggest a generalization of the enhanced
sound encoding in musicians to foreign speech sounds. A later study found that musicians had enhanced ABRs to cello and spoken syllables (/da/) both in the early transient portion of the ABR and in the FFR time window (Musacchia, Sams, Skoe, & Kraus, 2007). Since these seminal studies, the enhanced encoding of linguistic and non-linguistic sounds in musicians as indexed by the ABR has been replicated numerous times (Strait & Kraus, 2014), including in children and adolescents (discussed below) and in aging participants (Alain et al., 2014). Results from laboratory training in sound identification indicate that pitch tracking of the ABR is boosted even by short-term experience (Song, Skoe, Wong, & Kraus, 2008) and thereby support (but obviously do not prove) the notion that the higher-quality sound representation in musicians may be attributable to experience. In sum, ABR studies indicate that frequent engagement with musical sounds might tune sound processing in the nuclei along the auditory pathway to fine-grained acoustic information that is important for sound processing in both music and speech. The mechanism underlying the ABR enhancement in musicians is unclear, but the enhancement has been speculated to be driven by top-down influences through descending (cortico-fugal) pathways from the cortex to the auditory brainstem (Kraus & Chandrasekaran, 2010). The enhanced auditory skills of musicians indeed tend to be accompanied by above-average performance in non-auditory tasks that tap into higher-order cognitive processes such as executive functions (discussed below), which, in some studies, have been found to correlate with the degree of enhancement in ABR indices of sound encoding (e.g., Strait, Kraus, Parbery-Clark, & Ashley, 2010).
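The statement that the sustained response mirrors the periodicity of the stimulus can be made concrete with a small simulation. The sketch below is purely illustrative and uses synthetic signals, not recorded data: the stimulus is a harmonic complex, the "response" is a delayed, attenuated, noisy copy of it, and the spectral amplitude of the response is then measured at the stimulus fundamental and at an unrelated frequency. All function names, the 10 ms delay, the gain, and the noise level are assumptions chosen for the illustration and are not taken from any of the cited studies.

```python
import math
import random


def harmonic_complex(f0, n_harmonics, fs, duration):
    """Sum of equal-amplitude harmonics of f0, sampled at fs for `duration` seconds."""
    n = int(round(fs * duration))
    return [
        sum(math.sin(2 * math.pi * f0 * h * i / fs) for h in range(1, n_harmonics + 1))
        for i in range(n)
    ]


def amplitude_at(signal, freq, fs):
    """Estimate the amplitude of the component at `freq` by projecting the
    signal onto a cosine and a sine of that frequency."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n


def simulated_response(stimulus, delay_samples, gain, noise_sd, seed=0):
    """Hypothetical 'neural' response: a delayed, scaled copy of the stimulus
    plus Gaussian noise. This is a toy model, not a physiological one."""
    rng = random.Random(seed)
    out = []
    for i in range(len(stimulus)):
        source = stimulus[i - delay_samples] if i >= delay_samples else 0.0
        out.append(gain * source + rng.gauss(0.0, noise_sd))
    return out


FS = 8000    # sampling rate in Hz (assumed)
F0 = 100.0   # fundamental frequency of the stimulus in Hz (assumed)

stim = harmonic_complex(F0, n_harmonics=3, fs=FS, duration=0.2)
resp = simulated_response(stim, delay_samples=80, gain=0.5, noise_sd=0.2)

f0_amp = amplitude_at(resp, F0, FS)      # energy at the stimulus fundamental
off_amp = amplitude_at(resp, 150.0, FS)  # energy at a frequency absent from the stimulus

print(f"amplitude at F0: {f0_amp:.3f}, at 150 Hz: {off_amp:.3f}")
```

In this toy setting the response carries clearly more energy at the fundamental than at the control frequency. In the sense used above, a more robust representation of the stimulus would correspond to a larger amplitude at the stimulus frequencies relative to the noise floor.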
M
T
D B
Studying the effects of musical activities on brain development is of great theoretical and practical value since such studies have the potential to reveal the antecedents of the functional group differences seen in adults with and without musical training and are the only way to establish whether early musical activities support the development of cognitive skills that are important to academic achievement and well-being. Cross-sectional and
longitudinal studies indicate that differences in brain function and structure between musically trained and untrained children start to emerge after as little as a few months to two years of musical training (Chobert, François, Velay, & Besson, 2012; Hyde et al., 2009; Kraus et al., 2014). Importantly, the few controlled intervention studies carried out in children thus far have provided initial evidence for a causal role of practice in the neural advantages associated with music training (Chobert et al., 2012; Kraus et al., 2014; Nan et al., 2018). One of the first studies to be conducted in the framework of musical training and brain plasticity in childhood recorded electric brain potentials evoked by piano, violin, and pure tones in 4- to 5-year-old children enrolled in music lessons and in control children not active in music (Shahin, Roberts, & Trainor, 2004). The early P1 response was larger in the music group for all tones, whereas the following P2 was enhanced specifically for the instrument of practice (piano or violin). In a subsequent longitudinal study, the P2 response of children learning to play the violin became enhanced for violin timbre during a one-year follow-up, while it remained unchanged for noise sounds used as the control material (Fujioka, Ross, Kakigi, Pantev, & Trainor, 2006). A more recent longitudinal study found evidence that an El Sistema-based music program facilitated the maturation of both passive auditory processing and active discrimination of musical sounds, as indexed by the N1-P2 and P300 components, respectively, when compared to a sport-based intervention or a no-intervention control group (Habibi, Cahn, Damasio, & Damasio, 2016). In the speech domain, another study found that brain responses to pitch incongruities in both music and speech differentiated 8-year-old children with and without musical training (Magne, Schön, & Besson, 2006).
This was followed by longitudinal studies with random assignment that showed evidence for a causal contribution of training in shaping neural responses to pitch changes in speech (Moreno, Marques, Santos, Santos, & Besson, 2009) as well as responses reflecting speech segmentation (François, Chobert, Besson, & Schön, 2012). The first MMN studies in musically trained children were cross-sectional and reported enhanced MMNs in musically trained school-aged children for frequency changes in violin tones (Meyer et al., 2011) and for changes from major chords to minor chords (Virtala, Huotilainen, Putkinen, Makkonen, & Tervaniemi, 2012). Putkinen and colleagues recorded auditory ERPs longitudinally in children who attended a public elementary
school that integrates instrument lessons, orchestra practice, and music theory studies into the daily curriculum (Putkinen, Tervaniemi, Saarikivi, Ojala, & Huotilainen, 2014). The control group consisted of children who did not play a musical instrument and had non-musical hobbies; they were matched to the music group with regard to socio-economic status (parental education and income). The MMN elicited by occasional minor chords presented among major chords increased in amplitude more in the music group than in the control group between the ages of 7 and 13 years. Along the same lines, a related study (Putkinen, Tervaniemi, Saarikivi, de Vent, & Huotilainen, 2014) found that the MMNs elicited by changes in melody, rhythm, timbre, and tuning increased in amplitude more in the music group than in the control group between 9 and 11 years of age. Neither study found significant differences in MMN amplitude at the baseline measurement, indicating that there was no pre-training enhancement in neural sound discrimination in the music group. Chobert and colleagues conducted a longitudinal study for two school years in children who were randomly assigned to music or painting classes (Chobert et al., 2012). The children received tuition in these activities in 45-minute sessions twice a week during the first school year and once a week during the second. MMNs to changes in syllable frequency, duration, and voice-onset time (VOT) were recorded before training and after six and twelve months. The MMNs to syllable duration and VOT changes increased in amplitude after twelve months of training in the children who were involved in music training but not in those taking painting classes. There were no group differences before training or six months after its onset. Longitudinal studies have also examined how training affects the development of sound encoding reflected by the ABR. In a randomized longitudinal study, Kraus et al.
(2014) found that children between the ages of 6 and 9 who participated in a community-based musical training program (the Harmony Project) for two years showed evidence for a more precise differentiation of speech sounds as indexed by the spectrum of ABRs elicited by two different stop consonants. Longitudinal studies in adolescents have reported evidence for enhanced maturation of the ABR and earlier emergence of adult-like cortical responses to speech sounds over two years of music lessons (Tierney, Krizman, & Kraus, 2015; Tierney, Krizman, Skoe, Johnston, & Kraus, 2013).
In addition to more conventional training regimes, computerized learning environments have been used to investigate the effects of musical vs. foreign language training in childhood (Janus, Lee, Moreno, & Bialystok, 2016; Moreno, Lee, Janus, & Bialystok, 2015). In both domains, corresponding elements (perception, reading, and production) were taught. In one study, thirty-six 4- to 6-year-old English-speaking children received either French or music training for twenty days, two hours a day (Moreno et al., 2015). The children were divided into the music and language training groups in a pseudo-random manner and, in a test-training-retest procedure, were tested with EEG and with neurocognitive tests. After the intervention, both groups showed enhanced brain responses in the trained domain (music group: music sounds; French group: French vowels) and, correspondingly, reduced responses in the untrained domain. The study also indicated that these changes persisted one year after the training had ended. There appears to be reasonable empirical support for the conclusion that musical activities in childhood can benefit sound processing skills that are important not only in music but also in the language domain. Whether musical training can enhance higher-order cognitive functions is a more contentious issue. In the next section, we turn to evidence for the putative benefits of musical training on executive functions, which have been the focus of a considerable number of studies in recent years (Moreno & Bidelman, 2014).
T
S D
M
P E
I F
:
?
Executive functions refer to top-down control mechanisms such as inhibition, set-shifting, working memory, and selective attention (Diamond, 2013; Friedman & Miyake, 2017). These functions support many higher-order processes (e.g., planning, decision making) and predict various societally important phenomena ranging from academic performance to healthy lifestyle choices (Titz & Karbach, 2014). Inhibition, set-shifting, and working memory are widely considered the core subcomponents of
executive functions (Friedman & Miyake, 2017). Learning to play a musical instrument places heavy demands on these functions, and it therefore stands to reason that musical training could be associated with above-average performance in a range of executive function tasks, either because these functions help one to persist in musical training or because musical training enhances executive functions, or both. A number of studies have found support for this assertion by showing that musically trained adults and children outperform untrained peers in tasks of inhibition, set-shifting, and working memory (Bialystok & DePape, 2009; George & Coch, 2011; Hansen, Wallentin, & Vuust, 2013; Ho, Cheung, & Chan, 2003; Jaschke, Honing, & Scherder, 2018; Moradzadeh, Blumenthal, & Wiseheart, 2015; Saarikivi, Putkinen, Tervaniemi, & Huotilainen, 2016; Zuk, Benjamin, Kenyon, & Gaab, 2014). A few fMRI studies have investigated the neural underpinnings of these behavioral differences and found that musicians recruit prefrontal and other cortical and subcortical regions more strongly than non-musicians during working memory and task-switching tasks (Pallesen et al., 2010; Schulze, Mueller, & Koelsch, 2011; Schulze, Zysset, Mueller, Friederici, & Koelsch, 2011; Zuk et al., 2014). For instance, Pallesen et al. (2010) found that musicians showed stronger activity than non-musicians during a working memory task in various prefrontal and parietal areas, the anterior cingulate, the insula, and the precentral gyrus. There is also evidence that musicians display structural changes in prefrontal regions that support executive functions (Bermudez et al., 2009; Gaser & Schlaug, 2003; Sluming et al., 2002).
The first study to investigate the neural underpinnings of executive functions in musically trained children used a task designed to tap into task-switching and found stronger activation in ventrolateral prefrontal cortex and supplementary motor area (SMA) in the music group relative to age-matched control children (Zuk et al., 2014). A subsequent study investigated the neural correlates of inhibition in children around the age of 9 years who had participated in an El Sistema-based music training program for two years. The results indicated that on the incongruent trials of a Color-Word Stroop task, the children in the music group showed stronger activation in the anterior cingulate, inferior frontal gyrus, pre-SMA/SMA, and insula relative to an active control group participating in sports training
and a passive control group who did not participate in an extra-curricular program (Sachs, Kaplan, Der Sarkissian, & Habibi, 2017). Using pseudo-random group allocation, Moreno et al. (2011) compared the neurocognitive effects of computerized interventions for music and visual arts. As in the study described in the previous section (Moreno et al., 2015), the intervention lasted for twenty days and the children were asked to practice twice a day for one hour each time. The final analyses included forty-eight participants who were 4 to 6 years of age. It was found that the music intervention improved the verbal abilities of the children and that this was paralleled by the facilitation of the neural indices of executive functions. No comparable improvements were found in the children in the art group. This suggests that a relatively short but very intensive music intervention can improve general cognitive functions that are necessary for many learning activities. In a second study, using the same intensive learning environment for music vs. French, Janus et al. (2016) reported significant improvement of the executive functions of their 4- to 6-year-old children, again after only twenty days. Collectively, the studies reviewed here suggest that the musician advantage might extend to domain-general executive functions. It should be noted, however, that results across studies are not very consistent with regard to which subcomponent of executive functions is found to differentiate musicians and non-musicians, and some studies have also failed to find evidence for transfer from musical training to domain-general cognitive abilities (e.g., Chobert et al., 2012; Costa-Giomi, 1999; Moreno et al., 2009). Furthermore, most of the evidence for enhanced executive functions in musicians comes from correlational studies that cannot disentangle whether these group differences reflect the effects of training or pre-training differences in cognitive capacity.
A recent meta-analysis concluded that the evidence for the benefits of musical training on working memory performance suffers from confounds and the support for such transfer effects is weak (Sala & Gobet, 2017b). More generally, the notion of far transfer—i.e., the assertion that training in one domain could generalize to other only distantly related domains—has lately met with increasing skepticism (Sala & Gobet, 2017a; Ullén, Hambrick, & Mosing, 2016).
P
M N
A C
Evidence for heritability in different measures of musical ability has challenged the view that deliberate practice alone can explain the musician advantages reviewed in this chapter (Ullén et al., 2016). Although heritability and experience-dependent plasticity are not mutually exclusive, these studies show that genetic factors influence the perceptual, motivational, and cognitive abilities involved in music processing, which should not be ignored when interpreting the group differences between musicians and non-musicians. Studies have identified candidate genes that might influence musical aptitude (for a review, see the chapter by Järvelä, this volume), while twin studies indicate a substantial genetic component in indices of brain structure and function (Peper, Brouwer, Boomsma, Kahn, & Hulshoff Pol, 2007; Van Beijsterveldt & Van Baal, 2002), in some musically relevant perceptual and motor skills, and in the amount of practice musicians engage in (Drayna, Manichaikul, de Lange, Snieder, & Spector, 2001; Mosing, Madison, Pedersen, Kuja-Halkola, & Ullén, 2014; Ullén, Mosing, & Madison, 2015; Vinkhuyzen, Van der Sluis, Posthuma, & Boomsma, 2009). On the other hand, a recent study (de Manzano & Ullén, 2018) in identical twins discordant for musical training found differences in brain structure between musically trained and non-trained siblings despite their identical genotype, indicating that these structural differences were due to training. Recent evidence suggests that genetic factors do not have a uniform impact on tonal vs. rhythmic processing. Using online listening tasks adapted from the well-established listening test of amusia (MBEA, Montreal Battery of Evaluation of Amusia), Seesjärvi and others investigated the relative contribution of genetic and environmental factors to individual variation in music perception in 384 twins (Seesjärvi et al., 2016).
The participants performed three listening tasks that required the detection of pitch differences in pairs of melodies and key or rhythm incongruities within single melodies. The first task involved a working memory component, since it required the comparison of two melodies and resembled tasks that were previously used in studies reporting strong
genetic components in music perception skills (e.g., Drayna et al., 2001). The performance in the latter two tasks, in turn, relied less on working memory and tapped into tonal and rhythmic perception abilities that are implicitly learned even by non-musicians. There were additive genetic effects in the pitch task with no shared environment effects, while the opposite was found for the key task. Variation in the rhythm task performance, in turn, was mainly explained by a strong non-shared environmental effect. The authors concluded that this pattern of results suggests that the contribution of genetic and environmental factors to music perception depends on the degree to which the perceptual skill in question relies on formally vs. implicitly acquired knowledge of musical structures.
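The terms used above (additive genetic, shared environment, and non-shared environment effects) come from the classical twin design. As a rough illustration of the underlying logic, the sketch below applies Falconer's textbook approximation, which estimates the three variance shares from the trait correlations of identical (MZ) and fraternal (DZ) twin pairs. This is a simplified stand-in and should not be read as the analysis used in the study described above, which may well have relied on formal model fitting; the function name and the correlation values are hypothetical and were chosen only to mimic the two qualitative patterns mentioned in the text.

```python
def falconer_ace(r_mz, r_dz):
    """Approximate variance shares from twin-pair correlations.

    r_mz: trait correlation between identical (monozygotic) twins
    r_dz: trait correlation between fraternal (dizygotic) twins
    """
    a2 = 2.0 * (r_mz - r_dz)  # A: additive genetic share
    c2 = 2.0 * r_dz - r_mz    # C: shared-environment share
    e2 = 1.0 - r_mz           # E: non-shared environment (includes measurement error)
    return {"A": a2, "C": c2, "E": e2}


# Hypothetical correlations, not data from any cited study.
# MZ correlation twice the DZ correlation: genetic share, no shared environment.
genetic_like = falconer_ace(r_mz=0.6, r_dz=0.3)

# MZ and DZ correlations equal: shared environment, no genetic share.
shared_env_like = falconer_ace(r_mz=0.5, r_dz=0.5)

print(genetic_like)
print(shared_env_like)
```

By construction the three shares sum to one. When identical twins resemble each other no more than fraternal twins do, the approximation attributes the similarity to the shared environment instead of to genes, which is the contrast drawn above between the pitch and key tasks.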
C
F
D
Longitudinal studies in children are important for establishing whether music and other arts can indeed facilitate real-life learning and academic skills either directly, by influencing relevant cognitive faculties, or indirectly, by increasing motivation and engagement in learning activities. Randomized controlled trials are of course the gold standard for establishing causality but pose many practical difficulties, especially for long-term follow-ups, where more naturalistic studies might be the only option. Two ongoing large-scale studies in the United States are currently examining the efficacy of musical training programs implemented in community settings (for first results from these studies, see Habibi et al., 2016, 2017; Kraus et al., 2014). In parallel, we have initiated three intervention studies in community settings. The first of them was established in Finnish kindergartens, where weekly music playschool and dance programs were integrated into the regular daycare routines (Linnavalli, Putkinen, Lipsanen, Huotilainen, & Tervaniemi, 2018). In the second study, elementary school teachers receive assistance in implementing physical activity and musical programs as a part of the school day of 10-year-old children on a weekly basis. The third study is ongoing in an elementary school in Beijing where 9- to 10-year-old children receive extra-curricular lessons in music or in English language two to three times a week. In all
these three studies, interventions are preceded and followed by neurocognitive and EEG measurements and, in the schools, also by tests of academic achievement. Our aim is to determine whether these easy-to-implement and motivating music interventions can benefit brain development and facilitate the learning of academic as well as social skills. This chapter has reviewed studies on functional differences between musicians and non-musicians in sound processing and executive functions. This literature provides fairly strong empirical and theoretical grounds for concluding that musical training enhances domain-general auditory processing skills, while the evidence for far transfer from musical training to executive functions is more mixed. It also appears likely that training alone cannot explain all the variation between musicians and non-musicians in neurocognitive skills. Cross-sectional studies will continue to elucidate the neural bases of exceptional musical ability and provide hypotheses regarding the effects of musical training on the brain, but self-selection complicates the interpretation of such studies in terms of the contribution of predisposing factors and brain plasticity. If these caveats are kept in mind and the hypothesized neuroplastic effects of musical training are tested in longitudinal studies, musical training will continue to serve as a useful model for neural plasticity in humans.
References
Alain, C., Zendel, B. R., Hutka, S., & Bidelman, G. M. (2014). Turning down the noise: The benefit of musical training on the aging auditory brain. Hearing Research 308, 162–173. Bengtsson, S. L., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience 8(9), 1148–1150. Bermudez, P., Lerch, J. P., Evans, A. C., & Zatorre, R. J. (2009). Neuroanatomical correlates of musicianship as revealed by cortical thickness and voxel-based morphometry. Cerebral Cortex 19(7), 1583–1596. Besson, M., & Faïta, F. (1995). An event-related potential (ERP) study of musical expectancy: Comparison of musicians with nonmusicians. Journal of Experimental Psychology: Human Perception and Performance 21(6), 1278–1296. Besson, M., Faïta, F., & Requin, J. (1994). Brain waves associated with musical incongruities differ for musicians and non-musicians. Neuroscience Letters 168(1), 101–105. Bialystok, E., & DePape, A.-M. (2009). Musical expertise, bilingualism, and executive functioning. Journal of Experimental Psychology: Human Perception and Performance 35(2), 565–574.
Brattico, E., Näätänen, R., & Tervaniemi, M. (2001). Context effects on pitch perception in musicians and nonmusicians: Evidence from event-related-potential recordings. Music Perception: An Interdisciplinary Journal 19(2), 199–222. Brattico, E., Pallesen, K. J., Varyagina, O., Bailey, C., Anourova, I., Järvenpää, M., … Tervaniemi, M. (2008). Neural discrimination of nonprototypical chords in music experts and laymen: An MEG study. Journal of Cognitive Neuroscience 21(11), 2230–2244. Buonomano, D. V., & Merzenich, M. M. (1998). Cortical plasticity: From synapses to maps. Annual Review of Neuroscience 21, 149–186. Chandrasekaran, B., & Kraus, N. (2010). The scalp-recorded brainstem response to speech: Neural origins and plasticity. Psychophysiology 47(2), 236–246. Chobert, J., François, C., Velay, J.-L., & Besson, M. (2012). Twelve months of active musical training in 8- to 10-year-old children enhances the preattentive processing of syllabic duration and voice onset time. Cerebral Cortex 24(4), 956–967. Coffey, E. B. J., Herholz, S. C., Chepesiuk, A. M. P., Baillet, S., & Zatorre, R. J. (2016). Cortical contributions to the auditory frequency-following response revealed by MEG. Nature Communications 7, 11070. doi:10.1038/ncomms11070 Coffey, E. B., Mogilever, N., & Zatorre, R. J. (2017). Speech-in-noise perception in musicians: A review. Hearing Research 352, 49–69. Coffey, E. B., Musacchia, G., & Zatorre, R. J. (2017). Cortical correlates of the auditory frequency-following and onset responses: EEG and fMRI evidence. Journal of Neuroscience 37(4), 830–838. Costa-Giomi, E. (1999). The effects of three years of piano instruction on children’s cognitive development. Journal of Research in Music Education 47(3), 198–212. de Manzano, Ö., & Ullén, F. (2018). Same genes, different brains: Neuroanatomical differences between monozygotic twins discordant for musical training. Cerebral Cortex 28(1), 387–394. Diamond, A. (2013). Executive functions.
Annual Review of Psychology 64, 135–168. Drayna, D., Manichaikul, A., de Lange, M., Snieder, H., & Spector, T. (2001). Genetic correlates of musical pitch recognition in humans. Science 291(5510), 1969–1972. Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased cortical representation of the fingers of the left hand in string players. Science 270(5234), 305–307. François, C., Chobert, J., Besson, M., & Schön, D. (2012). Music training for the development of speech segmentation. Cerebral Cortex 23(9), 2038–2043. Friedman, N. P., & Miyake, A. (2017). Unity and diversity of executive functions: Individual differences as a window on cognitive structure. Cortex 86, 186–204. Fujioka, T., Ross, B., Kakigi, R., Pantev, C., & Trainor, L. J. (2006). One year of musical training affects development of auditory cortical-evoked fields in young children. Brain 129(10), 2593– 2608. Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2004). Musical training enhances automatic encoding of melodic contour and interval structure. Journal of Cognitive Neuroscience 16(6), 1010–1021. Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2005). Automatic encoding of polyphonic melodies in musicians and nonmusicians. Journal of Cognitive Neuroscience 17(10), 1578–1592. Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musicians. Journal of Neuroscience 23(27), 9240–9245. George, E. M., & Coch, D. (2011). Music training and working memory: An ERP study. Neuropsychologia 49(5), 1083–1094. Habibi, A., Cahn, B. R., Damasio, A., & Damasio, H. (2016). Neural correlates of accelerated auditory processing in children engaged in music training. Developmental Cognitive Neuroscience
21, 1–14. Habibi, A., Damasio, A., Ilari, B., Veiga, R., Joshi, A. A., Leahy, R. M., … Damasio, H. (2017). Childhood music training induces change in micro and macroscopic brain structure: Results from a longitudinal study. Cerebral Cortex, 1–12. Retrieved from https://doi.org/10.1093/cercor/bhx286 Halwani, G. F., Loui, P., Rüber, T., & Schlaug, G. (2011). Effects of practice and experience on the arcuate fasciculus: Comparing singers, instrumentalists, and non-musicians. Frontiers in Psychology 2. Retrieved from https://doi.org/10.3389/fpsyg.2011.00156 Hannon, E. E., & Trainor, L. J. (2007). Music acquisition: Effects of enculturation and formal training on development. Trends in Cognitive Sciences 11(11), 466–472. Hansen, M., Wallentin, M., & Vuust, P. (2013). Working memory and musical competence of musicians and non-musicians. Psychology of Music 41(6), 779–793. Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity: Behavior, function, and structure. Neuron 76(3), 486–502. Ho, Y. C., Cheung, M. C., & Chan, A. S. (2003). Music training improves verbal but not visual memory: Cross-sectional and longitudinal explorations in children. Neuropsychology 17(3), 439–450. Hyde, K. L., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A. C., & Schlaug, G. (2009). Musical training shapes structural brain development. Journal of Neuroscience 29(10), 3019–3025. Janus, M., Lee, Y., Moreno, S., & Bialystok, E. (2016). Effects of short-term music and second-language training on executive control. Journal of Experimental Child Psychology 144, 84–97. Jaschke, A. C., Honing, H., & Scherder, E. J. (2018). Longitudinal analysis of music education on executive functions in primary school children. Frontiers in Neuroscience 12, 103. Retrieved from https://www.frontiersin.org/articles/10.3389/fnins.2018.00103 Jentschke, S., & Koelsch, S. (2009). Musical training modulates the development of syntax processing in children.
NeuroImage 47(2), 735–744. Koelsch, S. (2009). Music-syntactic processing and auditory memory: Similarities and differences between ERAN and MMN. Psychophysiology 46(1), 179–190. Koelsch, S., Grossmann, T., Gunter, T. C., Hahne, A., Schröger, E., & Friederici, A. D. (2003). Children processing music: Electric brain responses reveal musical competence and gender differences. Journal of Cognitive Neuroscience 15(5), 683–693. Koelsch, S., Gunter, T., Friederici, A. D., & Schröger, E. (2000). Brain indices of music processing: “Nonmusicians” are musical. Journal of Cognitive Neuroscience 12(3), 520–541. Koelsch, S., Schmidt, B.-H., & Kansok, J. (2002). Effects of musical expertise on the early right anterior negativity: An event-related brain potential study. Psychophysiology 39(5), 657–663. Koelsch, S., Schröger, E., & Tervaniemi, M. (1999). Superior pre-attentive auditory processing in musicians. Neuroreport 10(6), 1309–1313. Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills. Nature Reviews Neuroscience 11(8), 599–605. Kraus, N., Slater, J., Thompson, E. C., Hornickel, J., Strait, D. L., Nicol, T., & White-Schwoch, T. (2014). Music enrichment programs improve the neural encoding of speech in at-risk children. Journal of Neuroscience 34(36), 11913–11918. Krishnan, A., Gandour, J. T., Ananthakrishnan, S., & Vijayaraghavan, V. (2015). Language experience enhances early cortical pitch-dependent responses. Journal of Neurolinguistics 33, 128–148. Kühnis, J., Elmer, S., Meyer, M., & Jäncke, L. (2013). The encoding of vowels and temporal speech cues in the auditory cortex of professional musicians: An EEG study. Neuropsychologia 51(8), 1608–1618.
Lappe, C., Herholz, S. C., Trainor, L. J., & Pantev, C. (2008). Cortical plasticity induced by short-term unimodal and multimodal musical training. Journal of Neuroscience 28(39), 9632–9639. Linnavalli, T., Putkinen, V., Lipsanen, J., Huotilainen, M., & Tervaniemi, M. (2018). Music playschool enhances children’s linguistic skills. Scientific Reports 8(1), 8767. Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in Broca’s area: An MEG study. Nature Neuroscience 4(5), 540–545. Magne, C., Schön, D., & Besson, M. (2006). Musician children detect pitch violations in both music and language better than nonmusician children: Behavioral and electrophysiological approaches. Journal of Cognitive Neuroscience 18(2), 199–211. Marie, C., Kujala, T., & Besson, M. (2012). Musical and linguistic expertise influence pre-attentive and attentive processing of non-speech sounds. Cortex 48(4), 447–457. Menning, H., Roberts, L. E., & Pantev, C. (2000). Plastic changes in the auditory cortex induced by intensive frequency discrimination training. Neuroreport 11(4), 817–822. Meyer, M., Elmer, S., Ringli, M., Oechslin, M. S., Baumann, S., & Jäncke, L. (2011). Long-term exposure to music enhances the sensitivity of the auditory system in children. European Journal of Neuroscience 34(5), 755–765. Moradzadeh, L., Blumenthal, G., & Wiseheart, M. (2015). Musical training, bilingualism, and executive function: A closer look at task switching and dual-task performance. Cognitive Science 39(5), 992–1020. Moreno, S., Bialystok, E., Barac, R., Schellenberg, E. G., Cepeda, N. J., & Chau, T. (2011). Short-term music training enhances verbal intelligence and executive function. Psychological Science 22(11), 1425–1433. Moreno, S., & Bidelman, G. M. (2014). Examining neural plasticity and cognitive benefit through the unique lens of musical training. Hearing Research 308, 84–97. Moreno, S., Lee, Y., Janus, M., & Bialystok, E. (2015).
Short-term second language and music training induces lasting functional brain changes in early childhood. Child Development 86(2), 394–406. Moreno, S., Marques, C., Santos, A., Santos, M., & Besson, M. (2009). Musical training influences linguistic abilities in 8-year-old children: More evidence for brain plasticity. Cerebral Cortex 19(3), 712–723. Mosing, M. A., Madison, G., Pedersen, N. L., Kuja-Halkola, R., & Ullén, F. (2014). Practice does not make perfect: No causal effect of music practice on music ability. Psychological Science 25(9), 1795–1803. Musacchia, G., Sams, M., Skoe, E., & Kraus, N. (2007). Musicians have enhanced subcortical auditory and audiovisual processing of speech and music. Proceedings of the National Academy of Sciences 104(40), 15894–15898. Näätänen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity (MMN) in basic research of central auditory processing: A review. Clinical Neurophysiology 118(12), 2544– 2590. Näätänen, R., Tervaniemi, M., Sussman, E., Paavilainen, P., & Winkler, I. (2001). “Primitive intelligence” in the auditory cortex. Trends in Neurosciences 24(5), 283–288. Nan, Y., Liu, L., Geiser, E., Shu, H., Gong, C. C., Dong, Q., … Desimone, R. (2018). Piano training enhances the neural processing of pitch and improves speech perception in Mandarin-speaking children. Proceedings of the National Academy of Sciences 115(28), E6630–E6639. Paavilainen, P. (2013). The mismatch-negativity (MMN) component of the auditory event-related potential to violations of abstract regularities: A review. International Journal of Psychophysiology 88(2), 109–123.
Pallesen, K. J., Brattico, E., Bailey, C. J., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S. (2010). Cognitive control in auditory working memory is enhanced in musicians. PloS ONE 5(6), e11120. Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E., & Hoke, M. (1998). Increased auditory cortical representation in musicians. Nature 392(6678), 811–814. Pantev, C., Roberts, L. E., Schulz, M., Engelien, A., & Ross, B. (2001). Timbre-specific enhancement of auditory cortical representations in musicians. Neuroreport 12(1), 169–174. Paraskevopoulos, E., Kraneburg, A., Herholz, S. C., Bamidis, P. D., & Pantev, C. (2015). Musical expertise is related to altered functional connectivity during audiovisual integration. Proceedings of the National Academy of Sciences 112(40), 12522–12527. Paraskevopoulos, E., Kuchenbuch, A., Herholz, S. C., & Pantev, C. (2012a). Evidence for traininginduced plasticity in multisensory brain structures: An MEG study. PloS ONE 7(5), e36534. Paraskevopoulos, E., Kuchenbuch, A., Herholz, S. C., & Pantev, C. (2012b). Musical expertise induces audiovisual integration of abstract congruency rules. Journal of Neuroscience 32(50), 18196–18203. Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Frontiers in Psychology 2, 142. doi:10.3389/fpsyg.2011.00142 Peper, J. S., Brouwer, R. M., Boomsma, D. I., Kahn, R. S., & Hulshoff Pol, H. E. (2007). Genetic influences on human brain structure: A review of brain imaging studies in twins. Human Brain Mapping 28(6), 464–473. Praag, H. van, Kempermann, G., & Gage, F. H. (2000). Neural consequences of environmental enrichment. Nature Reviews Neuroscience 1(3), 191–198. Putkinen, V., Tervaniemi, M., Saarikivi, K., de Vent, N., & Huotilainen, M. (2014). Investigating the effects of musical training on functional brain development with a novel melodic MMN paradigm. Neurobiology of Learning and Memory 110, 8–15. 
Putkinen, V., Tervaniemi, M., Saarikivi, K., Ojala, P., & Huotilainen, M. (2014). Enhanced development of auditory change detection in musically trained school-aged children: A longitudinal event-related potential study. Developmental Science 17(2), 282–297. Rinne, T., Alho, K., Ilmoniemi, R. J., Virtanen, J., & Näätänen, R. (2000). Separate time behaviors of the temporal and frontal mismatch negativity sources. NeuroImage 12(1), 14–19. Rüsseler, J., Altenmüller, E., Nager, W., Kohlmetz, C., & Münte, T. F. (2001). Event-related brain potentials to sound omissions differ in musicians and non-musicians. Neuroscience Letters 308(1), 33–36. Saarikivi, K., Putkinen, V., Tervaniemi, M., & Huotilainen, M. (2016). Cognitive flexibility modulates maturation and music-training-related changes in neural sound discrimination. European Journal of Neuroscience 44(2), 1815–1825. Sachs, M., Kaplan, J., Der Sarkissian, A., & Habibi, A. (2017). Increased engagement of the cognitive control network associated with music training in children during an fMRI Stroop task. PloS ONE 12(10), e0187254. Sala, G., & Gobet, F. (2017a). Does far transfer exist? Negative evidence from chess, music, and working memory training. Current Directions in Psychological Science 26(6), 515–520. Sala, G., & Gobet, F. (2017b). When the music’s over: Does music skill transfer to children’s and young adolescents’ cognitive and academic skills? A meta-analysis. Educational Research Review 20, 55–67. Schlaug, G., Jäncke, L., Huang, Y., Staiger, J. F., & Steinmetz, H. (1995). Increased corpus callosum size in musicians. Neuropsychologia 33(8), 1047–1055. Schneider, P., Scherg, M., Dosch, H. G., Specht, H. J., Gutschalk, A., & Rupp, A. (2002). Morphology of Heschl’s gyrus reflects enhanced activation in the auditory cortex of musicians.
Nature Neuroscience 5(7), 688–694. Schönwiesner, M., Novitski, N., Pakarinen, S., Carlson, S., Tervaniemi, M., & Näätänen, R. (2007). Heschl’s gyrus, posterior superior temporal gyrus, and mid-ventrolateral prefrontal cortex have different roles in the detection of acoustic changes. Journal of Neurophysiology 97(3), 2075–2082. Schulze, K., Mueller, K., & Koelsch, S. (2011). Neural correlates of strategy use during auditory working memory in musicians and non-musicians. European Journal of Neuroscience 33(1), 189– 196. Schulze, K., Zysset, S., Mueller, K., Friederici, A. D., & Koelsch, S. (2011). Neuroarchitecture of verbal and tonal working memory in nonmusicians and musicians. Human Brain Mapping 32(5), 771–783. Seesjärvi, E., Särkämö, T., Vuoksimaa, E., Tervaniemi, M., Peretz, I., & Kaprio, J. (2016). The nature and nurture of melody: A twin study of musical pitch and rhythm perception. Behavior Genetics 46(4), 506–515. Shahin, A., Roberts, L. E., & Trainor, L. J. (2004). Enhancement of auditory cortical development by musical experience in children. Neuroreport 15(12), 1917–1921. Skoe, E., & Kraus, N. (2010). Auditory brainstem response to complex sounds: A tutorial. Ear and Hearing 31(3), 302–324. Sluming, V., Barrick, T., Howard, M., Cezayirli, E., Mayes, A., & Roberts, N. (2002). Voxel-based morphometry reveals increased gray matter density in Broca’s area in male symphony orchestra musicians. NeuroImage 17(3), 1613–1622. Song, J. H., Skoe, E., Wong, P. C., & Kraus, N. (2008). Plasticity in the adult human auditory brainstem following short-term linguistic training. Journal of Cognitive Neuroscience 20(10), 1892–1902. Strait, D. L., & Kraus, N. (2014). Biological impact of auditory expertise across the life span: Musicians as a model of auditory learning. Hearing Research 308(Suppl. C), 109–121. Strait, D. L., Kraus, N., Parbery-Clark, A., & Ashley, R. (2010). 
Musical experience shapes top-down auditory mechanisms: Evidence from masking and auditory attention performance. Hearing Research 261(1), 22–29. Tervaniemi, M., Castaneda, A., Knoll, M., & Uther, M. (2006). Sound processing in amateur musicians and nonmusicians: Event-related potential and behavioral indices. Neuroreport 17(11), 1225–1228. Tervaniemi, M., Janhunen, L., Kruck, S., Putkinen, V., & Huotilainen, M. (2015). Auditory profiles of classical, jazz, and rock musicians: Genre-specific sensitivity to musical sound features. Frontiers in Psychology 6. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4703758/ Tervaniemi, M., Just, V., Koelsch, S., Widmann, A., & Schröger, E. (2005). Pitch discrimination accuracy in musicians vs. nonmusicians: An event-related potential and behavioral study. Experimental Brain Research 161(1), 1–10. Tervaniemi, M., Kruck, S., De Baene, W., Schröger, E., Alter, K., & Friederici, A. D. (2009). Topdown modulation of auditory processing: Effects of sound context, musical expertise and attentional focus. European Journal of Neuroscience 30(8), 1636–1642. Tervaniemi, M., Rytkönen, M., Schröger, E., Ilmoniemi, R. J., & Näätänen, R. (2001). Superior formation of cortical memory traces for melodic patterns in musicians. Learning & Memory 8(5), 295–300. Tierney, A. T., Krizman, J., & Kraus, N. (2015). Music training alters the course of adolescent auditory development. Proceedings of the National Academy of Sciences 112(32), 10062–10067. Tierney, A. T., Krizman, J., Skoe, E., Johnston, K., & Kraus, N. (2013). High school music classes enhance the neural processing of speech. Frontiers in Psychology 4. Retrieved from
https://doi.org/10.3389/fpsyg.2013.00855 Titz, C., & Karbach, J. (2014). Working memory and executive functions: Effects of training on academic achievement. Psychological Research 78(6), 852–868. Ullén, F., Hambrick, D. Z., & Mosing, M. A. (2016). Rethinking expertise: A multifactorial gene– environment interaction model of expert performance. Psychological Bulletin 142(4), 427–446. Ullén, F., Mosing, M. A., & Madison, G. (2015). Associations between motor timing, music practice, and intelligence studied in a large sample of twins. Annals of the New York Academy of Sciences 1337, 125–129. Van Beijsterveldt, C. E. M., & Van Baal, G. C. M. (2002). Twin and family studies of the human electroencephalogram: A review and a meta-analysis. Biological Psychology 61(1), 111–138. van Zuijen, T. L., Sussman, E., Winkler, I., Näätänen, R., & Tervaniemi, M. (2004). Grouping of sequential sounds: An event-related potential study comparing musicians and nonmusicians. Journal of Cognitive Neuroscience 16(2), 331–338. van Zuijen, T. L., Sussman, E., Winkler, I., Näätänen, R., & Tervaniemi, M. (2005). Auditory organization of sound sequences by a temporal or numerical regularity: A mismatch negativity study comparing musicians and non-musicians. Brain Research: Cognitive Brain Research 23(2– 3), 270–276. Vinkhuyzen, A. A., Van der Sluis, S., Posthuma, D., & Boomsma, D. I. (2009). The heritability of aptitude and exceptional talent across different domains in adolescents and young adults. Behavior Genetics 39(4), 380–392. Virtala, P., Huotilainen, M., Putkinen, V., Makkonen, T., & Tervaniemi, M. (2012). Musical training facilitates the neural discrimination of major versus minor chords in 13-year-old children. Psychophysiology 49(8), 1125–1132. Vuust, P., Brattico, E., Seppänen, M., Näätänen, R., & Tervaniemi, M. (2012). The sound of music: Differentiating musicians using a fast, musical multi-feature mismatch negativity paradigm. Neuropsychologia 50(7), 1432–1443. 
Vuust, P., Pallesen, K. J., Bailey, C., van Zuijen, T. L., Gjedde, A., Roepstorff, A., & Østergaard, L. (2005). To musicians, the message is in the meter: Pre-attentive neuronal responses to incongruent rhythm are left-lateralized in musicians. NeuroImage 24(2), 560–564. Winkler, I., Denham, S. L., & Nelken, I. (2009). Modeling the auditory scene: Predictive regularity representations and perceptual objects. Trends in Cognitive Sciences 13(12), 532–540. Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10(4), 420–422. Zuk, J., Benjamin, C., Kenyon, A., & Gaab, N. (2014). Behavioral and neural correlates of executive functioning in musicians and non-musicians. PloS ONE 9(6), e99868.
SECTION VI
DEVELOPMENTAL ISSUES IN MUSIC AND THE BRAIN
CHAPTER 23
THE ROLE OF MUSICAL DEVELOPMENT IN EARLY LANGUAGE ACQUISITION
ANTHONY BRANDT, MOLLY GEBRIAN, AND L. ROBERT SLEVC
Language is a foundation of human culture, and the transmission of language from caregiver to child is one of human society’s most universal and cherished tasks. As language is passed down, infants benefit from both explicit and implicit learning. Caregivers tutor them with infant-directed speech or motherese (Falk, 2004; Fernald, 1992). Infants also unconsciously internalize the conversations that happen around them (Perruchet & Pacton, 2006; Saffran, Aslin, & Newport, 1996). Thanks to direct tutelage and less focused exposure, normally developing children are able to learn their native tongue within the first few years of life. As Perani writes, “The way or mechanism through which language is acquired and mastered is one of the core questions in the domain of human sciences” (Perani, 2012, p. 306). Scientists have long speculated about whether music is implicated in this process. After all, just like language, music is ubiquitous in the world’s cultures: every normally developing human is born with the ability to appreciate music; populations everywhere
sing as well as speak. Given increasing evidence for shared neural resources for these two forms of human expression in adults (Patel, 2012), is it possible that they are even more entangled in infancy? Is music involved in early language acquisition? At first glance, it might be difficult to see how. Language is referential. From our daily conversations to the loftiest tomes, it is our way of transmitting information. As Jackendoff (2009, p. 197) writes: Language is essentially a mapping between sound and “propositional” or “conceptual” thought. The messages it conveys can be about people, objects, places, actions, or any manner of abstraction. Language can convey information about past, future, visible things, invisible things, and what is not the case.
Jackendoff continues: “None of these functions can be satisfied by music.” Music can be put to many uses: it can express emotional states, as in a love song; it can depict physical phenomena, such as a Tuvan throat singer’s imitation of a waterfall; it can evoke spiritual enlightenment, as in religious chant; it can be a display of stamina, as in an Inuit vocal competition; it can even present mathematical structure, as in the Fibonacci proportions in Bela Bartok’s string quartets. But it has little of the declamatory power of language. Language is built to say things like “We’re out of eggs—can you please run to the store?” Any musician would labor in vain to express something that concrete. In cultures around the world, distinguishing between the real and the hypothetical (“You should have bought the other car”) and past, present, and future (“The coupon expired yesterday”) are a routine and vital part of linguistic communication; as Jackendoff writes, music is virtually incapable of expressing these quotidian concepts. Although there is undeniably some overlap, music—in all its diverse manifestations around the globe—doesn’t mimic or reproduce the functions of language; rather, it complements them. Or as Victor Hugo put it: “Music expresses that which cannot be put into words” (Hugo, 1864, p. 44). Part and parcel of this contrast is that language and music are organized differently. Language is a combination of vocabulary and syntax. Music may often consist of recurring patterns, tropes, and even formulaic gestures —but there’s no dictionary for motives. And the syntactic distinctions between subject, verb, and object—virtually universal in language—have no correspondence in music. It’s for this reason that you can’t faithfully translate one genre of music into another: it’s fruitless to search for a
gamelan version of a Beethoven string quartet, or the Gagaku version of a country-and-western song (Jackendoff, 2009). There are other distinctions as well. Whether the tala of Indian classical music or the twelve-bar blues of jazz, the use of cyclic form is widespread in musical cultures. This is particularly true when the music is participatory: the structural predictability gives spectators the confidence to join in. Cyclic structure is not requisite or absolute, but even repertoire that involves unsynchronized or loosely coordinated ensemble performance, such as Japanese Gagaku, is often underpinned by repeating metric structures (Harich-Schneider, 1954). But whereas cyclic structure is perhaps the most elemental and resilient musical structure, it is only marginally relevant to language. We typically don’t express ourselves in loops: a linguistic argument is narrative rather than circular. Similarly, a great deal of indigenous music sets a tempo or pace and sticks with it. For instance, in gamelan music, each large section is characterized by an underlying pulse; during a transitional section called the pathetan, most of the instruments drop out as the pulse shifts; then, when a new pulse is established, the full ensemble again joins in (Spiller, 2004). Linguistic communication doesn’t dictate speed in that way. The speed of verbal communication sways with our thoughts: we hesitate when we’re searching for the right word or trying to recall something, and rush ahead when we’re surer of ourselves or excited. Thus, the contrast between language and music, both in function and rhetoric, is quite stark. It is perhaps not surprising, then, that Norman-Haignere and colleagues (Norman-Haignere, Kanwisher, & McDermott, 2015) recently identified patterns of neural activity associated with musical stimuli that were dissociable from activity for speech and environmental sounds. To the adult mind, language and music are not easily confused.
Jackendoff urges “caution in drawing strong connections between language and music, both in the contemporary human brain and in their evolutionary roots” (Jackendoff, 2009, p. 203). But consider the experience of listening to a language you don’t know: what you hear is a vocal performance, which varies in timbre, melody, and rhythm. When you don’t understand the words, speech is a type of music. And that is how infants are first exposed to language.
T
M
S
Is it accurate to describe speech as musical? Phonemes are the basic units of speech, prosody the way speech is delivered. Both of these have musical attributes. Phonemes (the distinct units of sound in a language that make up words) involve different attack characteristics and acoustic spectra. As a result, distinguishing phonemes involves rapid temporal processing similar to the discrimination of musical timbre: for instance, the distinctions between the consonants s and k happen on the order of 25–50 ms, the same as the time window for the contrast between a cymbal and woodblock (Hukin & Darwin, 1995; Robinson & Patterson, 1995; Shepard, 1980). The timbral characteristics of phonemes are put to musical use in scat singing and bebop, in which nonsense syllables are used for their sonic appeal. This is playfully illustrated by the song “Who put the Bomp?” (Mann, 1961):
Who put the bomp
In the bomp bah bomp bah bomp?
Who put the ram
In the rama lama ding dong?
…
I’d like to shake his hand
He made my baby
Fall in love with me …
Milton Babbitt’s Phonemena is a more cerebral form of scat singing: the text consists of varying combinations of twelve vowel-based sounds and twenty-four consonants. As in bebop, Babbitt’s phonemes are not bearers of meaning, but rather serve purely musical functions (Kostelanetz, 1987). On a tight deadline and unable to get a colleague at the United Nations to provide African lyrics in time, Lionel Richie invented a nonsense language for his 1983 hit song All Night Long: “Tom bo il de ay de moi ya, Hey Jambo Jumbo” (Fleming, 2013). Every language has its distinctive sonic characteristics, based on its inventory of phonemes. For instance, the Xhosa language has distinctive vocal “clicks.” Bantu doesn’t include the English cluster sch, so “school” in Bantu is “sukulu.” English doesn’t include the Russian tsch, and Japanese
doesn’t have the American r. Languages vary widely in their sonic inventory: the southern African language Taa is estimated to have upwards of two hundred phonemes, while Hawaiian has fewer than two dozen (Rousseau, 2016). Understanding and speaking any language involves mastering its unique phonemic palette, as well as the combinations in which its phonemes occur. Meanwhile, prosody refers to the melodic and rhythmic inflection of speech. In tonal languages such as Hmong and Mandarin, melodic inflection is a determinant of meaning. For instance, in Hmong, depending on how it is spoken, the syllable paw can have seven different meanings: “female,” “ball,” “thorn,” “paternal grandmother,” “pancreas,” “to see,” and “to throw” (McWhorter, 2015). The Chinese poem “The Lion-Eating Poet in the Stone Den” consists only of the syllable shi, repeated 92 times. But by speaking shi with the appropriate pitch contours, the spoken text tells the story of a poet named Shih Shih who hunts lions at a market, takes them back to his stone den, and tries to eat ten of them (Forsyth, 2012). In the Bantu languages of Africa, successive syllables oscillate between “register tones.” The use of vocal registers is once again a determinant of meaning: in Akan, a language of Ghana, the words for “good,” “fan,” and “father” are all papa, pronounced with different register tones (McWhorter, 2015). Even in non-tonal languages such as English, prosody is essential to verbal communication, helping to demarcate word and phrase boundaries, create emphases, and distinguish questions from statements: when spoken, the sentences “She’s next in line” and “She’s next in line?” can only be told apart thanks to their melodic inflection. Rhythm is also an important component of prosody. In stress-timed languages like English, accented syllables are elongated, whereas in syllable-timed languages like Japanese, they’re not. 
Ramus and Mehler (1999) devised a study in which they “smoothed out” the phonetic differences between English and Japanese and tested whether adults could still correctly identify the source tongue. They found that French speakers could indeed discriminate between the two languages based on little else than their rhythmic patterns. The melody and rhythm of speech play a special role in the Ewe and Yoruba tribes of Africa. Both tribes use “talking drums” to mimic their tonal languages: by transposing the prosodic features of their speech into percussion riffs, the Ewe have elaborate “shouting” contests and the Yoruba communicate over large distances (Batuman, 2012).
The prosody of a non-tonal language like English is put to musical use in Steve Reich’s Different Trains for string quartet and electronic tape. For this meditation on the contrasting voyages of American and European Jews in the 1940s, Reich interviewed riders of the transcontinental railroad along with Holocaust survivors. As the electronic tape plays snippets of these spoken commentaries (“from New York to Los Angeles,” “Black crows invaded our country many years ago”), the string quartet imitates their pitch and rhythmic inflections, turning his subjects’ speech into musical motives (Reich, 1989). Similarly, jazz artist Jason Moran “transcribed” a lecture by artist Adrian Piper into a piano solo by imitating the melody and rhythm of her speech and then adopted the resulting musical line as the basis for a jazz improvisation (Moran, 2006). This link between prosody and music was clearly demonstrated in an experiment by Diana Deutsch (Deutsch, Henthorn, & Lapidis, 2011). Deutsch looped the recorded phrase “sometimes behave so strangely” over and over and found that, after about ten repetitions, adult listeners heard the looped words as sung rather than spoken. Deutsch further noted that when she scrambled the order of the syllables, the listeners once again heard the text as normal speech. She surmised that, when the listeners no longer needed to attend to the meaning of the phrase, they began to pay more attention to its prosody; but when the syllables were scrambled, that nullified the effect, because the listeners were once again trying to figure out what the words meant. Still, is it appropriate to describe speech as “musical”? Peretz (2001, p. 440) has proposed that the “two anchorage points of brain specialization for music are the encoding of pitch along musical scales and the ascribing of a regular pulse to incoming events.” Speech doesn’t have either. 
However, there is indigenous music that doesn’t either: for instance, many types of throat singing involve neither traditional scales nor fixed pulse. Because, as Cross notes, music is “cultural, variable, and particular” (Cross, 2001, p. 32), we have argued that music is best defined as “creative play with sound, in which there is an attention to sound’s acoustic properties, irrespective of any referential meaning” (Brandt, Gebrian, & Slevc, 2012). Given that it involves close attention to pitch, rhythm, and timbre, speech can be viewed as a special type of music, especially if it is a language you don’t speak. Music may at times seem to be more “rehearsed” or programmed than conversational speech: a Beethoven sonata will sound nearly identical from
performance to performance. But that is not the case in improvisatory traditions: a Japanese shakuhachi performer or Indian sitar player will never perform the same way twice. In “free jazz,” the musicians are not bound by shared pulse or harmony and often spontaneously incorporate extended techniques such as over-blowing, key clicks, pitch bends, and multiphonics. Taking the broadest possible definition of music, speech can be viewed as a form of musical improvisation: it is “a concert of phonemes and syllables, melodically inflected by prosody” (Brandt et al., 2012, p. 4). Each language is distinguished by its inventory of possible sounds and its conventions of melodic and rhythmic performance. The question is: does this matter to infants?
M
L A
P B
In societies across the globe, getting children conversant with their native tongue as fast as possible is a universal goal. Thus, children’s ability to learn language is an essential constraint on its structure. As Deacon (1997, p. 110) writes, “The structure of a language is under intense selection pressure because in its reproduction from generation to generation, it must pass through a narrow bottleneck: children’s minds.” Do language’s musical features help to facilitate this process? Despite their inability to speak and understand language, newborns display an impressive sensitivity to a variety of linguistic contrasts. This sensitivity has often been cited as evidence that the ability to learn language is innate (e.g., Eimas, Siqueland, Jusczyk, & Vigorito, 1971; Vouloumanos & Werker, 2007). However, infants are most responsive to the sounds of words, not to their meaning: they are drawn first to the musical aspects of language. For instance, as they encounter the world for the first time, newborns are famously able to discriminate the phonemes of all languages (Dehaene-Lambertz & Dehaene, 1994; Eimas et al., 1971; Werker & Tees, 1984). As discussed above, this reflects a sensitivity to (vocal) timbre. Although less research has been done to probe musical timbre perception in newborns, 3- to 4-day-old infants can organize auditory streams on the basis of timbre,
similar to adults (McAdams & Bertoncini, 1997). Older infants remain highly sensitive to timbre: 6-month-olds have long-term memory for the timbre of folk songs, and 7- to 8.5-month-old infants can differentiate tones that differ only in their spectral structure (Trainor, Wu, & Tsang, 2004; Trehub, Endman, & Thorpe, 1990). Timbre appears to be so salient to infants that it can actually affect their ability to recognize and discriminate other basic features of music and speech (for summaries, see Costa-Giomi & Davila, 2014; Creel, 2016). For instance, infants take longer to learn words when they are spoken by different speakers than when they are spoken by just one person (Jusczyk, Pisoni, & Mullennix, 1992). Trainor and colleagues (2004) familiarized infants with a melody played on a single instrument and found that the infants could not recognize this same melody when played on a different instrument. The salience of timbre extends through preschool: when asked to associate sounds with visual stimuli, preschoolers do so more readily with timbral contrasts than with pitch contours (Creel, 2016). Even adults show memory facilitation dependent on timbre (e.g., Halpern & Müllensiefen, 2008; Radvansky & Potter, 2000). Newborns are also sensitive to the rhythmic features of language: they distinguish between languages of different rhythmic classes (stress-timed or syllable-timed) whether or not the contrast includes their native language (Nazzi, Bertoncini, & Mehler, 1998). Although newborns prefer their native language (Moon, Cooper, & Fifer, 1993), this seems to reflect a preference for the rhythmic class (stress patterns) of their native language: it is not until 4 months of age that infants can reliably tell the difference between languages of the same rhythmic class (Bosch & Sebastian-Galles, 1997; Gervain & Mehler, 2010; Nazzi et al., 1998). 
In addition to being sensitive to linguistic timbre and rhythm, infants can also discriminate the characteristic prosody (or melody) of their native language (Friederici, 2006) and even show evidence of discriminating affective prosody in the first two days of life (Cheng, Lee, Chen, Wang, & Decety, 2012). Research on infant cries also sheds light on the importance of melodic abilities in linguistic development: over the first few months of life, the melodic complexity of crying increases (Wermke & Mende, 2009), and infants who do not show this increasing melodic complexity show poorer language performance two years later (Wermke, Leising, & Stellzig-Eisenhauer, 2007). The “melody” of infants’ cries reflects the prosody of their native language, which further supports the idea that infants are sensitive
to the musical aspects of the language to which they are most exposed (Mampe, Friederici, Christophe, & Wermke, 2009; Prochnow, Erlandsson, Hesse, & Wermke, 2017). Additional evidence that the musical features of language are salient to infants comes from the way we talk to them: in “baby talk” or motherese, melodic contours are exaggerated, speech is slower, and rhythmic stresses are emphasized. Researchers have debated why we talk to babies that way. Some have argued that the function of motherese is limited to emotional communication (Trainor, Austin, & Desjardins, 2000). Others propose that, in addition to communicating emotion, baby talk also engages infants’ attention (Fernald, 1989): for instance, a parent’s pitch contours vary depending on whether or not the child is smiling and/or maintaining eye contact (Stern, Spieker, & MacKain, 1982). Across a variety of languages, mothers modulate their vocal timbre in similar ways when speaking to infants versus adults, perhaps as a way to draw their infants’ attention, which again highlights the salience that timbre has for infants (Piazza, Iordan, & Lew-Williams, 2017). Other researchers emphasize the didactic role of motherese, observing that mothers lengthen the vowels in content words and exaggerate word and sentence boundaries (Kuhl et al., 1997; Saint-Georges et al., 2013). The functions of motherese may certainly change in the course of development: as Saint-Georges and colleagues write (Saint-Georges et al., 2013, p. 9), “Mothers adjust their infant-directed speech to infants’ age, cognitive abilities and linguistic level.” The universality of motherese (Falk, 2004; Fernald, 1992) highlights how crucial it is to the caregiver–child relationship. Infants are drawn to the musical features of speech, and that attraction helps them to engage socially and to learn. Various other evidence illuminates the extensive sensitivity infants have to the musical sounds of language. 
For instance, newborns can use different patterns of lexical stress to discriminate individual words (Sansavini, Bertoncini, & Giovanelli, 1997); can use acoustic cues that signal word boundaries (Christophe, Dupoux, Bertoncini, & Mehler, 1994); can distinguish content words from function words based on their different acoustic characteristics (Shi, Werker, & Morgan, 1999); and appear to be sensitive to the prosodic boundaries in sentences (Pannekamp, Weber, & Friederici, 2006). A long-standing debate in language acquisition research is how infants solve the so-called “bootstrapping problem”: how infants
connect sounds to meaning. It may be that infants use the musical aspects of language (timbre, rhythm, melody) as the scaffolding on which they hang their later developments in semantic and syntactic comprehension. Infants are listening for how their language is composed and using this musical information to support later linguistic developments. Still, early life is the hardest period to study and the evidence is at times contradictory, especially when it comes to timbre perception: for instance, infants born ten weeks premature can discriminate a ba/da contrast, but have more difficulty discriminating two different speakers (Mahmoudzadeh, Wallois, Kongolo, Goudjil, & Dehaene-Lambertz, 2016), and infants can recognize different phones before they can recognize different voices (Dehaene-Lambertz, 2017). This discrepancy highlights the importance of continuing to probe the aural abilities of newborns. If the precocious discrimination abilities we see in newborns are domain-general and not limited to language, we would expect to find similarly precocious music perception abilities. Indeed, young infants have shown that they have very fine-grained pitch discrimination abilities, being able to distinguish pitch contrasts as small as a third of a half-step (Olsho, Schoon, Sakai, Turpin, & Sperduto, 1982). Newborns can also detect a deviant pitch, even when the timbre of the pitches is also changing (Háden et al., 2009), and can extract pitch patterns and use them in a predictive manner (Háden, Németh, Török, & Winkler, 2015), which is further evidence of their advanced pitch-processing abilities. In the rhythmic domain, newborns can detect the beat in music and can detect when an important rhythmic event is removed from a repeating drum pattern (Winkler, Háden, Ladinig, Sziller, & Honing, 2009). Newborns can also distinguish between small changes (60–100 ms) in the length of two tones (Čeponienė et al., 2002; Cheour et al., 2002). 
At two months of age, they can discriminate an isochronous sequence of tones from a nonisochronous sequence (Demany, McKenzie, & Vurpillot, 1977). Far more research has been done on the speech perception abilities than the music perception abilities of newborns and very young infants, but existing evidence suggests that infants’ music perception abilities are every bit as sensitive as their language perception abilities at birth. This further supports the idea that these are domain-general sound processing capabilities, not unique to either music or language.
C L
M P M
6
12
A
Gradually, infants’ sound perception abilities become more refined and culture-specific and begin to segregate more clearly into music perception and language perception capabilities. Tellingly, this refinement proceeds along a remarkably similar track for both music and language (see Fig. 1). At 6 months, infants can still discriminate all of the phonetic contrasts in the world’s languages (Cheour et al., 1998; Rivera-Gaxiola, Klarman, Garcia-Sierra, & Kuhl, 2005), although they do show evidence of being attuned to the vowel sounds of their native language over other languages (Kuhl, Williams, Lacerda, Stevens, & Lindblom, 1992). Similarly, 6-month-old infants can detect changes in a melody made of pitches from Javanese scales or Western scales equally well, whereas adults have a difficult time with the Javanese scale melodies (Lynch, Eilers, Kimbrough Oller, & Urbano, 1990). By 9 months, Western infants are more like adults: they have more difficulty with the Javanese melodies (Lynch & Eilers, 1992). In the language domain, by 8 months, infants cannot discriminate non-native vowel contrasts, although they can still discriminate non-native consonant contrasts. By 10–12 months, this ability disappears as well (Polka & Werker, 1994; Werker & Tees, 1984).
FIGURE 1. Parallel development in music and language milestones from 6 to 12 months. Regular text denotes parallel development. Italics denote related, but not analogous development. Bold text denotes language-only development. See main text for citations not listed here. (1) Six-month-olds can discriminate changes in Western and Javanese scales, can discriminate simple and complex meters, and can discriminate the phonemes of all languages. (2) Nine-month-olds can detect pitch or timing changes more easily in strong metrical structures and more easily process duple meter (more common) than triple meter (less common; Bergeson & Trehub, 2006). (3) Twelve-month-olds can better detect mistuned notes in Western scales than in Javanese scales and have more difficulty detecting changes in complex than simple meters. (4) Between 6 and 8 months, infants can discriminate consonant from dissonant intervals, but have difficulty discriminating between different consonant intervals (Schellenberg & Trainor, 1996). (5) Between 6 and 8 months, infants can no longer discriminate non-native vowel contrasts, but can still discriminate non-native consonant contrasts. (6) Trehub & Thorpe (1989). (7) At 7.5–8 months, English speaking infants show a bias for stress-initial words and are sensitive to prosodic and frequency cues to word order. Adapted from Frontiers in Psychology 3, p. 327, Figure 1, Anthony Brandt, Molly Gebrian, and L. Robert Slevc, Music and early language acquisition, doi: 10.3389/fpsyg.2012.00327, © 2012 Brandt, Gebrian, and Slevc. Reprinted under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Rhythmic perception abilities become more culture-specific as well. At 7.5 months, English-speaking infants show a preference for stress-initial words (Jusczyk, Hohne, & Bauman, 1999) and by 9 months, infants are especially sensitive to the stress patterns in their native language (for a review, see Jusczyk, 2000). As noted above, word segmentation in infants is
initially based on rhythmic information and it is only by 10.5 months that infants can use non-stress-based cues to segment words (Jusczyk et al., 1999). In music, 6-month-old infants can detect changes in complex and simple meters equally well, whereas Western adults have a hard time with complex meters (Hannon & Trehub, 2005a). By 12 months, Western infants also show this same difficulty with complex meters (Hannon & Trehub, 2005b). All of this refinement in the language domain lays the groundwork for the eventual understanding of meaning and syntax. At 8 months, infants are sensitive to the word order rules in their native language (a necessary ingredient in syntax comprehension in many languages), but largely through prosodic information and word frequency (Gervain, Nespor, Mazuka, Horie, & Mehler, 2008; Hochmann, Endress, & Mehler, 2010; Nespor et al., 2008). At 9 months, infants show evidence of understanding their first words (Friederici, 2006), and at this point, semantic and syntactic development takes over. Typically developing infants begin to talk between 11 and 13 months, experience an explosion in their vocabulary between 18 and 24 months, and reach a high point in syntactic learning between 18 and 36 months (Friederici, 2006; Kuhl, 2010). These developmental stages show a clear trajectory: the further removed an aspect of language is from music (referential meaning, grammar), the later it is learned.
C L
M C
Parallels between music and language development do not stop after the first year of life. Indeed, they continue throughout childhood, until children’s musical and linguistic sensitivity reaches adult levels (see Fig. 2). One challenge in comparing music and language development after infancy is that, whereas linguistic ability is often measured against the general population, musical ability is often (implicitly) measured against the expertise of professional musicians. This has contributed to the idea that language ability is an innate skill all typically developing humans possess, whereas musical skill is due to “talent” or a “gift” that is slower to mature. Although it certainly takes a tremendous amount of hard work and
dedication to master the viola or the trumpet, acquiring the musical conventions of your native culture is no more difficult or slow than learning your native language.
FIGURE 2. Parallel development in music and language milestones from 2 to 12 years. Regular text denotes parallel development. Italics denote related, but not analogous development. See main text for references. (1) Two-year-olds can repeat brief, sung phrases with identifiable rhythm and contour. (2) Eighteen-month-olds produce two-word utterances; 2-year-olds tend to eliminate function words, but not content words. (3) Two-year-olds show basic knowledge of word order constraints. (4) Three-year-olds have some knowledge of key membership and harmony and sing “outline songs.” (5) Four- to six-year olds show knowledge of scale and key membership and detect changes more easily in diatonic melodies than in non-diatonic ones. Five-year-olds show a typical electrophysiological response to unexpected chords (the early right anterior negativity, or ERAN), but do not detect a melodic change that implies a change in harmony. (6) At 5 years, processing of function words depends on semantic context and brain activation is not function-specific for semantic vs. syntactic processing (unlike adults). (7) Six-year-olds are able to speak in complete, well-formed sentences. (8) Seven-year-olds have a knowledge of Western tonal structure comparable to adults’ and can detect melodic changes that imply a change in harmony. (9) Only after 10 years of age do children show adult-like electrophysiological responses to syntactic errors (Hahne, Eckstein, & Friederici, 2006). Adapted from Frontiers in Psychology 3, p. 327, Figure 2, Anthony Brandt, Molly Gebrian, and L. Robert Slevc, Music and early language acquisition, doi: 10.3389/fpsyg.2012.00327, © 2012 Brandt, Gebrian, and Slevc. Reprinted under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Between 2 and 3 years of age, children acquire basic knowledge of syntax (e.g., Höhle, Weissenborn, Schmitz, & Ischebeck, 2001), although at this age, syntax and semantics are still interdependent (Brauer & Friederici,
2007; Friederici, 1983). This is still true at age 5. During the same period, children are mastering the syntax of their culture’s music, which in Western music means knowledge of key membership and harmony (Corrigall & Trainor, 2009). Again, up through age 5, musical syntactic knowledge is incomplete and still very much dependent on context, much like what is seen in language (Koelsch et al., 2003; Trainor & Trehub, 1994; Trehub, Cohen, Thorpe, & Morrongiello, 1986). By the age of 6, children appear to have mastered the syntax of their native language (Scott, 2004; Nuñez et al., 2011) and by 7 years of age, a child’s knowledge of the tonal structure in their culture’s music is comparable to an adult’s (McMullen & Saffran, 2004; Speer & Meeks, 1985; Trainor & Trehub, 1994). The learning of more complex syntactic structures in language continues through age 10 (Friederici, 1983), as do musical abilities. It is not until 8 to 10 years of age that children’s pitch discrimination abilities reach adult levels (Werner & Marean, 1996) and sensitivity to implied harmonies reaches adult levels by age 12 (Costa-Giomi, 2003). This parallel developmental trajectory is remarkable, especially given that all of the papers cited above studied children from Western cultures, which prioritize language learning far above musical learning in school curricula. In fact, children who are given music lessons reach musical developmental milestones sooner than their peers who do not take music lessons (for a review, see Trainor & Corrigall, 2010). It is difficult to sustain the argument that musical learning is slower and more effortful in the face of this parallel development, especially when music learning is not given nearly the same emphasis in our culture or schools. The ability to make music also follows a remarkably parallel track to the development of children’s speech throughout childhood. 
There are relatively few studies of children’s singing abilities, perhaps because in Western culture, we tend to separate musicians from non-musicians. This separation is not true for many non-Western societies and was not the case historically: modern Western society is unusual in its exclusion of singing (and dancing) from everyday life (Cross, 2001). Despite this, the development of singing ability appears to proceed similarly across cultures and keeps pace with the ability to speak. For instance, around the age of 2, children begin to produce short linguistic utterances (Friederici, 2006; Gervain & Mehler, 2010). At this same age, they can reproduce simple musical fragments with identifiable rhythm and contour (Dowling, 1999).
Later, 2- to 3-year-olds tend to eliminate function words (but not content words, like nouns and verbs) from their spontaneous speech (Gerken, Landau, & Remez, 1990). Similarly, when 3-year-olds sing, they have a tendency to mix bits of songs from their own culture with their own original vocal improvisations, singing so-called “outline songs” that follow the general contour of melodies from their culture (Davidson, 1994; Hargreaves, 1996; Moog, 1976). Thus, in both speaking and singing, toddlers are uttering the gist of what they want to express (using the most important words, nouns and verbs in language, and getting the essential contour of songs correct), while leaving out the more nuanced details (function words in language, more detailed and precise pitch content in music). In Western societies, the ability to sing continues to mature until around age 11 (e.g., Howard, Angus, & Welch, 1994; Welch, 2002), although in societies that emphasize singing ability on par with speaking ability, this improvement happens earlier and to a greater extent (Kreutzer, 2001; Welch, 2009).
I
N L
B
S P
?
Despite the evidence above, many have claimed that music and language are separate systems, and are processed as such by the brain (e.g., Peretz & Coltheart, 2003). Some researchers claim that this separation is innate based on evidence that infants show left hemisphere lateralization for speech at birth (e.g., Dehaene-Lambertz, Dehaene, & Hertz-Pannier, 2002). However, a close look at this research reveals a more complicated picture. Bilateral activation for both music and language has been found in a number of studies, ranging from those that include full sentences and/or musical phrases (Fava, Hull, Baumbauer, & Bortfeld, 2014; Perani, 2012), to those that use tightly controlled speech-like sounds that test specific parameters of the linguistic acoustic signal (Dehaene-Lambertz, 2000; Kotilahti et al., 2010; Minagawa-Kawai, Cristià, Vendelin, Cabrol, & Dupoux, 2011a). Fava and colleagues (2014) used full sentences and musical phrases and found no differences in activation. Perani and colleagues (2011) found right
lateralization for speech that closely parallels the activation they observed to music in an earlier study (Perani et al., 2010). Researchers who argue that music and language processing are separate in the brain often point to rapid temporal processing as evidence for this separation (e.g., Boemio, Fromm, Braun, & Poeppel, 2005; Zatorre & Belin, 2001). The argument is that rapid temporal processing is required for language comprehension, but not for music perception. However, as described earlier, the perception of musical timbre requires processing at the same rate as phonemes: for instance, in a percussion solo, rapid temporal processing is at least as salient as slower melodic processing. Nevertheless, even taken at face value, this research has produced conflicting results. While the right hemisphere is believed to respond more strongly to spectral information, the left hemisphere shows greater sensitivity to rapid temporal contrasts, which underlies the supposed innate predisposition for language perception. As noted earlier, Mahmoudzadeh and colleagues (2016) found that infants born 10 weeks preterm could distinguish consonants differing along a temporal continuum (specifically, a ba/da contrast varying in voice onset time and so requiring rapid temporal processing), but not between different speakers (whose voices presumably differ mostly in spectral, not temporal, dimensions). Because the infants were born so prematurely, the authors argued that this is evidence for an early advantage for the processing of fast changes in the acoustic signal that is likely genetic. However, an earlier study on premature infants by this same research group (Mahmoudzadeh et al., 2013) found right hemisphere activation increases to both a change in phoneme and a change in speaker (a left hemisphere increase was also observed, but only for the phoneme change). 
Minagawa-Kawai and colleagues (2011a) tested sounds that differed in only their temporal composition or only their spectral composition and found no evidence for hemispheric asymmetries; in fact, both hemispheres were equally activated in the temporal condition. Dehaene-Lambertz, who has long argued that music and language are separate and that language perception is innate and prioritized in the infant brain, has recently acknowledged that the brain activation to language in young infants is complex and the leftward preference sometimes seen for linguistic stimuli may have more to do with the fact that temporally responsive neurons on the left mature earlier, rather than reflecting anything special about language per se (Dehaene-Lambertz, 2017).
What is more fully supported by the research, at least to date, is the idea that functional specialization emerges as a result of exposure and learning. In 2011, Minagawa-Kawai and colleagues proposed such a developmental scenario, arguing that “language acquisition recruits several specialized (but not necessarily domain-specific) learning subsystems … The establishment of feature based, categorical phonetic units in the extraction of words and rules on the basis of hierarchical and adjacent regularities requires specific learning algorithms that are especially efficient in the left hemisphere and, as a result, speech perception comes to be left-lateralized as a function of experience” (Minagawa-Kawai, Cristià, & Dupoux, 2011b, p. 219).
This hypothesis predicts that second language learning in adults will be less left-lateralized as a function of proficiency, which is exactly what is found (Dehaene et al., 1997; Perani et al., 1996, 1998). In addition, sound pairs only elicit asymmetrical activation when they constitute a contrast in the speaker’s native language (Dehaene-Lambertz, 1997; Minagawa-Kawai, Mori, & Sato, 2005; Näätänen et al., 1997), but after extensive training with a non-native contrast, there is a shift to left-hemisphere dominance (Best & Avery, 1999; Zhang et al., 2009). Gervain (2015, p. 16) echoes this idea: “Features of the native language are processed in an increasingly lateralized fashion in a network of focal brain areas, as processing turns from acoustic/auditory to linguistic in nature, whereas non-native sound patterns are handled in a more distributed and more bilateral way.”
L
D
D
Other evidence for the inseparability of music and language during early life comes from linked deficits in individuals with abnormal linguistic or musical development. For example, the speech perception and reading deficits associated with developmental dyslexia have often been linked to underlying problems with auditory processing, including processing of rapid temporal changes in speech (for a review, see Hämäläinen, Salminen, & Leppänen, 2013). Given the discussion above, it is unsurprising that these deficits impact music as well. In particular, children with dyslexia tend to also show deficits with musical rhythm (e.g., synchronizing with a metronome) and (while not as often studied) musical timbre (e.g.,
Goswami, Huss, Mead, Fosker, & Verney, 2013; Huss, Verney, Fosker, Mead, & Goswami, 2011; Overy, Nicolson, Fawcett, & Clarke, 2003). Assuming a crucial link between musical and linguistic development, one might imagine that musical experience could help treat dyslexia. In fact, musical (especially rhythm) training can help remediate linguistic deficits in dyslexia (e.g., Flaugnacco et al., 2015), although music training is clearly no panacea given that dyslexic musicians show good musical abilities despite pronounced reading deficits (Weiss, Granot, & Ahissar, 2014). Note, however, that dyslexic musicians do show other types of perceptual and auditory deficits including discrimination of amplitude envelope cues (which relies on rapid temporal processing) and auditory working memory (Weiss et al., 2014; Zuk et al., 2017). Thus developmental dyslexia appears to be a deficit not only of language but also of music. Similar parallels emerge in other purportedly language-specific deficits such as specific language impairment (SLI). Individuals with SLI, whose primary deficit involves syntactic processing, also show deficits in the processing of musical (harmonic) structure (Jentschke, Koelsch, Sallat, & Friederici, 2008), deficits in rhythmic processing (e.g., Cumming, Wilson, Leong, Colling, & Goswami, 2015), and show more accurate grammatical processing following rhythmic stimulation in a priming paradigm (Bedoin, Brisseau, Molinier, Roch, & Tillmann, 2016; Przybylski et al., 2013). Production deficits in SLI are also associated with productive deficits in music—specifically, in pitch-matching and melody reproduction (Clément, Planchou, Béland, Motte, & Samson, 2015). Even deficits in language production such as developmental stuttering have been linked to deficits with musical rhythm (Wieland, McAuley, Dilley, & Chang, 2015), and suggestive relationships between language and music deficits appear in other disorders not linked specifically to language. 
For example, autistic children often have age-appropriate responses to music but suffer from language disabilities. This has been viewed as evidence that music and language must involve innately distinct neural networks, with the music network healthy and the language one impaired. However, in an fMRI study of autistic children, Lai and colleagues (Lai, Pantazatos, Schneider, & Hirsch, 2012, p. 961) found that, “paradoxically, brain regions associated with these functions typically overlap”: while song and speech activated the same networks, the response to song was vigorous whereas that to speech was subdued. And while the causes
are not yet fully understood, there is growing evidence for the efficacy of musical engagement and therapy on language (and other) outcomes in autism spectrum disorders (e.g., Geretsegger, Elefant, Mössler, & Gold, 2014). Of course, the prediction is not just that linguistic deficits show concomitant musical processing problems, but also that developmental deficits in music should have consequences for language development. Congenital amusia, the most well-studied deficit of musical development, is primarily associated with deficits in pitch perception and/or pitch memory, although it can also include deficits in the processing of temporal (rhythmic) aspects of music (reviews: Peretz & Hyde, 2003; Tillmann, Albouy, & Caclin, 2015). While early conceptions of amusia suggested a deficit specific to music, more recent work has found related deficits in processing of linguistic pitch, both for lexical tones (in tone-language speakers; e.g., Liu, Patel, Fourcin, & Stewart, 2010; Wang & Peng, 2014), and for the recognition of emotional prosody in non-tone languages (Thompson, Marin, & Stewart, 2012). Speech perception deficits in congenital amusia appear to extend to non-pitch-based aspects of language as well (e.g., Liu, Jiang, Wang, Xu, & Patel, 2015), showing subtle, but widespread, linguistic consequences of congenital amusia. One deficit that severely impacts both music and speech processing is deafness. Given that deaf individuals can successfully learn full-fledged sign languages, deafness could be seen as a notable counterexample for many of the claims made here. Note, however, that music is not just an auditory stimulus, but also a kinesthetic and visual one, and the rhythmic “babbling” characteristic of sign-exposed infants (e.g., Petitto, Holowka, Sergio, & Ostry, 2001) may lay the foundation for the temporal processing abilities underlying later linguistic acquisition in signers.
E
M A
L
It can be difficult for adults to conceive of music and language being treated as one and the same in the infant brain because they are so obviously different once we reach maturity. However, this entanglement between
music and language persists throughout our lives. Research on the perception of tunes and lyrics shows evidence of integrated processing at the pre-lexical, phonemic processing stage in adults (Sammler et al., 2010). Sine-wave speech, which initially sounds like meaningless whistles, can not only be perceived as speech after training, but it also activates speech areas (specifically the left posterior superior temporal sulcus) once it is perceived as speech (Möttönen et al., 2006). Silbo Gomero, a whistled speech used on La Gomera in the Canary Islands, activates areas of the brain normally associated with spoken language perception in proficient whistlers, but not in those who do not speak the language (Carreiras, Lopez, Rivero, & Corina, 2005). Something as abstract as syntax may also be rooted in the music of language. Kreiner and Eviatar (2014) argue that syntactic comprehension is rooted in prosody: syntactic and prosodic boundaries largely correspond in spoken language and this aids syntactic comprehension (see also Heffner & Slevc, 2015). They note that prosody helps disambiguate unclear syntactic structures (such as garden-path sentences), which can be misleading in print, but rarely in spoken conversation. As argued above, this melody of language is what infants first use, but even as adults, it continues to underlie our syntactic understanding. There is also evidence for a correlation between rhythm processing and word stress processing in normal adults, another entanglement of music and language (Hausen, Torppa, Salmela, Vainio, & Särkämö, 2013).
Conclusion
Patel (2012) has proposed a framework for the adult brain in which music and language share neural resources for the perceptual and cognitive tasks they have in common. Given the evidence for infants’ sensitivities to the musical features of speech, the co-development of musical and linguistic abilities, and shared developmental disorders, it seems plausible that music and language are even more deeply entangled in the newborn brain and that modularity emerges in the course of development. Speech may be privileged in the infant brain (Shultz, Vouloumanos, Bennett, & Pelphrey, 2014), but it is first experienced as a
vocal performance whose musical features are what engage the newborn’s attention. It is only later that the child begins to apprehend the referential function of words, and the music of words begins to sink into the background. As Deutsch’s “looped speech” experiment shows, the music of speech is never really absent, even in a non-tonal language; it is just that we pay less attention to it. Music (and by extension, poetry) may give human culture ways to creatively engage in the features of our aural imaginations that conversational speech does not prioritize. A burning question of human cognition is whether language is innate. Are we language animals, with a universal grammar encoded in our genes (Chomsky, 1980)? Or do we have a strong biological need to communicate, and an aptitude for learning how to do so? The jury is still out, but there has been a gradual shift toward viewing language as a cultural inheritance rather than a genetic one. Iterated learning, in which initially random data becomes coherent over time as generations of subjects instruct one another, is a plausible way of describing the emergence of both language and music (Kirby, Griffiths, & Smith, 2014; Ravignani, Delgado, & Kirby, 2016; Smith, Kirby, & Brighton, 2003). The study of music’s role in early language acquisition may have important implications: when we learn language, the music of speech comes first, thereby providing a key mechanism by which language is transmitted from generation to generation. In a newborn’s first months, speech is akin to bebop: musical attention to how languages are composed (through their unique phonemic inventory and prosody) helps an infant born into any community learn its native tongue. The same acuities used in these early developmental stages (sensitivities to timbre, pitch, and rhythm, and the ability to recognize their consistencies) embody musical aptitude later in life. 
Early language acquisition thus lies at the crossroads of music and language and provides tantalizing glimpses into what it means to be human.
References
Albin, D. D., & Echols, C. H. (1996). Stressed and word-final syllables in infant-directed speech. Infant Behavior and Development 19(4), 401–418. Batuman, E. (2012). Talking drums. The New Yorker, July 9. Retrieved from https://www.newyorker.com/culture/culture-desk/talking-drums
Bedoin, N., Brisseau, L., Molinier, P., Roch, D., & Tillmann, B. (2016). Temporally regular musical primes facilitate subsequent syntax processing in children with specific language impairment. Frontiers in Neuroscience 10. Retrieved from https://doi.org/10.3389/fnins.2016.00245 Bergeson, T. R., & Trehub, S. E. (2006). Infants’ perception of rhythmic patterns. Music Perception 23(4), 345–360. Best, C. T., & Avery, R. A. (1999). Left-hemisphere advantage for click consonants is determined by linguistic significance and experience. Psychological Science 10(1), 65–70. Boemio, A., Fromm, S., Braun, A., & Poeppel, D. (2005). Hierarchical and asymmetric temporal sensitivity in human auditory cortices. Nature Neuroscience 8(3), 389–395. Bosch, L., & Sebastián-Gallés, N. (1997). Native-language recognition abilities in 4-month-old infants from monolingual and bilingual environments. Cognition 65(1), 33–69. Brandt, A., Gebrian, M., & Slevc, L. R. (2012). Music and early language acquisition. Frontiers in Psychology 3. Retrieved from https://doi.org/10.3389/fpsyg.2012.00327 Brauer, J., & Friederici, A. D. (2007). Functional neural networks of semantic and syntactic processes in the developing brain. Journal of Cognitive Neuroscience 19(10), 1609–1623. Carreiras, M., Lopez, J., Rivero, F., & Corina, D. (2005). Linguistic perception: Neural processing of a whistled language. Nature 433(7021), 31–32. Čėponiené, R., Kushnerenko, E., Fellman, V., Renlund, M., Suominen, K., & Näätänen, R. (2002). Event-related potential features indexing central auditory discrimination by newborns. Cognitive Brain Research 13(1), 101–113. Cheng, Y., Lee, S. Y., Chen, H. Y., Wang, P. Y., & Decety, J. (2012). Voice and emotion processing in the human neonatal brain. Journal of Cognitive Neuroscience 24(6), 1411–1419. Cheour, M., Čėponiené, R., Lehtokoski, A., Luuk, A., Allik, J., Alho, K., & Näätänen, R. (1998). Development of language-specific phoneme representations in the infant brain. 
Nature Neuroscience 1, 351–353. Cheour, M., Čėponiené, R., Leppänen, P., Alho, K., Kujala, T., Renlund, M., & Näätänen, R. (2002). The auditory sensory memory trace decays rapidly in newborns. Scandinavian Journal of Psychology 43(1), 33–39. Chomsky, N. (1980). Rules and representations. New York: Columbia University Press. Christophe, A., Dupoux, E., Bertoncini, J., & Mehler, J. (1994). Do infants perceive word boundaries? An empirical study of the bootstrapping of lexical acquisition. Journal of the Acoustical Society of America 95(3), 1570–1580. Clément, S., Planchou, C., Béland, R., Motte, J., & Samson, S. (2015). Singing abilities in children with Specific Language Impairment (SLI). Frontiers in Psychology 6. Retrieved from https://doi.org/10.3389/fpsyg.2015.00420 Corrigall, K. A., & Trainor, L. J. (2009). Effects of musical training on key and harmony perception. Annals of the New York Academy of Sciences 1169, 164–168. Costa-Giomi, E. (2003). Young children’s harmonic perception. Annals of the New York Academy of Sciences 999, 477–484. Costa-Giomi, E., & Davila, Y. (2014). Infants’ discrimination of female singing voices. International Journal of Music Education 32(3), 324–332. Creel, S. C. (2016). Ups and downs in auditory development: Preschoolers’ sensitivity to pitch contour and timbre. Cognitive Science 40(2), 373–403. Cross, I. (2001). Music, cognition, culture, and evolution. Annals of the New York Academy of Sciences 930, 28–42. Cumming, R., Wilson, A., Leong, V., Colling, L. J., & Goswami, U. (2015). Awareness of rhythm patterns in speech and music in children with specific language impairments. Frontiers in Human Neuroscience 9. Retrieved from https://doi.org/10.3389/fnhum.2015.00672
Davidson, L. (1994). Song singing by young and old: A developmental approach to music. In R. Aiello with J. Sloboda (Eds.), Musical perceptions (pp. 99–130). New York: Oxford University Press. Deacon, T. W. (1997). The symbolic species: The coevolution of language and the brain. New York: W. W. Norton. Dehaene, S., Dupoux, E., Mehler, J., Cohen, L., Paulesu, E., Perani, D., & Le Bihan, D. (1997). Anatomical variability in the cortical representation of first and second language. NeuroReport 8(17), 3809–3815. Dehaene-Lambertz, G. (1997). Electrophysiological correlates of categorical phoneme perception in adults. NeuroReport 8(4), 919–924. Dehaene-Lambertz, G. (2000). Cerebral specialization for speech and non-speech stimuli in infants. Journal of Cognitive Neuroscience 12(3), 449–460. Dehaene-Lambertz, G. (2017). The human infant brain: A neural architecture able to learn language. Psychonomic Bulletin and Review 24(1), 48–55. Dehaene-Lambertz, G., & Dehaene, S. (1994). Speed and cerebral correlates of syllable discrimination in infants. Nature 370, 1–4. Dehaene-Lambertz, G., Dehaene, S., & Hertz-Pannier, L. (2002). Functional neuroimaging of speech perception in infants. Science 298(5600), 2013–2015. Demany, L., McKenzie, B., & Vurpillot, E. (1977). Rhythm perception in early infancy. Nature 266(5604), 718–719. Deutsch, D., Henthorn, T., & Lapidis, R. (2011). Illusory transformation from speech to song. Journal of the Acoustical Society of America 129(4), 2245–2252. Dowling, W. J. (1999). The development of music perception and cognition. In D. Deutsch (Ed.), The Psychology of Music (2nd ed.; pp. 603–625). London: Academic Press. Eimas, P., Siqueland, E., Jusczyk, P., & Vigorito, J. (1971). Speech perception in infants. Science 171(3968), 202–206. Falk, D. (2004). Prelinguistic evolution in early hominins: Whence motherese? Behavioral and Brain Sciences 27(4), 491–503; discussion: 503–583. Fava, E., Hull, R., Baumbauer, K., & Bortfeld, H. (2014). 
Hemodynamic responses to speech and music in preverbal infants. Child Neuropsychology 20(4), 430–448. Fernald, A. (1989). Intonation and communicative intent in mothers’ speech to infants: Is the melody the message? Child Development 60(6), 1497–1510. Fernald, A. (1992). Human maternal vocalizations to infants as biologically relevant signals: An evolutionary perspective. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary psychology and the generation of culture (pp. 391–428). Oxford: Oxford University Press. Fernald, A., & Mazzie, C. (1991). Prosody and focus in speech to infants and adults. Developmental Psychology 27(2), 209–221. Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schön, D. (2015). Music training increases phonological awareness and reading skills in developmental dyslexia: A randomized control trial. PLoS ONE 10(9), e0138715. Fleming, K. (2013). Five surprising facts about Richie’s classic “All Night Long.” New York Post, September 21. Retrieved from http://nypost.com/2013/09/21/five-surprising-facts-about-richiesclassic-all-night-long/ Forsyth, M. (2012). The etymologicon: A circular stroll through the hidden connections of the English language. New York: Berkley Books. Fox, C. (1990). Steve Reich’s “Different Trains.” Tempo, New Series, No. 172 (March), 2–8.
Friederici, A. D. (1983). Children’s sensitivity to function words during sentence comprehension. Linguistics 21, 717–739. Friederici, A. D. (2006). The neural basis of language development and its impairment. Neuron 52(6), 941–952. Geretsegger, M., Elefant, C., Mössler, K. A., & Gold, C. (2014). Music therapy for people with autism spectrum disorder. Cochrane Database of Systematic Reviews 17(6), CD004381. doi:10.1002/14651858.CD004381.pub3 Gerken, L., Landau, B., & Remez, R. (1990). Function morphemes in young children’s speech perception and production. Developmental Psychology 26(2), 204–216. Gervain, J. (2015). Plasticity in early language acquisition: The effects of prenatal and early childhood experience. Current Opinion in Neurobiology 35, 13–20. Gervain, J., & Mehler, J. (2010). Speech perception and language acquisition in the first year of life. Annual Review of Psychology 61, 191–218. Gervain, J., Nespor, M., Mazuka, R., Horie, R., & Mehler, J. (2008). Bootstrapping word order in prelexical infants: A Japanese–Italian cross-linguistic study. Cognitive Psychology 57(1), 56–74. Goswami, U., Huss, M., Mead, N., Fosker, T., & Verney, J. P. (2013). Perception of patterns of musical beat distribution in phonological developmental dyslexia: Significant longitudinal relations with word reading and reading comprehension. Cortex 49(5), 1363–1376. Háden, G. P., Németh, R., Török, M., & Winkler, I. (2015). Predictive processing of pitch trends in newborn infants. Brain Research 1626, 14–20. Háden, G. P., Stefanics, G., Vestergaard, M. D., Denham, S. L., Sziller, I., & Winkler, I. (2009). Timbre-independent extraction of pitch in newborn infants. Psychophysiology 46(1), 69–74. Hahne, A., Eckstein, K., & Friederici, A. D. (2006). Brain signatures of syntactic and semantic processes during children’s language development. Brain 16(7), 1302–1318. Halpern, A. R., & Müllensiefen, D. (2008). Effects of timbre and tempo change on memory for music. 
Quarterly Journal of Experimental Psychology 61(9), 1371–1384. Hämäläinen, J. A., Salminen, H. K., & Leppänen, P. H. (2013). Basic auditory processing deficits in dyslexia: Systematic review of the behavioral and event-related potential/field evidence. Journal of Learning Disabilities 46(5), 413–427. Hannon, E. E., & Trehub, S. E. (2005a). Metrical categories in infancy and adulthood. Psychological Science 16(1), 48–55. Hannon, E. E., & Trehub, S. E. (2005b). Tuning in to musical rhythms: Infants learn more readily than adults. Proceedings of the National Academy of Sciences 102(35), 12639–12643. Hargreaves, D. J. (1996). The development of artistic and musical competence. In I. Deliege & J. Sloboda (Eds.), Musical beginnings (pp. 145–170). Oxford: Oxford University Press. Harich-Schneider, E. (1954). The rhythmical patterns in Gagaku and Bugaku. Leiden: E. J. Brill. Hausen, M., Torppa, R., Salmela, V. R., Vainio, M., & Särkämö, T. (2013). Music and speech prosody: A common rhythm. Frontiers in Psychology 4. Retrieved from https://doi.org/10.3389/fpsyg.2013.00566 Heffner, C. C., & Slevc, L. R. (2015). Prosodic structure as a parallel to musical structure. Frontiers in Psychology 6. Retrieved from https://doi.org/10.3389/fpsyg.2015.01962 Hochmann, J. R., Endress, A. D., & Mehler, J. (2010). Word frequency as a cue for identifying function words in infancy. Cognition 115(3), 444–457. Höhle, B., Weissenborn, J., Schmitz, M., & Ischebeck, A. (2001). Discovering word order regularities: The role of prosodic information for early parameter setting. In J. Weissenborn & B. Höhle (Eds.), Approaches to bootstrapping: Phonological, lexical, syntactic and neurophysiological aspects of early language acquisition (pp. 249–265). Amsterdam: John Benjamins.
Howard, D. M., Angus, J. A., & Welch, G. F. (1994). Singing pitching accuracy from years 3 to 6 in a primary school. Proceedings of the Institute of Acoustics 16(5), 223–230. Hugo, V. (1864). William Shakespeare. Paris: A. Lacroix, Verboeckhoeven et Cie. Hukin, R. W., & Darwin, C. J. (1995). Comparison of the effect of onset asynchrony on auditory grouping in pitch matching and vowel identification. Perception and Psychophysics 57(2), 191–196. Huss, M., Verney, J. P., Fosker, T., Mead, N., & Goswami, U. (2011). Music, rhythm, rise time perception and developmental dyslexia: Perception of musical meter predicts reading and phonology. Cortex 47(6), 674–689. Jackendoff, R. (2009). Parallels and nonparallels between language and music. Music Perception 26(3), 195–204. Jentschke, S., Koelsch, S., Sallat, S., & Friederici, A. D. (2008). Children with specific language impairment also show impairment of music-syntactic processing. Journal of Cognitive Neuroscience 20(11), 1940–1951. Jusczyk, P. W. (2000). The discovery of spoken language. Cambridge, MA: MIT Press. Jusczyk, P. W., Hohne, E. A., & Bauman, A. (1999). Infants’ sensitivity to allophonic cues to word segmentation. Perception and Psychophysics 61(8), 1465–1476. Jusczyk, P. W., Pisoni, D. B., & Mullennix, J. (1992). Some consequences of stimulus variability on speech processing by 2-month-old infants. Cognition 43(3), 253–291. Kirby, S., Griffiths, T., & Smith, K. (2014). Iterated learning and the evolution of language. Current Opinion in Neurobiology 28, 108–114. Koelsch, S., Grossmann, T., Gunter, T. C., Hahne, A., Schröger, E., & Friederici, A. D. (2003). Children processing music: Electric brain responses reveal musical competence and gender differences. Journal of Cognitive Neuroscience 15(5), 683–693. Kostelanetz, R. (1987). Notes on Milton Babbitt as text-sound artist. Perspectives of New Music 25(1/2), 280–284. 
Kotilahti, K., Nissilä, I., Näsi, T., Lipiäinen, L., Noponen, T., Meriläinen, P., … Fellman, V. (2010). Hemodynamic responses to speech and music in newborn infants. Human Brain Mapping 31(4), 595–603. Kreiner, H., & Eviatar, Z. (2014). The missing link in the embodiment of syntax: Prosody. Brain & Language 137, 91–102. Kreutzer, N. (2001). Song acquisition among rural Shona-speaking Zimbabwean children from birth to 7 years. Journal of Research in Music Education 49(3), 198–211. Kuhl, P. K. (2010). Brain mechanisms in early language acquisition. Neuron 67(5), 713–727. Kuhl, P. K., Andruski, J. E., Chistovich, I., Chistovich, L. A., Kozhevnikova, E. V., Ryskina, V. L., … Lacerda, F. (1997). Cross-language analysis of phonetic units in language addressed to infants. Science 277(5326), 684–686. Kuhl, P. K., Williams, K., Lacerda, F., Stevens, K., & Lindblom, B. (1992). Linguistic experience alters phonetic perception in infants by 6 months of age. Science 255(5044), 606–608. Lai, G., Pantazatos, S. P., Schneider, H., & Hirsch, J. (2012). Neural systems for speech and song in autism. Brain 135(3), 961–975. Liu, F., Jiang, C., Wang, B., Xu, Y., & Patel, A. D. (2015). A music perception disorder (congenital amusia) influences speech comprehension. Neuropsychologia 66, 111–118. Liu, F., Patel, A. D., Fourcin, A., & Stewart, L. (2010). Intonation processing in congenital amusia: Discrimination, identification and imitation. Brain 133(6), 1682–1693. Lynch, M. P., & Eilers, R. E. (1992). A study of perceptual development for musical tuning. Perception and Psychophysics 52(6), 599–608.
Lynch, M. P., Eilers, R. E., Kimbrough Oller, D., & Urbano, R. C. (1990). Innateness, experience, and music perception. Psychological Science 1(4), 272–276. McAdams, S., & Bertoncini, J. (1997). Organization and discrimination of repeating sound sequences by newborn infants. Journal of the Acoustical Society of America 102(5), 2945–2953. McMullen, E., & Saffran, J. R. (2004). Music and language: A developmental comparison. Music Perception 21(3), 289–311. McWhorter, J. (2015). The world’s most musical languages. The Atlantic, November 13. Retrieved from https://www.theatlantic.com/international/archive/2015/11/tonal-languages-linguisticsmandarin/415701/ Mahmoudzadeh, M., Dehaene-Lambertz, G., Fournier, M., Kongolo, G., Goudjil, S., Dubois, J., & Wallois, F. (2013). Syllabic discrimination in premature human infants prior to complete formation of cortical layers. Proceedings of the National Academy of Sciences 110(12), 4846–4851. Mahmoudzadeh, M., Wallois, F., Kongolo, G., Goudjil, S., & Dehaene-Lambertz, G. (2016). Functional maps at the onset of auditory inputs in very early preterm human neonates. Cerebral Cortex 27(4), 2500–2512. Mampe, B., Friederici, A. D., Christophe, A., & Wermke, K. (2009). Newborns’ cry melody is shaped by their native language. Current Biology 19, 1994–1997. Mann, B. (1961). Who Put the Bomp. ABC-Paramount 10237. Minagawa-Kawai, Y., Cristià, A., & Dupoux, E. (2011b). Cerebral lateralization and early speech acquisition: A developmental scenario. Developmental Cognitive Neuroscience 1(3), 217–232. Minagawa-Kawai, Y., Cristià, A., Vendelin, I., Cabrol, D., & Dupoux, E. (2011a). Assessing signaldriven mechanisms in neonates: Brain responses to temporally and spectrally different sounds. Frontiers in Psychology 2. Retrieved from https://doi.org/10.3389/fpsyg.2011.00135 Minagawa-Kawai, Y., Mori, K., & Sato, Y. (2005). Different brain strategies underlie the categorical perception of foreign and native phonemes. 
Journal of Cognitive Neuroscience 17(9), 1376–1385. Miranda, R. A., & Ullman, M. T. (2007). Double dissociation between rules and memory in music: An event-related potential study. NeuroImage 38(2), 331–345. Moog, H. (1976). The musical experience of the pre-school child. Trans. C. Clarke. London: Schott. Moon, C., Cooper, R. P., & Fifer, W. P. (1993). Two-day-olds prefer their native language. Infant Behavior and Development 16(4), 495–500. Moran, J. (2006). Artist-in-residence. New York: Blue Note Records. Möttönen, R., Calvert, G. A., Jääskeläinen, I. P., Matthews, P. M., Thesen, T., Tuomainen, J., & Sams, M. (2006). Perceiving identical sounds as speech or non-speech modulates activity in the left posterior superior temporal sulcus. NeuroImage 30(2), 563–569. Näätänen, R., Lehtokoski, A., Lennes, M., Cheour, M., Huotilainen, M., Iivonen, A., & Allik, J. (1997). Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature 385(6615), 432–434. Nazzi, T., Bertoncini, J., & Mehler, J. (1998). Language discrimination by newborns: Toward an understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception and Performance 24(3), 756–766. Nespor, M., Shukla, M., van de Vijver, R., Avesani, C., Schraudolf, H., & Donati, C. (2008). Different phrasal prominence realization in VO and OV languages. Lingue e Linguaggio 7(2), 1–28. Norman-Haignere, S., Kanwisher, N. G., & McDermott, J. H. (2015). Distinct cortical pathways for music and speech revealed by hypothesis-free voxel decomposition. Neuron 88(6), 1281–1296. Nuñez, S. C., Dapretto, M., Katzir, T., Starr, A., Bramen, J., Kan, E., … Sowell, E. R. (2011). fMRI of syntactic processing in typically developing children: Structural correlates in the inferior frontal gyrus. Developmental Cognitive Neuroscience 1(3), 313–323.
Olsho, L. W., Schoon, C., Sakai, R., Turpin, R., & Sperduto, V. (1982). Auditory frequency discrimination in infancy. Developmental Psychology 18(5), 721–726. Overy, K., Nicolson, R. I., Fawcett, A. J., & Clarke, E. F. (2003). Dyslexia and music: Measuring musical timing skills. Dyslexia 9(1), 18–36. Pannekamp, A., Weber, C., & Friederici, A. D. (2006). Prosodic processing at the sentence level in infants. NeuroReport 17(6), 675–678. Patel, A. D. (2012). Language, music, and the brain: A resource-sharing framework. In P. Rebuschat, M. Rohrmeier, J. A. Hawkins, & I. Cross (Eds.), Language and music as cognitive systems (pp. 204–223). Oxford: Oxford University Press. Perani, D. (2012). Functional and structural connectivity for language and music processing at birth. Rendiconti Lincei 23(3), 305–314. Perani, D., Dehaene, S., Grassi, F., Cohen, L., Cappa, S. F., Dupoux, E., … Mehler, J. (1996). Brain processing of native and foreign languages. NeuroReport 7(15–17), 2439–2444. Perani, D., Paulesu, E., Galles, N. S., Dupoux, E., Dehaene, S., Bettinardi, V., & Mehler, J. (1998). The bilingual brain: Proficiency and age of acquisition of the second language. Brain 121(10), 1841–1852. Perani, D., Saccuman, M. C., Scifo, P., Anwander, A., Spada, D., Baldoli, C., … Friederici, A. D. (2011). Neural language networks at birth. Proceedings of the National Academy of Sciences 108(38), 16056–16061. Perani, D., Saccumann, M. C., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., … Koelsch, S. (2010). Functional specializations for music processing in the human newborn brain. Proceedings of the National Academy of Sciences 107(10), 4758–4763. Peretz, I. (2001). The biological foundations of music. In E. Dupoux (Ed.), Language, Brain, and Cognitive Development: Essays in Honor of Jacques Mehler (pp. 435–466). Cambridge, MA: MIT Press. Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience 6(7), 688– 691. Peretz, I., & Hyde, K. L. (2003). 
What is specific to music processing? Insights from congenital amusia. Trends in Cognitive Sciences 7(8), 362–367. Perruchet, P., & Pacton, S. (2006). Implicit learning and statistical learning: One phenomenon, two approaches. Trends in Cognitive Sciences 10(5), 233–238. Petitto, L. A., Holowka, S., Sergio, L. E., & Ostry, D. (2001). Language rhythms in baby hand movements. Nature 413(6851), 35–36. Piazza, E. A., Iordan, M. C., & Lew-Williams, C. (2017). Mothers consistently alter their unique vocal fingerprints when communicating with infants. Current Biology 27(20), 3162–3167. Polka, L., & Werker, J. F. (1994). Developmental changes in perception of nonnative vowel contrasts. Journal of Experimental Psychology: Human Perception and Performance 20(2), 421–435. Prochnow, A., Erlandsson, S., Hesse, V., & Wermke, K. (2017). Does a “musical” mother tongue influence cry melodies? A comparative study of Swedish and German newborns. Musicae Scientiae (October). doi:1029864917733035 Przybylski, L., Bedoin, N., Krifi-Papoz, S., Herbillon, V., Roch, D., Léculier, L., & Tillmann, B. (2013). Rhythmic auditory stimulation influences syntactic processing in children with developmental language disorders. Neuropsychology 27(1), 121–131. Radvansky, G. A., & Potter, J. K. (2000). Source cuing: Memory for melodies. Memory and Cognition 28(5), 693–699. Ramus, F., & Mehler, J. (1999). Language identification with suprasegmental cues: Study based on speech resynthesis. Journal of the Acoustical Society of America 105, 512–521.
Ravignani, A., Delgado, T., & Kirby, S. (2016). Musical evolution in the lab exhibits rhythmic universals. Nature Human Behaviour. doi:10.1038/s41562-016-0007 Reich, S. (1989). Different Trains/Electric Counterpoint. Nonesuch 979176-2. Rivera-Gaxiola, M., Klarman, L., Garcia-Sierra, A., & Kuhl, P. K. (2005). Neural patterns to speech and vocabulary growth in American infants. NeuroReport 16, 495–498. Robinson, K., & Patterson, R. D. (1995). The duration required to identify the instrument, the octave, or the pitch chroma of a musical note. Music Perception 13, 1–14. Rousseau, B. (2016). Which language uses the most sounds? Click 5 times for the answer. The New York Times, November 25. Retrieved from https://www.nytimes.com/2016/11/25/world/what-inthe-world/click-languages-taa-xoon-xoo-botswana.html Saffran, J., Aslin, R., & Newport, E. (1996). Statistical learning by eight-month-old infants. Science 274(5294), 926–928. Saint-Georges, C., Chetouani, M., Cassel, R., Apicella, F., Mahdhaoui, A., Muratori, F., & Cohen, D. (2013). Motherese in interaction: At the cross-road of emotion and cognition? (A systematic review). PLoS ONE, 8(10). Retrieved from https://doi.org/10.1371/journal.pone.0078103 Sammler, D., Baird, A., Valabrègue, R., Clément, S., Dupont, S., Belin, P., & Samson, S. (2010). The relationship of lyrics and tunes in the processing of unfamiliar songs: A functional magnetic resonance adaptation study. Journal of Neuroscience 30(10), 3572–3578. Sansavini, A., Bertoncini, J., & Giovanelli, G. (1997). Newborns discriminate the rhythm of multisyllabic stressed words. Developmental Psychology 33(1), 3–11. Schellenberg, E. G., & Trainor, L. J. (1996). Sensory consonance and the perceptual similarity of complex-tone harmonic intervals: Tests of adult and infant listeners. Journal of the Acoustical Society of America 100(5), 3321–3328. Scott, C. (2004). Syntactic ability in children and adolescents with language and learning disabilities. In R. A. 
Berman (Ed.), Language Development Across Childhood and Adolescence (pp. 111–134). Amsterdam: John Benjamins. Shepard, R. N. (1980). Multidimensional scaling, tree-fitting, and clustering. Science 210(4468), 390–398. Shi, R., Werker, J. F., & Morgan, J. L. (1999). Newborn infants’ sensitivity to perceptual cues to lexical and grammatical words. Cognition 72(2), B11–B21. Shultz, S., Vouloumanos, A., Bennett, R. H., & Pelphrey, K. (2014). Neural specialization for speech in the first months of life. Developmental Science 17(5), 766–774. Smith, K., Kirby, S., & Brighton, H. (2003). Iterated learning: A framework for the emergence of language. Artificial Life 9, 371–386. Speer, J. R., & Meeks, P. U. (1985). School children’s perception of pitch in music. Psychomusicology 5, 49–56. Spiller, H. (2004). The traditional sounds of Indonesia. Santa Barbara, CA: ABC-CLIO. Stern, D. N., Spieker, S., & MacKain, K. (1982). Intonation contours as signals in maternal speech to prelinguistic infants. Developmental Psychology 18(5), 727–735. Thompson, W. F., Marin, M. M., & Stewart, L. (2012). Reduced sensitivity to emotional prosody in congenital amusia rekindles the musical protolanguage hypothesis. Proceedings of the National Academy of Sciences 109(46), 19027–19032. Tillmann, B., Albouy, P., & Caclin, A. (2015). Congenital amusias. In G. G. Celesia & G. S. Hickok (Eds.), The human auditory system: Fundamental organization and clinical disorder (3rd ed.; pp. 589–605). Amsterdam: Elsevier. Trainor, L. J., Austin, C. M., & Desjardins, N. (2000). Is infant-directed speech a result of the vocal expression of emotion? Psychological Science 11(3), 188–195.
Trainor, L. J., & Corrigall, K. A. (2010). Music acquisition and effects of musical experience. In M. Riess Jones, R. R. Fay, & A. N. Popper (Eds.), Music Perception (Vol. 36; pp. 89–127). New York: Springer. Trainor, L. J., & Trehub, S. E. (1994). Key membership and implied harmony in Western tonal music: Developmental perspectives. Attention, Perception, and Psychophysics 56(2), 125–132. Trainor, L. J., Wu, L., & Tsang, C. D. (2004). Long-term memory for music: Infants remember tempo and timbre. Developmental Science 7(3), 289–296. Trehub, S., Cohen, A., Thorpe, L., & Morrongiello, B. (1986). Development of the perception of musical relations: Semitone and diatonic structure. Journal of Experimental Psychology: Human Perception and Performance 12, 295–301. Trehub, S. E., Endman, M. W., & Thorpe, L. A. (1990). Infants’ perception of timbre: Classification of complex tones by spectral structure. Journal of Experimental Child Psychology 49(2), 300–313. Trehub, S. E., & Thorpe, L. A. (1989). Infants’ perception of rhythm: Categorization of auditory sequences by temporal structure. Canadian Journal of Psychology/Revue canadienne de psychologie 43(2), 217–229. Vouloumanos, A., & Werker, J. F. (2007). Listening to language at birth: Evidence for a bias for speech in neonates. Developmental Science 10(2), 159–164. Wang, X., & Peng, G. (2014). Phonological processing in Mandarin speakers with congenital amusia. Journal of the Acoustical Society of America 136(6), 3360–3370. Weiss, A. H., Granot, R. Y., & Ahissar, M. (2014). The enigma of dyslexic musicians. Neuropsychologia 54, 28–40. Welch, G. F. (2002). Early childhood musical development. In L. Bresler & C. Thompson (Eds.), The arts in children’s lives: Context, culture and curriculum (pp. 113–128). Dordrecht: Kluwer. Welch, G. F. (2009). Evidence of the development of vocal pitch matching ability in children. Japanese Journal of Music Education Research 21, 1–13. Werker, J. F., & Tees, R. (1984). 
Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development 7, 49–63. Wermke, K., Leising, D., & Stellzig-Eisenhauer, A. (2007). Relation of melody complexity in infants’ cries to language outcome in the second year of life: A longitudinal study. Clinical Linguistics and Phonetics 21(11–12), 961–973. Wermke, K., & Mende, W. (2009). Musical elements in human infants’ cries: In the beginning is the melody. Musicae Scientiae 13(2), 151–175. Werner, L. A., & Marean, G. C. (1996). Human auditory development. Madison, WI: Brown Benchmark. Wieland, E. A., McAuley, J. D., Dilley, L. C., & Chang, S. E. (2015). Evidence for a rhythm perception deficit in children who stutter. Brain & Language 144, 26–34. Winkler, I., Háden, G. P., Ladinig, O., Sziller, I., & Honing, H. (2009). Newborn infants detect the beat in music. Proceedings of the National Academy of Sciences 106(7), 2468–2471. Zatorre, R. J., & Belin, P. (2001). Spectral and temporal processing in human auditory cortex. Cerebral Cortex 11(10), 946–953. Zhang, Y., Kuhl, P. K., Imada, T., Iverson, P., Pruitt, J., Stevens, E. B., & Nemoto, I. (2009). Neural signatures of phonetic learning in adulthood: A magnetoencephalography study. NeuroImage 46(1), 226–240. Zuk, J., Bishop-Liebler, P., Ozernov-Palchik, O., Moore, E., Overy, K., Welch, G., & Gaab, N. (2017). Revisiting the “enigma” of musicians with dyslexia: Auditory sequencing and speech abilities. Journal of Experimental Psychology: General 146(4), 495–511.
CHAPTER 24
RHYTHM, METER, AND TIMING: THE HEARTBEAT OF MUSICAL DEVELOPMENT
LAUREL J. TRAINOR AND SUSAN MARSH-ROLLO
Introduction
Everything we perceive, think, feel, and do unfolds over time. For example, the notes of music and the syllables of speech only make sense in the context of the preceding and forthcoming sounds in the sequences in which they are embedded. Many motor acts are also rhythmic, from heartbeats and breathing to locomotion, articulating speech, and playing a musical instrument. From regularities in the rhythmic surface (i.e., the pattern of temporal intervals defined by sound onsets) of music, adults can extract a beat, a quasi-steady (quasi-isochronous) internally constructed tempo, to which they can entrain motor movements such as tapping, clapping, and dancing (e.g., Gjerdingen, 1989; London, 2004; Repp, 2005; Repp & Su, 2013). Beat extraction is complex in that although it often matches periodicities in the input rhythm, a beat can be perceived even when half or more than half of the event onsets in the rhythmic surface are off the perceived beat (syncopated) (Brochard, Abecasis, Potter, Ragot, & Drake, 2003; Tal et al., 2017). From a piece of music, adults can typically
extract different beat tempos that are hierarchically organized, forming a metrical hierarchy (Essens & Povel, 1985; Hannon, Nave-Blodgett, & Nave, 2018; Jones & Boltz, 1989; Lerdahl & Jackendoff, 1983). Typically, every second or third beat from one metrical beat level is maintained at the next level of the hierarchy, although other patterns are also possible, such as alternating groups of two and three beats, which form groups of five at the next level of the hierarchy. Rhythms and their derived underlying metrical structures are powerful for several reasons. First, they provide an organizational structure on which incoming sound events can be hung. Second, they provide a means for chunking or phrasing the incoming information into meaningful units (Dowling, 1973). Third, because of their underlying regularity, rhythms enable prediction of when important information is expected to occur next, so that attention can be deployed at the most important time points for optimizing perceptual processing (e.g., Chang, Bosnyak, & Trainor, under review b; Ding et al., 2017; Fujioka, Trainor, Large, & Ross, 2012; Haegens & Zion-Golumbic, 2018; Jones, Moynihan, MacKenzie, & Puente, 2002; Large & Jones, 1999; Nobre, Correa, & Coull, 2007; Schroeder & Lakatos, 2009; van Ede, Niklaus, & Nobre, 2017). The ubiquity of rhythms in biological systems, and evidence that hearing an auditory rhythm involves auditory–motor connections (Fujioka et al., 2012; Grahn & Brett, 2007; Merchant, Grahn, Trainor, Rohrmeier, & Fitch, 2015; Patel & Iversen, 2014; Trainor & Zatorre, 2015; Zatorre, Chen, & Penhune, 2007), makes rhythmic processes central to many aspects of development. It should be noted that rhythms and metrical hierarchies do not depend completely on isochronous timing. Indeed, deviations from completely regular timing are also important and are often used expressively in music (James, Michel, Britz, Vuilleumier, & Hauert, 2012; Rankin, Large, & Fink, 2009; Repp, 1992). 
For example, phrases typically speed up in the middle and slow down at the end (Palmer, 1989; Repp, 1992; Todd, 1985), and prolonging particular notes can give them emphasis and increase expectations for the next note (Huron, 2006; Meyer, 1956). We present the position here that timing, meter, and rhythm are the most fundamental aspects of music, on which other aspects of music, such as pitch structures, dynamics, and phrasing, are built. This chapter explores the development of musical timing, meter, and rhythm, without which musical perception and performance would not be possible. This chapter is not
intended as a complete review of the literature; rather it examines some of the major research findings in perceptual and sensorimotor development in individuals and across social contexts.
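The metrical hierarchy described in this introduction, in which every second or third beat of one level is kept at the next level, can be made concrete with a small sketch. The function name `metrical_levels`, its parameters, and the example beat counts are hypothetical and chosen only for illustration; they do not come from any of the cited studies.

```python
def metrical_levels(n_beats, groupings):
    """Return the beat indices present at each level of a metrical hierarchy.

    n_beats:   number of beats at the fastest (basic) level.
    groupings: one repeating pattern of group sizes per higher level,
               e.g. [2] keeps every second beat, [2, 3] alternates
               groups of two and three beats.
    """
    levels = [list(range(n_beats))]
    for pattern in groupings:
        below = levels[-1]
        kept, pos, i = [], 0, 0
        while pos < len(below):
            kept.append(below[pos])            # this beat survives to the next level
            pos += pattern[i % len(pattern)]   # skip ahead by the current group size
            i += 1
        levels.append(kept)
    return levels


# Simple duple meter: every second beat is kept at each higher level.
duple = metrical_levels(12, [[2], [2]])

# Alternating groups of two and three beats, which form groups of five
# at the level above.
mixed = metrical_levels(10, [[2, 3], [2]])

for name, levels in (("duple", duple), ("mixed", mixed)):
    for depth, beats in enumerate(levels):
        print(name, "level", depth, beats)
```

In the mixed example, the middle level is unevenly spaced (beats 0, 2, 5, 7), while the level above it is evenly spaced again at every fifth basic beat.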
Early Perception of Beat, Meter, and Rhythm
From at least as young as two months of age, infants detect tempo changes (Baruch & Drake, 1997) and can discriminate different rhythm patterns composed of the same interval durations but in different orders, such as 100–600–300 ms versus 600–300–100 ms (Chang & Trehub, 1977; Demany, McKenzie, & Vurpillot, 1977; Lewkowicz, 2003). Impressively, they can make such discriminations across changes in the tempo and pitch level of the comparison patterns (Trehub & Thorpe, 1989). Young infants also show sensitivity to phrase and grouping structures in music. For example, they are more sensitive to small timing perturbations inserted in the middle of a phrase than at phrase boundaries, where an elongation is likely to occur in any case (Jusczyk & Krumhansl, 1993; Thorpe & Trehub, 1989; Trainor & Adams, 2000). However, the temporal cues infants use to determine grouping boundaries of short sequences may be influenced by the language they are exposed to in their environment. Trainor and Adams (2000) found that English-learning infants marked the ends of short perceptual groups with longer tones, whereas Yoshida et al. (2010) found that Japanese-learning infants marked the beginnings of perceptual groups with longer tones, consistent with linguistic accent structures in the respective languages. Infants also show precocious sensitivity to beat and metrical structure. For example, newborns exposed to a rhythmic pattern with a 4/4 time structure show a larger event-related potential (ERP) response in electroencephalographic (EEG) recordings to omissions on strong compared to weak beats (Winkler, Háden, Ladinig, Sziller, & Honing, 2009), suggesting sensitivity to metrical structure. However, this study needs to be replicated, as whether a beat was strong or weak was confounded with the number of sounds that were omitted. By 7 months of age, however, there is clear evidence of metrical processing. After
habituation to repeated trials containing rhythms in either duple or triple meter, infants showed renewed interest in a rhythm presented with the other (novel) meter (Hannon & Johnson, 2005). In another study, infants who were bounced on either every second or every third beat of a repeating six-beat ambiguous rhythm pattern subsequently preferred to listen to a version of the rhythm pattern with accents added on either every second or every third beat that matched the meter set up during the bouncing experience, indicating both discrimination between meters and early involvement of the motor system in the perception of meter (Phillips-Silver & Trainor, 2005). In adults, listening to a rhythm pattern entrains neural activity at the perceived beat and meter frequencies (i.e., tempos) (Fujioka et al., 2012; Fujioka, Ross, & Trainor, 2015; Nozaradan, 2014; Nozaradan, Peretz, & Mouraux, 2012; Tal et al., 2017). Evidence from EEG studies also indicates that at least as young as 7 months, neural oscillations in auditory cortex entrain to both beat and metrical levels of rhythm patterns (Cirelli, Spinelli, Nozaradan, & Trainor, 2016). Seven-month-old and 15-month-old infants listened to a repeating six-beat rhythm pattern that could be interpreted as either in duple or triple meter, as in Phillips-Silver and Trainor (2005). EEG recordings were subjected to Fourier analysis following the frequency-tagging methods of Nozaradan and colleagues (Nozaradan, Peretz, Missal, & Mouraux, 2011). Peaks in the frequency spectrum were found that corresponded to the beat frequency as well as to both duple and triple metrical interpretations, indicating neural entrainment at both beat and meter frequencies in both age groups. Interestingly, at 7 months, the amplitude at the duple meter frequency was enhanced in infants engaged in infant–parent music classes compared to those not enrolled in music classes. 
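The logic of the Fourier (frequency-tagging) analysis mentioned above can be illustrated with a toy example: if a signal contains periodic activity at the beat rate and at the duple and triple meter rates, its spectrum shows peaks at exactly those frequencies and nowhere else. The sketch below is an assumption-laden illustration, not the analysis pipeline of the cited studies. The sampling rate, duration, frequencies (3 Hz beat, 1.5 Hz duple, 1 Hz triple), and amplitudes are invented values, and the "recording" is a noise-free synthetic signal.

```python
import cmath
import math


def amplitude_at(signal, sample_rate, freq_hz):
    """Amplitude of one frequency component, computed as a single-bin DFT."""
    n = len(signal)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
        for i, x in enumerate(signal)
    )
    return 2 * abs(acc) / n


# Hypothetical recording parameters (illustrative only).
SAMPLE_RATE = 50                       # samples per second
DURATION = 12                          # seconds
BEAT, DUPLE, TRIPLE = 3.0, 1.5, 1.0    # Hz; duple = beat / 2, triple = beat / 3

times = [i / SAMPLE_RATE for i in range(SAMPLE_RATE * DURATION)]

# Synthetic "neural" signal: a strong beat component plus weaker meter components.
signal = [
    1.0 * math.sin(2 * math.pi * BEAT * t)
    + 0.5 * math.sin(2 * math.pi * DUPLE * t)
    + 0.3 * math.sin(2 * math.pi * TRIPLE * t)
    for t in times
]

# Amplitude at the tagged frequencies and at one unrelated control frequency (2 Hz).
spectrum = {f: amplitude_at(signal, SAMPLE_RATE, f) for f in (TRIPLE, DUPLE, 2.0, BEAT)}

for freq, amp in sorted(spectrum.items()):
    print(f"{freq:.1f} Hz: amplitude {amp:.3f}")
```

The 12-second duration is chosen so that all frequencies of interest complete a whole number of cycles, which keeps each component confined to its own frequency bin. Real EEG data would additionally require noise handling, such as comparing each peak with neighboring bins.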
At 15 months, beat and both meter frequencies were all enhanced in infants whose parents were musically trained compared to those whose parents were not musically trained. Thus, early in development neural circuits are sensitive to the temporal structure of incoming auditory rhythmic patterns. Just as tonality, pitch, and harmonic structures vary across the musical systems used in different cultures, so do metrical structures (Hannon et al., 2018; Hannon & Trainor, 2007; Trainor & Hannon, 2013). In Western music, durations most commonly stand in simple 2:1 ratios (e.g., a march meter), with 3:1 as the next most common ratio (e.g., a waltz meter). However, in many parts of the world (e.g., Africa, the Balkans, South Asia, South America), more complex metrical patterns that create a non-
isochronous beat at one or more levels of the metrical hierarchy are common (Hannon, Soley, & Ullal, 2012; London, 2004). For example, an isochronous beat at a basic level of the hierarchy might be grouped into alternating groups of three and two beats at the next level of the hierarchy, creating a five-beat pattern. The alternating groups of two and three beats create a more complex duration ratio of 3:2. Western adults without exposure to music with such metrical patterns are much better at discriminating, remembering, reproducing, and tapping to rhythm patterns with simple compared to complex meters (e.g., Essens, 1986; Essens & Povel, 1985; Fraisse, 1982; Hannon & Trehub, 2005a; Repp, London, & Keller, 2005; Snyder, Hannon, Large, & Christiansen, 2006). However, adults who grew up in cultures employing complex meter do not show processing differences between familiar rhythms with simple and complex meters (e.g., Hannon & Trehub, 2005a; Hannon, Soley, & Ullal, 2012). Just as infants learn the particular language(s) in their environment, becoming more sensitive to the phonemic structure of that language, and less sensitive to alternative phonemic structures by their first birthday (Kuhl et al., 2006; Werker & Tees, 2005), a similar process of perceptual narrowing occurs in music acquisition, such that infants become specialized at processing both the tonal (Gerry, Unrau, & Trainor, 2012; Lynch & Eilers, 1992; Lynch, Eilers, Oller, & Urbano, 1990; Trainor, Marie, Gerry, Whiskin, & Unrau, 2012; Trainor & Trehub, 1992, 1994) and metrical structures (Gerry, Faux, & Trainor, 2010; Hannon & Trehub, 2005a, 2005b) in the music they experience in their environment (Hannon & Trainor, 2007; Trainor & Corrigall, 2010; Trainor & Hannon, 2013; Trainor & Unrau, 2012). With respect to metrical processing, at 4 to 6 months of age, Western infants notice if an extra beat is added to a 7/4 meter as well as if a beat is dropped from an 8/4 meter (Hannon & Trehub, 2005a). 
However, performance on the 7/4 meter declines between 7 and 12 months, such that 12-month-old Western infants, like Western adults, perform very poorly on this task (Hannon & Trehub, 2005b). That these declines for non-native meters are driven by experience is reinforced by findings that listening experience can speed up or slow down (or even reverse) the perceptual narrowing. As for slowing down or reversing perceptual narrowing, daily listening experience with non-Western, non-isochronous meters restores the lost sensitivity to those meters in 12-month-olds (Hannon & Trehub, 2005a). Interestingly, there appears to
be a window of sensitivity for reversing perceptual narrowing for meter, as Western 5- to 7-year-old children show some, but not full, reinstatement after a similar listening experience with non-isochronous meters, whereas adults show no evidence of reinstatement after such experience (Hannon, Vandenbosch der Nederlanden, & Tichko, 2012). As for speeding up perceptual narrowing, Gerry et al. (2010) found that 7-month-old infants enrolled in Kindermusik classes showed a listening preference for a rhythm with accents on every second beat compared to the same rhythm with accents on every third beat, the former meter being more common in Western music, whereas infants not enrolled in music classes did not show this preference. In general, a preference for native meters may be evident prior to perceptual narrowing. Soley and Hannon (2010) found that a preference for isochronous over non-isochronous meters increases between 4 and 8 months in Western infants whereas no listening preferences are evident during this age period in Turkish infants. The bias for culture-specific meters continues into childhood. Einarson and Trainor (2015, 2016) developed a child-friendly version of the Beat Alignment Task (BAT) (Iversen & Patel, 2008) that included music with both simple and complex meters (cBAT). In this task, children watch pairs of short video excerpts of puppets drumming to musical excerpts. One puppet drums on the beat and the other puppet’s drumming is either at the wrong tempo or misaligned in phase with the music. The children decide which puppet is the best drummer for a band. Western 5-year-old children were at chance levels on music with complex meters, but performed significantly better (and above chance levels) on music with simple meters.
Together, the infant and child studies indicate that perceptual narrowing for the metrical structures common in the music of one’s culture develops early and is maintained in childhood, raising the interesting question of whether best pedagogical practice might be to expose infants and young children to complex meters if the goal is to provide them with the perceptual tools to understand the rhythms of music from around the world.
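The distinction between isochronous and non-isochronous meters discussed in this section can be made concrete with a small sketch. It assumes a meter can be summarized as a list of group sizes at one level of the metrical hierarchy (for example, 3 + 2 for the five-beat pattern described earlier); the function names and example groupings are illustrative choices, not taken from the cited studies.

```python
from fractions import Fraction


def is_isochronous(grouping):
    """True if every higher-level beat spans the same number of basic units."""
    return len(set(grouping)) == 1


def longest_to_shortest_ratio(grouping):
    """Ratio between the longest and shortest higher-level beat durations."""
    return Fraction(max(grouping), min(grouping))


# Hypothetical groupings of basic-level beats into higher-level beats.
simple_duple = [2, 2, 2, 2]  # eight basic units grouped in twos
complex_five = [3, 2]        # five basic units grouped as 3 + 2

for name, grouping in [("simple", simple_duple), ("complex", complex_five)]:
    print(name, sum(grouping), is_isochronous(grouping),
          longest_to_shortest_ratio(grouping))
```

In this representation the 3 + 2 grouping yields unequal higher-level beats whose durations stand in a 3:2 ratio, whereas the duple grouping yields equal beats.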
T
R
I S
-D
For most infants, their first experience of music is likely hearing their mother sing, and a diary study of North American mothers, where opportunities to listen to recorded music abound, indicates that most mothers still sing to their infants many times during the day, such as when bathing them, playing, feeding, during diaper changes, in the car, and at sleep time (Trehub et al., 1997). Infants often experience their parents’ singing while being held and rocked or walked rhythmically, or while feeling their parent tap their back or touch other body parts rhythmically during the song, so that from early ages, infants experience musical rhythms in a multisensory context involving hearing, movement, and vision. Singing to infants appears to be a spontaneous intuitive response to the presence of an infant, and universal across human cultures. Furthermore, across cultures, Western adults were able to discern lullabies that were intended for infants from other songs matched in tempo and general style (Trehub, Unyk, & Trainor, 1993), suggesting that infant-directed singing might have been an evolutionary adaptation that helped infants to survive. The one song category that Western adults found difficult to distinguish from lullabies was love songs (Trehub & Trainor, 1998), suggesting that lullabies may express and communicate the deep emotional bonds between parents and their infants. Music appears to be particularly effective at controlling infants’ states—when left alone, hearing their mothers’ infant-directed singing was found to keep infants happier for considerably longer than hearing their mothers’ infant-directed speech (Corbeil, Trehub, & Peretz, 2016), and cortisol levels were found to decrease in infants when their mothers sang to them (Shenfield, Trehub, & Nakata, 2003). 
Just as adults use a different speaking style when talking to infants compared to adults, termed infant-directed or musical speech (Fernald, 1991; Papoušek, Papoušek, & Symmes, 1991), they sing differently to infants than they sing in other circumstances (Trehub & Trainor, 1998). Trainor (1996) recorded mothers singing the same song when their infant was present and when their infant was absent and found that adults were highly accurate at identifying the infant-directed versions. Furthermore, using a preferential looking paradigm, she found that infants preferred to listen to the infant-directed versions. In addition to being sung at a higher pitch and in a more loving tone of voice, in comparison to non-infant-directed singing, infant-directed singing also differs in timing and rhythmic features. It is generally slower in tempo and has exaggerated structural
features, such as enhanced phrase boundaries, rhythm, and grouping (Longhi, 2009; Trainor, Clark, Huntley, & Adams, 1997). For example, infant-directed singing contains longer pauses between phrases (Trainor et al., 1997). There is also evidence that mothers exaggerate the hierarchical beat structure of songs when singing to infants, using both acoustic accents and body movements to do so (Longhi, 2009). They particularly emphasize upbeats, which is interesting in that upbeats provide anticipatory information that a downbeat is expected to follow. And while infants are not yet able to synchronize movements precisely to the beats of music, infants in this study made more synchronous movements to beats at the beginnings and ends of phrases than in the middle, suggesting some understanding of the temporal structure of phrases in infant-directed singing with exaggerated rhythmic cues. There is some evidence that depressed mothers do not employ the full repertoire of infant-directed singing features, generally singing faster and with less expression than non-depressed mothers (de l’Etoile & Leider, 2011), and possibly compromising communication with their infants. Two basic categories of infant-directed singing have been identified: lullabies, where the intention is to help a fussy infant to fall asleep, and playsongs, where the intention is to rouse the infant, interact with them in play, and direct their attention to interesting people and things in the environment (Rock, Trainor, & Addison, 1999; Trainor, 1996). These two categories arise more from the style or manner in which the caregiver sings than the structural content of the music. Indeed, Rock et al. (1999) recorded mothers singing a song of their choice to their infant, once in a lullaby style and once in a playsong style. Adult raters were 100 percent accurate at identifying which were lullabies and which playsongs, indicating that these styles are highly distinct. 
Furthermore, they rated playsongs as sounding more rhythmic, clipped, and accented compared to lullabies, which were rated as sounding smoother. Importantly, infants show differential behaviors in response to lullaby and playsong renditions of the same song (Rock et al., 1999) and prefer faster tempos for playsongs but not for lullabies (Conrad, Walsh, Allen, & Tsang, 2011). That such timing and rhythmic differences are likely universal across human cultures and musical systems suggests that the perceptual, emotional, and social consequences of these temporal features may have evolutionary origins.
Singing to infants is a social interaction that requires temporal coordination between the caregiver and the infant. Such temporal coordination during the first months after birth appears to promote communication, the development of successful social interactions, and emotion regulation (Ilari, 2016; Malloch & Trevarthen, 2009). These social consequences of rhythmic interactions are discussed later in the chapter.
R O
, P , R
, N D
The regularity of rhythms enables prediction of when the next beat is likely to occur (Large & Jones, 1999; Trainor & Zatorre, 2015), which can aid in preparing for incoming information and focusing attention at information-rich points in time. Predictive timing is critical for the perception of stimuli such as speech and music that unfold rapidly over time and are fleeting in that once each note or phoneme ends, the next begins, and it is not possible to hear the input again. Indeed, there is considerable evidence that the adult brain is continually predicting the future and comparing its predictions with what actually occurs (Fujioka et al., 2012, 2015; Herrmann, Henry, Haegens, & Obleser, 2016; Morillon & Schroeder, 2015). In the case of incorrect predictions, an error signal is generated which can engage attention and lead to additional processing and learning (Arnal & Giraud, 2012; Chang, Bosnyak, & Trainor, 2018; Ding et al., 2017; Haegens & Zion Golumbic, 2018; Nobre et al., 2007; Nobre & van Ede, 2018; Schroeder & Lakatos, 2009; Schröger, Marzecová, & SanMiguel, 2015). In adults, both behavioral and neural evidence indicates that the perception of sounds at beat onsets presented in rhythmic contexts is enhanced (Arnal & Giraud, 2012; Chang et al., under review b; Haegens & Zion Golumbic, 2018; Henry & Obleser, 2012; Herrmann et al., 2016; Jones et al., 2002; Morillon, Schroeder, Wyart, & Arnal, 2016; Nobre & van Ede, 2018). Predictive processes are evident very early in infancy in that occasional unexpected changes (deviants) in isochronous sound sequences lead to ERP mismatch responses (MMRs) in EEG recordings (Basirat, Dehaene, & Dehaene-Lambertz, 2014; Háden, Németh, Török, & Winkler, 2015; He, Hotson, &
Trainor, 2007; Trainor, 2012; Trainor & He, 2013; Trainor & Zatorre, 2015). The neural mechanisms that underlie predictive timing are beginning to be understood in the adult brain in terms of neural oscillations. Specifically, low frequency neural oscillations (delta band, ~1–3 Hz) phase align with the onsets of beats in an auditory rhythmic stimulus such that predictive timing is enhanced (Arnal, Poeppel, & Giraud, 2015; Bauer, Bleichner, Jaeger, Thorne, & Debener, 2018; Calderone, Lakatos, Butler, & Castellanos, 2014; Henry, Herrmann, & Obleser, 2014; Henry & Obleser, 2012; Schroeder & Lakatos, 2009; Stefanics et al., 2010). The power in higher frequency oscillations (beta band, ~20 Hz) is also modulated by auditory rhythmic stimuli such that beta power decreases after a beat onset and rebounds so as to reach maximum amplitude at the expected time of the next beat, dependent on the tempo of the rhythmic input (Fujioka et al., 2012, 2015). This rebound time appears to be a neural signature of timing prediction in the brain, and beta oscillations are proposed to reflect attentional processes leading to enhanced perception at particular time points (Arnal & Giraud, 2012; Chang et al., 2018, under review b; Iversen, Repp, & Patel, 2009; Snyder & Large, 2005). Beta oscillations also appear to be associated with capture of attention (Chang et al., 2018). Very little research has examined the development of neural oscillations involved in predictive processes in rhythmic contexts. However, one study compared beta oscillations in 7-year-old children and adults in response to isochronous beat sequences at different tempos (Cirelli, Bosnyak, et al., 2014). Beta power entrainment to the tempo of the input was found, but the responses of children were noisier than those of adults, and were measurable only over a more narrow range of tempos than in adults. 
This suggests that the neural oscillatory responses underlying predictive timing follow a protracted developmental trajectory. Clearly more research is needed to understand the brain development underlying rhythmic predictive processes. While neural oscillation studies of predictive timing have focused on stimuli with constant tempos, in real music performances the tempo typically modulates continuously (James et al., 2012; Palmer, 1989; Rankin et al., 2009; Repp, 1992; Todd, 1985). These timing perturbations are not random, but interact with the structure and content of the music. In particular, expressive timing emphasizes phrase boundaries by lengthening
phrase-final notes or chords, and plays with temporal expectations by, for example, elongating notes or chords that embody harmonic tension, thus delaying their resolution (London, 2004; Repp, 2005; Repp & Su, 2013). Perceptual studies indicate that both musically untrained adults (Clarke & Krumhansl, 1990; Deliege, 1987; Palmer & Krumhansl, 1987; Peretz, 1989) and infants (Krumhansl & Jusczyk, 1990; Trainor & Adams, 2000) are sensitive to phrase boundaries. Even non-musicians produce phrase-final lengthening in their performances (Kragness & Trainor, 2016), suggesting that it is not a learned performance technique, but is based on intrinsic temporal expectations. Specifically, Kragness and Trainor (2016) used a self-paced tapping paradigm in which non-musician adults pressed a key to get the next chord in an unfamiliar sequence of chords. In their renditions, adults tended to speed up in the middle of phrases defined by typical Western cadences, and to slow down at the ends of phrases, even after metrical regularity and melodic contour were controlled. One possible explanation for phrase-final lengthening is that ends of phrases tend to be points of high entropy in that it is difficult to predict what will come next, whereas points in the middle of phrases tend to be of low entropy in that it is relatively easy to predict the next note or chord (Pearce, Müllensiefen, & Wiggins, 2010; Pearce & Wiggins, 2006). The uncertainty at phrase boundaries might require more processing time, leading to a natural slowing. Further evidence for an entropy explanation comes from a developmental study in which children as young as 3 years were found to dwell longer at phrase endings, although sophistication in the cues used to detect phrase boundaries increased between 3 and 7 years of age (Kragness & Trainor, 2018). 
That very young children produce phrase-final lengthening in their musical productions is consistent with the possibility that it is based on intrinsic processing properties of perception rather than reflecting learning of a particular musical performance style. These studies suggest, first, that the brain entrains to the beat and meter in auditory rhythms early in development, but that this entrainment and its relation to attention and error monitoring continues to develop for many years and, second, that although a steady beat is typically experienced when listening to music, timing perturbations in isochrony that follow the structure and content of the music are present early in childhood, suggesting that beat perception and the neural processes underlying it are not strictly isochronous, but involve an interaction between time and context.
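The entropy account of phrase-final lengthening can be illustrated with a short calculation. The sketch below computes Shannon entropy for two hypothetical probability distributions over the next chord: one peaked (as might hold in the middle of a phrase) and one flat (as might hold at a phrase boundary). The probabilities are invented for illustration and are not taken from the cited studies.

```python
import math


def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution over possible next events."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical distributions over four candidate continuations.
mid_phrase = [0.85, 0.05, 0.05, 0.05]  # one continuation is strongly expected
phrase_end = [0.25, 0.25, 0.25, 0.25]  # all continuations equally likely

print(round(shannon_entropy(mid_phrase), 3))
print(round(shannon_entropy(phrase_end), 3))
```

Under these assumed numbers the flat distribution carries more than twice the uncertainty of the peaked one, which is the sense in which phrase endings are described as points of high entropy.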
T
M
E
Adults are good at identifying basic emotions in music such as happiness, sadness, anger, fear, and tenderness (e.g., Balkwill & Thompson, 1999; Balkwill, Thompson, & Matsunaga, 2004; Fritz et al., 2009; Juslin & Laukka, 2003; Mohn, Argstatter, & Wilker, 2011), and non-musicians appear to perform as well as musicians at this (Bigand, Vieillard, Madurell, Marozeau, & Dacquet, 2005; Juslin, 1997). Tempo, rhythm, loudness, articulation, pitch, and tonality are among the cues used in music for emotional communication (Gabrielsson & Lindström, 2010; Juslin & Timmers, 2010). Although tonality cues to emotion vary considerably across cultures and musical systems, features such as tempo, loudness, and complexity appear to operate similarly, and adults use such cues to identify the emotions in music from an unfamiliar culture (Balkwill & Thompson, 1999; Balkwill et al., 2004). When musicians play the same piece of music in different ways to express different emotions, listeners use timing and intensity cues to identify those emotions (Behrens & Green, 1993; Juslin, 1997, 2000; Laukka & Gabrielsson, 2000). A number of studies show that children as young as 3 to 4 years of age can categorize music as expressing happiness or sadness (e.g., Adachi, Trehub, & Abe, 2004; Cunningham & Sterling, 1988; Dolgin & Adelson, 1990; Esposito & Serio, 2007; Gerardi & Gerken, 1995; Giomo, 1993; Gregory, Worrall, & Sarge, 1996; Kastner & Crowder, 1990; Kratus, 1993; Nawrot, 2003). A couple of studies show that even 5-month-old infants discriminate melodies expressing happiness from those expressing sadness (Flom, Gentile, & Pick, 2008; Flom & Pick, 2012). The ability to identify more complex emotions such as anger and fear develops later, with some competence emerging by 5 to 6 years of age (Cunningham & Sterling, 1988; Giomo, 1993; Kratus, 1993; Terwogt & van Grinsven, 1991). The particular cues used by children in the studies reviewed here are generally not known.
However, two studies specifically separated timing from pitch cues. Mote (2011) found that children as young as 4 years used tempo to distinguish happy from sad music. Dalla Bella and colleagues (Dalla Bella, Peretz, Rousseau, & Gosselin, 2001) varied both tempo and mode (major/minor) independently and found that 5-year-old children used tempo to distinguish happy from sad music, but it was not until 6 years of
age that children used mode. When singing to express emotions, children between 4 and 12 years of age increase their tempo, loudness, and pitch height to express happiness compared to sadness (Adachi & Trehub, 1998). Furthermore, these cues can be used by children from different cultures to decipher the intended emotion of other children’s singing (Adachi et al., 2004). It has been proposed that emotions can be conceptualized along two continuous dimensions, valence and arousal (Russell, 1980). In adults, high arousal emotions, including joy, excitement, fear, and anger, are typically associated with fast tempos, staccato articulation (disconnected notes), and high sound levels whereas low arousal emotions, including peacefulness, tenderness, sadness, and grief are typically associated with slow tempos, legato articulation (connected notes), and low sound levels (Gabrielsson & Juslin, 1996; Gagnon & Peretz, 2003; Ilie & Thompson, 2006; Juslin, 2000; Juslin & Timmers, 2010). One study examined how children use timing and sound intensity to convey emotions varying in valence and arousal, using a self-pacing method in which children pressed a key to get successive chords in musical pieces (Kragness, Baksh, Battcock, & Trainor, 2017). Children played each musical piece four times, expressing joy (high arousal, high valence), sadness (low arousal, low valence), peacefulness (low arousal, high valence), or anger (high arousal, low valence) on each rendition. With their key presses, children were able to control tempo (onset-to-onset duration), articulation (note duration relative to onset-to-onset), and loudness. By 5 years of age, children used faster tempos for high-arousal emotions (joy, anger) than low-arousal emotions (peacefulness, sadness). By 7 years of age, children also used articulation, playing shorter notes to express high-arousal than low-arousal emotions. 
In sum, children use tempo and articulation to express emotion from a fairly young age, even in the absence of musical performance experience. In adults, musical emotions are often conveyed or heightened by deviations from regular timing, termed expressive timing (James et al., 2012; Rankin et al., 2009; Repp, 1992). Meyer (1956) proposed that such deviations are one way to play with expectations and that musical emotions arise from general-purpose physiological responses to prediction errors (see also Huron, 2006; Trainor & Zatorre, 2015). Although expressive timing has not been studied developmentally to our knowledge, a couple of studies suggest that infants, unlike adults, prefer regularity over timing variability.
Nakata and Mitani (2005) found infants prefer to listen to the more regular of two rhythms. Trainor et al. (2012) found that infants overall had no preference for a version of Chopin’s Waltz in A-flat, op. 69, No. 1 played expressively by Dinu Lipatti compared to one that was computer generated with metronomic timing. Together these studies suggest that children use basic timing cues such as overall tempo for emotional processing in music, but that it takes some time for children to become sensitive to timing cues involving deviations from regularity, as are found in expressive timing.
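The timing cues discussed in this section, tempo as onset-to-onset duration and articulation as note duration relative to onset-to-onset duration, can be computed directly from key-press data. The following is a minimal sketch under assumed inputs: the press and release times (in seconds) are invented, and the function names are illustrative rather than drawn from any cited study.

```python
def inter_onset_intervals(onsets):
    """Durations between successive key presses (onset-to-onset)."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def tempo_bpm(onsets):
    """Tempo in events per minute, from the mean onset-to-onset duration."""
    iois = inter_onset_intervals(onsets)
    return 60.0 / (sum(iois) / len(iois))


def articulation(onsets, releases):
    """Mean sounded duration as a proportion of the onset-to-onset duration.

    Values near 1 are legato (connected); smaller values are more staccato.
    The final note is skipped because no onset follows it.
    """
    iois = inter_onset_intervals(onsets)
    ratios = [(rel - on) / ioi for on, rel, ioi in zip(onsets, releases, iois)]
    return sum(ratios) / len(ratios)


# Hypothetical renditions of the same four-chord passage.
joy_onsets, joy_releases = [0.0, 0.4, 0.8, 1.2], [0.2, 0.6, 1.0, 1.4]
sad_onsets, sad_releases = [0.0, 1.0, 2.0, 3.0], [0.9, 1.9, 2.9, 3.9]

print(tempo_bpm(joy_onsets), articulation(joy_onsets, joy_releases))
print(tempo_bpm(sad_onsets), articulation(sad_onsets, sad_releases))
```

In this invented example the high-arousal rendition is both faster and more detached than the low-arousal one, matching the direction of the tempo and articulation effects described above.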
D E
A S S
–M E M
It is often noted that across cultures, past and present, music serves social functions, including action coordination, communication, and social cohesion (Cirelli, 2018; Cirelli, Trehub, & Trainor, 2018; D’Ausilio, Novembre, Fadiga, & Keller, 2015; Ilari, 2016; Patel & Iversen, 2014; Trainor, 2015; Trainor & Cirelli, 2015). Indeed, music is present at virtually all important social occasions including weddings, funerals, religious rituals, parties, sporting events, and political rallies. A number of researchers have suggested that the main function of joint musical experiences among adults is the increased social cohesion that results from synchronous movement (e.g., Bispham, 2006; Brown & Volgsten, 2006; Fitch, 2006; Huron, 2001; McNeill, 1995; Merker, 2000). It has also been suggested that music similarly enhances the social relationships between infants and their caregivers, with the coordinated interaction that music engenders increasing attachment, bonding, emotional recognition, and self-regulation in early development (Cirelli, 2018; Cirelli, Trehub, & Trainor, 2018; Dissanayake, 2012; Ilari, 2016; Malloch & Trevarthen, 2009; Trainor & Cirelli, 2015). Many motor movements across species are rhythmic, including heartbeats, locomotion (e.g., walking, running, skipping, swimming, wing flapping), pulsating in fireflies, and sound productions from speech in humans to chirping in crickets (Ackermann, 2008; Bentley & Hoy, 1974;
Buck, 1935, 1937, 1988; Kelso, Saltzman, & Tuller, 1986; Partridge, 1982; Peelle & Davis, 2012; Weimerskirch, Martin, Clerquin, Alexandre, & Jiraskova, 2001). However, spontaneous synchronization of movements to an external auditory beat appears to be relatively rare among non-human species (Merchant & Honing, 2014; Merchant et al., 2015; Patel, Iversen, Bregman, & Schulz, 2009; Schachner, Brady, Pepperberg, & Hauser, 2009), but very common and seemingly effortless in most human adults (Iversen & Patel, 2008; Patel & Iversen, 2014; Repp, 2005; Repp & Su, 2013; Trainor, 2015). Despite the ease with which adults achieve auditory–motor rhythmic coupling, it appears to take a long time to develop (Cirelli, Trehub, & Trainor, 2018; Drake, 1993; Einarson & Trainor, 2013, 2015, 2016; Fitzpatrick, Schmidt, & Lockman, 1996; Luck & Toiviainen, 2006; Phillips-Silver & Trainor, 2005; Provasi & Bobin-Bègue, 2003; Trainor & Cirelli, 2015; Van Noorden & De Bruyn, 2009; Zentner & Eerola, 2010), so interactions in early development involving synchronous movement likely rely on the caregiver to achieve the synchrony. Zentner and Eerola (2010) analyzed the movements of a European sample of infants while they listened to music and found no evidence for precise synchronization with the tempo of the music. However, infants moved more in response to music than to speech, and moved differently to music with faster compared to slower tempos. Such early responses are likely influenced by early experiences; in a follow-up study, Ilari (2015) found increased spontaneous rhythmic movements to music in a Brazilian sample of infants compared to those in the European sample of Zentner and Eerola (2010). At least among children growing up in Western cultures, children younger than 4 years of age generally have difficulty entraining to a beat.
One study found that 2.5-year-old children only succeeded at tapping to a beat when the tempo was around their spontaneous tapping rate of about 400 ms onset-to-onset (Provasi & Bobin-Bègue, 2003). Another study reported that 3-year-olds performed poorly in general at clapping to a beat (Fitzpatrick et al., 1996). And although children younger than 4 years of age readily engage in whole body movements in response to music, their hopping, swaying, and circling are not generally entrained to the tempo of the music (Eerola, Luck, & Toiviainen, 2006). However, by 4 years of age, clear motoric entrainment to a beat emerges (Drake, Jones, & Baruch, 2000; Eerola et al., 2006; Endedijk et al., 2015; Fitzpatrick et al., 1996; McAuley,
Jones, Holub, Johnston, & Miller, 2006; Provasi & Bobin-Bègue, 2003), although school-aged children still perform worse than adults (Einarson & Trainor, 2013; Van Noorden & De Bruyn, 2009). Interestingly, evidence for auditory–motor entrainment can be seen at younger ages when the task is embedded in a social situation. Although it is not synchronization, coordination between mothers and their 3- to 9-month-old infants is evident, and correlates with self-regulation, future IQ, and development of empathy (Feldman, 2007). Furthermore, during infant-directed singing, infants’ head, body, hand, and leg movements coordinate most with the music at the beginnings and ends of phrases (Longhi, 2009). At somewhat older ages, precursors of entrainment can be seen in the spontaneous drumming of pairs of children in social situations; for instance, at ages 2 and 3 years, children will stop and start drumming when a partner does so, but only 4-year-olds appear to adapt the tempo of their drumming to that of their partner, even though all children showed tempo stability (i.e., the ability to produce relatively isochronous drumming sequences) (Endedijk et al., 2015). Interestingly, one study indicates that children as young as 2.5 years of age will entrain their drumming to that of an adult social partner (Kirschner & Tomasello, 2009). That this ability does not rely simply on visual cues is evident in that the children performed better when drumming with a human social partner than with a machine that hit the drum. Unlike when entraining to a predetermined stimulus as in most laboratory studies, in real musical interactions between people, all participants can adaptively adjust their timing in response to the other musicians (D’Ausilio et al., 2015; Keller, Novembre, & Hove, 2014; Nakata & Trainor, 2015).
Indeed, the information flow among members of a musical ensemble can be measured through correlational and directional causal analyses of movement or EEG (Chang, Livingstone, Bosnyak, & Trainor, 2017; Lindenberger, Li, Gruber, & Müller, 2009; Sänger, Müller, & Lindenberger, 2012). With such approaches, it is possible that socially adaptive musical interactions could be measured in young children. Cultural experience also has an effect. Parental reports suggest that children in Brazil engage to a greater extent in such social music making than do children from Germany, and children from Brazil show greater propensity to spontaneously synchronize their drumming with another person than children from Germany (Kirschner & Ilari, 2014).
Several studies indicate that when adults move in synchrony with each other, they subsequently cooperate more, like and trust each other more, remember more about each other, and engage in more altruistic acts (e.g., Anshel & Kipper, 1988; Hove & Risen, 2009; Launay, Dean, & Bailes, 2013; Macrae, Duffy, Miles, & Lawrence, 2008; Tarr, Launay, & Dunbar, 2014; Valdesolo & DeSteno, 2011; Valdesolo, Ouyang, & DeSteno, 2010; Wiltermuth & Heath, 2009; Woolhouse, Tidhar, Demorest, Morrison, & Campbell, 2010). Furthermore, synchronized drumming can increase activation in the caudate, a brain region linked to expectation and reward (Kokal, Engel, Kirschner, & Keysers, 2011). Music provides an ideal context for facilitating synchronous movement between people. Rhythmic regularity in music enables prediction of when the next beat is expected, and therefore advanced planning of the motor movements necessary to synchronize with the beat. When people hear the same music and synchronize their movements with the beat of that music, they necessarily become synchronized with each other. Although infants cannot precisely entrain their movements to an auditory beat, they often experience such synchronization when they are held and walked or rocked to singing and other music. The social effects of synchronization with others begin to appear around the end of the first year after birth, a time during which infants’ social understanding is undergoing rapid development (Dunfield, Kuhlmeier, O’Connell, & Kelley, 2011). For example, at 12 months, but not at 9 months, infants preferred a toy bear that rocked in sync with them over one that rocked out-of-sync with them (Tunçgenç, Cohen, & Fawcett, 2015). 
By 14 months, infants will engage in overt helping behaviors, for example, picking up and handing back a marker “accidentally” dropped by an experimenter engaged in drawing a picture, or picking up and handing back a clothespin “accidentally” dropped by an experimenter hanging clothes on a line (Warneken & Tomasello, 2006, 2007, 2009). At 14 months, infants are not yet able to move in sync to music, but they can be bounced in sync to music when held by an assistant in an infant carrier. Cirelli, Einarson, and Trainor (2014) had an assistant bounce infants to “Twist and Shout” by the Beatles while the infant faced forward in the carrier across from the experimenter, who either bounced in sync with them (i.e., at the same tempo) or out of sync (at a different tempo), by having the experimenter bounce according to a click track delivered to her over headphones. They found that after less than 3 minutes
of such bouncing, infants were more likely to help the experimenter if they experienced synchronous compared to asynchronous bouncing in a series of helping tasks as just described. Furthermore, the increased helpfulness was targeted at the person that the infant experienced the bouncing with— infants showed no increased helpfulness toward a neutral experimenter who was present during the bouncing episode, but did not move to the music (Cirelli, Wan, & Trainor, 2014). However, if infants were shown a skit that either indicated that a second experimenter was a “friend” of the bouncer, or was simply an “acquaintance,” infants who bounced in sync with the experimenter transferred their helpfulness to the friend but not to the acquaintance. This suggests that infants use movement synchrony as one cue to identify who is in their social group and who is not. Infants also form expectations for future behaviors between other people (third party relationships) by observing how they interact. It appears that infants begin to use synchrony as a cue for third party relationships around the same age. Fawcett and Tunçgenç (2017) found that 15-month-old but not 12-month-old infants who watched bears move either in sync or out of sync expected those who moved in sync to affiliate socially. Cirelli, Wan, Johanis, and Trainor (2018) found that 12- and 15-month-old infants who watched videos of two women bouncing either in sync with each other or out of sync were surprised when those who bounced asynchronously subsequently displayed friendly behavior, although no significant expectations were observed for those who bounced synchronously. The use of synchronous movement to music as a cue to social affiliation continues into childhood. For example, after clapping together synchronously, 4- to 6-year-old children are more likely to help each other compared to children who experienced asynchronous clapping (Tunçgenç & Cohen, 2016). 
Even passive synchronous movement (children were pushed on swings in sync rather than out of sync) at 4 years of age results in increased coordination and cooperation between children (Rabinowitch & Meltzoff, 2017). Interestingly, moving infants synchronously rather than asynchronously with an experimenter in the absence of music has prosocial effects on subsequent helping behavior similar to those observed when music is present (Cirelli, Wan, et al., 2017). This suggests that music may facilitate synchronous movement, but that experiencing music together per se might not be crucial for increasing prosocial behavior. However, infants bounced with no music
were much less happy and cooperative than those bounced with music (Cirelli, Wan, et al., 2017), suggesting that the music does play a role in infant emotion regulation, which is likely helpful for encouraging prosocial behavior. Furthermore, synchronous movement between the experimenter and infant increases helping even when the musical beat is irregular and movements are therefore synchronized but not isochronous (Cirelli, Wan, & Trainor, 2014). Interestingly, anti-phase bouncing between an infant and experimenter (i.e., the experimenter is at maximum height when the infant is at minimum height and vice versa) appears to be as powerful as in-phase bouncing for eliciting helping behaviors from infants, suggesting that the mechanism is not one of simple self-similarity (Cirelli, Wan, & Trainor, 2014). Together, these results suggest that the role of music in encouraging prosocial behavior is one of promoting synchronous movement. If synchronous movement can be achieved in the absence of music, it is still a powerful force for encouraging prosocial behavior. However, music is an ideal stimulus for promoting synchronous movement because its temporal regularity enables the prediction necessary for coordinated movement.
A
D D
Given that rhythms organize our movements and our communication systems—music and language—and given that neural processes involve rhythmic oscillations, it is not surprising that when timing and/or rhythmic processes are compromised in development, the consequences can be severe. Indeed, deficits in timing and rhythmic processing have been linked to major developmental disorders including dyslexia, developmental coordination disorder (DCD), autism, attention deficit (hyperactivity) disorder (ADD/ADHD), and stuttering (Bhat & Srinivasan, 2013; Debrabant, Gheysen, Caeyenberghs, Van Waelvelde, & Vingerhoets, 2013; Falter & Noreika, 2014; Goswami, 2011; Hardy & LaGasse, 2013; Isaksson et al., 2018; Rosenblum & Regev, 2013; Toplak, Dockstader, & Tannock, 2006; Wieland, McAuley, Dilley, & Chang, 2015; Williams, Woollacott, & Ivry, 1992). Furthermore, there is high comorbidity among these disorders, raising the possibility of a common timing deficit (Brown-Lum & Zwicker,
2015; Iversen, Berg, Ellertsen, & Tønnessen, 2005; King-Dowling, Missiuna, Rodriguez, Greenway, & Cairney, 2015; McLeod, Langevin, Goodyear, & Dewey, 2014; Piek & Dyck, 2004; Reiersen, Constantino, & Todd, 2008; Watemberg, Waiserberg, Zuk, & Lerman-Sagie, 2007).
To take dyslexia as an example, consider that languages are rhythmically organized at the syllable level. Although not strictly isochronous, syllables in syllable-timed languages (e.g., French) and stressed syllables in stress-timed languages (e.g., English) are perceived as being roughly isochronous. Syllables thus provide a basic scaffold for a hierarchy of metrical levels and are crucial for perceiving and ordering the speech sounds (phonemes) within them, as well as for grouping syllables into words and phrases (e.g., Ding, Melloni, Zhang, Tian, & Poeppel, 2016; Giraud et al., 2007; Giraud & Poeppel, 2012). Dyslexia is defined as difficulties with letter-sound (phoneme) mappings during reading (Goswami, 2011), but the underlying core deficit appears to be difficulty in the extraction of syllable-level rhythmic structure, from which the phonemes comprising the syllables can be discerned (e.g., Goswami, 2011; Goswami, Huss, Mead, Fosker, & Verney, 2013). The deficit is not specific to language, as both the ability to perceive rhythmic and metrical structure and the ability to entrain finger taps to an auditory beat are impaired in dyslexia (Flaugnacco et al., 2014; Huss, Verney, Fosker, Mead, & Goswami, 2011; Overy, Nicolson, Fawcett, & Clarke, 2003; Thomson, Fryer, Maltby, & Goswami, 2006; Thomson & Goswami, 2008; Wolff, 2002). A few studies suggest that the presentation of rhythmic cues can enhance linguistic processing (Cason, Astésano, & Schön, 2015; Cason & Schön, 2012; Przybylski et al., 2013; Schön & Tillmann, 2015). 
For example, using both behavioral and EEG measures, Cason, Schön, and colleagues (Cason & Schön, 2012; Cason et al., 2015) found that participants were better at judging whether the last syllable in a sentence contained a particular phoneme when the sentence was preceded by a rhythmic cue that matched its prosody (metrical structure) than when it was preceded by a rhythmic cue that did not match. In another study, Przybylski et al. (2013) presented children with rhythmic or arrhythmic primes followed by a spoken sentence and found that children were better at judging whether the sentence was syntactically correct after the rhythmic primes. There are also a number of reports that rhythmic musical training can lessen dyslexic symptoms (Bhide, Power, & Goswami, 2013; Cogo-Moreira, de Avila, Ploubidis, & de Jesus Mari, 2013; Flaugnacco et al., 2015; Habib et al., 2016; Schön & Tillmann, 2015; Taub, McGrew, & Keith, 2007; Thomson, Leong, & Goswami, 2013). For example, Habib et al. (2016) found that eighteen hours of musical training with a strong emphasis on rhythmic exercises led to improvements in reading and other linguistic and non-linguistic skills in children between 7 and 12 years of age. In a registered clinical trial, Flaugnacco et al. (2015) tested a sample of forty-eight children with dyslexia between the ages of 8 and 11 years. They found that seven months of musical training based on Kodály and Orff approaches that were adapted to have a strong emphasis on timing and rhythm led to superior phonological and reading outcomes compared to a control painting program (all children also received conventional treatment for dyslexia).
Timing and rhythm deficits are also associated with developmental coordination disorder (DCD), which is defined as significant fine and/or gross motor deficits that interfere with everyday living and are not accompanied by intellectual impairment or another identifiable physical disorder (American Psychiatric Association, 2013). Although dyslexia and DCD appear to involve different mechanisms, there is high comorbidity between them (Brown-Lum & Zwicker, 2017; Gomez & Sirigu, 2015). DCD is associated with deficits in motor planning, motor sequencing, temporal prediction, perceptual timing, and tapping to a predictable visual sequence (Debrabant et al., 2013; Estil, Ingvaldsen, & Whiting, 2002; Volman & Geuze, 1998; Williams, Thomas, Maruff, Butson, & Wilson, 2006; Wilmut & Wann, 2008; Wilson, Ruddock, Smits-Engelsman, Polatajko, & Blank, 2013; Zwicker, Missiuna, & Boyd, 2009). A few studies have reported that children with DCD are also impaired at entraining their tapping to an auditory beat (Rosenblum & Regev, 2013; Williams et al., 1992). 
Given that simply perceiving an auditory beat engages the motor system, Trainor and colleagues (Trainor, Chang, Cairney, & Li, 2018) proposed that impaired auditory timing and rhythm perception is a core deficit of DCD. Indeed, in an initial study, these researchers found marked deficits in rhythm and duration perception using both behavioral and EEG measures (Chang, Chan, Li, Cairney, & Trainor, 2017). This review of dyslexia and DCD suggests that deficits in timing and rhythm processing may be core to both disorders and may relate to their high comorbidity. Less research has been conducted on this question with respect
to ADD/ADHD and autism, but it is possible that a timing deficit may be common to all of these developmental disorders. Importantly, the success of rhythmic training in treating dyslexia suggests that auditory and movement interventions involving timing and rhythm training might also be effective in DCD, autism, and ADHD.
Conclusion
Young infants are sensitive to timing, rhythm, and meter, which helps them to organize inputs such as speech and music into hierarchical meaningful structures, and to enhance processing of auditory streams that unfold over time by using the regularities of rhythms to predict when important upcoming information will occur. Tempo and expressive timing are used both in caregivers’ infant-directed singing and in children’s early musical productions to convey emotional information. Auditory–motor connections can be seen early in development in the influence of movement on metrical interpretation, and in the influence of synchronous movement between an infant and an adult on infants’ altruistic helping behaviors. At the same time, it takes considerable development before children become proficient at entraining their motor movements to musical beats. The critical importance of timing and rhythm in development is evident in the strong associations between poor skills in these domains and developmental disorders including dyslexia, DCD, autism, attention deficits, and stuttering. Early diagnosis of poor timing and rhythm skills holds promise for early assessment of risk for developmental disorders and age-appropriate interventions that can put young children on a better developmental trajectory.
Acknowledgments
The writing of this chapter was supported by grants from the Natural Sciences and Engineering Research Council of Canada, the Canadian Institutes of Health Research, and the Social Sciences and Humanities Research Council of Canada.
References
Ackermann, H. (2008). Cerebellar contributions to speech production and speech perception: Psycholinguistic and neurobiological perspectives. Trends in Neurosciences 31(6), 265–272.
Adachi, M., & Trehub, S. E. (1998). Children’s expression of emotion in song. Psychology of Music 26(2), 133–153.
Adachi, M., Trehub, S. E., & Abe, J. I. (2004). Perceiving emotion in children’s songs across age and culture. Japanese Psychological Research 46(4), 322–336.
American Psychiatric Association (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: APA.
Anshel, A., & Kipper, D. A. (1988). The influence of group singing on trust and cooperation. Journal of Music Therapy 25(3), 145–155.
Arnal, L. H., & Giraud, A. L. (2012). Cortical oscillations and sensory predictions. Trends in Cognitive Sciences 16(7), 390–398.
Arnal, L. H., Poeppel, D., & Giraud, A. L. (2015). Temporal coding in the auditory cortex. In G. Celesia & G. Hickok (Eds.), Handbook of clinical neurology, Vol. 129: The human auditory system (pp. 85–98). Amsterdam: Elsevier.
Balkwill, L. L., & Thompson, W. F. (1999). A cross-cultural investigation of the perception of emotion in music: Psychophysical and cultural cues. Music Perception: An Interdisciplinary Journal 17(1), 43–64.
Balkwill, L. L., Thompson, W. F., & Matsunaga, R. I. E. (2004). Recognition of emotion in Japanese, Western, and Hindustani music by Japanese listeners. Japanese Psychological Research 46(4), 337–349.
Baruch, C., & Drake, C. (1997). Tempo discrimination in infants. Infant Behavior and Development 20(4), 573–577.
Basirat, A., Dehaene, S., & Dehaene-Lambertz, G. (2014). A hierarchy of cortical responses to sequence violations in three-month-old infants. Cognition 132(2), 137–150.
Bauer, A. K. R., Bleichner, M. G., Jaeger, M., Thorne, J. D., & Debener, S. (2018). Dynamic phase alignment of ongoing auditory cortex oscillations. NeuroImage 167, 396–407.
Behrens, G. A., & Green, S. B. (1993). 
The ability to identify emotional content of solo improvisations performed vocally and on three different instruments. Psychology of Music 21(1), 20–33.
Bentley, D., & Hoy, R. R. (1974). The neurobiology of cricket song. Scientific American 231(2), 34–45.
Bhat, A. N., & Srinivasan, S. (2013). A review of “music and movement” therapies for children with autism: Embodied interventions for multisystem development. Frontiers in Integrative Neuroscience 7, 22. Retrieved from https://doi.org/10.3389/fnint.2013.00022
Bhide, A., Power, A., & Goswami, U. (2013). A rhythmic musical intervention for poor readers: A comparison of efficacy with a letter-based intervention. Mind, Brain, and Education 7(2), 113–123.
Bigand, E., Vieillard, S., Madurell, F., Marozeau, J., & Dacquet, A. (2005). Multidimensional scaling of emotional responses to music: The effect of musical expertise and of the duration of the excerpts. Cognition & Emotion 19(8), 1113–1139.
Bispham, J. (2006). Rhythm in music: What is it? Who has it? And why? Music Perception: An Interdisciplinary Journal 24(2), 125–134.
Brochard, R., Abecasis, D., Potter, D., Ragot, R., & Drake, C. (2003). The “ticktock” of our internal clock: Direct brain evidence of subjective accents in isochronous sequences. Psychological
Science 14(4), 362–366.
Brown, S., & Volgsten, U. (Eds.). (2006). Music and manipulation: On the social uses and social control of music. New York: Berghahn.
Brown-Lum, M., & Zwicker, J. G. (2015). Brain imaging increases our understanding of developmental coordination disorder: A review of literature and future directions. Current Developmental Disorders Reports 2(2), 131–140.
Brown-Lum, M., & Zwicker, J. G. (2017). Neuroimaging and occupational therapy: Bridging the gap to advance rehabilitation in developmental coordination disorder. Journal of Motor Behavior 49(1), 98–110.
Buck, J. B. (1935). Synchronous flashing of fireflies experimentally induced. Science 81, 339–340.
Buck, J. B. (1937). Studies on the firefly. I. The effects of light and other agents on flashing in Photinus pyralis, with special reference to periodicity and diurnal rhythm. Physiological Zoology 10(1), 45–58.
Buck, J. B. (1988). Synchronous rhythmic flashing of fireflies. II. Quarterly Review of Biology 63(3), 265–289.
Calderone, D. J., Lakatos, P., Butler, P. D., & Castellanos, F. X. (2014). Entrainment of neural oscillations as a modifiable substrate of attention. Trends in Cognitive Sciences 18(6), 300–309.
Cason, N., Astésano, C., & Schön, D. (2015). Bridging music and speech rhythm: Rhythmic priming and audio-motor training affect speech perception. Acta Psychologica 155, 43–50.
Cason, N., & Schön, D. (2012). Rhythmic priming enhances the phonological processing of speech. Neuropsychologia 50(11), 2652–2658.
Chang, A., Bosnyak, D., & Trainor, L. J. (under review a). Beta oscillatory power modulation reflects the predictability of pitch change.
Chang, A., Bosnyak, D., & Trainor, L. J. (2018). Beta oscillatory power modulation reflects the predictability of pitch change. Cortex 106, 248–260.
Chang, A., Chan, J., Li, Y.-C., Cairney, J., & Trainor, L. J. (2017). Auditory timing deficits in developmental coordination disorder. 
Presented at the 1st Conference of the Timing Research Forum, Strasbourg, France, October 23–25.
Chang, A., Livingstone, S. R., Bosnyak, D. J., & Trainor, L. J. (2017). Body sway reflects leadership in joint music performance. Proceedings of the National Academy of Sciences 114(21), E4134–E4141.
Chang, H. W., & Trehub, S. E. (1977). Infants’ perception of temporal grouping in auditory patterns. Child Development 48(4), 1666–1670.
Cirelli, L. K. (2018). How interpersonal synchrony facilitates early prosocial behavior. Current Opinion in Psychology 20, 35–39.
Cirelli, L. K., Bosnyak, D., Manning, F. C., Spinelli, C., Marie, C., Fujioka, T., … Trainor, L. J. (2014). Beat-induced fluctuations in auditory cortical beta-band activity: Using EEG to measure age-related changes. Frontiers in Psychology 5, 742. Retrieved from https://doi.org/10.3389/fpsyg.2014.00742
Cirelli, L. K., Einarson, K. M., & Trainor, L. J. (2014). Interpersonal synchrony increases prosocial behavior in infants. Developmental Science 17(6), 1003–1011.
Cirelli, L. K., Spinelli, C., Nozaradan, S., & Trainor, L. J. (2016). Measuring neural entrainment to beat and meter in infants: Effects of music background. Frontiers in Neuroscience 10, 229. Retrieved from https://doi.org/10.3389/fnins.2016.00229
Cirelli, L. K., Trehub, S. E., & Trainor, L. J. (2018). Rhythm and melody as social signals for infants. Annals of the New York Academy of Sciences. doi:10.1111/nyas.13580
Cirelli, L. K., Wan, S. J., Johanis, T. C., & Trainor, L. J. (2018). Infants’ use of interpersonal asynchrony as a signal for third-party affiliation. Music & Science 1.
doi:10.1177/2059204317745855
Cirelli, L. K., Wan, S. J., Spinelli, C., & Trainor, L. J. (2017). Effects of interpersonal movement synchrony on infant helping behaviors: Is music necessary? Music Perception: An Interdisciplinary Journal 34(3), 319–326.
Cirelli, L. K., Wan, S. J., & Trainor, L. J. (2014). Fourteen-month-old infants use interpersonal synchrony as a cue to direct helpfulness. Philosophical Transactions of the Royal Society B: Biological Sciences 369(1658), 20130400.
Clarke, E. F., & Krumhansl, C. L. (1990). Perceiving musical time. Music Perception: An Interdisciplinary Journal 7(3), 213–251.
Cogo-Moreira, H., de Avila, C. R. B., Ploubidis, G. B., & de Jesus Mari, J. (2013). Effectiveness of music education for the improvement of reading skills and academic achievement in young poor readers: A pragmatic cluster-randomized, controlled clinical trial. PLoS ONE 8(3), e59984.
Conrad, N. J., Walsh, J., Allen, J. M., & Tsang, C. D. (2011). Examining infants’ preferences for tempo in lullabies and playsongs. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale 65(3), 168–172.
Corbeil, M., Trehub, S. E., & Peretz, I. (2016). Singing delays the onset of infant distress. Infancy 21(3), 373–391.
Cunningham, J. G., & Sterling, R. S. (1988). Developmental change in the understanding of affective meaning in music. Motivation and Emotion 12(4), 399–413.
Dalla Bella, S., Peretz, I., Rousseau, L., & Gosselin, N. (2001). A developmental study of the affective value of tempo and mode in music. Cognition 80(3), B1–B10.
D’Ausilio, A., Novembre, G., Fadiga, L., & Keller, P. E. (2015). What can music tell us about social interaction? Trends in Cognitive Sciences 19(3), 111–114.
Debrabant, J., Gheysen, F., Caeyenberghs, K., Van Waelvelde, H., & Vingerhoets, G. (2013). Neural underpinnings of impaired predictive motor timing in children with developmental coordination disorder. Research in Developmental Disabilities 34(5), 1478–1487. 
de l’Etoile, S. K., & Leider, C. N. (2011). Acoustic parameters of infant-directed singing in mothers with depressive symptoms. Infant Behavior and Development 34(2), 248–256.
Deliege, I. (1987). Grouping conditions in listening to music: An approach to Lerdahl & Jackendoff’s grouping preference rules. Music Perception: An Interdisciplinary Journal 4(4), 325–359.
Demany, L., McKenzie, B., & Vurpillot, E. (1977). Rhythm perception in early infancy. Nature 266(5604), 718–719.
Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience 19(1), 158–164.
Ding, N., Patel, A. D., Chen, L., Butler, H., Luo, C., & Poeppel, D. (2017). Temporal modulations in speech and music. Neuroscience & Biobehavioral Reviews 81(B), 181–187.
Dissanayake, E. (2012). The earliest narratives were musical. Research Studies in Music Education 34(1), 3–14.
Dolgin, K. G., & Adelson, E. H. (1990). Age changes in the ability to interpret affect in sung and instrumentally-presented melodies. Psychology of Music 18(1), 87–98.
Dowling, W. J. (1973). Rhythmic groups and subjective chunks in memory for melodies. Perception & Psychophysics 14(1), 37–40.
Drake, C. (1993). Reproduction of musical rhythms by children, adult musicians, and adult nonmusicians. Perception & Psychophysics 53(1), 25–33.
Drake, C., Jones, M. R., & Baruch, C. (2000). The development of rhythmic attending in auditory sequences: Attunement, referent period, focal attending. Cognition 77(3), 251–288.
Dunfield, K., Kuhlmeier, V. A., O’Connell, L., & Kelley, E. (2011). Examining the diversity of prosocial behavior: Helping, sharing, and comforting in infancy. Infancy 16(3), 227–247.
Eerola, T., Luck, G., & Toiviainen, P. (2006). An investigation of pre-schoolers’ corporeal synchronization with music. In Proceedings of the 9th International Conference on Music Perception and Cognition (pp. 472–476). The Society for Music Perception and Cognition and European Society for the Cognitive Sciences of Music Bologna.
Einarson, K. M., & Trainor, L. J. (2013). Five-year-old children’s beat perception and beat synchronization abilities. Frontiers in Human Neuroscience. Conference abstract: 14th Rhythm Production and Perception Workshop, Birmingham, September 11–13.
Einarson, K. M., & Trainor, L. J. (2015). The effect of visual information on young children’s perceptual sensitivity to musical beat alignment. Timing & Time Perception 3(1–2), 88–101.
Einarson, K. M., & Trainor, L. J. (2016). Hearing the beat: Young children’s perceptual sensitivity to beat alignment varies according to metric structure. Music Perception: An Interdisciplinary Journal 34(1), 56–70.
Endedijk, H. M., Ramenzoni, V. C., Cox, R. F., Cillessen, A. H., Bekkering, H., & Hunnius, S. (2015). Development of interpersonal coordination between peers during a drumming task. Developmental Psychology 51(5), 714–721.
Esposito, A., & Serio, M. (2007). Children’s perception of musical emotional expressions. In A. Esposito, M. Faundez-Zanuy, E. Keller, & M. Marinaro (Eds.), Verbal and nonverbal communication behaviours: COST Action 2102 international workshop, Vietri sul Mare, Italy, March 29–31, 2007. Revised selected and invited papers (pp. 51–64). Berlin: Springer.
Essens, P. J. (1986). Hierarchical organization of temporal patterns. Perception & Psychophysics 40(2), 69–73.
Essens, P. J., & Povel, D. J. (1985). Metrical and nonmetrical representations of temporal patterns. Perception & Psychophysics 37(1), 1–7.
Estil, L., Ingvaldsen, R., & Whiting, H. (2002). Spatial and temporal constraints on performance in children with movement co-ordination problems. 
Experimental Brain Research 147(2), 153–161.
Falter, C. M., & Noreika, V. (2014). Time processing in developmental disorders: A comparative view. In V. Arstila & D. Lloyd (Eds.), Subjective time: The philosophy, psychology, and neuroscience of temporality (pp. 557–598). Cambridge, MA: MIT Press.
Fawcett, C., & Tunçgenç, B. (2017). Infants’ use of movement synchrony to infer social affiliation in others. Journal of Experimental Child Psychology 160, 127–136.
Feldman, R. (2007). Parent–infant synchrony and the construction of shared timing: Physiological precursors, developmental outcomes, and risk conditions. Journal of Child Psychology and Psychiatry 48(3–4), 329–354.
Fernald, A. (1991). Prosody in speech to children: Prelinguistic and linguistic functions. Annals of Child Development 8, 43–80.
Fitch, W. T. (2006). The biology and evolution of music: A comparative perspective. Cognition 100(1), 173–215.
Fitzpatrick, P., Schmidt, R. C., & Lockman, J. J. (1996). Dynamical patterns in the development of clapping. Child Development 67(6), 2691–2708.
Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schön, D. (2015). Music training increases phonological awareness and reading skills in developmental dyslexia: A randomized control trial. PLoS ONE 10(9), e0138715.
Flaugnacco, E., Lopez, L., Terribili, C., Zoia, S., Buda, S., Tilli, S., … Schön, D. (2014). Rhythm perception and production predict reading abilities in developmental dyslexia. Frontiers in Human Neuroscience 8, 392. doi:10.3389/fnhum.2014.00392
Flom, R., Gentile, D. A., & Pick, A. D. (2008). Infants’ discrimination of happy and sad music. Infant Behavior and Development 31(4), 716–728.
Flom, R., & Pick, A. D. (2012). Dynamics of infant habituation: Infants’ discrimination of musical excerpts. Infant Behavior and Development 35(4), 697–704.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The psychology of music (pp. 149–180). San Diego: Academic Press.
Fritz, T., Jentschke, S., Gosselin, N., Sammler, D., Peretz, I., Turner, R., … Koelsch, S. (2009). Universal recognition of three basic emotions in music. Current Biology 19(7), 573–576.
Fujioka, T., Ross, B., & Trainor, L. J. (2015). Beta-band oscillations represent auditory beat and its metrical hierarchy in perception and imagery. Journal of Neuroscience 35(45), 15187–15198.
Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2012). Internalized timing of isochronous sounds is represented in neuromagnetic beta oscillations. Journal of Neuroscience 32(5), 1791–1802.
Gabrielsson, A., & Juslin, P. N. (1996). Emotional expression in music performance: Between the performer’s intention and the listener’s experience. Psychology of Music 24(1), 68–91.
Gabrielsson, A., & Lindström, E. (2010). The role of structure in the musical expression of emotions. In P. Juslin & J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 367–400). Oxford: Oxford University Press.
Gagnon, L., & Peretz, I. (2003). Mode and tempo relative contributions to “happy-sad” judgements in equitone melodies. Cognition & Emotion 17(1), 25–40.
Gerardi, G. M., & Gerken, L. (1995). The development of affective responses to modality and melodic contour. Music Perception: An Interdisciplinary Journal 12(3), 279–290.
Gerry, D. W., Faux, A. L., & Trainor, L. J. (2010). Effects of Kindermusik training on infants’ rhythmic enculturation. Developmental Science 13(3), 545–551.
Gerry, D. W., Unrau, A., & Trainor, L. J. (2012). Active music classes in infancy enhance musical, communicative and social development. Developmental Science 15(3), 398–407.
Giomo, C. J. (1993). 
An experimental study of children’s sensitivity to mood in music. Psychology of Music 21(2), 141–162.
Giraud, A. L., Kleinschmidt, A., Poeppel, D., Lund, T. E., Frackowiak, R. S., & Laufs, H. (2007). Endogenous cortical rhythms determine cerebral specialization for speech perception and production. Neuron 56(6), 1127–1134.
Giraud, A. L., & Poeppel, D. (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience 15(4), 511–517.
Gjerdingen, R. O. (1989). Meter as a mode of attending: A network simulation of attentional rhythmicity in music. Intégral 3, 67–91.
Gomez, A., & Sirigu, A. (2015). Developmental coordination disorder: Core sensori-motor deficits, neurobiology and etiology. Neuropsychologia 79(B), 272–287.
Goswami, U. (2011). A temporal sampling framework for developmental dyslexia. Trends in Cognitive Sciences 15(1), 3–10.
Goswami, U., Huss, M., Mead, N., Fosker, T., & Verney, J. P. (2013). Perception of patterns of musical beat distribution in phonological developmental dyslexia: Significant longitudinal relations with word reading and reading comprehension. Cortex 49(5), 1363–1376.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience 19(5), 893–906.
Gregory, A. H., Worrall, L., & Sarge, A. (1996). The development of emotional responses to music in young children. Motivation and Emotion 20(4), 341–348.
Habib, M., Lardy, C., Desiles, T., Commeiras, C., Chobert, J., & Besson, M. (2016). Music and dyslexia: A new musical training method to improve reading and related disorders. Frontiers in Psychology 7, 26. doi:10.3389/fpsyg.2016.00026
Háden, G. P., Németh, R., Török, M., & Winkler, I. (2015). Predictive processing of pitch trends in newborn infants. Brain Research 1626, 14–20.
Haegens, S., & Zion Golumbic, E. (2018). Rhythmic facilitation of sensory processing: A critical review. Neuroscience & Biobehavioral Reviews 86, 150–165.
Hannon, E. E., Vandenbosch der Nederlanden, C. M., & Tichko, P. (2012). Effects of perceptual experience on children’s and adults’ perception of unfamiliar rhythms. Annals of the New York Academy of Sciences 1252, 92–99.
Hannon, E. E., & Johnson, S. P. (2005). Infants use meter to categorize rhythms and melodies: Implications for musical structure learning. Cognitive Psychology 50(4), 354–377.
Hannon, E. E., Nave-Blodgett, J. E., & Nave, K. M. (2018). The developmental origins of the perception and production of musical rhythm. Child Development Perspectives. doi:10.1111/cdep.12285
Hannon, E. E., Soley, G., & Ullal, S. (2012). Familiarity overrides complexity in rhythm perception: A cross-cultural comparison of American and Turkish listeners. Journal of Experimental Psychology: Human Perception and Performance 38(3), 543–548.
Hannon, E. E., & Trainor, L. J. (2007). Music acquisition: Effects of enculturation and formal training on development. Trends in Cognitive Sciences 11(11), 466–472.
Hannon, E. E., & Trehub, S. E. (2005a). Metrical categories in infancy and adulthood. Psychological Science 16(1), 48–55.
Hannon, E. E., & Trehub, S. E. (2005b). Tuning in to musical rhythms: Infants learn more readily than adults. Proceedings of the National Academy of Sciences 102(35), 12639–12643.
Hardy, M. W., & LaGasse, A. B. (2013). Rhythm, movement, and autism: Using rhythmic rehabilitation research as a model for autism. Frontiers in Integrative Neuroscience 7, 19. Retrieved from https://doi.org/10.3389/fnint.2013.00019
He, C., Hotson, L., & Trainor, L. J. (2007). Mismatch responses to pitch changes in early infancy. Journal of Cognitive Neuroscience 19(5), 878–892.
Henry, M. J., Herrmann, B., & Obleser, J. (2014). Entrained neural oscillations in multiple frequency bands comodulate behavior. 
Proceedings of the National Academy of Sciences 111(41), 14935–14940.
Henry, M. J., & Obleser, J. (2012). Frequency modulation entrains slow neural oscillations and optimizes human listening behavior. Proceedings of the National Academy of Sciences 109(49), 20095–20100.
Herrmann, B., Henry, M. J., Haegens, S., & Obleser, J. (2016). Temporal expectations and neural amplitude fluctuations in auditory cortex interactively influence perception. NeuroImage 124, 487–497.
Hove, M. J., & Risen, J. L. (2009). It’s all in the timing: Interpersonal synchrony increases affiliation. Social Cognition 27(6), 949–960.
Huron, D. (2001). Is music an evolutionary adaptation? Annals of the New York Academy of Sciences 930(1), 43–61.
Huron, D. B. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA: MIT Press.
Huss, M., Verney, J. P., Fosker, T., Mead, N., & Goswami, U. (2011). Music, rhythm, rise time perception and developmental dyslexia: Perception of musical meter predicts reading and phonology. Cortex 47(6), 674–689.
Ilari, B. (2015). Rhythmic engagement with music in early childhood: A replication and extension. Journal of Research in Music Education 62(4), 332–343.
Ilari, B. (2016). Music in the early years: Pathways into the social world. Research Studies in Music Education 38(1), 23–39.
Ilie, G., & Thompson, W. F. (2006). A comparison of acoustic cues in music and speech for three dimensions of affect. Music Perception: An Interdisciplinary Journal 23(4), 319–330.
Isaksson, S., Salomäki, S., Tuominen, J., Arstila, V., Falter-Wagner, C. M., & Noreika, V. (2018). Is there a generalized timing impairment in autism spectrum disorders across time scales and paradigms? Journal of Psychiatric Research 99, 111–121.
Iversen, J. R., & Patel, A. D. (2008). The Beat Alignment Test (BAT): Surveying beat processing abilities in the general population. In K. Miyazaki, M. Adachi, Y. Hiraga, Y. Nakajima, & M. Tsuzaki (Eds.), Proceedings of the 10th International Conference on Music Perception and Cognition (ICMPC 10) (pp. 465–468).
Iversen, J. R., Repp, B. H., & Patel, A. D. (2009). Top-down control of rhythm perception modulates early auditory responses. Annals of the New York Academy of Sciences 1169, 58–73.
Iversen, S., Berg, K., Ellertsen, B., & Tønnessen, F. E. (2005). Motor coordination difficulties in a municipality group and in a clinical sample of poor readers. Dyslexia 11(3), 217–231.
James, C. E., Michel, C. M., Britz, J., Vuilleumier, P., & Hauert, C. A. (2012). Rhythm evokes action: Early processing of metric deviances in expressive music by experts and laymen revealed by ERP source imaging. Human Brain Mapping 33(12), 2751–2767.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review 96(3), 459–491.
Jones, M. R., Moynihan, H., MacKenzie, N., & Puente, J. (2002). Temporal aspects of stimulus-driven attending in dynamic arrays. Psychological Science 13(4), 313–319.
Jusczyk, P. W., & Krumhansl, C. L. (1993). Pitch and rhythmic patterns affecting infants’ sensitivity to musical phrase structure. Journal of Experimental Psychology: Human Perception and Performance 19(3), 627–640.
Juslin, P. N. (1997). Emotional communication in music performance: A functionalist perspective and some data. Music Perception: An Interdisciplinary Journal 14(4), 383–418.
Juslin, P. N. (2000). Cue utilization in communication of emotion in music performance: Relating performance to perception. 
Journal of Experimental Psychology: Human Perception and Performance 26(6), 1797–1813. Juslin, P. N., & Laukka, P. (2003). Emotional expression in speech and music. Annals of the New York Academy of Sciences 1000, 279–282. Juslin, P. N., & Timmers, R. (2010). Expression and communication of emotion in music performance. In P. Juslin and J. A. Sloboda (Eds.), Handbook of music and emotion: Theory, research, applications (pp. 453–489). Oxford: Oxford University Press. Kastner, M. P., & Crowder, R. G. (1990). Perception of the major/minor distinction: IV. Emotional connotations in young children. Music Perception: An Interdisciplinary Journal 8(2), 189–201. Keller, P. E., Novembre, G., & Hove, M. J. (2014). Rhythm in joint action: Psychological and neurophysiological mechanisms for real-time interpersonal coordination. Philosophical Transactions of the Royal Society B: Biological Sciences 369(1658), 20130394. doi:10.1098/rstb.2013.0394 Kelso, J. A., Saltzman, E. L., & Tuller, B. (1986). The dynamical perspective on speech production: Data and theory. Journal of Phonetics 14(1), 29–59. King-Dowling, S., Missiuna, C., Rodriguez, M. C., Greenway, M., & Cairney, J. (2015). Reprint of “Co-occurring motor, language and emotional–behavioral problems in children 3–6 years of age.” Human Movement Science 42, 344–351. Kirschner, S., & Ilari, B. (2014). Joint drumming in Brazilian and German preschool children: Cultural differences in rhythmic entrainment, but no prosocial effects. Journal of Cross-Cultural Psychology 45(1), 137–166. Kirschner, S., & Tomasello, M. (2009). Joint drumming: Social context facilitates synchronization in preschool children. Journal of Experimental Child Psychology 102(3), 299–314.
Kokal, I., Engel, A., Kirschner, S., & Keysers, C. (2011). Synchronized drumming enhances activity in the caudate and facilitates prosocial commitment—if the rhythm comes easily. PLoS ONE 6(11), e27272. Kragness, H. E., Baksh, A., Battcock, A., & Trainor, L. J. (2017). Children’s use of expressive cues in music: A developmental self-pacing study. Presented at the Neurosciences & Music VI conference, Boston, USA, June 15–18. Kragness, H. E., & Trainor, L. J. (2016). Listeners lengthen phrase boundaries in self-paced music. Journal of Experimental Psychology: Human Perception and Performance 42(10), 1676–1686. Kragness, H. E., & Trainor, L. J. (2018). Young children pause on phrase boundaries in self-paced music listening: The role of harmonic cues. Developmental Psychology 54(5), 842–856. Kratus, J. (1993). A developmental study of children’s interpretation of emotion in music. Psychology of Music 21(1), 3–19. Krumhansl, C. L., & Jusczyk, P. W. (1990). Infants’ perception of phrase structure in music. Psychological Science 1(1), 70–73. Kuhl, P. K., Stevens, E., Hayashi, A., Deguchi, T., Kiritani, S., & Iverson, P. (2006). Infants show a facilitation effect for native language phonetic perception between 6 and 12 months. Developmental Science 9(2), F13–F21. Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review 106(1), 119–159. Laukka, P., & Gabrielsson, A. (2000). Emotional expression in drumming performance. Psychology of Music 28(2), 181–189. Launay, J., Dean, R. T., & Bailes, F. (2013). Synchronization can influence trust following virtual interaction. Experimental Psychology 60(1), 1–11. Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press. Lewkowicz, D. J. (2003). Learning and discrimination of audiovisual events in human infants: The hierarchical relation between intersensory temporal synchrony and rhythmic pattern cues. 
Developmental Psychology 39(5), 795–804. Lindenberger, U., Li, S. C., Gruber, W., & Müller, V. (2009). Brains swinging in concert: Cortical phase synchronization while playing guitar. BMC Neuroscience 10(1), 22. London, J. (2004). Hearing in time: Psychological aspects of musical meter (2nd ed.). New York: Oxford University Press. Longhi, S. (2009). Bloch oscillations in complex crystals with PT symmetry. Physical Review Letters 103(12), 123601. Luck, G., & Toiviainen, P. (2006). Ensemble musicians’ synchronization with conductors’ gestures: An automated feature-extraction analysis. Music Perception: An Interdisciplinary Journal 24(2), 189–200. Lynch, M. P., & Eilers, R. E. (1992). A study of perceptual development for musical tuning. Perception & Psychophysics 52(6), 599–608. Lynch, M. P., Eilers, R. E., Oller, D. K., & Urbano, R. C. (1990). Innateness, experience, and music perception. Psychological Science 1(4), 272–276. McAuley, J. D., Jones, M. R., Holub, S., Johnston, H. M., & Miller, N. S. (2006). The time of our lives: Life span development of timing and event tracking. Journal of Experimental Psychology: General 135(3), 348–367. McLeod, K. R., Langevin, L. M., Goodyear, B. G., & Dewey, D. (2014). Functional connectivity of neural motor networks is disrupted in children with developmental coordination disorder and attention-deficit/hyperactivity disorder. NeuroImage: Clinical 4, 566–575. McNeill, W. H. (1995). Keeping together in time. Boston, MA: Harvard University Press.
Macrae, C. N., Duffy, O. K., Miles, L. K., & Lawrence, J. (2008). A case of hand waving: Action synchrony and person perception. Cognition 109(1), 152–156. Malloch, S. E., & Trevarthen, C. E. (2009). Communicative musicality: Exploring the basis of human companionship. New York: Oxford University Press. Merchant, H., Grahn, J., Trainor, L., Rohrmeier, M., & Fitch, W. T. (2015). Finding the beat: A neural perspective across humans and non-human primates. Philosophical Transactions of the Royal Society B: Biological Sciences 370(1664), 20140093. doi:10.1098/rstb.2014.0093 Merchant, H., & Honing, H. (2014). Are non-human primates capable of rhythmic entrainment? Evidence for the gradual audiomotor evolution hypothesis. Frontiers in Neuroscience 7, 274. Retrieved from https://doi.org/10.3389/fnins.2013.00274 Merker, B. (2000). Synchronous chorusing and human origins. In N. L. Wallin, B. Merker, & S. Brown (Eds.), The origins of music (pp. 315–327). Cambridge, MA: MIT Press. Meyer, L. B. (1956). Emotion and meaning in music. Chicago, IL: University of Chicago Press. Mohn, C., Argstatter, H., & Wilker, F. W. (2011). Perception of six basic emotions in music. Psychology of Music 39(4), 503–517. Morillon, B., & Schroeder, C. E. (2015). Neuronal oscillations as a mechanistic substrate of auditory temporal prediction. Annals of the New York Academy of Sciences 1337, 26–31. Morillon, B., Schroeder, C. E., Wyart, V., & Arnal, L. H. (2016). Temporal prediction in lieu of periodic stimulation. Journal of Neuroscience 36(8), 2342–2347. Mote, J. (2011). The effects of tempo and familiarity on children’s affective interpretation of music. Emotion 11(3), 618–622. Nakata, T., & Mitani, C. (2005). Influences of temporal fluctuation on infant attention. Music Perception: An Interdisciplinary Journal 22(3), 401–409. Nakata, T., & Trainor, L. J. (2015). Perceptual and cognitive enhancement with an adaptive timing partner: Electrophysiological responses to pitch change. 
Psychomusicology: Music, Mind, and Brain 25(4), 404–415. Nawrot, E. S. (2003). The perception of emotional expression in music: Evidence from infants, children and adults. Psychology of Music 31(1), 75–92. Nobre, A. C., Correa, A., & Coull, J. T. (2007). The hazards of time. Current Opinion in Neurobiology 17(4), 465–470. Nobre, A. C., & van Ede, F. (2018). Anticipated moments: Temporal structure in attention. Nature Reviews Neuroscience 19(1), 34–48. Nozaradan, S. (2014). Exploring how musical rhythm entrains brain activity with electroencephalogram frequency-tagging. Philosophical Transactions of the Royal Society B: Biological Sciences 369(1658), 20130393. doi:10.1098/rstb.2013.0393 Nozaradan, S., Peretz, I., Missal, M., & Mouraux, A. (2011). Tagging the neuronal entrainment to beat and meter. Journal of Neuroscience 31(28), 10234–10240. Nozaradan, S., Peretz, I., & Mouraux, A. (2012). Selective neuronal entrainment to the beat and meter embedded in a musical rhythm. Journal of Neuroscience 32(49), 17572–17581. Overy, K., Nicolson, R. I., Fawcett, A. J., & Clarke, E. F. (2003). Dyslexia and music: Measuring musical timing skills. Dyslexia 9(1), 18–36. Palmer, C. (1989). Mapping musical thought to musical performance. Journal of Experimental Psychology: Human Perception and Performance 15(2), 331–346. Palmer, C., & Krumhansl, C. L. (1987). Pitch and temporal contributions to musical phrase perception: Effects of harmony, performance timing, and familiarity. Perception & Psychophysics 41(6), 505–518. Papoušek, M., Papoušek, H., & Symmes, D. (1991). The meanings of melodies in motherese in tone and stress languages. Infant Behavior and Development 14(4), 415–440.
Partridge, B. L. (1982). The structure and function of fish schools. Scientific American 246(6), 114– 123. Patel, A. D., & Iversen, J. R. (2014). The evolutionary neuroscience of musical beat perception: The Action Simulation for Auditory Prediction (ASAP) hypothesis. Frontiers in Systems Neuroscience 8, 57. Retrieved from https://doi.org/10.3389/fnsys.2014.00057 Patel, A. D., Iversen, J. R., Bregman, M. R., & Schulz, I. (2009). Experimental evidence for synchronization to a musical beat in a nonhuman animal. Current Biology 19(10), 827–830. Pearce, M. T., Müllensiefen, D., & Wiggins, G. A. (2010). The role of expectation and probabilistic learning in auditory boundary perception: A model comparison. Perception 39(10), 1367–1391. Pearce, M. T., & Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. Music Perception: An Interdisciplinary Journal 23(5), 377–405. Peelle, J. E., & Davis, M. H. (2012). Neural oscillations carry speech rhythm through to comprehension. Frontiers in Psychology 3, 320. Retrieved from https://doi.org/10.3389/fpsyg.2012.00320 Peretz, I. (1989). Clustering in music: An appraisal of task factors. International Journal of Psychology 24(1–5), 157–178. Phillips-Silver, J., & Trainor, L. J. (2005). Feeling the beat: Movement influences infant rhythm perception. Science 308(5727), 1430. Piek, J. P., & Dyck, M. J. (2004). Sensory-motor deficits in children with developmental coordination disorder, attention deficit hyperactivity disorder and autistic disorder. Human Movement Science 23(3–4), 475–488. Provasi, J., & Bobin-Bègue, A. (2003). Spontaneous motor tempo and rhythmical synchronisation in 2½- and 4-year-old children. International Journal of Behavioral Development 27(3), 220–231. Przybylski, L., Bedoin, N., Krifi-Papoz, S., Herbillon, V., Roch, D., Léculier, L., … Tillmann, B. (2013). Rhythmic auditory stimulation influences syntactic processing in children with developmental language disorders. 
Neuropsychology 27(1), 121–131. Rabinowitch, T. C., & Meltzoff, A. N. (2017). Synchronized movement experience enhances peer cooperation in preschool children. Journal of Experimental Child Psychology 160, 21–32. Rankin, S. K., Large, E. W., & Fink, P. W. (2009). Fractal tempo fluctuation and pulse prediction. Music Perception: An Interdisciplinary Journal 26(5), 401–413. Reiersen, A. M., Constantino, J. N., & Todd, R. D. (2008). Co-occurrence of motor problems and autistic symptoms in attention-deficit/hyperactivity disorder. Journal of the American Academy of Child & Adolescent Psychiatry 47(6), 662–672. Repp, B. H. (1992). Diversity and commonality in music performance: An analysis of timing microstructure in Schumann’s “Träumerei.” Journal of the Acoustical Society of America 92(5), 2546–2568. Repp, B. H. (2005). Sensorimotor synchronization: A review of the tapping literature. Psychonomic Bulletin & Review 12(6), 969–992. Repp, B. H., London, J., & Keller, P. E. (2005). Production and synchronization of uneven rhythms at fast tempi. Music Perception: An Interdisciplinary Journal 23(1), 61–78. Repp, B. H., & Su, Y. H. (2013). Sensorimotor synchronization: A review of recent research (2006– 2012). Psychonomic Bulletin & Review 20(3), 403–452. Rock, A. M., Trainor, L. J., & Addison, T. L. (1999). Distinctive messages in infant-directed lullabies and play songs. Developmental Psychology 35(2), 527–534. Rosenblum, S., & Regev, N. (2013). Timing abilities among children with developmental coordination disorders (DCD) in comparison to children with typical development. Research in Developmental Disabilities 34(1), 218–227.
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology 39(6), 1161–1178. Sänger, J., Müller, V., & Lindenberger, U. (2012). Intra- and interbrain synchronization and network properties when playing guitar in duets. Frontiers in Human Neuroscience 6, 312. Retrieved from https://doi.org/10.3389/fnhum.2012.00312 Schachner, A., Brady, T. F., Pepperberg, I. M., & Hauser, M. D. (2009). Spontaneous motor entrainment to music in multiple vocal mimicking species. Current Biology 19(10), 831–836. Schön, D., & Tillmann, B. (2015). Short- and long-term rhythmic interventions: Perspectives for language rehabilitation. Annals of the New York Academy of Sciences 1337, 32–39. Schroeder, C. E., & Lakatos, P. (2009). Low-frequency neuronal oscillations as instruments of sensory selection. Trends in Neurosciences 32(1), 9–18. Schröger, E., Marzecová, A., & SanMiguel, I. (2015). Attention and prediction in human audition: A lesson from cognitive psychophysiology. European Journal of Neuroscience 41(5), 641–664. Shenfield, T., Trehub, S. E., & Nakata, T. (2003). Maternal singing modulates infant arousal. Psychology of Music 31(4), 365–375. Snyder, J. S., Hannon, E. E., Large, E. W., & Christiansen, M. H. (2006). Synchronization and continuation tapping to complex meters. Music Perception: An Interdisciplinary Journal 24(2), 135–146. Snyder, J. S., & Large, E. W. (2005). Gamma-band activity reflects the metric structure of rhythmic tone sequences. Cognitive Brain Research 24(1), 117–126. Soley, G., & Hannon, E. E. (2010). Infants prefer the musical meter of their own culture: A crosscultural comparison. Developmental Psychology 46(1), 286–292. Stefanics, G., Hangya, B., Hernádi, I., Winkler, I., Lakatos, P., & Ulbert, I. (2010). Phase entrainment of human delta oscillations can mediate the effects of expectation on reaction speed. Journal of Neuroscience 30(41), 13578–13585. Tal, I., Large, E. W., Rabinovitch, E., Wei, Y., Schroeder, C. 
E., Poeppel, D., & Golumbic, E. Z. (2017). Neural entrainment to the beat: The “missing-pulse” phenomenon. Journal of Neuroscience 37(26), 6331–6341. Tarr, B., Launay, J., & Dunbar, R. I. (2014). Music and social bonding: “Self-other” merging and neurohormonal mechanisms. Frontiers in Psychology 5, 1096. Retrieved from https://doi.org/10.3389/fpsyg.2014.01096 Taub, G. E., McGrew, K. S., & Keith, T. Z. (2007). Improvements in interval time tracking and effects on reading achievement. Psychology in the Schools 44(8), 849–863. Terwogt, M. M., & Van Grinsven, F. (1991). Musical expression of moodstates. Psychology of Music 19(2), 99–109. Thomson, J. M., Fryer, B., Maltby, J., & Goswami, U. (2006). Auditory and motor rhythm awareness in adults with dyslexia. Journal of Research in Reading 29(3), 334–348. Thomson, J. M., & Goswami, U. (2008). Rhythmic processing in children with developmental dyslexia: Auditory and motor rhythms link to reading and spelling. Journal of Physiology-Paris 102(1–3), 120–129. Thomson, J. M., Leong, V., & Goswami, U. (2013). Auditory processing interventions and developmental dyslexia: A comparison of phonemic and rhythmic approaches. Reading and Writing 26(2), 139–161. Thorpe, L. A., & Trehub, S. E. (1989). Duration illusion and auditory grouping in infancy. Developmental Psychology 25(1), 122–127. Todd, N. (1985). A model of expressive timing in tonal music. Music Perception: An Interdisciplinary Journal 3(1), 33–57.
Toplak, M. E., Dockstader, C., & Tannock, R. (2006). Temporal information processing in ADHD: Findings to date and new methods. Journal of Neuroscience Methods 151(1), 15–29. Trainor, L. J. (1996). Infant preferences for infant-directed versus noninfant-directed playsongs and lullabies. Infant Behavior and Development 19(1), 83–92. Trainor, L. J. (2012). Musical experience, plasticity, and maturation: Issues in measuring developmental change using EEG and MEG. Annals of the New York Academy of Sciences 1252, 25–36. Trainor, L. J. (2015). The origins of music in auditory scene analysis and the roles of evolution and culture in musical creation. Philosophical Transactions of the Royal Society B: Biological Sciences 370(1664), 20140089. doi: 10.1098/rstb.2014.0089 Trainor, L. J., & Adams, B. (2000). Infants’ and adults’ use of duration and intensity cues in the segmentation of tone patterns. Perception & Psychophysics 62(2), 333–340. Trainor, L. J., Chang, A., Cairney, J., Li, Y. C. (2018). Is auditory perceptual timing a core deficit of developmental coordination disorder? Annals of the New York Academy of Sciences 1423, 30–39 Trainor, L. J., & Cirelli, L. (2015). Rhythm and interpersonal synchrony in early social development. Annals of the New York Academy of Sciences 1337, 45–52. Trainor, L. J., Clark, E. D., Huntley, A., & Adams, B. A. (1997). The acoustic basis of preferences for infant-directed singing. Infant Behavior and Development 20(3), 383–396. Trainor, L. J., & Corrigall, K. A. (2010). Music acquisition and effects of musical experience. In M. Riess Jones, R. Fay, & A. Popper (Eds.), Music perception (pp. 89–127). New York: Springer. Trainor, L. J., & Hannon, E. E. (2013). Musical development. In D. Deutsch (Ed.), The psychology of music (3rd ed., pp. 423–497). London: Academic Press. Trainor, L. J., & He, C. (2013). Auditory and musical development. In P. D. Zelazo (Ed.), The Oxford handbook of developmental psychology, Vol. 1: Body and mind (pp. 
310–337). Oxford: Oxford University Press. Trainor, L. J., Marie, C., Gerry, D., Whiskin, E., & Unrau, A. (2012). Becoming musically enculturated: Effects of music classes for infants on brain and behavior. Annals of the New York Academy of Sciences 1252, 129–138. Trainor, L. J., & Trehub, S. E. (1992). A comparison of infants’ and adults’ sensitivity to western musical structure. Journal of Experimental Psychology: Human Perception and Performance 18(2), 394–402. Trainor, L. J., & Trehub, S. E. (1994). Key membership and implied harmony in Western tonal music: Developmental perspectives. Perception & Psychophysics 56(2), 125–132. Trainor, L. J., & Unrau, A. (2012). Development of pitch and music perception. In L. Werner, R. R. Fay, & A. N. Popper (Eds.), Springer handbook of auditory research: Human auditory development (pp. 223–254). New York: Springer. Trainor, L. J., & Zatorre, R. J. (2015). The neurobiology of musical expectations from perception to emotion. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (2nd ed., pp. 285–306). Oxford: Oxford University Press. Trehub, S. E., & Thorpe, L. A. (1989). Infants’ perception of rhythm: Categorization of auditory sequences by temporal structure. Canadian Journal of Psychology/Revue canadienne de psychologie 43(2), 217–229. Trehub, S. E., & Trainor, L. J. (1998). Singing to infants: Lullabies and play songs. Advances in Infancy Research 12, 43–78. Trehub, S. E., Unyk, A. M., Kamenetsky, S. B., Hill, D. S., Trainor, L. J., Henderson, J. L., & Saraza, M. (1997). Mothers’ and fathers’ singing to infants. Developmental Psychology 33(3), 500–507. Trehub, S. E., Unyk, A. M., & Trainor, L. J. (1993). Adults identify infant-directed music across cultures. Infant Behavior and Development 16(2), 193–211.
Tunçgenç, B., & Cohen, E. (2016). Movement synchrony forges social bonds across group divides. Frontiers in Psychology 7, 782. Retrieved from https://doi.org/10.3389/fpsyg.2016.00782 Tunçgenç, B., Cohen, E., & Fawcett, C. (2015). Rock with me: The role of movement synchrony in infants’ social and nonsocial choices. Child Development 86(3), 976–984. Valdesolo, P., & DeSteno, D. (2011). Synchrony and the social tuning of compassion. Emotion 11(2), 262–266. Valdesolo, P., Ouyang, J., & DeSteno, D. (2010). The rhythm of joint action: Synchrony promotes cooperative ability. Journal of Experimental Social Psychology 46(4), 693–695. van Ede, F., Niklaus, M., & Nobre, A. C. (2017). Temporal expectations guide dynamic prioritization in visual working memory through attenuated α oscillations. Journal of Neuroscience 37(2), 437– 445. Van Noorden, L., & De Bruyn, L. (2009). The development of synchronization skills of children 3 to 11 years old. In Proceedings of ESCOM—7th Triennial Conference of the European Society for the Cognitive Sciences of Music. Jyväskylä, Finland: University of Jyväskylä. Volman, M. C. J., & Geuze, R. H. (1998). Relative phase stability of bimanual and visuomanual rhythmic coordination patterns in children with a developmental coordination disorder. Human Movement Science 17(4–5), 541–572. Warneken, F., & Tomasello, M. (2006). Altruistic helping in human infants and young chimpanzees. Science 311(5765), 1301–1303. Warneken, F., & Tomasello, M. (2007). Helping and cooperation at 14 months of age. Infancy 11(3), 271–294. Warneken, F., & Tomasello, M. (2009). Varieties of altruism in children and chimpanzees. Trends in Cognitive Sciences 13(9), 397–402. Watemberg, N., Waiserberg, N., Zuk, L., & Lerman-Sagie, T. (2007). Developmental coordination disorder in children with attention-deficit-hyperactivity disorder and physical therapy intervention. Developmental Medicine & Child Neurology 49(12), 920–925. 
Weimerskirch, H., Martin, J., Clerquin, Y., Alexandre, P., & Jiraskova, S. (2001). Energy saving in flight formation. Nature 413(6857), 697–698. Werker, J. F., & Tees, R. C. (2005). Speech perception as a window for understanding plasticity and commitment in language systems of the brain. Developmental Psychobiology 46(3), 233–251. Wieland, E. A., McAuley, J. D., Dilley, L. C., & Chang, S. E. (2015). Evidence for a rhythm perception deficit in children who stutter. Brain & Language 144, 26–34. Williams, H. G., Woollacott, M. H., & Ivry, R. (1992). Timing and motor control in clumsy children. Journal of Motor Behavior 24(2), 165–172. Williams, J., Thomas, P. R., Maruff, P., Butson, M., & Wilson, P. H. (2006). Motor, visual and egocentric transformations in children with developmental coordination disorder. Child: Care, Health and Development 32(6), 633–647. Wilmut, K., & Wann, J. (2008). The use of predictive information is impaired in the actions of children and young adults with developmental coordination disorder. Experimental Brain Research 191(4), 403–418. Wilson, P. H., Ruddock, S., Smits-Engelsman, B., Polatajko, H., & Blank, R. (2013). Understanding performance deficits in developmental coordination disorder: A meta-analysis of recent research. Developmental Medicine & Child Neurology 55(3), 217–228. Wiltermuth, S. S., & Heath, C. (2009). Synchrony and cooperation. Psychological Science 20(1), 1– 5. Winkler, I., Háden, G. P., Ladinig, O., Sziller, I., & Honing, H. (2009). Newborn infants detect the beat in music. Proceedings of the National Academy of Sciences 106(7), 2468–2471.
Wolff, P. H. (2002). Timing precision and rhythm in developmental dyslexia. Reading and Writing 15(1–2), 179–206. Woolhouse, M., Tidhar, D., Demorest, S., Morrison, S., & Campbell, P. (2010). Group dancing leads to increased person-perception. In Proceedings of the 11th International Conference on Music Perception and Cognition (pp. 605–608). Seattle, WA: University of Washington. Yoshida, K. A., Iversen, J. R., Patel, A. D., Mazuka, R., Nito, H., Gervain, J., & Werker, J. F. (2010). The development of perceptual grouping biases in infancy: A Japanese–English cross-linguistic study. Cognition 115(2), 356–361. Zatorre, R. J., Chen, J. L., & Penhune, V. B. (2007). When the brain plays music: Auditory–motor interactions in music perception and production. Nature Reviews Neuroscience 8(7), 547–558. Zentner, M., & Eerola, T. (2010). Rhythmic engagement with music in infancy. Proceedings of the National Academy of Sciences 107(13), 5768–5773. Zwicker, J. G., Missiuna, C., & Boyd, L. A. (2009). Neural correlates of developmental coordination disorder: A review of hypotheses. Journal of Child Neurology 24(10), 1273–1281.
CHAPTER 25
MUSIC AND THE AGING BRAIN
LAURA FERRERI, ALINE MOUSSARD, EMMANUEL BIGAND, AND BARBARA TILLMANN
Introduction
Jonathan Swift wrote in Thoughts on Various Subjects: “Every man desires to live long, but no man would be old.” Getting older often brings a series of physical, cognitive, emotional, and social difficulties, which research in psychology and neuroscience is constantly trying to reduce. One of the main topics in research on aging is cognitive decline, namely the decrease in cognitive functioning associated with increasing age across the adult portion of the lifespan (Salthouse, 2016). Healthy older adults usually perform worse than young adults on numerous cognitive tasks, especially those involving memory processes and executive functions (e.g., Buckner, 2004; Van Petten et al., 2004). These behavioral outcomes have been linked to changes in brain structure and function, such as the shrinkage of cerebral regions (Raz et al., 2005), prefrontal reorganization (Cabeza, 2002; Cabeza, Anderson, Locantore, & McIntosh, 2002), and micro- and macro-structural alterations of brain connectivity (Fjell, Sneve, Grydeland, Storsve, & Walhovd, 2017). Furthermore, normal aging can be associated with difficulties in emotional and social domains, often reflected in late-life depression (Aziz & Steffens, 2013; Meltzer et al.,
1998), and social isolation and disconnectedness (Cornwell & Waite, 2009). These problems become particularly hard to address in pathological aging (i.e., dementia), where severe cognitive, motor, and emotional impairments dramatically affect patients’ and caregivers’ lives. Therefore, in a society that is aging rapidly (e.g., Alzheimer’s Disease International, 2015), it has become urgent to investigate the mechanisms that promote successful aging and to identify what can prevent, limit, and rehabilitate cognitive and emotional impairments.
In this framework, music emerges as a particularly promising stimulus. Able to stimulate the whole brain, and thus to modulate activity in brain areas involved in cognitive, motor, and emotional processes (Zatorre, 2005), music is increasingly considered a powerful tool for improving cognition while promoting well-being and social connection. Furthermore, the use of music for brain stimulation seems particularly appropriate in older adults, who can perform similarly to younger adults in music perception tasks (Halpern, Bartlett, & Dowling, 1995, 1998; Johnson et al., 2011) and who show well-preserved musical memory, even in cases where episodic memory is impaired (Baird & Samson, 2009; Cuddy, Sikka, & Vanstone, 2015).
In this chapter, we review the main studies on music and aging to present the beneficial effects of music on cognition and well-being. To this end, we first focus on the effects of music in normal aging, in terms of both musical expertise and simple musical exposure. A specific section is then devoted to the underlying brain processes. Finally, we consider music-based therapeutic approaches in pathological aging.
N
A P
: M W
C -B
Music is a complex stimulus capable of modifying cognitive function and emotional status. Here, we consider (1) the positive effects of music on cognition in normal aging, contrasting different levels of musical activity, such as musical expertise, short-term training, and passive musical exposure, and (2) the effects of music on emotion and well-being. We then discuss the potential neural substrates of such effects.
Cognition
The Effect of Musical Expertise
Music is particularly able to promote neural plasticity, namely the unique ability of the human brain to modify its structure and function throughout the life course in response to changes within the body or in the external environment. Musical expertise greatly promotes plastic changes in the brain (Dalla Bella, 2016), and several studies comparing young adult musicians to non-musicians have demonstrated a positive effect of music-driven brain plasticity on many cognitive processes (see Wan & Schlaug, 2010, for review). Encouraged by these promising findings in younger adults, researchers have more recently started to explore whether, and to what extent, these positive effects of musical practice can also be observed in the elderly, and thus whether practicing music could help reduce typical age-related cognitive decline. Remarkably, older musicians have been shown to perform better than non-musicians in several behavioral tasks involving speech perception (Parbery-Clark, Strait, Anderson, Hittner, & Kraus, 2011; Parbery-Clark, Anderson, Hittner, & Kraus, 2012), working memory (Amer, Kalender, Hasher, Trehub, & Wong, 2013; Hanna-Pladdy & Gajewski, 2012; Parbery-Clark et al., 2011), long-term memory (Gooding, Abner, Jicha, Kryscio, & Schmitt, 2014), language (Hanna-Pladdy & MacKay, 2011), visuospatial processing (Hanna-Pladdy & Gajewski, 2012), and executive functions, such as planning, fluency, inhibition, and switching abilities (Amer et al., 2013; Hanna-Pladdy & Gajewski, 2012; Moussard, Bermudez, Alain, Tays, & Moreno, 2016; Parbery-Clark et al., 2011; Zendel & Alain, 2012). These behavioral differences are supported by differences at the neural level. For example, Moussard et al.
(2016) recently showed that during a go/no-go task, older musicians performed better than older non-musicians in response inhibition and showed increased amplitudes of the N2 and P3 components, electrophysiological signatures typically associated with cognitive control processes in this type of task.
The observed beneficial effects of musical expertise depend on several factors, such as onset of musical training, proficiency, and amount of formal training or of current practice, which may have specific influences on different aspects of cognition. For example, Hanna-Pladdy and MacKay
(2011) analyzed the predictors of musicians’ cognitive performance using stepwise multiple regression, and reported that the onset of training predicted the improvement of visual memory performance (both immediate and delayed). Current musical activity was more closely linked to switching abilities, while total years of practice were linked to delayed visual memory performance and visuomotor speed. When comparing three groups of older adults with low, medium, and high musical expertise (defined by a questionnaire assessing music theory knowledge), Gooding et al. (2014) found that the high proficiency group performed significantly better in an episodic memory task than the group with lower musical proficiency. Furthermore, formal musical training in early to mid-life might promote neural plasticity and lead to structural changes that potentially improve cognitive performance over time. This hypothesis is supported by brain findings showing that even a moderate amount of music training (4 to 14 years) is associated with neural enhancement during speech perception many decades after training has stopped, indicating that musical practice during childhood and young adulthood may carry relevant biological benefits into older adulthood (White-Schwoch, Carr, Anderson, Strait, & Kraus, 2013).
These beneficial effects of musical training across the lifespan in healthy individuals raise the question of whether such benefits might also affect the development of neurodegenerative diseases typical of pathological aging, such as Alzheimer’s disease (AD). According to the concept of cognitive reserve (Stern, 2002, 2009), stimulating life experiences (e.g., educational attainment, occupational activity, leisure activities, social network) are associated with higher resilience to age-related brain diseases and with a reduced risk of developing dementia.
As music is known to be cognitively stimulating and to promote social integration, it is reasonable to think that it could contribute to building such reserve throughout the life course. In line with these predictions, Wilson and colleagues (Wilson, Boyle, Yang, James, & Bennett, 2015) found that musical training during youth was associated with lower risk of developing mild cognitive impairment (MCI) in a cohort of 964 older adults. Interestingly, similar benefits were found for high levels of foreign language instruction (see Craik, Bialystok, & Freedman, 2010). An interesting approach linking music to cognitive reserve comes from twin studies. When comparing twins (both monozygotic and dizygotic) who played a musical instrument in older adulthood to their nonmusician co-twins, Balbag and colleagues (Balbag, Pedersen, & Gatz,
2014) found that the musician twins were less likely to develop dementia and cognitive impairment, supporting the idea that music may act as a protective factor against normal and pathological cognitive decline. In the same vein, Verghese et al. (2003) showed that frequent participation in musical activities was associated with decreased risk for dementia (but see also Verghese et al., 2006, for more mitigated results regarding risks of MCI, where only the frequency of participation in leisure activities, but not the type of activity per se, is associated with beneficial outcomes). Although the findings described above appear quite promising, we must mention that the interpretation of differences between musicians and nonmusicians, as measured by correlational designs, is always restricted by several limitations: (1) The actual contributions of “nature and nurture” to both musical skills and cognitive differences between musicians and nonmusicians, and thus the “cause or consequence” pattern linking music practice and cognition, need to be clarified. It could be, for example, that individuals who started musical training had higher levels of cognitive functioning already at baseline, and this could, at least in part, explain cohort differences. In the same vein, it is also possible that those who keep playing music at an older age do so because their cognitive abilities are better preserved, and thus they can better handle music practice. (2) People practicing leisure activities, such as music, usually report a more socially integrated lifestyle and/or a higher education level and socio-economic status, which are confounding variables that can crucially modulate the aging process. 
(3) Cross-sectional studies are not very informative about the minimum amount of musical practice needed to observe a positive effect of musical training on cognition, which in turn limits the recommendations that can be given to adults who did not practice music earlier in life: in other words, is it still worth starting musical practice in late adulthood? In sum, to provide causal evidence for music-induced benefits in the older adult population, intervention or longitudinal studies are necessary. Although specifically pinpointing the long-lasting effect of musical training across the entire life course would require a huge (and obviously hardly feasible) longitudinal study, intervention studies measuring the effects of short-term musical practice bring highly valuable insights into music-induced changes.
The Effect of Short-Term Musical Training
Studies on children and young adults have shown that even short-term musical practice is able to modulate behavioral outcomes and brain structures. For example, adult non-musicians who learned to play a five-finger sequence on a keyboard over only five days (Pascual-Leone, 2001) or who were trained to play piano and read musical notation in fifteen weeks (Stewart et al., 2003) showed better behavioral performance (such as reading music and playing keyboard) together with cortical reorganization (e.g., increased activation of the superior parietal cortex, critical for sensorimotor integration). Importantly, although brain plasticity mechanisms are reduced in older populations, environmental enrichment can strongly modulate and slow down such reduction (see Mora, Segovia, & Del Arco, 2007). Accordingly, short-term musical interventions have also led to beneficial effects in older adult populations. Bugos and colleagues (Bugos, Perlstein, McCrae, Brophy, & Bedenbaugh, 2007) found that six months of piano lessons (a 30-minute lesson and three hours of practice per week) were enough to significantly improve performance on cognitive tests involving motor skills, executive functions, and working memory, in a group of nonmusician adults aged between 60 and 85, compared to a non-active control group. Similar results were obtained by Seinfeld and colleagues (Seinfeld, Figueroa, Ortiz-Gil, & Sanchez-Vives, 2013) after four months of daily piano training. In this study, an effort was made to quantify leisure activities practiced by individuals from the control group (e.g., physical activity, computer, language or painting lessons, etc.), to ensure that they at least received some degree of stimulation as well, suggesting that adding music training to people’s regular activities may make a difference. 
Furthermore, in line with previous findings, results from Alain and colleagues (Alain et al., 2019) show that three months of music lessons involving rhythmic training with percussive instruments, music theory, and singing were sufficient to show modulation in the electrophysiological response during various cognitive tasks in older adults. It is, however, important to mention some methodological limitations of those studies. (1) Sample sizes are quite small (e.g., experimental group n = 16, Bugos et al., 2007; n = 13, Seinfeld et al., 2013) and participants have particularly high levels of education, which does not necessarily allow for generalization of results to the whole population of older adults. (2) Appropriate control conditions (i.e., active control with similar amount or
frequency of activities), as well as random assignment design, would be required to be able to draw solid conclusions on the effects of such training programs. (3) Long-term follow-up would be informative about the long-lasting effects of such interventions, and crucial to provide evidence for music-driven reduction of age-related decline over time and of dementia prevalence. Despite these limitations, the findings represent promising data to suggest that even short-term musical training, reflected in intensive training programs requiring motor, multisensory, and cognitive integration, may be able to induce changes at behavioral and neural levels, and to strengthen cognition in the elderly. This is particularly encouraging as it suggests that music could potentially contribute to the development of cognitive reserve through late-life (and not only lifelong) stimulation.
The Effect of Passive Musical Exposure
Another set of studies has shown that simply listening to music could also temporarily enhance cognition. For example, simple passive exposure to background music during cognitive tasks has been shown to improve older adults’ cognitive processes, such as word fluency (Thompson, Moulin, Hayre, & Jones, 2005), processing speed (Bottiroli, Rosi, Russo, Vecchi, & Cavallini, 2014), working memory (Mammarella, Fairfield, & Cornoldi, 2007), and declarative memory (Bottiroli et al., 2014; Ferreri et al., 2014) (see El Haj, Omigie, & Clément, 2014, for an exception). Bottiroli et al. (2014) measured declarative memory and processing speed in a group of older adults listening to negatively and positively valenced classical music (compared to control conditions of silence and white noise). Their results showed that processing speed was better with a musical background of positive valence, and that declarative memory benefited from both negative and positive background music when compared to the control conditions. In a study investigating episodic memory and prefrontal cortex modulation by music, Ferreri et al. (2014) found that encoding words with a musical (i.e., instrumental jazz/blues) background (versus silence) resulted in higher memory performance and less engagement of the prefrontal cortex. The authors suggested that music may act as a facilitating factor for memory performance by easing the verbal encoding and disengaging the prefrontal cortex, whose impairments in older adults are usually related to episodic
memory deficits. In contrast with studies presented above, where musical activity is thought to strengthen cognitive processes, cognitive enhancement during or following music listening is considered as a temporary enhancement of the efficiency of brain processes in general, probably due to global arousal and emotional effects (see section “The Underpinning Brain Mechanisms”). These results, together with the ones on musical training discussed earlier, lead to three main conclusions: (1) lifelong musical expertise seems to reduce cognitive decline, possibly by contributing to building some form of cognitive reserve; (2) musical interventions at an old age are able to improve cognitive functions in the absence of prior musical expertise, suggesting a positive effect of late-life training; (3) as both active music training and passive music listening can result in improved cognitive performance, it is reasonable to think that numerous and diverse neural mechanisms are prompted by music. The next two sections present studies investigating the effects of music on non-cognitive domains, like emotions and well-being, as well as the potential brain mechanisms implicated in cognitive and emotional effects of music in the elderly.
Emotions and Well-Being
Difficulties in emotional and social spheres, which can lead to depression and loneliness, threaten older adults’ well-being (see Luanaigh & Lawlor, 2008). These difficulties might be due to intense life changes (e.g., retirement, bereavement), cognitive and physical decline, hormonal changes, or changes in social relationships. Music has the tremendous power to evoke strong emotions and intense pleasure, thus modulating subjective mood and arousal (see Koelsch, 2014). Emotions evoked by music are associated with physiological responses (such as changes in skin conductance and heart rate or hormone release) and brain activation in the mesolimbic systems (see Zatorre & Salimpoor, 2013). Furthermore, musical activities often involve social functions promoting social contact, empathy, cooperation, and a sense of belonging with others (Koelsch, 2014). Critically, the influence of music in everyday life (Cohen, Bailey, & Nilsson, 2002; Laukka, 2007) as well as its emotional impact (Saarikallio, 2011) seem to
keep high levels of importance across the lifespan and across cultures (Grau-Sánchez et al., 2017). Older adults report that music helps them in connecting with other people, developing self-identity and increasing self-esteem, thus increasing quality of life and decreasing feelings of isolation and loneliness (see Grau-Sánchez et al., 2017). Furthermore, music, as an emotional stimulus, can be used as an aid for expressing and experiencing spirituality and escaping from everyday life through imagination or the evocation of autobiographical memories (Hays & Minichiello, 2005). A questionnaire administered to older adults living in Sweden (Laukka, 2007) highlighted that positive emotions were among the most frequently felt emotions in response to music. Furthermore, participants reported that they listened to music in response to psychological needs, such as emotional functions (e.g., mood regulation, relaxation, pleasure), and needs related to issues of identity, belonging, and agency. Notably, both positive emotions and satisfaction of psychological needs are crucial factors for well-being (Deci & Ryan, 2000; Diener, Oishi, & Lucas, 2003). In support of the hypothesis that music can modulate arousal and improve quality of life, Lai and Good (2005) showed that listening to forty-five minutes of relaxing music at bedtime improves sleep quality, duration, and efficiency, thus reducing daytime dysfunctions in a group of older adults with sleeping disorders (see also Luanaigh & Lawlor, 2008). In line with these findings, Chan and colleagues (Chan, Chan, Mok, Tse, & Yuk, 2009) found that listening to music before going to sleep significantly reduces older adults’ depression scores, together with heart rate, blood pressure, and respiratory rate levels. However, these studies usually compare a musical intervention with an untreated control group. 
It could be therefore that the absence of a proper control (i.e., testing another type of active intervention) can affect the correct interpretation of the results. Beyond music listening, practicing a musical activity can significantly contribute to health and emotional benefits (Hallam & Creech, 2016; see also Menec, 2003). Piano lessons in older non-musicians (Seinfeld et al., 2013) and participation in community choirs (Kreutz, Bongard, Rohrmann, Hodapp, & Grebe, 2004; Lamont, Murray, Hale, & Wright-Bevans, 2018) resulted in lower levels of depression, as well as increased positive mood states, quality of life, and social interactions when compared to other leisure
activities (e.g., painting or physical exercises) or to the subjective state before the musical interventions. In sum, music can strongly modulate not only cognitive performance, but also emotions, mood, and arousal in older adults, thus improving well-being and social connections. This calls for a deeper understanding of the brain mechanisms involved in the positive effects of music in aging.
The Underpinning Brain Mechanisms
The reviewed studies on cognition and emotion suggest that the positive effect of music on aging might be due to the stimulation of numerous brain processes, and several mechanisms may explain the observed benefit. Here, we examine several possible and interconnected explanations. First, we focus on the overlap between musical and non-musical activities and the idea that the creation or strengthening of shared networks promotes so-called “transfer effects” of music practice on non-musical tasks. In the context of aging, this hypothesis will be linked to the concept of cognitive reserve. The second hypothesis will focus on music-induced physiological effects, i.e., mood and reward modulations due to the emotional or arousing aspects of music. Musical and non-musical activities share cognitive processes. For example, an fMRI study from Sammler et al. (2010) revealed that speech and songs are processed at varying degrees of integration along the axis of the superior temporal sulcus and gyrus and the precentral gyrus, indicating that language and music share numerous features reflected in the observed overlap in brain circuitry (see also Patel, 2011; Patel, Gibson, Ratner, Besson, & Holcomb, 1998; Tillmann, 2012 for similarities between music and language). Similarly, brain regions activated by rhythmic auditory stimulation (e.g., music with regular beats), such as the basal ganglia and the cerebellar-thalamocortical networks, are also crucial for the control of motor functioning (Dalla Bella, Benoit, Farrugia, Schwartze, & Kotz, 2015). The observation that several brain regions involved in music perception and production are also involved in other functions may explain the positive transfer of training-related benefits to non-musical domains, such as movement, language, memory, and executive functions (see
Schellenberg, 2003). This becomes of particular interest in aging, where being engaged in stimulating activities throughout the lifespan contributes to building cognitive reserve. Cognitive reserve has been proposed as a model to explain inter-individual differences in the severity of cognitive aging and clinical dementia (Whalley, Deary, Appleton, & Starr, 2004) and is thought to act through two main mechanisms: (1) strengthening existing networks, making them more robust and resilient to age-related disruptions; (2) increasing brain flexibility and facilitating the recruitment of new or alternative networks, thus allowing compensatory mechanisms to achieve a task despite a potential disruption. Based on studies reviewed above, it is reasonable to think that music could promote cognitive reserve through these two mechanisms and thus act as a “neuroprotector” (Omigie & Samson, 2014). This is supported, for example, by findings revealing that musicians, in comparison to non-musicians, show less of the typical age-related brain volume reductions in dorsolateral prefrontal cortex and left inferior frontal gyrus (Sluming et al., 2002). Such hypotheses could also explain the behavioral advantage shown by musicians in cognitive tasks. A second explanation focuses on the music-related changes in emotions, physiological arousal, and pleasure, usually associated with increased activation in a core emotion network including the amygdala and hippocampus, and in mesolimbic striatal regions associated with dopamine release, especially the nucleus accumbens (Salimpoor et al., 2013). Modulations of activation in these regions can be associated with emotional and cognitive changes. Indeed, music promotes positive emotions and modulates the level of relaxation, pleasure, and motivation, thus increasing well-being and social connection (see section “Emotions and Well-Being” above). 
Modulation in arousal, such as reduced levels of anxiety and agitation, can also enhance attention mechanisms crucial for cognitive performance (Peck, Girard, Russo, & Fiocco, 2016). Likewise, as emotionally valenced stimuli are usually easier to remember (see Christianson, 2014), the music-driven high emotional intensity may promote the encoding and retrieval of the to-be-remembered information, thus resulting in increased memory performance even for the elderly (see Jäncke, 2008). Furthermore, dopamine transmission in the mesolimbic reward system promotes memory formation in the hippocampus through the ventro-tegmental area/substantia nigra–hippocampal loop (Lisman, Grace, & Duzel, 2011), and could therefore
drive the music-related increased performance in learning and memory tasks (Ferreri & Rodriguez-Fornells, 2017). Interestingly, such stimulating and emotional power of music may induce physiological-cellular changes and promote brain plasticity (see Dalla Bella, 2016; Wan & Schlaug, 2010), thus suggesting that music can not only strengthen and protect the existing networks, but also stimulate and reorganize them through plastic changes. This is particularly relevant when considering the decrease in plastic changes during the lifespan (Stiles, 2000). Furthermore, as plastic changes are related to neural myelination (Gibson et al., 2014), it may be that music stimulation increases and preserves not only gray matter volume, but also brain connectivity, usually impaired by aging processes (Fjell et al., 2017). In sum, the benefits of music on aging could be explained by some overlap in brain resources and changes in mood and arousal promoting the strengthening of existing networks and stimulating brain plasticity. This ultimately points at music as a powerful tool to fight against aging-related emotional and cognitive impairments. Although further research is needed to specifically pinpoint all the brain mechanisms explaining the positive effect of music on the aging brain, fundamental research in neuroscience has substantially increased our understanding of music-related changes on behavioral and neural levels. The observation that diverse music-related neural processes are involved in aging suggests that musical interventions can be employed to stimulate a broad range of impaired functions in aging. The time is ripe to further investigate and develop new perspectives for neuroscience-informed rehabilitation techniques.
P I
A
: N R
T
-
M
Dementia refers to progressive and irreversible neurodegenerative brain damage that constitutes the most frequent form of pathological aging. It represents one of the greatest health, social, and economic challenges of our time, with 46 million people living with dementia worldwide, and a
projection of over 131 million people in 2050 (Alzheimer’s Disease International, 2015). Numerous musical interventions in the clinical field have focused on dementia patients, in particular those suffering from AD, the most common form of dementia. A relevant factor justifying musical interventions in dementia is that abilities such as music perception and musical memory may be relatively spared in patients suffering from dementia (see Baird & Samson, 2009, 2015 for reviews). For example, a recent study on music perception reported that AD patients often perform as well as healthy controls in basic perceptual skills, such as temporal and timbre processing, musical scene analysis or tune recognition (Golden et al., 2017). In addition, musical memory has been shown to be spared in dementia patients despite severe memory deficits, for example allowing patients with severe AD to learn and recall new songs (Baird, Umbach, & Thompson, 2017b). Therefore, music remains accessible to most patients regardless of cognitive integrity, and thus constitutes a unique stimulation tool, in addition to being an especially appropriate means of communication between patients and caregivers (Ogay, 1995). Several music-based therapeutic approaches have been employed depending on the stage of dementia and the therapeutic goals. For example, music has been used as a mnemonic for the support of encoding or retrieval of information (early stage), as reminiscence therapy for improving episodic memory (early to moderate stage), as an arousing/calming approach for apathy, anxiety, or aggressiveness (moderate to severe stage), and as a unique and rare way to communicate with patients (severe stage). Here, in light of the brain mechanisms previously considered, and focusing specifically on dementias, we review and discuss music-based therapeutic approaches according to different rehabilitative goals: memory, language, movement, and emotions and well-being.
Memory
Because of degeneration in medio-temporal and prefrontal lobes, encoding and retrieving information is one of the most affected abilities in AD dementia and of particular interest for music interventions. Several findings revealed that dementia patients with severe memory deficits can show a
surprisingly robust musical memory (see Baird & Samson, 2015). A possible neuroanatomical explanation is that brain regions specifically involved in musical memory, such as the caudal anterior cingulate and the ventral pre-supplementary motor area, are also regions relatively spared in the first stages of AD (Jacobsen et al., 2015). The question thus arises as to whether music can be used to promote the encoding and retrieval of nonmusical material in dementia patients. It has been shown that presenting verbal information in a sung, rather than spoken version, improves its later recognition after an immediate (Simmons-Stern, Budson, & Ally, 2010; Simmons-Stern et al., 2012) or delayed (i.e., four weeks, Moussard, Bigand, Belleville, & Peretz, 2012, 2014a) recall at a mild stage of AD. This supports the hypothesis that music may act as a good anchor point for verbal information, in turn enhancing its retrieval (see Ferreri & Verga, 2016). Different results come from Baird and colleagues (Baird, Samson, Miller, & Chalmers, 2017a), where no beneficial effects of the sung versus spoken modality were observed on subsequent immediate (30 min) and delayed (24 hours) recall. These contrasting findings might be due to experimental differences. As discussed by Baird et al. (2017a), using only one session of learning, rather than several learning sessions over weeks (Moussard et al., 2012, 2014a), may have hindered revealing the potential benefit of music. Thus, it is possible that music could represent an efficient mnemonic to learn or relearn information in early stages of AD, but only if there are sufficient time and learning trials allocated to encoding this information. Other findings based on non-verbal memory tasks support music-related memory improvement in dementia. 
In another study by Moussard and colleagues (Moussard, Bigand, Belleville, & Peretz, 2014b), mild AD participants were asked to learn a sequence of gestures in synchrony (i.e., shadowing the experimenter) or not (i.e., observing the experimenter), performed with music or a metronome beat. Results showed that AD patients learned better in the music condition when tested after an immediate (but not delayed, i.e., 10 min) recall. Several studies also showed that music facilitates the recall of autobiographical (i.e., personal) memories in people suffering from AD. El Haj and colleagues (El Haj, Fasotti, & Allain, 2012a; El Haj, Clément, Fasotti, & Allain, 2013) found higher quality (in terms of speed of recall, content specificity, and grammatical complexity) of autobiographical memories when recalled after having
listened to music. These studies compared music (selected by the patients or by the experimenter) to a silent control condition. Critically, Foster and Valentine (2001) showed that a beneficial effect of background sound on autobiographical memory for recent (but not for remote) events was also observed for cafeteria noise (instead of music). Hence, it could be argued that not the music itself, but rather a more general auditory stimulation drives the observed positive effect. However, in a further study, El Haj and colleagues (El Haj, Postal, & Allain, 2012b) found better autobiographical memories after exposure to patient-selected music compared to another musical condition (i.e., Vivaldi’s Four Seasons), thus supporting the pivotal and probably particularly arousing role of personally relevant music stimulation (see also Lord & Garner, 1993). Remarkably, most of these studies also reported an emotional component: in AD patients, autobiographical memories triggered by preferred music were rated higher in emotional content (El Haj et al., 2012a), and with a prevalence of positive over negative content (El Haj et al., 2012b), than memories retrieved during silence or with music chosen by the experimenter. Consistent with the hypothesis that music-driven modulations in emotions can enhance memory (Jäncke, 2008), these findings suggest that music may enhance autobiographical recall by promoting positive emotional memories (El Haj et al., 2012b; see Irish et al., 2006, for an alternative explanation based on anxiety reduction).
Language
While numerous studies have investigated the role of music in memory in dementia, few investigations exist in the language domain. Findings from studies investigating non-degenerative conditions related to brain trauma or vascular problems, such as stroke, suggest that music can be a useful tool for language rehabilitation. Melodic intonation therapy (MIT), a speech therapy based on the potential rehabilitative effects of singing and rhythmic movement, has been described as a valid intervention for improving post-stroke aphasia in younger and older adults (e.g., Belin et al., 1996; Racette, Bard, & Peretz, 2006; see Zumbansen, Peretz, & Hébert, 2014 for a review). Consistent with the hypothesis that music stimulation may enhance cognitive functions by promoting brain connectivity, Schlaug and
colleagues (Schlaug, Marchina, & Norton, 2009) showed an increased number of fibers and volume of the arcuate fasciculus when comparing patients’ brains pre- and post-MIT treatment. Crucially, the arcuate fasciculus is a white matter tract connecting several regions involved in language production, such as superior temporal lobe, premotor regions and posterior inferior frontal gyrus, and primary motor cortex. Although to our knowledge no study has explored MIT interventions in dementia care, positive results of music interventions on language function have also been found in AD patients. Brotons and Koger (2000) observed that four sessions of music therapy (i.e., listening to songs related to a conversation topic) significantly improved both speech content and fluency in a group of ten AD patients. However, considering the small sample size of the study and the lack of similar investigations, further research is needed to better understand the effect of musical interventions on language in dementia. This becomes particularly relevant when considering the progressive loss of language functioning usually observed in dementias, such as AD.
Motor Functions
Although findings on AD patients suggest that music may help motor functions by enhancing gesture learning (Moussard et al., 2014b), the investigation of motor rehabilitation through music mainly concerns the study of patients with post-stroke motor impairment or Parkinson’s disease (PD). PD is characterized by severe movement impairments (e.g., bradykinesia or akinesia, limb rigidity and postural instability) that usually cause gait disorders, such as small steps, lower cadence and reduced gait speed, festination and freezing (Dalla Bella et al., 2015). Several studies clearly revealed that rhythm, and more specifically isochronous stimulation through metronome or music (i.e., with its underlying beats), can significantly improve gait (Thaut et al., 1996; see De Dreu, Van Der Wilk, Poppe, Kwakkel, & Van Wegen, 2012 and Dalla Bella et al., 2015 for meta-analysis and review). In particular, rhythmic auditory stimulation (i.e., when PD patients are asked to walk along with the auditory cue) has been shown to improve gait velocity, cadence, and stride length (e.g., McIntosh, Brown,
Rice, & Thaut, 1997; Thaut et al., 1996) and reduce freezing episodes (e.g., Arias & Cudeiro, 2010). Such improvements have been shown to persist also in the absence of the auditory stimulation following cuing-based training programs, thus leading to long-lasting effects (see Dalla Bella et al., 2015). Importantly, rhythmic auditory stimulation has been shown to be an effective intervention for gait rehabilitation not only in PD patients, but also in subjects with hemi-paretic stroke (Thaut, McIntosh, & Rice, 1997; Thaut et al., 2007). One possible explanation for this beneficial effect relies on the fact that external auditory cues, by promoting neural entrainment, generate temporal expectations that make it possible to predict the occurrence of the next event (Jones, 1976; Large, 2008). Such rhythm-driven expectations can help in regularizing and stabilizing movement when the sensorimotor network underlying temporal processing is impaired, as in PD or stroke patients, by reinforcing compensatory neural networks able to enhance motor behavior (Benoit et al., 2014; Nombela, Hughes, Owen, & Grahn, 2013).
Emotions and Well-Being
While the impact of music on emotion and well-being is important for normal aging, it becomes crucial in pathological aging. Problems like apathy, anxiety, depression, and agitation are indeed major challenges in dementia care. Apathy is a behavioral and psychological symptom of dementia. Unlike depression, it is characterized by a diminution in initiative and engagement in activities, and usually correlates with a decrease in cognitive functioning (Levy et al., 1998) and dopamine transmission (David et al., 2008). Consistent with the changes in physiological arousal and dopamine transmission observed in response to music stimulation, musical interventions have been shown to significantly reduce apathy levels in patients with dementia (Holmes, Knights, Dean, Hodkinson, & Hopkins, 2006). In the lay-audience documentary Alive Inside, this effect is famously illustrated with the case of Henri, who seems to be suddenly back to life after having been exposed to his favorite jazz music.
Several studies also showed that music interventions (such as listening to music, singing, or playing instruments, see Ueda, Suzukamo, Sato, & Izumi, 2013 for a review), especially if long-lasting (i.e., more than three months), can significantly decrease levels of anxiety and depression in people suffering from dementia (Clément, Tonini, Khatir, Schiaratura, & Samson, 2012; Guetin et al., 2009; Janata, 2012; Narme et al., 2014; Sakamoto, Ando, & Tsutou, 2013; Sung, Chang, Lee, & Lee, 2006; Svansdottir & Snaedal, 2006), in turn improving their quality of life (see also Stegemöller, 2017). For example, Sakamoto et al. (2013) found that both passive and interactive (i.e., clapping, singing, and dancing) music listening elicited positive emotional responses in severe AD patients (when compared to a non-intervention control group of patients). However, as previously discussed, a proper control condition is needed to establish an actual music-related positive effect. Two studies compared musical intervention to other pleasurable activities (i.e., cooking) in patients with AD or mixed dementia (Clément et al., 2012; Narme et al., 2014). While one study observed no differences in the short term, but a significantly more positive emotional state for the music group in the long term (Clément et al., 2012), the other study (Narme et al., 2014) revealed that both music and cooking interventions led to positive changes in the patients’ behavior and emotional state, also reducing caregiver distress. These results suggest that the positive effect of music on emotions and arousal reduction might be driven by the pleasantness of the activity rather than by the music itself (see Samson, Clément, Narme, Schiaratura, & Ehrlé, 2015 for methodological requirements for non-pharmacological clinical trials). 
This leads to two main considerations: (1) In light of the hedonic impact on patients’ well-being, it is important to note that music is probably one of the most pleasurable activities for humans. As such, music could be considered a special stimulus able to engage a broader range of the population than other types of interventions. (2) As discussed by Narme et al. (2014), the therapist’s preference for, and involvement in, the proposed interventions play a crucial role and can affect the therapeutic outcome. This calls for the employment of professional personnel (i.e., trained music therapists) able to exploit the full potential of music interventions and adapt them to each patient’s needs. As most of the studies investigating the effect of music on reduction in agitation compare musical interventions solely to standard care (see Baird & Samson, 2015; Narme et al., 2014), the existing research is probably not sufficient to claim a reliable effect of specific music stimulation, and calls for more controlled clinical trials. Nevertheless, a strong link between music and well-being in dementia seems to exist. Exploring the relationship between music and personal identity, McDermott and colleagues (McDermott, Orrell, & Ridder, 2014) recently collected interviews from patients with dementia, family carers, staff, and music therapists. Their analyses resulted in a model, the “psychosocial model of music in dementia,” in which music emerges as a powerful and reliable stimulus for promoting self-identity and personal connectedness, ultimately improving well-being (see Ogay, 1995 for music therapy and personality in dementia patients; see also Norberg, Melin, & Asplund, 2003 and Castro et al., 2015 for music interventions in the final stage of dementia and in patients with disorders of consciousness).
Conclusion and Future Directions
Research on normal and pathological aging has revealed that music is an interesting and powerful means for promoting cognition and well-being in older adult populations: music constitutes an enjoyable and social activity, accessible to anyone regardless of background (e.g., educational attainment, previous musical experience). The positive effect of music in aging seems to rely on diverse, complex, and interacting neural processes promoting brain plasticity, transfer, and compensatory mechanisms that improve behavioral outcomes at emotional, cognitive, and social levels. While numerous positive and encouraging results propose music as a unique tool for preventing cognitive decline and rehabilitating deficits related to neurodegenerative diseases, it is also worth mentioning that relevant limitations and open questions arise from the existing literature. As argued above, biases such as the lack of a proper control group or condition and small sample sizes may affect the power and reliability of the results. The inconsistencies in results across studies may also stem from methodological differences related to the type of musical interventions (e.g., active training, passive listening), their duration and frequency, as well as the type of music employed (familiar, unfamiliar, selected by the participant or the experimenter). More experimental rigor and systematic manipulations are therefore needed in further investigations, and would help to clarify the actual contribution of music (listening, practice) to the observed beneficial effects. Accordingly, it will be critical for further research to compare different types of music training with other stimulating leisure activities, and to provide evidence for the minimal optimal dose (i.e., the type, number, and duration of musical interventions) required for observing positive outcomes (perhaps depending on the targeted population, i.e., healthy older adults or patients). Furthermore, although numerous brain mechanisms are supposed to underlie the observed behavioral findings, only a few neuroimaging studies have been conducted on music and aging. Further research investigating the underlying neural mechanisms is therefore needed to better understand how music acts on the aging brain, thus allowing the design of more fine-grained musical interventions.
Clarifying these open questions may afford effective contributions not only to the research community but also, and most importantly, to society. Results on musical practice and its long-term effects in diminishing cognitive decline indeed highlight the importance of facilitating music activities across the lifespan. The hypothesis that musical training may act as a neuroprotective factor would strongly support music classes in the educational system (see Kraus et al., 2014; White-Schwoch et al., 2013). In addition, this would offer a valid, pleasant, and affordable tool for stimulation and for the prevention of neurodegenerative diseases in the elderly population. Accordingly, results on the effectiveness of short-term musical exposure or practice at an older age and of musical interventions in clinical settings endorse the inclusion of music activities and music-therapeutic programs in healthcare, as a means of hindering or at least diminishing cognitive decline and decreasing pathological aging-related deficits.
In sum, music emerges as a powerful tool able to improve cognitive functions and well-being by acting on numerous brain processes in the older adult population. Nevertheless, other leisure activities (e.g., learning a second language, playing complex games) could also lead to similar positive effects. It therefore remains essential for future research and clinical applications to develop individualized protocols depending on patients’ needs (e.g., improvement of mood, memory) and personal preferences, and to compare the different domains.
References
Alain, C., Moussard, A., Singer, J., Lee, Y., Bidelman, G. M., & Moreno, S. (2019). Music and visual art training modulate brain activity in older adults. Frontiers in Neuroscience 13. Alzheimer’s Disease International (2015). World Alzheimer Report, 2015. The global impact of Dementia 2015. An analysis of prevalence, incidence, costs and trends. London: Alzheimer’s Disease International (ADI). Amer, T., Kalender, B., Hasher, L., Trehub, S. E., & Wong, Y. (2013). Do older professional musicians have cognitive advantages? PloS ONE 8(8), e71630. Arias, P., & Cudeiro, J. (2010). Effect of rhythmic auditory stimulation on gait in Parkinsonian patients with and without freezing of gait. PloS ONE 5(3), e9675. Aziz, R., & Steffens, D. C. (2013). What are the causes of late-life depression? Psychiatric Clinics of North America 36(4), 497–516. Baird, A., & Samson, S. (2009). Memory for music in Alzheimer’s disease: Unforgettable? Neuropsychology Review 19(1), 85–101. Baird, A., & Samson, S. (2015). Music and dementia. Progress in Brain Research 217, 207–235. Baird, A., Samson, S., Miller, L., & Chalmers, K. (2017a). Does music training facilitate the mnemonic effect of song? An exploration of musicians and nonmusicians with and without Alzheimer’s dementia. Journal of Clinical and Experimental Neuropsychology 39(1), 9–21. Baird, A., Umbach, H., & Thompson, W. F. (2017b). A nonmusician with severe Alzheimer’s dementia learns a new song. Neurocase 23(1), 36–40. Balbag, M. A., Pedersen, N. L., & Gatz, M. (2014). Playing a musical instrument as a protective factor against dementia and cognitive impairment: A population-based twin study. International Journal of Alzheimer’s Disease 8, 836748. doi:10.1155/2014/836748 Belin, P., Zilbovicius, M., Remy, P., Francois, C., Guillaume, S., Chain, F., … Samson, Y. (1996). Recovery from nonfluent aphasia after melodic intonation therapy: A PET study. Neurology 47(6), 1504–1511. Benoit, C.
E., Dalla Bella, S., Farrugia, N., Obrig, H., Mainka, S., & Kotz, S. A. (2014). Musically cued gait-training improves both perceptual and motor timing in Parkinson’s disease. Frontiers in Human Neuroscience 8. Retrieved from https://doi.org/10.3389/fnhum.2014.00494 Bottiroli, S., Rosi, A., Russo, R., Vecchi, T., & Cavallini, E. (2014). The cognitive effects of listening to background music on older adults: Processing speed improves with upbeat music, while memory seems to benefit from both upbeat and downbeat music. Frontiers in Aging Neuroscience 6. Retrieved from https://doi.org/10.3389/fnagi.2014.00284 Brotons, M., & Koger, S. M. (2000). The impact of music therapy on language functioning in dementia. Journal of Music Therapy 37(3), 183–195. Buckner, R. L. (2004). Memory and executive function in aging and AD: Multiple factors that cause decline and reserve factors that compensate. Neuron 44(1), 195–208. Bugos, J. A., Perlstein, W. M., McCrae, C. S., Brophy, T. S., & Bedenbaugh, P. H. (2007). Individualized piano instruction enhances executive functioning and working memory in older adults. Aging and Mental Health 11(4), 464–471. Cabeza, R. (2002). Hemispheric asymmetry reduction in older adults: The HAROLD model. Psychology and Aging 17(1), 85–100. Cabeza, R., Anderson, N. D., Locantore, J. K., & McIntosh, A. R. (2002). Aging gracefully: Compensatory brain activity in high-performing older adults. NeuroImage 17(3), 1394–1402. Castro, M., Tillmann, B., Luauté, J., Corneyllie, A., Dailler, F., André-Obadia, N., & Perrin, F. (2015). Boosting cognition with music in patients with disorders of consciousness.
Neurorehabilitation and Neural Repair 29(8), 734–742. Chan, M. F., Chan, E. A., Mok, E., Tse, K., & Yuk, F. (2009). Effect of music on depression levels and physiological responses in community-based older adults. International Journal of Mental Health Nursing 18(4), 285–294. Christianson, S. A. (Ed.). (2014). The handbook of emotion and memory: Research and theory. New York: Psychology Press. Clément, S., Tonini, A., Khatir, F., Schiaratura, L., & Samson, S. (2012). Short and longer term effects of musical intervention in severe Alzheimer’s disease. Music Perception: An Interdisciplinary Journal 29(5), 533–541. Cohen, A., Bailey, B., & Nilsson, T. (2002). The importance of music to seniors. Psychomusicology: A Journal of Research in Music Cognition 18(1–2), 89–102. Cornwell, E. Y., & Waite, L. J. (2009). Social disconnectedness, perceived isolation, and health among older adults. Journal of Health and Social Behavior 50(1), 31–48. Craik, F. I. M., Bialystok, E., & Freedman, M. (2010). Delaying the onset of Alzheimer disease: Bilingualism as a form of cognitive reserve. Neurology 75(19), 1726–1729. Cuddy, L. L., Sikka, R., & Vanstone, A. (2015). Preservation of musical memory and engagement in healthy aging and Alzheimer’s disease. Annals of the New York Academy of Sciences 1337, 223– 231. Dalla Bella, S. (2016). Music and brain plasticity. In S. Hallam, I Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (pp. 325–342). Oxford: Oxford University Press. Dalla Bella, S., Benoit, C. E., Farrugia, N., Schwartze, M., & Kotz, S. A. (2015). Effects of musically cued gait training in Parkinson’s disease: Beyond a motor benefit. Annals of the New York Academy of Sciences 1337, 77–85. David, R., Koulibaly, M., Benoit, M., Garcia, R., Caci, H., Darcourt, J., & Robert, P. (2008). Striatal dopamine transporter levels correlate with apathy in neurodegenerative diseases: A SPECT study with partial volume effect correction. 
Clinical Neurology and Neurosurgery 110(1), 19–24. De Dreu, M. J., Van Der Wilk, A. S. D., Poppe, E., Kwakkel, G., & Van Wegen, E. E. H. (2012). Rehabilitation, exercise therapy and music in patients with Parkinson’s disease: A meta-analysis of the effects of music-based movement therapy on walking ability, balance and quality of life. Parkinsonism & Related Disorders 18, S114–S119. Deci, E. L., & Ryan, R. M. (2000). The “what” and “why” of goal pursuits: Human needs and the self-determination of behavior. Psychological Inquiry 11(4), 227–268. Diener, E., Oishi, S., & Lucas, R. E. (2003). Personality, culture, and subjective well-being: Emotional and cognitive evaluations of life. Annual Review of Psychology 54, 403–425. El Haj, M., Clément, S., Fasotti, L., & Allain, P. (2013). Effects of music on autobiographical verbal narration in Alzheimer’s disease. Journal of Neurolinguistics 26(6), 691–700. El Haj, M., Fasotti, L., & Allain, P. (2012a). The involuntary nature of music-evoked autobiographical memories in Alzheimer’s disease. Consciousness and Cognition 21(1), 238–246. El Haj, M., Omigie, D., & Clément, S. (2014). Music causes deterioration of source memory: Evidence from normal ageing. Quarterly Journal of Experimental Psychology 67(12), 2381–2391. El Haj, M., Postal, V., & Allain, P. (2012b). Music enhances autobiographical memory in mild Alzheimer’s disease. Educational Gerontology 38(1), 30–41. Ferreri, L., Bigand, E., Perrey, S., Muthalib, M., Bard, P., & Bugaiska, A. (2014). Less effort, better results: How does music act on prefrontal cortex in older adults during verbal encoding? An fNIRS study. Frontiers in Human Neuroscience 8. Retrieved from https://doi.org/10.3389/fnhum.2014.00301 Ferreri, L., & Rodriguez-Fornells, A. (2017). Music-related reward responses predict episodic memory performance. Experimental Brain Research 235(12), 3721–31.
Ferreri, L., & Verga, L. (2016). Benefits of music on verbal learning and memory. Music Perception: An Interdisciplinary Journal 34(2), 167–182. Fjell, A. M., Sneve, M. H., Grydeland, H., Storsve, A. B., & Walhovd, K. B. (2017). The disconnected brain and executive function decline in aging. Cerebral Cortex 27(3), 2303–2317. Foster, N. A., & Valentine, E. R. (2001). The effect of auditory stimulation on autobiographical recall in dementia. Experimental Aging Research 27(3), 215–228. Gibson, E. M., Purger, D., Mount, C. W., Goldstein, A. K., Lin, G. L., Wood, L. S., … Monje, M. (2014). Neuronal activity promotes oligodendrogenesis and adaptive myelination in the mammalian brain. Science 344(6183), 1252304. Golden, H. L., Clark, C. N., Nicholas, J. M., Cohen, M. H., Slattery, C. F., Paterson, R. W., … Warren, J. D. (2017). Music perception in dementia. Journal of Alzheimer’s Disease 55(3), 933– 949. Gooding, L., Abner, E. L., Jicha, G. A., Kryscio, R. J., & Schmitt, F. A. (2014). Musical training and late-life cognition. Journal of Alzheimer’s Disease and Other Dementias 29, 333–343. Grau-Sánchez, J., Foley, M., Hlavová, R., Muukkonen, I., Ojinaga-Alfageme, O., Radukic, A., … Hundevad, B. (2017). Exploring musical activities and their relationship to emotional well-being in elderly people across Europe: A study protocol. Frontiers in Psychology 8. Retrieved from https://doi.org/10.3389/fpsyg.2017.00330 Guetin, S., Portet, F., Picot, M. C., Pommié, C., Messaoudi, M., Djabelkir, L., … Touchon, J. (2009). Effect of music therapy on anxiety and depression in patients with Alzheimer’s type dementia: Randomised, controlled study. Dementia and Geriatric Cognitive Disorders 28(1), 36–46. Hallam, S., & Creech, A. (2016). Can active music making promote health and well-being in older citizens? Findings of the music for life project. London Journal of Primary Care 8(2), 21–25. Halpern, A. R., Bartlett, J. C., & Dowling, W. J. (1995). 
Aging and experience in the recognition of musical transpositions. Psychology and Aging, 10(3), 325–342. Halpern, A. R., Bartlett, J. C., & Dowling, W. J. (1998). Perception of mode, rhythm, and contour in unfamiliar melodies: Effects of age and experience. Music Perception: An Interdisciplinary Journal 15(4), 335–355. Hanna-Pladdy, B., & Gajewski, B. (2012). Recent and past musical activity predicts cognitive aging variability: Direct comparison with general lifestyle activities. Frontiers in Human Neuroscience 6. Retrieved from https://doi.org/10.3389/fnhum.2012.00198 Hanna-Pladdy, B., & MacKay, A. (2011). The relation between instrumental musical activity and cognitive aging. Neuropsychology 25(3), 378–386. Hays, T., & Minichiello, V. (2005). The contribution of music to quality of life in older people: An Australian qualitative study. Ageing & Society 25(2), 261–278. Holmes, C., Knights, A., Dean, C., Hodkinson, S., & Hopkins, V. (2006). Keep music live: Music and the alleviation of apathy in dementia subjects. International Psychogeriatrics 18(4), 623–630. Irish, M., Cunningham, C. J., Walsh, J. B., Coakley, D., Lawlor, B. A., Robertson, I. H., & Coen, R. F. (2006). Investigating the enhancing effect of music on autobiographical memory in mild Alzheimer’s disease. Dementia and Geriatric Cognitive Disorders 22(1), 108–120. Jacobsen, J. H., Stelzer, J., Fritz, T. H., Chételat, G., La Joie, R., & Turner, R. (2015). Why musical memory can be preserved in advanced Alzheimer’s disease. Brain 138(8), 2438–2450. Janata, P. (2012). Effects of widespread and frequent personalized music programming on agitation and depression in assisted living facility residents with Alzheimer-type dementia. Music and Medicine 4(1), 8–15. Jäncke, L. (2008). Music, memory and emotion. Journal of Biology 7(6), 21. Johnson, J. K., Chang, C. C., Brambati, S. M., Migliaccio, R., Gorno-Tempini, M. L., Miller, B. L., & Janata, P. (2011). 
Music recognition in frontotemporal lobar degeneration and Alzheimer
disease. Cognitive and Behavioral Neurology: Official Journal of the Society for Behavioral and Cognitive Neurology 24(2), 74–84. Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review 83(5), 323–355. Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience 15(3), 170–183. Kraus, N., Slater, J., Thompson, E. C., Hornickel, J., Strait, D. L., Nicol, T., & White-Schwoch, T. (2014). Music enrichment programs improve the neural encoding of speech in at-risk children. Journal of Neuroscience 34(36), 11913–11918. Kreutz, G., Bongard, S., Rohrmann, S., Hodapp, V., & Grebe, D. (2004). Effects of choir singing or listening on secretory immunoglobulin A, cortisol, and emotional state. Journal of Behavioral Medicine 27(6), 623–635. Lai, H. L., & Good, M. (2005). Music improves sleep quality in older adults. Journal of Advanced Nursing 49(3), 234–244. Lamont, A., Murray, M., Hale, R., & Wright-Bevans, K. (2018). Singing in later life: The anatomy of a community choir. Psychology of Music 46(3), 424–439. Large, E. W. (2008). Resonating to musical rhythm: Theory and experiment. In S. Grondin (Ed.), The psychology of time (pp. 189–232). Bingley: Emerald Group. Laukka, P. (2007). Uses of music and psychological well-being among the elderly. Journal of Happiness Studies 8(2), 215–241. Levy, M. L., Cummings, J. L., Fairbanks, L. A., Masterman, D., Miller, B. L., Craig, A. H., … Litvan, I. (1998). Apathy is not depression. Journal of Neuropsychiatry and Clinical Neurosciences 10(3), 314–319. Lisman, J., Grace, A. A., & Duzel, E. (2011). A neoHebbian framework for episodic memory: Role of dopamine-dependent late LTP. Trends in Neurosciences 34(10), 536–547. Lord, T. R., & Garner, J. E. (1993). Effects of music on Alzheimer patients. Perceptual and Motor Skills 76(2), 451–455. Luanaigh, C. Ó., & Lawlor, B. A. (2008). Loneliness and the health of older people. 
International Journal of Geriatric Psychiatry 23(12), 1213–1221. McDermott, O., Orrell, M., & Ridder, H. M. (2014). The importance of music for people with dementia: The perspectives of people with dementia, family carers, staff and music therapists. Aging & Mental Health 18(6), 706–716. McIntosh, G. C., Brown, S. H., Rice, R. R., & Thaut, M. H. (1997). Rhythmic auditory-motor facilitation of gait patterns in patients with Parkinson’s disease. Journal of Neurology, Neurosurgery & Psychiatry 62(1), 22–26. Mammarella, N., Fairfield, B., & Cornoldi, C. (2007). Does music enhance cognitive performance in healthy older adults? The Vivaldi effect. Aging Clinical and Experimental Research 19(5), 394– 399. Meltzer, C. C., Smith, G., DeKosky, S. T., Pollock, B. G., Mathis, C. A., Moore, R. Y., … Reynolds, C. F. (1998). Serotonin in aging, late-life depression, and Alzheimer’s disease: The emerging role of functional imaging. Neuropsychopharmacology 18(6), 407–430. Menec, V. H. (2003). The relation between everyday activities and successful aging: A 6-year longitudinal study. Journals of Gerontology Series B: Psychological Sciences and Social Sciences 58(2), S74–S82. Mora, F., Segovia, G., & Del Arco, A. (2007). Aging, plasticity and environmental enrichment: Structural changes and neurotransmitter dynamics in several areas of the brain. Brain Research Reviews 55(1), 78–88.
Moussard, A., Bermudez, P., Alain, C., Tays, W., & Moreno, S. (2016). Life-long music practice and executive control in older adults: An event-related potential study. Brain Research 1642, 146–153. Moussard, A., Bigand, E., Belleville, S., & Peretz, I. (2012). Music as an aid to learn new verbal information in Alzheimer’s disease. Music Perception: An Interdisciplinary Journal 29(5), 521– 531. Moussard, A., Bigand, E., Belleville, S., & Peretz, I. (2014a). Learning sung lyrics aids retention in normal ageing and Alzheimer’s disease. Neuropsychological Rehabilitation 24(6), 894–917. Moussard, A., Bigand, E., Belleville, S., & Peretz, I. (2014b). Music as a mnemonic to learn gesture sequences in normal aging and Alzheimer’s disease. Frontiers in Human Neuroscience 8. Retrieved from https://doi.org/10.3389/fnhum.2014.00294 Narme, P., Clément, S., Ehrlé, N., Schiaratura, L., Vachez, S., Courtaigne, B., … Samson, S. (2014). Efficacy of musical interventions in dementia: Evidence from a randomized controlled trial. Journal of Alzheimer’s Disease 38(2), 359–369. Nombela, C., Hughes, L. E., Owen, A. M., & Grahn, J. A. (2013). Into the groove: Can rhythm influence Parkinson’s disease? Neuroscience & Biobehavioral Reviews 37(10), 2564–2570. Norberg, A., Melin, E., & Asplund, K. (2003). Reactions to music, touch and object presentation in the final stage of dementia: An exploratory study. International Journal of Nursing Studies 40(5), 473–479. Ogay, S. (1995). La maintenance de la personnalité du sujet dément par la musicothérapie. Psychologie médicale 27, 104–105. Omigie, D., & Samson, S. (2014). A protective effect of musical expertise on cognitive outcome following brain damage? Neuropsychology Review 24(4), 445–460. Parbery-Clark, A., Anderson, S., Hittner, E., & Kraus, N. (2012). Musical experience strengthens the neural representation of sounds important for communication in middle-aged adults. Frontiers in Aging Neuroscience 4. 
Retrieved from https://doi.org/10.3389/fnagi.2012.00030 Parbery-Clark, A., Strait, D. L., Anderson, S., Hittner, E., & Kraus, N. (2011). Musical experience and the aging auditory system: Implications for cognitive abilities and hearing speech in noise. PloS ONE 6(5), e18082. Pascual-Leone, A. (2001). The brain that plays music and is changed by it. Annals of the New York Academy of Sciences 930, 315–329. Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Frontiers in Psychology 2, 142. Retrieved from https://doi.org/10.3389/fpsyg.2011.00142 Patel, A. D., Gibson, E., Ratner, J., Besson, M., & Holcomb, P. J. (1998). Processing syntactic relations in language and music: An event-related potential study. Journal of Cognitive Neuroscience 10(6), 717–733. Peck, K. J., Girard, T. A., Russo, F. A., & Fiocco, A. J. (2016). Music and memory in Alzheimer’s disease and the potential underlying mechanisms. Journal of Alzheimer’s Disease 51(4), 949–959. Racette, A., Bard, C., & Peretz, I. (2006). Making non-fluent aphasics speak: Sing along! Brain 129(10), 2571–2584. Raz, N., Lindenberger, U., Rodrigue, K. M., Kennedy, K. M., Head, D., Williamson, A., … Acker, J. D. (2005). Regional brain changes in aging healthy adults: General trends, individual differences and modifiers. Cerebral Cortex 15(11), 1676–1689. Saarikallio, S. (2011). Music as emotional self-regulation throughout adulthood. Psychology of Music 39(3), 307–327. Sakamoto, M., Ando, H., & Tsutou, A. (2013). Comparing the effects of different individualized music interventions for elderly individuals with severe dementia. International Psychogeriatrics 25(5), 775–784.
Salimpoor, V. N., van den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J. (2013). Interactions between the nucleus accumbens and auditory cortices predict music reward value. Science 340(6129), 216–219. Salthouse, T. A. (2016). Theoretical perspectives on cognitive aging. New York: Routledge. Sammler, D., Baird, A., Valabrègue, R., Clément, S., Dupont, S., Belin, P., & Samson, S. (2010). The relationship of lyrics and tunes in the processing of unfamiliar songs: A functional magnetic resonance adaptation study. Journal of Neuroscience 30(10), 3572–3578. Samson, S., Clément, S., Narme, P., Schiaratura, L., & Ehrlé, N. (2015). Efficacy of musical interventions in dementia: Methodological requirements of nonpharmacological trials. Annals of the New York Academy of Sciences 1337, 249–255. Schellenberg, E. G. (2003). Does exposure to music have beneficial side effects? In R. Peretz & R. J. Zatorre (Eds.), The cognitive neuroscience of music (pp. 430–448). New York: Nova Science Press. Schlaug, G., Marchina, S., & Norton, A. (2009). Evidence for plasticity in white-matter tracts of patients with chronic Broca’s aphasia undergoing intense intonation-based speech therapy. Annals of the New York Academy of Sciences 1169, 385–394. Seinfeld, S., Figueroa, H., Ortiz-Gil, J., & Sanchez-Vives, M. V. (2013). Effects of music learning and piano practice on cognitive function, mood and quality of life in older adults. Frontiers in Psychology 4. Retrieved from https://doi.org/10.3389/fpsyg.2013.00810 Simmons-Stern, N. R., Budson, A. E., & Ally, B. A. (2010). Music as a memory enhancer in patients with Alzheimer’s disease. Neuropsychologia 48(10), 3164–3167. Simmons-Stern, N. R., Deason, R. G., Brandler, B. J., Frustace, B. S., O’Connor, M. K., Ally, B. A., & Budson, A. E. (2012). Music-based memory enhancement in Alzheimer’s disease: Promise and limitations. Neuropsychologia 50(14), 3295–3303. 
Sluming, V., Barrick, T., Howard, M., Cezayirli, E., Mayes, A., & Roberts, N. (2002). Voxel-based morphometry reveals increased gray matter density in Broca’s area in male symphony orchestra musicians. NeuroImage 17(3), 1613–1622. Stegemöller, E. L. (2017). The neuroscience of speech and language. Music Therapy Perspectives 35(2), 107–112. Stern, Y. (2002). What is cognitive reserve? Theory and research application of the reserve concept. Journal of the International Neuropsychological Society 8(3), 448–460. Stern, Y. (2009). Cognitive reserve. Neuropsychologia 47(10), 2015–2028. Stewart, L., Henson, R., Kampe, K., Walsh, V., Turner, R., & Frith, U. (2003). Brain changes after learning to read and play music. NeuroImage 20(1), 71–83. Stiles, J. (2000). Neural plasticity and cognitive development. Developmental Neuropsychology 18(2), 237–272. Sung, H. C., Chang, S. M., Lee, W. L., & Lee, M. S. (2006). The effects of group music with movement intervention on agitated behaviours of institutionalized elders with dementia in Taiwan. Complementary Therapies in Medicine 14(2), 113–119. Svansdottir, H. B., & Snaedal, J. (2006). Music therapy in moderate and severe dementia of Alzheimer’s type: A case-control study. International Psychogeriatrics 18(4), 613–621. Thaut, M. H., Leins, A. K., Rice, R. R., Argstatter, H., Kenyon, G. P., McIntosh, G. C., … Fetter, M. (2007). Rhythmic auditory stimulation improves gait more than NDT/Bobath training in nearambulatory patients early poststroke: A single-blind, randomized trial. Neurorehabilitation and Neural Repair 21(5), 455–459. Thaut, M. H., McIntosh, G. C., & Rice, R. R. (1997). Rhythmic facilitation of gait training in hemiparetic stroke rehabilitation. Journal of the Neurological Sciences 151(2), 207–212.
Thaut, M. H., McIntosh, G. C., Rice, R. R., Miller, R. A., Rathbun, J., & Brault, J. M. (1996). Rhythmic auditory stimulation in gait training for Parkinson’s disease patients. Movement Disorders 11(2), 193–200. Thompson, R. G., Moulin, C. J. A., Hayre, S., & Jones, R. W. (2005). Music enhances category fluency in healthy older adults and Alzheimer’s disease patients. Experimental Aging Research 31(1), 91–99. Tillmann, B. (2012). Music and language perception: Expectations, structural integration, and cognitive sequencing. Topics in Cognitive Science 4(4), 568–584. Ueda, T., Suzukamo, Y., Sato, M., & Izumi, S. I. (2013). Effects of music therapy on behavioral and psychological symptoms of dementia: A systematic review and meta-analysis. Ageing Research Reviews 12(2), 628–641. Van Petten, C., Plante, E., Davidson, P. S., Kuo, T. Y., Bajuscak, L., & Glisky, E. L. (2004). Memory and executive function in older adults: Relationships with temporal and prefrontal gray matter volumes and white matter hyperintensities. Neuropsychologia 42(10), 1313–1335. Verghese, J., LeValley, A., Derby, C., Kuslansky, G., Katz, M., Hall, C., … Lipton, R. (2006). Leisure activities and the risk of amnestic mild cognitive impairment in the elderly. Neurology 66(6), 821– 827. Verghese, J., Lipton, R. B., Katz, M. J., Hall, C. B., Derby, C. A., Kuslansky, G., … Buschke, H. (2003). Leisure activities and the risk of dementia in the elderly. New England Journal of Medicine 348(25), 2508–2516. Wan, C. Y., & Schlaug, G. (2010). Music making as a tool for promoting brain plasticity across the life span. The Neuroscientist 16(5), 566–577 Whalley, L. J., Deary, I. J., Appleton, C. L., & Starr, J. M. (2004). Cognitive reserve and the neurobiology of cognitive aging. Ageing Research Reviews 3(4), 369–382. White-Schwoch, T., Carr, K. W., Anderson, S., Strait, D. L., & Kraus, N. (2013). Older adults benefit from music training early in life: Biological evidence for long-term training-driven plasticity. 
Journal of Neuroscience 33(45), 17667–17674. Wilson, R. S., Boyle, P. A., Yang, J., James, B. D., & Bennett, D. A. (2015). Early life instruction in foreign language and music and incidence of mild cognitive impairment. Neuropsychology 29(2), 292–302. Zatorre, R. (2005). Music, the food of neuroscience? Nature 434(7031), 312–315. Zatorre, R. J., & Salimpoor, V. N. (2013). From perception to pleasure: Music and its neural substrates. Proceedings of the National Academy of Sciences 110(Suppl. 2), 10430–10437. Zendel, B. R., & Alain, C. (2012). Musicians experience less age-related decline in central auditory processing. Psychology and Aging 27(2), 410–417. Zumbansen, A., Peretz, I., & Hébert, S. (2014). Melodic intonation therapy: Back to basics for future research. Frontiers in Neurology 5. Retrieved from https://doi.org/10.3389/fneur.2014.00007
CHAPTER 26
MUSIC TRAINING AND COGNITIVE ABILITIES: ASSOCIATIONS, CAUSES, AND CONSEQUENCES
SWATHI SWAMINATHAN AND E. GLENN SCHELLENBERG
Introduction
Over the years, a large body of research has examined associations between music lessons and non-musical cognitive abilities, with the aim of determining whether music training improves cognition. Because most studies have correlational designs, conclusions of causation are precluded. Thus, it remains a matter of great debate whether and under what conditions music lessons improve non-musical abilities. Musicians’ brains are structurally and functionally different from those of non-musicians (for reviews see Gaser & Schlaug, 2003; Herholz & Zatorre, 2012). Except in rare instances (e.g., Elbert, Pantev, Wienbruch, Rockstroh, & Taub, 1995), these differences do not inform the issue of causation. Individual differences in demographics, personality, cognitive ability, and so on, could be associated with brain development and with the likelihood of taking music lessons. In the present chapter, our focus is on associations between music training and behavior.
As one would expect, musically trained individuals tend to perform better than other individuals on tasks that require them to perceive and discriminate sequences of tones or beats (Bhatara, Yeung, & Nazzi, 2015; Law & Zentner, 2012; Slevc, Davey, Buschkuehl, & Jaeggi, 2016; Swaminathan & Schellenberg, 2017; Swaminathan, Schellenberg, & Khalil, 2017; Swaminathan, Schellenberg, & Venkatesan, 2018; Wallentin, Nielsen, Friis-Olivarius, Vuust, & Vuust, 2010). Music training is also associated with a wide range of non-musical abilities (for reviews see Schellenberg & Weiss, 2013; Swaminathan & Schellenberg, 2016). Moreover, a limited amount of experimental evidence suggests that music training causes improvements in non-musical abilities, at least in some circumstances (e.g., Jaschke, Honing, & Scherder, 2018; Kaviani, Mirbaha, Pournaseh, & Sagan, 2014; Portowitz, Lichtenstein, Egorova, & Brand, 2009; Schellenberg, 2004; Slater et al., 2015). Based on findings of widespread correlations and a handful of encouraging experimental results, researchers have proposed that music training is the perfect model for investigating plasticity and transfer of learning (e.g., Herholz & Zatorre, 2012; Münte, Altenmüller, & Jäncke, 2002; Wan & Schlaug, 2010). Consequently, correlations between music training and non-musical abilities are often considered to provide evidence of causal effects. This tendency is problematic for at least three reasons. First of all, the link between music training and non-musical abilities is not clear-cut. For example, experiments sometimes fail to document improvements in cognitive abilities after taking music lessons (e.g., Butzlaff, 2000; Mehr, Schachner, Katz, & Spelke, 2013). Those that succeed often adopt non-standard pedagogies, such as training music-listening skills rather than teaching participants to sing or play an instrument (Degé & Schwarzer, 2011; Moreno, Bialystok, et al., 2011; for a review see Swaminathan & Schellenberg, 2016).
When children are assigned to more standard conservatory-style lessons at no cost to their parents (Schellenberg, 2004; Slater et al., 2015), the learning process bears little resemblance to the real world, because parents do not insist that their children practice between lessons. Even in correlational studies, associations with music training are not always evident (e.g., Boebinger et al., 2015; Brandler & Rammsayer, 2003; Helmbold, Rammsayer, & Altenmüller, 2005; Ruggles, Freyman, & Oxenham, 2014; Schellenberg & Moreno, 2010; Swaminathan & Schellenberg, 2017).
The second problem is more theoretical, concerning the relation between correlation and causation. As every student in an introductory psychology course learns, correlation does not imply causation. Nevertheless, this is not a reciprocal relation, and causation definitely implies correlation. In other words, if music training causes improvements in cognitive abilities, one should rightly expect this effect to be evident in everyday life, such that individuals who take years of music lessons exhibit the documented positive effects. In short, when a correlational study is well designed and adequately powered, a null result provides direct negative evidence against the hypothesized effect, whereas a positive effect is simply consistent with a putative causal association.
A third major issue involves far transfer, when training in a domain such as music leads to better performance or faster learning in a different (i.e., non-musical) domain. Although near transfer (to a highly similar task) is common, it is still unclear whether far-transfer effects are even possible, despite more than a century of research (e.g., Brody, 1992; Jensen, 1969, 1998; Thorndike & Woodworth, 1901a, 1901b). For example, interventions designed specifically to improve general cognitive abilities, such as working memory, fluid intelligence, or academic performance, report weak or variable results (Guo, Ohsawa, Suzuki, & Sekiyama, 2018; Love, Chazan-Cohen, Raikes, & Brooks-Gunn, 2013; Melby-Lervåg & Hulme, 2013; Melby-Lervåg, Redick, & Hulme, 2016; Rapport, Orban, Kofler, & Friedman, 2013; Shipstead, Redick, & Engle, 2012; Weicker, Villringer, & Thöne-Otto, 2016). 
In fact, recent meta-analyses find weak to no evidence that (1) chess instruction leads to better cognitive skills (Sala & Gobet, 2016), (2) working-memory training enhances cognitive ability or academic achievement (Sala & Gobet, 2017b; Soveri, Antfolk, Karlsson, Salo, & Laine, 2017), or (3) video-game playing improves cognition (Sala, Tatlidil, & Gobet, 2018). With this larger context in mind, the putative cognitive-training effects of music lessons should be considered with caution. In fact, a meta-analysis that examined directly whether music training has far-transfer effects on non-musical cognitive abilities reported similarly skeptical results (Sala & Gobet, 2017a). In the remainder of this chapter, we first review the correlational and experimental evidence for music-training effects. Subsequently, we propose an analytic strategy as a possible way forward.
A R
E
E
Researchers have examined whether music training is associated with general cognitive abilities, visuospatial skills, and language skills. Associations have also been studied in applied contexts, such as educational settings and interventions designed to promote healthy aging. In this section, we review these findings, paying close attention to inconsistencies in the literature.
Music Training and General Cognitive Abilities
Musically trained children and adults typically have higher IQ scores than their untrained counterparts (Gibson, Folley, & Park, 2009; Gruhn, Galley, & Kluth, 2003; Ho, Cheung, & Chan, 2003; Schellenberg, 2011a, 2011b; Schellenberg & Mankarious, 2012; Trimmer & Cuddy, 2008). In some instances, duration of training is associated positively with IQ or other measures of general cognitive ability (Corrigall, Schellenberg, & Misura, 2013; Degé, Wehrum, Stark, & Schwarzer, 2014; Schellenberg, 2006; Swaminathan et al., 2017, 2018; Swaminathan & Schellenberg, 2018). In other words, IQ tends to increase with the duration of music training. Because intelligence-test scores predict educational and career outcomes, as well as health and longevity (e.g., Deary, Strand, Smith, & Fernandes, 2007; Judge, Higgins, Thoresen, & Barrick, 1999; Spinath, Spinath, Harlaar, & Plomin, 2006), correlations are often interpreted optimistically, as evidence that music training promotes wide-ranging cognitive benefits that have implications for an individual’s success in life. An alternative view of these correlational findings, however, is that enrolling in music lessons, particularly for extended durations of time, is the consequence of better intellectual functioning. Nevertheless, there is some experimental evidence indicating that music lessons cause small improvements in IQ scores, which we will now summarize and evaluate. In one study, 144 6-year-olds were assigned randomly to one year of keyboard or vocal music lessons or to control conditions (drama lessons or no lessons at all; Schellenberg, 2004). Before
the intervention period began, the groups did not differ in their scores on the Wechsler Intelligence Scale for Children—III (WISC-III). After the intervention, all groups showed improvements on the WISC-III. These across-the-board improvements likely resulted from attending school or a retesting effect. A more provocative result revealed that children who received music lessons (keyboard or voice) showed larger improvements than their counterparts in the control groups. The effect was evident only when the two music groups were contrasted directly with the two control groups, however, and it was small (< 3 points), less than the average intraindividual difference between two administrations of the same test. Moreover, at post-test when parents were questioned about their child’s practice habits, it became clear that children in the music groups practiced minimally (10–15 min per week). In any event, the observed effect could have stemmed from the school-like structure of the music lessons, which differed from the play-like structure of the drama lessons, and led to better test-taking skills. Alternatively, the effect may have been a Type I error. As a side note, an interesting non-musical result was that the children in the drama group had the largest improvements in social behavior. In a more recent study of preschoolers in Tehran, children assigned to three months of weekly music lessons made statistically significant post-test gains on a standardized Farsi version of the Stanford-Binet IQ test (Kaviani et al., 2014). There was no evidence of improvement in the control group. The control group in the Iranian study was a passive control group (i.e., no lessons at all), however, which makes it impossible to attribute the positive findings of the musically trained group to the actual music training, rather than to other aspects of the intervention (e.g., contact with an adult instructor). 
Another recent longitudinal study was conducted in the Netherlands (Jaschke et al., 2018). Randomization to different conditions involved entire schools rather than individual children (as in Portowitz et al., 2009). Two schools were assigned randomly to a music-training intervention, and two to a visual-arts program. The remaining two schools received the standard Dutch curriculum. A fourth group comprised children who were taking music lessons outside of school and assigned to the music intervention. The authors reported that the visual-arts group showed more improvement than the other groups in visuospatial ability, whereas the two music groups showed larger improvements in verbal IQ and executive functions (planning
and inhibition). In short, children from different schools had different rates of improvement. It is impossible, however, to attribute the response patterns to the different interventions. Other differences between schools, such as teaching quality, may have played a major role, and no conclusions can be made about any analysis that included the fourth group because of self-selection. Relatively weak positive results such as these are further belied by a fair dose of mixed or null findings. One issue is that enhancements are more likely on some IQ tests than on others. For example, group differences are less likely to be evident when a test of fluid intelligence is used as the measure of general cognitive ability (Bialystok & DePape, 2009; Brandler & Rammsayer, 2003; Helmbold et al., 2005; Schellenberg & Moreno, 2010; Swaminathan & Schellenberg, 2017), rather than a test that includes measures of crystallized intelligence, such as vocabulary (Jaschke et al., 2018; Kaviani et al., 2014; Schellenberg, 2004). It is also clear that music training is associated with IQ in some groups but not in others. For example, university music majors, who have presumably invested a lot of time and effort in acquiring musical skills, do not necessarily show an IQ advantage compared to students at the same level majoring in other disciplines (Brandler & Rammsayer, 2003; Helmbold et al., 2005). In other words, the association between music training and cognitive ability is strongest when music training is an add-on activity rather than the participant’s primary focus. Otherwise, one would expect professional musicians (e.g., Celine Dion, members of symphony orchestras) to be geniuses. There are other reasons to be cautious about the putative causal effect of music training on intelligence. 
For one, correlations between music lessons and cognitive ability may be explained by personality factors, particularly the Openness-to-Experience trait (Corrigall et al., 2013; Corrigall & Schellenberg, 2015). In other words, musically trained individuals may perform well on intelligence tests, at least in part, because they tend to be curious and particularly interested in learning new things (including, but not limited to, music). Moreover, common genetic factors underlie intelligence and the propensity to practice music (Mosing, Madison, Pedersen, & Ullén, 2016). Findings from studies that examined personality or genetics raise the possibility that the association between music training and general cognitive
ability in correlational studies and quasi-experiments is largely a reflection of pre-existing differences. Moreover, despite some experimental evidence for modest IQ enhancements after music training (Jaschke et al., 2018; Kaviani et al., 2014; Portowitz et al., 2009; Schellenberg, 2004), other experiments and longitudinal studies failed to find general cognitive improvements. For example, one longitudinal study in Hong Kong found no evidence for an IQ enhancement after six months of training (Ho et al., 2003, study 2). When researchers in Massachusetts assigned preschool children randomly to either six weeks of group music lessons or no lessons at all, they found no advantage in cognitive abilities for the children who took music lessons (Mehr et al., 2013, experiment 2). Such inconsistent results suggest that music training may not always result in cognitive advantages, or that the effect is very small. One possibility is that music lessons lead to intellectual advantages only if they train some intermediate capacity that mediates the association between music training and intelligence. For example, it has been suggested that executive functions such as attention, a capacity closely associated with general cognitive ability (Salthouse, 2005), can be trained (Rueda, Rothbart, McCandliss, Saccamanno, & Posner, 2005). Working memory is similarly thought to be trainable (Klingberg, 2010; cf. Melby-Lervåg & Hulme, 2013), and there is some evidence that working-memory training transfers to improvements in fluid intelligence (Jaeggi, Buschkuehl, Jonides, & Perrig, 2008). This particular report of far transfer has been questioned, however, because of the study’s methodological irregularities, and because there was no evidence that the effect was long-lasting (Conway, Getz, Macnamara, & Engel de Abreu, 2011; Mackintosh, 2011; Moody, 2009). 
Nevertheless, it is still possible that music lessons train executive functions, including working memory, which in turn promote general cognitive enhancements (Degé, Kubicek, & Schwarzer, 2011; Hannon & Trainor, 2007; Posner, Rothbart, Sheese, & Kieras, 2008; Schellenberg & Peretz, 2008). In fact, musically trained adults perform better than their untrained counterparts on auditory (Bialystok & DePape, 2009; Roden, Grube, Bongard, & Kreutz, 2014; Zuk, Benjamin, Kenyon, & Gaab, 2014) as well as non-auditory (Bialystok & DePape, 2009; Okada & Slevc, 2018; Zuk et al., 2014) tests of executive functions, as do musically trained children and teenagers (Degé et al., 2011; Herrero & Carriedo, 2018; Jaschke et al., 2018). In one study of 9- to 11-year-olds, however, music
training was associated with IQ but not with executive functions other than auditory working memory (Schellenberg, 2011a). Virtually identical results were evident when 6- to 8-year-olds were randomly assigned to a six-week music-training intervention, with more improvement, relative to controls, on a test of working memory but not on other measures of executive functions (Guo et al., 2018). Thus, it remains unclear whether executive functions mediate the effect of music training on cognition. A second possibility is that the type of music training plays a role (Swaminathan & Schellenberg, 2016). Private music lessons (where one teacher attends to one student or a very small group of students) emphasize individual accomplishment and skill mastery. Group-based lessons (e.g., training in a high-school band), on the other hand, are more likely to emphasize collective outcomes over individual ones. Private music training may be more effective than group-based lessons at improving scores on tests of cognitive ability, which by definition measure individual ability and accomplishment. Alternatively, individual differences in cognitive ability may influence who takes private lessons. Either way, the association could be limited to the developed world, where private lessons are common. In developing countries, and throughout history, music making is and has been typically a group activity, in which virtually everyone takes part. Considered as a whole, although associations between music training and intelligence are evident in many circumstances, it is unclear whether music training causes improvements in cognitive ability. If music lessons do improve intelligence or general cognitive ability, the effect appears to be: (1) small, (2) evident only among some individuals, or (3) a likely consequence of taking lessons that emphasize individual achievement. 
More generally, we know that far-transfer effects are very rare, and that parsimony rules the day in the world of science. In short, a simpler explanation of the available data is that high-functioning individuals are more likely than other individuals to take music lessons.
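The selection account above can be illustrated with a small simulation. The following Python sketch is hypothetical and is not drawn from any of the studies reviewed: the sample size, the logistic enrollment rule, and the noise levels are all assumptions chosen for illustration. It generates individuals whose measured IQ is unaffected by lessons, but whose likelihood of enrolling rises with ability, and then compares the mean IQ of the trained and untrained groups.

```python
import math
import random
import statistics


def simulate_selection(n=20000, seed=0):
    """Return (mean IQ of trained, mean IQ of untrained) when lessons have NO effect."""
    rng = random.Random(seed)
    trained, untrained = [], []
    for _ in range(n):
        # Latent cognitive ability on an IQ-like scale (assumed mean 100, SD 15).
        ability = rng.gauss(100, 15)
        # Hypothetical selection rule: higher ability -> more likely to enroll.
        p_enroll = 1 / (1 + math.exp(-(ability - 100) / 15))
        takes_lessons = rng.random() < p_enroll
        # Measured IQ is ability plus test noise; lessons add nothing.
        iq = ability + rng.gauss(0, 5)
        (trained if takes_lessons else untrained).append(iq)
    return statistics.mean(trained), statistics.mean(untrained)


mean_trained, mean_untrained = simulate_selection()
print(round(mean_trained - mean_untrained, 1))
```

Under these assumptions the trained group has the higher mean IQ even though lessons have no effect in the simulation. A purely correlational design cannot distinguish this pattern from a genuine training effect.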
Associations with Specific Cognitive Abilities
Despite evidence of associations between music training and domain-general abilities such as general intelligence or IQ, it has often been
suggested that musical abilities are more strongly related to some non-musical cognitive abilities than they are to others. For example, a case has been made for special overlaps between musical and visuospatial skills (Leng, Shaw, & Wright, 1990; Rauscher & Shaw, 1998). Others have argued for associations with language skills, specifically that musically trained individuals exhibit enhanced performance on lower-level tasks that involve speech perception (e.g., Kraus & Chandrasekaran, 2010; Patel, 2003, 2011; Skoe & Kraus, 2012). These theories imply that the benefits of music training are especially likely to transfer to skills that are trained more directly during music lessons, such as navigating a piano keyboard or reading musical notation (which transfers to visuospatial skills), or listening skills more generally (which extend to speech perception). In this subsection, we review evidence for training-related transfer to the visuospatial and language domains.
Associations with Visuospatial Skills
Music training is associated with visuospatial skills. In fact, advantages on visual and spatial-reasoning tasks are evident in studies of musically trained adults (Bidelman, Hutka, & Moreno, 2013; Brochard, Dufour, & Després, 2004; Faßhauer, Frese, & Evers, 2015; Jakobson, Lewycky, Kilgour, & Stoesz, 2008; Patston & Tippett, 2011; Sluming, Brooks, Howard, Downes, & Roberts, 2007; Stoesz, Jakobson, Kilgour, & Lewycky, 2007) and children (Bilhartz, Bruhn, & Olson, 2000; Costa-Giomi, 1999; Gromko & Poorman, 1998; Hassler, Birbaumer, & Feil, 1985; Rauscher & Hinton, 2011; Rauscher & Zupan, 2000). For example, musically trained adults outperform their untrained counterparts on tests of visuospatial short-term memory (i.e., Corsi blocks; Bidelman et al., 2013), on tasks that require them to recreate line drawings from short-term or long-term memory (Jakobson et al., 2008), and when they are asked to determine whether two three-dimensional shapes are the same but rotated in space (Sluming et al., 2007). Examples from children indicate that music training predicts better performance on a task that asks them to remember the order of different colored beads on a string (Bilhartz et al., 2000), and when they are required to arrange blocks to form the shape of a previously seen staircase (Rauscher & Zupan, 2000).
Results from experimental studies provide only weak indications that music training actually causes improvements in visuospatial skills. For example, one study assigned preschool children to six months of 10–15 min of weekly keyboard lessons and 30 min of daily voice lessons, voice lessons only, computer training, or no lessons (Rauscher et al., 1997). Only the children in the keyboard/singing group exhibited improvement on the Object Assembly subtest of the Wechsler Preschool and Primary Scale of Intelligence—Revised (WPPSI-R). An unequivocal interpretation of these findings depends on the internal validity of the design, however, and it is doubtful that the computer training (with commercially available educational software) was an appropriate and equally engaging control activity. Moreover, the singing-only group had less contact with an adult instructor compared to the keyboard/singing group. Finally, children were not assigned randomly to the four conditions. Although a review of other studies from the same laboratory provided converging results (Rauscher & Hinton, 2011), there were not enough methodological details provided in the review to be confident about the findings. Nevertheless, one meta-analysis of experimental studies concluded that music training causes improvements in spatial skills, even though six of the fifteen studies included in the analysis were conducted by a single research group (i.e., Rauscher and colleagues; Hetland, 2000). More recent studies report mixed results. For example, Mehr et al. (2013) conducted two experiments. In one, children were randomly assigned to either music or visual-arts lessons for six weeks. After the intervention, the music group outperformed the art group on a spatial-navigation task, while the art group outperformed the music group on a geometry-perception task. 
In a second experiment, no significant group differences were evident when a new group of preschool children was randomly assigned to either six weeks of music lessons or no lessons at all. One might question whether the effect of music training on visuospatial skills is evident only after a longer duration of training (i.e., longer than six weeks). Although this is a reasonable proposal, when Costa-Giomi (1999) assigned 9-year-olds to three years of piano lessons or no lessons, the piano-trained children had better spatial abilities after one and two years of lessons, but not after three years. In short, although there is ample evidence that music training is associated positively with visuospatial abilities,
evidence that music training causes the association is weak and inconsistent.
Associations with Language Abilities
Because both language and music are rule-bound means of auditory communication, associations between musical and linguistic processing have received much attention from scholars who conduct research in these domains. The available evidence documents that musically trained individuals are better than their untrained counterparts at detecting linguistic stress patterns (Kolinsky, Cuvelier, Goetry, Peretz, & Morais, 2009), and at perceiving pitch and intonation in speech (Besson, Schön, Moreno, Santos, & Magne, 2007; Dankovičová, House, Crooks, & Jones, 2007; Delogu, Lampis, & Belardinelli, 2010; Good et al., 2017; Magne, Schön, & Besson, 2006; Marques, Moreno, Castro, & Besson, 2007; Thompson, Schellenberg, & Husain, 2004; Wong, Skoe, Russo, Dees, & Kraus, 2007; Wu et al., 2015). In some instances, they also tend to be better at perceiving speech under challenging conditions, such as comprehending speech in noise (Parbery-Clark, Skoe, Lam, & Kraus, 2009; Strait & Kraus, 2011; Strait, Parbery-Clark, O’Connell, & Kraus, 2013; Tierney, Krizman, Skoe, Johnston, & Kraus, 2013; Swaminathan et al., 2015; but see Boebinger et al., 2015; Ruggles et al., 2014), or perceiving acoustically degraded vowel sounds (Bidelman & Krishnan, 2010). Musicians also show advantages in speech-segmentation skills (François, Chobert, Besson, & Schön, 2013) and phonological perception (Chobert, François, Velay, & Besson, 2014; Chobert, Marie, François, Schön, & Besson, 2011; Zuk et al., 2013), and their brainstems appear to make higher-fidelity representations of speech stimuli (e.g., Kraus et al., 2014; Parbery-Clark, Tierney, Strait, & Kraus, 2012; Strait, O’Connell, Parbery-Clark, & Kraus, 2014; Strait, Parbery-Clark, Hittner, & Kraus, 2012; Weiss & Bidelman, 2015). 
In addition to speech-specific auditory advantages, musically trained individuals show advantages on higher-level cognitive tests of verbal ability including verbal short-term (Chan, Ho, & Cheung, 1998; Franklin et al., 2008; Hansen, Wallentin, & Vuust, 2013; Ho et al., 2003), working (Franklin et al., 2008), and long-term memory (Franklin et al., 2008). In short-term and long-term tests of verbal memory, participants are required to read and recall a list of unrelated words, either immediately (short-term
memory) or after a delay (long-term memory). In tests of working memory, listeners are required to remember a list of letters or numbers, but between each presentation of a to-be-remembered item, a secondary task requires them to determine whether a sentence makes sense, or to solve an addition problem. Music training also predicts enhanced performance on tests of vocabulary (Forgeard, Winner, Norton, & Schlaug, 2008; Piro & Ortiz, 2009) and second-language ability (Petitto, 2008; Posedel, Emery, Souza, & Fountain, 2012; Swaminathan & Gopinath, 2013; Talamini, Grassi, Toffalini, Santoni, & Carretti, 2018). One of the most reliable findings is that music training is correlated positively with phonological awareness (a skill important to the development of reading), which refers to the ability to perceive and segment phonological elements of speech (Gromko, 2005; Overy, 2003; Wandell, Dougherty, Ben-Shachar, Deutsch, & Tsang, 2008). It is therefore not surprising that music training is also associated positively with reading ability (Butzlaff, 2000; Corrigall & Trainor, 2011; Moreno et al., 2009; Standley, 2008; Swaminathan et al., 2018). Some experimental evidence supports the idea that music training can promote language abilities. For example, Degé and Schwarzer (2011) randomly assigned preschool children to daily training in music, phonological skills, or sports. After twenty weeks, the phonological awareness of children in the music group matched that of children in the program designed specifically to improve these skills. Both groups outperformed the sports group, which ruled out the role of normal maturation. Similar findings have been reported in children with atypical language development. For example, one study assigned children with dyslexia to six weeks of a rhythm intervention, a commercially available phoneme-discrimination intervention, or a passive control group (Thomson, Leong, & Goswami, 2013). 
The rhythm intervention involved copying and synchronizing to non-speech rhythms on a hand drum, speaking and clapping to words in rhythm, and playing computerized games intended to train basic auditory skills linked to rhythm perception. Relative to the control group, both the rhythm group and the phonological skills group improved on tests of phonological processing. Another experiment randomly assigned children with dyslexia to thirty weeks of either music lessons (focused primarily on rhythm) or painting lessons (Flaugnacco et
al., 2015). Despite similar performance at pre-test, after the intervention, the music group made larger gains on phonological and reading skills compared to the painting group. Moreno and colleagues (Moreno, Bialystok, et al., 2011) assigned preschool children to twenty days of computerized training in music or visual arts. The children were tested on the Block Design and Vocabulary subtests of the WPPSI-III before and after the intervention. The music group made significant post-test gains on the Vocabulary subtest but not the Block Design subtest. Importantly, the arts group did not make any gains, which indicates that vocabulary improvements were specific to music training. In another article from the same sample of children (Moreno, Friesen, & Bialystok, 2011), the researchers reported that the music group was better at learning to map arbitrary visual symbols to words, a skill that is likely to be important for the development of reading. The successful music-lesson interventions in the experimental studies described above were relatively short-term, and focused on listening rather than learning to play an instrument. Thus, listening training that focuses on music, particularly rhythm and timing, may indeed help children perceive rapid temporal changes in speech, such as those that distinguish adjacent phonemes. When children are trained with more standard pedagogies, however, the results tend to be weaker. For example, in a recent longitudinal study that examined group-based music lessons, differences compared to a control group emerged only after extended training (Slater et al., 2015). Specifically, children improved on a test that measured their ability to perceive speech in noise after two years of community-based music lessons, which were taught using an established and successful curriculum. Nevertheless, children who received only one year of the same lessons did not make any statistically significant gains. 
Moreover, other studies failed to find correlations between music training and language skills (Boebinger et al., 2015; Ruggles et al., 2014; Swaminathan & Schellenberg, 2017), which implies that typical music lessons may not always improve speech and language abilities, or that the effect is relatively weak. In general, however, music lessons that emphasize listening skills and temporal (rhythm) perception appear to promote phonological awareness and speech perception, at least for some groups of individuals. These improvements can, in turn, facilitate learning to read.
Music Training and Cognitive Performance in Real-World Contexts
The studies described in previous sections raise the possibility that music training causes small improvements on standardized tests and laboratory-based measures of speech perception and other aspects of cognitive ability. If such effects exist, they would have little importance unless they extend to performance in real-world situations. We now turn to two such situations: academic achievement and healthy aging.
Music Training and Academic Achievement
Participation in school-based musical activities predicts academic performance in later years (Catterall, Chapleau, & Iwanaga, 1999; Gouzouasis, Guhn, & Kishor, 2007; Winner & Cooper, 2000). For example, a meta-analysis of ten years of data from the American College Board found that high-school students with training in the arts, including music, performed better than students without any arts training on the SAT (formerly the Scholastic Aptitude Test; Vaughn & Winner, 2000). (SAT scores are used as a basis for admission to undergraduate colleges. Thus, the SAT is administered routinely to high-school seniors.) Longer duration of musical participation is also known to be associated with better SAT scores (Vaughn & Winner, 2000), and with higher average grades in school (Catterall et al., 1999; Schellenberg, 2006; Wetter, Koerner, & Schwaninger, 2009). These associations tend to be broad and general, rather than restricted to one or two school subjects. For example, the mathematics and geometry scores of musically trained participants tend to be higher than those of their untrained counterparts (Catterall et al., 1999; Cheek & Smith, 1999; Gardiner, Fox, Knowles, & Jeffrey, 1996; Gouzouasis et al., 2007; Graziano, Peterson, & Shaw, 1999; Spelke, 2008; Vaughn, 2000; Vaughn & Winner, 2000), as is the case with language-related academic outcomes (Vaughn & Winner, 2000). Nevertheless, some types of music lessons may be more strongly associated with academic outcomes than others. For example, Canadian adolescents with keyboard lessons show advantages on high-school English tests, but those with vocal music training do not (Gouzouasis et al., 2007).
Interestingly, students who enroll in music classes—even theory-based music history classes—appear to demonstrate academic advantages compared to students who report having no training in any form of fine arts (Vaughn & Winner, 2000). In other words, actual instrumental or vocal training may not be unique in its association with academic performance. Indeed, high-school students who participate in any type of arts activity show advantages on the SAT, with drama students showing the strongest advantages (Vaughn & Winner, 2000). Moreover, children participating in sports are just as likely as arts participants to win academic awards (Winner, Goldstein, & Vincent-Lancrin, 2012), and they perform no differently from musically trained children on tests of mathematical ability (Spelke, 2008). Thus, participation in any type of extracurricular activity, not just music, predicts academic performance. It is also unclear whether participation in music (or other) activities causes academic advantages. It is equally likely, if not more likely, that preexisting individual differences in academic ability determine musical participation. Indeed, grades in elementary school predict participation in middle-school (Kinney, 2008) and high-school (Frakes, 1985) music classes, a timeline that rules out a causal role for music training. Moreover, better academic performance predicts longer participation in subsequent musical activities (e.g., Kinney, 2010; Klinedinst, 1991). The association between music training and academic performance could also be an artifact of a third variable. For example, socio-economic affluence is associated with better scholastic performance (Sirin, 2005), as well as with musical participation (Corenblum & Marshall, 1998; Kinney, 2010; Klinedinst, 1991). 
However, the correlation between training and scholastic achievement is evident across socio-economic status (SES) levels (Catterall et al., 1999; Fitzpatrick, 2006) and persists even after holding SES constant (Corrigall et al., 2013; Degé et al., 2014; Schellenberg, 2006, 2011a, 2011b; Schellenberg & Mankarious, 2012), which indicates that the association between music training and academic achievement is at least partly independent of SES. Pre-existing personality differences also appear to play a role. For example, musically trained children tend to do even better in school than one would predict from their elevated IQ scores (Corrigall et al., 2013; Degé et al., 2014; Schellenberg, 2006). This “special” association between music training and school performance disappears when conscientiousness
is controlled in addition to IQ (Corrigall et al., 2013). In other words, in addition to being smart, musically trained children tend to be particularly hard-working and diligent, which explains why they do particularly well in school. Finally, the results of longitudinal and experimental studies provide little evidence of a causal role for music training in academic performance. For example, one longitudinal study found evidence for a scholastic advantage after two years of piano lessons but not after the third year (Costa-Giomi, 2004). Another one-year longitudinal study found no evidence of improved performance on a standardized test of academic achievement, although children with music training had larger improvements, in absolute terms, on each of five subtests (Schellenberg, 2004). The most compelling negative result comes from a government-funded project in the UK that was organized by the Education Endowment Foundation and the National Centre for Social Research. Over 900 children in Year 2 (7-year-olds) from nineteen schools were randomly assigned to music training (strings or voice), or to a control group that had drama lessons (Haywood et al., 2015). All participants received weekly training for thirty-two weeks in groups of approximately ten children. Improvements in mathematical abilities and literacy skills were similar for the music and drama groups, and there was no difference between the two music groups. Meta-analytic reviews that include findings from published and unpublished sources also report no evidence for a causal role of music training in scholastic achievement (Winner & Cooper, 2000; Winner et al., 2012). As noted earlier, the most recent meta-analysis (Sala & Gobet, 2017b) found a very small association between music training and academic achievement, but this effect was due to contributions from poorly designed studies.
Specifically, such associations are more likely to be evident when (1) studies do not have random assignment, such that self-selection plays a role in choosing to take music lessons, and (2) the control group is passive (no activity) rather than active, such that non-musical aspects of the music training (e.g., structured learning environment, additional contact with an adult teacher) are implicated. In sum, there is little evidence to support the notion that music training causes improvements in scholastic performance, despite much evidence that music training is associated positively with academic achievement.
Music Training and Healthy Aging
Older adults often experience declines in cognitive abilities, such as deficits in executive functions and difficulties with hearing in noisy environments (for reviews see Alain, Zendel, Hutka, & Bidelman, 2014; Salthouse, 2004). Because music training may cause small improvements in these skills in normally maturing children, it is plausible that it could also slow the onset of aging-related declines. To date, only a handful of studies have examined this possibility. The available evidence suggests that middle-aged and older adults who have practiced music throughout their lives tend to outperform age-matched non-musicians on auditory perception tests, such as speech perception in noise (Parbery-Clark, Strait, Anderson, Hittner, & Kraus, 2011; Zendel & Alain, 2012), categorical perception of speech sounds (Bidelman & Alain, 2015), frequency discrimination (Grassi, Meneghetti, Toffalini, & Borella, 2017), and the detection of gaps and mistuned harmonics in tones (Grassi et al., 2017; Zendel & Alain, 2009, 2012, 2013). In fact, some evidence implies that older-adult musicians perform almost as well as young adults at detecting mistuned harmonics (Zendel & Alain, 2013). In one instance, even a small amount of music training in childhood was related to better temporal precision in speech-evoked neural responses later in life (White-Schwoch, Carr, Anderson, Strait, & Kraus, 2013). Middle-aged and older adults with music training also tend to show advantages in executive-function tasks, including auditory attention and working memory, verbal immediate recall, and verbal fluency (Amer, Kalender, Hasher, Trehub, & Wong, 2013; Fauvel et al., 2014; Grassi et al., 2017; Hanna-Pladdy & Gajewski, 2012; Parbery-Clark et al., 2011). Advantages appear to be strongest for individuals who began music lessons earlier rather than later in life (Fauvel et al., 2014). In the visuospatial domain, however, the evidence is less clear.
Whereas two studies found no evidence for visual working-memory advantages in older musicians (Hanna-Pladdy & Gajewski, 2012; Parbery-Clark et al., 2011), three others reported that musicians outperform non-musicians on a visuospatial span task, the Simon Task, or other tests of visuospatial ability (Amer et al., 2013; Grassi et al., 2017; Hanna-Pladdy & MacKay, 2011). One recent study of adults over 64 years of age compared those who were currently singing or playing a musical instrument to other participants
(Mansens, Deeg, & Comijs, 2017). The older participants who were making music, particularly those who were playing an instrument, had higher scores on tests of episodic memory, executive functions, and attention. It is unknown, however, whether these individuals also played music earlier in life. In any event, playing music later in life appears to be a marker of healthy aging. At the very least, these studies suggest that those who are inclined to engage in musical activities early in life are also likely to show cognitive advantages later in life, as they do when they are younger. The findings do not inform the issue of rates of cognitive decline. Thus, whether music training can be used to preserve cognitive abilities or even slow down cognitive aging processes is still an open question. Suggestive evidence, however, comes from one experimental study in which 60- to 85-year-olds were assigned randomly to six months of piano lessons or a no-lessons (passive) control condition (Bugos, Perlstein, McCrae, Brophy, & Bedenbaugh, 2007). The music group improved on two of five tests of executive function, whereas the control group did not appear to make gains on any of the tests. It is not clear, however, whether improvements in the intervention group were due to music training per se. As noted, the effect could be due to non-musical aspects of the training, such as an additional opportunity to engage with someone (the instructor), or simply the knowledge among the piano-group participants that they were somehow “special” because they were receiving an intervention. Moreover, it is not clear whether such improvements were long-lasting. In sum, although it is possible that (1) music training in childhood may buffer against cognitive declines that are evident later in life, and (2) musical engagement in late adulthood could preserve or even improve already declining abilities, not enough evidence is available at the present time to confirm or disprove these hypotheses.
O
W A
F
: M M
M T
As noted, most of the research on music lessons and non-musical cognitive abilities is correlational in design. Such designs preclude inferences of
causality. Experimental studies with random assignment are better suited to studying causal direction but are relatively rare because they are expensive to carry out. Moreover, training received in the context of an experiment is, no doubt, quite different from the experience of music lessons in the real world. For example, attrition limits the length of training in experimental studies. In the one-year study by Schellenberg (2004) with 144 children at baseline, 12 students (1 in 12, or 8.3 percent) dropped out and were not available for post-test. In the two-year study by Jaschke et al. (2018), 30 of 176 children (17.0 percent) who were tested at baseline were not available at post-test. Moreover, random assignment in experiments excludes motivational factors that promote long-term musical participation in the real world. In other words, correlational studies can capture ecologically valid information that experiments cannot. Is there a way to improve the interpretability of correlational findings? In recent research, we made one such attempt by measuring duration of music training, performance on music aptitude tests, and other possible confounding variables (e.g., SES; Swaminathan & Schellenberg, 2017, 2018; Swaminathan et al., 2017, 2018). Music aptitude is typically quantified using tests that measure listeners’ ability to perceive, remember, and discriminate melodies and rhythms (Gordon, 1965; Seashore, Lewis, & Saetveit, 1960). On each trial, the listener decides whether two musical sequences are identical. On trials with non-identical sequences, one event (i.e., a tone or drumbeat) in the second sequence is altered in pitch or time. In pedagogical contexts, aptitude is a measure of musical proclivities that should lead to subsequent success in musical activities, including training. As one would expect, music training does indeed predict performance on aptitude tests (e.g., Law & Zentner, 2012; Wallentin et al., 2010), but the causal direction is unclear. 
Importantly, music aptitude is also associated with performance on tests of general cognitive abilities and language abilities (for review, see Schellenberg & Weiss, 2013). In our statistical analyses, we examined individual differences in performance on a non-musical variable (e.g., a test of speech perception or intelligence) as a function of music aptitude, with music training held constant, and music training, with music aptitude held constant. Depending on the particular research question, we also held constant other measures with overlapping variance, such as socio-economic status or personality. These analyses of partial associations allowed for more nuanced
investigation about the relative role of learning and the environment (e.g., music training) on the one hand, and natural abilities (e.g., music aptitude, intelligence) on the other hand. With training held constant, associations of non-musical skills with performance on a music-aptitude test indicate that the association between musical and non-musical skills is independent of training, and possibly precedes it. With aptitude held constant, associations of non-musical skills with training provide more convincing evidence for training effects. Although partial correlations, like simple correlations, do not allow for inferences of causation, this method serves to contextualize the size and location of hypothesized training effects relative to pre-existing associations between musical and non-musical abilities. When we used this method with adult participants, music aptitude was associated with intelligence (Swaminathan et al., 2017) and speech perception skills (Swaminathan & Schellenberg, 2017) when training was held constant, but music training was not associated with either outcome when music aptitude was held constant. In the case of speech perception, there was not even a simple association between music training and performance on a test of the ability to discriminate speech sounds that are relevant (i.e., phonemic) in a foreign language (Zulu) but not in English. Our interpretation of these data was that pre-existing differences in music aptitude and cognitive ability predict music training. Although taking music lessons may go on to increase music aptitude and cognitive abilities further (Figure 1), such training effects are likely to play a relatively small role in the overall picture.
FIGURE 1. Individuals with high cognitive ability and music aptitude have an increased likelihood of taking music lessons, which could then go on to improve cognitive and musical abilities.
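The partial-association logic described above can be illustrated with a small simulation. The sketch below is not the authors' analysis: it uses invented data in which music aptitude influences both training and a non-musical outcome while training has no effect of its own, and it applies the standard first-order partial-correlation formula. The variable names, effect sizes, and sample size are assumptions chosen only for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y with a single covariate z held constant."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: aptitude drives both training and the outcome;
# training itself contributes nothing to the outcome.
rng = random.Random(0)
n = 5000
aptitude = [rng.gauss(0, 1) for _ in range(n)]
training = [0.6 * a + rng.gauss(0, 1) for a in aptitude]
outcome = [0.5 * a + rng.gauss(0, 1) for a in aptitude]

simple_r = pearson_r(training, outcome)
r_training_given_aptitude = partial_r(training, outcome, aptitude)
r_aptitude_given_training = partial_r(aptitude, outcome, training)

print(f"training vs outcome (simple):        {simple_r:.2f}")
print(f"training vs outcome (aptitude held): {r_training_given_aptitude:.2f}")
print(f"aptitude vs outcome (training held): {r_aptitude_given_training:.2f}")
```

In data generated this way, training and the outcome are correlated in the simple analysis, but the association shrinks toward zero once aptitude is held constant, whereas aptitude remains associated with the outcome when training is held constant. As with the real analyses, such a pattern is compatible with pre-existing differences driving the simple association but does not by itself establish causal direction.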
In other correlational research, we used a similar approach to examine the association between music training and music aptitude (Swaminathan & Schellenberg, 2018). The simple association between the two variables was significant, as in previous research, with music training accounting for 24.5 percent of the variance in music aptitude. When socio-economic status, openness to experience, short-term memory, and general cognitive ability were considered jointly with music training, the predictive power of the model increased to 36.7 percent. Music training continued to have the largest partial association, accounting independently for 6.2 percent of the variance in music aptitude. Note that the reduction in variance explained (from 24.5 percent to 6.2 percent) highlights the overlap between music training and non-musical variables, which are typically overlooked in this line of research. When the non-musical variables were considered jointly, they accounted uniquely for 12.2 percent of the variance in music aptitude (with music training held constant). In other words, music aptitude was predicted better by the non-musical variables than it was by music training alone. These findings highlight that music aptitude is more than the simple consequence of music training. Although music training might be the best predictor variable, other, non-musical variables play an important role. In a fourth study (Swaminathan et al., 2018), we used the same approach to examine the association between music training and reading ability among adults who were native or non-native speakers of English. As in previous research, reading ability was positively associated with duration of music training. We also found that reading ability improved in tandem with general cognitive ability, and it was better among native than non-native speakers of English. 
When these variables were considered jointly, general cognitive ability and native-language status had significant partial associations with reading ability, but music training did not. In other words, associations between music training and reading may be an artifact of other variables that are typically ignored in this line of research.
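The variance figures reported above for music aptitude (Swaminathan & Schellenberg, 2018) can be reconciled with a few lines of arithmetic. The sketch below assumes that the percentages are R-squared values from nested regression models and that the "independent" contributions are squared semipartial correlations. The source does not state this explicitly, and the shared and non-musical-only values are derived here rather than reported.

```python
# Percentages of variance in music aptitude, as reported in the text.
training_alone = 24.5   # music training as the only predictor
full_model = 36.7       # training plus SES, openness, short-term memory, cognitive ability
unique_training = 6.2   # training with the non-musical variables held constant

# Derived values (assumption: nested R-squared models, so the pieces are additive).
unique_nonmusical = full_model - training_alone   # added by the non-musical variables
shared = training_alone - unique_training         # overlap between training and the rest
nonmusical_alone = full_model - unique_training   # non-musical variables without training

print(f"unique to non-musical variables: {unique_nonmusical:.1f}%")
print(f"shared with music training:      {shared:.1f}%")
print(f"non-musical variables alone:     {nonmusical_alone:.1f}%")
```

Under these assumptions, the unique non-musical contribution matches the 12.2 percent given in the text, and most of the variance that training explains on its own overlaps with the non-musical variables.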
Conclusion
The evidence that music training causes improvements in non-musical domains is very weak, except for the effect of rhythm- and listening-based
training, which appears to improve fine-grained listening skills in general and can thereby enhance the ability to isolate and segment the sounds of speech. Because isolating speech sounds and matching them with letters or groups of letters is crucial for reading, rhythm- and listening-based training may go on to improve reading skills, particularly for those who have difficulty with reading (i.e., young children and children with dyslexia). These positive effects are evident primarily as a consequence of specially designed interventions that focus specifically on rhythm training and analytical listening. Typical conservatory-style training may not have the same effects, or may have much weaker ones. Otherwise, although there is ample evidence that music lessons are predictive of benefits in general cognitive abilities, visuospatial abilities, or language abilities, the causal evidence is very weak. Large-sample, long-term studies with random assignment to music lessons are virtually impossible to conduct because of cost and attrition. Moreover, when such efforts are made, the results may fail to generalize broadly because it is difficult to know what one is studying when motivation, personality, music aptitude, demographics, and general cognitive ability are held constant. In the real world, these factors play a key role in determining who takes music lessons, particularly for long durations of time. In short, evidence that traditional music pedagogies have non-musical cognitive benefits is lacking and unconvincing. Most of the available correlational evidence can be explained parsimoniously: high-functioning children are more likely than other children to take music lessons and to perform well on tests of many sorts. We therefore advocate a different approach: correlational designs that attempt to account for as many alternative explanations as possible.
At the very least, this approach allows researchers to be sure that they are studying a real-world phenomenon, rather than an experimental or pedagogical artifact. Future research on music and non-musical abilities is likely to find nuanced results if individual differences in music aptitude and other variables, such as SES, personality, and general cognitive ability (Corrigall & Schellenberg, 2015; Corrigall et al., 2013), are considered alongside music training. In other words, the causes of music training may be just as important as its consequences.
Acknowledgment
Supported by a grant from the Natural Sciences and Engineering Research Council of Canada awarded to EGS.
References
Alain, C., Zendel, B. R., Hutka, S., & Bidelman, G. M. (2014). Turning down the noise: The benefit of musical training on the aging auditory brain. Hearing Research 308, 162–173. Amer, T., Kalender, B., Hasher, L., Trehub, S. E., & Wong, Y. (2013). Do older professional musicians have cognitive advantages? PLoS ONE 8(8), e71630. Besson, M., Schön, D., Moreno, S., Santos, A., & Magne, C. (2007). Influence of musical expertise and musical training on pitch processing in music and language. Restorative Neurology and Neuroscience 25(3–4), 399–410. Bhatara, A., Yeung, H. H., & Nazzi, T. (2015). Foreign language learning in French speakers is associated with rhythm perception, but not with melody perception. Journal of Experimental Psychology: Human Perception and Performance 41(2), 277–282. Bialystok, E., & DePape, A.-M. (2009). Musical expertise, bilingualism, and executive functioning. Journal of Experimental Psychology: Human Perception and Performance 35(2), 565–574. Bidelman, G. M., & Alain, C. (2015). Musical training orchestrates coordinated neuroplasticity in auditory brainstem and cortex to counteract age-related declines in categorical vowel perception. Journal of Neuroscience 35(3), 1240–1249. Bidelman, G. M., Hutka, S., & Moreno, S. (2013). Tone language speakers and musicians share enhanced perceptual and cognitive abilities for musical pitch: Evidence for bidirectionality between the domains of language and music. PLoS ONE 8(4), e60676. Bidelman, G. M., & Krishnan, A. (2010). Effects of reverberation on brainstem representation of speech in musicians and nonmusicians. Brain Research 1355, 112–125. Bilhartz, T. D., Bruhn, R. A., & Olson, J. E. (2000). The effect of early music training on child cognitive development. Journal of Applied Developmental Psychology 20(4), 615–636. Boebinger, D., Evans, S., Rosen, S., Lima, C. F., Manly, T., & Scott, S. K. (2015). Musicians and nonmusicians are equally adept at perceiving masked speech.
Journal of the Acoustical Society of America 137(1), 378–387. Brandler, S., & Rammsayer, T. H. (2003). Differences in mental abilities between musicians and nonmusicians. Psychology of Music 31(2), 123–138. Brochard, R., Dufour, A., & Deprés, O. (2004). Effect of musical expertise on visuospatial abilities: Evidence from reaction times and mental imagery. Brain and Cognition 54(2), 103–109. Brody, N. (1992). Intelligence (2nd ed.). San Diego, CA: Academic Press. Bugos, J. A., Perlstein, W. M., McCrae, C. S., Brophy, T. S., & Bedenbaugh, P. H. (2007). Individualized piano instruction enhances executive functioning and working memory in older adults. Aging and Mental Health 11(4), 464–471. Butzlaff, R. (2000). Can music be used to teach reading? Journal of Aesthetic Education 34(3–4), 167–178.
Catterall, J., Chapleau, R., & Iwanaga, J. (1999). Involvement in the arts and human development: General involvement and intensive involvement in music and theatre arts. In E. Fiske (Ed.), Champions of change: The impact of the arts on learning (pp. 1–18). Washington, DC: The Arts Education Partnership and The President’s Committee on the Arts and the Humanities. Chan, A. S., Ho, Y. C., & Cheung, M. C. (1998). Music training improves verbal memory. Nature 396(6707), 128. Cheek, J. M., & Smith, L. R. (1999). Music training and mathematics achievement. Adolescence 34(136), 759–761. Chobert, J., François, C., Velay, J. L., & Besson, M. (2014). Twelve months of active musical training in 8- to 10-year-old children enhances the preattentive processing of syllabic duration and voice onset time. Cerebral Cortex 24(4), 956–967. Chobert, J., Marie, C., François, C., Schön, D., & Besson, M. (2011). Enhanced passive and active processing of syllables in musician children. Journal of Cognitive Neuroscience 23(12), 3874–3887. Conway, A. R. A., Getz, S. J., Macnamara, B., & Engel de Abreu, P. M. J. (2011). Working memory and fluid intelligence. In R. J. Sternberg & S. B. Kaufman (Eds.), The Cambridge handbook of intelligence (pp. 394–418). Cambridge: Cambridge University Press. Corenblum, B., & Marshall, E. (1998). The band played on: Predicting students’ intentions to continue studying music. Journal of Research in Music Education 46(1), 128–140. Corrigall, K. A., & Schellenberg, E. G. (2015). Predicting who takes music lessons: Parent and child characteristics. Frontiers in Psychology 6, 282. doi: 10.3389/fpsyg.2015.00282 Corrigall, K. A., Schellenberg, E. G., & Misura, N. M. (2013). Music training, cognition, and personality. Frontiers in Psychology 4, 222. doi: 10.3389/fpsyg.2013.00222 Corrigall, K. A., & Trainor, L. J. (2011). Associations between length of music training and reading skills in children. Music Perception 29(2), 147–155. Costa-Giomi, E. (1999).
The effects of three years of piano instruction on children’s cognitive development. Journal of Research in Music Education 47(3), 198–212. Costa-Giomi, E. (2004). Effects of three years of piano instruction on children’s academic achievement, school performance and self-esteem. Psychology of Music 32(2), 139–152. Dankovičová, J., House, J., Crooks, A., & Jones, K. (2007). The relationship between musical skills, music training, and intonation analysis skills. Language and Speech 50(2), 177–225. Deary, I. J., Strand, S., Smith, P., & Fernandes, C. (2007). Intelligence and educational achievement. Intelligence 35(1), 13–21. Degé, F., Kubicek, C., & Schwarzer, G. (2011). Music lessons and intelligence: A relation mediated by executive functions. Music Perception 29(2), 195–201. Degé, F., & Schwarzer, G. (2011). The effect of a music program on phonological awareness in preschoolers. Frontiers in Psychology 2, 124. doi: 10.3389/fpsyg.2011.00124. Degé, F., Wehrum, S., Stark, R., & Schwarzer, G. (2014). Music lessons and academic self-concept in 12- to 14-year-old children. Musicae Scientiae 18(2), 203–215. Delogu, F., Lampis, G., & Belardinelli, M. O. (2010). From melody to lexical tone: Musical ability enhances specific aspects of foreign language perception. European Journal of Cognitive Psychology 22(1), 46–61. Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased cortical representation of the fingers of the left hand in string players. Science 270(5234), 305–307. Faßhauer, C., Frese, A., & Evers, S. (2015). Musical ability is associated with enhanced auditory and visual cognitive processing. BMC Neuroscience 16(1), 1. Retrieved from https://doi.org/10.1186/s12868-015-0200-4
Fauvel, B., Groussard, M., Mutlu, J., Arenaza-Urquijo, E. M., Eustache, F., Desgranges, B., & Platel, H. (2014). Musical practice and cognitive aging: Two cross-sectional studies point to phonemic fluency as a potential candidate for a use-dependent adaptation. Frontiers in Aging Neuroscience 6, 227. doi:10.3389/fnagi.2014.00227 Fitzpatrick, K. R. (2006). The effect of instrumental music participation and socioeconomic status on Ohio fourth-, sixth-, and ninth-grade proficiency test performance. Journal of Research in Music Education 54(1), 73–84. Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schön, D. (2015). Music training increases phonological awareness and reading skills in developmental dyslexia: A randomized control trial. PLoS ONE 10(9), e0138715. Forgeard, M., Winner, E., Norton, A., & Schlaug, G. (2008). Practicing a musical instrument in childhood is associated with enhanced verbal ability and nonverbal reasoning. PLoS ONE 3(10), e3566. Frakes, L. (1985). Differences in music achievement, academic achievement, and attitude among participants, dropouts, and nonparticipants in secondary school music. Dissertation Abstracts International 46, 370A. University Microfilms No. AAC8507938. François, C., Chobert, J., Besson, M., & Schön, D. (2013). Music training for the development of speech segmentation. Cerebral Cortex 23(9), 2038–2043. Franklin, M. S., Moore, K. S., Yip, C., Jonides, J., Rattray, K., & Moher, J. (2008). The effects of musical training on verbal memory. Psychology of Music 36(3), 353–365. Gardiner, M., Fox, A., Knowles, F., & Jeffry, D. (1996). Learning improved by arts training. Nature 381(6580), 284. Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musicians. Journal of Neuroscience 23(27), 9240–9245. Gibson, C., Folley, B. S., & Park, S. (2009). Enhanced divergent thinking and creativity in musicians: A behavioral and near-infrared spectroscopy study. Brain and Cognition 69(1), 162–169.
Good, A., Gordon, K. A., Papsin, B. C., Nespoli, G., Hopyan, T., Peretz, I., & Russo, F. A. (2017). Benefits of music training for perception of emotional speech prosody in deaf children with cochlear implants. Ear & Hearing 38(4), 455–464. Gordon, E. E. (1965). Music aptitude profile. Chicago: GIA. Gouzouasis, P., Guhn, M., & Kishor, N. (2007). The predictive relationship between achievement and participation in music and achievement in core Grade 12 academic subjects. Music Education Research 9(1), 81–92. Grassi, M., Meneghetti, C., Toffalini, E., & Borella, E. (2017). Auditory and cognitive performance in elderly musicians and nonmusicians. PLoS ONE 12(11), e0187881. Graziano, A. B., Peterson, M., & Shaw, G. L. (1999). Enhanced learning of proportional math through music training and spatial-temporal training. Neurological Research 21(2), 139–152. Gromko, J. E. (2005). The effect of music instruction on phonemic awareness in beginning readers. Journal of Research in Music Education 53(3), 199–209. Gromko, J. E., & Poorman, A. S. (1998). Developmental trends and relationships in children’s aural perception and symbol use. Journal of Research in Music Education 46(1), 16–23. Gruhn, W., Galley, N., & Kluth, C. (2003). Do mental speed and musical abilities interact? Annals of the New York Academy of Sciences 999, 485–496. Guo, X., Ohsawa, C., Suzuki, A., & Sekiyama, K. (2018). Improved digit span in children after a 6-week intervention of playing a musical instrument: An exploratory randomized controlled trial. Frontiers in Psychology 8, 2303. doi: 10.3389/fpsyg.2017.02303 Hanna-Pladdy, B., & Gajewski, B. (2012). Recent and past musical activity predicts cognitive aging variability: Direct comparison with general lifestyle activities. Frontiers in Human Neuroscience
6, 198. doi: 10.3389/fnhum.2012.00198 Hanna-Pladdy, B., & MacKay, A. (2011). The relation between instrumental musical activity and cognitive aging. Neuropsychology 25(3), 378–386. Hannon, E. E., & Trainor, L. J. (2007). Music acquisition: Effects of enculturation and formal training on development. Trends in Cognitive Sciences 11(11), 466–472. Hansen, M., Wallentin, M., & Vuust, P. (2013). Working memory and musical competence of musicians and non-musicians. Psychology of Music 41(6), 779–793. Hassler, M., Birbaumer, N., & Feil, A. (1985). Musical talent and visuo-spatial abilities: A longitudinal study. Psychology of Music 13(2), 99–113. Haywood, S., Griggs, J., Lloyd, C., Morris, S., Kiss, Z., & Skipp, A. (2015). Creative futures: Act, sing, play. Evaluation report and executive summary. London: Education Endowment Foundation. Helmbold, N., Rammsayer, T., & Altenmüller, E. (2005). Differences in primary mental abilities between musicians and nonmusicians. Journal of Individual Differences 26, 74–85. Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity: Behavior, function, and structure. Neuron 76(3), 486–502. Herrero, L., & Carriedo, N. (2018). Differences in updating processes between musicians and nonmusicians from late childhood to adolescence. Learning and Individual Differences 61, 188–195. Hetland, L. (2000). Learning to make music enhances spatial reasoning. Journal of Aesthetic Education 34(3–4), 179–238. Ho, Y., Cheung, M., & Chan, A. S. (2003). Music training improves verbal but not visual memory: Cross-sectional and longitudinal explorations in children. Neuropsychology 17(3), 439–450. Jaeggi, S. M., Buschkuehl, M., Jonides, J., & Perrig, W. J. (2008). Improving fluid intelligence with training on working memory. Proceedings of the National Academy of Sciences 105, 6829–6833. Jakobson, L. S., Lewycky, S. T., Kilgour, A. R., & Stoesz, B. M. (2008).
Memory for verbal and visual material in highly trained musicians. Music Perception 26(1), 41–55. Jaschke, A. C., Honing, H., & Scherder, E. J. A. (2018). Longitudinal analysis of music education on executive functions in primary school children. Frontiers in Neuroscience 12, 103. doi:10.3389/fnins.2018.00103 Jensen, A. R. (1969). How much can we boost IQ and scholastic achievement? Harvard Educational Review 39(1), 1–123. Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger. Judge, T. A., Higgins, C. A., Thoresen, C. J., & Barrick, M. R. (1999). The big five personality traits, general mental ability, and career success across the life span. Personnel Psychology 52(3), 621–652. Kaviani, H., Mirbaha, H., Pournaseh, M., & Sagan, O. (2014). Can music lessons increase the performance of preschool children in IQ tests? Cognitive Processing 15(1), 77–84. Kinney, D. W. (2008). Selected demographic variables, school music participation and achievement test scores of urban middle school students. Journal of Research in Music Education 56(2), 145–161. Kinney, D. W. (2010). Selected nonmusic predictors of urban students’ decisions to enroll and persist in middle school band programs. Journal of Research in Music Education 57(4), 334–350. Klinedinst, R. E. (1991). Predicting performance achievement and retention of fifth-grade instrumental students. Journal of Research in Music Education 39(3), 225–238. Klingberg, T. (2010). Training and plasticity of working memory. Trends in Cognitive Sciences 14(7), 317–324. Koelsch, S., Schröger, E., & Tervaniemi, M. (1999). Superior pre-attentive auditory processing in musicians. Neuroreport 10, 1309–1313.
Kolinsky, R., Cuvelier, H., Goetry, V., Peretz, I., & Morais, J. (2009). Music training facilitates lexical stress processing. Music Perception 26(3), 235–246. Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills. Nature Reviews Neuroscience 11, 599–605. Kraus, N., Slater, J., Thompson, E. C., Hornickel, J., Strait, D. L., Nicol, T., & White-Schwoch, T. (2014). Music enrichment programs improve the neural encoding of speech in at-risk children. Journal of Neuroscience 34(36), 11913–11918. Law, L. N. C., & Zentner, M. (2012). Assessing musical abilities objectively: Construction and validation of the Profile of Music Perception Skills. PLoS ONE 7(12), e52508. Leng, X., Shaw, G. L., & Wright, E. L. (1990). Coding of musical structure and the trion model of cortex. Music Perception 8(1), 49–62. Love, J. M., Chazan-Cohen, R., Raikes, H., & Brooks-Gunn, J. (2013). What makes a difference: Early Head Start evaluation findings in a developmental context. Monographs of the Society for Research in Child Development 78(1), 1–173. Mackintosh, N. J. (2011). IQ and human intelligence (2nd ed.). Oxford: Oxford University Press. Magne, C., Schön, D., & Besson, M. (2006). Musician children detect pitch violations in both music and language better than nonmusician children: Behavioral and electrophysiological approaches. Journal of Cognitive Neuroscience 18(2), 199–211. Mansens, D., Deeg, D. J. H., & Comijs, H. C. (2017). The association between singing and/or playing a musical instrument and cognitive functions in older adults. Aging & Mental Health. doi:10.1080/13607863.2017.1328481 Marques, C., Moreno, S., Castro, S. L., & Besson, M. (2007). Musicians detect pitch violation in a foreign language better than non-musicians: Behavioral and electrophysiological evidence. Journal of Cognitive Neuroscience 19(9), 1453–1463. Mehr, S. A., Schachner, A., Katz, R. C., & Spelke, E. S. (2013). 
Two randomized trials provide no consistent evidence for nonmusical cognitive benefits of brief preschool music enrichment. PLoS ONE 8(12), e82007. Melby-Lervåg, M., & Hulme, C. (2013). Is working memory training effective? A meta-analytic review. Developmental Psychology 49(2), 270–291. Melby-Lervåg, M., Reddick, T. S., & Hulme, C. (2016). Working memory training does not improve performance on measures of intelligence or other measures of “far transfer.” Perspectives on Psychological Science 11(4), 512–534. Moody, D. E. (2009). Can intelligence be increased by training on a task of working memory? Intelligence 37, 327–328. Moreno, S., Bialystok, E., Barac, R., Schellenberg, E. G., Cepeda, N. J., & Chau, T. (2011). Shortterm music training enhances verbal intelligence and executive function. Psychological Science 22(11), 1425–1433. Moreno, S., Friesen, D., & Bialystok, E. (2011). Effect of music training on promoting preliteracy skills: Preliminary causal evidence. Music Perception 29(2), 165–172. Moreno, S., Marques, C., Santos, A., Santos, M., Castro, S. L., & Besson, M. (2009). Musical training influences linguistic abilities in 8-year-old children: More evidence for brain plasticity. Cerebral Cortex 19(3), 712–723. Mosing, M. A., Madison, G., Pedersen, N. L., & Ullén, F. (2016). Investigating cognitive transfer within the framework of music practice: Genetic pleiotropy rather than causality. Developmental Science 19(3), 504–512. A Münte, T. F., Altenmüller, E., & Jäncke, L. (2002). The musician’s brain as a model of neuroplasticity. Nature Reviews Neuroscience 3(6), 473–478.
Okada, B. M., & Slevc, R. L. (2018). Individual differences in musical training and executive functions: A latent variable approach. Memory & Cognition. doi:10.3758/s13421-018-0822-8 Overy, K. (2003). Dyslexia and music. Annals of the New York Academy of Sciences 999, 497–505. Parbery-Clark, A., Skoe, E., Lam, C., & Kraus, N. (2009). Musician enhancement for speech in noise. Ear and Hearing 30(6), 653–661. Parbery-Clark, A., Strait, D. L., Anderson, S., Hittner, E., & Kraus, N. (2011). Musical experience and the aging auditory system: Implications for cognitive abilities and hearing speech in noise. PLoS ONE 6(5), e18082. Parbery-Clark, A., Tierney, A., Strait, D. L., & Kraus, N. (2012). Musicians have fine-tuned neural distinction of speech syllables. Neuroscience 219, 111–119. Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience 6, 674–681. Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Frontiers in Psychology 2, 142. doi: 10.3389/fpsyg.2011.00142 Patston, L. L. M., & Tippett, L. J. (2011). The effect of background music on cognitive performance in musicians and nonmusicians. Music Perception 29(2), 173–183. Petitto, L. (2008). Arts education, the brain, and language. In B. Rich & C. Asbury (Eds.), Learning, arts, and the brain: The Dana Consortium report on arts and cognition (pp. 93–104). New York/Washington, DC: The Dana Foundation. Piro, J. M., & Oritz, C. (2009). The effect of piano lessons on the vocabulary and verbal sequencing skills of primary grade students. Psychology of Music 37(3), 325–347. Portowitz, A., Lichtenstein, O., Egorova, L., & Brand, E. (2009). Underlying mechanisms linking music education and cognitive modifiability. Research Studies in Music Education 31(2), 107–128. Posedel, J., Emery, L., Souza, B., & Fountain, C. (2012). Pitch perception, working memory, and second-language phonological production. Psychology of Music 40(4), 508–517. 
Posner, M., Rothbart, M. K., Sheese, B. E., & Kieras, J. (2008). How arts training influences cognition. In B. Rich & C. Asbury (Eds.), Learning, arts, and the brain: The Dana Consortium report on arts and cognition (pp. 1–10). New York/Washington, DC: The Dana Foundation. Rapport, M. D., Orban, S. A., Kofler, M. J., & Friedman, L. M. (2013). Do programs designed to train working memory, other executive functions, and attention benefit children with ADHD? A meta-analytic review of cognitive, academic, and behavioral outcomes. Clinical Psychology Review 33(8), 1237–1252. Rauscher, F. H., & Hinton, S. C. (2011). Music instruction and its diverse extra-musical benefits. Music Perception 29(2), 215–226. Rauscher, F. H., & Shaw, G. L. (1998). Key components of the Mozart effect. Perceptual and Motor Skills 86(3), 835–841. Rauscher, F. H., Shaw, G. L., Levine, L. J., Wright, E. L., Dennis, W. R., & Newcomb, R. L. (1997). Music training causes long-term enhancement of preschool children’s spatial-temporal reasoning. Neurological Research 19(1), 2–8. Rauscher, F. H., & Zupan, M. A. (2000). Classroom keyboard instruction improves kindergarten children’s spatial-temporal performance: A field experiment. Early Childhood Research Quarterly 15(2), 215–228. Roden, I., Grube, D., Bongard, S., & Kreutz, G. (2014). Does music training enhance working memory performance? Findings from a quasi-experimental longitudinal study. Psychology of Music 42(2), 284–298. Rueda, M. R., Rothbart, M. K., McCandliss, B. D., Saccamanno, L., & Posner, M. I. (2005). Training, maturation and genetic influences on the development of executive attention. Proceedings of the National Academy of Sciences 102, 14931–14936.
Ruggles, D. R., Freyman, R. L., & Oxenham, A. J. (2014). Influence of musical training on understanding voiced and whispered speech in noise. PLoS ONE 9(1), e86980. Sala, G., & Gobet, F. (2016). Do the benefits of chess instruction transfer to academic and cognitive skills? A meta-analysis. Educational Research Review 18, 46–57. Sala, G., & Gobet, F. (2017a). When the music’s over: Does music skill transfer to children’s and young adolescents’ cognitive and academic skills? A meta-analysis. Educational Research Review 20, 55–67. Sala, G., & Gobet, F. (2017b). Working memory training in typically developing children: A metaanalysis of the available evidence. Developmental Psychology 53, 671–685. Sala, G., Tatlidil, K. S., & Gobet, F. (2018). Video game training does not enhance cognitive ability: A comprehensive meta-analytic investigation. Psychological Bulletin 144, 111–139. Salthouse, T. A. (2004). What and when of cognitive aging. Current Directions in Psychological Science 13(4), 140–144. Salthouse, T. A. (2005). Relations between cognitive abilities and measures of executive functioning. Neuropsychology 19(4), 532–545. Schellenberg, E. G. (2004). Music lessons enhance IQ. Psychological Science 15(8), 511–514. Schellenberg, E. G. (2006). Long-term positive associations between music lessons and IQ. Journal of Educational Psychology 98(2), 457–468. Schellenberg, E. G. (2011a). Examining the association between music lessons and intelligence. British Journal of Psychology 102(3), 283–302. Schellenberg, E. G. (2011b). Music lessons, emotional intelligence, and IQ. Music Perception 29(2), 185–194. Schellenberg, E. G., & Mankarious, M. (2012). Music training and emotion comprehension in childhood. Emotion 12(5), 887–891. Schellenberg, E. G., & Moreno, S. (2010). Music lessons, pitch processing and g. Psychology of Music 38(2), 209–221. Schellenberg, E. G., & Peretz, I. (2008). Music, language and cognition: Unresolved issues. 
Trends in Cognitive Sciences 12(2), 45–46. Schellenberg, E. G., & Weiss, M. W. (2013). Music and cognitive abilities. In D. Deutsch (Ed.), The psychology of music (3rd ed.). San Diego, CA: Elsevier. Seashore, C. E., Lewis, D., & Saetveit, J. G. (1960). The Seashore measures of musical talents. New York: Psychological Corporation. Shipstead, Z., Redick, T. S., & Engle, R. W. (2012). Is working memory training effective? Psychological Bulletin 138, 628–654. Sirin, S. R. (2005). Socioeconomic status and academic achievement: A meta-analytic review of research. Review of Educational Research 75(3), 417–453. Skoe, E., & Kraus, N. (2012). Human subcortical auditory function provides a new conceptual framework for considering modularity. In P. Rebuschat, M. Rohrmeier, J. A. Hawkins, & I. Cross (Eds.), Language and music as cognitive systems (pp. 269–282). Oxford: Oxford University Press. Slater, J., Skoe, E., Strait, D. L., O’Connell, S., Thompson, E., & Kraus, N. (2015). Music training improves speech-in-noise perception: Longitudinal evidence from a community-based music program. Behavioural Brain Research 291, 244–252. Slevc, L. R., Davey, N. S., Buschkuehl, M., & Jaeggi, S. M. (2016). Tuning the mind: Exploring the connections between musical ability and executive functions. Cognition 152, 199–211. Sluming, V., Brooks, J., Howard, M., Downes, J. J., & Roberts, N. (2007). Broca’s area supports enhanced visuospatial cognition in orchestral musicians. Journal of Neuroscience 27(14), 3799– 3806.
Soveri, A., Antfolk, J., Karlsson, L., Salo, B., & Laine, M. (2017). Working memory training revisited: A multi-level meta-analysis of n-back training studies. Psychonomic Bulletin & Review 24(4), 1077–1096. Spelke, E. (2008). Effects of music instruction on developing cognitive systems at the foundation of mathematics and science. In B. Rich & C. Asbury (Eds.), Learning, arts, and the brain: The Dana Consortium report on arts and cognition (pp. 17–49). New York/Washington, DC: The Dana Foundation. Spinath, B., Spinath, F. M., Harlaar, N., & Plomin, R. (2006). Predicting school achievement from general cognitive ability, self-perceived ability, and intrinsic value. Intelligence 34(4), 363–374. Standley, J. M. (2008). Does music instruction help children learn to read? Evidence of a metaanalysis. Update: Applications of Research in Music Education 27(1), 17–32 Stoesz, B., Jakobson, L., Kilgour, A., & Lewycky, S. (2007). Local processing advantage in musicians: Evidence from disembedding and constructional tasks. Music Perception 25(2), 153– 165. Strait, D. L., & Kraus, N. (2011). Can you hear me now? Musical training shapes functional brain networks for selective auditory attention and hearing speech in noise. Frontiers in Psychology 2, 113. doi: 10.3389/fpsyg.2011.00113 Strait, D. L., O’Connell, S., Parbery-Clark, A., & Kraus, N. (2014). Musicians’ enhanced neural differentiation of speech sounds arises early in life: Developmental evidence from ages 3 to 30. Cerebral Cortex 24(9), 2512–2521. Strait, D. L., Parbery-Clark, A., Hittner, E., & Kraus, N. (2012). Musical training during early childhood enhances the neural encoding of speech in noise. Brain and Language 123(3), 191–201. Strait, D. L., Parbery-Clark, A., O’Connell, S., & Kraus, N. (2013). Biological impact of preschool music classes on processing speech in noise. Developmental Cognitive Neuroscience 6, 51–60. Swaminathan, J., Mason, C. R., Streeter, T. M., Best, V., Kidd Jr, G., & Patel, A. D. (2015). 
Musical training, individual differences and the cocktail party problem. Scientific Reports 5, 11628. doi:10.1038/srep11628 Swaminathan, S., & Gopinath, J. K. (2013). Music training and second-language English comprehension and vocabulary skills in Indian children. Psychological Studies 58(2), 164–170. Swaminathan, S., & Schellenberg, E. G. (2016). Music training. In T. Strobach & J. Karbach (Eds.), Cognitive training: An overview of features and applications (pp. 137–144). New York: Springer. Swaminathan, S., & Schellenberg, E. G. (2017). Musical competence and phonemic perception in a foreign language. Psychonomic Bulletin and Review 24(6), 1929–1934. Swaminathan, S., & Schellenberg, E. G. (2018). Musical competence is predicted by music training, cognitive abilities, and personality. Manuscript submitted for publication. Swaminathan, S., Schellenberg, E. G., & Khalil, S. (2017). Revisiting the association between music lessons and intelligence: Training effects or music aptitude? Intelligence 62, 119–124. Swaminathan, S., Schellenberg, E. G., & Venkatesan, K. (2018). Explaining the association between music training and reading in adults. Journal of Experimental Psychology: Learning, Memory, and Cognition. doi: 10.1037/xlm0000493 Talamini, F., Grassi, M., Toffalini, E., Santoni, R., & Carretti, B. (2018). Learning a second language: Can music aptitude or music training have a role? Learning and Individual Differences 64, 1–7. Thompson, W. F., Schellenberg, E. G., & Husain, G. (2004). Decoding speech prosody: Do music lessons help? Emotion 4(1), 46–64. Thomson, J. M., Leong, V., & Goswami, U. (2013). Auditory processing interventions and developmental dyslexia: A comparison of phonemic and rhythmic approaches. Reading and Writing 26(2), 139–161.
Thorndike, E. L., & Woodworth, R. S. (1901a). The influence of improvement in one mental function upon the efficiency of other functions (I). Psychological Review 8, 247–261. Thorndike, E. L., & Woodworth, R. S. (1901b). The influence of improvement in one mental function upon the efficiency of other functions (II). The estimation of magnitudes. Psychological Review 8, 384–395. Tierney, A., Krizman, J., Skoe, E., Johnston, K., & Kraus, N. (2013). High school music classes enhance the neural processing of speech. Frontiers in Psychology 4, 855. doi: 10.3389/fpsyg.2013.00855. Trimmer, C. G., & Cuddy, L. L. (2008). Emotional intelligence, not music training, predicts recognition of emotional speech prosody. Emotion 8(6), 838–849. Vaughn, K. (2000). Music and mathematics: Modest support for the oft-claimed relationship. Journal of Aesthetic Education 34(3–4), 149–166. Vaughn, K., & Winner, E. (2000). SAT scores of students who study the arts: What we can and cannot conclude about the association. Journal of Aesthetic Education 34(3–4), 77–89. Wallentin, M., Nielsen, A. H., Friis-Olivarius, M., Vuust, C., & Vuust, P. (2010). The Musical Ear Test: A new reliable rest for measuring musical competence. Learning and Individual Differences 20(3), 188–196. Wan, C. Y., & Schlaug, G. (2010). Music making as a tool for promoting brain plasticity across the life span. The Neuroscientist 16(5), 566–577. Wandell, B., Dougherty, R. F., Ben-Shachar, M., Deutsch, G. K., & Tsang, J. (2008). Training in the arts, reading, and brain imaging. In B. Rich & C. Asbury (Eds.), Learning, arts, and the brain: The Dana Consortium report on arts and cognition (pp. 51–59). New York/Washington, DC: The Dana Foundation. Weicker, J., Villringer, A., & Thöne-Otto, A. (2016). Can impaired working memory functioning be improved by training? A meta-analysis with a special focus on brain injured patients. Neuropsychology 30(2), 190–212. Weiss, M. W., & Bidelman, G. M. (2015). 
Listening to the brainstem: Musicianship enhances intelligibility of subcortical representations for speech. Journal of Neuroscience 35(4), 1687–1691. Wetter, O. E., Koerner, F., & Schwaninger, A. (2009). Does musical training improve school performance? Instructional Science 37(4), 365–374. White-Schwoch, T., Carr, K. W., Anderson, S., Strait, D. L., & Kraus, N. (2013). Older adults benefit from music training early in life: Biological evidence for long-term training-driven plasticity. Journal of Neuroscience 33(45), 17667–17674. Winner, E., & Cooper, M. (2000). Mute those claims: No evidence (yet) for a causal link between arts study and academic achievement. Journal of Aesthetic Education 34(3–4), 11–75. Winner, E., Goldstein, T. R., & Vincent-Lancrin, S. (2012). The impact of arts education: What do we know? Paris: Organisation for Economic Co-operation and Development. Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience 10, 420–422. Wu, H., Ma, X., Zhang, L., Liu, Y., Zhang, Y., & Shu, H. (2015). Musical experience modulates categorical perception of lexical tones in native Chinese speakers. Frontiers in Psychology 6, 436. doi:10.3389/fpsyg.2015.00436 Zendel, B. R., & Alain, C. (2009). Concurrent sound segregation is enhanced in musicians. Journal of Cognitive Neuroscience 21(8), 1488–1498. Zendel, B. R., & Alain, C. (2012). Musicians experience less age-related decline in central auditory processing. Psychology and Aging 27(20), 410–417. Zendel, B. R., & Alain, C. (2013). The influence of lifelong musicianship on neurophysiological measures of concurrent sound segregation. Journal of Cognitive Neuroscience 25(4), 503–516.
Zuk, J., Benjamin, C., Kenyon, A., & Gaab, N. (2014). Behavioral and neural correlates of executive functioning in musicians and non-musicians. PLoS ONE 9(6), e99868. Zuk, J., Ozernov-Palchik, O., Kim, H., Lakshminarayanan, K., Gabrieli, J. D., Tallal, P., & Gaab, N. (2013). Enhanced syllable discrimination thresholds in musicians. PLoS ONE 8(12), e80546.
CHAPTER 27
THE NEUROSCIENCE OF CHILDREN ON THE AUTISM SPECTRUM WITH EXCEPTIONAL MUSICAL ABILITIES
ADAM OCKELFORD
Introduction
This chapter considers the exceptional musicianship that characterizes some children on the autism spectrum who have learning difficulties, for whom many areas of life that most of us take for granted (speech and language, emotional intelligence, and social skills) present sometimes insurmountable challenges. Yet as developing musicians these children may function in much the same way as infant prodigies (McPherson, 2016). How is this possible, and what does it tell us about their evolving music minds? We begin by revisiting William Gaver’s “ecological” analysis of hearing in the context of autism.
G
’ E L
T A
In his “ecological” approach to understanding auditory perception, Gaver describes how, in everyday contexts, listeners typically privilege the function of sounds over their acoustic properties (Gaver, 1993). He gives the example of a pedestrian walking along an alley when a sound starts to emerge from behind: that of a car with a large and powerful engine. It is possible, Gaver contends, that the person concerned may attend to the sound’s timbre, noticing whether it is rough or smooth, or bright or dull. Paying attention to these attributes, which have to do with the quality of the sound itself, Gaver terms “musical” listening. It is more likely, however, in the situation described, that the pedestrian will notice that the car is approaching rather quickly from behind and that the sound of its engine is echoing off the narrow walls of the alley. There is a need to move quickly to get out of the vehicle’s way! This Gaver terms “everyday” listening: the experience of focusing on the significance of an event rather than its acoustic properties.
However, there are people for whom this unthinking prioritization of function over form does not appear to occur. Among them are many children with so-called “classic” autism, the type first identified by Leo Kanner in 1943. Autism is a lifelong neurological condition that typically manifests itself within the first two or three years of childhood (see, for example, Boucher, 2009; Frith, 2003; Wing, 2003). Its effects pervade the whole of a child’s development. According to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (the so-called “DSM-5”), published by the American Psychiatric Association in 2013, Autism Spectrum Disorder is identified through two criteria (APA, 2013, 299.00 (F84.0)):
A. Persistent deficits in social communication and social interaction across multiple contexts, as manifested by the following [illustrative examples]:
1. Deficits in social-emotional reciprocity, ranging, for example, from abnormal social approach and failure of normal back-and-forth conversation; to reduced sharing of interests, emotions, or affect; to failure to initiate or respond to social interactions.
2. Deficits in nonverbal communicative behaviors used for social interaction, ranging, for example, from poorly integrated verbal and nonverbal communication; to abnormalities in eye contact and body language or deficits in understanding and use of gestures; to a total lack of facial expressions and nonverbal communication.
3. Deficits in developing, maintaining, and understanding relationships, ranging, for example, from difficulties adjusting behavior to suit various social contexts; to difficulties in sharing imaginative play or in making friends; to absence of interest in peers.
B. Restricted, repetitive patterns of behavior, interests, or activities, as manifested by at least two of the following [illustrative examples]:
1. Stereotyped or repetitive motor movements, use of objects, or speech (e.g., simple motor stereotypies, lining up toys or flipping objects, echolalia, idiosyncratic phrases).
2. Insistence on sameness, inflexible adherence to routines, or ritualized patterns of verbal or nonverbal behavior (e.g., extreme distress at small changes, difficulties with transitions, rigid thinking patterns, greeting rituals, need to take same route or eat same food every day).
3. Highly restricted, fixated interests that are abnormal in intensity or focus (e.g., strong attachment to or preoccupation with unusual objects, excessively circumscribed or perseverative interest).
4. Hyper- or hyporeactivity to sensory input or unusual interests in sensory aspects of the environment (e.g., apparent indifference to pain/temperature, adverse response to specific sounds or textures, excessive smelling or touching of objects, visual fascination with lights or movement).
Criterion B, Example 4 is of particular interest to us here: the observation that the perceptual qualities of objects may be more important than their function. This is corroborated by the accounts of parents of children on the autism spectrum, who often report a deep-rooted fascination with sound for its own sake (Ockelford, 2013, pp. 13, 14).
For instance, one mother reports that her son Jack “is obsessed with the beeping sound of the microwave when its cooking cycle comes to an end. He can’t bear to leave the kitchen till it’s stopped. And just lately, he’s become very interested in the whirr of the tumble-drier too.” Another describes how her 4-year-old daughter just repeats what is said to her. “For a long time, she didn’t speak at all, but now I might say, ‘Hello, Anna’, and she will reply ‘Hello, Anna’. I ask ‘Do you want to play with your toys?’ and she just says ‘Play with your toys’, though I don’t think she really knows what I mean.” A father relates how his son Ben “wants to listen to the jingles that he downloads from the internet all the time, 16 hours a day if we let him. He doesn’t even play them all the way through: sometimes just the first couple of seconds of a clip, over and over again. He must have heard them thousands of times, but he never seems to get bored.” Freddie, according to his mother, is obsessed with sound too. He “constantly flicks any glasses, bowls, pots or pans that are within reach.” He once “emptied out the dresser (and even brought in half a dozen flowerpots from the garden) and lined everything up on the floor. Then he sat and ‘played’ his new instrument for hours.” Finally, Romy goes through phases of only pretending to play the notes on her keyboard, touching the keys with her fingers but not actually pressing them down. She also “introduces everyday sounds that she hears into her improvising. For example, she plays the complicated descending harmonic sound of the airplanes coming into land at Heathrow as chords, and somehow integrates them into the music she is playing.”
What causes some autistic children to hear sounds in this way? And what impact does this idiosyncratic style of auditory perception have on the way they engage with music? To contextualize these questions, let us consider the manner in which so-called “neurotypical” infants come to process sound.
If Gaver’s ecological model of hearing is correct, then there must be a stage in auditory development at which the separation between “functional” and “perceptual” listening occurs. Moreover, there is a further category of listening that Gaver does not mention: that pertaining to language, which is ultimately based on the perception and cognition of speech sounds. The separation of music and language perception ties in with evidence from neuroscience, which suggests that, while the two domains share some neurological resources, they also have dedicated processing pathways (Patel, 2012). These are distinct from those activated by environmental sounds (Norman-Haignere, Kanwisher, & McDermott, 2015).
As yet, it is unknown how and when the three types of auditory processing, relating to everyday sounds, music, and speech, become established in the architecture of the brain following the development of hearing from around four months before birth (Lecanuet, 1996). There is a growing body of evidence that musical engagement is essential to language acquisition (Brandt, Gebrian, & Slevc, 2012), suggesting that the neural correlates of music perception emerge first. My own work, the “zygonic” theory of music-structural understanding (Ockelford, 2017), supports this view.
Zygonic Theory
The theory sets out from the reductionist position that music can be regarded as a system of perceived sonic variables. Some of these, such as duration, have a single axis of variability, while others, like timbre, are multidimensional in nature; some, including loudness, gauge qualities of a sound, while others detail its perceived location in time or space; and some, like pitch, pertain to individual notes, while others, such as tonality, are characteristic of a group. Despite this diversity, all these variables have a common characteristic: they each have many potential modes of existence, or “values,” whose range represents the freedom of choice available to composers. Conversely, each may be deemed to be constrained or “ordered” to the extent that its value is reckoned to be subject to restriction. While some of the causes of perceived sonic constraint may lie beyond a composer’s immediate control (the selection of timbre will be dictated by the availability of performers, for example, and a singer may be unable to reach a particular pitch), and while external influences (such as the cross-media effects of song-texts, for instance) often have a bearing, most (and certainly the most important) perceived sonic restrictions in fact function intra-musically, through the process of repetition. In short, a value may be thought to be ordered if it is reckoned to exist in imitation of another. Since the vast majority of listeners are quite unaware of this type of cognitive activity, clearly it need not operate at a conscious level. Yet it must be universally present, if only subliminally, otherwise an orderly sequence of sounds would prove no more effective a means of musical communication than a random one, which is not the case.
While the acknowledgment of the central role of repetition in musical structure is widespread in the music-theoretical and music-psychological literatures, the function of imitation is less well understood. In this respect, zygonic theory most closely resonates with the thinking of Cone (1987, p. 237), who asserts, in relation to the derivation of musical material, that “y is derived from x (y ← x), or, to use the active voice, x generates y (x → y), if y resembles x and y follows x. By ‘resembles’, I mean ‘sounds like’.” In zygonic theory, the connections between x and y that Cone identifies, through which a sense of derivation (or generation) is imagined to exist, are termed “zygonic relationships” (after the Greek “zygon,” meaning a yoke connecting two similar things). This single theoretical concept bequeaths a vast perceptual legacy, with many manifestations: potentially involving any perceived aspect of sound; existing over different periods of perceived time; and operating within the same and between different pieces, performances, and hearings.
Zygonic relationships may function in a number of different ways: reactively, for example, in assessing the relationship between two extant values, or proactively, in ideating a value as an orderly continuation from one presented. They may operate between anticipated or remembered values, or even those that are wholly imagined, only ever existing in the mind. In music-structural terms, zygonic relationships function at three hierarchical levels: between individual events (notes or chords), groups (motifs, “hooks,” “licks,” or riffs) and frameworks (imaginary matrices of pitch and time, whose elements have different perceived probabilities of occurrence according to a listener’s previous exposure to pieces in a particular style or genre).
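Cone's condition (y is derived from x if y resembles x and y follows x) can be made concrete with a small sketch. What follows is an illustration only, not part of zygonic theory as published: the `Event` class, the MIDI-style pitch numbers, and the reduction of "resembles" to exact pitch equality (for events) or an identical interval pattern (for groups) are all simplifying assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """A single note (hypothetical encoding, assumed for illustration)."""
    onset: float  # perceived time of the note, in beats
    pitch: int    # MIDI-style note number


def derives_from(y: Event, x: Event) -> bool:
    """Event level: y follows x in time and 'resembles' it.

    Resemblance is reduced to identical pitch, a deliberate simplification.
    """
    return y.onset > x.onset and y.pitch == x.pitch


def intervals(group: List[Event]) -> List[int]:
    """Successive pitch differences within a group of events."""
    return [b.pitch - a.pitch for a, b in zip(group, group[1:])]


def group_derives_from(g2: List[Event], g1: List[Event]) -> bool:
    """Group level: g2 starts after g1 ends and shares its interval pattern.

    An identical interval pattern covers exact repetition and transposition.
    Each group needs at least two events, so four events are involved in all.
    """
    return (
        len(g1) >= 2
        and len(g1) == len(g2)
        and g2[0].onset > g1[-1].onset
        and intervals(g1) == intervals(g2)
    )


# Usage: a repeated note, and a two-note motif echoed a fourth higher.
x, y = Event(0.0, 60), Event(1.0, 60)
motif = [Event(0.0, 60), Event(1.0, 62)]
echo = [Event(2.0, 65), Event(3.0, 67)]
print(derives_from(y, x), group_derives_from(echo, motif))
```

The relation is asymmetric by construction: `derives_from(x, y)` is false because x does not follow y, which mirrors the directionality of Cone's arrows.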
Recognizing imitation between individual musical events is thought to take the least mental processing power of all forms of structure (Ockelford, 2017, p. 187), since it requires at most two or three items of musical information, in the form of notes or chords or the intervals between them, to be held in working memory and compared. The temporal envelope within which such structures occur is constrained, sometimes extending to little more than Edmund Husserl’s “perceived present.” Recognizing relationships between motifs is cognitively more demanding: organization of this kind necessarily involves four events or more, since at least two are required to create a group (and a minimum of two groups is required). The timespans of such structures are potentially greater than those involving events alone, and may even implicate long-term memory. There is likely to be a greater degree of abstraction from the perceptual “surface” too.
Imitative links between frameworks of pitch and onset times appear to be the most cognitively demanding of all. They depend on the existence of long-term “schematic” memories, whereby the details of the perceptual surface of music and individual connections perceived between musical events are not encoded in long-term memory discretely or independently, but are combined with many thousands of other similar data to create probabilistic networks of relationships between notional representations of pitch and perceived time. That is, large amounts of perceptual information are merged to enable the deep level of cognitive abstraction to occur.
To sum up: the cognitive correlates of musical structure grow in complexity as one moves from events, to groups and then frameworks, reflecting an increasing amount of perceptual input, experienced over longer periods of time, and processed and stored using progressively more abstract forms of mental representation. Moreover, the cognitive operations pertaining to higher levels of structure must build on and incorporate those required to process lower levels, since connections between groups comprise series of relationships between events, and links between frameworks are established by acknowledging the correspondences that exist between groups.
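The idea that many individual hearings are merged into a probabilistic network can be sketched in a few lines. The example below is a rough analogy under stated assumptions, not the author's model: it stands in a first-order transition table over pitch classes for a "framework", ignores onset times entirely, and uses the hypothetical names `build_framework` and `expectancy`.

```python
from collections import Counter, defaultdict
from typing import Dict, List

Framework = Dict[int, Dict[int, float]]


def build_framework(melodies: List[List[int]]) -> Framework:
    """Pool pitch-to-pitch transitions from many melodies into probabilities.

    Pitches are folded into pitch classes (0-11), so surface detail such as
    register is discarded, loosely echoing abstraction from the perceptual
    surface. Assumption: a first-order model is enough for illustration.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for melody in melodies:
        for current, following in zip(melody, melody[1:]):
            counts[current % 12][following % 12] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


def expectancy(framework: Framework, current: int, candidate: int) -> float:
    """Estimated probability that `candidate` follows `current` (0.0 if unseen)."""
    return framework.get(current % 12, {}).get(candidate % 12, 0.0)


# Usage: two tiny melodies given as MIDI-style note numbers.
heard = [[60, 62, 64], [60, 62, 60]]
framework = build_framework(heard)
print(expectancy(framework, 72, 74))  # C to D, an octave above what was heard
print(expectancy(framework, 62, 64))  # D to E
```

No single melody is stored in the resulting table; only pooled tendencies remain, which is the property the passage attributes to schematic memory. A fuller treatment would need to model perceived time as well as pitch.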
T
N A
D
M P
To what extent does this hierarchy of music structures tie in with the development of children’s understanding of music? It seems that very young children have a built-in propensity to imitate others, and that this plays a part in early interactive sound-making using individual musical sounds (Meltzoff & Prinz, 2002). Similarly, in analyzing preverbal communication in babies from just two to seven months old, Papoušek (1996) found that up to half of these infants’ vocal sounds are part of reciprocal matching sequences that the children engage in with their mothers. These findings complement work by other researchers showing that babies less than five months of age can replicate individual pitches (Kessen, Levine, & Wendrich, 1979), copy changes in pitch (Kuhl & Meltzoff, 1982), and emulate vowel-like sounds made by others. Each of these forms of interaction involves imitation at the level of events, showing engagement with musical sounds at the first level of the structural hierarchy.
Engagement at the next level first appears from seven to eleven months, when babies repeat and vary groups of sounds, using them as the basic units of structure, through babbling that, according to Papoušek (1996, p. 106), involves producing “short musical patterns or phrases that soon become the core units for a new level of vocal practicing and play.” Gradually, groups of sounds may be linked through repetition or transposition to form chains, and the first self-sufficient improvised pieces emerge. Welch (2006) notes that between the ages of one and two, a typical spontaneous song comprises repetitions of a brief melodic phrase at different pitch centers. These are unlike adult singing, however, because “they lack a framework of stable pitches (a scale) and use a very limited set of contours in one song” (Dowling, 1982, pp. 416, 417). From the age of two-and-a-half, so-called “potpourri” melodies may appear (Moog, 1976, p. 115), which borrow and may transform features and fragments from other, standard songs that have been assimilated into the child’s own spontaneous singing (Hargreaves, 1986, p. 73). These self-generated melodies, which use materials derived from a repertoire that is familiar from a child’s musical culture, are termed “referent-guided improvisation” by Mang (2005).
Finally, from around the age of four (the age can vary considerably), two advances occur, which pertain to the third level of the music-structural hierarchy: frameworks.
First, children develop the capacity to abstract an underlying pulse from the surface rhythm of songs and other pieces (meaning that they can perform “in time” to a regular beat that is provided). Second, children’s singing acquires “tonal stability,” with the clear projection of a key center across all the phrases of a piece (Hargreaves, 1986, pp. 76, 77). These abilities imply a cognizance of repetition at a deeper structural level in the “background” organization of music. How does this compare with language development? Children start to understand a few key words towards the end of their first year, and will begin to speak using some of these words from around the age of 12
months. Around this time, they develop the capacity to process short phrases, and from 18–24 months they learn to juxtapose words in pairs themselves. Over the next two years, these become amalgamated into longer and more complex sentences, which are generated through an intuitive understanding of the syntax of the language (or languages) to which a child is exposed (Saxton, 2010). So much for the development of music and language processing in the early years. We can surmise that the third category in the ecological model of auditory perception—“everyday” sounds—must perceptually be the most primitive of all, since it requires less cognitive processing than either music or speech. Hence we can reasonably assume that, early on in human development, the brain typically treats all sound in the same way, and that music processing then emerges first as a distinct strand, followed by language. We can speculate that “everyday” sounds form the residue that is left. Based on these assumptions, a developmental model of ecological auditory perception can be constructed along the lines shown in Fig. 1. Note the underlying assumption that, in addition to their shared neural resources, music and language come to have other, distinct neural correlates during the first year of life. Since the precise nature of the sounds that constitute speech and music varies from one culture to another, it is appropriate to regard the model as indicative rather than prescriptive.
FIGURE 1. A visual representation of how the emerging streams of music and language processing arise in auditory development.
So much for “neurotypical” development. What of children on the autism spectrum, though? The parents’ descriptions cited above suggest that certain types of sound, particularly those that have a special salience for a given individual, or that they find singularly pleasing, such as the whirring of the tumble drier, have little or no functional significance for some autistic children. Rather, there is a tendency for them to be processed primarily in terms of their sounding qualities, in the same way that the elements of music are. Beyond this, it also appears that everyday sounds involving repetition or regular change (for example, the beeping of a microwave) may be processed in music-structural terms. That is to say, some children on the autism spectrum hear repetition that is generated mechanically or electronically as being imitative (see Fig. 2).
FIGURE 2. Some everyday sounds may be processed as music among children on the autism spectrum.
There is another possibility that should be acknowledged: that the autistic children who are preoccupied with the sounding qualities of certain everyday objects and the repetitive patterns that some of them make don’t hear these auditory phenomena in a musical way (as being derived from one another through imitation) but purely as environmental regularities. By extension, it could be the case that the same children don’t hear music in a “musical” way, either, but merely as patterned sequences of sounds, to which no sense of human agency is transferred by imitation. Why should this be so? One explanation would be that such children did not engage in the early vocal interactions with carers—“communicative musicality” (Malloch & Trevarthen, 2009)—that, early in life, may embed a sense of imitation in sounds that are repeated (Ockelford, 2017). However, the accounts of Freddie appropriating everyday sound-makers (flower pots) to be used as musical instruments, and Romy reproducing the whines of jet engines of airplanes coming in to land and integrating them into her improvisation at the piano, suggest that some autistic children, at least, do perceive everyday sounds in a musical way. It is conceivable that this tendency is reinforced by the ubiquity of music in the lives of young children (Lamont, 2008); in the developed world, they are typically surrounded by electronic games and gadgets, toys, mobile phones, MP3 players, computers, iPads, TVs, radios, and so on, all of which emanate music in some form. Music is to be found in much of the wider human environment too, including cafés, restaurants, shops, cinemas, waiting rooms, cars and airplanes, and at many religious gatherings and other public ceremonies. Given that children are inundated with nonfunctional (musical) sounds, designed, in one way or another, to influence emotional states and behavior, perhaps we should not be surprised that the sounds with which they often co-occur, which to neurotypical ears are functional, should come to be processed in the same way. The manner in which some children on the autism spectrum perceive the world can have other consequences too.
For instance, the development of language can be affected, resulting in “echolalia”—a distinctive form of speech widely reported among autistic children (Mills, 1993; Sterponi & Shankey, 2013) that was first defined as the meaningless repetition of words or phrases (Fay 1967, 1973). It appears, however, that echolalia actually fulfills a range of functions in verbal interaction (Prizant, 1979), including turn-taking and affirmation, and it often finds a place in non-interactive contexts too, serving as a self-reflective commentary or rehearsal strategy (McEvoy, Loveland, & Landry, 1988; Prizant & Duchan, 1981). Given the zygonic hypothesis that imitation lies at the heart of musical structure (Ockelford, 2013), it could be argued that one cause of echolalia is the
organization of language (in the absence of semantics and syntax) through the structure (repetition) that is present in all music. It is as though words are treated as musical objects in their own right, to be manipulated not according to their meaning or grammatical function, but purely through their sounding qualities. This implies a second modification to the ecological model of auditory development (see Fig. 3).
FIGURE 3. Speech may also be processed musically by some children on the autism spectrum.
It is worth noting that echolalia is not only found in the context of “special” development; it is a feature of “typical” language acquisition in young children too (Mcglone-Dorrian & Potter, 1984) when, it seems, the urge to imitate what is heard outstrips semantic understanding. This accords with a stage in the ecological model of auditory development when the two strands of communication through sound—language and music—are not yet cognitively distinct, and supports the notion that musical development precedes the onset of language. For some children on the autism spectrum, music itself can become “super-structured” with additional repetition, as the account of Ben (above) shows; it is common for children on the autism spectrum to play snippets of pieces or videos with music over and over again. It is as though the high proportion of repetition that characterizes music (which is at least 80 percent—see Ockelford, 2005), is insufficient for the mind that craves structure, and so it makes even more. In conversing with autistic adults who are able to verbalize why, as children, they would repeat musical excerpts in this way, it seems that the main reason for obsessively repeating a particularly fascinating series of sounds (apart from the sheer enjoyment that the regularity brings) is that they could hear more and more in the sequence concerned as they listened to it again and again. Bearing in mind that most music tends to be highly complex, with many events occurring simultaneously (and given that even individual notes tend to comprise many pitches in the form of harmonics), to the child with finely tuned auditory perception, there are many different things to attend to in even a few seconds of music, and an even greater number of potential relationships between sounds to fathom. 
So, for example, while listening to a passage for orchestra one hundred times may be extremely tedious to the “neurotypical” ear, which can detect only half a dozen composite events, each fused in perception, to the mind of the autistic child, which can break down the sequence into a dozen different melodic lines, the stimulus may be captivating.
Absolute Pitch
One of the consequences of an early preoccupation with the “musical” qualities of sounds appears to be the development of “absolute pitch”—or “AP.” This is the ability to identify or produce pitches in isolation from others. In the Western population, the capacity is very rare, with an estimated prevalence of 1 in 10,000 (Takeuchi & Hulse, 1993). However, among those on the autism spectrum, the position is markedly different; recent estimates, derived from parental questionnaires, vary between 8 percent, n = 118 (Vamvakari, 2013) and 21 percent, n = 305 (Reese, 2014). These figures are broadly supported by DePape, Hall, Tillmann, and Trainor (2012), who found that 11 percent of 27 high-functioning adolescents with autism had AP. It is unusual to find such high orders of difference in the incidence of a perceptual ability and, evidently, there is something distinct in the way that the parts of the brain responsible for pitch memory wire themselves up in a significant minority of autistic children. Although AP is a useful skill in “neurotypical” musicians—including elite performers—it is an indispensable factor in the development of performance skills in autistic children with learning difficulties—so-called “musical savants” (Miller, 1989). It seems to be this unusual ability that both motivates and enables some young children with a very limited general understanding of the world around them, from the age of 24 months or so, to pick out tunes or chords on instruments that they encounter (sometimes more or less by chance) at home or elsewhere. Often the instrument concerned will be an electric keyboard or piano. The children’s early experiments in producing music may well occur with no adult intervention—or, indeed, awareness of what they are doing. It is my contention that AP has this impact since each pitch sounds distinct, and potentially can elicit a powerful emotional response; hence, being able to reproduce these at will must surely be an intoxicating experience.
But more than this, having AP makes learning to play by ear manageable, in a way that “relative pitch”—the capacity to process the differences between pitches (“intervals”)—does not. To understand why this should be so, consider a typical playground chant, based on the intervals of a minor third, a perfect fourth, and a major second (Fig. 4).
FIGURE 4. An archetypal playground chant.
In so-called “neurotypical” individuals, motifs such as this are likely to be cognitively encoded, stored and retrieved as a series of differences between notes (although some degree of absolute pitch memory will exist— a child would know if the chant were an octave too high, for example). For children with AP, though, the position is quite different, since they can capture the pitch data from the melody as a series of self-sufficient values, rather than a sequence of intervals. So, in seeking to remember and repeat groups of notes over extended periods of time, they have certain processing advantages over their neurotypical peers, who, by extracting and storing pitch information at a higher level of abstraction, lose the “surface detail.” Observe that there are apparent disadvantages to “absolute” representations of pitch too since, by regarding qualia in isolation, listeners cannot take advantage of the patterns that exist through the repetition of intervals, and so greater demands are made on memory. However, as the brain’s long-term storage capacity is so large, this is not a serious problem; indeed, having an exceptional memory is something that is common to many children with autism and all savants. It is the capacity for “absolute pitch data capture” that, in my view, explains why children with AP who are on the autism spectrum and have learning difficulties are able to develop instrumental skills at an early age with no formal tuition. This is because, for them, reproducing groups of notes that they have heard is merely a question of remembering a series of one-to-one mappings between given pitches as they sound and (very often) the keys on a keyboard that produce them. The crucial thing is that these relationships are invariant: once learnt, they can service a lifetime of music making, through which they are constantly reinforced. Conversely, were a child with “relative pitch” to try to play by ear, he or she would have a far more difficult task.
Children in this position need to become proficient in the complicated process of calculating how the intervals that are perceived map onto the distances between keys, which, due to the asymmetries of the
keyboard, are likely to differ according to what would necessarily be an arbitrary starting point. Take, for example, the interval that exists between the first two notes of the playground chant (a minor third) shown in Fig. 4: this can be produced through no fewer than twelve distinct key combinations, each following one of four underlying patterns. The complexity of the situation is compounded by the fact that virtually the same physical leap between other keys may sound different (a major third) according to its location on the keyboard (Fig. 5).
FIGURE 5. The different mechanisms involved in playing by ear using “absolute” and “relative” pitch processing.
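The count of twelve key combinations and four underlying patterns can be checked with a short sketch. It assumes (this is not stated in the text) that the "patterns" are the four possible pairings of white and black keys, and it uses pitch classes numbered 0 to 11 from C, with illustrative note names. The second function illustrates the related point that the same physical leap, skipping one white key, yields a minor third in some places and a major third in others.

```python
from collections import Counter

# Pitch classes 0-11, with 0 = C. These five are the black keys on a piano.
BLACK_KEYS = {1, 3, 6, 8, 10}
WHITE_KEYS = [0, 2, 4, 5, 7, 9, 11]
NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]
MINOR_THIRD = 3  # size of the interval in semitones


def key_colour(pitch_class):
    """Return 'black' or 'white' for a pitch class (0 = C)."""
    return "black" if pitch_class % 12 in BLACK_KEYS else "white"


def interval_patterns(semitones):
    """List (lower note, upper note, colour pattern) for every starting key."""
    rows = []
    for start in range(12):
        end = (start + semitones) % 12
        pattern = f"{key_colour(start)}-{key_colour(end)}"
        rows.append((NOTE_NAMES[start], NOTE_NAMES[end], pattern))
    return rows


def white_key_thirds():
    """Semitone size of each leap that skips exactly one white key."""
    sizes = {}
    for i, low in enumerate(WHITE_KEYS):
        high = WHITE_KEYS[(i + 2) % 7]
        sizes[f"{NOTE_NAMES[low]}-{NOTE_NAMES[high]}"] = (high - low) % 12
    return sizes


minor_thirds = interval_patterns(MINOR_THIRD)
pattern_counts = Counter(pattern for _, _, pattern in minor_thirds)

for low, high, pattern in minor_thirds:
    print(f"{low:>2} -> {high:<2}  {pattern}")
print(dict(pattern_counts))   # four colour patterns across twelve starting keys
print(white_key_thirds())     # same physical leap: 3 or 4 semitones
```

Under these assumptions, a learner working from intervals alone has to master all four hand shapes and know which applies at each starting key, whereas a learner working from absolute pitch only needs to know which key sounds which note.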
It is important to point out that children with AP who learn to play rapidly acquire relative pitch processing skills too, enabling them to play melodies beginning on different notes. Indeed, it is not unusual for them to learn to reproduce pieces fluently in every key. This may seem contradictory, given the processing advantage conferred by being able to encode pitches as perceptual identities in their own right, each mapping uniquely onto a particular note on the keyboard. The reality of almost all pieces of music, however, is that melodic (and harmonic) motifs variously appear at different pitches through transposition and so, to make sense of music, young children with AP need to learn to process pitch relatively as well as absolutely (Stalinski & Schellenberg, 2010). In summary: the difference in pitch processing between musicians who are AP possessors and those who are not can be characterized with reference to an imaginary cognitive “ladder” of pitch, upon which the values of a given framework (a major or minor scale, for example) exist as rungs. Now, for a child with relative pitch, the ladder (whose configuration becomes clear from a rapid analysis of incoming intervals) is movable: it can exist at any pitch height and still offer a satisfactory pitch framework for a given piece. However, for a child with AP, the position of the ladder is fixed. So he or she has the advantage of both recognizing a particular pitch ladder, and knowing where it sits in “pitch-space.”
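The contrast between the fixed and the movable "ladder" can be illustrated with a minimal sketch. The notes of the chant in Fig. 4 are not given in the text, so the sequence G, E, A, G, E used here is an assumption chosen only because it contains the intervals named above (minor third, perfect fourth, major second). Pitches are written as MIDI note numbers, a convention introduced for this illustration, and the function names are hypothetical.

```python
# Hypothetical rendering of a playground chant as MIDI note numbers
# (60 = middle C): G, E, A, G, E. This is an assumption, not the
# notation of Fig. 4.
CHANT = [67, 64, 69, 67, 64]


def absolute_encoding(notes):
    """Store each pitch as a self-sufficient value (AP-style encoding)."""
    return list(notes)


def relative_encoding(notes):
    """Store only the signed semitone differences between successive notes."""
    return [b - a for a, b in zip(notes, notes[1:])]


def transpose(notes, semitones):
    """Move the whole melody up or down by a number of semitones."""
    return [n + semitones for n in notes]


def realise(intervals, start):
    """Rebuild a melody from its intervals, given a chosen starting pitch."""
    notes = [start]
    for step in intervals:
        notes.append(notes[-1] + step)
    return notes


higher = transpose(CHANT, 2)

# The interval sequence is the same wherever the melody sits (movable ladder).
print(relative_encoding(CHANT))
print(relative_encoding(higher) == relative_encoding(CHANT))

# The absolute values differ, so they also record where the ladder sits.
print(absolute_encoding(CHANT), absolute_encoding(higher))

# Intervals alone cannot be played until a starting key has been chosen.
print(realise(relative_encoding(CHANT), 67) == CHANT)
```

The sketch shows why both forms of processing are needed: the relative encoding recognizes a transposed motif as the same motif, while the absolute encoding tells the player exactly which keys to press.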
What is the impact of AP on the musical engagement of children who are on the autism spectrum likely to be? There is no one answer to this question, since the individuals will vary hugely in terms of their preferences and motivations. I have written at length about the extraordinary life of Derek Paravicini (Ockelford, 2009), who is what Treffert (2009) would call a “prodigious” musical savant. It is simply not possible to imagine Derek without his piano playing, which embodies the way that he thinks, the way
he feels, and the way he relates to other people. A description of the way he engages with environmental sounds and speech is set out below, taken from a blog designed to raise awareness of autism and musicality. But there are many other children on the autism spectrum with whom I have worked who are no less “special” in their different ways. In this context, I offer two further accounts of children whom I have taught for a number of years. They too are taken from awareness-raising blogs.
Derek
After many hours of the same dull drone – auditory chewing gum that has long since lost its flavor or interest – there is a sudden, almost imperceptible change in the humming of the plane’s engines. I glance outside and see that, at last, we are over the Nevada desert. Only an hour or so now until we hit Los Angeles. The young man sitting next to me – noticeably upright in his seat – stiffens slightly as he hears the tiny deviation in sound. ‘F sharp’, he intones. ‘It’s F sharp, Adam.’ He leans towards me, demanding a response, and the sun bounces off his trademark Prada sunglasses, but without penetrating the world of darkness beneath. ‘Yes, Derek’, I reply, ‘We’ll soon be landing at LAX.’ ‘Landing at LAX’, he echoes, apparently relishing the sound of the words – and their import – in equal measure. ‘And I will see Dana, and I will play the piano’, he continues. ‘Yes, Derek.’ I offer the same reply again, the sound of my voice as much as the words offering a reassurance forged in a relationship of many years – as Derek’s teacher, mentor and friend. ‘You’ll play the piano.’ Repetition confers calm, a hint of a smile crosses Derek’s features, and he relaxes back in his seat. Derek Paravicini – blind autistic savant, musician extraordinaire, learning disabled genius, unflagging companion – is on his way to California to perform in a series of concerts: grist to his globe-trotting mill. For him, airplanes are one of life’s many mysteries: a series of awkward slopes and steps to be negotiated; well-meaning helping hands; a waft of warm, stale air; ‘doors to automatic and crosscheck’; the sound of the engines starting up. Soon the seat seems to move and bump about, then steadiness; a long, vibrating steadiness. Les Mis on the headphones – once, twice, three times? At last, everything goes into reverse, and abruptly, we’re off the plane. Now there are new voices, new accents. A new hotel. Oatmeal instead of porridge for breakfast. And … finally … the piano.
At last, something familiar. Every note a close friend. The band plays the same as in England. The clapping is familiar too, though people seem to clap louder in America. ‘Good job, Derek!’ ‘Awesome!’ ‘Can you smile for the photo?’ Derek wrinkles his nose, and everyone laughs, infectiously. He catches the humor, and smiles as well. Music has worked its magic, as it always does.1
This account illustrates how, even as a young adult, Derek’s propensity for processing everyday sounds and language in a musical way, acquired as a child, is still evident. For him, the sound of the jet engine has to be accepted as a feature of air travel, but is not understood; hence it remains for him at the level of pure auditory input, which he hears as musical notes. And there are traces too of his childhood tendency to repeat words— echolalia—as much for their sounds as their meanings, as he copies the ends of my contributions to our conversation. These are testament to his idiosyncratic neural circuitry, produced both as a result of hypoxia due to his extreme prematurity and the consequent exceptional cognitive environment that his developing brain had to endure, without visual input and with limited capacity to process language. And yet, with the necessary support, Derek can function well: meeting new people, interacting with them in unrehearsed social situations, and tolerating unfamiliar environments with equanimity. By celebrating the advantages and ameliorating the disadvantages of Derek’s autism spectrum condition, he has a quality of life that would be the envy of many, acknowledged internationally as a “special” musician. The current cohort of autistic children with whom I work visit me, with their parents, in a large practice room at the University of Roehampton. There are two pianos, to avoid potential difficulties over personal space. A number of the children rarely say a word. Some, like Romy, don’t speak at all. She converses through her playing, telling me what piece she would like next, and indicating when she’s had enough. Sometimes, she will tease me by apparently suggesting one thing when she means another. In this way, jokes are shared and, sometimes, feelings of sadness too. For Romy, music replaces words, and truly functions as a proxy language, with the exceptional neurological correlates that must entail.
Romy
On Sunday mornings, at 10.00am, I steel myself for Romy’s arrival. I know that the next two hours will be an exacting test of my musical mettle. Yet Romy has severe learning difficulties, and she doesn’t speak at all. She is musical to the core, though; she lives and breathes music – it is the very essence of her being. With her passion comes a high degree of particularity; Romy knows precisely which piece she wants me to play, at what tempo, and in which key. And woe betide me if I get it wrong.
When we started working together, six years ago, mistakes and misunderstandings occurred all too frequently since, as it turned out, there were very few pieces that Romy would tolerate: for example, the theme from Für Elise (never the middle section); the Habanera from Carmen; and some snippets from ‘Buckaroo Holiday’ (the first movement of Aaron Copland’s Rodeo). Romy’s acute neophobia meant that even one note of a different piece would evoke shrieks of fear-cum-anger, and the session could easily grow into an emotional conflagration. So gradually, gradually, over weeks, then months, and then years, I introduced new pieces – sometimes, quite literally, at the rate of one note per session. On occasion, if things were difficult, I would even take a step back before trying to move on again the next time. And, imperceptibly at first, Romy’s fears started to melt away. The theme from Brahms’s Haydn Variations became something of an obsession, followed by the slow movement of Beethoven’s Pathetique sonata. Then it was Joplin’s The Entertainer, and Rocking All Over the World by Status Quo. Over the six years, Romy’s jigsaw box of musical pieces – fragments ranging from just a few seconds to a minute or so in length – has filled up at an ever-increasing rate. Now it’s overflowing, and it’s difficult to keep up with Romy’s mercurial musical mind; mixing and matching ideas in our improvised sessions, and even changing melodies and harmonies so they mesh together, or to ensure that my contributions don’t! As we play, new pictures in sound emerge and then retreat as a kaleidoscope of ideas whirls between us. Sometimes a single melody persists for fifteen minutes, even half an hour. For Romy, no matter how often it is repeated, a fragment of music seems to stay fresh and vibrant. 
At other times, it sounds as though she is trying to play several pieces at the same time – she just can’t get them out quickly enough, and a veritable nest of earworms wriggle their way onto the piano keyboard. Vainly I attempt to herd them into a common direction of musical travel. So here I am, sitting at the piano in Roehampton, on a Sunday morning in mid-November, waiting for Romy to join me (not to be there when she arrives is asking for trouble). I’m limbering up with a rather sedate rendition of the opening of Chopin’s Etude in C major, Op. 10, No. 1 when I hear her coming down the corridor, vocalizing with increasing fervor. I feel the tension rising, and as her father pushes open the door, she breaks away from him, rushes over to the piano and, with a shriek and an extraordinarily agile sweep of her arm, elbows my right hand out of the way at the precise moment that I was going to hit the D an octave above middle C. She usurps this note to her own ends, ushering in her favorite Brahms-Haydn theme. Instantly, Romy smiles, relaxes and gives me the choice of moving out of the way or having my lap appropriated as an unwilling cushion on the piano stool. I choose the former, sliding to my left onto a chair that I’d placed earlier in readiness for the move that I knew I would have to make. I join in the Brahms, and encourage her to use her left hand to add a bass line. She tolerates this up to the end of the first section of the theme, but in her mind she’s already moved on, and without a break in the sound, Romy steps onto the set of A Little Night Music, gently noodling around the introduction to Send in the Clowns. But it’s in the wrong key – G instead of E flat – which I know from experience means that she doesn’t really want us to go into the Sondheim classic, but instead wants me to play the first four bars (and only the first four bars) of Schumann’s Kleine Studie Op. 68, No. 14. 
Trying to perform the fifth bar would, in any case, be futile since Romy’s already started to play … now, is it I am Sailing or O Freedom? The opening ascent from D through E to G could signal either of those possibilities. Almost tentatively, Romy presses those three notes down and then looks at me and smiles, waiting, and knowing that whichever option I choose will be the wrong one. I just shake my head at her and plump for O Freedom, but sure enough Rod Stewart shoves the Spiritual out of the way before it has time to draw a second breath.
From there, Romy shifts up a gear to the Canon in D – or is it really Pachelbel’s masterpiece? With a deft flick of her little finger up to a high A, she seems to suggest that she wants Streets of London instead (which uses the same harmonies). I opt for Ralph McTell, but another flick, this time aimed partly at me as well as the keys, shows that Romy actually wants Beethoven’s Pathetique theme – but again, in the wrong key (D). Obediently I start to play, but Romy takes us almost immediately to A flat (the tonality that Beethoven originally intended). As soon as I’m there, though, Romy races back up the keyboard again, returning to Pachelbel’s domain. Before I’ve had time to catch up, though, she’s transformed the music once more; now we’re hearing the famous theme from Dvorak’s New World Symphony. I pause to recover my thoughts, but Romy is impatiently waiting for me to begin the accompaniment. Two or three minutes into the session, and we’ve already touched on twelve pieces spanning 300 years of Western music and an emotional range to match. Yet, here is a girl who in everyday life is supposed to have no ‘theory of mind’ – the capacity to put yourself in other people’s shoes and think what they are thinking. Here is someone who is supposed to lack the ability to communicate. Here is someone who functions, apparently, at an 18-month level. But I say here is a joyous musician who amazes all who hear her. Here is a girl in whom extreme ability and disability coexist in the most extraordinary way. Here is someone who can reach out through music and touch one’s emotions in a profound way. If music is important to us all, for Romy it is truly her lifeblood.2
How did Romy, severely learning disabled, become such a talented, if idiosyncratic, musician? According to the theory set out above, it was her early inability to process language, in tandem with her inability to grasp the portent of many everyday sounds, that enhanced her ability to process all sounds in a musical way. The two were inextricably linked. Indeed, without the former, we can surmise that the latter would never have developed. Romy’s AP means that, for her, every note on the piano is instantly recognizable. But more than this, for Romy, each pitch provides a stable point of reference in an otherwise capricious world. And it’s not just notes on the piano that function for Romy in this way. In her mind, each of the notes in any piece of music sounds distinct. While, for most of us, musical sounds pass by unremarkably in perceptual terms, for Romy, different notes, different chords, can affect her profoundly: an E flat major harmony can make her quiver with excitement, for example, while G7 can make her cry. In itself, though, AP is insufficient to make a “special” musician; that takes at least 7,000 hours of practice (Sloboda, Davidson, Howe, & Moore 1996). How, then, did Romy acquire her musical skills? Like many autistic children early in life, she developed an obsession, which in her case was a small electronic keyboard, whose notes lit up in the sequence needed to play one of a number of simple tunes. As far as Romy was concerned, this musical toy was one of only a few things with which she could
meaningfully interact, and whose logic she could understand. Unsurprisingly, she spent hundreds of hours playing with it. The keyboard was comfortingly predictable in comparison with any human being— even her devoted family, whose language and behavior differed subtly from one occasion to another, as all human interaction does. The keyboard, though, invariably responded to Romy in the same way. Whenever she pressed a particular key, it always sounded the same as it did before. Here was something in the environment that Romy could predict and control. And so, through countless hours of self-directed exploration as a toddler, Romy discovered where all the notes (whose sounds she could hear in her head) are on the keyboard. Today, as a teenager, for Romy to play the piano merely requires her to hear a tune in her head (available to her through the internal library of songs, stored as series of absolute auditory images) and play along with it, pressing down the correct keys in sequence as their pitches sound in her head (see Fig. 5). And this approach works not only for music. As we noted above, she will reproduce the sounds of the jet engines of planes as they descend towards Heathrow Airport, for example, and she unhesitatingly copies any ringtones that interrupt her piano lessons. AP can have other consequences for children on the autism spectrum too. The absolute representation of sounds in their heads appears to fuel musical imagination in a way that is more vivid, more visceral even, than the relative memory of intervals alone. And, although formal research is yet to be undertaken, the anecdotal accounts of parents and teachers suggest that earworms are widespread, shown most obviously in some children’s incessant vocalizing of melodic fragments. With minds full of tunes that seem to be playing the whole time, external sounds can be at best superfluous and at worst an irritation, as the following account of a session with Freddie, then eleven years old, shows.
Freddie
‘Why’s he doing that?’ Freddie’s father, Simon, sounded more than usually puzzled by the antics of his son. After months of displacement activity, Freddie was finally sitting next to me at the piano, and looked as though this time he really were about to play. A final fidget and then his right hand moved towards the keys. With infinite care, he placed his thumb on middle C as he had watched
me do before – but without pressing it down. Silently, he moved to the next note (D), which he feathered in a similar way, using his index finger, then with the same precision he touched E, F, and G, before coming back down the soundless scale to an inaudible C. I couldn’t help smiling. ‘Fred, we need to hear the notes!’ My comment was rewarded with a deep stare, right into my eyes. Through them, almost. It was always hard to know what Freddie was thinking, but on this occasion he did seem to understand and was willing to respond to my request, since his thumb went back to C. Again, the key remained un-pressed, but this time he sang the note (perfectly in tune), and then the next one, and the next, until the five-finger exercise was complete. In most children (assuming that they had the necessary musical skills), such behavior would probably be regarded as an idiosyncratic attempt at humor or even mild naughtiness. But Freddie was being absolutely serious and was pleased, I think, to achieve what he’d been asked to do, for he had indeed enabled me to hear the notes! He stared at me again, evidently expecting something more, and without thinking I leant forward. ‘Now on this one, Fred’, I said, touching C sharp. Freddie gave the tiniest blink and a twitch of his head, and I imagined him, in a fraction of a second, making the necessary kinesthetic calculations. Without hesitation or error, he produced the five-finger exercise again, this time using a mixture of black and white notes. Each pressed silently. All sung flawlessly. And then, spontaneously, he was off up the keyboard, beginning the same pentatonic pattern on each of the twelve available keys. At my prompting, Freddie re-ran the sequence with his left hand – his unbroken voice hoarsely whispering the low notes. So logical. Why bother to play the notes if you know what they sound like already? 
So apparently simple a task, and yet … such a difficult feat to accomplish: the whole contradiction of autism crystallized in a few moments of music making.3
As I later said to Freddie’s father, if I had wanted to teach a “neurotypical” child to do what his son had achieved with little or no apparent effort, it would probably have taken many lessons, and hundreds of hours of practice for the pupil to master the relationship between the Western tonal system and the asymmetrical (yet regular) layout of the piano keyboard. Yet Freddie had done it merely by watching and listening to what I had done, attending to the streams of notes flowing by, extracting the implicit rules of Western musical syntax, and using these to create patterns of sounds anew. The crucial point is that I had never played the full sequence of scales to Freddie that he subsequently produced. He had worked out the necessary structures intuitively, merely through exposure to music. Here is a “special” musician indeed.
C A
: T C M
N E A
This chapter sets out a theory of how some children on the autism spectrum develop prodigious musical talent as a consequence of the way that they perceive everyday sounds and speech: in musical terms. In a significant minority of cases, this leads to the development of AP, which, given access to an appropriate instrument (typically a keyboard), enables such children to learn to play by ear. This skill is often acquired entirely through their own efforts and typically first manifests itself in the early years. The neural correlates of this exceptional development are yet to be explored through brain imaging, which presents significant challenges in the case of children severely affected by autism, who tend to function successfully only in familiar environments (although one or two passive studies in the field of music and language have been undertaken; see, for example, Lai, Pantazatos, Schneider, & Hirsch, 2012; Sharda, Midha, Malik, Mukerji, & Singh, 2015). It is surely an area worth exploring, however, not only for the light it would shed on our knowledge of exceptionality, but for the fresh perspectives that human diversity offers the understanding of our species as a whole. This is possible because we exist on continua of interests, abilities, and traits, and it is my contention that, by analyzing the behaviors, and the neural correlates of those behaviors, in people who function at the extremes of our tribe’s natural neurodiversity, we can better understand the ordinary, everyday, musical engagement that is characteristic of us all. Most importantly, it’s my belief that, through the prism of the overtly remarkable, we can discover the uncelebrated exceptionality in each of us, for whether autistic or neurotypical, we are all musical by design (Ockelford, 2017, p. 9).
References
American Psychiatric Association (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: APA.
Boucher, J. (2009). The autistic spectrum: Characteristics, causes, and practical issues. London: Sage Publications.
Brandt, A., Gebrian, M., & Slevc, L. R. (2012). Music and early language acquisition. Frontiers in Psychology 3. Retrieved from https://doi.org/10.3389/fpsyg.2012.00327
Cone, E. (1987). On derivation: Syntax and rhetoric. Music Analysis 6(3), 237–256.
DePape, A.-M. R., Hall, G. B. C., Tillmann, B., & Trainor, L. J. (2012). Auditory processing in high-functioning adolescents with autism spectrum disorder. PLoS ONE 7(9), e44084.
Dowling, J. (1982). Melodic information processing and its development. In D. Deutsch (Ed.), The psychology of music (pp. 413–429). New York: Academic Press.
Fay, W. H. (1967). Childhood echolalia. Folia Phoniatrica et Logopaedica 19(4), 297–306. doi:10.1159/000263153
Fay, W. H. (1973). On the echolalia of the blind and of the autistic child. Journal of Speech and Hearing Disorders 38(4), 478–489.
Frith, U. (2003). Autism: Explaining the enigma. Oxford: Wiley-Blackwell.
Gaver, W. W. (1993). What in the world do we hear? An ecological approach to auditory event perception. Ecological Psychology 5(1), 1–29.
Hargreaves, D. (1986). The developmental psychology of music. Cambridge: Cambridge University Press.
Kessen, W., Levine, J., & Wendrich, K. (1979). The imitation of pitch in infants. Infant Behavior and Development 2, 93–99.
Kuhl, P., & Meltzoff, A. (1982). The bimodal perception of speech in infancy. Science 218(4577), 1138–1141.
Lai, G., Pantazatos, S., Schneider, H., & Hirsch, J. (2012). Neural systems for speech and song in autism. Brain 135(3), 961–975.
Lamont, A. (2008). Young children’s musical worlds: Musical engagement in 3.5-year-olds. Journal of Early Childhood Research 6(3), 247–261.
Lecanuet, J.-P. (1996). Prenatal auditory experience. In I. Deliège & J. Sloboda (Eds.), Musical beginnings (pp. 3–34). Oxford: Oxford University Press.
McEvoy, R. E., Loveland, K. A., & Landry, S. H. (1988). The functions of immediate echolalia in autistic children: A developmental perspective. Journal of Autism and Developmental Disorders 18(4), 657–668.
McGlone-Dorrian, D., & Potter, R. E. (1984). The occurrence of echolalia in three year olds’ responses to various question types. Communication Disorders Quarterly 7(2), 38–47.
McPherson, G. (Ed.). (2016). Musical prodigies: Interpretations from psychology, education, musicology, and ethnomusicology. New York: Oxford University Press.
Malloch, S., & Trevarthen, C. (Eds.). (2009). Communicative musicality: Exploring the basis of human companionship. New York: Oxford University Press.
Mang, E. (2005). The referent of early children’s songs. Music Education Research 7(1), 3–20.
Meltzoff, A., & Prinz, W. (2002). The imitative mind: Development, evolution and brain bases. Cambridge: Cambridge University Press.
Miller, L. (1989). Musical savants: Exceptional skill and mental retardation. Hillsdale, NJ: Lawrence Erlbaum.
Mills, A. (1993). Visual handicap. In D. Bishop & K. Mogford (Eds.), Language development in exceptional circumstances (pp. 150–164). Hove: Psychology Press.
Moog, H. (1976). The musical experiences of the pre-school child. Trans. C. Clarke. London: Schott.
Norman-Haignere, S., Kanwisher, N. G., & McDermott, J. H. (2015). Distinct cortical pathways for music and speech revealed by hypothesis-free voxel decomposition. Neuron 88(6), 1281–1296.
Ockelford, A. (2005). Repetition in music: Theoretical and metatheoretical perspectives. Farnham: Ashgate.
Ockelford, A. (2009). In the key of genius: The extraordinary life of Derek Paravicini. London: Random House.
Ockelford, A. (2013). Applied musicology: Using zygonic theory to inform music education, therapy, and psychology research. New York: Oxford University Press.
Ockelford, A. (2017). Comparing notes: How we make sense of music. London: Profile Books.
Papoušek, M. (1996). Intuitive parenting: A hidden source of musical stimulation in infancy. In I. Deliège & J. Sloboda (Eds.), Musical beginnings (pp. 88–112). Oxford: Oxford University Press.
Patel, A. D. (2012). Language, music, and the brain: A resource-sharing framework. In P. Rebuschat, M. Rohrmeier, J. A. Hawkins, & I. Cross (Eds.), Language and music as cognitive systems (pp. 204–223). Oxford: Oxford University Press.
Prizant, B. M. (1979). An analysis of the functions of immediate echolalia in autistic children. Dissertation Abstracts International 39(9-B), 4592–4593.
Prizant, B. M., & Duchan, J. F. (1981). The functions of immediate echolalia in autistic children. Journal of Speech and Hearing Disorders 46(3), 241–249.
Reese, A. (2014). The effect of exposure to structured musical activities on communication skills and speech for children and young adults on the autism spectrum (Doctoral dissertation). University of Roehampton, London.
Saxton, M. (2010). Child language: Acquisition and development. London: Sage Publications.
Sharda, M., Midha, R., Malik, S., Mukerji, S., & Singh, N. C. (2015). Fronto-temporal connectivity is preserved during sung but not spoken word listening, across the autism spectrum. Autism Research 8(2), 174–186.
Sloboda, J. A., Davidson, J. W., Howe, M. J., & Moore, D. G. (1996). The role of practice in the development of performing musicians. British Journal of Psychology 87(2), 287–309.
Stalinski, S. M., & Schellenberg, E. G. (2010). Shifting perceptions: Developmental changes in judgments of melodic similarity. Developmental Psychology 46(6), 1799–1803.
Sterponi, L., & Shankey, J. (2013). Rethinking echolalia: Repetition as interactional resource in the communication of a child with autism. Journal of Child Language 41(2), 275–304.
Takeuchi, A. H., & Hulse, S. H. (1993). Absolute pitch. Psychological Bulletin 113(2), 345.
Treffert, D. (2009). The savant syndrome: An extraordinary condition. A synopsis: Past, present, future. Philosophical Transactions of the Royal Society B: Biological Sciences 364(1522), 1351–1357.
Vamvakari, T. (2013). My child and music: A survey exploration of the musical abilities and interests of children and young people diagnosed with autism spectrum conditions (Master’s dissertation). University of Roehampton, London.
Welch, G. (2006). The musical development and education of young children. In B. Spodek & O. Saracho (Eds.), Handbook of research on the education of young children (pp. 251–267). Mahwah, NJ: Lawrence Erlbaum.
Wing, L. (2003). The autistic spectrum: A guide for parents and professionals. London: Robinson.
1 http://www.jkp.com/jkpblog/2013/04/music-language-autism/
2 https://blog.oup.com/2012/12/music-proxy-language-autisic-children/
3 http://www.huffingtonpost.com/adam-ockelford/autism-genius_b_4118805.html.
SECTION VII
MUSIC, THE BRAIN, AND HEALTH
CHAPTER 28
NEUROLOGIC MUSIC THERAPY IN SENSORIMOTOR REHABILITATION
CORENE THAUT AND KLAUS MARTIN STEPHAN
Introduction
Impairment of movement timing is often one of the most disturbing features for patients with neurological disorders, particularly in cerebrovascular disease and degenerative disorders such as Parkinson’s disease (PD), and can result in debilitating motor timing with regard to the manipulating ability of the upper extremities (e.g., in some patients after stroke) and to gait dynamics (e.g., after stroke with basal ganglia or cerebellar lesions or in patients with PD). Fortunately, basic science and clinical research supporting the use of music in the rehabilitation, maintenance, and development of movements of both the upper and lower extremities in a variety of neurologic disorders has grown tremendously over the last twenty-five years. Starting in the early 1990s, a series of research papers (McIntosh, Brown, Rice, & Thaut, 1997; McIntosh, Rice, Hurt, & Thaut, 1998; Miller, Thaut, McIntosh, & Rice, 1996; Thaut, McIntosh, Prassas, & Rice, 1993; Thaut, Schleiffers, & Davis, 1991) became the foundation for investigating the importance of rhythm for
movement of both the upper and lower extremities in normal and neurologically impaired individuals. Since then, a substantial amount of research has shown the effect of rhythm and timing on optimization of motor planning and motor execution through entrainment of movement patterns, priming of the auditory motor pathway, and cueing of the movement period. Rhythmic entrainment, or the ability of the motor system to couple with the auditory system by assuming a common period, provided the first testable motor theory for the use of auditory rhythm and music in therapy. Rhythmic auditory cueing accesses biological auditory-motor networks that create fast, temporally precise, and stable synchronization mechanisms between sensory input and motor output (Stephan et al., 2002; Thaut, Hoemberg, Kenyon, & Hurt, 1998; Thaut & Kenyon, 2003). Neuroanatomically, these synchronization mechanisms depend on a distributed set of circuits including a motor cortical-basal ganglia-thalamo-cortical circuit. The SMA and putamen are basic nodes for beat perception and motor performance (Bengtsson et al., 2009; Grahn & Brett, 2007). Furthermore, the auditory and motor systems are closely linked from a peripheral level (cochlear root neurons synapsing with reticulospinal neurons) to frontotemporal pathways involving the arcuate fasciculus (Fernández-Miranda et al., 2015; Schmahman & Pandya, 2008) and to cortico-cortical loops between motor and auditory cortices connected through delta and beta oscillatory activity (Arnal, 2012; Arnal, Doelling, & Poeppel, 2015; Fujioka, Trainor, Large, & Ross, 2012). There is an ongoing debate about the relative contributions of the basal ganglia, especially corticostriatal loops, and the cerebellum towards timing. 
Recent studies suggest that the cerebellum is mainly involved in absolute, duration-based timing when stimuli are presented irregularly, but not in relative timing based on a regular beat, while the striatum is thought to be involved in both absolute and relative timing (see Teki, Grube, & Griffiths, 2012; Teki, Grube, Kumar, & Griffiths, 2011). For an overview, see Merchant, Grahn, Trainor, Rohrmeier, and Fitch (2015). Priming of motor activity is the ability of an external sensory (e.g., auditory) cue to stimulate recruitment of spinal motor neurons, reducing the amount of time required for the muscles to respond to a given motor command. During walking this results in decreased variability of the muscle activation patterns in the lower extremities. The evidence for priming and
timing of the motor system via reticulospinal pathways was demonstrated as early as 1967 (Paltsev & Elner) and 1976 (Rossignol & Melvill Jones). Recently it has been shown that the corticostriatal system is already activated during sensory entrainment (for an overview, see Sameiro-Barbosa & Geiser, 2016) and may be an alternative anatomical basis for sensorimotor synchronization and priming. Using EEG, Crasta and colleagues (Crasta, Thaut, Anderson, Davies, & Gavin, 2018) demonstrated that auditory priming improves neural synchronization in auditory-motor entrainment. This fits well with the observations obtained during normal gait in the early 1990s by Thaut and coworkers. Thaut et al. (1993) investigated the effect of auditory rhythm on priming of the lower extremities by looking at temporal parameters of the stride cycle and EMG activity in normal gait. In the rhythmic condition, subjects improved stride rhythmicity between the right and left lower extremities, showed delayed onset and shorter duration of gastrocnemius muscle activity, and increased integrated amplitude ratios for the gastrocnemius muscle. These results provided evidence that more focused and consistent muscle activity occurs during push-off when a rhythmic auditory cue is present, due to a priming effect that results in a more efficient recruitment of motor units in the spinal cord. The conclusions of this study led to the further exploration of the effect of rhythmic auditory cueing on temporal stride parameters and EMG patterns in patients with stroke and hemiparetic gait, which demonstrated similar results (Thaut et al., 1993). During rhythmic auditory cueing the auditory stimulus does not provide cues for the endpoints of a movement, but provides an external cue for the duration of the movement period to help scale the timing and kinematics of the movement between the beats. 
Most often, arm and hand motions are used as discrete movements which are non-rhythmic in nature (e.g., to push or grasp something), whereas leg movements are often repeated over longer periods of time and are intrinsically rhythmic in nature (e.g., during walking). Thus, therapeutic interventions must support arm and leg function differentially, addressing the range of spatial, temporal, and muscular dynamics which influence motor behavior during the various stages of motor control. Parietal and premotor areas are well known to be involved in the preparatory activity of motor action under the “supervision” of prefrontal cortex. Even subliminal changes of an auditory interstimulus interval will alter cortical and subcortical activity in prefrontal areas,
thalamus, and parietal and premotor areas. These changes in brain activity correlate with specific changes of motor behavior (Stephan et al., 2002). Furthermore, external auditory stimuli can influence not only the timing, but also the spatiotemporal kinematics of motor behavior (Thaut, Kenyon, Hurt, McIntosh, & Hoemberg, 2002). Thus, external auditory stimuli are able to “access” parietal and premotor areas (presumably during the preparatory phase) and influence the exact timing and the spatiotemporal pattern of movements in a predictive way. Music has a wide range of rhythmic, melodic, harmonic, and dynamic-acoustical elements, and research has shown it to be an effective therapeutic tool to influence both upper and lower extremity discrete, serial, and continuous movements. Experts in sports, such as professional rowers, have used musical sonification during training as feedback in order to control for the rhythmicity and the quality of the task, and to assess the degree of synchrony between the different athletes (Schaffert & Mattes, 2015).
Neurologic Music Therapy
Neurologic Music Therapy (NMT) is a research-based system of clinical techniques guided by music and neuroscience principles of music perception, cognition, and performance, and designed to address functional goals in the areas of sensorimotor, speech/language, and cognitive rehabilitation. Unlike many traditional approaches that focus on teaching compensatory strategies based on using the unimpaired functions to reintegrate patients into everyday activities, NMT techniques focus on directly addressing impairment and restoration of function based on current knowledge of how music can aid in cortical reorganization, motor learning, and neuromuscular re-education. In NMT, there are three standardized rhythmic-musical applications for rehabilitation, development, and maintenance of sensorimotor function: Rhythmic Auditory Stimulation (RAS), Patterned Sensory Enhancement (PSE), and Therapeutic Instrumental Music Performance (TIMP) (Thaut & Hoemberg, 2014). Rhythmic Auditory Stimulation (RAS) is a neurologic music therapy technique which can be used to facilitate the rehabilitation, development,
and maintenance of movements that are intrinsically biologically rhythmical. RAS uses the physiological effects of auditory rhythm on the motor system to improve the control of movement in rehabilitation of functional, stable, and adaptive gait patterns in patients with significant gait deficits due to neurological impairment (Thaut, Nickel, Kenyon, Meissner, & McIntosh, 2005). Driven by the principles of rhythmic entrainment, priming of the auditory motor pathways, and cueing of the movement period, research has shown RAS to be effective as both an immediate entrainment stimulus to provide rhythmic cues during movement, and as a facilitating stimulus for training in order to achieve more functional gait patterns. Studies investigating the effects of rhythmic auditory stimulation on Parkinson’s disease (de Dreu, van der Wilk, Poppe, Kwakkel, & van Wegen, 2012; Kadivar, Corcos, Foto, & Hondzinski, 2011), stroke (Schauer & Mauritz, 2003; Song et al., 2015; Song & Ryu, 2016; Suh et al., 2014), traumatic brain injury (Hurt, Rice, McIntosh, & Thaut, 1998), multiple sclerosis (Baram & Miller, 2007; Conklyn et al., 2010), spinal cord injuries (de l’Etoile, 2008), and spastic diplegic cerebral palsy (Baram & Lenger, 2012; Kim et al., 2011; Kim, Kwak, Park, & Cho, 2012) continue to show the significant impact of rhythm on gait kinematics through better posture, more appropriate step rates (step cadence) and stride length, and more efficient and symmetric muscle activation patterns in the lower extremities during walking. PET and fMRI studies have shown that external acoustic stimuli before or during movements lead to additional activations of dorsal premotor areas, which influence the timing of movements. As Parkinsonian patients are known to have a deficit in internally monitoring and adjusting kinematic gait parameters, this additional influence via premotor areas may help to compensate for some of these deficits. 
A Cochrane review of music therapy for acquired brain injury (Bradt, Magee, Dileo, Wheeler, & McGilloway, 2010) suggested that rhythmic auditory stimulation may also be beneficial for improving gait parameters in patients with stroke, including gait velocity, cadence, stride length, and gait symmetry. Patterned Sensory Enhancement (PSE) is a technique which uses the rhythmic, melodic, harmonic, and dynamic-acoustical elements of music to provide temporal, spatial, and force cues for movements which are not intrinsically rhythmic by nature, but reflect functional exercise, movement patterns, and activities of daily living. Unlike RAS, which focuses on oscillatory movements such as gait and arm swing, PSE is applied to non-biologically rhythmic movements such as arm and hand movements, pre-gait and advanced gait exercises, and functional movement sequences such as dressing or sit-to-stand transfers. In addition to temporal cues, PSE creates sonification of movement through musical patterns, harmonies, dynamic elements, and pitch, which help organize single, discrete motions (e.g., arm and hand movements during reaching and grasping) into functional movement patterns and sequences. The functional anatomy of PSE is difficult to grasp. Depending on the exact nature of the motor task, different cortical and subcortical parietal, premotor, and presumably also prefrontal areas will be involved. It might be that PSE is more concerned with reestablishment and optimization of cortical, cortico-subcortical, and corticocerebellar circuits. For many years, research has shown that elements of music such as rhythm and pitch can help develop and re-establish three-dimensional movement trajectories, allowing for training of the specific aspects of space, time, and muscular dynamics of functional movement. Significant improvements in upper limb control and kinematics when paired with patterned musical cues have been shown across a wide range of populations including stroke (Buetefish, Hummelsheim, Denzler, & Mauritz, 1995; Luft et al., 2004; McCombe Waller, Harris-Love, Liu, & Whitall, 2006; Malcolm, Massie, & Thaut, 2009; Thaut, Kenyon, et al., 2002; Thaut, Schicks, McIntosh, & Hoemberg, 2002), Parkinson’s disease (Brown, Thaut, Benjamin, & Cooke, 1993; Ma, Hwang, & Lin, 2009; Mak, 2006; Son & Kim, 2015), cerebral palsy (Peng et al., 2011; Wang et al., 2013), and Down syndrome (Robertson, Chua, Maraj, Kao, & Weeks, 2002; Robertson, Van Gemmert, & Maraj, 2002). 
High-intensity training with frequent repetition is important for successful rehabilitation of fine and gross motor control of the upper extremities; therefore, PSE not only provides the auditory structures to drive and enhance the movement, but can also incorporate familiar songs and musical targets which can create an additional motivational component to therapy, often resulting in more repetitions of exercises and higher compliance with home exercise programs. Therapeutic Instrumental Music Performance (TIMP) is the playing of musical instruments in order to exercise and stimulate functional movement patterns. When implementing TIMP, appropriate musical instruments are selected in a therapeutically meaningful way in order to emphasize range of
motion, endurance, strength, functional hand movements, finger dexterity, and limb coordination (Chadwick & Clark, 1980; Elliott, 1982). During TIMP, instruments are not typically played in the traditional manner, but are placed in different locations to facilitate practice of the desired functional movements (Thaut, 2005). Engaging in musical instrument playing requires a close interaction between the sensorimotor, auditory, and visual systems. Using instruments as targets to practice movement allows for feedback and feedforward interactions between the auditory and premotor areas of the cortex, as well as engaging the cerebellum and the basal ganglia. The instrument provides the spatial parameters, while the auditory feedback provides input to make adjustments to the timing, muscular dynamics, and positioning of the movement. The sensory feedback provided in this audio-motor interaction is also essential for the potential increase of plasticity (Herholz & Zatorre, 2012). Plasticity is known to be enhanced during the early and subacute phase after lesions, such as stroke (post-lesional plasticity). Therefore, many interventions in stroke rehabilitation try to target this “window of opportunity” to enhance recovery. Research has shown that repetition is extremely important for learning and training movements. Through instruments such as the keyboard and percussion, or creative use of music technology, TIMP exercises can provide the opportunity to perform repetitive movements at various speeds and combinations, incorporating both unilateral and bilateral fine and gross motor skills. Through appropriate placement of instruments to facilitate repetition, discrete movement of the fingers, arms, and legs can be trained, as well as sequential movements of different limbs, incorporating bilateral engagement of both the upper and lower extremities. 
Beneficial effects with regard to both fine and gross motor control have been observed in subacute and chronic populations including stroke (Altenmüller, Marco-Pallares, Münte, & Schneider, 2009; Buetefish et al., 1995; Grau-Sánchez et al., 2013; Schneider, Münte, Rodriguez-Fornells, Sailer, & Altenmüller, 2010; Thaut, Kenyon, et al., 2002; Thaut, McIntosh, & Rice, 1997; Whitall, McCombe Waller, Silver, & Macko, 2000; Yoo, 2009), PD (Bernatzky, Bernatzky, Hesse, Staffen, & Ladurner, 2004; Bukawska, 2016; Bukawska, Krężałek, Mirek, Bujas, & Marchewka, 2015; Pacchetti et al., 2000), cerebellar patients (Molinari et al., 2005), traumatic brain injury (Chong, Cho, & Kim, 2014), Down syndrome (Ringenbach et al., 2014), and
cerebral palsy (Chong, Cho, Jeong, & Kim, 2013; Turova et al., 2017). In addition to using TIMP as a learning and training tool for movements, Kojovic et al. (2012) saw a significant reduction in dystonia symptoms and electromyographic activity in the neck and orbicularis oculi muscles during piano playing compared with music listening or imagined playing. Comparing the three interventions, RAS and PSE serve primarily as aids, which help to optimize movement trajectories for the performance of rhythmical or discrete motor tasks. In some patients they may also be used as tools to promote motor learning. TIMP also facilitates optimization of movement by providing the timing and spatial structure for the movements; however, it adds visual and auditory feedback through the instruments, which assists in motor learning or neuromuscular re-education.
Acquired Movement Disorders
Acquired brain injury (ABI) is defined as an injury to the brain which occurred after birth and is not hereditary, congenital, or degenerative. More specifically, ABIs are typically a result of an ischemic stroke, hemorrhage in the brain, lack of oxygen to the brain (hypoxia/anoxia), infections in the brain, toxic exposure, brain tumors, or a traumatic force to the head causing focal or diffuse trauma. It is one of the leading causes of death and disability in adult populations, and can cause a range of temporary and permanent motor, cognitive, and speech dysfunctions, depending on the type of injury and the range of the severity. With regard to recovery, children have the advantage that the degree of cerebral plasticity is much larger than in adults; therefore, they tend to recover better than adults from acquired motor dysfunctions. This also helps them when they suffer from disease in childhood. A commonality among all acquired brain injuries is that patients are affected at a specific time point, and depending on the severity of the neuronal damage, there will be a chance to recover afterwards. Acquired brain injuries often require the rehabilitation and retraining of movements that are biologically intrinsically rhythmic as well as discrete movements that are not intrinsically driven by an underlying rhythm. Research has
shown that despite acquired injury to the brain, the structural properties of music can often access rhythmic entrainment mechanisms, and successfully retrain movement by creating a stable anticipatory timescale, priming the auditory motor pathway, and therefore optimizing kinematic trajectory patterns. Motor deficit in gait after an ABI can vary depending on the affected area. In stroke, gait is often characterized by hemiplegia with sensory deficits and altered tone opposite to the lesion. In moderately affected patients who can walk at least a few meters with some help, this leads to altered gait parameters such as length of ground contact, stride length, and load on both sides, and generally to a reduction of gait velocity. Traumatic brain injuries often present with similar characteristics to stroke; however, plegia is not always unilateral, and can present unilaterally or bilaterally. A substantial amount of research over the last twenty-five years has provided new insights into the application of rhythm and timing to optimize motor planning and movement execution in ABI in both subacute (Kim & Oh, 2012; Roerdink, Bank, Peper, & Beek, 2011; Spaulding et al., 2013; Thaut et al., 1997, 2007) and chronic phases (Cha, Kim, Hwang, & Chung, 2014; Hurt, Rice, McIntosh, & Thaut, 1998; Kim & Oh, 2012). Consistent results show that gait velocity, cadence, and stride length increased significantly more during training with RAS than in the control conditions. A recent review (Yoo & Kim, 2016) summarized these findings and gave a positive recommendation towards regular use of this technique for the subacute and the chronic phases after stroke. Both metronome tones alone and metronome tones combined with music were effective types of cueing. 
While the previously mentioned studies primarily explored the entrainment of biologically intrinsic rhythms of neural gait oscillators, other studies have also explored discrete movements such as arm and hand movements that are not driven by an underlying biological rhythm (Altenmüller et al., 2009; Ford, Wagenaar, & Newell, 2007; Grau-Sánchez et al., 2013; Luft et al., 2004; Malcolm et al., 2009; Schmitz, Kroeger, & Effenberg, 2014; Thaut, Kenyon, et al., 2002; Whitall et al., 2000). Thaut and colleagues investigated auditory rhythm as a timekeeper to modify the onset, duration, and variability of electromyographic (EMG) patterns in the biceps and triceps during the performance of a gross motor task, revealing decreased variability in muscle activity during a motor task with auditory rhythm, indicating a more efficient use of the muscles, which could lead to
a patient’s ability to perform a task with more accuracy and for a longer period of time (Thaut, Schleiffers, & Davis, 1991). Additionally, Grau-Sánchez et al. (2013) found piano playing improved scores on the Action Research Arm Test (ARAT), Arm Paresis Score, and the Box and Block Test (BBT) in stroke patients, consistent with other studies looking at this population (Altenmüller et al., 2009; Rojo et al., 2011). The above studies give strong support for the use of both Patterned Sensory Enhancement (PSE) and Therapeutic Instrumental Music Performance (TIMP) with these patients. Thus, in acquired movement dysfunctions both supportive and motor learning approaches which increase plasticity are useful.
Degenerative Movement Disorders
Movement disorders are characterized as neurological conditions that affect the speed, fluency, quality, and ease of movement. They can be hereditary or acquired (e.g., caused by medication side effects, environmental factors, or injury) and can present with lack of control of both voluntary and involuntary movements. Degenerative movement disorders are progressive by nature, and result in decreased function over time. Some of the most common movement disorders include Parkinson’s disease and Parkinsonism, ataxia, dystonia, Huntington’s disease, Tourette syndrome, and essential tremor. Due to the degenerative nature of the disorders, the major goal of the interventions will be to adjust movement performance to the dwindling resources and use these resources as efficiently as possible.
Parkinsonian Syndromes
Parkinson’s disease, the most common movement disorder, is an idiopathic neurodegenerative disorder associated with progressive loss of dopaminergic neurons in the basal ganglia, due to the deterioration of the substantia nigra. Typical symptoms include progressive loss of muscle control, which leads to bradykinesia (slowing of movements), resting limb
tremor (trembling of the limbs and head while at rest), postural instability, rigidity (stiffness), and gait instability resulting in impaired balance. Key treatment goals when working with persons with Parkinson’s disease (PD) include increasing heel strike in order to promote longer stride lengths and reduce festinating gait patterns, increasing step cadence and walking speed, improving initiation of movement and functional balance, and decreasing the risk of falls. Many studies have examined the effects of Rhythmic Auditory Stimulation (RAS) in Parkinson’s disease, where it serves as an external timekeeper to facilitate movement sequences that are not receiving the appropriate internal timing cues from the basal ganglia. Findings have shown that persons with PD, on and off medication, were able to improve their walking patterns through better posture, more appropriate step cadence and increased stride length, and more efficient and symmetric muscle activation patterns (McIntosh et al., 1997; Miller et al., 1996; Richards, Malouin, Bedard, & Cioni, 1992; Thaut, McIntosh, et al., 1996). Additionally, McIntosh et al. (1998) looked at long-term carry-over after a five-week RAS treatment program, finding that it took an average of five weeks for velocity scores to return to baseline. Falls are among the biggest contributors to loss of independent living, long-term institutionalization, and increased mortality (Johnell, Melton, Atkinson, O’Fallon, & Kurland, 1992). The risk of falls in a person with PD is substantially higher than in the healthy elderly, raising serious concerns not only about safety but also about the enormous human and healthcare costs associated with falling. Wood, Bilclough, Bowron, and Walker (2002) found that out of 109 subjects with idiopathic PD and a mean Hoehn and Yahr rating of 2, 68 percent experienced falls over a one-year period.
Thaut and colleagues (Thaut, Rice, Braun Janzen, Hurt-Thaut, & McIntosh, 2018) compared the effects of a continuous 24-week RAS treatment program with those of an intermittent program that alternated 8 weeks of RAS training with 8 weeks without training over the same 24 weeks. Changes in ankle dorsiflexion, cadence, velocity, stride length, the Berg Balance Scale, fear of falling, the Timed Up and Go (TUG) test, and frequency and severity of falls were evaluated. The findings offered evidence that both continuous and intermittent RAS treatment can be effective tools to reduce falls in persons with Parkinson’s disease; however, when the two treatments were compared, continuous RAS resulted in significantly greater gains in dorsiflexion, cadence, velocity, and stride length, and in greater reductions in severity level 1 falls and fear of falling. These results suggest that there is only a limited carry-over
effect for RAS in PD patients, presumably due to their pathophysiological deficit. This encourages the use of ongoing home training programs for these patients outside the therapy setting. About one third of people with Parkinson’s disease experience freezing episodes when initiating gait, changing directions, or navigating around obstacles or in small spaces. Numerous studies have shown the effectiveness of rhythmic auditory cueing in reducing freezing episodes (e.g., Frazzitta, Maestri, Uccellini, Bertotti, & Abelli, 2009; Freedland et al., 2002; Howe, Lövgreen, Cody, Ashton, & Oldham, 2003; McIntosh et al., 1997; Morris, Suteerawattananon, Etnyre, Jankovic, & Protas, 2004; Thaut, McIntosh, et al., 1996; Willems et al., 2006). Additionally, when looking at kinematic changes due to the immediate entrainment effects of RAS gait training, Picelli et al. (2010) found increased hip range of motion and power during the pull-off phase of gait and decreased ankle dorsi-plantar flexion with rhythmic cueing. Other studies of RAS training programs found a slight increase in dorsiflexion over 5 weeks (Pau et al., 2016), and a significant increase over 8-week and 6-month training programs (Hurt-Thaut, 2014; Thaut et al., 2018). Until now, the sequelae of Parkinson’s disease have been treated mainly by RAS approaches. However, discrete dysfunctions such as freezing episodes may also benefit from specifically tailored PSE approaches.
Huntington’s Disease
Huntington’s disease is a hereditary neurodegenerative disorder which results in motor disturbances such as hyperkinesia or dystonia, slow execution of movements, and poor coordination. Perceptual timing is even more impaired than in Parkinson patients (Cope, Grube, Singh, Burn, & Griffiths, 2014). The hyperkinetic choreatic movements often coexist with bradykinesia, and gait can present with a wide base of support, increased lateral sway, variability in swing and stance phases, difficulty with frequency modulation, and poor initiation of movement. Thaut and colleagues (Thaut, Lange, Miltner, Hurt, & Hoemberg, 1996; Thaut, Miltner, Lange, Hurt, & Hoemberg, 1999) explored velocity
modulation and rhythmic synchronization of gait in persons with Huntington’s disease, providing the first evidence that rhythmic facilitation could influence mobility in this population. A high variability in frequency entrainment was seen across subjects, with exact phase and period matching highly impaired. Comparisons of self-paced walking, rhythmic metronome cueing, and music found that subjects were able to significantly modulate their gait velocity during both self-paced walking and metronome cueing, but not during the music condition. Given their prominent difficulties with timing perception, a “simple” sensory signal may be most useful for these patients. Rhythmic facilitation improved locomotor function after a short training period, although disease progression had a clear impact on gait parameters.
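The distinction between period matching and phase matching can be made concrete with a small calculation. The following Python sketch is purely illustrative and is not the analysis used in the studies cited above: it assumes that step (heel-strike) times and a fixed metronome interval are available in seconds, that the first beat falls at time zero, and the function names and sample values are hypothetical.

```python
from statistics import mean


def period_error(step_times, beat_interval):
    """Mean inter-step interval minus the cue interval, in seconds.

    A value near zero means the walker matches the period (tempo) of the cue.
    """
    intervals = [later - earlier for earlier, later in zip(step_times, step_times[1:])]
    return mean(intervals) - beat_interval


def phase_asynchronies(step_times, beat_interval):
    """Signed offset of each step from its nearest beat, in seconds.

    Beats are assumed to fall at 0, T, 2T, ... (an assumption of this sketch).
    Positive values mean the step lags the beat; negative values mean it leads.
    """
    offsets = []
    for t in step_times:
        nearest_beat = round(t / beat_interval) * beat_interval
        offsets.append(t - nearest_beat)
    return offsets


# Hypothetical data: a walker who keeps the cue tempo (0.6 s per step)
# but consistently lands 50 ms after each beat.
steps = [0.05, 0.65, 1.25, 1.85]
print(period_error(steps, 0.6))
print(phase_asynchronies(steps, 0.6))
```

In this made-up example the period is matched while the phase is not, which is one way in which period and phase matching can dissociate.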
Parkinsonism
Parkinsonism is a general term used to describe impairments in motor function presenting with characteristics similar to those of Parkinson’s disease, such as akinesia, hypokinesia, bradykinesia, motor blocks, rigidity, and problems with the initiation of cyclical movements. These symptoms are also found in movement disorders of different etiology, such as vascular Parkinsonism and drug-induced Parkinsonism, and in related disorders that also affect additional systems, such as progressive supranuclear palsy (PSP) and multiple system atrophy (MSA). Generally, patients with these conditions do not respond as well to L-Dopa medications as patients with Parkinson’s disease. Furthermore, Cope et al. (2014) showed that timing perception in patients with MSA was more impaired than in Parkinson’s disease and similar to that in Huntington’s disease. Little is known about the effects of rhythmic auditory cueing on Parkinsonism; however, given the strong effects seen in some of the related disorders, further research in this area is warranted.
Multiple Sclerosis
Multiple sclerosis is a prevalent autoimmune disease of the central nervous system in which progressive demyelination produces scar tissue, causing widespread sensory, motor, and cognitive symptoms such as paresthesia, progressive hemiparesis, ataxia, fatigue, and depression. Gait and postural dysfunctions are common in patients with multiple sclerosis and can affect static and dynamic stability, motor control, and coordination, leading to an increased risk of falls and decreased quality of life. Typical gait characteristics include reduced gait velocity, stride length, and cadence, together with increased step width, asymmetric gait, and increased double limb support time. Only recently have a number of studies begun to fill the gap in the literature by looking at the effects of auditory rhythmic cueing on gait in people with multiple sclerosis (Conklyn et al., 2010; Seebacher, Kuisma, Glynn, & Berger, 2015, 2016; Shahraki, Sohrabi, Torbati, Nikkhah, & NaeimiKia, 2017). A systematic review of the effects of rhythmic auditory cueing in gait rehabilitation for multiple sclerosis (Ghai, Ghai, & Effenberg, 2017, 2018; Ghai, Ghai, Schmitz, & Effenberg, 2018) suggested a positive impact of rhythmic auditory cueing in reducing time on the timed 25-meter walk test and on spatiotemporal gait parameters: gait velocity, stride length, and cadence. The premise for using rhythmic cueing to learn, train, and retrain movement is built on a feedforward model of rhythm driving the motor and kinematic changes of movement. Baram and Miller (2007), however, studied how self-generated auditory feedback through an external apparatus can serve as a non-imposing reference that provides constant awareness of gait quality and an instantaneous sensory response to changes in gait for people with multiple sclerosis. The results of this study may provide evidence for the use of an auditory feedback system to enhance patient awareness and effort to improve gait quality.
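The three spatiotemporal gait parameters named above are linked by a simple identity: because one stride comprises two steps, velocity equals stride length multiplied by cadence and divided by two. The short Python sketch below illustrates this relation, together with the inter-beat interval of a metronome cue set to a given cadence. The function names and the numerical values are illustrative assumptions and are not taken from any of the cited studies.

```python
def gait_velocity_m_per_min(stride_length_m, cadence_steps_per_min):
    """Walking velocity in meters per minute.

    One stride equals two steps, so strides per minute = cadence / 2.
    """
    return stride_length_m * cadence_steps_per_min / 2.0


def cue_interval_s(cadence_steps_per_min):
    """Seconds between metronome beats when one beat marks each step."""
    if cadence_steps_per_min <= 0:
        raise ValueError("cadence must be positive")
    return 60.0 / cadence_steps_per_min


# Illustrative values only: a 1.2 m stride at 100 steps/min.
print(gait_velocity_m_per_min(1.2, 100))
# A cue set to 120 steps/min delivers one beat every half second.
print(cue_interval_s(120))
```

The identity also shows why a rhythmic cue can change velocity through two routes: raising cadence directly through the cue tempo, or lengthening the stride at an unchanged cadence.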
In recent years, basic and clinical research has provided more effective drug treatments for patients with MS. As a result, the therapeutic goal is changing for more and more patients: the aim is now to halt the progression of MS and not “only” to slow it. This change of treatment strategy may open the possibility of using TIMP more intensively in these patients.
Healthy Elderly
Several factors related to the normal process of aging can affect strength, agility, flexibility, and muscle tone, leading to sensorimotor changes and safety risks in this population. Decreased bone density or osteoporosis can not only decrease stability, but also make bones more vulnerable to breaks. Reduced or absent physical activity can result in poor muscle tone, decreased strength, and loss of bone mass and flexibility, putting someone at higher risk for falls and injury. Age-related visual impairments such as cataracts and glaucoma can alter depth perception, visual acuity, and peripheral vision, making it more difficult to safely maneuver through one’s environment. Medications can reduce mental alertness, impair balance and gait, and cause drops in systolic blood pressure while standing. Additionally, environmental hazards such as poor lighting, loose rugs, lack of grab bars, objects on the floor, or unsturdy furniture can increase the risk of falling. Listening to music, and sometimes dancing to music, is often quite popular with the elderly. We will, however, concentrate in this chapter on the use of music to prevent falls. In a 2012 Cochrane review, Gillespie and colleagues assessed the effects of interventions designed to reduce the incidence of falls in older people living in the community by examining 159 randomized controlled trials with 79,193 participants (Gillespie et al., 2012). The review concluded that multifactorial assessment and intervention programs (such as monitoring medication, treatment of visual problems, fall prevention education, and non-slip shoes) reduce the rate of falls, but not the risk of falling. The only interventions which consistently reduced both the rate and risk of falling were group and home-based exercise programs and home safety assessments.
Hurt-Thaut (2014) found that healthy elderly participants achieved statistically significant increases in degrees of dorsiflexion, velocity, cadence, stride length, and Berg Balance Scale scores when participating in either a continuous or an intermittent 6-month rhythm-based exercise and walking program.
Developmental Disorders
Autism Spectrum Disorder
Autism spectrum disorder (ASD) is a neurodevelopmental disorder that is often characterized by deficits in social interaction and communication and by unusual behaviors, such as clumsy, uncoordinated, or repetitive movement and poor balance and postural control (Fournier et al., 2010). Only in recent years has research attention turned to the delays in fundamental motor development (e.g., oral motor control, coordination, gait) in children with ASD compared to typically developing children, and to how those delays can directly influence interpersonal social exchange, cognitive functions such as attention and executive function, and the acquisition and development of written and spoken language. Torres et al. (2013) published the most accessed paper on ASD from a movement perspective, laying out a broad theoretical framework to research, treat, and track autism. Torres’s work has been at the forefront of efforts to analyze micro-movements as a reflection of the layered, multi-directional internal and external influences on the central and peripheral nervous systems during goal-oriented movement in response to cognitive, motor, and social tasks (Torres & Donnellan, 2015). Although there is only a limited body of research looking specifically at the influence of elements of music on sensorimotor function in this population, the principles of auditory motor entrainment suggest that rhythm-based interventions such as RAS, PSE, and TIMP could aid in regulatory control of proprioceptive movement and provide adaptive mechanisms to decrease movement variability, smooth movement trajectories, and improve gait parameters such as symmetry and stability.
Cerebral Palsy
Cerebral palsy (CP) is a chronic disability of the central nervous system characterized by abnormal control of movement and posture. A person with CP can present with quadriplegia (both arms and legs affected), diplegia (two limbs affected), or hemiplegia (one side of the body affected). Motor symptoms can vary widely, ranging from minor difficulty with fine motor
movements such as grasping and manipulation of objects, to significant impairment of muscular and motor control of all four limbs. A few studies have looked at the effects of rhythm and instrument playing on this population. When investigating the effects of RAS on adults with CP, Kim et al. (2011) found through kinematic analysis that RAS training significantly increased the anterior tilt of the pelvis and hip flexion at initial contact; however, there were no statistical differences in knee, ankle, and foot kinematic parameters. Furthermore, Kim et al. (2012) looked at the effects of RAS versus neurodevelopmental treatment (Bobath) on gait patterns in adults with cerebral palsy over an intensive 3-week training period. Findings indicated that RAS significantly increased cadence, velocity, stride length, and step length, in addition to producing significantly greater normalization of gait on the gait deviation index compared to the neurodevelopmental treatment group. In contrast, the neurodevelopmental treatment group showed significant decreases in cadence, velocity, stride length, and step length, with a significant increase in step time; however, neurodevelopmental treatment produced significant improvements in internal and external rotations of the hip joints. In support of TIMP, Chong et al. (2013) explored finger exercises on the keyboard as a tool to increase manual dexterity and velocity in adults with cerebral palsy, finding improvements after twelve 30-minute TIMP sessions.
Summary and Conclusion
Since the 1990s, a strong body of research evidence has set the foundation for the use of rhythm and music as important tools in the development, rehabilitation, and maintenance of sensorimotor function, particularly in the treatment of neurologic disorders. Through external rhythmic cueing, rhythmic entrainment optimizes the execution of a motor pattern by priming the motor system and creating anticipatory rhythmic templates to allow for optimal anticipation, motor planning, and execution of movement (Thaut, McIntosh, & Hoemberg, 2014). While the temporal structures in music remain the central elements when using music in the treatment of sensorimotor dysfunction, other elements such as pitch, dynamics, and
harmony can also enhance and shape complex movements such as arm and hand movements that are not intrinsically rhythmic. Neurologic Music Therapy is a research-based system of clinical techniques guided by music and neuroscience principles of music perception, cognition, and performance. In the area of sensorimotor rehabilitation, three standardized techniques (RAS, PSE, and TIMP) have become well accepted in the treatment of impairment and the restoration of function, based on current knowledge of how music can aid in cortical reorganization, motor learning, and neuromuscular re-education.
References
Altenmüller, E., Marco-Pallares, J., Münte, T. F., & Schneider, S. (2009). Neural reorganization underlies improvement in stroke-induced motor dysfunction by music-supported therapy. Annals of the New York Academy of Sciences 1169, 395–405. Arnal, L. H. (2012). Predicting “when” using the motor system’s beta-band oscillations. Frontiers in Human Neuroscience 6. Retrieved from https://doi.org/10.3389/fnhum.2012.00225 Arnal, L. H., Doelling, K. B., & Poeppel, D. (2015). Delta–beta coupled oscillations underlie temporal prediction accuracy. Cerebral Cortex 25(9), 3077–3085. Baram, Y., & Lenger, R. (2012). Gait improvement in patients with cerebral palsy by visual and auditory feedback. Neuromodulation 15(1), 48–52. Baram, Y., & Miller, A. (2007). Auditory feedback control for improvement of gait in patients with multiple sclerosis. Journal of Neurological Sciences 254(1–2), 90–94. Bengtsson, S. L., Ullén, F., Henrik Ehrsson, H., Hashimoto, T., Kito, T., Naito, E., … Sadato, N. (2009). Listening to rhythms activates motor and premotor cortices. Cortex 45(1), 62–71. Bernatzky, G., Bernatzky, P., Hesse, H. P., Staffen, W., & Ladurner, G. (2004). Stimulating music increases motor coordination in patients afflicted by Morbus Parkinson. Neuroscience Letters 361, 4–8. Bradt, J., Magee, W. L., Dileo, C., Wheeler, B. L., & McGilloway, E. (2010). Music therapy for acquired brain injury. Cochrane Database of Systematic Reviews 7, CD006787. doi:10.1002/14651858.CD006787.pub2 Brown, S. H., Thaut, M. H., Benjamin, J., & Cooke, J. D. (1993). Effects of rhythmic auditory cueing on temporal sequencing of complex arm movements. Proceedings of the Society for Neuroscience 227(2). Buetefish, C., Hummelsheim, H., Denzler, P., & Mauritz, K. H. (1995). Repetitive training of isolated movements improves the outcome of motor rehabilitation of the centrally paretic hand. Journal of Neurological Sciences 130(1), 59–68. Bukowska, A. A. (2016).
Influence of neurologic music therapy to improve the activity level in a group of patients with PD. Nordic Journal of Music Therapy 25(1), 14. Bukowska, A. A., Krężałek, P., Mirek, E., Bujas, P., & Marchewka, A. (2015). Neurologic music therapy training for mobility and stability rehabilitation with Parkinson’s disease: A pilot study. Frontiers in Human Neuroscience 9. Retrieved from https://doi.org/10.3389/fnhum.2015.00710
Cha, Y., Kim, Y., Hwang, S., & Chung, Y. (2014). Intensive gait training with rhythmic auditory stimulation in individuals with chronic hemiparetic stroke: A pilot randomized controlled study. Neurorehabilitation 35(4), 681–688. Chadwick, D. M., & Clark, C. A. (1980). Adapting music instruments for the physically handicapped. Music Educators Journal 67(3), 56–59. Chong, H. J., Cho, S. R., Jeong, E., & Kim, S. J. (2013). Finger exercise with keyboard playing in adults with cerebral palsy: A preliminary study. Journal of Exercise Rehabilitation 9(4), 420–425. Chong, H. J., Cho, S. R., & Kim, S. J. (2014). Hand rehabilitation using MIDI keyboard playing in adolescents with brain damage: A preliminary study. Neurorehabilitation 34(1), 147–155. Conklyn, D., Stough, D., Novak, E., Paczak, S., Chemali, K., & Bethoux, F. (2010). A home-based walking program using rhythmic auditory stimulation improves gait performance in patients with multiple sclerosis: A pilot study. Neurorehabilitation and Neural Repair 24(9), 835–842. Cope, T. E., Grube, M., Singh, B., Burn, D. J., & Griffiths, T. D. (2014). The basal ganglia in perceptual timing: Timing performance in multiple system atrophy and Huntington’s disease. Neuropsychologia 52(100), 73–81. Crasta, J. E., Thaut, M. H., Anderson, C. W., Davies, P. L., & Gavin, W. J. (2018). Auditory priming improves neural synchronization in auditory-motor entrainment. Neuropsychologia 117, 102–112. de Dreu, M. J., van der Wilk, A. S., Poppe, E., Kwakkel, G., & van Wegen, E. E. (2012). Rehabilitation, exercise therapy and music in patients with Parkinson’s disease: A meta-analysis of the effects of music-based movement therapy on walking ability, balance and quality of life. Parkinsonism & Related Disorders 18(Suppl. 1), 114–119. de l’Etoile, S. K. (2008). The effect of rhythmic auditory stimulation on the gait parameters of patients with incomplete spinal cord injury: An exploratory pilot study. 
International Journal of Rehabilitation Research 31(2), 155–157. Elliott, B. (1982). Guide to the selection of musical instruments with respect to physical ability and disability. Saint Louis, MO: MMB Music, Inc. Fernández-Miranda, J. C., Wang, Y., Pathak, S., Stefaneau, L., Verstynen, T., & Yeh, F. C. (2015). Asymmetry, connectivity, and segmentation of the arcuate fascicle in the human brain. Brain Structure & Function 220(3), 1665–1680. Ford, M., Wagenaar, R., & Newell, K. (2007). The effects of auditory rhythms and instruction on walking patterns in individuals post stroke. Gait & Posture 26(1), 150–155. Fournier, K. A., Kimberg, C. I., Radonovich, K. J., Tillman, M. D., Chow, J. W., Lewis, M. H., … Hass, C. J. (2010). Increased static and dynamic postural control in children with autism spectrum disorders. Gait & Posture 32(1), 6–9. Frazzitta, G., Maestri, R., Uccellini, D., Bertotti, G., & Abelli, P. (2009). Rehabilitation treatment of gait in patients with Parkinson’s disease with freezing: A comparison between two physical therapy protocols using visual and auditory cues with or without treadmill training. Movement Disorders 24(8), 1139–1143. Freedland, R. L., Festa, C., Sealy, M., McBean, A., Elghazaly, P., Capan, A., … Rothman, J. (2002). The effects of pulsed auditory stimulation on various gait measurements in persons with Parkinson’s disease. Neurorehabilitation 17(1), 81–87. Fujioka, T., Trainor, L. J., Large, E. W., & Ross, B. (2012). Internalized timing of isochronous sounds is represented in neuromagnetic beta oscillations. Journal of Neuroscience 32(5), 1791–1802. Ghai, S., Ghai, I., & Effenberg, A. O. (2017). Effect of rhythmic auditory cueing on gait in cerebral palsy: A systematic review and meta-analysis. Neuropsychiatric Disease and Treatment 14, 43–59. Ghai, S., Ghai, I., & Effenberg, A. O. (2018). Effect of rhythmic auditory cueing on aging gait: A systematic review and meta-analysis. Aging and Disease 9(5), 901–923.
Ghai, S., Ghai, I., Schmitz, G., & Effenberg, A. O. (2018). Effect of rhythmic auditory cueing on Parkinsonian gait: A systematic review and meta-analysis. Scientific Reports 8, 506. Gillespie, L. D., Robertson, M. C., Gillespie, W. J., Sherrington, C., Gates, S., Clemson, L. M., & Lamb, S. E. (2012). Interventions for preventing falls in older people living in the community. Cochrane Database of Systematic Reviews 9, CD007146. Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience 19(5), 893–906. Grau-Sánchez, J., Amengual, J. L., Rojo, N., Veciana de las Heras, M., Montero, J., Rubio, F., … Rodríguez-Fornells, A. (2013). Plasticity in the sensorimotor cortex induced by music-supported therapy in stroke patients: A TMS study. Frontiers in Human Neuroscience 7. Retrieved from https://doi.org/10.3389/fnhum.2013.00494 Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity: Behavior, function, and structure. Neuron 76(3), 486–502. Howe, T. E., Lövgreen, B., Cody, F. W., Ashton, V. J., & Oldham, J. A. (2003). Auditory cues can modify the gait of persons with early-stage Parkinson’s disease: A method for enhancing Parkinsonian walking performance. Clinical Rehabilitation 17(4), 363–367. Hurt, C. P., Rice, R. R., McIntosh, G. C., & Thaut, M. H. (1998). Rhythmic auditory stimulation in gait training for patients with traumatic brain injury. Journal of Music Therapy 35(4), 228–241. Hurt-Thaut, C. P. (2014). Rhythmic auditory stimulation to reduce falls in healthy elderly and patients with Parkinson’s disease (Doctoral dissertation). UMI dissertation publishing, 3635683. Johnell, O., Melton, L. J. III, Atkinson, E. J., O’Fallon, W. M., & Kurland, L. T. (1992). Fracture risk in patients with Parkinsonism: A population-based study in Olmsted County, Minnesota. Age and Ageing 21(1), 32–38. Kadivar, Z., Corcos, D. M., Foto, J., & Hondzinski, J. M. (2011).
Effect of step training and rhythmic auditory stimulation on functional performance in Parkinson patients. Neurorehabilitation and Neural Repair 25(7), 626–635. Kim, J. S., & Oh, D. W. (2012). Home-based auditory stimulation training for gait rehabilitation of chronic stroke patients. Journal of Physical Therapy Science 24(8), 775–777. Kim, S., Kwak, E., Park, E., & Cho, S. (2012). Differential effects of rhythmic auditory stimulation and neurodevelopmental treatment/Bobath on gait patterns in adults with cerebral palsy: A randomized controlled trial. Clinical Rehabilitation 26(10), 904–914. Kim, S. J., Kwak, E. E., Park, E. S., Lee, D. S., Kim, K. J., Song, J. E., & Cho, S. R. (2011). Changes in gait patterns with rhythmic auditory stimulation in adults with cerebral palsy. Neurorehabilitation 29(3), 233–241. Kojovic, M., Pareés, I., Sadnicka, A., Kassavetis, P., Rubio-Agusti, I., Saifee, T. A., & Bhatia, K. P. (2012). The brighter side of music in dystonia. Archives of Neurology 69(7), 917–919. Luft, A. R., McCombe-Waller, S., Whitall, J., Forrester, L. W., Macko, R., Sorkin, J. D., … Hanley, D. F. (2004). Repetitive bilateral arm training and motor cortex activation in chronic stroke: A randomized controlled trial. Journal of the American Medical Association 292(15), 1853–1861. Ma, H. I., Hwang, W. J., & Lin, K. C. (2009). The effects of two different auditory stimuli on functional arm movement in persons with Parkinson’s disease: A dual-task paradigm. Clinical Rehabilitation 23(3), 229–237. McCombe Waller, S., Harris-Love, M., Liu, W., & Whitall, J. (2006). Temporal coordination of the arms during bilateral simultaneous and sequential movements in patients with chronic hemiparesis. Experimental Brain Research 168(3), 450–454. McIntosh, G. C., Brown, S. H., Rice, R. R., & Thaut, M. H. (1997). Rhythmic auditory-motor facilitation of gait patterns in patients with Parkinson’s disease. Journal of Neurology, Neurosurgery, and Psychiatry 62(1), 22–26.
McIntosh, G. C., Rice, R. R., Hurt, C. P., & Thaut, M. H. (1998). Long-term training effects of rhythmic auditory stimulation on gait in patients with Parkinson’s disease. Movement Disorders 13(2), 212 [Abstract]. Mak, M. (2006). Feed-forward audio-visual cues could enhance sit-to-stand in Parkinsonian patients. Proceedings of the 4th World Congress for Neurorehabilitation, F1B-7. Malcolm, M. P., Massie, C., & Thaut, M. H. (2009). Rhythmic auditory-motor entrainment improves hemiparetic arm kinematics during reaching movements: A pilot study. Topics in Stroke Rehabilitation 16(1), 69–79. Merchant, H., Grahn, J., Trainor, L., Rohrmeier, M., & Fitch, W. T. (2015). Finding the beat: A neural perspective across humans and non-human primates. Philosophical Transactions of the Royal Society B: Biological Sciences 370(1664), 20140093. Miller, R. A., Thaut, M. H., McIntosh, G. C., & Rice, R. R. (1996). Components of EMG symmetry and variability in Parkinsonian and healthy elderly gait. Electroencephalography and Clinical Neurophysiology/Electromyography and Motor Control 101(1), 1–7. Molinari, M., Leggio, M., Filippini, V., Gioia, M., Cerasa, A., & Thaut, M. (2005). Sensorimotor transduction of time information is preserved in subjects with cerebellar damage. Brain Research Bulletin 67(6), 448–458. Morris, G. S., Suteerawattananon, M., Etnyre, B. R., Jankovic, J., & Protas, E. J. (2004). Effects of visual and auditory cues on gait in individuals with Parkinson’s disease. Journal of the Neurological Sciences 219(1–2), 63–69. Pacchetti, C., Mancini, F., Aglieri, R., Fundaro, C., Martignoni, E., & Nappi, G. (2000). Active music therapy in Parkinson’s disease: An integrative model method for motor and emotional rehabilitation. Psychosomatic Medicine 62(3), 386–393. Paltsev, Y. I., & Elner, A. M. (1967). Change in the functional state of the segmental apparatus of the spinal cord under the influence of sound stimuli and its role in voluntary movement. Biophysics 12, 1219–1226.
Pau, M., Corona, F., Pili, R., Casula, C., Sors, F., Agostini, T., … Murgia, M. (2016). Effects of physical rehabilitation integrated with rhythmic auditory stimulation on spatio-temporal and kinematic parameters of gait in Parkinson’s disease. Frontiers in Neurology 7. Retrieved from https://doi.org/10.3389/fneur.2016.00126 Peng, Y. C., Lu, W. T., Wang, T. H., Chen, Y. L., Liao, H. F., Lin, K. H., & Tang, P. F. (2011). Immediate effects of therapeutic music on loaded sit-to-stand movement in children with spastic diplegia. Gait & Posture 33(2), 274–278. Picelli, A., Camin, M., Tinazzi, M., Vangelista, A., Cosentino, A., Fiaschi, A., & Smania, N. (2010). Three-dimensional motion analysis of the effects of auditory cueing on gait pattern in patients with Parkinson’s disease: A preliminary investigation. Neurological Sciences 31(4), 423–430. Richards, C. L., Malouin, F., Bedard, P. J., & Cioni, M. (1992). Changes induced by L-Dopa and sensory cues on the gait of Parkinsonian patients. In M. Wollacot & F. Horak (Eds.), Posture and gait: Control mechanisms (pp. 126–129). Eugene, OR: University of Oregon Books. Ringenbach, S. D., Zimmerman, K., Chen, C. C., Mulvey, G. M., Holzapfel, S. D., Weeks, D. J., & Thaut, M. H. (2014). Adults with Down syndrome performed repetitive movements fast with continuous music cues. Journal of Motor Learning and Development 2(3), 47–54. Robertson, S. D., Chua, R., Maraj, B. K., Kao, J. C., & Weeks, D. J. (2002). Bimanual coordination dynamics in adults with Down syndrome. Motor Control 6(4), 388–407. Robertson, S. D., Van Gemmert, A. W., & Maraj, B. K. (2002). Auditory information is beneficial for adults with Down syndrome in a continuous bimanual task. Acta Psychologica 110(2), 213–229. Roerdink, M., Bank, P. J., Peper, C. L., & Beek, P. J. (2011). Walking to the beat of different drums: Practical implications for the use of acoustic rhythms in gait rehabilitation. Gait & Posture 33(1),
690–694. Rojo, N., Amengual, J., Juncadella, M., Rubio, F., Camara, E., Marco-Pallares, J., … Altenmüller, E. (2011). Music-supported therapy induces plasticity in the sensorimotor cortex in chronic stroke: A single-case study using multimodal imaging (fMRI-TMS). Brain Injury 25(7–8), 787–793. Rossignol, S., & Melvill Jones, G. (1976). Audiospinal influences in man studied by the H-reflex and its possible role in rhythmic movement synchronized to sound. Electroencephalography & Clinical Neurophysiology 41(1), 83–92. Sameiro-Barbosa, C. M., & Geiser, E. (2016). Sensory entrainment mechanisms in auditory perception: Neural synchronization cortico-striatal activation. Frontiers in Neuroscience 10. Retrieved from https://doi.org/10.3389/fnins.2016.00361 Schaffert, N., & Mattes, K. J. (2015). Effects of acoustic feedback training in elite-standard pararowing. Journal of Sports Science 33(4), 411–418. Schauer, M., & Mauritz, K. H. (2003). Musical motor feedback (MMF) in walking hemiparetic stroke patients: Randomized trials of gait improvement. Clinical Rehabilitation 17(7), 713–722. Schmahmann, J. D., & Pandya, D. N. (2008). Disconnection syndromes of basal ganglia, thalamus, and cerebrocerebellar systems. Cortex 44(8), 1037–1066. Schmitz, G., Kroeger, D., & Effenberg, A. O. (2014). A mobile sonification system for stroke rehabilitation. Paper presented at the 20th International Conference on Auditory Display, New York. Schneider, S., Münte, T., Rodriguez-Fornells, A., Sailer, M., & Altenmüller, E. (2010). Music-supported training is more efficient than functional motor training for recovery of fine motor skills in stroke patients. Music Perception: An Interdisciplinary Journal 27(4), 271–280. Seebacher, B., Kuisma, R., Glynn, A., & Berger, T. (2015). Rhythmic cued motor imagery and walking in people with multiple sclerosis: A randomised controlled feasibility study. Pilot and Feasibility Studies 1, 25.
doi:10.1186/s40814-015-0021-3 Seebacher, B., Kuisma, R., Glynn, A., & Berger, T. (2016). The effect of rhythmic-cued motor imagery on walking, fatigue and quality of life in people with multiple sclerosis: A randomised controlled trial. Multiple Sclerosis Journal 23(2), 286–296. Shahraki, M., Sohrabi, M., Torbati, H. T., Nikkhah, K., & NaeimiKia, M. (2017). Effect of rhythmic auditory stimulation on gait kinematic parameters of patients with multiple sclerosis. Journal of Medicine and Life 10(1), 33–37. Son, H., & Kim, E. (2015). Kinematic analysis of arm and trunk movements in the gait of Parkinson’s disease patients based on external signals. Journal of Physical Therapy Science 27(12), 3783–3786. Song, G. B., & Ryu, H. J. (2016). Effects of gait training with rhythmic auditory stimulation on gait ability in stroke patients. Journal of Physical Therapy Science 28(5), 1403–1406. Song, J. H., Zhou, P. Y., Cao, Z. H., Ding, Z. G., Chen, H. X., & Zhang, G. B. (2015). Rhythmic auditory stimulation with visual stimuli on motor and balance function of patients with Parkinson’s disease. European Review for Medical and Pharmacological Sciences 19(11), 2001–2007. Spaulding, S. J., Barber, B., Colby, M., Cormack, B., Mick, T., & Jenkins, M. E. (2013). Cueing and gait improvement among people with Parkinson’s disease: A meta-analysis. Archives of Physical Medicine and Rehabilitation 94(3), 562–570. Stephan, K. M., Thaut, M. H., Wunderlich, G., Schicks, W., Tian, B., Tellmann, L., … Hoemberg, V. (2002). Conscious and subconscious sensorimotor synchronization: Prefrontal cortex and the influence of awareness. NeuroImage 15(2), 345–352. Suh, J. H., Han, S. J., Jeon, S. Y., Kim, H. J., Lee, J. E., Yoon, T. S., & Chong, H. J. (2014). Effect of rhythmic auditory stimulation on gait and balance in hemiplegic stroke patients. Neurorehabilitation 34(1), 193–199.
Teki, S., Grube, M., & Griffiths, T. D. (2012). A unified model of time perception accounts for duration-based and beat-based timing mechanisms. Frontiers in Integrative Neuroscience 5. Retrieved from https://doi.org/10.3389/fnint.2011.00090 Teki, S., Grube, M., Kumar, S., & Griffiths, T. D. (2011). Distinct neural substrates of duration-based and beat-based auditory timing. Journal of Neuroscience 31(10), 3805–3812. Thaut, M. H. (2005). The future of music in therapy and medicine. Annals of the New York Academy of Sciences 1060, 303–308. Thaut, M. H., & Hoemberg, V. (Eds.) (2014). The Oxford handbook of neurologic music therapy. Oxford: Oxford University Press. Thaut, M. H., Hoemberg, V., Kenyon, G., & Hurt, C. P. (1998). Rhythmic entrainment of hemiparetic arm movements in stroke patients. Proceedings of the Society for Neuroscience 653(7) [Abstract]. Thaut, M. H., & Kenyon, G. P. (2003). Rapid motor adaptations to subliminal frequency shifts during syncopated rhythmic sensorimotor synchronization. Human Movement Science 22(3), 321–338. Thaut, M. H., Kenyon, G. P., Hurt, C. P., McIntosh, G. C., & Hoemberg, V. (2002). Kinematic optimization of spatiotemporal patterns in paretic arm training with stroke patients. Neuropsychologia 40(7), 1073–1081. Thaut, M. H., Lange, H., Miltner, R., Hurt, C. P., & Hoemberg, V. (1996). Rhythmic entrainment of gait patterns in Huntington’s disease patients. Proceedings of the Society for Neuroscience 727(6) [Abstract]. Thaut, M. H., Leins, A. K., Rice, R. R., Argstatter, H., Kenyon, G. P., McIntosh, G. C., & Fetter, M. (2007). Rhythmic auditory stimulation improves gait more than NDT/Bobath training in near-ambulatory patients early post-stroke: A single-blind, randomized trial. Neurorehabilitation and Neural Repair 21(5), 455–459. Thaut, M. H., McIntosh, G. C., & Hoemberg, V. (2014). Neurobiological foundations of neurologic music therapy: Rhythmic entrainment and the motor system. Frontiers in Psychology 5. 
Retrieved from https://doi.org/10.3389/fpsyg.2014.01185 Thaut, M. H., McIntosh, G. C., Prassas, S. G., & Rice, R. R. (1993). The effect of auditory rhythmic cuing on stride and EMG patterns in hemiparetic gait of stroke patients. Journal of Neurologic Rehabilitation 7(1), 9–16. Thaut, M. H., McIntosh, G. C., & Rice, R. R. (1997). Rhythmic facilitation of gait training in hemiparetic stroke rehabilitation. Journal of Neurological Sciences 151(2), 207–212. Thaut, M. H., McIntosh, G. C., Rice, R. R., Miller, R. A., Rathbun, J., & Brault, J. M. (1996). Rhythmic auditory stimulation in gait training for Parkinson’s disease patients. Movement Disorders 11(2), 193–200. Thaut, M. H., Miltner, R., Lange, H. W., Hurt, C. P., & Hoemberg, V. (1999). Velocity modulation and rhythmic synchronization of gait in Huntington’s disease. Movement Disorders 14(5), 808–819. Thaut, M. H., Nickel, A., Kenyon, G. P., Meissner, N., & McIntosh, G. C. (2005). Rhythmic auditory stimulation (RAS) for gait training in hemiparetic stroke rehabilitation: An international multicenter study. Proceedings of the Society for Neuroscience 756(6). Thaut, M. H., Rice, R. R., Braun Janzen, T., Hurt-Thaut, C. P., & McIntosh, G. C. (2018). Rhythmic auditory stimulation for reduction of falls in Parkinson’s disease: A randomized controlled study. Clinical Rehabilitation, July 23. doi:10.1177/0269215518788615 Thaut, M. H., Schicks, W., McIntosh, G. C., & Hoemberg, V. (2002). The role of motor imagery and temporal cuing in hemiparetic arm rehabilitation. Neurorehabilitation and Neural Repair 16, 115. Thaut, M. H., Schleiffers, S., & Davis, W. B. (1991). Analysis of EMG activity in biceps and triceps muscle in a gross motor task under the influence of auditory rhythm. Journal of Music Therapy 28, 64–88.
Torres, E. B., Brincker, M., Isenhower, R. W., Yanovich, P., Stigler, K. A., Nurnberger, J. I., … José, J. V. (2013). Autism: The micro-movement perspective. Frontiers in Integrative Neuroscience 7. Retrieved from https://doi.org/10.3389/fnint.2013.00032 Torres, E. B., & Donnellan, A. M. (2015). Editorial for research topic “Autism: The movement perspective.” Frontiers in Integrative Neuroscience 9. Retrieved from https://doi.org/10.3389/fnint.2015.00012 Turova, V., Alves-Pinto, A., Ehrlich, S., Blumenstein, T., Cheng, G., & Lampe, R. (2017). Effects of short-term piano training on measures of finger tapping, somatosensory perception and motor-related brain activity in patients with cerebral palsy. Neuropsychiatric Disease and Treatment 13, 2705–2718. Wang, T. H., Peng, Y. C., Chen, Y. L., Lu, T. W., Liao, H. F., Tang, P. F., & Shieh, J. Y. (2013). A home-based program using patterned sensory enhancement improves resistance exercise effects for children with cerebral palsy: A randomized controlled trial. Neurorehabilitation and Neural Repair 27(8), 684–694. Whitall, J., McCombe Waller, S., Silver, K. H., & Macko, R. F. (2000). Repetitive bilateral arm training with rhythmic auditory cueing improves motor function in chronic hemiparetic stroke. Stroke 31(10), 2390–2395. Willems, A. M., Nieuwboer, A., Chavret, F., Desloovere, K., Dom, R., Rochester, L., … Van Wegen, E. (2006). The use of rhythmic auditory cues to influence gait in patients with Parkinson’s disease, the differential effect for freezers and non-freezers, an explorative study. Disability and Rehabilitation 28(11), 721–728. Wood, B. H., Bilclough, J. A., Bowron, A., & Walker, R. W. (2002). Incidence and prediction of falls in Parkinson’s disease: A prospective multidisciplinary study. Journal of Neurology, Neurosurgery & Psychiatry 72(6), 721–725. Yoo, J. (2009). The role of therapeutic instrumental music performance in hemiparetic arm rehabilitation. Music Therapy Perspectives 27(1), 16–24. Yoo, G. 
E., & Kim, S. J. (2016). Rhythmic auditory cueing in motor rehabilitation for stroke patients: Systematic review and meta-analysis. Journal of Music Therapy 53(2), 149–177. Zatorre, R. J., Halpern, A. R., & Herholz, S. C. (2012). Neuronal correlates of perception, imagery, and memory for familiar tunes. Journal of Cognitive Neuroscience 24(6), 1382–1397.
CHAPTER 29
NEUROLOGIC MUSIC THERAPY FOR SPEECH AND LANGUAGE REHABILITATION
YUNE S. LEE, CORENE THAUT, AND CHARLENE SANTONI
Introduction
There are anecdotal and clinical reports, some of which trace back hundreds of years, that music, especially singing, increases speech fluency in individuals with profound speech deficits. Both short- and long-term music-based interventions exist that can address developmental, rehabilitative, and adaptive speech goals. A growing body of behavioral evidence supports the efficacy of music training for various speech and language impairments including dyslexia, specific language impairment (SLI), aphasia, dysarthria, apraxia of speech, fluency disorders, voice disorders, and hearing loss. Despite this body of findings, it has remained largely unclear how music elicits neural changes that help to mediate speech and language processes in the brain. In recent years, however, a burgeoning volume of neuroimaging studies has begun to yield promising evidence for the efficacy of Neurologic Music Therapy (NMT) interventions for speech rehabilitation
by demonstrating neural reorganization. For example, Wan and colleagues (Wan, Zheng, Marchina, Norton, & Schlaug, 2014) showed that intensive melodic intonation therapy (MIT) induced structural connectivity changes in the undamaged right hemisphere in patients with chronic non-fluent aphasia. What aspect of music (e.g., pitch, rhythm, melody, dynamics) plays the pivotal role in the transfer from music to speech ability? Although melody may seem like the most important feature, recent evidence suggests that rhythm plays a critical role in the facilitation and recovery of speech during music-based intervention (Fujii & Wan, 2014; Stahl, Kotz, Henseler, Turner, & Geyer, 2011). Behaviorally, rhythm performance predicts some linguistic abilities including grammar and phonological processing (Gordon et al., 2015). Neurologically, there is substantial overlap between rhythm and speech circuitries along the speech-motor network (Kotz, Schwartze, & Schmidt-Kassow, 2009; Kraus & Chandrasekaran, 2010). More specifically, the built-in temporal processes necessary for both music and speech are mediated by corticostriatal circuitries comprising the basal ganglia, the supplementary motor area (SMA), the premotor cortex, and the frontal operculum (Kotz & Schwartze, 2010). In particular, the basal ganglia serve as a central hub in analyzing patterns of temporal sequences of sensory or motoric events (Kotz & Schmidt-Kassow, 2015). Accordingly, there is a body of evidence indicating the functional role of the basal ganglia in processes ranging from beat perception and production (Grahn & Brett, 2009) to speech and language processing. Thus, patients with basal ganglia damage (e.g., Parkinson’s disease, PD) show speech and language deficits as well as motor deficits (Friederici, Kotz, Werheid, Hein, & Von Cramon, 2003; Grahn & Brett, 2009; Kotz, Frisch, Von Cramon, & Friederici, 2003). 
Consequently, PD patients whose basal ganglia are dysfunctional are unable to detect temporal cues in speech (Farrugia et al., 2014; Kotz & Gunter, 2015) or syntactic violations in language (Friederici et al., 2003; Kotz & Gunter, 2015), and fail to modulate their speech rate during speaking tasks (Skodda & Schlegel, 2008). Such rhythm and timing deficits can also stem from the mutation of genes coding a key neurotransmitter regulating temporal processes (Wiener, Lohoff, & Coslett, 2011; Wiener, Lee, Lohoff, & Coslett, 2014). For example, DRD2 polymorphism can cause a reduction in dopamine D2 receptor density in the basal ganglia (Rowe et al., 1999), which can potentially affect timing and rhythmic processes. Accordingly, Wiener et al.
(2014) have shown that polymorphism of the DRD2 gene can lead to poor temporal judgment. Similarly, Wong and colleagues (Wong, Ettlinger, & Zheng, 2013) reported that DRD2 polymorphism was associated with poor performance on grammar sequencing. These two studies also related DRD2 polymorphism to differential functional magnetic resonance imaging (fMRI) activity in the basal ganglia. Intriguingly, patients with dysfunctional basal ganglia benefit from external rhythmic cueing when performing speech and language tasks. For example, Kotz and Gunter (2015) demonstrated that the P600 electroencephalography (EEG) component, a hallmark of syntactic processing, was restored when a PD patient was primed by march music prior to a grammar judgment task. Correspondingly, in the developmental domain, children with SLI performed better on syntactic judgment tasks when primed by music with a regular beat pattern than by music with an irregular beat pattern (Przybylski et al., 2013), or by environmental sound lacking beat components entirely (Bedoin, Brisseau, Molinier, Roch, & Tillmann, 2016). In sum, current findings suggest that there is a tight coupling between speech and music, and that rhythmic processes are mediated by dedicated neural and genetic mechanisms. In this chapter, we will provide a comprehensive overview of Neurologic Music Therapy for various speech and language disorders. Neurologic Music Therapy is an evidence-based system of standardized clinical techniques which are based on scientific knowledge related to music perception and production, and the effects thereof, on non-musical brain and behavior function (Thaut & Hoemberg, 2014). 
In the speech and language domain, there are eight standardized techniques: Melodic Intonation Therapy (MIT), Musical Speech Stimulation (MUSTIM), Rhythmic Speech Cueing (RSC), Vocal Intonation Therapy (VIT), Oral Motor and Respiratory Exercises (OMREX), Therapeutic Singing (TS), Developmental Speech and Language Training Through Music (DSLM), and Symbolic Communication Training Through Music (SYCOM). We will illustrate how each of these NMT techniques is applied to rehabilitation of speech and language impairments.
Dysarthria
Motor speech disorders (MSDs) can occur due to neurologic impairments affecting the planning, programming, control, or execution of speech. MSDs include the dysarthrias and apraxia of speech (Duffy, 2005, p. 5). Dysarthria is a collective term referring to a neuropathophysiologic disruption in the activation and control (e.g., strength, speed, range of motion, tone, coordination) of the muscles necessary for speech production. Dysarthria can therefore affect the respiratory, phonatory, resonatory, articulatory, and prosodic aspects of speech. Several categories exist: flaccid, spastic, hypokinetic, hyperkinetic, ataxic, upper motor neuron, and mixed, all resulting from damage or disturbance to the upper or lower motor neurons, basal ganglia, or cerebellum (Darley, Aronson, & Brown, 1969; Duffy, 2005). Singing and speech share the same proprioceptive feedback system. Guenther’s Directions into Velocities of Articulators (DIVA) model describes a segmental theory of speech motor control which proposes that speech segments are coded by the central nervous system (CNS) as auditory-temporal and somatosensory-temporal goal regions, and that two controls drive a speech sound map: feedforward and feedback. The feedforward mechanism outlines how the CNS sends anticipatory preprogrammed instructions about movements by relying on past experiences in movement planning, execution, and error correction. The feedback mechanism provides scaffolding for how speech movement is controlled based on the sensory input the CNS receives, which may indicate deviation from the planned movement (Guenther, 2006; Guenther & Vladusich, 2012; Tourville & Guenther, 2011). 
In the domain of speech and language rehabilitation, singing could be theorized to induce neuromotor retraining via the formation of new motor command relationships within the feedback mechanism that stimulate learning within the feedforward mechanism, thereby causing the CNS to recalibrate or reset its motor program for communication. In addition, since singing naturally heightens various elements of speech production as an augmentative form of vocal loading, respiratory shaping, resonant voicing, exaggerated articulation, and prosodic phrasing, it could also be theorized to modulate motor neuron activity, with implications for rehabilitation (Cohen, 1994; Natke, Donath, & Kalveram, 2003; Tonkinson, 1994). What follows is a review of current singing-related voice therapy strategies prescribed to specific motor speech disordered
populations and their outcomes. The main aim of the review is to elucidate the practice of singing as a therapeutic science with reproducible effects. A substantial body of literature reports positive outcomes for singing tasks used as a means of voice therapy in dysarthric populations. In traumatic brain injury and stroke, singing-induced gains have been documented in areas related to maximum phonation time, intensity, speech rate, prosody, vocal range, and overall intelligibility (Baker, Wigram, & Gold, 2005; Cohen, 1988, 1992; Kim & Jo, 2013; Tamplin, 2008). Therapeutic outcomes using NMT speech techniques such as VIT, TS, and OMREX in Parkinson’s disease have also revealed significant improvements in hypomimia, vocal intensity, fundamental frequency, maximum phonation time, prosody, and articulation, as well as better overall lung function test scores (Caligiuri, 1989; Canavan, Evans, Foy, Langford, & Proctor, 2012; DeStewart, Willemse, Maassen, & Horstink, 2003; Di Benedetto et al., 2009; Elefant, Baker, Lotan, Lagesen, & Skeie, 2012; Haneishi, 2001; Stegemöller, Radig, Hibbing, Wingate, & Sapienza, 2017; Tanner, 2012; Tanner, Rammage, & Liu, 2016; Tautscher-Basnett, Tomantschger, Keglevic, & Freimuller, 2006). Accordingly, in earlier work by Bellaire, Yorkston, and Beukelman (1986), modifying the breathing pattern of mildly dysarthric speakers improved their prosodic repertoire. Similarly, in 1993, Cohen and Masse applied a singing intervention to persons with neurogenic communication disorders, symptomatic of multiple sclerosis, cerebral palsy, Parkinson’s disease, and cerebrovascular accident. Findings revealed improvements in intelligibility ratings, vocal intensity, and vocal range. 
The significance of singing in human development has always had firm roots in our evolutionary inheritance: Charles Darwin theorized that Neanderthals originally communicated using a catalog of song-like expressions lacking in words or meaning (Darwin, 1872/1988). Furthermore, recent research describes a phenomenon known as infant-directed speech, or musical speech, as catalytic for preverbal communication and as an important stage in language learning in the earliest stages of life (Fernald, 1989; Trainor, Clark, Huntley, & Adams, 1997). As such, singing’s current emergence (or re-emergence) as speech’s keen and remunerative partner implies that the pairing has been evident all along, and that prescribing singing training to motor speech disordered populations
is, therefore, reflective of a more refined understanding of where we came from.
Apraxia of Speech
Apraxia affects the sensorimotor programming, planning, or preparation (e.g., velum elevation, tongue placement) needed for directing movements that result in volitional speech production (Yorkston, Beukelman, Strand, & Hakel, 2010, p. 7). Messages from the brain to the mouth become disrupted, resulting in an inability to move the articulators to execute speech sounds correctly. Apraxia can range from mild impairment to a complete loss of the ability to produce speech. The disorder exists in two forms: congenital (childhood apraxia of speech, CAS) and acquired (apraxia of speech, AOS). Furthermore, “although AOS can involve all speech subsystems, it is predominantly a disorder of articulation and prosody” (Ballard et al., 2015, p. 316). While the field is still in its infancy, most of the clinical evidence on treating apraxia points to rhythm as the main cueing mechanism. In NMT, representative techniques include MIT and RSC, with some additional prescription of OMREX and TS. In RSC “speech rate control via auditory rhythm is used to improve temporal characteristics such as fluency, articulatory rate, pause time, and intelligibility of speaking” (Mainka & Mallien, 2014, p. 150). Since 1988, several single-case studies have pointed to the positive effects of metronomic pacing treatment for the rehabilitation of apraxia of speech (Dworkin, Abkarian, & Johns, 1988; Wambaugh & Martinez, 2000). More recent research has followed. Brendal and Ziegler (2008) compared a metrical pacing treatment with an articulatory treatment in a sample of ten patients with post-stroke induced mild to severe aphasia. Post-therapy, the metrical stimulation treatment group exhibited improvements in articulatory and suprasegmental accuracy, while the articulatory treatment group displayed improvement in articulation alone. 
In another study, Mauszycki and Wambaugh (2008) used a metronomic rate control and hand-tapping task within a single-subject baseline design with a patient with mild AOS; results indicated improvement in sound production accuracy and total utterance duration during repetition tasks. Aitken
Dunham (2010) designed a single-subject study comparing the efficacy of a speech therapy treatment program with a combined speech and music therapy treatment program. The music therapy protocol was established through the work of Kim and Tomaino (2008), and included elements of RSC, MIT, OMREX, and TS. Results revealed that while both treatment programs produced improvement post-therapy, the greatest treatment effect was found following the combined therapy protocol. Finally, using a single-subject design comparing repeated practice alone with repeated practice combined with a rate/rhythm control strategy in ten speakers with chronic AOS, Wambaugh and colleagues (2012) found articulation improvement with the repeated practice treatment, with mild gains when rate/rhythm control tasks were added. Several studies have looked at the use of MIT with apraxia of speech populations; however, due to the small sample sizes and lack of consistent protocols, it is difficult to draw conclusions without further investigation and caution is recommended when interpreting the results. In 1975, Keith and Aronson reported the case of a 48-year-old woman with aphasia and apraxia of speech, three years post-stroke. After several weeks of speech therapy, progress was limited, so a singing task was prescribed, after which the patient was able to sing and articulate words in song. While transfer to speech was not without mild aphasic and apraxic error, the presence of speech, and the effectiveness thereof, warranted acknowledgment and prompted further clinical investigation. Krauss and Galloway (1982) compared a traditional speech therapy protocol to one that included MIT as a warm-up in single-subject case studies of two boys with CAS and additional developmental delays. Outcomes for both subjects indicated improvements in phrase length, noun retrieval, and verbal imitation. 
Furthermore, Helfrich-Miller (1984) reported on three case studies involving children with apraxia of speech who were prescribed MIT over a period of one to four years. Gains were reported in phoneme acquisition, speech sequencing, and overall intelligibility ratings. However, because the patients were also receiving speech therapy during this time, the conclusion that MIT caused these outcomes should be treated with caution. In 2011, Martikainen and Korpilahti compared the effectiveness of combining MIT with the Touch-Cue Method (TCM) in the single case of a 4-year-old girl with CAS. Findings revealed a decrease in
speech sound errors along with an increase in sequencing abilities after MIT. This progression continued when TCM was added, resulting in whole words being formed. The outcomes of all of the aforementioned work indicate the need for further study in this domain to improve the communicative efficiency of people with AOS and CAS. Both MIT and RSC can be valuable compensatory facilitators of speech and language encoding and, in tandem, of speech and language production.
Aphasia
Aphasia is a communication disorder which can affect a person’s use of expressive and receptive language. Aphasia is typically caused by acquired brain injuries, but can also be present as a degenerative brain and nervous system disorder in persons with frontotemporal dementia. Applications of MIT and MUSTIM as used in NMT have primarily focused on non-fluent aphasia and primary progressive aphasia. Broca’s aphasia, also referred to as expressive or non-fluent aphasia, results from damage to the language network in the left frontal lobe of the brain, Brodmann’s areas 44 and 45. Broca’s aphasia is characterized by the complete loss of ability to produce meaningful speech or severely reduced speech output with limited short utterances. Vocabulary access is halting and laborious, with a lack of ability to organize and control linguistic content, which often consists of non-propositional speech. Speech can be perseverative, with disordered syntax, grammar, and structure. The person with expressive aphasia may understand speech relatively well and be able to read, but be limited in writing (The American Speech-Language-Hearing Association, 2017). For over a hundred years, it has been noted that people with aphasia frequently have the ability to sing familiar, overlearned songs, which are accessed through the unimpaired right hemisphere heavily involved in the emotional color and expression as well as melodic and rhythmic aspects of both singing and speech. It was not until the 1970s that researchers standardized this observation into a formalized treatment process for people with Broca’s aphasia (Sparks, Helm, & Albert, 1974; Sparks & Holland, 1976). Since then, successful applications of rhythmic and melodic intonation for aphasia have been seen across languages and cultures
(Bonakdarpour, Eftekharzadeh, & Ashayeri, 2003; Cortese, Riganello, Arcuri, Pignataro, & Buglione, 2015; Haro-Martinez et al., 2017; Popovici, 1995). Melodic Intonation Therapy (MIT) is a technique which uses the unaffected ability of a person with aphasia to sing familiar songs, in order to teach them how to sing and eventually generate speech output of functional phrases through the use of the melodic and rhythmic elements of speech. Evidence has shown that by using the stepwise process of MIT, the brain is able to bypass damaged left-hemisphere networks and engage right-hemisphere language resources via the rerouting of speech pathways, therefore aiding in the restoration of propositional speech (Breier, Randle, Maher, & Papanicolaou, 2010; Schlaug, Marchina, & Norton, 2009). While many studies have focused on the use of MIT with chronic aphasia, Van der Meulen and colleagues (Van der Meulen, van de Sandt-Koenderman, Heijenbrok-Kal, Visch-Brink, & Ribbers, 2012) saw significant improvements in verbal output in subacute severe non-fluent aphasia patients between two and three months post-stroke. While the ultimate goal when using MIT is to train propositional language, so that people can communicate and express non-formulaic verbal output independently in their everyday life, it is also used to teach a specific set of formulaic or overlearned phrases which are relevant to the patient’s life. The long-term goal is to improve propositional speech, and therefore speech and language assessments which are sensitive to both propositional and non-propositional speech should be used (Lim et al., 2013; Zumbansen, Peretz, & Hébert, 2014). Long-term, there is some evidence that suggests reactivation of left-hemisphere speech circuitry (Belin et al., 1996; Naeser & Helm-Estabrooks, 1985; Schlaug, Marchina, & Norton, 2008; Schlaug et al., 2009). Although the body of research validating the use of MIT with aphasia is very large, there is still more to be understood. 
Many studies have emphasized melody as the key element driving the responses seen in patients when using MIT (Akanuma, Meguro, Satoh, Tashiro, & Itoh, 2016; Seger et al., 2013). While melody clearly plays an important role, more recent research has focused on rhythmic priming and pacing, which have been shown to engage auditory, prefrontal, and parietal regions in the right hemisphere (Boucher, Garcia, Fleurant, & Paradis, 2001; Stephan et al., 2002).
Fluency
Fluency refers to the aspects of speech output related to continuity, smoothness, rate, and effort. Most people have experienced brief speech disfluencies at some point in their lives. For instance, normal disfluency can happen when children are first learning to combine words and speak in short sentences, or when they are learning to read. However, when disfluencies become numerous to the point where they impede the ability to communicate, they may meet the diagnostic criteria for a fluency disorder such as stuttering or cluttering. Stuttering most commonly presents in childhood, but adult onset can also result from a range of neurologic and neuropsychological conditions. While the exact cause of stuttering is not completely understood, there are many theories suggesting a range of genetic, neurological, psychological, and social linguistic links to the disorder. The Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5; American Psychiatric Association, 2013), identifies primary symptoms for childhood-onset fluency disorder as repetition of sounds, syllables, or monosyllabic whole words; prolongation of single sounds; blocked silence or voicing during speech; and excessive physical tension in word production. Secondary behaviors may also include hesitation, interjection of sounds, loss of eye contact, and extraneous motor movements such as eye blinking. Jokel and colleagues (Jokel, De Nil, & Sharpe, 2007) systematically assessed the speech characteristics of adults with neurogenic stuttering due to acquired brain injury. Their research identified six principal characteristics of stuttering, which are often referred to in the neurogenic stuttering literature:
1. Disfluencies occur equally on grammatical and substantive words.
2. Repetition, prolongation, and blocks occur in all positions of words.
3. There is a consistency of stuttering behavior across all speech tasks.
4. The speaker does not appear overly anxious about the stuttering behavior.
5. Secondary features are rarely observed.
6. An adaptation effect is not observed.
Cluttering, which also presents as a disruption in fluency and rate of speech, is characterized by rapid bursts of speech at an irregular speech rate. Typical disfluencies may include excessive whole word repetition, unfinished words, omitted syllables, and interjections. People who clutter often have limited self-awareness of their irregular speech, as well as sloppy handwriting, poor attention, difficulties organizing thoughts, auditory processing disorders, and learning disabilities. Several studies have suggested that stuttering is a disorder of motor timing, and may be related to the basal ganglia (Lebrun, 1998; Rosenberger, 1980; Victor & Ropper, 2001; Wu et al., 1995). The SMA and basal ganglia play a significant role in providing internal timing cues to facilitate the initiation of even well-learned speech (Cunnington, Bradshaw, & Iansek, 1996). Rhythm has been used as an effective external timing cue to compensate for deficient internal cues from the basal ganglia and the SMA. This may explain why speaking to a metronome is one of the most effective ways to instantly create fluency for persons who stutter (Alm, 2004). This effect has been reported to be independent of speech rate, with significant reduction in stuttering seen even at very fast tempos (Van Riper, 1982). Research has also looked at singing to increase fluency in vocal output. Alm (2004) suggested that melody cannot exist without rhythm; therefore, when singing, the brain has an internal representation of the intended timing for the initiation of each syllable, similar to how the metronome can provide external timing cues. A study by Healey and colleagues (Healey, Mallard, & Adams, 1976) compared singing familiar and unfamiliar lyrics to a familiar melody. 
While both conditions were associated with significant reductions in stuttering, the greatest increases were seen when singing familiar lyrics, possibly indicating that singing alone does not account for all of the decreases in stuttering that occur during singing. In 1979, Colcord and Adams compared reading versus singing altered lyrics to a familiar melody in order to increase fluency, voicing durations, and vocal sound pressure levels (SPL) in moderate to severe stutterers. Results revealed both a decrease in disfluency and an increase in voicing duration when singing to a familiar melody over reading. In addition, Glover and colleagues (Glover, Kalinowski, Rastatter, & Stuart, 1996) compared reading vs. self-generated singing at normal and fast tempos. Singing at both fast and normal rates was found to generate substantial reduction in stuttering over reading. Although the quality of singing varied significantly, this study suggested
that stutterers also have the ability to internally create fluent speech output by imposing self-generated melodic structures when asked to sing.
S
D
Although several attempts have been made to use music as a habilitative means for hearing restoration, there is only scant NMT research in the hearing loss and cochlear implant (CI) domain (Gfeller, 2016; Limb & Rubinstein, 2012). This is primarily due to the inherent difficulty of music perception in CI users: their ability to process spectrally complex musical sound is limited (Limb & Roy, 2014). Such impoverished music signals can be disturbing to some CI users, while others find them pleasant (Abdi, Khalessi, Khorsandi, & Gholami, 2001; Gfeller, Driscoll, Smith, & Scheperle, 2012). Only a few studies have examined the effect of music training on CI hearing improvement using rigorous experimental designs and formal tests. This is due to logistical and practical challenges, including heterogeneity among participants in age, onset of hearing loss, and duration of CI use. For example, past studies often relied upon teachers’ or parents’ evaluations as indicative of improved music skills and aptitude after music listening, with no statistical analyses (Abdi et al., 2001; Rocca, 2012). Nevertheless, emerging evidence indicates that music training appears to improve listening skills in CI users. Chen et al. (2010) reported that pitch perception was positively correlated with the duration of music training in twenty-seven children with CIs. Fu and colleagues (Fu, Galvin, Wang, & Wu, 2015) demonstrated that melodic contour identification ability was significantly improved after four weeks of a computerized music training program in fourteen congenitally deaf children with CIs. Importantly, music training not only improves listening skills within the music domain, but also transfers to speech and cognitive domains. For example, Rochette and colleagues (Rochette, Moussard, & Bigand, 2014) showed that fourteen profoundly deaf children who received 1.5–4 years of music lessons outperformed the control CI group (i.e., no musical training) in phonetic discrimination tests, auditory scene analysis, and working memory tests.
At present, there is a dearth of CI studies examining the effect of music training at the neural level. In general, it is difficult to study CI users using fMRI because of the ferromagnetic characteristics of the CI device. Instead, EEG has been used to study the neural activity associated with listening skills in CI users. For example, Petersen et al. (2015) recorded the brain activity profiles of eleven adolescent CI users using EEG before and after a two-week music training period. They found a significant change in mismatch negativity (MMN) in response to deviations of timbre, intensity, and rhythm. In addition to EEG, functional near-infrared spectroscopy (fNIRS) is another viable option to study CI populations (Saliba, Bortfeld, Levitin, & Oghalai, 2016). In fact, fNIRS has several advantages over fMRI, including quietness, portability, and a naturalistic and participant-friendly environment. Although the application of fNIRS in the CI domain is still in its infancy, fNIRS allows for studying changes in cortical activity following music intervention training in CI users, and therefore remains attractive.
Voice Disorders
Voice disorders occur when there is a deficiency in vocal functioning affecting speech production. Symptoms may include: hoarse or breathy vocal quality; loss of voice; pitch breaks, inability to maintain typical pitch, or reduced pitch range; lack of vocal carrying power; reduced loudness range; a need to use greater vocal effort; running out of breath while talking; an unsteady voice; tension in the neck and shoulders; throat or neck pain; throat fatigue or tightness; pain upon swallowing; an increased need to cough or clear the throat; and any form of discomfort in the chest, ears, or back of the neck (Kostyk & Putnam Rochet, 1998). Voice disorders can manifest in a multitude of ways and have multifactorial etiologies. Stemple and colleagues (Stemple, Glaze, & Klaben, 2009) classify disorders into four main causal areas: medically-related disorders; primary disorders (structural and neurogenic); personality-related disorders, sometimes referred to as psychogenic; and vocal misuse disorders, alternatively labeled functional.
Medically-related disorders refer to “medical or surgical interventions that directly cause voice disorders and medical or health conditions and treatments that may indirectly contribute to the development of voice disorders” (e.g., trauma, chronic illness, chronic disorder) (Stemple et al., 2009, p. 60). A sampling of research outcomes from voice therapy based on singing tasks is highlighted below. Onofre and colleagues (Onofre et al., 2013) reported on the use of a singing training program prescribed to total laryngectomy patients wearing tracheoesophageal voice prostheses. The program included both respiratory muscle strengthening and scalar vocalization tasks. Outcomes revealed improvement in the grade of dysphonia, roughness, and breathiness, as well as minor improvements in vocal extension during tracheoesophageal phonation. Using a randomized controlled trial, Hilton et al. (2013) prescribed singing exercises to ninety-three patients in an effort to reduce symptoms of snoring and sleep apnea, and found that, by improving the tone and strength of the pharyngeal muscles, the experimental group displayed a significantly reduced frequency of snoring. Lortie and colleagues (Lortie, Rivard, Thibeault, & Tremblay, 2017) described the augmentative effects of singing on the aging voice by studying seventy-two people aged 20 to 93 years. Findings indicated that frequent singing moderates age-related effects on most acoustic parameters of the voice, especially those related to pitch accuracy and amplitude levels. This is in keeping with similar findings by Sauder, Roy, Tanner, Houtz, and Smith (2010) and Ziegler, Verdolini Abbott, Johns, Klein, and Hapner (2014) related to presbylaryngis, adding to the growing body of evidence affirming the benefits of singing practice for the aging voice.
Primary disorders include “embryologic, physiologic, neurologic and anatomic disorders that have vocal changes as secondary symptoms of the primary disorder” (e.g., cleft palate, velopharyngeal insufficiency, hearing impairment, cerebral palsy) (Stemple et al., 2009, p. 65). Research in this area has been confined to a few select areas, one being spasmodic dysphonia (SD). SD causes strained or effortful voice qualities due to adductor or abductor laryngospasm. While primary treatment includes Botox injection or resection of the recurrent laryngeal nerve to paralyze one of the vocal folds, voice therapy targets include soft and sustained phonatory onsets and pitch and loudness modifications (The American Speech-Language-Hearing Association, 2017). Recent literature suggests that SD is a form of focal dystonia, with dysfunction often reflected during speech tasks alone, leaving non-linguistic vegetative functions such as coughing, laughing, and singing unaffected by the disorder. Reduction in spasticity is thereby the result of deviation from the normal mode of phonation, and singing has hence been promoted as an effective strategy to explore in therapy (Bloch, Hirano, & Gould, 1985). Therapeutic outcomes in populations with unilateral vocal fold paralysis have also been positive, with reported reductions in hoarseness and improved perception of voice impairment (Busto-Crespo et al., 2016). That said, these results should be treated with caution, as idiopathic vocal fold immobility has a history of spontaneous resolution. Finally, since reports have shown that professionally trained classical singers carefully adjust their velopharyngeal port to fine-tune their voice timbre (Austin, 1997; Birch et al., 2002; Fowler & Morris, 2007; Sundberg et al., 2007; Tanner, Roy, Merrill, & Power, 2005; Yanagisawa, Estill, Mambrino, & Talkin, 1991), Santoni, de Boer, Thaut, and Bressmann (2018) investigated the effect of altered auditory feedback on the control of oral–nasal balance in song. Results indicated that all participants showed lower nasalance scores in response to both increased and decreased nasal signal level feedback, with no differences reported between trained singers and untrained non-singers. In a similar study by Jennings and Kuehn (2008), which examined the singing of sustained vowels without the altered feedback condition, trained singers displayed lower nasalance scores than untrained singers. While the results of the two studies may not be directly comparable, they do support the premise that more research is needed in this area in order to support the potential for the experimental implementation of singing-infused therapeutic protocols in populations with hypernasal resonance disorders.
Personality-related or psychogenic voice disorders come about due to psychological factors reflected via a disturbance of voice (e.g., puberphonia, conversion aphonia). In a study treating patients with puberphonia, Desai and Mishra (2012) incorporated singing modalities (humming and glottal phonatory onsets) into their voice therapy protocol and found that all patients (N = 30) were able to achieve an appropriate pitch range post-therapy. More research is needed in this area.
Vocal misuse disorders (e.g., muscle tension dysphonia, voice fatigue, ventricular phonation, phonotrauma) are typical of vocal abuse, often caused by poor muscle functioning or poor voicing behaviors. A substantial literature supports several standard voice therapy protocols that address vocal misuse with tasks somewhat analogous to those involved in singing, specifically the use of nasal consonants and humming, sustained phonation, pitch glides, and rhythmic vocal play: the Smith Accent Method (Smith & Thyme, 1976), Vocal Function Exercises (Stemple, Lee, D’Amico, & Pickup, 1994), Lessac-Madsen Resonant Voice Therapy (Verdolini-Marston, Burke, Lessac, Glaze, & Caldwell, 1995), Semi-Occluded Vocal Tract (Titze, 2006), and Phonatory Resistance Training Exercises (Ziegler & Hapner, 2013). Perceptual outcomes of Resonant Voice Therapy, for example, have included improvements in speech-level fundamental frequency and range of speaking intensity, as well as reductions in vocal roughness, strain, monotone, hard glottal attack, vocal fry, and overall vocal fatigue (Chen, Hsiao, Hsiao, Chung, & Chiang, 2007; Roy et al., 2003; Yiu & Ho, 2002; Verdolini-Marston et al., 1995). Alleviation of supraglottic activity (false vocal fold and anterior-posterior compression) has also been reported (Ogawa et al., 2013).
Active singing as a treatment option in voice therapy is in fact not a new concept. Boone, McFarlane, Von Berg, and Zraick (2010) explain a technique called Redirected Phonation, which is prescribed to patients having difficulty “finding” their voice due to functional dysphonia. In this technique, the speech-language pathologist “searches with the patient to find some kind of vegetative phonation (coughing, gargling, laughing, throat clearing) or some kind of intentional voicing (‘playing’ the comb or kazoo, humming, singing, trilling [Colton & Casper, 1996], or saying ‘um-hmm’ [Cooper, 1973])” (Boone et al., 2010, p. 230). Relative to singing, the protocol is focused on singing practice sentences (similar to the practice of chant-talking) with the goal of phasing out the singing in favor of the newly redirected voicing for speech, shaped by the improved respiration, phonation, and overall voice quality present in the singing condition. Outcomes have included increased ease and clarity of voice production, with less perturbation.
A large body of clinical evidence also points to the benefit of using singing tasks as a means of respiratory therapy. Within the domain of chronic obstructive pulmonary disease (COPD), clusters of research have shown that singing voice therapy tasks (VIT, TS, OMREX) have resulted in improvements in single breath counting, breath support modes (clavicular vs. diaphragmatic), maximum intensity ratings, lung function tests (maximum expiratory pressure, forced expiratory volume, and forced vital capacity), as well as self-reported improvement in dyspnea ratings (Canga, Azoulay, Raskin, & Loewy, 2015; Engen, 2003; Jamaly et al., 2017; Lord et al., 2012; Skingley et al., 2014). In a study completed by Eley and Gorman (2010), thirty-three asthmatic participants were treated with either OMREX (males playing a didgeridoo) or VIT and TS (females taking singing lessons). Results for the males indicated significant improvements in lung function tests (peak expiratory flow, forced expiratory volume, and forced vital capacity). Results for the females revealed promising but statistically non-significant gains in peak expiratory flow. There is also some preliminary evidence of the clinical benefit of a singing program in the cystic fibrosis population, with one study pointing to improvement in lung function scores (maximum inspiratory pressure, maximum expiratory pressure) of eight hospitalized children post-treatment (Irons, Kenny, McElrea, & Chang, 2012). Finally, Tamplin et al. (2013) completed a randomized controlled trial comparing the effectiveness of singing lessons (OMREX and TS) versus music appreciation and relaxation classes for twenty-four participants with quadriplegia. Results indicated significant improvements in speech intensity as well as maximum phonation time for the singing group alone. The breadth of this research reflects an exciting trend toward the use of singing tasks as a viable and contemporary tool in voice therapy practice.
Dyslexia
Rhythm-based intervention has been applied to developmental dyslexia, a prevalent reading disorder that occurs despite normal cognitive abilities and IQ. For example, Thomson and colleagues (Thomson, Leong, & Goswami, 2012) devised a novel six-week rhythm training program for eleven dyslexic children. The rhythm intervention program consisted of three different training regimens aimed at improving auditory temporal processing in a fun and engaging manner. They compared the efficacy of the rhythm intervention to a conventional phonetic training program in which eleven other dyslexic children participated. Both intervention programs yielded a comparable amount of improvement in phonological awareness relative to a third control group of eleven dyslexic children. Similarly, Bhide and colleagues (Bhide, Power, & Goswami, 2013) compared a rhythm intervention program consisting of nine different rhythm training sections (e.g., same/different rhythm discrimination, rise time discrimination, etc.) to a conventional intervention program that required children to match sound to spelling. Their findings indicated that the rhythm-based intervention was as effective as the conventional intervention method. Bonacina and colleagues (Bonacina, Cancer, Lanzi, Lorusso, & Antonietti, 2015) conducted an intervention study with fourteen dyslexic children who underwent computerized rhythmic-reading training (RRT) every other week (a total of nine sessions). Compared to a control group (no training), children who received the RRT improved in reading ability, as evidenced by faster reading and increased accuracy. Flaugnacco et al. (2015) performed a randomized control trial wherein dyslexic children participated in either a music training program or a painting training program (in tandem with conventional daily treatment) for a period of seven months. The music training program was based on Kodaly and Orff pedagogy, with significant focus given to the rhythmic and temporal aspects of the music. They found that the music group outperformed the control (i.e., painting) group in phonological awareness and reading skills. More recently, Habib et al. (2016) conducted a music-based intervention program with dyslexic and normal school-age children for three days (six hours per day).
After the intervention, they found a significant improvement in phonological and syllabic encoding abilities in the dyslexic children. Most notably, performance after the intervention was comparable to that of normal children. In summary, both short- and long-term music-based intervention programs appear to be effective ways of treating dyslexia.
Conclusion
There are many parallels in both the structure and production of speech, language, and music. All can be considered sensorimotor behaviors that require a high level of control and dynamic interplay between several brain processes in order to select, organize, articulate, and implement output in a time-sensitive manner. Because of the inherent timing, rhythm, pattern, and melodic structures in both music and speech, music has the potential to simulate normal speech patterns, and therefore to act as a training and retraining tool for people with speech and language disorders.
References
Abdi, S., Khalessi, M. H., Khorsandi, M., & Gholami, B. (2001). Introducing music as a means of habilitation for children with cochlear implants. International Journal of Pediatric Otorhinolaryngology 59(2), 105–113.
Aitken Dunham, D. J. (2010). Efficacy of using music therapy combined with traditional aphasia and apraxia of speech treatments (Master’s dissertation). Western Carolina University, North Carolina.
Akanuma, K., Meguro, K., Satoh, M., Tashiro, M., & Itoh, M. (2016). Singing can improve speech function in aphasics associated with intact right basal ganglia and preserve right temporal glucose metabolism: Implications for singing therapy indication. International Journal of Neuroscience 126(1), 39–45.
Alm, P. A. (2004). Stuttering and the basal ganglia circuits: A critical review of possible relations. Journal of Communication Disorders 37(4), 325–69.
American Psychiatric Association (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: APA.
Austin, S. F. (1997). Movement of the velum during speech and singing in classically trained singers. Journal of Voice—Official Journal of the Voice Foundation 11(2), 212–221.
Baker, F., Wigram, T., & Gold, C. (2005). The effects of song-singing programme on the affective speaking intonation of people with traumatic brain injury. Brain Injury 19(7), 519–28.
Ballard, K. J., Wambaugh, J. L., Duffy, J. R., Layfield, C., Maas, E., Mauszycki, S., & McNeil, M. R. (2015). Treatment for acquired apraxia of speech: A systematic review of intervention research between 2004 and 2012. American Journal of Speech Language Pathology 24(2), 316–337.
Bedoin, N., Brisseau, L., Molinier, P., Roch, D., & Tillmann, B. (2016). Temporally regular musical primes facilitate subsequent syntax processing in children with specific language impairment. Frontiers in Neuroscience 10. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4913515/
Belin, P., Van Eeckhout, P., Zilbovicius, M., Remy, P., François, C., Guillaume, S., Chain, F., … Samson, Y. (1996). Recovery from nonfluent aphasia after melodic intonation therapy: A PET study. Neurology 47(6), 1504–1511.
Bellaire, K., Yorkston, K. M., & Beukelman, D. R. (1986). Modification of breath patterning to increase naturalness of a mildly dysarthric speaker. Journal of Communication Disorders 19(4), 271–280.
Bhide, A., Power, A., & Goswami, U. (2013). A rhythmic musical intervention for poor readers: A comparison of efficacy with a letter-based intervention. Mind, Brain, and Education 7(2), 113–123.
Birch, P., Gümoes, B., Stavad, H., Prytz, S., Björkner, E., & Sundberg, J. (2002). Velum behavior in professional classic operatic singing. Journal of Voice 16(1), 61–71.
Bloch, C. S., Hirano, M., & Gould, W. J. (1985). Symptom improvement of spastic dysphonia in response to phonatory tasks. Annals of Otology, Rhinology, & Laryngology 94, 51–4.
Bonacina, S., Cancer, A., Lanzi, P. L., Lorusso, M. L., & Antonietti, A. (2015). Improving reading skills in students with dyslexia: The efficacy of a sublexical training with rhythmic background. Educational Psychology 6(1510), 1–8.
Bonakdarpour, B., Eftekharzadeh, A., & Ashayeri, H. (2003). Melodic intonation therapy in Persian aphasic patients. Aphasiology 17(1), 75–95.
Boone, D. R. (1983). The voice and voice therapy (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
Boone, D. R., McFarlane, S. C., Von Berg, A. L., & Zraick, R. I. (2010). The voice and voice therapy (8th ed.). Boston, MA: Pearson.
Boucher, V., Garcia, L. J., Fleurant, J., & Paradis, J. (2001). Variable efficacy of rhythm and tone in melody based interventions: Implications for the assumption of a right-hemisphere facilitation in non-fluent aphasia. Aphasiology 15(2), 131–149.
Breier, J. I., Randle, S., Maher, L. M., & Papanicolaou, A. C. (2010). Changes in maps of language activity activation following melodic intonation therapy using magnetoencephalography: Two case studies. Journal of Clinical and Experimental Neuropsychology 32(3), 309–314.
Brendal, B., & Ziegler, W. (2008). Effectiveness of metrical pacing in the treatment of apraxia of speech. Aphasiology 22(1), 1–26.
Busto-Crespo, O., Uzcanga-Lacabe, M., Abad-Marco, A., Berasategui, I., García, L., Maraví, E., … Fernández-González, S. (2016). Longitudinal voice outcomes after voice therapy in unilateral vocal fold paralysis. Journal of Voice 30(6), 767.e9–767.e15.
Caligiuri, M. P. (1989). The influence of speaking rate on articulatory hypokinesia in Parkinsonian dysarthria. Brain & Language 36(3), 1493–1502.
Canavan, M., Evans, C., Foy, C., Langford, R., & Proctor, R. (2012). Can group singing provide effective speech therapy for people with Parkinson’s disease? Arts and Health 4(1), 83–95.
Canga, B., Azoulay, R., Raskin, J., & Loewy, J. (2015). AIR: Advances in respiration—music therapy in the treatment of chronic pulmonary disease. Respiratory Medicine 109(12), 1532–1539.
Casper, J. (2000). Confidential voice. In J. C. Stemple (Ed.), Voice therapy: Clinical studies (2nd ed., pp. 128–139). San Diego, CA: Singular Publishing Group.
Chen, J. K.-C., Chuang, A. Y. C., McMahon, C., Hsieh, J. C., Tung, T. H., & Li, L. P. (2010). Music training improves pitch perception in prelingually deafened children with cochlear implants. Pediatrics 125(4), e793–e800.
Chen, S. H., Hsiao, T. Y., Hsiao, L. C., Chung, Y. M., & Chiang, S. C. (2007). Outcome of resonant voice therapy for female teachers with voice disorders: Perceptual, physiological, acoustic, aerodynamic, and functional measurements. Journal of Voice 21(4), 415–425.
Cohen, N. S. (1988). The use of superimposed rhythm to decrease the rate of speech in a brain-damaged adolescent. Journal of Music Therapy 25(2), 85–93.
Cohen, N. S. (1992). The effect of singing instruction on the speech production of neurologically impaired persons. Journal of Music Therapy 29(2), 87–102.
Cohen, N. S. (1994). Speech and song: Implications for therapy. Music Therapy Perspectives 12(1), 8–14.
Cohen, N. S., & Masse, R. (1993). The application of singing and rhythmic instruction as a therapeutic intervention for persons with neurogenic communication disorders. Journal of Music Therapy 30(2), 81–99.
Colcord, R. D., & Adams, M. R. (1979). Voicing duration and vocal SPL changes associated with stuttering reduction during singing. Journal of Speech and Hearing Research 22(3), 468–479.
Colton, R. H., & Casper, J. K. (1996). Understanding voice problems: A physiological perspective for diagnosis and treatment (3rd ed.). Baltimore, MD: Lippincott.
Cooper, M. (1973). Modern techniques of vocal rehabilitation. Springfield, IL: Charles C. Thomas.
Cortese, M. D., Riganello, F., Arcuri, F., Pignataro, L. M., & Buglione, I. (2015). Rehabilitation of aphasia: Application of melodic-rhythmic therapy to Italian language. Frontiers in Human Neuroscience 9, 1–8. Retrieved from https://doi.org/10.3389/fnhum.2015.00520
Cunnington, R., Bradshaw, J. L., & Iansek, R. (1996). The role of the supplementary motor area in the control of voluntary movement. Human Movement Science 15(5), 627–647.
Darley, F. L., Aronson, A. E., & Brown, J. R. (1969). Differential diagnostic patterns of dysarthria. Journal of Speech and Hearing Research 12(2), 246–269.
Darwin, C. (1872/1988). The expression of the emotions in man and animals. Ed. P. Ekman. Oxford: Oxford University Press.
Desai, V., & Mishra, P. (2012). Voice therapy outcome in puberphonia. Journal of Laryngology and Voice 2(1), 26–29.
DeStewart, B. J., Willemse, S. C., Maassen, B. A. M., & Horstink, M. W. I. M. (2003). Improvement of voicing in patients with Parkinson’s disease by speech therapy. Neurology 60(3), 498–500.
Di Benedetto, P., Cavazzon, M., Mondolo, F., Rugiu, G., Peratoner, A., & Biasutti, E. (2009). Voice and choral singing treatment: A new approach for speech and voice disorders in Parkinson’s disease. European Journal of Physical Rehabilitation Medicine 45(1), 13–19.
Duffy, J. R. (2005). Motor speech disorders: Substrates, differential diagnosis, and management. St. Louis, MO: Elsevier Mosby.
Dworkin, J. P., Abkarian, G. G., & Johns, D. F. (1988). Apraxia of speech: The effectiveness of a treatment regimen. Journal of Speech and Hearing Disorders 53(3), 280–294.
Elefant, C., Baker, F. A., Lotan, M., Lagesen, S. K., & Skeie, G. O. (2012). The effect of group music therapy on mood, speech, and singing in individuals with Parkinson’s disease: A feasibility study. Journal of Music Therapy 49(3), 278–302.
Eley, R., & Gorman, D. (2010). Didgeridoo playing and singing to support asthma management in aboriginal Australians. Journal of Rural Health 26(1), 100–104.
Engen, R. L. (2003). The singer’s breath: Implications for treatment of persons with emphysema (Dissertation). University of Iowa.
Farrugia, N., Benoit, C. E., Schwartze, M., Pell, M., Obrig, H., Dalla Bella, S., & Kotz, S. (2014). Auditory cueing in Parkinson’s disease: Effects on temporal processing and spontaneous theta oscillations. Procedia—Social and Behavioral Sciences 126, 104–105. Special Issue for International Conference on Timing and Time Perception, March 31–April 3, Corfu, Greece.
Fernald, A. (1989). Intonation and communicative intent in mothers’ speech to infants. Is the melody the message? Child Development 60(6), 1497–1510.
Flaugnacco, E., Lopez, L., Terribili, C., Montico, M., Zoia, S., & Schön, D. (2015). Music training increases phonological awareness and reading skills in developmental dyslexia: A randomized control trial. PloS ONE 10(9), e0138715.
Fowler, L. P., & Morris, R. J. (2007). Comparison of fundamental frequency nasalance between trained singers and nonsingers for sung vowels. Annals of Otology, Rhinology, & Laryngology 116(10), 739–746.
Friederici, A. D., Kotz, S. A., Werheid, K., Hein, G., & Von Cramon, D. (2003). Syntactic comprehension in Parkinson’s disease: Investigating early automatic and late integrational processes using event-related brain potentials. Neuropsychology 17(1), 133–142.
Fu, Q.-J., Galvin, J. J., Wang, X., & Wu, J. L. (2015). Benefits of music training in Mandarin-speaking pediatric cochlear implant users. Journal of Speech, Language, and Hearing Research 58(1), 163–169.
Fujii, S., & Wan, C. Y. (2014). The role of rhythm in speech and language rehabilitation: The SEP hypothesis. Frontiers in Human Neuroscience 8, 1–15. doi:10.3389/fnhum.2014.00777
Gfeller, K. (2016). Music-based training for pediatric CI recipients: A systematic analysis of published studies. European Annals of Otorhinolaryngology, Head and Neck Diseases, 12th European Symposium on Pediatric Cochlear Implant (ESPCI 2015) 133(Suppl. 1), S50–S56.
Gfeller, K., Driscoll, V., Smith, R. S., & Scheperle, C. (2012). The music experiences and attitudes of a first cohort of prelingually-deaf adolescents and young adults CI recipients. Seminars in Hearing 33(4), 346–360.
Glover, H., Kalinowski, J., Rastatter, M., & Stuart, A. (1996). Effect of instruction to sing on stuttering frequency at normal and fast rates. Perceptual and Motor Skills 83(2), 511–522.
Gordon, R. L., Shivers, C. M., Wieland, E. A., Kotz, S. A., Yoder, P. J., & McAuley, J. D. (2015). Musical rhythm discrimination explains individual differences in grammar skills in children. Developmental Science 18(4), 635–644.
Grahn, J. A., & Brett, M. (2009). Impairment of beat-based rhythm discrimination in Parkinson’s disease. Cortex 45(1), 54–61.
Guenther, F. H. (2006). Cortical interactions underlying the production of speech sounds. Journal of Communication Disorders 39(5), 350–365.
Guenther, F. H., & Vladusich, T. (2012). A neural theory of speech acquisition and production. Journal of Neurolinguistics 25(5), 408–422.
Habib, M., Lardy, C., Desiles, T., Commeiras, C., Chobert, J., & Besson, M. (2016). Music and dyslexia: A new musical training method to improve reading and related disorders. Frontiers in Psychology 7. Retrieved from http://journal.frontiersin.org/article/10.3389/fpsyg.2016.00026/abstract
Haneishi, E. (2001). Effects of a music therapy voice protocol on speech intelligibility, vocal acoustic measures, and mood of individuals with Parkinson’s disease. Journal of Music Therapy 38(4), 273–290.
Haro-Martínez, A. M., García-Concejero, V. E., López-Ramos, A., Maté-Arribas, E., López-Táppero, J., Lubrini, G., … & Fuentes, B. (2017). Adaptation of melodic intonation therapy to Spanish: A feasibility pilot study. Aphasiology 31(11), 1333–1343.
Healey, E. C., Mallard, A. R., & Adams, M. R. (1976). Factors contributing to the reduction of stuttering during singing. Journal of Speech and Hearing Research 19, 475–480.
Helfrich-Miller, K. R. (1984). Melodic intonation therapy with developmentally apraxic children. Seminars in Speech and Language 5, 119–126.
Hilton, M. P., Savage, J., Hunter, B., McDonald, S., Repanos, C., & Powell, R. (2013). Singing exercises improve sleepiness and frequency of snoring among snorers: A randomised controlled trial. International Journal of Otolaryngology and Head & Neck Surgery 2(3), 97–102.
Irons, J. Y., Kenny, D. T., McElrea, M., & Chang, A. B. (2012). Singing therapy for young people with cystic fibrosis: A randomized controlled pilot study. Music and Medicine 4(3), 136–145.
Jamaly, S., Leidag, M., Schneider, H. W., Domanksi, U., Rasche, K., Schröder, M., & Nilius, G. (2017). The effect of singing therapy compared to standard physiotherapeutic lung sport in COPD. Pneumologie 71(S01), S1–S125.
Jennings, J. J., & Kuehn, D. P. (2008). The effects of frequency range, vowel, dynamic loudness level, and gender on nasalance in amateur and classically trained singers. Journal of Voice 22(1), 75–89.
Jokel, R., De Nil, L. F., & Sharpe, A. K. (2007). Speech disfluencies in adults with neurogenic stuttering associated with stroke and traumatic brain injury. Journal of Medical Speech-Language Pathology 15(3), 243–261.
Keith, R., & Aronson, A. (1975). Singing as therapy for apraxia of speech and aphasia: Report of a case. Brain & Language 2, 483–488.
Kim, M., & Tomaino, C. (2008). Protocol evaluation for effective music therapy for persons with nonfluent aphasia. Topics in Stroke Rehabilitation 15(6), 555–569.
Kim, S. J., & Jo, U. (2013). Study of accent-based music speech protocol development for improving voice problems in stroke patients with mixed dysarthria. Neurorehabilitation 32(1), 185–190.
Kostyk, B. E., & Putnam Rochet, A. (1998). Laryngeal airway resistance in teachers with vocal fatigue: A preliminary study. Journal of Voice 12(3), 287–299.
Kotz, S. A., Frisch, S., Von Cramon, D. Y., & Friederici, A. D. (2003). Syntactic language processing: ERP lesion data on the role of the basal ganglia. Journal of the International Neuropsychological Society 9(7), 1053–1060.
Kotz, S. A., & Gunter, T. C. (2015). Can rhythmic auditory cuing remediate language-related deficits in Parkinson’s disease? Annals of the New York Academy of Sciences 1337, 62–68.
Kotz, S. A., & Schmidt-Kassow, M. (2015). Basal ganglia contribution to rule expectancy and temporal predictability in speech. Cortex 68, 48–60.
Kotz, S. A., & Schwartze, M. (2010). Cortical speech processing unplugged: A timely subcortico-cortical framework. Trends in Cognitive Sciences 14(9), 392–399.
Kotz, S. A., & Schwartze, M. (2016). Motor timing and sequencing in speech production: A general-purpose framework. In G. Hickok & S. L. Small (Eds.), Neurobiology of Language (pp. 717–724). New York: Academic Press.
Kotz, S. A., Schwartze, M., & Schmidt-Kassow, M. (2009). Non-motor basal ganglia functions: A review and proposal for a model of sensory predictability in auditory language perception. Cortex 45(8), 982–990.
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills. Nature Reviews Neuroscience 11(8), 599–605. Krauss, T., & Galloway, H. (1982). Melodic intonation therapy with language delayed apraxic children. Journal of Music Therapy 19(2), 102–113. Large, E. W., Herrera, J. A., & Velasco, M. J. (2015). Neural networks for beat perception in musical rhythm. Frontiers in Systems Neuroscience 9. Retrieved from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4658578/ Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review 106(1), 119–159. Lebrun, Y. (1998). Clinical observations and experimental research on the study of stuttering. Journal of Fluency Disorders 23(2), 119–122. Lim, K. B., Kim, Y. K., Lee, H. J., Yoo, J., Hwang, J. Y., Kim, J. A., & Kim, S. K. (2013). The therapeutic effect of neurologic music therapy and speech language therapy in post-stroke aphasic patients. Annals of Rehabilitation Medicine 37(4), 556–562. Limb, C. J., & Roy, A. T. (2014). Technological, biological, and acoustical constraints to music perception in cochlear implant users. Hearing Research 308(Suppl. C), 13–26. Limb, C. J., & Rubinstein, J. T. (2012). Current research on music perception in cochlear implant users. Otolaryngologic Clinics of North America, Cochlear Implants: Adult and Pediatric 45(1), 129–140. Lord, V. M., Hume, V. J., Kelly, J. L., Cave, P., Silver, J., Waldman, M., & Hopkinson, N. S. (2012). Singing classes for chronic obstructive pulmonary disease: A randomized controlled trial. BMC Pulmonary Medicine 12(1), 1–7.
Lortie, C. L., Rivard, J., Thibeault, M., & Tremblay, P. (2017). The moderating effect of frequent singing on voice aging. Journal of Voice 31(1), 112.e1–112.e12. Mainka, S., & Mallien, G. (2014). Rhythmic speech cueing (RSC). In M. H. Thaut & V. Hoemberg (Eds.), Handbook of neurologic music therapy (pp. 150–160). Oxford: Oxford University Press. Martikainen, A., & Korpilahti, P. (2011). Intervention for childhood apraxia of speech: A single-case study. Child Language Teaching and Therapy 27(1), 9–20. Mauszycki, S. C., & Wambaugh, J. L. (2008). The effects of rate control treatment on consonant production accuracy in mild apraxia of speech. Aphasiology 22(7–8), 906–920. Naeser, M. A., & Helm-Estabrooks, N. A. (1985). CT scan lesion localization and response to melodic intonation therapy with nonfluent aphasia cases. Cortex 21(2), 203–223. Natke, U., Donath, T. M., & Kalveram, K. T. (2003). Control of voice fundamental frequency in speaking versus singing. Journal of the Acoustical Society of America 113(3), 1587–1593. Ogawa, M., Hosokawa, K., Yoshida, M., Yoshii, T., Shiromoto, O., & Inohara, H. (2013). Immediate effectiveness of humming on the supraglottic compression in subjects with muscle tension dysphonia. Folia Phoniatrica et Logopaedica 65(3), 123–128. Onofre, F., Ricz, H. M. A., Takeshita-Monaretti, T., Prado, M. Y. D. A., & Aguiar-Ricz, L. (2013). Effect of singing training on total laryngectomees wearing a tracheoesophageal voice prosthesis. Acta cirúrgica brasileira/Sociedade Brasileira para Desenvolvimento Pesquisa em Cirurgia 28, 119–125. Petersen, B., Weed, E., Sandmann, P., Brattico, E., Hansen, M., Sørensen, S. D., & Vuust, P. (2015). Brain responses to musical feature changes in adolescent cochlear implant users. Frontiers in Human Neuroscience 9. Retrieved from https://www.frontiersin.org/articles/10.3389/fnhum.2015.00007/full Popovici, M. (1995). Melodic intonation therapy in the verbal decoding of aphasics. 
Revue Roumaine de Neurologie et Psychiatrie 33, 57–97. Przybylski, L., Bedoin, N., Krifi-Papoz, S., Herbillon, V., Roch, D., Léculier, L., … Tillmann, B. (2013). Rhythmic auditory stimulation influences syntactic processing in children with developmental language disorders. Neuropsychology 27(1), 121–131. Rocca, C. (2012). A different musical perspective: Improving outcomes in music through habilitation, education, and training for children with cochlear implants. Seminars in Hearing 33(4), 425–433. Rochette, F., Moussard, A., & Bigand, E. (2014). Music lessons improve auditory perceptual and cognitive performance in deaf children. Frontiers in Human Neuroscience 8, 488. Retrieved from https://doi.org/10.3389/fnhum.2014.00488 Rosenberger, P. B. (1980). Dopaminergic systems and speech fluency. Journal of Fluency Disorders 5, 255–267. Rowe, D. C., Van den Oord, E. J., Stever, C., Giedinhagen, L. N., Gard, J. M., Cleveland, H. H., … Waldman, I. D. (1999). The DRD2 TaqI polymorphism and symptoms of attention deficit hyperactivity disorder. Molecular Psychiatry 4(6), 580–586. Roy, N., Weinrich, B., Grey, S. D., Tanner, K., Stemple, J. C., & Sapienza, C. M. (2003). Three treatments for teachers with voice disorders: A randomised clinical trial. Journal of Speech, Language, and Hearing Research 46, 670–688. Saliba, J., Bortfeld, H., Levitin, D. J., & Oghalai, J. S. (2016). Functional near-infrared spectroscopy for neuroimaging in cochlear implant recipients. Hearing Research 338(Suppl. C), 64–75. Santoni, C., de Boer, G., Thaut, M., & Bressmann, T. (2018). Influence of altered auditory feedback on oral-nasal balance in song. Journal of Voice. Manuscript in print. Sauder, C., Roy, N., Tanner, K., Houtz, D. R., & Smith, M. E. (2010). Vocal function exercises for presbylaryngis: A multidimensional assessment of treatment outcomes. Annals of Otology,
Rhinology, & Laryngology 119(7), 460–467. Schlaug, G., Marchina, S., & Norton, A. (2008). From singing to speaking: Why singing may lead to recovery of expressive language function in patients with Broca’s aphasia. Music Perception 25(4), 315–323. Schlaug, G., Marchina, S., & Norton, A. (2009). Evidence for plasticity in white matter tracts of chronic aphasic patients undergoing intense intonation-based speech therapy. New York Academy of Sciences 1169, 385–94. Seger, C. A., Spiering, B. J., Sares, A. G., Quraini, S. I., Alpeter, C., David, J., & Thaut, M. H. (2013). Corticostriatal contributions to musical expectancy perception. Journal of Cognitive Neuroscience 25(7), 1062–77. Skingley, A., Page, S., Clift, S., Morrison, I., Coulton, S., Treadwell, P., … Shipton, M. (2014). Singing for breathing: Participants’ perceptions of a group singing programme for people with COPD. Arts & Health 6(1), 59–74. Skodda, S., & Schlegel, U. (2008). Speech rate and rhythm in Parkinson’s disease. Movement Disorders: Official Journal of the Movement Disorder Society 23(7), 985–992. Smith, S., & Thyme, K. (1976). Statistic research on changes in speech due to pedagogic treatment (the accent method). Folia Phoniatrica 28, 98–103. Sparks, R. W., Helm, N., & Albert, M. (1974). Aphasia rehabilitation resulting from melodic intonation therapy. Cortex 10(4), 303–316. Sparks, R. W., & Holland, A. L. (1976). Method: Melodic intonation therapy for aphasia. Journal of Speech and Hearing Disorders 41, 298–300 Stahl, B., Kotz, S. A., Henseler, I., Turner, R., & Geyer, F. (2011). Rhythm in disguise: Why singing may not hold the key to recovery from aphasia. Brain 134(10), 3083–3093. Stegemöller, E. L., Radig, H., Hibbing, P., Wingate, J., & Sapienza, C. (2017). Effects of singing on voice, respiratory control and quality of life in persons with Parkinson’s disease. Disability and Rehabilitation 39(6), 594–600. Stemple, J. C., Glaze, L. E., & Klaben, B. G. (2009). 
Clinical voice pathology: Theory and management (4th ed.). San Diego, CA: Singular Publishing Group. Stemple, J. C., Lee, L., D’Amico, B., & Pickup, B. (1994). Efficacy of vocal function exercises as a method of improving voice production. Journal of Voice 8(3), 271–278. Stephan, K. M., Thaut, M. H., Wunderlich, G., Schicks, W., Tian, B., Tellmann, L., … Hömberg, V. (2002). Conscious and subconscious sensorimotor synchronization: Cortex and the influence of awareness. NeuroImage 15(2), 345–352. Sundberg, J., Birch, P., Gümoes, B., Stavad, H., Prytz, S., & Karle, A. (2007). Experimental findings on the nasal tract resonator in singing. Journal of Voice 21(2), 127–137. Tamplin, J. (2008). A pilot study into the effect of vocal exercises and singing on dysarthric speech. Neurorehabilitation 23(3), 207–216. Tamplin, J., Baker, F. A., Grocke, D., Brazzale, D. J., Pretto, J. J., Ruehland, W. R., … Berlowitz, D. J. (2013). Effect of singing on respiratory function, voice, and mood after quadriplegia: a randomized controlled trial. Archives of Physical Medicine and Rehabilitation 94(3), 426–434. Tanner, K., Roy, N., Merrill, R. M., & Power, D. (2005). Velopharyngeal port status during classical singing. Journal of Speech, Language, and Hearing Research 48(6), 1311–1324. Tanner, M. A. (2012). Voice improvement in Parkinson’s disease: Vocal pedagogy and voice therapy combined (Doctoral dissertation). University of Alberta. Tanner, M. A., Rammage, L., & Liu, L. (2016). Does singing and vocal strengthening improve vocal ability in people with Parkinson’s disease? Arts & Health 8(3), 199–212. Tautscher-Basnett, A., Tomantschger, V., Keglevic, S., & Freimuller, M. (2006). Group therapy for individuals with Parkinson disease (PD) focusing on voice strengthening. Presentation to the 4th
World Congress for Neurorehabilitation. Thaut, M. H., Thaut, C. P., & McIntosh, K. (2014). Melodic intonation therapy (MIT). In M. H. Thaut & V. Hoemberg (Eds.), Handbook of neurologic music therapy (pp. 140–145). Oxford: Oxford University Press. Thaut, M. H., & Hoemberg, V. (Eds.). (2014). Handbook of neurologic music therapy. Oxford: Oxford University Press. The American Speech-Language-Hearing Association (2017). Spasmodic dysphonia (Website). https://www.asha.org/public/speech/disorders/Spasmodic-Dysphonia Thomson, J. M., Leong, V., & Goswami, U. (2012). Auditory processing interventions and developmental dyslexia: A comparison of phonemic and rhythmic approaches. Reading and Writing 26(2), 139–161. Titze, I. R. (2006). Voice training and therapy with a semi-occluded vocal tract: Rationale and scientific underpinnings. Journal of Speech, Language, and Hearing Research 49(2), 448–459. Tonkinson, S. (1994). The Lombard effect in choral singing. Journal of Voice 8(1), 24–29. Tourville, J. A., & Guenther, F. H. (2011). The DIVA model: A neural theory of speech acquisition and production. Language and Cognitive Processes 26(7), 952–981. Trainor, L. J., Clark, E. D., Huntley, A., & Adams, B. (1997). The acoustic basis of infant preferences for infant-directed singing. Infant Behavior and Development 20(3), 383–396. Van der Meulen, I., Van de Sandt-Koenderman, M. E., & Ribbers, G. M. (2012). Melodic intonation therapy: present controversies and future opportunities. Archives of Physical Medicine and Rehabilitation 93(1), S46–52. Van Riper, C. (1982). The nature of stuttering (2nd ed.). Englewood Cliffs, NJ: Prentice Hall. Verdolini, K., Druker D. G., Palmer, P. M., & Samawi, H. (1998). Laryngeal adduction in resonant voice. Journal of Voice 12(3), 315–327. Verdolini-Marston, K., Burke, M. K., Lessac, A., Glaze, L., & Caldwell, E. (1995). Preliminary study of two methods of treatment for laryngeal nodules. Journal of Voice 9(1), 74–85. Victor, M., & Ropper, A. H. 
(2001). Adams and Victor’s principles of neurology (7th ed.). New York: McGraw-Hill. Wambaugh, J. L., & Martinez, A. L. (2000). Effects of rate and rhythm control treatment on consonant production accuracy in apraxia of speech. Aphasiology 14(8), 851–871. Wambaugh, J. L., Nessler, C., Cameron, R., & Mauszycki, S. C. (2012). Acquired apraxia of speech: The effects of repeated practice and rate/rhythm control treatments on sound production accuracy. American Journal of Speech-Language Pathology 21(2), S5–S27. Wan, C. Y., Zheng, X., Marchina, S., Norton, A., & Schlaug, G. (2014). Intensive therapy induces contralateral white matter changes in chronic stroke patients with Broca’s aphasia. Brain & Language 136(Suppl. C), 1–7. Wiener, M., Lohoff, F. W., & Coslett, H. B. (2011). Double dissociation of dopamine genes and timing in humans. Journal of Cognitive Neuroscience 23(10), 2811–2821. Wiener, M., Lee, Y.-S., Lohoff, F. W., & Coslett, H. B. (2014). Individual differences in the morphometry and activation of time perception networks are influenced by dopamine genotype. NeuroImage 89, 10–22. Wong, P. C. M., Ettlinger, M., & Zheng, J. (2013). Linguistic grammar learning and DRD2-TAQ-IA polymorphism. PLoS ONE 8(5), e64983. Wu, J. C., Maguire, G., Riley, G., Fallon, J., LaCasse, L., Chin, S., … Lottenberg, S. (1995). A positron emission tomography [18F] deoxyglucose study of developmental stuttering. Neuroreport 6(3), 501–505. Yanagisawa, E., Estill, J., Mambrino, L., & Talkin, D. (1991). Supraglottic contributions to pitch raising: Videoendoscopic study with spectroanalysis. Annals of Otology, Rhinology, &
Laryngology 100(1), 19–30. Yiu, E. M. L., & Ho, E. Y. Y. (2002). Short-term effect of humming on vocal quality. Asia Pacific Journal of Speech, Language and Hearing 7(3), 123–137. Yorkston, K. M., Beukelman, D. R., Strand, E. A., & Hakel, M. (2010). Management of motor speech disorders in children and adults. Austin, TX: Pro-Ed Inc. Ziegler, A., & Hapner, E. R. (2013). Phonation resistance training exercise (PhoRTE) therapy. In A. Behrman & J. Haskell (Eds.), Exercises for voice therapy (pp. 147–148). San Diego, CA: Plural Publishing. Ziegler, A., Verdolini Abbott, K., Johns, M., Klein, A., & Hapner, E. R. (2014). Preliminary data on two voice therapy interventions in the treatment of presbyphonia. The Laryngoscope 124(8), 1869–1876. Zumbansen, A., Peretz, I., & Hébert, S. (2014). Melodic intonation therapy: Back to basics for future research. Frontiers in Neurology 5, 1–11. Retrieved from https://doi.org/10.3389/fneur.2014.00007
CHAPTER 30
NEUROLOGIC MUSIC THERAPY TARGETING COGNITIVE AND AFFECTIVE FUNCTIONS
SHANTALA HEGDE
Cognition, emotion, and social cognition play a central role in functional recovery and determine overall quality of life in neurological conditions such as traumatic brain injury (TBI), stroke/cerebrovascular accident, dementia, and other degenerative conditions like Parkinson’s disease; in major psychiatric conditions such as schizophrenia and bipolar affective disorders; and in common psychiatric conditions such as anxiety and depression (Diamond, Felsenthal, Macciocchi, Butler, & Lally-Cassady, 1996; Elvevag & Goldberg, 2000; Lam, Kennedy, McIntyre, & Khullar, 2014; Sun, Tan, & Yu, 2014). Pharmacological treatment has shown limited effects in alleviating deficits in cognition, emotion, and social cognition in neurological and psychiatric conditions (Harvey, Green, Keefe, & Velligan, 2004; Marder, 2006; Müller, 2002, 2012; Rund & Borg, 1999). Over the last three decades, cognitive remediation has emerged as the best available
non-pharmacological treatment method to target cognitive deficits (Cicerone et al., 2000, 2011; Gordon et al., 2006; Keshavan, Vinogradov, Rumsey, Sherrill, & Wagner, 2014; Rohling, Faust, Beverly, & Demakis, 2009; Volpe & McDowell, 1990). The terms remediation, rehabilitation, and retraining have been used interchangeably in the scientific literature. Technically, the term rehabilitation encompasses not only cognitive remediation but also holistic and multimodal methods (Hegde, 2014). The holistic approach in rehabilitation addresses cognitive, emotional, and other non-cognitive domains of functioning.
Cognitive remediation (CR) provides patients with the cognitive and perceptual skills necessary to perform tasks or solve problems which are currently difficult, but which were within their capabilities before injury (Cicerone et al., 2000; Diller & Gordon, 1981; Prigatano, 1997; Sohlberg & Mateer, 2001). CR is described as an intervention that aims to improve cognitive processes (attention, memory, executive function, social cognition, or meta-cognition) with the goal of durability and generalization. The final goal is to improve the patient’s ability to adapt and to regain normal or near-normal functioning in daily living (Benedict et al., 1994; Spring & Ravdin, 1992). There are two approaches to CR:
1. Compensatory approach: In this approach the goal is not to target the specific cognitive function, but rather to alter the individual’s behavior or the environment to help compensate in the area of deficit. This approach is meant to help the individual cope by providing alternative techniques, such as reminders to compensate for memory deficits (Raskin, 2010). It is beneficial if the individual's basic cognitive functions are intact enough to make use of the aids or reminders.
2. Restorative approach: This focuses directly on the areas of cognitive deficit and aims to improve cognitive functions by using techniques such as drill-based exercises, either paper-and-pencil or computer-based tasks. The aim is to enable the individual to perform the task accurately, gain proficiency through repeated engagement with the chosen task, and retain the skill over a longer duration (Raskin, 2010).
Consistent repetition on tasks which are carefully designed and hierarchically arranged in levels of difficulty is crucial in CR. The assumption is that repetitive activation not only improves clinical and behavioral domains of functioning but also leads to cortical reorganization. Repeated practice facilitates neural recovery and restoration of functions mediated by the underlying neural circuits. It has long been acknowledged that the interaction between biological systems (the human brain in this case) and the environment is bidirectional, and that certain types of environmental experience can have a positive impact on biological processes such as cognitive functions. CR is based on the same principle, which in today’s neuroscience is termed neural plasticity (Raskin, 2010). Neural plasticity is considered the veritable essence of the brain. It is the capacity of the brain to change based on experiences in the environment. This means that the brain is a malleable organ that is constantly undergoing reorganization and change (Bruel-Jungerman, Davis, & Laroche, 2007). Studies on animals demonstrating changes in cortical organization and in the interaction between different brain systems following learning tasks provided the initial evidence for neural plasticity (Kleim, Barbay, & Nudo, 1998). Subsequent research on neural plasticity led researchers to explore methods to improve brain functions, which in turn led to the emergence of CR as a treatment method.
Initial studies were on patients with traumatic brain injury and stroke. Some studies reported that patients who had lost motor skills after brain injury improved in motor functioning through motor-based exercises and also showed improved function in the motor cortex. With studies emerging in this direction, researchers developed different methods and practice principles that could be used to alleviate specific cognitive deficits and improve overall brain functions in various disorders (Ben-Yishay, Piasetsky, & Rattok, 1985; Ben-Yishay & Prigatano, 1990; Cicerone, 2012; Faralli, Bigoni, Mauro, Rossi, & Carulli, 2013; Podd, 2012). Today, CR is considered a crucial treatment method for ameliorating cognitive deficits in a range of clinical conditions. Systematic research in this direction is ongoing. A careful examination of the literature on CR in various clinical conditions indicates variability in CR methods and tasks. There is a need for high-quality, evidence-based research studies. Meta-analyses on CR as an intervention, especially the restorative approaches in various neurological and psychiatric conditions, have shown small (Cohen’s d = 0.30) to large effect sizes (Cohen’s d = 0.71) in traumatic brain injury (TBI) (Cernich, Kurtz, Mordecai, & Ryan, 2010; Cicerone et al., 2011; Rohling et al., 2009). There are relatively fewer systematic studies on CR in degenerative conditions such as Alzheimer’s or Parkinson’s disease (reference) compared to TBI. The efficacy of CR in degenerative conditions is still unclear and there are no meta-analytic studies. In major psychiatric conditions such as schizophrenia, CR methods have shown small to moderate effect sizes (Wykes, Huddy, Cellard, McGurk, & Czobor, 2011), and studies from developing countries such as India number only a handful (Hegde, 2017). CR appears to show larger effects when provided along with other forms of rehabilitation such as occupational training or social skills training. Establishing wider generalizability of the cognitive gains achieved through CR, and their maintenance over longer durations, has been the biggest challenge to overcome (Cappa et al., 2005; Chung, Pollock, Campbell, Durward, & Hagen, 2013; Schutz & Trainor, 2007; Wykes et al., 2011). Newer effective methods of CR are still very much needed in the field of neuropsychological rehabilitation.
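The effect sizes quoted above are standardized mean differences. As a rough illustration only, the sketch below computes Cohen's d with a pooled standard deviation for two small groups. The function name `cohens_d` and the scores are hypothetical and are not taken from any of the cited meta-analyses, which may have used other estimators or corrections.

```python
import math
from statistics import mean, variance


def cohens_d(treated, control):
    """Standardized mean difference using the pooled sample standard deviation.

    Hypothetical helper for illustration; both groups need at least two scores.
    """
    n1, n2 = len(treated), len(control)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled_var)


# Made-up post-intervention scores on some cognitive test (assumption, not real data).
cr_group = [12, 14, 16]
comparison_group = [10, 12, 14]

d = cohens_d(cr_group, comparison_group)
print(f"Cohen's d = {d:.2f}")
```

With these invented scores the group means differ by 2 points and the pooled standard deviation is 2, so d comes out at 1.0. A value such as d = 0.30 would mean the groups differ by about a third of a pooled standard deviation.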
Music and Brain Plasticity
The underlying principle of CR is “neural plasticity”: the ability of neural circuits to change their structure, function, and connectivity in response to experience, which allows the brain to reorganize itself and relearn functions that were available prior to the acquired injury. Neural plasticity is known to underlie functional recovery in a broad array of neurological and psychiatric conditions. Cognitive deficits, which play a central role in recovery, could be alleviated by CR, a treatment method designed to improve brain structure as well as function. In the last three decades neuroscientific research on music has taken a significant leap owing to advances in scientific techniques and methods such as EEG-ERP, fMRI, TMS, rTMS, and MEG. Scientific understanding of the neural correlates of music perception and cognition has offered a strong foundation for evidence-based music therapy techniques and processes (Thaut, 2005a). Research on music perception and production has shown that music engages almost all cognitive processes, including acoustic analysis, information processing, attention, sensorimotor integration, executive functions, language processing, long-term memory, emotion, and creativity. Engaging in music passively, and even more so actively, activates the neural networks underlying these cognitive processes. The activation is not restricted to musical processes alone; it is known to extend to non-musical motor, language, cognitive, and affective domains of functioning. In other words, music is produced by involving a host of sensorimotor, cognitive, and language functions, and music in turn stimulates complex cognitive, affective, sensorimotor, and language processes that have the capacity to generalize to non-musical domains of functioning (Levitin & Tirovolas, 2009; Patel, 2010; Peretz & Zatorre, 2005; Schlaug, 2009; Thaut, 2010). In addition to its effects on sensorimotor, language, and overall cognitive processes, music is also a powerful means of emotion regulation. Music can elicit strong emotions. In eliciting and processing emotions, music engages the same frontal, limbic, and mesolimbic systems as real-life emotions, including the nucleus accumbens, a reward area.
Music alters psychophysiological functions such as perception of pain, regulation of autonomic arousability, blood pressure, respiration, and heart rate, and also causes neurochemical changes. The neurochemical changes that music brings about may be grouped into four different domains: dopamine and opioids which mediate reward, motivation, and pleasure; cortisol, corticotrophin-releasing hormone (CRH) and adrenocorticotrophic hormone (ACTH) which mediate stress and arousal; serotonin and the peptide derivatives of proopiomelanocortin (POMC), alpha-melanocyte-stimulating hormone and beta-endorphin which mediate immunity; and oxytocin which mediates social affiliation (Blood & Zatorre, 2001; Chanda & Levitin, 2013;
Salimpoor, Zald, Zatorre, Dagher, & McIntosh, 2015; Sutoo & Akiyama, 2004). Neuroscientific investigations have strongly implicated music in enhanced structural as well as functional brain plasticity (Jäncke, 2009). Music is a highly complex and structured stimulus with several dimensions. The nature, structure, intensity, and complexity of an experience or stimulus determine the neural changes that occur. Music, as a temporal auditory language, is today considered an effective method in neurorehabilitation, and music with its temporal structure and pattern is known to enhance cognitive functions (Koelsch, 2009; Thaut, 2005b; Thaut & Hoemberg, 2014). The temporal and sequential aspect of music is known to serve as a “scaffold” to bootstrap the temporal and sequential pattern in cognitive functions such as information processing, sustained attention, and memory (Conway, Pisoni, & Kronenberger, 2009). Engaging in music, both actively and passively, is regarded as one of the best forms of cognitive exercise, and as a biological phenomenon it is considered a signal of cognitive and emotional flexibility and cognitive fitness (Herholz & Zatorre, 2012; Levitin & Tirovolas, 2009; Peretz, 2006). Musicians have been studied as a special group in order to investigate neural plasticity and to establish the effects of musical training and the underlying neural changes (Münte, Altenmüller, & Jäncke, 2002). With the principle of neural plasticity as the basic assumption, music is now being systematically studied and used to alter and regulate cognitive processes and emotion, with effects that can generalize to non-musical domains of functioning, in a range of neurological and psychiatric conditions where cognitive deficits play a central role (Särkämö, Altenmüller, Rodríguez-Fornells, & Peretz, 2016; Särkämö, Tervaniemi, & Huotilainen, 2013; Sihvonen et al., 2017; Wan & Schlaug, 2010).
Music is therefore considered a powerful and integrative method to address multiple domains of functioning (cognitive, emotional, and social) and a valuable tool in neuropsychological rehabilitation.
Addressing cognitive dysfunction via music is a recent frontier in music therapy and has emerged as one of the most promising and innovative new methods. The use of music as a form of therapy has a long history; it has been practiced across cultures over several centuries, dating back even to prehistoric times (Thaut, 2015). Music has always been considered to facilitate “overall well-being” and the “feel-good” factor by enhancing overall emotional health (de l’Etoile, 2010). Music therapy has been considered effective in the reduction of anxiety, depression, and agitation. However, the mechanisms underlying change were often unclear. The changes were explained by relying upon popular and prevailing psychological theories such as the behavioral, psychoanalytic, and humanistic schools of thought (de l’Etoile, 2010, 2016). Such explanations have contributed little to understanding the underlying processes of music in therapy, an understanding that is crucial for standardizing techniques to target specific functions as well as for evaluating specific outcomes. With advances in neuroscientific investigations of music perception and cognition and a better understanding of the underlying neural correlates, there has been a major change in music therapy principles and objectives. Music therapy has shifted from a social science model to a neuroscience-based model (Thaut, 2005a; Thaut, McIntosh, & Hoemberg, 2014). An increasing number of controlled trials have examined the effects in neurorehabilitation of music-based interventions such as listening, singing, and actively engaging in music by playing an instrument (Sihvonen et al., 2017).
The recent advancement in the field of music therapy has been the development of the neuroscience-based approach to music therapy practice and research. “Neurologic Music Therapy” (NMT) is based on neuroscientific models of music perception, cognition, and production. NMT consists of twenty standardized techniques targeting three main domains of functioning, namely, sensorimotor, language, and cognitive-affective
dysfunctions that are the result of neurologic disease of the human nervous system (Clair, Pasiali, & Lagasse, 2008; de l’Etoile, 2010; Thaut, 2005a). Each of the techniques focuses on the influence of music on non-musical domains of functioning with non-musical therapeutic goals. NMT is based on two interrelated models to explain how music-based interventions and training can access and modulate cognitive functions in a neuropsychological rehabilitation context. The two models of the NMT are the “Rational Scientific Mediating Model” (RSMM) and the “Transformational Design Model” (TDM) (de l’Etoile, 2016; Hegde, 2014; Thaut, McIntosh, & Hoemberg, 2014). The TDM is the clinical component or practical extension of the scientific theory model of the RSMM.
Rational Scientific Mediating Model (RSMM)
The basic premise of the RSMM is that the scientific basis of music as therapy should be anchored in empirical studies of the neurological, psychological, and physiological foundations of music perception, cognition, and production (Thaut, 2005a; Thaut, McIntosh, & Hoemberg, 2014). The RSMM was conceptualized to provide a systematic epistemology for translational research. This model helps in establishing the link between the scientific findings on music perception, cognition, and production and the rehabilitation of non-musical functions in the biopsychosocial domains. The RSMM links the knowledge of musical and non-musical behavior, thereby explaining the mechanisms of change to support effective therapy and rehabilitation. It is considered a dynamic model that is open to incorporating newer research findings, thereby contributing to a deeper understanding of the underlying process of music as therapy. The RSMM plays a crucial role in the development of music-based therapeutic techniques as well as in the selection of appropriate therapeutic methods. This model helps in validating the techniques of NMT. It consists of four steps:
1. Musical response model: The focus is on how we perceive, produce, or respond to music, including its neurological, physiological, and psychological components. This includes understanding the underlying relevant bodily systems, the neural correlates, and various perceptual, motor, and cognitive processes.
2. Non-musical parallel model: The focus is on similar perceptual, motor, and cognitive processes in non-musical brain and behavior functions and on linking these with step 1 by investigating shared or parallel underlying mechanisms between musical and non-musical behavior. This is an important step before suggesting that music will have a positive effect on the non-musical domain of functioning.
3. Mediating model: The focus is on combining the previous two steps to study whether music can influence parallel non-musical behaviors in normal and clinical populations.
4. Clinical research model: The focus is on the long-term therapeutic effects of music on non-musical domains of functioning and on studying the carry-over effect after the treatment program.
The Transformational Design Model (TDM)
The TDM provides systematic step-by-step guidance in designing, implementing, and evaluating the clinical intervention (Thaut, 2014). The validity of the intervention is explained through the RSMM. The TDM focuses on establishing focused and clearly delineated therapeutic goals, and it is important in bringing together traditional cognitive rehabilitative techniques and the NMT system. There are six steps in the TDM:
1. Diagnostic and functional/clinical assessment of the patient: This covers the diagnostic and etiological assessment of the patient and the application of clinical assessments for optimal treatment selection and for evaluating progress across the therapy sessions.
2. Development of appropriate and measurable therapeutic goals and objectives.
3. Design of functional, non-musical therapeutic exercise structures and stimuli to accomplish the clinical goals and objectives.
4. Translation of step 3 into functional therapeutic music exercises. This is a crucial step in the NMT process. The therapist has to translate functional goals into therapeutic exercises incorporating musical elements and stimuli, which means that all functional exercise elements are translated into musical elements. This translational or transformational process is guided by three principles:
(a) Scientific validity: the translation process should be congruent with the scientific information developed in the RSMM.
(b) Musical logic: the musical experience in therapy has to conform to the aesthetic and artistic principles of good musical form, even at its most basic level.
(c) Structural equivalence: the therapeutic music exercise should be similar in its structure and function to the non-musical functional design; all non-musical exercise elements and stimuli have to be translated musically.
5. Outcome assessment/post-intervention assessment: a repeat of the baseline assessment carried out in step 1. Assessments may be carried out after each session, over a set of sessions, at the end of the treatment period, and at follow-ups.
6. Transfer of therapeutic learning to functional applications for “activities of daily living” (ADL) (de l’Etoile, 2016; Thaut, 2005a, 2014; Thaut, McIntosh, & Hoemberg, 2014).
There are three major areas which the various techniques of NMT address:
1. Sensorimotor functions: targeting motor functions, mobility, strength, endurance, cadence, and coordination of gross and fine motor movements in the lower and upper extremities (de l’Etoile, 2010; Thaut, 2005c).
2. Speech and language functions: targeting vocal control, speech production, and meaningful usage of verbal and non-verbal communication (de l’Etoile, 2010; Thaut, 2005d).
3. Cognitive and affective functions: targeting basic and higher-order cognitive functions such as attention, memory, executive functions, emotion, and psychosocial skills (Clair et al., 2008; de l’Etoile, 2010; Thaut, 2005b).
The first two areas are covered in other chapters of this book. This chapter focuses on the techniques targeting cognitive and affective functions. Over the last two to three decades, an enhanced neuroscientific understanding of music perception and cognition, along with advances in non-invasive research tools for studying human brain function, has contributed to linking music and cognitive functions as well as to examining the shared and unique neural correlates underlying musical and non-musical cognitive processes. Until recently, the lack of standardized methods was a major drawback in the field of music therapy. CR via NMT-based techniques is a recent development within NMT compared with clinical research on sensorimotor and speech and language functions (Gardiner & Horwitz, 2015; Thaut, 2010). Careful analysis of the literature indicates only a handful of studies with a strong theoretical background examining the effects of traditional music therapy on cognitive functions. NMT techniques, developed on the basis of the RSMM and TDM, are evidence-based, theoretically grounded, and standardized in terminology and methods. The six steps of the TDM, which is the clinical application model, run parallel to the fundamental principles of CR. This comparison is presented in tabular format in Table 1.
Table 1. A comparison of basic principles of cognitive remediation and clinical application model (TDM) of NMT

| Key steps and principles of cognitive remediation | Six steps of TDM of NMT |
| --- | --- |
| A detailed assessment at baseline, post-intervention, and at follow-up (neuropsychological functioning, psychosocial functioning, evaluation of mood, etc.). Standardized neuropsychological/neurocognitive tests are used to obtain the baseline level cognitive functioning. The tests are readministered after the completion of intervention to quantify the changes in the specific domains of cognitive functions. | 1 and 5: Diagnostic and functional/clinical assessment, post-intervention, and follow-up assessment. |
| Underlying core principle is neural plasticity: “neuronal sparing” (prevention of further deterioration or “neuronal death”), and neuronal reorganization (growth of newer neuronal reconnection to perform and accomplish the task). | 2. Core principle is that music and music-based exercises will facilitate “neural plasticity.” |
| Marking goals and objectives of the intervention based on needs and the detailed evaluation which facilitates identification areas of cognitive functions which need specific focus. | 3. Development of appropriate and measurable therapeutic goals and objectives. |
| Development of different methods of exercises, stimulus modalities, levels of complexity, and response demands (paper–pencil based/computer based) with scientific validity. The exercises should be able to target the cognitive function that it is expected to improve. The remediation exercises must be organized in levels of difficulties, from basic skills such as attention and concentration, then progressing to more complex skills such as learning, memory, executive functions, affect, and social behavior. In a good cognitive remediation method, the intervention tasks and exercises should be process-specific targeting specific areas of cognitive functioning such as attention, memory, or executive functions. | 4. Development of functional non-musical therapy and exercise and stimuli. 5. Translation of functional non-musical therapy exercises to music-based exercises and stimuli with scientific validity, musical logic, and structural equivalence. |
| Final aim is to observe improved cognitive functions transfer to activities of daily living. Holistic approach will include education of the patient and family members. Family members are included to enhance the support system. | 6. Transfer of therapeutic learning to functional applications (NMT emphasizes generalization of therapeutic gains to activities of daily living; techniques include homework exercises and involvement of family members) to help transfer of therapeutic learning to real-life situations. |
Key references for principles of CR: Ben-Yishay & Prigatano, 1990; Eack, 2012; Prigatano, 1997; Raskin, 2010.
T C
NMT T E
Table 2 presents the NMT techniques that address cognitive-affective functions, together with a brief description of each technique and its chief target populations.
S T
B C P
NMT T , E F
,
The techniques of NMT, as stated earlier, are based on the RSMM and TDM. Both models are dynamic: they can integrate recent research findings and support their translation into standardized therapeutic techniques. A strong scientific rationale for music as a method of cognitive remediation comes from a growing body of research that links music to a host of cognitive functions such as attention, temporal order learning, visuotemporal reasoning, and auditory verbal memory (Drake, Jones, & Baruch, 2000; Hitch, 1996; Kilgour, Jakobson, & Cuddy, 2000; Sarnthein et al., 1997; Shaw & Bodner, 1999). The temporal structure of music remains a central element in therapy and rehabilitation. Rhythm, for instance, is known to play an important role in tuning and modulating attention, albeit musical attention (Drake et al., 2000). Rhythmic patterns entrain attentional focus by interacting with attention oscillators via coupling mechanisms (Thaut, 2010). There is strong evidence that sensory rhythms entrain, or synchronize, attentional processes: they drive a periodic series of attentional peaks and troughs that occur at roughly equal temporal intervals (Jones, 1976; Jones & Boltz, 1989; Large & Jones, 1999). Behavioral studies of rhythmic entrainment and attention converge on the finding that rhythmically expected events are detected or discriminated better than events occurring arrhythmically (Jones, Boltz, & Kidd, 1982; Jones, Moynihan, Mackenzie, & Puente, 2002; McAuley & Jones, 2003). Studies have also shown cross-modal effects, i.e., effects of auditory entrainment on the temporal allocation of visual attention: a rhythmic auditory stimulus alters the temporal distribution of visual attention (Miller, Carlson, & McAuley, 2013).
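The idea of attentional peaks and troughs recurring at roughly equal intervals can be illustrated with a deliberately simple toy model. The sketch below is not the oscillator model of Large and Jones (1999) or of any other cited study; it merely assumes that attention follows a cosine wave locked to the beat, and the function name `attention_level`, the 0-to-1 scale, and the 0.6-second beat period are all hypothetical choices made for illustration.

```python
import math


def attention_level(t, period, phase=0.0):
    """Toy attentional pulse on an arbitrary 0..1 scale.

    Returns values near 1.0 at expected beat times (phase, phase + period, ...)
    and values near 0.0 midway between beats. Illustrative assumption only.
    """
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * (t - phase) / period))


period = 0.6  # hypothetical inter-beat interval in seconds

# An event arriving exactly on the fourth beat versus one arriving
# halfway between the fourth and fifth beats.
on_beat = attention_level(3 * period, period)
off_beat = attention_level(3.5 * period, period)

print(f"on-beat attention:  {on_beat:.3f}")
print(f"off-beat attention: {off_beat:.3f}")
```

In this toy model an event that coincides with an expected beat falls on an attentional peak and an event midway between beats falls in a trough, which mirrors, in a highly simplified way, the behavioral finding that rhythmically expected events are detected better.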
The role of temporal attention in aiding language development has recently been examined closely (de Diego-Balaguer, Martinez-Alvarez, & Pons, 2016; Fujii & Wan, 2014; Jung, Sontag, Park, & Loui, 2015). Recent studies of clinical populations, such as patients with Parkinson’s disease, have shown a high correlation between non-musical cognitive functions and performance on temporal components of rhythm such as tempo, beat discrimination, and beat perception in a musical context. Performance on the rhythm perception task predicted performance in non-musical cognitive domains such as focused attention and working memory (Biswas, Hegde, Jhunjhunwala, & Pal, 2016). In patients with disorders of consciousness, cerebral responses to the patient’s own first name were observed more often when the name was called out after the patient’s preferred music than after a continuous sound. The responses were recorded with bedside EEG using the event-related potential (ERP) method, examining the N200 and P300 waveforms, which are considered an index of discriminative processing of a highly salient and emotional word such as one’s first name (Castro et al., 2015). The presence or absence of this discriminative cerebral response is strongly associated with outcome in patients with disorders of consciousness (Fischer, Luaute, Adeleine, & Morlet, 2004).
Studies have shown parallels between the temporal chunking principles of non-musical memory processes and musical memory formation, which is based on the structural principles of phrasing, grouping, and hierarchical abstraction in musical patterns (Deutsch, 1982). A significant number of studies have shown that music enhances memory for non-musical material (Ho, Cheung, & Chan, 2003; Jakobson, Cuddy, & Kilgour, 2003; Thaut, Peterson, McIntosh, & Hoemberg, 2014; Wallace, 1994). Musical mnemonics as a method of rehearsal has shown benefits superior to verbal rehearsal in children with learning disabilities and in developmentally disabled children (Gfeller, 1983; Kern, Wolery, & Aldridge, 2007; Wolfe & Hom, 1993). Music has been shown to be an effective means of improving mood, orientation, and remote episodic memory, as well as attention and executive functions, in patients with early dementia (Särkämö, Tervaniemi, et al., 2014). The musical mnemonics method has also been shown to improve verbal memory in patients with multiple sclerosis (MS). In a series of experiments using EEG, music-based learning led to increased coherence (phase-locked synchronization) within and between oscillatory brain networks in the alpha and gamma bands.
Higher oscillatory activity in lower alpha band rhythms in the bilateral prefrontal neural networks underlying memory was reported for the music condition compared to the spoken condition. Patients with MS were presented with the sung or spoken version of the Auditory Verbal Learning Test. Systems-level brain activity, in the form of oscillatory network synchronization during music-assisted learning, was measured, and “learning related synchronization” (LRS) was calculated. LRS was defined as the percentage change in EEG spectral power from the first presentation of a word to the average of the subsequent word encoding trials. LRS differed significantly between the music and spoken conditions in the low alpha and upper beta bands. Patients who were presented the words using the musical template showed better overall word memory and better word order memory, as well as stronger bilateral frontal alpha LRS, than the patients who received the spoken word template. The authors of this work suggest that the temporal structure implicit in musical stimuli enhances “deep coding” during verbal learning and sharpens the timing of neural dynamics in brain networks degraded by demyelination in patients with MS (Thaut, Peterson, & McIntosh, 2005; Thaut, Peterson, et al., 2014; Peterson & Thaut, 2007).
Musical processing in general is often observed to be spared in patients with neurodegenerative conditions such as Alzheimer’s disease (AD), and even brief exposure to music has been shown to have a positive impact on their performance on certain cognitive tasks such as autobiographical memory and verbal fluency. The brain areas associated with music cognition seem to be preferentially spared in AD. Regions identified as encoding musical memory have been shown to correspond to areas with minimal cortical atrophy (as measured with magnetic resonance imaging) and minimal disruption of glucose metabolism (as measured with (18)F-fluorodeoxyglucose positron emission tomography) when compared to the rest of the brain (Jacobsen et al., 2015; Limb, 2006; Thompson, Moulin, Hayre, & Jones, 2005). In one study, the lyrics of unfamiliar children’s songs were presented to AD patients bimodally at encoding: the visually presented lyrics were accompanied by either a sung or a spoken version. AD patients demonstrated better recognition accuracy for the lyrics that were presented musically than for those that were spoken. Healthy controls, however, did not show any significant difference between the two conditions (Simmons-Stern, Budson, & Ally, 2010).
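The LRS measure used in the MS studies described above, a percentage change in EEG spectral power from the first presentation of a word to the average of the later encoding trials, can be written out as a short calculation. The sketch below is only an illustration of that verbal definition: it assumes the first-trial power is the denominator of the percentage change, and the function name and the power values are hypothetical rather than taken from the cited work.

```python
def learning_related_synchronization(trial_powers):
    """Percentage change in spectral power from the first trial to the
    mean of all subsequent trials.

    trial_powers: band power per encoding trial, first presentation first.
    Assumption: the first-trial power serves as the baseline (denominator).
    """
    if len(trial_powers) < 2:
        raise ValueError("need the first presentation plus at least one later trial")
    first = trial_powers[0]
    if first == 0:
        raise ValueError("first-trial power must be non-zero")
    later = trial_powers[1:]
    later_mean = sum(later) / len(later)
    return 100.0 * (later_mean - first) / first


# Hypothetical alpha-band power values (arbitrary units) across four trials.
alpha_power = [4.0, 5.0, 6.0, 7.0]
lrs = learning_related_synchronization(alpha_power)
print(f"LRS: {lrs:.1f}%")
```

Under this convention a positive value indicates that band power rose over the course of learning relative to the first presentation, and a negative value indicates that it fell.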
The musical neglect training (MNT) method has shown a significant impact on hemi-neglect, a condition often seen in patients with right hemisphere cerebrovascular disease at the temporoparietal junction of the infero-posterior parietal cortex. Most of these patients also suffer from anosognosia, in which they deny having any unilateral neglect. Across various studies, musical stimuli have been considered superior to other sensory cues such as visual or tactile cues (Hommel et al., 1990). Commonly used techniques for hemi-neglect include neck vibration, limb activation, and optokinetic stimulation, but the effects of such treatments are transitory, lasting less than thirty minutes. MNT exercises, such as playing a scale on a keyboard, have been shown to sustain the observed gains for over a week (Bernardi et al., 2017; Bodak, Malhotra, Bernardi, Cocchini, & Stewart, 2014). Variations in tonal cues, such as lower and higher pitch, have been shown to modulate performance on line-bisection tasks, with low pitch producing leftward or downward biases and high pitch producing rightward or upward biases, suggesting that visuomotor processing can be spatially modulated by auditory cues (Ishihara et al., 2013). Patients with hemi-neglect seem to sustain gradual improvement over periods ranging from one week to four months, and this translates into changes in day-to-day activities (Bodak et al., 2014; Guilbert, Clement, & Moroni, 2017; Ishihara et al., 2013).
A successful neuropsychological rehabilitation process addresses not only cognitive functions but also the issues that facilitate the translation of cognitive improvement into psychosocial functioning and real-life situations. Anxiety and depression are a major concern in several neurological conditions such as traumatic brain injury, stroke, and degenerative conditions (Raglio et al., 2015). A single-blind randomized controlled study has shown that everyday music listening during the first two months after stroke improves cognitive functions (verbal memory, Cohen’s d = 0.88, and focused attention, Cohen’s d = 0.92) as well as mood (lowered depression, Cohen’s d = 0.77) in stroke patients compared to a group who listened to audio books as a control intervention (Särkämö et al., 2008). There were also neuroanatomical changes in the recovering brain. Gray matter reorganization in the frontal areas correlated with improved verbal memory, focused attention, and language skills, and gray matter reorganization in the left ventral/subgenual anterior cingulate cortex correlated with reduced negative mood (Särkämö, Ripollés, et al., 2014).
NMT has been successfully used in targeting psychosocial issues (Kleinstauber & Gurr, 2006; Nayak et al., 2000). Adherence to music-based intervention seems to be better and drop-out rates are low, suggesting that music has an innate ability to sustain the interest of patients. Sustaining interest and reducing drop-out rates have been a great challenge in research studies on CR (Maratos, Gold, Wang, & Crawford, 2008; Mossler, Chen, Heldal, & Gold, 2011). Research on music in rehabilitation has utilized a variety of music-based intervention methods, ranging from guided listening to singing and playing instruments. NMT comprises standardized techniques that require the clinician or researcher to have undergone specific training in NMT so that the standards of the treatment methods are maintained. There has been a significant surge in systematic research in the field of music and CR, and NMT has provided the scientific basis and framework for carrying out this work. A preliminary study using a quasi-experimental design examined the immediate effect of NMT in a group setting on patients with brain injury. The treatment group received brief NMT sessions lasting thirty minutes, with each session targeting one of the following functions: attention, memory, executive functions, and emotional adjustment. A control group rested for the same duration in a quiet room. The findings showed that NMT positively affected executive functions, specifically mental flexibility (Cohen’s d = 1.21), and that there was a significant decrease in depression with a medium effect size (Cohen’s d = 0.52) and a small decrease in anxiety (Cohen’s d = 0.28). Therapy did not have a positive impact on attention or memory. This study did not examine whether the improvement was maintained over time (Thaut et al., 2009). There is indeed a need for systematic research examining the optimal duration of intervention and following up on the maintenance of treatment gains.
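The effect sizes quoted above are Cohen’s d values. As a reading aid, the sketch below shows one common way to compute d for two independent groups using the pooled standard deviation, and then attaches the conventional labels (roughly 0.2 small, 0.5 medium, 0.8 large). It is an assumption that this particular variant matches the calculations in the cited studies, which may have used a different formula, and the scores in the example are invented for illustration.

```python
import math
import statistics


def cohens_d(treatment, control):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    var1 = statistics.variance(treatment)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


def effect_size_label(d):
    """Conventional verbal label for the magnitude of d."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical post-treatment scores; not data from any cited study.
treatment_scores = [12, 14, 15, 17, 18]
control_scores = [10, 12, 13, 14, 16]
d = cohens_d(treatment_scores, control_scores)
print(f"d = {d:.2f} ({effect_size_label(d)})")

# Labels for the values reported in the text.
for reported in (1.21, 0.52, 0.28):
    print(reported, effect_size_label(reported))
```

By these conventions the reported values of 1.21, 0.52, and 0.28 fall in the large, medium, and small ranges respectively, which is consistent with how the text describes the depression and anxiety results.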
Summary and Future Directions
Cognitive and emotional domains of functioning play a crucial role in the functional recovery of patients suffering from neurological and psychiatric conditions. There is undoubtedly a need for newer methods in neurorehabilitation that bring about not only significant changes in functioning but also long-lasting improvement. Newer methods of CR are warranted to either replace or complement the existing ones. Traditional CR methods and research studies are also being challenged in terms of their ability to target multiple domains of functioning: cognition, affect, and psychosocial functioning. Music as therapy is gaining a new perspective with advances in neuroscientific research on music perception, cognition, and production. Music and music-based interventions are today considered ideal methods for CR because they have the capacity to engage auditory, motor, language, cognitive, and emotional functions across cortical and subcortical brain regions. The chapter has presented a brief overview of the various techniques of NMT targeting cognition, affect, and psychosocial functions.
At present the existing scientific literature is often characterized by small sample sizes and highlights the need for standardized methods of intervention across studies. This has been the case even with research on restorative methods of CR in the field of neuropsychological rehabilitation. Future research on music should consider this limitation before intervention methods are planned and standardized. There is a need for stronger research methodology and for a clearer definition of the musical medium or parameter that relates to a specific rehabilitation outcome. Adding the evaluation of neurochemical markers of neural plasticity, such as brain-derived neurotrophic factor (BDNF), and methods such as EEG/ERP and fMRI would contribute further to the scientific strength of future research on music therapy. NMT techniques have the potential to overcome this issue, as the methods are well standardized and can be used in large-scale or multi-center studies. Compared to systematic research using the techniques of NMT for sensorimotor and language functions, research on NMT techniques for cognitive and affective functions is far less abundant. NMT, with its strong theoretical and scientific background, has positively influenced the practice of music therapy across countries. Future studies may also aim to examine the effects of NMT techniques for sensorimotor and language functions on cognitive functions, and vice versa, because the neural networks underlying the former functions are shared with general cognitive functions. Also, all the techniques of NMT aim at enhancing neural plasticity to bring about the desired changes in neural function and behavior.
References
Ben-Yishay, Y., Piasetsky, E., & Rattok, J. (1985). A systematic method for ameliorating disorders in basic attention. In M. J. Meir, A. L. Benton, & L. Diller (Eds.), Neuropsychological rehabilitation (pp. 165–82). New York: Guilford Press.
Ben-Yishay, Y., & Prigatano, G. P. (1990). Cognitive remediation. In M. Rosenthal, M. R. E. R. Griffith, M. R. Bond, & J. D. Miller (Eds.), Rehabilitation of the adult and child with traumatic brain injury (2nd ed., pp. 393–409). Philadelphia, PA: Davis.
Benedict, R. H., Harris, A. E., Markow, T., McCormick, J. A., Nuechterlein, K. H., & Asarnow, R. F. (1994). Effects of attention training on information processing in schizophrenia. Schizophrenia Bulletin 20, 537–546.
Bernardi, N. F., Cioffi, M. C., Ronchi, R., Maravita, A., Bricolo, E., Zigiotto, L., … Vallar, G. (2017). Improving left spatial neglect through music scale playing. Journal of Neuropsychology 11(1), 135–158.
Biswas, A., Hegde, S., Jhunjhunwala, K., & Pal, P. K. (2016). Two sides of the same coin: Impairment in perception of temporal components of rhythm and cognitive functions in Parkinson’s disease. Basal Ganglia 6(1), 63–70.
Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences 98(20), 11818–11823.
Bodak, R., Malhotra, P., Bernardi, N. F., Cocchini, G., & Stewart, L. (2014). Reducing chronic visuospatial neglect following right hemisphere stroke through instrument playing. Frontiers in Human Neuroscience 8, 413. Retrieved from https://doi.org/10.3389/fnhum.2014.00413
Bruel-Jungerman, E., Davis, S., & Laroche, S. (2007). Brain plasticity mechanisms and memory: A party of four. Neuroscientist 13(5), 492–505.
Cappa, S. F., Benke, T., Clarke, S., Rossi, B., Stemmer, B., & Van Heugten, C. M. (2005). EFNS guidelines on cognitive rehabilitation: Report of an EFNS task force. European Journal of Neurology 12(9), 665–680.
Castro, M., Tillmann, B., Luaute, J., Corneyllie, A., Dailler, F., Andre-Obadia, N., & Perrin, F. (2015). Boosting cognition with music in patients with disorders of consciousness. Neurorehabilitation and Neural Repair 29(8), 734–742.
Cernich, A. N., Kurtz, S. M., Mordecai, K. L., & Ryan, P. B. (2010). Cognitive rehabilitation in traumatic brain injury. Current Treatment Options in Neurology 12(5), 412–423.
Chanda, M. L., & Levitin, D. J. (2013). The neurochemistry of music. Trends in Cognitive Sciences 17(4), 179–193.
Chung, C. S., Pollock, A., Campbell, T., Durward, B. R., & Hagen, S. (2013). Cognitive rehabilitation for executive dysfunction in adults with stroke or other adult non-progressive acquired brain damage. Cochrane Database of Systematic Reviews 4, CD008391.
Cicerone, K. D. (2012). Facts, theories, values: Shaping the course of neurorehabilitation. The 60th John Stanley Coulter memorial lecture. Archives of Physical Medicine and Rehabilitation 93(2), 188–191.
Cicerone, K. D., Dahlberg, C., Kalmar, K., Langenbahn, D. M., Malec, J. F., Bergquist, T. F., … Morse, P. A. (2000). Evidence-based cognitive rehabilitation: Recommendations for clinical practice. Archives of Physical Medicine and Rehabilitation 81(12), 1596–1615.
Cicerone, K. D., Langenbahn, D. M., Braden, C., Malec, J. F., Kalmar, K., Fraas, M., … Ashman, T. (2011). Evidence-based cognitive rehabilitation: Updated review of the literature from 2003 through 2008. Archives of Physical Medicine and Rehabilitation 92(4), 519–530.
Clair, A. A., Pasiali, V., & Lagasse, B. (2008). Neurologic music therapy. In A. A. Darrow (Ed.), Introduction to approaches in music therapy (2nd ed., pp. 153–171). Silver Spring, MD: American Music Therapy Association.
Conway, C. M., Pisoni, D. B., & Kronenberger, W. G. (2009). The importance of sound for cognitive sequencing abilities: The auditory scaffolding hypothesis. Current Directions in Psychological Science 18(5), 275–279.
de Diego-Balaguer, R., Martinez-Alvarez, A., & Pons, F. (2016). Temporal attention as a scaffold for language development. Frontiers in Psychology 7. Retrieved from https://doi.org/10.3389/fpsyg.2016.00044
de l’Etoile, S. K. (2010). Neurologic music therapy: A scientific paradigm for clinical practice. Music and Medicine 2(2), 78–84.
de l’Etoile, S. K. (2016). Processes of music therapy: Clinical and scientific rationales and models. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (2nd ed., pp. 805–818). Oxford: Oxford University Press.
Deutsch, D. (1982). Organizational processes in music. In M. Clynes (Ed.), Music, mind and brain (pp. 119–131). New York: Plenum Press.
Diamond, P. T., Felsenthal, G., Macciocchi, S. N., Butler, D. H., & Lally-Cassady, D. (1996). Effect of cognitive impairment on rehabilitation outcome. American Journal of Physical Medicine and Rehabilitation 75(1), 40–43.
Diller, L., & Gordon, W. A. (1981). Interventions for cognitive deficits in brain-injured adults. Journal of Consulting and Clinical Psychology 49(6), 822–834.
Drake, C., Jones, M. R., & Baruch, C. (2000). The development of rhythmic attending in auditory sequences: Attunement, referent period, focal attending. Cognition 77(3), 251–288.
Eack, S. M. (2012). Cognitive remediation: A new generation of psychosocial interventions for people with schizophrenia. Social Work 57(3), 235–246.
Elvevag, B., & Goldberg, T. E. (2000). Cognitive impairment in schizophrenia is the core of the disorder. Critical Reviews in Neurobiology 14(1), 1–21.
Faralli, A., Bigoni, M., Mauro, A., Rossi, F., & Carulli, D. (2013). Noninvasive strategies to promote functional recovery after stroke. Neural Plasticity 2013, 854597.
Fischer, C., Luaute, J., Adeleine, P., & Morlet, D. (2004). Predictive value of sensory and cognitive evoked potentials for awakening from coma. Neurology 63(4), 669–673.
Fujii, S., & Wan, C. Y. (2014). The role of rhythm in speech and language rehabilitation: The SEP hypothesis. Frontiers in Human Neuroscience 8. Retrieved from http://dx.doi.org/10.3389/fnhum.2014.00777
Gardiner, J. C., & Horwitz, J. L. (2015). Neurologic music therapy and group psychotherapy for treatment of traumatic brain injury: Evaluation of a cognitive rehabilitation group. Music Therapy Perspectives 33(2), 193–201.
Gfeller, K. E. (1983). Musical mnemonics as an aid to retention with normal and learning disabled students. Journal of Music Therapy 20(4), 179–189.
Gordon, W. A., Zafonte, R., Cicerone, K., Cantor, J., Brown, M., Lombard, L., … Chandna, T. (2006). Traumatic brain injury rehabilitation: State of the science. American Journal of Physical Medicine and Rehabilitation 85(4), 343–382.
Guilbert, A., Clement, S., & Moroni, C. (2017). A rehabilitation program based on music practice for patients with unilateral spatial neglect: A single-case study. Neurocase 23(1), 12–21.
Harvey, P. D., Green, M. F., Keefe, R. S., & Velligan, D. I. (2004). Cognitive functioning in schizophrenia: A consensus statement on its role in the definition and evaluation of effective treatments for the illness. Journal of Clinical Psychiatry 65(3), 361–372.
Hegde, S. (2014). Music based cognitive remediation therapy for patients with traumatic brain injury. Frontiers in Neurology 5. Retrieved from https://doi.org/10.3389/fneur.2014.00034
Hegde, S. (2017). A review of Indian research on cognitive remediation for schizophrenia. Asian Journal of Psychiatry 25, 54–59.
Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity: Behavior, function, and structure. Neuron 76(3), 486–502.
Hitch, G. J. (1996). Temporal grouping effects in immediate recall: A working memory analysis. Quarterly Journal of Experimental Psychology Section A 49(1), 116–139.
Ho, Y. C., Cheung, M. C., & Chan, A. S. (2003). Music training improves verbal but not visual memory: Cross-sectional and longitudinal explorations in children. Neuropsychology 17(3), 439–450.
Hommel, M., Peres, B., Pollak, P., Memin, B., Besson, G., Gaio, J. M., & Perret, J. (1990). Effects of passive tactile and auditory stimuli on left visual neglect. Archives of Neurology 47, 573–576.
Ishihara, M., Revol, P., Jacquin-Courtois, S., Mayet, R., Rode, G., Boisson, D., … Rossetti, Y. (2013). Tonal cues modulate line bisection performance: Preliminary evidence for a new rehabilitation prospect? Frontiers in Psychology 4, 704. Retrieved from https://doi.org/10.3389/fpsyg.2013.00704
Jacobsen, J. H., Stelzer, J., Fritz, T. H., Chetelat, G., La Joie, R., & Turner, R. (2015). Why musical memory can be preserved in advanced Alzheimer’s disease. Brain 138(8), 2438–2450.
Jakobson, L. S., Cuddy, L. L., & Kilgour, A. R. (2003). Time tagging: A key to musicians’ superior memory. Music Perception: An Interdisciplinary Journal 20(3), 307–313.
Jäncke, L. (2009). Music drives brain plasticity. F1000 Biology Reports 1, 78. Retrieved from http://www.F1000.com/Reports/Biology/content/1/78
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review 83(5), 323–355.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review 96(3), 459–491.
Jones, M. R., Boltz, M., & Kidd, G. (1982). Controlled attending as a function of melodic and temporal context. Perception & Psychophysics 32(3), 211–218.
Jones, M. R., Moynihan, H., Mackenzie, N., & Puente, J. (2002). Temporal aspects of stimulus-driven attending in dynamic arrays. Psychological Science 13(4), 313–319.
Jung, H., Sontag, S., Park, Y. S., & Loui, P. (2015). Rhythmic effects of syntax processing in music and language. Frontiers in Psychology 6, 1762. Retrieved from https://doi.org/10.3389/fpsyg.2015.01762
Kern, P., Wolery, M., & Aldridge, D. (2007). Use of songs to promote independence in morning greeting routines for young children with autism. Journal of Autism and Developmental Disorders 37(7), 1264–1271.
Keshavan, M. S., Vinogradov, S., Rumsey, J., Sherrill, J., & Wagner, A. (2014). Cognitive training in mental disorders: Update and future directions. American Journal of Psychiatry 171(5), 510–522.
Kilgour, A. R., Jakobson, L. S., & Cuddy, L. L. (2000). Music training and rate of presentation as mediators of text and song recall. Memory & Cognition 28(5), 700–710.
Kleim, J. A., Barbay, S., & Nudo, R. J. (1998). Functional reorganization of the rat motor cortex following motor skill learning. Journal of Neurophysiology 80(6), 3321–3325.
Kleinstauber, M., & Gurr, B. (2006). Music in brain injury rehabilitation. Journal of Cognitive Rehabilitation 24, 4–14.
Koelsch, S. (2009). A neuroscientific perspective on music therapy. Annals of the New York Academy of Sciences 1169, 374–384.
Lam, R. W., Kennedy, S. H., McIntyre, R. S., & Khullar, A. (2014). Cognitive dysfunction in major depressive disorder: Effects on psychosocial functioning and implications for treatment. Canadian Journal of Psychiatry/Revue Canadienne de Psychiatrie 59(12), 649–654.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review 106(1), 119–159.
Levitin, D. J., & Tirovolas, A. K. (2009). Current advances in the cognitive neuroscience of music. Annals of the New York Academy of Sciences 1156, 211–231.
Limb, C. J. (2006). Structural and functional neural correlates of music perception. The Anatomical Record Part A: Discoveries in Molecular, Cellular, and Evolutionary Biology 288, 435–446.
McAuley, J. D., & Jones, M. R. (2003). Modeling effects of rhythmic context on perceived duration: A comparison of interval and entrainment approaches to short-interval timing. Journal of Experimental Psychology: Human Perception and Performance 29(6), 1102–1125.
Maratos, A. S., Gold, C., Wang, X., & Crawford, M. J. (2008). Music therapy for depression. Cochrane Database of Systematic Reviews 1, CD004517.
Marder, S. R. (2006). Initiatives to promote the discovery of drugs to improve cognitive function in severe mental illness. Journal of Clinical Psychiatry 67(7), e03.
Miller, J. E., Carlson, L. A., & McAuley, J. D. (2013). When what you hear influences when you see: Listening to an auditory rhythm influences the temporal allocation of visual attention. Psychological Science 24(1), 11–18. Mossler, K., Chen, X., Heldal, T. O., & Gold, C. (2011). Music therapy for people with schizophrenia and schizophrenia-like disorders. Cochrane Database of Systematic Reviews 4, CD004025. Müller, T. (2002). Drug treatment of non-motor symptoms in Parkinson’s disease. Expert Opinion in Pharmacotherapy 3(4), 381–388. Müller, T. (2012). Drug therapy in patients with Parkinson’s disease. Translational Neurodegeneration 1, 10. doi:10.1186/2047-9158-1-10 Münte, T. F., Altenmüller, E., & Jäncke, L. (2002). The musician’s brain as a model of neuroplasticity. Nature Reviews Neuroscience 3, 473–478. Nayak, S., Wheeler, B. L., Shiflett, S. C., & Agostinelli, S. (2000). Effect of music therapy on mood and social interaction among individuals with acute traumatic brain injury and stroke. Rehabilitation Psychology 45(3), 274–283. Patel, A. D. (2010). Music, biological evolution, and the brain. In M. Bailar (Ed.), Emerging disciplines (pp. 91–144). Houston, TX: Rice University Press. Peretz, I. (2006). The nature of music from a biological perspective. Cognition 100(1), 1–32. Peretz, I., & Zatorre, R. J. (2005). Brain organization for music processing. Annual Review of Psychology 56, 89–114. Peterson, D. A., & Thaut, M. H. (2007). Music increases frontal EEG coherence during verbal learning. Neuroscience Letters 412(3), 217–221. Podd, M. H. (2012). History of cognitive remediation. In M. H. Podd, Cognitive remediation for brain injury and neurological illness (pp. 1–4). New York: Springer. Prigatano, G. P. (1997). Learning from our successes and failures: Reflections and comments on “Cognitive rehabilitation: How it is and how it might be.” Journal of the International Neuropsychological Society 3(5), 497–499. 
Raglio, A., Attardo, L., Gontero, G., Rollino, S., Groppo, E., & Granieri, E. (2015). Effects of music and music therapy on mood in neurological patients. World Journal of Psychiatry 5(1), 68–78. Raskin, S. A. (2010). Current approaches to cognitive rehabilitation. In C. Armstrong & L. Morrow (Eds.), Handbook of medical neuropsychology (pp. 505–518). New York: Springer. Rohling, M. L., Faust, M. E., Beverly, B., & Demakis, G. (2009). Effectiveness of cognitive rehabilitation following acquired brain injury: A meta-analytic re-examination of Cicerone et al.’s (2000, 2005) systematic reviews. Neuropsychology 23(1), 20–39. Rund, B. R., & Borg, N. E. (1999). Cognitive deficits and cognitive training in schizophrenic patients: A review. Acta Psychiatrica Scandinavica 100(2), 85–95. Salimpoor, V. N., Zald, D. H., Zatorre, R. J., Dagher, A., & McIntosh, A. R. (2015). Predictions and the brain: How musical sounds become rewarding. Trends in Cognitive Sciences 19(2), 86–91. Särkämö, T., Altenmüller, E., Rodríguez-Fornells, A., & Peretz, I. (2016). Editorial: Music, brain, and rehabilitation: Emerging therapeutic applications and potential neural mechanisms. Frontiers in Human Neuroscience 10, 103. Retrieved from https://doi.org/10.3389/fnhum.2016.00103 Särkämö, T., Ripollés, P., Vepsäläinen, H., Autti, T., Silvennoinen, H. M., Salli, E., … Rodríguez-Fornells, A. (2014). Structural changes induced by daily music listening in the recovering brain after middle cerebral artery stroke: A voxel-based morphometry study. Frontiers in Human Neuroscience 8, 245. Retrieved from https://doi.org/10.3389/fnhum.2014.00245 Särkämö, T., Tervaniemi, M., & Huotilainen, M. (2013). Music perception and cognition: Development, neural basis, and rehabilitative use of music. Wiley Interdisciplinary Reviews: Cognitive Science 4(4), 441–451.
Särkämö, T., Tervaniemi, M., Laitinen, S., Forsblom, A., Soinila, S., Mikkonen, M., … Hietanen, M. (2008). Music listening enhances cognitive recovery and mood after middle cerebral artery stroke. Brain 131(3), 866–876. Särkämö, T., Tervaniemi, M., Laitinen, S., Numminen, A., Kurki, M., Johnson, J. K., & Rantanen, P. (2014). Cognitive, emotional, and social benefits of regular musical activities in early dementia: Randomized controlled study. Gerontologist 54(4), 634–650. Sarnthein, J., Vonstein, A., Rappelsberger, P., Petsche, H., Rauscher, F. H., & Shaw, G. L. (1997). Persistent patterns of brain activity: An EEG coherence study of the positive effect of music on spatial-temporal reasoning. Neurological Research 19(2), 107–116. Schlaug, G. (2009). Part VI introduction: Listening to and making music facilitates brain recovery processes. Annals of the New York Academy of Sciences 1169, 372–373. Schutz, L. E., & Trainor, K. (2007). Evaluation of cognitive rehabilitation as a treatment paradigm. Brain Injury 21(6), 545–557. Shaw, G. L., & Bodner, M. (1999). Music enhances spatial-temporal reasoning: Towards a neurophysiological basis using EEG. Clinical Electroencephalography 30(4), 151–155. Sihvonen, A. J., Särkämö, T., Leo, V., Tervaniemi, M., Altenmüller, E., & Soinila, S. (2017). Music-based interventions in neurological rehabilitation. Lancet Neurology 16(8), 648–660. Simmons-Stern, N. R., Budson, A. E., & Ally, B. A. (2010). Music as a memory enhancer in patients with Alzheimer’s disease. Neuropsychologia 48(10), 3164–3167. Sohlberg, M. M., & Mateer, C. A. (2001). Cognitive rehabilitation: An integrative neuropsychological approach. New York: Guilford Press. Spring, B. J., & Ravdin, L. (1992). Cognitive remediation in schizophrenia: Should we attempt it? Schizophrenia Bulletin 18(1), 15–20. Sun, J.-H., Tan, L., & Yu, J.-T. (2014). Post-stroke cognitive impairment: Epidemiology, mechanisms and management. Annals of Translational Medicine 2(8), 80. 
Sutoo, D., & Akiyama, K. (2004). Music improves dopaminergic neurotransmission: Demonstration based on the effect of music on blood pressure regulation. Brain Research 1016(2), 255–262. Thaut, M. H. (2005a). Rhythm, music and the brain: Scientific foundations and clinical applications. New York: Routledge. Thaut, M. H. (2005b). Neurologic music therapy in cognitive rehabilitation. In M. Thaut, Rhythm, music and the brain: Scientific foundations and clinical applications (pp. 179–202). New York: Routledge. Thaut, M. H. (2005c). Neurologic music therapy in sensorimotor rehabilitation. In M. Thaut, Rhythm, music and the brain: Scientific foundations and clinical applications (pp. 137–164). New York: Routledge. Thaut, M. H. (2005d). Neurologic music therapy in speech and language rehabilitation. In M. Thaut, Rhythm, music and the brain: Scientific foundations and clinical applications (pp. 165–178). New York: Routledge. Thaut, M. H. (2010). Neurologic music therapy in cognitive rehabilitation. Music Perception 27(4), 281–285. Thaut, M. H. (2014). Assessment and the transformational design model (TDM). In M. H. Thaut & V. Hoemberg (Eds.), Handbook of neurologic music therapy (pp. 60–68). Oxford: Oxford University Press. Thaut, M. H. (2015). Music as therapy in early history. Progress in Brain Research 217, 143–158. Thaut, M. H., Gardiner, J. C., Holmberg, D., Horwitz, J., Kent, L., Andrews, G., Donelan, B., & McIntosh, G. R. (2009). Neurologic music therapy improves executive function and emotional adjustment in traumatic brain injury rehabilitation. Annals of the New York Academy of Sciences 1169, 406–416.
Thaut, M. H., & Hoemberg, V. (Eds.). (2014). Handbook of neurologic music therapy. Oxford: Oxford University Press. Thaut, M. H., McIntosh, G. C., & Hoemberg, V. (2014). Neurologic music therapy: From social science to neuroscience. In M. H. Thaut & V. Hoemberg (Eds.), Handbook of neurologic music therapy (pp. 1–6). Oxford: Oxford University Press. Thaut, M. H., Peterson, D. A., & McIntosh, G. C. (2005). Temporal entrainment of cognitive functions: Musical mnemonics induce brain plasticity and oscillatory synchrony in neural networks underlying memory. Annals of the New York Academy of Sciences 1060, 243–254. Thaut, M. H., Peterson, D. A., McIntosh, G. C., & Hoemberg, V. (2014). Music mnemonics aid verbal memory and induce learning: Related brain plasticity in multiple sclerosis. Frontiers in Human Neuroscience 8, 395. Retrieved from https://doi.org/10.3389/fnhum.2014.00395 Thompson, R. G., Moulin, C. J., Hayre, S., & Jones, R. W. (2005). Music enhances category fluency in healthy older adults and Alzheimer’s disease patients. Experimental Aging Research 31(1), 91–99. Volpe, B. T., & McDowell, F. H. (1990). The efficacy of cognitive rehabilitation in patients with traumatic brain injury. Archives of Neurology 47, 220–222. Wallace, W. T. (1994). Memory for music: Effect of melody on recall of text. Journal of Experimental Psychology: Learning, Memory, & Cognition 20(6), 1471–1485. Wan, C. Y., & Schlaug, G. (2010). Music making as a tool for promoting brain plasticity across the life span. Neuroscientist 16(5), 566–577. Wolfe, D. E., & Hom, C. (1993). Use of melodies as structural prompts for learning and retention of sequential verbal information by preschool students. Journal of Music Therapy 30(2), 100–118. Wykes, T., Huddy, V., Cellard, C., McGurk, S. R., & Czobor, P. (2011). A meta-analysis of cognitive remediation for schizophrenia: Methodology and effect sizes. American Journal of Psychiatry 168(5), 472–485.
CHAPTER 31
MUSICAL DISORDERS
ISABELLE ROYAL, SÉBASTIEN PAQUETTE, AND PAULINE TRANCHANT
INTRODUCTION
The vast majority of people choose to experience music on a daily basis. Part of the reason is that music is known to regulate our moods and can produce positive emotions through the sensory and cognitive experience it provides (Lonsdale & North, 2011; Salimpoor et al., 2013). Music is also an experience that can be shared socially through dancing or singing in synchrony with others (Wiltermuth & Heath, 2009). Although music listening habits have evolved throughout the years with the rise of various audio devices, the appeal of music is not a new phenomenon. Music has been documented in nearly every culture over time, which makes musical engagement a fundamental and universal human trait (Merriam, 1964; Peretz, 2006). Indeed, the human brain is able to track musical changes such as pitch and time variations just a few hours after birth, indicating that it is equipped with the necessary neural architecture to naturally acquire musical abilities during early development (Peretz, 2002; Peretz & Coltheart, 2003; Trehub, 2001; Zatorre & Peretz, 2001).
Despite the universality of music, a minority of individuals present with very specific musical perception deficits that cannot be attributed to general auditory dysfunction, intellectual disability, or a lack of musical exposure (Ayotte, Peretz, & Hyde, 2002). Cases where these deficits are present from birth are often referred to as “congenital amusia” (Peretz, 2001; Peretz et al., 2002; Peretz, Cummings, & Dubé, 2007). Congenital amusia acts as an umbrella term to designate both a “pitch-based” amusia, an inability to process subtle pitch changes, and a “beat finding” disorder, an inability to synchronize to music (Phillips-Silver et al., 2011; Sowiński & Dalla Bella, 2013; Tranchant, Vuvan, & Peretz, 2016). In contrast, acquired amusia refers to the development of similar symptoms following a neurological event (e.g., stroke or accident).
The goal of this chapter is to provide an overview of these intriguing musical disorders and to demonstrate how they represent a unique opportunity not only to study normal brain function, but also to isolate the specific brain areas that play a role in musical processing. The first sections of this chapter will focus on pitch-based amusia, beat finding disorder, acquired amusia, and musical anhedonia. The last section will be dedicated to a discussion highlighting what the study of these musical disorders has taught us about normal brain function.
PITCH-BASED AMUSIA
Prevalence and Behavioral Markers
The most prevalent musical disorder is congenital and specifically affects the perception of pitch. Congenital amusia affects approximately 1.5 percent of the population, with no discernible difference in prevalence between women and men (Peretz & Vuvan, 2017). Behaviorally, individuals with congenital amusia are distinct from unaffected individuals because they have difficulty singing in tune, detecting singing that is out of tune (including their own), and identifying a familiar song without lyrics (Ayotte et al., 2002; Peretz, Champod, & Hyde, 2003). They also struggle with maintaining short melodies in working memory (Ayotte et al., 2002). These behavioral markers are thought to result from an impairment in the processing of fine pitch variations, which is central to the emergence of the musical deficits observed in congenital amusia (Hyde & Peretz, 2004).
Indeed, congenital amusics cannot reliably detect pitch deviations that are smaller than two semitones, whereas non-amusics can reliably detect differences that are several orders of magnitude smaller (Hyde & Peretz, 2004). One semitone (100 cents on a logarithmic scale of musical intervals) is equivalent to the distance between two consecutive piano keys. Because Western music often uses pitch variations (i.e., one semitone) that fall below the two-semitone detection threshold of congenital amusics, essential parts of the musical structure are often missed. This also implies that amusics occasionally fail to identify notes that violate tonal regularity, explaining why they struggle so much with detecting out-of-tune singing. Given that these deficits are hallmark behavioral manifestations of congenital amusia, they are often the focus of diagnostic tools used to identify individuals affected by this disorder.
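The semitone and cent units mentioned above can be made concrete with a short calculation. The sketch below is illustrative only and is not taken from the cited studies: it assumes twelve-tone equal temperament and uses A4 = 440 Hz as a hypothetical reference tone, and the function names are invented for this example.

```python
import math

def interval_in_cents(f1_hz: float, f2_hz: float) -> float:
    """Signed size of the interval from f1 to f2 in cents (1200 cents per octave)."""
    if f1_hz <= 0 or f2_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 1200.0 * math.log2(f2_hz / f1_hz)

def shift_by_cents(f_hz: float, cents: float) -> float:
    """Frequency reached by moving `cents` away from f_hz on the logarithmic scale."""
    return f_hz * 2.0 ** (cents / 1200.0)

# Hypothetical reference tone (assumption: A4 tuned to 440 Hz).
a4 = 440.0
one_semitone_up = shift_by_cents(a4, 100)   # adjacent piano key
two_semitones_up = shift_by_cents(a4, 200)  # the two-semitone threshold discussed above

print(round(one_semitone_up, 2), round(two_semitones_up, 2))
```

Under these assumptions, a one-semitone step from 440 Hz corresponds to a change of roughly 26 Hz, which gives a sense of how small the pitch steps used in Western melodies are.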
Identification of Congenital Amusia
The most widely used quantitative tool to identify individuals with congenital amusia is the Montreal Battery of Evaluation of Amusia (MBEA) (Peretz et al., 2003). It assesses several aspects of auditory musical perception, such as tonal knowledge, temporal (rhythm) processing, musical working memory, and musical recognition abilities. It also allows for the comparison of an individual’s profile to that of an amusic population. Given that the core deficit of congenital amusia relates to an impaired perception of pitch structure, an individual must perform below the cut-off scores for tasks requiring the detection of melodic key violations to be considered amusic. In addition, other confounding factors must first be excluded for the diagnosis to stand, such as a low cognitive potential, abnormal hearing abilities, and a history of traumatic brain injury.
More recently, the Montreal Protocol for Identification of Amusia (MPIA) was published to introduce a full evaluation protocol through which congenital amusics can be effectively identified (Vuvan et al., 2017). It includes the MBEA, questionnaires, and a description of the testing of relevant exclusion criteria.
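The cut-off logic described above can be sketched in a few lines. The example is a loose illustration rather than the MBEA scoring procedure itself: it assumes a cut-off placed two standard deviations below the mean of a control group, and the control scores and function names are hypothetical. Actual cut-offs should be taken from the published norms, and a score below the cut-off is not sufficient on its own, since the exclusion criteria listed above must also be checked.

```python
from statistics import mean, stdev

def cutoff_score(control_scores, n_sd=2.0):
    """Cut-off placed n_sd sample standard deviations below the control mean (assumed rule)."""
    return mean(control_scores) - n_sd * stdev(control_scores)

def below_cutoff(score, control_scores, n_sd=2.0):
    """True when an individual's score falls below the assumed cut-off."""
    return score < cutoff_score(control_scores, n_sd)

# Hypothetical control scores on a 30-item pitch subtest (invented for illustration).
hypothetical_controls = [27, 28, 26, 29, 27, 28, 30, 26, 27, 28]

print(round(cutoff_score(hypothetical_controls), 2))
print(below_cutoff(21, hypothetical_controls))  # candidate for further evaluation
print(below_cutoff(26, hypothetical_controls))  # within the control range
```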
Neurological Markers
Although identifying congenital amusics at the behavioral level is unquestionably useful to characterize how they perceive music, it is only the first step towards understanding the etiology of the disorder. What characterizes congenital amusia at the functional level is a dissociation between what is perceived and what is consciously detected, a phenomenon known as perception without awareness. Indeed, although congenital amusics fail to reliably detect pitch changes that are smaller than two semitones, their brain is able to process changes that are as small as an eighth of a tone (Hyde, Zatorre, & Peretz, 2011; Moreau, Jolicœur, & Peretz, 2009; Peretz, Brattico, Järvenpää, & Tervaniemi, 2009). What makes congenital amusics different from non-amusics is that, although their auditory system clearly detects small pitch differences, this information does not appear to reach higher-order brain areas that are responsible for the conscious perception of these differences.
Early neuroimaging studies using electroencephalography (EEG) demonstrated that the brains of both congenital amusics and healthy controls can detect small pitch variations presented within a string of repeated sounds (Moreau et al., 2009; Peretz, Brattico, & Tervaniemi, 2005). This detection is revealed by the presence of a specific EEG component that is associated with the automatic detection of pitch variations, the mismatch negativity (MMN). Both the MMN and the N1 (another EEG component that is elicited by all auditory stimuli) of amusics are comparable to those of control subjects (although see Albouy et al., 2013). Furthermore, when presented with tones in a melodic context, the brain of congenital amusics also exhibits a normal early right anterior negativity (ERAN; Koelsch, 2011; Koelsch & Siebel, 2005) in response to the violation of more complex auditory sequences based on the Western tonal system. 
In marked contrast, however, the P3b and the P600 components, associated with the conscious detection of deviant tones, are significantly altered in congenital amusics. Because the P3b is generally absent in response to pitch variations that are smaller than one semitone in non-amusics (Peretz et al., 2005), it has been suggested that its absence in amusics might reflect a dysfunction of the mechanisms that would normally underlie the conscious perception of the deviation.
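A component such as the MMN is typically examined as a difference wave: the averaged response to the repeated (standard) sounds is subtracted from the averaged response to the deviant sounds. The sketch below illustrates that arithmetic only. The epochs are synthetic values invented for the example, the 100 to 250 ms search window is an assumption, and the function names are hypothetical; none of this reproduces the analysis pipelines of the cited studies.

```python
def average_waveform(epochs):
    """Point-by-point average across epochs (each epoch is a list of samples)."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]

def difference_wave(deviant_epochs, standard_epochs):
    """Averaged deviant response minus averaged standard response."""
    deviant = average_waveform(deviant_epochs)
    standard = average_waveform(standard_epochs)
    return [d - s for d, s in zip(deviant, standard)]

def peak_negativity(wave, times_ms, window=(100, 250)):
    """Most negative value (and its latency) inside the assumed search window."""
    candidates = [(v, t) for v, t in zip(wave, times_ms)
                  if window[0] <= t <= window[1]]
    return min(candidates)

# Synthetic single-channel epochs in microvolts, sampled every 50 ms (invented data).
times_ms = [0, 50, 100, 150, 200, 250, 300]
standard_epochs = [[0.0, 0.5, -1.0, -0.5, 0.0, 0.2, 0.0],
                   [0.0, 0.3, -1.2, -0.3, 0.2, 0.0, 0.0]]
deviant_epochs = [[0.0, 0.5, -1.2, -2.5, -1.0, 0.1, 0.0],
                  [0.0, 0.3, -1.0, -2.3, -1.2, 0.1, 0.0]]

mmn_wave = difference_wave(deviant_epochs, standard_epochs)
amplitude_uv, latency_ms = peak_negativity(mmn_wave, times_ms)
print(round(amplitude_uv, 2), latency_ms)
```

In this toy data set the early deflection is identical for standards and deviants, so it cancels out in the subtraction, while the additional negativity elicited by the deviants remains.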
These EEG findings supporting the idea of perception without awareness are also consistent with results obtained using functional magnetic resonance imaging (fMRI). For instance, a positive and linear increase in the blood-oxygen-level-dependent (BOLD) response was found when congenital amusics listened to melodies composed of pure tones varying from zero to two semitones, indicating that their brain can indeed track very small variations despite the absence of any conscious perception (Hyde et al., 2011).
Contribution of the Right Frontotemporal Network
Compared to the normal brain, several areas of the congenital amusic’s brain present both structural and functional anomalies that are thought to underlie the manifestation of congenital amusia. These anomalies are primarily found in the right frontotemporal network, encompassing the right inferior frontal gyrus (IFG; Brodmann area (BA) 44, 45, 47), the superior temporal gyrus (STG; BA 22), and the right arcuate fasciculus. From a structural standpoint, the congenital amusic brain displays decreased white matter concentration and increased gray matter concentration in the right IFG (Albouy et al., 2013; Hyde, Zatorre, Griffiths, Lerch, & Peretz, 2006; Hyde et al., 2007), gray matter morphological alterations in the right STG (Albouy et al., 2013; Hyde et al., 2007), and a reduced number of white matter fibers in the right arcuate fasciculus (Loui, Alsop, & Schlaug, 2009; Wilbiks, Vuvan, Girard, Peretz, & Russo, 2016). Furthermore, there is evidence supporting a reduction in connectivity between the right IFG and right STG (Albouy et al., 2013; Albouy, Mattout, Sanchez, Tillmann, & Caclin, 2015; Hyde et al., 2011; Leveque et al., 2016), in addition to evidence supporting an increase in connectivity between the right and left STG (Albouy et al., 2015; Hyde et al., 2011). Taken together, these results suggest a disturbance in the recurrent processing between the right IFG and the right STG, which is believed to underlie the manifestation of amusia (Fig. 1) (Peretz, 2016). According to this hypothesis, the IFG, whose role is to amplify and refine the auditory signal from the STG (Opitz, Rinne, Mecklinger, von Cramon, & Schröger, 2002), is unable to provide adequate top-down modulation of the signal processed in the right STG, which significantly hampers the amusic’s ability to consciously detect subtle auditory changes in the environment.
FIGURE 1. Anomalous recurrent processing in the right frontotemporal network. Reprinted from Trends in Cognitive Sciences, 20(11), Isabelle Peretz, Neurobiology of congenital amusia, pp. 857–67, doi.org/10.1016/j.tics.2016.09.002, Copyright © 2016 Elsevier Ltd. All rights reserved, with permission from Elsevier.
This first section described congenital amusia in terms of its distinguishing neuropsychological features and presented the specific behavioral and neurological anomalies that characterize it. At the end of the chapter, we will further discuss how our understanding of brain function has evolved in light of what we have learned from studying congenital amusia. The following section will focus on a different musical disorder that affects an individual’s ability to properly synchronize with music.
BEAT FINDING DISORDER
Dance is a universal phenomenon that has been documented throughout history and found across different cultures (e.g., Nettl, 2000). Dancing is likely to have resulted from evolutionary forces, as it may play a role in sexual attraction (Darwin, 1871; Neave et al., 2011) and in the development of pro-social skills (e.g., Cirelli, Einarson, & Trainor, 2014). Yet, anecdotes of people having poor rhythmic skills, or “two left feet,” are frequent. The study of a new form of congenital amusia, a beat finding disorder (also known as beat-deafness) that affects time perception, rather than pitch perception, has emerged in recent years. Phillips-Silver et al. (2011) first reported the case of Mathieu, a university student who had difficulty matching the tempo of a dance-like bouncing movement (bending the knees up and down) to music. This rhythmic difficulty could not be explained by other cognitive, pitch-processing, or motor deficits. Cases like Mathieu’s, however, should perhaps not be so surprising: moving in synchrony with a beat, despite its apparent simplicity, is a sophisticated behavior. To properly perform this behavior, one must be able to predict when the next beat will occur; indeed, humans generally attempt to anticipate tones (rather than react to them) when synchronizing to a metronome (for a review, see Repp, 2005). In addition, complex musical forms are usually characterized by the absence of a one-to-one correspondence between tones and beats. For example, in syncopated rhythms, which are typical of jazzier musical styles, beats can occur on silences. Therefore, beat perception in the context of music requires the listener to infer the timing of the beats (Honing, 2013). Several other individuals with a profile comparable to Mathieu’s have been identified since 2011 (four in Sowiński & Dalla Bella, 2013; one in addition to Mathieu in Palmer, Lidji, & Peretz, 2014; fourteen in Tranchant et al., 2016; and subject LV in Bégel et al., 2017). These individuals offer a unique opportunity to study the neural mechanisms that are essential to beat perception and synchronization (e.g., Paquette et al., 2017), and to provide insight into the neurobiological origins of dance. 
In a follow-up study with Mathieu, his beat prediction abilities were assessed with a task that required him to tap in synchrony with a metronome (Palmer et al., 2014). Mathieu’s taps preceded the tone onset by approximately 30 milliseconds, which was comparable with what was observed in non-amusics. However, when experimenters introduced unpredictable temporal changes in the metronome sequences, Mathieu required a larger number of taps to return to baseline compared with non-amusics. This finding led the authors to conclude that a deficient perception–action coupling likely underlay his beat finding deficit. Such a hypothesis raises the question of whether poor synchronization with music is always accompanied by poor beat perception, or whether it could instead be due to poor coupling between (normal) perception and movement. Phillips-Silver et al. (2011) had previously observed that Mathieu had difficulty in judging whether the underlying pattern of strong and weak beats in short piano melodies corresponded to a march (One-two-One-two) or to a waltz (One-two-three-One-two-three; Meter Test of the MBEA, see Peretz et al., 2003). It was therefore concluded that Mathieu’s inability to detect an underlying beat in a musical context likely stemmed from poor beat perception abilities. In contrast, Sowiński and Dalla Bella (2013) identified two individuals with significant beat finding difficulties in a musical context in the absence of any deficits on two rhythm perception tasks, suggesting that poor synchronization can be dissociated from poor perception. However, Tranchant and Vuvan (2015) noted that beat perception is not necessary to perform the perceptual tasks used in Sowiński and Dalla Bella (2013). It thus remains unclear whether beat perception was truly unimpaired in these cases. Therefore, future studies should attempt to dissociate beat perception abilities from synchronization abilities.
The study of beat finding disorder is still in its infancy. In particular, the brain correlates of the disorder have not yet been thoroughly investigated. Much of what is known so far was provided by Mathias and colleagues (Mathias, Lidji, Honing, Palmer, & Peretz, 2016), who showed that the MMN following beat omissions in rhythmical patterns was normal in Mathieu, suggesting that the pre-attentive processing of beat irregularities is preserved. However, Mathieu’s event-related potentials lacked a P3b component in response to beat omissions, which, as mentioned earlier, is a key component associated with the conscious detection of deviant auditory stimuli. Future research should harness recently developed EEG techniques (e.g., Nozaradan, Peretz, & Mouraux, 2012) to investigate neuronal entrainment to a beat in the beat finding disorder. 
New cases like Mathieu’s have been identified over the last few years, opening the door for the neuroimaging investigation of anomalies in brain regions and/or networks associated with the beat finding disorder.
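The metronome-tapping measure discussed in this section, taps that slightly precede the tone onsets, can be expressed as a signed asynchrony (tap time minus onset time), where negative values indicate anticipation. The sketch below is a minimal illustration under stated assumptions: the tap and onset times are invented, each onset is simply paired with its nearest tap, and the function name is hypothetical. It is not the procedure used by Palmer et al. (2014).

```python
from statistics import mean

def signed_asynchronies(tap_times_ms, onset_times_ms):
    """For each metronome onset, nearest tap time minus onset time.

    Negative values mean the tap anticipated the tone; positive values mean it lagged.
    """
    return [min(tap_times_ms, key=lambda tap: abs(tap - onset)) - onset
            for onset in onset_times_ms]

# Hypothetical metronome with a 500 ms inter-onset interval and invented tap times.
onsets_ms = [0, 500, 1000, 1500, 2000]
taps_ms = [-35, 472, 968, 1475, 1970]

asynchronies = signed_asynchronies(taps_ms, onsets_ms)
print(asynchronies)
print(mean(asynchronies))  # mean signed asynchrony in milliseconds
```

Tracking how many taps it takes for this value to return to its baseline after a sudden change in the metronome would be one simple way to quantify the slower adaptation described above.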
ACQUIRED AMUSIA
The term “amusia” is often used to refer to its developmental form (i.e., congenital amusia), as previously described in this chapter. However, amusia can also be acquired, often following a significant neurological event (e.g., brain trauma or stroke). Because the severity and the location of brain insults are typically unique to each individual, individuals with acquired amusia present with highly individual manifestations, often not limited to a single aspect of music processing. As a result, the scientific literature on acquired amusia has documented a wide array of music deficits (see Clark, Golden, & Warren, 2015 for a review). Although studying both congenital and acquired amusia allows for a better understanding of how music is processed by the brain, the study of acquired amusia is uniquely positioned to inform us on deficient music processing networks resulting from a neurological insult. A review of several acquired amusia case studies suggests that lesions producing deficits in musical perception are predominantly located in the right hemisphere (Stewart, von Kriegstein, Warren, & Griffiths, 2006). However, some exceptions do exist. For instance, there are reports describing acquired music perception impairments that occurred following lesions to both hemispheres (Ayotte, Peretz, Rousseau, Bard, & Bojanowski, 2000; Liégeois-Chauvel, Peretz, Babaï, Laguitton, & Chauvel, 1998; Peretz, 1990; Schuppert, Münte, Wieringa, & Altenmüller, 2000).
Acquired Pitch Perception Deficits
As previously discussed, the detection of subtle pitch variations is essential for the perception of music, and a reduced awareness of those variations is at the core of what is generally described as “amusia.” Individual cases of acquired amusia usually have in common lesions that encompass the anterior and middle portions of the right superior temporal gyrus (STG) and the insula (e.g., Ayotte et al., 2000; Hochman & Abrams, 2014; Peretz et al., 1994). Importantly, Ayotte and collaborators (2000) demonstrated that in some cases, the lesions can selectively impair pitch processing while preserving temporal (rhythm and meter) perception, suggesting that distinct neural substrates may underlie pitch and temporal processing.
Acquired amusia has also been documented in patients who underwent surgery for intractable epilepsy. For example, Liégeois-Chauvel and collaborators (1998) demonstrated that right hemisphere resections including the STG impaired the processing of both contour and pitch intervals, whereas left hemisphere resections primarily affected the processing of pitch intervals. Furthermore, Peretz and collaborators (1994) showed that when bilateral auditory lesions preserved the primary auditory cortex, only melody perception was impaired, while rhythm perception abilities appeared to be preserved.
Acquired Time Perception Deficits
The study of individual cases of acquired amusia has indicated that time perception can also be specifically impaired following brain injury. For example, patients with lesions to their right anterior temporal lobe, planum temporale, or insula display impaired rhythm (or time interval) perception even though their pitch interval perception is preserved (Confavreux, Croisile, Garassus, Aimard, & Trillet, 1992; Fujii et al., 1990; Mendez & Geehan, 1988). Impaired rhythm perception has also been documented following an anterior temporal resection that included the right STG (Liégeois-Chauvel et al., 1998), whereas impaired meter processing has been reported following resections of the anterior temporal lobe in both the right and left hemispheres (Liégeois-Chauvel et al., 1998; Schuppert et al., 2000).
Given the different phenotypes of acquired amusia, the following section will present acquired amusia as a broader musical deficit (as measured by the global score of the MBEA).
Acquired Amusia: Larger Cohort Studies
Although case studies are quite useful for the investigation of the different phenotypes of acquired amusia, larger cohort studies allow for the investigation of what these different acquired amusia phenotypes have in common. For instance, reports indicate that musical disorders are commonly observed following a middle cerebral artery (MCA) stroke (Ayotte et al., 2000; Schuppert et al., 2000) and symptoms often persist beyond the acute phase of the stroke (Sihvonen, Ripollés, Rodríguez-Fornells, Soinila, & Särkämö, 2017). In these patients, the severity of the ensuing music processing deficit, as assessed by the MBEA, is often greater when damage is localized in the right hemisphere (particularly in frontal and temporal areas), compared with the effects of damage in the left hemisphere (Särkämö et al., 2009).
Moreover, using MRI voxel-based lesion-symptom mapping and voxel-based morphometry (VBM) in a sample of 77 stroke patients (49 amusics), a recent study was able to identify which brain structures were typically damaged and which were typically preserved in acquired amusia (Sihvonen et al., 2016). Based on their findings, the authors concluded that damage to the right STG, middle temporal gyrus (MTG), insula, and putamen formed the necessary neural substrate for acquired amusia. Furthermore, VBM analyses revealed that patients with persistent amusia had gray matter volume decreases in the right STG and MTG at the six-month follow-up assessment, in addition to a white matter volume decrease in the right MTG, when compared to control patients. In a follow-up study with a larger sample size (N = 90), acquired amusia was associated with an acute stage lesion pattern in the right temporal, insular, and striatal areas (Sihvonen, Ripollés, Rodríguez-Fornells, et al., 2017). Persistent amusia was associated with gray matter volume decreases in right temporal areas. The decrease in volume was localized more posteriorly in individuals with a pitch-based amusia and more anteriorly in those with a beat finding disorder. 
One of the particularly novel findings of the study was that more severe and persistent manifestations of amusia were associated with a more widespread pattern of acute stage lesions and gray matter volume changes in the right hemisphere, which encompassed not only temporal, insular, and striatal areas, but also frontal, parietal, and limbic areas. Similar to congenital amusia, which is characterized by neural anomalies that affect both functional and structural connectivity (Peretz, 2016), acquired amusia has been associated with structural damage in multiple white matter tracts, including the arcuate fasciculus, the inferior longitudinal fasciculus, the uncinate fasciculus, the frontal aslant tract, the corpus callosum, and the right inferior fronto-occipital fasciculus (Sihvonen, Ripollés, Särkämö, et al., 2017). Furthermore, structural damage to the right inferior fronto-occipital fasciculus was also found to be the best predictor of global MBEA score. Taken together, these studies illustrate the widespread brain areas that, when lesioned, can produce the symptoms associated with acquired amusia. These areas not only include auditory regions but also several regions involved in higher-order cognitive processes, such as those found in the frontal and parietal cortex.
Acquired Amusia and its Comorbidities
Because acquired amusia typically results from an insult to one of several cerebral structures, it is often comorbid with other conditions that affect higher cognitive functions. For example, a review of case studies showed that 55 percent of patients with acquired amusia also have at least some form of language impairment (Stewart et al., 2006). Similarly, it was later reported that individuals with acquired amusia consistently performed worse than non-amusics on tests of verbal expression and comprehension (Särkämö et al., 2009). These findings suggest a link between music and language processing that could potentially be mediated by some other cognitive processes. Furthermore, the latter study also showed an association between acquired amusia and visuospatial processing (Särkämö et al., 2009). Indeed, patients who showed contralesional spatial neglect in a visuospatial attention task were also considered amusic based on their global MBEA score. Interestingly, previous reports have also shown that visuospatial deficits and mild neglect tend to co-occur with musical deficits in right hemisphere-lesioned patients (Liégeois-Chauvel et al., 1998; Peretz, 1990), suggesting that music processing and spatial processing may be subserved by common cognitive processes. Finally, it is worth noting that acquired amusia has also been associated with a wide range of other cognitive dysfunctions that include working memory, verbal learning, executive functioning, and attention deficits (Särkämö et al., 2009). The severity of the amusia (MBEA score), therefore, is potentially exacerbated by such impairments, and consequently not solely due to a dysfunction in brain areas that subserve music processing (Särkämö et al., 2009).
Musical Anhedonia
Although most people listen to music to help regulate their mood, individuals with a musical perception disorder may not experience the emotional appeal of music (Lonsdale & North, 2011) despite the fact that their ability to extract basic emotional information from music is relatively preserved (Gosselin, Paquette, & Peretz, 2015). The majority of amusics do not seek musical exposure and are generally found to be indifferent to it (Omigie, Müllensiefen, & Stewart, 2012; Gosselin et al., 2015). Two acquired amusia case studies suggest a possible dissociation between the ability to perceive music and the pleasure derived from listening to music. The first case, IR, is a patient who acquired amusia following brain damage from cerebral aneurysms in both the left and right middle cerebral arteries. Although IR could no longer discriminate fine pitch variations, she still enjoyed music and reported often dancing to music while doing chores (Peretz, Gagnon, & Bouchard, 1998). The second case is a patient who acquired amusia following an ischemic stroke. He subsequently reported having lost interest in music, as he no longer felt emotionally engaged when listening to it (Hirel et al., 2014). Although he could readily identify emotions in different sensory modalities (music, faces), his emotional intensity judgments for music were attenuated compared to those of his matched control. His right-hemisphere lesions primarily encompassed the STG and the primary auditory cortex, but also included portions of the ventral segment of the middle temporal gyrus, amygdala, and insula. In the general population, some individuals do not experience pleasure when listening to music, while still being able to successfully extract its emotional content (Mas-Herrero, Zatorre, Rodríguez-Fornells, & Marco-Pallarés, 2014). This condition, known as musical anhedonia, is believed to result from a disconnection between the perception of an emotion and the experience of it.
The results of a recent study revealed that participants with musical anhedonia showed a reduction of brain activity in the nucleus accumbens that was specific to music listening, whereas normal activation patterns were observed during a monetary gambling task (Martínez-Molina, Mas-Herrero, Rodríguez-Fornells, Zatorre, & Marco-Pallarés, 2016). Furthermore, anhedonic participants were also found to have decreased levels of functional connectivity between the right auditory cortex and the ventral striatum compared to control participants. Participants who displayed above average responses to music showed enhanced connectivity between these same structures. Taken together, these results suggest that musical anhedonia might result from dysfunctional connectivity between auditory areas and the subcortical structures implicated in the reward network.
W
H
W
L A
S ?
In this chapter, we examined the three most prevalent musical disorders: congenital amusia, beat finding disorder, and acquired amusia. The study of these disorders has not only revealed the importance of several right hemisphere brain structures for the processing of pitch, but also that a dysfunction of any of these areas can produce pitch processing disorders. Compared to congenital amusia, acquired amusia has a more heterogeneous presentation and is often comorbid with other cognitive impairments. These additional cognitive impairments need to be taken into consideration when attempting to isolate brain structures that are critical for certain types of musical processing. Although current evidence suggests a dissociation between pitch processing disorders and beat finding disorders, additional experiments are needed to paint a clearer picture of the neural structures that underlie the latter type.
What Does Congenital Amusia Have in Common with Other Developmental Disorders?
The current understanding that amusia is a disorder stemming from an anomalous recurrent processing between a sensory core (i.e., auditory cortex) and higher-order frontal areas (i.e., IFG) shares important similarities with other neurodevelopmental disorders, such as developmental dyslexia and congenital prosopagnosia (Paquette et al., 2018). Developmental dyslexia is a disorder that affects the accuracy and fluency of reading (Lyon, Shaywitz, & Shaywitz, 2003), whereas congenital prosopagnosia consists of an inability to recognize faces (Behrmann & Avidan, 2005). In developmental dyslexia, although phonetic representations appear to be properly processed in the left STG, functional and structural connectivity anomalies between the left STG and left IFG prevent dyslexic individuals from consciously accessing those representations (Boets et al., 2013; Ramus, 2014). Similarly, the right fusiform face area (FFA) shows normal face selectivity in the majority of individuals suffering from prosopagnosia (Avidan & Behrmann, 2009). However, a reduction in the concentration of white matter fibers connecting the FFA to anterior frontal and temporal regions has been observed in prosopagnosic individuals (Thomas et al., 2008), suggesting that a transmission issue may also be at the root of the disorder. The parallels that can be drawn between amusia and these developmental disorders suggest that a key common neural feature underlies their expression: a breakdown in the communication pathways between a core sensory area and a higher-order brain region that are required for the conscious detection of subtle perceptual differences.
Different Phenotypes of Amusia
Over the past decade, research on amusia has provided evidence that pitch-based amusia is a musical disorder whose origin is acoustic in nature. Indeed, a recent meta-analysis indicated that the processing deficit observed in congenital amusia is both acoustic (i.e., present for sounds that are not presented in a musical context such as a melody) and musical, thus suggesting that most congenital amusic cases are likely the result of an acoustic deficit that generates a musical deficit (Vuvan, Nunes-Silva, & Peretz, 2015). Because the conscious detection threshold of amusics is greater than the intervals used in Western music, amusics do not have access to all the necessary information to perceive and produce music normally. However, amusia case studies have indicated that a double dissociation exists between acoustic and musical processing, suggesting that the presentation of the disorder may be more heterogeneous than previously thought. This dissociation is intriguing because it raises the question of whether both types of processing stem from similar or distinct neural structures, and also generates new testable hypotheses for future neuroimaging experiments.
Improving Perceptual Outcomes in Amusia
To this day, few studies have examined the potential benefit of rehabilitation strategies to attenuate the perceptual deficits of amusia, with only modest outcomes (see Whiteford & Oxenham, 2018 for a positive outcome). One promising avenue, however, is the use of neuromodulation techniques such as transcranial magnetic stimulation (TMS) or transcranial direct-current stimulation (tDCS). For instance, the application of tDCS has been shown to improve certain aspects of reading in children and adolescents suffering from dyslexia (Costanzo et al., 2016). Furthermore, it was recently shown that transcranial alternating current stimulation (tACS) can improve pitch memory in amusics (Schaal, Pfeifer, Krause, & Pollok, 2015). Because neuromodulation techniques can non-invasively modulate the activity of a given region, they could be used to better characterize the role of distinct cerebral networks in the expression of amusia, and to determine if the stimulation of different brain regions can improve the perceptual deficits observed in the different amusia phenotypes.
References
Albouy, P., Mattout, J., Bouet, R., Maby, E., Sanchez, G., Aguera, P. E., … Tillmann, B. (2013). Impaired pitch perception and memory in congenital amusia: The deficit starts in the auditory cortex. Brain 136(5), 1639–1661.
Albouy, P., Mattout, J., Sanchez, G., Tillmann, B., & Caclin, A. (2015). Altered retrieval of melodic information in congenital amusia: Insights from dynamic causal modeling of MEG data. Frontiers in Human Neuroscience 9. Retrieved from http://journal.frontiersin.org/Article/10.3389/fnhum.2015.00020/abstract
Avidan, G., & Behrmann, M. (2009). Functional MRI reveals compromised neural integrity of the face processing network in congenital prosopagnosia. Current Biology 19(13), 1146–1150.
Ayotte, J., Peretz, I., & Hyde, K. (2002). Congenital amusia: A group study of adults afflicted with a music-specific disorder. Brain 125(2), 238–251.
Ayotte, J., Peretz, I., Rousseau, I., Bard, C., & Bojanowski, M. (2000). Patterns of music agnosia associated with middle cerebral artery infarcts. Brain 123(9), 1926–1938.
Bégel, V., Benoit, C.-E., Correa, A., Cutanda, D., Kotz, S. A., & Dalla Bella, S. (2017). “Lost in time” but still moving to the beat. Neuropsychologia 94, 129–138.
Behrmann, M., & Avidan, G. (2005). Congenital prosopagnosia: Face-blind from birth. Trends in Cognitive Sciences 9(4), 180–187.
Boets, B., Op de Beeck, H. P., Vandermosten, M., Scott, S. K., Gillebert, C. R., Mantini, D., … Ghesquière, P. (2013). Intact but less accessible phonetic representations in adults with dyslexia. Science 342(6163), 1251–1254.
Cirelli, L. K., Einarson, K. M., & Trainor, L. J. (2014). Interpersonal synchrony increases prosocial behavior in infants. Developmental Science 17(6), 1003–1011.
Clark, C. N., Golden, H. L., & Warren, J. D. (2015). Acquired amusia. Handbook of Clinical Neurology 129, 607–631.
Confavreux, C., Croisile, B., Garassus, P., Aimard, G., & Trillet, M. (1992). Progressive amusia and aprosody. Archives of Neurology 49(9), 971–976.
Costanzo, F., Varuzza, C., Rossi, S., Sdoia, S., Varvara, P., Oliveri, M., … Menghini, D. (2016). Reading changes in children and adolescents with dyslexia after transcranial direct current stimulation. NeuroReport 27(5), 295–300.
Darwin, C. (1871). The descent of man, and selection in relation to sex. London: Murray.
Fujii, T., Fukatsu, R., Watabe, S. I., Ohnuma, A., Teramura, K., Kimura, I., … Kogure, K. (1990). Auditory sound agnosia without aphasia following a right temporal lobe lesion. Cortex 26(2), 263–268.
Gosselin, N., Paquette, S., & Peretz, I. (2015). Sensitivity to musical emotions in congenital amusia. Cortex 71, 171–182.
Hirel, C., Lévêque, Y., Deiana, G., Richard, N., Cho, T.-H., Mechtouff, L., … Nighoghossian, N. (2014). Amusie acquise et anhédonie musicale. Revue Neurologique 170(8–9), 536–540.
Hochman, M., & Abrams, K. (2014). Amusia for pitch caused by right middle cerebral artery infarct. Journal of Stroke and Cerebrovascular Diseases 23(1), 164–165.
Honing, H. (2013). Structure and interpretation of rhythm in music. In D. Deutsch (Ed.), The psychology of music (pp. 369–404). London: Academic Press.
Hyde, K. L., Lerch, J. P., Zatorre, R. J., Griffiths, T. D., Evans, A. C., & Peretz, I. (2007). Cortical thickness in congenital amusia: When less is better than more. Journal of Neuroscience 27(47), 13028–13032.
Hyde, K. L., & Peretz, I. (2004). Brains that are out of tune but in time. Psychological Science 15(5), 356–360.
Hyde, K. L., Zatorre, R. J., Griffiths, T. D., Lerch, J. P., & Peretz, I. (2006). Morphometry of the amusic brain: A two-site study. Brain 129(10), 2562–2570.
Hyde, K. L., Zatorre, R. J., & Peretz, I. (2011). Functional MRI evidence of an abnormal neural network for pitch processing in congenital amusia. Cerebral Cortex 21(2), 292–299.
Koelsch, S. (2011). Toward a neural basis of music perception: A review and updated model. Frontiers in Psychology 2. Retrieved from https://doi.org/10.3389/fpsyg.2011.00110
Koelsch, S., & Siebel, W. A. (2005). Towards a neural basis of music perception. Trends in Cognitive Sciences 9(12), 578–584.
Leveque, Y., Fauvel, B., Groussard, M., Caclin, A., Albouy, P., Platel, H., & Tillmann, B. (2016). Altered intrinsic connectivity of the auditory cortex in congenital amusia. Journal of Neurophysiology 116(1), 88–97.
Liégeois-Chauvel, C., Peretz, I., Babaï, M., Laguitton, V., & Chauvel, P. (1998). Contribution of different cortical areas in the temporal lobes to music processing. Brain 121(10), 1853–1867.
Lonsdale, A. J., & North, A. C. (2011). Why do we listen to music? A uses and gratifications analysis. British Journal of Psychology 102(1), 108–134.
Loui, P., Alsop, D., & Schlaug, G. (2009). Tone deafness: A new disconnection syndrome? Journal of Neuroscience 29(33), 10215–10220.
Lyon, G. R., Shaywitz, S. E., & Shaywitz, B. A. (2003). A definition of dyslexia. Annals of Dyslexia 53(1), 1–14.
Martínez-Molina, N., Mas-Herrero, E., Rodríguez-Fornells, A., Zatorre, R. J., & Marco-Pallarés, J. (2016). Neural correlates of specific musical anhedonia. Proceedings of the National Academy of Sciences 113(46), E7337–E7345.
Mas-Herrero, E., Zatorre, R. J., Rodríguez-Fornells, A., & Marco-Pallarés, J. (2014). Dissociation between musical and monetary reward responses in specific musical anhedonia. Current Biology 24(6), 699–704.
Mathias, B., Lidji, P., Honing, H., Palmer, C., & Peretz, I. (2016). Electrical brain responses to beat irregularities in two cases of beat deafness. Frontiers in Neuroscience 10. Retrieved from https://doi.org/10.3389/fnins.2016.00040
Mendez, M. F., & Geehan, G. R. (1988). Cortical auditory disorders: Clinical and psychoacoustic features. Journal of Neurology, Neurosurgery, and Psychiatry 51(1), 1–9.
Merriam, A. P. (1964). The anthropology of music. Evanston, IL: Northwestern University Press.
Moreau, P., Jolicœur, P., & Peretz, I. (2009). Automatic brain responses to pitch changes in congenital amusia. Annals of the New York Academy of Sciences 1169, 191–194.
Neave, N., McCarty, K., Freynik, J., Caplan, N., Honekopp, J., & Fink, B. (2011). Male dance moves that catch a woman’s eye. Biology Letters 7(2), 221–224.
Nettl, B. (2000). An ethnomusicologist contemplates universals in musical sound and musical culture. In N. L. Wallin, B. Merker, & S. Brown (Eds.), The origins of music (pp. 463–472). Cambridge, MA: MIT Press.
Nozaradan, S., Peretz, I., & Mouraux, A. (2012). Selective neuronal entrainment to the beat and meter embedded in a musical rhythm. Journal of Neuroscience 32(49), 17572–17581.
Omigie, D., Müllensiefen, D., & Stewart, L. (2012). The experience of music in congenital amusia. Music Perception: An Interdisciplinary Journal 30(1), 1–18.
Opitz, B., Rinne, T., Mecklinger, A., von Cramon, D. Y., & Schröger, E. (2002). Differential contribution of frontal and temporal cortices to auditory change detection: fMRI and ERP results. NeuroImage 15(1), 167–174.
Palmer, C., Lidji, P., & Peretz, I. (2014). Losing the beat: Deficits in temporal coordination. Philosophical Transactions of the Royal Society B: Biological Sciences 369(1658), 20130405–20130405.
Paquette, S., Fujii, S., Li, H. C., & Schlaug, G. (2017). The cerebellum’s contribution to beat interval discrimination. NeuroImage 163, 177–182.
Paquette, S., Li, H. C., Corrow, S. L., Buss, S. S., Barton, J., & Schlaug, G. (2018). Developmental perceptual impairments: Cases when tone-deafness and prosopagnosia co-occur. Frontiers in Human Neuroscience 12, 438.
Peretz, I. (1990). Processing of local and global musical information by unilateral brain-damaged patients. Brain 113(4), 1185–1205.
Peretz, I. (2001). Brain specialization for music. Annals of the New York Academy of Sciences 930, 153–165.
Peretz, I. (2002). Brain specialization for music. The Neuroscientist 8(4), 372–380.
Peretz, I. (2006). The nature of music from a biological perspective. Cognition 100(1), 1–32.
Peretz, I. (2016). Neurobiology of congenital amusia. Trends in Cognitive Sciences 20(11), 857–867.
Peretz, I., Ayotte, J., Zatorre, R. J., Mehler, J., Ahad, P., Penhune, V. B., & Jutras, B. (2002). Congenital amusia: A disorder of fine-grained pitch discrimination. Neuron 33(2), 185–191.
Peretz, I., Brattico, E., Järvenpää, M., & Tervaniemi, M. (2009). The amusic brain: In tune, out of key, and unaware. Brain 132(5), 1277–1286.
Peretz, I., Brattico, E., & Tervaniemi, M. (2005). Abnormal electrical brain responses to pitch in congenital amusia. Annals of Neurology 58(3), 478–482.
Peretz, I., Champod, A. S., & Hyde, K. (2003). Varieties of musical disorders: The Montreal battery of evaluation of amusia. Annals of the New York Academy of Sciences 999, 58–75.
Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience 6(7), 688–691.
Peretz, I., Cummings, S., & Dubé, M. P. (2007). The genetics of congenital amusia (tone deafness): A family-aggregation study. American Journal of Human Genetics 81(3), 582–588.
Peretz, I., Gagnon, L., & Bouchard, B. (1998). Music and emotion: Perceptual determinants, immediacy, and isolation after brain damage. Cognition 68(2), 111–141.
Peretz, I., Kolinsky, R., Tramo, M., Labrecque, R., Hublet, C., Demeurisse, G., & Belleville, S. (1994). Functional dissociations following bilateral lesions of auditory cortex. Brain 117(6), 1283–1301.
Peretz, I., & Vuvan, D. (2017). Prevalence of congenital amusia. European Journal of Human Genetics 25, 625–630.
Phillips-Silver, J., Toiviainen, P., Gosselin, N., Piché, O., Nozaradan, S., Palmer, C., & Peretz, I. (2011). Born to dance but beat deaf: A new form of congenital amusia. Neuropsychologia 49(5), 961–969.
Ramus, F. (2014). Neuroimaging sheds new light on the phonological deficit in dyslexia. Trends in Cognitive Sciences 18(6), 274–275.
Repp, B. H. (2005). Sensorimotor synchronization: A review of the tapping literature. Psychonomic Bulletin & Review 12(6), 969–992.
Salimpoor, V. N., van den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J. (2013). Interactions between the nucleus accumbens and auditory cortices predict music reward value. Science 340(6129), 216–219.
Särkämö, T., Tervaniemi, M., Soinila, S., Autti, T., Silvennoinen, H. M., Laine, M., & Hietanen, M. (2009). Cognitive deficits associated with acquired amusia after stroke: A neuropsychological follow-up study. Neuropsychologia 47(12), 2642–2651.
Schaal, N. K., Pfeifer, J., Krause, V., & Pollok, B. (2015). From amusic to musical? Improving pitch memory in congenital amusia with transcranial alternating current stimulation. Behavioural Brain Research 294, 141–148.
Schuppert, M., Münte, T. F., Wieringa, B. M., & Altenmüller, E. (2000). Receptive amusia: Evidence for cross-hemispheric neural networks underlying music processing strategies. Brain 123(3), 546–559.
Sihvonen, A., Ripollés, P., Leo, V., Rodríguez-Fornells, A., Soinila, S., & Särkämö, T. (2016). Neural basis of acquired amusia and its recovery after stroke. Journal of Neuroscience 36(34), 8872–8881.
Sihvonen, A., Ripollés, P., Rodríguez-Fornells, A., Soinila, S., & Särkämö, T. (2017). Revisiting the neural basis of acquired amusia: Lesion patterns and structural changes underlying amusia recovery. Frontiers in Neuroscience 11. Retrieved from https://doi.org/10.3389/fnins.2017.00426
Sihvonen, A., Ripollés, P., Särkämö, T., Leo, V., Rodríguez-Fornells, A., Saunavaara, J., … Soinila, S. (2017). Tracting the neural basis of music: Deficient structural connectivity underlying acquired amusia. Cortex 97, 255–273.
Sowiński, J., & Dalla Bella, S. (2013). Poor synchronization to the beat may result from deficient auditory-motor mapping. Neuropsychologia 51(10), 1952–1963.
Stewart, L., von Kriegstein, K., Warren, J. D., & Griffiths, T. D. (2006). Music and the brain: Disorders of musical listening. Brain: A Journal of Neurology 129(Pt. 10), 2533–2553.
Thomas, C., Avidan, G., Humphreys, K., Jung, K. J., Gao, F., & Behrmann, M. (2008). Reduced structural connectivity in ventral visual cortex in congenital prosopagnosia. Nature Neuroscience 12(1), 29–31.
Tranchant, P., & Vuvan, D. T. (2015). Current conceptual challenges in the study of rhythm processing deficits. Frontiers in Neuroscience 9. Retrieved from https://doi.org/10.3389/fnins.2015.00197
Tranchant, P., Vuvan, D. T., & Peretz, I. (2016). Keeping the beat: A large sample study of bouncing and clapping to music. PloS ONE 11(7), e0160178.
Trehub, S. E. (2001). Musical predispositions in infancy. Annals of the New York Academy of Sciences 930, 1–16.
Vuvan, D., Nunes-Silva, M., & Peretz, I. (2015). Meta-analytic evidence for the non-modularity of pitch processing in congenital amusia. Cortex 69, 186–200.
Vuvan, D., Paquette, S., Mignault Goulet, G., Royal, I., Felezeu, M., & Peretz, I. (2017). The Montreal protocol for identification of amusia. Behavior Research Methods 50.
Whiteford, K. L., & Oxenham, A. J. (2018). Learning for pitch and melody discrimination in congenital amusia. Cortex 103, 164–178.
Wilbiks, J. M. P., Vuvan, D. T., Girard, P.-Y., Peretz, I., & Russo, F. A. (2016). Effects of vocal training in a musicophile with congenital amusia. Neurocase 22(6), 526–537.
Wiltermuth, S. S., & Heath, C. (2009). Synchrony and cooperation. Psychological Science 20(1), 1–5.
Zatorre, R. J., & Peretz, I. (2001). The biological foundations of music. Annals of the New York Academy of Sciences 930, 281–299.
CHAPTER 32
WHEN BLUE TURNS TO GRAY: THE ENIGMA OF MUSICIAN’S DYSTONIA
DAVID PETERSON AND ECKART ALTENMÜLLER
Introduction
Dystonia
Musician’s dystonia is a particular form of dystonia. The concept of dystonia has evolved over the past several decades. A recent update to its definition reflects a multiyear effort to achieve some consensus on how it is described:
Dystonia is a movement disorder characterized by sustained or intermittent muscle contractions causing abnormal, often repetitive, movements, postures, or both. Dystonic movements are typically patterned, twisting, and may be tremulous. Dystonia is often initiated or worsened by voluntary action and associated with overflow muscle activation. (Albanese et al., 2013)
Dystonia can refer to characteristic symptoms that are secondary to a long list of other, mostly neurologic, disorders. Dystonia can also be the primary disorder, referred to as “isolated dystonia.” References to dystonia in the remainder of this chapter will refer to the primary, isolated form of dystonia. There are many forms of dystonia, most commonly distinguished by their age of onset and the distribution of body regions that are symptomatic. Dystonias with onset in childhood or adolescence tend to be more generalized, involving several parts of the body, and are more likely to begin in the lower extremities. Adult onset dystonias are much more common and typically involve only a single body region (“focal dystonia”) or a small number of contiguous body regions (“segmental dystonia”). The various focal dystonias share several characteristics, including a lack of selectivity in attempts to perform specific movements, such as individual finger movements in the case of focal hand dystonia (FHD). In some cases, focal dystonias also exhibit undesirable co-contraction of antagonistic muscles (Cohen & Hallett, 1988).
Musician’s Dystonia
Musician’s dystonia (MD) is a specific type of focal dystonia. Onset can range from approximately 18 years of age to the seventh decade, but most commonly occurs in the mid-30s (Altenmüller, 2003; Brandfonbrener, 1995; Brandfonbrener & Robson, 2004; Chang & Frucht, 2013; Conti, Pullman, & Frucht, 2008; Jankovic & Shale, 1989; Lederman, 1991; Schuele & Lederman, 2004a). MD is one of the most perplexing forms of dystonia. In a cohort of over 590 MD patients diagnosed between 1994 and 2007 at the Institute of Music Physiology and Musicians’ Medicine of the Hanover University of Music, Drama, and Media (Altenmüller, Baur, Hofmann, Lim, & Jabusch, 2012; Jabusch & Altenmüller, 2006a), it was associated with almost every instrument and affected several body regions. In every case, MD involves impaired voluntary motor control while a musician is playing the instrument. The symptoms generally appear during movements that are extensively trained. It can affect control of facial, lip, and tongue muscles (“embouchure dystonia,” Frucht et al., 2001), lower limbs, or, in the majority of patients, the muscles controlling the arm or hand. MD involving the hand is a form of FHD.
MD is sometimes referred to as “musician’s cramp” because it is often described in conjunction with a form of FHD called “writer’s cramp.” However, the term “cramp” can be misleading; MD rarely involves the maximum intensity contractions associated with cramps (Pesenti, Barbieri, & Priori, 2004; Tubiana, 2000). In the hand, MD is usually associated with loss of fine control and coordination, most commonly in heterogeneous subsets of digits 2–5 (Charness, Ross, & Shefner, 1996; Frucht, 2009b; Furuya, Tominaga, Miyazaki, & Altenmüller, 2015; Jankovic & Ashoori, 2008; Jankovic & Shale, 1989). The relative amount of excessive finger flexion or extension depends on the type of instrument (Frucht, 2009b; Conti et al., 2008). Flexion is more common than extension. If multiple fingers are involved, they are usually adjacent fingers. Patients report a feeling of loss of automaticity in previously automatic music performance (Frucht, 2009b). Several examples of abnormal postural configurations are depicted in Fig. 1. MD is painless in most, but not all, patients (Jabusch & Altenmüller, 2006b). Indeed, pain may suggest a diagnosis of repetitive strain injury or occupational fatigue syndrome rather than MD.
FIGURE 1. Representative posture patterns in musician’s dystonia. Reproduced from Hans-Christian Jabusch and Eckart Altenmüller, Epidemiology, phenomenology, and therapy of musician’s cramp. In Eckart Altenmüller, Mario Wiesendanger, and Jürg Kesselring (Eds.), Music, motor control, and the brain, p. 267, Figure 17.2, doi:10.1093/acprof:oso/9780199298723.001.0001, Copyright © Oxford University Press 2006, reproduced by permission of Oxford University Press.
MD is the form of focal hand dystonia with the highest prevalence (Altenmüller & Jabusch, 2009). Curiously, MD is about ten times more prevalent than corresponding focal dystonias in the general population. An estimated 1–2 percent of all musicians develop MD (Altenmüller, 2003). MD is the performance-related medical problem that is most likely to lead to long-term disability in musicians (Schuele & Lederman, 2004b). Because treatments are frequently suboptimal and usually incomplete, and because musicians’ identities are strongly intertwined with their profession, news of the MD diagnosis can be devastating. However, it should be mentioned that prognosis has improved over the last twenty years, and around 70 percent of musicians suffering from focal dystonia remain in their professions (Lee, Eich, Ioannou, & Altenmüller, 2015a).
Essential Characteristics of Musician’s Dystonia
Two characteristics of MD stand out: how symptoms are localized to a specific body part and how symptoms exhibit task specificity. Fig. 2 illustrates how the left hand, the right hand, and in some cases the facial muscles are differentially involved depending on the type of musical instrument (Frucht, 2009a; Jabusch & Altenmüller, 2006b). Instruments such as keyboards (piano, organ, harpsichord) and plucked instruments (guitar, electric bass) are associated with MD predominantly in the right hand. Bowed string instruments are associated with MD predominantly in the left hand (Altenmüller et al., 2012; Jabusch & Altenmüller, 2006b). It remains unclear to what extent the demands of music performance on the particular type of instrument factor into this lateral asymmetry. Many classical repertoires place great demands on both hands at the keyboard, for example. Fine motor control in other activities of daily living also contributes to susceptibility in the individual’s dominant hand (Baur, Jabusch, & Altenmüller, 2011). Naturally, brass players are most likely to exhibit embouchure dystonia. Also, brass players exhibit a higher ratio of embouchure to hand dystonia than woodwind players. This makes sense in light of the motor control demands of the instruments. Brass players precisely control the frequency and amplitude of lip vibrations by modulating embouchure muscle tension. The demands for woodwind instruments are different: embouchure adjustments do not require lip vibration, but finger movement patterns are more complex, explaining why hand dystonia is common among woodwind players but very rare among brass players (Altenmüller et al., 2012).
FIGURE 2. Symptom localization among the hands and embouchure in musician’s dystonia. Reproduced from Hans-Christian Jabusch and Eckart Altenmüller, Epidemiology, phenomenology, and therapy of musician’s cramp. In Eckart Altenmüller, Mario Wiesendanger, and Jürg Kesselring (Eds.), Music, motor control, and the brain, p. 267, Figure 17.3, doi:10.1093/acprof:oso/9780199298723.001.0001, Copyright © Oxford University Press 2006, reproduced by permission of Oxford University Press.
Task specificity refers to the phenomenon whereby symptoms appear only during certain tasks. It is a hallmark of many forms of dystonia, including not only those forms explicitly labeled as “task-specific dystonias,” but also other forms such as cranial and cervical dystonia where symptoms are often sensitive to the task context, such as whether or not a patient is talking. This characteristic is one of the reasons why many of the focal dystonias were historically considered psychiatric disorders long before they were considered neurologic (Marsden & Sheehy, 1990). Among the focal dystonias, MD exhibits some of the most exquisite task specificity. For many patients, the symptoms are present only while playing the instrument and, in some cases, only in specific passages of specific pieces (Jabusch & Altenmüller, 2006b; Lee, Tominaga, Furuya, Miyazaki, & Altenmüller, 2015b; Tubiana, 2003). Broadly defined, a “task” context can include other elements of the patient’s moment-by-moment sensorimotor state. Thus so-called “sensory tricks,” or the “geste antagoniste,” that patients can use to transiently alleviate symptoms, can be viewed as a particular manifestation of task specificity. As with other focal dystonias, some MD patients can benefit from sensory tricks (Paulig, Jabusch, Grossbach, Boullet, & Altenmüller, 2014). For example, some MD patients benefit from playing with a latex glove, or when holding an object such as a rubber gum between the fingers (Jabusch & Altenmüller, 2006a). Collectively, the localization and task specificity characteristics of MD are important features to keep in mind when assessing the patient’s severity and response to treatment.
The Importance of Rating Scales
As with most disorders, it is a common convention to carefully assess the patient before initiating treatment. Compared to characterizing severity, diagnosis is relatively straightforward and is not addressed further here. But evaluating the efficacy of treatments inherently requires some comparison of severity before and at various time points after treatment. The class of tools used to measure severity is often referred to as “rating scales” because they have historically most commonly involved a human (the clinician, the patient, or both) rating severity on a previously agreed-upon scale. Rating scales provide important outcome measures for clinical trials. They also commonly serve stratification purposes in genetic studies or as a regressor for research into pathophysiology. Thus they have become a critical-path tool for the whole pipeline of research into new treatments. Tailoring the Dystonia Study Group (2004) guidelines for musician’s dystonia, a maximally useful rating scale for MD should be: (1) reliable and valid, (2) sensitive to change, (3) specifically designed to measure MD, and (4) practical in a clinical setting (Spector & Brandfonbrener, 2005, 2007).
Rating Scales for Musician’s Dystonia
Motivated by Spector and Brandfonbrener’s initial effort (Spector & Brandfonbrener, 2007), Peterson and colleagues (Peterson, Berque, Jabusch, Altenmüller, & Frucht, 2013) conducted a critical and comprehensive review of MD rating scales in 2013. The latter used considerably less restrictive inclusion criteria to comprehensively review the use of rating scales in 135 articles on MD. They provided complete descriptions of the scales, variations in their use, and their properties relative to the Dystonia Study Group’s guidelines for clinical utility. They also systematically evaluated the distribution of each scale’s use in the literature, including studies involving various treatment approaches and pathophysiological assays. As shown in Fig. 3, the various scales can be divided into subjective and objective measures, with subjective being further subdivided into patient- or clinician-rated.
FIGURE 3. Rating scale use in the musician’s dystonia literature. (A): histogram of the number of scales used in each study; (B): number of studies using each type of scale (subjective by patient, subjective by clinician, objective, or combinations thereof); (C): number of studies using each scale, grouped by type. Reproduced from David A. Peterson, Patrice Berque, Hans-Christian Jabusch, Eckart Altenmüller, and Steven J. Frucht, Rating scales for musician’s dystonia, Neurology 81(6), 589–598, https://doi.org/10.1212/WNL.0b013e31829e6f72, Copyright © 2013 American Academy of Neurology.
A noteworthy objective scale is MIDI-based Scale Analysis (MSA). For MSA, keyboardists play 10–15 iterations of two octaves of the C major scale in the ulnar and radial directions, mezzo forte and legato, at a tempo of eight notes per second. Key press timing is recorded through a standard MIDI interface. Key press and release timing provide measures of tone durations, overlaps, and interonset intervals (IOIs). The standard deviation of IOIs (sdIOI) is used to quantify the temporal evenness with which the scales are performed. The sdIOI has provided excellent sensitivity and become the primary outcome measure in subsequent studies using MSA (Jabusch, Vauth, & Altenmüller, 2004c).
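As a rough illustration of the sdIOI idea, the following Python sketch computes the standard deviation of inter-onset intervals from a list of key-press onset times. It is a minimal sketch under stated assumptions, not the published MSA procedure: the function name `sd_ioi`, the use of the sample (rather than population) standard deviation, and the example onset times are all hypothetical choices made for illustration.

```python
from statistics import stdev


def sd_ioi(onsets_ms):
    """Return the standard deviation (ms) of inter-onset intervals.

    onsets_ms: key-press onset times in milliseconds, in playing order.
    Assumption: the sample standard deviation is used; the source does
    not say which estimator MSA applies.
    """
    if len(onsets_ms) < 3:
        raise ValueError("need at least three onsets to estimate IOI variability")
    # Inter-onset interval = time between consecutive key presses.
    iois = [later - earlier for earlier, later in zip(onsets_ms, onsets_ms[1:])]
    return stdev(iois)


# Hypothetical onset times for notes nominally 125 ms apart
# (eight notes per second, as in the MSA tempo described in the text).
even_onsets = [0, 125, 250, 375, 500]
uneven_onsets = [0, 110, 260, 370, 520]

print(sd_ioi(even_onsets))    # perfectly even playing gives zero variability
print(sd_ioi(uneven_onsets))  # uneven playing gives a larger sdIOI
```

A larger value indicates less even scale playing. In practice one would presumably compute this per scale run and per playing direction, but how those values are aggregated is not specified here.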
Rating Scale Deficiencies
Rating scales have provided a way to strengthen an assessment exercise that is otherwise largely qualitative by supplementing it with measures that are inherently quantitative. Curiously, only about half of all experimental studies in MD used quantitative assessments (Peterson et al., 2013). Unfortunately, none of the scales have been rigorously evaluated against the Dystonia Study Group’s criteria for a maximally useful rating scale (Spector & Brandfonbrener, 2007): reliable and valid, sensitive to change, practical in a clinical setting, and specifically tailored to MD. The subjective scales lack the sensitivity needed to compare treatments with similar efficacy because they have high inter-rater variability. They also lack digit-level specificity, which is central for many patients with FHD. Some of the rating scales used in MD, such as the Fahn-Marsden (FM) scale, Unified Dystonia Rating Scale (UDRS), and Global Dystonia Scale (GDS), were originally designed for generalized dystonia or focal forms other than MD. Although they convey “global impressions” based on clinical observation, they are not tailored to task-specific motor impairments (Peterson et al., 2013). A few scales, such as the TCS, FAM, and TRE, incorporate a symptom-evoking performance element. This is key for MD as it is inherently task-specific for most patients. However, only a small minority of past MD research has used these scales. Objective scales, e.g., based on kinematics or MIDI for hand dystonia or acoustic analysis of the fundamental in embouchure dystonia (Lee, Voget, Furuya, Morise, & Altenmüller, 2016), offer the benefit of mitigating the intra- and inter-rater variability intrinsic to subjective scales. However, objective scales require additional infrastructure, limiting their efficient use in the clinic.
Regardless of how MD severity is measured, the relative merits of those techniques should be considered in the context of choosing, evaluating, and updating strategies to treat MD.
Treating Musician’s Dystonia
Options for treating MD overlap substantially with those for other focal dystonias. Yet MD is one of the most difficult forms of dystonia to treat (Jankovic & Ashoori, 2008; Tubiana & Chamagne, 1993). In general, the treatments are aimed at the symptoms, not the causes or underlying pathophysiology. This is unsurprising given what little is known about the pathogenesis and pathophysiology of focal dystonias. A summary list of the treatments and rough stratification of their efficacy across a wide variety of MD patients is given in Fig. 4.
FIGURE 4. Treatment efficacy based on patient ratings. Black bars: deterioration. Gray bars: No change. White bars: alleviation. Hatched bars: no reply. Trhx, trihexyphenidyl; BT, botulinum toxin. Reproduced from Hans-Christian Jabusch, Dorothea Zschucke, Alexander Schmidt, Stephan Schuele, and Eckart Altenmüller, Focal dystonia in musicians: Treatment strategies and longterm outcome in 144 patients, Movement Disorders 20(12), 1623–1626, https://doi.org/10.1002/mds.20631, Copyright © 2005 Movement Disorder Society.
The primary oral medication that has been tried is trihexyphenidyl, an anticholinergic. However, at doses sufficient to demonstrate efficacy in adults it is usually accompanied by severe adverse side effects. Botulinum toxin injections into affected muscles have become the mainstay for treating focal dystonias. Their net effect is to block neurotransmission at the neuromuscular junction. In the case of MD, injection efficacy can be greatly enhanced by carefully selecting the muscle to inject, including by distinguishing primary from compensatory movements (Frucht, 2015) and by appropriately incorporating EMG and ultrasound guidance. The injections can offer some symptomatic relief for many. There are reports of particularly successful cases (e.g., Vecchio et al., 2012). However, botulinum toxins also have several limiting adverse side effects (Frucht, 2009b; Jankovic & Ashoori, 2008; Zeuner & Molloy, 2008), particularly when lateral finger movements are an important part of the motor repertoire. If not carefully planned and administered, they often lead to weakness that limits hand performance. Even with optimal doses and injection locations, efficacy tends to wear off, typically 2–4 months post injection. Thus injections need to be periodically repeated in perpetuity and there is the natural variation in efficacy associated with where one is in the treatment “cycle.” For the overwhelming majority of MD patients, currently available treatments are suboptimal. Because many MD patients are professional musicians who play at very high levels, the diagnosis can spell the end of their performance career (Conti et al., 2008; Frucht, 2009a). Thus MD remains one of the primary challenges in musician’s medicine (Jabusch & Altenmüller, 2006a; Rosset-Llobet, Candia, Molas, Cubells, & Pascual-Leone, 2009; Tubiana & Chamagne, 1993).
Physical Medicine and Rehabilitation
Pharmacologic treatment strategies suffer from a lack of specificity that is particularly important in MD. Systemic oral medications have an obvious lack of spatial selectivity. It is highly unlikely, for example, that there is a simple mapping between cholinergic receptors in various brain regions and alterations in brain circuitry that coincide with very specific motor repertoires selectively affected in music performance. There is also a mismatch between the slow pharmacokinetics of oral medications and the rapid temporal dynamics of MD symptoms. Botulinum toxins have inherently good spatial selectivity in the periphery, but again suffer from a mismatch in temporal dynamics. Given these deficiencies of oral medications and botulinum toxin approaches, and the characteristics of symptom localization and exquisite task specificity in MD, it is unsurprising that a diverse set of non-pharmacologic treatment strategies has been attempted. Prominent among the non-pharmacologic approaches are a variety of techniques under the umbrella of what can be called physical medicine and rehabilitation (PMR). These go by many names, including rehabilitation strategies, behavioral therapies, behavioral interventions, retraining programs, pedagogical retraining, technical exercises, non-specific exercises on the instrument, etc. In each case, they involve some subset of principles from PMR. They have been reviewed for their use in dystonia, including with foci of the hand and wrist, but more often for the writer’s cramp form of FHD than for MD per se (Bernstein et al., 2016; Valdes, Naughton, & Algar, 2014). In some cases, PMR approaches have involved temporarily limiting movement. Specifically, they involve immobilizing or modifying the range of motion of the affected motor system with some form of splint for a period of weeks to years (Priori, Pesenti, Cappellari, Scarlato, & Barbieri, 2001; Satoh, Narita, & Tomimoto, 2011), and can produce improvements that are sustained well beyond the end of the intervention (Priori et al., 2001). The PMR methods have a long history, going at least as far back as to predate recognition of FHD in the musician as dystonia (Hays, 1987). Collectively they are considered useful (Lederman, 2001), but often exhibit benefits that are mixed and/or transient, require months or even years of therapy (Sakai, 2006; Tubiana, 2000), and suffer from varied levels of compliance. The reasons are generally unclear, but for some approaches it may be because the methods are not tailored to each musician’s exquisite and very personalized motor repertoire.
In the case of most PMR approaches, replicating the results has either not been attempted or not been successful. Also, the studies have suffered from several design deficiencies, mostly because it is inherently difficult to incorporate careful controls and blinding. Some research into focal dystonia pathophysiology has been repurposed for therapeutic potential. Prominent examples include non-invasive brain stimulation methods such as transcranial magnetic stimulation (TMS) and direct current stimulation (DCS). They have been tried in isolation and in
conjunction with PMR techniques, with the hypothesis that they might amplify, accelerate, or make more persistent the beneficial effects of the PMR approaches in isolation (Furuya, Nitsche, Paulus, & Altenmüller, 2014; Rosset-Llobet et al., 2015). These brain stimulation methods offer the advantage that they can be administered in a control (placebo) fashion, thereby enabling blinded, controlled trials (Rosset-Llobet, Fabregas-Molas, & Pascual-Leone, 2015). TMS and DCS are discussed further in the next section on pathophysiology.
Pathophysiology versus Pathogenesis
Even in scientific forums, distinctions between association and causation are sometimes unclear. Characterizations of research in focal dystonia are no exception. One way we have attempted to make this distinction clearer in this chapter is by discussing matters of pathophysiology and pathogenesis in separate sections. Most if not all findings of pathological physiology in FHD, including MD, should be viewed as simple associations. Whether they are potential causes or consequences of the disorder is particularly difficult to determine. For example, even evidence that clinical recovery in MD coincides with normalization of receptive field topography (Candia, Wienbruch, Elbert, Rockstroh, & Ray, 2003) does not necessarily suggest that transitions from normal to abnormal receptive field topographies play a causal role.
Classic Concepts on Focal Dystonia Pathophysiology
Most of the information about the pathophysiology of MD is actually inferred from research including cohorts composed of some (or all) patients
with other forms of focal task-specific dystonias, especially writer’s cramp. General consensus is that most, but probably not all, physiological features are common across the focal task-specific dystonias. In this section, unless denoted otherwise, the pathophysiological features were found in studies with FHD patients, sometimes but not always including MD patients. Over the past few decades, a small handful of recurrent themes have characterized the pathophysiology of FHD (for a good review, see Hallett, 2006). These include abnormalities in inhibition, sensorimotor integration, and plasticity. Aspects of all of these have been documented at various nodes in the brain circuits mediating motor (and sensorimotor) function, as illustrated by fMRI (Haslinger et al., 2017; Oga et al., 2002), diffusion tensor imaging (DTI, Delmaire et al., 2009), and even gamma knife thalamotomies (Horisawa et al., 2017). Roughly speaking, the brain regions implicated include several somatosensory and somatomotor cortical areas, some association cortical areas, several basal ganglia nuclei, portions of the cerebellum, and motor and intralaminar nuclei in the thalamus. Brainstem and spinal circuits may also play a role, and have been indirectly implicated in some studies focused on alterations to reflex systems in paired pulse paradigms. Investigators have found a decrease in net inhibition at each level of the motor control circuitry. Perhaps more importantly, the decreased inhibition has more specifically been associated with a loss of spatial selectivity, and this has been put forth as a potential endophenotype for focal dystonia (Altenmüller et al., 2012). It has been documented in the spatial domain, as a blurred differentiation of individual digits at the behavioral and the neural levels in FHD (Bara-Jimenez, Catalan, Hallett, & Gerloff, 1998; Delmaire et al., 2005; Sohn & Hallett, 2004), including in MD (Elbert et al., 1998). 
It has also been documented in the time domain, as in reduced TMS-evoked silent periods (Chen et al., 1997; Ridding, Sheean, Rothwell, Inzelberg, & Kujirai, 1995) and defective compliance with the no-go signal in a stop-signal task (Ruiz et al., 2009). Interestingly, the exact nature of the selective inhibition in the somatosensorimotor system may be different between writer’s cramp and MD forms of FHD. Rosenkranz and colleagues (Rosenkranz et al., 2005) evaluated a TMS-based measure of fast, local inhibition in the cortex, the “short-latency intracortical inhibition” (SICI), in the context of simultaneous vibratory input to individual hand muscles. They found that although SICI was unchanged for neighboring muscles in writer’s cramp patients, it was suppressed selectively in neighboring muscles that are functionally connected with the vibrated muscles in healthy musicians but non-selectively in MD patients. They hypothesized that musicians’ extensive practice produces the altered surround inhibition that later progresses into a non-focal pattern seen in MD. There is also evidence for altered sensorimotor integration in FHD. Part of this characterization stems from the ambiguity regarding when “sensory” ends and “motor” begins in the nervous system, both functionally and anatomically. The lines of demarcation are acutely blurred in music, wherein tight, temporally nested sensorimotor loops are central and critical to music comprehension and production. Notwithstanding semantics, pathological sensorimotor integration has evolved as a consistent theme in focal dystonia pathophysiology. Both the functional manifestations and circuit bases have been recently reviewed (Avanzino, Tinazzi, Ionta, & Fiorio, 2015) and will not be covered in detail here. Notably, although the Rosenkranz et al. (2005) study focused on measures of spatial inhibition with the SICI measure of cortical physiology, it also intrinsically investigated sensorimotor integration by combining somatosensory input and motor evoked potentials. The focus of this and other previous sensorimotor integration research has been primarily on spatial selectivity, with generally fixed temporal parameters. However, an emergent theme in focal dystonia research is that time and timing may play particularly critical roles.
Dystonia and Disordered Timing
Although intuition would suggest exquisite temporal processing in musicians, and some measures indicate that timing abilities are normal in MD (van der Steen, van Vugt, Keller, & Altenmüller, 2014), there is also evidence for disordered temporal processing in MD. Among the many potential MIDI output variables in Jabusch’s MSA rating scale, one of the strongest findings in MD was the variability in the inter-onset key press interval (sdIOI; Jabusch et al., 2004c). Also, the temporal dynamics of premovement brain activity are smeared in dystonia relative to controls (Gilio et al., 2003). This is not surprising given that SICI phenomena show exquisite timing sensitivity (Rosenkranz, 2010) and in light of a recent review triangulating between time processing, motor circuits, and movement disorders including dystonia (Avanzino et al., 2016). At the intersection of sensorimotor integration and timing is the psychophysical measure known as the temporal discrimination threshold, or TDT. In the case of the auditory TDT, for example, subjects are given two brief auditory tones with rapid onsets and offsets separated by a brief and variable interval. The interval is adjusted in a bi-directional staircase fashion to determine the TDT, that is, the interval below which subjects cannot distinguish the two separate stimuli and perceive them as one. The visual TDT has been shown to be abnormal (high) in many forms of dystonia (Hutchinson et al., 2013), including MD when musicians rather than non-musicians are used as study controls (Killian et al., 2017). The TDT has also been put forth as a candidate endophenotype (Hutchinson et al., 2013), because it has been shown to be abnormal in non-manifesting carriers of the DYT1 form of familial dystonia (Fiorio et al., 2007) and it is present in the unaffected hand, suggesting it is not secondary to symptoms (Bara-Jimenez, Shelton, Sanger, & Hallett, 2000; Fiorio, Tinazzi, Bertolasi, & Aglioti, 2003).
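The bi-directional staircase used to estimate a TDT can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that the text does not specify: a one-up/one-down rule, a fixed step size, termination after a fixed number of reversals, and a threshold estimate taken as the mean of the reversal intervals. The function name `staircase_tdt`, its parameters, and the simulated observer are hypothetical.

```python
def staircase_tdt(perceives_two, start_ms=100.0, step_ms=5.0,
                  max_reversals=8, max_trials=500):
    """Estimate a temporal discrimination threshold with a simple staircase.

    perceives_two: callable taking an interval (ms) and returning True when
    the subject reports two separate stimuli, False when they fuse into one.
    Assumed rule: shrink the interval after "two", grow it after "one".
    """
    interval = start_ms
    last_direction = 0          # -1 = shrinking, +1 = growing, 0 = not yet set
    reversals = []
    for _ in range(max_trials):
        direction = -1 if perceives_two(interval) else 1
        # A reversal is a change in the direction of adjustment.
        if last_direction and direction != last_direction:
            reversals.append(interval)
            if len(reversals) == max_reversals:
                break
        last_direction = direction
        interval = max(0.0, interval + direction * step_ms)
    if not reversals:
        raise RuntimeError("staircase never reversed; check the start value and step")
    return sum(reversals) / len(reversals)


# Hypothetical, noise-free observer whose true threshold is 42 ms.
estimate = staircase_tdt(lambda interval_ms: interval_ms >= 42.0)
print(estimate)
```

With this idealized observer the staircase oscillates between the two step values that bracket the true threshold, so the estimate lands between them. Real subjects respond probabilistically, which is why actual protocols typically use more trials and often different stepping rules.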
Plasticity
Plasticity is an overused and sometimes misinterpreted term in the neurosciences. In the simplest sense, it refers to the ability of the system to change. It has manifestations at behavioral and several physiological levels, and likely has bases at the circuit and molecular levels, historically couched in terms of changes in the network of synapses among neurons, and the molecular signaling pathways that modulate the strength of those synapses at different timescales. In the context of dystonia, plasticity can refer to the near-real-time adaptations of sensorimotor systems, as for example a greater response than normal to paired associative stimulation protocols, whereby the relative timing of sensory stimuli and exogenous stimulation of motor cortical areas can strengthen that sensory system’s subsequent ability to evoke the motor response within the same experimental session. Plasticity can also refer to the systems-level homologues of the synaptic plastic processes of long-term potentiation and depression (LTP and LTD), in the form of high- and low-frequency repetitive stimulation with TMS to potentiate or depress subsequent cortical excitability. At a drastically slower timescale, plasticity can refer to the slow, often insidious changes in brain circuitry that likely underlie the pathogenesis of dystonia. There are also the notions of “metaplasticity” and “homeoplasticity” which, among other things, are thought to regulate the primary plastic processes already discussed. In the case of MD, one theory is that musical practice in healthy musicians is associated with (indeed, likely relies upon) beneficial plastic adaptation in the motor cortex, including for example a reduction in motor thresholds and increase in motor excitability, and that in MD patients these processes have progressed too far and begin to compromise, rather than enhance, movement patterns (Altenmüller & Jabusch, 2010b). Thus musicians may have (or take advantage of) greater primary plastic capabilities and MD patients may have dysfunctional metaplastic processes that regulate the primary plastic processes in an abnormal way. A great mystery in MD research is how such dysfunctional plastic processes are initiated and used in generating the disorder.
Biological Predisposition
With relatively rare exceptions, most central nervous system disorders, including most forms of dystonia, likely arise from some complex combination of genetic and environmental factors. In the adult onset dystonias, this is usually conceptualized as a genetic predisposition followed by some environmental “trigger.” In this chapter, we formulate a theoretical framework for the pathogenesis of MD that is organized in a similar fashion: biological predisposition and “use patterns” (see Fig. 5). Biological predispositions include genetics, of course, but also gender. “Use patterns” is a broad umbrella term that refers to how the sensorimotor systems are used over protracted periods of time. As such, it can be considered a category of “environmental” factors.
FIGURE 5. A theoretical framework for pathogenic factors. The probability of developing MD increases with increasing abnormality in use patterns and/or biological predisposition. Several factors influence use patterns, including the spatiotemporal demands of the instrument, overuse, and situations that modify peripheral constraints, which in turn are variously influenced by the performance demands on the patient. Biological predisposition consists of innate characteristics of the patient, from the personality profile down to the level of properties of inhibition in brain circuits and the intricate molecular underpinnings of synaptic plasticity, all in turn influenced by the patient’s gender and genetics.
Early studies described positive family history of dystonia as a risk factor for development of MD (Jankovic & Shale, 1989; Lim & Altenmüller, 2003; Schmidt et al., 2006), and this connection has been strengthened in multiple subsequent studies, including findings compatible with a pattern of autosomal-dominant inheritance (Baur et al., 2011; Jabusch & Altenmüller, 2006b; Schmidt et al., 2009). But specific abnormal genes specifically associated with MD have remained elusive. Genetics are also probably implicated in personality traits, and MD patients tend to exhibit exaggerated perfectionism and social phobias not seen in healthy musicians (Jabusch & Altenmüller, 2004; Jabusch, Müller, & Altenmüller, 2004a). Another aspect of biological predisposition is gender. For practically all other forms of dystonia, the M:F ratio ranges from 1:2 to 1:4. Curiously,
however, the overwhelming majority of MD patients are male (Lederman, 1991), and this has been confirmed in large cohorts (Jabusch & Altenmüller, 2006b; Lim & Altenmüller, 2003). Indeed, the 5:1 M:F ratio is corrected to 6:1 when taking into account the slight predominance of female musicians in the musician’s population in Germany (see Fig. 6) (Lim & Altenmüller, 2003).
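One way to read the correction from 5:1 to 6:1 is as the M:F ratio among patients divided by the M:F ratio among musicians in general. The short Python sketch below works through that arithmetic. It rests on assumptions: the source does not state the formula, the function name `corrected_sex_ratio` is hypothetical, and the population ratio of 5:6 is not a reported figure but simply the value that would reproduce 6:1 under this reading.

```python
from fractions import Fraction


def corrected_sex_ratio(patient_m_to_f, population_m_to_f):
    """Adjust a patient M:F ratio for the M:F ratio of the source population.

    Assumed model: corrected ratio = patient ratio / population ratio.
    """
    return patient_m_to_f / population_m_to_f


observed = Fraction(5, 1)      # 5:1 M:F among MD patients, as stated in the text
population = Fraction(5, 6)    # hypothetical: slightly more female than male musicians
corrected = corrected_sex_ratio(observed, population)
print(corrected)               # 6, i.e. a 6:1 M:F ratio
```

Under this reading, a slight female majority among musicians makes the male excess among patients more pronounced than the raw ratio suggests.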
FIGURE 6. Gender distribution of 591 patients and 2651 healthy musicians, in relative ratios. Reproduced from Eckart Altenmüller, Volker Baur, Aurélie Hofmann, Vanessa K. Lim, and Hans-Christian Jabusch, Musician’s cramp as manifestation of maladaptive brain plasticity: Arguments from instrumental differences, Annals of the New York Academy of Sciences 1252, 259–265, doi:10.1111/j.1749-6632.2012.06456.x Copyright © 2012, John Wiley and Sons.
The mechanisms by which genetics and gender contribute to MD pathogenesis are an almost complete mystery, but they likely involve differential hormonal contributions to synaptic plasticity and neuronal inhibition, as well as macroscopic personality traits like stress, anxiety, and perfectionism (Altenmüller et al., 2012), which appear to be present at
higher levels in MD (Altenmüller & Jabusch, 2010a). Since the first description of musician’s dystonia, in the case of Robert Schumann (Altenmüller, Kesselring, & Wiesendanger, 2006) psychological triggering factors have been discussed. Indeed when tested carefully, musicians suffering from dystonia can be clustered into two groups, those with preexisting anxiety disorders and dysfunctional psychological coping strategies, leading to stressful personalities and those with no pathophysiological signs (Ioannou & Altenmüller, 2014; Ioannou, Furuya, & Altenmüller, 2016). Interestingly, in musicians with anxiety disorders dystonia manifests itself about eight years earlier (Ioannou et al., 2016). Thus it seems that MD is not a uniform nosological entity, but can instead be classified into two forms: a predominantly “motor” manifestation and a manifestation with accompanying non-motor symptoms, such as constraints, anxiety, etc. Intriguingly, these two types of manifestations are not a dichotomy but may overlap (Ioannou et al., 2016). Whether this dimension depends on gender remains to be determined. Curiously, gender appears to influence the TDT (Williams et al., 2015), and this sexual dimorphism is also age-related (Butler et al., 2015). Endophenotypes like altered spatiotemporal inhibition and the TDT may play a critical intermediary role in discovering MD pathogenesis, helping to provide a link between biological predisposition, the contribution of use patterns, and the phenotypic characteristics of the disorder.
Use Patterns
Several features of MD suggest that how the sensorimotor systems are used in music performance may be a factor contributing to the development of MD. Motor workload and movement complexity appear to be risk factors. MD is more likely to appear in the hand with higher demands of spatiotemporal precision, such as the right hand on keyboards and plucked instruments, and the left hand in bowed string players. Among the string instruments, MD appears to be more prevalent on string instruments with shorter string lengths, such as the violin versus double-bass (Altenmüller et al., 2012; Jabusch & Altenmüller, 2006b). In fact, the relative absence of MD documented in double bassists may be due to a lack of simultaneous
finger action (Conti et al., 2008). Another apparent risk factor is the type of musical performance. Classical musicians seem to be more at risk of developing MD than pop or jazz musicians. A hypothesized reason why is that classical music involves higher expectations of temporally precise reproduction and less opportunity for improvisation than pop and jazz (Jabusch & Altenmüller, 2006b). Professional classical musicians have also typically undergone many tens of thousands of hours of practice, involving movements that are extensively trained, representing an “oversampling” of a disproportionately narrow part of the space of possible sensorimotor mappings. Finally, in some cases, peripheral issues such as prolonged pain syndromes, nerve entrapment, and subclinical range of motion limitations (Charness et al., 1996; Leijnse, Hallett, & Sonneveld, 2015) that can precede MD onset may induce compensatory changes in use patterns that are pathogenic and would not otherwise occur in a healthy musician’s motor repertoire. Furuya (Furuya & Hanakawa, 2016) suggests that such dysfunctional adaptations of body representations in somatosensory and motor systems may be an intermediate point on the path toward full MD development. It should be noted that these are all retrospective observational results and there have been no controlled, prospective studies regarding these factors. As with genetics and gender, the mechanisms by which use patterns contribute to MD pathogenesis are unclear. But having a theoretical framework can help provide a rational basis for future experimental research.
Better Assessment
Among the numerous unmet needs in dystonia (Albanese, 2017), one of the designated research priorities in focal task-specific dystonias is more precise methods for characterizing and assessing the phenotype (Richardson et al., 2017). This is acutely evident for MD. For over a decade dystonia researchers have suggested that a new rating scale for MD that is reliable, valid, sensitive, and specific to MD is sorely needed (Jankovic & Ashoori, 2008; Spector & Brandfonbrener, 2007). Peterson and colleagues (Peterson et al., 2013) comprehensively summarized the state of affairs in rating scale use for MD. As alluded to in the section “Rating Scale Deficiencies,” none of the existing scales have been completely and rigorously evaluated for reliability and validity, sensitivity to change, practicality in a clinical setting, and specificity to MD. Exacerbating the concerns about reliability and sensitivity to change, most of the existing rating scales are based on human judgments, making them inherently subjective. Further developments in objective rating instruments that would make them more readily applicable in the clinical setting would help mitigate these issues. As evident in the MD rating scale review (Peterson et al., 2013), there appears to be no standard choice for rating scale(s) in MD research; most studies use only one or two rating scales but the scales used vary widely from one study to the next. When trying to compare and reconcile multiple treatment studies, this makes it difficult to distinguish treatment effects from measurement effects. Likewise, non-standard selection of rating scales diminishes the collective research value of pathophysiology studies. In summary, as depicted in Fig. 7, we suggest that the migration to a more ideal mode for assessing MD severity would include two categories of improvements: (1) standardizing the choice of rating scale(s) used across studies, and (2) increasing the efficiency with which a small number of rating scales can completely cover the conceptual space of measurements one wants to make in the disorder. This mode would accelerate research into both the pathophysiology of MD and improved treatments.
FIGURE 7. Toward ideal rating scale use. Upper left: current rating scale use is neither consistent across studies nor efficient. Upper right: efficient rating scale use would minimize the number of scales used in each study (in this example depiction, to two scales per study). Lower left: consistent rating scale use would ensure that the same rating scales are used across all studies. Lower right: the ideal scenario would be both consistent and efficient by pushing the envelope on both: a consistent application of the minimal number of scales across all studies.
New Treatments
Another designated research priority in focal task-specific dystonias is innovative clinical trial design that takes into account the tremendous heterogeneity in the presentation of these dystonias from one patient to the next (Richardson et al., 2017). Despite concerted efforts to evaluate an array of new treatment approaches, most evaluations have been small, unblinded, retrospective studies. Clearly, we need new trials that are controlled, blinded, prospective, and randomized. Unfortunately, these designs are very difficult to implement in the space of PMR treatments. This poses a creative challenge for the field of MD, given the likely prominent role of PMR in MD treatment. Relatedly, given the heterogeneity in MD manifestation, treatments should logically be tailored to the individual patient. This is also challenging in the context of the common research goal of reproducibility. Yet there is a persistent need for new therapies for MD. Although many patients manage to stay in the field with currently available therapies, most have to make substantial compromises in their level of music performance. Outcomes are particularly limiting for patients with embouchure dystonia (Frucht et al., 2001; Jabusch & Altenmüller, 2006b). Given the central theme of disordered temporal processing in MD, we hypothesize that the highly refined timing characteristics of the sensorimotor systems in professional musicians are critical not only to understanding the disorder but also to optimizing treatment. Simply put, patients get stuck in patterns of inappropriate motor sequences, and greater attention to the critical role of time in related auditory perception and motor performance could help them overcome these dysfunctional patterns. Future treatment research should evaluate novel non-pharmacological interventions for MD that are focused on timing and determine whether and how their efficacy is related to psychophysical measures of temporal processing, such as the TDT. With respect to the motor performance aspect of timing, one form of PMR-style intervention explicitly focused on timing is slow-down exercises (SDE; Sakai, 2006).
SDE is an MD rehabilitation strategy in which patients slow down the tempo of their symptom-evoking performance pieces until symptoms resolve, and then gradually increase the tempo back to the original speed as long as symptoms do not return. To our knowledge, SDE has not been evaluated by groups independent of the original developers. Moreover, the original application of SDE required many weeks of retraining to establish effects. Nevertheless, approaches like SDE merit further systematic, controlled investigation by independent groups.
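As a rough illustration of the procedure just described, the two phases of SDE (slow down until symptoms resolve, then speed up while symptoms stay away) can be written as a short loop. This is a minimal sketch under stated assumptions, not the protocol of Sakai (2006): the function name `slow_down_schedule`, the callback `symptoms_at`, the step sizes, and the tempo floor are all hypothetical choices made for illustration only.

```python
from typing import Callable, List


def slow_down_schedule(
    target_bpm: float,
    symptoms_at: Callable[[float], bool],  # hypothetical clinical judgment at a given tempo
    down_step: float = 0.10,               # assumed fractional tempo reduction per step
    up_step: float = 0.05,                 # assumed fractional tempo increase per session
    min_bpm: float = 20.0,                 # assumed floor so the loop always terminates
    max_sessions: int = 100,
) -> List[float]:
    """Return the practice tempo used in each session (illustrative only)."""
    tempo = target_bpm

    # Phase 1: slow down until symptoms resolve (or the floor is reached,
    # in which case symptoms may still be present at the returned tempo).
    while symptoms_at(tempo) and tempo > min_bpm:
        tempo = max(min_bpm, tempo * (1.0 - down_step))
    history = [tempo]

    # Phase 2: raise the tempo step by step, but only while symptoms stay away.
    for _ in range(max_sessions):
        if tempo >= target_bpm:
            break
        candidate = min(target_bpm, tempo * (1.0 + up_step))
        if symptoms_at(candidate):
            break  # hold at the last symptom-free tempo
        tempo = candidate
        history.append(tempo)
    return history
```

In practice the symptom threshold would be expected to shift over weeks of retraining, so `symptoms_at` would be a repeated clinical assessment rather than a fixed rule; the fixed rule is used here only to keep the sketch self-contained.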
TMS offers a non-invasive method not only to study MD but also, typically in the form of repetitive TMS (rTMS), to modulate it. rTMS and its surface-voltage counterpart, transcranial direct current stimulation (tDCS), have been evaluated as potential treatments for FHD, mostly in writer’s cramp (Cho & Hallett, 2016; Obeso, Cerasa, & Quattrone, 2016) but also in MD (Furuya et al., 2014; Kieslinger, Holler, Bergmann, Golaszewski, & Staffen, 2013). Because it inherently involves stimulation with temporal precision at millisecond timescales, TMS provides a potentially complementary physiologic counterpart to the PMR methods that operate at behavioral timescales. Although research on their combined utility as FHD therapy has thus far shown only mixed results (Kimberley, Schmidt, Chen, Dykstra, & Buetefisch, 2015), we expect further trials in the near future. Although stereotactic surgeries, especially deep brain stimulation (DBS) in the globus pallidus internum (GPi), have demonstrated efficacy in generalized dystonia, there are relatively few reported series in focal dystonias, and to our knowledge none in MD. Future advances in DBS technology, including closed-loop designs, may be able to incorporate task context (either behaviorally or physiologically) and facilitate symptom reduction that is appropriately context-dependent for focal task-specific dystonias. However, as with botulinum toxin injections, the exquisite spatiotemporal demands of music performance will probably make MD one of the last indications for this treatment. Although lesion methods have fallen out of favor with the advent of DBS, there is one reported case of a successful thalamotomy in an MD patient who was refractory to oral medications and botulinum toxin (Horisawa et al., 2017). Endogenous cannabinoid receptors play an important role in, among other things, synaptic plasticity processes in the basal ganglia. Thus, they are a rational target for dystonia treatment.
Jabusch and colleagues reported positive though transient benefits from a single dose of THC in a pianist with MD (Jabusch, Schneider, & Altenmüller, 2004b), but subsequent controlled studies in more broadly defined movement disorder populations have produced mixed results (Kluger, Triolo, Jones, & Jankovic, 2015; Koppel et al., 2014). Since successful treatment is still a challenge, preventing musician’s dystonia is important. Although prospective studies are lacking, avoidance of triggering factors, such as chronic pain, overuse, anxiety, and mechanical repetitions, is important and may prevent manifestation of MD, especially in those musicians with genetic susceptibility.
Research on Pathophysiology
The sensorimotor systems employed during music performance operate at high rates and with great temporal precision. Thus, future basic and clinical research in MD should specifically measure and modulate the motor control system with a focus on timing. The TDT is one obvious paradigm for pursuing this. However, the TDT is usually measured in the visual or somatosensory domains, whereas the auditory modality is particularly important for musicians. Future work should therefore include the auditory TDT to measure the temporal precision of auditory processing in MD patients. Research into temporal processing in MD should also be carefully integrated with previous themes in pathophysiology, such as altered surround inhibition. We expect that the TDT deficits in dystonia that have been interpreted as a time-domain version of reduced surround inhibition (Tamura et al., 2009) are not independent from but actually interact with alterations in the precision of “spatial” surround inhibition processes. Recent evidence that surround inhibition can be modulated by attention (Kuhn, Keller, Lauber, & Taube, 2018) provides a timely segue to our other recommendation for future MD pathophysiology research: we should allocate some resources to attention. Although difficult to assay in nonhuman primates, attention may have been a factor in a monkey model of FHD (Byl, Merzenich, & Jenkins, 1996). Differential use of attention also seems to be a factor in PMR approaches to MD (Brian Hays, personal communication). Attentional modulation has likewise probably been a key element in what is sometimes labeled psychogenic dystonia, because that label is often based on tests of distractibility. If a symptom is modulated when attention is directed to or away from a motor function, is it still organic dystonia? Regardless of what we call it, attention seems to play a role.
Indeed, allocation of attention has itself been considered a form of action selection, and therefore attentional focus can be considered part of the task-dependent aspect inherent in most focal dystonias. Unfortunately, attention can be difficult to measure and is often not considered in evaluating overt motor function. But simple gaze monitoring may provide a reasonable first step. The brain circuits mediating attention are widespread but likely rely heavily upon thalamic systems that have thus far been relatively understudied in neuroscience. Yet the thalamus is a central node mediating communication among many motor systems including the cerebellum, cortex, striatum, and of course brainstem. Indeed, Hutchinson and colleagues (Hutchinson et al., 2013) have postulated that the projection from the superior colliculus to the striatum via the intralaminar nuclei of the thalamus mediates the TDT. Future research on the pathophysiology of dystonias, including MD, should therefore redirect attention to timing, attentional focus, and the thalamus.
Research on Pathogenic Mechanisms
The framework of pathogenic theory we discussed earlier motivates a few directions for future research into pathogenic mechanisms. An implied but not explicit element in the framework is reinforcement learning (RL). Similar to the account laid out for the cranial dystonias (Peterson & Sejnowski, 2017), we hypothesize that the pathogenesis of MD results from a pathological RL process influenced by both the biological predisposition and use pattern categories of pathogenic factors. In the simple computational theory of RL, there is a mapping from states to actions that is learned through trial and error and biased by reinforcement signals. The concept is particularly appropriate for task-specific dystonias such as MD, because action selection (i.e., the choice of the next motor output) is explicitly influenced by the current “state,” encompassing not only sensory state but also the context of the instrument and the current “task” (e.g., performing a certain piece) (Altenmüller & Müller, 2013). In their neural instantiations, RL systems are composed of a network of neurons whose matrices of synaptic connections represent that state-to-action mapping, and whose weight changes correspond to the learning process. Reinforcement signals are thought to come from rewards and “punishments,” which can be exogenous and/or endogenous. Some consider music as a language of emotions, covering a spectrum from negative to positive valence. These are
experienced not only by the music consumer, but also by the producer. Professional classical musicians experience both the fear of failure in a system that emphasizes precision and reproducibility and the joy of performing (Altenmüller & Jabusch, 2009). These factors may influence and amplify endogenous reward signals used in the brain’s RL systems. Much, but likely not all, of the brain’s implementation of RL involves dopamine-mediated signaling in the primary input nucleus of the basal ganglia, the striatum. One of the most classic interpretations of phasic dopamine signaling has been the encoding of unpredicted levels of reward, i.e., “reward prediction errors” (Schultz & Dickinson, 2000). There is a large, diverse body of literature suggesting that dopamine dynamics in the striatum may play some role in a wide variety of dystonias (Peterson, Sejnowski, & Poizner, 2010), and this has subsequently been supported by genetic evidence in some focal dystonias (Fuchs et al., 2013). Structural and functional abnormalities have been found in the basal ganglia in FHD (Peller et al., 2006; Zeuner et al., 2015), and the projections from cortex to striatum likely play a critical role in representing temporal information at behavioral timescales (Meck, Penney, & Pouthas, 2008). Unsurprisingly then, the striatum has also been implicated as a key structure related to the TDT (Bradley et al., 2009; Pastor, Macaluso, Day, & Frackowiak, 2008). We suggest that future research into MD (and, for that matter, more broadly defined focal dystonias) take this theoretical framework into account when designing experiments as well as computational model simulations of the system, as has already been done by coding malleable sensory representations with neighborhood-preserving self-organizing maps that simulate the task-dependence of MD (Altenmüller & Müller, 2013). Progress in genomics has enabled whole-exome sequencing at ever decreasing costs.
However, the relatively low number of MD patients limits what might be inferred from an unbounded search across whole genomes. If appropriate priors are taken into account, though, the search for statistically meaningful genetic associations with MD may be tractable. Such priors could be informed by, for example, (1) genetic findings from other focal dystonias, which likely share many aspects of biological predisposition with MD, (2) genes associated with molecular pathways that are sexually dimorphic, and (3) genes associated with molecular pathways that underlie and influence cellular and circuit-level physiology such as synaptic plasticity and altered neuronal inhibition.
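One simple way to see why narrowing the search helps with a small patient cohort is multiple-comparison arithmetic. The sketch below deliberately swaps in a frequentist stand-in, a Bonferroni correction over a restricted candidate set, for the priors discussed above; it is not a method proposed in this chapter. The gene counts are assumptions: roughly 20,000 human protein-coding genes for an exome-wide search, and a purely hypothetical set of 200 candidate genes chosen from the three kinds of prior information listed.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


ALPHA = 0.05
N_EXOME_GENES = 20_000    # assumption: approximate number of human protein-coding genes
N_CANDIDATE_GENES = 200   # hypothetical size of a prior-informed candidate set

exome_wide = bonferroni_threshold(ALPHA, N_EXOME_GENES)
candidate_set = bonferroni_threshold(ALPHA, N_CANDIDATE_GENES)
```

Under these assumed numbers the per-gene threshold relaxes by a factor of 100, from 2.5e-6 to 2.5e-4, which is the sense in which a constrained search can be more tractable when the number of patients is small.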
Given the high dimensionality and complexity of not only the genomics but also the rich motor repertoires inherent in a musician’s “use patterns,” theoretical frameworks and computational models may help provide a tractable path forward for simulating and studying what gives rise to MD. Ultimately, this knowledge should in turn provide guidance on how to reverse and prevent the disorder.
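The state-to-action mapping at the center of the RL framing in this section can be made concrete with a small tabular example. The sketch below is a generic illustration of learning from reward prediction errors, not a model of dystonia and not the authors' simulation: the function names `train` and `greedy_policy`, the learning rate, the exploration rate, and the toy reward rule are all assumptions introduced here. States stand in for task contexts (for example, a particular piece on a particular instrument) and actions for candidate motor outputs.

```python
import random
from typing import Callable, List


def train(
    reward: Callable[[int, int], float],  # hypothetical reinforcement for (state, action)
    n_states: int,
    n_actions: int,
    episodes: int = 2000,
    learning_rate: float = 0.1,
    epsilon: float = 0.1,                 # probability of a random exploratory action
    seed: int = 0,
) -> List[List[float]]:
    """Learn a table of expected reinforcement for each state-action pair."""
    rng = random.Random(seed)
    value = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = rng.randrange(n_states)
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # trial and error
        else:
            action = max(range(n_actions), key=lambda a: value[state][a])
        # Reward prediction error: received minus expected reinforcement.
        prediction_error = reward(state, action) - value[state][action]
        # The table entry plays the role of a synaptic weight nudged by the error.
        value[state][action] += learning_rate * prediction_error
    return value


def greedy_policy(value: List[List[float]]) -> List[int]:
    """The learned state-to-action mapping: best-valued action in each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in value]


# Toy reward rule (assumed): each context has exactly one appropriate motor output.
table = train(lambda s, a: 1.0 if a == s else 0.0, n_states=3, n_actions=3)
policy = greedy_policy(table)
```

Because the mapping is indexed by state, the same learner can settle on different actions in different contexts, which is the property that makes the framing a natural fit for task-specific symptoms. Altering the reinforcement signal for a single state in such a table changes behavior in that context only.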
Acknowledgments
Peterson wishes to thank Dominique Sy for assistance generating Figs. 5 and 7. Peterson acknowledges partial support from the Dystonia Coalition (NS065701 and TR001456), from the Office of Rare Diseases Research at the National Center for Advancing Translational Sciences and the National Institute of Neurological Disorders and Stroke, the Bachmann-Strauss Dystonia & Parkinson Foundation, the Benign Essential Blepharospasm Research Foundation, UCSD’s Kavli Institute for Brain and Mind, the National Institute of Mental Health (NIMH 5T32-MH020002), the National Science Foundation (the Temporal Dynamics of Learning Center, a Science of Learning Center [SMA-1041755] and the program in Mind, Machines, Motor Control [EFRI-1137279]), the Howard Hughes Medical Institute, and the Congressionally Directed Medical Research Program (W81XWH-17-10393). Altenmüller acknowledges support from BMBF Project Dystract: 100255620.
References
Albanese, A. (2017). Editorial: Unmet needs in dystonia. Frontiers in Neurology 8. Retrieved from https://doi.org/10.3389/fneur.2017.00197 Albanese, A., Bhatia, K., Bressman, S. B., Delong, M. R., Fahn, S., Fung, V. S. C., … Teller, J. K. (2013). Phenomenology and classification of dystonia: A consensus update. Movement Disorders 28(7), 863–873. Altenmüller, E. (2003). Focal dystonia: Advances in brain imaging and understanding of fine motor control in musicians. Hand Clinics 19(3), 523–538. Altenmüller, E., Baur, V., Hofmann, A., Lim, V. K., & Jabusch, H. C. (2012). Musician’s cramp as manifestation of maladaptive brain plasticity: Arguments from instrumental differences. Annals of the New York Academy of Sciences 1252, 259–265.
Altenmüller, E., & Jabusch, H. C. (2009). Focal hand dystonia in musicians: Phenomenology, etiology, and psychological trigger factors. Journal of Hand Therapy 22(2), 144–154; quiz 155. Altenmüller, E., & Jabusch, H. C. (2010a). Focal dystonia in musicians: Phenomenology, pathophysiology, triggering factors, and treatment. Medical Problems of Performing Artists 25(1), 3–9. Altenmüller, E., & Jabusch, H. C. (2010b). Focal dystonia in musicians: Phenomenology, pathophysiology and triggering factors. European Journal of Neurology 17(1), 31–36. Altenmüller, E., Kesselring, J., & Wiesendanger, M. (2006). Music, motor control and the brain. Oxford: Oxford University Press. Altenmüller, E., & Müller, D. (2013). A model of task-specific focal dystonia. Neural Networks 48, 25–31. Avanzino, L., Pelosin, E., Vicario, C. M., Lagravinese, G., Abbruzzese, G., & Martino, D. (2016). Time processing and motor control in movement disorders. Frontiers in Human Neuroscience 10. Retrieved from https://doi.org/10.3389/fnhum.2016.00631 Avanzino, L., Tinazzi, M., Ionta, S., & Fiorio, M. (2015). Sensory-motor integration in focal dystonia. Neuropsychologia 79(Part B), 288–300. Bara-Jimenez, W., Catalan, M. J., Hallett, M., & Gerloff, C. (1998). Abnormal somatosensory homunculus in dystonia of the hand. Annals of Neurology 44(5), 828–31. Bara-Jimenez, W., Shelton, P., Sanger, T. D., & Hallett, M. (2000). Sensory discrimination capabilities in patients with focal hand dystonia. Annals of Neurology 47(3), 377–380. Baur, V., Jabusch, H. C., & Altenmüller, E. (2011). Behavioral factors influence the phenotype of musician’s dystonia. Movement Disorders 26(9), 1780–1781. Bernstein, C. J., Ellard, D. R., Davies, G., Hertenstein, E., Tang, N. K. Y., Underwood, M., & Sandhu, H. (2016). Behavioural interventions for people living with adult-onset primary dystonia: A systematic review. BMC Neurology 16. doi:10.1186/s12883-016-0562-y Bradley, D., Whelan, R., Walsh, R., Reilly, R. 
B., Hutchinson, S., Molloy, F., & Hutchinson, M. (2009). Temporal discrimination threshold: VBM evidence for an endophenotype in adult onset primary torsion dystonia. Brain 132(9), 2327–2335. Brandfonbrener, A. G. (1995). Musicians with focal dystonia: A report of 58 cases seen during a 10-year period at a performing-arts medical-center. Medical Problems of Performing Artists 10, 121–127. Brandfonbrener, A. G., & Robson, C. (2004). Review of 113 musicians with focal dystonia seen between 1985 and 2002 at a clinic for performing artists. Advances in Neurology 94, 255–256. Butler, J. S., Beiser, I. M., Williams, L., McGovern, E., Molloy, F., Lynch, T., … Hutchinson, M. (2015). Age-related sexual dimorphism in temporal discrimination and in adult-onset dystonia suggests GABAergic mechanisms. Frontiers in Neurology 6. Retrieved from https://doi.org/10.3389/fneur.2015.00258 Byl, N. N., Merzenich, M. M., & Jenkins, W. M. (1996). A primate genesis model of focal dystonia and repetitive strain injury. 1. Learning-induced dedifferentiation of the representation of the hand in the primary somatosensory cortex in adult monkeys. Neurology 47(2), 508–520. Candia, V., Wienbruch, C., Elbert, T., Rockstroh, B., & Ray, W. (2003). Effective behavioral treatment of focal hand dystonia in musicians alters somatosensory cortical organization. Proceedings of the National Academy of Sciences 100, 7942–7946. Chang, F. C. F., & Frucht, S. J. (2013). Motor and sensory dysfunction in musician’s dystonia. Current Neuropharmacology 11(1), 41–47. Charness, M. E., Ross, M. H., & Shefner, J. M. (1996). Ulnar neuropathy and dystonic flexion of the fourth and fifth digits: Clinical correlation in musicians. Muscle and Nerve 19(4), 431–437.
Chen, R., Classen, J., Gerloff, C., Celnik, P., Wassermann, E. M., Hallett, M., & Cohen, L. G. (1997). Depression of motor cortex excitability by low-frequency transcranial magnetic stimulation. Neurology 48(5), 1398–1403. Cho, H. J., & Hallett, M. (2016). Non-invasive brain stimulation for treatment of focal hand dystonia: Update and future direction. Journal of Movement Disorders 9(2), 55–62. Cohen, L. G., & Hallett, M. (1988). Hand cramps: Clinical features and electromyographic patterns in a focal dystonia. Neurology 38(7), 1005–1012. Conti, A. M., Pullman, S., & Frucht, S. J. (2008). The hand that has forgotten its cunning: Lessons from musicians’ hand dystonia. Movement Disorders 23(10), 1398–1406. Delmaire, C., Krainik, A., Du Montcel, S. T., Gerardin, E., Meunier, S., Mangin, J. F., … Lehericy, S. (2005). Disorganized somatotopy in the putamen of patients with focal hand dystonia. Neurology 64(8), 1391–1396. Delmaire, C., Vidailhet, M., Wassermann, D., Descoteaux, M., Valabregue, R., Bourdain, F., … Lehericy, S. (2009). Diffusion abnormalities in the primary sensorimotor pathways in writer’s cramp. Archives of Neurology 66(4), 502–508. Elbert, T., Candia, V., Altenmüller, E., Rau, H., Sterr, A., Rockstroh, B., … Taub, E. (1998). Alteration of digital representations in somatosensory cortex in focal hand dystonia. Neuroreport 9(16), 3571–3575. Fiorio, M., Gambarin, M., Valente, E. M., Liberini, P., Loi, M., Cossu, G., … Tinazzi, M. (2007). Defective temporal processing of sensory stimuli in DYT1 mutation carriers: A new endophenotype of dystonia? Brain 130(1), 134–142. Fiorio, M., Tinazzi, M., Bertolasi, L., & Aglioti, S. M. (2003). Temporal processing of visuotactile and tactile stimuli in writer’s cramp. Annals of Neurology 53(5), 630–635. Frucht, S. J. (2009a). Embouchure dystonia: Portrait of a task-specific cranial dystonia. Movement Disorders 24(12), 1752–1762. Frucht, S. J. (2009b). 
Focal task-specific dystonia of the musicians’ hand: A practical approach for the clinician. Journal of Hand Therapy 22(2), 136–142. Frucht, S. J. (2015). Evaluating the musician with dystonia of the upper limb: A practical approach with video demonstration. Journal of Clinical Movement Disorders 2. doi:10.1186/s40734-015-0026-3 Frucht, S. J., Fahn, S., Greene, P. E., O’Brien, C., Gelb, M., Truong, D. D., … Ford, B. (2001). The natural history of embouchure dystonia. Movement Disorders 16(5), 899–906. Fuchs, T., Saunders-Pullman, R., Masuho, I., Luciano, M. S., Raymond, D., Factor, S., … Ozelius, L. J. (2013). Mutations in GNAL cause primary torsion dystonia. Nature Genetics 45, 88-U128. Furuya, S., & Hanakawa, T. (2016). The curse of motor expertise: Use-dependent focal dystonia as a manifestation of maladaptive changes in body representation. Neuroscience Research 104, 112–119. Furuya, S., Nitsche, M. A., Paulus, W., & Altenmüller, E. (2014). Surmounting retraining limits in musicians’ dystonia by transcranial stimulation. Annals of Neurology 75(5), 700–707. Furuya, S., Tominaga, K., Miyazaki, F., & Altenmüller, E. (2015). Losing dexterity: Patterns of impaired coordination of finger movements in musician’s dystonia. Scientific Reports 5. doi:10.1038/srep13360 Gilio, F., Curra, A., Inghilleri, M., Lorenzano, C., Suppa, A., Manfredi, M., & Berardelli, A. (2003). Abnormalities of motor cortex excitability preceding movement in patients with dystonia. Brain 126, 1745–1754. Group, D. S. (2004). Rating scales for dystonia: Assessment of reliability of three scales. Advances in Neurology 94, 329–336. Hallett, M. (2006). Pathophysiology of writer’s cramp. Human Movement Science 25(4–5), 454–463.
Haslinger, B., Noe, J., Altenmüller, E., Riedl, V., Zimmer, C., Mantel, T., & Dresel, C. (2017). Changes in resting-state connectivity in musicians with embouchure dystonia. Movement Disorders 32(3), 450–458. Hays, B. (1987). Painless hand problems of string-pluckers. Medical Problems of Performing Artists 2, 39–40. Horisawa, S., Tamura, N., Hayashi, M., Matsuoka, A., Hanada, T., Kawamata, T., & Taira, T. (2017). Gamma knife ventro-oral thalamotomy for musician’s dystonia. Movement Disorders 32(1), 89–90. Hutchinson, M., Kimmich, O., Molloy, A., Whelan, R., Molloy, F., Lynch, T., … O’Riordan, S. (2013). The endophenotype and the phenotype: Temporal discrimination and adult-onset dystonia. Movement Disorders 28(13), 1766–1774. Ioannou, C. I., & Altenmüller, E. (2014). Psychological characteristics in musician’s dystonia: A new diagnostic classification. Neuropsychologia 61, 80–88. Ioannou, C. I., Furuya, S., & Altenmüller, E. (2016). The impact of stress on motor performance in skilled musicians suffering from focal dystonia: Physiological and psychological characteristics. Neuropsychologia 85, 226–236. Jabusch, H. C., & Altenmüller, E. (2004). Anxiety as an aggravating factor during onset of focal dystonia in musicians. Medical Problems of Performing Artists 19, 75–81. Jabusch, H. C., & Altenmüller, E. (2006a). Epidemiology, phenomenology, and therapy of musician’s cramp. In E. Altenmüller, J. Kesselring, & M. Wiesendanger (Eds.), Music, motor control and the brain (pp. 265–282). Oxford: Oxford University Press. Jabusch, H. C., & Altenmüller, E. (2006b). Focal dystonia in musicians: From phenomenology to therapy. Advances in Cognitive Psychology 2(2–3), 207–220. Jabusch, H. C., Müller, S. V., & Altenmüller, E. (2004a). Anxiety in musicians with focal dystonia and those with chronic pain. Movement Disorders 19(10), 1169–1175. Jabusch, H. C., Schneider, U., & Altenmüller, E. (2004b).
Delta 9-tetrahydrocannabinol improves motor control in a patient with musician’s dystonia. Movement Disorders 19, 990–991. Jabusch, H. C., Vauth, H., & Altenmüller, E. (2004c). Quantification of focal dystonia in pianists using scale analysis. Movement Disorders 19(2), 171–180. Jankovic, J., & Ashoori, A. (2008). Movement disorders in musicians. Movement Disorders 23(14), 1957–1965. Jankovic, J., & Shale, H. (1989). Dystonia in musicians. Seminars in Neurology 9, 131–135. Kieslinger, K., Holler, Y., Bergmann, J., Golaszewski, S., & Staffen, W. (2013). Successful treatment of musician’s dystonia using repetitive transcranial magnetic stimulation. Clinical Neurology and Neurosurgery 115(9), 1871–1872. Killian, O., McGovern, E. M., Beck, R., Beiser, I., Narasimham, S., Quinlivan, B., … Reilly, R. B. (2017). Practice does not make perfect: Temporal discrimination in musicians with and without dystonia. Movement Disorders 32(2), 1791–1792. Kimberley, T. J., Schmidt, R. L. S., Chen, M., Dykstra, D. D., & Buetefisch, C. M. (2015). Mixed effectiveness of rTMS and retraining in the treatment of focal hand dystonia. Frontiers in Human Neuroscience 9. Retrieved from https://doi.org/10.3389/fnhum.2015.00385 Kluger, B., Triolo, P., Jones, W., & Jankovic, J. (2015). The therapeutic potential of cannabinoids for movement disorders. Movement Disorders 30(3), 313–327. Koppel, B. S., Brust, J. C. M., Fife, T., Bronstein, J., Youssof, S., Gronseth, G., & Gloss, D. (2014). Systematic review: Efficacy and safety of medical marijuana in selected neurologic disorders. Report of the Guideline Development Subcommittee of the American Academy of Neurology. Neurology 82, 1556–1563.
Kuhn, Y. A., Keller, M., Lauber, B., & Taube, W. (2018). Surround inhibition can instantly be modulated by changing the attentional focus. Scientific Reports 8(1). doi:10.1038/s41598-017-19077-0 Lederman, R. J. (1991). Focal dystonia in instrumentalists: Clinical features. Medical Problems of Performing Artists 6, 132–136. Lederman, R. J. (2001). Embouchure problems in brass instrumentalists. Medical Problems of Performing Artists 16, 53–57. Lee, A., Eich, C., Ioannou, C. I., & Altenmüller, E. (2015a). Life satisfaction of musicians with focal dystonia. Occupational Medicine 65(5), 380–385. Lee, A., Tominaga, K., Furuya, S., Miyazaki, F., & Altenmüller, E. (2015b). Electrophysiological characteristics of task-specific tremor in 22 instrumentalists. Journal of Neural Transmission 122(3), 393–401. Lee, A., Voget, J., Furuya, S., Morise, M., & Altenmüller, E. (2016). Quantification of sound instability in embouchure tremor based on the time-varying fundamental frequency. Journal of Neural Transmission 123(5), 515–521. Leijnse, J. N. A. L., Hallett, M., & Sonneveld, G. J. (2015). A multifactorial conceptual model of peripheral neuromusculoskeletal predisposing factors in task-specific focal hand dystonia in musicians: Etiologic and therapeutic implications. Biological Cybernetics 109(1), 109–123. Lim, V. K., & Altenmüller, E. (2003). Musicians’ cramp: Instrumental and gender differences. Medical Problems of Performing Artists 18, 21–26. Marsden, C. D., & Sheehy, M. P. (1990). Writer’s cramp. Trends in Neurosciences 13(4), 148–153. Meck, W. H., Penney, T. B., & Pouthas, V. (2008). Cortico-striatal representation of time in animals and humans. Current Opinion in Neurobiology 18(2), 145–152. Obeso, I., Cerasa, A., & Quattrone, A. (2016). The effectiveness of transcranial brain stimulation in improving clinical signs of hyperkinetic movement disorders. Frontiers in Neuroscience 9.
Retrieved from https://doi.org/10.3389/fnins.2015.00486 Oga, T., Honda, M., Toma, K., Murase, N., Okada, T., Hanakawa, T., … Shibasaki, H. (2002). Abnormal cortical mechanisms of voluntary muscle relaxation in patients with writer’s cramp: An fMRI study. Brain 125(4), 895–903. Pastor, M. A., Macaluso, E., Day, B. L., & Frackowiak, R. S. J. (2008). Putaminal activity is related to perceptual certainty. NeuroImage 41(1), 123–129. Paulig, J., Jabusch, H. C., Grossbach, M., Boullet, L., & Altenmüller, E. (2014). Sensory trick phenomenon improves motor control in pianists with dystonia: Prognostic value of glove-effect. Frontiers in Psychology 5. Retrieved from https://doi.org/10.3389/fpsyg.2014.01012 Peller, M., Zeuner, K. E., Munchau, A., Quartarone, A., Weiss, M., Knutzen, A., … Siebner, H. R. (2006). The basal ganglia are hyperactive during the discrimination of tactile stimuli in writer’s cramp. Brain 129(10), 2697–708. Pesenti, A., Barbieri, S., & Priori, A. (2004). Limb immobilization for occupational dystonia: A possible alternative treatment for selected patients. Advances in Neurology 94, 247–54. Peterson, D. A., Berque, P., Jabusch, H. C., Altenmüller, E., & Frucht, S. J. (2013). Rating scales for musician’s dystonia: The state of the art. Neurology 81(6), 589–598. Peterson, D. A., & Sejnowski, T. J. (2017). A dynamic circuit hypothesis for the pathogenesis of blepharospasm. Frontiers in Computational Neuroscience 11. Retrieved from https://doi.org/10.3389/fncom.2017.00011 Peterson, D. A., Sejnowski, T. J., & Poizner, H. (2010). Convergent evidence for abnormal striatal synaptic plasticity in dystonia. Neurobiology of Disease 37, 558–573. Priori, A., Pesenti, A., Cappellari, A., Scarlato, G., & Barbieri, S. (2001). Limb immobilization for the treatment of focal occupational dystonia. Neurology 57(3), 405–409.
Richardson, S. P., Altenmüller, E., Alter, K., Alterman, R. L., Chen, R., Frucht, S., … Hallett, M. (2017). Research priorities in limb and task-specific dystonias. Frontiers in Neurology 8. Retrieved from https://doi.org/10.3389/fneur.2017.00170 Ridding, M. C., Sheean, G., Rothwell, J. C., Inzelberg, R., & Kujirai, T. (1995). Changes in the balance between motor cortical excitation and inhibition in focal, task specific dystonia. Journal of Neurology, Neurosurgery & Psychiatry 59(5), 493–498. Rosenkranz, K. (2010). Plasticity and intracortical inhibition in dystonia: Methodological reconsiderations. Brain 133(6), e146. Rosenkranz, K., Williamon, A., Butler, K., Cordivari, C., Lees, A. J., & Rothwell, J. C. (2005). Pathophysiological differences between musician’s dystonia and writer’s cramp. Brain 128(4), 918–931. Rosset-Llobet, J., Candia, V., Molas, S. F. I., Cubells, D., & Pascual-Leone, A. (2009). The challenge of diagnosing focal hand dystonia in musicians. European Journal of Neurology 16(7), 864–869. Rosset-Llobet, J., Fabregas-Molas, S., & Pascual-Leone, A. (2015). Effect of transcranial direct current stimulation on neurorehabilitation of task-specific dystonia: A double-blind, randomized clinical trial. Medical Problems of Performing Artists 30, 178–184. Ruiz, M. H., Senghaas, P., Grossbach, M., Jabusch, H. C., Bangert, M., Hummel, F., … Altenmüller, E. (2009). Defective inhibition and inter-regional phase synchronization in pianists with musician’s dystonia: An EEG study. Human Brain Mapping 30(8), 2689–2700. Sakai, N. (2006). Slow down exercise for the treatment of focal hand dystonia in pianists. Medical Problems of Performing Artists 21, 25–28. Satoh, M., Narita, M., & Tomimoto, H. (2011). Three cases of focal embouchure dystonia: Classifications and successful therapy using a dental splint. European Neurology 66(2), 85–90. Schmidt, A., Jabusch, H. C., Altenmüller, E., Hagenah, J., Bruggemann, N., Hedrich, K., … Klein, C. (2006). 
Dominantly transmitted focal dystonia in families of patients with musician’s cramp. Neurology 67(4), 691–693. Schmidt, A., Jabusch, H. C., Altenmüller, E., Hagenah, J., Bruggemann, N., Lohmann, K., … Klein, C. (2009). Etiology of musician’s dystonia: Familial or environmental? Neurology 72(14), 1248–1254. Schuele, S., & Lederman, R. J. (2004a). Long-term outcome of focal dystonia in string instrumentalists. Movement Disorders 19(1), 43–48. Schuele, S. U., & Lederman, R. J. (2004b). Occupational disorders in instrumental musicians. Medical Problems of Performing Artists 19, 123–128. Schultz, W., & Dickinson, A. (2000). Neuronal coding of prediction errors. Annual Review of Neuroscience 23, 473–500. Sohn, Y. H., & Hallett, M. (2004). Disturbed surround inhibition in focal hand dystonia. Annals of Neurology 56(4), 595–599. Spector, J. T., & Brandfonbrener, A. G. (2005). A new method for quantification of musician’s dystonia: The frequency of abnormal movements scale. Medical Problems of Performing Artists 20, 157–162. Spector, J. T., & Brandfonbrener, A. G. (2007). Methods of evaluation of musician’s dystonia: Critique of measurement tools. Movement Disorders 22(3), 309–312. Tamura, Y., Ueki, Y., Lin, P., Vorbach, S., Mima, T., Kakigi, R., & Hallett, M. (2009). Disordered plasticity in the primary somatosensory cortex in focal hand dystonia. Brain 132(3), 749–755. Tubiana, R. (2000). Musician’s focal dystonia. In R. Tubiana & P. C. Amadio (Eds.), Medical problems of the instrumentalist musician. London and Malden, MA: Martin Dunitz. Tubiana, R. (2003). Musician’s focal dystonia. Hand Clinics 19, 303–308.
Tubiana, R., & Chamagne, P. (1993). Medical professional problems of the upper-limb on musicians. Bulletin de l’Académie Nationale de Médecine 177, 203–216. Valdes, K., Naughton, N., & Algar, L. (2014). Sensorimotor interventions and assessments for the hand and wrist: A scoping review. Journal of Hand Therapy 27(4), 272–286. van der Steen, M. C., van Vugt, F. T., Keller, P. E., & Altenmüller, E. (2014). Basic timing abilities stay intact in patients with musician’s dystonia. PloS ONE 9(3), e92906. Vecchio, M., Malaguarnera, G., Giordano, M., Malaguarnera, M., Volti, G. L., Galvano, F., … Malaguarnera, M. (2012). A musician’s dystonia. Lancet 379, 2116. Williams, L. J., Butler, J. S., Molloy, A., McGovern, E., Beiser, I., Kimmich, O., … Reilly, R. B. (2015). Young women do it better: Sexual dimorphism in temporal discrimination. Frontiers in Neurology 6. Retrieved from https://doi.org/10.3389/fneur.2015.00160 Zeuner, K. E., Knutzen, A., Granert, O., Gotz, J., Wolff, S., Jansen, O., … Witt, K. (2015). Increased volume and impaired function: The role of the basal ganglia in writer’s cramp. Brain and Behavior 5(2), e00301. Zeuner, K. E., & Molloy, F. M. (2008). Abnormal reorganization in focal hand dystonia: Sensory and motor training programs to retrain cortical function. Neurorehabilitation 23(1), 43–53.
SECTION VIII
THE FUTURE
CHAPTER 33
NEW HORIZONS FOR BRAIN RESEARCH IN MUSIC
MICHAEL H. THAUT AND DONALD A. HODGES
INTRODUCTION
Predicting future developments in research is as hazardous and flawed as predicting the future in any other area of human life. When trying to predict, we generally look back first and try to find trends and patterns of development in the past. From there we try to estimate the probabilities of future events. In this way, interestingly (and somewhat irrationally), we tend to make predictions in a linear fashion by extending the past into the future, although the initial look backwards should teach us unequivocally the opposite: that history moves forward in a decidedly nonlinear way. That is true for science as much as for any other area of human life. However, we cannot predict nonlinear trajectories of development; they simply happen. But since the human brain likes to create a sense of security and minimize uncertainty, it likes to predict the future and happily accepts acting irrationally. Therefore, this last chapter, concluding a book that is intended to provide a systematic state-of-the-art compendium of brain research in music (to which the authors have responded in a most remarkable and brilliant fashion), has to be read with a grain of salt and a healthy dose of skepticism. It should give us pause that, after new brain imaging technology broke into human neuroscience and completely changed the understanding of the human brain, few would have predicted that this development would also usher in a most exciting time for brain research in music. Brain imaging research in music, from relatively simple and straightforward beginnings and questions (e.g., is there a music-biased hemisphere in the brain?), has developed into a complex field of inquiry, an almost Boethian renaissance of considering music a science and an important subject of scientific inquiry. This would not have been a surprise in medieval musicology because for most of its known and recorded history, starting in the Western world in Greek antiquity, music was considered primarily a science (musica mundana) and only secondarily a performance art (musica instrumentalis et vocalis).
TEN NEW HORIZONS FOR FUTURE RESEARCH
This last and short chapter, therefore, will try to point to some significant past developments in music and brain research that may hold potential for the future. We will also try to point to areas that, in our understanding, are underserved and in need of more and new research strategies. And, with due caution, the chapter will try to throw in some ideas of nonlinear leaps, motivated by the many innovative accounts provided in the chapters of this book.
1. One of the most important scientific developments in the past thirty years may have been that brain research in music has slowly moved from a mostly exploratory approach to a hypothesis- and theory-driven field of inquiry. The emergence of a scientifically respected field of music neuroscience reflects this successful paradigm shift, which is also manifested in increasing numbers of publications in high-impact neuroscience journals and in a series of high-profile international meetings, for example (but not limited to) initiatives of the New York Academy of Sciences, the Mariani Foundation in Italy, the Kennedy Center and NIH, the Royal Society of Medicine, the Royal Institution of Great Britain, and the Society for Clinical Neuromusicology. It is also no longer uncommon for universities to have chairs in music psychology and music cognition that have merged into cognitive neuroscience fields. This trend may continue into the future, and as data and knowledge accumulate, more complex theories as well as new explanatory models of brain processing in music may emerge.
2. As cogently pointed out in several chapters in the book, highly sophisticated evidence for the existence of music (and visual arts) in very early human history, virtually with the first appearance of records of Homo sapiens, may drive further research into the role of music in human brain development. One may pose the question of whether the arts and music were the critical first laboratory for abstract and symbolic thought and expression in the human brain, which subsequently formed the cognitive basis for later developments in language, culture, and technology. Looking at the five almost equidistant bore holes of a 45,000-year-old bone flute in pentatonic tuning may provoke the question of whether the first feat of acoustical engineering actually happened in music, tens of thousands of years before other records of technology appear (Conard, Malina, & Muenzel, 2009). Current persuasive trends, which may determine future research directions, seem to go beyond earlier theories that ascribed to music a more secondary role in human brain development, that is, theories focusing on emotional expression, social support, mating functions, or spirituality, or even considering music a curious but pleasing auditory derivative of verbal language.
3. The current rapid developments in brain imaging emphasizing dynamic network modeling and connectivity analyses (in other words, measuring functional information flow rather than imaging topographical regions) have been a boon for music research because music, from a spectral and structural point of view, is the most complex auditory language the human brain has developed. This approach may lead to a much better understanding of music processing in the human brain, and will also allow for investigating music in a more ecologically salient and valid framework. That is, we can now study structural complexities of the full compositional architecture of music in a holistic fashion rather than in a lab-type, single-element fashion.
4. Beyond imaging research in network connectivity and information flow, neurotransmitter imaging has seen much less output, possibly because it is technologically more challenging, but may offer a deeper understanding of the nature of the actual information flow driving brain connectivity. Dopamine or MAOB protein imaging, for example, may hold some important keys to understanding the neurochemical machinery involved in music processing beyond structural and functional imaging. However, the required PET technology, including the production of radioactive imaging ligands, is a much more involved technical process than MRI-based imaging. Furthermore, PET is expensive, the use of radioactive tracers limits the number of scans a participant can undergo in an experiment, and some candidates are unwilling to participate because of the use of radioactive materials.
5. Neurotransmitter imaging also has the potential to continue to provide new insights into clinical translations of brain processes in music perception, cognition, and production. One of the most rapidly (almost nonlinearly) evolving areas of brain research in music has been clinical. Clinical neuroscience research in music, which began in earnest but in small steps twenty-five years ago, has now led to unprecedented medical recognition of music and rhythm as a significant modality in brain rehabilitation. The World Federation of Neurorehabilitation, for example, has endorsed Neurologic Music Therapy as a standardized and evidence-based treatment system and maintains a special study section to advance research and clinical practice. Similarly, the British Medical Association gave the Oxford handbook of neurologic music therapy a runner-up award at the annual Medical Book Award in the category “Best New Book in Neurology 2015.” These are truly remarkable advancements, and this may continue to be an applied research area with a strong trajectory of future advancements, especially via studies that combine investigations into neural mechanisms and clinical outcomes. For example, recent research has begun to look at the currently unknown neural mechanisms underlying the well-documented preservation of long-term musical memories in Alzheimer’s disease, which is often much more pronounced than the maintenance of non-musical memories (Thaut et al., 2018). This understanding may lead to more focused clinical uses, for example, determining whether music-based exercises may lead to cognitive boosts, even if temporary, for persons with cognitive dysfunctions and dementia states. Another example of newly emerging research uses PET imaging technology to study dopamine release as a neural mechanism of music’s effects on mobility improvements in Parkinson’s disease (Koshimori et al., 2018). This may also lead to new investigations, which are mostly still in conceptual stages, regarding neurotransmitter functions in clinical applications of music to dementia states and mental health, for example, clinical depression.
6. As therapy and rehabilitation move strongly towards a learning and training paradigm, and therapists become more and more regarded as “clinical” coaches who help retrain brain function or employ effective learning/training strategies in neurodevelopment, the strict lines in research and practice between music learning and music as therapy may become increasingly blurred as an understanding of similar underlying brain mechanisms emerges. Research into the effect of music on general intelligence measures and cognitive development has followed a checkered path, not without significant controversy. This book provides a thorough and critically important appraisal of music as one of several biological languages of the human brain which contributes to integrative cognitive functions and mental fluidity. This appraisal provides a well-grounded platform for future research.
7. Future research directions merging genomics, neurotransmitter functions, evolutionary biology, and comparative cross-cultural music perception–cognition–performance investigations may provide truly cutting-edge insight into the role and function of music in human evolution, which in many ways remain a mystery, as Darwin himself prominently pointed out as early as 1871 (Darwin, 1871). Comparative music research across cultures has been surprisingly slow to advance and, hopefully again, this book will provide accelerating “starting blocks” for new research trajectories. Comparative research combining neuroscience with musicology, music theory, and music cognition/perception could provide many answers to age-old questions about the contributions of innate vs. cultural factors shaping the musical brain, as well as the existence of deep structure universals vs. surface structure modifications and adaptations, viewing music in a Chomskyan sense as a syntactical auditory language system. Thinking this concept through further, one may posit, in parallel to Chomsky’s innate “Language Acquisition Device” (LAD), an innate “Musical Language Acquisition Device” (MLAD). This would require more collaborative cross-disciplinary research undertakings between neuroscience and music, including history, theory, composition, performance, and music psychology. Such “fusion” research would greatly benefit almost all future directions in music and brain research, as the professional music side is still often underrepresented.
8. This book is trying to place a very important exclamation mark behind the urgent need for more research in musician health. As the public view of music has at times been overdeveloped in past years into a very simple and romantic notion of music as the brain and soul cure for all and everything, the notion that professional musicianship is a very challenging and injury-prone occupation has seemed to the external observer a contradiction in terms. A musician must be, by definition and by brain research metrics, the happiest, most fulfilled, and most brain-developed person in the world. The data show a very different picture. The incidence of musculoskeletal injuries at some time in a musician’s performance career is staggeringly high at over 50 percent, and these injuries can end careers because some conditions, for example, musician’s dystonia, are not reversible at this time (Guptill, 2011). Other areas of musician health, including performance stress, anxiety, and other negative psychological factors, can lead to severely debilitating disease conditions whose incidence rate is again quite high. Musician health is an area that needs a very significant push towards new and improved research (Henechowicz, Chen, Cohen, & Thaut, 2018). Sports science, including optimized motor learning, injury prevention, and performance psychology, is highly developed and may serve as a partner and model for musician training and musician health. The teacher/student model in music has always been dominated by the image of the artistic master. Sports training has always worked from a coaching model. Admittedly, there are significant differences between music and sports training, for example, in the necessary emphasis in music on motor skills as physical tools for aesthetic and artistic expression (although some very difficult sports disciplines, such as figure skating or gymnastics, also have significant aesthetic components). However, there are also considerable motoric and psychological overlaps between the two fields in performance learning and practice. Fortunately, the awareness that change is needed is growing, and more and more important research initiatives are starting, but much remains to be done.
9. In contrast to the advancing work in music–brain research and music therapy, progress in music education has been much slower. Significant strides in basic research studies in music learning are being made (see, e.g., Chapters 19 and 22, this volume); however, very little has been done in the way of applied research in music education settings similar to the clinical work discussed in previous points. In the Preface to Neurosciences in music pedagogy (Gruhn & Rauscher, 2007), the editors wrote, “there is in fact nothing really new that brain researchers can tell educators about teaching that they did not know” (p. vii). That statement may be somewhat extreme in the light of newer findings, but the situation remains that recommendations given to an elementary general music teacher, a middle school choir director, or a high school band director are often either simplified to the point of exaggerating or distorting research findings, or watered down to generalities that amount to little more than platitudes. While it may be too much to expect highly detailed pedagogical instructions that are supported by neuroscience, this is a field that is ripe for exploration.
10. Curiously, it may be that neuroscientific research is beginning to encroach on territory once reserved for philosophers. Take the case
of aesthetics, for example. More than fifty years ago, Wittgenstein (1967, pp. 19–20) wrote: People still have the idea that psychology is one day going to explain all our aesthetic judgments, and they mean experimental psychology. This is very funny —very funny indeed. … Supposing it was found that all our judgments proceeded from our brain. We discovered particular kinds of mechanism in the brain, formulated general laws, etc. One could show that this sequence of notes produces this particular kind of reaction; makes a man smile and say: “Oh, how wonderful.”
In contrast, Huron (2016) remarked that biology and geology now answer questions once falling under the purview of “natural philosophy,” physics and astronomy have superseded cosmology, social and behavioral sciences handle questions of human behavior, and so: If evolutionary psychologists are correct, then questions concerning the experience of beauty and ugliness may soon slip from the domineering grasp of philosophy. … Only time will tell whether we are witnessing the passing of the aesthetics baton from philosophy to empirical psychology. (Huron, 2016, p. 242)
More particularly, “From the perspective of cognitive neuroscience, the disembodied, nonutilitarian notion of aesthetic pleasure posited by Kant cannot easily be reconciled with biology” (Huron, 2016, p. 242). Rather than pitting one discipline against another, perhaps we should consider that “we have much to gain by coordinating insights from music listeners, music philosophers, and music researchers” (Hodges, 2013, p. 276). The word aesthetics comes from the Greek word aisthetikos, which broadly means “pertaining to sense perception.” In this context, maybe even the Kantian notion of required “A Priori Knowledge in Aesthetic Judgment” (Kant, 1790), in which the perceived art object seems to be created to fit the processing of one’s perceptual apparatus (in other words, as if the object were created to be heard and seen), may become reconciled with modern neurobiology through such coordinated exploration (Thaut, 2005). Such coordinated efforts would be helpful in a number of areas covered in this volume.
CONCLUSION
The ten horizons laid out in this chapter for potential future research should not be read as predictions (see the introduction to this chapter) but rather as areas of potential growth for the advancement of knowledge in music as a language the human brain has created. The chapters in this book try to present the most current state of knowledge about the functions and operations in the brain associated with music. As we think and create music subjectively, it becomes an object of discovery and scientific research. Therefore, the final mission of the book is to serve as a springboard for future brain research in music, providing knowledge from very different angles of scientific discovery to help shape new trajectories of pursuit to understand music and our brains; that is, to understand how the brain, as it creates and engages in music, is changed by that engagement.
REFERENCES
Berlyne, D. E. (1971). Aesthetics and psychobiology. New York: Appleton-Century-Crofts.
Conard, N. J., Malina, M., & Muenzel, S. C. (2009). New flutes document the earliest musical tradition in southwestern Germany. Nature 460, 737–740.
Darwin, C. (1871). The descent of man and selection in relation to sex. London: John Murray.
Gruhn, W., & Rauscher, F. (Eds.). (2007). Neurosciences in music pedagogy. New York: Nova Biomedical Books.
Guptill, C. A. (2011). The lived experience of professional musicians with playing-related injuries: A phenomenological inquiry. Medical Problems of Performing Artists 26(2), 84–95.
Henechowicz, T., Chen, J., Cohen, L. G., & Thaut, M. H. (2018). Prevalence of BDNF polymorphism in musicians: Evidence for compensatory motor learning strategies in music? Proceedings of the Society for Neuroscience. In press.
Hodges, D. (2013). Music listeners, philosophers, and researchers. Physics of Life Reviews 10(3), 275–276.
Huron, D. (2016). Aesthetics. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (2nd ed., pp. 233–245). Oxford: Oxford University Press.
Kant, I. (1790). Kritik der Urteilskraft. Berlin: Lagarde.
Koshimori, Y., Strafella, A., Valli, M., Sharma, V., Cho, S. S., Houle, S., & Thaut, M. H. (2018). Motor synchronization to rhythmic auditory stimulation (RAS) attenuates dopamine response in the ventral striatum in young healthy adults. Proceedings of the Society for Neuroscience 670, 15.
Thaut, M. H. (2005). Rhythm, music, and the brain. New York: Taylor & Francis.
Thaut, M. H., Schweizer, T., Leggieri, M., Churchill, N., Fornazzari, L., & Fischer, C. (2018). Neural basis for potential preservation of musical memory and effects on functional intra network connectivity in early Alzheimer’s disease and mild cognitive dysfunction. Proceedings of the Society for Neuroscience 741, 12.
Wittgenstein, L. (1967). Lectures and conversations on aesthetics, psychology, and religious belief. Ed. C. Barrett. Berkeley, CA: University of California Press.
Directions into Velocities of Articulators (DIVA) 717 discrimination of sounds 394–6 disordered timing 786–7 disorders, language and music 580–1 see also development; neurological disorders Distorted Tunes Test (DTT) 442 Divergent Thinking Test 278 Dobos, R. 29 Dobzhansky, T. 19 ‘dock-in’ model of emotional recognition 45–6, 58 Doidge, N. 27 dolphins 443 Donnay, G. F. et al. 502–4, 517 dopamine (DA) 102, 104–5, 246, 334–6, 337, 431, 446, 448–9, 450–1, 451, 631, 741 and rhythm perception 170 dopamine imaging 807 dopamine signaling 795 dorsal cochlear nucleus (DCN) 338 dorsal pathways 217, 218 dorsal premotor cortex (PMd) 276, 488, 489, 490, 494–5, 515 dorsolateral prefrontal cortex (DLPFC) 276, 488, 514–15 Dowling, W. J. and Harwood, D. L. 148, 288 Down syndrome 699, 700 Drake, C. and Bertrand, D. 30 DRD2 polymorphism 716 DRD4 gene 450 drumming, synchronized 603–4 see also synchronous movement DSM-5 672–3, 722 dual-pathway model 94 DUSP1 gene 447, 449–50 Dynamic Attending Theory 268 dysarthria 717–18 dyslexia 580, 606–7, 653 and amusia 770 neurologic music therapy (NMT) for 727–8 rhythm processing 173 dystonia embouchure 477, 777, 779, 779 focal 475, 476–8, 776–7, 778, 784–6 focal hand (FHD) 777, 778, 779, 784, 785–6 musician’s (MD) 475–8, 479, 700, 777–82, 778, 779 future directions 791–6 pathogenic theory 787–90, 788, 794–6 pathophysiology 784–7, 794 and plasticity 787 treatments 782–4, 783, 791–3
types 776–7 Dystonia Study Group 780, 781–2
E early right anterior negativity (ERAN) 199–200, 266, 277, 375, 548 earworms 528 East African music, rhythm perception 49 echoic memory 192–3 echolalia 679–80 echolocation 443 Edgren, J. 5 education in music 46, 442, 809 EEG (electroencephalography) 5, 376 improvisation 509–12 Einarson, K. M. and Trainor, L. J. 596 Eitan, Z. and Granot, R. Y. 530 El Haj, M. et al. 633 El Sistema music training 552, 554–5 electrophysical methods 468 electrophysiology, studies of imagery and perception 524 Eley, R. and Gorman, D. 727 Ellis, R. J. et al. 273 Elmer, S. et al. 402 embodied musical imagery 529–34 embouchure dystonia 477, 777, 779, 779 emotions 79, 285–6, 465 aesthetic 73–4, 75–8, 80 and aging 628–9, 635–6 auditory processing 190 cross-cultural studies on recognition 45–6 cultural specificity 32 definition 286 discreteness 293 empirical studies 291–304, 306–23 enhancing musical memory 246–7 induction 289–90, 297–300, 301–2 vs. perception 287, 292, 296 layers of expression 288–9, 288 neural responses 104 neurologic music therapy (NMT) 747–53 perception 287–9, 300 vs. induction 287, 292, 296 psychological mechanisms 287–90 regulation using cognitive remediation 741
and rhythm 600–1 sensitivity to 402 specific brain regions 293–6 universality 30 visual displays 213 empirical aesthetics 366, 367–74, 371, 373 brain structures 375–6 future challenges 381–2 studies 377–81 endocrine responses 339–45 endogenous cannabinoid receptors 793 endogenous opioid systems (EOSs) 336–8 enjoyment of music see pleasure in music entrainment 407 entropy 277–8, 599–600 environment, influence on genetic expression 22–3, 23 environmental effects, vs. genetic effects 440 epilepsy 5, 766 episodic buffer 241 episodic memory 238, 245, 248, 290, 303 ERAN (early right anterior negativity) 199–200, 266, 277, 375, 548 Ericsson, K. A. et al. 460 error-driven learning 430 Escoffier, N. et al. 300 esthetic experiences see aesthetic experiences esthetic judgment see aesthetic judgment evaluative conditioning (EC) 290, 303 event-related desynchronizations (ERDs) 249–50 event-related potentials (ERPs) 5, 192, 375, 594 in infants 28 in monkeys 176 phrase boundary perception 49 tonality perception 48 visual rhythm perception 178 evolution of brain and hands 21 learned song 74–5 of musical aptitude 443–5 Ewe language 570 Executive Control Network 277 executive functions (EFs) 255, 263, 269–70 in aging 657 and musical training 554–5, 649–50 training 748 transfer 269–74
expectations 267, 290, 303–4, 398 experience-dependent processes 25–6, 26 experience-expectant processes 25–6, 26, 26–7 expert performance 22–3, 23 expertise see musicians explicit memory system 238, 244, 245–6, 255 exposure see familiarity extinction 265 extracurricular activities 654–5
F faculties of the mind 4, 4 Fahn-Marsden (FM) scale 782 falling, risk of 702, 705 familiarity and cross-cultural research 32, 33 infant responses 28–9 and music memory 50 network connections 106 and preference 46–7, 58, 240 scale perception 47 far transfer 271–4, 646, 649 Farrugia, N. et al. 528 Fava, E. et al. 579 Fawcett, C. and Tunçgenç, B. 604–5 feature integration theory 264–5 feedback 430, 464, 465, 699, 717 feedforward and feedback connections 219 Fernández-Miranda, J. C. et al. 95 Ferreri, L. et al. 252, 628 ferrets 431 fetuses, response to musical stimulation 27–8 Fifth Symphony (Beethoven) 145, 159 filtering, attentional 266 flat tones 153–5, 154, 155 Flaugnacco, E. et al. 606–7, 728 fluency disorders 721–3 fMRI (functional magnetic resonance imaging) 5, 365, 469 of improvisation 488–506 network-based approaches 132–3, 134, 135 phrase boundary perception 49 rhythm perception 168–9 focal dystonias 475, 476–8, 776–7, 778, 784–6 focal hand dystonia (FHD) 777, 778, 779, 784, 785–6
Fodor, J. 392 foot tapping, neural basis 97 forgetfulness of self 77 formal institutional training in improvisation (FITI) 511–12 FOS gene 447, 449–50 Foster, N. A. and Valentine, E. R. 633 Foster, N. E. and Zatorre, R. J. 422 Fox, N. A. et al. 220 FOXP1 gene 445 FOXP2 gene 443–5, 450 fractional anisotropy (FA) 426, 470, 473 Fractionating Emotional Systems (FES) 45 François, C. and Schön, D. 403 free form jazz 75–6 free response generation 492–4 freestyle rap 276, 495–7 freezing episodes 703 French horn, harmonics 155 frequency following response (FFR) 190, 550–1 frequency range, biological restrictions 21 frequency tagging approach 171 Friederici, A. D. 201 Fritz, T. 45–6, 58 frontal cortex 93–5, 301 frontal gyrus 300 Frühholz, S. et al. 377 Fu, Q.-J. et al. 724 Fujioka, T. et al. 172 functional near-infrared spectroscopy (fNIRS) 724 functions of music 31 fusiform face area (FFA) 770 fusiform gyrus 488–9 future of brain research in music 805–11
G Gaab, N. et al. 239 Gagaku music 568 Gall, F. J. 4 Galvan, A. 25 gamelan music 568 garden path sentences 398, 582 Gaser, C. and Schlaug, G. 472 Gaston, E. 19 GATA2 gene 448, 448
Gaver, W. 671–3 gender ratios for musician’s dystonia (MD) 788–9, 789 gene–maturation–environment interactions 424 Generative Syntax Model (GSM) 197 Generative Theory of Tonal Music (GTTM) 197 genetic effects vs. environmental effects 440 genetic influences 20 on musical behavior 22–3, 23 genomic approaches 5, 439, 452 to aptitude 440–3, 445–6 convergent analysis 449–50 creativity 450–1 evolution 443–5 effect of music on transcriptome 447–9 genre 137, 568 German music 603 Gerry, D. W. et al. 595–6 Gervain, J. 580 Gestalt formation 192–4 gestures 223 Ghitza, O. 407 Gillespie, L. D. et al. 705 Gilmore, S. 225 Giraud, A. L. and Poeppel, D. 406 Glennie, Dame E. 189 Global Dystonia Scale (GDS) 782 globus pallidus internum (GPi) 793 Glover, H. et al. 723 Gooding, L. et al. 625 Goswami, U. 405 GPR98 gene 444 grahabēdham modulation 48 Grahn, J. A. and Brett, M. 169 Granert, O. et al. 473, 474 graph theory 123, 125, 131, 137 Grau-Sánchez, J. et al. 701 gray matter (GM) 420–2, 426, 429 density 472–3 pianists 474 GRIN2B gene 450 groove 169 group drumming 340, 344, 347, 348 Guenther, F. H. 717 Guerrieri, M. 145 guided imagery and music (GIM) therapy 344
H Habib, M. et al. 606, 728 Habibi, A. et al. 273 hallucinations, musical 526–7 Halpern, A. R. 523 Halpern, A. R. and Müllensiefen, D. 245 Halpern, A. R. and O’Connor, M. G. 244 Halpern, A. R. et al. 531 Hambrick, D. Z. et al. 444 Han, S. and Ma, Y. 20 hand dystonia 477 handicap principle 67–9, 81 hands, co-evolution with brain 21 Hanna-Pladdy, B. and MacKay, A. 625 Hannon, E. E. and Trainor, L. J. 28 Hanslick, E. 79, 80, 370 harmonic dependencies 197–9, 198 harmonic expectancy violations 93 harmonicity, attention to 266–7 harmonics 146, 148–52, 149, 155 Harmony project 553 Hawaiian language 570 head movements, and rhythm perception 225–6 Healey, E. C. et al. 723 hearing mechanism 22 hedonic reversal 73–4, 80 Helfrich-Miller, K. R. 720 hemi-neglect 751–2 hemispheres 293 asymmetry 93, 106 hemispheric specialization 217 hemodynamic responses 248–9 Henson, R. 3 heritability 428, 442, 555–6 Heschl’s gyrus (HG) 420, 423, 464, 470 damage to 523 Hidalgo, C. et al. 405 hierarchical syntactic structures 196–203 Hilton, M. P. et al. 725 Hinton, G. et al. 71 hippocampus 104, 301, 465 Hmong language 570 Hofstadter, D. R. 197 homunculus 466
hormones 463, 741 Hubbard, T. L. 521–2, 530 hubs 127–9, 128 Hugo, V. 568 human characteristics 19 humanistic approach 364, 365 Huntington’s disease 703 Huron, D. 372, 810 Hurt-Thaut, C. P. 705 Husserl, E. 675 Hutchinson, M. et al. 794 Hyde, K. L. et al. 217, 471 hypothalamic-pituitary-adrenal (HPA) axis 341
I ICINAS-BRECVEMAC 287–8, 288, 289–90, 294, 299, 301, 304, 305 iconic coding of emotions 288 identity decision 158 IDyOM (Information Dynamics of Music) 52–8 Ilari, B. 602–3 imagery 94–5, 534–6 embodied 529–34 involuntary 526–9 and perception behavioral and psychophysical studies 522–3 effect of brain damage 523–4 physiological measures 524–6 Imagination, Tension, Prediction, Reaction, and Appraisal (ITPRA) 372 imitation 675–6 immediate early response genes (IEGs) 448–9 immune cells 347 immune system 347–9 immunoglobulin A (IgA) 348–9 implicit memory system 238, 244, 245–6, 255 improvisation 487, 512–14 EEG studies 509–12 fMRI studies 488–506 jazz 276, 277, 278, 490–1, 502–6, 508–9 language areas 516–17 limbic processing 516 as model of creativity 275–6 motor regions 515–16 neuroimaging studies 276–7 parietal lobes 518–19
PET studies 506–7 sensory processing 517–18 similarity to speech 571 tDCS studies 508–9 individual variation in enjoyment of music 380 in musical memory 247–8 infant-directed singing 596–8 infant-directed speech 573 infants cortisol levels 345 development of auditory perception 675–7, 677 early right anterior negativity (ERAN) 200, 203 emotion perception 600–1 enculturation 32 language and music co-development 576–8, 577 perception 571–6, 575 mimicry 675–6 response to musical stimulation 28–9 rhythm development 174–5 rhythm perception 48–9, 593–4, 595–6 scale perception 47 synchronous movement 603, 604–5 vestibular system 226 inferior colliculus (IC) 189–90, 338, 446 inferior frontal gyrus (IFG) 221, 375, 490, 516–17, 763, 764 inferior parietal lobule 490 information retrieval techniques 277 inhibition, abnormalities 476 innateness 808 inner ear and inner voice 531–2 instrument tones vs. pure tones 22 instruments harmonics 148–52, 149, 155 universality 30 insula 301, 516 insular cortex 243 Intartaglia, B. et al. 401 integration 213–16, 214 integration windows 217 intelligence quotient (IQ), effect of musical training 272–3, 647–50, 655 internal rehearsal 243 interregional interactions 123–4 intervals
judging size by sight 218 memory 240–1 intrinsic coding of emotions 288 intrinsic features in musical memory 245–6 invariants 29 involuntary musical imagery 94, 526–9 Ishizu, T. and Zeki, S. 377
J Jabusch, H. C. 793 Jackendoff, R. 567–8, 569 Jackson, J. 4 Jacobsen, T. and Beudt, S. 367 James, W. 263 Janata, P. 302 Janus, M. et al. 555 Japanese language 570 Jaschke, A. C. et al. 658 Javanese scales vs. Western scales 574 jazz 549, 571 blue notes 213 free form 75–6 improvisation 276, 277, 278, 490–1, 502–6, 508–9 jazz musicians, personalities 278 Jennings, J. J. and Kuehn, D. P. 726 Jentschke, S. and Koelsch, S. 403 Johnson, M. K. et al. 244 Jokel, R. et al. 722 Juslin, P. N. 286, 288, 290, 374, 375 Juslin, P. N. and Västfjäll, D. 372, 374
K Kalveram, K. T. and Seyfarth, A. 255 Kämpfe, J. et al. 252 Kant, I. 810 Karma, K. (Karma music test (KMT)) 441, 442, 444, 445–6 KCTD8 gene 446 Keith, R. and Aronson, A. 719–20 Keller, P. E. 532 keyboard players 472 Kim, M. and Tomaino, C. 719 Kim, S. J. et al. 706 Kimata, H. 348 kinaesthetic feedback 465
kinaesthetic rhythm 177 Kindermusik classes 595–6 King, B. B. 213 Kleinmintz, O. M. et al. 278 Klimesch, W. 249, 252 Klimesch, W. et al. 250 Knoblauch, A. 4–5 knowledge-free structuring 194, 196 Kodály and Orff 607, 728 Koelsch, S. 299, 377 Koelsch, S. et al. 105, 197, 202, 245, 301 Kojovic, M. et al. 700 Kolb, B. and Gibb, R. 27 Kornysheva, K. et al. 378 Korsakoff’s syndrome 244 Kotz, S. A. and Gunter, T. C. 716 Kotz, S. A. and Schwartze, M. 723 Koyama, Y. et al. 348 Kragness, H. E. and Trainor, L. J. 599 Kraus, N. 401 Kraus, N. et al. 553 Krauss, T. and Galloway, H. 720 Kreiner, H. and Eviatar, Z. 582 Kuhl, P. K. 44 Kühn, S. and Gallinat, J. 378 Kunert, R. et al. 399 Küssner, M. B. et al. 252
L Lai, G. et al. 581 Lai, H. L. and Good, M. 629 Lang Lang 462 language and aging 633–4 co-development with music 576–8, 577 development 676–7 discrimination of sounds 394–6 disorders 580–1, 715–28 entanglement with music 582 learning 195 meaning in 79 modular approach 392–4 vs. music 25, 187–8, 567–9 aesthetic experience 370
music of speech 569–71 and music training 270–1, 400–4, 652–4 perception development 574–6, 575 innate abilities 571–4, 578–80 PET studies 506–7 phonemes 190–1, 550 processing 391, 396–400 rehabilitation 633–4 rhythm 173–4 similarities to music 396–400 stress-timed vs. syllable-timed 570 temporal focus 406–8 tonal languages 203, 240, 402, 570 Mandarin Chinese 551 and tonality perception 48 training vs. musical training 553, 555 use of music in language training 404–6 language acquisition 567, 583 Language Acquisition Device (LAD) 808 language areas, improvisation 516–17 language functions, neurologic music therapy 745 Larson, S. 530 laryngospasms 725 late positive component (LPC) 547–8 lateral prefrontal cortex 375 lateral regions 277 lateral sulcus 216 laterodorsal tegmental nucleus (LDT) 350 learning 194–5 enhanced by music 269–74 influence of background music 252–4 transfer between language and music 400–4 see also music training learning-related changes in coherence (LRCC) 253–4 learning related synchronization (LRS) 751 LeBlanc, A. 46 Leder, H. et al. 381 left anterior negativity (LAN) 203 left auditory cortex 106 left posterior planum temporale 243 Lehne, M. and Koelsch, S. 378 Lerdahl, F. and Jackendoff, R. 197 Levitin, D. J. 528 Levitin, D. J. and Menon, V. 202
Liégeois-Chauvel, C. et al. 766 Limb, C. J. and Braun, A. R. 276, 490–1, 498, 500 limbic system 102–5, 516 Lindquist, K. A. et al. 132, 135, 294 Linked (Barabási) 125 Liu, C. et al. 378, 381, 495–7, 498, 500, 513 locus coeruleus (LC) 345–7 Lomber, S. G., Meredith, M. A., and Kral, A. 219 long-term depression (LTD) 787 long-term memory 238 long-term potentiation (LTP) 787 looped speech experiment 582–3 Lopata, J. A. et al. 511–12 Lortie, C. L. et al. 725 loudness judgments, visuomotor influences 223 lullabies 345, 596–7 vs. playsongs 597 universality 30 lyrical improvisation 495–7 lyrics 106, 569–70
M Mafa people, Cameroon 46 Magic Flute, The (Mozart) 21 magnetic resonance imaging (MRI) 470 Mahmoudzadeh, M. et al. 579 major chords vs. minor chords 147 Makam music, Turkish 53–8 Mandarin Chinese language 551, 570 Mang, E. 676 Manning, F. C. and Schutz, M. 226 Manuck, S. and McCaffery, J. 22 MAOB protein imaging 807 Marie, C. et al. 550 marimba 156–7 Marr, D. 391, 406 Martinez-Molina, N. et al. 378 Mas-Herrero, E. et al. 378 Mathias, B. et al. 765 Mauszycki, S. C. and Wambaugh, J. L. 719 maximum-likelihood estimation (MLE) 215–16 McDermott, O. et al. 636 McGurk effect 221 McIntosh, G. C. et al. 702
McPherson, M. J. et al. 504–6 meaning 78, 79, 245 medial PFC 513 medial temporal area 221 mediating model 744 MEG (magnetoencephalography) 5, 365, 468 Mehr, S. A. et al. 28, 274, 651–2 melodic expectancy violations 48 melodic intonation therapy (MIT) 633–4, 715, 717, 719–20, 721 memory 237–8, 465 and aging 632–3 auditory sensory (ASM) 192–4 autobiographical 245, 248, 302, 633 cross-cultural research 50–1 during music listening 238–9, 239 effect of complexity 248 enhanced by music 250–5 episodic 238, 245, 248, 290, 303 and expertise 248 intervals 240–1 neural activity 248–50 neural networks 243–4 recognition of music 244–50, 255–6, 256 tonal working memory 241–4 tone 239–40 training 554, 649–50, 651, 652–3, 748, 750 Mendelian rules 444 Menninghaus, W. et al. 381 Menon, V. and Levitin, D. J. 378 mental practice and performance 532–3 mental training 469 Merchant, H. et al. 696 mere exposure effect 244 Merker, B. 80n, 91 mesial regions 276–7 mesolimbic reward pathway 246, 299n metaphors, spatial and force 529–30 metaplasticity 428–9 meta-systems 502 meter 48–9, 165 perception in infants 593–4, 595–6 see also rhythm methodological approach 393 metrical hierarchy 592–3, 595 Meyer, L. B. 601
middle cerebral artery (MCA) 767 middle temporal gyrus (MTG) 767 MIDI-based Scale Analysis (MSA) 780 mild cognitive impairment (MCI) 625 Milovanov, R. et al. 403 mimicry 530, 675–6 Minagawa-Kawai, Y. et al. 579–80 Mingus, C. 75–6 minor chords vs. major chords 147 mirror neurons 179, 303, 469, 530 mismatch negativity (MMN) 176, 193, 241, 375, 524, 548–9, 552–3, 724 abstract-feature (afMMN) 196 physical (phMMN) 196 statistical (sMMN) 195–6, 199 modularity (Q) 129–30, 131 modulation identification 48 Molnar-Szakacs, I. and Overy, K. 378 monkeys auditory system 191 dystonia 477 rhythm perception 176 vibration receptors 220 monocyte chemoattractant protein (MCP) 348 Montag, C. et al. 378 Montreal Battery of Evaluation of Amusia (MBEA) 556, 761–2 Montreal Protocol for Identification of Amusia (MPIA) 762 mood 286 Moran, J. 571 Morcom, A. M. and Fletcher, P. C. 136–7 Moreno, S. et al. 273, 555, 653 Mote, J. 600–1 motherese 573 motor brain function 468–9 motor co-representations 469 motor cortex 95–102, 96, 221 motor cortico-basal-ganglia-thalamo-cortical (mCBGT) circuit 176 motor evoked potentials (MEPs) 169 motor functions, and aging 634–5 motor regions 301 musicians 421 primary and secondary 465, 466–7 role in improvisation 515–16 motor signals 408 motor speech disorders (MSDs) 717–23 motor system
rehabilitation for 695–707 in rhythm perception 166–70, 179–80 Moussard, A. et al. 625, 627, 633 movement, synchronous 602–5 see also dance; drumming movement-based influences on rhythm perception 225–6 movement disorders 701–5 Mozart, W. A. 21 Mozart Effect 269, 274 see also learning Müller, K. et al. 225 Müller, V. et al. 378 multidimensional scaling (MDS) 150 multifactorial gene–environment interaction model (MGIM) 23 multifactorial traits 440 multiple sclerosis (MS) patients 254, 698, 704–5, 751 multiple system atrophy (MSA) 704 multisensory nature of music training 430 multisensory perception 221, 227 pitch 218–21 rhythm 223–6 timbre 221–2 multisensory processing 212–16 rhythm 177–8 music, definition 187 music centers 5 music education 46, 442, 809 music processing, neural basis 90–106 Music Psychosocial Training and Counseling (MPC) 749 music-syntactic processing 200, 201–3 music therapy 344, 348, 634, 635–6 for cognitive dysfunction 742–3 neurologic (NMT) 743–6, 807 cognition and emotion 747–53 for language disorders 715–28 for motor system disorders 697–700, 707 music training 419 and academic achievement 654–6 and aptitude 658–60, 659 effect on brain development 424–7, 426, 551–3, 557 effect on brain function 467–70 and brain structure 420–2, 420, 470–4, 474 and cognition 647–50, 654–7, 659 in aging 626–7 demonstration and observation 469 effect on executive functions 554–5 and healthy aging 656–7
and language skills 400–4, 652–4 motor functions 468–9 nature vs. nurture debate 461–2 and plasticity 422–4, 429–32, 462–4, 467–8, 546–7 cortical sound processing 547–50 short-term 427–8 studies of 645–7 types 650 and visuospatial skills 651–2 see also practice musical affect 534 musical anhedonia 768–9 Musical Attention Control Training (MACT) 748 musical behavior, genetic influences 22–3, 23 Musical Echoic Memory Training (MEMT) 748 Musical Executive Function Training (MEFT) 748 musical expectancy formation 194–6, 197, 202–3, 290, 303–4 Musical Language Acquisition Device (MLAD) 808 Musical Mnemonic Training (MMT) 748 Musical Neglect Training (MNT) 747, 751–2 musical response model 744 Musical Sensory Orientation Training (MSOT) 747 Musical Speech Stimulation (MUSTIM) 717, 720 musical structure building 197 musical systems 42–3 musicality 34 musicians auditory brainstem response (ABR) 550–1, 553 brain structure 420–2, 420, 428–9 cognition and aging 624–6 cortical sound processing 547–50 with dyslexia 580 exceptional and autistic 671–4, 681–90 language skills 400–4 and memory 248, 250–2 motor co-representations 469 network connections 106 neural responses to piano tones 467–8 plasticity 478–80, 479 practice 429–30 somatosensory perception 468 musician’s dystonia (MD) 475–8, 479, 777–82, 778, 779 future directions 791–6 pathogenic theory 787–90, 788, 794–6 pathophysiology 784–7, 794
and plasticity 787 treatments 782–4, 783, 791–3 musicogenic epilepsy 5 myelination 27
N N100 responses 5 Nair, D. G. et al. 300 Nakata, T. and Mitani, C. 601 Narme, P. et al. 636 nature vs. nurture 19–22 in music training 461–2 Neanderthals 369, 718 Neapolitan chord 266 near-infrared spectroscopy (NIRS), in infants 28 Nettl, B. 31 network-based approaches 123–5 neuroimaging analysis 132–8, 134 network disorders, dystonia as 478 network generation 132–5 network metrics 125–31 see also connectivity network science 5, 125 networks 89, 95–105 interactions 105–6 Neuhoff, J. 159 neural auditory pathways 90–3, 90 neural oscillations 598–9 neural plasticity see plasticity neural pruning 26–7 neural resonance theory 170–2 neuroaesthetics 366, 367–74, 371, 373 brain structures 375–6 future challenges 381–2 studies 377–81 neurochemical responses to music 333–4, 350–2 cholinergic systems 350 dopamine see dopamine endogenous opioid systems (EOSs) 336–8 neuroendocrine systems 339–45 norepinephrine (NE) systems 345–7 peripheral immune system 347–9 serotonin systems 338–9 neuroendocrine systems 339–45 neuroimaging analysis, network-based 132–8, 134
neurologic music therapy (NMT) 743–6, 807 cognition and emotion 747–53 for language disorders 715–28 for motor system disorders 697–700, 707 neurological disorders cognitive functions 738 cognitive remediation (CR) for 739–40 rehabilitation for 695–707 speech 717–23 neurological markers, of congenital amusia 762–3 neuropsychiatric disorders 451 neurotransmitter imaging 807 Newman, M. E. J. 125 Nieminen, S. et al. 378 node parcellation 130, 131, 133 nodes 125–7, 125, 126, 131, 134 non-musical parallel model 744 norepinephrine (NE) systems 345–7 Norman-Haignere, S. et al. 569 nostalgia 245 notes, duration 156–7 novelty spectrum 72–4, 75–6 NR3C1 447 NRGN gene 447
O oboe, harmonics 155 occipital gyri 489 olivocerebellar network 99 Onofre, F. et al. 725 onset of notes 151 Openness-to-Experience trait 649 OPERA (Overlap, Precision, Emotion, Repetition, Attention) hypothesis 264, 270, 403 opioid receptors 352, 741 Oral Motor and Respiratory Exercises (OMREX) 717, 718, 719 orbitofrontal cortex 106 Organ2/ASLSP (Cage) 21 oscillation-based models of speech perception 407 oscillatory functions 751 in rhythm perception 170–2 out-of-culture scale violations 48 overtones 148–9, 149 oxytocin (OT) 339–40, 341, 534, 741
P Pacinian corpuscles 188–9 pain modulation 338 Pallesen, K. J. et al. 554 Pantev, C. et al. 22, 467 Papoušek, M. 676 parabelt 216, 221 parahippocampal gyrus 300 parahippocampus 301 parental music education 442 parietal areas 243 parietal lobe 465 role in improvisation 518–19 Parkinson’s disease (PD) 448, 634–5, 695, 698–700, 702–3 cognitive remediation 740 response to dopamine 335 rhythm perception 169–70, 179 speech deficits 716, 718 Parkinsonism 704 passive musical exposure 32, 627–8 Patel, A. 393, 400, 403 Patel, A. D. 264, 270, 582 pathogenic theory of musician’s dystonia (MD) 787–90, 788, 794–6 pathophysiology, for musician’s dystonia (MD) 784–7, 794 pathways auditory 217 visual 218 Patterned Sensory Enhancement (PSE) 698–9, 701 PCC 518–19 PCDHA gene cluster 446 PCDH7 gene 446 PDGFRA gene 446 Pearce, M. T. 54, 378 Pearce, M. T. et al. 381 pedaling 472 pedunculopontine tegmental nucleus (PPT) 350 Pelowski, M. et al. 367 Perani, D. 567 Perani, D. et al. 579 perception 464–5 and imagery behavioral and psychophysical studies 522–3 effect of brain damage 523–4 physiological measures 524–6
deficits see amusia development 574–6, 575, 675–7, 677 autistic children 678–81, 678, 680 innate abilities 571–4, 578–80 training 747–8 without awareness 762–3 perceptual integration 213–16 perceptual magnet effect 44 perceptual narrowing 595–6 percussion, expressive gestures 223 percussion instruments, note duration 156–7 percussive tones 153, 154 Pereira, C. S. et al. 378 Peretz, I. 296, 300, 304, 571 Peretz, I. and Coltheart, M. 393 Peretz, I. et al. 397–8, 766 perfect pitch see absolute pitch performance 459–60 biological restrictions 21 brain regions 464, 465–7, 465 effect on human transcriptome 447 expert 22–3, 23 mental 532–3 and plasticity 460, 462–4, 478–80, 479 as therapy for neurological disorders 699–700 peripheral immune system 347–9 personality traits 649, 655 of creative musicians 278 PET (positron emission tomography) scans 5, 333–4, 352, 807 improvisation 506–7 Petersen, B. et al. 724 Peterson, D. A. et al. 780, 791 Petkov, C. I. et al. 191 Phillips, D. P., Hall, S. E., and Boehnke, S. E. 153 Phillips-Silver, J. and Trainor, L. J. 226, 594 Phillips-Silver, J. et al. 764–5 phonemes 569–71 similarity to timbre 190–1 phonological loop 241, 243 phonological store 243 PHOX2B gene 446 phrase boundary perception 49 phrenology 4, 4 physical medicine and rehabilitation (PMR) for dystonia 784, 792 physical mismatch negativity (phMMN) 196
physical responses to music, neural basis 97 physiological studies of imagery and perception 524–6 Piaf, Edith 462 pianists 489–90, 492–5, 499–502 gray matter 472–4, 474 mental performance 532–3 pedaling 472 piano improvisation 276 temporal structure of notes 151 vibration detection 222 Picelli, A. et al. 703 Pinho, A. L. et al. 499–502, 518 Piper, A. 571 pitch absolute vs. relative 681–3, 683 attention to 266–7 changes, detection 147 expectations 52–3 imagery and perception 522 interval representation 56 memory 239–40 metaphors 530 multisensory perception 218–21 perception 189–90, 191 acquired deficits 766 heritability 442–3 in infants 573–4 processing 194 in tonal languages 551 pitch-based amusia 761–3 pitch perception accuracy (PPA) test 446 planum temporale 194 plasticity 24–6, 26, 740 and aging 624, 627, 631 cognitive remediation 741–2 and musical performance 460, 462–4 and musician’s dystonia (MD) 475–8, 479, 787 and performance 478–80, 479 and training 422–4, 429–32, 462–4, 467–8, 546–7 cortical sound processing 547–50 playsongs vs. lullabies 597 pleasure in music 376, 760 anhedonia 768–9 individual variation 380
training 430–1 Poeppel, D. 217, 406 pontomesencephalic tegmentum (PMT) 350 posterior pituitary 339–41 power spectra 150–1, 152 PPP2R3A gene 448–9 practice and brain structure changes 25, 421, 423 deliberate 460–1 effect on brain structure 470–4, 474 and expertise 429–30 and genetic influences 22–3, 23 mental 532–3 myelination 27 through observation 469 see also training pre-supplementary motor area (pre-SMA) 276, 488–9, 494–5, 515 precentral gyrus 465 precise auditory timing hypothesis (PATH) 406 predictability 372 prediction 430 in rhythm 598–600 predictive coding model 221–2 Predominant Patterns 30–1 preference 286 cross-cultural research 46–7 network connections 106 prefrontal regions 277 prehistoric evidence of music 369 premotor area (PMA) 375, 465, 466 premotor cortex (PMC), rhythm perception 168 presbylaryngis 725 primary auditory cortex (A1/PAC) 191–2, 464, 470 primary motor cortex (M1) 169, 465, 466, 472 primary somatosensory area (S1) 465 priming of motor activity 696–7 private music tuition 650 proficiency, and memory tasks 250–2 progressive supranuclear palsy (PSP) 704 proopiomelanocortin (POMC) 741 prosody 570–1, 582 perception in infants 572–3 prosopagnosia 770 protocadherin15 (PCDH15) 449 Przybylski, L. et al. 606
Przysinda, E. et al. 199, 278 psychiatric disorders 451 cognitive remediation 740 psychological impact of music 78–81 psychological voice disorders 726 psychophysical studies, imagery and perception 522–3 psychosocial function, NMT 749 puberphonia 726 publications 6 pulse 568 pure tones 153 vs. instrument tones 22 putamen see basal ganglia Putkinen, V. et al. 552–3 Pygmalion effect 251
R Raaijmakers, J. G. and Shiffrin, R. M. 255 radioligands 333–4, 352 rāgamālikā modulation 48 Ramus, F. and Mehler, J. 570 random networks 126–8, 127 Range Universals 30–1 rap, freestyle 276, 495–7 rating scales for dystonia 780–2, 781, 791, 792 Rational Scientific Mediating Model (RSMM) 743–4, 749 real-world associations 156 recognition of music 244–50, 255–6, 256 recursion 197 Redies, C. 368 Redirected Phonation 726–7 Reelin pathway 445 refined auditory processing 467–8 refined somatosensory perception 468 region-of-interest (ROI)/atlas based networks 130, 131, 301, 494–5 regions 123 regular networks 126–7, 127 regularities 194–6, 202–3 Reich, S. 570–1 reinforcement learning (RL) 795 relative pitch (RP) processing 240–1, 681–3, 683 relaxation 629 repetition 568, 699 representations of music 568
Resonant Voice Therapy 726 respiratory disorders, therapy for 727 restorative approach to cognitive remediation 739 reward mechanisms 337 reward pathway 299n reward system (mesolimbic) 246, 374, 380, 431, 631 RGS2 gene 447 RGS9 gene 445 rhythm 592–3 for attention training 749–50 beat-based vs. non-beat based 165–6, 166 development 174–5 and developmental disorders 605–7 disorders 763–5 and emotion 600–1 in infant-directed singing 596–8 of language 173–4, 570–1 mirroring and joint action 179–80 multisensory perception 223–6 perception 48–9, 166–72 cross-modal investigations 177–8 evolution 175–6 in infants 572, 574, 575–6, 593–4, 595–6 processing abilities, individual differences 178–9 regularity 598–600 selective attention 268–9 and stuttering 722 synchronous movement 602–5 Rhythmic Auditory Stimulation (RAS) 696–8, 702–3, 706 rhythmic entrainment 289–90, 302–3, 374, 695–6, 698 rhythmic improvisation 276, 497–9 rhythmic intervention for dyslexia 727–8 rhythmic priming 404 rhythmic-reading training (RRT) 728 Rhythmic Speech Cueing (RSC) 717, 719 Richie, L. 570 right frontotemporal network 763, 764 Rochette, F. et al. 724 Rock, A. M. et al. 597 rock music 549 Rohrmeier, M. 197 Rohrmeier, M. and Cross, I. 194 Roland, P. E., Skinhøj, E., and Lassen, N. A. 5 roles of music 31 Rosen, D. S. et al. 508–9
Rosenblum, L. D. and Fowler, C. A. 223 Rosenkrantz, K. 785–6 Rubinov, M. and Sporns, O. 131 Russian language 570 Russo, F. A., Ammirante, P., and Fels, D. I. 222
S Sachs, M. E. et al. 378 Saint-Georges, C. et al. 573 Sakamoto, M. et al. 635 Saldaña, H. M. and Rosenblum, L. D. 221 Salimpoor, V. N. and Zatorre, R. J. 379 Salimpoor, V. N. et al. 337, 378 same–different tests 5 Sammler, D. et al. 398–9, 400, 630 Samson, S. and Peretz, I. 244 Santoni, C. et al. 726 Särkämo, T. et al. 254 SAT scores 654 Sauder, C. et al. 725 scaffolding 462, 717, 742 scale-free networks 127 scale perception 47 Scaled Inclusivity 130–1 scales 187 Javanese vs. Western 574 scat singing 569 Schellenberg, E. G. 658 schemas 247–8 Schenker, H. 197 schizophrenia 451, 524, 527–8, 531 cognitive remediation 740 Schlaug, G. 471 Schlaug, G. et al. 425, 471, 634 Schneider, N. et al. 343, 470 Schön, D. et al. 398 Schopenhauer, A. 77, 80 Schulze, K. and Koelsch, S. 242 Schumann 21 Schutz, M. et al. 223 sea lions, rhythm perception 176 Search of Associative Memory (SAM) model 247, 253, 255 Seashore, C. (Seashore tests) 5, 441, 442, 444, 446 secondary auditory area (A2) 464
secondary motor areas 465 secretory IgA (S-IgA) 349 Seesjärvi, E. et al. 556 segmental dystonia 777 Seinfeld, S. et al. 627 selection, and filtering 266 selection theories of attention, early vs. late 264–6 Semal, C. et al. 242 semantic associative network model of memory formation 247 semantic memory 238, 244–5 sensorimotor domain 471 sensorimotor functions, neurologic music therapy 745 sensorimotor integration 477, 786 sensorimotor pathways 221 sensory deficits, language disorders 723–4 sensory memory 237 sensory perception 477 sensory processing, improvisation 517–18 serial-to-parallel conversion 238–9, 239 serotonin (5-HT) 246, 338–9, 741 sex differences, in endocrine levels 344 sexual selection theory 67–9 Shahin, A., Roberts, L., and Trainor, L. 28 Shannon, C. E. 277 shared-resources hypothesis 300 shared syntactic resource integration hypothesis (SSRIH) 270–1 short-latency intracortical inhibition (SICI) 785–6 short-term memory 238 sight, visual rhythm 177–8 sight and sound association 156–8 Silbo Gomero, whistled speech 582 sine-wave speech 582 singing birds see songbirds development in children 578 endocrine responses 343–4 infant-directed 596–8 and memory tasks 253–5 oxytocin levels 340 universality 30 singing therapy 717–18, 721, 722–3 for respiratory disorders 727 for voice disorders 725–7 Six Degrees (Watts) 125 skin, music detected in 189
Skinner, B. F. 392 sleep disorders 725 Slevc, L. R. and Okada, B. M. 271 Slevc, L. R. et al. 398 slow-down exercises (SDEs) 793 SMA 515, 554 small-world networks 126–7, 127, 129 Smith, H. 21 Smith, J. D. et al. 531 sMMN (statistical mismatch negativity) 195–6, 199 SNCA gene 447, 448, 448 social functions of music 602–5 socio-economic status (SES) 655, 659 Soley, G. and Hannon, E. E. 596 somatosensory cortex 221, 465 somatosensory influences on pitch perception 220–1 on rhythm perception 224–5 on timbre perception 222 somatosensory perception 468 somatosensory processing 517–18 songbirds 67–73, 81, 393–4, 441, 443, 444–5, 447, 449 songs, combination of language and music 398–9 Sowiński, J. and Dalla Bella, S. 765 spasmodic dysphonia (SD) 725 specific language impairment (SLI) 580–1 Spector, J. T. and Brandfonbrener, A. G. 780 spectral properties 148–9, 149, 150–1, 152 spectrum envelope 190–1 speech music of 569–71 transcribed into music 570–1 speech disorders 4 fluency 721–3 neurologic music therapy (NMT) for 715–28, 745 speech sounds 550 speech therapy 633–4 spinal cord injuries 698 sports science 809 Staal, F. 78 Stahl, B. et al. 405 Stanford-Binet IQ test 648 statistical learning 25 statistical mismatch negativity (sMMN) 195–6, 199 statistical structures 194–6, 202–3
Steele, C. J. et al. 426, 473 Steinbeis, N. and Koelsch, S. 379 Sternberg, R. J. et al. 263 Strait, D. L. et al. 190 stress hormones 341–5 stress-timed languages vs. syllable-timed languages 570 string musicians 468, 472–3 mental performance 532–3 stroke patients 254, 633–4, 635, 698, 699, 701 acquired amusia 767 cognitive remediation 740 singing therapy 718 Stroop tests 554–5 structural features, cross-cultural research 47–50 structure effects of training on 470–4, 474 individual variation in understanding 247–8 stuttering 581, 722–3 substantia nigra pars compacta (SNpc) 334 Sun, L. et al. 199–200 superadditivity 215 superior colliculus 221 superior parietal lobule (SPL) 490, 517 superior temporal gyrus (STG) 216, 470, 488, 489, 763, 764 superior temporal sulcus 219, 221 supplementary motor area (SMA) 465, 466 rhythm perception 168 surgery cortisol levels 343 immunoglobulin A levels 349 surprise 372 Suzuki, M. et al. 379 Swift, J. 623 syllable-timed languages vs. stress-timed languages 570 syllables, rhythm 606–7 Symbolic Communication Training Through Music (SYCOM) 717 sympathetic nervous system (SNS) 345–7 synapses 463 development 26 synchronous movement 602–5 see also dance; drumming synchrony with rhythm 174, 176 see also rhythm synesthesia 528–9 syntax 393, 399–400 infant development 576–7 synthesized notes 148–9, 152, 157, 201
systematic approach 364–5
T Tamplin, J. et al. 727 techno music 337 tempo and emotion 600–1 memory for 245–6 temporal approach, to language and music 406–8 temporal attention 267–9 temporal brain areas, and memory 248 temporal discrimination threshold (TDT) 786–7, 790, 793, 794 temporal dynamics 146, 148–55, 149, 154, 157–9 temporal gyri 470–1 temporal lobes 464 temporal processing disorders 786–7 universals 30 temporoparietal junction (TPJ) 488, 518 deactivation 492 tensor based morphometry (TBM) 5, 470 Tervaniemi, M. et al. 550 testosterone 344 thalamus 301 Thaut, M. H. et al. 100, 253, 696, 701, 702, 703 Therapeutic Instrumental Music Performance (TIMP) 699–700, 701, 706 Therapeutic Singing (TS) 717, 718, 719 Thompson, W. F. et al. 213, 218 Thomson, J. M. et al. 727 thresholding procedures 133 throat singing 571 Tierney, A. and Kraus, N. 405–6 timbre 149–51, 192 memory for 245–6 multisensory perception 221–2 perception in infants 572, 573–4 of phonemes 569–70 similarity to phonemes 190–1 time perception, acquired deficits 766–7 timing 427 mechanisms, absolute vs. relative 166 neural networks 97–102 Tinbergen, N. 391 Toccata in C Major (Schumann) 21
tone, memory 239–44 tonal languages 203, 240, 402, 551, 570 tonality perception 47–8 tone deafness see amusia tone intervals, memory 240–1 tone onset 151 tone patterns 193 tonotopic maps 216 tonotopic organization 22, 90–1, 191 Torres, E. B. et al. 706 Toscanini 21 Touch-Cue Method (TCM) 720 trading fours 502–4 training 419 and academic achievement 654–6 and aptitude 658–60, 659 effect on brain development 424–7, 426, 551–3, 557 effect on brain function 467–70 and brain structure 420–2, 420, 470–4, 474 and cognition 647–50, 654–7, 659 in aging 626–7 demonstration and observation 469 effect on executive functions 554–5 and healthy aging 656–7 and language skills 400–4, 652–4 motor functions 468–9 nature vs. nurture debate 461–2 and plasticity 422–4, 429–32, 462–4, 467–8, 546–7 cortical sound processing 547–50 short-term 427–8 studies of 645–7 types 650 and visuospatial skills 651–2 see also practice Trainor, L. J. 597 Trainor, L. J. and Adams, B. 594 Trainor, L. J. et al. 226, 240, 572, 601 Tramo, M. J. et al. 191 Tranchant, P. and Vuvan, D. T. 765 Tranchant, P. et al. 224 transcranial alternating current stimulation (tACS) 771 transcranial direct current stimulation (tDCS) 507–9, 771 transcranial magnetic stimulation (TMS) 5, 219, 250, 430 for amusia 771 for musician’s dystonia (MD) 784, 793
rhythm perception 169 transfer 269–74, 630, 646, 649 music and language 400–4 Transformational Design Model (TDM) 744–6, 749 transverse temporal gyrus 216 traumatic brain injury (TBI) 698, 700 cognitive remediation 740 singing therapy 718 trombone, harmonics 151, 152 Trost, W. et al. 379 trumpet, harmonics 148–9, 149, 150 Turkish IDyOM model 53–8 Turkish music 50–1 twin studies 23, 24–5, 556 aging 626 aptitude 442 music training 423 willingness to practice 444, 461
U Ullén, F., Hambrick, D., and Mosing, M. 22–3 Ulrich 188 Unified Dystonia Rating Scale (UDRS) 782 unity assumption 158 universality of music 369–70 universals 29–31 unpleasant sounds 155–6, 339 use patterns 790 usefulness of music 275
V valence 286, 601 Van den Heuvel, M. P. et al. 133 Van Wijk, B. C., Stam, C. J., and Daffertshofer, A. 133 Vaquero, L. et al. 426, 463 Vatakis, A. et al. 158 ventral pathways 217 ventral premotor cortex (vPMC) 201, 426, 490 ventral striatum 104–5, 301 ventral tegmental area (VTA) 334 ventriloquist effect 158 ventrolateral prefrontal cortex (PFC) 554 rhythm perception 168 Verghese, J. et al. 626
vertical structure 147 vestibular apparatus 188–9 vestibular cortex 221 vestibular system 225–6 vibration receptors 188–9, 220, 222, 224–5 Vienna Integrated Model of Art Perception (VIMAP) 369 Villarreal, M. F. et al. 497–9 viola, harmonics 155 visual cortex 221 visual imagery 290, 303 visual pathways 218 visual perception 213 visual processing 518 visual rhythm 177–8 visuomotor influences on pitch perception 218–20 on rhythm perception 223–4 timbre perception 221–2 visuospatial skills, and musical training 651–2, 657 VLDLR gene 445 Vocal Intonation Therapy (VIT) 717, 718 vocal misuse disorders 726 voice disorders 724–7 voluntary musical imagery 94–5 voxel based morphometry (VBM) 5, 470, 767 voxel-based networks 130, 131 Vuust, P. and Kringelbach, M. L. 379 Vuust, P. and Witek, M. A. G. 408 Vuvan, D. T. et al. 199
W Wager, T. D. et al. 292 Wallaschek, R. 5 Wambaugh, J. L. et al. 719 Wan, C. Y. et al. 715 Warren, J. D. et al. 192 Watts, D. J. 125 Wechsler Intelligence Scale for Children–III (WISC–III) 648 Welch, G. 676 well-being in aging 628–9, 635–6 Wernicke’s area 464 Wechsler Preschool and Primary Scale of Intelligence–III (WPPSI–III) 653 Wechsler Preschool and Primary Scale of Intelligence–Revised (WPPSI–R) 651 Western IDyOM model 53–8
Western music modulation identification 48 phrase boundary perception 49 rhythm perception 49 Western scales vs. Javanese scales 574 Westernization of music 32 whistled speech 582 white matter (WM) 27, 420–1, 426, 428 density 463 imaging techniques 470 and music practice 471 Whitfield, I. 192 ‘Who put the Bomp?’ song 569 Wiener, M. et al. 716 Wilkins, R. W. et al. 379 Williams Syndrome (WS) 340 Wilson, F. 21 Wilson, R. S. et al. 625 Witek, M. A. et al. 380 Wittgenstein, L. 810 Wong, P. C. M. et al. 190, 551, 716 Wood, B. H. et al. 702 working memory (WM) 237–8 effect of musical training 554 neural networks 243–4 tonal 241–4 training 649–50, 652–3
X Xhosa language 570
Y Yoruba language 570 Yoshida, K. A. et al. 594
Z Zahavi, A. 67, 81 Zajonc, R. B. 244 Zatorre, R. J. 192 Zatorre, R. J. and Belin, P. 216 zebra finches 393–4, 441, 443, 444–5 Zeki, S. 367 Zentner, M. and Eerola, T. 602–3
Zentner, M. et al. 80 Zentner, M. R. 286 Zhang, J. et al. 199 Ziegler, A. et al. 725 ZNF223 gene 448 zygonic theory of musical-structural understanding 674–5, 679